This study explored the extent to which sequential auditory grouping affects the perception of temporal synchrony. In Experiment 1, listeners discriminated between 2 pairs of asynchronous “target” tones at different frequencies, A and B, in which the B tone either led or lagged. Thresholds were markedly higher when the target tones were temporally surrounded by “captor tones” at the A frequency than when the captor tones were absent or at a remote frequency. Experiment 2 extended these findings to asynchrony detection, revealing that the perception of synchrony, one of the most potent cues for simultaneous auditory grouping, is not immune to competing effects of sequential grouping. Experiment 3 examined the influence of ear separation on the interactions between sequential and simultaneous grouping cues. The results showed that, although ear separation could facilitate perceptual segregation and impair asynchrony detection, it did not prevent the perceptual integration of simultaneous sounds.
perceptual organization; auditory perception; stream segregation; asynchrony
The extraction of the distance between an object and an observer is fast when angular declination is informative, as it is with targets placed on the ground. To what extent does angular declination drive performance when viewing time is limited? Participants judged target distances in a real-world environment with viewing durations ranging from 36 to 220 ms. An important role for angular declination was supported by experiments showing that the cue provides information about egocentric distance even on the very first glimpse, and that it supports a sensitive response to distance in the absence of other useful cues. Performance was better at the 220-ms viewing duration than for briefer glimpses, suggesting that the perception of distance is dynamic even within the time frame of a typical eye fixation. Critically, performance in limited-viewing trials was better when preceded by a 15-s preview of the room without a designated target. The results indicate that the perception of distance is powerfully shaped by memory from prior visual experience with the scene. A theoretical framework for the dynamic perception of distance is presented.
distance perception; angular declination; walking; time course; intrinsic bias
Although it is intuitive that familiarity with complex visual objects should aid their preservation in visual working memory (WM), empirical evidence for this is lacking. This study used a conventional change-detection procedure to assess visual WM for unfamiliar and famous faces in healthy adults. Across experiments, faces were upright or inverted and a low- or high-load concurrent verbal WM task was administered to suppress contribution from verbal WM. Even with a high verbal memory load, visual WM performance was significantly better and capacity estimated as significantly greater for famous versus unfamiliar faces. Face inversion abolished this effect. Thus, neither strategic, explicit support from verbal WM nor low-level feature processing easily accounts for the observed benefit of high familiarity for visual WM. These results demonstrate that storage of items in visual WM can be enhanced if robust visual representations of them already exist in long-term memory.
working memory; faces; familiarity; long-term memory; face identification
Contour interpolation automatically binds targets with distractors to impair multiple object tracking (Keane, Mettler, Tsoi, & Kellman, 2011). Is interpolation special in this regard, or can other features produce the same effect? To address this question, we examined the influence of eight features on tracking: color, contrast polarity, orientation, size, shape, depth, interpolation, and a combination (shape, color, size). In each case, subjects tracked 4 of 8 objects that began as undifferentiated shapes, changed features as motion began (to enable grouping), and returned to their undifferentiated states before halting. The features were always irrelevant to the task instructions. We found that inter-target grouping improved performance for all feature types except orientation and interpolation (Experiments 1 and 2). Most importantly, target-distractor grouping impaired performance for color, size, shape, combination, and interpolation. The impairments were at times large (>15% decrement in accuracy) and occurred relative to a homogeneous condition in which all objects had the same features at each moment of a trial (Experiment 2) and relative to a “diversity” condition in which targets and distractors had different features at each moment (Experiment 3). We conclude that feature-based grouping occurs for a variety of features besides interpolation, even when irrelevant to task instructions and contrary to the task demands, suggesting that interpolation is not unique in promoting automatic grouping in tracking tasks. Our results also imply that various kinds of features are encoded automatically and in parallel during tracking.
multiple object tracking; attention; perceptual grouping; perceptual organization
Sound sequences such as music are usually organized perceptually into concurrent “streams.” The mechanisms underlying this “auditory streaming” phenomenon are not completely known. The present study sought to test the hypothesis that synchrony limits listeners' ability to separate sound streams. To this aim, both perceptual-organization judgments and performance measures were used. In Experiment 1, listeners indicated whether they perceived sequences of alternating or synchronous tones as a single stream or as two streams. In Experiments 2 and 3, listeners detected rare changes in the intensity of “target” tones at one frequency in the presence of synchronous or asynchronous random-intensity “distractor” tones at another frequency. The results of these experiments showed that, for large frequency separations between the tones, the probability of perceiving two streams was lower on average for synchronous than for alternating tones, and that sensitivity to intensity changes in the target sequence was greater for asynchronous than for synchronous distractors. Overall, these results are consistent with the hypothesis that synchrony limits listeners' ability to form separate streams and/or to attend selectively to certain sounds in the presence of other sounds, even when the target and distractor sounds are well separated from each other in frequency.
perceptual organization; auditory stream; stream segregation; temporal coherence
Speech perception flexibly adapts to short-term regularities of ambient speech input. Recent research demonstrates that the function of an acoustic dimension for speech categorization at a given time is relative to its relationship to the evolving distribution of dimensional regularity across time, and not simply to a fixed value along the dimension. Two experiments examine the nature of this dimension-based statistical learning in online word recognition, testing generalization of learning across phonetic categories. While engaged in a word recognition task guided by perceptually unambiguous voice-onset time (VOT) acoustics signaling stop voicing in either bilabial rhymes, beer and pier, or alveolar rhymes, deer and tear, listeners were exposed incidentally to an artificial “accent” deviating from English norms in its correlation of the pitch onset of the following vowel (F0) with VOT (Experiment 1). Exposure to the change in the correlation of F0 with VOT led listeners to down-weight reliance on F0 in voicing categorization, indicating dimension-based statistical learning. This learning was observed only for the “accented” contrast varying in its F0/VOT relationship during exposure; learning did not generalize to the other place of articulation. Another group of listeners experienced competing F0/VOT correlations across place of articulation such that the global correlation for voicing was stable, but locally correlations across voicing pairs were opposing (e.g., “accented” beer and pier, “canonical” deer and tear, Experiment 2). Listeners showed dimension-based learning only for the accented pair, not the canonical pair, indicating that they are able to track separate acoustic statistics across place of articulation, that is, for /b-p/ and /d-t/. This suggests that dimension-based learning does not operate obligatorily at the phonological level of stop voicing.
cue weighting; dimension-based learning; generalization; speech perception; statistical learning
Handwritten word recognition is a field of study that has largely been neglected in the psychological literature, despite its prevalence in society. Whereas studies of spoken word recognition almost exclusively employ natural, human voices as stimuli, studies of visual word recognition use synthetic typefaces, thus simplifying the process of word recognition. The current study examined the effects of handwriting on a series of lexical variables thought to influence bottom-up and top-down processing, including word frequency, regularity, bidirectional consistency, and imageability. The results suggest that the natural physical ambiguity of handwritten stimuli forces a greater reliance on top-down processes, because almost all effects were magnified, relative to conditions with computer print. These findings suggest that processes of word perception naturally adapt to handwriting, compensating for physical ambiguity by increasing top-down feedback.
lexical access; reading; handwriting; top-down processing
Face processing has been studied for decades. However, most of the empirical investigations have been conducted using static face images as stimuli. Little is known about whether static face processing findings can be generalized to real world contexts, in which faces are constantly moving. The present study investigates the nature of face processing (holistic vs. part-based) in elastic moving faces. Specifically, we focus on whether elastic moving faces, as compared to static ones, can facilitate holistic or part-based face processing. Using the composite paradigm, participants were asked to remember either an elastic moving face (i.e., a face that blinks and chews) or a static face, and then tested with a static composite face. The composite effect was (1) significantly smaller in the dynamic condition than in the static condition, (2) consistently found with different face encoding times (Experiments 1–3), and (3) present for the recognition of both upper and lower face parts (Experiment 4). These results suggest that elastic facial motion facilitates part-based processing, rather than holistic processing. Thus, while previous work with static faces has emphasized an important role for holistic processing, the current work highlights an important role for featural processing with moving faces.
elastic facial movement; holistic processing; part-based processing; composite face paradigm
It is well established that fixation durations during reading vary with processing difficulty, but there are different views on how oculomotor control, visual perception, shifts of attention, and lexical (and higher cognitive) processing are coordinated. Evidence for a one-to-one translation of input delay into saccadic latency would provide a much needed constraint for current theoretical proposals. Here, we tested predictions of such a direct-control perspective using the stimulus-onset delay (SOD) paradigm. Words in sentences were initially masked and, upon fixation, were individually unmasked with a delay (0-ms, 33-ms, 66-ms, 99-ms SODs). In Experiment 1, SODs were constant for all words in a sentence; in Experiment 2, SODs were manipulated on target words, while non-targets were unmasked without delay. In accordance with predictions of direct control, non-zero SODs entailed equivalent increases in fixation durations in both experiments. Yet, a population of short fixations pointed to rapid saccades as a consequence of low-level information at non-optimal viewing positions rather than of lexical processing. Implications of these results for theoretical accounts of oculomotor control are discussed.
stimulus-onset delay (SOD); oculomotor control; fixation durations; sentence reading
Visual short-term memory (VSTM) is limited, especially for complex objects. Its capacity, however, is greater for faces than for other objects, an advantage that may stem from the holistic nature of face processing. If holistic processing explains this advantage, then object expertise—which also relies on holistic processing—should endow experts with a VSTM advantage. We compared VSTM for cars among car experts to that among car novices. Car experts, but not car novices, demonstrated a VSTM advantage similar to that for faces; this advantage was orientation-specific and was correlated with an individual's level of car expertise. Control experiments ruled out accounts based solely on verbal or long-term memory representations. These findings suggest that the processing advantages afforded by visual expertise result in domain-specific increases in VSTM capacity, perhaps by allowing experts to maximize the use of an inherently limited VSTM system.
Faces; objects; expertise; visual short-term memory; holistic processing
Valence and edibility are two important features of olfactory perception, but it remains unclear how they are read out from an olfactory input. For a given odor object (e.g., the smell of rose or garlic), does perceptual identification of that object necessarily precede retrieval of information about its valence and edibility, or alternatively, are these processes independent? In the present study, we examined rapid, binary perceptual decisions regarding odor detection, object identity, valence, and edibility for a set of common odors. We found that decisions regarding odor-object identity were faster than decisions regarding odor valence or edibility, but slower than detection. Mediation analysis revealed that odor valence and edibility decision response times were predicted by a model in which odor-object identity served as a mediator along the perceptual pathway from detection to both valence and edibility. According to this model, odor valence is determined through both a “low road” that bypasses odor objects and a “high road” that utilizes odor-object information. Edibility evaluations are constrained to processing via the high road. The results outline a novel causal framework that explains how major perceptual features might be rapidly extracted from odors through engagement of odor objects early in the processing stream.
Olfactory perception; odor object coding; valence; emotion
Hebrew provides an intriguing contrast to European languages. On the one hand, like any European language, it has an alphabetic script. On the other hand, being a Semitic language, it differs in the structure of base words. By monitoring eye movements, we examined the time-course of processing letter transpositions in Hebrew, and assessed their impact on reading different types of Hebrew words that differ in their internal structure. We found that letter transposition resulted in dramatic reading costs for words with Semitic word structure, and much smaller costs for non-Semitic words. Moreover, the strongest impact of transposition occurred where root-letter transposition resulted in a pseudo-root, where significant interference emerged already in first fixation duration. Our findings thus suggest that Hebrew readers differentiate between Semitic and non-Semitic forms already at first fixation, at the early phase of word recognition. Moreover, letters are differentially processed across the visual array, given their morphological structure and their contribution to recovering semantic meaning. We conclude that flexibility or rigidity in encoding letter position is determined by cues regarding the internal structure of printed words.
TL; letter position coding; word recognition; Hebrew; morphology
Nonspeech materials are widely used to identify basic mechanisms underlying speech perception. For instance, they have been used to examine the origin of compensation for coarticulation, the observation that listeners’ categorization of phonetic segments depends on neighboring segments (Mann, 1980). Specifically, nonspeech precursors matched to critical formant frequencies of speech precursors have been shown to produce similar categorization shifts as speech contexts. This observation has been interpreted to mean that spectrally contrastive frequency relations between neighboring segments underlie the categorization shifts observed after speech as well as nonspeech precursors (Lotto & Kluender, 1998). From the gestural perspective, however, categorization shifts in speech contexts occur due to listeners’ sensitivity to acoustic information for coarticulatory gestural overlap in production; in nonspeech contexts, this occurs due to energetic masking of acoustic information for gestures.
In two experiments, we distinguish the energetic masking and spectral contrast accounts. In Experiment 1, we investigated the effects of varying precursor tone frequency on speech categorization. Consistent only with the masking account, tonal effects were greater for frequencies close enough to those in the target syllables for masking to occur. In Experiment 2, we filtered the target stimuli to simulate effects of masking and obtained behavioral outcomes that closely resemble those with non-speech tones. We conclude that masking provides the more plausible account of nonspeech context effects. More generally, we suggest that similar results from the use of speech and nonspeech materials do not automatically imply identical origins and that the use of nonspeech in speech studies entails careful examination of the nature of information in the nonspeech materials.
spectral contrast; energetic masking; compensation for coarticulation; nonspeech context effects; speech perception
The search-step paradigm addresses the processes involved in changing movement plans, usually saccadic eye-movements. Subjects move their eyes to a target (T1) among distractors, but when the target steps to a new location (T2), subjects are instructed to move their eyes directly from fixation to the new location. We ask whether moving to T2 requires a separate stop process that inhibits the movement to T1. It need not. The movement plan for the second response may inhibit the first response. To distinguish these hypotheses, we decoupled the offset of T1 from the onset of T2. If the second movement is sufficient to inhibit the first, then the probability of responding to T1 should depend only on T2 onset. If a separate stop process is required, then the probability of responding to T1 should depend only on T1 offset, which acts as a stop signal. We tested these hypotheses in manual and saccadic search-step tasks and found that the probability of responding to T1 depended most strongly on T1 offset, supporting the hypothesis that changing from one movement plan to another involves a separate stop process that inhibits the first plan.
Cognitive Control; Eye Movements; Inhibition; Race Model; Search-Step Task
The coordination of word-recognition and oculomotor processes during reading was evaluated in two eye-tracking experiments that examined how word skipping, where a word is not fixated during first-pass reading, is affected by the lexical status of a letter string in the parafovea and ease of recognizing that string. Ease of lexical recognition was manipulated through target-word frequency (Experiment 1) and through repetition priming between prime-target pairs embedded in a sentence (Experiment 2). Using the gaze-contingent boundary technique, the target word appeared in the parafovea either with full preview or with transposed-letter (TL) preview. The TL preview strings were nonwords in Experiment 1 (e.g., bilnk created from the target blink), but were words in Experiment 2 (e.g., sacred created from the target scared). Experiment 1 showed greater skipping for high-frequency than low-frequency target words in the full preview condition but not in the TL preview (nonword) condition. Experiment 2 showed greater skipping for target words that repeated an earlier prime word than for those that did not, with this repetition priming occurring both with preview of the full target and with preview of the target’s TL neighbor word. However, time to progress from the word after the target was greater following skips of the TL preview word, whose meaning was anomalous in the sentence context, than following skips of the full preview word, whose meaning fit sensibly into the sentence context. Together, the results support the idea that coordination between word-recognition and oculomotor processes occurs at the level of implicit lexical decisions.
In four experiments, we tested whether sustained visual attention is required for the selective maintenance of objects in visual working memory (VWM). Participants performed a color change-detection task. During the retention interval, a valid cue indicated the item that would be tested. Change detection performance was higher in the valid-cue condition than in a neutral-cue control condition. To probe the role of visual attention in the cuing effect, on half of the trials, a difficult search task was inserted after the cue, precluding sustained attention on the cued item. The addition of the search task produced no observable decrement in the magnitude of the cuing effect. In a complementary test, search efficiency was not impaired by simultaneously prioritizing an object for retention in VWM. The results demonstrate that selective maintenance in VWM can be dissociated from the locus of visual attention.
A central process in music cognition involves the identification of key; however, little is known about how listeners accomplish this task in real time. This study derives from work that suggests overlap between the neural and cognitive resources underlying the analyses of both music and speech, and is the first to explore the timescales at which the brain infers musical key. We investigated the temporal psychophysics of key-finding over a wide range of tempi using melodic sequences with strong structural cues, where statistical information about overall key profile was ambiguous. Listeners were able to provide robust judgments within specific limits, at rates as high as 400 beats per minute (bpm; ~7 Hz) and as low as 30 bpm (0.5 Hz), but not outside those bounds. These boundaries on reliable performance show that the process of key-finding is restricted to timescales that are closely aligned with beat induction and speech processing.
music perception; tonal induction; temporal processing; speech; rate
Perceptual grouping can lead observers to perceive a multielement scene as a smaller number of hierarchical units. Past work has shown that grouping enables more elements to be stored in visual working memory (WM). Although this may appear to contradict so-called discrete resource models that argue for fixed item limits in WM storage, it is also possible that grouping reduces the effective number of “items” in the display. To test this hypothesis, we examined how mnemonic resolution declined as the number of items to be stored increased. Discrete resource models predict that precision will reach a stable plateau at relatively early set sizes, because no further items can be stored once putative item limits are exceeded. Thus, we examined whether the precision by set size function was bilinear when storage was enhanced via perceptual grouping. In line with the hypothesis that each perceptual group counted as a single “item,” precision still reached a clear plateau at a set size determined by the number of stored groups. Moreover, the maximum number of elements stored was doubled, and electrophysiological measures showed that selection and storage-related neural responses were the same for a single element and a multielement perceptual group. Thus, perceptual grouping allows more elements to be held in working memory while storage is still constrained by a discrete item limit.
working memory; individual differences; perceptual organization; ERP
The importance of vocabulary in reading comprehension emphasizes the need to accurately assess an individual’s familiarity with words. The present article highlights problems with using occurrence counts in corpora as an index of word familiarity, especially when studying individuals varying in reading experience. We demonstrate via computational simulations and norming studies that corpus-based word frequencies systematically overestimate strengths of word representations, especially in the low-frequency range and in smaller-size vocabularies. Experience-driven differences in word familiarity prove to be faithfully captured by subjective frequency ratings collected from respondents at different experience levels. When matched on those levels, this lexical measure explains more variance than corpus-based frequencies in eye-movement and lexical decision latencies to English words, attested in populations with varied reading experience and skill. Furthermore, the use of subjective frequencies removes the widely reported (corpus) frequency-by-skill interaction, showing that more skilled readers are uniformly faster than less skilled readers in processing any word, rather than disproportionately faster in processing lower-frequency words. This finding challenges the view that the more skilled an individual is in generic mechanisms of word processing, the less reliant he or she will be on the actual lexical characteristics of that word.
Sensitivity to frequency ratios is essential for the perceptual processing of complex sounds and the appreciation of music. This study assessed the effect of ratio simplicity on ratio discrimination for pure tones presented either simultaneously or sequentially. Each stimulus consisted of four 100-ms pure tones, equally spaced in terms of frequency ratio and presented at a low intensity to limit interactions in the auditory periphery. Listeners had to discriminate between a reference frequency ratio of 0.97 octave (about 1.96:1) and target frequency ratios, which were larger than the reference. In the simultaneous condition, the obtained psychometric functions were nonmonotonic: as the target frequency ratio increased from 0.98 octave to 1.04 octaves, discrimination performance initially increased, then decreased, and then increased again; performance was better when the target was exactly one octave (2:1) than when the target was slightly larger. In the sequential condition, by contrast, the psychometric functions were monotonic and there was no effect of frequency ratio simplicity. A control experiment verified that the nonmonotonicity observed in the simultaneous condition did not originate from peripheral interactions between the tones. Our results indicate that simultaneous octaves are recognized as “special” frequency intervals by a mechanism that is insensitive to the sign (positive or negative) of deviations from the octave, whereas this is apparently not the case for sequential octaves.
spectral fusion; musical intervals; octave; harmony; melody
This investigation examined how children and adults negotiate a challenging perceptual-motor problem with significant real-world implications – bicycling across two lanes of opposing traffic. Twelve- and 14-year-olds and adults rode a bicycling simulator through an immersive virtual environment. Participants crossed intersections with continuous cross traffic coming from opposing directions. Opportunities for crossing were divided into aligned (far gap opens with or before near gap) and rolling (far gap opens after near gap) gap pairs. Children and adults preferred rolling to aligned gap pairs, though this preference was stronger for adults than for children. Crossing aligned versus rolling gap pairs produced substantial differences in direction of travel, speed of crossing, and timing of entry into the near and far lanes. For both aligned and rolling gap pairs, children demonstrated less skill than adults in coordinating self and object movement. These findings have implications for understanding perception-action-cognition links and for understanding risk factors underlying car-bicycle collisions.
affordances; road crossing; perceptual-motor development; virtual environments