Tinnitus is a common disorder that often complicates hearing loss. Its mechanisms are incompletely understood. Current theories proposing pathophysiology from the ear to the cortex cannot individually – or collectively – explain the range of experimental evidence available. We propose a new framework, based on predictive coding, in which spontaneous activity in the subcortical auditory pathway constitutes a ‘tinnitus precursor’ which is normally ignored as imprecise evidence against the prevailing percept of ‘silence’. Extant models feature as contributory mechanisms acting to increase either the intensity of the precursor or its precision. If precision (i.e., postsynaptic gain) rises sufficiently then tinnitus is perceived. Perpetuation arises through focused attention, which further increases the precision of the precursor, and resetting of the default prediction to expect tinnitus.
Existing tinnitus models, including mutually exclusive mechanisms, invoke causes from the ear to high-level cortical brain networks.
The generic framework of predictive coding explains perception as the integration of sensory information and prior predictions, each weighted by its precision.
In our model, previously proposed neural correlates of ‘tinnitus’ largely relate to hearing damage, rather than to tinnitus per se, and reflect an increase in the precision of spontaneous activity in the auditory pathway, which acts as a tinnitus precursor.
Perception of tinnitus emerges if the precision of the precursor rises sufficiently to override the default (null hypothesis) percept of ‘silence’.
Tinnitus becomes chronic when perceptual inference mechanisms learn to expect tinnitus, engaging connections between auditory and parahippocampal cortex.
tinnitus; precision; predictive coding; auditory cortex
The brain basis for auditory working memory, the process of actively maintaining sounds in memory over short periods of time, is controversial. Using functional magnetic resonance imaging in human participants, we demonstrate that the maintenance of single tones in memory is associated with activation in auditory cortex. In addition, sustained activation was observed in hippocampus and inferior frontal gyrus. Multivoxel pattern analysis showed that patterns of activity in auditory cortex and left inferior frontal gyrus distinguished the tone that was maintained in memory. Functional connectivity during maintenance was demonstrated between auditory cortex and both the hippocampus and inferior frontal cortex. The data support a system for auditory working memory based on the maintenance of sound-specific representations in auditory cortex by projections from higher-order areas, including the hippocampus and frontal cortex.
SIGNIFICANCE STATEMENT In this work, we demonstrate a system for maintaining sound in working memory based on activity in auditory cortex, hippocampus, and frontal cortex, and functional connectivity among them. Specifically, our work makes three advances over previous work. First, we robustly demonstrate hippocampal involvement in all phases of auditory working memory (encoding, maintenance, and retrieval): the role of hippocampus in working memory is controversial. Second, using a pattern classification technique, we show that activity in the auditory cortex and inferior frontal gyrus is specific to the maintained tones in working memory. Third, we show long-range connectivity of auditory cortex to hippocampus and frontal cortex, which may be responsible for keeping such representations active during working memory maintenance.
auditory cortex; fMRI; hippocampus; MVPA; working memory
The human auditory system is adept at detecting sound sources of interest from a complex mixture of several other simultaneous sounds. The ability to selectively attend to the speech of one speaker whilst ignoring other speakers and background noise is of vital biological significance—the capacity to make sense of complex ‘auditory scenes’ is significantly impaired in aging populations as well as those with hearing loss. We investigated this problem by designing a synthetic signal, termed the ‘stochastic figure-ground’ stimulus that captures essential aspects of complex sounds in the natural environment. Previously, we showed that under controlled laboratory conditions, young listeners sampled from the university subject pool (n = 10) performed very well in detecting targets embedded in the stochastic figure-ground signal. Here, we presented a modified version of this cocktail party paradigm as a ‘game’ featured in a smartphone app (The Great Brain Experiment) and obtained data from a large population with diverse demographical patterns (n = 5148). Despite differences in paradigms and experimental settings, the observed target-detection performance by users of the app was robust and consistent with our previous results from the psychophysical study. Our results highlight the potential use of smartphone apps in capturing robust large-scale auditory behavioral data from normal healthy volunteers, which can also be extended to study auditory deficits in clinical populations with hearing impairments and central auditory disorders.
Generative models, such as predictive coding, posit that perception results from a combination of sensory input and prior prediction, each weighted by its precision (inverse variance), with incongruence between these termed prediction error (deviation from prediction) or surprise (negative log probability of the sensory input). However, direct evidence for such a system, and the physiological basis of its computations, is lacking. Using an auditory stimulus whose pitch value changed according to specific rules, we controlled and separated the three key computational variables underlying perception, and discovered, using direct recordings from human auditory cortex, that surprise due to prediction violations is encoded by local field potential oscillations in the gamma band (>30 Hz), changes to predictions in the beta band (12-30 Hz), and that the precision of predictions appears to quantitatively relate to alpha band oscillations (8-12 Hz). These results confirm oscillatory codes for critical aspects of generative models of perception.
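The three computational variables named above can be illustrated with a minimal sketch, assuming a Gaussian prediction and a Gaussian sensory likelihood. This is not the analysis used in the study; the function names and numerical values are hypothetical and chosen only to show how precision (inverse variance) weights the combination and how surprise follows from the negative log probability of the input.

```python
import math


def combine(prior_mean, prior_precision, obs, obs_precision):
    """Precision-weighted fusion of a Gaussian prediction and a Gaussian observation."""
    post_precision = prior_precision + obs_precision
    post_mean = (prior_precision * prior_mean + obs_precision * obs) / post_precision
    return post_mean, post_precision


def prediction_error(obs, prior_mean):
    """Deviation of the sensory input from the prediction."""
    return obs - prior_mean


def surprise(obs, prior_mean, prior_precision, obs_precision):
    """Negative log probability of the input under the predictive distribution."""
    # Predictive variance = uncertainty of the prediction + sensory noise (assumed Gaussian).
    variance = 1.0 / prior_precision + 1.0 / obs_precision
    return 0.5 * math.log(2.0 * math.pi * variance) + (obs - prior_mean) ** 2 / (2.0 * variance)


# Hypothetical values: a confident prediction (precision 4) and a noisier input (precision 1).
mean, precision = combine(prior_mean=0.0, prior_precision=4.0, obs=10.0, obs_precision=1.0)
```

With these illustrative numbers the combined estimate stays close to the prediction, because the prediction carries four times the precision of the input; an input further from the prediction yields a larger prediction error and a larger surprise.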
Our perception of the world is not only based on input from our senses. Instead, what we perceive is also heavily altered by the context of what is being sensed and our expectations about it. Some researchers have suggested that perception results from combining information from our senses and our predictions. This school of thought, referred to as “predictive coding”, essentially proposes that the brain stores a model of the world and weighs it up against information from our senses in order to determine what we perceive.
Nevertheless, direct evidence for the brain working in this way was still missing. While neuroscientists had seen the brain respond when there was a mismatch between an expectation and incoming sensory information, no one had observed the predictions themselves within the brain.
Sedley et al. now provide such direct evidence for predictions about upcoming sensory information, by directly recording the electrical activity in the brains of human volunteers who were undergoing surgery for epilepsy. The experiment made use of a new method in which the volunteers listened to a sequence of sounds that was semi-predictable. That is to say that, at first, the volunteers heard a selection of similarly pitched sounds. After random intervals, the average pitch of these sounds changed and they became more or less variable for a while before randomly changing again. This approach meant that the volunteers had to continually update their predictions throughout the experiment.
In keeping with previous studies, the unexpected sounds, which caused a mismatch between the sensory information and the brain’s prediction, were linked to high-frequency brainwaves. However, Sedley et al. discovered that updating the predictions themselves was linked to middle-frequency brainwaves; this confirms what the predictive coding model had suggested. Finally, this approach also unexpectedly revealed that how confident the volunteer was about the prediction was linked to low-frequency brainwaves.
In the future, this new method will provide an easy way of directly studying elements of perception in humans and, since the experiments do not require complex learning, in other animals too.
perception; predictions; surprise; prediction error; predictive coding; auditory cortex; Human
Tinnitus can occur when damage to the peripheral auditory system leads to spontaneous brain activity that is interpreted as sound [1, 2]. Many abnormalities of brain activity are associated with tinnitus, but it is unclear how these relate to the phantom sound itself, as opposed to predisposing factors or secondary consequences. Demonstrating “core” tinnitus correlates (processes that are both necessary and sufficient for tinnitus perception) requires high-precision recordings of neural activity combined with a behavioral paradigm in which the perception of tinnitus is manipulated and accurately reported by the subject. This has been previously impossible in animal and human research. Here we present extensive intracranial recordings from an awake, behaving tinnitus patient during short-term modifications in perceived tinnitus loudness after acoustic stimulation (residual inhibition), permitting robust characterization of core tinnitus processes. As anticipated, we observed tinnitus-linked low-frequency (delta) oscillations [5–9], thought to be triggered by low-frequency bursting in the thalamus [10, 11]. Contrary to expectation, these delta changes extended far beyond circumscribed auditory cortical regions to encompass almost all of auditory cortex, plus large parts of temporal, parietal, sensorimotor, and limbic cortex. In discrete auditory, parahippocampal, and inferior parietal “hub” regions, these delta oscillations interacted with middle-frequency (alpha) and high-frequency (beta and gamma) activity, resulting in a coherent system of tightly coupled oscillations associated with high-level functions including memory and perception.
•Extensive intracranial recordings were made from an awake, behaving tinnitus patient
•Tinnitus intensity was modulated with tight control over other factors
•Tinnitus is linked to widespread coherent delta-band cortical oscillations
•Rich local cross-frequency interactions link delta to all other frequency bands
Recording from an extensive array of intracranial electrodes in an awake, behaving human patient, Sedley, Gander et al. expose the detailed workings of a brain system responsible for generating tinnitus.
The advent of diffusion magnetic resonance imaging (MRI) allows researchers to virtually dissect white matter fiber pathways in the brain in vivo. This, for example, allows us to characterize and quantify how fiber tracts differ across populations in health and disease, and change as a function of training. Based on diffusion MRI, prior literature reports the absence of the arcuate fasciculus (AF) in some control individuals as well as in those with congenital amusia. The complete absence of such a major anatomical tract is surprising given the subtle impairments that characterize amusia. Thus, we hypothesize that failure to detect the AF in this population may relate to the tracking algorithm used, and is not necessarily reflective of their phenotype. Diffusion data in control and amusic individuals were analyzed using three different tracking algorithms: deterministic and probabilistic, the latter either modeling two or one fiber populations. Across the three algorithms, we replicate prior findings of a left greater than right AF volume, but do not find group differences or an interaction. We detect the AF in all individuals using the probabilistic 2-fiber model; however, tracking failed in some control and amusic individuals when deterministic tractography was applied. These findings show that the ability to detect the AF in our sample is dependent on the type of tractography algorithm. This raises the question of whether failure to detect the AF in prior studies may be unrelated to the underlying anatomy or phenotype.
arcuate fasciculus; congenital amusia; diffusion magnetic resonance imaging; tractography; deterministic; probabilistic; crossing fibers
This work considers bases for working memory for non-verbal sounds. Specifically, we address whether sounds are represented as integrated objects or individual features in auditory working memory (WM) and whether the representational format influences WM capacity. The experiments used sounds in which two different stimulus features, spectral passband and temporal amplitude modulation rate, could be combined to produce different auditory objects. Participants had to memorize sequences of auditory objects of variable length (1–4 items). They either maintained sequences of whole objects or sequences of individual features until recall for one of the items was tested. Memory recall was more accurate when the objects had to be maintained as a whole compared to the individual features alone. This is due to interference between features of the same object. Additionally, a feature extraction cost was associated with maintenance and recall of individual features, when extracted from bound object representations. An interpretation of our findings is that, at some stage of processing, sounds might be stored as objects in WM with features bound into coherent wholes. The results have implications for feature-integration theory in the context of WM in the auditory system.
auditory; working memory; object; feature; representation
Previous behavioural studies have shown that repeated presentation of a randomly chosen acoustic pattern leads to the unsupervised learning of some of its specific acoustic features. The objective of our study was to determine the neural substrate for the representation of freshly learnt acoustic patterns. Subjects first performed a behavioural task that resulted in the incidental learning of three different noise-like acoustic patterns. During subsequent high-resolution functional magnetic resonance imaging scanning, subjects were then exposed again to these three learnt patterns and to others that had not been learned. Multi-voxel pattern analysis was used to test if the learnt acoustic patterns could be ‘decoded’ from the patterns of activity in the auditory cortex and medial temporal lobe. We found that activity in planum temporale and the hippocampus reliably distinguished between the learnt acoustic patterns. Our results demonstrate that these structures are involved in the neural representation of specific acoustic patterns after they have been learnt.
acoustic patterns; fMRI; auditory cortex; multi-voxel pattern analysis; hippocampus
A role for the cerebellum in cognition has been proposed based on studies suggesting a profile of cognitive deficits due to cerebellar stroke. Such studies are limited in the determination of the detailed organisation of cerebellar subregions that are critical for different aspects of cognition. In this study we examined the correlation between cognitive performance and cerebellar integrity in a specific degeneration of the cerebellar cortex: Spinocerebellar Ataxia type 6 (SCA6). The results demonstrate a critical relationship between verbal working memory and grey matter density in superior (bilateral lobules VI and crus I of lobule VII) and inferior (bilateral lobules VIIIa and VIIIb, and right lobule IX) parts of the cerebellum. We demonstrate that distinct cerebellar regions subserve different components of the prevalent psychological model for verbal working memory based on a phonological loop. The work confirms the involvement of the cerebellum in verbal working memory and defines specific subsystems for this within the cerebellum.
SCA-6; Cerebellum; Cognition; MRI; VBM; Neurodegeneration
The physiological basis for musical hallucinations (MH) is not understood. One obstacle to understanding has been the lack of a method to manipulate the intensity of hallucination during the course of an experiment. Residual inhibition, transient suppression of a phantom percept after the offset of a masking stimulus, has been used in the study of tinnitus. We report here a human subject whose MH were residually inhibited by short periods of music. Magnetoencephalography (MEG) allowed us to examine variation in the underlying oscillatory brain activity in different states. Source-space analysis capable of single-subject inference defined left-lateralised power increases, associated with stronger hallucinations, in the gamma band in left anterior superior temporal gyrus, and in the beta band in motor cortex and posteromedial cortex. The data indicate that these areas form a crucial network in the generation of MH, and are consistent with a model in which MH are generated by persistent reciprocal communication in a predictive coding hierarchy.
Musical hallucinations; Magnetoencephalography; Auditory cortex; Gamma oscillations; Beta oscillations; Predictive coding
The relationship between auditory processing and language skills has been debated for decades. Previous findings have been inconsistent, both in typically developing and impaired subjects, including those with dyslexia or specific language impairment. Whether correlations between auditory and language skills are consistent between different populations has hardly been addressed at all. The present work presents an exploratory approach of testing for patterns of correlations in a range of measures of auditory processing. In a recent study, we reported findings from a large cohort of eleven-year-olds on a range of auditory measures and the data supported a specific role for the processing of short sequences in pitch and time in typical language development. Here we tested whether a group of individuals with dyslexic traits (DT group; n = 28) from the same year group would show the same pattern of correlations between auditory and language skills as the typically developing group (TD group; n = 173). Regarding the raw scores, the DT group showed a significantly poorer performance on the language but not the auditory measures, including measures of pitch, time and rhythm, and timbre (modulation). In terms of correlations, there was a tendency to decrease in correlations between short-sequence processing and language skills, contrasted by a significant increase in correlation for basic, single-sound processing, in particular in the domain of modulation. The data support the notion that the fundamental relationship between auditory and language skills might differ in atypical compared to typical language development, with the implication that merging data or drawing inference between populations might be problematic. Further examination of the relationship between both basic sound feature analysis and music-like sound analysis and language skills in impaired populations might allow the development of appropriate training strategies. These might include types of musical training to augment language skills via their common bases in sound sequence analysis.
This article is part of a Special Issue entitled .
•Auditory and language skills were tested in 28 11-year-olds with dyslexic traits.
•Auditory processing of pitch, rhythm and modulation did not differ from controls.
•The pattern of correlation with language skills differed from that seen in controls.
•Differences in patterns of correlation merit further testing in prospective cohorts.
TD, Typically developing; DT, Dyslexic traits
This paper proposes a methodology for estimating Neural Response Functions (NRFs) from fMRI data. These NRFs describe non-linear relationships between experimental stimuli and neuronal population responses. The method is based on a two-stage model comprising an NRF and a Hemodynamic Response Function (HRF) that are simultaneously fitted to fMRI data using a Bayesian optimization algorithm. This algorithm also produces a model evidence score, providing a formal model comparison method for evaluating alternative NRFs. The HRF is characterized using previously established “Balloon” and BOLD signal models. We illustrate the method with two example applications based on fMRI studies of the auditory system. In the first, we estimate the time constants of repetition suppression and facilitation, and in the second we estimate the parameters of population receptive fields in a tonotopic mapping study.
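The two-stage structure described above (a neural response function feeding a hemodynamic response function) can be sketched as a forward model only. The sketch below makes several assumptions that are not taken from the paper: the NRF is a hypothetical exponential repetition-suppression curve with a single time constant `tau`, and a simple double-gamma HRF stands in for the Balloon and BOLD signal models. No Bayesian fitting or model evidence is computed, and all function names are illustrative.

```python
import math


def nrf_repetition(n_repeats, tau):
    """Hypothetical NRF: neuronal response to the k-th repetition decays as exp(-k / tau)."""
    return [math.exp(-k / tau) for k in range(n_repeats)]


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, defined as zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    return t ** (shape - 1.0) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def double_gamma_hrf(t):
    """Simple double-gamma HRF (a stand-in here, not the Balloon model)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def predict_bold(onsets, amplitudes, n_scans, tr):
    """Predicted signal: HRFs shifted to each stimulus onset and scaled by the neuronal response."""
    return [
        sum(a * double_gamma_hrf(i * tr - onset) for onset, a in zip(onsets, amplitudes))
        for i in range(n_scans)
    ]


# Hypothetical design: four repetitions, 10 s apart, sampled every 2 s for 30 scans.
amplitudes = nrf_repetition(n_repeats=4, tau=2.0)
bold = predict_bold(onsets=[0.0, 10.0, 20.0, 30.0], amplitudes=amplitudes, n_scans=30, tr=2.0)
```

In an estimation setting, a parameter such as `tau` would be adjusted so that the predicted signal matches the measured fMRI time series; the sketch shows only how a candidate NRF maps onto a predicted signal.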
neural response function; population receptive field; parametric modulation; Bayesian inference; auditory perception; repetition suppression; Tonotopic Mapping; Balloon model
In this work, we show that electrophysiological responses during pitch perception are best explained by distributed activity in a hierarchy of cortical sources and, crucially, that the effective connectivity between these sources is modulated with pitch-strength. Local field potentials were recorded in two subjects from primary auditory cortex and adjacent auditory cortical areas along the axis of Heschl's gyrus (HG) while they listened to stimuli of varying pitch strength. Dynamic Causal Modelling was used to compare system architectures that might explain the recorded activity. The data show that representation of pitch requires an interaction between non-primary and primary auditory cortex along HG that is consistent with the principle of predictive coding.
In contrast to the complex acoustic environments we encounter every day, most studies of auditory segregation have used relatively simple signals. Here, we synthesized a new stimulus to examine the detection of coherent patterns (‘figures’) from overlapping ‘background’ signals. In a series of experiments, we demonstrate that human listeners are remarkably sensitive to the emergence of such figures and can tolerate a variety of spectral and temporal perturbations. This robust behavior is consistent with the existence of automatic auditory segregation mechanisms that are highly sensitive to correlations across frequency and time. The observed behavior cannot be explained purely on the basis of adaptation-based models used to explain the segregation of deterministic narrowband signals. We show that the present results are consistent with the predictions of a model of auditory perceptual organization based on temporal coherence. Our data thus support a role for temporal coherence as an organizational principle underlying auditory segregation.
Even when seated in the middle of a crowded restaurant, we are still able to distinguish the speech of the person sitting opposite us from the conversations of fellow diners and a host of other background noise. While we generally perform this task almost effortlessly, it is unclear how the brain solves what is in reality a complex information processing problem.
In the 1970s, researchers began to address this question using stimuli consisting of simple tones. When subjects are played a sequence of alternating high and low frequency tones, they perceive them as two independent streams of sound. Similar experiments in macaque monkeys reveal that each stream activates a different area of auditory cortex, suggesting that the brain may distinguish acoustic stimuli on the basis of their frequency.
However, the simple tones that are used in laboratory experiments bear little resemblance to the complex sounds we encounter in everyday life. These are often made up of multiple frequencies, and overlap—both in frequency and in time—with other sounds in the environment. Moreover, recent experiments have shown that if a subject hears two tones simultaneously, he or she perceives them as belonging to a single stream of sound even if they have different frequencies: models that assume that we distinguish stimuli from noise on the basis of frequency alone struggle to explain this observation.
Now, Teki, Chait, et al. have used more complex sounds, in which frequency components of the target stimuli overlap with those of background signals, to obtain new insights into how the brain solves this problem. Subjects were extremely good at discriminating these complex target stimuli from background noise, and computational modelling confirmed that they did so via integration of both frequency and temporal information. The work of Teki, Chait, et al. thus offers the first explanation for our ability to home in on speech and other pertinent sounds, even amidst a sea of background noise.
auditory scene analysis; temporal coherence; psychophysics; segregation; Human
This study addresses the neuronal representation of aversive sounds that are perceived as unpleasant. Functional magnetic resonance imaging (fMRI) in humans demonstrated responses in the amygdala and auditory cortex to aversive sounds. We show that the amygdala encodes both the acoustic features of a stimulus and its valence (perceived unpleasantness). Dynamic Causal Modelling (DCM) of this system revealed that evoked responses to sounds are relayed to the amygdala via auditory cortex. While acoustic features modulate effective connectivity from auditory cortex to the amygdala, the valence modulates the effective connectivity from amygdala to the auditory cortex. These results support a complex (recurrent) interaction between the auditory cortex and amygdala based on object-level analysis in the auditory cortex that portends the assignment of emotional valence in amygdala that in turn influences the representation of salient information in auditory cortex.
Over a typical career, piano tuners spend tens of thousands of hours exploring a specialized acoustic environment. Tuning requires accurate perception and adjustment of beats in two-note chords that serve as a navigational device to move between points in previously learned acoustic scenes. It is a two-stage process that depends on: firstly, selective listening to beats within frequency windows and, secondly, the subsequent use of those beats to navigate through a complex soundscape. The neuroanatomical substrates underlying brain specialization for such fundamental organization of sound scenes are unknown.
Here, we demonstrate that professional piano tuners are significantly better than controls matched for age and musical ability on a psychophysical task simulating active listening to beats within frequency windows that is based on amplitude modulation rate discrimination. Tuners show a categorical increase in grey matter volume in the right frontal operculum and right superior temporal lobe. Tuners also show a striking enhancement of grey matter volume in the anterior hippocampus, parahippocampal gyrus, and superior temporal gyrus, and an increase in white matter volume in the posterior hippocampus as a function of years of tuning experience. The relationship with GM volume is sensitive to years of tuning experience and starting age but not actual age or level of musicality.
Our findings support a role for a core set of regions in the hippocampus and superior temporal cortex in skilled exploration of complex sound scenes in which precise sound ‘templates’ are encoded and consolidated into memory over time in an experience-dependent manner.
This study used magnetoencephalography to record oscillatory activity in a group of 17 patients with chronic tinnitus. Two methods, residual inhibition and residual excitation, were used to bring about transient changes in spontaneous tinnitus intensity in order to measure dynamic tinnitus correlates in individual patients. In residual inhibition, a positive correlation was seen between tinnitus intensity and both delta/theta (6/14 patients) and gamma band (8/14 patients) oscillations in auditory cortex, suggesting an increased thalamocortical input and cortical gamma response, respectively, associated with higher tinnitus states. Conversely, 4/4 patients exhibiting residual excitation demonstrated an inverse correlation between perceived tinnitus intensity and auditory cortex gamma oscillations (with no delta/theta changes) that cannot be explained by existing models. Significant oscillatory power changes were also identified in a variety of cortical regions, most commonly midline lobar regions in the default mode network, cerebellum, insula and anterior temporal lobe. These were highly variable across patients in terms of areas and frequency bands involved, and in direction of power change. We suggest a model based on a local circuit function of cortical gamma-band oscillations as a process of mutual inhibition that might suppress abnormal cortical activity in tinnitus. The work implicates auditory cortex gamma-band oscillations as a fundamental intrinsic mechanism for attenuating phantom auditory perception.
tinnitus; gamma oscillations; mutual inhibition; auditory cortex; magnetoencephalography
This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence.
auditory sequence analysis; pitch; rhythm; phonological skill; adolescence; language
We have previously used direct electrode recordings in two human subjects to identify neural correlates of the perception of pitch (Griffiths, Kumar, Sedley et al., Direct recordings of pitch responses from human auditory cortex, Curr. Biol. 20 (2010), pp. 1128–1132). The present study was carried out to assess virtual-electrode measures of pitch perception based on non-invasive magnetoencephalography (MEG). We recorded pitch responses in 13 healthy volunteers using a passive listening paradigm and the same pitch-evoking stimuli (regular interval noise; RIN) as in the previous study. Source activity was reconstructed using a beamformer approach, which was used to place virtual electrodes in auditory cortex. Time-frequency decomposition of these data revealed oscillatory responses to pitch in the gamma frequency band, occurring in Heschl's gyrus from 60 Hz upwards. Direct comparison of these pitch responses to the previous depth electrode recordings shows a striking congruence in terms of spectrotemporal profile and anatomical distribution. These findings provide further support that auditory high gamma oscillations occur in association with RIN pitch stimuli, and validate the use of MEG to assess neural correlates of normal and abnormal pitch perception.
► High gamma-band correlates of pitch perception identified with MEG beamforming.
► Results correlate strongly with invasive electrode recordings of same responses.
► Validation of accuracy of MEG beamformer approach.
Pitch; Auditory; Magnetoencephalography; Gamma; Beamformer; Perception
Research on interval timing strongly implicates the cerebellum and the basal ganglia as part of the timing network of the brain. Here we tested the hypothesis that the brain uses differential timing mechanisms and networks—specifically, that the cerebellum subserves the perception of the absolute duration of time intervals, whereas the basal ganglia mediate perception of time intervals relative to a regular beat. In a functional magnetic resonance imaging experiment, we asked human subjects to judge the difference in duration of two successive time intervals as a function of the preceding context of an irregular sequence of clicks (where the task relies on encoding the absolute duration of time intervals) or a regular sequence of clicks (where the regular beat provides an extra cue for relative timing). We found significant activations in an olivocerebellar network comprising the inferior olive, vermis, and deep cerebellar nuclei including the dentate nucleus during absolute, duration-based timing and a striato-thalamo-cortical network comprising the putamen, caudate nucleus, thalamus, supplementary motor area, premotor cortex, and dorsolateral prefrontal cortex during relative, beat-based timing. Our results support two distinct timing mechanisms and underlying subsystems: first, a network comprising the inferior olive and the cerebellum that acts as a precision clock to mediate absolute, duration-based timing, and second, a distinct network for relative, beat-based timing incorporating a striato-thalamo-cortical network.
Auditory figure–ground segregation, listeners’ ability to selectively hear out a sound of interest from a background of competing sounds, is a fundamental aspect of scene analysis. In contrast to the disordered acoustic environment we experience during everyday listening, most studies of auditory segregation have used relatively simple, temporally regular signals. We developed a new figure–ground stimulus that incorporates stochastic variation of the figure and background that captures the rich spectrotemporal complexity of natural acoustic scenes. Figure and background signals overlap in spectrotemporal space, but vary in the statistics of fluctuation, such that the only way to extract the figure is by integrating the patterns over time and frequency. Our behavioral results demonstrate that human listeners are remarkably sensitive to the appearance of such figures.
In a functional magnetic resonance imaging experiment, aimed at investigating preattentive, stimulus-driven, auditory segregation mechanisms, naive subjects listened to these stimuli while performing an irrelevant task. Results demonstrate significant activations in the intraparietal sulcus (IPS) and the superior temporal sulcus related to bottom-up, stimulus-driven figure–ground decomposition. We did not observe any significant activation in the primary auditory cortex. Our results support a role for automatic, bottom-up mechanisms in the IPS in mediating stimulus-driven, auditory figure–ground segregation, which is consistent with accumulating evidence implicating the IPS in structuring sensory input and perceptual organization.
Auditory object analysis requires two fundamental perceptual processes: the definition of the boundaries between objects, and the abstraction and maintenance of an object's characteristic features. While it is intuitive to assume that the detection of the discontinuities at an object's boundaries precedes the subsequent precise representation of the object, the specific underlying cortical mechanisms for segregating and representing auditory objects within the auditory scene are unknown. We investigated the cortical bases of these two processes for one type of auditory object, an ‘acoustic texture’, composed of multiple frequency-modulated ramps. In these stimuli we independently manipulated the statistical rules governing a) the frequency-time space within individual textures (comprising ramps with a given spectrotemporal coherence) and b) the boundaries between textures (adjacent textures with different spectrotemporal coherence). Using functional magnetic resonance imaging (fMRI), we show mechanisms defining boundaries between textures with different coherence in primary and association auditory cortex, while texture coherence is represented only in association cortex. Furthermore, participants' superior detection of boundaries across which texture coherence increased (as opposed to decreased) was reflected in a greater neural response in auditory association cortex at these boundaries. The results suggest a hierarchical mechanism for processing acoustic textures that is relevant to auditory object analysis: boundaries between objects are first detected as a change in statistical rules over frequency-time space, before a representation that corresponds to the characteristics of the perceived object is formed.
Auditory cortex; Auditory; fMRI; Frequency; Acoustic; Object Recognition
Pitch is a fundamental percept with a complex relationship to the associated sound structure. Pitch perception requires brain representation of both the structure of the stimulus and the pitch that is perceived. We describe direct recordings of local field potentials from human auditory cortex made while subjects perceived the transition between noise and a noise with a regular repetitive structure in the time domain at the millisecond level called regular-interval noise (RIN). RIN is perceived to have a pitch when the rate is above the lower limit of pitch, at approximately 30 Hz. Sustained time-locked responses are observed to be related to the temporal regularity of the stimulus, commonly emphasized as a relevant stimulus feature in models of pitch perception. Sustained oscillatory responses are also demonstrated in the high gamma range (80–120 Hz). The regularity responses occur irrespective of whether the response is associated with pitch perception. In contrast, the oscillatory responses only occur for pitch. Both responses occur in primary auditory cortex and adjacent nonprimary areas. The research suggests that two types of pitch-related activity occur in humans in early auditory cortex: time-locked neural correlates of stimulus regularity and an oscillatory response related to the pitch percept.
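A delay-and-add procedure is one common way to build noise with millisecond-level temporal regularity of the kind described above. The following is a minimal sketch that assumes this procedure; the function names, sample rate, delay and iteration count are illustrative choices and are not taken from the study.

```python
import random


def regular_interval_noise(n_samples, delay, iterations, seed=0):
    """Delay-and-add sketch: add a copy delayed by `delay` samples, repeatedly.

    With iterations=0 the result is plain Gaussian noise.
    """
    rng = random.Random(seed)
    # Start with extra samples, because each pass shortens the signal by `delay`.
    signal = [rng.gauss(0.0, 1.0) for _ in range(n_samples + delay * iterations)]
    for _ in range(iterations):
        signal = [signal[i] + signal[i - delay] for i in range(delay, len(signal))]
    return signal


def normalized_autocorrelation(x, lag):
    """Autocorrelation at `lag`, normalized by the zero-lag value."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    num = sum(centered[i] * centered[i + lag] for i in range(len(centered) - lag))
    den = sum(v * v for v in centered)
    return num / den


SAMPLE_RATE = 10000  # Hz, hypothetical value for illustration
DELAY = 50           # samples, i.e. 5 ms, giving a 200 Hz repetition rate

rin = regular_interval_noise(20000, DELAY, iterations=8)
noise = regular_interval_noise(20000, DELAY, iterations=0)

print("repetition rate (Hz):", SAMPLE_RATE / DELAY)
print("RIN autocorrelation at the delay:", normalized_autocorrelation(rin, DELAY))
print("noise autocorrelation at the delay:", normalized_autocorrelation(noise, DELAY))
```

The autocorrelation at the delay is high for the delay-and-add signal and close to zero for plain noise, which is the temporal regularity that the time-locked responses are described as tracking. In this sketch the repetition rate is the sample rate divided by the delay.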
► We report direct recordings of electrical activity from human auditory cortex ► We distinguish activity related to stimulus regularity and to perceived pitch ► Both are demonstrated in primary cortex and adjacent “core” areas
Natural sounds contain multiple spectral components that vary over time. The degree of variation can be characterized in terms of correlation between successive time frames of the spectrum, or as a time window within which any two frames show a minimum degree of correlation: the greater the correlation of the spectrum between successive time frames, the longer the time window. Recent studies suggest differences in the encoding of shorter and longer time windows in left and right auditory cortex, respectively. The present functional magnetic resonance imaging study assessed brain activation in response to the systematic variation of the time window in complex spectra that are more similar to natural sounds than in previous studies. The data show bilateral activity in the planum temporale and anterior superior temporal gyrus as a function of increasing time windows, as well as activity in the superior temporal sulcus that was significantly lateralized to the right. The results suggest a coexistence of hierarchical and lateralization schemes for representing increasing time windows in auditory association cortex.
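The relationship between frame-to-frame correlation and time window can be illustrated with a small sketch. It assumes, purely for illustration, that each spectral frame is a first-order autoregressive update of the previous one, so that correlation decays as r to the power of the frame lag; the function names and parameter values are hypothetical and do not reproduce the study's stimuli.

```python
import math
import random


def correlated_frames(n_frames, n_components, r, seed=0):
    """Generate spectral frames whose successive correlation is about r."""
    rng = random.Random(seed)
    frame = [rng.gauss(0.0, 1.0) for _ in range(n_components)]
    frames = [frame]
    for _ in range(n_frames - 1):
        # Mix the previous frame with fresh noise, keeping unit variance.
        frame = [r * a + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for a in frame]
        frames.append(frame)
    return frames


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mean_successive_correlation(frames):
    """Average correlation between each frame and the next."""
    pairs = list(zip(frames, frames[1:]))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def window_in_frames(r, min_corr):
    """Largest lag k with r**k >= min_corr, under the assumed r**k decay."""
    return math.floor(math.log(min_corr) / math.log(r))


frames = correlated_frames(n_frames=200, n_components=100, r=0.8)
print("measured successive correlation:", mean_successive_correlation(frames))
print("window for r=0.9:", window_in_frames(0.9, 0.5), "frames")
print("window for r=0.6:", window_in_frames(0.6, 0.5), "frames")
```

Under this assumption a higher successive-frame correlation gives a longer window, matching the statement that the greater the correlation, the longer the time window.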
auditory cortex; time windows; spectrotemporal correlation; fMRI; sound; speech
The entropy metric derived from information theory provides a means to quantify the amount of information transmitted in acoustic streams like speech or music. By systematically varying the entropy of pitch sequences, we sought brain areas where neural activity and energetic demands increase as a function of entropy. Such a relationship is predicted to occur in an efficient encoding mechanism that uses less computational resource when less information is present in the signal: we specifically tested the hypothesis that such a relationship is present in the planum temporale (PT). In two convergent functional MRI studies, we demonstrated this relationship in PT for encoding, while furthermore showing that a distributed fronto-parietal network for retrieval of acoustic information is independent of entropy. The results establish PT as an efficient neural engine that demands less computational resource to encode redundant signals than those with high information content.
Understanding how the brain makes sense of our acoustic environment remains a major challenge. One way to describe the complexity of our acoustic environment is in terms of information entropy: acoustic signals with high entropy convey large amounts of information, whereas low entropy signifies redundancy. To investigate how the brain processes this information, we controlled the amount of entropy in the signal by using pitch sequences. Participants listened to pitch sequences with varying amounts of entropy while we measured their brain activity using functional magnetic resonance imaging (fMRI). We show that the planum temporale (PT), a region of auditory association cortex, is sensitive to the entropy in pitch sequences. In two convergent fMRI studies, activity in PT increases as the entropy in the pitch sequence increases. The results establish PT as an important “computational hub” that requires less resource to encode redundant signals than it does to encode signals with high information content.
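The contrast between redundant and information-rich sequences can be made concrete with a short sketch. It computes the Shannon entropy of the distribution of pitch values in a sequence; this simple estimator and the example sequences are assumptions for illustration, and the studies may have quantified entropy differently.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy, in bits per item, of the values in `sequence`."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical pitch sequences, written as note names.
redundant = ["A4"] * 8                       # one pitch repeated
varied = ["A4", "B4", "C5", "D5"] * 2        # four pitches, equally frequent

print("redundant sequence:", shannon_entropy(redundant), "bits")
print("varied sequence:", shannon_entropy(varied), "bits")
```

A sequence that repeats a single pitch has zero entropy, while one that uses four pitches equally often carries two bits per note. This estimator ignores the order of the notes, so it captures only one aspect of sequence predictability.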
A part of the auditory cortex (planum temporale) encodes the information content of pitch sequences.