The auditory system is constantly faced with the challenge of decomposing the complex mixture of sound arriving at the eardrums into an accurate representation of the acoustic environment. This decomposition, termed auditory scene analysis (ASA; Bregman, 1994), is critical for survival and communication, and its failure is a common symptom reported by elderly individuals and those with sensorineural hearing loss. Despite its importance in daily life, the neural mechanisms of auditory scene analysis remain unclear (Carlyon, 2004; Micheyl et al., 2007; Snyder and Alain, 2007b; Elhilali and Shamma, 2008; Nelken and Bar-Yosef, 2008; Bidet-Caulet and Bertrand, 2009; Winkler et al., 2009; Shamma and Micheyl, 2010; Shamma et al., 2010). One aspect of ASA – auditory streaming (the segregation of time-varying acoustic energy into distinct perceptual objects) – can be studied in a controlled setting using sequences of pure-tone triplets of the form ABA-ABA- (Miller and Heise, 1950; van Noorden, 1975; Bregman, 1994), where A and B denote tones of different frequencies and the dash denotes a silent gap (Figure A). Many psychophysical studies dating back to the 1950s have shown that when the frequency separation (ΔF) between the A and B tones is small, listeners hear the sequence as a single stream comprising both A and B tones, and that when ΔF is large, they hear the sequence as two isochronous streams, one of A tones and one of B tones (Miller and Heise, 1950; van Noorden, 1975; see http://web.mit.edu/~adykstra/Public/streaming_demo.wav for a demo). Interestingly, percepts evoked by sequences with intermediate ΔF are bistable (i.e., can be heard as either one stream or two) and can switch between the two stable states, either spontaneously or with effort (van Noorden, 1975; Anstis and Saida, 1985; Carlyon et al., 2001).
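As an illustration of the ABA- triplet structure described above, the following minimal sketch lays out tone onsets and frequencies for such a sequence. It is not the stimulus code used in any of the cited studies: the function names, the expression of ΔF in semitones, the 1000 Hz A-tone frequency, and the 125 ms onset-to-onset interval are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Event = Tuple[float, str, float]  # (onset in seconds, tone label, frequency in Hz)


def aba_sequence(f_a_hz: float, delta_f_semitones: float,
                 n_triplets: int, soa_s: float) -> List[Event]:
    """Lay out an ABA- sequence as a list of tone events.

    Assumption: delta F is expressed in semitones, so the B tone lies
    delta_f_semitones above the A tone on a logarithmic frequency axis.
    """
    f_b_hz = f_a_hz * 2.0 ** (delta_f_semitones / 12.0)
    events: List[Event] = []
    for k in range(n_triplets):
        # Each triplet occupies four equal time slots: A, B, A, silent gap.
        t0 = 4 * k * soa_s
        events.append((t0, "A", f_a_hz))
        events.append((t0 + soa_s, "B", f_b_hz))
        events.append((t0 + 2 * soa_s, "A", f_a_hz))
    return events


def onset_intervals(events: List[Event], label: str) -> List[float]:
    """Return the intervals between successive onsets of one tone type."""
    onsets = [t for t, lab, _ in events if lab == label]
    return [round(b - a, 9) for a, b in zip(onsets, onsets[1:])]


if __name__ == "__main__":
    seq = aba_sequence(f_a_hz=1000.0, delta_f_semitones=12.0,
                       n_triplets=3, soa_s=0.125)
    for onset, label, freq in seq:
        print(f"{onset:6.3f} s  {label}  {freq:7.1f} Hz")
    print("A-tone intervals:", onset_intervals(seq, "A"))
    print("B-tone intervals:", onset_intervals(seq, "B"))
```

Taken separately, the A tones and the B tones each recur at a constant interval, which is why the two-stream percept consists of two isochronous streams, with the B stream running at half the rate of the A stream.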
Recent interest in the neural underpinnings of auditory streaming has produced several studies using ABA tone sequences while recording from the auditory cortex in a variety of species including insects (Schul and Sheridan, 2006), fish (Fay, 1998, 2000), bats (Kanwal et al., 2003), songbirds (Bee and Klump, 2004, 2005; Itatani and Klump, 2009, 2010; Bee et al., 2010), ferrets (Elhilali et al., 2009), non-human primates (Fishman et al., 2001, 2004; Micheyl et al., 2005), and humans (Sussman et al., 1999; Deike et al., 2004, 2010; Cusack, 2005; Gutschalk et al., 2005, 2007; Snyder et al., 2006; Snyder and Alain, 2007a; Wilson et al., 2007; Kondo and Kashino, 2009; Schadwinkel and Gutschalk, 2010a,b). A prevailing model from these studies posits that a two-stream percept will be evoked whenever the A and B tones excite non-overlapping populations of neurons (but see Elhilali et al., 2009). However, inherent limitations in previous work related to spatiotemporal resolution, sparsity of coverage, and lack of direct behavioral measures in experimental animals preclude straightforward interpretation. A general extension of this model is schematized in Figure B. Specifically, a parametric variation of a given stimulus or stimulus feature could produce neural activity patterns that vary linearly or categorically, as shown by the blue and red curves, respectively. Noise in the response of a population showing a linear relationship with the stimulus, when fed to a population showing a more categorical relationship, could engender sufficient trial-to-trial variability for bistable perception. While such activity patterns have been widely reported in vision (for reviews see Logothetis, 1998; Leopold and Logothetis, 1999; Sterzer et al., 2009), only limited evidence for such a mechanism exists in the auditory system (Cusack, 2005; Gutschalk et al., 2005, 2008; Kondo and Kashino, 2009).
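The idea that noise in a linearly responding population, passed to a more categorical population, could yield bistable reports can be made concrete with a toy simulation. The sketch below is only an illustration under strong assumptions and is not the model schematized in the figure: the "linear" stage is taken to be ΔF plus Gaussian noise, the "categorical" stage is a hard threshold, and the noise level, threshold, trial count, and all function names are hypothetical.

```python
import random


def linear_stage(delta_f: float, noise_sd: float, rng: random.Random) -> float:
    """Hypothetical population response: proportional to delta F, plus noise."""
    return delta_f + rng.gauss(0.0, noise_sd)


def categorical_stage(drive: float, threshold: float) -> int:
    """Hypothetical read-out: report 2 streams above threshold, else 1."""
    return 2 if drive > threshold else 1


def proportion_two_streams(delta_f: float, n_trials: int = 2000,
                           noise_sd: float = 1.0, threshold: float = 5.0,
                           seed: int = 0) -> float:
    """Fraction of simulated trials ending in a two-stream report."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_two = 0
    for _ in range(n_trials):
        drive = linear_stage(delta_f, noise_sd, rng)
        if categorical_stage(drive, threshold) == 2:
            n_two += 1
    return n_two / n_trials


if __name__ == "__main__":
    # delta F values are in arbitrary units here (an assumption of the sketch).
    for delta_f in (0.0, 3.0, 5.0, 7.0, 10.0):
        p = proportion_two_streams(delta_f)
        print(f"delta F = {delta_f:4.1f}: P(two streams) = {p:.2f}")
```

In this toy setting, physically identical trials near the threshold yield different reports from trial to trial, whereas trials far from it yield nearly uniform reports, mirroring the stable percepts at small and large ΔF and the bistable percept at intermediate ΔF.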
Here, we report the results from experiments in which direct cortical recordings were made from widespread brain areas of neurosurgical patients with epilepsy (Engel et al., 2005) while they participated in a classical auditory streaming paradigm. Our aims were to better characterize the neurophysiological correlates of auditory streaming, extend them into brain areas outside the auditory cortex and frequency regions less observable with non-invasive measures (Crone et al., 2001), and test the idea of neuronal variability as a mechanism for perceptual bistability in the auditory modality (Almonte et al., 2005; Moreno-Bote et al., 2007; Deco and Romo, 2008; Deco et al., 2008; Gigante et al., 2009; Shpiro et al., 2009) by comparing evoked responses to physically identical stimuli when they were perceived as one vs. two streams. Our participants listened to ABA tone sequences and indicated at the end of each sequence whether they were hearing one or two streams. For each electrode sampled in a given patient, we compared responses across ΔF conditions and across perceptual reports in an attempt to identify neural correlates of both. We hypothesized that when a participant perceived one stream (two streams), the evoked response would resemble that in conditions that consistently engender a one-stream (two-stream) percept. Responses from widespread brain areas showed robust correlates with ΔF but, surprisingly, rarely differed based on percept per se.