Neural activity in the cerebral cortex can explain many aspects of sensory perception. Extensive psychophysical and neurophysiological studies of visual motion and vibrotactile processing show that the firing rate of cortical neurons averaged across 50–500 ms is well correlated with discrimination ability. In this study, we tested the hypothesis that primary auditory cortex (A1) neurons use temporal precision on the order of 1–10 ms to represent speech sounds shifted into the rat hearing range. Neural discrimination was highly correlated with behavioral performance on 11 consonant-discrimination tasks when spike timing was preserved and was not correlated when spike timing was eliminated. This result suggests that spike timing contributes to the auditory cortex representation of consonant sounds.
The debate about the importance of spike timing began with the first recordings of neural activity and remains unresolved1,2. Although coding strategies based on precise timing have the potential to transmit more information than strategies based on firing rate averaged over long intervals, psychophysical studies of tactile modulation rate and visual movement indicate that rate-based descriptions of sensory events provide the best predictions of behavioral discrimination ability2–6. The use of precise temporal information by somatosensory cortex has been rejected because neurometric analysis predicts much better discrimination ability than is observed behaviorally5.
The auditory system is sensitive to precise temporal information and is a logical place to study perceptual correlates of neural representations based on precise spike timing1,7–9. However, few behavioral studies have examined the relationship between neural activity and auditory discrimination8,10–12. Psychophysical studies have demonstrated that newborn and adult humans, as well as rats and chinchillas, can reliably distinguish consonants based on acoustic information found within 40 ms of sound onset13–19. Similarly, the onset response of neurons in the central auditory system recorded in awake and anesthetized subjects reliably encodes the rapid acoustic transitions that provide information about consonant identity20–24. A1 lesions impair judgments of complex sounds, including speech25–29. Here we report that the precise spatiotemporal activity pattern evoked by the onset of consonant sounds is well correlated with the ability of rats to discriminate these sounds.
We recorded neural responses to 20 English consonants (Fig. 1 and Supplementary Fig. 1 online) from single neurons and multiunit clusters of A1 neurons in awake and barbiturate-anesthetized rats. To illustrate the response to each sound, we constructed neurograms from the average onset response of 445 multiunit A1 recording sites ordered by characteristic frequency (Fig. 2 and Supplementary Fig. 2 online). As expected, each consonant evoked a distinct spatiotemporal activity pattern in A1 (Supplementary Video 1 online).
Consonants differing only in their place of articulation resulted in different spatial activity patterns14,21,22. For example, the /s/ sound activated high frequency neurons, whereas /sh/ activated mid-frequency neurons (Fig. 2, third column and Supplementary Data online). Manner of articulation (for example, stop, fricative or glide) substantially altered the temporal profile of the population response (Fig. 2, top row). As in earlier studies, stop consonants generated the sharpest onset peaks20,30,31. Nasals, glides and liquids resulted in the weakest onset responses; fricatives and affricates resulted in intermediate onset responses (Supplementary Data). Whereas the voiced stop consonants (/b/, /d/, /g/) evoked a single burst of activity, unvoiced stop consonants (/p/, /t/, /k/) resulted in a second peak of activity at voicing onset, consistent with previous reports in cats, monkeys and humans (Supplementary Fig. 3, Supplementary Video 2 and Supplementary Data online)20,24,30,31.
Although it is reasonable to expect that sounds that evoke similar cortical responses will be more difficult to discriminate than sounds that evoke distinct responses, this is the first study to test whether this relationship requires precise temporal information (that is, 1-ms bins) or whether the rate-based strategies observed in visual and somatosensory cortex (that is, 50- to 500-ms bins) predict behavioral performance.
We quantified the difference between each pair of neurograms using euclidean distance (Figs. 2 and 3). When 1-ms windows were used, the spatiotemporal patterns evoked by the consonants /d/ and /b/ were much more distinct than the patterns evoked by /m/ and /n/ (Fig. 3, part 1), leading to the prediction that /d/ versus /b/ would be one of the easiest consonant pairs to discriminate and /m/ versus /n/ would be one of the hardest. Alternatively, if information about precise timing is not used, /d/ versus /b/ was predicted to be a very difficult discrimination (Fig. 3, part 2). To test these contrasting predictions, we evaluated the ability of rats to distinguish between these and nine other consonant pairs using an operant go/no-go procedure wherein rats were rewarded for a lever press after the presentation of a target consonant. The tasks were chosen so that each consonant pair differed by one articulatory feature (place, voicing or manner; Fig. 1). Rats were able to reliably discriminate 9 of the 11 pairs tested (Fig. 4 and Supplementary Fig. 4 online). These results extend earlier observations that monkeys, cats, birds and rodents can discriminate consonant sounds17,18,31–36. The wide range of difficulty across the 11 tasks is advantageous for identifying neural correlates.
Consistent with our hypothesis that A1 representations make use of precise spike timing, /d/ versus /b/ was one of the easiest tasks (Fig. 4), and differences in the A1 onset response patterns were highly correlated with performance on the 11 tasks when 1-ms bins were used (R2 = 0.75, P = 0.0006, Fig. 5a). A1 responses were not correlated with behavior when spike timing information was removed (R2 = 0.046, P = 0.5; Fig. 5b; Supplementary Fig. 5a and Supplementary Data online).
Although it is interesting that the average neural response to each consonant was related to behavior, in practice, individual speech sounds must be identified during single trials, not from the average of many trials. Analysis using a nearest-neighbor classifier makes it possible to document neural discrimination on the basis of single-trial data and allows neural and behavioral discrimination to be correlated directly in units of percentage correct. This classifier (which compares the poststimulus time histogram (PSTH) evoked by each stimulus presentation with the average PSTH evoked by each consonant and selects the most similar; see Methods) is effective in identifying tactile patterns and animal vocalizations using cortical activity8,37.
Behavioral performance was well predicted by classifier performance when activity was binned with 1-ms precision. For example, a single sweep of activity from one multiunit cluster was able to discriminate /d/ from /b/ 79.5 ± 0.8% (mean ± s.e.m.) of the time and /m/ from /n/ 60.1 ± 0.7% of the time; 50% is chance performance. Consistent with previous psychophysical evidence that the first 40 ms contain sufficient information to discriminate consonant sounds13–16, the correlation between the behavioral and neural discrimination was highest when the classifier was provided A1 activity patterns during the first 40 ms of the cortical response (R2 = 0.66, P = 0.002; Figs. 5c and 6a, part 1, Supplementary Fig. 6 and Supplementary Data online). This correlation was equally strong in awake rats (R2 = 0.63, P = 0.004; Supplementary Fig. 7 online). Neural discrimination correlated well with behavior provided that onset responses were used (5–100 ms) and temporal information was preserved (1–10 ms, Supplementary Fig. 5b).
Because of a ceiling effect caused by greatly improved neural discrimination, the correlation between the behavioral and neural discrimination was not significant (R2 = 0.02, P = 0.6) when the classifier was given all 700 ms of activity (Fig. 6b, part 3). Neural discrimination was greatly reduced when temporal information was eliminated (that is, mean firing rate over 700 ms) and no relationship with behavior was observed (R2 = 0.06, P = 0.5). For example, on the easiest task (/d/ versus /s/), rats were correct on 92.5 ± 0.8% of trials, whereas the classifier was correct only 55.4 ± 0.6% of the time when spike timing was removed (Fig. 6b, part 4). The correlation between classifier and behavior was also not significant when the mean onset response rate was used (40-ms bin, R2 = 0.14, P = 0.2; Figs. 5d and 6a, part 2). These results show that the distinctness of the precise temporal activity patterns evoked by consonant onsets is highly correlated with discrimination ability in rats.
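The contrast between fine and coarse binning can be illustrated by coarsening a 1-ms PSTH into wider bins. This is a minimal sketch with made-up responses, not the recorded data; the function name and spike counts are illustrative assumptions.

```python
import numpy as np

def rebin(psth_1ms, bin_ms):
    """Coarsen a 1-ms PSTH into bin_ms-wide bins by summing spike counts.
    Setting bin_ms to the full response length collapses the response to a
    single mean-rate value, discarding all spike-timing information."""
    n = (len(psth_1ms) // bin_ms) * bin_ms  # drop any ragged tail
    return psth_1ms[:n].reshape(-1, bin_ms).sum(axis=1)

# Two hypothetical 700-ms responses with identical spike counts but different
# timing: identical at 700-ms resolution, clearly distinct at 1-ms resolution.
a = np.zeros(700); a[10] = 5
b = np.zeros(700); b[30] = 5
same_rate = np.array_equal(rebin(a, 700), rebin(b, 700))        # True
distinct_timing = not np.array_equal(rebin(a, 1), rebin(b, 1))  # True
```

Binning at 700 ms makes the two responses indistinguishable, mirroring the loss of neural discrimination when spike timing is removed.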
To determine the neural population size that best correlates with behavior, we compared behavioral discrimination with neural discrimination using individual single units, 16 single units, individual multiunits and sets of 16 multiunits. Stringent spike-sorting criteria were used to increase our confidence that we were recording from individual neurons. We collected a total of 16 well isolated single units from 16 different recording sites distributed across A1. Consonant discrimination was evaluated for each of the 16 single units individually and for the set of all 16 single units. When the classifier was provided with activity from all 16 sites, each pattern was a matrix of 16 columns and a number of rows determined by the bin size used. We used the same technique to evaluate classifier performance using multiunit activity from sets of 16 recording sites randomly selected from the full set of 445 recording sites. Each population size was evaluated with or without precise temporal information using the onset response (that is, 1-ms or 40-ms bins) or the entire response (that is, 1-ms or 700-ms bins).
Neural discrimination using single units did not correlate with behavior regardless of the coding strategy used in the analysis (Fig. 7a). The poor correlation may be related to the poor neural discrimination of single units (Fig. 7b), which was probably due to the small number of action potentials in single-unit responses compared to multiunit responses. Although discrimination using all 16 single units was better than individual single units, neural discrimination on the 11 tasks was still not significantly correlated with behavior (Fig. 7), perhaps because of the anatomical distance between the 16 recording sites.
Neural discrimination using 16 randomly selected multiunit sites correlated with behavior but did so only when temporally precise onset responses were used (Fig. 7a). Although the dependence on temporally precise onset responses was similar to results based on single multiunit sites, the average neural performance using 16 multiunit sites significantly exceeded actual behavioral performance (Fig. 7b). This excessive accuracy resulted in a ceiling effect, which probably explains why the correlation with behavior was lower when large populations were used. After exploring a large set of neural readouts using various time windows and population sizes, we found that discrimination using onset activity patterns from individual multiunit sites correlated best with behavioral discrimination.
Our observation that multiunit responses were highly correlated with behavioral performance is consistent with earlier reports that multiunit responses are superior to single-unit responses for identifying complex stimuli. For example, V1 single units provide an unreliable estimate of the local contrast in natural images, whereas multiunit responses encode this information efficiently38. Similarly, multiunit clusters in the bird homolog of A1 are better than single units at discriminating song from simpler sounds, including tones, ripples and noise39.
Although theoretical studies have suggested that precise spike timing can provide a rapid and accurate code for stimulus recognition and categorization40,41, studies in visual and somatosensory cortex have indicated that firing rates averaged across 50–500 ms are best correlated with behavior2–6. Our results suggest that the representation of consonant sounds in A1 is based on time windows that are approximately 50 times more precise.
The greater temporal precision observed in this study could be specific to the auditory system1,7–9. However, it is also possible that spike timing is important in all modalities when transient stimuli are involved38,39,42. The latter hypothesis is supported by observations of a rate-based code for steady-state vowels23,43–45 and by computational studies showing that cortical neurons can efficiently extract temporal patterns from populations of neurons in a manner that promotes accurate consonant categorization46. It will be important to test whether neural correlates of transient visual and tactile stimuli make use of spike timing.
Error-trial analysis in an awake behaving preparation, as well as lesion and microstimulation experiments, are needed to evaluate our hypothesis that consonant processing depends upon precise spike timing in A1. Recordings in higher cortical areas will be needed to establish whether temporal patterns or mean firing rates are better correlated with behavioral discrimination.
We recorded 20 English words ending in /ad/ (as in ‘sad’) in a double-walled, soundproof booth. The initial consonants differed in voicing (voiced /d/ versus voiceless /t/), place of articulation (lips /b/ versus back of mouth /g/) or manner of articulation (fricative /sh/ versus nasal /n/) (Fig. 1). The fundamental frequency and spectrum envelope of each word was shifted up in frequency by a factor of two using the STRAIGHT vocoder47 in order to better match the rat hearing range. The intensity of each sound was adjusted so that the intensity during the most intense 100 ms was 60 dB SPL.
Eleven rats were trained using an operant go/no-go procedure to discriminate words differing in their initial consonant sound. Each rat trained for two 1-h sessions each day (5 d/week). Rats first underwent a shaping period during which they were taught to press the lever. Each time the rat was in close proximity to the lever, the rat heard the target sound and received a pellet (45-mg sugar pellet). Eventually, the rat began to press the lever without assistance. After each lever press, the rat heard the target sound and received a pellet. The shaping period lasted until the rat was able to obtain at least 100 pellets per session for two consecutive sessions. This stage lasted on average 3.5 d. After the shaping period, rats began a detection task in which they learned to press the lever each time the target sound was presented. Silent periods were randomly interleaved with the target sounds during each training session. Sounds were initially presented every 10 s, and the rat was given an 8-s window to press the lever. The sound interval was gradually decreased to 6 s, and the lever-press window was decreased to 3 s. Once rats reached the performance criteria of a d′ ≥ 1.5 for ten sessions, they advanced to a consonant discrimination task. The quantity d′ is a measure of discriminability of two sets of samples based on signal detection theory.
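The d′ criterion above can be computed from hit and false-alarm rates as the difference of their standard-normal z-scores. A short sketch, with made-up rates rather than measured performance:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false-alarm rate), from signal detection theory."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# A hypothetical rat pressing on 85% of target trials and 15% of catch trials:
d_prime(0.85, 0.15)  # ~2.07, above the d' >= 1.5 advancement criterion
```

In practice, hit and false-alarm rates of exactly 0 or 1 must be adjusted slightly before the z-transform, since the inverse CDF is unbounded at those values.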
During each consonant discrimination task, rats learned to discriminate the target sound from the distractor sounds. Trials began every 6 s, and silent catch trials were randomly interleaved 20–33% of the time. Rats were only rewarded for lever presses to the target (conditioned) stimulus. Pressing the lever on a stimulus other than the target resulted in a time-out during which the house light was extinguished and the training program paused for a period of approximately 6 s. Training took place in a soundproof, double-walled training booth that included a house light, a video camera for monitoring, a speaker (Optimus Bullet Horn Tweeter) and a cage (8 inches length × 8 inches width × 8 inches height) that included a lever, lever light and pellet receptacle. A pellet dispenser was mounted outside the double-walled, foam-lined booth to reduce noise. Rats were food deprived to motivate behavior, but were fed on days off to maintain between 80% and 90% ad lib body weight. Rats were housed individually and maintained on a reverse 12-h light-dark cycle.
Each consonant discrimination task lasted for 20 training sessions over 2 weeks. Six rats performed each of four different consonant-discrimination tasks (/d/ versus /s/, /d/ versus /t/, /r/ versus /l/, and /d/ versus /b/ and /g/), and five rats performed each of three different consonant-discrimination tasks (/m/ versus /n/; /sh/ versus /f/, /s/ and /h/; and /sh/ versus /ch/ and /j/). Each group of rats trained on each of the tasks for 2 weeks, in the order given. We subsequently tested them for 2 d on each task to ensure that discrimination ability was not strongly influenced by task order (see Supplementary Data). Data shown in Figure 4 were collected on the seventh and eighth days (that is, four sessions) of training on each task. Over these 2 d, each rat performed 940 ± 173 trials (mean ± s.d.). An example learning curve is shown in Supplementary Figure 4.
We recorded multiunit (n = 445) and single-unit (n = 16) responses from right primary auditory cortex (A1) of anesthetized, experimentally naive, female Sprague-Dawley rats in a soundproof recording booth (n = 11 rats). Rats were anesthetized with pentobarbital (50 mg kg−1) and received supplemental dilute pentobarbital (8 mg ml−1) every 0.5–1 h as needed to maintain areflexia, along with a 1:1 mixture of dextrose (5%) and standard Ringer’s lactate48 to prevent dehydration. Heart rate and body temperature were monitored throughout the experiment. Four Parylene-coated tungsten microelectrodes (1–2 MΩ, FHC) were simultaneously lowered to 600 μm below the surface of the right primary auditory cortex (layer 4/5). Electrode penetrations were marked using blood vessels as landmarks.
We recorded multiunit A1 responses (n = 40) in six awake rats using chronically implanted microwire arrays, which have been described in detail in previous publications49,50. Briefly, 14-channel microwire electrodes were implanted in the right primary auditory cortex using a custom-built mechanical insertion device to rapidly insert electrodes in layers 4/5 (depth, 550 μm)50. Restraint jackets were used to minimize movement artifacts during recording sessions, conducted 1–7 d after implantation.
Twenty 60-dB speech stimuli were randomly interleaved and presented every 2,000 ms for 20 repeats per site at each recording site. Brief (25-ms) tones were presented at 81 frequencies (1–32 kHz) at 16 intensities (0–75 dB) to determine the characteristic frequency of each site. All tones were separated by 560 ms and randomly interleaved. Sounds were presented approximately 10 cm from the left ear of the rat. Stimulus generation, data acquisition and spike sorting were performed with Tucker-Davis hardware (RP2.1 and RX5) and software (Brain-ware). Single units refer to well isolated waveforms likely to have been evoked by a single neuron. Multiunits include action potentials from more than one nearby neuron. The University of Texas at Dallas Institutional Animal Care and Use Committee approved all protocols and recording procedures.
Neurogram similarity was computed using euclidean distance. The euclidean distance between any two neurograms (X, Y) is the square root of the sum of the squared differences between the firing rates at each bin (j) for each recording site (i). For the analysis in Figure 3, part 1 and Figure 5a, we used activity from 40 1-ms bins from all 445 sites to compute the similarity between neurogram pairs. For the analysis in Figure 3, part 2 and Figure 5b, we used activity from a single 40-ms bin from each of the 445 sites to compute the similarity between neurogram pairs:

ED(X, Y) = sqrt( Σ_{i=1}^{n_sites} Σ_{j=1}^{n_bins} (X_ij − Y_ij)² )

where n_sites and n_bins are the total numbers of sites and bins, respectively.
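The distance computation can be sketched in a few lines. The array shapes and Poisson rates below are illustrative assumptions standing in for the recorded firing rates:

```python
import numpy as np

def neurogram_distance(X, Y):
    """Euclidean distance between neurograms X and Y of shape (n_sites, n_bins):
    the square root of the summed squared firing-rate differences over all
    sites i and bins j."""
    return np.sqrt(np.sum((X - Y) ** 2))

rng = np.random.default_rng(0)
# Hypothetical neurograms: 445 sites x 40 one-ms bins (layout as in Fig. 3, part 1).
X = rng.poisson(0.1, size=(445, 40)).astype(float)
Y = rng.poisson(0.1, size=(445, 40)).astype(float)
d_fine = neurogram_distance(X, Y)
# Summing each site's 40 bins into a single 40-ms bin (as in Fig. 3, part 2)
# removes spike timing before the distance is computed.
d_coarse = neurogram_distance(X.sum(axis=1, keepdims=True),
                              Y.sum(axis=1, keepdims=True))
```

The same function handles both analyses; only the binning of the input arrays changes.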
We used a nearest-neighbor classifier to quantify neural discrimination performance based on single-trial activity patterns8,37. The classifier binned activity using 1-ms to 700-ms intervals and then compared the response of each single trial with the average activity pattern (PSTH) evoked by each of the speech stimuli presented. The trial being classified was not included in the average activity pattern, to prevent artifact. This model assumes that the brain region reading out the information in the spike trains has previously heard each of the sounds 19 times and attempts to identify which of the possible choices was most likely to have generated the trial under consideration. It uses euclidean distance to determine how similar each response was to the average activity evoked by each of the sounds. The classifier guesses that the single-trial pattern was generated by the sound whose average pattern it most closely resembles (that is, minimum euclidean distance). The onset response to each sound is defined as the 40-ms interval beginning when neural activity exceeded the spontaneous firing rate by three s.d. Error estimates are s.e.m. Pearson's correlation coefficient was used to examine the relationship between neural and behavioral discrimination on the 11 tasks (n = 11).
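A minimal sketch of such a leave-one-out nearest-neighbor classifier, assuming each stimulus's single-trial responses are stored as a (trials × bins) array (a data layout chosen here for illustration, not the authors' implementation):

```python
import numpy as np

def classify_trial(trials_by_stim, stim, trial):
    """Guess which stimulus produced trials_by_stim[stim][trial].
    trials_by_stim[s] has shape (n_trials, n_bins): one binned response per trial.
    The held-out trial is excluded from its own stimulus's average PSTH."""
    response = trials_by_stim[stim][trial]
    distances = []
    for s, trials in enumerate(trials_by_stim):
        if s == stim:  # leave-one-out: drop the trial being classified
            template = np.delete(trials, trial, axis=0).mean(axis=0)
        else:
            template = trials.mean(axis=0)
        distances.append(np.linalg.norm(response - template))  # euclidean distance
    return int(np.argmin(distances))  # index of the most similar average pattern
```

Percent correct is then the fraction of held-out trials assigned to the stimulus that actually produced them; 50% is chance for a two-sound comparison.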
The authors would like to thank J. Roland, R. Jain and D. Listhrop for assistance with microelectrode mappings. We would like to thank R. Rennaker for technical assistance and training and for providing microelectrode arrays and inserter. We would also like to thank M. Perry, C. Heydrick, A. McMenamy, A. Meepe, C. Dablain, J. Choi, V. Badhiwala, J. Riley, N. Hatate, P. Kan, M. Lazo de la Vega and A. Hudson for help with behavioral training. We would also like to thank S. Blumstein, Y. Cohen, H. Read, S. Denham, L. Miller, S. Edelman, V. Dragoi, H. Abdi, P. Assmann, X. Wang and R. Romo for their suggestions about earlier versions of the manuscript. This work was supported by grants from the US National Institute for Deafness and Other Communicative Disorders and the James S. McDonnell Foundation.
Note: Supplementary information is available on the Nature Neuroscience website.
AUTHOR CONTRIBUTIONS: C.T.E., C.A.P., R.S.C. and A.C.R. collected behavioral training data. C.T.E., C.A.P., Y.H.C., R.S.C., V.J. and K.Q.C. recorded anesthetized cortical responses. J.A.S. recorded awake cortical responses. M.P.K. and C.T.E. wrote the manuscript and performed data analysis. All authors discussed the paper and commented on the manuscript.