Language is uniquely human, but its acquisition may involve cognitive capacities shared with other species [1-5]. During development, language experience alters speech sound (phoneme) categorization [6-8]. Newborn infants distinguish the phonemes in all languages, but by 10 months show adult-like greater sensitivity to native language phonemic contrasts than non-native contrasts [8, 9]. Distributional theories account for phonetic learning by positing that infants infer category boundaries from modal distributions of speech sounds along acoustic continua [10, 11]. For example, tokens of the sounds /b/ and /p/ cluster around different mean voice onset times. To disambiguate overlapping distributions, contextual theories propose that phonetic category learning is informed by higher-level patterns (e.g. words) in which phonemes normally occur [12-15]. For example, the vowel sounds /I/ and /e/ can occupy similar perceptual spaces, but can be distinguished in the context of “with” and “well”. Both distributional and contextual cues appear to function in speech acquisition [10-12, 16-21]. Non-human species also benefit from distributional cues for category learning [22-24], but whether category learning benefits from contextual information in non-human animals is unknown. The use of higher-level patterns to guide lower-level category learning may reflect uniquely human capacities tied to language acquisition, or more general learning abilities reflecting shared neurobiological mechanisms. Using songbirds, European starlings, we show that higher-level pattern learning covertly enhances categorization of the natural communication sounds. This observation mirrors the support for contextual theories of phonemic category learning in humans, and demonstrates a general form of learning not unique to humans or language.
The complex vocalizations (songs) of starlings follow a hierarchical acoustic structure [25-28], with short (200–800 ms long) stereotyped patterns of simple notes grouped into “motifs” (e.g., Figures 1B and S1), and longer (~1 min long) well-defined sequences of motifs organized into bouts. Starling song motifs can be classified by their acoustic characteristics into four species-typical, open-ended, perceptual categories: whistles, warbles, rattles, and high-frequencies [25, 26, 29-31]. The sequential patterning of motifs in bouts underlies successful individual recognition and mate selection. In controlled operant settings, starlings can accurately classify and generalize arbitrary motif patterns of the forms AABB and ABAB, where ‘A’ and ‘B’ represent sets of “warble” and “rattle” motifs. As in humans, the ability of starlings to generalize learned patterns is constrained by the integrity of the categorical boundaries for the pattern elements (e.g., warbles and rattles) [34, 35]. Thus, the patterning rule is defined at the level of the category, and pattern generalization requires the acoustic structure of the category to be well defined.
Here, we ask whether the acoustic structure of underlying categories, in addition to aiding pattern generalization, may also be shaped directly by pattern learning. This is the correlate to the question of whether, in humans, lexical context influences phonetic category learning. To do this, we trained one group of starlings (“pattern-relevant”, N=4) using operant techniques to differentiate complex auditory patterns of the form AABB and BBAA from those of the form ABAB and BABA, where A and B denote natural motif categories of warbles and rattles. In addition, we trained a second group of starlings (“pattern-irrelevant”, N=4) to classify the same AABB, BBAA, ABAB and BABA motif sequences, but shuffled so that the patterning rules were non-informative for correct classification (see Table 1). We then compared how rapidly the pattern-relevant and -irrelevant groups learned to classify the individual A and B motifs that they had already experienced. We hypothesized that the pattern-relevant experience would improve perceptual expertise for lower-level acoustic categorization. If true, then the pattern-relevant birds should show advantages in motif categorization over naive birds, and over the pattern-irrelevant birds for whom the patterned motif sequences were familiar but not behaviorally relevant.
All of the pattern-relevant subjects learned to classify AABB and BBAA from ABAB and BABA patterns. The mean percentage of correct responses began improving rapidly after about 5-6 thousand trials (Fig. 1C), and by 10,000 trials was well above chance (single sample t-test; t = 11.09; p = 0.008, chance = 50%). To measure pattern generalization, we then tested subjects on 500 novel 4-motif sequences, built with the same motifs and following the same patterns used during training. Mean classification accuracy during this generalization test was significantly above chance (single sample t-test; t = 3.9; p = 0.0298, chance = 50%; Fig. 1C). This pattern generalization effect was observed at the individual level for 3 out of 4 subjects (Bird 681: p < 0.0001, Bird 716: p < 0.002, Bird 827: p < 0.0001, Bird 828: p = 0.227; binomial tests where chance is 0.5). This corroborates previous results indicating that starlings recognize auditory patterns of motif categories based on their underlying temporal structures [33, 34].
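The individual-subject binomial tests above can be illustrated with a minimal sketch. It assumes a one-sided exact test (probability of at least the observed number of correct responses under chance = 0.5); the sidedness of the test and the example counts below are assumptions for illustration, not values reported in the text.

```python
from math import comb


def binomial_upper_tail(correct, total, chance=0.5):
    """Exact P(X >= correct) for X ~ Binomial(total, chance).

    Assumption: a one-sided test against chance; the source does not
    state whether its binomial tests were one- or two-sided.
    """
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical counts for one 100-trial block (not data from the study).
p_value = binomial_upper_tail(correct=60, total=100)
print(f"one-sided p = {p_value:.4f}")
```

Summing exact binomial terms avoids a normal approximation, which matters little at 500 trials but is safer for single 100-trial blocks.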
For subjects in the pattern-irrelevant training group, who served as controls for sequence and motif exposure, performance never exceeded chance thresholds (single sample t-test; t = −0.696; p = 0.536721, chance = 50%, Fig. 1C). To ensure that the pattern-irrelevant birds got at least as much exposure to the motifs and sequences as birds in the pattern-relevant group, we randomly paired birds between the two groups, and then exposed each pattern-irrelevant bird to at least as many training trials (159.25 ± 21.47 100-trial blocks) as its paired pattern-relevant counterpart had received (119.25 ± 27.59 100-trial blocks; matched pairs t-test t = 3.22; p = 0.0487; Fig. 1C). The pattern-irrelevant subjects were also given 500 dummy pattern generalization trials, where they encountered the same generalization test stimuli as pattern-trained birds. As with their training stimuli, however, there was no fixed relationship between pattern and reward (see Table 1) and performance did not differ significantly from chance (single sample t-test; t = −0.233; p = 0.831, chance = 50%; Fig. 1C).
Following the pattern-relevant and -irrelevant training, we assessed categorization of the individual warble and rattle motifs the animals had heard in the 4-motif patterns. We also trained a group of experimentally naïve birds on the same motif categorization task as an additional control. Birds in the pattern-relevant group showed a clear advantage in motif categorization compared to both the pattern-irrelevant and naïve birds. Figure 2 shows the mean performance for the three groups across the first 600 trials, highlighting initial categorization. Over this interval, the mean performance of the pattern-relevant birds was significantly better than that of both other groups (LMM, F(2,9) = 9.96; p = 0.0052, main effect of group; Tukey's HSD post-hocs: pattern-relevant versus -irrelevant p = 0.0295, pattern-relevant versus naïve p = 0.0049, and pattern-irrelevant versus naïve p = 0.4873). Likewise, over the first 600 trials, the performance of the pattern-relevant birds improved at a significantly faster rate than that of the other two groups (LMM, F(10,45) = 3.551; p = 0.0016, group x training block interaction). Post-hoc analyses comparing group performance in each of the first six 100-trial blocks reveal significant differences between groups emerging in blocks 5 and 6 (Bonferroni-corrected α = 0.0083; p = 0.002 and p < 0.0001, respectively; Fig. 2). Birds in the pattern-relevant group reached our arbitrary learning criterion (three consecutive blocks with d-prime > 1.0, Supplemental Information) in 5.5 ± 0.8 (mean ± SE) blocks, whereas birds in the pattern-irrelevant and naïve groups required 14.5 ± 1.2 and 16.75 ± 4.9 blocks, respectively, to achieve the same stable, accurate motif classification.
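The learning criterion described above (three consecutive blocks with d-prime > 1.0) can be sketched as follows. This is an illustrative sketch only: the log-linear correction for extreme rates, the function names, and the example values are assumptions, and the exact d-prime computation used in the study is described in its Supplemental Information rather than here.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index for one block of go/no-go trials.

    Assumption: a log-linear correction (add 0.5 to each count, 1 to each
    total) keeps the rates away from 0 and 1; the source does not specify
    how extreme rates were handled.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def blocks_to_criterion(d_primes, threshold=1.0, run=3):
    """Return the 1-based block at which `run` consecutive blocks with
    d' > threshold is first completed, or None if never reached."""
    streak = 0
    for block, value in enumerate(d_primes, start=1):
        streak = streak + 1 if value > threshold else 0
        if streak == run:
            return block
    return None


# Hypothetical per-block d' values (not data from the study).
print(blocks_to_criterion([0.2, 1.1, 1.2, 0.9, 1.3, 1.4, 1.5]))  # -> 7
```

In this sketch the criterion block is counted as the last block of the qualifying run; counting from the first block of the run would be an equally plausible convention.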
Strong advantages for motif classification are also observed in the individual data, where in block 5, two of four, and in block 6, four of four subjects in the pattern-relevant group performed significantly better than expected by chance (binomial test, chance = 0.5, p < 0.05 each case). For each of the pattern-trained birds, average performance over the first 600 trials was significantly above chance (binomial test, chance = 0.5, p < 0.05 all four cases). In contrast, average performance for none of the naïve birds and only one of the pattern-irrelevant birds was above chance across the first six blocks (binomial test, chance = 0.5). Therefore, we conclude that auditory pattern learning, but not exposure to or rote memorization of acoustic sequences, enhances the perceptual mechanisms that underlie acoustic categorization in songbirds.
To confirm that subjects in all three groups could ultimately learn to categorize warbles and rattles with similar proficiency, we continued training all subjects past the initial 600-trial period until their performance was consistently better than chance for multiple consecutive blocks (Supplemental Information). At the end of this extended training, motif classification accuracy was similarly high in all three groups (F(2,9) = 0.28; p = 0.7608, main effect of group in final 100-trial block; Fig. 2). Thus, the motif categories are learnable for all subjects.
Our results support the idea that high-level pattern learning improves lower-level acoustic categorization. However, the poor performance of the pattern-irrelevant birds during initial training (Fig. 1C) could have led to stimulus-independent response strategies that delayed subsequent acquisition of the motif classification. To examine whether the pattern-relevant and -irrelevant groups used the operant apparatus in similar ways, we compared several stimulus-independent response measures. During pattern training, if a subject responded incorrectly, we delivered a correction trial in which the same stimulus was repeated on the next trial, and on all trials thereafter until the animal responded correctly (Supplemental Information). As subjects learn the operant contingencies, the number of consecutive correction trials decreases, approaching an optimum of 1. All subjects showed significant decreases in the number of consecutive correction trials over the course of pattern training (Pearson's correlation: in all 8 cases, p < 0.05), and the mean rate of this decrease did not differ significantly between the pattern-relevant and -irrelevant groups (unmatched t-test t = −1.86; p = 0.152). By these measures, both groups were equally adept at working the operant apparatus. Likewise, there was no significant difference between the mean reaction times for subjects in the two groups during the last five 100-trial blocks of pattern training (RT for Go stimuli: t = 0.642; p = 0.55; RT for NoGo stimuli: t = 1.66, p = 0.16). Thus, despite the strong difference in response accuracy (Fig. 1C), both groups aligned their responses to stimulus offset.
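The correction-trial measure can be made concrete with a small sketch that counts, for each error, how many consecutive correction trials followed until the first correct response. The input format (one boolean per trial) and the function name are assumptions for illustration; the actual procedure is detailed in the Supplemental Information.

```python
def correction_run_lengths(outcomes):
    """Lengths of consecutive correction-trial runs.

    `outcomes` holds one boolean per trial in session order (True means a
    correct response). An error on a regular trial starts a run; the run
    ends with the first correct response, which is included in its length,
    so the shortest possible run is 1. A run still open at the end of the
    session is dropped.
    """
    runs = []
    in_correction = False
    current = 0
    for correct in outcomes:
        if in_correction:
            current += 1
            if correct:
                runs.append(current)
                in_correction = False
                current = 0
        elif not correct:
            in_correction = True
    return runs


# Hypothetical session: one error fixed immediately, one needing 3 repeats.
print(correction_run_lengths(
    [True, False, True, False, False, False, True, True]
))  # -> [1, 3]
```

Under this reading, a bird that has learned the contingencies produces runs of length 1, matching the optimum mentioned in the text.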
Finally, we note that during the motif classification, acquisition rates for birds in the pattern-irrelevant and naïve groups did not differ significantly (paired t-test, p = 0.1192, over the first 15 blocks, for which we have data from all subjects). This further indicates that the pattern-irrelevant birds had not learned to ignore the song stimuli altogether, as they readily used them when their diagnostic value for the task was salient.
We show that learning to classify patterned sequences of species-specific vocalizations enhances categorization of the sequence components. This enhancement is not driven by simple exposure to or familiarity with category exemplars or sequences, but rather by interaction with behaviorally relevant patterning rules operating on the acoustic categories.
Our results have important parallels to perceptual changes during the first year of human development in which infants acquire adult-like phonetic categories emphasizing the phonemic contrasts relevant to their own language environment [see 7 for review]. One hypothesis for the emergence of phonetic categories is that infants learn, in an unsupervised way, the statistical properties of distributions of speech sounds along acoustic continua [10, 11, 36]. These categories could then enable access to more complex lexical information with phonemes (rather than explicit sounds) patterned into words. A second hypothesis is that phonetic category learning is shaped by the lexical (or other higher level) contexts within which speech sounds normally occur [12-15]. Distributional and contextual sources of information are not mutually exclusive, and empirical evidence consistent with both accounts has been observed [10, 11, 16, 17]. For instance, looking time experiments with 8-month-olds suggest that infants apply word-level information to guide the perception of vowel categories. Interestingly, computational studies of phonetic category learning indicate that attending to contextual cues yields more efficient phonetic category learning than distributional cues [12, 19-21], and infants are attentive to this “higher-level” information at times when phonetic categories are still developing [7, 37, 38]. Our observation of a top-down contextual learning mechanism in songbirds supports the idea that speech acquisition could co-opt general learning mechanisms not unique to humans or language.
Although we demonstrate a “top-down” effect of pattern learning on classification, it is important to note that our task does not precisely model phonetic category learning. In our study, the perceptual boundary between the warble and rattle motif categories emphasized by pattern-relevant training is well-defined acoustically, and the motifs within each category are generally distinguishable. Phonemic boundaries, on the other hand, tend to parse continuous perceptual dimensions, and the elements within phoneme classes are typically indistinguishable. Likewise, the structure of reinforcement is another potentially important difference between our study and the infant studies. Given that speech-like categorical perception is well-documented in non-human animals [24, 39, 40], it will be important for future studies to examine whether the top-down learning mechanisms observed here can influence more subtle, psychophysical measures of categorization acquired with unsupervised feedback.
In principle, our results could be accounted for by a mechanism that tunes perceptual representations to ‘category-relevant’ acoustic features of the component sounds, or by a mechanism that biases the associative processing of already salient features. Attention is an obvious candidate to control top-down modulation of either mechanism, as expectations gleaned from pattern structure could bias attention to specific features of sound patterns that are either about to occur or are held in working memory. This is consistent with top-down influences on phoneme perception in human adults, where ambiguous speech sounds are resolved perceptually based on the subject's knowledge of a word. For example, classic psychological experiments show that if a sound located in the middle of the /d/ - /t/ phonetic continuum precedes “_ask”, listeners will report hearing the word “task” as opposed to the non-word “dask.” Conversely, if the same stimulus precedes “_ash”, subjects report hearing the word “dash” over the non-word “tash.” The contributions of similar attentional and working memory processes to phonemic category learning remain an open question.
To our knowledge, we provide the first demonstration that high-level pattern learning can shape lower-level perceptual representations in a non-human animal. Starlings already serve as an important model species to investigate how experience alters the response properties of sensory neurons throughout the avian forebrain [43-50]. The strong parallels between the present results and human phonemic category learning suggest that this species may also serve as a suitable nonhuman model system to understand the basic biology for a range of perceptual, categorical, and learning-related mechanisms that lie at the core of infant speech acquisition.
Complete procedures are detailed in the Supplemental Information. All procedures were approved by the UCSD institutional animal care and use committee.
Twelve wild-caught European starlings (Sturnus vulgaris) served as subjects. Figure 1A illustrates the operant apparatus used in the go-nogo procedure to train starlings on the 4-motif pattern and single-motif classification tasks. The 4-motif patterned stimuli (e.g., Fig. 1B) were constructed from sixteen acoustically distinct warble and rattle motifs (eight motifs per class, labeled “A” and “B”, respectively, Fig. S1) assembled into 4-motif sequences of the form AABB, BBAA, ABAB, and BABA (Table 1). We trained one group of subjects (pattern-relevant; N = 4) using 32 (out of a possible 16,384) patterned stimuli to distinguish 8 AABB and 8 BBAA sequences from 8 ABAB and 8 BABA sequences (Table 1). To control for motif and sequence exposure, we trained a second group of birds (pattern-irrelevant; N = 4) to distinguish 4 AABB, 4 BBAA, 4 ABAB and 4 BABA sequences from 4 AABB, 4 BBAA, 4 ABAB and 4 BABA sequences (Table 1). Birds in the pattern-relevant group could solve the task by determining whether the sequence on a given trial followed the pattern XXYY or XYXY, where X and Y denote either A or B, but birds in the pattern-irrelevant group could not (Table 1). Stimuli for the motif categorization task were the eight warble and eight rattle motifs used to construct the patterned sequence stimuli, with a single motif presented on each trial. We compared percent correct scores across groups using a linear mixed effects model (LMM), and, where necessary, single and matched-pairs t-tests. We analyzed individual subject data using binomial tests comparing raw numbers of correct responses in a given trial-block, with chance = 50% of all responses.
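The stimulus space described above can be checked with a short enumeration. The sketch assumes that a motif may occur more than once within a sequence, since that assumption reproduces the stated total of 16,384 (4 patterns × 8⁴ motif choices); the motif labels (A1..A8, B1..B8) and function names are hypothetical.

```python
from itertools import product

# Hypothetical labels for the eight warble (A) and eight rattle (B) motifs.
WARBLES = [f"A{i}" for i in range(1, 9)]
RATTLES = [f"B{i}" for i in range(1, 9)]
PATTERNS = ["AABB", "BBAA", "ABAB", "BABA"]


def sequences_for(pattern):
    """All 4-motif sequences for one pattern, allowing repeated motifs."""
    pools = [WARBLES if symbol == "A" else RATTLES for symbol in pattern]
    return list(product(*pools))


def pattern_rule(sequence):
    """Classify a sequence as 'XXYY' or 'XYXY' from its motif categories,
    or return None if it follows neither rule."""
    cats = [motif[0] for motif in sequence]
    if cats[0] == cats[1] and cats[2] == cats[3] and cats[1] != cats[2]:
        return "XXYY"
    if cats[0] == cats[2] and cats[1] == cats[3] and cats[0] != cats[1]:
        return "XYXY"
    return None


total = sum(len(sequences_for(p)) for p in PATTERNS)
print(total)  # -> 16384
print(pattern_rule(("A1", "A2", "B3", "B4")))  # -> XXYY
```

Because `pattern_rule` looks only at category labels, it captures why the pattern-relevant task can be solved at the category level while the shuffled assignment used for the pattern-irrelevant group cannot.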
Work supported by NSF Graduate Research Fellowship 2011122846 to JAC and NIH DC008358 to TQG.
JAC and TQG designed the research, JAC performed the research, JAC and TQG analyzed the data, JAC and TQG wrote the paper.