Previous work has demonstrated that adults are capable of learning patterned relationships among adjacent syllables or tones in continuous sequences, but not among non-adjacent syllables. However, adults are capable of learning patterned relationships among non-adjacent elements (segments or tones) if those elements are perceptually similar. The present study significantly broadens the scope of this previous work by demonstrating that adults are capable of encoding the same types of structure among unfamiliar non-linguistic and non-musical elements, but only after much more extensive exposure. We presented participants with continuous streams of non-linguistic noises and tested their ability to recognize patterned relationships. Participants learned the patterns among noises within adjacent groups, but not within non-adjacent groups unless a perceptual similarity cue was added. This provides evidence that statistical learning mechanisms empower adults to extract structure among non-linguistic and non-musical elements, and that perceptual similarity eases constraints on non-adjacent pattern learning.
Statistical learning studies have demonstrated that adults, young children, and infants are capable of rapidly learning consistent relationships among temporally adjacent speech sounds or musical tones and of grouping these elements into larger coherent units such as words or melodies (Aslin, Saffran, & Newport, 1998; Perruchet & Pacton, 2006; Saffran, Aslin, & Newport, 1996; Saffran, Johnson, Aslin, & Newport, 1999; Saffran, Newport, & Aslin, 1996; Saffran, Newport, Aslin, Tunick, & Barrueco, 1997). Similarly, adults and infants are capable of grouping temporally adjacent patterned visual elements into coherent units (Fiser & Aslin, 2002; Kirkham, Slemmer, & Johnson, 2002).
In contrast, however, the ability to learn dependencies among non-adjacent elements is more selective. Natural languages exhibit only certain limited non-adjacent dependencies among sounds and word classes (Chomsky, 1957). In artificial language experiments, only certain types of non-adjacent patterns are readily learned (Cleeremans & McClelland, 1991; Gómez, 2002; Newport & Aslin, 2004; Onnis, Monaghan, Richmond, & Chater, 2005), and such patterns are particularly difficult to learn when the materials are complex or are presented in lengthy or continuous streams. Newport and Aslin (2004) showed that statistical learning of patterns between non-adjacent syllables is difficult1 but that similar relationships between non-adjacent segments (consonants or vowels), which are common in natural languages, can be learned quite easily. They suggested that, while non-adjacent relationships are more difficult to acquire than adjacent ones, this difficulty can be ameliorated when the non-adjacent elements are perceptually similar to one another (e.g., all consonants) and distinct from the intervening elements (e.g., vowels). Creel, Newport, and Aslin (2004) showed that patterns among non-adjacent tones can be learned if the non-adjacent elements are of a similar pitch range or timbre.
The present experiments significantly broaden these results to examine the same questions for patterns composed of non-linguistic and non-musical elements. We use non-linguistic noises that have no “names” and that do not fall along a single dimension (e.g., pitch for tones). We ask whether such unfamiliar noises show the same signature properties of statistical learning that have been demonstrated for familiar speech materials, in particular whether adults readily learn adjacent groupings and whether non-adjacent groupings are more selectively learned, based on whether the related elements are perceptually similar.
The materials and procedures were analogous to those used in previous studies of speech and tonal melodies. Adults were exposed to a continuous familiarization stream of non-linguistic noises and tested for their ability to recognize patterns they had heard. In each experiment, we constructed the familiarization stream by creating four strings of 3 non-linguistic sounds (“noise triplets”) and then sequencing tokens of these noise triplets in random order (excluding immediate repeats). In Experiment 1, in which the regularities are among adjacent noises, we show that these patterns of unfamiliar noises can be learned. In Experiments 2 and 3, in which the regularities are among non-adjacent noises, we demonstrate that these patterns cannot be learned, unless a perceptual similarity cue links the non-adjacent elements to one another.
We originally planned for the present series of studies to be completely analogous to previous experiments on statistical learning in speech streams (Newport & Aslin, 2004; Saffran, Newport, & Aslin, 1996). However, extensive piloting demonstrated that two important experimental design parameters required modification.
First, 150 ms of silence was inserted between each noise in the familiarization streams, both within and across noise triplets. While participants could not perceive the silences, this slight spacing between sounds improved performance, likely because it helped participants encode the distinct, unfamiliar noises.
Second, the total exposure duration to the streams needed to be significantly lengthened, relative to that in previous studies with speech and tones (Creel et al., 2004; Newport & Aslin, 2004; Saffran, Aslin, & Newport, 1996; Saffran et al., 1999; Saffran, Newport, & Aslin, 1996). In previous studies in the speech domain, infants (Saffran, Aslin, & Newport, 1996) and adults (Newport & Aslin, 2004, and unpublished data) learned adjacent syllable dependencies and non-adjacent segment dependencies with familiarization periods of 2 to 20 minutes. However, our piloting in the noise domain showed that adults failed to learn adjacent dependencies when the total familiarization was 20, 35, or 40 minutes. It took 100 minutes of exposure, across three familiarization sessions, for participants to learn the adjacent statistical regularities. We then used this same 100-minute exposure in testing non-adjacent dependency learning.
Thus, in the present series of studies, we extend our previous results of statistical learning to non-linguistic and non-musical items. We show that the learning process is much more difficult and requires a much longer familiarization exposure than when the elements are speech sounds or tones; but nonetheless the same type of selective learning occurs for these patterns. In the General Discussion, we consider what these findings tell us about the mechanisms involved in statistical pattern learning.
The structure of the patterned regularities in the present experiment was identical to that in the easiest of our previous languages, designed for infants (Saffran, Aslin, & Newport, 1996). Four non-linguistic noise triplets were constructed, each of which contained three unique noises; these triplets were then sequenced in a constrained random order with no immediate repetitions to form a familiarization stream. Over the familiarization stream, the transitional probabilities between noises within a triplet were all 1.0; the transitional probabilities at the triplet boundaries were all 0.33. Each individual noise was of the same duration, loudness, and frequency, with the same interval from one noise to another. The only available information for segmenting the non-linguistic noise triplets was the greater statistical regularity of adjacent noise sequences within a triplet than of adjacent noise sequences that spanned a triplet boundary.
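This transitional-probability structure can be illustrated with a short simulation. The sketch below is illustrative only: the noise labels (n1-n12) are placeholders for the actual sounds, and the real streams were assembled in audio software rather than in code.

```python
import random
from collections import Counter

# Four placeholder triplets, each of three unique "noises" (hypothetical labels).
triplets = [("n1", "n2", "n3"), ("n4", "n5", "n6"),
            ("n7", "n8", "n9"), ("n10", "n11", "n12")]

def build_stream(triplets, n_tokens, rng):
    """Sequence triplet tokens in random order, excluding immediate repeats."""
    order, prev = [], None
    for _ in range(n_tokens):
        choice = rng.choice([t for t in triplets if t is not prev])
        order.append(choice)
        prev = choice
    return [noise for trip in order for noise in trip]

def transitional_probs(stream):
    """TP(b | a) = count(a immediately followed by b) / count(a)."""
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])
    return {(a, b): c / firsts[a] for (a, b), c in pairs.items()}

rng = random.Random(0)
tp = transitional_probs(build_stream(triplets, 5000, rng))

# Within-triplet transitions are deterministic (TP = 1.0) ...
assert tp[("n1", "n2")] == 1.0 and tp[("n2", "n3")] == 1.0
# ... while boundary transitions hover around 1/3 (three possible successors).
assert 0.2 < tp[("n3", "n4")] < 0.5
```

Because immediate repeats are excluded, each triplet boundary has three equally likely successors, which is what yields the 0.33 boundary probability described above.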
Sixteen University of Rochester undergraduates participated, for a payment of $30 each. In this and the following experiments, participants were monolingual English speakers with normal hearing and no diagnosed learning disabilities or attention disorders. None had previously participated in a statistical learning experiment.
An inventory of 12 non-linguistic sounds, composed of Macintosh Operating System 9 alert sounds and iMovie sounds (from http://www.apple.com), was used in this experiment.2 SoundEdit 16 Version 2 (Macromedia, Inc.) was used to edit each individual sound to a duration of 0.22-0.25 s, to standardize volume across sounds, and to fade each sound in and out (10-ms ramp).
Four different non-linguistic noise triplets were created for each of two different instantiations (“Language I” and “Language II”), to guard against participants' idiosyncratic preferences for particular individual sounds or their combinations. Each noise triplet consisted of three unique sounds from the sound inventory (Table 1). In this and the following experiments, a 150-ms silent interval was inserted between successive sounds, both within and across noise triplets, using SoundEdit. Eight participants were tested in each of the two Language conditions.
For each condition, 24 tokens of each of the four noise triplets were sequenced in random order (excluding immediate repeats) to create a continuous sound stream. The stream was then looped 21 times to create a familiarization of approximately 40 minutes (with two one-minute silent rest periods at equally-spaced intervals), which participants heard during each of the first two testing sessions. The stream was looped 10 times to create the familiarization of approximately 20 minutes (with a one-minute silent rest period at the halfway point), which participants heard during the third session. Participants were thus exposed to a total of 4992 noise triplet tokens during the three-session familiarization.
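The exposure arithmetic can be checked directly: 24 tokens of each of the 4 triplets gives 96 triplet tokens per loop, and the three sessions comprise 21 + 21 + 10 = 52 loops.

```python
# Sanity check of the familiarization exposure arithmetic (values from the text).
tokens_per_triplet = 24          # tokens of each triplet in one loop of the stream
n_triplets = 4
loops = 21 + 21 + 10             # sessions 1 and 2 (~40 min each), session 3 (~20 min)

triplets_per_loop = tokens_per_triplet * n_triplets   # 96 triplet tokens per loop
total = triplets_per_loop * loops
assert total == 4992             # matches the total reported in the text
```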
All participants were tested individually in a quiet room while listening to the recordings on a Sony minidisk player through Sennheiser Symphony HD 570 headphones. Participants were instructed to listen attentively to the continuous sound stream and were told that they might begin to recognize some patterns and that, at the end of the third session, they would be tested to determine how well they recognized the patterns.
The experiment consisted of two phases: familiarization and test. There were three sessions on consecutive days. During each of the first two sessions, participants heard the 40-minute familiarization stream; during the third, participants heard the 20-minute familiarization stream and then completed the test.
Test trials were constructed by pairing each noise triplet with each noise part-triplet (a three-noise sequence that spanned a triplet boundary in the familiarization stream), to determine whether participants could recognize the more statistically consistent patterns (the noise triplets). Each noise triplet/noise part-triplet combination occurred twice (in counterbalanced order) during the test, yielding a total of 32 trials. In this and the following experiments, two different randomized presentation orders were used for the trials (counterbalanced across participants).
For each test trial, participants heard a noise triplet and a noise part-triplet, separated by a 1-s silent interval. Participants were instructed to indicate which was more familiar, based upon the recording they had heard, by circling either “1” or “2” on a pre-printed answer sheet. A 5-s silent interval followed each trial to allow participants time to record their response.
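The 2AFC test design (4 triplets × 4 part-triplets, each pairing presented in both orders) can be sketched as follows; the labels below are placeholders, not the actual stimuli.

```python
import itertools
import random

# Hypothetical labels standing in for the four triplets and four part-triplets.
triplets = ["T1", "T2", "T3", "T4"]
part_triplets = ["P1", "P2", "P3", "P4"]

# Pair every triplet with every part-triplet; present each pairing twice,
# once in each order (counterbalanced), for 4 * 4 * 2 = 32 trials.
trials = []
for trip, part in itertools.product(triplets, part_triplets):
    trials.append((trip, part))   # triplet heard first
    trials.append((part, trip))   # part-triplet heard first

assert len(trials) == 32          # matches the 32 trials reported in the text

rng = random.Random(0)
rng.shuffle(trials)               # one randomized presentation order
```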
We report the pooled data for Languages I and II, since there was no significant difference between participants' performance on the two languages [t(14) = 1.24, p = 0.23]. The first bar in Fig. 1 shows the pooled test results. Participants did readily acquire the regularities within the languages. Overall, test performance significantly exceeded chance (mean = 67.58% correct, t(15) = 4.58, p = 0.0004).
In this experiment we ask whether participants are able to learn a set of statistics in a non-linguistic sound stream in which the statistical regularities occur among non-adjacent elements that are not selected to be perceptually similar. We created a continuous sound stream that contained a pattern in which the only available information for segmenting the “noise triplets” was the greater statistical regularity of non-adjacent noises forming triplets than of any of the adjacent noises. The transitional probabilities between non-adjacent noises within each triplet were all 1.0; those between adjacent noises within each triplet were all 0.5; those between adjacent noises that spanned a triplet boundary were all 0.5.
A new set of sixteen University of Rochester undergraduates participated for a payment of $30 each.
An inventory of six sounds was used to create four noise triplets for each of two languages (I and II). This inventory was a subset of that from Experiment 1 (see Table 1 and Appendix). The non-adjacent patterned relationships occurred between the first and third sounds of each noise triplet (e.g., between sounds “A” and “B”), skipping over the intervening sound element (e.g., “x”). As shown in Table 1, there were two non-adjacent frames for participants to detect: A[x or y]B and C[x or y]D.
The randomization and looping schemes for creating the streams were similar to those in Experiment 1. Noise triplets were sequenced in a constrained random order (excluding immediate repeats) to create a continuous sound stream. Words (i.e., noise triplets) 1 and 4 could be followed only by Words 2 and 3; Words 2 and 3 could be followed only by Words 1 and 4 (see Table 1; words are listed in consecutive order). Each word and each juncture occurred equally often. Eight participants were tested in each of the Language conditions.
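As a rough illustration of why this ordering constraint yields the stated transitional probabilities, the sketch below generates a long label stream and computes adjacent and non-adjacent transitional probabilities. The labels A, x, y, B, C, D stand in for the actual noises, and word choice here is random rather than exactly counterbalanced as in the actual streams.

```python
import random
from collections import Counter

# Placeholder AxB-style words: frames A_B and C_D with fillers x and y.
words = [("A", "x", "B"), ("A", "y", "B"), ("C", "x", "D"), ("C", "y", "D")]
# Words 1 and 4 may be followed only by Words 2 and 3, and vice versa
# (0-indexed here: word 0 = Word 1, etc.).
successors = {0: [1, 2], 3: [1, 2], 1: [0, 3], 2: [0, 3]}

rng = random.Random(0)
idx, stream = 0, list(words[0])
for _ in range(4999):
    idx = rng.choice(successors[idx])
    stream.extend(words[idx])

def tp(stream, gap):
    """TP between elements `gap` positions apart (gap=1: adjacent; gap=2: non-adjacent)."""
    pairs = Counter(zip(stream, stream[gap:]))
    firsts = Counter(stream[:-gap])
    return {(a, b): c / firsts[a] for (a, b), c in pairs.items()}

adj, nonadj = tp(stream, 1), tp(stream, 2)

# The non-adjacent frames are deterministic (TP = 1.0) ...
assert nonadj[("A", "B")] == 1.0 and nonadj[("C", "D")] == 1.0
# ... while every adjacent transition is ambiguous (TP near 0.5).
assert 0.4 < adj[("A", "x")] < 0.6
```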
The procedure was identical to that of Experiment 1, except for the use of different stimulus and test materials.
We report the pooled data for Languages I and II, since there was no significant difference between participants' performance on the two languages [t(14) = −0.081, p = 0.94]. The second bar in Fig. 1 shows the pooled test results. Participants did not acquire the patterned relationships. Overall, test performance was significantly below chance (mean = 37.11% correct, t(15) = −2.78, p = 0.014).
It is unclear why participants scored below chance on the test. However, we have previously observed similar performance levels in tests of statistical learning among non-adjacent speech syllables (Newport & Aslin, 2004). Importantly, performance in Experiment 2 was significantly worse than performance in Experiment 1 (F(1,30) = 25.59, p < 0.0001). The failure to learn these non-adjacent relationships occurred despite the fact that the number of different noise triplets (four) was the same across the two experiments, and there were fewer patterns (two) to be learned in Experiment 2 than in Experiment 1 (eight). (See Table 1.)3
Previous studies have provided evidence that the learning of non-adjacent patterned relationships is highly selective (Creel et al., 2004; Newport & Aslin, 2004; Onnis et al., 2005). With speech materials, learners readily acquire non-adjacent relationships among consonants (with the intervening vowels unrelated) or among vowels (with the intervening consonants unrelated), but not among non-adjacent syllables. One hypothesis is that these results are particular to speech and arise from differences in learning segments versus syllables. But another hypothesis is that these results are not domain-specific and rather reflect the greater ease of pattern learning among perceptually similar elements. If the latter hypothesis is correct, comparable results should appear in the learning of patterns in non-linguistic materials.
In the present experiment, we investigate whether perceptual similarity among non-adjacent, non-linguistic sounds facilitates participants' pattern learning. We added four new sounds to our inventory, which were perceptually similar to one another (tonal) but perceptually different from other (raspy) sounds in our inventory. While the same type of AxB non-adjacent pattern was used here as in Experiment 2, the non-adjacent patterned relationships in the present experiment occurred among the raspy sounds, with intervening tonal sounds.
As in Experiment 2, the continuous sound stream contained a pattern in which the only available information for segmenting the noise triplets was the greater statistical regularity of non-adjacent noises than of any of the adjacent noise sequences. The transitional probability structure was identical to that in Experiment 2.
A new set of sixteen University of Rochester undergraduates participated for a payment of $30 each.
An inventory of ten sounds was used to create four noise triplets for each of two languages (I and II). This inventory included six sounds from Experiment 2 and four new sounds from http://www.partnersinrhyme.com (see Table 1 and Appendix). As in Experiments 1 and 2, each sound was edited to a duration of 0.22-0.25 s, normalized to standardize volume across sounds, and faded in and out. Five naïve adults rated each individual sound on a scale of 1-7 (with “1” being “not tonal at all” and “7” being “very tonal”). The mean rating for the atonal (raspy) sounds that had been pre-selected for the experiment (2.97) was significantly different from the mean rating for the pre-selected tonal sounds (4.50) [t(4) = 7.19, p = 0.002].
In each noise triplet, the non-adjacent raspy elements (e.g., “A” and “B” from Table 1) were perceptually similar to one another, but perceptually different from the intervening tonal element (e.g., “x”).
The randomization and looping schemes were identical to those in Experiment 2. Eight participants were tested in each of the two Language conditions.
The procedure was identical to that of Experiments 1 and 2, except for the use of different stimulus and test materials.
For the test, the structure of the noise part-triplets was identical to that in Experiments 1 and 2. Test trials were constructed by pairing each noise triplet with each noise part-triplet. Each noise triplet/noise part-triplet combination occurred once (in counterbalanced order) during the test, yielding a total of 16 trials. The test structure was otherwise identical to that in Experiments 1 and 2.
We report the pooled data for Languages I and II, since there was no significant difference between participants' performance on the two languages [t(14) = 1.74, p = 0.10]. The third bar in Fig. 1 shows the pooled test results. Participants did readily acquire the regularities within these languages. Overall, test performance significantly exceeded chance (mean = 68.36% correct, t(15) = 3.66, p = 0.0023). Thus, participants learned the non-adjacent relationships when the non-adjacent elements were perceptually similar to one another but perceptually different from the intervening elements.4 Performance in Experiment 3 significantly exceeded performance in Experiment 2 (F(1,30) = 20.88, p < 0.0001).
The results of the present series of three experiments provide important new information about how adults learn patterned relations among elements in continuous streams. First, the ability to extract statistical patterns between temporally adjacent elements is rendered much more difficult when the elements are unfamiliar, complex, unlabeled noises than when the elements are familiar speech syllables. The minimum exposure duration required for successful learning of adjacent patterns was five times that in previous studies of statistical learning with speech or simple tones. Second, the pattern of results – superior learning of adjacent statistics and successful learning of non-adjacent statistics only when they were defined by a correlated acoustic cue – replicated findings from studies of both speech and tones. Thus, while the efficiency of statistical learning was reduced by element unfamiliarity, the overall pattern of statistical learning remained invariant.
One question that arises from the much greater exposure duration required for learning in the present experiments is whether the rapidity of statistical learning with speech materials benefits from extensive prior exposure to speech (thereby rendering its elements highly familiar), or from a natural, more efficient encoding of speech sounds and tones by adults and infants. Marcus, Fernandes, and Johnson (2007) have argued for the “special” status of speech, whereas Johnson et al. (in press) have softened that claim based on finding comparable patterns in studies of visual statistical learning (see also Saffran, Pollak, Seibel, & Shkolnik, 2007). At minimum, we know that rapid statistical learning of patterns across speech syllables does not require a species-specific mechanism unique to humans, since both tamarin monkeys (Hauser, Newport, & Aslin, 2001) and rats (Toro & Trobalón, 2005) learn statistical relations between adjacent syllables. Future research with non-speech sounds will be required to test whether infants are predisposed to process speech in a particularly efficient manner, or whether any initially unfamiliar and complex set of non-speech sounds could be learned efficiently given sufficiently extensive prior exposure.
Most significant, however, is the finding that, even with materials that are highly unfamiliar and require much greater exposure for successful statistical learning, the same findings obtain regarding the types of patterns that are relatively easy and relatively hard to learn. Statistically consistent patterns among arbitrarily chosen adjacent elements are easier to learn than those among arbitrarily chosen non-adjacent elements, and this advantage for adjacent over non-adjacent patterns holds even when there are many fewer non-adjacent patterns to be learned. However, when the non-adjacent elements are perceptually similar and distinct from the intervening elements, learning is successful. Our results support the view that perceptual similarity is one of several contributing constraints on learning. Taken together, these findings suggest that temporal proximity and perceptual similarity can each support statistical learning across a wide range of materials.
This research was supported by NIH grants (HD-37082 and DC-00167) and by a Packard Foundation grant to the second and third authors. The first author was supported by an NIH training grant (T32-MH19942).
Table 1. Sound assignments for each experiment and language. Columns: Sound; Expt. 1(I); Expt. 1(II); Expt. 2(I); Expt. 2(II); Expt. 3(I); Expt. 3(II). (Table body not reproduced here; see the Appendix and the archived sound files.)
Publisher's Disclaimer: This manuscript was accepted for publication in Psychonomic Bulletin & Review on December 28, 2008. The copyright is held by Psychonomic Society Publications. This document may not exactly correspond to the final published version. Psychonomic Society Publications disclaims any responsibility or liability for errors in this manuscript.
The following materials and links may be accessed through the Psychonomic Society's Norms, Stimuli, and Data Archive, www.psychonomic.org/Archive. To access these files or links, search the archive for this article using the journal (Psychonomic Bulletin & Review), the first author's name (Gebhart) and the publication year (2009).
DESCRIPTION: The compressed archive file contains 16 files (.wav files of the individual sounds that comprise the noise triplets used in the present experiments). Refer also to Table 1 and the Appendix. The files are: SimpleBeep.wav, ChuToy.wav, HighSproing.wav, Bell.wav, DigitalLand.wav, Purr.wav, Indigo.wav, MagicMorph.wav, Sosumi.wav, Chirp.wav, Voltage.wav, Temple.wav, Button.wav, Spo.wav, BeepFM.wav, and BeepPure.wav.
1See Peña, Bonatti, Nespor, & Mehler (2002) for counterevidence.
3One possible reason for the significantly below-chance performance is that frames (i.e., A_B or C_D) were allowed to repeat in the constrained randomization. Thus, an AxBAyB sequence may have heightened attention to the bigram (BA) at a triplet boundary.
4Endress & Mehler (in press) claim that learners might not learn three-syllable words as units, but rather become sensitive to co-occurrences of syllable pairs (enabling them to distinguish words from part-words on a 2AFC test). We are agnostic about whether noise triplets have been learned as a chunk in this sense; we argue only for stream segmentation, which could be based on sets of bigram groupings.