Adults rapidly learn phonotactic constraints from brief production or perception experience. Three experiments asked whether this learning is modality-specific, occurring separately in production and perception, or whether perception transfers to production. Participant pairs took turns repeating syllables in which particular consonants were restricted to particular syllable positions. Speakers' errors reflected learning of the constraints present in the sequences they produced, regardless of whether their partner produced syllables with the same constraints, or opposing constraints. Although partial transfer could be induced (Experiment 3), simply hearing and encoding syllables produced by others did not affect speech production to the extent that error patterns were altered. Learning of new phonotactic constraints was predominantly restricted to the modality in which those constraints were experienced.
When native English speakers hear the word “ngungseh” (/ŋʌŋsε/), they immediately suspect that it is foreign. The word violates the phonotactic constraint in English that /ŋ/ only appears in syllable-final (coda) position. Phonotactic constraints like this one are language-specific. In fact, the example given above is a word in Cantonese, which permits /ŋ/ in syllable-initial (onset) position.
How do we learn language-specific phonotactic constraints? Obviously, we learn from experience. But linguistic experience with word forms comes from two sources, what we hear (perception) and what we say (production). The experiments presented in this paper investigate the roles that perception and production play in constructing phonotactic knowledge, and their potential interactions with one another during that construction.
Sensitivity to phonotactics begins early in life. Infants as young as 9 months old discriminate between sound sequences that are legal or illegal in their native language (e.g., Jusczyk, Friederici, Wessels, Svenkerud, & Jusczyk, 1993). Among legal sound sequences, they discriminate between those that are of high and low phonotactic probability (Jusczyk, Luce & Charles-Luce, 1994). The influence of this knowledge is felt throughout life in both perception (e.g., Massaro & Cohen, 1983; Dupoux, Kakehi, Hirose, Pallier, & Mehler, 1999; McQueen, 1998; Redford, 2008) and production (e.g., Fromkin, 1971; Redford, 2008). In this paper, we will be particularly concerned with the expression of phonotactic knowledge in production performance, and specifically with how phonotactics shape speech errors.
Although phonotactic learning begins early in life, recent evidence suggests that the resulting knowledge is far from static in adulthood. It is the guiding hypothesis of the present research that although adults already possess rich phonotactic knowledge, they can still learn new phonotactic-like constraints from ongoing experience. Dell, Reed, Adams, and Meyer (2000) provided the first demonstration of adults’ learning of new consonant-position constraints by requiring participants to produce many syllables exhibiting those constraints. The participants recited sequences of four consonant-vowel-consonant (CVC) syllables, such as “kes feng heg men”. The critical manipulation involved the artificial restriction of particular consonants to particular syllable positions. For example, half of the participants experienced /f/ always as an onset and /s/ always as a coda, and half experienced the reverse assignment. As a result, the syllables to be recited reflected multiple levels of constraints on consonant positions: /h/ and /ŋ/ were restricted by language-wide constraints. As required by English phonotactics, /h/ only appeared in onset position, and /ŋ/ only appeared in coda position. /f/ and /s/ were subject to the artificially-imposed experiment-wide constraint. The other consonants (/k/, /g/, /m/, /n/) were unrestricted and appeared in both onset and coda positions. Learning was measured by whether the experiment-wide constraint pattern emerged in participants’ speech errors over the course of four testing sessions, each session occurring on a separate day. The unrestricted consonants served as a baseline for comparison with the consonants subject to the experiment-wide constraints.
Participants’ speech errors in this task reflected the constraints present in the materials. Errors involving /h/ and /ŋ/ never violated the language-wide constraints. That is, even in errors, /h/ only surfaced as an onset and /ŋ/ only as a coda. The key results concerned the consonants subject to the experiment-wide constraints: Errors involving the constrained consonants (/f/ and /s/) were overwhelmingly likely to be “legal” errors, which means that, when these consonants slipped to a new syllable, they preserved their original status as an onset or a coda (98% and 95% legal in Experiments 1 and 2, respectively). These percentages were much larger than their respective unrestricted baselines (68% and 77% legal). This suggests that participants implicitly picked up the experiment-wide constraint from reciting the syllables and that this learning caused the experimentally restricted consonants to preserve their syllabic position to a greater extent than the unrestricted consonants.
The new consonant-position constraints were learned very rapidly. Even though participants had four sessions of training, Dell et al. (2000) found evidence of learning in the first session. The time course of learning consonant-position constraints was probed in more detail by Taylor and Houghton (2005). They revised the procedure of Dell et al. by introducing a reversal of the consonant position constraint in the middle of the experiment (e.g., /f/ always an onset, /s/ always a coda switched to /s/ always an onset, /f/ always a coda). Participants’ speech errors reflected this new constraint within 9 trials, further supporting the rapid learning of constraints that depend only on consonant position within a syllable. Goldrick (2004) further modified the materials so that constraints defined at the level of the segment (/f/ is always an onset) and constraints defined at the level of phonetic features (the feature “bilabial” can occur in the coda) were simultaneously present, and found speech errors to be sensitive to both kinds of constraints. Other studies using this paradigm have demonstrated the learning of probabilistic (Goldrick & Larson, 2008) and context-dependent (Warker & Dell, 2006; Warker, Dell, Whalen, & Gereg, 2008) constraints on consonant position.
Another indispensable source of phonotactic information comes from perception experience. Listening to others speak is the first step in learning a language. Information about the organization of the sound patterns in a language is certainly available to perceivers, but is listening experience alone enough for learning phonotactic constraints? Onishi, Chambers, and Fisher (2002) found that it was. They instructed adult participants first simply to listen to a set of CVC syllables, in which one group of consonants was restricted to onset position (e.g., /b/, /k/, /m/, /t/) and another group of consonants was restricted to coda position (e.g., /p/, /g/, /n/, /č/). During a subsequent test phase, the participants shadowed (repeated immediately upon hearing) both studied and unstudied syllables. Each unstudied syllable was legal or illegal, depending on the consonant-position constraints present in the familiarization materials. Shadowing latencies were shorter for legal than for illegal syllables, suggesting that the participants learned the new phonotactic constraints and applied them to new syllables in the test. Since the participants only experienced the constraints in the familiarization materials, they must have learned the experimental constraints from listening experience.
Infants can also learn new phonotactic constraints from brief perception experience. Chambers, Onishi and Fisher (2003) examined 16.5-month-old infants’ ability to learn consonant-position constraints using the head-turn preference procedure. Infants heard sets of CVC syllables containing artificially imposed consonant-position constraints, and were then tested on legal and illegal unstudied items. Infants preferred to listen to illegal items relative to legal items, showing that they discriminated the two types of items and were more interested in those that displayed a different pattern from the syllables in the study phase. Further studies showed that 9- and 10-month-old infants succeeded in similar tasks (Chambers, 2004; Saffran & Thiessen, 2003; Seidl & Buckley, 2005).
To sum up, previous research has shown that both infants and adults can learn new phonotactic-like constraints. This can occur in production, where repeated recitation of constrained syllables affects speech errors, and in perception, where listening to constrained syllables affects performance in subsequent perceptual tests.
The current study investigated whether perception and production experience interact with one another during phonotactic learning, a question inspired by the previous studies. The relationship between perception and production has always been controversial. A common intuition is that perception and production processes must access the same linguistic representation systems (phonology, semantics, etc.), since we speak the same language that we hear and understand. However, other phenomena suggest that the relationship between perception and production is not so straightforward. Perception and production development are not synchronized in early language acquisition, with children typically exhibiting perception abilities that may not be expressed in their production (e.g., E. Clark & Hecht, 1983; Menn, 1983). This unbalanced relationship between perception and production persists into adulthood. Adults may experience and understand more than they can say themselves, either in acquiring a second language, or in gaining familiarity with accents or dialects that they do not produce (e.g., Bradlow & Bent, 2008; E. Clark & Hecht, 1983; H. Clark & Malt, 1984). For example, evidence from sociolinguistic mergers shows that even in their own dialect, adults may be able to perceive a distinction between sound categories but then be unable to produce that difference in their own speech (Labov, 1994; Thomas & Hay, 2006). These observations suggest that the language perception and production systems are separate and autonomous to some degree.
The view that language perception and production are separate systems, at least at some levels (e.g., sublexical levels), is not only supported by observational studies, but also by experimental evidence. For example, Shallice, McLeod and Lewis (1985) found that, compared to single task performance, cross-modal dual tasks (e.g., reading aloud visually presented words while detecting auditorily presented proper names) produced less interference than within-modality dual tasks (e.g., auditory name detection + shadowing), suggesting that the input and output tasks could access separate pathways.
More evidence for separate pathways comes from the aphasia literature. Patterns of clinical symptoms suggest that speech input and output systems can be selectively impaired. For example, patients may do well in perception tasks such as auditory lexical decision, but poorly in single or multiple word production (e.g., Shallice, Rumiati, & Zadini, 2000). Some studies have found no reliable correlation between phonological error rates in production tasks (e.g., written-word naming) and in perception tasks (e.g., synonymy judgments) (e.g., Nickels & Howard, 1995), suggesting that one system can stay intact while the other breaks down. Dell, Schwartz, Martin, Saffran and Gagnon (1997) were not successful in simulating aphasic patients’ performance on picture-naming and word-repetition tasks using a shared input-output phonology in their interactive activation model. When they, instead, assumed separate input and output representations (e.g., the /k/-onset unit for input was different from the one for output), they could then simulate most of the error patterns in naming and repetition tasks (Foygel & Dell, 2000; Dell, Martin, & Schwartz, 2007). This provided further support for a separation between input and output phonological representations.
On the other hand, a number of studies with normal subjects have reported that the perception and production systems influence each other. Monsell (1987) found that participants’ auditory lexical decision was primed by a previous encounter with the target word involving hearing, reading or silently mouthing it in a given sentence. Within-modality priming (i.e., hearing the word) produced the greatest facilitation. In comparison, cross-modality priming (e.g., saying or silently mouthing the word) produced a slightly smaller but still reliable facilitation. Monsell interpreted the results as evidence for separate input and output phonology (because priming was reduced across modalities), with interactions at the sub-lexical level (because there was significant priming across modalities).
Perception experience has also been shown to influence production. Cooper (1979) reported that repeated auditory presentation of one syllable (e.g., /pʰi/) reduced the voice onset times (VOTs) of participants’ production of the same syllable or syllables sharing one or more features (e.g., the VOTs of /pʰi/ and /tʰi/ were reduced), implying that production might selectively adapt to recent perceptual experience. However, this effect was restricted to unvoiced plosive consonants, which Cooper accounted for by positing different susceptibility of consonant classes to selective adaptation.
The literature on spontaneous imitation also indicates that perception may subtly influence production. Goldinger (1998) found that participants spontaneously imitated the words or non-words they heard in immediate single-word shadowing, with more discernable imitation for low frequency items. Goldinger and Azuma (2004) found the same imitation effect in participants’ production style after auditory training over 2 weeks, showing that the implicit imitation was a long-lasting effect. Recent evidence shows that the same kind of spontaneous imitation occurs in conversational interaction (Pardo, 2006). Possibly via the same mechanism, production of a second language can improve based on perceptual training (e.g., Bradlow, Pisoni, Yamada & Tohkura, 1997; Sancier & Fowler, 1997).
Some findings from studies of aphasia also imply strong connections between input and output phonology. Martin and Saffran (2002) reported significant negative correlations of input phonological measures (composite phonological scores from performance on phoneme discrimination and rhyme judgments) with output phonological error rate (phonologically-related nonword errors in a naming task), but not with output lexical-semantic measures (semantically-related errors in a naming task). From this, they concluded that even if there were separate input and output phonological processing systems, they must be functionally related.
Overall, previous studies have provided inconsistent evidence regarding the degree of independence between input and output phonological processing. The present study probed this question by looking at the transfer of phonotactic learning from the perception domain to the production domain. To what extent does a newly learned constraint acquired through perception express itself in production?
Three experiments were carried out to examine whether phonotactic learning via perception transfers to production as measured by participants’ speech errors. As in the experiments of Dell et al. (2000), each trial consisted of a sequence of four CVC syllables. However, in the present experiments, a trial could either be produced (production trial) or perceived (listening trial). Participants received equal numbers of perception and production trials. The crucial manipulation was that half of the participants received the same experiment-wide constraint in the production and listening trials such that the production and perceptual experience reinforced one another (Same-Constraint condition, e.g., perception: /f/ is always an onset; production: /f/ is always an onset), while the other half received different experiment-wide constraints such that the production and perception experience contradicted one another (Opposite-Constraint condition, e.g., perception: /f/ is always an onset; production: /f/ is always a coda). If a constraint learned via perception is directly available for use in production, errors made by participants who received opposite constraints should not reflect the experimental constraint embedded in their production trials. Rather, their speech error patterns should be constraint neutral, meaning that restricted and unrestricted consonants should have approximately equal legality percentages.
To expose participants to both production and listening trials, we tested participants in pairs. Two participants took turns repeating sequences of four CVC syllables (e.g., “fes keng heg men”) and listening to their partner’s productions. As in Dell et al. (2000), the materials for production included two consonants subject to language-wide constraints (/h/ always an onset and /ŋ/ always a coda), two consonants subject to experiment-wide constraints (/f/ always an onset and /s/ always a coda, or vice versa), and four unrestricted consonants (/k/, /g/, /m/, /n/ ) which appeared freely in onset and coda position. Speech error patterns were examined to see if the participants learned the experiment-wide phonotactic constraint. During the listening trials, a simple task was used to encourage participants to pay attention to what they heard. In Experiments 1 and 2, on each trial participants answered the question: “How many times did you hear ‘heng’?” by circling a number indicating the answer. In Experiment 3, participants monitored their partners’ repetitions of the sequences and circled incorrectly produced syllables.
In the Same-Constraint group, we expected speech errors to reflect learning of the experiment-wide constraint, replicating the findings of Dell et al. (2000) in this revised task. Specifically, errors involving the constrained consonants (/f/ and /s/) should include a higher percentage of legal errors than would errors involving the unrestricted consonants (/k/, /g/, /m/, /n/), thus demonstrating learning of the newly-experienced constraint. However, for the Opposite-Constraint group, there are three possibilities: (1) If phonotactic learning in perception and production are tightly integrated, then the opposite constraints should cancel one another out. For example, /f/ would be biased toward neither onset nor coda if it was experienced as an onset in the perception trials and a coda in the production trials. As a result, speech errors involving restricted consonants would show no greater tendency to be legal than would errors involving unrestricted consonants. (2) If learning in production is separate from perceptual experience, then the opposing constraint in the perception trials should not reduce learning of the constraint experienced in the production trials. Thus, speech errors involving restricted consonants should reflect the constraint in the production materials to the same extent that errors in the Same-Constraint group do. (3) If there is partial transfer of phonotactic learning from perception to production, then speech error patterns should show some adherence to the production constraint, but less than that found in the Same-Constraint group.
Sixteen undergraduate or graduate students at the University of Illinois at Urbana-Champaign participated in Experiment 1. For scheduling convenience, participants who directly contacted the experimenter in response to a posted ad were asked to bring a friend to the experiment. All participants received a small payment in exchange for their participation. None of the participants reported any hearing problems and all were native English speakers.
The participant pairs were randomly and equally assigned to either the Same-Constraint or Opposite-Constraint condition. In the Same-Constraint condition, the experiment-wide constraint in the materials for a pair of participants was the same. For half of the participant pairs in this condition, /f/ was an onset and /s/ was a coda in both production and listening trials, and for the other half, /s/ was an onset and /f/ was a coda in both kinds of trials. In the Opposite-Constraint condition, the experiment-wide constraints in the materials for a pair of participants were the opposite, creating a potential conflict in learning from perception and production experience. Again, the constraints were counterbalanced so that half of the participant pairs experienced /f/ onsets and /s/ codas in production and the opposite pattern in listening while half experienced /s/ onsets and /f/ codas in production and the opposite in listening.
Each participant received a unique set of 96 sequences for their production trials, each consisting of four CVC syllables, printed on 9 sheets of paper. The recitation of these sequences was interleaved with 96 listening trials, during which the participant listened to the partner producing the partner’s own sequences and answered a question about what was heard. The syllables for the listening trials were not printed on the participants’ sheets of paper. However, each participant did see a printed question on each line on the paper corresponding to the partner’s sequence. This question was always “How many times did you hear ‘heng’?” It was followed by 0, 1, and 2 as potential answers (“heng” was chosen because this syllable does not contain experimentally-restricted or unrestricted consonants). The example below presents four trials (two production trials and two listening trials), as they would appear to a participant who started by repeating a sequence.
| How many times did you hear “heng”? | 0 | 1 | 2 |
| How many times did you hear “heng”? | 0 | 1 | 2 |
Each sequence to be produced contained eight consonants (/h/, /ŋ/, /f/, /s/, /k/, /g/, /m/ and /n/), and each consonant appeared exactly once in each sequence. Among the eight consonants, /h/ and /ŋ/ were the language-wide restricted consonants, /f/ and /s/ were experiment-wide restricted consonants, and the other four consonants (/k/, /g/, /m/ and /n/) were unrestricted. These consonants were combined with the vowel /ε/ to form CVC syllables (e.g., /kεf/).
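The sequence structure described above can be illustrated with a minimal sketch. It is not the program used in the study, whose details are not reported; the function name `make_sequence` and its parameters are hypothetical, and the spellings “e” for /ε/ and “ng” for /ŋ/ follow the printed materials.

```python
import random

# The four consonants that may appear in either syllable position.
UNRESTRICTED = ["k", "g", "m", "n"]


def make_sequence(onset_restricted="f", coda_restricted="s", rng=random):
    """Return four CVC syllables that use each of the eight consonants once.

    Hypothetical helper: /h/ is always an onset and "ng" always a coda
    (language-wide constraints); one experiment-wide restricted consonant
    is always an onset and the other always a coda.
    """
    pool = UNRESTRICTED[:]
    rng.shuffle(pool)
    # Two unrestricted consonants fill the remaining onset slots,
    # the other two fill the remaining coda slots.
    onsets = ["h", onset_restricted] + pool[:2]
    codas = ["ng", coda_restricted] + pool[2:]
    rng.shuffle(onsets)
    rng.shuffle(codas)
    return [onset + "e" + coda for onset, coda in zip(onsets, codas)]


# Example: one sequence for a participant with /f/ onsets and /s/ codas.
print(" ".join(make_sequence("f", "s", random.Random(1))))
```

Swapping the two arguments (`make_sequence("s", "f")`) would give the reverse assignment used for the other half of the participants.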
A computer program randomly produced 16 unique sets of such materials. The sequences for the production trials were printed in 16-point Arial font with one sequence per line and 11 sequences per page. All the syllables were printed using ordinary English spelling. The vowel /ε/ was spelled as “e”, and /ŋ/ was spelled as “ng” (e.g., /hεŋ/ was spelled as “heng”). All the other consonants were spelled the same way as their phonetic symbols (“h” for /h/, “f” for /f/, etc.). The questions for the listening trials were also printed in 16-point Arial font with one question per line and 11 questions per page, interleaved with the 11 sequences. In total, there were 22 trials on each page and 192 trials in the experiment.
Two participants cooperated in the experiment. They took turns playing the role of speaker and listener. In each trial, one of the participants, the speaker, repeated a sequence printed on his or her paper (e.g., “hem geng nek fes”) twice in time to a metronome. While the speaker was producing the sequence, the other participant, the listener, was asked to listen and to indicate how many times the speaker produced the syllable “heng”. The listener responded by circling the appropriate number after the question for this trial. “0” would mean that there was not a “heng” in the sequence, “1” would mean that there was a “heng” in the sequence but the speaker only produced it correctly once, and “2” would mean that there was a “heng” in the sequence and the speaker produced it correctly both times. After this, the two participants continued to the next trial and switched roles. Participants’ responses in the experiment were recorded on CDs for later analysis of speech error patterns.
The sequences and the questions were visually presented to participants one at a time. Participants were instructed to use a piece of paper with a cut-out window to guide their progress and focus only on the present trial. They were also asked not to peek at their partner’s materials. The metronome was set to 2.53 beats/second in order to induce speech errors.
Before the experiment began, participants were presented with five sample sequences and asked to recite the sequences one time slowly in time to the metronome, set to 1 beat/second, in order to familiarize themselves with the pronunciation of the syllables and with the procedure. These sample sequences were not used in the real experiment and were not recorded.
Each participant’s productions were transcribed by native English speakers for speech errors. As in Dell et al. (2000), the speech errors were categorized as either legal or illegal according to whether or not the error maintained its syllable position in the sequence. In this experiment, the legality of errors was determined by the participant’s own repetition sequences. The legality of errors on the unrestricted consonants (/k/, /g/, /m/ and /n/) was based on their position within that specific sequence. To illustrate, in a sequence that a participant produced, if a consonant moved to another syllable but maintained its position within the syllable (i.e., onset or coda), it was coded as a legal error (e.g., “kes fem” → “fes kem” contained one legal /f/ error and one legal /k/ error). On the other hand, if a consonant moved to another syllable and changed its position within the syllable (i.e., from onset position to coda position or vice versa), it was coded as an illegal error (e.g., “kes fem” → “mes fem” contained one illegal /m/ error). Cutoff errors, such as “f…heng”, were included in the analysis: an /f/ error in the onset position would be coded for this instance. Errors involving consonants or vowels that did not appear in the materials were rare and were excluded from analysis, as were errors that were unintelligible.
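The legal/illegal coding rule can be sketched as follows. This is only an illustration of the rule as stated, not the transcribers’ actual procedure; the helper names `positions` and `code_errors` are hypothetical, and the sketch assumes syllables spelled as in the materials (single-letter onset, the vowel “e”, then the coda).

```python
def positions(target):
    """Map each consonant in a target sequence to 'onset' or 'coda'."""
    pos = {}
    for syllable in target.split():
        pos[syllable[0]] = "onset"   # first letter is the onset
        pos[syllable[2:]] = "coda"   # everything after the vowel is the coda
    return pos


def code_errors(target, produced):
    """Compare target and produced sequences slot by slot.

    Returns (intruding consonant, 'legal' or 'illegal') for every slot
    where the produced consonant differs from the target. Intruders that
    do not occur in the target sequence are skipped.
    """
    pos = positions(target)
    coded = []
    for t_syl, p_syl in zip(target.split(), produced.split()):
        slots = (("onset", t_syl[0], p_syl[0]), ("coda", t_syl[2:], p_syl[2:]))
        for slot, t_cons, p_cons in slots:
            if t_cons != p_cons and p_cons in pos:
                legality = "legal" if pos[p_cons] == slot else "illegal"
                coded.append((p_cons, legality))
    return coded


print(code_errors("kes fem", "fes kem"))  # [('f', 'legal'), ('k', 'legal')]
print(code_errors("kes fem", "mes fem"))  # [('m', 'illegal')]
```

The two calls reproduce the examples in the text: /f/ and /k/ keep their onset position, whereas /m/ moves from coda to onset.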
Three transcribers independently coded participants’ responses. We used the combined coding results for all the analyses: an error was counted only when at least two transcribers agreed on the existence and nature of the error. For example, if the target sequence was “keg neng fes hem”, only when two or more transcribers transcribed the first syllable as “keng” did we accept it as a true error. In the combined coding for Experiment 1, there were 894 errors out of the 24576 (16 participants*96 sequences*8 consonants*2 repetitions) possibilities for consonant errors, resulting in an overall error rate of 3.64%.
The primary transcriber was more experienced than the second and third transcribers; therefore reliabilities were calculated conditioned on the primary transcriber’s coding. Overall, agreement between transcribers was good. Between the primary and the second transcriber, the overall agreement rate, which was agreement on correct repetitions plus agreement on the presence and nature of the errors, was 99.42%. Of the 24576 possibilities for consonant errors, the primary and the second transcriber agreed on 23717 non-errors and 717 errors. The agreement rate on transcribed errors conditioned on the primary transcriber was 83.47%. Between the primary and the third transcriber, the overall agreement rate was 99.43%. They agreed on 23698 non-errors, and 737 errors. The agreement rate on transcribed errors between the primary and the third transcriber, conditioned on the primary transcriber, was 83.94%.
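The overall error rate and the overall agreement rates follow directly from the reported counts. The short check below uses only figures given in the text; the variable names are illustrative.

```python
# Opportunities for a consonant error in Experiment 1:
# 16 participants * 96 sequences * 8 consonants * 2 repetitions.
possibilities = 16 * 96 * 8 * 2

# Overall error rate in the combined coding (894 errors).
error_rate = 894 / possibilities * 100

# Overall agreement = agreed non-errors plus agreed errors, over all possibilities.
agreement_second = (23717 + 717) / possibilities * 100  # primary vs. second
agreement_third = (23698 + 737) / possibilities * 100   # primary vs. third

print(possibilities)               # 24576
print(f"{error_rate:.2f}")         # 3.64
print(f"{agreement_second:.2f}")   # 99.42
print(f"{agreement_third:.2f}")    # 99.43
```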
The key results of Experiment 1 are shown in Figure 1. The Same-Constraint group and the Opposite-Constraint group had very similar percentages of legal errors, among all transcribed errors, for all three categories of consonants.
In the Same-Constraint condition, as anticipated, errors involving the language-wide constrained consonants /h/ and /ŋ/ were legal errors 100% of the time (SE = 0, based on 108 total errors). Errors involving the experiment-wide restricted consonants /f/ and /s/ were legal errors 96.54% of the time (SE = 2.55, based on 77 total errors). Errors involving restricted consonants were significantly more likely to be legal errors than were errors involving the unrestricted consonants /k/, /g/, /m/ and /n/ (M = 65.38%, SE = 5.45, based on 206 total errors; Wilcoxon Z = 2.366, p = .009). All reported p-values are relative to the null hypothesis that the legality of restricted-consonant errors is not greater than that of unrestricted-consonant errors. Seven out of 8 participants in the Same-Constraint group had a higher percentage of legal errors involving restricted than unrestricted consonants, and one participant had equal percentages of legal errors in the two categories.
Similar results emerged in the Opposite-Constraint condition. Again, all errors involving the language-wide constrained consonants /h/ and /ŋ/ were legal (SE = 0, based on 108 total errors). Errors involving experiment-wide restricted consonants were legal 96.88% of the time (SE = 2.05, based on 48 total errors); errors involving the unrestricted consonants were significantly less likely to be legal (M = 70.45%, SE = 4.63, based on 153 total errors; Wilcoxon Z = 2.380, p = .009; 7 of 8 participants showed a difference in the predicted direction).
These results show that participants in both conditions learned the constraints embedded in their own repetition sequences. Since we were most interested in whether or not participants in the two conditions performed differently on the experiment-wide restricted consonants, we carried out a nonparametric Mann-Whitney test comparing the Same- and Opposite-Constraint conditions on the difference of percentages of legal errors involving experiment-wide restricted and percentages of legal errors involving unrestricted consonants. The effect of condition on these differences (equivalent to the interaction of Condition and Restrictedness) was not significant (U = −1.050, ns). Thus, the inferential statistics support the conclusion that restricted consonant errors exhibited greater legality than unrestricted-consonant errors and that this difference was similar in the Same- and Opposite-Constraint conditions.
The results of Experiment 1 replicated the findings of Dell et al. (2000). First, participants’ speech errors always respected the language-wide constraint on the positions of /h/ and /ŋ/ within a syllable. Second, and most importantly, in both conditions, speech errors involving the experiment-wide restricted consonants adhered to their syllabic positions to a greater extent than errors involving unrestricted consonants, suggesting that participants implicitly learned the distributional patterns of the experiment-wide restricted consonants, and that knowledge affected their repetitions of the sequences.
We found no evidence for the transfer of phonotactic learning from the perception domain to the production domain under the conditions examined. There was no decrement in the percentage of legal errors involving the experiment-wide restricted consonants due to experience with an opposing constraint in listening trials. In the Opposite-Constraint condition, equal numbers of exemplars supported opposing constraints on the position of /f/ and /s/. If both constraints entered the same phonotactic system regardless of which modality they came from, the opposing constraints would cancel each other out and participants should exhibit no learning of the experiment-wide constraint in their productions. However, this was not what we found. We found exactly the same sensitivity to the experiment-wide constraint in the two conditions. Participants in the Opposite-Constraint condition learned their own constraint in their production trials; this suggests that either they did not encode the experiment-wide constraint from their listening experience, or they did not integrate this constraint into the same knowledge system that encoded the production constraints, and therefore the constraint in the listening trials did not influence production.
One difficulty in interpreting the results of Experiment 1 was that we did not have an effective tool to assess learning from perception experience. During listening trials, we did ask participants to indicate how many times they heard “heng” in their partners’ repetitions. The average percentage of correct answers in this perception task was 82.81% (ranging from 65% to 91%), which suggests at least some attention to their partners’ productions. However, monitoring for “heng” might not require attentive processing of other syllables in the listening trials. In other words, the other syllables in the listening trials might be treated as background while “heng” was highlighted. To find out whether this was true, we ran a second experiment that replicated the first and added a recognition memory task at the end, to see whether participants paid enough attention to the syllables in the listening trials to recognize them after the experiment. Crucially, this memory test assessed whether the syllables exhibiting the experiment-wide constraints in both production and listening trials were attended to and encoded.
A new set of sixteen students at the University of Illinois at Urbana-Champaign participated in Experiment 2. As in Experiment 1, participants who directly contacted the experimenter in response to a posted ad were asked to bring a friend to the study. All participants received a small payment in exchange for their participation. None of the participants reported any hearing problems and all were native English speakers.
The materials were the same as in Experiment 1, except that an auditory old-new recognition test was given after participants had finished the repetition and listening trials. In the memory test, all possible combinations of the eight consonants (/h/, /ŋ/, /f/, /s/, /m/, /n/, /k/ and /g/) with the vowel /ε/ were used, including 6 non-occurring syllables which had the same consonant in onset and coda position (e.g., /kεk/, /fεf/). Only syllables that violated the language-wide phonotactic constraints were left out (e.g., /hεh/, /ŋεk/). The total number of test syllables was 49, and each pair of participants heard all 49 syllables in a different randomized order. The test syllables can be categorized into three sets as shown in Table 1. One set included the Non-occurring items (N = 6) that participants in both conditions never experienced in the experiment; thus they should say “no” to these items. The second set included items (N = 21) containing the language-wide restricted and/or unrestricted consonants that both groups experienced in the experiment; thus they should say “yes” to these items (e.g., /hεk/, /mεŋ/). We call these Neutral items. The third set included items (N = 22) that contained the experiment-wide constrained consonants (/f/ and /s/). Half of them were legal and the other half were illegal, according to the constraint present in each participant’s production trials. This was the crucial set of memory-test items, because participants in the Same-Constraint condition only experienced the legal half of these items, whereas those in the Opposite-Constraint condition experienced both the legal half of the items from their own production trials and the illegal half of the items from listening to their partner.
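The item counts given for the memory test follow from the consonant inventory and can be checked by enumeration. The sketch below is illustrative only: it uses ASCII stand-ins ("ng" for /ŋ/, with the vowel left implicit), the function name `classify` is hypothetical, and it assumes an /f/-onset, /s/-coda production constraint, which is just one of the counterbalanced assignments.

```python
from itertools import product

# ASCII stand-ins for the eight consonants; "ng" represents the velar nasal.
CONSONANTS = ["h", "ng", "f", "s", "m", "n", "k", "g"]

# Language-wide constraints of English: /h/ is onset-only, "ng" is coda-only.
onsets = [c for c in CONSONANTS if c != "ng"]
codas = [c for c in CONSONANTS if c != "h"]

# Every onset-coda pairing (around the single vowel) that respects
# the language-wide constraints.
items = list(product(onsets, codas))


def classify(onset, coda, legal_onset="f", legal_coda="s"):
    """Sort one test syllable into the sets described in the text.

    legal_onset and legal_coda encode an ASSUMED production constraint
    (/f/-onset, /s/-coda); the other assignment swaps the two.
    """
    if onset == coda:
        return "non-occurring"
    if not {onset, coda} & {"f", "s"}:
        return "neutral"
    # A restricted consonant in the wrong position makes the item illegal.
    if onset == legal_coda or coda == legal_onset:
        return "illegal"
    return "legal"


counts = {}
for onset, coda in items:
    label = classify(onset, coda)
    counts[label] = counts.get(label, 0) + 1

print(len(items), counts)
```

Under these assumptions the enumeration gives 49 syllables in total, split into 6 non-occurring, 21 neutral, and 22 critical items (11 legal and 11 illegal), in agreement with the counts reported above.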
If participants in the Opposite-Constraint condition did not pay attention to all of the syllables during the listening trials, they would not recognize syllables that violated their own production constraint as having occurred during the experiment. Since this was already a hard task, we did not ask participants to specify whether the syllables came from listening or production trials. Instead, we used the results as an index of how well participants processed the syllables in both trial types.
The procedure for the first part of the experiment was identical to Experiment 1. Each pair of participants took turns repeating their own sequences or monitoring their partner’s repetitions for “heng”. At the end of the task, each participant was given a sheet of paper with the numbers from 1 to 49 on it. They were asked to listen carefully as the experimenter read a list of syllables one by one, and to indicate whether they had encountered (either heard or said) each syllable during the experiment by writing “yes” or “no”. Participants were instructed to work on their own sheets without consulting their partners’ answers. They were encouraged to ask the experimenter to repeat the syllable if they did not hear it clearly.
Three native English speakers independently transcribed each participant’s repetitions. As in Experiment 1, an error was counted only when at least two transcribers agreed on the existence and nature of the error. In the combined coding, there were 828 errors out of the 24576 possibilities for consonant errors (16 participants*96 sequences*8 consonants*2 repetitions), resulting in an overall error rate of 3.37%.
Again, reliabilities were calculated conditioned on the primary transcriber’s coding. Between the primary and the second transcriber, the overall agreement rate was 99.55%. Of the 24576 possibilities for consonant errors, they agreed on 23816 non-errors and 649 errors. The agreement rate on transcribed errors conditioned on the primary transcriber was 85.39%. Between the primary and the third transcriber, the overall agreement rate was 99.35%, with agreement on 23718 non-errors and 698 errors. The agreement rate on transcribed errors conditioned on the primary transcriber was 81.35%. Overall, as in Experiment 1, the reliabilities were very good.
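The overall error rate and the overall agreement rates for Experiment 2 can be recomputed from the counts reported in the text. The following is a minimal sketch; the helper names `error_opportunities` and `percent` are hypothetical and not part of the original analysis.

```python
def error_opportunities(participants, sequences=96, consonants=8, repetitions=2):
    """Number of chances for a consonant error, as defined in the text."""
    return participants * sequences * consonants * repetitions


def percent(part, whole):
    """Percentage rounded to two decimals, matching the reporting style."""
    return round(100 * part / whole, 2)


opportunities = error_opportunities(16)

# Overall error rate in the combined coding (828 errors).
overall_error_rate = percent(828, opportunities)

# Overall agreement = agreed non-errors plus agreed errors, over all opportunities.
agreement_second = percent(23816 + 649, opportunities)
agreement_third = percent(23718 + 698, opportunities)

print(opportunities, overall_error_rate, agreement_second, agreement_third)
```

The agreement rates conditioned on the primary transcriber cannot be recomputed in the same way, because the number of errors coded by the primary transcriber alone is not reported.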
As in Experiment 1, the speech error patterns in the Same-Constraint and Opposite-Constraint conditions were highly similar for all consonant categories (see Figure 2). In the Same-Constraint condition, all errors involving the language-wide constrained consonants (/h/ and /ŋ/) were legal errors (SE = 0, based on 112 total errors). Errors involving the experiment-wide restricted consonants (/f/ and /s/) were legal 91.44% of the time (SE = 4.13, based on 73 total errors); errors involving unrestricted consonants (/k/, /g/, /m/ and /n/) were significantly less likely to be legal errors (M = 62.64%, SE = 5.02, based on 222 total errors; Wilcoxon Z = 2.521, p = .006, 8 of 8 participants in the predicted direction).
Similar patterns were found in the Opposite-Constraint condition. All errors involving the language-wide constrained consonants were legal errors (SE = 0, based on 101 total errors). Errors involving experiment-wide restricted consonants were legal 91.22% of the time (SE = 4.44, based on 60 total errors); errors involving unrestricted consonants were significantly less likely to be legal errors (M = 66.93%, SE = 3.74, based on 260 total errors; Wilcoxon Z = 2.366, p = .009, 7 of 8 participants in the predicted direction with 1 tie).
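When every participant with a non-zero difference falls in the predicted direction, the Wilcoxon signed-rank statistic takes its maximum value and the Z score depends only on the number of such participants. The sketch below illustrates this under stated assumptions: it uses the large-sample normal approximation without a continuity or tie correction, drops the tied participant, and reports one-tailed p values. The source does not say which software or corrections were used, so this is a plausibility check rather than a reconstruction of the original analysis; all function names are hypothetical.

```python
import math


def signed_rank_z(n_nonzero, w_plus):
    """Z for the Wilcoxon signed-rank sum under the normal approximation,
    without continuity or tie corrections (an assumption)."""
    mean = n_nonzero * (n_nonzero + 1) / 4
    sd = math.sqrt(n_nonzero * (n_nonzero + 1) * (2 * n_nonzero + 1) / 24)
    return (w_plus - mean) / sd


def one_tailed_p(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def all_positive_rank_sum(n_nonzero):
    """Rank sum when every non-zero difference is in the predicted direction."""
    return n_nonzero * (n_nonzero + 1) // 2


# Same-Constraint: 8 of 8 participants in the predicted direction.
z_same = signed_rank_z(8, all_positive_rank_sum(8))
# Opposite-Constraint: 7 in the predicted direction; the tie is dropped.
z_opposite = signed_rank_z(7, all_positive_rank_sum(7))

print(round(z_same, 3), round(one_tailed_p(z_same), 3))
print(round(z_opposite, 3), round(one_tailed_p(z_opposite), 3))
```

Under these assumptions the sketch yields Z = 2.521 (p = .006) for eight participants and Z = 2.366 (p = .009) for seven, which agrees with the reported values.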
A Mann-Whitney test of the difference between the legalities of restricted and unrestricted errors as a function of Condition (Same-Constraint versus Opposite-Constraint) and Restrictedness (experiment-wide restricted versus unrestricted) did not yield significance (U = −0.473, ns). Thus, as in Experiment 1, restricted-consonant errors exhibited greater legality than unrestricted-consonant errors, and this difference was similar in the Same- and Opposite-Constraint conditions.
The results of the memory test suggested that participants in both conditions recognized most items from the experiment, including items that they only heard in their partners’ sequences.
First, participants in the Opposite- and Same-Constraint conditions performed similarly in rejecting the non-occurring items (e.g., “kek”) and accepting the neutral items (e.g., “gek”). Correct responses to these two types of items were on average 74.07% for the Same-Constraint condition and 77.78% for the Opposite-Constraint condition.
The critical comparison was between the percentages of “yes” answers for syllables containing experiment-wide restricted consonants (/f/ and /s/). As shown in Table 2, participants in the Same-Constraint condition were much more likely to accept syllables that were legal (on average 72.73%) rather than illegal (4.55%), according to the constraint in their own repetition sequences; while those in the Opposite-Constraint condition accepted more similar proportions of legal and illegal syllables (on average 67.05% and 57.95%, respectively).
The difference in the responses to legal and illegal items for the Same- and Opposite-Constraint conditions was confirmed by nonparametric tests on the acceptance rates. There was a significant interaction of legality and condition (Mann-Whitney test U = 3.055, p = .001). The Same-Constraint group accepted more legal than illegal items (Wilcoxon Z = 2.527, p = .006), while the null hypothesis that legal and illegal items are equally accepted could not be rejected for the Opposite-Constraint group (Wilcoxon Z = .850, ns). The results from the memory test clearly showed that participants in the Opposite-Constraint condition encoded information about the syllables in the listening trials. Nonetheless, this encoding did not seem to affect their speech error patterns.
In Experiment 2, we closely replicated the speech error patterns found in Experiment 1. Participants’ speech errors involving experiment-wide restricted consonants adhered to the constraint contained in their production materials, and the extent of this adherence was unaffected by whether the participants experienced the same or the opposite constraint in the listening trials.
We used two tools to measure how well the participants processed the materials in listening trials. The first index was how correctly the participants answered the question “how many times did you hear ‘heng’?” In Experiment 2, the average percentage of correct answers was 85.29% (ranging from 65% to 98%). As in Experiment 1, participants did a reasonable job on this perception task. The second was a new feature of Experiment 2. Participants received a recognition memory test at the end of the experiment in which they were asked to discriminate old from new syllables. Whereas participants in the Same-Constraint condition overwhelmingly endorsed items that were legal, and rejected items that were illegal, with respect to the constraint in their materials, participants in the Opposite-Constraint condition were nearly equally likely to say “yes” to legal and illegal items, demonstrating that they recognized syllables from the listening trials. Given that these syllables were effectively processed and stored in memory, we can then assume that they would trigger the implicit learning of their phonological patterns as in previous experiments showing phonotactic learning from listening (Onishi et al., 2002). If learning in perception and production were fully integrated, this learning should influence the production error patterns in the present experiment. This is not what we found. Instead, in the Opposite-Constraint condition, there was no evidence of a decrement in the percentage of legal errors involving the experiment-wide restricted consonants due to experience with the opposing constraint in listening trials.
Why was there no transfer from perception to production? Perhaps the lack of transfer in Experiments 1 and 2 was due to the way the experimental constraints were manipulated in the Opposite-Constraint condition. The constraints were designed so that they were in opposition to one another; that is, what participants said (e.g., /f/ is an onset and never a coda) directly contradicted what they heard (/f/ is a coda and never an onset). When faced with this kind of conflict, perhaps one modality dominates. In speaking, production experience may take precedence over perception experience. If so, then a perceived constraint that directly opposes one learned through production may never penetrate the production system. Could hearing evidence for a consonant-position constraint affect production if we do not pit perception experience directly against production experience in this way?
To address this question, we created a version of Experiment 2 in which the constraints established in the listening and production trials were orthogonal rather than contradictory. For example, a participant might listen to syllables exhibiting an imbalanced distribution of /k/’s (e.g., /k/ is always an onset), but say syllables in which the position of /k/ is unrestricted, while other consonants are restricted (e.g., /f/ is always an onset). In this manner we could test whether phonotactic constraints in perception experience can affect production, if they are unopposed by production experience.
Eight new pairs of participants were tested in a procedure like that of Experiment 2. The experimental constraints embedded in the sequences, however, were manipulated to be orthogonal rather than in opposition to one another, eliminating the need for Same and Opposite-Constraint conditions. For one member of each participant pair, the experiment-restricted consonants were /f/ and /s/ and the unrestricted consonants were /k,g,m,n/. For their partners, the experiment-restricted consonants were /k/ and /g/ and the unrestricted consonants were /f,s,m,n/. Participant pairs were randomly assigned to the four conditions required for counterbalancing: (1) /f/-onset, /s/-coda; partner: /k/-onset, /g/-coda, (2) /f/-onset, /s/-coda; partner: /g/-onset, /k/-coda, (3) /s/-onset, /f/-coda; partner: /k/-onset, /g/-coda, (4) /s/-onset, /f/-coda; partner: /g/-onset, /k/-coda.
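The four counterbalancing conditions are the full crossing of the two possible assignments for one participant with the two possible assignments for the partner. A minimal sketch of that crossing follows; the variable names and the dictionary layout are illustrative assumptions, not the authors' materials.

```python
from itertools import product

# Each tuple is (onset-only consonant, coda-only consonant).
speaker_assignments = [("f", "s"), ("s", "f")]
partner_assignments = [("k", "g"), ("g", "k")]

# Full crossing of speaker and partner assignments.
conditions = [
    {
        "speaker": {"onset": s_onset, "coda": s_coda},
        "partner": {"onset": p_onset, "coda": p_coda},
    }
    for (s_onset, s_coda), (p_onset, p_coda) in product(
        speaker_assignments, partner_assignments
    )
]

for number, condition in enumerate(conditions, start=1):
    speaker, partner = condition["speaker"], condition["partner"]
    print(
        f"({number}) /{speaker['onset']}/-onset, /{speaker['coda']}/-coda; "
        f"partner: /{partner['onset']}/-onset, /{partner['coda']}/-coda"
    )
```

The crossing produces the conditions in the same order as they are listed in the text.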
Errors were first scored as legal or illegal according to the constraint present in the sequences that the participant produced. As in the previous experiments, errors involving the restricted consonants (either /f/-/s/ or /k/-/g/) were legal more often (M = 84.6%, SE = 4.76, based on 90 errors) than were errors involving unrestricted consonants, which were either /k, g, m, n/ or /f, s, m, n/ (M = 65.1%, SE = 5.25, based on 265 errors; Wilcoxon Z = 2.00, p = .023).
Errors involving the partner’s restricted consonants were then scored as partner-legal or partner-illegal, according to the constraint in the partner’s sequences. This notion of legality differs from the notion defined elsewhere. By design, the sequences produced on speaking trials contained only legal instances of restricted consonants, so a legal error is one in which the consonant moved from onset to onset or coda to coda position within the sequence. In contrast, the consonants that were restricted in the partner’s sequences appeared in both onset and coda positions in the speaker’s own sequences. If the partner’s constraint had an effect, the percentage of partner-legal errors should be significantly greater than the null hypothesis expectation of 50%. However, there was no evidence for transfer. Of the 147 errors involving misplacement of a consonant that was restricted for the partner, only 75 or 51.0% were partner-legal. Thus, the orthogonal constraint that participants heard their partner say had no discernible influence on their own speech errors.
As in Experiment 2, we also tested participants’ recognition memory for all experienced syllables.1 Participants were more likely to recognize items that followed their partner’s constraint (M = 66.4%, SE = 4.63) than items that did not (M = 57.1%, SE = 3.54; Wilcoxon Z = 1.84, p = .033) even though both kinds of syllables were present in their own productions. This implies that participants were paying attention to and encoding the syllables that their partner was saying; their responses in the final memory test reflected the greater frequency of the partner-legal sequences within the session. Taken together, these results indicate that the orthogonal constraint produced by their partners did not transfer to the participants’ own productions despite good recognition memory for items that their partners said.
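The comparison of the partner-legal error count against the 50% null expectation can be illustrated with an exact binomial tail probability. This is only a sketch: the text does not report which test, if any, was applied to the 75 of 147 count, and pooling errors across participants treats them as independent, which is an assumption. The function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """Exact probability of observing k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


partner_legal = 75   # errors that respected the partner's constraint
total_errors = 147   # all errors involving the partner's restricted consonants

observed_percent = round(100 * partner_legal / total_errors, 1)
p_upper = binomial_upper_tail(partner_legal, total_errors)

print(observed_percent, round(p_upper, 3))
```

A one-sided tail probability this far above conventional significance thresholds is what one would expect if the partner's constraint had no influence on the speaker's errors.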
Moreover, these results suggest that the lack of transfer in Experiments 1 and 2 was not due to the opposing constraint manipulation. The seeming insulation of constraints learned from perception and production in this task suggests that there are two separate and independent phonotactic systems responsible for perception and production.
The results of these experiments revealed the modality-dependent nature of the implicit learning of new phonotactic patterns. The findings are consistent with a recent study of artificial grammar learning by Conway and Christiansen (2006). In that study, participants were exposed to audio and visual stimuli simultaneously. These stimuli followed two separate complex grammars, one expressed through visual color arrays (e.g., a grammatical sequence would be XMXM, where X is a red square and M is a green square) and one expressed in tone sequences (e.g., a grammatical sequence might be XXM, where X is a 333 Hz tone and M is a 389 Hz tone). Later they were presented with new materials in only one modality (color or tone) and judged whether the new items followed the grammar of that modality or not. Participants’ performance was just as good in this dual-grammar condition as in a single-grammar condition in which participants learned only a color or a tone grammar and were tested in the same modality. Conway and Christiansen interpreted the results as evidence that implicit grammar learning is modality-dependent. People can learn different grammars simultaneously from two modalities without interference across modalities. Experiments 1 and 2 in the current study showed that in the linguistic domain, implicit learning of new phonotactic constraints was also restricted to the modality in which the grammar was exemplified.
How can we reconcile our results with the fact that one learns to speak the same language that one understands if there is no direct linking between the perceptual learning and production learning systems? It seems obvious that the input and output language processing systems must communicate in some manner in order to maintain the consistency between one’s comprehension and production of the language. Our experiments do not aim to systematically investigate the factors required to promote transfer of implicit learning from perception to production. However, given our finding that implicit learning of phonotactics is modality-specific, it is important to create a condition under which we expect at least some transfer to occur, and then to see whether a modality-specific component remains. Hence, we adopted a “sledgehammer” approach to transfer for Experiment 3. First, we tried to maximize attention to every syllable of the heard materials, by instituting an error monitoring task. Participants were asked to identify all errors in what they heard their partners say. Second, we provided a common format for the heard and spoken materials. Both were printed, and the error-monitoring task required the participants to circle any errors that they heard in their partners’ productions as they examined the printed syllables. Finally, the syllables to be spoken and to be heard were printed in the same format on the same sheets of paper, to maximize the likelihood that the spoken and heard syllables were treated as parts of the same “language.” Thus, participants had attention-engaging tasks for both perception and production trials and a mediating orthographic representation that suggested a common source for both kinds of trials. Under such conditions, transfer is quite possible. The important question is whether there remains a production-specific component to the learning, as would be expected from a task-sensitive implicit learning perspective.
Twenty undergraduate students at the University of Illinois at Urbana-Champaign participated in Experiment 3. The participants received partial credit in an introductory psychology course. Two participants who happened to sign up for the same session became partners. None of the participants reported any hearing problems and none had participated in Experiment 1 or 2. All of the participants were native English speakers.
The materials were revised so that the sequences for the listening trials were also printed on the paper given to each participant. The new materials consisted of four-CVC-syllable sequences with the word “Say” or “Listen” preceding each sequence to remind the participant what their role was. In addition, the listening trials were highlighted in grey to make the two types of trials more distinguishable. For example,
A computer program generated 20 unique sets of such materials. Each set contained 96 four-CVC-syllable sequences for repetition, interleaved with 96 four-CVC-syllable sequences for listening. The sequences were constructed under the same set of constraints as in Experiments 1 and 2.
As in Experiment 2, each participant received a recognition memory test at the end of the experiment. The task and the items used were exactly the same as in the memory test in Experiment 2. However, instead of auditory presentation, each participant received a unique randomized list of all 49 syllables printed on a sheet of paper and was asked to circle the words he or she had encountered during the experiment. The visual format was used because this best matched the presentation of words in both the perception and production trials.
As in Experiments 1 and 2, participants received equal numbers of production and listening trials, interleaved with one another. The listening task was changed so that participants were given the actual sequences that their partner produced and were asked to indicate any mistakes their partner made by circling the syllables that were incorrectly produced. The production task was identical to that of Experiments 1 and 2.
After all the listening and production trials, the participants were each given a sheet of paper with 49 syllables printed on it in a random order, and were asked to circle the words that they had encountered during the experiment. Participants were instructed to work on their own sheets without consulting their partners’ answers.
Three native English speakers independently transcribed each participant’s repetitions. As in Experiments 1 and 2, errors were coded only when at least two transcribers agreed on the existence and nature of the errors. In the combined coding, there were 1525 errors out of the 30720 possibilities (20 participants*96 sequences*8 consonants*2 repetitions) for consonant errors, resulting in an overall error rate of 4.96%.
Reliabilities were calculated as before and were very good. Between the primary and the second transcriber, the overall agreement rate (agreement on correct repetitions plus agreement on the presence and nature of the errors) was 99.16% with agreement on 29330 non-errors, and 1132 errors. The agreement rate on transcribed errors between the primary and the second transcriber conditioned on the primary transcriber was 81.44%. Between the primary and the third transcriber, the overall agreement rate was 99.22% (29414 non-errors, and 1066 errors). The agreement rate on transcribed errors between the primary and the third transcriber conditioned on the primary transcriber was 81.62%.
The speech error patterns in Experiment 3 are shown in Figure 3. Speech errors involving language-wide restricted and unrestricted consonants exhibited patterns similar to those found in Experiments 1 and 2, for both the Same-Constraint and Opposite-Constraint conditions. However, now for the first time, there were differences between the two constraint conditions with regard to errors involving experimentally restricted consonants.
In the Same-Constraint condition, errors involving the language-wide restricted consonants (/h/ and /ŋ/) were legal 100% of the time (SE = 0, based on 194 total errors). Errors involving the experiment-wide restricted consonants were also legal 100% of the time (SE = 0, based on 79 total errors), a rate which was significantly larger than for those involving the unrestricted consonants (M = 69.23%, SE = 2.52, based on 403 total errors; Wilcoxon Z = 2.803, p = .003, 10 out of 10 participants in the right direction). The results suggested, as before, quite robust learning of the experiment-wide constraint.
In the Opposite-Constraint condition, errors involving the language-wide constrained consonants (/h/ and /ŋ/) were again legal 100% of the time (SE = 0, based on 204 total errors). Errors involving experiment-wide restricted consonants were legal 77.35% of the time (SE = 5.02, based on 133 total errors), and this legality percentage was significantly greater than for errors involving unrestricted consonants (M = 63.86%, SE = 4.28, based on 512 total errors; Wilcoxon Z = 1.883, p = .03; 7 of 10 participants in the right direction). These results showed that the Opposite-Constraint group also learned the experiment-wide phonotactic constraint in their production trials.
The learning effect present in the errors of the Opposite-Constraint group, however, was reduced by interference from the opposite constraint in the speaker’s listening experience, implying partial transfer of phonotactic learning from the perception domain to the production domain. This transfer effect (reduction in learning) was verified by a Mann-Whitney test of the difference between the legalities of restricted and unrestricted errors as a function of Condition (U = 1.965, p = .049). The difference between the percentage of legal errors involving experiment-wide restricted consonants and unrestricted consonants was greater in the Same-Constraint condition than in the Opposite-Constraint condition. In general, the pattern of results is consistent with partial transfer of phonotactic learning from the perception to the production domain.
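The size of the learning effect in each condition can be illustrated from the reported group means. The sketch below subtracts the mean legality of unrestricted-consonant errors from that of restricted-consonant errors. Note the assumption: the Mann-Whitney test in the text operates on per-participant difference scores, which are not reported, so these group-level differences only approximate the quantities that were actually tested.

```python
# Mean percentage of legal errors in Experiment 3, as reported in the text.
group_means = {
    "Same-Constraint": {"restricted": 100.00, "unrestricted": 69.23},
    "Opposite-Constraint": {"restricted": 77.35, "unrestricted": 63.86},
}

# Learning effect: how much more often restricted-consonant errors were legal.
learning_effect = {
    condition: round(means["restricted"] - means["unrestricted"], 2)
    for condition, means in group_means.items()
}

for condition, effect in learning_effect.items():
    print(f"{condition}: {effect} percentage points")
```

On these group means the effect is roughly 31 percentage points in the Same-Constraint condition and roughly 13 in the Opposite-Constraint condition, the direction of difference that the test above confirmed.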
Overall, participants’ performance in the memory test was very similar to what we found in Experiment 2 (see Table 3). The Opposite-Constraint group was as accurate on the non-occurring (e.g., “fef”) and neutral items (e.g., “gek”) as the Same-Constraint group. Correct responses to these two types of items were on average 78.89% for the Same-Constraint condition and 82.22% for the Opposite-Constraint condition.
The critical comparison of participants’ “yes” answers to legal and illegal syllables containing the experimentally-restricted consonants showed a difference between the Same-Constraint and the Opposite-Constraint conditions. As in Experiment 2, participants in the Same-Constraint condition were much more likely to say “yes” to legal items (on average 72.73%) than to illegal items (7.27%), according to the constraint in their own repetition sequences; those in the Opposite-Constraint condition accepted a similar proportion of legal items but many more illegal items (on average 74.55% and 69.09%, respectively).
A Mann-Whitney test showed that the rate of acceptance of legal and illegal items varied significantly as a function of Condition (Mann-Whitney U = 3.701, p < .001). The Same-Constraint group was much more likely to accept legal items than illegal items (Wilcoxon Z = 2.836, p = .003) while the Opposite-Constraint group accepted similar proportions of legal and illegal items (Wilcoxon Z = 1.186, ns). Overall, as in Experiment 2, participants in both conditions recognized the syllables experienced in production trials as well as those experienced in listening trials.
By changing the task and the presentation of the stimuli for the listening trials, we found partial transfer of phonotactic learning from the perception domain to the production domain in Experiment 3. In spite of this transfer, however, there remained modality-specific learning. In the Opposite-Constraint condition, errors involving experiment-wide restricted consonants showed adherence to the constraint that was present only in the produced sequences, albeit to a smaller extent than errors in the Same-Constraint condition.
Although it was not our goal to isolate the specific conditions that can lead to transfer, it is useful to consider some of the possibilities as a guide to future research. Experiment 3, which yielded some transfer, differed from the other experiments with regard to the listening task and the presentation of the stimuli. Participants in Experiment 3 monitored the listening trials for errors, instead of counting the number of “heng”s. The error-monitoring task intuitively requires increased attention to the heard syllables and hence could have contributed to transfer. It is worth noting, though, that this hypothesized greater attention for error monitoring in Experiment 3 did not lead to greater “transfer” on the recognition memory test in comparison to Experiment 2. Thus, a pure attentional explanation does not accord with the similar recognition performance. Perhaps it is not so much the demands of the task, but the nature of the task that is important. Error monitoring may force listeners to internally generate expectations for what they will hear. These expectations would most likely come either directly from the printed text, or from their memory of the first repetition of the sequence (recall that during each trial, the sequence is produced twice in a row). Generating what you expect to hear is, computationally, much like production (Federmeier, 2007; Chang, Dell, & Bock, 2006). Predicting that you will hear “fes” is akin to internally producing it. On this analysis, error monitoring actively engages the production system; as a result, patterns present in the heard syllables may affect production errors.
The second possible basis for the transfer in Experiment 3 is found in the orthographic presentation of the syllables that were both heard and produced. Perhaps relating both produced and heard syllables to the same mediating representation promotes transfer. In early phonological development, orthographic representations are not available, of course, but one can nonetheless imagine that other mediating mechanisms could develop. Modality-specific phonological representations could be linked to one another via a developing abstract modality-independent representation comprising abstract phonological units (e.g., Plaut & Kello, 1999). We would suggest, however, that these abstractions do not supplant the modality-specific representations. The production and perception systems can learn on their own, leading to the modality-specificity of phonotactic learning that we report here. Moreover, they can be damaged on their own, leading to the neuropsychological evidence for separate input and output systems that we reviewed earlier.
The presence of orthographic representations in our experiments raises another question. Perhaps the implicit phonotactic learning that occurs in our experiments and similar ones occurs not within the speech production and speech perception systems, but instead in a system that learns how to perceive the letters of the syllables. That is, the slips that we observe to follow the experimental constraints are not slips of production, but slips of reading. This is highly unlikely. Previous studies have clearly shown that phonotactic learning in perception or production experience does not require orthographic presentation of the stimuli (perception: Chambers et al., 2003; Onishi et al., 2002; Saffran & Thiessen, 2003; Seidl & Buckley, 2005; production: Taylor, 2003). Moreover, decades of study of rapid syllable production from orthographically presented stimuli have demonstrated that the resulting errors are errors of speech output rather than orthographic input. For example, the errors are sensitive to phonetic similarity (Goldrick, 2004; Oppenheim & Dell, 2008), and overall error probability reflects speech-output rate rather than input conditions as long as the stimuli are not degraded (e.g., Dell, 1986).
The major question of the present research was whether the implicit learning of phonotactic-like patterns occurs separately in production and perception. From prior studies, we know that people of all ages can rapidly learn artificial phonotactic constraints by listening to syllables that exhibit those constraints (Chambers, 2004; Chambers et al., 2003; Onishi et al., 2002; Redford, 2008; Saffran & Thiessen, 2003; Seidl & Buckley, 2005), and also that adults can learn such constraints by producing constrained syllables (Dell et al., 2000; Goldrick, 2004; Goldrick & Larson, 2008; Taylor & Houghton, 2005; Warker & Dell, 2006; Warker et al., 2008).
Is this learning modality-specific? Or is there a single integrated system that learns from both produced and perceived syllables, and in which new learning is subsequently expressed in both production and perceptual performance, regardless of the modality of the training experience? Although perception and production clearly use different resources at the periphery (e.g., audition and articulation, respectively), some have proposed that the level at which phonological regularities are stored is shared by perception and production. An example of such an integrated system is the Node Structure Theory (MacKay, 1982). In this theory, the same phonological units or nodes are traversed in a bottom-up fashion during perception and in a top-down fashion during production. Similarly, at least one computational model of the relation between word production and word reception in aphasia uses the same phonemic units for perception and production (Martin et al., 1994).
Our findings, given certain interpretations and caveats that we detail below, are incompatible with models that lack separate input and output representations of phonological patterns. In three experiments using speech errors as a measure of learning within the production system, we found a strikingly high degree of separation between the modalities. For the most part, what happened in the perception system stayed in the perception system. In Experiments 1 and 2 (and in the follow-up experiment testing transfer of orthogonal constraints), there was no transfer at all from perception to production. Speakers’ errors in the Opposite-Constraint condition strongly reflected the experiment-wide constraint on consonant positions that was present in the produced syllables and failed to exhibit any sensitivity to the reverse constraint present in the syllables produced by their partner. Nonetheless, the speakers attended to their partner’s productions (Experiments 1, 2, and 3), and their episodic memory for what they heard was quite good (Experiments 2 and 3). The small, but significant, amount of transfer observed in Experiment 3 did not obscure the fact that speech errors were still more affected by the constraint in the produced syllables than by the constraint in the heard syllables. The partial transfer in phonotactic learning that was observed in Experiment 3 suggests that transfer is possible, but the three experiments together tell us that simply hearing and encoding syllables produced by others does not necessarily affect production to the extent that error patterns are altered.
The results are in line with those of Kraljic, Brennan, and Samuel (2008), who also found a lack of transfer from perception to production. Kraljic et al. documented perceptual learning of pronunciation variations (e.g., hearing /s/ pronounced similarly to /ʃ/). This perceptual learning, however, did not influence participants’ later productions of the affected phonemes, just as the heard phonotactic constraints in our experiments, for the most part, did not influence participants’ speech errors. These results are also generally consistent with neuropsychological findings suggesting that input and output phonology are distinct. Some brain-damaged patients exhibit disturbed phonological output processing but intact phonological input processing (e.g., Martin, 2003; Howard & Nickels, 2005). Our findings suggest that the learning revealed in speech errors occurs within an output phonological processing system or within the translation of such a system into articulatory representations (e.g., a phonetic-articulatory syllabary; Cholin, Schiller, & Levelt, 2004). The results are also consistent with experimental studies of lexical priming suggesting that priming within the perceptual and production modalities is stronger than cross-modality priming (e.g., Monsell, 1987). Our research adds to this literature by demonstrating this asymmetry in the learning of new phonotactic-like constraints, rather than in the priming of individual lexical items.
Given our conclusion that the observed phonotactic learning was internal to the production system, we should consider where, in that system, the learning resides. The key facts are that the learning expressed itself through phonological speech errors and that it concerned the placement of consonants within a syllable. In most theories of speech errors (e.g., Dell, 1986; Shattuck-Hufnagel & Klatt, 1979; Stemberger, 1985), phonological movement slips occur in the process of inserting segments into slots in syllabically organized frames. Moreover, this insertion process is assumed to be sensitive to phonotactic constraints. For example, an /ŋ/ would not be allowed in an onset slot. If this view is correct, then the learning of experiment-wide constraints may occur during this insertion process. Warker and Dell (2006) proposed a model of phonotactic learning in which both the errors and the learning occurred during the assignment of retrieved phonological material to syllable positions. However, this phonological material should not be entirely characterized as holistic segments or phonemes. Phonological features affect both phonotactic learning (Goldrick, 2004) and phonological speech errors (e.g., MacKay, 1970). So, a role for features is required. Furthermore, the claim that phonological errors result during the insertion of material into syllable frames is not universally accepted. An alternative possibility is that slips happen later on, when an already syllabified phonological representation is mapped onto a phonetic-articulatory representation (Levelt, Roelofs, & Meyer, 1999). If errors occur during this mapping, then the locus of the relevant learning could be after the construction of the syllabified frame. For example, perhaps the learning occurs during the transition between a static representation of a word or syllable form (e.g., a set of segments or features) and a sequence of phonetic-articulatory units. Models of phonological acquisition by Plaut and Kello (1999) and Gupta and Dell (1999) hypothesize a representational level that mediates between static phonology and sequences of output units.
The separation of input and output processing at the periphery of the language processing system stands in stark contrast to the seeming lack of separation at the center of the language processing system (e.g., Bock, 1982; Caramazza, 1997; Levelt et al., 1999). Conceptual and semantic representations are assumed to be shared between production and comprehension processes. Production starts, and comprehension ends, with these levels. Researchers may debate the extent to which they are amodal, but the alleged modality or lack thereof has to do with what is represented (e.g., whether a feature such as round is visual or not), rather than with language production versus comprehension. More controversially, there is evidence that syntactic representations are shared between production and comprehension. Bock, Dell, Chang, and Onishi (2007) found that a previously comprehended syntactic structure (e.g., a double-object dative) primes a structural choice in production (e.g., double-object versus prepositional dative) just as strongly as a previously produced structure does (see Ferreira & Bock, 2006, and Pickering & Ferreira, 2008, for reviews). Thus, this syntactic priming study examined transfer from input to output processing, just as we did. The result, though, was complete transfer. This transfer could be accounted for in a connectionist model in which comprehension and production are both top-down predictive processes within a network responsible for learning sequential regularities across words (Chang et al., 2006). Comprehension and production both involve the generation of words, one at a time, each prediction being constrained by meaning and by previously produced or heard words. Hence, the acquired syntactic-sequential patterns inhabit the same connection weights, and learning from comprehending a sentence transfers to producing a subsequent one.
The extent to which production and comprehension share representations can be seen by viewing the language processor as an inverted Y (see Figure 4), with conceptual processes at the top, and acoustic and articulatory representations at each corner of the bottom (e.g., Plaut & Kello, 1999). Starting at the top, representations are shared. But because articulation and speech perception are different processes, at least insofar as one involves audition and the other action, the series of representations that underlie language processing will eventually have to reflect whether the task is production or comprehension. That is, the representations will, at some point, split into input and output versions. This split need not be all or none; it could be graded, with more central representations sharing more resources or links than peripheral ones (e.g., Gupta & MacWhinney, 1997). The structural priming study of Bock et al. (2007) suggests that the production-comprehension split occurs in representations more peripheral than those responsible for the computation of syntactic structures. Our study demonstrates that the split is well underway at the point that phonotactic-like constraints affect phonological speech errors. Thus, if one accepts the inverted-Y model, the present study and that of Bock et al. together bracket the split point.
Our conclusions about the separation of input and output during phonotactic learning relate to another finding in this domain: the relative slowness of learning second-order constraints. Warker and Dell (2006) and Warker et al. (2008) found that contextually conditioned constraints (e.g., /f/ is an onset if the vowel is /æ/) did not affect speech errors until the second testing session, which happened to be on the next day. They interpreted this relative difficulty within a connectionist model that needed hidden units to represent the conjunctions of phonological material (e.g., conjunctions of /f/-onset and vowel identity) and posited that these representations take time or extra training to develop. One can view modality (perception versus production) as a context, too. Thus, our participants in the Opposite-Constraint condition were exposed to second-order constraints such as “/f/ is an onset if I am listening but /f/ is a coda if I am speaking.” Our finding of little or no transfer across modalities could be viewed, instead, as perfect and extremely rapid learning of a second-order modality constraint. On this interpretation, the rapidity of the second-order learning is unique and stands in contrast to all other attempts to learn these constraints using the speech-error method that we employed here. If this is second-order learning, it is a remarkable kind of such learning, one that would require postulating that the system is in some way “prepared” to learn separate patterns for production and perception (e.g., in the sense of Garcia and Koelling's, 1966, demonstration that particular animals are prepared to learn particular CS-US associations). The separateness of the relevant representations that we postulate could be described as an instance of such preparedness.
Note that separate input and output representations of sound structure could co-exist with a shared phonological system. For example, there may be common phonological units and also units that serve as conjunctive representations of input/output modality and phonological properties (akin to the hidden units in Warker & Dell, 2006, that mediate second-order learning). According to this view, the hidden units that support specific learning within the production system (phonological material broken down by modality) are already in place prior to the learning that happens in our experiments. They constitute the modality-specific representations that are quickly tuned to recent experience and that, when damaged, create the modality-specific phonological deficits that neuropsychologists have identified. In this sense, the system is prepared to learn specific perceptual or production patterns, if the situation warrants it. Our experiments simply set up such a situation.
We conclude with three caveats. First, our experiments investigate the learning of artificial constraints, not natural phonotactic constraints. Although artificial constraints influence speech errors just as natural ones do, the relevance of our results for an understanding of the acquisition of natural constraints is uncertain. We assume, but cannot prove, that the experiential component of natural acquisition is tapped in our experiments. Second, our finding of little or no transfer from perception to production must be considered in the context of our measure of learning, the adherence of speech errors to the constraints present in the materials. Perhaps other production measures or learning situations would show more transfer. For instance, an experimental situation involving dialogue between two participants may promote transfer from perception to production, because the participants may be more engaged in processing what their partner is saying (Branigan, Pickering, & Cleland, 2000). Finally, we acknowledge that, of course, perceptual experience has to transfer to production. It has been observed to do so in experiments, and ultimately it must in order to guide phonological acquisition. Recall, however, that the observed cases of such transfer have involved transfer of how particular speech sounds are said (e.g., Bradlow et al., 1999; Goldinger, 1998; Pardo, 2006), not the higher level constraints on sound sequences that we have studied here. Perhaps in the low-level case, the articulatory plan to say a syllable calls up a set of desired auditory consequences of the articulation. Moreover, this set includes auditory representations of recently heard instances of that syllable so that the expected consequences are biased to reflect recent experience. Since a skilled speaker-listener knows something about how to adjust an articulatory plan to achieve a desired auditory consequence, there is transfer. With regard to phonotactic patterns, however, transfer is more difficult, because these patterns are relevant to the assembly of the planned syllable itself, not its articulatory details. This assembly is more removed from the auditory-target guided articulation process.
These speculations aside, our key finding is that the phonotactic properties of auditory syllables, even when they are attended to and encoded into memory, do not easily penetrate the production system and affect the misplacement of consonants in slips. Even when we obtained some transfer, produced syllables had a much greater influence than heard syllables. Thus, robust perception-production transfer at this level in the system may require more complex interactions between the modalities (e.g., production-like processes that occur during perception; Chang et al., 2006; Galantucci, Fowler, & Turvey, 2006; and motoric processes that provide structured input that may constrain perceptual learning; Redford, 2008). We leave the discovery of the necessary and sufficient conditions for transfer to future research.
This research was supported by NIH grants HD-44458 and DC-00191. We thank Jennifer Cole and Hahn Koo for helpful discussions. We also thank Nancy Wai for help with transcription and data collection. Correspondence may be directed to Jill A. Warker, Beckman Institute, 405 N. Mathews Ave., University of Illinois at Urbana-Champaign, Urbana, IL 61801.
1. One item was inadvertently left off the memory test due to experimenter error.