We applaud Stoel-Gammon’s (this issue) call for a more comprehensive account of the relationship between lexicon and phonology, and we strongly endorse her suggestions for future research. However, we think that it will not be enough simply to integrate findings and methods from the adult-centered and child-centered literatures. Both of these literatures suggest that we need to rethink standard assumptions about what phonological representations are and how they emerge to support the very large vocabularies that speakers develop over the course of a lifetime. Our commentary focuses on three themes relevant to this reconceptualization.
The first theme is that the adult phonological system is highly complex because phonetic events are indexed to multiple types of information. The complexity challenges standard assumptions about the relationship between signal properties and the phonological descriptors that encode them in the long-term memory of lexical forms.
The second theme is that this complex system is learned through the dynamics of the production-perception loop. We suspect that a better understanding of these dynamics will challenge Stoel-Gammon’s interpretation of the child-centered literature as supporting the idea that “the developing phonological system affects lexical acquisition to a greater degree than lexical factors affect phonological development.”
The third theme is that, given the complexity of this system and the dynamics of its acquisition, we need to reset our thinking on how to study phonological development. We need to develop new methodologies to quantify robustness of knowledge at multiple levels of abstraction and in multiple sensory domains. Let us consider each of these themes in more detail.
As we have argued elsewhere (Beckman, Munson, & Edwards, 2007; Beckman, 2003; Munson, Edwards, Schellinger, Beckman, & Meyer, 2010), the end-state in phonological acquisition is a rich and detailed set of mappings over the different physical domains that are the primary sensory encodings of speech, as well as a set of mappings to category spaces at multiple levels of abstraction away from primary sensory encodings. Consider what adults know about a single sound, /s/ (paraphrased from Munson, Edwards, and Beckman, 2005). First, adults know the acoustic characteristics of /s/ and have a robust encoding of these characteristics in an auditory mapping that allows them to parse that there is an /s/ (and not some other similar sound such as /θ/, or /ð/, if the language is English) in any talker’s production of any utterance that contains /s/ in any legal segmental and prosodic environment. Second, adults know the articulatory characteristics of /s/ and have a robust enough encoding of these characteristics in their motor and somatosensory mappings to be able to successfully produce a recognizable /s/ in any utterance containing /s/ in any of these legal segmental and prosodic contexts. Third, adults know the contrastive function of /s/ within the phonological system. For example, English-speaking adults know that changing the word-initial /s/ in sack to an /ʃ/ will change the word’s meaning and they know that /s/ cannot appear in a word-initial cluster following a /p/, so that /psari/ is not a possible word of English. Finally, adults know that the acoustic characteristics of /s/ vary systematically, and that some of this variation is exploited by talkers to code social group membership, referred to as social-indexical variation. For example, in most English-speaking cultures, adults know that some male talkers exploit allowable variability in the spectral skewness of /s/ to express their sexual orientation (Munson, McDonald, DeBoe, & White, 2006).
Not only is the end-state of phonological acquisition highly complex, it is also language-specific. There are cross-linguistic differences at every level of representation, and it is clear that children do not learn a “universal” representation that is then refined. They learn to parse and produce words that must be encoded in a language-specific representation long before the encodings are robustly adult-like (Stoel-Gammon, Williams, & Buder, 1994; Arbisi-Kelm, Edwards, Munson, & Kong, 2009; Edwards & Beckman, 2008; Li, Beckman, & Edwards, 2009; Kong, Beckman, & Edwards, 2007). Our conceptualization of phonological acquisition differs from much of the work summarized by Stoel-Gammon, in which there is a strict modular distinction between “phonetics” and “phonology.” In this conceptualization, speech sounds are seen as simple deterministic chains of articulatory and acoustic events, equivalent across contexts and speakers, that are mapped onto phonological categories. In our view, by contrast, phonology emerges as a consequence of generalizations over the parametric phonetics and generalizations over the lexicon.
This brings us to our second theme, that phonological representations are learned through the dynamics of the production-perception loop. As soon as they can vocalize, babies begin to learn the mappings among four different sets of sensory representations: the set of articulatory maneuvers that they can produce, the acoustic consequences of these maneuvers, visual representations of others’ productions, and auditory representations of others’ productions. Learning these mappings provides the necessary scaffolding for learning the higher-order mapping between well-rehearsed forms in babbling and the word meanings that will become assigned to them by the adults with whom children interact. We are only now beginning to fully appreciate how much the learning of these mappings is facilitated by adults' modification of the signals that they produce to children to maximize the perceptibility of key signal characteristics. Cristià (2009), for example, showed that individual differences in the robustness of contrast of the /s/-/ʃ/ distinction in the mothers’ speech were correlated with the robustness of contrast in their infants' perception. These results give us a way to begin to model more exactly the role of maternal contingent responding in the literature that Stoel-Gammon reviews in her paper. Given these findings, it would be a mistake to conclude from the older literature on babbling and early words that learning of these “purely phonetic” mappings is essentially complete and that “the basic word structures, syllable shapes and sound classes are present” by 24 months, to support subsequent vocabulary growth. As Pierrehumbert (2001) cogently argues, phonological constraints are coarser-grained than phonetic ones because speech unfolds in time; fluent adult parsing of speech requires higher-order phonological structure to enable top-down processing.
In a usage-based account of phonological development, these higher-order structures are modeled as emergent properties of word learning. That is, the task of learning words, accessing them during recognition, and planning them in production is facilitated if children are able to parse and represent words as sequences of context-independent categories, such as phonemes, syllables, and stress feet. Conversely, there is also evidence that these higher-order context-independent representations develop as the learner acquires a sufficiently large lexicon.
In this view, categories such as phonemes do not exist in nature, to be "discovered" by children. Rather, they emerge gradually as children make increasingly robust abstractions over the words that they learn, so that they can generalize well-rehearsed auditory and motor patterns in order to parse, say, and learn many new words each day. Research by ourselves and others has shown that children's lexical expansion is strongly associated with changes in performance on tests of higher-order phonological knowledge, such as measures of phonological awareness, gated word recognition, and repetition of low-frequency sequences in nonwords. These studies are reviewed in detail by Stoel-Gammon (Edwards, Beckman, & Munson, 2004; Metsala & Walley, 1998; Munson, Edwards, & Beckman, 2005; Munson, Kurtz, & Windsor, 2005; Storkel, 2002; Walley, 1988). This view suggests that we should not be studying either phonological acquisition or vocabulary growth in isolation from each other. Rather, we need to examine phonological and lexical development together in longitudinal studies so that we can understand how growth trajectories in different components of phonological knowledge interact with vocabulary growth.
This dynamic view of how children acquire phonological and lexical representations brings us to our final theme, namely that we need to capture phonological knowledge using much finer-grained representations than we have used previously. Counting errors and analyzing substitution patterns in phonetic transcriptions cannot remain our primary methods for measuring children's generative phonological competence. Spoken words are not simple chains of unidimensional phonetic schemata that are equivalent across talkers, styles, and prosodic contexts. Further, children do not progress directly and categorically from incorrect substitutions to correct productions. In a series of studies (e.g., Kaiser, Munson, Li, Holliday, Beckman, Edwards, & Schellinger, 2009; Munson, Kaiser, & Urberg-Carlson, 2008; Schellinger, Edwards, Munson, & Beckman, 2008; Urberg-Carlson, Munson, & Kaiser, 2008, 2009), we have used visual analog scaling (VAS) to evaluate whether naïve listeners are able to perceive gradient distinctions in children’s speech productions. (In VAS rating tasks, participants are asked to scale a psychophysical parameter by indicating their percept on an idealized visual display.) We have consistently found that naïve adult listeners systematically rate intermediate productions (such as a production transcribed as in between /t/ and /k/) and even clear substitutions (such as “cap” for tap) as less good exemplars of sounds relative to correct productions across many different contrasts. This result suggests that children gradually develop adult-like contrasts, although gradient acquisition is difficult to observe if we rely solely on phonetic transcription. This is why we are currently working to develop finer-grained measures of robustness of contrast in production using acoustic parameters. These finer-grained measures can be used to supplement perception-based measures such as VAS (see, e.g., Holliday, Beckman, & May, submitted).
The strength of the relationship between phonology and the lexicon that we have found thus far using phonetic transcription may be dwarfed by the strength of the relationship we will find when we have developed sufficiently fine-grained measures of phonological knowledge at multiple levels of representation.
To summarize, Stoel-Gammon’s paper is important in its emphasis on the relationships between phonological and lexical knowledge in the developing child and in the adult system. We have argued that these relationships can best be understood by a reconceptualization of the relationship between the phonological system and the lexicon. The phonology is not encapsulated away from the lexicon. It is not something that stands only below the lexicon, to mediate between the speech signal and the words that the young child learns to recognize. Rather, phonology is something that develops together with the lexicon, and stands in parallel to it. Such a reconceptualization requires a reconsideration of how to study both lexical and phonological acquisition and the interactions between them.
This work was supported in part by NIH grant R01 DC02932 and NSF grant BCS0729140 to Jan Edwards, by NSF grant BCS0729277 to Benjamin Munson, and by NSF grant BCS0729306 to Mary E. Beckman.