The last decade has produced an explosion in neuroscience research examining young children’s early processing of language. Noninvasive, safe functional brain measurements have now been proven feasible for use with children starting at birth. The phonetic level of language is especially accessible to experimental studies that document the innate state and the effect of learning on the brain. The neural signatures of learning at the phonetic level can be documented at a remarkably early point in development. Continuity in linguistic development from infants’ earliest brain responses to phonetic stimuli is reflected in their language and pre-reading abilities in the second, third and fifth year of life, a finding with theoretical and clinical impact. There is evidence that early mastery of the phonetic units of language requires learning in a social context. Neuroscience on early language learning is beginning to reveal the multiple brain systems that underlie the human language faculty.
Neural and behavioral research studies show that exposure to language in the first year of life influences the brain’s neural circuitry even before infants speak their first words. What do we know of the neural architecture underlying infants’ remarkable capacity for language and the role of experience in shaping that neural circuitry?
The goal of the review is to explore this topic, focusing on the data and arguments about infants’ neural responses to the consonants and vowels that make up words. Infants’ responses to these basic building blocks of speech—the phonemes used in the world’s languages—provide an experimentally tractable window on the roles of nature and nurture in language acquisition. Comparative studies at the phonetic level have allowed us to examine the uniqueness of humans’ language processing abilities. Moreover, infants’ responses to native and nonnative phonemes have documented the effects of experience as infants are bathed in a specific language. We are also beginning to discover how exposure to two languages early in infancy produces a bilingual brain. We focus here on when and how infants master the sound structure of their language(s), and the role of experience in explaining this important developmental change. As the data attest, infants’ neural commitment to the elementary units of language begins early, and the review showcases the extent to which the tools of modern neuroscience are advancing our understanding of infants’ uniquely human capacity for language.
Humans’ capacity for speech and language provoked classic debates on nature vs. nurture by strong proponents of nativism (Chomsky, 1959) and learning (Skinner, 1957). While we are far beyond these debates and informed by a great deal of data about infants, their innate predispositions, and their incredible abilities to learn once exposed to natural language (Kuhl, 2009; Saffran et al., 2006), we are still just breaking ground with regard to the neural mechanisms that underlie language development (see Friederici and Wartenburger, 2010; Kuhl and Rivera-Gaxiola, 2008). This decade may represent the dawn of a golden age with regard to the developmental neuroscience of language in humans.
The last decade has produced rapid advances in noninvasive techniques that examine language processing in young children (Figure 1). They include Electroencephalography (EEG)/Event-related Potentials (ERPs), Magnetoencephalography (MEG), functional Magnetic Resonance Imaging (fMRI), and Near-Infrared Spectroscopy (NIRS).
Event-related Potentials (ERPs) have been widely used to study speech and language processing in infants and young children (for reviews, see Conboy et al., 2008a; Friederici, 2005; Kuhl, 2004). ERPs, a part of the EEG, reflect electrical activity that is time-locked to the presentation of a specific sensory stimulus (for example, syllables or words) or a cognitive process (recognition of a semantic violation within a sentence or phrase). By placing sensors on a child’s scalp, the activity of neural networks firing in a coordinated and synchronous fashion in open field configurations can be measured, and voltage changes occurring as a function of cortical neural activity can be detected. ERPs provide precise time resolution (milliseconds), making them well suited for studying the high-speed and temporally ordered structure of human speech. ERP experiments can also be carried out in populations who cannot provide overt responses because of age or cognitive impairment. Spatial resolution of the source of brain activation is, however, limited.
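The time-locking idea described above can be illustrated with a minimal sketch. It assumes a single simulated channel, a made-up evoked waveform, and Gaussian background noise; the function names (`average_erp`, `simulate_recording`) and all numeric values are hypothetical choices for illustration, not parameters from any study cited here. Averaging many epochs aligned to stimulus onset cancels activity that is not time-locked and leaves the evoked response.

```python
import math
import random


def average_erp(recording, onsets, window):
    """Average fixed-length epochs of `recording` that begin at each stimulus onset."""
    epochs = [recording[t:t + window] for t in onsets if t + window <= len(recording)]
    if not epochs:
        raise ValueError("no complete epochs to average")
    # zip(*epochs) groups the samples that share the same latency after onset.
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]


def simulate_recording(n_samples, onsets, response, noise_sd, seed=0):
    """Simulated channel: background noise plus a fixed response after each onset."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, noise_sd) for _ in range(n_samples)]
    for t in onsets:
        for i, value in enumerate(response):
            if t + i < n_samples:
                signal[t + i] += value
    return signal


# Hypothetical evoked response: one negative deflection spanning 50 samples.
response = [-2.0 * math.sin(math.pi * i / 50) for i in range(50)]
onsets = list(range(100, 40100, 100))  # 400 simulated stimulus presentations
recording = simulate_recording(40200, onsets, response, noise_sd=2.0)

erp = average_erp(recording, onsets, window=50)
peak_latency = min(range(len(erp)), key=lambda i: erp[i])
print("most negative point at sample", peak_latency, "after stimulus onset")
```

On any single trial the simulated deflection is buried in noise of comparable size; it becomes visible only in the average, which is why ERP studies present each stimulus many times.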
Magnetoencephalography (MEG) is another brain imaging technique that tracks activity in the brain with exquisite temporal resolution. The SQUID (superconducting quantum interference device) sensors located within the MEG helmet measure the minute magnetic fields associated with electrical currents that are produced by the brain when it is performing sensory, motor, or cognitive tasks. MEG allows precise localization of the neural currents responsible for the sources of the magnetic fields. Cheour et al. (2004) and Imada et al. (2006) used new head-tracking methods and MEG to show phonetic discrimination in newborns and infants in the first year of life. Sophisticated head-tracking software and hardware enables investigators to correct for infants’ head movements, and allows the examination of multiple brain areas as infants listen to speech (Imada et al., 2006). MEG (as well as EEG) techniques are completely safe and noiseless.
Magnetic resonance imaging (MRI) can be combined with MEG and/or EEG, providing static structural/anatomical pictures of the brain. Structural MRIs show anatomical differences in brain regions across the lifespan, and have recently been used to predict second-language phonetic learning in adults (Golestani, Molko, Stanislas, LeBihan and Pallier, 2007). Structural MRI measures in young infants identify the size of various brain structures and these measures have been shown to be related to language abilities later in childhood (Ortiz-Mantilla, Choe, Flax, Grant, and Benasich, 2010). When structural MRI images are superimposed on the physiological activity detected by MEG or EEG, the spatial localization of brain activities recorded by these methods can be improved.
Functional magnetic resonance imaging (fMRI) is a popular method of neuroimaging in adults because it provides high spatial-resolution maps of neural activity across the entire brain (e.g., Gernsbacher and Kaschak, 2003). Unlike EEG and MEG, fMRI does not directly detect neural activity, but rather the changes in blood-oxygenation that occur in response to neural activation. Neural events happen in milliseconds; however, the blood-oxygenation changes that they induce are spread out over several seconds, thereby severely limiting fMRI’s temporal resolution. Few studies have attempted fMRI with infants because the technique requires infants to be perfectly still, and because the MRI device produces loud sounds making it necessary to shield infants’ ears. fMRI studies allow precise localization of brain activity and a few pioneering studies show remarkable similarity in the structures responsive to language in infants and adults (Dehaene-Lambertz et al., 2002, 2006).
Near-Infrared Spectroscopy (NIRS) also measures cerebral hemodynamic responses in relation to neural activity, but utilizes the absorption of light, which is sensitive to the concentration of hemoglobin, to measure activation (Aslin and Mehler, 2005). NIRS measures changes in blood oxy- and deoxy-hemoglobin concentrations in the brain as well as total blood volume changes in various regions of the cerebral cortex using near infrared light. The NIRS system can determine the activity in specific regions of the brain by continuously monitoring blood hemoglobin level. Reports have begun to appear on infants in the first two years of life, testing infant responses to phonemes as well as longer stretches of speech such as “motherese” and forward versus reversed sentences (Bortfeld et al., 2007; Homae et al., 2006; Peña et al., 2002; Taga and Asakawa, 2007). As with other hemodynamic techniques such as fMRI, NIRS typically does not provide good temporal resolution. However, event-related NIRS paradigms are being developed (Gratton and Fabiani, 2001). One of the most important potential uses of the NIRS technique is possible co-registration with other testing techniques such as EEG and MEG.
Perception of the phonetic units of speech—the vowels and consonants that make up words—is one of the most widely studied linguistic skills in infancy and adulthood. Phonetic perception and the role of experience in learning is studied in newborns, during development as infants are exposed to a particular language, in adults from different cultures, in children with developmental disabilities, and in nonhuman animals. Phonetic perception studies provide critical tests of theories of language development and its evolution. An extensive literature on developmental speech perception exists and brain measures are adding substantially to our knowledge of phonetic development and learning (see Kuhl, 2004; Kuhl et al., 2008; Werker and Curtin, 2005).
In the last decade, brain and behavioral studies indicate a very complex set of interacting brain systems in the initial acquisition of language, many of which appear to reflect adult language processing, even early in infancy (Dehaene-Lambertz et al., 2006). In adulthood, language is highly modularized, which accounts for the very specific patterns of language deficits and brain damage in adult patients following stroke (Kuhl and Damasio, in press). Infants, however, must begin life with brain systems that allow them to acquire any and all languages to which they are exposed, and can acquire language as either an auditory-vocal or a visual-manual code, on roughly the same timetable (Petitto and Marentette, 1991). We are in a nascent stage of understanding the brain mechanisms underlying infants’ early flexibility with regard to the acquisition of language – their ability to acquire language by eye or by ear, and acquire one or multiple languages – and also the reduction in this initial flexibility that occurs with age, which dramatically decreases our capacity to acquire a new language as adults (Newport, 1990). The infant brain is exquisitely poised to “crack the speech code” in a way that the adult brain cannot. Uncovering why this is the case is a very interesting puzzle.
In this review I will also explore a current working hypothesis and its implications for brain development—that to crack the speech code requires infants to combine a powerful set of domain-general computational and cognitive skills with their equally extraordinary social skills. Thus, the underlying brain systems must mutually influence one another during development. Experience with more than one language, for example, as in the case of people who are bilingual, is related to increases in particular cognitive skills, both in adults (Bialystok, 1991) and in children (Carlson & Meltzoff, 2008). Moreover, social interaction appears to be necessary for language acquisition, and an individual infant’s social behavior can be linked to their ability to learn new language material (Kuhl et al., 2003; Conboy, Brooks, Meltzoff and Kuhl, 2008).
Regarding the social effects, I have suggested that the social brain—in ways we have yet to understand—“gates” the computational mechanisms underlying learning in the domain of language (Kuhl, 2007). The assertion that social factors gate language learning explains not only how typically developing children acquire language, but also why children with autism exhibit twin deficits in social cognition and language, and why nonhuman animals with impressive computational abilities do not acquire language. Moreover, this gating hypothesis may explain why social factors play a far more significant role than previously realized in human learning across domains throughout our lifetimes (Meltzoff et al., 2009). Theories of social learning have traditionally emphasized the role of social factors in language acquisition (Bruner, 1983; Vygotsky, 1962; Tomasello, 2003a,b). However, these models have emphasized the development of lexical understanding and the use of others’ communicative intentions to help understand the mapping between words and objects. The new data indicate that social interaction “gates” an even more basic aspect of language — learning of the elementary phonetic units of language — and this suggests a more fundamental connection between the brain mechanisms underlying human social understanding and the origins of language than has previously been hypothesized.
In the next decade, the methods of modern neuroscience will be used to explore how the integration of brain activity across specialized brain systems involved in linguistic, social, and cognitive analyses takes place. These approaches, as well as others described here, will lead us towards a view of language acquisition in the human child that could be transformational.
Language learning is a deep puzzle that our theories and machines struggle to solve but children accomplish with ease. How do infants discover the sounds and words used in their particular language(s) when the most sophisticated computers cannot? What is it about the human mind that allows a young child, merely one year old, to understand the words that induce meaning in our collective minds, and to begin to use those words to convey their innermost thoughts and desires? A child’s budding ability to express a thought through words is a breathtaking feat of the human mind.
Research on infants’ phonetic perception in the first year of life shows how computational, cognitive, and social skills combine to form a very powerful learning mechanism. Interestingly, this mechanism does not resemble Skinner’s operant conditioning and reinforcement model of learning, nor Chomsky’s detailed view of parameter setting. The learning processes that infants employ when learning from exposure to language are complex and multi-modal, but they are also child’s play in that they grow out of infants’ heightened attention to items and events in the natural world: the faces, actions, and voices of other people.
A stage-setting concept for human language learning is the graph shown in Figure 2, redrawn from a study by Johnson and Newport (1989) on English grammar in native speakers of Korean learning English as a second language. The graph as rendered shows a simplified schematic of second language competence as a function of the age of second language acquisition.
Figure 2 is surprising from the standpoint of more general human learning. In the domain of language, infants and young children are superior learners when compared to adults, in spite of adults’ cognitive superiority. Language is one of the classic examples of a “critical” or “sensitive” period in neurobiology (Bruer, 2008; Johnson and Newport 1989; Knudsen, 2004; Kuhl, 2004; Newport et al., 2001).
Scientists are generally in agreement that this learning curve is representative of data across a wide variety of second-language learning studies (Bialystok and Hakuta, 1994; Birdsong and Molis, 2001; Flege et al., 1999; Johnson & Newport, 1989; Kuhl et al., 2005a; Kuhl et al., 2008; Mayberry and Lock, 2003; Neville et al., 1997; Weber-Fox and Neville, 1999; Yeni-Komshian et al., 2000; though see Birdsong, 1992; White and Genesee, 1996). Moreover, not all aspects of language exhibit the same temporally defined critical “windows.” The developmental timing of critical periods for learning at the phonetic, lexical, and syntactic levels of language varies, though studies cannot yet document the precise timing at each individual level. Studies indicate, for example, that the critical period for phonetic learning occurs prior to the end of the first year, whereas syntactic learning flourishes between 18 and 36 months of age. Vocabulary development “explodes” at 18 months of age, but does not appear to be as restricted by age as other aspects of language learning; one can learn new vocabulary items at any age. One goal of future research will be to document the “opening” and “closing” of critical periods for all levels of language and understand how they overlap and why they differ.
Given widespread agreement on the fact that we do not learn equally well over the lifespan, theory is currently focused on attempts to explain the phenomenon. What accounts for adults’ inability to learn a new language with the facility of an infant?
One of the candidate explanations was Lenneberg’s hypothesis that development of the corpus callosum affected language learning (Lenneberg, 1967; Newport et al., 2001). More recent hypotheses take a different perspective. Newport raised a “less is more” hypothesis, which suggests that infants’ limited cognitive capacities actually allow superior learning of the simplified language spoken to infants (Newport, 1990). Work in my laboratory led me to advance the concept of neural commitment, the idea that neural circuitry and overall architecture develop early in infancy to detect the phonetic and prosodic patterns of speech (Kuhl, 2004; Zhang et al., 2005, 2009). This architecture is designed to maximize the efficiency of processing for the language(s) experienced by the infant. Once established, the neural architecture arising from French or Tagalog, for example, impedes learning of new patterns that do not conform. I will return to the concept of the critical period for language learning, and the role that computational, cognitive, and social skills may play in accounting for the relatively poor performance of adults attempting to learn a second language.
The world’s languages contain approximately 600 consonants and 200 vowels (Ladefoged, 2001). Each language uses a unique set of about 40 distinct elements, phonemes, which change the meaning of a word (e.g., from bat to pat in English). But phonemes are actually groups of non-identical sounds, phonetic units, which are functionally equivalent in the language. Japanese-learning infants have to group the phonetic units r and l into a single phonemic category (Japanese r), whereas English-learning infants must uphold the distinction to separate rake from lake. Similarly, Spanish-learning infants must distinguish phonetic units critical to Spanish words (bano and pano), whereas English-learning infants must combine them into a single category (English b). If infants were exposed only to the subset of phonetic units that will eventually be used phonemically to differentiate words in their language, the problem would be trivial. But infants are exposed to many more phonetic variants than will be used phonemically, and have to derive the appropriate groupings used in their specific language. The baby’s task in the first year of life, therefore, is to make some progress in figuring out the composition of the 40-odd phonemic categories in their language(s) before trying to acquire words that depend on these elementary units.
Learning to produce the sounds that will characterize infants as speakers of their “mother tongue” is equally challenging, and is not completely mastered until the age of 8 years (Ferguson et al., 1992). Yet, by 10 months of age, differences can be discerned in the babbling of infants raised in different countries (de Boysson-Bardies, 1993), and in the laboratory, vocal imitation can be elicited by 20 weeks (Kuhl and Meltzoff, 1982). The speaking patterns we adopt early in life last a lifetime (Flege, 1991). My colleagues and I have suggested that this kind of indelible learning stems from a linkage between sensory and motor experience; sensory experience with a specific language establishes auditory patterns stored in memory that are unique to that language, and these representations guide infants’ successive motor approximations until a match is achieved (Kuhl and Meltzoff, 1996). This ability to imitate vocally may also depend on the brain’s social understanding mechanisms which form a human mirroring system for seamless social interaction (Hari and Kujala, 2009), and we will revisit the impact of the brain’s social understanding systems later in this review.
What enables the kind of learning we see in infants for speech? No machine in the world can derive the phonemic inventory of a language from natural language input (Rabiner and Huang, 1993), though models improve when exposed to “motherese,” the linguistically simplified and acoustically exaggerated speech that adults universally use when speaking to infants (de Boer and Kuhl, 2003). The variability in speech input is simply too enormous; Japanese adults produce both English r- and l- like sounds, exposing Japanese infants to both sounds (Lotto et al., 2004; Werker et al., 2007). How do Japanese infants learn that these two sounds do not distinguish words in their language, and that these differences should be ignored? Similarly, English speakers produce Spanish b and p, exposing American infants to both categories of sound (Abramson and Lisker, 1970). How do American infants learn that these sounds do not distinguish words in English? An important discovery in the 1970s was that infants initially hear all these phonetic differences (Eimas, 1975; Eimas et al., 1971; Lasky et al., 1975; Werker and Lalonde, 1988). What we must explain is how infants learn to group phonetic units into phonemic categories that make a difference in their language.
Another important discovery in the 1980s identified the timing of a crucial change in infant perception. The transition from an early universal perceptual ability to distinguish all the phonetic units of all languages to a more language specific pattern of perception occurred very early in development—between 6 and 12 months of age (Werker and Tees, 1984), and initial work demonstrated that infants’ perception of nonnative distinctions declines during the second half of the first year of life (Best and McRoberts, 2003; Rivera-Gaxiola et al., 2005; Tsao et al., 2006; Werker and Tees, 1984). Work in this laboratory also established a new fact: At the same time that nonnative perception declines, native language speech perception shows a significant increase. Japanese infants’ discrimination of English r-l declines between 8 and 10 months of age, while at the same time in development, American infants’ discrimination of the same sounds shows an increase (Kuhl et al., 2006) (Figure 3).
We argued that the increase observed in native-language phonetic perception represented a critical step in initial language learning and promoted language growth (Kuhl et al., 2006). To test this hypothesis, we designed a longitudinal study examining whether a measure of phonetic perception predicted children’s language skills measured 18 months later. The study demonstrated that infants’ phonetic discrimination ability at 6 months of age was significantly correlated with their success in language learning at 13, 16, and 24 months of age (Tsao et al., 2004). However, we recognized that in this initial study the association we observed might be due to infants’ cognitive skills, such as the ability to perform in the behavioral task, or to sensory abilities that affected auditory resolution of the differences in formant frequencies that underlie phonetic distinctions.
To address these issues, we assessed both native and nonnative phonetic discrimination in 7-month-old infants, and used both a behavioral measure (Kuhl et al., 2005a) and an event-related potential measure, the mismatch negativity (MMN), to assess infants’ performance (Kuhl et al., 2008). Using a neural measure removed potential cognitive effects on performance; the use of both native and nonnative contrasts addressed the sensory issue, since better sensory abilities would be expected to improve both native and nonnative speech discrimination.
The native language neural commitment (NLNC) view suggested that future language measures would be associated with early performance on both native and nonnative contrasts, but in opposite directions. The results conformed to this prediction. When both native and nonnative phonetic discrimination were measured in the same infants at 7.5 months of age, better native language perception predicted significantly higher language abilities at 14, 18, 24, and 30 months of age, whereas better nonnative phonetic perception at the same age predicted poorer language abilities at the same four future points in time (Kuhl et al., 2005a; Kuhl et al., 2008). As shown in Figure 4, the ERP measure at 7.5 months of age (Fig 4A) provided an MMN measure of speech discrimination for both native and nonnative contrasts; greater negativity of the MMN reflects greater discrimination (Fig 4B). Hierarchical linear growth modeling of vocabulary between 14 and 30 months for MMN values of +1SD and −1SD (Fig 4C) revealed that both native and nonnative phonetic discrimination significantly predict future language, but in opposite directions, with better native MMNs predicting advanced future language development and better nonnative MMNs predicting less advanced future language development.
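To make the sign convention concrete, the sketch below computes an MMN-style score as the mean of the deviant-minus-standard difference wave over a latency window. This difference-wave definition is the conventional one in the MMN literature, but the text above does not spell out how the cited studies computed it, so treat it as an assumption. The function name, the window bounds, and the five-sample waveforms are all hypothetical.

```python
def mismatch_negativity(standard_erp, deviant_erp, start, stop):
    """Mean amplitude of the deviant-minus-standard difference wave in [start, stop)."""
    if len(standard_erp) != len(deviant_erp):
        raise ValueError("ERPs must have the same number of samples")
    difference = [d - s for s, d in zip(standard_erp, deviant_erp)]
    window = difference[start:stop]
    if not window:
        raise ValueError("empty analysis window")
    return sum(window) / len(window)


# Toy averaged ERPs in arbitrary units; all values are invented for illustration.
standard = [0.0, 0.5, 1.0, 0.5, 0.0]
native_deviant = [0.0, 0.5, -1.0, -0.5, 0.0]    # large negative deflection
nonnative_deviant = [0.0, 0.5, 0.5, 0.0, 0.0]   # small negative deflection

native_mmn = mismatch_negativity(standard, native_deviant, 2, 4)
nonnative_mmn = mismatch_negativity(standard, nonnative_deviant, 2, 4)
print(native_mmn, nonnative_mmn)
```

In this toy case the native score is more negative than the nonnative one, which under the convention stated above would be read as stronger discrimination of the native contrast.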
The results are explained by NLNC: better native phonetic discrimination enhances infants’ skills in detecting words and this vaults them towards language, whereas better nonnative abilities indicate that infants remain at an earlier phase of development – sensitive to all phonetic differences. Infants’ ability to learn which phonetic units are relevant in the language(s) they are exposed to, while decreasing or inhibiting their attention to the phonetic units that do not distinguish words in their language, is the necessary step required to begin the path toward language. These data led to a theoretical argument that an implicit learning process commits the brain’s neural circuitry to the properties of native-language speech, and that neural commitment has bi-directional effects – it increases learning for patterns (such as words) that are compatible with the learned phonetic structure, while decreasing perception of nonnative patterns that do not match the learned scheme (Kuhl, 2004).
Recent data indicate very long-term associations between infants’ phonetic perception and future language and reading skills. Our studies show that the ability to discriminate two simple vowels at 6 months of age predicts language abilities and pre-reading skills such as rhyming at the age of 5 years, an association that holds regardless of socio-economic status and the children’s language skills at 2.5 years of age (Cardillo, 2010).
A surprising new form of learning, referred to as “statistical learning” (Saffran et al., 1996), was discovered in the 1990s. Statistical learning is computational in nature, and reflects implicit rather than explicit learning. It relies on the ability to automatically pick up and learn from the statistical regularities that exist in the stream of sensory information we process, and strongly influences both phonetic learning and early word learning.
For example, data show that the developmental change in phonetic perception between the ages of 6 and 12 months is supported by infants’ sensitivity to the distributional frequencies of the sounds in the language they hear, and that this affects perception. To illustrate, adult speakers of English and Japanese produce both English r- and l-like sounds, even though English speakers hear /r/ and /l/ as distinct and Japanese adults hear them as identical. Japanese infants are therefore exposed to both /r/ and /l/ sounds, even though they do not represent distinct categories in Japanese. The presence of a particular sound in ambient language, therefore, does not account for infant learning. However, distributional frequency analyses of English and Japanese show differential patterns of distributional frequency; in English, /r/ and /l/ occur very frequently; in Japanese, the most frequent sound of this type is Japanese /r/ which is related to but distinct from both the English variants. Can infants learn from this kind of distributional information in speech input?
A variety of studies show that infants’ perception of phonetic categories is affected by distributional patterns in the sounds they hear. In one study using very simple stimuli and short-term exposure in the laboratory, 6- and 8-month-old infants were exposed for 2 minutes to 8 sounds that formed a continuum of sounds from /da/ to /ta/ (Maye et al., 2002; see also Maye et al., 2008). All infants heard all the stimuli on the continuum, but experienced different distributional frequencies of the sounds. A “bimodal” group heard more frequent presentations of stimuli at the ends of the continuum; a “unimodal” group heard more frequent presentations of stimuli from the middle of the continuum. After familiarization, infants in the bimodal group discriminated the /da/ and /ta/ sounds, whereas those in the unimodal group did not. Furthermore, while previous studies show that infants integrate the auditory and visual instantiations of speech (Kuhl and Meltzoff, 1982; Patterson and Werker, 1999), more recent studies show that infants’ detection of statistical patterns in speech stimuli, like those used by Maye and her colleagues, is influenced both by the auditory event and the sight of a face articulating the sounds. When exposed only to the ambiguous auditory stimuli in the middle of a speech continuum, infants discriminated the /da-ta/ contrast when each auditory stimulus was paired with the appropriate face articulating either /da/ or /ta/; discrimination did not occur if only one face was used with all auditory stimuli (Teinonen et al., 2008).
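The bimodal versus unimodal manipulation can be sketched as two presentation-frequency tables over the same 8-step continuum. The counts below are invented for illustration and are not the frequencies used by Maye and colleagues; `count_modes` is a hypothetical helper that simply counts peaks in the frequency distribution, a deliberately crude stand-in for whatever computation infants actually perform.

```python
def count_modes(counts):
    """Count peaks in a frequency distribution, treating a flat plateau as one peak."""
    # Collapse runs of equal neighbours so a plateau is compared as a single point.
    collapsed = [c for i, c in enumerate(counts) if i == 0 or c != counts[i - 1]]
    modes = 0
    for i, c in enumerate(collapsed):
        left = collapsed[i - 1] if i > 0 else float("-inf")
        right = collapsed[i + 1] if i < len(collapsed) - 1 else float("-inf")
        if c > left and c > right:
            modes += 1
    return modes


# Hypothetical presentation counts for continuum steps 1..8 (/da/ end to /ta/ end).
# Both groups hear every step and the same total number of tokens.
bimodal = [4, 8, 4, 2, 2, 4, 8, 4]    # frequent tokens near the two ends
unimodal = [2, 4, 4, 8, 8, 4, 4, 2]   # frequent tokens in the middle

assert sum(bimodal) == sum(unimodal)
print("bimodal group:", count_modes(bimodal), "candidate categories")
print("unimodal group:", count_modes(unimodal), "candidate category")
```

The point of the sketch is that the two groups differ only in how often each token occurs, not in which tokens occur, so any difference in later discrimination has to come from sensitivity to the distribution itself.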
Cross-cultural studies also indicate that infants are sensitive to the statistical distribution of sounds they hear in natural language. Infants tested in Sweden and the United States at 6 months of age showed a unique response to vowel sounds that represent the distributional mean in productions of adults who speak the language (i.e., “prototypes”); this response was shown only for stimuli infants had been exposed to in natural language (native-vowel prototypes), not foreign-language vowel prototypes (Kuhl et al., 1992). Taken as a whole, these studies indicate infants pick up the distributional frequency patterns in ambient speech, whether they experience them during short-term laboratory experiments, or over months in natural environments, and can learn from them.
Statistical learning also supports word learning. Unlike written language, spoken language has no reliable markers to indicate word boundaries in typical phrases. How do infants find words? New experiments show that, before 8-month-old infants know the meaning of a single word, they detect likely word candidates through sensitivity to the transitional probabilities between adjacent syllables. In typical words, as in the phrase “pretty baby,” the transitional probabilities between the two syllables within a word, such as those between “pre” and “tty,” and between “ba” and “by,” are higher than those between syllables that cross word boundaries, such as “tty” and “ba.” Infants are sensitive to these probabilities. When exposed to a 2-min string of nonsense syllables, with no acoustic breaks or other cues to word boundaries, they treat syllables that have high transitional probabilities as “words” (Saffran et al., 1996). Recent findings show that even sleeping newborns detect this kind of statistical structure in speech, as shown in studies using event-related brain potentials (Teinonen et al., 2009). Statistical learning has been shown in nonhuman animals (Hauser et al., 2001), and in humans for stimuli outside the realm of speech, operating for musical and visual patterns in the same way as speech (Fiser and Aslin, 2002; Kirkham et al., 2002; Saffran et al., 1999). Thus, a very basic implicit learning mechanism allows infants, from birth, to detect statistical structure in speech and in other signals. Infants’ sensitivity to this statistical structure can influence both phoneme and word learning.
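The transitional-probability computation can be written down directly: the probability of syllable y given syllable x is the number of times the pair xy occurs divided by the number of times x occurs. The sketch below builds a continuous stream from four made-up three-syllable nonsense words and places a word boundary wherever that probability dips. The word list, the stream length, the 0.75 threshold, and the function names are assumptions chosen for illustration, not details taken from the experiments cited above.

```python
import random
from collections import Counter


def build_stream(words, n_words, seed=0):
    """Concatenate randomly ordered words into one syllable stream, with no immediate repeats."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in words if w != previous])
        stream.extend(word)
        previous = word
    return stream


def transitional_probabilities(syllables):
    """P(next | current) for every adjacent pair: count(current, next) / count(current)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(syllables, tps, threshold):
    """Insert a word boundary wherever the transitional probability falls below threshold."""
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tps[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words


# Made-up nonsense words; each syllable occurs in exactly one word.
WORDS = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku"), ("pa", "do", "ti")]

stream = build_stream(WORDS, n_words=300)
tps = transitional_probabilities(stream)
recovered = set(segment(stream, tps, threshold=0.75))
print(sorted(recovered))
```

Because every syllable here belongs to a single word, within-word transitions have probability 1.0, while transitions across a word boundary hover around one in three. Natural speech is far less tidy, which is one reason the laboratory result does not by itself settle how infants segment real language.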
As reviewed, infants show robust learning effects in statistical learning studies when tested in the laboratory with very simple stimuli (Maye et al., 2002; Maye et al., 2008; Saffran et al., 1996). However, complex natural language learning may challenge infants in a way that these experiments do not. Are there constraints on statistical learning as an explanation for natural language learning? A series of later studies suggests that this is the case. Laboratory studies testing infant phonetic and word learning from exposure to a complex natural language suggest limits on statistical learning, and provide new information suggesting that social brain systems are integrally involved, and, in fact, may be necessary to explain natural language learning.
The new experiments tested infants in the following way: At 9 months of age, the age at which the initial universal pattern of infant perception has changed to one that is more language-specific, infants were exposed to a foreign language for the first time (Kuhl et al., 2003). Nine-month-old American infants listened to 4 different native speakers of Mandarin during 12 sessions scheduled over 4–5 weeks. The foreign language “tutors” read books and played with toys in sessions that were unscripted. A control group was also exposed for 12 sessions but heard only English from native speakers. After infants in the experimental Mandarin exposure group and the English control group completed their sessions, all were tested with a Mandarin phonetic contrast that does not occur in English. Both behavioral and ERP methods were used. The results indicated that infants had a remarkable ability to learn from the “live-person” sessions – after exposure, they performed significantly better on the Mandarin contrast when compared to the control group that heard only English. In fact, they performed equivalently to infants of the same age tested in Taiwan who had been listening to Mandarin for 10 months (Kuhl et al., 2003).
The study revealed that infants can learn from first-time natural exposure to a foreign language at 9 months, and answered what was initially the experimental question: can infants learn the statistical structure of phonemes in a new language given first-time exposure at 9 months of age? If infants required a long-term history of listening to that language (as would be the case if infants needed to build up statistical distributions over the initial 9 months of life), the answer to our question would have been no. However, the data clearly showed that infants are capable of learning at 9 months when exposed to a new language. Moreover, learning was durable. Infants returned to the laboratory for their behavioral discrimination tests between 2 and 12 days after the final language exposure session, and between 8 and 33 days for their ERP measurements. No “forgetting” of the Mandarin contrast occurred during the 2- to 33-day delay.
We were struck by the fact that infants exposed to Mandarin were socially very engaged in the language sessions and began to wonder about the role of social interaction in learning. Would infants learn if they were exposed to the same information in the absence of a human being, say, via television or an audiotape? If statistical learning is sufficient, the television and audio-only conditions should produce learning. Infants who were exposed to the same foreign-language material at the same time and at the same rate, but via standard television or audiotape only, showed no learning—their performance equaled that of infants in the control group who had not been exposed to Mandarin at all (Figure 5).
Thus, the presence of a human being interacting with the infant during language exposure, while not required for simpler statistical-learning tasks (Maye et al., 2002; Saffran et al., 1996), is critical for learning in complex natural language-learning situations in which infants heard an average of 33,000 Mandarin syllables from a total of four different talkers over a 4–5-week period (Kuhl et al., 2003).
The impact of social interaction on language learning (Kuhl et al., 2003) led to the development of the Social Gating Hypothesis (Kuhl, 2007). “Gating” suggested that social interaction creates a vastly different learning situation, one in which additional factors introduced by a social context influence learning. Gating could operate by increasing: (1) attention and/or arousal, (2) information, (3) a sense of relationship, and/or (4) activation of brain mechanisms linking perception and action.
Attention and arousal affect learning in a wide variety of domains (Posner, 2004), and could impact infant learning during exposure to a new language. Infant attention, measured in the original studies, was significantly higher in response to the live person than to either inanimate source (Kuhl et al., 2003). Attention has been shown to play a role in the statistical learning studies as well. Ten-month-olds classified as “high-attenders” on the basis of their “looking time” learned from bimodal stimulus distributions when “low-attenders” did not (Yoshida et al., 2006; see also Yoshida et al., 2010). Heightened attention and arousal could produce an overall increase in the quantity or quality of the speech information that infants encode and remember. Recent data suggest a role for attention in adult second-language phonetic learning as well (Guion and Pederson, 2007).
A second hypothesis was raised to explain the effectiveness of social interaction – the live learning situation allowed the infants and tutors to interact, and this added contingent and reciprocal social behaviors that increased information that could foster learning. During live exposure, tutors focused their visual gaze on pictures in the books or on the toys as they spoke, and the infants’ gaze tended to follow the speaker’s gaze, as previously observed in social learning studies (Baldwin, 1995; Brooks and Meltzoff, 2002). Referential information is present in both the live and televised conditions, but it is more difficult to pick up via television, and is totally absent during audio-only presentations. Gaze following is a significant predictor of receptive vocabulary (Baldwin, 1995; Brooks and Meltzoff, 2005; Mundy and Gomes, 1998), and may help infants link the foreign speech to the objects they see. When 9-month-old infants follow a tutor’s line of regard in our foreign-language learning situation, the tutor’s specific meaningful social cues, such as eye gaze and pointing to an object of reference, might help infants segment word-like units from ongoing speech, thus facilitating phonetic learning of the sounds contained in those words.
If this hypothesis is correct, then the degree to which infants interact and engage socially with the tutor in the social language-learning situation should correlate with learning. In studies testing this hypothesis, 9-month-old infants were exposed to Spanish (Conboy and Kuhl, in press), extending the experiment to a new language. Other changes in method expanded the tests of language learning to include both Spanish phonetic learning and Spanish word learning, as well as adding measures of specific interactions between the tutor and the infant to examine whether interactive episodes could be related to learning of either phonemes or words.
The results confirmed Spanish language learning, both of the phonetic units of the language and the lexical units of the language (Conboy and Kuhl, in press). In addition, these studies answered a key question: does the degree of infants’ social engagement during the Spanish exposure sessions predict the degree of language learning as shown by ERP measures of Spanish phoneme discrimination? Our results (Figure 7) show that it does (Conboy et al., submitted). Infants who shifted their gaze between the tutor’s eyes and newly introduced toys during the Spanish exposure sessions showed a more negative MMN (indicating greater neural discrimination) in response to the Spanish phonetic contrast. Infants who simply gazed at the tutor or at the toy, showing fewer gaze shifts, produced less negative MMN responses. The degree of infants’ social engagement during sessions predicted both phonetic and word learning: infants who were more socially engaged showed greater learning as reflected by ERP brain measures of both phonetic and word learning.
Specific cognitive abilities, particularly the executive control of attention and the ability to inhibit a pre-potent response (inhibitory control), are associated with exposure to more than one language. Bilingual adult speakers show enhanced executive control skills (Bialystok, 1999, 2001; Bialystok and Hakuta, 1994; Wang et al., 2009), a finding that has been extended to young school-aged bilingual children (Carlson and Meltzoff, 2008). In monolingual infants, the decline in discrimination of nonnative contrasts (which promotes more rapid growth in language, see Fig. 4C) is associated with enhanced inhibitory control, suggesting that domain-general cognitive mechanisms underlying attention may play a role in enhancing performance on native and suppressing performance on nonnative phonetic contrasts early in development (Conboy et al., 2008b; Kuhl et al., 2008). In support of this view, it is noteworthy that in the Spanish exposure studies, a median split of the post-exposure MMN phonetic discrimination data revealed that infants showing greater phonetic learning had higher cognitive control scores post-exposure. These same infants did not differ in their pre-exposure cognitive control tests (Conboy, Sommerville, and Kuhl, in preparation). Taken as a whole, the data are consistent with the notion that cognitive skills are strongly linked to phonetic learning at the initial stage of phonetic development (Kuhl et al., 2008).
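For readers unfamiliar with the median-split analysis mentioned above, the following sketch shows the general procedure. It is an assumption-laden illustration, not the analysis code of the cited studies: every number is an invented placeholder, the field names `mmn` and `control` are hypothetical, and the group difference in the toy data exists only because the values were chosen that way.

```python
from statistics import mean, median

# Invented placeholder values, NOT data from the studies cited above.
# mmn: post-exposure MMN amplitude (more negative = greater discrimination).
# control: post-exposure cognitive control score (arbitrary units).
infants = [
    {"mmn": -3.1, "control": 0.80},
    {"mmn": -2.4, "control": 0.75},
    {"mmn": -1.9, "control": 0.70},
    {"mmn": -0.8, "control": 0.55},
    {"mmn": -0.5, "control": 0.60},
    {"mmn": 0.2, "control": 0.50},
]


def median_split(records, key):
    """Divide records into those at or below the median of `key` and those above it."""
    cut = median(r[key] for r in records)
    at_or_below = [r for r in records if r[key] <= cut]
    above = [r for r in records if r[key] > cut]
    return at_or_below, above


# A more negative MMN indicates greater learning, so the lower half are the stronger learners.
greater_learning, lesser_learning = median_split(infants, "mmn")
greater_mean = mean(r["control"] for r in greater_learning)
lesser_mean = mean(r["control"] for r in lesser_learning)
```

A real analysis would follow the split with an inferential test on the two groups' scores; that step is omitted here.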
While attention and the information provided by interaction with another may help explain social learning effects for language, it is also possible that social contexts are connected to language learning through even more fundamental mechanisms. Social interaction may activate brain mechanisms that invoke a sense of relationship between the self and other, as well as social understanding systems that link perception and action (Hari and Kujala, 2009). Neuroscience research focused on shared neural systems for perception and action has a long tradition in speech research (Liberman and Mattingly, 1985), and interest in “mirror systems” for social cognition has re-invigorated this tradition (Kuhl and Meltzoff, 1996; Meltzoff and Decety, 2003; Pulvermuller, 2005; Rizzolatti, 2005; Rizzolatti and Craighero, 2004). Might the brain systems that link perception and production for speech be engaged when infants experience social interaction during language learning?
The effects of Spanish language exposure extend to speech production, and provide evidence of an early coupling of sensory-motor learning in speech. The English-learning infants who were exposed to 12 sessions of Spanish (Conboy and Kuhl, in press) showed subsequent changes in their patterns of vocalization (Ward et al., 2009). When presented with language from a Spanish speaker (but not from an English speaker), a new pattern of infant vocalizations was evoked, one that reflected the prosodic patterns of Spanish, rather than English. This only occurred in response to Spanish, and only occurred in infants who had been exposed to Spanish in the laboratory experiment.
Neuroscience studies using speech and imaging techniques have the capacity to examine whether the brain systems involved in speech production are activated when infants listen to speech. Two new infant studies take a first step towards an answer to this developmental issue. Imada et al. (2006) used magnetoencephalography (MEG) to study newborns, 6-month-old infants, and 12-month-old infants while they listened to nonspeech, harmonics, and syllables (Figure 7). Dehaene-Lambertz and colleagues (2006) used fMRI to scan 3-month-old infants while they listened to sentences. Both studies show activation in brain areas responsible for speech production (the inferior frontal region, Broca’s area) in response to auditorily presented speech. Imada et al. reported synchronized activation in response to speech in auditory and motor areas at 6 and 12 months, and Dehaene-Lambertz et al. reported activation in motor speech areas in response to sentences in 3-month-olds. Is activation of Broca’s area to the pure perception of speech present at birth? Newborns tested by Imada et al. (2006) showed no activation in motor speech areas for any signals, whereas auditory areas responded robustly to all signals, suggesting the possibility that perception-action linkages for speech develop by 3 months of age as infants begin to produce vowel-like sounds.
Using the tools of modern neuroscience, we can now ask how the brain systems responsible for speech perception and speech production forge links early in development, and whether these same brain areas are involved when language is presented socially, but not when language is presented through a disembodied source such as a television set.
MEG studies will provide an opportunity to examine brain rhythms associated with broader cognitive abilities during speech learning. Brain oscillations in various frequency bands have been associated with cognitive abilities. The induced brain rhythms have been linked to attention and cognitive effort, and are of primary interest since MEG studies with adults have shown that cognitive effort is increased when processing nonnative speech (Zhang et al., 2005; 2009). In the adult MEG studies, participants listened to native- and nonnative-language sounds. The results indicated that when participants listened to native-language sounds, the brain’s activation was more focal, and faster, than when they listened to nonnative-language sounds (Zhang et al., 2005). In other words, there was greater neural efficiency for native as opposed to nonnative speech processing. Training studies show that adults can improve nonnative phonetic perception when training occurs under more social learning conditions, and MEG measures before and after training indicate that neural efficiency increases after training (Zhang et al., 2009). Similar patterns of neural inefficiency occur as young children learn words. Young children’s event-related brain potential responses are more diffuse and become more focally lateralized in the left hemisphere’s temporal regions as they develop (Conboy et al., 2008a; Durston et al., 2002; Mills et al., 1993, 1997; Tamm et al., 2002), and studies with young children with autism show this same pattern – more diffuse activation – when compared to typically developing children of the same age (Coffey-Corina et al., 2008).
Brain rhythms may be reflective of these same processes in infants as they learn language. Brain oscillations in four frequency bands have been associated with cognitive effects: theta (4–7 Hz), alpha (8–12 Hz), beta (13–30 Hz) and gamma (30–100 Hz). Resting gamma has been related to early language and cognitive skills in the first three years (Benasich et al., 2008). The induced theta rhythm has been linked to attention and cognitive effort, and will be of strong interest to speech researchers. Power in the theta band increases with memory load in adults tested in either verbal or nonverbal tasks (Gevins et al., 1997; Krause et al., 2000) and in 8-month-old infants tested in working memory tasks (Bell and Wolfe, 2007). Examining brain rhythms in infants using speech stimuli is now underway using EEG with high-risk infants (Percaccio et al., 2010) and using MEG with typically developing infants (Bosseler et al., 2010), as they listen to native and nonnative speech. Comparisons between native and nonnative speech may allow us to examine whether there is increased cognitive effort associated with processing nonnative language, across age and populations. We are also testing whether language presented in a social environment affects brain rhythms in a way that television and audiotape presentations do not. Neural efficiency is not observable with behavioral approaches—and one promise of brain rhythms is that they provide the opportunity to compare the higher-level processes that likely underlie humans’ neural plasticity for language early in development in typical children as well as in children at risk for autism spectrum disorder, and in adults learning a second language. These kinds of studies may reveal the cortical dynamics underlying the “Critical Period” for language.
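As a rough illustration of how power in the four bands listed above can be summarized, the sketch below sums spectral power per band for a synthetic signal. It is a minimal example under stated assumptions, not the pipeline used in the cited EEG or MEG studies: the signal is artificial (a 6 Hz component plus a weaker 20 Hz component), the names `BANDS` and `band_power` are hypothetical, a plain discrete Fourier transform stands in for the time-frequency methods typically used for induced rhythms, and the shared 30 Hz edge is assigned to beta by convention here.

```python
import cmath
import math

# Band edges in Hz as listed in the text. Order matters: a frequency is
# assigned to the first band that contains it, so 30 Hz counts as beta.
BANDS = {
    "theta": (4.0, 7.0),
    "alpha": (8.0, 12.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 100.0),
}


def band_power(signal, sample_rate):
    """Sum squared DFT magnitudes within each band (plain O(n^2) DFT)."""
    n = len(signal)
    power = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        for name, (low, high) in BANDS.items():
            if low <= freq <= high:
                power[name] += abs(coeff) ** 2
                break
    return power


# Synthetic one-second signal: strong 6 Hz (theta) plus weaker 20 Hz (beta).
sample_rate = 256
signal = [
    math.sin(2 * math.pi * 6 * t / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 20 * t / sample_rate)
    for t in range(sample_rate)
]
power = band_power(signal, sample_rate)
dominant = max(power, key=power.get)
```

For this synthetic input, theta carries the most power, with beta second and essentially no power in the alpha or gamma ranges.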
These results underscore the importance of a social interest in speech early in development in both typical and atypical populations. An interest in “motherese,” the universal style with which adults address infants across cultures (Fernald and Simon, 1984; Grieser and Kuhl, 1988), provides a good metric of the value of a social interest in speech. The acoustic stretching in motherese, observed across languages, makes phonetic units more distinct from one another (Burnham et al., 2002; Englund, 2005; Kuhl et al., 1997; Liu et al., 2003, 2007). Mothers who use the exaggerated phonetic patterns to a greater extent when talking to their typically developing 2-month-old infants have infants who show significantly better performance in phonetic discrimination tasks when tested in the laboratory (Liu et al., 2003). New data show that the potential benefits of early motherese extend to the age of 5 years (Liu et al., 2009). Recent ERP studies indicate that the exaggerated patterns of motherese elicit an enhanced N250 in infants’ brain responses, as well as increased neural synchronization at frontal-central-parietal sites (Zhang, Koerner, Miller, Grice-Patil, Svec, Tusler and Carney, in press).
It is also noteworthy that children with Autism Spectrum Disorder (ASD) prefer to listen to non-speech rather than speech, when given a choice, and this preference is strongly correlated with the children’s ERP brain responses to speech, as well as with the severity of their autistic symptoms (Kuhl et al., 2005b). Early speech measures may therefore provide an early biomarker of risk for ASD. Neuroscience studies in both typically developing children and children with ASD that examine the coherence and causality of interaction between social and linguistic brain systems will provide valuable new theoretical data and may improve the early diagnosis and treatment of children with autism.
Humans are not the only species in which communicative learning is affected by social interaction (see Fitch et al., 2010 for review). Young zebra finches need visual interaction with a tutor bird to learn song in the laboratory (Eales, 1989). A zebra finch will override its innate preference for conspecific song if a Bengalese finch foster father feeds it, even when adult zebra finch males can be heard nearby (Immelmann, 1969). More recent data indicate that male zebra finches vary their songs across social contexts; songs produced when singing to females vary from those produced in isolation, and females prefer these ‘directed’ songs (Woolley and Doupe, 2008). Moreover, gene expression in high-level auditory areas is involved in this kind of social context perception (Woolley and Doupe, 2008). White-crowned sparrows, which reject the audiotaped songs of alien species, learn the same alien songs when a live tutor sings them (Baptista and Petrinovich, 1986). In barn owls (Brainard and Knudsen, 1998) and white-crowned sparrows (Baptista and Petrinovich, 1986), a richer social environment extends the duration of the sensitive period for learning. Social contexts also advance song production in birds; male cowbirds respond to the social gestures and displays of females, which affect the rate, quality, and retention of song elements in their repertoires (West and King, 1988), and white-crowned sparrow tutors provide acoustic feedback that affects the repertoires of young birds (Nelson and Marler, 1994). Studies of the brain systems linking social and auditory-vocal learning in humans and birds may significantly advance theories in the near future (Doupe and Kuhl, 2008).
Our current model of neural commitment to language describes a significant role for cognitive processes such as attention in language learning (Kuhl et al., 2008). Studies of brain rhythms in infants and other neuroscience research in the next decade promise to reveal the intricate relationships between language and cognitive processes.
Language evolved to address a need for social communication and evolution may have forged a link between language and the social brain in humans (Adolphs, 2003; Dunbar, 1998; Kuhl, 2007; Pulvermuller, 2005). Social interaction appears to be necessary for language learning in infants (Kuhl et al., 2003), and an individual infant’s social behavior is linked to their ability to learn new language material (Conboy and Kuhl, in press). In fact, social “gating” may explain why social factors play a far more significant role than previously realized in human learning across domains throughout our lifetimes (Meltzoff et al., 2009). If social factors “gate” computational learning, as proposed, infants would be protected from meaningless calculations – learning would be restricted to signals that derive from live humans rather than other sources (Doupe and Kuhl, 2008; Evans and Marler, 1995; Marler, 1991). Constraints of this kind appear to exist for infant imitation: when infants hear nonspeech sounds with the same frequency components as speech, they do not attempt to imitate them (Kuhl et al., 1991).
Research has begun to appear on the development of the neural networks in humans that constitute the “social brain” and invoke a sense of relationship between the self and other, as well as on social understanding systems that link perception and action (Hari and Kujala, 2009). Neuroscience studies using speech and imaging techniques are beginning to examine links between sensory and motor brain systems (Pulvermuller, 2005; Rizzolatti and Craighero, 2004), and the fact that MEG has now been demonstrated to be feasible for developmental studies of speech perception in infants during the first year of life (Imada et al., 2006) provides exciting opportunities. MEG studies of brain activation in infants during social versus nonsocial language experience will allow us to investigate cognitive effects via brain rhythms and also examine whether social brain networks are activated differentially under the two conditions.
Many questions remain about the impact of cognitive skills and social interaction on natural speech and language learning. As reviewed, new data show the extensive interface between cognition and language and indicate that whether or not multiple languages are experienced in infancy affects cognitive brain systems. The idea that social interaction is integral to language learning has been raised previously for word learning; however, previous data and theorizing have not tied early phonetic learning to social factors. Doing so suggests a more fundamental connection between the motivation to learn socially and the mechanisms that enable language learning.
Understanding how language learning, cognition, and social processing interact in development may ultimately explain the mechanisms underlying the critical period for language learning. Furthermore, understanding the mechanism underlying the critical period may help us develop methods that more effectively teach second languages to adult learners. Neuroscience studies over the next decade will lead the way on this theoretical work, and also advance our understanding of the practical results of training methods, both for adults learning new languages, and children with developmental disabilities struggling to learn their first language. These advances will promote the science of learning in the domain of language, and potentially, shed light on human learning mechanisms more generally.
The author and research reported here were supported by a grant from the National Science Foundation’s Science of Learning Program to the University of Washington LIFE Center (SBE-0354453), and by grants from the National Institutes of Health (HD37954, HD55782, HD02274, DC04661).