Infants rapidly accrue information via imitation from multiple sources, including television and electronic toys. In two experiments, we examined whether adding sound effects to video or live demonstrations would influence imitation by 6-, 12-, and 18-month-old infants. In Experiment 1, we added matching and mismatching sound effects to target actions presented by a televised model. We found that 6-month-olds reproduced the target actions regardless of whether the sound effects were matched or mismatched, but 12- and 18-month-olds reproduced the actions only when the sound effects were matched. In Experiment 2, we added matching sound effects to target actions presented by a live model. The addition of sound effects disrupted imitation performance by 6-, 12-, and 18-month-olds. Overall, imitation provides a clear behavioral measure of rapid changes in learning from television and electronic toys during infancy. These findings have practical implications for producers and parents regarding learning in the digital age and theoretical implications regarding the development of integrated action-perception representational systems.
Recently, the number of infant-directed television shows and electronic toys sold to parents of young children has increased, and many infants are exposed to television and electronic toys on a daily basis (Hirsh-Pasek, Golinkoff, & Dyer, 2003; Rideout & Hamel, 2006; Rideout, Vandewater, & Wartella, 2003; Zimmerman, Christakis, & Meltzoff, 2007). Parents ascribe academic learning to such structured play activities (Fisher, Hirsh-Pasek, Golinkoff, & Gryfe, 2008; Garrison & Christakis, 2005; Hirsh-Pasek et al., 2003; Singer & Singer, 2005; Wong, Uribe-Zarain, Golinkoff, Fisher, & Hirsh-Pasek, 2008), but there is a lack of empirical research on learning from television or electronic devices during infancy.
Infants and young children typically learn better from a live adult than they do from a video presentation, a phenomenon known as the video deficit effect (Anderson & Pempek, 2005). In Barr and Hayne's (1999) study, infants were shown a puppet task modeled live or on television. During the demonstration session, an experimenter removed, shook, and replaced a mitten on the puppet's hand. The 12-, 15-, and 18-month-olds imitated the live model after a 24-hour delay. In contrast, only 18-month-olds imitated the televised model, and their performance was inferior to that of the age-matched group who had seen the live model. Imitation tasks reveal evidence of the video deficit effect until children are 3 years of age, and perhaps beyond depending on task complexity (Flynn & Whiten, 2008; Hayne, Herbert, & Simcock, 2003; Hudson & Sheffield, 1999; Klein, Hauf, & Aschersleben, 2006; McCall, Parke, & Kavanaugh, 1977; Strouse & Troseth, 2008). The video deficit is also exhibited in object search tasks (Deocampo & Hudson, 2005; Schmitt & Anderson, 2002; Suddendorf, 2003; Troseth, 2003; Troseth & DeLoache, 1998), emotion processing tasks with infants (Mumme & Fernald, 2003), self-recognition tasks (Povinelli, Landau, & Perilloux, 1996; Skouteris, Spataro, & Lazaridis, 2006; Suddendorf, Simcock, & Nielsen, 2007), and language-based tasks with infants, toddlers (Kuhl, Tsao, & Liu, 2003), and preschoolers (Sell, Ray, & Lovelace, 1995). Similarly, researchers using event-related potentials (ERPs) have demonstrated that 18-month-olds process 2D images more slowly than they process 3D objects, recognizing a familiar 3D object very early in the attention process and recognizing a 2D digital photo significantly later (Carver, Meltzoff, & Dawson, 2006). 
Having failed to incorporate typical attention-capturing formal features of television into the experimental stimuli, prior experimental studies may have underestimated the capacity of infants to learn from television (Barr & Hayne, 1999; Hayne et al., 2003; Hudson & Sheffield, 1999; McCall et al., 1977; Meltzoff, 1988).
Formal features are the auditory and visual production and editing techniques characterizing television, such as action, sound effects, and pacing (the rate of scene and character changes). Some features, such as sound effects and rapid action are perceptually salient and likely to elicit attention and interest, whereas other features such as dialogue are not salient but are important in processing the narrative (Huston & Wright, 1983).
There are a number of salient formal features and specific content that consistently increase toddlers' and preschoolers' selective attention to television content. In particular, attention to televised content increases and remains high in the presence of female adults, character action, children, puppets, animation, active movement (including dancing and repetition), singing and lively music, peculiar voices, and sound effects (Anderson & Levin, 1976; Calvert, Huston, Watkins, & Wright, 1982; Huston & Wright, 1983; Schmitt, Anderson, & Collins, 1999). Attention decreases as the length of a segment increases, during low action sequences, and during periods of adult narration or abstract adult dialogue (Anderson & Levin, 1976; Calvert et al., 1982; Huston & Wright, 1983; Schmitt et al., 1999). Research has also shown that formal visual effects such as cuts, zooms, and pans, known as montage, enhance the attention of preschoolers (Calvert et al., 1982; Schmitt et al., 1999; Smith, Anderson, & Fischer, 1985) but not as effectively as formal auditory features, including sound effects (Huston-Stein & Wright, 1979; Rice, Huston, & Wright, 1982).
Processing of such formal features takes both cognitive skills and experience, and comprehension of televised content depends upon comprehension of formal features (Beentjes, de Koning, & Huysmans, 2001). According to the sampling model of attention, during early childhood, attention increases in the presence of a perceptually salient formal feature such as a sound effect because it elicits a primitive orienting response, thereby improving comprehension of contiguously presented content (Calvert & Scott, 1989; Calvert et al., 1982; Huston & Wright, 1983; Rice et al., 1982; Schmitt et al., 1999). Calvert and colleagues (1982) found support for the sampling model of attention and reported that salient features, such as sound effects and visual special effects, increased children's attention. Furthermore, when these features were accompanied by central plot information during the television program, comprehension scores of the central messages increased. Theoretically, sound effects initially create an attentional orienting response, which later becomes a learned signal or marker that important content will follow, thereby disrupting any habituation process (Calvert et al., 1982; Huston & Wright, 1983). Through this process, children learn to use sound effects as guides for their selective attention to important plot-relevant content. Such formal features could be used to provide an entry point for very young children's viewing. That is, features like sound effects could be used to assist very young children's attention to, and imitation of, targeted content and overcome the video deficit (Calvert et al., 1982; Rice et al., 1982). According to common coding theory, it is also possible that the addition of salient sound effects to live demonstrations would increase attention and information processing in similarly positive ways.
According to the common coding theory, there is a similar representation for action perception and action production (Prinz, 1997; see also Aschersleben, 2006; Meltzoff, 1993). A second major assumption of this theory is that action effects should have large consequences for action production. That is, a salient consequence of an action should be coded and increase the likelihood that the action is reproduced for any given goal-directed sequence. These assumptions have large consequences for imitative behavior. If actions are perceived and acted upon using a similar representational coding scheme, then imitation becomes likely (Aschersleben, 2006). Alternatively, if the match between perceived actions and later opportunities to reproduce those actions decreases, then the likelihood of imitation decreases. For example, if perceptual information, such as auditory and visual features, is mismatched, this mismatch will decrease the likelihood that later imitation will occur. Televised demonstrations provide an avenue to test this assumption.
Studies analyzing the role of action effects on action control have shown that adding action effects increases imitation scores (e.g., Elsner, Hauf, & Aschersleben, 2007; Klein et al., 2006; for review see Aschersleben, 2006). For example, Klein and colleagues (2006) examined whether adding sound effects to different parts of a sequence influenced imitation from television by 12-month-olds. They found that the target action accompanied by the sound effect was most likely to be reproduced and most likely to be produced first. They concluded that the salient action effect allowed the infant to infer the goal of the demonstrator and thereby increased the imitation of the goal-directed behavior. These authors have not, however, tested the effect of mismatching the action effect and the action during the demonstration. It is possible that the salience of the action effect simply directed additional attention to the target action. A stronger test of the hypothesis would be to examine whether the salient action effect needs to occur at the same time as the action. Experimentally, it is possible to do this with televised demonstrations.
The video deficit may not be due to a perceptual processing problem, but rather to a difficulty transferring perceptual information to action-based information. For example, Hofer, Hauf, and Aschersleben (2007) showed that 6-month-old infants perceived goal-directed actions from video as readily as they perceived them from a live demonstration. The authors argue that a potential reason for the video deficit might be that infants have to match the perceptual information and transfer it to their behavioral repertoire (for similar arguments see also Barr & Hayne, 1999; Suddendorf, 2003).
It is also possible that other factors that influence the coupling of action and perception are what govern learning from television and electronic devices. The video deficit effect can be ameliorated by repeating the target actions (Barr, Muentener, & Garcia, 2007a; Barr, Muentener, Garcia, Chavez, & Fujimoto, 2007b). These findings are consistent with the notion that infants require more streams of information to encode and retrieve information from a 2D source than from a 3D source (Johnson, 2000). These findings also suggest that the addition of auditory information might enhance imitation from television. Alternatively, the addition of audio-visual components may increase the task complexity and decrease task performance. Similarly, electronic toys present additional streams of information for infants to assimilate into their developing representational systems and, as such, may enhance learning by focusing attention selectively on target information or increase cognitive load by requiring additional integration of auditory and visual streams of information.
Nonverbal imitation paradigms allow researchers to directly examine learning from media and electronic toys by the youngest audiences (Barr et al., 2007a, b; Barr & Hayne, 1999; Hayne et al., 2003; Huang & Charman, 2005; Hudson & Sheffield, 1999; Klein et al., 2006; McCall et al., 1977; Meltzoff, 1988). The present study is designed to examine age-related changes in imitation as a function of the addition of sound effects during televised and live demonstrations. We have two research questions: 1) Can 6-, 12-, and 18-month-old infants imitate from a video demonstration if added sound effects are matched or mismatched to target actions? (Experiment 1) and 2) Can 6-, 12-, and 18-month-old infants imitate from a live demonstration if sound effects are matched to target actions? (Experiment 2).
We chose to use a puppet imitation task for two major reasons. First, it has an age-invariant baseline from 6 to 24 months (Barr, Dowden, & Hayne, 1996). Second, infants attend well to puppets on television and salient sound effects have been added to electronic toys, making the stimulus an ideal choice to examine how adding electronic sound effects influences learning from live and televised models (Anderson & Levin, 1976). The following experiments are a replication of research conducted by Barr and colleagues (2007a), except that sound effects were added to the video and live demonstration conditions. Barr and colleagues (2007a) examined the influence of repeating the target actions on imitation from television during infancy. They demonstrated that 6-month-olds learned equally well from live and televised models, 12-month-olds learned equally well from live and televised models if the televised actions were repeated, and 18-month-olds continued to learn less from televised than from live models regardless of repetition. The finding that infants as young as 6 months can imitate limited actions demonstrated by videotaped models after a 24-hour delay was surprising given the complex representational nature of the task. To our knowledge, this was the first study to demonstrate deferred imitation from television by infants during the first year of life. One important parameter of this task differs across ages. Due to immature motor planning skills (infants only begin reaching at 5 months), 6-month-olds require additional time at test to reproduce the target actions. Specifically, 6-month-olds are given 120 s rather than 90 s at test to reproduce the target actions. In the present study, we decided to keep the number of demonstrations for the video condition constant across age and to continue to give 6-month-olds a 120 s test session.
Imitation paradigms allow the manipulation of a number of important media variables. First, people do not typically encounter actual television actors. In imitation studies, this can be simulated by having one experimenter demonstrate the target actions on television and another experimenter interact with the child in the real world. Second, infants would only infrequently have immediate access to what they see on television. To more closely simulate these real-world conditions, researchers therefore use a deferred imitation procedure. The experimenter demonstrates the target actions on television, and infants are tested after a delay. From a theoretical perspective, deferred imitation from television is a complex representational task (e.g., Barr & Hayne, 1999, 2000; DeLoache & Korac, 2003; Meltzoff, 1988; Strouse & Troseth, 2008; Suddendorf, 2003). It has been argued that successful completion of the imitation task from a videotaped model requires participants to form a memory of the event on television and then, after a delay, to transfer that memory to 3D objects in the real world and reproduce the target actions.
The experimental design is based on habituation studies that are used to test the Intersensory Redundancy Hypothesis (e.g., Bahrick & Lickliter, 2000). Intersensory redundancy occurs when an action is accompanied by simultaneous bimodal information. For example, when a book falls on the ground, the observer simultaneously sees the book fall and hit the ground, and hears the book hitting the ground. Bahrick and Lickliter (2004) examined the influence of bimodal presentation on habituation patterns in 5-month-olds. Five-month-olds were habituated to the rhythmic tapping of a hammer. The action was presented bimodally with either a synchronous audio-visual presentation, or an asynchronous audio-visual presentation. At test, infants were presented with a different rhythm. Recovery of looking time indexed whether infants discriminated the new rhythm from the old rhythm. Performance was compared to no-change control groups who did not recover looking time. Only infants in the bimodal synchronous condition significantly recovered looking time.
We were interested in how sound effects would influence learning from television, and adopted a similar design strategy to Bahrick and Lickliter (2004) for the present experiment. Six-, 12-, and 18-month-old infants were randomly assigned to one of the following conditions: video matched sound effects (SE), video mismatched SE, or baseline control. We also conducted a cross-experiment comparison using data collected by Barr and colleagues (2007a) to contrast matched, mismatched, and no sound effects groups. It was hypothesized that the addition of matched sound effects would enhance imitation by increasing the salience and comprehensibility of the modeled actions, whereas mismatched sound effects would diminish rates of imitation.
The final sample consisted of 94 infants (31 6-month-olds, 30 12-month-olds, and 33 18-month-olds; 43 boys, 51 girls) recruited from commercial mailing lists and by word of mouth, all of whom were randomly assigned to the video matched sound effects (SE), video mismatched sound effects (SE), and baseline control (n=18) groups. There were three additional infants (1 6-month-old and 2 18-month-olds) in the video matched sound effects groups, but otherwise there were n = 12 per group for video matched SE and video mismatched SE groups. The 6-month-olds had a mean age of 6 months, 18 days (SD = 8 days), the 12-month-olds had a mean age of 12 months, 17 days (SD = 11 days) and the 18-month-olds had a mean age of 18 months, 13 days (SD = 8 days). Participants were African-American (n = 4), Asian (n = 3), Caucasian (n = 62), Latino (n = 4), Native American (n = 1), Mixed (n = 17), and one family did not report. The parents' mean educational attainment was 17.23 years (SD = 1.10), and the mean rank of socioeconomic index (SEI, Nakao & Treas, 1992) was 76.7 (SD = 15.2) reported by 93.6% and 100% of the sample, respectively. Educational attainment, occupational status, and annual income are the major components of socioeconomic status. The SEI ranks 503 occupations listed in the 1980 US census on a scale of 1 to 100, with higher status occupations (e.g., physician) being accorded higher ranks (Nakao & Treas, 1992). Testing was discontinued on additional infants for refusal to touch the stimuli at test (n = 13), excessive crying (n = 2), sibling interference (n = 1), failure to pass manipulation check (n = 1), refusal to sit during test (n = 1), less than 50% attention during the demonstration (n = 3), equipment failure (n = 2), or experimenter error (n = 3).
Using a partial replication approach, a pooled baseline was created by including 12 additional baseline control infants at each age from our most recently published study using the same stimuli (Barr et al., 2007a) to make a total control group of n = 18 per age group (for a similar rationale see also Barr, Rovee-Collier, & Campanella, 2005). Twelve infants at each age (20 female and 16 male) were included. These infants were recruited from the same geographical location and their demographics are very similar. The 6-month-olds had a mean age of 6 months, 17 days (SD = 12 days), the 12-month-olds had a mean age of 12 months, 17 days (SD = 9 days) and the 18-month-olds had a mean age of 18 months, 17 days (SD = 7 days). Participants were Asian (n = 4), Caucasian (n = 24), Latino (n = 3), Mixed (n = 3), and one family did not report. The parents' mean educational attainment was 16.23 years (SD = .94), and the mean rank of socioeconomic index (SEI, Nakao & Treas, 1992) was 75.1 (SD = 15.9) reported by 94.4% and 97.2% of the sample, respectively. These infants did not see a demonstration of the target actions prior to the test.
Data from Barr et al. (2007a) were also used to gain a cross-experiment comparison group of infants who observed actions presented via video but without sound effects. The demographics of the Barr et al. (2007a) video no SE groups were also very similar to infants in the video matched SE and video mismatched SE groups. The 6-month-olds had a mean age of 6 months, 22 days (SD = 6.1 days), the 12-month-olds had a mean age of 12 months, 18 days (SD = 6.7 days) and the 18-month-olds had a mean age of 18 months, 15 days (SD = 8.2 days). Participants were African-American (n = 1), Asian (n = 4), Caucasian (n = 23), Latino (n = 4), or of Mixed race (n = 4). The parents' mean educational attainment was 17.0 years (SD = 1.1), and the mean rank of socioeconomic status (Nakao & Treas, 1992) was 74.2 (SD = 16.5) reported by 100% and 95% of the sample, respectively.
Four hand puppets (a pastel pink rabbit, a pale grey mouse, a black-and-white cow, and a yellow duck) were constructed for these experiments and were not commercially available. All puppets were 60 cm in height and were made of soft, acrylic fur. A removable felt mitten (8 cm × 9 cm) was placed over the right hand of each puppet. The mitten was pink, grey, black, or yellow and matched the color of the rabbit, mouse, cow, or duck, respectively. During the demonstration session, a large jingle bell was secured inside the mitten. The puppets (rabbit, mouse, cow, or duck) were counterbalanced within groups.
Eight professionally produced 60 s video segments, two for each stimulus, one with matched sound effects and one with mismatched sound effects, were made for the study. In the matched condition, there were 4 separate sound effects (a 0.5 s remove sound effect, a 0.5 s swoosh for movement across the puppet, a 0.5 s pause, 5 s of bell ringing, a 1 s pause, and then a 0.5 s squelch sound for replacing the mitten). For the mismatched condition, the exact same sound effects were used but they were out of synchrony with the demonstration (5 s of bell ringing, a 1 s pause, a 0.5 s remove sound effect, a 0.5 s swoosh sound effect, a 1 s pause, and a 0.5 s replace squelch after the mitten had been replaced). In the mismatched condition, the bell ringing occurred when the mitten was being removed and also stopped while the mitten was still moving. In each video segment, the puppet was centered in the middle of the screen and was filmed at a close range. Similar to a live demonstration, the adult model's hands and arms were visible throughout the presentation, and the face of the experimenter was only partially visible because the puppet was placed in front of his face. The segments were recorded onto both videotapes and DVDs.
Infants were tested in their own homes at a time when parents said they were most likely to be awake and alert. This time varied across infants but remained relatively constant across sessions of the same infant. They viewed the demonstration on their own televisions. Family television screens ranged from 23 to 152 cm (M = 68 cm, SD = 22). All sessions were videotaped for later analysis.
Infants in the video matched SE and video mismatched SE groups participated in this session. An experimenter demonstrated three specific actions on the puppet 6 times in succession on videotape/DVD. The puppet target actions lasted a total of 52 s and the entire video demonstration, allowing for the experimenter to narrate standard phrases and say hello and goodbye, lasted 65 s. For the video matched SE group, a cartoon sound effect accompanied each of the actions: removing, shaking, and replacing the mitten. For the video mismatched SE group, the sound effects were mismatched to the target actions, such that the sound of bell ringing occurred when the mitten had not been removed and stopped while the mitten was still in motion. The caregiver and infant were seated approximately 80 cm from the family's own television set such that the screen was at the infant's eye level but out of reach. The video started after the infant and caregiver were correctly positioned. Both the caregiver and the experimenter directed the infant's attention to the television screen using the child's name and the word “look” but did not describe the target actions. To increase the ecological validity of the study, the video model was not present in the home because infants do not typically meet television presenters.
The test session occurred following a 24-hour delay and was identical for all groups. During the test session, there was no bell in the mitten. The experimenter placed the puppet within the infant's reach, and the infant was allowed 90 s (120 s for 6-month-olds) from the time that he or she first touched the puppet to imitate the target actions. Infants in the experimental groups were tested with the same puppet that they had seen the day before. Performance was compared to that of the age-matched baseline control groups. The control group was used to assess the spontaneous production of the target actions in the absence of the demonstration. Infants in the baseline control group did not participate in the demonstration session. Rather, they were shown the test stimuli for the first time during the test session.
Looking time was coded from videotaped sessions using a computer timer. The coder pressed a key to mark the beginning and end of the demonstration and pressed a key when infants looked at or away from the demonstration. The duration of the looks and overall percent looking were subsequently calculated (e.g., Anderson & Levin, 1976). Data were not recorded for four infants due to technical errors. Based on 47% of the sessions, an intraclass correlation on percent looking time yielded an interobserver reliability coefficient of .88.
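The coding procedure above can be illustrated with a minimal sketch. It assumes the coder's key presses have already been turned into non-overlapping (onset, offset) pairs in seconds. The function name `percent_looking` and the example intervals are hypothetical and are not taken from the study's software or data.

```python
from typing import List, Tuple


def percent_looking(
    look_intervals: List[Tuple[float, float]],
    demo_start: float,
    demo_end: float,
) -> Tuple[float, float]:
    """Return (seconds looking, percent looking) within the demonstration window.

    Assumes the intervals do not overlap one another.
    """
    if demo_end <= demo_start:
        raise ValueError("demo_end must be later than demo_start")
    total = 0.0
    for onset, offset in look_intervals:
        # Clip each look to the demonstration window before summing.
        clipped_on = max(onset, demo_start)
        clipped_off = min(offset, demo_end)
        if clipped_off > clipped_on:
            total += clipped_off - clipped_on
    duration = demo_end - demo_start
    return total, 100.0 * total / duration


# Hypothetical key-press record in seconds (illustrative only, not study data).
looks = [(0.0, 20.0), (22.5, 45.0), (47.0, 60.0)]
seconds, percent = percent_looking(looks, demo_start=0.0, demo_end=60.0)
print(f"{seconds:.1f} s looking, {percent:.1f}% of the demonstration")
```

Clipping each look to the demonstration window keeps looks that began before the start marker or ended after the end marker from inflating the percentage.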
An observer noted the total number of target actions (remove, shake, replace or attempt to replace the mitten) that each infant imitated during the 90s for 12- and 18-month-olds and 120s for the 6-month-olds from when the infant first touched the puppet (range 0-3). Based on 59% of the test sessions, interobserver reliability was 97.4% (Kappa = .94). When the two raters differed, the primary rater's score was assigned.
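The two reliability indices reported above, percent agreement and Cohen's kappa, can be sketched as follows. The sketch assumes that agreement is computed over each infant's total imitation score (0-3); the text does not state the exact unit of comparison, so that is an assumption. The rater scores below are hypothetical.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Percentage of sessions on which the two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of sessions")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of sessions")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    categories = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical imitation scores (0-3) from two raters; not data from the study.
primary = [0, 1, 2, 3, 0, 1, 0, 2, 1, 0]
secondary = [0, 1, 2, 3, 0, 1, 0, 2, 0, 0]
agreement = percent_agreement(primary, secondary)
kappa = cohens_kappa(primary, secondary)
print(f"agreement = {agreement:.1f}%, kappa = {kappa:.2f}")
```

Kappa is lower than raw agreement because some matches would be expected by chance alone, particularly when one score (here 0) is common.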
Preliminary analyses revealed that there were no main effects of gender, stimulus, or TV size on either percent looking time or imitation score so data were collapsed across these variables for all subsequent analyses in both experiments.
Percent looking time to the video matched SE demonstration and the video mismatched SE demonstration was high (94.2%, SE = 1.5 or 56.4s, and 90.3%, SE = 1.8, or 53.4s, respectively). A 3 (Age) × 2 (Group; video matched SE, video mismatched SE) between subjects analysis of variance (ANOVA) across percent looking time to the demonstration yielded no main effect of age, F(2, 64) = 2.01, p = .14, or group, F(1, 64) = 3.74, p = .06, and no age × group interaction, F(2, 64) = 1.30, p = .28. The difference in looking time between the matched and mismatched SE groups was 3s and was not significant. Subsequent differences in imitation cannot be attributed to failures to look during the demonstration since there were no differences in percent looking time as a function of mode of presentation or age. Note that looking time is sometimes considered a reflection of endogenous attention and encoding of information (Kannass & Colombo, 2007).
The critical question for the present experiment was whether there were age-related differences as a function of the presence of sound effects in the matched and mismatched groups. We asked: Did infants of any age perform significantly above baseline in either the matched or mismatched groups? Deferred imitation is operationally defined as group performance that is significantly above baseline (see Barr & Hayne, 2000; Meltzoff, 1990 for conceptual reasons for using this operational definition). To answer the question, we conducted separate one-way ANOVAs at each age. Although ANOVAs indicate whether groups differ, they do not answer our primary question, namely whether any experimental group performed significantly better than the baseline control group. If an ANOVA was significant, we used Dunnett's t tests to assess whether the mean imitation score of each experimental group was significantly higher than that of the baseline control group (p < .05). This test controls for Type I errors across multiple comparisons with a control group (Dunnett, 1955).
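The omnibus step of this analysis plan, a one-way between-subjects ANOVA at a single age, can be sketched in plain Python. The scores below are invented for illustration and the group labels merely mirror the design. The follow-up Dunnett comparisons against the baseline control group are not implemented here.

```python
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_scores = [score for scores in groups.values() for score in scores]
    n_total = len(all_scores)
    k = len(groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more scores than groups")
    grand_mean = sum(all_scores) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for scores in groups.values():
        group_mean = sum(scores) / len(scores)
        # Between-groups: squared deviation of group mean from grand mean.
        ss_between += len(scores) * (group_mean - grand_mean) ** 2
        # Within-groups: squared deviations of scores from their group mean.
        ss_within += sum((score - group_mean) ** 2 for score in scores)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical imitation scores (0-3) at one age; NOT data from the study.
scores_by_group = {
    "baseline": [0, 0, 0, 1, 0, 0],
    "video matched SE": [1, 0, 2, 1, 1, 1],
    "video mismatched SE": [0, 1, 0, 0, 1, 0],
}
f_value, df1, df2 = one_way_anova(scores_by_group)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

If the omnibus F is significant, each experimental group would then be compared with the baseline group. Recent SciPy releases provide `scipy.stats.dunnett` for comparisons of this kind, although the text does not say which software the authors used.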
For 6-month-olds, there was an effect of group, F(2, 40) = 4.44, p < .02, partial η2 = .18. As shown in Figure 1, post-hoc Dunnett's t tests (p < .05) showed that 6-month-olds in the video matched SE and video mismatched SE groups performed significantly above baseline. That is, the mismatching of the sound effects did not influence imitation performance. For 12-month-olds, there was an effect of group, F(2, 39) = 6.43, p < .01, partial η2 = .25, but the pattern of results was very different. Post-hoc Dunnett's t tests (p < .05) showed that 12-month-olds in the video matched SE group performed significantly above baseline but those in the video mismatched SE group did not. Finally, the 18-month-olds also showed an effect of group, F(2, 41) = 3.41, p < .05, partial η2 = .14. Post-hoc Dunnett's t tests (p < .05) showed that 18-month-olds in the video matched SE group performed significantly above baseline but those in the video mismatched SE group did not.
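For a one-way design, partial η2 can be recovered from the reported F ratio and its degrees of freedom, because partial η2 = SS_effect / (SS_effect + SS_error) = (F × df_effect) / (F × df_effect + df_error). The short check below applies this identity to the three F tests reported above. The function name is illustrative only.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported in the text for each age group.
reported = {
    "6-month-olds": (4.44, 2, 40),
    "12-month-olds": (6.43, 2, 39),
    "18-month-olds": (3.41, 2, 41),
}
effect_sizes = {
    age: round(partial_eta_squared(f, df1, df2), 2)
    for age, (f, df1, df2) in reported.items()
}
for age, eta in effect_sizes.items():
    print(f"{age}: partial eta squared = {eta:.2f}")
```

The recomputed values (.18, .25, and .14) agree with the effect sizes reported for the three age groups.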
To confirm that there was an inhibitory effect of mismatched sound effects on imitation performance, we conducted a cross-experiment comparison with data collected by Barr and colleagues (2007a). Once again, we conducted separate one-way ANOVAs at each age but included the cross-experiment comparison with the video no sound effects group from Barr et al. (2007a). We used post-hoc Student Newman Keuls tests (SNK, p < .05) to assess whether performance was enhanced or inhibited relative to the video no sound effects (SE) groups. For 6-month-olds, there was an effect of group, F(3, 57) = 5.79, p < .002, partial η2 = .23. Post-hoc SNK tests (p < .05) showed that 6-month-olds in the video no SE (M = 1.0, SE = .25), video matched SE (M = .92, SE = .24), and video mismatched SE (M = .92, SE = .25) groups performed significantly above baseline (M = .22, SE = .20) and did not differ from one another. That is, the mismatching of the sound effects did not disrupt imitation performance and matched sound effects did not enhance performance. For 12-month-olds, there was an effect of group, F(3, 56) = 5.24, p < .003, partial η2 = .22. Post-hoc SNK tests (p < .05) showed that 12-month-olds in the video no SE (M = .83, SE = .25) and video matched SE (M = .92, SE = .25) groups performed significantly above baseline (M = .06, SE = .06) and did not differ from one another, but those in the video mismatched SE group (M = .08, SE = .08) did not. Finally, the 18-month-olds also showed an effect of group, F(3, 58) = 4.52, p < .006, partial η2 = .19. Post-hoc SNK tests (p < .05) showed that 18-month-olds in the video no SE group (M = 1.0, SE = .25) and the video matched SE group (M = 1.0, SE = .24) performed significantly above baseline (M = .27, SE = .21) and did not differ from one another, but those in the video mismatched SE group (M = .33, SE = .37) did not.
That is, mismatching sound effects that commonly convey informational content interfered with performance by 12- and 18-month-olds, presumably because they expected the sound to match up with an important visual event; such expectations did not seem to exist for the younger 6-month-old infants. Matching of sound effects to content did not, however, enhance imitation performance above the video no sound effects groups.
Although performance of the video 6x matched SE and video 6x no SE groups did not differ, it is possible that matched sound effects may enhance processing of the target actions such that fewer televised demonstrations would be required in order for performance to exceed baseline performance. To untangle the effects of repetition and the addition of matched sound effects, we collected additional data halving the number of demonstrations in the video matched SE groups for 12- and 18-month-olds. There was no amelioration of the video deficit effect for either the 12- or the 18-month-olds as a function of the addition of the salient sound effects to the shorter video demonstration. That is, although the matched sound effects did not disrupt performance, they also did not enhance performance.
Overall, the findings show that imitation varied as a function of both age and whether the sound effects were matched or mismatched to the target actions. Our findings differ from those of Bahrick and Lickliter (2004) who found that 5-month-olds discriminated between matched and mismatched conditions. Findings from the present experiment are consistent with the view that transferring information from 2D to 3D contexts may emerge later than visual processing of 2D information (see also Elsner & Aschersleben, 2003; Hofer et al., 2007). Elsner and Aschersleben (2003) reported that 9-month-olds did not notice a change in action effect contingency but 12- and 15-month-olds did. In their experiment, an experimenter demonstrated an action accompanied by a salient sound effect, and then infants were given the opportunity to repeat the action. There were two variations of this condition. In one condition, the action effect was the same for the infant as for the experimenter. In a second condition, the action effect was changed. Performance was compared to a baseline group that did not see the demonstration. They found that 9-month-olds performed equally in the demonstration and baseline conditions. The 12-month-olds performed significantly better in the demonstration condition but performance did not differ depending on which action effect accompanied their behavior. The 15- and 18-month-olds, however, were more likely to reproduce the action if the action effect matched the action effect that occurred during the demonstration.
Taken together, findings of the present experiment and those of Elsner and Aschersleben (2003) suggest that there are age-related changes in how infants match actions and action effects and this in turn changes the likelihood that infants will reproduce the target actions. Furthermore, this age-related change may be indicative of different forms of observational learning mechanisms. Want and Harris (2002) outlined a number of different mechanisms. Specifically, they argued that mimicry (veridical copying of actions in the absence of understanding intention) emerges first. Both emulation (non-veridical goal-directed copying) and imitation (both goal-directed and veridical copying), emerge later. Unfortunately, it is not possible to directly conclude from the present findings using the puppet task whether there are mechanism differences. The observed age-related differences are, however, consistent with the hypothesis that older infants rely more on imitation or emulation and younger infants rely more on mimicking (Call, Carpenter & Tomasello, 2005; Elsner et al., 2007; Huang & Charman, 2005; Want & Harris, 2002).
There is an increasing number of products available for infants that include electronic sound effects, including books, music stations, play tables, wind-up toys, and rattles; such sound effects have even been added to plush toys. Toy producers have increasingly appealed to parents to buy such toys to enhance infants' ability to understand cause and effect (Hirsh-Pasek et al., 2003; Wong et al., 2008). These sound effects are often disconnected from the site of action and, at times, are not completely matched to cause-and-effect relationships. There are few studies regarding whether such electronic effects enhance learning. Wong and Golinkoff (2008) have recently demonstrated that parent-child interaction changes as a function of whether toys have electronic components added to them. In their study, they compared parent-child interaction with a traditional shape-sorter toy that had no electronic sound effects and with an electronic shape-sorter toy that had electronic sound effects. They found that parents referred to geometric shapes and colors significantly more with the traditional toy than with the electronic toy, and referred to information unrelated to shapes or colors more with the electronic toy than with the traditional toy. They concluded that learning may be negatively influenced by this change in parent-child interaction. The present experiment was designed to examine whether the addition of electronic sound effects to a live demonstration influences a direct measure of infant learning.
One criticism of previous work comparing live and video conditions is that a number of cues differ between video and live demonstrations. In particular, the timing of a live presentation is somewhat contingent upon, and unconsciously paced to, the behavior of the infant. In the present experiment, the same sound effects used in the video presentation were added to the live demonstration. The experimenter was trained to present the target actions at precisely the same rate as the video presentation. We asked, “Can 6-, 12-, and 18-month-old infants imitate from a live demonstration if sound effects are matched to the target actions?” Due to additional cognitive load and decreased social interaction, we hypothesized that the addition of sound effects to the live demonstration would impair performance.
The final sample consisted of 36 infants (12 6-month-olds, 12 12-month-olds, and 12 18-month-olds; 24 boys, 12 girls) recruited from commercial mailing lists and by word of mouth, all of whom were assigned to the live matched SE group. The 6-month-olds had a mean age of 6 months, 18 days (SD = 10 days), the 12-month-olds had a mean age of 12 months, 20 days (SD = 8.3 days) and the 18-month-olds had a mean age of 18 months, 21 days (SD = 14.6 days). Participants were African-American (n = 1), Caucasian (n = 26), Latino (n = 2), or of Mixed race (n = 4), and one family did not report. The parents' mean educational attainment was 17.65 years (SD = 1.15), and the mean rank of socioeconomic status (Nakao & Treas, 1992) was 82.3 (SD = 16.3) reported by 94.4% and 86.1% of the sample, respectively. Testing was discontinued on additional infants due to refusal to touch the stimulus (n = 2), experimenter error (n = 2), equipment failure (n = 2), crying (n =1), or less than 50% looking time (n = 1). The age-matched baseline control groups from Experiment 1 and the live no sound effects (live no SE) group from Barr and colleagues (2007a) were used in a cross-experiment comparison.
The demographics of the Barr et al. (2007a) live no SE group were very similar to those of the live matched SE group. The 6-month-olds had a mean age of 6 months, 18 days (SD = 8.2 days), the 12-month-olds had a mean age of 12 months, 12 days (SD = 10.4 days), and the 18-month-olds had a mean age of 18 months, 13 days (SD = 6.4 days). Participants were African American (n = 2), Asian (n = 4), Caucasian (n = 22), Latino (n = 5), Native American (n = 1), or of Mixed race (n = 1), and one family did not report. The parents' mean educational attainment was 17.1 years (SD = 1.7), and the mean rank of socioeconomic status (Nakao & Treas, 1992) was 76.3 (SD = 17.7), reported by 86% and 91% of the sample, respectively.
The same puppets and counterbalancing procedures that were used in Experiment 1 were used in Experiment 2. For the live matched SE group, small speakers were inserted into the puppet and connected to a small mp3 player, which would then be used to play the sound effects during the demonstration.
The procedures were identical to Experiment 1 except that the demonstration session occurred in real time with a real person. The exact same sound effects were used.
The live matched SE group was shown the target actions three times in succession, demonstrated by an experimenter in the home (but six times in succession for 6-month-olds; see Barr et al., 1996). During the live demonstration, the infant sat on the caregiver's knees. The experimenter knelt in front of the infant, placed the puppet on her right hand, and positioned the puppet at the infant's eye level and out of reach, approximately 80 cm from the infant's chest. The experimenter then removed the mitten from the puppet's right hand, shook it three times, and replaced it on the puppet's hand. At the same time, the experimenter pressed a button on the mp3 player so that the same sound effects that were added to the video demonstration were also played during the live demonstration. This sequence took approximately 10 s and was repeated five more times for a total duration of approximately 60 s (M = 66.43 s, SE = 1.33) for the 6-month-olds, and two more times for a total duration of approximately 30 s (M = 32.97 s, SE = .76, and M = 32.02 s, SE = .79, for the 12- and 18-month-olds, respectively). Allowing time for the experimenter to say standard phrases and hello and goodbye, the entire demonstration lasted approximately 1 min 30 s for the 6-month-olds (M = 90.18 s, SE = 2.15) and 45 s for the older infants (M = 48.2 s, SE = 1.50, and M = 45.9 s, SE = 1.01, for the 12- and 18-month-olds, respectively). The face of the experimenter was partially obscured by the puppet throughout the demonstration. The caregiver directed the infant's attention to the experimenter but did not describe the target actions.
The test session was identical to Experiment 1.
Looking time was coded as before. Based on 33% of the sessions, an intraclass correlation on percent looking time yielded an interobserver reliability coefficient of .90.
Test sessions were coded as before. Based on 66% of the test sessions, interobserver reliability was 93.6% (Kappa = .78). When the two raters differed, the primary rater's score was assigned.
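Percent agreement and kappa are reported together because kappa corrects raw agreement for the agreement expected by chance. The sketch below illustrates the computation of Cohen's kappa for two raters. It is not the authors' scoring procedure, and the codes in `rater_a` and `rater_b` are hypothetical values invented for illustration (1 = target action scored as present, 0 = absent).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two equal-length code lists."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    # Note: kappa is undefined when expected agreement equals 1.
    return observed, (observed - expected) / (1 - expected)


# Hypothetical codes from two raters for twelve scored behaviors.
rater_a = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
rater_b = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]

agreement, kappa = cohens_kappa(rater_a, rater_b)
print(f"agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

Because chance agreement is subtracted out, kappa is lower than raw percent agreement whenever the raters do not agree perfectly, which is the pattern seen in the reliability figures above.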
Percent looking time to the live matched SE demonstration was high at all ages (93.16%, SD = 4.43, 92.34%, SD = 3.12, and 92.20%, SD = 6.52 for 6-, 12-, and 18-month-olds, respectively) and comparable to that of the live no SE experimental groups collected by Barr et al. (2007a; 94.21%, SD = 5.08, 94.36%, SD = 6.28, and 93.30%, SD = 9.78 for 6-, 12-, and 18-month-olds, respectively). A 3 (Age) × 2 (Experimental group: live matched SE, live no SE) ANOVA on percent looking time indicated that there was no significant main effect of age, F(1, 65) = 1.02, n.s., or group, F < 1, and no interaction between age and experimental group, F < 1. Differences in looking time, therefore, could not account for any subsequent differences in group performance.
The critical question for the present experiment was whether infants could imitate actions above baseline following a live demonstration accompanied by electronic sound effects. First, did infants in the live matched SE groups at any age perform significantly above baseline? Second, did this differ from the performance of the live no SE groups?
Once again, we conducted one-way ANOVAs across groups at each age and used post-hoc Dunnett's t tests to assess whether the performance of individual groups differed from that of baseline controls. For the 6-month-olds, there was a significant main effect of group, F(2, 39) = 3.97, p < .03, partial η² = .17. Post-hoc Dunnett's t tests indicated that the live no SE group performed above baseline and the live matched SE group did not. For the 12-month-olds, there was a significant main effect of group, F(2, 39) = 4.70, p < .02, partial η² = .19. Post-hoc Dunnett's t tests (p < .05) indicated that the live no SE group (M = .83, SE = .25) performed above baseline (M = .06, SE = .06) but the live matched SE group (M = .67, SE = .25) did not. For the 18-month-olds, there was also a significant main effect of group, F(2, 39) = 8.69, p < .01, partial η² = .31. Post-hoc Dunnett's t tests (p < .05) indicated that the live no SE group (M = 1.83, SE = .25) performed above baseline (M = .33, SE = .25) but the live matched SE group (M = .75, SE = .35) did not. As hypothesized, no age group performed above baseline in the live matched SE group.
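For readers less familiar with the omnibus test used at each age, the following sketch shows how a one-way ANOVA F ratio is formed from between-group and within-group variability. It is an illustration only, not the authors' analysis code: the imitation scores below are hypothetical, the group labels simply mirror the three groups compared above, and the post-hoc Dunnett comparisons are not implemented.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum(
        (score - sum(group) / len(group)) ** 2
        for group in groups
        for score in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical imitation scores (number of target actions produced).
baseline = [0, 0, 0, 1]
live_no_se = [1, 2, 1, 2]
live_matched_se = [0, 1, 0, 1]

f_ratio, df_between, df_within = one_way_anova(
    [baseline, live_no_se, live_matched_se]
)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

A significant omnibus F indicates only that the groups differ somewhere; comparisons of each experimental group against baseline, such as the Dunnett's tests reported above, are needed to locate the difference.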
The present findings demonstrate that the addition of sound effects to the live demonstration interferes with imitation at all ages. It is possible that infants are detecting a lack of social contingency in the interaction that is disrupting performance. Researchers have suggested that learning from television may be impaired due to the lack of contingency in video presentations. During live interactions, social partners engage in contingent ongoing behaviors with one another. For example, when one partner asks a question, he/she pauses and waits until the other social partner responds. It has been argued that the lack of such social contingency is the critical factor missing from televised information. Consistent with that argument, research with older toddlers and preschoolers has demonstrated that the lack of contingency reduces levels of interactivity and comprehension of video material (Calvert, Strong, & Gallagher, 2005; Crawley, Anderson, Wilder, Williams, & Santomero, 1999; Flynn & Whiten, 2008; Nielsen, Simcock, & Jenkins, 2008; Troseth, 2003; Troseth, Saylor, & Archer, 2006).
Troseth and colleagues (2006) hypothesized that increasing social contingency would improve learning from television. Toddlers in a contingent condition interacted with an experimenter across a closed-circuit television screen for five minutes. At the end of the interaction, the experimenter told children where they could find the hidden toy in the room next door and asked them to go and find it. Toddlers in the non-contingent control group watched pretaped social interactions that were not contingent upon their behavior. The 2-year-olds who received contingent feedback were significantly more likely to find the hidden toy than were the toddlers who had seen a pretaped non-contingent interaction. Troseth et al. (2006) concluded that, during the second year of life, toddlers increasingly expect to obtain relevant information from a contingent social partner, and lack of contingency during the televised demonstration disrupts the transfer of information from television to real-life activities. Similarly, in the present experiment, the reduction of social contingency may have disrupted imitation performance even when there was no task requirement to transfer information from television to real-life activities.
The present study examined whether adding perceptually salient sound effects to videotaped or live demonstrations would influence imitation by 6- to 18-month-old infants. At all ages, performance was not disrupted by the addition of matched sound effects to a video demonstration, but was disrupted by the addition of matched sound effects to a live demonstration. There was an age-related change in the effect of the addition of mismatched sound effects to target actions during a video demonstration. Six-month-olds exhibited deferred imitation of the target actions regardless of whether the sound effects were matched or mismatched to the target actions. In contrast, 12- and 18-month-olds imitated target actions from television when the sound effects were matched to the target actions but performed at baseline when the sound effects were deliberately out of synchrony with the target actions. That is, mismatching sound effects that commonly convey informational content on television interfered with performance by 12- and 18-month-olds, presumably because they expected the sound to match up with an important visual event; such expectations did not seem to exist for the younger 6-month-old infants (Huston & Wright, 1983). In contrast, the results of Experiment 2 demonstrated that the addition of sound effects to a live demonstration interfered with imitation performance at all ages from 6 to 18 months. This is one of the first studies to demonstrate a learning deficit from toys with electronic sound effects.
These findings are also consistent with studies directly examining action-perception coupling (e.g., Elsner & Aschersleben, 2003; Klein et al., 2006). Klein and colleagues' (2006) second experiment revealed that the addition of matching sound effects per se did not enhance overall imitation performance from television. It is possible that, because a sound effect was added to each step of the sequence in the present study, no particular action was highlighted and hence there was no overall enhancement (see Klein et al., 2006). In Experiment 2, sound effects were added to a live demonstration and imitation performance was disrupted across all ages. These findings may be related to infants' understanding of actions and their outcomes as a function of experience with electronic sound effects or to the diminished social interaction that occurred when sound effects were added to the demonstration.
Taken together, these findings are consistent with two lines of research: (1) action-perception coupling studies (e.g., Aschersleben, 2006) and (2) the sampling model of attention (Huston & Wright, 1983). As proposed by the common coding theory, action perception and action production are based upon the same representation and are dependent upon action effects. The development of action-effect understanding and observation is, however, gradual, with action-effect understanding increasing at approximately 12-15 months of age (Elsner & Aschersleben, 2003) and the ability to encode information from demonstrations changing gradually between the first and second year of life (e.g., Elsner et al., 2007). The age-related changes in the present study are consistent with changes in action-perception understanding.
The findings are also consistent with the “sampling model of attention,” which states that the specific exposure to sound effects that accompany television influences comprehension of televised material. That is, once infants gain an understanding of the way that certain actions are paired with certain sounds, they get confused when this pairing does not occur. As proposed in the “sampling model of attention,” toddlers and preschoolers begin to decide when to view television and when to play with toys based on their knowledge of formal features (Huston & Wright, 1983). That is, attention can be divided between toy play and television viewing because children learn that formal features signal and mark specific media content. There is also a developmental component to this theory. In terms of television-specific features, attention to television is initially directed by perceptually driven processes, but with development and experience children come to learn that different perceptually salient features serve to mark content for further processing as well as provide visual and verbal modes that children can use to represent content (Anderson, Lorch, Field, & Saunders, 1981; Calvert et al., 1982; Huston & Wright, 1983). The present study suggests that, initially, infants attend to perceptually salient sound effects, but over a relatively brief 6-month interval mismatching sound effects come to disrupt typical information processing. There are a number of formal features, including zooms, cuts, music, and pacing, that are part of infant-directed programming and that have received no empirical attention.
The present study has broader implications for both the social and cognitive functions of imitation (Uzgiris, 1981). Recent perspectives on imitation suggest that early copying behavior is driven by the imitator's social awareness and motivation to be like the model (Learmonth, Lamberth, & Rovee-Collier, 2005; Nielsen, 2006; Nielsen et al., 2008; Meltzoff, 2007) as well as their understanding of the model's goals and intentions (Carpenter, Call, & Tomasello, 2002; 2005). Further evidence that imitation requires social insight comes from studies showing an imitation deficit in toddlers with autism, who lack social awareness and theory of mind skills (Rogers, Hepburn, Stackhouse, & Wehner, 2003; Nadel, 2004). The present study adds to this growing body of data by suggesting that even slight constraints on contingency during a live demonstration session, imposed by timing target actions exactly to those of the video presentation, disrupt imitation performance. The timing of the target actions during the live no sound effects demonstration was more sensitive to the infants' responses. This finding provides additional insight into the social function and necessary constraints involved in imitation.
From a practical perspective, learning at all ages was disrupted by the addition of quite simple sound effects to a live demonstration. This is one of the few empirical studies to show a learning deficit in a live face-to-face interaction involving 3D objects and as such requires further empirical investigation. This demonstration of a deficit implies that electronically enhanced toys may, counterintuitively, disrupt learning. Hirsh-Pasek, Golinkoff and colleagues report that, although parents were likely to endorse electronic toys, they and other academics argue that increasing play and decreasing the use of electronic toys may be more conducive to healthy social, linguistic, and cognitive development (for review see Hirsh-Pasek & Golinkoff, 2008; Hirsh-Pasek et al., 2003). If electronic toy makers want to produce toys that enhance learning, they may need to pay specific attention to the design of toys and examine whether additional electronic components enhance or disrupt learning.
From a cognitive perspective, as an index of learning, imitation performance was inhibited by adding mismatching auditory cues to the video demonstration and adding auditory cues to live demonstrations. Imitation indexes the transfer of information across different contexts and provides important information about the flexibility of the memory system as a function of the organism's history (Barnat, Klein, & Meltzoff, 1996; Jones & Herbert, 2006; Hayne, Barr & Herbert, 2003; Hayne, Boniface & Barr, 2000; Hayne, MacDonald & Barr, 1997; Klein & Meltzoff, 1999; Learmonth, Lamberth & Rovee-Collier, 2004). Furthermore, it suggests that examining imitation from television and electronic devices may not only inform us about the potential for learning from television and toys during infancy but may also provide some insight into the processes governing imitation itself (see also Flynn & Whiten, 2008).
Overall, this study demonstrates that infants' processing of media content changes gradually between 6 and 18 months of age. During this time, most infants from this population will accumulate approximately 350-700 hours of television exposure (Rideout & Hamel, 2006; Zimmerman et al., 2007) and likely a similar number of hours interacting with sound effect-enhanced electronic toys. Individual differences in exposure to both television and electronic devices are also likely to contribute to learning from such devices (see Strouse & Troseth, 2008). The ability to comprehend the content stems from the ability to process the formal features, such as sound effects, that accompany and highlight critical target information, as well as the ability to integrate auditory and visual information. Patterns of imitation shown in the present study demonstrate that imitation is constrained both by social processes and contingencies of interaction and by memory processes that limit the transfer of information.
A very special thank you to all the families who made this research possible and to Katherine Salerno and Lauren Shuck for their help in data collection. Support for this research was provided by NSF Grants to Sandra Calvert (#0126014) and Department of Education Ready to Learn Initiative grant (#9300-71000) to Deborah Linebarger and to Rachel Barr NIH grant # HD043047.