We report on an automatic technique for quantifying two types of repetitive speech: repetitions of what the child says him/herself (self-repeats) and of what is uttered by an interlocutor (echolalia). We apply this technique to a sample of 111 children between the ages of four and eight: 42 typically developing children (TD), 19 children with specific language impairment (SLI), 25 children with autism spectrum disorders (ASD) plus language impairment (ALI), and 25 children with ASD with normal, non-impaired language (ALN). The results indicate robust differences in echolalia between the TD and ASD groups as a whole (ALN + ALI), and between TD and ALN children. There were no significant differences between ALI and SLI children for echolalia or self-repetitions. The results confirm previous findings that children with ASD repeat the language of others more than other populations of children. On the other hand, self-repetition does not appear to be significantly more frequent in ASD, nor does it matter whether the child’s echolalia occurred within one (immediate) or two turns (near-immediate) of the adult’s original utterance. Furthermore, non-significant differences between ALN and SLI, between TD and SLI, and between ALI and TD suggest that echolalia may not be specific to ALN or to ASD in general. One important innovation of this work is an objective, fully automatic technique for assessing the amount of repetition in a transcript of a child’s utterances.
Autism spectrum disorders (ASD) are characterized by deficits in social interaction and social communication, as well as by the presence of repetitive behaviors and restricted interests. Within the domain of repetitive behaviors, a prominent feature of the symptom pattern in ASD is repetitive speech [Tager-Flusberg, Paul, & Lord, 2005; Tager-Flusberg et al., 2009], which may take the form of a child repeating what he/she said previously (self-repeats) or repeating what another person said previously (echolalia). Definitions of self-repetitive speech have varied widely in previous studies, including repetition of phonemes, words, phrases, topics, and of conversational devices [e.g. “That’s a wrap”; Murphy & Abbeduto, 2007]. Definitions of echolalia have also been wide-ranging, including repetitions that occur immediately or after some unspecified lapse of time [delayed; Prizant, 1983b; Simon, 1975], and those that are exact or approximate (mitigated; allowing pronoun reversals, i.e. “Where do you sleep?”/“I sleep,” grammatical conversions and syntactical supplements, i.e. “Is it a boy or a girl?”/“Huh? Boy or girl”) [examples from Fay & Butler, 1968]. Children with ASD have been found to display more echolalic than self-repetitive speech [Sudhalter, Cohen, Silverman, & Wolf-Schein, 1990], and the presence of echolalia is included in current diagnostic criteria [American Psychiatric Association [APA], 2000, 2011] and standardized diagnostic assessments [e.g. Lord et al., 1997; Lord et al., 2000].
Although repetitive speech is commonly noted as part of the language phenotype for some children with ASD [Kanner, 1946; Prizant, 1983a; Troyb, Knoch, & Barton, 2011], there are no precise, specifically quantitative, assessment methods for these behaviors in ASD. This is not surprising given the lack of precise and consistent definitions across studies [see discussions in Murphy & Abbeduto, 2007; Prizant, 1983a; Schuler, 1979]. For example, several researchers defined immediate echolalia as consisting of “at least segmental similarities (one word or more) to the utterance of the previous speaker, involving rigid echoing of the model utterance (pure echolalia) or selective repetition of elements occurring with two utterances of the original sentence” [McEvoy, Loveland, & Landry, 1988, p. 662; following Prizant & Duchan, 1981]. The latter category, mitigated echolalia, has been explicitly measured in some studies [e.g. Paccia & Curcio, 1982; Roberts, 1989], whereas others provided little detail regarding how instances of echolalia were defined or measured [e.g. Paul et al., 1987]. Research on self-repetitive speech has also been inconsistent in terms of defining what constitutes a repetition [see Murphy & Abbeduto, 2007]. Research has also varied in terms of repetition length. For example, the majority of previous studies have broadly considered repetitions of “words or phrases,” which may have led to inconsistent measures across studies. Exacerbating these issues is the fact that previous research has relied solely on human perceptual judgment to identify repetitive speech behaviors, which is likely to be time-intensive and unreliable across long-duration recordings.
Further clouding the status of repetitive speech in ASD is the fact that the majority of previous research in this area was conducted when diagnostic criteria reflected more qualitatively severe forms of autism [Fombonne, Quirke, & Hagen, 2011] and included impairments in language [Ritvo & Freeman, 1977; Rutter, 1968]. For example, using earlier diagnostic criteria, Rutter, Greenfield, and Lockyer  reported that approximately 75% of verbal children with ASD engage in echolalia at some point in development. Previous research suggests that echolalia may be relatively more common in ASD than in other clinical populations [e.g. Fragile X; Paul et al., 1987]. However, repetitive speech behaviors have also been observed in individuals with Fragile X [Murphy & Abbeduto, 2007], Prader–Willi syndrome [Sarimski, 1997], Alzheimer’s disease [Cullen et al., 2005], blind children, children with language impairments, and even in typically developing (TD) children [Yule & Rutter, 1987]. These findings raise the question of what, if any, features of repetitive speech are specific to ASD. Additionally, because of the changing status of language impairments in the diagnostic criteria for ASD, it is unclear whether and to what extent impairments in language impact repetitive speech.
The lack of clear definitions of repetitive speech behaviors, the challenging nature of objectively, accurately, and quantitatively measuring them, and the usage of diagnostic criteria that have changed over the last decades may have contributed to the current lack of knowledge about the prevalence and range of repetitive speech behaviors observed in ASD, the specificity of these behaviors to ASD, as well as the specific features of repetitive speech that are most clinically informative. Recent changes to diagnostic criteria, improvements in standardized methods for diagnosis, and advances in computational analysis call for objective, quantitative approaches to study repetitive speech in ASD.
We describe and apply novel automated quantitative techniques for measuring self-repeats and echolalia that count occurrences of repetitions using word sequence matching algorithms, using as input verbatim unannotated transcripts of the Autism Diagnostic Observation Schedule [ADOS-G; Lord et al., 2000] sessions. While these transcripts are generated by human transcribers, the techniques are fully automatic. Consistent with previous research, we operationally define echolalia as repetitions occurring immediately following the adult’s turn (immediate) and those occurring within two turns of the adult (near-immediate, see Table 1 for examples). We also distinguished between self-repetitions occurring within one turn and those occurring between child turns (i.e. with an intervening turn by the examiner). We then used these measures to compare children with ASD (with and without language impairment: ALI and ALN) with those with language impairment but without autism (specific language impairment, SLI) and children who are typically developing. These between-group analyses are conducted in order to characterize the range of repetitive speech behaviors exhibited by children with ASD, as well as to examine the role of language impairment in repetitive speech phenomena. For children with ASD, we also explore whether repetitive speech measures are associated with measures of cognitive and language abilities and autistic symptomatology.
In total, 111 children (ages 4–8) from the Portland, OR, metro area participated in this study: 42 TD children (30 males), 19 children with SLI (12 males), and 50 children with ASD (45 males). The children with ASD were divided into two groups: those with ASD and without language impairment (ALN; N = 25; 22 males), and those with ASD and language impairment (ALI; N = 25; 23 males). ASD participants were recruited through local healthcare specialists, education service districts, autism clinics, parent groups, and non-profit autism organizations. SLI participants were recruited through local speech clinics, speech language pathologists, and the Oregon Speech and Hearing Association. In addition, advertisements were placed in local newspapers and at “community tables” in local elementary schools.
All children had full-scale IQ scores of 70 or higher as measured by the Wechsler Preschool and Primary Scale of Intelligence [WPPSI-III; Wechsler, 2002] at ages 4.0–6.11, or the Wechsler Intelligence Scale for Children [WISC-IV; Wechsler, 2003] for children age 7 and older. Participants were excluded if they had any of the following: (a) identified metabolic, neurological, or genetic disorder; (b) gross sensory or motor impairment; (c) brain lesion; (d) orofacial abnormality (e.g. cleft palate); or (e) mental retardation. All participants spoke English as their native language and produced a mean length of utterance (measured in morphemes per utterance) of at least three. During an initial screening procedure, a certified speech language pathologist confirmed the absence of speech intelligibility impairments.
Best estimate clinical (BEC) judgment by experienced clinicians is considered to be the “gold standard” for diagnosis of ASD [Klin, Lang, Cicchetti, & Volkmar, 2000; Spitzer & Siegel, 1990]. In this study, clinicians (two clinical psychologists, a speech language pathologist, and an occupational therapist, all with specific expertise with ASD) used the DSM-IV-TR criteria [American Psychiatric Association, 2002] as the basis for their judgments. Only children for whom a consensus BEC diagnosis of ASD was reached were included in this study (either ALI or ALN). In addition, BEC diagnosis was confirmed by above-threshold scores for ASD on both the ADOS-G [Gotham, Risi, Pickles, & Lord, 2007] and the Social Communication Questionnaire [SCQ; Rutter, Bailey, & Lord, 2003] according to the recommended cutoff score of 12 for research purposes [Lee, David, Rusyniak, Landa, & Newschaffer, 2007]. The majority of children met ADOS-G criteria for autism (ALN = 19; ALI = 23); a total of eight children met ASD but not autism criteria (ALN = 6; ALI = 2).
Diagnosis of language impairment (ALI and SLI groups) was made if the child received a Core Language Score lower than one standard deviation below the mean (standard score < 85) on the Clinical Evaluation of Language Fundamentals (CELF), a composite summary of receptive and expressive language abilities. For children younger than 6 years, the CELF Preschool-2 [Semel, Wiig, & Secord, 2004] was administered; the CELF 4 [Semel, Wiig, & Secord, 2003] was used for children age 6 and older. Twenty-five children with ASD (50%) were language impaired using this criterion. Inclusion in the SLI group further required the following: (a) documented history of language delay and/or deficits, and (b) BEC consensus judgment of language impairment in the absence of ASD based on all available evidence: medical and family history, our assessments, earlier locally based assessments, and school information. Nevertheless, several children with BEC diagnosis of SLI received above threshold scores for ASD on either the ADOS-G (n = 4) or the SCQ (n = 8). Previous research has documented that children with BEC diagnoses of SLI often score above ASD threshold on at least one diagnostic assessment [Bishop & Norbury, 2002], and one study found that 41% of children with BEC diagnoses of SLI exceeded cutoffs for ASD on diagnostic measures [Leyfer, Tager-Flusberg, Dowd, Tomblin, & Folstein, 2008]. It is generally not recommended to use either measure in isolation to determine diagnostic status; in the case of elevated scores on either measure, BEC judgment is considered to be the most accurate [Jones & Lord, 2012]. Thus, we included children with a BEC diagnosis of SLI if they exceeded cutoffs on only one diagnostic measure (ADOS-G or SCQ, but not both) in order to maintain a representative sample of the SLI population.
Children in the TD group were excluded if they had a family member diagnosed with ASD or SLI, a history of psychiatric disturbance (e.g. attention deficit hyperactivity disorder, anxiety disorder), or if the child scored above ASD threshold on either the ADOS-G or the SCQ.
Core autistic symptomatology was evaluated using the ADOS-G and the SCQ. The CELF (Preschool-2 or 4) was administered as an omnibus measure of language ability. IQ was measured using the WPPSI-III or the WISC-IV, depending on the child’s age. The Peabody Picture Vocabulary Test – 3rd Edition [PPVT-III; Dunn & Dunn, 1997] was administered to all participants as a measure of receptive vocabulary based on established norms for children 2–6 through adulthood. Semantic verbal fluency was measured using subtests from the NEPSY [Korkman, Kirk, & Kemp, 1998], in which children were asked to name as many animals as possible within 60 sec; the task was repeated with foods. The Nonword Repetition Task was administered and scored according to Dollaghan and Campbell .
All procedures were approved by the Oregon Health & Science University’s institutional review board. Participating families were fully informed about the study procedures and provided written consent. Participants completed a battery of experimental tasks and cognitive, language, and neuropsychological assessments over approximately six sessions (2–3 hr each). ADOS sessions were recorded, and child and examiner speech were later transcribed; transcribers were unaware of the study hypotheses, as well as the child’s diagnostic status, cognitive level, and language ability. The automated analysis methods were applied to these unannotated, “raw,” transcripts. In our experience, such transcripts can be obtained relatively quickly and do not require special expertise or training. In a separate annotation procedure, turns containing mazes (false starts and reformulations) were flagged so that they could be excluded in supplementary analyses.
The measure we are after is the proportion of a child’s verbal output that can be attributed to repetition of what was said previously, either by the examiner, or by the child himself/herself. Consider the following dialogue fragment (here, a turn is defined as an interval of speech by one speaker uninterrupted by the other speaker):
In his first turn (Turn 2), the child repeats verbatim what the examiner said in Turn 1, then repeats it again in his second turn (Turn 4), along with some other material. In his third turn (Turn 6), he repeats a portion (“sitting on”) again. In this dialogue fragment, the child uttered 25 words, counting “they’re” as two words. In order to determine what proportion of those words could be attributed to repetition, we need to consider four issues.
First, what word sequence length, n, should be considered? To illustrate, for n = 3, in Turn 2, the child repeated from the examiner (Turn 1) the following word triples: “all the cameras,” “the cameras are,” “cameras are sitting,” “are sitting on,” “sitting on a,” “on a tripod.” In the above dialogue fragment, we italicized all (15) words that occur in at least one of these repeated word triples. Letting r(n) denote the number of the thus italicized words (15 in the example), and k the total number of words (25 in the example), we define the overall repetition ratio, or ORR(n), as r(n)/k, or 0.6. This repetition ratio can be computed similarly for an entire dialogue. Throughout the current study, n = 2.
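As a rough illustration of this definition, the following Python sketch computes ORR(n) for a small invented dialogue. The dialogue, the function name `overall_repetition_ratio`, the whitespace tokenization, and the decision to treat a word sequence as repeated whenever it occurred anywhere earlier in the dialogue (including earlier in the same turn) are assumptions made for this example, not details taken from the study.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, transcribed text)


def overall_repetition_ratio(turns: List[Turn], n: int, child: str = "CHILD") -> float:
    """Return r(n)/k: the share of child words lying inside a repeated n-word sequence."""
    seen = set()   # every n-word sequence encountered so far in the dialogue
    repeated = 0   # r(n): child words covered by at least one repeated sequence
    total = 0      # k: all words uttered by the child
    for speaker, text in turns:
        words = text.split()
        covered = [False] * len(words)
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if gram in seen:
                for j in range(i, i + n):
                    covered[j] = True
            seen.add(gram)
        if speaker == child:
            total += len(words)
            repeated += sum(covered)
    return repeated / total if total else 0.0


# Invented toy dialogue (not the fragment discussed in the text).
toy_dialogue = [
    ("EXAMINER", "the cameras are on a tripod"),
    ("CHILD", "the cameras are on a tripod"),
    ("EXAMINER", "what else do you see"),
    ("CHILD", "i see a dog"),
]

print(overall_repetition_ratio(toy_dialogue, 3))  # 6 of 10 child words -> 0.6
```

In this toy case the child's first turn is fully covered by repeated word triples and the second turn is not covered at all, so the ratio is 6/10.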
Second, it is not enough to specify that a word occurs in a repeated word sequence: We need to keep track of the origin of the sequence. Toward this end, we introduce the concept of a chain. In the dialogue above, the examiner says a sentence, and the child repeats portions of that sentence in subsequent turns. Consider again a three-word sequence, “cameras are sitting.” The child’s use of this sequence in Turn 4 could be traced back to the child’s use of the same sequence in Turn 2, and ultimately to the examiner’s use in Turn 1. We could represent this history as a chain consisting of a sequence of nodes labeled with word sequences (e.g. “all the cameras”), turn identifiers (e.g. Turn 4), positions (starting word, ending word positions) in the turn, and talker identifier (e.g. Child), with pointers back to previous nodes in the chain (notated as arrows here):
[“cameras are sitting”, 4, (2, 4), CHILD] → [“cameras are sitting”, 2, (3, 5), CHILD] → [“cameras are sitting”, 1, (3, 5), EXAMINER]
To clarify the notation: the first tuple [“cameras are sitting”, 4, (2, 4), CHILD] describes a case where the CHILD uttered the word sequence “cameras are sitting,” in turn number 4, spanning words 2–4 of that turn. This can be considered as a repetition of what appears in the second tuple, wherein the CHILD uttered the same sequence of words in Turn 2, spanning words 3–5 of that turn. And this in turn is a repetition of what the EXAMINER said in Turn 1, in words 3–5 of that turn.
In subsequent discussion, we will refer to the most recent (boldfaced above) element in the chain, the one containing the repetition of interest, as the head of the chain. We also make the important stipulation that a chain for a given word sequence always starts at the first occurrence (the origin, italicized above) of that word sequence in the dialogue. Thus, it is assumed that there is no turn such as the following:
If this turn were present, the EXAMINER would be repeating the child in Turn 1, and the above chain would exemplify a self-repeat instead of echolalia because Turn 0 would contain the origin.
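One possible way to represent such chains in code is sketched below in Python. The `Node` class, the `build_chains` and `origin` helpers, and the example dialogue are hypothetical; the dialogue was invented so that the word positions agree with the tuples described in the text, and it is not the transcript fragment itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    words: Tuple[str, ...]
    turn: int                      # 1-based turn number
    span: Tuple[int, int]          # 1-based (start, end) word positions in the turn
    speaker: str
    prev: Optional["Node"] = None  # pointer to the previous occurrence, if any


def build_chains(turns: List[Tuple[str, str]], n: int) -> List[Node]:
    """Return the head node of every chain, i.e. every repeated n-word sequence."""
    last: Dict[Tuple[str, ...], Node] = {}  # most recent node for each word sequence
    heads: List[Node] = []
    for turn_no, (speaker, text) in enumerate(turns, start=1):
        words = text.split()
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            node = Node(gram, turn_no, (i + 1, i + n), speaker, last.get(gram))
            if node.prev is not None:
                heads.append(node)
            last[gram] = node
    return heads


def origin(node: Node) -> Node:
    """Follow the back-pointers to the first occurrence of the word sequence."""
    while node.prev is not None:
        node = node.prev
    return node


# Invented dialogue, chosen to be consistent with the positions given in the text.
dialogue = [
    ("EXAMINER", "all the cameras are sitting on a tripod"),
    ("CHILD", "all the cameras are sitting on a tripod"),
    ("EXAMINER", "yes they are"),
    ("CHILD", "the cameras are sitting there"),
]

heads = build_chains(dialogue, 3)
head = next(h for h in heads
            if h.turn == 4 and h.words == ("cameras", "are", "sitting"))
print(head.span, head.prev.turn, origin(head).speaker)
```

Because each node points only to the most recent earlier occurrence, the origin is recovered by walking the pointers back, which mirrors the stipulation that a chain starts at the first occurrence in the dialogue.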
Third, once we have collected all chains for a dialogue based on repetitions of word sequences of length n, we need to define what type of chain we want to analyze. To illustrate, another chain involving the triple “cameras are sitting” might be the following:
Thus, for measuring echolalia, we may wish to exclude cases where the child repeats something the examiner says but where the examiner turn is itself a repetition of what the child had said on a previous turn. Furthermore, for measuring immediate echolalia, one may wish to insist that the child’s repetition of the examiner’s phrase be in the immediately following turn. Definitions and examples of the types of repetitive speech included in the following analyses are provided in Table 1.
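The distinctions drawn here can be expressed as a small labeling rule. The Python sketch below is only one reading of the definitions: the function name `classify_chain`, the label strings, and the turn gaps (a gap of 1 for immediate echolalia, up to 3 for near-immediate echolalia, 0 and 2 for the two kinds of self-repeat, assuming strictly alternating turns) are assumptions. Which earlier occurrence counts as the source turn, the origin or the nearest previous node, is left to the caller.

```python
def classify_chain(origin_speaker: str, source_turn: int, head_turn: int,
                   near_gap: int = 3) -> str:
    """Label a child repetition by who originated the sequence and how far back.

    origin_speaker: speaker of the first occurrence of the word sequence.
    source_turn:    turn number of the earlier occurrence being compared against.
    head_turn:      turn number of the child's repetition (the head of the chain).
    near_gap:       largest turn gap still counted as near-immediate (assumed).
    """
    gap = head_turn - source_turn
    if origin_speaker == "EXAMINER":
        if gap == 1:
            return "immediate echolalia"
        if 1 < gap <= near_gap:
            return "near-immediate echolalia"
        return "other echolalia"
    if gap == 0:
        return "intra-turn self-repeat"
    if gap == 2:
        return "between-turn self-repeat"
    return "other self-repeat"


print(classify_chain("EXAMINER", 1, 2))  # immediate echolalia
print(classify_chain("EXAMINER", 1, 4))  # near-immediate echolalia
print(classify_chain("CHILD", 0, 2))     # between-turn self-repeat
```

The third call corresponds to the hypothetical Turn 0 case: once the child is the origin, the same repetition is labeled a self-repeat.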
A fourth issue is whether we only consider exact matches or also approximate (“mitigated”) matches, and in the latter case, what types of approximation would be allowed. For example, independent of repetition, common errors among children on the autism spectrum include pronoun substitution where “you” may be substituted with “I” or vice versa. Also, children with language impairments may omit obligatory elements, such as articles or forms of the verb “be.” Thus, “man on table” could count as a repetition of “the man is on the table.” The repetition detection algorithm is general and can be optionally set to allow for a limited set of substitutions and deletions, as follows. Allowable substitutions for the pronoun “you” included “I,” “me,” “we,” “us” (and vice versa); substitutions of “him”/“her” and “he”/“she” were also allowed. Allowable deletions included “the,” “a,” “an,” “is,” “are,” “am,” “ ’m,” “ ’s,” and “ ’re.” Thus, two word sequences can be compared using an approximate string-matching algorithm in which any of these substitutions is allowed, along with at most one consecutive deletion. Given that “you”/“I” substitutions may be a feature of ASD [e.g. Hobson, Garcia-Perez, & Lee, 2010], we also performed analyses that specifically accounted for these occurrences.
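A minimal Python sketch of such an approximate comparison follows. It is not the study's implementation: the names `substitutable` and `approx_match` are hypothetical, clitics are assumed to be tokenized as separate lowercase words such as "'re", and "at most one consecutive deletion" is read here as meaning that no two adjacent source words may both be dropped.

```python
from typing import Sequence

YOU_PARTNERS = {"i", "me", "we", "us"}
DELETABLE = {"the", "a", "an", "is", "are", "am", "'m", "'s", "'re"}


def substitutable(a: str, b: str) -> bool:
    """True if b may stand in for a under the allowed pronoun substitutions."""
    if a == b:
        return True
    pair = {a, b}
    if "you" in pair and (pair - {"you"}) <= YOU_PARTNERS:
        return True
    return pair in ({"he", "she"}, {"him", "her"})


def approx_match(source: Sequence[str], repeat: Sequence[str]) -> bool:
    """True if `repeat` equals `source` up to allowed substitutions and deletions."""

    def go(i: int, j: int, just_deleted: bool) -> bool:
        if j == len(repeat):
            rest = source[i:]
            if not rest:
                return True
            # A single trailing deletable word may still be dropped.
            return len(rest) == 1 and not just_deleted and rest[0] in DELETABLE
        if i == len(source):
            return False
        if substitutable(source[i], repeat[j]) and go(i + 1, j + 1, False):
            return True
        if source[i] in DELETABLE and not just_deleted:
            return go(i + 1, j, True)
        return False

    return go(0, 0, False)


print(approx_match("the man is on the table".split(), "man on table".split()))  # True
print(approx_match("where do you sleep".split(), "where do i sleep".split()))   # True
print(approx_match("is the man here".split(), "man here".split()))              # False
```

The last call fails only because "is" and "the" are adjacent in the source; under a looser reading of the deletion rule it would match.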
We recapitulate the algorithm for computing repetition ratios. For n = 2, one computes chains for the entire dialogue, optionally allowing for a selected type of approximate matching. One then computes the proportion of words uttered by the child in the entire dialogue that occur in the word sequences comprising the heads of all chains of the type of interest.
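To make the recap concrete, here is a hedged Python sketch that computes separate echolalia and self-repeat ratios with exact matching only. The function name `repetition_ratios`, the toy dialogue, and the simplification that a chain's type depends only on who first produced the word sequence are assumptions for illustration; the turn-distance distinctions and approximate matching are left out.

```python
from typing import Dict, List, Tuple


def repetition_ratios(turns: List[Tuple[str, str]], n: int = 2,
                      child: str = "CHILD") -> Tuple[float, float]:
    """Return (echolalia ratio, self-repeat ratio) for the child's words."""
    first_speaker: Dict[Tuple[str, ...], str] = {}  # origin speaker per word sequence
    k = echo_words = self_words = 0
    for speaker, text in turns:
        words = text.split()
        echo = [False] * len(words)   # words inside a head whose origin is the examiner
        own = [False] * len(words)    # words inside a head whose origin is the child
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if gram in first_speaker:
                mask = own if first_speaker[gram] == child else echo
                for j in range(i, i + n):
                    mask[j] = True
            else:
                first_speaker[gram] = speaker
        if speaker == child:
            k += len(words)
            echo_words += sum(echo)
            self_words += sum(own)
    if k == 0:
        return 0.0, 0.0
    return echo_words / k, self_words / k


# Invented toy dialogue.
toy = [
    ("EXAMINER", "look at the big dog"),
    ("CHILD", "the big dog runs fast"),
    ("EXAMINER", "where does it go"),
    ("CHILD", "it runs fast home"),
]

echo_ratio, self_ratio = repetition_ratios(toy, n=2)
print(echo_ratio, self_ratio)  # 3 of 9 and 2 of 9 child words
```

Note that in this sketch a single word can count toward both ratios if it sits inside one sequence that originated with the examiner and another that originated with the child.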
Between-group differences were assessed using planned contrasts between pairs of groups using t-tests. Although these pairs were already close on specific pairwise matching measures (e.g. VIQ for the ALI–SLI pair), an iterative “greedy” selection algorithm was applied that removes children from the to-be-compared groups until all t-tests on these measures are non-significant (P > 0.2, two-tailed) while keeping group sizes as large and equal as possible [van Santen, Prud’hommeaux, Black, & Mitchell, 2010]. Table 2 presents descriptive statistics prior to this procedure, and Table 3 shows these same statistics after the procedure, for four specific contrasts: ASD vs. TD (matched on CA, NVIQ); ALN vs. TD (matched on CA, NVIQ, VIQ); ALI vs. SLI (matched on CA, NVIQ, VIQ); and ALI vs. ALN (matched on CA, SCQ, ADOS).
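The general idea of greedy group matching can be sketched as follows in Python. This is not the cited algorithm, whose details are not given here: it handles a single matching measure, replaces the t distribution with a normal approximation so that only the standard library is needed, and always trims the larger group. The function names, the `min_size` floor, and the example values are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import List, Sequence, Tuple


def approx_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-tailed P for a Welch-type statistic, using a normal approximation."""
    t = (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 2 * (1 - NormalDist().cdf(abs(t)))


def greedy_match(group_a: Sequence[float], group_b: Sequence[float],
                 p_min: float = 0.2, min_size: int = 3) -> Tuple[List[float], List[float]]:
    """Drop one value at a time until the groups no longer differ (P > p_min)."""
    a, b = list(group_a), list(group_b)
    while approx_p(a, b) <= p_min and min(len(a), len(b)) > min_size:
        target = a if len(a) >= len(b) else b  # trim the larger group to keep sizes close

        def p_without(i: int) -> float:
            trimmed = target[:i] + target[i + 1:]
            return approx_p(trimmed, b) if target is a else approx_p(a, trimmed)

        # Remove the value whose removal raises P the most.
        del target[max(range(len(target)), key=p_without)]
    return a, b


# Invented matching-measure values for two groups.
scores_a = [4.0, 4.5, 5.0, 5.5, 6.0, 9.0, 9.5]
scores_b = [4.2, 4.8, 5.1, 5.6, 6.1, 6.4]

matched_a, matched_b = greedy_match(scores_a, scores_b)
print(len(matched_a), len(matched_b), approx_p(matched_a, matched_b) > 0.2)
```

Extending the sketch to several matching measures would mean requiring P > 0.2 on every measure and choosing the removal that most improves the worst one.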
Significance thresholds and estimates of effect size (Cohen’s d for t-tests and η² for two-sample comparisons based on analysis of covariance [ANCOVA]) are noted for each pairwise contrast. Corollary ANCOVA analyses are also reported in text with the pairwise-matching measures entered as covariates. However, we note that the ANCOVA results should be interpreted with caution given the assumptions of ANCOVA regarding homogeneity of regression slopes, which were not confirmed in preliminary analyses. For this reason, we report results in detailed tabular form only for the t-tests. Finally, for children with ASD (ALN and ALI), we examined correlations between the ORR and measures of language ability and autism severity.
For each type of repetitive speech measure (total self-repeats, total echolalia), a Bonferroni correction was used to control the Type I error rate at 0.05 across the set of four pairwise contrasts (thus requiring a nominal P-value of 0.0125). A second Bonferroni adjustment was applied to the set of eight pairwise comparisons within each type of repetitive speech measure (e.g. intra-turn vs. between-turn self-repeats; immediate vs. near-immediate echolalia), thus maintaining the Type I error rate for each repetitive speech measure at 0.05 across all tests and contrasts (thus requiring a nominal P-value of 0.00625).
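The two nominal thresholds follow from dividing the family-wise error rate by the number of tests. A short Python check, with the hypothetical helper name `bonferroni_threshold`:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise rate at alpha."""
    return alpha / n_tests


print(bonferroni_threshold(0.05, 4))  # 0.0125, four pairwise contrasts
print(bonferroni_threshold(0.05, 8))  # 0.00625, eight pairwise comparisons
```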
Results of the matched-pairs analysis are presented in Table 4. These results are based on analyses in which turns containing mazes were included (thus no manual coding of mazes was required) and approximate matches were allowed.
Table 4 shows that the relative frequencies of repetitive language behaviors of the types considered were in the 0.01–0.06 range. In each of the four subgroups, self-repeats were far more common (by a factor of two) than echolalia (one-sample t-tests—not presented in the table—showed that P < 0.0001 and d > 1.3 for each of the four groups). Similar results were obtained for immediate vs. near-immediate echolalia (P < 0.00015 and d > 1.66 for all groups). No such differences were found for intra-turn vs. between-turn self-repeats for the ALN, ALI, and SLI groups (P > 0.25 and −0.34 < d < 0.03 for all groups), yet a difference was found for the TD group (P < 0.005, d = 0.72).
There were no significant differences between the ASD and TD groups for self-repeats (d = 0.09; η² = 0.005), although there was a marginally significant and small-to-medium size difference for between-turn self-repeats (P < 0.1, d = 0.39, η² = 0.05); a similar difference was found for the ALN–TD contrast (P < 0.15, d = 0.44, η² = 0.042). None of the other pairwise contrasts approached significance, with correspondingly small values of d.
Children in the ASD group did have significantly higher rates of echolalia than TD children (P < 0.01; d = 0.86; η² = 0.13). These differences were also significant (P < 0.006) both for immediate (d = 0.85; η² = 0.13) and near-immediate echolalia (d = 0.61; η² = 0.07). Contrasts between ALN and TD were essentially similar to those for the ASD vs. TD comparison, with even larger effect sizes for total echolalia (P < 0.01; d = 0.94; η² = 0.17) as well as for immediate (P < 0.006; d = 0.93; η² = 0.17) but not near-immediate echolalia after the Bonferroni adjustment (P = 0.05; d = 0.60; η² = 0.07). No significant group differences in echolalia were found between ALI and SLI. Effect size values for this contrast were small, suggesting that these null results are not due to lack of statistical power. Likewise, no significant group differences were found between ALI and ALN.
Certain contrasts could not be measured using the matched-pairs procedure because of large differences between groups on matching measures, in which case only the ANCOVA could be applied; however, exactly because of these large differences, ANCOVA results should be treated with caution. Nevertheless, some of the trends are of interest. First, a comparison between the ALI and TD groups using ANCOVA with CA and NVIQ as covariates showed t(64) = 3.92, P < 0.001 for echolalia, and t(62) = 2.65, P = 0.01 with gender and total word count as additional covariates. Second, a similar pattern emerged when comparing the SLI and TD groups using the same analyses: t(58) = 2.39, P = 0.02 for echolalia, and t(56) = 1.92, P = 0.06 with gender and total word count as additional covariates. Third, the same analyses did not show any differences between the ALN and SLI groups.
The ALN and ALI groups are relatively comparable on measures of autism symptomatology (ADOS and SCQ), but differ significantly on language measures (as would be expected based on group criteria; see Table 3). This makes it relevant and meaningful to explore whether our features are correlated with specific language measures. The correlations between the self-repeat and echolalia measures were quite small, with |r| < 0.06 (P > 0.70) in all cases, while the correlations within the self-repeat measures (i.e. between the intra-turn and between-turn repetition ratios) and within the echolalia measures (i.e. between the immediate and near-immediate repetition ratios) were significant but modest (r > 0.34 in both cases, P < 0.05). Results of correlation analyses are summarized in Table 5. Only age and receptive language (as measured by the PPVT-III) correlated significantly with repetitive speech features; however, these associations did not maintain significance after a Bonferroni correction for multiple comparisons (cutoff is 0.47).
Several additional analyses were conducted to address the potential confounding effects of certain variables, as well as to ensure that results did not depend on optional details of the algorithms, such as the rules for approximate matching. Briefly, the above results were replicated in the following conditions: (a) only male participants were analyzed; (b) turns containing mazes were excluded; (c) total word count and gender were included as covariates in the ANCOVA; (d) average turn lengths of the examiner and of the child were included as covariates in the ANCOVA; (e) other matching rules were used: only verbatim repeats or verbatim repeats of personal pronouns (“you”/“I”) were counted or specifically ignored; (f) the ORR was measured, as in our main analysis, via r(n)/k, via log((r(n)+1)/(k+1)), or via Anscombe’s transform, asin(sqrt((r(n)+3/8)/(k+3/4))).
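For reference, the three ways of expressing the ratio in condition (f) can be written out as in the Python sketch below. The function names are hypothetical, and the third function uses the standard form of Anscombe's arcsine transform for binomial counts, with a square root inside the arcsine and k + 3/4 in the denominator; that form is an assumption about what was computed in the study.

```python
import math


def orr_raw(r: int, k: int) -> float:
    """Plain repetition ratio r(n)/k."""
    return r / k


def orr_log(r: int, k: int) -> float:
    """Log ratio with add-one smoothing, log((r + 1)/(k + 1))."""
    return math.log((r + 1) / (k + 1))


def orr_anscombe(r: int, k: int) -> float:
    """Anscombe's arcsine transform, asin(sqrt((r + 3/8)/(k + 3/4)))."""
    return math.asin(math.sqrt((r + 3 / 8) / (k + 3 / 4)))


# Counts from the worked example in the text: 15 repeated words out of 25.
r, k = 15, 25
print(orr_raw(r, k), orr_log(r, k), orr_anscombe(r, k))
```

With k + 3/4 in the denominator, the argument of the square root stays strictly between 0 and 1 for every 0 <= r <= k, so the transform is defined even when none or all of the child's words are repeated.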
Repetitive speech is an often described yet understudied characteristic of the language phenotype in ASD. The primary contribution of the current study is a measurement tool for quantifying repetitive speech behaviors. The previous lack of tools for measuring these behaviors may have precluded investigations into the prevalence and underlying mechanisms of repetitive speech behaviors in ASD and other disorders. The transcript-based automated technique described here will potentially facilitate larger scale investigations into self-repetitive and echolalic speech because the only human labor required is verbatim transcription. Using this automated technique, we highlight several novel findings regarding the nature of repetitive speech in ASD and language impairment.
In terms of overall frequency of occurrence, self-repeats were far more common than echolalia for children in all four groups, in contrast to previous findings [Sudhalter et al., 1990]. Notably, this was found even when “mazing” was excluded from analysis, which Shriberg et al.  suggested may have driven repetitions in their analysis on speakers with ASD. Thus, self-repetitions in the current study are unlikely to reflect formulation difficulties, which could have inflated the rates in children in the language-impaired groups [see Dollaghan & Campbell, 1998]. For echolalic speech, rates for immediate echolalia tended to be approximately double the rates of near-immediate echolalia, while, except for the TD group, no comparable difference was found for intra-turn vs. between-turn self-repeat rates.
The between-groups comparisons revealed that children with ASD reliably displayed more echolalia than TD children. Analyses indicated that rates of echolalia reliably differentiated ASD, ALN, and ALI from TD. The two ASD subgroups had nearly identical rates of echolalia despite significant differences in overall language ability. Regarding the SLI group, we cautiously conclude that it was in-between the ASD groups and the TD group, and somewhat closer to the former than to the latter: it did not differ significantly from the ALI group while a tentative ANCOVA analysis revealed it to display more echolalia than the TD group.
Overall, this pattern of results supports the view that while echolalia may be a salient characteristic of spoken language in ASD, it may not be unique to children on the spectrum and may be partially shared by children with SLI [Tager-Flusberg et al., 2005]. The absence of correlations within the ASD group between echolalia rate and measures of ASD severity (ADOS and SCQ) may further cast doubt on the specificity of echolalia to autism.
The lack of any effects of language ability (as measured by the CELF Core Language Score) on echolalia within the ASD group is surprising, given that the SLI group exhibited more echolalia than the TD group and given that great care was taken to exclude children from the SLI group who exhibited autism symptoms.
No systematic group differences were found on measures of self-repetitive speech. This may be due to the fact that our analyses were limited to repetitions at most across two consecutive child conversational turns. Self-repetitions may need to be measured across longer segments in order to be meaningful, as may be expected when a child refers to a specific topic repeatedly throughout a conversation [e.g. Roberts et al., 2007]. The study’s sample size did not allow for further exploration of the strong difference between intra-turn and between-turn self-repeats in the TD group but not in the other groups.
Turning now to the correlational analyses, the overall trend is for rates of self-repeats and echolalia to be uncorrelated with IQ measures, language measures, autism symptom measures, and chronological age. The few correlations that reached significance (age, PPVT) did not remain so after a Bonferroni correction for multiple comparisons.
We note that the new methods differ from conventional methodology in ways additional to their automaticity: they do not have the fine granularity of, for example, the functional distinctions proposed by Prizant and Duchan  for echolalia. There are several reasons for this. First, automating such distinctions is challenging. Second, given the already low incidence of repetitive speech in high-functioning, verbal ASD, such fine granularity would face even lower probabilities of being observed in reasonable-sized natural language samples. Third, previous studies have questioned the validity of defining functional categories for echolalia because they have not been empirically supported, even when reliably coded [McEvoy et al., 1988]. Fourth, one of the ultimate criteria for the success of the methods is their ability to classify the children correctly. Our working hypothesis is that the extremely high reliability of our straightforwardly computable quantities (only limited by transcriber inaccuracy), together with the substantial size of the speech samples, and the coarse granularity of our measures, will provide superior reliability compared with any fine-grained human coding scheme applied to language samples that, for this coding to be feasible in practice, are substantially smaller. This superior reliability may go a long way toward compensating for the purported superior criterion validity of the concepts underlying these finer grained human codes.
These results suggest several areas for follow-up research. Future research integrating analysis of repetitive speech with repetitive intonation patterns may reveal further, and perhaps more clinically significant, group differences. Automated methods that capture communicative functions of echolalia, if such methods can be developed, may also reveal such differences. Finally, while there is a trend for immediate echolalia to provide better group separation than near-immediate echolalia, this trend is weak and needs further exploration.
Limitations of the study include the following. First, our participant age range (4–8 years) encompasses children at significantly different stages of language development. Thus, our analyses may have obscured important group and individual differences in repetitive speech behaviors due to the wide age range of the participants. Second, participants have IQ scores in the normal range, and therefore are not representative of the ASD population at large. Third, we did not address intonational patterns of echolalic or self-repetitive speech, which are often noted as a defining characteristic of echolalia in particular. Fourth, we did not attempt to define the functions of repetitive speech in this study. Given that our participants were high-functioning based on the study inclusion criteria, it is likely that many instances of echolalia and self-repetitive speech did have a meaningful communicative function within the context of the ongoing conversation. This is in contrast to research on vocal stereotypy in ASD, defined as non-functional and not contextually appropriate repetitions [e.g. Ahearn, Clark, & MacDonald, 2007; Grossi, Marcone, Cinquegrana, & Gallucci, 2012; MacDonald et al., 2007].
We have presented a simple computational technique for measuring the amount of repetition in a child’s turns. All one needs is a transcript of the examiner’s and child’s turns, and the system will compute a measure of the amount of repetition performed by the child. We believe that it is an important result that an automatic technique of this kind can produce robust results, such as those that we have presented here. Because the technique is automatic, it removes the subjective judgment that is inevitably required if a human annotator attempts the task. It also enables larger scale studies on repetitive language because manual labor is confined to the relatively easy task of transcription, and because automatic speech recognition methods are on the verge of achieving adequate accuracy levels. Perhaps most important, while not exploited in the current paper, this methodology enables exploration of new features, such as temporal distributions of, or contextual effects on, repetitive speech behaviors.
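As a rough illustration of how a transcript-based repetition measure of this general kind might be computed, the sketch below scores each child turn by the fraction of its word bigrams that also occur in the preceding adult turn (echolalia) or in the preceding child turn (self-repetition). This is not the measure used in this study: the bigram overlap, the whitespace tokenization, the function names, and the example transcript are all assumptions made for illustration only.

```python
def ngrams(text, n=2):
    """Lowercase, whitespace-tokenize, and return the list of word n-grams."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def overlap(child_text, source_text, n=2):
    """Fraction of the child's n-grams that also appear in the source turn."""
    child = ngrams(child_text, n)
    if not child:
        return 0.0
    source = set(ngrams(source_text, n))
    return sum(gram in source for gram in child) / len(child)


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def repetition_scores(turns, n=2):
    """Return (mean echo overlap, mean self-repeat overlap) for a transcript.

    turns: list of (speaker, text) pairs, with speaker "adult" or "child".
    Only the immediately preceding adult and child turns are considered.
    """
    echo, self_rep = [], []
    last_adult = last_child = None
    for speaker, text in turns:
        if speaker == "child":
            if last_adult is not None:
                echo.append(overlap(text, last_adult, n))
            if last_child is not None:
                self_rep.append(overlap(text, last_child, n))
            last_child = text
        else:
            last_adult = text
    return _mean(echo), _mean(self_rep)


# Hypothetical transcript, punctuation already stripped (an assumption).
turns = [
    ("adult", "is it a boy or a girl"),
    ("child", "boy or girl"),
    ("adult", "where do you sleep"),
    ("child", "i sleep in my bed"),
]
echo_score, self_score = repetition_scores(turns)
```

Because the sketch needs only speaker-labeled turns, it requires no human coding beyond transcription. A real system would also need to decide how to handle punctuation, disfluencies, and turns further back in the conversation.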
Grant sponsor: National Institute on Deafness and Other Communication Disorders; Grant numbers: R01DC012033, R01DC007129-01.
Grant sponsor: Autism Speaks; Grant number: Innovative Technology for Autism Grant 2407.
The work reported here was supported in part by R01DC012033 and R01DC007129-01 from the National Institute on Deafness and Other Communication Disorders, and by Innovative Technology for Autism Grant 2407 from Autism Speaks. We thank and remember Lois M. Black, without whose pioneering work and without whose leadership of the data collection effort this project could not have been initiated. We also would like to thank the colleagues involved in the original recruitment, diagnosis, and neurocognitive assessment for this study: Beth Hoover Langhorst, Robbyn Sanger, and Jessica Sharlow. Finally, we would like to thank Mabel Rice for helpful discussion on criteria for classifying our language-impaired subjects.