HHS Public Access; Author Manuscript; Accepted for publication in peer-reviewed journal.
Cognition. Author manuscript; available in PMC 2010 November 1.
Published in final edited form as:
PMCID: PMC2763958

Learning in reverse: 8-month-old infants track backward transitional probabilities


Numerous recent studies suggest that human learners, including both infants and adults, readily track sequential statistics computed between adjacent elements. One such statistic, transitional probability, is typically calculated as the likelihood that one element predicts another. However, little is known about whether listeners are sensitive to the directionality of this computation. To address this issue, we tested 8-month-old infants in a word segmentation task, using fluent speech drawn from an unfamiliar natural language. Critically, test items were distinguished solely by their backward transitional probabilities. The results provide the first evidence that infants track backward statistics in fluent speech.

Language comprehension relies on the identification of basic lexical units. Preverbal infants quickly begin to recognize word-like sequences presented in fluent speech (Jusczyk & Aslin, 1995), suggesting that there must be some set of reliable cues marking word boundaries in the speech signal. Indeed, infants are sensitive to numerous acoustic and phonological properties correlated with word boundaries (for a recent review, see Saffran, Werker, & Werner, 2006).

In particular, infants’ sensitivity to transitional probability (TP), the probability of event Y given event X, has received a great deal of attention (e.g., Aslin, Saffran, & Newport, 1998; Saffran, Aslin, & Newport, 1996; Saffran, Johnson, Aslin, & Newport, 1999). TP is typically calculated according to Equation 1:

$$\mathrm{TP} = P(Y \mid X) = \frac{\mathrm{frequency}(XY)}{\mathrm{frequency}(X)} \tag{1}$$

On this construal, the co-occurrence frequency of the pair XY is normalized by the overall frequency of its first element, X, in the corpus. TP is thus a measure of the strength with which X predicts Y. Analyses of infant-directed speech corpora suggest that TP cues could, in principle, help infants find word boundaries, at least when acting in concert with other cues (Swingley, 2005). Furthermore, infants track TPs when exposed to a synthetic speech stream drawn from a miniature artificial language, in which TP is the only available cue to word boundaries (Aslin et al., 1998; Graf Estes, Evans, Alibali, & Saffran, 2007). Infants can also track TPs in natural speech drawn from an unfamiliar language, Italian (Pelucchi, Hay, & Saffran, 2009).
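A minimal Python sketch of Equation 1 follows, assuming the input has already been transcribed as a list of syllables (an idealization; infants hear continuous speech). The function name `forward_tp` and the toy stream are hypothetical. One small choice departs from a literal reading of frequency(X): the final syllable, which has no successor, is not counted, so that the forward probabilities leaving each X sum to 1. In a long corpus this difference is negligible.

```python
from collections import Counter, defaultdict


def forward_tp(syllables):
    """P(Y | X) = frequency(XY) / frequency(X) for each adjacent pair (X, Y)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count X only where it has a successor, so the FTPs out of each X sum to 1.
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


stream = ["ca", "sa", "bi", "ci", "ca", "sa", "fu", "ga"]  # hypothetical toy stream
ftp = forward_tp(stream)
print(ftp[("ca", "sa")])  # 1.0: ca is always followed by sa
print(ftp[("sa", "bi")])  # 0.5: sa is followed by bi once and by fu once

# Because the normalizer is the count of X, the FTPs leaving each X form a distribution.
totals = defaultdict(float)
for (x, _), p in ftp.items():
    totals[x] += p
print(dict(totals))
```

Normalizing by the first element is what makes this a forward statistic: it answers "given X, what comes next?"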

Despite this emerging body of work, remarkably little is known about the computational underpinnings of these findings. In particular, it is not clear whether infants compute TPs solely in a forward direction. In the original study, Aslin et al. (1998) noted that, although they focused on forward TPs, a number of other pairwise statistics could in principle be equally informative, because such statistics also normalize the co-occurrence frequency of a pair by the frequency of one or both of its individual elements. From a computational perspective, some sequential statistics are direction-independent, such as mutual information (e.g., Charniak, 1993; Swingley, 1999). Other statistics contain directional information. For example, TP can be defined in two different and symmetrical ways, depending on how the normalization step is performed. In addition to the forward TP (hereafter FTP) described in Equation 1, it is also possible to compute the backward TP (hereafter BTP), as shown in Equation 2, which measures the likelihood that X precedes Y, given Y:

$$\mathrm{BTP} = P(X \mid Y) = \frac{\mathrm{frequency}(XY)}{\mathrm{frequency}(Y)} \tag{2}$$

Indeed, in a corpus analysis of English infant-directed speech, Swingley (1999) demonstrated that FTP and BTP are equally informative as independent cues to word boundaries.

Most studies of TP computation have manipulated FTP, often without controlling for effects of BTP. These conditional probability statistics are typically correlated, making it difficult to assess the independent roles of FTPs and BTPs. Although the two measures are often identical in studies that use artificial language materials (e.g., Aslin et al., 1998; Graf Estes et al., 2007), they are likely to differ substantially in natural languages. In the one infant study in which these two cues were decoupled, Pelucchi et al. (2009) controlled for BTPs in their test items, leaving FTPs as the only useful cues (see Table 1). No study to date has tested whether infants can exploit BTP cues in the absence of informative FTP cues, though results from recent adult studies suggest that such learning is feasible in both the auditory (Perruchet & Desaulty, 2008) and visual (Jones & Pashler, 2007) domains.
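To make the contrast between Equations 1 and 2 concrete, the following minimal Python sketch computes both statistics over a syllable list. The function name `transitional_probabilities`, the trisyllabic words, and both streams are hypothetical. The first stream mimics an artificial language in which every syllable belongs to exactly one word, so within-word FTP and BTP coincide; in the second, the syllable sa also follows other syllables, which lowers the BTP of ca-sa while leaving its FTP at 1.0.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Return (ftp, btp) dicts keyed by each adjacent syllable pair (X, Y).

    ftp[(X, Y)] = frequency(XY) / frequency(X)  -> Equation 1, P(Y | X)
    btp[(X, Y)] = frequency(XY) / frequency(Y)  -> Equation 2, P(X | Y)
    """
    pairs = Counter(zip(syllables, syllables[1:]))
    as_first = Counter(syllables[:-1])   # tokens of X that have a successor
    as_second = Counter(syllables[1:])   # tokens of Y that have a predecessor
    ftp = {(x, y): n / as_first[x] for (x, y), n in pairs.items()}
    btp = {(x, y): n / as_second[y] for (x, y), n in pairs.items()}
    return ftp, btp


# Artificial-language-style stream built from hypothetical trisyllabic words.
words = [["tu", "pi", "ro"], ["go", "la", "bu"], ["bi", "da", "ku"], ["pa", "do", "ti"]]
order = [0, 1, 2, 3, 1, 0, 3, 2, 0, 2, 1, 3]  # hypothetical word order
artificial = [syl for i in order for syl in words[i]]
ftp_a, btp_a = transitional_probabilities(artificial)

# Every within-word pair has FTP == BTP == 1.0, because each syllable
# occurs only inside its own word.
for word in words:
    for x, y in zip(word, word[1:]):
        print(x, y, ftp_a[(x, y)], btp_a[(x, y)])

# Hypothetical natural-language-like stream: "sa" also follows "ro" and "co".
natural = ["ca", "sa", "ro", "sa", "co", "sa"]
ftp_n, btp_n = transitional_probabilities(natural)
print(ftp_n[("ca", "sa")], round(btp_n[("ca", "sa")], 2))  # 1.0 0.33
```

The only difference between the two dictionaries is the normalizer, which is why the measures coincide whenever X and Y occur exactly as often as the pair XY, and diverge when one element also appears in other contexts.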

Table 1
Comparison Between FTPs and BTPs of the Words and Part-words from Saffran et al. (1996) and Perruchet and Desaulty (2008), of the HTP- and LTP-words from Pelucchi et al. (2009), and of Those Used in the Current Experiment

The aim of the current study was to determine whether infants track BTPs in fluent speech. Eight-month-old infants listened to a corpus of natural speech from an unfamiliar language. Unlike earlier studies of statistical learning using highly artificial stimuli, these materials have the complexity of speech found in infants’ natural environments, increasing the ecological validity of the task (Pelucchi et al., 2009). After familiarization, infants were tested on two different types of familiar words: High Transitional Probability words (HTP-words, BTP=1.0) versus Low Transitional Probability words (LTP-words, BTP=0.33). Importantly, both word types occurred equally often during familiarization and shared the same trochaic stress pattern and the same FTP (1.0). Successful discrimination between HTP- and LTP-words would suggest that infants track BTPs while listening to natural speech.

Method

Participants

Thirty-two infants (mean age 8.4 months, range 8.0-9.0) were assigned to one of two counterbalanced conditions: Language A or B. All infants were monolingual English learners with no history of hearing or vision impairments; none had prior exposure to Italian or Spanish. Eight additional infants were excluded because of fussiness (6) or failure to attend (2).

Stimuli

Two counterbalanced languages were used to control for arbitrary listening preferences at test. Languages A and B each consisted of 12 grammatical, meaningful Italian sentences. The 4 target words (fuga, melo, bici, and casa) each appeared 6 times in each set of 12 sentences. All target words followed a strong/weak (trochaic) stress pattern and were phonetically and phonotactically legal in English.

The critical manipulation concerned the BTPs between the syllables of the target words. In Language A, the syllables fu, ga, me, and lo appeared only in the words fuga and melo. Consequently, the BTPs of these two words were 1.0 (HTP-words). For casa and bici, however, each of the syllables sa and ci occurred 12 additional times in the sentences, always in weak (unstressed) position, consistent with their stress level in the target words. As a consequence, the BTPs of casa and bici were 6/(6 + 12) = 0.33 (LTP-words) relative to the Language A familiarization sentences.

Importantly, since the first syllables, bi and ca, only occurred in the context of the target words, the FTPs for casa and bici were 1.0. This is identical to the FTPs for the two HTP-words, fuga and melo. Thus, the only difference between HTP- and LTP-words is their BTPs: 1.0 for HTP-words, and 0.33 for LTP-words. In the counterbalanced Language B, HTP- and LTP-words were switched.

Each block of 12 sentences was presented 3 times, for a total of 3 min. The resulting corpora for Languages A and B thus included 18 repetitions of each of the 4 target words, and 36 additional occurrences of each of the second syllables of the two LTP-words. Test items consisted of the 2 HTP-words and 2 LTP-words. A female native Italian speaker, naïve to the purpose of the experiment, recorded the stimuli in an infant-directed register. Test words were read in citation form and were digitally matched for length and amplitude, while preserving their original pitches.
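The FTP and BTP values of the test items follow directly from Equations 1 and 2. The sketch below recomputes them for Language A, assuming the counts described above: 18 tokens of each target word, the syllables fu, ga, me, lo, bi, and ca occurring only inside target words, and 36 extra tokens of each of sa and ci. Reading the 36 extra tokens as per syllable is an interpretation, but it is the reading that yields the reported BTP of 18/(18 + 36) = 0.33; 36 tokens shared between sa and ci would instead give 18/(18 + 18) = 0.5. In Language B the two word types swap roles.

```python
# Language A counts (assumption: the 36 extra tokens are per syllable).
REPS = 18   # tokens of each target word across 3 presentations of the 12 sentences
EXTRA = 36  # additional tokens of each of sa and ci outside casa and bici

syllable_counts = {
    "fu": REPS, "ga": REPS, "me": REPS, "lo": REPS,  # occur only in fuga and melo
    "ca": REPS, "bi": REPS,                          # occur only in casa and bici
    "sa": REPS + EXTRA, "ci": REPS + EXTRA,          # also occur elsewhere
}
pair_counts = {("fu", "ga"): REPS, ("me", "lo"): REPS, ("ca", "sa"): REPS, ("bi", "ci"): REPS}

results = {}
for (x, y), n in pair_counts.items():
    ftp = n / syllable_counts[x]  # Equation 1: frequency(XY) / frequency(X)
    btp = n / syllable_counts[y]  # Equation 2: frequency(XY) / frequency(Y)
    results[x + y] = (ftp, btp)
    print(f"{x}{y}: FTP = {ftp:.2f}, BTP = {btp:.2f}")
```

All four words share an FTP of 1.00, so only the BTP column separates HTP-words (1.00) from LTP-words (0.33).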

Procedure

Infants were tested using the Head Turn Preference Procedure as adapted by Saffran et al. (1996). During familiarization, Language A or B played from speakers mounted beneath two sidelights. The lights flashed contingent on looking behavior while the familiarization materials played continuously. Immediately after familiarization, 12 test trials were presented. Each trial consisted of a single test item repeated as long as the infant maintained a head-turn in the direction of the flashing light above the loudspeaker presenting the sound. All infants heard the same test items regardless of familiarization condition. Each of the 4 test items (2 HTP-words and 2 LTP-words) was presented 3 times, in random order within each block. Test items that were HTP-words in Language A corresponded to LTP-words in Language B, and vice versa.

Results

Because difference scores did not differ significantly between the two counterbalanced languages [t(30) = .30, p = .77], the two conditions were combined in the subsequent analysis. A paired t-test revealed a significant difference in average looking time to HTP-words (10.06 s) versus LTP-words (8.91 s): t(31) = 3.05, p < .01 (see Figure 1).
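A paired t-test compares each infant's mean looking time to HTP-words with the same infant's mean looking time to LTP-words, so its degrees of freedom equal the number of infants minus one (32 - 1 = 31, as in t(31)). The sketch below computes t = mean(d) / (SD(d) / sqrt(n)) over per-infant differences d; the function name `paired_t` is hypothetical and the looking times are invented for illustration, not the study's data. (The between-language comparison, t(30), is instead an independent-samples test on two groups of difference scores, with n1 + n2 - 2 = 30 degrees of freedom.)

```python
import math
import statistics


def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for matched lists a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return statistics.mean(diffs) / se, n - 1


# Invented per-infant mean looking times in seconds (illustrative only, NOT study data).
htp_times = [10.2, 9.1, 11.5, 8.7]
ltp_times = [9.0, 8.8, 10.1, 8.9]
t, df = paired_t(htp_times, ltp_times)
print(f"t({df}) = {t:.2f}")
```

Pairing removes stable between-infant differences in overall looking time, which is why the test operates on within-infant differences rather than on the two sets of means.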

Figure 1
Mean looking times (±1 SE) to HTP-words and LTP-words

Since test items were matched for their trochaic pattern, frequency, and FTP, infants’ successful test discrimination suggests sensitivity to BTP. The observed familiarity preference is consistent with results of Pelucchi et al. (2009), where FTP was manipulated using similar materials and procedures. Together, these results indicate that infants can keep track of TPs in fluent natural speech, computing probabilities of co-occurrence in both directions.

General Discussion

Despite the burgeoning literature exploring the role of TP in word segmentation, remarkably little is known about the directional selectivity of this computation. Prior to the current study, no data on infants’ sensitivity to BTPs were available. These results provide the first evidence that infants can successfully discriminate disyllabic sequences based on differences in their internal BTPs.

While intuition suggests that forward-going statistics should be more informative to learners, the acquisition of some types of linguistic structures would be facilitated by detection of backward-going statistics (e.g., Saffran, 2001, 2002; Saffran et al., 2008). For example, in languages that mark grammatical gender, backward TPs would help learners to discover which types of articles typically precede nouns; forward TPs are far less informative in this situation (Lew-Williams, personal communication). In the Spanish article-noun sequence la pelota, la is a poor predictor of pelota, since many different feminine nouns can follow la. However, pelota is a good predictor of la, since pelota is frequently preceded by la. Even in English, BTPs are more useful than FTPs for discovering some relationships. Corpus analyses suggest that BTPs are far more informative than FTPs for discovering the grammatical category ‘noun’. For example, for the word pair the dog, the backward probability that the precedes dog is much higher than the forward probability that dog follows the (0.25 vs. 0.002; Willits, Seidenberg, & Saffran, 2009). In other languages, like Korean, determiners tend to follow nouns, and thus the forward probability is more informative than the corresponding backward probability. Indeed, a recent adult study suggests that the degree of sensitivity to forward versus backward probabilities may in fact be language dependent (Onnis, 2009). In general, forward and backward TPs, working together, likely help to constrain the set of potential alternatives across a range of learning problems.
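The la pelota asymmetry can be illustrated with word bigrams. The sketch below uses a small, made-up Spanish corpus (not corpus data from the cited work), and the function `word_tps` is hypothetical; it counts bigrams only within sentences and applies Equations 1 and 2 at the word level.

```python
from collections import Counter


def word_tps(sentences, x, y):
    """Return (FTP, BTP) for the word bigram "x y".

    FTP = frequency(x y) / frequency(x as a left neighbor), i.e., P(y | x)
    BTP = frequency(x y) / frequency(y as a right neighbor), i.e., P(x | y)
    Bigrams are counted only within sentences.
    """
    pairs, lefts, rights = Counter(), Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for a, b in zip(tokens, tokens[1:]):
            pairs[(a, b)] += 1
            lefts[a] += 1
            rights[b] += 1
    if lefts[x] == 0 or rights[y] == 0:
        raise ValueError("x never precedes, or y never follows, another word")
    return pairs[(x, y)] / lefts[x], pairs[(x, y)] / rights[y]


# A made-up miniature Spanish corpus (illustrative only).
sentences = [
    "la pelota es roja",
    "la casa es grande",
    "la mesa es nueva",
    "veo la pelota",
    "la niña come",
    "la luna brilla",
]
ftp, btp = word_tps(sentences, "la", "pelota")
print(f"FTP P(pelota | la) = {ftp:.2f}")  # 0.33: la is followed by many nouns
print(f"BTP P(la | pelota) = {btp:.2f}")  # 1.00: pelota is always preceded by la here
```

The same pair yields a weak forward statistic and a strong backward one, because la has many possible successors while pelota has few possible predecessors.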

The successful discrimination observed in this experiment has a further, potentially important, implication. To track BTPs in these materials, infants needed to compute probabilities over weak syllables. Unstressed syllables are less perceptually salient, and thus potentially more elusive, than strong syllables (Cutler & Foss, 1977; Tabossi, Burani, & Scott, 1995). In Italian, weak syllables are shorter, softer, and lower pitched than their corresponding strong syllables; statistics over strong syllables are known to be readily tracked by infants in statistical learning tasks (Pelucchi et al., 2009). These results are thus important in that they demonstrate that infants can also track the probabilities of these less prominent elements. Moreover, unlike most earlier studies of statistical learning in infancy, the materials used in this study contained the richness of natural speech. The results thus emphasize the sophisticated abilities of infants, who readily track the statistics of sounds in complex input despite variation in their acoustic realizations (e.g., coarticulation and nasalization).

The current study illuminates the power of statistical learning mechanisms and extends our understanding of their computational range: infants readily track both forward and backward TPs, even in complex, natural linguistic input. These findings both constrain and challenge future studies that invoke statistical learning processes. Structures that appear difficult to learn from weak forward-going statistics may be supported by strong backward-going statistics, and vice versa. Future research will need to pinpoint the means by which infant learners integrate these sources of information, along with the myriad other cues that lurk amidst linguistic input.

Acknowledgments

This research was funded by grants from NICHD to JRS (R01HD37466) and JFH (F32-HD557032), a grant from the James S. McDonnell Foundation to JRS, and by a core grant to the Waisman Center (P30HD03352). We would like to thank the members of the Infant Learning Lab and especially Jessica Rich for her assistance in the conduct of this research, the families who generously contributed their time, and two anonymous reviewers for their helpful comments on a previous version of this manuscript.


Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • Aslin RN, Saffran JR, Newport EL. Computation of conditional probability statistics by 8-month-old infants. Psychological Science. 1998;9:321–324.
  • Charniak E. Statistical language learning. MIT Press; Cambridge, MA: 1993.
  • Cutler A, Foss DJ. On the role of sentence stress in sentence processing. Language and Speech. 1977;20:1–10.
  • Graf Estes K, Evans JL, Alibali MW, Saffran JR. Can infants map meaning to newly segmented words? Statistical segmentation and word learning. Psychological Science. 2007;18:254–260.
  • Jones J, Pashler H. Is the mind inherently forward thinking? Comparing prediction and retrodiction. Psychonomic Bulletin & Review. 2007;14(2):295–300.
  • Jusczyk PW, Aslin RN. Infants’ detection of the sound patterns of words in fluent speech. Cognitive Psychology. 1995;29:1–23.
  • Onnis L. Language-induced constraints on statistical learning. Manuscript under review; 2009.
  • Pelucchi B, Hay JF, Saffran JR. Statistical learning in a natural language by 8-month-old infants. Child Development. 2009;80(3):674–685.
  • Perruchet P, Desaulty S. A role for backward transitional probabilities in word segmentation? Memory & Cognition. 2008;36:1299–1305.
  • Saffran JR. The use of predictive dependencies in language learning. Journal of Memory and Language. 2001;44:493–513.
  • Saffran JR. Constraints on statistical language learning. Journal of Memory and Language. 2002;47:172–196.
  • Saffran JR, Aslin RN, Newport EL. Statistical learning by 8-month-old infants. Science. 1996;274:1926–1928.
  • Saffran JR, Hauser M, Seibel RL, Kapfhamer J, Tsao F, Cushman F. Grammatical pattern learning by infants and cotton-top tamarin monkeys. Cognition. 2008;107:479–500.
  • Saffran JR, Johnson EK, Aslin RN, Newport EL. Statistical learning of tone sequences by human infants and adults. Cognition. 1999;70:27–52.
  • Saffran JR, Werker J, Werner L. The infant’s auditory world: Hearing, speech, and the beginnings of language. In: Siegler R, Kuhn D, editors. Handbook of Child Development. New York: Wiley; 2006. pp. 58–108.
  • Swingley D. Conditional probability and word discovery: A corpus analysis of speech to infants. In: Hahn M, Stoness SC, editors. Proceedings of the 21st Annual Conference of the Cognitive Science Society. Mahwah, NJ: Erlbaum; 1999. pp. 724–729.
  • Swingley D. Statistical clustering and the contents of the infant vocabulary. Cognitive Psychology. 2005;50:86–132.
  • Tabossi P, Burani C, Scott D. Word identification in fluent speech. Journal of Memory and Language. 1995;34:440–467.
  • Willits J, Seidenberg M, Saffran JR. What do infants really want (to count)? The units of statistical learning. Manuscript to appear in: Proceedings of the 2009 Annual Meeting of the Cognitive Science Society; 2009.