NIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
J Mem Lang. Author manuscript; available in PMC Feb 1, 2009.
Published in final edited form as:
J Mem Lang. Feb 2008; 58(2): 541–573.
doi:  10.1016/j.jml.2007.06.013
PMCID: PMC2361389
Anticipatory effects of intonation: Eye movements during instructed visual search
Kiwako Ito and Shari R Speer
Department of Linguistics, The Ohio State University, Columbus, Ohio
Three eye-tracking experiments investigated the role of pitch accents during online discourse comprehension. Participants faced a grid with ornaments, and followed pre-recorded instructions such as “Next, hang the blue ball” to decorate holiday trees. Experiment 1 demonstrated a processing advantage for felicitous as compared to infelicitous uses of L+H* on the adjective noun pair (e.g. blue ball followed by GREEN ball vs. green BALL). Experiment 2 confirmed that L+H* on a contrastive adjective led to ‘anticipatory’ fixations, and demonstrated a “garden path” effect for infelicitous L+H* in sequences with no discourse contrast (e.g. blue angel followed by GREEN ball resulted in erroneous fixations to the cell of angels). Experiment 3 examined listeners’ sensitivity to coherence between pitch accents assigned to discourse markers such as ‘And then,’ and those assigned to the target object noun phrase.
When speakers communicate with one another in the course of everyday conversation, they convey a great deal of information beyond their chosen words and sentences. Facial expressions and gestures convey aspects of messages that would be unavailable in the absence of face-to-face articulation. Even when interlocutors are not visually co-present for conversation, listeners have access to a great deal of information beyond the spoken strings of consonants and vowels, including variation in rhythm, melody, tempo, loudness, tenseness and tone of voice. Intonation provides an organizational structure for speech, and can simultaneously convey a speaker’s attitude, utterance purpose, and the relative importance of particular words or phrases. Producing and responding to various intonation patterns that are specific to a given linguistic community is an automatic, sophisticated and highly general cognitive skill. For example, most people are sensitive to dialectal differences in intonation (e.g., consider how you would identify the speaker from either Sydney, Boston, or Mumbai by listening to the melody of “Did he actually show up?”). This sensitivity to dialectal distinctions suggests that the members of a language-speaking community have come to a subconscious consensus on the conventionalized use of intonation patterns.
Consistent with this notion, recent work on intonation strives toward standard descriptions of tunes and their meanings in various languages (e.g., English: Pierrehumbert & Hirschberg (1990); German: Fery (1993), Kohler (2005), Grice, Baumann & Benzmüller (2005); Italian: D’Imperio (2000), Grice, D’Imperio, Savino & Avesani (2005); Japanese: Pierrehumbert & Beckman (1988), Venditti (1997; 2005); Spanish: Face (2001), Prieto, van Santen, & Hirschberg (1995)). However, this work also reveals a wide range of intonation variation across speakers within the same community, and even within the same experimental setting (cf. Schafer, Speer & Warren, 2003). Given this diversity of form, how reliably can intonation cue pragmatics during speech comprehension? If intonation is indeed a powerful auditory cue that guides the interpretation of incoming utterances, models of speech comprehension must properly describe the mechanism of such processing. The research presented here uses eye-movement monitoring to investigate whether and when listeners make use of intonational cues to guide visual search through a set of real-world objects as they follow spoken instructions to decorate a series of holiday trees.
According to autosegmental accounts of prosody (e.g. Bruce, 1977; Goldsmith, 1967; 1990; Pierrehumbert, 1980), local excursions of tonal prominence in speech are called pitch accents. In English, pitch accents are aligned with the lexically stressed syllable of a word, and are generally accompanied by increased amplitude and duration, and hyper-articulation of the segments as compared to unaccented syllables (Beckman, 1996; Ladd, 1996). Within the framework of a widely-used tone annotation system, ToBI (Tones and Break Indices: Beckman and Ayers, 1997), there are five pitch accent types in American English (H*, L*, L*+H, L+H*, H+!H*: * indicates the association or phonetic alignment between the tone and the stressed syllable; ! indicates the ‘downstep,’ or contextually-triggered lowering of a tone).
These pitch accents are claimed to convey different pragmatic meanings for the utterances in which they appear. Pierrehumbert and Hirschberg (1990) introduced the idea of compositionality of tonal meaning, where meanings conveyed by pitch accent, phrasal accent and boundary tone combine to contribute to the overall meaning of an utterance in discourse. For example, H* accented items are added to the interlocutors’ mutual belief space when they appear in either a declarative sentence with final L-(phrasal) and L% (boundary) tones, or in a question with final H- and H% tones. In contrast, the L* accent can also signal the salience of an accented item, but in this case the item is assumed to be already part of the hearer’s mutual beliefs. L+H* evokes contrast in the discourse, signaling that “the accented item –and not some alternative related item- should be mutually believed (p.296).” This is the tune that may be used for the capitalized word in an utterance such as “I made a reservation for FIFTEEN, not fifty!”.
Given the distribution of pitch accents present in American English, how do speakers and listeners make use of such cues during language processing? We focus here on the contribution of ‘emphatic’ or ‘contrastive’ pitch accent to discourse processing. Recent studies of language production have established that speakers’ use of pitch accent is tightly tied to the information structure of their utterance as it relates to the discourse structure in which it is embedded. Early work in the area characterized this correspondence as a tendency to accent ‘new’ information and to refrain from accenting ‘old’ information (Bolinger, 1961; 1986; Chafe, 1974; Cruttenden, 1986; Halliday, 1967; cf. Chafe, 1976). However, advancements in the specification of models of information structure as well as in the characterization of lexical and sentence level stress and accent have revealed a relationship between accent and discourse status that is considerably more complex. More recent work shows that there is not a strict parallel association between accentuation/deaccentuation of words and their ‘new’ or ‘old’ status in a discourse (Bard & Aylett, 1999; Hirschberg, 1993; Ito, Speer & Beckman, 2003; Nakatani, 1993, 1997; Terken & Hirschberg, 1994). Instead, the correspondence between information structure and speakers’ intonational expression of prominences requires the specification of not just the presence/absence of accent on the words that refer to discourse entities, but the particular accent type assigned (Beckman & Pierrehumbert 1986; Pierrehumbert, 1980; Pierrehumbert & Hirschberg, 1990).
A parallel stream of language comprehension studies has clearly established that the way a referring word is pronounced can influence a listener’s speed and accuracy in recovering the speaker’s intended referent (see Cutler, Dahan, & van Donselaar, 1997 for review). Numerous psycholinguistic studies show higher acceptability ratings and faster comprehension times for listeners when word-level intonation felicitously marks the discourse status of words than when it does not (Bock & Mazzella, 1983; Birch & Clifton, 1995; Needham, 1990; Nooteboom & Kruyt, 1987; Terken & Hirschberg, 1994; Terken & Nooteboom, 1987). In general, felicitously accented words are recognized faster, remembered better, or perceived as more prominent and intelligible than words without accent (Bard, Sotillo, Anderson, Doherty-Sneddon & Newlands, 1995; Krahmer & Swerts, 2001; Sedivy, Tanenhaus, Spivey-Knowlton, Eberhard & Carlson, 1995). Deaccentuation or melodic attenuation of previously mentioned information has also been shown to facilitate utterance comprehension (Needham, 1990; Nooteboom & Kruyt, 1987; Terken & Hirschberg, 1994; Terken & Nooteboom, 1987). Although these studies demonstrate that accent plays an important role in helping listeners determine referents, little has been done to investigate how rapidly pitch accent information is processed and the time course by which it is utilized during speech comprehension. The present experiments investigate listeners’ processing of intonational patterns using eye-tracking methodology, which enables the analysis of continuous on-line responses to various intonational patterns as they unfold over time.
The merits of head-mounted eyetracking methodology with naturalistic interactive tasks for the investigation of real-time spoken language processing are well-established (Allopenna, Magnuson & Tanenhaus, 1998; Chambers, Tanenhaus, Eberhard, Filip & Carlson, 2002; Dahan, Magnuson, & Tanenhaus, 2001; Dahan, Magnuson, Tanenhaus, & Hogan, 2001; Dahan, Tanenhaus, & Chambers, 2002; Spivey, Tanenhaus, Eberhard & Sedivy, 2002; see Trueswell and Tanenhaus, 2005 for additional studies and discussion of the advantages of this method). The primary advantages of this method for the research presented here are threefold. 1) The head-mounted eyetracking system provides correction for head position and rotation, allowing participants to move and talk during listening data collection. 2) Eye movement monitoring is a continuous, non-intrusive, implicit measure of processing difficulty. Because it allows observation throughout the time course of listeners’ perception and interpretation of intonational patterns, we can compare the relative appropriateness of particular intonation patterns during the course of a naturalistic conversational exchange. 3) Established linking hypotheses between spoken word recognition and eye movements (e.g. Tanenhaus & Trueswell, 1995) allow the assumption that listeners’ attention shifts in response to words relevant according to the available spoken context and visual scene, resulting in the non-strategic planning of eye movements to a mentioned object (here, a holiday ornament that must be moved from a grid to a tree). Despite such merits, the experimental paradigm is not free of potential problems. As Trueswell and Tanenhaus (2005) point out, limiting the complexity of the visual world or the interactive task with a simple set of materials or actions may not yield responses that represent normal speech comprehension processing or discourse behavior beyond the experimental environment.
This closed-set problem is tightly related to unwelcome task-specific strategies. We attempted to avoid these problems by designing our experimental task and visual world to be natural yet fairly complex. Ornament grids displayed more than 40 objects, so that listeners were engaged in genuine visual search rather than simple matching of spoken words to a small set of objects whose locations and identities could potentially be simultaneously held in memory throughout a trial. Participants needed to focus not only on the action of selecting the correct ornament from the board, but also on placing each small ornament in the correct location on the tree. (See General Discussion for our further thoughts on visual complexity and responses to prosodic prominence.)
In a study particularly relevant to the research presented here, Dahan et al. (2002) demonstrated the immediate effect of accentual prominence on eye movements reflecting spoken language comprehension. The effect of accent on reference resolution was shown by using two cohort items such as candle and candy, which were presented on a computer monitor with phonetically unrelated items such as necklace and pear, and geometric landmarks such as triangle and square. On each trial, participants heard paired instructions such as “Put the candy above the triangle. Now put the CANDLE above the square.”, and were told to move the objects accordingly. The target in the second sentence was either repeated from the first sentence (e.g., candle-candle) or switched to the cohort competitor (e.g., candy-candle), and it was presented either with or without accentual prominence (e.g., CANDLE vs. candle). Participants’ eye movements to the target object mentioned in the second sentence were monitored. The proportion of fixations to the cohort competitor (e.g., candy) was higher when the repeated target carried prominence (e.g., candle-CANDLE) than when it did not (e.g., candle-candle). In addition, when the target was not repeated from the first to the second instruction (e.g., candy-candle), the effect was reversed: fixation proportions to the competitor (candy) were higher when the target was not prominent than when it was. Dahan et al. interpret these results as indicating the anaphoric interpretation of non-prominent words and nonanaphoric interpretation of prominent words.
This is partially consistent with the claim by Terken and Nooteboom (1987) that a deaccented entity is processed as given, and thus a listener tries to match that entity to an already-activated discourse entity, whereas the interpretation of an accented word is not constrained in such a manner. The results echo previous off-line studies showing that improperly accenting an already-mentioned entity or deaccenting a yet-to-be-mentioned entity increases comprehension time (Bock & Mazzella, 1983; Nooteboom & Kruyt, 1987; Birch & Clifton, 1995). Note, however, that statement-verification tasks show that infelicitous accentuation of given (i.e., already-mentioned) items does not slow down comprehension as much as infelicitous deaccentuation of new (never-mentioned) items (Birch & Clifton, 1995; Ito, 2002). This may mislead us to conclude that the presence of an accent does not have an immediate impact on the interpretation of incoming speech. The eye fixation patterns shown by Dahan et al. demonstrate that this is not the case. Instead, accent leads to rapid identification of an item not previously mentioned within the interactive environment.
Interestingly, Dahan et al.’s stimuli contained a variety of pitch accent and phrasal tone combinations. According to their ToBI annotation, 14 out of 24 targets in the accented condition had L+H* while the remaining 10 cases had H*. In the deaccented condition, all targets carried H+!H*, but a preceding discourse marker (e.g., on ‘Now,’ Experiment 1) had either L+H*, L*+H, or H*. In addition, the theme (e.g. candy) in the first instruction had L+H* in 26, H* in 20, and L* in 2 of 48 items. The combined phrasal-boundary tones varied between L-L% and L-H%. This variation of accent pattern in the context instruction may have affected the salience and informational status of target or competitor items even before subjects heard the second instruction. More specifically, it is possible that contrastive accents in the context instruction may have contributed to the finding of a very early advantage for items with felicitous accent patterns (shown within 300ms of target onset). The absence of phonetic control over auditory stimuli is problematic for a comparison across studies (see General Discussion for a close comparison between the present study and Sedivy, Tanenhaus, Chambers and Carlson (1999), which failed to demonstrate the effect of focus intonation.) In the present study, we controlled sentence-level tonal patterns for all trials including fillers. Stimuli were re-recorded until two ToBI annotators agreed on their respective tonal patterns. In order not to make the target instructions sound noticeably contrastive or monotonous, intonation variation was introduced by manipulating the patterns in filler trials (see Materials sections below). Furthermore, we specifically tested the effect of accentual patterns on the discourse marker in the last experiment.
A more important goal of the current study is to explore the additional role of contrastive accent L+H*, in hope of advancing the investigation of a general mechanism of accentual information processing during speech comprehension. As demonstrated by Dahan et al., prominent accent on the noun naming the direct object of the task action ‘put’ promptly assigns the new/not-previously-mentioned or given/already-mentioned status to the target entity. Obviously, the tonal shape of a word plays a role beyond simply signaling its informational status with respect to discourse background. The present study entertains the possibility that contrastive accentual information feeds the discourse foreground, projecting pragmatic links between the accented discourse entity and upcoming entities. Compare, for example, what you would expect as continuations of utterance fragments (1) and (2).
(1) KATIE did not win a truck,
    L+H*                H+!H* L-H%

(2) Katie did not win a TRUCK,
    H*                  L+H*  L-H%

In (1), it is very plausible that the sentence continues with the name of somebody else who won a truck (e.g., “…, LAURA did.”), whereas (2) may continue to mention something other than a truck that Katie actually won (e.g., “…, she won a MOTORCYCLE.”). Such predictions become available due to the contrastive function of L+H*, which evokes a set of alternatives appropriate for the immediately ensuing discourse (Pierrehumbert & Hirschberg, 1990; for the effect of contrastive accent in negation, see Davidson, 2000). In other words, a contrastive L+H* constrains the set of coherent upcoming discourse entities as well as specifying the informational status of the accented word itself. Consistent with the immediate effect of accent demonstrated by Dahan et al., we predict that anticipatory effects of L+H* will be evident in eye movements when a member of the evoked set is present in the visual field. Furthermore, we pursue the view that accentual cues are evaluated immediately, allowing assignment of referential status to the named discourse entity as its speech segments unfold in time. That is, a particular pragmatic relation is established between the accented word and other background/foreground discourse entities during the recognition of the content of the speech signal. Although the immediacy of accentual processing itself may not be contentious, the question remains as to whether the processing of accentual information is complementary to incremental speech comprehension, or if instead it is a robust independent computation that takes place in parallel with phoneme-based word recognition. Dahan et al. suggest a strong influence of parallel prosodic processing on lexical access, but their evidence is rather indirect due to the use of cohorts. In the present study, we manipulated the accentual prominence of a prenominal adjective (Experiments 1 and 2) to test how contrastive accent on the modifier affects access to the upcoming noun.
If a prominent accent on a prenominal adjective evokes a contrastive relation between the immediately preceding referent and the about-to-be-mentioned referent, it should constrain the set of nouns that the adjective modifies. Hereafter, we refer to this constraint on the upcoming referent as an “anticipatory effect” of contrastive accent. We hypothesize that this effect of prosodic prominence is independent of the segmental processing necessary for lexical identification, and thus predict that speech comprehension will be disrupted when the segmental information of the noun does not substantiate the anticipated referent (The effect of such mismatch between the anticipated referent and the incoming noun is directly tested in Experiment 2). The scope of the anticipatory effect in discourse foregrounding is tested by manipulating accentual patterns of discourse markers that connect utterances in Experiment 3.
We now turn to a series of three experiments, where participants decorated holiday trees following instructions that mention target ornaments as combinations of color adjectives and object nouns (e.g., blue ball). Experiment 1 tests whether felicitous use of L+H* (e.g., “First, hang the green ball.” → “Now, hang the BLUE ball.”) facilitates fixations to a target object as compared to infelicitous use (e.g., “First, hang the green ball.” → “Now, hang the blue BALL.”). Experiment 2 confirms the facilitatory effect of felicitous L+H* by comparing it against a condition that does not involve L+H* (e.g., “First, hang the green ball.” → “Now, hang the blue ball.”). This experiment also examines whether the set-evoking function of L+H* can mislead listeners to fixate on the previously mentioned object type in the presence of the name of a different object (e.g., Will balls be fixated during the second instruction for the sequence “First, hang the green ball.” → “Now, hang the BLUE angel.”?). Finally, Experiment 3 investigates whether the presence of L+H* on temporal-adverbial discourse markers such as And then, After that, and And next has additional effects on the processing of the upcoming target.
The first experiment was designed to test whether felicitous use of a contrastive pitch accent (here, L+H*) is advantageous for listeners as compared to infelicitous use during the holiday tree decoration task. We hypothesized that L+H* on a color adjective evokes a set of alternative color possibilities that could modify the ornament object, while restricting the referent to be named by the upcoming noun to the same type of object that was mentioned in the immediately preceding utterance. According to this view, L+H* accent on the color adjective should not only assign contrastive status to the color itself, but also should increase listeners’ expectations that the most recently mentioned target noun will be repeated in the current utterance. We predicted a rapid anticipatory effect of the adjective’s intonational prominence on the listener’s selection of a candidate noun, which might be detectable even before the noun’s segmental information was fully processed for word recognition, consistent with previous demonstrations of the very early use of prosodic information in visual world tasks (Dahan et al., 2002; Snedeker & Trueswell, 2003).
In all experiments in this study, ornaments were sorted by type into the cells of a display grid (e.g., one cell contained all balls, another all angels, etc.). We predicted that fixations to the target ornament cell would be speeded when L+H* on the adjective felicitously marked contrast between two consecutive referents in sequences such as “Hang the green drum. Now, hang the BLUE drum.” In contrast, fixations would not be speeded if L+H* expressed contrast in an infelicitous position, e.g., on the object noun instead of the color adjective, as in “Hang the green drum. Now, hang the blue DRUM”. In such cases, intonation-based anticipation would not be established at the adjective, and despite the prominence on the object noun itself, fixations to the target cell should not be as early as those for the felicitous sequences. Experiment 1 also examined whether felicitous use of contrastive L+H* on the object noun provided a processing advantage to listeners during visual search. We compared sequences such as “Hang the green drum. Now, hang the green BALL” to infelicitous use in sequences such as “Hang the green drum. Now, hang the GREEN ball”. Notice that noun contrast trials differed from adjective contrast trials in that it was possible for the lack of contrastive accent in combination with the repetition of the adjective in the felicitous sequence (green drum → green BALL) to serve as a cue to upcoming noun contrast even before the occurrence of the contrastive L+H* accent. However, because the visual search was among cells organized by object type, such a cue could only serve to foreshadow contrast and could not allow projection of a particular noun candidate (because there were always multiple green ornaments in the display). Thus we predict no difference between felicitous and infelicitous noun contrast trials until after the phonemic information of the noun becomes available.
Thirty-six native speakers of Midwestern American English were recruited at the Ohio State University. They received partial credit toward a course requirement for their participation.
Design and Materials
Visual search task
Each participant had four trees to decorate, each from its own grid of ornaments. Grids held 40 to 52 ornaments, and finished decorated trees held a total of 24 (16 target and 8 filler) ornaments. No grid held more than one ornament of a particular type and color combination. Grid layouts were designed to prevent listeners from being able to predict the mention of an upcoming color or ornament type, either by process of elimination as the trials transpired, or by a pattern of repeatedly choosing items of the same name across trees. All colors and all ornament types were mentioned at least once, and no more than three times during the decoration of a single tree.
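The layout constraints just described can be checked mechanically. The following is a minimal sketch in Python, not the procedure used to build the actual materials: the function name `check_tree_design`, the representation of ornaments as (color, type) pairs, and the explicit `colors` and `types` arguments listing what must be mentioned are all assumptions made for illustration.

```python
from collections import Counter


def check_tree_design(grid, mentions, colors, types):
    """Return a list of constraint violations for one tree (empty if none).

    grid     -- (color, ornament_type) pairs displayed on the grid
    mentions -- (color, ornament_type) pairs in the order they are named
    colors   -- colors expected to be mentioned (hypothetical argument)
    types    -- ornament types expected to be mentioned (hypothetical argument)
    """
    problems = []

    # Grids held 40 to 52 ornaments.
    if not 40 <= len(grid) <= 52:
        problems.append(f"grid holds {len(grid)} ornaments, expected 40 to 52")

    # No grid held more than one ornament of a given color/type combination.
    grid_set = set(grid)
    if len(grid_set) != len(grid):
        problems.append("grid repeats a color/type combination")

    # A finished tree held 24 ornaments (16 target and 8 filler).
    if len(mentions) != 24:
        problems.append(f"{len(mentions)} ornaments mentioned, expected 24")

    # Every mentioned ornament has to be available on the grid.
    for ornament in mentions:
        if ornament not in grid_set:
            problems.append(f"{ornament} is mentioned but not on the grid")

    # Each color and each type is mentioned at least once and at most three times.
    color_counts = Counter(color for color, _ in mentions)
    type_counts = Counter(kind for _, kind in mentions)
    for label, names, counts in (("color", colors, color_counts),
                                 ("type", types, type_counts)):
        for name in names:
            if not 1 <= counts[name] <= 3:
                problems.append(f"{label} {name!r} mentioned {counts[name]} times")

    return problems
```

A candidate layout passes when the returned list is empty. The sketch does not test the harder requirement that upcoming mentions be unpredictable by process of elimination, which depends on the order of trials as well as on the counts.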
Real-world objects
Eight target color adjectives (blue, red, green, orange, gold, silver, brown, grey) were combined with eight target object nouns (ball, drum, angel, bell, stocking, onion, candy, egg) to construct noun phrases that described the ornaments. Three additional colors (purple, white, yellow) were used to paint four filler ornament types (snowman, lightbulb, tree, star) as well as ‘dummy’ ornaments that were displayed on the grid but remained unmentioned in instructions. Color shades and ornament shapes or sizes of target objects were varied across trees, so that each color adjective and each object noun could refer to a different and novel discourse entity in each tree. Filler ornament types were varied across trees.
Adjective-Noun sequences
Within each tree, construction of contrastive trials required four sequences that repeated the object noun, serving as a context where the color adjective should convey contrastive information (e.g., green onion → orange onion). There were also four sequences that repeated the color adjective, serving as a context where the object noun should convey contrastive information (e.g., brown ball → brown angel). Across the four trees, each adjective and each noun appeared in these contrastive sequences one to three times, preventing the association of particular colors or objects with specific pragmatic contexts. Infelicitous contrastive sequences were limited to two noun and two adjective contrast trials per tree, to minimize the possibility that listeners might adopt some task-specific strategy or become insensitive to pitch accent information due to its infelicitous use. The design also included four non-contrastive trial types that were used to examine the effect of noun or adjective repetition. These trials are not discussed here for reasons of space (see Ito & Speer, 2006, for discussion). Appendix A includes the full ornament sequences for the four trees used in Experiment 1.
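The two contrastive contexts can be stated as a simple rule over consecutive ornament names. The sketch below is illustrative only; the names `contrast_type` and `label_sequence` and the (adjective, noun) tuple representation are assumptions, not part of the original materials.

```python
def contrast_type(previous, current):
    """Classify the discourse context of `current` relative to `previous`.

    Each argument is an (adjective, noun) pair such as ("green", "onion").
    Returns "adjective" when the noun repeats and the color changes,
    "noun" when the color repeats and the noun changes, and None otherwise.
    """
    prev_adj, prev_noun = previous
    adj, noun = current
    if noun == prev_noun and adj != prev_adj:
        return "adjective"  # e.g., green onion -> orange onion
    if adj == prev_adj and noun != prev_noun:
        return "noun"       # e.g., brown ball -> brown angel
    return None             # no single-word contrast with the previous trial


def label_sequence(mentions):
    """Label every trial after the first with its contrast type."""
    return [contrast_type(a, b) for a, b in zip(mentions, mentions[1:])]
```

Applied to the full ornament sequence of a tree, `label_sequence` would make it easy to count how many adjective contrast and noun contrast contexts the sequence contains.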
The instructed visual search task design used here differed from more traditional psycholinguistic experimental designs in several important respects. First, the visual array of objects remained substantially the same across the course of the experiment, changing only 4 times instead of on each new sentence trial (as in Snedeker & Trueswell, 2003), or each paired sentence trial (as in Dahan et al., 2002). This allowed us to measure the use of intonational information by participants engaged in a complex naturalistic task-based search, arguably a more natural setting for language use. Second, more traditional designs could be said not to involve search at all, but instead involve mapping names recovered from the speech signal to a circumscribed display of otherwise unrelated or loosely related objects (such as frogs and flowers, or candles and candies). Participants in these tasks have little need to search, readily fixating items based on similarity to the spoken input, and rarely looking at unrelated items. Here, the task and the larger number of items require search, and as such the test of whether spoken language processing can be immediately influenced by simultaneously available visual input is more rigorous. Third, because in our task the real world referents of words were not renewed from trial to trial, and groups of real world objects had the same name, the lexical items used named object categories in the display (i.e. colors and ornaments) as well as single objects. This manipulation was necessary in order for us to examine the contrastive function of L+H* accents in the context of a set. Fourth and finally, the design is not appropriate for the traditional psycholinguistic statistical analysis that considers language items as a random effect, because the necessity for repetition without predictability in the creation of the conditions made it impractical to use the same words in all conditions for reasons of time.
Item analyses assume random sampling of items and are used to reduce the probability of Type I error due to differences between item sets in conditions. They are often used to eliminate the contribution of particular lexical items to sentence processing effects, where factors such as lexical frequency or coherence across a sequence of words may artificially inflate or reduce processing time for a particular item. Here, the task structure introduced items in critical conditions into the discourse structure by visual presence and by mention of their names before they were mentioned as part of a target noun phrase, minimizing the contribution of initial lexical access or coherence to the processing of individual items. Nevertheless, we matched the sets of spoken items for number of syllables and absolute duration within conditions to be critically compared, as these variables have been shown in some studies to have a positive correlation with reaction time, and thus might have influenced time to fixate the target. All words had first-syllable stress. (Mean number of syllables and duration in critically compared conditions: Felicitous: adjective 1.3, 337ms; noun 1.5, 449ms; Infelicitous: adjective 1.3, 332ms; noun 1.6, 469ms; Adjective contrast: adjective 1.3, 327ms; noun 1.6, 457ms; Noun contrast: adjective 1.3, 343ms; noun 1.6, 461ms). When items are matched rather than random, and/or when they are selected from a highly constrained set (such as tree ornament names), use of F1 alone is the correct procedure (see Raaijmakers, Schrijnemakers & Gremmen, 1999 for discussion of the conditions under which item analyses are not appropriate). To allay any concerns about the generality of our effects, we rotated adjectives and nouns into different conditions across Experiments 1–3, thus replicating effects across different sets of word items in conditions.
Spoken materials
The intonation patterns for the audio instructions were determined based on speech production data from a previous experiment using a comparable holiday tree decoration task (Ito, Speer and Beckman, 2003; Ito & Speer, 2006). In the previous study, naïve speakers gave instructions on how to decorate holiday trees to our confederate, who decorated them. Speakers were not instructed as to what to say, but saw a CRT display with two color photographs on each trial. One showed the next ornament to hang, and the other showed a tree marked at the appropriate location with a text tag naming the ornament (e.g. ‘green onion’). This unscripted interactive task allowed us to record spontaneous productions of the intonation patterns that participants typically produced for the adjective-noun ornament names. Results showed that both adjectives and nouns were very likely to be accented when they were mentioned for the first time (over 80% of the time). However, while adjectives were accented with similar frequency regardless of whether they had been already mentioned, nouns were less likely to be accented on subsequent mentions (probability of accentuation dropped to .5 when an already-mentioned noun followed the first mention of an adjective). When adjectives appeared in a contrastive context, they were frequently pronounced with L+H* (46% when they were mentioned for the first time, and 50% for subsequent mentions). L+H* on nouns in contrastive contexts was less frequent (19% first mention, 18% already mentioned).
Auditory stimuli for Experiment 1 were constructed on the basis of these patterns from naïve speakers. Items contained a pitch accent on both the first and consecutive mentions of both nouns and adjectives. Adjective noun pairs that did not occur in contrastive context were assigned the sequence most typical for such pairs in our production study, H* !H*, where the accent on the noun was ‘downstepped’ from the preceding adjective’s accent. The only exception was when the first mention of a noun followed an already-mentioned adjective, in which case the sequence assigned was H* H* (again, with reference to such sequences in the production study). In contrastive contexts, where an adjective or noun was repeated on the immediately following trial, L+H* was assigned to the contrast word. When a noun was repeated, L+H* on the adjective was followed by no accent on the noun. When an adjective was repeated, it was assigned a H*, and L+H* occurred on the following noun. Table 1 shows the four critical conditions and their associated pitch accent patterns for the instructions used in Experiment 1. (Four additional conditions appeared in the design of the experiment. These conditions tested the effect of initial vs. repeated mention of adjectives and nouns, but did not involve contrast. They are omitted from further discussion here for reasons of space.) The crucial comparisons among the four conditions are between the Felicitous and Infelicitous Adjective Contrast trials (Felicitous Adj Contrast vs. Infelicitous Adj Contrast) and between the Felicitous and Infelicitous Noun Contrast trials (Felicitous N Contrast vs. Infelicitous N Contrast). Felicitous and Infelicitous Adj Contrast conditions appeared in “contrast-on-color” discourse context, where the words describing the sequence of objects specified a new color for a subsequent object of the same type (e.g. green drum → blue drum).
Felicitous and Infelicitous N Contrast conditions appeared in “contrast-on-object” discourse context, where the words describing the sequence of objects specified the same color for a subsequent object of a different type (e.g. blue onion → blue drum). Each tree had two trials in each of the four critical contrast conditions.
Table 1
Experiment 1: Information status and accent pattern for the critical four conditions.
Instructions were recorded at 22.05 kHz, 16 bit resolution (using SoundEdit 16, Version II, Macromedia) by a trained female phonetician who could maintain a consistent overall pitch range and intonation pattern across items within each condition. The F0 values were calculated with a 10 ms window using an autocorrelation algorithm (Boersma, 1993) in Praat (Version 4.2.17: Boersma & Weenink, 1992–2004). To validate the accentual patterns present in the stimuli, two independent ToBI transcribers annotated the target instructions in a randomized order. Instructions were re-recorded until both transcribers gave independent annotations indicating that the target adjective and the noun carried the intended intonational patterns. Figure 1 shows example ToBI transcriptions of the utterances used for the Felicitous and Infelicitous Adj Contrast conditions.
Figure 1
Example ToBI transcriptions of target noun phrases: (left) Hang the ORANGE onion in the Felicitous Adj Contrast condition; (right) Next, hang the brown ANGEL in the Infelicitous Adj Contrast condition.
Our ToBI annotators used the pitch accent categories H* and L+H* as specified in the guide to ToBI labeling (Beckman & Ayers, 1997). However, we acknowledge that some scholars consider the phonological distinction between H* and L+H* to be continuous rather than categorical. Contradictory results have been found for these accents in comparisons of perceptual rating, classic auditory identification and discrimination tasks (e.g. Bartels & Kingston, 1994; Ladd & Morton, 1997). In Ladd and Morton (1997), listeners rated the degree of emphasis for a set of sentence stimuli (e.g. He’s Iranian) created by modifying the F0 peak range with step sizes of 6 Hz, 8 Hz, 10 Hz, and 16 Hz. Emphasis judgment ratings showed a gradual increase with increasing F0 augmentation, suggesting continuous perception. However, forced-choice identification between “everyday occurrence” and “unusual experience” interpretations of stimuli from the same continua showed the classic S-shape response curve, with a sharp increase for the “unusual experience” response in the mid F0 range, suggesting a categorical boundary between normal and emphatic accents. Results of a same/different discrimination task using paired stimuli from the same series failed to show a peak for the correct “different” responses in the F0 mid range, which would have indicated a boundary between the two categories. Ladd and Morton argued that these results suggest that pitch range changes are categorically interpreted, though they may not be categorically perceived (Ladd & Morton, 1997, p. 339). Although further research is needed to determine whether there is a ‘true’ categorical boundary between H* and L+H* (like the one between the phonemes /p/ and /b/), these results do clearly indicate that pitch accent height has a substantial impact on the interpretation of spoken emphasis, with listeners interpreting accents with relatively higher F0 as more emphatic.
In the experiments presented here, we distinguished H* from L+H* on the basis of maximum F0 in the accented syllable and auditory discrimination by ToBI annotators, choosing unambiguous tokens for use as stimuli in the two categories.
The ToBI-annotated materials for Experiment 1 were submitted to phonetic analysis. Duration values were obtained for the verb, the article the, the adjective, and the noun. For accented items, peak F0 values were measured for H tones. Table 2 shows the mean duration and F0 values of the target stimuli. Note that words with L+H* were clearly differentiated from words with H* by higher F0 peaks and longer durations.
Table 2
Experiment 1: Mean duration and F0 values of the target stimuli
Participants were seated in front of an ornament grid supported by a drafting table with a surface incline of 35 degrees. A small holiday tree was located to the left side of the table on a rotating stand. Location and timing of eye movements were monitored with an ASL E5000 head mounted eyetracker with eye-head integration. Eye position was calibrated initially and whenever needed to prevent track loss throughout the experiment. Participants wore light plastic headgear supporting a 60Hz eye-camera and a small magnetic receiver, which signaled head position to the system. A 60Hz stationary scene camera was mounted on the ceiling behind the participant, providing a view of the ornament grid. The experimenter sat behind the drafting table, operating a laptop computer running E-prime, Version 1.0 (Psychology Software Tools, Inc.). Individual trial onsets were initiated by the experimenter via a button box connected to the laptop. The onset and offset of eye position data collection was synchronized with the onset and offset of the critical auditory phrases in each instruction by an E-prime command sequence that played the sound and initiated data recording via a serial connection to the ASL Control Unit.
Participants were told that for each trial, they would listen to an instruction, pick the specified ornament from the grid, hang it on the tree at the instructed location, and then face back to the board and say “O.K.” so that the experimenter could play the next instruction. Instructions were played through a set of speakers placed on the experimenter’s desk, facing the participant. The layout of the experimental scene is depicted in Figure 2.
Figure 2
Experimental setup for holiday tree experiments 1–3.
Each of the four experimental grids had 11 cells, each containing 3–5 ornaments. For each tree, the 16 critical target ornaments (2 in each condition) were placed in the 8 cells surrounding the center cell. Filler ornament sets occupied the center cell and two cells located on the far left and right sides of the grid. Throughout the trials, at least one ornament remained in each grid cell, so that the predictability of color and ornament terms did not change over the course of tree decoration. Figure 3 shows a photograph of a complete grid at the beginning of a tree sequence.
Figure 3
An example image of the ornament board.
Results and Discussion
Three participants each requested that one instruction be replayed because they did not remember the direction of decoration (e.g. “Moving to the right,”). Since they all fixated the correct cell before reaching for the target ornament during the first presentation of the instruction, eye movement data during the replay were discarded. Two more trials were discarded from two other participants because the target ornament was missing from the grid due to experimenter error. Across the critical conditions, there was only one missing trial (Infelicitous Adj Contrast Condition: 1/288 observations missing).
Saccades and fixations to target cells were automatically generated using ASL E5000 software. Figure 4 shows the proportions of fixations to the target cells averaged across participants for the four critical conditions, coding fixation onset times from the onset of the saccade (Altmann & Kamide, 2004). The eye movements were recorded at 60Hz, and thus each sampling point on this figure indicates, at intervals of approximately 17 ms, how likely a fixation was to fall within the target cell. Proportions give the number of fixations to the target cell divided by the number of possible fixations (36 participants × 8 trials per condition = 288). The fixation proportions are shown aligned from the onset of the object noun, displaying the fixations during the prenominal adjectives with negative time values. This alignment strategy was adopted for two reasons. First, the object noun was the critical word that singled out the referent to be mapped onto the real world object. Thus, aligning fixation proportions from the onset of the noun allows direct comparison of eye movement timing between our data and those of previous studies such as Allopenna et al. (1998) and Dahan et al. (2002). Second, backward alignment of the data allows the examination of the anticipatory effect of accent toward the end of the adjective without risking the loss of data. (Note that double-alignment of data at the onset of the adjective and again at the onset of the noun using the average duration of words (as in Snedeker & Trueswell, 2003) would ignore data available from the final portions of words that had longer-than-average durations.) In Figure 4, the mean duration of adjectives and nouns is indicated by a vertical line for each of the two accentual patterns. Horizontal lines indicate the range of word duration for each accentual pattern, with solid lines indicating the shortest item duration, and dotted lines extending to the longest item duration.
The floating vertical lines indicate the 99% confidence intervals for the time regions where the mean difference reached statistical significance for each critical comparison. Values of the confidence intervals were back-transformed from arcsine-transformed proportions that served as the dependent measures in the statistical analyses.
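The per-sample computation described above can be illustrated with a minimal Python sketch. It is not the ASL E5000 procedure itself: the trial representation (a dictionary with the hypothetical keys noun_onset_ms and on_target, holding one boolean per 60 Hz sample) is an assumption made purely for illustration. As in the text, the denominator is the fixed number of possible fixations, so a missing sample counts as a non-fixation.

```python
SAMPLE_MS = 1000 / 60  # one eye-camera sample lasts roughly 17 ms at 60 Hz


def fixation_proportions(trials, n_possible, t_start_ms=-600, t_end_ms=900):
    """Return (time_ms, proportion) pairs aligned to noun onset.

    trials: list of dicts with the hypothetical keys
        'noun_onset_ms' - noun onset relative to the start of recording
        'on_target'     - list of booleans, one per 60 Hz sample
    n_possible: fixed denominator (e.g. participants x trials per condition)
    """
    first = round(t_start_ms / SAMPLE_MS)
    last = round(t_end_ms / SAMPLE_MS)
    curve = []
    for k in range(first, last + 1):
        count = 0
        for trial in trials:
            # Index of the sample that lies k samples after noun onset;
            # negative k looks back into the adjective.
            idx = round(trial["noun_onset_ms"] / SAMPLE_MS) + k
            if 0 <= idx < len(trial["on_target"]) and trial["on_target"][idx]:
                count += 1
        curve.append((k * SAMPLE_MS, count / n_possible))
    return curve
```

Negative time values in the returned curve correspond to samples recorded during the prenominal adjective, which is what backward alignment from noun onset preserves.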
Figure 4
Experiment 1: Proportion of fixations to the target object cell, aligned from noun onset in the four critical conditions.
It is important to note that an alternative method of calculating fixation proportions has been proposed to normalize the durational differences across spoken items (Altmann and Kamide, 2004). According to this method, the fixation proportion signifies how often participants fixated the target during each word. This strategy has been adopted by other researchers to analyze the time course of referential resolution in the presence of syntactic ambiguity (Engelhardt et al., 2006). Although this method is appropriate for examining at which word the effect of sentence context or syntactic structure appears during sentence processing, it does not allow detection of the point within the critical word where the effect of a prosodic cue on word recognition starts to appear. Since we are particularly interested in identifying the point where the effect of contrastive accent appears within the noun phrase, we calculated fixation proportions for each time point from the onset of the noun.
In order to examine the time course of prosodic effects on visual search, mean arcsine transformed fixation proportions for each 300 ms window before and after the onset of the noun were submitted to five (2 backward and 3 forward time windows) 2 (discourse contrast) × 2 (L+H* accent location) repeated measures ANOVAs. This size of time window was chosen to examine (1) whether the effect of accentual manipulation for the adjective appears during the adjective itself, and (2) whether the accentual manipulation for the adjective affects the fixation patterns before the segmental information of the noun is used for launching the eyes. Here, we assume that saccade planning in a visual search task with a field as complex as ours would require 200–300 ms after initial segmental processing (Viviani, 1990). Table 3 summarizes the results of critical pairwise comparisons. The relevant confidence intervals were calculated with the means by participants. Item analyses are not provided (see discussion in Experiment 1 method section above).
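The windowing and transformation steps can be sketched as follows. The exact form of the arcsine transform is not stated in the text; the sketch assumes the common arcsin(√p) variant, and it assumes that the transform is applied to each window's mean proportion. The function names are hypothetical. The five windows follow the description above (two 300 ms windows before noun onset and three after).

```python
import math

# Two backward and three forward 300 ms windows, relative to noun onset.
WINDOWS = [(-600, -300), (-300, 0), (0, 300), (300, 600), (600, 900)]


def arcsine_transform(p):
    """Assumed variant: arcsin of the square root of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


def back_transform(x):
    """Inverse of arcsine_transform, returning a proportion."""
    return math.sin(x) ** 2


def windowed_scores(curve, windows=WINDOWS):
    """Map each window to the transformed mean proportion of its samples.

    curve: list of (time_ms, proportion) pairs aligned to noun onset.
    Windows that contain no samples map to None.
    """
    scores = {}
    for start, end in windows:
        values = [p for t, p in curve if start <= t < end]
        if values:
            scores[(start, end)] = arcsine_transform(sum(values) / len(values))
        else:
            scores[(start, end)] = None
    return scores
```

Under these assumptions, one such score per participant, condition, and window would serve as the dependent measure for the ANOVA in that window, and back_transform would return interval bounds to the proportion scale for plotting.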
Table 3
Experiment 1: Results comparing fixation proportions (n = 36), Subject analysis
Figure 4 shows the effect of felicitous L+H* on the prenominal adjective for Felicitous and Infelicitous Adjective Contrast conditions where the ornament type was immediately repeated and thus the color adjective conveyed contrastive information (e.g. green drum → blue drum). The data show more early fixations to the target cells when L+H* felicitously marked the adjective (BLUE drum), expressing its contrastive status, than when L+H* was infelicitously placed on the object noun (blue DRUM). Note that the fixation proportion rises from the beginning of the noun in both conditions. However, the initial rise is clearly steeper in the Felicitous Adjective Contrast condition [L+H* no-accent] than in the Infelicitous Adjective Contrast condition [H* L+H*]. The two lines diverge at around 100 ms into the noun, with no overlap until after 650 ms. We assume that planning and execution of a saccade requires at least 150 to 200 ms for simple visual search (Fischer, 1992; Matin, Shao & Boff, 1993; Saslow, 1967), and that the visual search involved in our task is likely to be more complex. Thus the earlier, more frequent fixations to the target cell in the Felicitous Adjective Contrast condition could not have been driven solely by the phoneme-based recognition of the object noun. Instead, we claim that fixation to the target cell was facilitated by the preceding adjective’s contrastive accent, which restricted the candidates for the upcoming referent of the noun to the set of ornaments that were the same type as those in the previous trial.
One concern for this interpretation of our results, however, is that the adjectives with felicitous L+H* were on average 40 ms longer than those with infelicitous H*s. Thus the accentual prominence may have simply provided a durational advantage, with longer adjectives offering additional opportunities for planning saccades to the target. In order to eliminate this possibility, we examined the correlation between adjective duration and the first fixation latencies from the onset of the noun in each condition, finding no significant relationship (r=−.165; .004; .149; .154 for Felicitous Adjective Contrast, Infelicitous Adjective Contrast, Felicitous Noun Contrast, and Infelicitous Noun Contrast, respectively). Therefore, it is unlikely that participants took advantage of longer adjective durations to plan saccades to the target cells.
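The duration check above rests on a correlation coefficient computed separately for each condition. A minimal sketch of such a coefficient is given below; it assumes Pearson's r (the text reports r but does not name the statistic) over paired observations of adjective duration and first fixation latency, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

If longer adjectives had simply given participants more time to plan a saccade, latencies measured from noun onset would be expected to shorten as adjective duration increases, yielding a reliably negative r within each condition.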
Figure 4 also compares Felicitous and Infelicitous Noun Contrast conditions, where the color adjective was immediately repeated (in contrast-on-object discourse context). Unlike the above comparison in contrast-on-adjective context, there is no clear early advantage for the felicitous condition, where the adjective carried a H* accent and L+H* properly marked the contrastive status on the noun. In fact, fixation proportion data showed numerically more fixations on the target in the infelicitous [L+H* no-acc] rather than the felicitous [H* L+H*] condition up to about 400 ms (this early difference did not reach significance, mean difference .034 in the 0–300ms window, 95%CI=.075)¹. Instead, the effect of felicitous L+H* in the contrast-on-object context appeared much later, in the 600–900 ms window, where fixation proportions were significantly higher for felicitous than infelicitous accents. This indicates that felicitous use of L+H* on the noun evoked contrast, and eventually led to more frequent fixations to the target. It is important to note that the shapes of the rising functions are remarkably different between Adjective Contrast conditions and Noun Contrast conditions in Figure 4. As we mentioned above, the two functions in Adjective Contrast context start rising sharply immediately after the onset of the noun, whereas both functions in Noun Contrast context show a relatively shallow initial slope. The timing of the rise in fixations in the Noun Contrast context is comparable to the results in Allopenna et al. (1998) and Dahan et al. (2002), where participants started fixating objects 200–300 ms after the onset of the nouns. (The mean difference scores during the object noun for Adjective Contrast vs. Noun Contrast trials were .174, 99%CI=.100 for the 0–300ms window, .371, 99%CI=.116 for the 300–600ms window, and .382, 99%CI=.142 for the 600–900ms window.)
This difference indicates that, regardless of the pitch accent on the adjective, participants began fixating the target cells much earlier when they heard a repeated noun following an adjective that was different from the immediately preceding ornament than when they heard the same color adjective repeated before a non-repeated noun. This effect is not surprising in the context of a visual search among cells organized by ornament type, because in repeated noun conditions, the location of the target ornament type is easily remembered from the immediately preceding trial. In contrast, repeated color adjective instructions were less informative, as they eliminated the most recently visited ornament cell, but left open ten other possible cells.
The results of Experiment 1 demonstrated that felicitous use of L+H* led to early fixations to the target cells, but only when it marked contrastive status on the color adjective, and not when it was used to express the contrast on the noun. The advantage for targets with felicitous L+H* on the prenominal adjective was observed very early, during the pronunciation of the noun. Fixation proportions in both of the Adjective Contrast conditions increased sharply from noun onset, suggesting that participants were actively searching for the target ornament before they had recognized the object noun – that is, on the basis of information from the adjective. Note that in Dahan et al.’s Experiment 1, fixation proportions began to increase approximately 300 ms after the onset of the target word, even when it was repeated from the previous command (e.g., candle → candle). Based on the assumption about the time requirement for complex visual search tasks (Viviani, 1990), Dahan et al. argue that fixation data begin to reflect the processing of a word about 200–300 ms after its onset. In the present results, however, fixations to the target cells increased from the beginning of the noun even in the Infelicitous Adjective Contrast condition with [H* L+H*] pattern, indicating that listeners returned their gaze to the most recently mentioned ornament cell upon hearing the repeated noun even in the absence of contrastive accent on the preceding color adjective. We interpret the immediate rise in fixations to the target in Adjective Contrast context as the effect of repetition, and tentatively argue that listeners did not need 300 ms to process the repeated word and program their saccades accordingly. Thus, the relatively early rise in fixation proportions in the Felicitous Adjective Contrast condition demonstrates a combined advantage of repetition and the preceding contrastive accent. We attribute the lack of such a repetition effect in Dahan et al. (2002) to the overt presence of an unmentioned cohort competitor that drew participants’ attention even before they heard the target instruction.
Unlike in the Adjective Contrast conditions, the fixation proportions in the two Noun Contrast conditions did not increase immediately after the onset of the target noun. Instead, fixations to the target cells started to rise at around 300 ms in both felicitous and infelicitous trials when the color was immediately repeated. Based on the assumptions above about the timing of saccade planning and execution for non-repeated objects, we interpret these late increases in fixation as the result of processing the segmental and intonational information in the object nouns. The lack of anticipatory fixations in the Noun Contrast conditions indicates that participants did not plan saccades to the target until they heard the object nouns. We suspect that this difference in the ease of anticipation comes from the way the search task environment was structured. The ornaments were sorted by object types on the grid, and there was no duplication of ornaments (i.e., the same object in the same color) in any cell. Thus repeated noun information provided clear information about the location of the target, while repeated color information did not. In addition, we found no evidence that the combination of a repeated color term and lack of a contrastive accent presaged an upcoming contrast on the object noun. In sum, the intonational cues of the prenominal adjective did not promptly facilitate eye movements when participants needed to wait for the object noun in order to plan a saccade.
We also note that slightly earlier fixations would be predicted for the condition with L+H* on the noun, if felicitous L+H* unconditionally facilitated lexical access. The present data instead suggest that accentual prominence itself does not trigger an earlier onset of word recognition. The timing of the rise in fixations indicates that felicitous contrastive accent facilitated word recognition once enough segmental information was available. Therefore, participants seem to have selectively attended to and made use of intonational cues that were informative for performing the search task in the given environment.
The results of Experiment 1 demonstrated the robust effect of intonational cues that could facilitate anticipatory eye movements. However, a problem remains with the interpretation of data since the felicitous uses of L+H* were always compared against the infelicitous use of L+H*. Because contrastive prominence on an immediately repeated word sounds odd in the absence of unusual discourse context, it is possible that the differences shown in Experiment 1 are due to a delay in the processing of informational status of the word marked with infelicitous L+H*. We designed Experiment 2 to eliminate this possibility and confirm the facilitative effect of felicitous L+H* during the visual search. In addition, we reasoned that if the presence of an L+H* accent on the adjective was the source of an expectation of contrast, and thus could lead listeners to anticipate the immediate repetition of the noun target, L+H* accents should also function to mislead listeners in a case where the upcoming noun is not a repetition. Experiment 2 was designed to include such misleading trials.
Experiment 2 was conducted to confirm and explore the primary findings from Experiment 1. First, we addressed whether the early fixations to target objects in the felicitous adjective contrast condition were genuinely due to the anticipatory use of L+H* on the contrasting color adjective. In Experiment 1, we compared felicitous to infelicitous pitch accent sequences for adjective contrast trials (e.g. blue ball → GREEN ball vs. blue ball → green BALL), leaving open the possibility that differences were due to processing difficulty caused by the presence of an infelicitous L+H* on the target noun. In the current experiment, we compare felicitous adjective contrast to simple absence of a contrastive accent. In the new neutral condition for comparison, the adjective carried a H*, and the following noun a downstepped !H* (e.g. blue ball → GREEN ball vs. blue ball → green ball), which was the most frequent accent sequence for adjective noun pairs in non-contrastive sequences in our previous production study (Ito, Speer & Beckman, 2003). Thus any differences between these conditions should be attributable to the presence or absence of a felicitous L+H*. Second, we more rigorously test our contention that L+H* on a color adjective evokes a set of alternatives, increasing listeners’ expectations that the most recently mentioned target noun will be repeated in the current utterance. If this is so, not only should an L+H* on a color adjective provide an advantage when the just-mentioned noun is repeated, but it should also mislead listeners to fixate on the previously mentioned object cell when the following noun is not repeated (as in the sequence, red angel → BLUE drum). Thus, when the color adjective carries L+H* in a sequence where the object is not repeated, initial fixations should be to the (incorrect) cell of the preceding object (e.g., angel), with the correct target cell fixated later, based on the noun information.
If the contrastive accent on the adjective really guides anticipatory selection of the target noun, we should observe a very early increase of fixations to the just-mentioned object cell triggered by L+H* on the adjective, regardless of the segmental information that identifies the following noun. As a result of accentual misguidance (a pitch accent–based ‘garden path’ effect), we predict that fixations to the real target object cell should be distinctly delayed.
Thirty-six native speakers of American English were recruited at the Ohio State University. None of them participated in Experiment 1. They received partial credit toward a course requirement for their participation.
Ornaments were identical to those used in Experiment 1, but different adjective noun combinations were created for use in the conditions of Experiment 2. There were four trees to be decorated, each with 26 ornaments (16 targets and 10 fillers), and two trials in each of the four conditions shown in Table 4. Again, each color and each object was mentioned at least once, but no more than three times in each tree. Comparison of the Felicitous and Neutral Adjective Contrast conditions allowed us to test whether the effect of felicitous L+H* is facilitative. In the Felicitous and Infelicitous No Contrast conditions, comparison of early eye movements to the target noun cell with those to the cell mentioned on the immediately preceding trial allowed us to test for ‘garden-path’ effects of infelicitous L+H*. Critical conditions were again balanced for absolute word duration and number of syllables in adjectives and nouns. All words had first-syllable stress. (Mean number of syllables and duration in critically compared conditions: Felicitous: adjective 1.2, 327ms; noun 1.5, 472ms; Infelicitous: adjective 1.5, 341ms; noun 1.6, 483ms; Adjective contrast: adjective 1.2, 334ms; noun 1.6, 479ms; No contrast: adjective 1.5, 335ms; noun 1.6, 476ms). As Table 4 shows, we constructed No Contrast conditions so that although neither the adjective nor the noun was immediately repeated from the previous trial, both of these words had been recently used in the instructions sequence, and thus were repeated word mentions. This was done to neutralize any effect that previous mention of the words, or previous fixation of the relevant ornament cell might have on fixations in these conditions.
Table 4
Experiment 2: Information status and accent pattern for the eight conditions.
The same female speaker recorded auditory stimuli with the identical recording settings. Stimuli were re-recorded until the same two ToBI annotators independently confirmed that they bore the intended intonational patterns. Table 5 shows the mean duration of the verb, the article the, the color adjective and the object noun, and the mean peak F0 values of the adjectives and nouns of the target instructions used in Experiment 2.
Table 5
Experiment 2: Mean duration and F0 values of the target stimuli
The eye-tracking procedure in Experiment 2 was identical to that of Experiment 1. Four grids of ornaments were used to decorate four trees, and participants were given the same instructions for the task and the same practice trials. The only differences between Experiments 1 and 2 were the order of decoration, the distribution of adjectives and nouns into item pairs in conditions, and the intonational patterns of the instructions. Note that adjective-noun pairs in Experiment 2 were assigned to different conditions than they were in Experiment 1. Thus a comparison of like conditions that occur in both experiments (specifically, felicitous and infelicitous L+H* no accent trials) provides a test of whether the effects in these conditions generalize across different sets of items.
Results and Discussion
Due to experimenter error in the sequencing of trials on one of the trees, we had to eliminate one trial in the Neutral Adjective Contrast condition from every participant’s data (Neutral Adj Contrast Condition: 36/288 observations missing).
Figure 5 shows the proportions of fixations to the target cells averaged across participants for the four critical conditions, aligned from the onset of the noun in the same manner as in Figure 4. Again, these functions show fixation initiation times calculated from saccade onset, and proportions give the number of fixations to target divided by the number of possible fixations during each 17ms camera cycle (36 participants × 8 trials per condition = 288, except for the Neutral Adjective Contrast condition, where 36 × 7 = 252). Statistical analyses consisted of 2 (discourse contrast) × 2 (accent type) repeated measures ANOVAs conducted on arcsine transformed fixation proportions for each 300ms time window. Table 6 summarizes the results of critical comparisons. The floating bars in Figure 5 indicate the relevant confidence intervals for the time regions where the mean differences reached statistical significance in the top two comparisons in Table 6. Values of the confidence intervals presented were back-transformed from arcsine-transformed proportions.
Figure 5
Experiment 2: Proportion of fixations to the target object cell, aligned from noun onset in the four critical conditions.
Table 6
Experiment 2: Results comparing fixation proportions (n = 36), Subject analysis
Figure 5 compares the Felicitous and Neutral Adjective Contrast conditions, where the same ornament name was immediately repeated to create contrastive status for the color adjective (compare to Figure 4, Experiment 1). As in Experiment 1, the fixation proportions for both conditions begin to increase from the onset of the target noun, but more sharply in the condition where L+H* felicitously marked the contrastive status of the color adjective than in the neutral condition with H*. The functions diverge very early, within 100 ms of noun onset, and the fixation proportion reaches its peak much earlier in the felicitous condition with L+H*. This pattern of results confirms that the felicitous use of L+H* facilitated visual search by prompting anticipatory identification of the target referent on the basis of information from the adjective. Note that mean word duration for the color adjective was comparable between L+H* and H* in Experiment 2. In fact, the adjectives with L+H* in the Felicitous Adjective Contrast condition had a numerically shorter average duration than the adjectives with H* in the Neutral Adjective Contrast condition (Table 5; vertical reference lines in Figure 5). Therefore, it is unlikely that the durational differences in the adjectives led to the earlier, more frequent fixations to the target in the Felicitous Adjective Contrast condition. Correlation analysis confirmed no reliable relation between adjective duration and the first fixation latencies in any of the four critical conditions (r=.105; .164; −.235; −.073 for Felicitous Adjective Contrast, Neutral Adjective Contrast, Felicitous Both Already, and Infelicitous Both Already, respectively).
Figure 5 also shows mean fixation proportions for the Felicitous No Contrast condition with the [H* !H*] accent pattern and the Infelicitous No Contrast condition with the [L+H* no accent] pattern (e.g. red angel → blue drum vs. red angel → BLUE drum). Results indicate that an infelicitous contrastive accent did indeed delay fixations to the target noun cell. The fixation proportion was higher for the Felicitous No Contrast condition than for the Infelicitous No Contrast condition from the beginning of the noun. While the mean fixation proportion for the felicitous condition rose steadily from about 200 ms after noun onset, that for the infelicitous condition did not begin to rise until approximately 300 ms after noun onset. A consistent gap persisted between the two conditions until fixation proportions exceeded 50%. As the Felicitous No Contrast condition involved neither a repetition nor any contrastive accent, we may consider its fixation proportion function as the baseline for the visual search task. Thus, the difference between the Felicitous No Contrast condition and the Infelicitous No Contrast condition, which was sustained throughout the noun, indicates delayed processing due to misleading pitch accent information. At the same time, the larger difference between the fixation proportion functions for the Neutral Adjective Contrast and the Felicitous No Contrast conditions indicates the pure effect of repetition.
Figures 6 and 7 compare fixation proportions within the two No Contrast conditions in order to examine the hypothesis that the presence of a L+H* contrastive accent on the color adjective prompted listeners to actively search for a target object that contrasts in color with the target from the immediately preceding trial. Fixation proportions were averaged across participants for the mentioned target noun and the previous, i.e., incorrect target noun in the infelicitous [L+H* no accent] (Figure 6) and felicitous [H* !H*] conditions (Figure 7), both aligned from the onset of the noun. Table 7 summarizes the results of comparisons of mean fixation proportions between the target and the incorrect target for the two conditions. The floating bars in Figures 6 and 7 indicate the confidence intervals for the regions where the mean differences reached statistical significance shown in Table 7.
Figure 6
Experiment 2: Proportion of fixations to the mentioned vs. previous target object cells, aligned from noun onset in the Infelicitous No Contrast [L+H* no accent] condition.
Figure 7
Experiment 2: Proportion of fixations to the mentioned vs. previous target object cells, aligned from noun onset in the Felicitous No Contrast [H* !H*] condition.
Table 7
Experiment 2: Results comparing fixation proportions (n = 36), Subject analysis
Figure 6 shows fixations to the target ornament cell vs. the immediately preceding ornament cell for instructions with L+H* infelicitously produced on a color adjective that modified a non-repeated object (e.g. red angel → BLUE drum). The data show a clear increase in incorrect fixations to the immediately preceding target due to the presence of the infelicitous L+H*. Notice that the increase in incorrect fixations began before the onset of the noun and peaked before the noun ended, approximately 250 ms after its onset. This serves as incontestable evidence for the anticipatory effect of contrastive L+H* on the prenominal adjective. The early fixations to the incorrect target (e.g., angel) appeared before the object noun could have been recognized, and they continued to increase even after segmental information from the noun might have been used to guide fixations to the correct target cell. The fact that the increase in fixations to the correct target cell was delayed until after 300 ms into the noun suggests that processing was considerably disrupted by the conflict between the anticipated referent and the incoming noun.
In contrast, Figure 7 shows no such increase in looks to the immediately preceding target when the same no contrast sequence was produced with a felicitous [H* !H*], dispelling any concern that participants may have used a general strategy of looking back to the most recent cell. The figure instead shows that participants rarely fixated the previous target throughout the felicitously produced target noun phrase.
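The comparison between looks to the mentioned target and looks to the previous target can be sketched as a per-cycle tally over trials. The region labels, the trial representation, and the toy data are all assumptions made for illustration; the source does not describe how fixations were coded.

```python
from typing import List

# Each trial is a list of region labels, one per camera cycle, starting at
# noun onset. The labels "target", "previous", and "other" are hypothetical.
Trial = List[str]


def proportions_by_region(trials: List[Trial], region: str) -> List[float]:
    """Per-cycle proportion of trials in which gaze was on the given region."""
    if not trials:
        raise ValueError("at least one trial is required")
    n_cycles = min(len(trial) for trial in trials)  # truncate to shortest trial
    return [
        sum(1 for trial in trials if trial[i] == region) / len(trials)
        for i in range(n_cycles)
    ]


# Toy data shaped like a garden path: early looks to the previous target,
# then convergence on the mentioned target.
toy_trials = [
    ["previous", "previous", "target"],
    ["other", "previous", "target"],
    ["other", "other", "target"],
    ["other", "target", "target"],
]
previous_curve = proportions_by_region(toy_trials, "previous")
target_curve = proportions_by_region(toy_trials, "target")
```

Plotting the two curves against time from noun onset gives the kind of display described for Figures 6 and 7: a transient rise in the "previous" curve signals a garden path, while a flat "previous" curve matches the felicitous pattern.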
The results of Experiment 2 confirmed the anticipatory effect of L+H* on the contrastive adjective and demonstrated that this effect was facilitative in nature. That is, listeners planned and executed saccades to the just-mentioned ornament cells earlier when the color adjective carried contrastive prominence L+H* than when it was heard with the less prominent accent H*. In addition, even an infelicitous L+H* on the adjective led to strong anticipation, yielding an increase in incorrect fixations to the immediately-preceding target ornament cell. The timing of the incorrect fixations indicates that listeners planned and executed saccades immediately upon hearing contrastively accented adjectives. It is important to note that these results do not simply confirm the incremental processing of intonational cues shown by Dahan et al. (2002), but also demonstrate how accentual cues interact with the grammatical roles of words to anticipate upcoming input. In both Experiment 1 and 2, fixation patterns showed that listeners quickly integrated the modifier role of the color adjective with its accentual prominence L+H*, assigning contrastive discourse status to the color adjective against the preceding ornament’s color, and simultaneously constraining the candidate referent for the upcoming noun to be the same ornament type as the immediately-preceding trial. This integration took place rapidly enough to show an increase in fixations to the target from the onset of the noun, indicating that ballistic saccades were planned and executed based solely on the adjectives’ grammatical role and their intonational information. The robustness of this anticipatory effect was demonstrated with the infelicitous L+H* combined with non-repeated objects, which acted like a false alarm to automatically trigger garden path eye movements to the incorrect cells.
Although Experiments 1 and 2 confirmed the anticipatory effect of L+H* within a noun phrase, such an effect may have a limit in its scope. Both Dahan et al. and Experiments 1 and 2 in the present study show that the prominent accent L+H* evokes contrast and directs listeners’ attention to the alternative(s) available in the given task environment. However, we have not yet addressed the question of the range of L+H* accents in foreshadowing contrast: At what point in the discourse does a prominent L+H* elicit upcoming contrast, thereby leading to anticipatory eye movements? Listeners constantly update their discourse representation to integrate information about the actions and references involved in conversation, and thus a model of intonational processing in discourse must capture the domain of accentual-cue integration. Extending the designs of the previous two experiments, Experiment 3 tested the scope of the anticipatory effect of L+H*.
In the tree decoration task used in Experiments 1 and 2, each audio instruction started either with a phrase that specified the direction of decoration such as “On the right” and “To its left”, or with other discourse markers that smoothed the transition between trials such as “And now,” “Next,” and “Finally.” As mentioned earlier, Dahan et al.’s instructions always contained “Now” sentence-initially. Such discourse markers (DMs) establish relations between the statements they introduce and the previous utterances, and thus their accentual patterns may affect the process of updating the discourse status of already-mentioned referents and assigning discourse status to upcoming referents.
Using the same tree decoration task, Experiment 3 investigated whether L+H* on the temporal adverbial DMs And next, And then, and After that evokes a contrastive interpretation of the upcoming instruction. During a natural discourse or narrative, such DMs signal temporal continuity between the previously described events and those about to be described. Although the speaker may make spontaneous decisions about what to emphasize in the upcoming utterance, the listener may develop an expectation about the focus of upcoming message according to what has been emphasized in the previous utterance and how the DM introduces the next event. Within the task environment of the present study where the action itself is repetitive (i.e., hanging a series of ornaments on the tree), the primary focus of each instruction utterance is the noun phrase that names the target ornament. Thus, the contrastive accent on a DM in a repetitive command “And THEN, hang the…” might cue an upcoming contrast between the preceding ornament and the next one. Indeed, we noted an intriguing difference in the timing of effects between Experiments 1 and 2 in Dahan et al (2002) that may be partly due to a difference in accent patterns on the DM. Although DM accent patterns were not strictly controlled or compared in these studies, the majority of trials in the Experiment 1 deaccented conditions contained a L+H* accent on the word ‘Now,’ while in Experiment 2, ‘Now’ never carried a L+H*. Overall, looks to the preferred object in their Experiment 1 occurred very early, with significant differences at the onset of the target word, while comparable effects in Experiment 2 were not established until approximately 200ms after the target word onset. Although this difference is anecdotal, it is possible that the DM L+H* might have contributed to the very early differences found in Dahan et al’s Experiment 1.
In addition, the results of our own Experiment 1 suggest that listeners may be differentially sensitive to accentual cues, depending on their usefulness in a given task environment, such as that for visual search. Comparison of contrast-on-color vs. contrast-on-object trials showed early use of the contrastive L+H* information on color adjectives to guide a search of ornaments sorted by type. However, when the color was repeated and thus the listener had to wait for the noun information to identify the referent, the accentual pattern of the adjective did not have an immediate impact on eye movements. This may indicate that the listeners established very selective use of intonational cues, attending more to those cues most relevant for their particular visual search task. This leaves open the possibility that environment-based strategic tuning to the adjective accent yielded the anticipatory early fixations to the target in Experiments 1 and 2 and the robust ‘garden-path’ eye movements to the incorrect target due to the contrastive accent in the non-contrastive context in Experiment 2. If the participants were strategically tuning to the color contrast in the present task environment, a contrastive L+H* on a DM may also be interpreted as the signal to a contrast on the color rather than on the object type, and this selective interpretation of the accent may trigger anticipatory fixations to the immediately-preceding ornament cell. In other words, the task environment may restrict the domain of contrast signaled by L+H* on a DM, and this contrastive accent at the beginning of the utterance may lead to anticipatory eye movement even before the noun phrase information is available. The absence of such anticipatory eye movement would eliminate the possibility that the effect of contrastive accent on the adjective reported in Experiments 1 and 2 was the product of environment-specific attention tuning.
In Experiment 3, DMs had either [L+H* L-H%] or [H* L-H%] intonation on the adverb (e.g. And NEXT vs. And next). Table 8 shows the four critical conditions and their accentual patterns. The two accentual patterns were crossed with two accentual patterns on target ornament phrases. Either the ornament was mentioned with a [L+H* no-accent] pattern in a contrast-on-color sequence (e.g. blue drum → GREEN drum), or both the adjective and the noun had already been mentioned and appeared again in a non-contrastive sequence, produced with [H* !H*] (e.g. orange candy → green drum). Thus, DM accentual patterns were either matched or mismatched with the discourse status and intonational marking of the target ornament. In one of the two matched conditions, both the DM and the color adjective had L+H* (“First, hang the blue drum” → “And THEN, hang the GREEN drum”), whereas in the other, neither the DM nor the adjective had contrastive L+H* (“First, hang the orange candy” → “And then, hang the green drum”). In the two mismatched conditions, only the DM or the adjective had L+H* (“First, hang the orange candy” → “And THEN, hang the green drum”; “First, hang the blue drum” → “And then, hang the GREEN drum”). If L+H* on a DM is sufficient to evoke a contrastive interpretation of the upcoming referent’s color, then early fixations to the just-mentioned ornament cell should be observed even before listeners hear the target ornament phrase. Alternatively, if L+H* on a DM is not a strong enough cue to lead to such anticipation, yet plays some role in assigning the contrastive status to the current utterance, its accentual relevance may be evaluated against the informational status of the target ornament NP.
Participants
Thirty-six native speakers of American English were recruited at the Ohio State University. None of them participated in the previous two experiments. They received partial credit toward a course requirement for their participation.
Design and Materials
The same set of ornaments as in Experiments 1 and 2 was used in Experiment 3, with adjective-noun pairs assigned to conditions in a pattern different from that used in the previous experiments. Each participant decorated four trees, which had 26 ornaments each (16 targets and 10 fillers). Each tree had two trials in each of the four conditions shown in Table 8. Each color and each object was mentioned at least once, but no more than three times in each tree. Conditions were balanced for duration and number of syllables in adjectives and nouns. Duration was balanced for adjective accent conditions, but this was not possible for nouns, as unaccented nouns are unavoidably shorter than accented nouns, even in sentence-final positions. All words had first-syllable stress. (Mean number of syllables and duration in accent conditions: L+H* adjectives: 1.2, 324 ms; H* adjectives: 1.3, 329 ms; !H* nouns: 3.3, 519 ms; unaccented nouns: 3, 465 ms.)
In Experiment 3, the critical comparisons were among the matched and mismatched accentual patterns when the color adjective was contrastive (Matched Both vs. Mismatched Adj), and when there was no immediate contrast (Matched Neither vs. Mismatched DM). Each of the three discourse markers “And next,” “And then,” and “After that” appeared two to three times in each of the four conditions across the four trees. These discourse markers were produced with either L+H* or H* on the adverb, followed by a L-H% phrasal-boundary tone combination. The discourse markers of the other eight trials on each tree were varied according to the status of each instruction such that they conveyed natural local discourse contexts (e.g. “At the top”, “Starting on the right”, “Moving to the left”, “Following that”, etc.). The accentual patterns of those DMs were either [H* H-L%] or [L+H* H-L%]. Four of the ten filler instructions had one each of the four accentual patterns of the critical conditions, while the remaining six instructions had [H* H-H%] for their DMs. These accentual patterns were selected based on the filler items in Experiments 1 and 2, so that the variation in the DMs and their accentual patterns were comparable across experiments.
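The 2 × 2 crossing and the trial counts described above can be laid out in a short sketch. The condition labels and counts come from the text; the function and constant names are hypothetical, and the code is only an illustration of the design, not the authors' materials list.

```python
from itertools import product

DM_ACCENTS = ("L+H*", "H*")                         # accent on the DM adverb
TARGET_PATTERNS = ("[L+H* no-accent]", "[H* !H*]")  # accents on the target NP


def condition_label(dm_accent: str, target_pattern: str) -> str:
    """Name the critical condition for one DM accent / target pattern pairing."""
    contrastive_dm = dm_accent == "L+H*"
    contrastive_adj = target_pattern == "[L+H* no-accent]"
    if contrastive_dm and contrastive_adj:
        return "Matched Both"
    if not contrastive_dm and not contrastive_adj:
        return "Matched Neither"
    return "Mismatched DM" if contrastive_dm else "Mismatched Adj"


conditions = {
    (dm, target): condition_label(dm, target)
    for dm, target in product(DM_ACCENTS, TARGET_PATTERNS)
}

TREES = 4
ORNAMENTS_PER_TREE = 26            # 16 targets + 10 fillers
TRIALS_PER_CONDITION_PER_TREE = 2

critical_trials_per_tree = TRIALS_PER_CONDITION_PER_TREE * len(conditions)
trials_per_condition = TRIALS_PER_CONDITION_PER_TREE * TREES
total_instructions = TREES * ORNAMENTS_PER_TREE
```

Under these counts each participant contributes eight trials per critical condition, and eight of the sixteen target instructions on each tree are critical trials.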
The same female speaker produced the instructions for Experiment 3 with the identical recording setting as in Experiments 1 and 2. The instructions were re-recorded until the same two independent ToBI annotators confirmed the accentual patterns were produced as intended. The intervals between the offset of the DM and the onset of verb were manually edited to fall between 300 and 350 ms (mean 339ms, stdev 26ms). The mean duration and the F0 peak value of the last word of the discourse marker, the color adjective and the object noun are shown in Table 9.
Procedure
The eye-tracking procedure in Experiment 3 was identical to that of Experiments 1 and 2. Four grids of ornaments were used to decorate four trees, and participants were given the same instructions for the task and the same practice trials. The only differences between Experiment 3 and the preceding two experiments were the distribution of items across conditions, the order of decoration and the intonational patterns of the instructions.
Results and Discussion
Results revealed no immediate anticipatory effect of L+H* on the DM itself. Figure 8 shows the proportions of fixations to the target cells averaged across participants for the four critical conditions, aligned from noun onset. Fixations between the DM and the noun onset are shown with negative time values, and vertical lines indicate the mean duration of each word in the instruction as well as the duration of the pause following the DM. The fixation proportion functions before noun onset give no indication of anticipatory eye movements due to contrastive accent on the DM. In all four conditions, participants rarely fixated the target cell before the noun onset. Repeated measures ANOVAs on arcsine transformed proportions showed no effect of DM accent in the six 300 ms windows before the noun onset (the only window with F > 1 was −1500 to −1200 ms: F = 1.681, p = .2). The absence of fixations to the target before the noun in the Matched Both trials (e.g., “And NEXT, hang the GREEN drum”) suggests that a L+H* on a DM did not trigger the anticipation of upcoming color contrast.
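The windowing used for these comparisons can be sketched as below, with times expressed relative to noun onset so that pre-noun windows are negative. Treating each window as half-open, [lo, hi), is an assumption; the text does not say how boundary samples were assigned, and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

WINDOW_MS = 300  # window width stated in the text

Window = Tuple[int, int]


def analysis_windows(start_ms: int, end_ms: int,
                     width_ms: int = WINDOW_MS) -> List[Window]:
    """Consecutive half-open windows [lo, hi) covering start_ms up to end_ms."""
    if end_ms <= start_ms or (end_ms - start_ms) % width_ms != 0:
        raise ValueError("range must be a positive multiple of the window width")
    return [(lo, lo + width_ms) for lo in range(start_ms, end_ms, width_ms)]


def window_index(t_ms: float, windows: List[Window]) -> Optional[int]:
    """Index of the window containing t_ms, or None if it falls outside all."""
    for i, (lo, hi) in enumerate(windows):
        if lo <= t_ms < hi:
            return i
    return None


# Six 300 ms windows before noun onset, as in the pre-noun analysis.
pre_noun_windows = analysis_windows(-1800, 0)
```

Each 17 ms sample would be assigned to a window with `window_index`, and the per-window fixation proportions would then enter the repeated measures ANOVA after transformation.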
Figure 8
Experiment 3: Proportion of fixations to the target object cell, aligned from the noun onset in the four critical conditions.
In order to confirm this null effect of DM contrastive accent, fixations to the target cell were compared with the fixations to the incorrect target cells (i.e., the preceding trial’s target) in the Mismatched DM trials (orange candy → “And NEXT, hang the green drum”; incorrect target = candy). If a L+H* on the DM acted as a false alarm in a way similar to the infelicitous L+H* on the adjective in the non-contrastive trials of Experiment 2, a similar increase in fixations to the incorrect target would be observed before the noun phrase. However, as shown in Figure 9, the contrastive accent on the DM triggered no garden-path eye movement before the listeners fixated the correct target as the noun information became available. In fact, participants rarely looked at the previous target throughout these trials. Instead, fixations to the correct target began after noun onset and increased steadily afterwards. (The results of means comparisons are given in Table 10.) Therefore, it is very unlikely that the participants developed the strategy of using a selective interpretation of L+H* to perform the visual search task in the present experiment.
Figure 9
Experiment 3: Proportion of fixations to the mentioned vs. previous target object cells, aligned from noun onset in the Mismatched DM condition.
Although L+H* on the DM did not lead to anticipatory eye movements to the immediately preceding ornament cell, the presence of L+H* may have played some role in foreshadowing the contrastive status of the upcoming referent. This effect appeared in a much later time region for the comparison between the Matched Both and the Mismatched Adj trials, and can be seen in Figure 8. Note that in both conditions where the contrast-on-color was expressed by L+H* on the adjective, the fixation proportions increased sharply right after the onset of the noun, replicating the patterns for Felicitous Adjective Contrast trials in Experiments 1 and 2. This immediate increase in fixation proportions from the beginning of the noun again indicates that the saccades to the target were planned during the preceding adjective carrying a contrastive L+H*. While the fixation proportions continue to rise until about 700–800 ms after the noun onset in both conditions, a remarkable difference appears as the fixation proportion function declines. Compared to the plateau above the .8 line in the Mismatched Adj condition, fixation proportions in the Matched Both condition start dropping sharply about 1000 ms after the noun onset. This rapid decline in Matched Both trials indicates that the participants’ eyes left the target cell quickly after they identified the correct ornament, and that this departure was faster when the accentual pattern of the DM had signaled a contrast than when it had not. Assuming that the decrease in fixations indicates an attention shift from the target cell, we speculate on the role of L+H* on the DM as follows. First, contrastive accent on the DM may have prompted a contrast between the preceding and the upcoming referent without restricting the interpretation to the color contrast. Then the L+H* on the color adjective triggered the anticipatory selection of the upcoming referent, which initiated the increase in fixations to the previously-mentioned target cell. When the contrastive status of the referent was confirmed with the noun information, participants could initiate the action to reach for the ornaments on the board. At the same time, they may have started shifting attention to the tree where they had to place the ornament. We suspect that the presence of early contrastive accent on the DM may have facilitated the confirmation of the contrastive status of the referent, and thus speeded the attention shift.
Post hoc means comparisons were conducted between critical conditions in the time windows beginning at −300 ms. Mean fixation proportions in the Mismatched Adj condition were significantly higher than those for the Matched Both condition in the 0–300 ms window, as shown in Table 11. This result was unexpected, but it is unlikely that the narrow gap between the initial rise in these two functions indicates a genuine difference in processing ease due to the accentual property of the DM. Instead, we view the sharp rises in the two functions together as the demonstration of a ceiling on the anticipatory effect of contrastive accent on the adjective.
Interestingly, we found an early and continuing series of significant advantages for the Mismatched DM condition over the Matched Neither condition in −300 – 0 ms, 0–300 ms and 300 – 600 ms windows (see Table 11). The Mismatched DM condition (e.g., “Hang the orange candy. And NEXT, hang the green drum”), is the condition that is intonationally most like Dahan et al’s (2002) Experiment 1 deaccented condition, where the sentence pair was, “Put the candle above the star. NOW put the can…” However, the two conditions differ substantially in the informational content of their accent patterns. In Dahan et al’s materials, the felicitous use of the contrastively accented DM followed by a deaccented target noun involved the repetition of the immediately preceding noun target. In the current experiment, neither the adjective nor the noun was repeated. We speculate that the presence of L+H* on the DM, when compared to the absence of such contrastive cue, may have simply had a generalized attentional effect, speeding processing of the repeated target in Dahan et al (2002) as well as the non-contrastive target in the current experiment, resulting in an effect that surfaced once the noun phrase information became available. This effect may be present but not measurable in our Matched Both condition due to the ceiling effect discussed above. Although this is a plausible account of the present outcome, further research is required to explore the general function of the contrastive accent on the discourse marker.
The results of Experiment 3 demonstrated that contrastive accent on a DM does not evoke the strategic expectation of a contrast in the upcoming utterance. Participants in the present study did not develop the use of a strategic interpretation of contrastive accent even with the repetitive commands to search ornaments sorted by object types, which may have highlighted the possible advantageous use of L+H* for the adjective but not for the noun. Although L+H* on the DM may have signaled upcoming contrast, or may have simply provided an additional attention-orienting cue, it did not specify the type of contrast, and thus it did not trigger early fixations to the immediately preceding ornament cell. The present results suggest that a contrastive accent on the DM is evaluated when the informational status of the referent becomes available with the noun phrase naming the referent. The early prompt of the contrast seems to facilitate the attentional shift after the contrast is confirmed at the later point in the utterance.
General Discussion
We have presented three eye-tracking experiments examining the role of pitch accent in discourse comprehension during a relatively complex real-world visual search task. Participants followed pre-recorded instructions for a tree decoration task that necessitated a search through 40 to 52 small ornaments in an organized grid display. Each instruction was followed by a visual search and the hanging of the selected ornament (for some, a relatively delicate manual process). This naturalistic task captured participants’ attention and, combined with head-mounted eye movement monitoring, allowed us to measure implicitly the time course of listeners’ use of intonational patterns to anticipate and recognize discourse referents. Even with the inclusion of intonationally infelicitous trials, post-experiment debriefing interactions with participants informed us that they uniformly did not consider accents on words as a possible focus of the study, with few even attending to intonation. Many participants thought that the experiment was testing their responses to color distribution or their memory of the objects’ locations (ornament boards vs. trees).
Experiment 1 compared felicitous and infelicitous use of L+H*. When L+H* felicitously marked contrast on a color adjective modifying a repeated noun (“First hang the green drum.” → “Next, hang the BLUE drum.”), fixation proportions to target cells increased more quickly than when L+H* infelicitously marked the immediately repeated noun (“Hang the blue onion.” → “Next, hang the blue DRUM.”). In contrast, felicitous L+H* on the noun (“Hang the blue onion.” → “Then, hang the blue DRUM.”) did not lead to an early increase in fixations to the target as compared to infelicitous L+H* on the immediately repeated color adjective (“First, hang the blue onion.” → “Then, hang the BLUE drum.”). The effect of felicitous L+H* on the noun appeared after the noun’s segmental information was fully available. The results of Experiment 1 suggested that listeners may have ‘tuned’ to tonal cues that were relevant to the task environment, where no two objects within a cell had the same color and different objects of the same color were distributed across cells. Within this visual context, a contrastive accent on the non-repeated prenominal adjective was useful to predict the upcoming object type, and participants were able to rapidly integrate pitch accent information to guide visual search. In contrast, the pitch accent pattern on repeated adjectives did not lead to an immediate difference in eye movements. While these results demonstrated clear effects of pitch accent type and location on establishing a contrast set from which to choose the upcoming referent during discourse comprehension, they could not establish whether the effects were due to a processing advantage conferred by felicitous use of L+H*, or instead to disruption from infelicitous use.
Experiment 2 confirmed that the presence of L+H* (as opposed to the more neutral H*) on a non-repeated adjective provided a processing advantage, triggering the selection of a candidate for the following noun. The anticipatory nature of the effect was demonstrated by ‘garden-path’ eye movements when the pitch accent pattern on a non-repeated adjective was misleading. Upon hearing a contrastive L+H* accent on the color (red angel → GREEN drum), participants immediately fixated the incorrect, previously-mentioned object cell. The effect was clearly due to the accented adjective, because it began before segmental information for the noun became available. Importantly, the proportion of fixations to the incorrect object cell continued to increase from before noun onset until 300 ms into the noun. That is, listeners’ eyes continued to be drawn to the anticipated object even as they listened to conflicting segmental information. Such incorrect initial fixations were not observed in the absence of L+H* on the adjective modifying the non-contrastive referent, ruling out the possibility that participants had simply developed a strategy of looking back to the previous cell.
In Experiment 3, L+H* was placed on discourse markers (e.g., And NEXT) that preceded the repetitive command “… hang the …” to test whether participants had developed a selective interpretation of the presence of a L+H* as ‘contrast-on-color’ as a strategy to perform the visual search task. No sign of specific anticipation due to the contrastive accent on a DM was observed in the eye fixation patterns before the target noun phrase. Instead, when a L+H* on the DM prompted an upcoming contrast on the color, the attention shift (presumably to the tree) seemed to be speeded. In addition, L+H* on the DM may have generally drawn attention to the target noun phrase, as indicated by earlier and more frequent fixations to the non-contrastive target.
Our results are consistent with those from previous eye movement studies that have demonstrated very early use of prosodic information during real-time processing, with anticipatory looks to targets made on the basis of intonational cues and before confirming lexical information (e.g. Snedeker & Trueswell, 2003; Dahan et al., 2002). The effects support a model of spoken language processing that assumes immediate, parallel processing of segmental and suprasegmental information such as pitch accent, despite the latter’s distribution across multiple phonemes in the speech stream. Our participants showed immediate sensitivity to the presence and type of pitch accent, integrating it with information from the discourse representation, and using it to speed ongoing identification of the object of visual search. Taken together, the results of the three experiments presented here also make two novel contributions to our understanding of how listeners use pitch accent information to establish referential domains during discourse comprehension. Specifically, the work demonstrates how the immediate integration of pitch accent information into the discourse representation can generate an expected referent, and also how grammatical roles of words and referential context constrain the domain of accent-based referential resolution.
First, a L+H* on a prenominal adjective immediately evoked a contrast between the accented discourse entity (i.e., the accented color) and the most salient entity that shared the same grammatical role in the discourse background (i.e., just mentioned color). Simultaneously, this contrastive link between the two prenominal modifiers evoked a mapping between the two modified nouns, projecting a specific candidate in the discourse foreground. The data indicate that eye movements to incorrect targets were planned based on the accentual information of the prenominal modifier, and executed even in the presence of conflicting segmental information. Because saccades are ballistic motor movements, they cannot be re-programmed after they are initiated. Thus they may not reflect the exact time course of processing of conflicting speech signals. However, the finding that the misleading intonational cue delayed fixations to the object even while its phonemic information was available strongly suggests that pragmatic information from contrastive accent is processed immediately, on par with other acoustic information in the speech stream. Not only did pitch accent produce an incremental update of the informational status of the currently processed word, but it also initiated predictive lexical access.
Second, our results show that the effect of accentual cues seems to be constrained by the discourse/grammatical role of the word conveying the accent. Although we found a robust anticipatory effect with the contrastive accent on the adjective, the same prominent accent did not produce an equivalent effect in the discourse marker (DM) position. Speakers often use contrastive accent on DMs to draw the listener’s attention to a specific part (or perhaps the entirety) of the upcoming message. However, because DMs are largely independent of the syntactic structure of the following utterance, listeners have insufficient information to generate specific hypotheses about upcoming referent possibilities. In contrast, a determiner-adjective sequence provides enough information for the listener to project a head noun. Presumably, the accentual cues can be integrated faster when they accompany words whose grammatical roles constrain the upcoming informational structure. At the same time, the accentual cues associated with particular grammatical roles may be constrained by referential context, as demonstrated by the weaker effect of accentual cues on the repeated adjective in the present study’s search task environment. Thus, our present results suggest that both referential context and grammatical structure may define the domain and the strength of accentual effects. Further research is needed to explore both the scope of referential constraint and the scope of syntactic constraint on the effect of accentual cues during speech comprehension.
The present study demonstrated robust, pervasive effects of contrastive accent on the processing of discourse referents during visual search. The current experiments stand in contrast to previous work by Sedivy et al. (1999), which failed to demonstrate any effect of intonationally marked contrast with substantially similar materials. In their Experiment 1B, Sedivy et al. (1999) monitored participants’ eye movements while they followed auditory commands to touch one of four objects: the two members of a minimal contrast pair distinguished by a prenominal modifier (e.g., a pink comb and a yellow comb), a competitor that shared the contrast property (e.g., a yellow bowl), and a distracter that did not share the contrast (e.g., a metal knife). For each critical trial, participants first heard an instruction that mentioned one of the minimal-pair objects (e.g., “Touch the pink comb.”). The following instruction mentioned either the counterpart of the minimal pair (e.g., the yellow comb) or the competitor (e.g., the yellow bowl), and modifiers were produced either with L+H* or H*. Results showed that modifiers were immediately interpreted as contrastive, with fast eye movements to the minimal-pair counterparts. However, no effect of the H* vs. L+H* accentual difference was observed.
We notice three major differences between Sedivy et al.’s work and that presented here, which may have led to the difference in findings across the two studies: display complexity, informativeness of the adjective modifier in the discourse context, and consistency of phonological information in the spoken materials. First, display complexity differed substantially across the studies, with four objects in Sedivy et al. and more than ten times that number in the current experiments. In a set of four, the relationship between the minimal pair and competitor objects would be salient to participants, who observed each display change for about 20 seconds before the initial instruction on each trial. Thus the preview and relatively simple display may have allowed listeners to establish a double contrast (pink/yellow comb and yellow comb/bowl). We suspect that this display-oriented overt contrast, rather than the general ‘contrastive interpretation’ of the adjectival modifier (Sedivy et al., 1999, p. 127), led to a ceiling effect (mean fixation latencies ranged from 270–281 ms). In the present experiments, ornaments were sorted into 11 cells, and each cell contained three to five ornaments. This visual complexity engaged participants in visual search rather than simple object selection from a known field. In addition, in the current study there was no display-oriented referential bias, as there were multiple competitors of the same color across cells. In other words, participants had no way of guessing the next target ornament based on non-linguistic contrast within the display. In order to make a direct comparison of the timing of effects across the two studies, we calculated the average latency of first fixations for each condition in Experiments 1 and 2. The average first fixation latencies for the felicitous L+H* conditions in Experiments 1 and 2 were 407 ms and 388 ms, respectively.
We argue that the difference of approximately 120 ms in fixation latency across the two studies was driven by the inequality in display complexity and in non-linguistic contextual bias. The relatively slow increases in fixations in the present experiments may also be due to the visual complexity of the experimental setup. (For a detailed discussion of the relation between visual complexity, preview time, and fixations in scene perception, see Henderson & Ferreira, 2004.)
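The size of this latency gap can be checked with simple arithmetic from the values reported above. The sketch below is illustrative only: it assumes that the midpoint of the 270–281 ms range reported for Sedivy et al. (1999) is a fair summary of that study's latencies, and the variable and function names are hypothetical, not part of either study's analysis.

```python
# Mean first-fixation latencies (ms) for the felicitous L+H* conditions,
# as reported in the text for Experiments 1 and 2 of the present study.
present_study_ms = [407, 388]

# Range of mean fixation latencies (ms) reported for Sedivy et al. (1999).
# Summarizing this range by its midpoint is a simplifying assumption.
sedivy_range_ms = (270, 281)


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


present_mean = mean(present_study_ms)      # average across Experiments 1 and 2
sedivy_midpoint = mean(sedivy_range_ms)    # midpoint of the reported range
latency_gap = present_mean - sedivy_midpoint

print(f"Present study mean latency: {present_mean} ms")
print(f"Sedivy et al. (1999) midpoint: {sedivy_midpoint} ms")
print(f"Difference: {latency_gap} ms")
```

Under these assumptions the gap comes out at 122 ms, in line with the figure of approximately 120 ms cited in the text.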
A second difference between the studies is in the informativeness of the prenominal modifiers used. Sedivy et al. (1999) posit that their adjectives were interpreted contrastively regardless of their accents because the wide range of modifiers used increased their informativeness (objects were described with size, color and material terms, and also presented unmodified). They argue further that the presence of a modifier in the initial instruction (pink comb) may have drawn extra attention to the contrasting modified object (yellow comb) during the initial instruction. On this account, excessive informativeness of modifiers across the experiment led to very fast eye movements to the minimal pair object (yellow comb) as compared to the competitor (yellow bowl), a ceiling effect that obscured the effect of felicitous contrastive accent. We instead attribute the ceiling effect to the varied informativeness of the modifier within a display: while the modifier was informative for minimal pair members (pink/yellow comb), it was also used in a less informative manner to describe the single object that shared color with a member of the minimal pair (yellow bowl). When the modifier conveyed unnecessary or confusing information about the target, participants may have interpreted it as the modifier of the contrastive object (e.g., the yellow comb), which required the modification in order to be distinguished from the other member of the pair. We suggest that the overt visual contrast led to frequent looks to the contrastive object regardless of the accentual property of the modifier. In the present study, the informativeness of the color modifier was consistent. There were multiple ornaments of multiple colors, and no ornament could be singled out without a color modifier. Therefore, the color modifier in each instruction was equally informative, allowing the examination of the effect of accent uninfluenced by differences in informativeness.
Finally, the phonetic consistency of instructions is crucial for examining prosodic effects. Sedivy et al. (1999) delivered the oral instructions by reading the script aloud, pronouncing L+H* accents in the ‘Stress’ condition and H* in the ‘No stress’ condition. Because no prosodic transcriptions or acoustic analyses of the instructions were provided, it is difficult to ascertain the phonetic distinction between the two conditions. Pitch range and speech rate are highly variable within speakers, and it is not easy to consistently pronounce equivalent tunes across utterances. It is possible that some L+H* instructions of Sedivy et al. (1999) were produced with less (or more) prominent accents than our speech materials. The present study employed careful ToBI analysis for screening the speech stimuli. Although the debate remains open over the phonological distinction between L+H* and H* (Ladd and Morton, 1997; Ladd and Schepman, 2003), we ensured that the accentual values of our speech stimuli were phonetically distinct across conditions. Although we do not wish to devalue the importance of investigating the online processing of non-cardinal accents produced in spontaneous speech, we feel it is critical to provide phonetic and phonological analysis of experimental materials used in research on processing and intonation. Current work in our laboratory examines the production and detection of categorical boundaries across different accent types in online spontaneous dialogue comprehension.
The research presented here adds to a growing body of work that employs natural tasks to structure the attention and intentions of interlocutors during discourse production and comprehension. Here, we have successfully used a visual search task and head-mounted eye tracking to examine the time course of contrastive pitch accent use. We feel that the combination of such measures and tasks with careful phonetic and phonological analyses of spoken materials will lead to an accurate characterization of the use of intonation in language processing.
The research presented in this paper was supported by NSF award BCS 04-18464 and NIH grant DC007090. We thank Ping Bai for assistance with data analysis, Mary Beckman for her interest and support throughout the course of this work, Allison Blodgett and Laurie Maynell for assistance with creating and ToBI-annotating our spoken stimuli, Claire Shank and friends for help with the creation of ornament sets, and three anonymous reviewers for their insightful and thorough comments.
Appendix A. Experimental trials for Experiment 1
Trial | Condition | Adjective | Noun | Accent Pattern
Tree 1
1 | Filler | white | tree | H* !H*
2 | Both Initial | red | drum | H* !H*
3 | Both Initial | grey | ball | H* !H*
4 | Filler | purple | tree | H* !H*
5 | Filler | white | snowman | H* !H*
6 | Initial Adj | blue | ball | H* !H*
7 | Initial N | grey | candy | H* H*
8 | Infel Adj Cont | gold | CANDY | H* L+H*
9 | Filler | white | star | H* H*
10 | Initial N | red | stocking | H* H*
11 | Filler | yellow | tree | H* !H*
12 | Both Already | blue | drum | H* !H*
13 | Infel N Cont | BLUE | egg | L+H* no-acc
14 | Fel Adj Cont | SILVER | egg | L+H* no-acc
15 | Filler | purple | star | H* !H*
16 | Initial Adj | brown | ball | H* !H*
17 | Fel N Cont | brown | ANGEL | H* L+H*
18 | Infel Adj Cont | green | ANGEL | H* L+H*
19 | Filler | purple | snowman | H* !H*
20 | Both Already | green | egg | H* !H*
21 | Fel N Cont | green | ONION | H* L+H*
22 | Fel Adj Cont | ORANGE | onion | L+H* no-acc
23 | Infel N Cont | ORANGE | bell | L+H* no-acc
24 | Filler | yellow | star | H* !H*
Tree 2
1 | Filler | purple | tree | H* !H*
2 | Both Initial | silver | ball | H* !H*
3 | Both Initial | grey | angel | H* !H*
4 | Filler | white | light bulb | H* !H*
5 | Filler | yellow | snowman | H* !H*
6 | Initial Adj | gold | angel | H* !H*
7 | Filler | white | tree | H* !H*
8 | Initial N | grey | stocking | H* H*
9 | Infel Adj Cont | brown | STOCKING | H* L+H*
10 | Filler | purple | light bulb | H* !H*
11 | Initial N | silver | egg | H* H*
12 | Both Already | gold | ball | H* !H*
13 | Fel N Cont | gold | CANDY | H* L+H*
14 | Infel Adj Cont | green | CANDY | H* L+H*
15 | Filler | purple | snowman | H* !H*
16 | Initial Adj | blue | angel | H* !H*
17 | Infel N Cont | BLUE | bell | L+H* no-acc
18 | Fel Adj Cont | ORANGE | bell | L+H* no-acc
19 | Filler | yellow | light bulb | H* !H*
20 | Both Already | orange | candy | H* !H*
21 | Infel N Cont | ORANGE | onion | L+H* no-acc
22 | Fel Adj Cont | RED | onion | L+H* no-acc
23 | Fel N Cont | red | DRUM | L+H* no-acc
24 | Filler | white | snowman | H* !H*
Tree 3
1 | Filler | yellow | star | H* !H*
2 | Both Initial | brown | bell | H* !H*
3 | Both Initial | orange | stocking | H* !H*
4 | Filler | purple | light bulb | H* !H*
5 | Initial Adj | silver | stocking | H* !H*
6 | Initial N | orange | candy | H* H*
7 | Infel Adj Cont | blue | CANDY | H* L+H*
8 | Filler | white | snowman | H* !H*
9 | Filler | yellow | light bulb | H* !H*
10 | Initial N | brown | angel | H* H*
11 | Both Already | silver | bell | H* !H*
12 | Infel N Cont | SILVER | onion | L+H* no-acc
13 | Fel Adj Cont | GREY | onion | L+H* no-acc
14 | Filler | purple | snowman | H* !H*
15 | Filler | white | light bulb | H* !H*
16 | Initial Adj | green | stocking | H* !H*
17 | Fel N Cont | green | DRUM | H* L+H*
18 | Infel Adj Cont | red | DRUM | H* L+H*
19 | Filler | yellow | snowman | H* !H*
20 | Filler | white | star | H* !H*
21 | Both Already | red | onion | H* !H*
22 | Fel N Cont | red | EGG | H* L+H*
23 | Fel Adj Cont | GOLD | egg | L+H* no-acc
24 | Infel N Cont | GOLD | ball | L+H* no-acc
Tree 4
1 | Filler | white | light bulb | H* !H*
2 | Both Initial | blue | egg | H* !H*
3 | Both Initial | green | bell | H* !H*
4 | Filler | yellow | star | H* !H*
5 | Filler | purple | tree | H* !H*
6 | Filler | WHITE | tree | L+H* no-acc
7 | Initial Adj | brown | bell | H* !H*
8 | Initial N | green | angel | H* H*
9 | Infel Adj Cont | orange | ANGEL | H* L+H*
10 | Initial N | blue | onion | H* H*
11 | Filler | yellow | light bulb | H* !H*
12 | Both Already | brown | egg | H* !H*
13 | Fel N Cont | brown | DRUM | H* L+H*
14 | Infel Adj Cont | red | DRUM | H* L+H*
15 | Filler | white | star | H* !H*
16 | Initial Adj | gold | bell | H* !H*
17 | Infel N Cont | GOLD | ball | L+H* no-acc
18 | Fel Adj Cont | GREY | ball | L+H* no-acc
19 | Filler | yellow | tree | H* !H*
20 | Filler | purple | light bulb | H* !H*
21 | Both Already | grey | drum | H* !H*
22 | Infel N Cont | GREY | stocking | L+H* no-acc
23 | Fel Adj Cont | SILVER | stocking | L+H* no-acc
24 | Fel N Cont | silver | CANDY | H* L+H*
Appendix B. Experimental trials for Experiment 3
Trial | Condition | DM and its accent pattern | Target Adj + N and its accent pattern
Tree 1
1 | Filler | At the top H* L-H% | purple H* star !H*
2 | Filler | On the LEFT L+H* L-H% | white H* tree !H*
3 | Both Initial | In the middle H* H-L% | green H* bell !H*
4 | Initial N Cont | On the RIGHT L+H* H-L% | green H* EGG L+H*
5 | Filler | First H* H-H% | yellow H* drum !H*
6 | Filler | Moving to the right H* H-H% | purple H* SNOWMAN L+H*
7 | Both Initial | Following THAT L+H* H-L% | gold H* stocking !H*
8 | Filler | And then H* H-H% | white H* star !H*
9 | Initial Adj | To the right of THAT L+H* H-L% | silver H* bell !H*
10 | Mismatch Adj | And then H* L-H% | ORANGE L+H* bell noacc
11 | Filler | FIRST L+H* H-L% | gold H* candy !H*
12 | Filler | Moving to the left H* H-H% | yellow H* tree !H*
13 | Initial N | Following THAT L+H* H-L% | green H* angel H*
14 | Match Both | And NEXT L+H* L-H% | BROWN L+H* angel noacc
15 | Initial N Cont | To the left of that H* H-L% | brown H* BALL L+H*
16 | Mismatch Adj | After that H* L-H% | GREY L+H* ball noacc
17 | Filler | Finally H* H-H% | purple H* TREE L+H*
18 | Filler | The first ornament is H* H-H% | yellow H* star !H*
19 | Match Neither | And next H* L-H% | green H* stocking !H*
20 | Match Both | After THAT L+H* L-H% | RED L+H* stocking noacc
21 | Initial Adj | Moving to the left H* H-L% | blue H* egg !H*
22 | Mismatch DM | And THEN L+H* L-H% | silver H* candy !H*
23 | Initial N | To the left of that H* H-L% | orange H* onion H*
24 | Mismatch DM | And NEXT L+H* L-H% | red H* drum !H*
25 | Match Neither | After that H* L-H% | blue H* ball !H*
26 | Filler | The last ornament is H* H-L% | white H* snowman !H*
Tree 2
1 | Filler | At the very top H* H-H% | white H* tree !H*
2 | Both Initial | On the RIGHT L+H* L-H% | silver H* egg !H*
3 | Mismatch Adj | And then H* L-H% | GREEN L+H* egg noacc
4 | Both Initial | To the left of that H* H-L% | red H* drum H*
5 | Filler | So FIRST L+H* H-L% | purple H* light bulb !H*
6 | Filler | To the right of that H* H-H% | yellow H* bell H*
7 | Initial N | Following that H* H-L% | red H* onion H*
8 | Match Neither | And next H* H-H% | green H* drum !H*
9 | Initial N Cont | Moving to the RIGHT L+H* H-L% | green H* CANDY L+H*
10 | Match Both | After THAT L+H* L-H% | GOLD L+H* candy noacc
11 | Filler | FIRST L+H* H-L% | purple H* BALL L+H*
12 | Initial Adj | To the right of that H* H-L% | orange H* bell !H*
13 | Filler | Following that H* H-H% | yellow H* light bulb H*
14 | Filler | Moving to the right H* H-L% | white H* snowman H*
15 | Mismatch DM | And THEN L+H* L-H% | orange H* onion H*
16 | Match Both | And NEXT L+H* L-H% | BLUE L+H* onion noacc
17 | Filler | Finally in this row H* H-H% | purple H* tree !H*
18 | Filler | FIRST L+H* H-L% | brown H* drum !H*
19 | Initial N Cont | Moving to the left H* H-L% | brown H* STOCKING L+H*
20 | Filler | After that H* L-H% | RED L+H* stocking noacc
21 | Initial N | To the left of THAT L+H* H-L% | blue H* angel !H*
22 | Mismatch Adj | And next H* L-H% | GREY L+H* angel noacc
23 | Filler | To the left of that H* H-H% | yellow H* SNOWMAN L+H*
24 | Mismatch DM | After THAT L+H* L-H% | silver H* ball !H*
25 | Match Neither | After then H* L-H% | gold H* angel !H*
26 | Filler | Finally H* H-H% | purple H* snowman !H*
Tree 3
1 | Filler | At the top H* L-H% | purple H* snowman !H*
2 | Both Initial | Starting on the RIGHT L+H* L-H% | silver H* bell !H*
3 | Filler | In the middle H* L-H% | white H* light bulb !H*
4 | Filler | On the left H* H-H% | YELLOW L+H* light bulb noacc
5 | Initial N | Starting on the left H* H-L% | silver H* onion H*
6 | Mismatch Adj | And then H* L-H% | BLUE L+H* onion noacc
7 | Filler | And NEXT L+H* L-H% | white H* star !H*
8 | Both Initial | To the right of that H* H-L% | brown H* drum !H*
9 | Match Both | After THAT L+H* L-H% | RED L+H* drum noacc
10 | Initial N Cont | Following THAT L+H* H-L% | red H* BALL L+H*
11 | Filler | Starting on the right H* H-H% | purple H* egg !H*
12 | Initial N | To the left of THAT L+H* H-L% | silver H* stocking H*
13 | Mismatch Adj | And next H* L-H% | ORANGE L+H* stocking noacc
14 | Filler | Moving to the left H* H-H% | blue H* CANDY L+H*
15 | Initial Adj | Following THAT L+H* H-L% | grey H* bell !H*
16 | Match Neither | After that H* L-H% | orange H* candy !H*
17 | Filler | Finally H* H-H% | white H* snowman !H*
18 | Filler | FIRST L+H* H-L% | yellow H* star !H*
19 | Mismatch DM | And NEXT L+H* L-H% | blue H* egg !H*
20 | Match Both | And THEN L+H* L-H% | GOLD L+H* egg noacc
21 | Filler | Moving to the right H* H-L% | yellow H* snowman !H*
22 | Mismatch DM | After THAT L+H* L-H% | red H* onion !H*
23 | Initial N Cont | To the right of that H* H-L% | red H* ANGEL L+H*
24 | Filler | Moving to the right H* H-H% | purple H* light bulb !H*
25 | Initial Adj | Following that H* H-L% | green H* onion !H*
26 | Match Neither | And then H* L-H% | grey H* candy !H*
Tree 4
1 | Filler | At the very top H* H-H% | white H* tree !H*
2 | Both Initial | Starting on the LEFT L+H* L-H% | blue H* egg !H*
3 | Filler | To the right of that H* H-H% | yellow H* light bulb !H*
4 | Both Initial | On the right H* H-H% | brown H* bell !H*
5 | Filler | Starting on the left H* H-H% | purple H* angel !H*
6 | Initial N | Moving to the RIGHT L+H* H-L% | blue H* drum H*
7 | Match Both | And THEN L+H* L-H% | RED L+H* drum noacc
8 | Filler | Following that H* H-H% | purple H* tree !H*
9 | Initial N | To the right of THAT L+H* H-L% | orange H* angel !H*
10 | Match Neither | After that H* L-H% | red H* bell !H*
11 | Filler | Again on the LEFT L+H* L-H% | white H* stocking !H*
12 | Filler | To the right of that H* H-H% | yellow H* STAR L+H*
13 | Initial Adj | Following that H* H-L% | green H* bell !H*
14 | Mismatch Adj | And next H* L-H% | GOLD L+H* bell noacc
15 | Mismatch DM | After THAT L+H* L-H% | brown H* egg !H*
16 | Initial N Cont | To the right of THAT L+H* H-L% | brown H* ONION !H*
17 | Match Neither | And then H* L-H% | blue H* stocking !H*
18 | Filler | First H* H-H% | purple H* LIGHT BULB L+H*
19 | Filler | Moving to the right H* H-L% | white H* star !H*
20 | Initial N | Following that H* H-L% | green H* candy H*
21 | Match Both | After NEXT L+H* L-H% | SILVER L+H* candy noacc
22 | Filler | To the right of that H* L-H% | yellow H* tree !H*
23 | Mismatch DM | And THEN L+H* L-H% | green H* angel !H*
24 | Initial N Cont | And next H* H-L% | green H* BALL L+H*
25 | Mismatch Adj | After that H* L-H% | GREY H* ball noacc
26 | Filler | FINALLY L+H* H-L% | white H* light bulb !H*
1. One anonymous reviewer requested an analysis targeting the region that contains the numerical advantage for the infelicitous accentual pattern. The current analysis windows from 1–300 ms and 300–600 ms divide the region of numerical advantage, such that from 1–300 ms it is paired with a region where no effect is present, and from 300–600 ms it is paired with a region that contains a disadvantage for the infelicitous pattern. A targeted analysis showed no significant differences for data in the time window from 160–420 ms, the region of the numerical advantage. The back-transformed mean fixation proportions were .187 (infelicitous) and .137 (felicitous), with a mean difference of .05, and a 95% confidence interval of .089 (F=1.305).
  • Allopenna PD, Magnuson JS, Tanenhaus MK. Tracking the time course of spoken word recognition using eye movements: evidence for continuous mapping models. Journal of Memory and Language. 1998;38:419–439.
  • Bard EG, Aylett MP. The dissociation of de-accenting, givenness, and syntactic role in spontaneous speech. Proceedings of the Fourteenth International Congress of Phonetic Sciences; San Francisco. 1999. pp. 1753–1756.
  • Bard EG, Sotillo C, Anderson AH, Doherty-Sneddon G, Newlands A. The control of intelligibility in running speech. Proceedings of the 13th International Congress of Phonetic Sciences; Stockholm. 1995.
  • Bartels C, Kingston J. Salient pitch cues in the perception of contrastive focus. In: Bosch P, van der Sandt R, editors. Focus and natural language processing, IBM Working Papers on Logic and Linguistics. Vol. 6. Heidelberg: 1994. pp. 1–10.
  • Beckman ME. The parsing of prosody. Language and Cognitive Processes. 1996;11:17–67.
  • Beckman ME, Ayers GM. Guidelines for ToBI labelling, vers 3.0. Ohio State University; 1997. [manuscript]
  • Beckman ME, Pierrehumbert JB. Intonational structure in Japanese and English, Phonology Yearbook. Vol. 3. 1986. pp. 255–309.
  • Birch S, Clifton C. Focus, accent, and argument structure: Effects on language comprehension. Language and Speech. 1995;38:365–391. [PubMed]
  • Bock JK, Mazzella JR. Intonational marking of given and new information: Some consequences for comprehension. Memory and Cognition. 1983;11:64–76. [PubMed]
  • Boersma P. Proceedings of the Institute of Phonetic Sciences. Vol. 17. University of Amsterdam; 1993. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound; pp. 97–110. Can be downloaded as a pdf from http://fon.hum.uva.nl/paul/
  • Boersma PW, Weenink D. Praat: doing phonetics by computer (Version 4.2.17) 1992–2004.
  • Bolinger D. Contrastive accent and contrastive stress. Language. 1961;37:83–96.
  • Bolinger D. Intonation and its parts. Edward Arnold; London: 1986.
  • Bruce G. Swedish word accents in sentence perspective. Lund: Gleerup; 1977.
  • Chafe W. Language and consciousness. Language. 1974;50:111–133.
  • Chafe W. Givenness, contrastiveness, definiteness, subjects, topics, and points of view. In: Li C, editor. Subject and topic. New York: Academic Press; 1976. pp. 25–56.
  • Chambers CG, Tanenhaus MK, Eberhard KM, Filip H, Carlson GN. Circumscribing referential domains during real-time language comprehension. Journal of Memory and Language. 2002;47:30–49.
  • Cruttenden A. Intonation. Avon: Cambridge University Press; 1986.
  • Cutler A, Dahan D, Donselaar Wv. Prosody in comprehension of spoken language: a literature review. Language and Speech. 1997;40:141–201. [PubMed]
  • Davidson D. PhD dissertation. Michigan State University; 2001. Association with Focus in Denials.
  • Dahan D, Tanenhaus MK, Chambers CG. Accent and reference resolution in spoken-language comprehension. Journal of Memory and Language. 2002;47:292–314.
  • Dahan D, Magnuson JS, Tanenhaus MK. Time course frequency effects in spoken-word recognition: evidence from eye-movements. Cognitive Psychology. 2001;42:317–367. [PubMed]
  • Dahan D, Magnuson JS, Tanenhaus MK, Hogan EM. Subcategorical mismatches and the time course of lexical access: evidence for lexical competition. Language and Cognitive Processes. 2001;16:507–534.
  • D’Imperio M. PhD dissertation. Ohio State University; 2000. The role of perception in defining tonal targets and their alignment.
  • Face T. Intonational marking of contrastive focus in Madrid Spanish. PhD Dissertation; Ohio State University: 2001.
  • Fery C. German Intonational Patterns. Niemeyer: 1993.
  • Fischer B. Saccadic reaction time: implications for reading, dyslexia and visual cognition. In: Rayner K, editor. Eye movements and visual cognition: scene perception and reading. New York: Springer-Verlag; 1992. pp. 31–45.
  • Goldsmith J. PhD dissertation. MIT; Cambridge MA: 1976. Autosegmental phonology.
  • Goldsmith J. Autosegmental and Metrical Phonology. Blackwell; Oxford: 1990.
  • Grice M, Baumann S, Benzmüller R. German Intonation in autosegmental-metrical phonology. In: Jun S-A, editor. Prosodic Typology and Transcription: A Unified Approach. Oxford: Oxford University Press; 2005. pp. 55–83.
  • Grice M, D’Imperio M, Savino M, Avesani C. Strategies for intonation labeling varieties of Italian. In: Jun S-A, editor. Prosodic Typology and Transcription: A Unified Approach. Oxford: Oxford University Press; 2005. pp. 362–389.
  • Halliday MAK. Notes on transitivity and theme in English, part 2. Journal of Linguistics. 1967;3:199–244.
  • Henderson JM, Ferreira F. Scene perception for psycholinguists. In: Henderson JM, Ferreira F, editors. The interface of language, vision, and action: Eye movements and the visual world. New York: Psychology Press; 2004.
  • Hirschberg J. Pitch accent in context: Predicting intonational prominence from text. Artificial Intelligence. 1993;63:305–340.
  • Ito K. The interaction of focus and lexical pitch accent in speech production and dialogue comprehension: Evidence from Japanese and Basque. PhD dissertation; University of Illinois at Urbana-Champaign: 2002.
  • Ito K, Speer SR. Using interactive tasks to elicit natural dialogue. In: Augurzky P, Lenertova D, editors. Methods in Empirical Prosody Research. Mouton de Gruyter: 2006. pp. 229–257.
  • Ito K, Speer SR, Beckman M. The influence of given-new status and lexical accent on intonation in Japanese spontaneous speech. Presentation to the Annual CUNY Conference on Sentence Processing; Boston, MA. 2003.
  • Krahmer E, Swerts M. On the alleged existence of contrastive accents. Speech Communication. 2001;34:391–405.
  • Kohler K. Timing and communicative functions of pitch contours. Phonetica. 2005;62:88–105. [PubMed]
  • Ladd DR. Intonational phonology. Vol. 79. Cambridge: Cambridge University Press; 1996.
  • Ladd DR, Morton R. The perception of intonational emphasis: continuous or categorical? Journal of Phonetics. 1997;25:313–342.
  • Ladd DR, Schepman A. “Sagging transitions” between high pitch accents in English: experimental evidence. Journal of Phonetics. 2003;31:81–112.
  • Matin E, Shao KC, Boff KR. Saccadic overhead: Information processing time with and without saccades. Perception & Psychophysics. 1993;53:372–380. [PubMed]
  • Nakatani CH. Working Papers Department of Linguistics and Phonetics. Vol. 41. Lund, Sweden: 1993. Accenting on pronouns and proper names in spontaneous narrative; pp. 164–167.
  • Nakatani CH. The computational processing of intonational prominence: A functional prosody perspective. Ph.D. dissertation; Harvard University: 1997.
  • Needham W. Semantic structure, information structure, and intonation in discourse production. Journal of Memory and Language. 1990;29:455–468.
  • Nooteboom SG, Kruyt JG. Accents, focus distribution, and the perceived distribution of given and new information: An experiment. Journal of the Acoustical Society of America. 1987;82:1512–1524. [PubMed]
  • Pierrehumbert JB. The phonology and phonetics of English intonation. PhD dissertation; Massachusetts Institute of Technology: 1980.
  • Pierrehumbert J, Beckman M. Japanese tone structure. Cambridge: MIT Press; 1988.
  • Pierrehumbert J, Hirschberg J. The meaning of intonational contours in the interpretation of discourse. In: Cohen P, Morgan J, Pollack M, editors. Intentions in communication. Cambridge: MIT Press; 1990. pp. 342–365.
  • Prieto P, van Santen J, Hirschberg J. Tonal alignment patterns in Spanish. Journal of Phonetics. 1995;23:429–451.
  • Raaijmakers JGW, Schrijnemakers JMC, Gremmen F. How to deal with “The language-as-fixed-effect-fallacy”: Common misconceptions and alternative solutions. Journal of Memory and Language. 1999;41:416–426.
  • Saslow MG. Latency for saccadic eye movement. Journal of the Optical Society of America. 1967;57:1030–1033. [PubMed]
  • Schafer AJ, Speer SR, Warren P. Prosodic influences on the production and comprehension of syntactic ambiguity in a game-based conversation task. In: Tanenhaus MK, Trueswell JC, editors. World Situated Language Use: Psycholinguistic, Linguistic and Computational Perspectives on Bridging the Product and Action Tradition. Cambridge: MIT Press; 2003. pp. 209–225.
  • Sedivy J, Tanenhaus M, Spivey-Knowlton M, Eberhard K, Carlson G. Using intonationally-marked presuppositional information in on-line language processing: Evidence from eye movements to a visual model. Proceedings of the 17th Annual Conference of the Cognitive Science Society; Hillsdale, NJ. 1995.
  • Sedivy J, Tanenhaus M, Chambers C, Carlson G. Achieving incremental semantic interpretation through contextual representation. Cognition. 1999;71:109–147. [PubMed]
  • Spivey MJ, Tanenhaus MK, Eberhard KM, Sedivy JC. Effects of visual context in the resolution of temporary syntactic ambiguities in spoken language comprehension. Cognitive Psychology. 2002;45:447–481. [PubMed]
  • Tanenhaus MK, Spivey-Knowlton M, Eberhard KM, Sedivy JC. Integration of visual and linguistic information in spoken language comprehension. Science. 1995;268:1632–1634. [PubMed]
  • Tanenhaus MK, Spivey-Knowlton M, Eberhard KM, Sedivy JC. Using eye-movements to study spoken language comprehension: Evidence for visually-mediated incremental interpretation. In: Inui T, McClelland J, editors. Attention & Performance XVI: Integration in Perception and Communication. Cambridge: MIT Press; 1996. pp. 457–478.
  • Tanenhaus MK, Trueswell JC. Sentence Comprehension. In: Eimas, Miller, editors. Handbook in Perception and Cognition, Volume 11: Speech Language and Communication. N.Y.: Academic Press; 1995. pp. 217–262.
  • Trueswell JC, Tanenhaus MK. Approaches to Studying World-Situated Language Use: Bridging the Language-as-Product and Language-as-Action Traditions. Cambridge, MA: MIT Press; 2005.
  • Terken JMB. The distribution of pitch accents in instructions as a function of discourse structure. Language and Speech. 1984;27:269–289.
  • Terken JMB, Hirschberg J. Deaccentuation of words representing “given” information: Effects of persistence of grammatical function and surface position. Language and Speech. 1994;37:125–145.
  • Terken JMB, Nooteboom SG. Opposite effects of accentuation and deaccentuation on verification latencies for given and new information. Language and Cognitive Processes. 1987;2:145–163.
  • Venditti JJ. Japanese ToBI Labelling Guidelines. Ohio State University Working Papers in Linguistics. 1997;50:62–72.
  • Venditti JJ. The J-ToBI model of Japanese intonation. In: Jun S-A, editor. Prosodic Typology and Transcription: A Unified Approach. Oxford: Oxford University Press; 2005. pp. 172–200. Online at
  • Viviani P. Eye movements in visual search: cognitive, perceptual, and motor control aspects. In: Kowler E, editor. Eye movements and their role in visual and cognitive processes. Amsterdam: Elsevier; 1990. pp. 353–393. [PubMed]