Native Chinese readers’ eye movements were monitored as they read text that did or did not demark word boundary information. In Experiment 1, sentences had 4 types of spacing: normal unspaced text, text with spaces between words, text with spaces between characters that yielded nonwords, and finally text with spaces between every character. The authors investigated whether the introduction of spaces into unspaced Chinese text facilitates reading and whether the word or, alternatively, the character is a unit of information that is of primary importance in Chinese reading. Global and local measures indicated that sentences with unfamiliar word spaced format were as easy to read as visually familiar unspaced text. Nonword spacing and a space between every character produced longer reading times. In Experiment 2, highlighting was used to create analogous conditions: normal Chinese text, highlighting that marked words, highlighting that yielded nonwords, and highlighting that marked each character. The data from both experiments clearly indicated that words, and not individual characters, are the unit of primary importance in Chinese reading.
It is rather uncontroversial that in alphabetic writing systems, like English, the spaces between the words facilitate reading. When space information is eliminated, reading speed typically decreases by up to 50% (see Malt & Seamon, 1978; Morris, Rayner, & Pollatsek, 1990; Pollatsek & Rayner, 1982; Rayner, Fischer, & Pollatsek, 1998; Rayner & Pollatsek, 1996; Spragins, Lefton, & Fisher, 1976). Furthermore, Rayner et al. (1998) demonstrated that spaces influence word recognition and also aid saccade programming. They found that when the spaces between words were eliminated, readers (a) fixated proportionally longer on low-frequency words than on high-frequency words (indicating that word identification was more difficult when spaces were removed) and (b) that readers fixated much earlier in the word (as their average saccade lengths were much shorter when the spaces were removed).
Given the central role that word spacing information plays in written English comprehension, it is intriguing that a number of languages do not include spaces between words in their written form. This in turn raises questions concerning how readers target saccades and how words are recognized in writing systems, like Chinese, that do not include spaces between words. Chinese text is formed by strings of equally spaced symbols called characters; Chinese characters are more like morphemes and most words are made up of two characters, though some words consist of only one character and some consist of three or more characters. Historically, Chinese was printed from top to bottom (with the columns printed from right to left). However, like English, it is now almost always printed horizontally from left to right. Unlike English (and other alphabetic writing systems), Chinese is written without spaces between successive characters and words. Furthermore, individual characters vary in terms of complexity because they differ in (a) the number of strokes per character, (b) the number of radicals (or certain combinations of strokes that denote semantic or phonological information), and (c) the manner of construction (i.e., radicals can be combined in different ways to form compound words). Basically, there are many visual details packed into a constant, box-shaped area for each character.
Another intriguing characteristic of Chinese is that there can be ambiguity concerning those characters that compose a particular word. Thus, Chinese readers who are linguistic experts, as well as Chinese lay readers, sometimes experience some difficulty in agreeing on which Chinese characters compose certain words. This view might lead one to question whether the concrete notion of the word as a meaningful linguistic unit of information in spaced languages like English has a similar status in an unspaced language like Chinese (see Feng, in press). At the very least, these characteristics of written Chinese language raise an interesting theoretical question concerning whether the word unit plays as central a role in eye movement control during reading for Chinese readers as for English readers.
In general terms, the experiments reported here were designed to investigate whether the word is as important a unit of information during Chinese reading as it is during English reading. To do this we explored the influence of spacing information on eye movement behavior during Chinese reading. Specifically, we investigated two questions: First, whether the introduction of spaces into unspaced Chinese text might facilitate reading; second, whether words or characters are the primary unit of information in Chinese reading.
The experiments reported here are not the first to manipulate spacing between characters in languages that are typically written without spaces. Interestingly, Kohsom and Gobet (1997) found that inserting spaces between words in Thai, another language that does not contain spaces between words (although it is alphabetic and therefore more spatially extended across a line), actually facilitated reading. Eye movements were not monitored in this study; reading time was the primary measure. Even though the introduction of spaces produced text with a visually unfamiliar format, passage reading times were shorter for spaced than for normal unspaced text.
Another language that does not include spaces between words is Japanese. Kajii, Nazir, and Osaka (2001) conducted an experiment in which they recorded participants’ eye movements as they read Japanese. Japanese text is written by mixing three different writing systems (Kanji, Hiragana, and Katakana). Kajii et al. examined how readers process text without spaces between the characters, as well as whether the different types of character (Kanji, Hiragana, or Katakana) were more or less likely to attract fixations during reading. Kanji characters, which are ideographic and usually have more than one pronunciation, are morphemes representing meaning units. Japanese also has two sets of 46 syllable-based characters, Hiragana and Katakana. Hiragana is used to mark grammatical structures, while Katakana is used mainly to write foreign names and loan words. Kajii et al. found that Kanji characters were more likely to attract fixations than Hiragana and Katakana characters. Perhaps more interestingly for our purposes, while Japanese readers processed unspaced text relatively easily, their saccadic targeting strategies appeared to be quite different from those used by English readers processing unspaced text (see Rayner et al., 1998). More recently, Sainio, Hyönä, Bingushi, and Bertram (2007) recorded the eye movements of Japanese readers reading pure Hiragana (syllabic script) and mixed Kanji–Hiragana (ideographic and syllabic script) with either normal unspaced text or with spaces inserted between words. They found that spacing facilitated both word identification and eye guidance when the script was syllabic but not when the script contained ideographic characters. They concluded that interword spaces with Hiragana serve as an effective segmentation cue but that spaces in mixed Kanji–Hiragana text are redundant since the visually salient Kanji characters serve as effective segmentation cues in and of themselves.
Finally, there are a number of studies that have examined how inserting spaces between words in Chinese influences reading (see Hsu & Huang, 2000a, 2000b; Inhoff, Liu, Wang, & Fu, 1997). Whereas Hsu and Huang (2000a, 2000b) did not record eye movements (again relying on a more gross reading time measure), in Inhoff et al.’s (1997) study readers’ eye movements were recorded as they read Chinese sentences under three presentation conditions: normal unspaced text, word spaced text in which a space appeared between each Chinese word, and nonword spaced text in which spaces were positioned between characters such that the resulting groups of Chinese characters formed nonwords. Unfortunately, there were no reliable differences in total reading times, mean fixation durations, and mean saccade lengths for any of the presentation conditions. The null effects are somewhat surprising, and there may have been a number of methodological reasons why Inhoff et al. failed to obtain any reliable differences. Specifically, the eye-tracking system used did not have a high level of spatial resolution, and the sampling rate of the equipment (50 Hz) was relatively coarse. But, more critically, the spacing manipulations themselves were relatively weak in that the spaces that were inserted between the characters were actually quite small and therefore potentially ineffectual.
Generally, the studies that have examined interword spacing in Chinese have observed no facilitation from inserting spaces between words but also no interference. In many ways, it would be quite surprising if there were facilitative effects of spacing (Kohsom and Gobet’s, 1997, study with Thai notwithstanding). That is, it seems quite unlikely that a lifetime of reading experience without spaces could be quickly overcome via the insertion of spaces between words. Nevertheless, we consider the idea of examining the influence of space information during reading in Chinese as an important one, and given the possibility that the prior studies may have failed to obtain effects due to methodological limitations, it seemed reasonable to undertake a further study using eye-tracking equipment with high spatial and temporal accuracy, along with more robust spacing manipulations.
In Experiment 1, we presented native Chinese readers with Chinese sentences in four different spacing conditions (see Figure 1). In the control condition the text was presented in normal unspaced format with each Chinese character immediately adjacent to its neighbors. In the single character spaced condition, we inserted a space between every character. In the word spaced condition, we inserted a space between groups of characters that formed a word. To confirm that the Chinese readers would agree on the word boundaries, we required 12 Chinese readers who did not participate in the main experiment to indicate the word boundaries within the sentences. This reliability prescreen produced 95% agreement among participants, and word spacing was manipulated accordingly. Finally, in the nonword spaced condition, spaces were inserted between characters such that the resulting groups of characters formed nonwords.1 If the introduction of spaces between words facilitates reading, then reading times for sentences under word spaced conditions should be shorter than under the normal unspaced, single spaced, and nonword spaced conditions. Of particular interest was whether the global measure of sentence reading times under word spacing conditions would be shorter than reading times under normal, unspaced conditions. We anticipated that this comparison would be informative with respect to the first of the theoretical questions that we set out to address: whether the introduction of spaces into Chinese text might facilitate reading (as with the Kohsom and Gobet, 1997, study with Thai).
We also anticipated that differences in sentence reading times might allow us to address our second important theoretical question: whether words or characters are the primary unit of information in Chinese reading. We predicted that if characters are the primary unit of information in Chinese reading, then sentence reading times would be shorter under the single character spacing condition than under the word spacing condition. Conversely, if words are the primary unit of information, then the opposite pattern should be obtained. Additionally, sentence reading times for these conditions in relation to normal unspaced text would be informative of the degree to which reading was facilitated or hindered relative to reading under normal conditions.
In addition to the sentence reading times under each condition, we anticipated that there might be differences in the precise mechanics of oculomotor control under the different spacing conditions. Given that the text is more or less spatially extended, as well as differentially spatially grouped under each of the spacing conditions, we assumed that we might observe differences in a number of other global measures such as average fixation durations, numbers of fixations and saccade sizes, as well as local measures computed for individual target words or characters that we identified in the sentences under different spacing conditions in the experiment.
Sixteen undergraduate students at Tianjin Normal University participated in the experiment.2 They were all native speakers of Chinese who were skilled readers with normal or corrected-to-normal vision. They were all naive regarding the purpose of the experiment.
Sixty Chinese sentences were constructed. The sentences were all between 19 and 23 characters in length (M = 20.83 characters). The experimental sentences were rated on a 9-point scale for their naturalness by 30 participants who did not take part in the eye-tracking study. The mean naturalness score was 2.04 (where a score of 1 was very natural). We included four spacing conditions in the experiment: normal spacing, single spacing, word spacing, and nonword spacing (see Figure 1).
Four files were constructed, with each file containing 60 sentences. There were 15 sentences in each condition, and conditions were rotated across files according to a Latin square. Sentences in each condition were presented in a blocked format, and the order of the sentences in each block was random. Sixteen practice sentences, four for each spacing condition, were included at the beginning of each experimental file. In addition, there were 20 filler sentences (five in each condition) that appeared randomly throughout the block. After each of these filler sentences, a yes/no comprehension question was presented.3 In total each participant read 96 sentences.
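The rotation of conditions across the four files can be sketched as follows. This is a minimal illustration only: it assumes the 60 sentences are divided by index into four sets of 15 and that each file shifts the condition assignment by one step. The function name and the grouping rule are hypothetical and are not taken from the study's materials.

```python
CONDITIONS = ["normal", "single", "word", "nonword"]

def latin_square_files(n_sentences=60, conditions=CONDITIONS):
    """Build one sentence-to-condition mapping per file using a cyclic Latin square.

    Assumption: sentences are grouped into len(conditions) equal sets by index.
    """
    k = len(conditions)
    set_size = n_sentences // k
    files = []
    for file_index in range(k):
        assignment = {}
        for sentence in range(n_sentences):
            sentence_set = sentence // set_size
            # Shift the condition by one position for each successive file.
            assignment[sentence] = conditions[(sentence_set + file_index) % k]
        files.append(assignment)
    return files

files = latin_square_files()
```

Under this scheme each file contains 15 sentences per condition, and each sentence appears once in every condition across the four files.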
Participants’ eye movements were recorded with an SR Research (Osgoode, Ontario, Canada) EyeLink II eye tracker (sampling rate = 500 Hz) that monitored the position of the right eye every 2 ms. This system is accurate to 0.5° visual angle. The stimuli were presented on a 19-in. (48.3-cm) DELL monitor with a 1,024 × 768 pixel resolution. The distance between the participant and the screen was 75 cm. Stimuli were presented in Song font, and the size of each Chinese character was 21 × 21 pixels (with a space of 1 pixel between characters in the unspaced condition). One Chinese character subtended 0.63° visual angle.
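The reported visual angle can be checked approximately from the display geometry. The sketch below assumes a 4:3 screen whose visible diagonal is the full 19 in. with square pixels; neither assumption is stated in the text, so the result is indicative only. It also assumes that the angle refers to one character plus its 1-pixel gap (22 pixels).

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Assumption: 4:3 aspect ratio, so width = 0.8 * diagonal.
SCREEN_WIDTH_CM = 19 * 0.8 * 2.54
PIXELS_ACROSS = 1024
VIEWING_DISTANCE_CM = 75

cm_per_pixel = SCREEN_WIDTH_CM / PIXELS_ACROSS
char_pitch_px = 21 + 1  # assumption: character width plus the 1-pixel gap
angle = visual_angle_deg(char_pitch_px * cm_per_pixel, VIEWING_DISTANCE_CM)
print(round(angle, 2))
```

Under these assumptions the computed value is close to the 0.63° given above.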
Each participant was tested individually. Participants were informed that they would read sentences in which the characters would be presented under different spacing conditions. They were told that they were required to read the sentences and understand them to the best of their ability. When they completed reading a sentence, they pushed a button box to terminate the display. They were instructed that occasionally a comprehension question would appear after a sentence and that they should try hard to answer the question correctly. Participants gave answers to the comprehension questions orally, and their answers were noted by the experimenter. Although the EyeLink tracker compensates for head movements, a chin rest was used to ensure that the head was maintained in a still position. Prior to the start of the experiment, a calibration procedure was completed, and the computer software calculated the position of the point of fixation on the basis of the calibration. After a successful calibration, the sentences were presented in turn. Calibration was checked after each trial, and participants were recalibrated whenever necessary. In total the experiment took approximately 20 min.
The overall comprehension rate was 92%, indicating that participants read and fully understood the sentences. Three of the participants accidentally triggered the button box, prematurely terminating the display for four of the sentences, and therefore no data were obtained for these trials. We also excluded trials on which tracker loss occurred, as well as any first fixation durations that were less than 80 ms or greater than 1,200 ms. All the eye movement measures above or below three standard deviations from the mean were also excluded. In total 5.1% of the data was removed prior to conducting the analyses.
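The exclusion criteria can be illustrated with a short sketch. The text does not say in what order the criteria were applied or over which grouping (e.g., per participant or per condition) the mean and standard deviation were computed. The version below assumes the absolute cutoffs come first and that the three-standard-deviation rule is applied to whatever list is passed in. The function name is hypothetical.

```python
from statistics import mean, stdev

def trim_durations(durations_ms, low=80, high=1200, n_sd=3):
    """Drop values outside [low, high], then values beyond n_sd SDs of the mean.

    Assumption: cutoffs are applied before the SD-based trimming.
    """
    kept = [d for d in durations_ms if low <= d <= high]
    if len(kept) < 2:
        return kept  # SD is undefined for fewer than two values
    m, sd = mean(kept), stdev(kept)
    return [d for d in kept if abs(d - m) <= n_sd * sd]

sample = [60, 200, 210, 220, 230, 240, 250, 1500]
trimmed = trim_durations(sample)
```

Here the 60 ms and 1,500 ms values are removed by the absolute cutoffs, and the remaining values all fall within three standard deviations of their mean.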
Below we provide two sets of analyses. In the global analyses we conducted analyses of different measures of eye movement behavior based on all the fixations made as each of the sentences was read under each of the experimental conditions. We computed the mean fixation duration, mean saccade length, number of forward saccades (i.e., saccades made in a left-to-right direction), number of regressive saccades (i.e., saccades made from right to left), total number of fixations, total sentence reading time (i.e., the sum of all the fixations and saccades made during sentence reading), and reading speed (see Table 1).
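As a rough illustration of how such global measures can be derived from a fixation record, consider the sketch below. It assumes each fixation is stored as a (start time, end time, horizontal position) tuple, that saccade length is the absolute horizontal distance between successive fixations (regressions included), and that total reading time runs from the start of the first fixation to the end of the last. These representational choices are assumptions, not details from the study.

```python
def global_measures(fixations):
    """fixations: list of (start_ms, end_ms, x) tuples in temporal order.

    x is the horizontal position in character units (assumed).
    """
    durations = [end - start for start, end, _ in fixations]
    jumps = [b[2] - a[2] for a, b in zip(fixations, fixations[1:])]
    return {
        "mean_fixation_duration": sum(durations) / len(durations),
        "mean_saccade_length": sum(abs(j) for j in jumps) / len(jumps) if jumps else 0.0,
        "forward_saccades": sum(1 for j in jumps if j > 0),     # left to right
        "regressive_saccades": sum(1 for j in jumps if j < 0),  # right to left
        "total_fixations": len(fixations),
        # Fixations plus intervening saccades, assuming no blinks or track loss.
        "total_reading_time": fixations[-1][1] - fixations[0][0],
    }

example = [(0, 200, 1.0), (230, 480, 3.0), (510, 700, 2.0), (730, 950, 5.0)]
measures = global_measures(example)
```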
In addition to the global analyses, we conducted four sets of local analyses based on only a proportion of the fixations that were made as the sentences were read. For these analyses we computed first fixation duration (the duration of the first fixation on a word), single fixation duration (the duration of fixations when only one fixation is made on a word), gaze duration (the sum of all fixations on a word before moving to another word), total fixation time (the sum of all fixations on a word, including regressions), number of first pass fixations, and total number of fixations. In order to carry out these analyses we selected smaller regions of the sentences that were of particular interest under particular spacing conditions. Below, we first report the global analyses followed by the local analyses.
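The local measures defined above can be made concrete with a small sketch. It assumes the fixation record has already been mapped onto region indices, and it treats the first pass as the first unbroken run of fixations on the region. For simplicity it does not check whether the region was skipped before it was first fixated. The data format and function name are hypothetical.

```python
def local_measures(fixations, region):
    """fixations: list of (region_index, duration_ms) in temporal order."""
    first_pass = []
    entered = False
    for idx, dur in fixations:
        if idx == region:
            entered = True
            first_pass.append(dur)
        elif entered:
            break  # the eyes left the region: first pass is over
    on_region = [dur for idx, dur in fixations if idx == region]
    if not on_region:
        return None  # region never fixated
    return {
        "first_fixation_duration": first_pass[0],
        # Defined only when exactly one first-pass fixation was made.
        "single_fixation_duration": first_pass[0] if len(first_pass) == 1 else None,
        "gaze_duration": sum(first_pass),
        "total_fixation_time": sum(on_region),
        "first_pass_fixations": len(first_pass),
        "total_fixations": len(on_region),
    }

record = [(0, 200), (1, 220), (1, 180), (2, 250), (1, 150), (3, 210)]
target = local_measures(record, region=1)
```

In this example the target region receives two first-pass fixations (220 ms and 180 ms) and one later refixation following a regression (150 ms).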
A repeated measures analysis of variance was carried out for the variable presentation condition with four levels (normal unspaced, single character spacing, word spacing, nonword spacing) using participants (F1) and sentences (F2) as random effects. The mean fixation duration, the mean saccade length, the number of forward saccades, the number of regressive saccades, the total number of fixations, the total sentence reading time, and the reading speed are given in Table 1.
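For readers unfamiliar with the F1/F2 convention, the following sketch computes a one-way repeated-measures F ratio from a table with one row per unit (participants for F1, sentences for F2) and one column per condition. It is a generic textbook calculation without any sphericity correction, and is not necessarily the exact procedure or software used for the analyses reported here.

```python
def rm_anova_f(rows):
    """One-way repeated-measures ANOVA.

    rows: one list per unit, holding that unit's mean in each condition.
    Returns (F, df_condition, df_error).
    """
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    cond_means = [sum(r[j] for r in rows) / n for j in range(k)]
    unit_means = [sum(r) / k for r in rows]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_unit = k * sum((m - grand) ** 2 for m in unit_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_cond - ss_unit  # condition-by-unit interaction
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error

# Toy data: 3 units by 3 conditions (not data from the experiment).
f_value, df1, df2 = rm_anova_f([[1, 2, 3], [2, 3, 5], [3, 5, 6]])
```

With four conditions, 16 participants give degrees of freedom of (3, 45) and 60 sentences give (3, 177), matching the values reported for F1 and F2.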
For mean fixation duration there was a significant effect of presentation condition, F1(3, 45) = 35.9, p < .001; F2(3, 177) = 38.3, p < .001. To establish which conditions differed from each other we conducted paired t tests. Mean fixation durations were longer under normal spacing conditions than under single character, word spacing, and nonword spacing conditions (all ps < .001). Also, mean fixation durations were longer under word and non-word spacing conditions than under single spacing conditions (all ps < .001). Finally, mean fixation durations did not differ between word and nonword spacing conditions (ps > .05). The results show very clearly that readers made longer fixations under normal unspaced conditions relative to all the other conditions. This result may initially appear surprising in that increased fixation times are usually taken to indicate increased processing difficulty (Liversedge & Findlay, 2000; Rayner, 1998). However, as seen in Table 1, there was a trade-off between fixation duration and number of fixations so that while fixations were longer in the normal unspaced condition readers also made fewer fixations. Thus, it is perhaps most helpful to consider the fixation duration data in relation to the total reading time data presented below.
For mean saccade lengths there was a highly reliable effect of presentation condition, F1(3, 45) = 148.6, p < .001; F2(3, 177) = 140.7, p < .001. Unsurprisingly, mean saccades were shortest under the normal spacing conditions, somewhat longer under nonword spacing conditions, longer again under word spacing conditions, and longest under the single character spacing condition (all differences reliable, ps < .001). The main point to note from these results is that saccade length varied in relation to how horizontally distributed the text was. The Chinese characters are most densely packed under unspaced conditions; are horizontally, spatially distributed to a greater degree under word and nonword spacing conditions; and are most distributed under the single character spacing condition. Notably, the reliably shorter saccades under nonword spaced conditions compared with word spaced conditions might reflect increased processing difficulty under the nonword compared with the word spaced conditions.
Next we considered the number of forward saccades and again found reliable effects of presentation condition, F1(3, 45) = 42.2, p < .001; F2(3, 177) = 45.4, p < .001. Readers made the fewest progressive saccades in the normal unspaced condition, slightly more under the word spaced condition, more again under the nonword spaced condition, and the most under the single spaced condition. The data for each condition were reliably different from the data in all the other conditions (all ps < .01). Presumably, readers made the most forward saccades for text with single character spacing because the text is most horizontally distributed in this condition. As with the saccade length data, the difference we observed between the data for the word and nonword spacing conditions may reflect increased processing difficulty associated with processing Chinese text when it is segmented as nonwords relative to when it is segmented as words. Finally, readers made the fewest forward saccades for normally presented text both because it is easiest to read and because it is the least horizontally distributed.
The effect of presentation condition was also reliable for the total number of fixations, F1(3, 45) = 26.4, p < .001; F2(3, 177) = 28.5, p < .001. Readers made fewest fixations when reading text presented normally, numerically more when reading word spaced text (ps < .05), and substantially more when either reading text presented as nonwords or text presented under single character spacing conditions (ps < .01). There was no difference in the total number of fixations readers made when reading text under nonword and single character spacing conditions (ps > .05). These results suggest that for the total number of fixations the data from the word spacing condition patterned similarly to the data from the normal spacing condition. In contrast, the data from the nonword spacing condition patterned similarly to those from the single character spacing condition. Assuming that the total number of fixations provides an index of the overall difficulty that the participants experienced as they read the sentences, these data are at least suggestive that readers found the text presented under word spacing conditions almost as easy to read as the text presented under normal unspaced conditions. The data also suggest that single character spacing and nonword spacing conditions were much more disruptive to processing than were word spaced and normal unspaced text.
We also considered the number of regressive saccades that participants made as they read the sentences. As before, we observed a highly reliable effect of presentation condition, F1(3, 45) = 8.1, p < .001; F2(3, 177) = 11.4, p < .001. Readers made fewest regressions for text presented normally and slightly more when text was presented under word spacing conditions (ps < .05). There was no reliable difference in the number of regressions made under word and single character spacing conditions (ps > .05); however, readers made reliably more regressions under nonword conditions than under word conditions (ps < .05). There was no reliable difference in the number of regressions readers made under nonword conditions compared to single character spacing conditions. The pattern of results for the regression data is similar to that obtained for the total number of fixations. Readers made fewest regressions when text was presented normally and made the most regressions when text was presented under nonword spacing conditions.
Finally, we considered the total reading times for the sentences under each of the spacing conditions. Total sentence reading times are an extremely important measure with respect to the first (and perhaps also the second) theoretical question that we set out to address. This measure provides us with an indication of how long, overall, it took readers to read the sentences under the different spacing conditions. Additionally, on the basis of the total reading times and the mean number of characters in a sentence together, we present reading rates (in terms of characters per minute) for each of the conditions. As with the measures reported earlier, total sentence reading times showed a reliable effect of presentation condition, F1(3, 45) = 5.7, p < .01; F2(3, 177) = 5.7, p < .01. The pattern of effects is extremely informative regarding the ease with which processing occurred. Total reading times were shortest and approximately the same for the text presented in the normal unspaced and word spaced conditions (ps > .05). By contrast, total reading times for text under the nonword spacing condition were reliably longer than for text under both the normal unspaced condition and the word spaced condition (ps < .01). However, total times for text presented under nonword and single character spacing conditions were not reliably different (ps > .05).
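The conversion from total sentence reading time to a reading rate in characters per minute is simple arithmetic, sketched below. The mean sentence length of 20.83 characters comes from the materials description; the 4,000 ms reading time is a hypothetical value chosen for illustration and is not taken from Table 1.

```python
def reading_rate_cpm(n_characters, reading_time_ms):
    """Characters per minute from a character count and a reading time in ms."""
    return n_characters / (reading_time_ms / 60000.0)

MEAN_CHARACTERS = 20.83   # mean sentence length reported for the materials
example_time_ms = 4000    # hypothetical reading time, for illustration only
rate = reading_rate_cpm(MEAN_CHARACTERS, example_time_ms)
```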
These data show that the presentation of text using nonword spacing caused disruption to processing such that the time to read the sentences was substantially increased relative to the normal unspaced and word spaced conditions. A second important aspect to note from these data is that text presented with word spacing was as easy to read as normal unspaced text. These data are directly relevant to the first theoretical question that we set out to address in this experiment, namely, whether the introduction of word space information into Chinese text would facilitate reading. While the introduction of spaces to demark word boundaries did not facilitate reading relative to that for text presented in the usual unspaced format, it also appears that word spacing information did not disrupt processing to any great degree. In contrast, single character and nonword spacing did induce disruption relative to normal unspaced text.
In addition to the global analyses, we conducted a series of local analyses in which we considered smaller regions of the sentences that comprised directly comparable characters that formed words or nonwords under different spacing conditions. These analyses are potentially important because they provide an opportunity for us to compare the different conditions when the difference in spatial arrangement of the characters is not as great as it is in the global analyses. For each Chinese sentence we identified between one and four regions that comprised two characters (except for the fourth set of local analyses in which single characters were compared). We ensured that our regions never occurred at the beginning or the end of the sentences (thereby avoiding any contamination from fixations associated with the onset or completion of reading). Each of the different comparisons that were undertaken is illustrated in Figure 2.
In the first set of local analyses we compared measures for regions that comprised two Chinese characters and a space under word and nonword spacing conditions. The characters always formed a word, but the space either occurred between the characters in the nonword spacing condition or it occurred after the two characters in the word spacing condition. The inclusion of the space in the region of analysis allowed us to compare regions of the sentence that were identical in terms of physical size and content as well as spatial layout.
First fixation durations showed a marginal effect of spacing, t1(15) = 1.83, p = .09; t2(59) = 1.73, p = .09, with shorter initial fixation durations for nonword spacing (231 ms) than for word spacing conditions (240 ms). In fact, we had anticipated that first fixation durations would actually be longer under nonword than word spacing conditions, and it is not immediately clear exactly why the effect went in the opposite direction. It is possible that participants may have curtailed their initial fixation on the character string more quickly when the string under fixation was a nonword than when it was a word. The single fixation and gaze duration measures, along with the number of first pass fixations, showed no influence of spacing (all ts < 1.5, all ps > .05). However, as anticipated, there was a reliable influence of spacing on the total fixation time, t1(15) = 3.78, p < .01, and t2(59) = 2.70, p < .01, and the total number of fixations, t1(15) = 3.61, p < .01, and t2(59) = 3.46, p < .01, with longer total fixation times and a greater total number of fixations under nonword spacing conditions (483 ms and 2.2 fixations, respectively) than under word spacing conditions (427 ms and 1.9 fixations, respectively). Consistent with the global measures, these data show that readers experienced greater disruption to processing under nonword spacing conditions than under word spacing conditions.
In the second set of local analyses we again compared regions comprising two characters and a space, where the two characters always formed a word. However, for these analyses the words were presented under single character and nonword spacing conditions. For first fixation and single fixation duration there were no reliable effects (all ts < 1.76, all ps > .05). However, gaze durations were longer, t1(15) = 4.51, p < .001, and t2(59) = 4.65, p < .001, and number of first pass fixations greater, t1(15) = 2.66, p < .05, and t2(59) = 2.59, p < .05, for nonword spacing conditions (312 ms and 1.35 fixations, respectively) than for single character spacing conditions (272 ms and 1.26 fixations, respectively). Similarly, total fixation times were longer, t1(15) = 4.34, p < .001, and t2(59) = 5.69, p < .001, and total number of fixations greater, t1(15) = 3.75, p < .01, and t2(59) = 4.48, p < .001, for nonword spacing conditions (483 ms and 2.15 fixations, respectively) than for single character spacing conditions (394 ms and 1.83 fixations, respectively).4 These results clearly indicate that the introduction of spacing into Chinese text was much more disruptive to processing when the spacing information produced groups of characters that formed nonwords than when it did not. Nonword spacing produced disruption to reading for Chinese text relative to single character spacing.
For the third set of local analyses we compared measures for words under normal unspaced and word spaced conditions. While there was no significant difference between first fixations on words in normal unspaced and word spaced conditions (ts < 1.77, ps > .05), all of the other measures showed reliable or very marginal effects. There was a difference for single fixation durations that was reliable by participants but not by items, t1(15) = 2.16, p = .05; t2(58) = 1.57, p > .05. Single fixation durations were numerically longer for normal unspaced text (258 ms) than for word spaced text (243 ms). A similar pattern occurred for gaze durations, t1(15) = 3.1, p < .05, and t2(59) = 2.92, p < .01, and number of first pass fixations, t1(15) = 1.9, p = .08, t2(59) = 2.05, p < .05, as well as for the total fixation times, t1(15) = 3.41, p < .01, t2(59) = 4.06, p < .01, and the total number of fixations, t1(15) = 2.71, p < .05; t2(59) = 3.15, p < .01. Gaze durations and total fixation times were longer under normal unspaced conditions (297 ms and 440 ms, respectively) than under word spaced conditions (272 ms and 379 ms, respectively). Similarly, there were more first pass fixations and total fixations for normal unspaced text (1.23 and 1.83, respectively) than for word spaced text (1.17 and 1.63, respectively).
Total fixation times were longer and participants made more fixations when the text was presented in the normal unspaced format that was most familiar to Chinese readers than in the comparatively unfamiliar word spaced format. On the assumption that readers make fewer, shorter fixations when reading is easier, the local analyses suggest that the introduction of word spacing information facilitated reading of Chinese text. Recall also that in the global analyses average fixation durations were longer for normal unspaced than for word spaced text. Clearly, the direction of the effects obtained for the global analyses matches that obtained for the local analyses. Note also, however, that the total sentence reading times for the normal unspaced and word spaced sentences were approximately the same, and, correspondingly, participants also made more fixations on average under word spaced than normal unspaced conditions (regardless of whether those fixations were made after a progressive or regressive saccade). Thus, both the global and local measures together, along with the total sentence reading times, provide a very clear picture of the different patterns of eye movements that occurred under the normal unspaced and word spaced conditions. Readers took about the same amount of time overall to process word spaced and normal unspaced sentences, but they made more but shorter fixations when reading word spaced text than when reading normal unspaced text. What is clear is that presenting Chinese text in a visually unfamiliar format did not cause disruption to processing. We will consider why this pattern may have arisen in more detail below in the Discussion.
In the fourth and final set of local analyses, we compared single Chinese characters under the single character spacing condition and the nonword spacing condition. This analysis allowed us to investigate whether the introduction of nonword spacing caused disruption to processing of single Chinese characters compared with the introduction of spaces per se (in the single character spacing condition). That is, we wished to determine whether there was a cost associated with segregating single Chinese characters such that they appeared as nonwords, relative to simply segregating all of the characters in the sentence. In this respect, our analyses showed convincingly that the introduction of nonword spacing increased reading times more than the introduction of spaces between all of the characters. Reading times were always numerically longer in the nonword spacing condition than in the single character spacing condition, though the effects were not consistently reliable by participants and by items (for first fixation duration, t1(15) = 2.07, p = .06, and t2(49) = 1.41, p > .05; for single fixation duration, t1(15) = 2.74, p < .05, and t2(46) = 1.53, p > .05; for gaze duration, t1(15) = 2.22, p < .05, and t2(49) = 1.54, p > .05; for total fixation time, t1(15) = 2.43, p < .05, and t2(49) = 2.74, p < .01). The means for the nonword and single character spacing conditions, respectively, were as follows: first fixation durations, 220 ms and 205 ms; single fixation durations, 224 ms and 203 ms; gaze durations, 227 ms and 210 ms; and total fixation times, 286 ms and 245 ms.
In evaluating the results for the normal unspaced and word spaced text, it is perhaps helpful to consider factors that may have exerted opposing influences on reading. The familiarity of the format in which the text was presented, as well as the extent to which word objects were demarcated by spaces, may both have affected the ease with which the text was processed. It seems reasonable to assume that a more familiar visual text format should produce shorter reading times than an unfamiliar format. Similarly, a second reasonable assumption may be that the clear demarcation of words by spaces will facilitate word identification and thereby produce shorter reading times relative to normal unspaced text for which word identification is more difficult. If these two assumptions are correct, then the influence of these two factors will be in opposition in the word spaced and the normal unspaced conditions in our experiment. The normal unspaced text will be extremely familiar, but word identification may be hindered due to poor word demarcation. In contrast, the word spaced text will be visually unfamiliar but word identification will be facilitated due to good word demarcation. It is perhaps not surprising, therefore, that total reading times for these conditions were approximately the same and that they were both somewhat shorter than for the single character spaced and the nonword spaced text.
While the results of Experiment 1 were quite straightforward, it is quite possible to argue that our findings are not as easily interpretable as we have suggested because of the natural confounding of condition by spatial layout in the experiment. That is, when spaces are inserted between characters, the resulting text will invariably be longer (more spatially distributed) than when no spaces are present. Exactly what this confounding may mean for the present results is not entirely clear. Therefore, we replicated the first experiment (using the same materials) but with a different manipulation in which we were able to create the same four conditions but the spatial distribution of the text was the same across the conditions. To do this, we used a highlighting manipulation (see Figure 3).
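The layout confound described above can be made concrete with a minimal sketch that renders one pre-segmented sentence under the four spacing conditions and counts the positions each version occupies. The function `render`, the three-word toy sentence, and its segmentation are illustrative assumptions, not the experimental materials, and counting characters is only a rough stand-in for horizontal extent on the display.

```python
def render(words, condition):
    """Render a pre-segmented sentence under one spacing condition."""
    chars = [ch for word in words for ch in word]
    if condition == "normal":
        return "".join(chars)
    if condition == "word":
        return " ".join(words)
    if condition == "character":
        return " ".join(chars)
    if condition == "nonword":
        # Crude assumption: offset the grouping by one character so that
        # every two-character group straddles a word boundary. This only
        # works when all words have two characters, as in the toy sentence.
        groups = [chars[0]] + ["".join(chars[i:i + 2])
                               for i in range(1, len(chars), 2)]
        return " ".join(groups)
    raise ValueError(f"unknown condition: {condition}")


# Hypothetical toy sentence, segmented by hand into two-character words.
toy_sentence = ["科学", "技术", "发展"]
for condition in ("normal", "word", "nonword", "character"):
    text = render(toy_sentence, condition)
    print(f"{condition:>9}: {text}  ({len(text)} positions)")
```

Every condition that inserts spaces yields a longer string than the normal unspaced version, and the single character condition yields the longest. A highlighting manipulation avoids this because it marks the same groupings without adding any positions.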
In Experiment 2, Chinese readers were presented with sentences in which word boundary information was marked by gray highlighting so that the spatial distribution of the sentence was the same across the different experimental conditions.5 If words, rather than characters, are the important components of Chinese reading, we should obtain results similar to what we obtained in Experiment 1.
Twenty-four undergraduate students at Tianjin Normal University participated in the experiment. They were all native speakers of Chinese who were skilled readers with normal or corrected-to-normal vision. As in Experiment 1, they were all naive regarding the purpose of the experiment.
The materials and design were identical to Experiment 1. The main difference between the experiments was that instead of a spacing manipulation, Experiment 2 used a gray highlighting manipulation (see Figure 3) that created four conditions: normal text, text with highlighting used to mark words, text with highlighting that yielded nonwords, and text with highlighting to mark each character.
The apparatus and procedure were both identical to those of Experiment 1.
The overall comprehension rate was 90%, again indicating that the participants read and fully understood the sentences. We again excluded trials on which tracker loss occurred and any first fixation durations that were less than 80 ms or greater than 1,200 ms as well as any eye movement measures above or below three standard deviations from the mean. In total 4.6% of the data was removed prior to conducting the analyses.
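The two exclusion steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `trim_fixations` is hypothetical, the absolute cutoffs are applied before the standard deviation criterion, and the mean and standard deviation are computed over a single pooled list of values. The text does not specify whether the three standard deviation criterion was applied per participant, per condition, or overall, so the pooling here is an assumption.

```python
from statistics import mean, stdev


def trim_fixations(durations_ms, low=80, high=1200, n_sd=3.0):
    """Drop out-of-range durations, then values beyond n_sd SDs of the mean."""
    # Step 1: absolute cutoffs (80 ms and 1,200 ms in the text above).
    kept = [d for d in durations_ms if low <= d <= high]
    if len(kept) < 2:
        return kept  # too few values to estimate a standard deviation
    # Step 2: relative cutoff, computed on the values that survived step 1
    # (assumption: one pooled mean and SD).
    m, s = mean(kept), stdev(kept)
    return [d for d in kept if abs(d - m) <= n_sd * s]


# Made-up durations in ms, not experimental data.
toy_durations = [50, 1500] + [250] * 19 + [1000]
trimmed = trim_fixations(toy_durations)
removed = len(toy_durations) - len(trimmed)
print(f"removed {removed} of {len(toy_durations)} values")
```

With very small samples a single extreme value can never lie three standard deviations from the mean, so the relative cutoff only has an effect once a reasonable number of observations has been pooled.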
As in Experiment 1, we also report local analyses. A repeated measures analysis of variance was carried out for the variable presentation condition, with four levels (normal, character, word, and nonword), using participants (F1) and sentences (F2) as random effects. The mean fixation duration, the mean saccade length, the number of forward saccades, the number of regressive saccades, the total number of fixations, the total sentence reading time, and the reading speed are given in Table 2.
As is apparent from Table 2, in Experiment 2 there was very little difference across presentation conditions in terms of mean fixation duration, mean saccade length, or mean number of regressive saccades (Fs generally < 1). However, there were significant differences in terms of number of forward saccades, F1(3, 69) = 4.5, p < .01, and F2(3, 177) = 3.2, p < .05, total number of fixations, F1(3, 69) = 9.5, p < .001, and F2(3, 177) = 4.1, p < .01, and total sentence reading times, F1(3, 69) = 10.3, p < .001, and F2(3, 177) = 3.8, p < .05. Most importantly for our purposes, paired t tests consistently revealed that the normal and word conditions did not differ from each other (all ps > .05). The text was as easy to read when it appeared in the word highlighted format as it was in the normal format. Furthermore, text in the normal and word highlighted conditions was easier to read than text in the single character and nonword conditions. Comparisons of the mean of the normal and word conditions with the data from the single character highlighting condition and the nonword highlighting condition showed effects that were reliable by participants (ps < .01) and reliable or very close to significance by items (ps ≤ .06). Finally, the latter two conditions consistently did not differ from each other (ps > .05).
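The degrees of freedom in these F tests follow directly from the design: with k = 4 conditions and n units, a one-way repeated measures analysis has (k - 1) and (k - 1)(n - 1) degrees of freedom, giving (3, 69) for 24 participants and (3, 177) for 60 sentences (the sentence count is an assumption here, implied by the items degrees of freedom). The sketch below computes such an F ratio from a units-by-conditions matrix of cell means. The function name `rm_anova` and the toy data are hypothetical; this shows the standard sums-of-squares partition, not the authors' analysis code.

```python
import random


def rm_anova(data):
    """One-way repeated measures ANOVA.

    data: list of rows, one per unit (participant or item), each row
    holding that unit's mean in every condition.
    Returns (F, df_condition, df_error).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    unit_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_unit = k * sum((m - grand) ** 2 for m in unit_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Error term: variability left after removing condition and unit effects.
    ss_error = ss_total - ss_cond - ss_unit
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return f_ratio, df_cond, df_error


# Made-up data (NOT the experimental data): four conditions, noisy cells.
rng = random.Random(1)
toy_effects = [0.0, 0.0, 1.0, 1.0]


def toy_matrix(n_units):
    return [[rng.gauss(10, 2) + e for e in toy_effects] for _ in range(n_units)]


f1, f1_df_a, f1_df_b = rm_anova(toy_matrix(24))  # by participants
f2, f2_df_a, f2_df_b = rm_anova(toy_matrix(60))  # by items
print(f"F1({f1_df_a}, {f1_df_b}) = {f1:.2f}")
print(f"F2({f2_df_a}, {f2_df_b}) = {f2:.2f}")
```

With only two conditions the F ratio from this partition equals the square of the corresponding paired t statistic, which is why the pairwise follow-up comparisons can be reported as t tests.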
We also conducted a series of local analyses analogous to those for Experiment 1 in which we directly compared smaller regions of the sentences comprising identical characters that formed words or nonwords under different highlighting conditions. The results of these analyses were strikingly similar to those obtained in Experiment 1.
In the first set of local analyses we compared measures for regions that comprised two Chinese characters under word and nonword highlighting conditions. There were no reliable effects for first fixation duration, single fixation duration, gaze duration, and number of first pass fixations (all ts < 2.05). However, similar to Experiment 1, readers had longer total fixation times, t1(15) = 2.67, p < .05, and t2(59) = 2.12, p < .05, for nonword highlighting (402 ms) than for word highlighting (375 ms) conditions. Also, the total number of fixations was greater under nonword highlighting (1.7) than word highlighting (1.6) conditions, t1(15) = 2.16, p < .05, and t2(59) = 2.03, p < .05. Consistent with the global measures from Experiment 2 and the data from Experiment 1, readers experienced greater disruption to processing under nonword highlighting conditions than under word highlighting conditions.
In the second set of local analyses we compared regions with two characters forming a word under single character and nonword highlighting conditions. Once more, for first fixation duration, single fixation duration, gaze duration, and number of first pass fixations, there were no reliable effects (all ts < .67, all ps > .05). Total fixation times were marginally longer, t1(15) = 2.37, p < .05, and t2(59) = 1.89, p = .07, and total number of fixations marginally greater, t1(15) = 1.89, p = .07, t2(59) = 1.80, p = .08, for nonword highlighting conditions (402 ms and 1.7, respectively) than for single character highlighting conditions (382 ms and 1.6, respectively). These results suggest that highlighting text as nonwords caused slightly more disruption to reading compared with single character text highlighting.
In the third set of local analyses we compared measures for word regions under normal conditions with no highlighting and word highlighting conditions. For these analyses the only measure that showed a reliable difference was the number of first pass fixations. Readers made more first pass fixations, t1(15) = 2.12, p < .05, and t2(58) = 2.12, p < .05, under word highlighted conditions (1.2) than under no highlighting conditions (1.1). This difference is likely to be due to the unfamiliarity of the highlighting condition relative to text presented normally without highlighting. No other measures showed reliable effects (all ts < 1.27).
In the fourth set of local analyses for Experiment 2 we compared single Chinese characters under single character and nonword highlighting conditions. For these analyses there were no reliable differences for any of the measures (all ts < 1.29) indicating that Chinese text presented under single character highlighting conditions was as difficult to read as Chinese text presented under nonword highlighting conditions. Again, these results mirror those obtained in Experiment 1.
As with Experiment 1, the results of Experiment 2 were very clear. Reading in the word condition did not differ from the normal condition, and both of these conditions yielded faster reading than either the single character condition or the nonword condition. Similarly, the local analyses showed that nonword and single character highlighting in Chinese text produced disruption to reading relative to text without highlighting. Furthermore, text presented with single character and nonword highlighting was equally disruptive across all measures. Finally, reading times for text presented under word highlighting conditions did not differ from those for text presented with no highlighting. As such, the results of the global and local analyses for both experiments are fully consistent and strongly point to the conclusion that words, rather than single characters, are the important unit in Chinese reading. We again note that the fact that the normal condition and the word condition did not differ from each other is quite remarkable in that Chinese readers presumably have not had much experience reading under the highlighting conditions of Experiment 2 (while they have had a lifetime of experience reading normal text), yet they still did as well in the word condition as in the normal condition.
The results of Experiment 2 also clearly demonstrate that the findings from Experiment 1 are not compromised by the spatial distribution differences between conditions. In Experiment 2, the spatial distribution of the text was identical across conditions. It should be noted that there were some differences across the two experiments; in particular, there was little in the way of difference across the conditions in fixation duration, saccade length, and number of regressions in Experiment 2, whereas there were differences in Experiment 1. Admittedly, the differences that did occur in Experiment 1 were probably related to the spatial layout of the text. For example, it is not at all surprising that inserting spaces between characters would increase saccade length in comparison to the normal condition (as it did in Experiment 1). But the main point remains: Chinese readers read text with word boundaries demarcated as easily as normal text and more easily than when individual characters were demarcated. Finally, it is interesting to note that the overall reading rate in Experiment 2 was faster than in Experiment 1. Given that this is a cross-experiment (and between-participants) comparison, it is difficult to know for certain what this reflects. However, it may be that the unusual spacing aspects of the first experiment resulted in readers adopting a more cautious reading strategy. The fact that the reading rates were slower in the normal unspaced condition in Experiment 1 than in Experiment 2 is at least consistent with this suggestion.
With respect to the first theoretical question we raised at the outset, the results of the present experiments indicated that inserting spaces between words (or highlighting word boundaries) did not facilitate reading Chinese, at least beyond the level observed for normal unspaced text. On the other hand, and perhaps more importantly, demarcating word boundaries either through the use of spaces or highlighting did not interfere with reading. As we noted in the Discussion section of Experiment 1, there are facilitatory and inhibitory factors that must be trading off against each other when words are clearly marked (in contrast to the normal text presentation of Chinese).
Let us now turn to the second theoretical question that we addressed, namely, whether words or characters are the primary unit of information in Chinese reading. The results of the present study provide rather clear confirmation that words must have psychological reality for Chinese readers and that they are more salient than characters. Specifically, we found that inserting spaces between characters (or highlighting characters) actually interfered with reading, whereas inserting spaces between words (or highlighting word boundaries) caused no disruption and such text was as easy to read as the more familiar unspaced text. We would have been surprised if a lifetime of reading experience with unspaced text could be overridden by an experimental manipulation where spaces were inserted between words or where word boundaries were highlighted. Nevertheless, the fact that inserting spaces between characters (or highlighting characters) caused a reading slowdown, while inserting spaces between words (or highlighting words) did not, clearly demonstrates the importance of words in reading Chinese.
In contrast to our conclusion that words are central, it has been suggested that characters are more important in reading Chinese than words. For example, on the basis of regression analyses of Chinese readers’ eye movements, Chen, Song, Lau, Wong, and Tang (2003) argued that characters are more salient than words. Such a conclusion is also at least consistent with the observation that Chinese readers do not always agree on where the word boundaries are in text. On the other hand, a growing body of research on the eye movements of Chinese readers has demonstrated word-based effects. In particular, Chinese readers, like English readers, fixate for less time on high-frequency words than on low-frequency words (Yan, Tian, Bai, & Rayner, 2006) and longer on low predictable words than on high predictable words6 (Rayner, Li, Juhasz, & Yan, 2005); like English readers, they also skip high predictable words more than low predictable words (Rayner et al., 2005) and high-frequency words more than low-frequency words (Yan et al., 2006). Yan et al. (2006) also found that character frequency affects fixation time on a word but only when word frequency is low.
Another interesting issue is whether the results reported in the present experiments reflect early or later processing in reading Chinese. Of course, this depends in part on what is meant by early and late, but we view our results as inconclusive on this issue. The results of the local analyses suggest that the segmentation of characters into words may not occur early in processing since the measures that typically reflect early processes in the eye movement record (first fixation duration, single fixation duration, and gaze duration) did not typically yield differences between conditions, whereas a later measure (total fixation time) did. Obviously, how (and when) Chinese readers segment characters into words (given the lack of space information that demarks word boundaries) is critical to understanding Chinese reading, and more research is needed on this important topic.
Finally, in the context of the present study and other studies suggesting the importance of words (as opposed to characters) in reading Chinese, it is interesting to note that Rayner, Li, and Pollatsek (in press) recently simulated the eye movement behavior of Chinese readers in the context of the E-Z Reader model (Pollatsek, Reichle, & Rayner, 2006; Reichle, Pollatsek, Fisher, & Rayner, 1998; Reichle, Pollatsek, & Rayner, 2006; Reichle, Rayner, & Pollatsek, 2003). In their modeling endeavor, Rayner et al. (in press) assumed that words were the unit of analysis for Chinese readers. And indeed, a simulation using character frequency as an additional predictor did not add to the overall fits of the data. In essence, the modeling work and the present research both point to the psychological reality of words for Chinese readers.
This research was the result of an exchange program supported by a China–United Kingdom Science Network Grant from the Royal Society that enabled Guoli Yan to visit the School of Psychology at the University of Southampton. It was also supported by Biotechnology and Biological Sciences Research Council (United Kingdom) Grant 12/S19168 and National Institutes of Health Grant HD26765. Portions of the data were presented at the European Conference on Eye Movements, Potsdam, Germany, August 2007. We thank Albrecht Inhoff for his helpful comments.
1Of course, the nonword condition creates a number of potential problems in that the word parsing system is disrupted, the local semantics are incorrect, and covert prosodic phrase boundaries are deviant. Nevertheless, we believe that including the condition provides a useful baseline against which to compare the other conditions.
2Eye movement data from 2 additional participants were excluded due to poor reading performance (i.e., failing to read the complete sentence).
3The filler sentences were not analyzed but were indistinguishable from the experimental sentences.
4This result is in contrast to the global analyses for which there was no difference in gaze durations and total reading times between the single character and nonword conditions. This difference is not particularly surprising. In order to select regions that were identical in content, we were forced to include characters that appeared in isolation under the single spacing condition but as part of nonwords under the nonword spacing condition. Thus, the differences in the local analyses reflect the fact that readers found it harder to identify a character that was grouped as part of a nonword than to identify a character that appeared in isolation.
5We are grateful to Albrecht Inhoff for suggesting this manipulation.
Xuejun Bai, Academy of Psychology and Behaviour, Tianjin Normal University, Tianjin, China.
Guoli Yan, Academy of Psychology and Behaviour, Tianjin Normal University, Tianjin, China.
Chuanli Zang, Academy of Psychology and Behaviour, Tianjin Normal University, Tianjin, China.
Simon P. Liversedge, School of Psychology, University of Southampton, Southampton, United Kingdom.
Keith Rayner, Department of Psychology, University of Massachusetts, Amherst and Department of Psychology, University of California, San Diego.