Many theoretical models of reading assume that different writing systems require different processing assumptions. For example, it is often claimed that print-to-sound mappings in Chinese are not represented or processed sub-lexically. We present a connectionist model that learns the print to sound mappings of Chinese characters using the same functional architecture and learning rules that have been applied to English. The model predicts an interaction between item frequency and print-to-sound consistency analogous to what has been found for English, as well as a language-specific regularity effect particular to Chinese. Behavioral naming experiments using the same test items as the model confirmed these predictions. Corpus properties and the analyses of internal representations that evolved over training revealed that the model was able to capitalize on information in “phonetic components” – sub-lexical structures of variable size that convey probabilistic information about pronunciation. The results suggest that adult reading performance across very different writing systems may be explained as the result of applying the same learning mechanisms to the particular input statistics of writing systems shaped by both culture and the exigencies of communicating spoken language in a visual medium.
Over the past three decades, computational models have become increasingly sophisticated in accounting for a broad range of phenomena and specifying the mechanisms underlying skilled reading and its acquisition (see reviews in, e.g., Rayner, Foorman, Perfetti, Pesetsky, & Seidenberg, 2001; Plaut, 2005). The vast majority of this work has been done in English, and has thus focused on issues arising from the particularities of its writing system (Share, 2008). This has led to the construction of models that implement relatively writing-system-specific assumptions, such as the inclusion of distinct processing mechanisms for “sub-lexical” and “lexical” translation from spelling to sound (Coltheart, Curtis, Atkins, & Haller, 1993; Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001). An alternative approach has been to assume that reading skill is acquired by way of domain-general learning mechanisms that operate over distributed representations of basic levels of information, such as orthography, phonology and semantics (Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1989). Here we present a general connectionist model of Chinese print-to-sound translation that implements the same functional architecture and learning rules as models that have been previously applied to English (Harm & Seidenberg, 1999, 2004; Treiman, Kessler, Zevin, Bick, & Davis, 2006; Zevin & Seidenberg, 2002, 2006). The model provides a computationally explicit theoretical account of the role of sub-lexical phonology and the emergence of functional units in Chinese reading.
The notion of "orthographic depth" (Bentin & Frost, 1987) provides a descriptive framework that captures important differences among writing systems. In “shallow” orthographies, each character or group of characters corresponds with a high degree of consistency to a single speech sound. An extreme example of this is the Hangul system in Korean, in which each syllable is transcribed as a set of one to four jamo, each of which in turn comprises a set of strokes that each indicate a specific phonetic feature, assuring that words with the same pronunciation are identical to one another in the script, and that words with similar pronunciations are written similarly (Lee, 2000). Although English spelling is famously complicated (Malone, 1925; Venezky, 1999), at its core is an alphabet of letters that correspond roughly to individual speech sounds (Venezky, 1970) which can be described in terms of sub-lexical mapping “rules”. In the dual-route framework, these rules are supplemented by a lexical route that contains the correct pronunciations for known words (Coltheart et al., 2001; Zorzi, Houghton, & Butterworth, 1998).
Chinese, in contrast, is an example of an extremely “deep” orthography (DeFrancis, 1989) in that the pronunciation of a character cannot be computed sound-by-sound from its constituent parts. Chinese orthography has a roughly hierarchical organization, with five types of strokes combined to form radicals, orthographic units that may carry probabilistic information about meaning and sound. Unlike letters, radicals do not contain componential information about pronunciation (Mattingly, 1987). There is no relationship between, e.g., the first radical in a character and the first phoneme in the spoken word it represents.
Further, radicals are often organized into larger units such as phonetic and semantic components. The majority (85%) of characters in Chinese are phonograms (Zhu, 1988), which consist of a semantic component that provides information about the meaning of the character, and a phonetic component that provides information about the character’s pronunciation (Li & Kang, 1993). Although the pronunciation of each character can be probabilistically determined by its phonetic component, it is sometimes entirely arbitrary, so that very similarly written words often have completely different pronunciations, and many homophones share no orthographic features.
The effect of orthographic depth on the organization of the reading system is often described in terms of a bias toward using one or another of the component processes hypothesized to be necessary for English (Frost, Katz, & Bentin, 1987). Visual word recognition in shallow orthographies is thought to involve relatively limited engagement of lexical processes, because sub-lexical translation from spelling to sound is so automatic and consistently accurate. For example Korean Hangul characters are thought to be read predominately via sub-lexical processing unless task demands create a strong bias toward lexical processing (Kang & Simpson, 2001, also see Raman, Baluch, & Besner, 2004 for Turkish; Havelka, & Rastle, 2005 for Serbian).
Many existing models of Chinese character reading reflect the converse assumption, i.e., that spelling-to-sound translation is accomplished entirely via a “lexical” mechanism (Perfetti, Liu, & Tan, 2005; Zhou & Marslen-Wilson, 1999a). The lexical mechanism for print to sound translation is most often described as relying on hierarchically organized levels of representation, with localist representations at each level, in the vein of the Interactive Activation model of English word recognition (McClelland & Rumelhart, 1981).
The architecture of the lexical mechanism in these models also reflects a number of assumptions about the organization of the reading system that are fairly specific to Chinese. For example, Taft and colleagues (Taft, 2006; Taft & Zhu, 1997; Taft, Zhu, & Peng, 1999) have proposed a multilevel interactive-activation model of word recognition in Chinese that assumes characters, radicals and strokes each have their own separate level of representation, and that these are organized serially. The assumption that radicals form a basic orthographic unit is well supported by a number of priming studies (Ding, Peng & Taft, 2004), but as we shall argue below, additional considerations suggest that radicals are not necessarily apt functional units for print-to-sound conversion.
A number of empirical findings appear to conflict with the notion that print-to-sound translation is purely lexical in Chinese. First, some properties of the writing system suggest that a sub-lexical mechanism is plausible after all. Phonetic components contain probabilistic sub-lexical information about how the whole characters are pronounced. There is a great deal of evidence that this sub-lexical information is involved in reading aloud. When a phonogram’s pronunciation matches its phonetic component exactly, it is called “regular” (Seidenberg, 1985; Shu & Zhang, 1987). Large effects of regularity — regular characters are named faster and more accurately — have been found in studies of both adults (Fang, Horng, & Tzeng, 1986; Hue, 1992; Lee, Tsai, Su, Tzeng, & Hung, 2005; Peng & Yang, 1997) and children (Shu, Anderson, & Wu, 2000; Shu & Wu, 2006; Xing, Shu, & Li, 2004; Yang & Peng, 1997). Although regularity effects are prima facie evidence for sub-lexical mappings from print-to-sound, they are also potentially consistent with models that assume purely lexical activation of phonology. It could be possible, for example, to construct a model such that when a character that has a lexical representation is presented — even as part of another character — it becomes activated, thus leading to a regularity effect.
Phonetic components also vary with respect to how reliably the words they appear in match their canonical pronunciation, so that even items that are “regular”, as described above, can vary in “consistency” in a manner directly analogous to English (Jared, McRae, & Seidenberg, 1990). Just as a completely regular pronunciation of a rime in English can be difficult to compute if there are many irregular neighbors (e.g., DOLL vs. ROLL, POLL, TOLL), a completely regular character can be difficult to pronounce if many characters containing the same phonetic component have irregular pronunciations (Fang et al., 1986; Hue, 1992). These effects argue more strongly for some form of sub-lexical print-to-sound translation, because they reflect distributional information related to shared sub-lexical units (phonetic components) among characters. Although it has been suggested that consistency effects in English might arise from activation of similar words in the lexical route, for example in the Dual-Route Cascade model (Coltheart et al., 2001), subsequent work with this model has shown that where this model correctly simulates consistency effects, it is due to sub-lexical processing (Seidenberg & Plaut, 2006). It is thus an open question whether a purely lexical model of print-to-sound translation for Chinese could correctly simulate consistency effects.
The role of phonetic components in print-to-sound translation also poses a problem for the notion of radicals as the basic unit of character reading, because phonetic components can comprise one or more individual radicals. Some single radicals can also function as phonetic components. For example, the radical “” is pronounced /paI/1 when it appears as a word, and is also the phonetic component of a number of words that share the same pronunciation (e.g., “”). The same radicals may participate in many different phonetic components, however. For example, the “” radical also occurs in characters that do not share its pronunciation at all (e.g. “”), which in turn serve as phonetic components of larger characters (). Complicating matters further, the same simple radical occurs in many contexts () without any relation to the whole character’s pronunciation. Thus, although radicals clearly function as a unit at the level of orthographic organization (Taft & Zhu, 1997; Zhang, Perfetti & Yang, 1999; Taft, 2006; Ding, Peng & Taft, 2004), they are not usually the most relevant functional unit for print-to-sound translation. This raises an important challenge for models of reading in Chinese: how can models explain the emergence of phonetic components in print-to-sound mappings when they are not readily identifiable as “basic units” from the surface features of the orthography?
Many theorists assume that different scripts necessitate different processing assumptions. For example, in discussing the possibility of extending their model of English reading to other languages, Coltheart et al. (2001) assert that "the Chinese, Japanese, and Korean writing systems are structurally so different from the English writing system that a model like the DRC model would simply not be applicable." (p. 236). An alternative approach is to assume that the basic processes underlying reading are essentially the same across writing systems, and that differences in how the reading system becomes organized are due to the influence of statistical learning mechanisms operating over distributed orthographic, phonological and semantic representations (Seidenberg, 1992; Seidenberg & McClelland, 1989). Thus, rather than assuming that Chinese and English require entirely different processing mechanisms, this approach seeks to explain processing differences between the two languages in terms of properties of the orthography that lead to different learning outcomes. The most important of these in the current case are the grain size (Ziegler & Goswami, 2005) and degree of arbitrariness (Seidenberg, 1992) of mappings across writing systems.
English can be described as having regularities at multiple grain sizes (Ziegler & Goswami, 2005). For example, each spoken consonant in the English word BREED is represented by a single letter, whereas the vowel is written as a two-letter combination. These written forms of single phonemes – or “graphemes” -- are all completely regular, in the sense that they are each assigned their most typical pronunciation in this word. In contrast, the word BREAD contains an atypical pronunciation for the grapheme EA. Words that contain such mappings are called “irregular” in some models (Coltheart et al., 2001). Note, however, that the spelling-to-sound mapping for BREAD is supported by many other examples, e.g., HEAD, THREAD, and TREAD. Statistical regularities at this level of detail give rise to consistency effects (Jared, McRae, & Seidenberg, 1990). Thus, subsyllabic mappings from spelling to sound are dominant in the English writing system, with multiple grain sizes contributing to the spelling-to-sound mapping, sometimes in conflicting ways.
Note that describing English in this manner reveals an important parallel between English and Chinese. In both systems, a basic orthographic unit (the letter in English, the radical in Chinese) can be recombined into larger functional units for print-to-sound translation. Indeed, regularities at multiple grain sizes are clearly used productively in generalization, as demonstrated by studies of nonword reading in adults (Treiman, Kessler, & Bick, 2002) and children (Treiman & Kessler, 2006)—effects that are accurately simulated in connectionist models that explicitly capitalize on multiple grain sizes simultaneously during learning (Treiman et al., 2006; Zevin & Seidenberg, 2006).
The key difference between English and Chinese, then, is that whereas functional print-to-sound units in English can exist at a number of different sizes, in Chinese, regularities exist almost entirely in the mapping of phonetic components to whole syllables (Leong, 1997; Mattingly, 1987). The difference in the grain size at which regularities exist drives a difference in the degree of arbitrariness between the two writing systems. English spelling, while highly inconsistent, is never entirely arbitrary. Even a very strange word such as YACHT has some predictability to it (the Y, A and T are assigned pronunciations common in other contexts). In Chinese this is not necessarily the case. The many pronunciations for characters containing the “” component, for example, are a highly heterogeneous set (see Figure 1A)2. Whereas in English, the spelling of an unfamiliar word provides partial cues to its pronunciation, in Chinese characters, the distribution of pronunciations for any given phonetic component may contain a number of highly dissimilar forms. Thus, the differences between the two writing systems can be described in terms of factors that have well-characterized effects on models that employ domain- and language-general statistical learning mechanisms.
The Lexical Constituency Model (LCM, Perfetti et al., 2005) is an implemented model based loosely on the interactive activation model of reading in English. The focus of research in the LCM framework is establishing a role for phonological processing in reading, despite what is described as a lack of reliable sub-lexical phonological information in the writing system. In the authors’ view this reflects a general principle about reading across languages, i.e. that the natural medium for language is speech, and therefore reading necessarily involves accessing phonological representations. However, the processes by which print to sound translation is accomplished are thought to be defined by the language. Unlike alphabetic writing systems, which the same authors describe as having “assembled phonology as a sub-lexical mechanism” (Perfetti et al., 2005, p. 43) in addition to a lexical look-up mechanism (Coltheart et al., 1993), Chinese is described as depending on only the lexical look-up mechanism. The input to this model comprises localist representations of radicals. These feed forward to a lexicon of localist representations of whole characters, which in turn are connected to their pronunciations in a phonological layer. Representations of character meaning are activated jointly by input from the phonological and lexical orthographic layers. Critically, there is no direct input from the radical layer to the phonological layer (although many radicals are also characters in their own right, and have redundant representations in the lexical layer which are connected to their pronunciation). In this model, then, access to pronunciation is strictly lexical, in the sense that sub-character components cannot directly activate their pronunciations. Thus, although the modeling framework is fairly general, the implementation embodies a number of writing-system-specific assumptions about how the reading system is organized for Chinese.
In addition to the LCM (Perfetti et al., 2005), there have been a number of connectionist models of Chinese reading. These have tended to focus on language-specific phenomena. For example, Hsiao and Shillcock (2005, 2006) focused on the interaction between regularity and the position of phonetic and semantic radicals, but did not explore frequency by consistency interactions. Regularity in this model was defined in terms of the canonical pronunciation of the phonetic radical. This model also includes some interesting details about the potential role of hemispheric asymmetry in reading aloud that make specific predictions about experiments in which stimuli are presented to either the left or right visual field. This model uses localist representations for phonetic and semantic components, and therefore can say little about the emergence of functional units at multiple grain sizes -- the functional unit of analysis for print-to-sound translation is presupposed by the selection of the training corpus, and pre-coded into the network’s architecture.
Another model, developed by Xing, Shu & Li (2002, 2004), focused on simulating how children acquire reading skill in Chinese based on Self-Organized Feature Mapping (SOFM) between orthographic and phonological similarity. The SOFM model captured the development of regularity, consistency and their interaction with frequency, correctly simulating some aspects of children’s acquisition of characters during the elementary school years. One limitation of this model was that the training corpus was quite small (about 300 words), and it is unclear that the particular formalism employed would generalize to other training sets, or scale up to a larger lexicon. A more serious limitation of the model was the way regularity effects were simulated: Each time a word was presented, the pronunciations of both the whole character and its phonetic component were input simultaneously to the model. Thus, the model shows that statistical learning mechanisms can integrate sub-lexical and lexical information about pronunciation, but it assumes that the phonetic component is an a priori functional unit in print-to-sound mapping, and that its most frequent pronunciation is explicitly related to the whole character’s pronunciation each time it is encountered, instead of learning the mapping from print to sound.
Finally, there is a preliminary report of a model that employs a similar architecture to models that have been applied to English (Chen & Peng, 1994). This model has orthographic and phonological representations that do not build in any assumptions about the role of phonetic components in computing pronunciations from print, and yet it correctly simulates the interaction between frequency and regularity as observed in a number of contemporary studies (Fang et al., 1986, Hue, 1992). Although these are encouraging preliminary results, the model did not address consistency effects, nor were any analyses conducted on the acquisition of reading skill over the course of development. Thus, existing models of reading in Chinese have either been implemented with Chinese-specific assumptions about the processes (Perfetti et al., 2005), and functional units (Hsiao & Shillcock, 2005, 2006; Xing, Shu & Li, 2002, 2004), or have addressed a relatively limited range of phenomena (Chen & Peng, 1994).
Here we adapt a computational model that has been successful in explaining a range of English reading phenomena to simulating skilled reading aloud in Chinese. In this framework, aspects that are common to the two reading systems, such as the frequency by consistency interaction, can be accounted for in terms of the learning mechanisms that underlie acquisition: Frequency effects are simply practice effects on particular items, whereas consistency effects reflect the influence of statistical patterns across many similar items, and emerge from a generic tendency to preserve similarity: i.e., most statistical learning mechanisms are predisposed to map similar inputs onto similar outputs. Thus, the most difficult items both in acquisition and skilled performance are those with statistically rare print to sound correspondences that are also encountered infrequently (Lee et al., 2005; Jared, 2002). At the same time, the adaptation of these mechanisms to different properties of the writing systems (grain size, arbitrariness) explain language-specific effects as emerging from statistical patterns shaped by historical and linguistic forces.
The model presented here has the same functional architecture and learning rules that have been used in a number of English studies (Harm & Seidenberg, 1999; Treiman et al., 2006; Zevin & Seidenberg, 2002, 2006), but with input and output representations modified to represent Chinese orthography and phonology. In Study 1, we demonstrate that the model correctly simulates both the effects of regularity and consistency, and their interactions with frequency. In Study 2, we explore the frequency by consistency interaction in detail, in light of the statistical properties of the lexicon and potential biases in the stimulus set designed for Study 1. Here again, the model correctly predicts human performance in a word naming task. Finally, we conducted a series of analyses of the training corpus and of the internal representations that support the model’s performance, including how these develop over training. These analyses provide new insights into how the phonetic component emerges as a functional unit in learning to read Chinese.
Every unit in the orthographic input layer was connected to the 200 hidden units, which in turn were connected to every unit in the phonological output layer (See Figure 2A). In addition, every unit in the output layer was connected back to the output layer (including auto-connections) both directly and via a set of 50 "cleanup" hidden units, forming an attractor structure (Hinton & Shallice, 1991; Plaut & Shallice, 1993). This architecture differs from models used in English (e.g., Treiman et al., 2006; Zevin & Seidenberg, 2002, 2006) only in the number of units used in each layer, and the fact that the orthographic and phonological representations are based on Chinese instead of English. Prior to training, all connections in the model were initialized to random weights between −0.1 and 0.1, with a mean of 0.0 and a Gaussian distribution of values over the network.
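The connectivity just described can be sketched as follows (a minimal illustration in NumPy; the variable names and the clipped-Gaussian initialization are our reading of the description, not the original simulation code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the text: 270 orthographic units, 200 hidden units,
# 92 phonological units, 50 cleanup units.
N_ORTH, N_HID, N_PHON, N_CLEANUP = 270, 200, 92, 50

def init_weights(shape, bound=0.1):
    """Zero-mean Gaussian weights clipped to [-0.1, 0.1] -- our reading of
    'a Gaussian distribution of values' bounded between -0.1 and 0.1."""
    return np.clip(rng.normal(0.0, bound / 3, size=shape), -bound, bound)

W_oh = init_weights((N_ORTH, N_HID))      # orthography -> hidden
W_hp = init_weights((N_HID, N_PHON))      # hidden -> phonology
W_pp = init_weights((N_PHON, N_PHON))     # phonology -> phonology (direct
                                          # recurrence, incl. auto-connections)
W_pc = init_weights((N_PHON, N_CLEANUP))  # phonology -> cleanup units
W_cp = init_weights((N_CLEANUP, N_PHON))  # cleanup -> phonology (attractor loop)
```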
The phonological representation was based on a phonetic description of Standard Mandarin (Huang & Liao, 2002) and comprised five slots, one for the onset (in Chinese, the onset unit, or “shengmu” contains only the initial consonant), three for the rime (the “yunmu”, which in this case includes any semivowels, the nuclear vowel and codas) and a fifth slot composed of four units to represent lexical tone (See Figure 2B). The representations were centered so that the nuclear vowel always occurred in the second slot of the rime.
Each phoneme was encoded as a set of abstract phonetic features using a distributed representation with graded similarity. This is equivalent to representations that have been used in a number of English models (Harm & Seidenberg, 2004; Treiman et al., 2006; Zevin & Seidenberg, 2006). The four slots used to represent phonemes comprised 3 groups: 1) 8 units encoded manner (e.g., stop, affricate, fricative, semivowel/liquid, vowel), 2) 6 units encoded place (e.g., dental, retroflex, palatal, velar), 3) 8 units encoded impressionistic vowel quality, with 1 unit to code retroflexing, 2 to represent backness, 3 for height and 2 for lip rounding. Thus, each of the four phoneme slots had 22 units, which, taken together with the 4 units for lexical tone, makes 92 output units. Note that more units were used in some cases than was mathematically sufficient to represent the number of possible values in a feature space. This was done to encode the relative similarity of different features, for example, stop is coded as “1 0 0”, affricate as “1 1 0” and fricative as “1 1 1” on the units that encode degree of closure in the “manner” group.
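The graded similarity scheme can be illustrated with a small sketch; the degree-of-closure vectors below come directly from the example in the text, while the distance function is just an illustrative way of reading off similarity:

```python
# Degree-of-closure units from the "manner" group: more similar manners
# share more active units, so similarity is graded rather than arbitrary.
MANNER_CLOSURE = {
    "stop":      (1, 0, 0),
    "affricate": (1, 1, 0),
    "fricative": (1, 1, 1),
}

def feature_distance(a, b):
    """Number of units on which two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

# An affricate sits one unit from a stop and one from a fricative,
# while stop and fricative differ on two units.
assert feature_distance(MANNER_CLOSURE["stop"], MANNER_CLOSURE["affricate"]) == 1
assert feature_distance(MANNER_CLOSURE["stop"], MANNER_CLOSURE["fricative"]) == 2

# Output layer size from the text: 4 phoneme slots x 22 units, plus 4 tone units.
assert 4 * 22 + 4 == 92
```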
The orthographic layer of the model was organized into a set of 9 slots, each of which was further organized into a variable number of groups containing a variable number of units that could take on values of either 0 or 1 – comprising 270 binary units in all (See Figure 2C). Each character was represented by a unique pattern of activity over these units, based on a linguistic description (Xing, Shu, & Li, 2004) of Chinese orthography, adapted into a distributed representation that included features for both the hierarchical structure of each character (overall shape, relative position of radicals) and the orthographic details of which radicals appear in the character, and a finer-grained description of the radicals in terms of strokes.
Two slots encoded the overall structure of each character. One slot, containing seven groups of between 2 and 4 units each, represented overall character shape, with each group encoding one shape type: left-right, top-bottom, left-middle-right, top-middle-bottom, cross, round and single. Altogether, 27 different sub-types of character structure were encoded using 18 units. The second slot comprised a single group of three units for the number of radicals in the character (one to seven).
Another seven slots were used to represent the radicals in the character, with each slot corresponding to a single radical according to the definition of 560 unique radicals in the Chinese Character Component Standard of GB13000.1 Character Set for Information Processing (1997). Radicals are roughly analogous to letters in an alphabetic script, except that they can be arranged in a variety of configurations within a character and do not necessarily encode phonology. The variety of positions in which radicals can appear poses a version of the "dispersion problem" -- i.e., that because the input representations are position-specific, what is learned about a letter in one position does not transfer to the same letter in a different position -- that has been noted in English (Plaut et al., 1996). In Chinese, the problem is more complex because of the greater number of possible spatial arrangements a character can take. For example, phonetic components typically appear on the right in a character with an overall left-right structure, and on the bottom in top-bottom characters. In the model, radicals were aligned in slots to reflect this similarity, as shown in Table 1. For left-right characters, the radicals from the left component were left-aligned into slots 1 through 3, and the radicals from the right component were right-aligned into slots 5 through 7, with the same overall scheme for top-bottom characters (with the top component left-aligned and the bottom component right-aligned). For left-middle-right and top-middle-bottom characters, the "middle" radicals were center-aligned in slot 4. When a phonetic component appeared alone in a character, it was represented in its most frequent position according to its family members. This encoding was adopted in order to represent the similarity of characters containing the same components.
It maintains the overlap among characters with very different shapes that share a phonetic component, which is critical for simulating both consistency and regularity effects.
All radical slots comprised at least 9 groups: three encoded overall structure, five encoded the "strokes" used to write the radical, and a final group of 6 units was used to disambiguate the 53 remaining radicals that could not be uniquely identified based on the first two groups. For six slots, a tenth group of 3 units encoded the relationship between the radical in that slot and adjacent radicals in terms of six different possible relations (e.g., left-right, surrounding, single). The groups for overall structure comprised 1) a group of 4 units to encode ten types of radical structure, e.g., left-right (), top-bottom (), surrounding (), 2) a group of 3 units to encode six possible stroke relations, e.g., crossing (,), separate (,), connecting (,), and 3) a group of 4 units to encode eleven possible modal positions. Five groups encoded strokes: 4 units encoded the number of strokes (1–10), and 3 units encoded five possible stroke types (horizontal, vertical, slanted, pointed and crooked) for the first, second, third and last stroke in the radical, respectively.
A set of 4468 items from the Modern Chinese Frequency Dictionary (1986) was used to train the model. During training, the probability of using any character on a given trial was proportional to the square root of its frequency (Plaut et al., 1996), with raw frequencies capped at 1,000. This ensured that low-frequency characters would be selected a reasonable number of times over the 3 million training trials.
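The frequency-compression scheme described above can be written out directly (a minimal sketch; the corpus entries below are hypothetical, not items from the dictionary):

```python
import random

CAP = 1000  # raw frequencies capped at 1,000

def sampling_weight(raw_freq, cap=CAP):
    """Selection weight proportional to the square root of the (capped)
    raw frequency, as described for the training regime."""
    return min(raw_freq, cap) ** 0.5

# Hypothetical corpus of three characters with raw frequencies.
corpus = {"char_hi": 10000, "char_mid": 100, "char_lo": 1}
weights = [sampling_weight(f) for f in corpus.values()]

# Under raw-frequency sampling the first character would be selected
# 10,000x as often as the last; with capping and square-root compression
# the ratio falls to sqrt(1000)/sqrt(1), roughly 32x, so low-frequency
# characters are still seen a reasonable number of times.
selected = random.choices(list(corpus), weights=weights, k=1)[0]
```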
Following Harm & Seidenberg (1999), we first pre-trained the phonological attractor model, and then trained the full reading model on the mapping from orthography to phonology. The continuous recurrent back-propagation algorithm (Pearlmutter, 1995) was used, with online learning, a learning rate of 0.005 and momentum of 0.9. On each trial, a character was selected and the orthographic units were clamped with the pattern corresponding to the written form of the character for 12 time ticks. Activation was propagated forward to the output layer from the first time tick; error at the phonological layer was computed over time ticks 5–12, and a gradient based on this error was back-propagated to update the connection weights.
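The error-accumulation window can be sketched like this (an illustration of the tick scheme only, not of the full continuous recurrent back-propagation algorithm):

```python
TICKS = 12
ERROR_TICKS = range(4, TICKS)  # ticks 5-12 in the paper's 1-indexed terms

def trial_error(outputs_per_tick, target):
    """Sum squared error accumulated only over the final ticks, letting
    the attractor network settle before its output is penalized."""
    return sum(
        sum((o - t) ** 2 for o, t in zip(outputs_per_tick[tick], target))
        for tick in ERROR_TICKS
    )

# Toy example: a single output unit stuck at 0.5 against a target of 1.0
# contributes (0.5)^2 = 0.25 on each of the 8 scored ticks, 2.0 in total.
assert trial_error([[0.5]] * TICKS, [1.0]) == 2.0
```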
A set of 120 items was chosen to test the frequency, consistency and regularity effects from the model on human participants: 20 from each cell resulting when three levels of regularity/consistency (regular-consistent, R-C; regular-inconsistent, R-I; irregular-inconsistent, I-I) were crossed with two levels of frequency (high, HF; low, LF). The mean frequency of occurrence was 475.35 per million for HF characters and 12.56 per million for LF characters, yielding a frequency manipulation of equivalent size across all three levels of regularity/consistency. Regular-consistent items contain phonetic components that appear only in characters that share the same onset and rime, regardless of tone. By definition, regular-inconsistent items have the same pronunciation as their phonetic component, although other characters sharing the same phonetic component are pronounced differently. Conversely, irregular-inconsistent items are pronounced differently from their phonetic components. The number of characters that share a phonetic component is called "family size," a property that was matched across the various frequency and regularity/consistency conditions, with the exception that the R-C items have a smaller family size than the others. Note that throughout the lexicon, R-C items have a much smaller number of neighbors than the other stimulus types; because larger family sizes have a facilitative effect on early visual recognition for naming R-C items (Feldman & Siok, 1999; Hsu, Lee & Tzeng, 2009), we matched family size for the inconsistent items only and included R-C items with a smaller family size than the R-I or I-I items. Consistency is defined by the frequency and number of "friends" (family members with the same pronunciation) and "enemies" (family members with different pronunciations; Peng & Yang, 1997). We therefore matched frequency-weighted consistency values for the inconsistent items.
The consistency value is equal to the summed frequency of friends divided by the summed frequency of all family members (including the character itself, as in Shu, Chen, Anderson, Wu, & Xuan, 2003). Characters were also matched across all conditions for structure type, typical position of the phonetic component, number of strokes and number of radicals.
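A minimal sketch of this consistency computation, with a hypothetical three-member family standing in for a real phonetic-component family:

```python
# Hypothetical family sharing one phonetic component:
# (character label, pronunciation, frequency per million).
family = [
    ("target", "tai2", 120.0),   # the character itself counts as a friend
    ("friend", "tai2", 40.0),    # same pronunciation
    ("enemy",  "zhi4", 15.0),    # different pronunciation
]

def consistency(target_pron, family):
    """Summed frequency of friends (family members sharing the target's
    pronunciation, including the character itself) divided by the summed
    frequency of all family members (Shu et al., 2003)."""
    friends = sum(freq for _, pron, freq in family if pron == target_pron)
    total = sum(freq for _, _, freq in family)
    return friends / total

value = consistency("tai2", family)   # (120 + 40) / 175, about 0.914
```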
Testing was carried out by presenting each character to the model, i.e., clamping the appropriate pattern on the input layer. We allowed the model to compute a pronunciation over 12 time ticks and the last tick was counted as the final output. Naming accuracy and sum squared error (SSE) were computed to test the model’s performance. Accuracy was determined by applying a winner-take-all scoring system: for each slot on the output layer, we determined which phoneme was closest to the pattern on the output at the final time tick and reported this as the model’s pronunciation; responses that did not match the correct pronunciation were scored as errors. SSE, a stand-in for response latency, was computed from the model’s output at the second to last time tick by adding together the square of the difference between the model’s output and the target for each unit. Because the model was run 40 times with different starting weights and random seeds, we were able to conduct both “by subjects” analyses that treated each run as a subject, and analyses with items as a random factor.
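The two scoring measures can be illustrated with an invented phoneme inventory (the feature vectors below are placeholders, not the model's actual phonological code):

```python
import numpy as np

# Invented phoneme inventory: each phoneme is a small feature vector.
PHONEMES = {
    "t":  np.array([1.0, 0.0, 0.0]),
    "aI": np.array([0.0, 1.0, 0.0]),
    "_":  np.array([0.0, 0.0, 1.0]),   # empty slot
}

def wta_pronunciation(output_slots):
    """Winner-take-all scoring: for each output slot, report the phoneme
    whose feature pattern is nearest (Euclidean) to the model's output."""
    return [min(PHONEMES, key=lambda p: np.linalg.norm(slot - PHONEMES[p]))
            for slot in output_slots]

def sse(output_slots, target_slots):
    """Sum of squared unit-wise differences from the target pattern,
    the model's stand-in for response latency."""
    return float(sum(np.sum((o - t) ** 2)
                     for o, t in zip(output_slots, target_slots)))

out = [np.array([0.9, 0.2, 0.1]), np.array([0.1, 0.8, 0.0])]
tgt = [PHONEMES["t"], PHONEMES["aI"]]
pron = wta_pronunciation(out)   # nearest phonemes: ["t", "aI"], scored correct
error = sse(out, tgt)
```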
At the end of training, 95% of the test items were named correctly. A 2 (high vs. low frequency) × 3 (type: R-C vs. R-I vs. I-I) ANOVA was conducted to examine the effects of frequency, regularity and consistency, and their interaction (see Table 3). The overall correlation between SSE and response latency for all items in the behavioral experiment was 0.617 (p < 0.01).
This analysis revealed a significant frequency effect for both naming accuracy and SSE. The main effect of consistency/regularity was also significant for naming accuracy and SSE. A strong interaction between frequency and consistency/regularity was found for both accuracy and SSE, such that the effects of print-to-sound regularity and consistency were both larger for low- than for high-frequency words. Performance was perfect for high-frequency regular words, whereas high-frequency irregulars elicited a 1% error rate. For low-frequency items, a strong effect of the consistency/regularity factor was observed in both naming accuracy and SSE.
Large effects of stimulus type were observed in one-way ANOVAs at each level of frequency (Table 3). To further explore the effects of regularity and consistency for low-frequency items independently, pairwise comparisons (one-tailed t-tests) were conducted for both SSE and accuracy. The effect of consistency (R-I vs. R-C) was reliable for SSE, t1 (39) = 12.26, p < 0.01; t2 (38) = 2.29, p < 0.05, but reliable only by participants for accuracy, t1 (39) = 6.83, p < 0.01; t2 (38) = 1.68, p = 0.11. A similar pattern was observed for regularity effects (R-I vs. I-I) among inconsistent words: regularity effects were reliable for SSE, t1 (39) = 9.29, p < 0.01; t2 (38) = 4.57, p < 0.05, and reliable by participants only for accuracy, t1 (39) = 4.65, p < 0.01; t2 (38) = 1.65, p = 0.12.
To test predictions of the model, a character naming experiment was conducted with 39 undergraduates (19 male, 20 female) from Beijing Normal University. All participants were native speakers of Mandarin Chinese with normal or corrected-to-normal vision, aged between 17 and 25. They provided written informed consent and were paid for their participation. Participants sat at a comfortable distance from the screen (about 60cm) and were instructed to read aloud single characters into a microphone as quickly and accurately as possible. On each trial, a fixation cross appeared for 500ms, after which the screen was cleared for 120ms and a single character was presented for up to 2000ms (or until a response was made). Stimuli were presented centrally, in white against a black background using 28pt Songti font. Stimulus presentation and response latency collection were controlled using DMASTR software (Forster & Forster, 2003). Stimuli were presented in a different, randomized order for each participant.
Three participants’ data were removed from the analysis because of error rates above 25%. Overall naming accuracy for the remaining 36 participants was 92.8%. Response latency and accuracy for each condition are summarized in Figure 3, revealing effects that confirm each of the model’s predictions.
A 2 (high vs. low frequency) × 3 (R-C vs. R-I vs. I-I) ANOVA was conducted for both naming latency and accuracy (see Table 4). A main effect of frequency was observed for both naming accuracy and response latency: high-frequency items were named faster and more accurately than low-frequency items. Consistency/regularity affected both accuracy and response latency. Most critically, the predicted interaction between frequency and the consistency/regularity factor was observed for both accuracy and response latency.
For low frequency items, the effect of consistency (R-I vs. R-C) was reliable for both response latency, t1 (35) = 6.55, p < 0.01; t2 (38) = 2.75, p < 0.01, and for accuracy, t1 (35) = 4.08, p < 0.01; t2 (38) = 2.37, p < 0.05. A regularity effect (R-I vs. I-I) was also found for these items on response latency, t1 (35) = 6.54, p < 0.01; t2 (38) = 3.56, p < 0.01, and accuracy, t1 (35) = 5.39, p < 0.01; t2 (38) = 3.05, p < 0.01.
A computational model of reading aloud in Chinese based on connectionist principles correctly simulated effects of regularity, consistency, frequency and their interaction, as observed in a behavioral study of reading aloud. In both the human and simulation data, regularity and consistency effects were each larger for low- than for high-frequency words. A regularity effect was found for inconsistent items, and was correctly simulated, demonstrating that regularity effects do not depend on categorical rules, but can emerge from the statistics of the writing system.
The interaction between frequency and consistency observed in Study 1 was quite large, and no consistency effect was found for high-frequency items. While this is broadly consistent with many studies of reading in English (Taraban & McClelland, 1987; Balota & Ferraro, 1993; Treiman, Mullennix, Bijeljac-Babic, & Richmond-Welty, 1995), the current results are in apparent conflict with some findings in the literature, particularly with respect to the strength of the frequency by consistency interaction and the lack of a consistency effect for high frequency items. In a series of experiments focusing on the neighborhood statistics that give rise to consistency effects, Jared (1997) found a relatively weak frequency by consistency interaction when neighborhood properties were tightly controlled between high- and low-frequency items. Although there was a trend toward a frequency by consistency interaction for both response latency (non-significant) and accuracy (significant by subjects, but not by items), this failure to replicate the “standard” effect under tight controls raised questions about the real basis of this interaction, which may be relevant to the current study.
The focus of the Jared (1997) study was on the consistency effect for high-frequency items. In a series of experiments, neighborhood properties (described in terms of “friends” and “enemies” as above) were shown to have strong and reliable effects on high frequency items. More immediately relevant to the current study is an experiment by Lee et al. (2005, Experiment 3) in which a reliable consistency effect was observed for high-frequency words.
Thus, the current findings leave some open questions regarding the strong frequency by consistency interaction and the null consistency effect for high frequency items. One possibility is that these effects arise from the types of confounds identified by Jared (1997). A later study (Jared, 2002) examined complex interactions among frequency, regularity and neighborhood properties and, with similar controls, found results consistent with a significant frequency by consistency interaction, arguing against such an explanation. Another possibility is that the frequency manipulation in Study 1 was larger than those employed in studies reporting consistency effects for high frequency items. Consistency effects have been found most reliably for items with frequencies closer to 100 per million (Jared, 1997, 2002; Lee et al., 2005), yet in the current study, high frequency items appeared on average about 500 times per million words in the relevant corpora.
To extend and confirm the results of Study 1, it is critical to establish whether the model correctly predicts the interaction (or lack thereof) between frequency and consistency when stricter controls on neighborhood structure and a weaker manipulation of frequency are employed.
It is an open empirical question whether the interaction between frequency and consistency in Chinese is an artifact of the same stimulus properties that appear to enhance this interaction in English. As a first step in exploring whether there are potential confounds between frequency and consistency in the lexicon at large, we conducted a corpus analysis. Each item in the training corpus (4468 items) from the Modern Chinese Frequency Dictionary (1986) was tagged with respect to the following dimensions: whether it is a phonogram, which phonetic component it contains, the pronunciation of that phonetic component, and the phonological relationship between phonetic component and character (in terms of regularity and consistency). Phonograms were identified based on the XianDaiHanZiXingShengZiZiHui (Ni, 1982). The numbers of characters and phonograms were counted, as well as the numbers of the three types of phonograms: regular-consistent (R-C), regular-inconsistent (R-I) and irregular-inconsistent (I-I). As in Experiment 1, consistency level was calculated as the ratio of the summed frequency of friends to the summed frequency of the characters in a family, with family defined as all characters sharing the same phonetic component.
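The tagging-and-counting step might look like the following sketch. The corpus rows are invented, and exact-match pronunciation stands in for the onset-and-rime definition of regularity used in the paper:

```python
from collections import defaultdict

# Invented tagged corpus rows: (character, pronunciation, frequency,
# phonetic component or None, the component's own pronunciation or None).
corpus = [
    ("c1", "tai2", 120.0, "P1", "tai2"),
    ("c2", "tai2", 40.0,  "P1", "tai2"),
    ("c3", "zhi4", 15.0,  "P1", "tai2"),
    ("c4", "ma3",  500.0, None, None),      # not a phonogram
]

# Group phonogram pronunciations by their phonetic component.
families = defaultdict(list)
for char, pron, freq, comp, comp_pron in corpus:
    if comp is not None:
        families[comp].append(pron)

def classify(pron, comp, comp_pron):
    """R-C: matches its component's pronunciation, and so does the whole
    family; R-I: matches, but some family members do not; I-I: does not
    match. Exact-match pronunciation is a simplification of the paper's
    onset-and-rime (tone-ignoring) definition."""
    if pron != comp_pron:
        return "I-I"
    if all(p == comp_pron for p in families[comp]):
        return "R-C"
    return "R-I"

counts = defaultdict(int)
for char, pron, freq, comp, comp_pron in corpus:
    if comp is not None:
        counts[classify(pron, comp, comp_pron)] += 1
```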
As summarized in Table 5, this corpus analysis revealed that consistency level is related to frequency, such that higher frequency items in general tend to be more consistent than low-frequency items. Taken together with the fact that there are many fewer high-frequency items than low-frequency items, and that a smaller proportion of these are phonograms than is the case for the low and middle frequency bands, it does seem plausible to suggest that the interaction between consistency and frequency is due in part to a confound with some additional statistical properties of the input corpus.
The comparisons of the consistency effect for high and low-frequency items may be confounded with differences in consistency level. Although Study 1 controlled for this possibility, as shown in Table 2, it did not control for other properties, such as family size or family frequency, nor for frequency of friends/enemies across levels of frequency for the consistency manipulation. Study 2 directly addresses these shortcomings by contrasting middle and low frequency items, thereby avoiding several of the confounds present in the high frequency stimuli, and enabling an examination of the interaction between consistency and frequency in more detail.
A set of 120 regular characters was selected from the training corpus, with 30 characters in each cell created by crossing frequency (middle/low) with consistency (consistent/inconsistent). Details of the stimulus properties are shown in Table 6. The simpler design permitted better control over family size across levels of consistency, in addition to a number of other stimulus properties such as family frequency, phonetic component frequency, and summed frequency of friends -- which were matched across levels of frequency. Finally, frequency levels were selected to be similar to those in Lee et al.'s (2005) and Jared's (1997, Experiments 1 and 2) studies, which showed consistency effects for high- and medium-frequency items and weak or null interactions between frequency and consistency. Frequency counts were confirmed by drawing estimates from two corpora: Modern Chinese Frequency Dictionary (1986) and Balanced Corpus of Modern Chinese (Sun, 2006).
All 120 characters were tested on 40 models from Simulation 1. The testing method was the same as in the previous simulation. Both the SSE and naming accuracy were computed at 3 million training trials.
Because of the lower overall frequency, the average SSE was nominally higher than in Simulation 1, and overall accuracy was lower (91% vs. 95% in Simulation 1). A 2 (middle vs. low frequency) × 2 (consistent vs. inconsistent) ANOVA was conducted for both SSE and naming accuracy (Table 7). The overall correlation between SSE and response latency for all items in the behavioral experiment was 0.320 (p < 0.01).
A main effect of frequency was found for both SSE and naming accuracy: middle-frequency characters had lower SSE and fewer errors than low-frequency characters. A main effect of consistency was also found: consistent characters were read more easily than inconsistent characters. Finally, a significant interaction between frequency and consistency was found for both SSE and accuracy.
The simple consistency effect for middle-frequency characters was significant for SSE in the by-participants analysis (treating each run of the model as a participant), but not in the item analysis, whereas the consistency effect for low-frequency items was strongly significant in both analyses. This replicates the results of Lee et al.'s (2005) Experiment 3. The simple frequency effects for consistent and inconsistent characters were significant for both SSE and naming accuracy (ps < 0.01). The frequency effect was larger for inconsistent characters (SSE = 0.59) than for consistent characters (SSE = 0.29).
Forty participants (none had participated in Experiment 1) were recruited from Beijing Normal University to name the new set of 120 characters used in Simulation 2. The procedure was the same as in Experiment 1.
All participants’ data were included; overall naming accuracy was 92.19%. Reaction times for error responses were replaced with the participant’s condition mean for correct trials. Average reaction times are presented in Figure 4, and ANOVAs for naming latency and accuracy are shown in Table 8.
The ANOVA results revealed that participants named middle-frequency characters significantly faster and more accurately than low-frequency characters. Naming latencies were significantly longer for inconsistent characters than for consistent characters, and participants were significantly more accurate in naming consistent characters than inconsistent characters. As in Experiment 1, we found a significant interaction between frequency and consistency, both in the latency and accuracy data. The simple frequency effects for both consistent and inconsistent characters were significant (ps < 0.01); the frequency effect was larger for inconsistent (124ms) than consistent characters (66ms). The consistency effect was 27ms for middle frequency characters and 84ms for low frequency characters. As predicted by Simulation 2, both for reaction time and naming accuracy, medium-frequency characters produced a significant consistency effect in the analysis by participants, but not by items. The consistency effect in low frequency characters was reliable in both participant and item analyses.
In the following analysis, we examine the training corpus and the internal workings of the model to shed light on how these phenomena might emerge from the application of statistical learning mechanisms to the problem of mapping from print to sound in different writing systems.
To examine the statistical properties that might give rise to functional units in reading Chinese characters, descriptive statistics for phonetic components and radicals were calculated in the training corpus: Pronunciations for all characters containing each radical and phonetic component were identified and counted in order to determine the reliability of these two sub-lexical structures in print to sound mappings.
For the 4468 characters in the training corpus, 298 constituent radicals and 1021 phonetic components were identified. Some radicals are also phonetic components, which creates some ambiguity with respect to what the functional units of the print-to-sound mapping might be. For example, as shown in Figure 1B, the radicals “” and “” can function as phonetic components on their own; they can also be combined to form the phonetic component “”. Most phonetic components are composed of more than two radicals, 2.4 on average, with a range of one to six (Figure 5A). Figure 5B shows a histogram of the number of characters and syllables (including and not including tone) that can be formed by radicals and phonetic components. Radicals appear, on average, in about 39 characters with 27 different pronunciations each. In contrast, more than 70% of phonetic components are mapped to no more than four characters (on average about three), with an average of two distinct pronunciations. Thus, phonetic components, if they can be identified, are much stronger cues to pronunciation than radicals.
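The reliability comparison amounts to counting, for each sub-lexical unit, how many characters contain it and how many distinct pronunciations those characters have. A sketch with invented data:

```python
from statistics import mean

# Invented data: (character, pronunciation) pairs containing each unit.
chars_by_unit = {
    "radical_R1":  [("c1", "ma3"), ("c2", "tai2"), ("c3", "li4"), ("c4", "ma3")],
    "phonetic_P1": [("c5", "tai2"), ("c6", "tai2"), ("c7", "zhi4")],
}

def reliability(units):
    """Average number of characters per unit and average number of
    distinct pronunciations per unit: a unit is a stronger cue to sound
    when it maps onto few pronunciations relative to the number of
    characters that contain it."""
    sizes = [len(fam) for fam in units.values()]
    prons = [len({p for _, p in fam}) for fam in units.values()]
    return mean(sizes), mean(prons)

avg_chars, avg_prons = reliability(chars_by_unit)
```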
In our model, no explicit coding was included in the representation to identify phonetic components; rather, the influence of these units emerged in the course of learning the mapping from orthographic to phonological representations. Our model did, however, replicate the empirical finding of a regularity/consistency effect, which is defined in terms of phonetic components. In order to explore the emergence of functional units for print to sound translation in the model, we analyzed the similarity space in hidden layer activations over the course of learning for a subset of items from the training set.
These analyses were carried out on a small subset of the training items, selected in order to probe whether the similarity space defined by hidden unit activations showed evidence of being shaped by phonetic components. A set of items that shared a phonetic component was selected, along with controls for orthographic and phonological similarity. In all, forty-four items were selected, of which 7 shared a phonetic component. In addition to the simple character “” (/thaI/), three regular (/thaI/: ) and three irregular () items from the same family were included. As a comparison group, a set of 27 orthographically similar items was selected to have the same degree of orthographic similarity to the critical items as they would if they shared a phonetic component. This was done by finding items that shared two radicals that did not comprise a phonetic component. For example, shares the radicals and with the target item (comprising , and ). In addition, we selected phonologically similar characters: 10 high-frequency homophones of two irregular characters (), included in order to address the role of phonology by itself in organizing the hidden layer.
Each character was clamped on the input layer for 12 time ticks and the activation values of 200 hidden units were computed at the last time tick. Multidimensional scaling was computed to characterize the similarity space both on the orthographic and hidden layers. Tests for similarity of groups of items (shared phonetic component, orthographic controls, and phonological controls) were conducted on the Euclidean distances within and between groups in this similarity space.
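The within- versus between-group distance test can be sketched with random vectors standing in for hidden-unit activations (the group contents, offsets, and 200-unit dimensionality are illustrative):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Random stand-ins for hidden-unit activation vectors (200 units each):
# items sharing a phonetic component vs. orthographic controls, with the
# controls shifted to occupy a different region of activation space.
groups = {
    "family":     rng.random((4, 200)) * 0.2,
    "ortho_ctrl": rng.random((4, 200)) * 0.2 + 0.5,
}

def mean_within(vectors):
    """Mean pairwise Euclidean distance among items in one group."""
    return float(np.mean([np.linalg.norm(a - b)
                          for a, b in combinations(vectors, 2)]))

def mean_between(va, vb):
    """Mean Euclidean distance between items of two groups."""
    return float(np.mean([np.linalg.norm(a - b) for a in va for b in vb]))

within = mean_within(groups["family"])
between = mean_between(groups["family"], groups["ortho_ctrl"])
```

In the toy data the family cluster is tighter than its distance to the control group, which is the qualitative pattern the model develops over training.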
Results from the Euclidean distance measures based on the similarity space of characters derived from hidden unit activations are plotted over the course of training in Figure 6A. At the beginning of training, overall activity levels on the hidden units were quite small (because of the small, random weights to which the model was initialized), resulting in relatively small distances driven entirely by the input. As the model learned to map from print to sound, however, differences emerged, such that by the end of training, items that share a phonetic component were on average more similar to one another than they were to either orthographic or phonological controls. This was supported by a 2 (training period: initial weights, after 3 million trials) by 4 (stimulus comparison: within phonetic component’s family, orthographic control, phonological control, between phonetic component’s family) ANOVA, which demonstrated a main effect of training, F (1,313) = 553.68, MSE = 50.58, p < 0.01; a main effect of stimulus, F (3,313) = 22.17, MSE = 3.95, p < 0.01, and a significant training by stimulus interaction, F (3,313) = 10.78, MSE = 0.99, p < 0.01. Planned comparisons at 3 million training trials revealed that within-phonetic component’s family comparisons had smaller distances (1.41) than the other three conditions: Orthographic controls (1.90, t (46) = 3.1, p < 0.01), phonological controls (2.03, t (29) = 2.49, p < 0.01) or between-family comparisons (2.13, t (278) = 6.58, p < 0.01).
For comparison to the MDS solutions shown for the hidden units, Panel B of Figure 6 shows a solution for the similarity space represented on the orthographic input. Note that, although the members of the same family form a loose cluster, for many items, there are non-family members that are much closer in similarity space than other family members. In Panel C, it is clear that in the initially randomized state, the hidden unit activations essentially recapitulate the input representations. Indeed, correlations between pairwise distances on the input and hidden units before training are nearly perfect (r = 0.99, p < 0.001). After 3 million trials of training (Panel D) items with the same phonetic component (/thaI/) form a relatively tight cluster. Within this cluster, there are also sub-clusters of items that share a pronunciation, so that, for example, the items with the same pronunciation (/thaI/: ) and the items with different pronunciations () occupy distinct regions of the similarity space. The correlation between input and hidden unit representations (r = 0.48, p < 0.001) is smaller than what was observed initially (t (943) = 54.35, p < 0.001).
The relatively high degree of similarity among these items is not simply a reflection of their output similarity. A number of orthographically related homophones ( ) are quite distant from this cluster in the similarity space. The distance between homophones (mean distance = 2.03, STD=0.52, N=10) and irregular items () was significantly larger (t (20) =3.11, p<0.01) than the distance to other phonetic component’s family members (mean distance = 1.37, STD=0.48, N=12).
Figure 7 shows hidden unit activations for a set of items from the similarity space analyses. The overlap among items that share the phonetic component () and are pronounced /thaI/ is strikingly compact. Two hidden units are near their maximum activation for all three of these items. One of these same hidden units is active for all items that contain the phonetic component (), independent of how they are pronounced. In contrast, among items that are equivalently similar in terms of their orthographic representations, but do not share a phonetic component, there is no such overlap in the hidden layer representation. Thus, we can see that functional sub-lexical units are represented in a relatively compact manner in the model.
It is interesting to contrast the patterns of hidden unit representation to what has been observed in similar analyses of English reading models (Harm & Seidenberg, 1999; Harm, McCandliss & Seidenberg, 2003). In those models, individual words evoke much more diffuse and widespread activity over the hidden layer, likely reflecting both the denser representations permitted by the much smaller range of possible inputs, and the contribution of smaller sub-lexical units (e.g., single letters mapping with some consistency onto single phonemes). In Chinese, the relatively large grain size and high degree of arbitrariness require a much sparser representation. This may also explain why a larger number of hidden units are needed for Chinese than for English in these models.
Items that share the same phonetic component are treated as more similar than items that share the same amount of orthographic information, even when these map on to highly distinct phonological outputs. For example, the characters and have essentially no overlap in their pronunciations (/thaI/, ), and yet they share similar representations (in this case, largely defined by a single hidden unit). This example clearly illustrates that the representations arrived at in the acquisition of print-to-sound mappings are organized by both the information-bearing structures in the input and the similarity of their mapping to the output, reflecting the model's extraction of sub-lexical regularities in the translation from print to sound. The model thus demonstrates how compact and superpositional representations of shared orthographic structure may be learned and play a role in reading aloud, even when the grain size of mappings from print to sound is fairly coarse, and the mappings themselves are probabilistic with a high degree of arbitrariness.
This study explored the interactions between consistency and frequency under tight stimulus control and manipulation of “medium” vs. “low” frequency, in order to determine the robustness of this finding. As in previous studies with similar manipulations, the interaction between frequency and consistency was much weaker than in Study 1 (but nonetheless reliable), and a consistency effect was found for the medium-frequency items (although this was only significant by subjects). Thus, the interaction between frequency and consistency in Study 1 was not the result of stimulus factors that could not be controlled due to the complexity of the design. The predictions of the model were confirmed by behavioral experiments on the same testing items; further analyses revealed that the functional units on which the frequency by consistency interaction depends are emergent properties of the application of the statistical learning rules of the model to the corpus.
Analyses of the hidden units revealed that these effects were driven by the organization of representations that encode probabilistic information about the functional units that support mappings from orthography to phonology. Further, they demonstrate how these representations might emerge over development as a result of statistical learning, and without assumptions about the level at which regularities exist in the writing system. Whereas at first, the similarity space defined by hidden unit activations is organized by orthographic similarity, relatively early in training, words that share a phonetic component begin to be represented as more similar to one another than orthographic or phonological controls.
There is some empirical evidence to support the model’s account of learning to read in Chinese as well: Shen and Bear (2000) collected invented spellings from schoolchildren and classified them according to whether they contained orthographic, phonological or semantic errors. They found that the earliest invented spellings involved mostly orthographic confusions, whereas by fourth grade, the majority of invented spellings involved insertion of a phonetic radical that was consistent with the intended character’s pronunciation.
The emergence of regularity and consistency effects in children is also broadly consistent with this analysis. Early readers (second graders) show a strong regularity effect, tending to misread phonograms according to the pronunciation of their phonetic component when it appears by itself (Shu, Anderson, & Wu, 2000). By sixth grade, however, subtler effects of phonetic family (i.e., consistency effects) are also found (Chen, Shu, Wu, & Anderson, 2003; Yang & Peng, 1997). Analyses of children’s reading materials (Shu, Chen, Wu, & Xuan, 2003) suggest that this transition is driven in part by changes in the statistics of children’s reading materials. Phonetic components are learned as individual words in the early grades, whereas in later grades, families of words that share a phonetic component are learned as the written vocabulary expands to include a larger number of phonograms. A more detailed simulation of this developmental pattern could be achieved with incremental training (Powell, Plaut, & Funnell, 2006).
A computational model of reading aloud in Chinese based on connectionist principles correctly simulated effects of regularity, consistency, frequency and their interaction, as observed in a behavioral study of reading aloud. In both the human and simulation data, regularity and consistency effects were each larger for low- than for high-frequency words. A regularity effect was found for inconsistent items, a finding that would seem on its face inconsistent with this modeling framework. The simulation correctly captured this effect, however, demonstrating that it can be explained without rule-based processing or the inclusion of phonetic components as a special “level” of the representation. A more in-depth exploration of the frequency by consistency interaction demonstrated that it was robust to strict controls of neighborhood characteristics, and could be found even under a relatively weak manipulation of frequency. The model correctly simulated these findings as well. Finally, analyses of the training set and internal representations in the model revealed that its sensitivity to consistency and regularity results from the emergence of phonetic components as functional units over the course of training.
The current simulations applied the same basic architecture and learning rules used to simulate a variety of phenomena in English reading (Harm & Seidenberg, 1999, 2004; Seidenberg & Zevin, 2006), to capture both effects that are common to both English and Chinese – i.e., frequency, consistency and their interaction – and effects that are specific to a particular writing system – i.e., regularity as defined with reference to phonetic components in Chinese. In both cases, the explanation rests on the operation of domain-general statistical learning mechanisms.
Frequency and consistency effects are explained by complementary features of the learning algorithm applied in the current simulation. On the one hand, learning in the models is local, so that practice with a particular item has the greatest impact on performance for that item itself. On the other hand, because representation in the model is superpositional – i.e. many different items are represented as patterns of activation over the same units – the model has an inherent tendency to map similar inputs onto similar outputs. Thus, whereas frequency effects reflect previous experience with a particular word, consistency effects reflect experience with all similar words. The effects of these properties interact because sufficient practice with a particular word can override ambiguity arising from similarly spelled words. These same principles hold, even as the definition of consistency itself (Jared, 2002; Treiman et al., 2006) is further refined, because the critical assumption is the role of statistics at multiple grain sizes, not the particular unit (letter, grapheme, word body or whole word) over which the statistics are relevant.
Studies of the interaction between consistency and frequency -- including those presented here -- typically take advantage of the power of factorial designs. Nonetheless, it is clear that both consistency and frequency are essentially continuous variables. Thus it is fair to suggest that the strength of the interaction between them will depend on the details of how each is manipulated. Indeed, some of the contradictory findings across studies reviewed by Jared (1997, 2002) are apparently the result of quantitative differences in the neighborhood structure of inconsistent words. Words like TINT, which have many similarly pronounced, relatively high-frequency “friends” (HINT, MINT, LINT, etc.) and few, relatively low-frequency “enemies” (PINT), show relatively weak or null consistency effects, even when they are themselves relatively low frequency.
Less attention has been paid to the role of the strength of the frequency manipulation in modulating this interaction. The studies reviewed and presented here suggest that the interaction between frequency and consistency is strongest for relatively large frequency manipulations. This is correctly predicted by the model because, as a statistical learning model, it naturally takes into account the continuous nature of both frequency and consistency. Thus the model’s prediction of a frequency by consistency interaction for a particular set of items is not the same as a blanket prediction from a verbal model that these variables should interact under all conditions. That the model fits both strong and weak interactions of frequency and consistency revealed across human studies (and could conceivably fit null effects with the appropriate stimulus manipulations) suggests that humans are sensitive to the same statistical properties that drive its performance.
Regularity effects based on phonetic component pronunciation are unique to the Chinese writing system, and yet they can be explained as resulting from the same properties of statistical learning. Most phonetic components can appear on their own as individual characters (a few bound phonetics are archaic characters whose pronunciations have been lost in modern Chinese – see Shu et al., 2003 – but all phonetic components of the characters used in the current study were unbound). This creates a very effective learning situation for mapping from a phonetic component to its canonical pronunciation. In terms of the learning algorithm used here, all of the error on learning trials in which a phonetic component is presented in isolation accrues to strengthening the connections between that component and its pronunciation. In this way, the canonical pronunciation of each phonetic component attains something of a “special” status, such that characters whose pronunciations deviate from it are more difficult to read. Critically, the special status of phonetic components in these models is driven by the statistics of the input rather than by an a priori representational scheme imposed by the modelers.
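The error-accrual argument can be sketched with a toy linear network (patterns and parameters are assumptions, not the model's actual representations): when a component is presented alone, every weight update lands on its own connections, so even an untrained compound containing it inherits the canonical pronunciation.

```python
import numpy as np

# Toy sketch: a linear layer trained by the delta rule on a phonetic
# component presented in isolation (all patterns illustrative).
component  = np.array([0., 0., 1.])   # the component as a stand-alone character
novel_char = np.array([1., 0., 1.])   # an untrained compound containing it
canonical  = np.array([1., 0.])       # the component's pronunciation

W = np.zeros((2, 3))
for _ in range(200):                  # isolated presentations: all error
    y = W @ component                 # accrues to the component's own weights
    W += 0.1 * np.outer(canonical - y, component)

print(W @ novel_char)                 # ≈ canonical pronunciation
```

An irregular character containing this component must develop countervailing weights on its other units to overcome the inherited bias, which is why pronunciations deviating from the canonical one are harder to learn.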
Previous models of Chinese reading have included separate representational “layers” for radicals and characters (Perfetti, Liu, & Tan, 2005), designed input representations explicitly around the radical as an unanalyzed, holistic unit (Hsiao & Shillcock, 2005, 2006), or, in one case (Xing, Shu, & Li, 2002, 2004), included phonetic components along with their pronunciations as part of the input to the model in order to capture regularity and consistency effects (but see Chen & Peng, 1994). In contrast, the current model uses a distributed representation of Chinese characters that eschews any a priori assumptions about the functional units that underlie spelling-to-sound translation.
The current coding scheme for the input layer does, in some sense, contain language-specific features – a more elaborate model that learned to identify contrastive features from input incorporating properties of the visual system would be truly language-general, in the sense that it could take either English or Chinese as its input. Nevertheless, the current model is language-general in the sense that its orthographic representation is based on the smallest contrastive unit in the writing system (strokes) and organized to capture the visual similarity of characters, rather than fixing a specific level of representation, such as the radical or phonetic component, as a functional unit. This is illustrated in Figure 6B, which shows that the similarity space on the orthographic input layer is loosely organized by orthographic units at multiple levels of representation. For example, the cluster of characters in the bottom left-hand corner mostly share the 氵 (“water”) radical, but some contain the 忄 (“heart”) radical (which also has three strokes and appears on the left). In response to the demands of learning to map from print to sound, the model’s representations become increasingly organized by behaviorally relevant structures, i.e., phonetic components (Figures 6C and 6D). Thus, it is not necessary to assume a fixed level of orthographic representation in order to simulate print-to-sound conversion in Chinese.
Previous studies have used priming paradigms to address the issue of whether radicals have special status in the organization of the adult reading system (Ding, Peng, & Taft, 2004; Taft & Zhu, 1997; Taft, Zhu, & Ding, 2000; Taft, Zhu, & Peng, 1999). The results of these studies have shown that radicals influence lexical access, and may be a basic unit of Chinese orthographic representation (Ding, Peng, & Taft, 2004). The analyses presented in Study 2, however, suggest that radicals are an unlikely candidate for a functional unit in mapping from spelling to sound. It may be that they are more consistently related to semantics, or that they are important in some early stage of character identification.
Although it is beyond the scope of the current work to simulate the priming effects that are taken as evidence for radicals as functional units, the model provides an alternative perspective, in which functional units emerge over the development of reading rather than being fixed for all reading processes. A topic for future research is the emergence of similar structure in mappings from orthography to semantics. Just as the phonetic component of a phonogram provides cues to its pronunciation, the semantic component provides cues to meaning. For example, the radical for water (氵) appears in many phonograms whose meanings are related to water, but it can also appear in phonograms whose meanings are not. This radical may thus be viewed as encoding probabilistic information about semantics at the sub-lexical level.
A number of studies have focused on the role of phonology in computing meaning from print (e.g., Perfetti & Tan, 1998; Perfetti & Zhang, 1995). This work led to the Lexical Constituency Model (LCM; Perfetti, Liu, & Tan, 2005), which simulates the time-course of effects of graphical, phonological and semantic primes. The central theoretical claim of the LCM is that phonology is rapidly and obligatorily computed during reading (even reading for meaning). While there are some data inconsistent with this claim (Zhou & Marslen-Wilson, 2000), there are a large number of phenomena that appear difficult to explain in any other way (Chen, Flores d'Arcais, & Cheung, 1995; Chen & Shu, 2001; Xu, Pollatsek, & Potter, 1999; Zhou & Marslen-Wilson, 1999b).
Because we present a model of the print-to-sound mapping only, we cannot directly address the role of phonology in the computation of semantics. A model of English reading with a similar architecture for print-to-sound that also included semantics (Harm & Seidenberg, 2004) was able to correctly simulate a wide range of phonological effects in computing semantics from print. Given the gross differences in the ease of print-to-sound mapping, and the probabilistic cues to meaning that distinguish Chinese from alphabetic writing systems, we expect the division of labor between direct and phonologically mediated mappings from print to semantics to differ between the two languages. The current model, however, demonstrates that print-to-sound mappings can be learned and processed efficiently independently of semantics, and thus could plausibly contribute to semantic activation in a model that included semantics.
In contrast to the LCM, however, we have specified a sub-lexical mechanism for print-to-sound conversion. In the LCM, it is assumed that each character has a stored pronunciation, and these are simulated in practice by hand-coding the pronunciations into the model. The current model learns print-to-sound mappings over distributed representations, and thus is capable of discovering regularities among sub-lexical units of varying size and consistency. The behavioral results presented here (see also Hsu, Tsai, Lee, & Tzeng, 2009; Fang, Horng, & Tzeng, 1986; Hue, 1992; Lee, Tsai, Su, Tzeng, & Hung, 2005; Peng & Yang, 1997) demonstrate that these sub-lexical regularities play an important role in a task that directly taps print-to-sound conversion, and their simulation in the model demonstrates that they can be explained in terms of sub-lexical processing.
We have adapted a statistical learning model of reading aloud initially developed to study an alphabetic writing system (English) to an ideographic one (Chinese). This was possible because the central assumptions of the model are simple and readily generalize across writing systems. The success of this adaptation suggests that the same framework can be applied to understanding reading skill across a wide variety of languages, despite gross differences in their surface properties. This is a critical first step in providing a mechanistic explanation of many cross-linguistic phenomena, including the differential impact of factors that predict reading success (McBride-Chang, Chao, Liu, et al., 2005; Shu, Peng, & McBride-Chang, 2008), differences in the prevalence of developmental disorders of reading (Johansson, 2006; Shu, McBride-Chang, Wu, & Liu, 2006; Shu, Meng, Chen, Luan, & Cao, 2005) and patterns of reading disorder subsequent to brain injury (Bi, Han, Weekes, & Shu, 2007; Jefferies, Sage, & Ralph, 2007; Woollams, Lambon Ralph, Plaut, & Patterson, 2007).
The central explanatory principles of this framework may be usefully expanded in future research to examine the unique aspects of semantic encoding in Chinese. Similar types of probabilistic mappings are present in the mapping from print to meaning in Chinese, and future research should focus on incorporating this very unusual property of the writing system. Preliminary studies suggest that it may be possible to capture the differential division of labor between semantically mediated and direct translation from print to sound, and the differential contributions of phonological and semantic processing across languages, within this same framework (Yang, McCandliss, Shu, & Zevin, 2008; Yang, Zevin, Shu, McCandliss, & Li, 2006). Thus, this framework has the potential to explain how the same functional architecture can give rise to both strikingly similar (e.g., regularity and consistency effects) and highly distinct (e.g., differential contributions of phonological abilities to reading acquisition) outcomes across writing systems.
The authors would like to thank Ping Li and Hongbing Xing for their contributions to the orthographic representation, Haiyan Zhou, Youyi Li and Xiaojuan Wang for their work on the empirical data, and Mike Harm for technical assistance, the Mikenet software and interesting discussions. This research was supported by NSF of China grants 30870758 and 60534080 and NSF of Beijing grant 7092051 (to HS), and NSF grant REC 0337765 (to BDM).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
1Here and throughout, transcriptions are in International Phonetic Alphabet. Note that in the Pinyin romanization system, the voiceless, unaspirated stop /p/ is written as a “b.”
2Fully 48% of irregular words share neither onset nor rime with their regular counterparts, making them dramatically less predictable than even “strange” words in English. Interestingly, however, there is some subsyllabic regularity, such that 18% of irregular forms share their onset with regular family members, and 35% share a rime. These are substantially greater than would be predicted if pronunciations were uniformly distributed (spoken Standard Mandarin has 23 onsets and 39 rimes). Note, however, that due to the non-componential nature of the mapping, it is not possible to know whether a particular phonetic component predicts the onset or the rime (or the vowel, or the tone, which we have not explored).
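As a back-of-the-envelope check on these figures, the chance rates implied by a uniform distribution over Mandarin's 23 onsets and 39 rimes can be computed directly (all values taken from the text):

```python
# Chance rates of sharing an onset or rime with a regular family member,
# if pronunciations were uniformly distributed over Standard Mandarin's
# 23 onsets and 39 rimes.
chance_onset = 1 / 23                    # ~4.3%
chance_rime = 1 / 39                     # ~2.6%
observed_onset, observed_rime = 0.18, 0.35
print(observed_onset / chance_onset)     # onset sharing: ~4x chance
print(observed_rime / chance_rime)       # rime sharing: ~14x chance
```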