The accuracy of letter identification often suffers when letters are presented in strings, even when each of the letters can be identified correctly when presented alone. This effect is more pronounced when letters are presented outside the foveal region (Bouma, 1970). The difficulty in correctly identifying letters in the presence of other letters is referred to as crowding.
Crowding is ubiquitous in spatial vision and affects a variety of spatial tasks (for a review, see Levi, 2008). With respect to letter identification, the hallmark of the crowding effect is a reduction in identification accuracy for letters flanked by other letters, compared with accuracy for identifying single letters. The reduction in accuracy can result from (1) assigning a wrong identity to the target letter (letter-identity errors), and/or (2) assigning the correct identity to the target letter but the wrong position relative to other letters (letter-reversal errors). The latter type of error is often referred to as a “transposition error” (Estes, Allmeyer & Reder, 1976) or a “mislocation error” (Ortiz, 2002; Chung, Legge & Ortiz, 2003; Strasburger, 2005). There is evidence that a substantial proportion of the errors made when people identify strings of letters away from fixation are mislocation errors; e.g., the string “oae” might be misread as “aoe” (Butler & Currie, 1986; Estes et al, 1976; Mewhort, Campbell, Marchetti & Campbell, 1981; Townsend, Taylor & Brown, 1971; Ortiz, 2002; Strasburger, 2005; Strasburger, Harvey & Rentschler, 1991). Given that accurate reading of words and text relies on correct identification of letters in left-to-right order, errors in either the identity or the spatial order of letters may disrupt both word recognition and reading.
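The distinction between letter-identity errors and mislocation errors can be made concrete with a small scoring routine. The sketch below is a hypothetical illustration, not the scoring procedure used in the cited studies: the name `score_letters` is invented here, and it assumes target and response strings of equal length, counting a letter as mislocated when it appears in the response but not at its target position.

```python
from collections import Counter


def score_letters(target: str, response: str) -> dict:
    """Hypothetical scoring: classify each target letter as correct,
    mislocated (reported, but at another position), or an identity error
    (not reported anywhere in the response)."""
    if len(target) != len(response):
        raise ValueError("target and response must have equal length")
    # Letters reported at exactly the right position.
    correct = sum(t == r for t, r in zip(target, response))
    # Letters reported anywhere; the multiset intersection handles repeats.
    present = sum((Counter(target) & Counter(response)).values())
    return {
        "correct": correct,
        "mislocation": present - correct,
        "identity": len(target) - present,
    }


print(score_letters("oae", "aoe"))  # the transposition example from the text
print(score_letters("oae", "oxe"))  # a single letter-identity error
```

For “oae” reported as “aoe”, all three letters are present but only “e” is in place, so two letters are scored as mislocations and none as identity errors.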
In this study, we hypothesized that the accuracy of judging the spatial order (the relative positions) of letters is directly related to the precision of position coding of letters. As such, the goals of this study were to examine the precision of position coding of letters at different distances away from fixation, and to determine whether the imprecision of letter position coding could account for a portion of the errors made in identifying letter strings.
Position judgments can be exquisite under optimal conditions. For example, our ability to judge the relative position of a pair of highly visible lines or dots that are in close proximity to one another (Vernier judgment) can be as precise as a few arc sec (Westheimer, 1975). This exquisite performance has often been attributed to a spatial filter mechanism for mediating relative position judgments. According to this model, the visual system compares the contrast response outputs of spatially localized oriented filters that straddle the crucial target features, thereby deducing the relative position of the two targets (Klein & Levi, 1985; Wilson, 1986). However, the precision for judging the relative position of two objects decreases dramatically when the two objects are separated by a few arc min (Klein & Levi, 1987; Levi & Klein, 1989; Waugh & Levi, 1993; Williams, Enoch & Essock, 1984). For instance, Williams et al (1984) reported that thresholds for judging the relative position of two dots (a Vernier task) separated by 60 arc min reach approximately 60 arc sec, an order of magnitude higher than the Vernier threshold for abutting dots. The decline in position-judgment precision for widely separated targets is often attributed to reliance on a less precise mechanism for localization: the local sign mechanism.
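The filter-comparison idea can be illustrated with a toy one-dimensional sketch. It is not the Klein & Levi (1985) or Wilson (1986) model: the Gaussian filter profiles, the names `gaussian` and `filter_difference`, and all parameter values are assumptions chosen for illustration, in arbitrary spatial units. Two filters straddle the nominal target location, and the sign of the right-minus-left response indicates on which side the blurred line lies.

```python
import math


def gaussian(x, mu, sigma):
    """Unnormalized Gaussian profile."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def filter_difference(offset, filter_sep=1.0, filter_sigma=0.6,
                      blur_sigma=0.3, step=0.01):
    """Toy sketch: right-minus-left response of two hypothetical Gaussian
    filters centred at -filter_sep/2 and +filter_sep/2 to a blurred line
    displaced by `offset` from the midpoint (arbitrary units)."""
    xs = [i * step for i in range(-500, 501)]  # sample grid from -5 to +5
    stim = [gaussian(x, offset, blur_sigma) for x in xs]
    left = sum(s * gaussian(x, -filter_sep / 2, filter_sigma)
               for x, s in zip(xs, stim)) * step
    right = sum(s * gaussian(x, filter_sep / 2, filter_sigma)
                for x, s in zip(xs, stim)) * step
    return right - left


for off in (-0.1, 0.0, 0.1):
    print(off, round(filter_difference(off), 4))
```

A positive difference signals a rightward offset and a negative one a leftward offset; in this toy setup the signal grows with offset near the midpoint, which is why a comparison of nearby filter outputs can support fine relative-position judgments.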
Local signs are hypothetical sensory signals that represent stimulus locations in the visual field. According to Lotze (1885), who first proposed the notion of local sign, each retinal receptor stimulated by a target signals a local sign that can be thought of as a location or position tag. Hering (1899) suggested that for an extended stimulus such as a thin line, the positional accuracy of the line can be improved by averaging the local signs along its length. Relative position judgment for a pair of separated Vernier lines could then be accomplished by comparing the mean local signs of the two lines. For extended two-dimensional targets, local signs are likely to be computed from the centroid of each target, because relative position judgment can be equally precise for separated targets that are composed of clusters of dots (Badcock, Hess & Dobbins, 1996; Hess, Dakin & Badcock, 1994; Whitaker & Walker, 1988), that have irregular shapes (Patel, Bedell & Ukwade, 1999), or that have opposite contrast polarity (Levi, Jiang & Klein, 1990; Levi & Waugh, 1996; Levi & Westheimer, 1987; O'Shea & Mitchell, 1990).
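Hering's averaging argument can be checked numerically. This is a minimal sketch, assuming each stimulated receptor (or feature) contributes an independent local sign with Gaussian jitter of SD `feature_sd`; the function name and parameter values are hypothetical. Under that assumption, the mean of n local signs has SD `feature_sd / sqrt(n)`, so averaging along an extended target sharpens its position estimate.

```python
import random
import statistics


def centroid_sd(n_features, feature_sd=1.0, n_trials=20000, seed=1):
    """Monte Carlo SD of a centroid (mean local sign) computed from
    n_features independent Gaussian-jittered local signs."""
    rng = random.Random(seed)
    centroids = [
        statistics.fmean(rng.gauss(0.0, feature_sd) for _ in range(n_features))
        for _ in range(n_trials)
    ]
    return statistics.stdev(centroids)


for n in (1, 4, 16):
    print(n, round(centroid_sd(n), 3), "predicted:", round(1.0 / n ** 0.5, 3))
```

Real local signs are unlikely to be fully independent, so the square-root improvement should be read as a best case under the stated assumption.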
In this study, we were interested in the position coding of letters. Because letter stimuli are two-dimensional, often have irregular shapes, and adjacent letters in text usually have center-to-center spacing greater than a few arc min (the mean spacing between 12-point Times Roman letters viewed from 40 cm is approximately 17.4 arc min), we reason that positional information for letters is likely to be based on local signs computed from the centroids of the letters. If the centroid of a target is computed from features that themselves carry some positional imprecision, the computed centroid should follow a normal distribution whose spread represents the precision of localizing the target: a smaller spread implies that the target can be localized more precisely. A pair of adjacent letters would therefore yield two distributions of centroid signals, one for each letter. If the spreads of the distributions are small enough that there is little or no overlap, we would be able to determine the spatial order of the two letters with high precision. In contrast, substantial overlap between the two distributions could cause the letters to be localized in the wrong relative position, i.e. in reversed left-to-right order (see ). This leads to the hypothesis that the imprecision of letter position coding, which is directly related to the spread (the position noise) of the underlying centroid distributions, could account for mislocation errors made in identifying letter strings. This hypothesis predicts that the rate of mislocation errors should increase with the position noise (the width) of the underlying distributions for letter position coding.
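The overlap argument can be written down directly. As a minimal sketch, assume the two letter centres are `separation` apart and that each centroid signal carries independent Gaussian noise with the same SD `sigma` (a simplification; the function names and the noise values below are hypothetical). The difference between the two noisy positions is then normal with mean `separation` and variance 2σ², so the order is reversed with probability Φ(−separation / (σ√2)) = ½ erfc(separation / (2σ)).

```python
import math
import random


def reversal_probability(separation, sigma):
    """Analytic P(reversed order) for two letters `separation` apart whose
    centroid signals carry independent Gaussian noise with SD `sigma`."""
    # Difference of the noisy positions ~ N(separation, 2 * sigma**2), so
    # P(difference < 0) = Phi(-separation / (sigma * sqrt(2)))
    #                   = 0.5 * erfc(separation / (2 * sigma)).
    return 0.5 * math.erfc(separation / (2.0 * sigma))


def simulate_reversals(separation, sigma, n_trials=100_000, seed=0):
    """Monte Carlo check: left letter centred at 0, right letter at `separation`."""
    rng = random.Random(seed)
    flips = sum(
        rng.gauss(0.0, sigma) > rng.gauss(separation, sigma)
        for _ in range(n_trials)
    )
    return flips / n_trials


spacing = 17.4  # arc min, the 12-point Times Roman spacing at 40 cm quoted above
for sigma in (5.0, 10.0, 20.0):  # hypothetical noise SDs in arc min
    print(sigma, round(reversal_probability(spacing, sigma), 3),
          round(simulate_reversals(spacing, sigma), 3))
```

The predicted reversal rate rises monotonically with `sigma` and approaches 0.5 (no order information) as the noise swamps the separation, which is the qualitative prediction stated above.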
In this paper, we first present a simple probabilistic model embodying these concepts. The key parameter of the model is the standard deviation of the underlying position-noise distribution, representing the imprecision of position coding. In Experiment 1, we examined how well the model fits empirical data on the accuracy of identifying letters at various letter positions, and how distance from fixation affects the position-noise standard deviation. In this experiment, we measured the rate of mislocation errors when the identities of pairs of letters were known to the observers and the task was to indicate the relative position of the two letters. In Experiment 2, we extended the model to account for mislocation errors in the more important case of identifying pairs of unknown letters in the correct order. Experiment 2 included three letter sizes, enabling us to determine the impact of letter size on the model's noise standard deviation. We also examined whether a precue that guides spatial attention to the target location is beneficial because it reduces the position noise.
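Under the same equal-variance Gaussian assumption, an observed reversal rate at a given letter separation maps onto a single noise SD. The function below is a hypothetical illustration of that inversion for one condition only; it is not the fitting procedure used in this study, and a full fit would normally combine data across letter positions and conditions (for example by maximum likelihood).

```python
import math
from statistics import NormalDist


def sigma_from_reversal_rate(separation, p_reversal):
    """Solve p = Phi(-separation / (sigma * sqrt(2))) for sigma.
    Defined only for 0 < p_reversal < 0.5; p = 0.5 means no order information."""
    if not 0.0 < p_reversal < 0.5:
        raise ValueError("p_reversal must lie strictly between 0 and 0.5")
    z = NormalDist().inv_cdf(p_reversal)  # negative whenever p < 0.5
    return -separation / (math.sqrt(2.0) * z)


# Hypothetical reversal rates for letters 17.4 arc min apart.
for p in (0.05, 0.1, 0.2):
    print(p, round(sigma_from_reversal_rate(17.4, p), 2))
```

Higher reversal rates translate into larger estimated position noise, consistent with the model's prediction that mislocation errors grow with the width of the underlying distributions.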
Our model is similar in its basic concept to the “Overlap Model” recently described by Gomez, Ratcliff and Perea (2008). By assuming position uncertainty, the overlap model accounts for a number of well-known effects observed in the recognition of letter strings, including replacement errors, transposition errors, letter migration errors, insertion errors and repetition of letters. In the Gomez et al. study, subjects saw briefly presented (60 ms) strings of five letters and then tried to choose the target string from a pair of subsequently presented strings (two-alternative forced choice). The foils differed from the target strings by letter transpositions, letter insertions, etc. Accuracy data were interpreted using the overlap model. This model, with its six free parameters (Experiment 1), did well in modeling most types of letter response errors. Gomez et al. were primarily interested in position uncertainty associated with relative position within the string. For example, their results indicate that position uncertainty is least for the leading letter and increases monotonically, so that the final letter exhibits the largest uncertainty (their Table 3). Presumably, these findings are influenced both by bottom-up factors such as crowding and distance from fixation, and by top-down factors such as the lexical status of strings (word vs. non-word, their Experiment 2) and linguistic and memory effects in matching the target to the two alternatives in the forced-choice procedure. By contrast, our interest was focused primarily on early sensory coding, especially the impact of retinal eccentricity and character size.
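For comparison, the position-uncertainty idea behind the overlap model can be sketched in a few lines: each letter's perceived position is a normal distribution centred on its true slot, and the share of that distribution falling within each slot indicates how strongly the letter is assigned there. This is a simplified illustration, not Gomez et al.'s full six-parameter model; the function name `slot_overlap` and the SD values are hypothetical, chosen only to mimic the qualitative increase in uncertainty across serial positions described above.

```python
from statistics import NormalDist


def slot_overlap(true_pos, sd, n_slots):
    """Share of a letter's Gaussian position distribution (centre true_pos,
    SD sd, in letter-slot units) that falls within each slot 1..n_slots."""
    dist = NormalDist(mu=true_pos, sigma=sd)
    return [dist.cdf(j + 0.5) - dist.cdf(j - 0.5) for j in range(1, n_slots + 1)]


# Hypothetical SDs that grow with serial position; not fitted values.
sds = [0.4, 0.5, 0.6, 0.7, 0.8]
for pos, sd in enumerate(sds, start=1):
    print(pos, [round(p, 2) for p in slot_overlap(pos, sd, len(sds))])
```

Each row shows how much of a letter "leaks" into neighbouring slots; in the present study the analogous quantity is driven by eccentricity and letter size rather than by serial position within a string.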