The experiments varied word frequency and whether nonwords were pronounceable pseudowords or unpronounceable random strings of letters. The aim was to examine accuracy and the shapes of RT distributions for correct and error responses as a function of the two variables. The words were high-, low-, and very low-frequency words (with mean frequency values of 325, 4.4, and .37 per million, respectively; Kučera & Francis, 1967). Experiments 1 and 2 included all three levels of frequency, and Experiments 3 and 4 included only the high- and low-frequency words. In Experiments 1, 3, 5, and 6, the nonwords were pseudowords, and in Experiments 2 and 4, they were random letter strings. Experiments 5 and 6 examined the hypothesis (e.g., Glanzer & Ehrenreich, 1979) that word frequency effects are a product of strategies used by subjects, such that the choice of strategy depends on the proportions of high- versus low-frequency words in the experiment. In Experiment 5, 80% of the words were high-frequency words, and in Experiment 6, only 13% were high-frequency words.
Northwestern undergraduates participated in the experiments for credit in an introductory psychology class. Sixteen students participated in Experiment 1, 14 in Experiment 2, 15 in Experiment 3, 17 in Experiment 4, 15 in Experiment 5, and 9 in Experiment 6.
There were 800 high-frequency words, with frequencies from 78 to 10,600 per million (M = 325, SD = 645; Kučera & Francis, 1967); 800 low-frequency words, with frequencies of 4 and 5 per million (M = 4.41, SD = 0.19); and 741 very low-frequency words, with frequencies of 1 per million or no occurrence in Kučera and Francis's corpus (M = .365, SD = .48). All the very low-frequency words did occur in Merriam-Webster's Ninth Collegiate Dictionary (1990), and they were screened by three Northwestern undergraduate students; any word that any one of the three students did not know was eliminated.
From each word, a pseudoword was generated by randomly replacing all the vowels with other vowels (except for u after q), giving a pool of 2,341 nonwords. There was also a pool of 2,400 random letter strings, created by randomly sampling letters from the alphabet and then removing those strings that were pronounceable. The distributions of the numbers of letters per word for each type of word are shown in the table below. The random letter strings had the same proportions for each length as the word strings for the three frequency groups combined, and these proportions are also shown there.
Numbers of Words in the Stimulus Pools per Number of Letters and Type of Stimulus
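As a minimal sketch of the vowel-replacement rule just described (the function name and sample word are illustrative; the original generation software is not specified beyond this rule):

```python
import random

VOWELS = "aeiou"

def make_pseudoword(word, rng=None):
    """Replace each vowel with a randomly chosen different vowel,
    leaving the obligatory 'u' after 'q' untouched."""
    rng = rng or random.Random()
    letters = list(word.lower())
    for i, ch in enumerate(letters):
        if ch in VOWELS:
            if ch == "u" and i > 0 and letters[i - 1] == "q":
                continue  # keep the 'u' after 'q'
            letters[i] = rng.choice([v for v in VOWELS if v != ch])
    return "".join(letters)

print(make_pseudoword("mountain", random.Random(1)))  # a vowel-swapped pseudoword
```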
Stimuli were presented on a personal computer screen, with responses collected from the keyboard. Stimulus presentation and response recording were controlled by a real-time computer system.
Subjects were presented with strings of letters and instructed to decide whether each string was or was not an English word, pressing the / key for a word response and the z key for a nonword response. If a response was incorrect, the word "ERROR" was presented on the screen for 750 ms. The intertrial interval was 150 ms. Trials were grouped in blocks of 30; after each block, subjects had a self-paced break. The first block was used for practice and was not included in the data analysis.
In Experiment 1, 5 high-frequency, 5 low-frequency, 5 very low-frequency words, and 15 pseudowords were randomly selected without replacement for each of 50 blocks. No participant was ever presented with both a word and the pseudoword derived from it, and pseudowords were selected from the three pools in proportion to the words used in the experiment in this and subsequent experiments. Each subject was tested on 250 words of each type and on 750 pseudowords. The design of Experiment 2 was the same, except that the nonwords were random letter strings.
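The sampling scheme for Experiment 1 can be sketched as follows (a hypothetical reconstruction, not the original experiment software; `word_pools` maps frequency type to a word list and `pseudo_of` maps each word to its derived pseudoword):

```python
import random

def build_subject_blocks(word_pools, pseudo_of, n_blocks=50, per_type=5, rng=None):
    """Illustrative sketch of the Experiment 1 design: 50 blocks, each with
    5 high-, 5 low-, and 5 very low-frequency words plus 15 pseudowords,
    sampled without replacement. Pseudowords are derived only from words the
    subject will NOT see, so no one gets both a word and its pseudoword."""
    rng = rng or random.Random(0)
    n_words = n_blocks * per_type                  # 250 words per frequency type
    words, pseudos = {}, []
    for wtype, pool in word_pools.items():
        shuffled = rng.sample(pool, len(pool))
        words[wtype] = shuffled[:n_words]          # the words to be presented
        # Derive this subject's pseudowords from unused words of the same type,
        # keeping the nonwords proportional to the words used.
        pseudos += [pseudo_of[w] for w in shuffled[n_words:n_words * 2]]
    rng.shuffle(pseudos)
    blocks = []
    for b in range(n_blocks):
        block = [w for wtype in words
                 for w in words[wtype][b * per_type:(b + 1) * per_type]]
        block += pseudos[b * per_type * 3:(b + 1) * per_type * 3]
        rng.shuffle(block)                         # intermix words and nonwords
        blocks.append(block)
    return blocks
```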
In Experiments 3 and 4, we did not use the very low-frequency words to test whether their absence would change the results from Experiments 1 and 2. There were 50 test blocks, each composed of 8 high-frequency words, 7 low-frequency words, and 15 nonwords (pseudowords in Experiment 3 and random letter strings in Experiment 4).
In Experiment 5, in each of 50 blocks of trials, there were 12 high-frequency words, 2 low-frequency words, 1 very low-frequency word, and 15 pseudowords. In Experiment 6, there were also 50 blocks, each with 2 high-frequency words, 13 very low-frequency words, and 15 pseudowords.
Results From Experiments 1–6
Responses longer than 2,000 ms and shorter than 350 ms (around 0.6% of the responses across all the experiments) were eliminated from the analyses. Data from three subjects who stopped participating early were discarded. The table below shows error mean RTs and correct mean RTs, as well as standard errors for those RTs. It also provides predictions (discussed later) from the diffusion model. Observed and predicted .1 quantile RTs are also shown there.
Results From Experiments 1–6 as a Function of Stimulus Type
The three pools of words—high-, low-, and very low-frequency—had different numbers of words for each word length. Consequently, any observed effects of word frequency could be due to the differing distributions of word lengths. For each experiment, we analyzed the data using only four- and five-letter strings (which allowed us to almost equate word length) and found that the patterns of results were in each case similar to the ones found with all the stimuli. Thus, all of the analyses that we present are based on all the stimuli.
Correct Responses for Words: Accuracy and Mean RT
As shown in the table, the data replicated previous research: RTs increased and accuracy decreased as word frequency decreased, and responses were slower and less accurate when pseudowords were used in the experiment than when random letter strings were used. The differences in accuracy rates and RTs among the frequency conditions were larger with pseudowords than with random letter strings.
In all six experiments, the effects of word frequency were significant. In Experiment 1, with pseudowords, the difference in mean correct RTs between high- and low-frequency words was 68 ms, and the difference between low- and very low-frequency words was 40 ms, F(2, 30) = 188.45, MSE = 252 (p < .05 throughout this article). The difference in probability correct from high- to very low-frequency words was .167, F(2, 26) = 102.97, MSE = .0015. In Experiment 2, with random strings of letters as nonwords, responses were about 100 ms faster overall and between −.004 (high-frequency words) and .127 (very low-frequency words) more accurate relative to Experiment 1. The corresponding differences in mean RTs were reduced to 40 ms and 20 ms (cf. James, 1975; Neely, 1977) but were still significant, F(2, 26) = 50.28, MSE = 237. The decrease in accuracy from high- to low- and very low-frequency words was also reduced, to about .04, which was still significant, F(2, 26) = 16.21.

Experiments 3 and 4, which did not include very low-frequency words, showed the same patterns of results as Experiments 1 and 2. The RT difference between high- and low-frequency words was 66 ms with pseudowords as the nonwords, F(1, 14) = 119.94, MSE = 415, and 38 ms with random letter strings as the nonwords, F(1, 14) = 68.68, MSE = 172; the accuracy differences were .127 with pseudowords, F(1, 14) = 55.06, MSE = .0017, and .017 with random letter strings, F(1, 14) = 26.84, MSE = .0000934.
The manipulation of the proportion of high- versus low-frequency words had little effect on the patterns of results in Experiments 5 and 6 compared with Experiments 1 and 3 except that for Experiment 5, there were greater differences between RTs for high-, low-, and very low-frequency words. In Experiment 5, with a high proportion of high-frequency words, mean RTs and accuracy rates were similar to those in Experiment 1. The differences in mean RTs and accuracy across the three levels of word frequency were 89 ms and .072 between high- and low-frequency words and 61 ms and .117 between low- and very low-frequency words, F(2, 26) = 85.20, MSE = 915, for RT; and F(2, 26) = 103.56, MSE = .0012, for accuracy. In Experiment 6, the result of using a high proportion of very low-frequency words was to produce slower responses in all conditions relative to Experiments 1 to 5 and to reduce the difference between high- and very low-frequency words relative to Experiment 5. The difference between high- and very low-frequency words in mean response time was 100 ms and in accuracy rates was .130, both significant: F(1, 8) = 58.09, MSE = 1005; and F(1, 8) = 28.32, MSE = .0028, respectively.
The manipulation of the proportion of high-frequency words in Experiments 5 and 6 was similar to a manipulation that has been labeled frequency blocking. In frequency blocking, blocks of trials that include only high-frequency words are compared to blocks that include equal proportions of high- and low-frequency words. Generally, RTs for high-frequency words are shorter in blocks that include only high-frequency words (Glanzer & Ehrenreich, 1979; Gordon, 1983; G. O. Stone & Van Orden, 1993). This result contrasts with the results presented here: RTs for high-frequency words were only about 30 ms shorter in Experiment 5, where there was a large proportion of high-frequency words, than in Experiment 6, where there was a low proportion. Also, RTs for high-frequency words in Experiment 5 were longer than in Experiment 1, where the proportions of high- and lower frequency words were about equal. However, the difference in RTs between high- and very low-frequency words was larger in Experiment 5 than in Experiments 1 and 6, suggesting something like a frequency blocking effect on the difference between high- and low-frequency RTs rather than on the RT for high-frequency words alone.
Correct Responses for Nonwords
In general, correct nonword responses had about the same RTs as, or slightly shorter RTs than, the slowest word responses, which were those to the low- or very low-frequency words. Also, nonword RTs were shorter for random letter strings than for pseudowords (by about 70 to 200 ms), and responses to them were more accurate (by about .02 to .04). In Experiments 1 through 6, type of nonword was manipulated between experiments; Experiment 7 compared responses to random letter strings and pseudowords within the same experiment.
Error Responses

The effects of the two main variables—word frequency and type of nonword—on RTs for word stimuli were generally the same for error responses as for correct responses. Just as for correct RTs, error RTs decreased as word frequency increased, and error RTs were shorter when the nonwords were random letter strings than when they were pseudowords. Error RTs for nonwords were about the same as error RTs for low- and very low-frequency words.
The aspect of error responses that strongly constrains the diffusion model is their RT relative to the RT for correct responses. In the experiments with random letter strings as nonwords (Experiments 2 and 4), which were those with highest overall accuracy rates, the pattern was clear: Error RTs were shorter than correct RTs for both words and nonwords.
For Experiments 1, 3, 5, and 6—the experiments with pseudowords—the pattern was complex because there were individual differences (a complexity that is not shown in the table because the RTs reported there are means across subjects). For some subjects in the conditions with highest accuracy (high-frequency words), there were no or very few error responses. These subjects tended to be the slower and more accurate subjects. Because these subjects had few errors in the high-accuracy conditions, the error RTs in these conditions tended to come from fast, lower accuracy subjects (and so the table entries reflect these subjects). In addition to this speed–accuracy effect, we noticed that the fast subjects tended to produce errors faster than correct responses, and the slow subjects tended to produce errors slower than correct responses. To show this, the table below combines Experiments 1, 3, and 5 (Experiment 6 did not have low-frequency words) and splits the data so that accuracy and RTs for the fast and slow subjects are presented separately. Subjects put into the fast group (n = 24) had mean RTs shorter than the overall mean RT, and subjects put into the slow group (n = 21) had mean RTs longer than the overall mean RT. This split shows that error RTs for the fast subjects were shorter than correct RTs, whereas error RTs for the slow subjects were longer than or about the same as correct RTs (the correct RT − error RT difference was about 30 ms for fast subjects and about −20 ms for slow subjects, averaged over high- and low-frequency words and pseudowords in Experiments 1, 3, and 5). The fast subjects were also somewhat less accurate than the slow subjects (see Ratcliff et al., 1999, Experiment 1, for discussion of similar speed–accuracy differences across individual subjects).
Fast and Slow Subjects’ Mean RTs for Experiments 1, 3, and 5
The differing patterns for fast versus slow subjects' correct and error RTs provided one of the main constraints on fitting the diffusion model. The only means the model had to account for the differing patterns was to allow boundary separation and variability in starting point to vary between fast and slow subjects. If we required variability in starting point to be the same for fast and slow subjects, the range of starting points would be a larger proportion of the total boundary separation when the boundary separation was small than when it was large. This can produce fast errors relative to correct responses when boundary separation is small (fast subjects) and slow errors relative to correct responses when boundary separation is large (slow subjects). This pattern of fast versus slow errors also depends on the other parameters; for different combinations of parameter values, the pattern could also be all fast errors or all slow errors for both speed and accuracy conditions (see Ratcliff & Rouder, 1998; Ratcliff et al., 1999).

To examine RT distributions, we used the RTs of each subject to calculate five quantile RTs: the .1, .3, .5, .7, and .9 quantiles. Then we averaged the quantiles across subjects to form the average quantiles shown below. (The average quantiles are not Vincent averages [Vincent, 1912], which are the averages of means of the RTs within bins [Ratcliff, 1979]; instead, the averages used here are averages over individual subjects' quantile RTs. Average quantiles were used because it was more efficient for the model to generate predictions for quantiles.)
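A minimal sketch of this quantile-averaging step (per-subject RT arrays are assumed; the trimming bounds come from the analyses described earlier):

```python
import numpy as np

QUANTILES = [0.1, 0.3, 0.5, 0.7, 0.9]

def average_quantiles(rts_by_subject):
    """Quantile RTs per subject, averaged across subjects
    (an average of subject quantiles, not a Vincent average)."""
    per_subject = []
    for rts in rts_by_subject:
        rts = np.asarray(rts)
        rts = rts[(rts >= 350) & (rts <= 2000)]  # trim outliers, as in the text
        per_subject.append(np.quantile(rts, QUANTILES))
    return np.mean(per_subject, axis=0)

# Illustrative use with simulated RTs (ms) for three subjects:
rng = np.random.default_rng(0)
fake = [rng.gamma(4.0, 80.0, size=500) + 400 for _ in range(3)]
print(average_quantiles(fake))
```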
RT Distributions for Correct Responses, Experiments 1–6
Figure 3 (below) shows the five quantiles for correct responses for the various types of word stimuli and for nonwords for Experiments 1 and 2. The data are represented by the crosses. The dark gray dots are the output of Monte Carlo simulations of the diffusion model, the +s are best-fitting values from the diffusion model, and the light gray dots are bootstrap samples designed to show the range of the data if the experiment were repeated; these are discussed later. The leading edges of the RT distributions are represented by the .1 quantiles (the lowest cross in each column), whereas their skews are represented by the spread of the higher quantiles. Overall, the RT distributions were positively skewed (i.e., larger separation among the higher quantiles than among the lower quantiles), the typical result in RT studies. When mean RT increased across the word frequency conditions, the distributions moved in both leading edge and spread, with the larger part of the increase in the mean coming from increasing spread of the higher quantiles. Although the effects in the leading edges of the distributions were small, they were significant. When the nonwords were pseudowords (Experiments 1, 3, 5, and 6), the leading edge shifted among the different word frequency conditions more than when the nonwords were random letter strings (Experiments 2 and 4), as is shown in Figure 3 for Experiments 1 and 2 and in the results table for Experiments 3 through 6. With pseudowords, averaging across experiments, the .1 quantile RT for high-frequency words was about 40 ms shorter than the .1 quantiles for low- and very low-frequency words. With random letter strings as nonwords, this difference in the .1 quantile RTs was smaller: 13 ms in Experiment 2 and 14 ms in Experiment 4 (see the results table). The leading edges, as measured by the .1 quantile RT, varied significantly as a function of word frequency: Experiment 1, F(2, 30) = 81.37, MSE = 105.40; Experiment 2, F(2, 26) = 23.21, MSE = 55.23; Experiment 3, F(1, 14) = 112.68, MSE = 126.73; Experiment 4, F(1, 16) = 34.67, MSE = 47.12; Experiment 5, F(2, 26) = 35.78, MSE = 292.67; Experiment 6, F(1, 8) = 27.79, MSE = 385.31.
Figure 3. Empirical and predicted .1, .3, .5, .7, and .9 quantiles for the response time (RT) distributions in Experiments 1 and 2. The ×s are quantile RTs plotted against accuracy values calculated from the data. The +s are the predicted values from the diffusion model.
There were six main features of the data for modeling:
- For words, accuracy increased and RT decreased (for both correct and error responses) as word frequency increased, and this was true whether the nonwords were random letter strings or pseudowords. The differences between the high- and low-frequency conditions were larger when the nonwords were pseudowords.
- For words, RTs were shorter and accuracy was higher when the nonwords were random letter strings than when they were pseudowords.
- For nonwords, correct responses had about the same RTs as correct responses for the slowest words. Responses were faster for random letter strings than for pseudowords, and accuracy was a little higher.
- Most of the differences in RTs that occurred with increased word frequency were due to decreased skew of the RT distribution.
- However, when the nonwords were pseudowords, there was a moderately large effect of frequency on the leading edge of the RT distribution: The leading edge for high-frequency words was shorter by 40 ms than the leading edges of the RT distributions for lower frequency words and nonwords. When the nonwords were random letter strings, the differences were considerably smaller, about 13–14 ms, but still significant.
- With random letter strings, error RTs were shorter than correct RTs. Error RTs were also shorter than correct RTs with pseudowords but only for fast subjects; for slow subjects, error RTs were about the same as or longer than correct RTs (we present error RT distributions later).
Overall, these six features of the data provide severe constraints on fitting the diffusion model. With only six parameters plus one value of drift rate for each type of word (high-, low-, and very low-frequency) and each type of nonword (pseudowords and random letter strings), the model is required to fit the effects of word frequency and type of nonword on the complete sets of data: mean correct and error RTs for words and nonwords; accuracy rates for words and nonwords; the shapes of the RT distributions, including both skew and leading edge for correct responses for words and nonwords; and the relative speeds of correct versus error responses for words and nonwords.
Method for Fitting the Diffusion Model to Data
To fit the diffusion model to the data, we formed a chi-square statistic and minimized it by adjusting the parameter values with a general SIMPLEX minimization routine. The data entered into the minimization routine for each experimental condition were the five quantile RTs, averaged across subjects, for correct and error responses, plus the accuracy values. The quantile RTs were fed into the diffusion model, and for each quantile, the cumulative probability of a response by that point in time was generated from the model. Subtracting the cumulative probability for each quantile from that for the next higher quantile gives the expected proportion of responses between each pair of quantiles; these proportions, multiplied by the number of observations, are the expected frequencies (E) for the chi-square computation. The observed proportions of responses between the quantiles are the proportions of the distribution between successive quantiles (i.e., the proportions between the 0, .1, .3, .5, .7, .9, and 1.0 quantiles are .1, .2, .2, .2, .2, and .1) multiplied by the probability correct for correct response distributions or by the probability of error for error response distributions (and multiplied by a number proportional to the number of observations in the condition) to give the observed frequencies (O). In a few cases, there were too few error RTs (fewer than five) to compute error RT quantiles for high-frequency words for more than one or two subjects; in these cases, the error RTs did not contribute to the fit, that is, no chi-square value was computed for those conditions for error responses. Summing (O − E)²/E over correct and error responses for each type of word and nonword gives a single chi-square value to be minimized:

χ² = Σ (O − E)²/E.
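In outline, the computation can be sketched as follows (a minimal sketch under stated assumptions: the diffusion model's defective cumulative RT distributions are represented by a `make_cdfs` function that is assumed rather than implemented, and the data layout is invented for illustration):

```python
import numpy as np
from scipy.optimize import minimize

BIN_PROPS = np.array([.1, .2, .2, .2, .2, .1])  # mass between 0, .1, .3, .5, .7, .9, 1.0 quantiles

def condition_chi_square(model_cdf, quantile_rts, p_resp, n_obs):
    """Chi-square contribution of one response type in one condition.
    model_cdf(t): model's cumulative probability of this response by time t
    (model_cdf(np.inf) is assumed to give its total probability); p_resp:
    observed probability of this response; n_obs: observations in the condition."""
    cum = np.array([model_cdf(t) for t in quantile_rts])
    cum = np.concatenate(([0.0], cum, [model_cdf(np.inf)]))
    expected = np.diff(cum) * n_obs                 # expected frequencies (E)
    observed = BIN_PROPS * p_resp * n_obs           # observed frequencies (O)
    return np.sum((observed - expected) ** 2 / np.maximum(expected, 1e-10))

def total_chi_square(params, conditions, make_cdfs):
    """Sum the chi-square over conditions and over correct and error responses,
    skipping error quantiles in conditions with too few errors."""
    total = 0.0
    for cond in conditions:
        correct_cdf, error_cdf = make_cdfs(params, cond)  # assumed, not implemented
        total += condition_chi_square(correct_cdf, cond["q_correct"],
                                      cond["p_correct"], cond["n"])
        if cond.get("q_error") is not None:
            total += condition_chi_square(error_cdf, cond["q_error"],
                                          1 - cond["p_correct"], cond["n"])
    return total

# SIMPLEX minimization over a parameter vector x0:
# fit = minimize(total_chi_square, x0, args=(conditions, make_cdfs), method="Nelder-Mead")
```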
In research on fitting the diffusion model to data with the chi-square method (Ratcliff & Tuerlinckx, 2002), it was found that parameter values could not be recovered accurately when there were enough long or short outlier RTs in the data to seriously affect the quantile RTs. In fitting the data reported here, we removed short outliers by trimming out responses shorter than 350 ms (e.g., Swensson, 1972), and we also removed very long outliers (longer than 2,000 ms). Ratcliff and Tuerlinckx explicitly modeled remaining contaminants by assuming that the contaminants in each experimental condition came from a uniform distribution with maximum and minimum values corresponding to the maximum and minimum RTs in the condition. We performed the fits for the data here both with and without this assumption about contaminants. We found little difference between the two sets of fits and report the fits without the contaminant assumption.
The quality of fits of a model to data can sometimes be compromised by averaging. For example, in most of the experiments presented here, there were large differences among subjects in their overall accuracy values (e.g., 10%–15%). Pooling all of the data from all the subjects can provide a picture that is not representative of any single subject. To check for this problem, we computed the accuracy, mean RT, and .1 quantile values for each subject (in each condition) and averaged these values across subjects. The resulting averages looked much like the typical subject, providing reassurance that averaging over subjects as we did for the fits reported here did not introduce biases. This replicated a finding from earlier studies (Ratcliff, Thapar, & McKoon, 2001; Thapar, Ratcliff, & McKoon, 2003) in which the diffusion model was fit both to data pooled over all subjects and to individual subjects; there were no systematic differences between the parameter values in the two cases.
In fitting the model to the data from each experiment, all the parameters were held constant across the conditions of the experiment except drift rate. The parameters held constant were as follows: the starting point of the diffusion process (z), the across-trial variability in starting point (sz), the boundary separation (a), the nondecision time (Ter), the across-trial variability in nondecision time (st), and the across-trial variability in drift rate (η). Drift rate (v) varied for words of the three different frequencies and for the two types of nonwords. Within-trial variability (s) is a scaling parameter (meaning that the same fits could be obtained with another value of s by rescaling the rest of the parameters), and its value was fixed at s = .1 for consistency with other published fits of the diffusion model.
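To make the roles of these parameters concrete, here is a minimal single-trial simulation (an illustrative Euler approximation with invented parameter values; a sketch, not the fitting code used in the article):

```python
import numpy as np

def simulate_trial(v, a, z, sz, eta, Ter, st, s=0.1, dt=0.001, rng=None):
    """One diffusion trial: returns (response, RT in seconds).
    Across trials, drift is normal with SD eta, the starting point is uniform
    with range sz, and the nondecision time is uniform with range st."""
    rng = rng or np.random.default_rng()
    drift = rng.normal(v, eta)                      # this trial's drift rate
    x = rng.uniform(z - sz / 2, z + sz / 2)         # this trial's starting point
    t = 0.0
    while 0 < x < a:                                # accumulate evidence to a boundary
        x += drift * dt + s * np.sqrt(dt) * rng.normal()
        t += dt
    t += rng.uniform(Ter - st / 2, Ter + st / 2)    # add nondecision time
    return ("word" if x >= a else "nonword", t)

# Invented illustrative values, on the model's usual scale (s = .1):
print(simulate_trial(v=0.3, a=0.11, z=0.055, sz=0.03, eta=0.12, Ter=0.4, st=0.1,
                     rng=np.random.default_rng(7)))
```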
The best-fitting parameter values are shown in the table below. The values of the boundary separation and starting point parameters (a and z) were highly consistent across the experiments. The type of nonword—pseudowords in Experiments 1, 3, 5, and 6 versus random letter strings in Experiments 2 and 4—produced a small difference in Ter and in st: The values were smaller with random letter strings. However, it was not clear whether this was a systematic or a random effect; it was not obtained in Experiments 7, 8, and 9. The parameters for variability in drift (η) and variability in starting point (sz) showed no systematic differences due to pseudowords versus random letter strings (except that η and sz were a little higher in Experiments 2 and 4).
Diffusion Model Best-Fitting Parameters for Experiments 1–7
The only large and reliable effects on parameter values were the effects on drift rates of word frequency and type of nonword. Not surprisingly, the drift rate was higher for high-frequency words than for low-frequency words and was higher for low-frequency words than for very low-frequency words; in addition, the drift rate for random letter strings had a larger negative value than the drift rate for pseudowords.
Of the drift rate effects, the one that might be thought surprising in the context of some models (e.g., G. O. Stone & Van Orden, 1993) was that drift rate alone captured almost all of the effect of the type of nonword on word RTs. The differences in RTs among words of different frequencies were larger with pseudowords than with random letter strings, and this was accounted for by differences in drift rates: Differences among the drift rates were larger with pseudowords than with random letter strings. In particular, although the drift rates for high-frequency words were about the same in all the experiments, the drift rates for the low- and very low-frequency words were lower when the nonwords were pseudowords. This is clearly observable in the parameter table, where the drift rates for low- and very low-frequency words in Experiments 1, 3, and 5 are always numerically smaller than their drift rates in Experiments 2 and 4.
Differences in drift rates also accounted for the shift in the leading edge of the RT distribution for high-frequency words relative to lower frequency words. The leading edge of the high-frequency word distribution was about 40 ms shorter than the leading edges of the low-frequency and very low-frequency word distributions when the nonwords were pseudowords. The diffusion model accurately captured this with only differences in drift rate as a function of word frequency (see the results table). The diffusion model accommodated the shift in leading edge for the reasons discussed in the introduction.
The results table above shows fits of the model to the data for correct and error mean RTs, accuracy values, and .1 quantile RTs for correct responses, along with standard errors in these quantities. Figure 3 is designed to show the model's goodness of fit graphically for the data from Experiments 1 and 2. In the figure, the ×s are the experimental data and the +s are the predicted values from the model with the best-fitting parameter values.
We calculated two different estimates of variability, one using a graphical Monte Carlo method (Ratcliff & Tuerlinckx, 2002) to show variability in the model's predictions and the other using a bootstrap method to show variability in the data. For the graphical Monte Carlo method, for each experiment, we first generated sets of simulated data, each set with the same number of observations as for each subject in the experiment. We repeated this to produce the same number of data sets as there were subjects. For each data set, quantile RTs and accuracy values were calculated, and these were averaged over data sets (in the same way the experimental data were averaged over subjects). This was repeated 100 times, and the dark gray dots in Figure 3 plot the quantile RTs and accuracy values for each of the 100 replications (see Ratcliff & Tuerlinckx, 2002). Variability in accuracy values is represented by the scatter of the dots along the x-axis, and variability in the quantile values is represented by scatter along the y-axis.

The bootstrap method we used allows an estimate of the variability that would result if the experiment were rerun with new subjects. We used two levels of random selection. First, for each subject in the experiment, we sampled with replacement from the experimental data for that subject to generate a new set of bootstrap data for the subject. Second, we sampled with replacement from the subjects to produce a new set of subjects, and for each of these subjects, we used the bootstrap data we had generated (as just described). The idea was to represent what would happen with a different random sample of subjects than those who actually participated in the experiment (see Efron, 1982). For each of the simulated subjects, we calculated their quantile RTs and accuracy values and averaged these across subjects in the same way as for the experimental data. We repeated this 100 times, and the resulting values are plotted as the light gray dots in Figure 3.
Figure 3 shows that, for Experiments 1 and 2, the two types of simulated data overlap each other, which means that the model predictions vary in the same ways as would be expected from the experimental data. Although only the results for Experiments 1 and 2 are displayed in the figure, we did the same simulations for Experiments 3 through 6. Across all six experiments, only 9 of the 105 quantile RTs predicted from the model with the best-fitting parameter values (the +s) fell outside 2 SE confidence intervals for the bootstrap simulated data (light gray dots), and only 24 of the 105 data points (the ×s) fell outside 2 SE confidence intervals for the Monte Carlo simulated predictions of the model (dark gray dots).
The results table above provides values of mean RT and accuracy for the data, standard errors for each, and the predicted values from the model with the best-fitting parameter values. These standard errors supplement the Monte Carlo and bootstrap studies. All except two differences between the observed and predicted accuracy values were within .025, all except two differences between the observed and predicted mean RTs were within 25 ms, and all except one difference between the observed and predicted .1 quantile RTs were within 16 ms. Error RTs are more variable because they are based on many fewer observations; all except one difference between the predicted and observed values were within 40 ms.
In lexical decision, the discriminability of words from nonwords is reflected in both accuracy and RT, and so a model must account for both dependent measures. A measure based on accuracy alone, such as d′, ignores the other dependent variable, RT. Measures of RT alone ignore trade-offs in accuracy that accompany changes in speed. The diffusion model provides an integrated account of both speed and accuracy. The RT and accuracy data from Experiments 1 through 6 are translated by the model into drift rates, which are measures of discriminability for the various experimental conditions.
The important result of the experiments is that variations in drift rate account for all the observed effects of word frequency and type of nonword on RT and accuracy. Discrimination of words from nonwords, measured by drift rate, is better for high- than lower frequency words, and it is better when the nonwords are random letter strings than when they are pseudowords. The interaction between these two variables is also a matter of discrimination: The difference in discriminability between high- and lower frequency words is larger with pseudowords than with random letter strings.
Figure 4 (top panel) shows how discriminability can be represented in a two-dimensional signal-detection framework. Two is the smallest number of dimensions that can adequately accommodate the relative drift rates for the various classes of stimuli. Distances among the classes of stimuli in the space represent differences in their wordness values. How the values from the figure enter into the diffusion decision process can be understood as follows: Suppose that the decision process has multiple sources of information feeding into it from the lexicon, including semantic, phonological, orthographic, and other kinds of lexical information. With random letter strings as the nonwords in an experiment, all of these sources of information are valid indicators of whether a stimulus is a word. In effect, if a string of letters looks like a word, if its phonemes are wordlike, and if it has meaning, these sources of evidence combine to produce a high value of wordness and hence a high drift rate toward the "word" decision boundary. But with pseudowords in the experiment instead of random letter strings, some of the sources of information, especially orthographic information, are less reliable and so are not used (or are weighted much less) in determining drift rate. For lack of better insight into what the two dimensions might unambiguously represent, we label them lexical strength and orthographic wordlikeness. Two-dimensional representations like the one proposed here have been suggested previously, for example, by G. O. Stone and Van Orden (1993).
Figure 4. Top: An illustrative two-dimensional signal-detection view of distances between the different kinds of stimuli in the experiments. HF = high-frequency words, LF = low-frequency words, VLF = very low-frequency words, PW = pseudowords, and RL = random letter strings.
The two-dimensional representation can be embedded for illustrative purposes in a two-dimensional signal-detection framework (e.g., Ashby, 2000). In the top panel of Figure 4, pseudowords (PW) are considerably lower than words on the lexical strength dimension but only a little lower on the orthographic dimension. Random letter strings (RL) are considerably lower on both dimensions than words and pseudowords. High-frequency words are higher on the lexical strength dimension than low-frequency words, which are higher than very low-frequency words. Differences in drift rates between the different item types are given by distances between them in the two-dimensional space. When the nonwords are random letter strings, both dimensions figure into the determination of the distances. The distance between words and nonwords is represented by x in the figure, and the distances among the high-, low-, and very low-frequency words are represented by u, where u is determined by projecting the differences among the high-, low-, and very low-frequency words onto the diagonal (as shown by the dashed lines in the figure). When the nonwords are pseudowords, the orthographic dimension is not reliable, and so distances are computed on the lexical strength dimension alone. The distance between words and nonwords is represented by y, and the distances among the three types of words are represented by v. The relative distances determine the relative values of the drift rates that enter the diffusion model, as shown in the bottom panel of Figure 4.