Four rats' choices between two levers were differentially reinforced using a runs-test algorithm. On each trial, a runs-test score was calculated based on the last 20 choices. In Experiment 1, the onset of stimulus lights cued when the runs score was smaller than criterion. Following cuing, the correct choice was occasionally reinforced with food, and the incorrect choice resulted in a blackout. Results indicated that this contingency reduced sequential dependencies among successive choice responses. With one exception, subjects' choice rule was well described as biased coin flipping. In Experiment 2, cuing was removed and the reinforcement criterion was changed to a percentile score based on the last 20 reinforced responses. The results replicated those of Experiment 1 in successfully eliminating first-order dependencies in all subjects. For 2 subjects, choice allocation was approximately consistent with nonbiased coin flipping. These results suggest that sequential dependencies may be a function of reinforcement contingency.
The variability of a series of responses, distributed between some alternatives such as left (L) and right (R) levers, has been defined in terms of two properties from the concept of randomness (Neuringer, 2002). First, variability is high if each member of a set is as frequent (overall) as any other member of the set, that is, the relative frequencies (or probabilities) of different response alternatives are similar, as in a uniform probability distribution. Second, variability is high if the relative frequencies of all higher-order sequential combinations, such as dyads, triads, etc. are also (over the long run) equal. The former implies a property of equiprobability, and the latter implies that of sequential independence.
Previous research aimed at producing highly variable performance has used reinforcement contingencies that are based on the relative frequencies of the response alternatives. In most studies, these contingencies have involved frequency-dependent selection. For example, Page and Neuringer (1985) reinforced responses when they had not occurred in the last N trials, whereas Machado (1992) reduced reinforcer likelihood when the frequency of a response increased. These and other studies (Blough, 1966; Bryant & Church, 1974; Denney & Neuringer, 1998; Machado, 1989; Pryor, Haag, & O'Reilly, 1969; Schoenfeld, Harris, & Farmer, 1966; Shimp, 1967) all reinforced response alternatives that had a low (or zero) frequency in the recent past.
In many experiments, a single trial consisted of the emission of a response unit, defined by the reinforcement contingency, comprising a four-response sequence of binary choices, such as left (L) and right (R) responses. When observed probabilities of the 16 ($2^4$) possible response combinations (e.g., RLRR) were equal, the behavior was deemed to have maximum variability. By definition, any bias in the frequency distribution of the alternatives indicates reduced variability, and exclusive emission of any particular sequence constitutes minimal variability. Thus, such studies were concerned chiefly with the relative frequencies of response alternatives. They attempted to control response bias by reinforcing response distributions that exhibit maximum dispersion (Abreu-Rodrigues, Lattal, dos Santos, & Matos, 2005; Cherot, Jones, & Neuringer, 1996; Cohen, Neuringer, & Rhodes, 1990; Denney & Neuringer, 1998; Doughty & Lattal, 2001; Machado, 1989; McElroy & Neuringer, 1990; Miller & Neuringer, 2000; Mook, Jeffrey, & Neuringer, 1993; Morgan & Neuringer, 1990; Morris, 1987, 1989, 1990; Neuringer, 1991, 1992, 1993; Neuringer, Deiss, & Imig, 2000; Neuringer & Huntley, 1991; Odum, Ward, Barnes, & Burke, 2006; van Hest, van Haaren, & van de Poll, 1989).
Frequency-dependent reinforcement can be used to create sequential independence as well as equiprobability, although it may require a set of more than eight response alternatives. Machado (1992, 1993) systematically investigated the necessary and sufficient conditions of random-like performance. Using a set of two response alternatives (L, R) as targets of a frequency-dependent selection, he found pigeons had a significant tendency to alternate responses: LRLRLR…. Next, using sequences involving two successive responses as targets (LL, LR, RL, RR), some, but not all, pigeons performed double alternation patterns successfully; however, when he used all possible combinations of three-response sequences to define target sets (i.e., LLL, LLR, LRL, LRR, RLL, RLR, RRL, RRR), then all pigeons performed randomly. The results suggest that the last procedure suffices to engender random-like behavior in that all of the possible response sequences have the same strength. If all are equiprobable, then sequential dependencies cannot be present.
It is, however, important to underscore that sequential independence can be achieved even when individual response alternatives are not equally probable (Nickerson, 2002). To illustrate our rationale, consider a case involving two mutually exclusive events, such as heads (H) or tails (T) in a coin toss. An alternation pattern of HTHTHT… shows that the H and T are equiprobable, thereby meeting one standard of randomness; however, it fails a second standard of unpredictability because event order is perfectly predictable based on first order conditional probability. Conversely, sequential independence among events H and T is possible when these two events are not equiprobable [e.g., p(T) > p(H), as when a coin is biased], but their conditional probabilities may reveal independence of a coin's head and tail [i.e., p(H|T) = p(H) and p(T|H) = p(T)]. In a relevant experiment, Machado (1994) used frequency-dependent selection to shape molar response proportions toward various equilibrium values between 0 and 1, and examined sequential dependencies in local response sequences. The procedure successfully altered molar response proportions, and at extreme values, local performance fell midway between biased randomness (sequential independence) and stable sequences (which imply successive dependence). That is, when molar response proportions deviated from .5, stable local patterns that were present at .5 broke down, although not to the extent that they conformed to biased coin flipping.
A more direct approach to controlling sequential dependencies might be more successful in achieving sequential independence, and hence, random-like behavior. One aim of our research is to present an approach based on the idea that run-length frequencies can serve as a basis for contingent reinforcement. Using such a contingency, we can ask whether reinforcement of certain run-length distributions, expected from a putatively “random” source, leads to random-like behavior. To ensure that a reinforcement contingency targets sequential dependency per se, the procedure must have an impact on the sequential dependency of interest but leave the relative frequencies of responses unaffected. That is, the ideal procedure must separate the influence on sequential dependency from any influence on relative frequencies of responses. The procedure we developed here is derived from the runs-test algorithm for randomness from Siegel (1956). A run is defined as an uninterrupted sequence of identical elements delimited by different elements. The number of runs in a sequence equals the number of response alternations plus one. Generally, when the observed number of runs is significantly different from the expected number of runs, calculated according to overall response proportion, the runs test rejects the null hypothesis that the sequence was independent. Plainly, when alternation occurs either too infrequently or too frequently in the sequence, this sequence is regarded as including a certain regular pattern, and the null hypothesis will likely be rejected.
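Our procedure reinforced, on each trial, an L or R response whose runs-test score was smaller in absolute value than the critical value of the runs test. When the symbol K represents an observed number of runs, the expected number of runs ($\mu_K$) and its variance ($\sigma_K^2$) were computed according to the following equations:

$$\mu_K = \frac{2 n_R n_L}{n_R + n_L} + 1 \tag{1}$$

$$\sigma_K^2 = \frac{2 n_R n_L \,(2 n_R n_L - n_R - n_L)}{(n_R + n_L)^2 \,(n_R + n_L - 1)} \tag{2}$$

in which $n_R$ and $n_L$ represent the number of R and L responses, respectively, in a sample sequence. Then, the runs-test score, S, was calculated as follows:

$$S = \frac{K - \mu_K}{\sqrt{\sigma_K^2}} \tag{3}$$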
When nR and nL are large, the distribution approaches the normal distribution and S (Equation 3) is a normal unit variable (hence the familiar value of ±1.96 for alpha = .05). We discuss the relation between the distribution and our procedure further in the General Discussion.
Our procedure used an algorithm that calculated S (from the last 20 responses) every time a response was emitted, and compared it with a critical value to determine whether reinforcement would be delivered. With a fixed sample size of 20, we needed only two parameters for calculation: the proportion of emitted responses [p(L) = 1 − p(R)], and the number of runs. We initially set two critical boundary values for S, ±1.96. Over 20 responses, comprising Rs and Ls, observed S values that fell within these boundaries were eligible for reinforcement. Note that within wide limits, the use of a runs-test score does not require any given proportion of L and R responses for reinforcement. For example, suppose nR and nL were 4 and 16, respectively [p(L) = .8], and K was 4. In this case, the score would be −2.52, the null hypothesis would be rejected, and reinforcement would not be given for the last response. With the same frequencies for L and R but with K = 6, however, the score is −1.04 and is eligible for reinforcement. As this case illustrates, subjects could satisfy the contingency even if the response proportion was quite strongly biased.
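The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the experiments: it assumes the standard runs-test mean and variance, and the names `runs_score` and `is_eligible`, the two example sequences, and the handling of one-sided sequences are assumptions made for the example.

```python
import math

def runs_score(seq):
    """Runs-test score S for a sequence of 'L'/'R' choices.

    Assumes the standard runs-test expectation and variance.
    Returns None when the variance is zero (e.g., only one lever was
    pressed); treating that case as undefined is an assumption.
    """
    n_r = seq.count("R")
    n_l = seq.count("L")
    n = n_r + n_l
    # Number of runs = number of alternations + 1.
    k = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
    mean = 2 * n_r * n_l / n + 1
    var = 2 * n_r * n_l * (2 * n_r * n_l - n) / (n ** 2 * (n - 1))
    if var <= 0:
        return None
    return (k - mean) / math.sqrt(var)

def is_eligible(seq, critical=1.96):
    """True when |S| for the sample falls inside the critical boundaries."""
    s = runs_score(seq)
    return s is not None and abs(s) < critical

# Two hypothetical 20-response samples, each with nR = 4 and nL = 16.
four_runs = "RRLLLLLLLLRRLLLLLLLL"  # K = 4
six_runs = "RLLLLRRLLLLLLRLLLLLL"   # K = 6

print(round(runs_score(four_runs), 2), is_eligible(four_runs))  # -2.52 False
print(round(runs_score(six_runs), 2), is_eligible(six_runs))    # -1.04 True
```

The two samples reproduce the scores quoted in the text (−2.52 and −1.04) for a response proportion of p(L) = .8, which illustrates that eligibility depends on the number of runs rather than on the response proportion itself.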
In Experiment 1, we introduced the new reinforcement contingency in a modest way, that is, stimulus lights above levers were used as a conditioned reinforcer, because a previous study demonstrated that the effect of a contingency on behavioral variability was stronger under conditioned reinforcement (Cherot et al., 1996), and was maintained in a delayed-reinforcement situation (Odum et al., 2006; Wagner & Neuringer, 2006). Accordingly, stimulus lights were illuminated in Experiment 1 when a subject's performance fell within the criterion range, and a primary reinforcer was provided with p = .1 in that state. Next, in Experiment 2, we removed the conditioned reinforcers and examined the effect of direct reinforcement with a more sophisticated experimental design.
In Experiment 1, we examined the effect of the runs-test contingency with a conditioned reinforcer. We reinforced responses that produced S scores within a required range, but with low probability (.1). To help establish responding that met criteria for sequential independence, we used stimulus lights as a conditioned reinforcer. Two stimulus lights, one above each of two levers, were illuminated when the score of the runs test was within a criterion range, whereas they were extinguished when the score was outside this range. Thus, if a response occurred that met the runs criterion, and the stimulus lights were off, then stimulus lights were turned on. If the lights were already on, then they remained on for as long as successive responses continued to meet the criterion. If the lights were on and the response did not meet criterion, then they were turned off. If the lights were already off and the response did not meet the criterion, they remained off.
Reinforcement occurred only for those responses that met the stipulated runs criteria. Thus, responses that initiated or maintained illumination (i.e. lights on) sometimes received primary reinforcement. Although the aim was to extinguish responses that did not meet the runs criterion, it was necessary to reinforce some of these responses early in the experiment in order to prevent complete extinction in subjects that exhibited low behavioral variability. Accordingly, responses that maintained the lights in the off state did receive some reinforcement at the beginning of this experiment, but the frequency of this reinforcement was lower than for criterial responses.
Four male Wistar rats were maintained at approximately 80% of their free-feeding body weights. Water and sawdust were continuously available in their home cages where a 12-hr light-dark cycle was in effect. At the beginning of the experiment, two 46-week-old subjects (Rat 1 and Rat 3) had previous experience with variability reinforcement schedules; one 48-week-old subject (Rat 5) only had experience with lever-press training; and the 4th subject (Rat 9), which was 32 weeks old, had experience under a concurrent-chains schedule.
The experimental chamber was 210 mm long by 280 mm wide by 270 mm high, and was enclosed in a sound-dampening box. The chamber had a ceiling and side walls constructed of Plexiglas and front and back walls of metal. The front wall contained two shielded stimulus lights (white 28-V bulbs), 120 mm above the floor and 100 mm apart. Two response levers, requiring a force of 0.15 N to operate, were located 70 mm above the floor and 80 mm apart measured center to center. A pellet tray that received 45-mg food pellets was centered between the levers 20 mm above the floor. A shielded houselight (28-V bulb) was on top of the back wall. A speaker for presenting white noise and a ventilating fan were attached on the outer box. All experimental devices were controlled and monitored by a MED-PC version 2.0 system.
Because all rats had previous experimental experience, they were placed immediately in the runs-test procedure. A session consisted of 440 trials per day, and a trial consisted of a single response, L or R. Responses could occur freely except that each one turned off the houselight for 0.2 s, during which further responses had no effect.
After the first 20 responses of the session, each response yielded an S score. If the absolute value of the runs-test score fell within stipulated boundaries, shown as the unshaded cells in Figure 1, then stimulus lights were turned on and a food pellet was delivered with p = .1. At the beginning of the experiment, none of the animals met the criterion. For responses that maintained a lights-off state, responses were reinforced also with p = .1 if the current score was closer to zero than the two previous scores (for responses that turned off the light, this condition could not be met).
The criterion for receipt of a food pellet became stricter as training progressed. In the first experimental condition the critical value on the runs test was set to |±1.96| and the training continued until performances became stable. After performance attained stability, food delivery on light-off trials was terminated. Then, in the second condition the critical value was changed from |±1.96| to |±1.39|, and the training continued until performances became stable.
Sessions continued until the relative frequencies of R responses and the number of alternations were judged to be stable under the following criterion: the last nine sessions were divided into three blocks and the largest difference between the medians of the three blocks was within 15% of the average of the last nine sessions.
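One possible reading of this stability criterion is sketched below. The function name `is_stable` is hypothetical, and two interpretive assumptions are made: the three blocks are consecutive blocks of three sessions, and "within 15%" means that the range of the block medians does not exceed 15% of the nine-session mean.

```python
from statistics import mean, median

def is_stable(session_values, tolerance=0.15):
    """Check the stability criterion on one per-session measure.

    session_values: per-session values (e.g., relative frequency of R
    responses, or number of alternations), oldest first.
    """
    last9 = list(session_values)[-9:]
    if len(last9) < 9:
        return False  # not enough sessions to evaluate the criterion
    # Assumption: blocks are consecutive groups of three sessions.
    blocks = [last9[0:3], last9[3:6], last9[6:9]]
    medians = [median(block) for block in blocks]
    largest_difference = max(medians) - min(medians)
    return largest_difference <= tolerance * mean(last9)
```

Under this reading, the criterion would be applied separately to each measure, and training would continue until both measures pass.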
Because we are dealing with sequential dependencies in behavioral variability, a Markov chain model is appropriate (see Machado, 1997). With our contingency, we expect to observe an increased frequency of intermediate numbers of runs, given the proportions of L and R; that is, no first-order dependency. The S values of the runs test are of limited value here because they do not show whether there are higher-order dependencies. Accordingly, an additional analysis is needed to examine sequential dependencies in greater detail.
There are several methods of tracking the phenomenon, including the use of chi-square goodness-of-fit tests, likelihood ratio tests and an approach based on information theory. Although these indices are related to each other, and there is little to choose among them for statistical analysis, the estimated values of mutual uncertainties provide a valuable visual aid to complement the significance tests, which depend on the validity of the chi-square approximation (Attneave, 1959; Chatfield, 1973; Chatfield & Lemon, 1970; Miller & Frick, 1949; Pincus & Singer, 1996). Using these values, we can track the changes in performance as training progressed. We use the mutual uncertainties (Ts) from information theory as follows:

$$T_1 = H_1 - H_2, \qquad T_2 = H_2 - H_3, \qquad T_3 = H_3 - H_4$$
where $H_1 = -\sum p_i \log_2 p_i$; $H_2 = -\sum p(i, j) \log_2 p(i, j) + \sum p_i \log_2 p_i$; $H_3 = -\sum p(i, j, k) \log_2 p(i, j, k) + \sum p(i, j) \log_2 p(i, j)$; and $H_4 = -\sum p(i, j, k, l) \log_2 p(i, j, k, l) + \sum p(i, j, k) \log_2 p(i, j, k)$, where i, j, k, l are arbitrary successive responses in a session. We transform the Ts into chi-square statistics so that the variation in the estimated mutual uncertainties can be observed and tested statistically at the same time. The chi-square form is as follows:

$$\chi^2 = 2 N (\ln 2)\, T_m$$

with $df = c^{m-1}(c - 1)^2$, where N is the number of trials per session, and c is the number of response alternatives, that is, left or right. The subscript m reflects the order of the dependency being tested. Using these indices, we observe the change in sequential dependencies.
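The sketch below shows one way the mutual uncertainties and their chi-square transforms could be estimated from a session's response sequence. It is an illustration under stated assumptions rather than the analysis code used here: probabilities are estimated from overlapping n-grams, the chi-square statistic is taken to be the usual likelihood-ratio form 2N ln(2) T, and the function names are hypothetical.

```python
import math
from collections import Counter

def block_entropy(seq, n):
    """Shannon entropy (bits) of the overlapping n-grams in seq."""
    grams = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_uncertainty(seq, order):
    """H_order as defined in the text: n-gram entropy minus (n-1)-gram entropy."""
    if order == 1:
        return block_entropy(seq, 1)
    return block_entropy(seq, order) - block_entropy(seq, order - 1)

def mutual_uncertainty(seq, m):
    """T_m = H_m - H_(m+1): uncertainty removed by one more preceding response."""
    return conditional_uncertainty(seq, m) - conditional_uncertainty(seq, m + 1)

def chi_square(seq, m, c=2):
    """Return (statistic, df), assuming chi-square = 2 N ln(2) T_m."""
    statistic = 2 * len(seq) * math.log(2) * mutual_uncertainty(seq, m)
    df = c ** (m - 1) * (c - 1) ** 2
    return statistic, df
```

For a strictly alternating sequence such as `"LR" * 50`, `mutual_uncertainty(seq, 1)` is close to 1 bit, the maximum for two alternatives, because the preceding response fully determines the next one.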
In addition to mutual uncertainties, we utilize a lag analysis to examine the obtained response patterns (Machado, 1992, 1993, 1994). If Xn is the response in trial n; then p (Xn+k = R | Xn = R) is the probability of a right response in trial n + k, given a right response in the current trial n. The lag analysis plots p (Xn+k = R | Xn = R) against k, the lag value. Strong deviation from the probability at lag 0 displays sequential dependencies. For example, with perfect alternation (RLRLR…), lag 0 is the probability .5, lag 1 is 0, lag 2 is 1.0, lag 3 is 0, and lag 4 is 1.0. When there are no sequential patterns, all lags approximate the lag 0 value.
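A lag profile of this kind can be computed as in the following sketch. The function name `lag_profile` and its parameters are hypothetical; following the description above, lag 0 is taken to be the unconditional probability of the target response.

```python
def lag_profile(seq, target="R", max_lag=6):
    """Return [p(lag 0), p(lag 1), ..., p(lag max_lag)] for the target response.

    Lag 0 is the unconditional probability of the target. For k >= 1 the
    value is p(X[n+k] == target | X[n] == target).
    """
    profile = [seq.count(target) / len(seq)]
    for k in range(1, max_lag + 1):
        # Trials n on which the target occurred and trial n + k exists.
        starts = [i for i in range(len(seq) - k) if seq[i] == target]
        hits = sum(seq[i + k] == target for i in starts)
        profile.append(hits / len(starts) if starts else float("nan"))
    return profile

# Perfect alternation reproduces the example in the text.
print(lag_profile("RL" * 50, max_lag=4))  # [0.5, 0.0, 1.0, 0.0, 1.0]
```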
Because the first 20 trials in the sessions were stored as samples for calculations and were unaffected by the contingencies of reinforcement, we used the data from the last 420 trials per session to: (1) assess run structure; and (2) examine sequential dependencies.
At every lever press, a runs test score, S, was produced. Figure 2 plots proportions of the S scores whose absolute values were smaller than 1, between 1 and 2, and larger than 2, in each session. In the sessions before the vertical dashed line (Area A), additional food deliveries occurred when the stimulus lights turned off. Sessions after this line (Area B) had no additional food deliveries. In the sessions after the vertical solid line (Area C), the critical value was changed from |1.96| to |1.39|.
At the beginning of Experiment 1, all subjects showed low proportions of S scores in the range −1 to 1. Subjects 1 and 9 showed increases after only a few sessions. Subject 3 initially showed a large proportion of S scores whose absolute values were greater than 2 (ineligible for reinforcement). These decreased, and the proportion between 1 and 2 increased, with further training. Subject 5 showed little differentiation of S scores. After removing additional food deliveries, subjects' performances deteriorated temporarily. When the criterial region narrowed to |1.39|, the performance of all subjects improved in that the proportion of S values in the range −1 to 1 increased, and more extreme S values decreased, although these changes were small for Subject 9.
If the rats responded perfectly according to the reinforcement contingency, all responses in a session would produce S scores in the prescribed range and illuminate the cue lights. Figure 3 plots the proportion of responses that illuminated the cue lights, and hence were eligible for primary reinforcement. Except for Subject 9, whose performance was consistently close to 1.0 after the first few sessions, performances became more and more eligible for reinforcement with extended exposure to the contingency. Therefore, the results indicate that differential reinforcement by the runs-test criterion can modify the subjects' performances.
Runs data alone cannot provide complete evidence for sequential dependencies. Accordingly, we did not employ the runs test as a statistical test; instead, we relied on mutual uncertainties. This approach permitted us to examine sequential dependencies in much greater detail. We examined the way subjects adapted to the contingency, that is, whether they developed higher-order dependencies as first-order dependency decreased, or whether sequential dependencies were removed altogether.
Figure 4 plots mutual uncertainties, Tm, for m = 1, 2, and 3 (Equation 9). Each column in Figure 4 shows a chi-square value associated, respectively, with T1 (first order), T2 (second order), and T3 (third order) sequential dependencies for each of the 65, 114, 111, and 57 sessions, respectively, for each subject. Note the degree of sequential dependency cannot be an all-or-none phenomenon; it is necessarily a continuum. This is true even after chi-square transformation. Horizontal lines indicate 5% critical chi-square values. Observed chi-square values below the horizontal lines indicate performance that exhibits no sequential dependency. Sessions prior to the point indicated by a vertical line had additional food deliveries with stimulus lights off. These indices are useful for investigating the trends of the sequential-dependency data. Comparing panels horizontally within subjects, the lowest order tends to show the highest level of dependency. Although large values in T1 were generated in the first sessions, for all subjects T1 decreased below the critical value as the training progressed. Subjects 1, 3, and 5 approximated independence at all Tm, although after initially achieving sequential independence, Subject 1 developed a slight first-order dependency towards the end of the experiment. Subject 9 continued to show higher-order dependencies throughout.
A lag analysis was conducted to examine the obtained response patterns. Figure 5 shows results from lag zero to lag 6 in the first seven sessions of Condition |1.96|, and in the last seven sessions in Condition |1.39|. Only the lag profiles of right responses are shown. The profiles of left responses had a similar tendency. Horizontal solid lines indicate unconditioned probability, that is, lag zero values, in each session. If there were no sequential dependencies, all lag values would be similar to lag zero values.
In the first seven sessions, Subjects 1, 3, and 5 show stable and consistent tendencies of repetition, like RR or RRR, but Subjects 3 and 5 do not show the same tendencies in the last seven sessions. This means that the performance of these subjects approximated sequential independence. The lag profile of Subject 1 in the last seven sessions showed a simple alternation pattern, RL. Subject 9 showed the pattern RLR in the first two sessions, which changed over the course of sessions three to seven (RLL, RLLR). Its lag profiles seemed to be similar in pattern in the last seven sessions of Condition |1.39|; however, note that the lag-1 probability approximated that of lag zero. In other words, the first-order dependency disappeared.
Because the lag zero probability coincided with that of its elementary components (L or R), lag zero also indicates response biases in emitting L and R alternatives. In the first seven sessions, most subjects revealed no striking biases. However, in the last seven sessions, some subjects showed a distinct bias for the left lever (see Subjects 1 and 3).
Finally, Figure 6 plots the relative frequencies of four-response sequences as units. Solid lines show the expected values, calculated from the relative frequencies of quadruplets of instances (Jensen, Miller, & Neuringer, 2006). For example, when p(R) = .25 and p(L) = .75, p(LLLL) = .75 × .75 × .75 × .75 = .316 and p(LRLR) = .75 × .25 × .75 × .25 = .035. These are expected from a stochastic process. The first column in Figure 6 shows that subjects' performances deviated from the expected distribution during the first session of the experiment. However, the middle and right columns show that their performances changed and, for Subjects 1, 3, and 5, approximated the expected distribution. That is, the behavior of 3 of the 4 subjects was approximately random. Subject 9's performance was characterized by alternation patterns; that is, LLRL, LRLR, LRLL, RLLR, and RLRL were emitted frequently.
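The expected values for all 16 four-response sequences follow directly from the response proportions when successive responses are assumed to be independent. The sketch below illustrates that arithmetic; the function name `expected_quadruplets` is hypothetical.

```python
from itertools import product
from math import prod

def expected_quadruplets(p_r):
    """Expected relative frequency of each four-response sequence,
    assuming independent responses with p(R) = p_r and p(L) = 1 - p_r."""
    p = {"R": p_r, "L": 1 - p_r}
    return {
        "".join(quad): prod(p[response] for response in quad)
        for quad in product("LR", repeat=4)
    }

expected = expected_quadruplets(0.25)
print(round(expected["LLLL"], 3))  # 0.316
print(round(expected["LRLR"], 3))  # 0.035
```

The 16 expected values sum to 1, so they can be compared directly with the observed relative frequencies of the sequences.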
This experiment was designed to demonstrate a new technique for controlling behavioral variability, using a runs-test criterion. Generally, first-order dependency, that is, T1 in uncertainty indices, was controlled well in all subjects. In addition, results showed that Subjects 1, 3 and 5 achieved sequentially independent behavior by successfully excluding several orders (T1, T2, T3); however, one (Subject 9) maintained higher order dependency.
As discussed earlier, the runs test gauges the number of runs observed in a performance relative to the expected number. Because the production of a run depends on whether subjects repeat or alternate a response emitted on the preceding trial, our runs-test algorithm affected the level of repetition and alternation, that is, first order dependency. The level of repetition and alternation relates directly to the first-order dependency, because both describe the relation between responding on one trial and that on the preceding trial. Therefore, our procedure was successful in eliminating a first-order sequential dependency, in spite of the fact that higher-order dependencies were evident in Subject 9's profile.
Having achieved sequential independence under the runs-test contingency, Subject 1 later developed first-order dependency. This is trivial because the relative distribution of four-response units showed that its behavior closely approximated the expected distribution. We believe that it was the result of an extreme bias (.05:.95) toward one of the two responses. For example, one sequence consisted of 10 consecutive Ls, one R, and nine Ls (i.e., LLLLLLLLLLRLLLLLLLLL); this yields a runs score of 0.33, based on three runs. Such an outcome can occur if the less frequent response (here, R) is not first or last in a series. By contrast, a sequence consisting of nine Ls, two Rs, and nine Ls (i.e., LLLLLLLLLRRLLLLLLLLL) yields a score of −2.28, which is outside the criterial range. In the case of extreme bias, the subject has to emit only one response to the less-preferred lever and then return to the more-preferred lever. The results of the lag analysis were consistent with this prediction. It was possible that subjects could learn to use the light-off as a cue for switching to the less-chosen lever. However, only 1 rat (Subject 1) developed this and only after much training, suggesting that such an unusual discrimination is generally difficult to acquire.
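The two scores quoted above can be checked by hand. The working below assumes the standard runs-test mean and variance, written here as $\mu_K$ and $\sigma_K^2$ (notation chosen for this illustration), with K = 3 runs in both sequences.

```latex
% Sequence 1: ten Ls, one R, nine Ls  (n_L = 19, n_R = 1, K = 3)
\mu_K = \frac{2(19)(1)}{20} + 1 = 2.9, \qquad
\sigma_K^2 = \frac{2(19)(1)\,[\,2(19)(1) - 20\,]}{20^2 \cdot 19} = \frac{684}{7600} = 0.09, \qquad
S = \frac{3 - 2.9}{\sqrt{0.09}} \approx 0.33

% Sequence 2: nine Ls, two Rs, nine Ls  (n_L = 18, n_R = 2, K = 3)
\mu_K = \frac{2(18)(2)}{20} + 1 = 4.6, \qquad
\sigma_K^2 = \frac{2(18)(2)\,[\,2(18)(2) - 20\,]}{20^2 \cdot 19} = \frac{3744}{7600} \approx 0.493, \qquad
S = \frac{3 - 4.6}{\sqrt{0.493}} \approx -2.28
```

With a single R, three runs are slightly more than expected, so the score stays near zero; adding a second adjacent R raises the expected number of runs to 4.6 while the observed number stays at three, which pushes the score outside the criterial range.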
In Experiment 2, we modified the procedure in several ways. First, to make the effects of the runs-test contingency clearer, we trained subjects in a standard concurrent schedule for several sessions before introducing the runs-test contingency. Second, we no longer illuminated the stimulus lights. If subjects had used them as a discriminative stimulus in Experiment 1, then this would have permitted them to emit different patterns of responses in conditions with lights on versus off. Such a discrimination may have contaminated the effect of differential reinforcement. Third, we held the probability of reinforcement constant. Many studies indicate that behavioral variability is influenced by variation of reinforcement frequency (Boren, Moerschbaecher, & Whyte, 1978; Gharib, Derby, & Roberts, 2001; Gharib, Gade, & Roberts, 2004; Grunow & Neuringer, 2002; Tatham, Wanchisen, & Hineline, 1993). In Experiment 1, it is possible that the change from less frequent to more frequent reinforcement, rather than the runs-test contingency, was responsible for the development of sequential independence. By keeping reinforcement probability constant in Experiment 2, we eliminated this factor as a source of sequential independence.
Finally, in order to hold the probability of reinforcement constant, we also adjusted the runs-test criterion. Instead of using criterial test values, such as 1.96 and 1.39, we relied upon a percentile criterion (see Alleman & Platt, 1973; Galbicka, 1988, 1994; Machado, 1989). After each response, the current S score was compared against the scores in the last 19 trials. A food pellet was delivered with probability 2/3 if the current score was closer to zero than at least 17 of the previous 19 scores. This procedure can hold the probability of reinforcement constant.
Four male Wistar rats (Subjects A, B, C, D) were maintained at approximately 80% of their free-feeding body weights. They were experimentally naïve and 40 weeks old at the start of the experiment. Water and sawdust were continuously available in their home cages where a 12-hr light-dark cycle was in effect.
The apparatus was the same as in Experiment 1 except all experimental devices were controlled by a computer using Visual Basic 2005 Express Edition software.
After subjects were trained to press the lever by hand shaping, they were exposed to a continuous-reinforcement schedule, which provided 100 food deliveries per session. Either the left or right lever provided reinforcement in a given session, and the reinforcing lever was switched after each training session. After a few sessions, when all subjects pressed both levers reliably, two-lever training was initiated. In this procedure, a reinforcer was assigned probabilistically to a particular lever. No further assignments were made until the reinforcer was delivered (Stubbs & Pliskoff, 1969). In the baseline, reinforcers were allocated equally often for left and right responses. Each session ended after 500 responses. The probability of reinforcement was decreased gradually from 1.0 to .1. Once the reinforcement probability had been reduced to .1, it remained at that level until performances stabilized. It is against this baseline that we compare the results from the runs-test phase, which was run next. Both the baseline phase and the runs-test phase had the same probability of reinforcement, but the baseline phase had no runs-test contingency.
In the runs-test contingency phase, the score on each trial was compared against the previous 19. If the current one was closer to zero than at least 17 of the previous 19 scores, then a reinforcer was delivered with p = .667. In some cases, once the runs-test score reached criterion, several successive trials were likely to deliver a reinforcer. Except for the absence of stimulus lights, the remaining procedures and analyses were the same as in Experiment 1.
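The percentile rule can be sketched as follows. This is an illustrative reading rather than the original control program: the function names are hypothetical, and treating "closer to zero" as a strict comparison of absolute values (so ties do not count) is an assumption.

```python
import random

def meets_percentile(current_s, previous_scores, needed=17):
    """True when the current score is closer to zero than at least
    `needed` of the previous scores (expected: the last 19)."""
    closer = sum(abs(current_s) < abs(s) for s in previous_scores)
    return closer >= needed

def deliver_reinforcer(current_s, previous_scores, p=2 / 3, rng=random):
    """Deliver food with probability p on trials that meet the criterion."""
    return meets_percentile(current_s, previous_scores) and rng.random() < p
```

Because the criterion is relative to the subject's own recent scores, roughly the same fraction of trials should meet it whatever the absolute level of performance, and the overall probability of reinforcement is then the product of that fraction and p.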
Again we examine the runs structure of subjects' behavior first, and then the data on sequential dependencies among successive responses.
Figure 7 plots proportions of S scores whose absolute values were smaller than 1, between 1 and 2, and larger than 2, in each session. The sessions before the vertical line are from the baseline phase, where the probability of reinforcement was .1, whereas those after the vertical line indicate differential reinforcement by the runs-test phase with the same probability. In the baseline phase, Subject A showed similar proportions of S scores smaller than 1 and between 1 and 2. Only Subject D showed an increase in the proportion that were smaller than 1. On transition to the test phase, all subjects improved their proportions in this range. Scores for Subjects B and C improved rapidly, while Subject A improved gradually. Comparing the last five sessions between baseline and the runs test phases, all subjects improved their scores. Thus, Figure 7 reveals that in Experiment 2, as in Experiment 1, behavior of all subjects was sensitive to the runs test contingency.
Mutual uncertainties are plotted in Figure 8 for the last five sessions. Results from both baseline and the runs-test phases are shown, separated by a vertical line. Successive columns give chi-square values of T1, T2, and T3. Horizontal lines indicate 5% critical values of the chi square; values below the horizontal lines indicate that performance showed no sequential dependency. The first column (T1) shows that except for Subject D, first-order sequential dependency was present in baseline, but this decreased under the runs-test contingency. Columns for T2 and T3 show that sequential independence was achieved in the higher orders for Subjects A and D, whereas some dependencies remained in Subjects B and C. These results are in broad agreement with those of Experiment 1.
Figure 9 presents a lag analysis for the last five sessions of both phases. Lag profiles showed that all subjects favored some response sequence patterns in the baseline phase. Typical patterns were RR (Subjects B and C) or RRL (Subjects A and D). However, in the runs-test phase, such patterns gradually disappeared. For all subjects, lag-1 probability was similar to lag 0, that is, the first-order dependency disappeared. Moreover, Subjects A and D showed almost no pattern. Subject B retained the same pattern as in baseline, although it became less conspicuous, and Subject C tended to emit L at Lag 2. In comparing these data with the lag data of Experiment 1, we see that these subjects exhibited no bias for either lever; instead, response probabilities were near .5.
Finally, Figure 10 plots the relative frequencies of four-response sequences as units. At the start of the experiment (left column), all subjects tended to repeat responses, that is, the frequencies of LLLL and RRRR were high. Through baseline sessions, their performance was modified somewhat. By the end of baseline training (middle column), a common pattern was evident for all 4 rats: the frequency of double-alternation patterns (LLRR and RRLL) increased, and high-alternation patterns (LLRL, LRLL, LRLR, RLRL, RLRR) remained low. This pattern was lost by the end of the runs-test phase (right column), and profiles approximated the expected values derived by assuming randomness.
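As a rough illustration of this kind of analysis, the sketch below tabulates relative frequencies of four-response sequences and the values expected if each choice were an independent (possibly biased) coin flip. It rests on assumptions not stated in the text: the function names are hypothetical, choices are coded as a string of "L" and "R", and the session is split into non-overlapping blocks of four, whereas the original analysis may have used a different segmentation.

```python
from collections import Counter
from itertools import product


def sequence_frequencies(choices, k=4):
    """Relative frequency of every k-response sequence, using
    non-overlapping blocks (an assumption of this sketch)."""
    blocks = [choices[i:i + k] for i in range(0, len(choices) - k + 1, k)]
    if not blocks:
        raise ValueError("need at least k choices")
    counts = Counter(blocks)
    patterns = ["".join(seq) for seq in product("LR", repeat=k)]
    return {pattern: counts[pattern] / len(blocks) for pattern in patterns}


def expected_frequencies(p_left=0.5, k=4):
    """Expected relative frequencies if every choice is independent
    and L is chosen with probability p_left."""
    expected = {}
    for seq in product("LR", repeat=k):
        n_left = seq.count("L")
        expected["".join(seq)] = p_left ** n_left * (1 - p_left) ** (k - n_left)
    return expected
```

With p_left = .5, each of the 16 sequences has the same expected relative frequency, so departures such as an excess of LLRR and RRLL stand out against a flat profile.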
The results in Experiment 2 replicated those of Experiment 1. All subjects were susceptible to a reinforcement contingency that used the runs-test algorithm (Figure 7). In addition to showing their sensitivity to this contingency, subjects' performance came to eliminate sequential dependencies (Figure 8). This tendency was not different between Experiments 1 and 2 in spite of the fact that conditioned reinforcers were removed and primary reinforcement was more strictly controlled in the latter.
Our differential reinforcement procedures were designed to have no effect on response bias. Subjects in Experiment 1 showed a strong bias to the left lever (Figure 5) whereas in Experiment 2 they showed almost no bias. In consequence, they attained uniform distribution of choice between response alternatives (Figures 9 and 10). Thus, our results showed we could control variability, producing a sequentially independent pattern, regardless of whatever bias existed; it was not a byproduct of differentially reinforcing equiprobable outcomes.
The present work aimed to demonstrate a new reinforcement contingency based on run-structure analyses of successive responses in a choice task. By using the runs-test algorithm as a criterion for differential reinforcement, we show that first-order response dependencies can be successfully removed. Higher-order dependencies were sometimes present early in training also, and these were often reduced with extended exposure to the contingency. Thus, the new contingency appeared to be effective in modifying the structure of response runs in almost all subjects.
A possible criticism involves our use of the runs test. This test was designed as a test for randomness. Equation 3 is appropriate for cases where at least one of the response alternatives occurred more than 20 times, that is, for large numbers (Siegel, 1956), whereas in our experiment, the sum of both response alternatives is 20. However, we used the runs test not as a statistical test for randomness, but rather as a criterion for differential reinforcement. Thus, the issue becomes whether or not our conclusions about the effects of contingency are reliable in this context. To assess this, we relied upon a nonparametric method, for which Siegel (1956) and Swed and Eisenhart (1943) prepared tables of expected runs based on small samples. These tables provided appropriate critical values in the case of small samples. Thus, if we compare data in Figure 1, calculated from Equation 3, with test-score statistics for this nonparametric test, the latter decreases the risk of Type 1 error (i.e., rejecting a true null hypothesis of no dependency), whereas it increases the risk of Type 2 error. In other words, our use penalizes Type 2 errors more than predicted by the nonparametric test tables. In effect, this means we may have imposed a more severe criterion than required by the runs test. This possibility does not present a problem for our conclusions. Rather we note that the procedure for differential reinforcement requires a sample size that is not so large as to dilute the differential nature of the contingency (Alleman & Platt, 1973; Galbicka, 1988, 1994).
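To make the small-sample issue concrete, the sketch below computes a runs-test score for a window of choices using the standard large-sample (Wald-Wolfowitz) normal approximation. It is offered only as an assumption about the general form of such a score: Equation 3 is not reproduced here and may differ in detail, and the function names and the "L"/"R" string coding are hypothetical.

```python
import math


def count_runs(choices):
    """Number of runs, i.e. maximal blocks of identical responses."""
    if not choices:
        return 0
    return 1 + sum(1 for a, b in zip(choices, choices[1:]) if a != b)


def runs_z(choices):
    """Standard normal-approximation runs score for a string of 'L'/'R'.

    Returns None when the score is undefined (only one alternative
    was chosen, or the variance term is zero).
    """
    n1, n2 = choices.count("L"), choices.count("R")
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        return None
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    if var <= 0:
        return None
    return (count_runs(choices) - mean) / math.sqrt(var)
```

In a 20-choice window, strict alternation yields a large positive score (more runs than expected) and a single switch yields an equally large negative score (fewer runs than expected); scores near zero correspond to run counts close to those expected from independent choices.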
Our procedure involved an interlocking schedule with which two experimental dimensions (K, response proportion) are related. In previous investigations, either the proportion of responses to an alternative, or the number of runs, has been used as the basis for differential reinforcement (Bryant & Church, 1974; Machado, 1997; Neuringer, 1986). By contrast, we attempted to combine these dimensions and to contrive a procedure of differential reinforcement for sequential dependencies. It was different from differential reinforcement of response alternatives with lower frequency in that it permits one response alternative to have high frequency. However, performance approached an equiprobable state and some subjects performed randomly in Experiment 2. Such findings suggest there may be various procedures that will yield highly variable or random behavior. If so, it remains to be determined what the necessary and sufficient conditions are for producing this behavior.
We note two different views on reinforced sequential dependencies, according to different epistemological attitudes, that is, molar and molecular. From the molar standpoint, molar behavioral phenomena, say, allocations of behavior, response rates, and behavioral variability, are regarded as individuals or concrete particulars, as species were (Baum, 2002; Glenn & Field, 1994). From the molecular standpoint, such phenomena are regarded as abstractions or derived things. Glenn (2003) discussed them from the analogy of organic evolutionary theory, in which Maynard Smith (1994) characterized the increases in complexity during evolution of the organic world as resulting from a succession of processes that became possible only when a previous level of complexity had been reached. With behavior, complex behavioral phenomena are regarded as a result of repeated rounds of selection acting on phenomena resulting from earlier rounds of selection. If we regard the phenomena as derived things, we would seek the cause of variation of the behavioral variability in earlier rounds of selection. On the other hand, if we regard them as concrete particulars, we would focus on the effect of the behavioral phenomena at the higher-complexity level. With behavioral variability, Machado (1992, 1997) claimed that dispersion of response alternatives might have been a derivative of more fundamental processes. This claim is reasonable because the process of differential reinforcement of response alternatives with lower frequency produced the behavioral variability. On the other hand, some researchers focused on the effect of variation and repetition as a concrete particular in choice, delayed reinforcement, resistance to change, and so on (Abreu-Rodrigues et al., 2005; Doughty & Lattal, 2001; Neuringer, 1992; Odum et al., 2006; Wagner & Neuringer, 2006). These studies have also yielded fruitful knowledge.
Whereas our experiment showed the runs-test contingency effects on sequential dependencies, studies that reveal the effect of sequential patterns on complex behavioral phenomena remain for the future.
This research was supported by grants from the Japan Society for the Promotion of Science. We thank Anthony McLean for invaluable suggestions and great editorial effort, Takeharu Igaki for great technical assistance, Taku Ishii and Takayuki Tanno for helpful discussions, and Allen Neuringer, Armando Machado, and Alan Silberberg for editorial help.