J Exp Anal Behav. 2010 January; 93(1): 61–80.
PMCID: PMC2801541

A Runs-Test Algorithm: Contingent Reinforcement and Response Run Structures

Abstract

Four rats' choices between two levers were differentially reinforced using a runs-test algorithm. On each trial, a runs-test score was calculated based on the last 20 choices. In Experiment 1, the onset of stimulus lights signaled that the absolute runs-test score was smaller than criterion. Following cuing, the correct choice was occasionally reinforced with food, and the incorrect choice resulted in a blackout. Results indicated that this contingency reduced sequential dependencies among successive choice responses. With one exception, subjects' choice rule was well described as biased coin flipping. In Experiment 2, cuing was removed and the reinforcement criterion was changed to a percentile score based on the last 20 reinforced responses. The results replicated those of Experiment 1 in successfully eliminating first-order dependencies in all subjects. For 2 subjects, choice allocation was approximately consistent with nonbiased coin flipping. These results suggest that sequential dependencies may be a function of reinforcement contingency.

Keywords: behavioral variability, sequential dependency, the runs test, mutual uncertainty, lever press, rat

The variability of a series of responses distributed between alternatives such as left (L) and right (R) levers has been defined in terms of two properties drawn from the concept of randomness (Neuringer, 2002). First, variability is high if each member of a set is as frequent (overall) as any other member of the set; that is, the relative frequencies (or probabilities) of the different response alternatives are similar, as in a uniform probability distribution. Second, variability is high if the relative frequencies of all higher-order sequential combinations, such as dyads and triads, are also equal over the long run. The former implies a property of equiprobability, and the latter implies sequential independence.

Previous research aimed at producing highly variable performance has used reinforcement contingencies that are based on the relative frequencies of the response alternatives. In most studies, these contingencies have involved frequency-dependent selection. For example, Page and Neuringer (1985) reinforced responses when they had not occurred in the last N trials, whereas Machado (1992) reduced reinforcer likelihood as the frequency of a response increased. These and other studies (Blough, 1966; Bryant & Church, 1974; Denney & Neuringer, 1998; Machado, 1989; Pryor, Haag, & O'Reilly, 1969; Schoenfeld, Harris, & Farmer, 1966; Shimp, 1967) all reinforced response alternatives that had a low (or zero) frequency in the recent past.

In many experiments, a single trial consisted of the emission of a response unit, defined by the reinforcement contingency, comprising a four-response sequence of binary choices, such as left (L) and right (R) responses. When observed probabilities of the 16 (2⁴) possible response combinations (e.g., RLRR) were equal, the behavior was deemed to have maximum variability. By definition, any bias in the frequency distribution of the alternatives indicates reduced variability, and exclusive emission of any particular sequence constitutes minimal variability. Thus, such studies were concerned chiefly with the relative frequencies of response alternatives. They attempted to control response bias by reinforcing response distributions that exhibit maximum dispersion (Abreu-Rodrigues, Lattal, dos Santos, & Matos, 2005; Cherot, Jones, & Neuringer, 1996; Cohen, Neuringer, & Rhodes, 1990; Denney & Neuringer, 1998; Doughty & Lattal, 2001; Machado, 1989; McElroy & Neuringer, 1990; Miller & Neuringer, 2000; Mook, Jeffrey, & Neuringer, 1993; Morgan & Neuringer, 1990; Morris, 1987, 1989, 1990; Neuringer, 1991, 1992, 1993; Neuringer, Deiss, & Imig, 2000; Neuringer & Huntley, 1991; Odum, Ward, Barnes, & Burke, 2006; van Hest, van Haaren, & van de Poll, 1989).

Frequency-dependent reinforcement can be used to create sequential independence as well as equiprobability, although it may require a set of more than eight response alternatives. Machado (1992, 1993) systematically investigated the necessary and sufficient conditions for random-like performance. Using a set of two response alternatives (L, R) as targets of frequency-dependent selection, he found that pigeons had a significant tendency to alternate responses: LRLRLR…. Next, using sequences of two successive responses as targets (LL, LR, RL, RR), he found that some, but not all, pigeons successfully performed double-alternation patterns; however, when he used all possible three-response sequences to define the target set (i.e., LLL, LLR, LRL, LRR, RLL, RLR, RRL, RRR), all pigeons performed randomly. The results suggest that the last procedure suffices to engender random-like behavior in that all of the possible response sequences have the same strength. If all are equiprobable, then sequential dependencies cannot be present.

It is, however, important to underscore that sequential independence can be achieved even when individual response alternatives are not equally probable (Nickerson, 2002). To illustrate our rationale, consider two mutually exclusive events, such as heads (H) or tails (T) in a coin toss. An alternation pattern of HTHTHT… shows that H and T are equiprobable, thereby meeting one standard of randomness; however, it fails a second standard, unpredictability, because event order is perfectly predictable from first-order conditional probabilities. Conversely, H and T can be sequentially independent even when they are not equiprobable [e.g., p(T) > p(H), as with a biased coin], provided that their conditional probabilities equal their unconditional probabilities [i.e., p(H|T) = p(H) and p(T|H) = p(T)]. In a relevant experiment, Machado (1994) used frequency-dependent selection to shape molar response proportions toward various equilibrium values between 0 and 1 and examined sequential dependencies in local response sequences. The procedure successfully altered molar response proportions, and at extreme values, local performance fell midway between biased randomness (sequential independence) and stable sequences (which imply sequential dependence). That is, when molar response proportions deviated from .5, the stable local patterns that were present at .5 broke down, although not to the extent that they conformed to biased coin flipping.
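The contrast between equiprobability and sequential independence can be checked numerically. The following Python sketch computes p(H) and the two first-order conditional probabilities for a perfectly alternating sequence and for a simulated biased coin with p(T) = .75. The function name `first_order_stats` is ours, for illustration only, and the simulated coin is an assumption made for the example, not data from the study.

```python
import random
from collections import Counter


def first_order_stats(seq):
    """Return p(H), p(H | previous T), and p(T | previous H) for a string of 'H'/'T'."""
    p_h = seq.count("H") / len(seq)
    # Count (previous, current) transitions between successive tosses.
    pairs = Counter(zip(seq, seq[1:]))
    after_t = pairs[("T", "H")] + pairs[("T", "T")]
    after_h = pairs[("H", "H")] + pairs[("H", "T")]
    return p_h, pairs[("T", "H")] / after_t, pairs[("H", "T")] / after_h


# Perfect alternation: equiprobable, but p(H | T) = 1, so order is fully predictable.
alternating = "HT" * 50
print(first_order_stats(alternating))  # (0.5, 1.0, 1.0)

# Simulated biased coin with p(T) = .75: not equiprobable, yet p(H | T) stays
# close to p(H) and p(T | H) stays close to p(T) = 1 - p(H).
rng = random.Random(1)
biased = "".join("T" if rng.random() < 0.75 else "H" for _ in range(100_000))
print(first_order_stats(biased))
```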

A more direct approach to controlling sequential dependencies might be more successful in achieving sequential independence and, hence, random-like behavior. One aim of our research is to present an approach based on the idea that run-length frequencies can serve as a basis for contingent reinforcement. Using such a contingency, we can ask whether reinforcement of run-length distributions expected from a putatively “random” source leads to random-like behavior. To ensure that a reinforcement contingency targets sequential dependency per se, the procedure must affect the sequential dependency of interest while leaving the relative frequencies of responses unaffected. That is, the ideal procedure must separate its influence on sequential dependency from any influence on relative response frequencies. The procedure we developed here is derived from the runs test for randomness described by Siegel (1956). A run is defined as an uninterrupted sequence of identical elements bounded by different elements (or by the start or end of the sequence). The number of runs in a sequence equals the number of response alternations plus one. In general, when the observed number of runs differs significantly from the number expected given the overall response proportions, the runs test rejects the null hypothesis that the sequence is independent. Put simply, when alternation occurs either too rarely or too often, the sequence is regarded as containing a regular pattern, and the null hypothesis is likely to be rejected.
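Because the number of runs equals the number of alternations plus one, counting runs needs only a single pass over the sequence. A minimal Python sketch (`count_runs` is a hypothetical helper), cross-checked against `itertools.groupby`, which yields one group per run:

```python
from itertools import groupby


def count_runs(seq):
    """Number of runs in seq: the number of alternations plus one."""
    if not seq:
        return 0
    alternations = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return alternations + 1


# LL | RRR | L | R  ->  4 runs (3 alternations)
print(count_runs("LLRRRLR"))  # 4
# Cross-check: groupby yields one group per maximal block of identical elements.
print(len(list(groupby("LLRRRLR"))))  # 4
```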

Our procedure reinforced, on each trial, an L or R response whose runs-test score was smaller in absolute value than the critical value of the runs test. With K denoting the observed number of runs, the expected number of runs, $\mu_K$, and its variance, $\sigma_K^2$, were computed according to the following equations:

$$\mu_K = \frac{2 n_R n_L}{n_R + n_L} + 1 \tag{1}$$

$$\sigma_K^2 = \frac{2 n_R n_L \, (2 n_R n_L - n_R - n_L)}{(n_R + n_L)^2 \, (n_R + n_L - 1)} \tag{2}$$

in which nR and nL represent the number of R and L responses, respectively, in a sample sequence. Then, the runs-test score, S, was calculated as follows:

$$S = \frac{K - \mu_K}{\sigma_K} \tag{3}$$

When nR and nL are large, the sampling distribution of K approaches the normal distribution, and S (Equation 3) is approximately a standard normal variable (hence the familiar critical value of ±1.96 for α = .05, two-tailed). We discuss the relation between this distribution and our procedure further in the General Discussion.

Our procedure used an algorithm that calculated S from the last 20 responses every time a response was emitted and compared it with a critical value to determine whether reinforcement would be delivered. With a fixed sample size of 20, only two parameters were needed for the calculation: the proportion of emitted responses [p(L) = 1 − p(R)] and the number of runs. We initially set two critical boundary values for S, ±1.96. For each window of 20 responses, an observed S value that fell within these boundaries made the current response eligible for reinforcement. Note that, within wide limits, the use of a runs-test score does not require any particular proportion of L and R responses for reinforcement. For example, suppose nR and nL were 4 and 16, respectively [p(L) = .8], and K was 4. In this case, the score would be −2.52, the null hypothesis would be rejected, and reinforcement would not be given for the last response. With the same frequencies of L and R but with K = 6, however, the score is −1.04, and the response is eligible for reinforcement. As this case illustrates, subjects could satisfy the contingency even when the response proportion was strongly biased.
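The runs-test score and the 20-response window can be sketched in Python as follows, using the standard runs-test expressions for the expected number of runs and its variance. This is an illustration, not the authors' MED-PC program; the helper names are hypothetical, and two details the text does not specify are assumptions: a window containing only one alternative (zero variance) is treated as unscorable, and "within these boundaries" is read as a strict inequality. The example reproduces the two worked cases in the paragraph above.

```python
import math


def runs_score(seq):
    """Runs-test score S for a string of 'L'/'R' responses.

    Returns None when only one alternative occurs, because the variance is then
    zero (how such windows were handled is not stated; this is an assumption).
    """
    n_r, n_l = seq.count("R"), seq.count("L")
    n = n_r + n_l
    k = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)          # observed runs, K
    mu = 2 * n_r * n_l / n + 1                                      # expected runs
    var = 2 * n_r * n_l * (2 * n_r * n_l - n) / (n ** 2 * (n - 1))  # variance of K
    if var <= 0:
        return None
    return (k - mu) / math.sqrt(var)                                # standardized score


def eligible(history, critical=1.96, window=20):
    """True if S over the last `window` responses lies strictly inside +/-critical."""
    if len(history) < window:
        return False
    s = runs_score(history[-window:])
    return s is not None and abs(s) < critical


# Worked example: nR = 4, nL = 16, i.e., p(L) = .8.
k4 = "L" * 8 + "RR" + "L" * 8 + "RR"                   # K = 4
k6 = "L" * 4 + "R" + "L" * 4 + "R" + "L" * 8 + "RR"    # K = 6
print(round(runs_score(k4), 2), eligible(k4))  # -2.52 False
print(round(runs_score(k6), 2), eligible(k6))  # -1.04 True
```

Because the score depends only on nR, nL, and K, a strongly biased window such as `k6` can still be eligible, which is the point the worked example makes.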

In Experiment 1, we introduced the new reinforcement contingency in a modest way: stimulus lights above the levers served as a conditioned reinforcer, because previous studies demonstrated that the effect of a contingency on behavioral variability was stronger under conditioned reinforcement (Cherot et al., 1996) and was maintained under delayed reinforcement (Odum et al., 2006; Wagner & Neuringer, 2006). Accordingly, in Experiment 1 the stimulus lights were illuminated whenever a subject's performance fell within the criterion range, and a primary reinforcer was delivered with p = .1 in that state. In Experiment 2, we removed the conditioned reinforcers and examined the effect of direct reinforcement with a more refined experimental design.

EXPERIMENT 1

In Experiment 1, we examined the effect of the runs-test contingency with a conditioned reinforcer. We reinforced responses that produced S scores within a required range, but with low probability (.1). To help establish responding that met criteria for sequential independence, we used stimulus lights as a conditioned reinforcer. Two stimulus lights, one above each of two levers, were illuminated when the score of the runs test was within a criterion range, whereas they were extinguished when the score was outside this range. Thus, if a response occurred that met the runs criterion, and the stimulus lights were off, then stimulus lights were turned on. If the lights were already on, then they remained on for as long as successive responses continued to meet the criterion. If the lights were on and the response did not meet criterion, then they were turned off. If the lights were already off and the response did not meet the criterion, they remained off.

Reinforcement occurred only for those responses that met the stipulated runs criteria. Thus, responses that initiated or maintained illumination (i.e., lights on) sometimes received primary reinforcement. Although the aim was to extinguish responses that did not meet the runs criterion, it was necessary to reinforce some of these responses early in the experiment to prevent complete extinction in subjects that exhibited low behavioral variability. Accordingly, responses that maintained the lights-off state did receive some reinforcement at the beginning of this experiment, but this reinforcement was less frequent than that for criterial responses.

Method

Subjects

Four male Wistar rats were maintained at approximately 80% of their free-feeding body weights. Water and sawdust were continuously available in their home cages, where a 12-hr light-dark cycle was in effect. At the beginning of the experiment, two 46-week-old subjects (Rat 1 and Rat 3) had previous experience with variability reinforcement schedules; one 48-week-old subject (Rat 5) had experience only with lever-press training; and the fourth subject (Rat 9), which was 32 weeks old, had experience with a concurrent-chains schedule.

Apparatus

The experimental chamber was 210 mm long by 280 mm wide by 270 mm high and was enclosed in a sound-dampening box. The chamber had a ceiling and side walls constructed of Plexiglas and front and back walls of metal. The front wall contained two shielded stimulus lights (white 28-V bulbs), 120 mm above the floor and 100 mm apart. Two response levers, requiring a force of 0.15 N to operate, were located 70 mm above the floor and 80 mm apart, measured center to center. A pellet tray that received 45-mg food pellets was centered between the levers 20 mm above the floor. A shielded houselight (28-V bulb) was on top of the back wall. A speaker for presenting white noise and a ventilating fan were attached to the outer box. All experimental devices were controlled and monitored by a MED-PC version 2.0 system.

Procedure

Because all rats had previous experimental experience, they were placed directly on the runs-test procedure. Each daily session consisted of 440 trials, and a trial consisted of a single response, L or R. Responses could occur freely, except that each one turned off the houselight for 0.2 s, during which further responses had no effect.

After the first 20 responses of the session, each response yielded an S score. If the absolute value of the runs-test score fell within the stipulated boundaries, shown as the unshaded cells in Figure 1, the stimulus lights were turned on and a food pellet was delivered with p = .1. At the beginning of the experiment, none of the animals met the criterion. Responses that maintained the lights-off state were also reinforced with p = .1 if the current score was closer to zero than both of the two previous scores; responses that turned the lights off could not meet this condition.
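The per-response logic of Experiment 1, including the extra lights-off reinforcement that was in effect before it was terminated, might be sketched as below. The function `exp1_trial` and its arguments are hypothetical. The uniform draw `u` is passed in so that outcomes are reproducible, and treating a window with only one alternative (undefined S) as non-criterial is our assumption.

```python
import math


def runs_score(seq):
    """Runs-test score S; None if only one alternative occurs."""
    n_r, n_l = seq.count("R"), seq.count("L")
    n = n_r + n_l
    k = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)  # observed runs, K
    mu = 2 * n_r * n_l / n + 1
    var = 2 * n_r * n_l * (2 * n_r * n_l - n) / (n ** 2 * (n - 1))
    return None if var <= 0 else (k - mu) / math.sqrt(var)


def exp1_trial(window, lights_on, prev_scores, u,
               critical=1.96, extra_food=True, p_food=0.1):
    """Simulate one Experiment 1 response; returns (lights_on_after, food, score).

    window      -- the last 20 responses, including the current one
    lights_on   -- light state before this response
    prev_scores -- the two preceding S scores
    u           -- a uniform(0, 1) draw for the p = .1 reinforcement gate
    extra_food  -- True while lights-off reinforcement was still in effect
    """
    s = runs_score(window)
    if s is not None and abs(s) < critical:
        # Criterial response: lights turn (or stay) on; food with p = .1.
        return True, u < p_food, s
    # Non-criterial response: lights turn (or stay) off.
    food = False
    maintained_off = not lights_on  # a response that just turned the lights off never qualifies
    if extra_food and maintained_off and s is not None and len(prev_scores) == 2:
        # Food with p = .1 only if S is closer to zero than both previous scores.
        if all(p is not None and abs(s) < abs(p) for p in prev_scores):
            food = u < p_food
    return False, food, s


k6 = "L" * 4 + "R" + "L" * 4 + "R" + "L" * 8 + "RR"  # S is about -1.04
print(exp1_trial(k6, lights_on=False, prev_scores=[-2.5, -2.1], u=0.05)[:2])  # (True, True)
```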

Fig 1
All possible runs-test scores for a sample size of 20, calculated from Equation 3. Response proportion is nx / (nx + ny), in which nx is the number of responses to the less-chosen alternative. White cells signify data within ...
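The grid of attainable scores summarized in Figure 1 can be regenerated from the runs-test equations. In the sketch below (hypothetical helper names), nx ranges over the counts of the less-chosen alternative, and K runs from 2 to its maximum (2nx + 1, or 2nx when nx = ny). Because the caption is truncated, reading the white cells as |S| < 1.96 is an assumption.

```python
import math


def runs_score_counts(n_x, n_y, k):
    """S from the runs-test equations, given the two response counts and K runs."""
    n = n_x + n_y
    mu = 2 * n_x * n_y / n + 1
    var = 2 * n_x * n_y * (2 * n_x * n_y - n) / (n ** 2 * (n - 1))
    return (k - mu) / math.sqrt(var)


def score_table(n=20):
    """All attainable (n_x, K) cells for n responses; n_x = less-chosen count."""
    table = {}
    for n_x in range(1, n // 2 + 1):
        n_y = n - n_x
        # With n_x < n_y the minority response can form at most n_x runs,
        # separated and flanked by majority runs: 2 * n_x + 1 runs in total.
        max_runs = 2 * n_x + 1 if n_x < n_y else 2 * n_x
        for k in range(2, max_runs + 1):
            table[(n_x, k)] = runs_score_counts(n_x, n_y, k)
    return table


table = score_table()
for (n_x, k), s in sorted(table.items()):
    if n_x == 4:  # one row of the figure: response proportion .20
        flag = "eligible" if abs(s) < 1.96 else ""
        print(f"K = {k:2d}  S = {s:6.2f}  {flag}")
```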

The criterion for receipt of a food pellet became stricter as training progressed. In the first condition, the critical value of the runs test was set to ±1.96, and training continued until performance became stable. Once performance was stable, food delivery on lights-off trials was terminated. Then, in the second condition, the critical value was changed from ±1.96 to ±1.39, and training again continued until performance became stable.

Sessions continued until the relative frequencies of R responses and the number of alternations were judged to be stable under the following criterion: the last nine sessions were divided into three blocks and the largest difference between the medians of the three blocks was within 15% of the average of the last nine sessions.
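The stability criterion can be expressed as a short check applied separately to each session-level measure. This is a sketch under our reading of the text: the blocks are three consecutive sessions each, and "within 15%" is taken as less than or equal to. `is_stable` and `sessions_stable` are hypothetical names.

```python
from statistics import mean, median


def is_stable(values, tolerance=0.15):
    """Stability check on one session-level measure (e.g., relative frequency of R).

    The last nine sessions are split into three blocks of three; the largest
    difference between block medians must be within `tolerance` times the
    nine-session mean.
    """
    last9 = values[-9:]
    if len(last9) < 9:
        return False
    medians = [median(last9[i:i + 3]) for i in (0, 3, 6)]
    return max(medians) - min(medians) <= tolerance * mean(last9)


def sessions_stable(p_right, alternations):
    """Both measures named in the text must meet the criterion."""
    return is_stable(p_right) and is_stable(alternations)


print(is_stable([0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.52, 0.50, 0.51]))  # True
```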

Data Analysis

For analyzing sequential dependencies in behavioral variability, a Markov chain model is appropriate (see Machado, 1997). Under our contingency, we expected an increased frequency of intermediate numbers of runs, given the proportions of L and R responses, that is, the absence of first-order dependency. The S values of the runs test are of limited use here, however, because they do not reveal whether higher-order dependencies are present. Accordingly, an additional analysis is needed to examine sequential dependencies in greater detail.

Several methods can be used to track such dependencies, including chi-square goodness-of-fit tests, likelihood-ratio tests, and an approach based on information theory. Although these indices are related to one another and there is little to choose among them for statistical analysis, estimated mutual uncertainties provide a valuable visual aid to complement significance tests, which depend on the validity of the chi-square approximation (Attneave, 1959; Chatfield, 1973; Chatfield & Lemon, 1970; Miller & Frick, 1949; Pincus & Singer, 1996). Using these values, we can track changes in performance as training progresses. We use the mutual uncertainties (Ts) from information theory, as follows:

$$T_1 = H_1 - H_2$$

$$T_2 = H_2 - H_3$$

$$T_3 = H_3 - H_4$$

where

$$H_1 = -\sum_i p_i \log_2 p_i,$$

$$H_2 = -\sum_{i,j} p(i,j) \log_2 p(i,j) + \sum_i p_i \log_2 p_i,$$

$$H_3 = -\sum_{i,j,k} p(i,j,k) \log_2 p(i,j,k) + \sum_{i,j} p(i,j) \log_2 p(i,j),$$

$$H_4 = -\sum_{i,j,k,l} p(i,j,k,l) \log_2 p(i,j,k,l) + \sum_{i,j,k} p(i,j,k) \log_2 p(i,j,k),$$

and i, j, k, and l are arbitrary successive responses in a session. To track variation in the estimated mutual uncertainties and test them for significance at the same time, we transformed each Tm into a chi-square statistic:

$$\chi^2_m = 2N(\ln 2)\,T_m, \qquad df = c^{m-1}(c-1)^2,$$

where N is the number of trials per session and c is the number of response alternatives (here, two: left and right). The subscript m denotes the order of the dependency being tested. Using these indices, we tracked changes in sequential dependencies.
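The uncertainty indices and their chi-square transformation can be estimated from a session's response string as sketched below. Assumptions: probabilities are estimated from overlapping blocks of successive responses; the transformation used is the conventional χ² = 2N(ln 2)T for T in bits (the exact form used in the study is an assumption here); c = 2, so df is 1, 2, and 4 for m = 1, 2, 3; and the tabled 5% critical values are hard-coded rather than taken from a statistics library. Function names are hypothetical.

```python
import math
from collections import Counter

# Upper 5% points of chi-square for df = 1, 2, 4 (the df values when c = 2).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 4: 9.488}


def block_entropy(seq, m):
    """Entropy in bits of the overlapping length-m blocks of seq (0 for m = 0)."""
    if m == 0:
        return 0.0
    counts = Counter(tuple(seq[i:i + m]) for i in range(len(seq) - m + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def dependency_tests(seq, c=2):
    """Return {m: (T_m, chi_square, df, significant)} for m = 1, 2, 3 (assumes c = 2)."""
    # H_m = entropy of m-blocks minus entropy of (m - 1)-blocks.
    h = {m: block_entropy(seq, m) - block_entropy(seq, m - 1) for m in range(1, 5)}
    n = len(seq)
    out = {}
    for m in (1, 2, 3):
        t = h[m] - h[m + 1]                # T_m = H_m - H_{m+1}
        chi2 = 2 * n * math.log(2) * t     # bits -> approximate chi-square
        df = c ** (m - 1) * (c - 1) ** 2
        out[m] = (t, chi2, df, chi2 > CHI2_CRIT_05[df])
    return out


# Perfect alternation over 420 trials: T1 is close to 1 bit and highly significant.
for m, (t, chi2, df, sig) in dependency_tests("LR" * 210).items():
    print(f"T{m} = {t:.4f}  chi2 = {chi2:8.2f}  df = {df}  significant = {sig}")
```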

In addition to mutual uncertainties, we used a lag analysis to examine the obtained response patterns (Machado, 1992, 1993, 1994). If $X_n$ is the response on trial $n$, then $p(X_{n+k} = R \mid X_n = R)$ is the probability of a right response on trial $n + k$, given a right response on the current trial $n$. The lag analysis plots this conditional probability against $k$, the lag; the lag-0 value is the unconditional probability of a right response, p(R). Strong deviations from the lag-0 value indicate sequential dependencies. For example, with perfect alternation (RLRLR…), the lag-0 value is .5, lag 1 is 0, lag 2 is 1.0, lag 3 is 0, and lag 4 is 1.0. When there are no sequential patterns, all lags approximate the lag-0 value.
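A lag profile of this kind can be computed directly. In this sketch (the name `lag_profile` is hypothetical), lag 0 is the unconditional proportion of the target response, and lag k is estimated over every trial n whose response is the target and that has a trial n + k in the sequence. The example reproduces the perfect-alternation values given in the text.

```python
def lag_profile(seq, max_lag=6, target="R"):
    """Conditional-probability (lag) profile for one response alternative.

    Index 0 holds the unconditional probability p(target); index k holds
    p(X[n+k] = target | X[n] = target), estimated over all eligible n.
    """
    profile = [seq.count(target) / len(seq)]
    for k in range(1, max_lag + 1):
        anchors = [n for n in range(len(seq) - k) if seq[n] == target]
        hits = sum(1 for n in anchors if seq[n + k] == target)
        profile.append(hits / len(anchors) if anchors else float("nan"))
    return profile


print(lag_profile("RL" * 50, max_lag=4))  # [0.5, 0.0, 1.0, 0.0, 1.0]
```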

Results and Discussion

Because the first 20 trials of each session served as the initial sample for the calculations and were unaffected by the contingencies of reinforcement, we used data from the last 420 trials of each session to (1) assess run structure and (2) examine sequential dependencies.

Runs Analyses

At every lever press, a runs-test score, S, was produced. Figure 2 plots the proportions of S scores whose absolute values were smaller than 1, between 1 and 2, and larger than 2 in each session. In sessions before the vertical dashed line (Area A), additional food deliveries were available while the stimulus lights were off. Sessions after this line (Area B) had no additional food deliveries. In sessions after the vertical solid line (Area C), the critical value was changed from |1.96| to |1.39|.

Fig 2
Proportions of absolute values of S in three ranges: smaller than 1, between 1 and 2, and larger than 2, in each session of Experiment 1. In Area A, the criterial range for S scores was +/− 1.96, and there were additional food deliveries ...

At the beginning of Experiment 1, all subjects showed low proportions of S scores in the range −1 to 1. Subjects 1 and 9 showed increases after only a few sessions. Subject 3 initially showed a large proportion of S scores with absolute values greater than 2 (ineligible for reinforcement); these decreased, and the proportion between 1 and 2 increased, with further training. Subject 5 showed little differentiation of S scores. After the additional food deliveries were removed, performance deteriorated temporarily. When the criterial region was narrowed to |1.39|, the performance of all subjects improved, in that the proportion of S values in the range −1 to 1 increased and more extreme S values decreased, although these changes were small for Subject 9.

If the rats had responded perfectly according to the reinforcement contingency, all responses in a session would have produced S scores in the prescribed range and illuminated the cue lights. Figure 3 plots the proportion of responses that illuminated the cue lights and hence were eligible for primary reinforcement. Except for Subject 9, whose proportion was consistently close to 1.0 after the first few sessions, the proportion of eligible responses increased with extended exposure to the contingency. These results indicate that differential reinforcement by the runs-test criterion can modify subjects' performance.

Fig 3
Proportions of responses that illuminated the cue lights in each session of Experiment 1. Areas A, B, and C are the same as for Figure 2.

Analyses of Sequential Dependencies

Runs data alone cannot provide complete evidence about sequential dependencies. Accordingly, rather than employing the runs test as a statistical test, we relied on mutual uncertainties, which permitted us to examine sequential dependencies in much greater detail. We examined how subjects adapted to the contingency, that is, whether they developed higher-order dependencies as first-order dependency decreased or whether sequential dependencies were removed altogether.

Figure 4 plots mutual uncertainties, Tm, for m = 1, 2, and 3 (Equation 9). Each column in Figure 4 shows the chi-square value associated with T1 (first-order), T2 (second-order), or T3 (third-order) sequential dependency across the 65, 114, 111, and 57 sessions completed by Subjects 1, 3, 5, and 9, respectively. Note that sequential dependency is not an all-or-none phenomenon but a matter of degree, and this remains true after the chi-square transformation. Horizontal lines indicate 5% critical chi-square values; observed values below these lines indicate performance that exhibits no significant sequential dependency. Sessions before the point indicated by a vertical line included additional food deliveries with the stimulus lights off. These indices are useful for examining trends in sequential dependency. Comparing panels horizontally within subjects shows that the lowest order tended to exhibit the highest level of dependency. Although T1 values were large in the first sessions, for all subjects T1 decreased below the critical value as training progressed. Subjects 1, 3, and 5 approximated independence at all Tm, although after initially achieving sequential independence, Subject 1 developed a slight first-order dependency toward the end of the experiment. Subject 9 continued to show higher-order dependencies throughout.

Fig 4
Mutual uncertainties in Experiment 1. The difference (Tm) between successive uncertainty indices (Hm and Hm+1) for each subject for each order of sequential dependency. Horizontal lines indicate critical values for chi square. Data points below ...

A lag analysis was conducted to examine the obtained response patterns. Figure 5 shows results from lag 0 to lag 6 for the first seven sessions of Condition |1.96| and the last seven sessions of Condition |1.39|. Only the lag profiles of right responses are shown; the profiles of left responses showed similar trends. Horizontal solid lines indicate the unconditional probability, that is, the lag-0 value, in each session. If there were no sequential dependencies, all lag values would be similar to the lag-0 values.

Fig 5
Conditional probability profiles of right responses for the first seven sessions with the criterion set at +/− 1.96 and the last seven sessions with it set at +/− 1.39, in Experiment 1. Each set of seven connected points, ...

In the first seven sessions, Subjects 1, 3, and 5 showed stable and consistent tendencies toward repetition (e.g., RR or RRR), but Subjects 3 and 5 no longer showed these tendencies in the last seven sessions, indicating that their performance approximated sequential independence. In the last seven sessions, the lag profile of Subject 1 showed a simple alternation pattern, RL. Subject 9 showed the pattern RLR in the first two sessions, which changed to RLL and RLLR over Sessions 3 through 7. Its lag profiles appeared similar in pattern in the last seven sessions of Condition |1.39|; note, however, that the lag-1 probability approximated the lag-0 probability. In other words, the first-order dependency disappeared.

Because the lag-0 value is simply the unconditional probability of a given response (L or R), it also indicates any bias in emitting the L and R alternatives. In the first seven sessions, most subjects revealed no striking biases. In the last seven sessions, however, some subjects showed a distinct bias toward the left lever (see Subjects 1 and 3).

Finally, Figure 6 plots the relative frequencies of four-response sequences treated as units. Solid lines show the expected values, calculated as products of the relative frequencies of the component responses (Jensen, Miller, & Neuringer, 2006). For example, when p(R) = .25 and p(L) = .75, p(LLLL) = .75 × .75 × .75 × .75 = .316 and p(LRLR) = .75 × .25 × .75 × .25 = .035. These are the values expected from a sequentially independent stochastic process. The first column in Figure 6 shows that subjects' performances deviated from the expected distribution during the first session of the experiment. However, the middle and right columns show that performance changed and, for Subjects 1, 3, and 5, approximated the expected distribution. That is, 3 of 4 subjects were responding in an approximately random manner. Subject 9's performance was characterized by alternation patterns: LLRL, LRLR, LRLL, RLLR, and RLRL were emitted frequently.
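The expected four-response distribution under independence, and an observed distribution to compare it with, can be sketched as follows. The text does not say whether four-response units were counted as overlapping windows or as successive non-overlapping blocks; this sketch assumes non-overlapping units, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def expected_quadruplets(p_r):
    """Relative frequency of each 4-response unit if responses are independent."""
    p = {"R": p_r, "L": 1 - p_r}
    return {"".join(q): math.prod(p[x] for x in q) for q in product("LR", repeat=4)}


def observed_quadruplets(seq):
    """Relative frequencies of successive non-overlapping 4-response units (assumed)."""
    units = [seq[i:i + 4] for i in range(0, len(seq) - 3, 4)]
    counts = Counter(units)
    return {u: counts[u] / len(units) for u in expected_quadruplets(0.5)}


expected = expected_quadruplets(0.25)
print(round(expected["LLLL"], 3), round(expected["LRLR"], 3))  # 0.316 0.035
```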

Fig 6
The relative distributions of four-response units in the first session of Experiment 1, the last session with the criterion set at +/− 1.96, and the last session with it set at +/− 1.39, arranged in successive columns. ...

This experiment was designed to demonstrate a new technique for controlling behavioral variability using a runs-test criterion. In general, first-order dependency (T1 in the uncertainty indices) was well controlled in all subjects. In addition, Subjects 1, 3, and 5 achieved sequentially independent behavior, with no significant dependency at any of the orders examined (T1, T2, T3), whereas one subject (Subject 9) maintained higher-order dependencies.

As discussed earlier, the runs test gauges the number of runs observed in a performance relative to the expected number. Because a new run begins whenever a subject alternates rather than repeats the response emitted on the preceding trial, our runs-test algorithm acted on the level of repetition and alternation. This level is a direct expression of first-order dependency, because both describe the relation between responding on one trial and responding on the preceding trial. Therefore, our procedure succeeded in eliminating first-order sequential dependency, even though higher-order dependencies remained evident in Subject 9's profile.

Having achieved sequential independence under the runs-test contingency, Subject 1 later developed a first-order dependency. This dependency is of minor importance, because the relative distribution of four-response units showed that its behavior closely approximated the expected distribution. We believe that it resulted from an extreme bias (.05:.95) toward one of the two responses. For example, a window consisting of 10 consecutive Ls, one R, and nine Ls (i.e., LLLLLLLLLLRLLLLLLLLL) yields a runs score of 0.33, based on three runs. Such an outcome can occur only if the less frequent response (here, R) is neither first nor last in the window. By contrast, a window consisting of nine Ls, two Rs, and nine Ls (i.e., LLLLLLLLLRRLLLLLLLLL) yields a score of −2.28, which is outside the criterial range. Under extreme bias, then, the subject has to emit a single response to the less-preferred lever and return immediately to the more-preferred lever. The results of the lag analysis were consistent with this prediction. Subjects could, in principle, have learned to use the lights-off state as a cue for switching to the less-chosen lever. However, only 1 rat (Subject 1) did so, and only after extensive training, suggesting that such an unusual discrimination is generally difficult to acquire.
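The scores quoted for the extreme-bias case can be verified with the runs-test equations. The sketch below (hypothetical `runs_score`) also includes the case in which the single R falls at the end of the window, which yields only two runs and a score outside ±1.96.

```python
import math


def runs_score(seq):
    """Runs-test score S for a string of 'L'/'R' responses (both must occur)."""
    n_r, n_l = seq.count("R"), seq.count("L")
    n = n_r + n_l
    k = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)  # observed runs, K
    mu = 2 * n_r * n_l / n + 1
    var = 2 * n_r * n_l * (2 * n_r * n_l - n) / (n ** 2 * (n - 1))
    return (k - mu) / math.sqrt(var)


cases = {
    "single R mid-window": "L" * 10 + "R" + "L" * 9,   # 3 runs
    "double R mid-window": "L" * 9 + "RR" + "L" * 9,   # 3 runs
    "single R at the end": "L" * 19 + "R",             # 2 runs
}
for label, seq in cases.items():
    print(f"{label}: S = {runs_score(seq):.2f}")
# single R mid-window: S = 0.33
# double R mid-window: S = -2.28
# single R at the end: S = -3.00
```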

EXPERIMENT 2

In Experiment 2, we modified the procedure in several ways. First, to make the effects of the runs-test contingency clearer, we trained subjects on a standard concurrent schedule for several sessions before introducing the runs-test contingency. Second, we no longer illuminated the stimulus lights. If subjects had used the lights as a discriminative stimulus in Experiment 1, they could have emitted different patterns of responses when the lights were on versus off, and such a discrimination may have contaminated the effect of differential reinforcement. Third, we held the probability of reinforcement constant. Many studies indicate that behavioral variability is influenced by variation in reinforcement frequency (Boren, Moerschbaecher, & Whyte, 1978; Gharib, Derby, & Roberts, 2001; Gharib, Gade, & Roberts, 2004; Grunow & Neuringer, 2002; Tatham, Wanchisen, & Hineline, 1993). In Experiment 1, it is possible that the change from less frequent to more frequent reinforcement, rather than the runs-test contingency, was responsible for the development of sequential independence. By keeping reinforcement probability constant in Experiment 2, we eliminated this factor as a source of sequential independence.

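Finally, to hold the probability of reinforcement constant, we also adjusted the runs-test criterion. Instead of fixed critical values, such as 1.96 and 1.39, we used a percentile criterion (see Alleman & Platt, 1973; Galbicka, 1988, 1994; Machado, 1989). After each response, the current S score was compared with the scores on the last 19 trials, and a food pellet was delivered with probability 2/3 if the current score was closer to zero than at least 17 of the previous 19 scores. Because the criterion depends only on the rank of the current score among the 20 most recent scores, it holds the probability of reinforcement approximately constant: if those scores are treated as exchangeable and ties are ignored, the current score ranks among the three closest to zero with probability 3/20, so reinforcement occurs with a probability of about 3/20 × 2/3 = .1.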
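The percentile criterion can be sketched in Python as follows. The names are hypothetical. Scores are assumed to be numeric; windows with an undefined S would need separate handling, which the text does not describe. Ties are assumed not to count as "closer to zero."

```python
import random


def percentile_criterion(scores, needed=17, window=19):
    """True if the latest score is closer to zero than at least `needed`
    of the preceding `window` scores (ties do not count as closer)."""
    if len(scores) < window + 1:
        return False
    current, previous = scores[-1], scores[-window - 1:-1]
    closer = sum(1 for s in previous if abs(current) < abs(s))
    return closer >= needed


def deliver_food(scores, rng, p_food=2 / 3):
    """Criterial responses are reinforced with probability 2/3."""
    return percentile_criterion(scores) and rng.random() < p_food


rng = random.Random(0)
history = [2.0] * 17 + [0.1, 0.1] + [0.5]  # 17 of the 19 previous scores are farther from zero
print(percentile_criterion(history))  # True
print(deliver_food(history, rng))
```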

Method

Subjects

Four male Wistar rats (Subjects A, B, C, and D) were maintained at approximately 80% of their free-feeding body weights. They were experimentally naïve and 40 weeks old at the start of the experiment. Water and sawdust were continuously available in their home cages, where a 12-hr light-dark cycle was in effect.

Apparatus

The apparatus was the same as in Experiment 1 except all experimental devices were controlled by a computer using Visual Basic 2005 Express Edition software.

Procedure

After lever pressing was established by hand shaping, subjects were exposed to a continuous-reinforcement schedule that provided 100 food deliveries per session. Either the left or the right lever provided reinforcement in a given session, and the reinforcing lever was switched after each training session. After a few sessions, when all subjects pressed both levers reliably, two-lever training began. In this procedure, a reinforcer was assigned probabilistically to a particular lever, and no further assignments were made until that reinforcer was delivered (Stubbs & Pliskoff, 1969). In baseline, reinforcers were allocated equally often to left and right responses. Each session ended after 500 responses. The probability of reinforcement was decreased gradually from 1.0 to .1 and then held at .1 until performance stabilized. This baseline served as the comparison for the runs-test phase, which followed. Both phases had the same probability of reinforcement, but the baseline phase had no runs-test contingency.
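A dependent concurrent schedule of the kind cited (Stubbs & Pliskoff, 1969) can be sketched as a class that holds at most one pending reinforcer. The text does not specify when assignments were made, so this sketch assumes that an assignment is attempted on each response while none is pending, with probability `p`, and directed to either lever with equal probability. `DependentConcurrent` is a hypothetical name, not the authors' Visual Basic program.

```python
import random


class DependentConcurrent:
    """Hypothetical sketch of a dependent concurrent schedule.

    At most one reinforcer is pending; while it is pending, no new assignment
    is made, so the subject must eventually respond on the assigned lever.
    """

    def __init__(self, p=0.1, p_left=0.5, rng=None):
        self.p = p                # assumed per-response assignment probability
        self.p_left = p_left      # .5 = equal allocation to left and right
        self.rng = rng or random.Random()
        self.assigned = None      # "L", "R", or None

    def respond(self, lever):
        """Register a response on `lever` ("L" or "R"); return True if reinforced."""
        if self.assigned is None and self.rng.random() < self.p:
            self.assigned = "L" if self.rng.random() < self.p_left else "R"
        if self.assigned == lever:
            self.assigned = None
            return True
        return False


schedule = DependentConcurrent(p=0.1, rng=random.Random(42))
chooser = random.Random(7)
food = sum(schedule.respond(chooser.choice("LR")) for _ in range(500))
print(f"reinforcers in a simulated 500-response session: {food}")
```

Holding the assignment until it is collected means that exclusive preference for one lever eventually stops paying off, which is why such schedules are used to keep reinforcement allocation equal across the two levers.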

In the runs-test phase, the S score on each trial was compared with the scores on the previous 19 trials. If the current score was closer to zero than at least 17 of those 19 scores, a reinforcer was delivered with p = .667. Successive 20-response windows overlap. As a result, once the runs-test score met this criterion it was, in some cases, likely to keep meeting it on several consecutive trials, so reinforcers could be delivered on several trials in a row. Except for the absence of stimulus lights, all other procedures and analyses were the same as in Experiment 1.
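The trial-by-trial contingency of the runs-test phase can be sketched as follows. Two assumptions here are ours, not the paper's. First, S is taken to be the standard large-sample Wald-Wolfowitz runs-test z (Siegel, 1956), which may differ in detail from Equation 3. Second, a 20-response window containing only one lever, for which z is undefined, is treated as maximally dependent (infinite |S|). The names `runs_z` and `RunsPercentileSchedule` are hypothetical. The window of 20, the comparison against 19 previous scores, the criterion of 17, and p = 2/3 follow the text.

```python
import math
import random
from collections import deque
from typing import Optional, Sequence


def runs_z(choices: Sequence[str]) -> float:
    """Large-sample runs-test z for a sequence of "L"/"R" choices (assumed form of S)."""
    n = len(choices)
    if n < 2:
        raise ValueError("need at least two choices")
    n1 = sum(1 for c in choices if c == "L")
    n2 = n - n1
    runs = 1 + sum(1 for a, b in zip(choices, choices[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    if var <= 0:
        return math.inf  # hypothetical rule: a one-lever window counts as fully dependent
    return (runs - mean) / math.sqrt(var)


class RunsPercentileSchedule:
    """Reinforce with p_food when the current |S| is smaller than at least k of
    the previous `memory` |S| values."""

    def __init__(self, window: int = 20, memory: int = 19, k: int = 17,
                 p_food: float = 2 / 3, seed: Optional[int] = None):
        self.choices: deque = deque(maxlen=window)
        self.scores: deque = deque(maxlen=memory)
        self.k = k
        self.p_food = p_food
        self.rng = random.Random(seed)

    def respond(self, choice: str) -> bool:
        """Record one choice and return True if a food pellet is delivered."""
        self.choices.append(choice)
        if len(self.choices) < self.choices.maxlen:
            return False  # assumption: no scoring until the first full window
        s = runs_z(list(self.choices))
        comparison_ready = len(self.scores) == self.scores.maxlen
        beaten = sum(1 for prev in self.scores if abs(s) < abs(prev))
        self.scores.append(s)
        return comparison_ready and beaten >= self.k and self.rng.random() < self.p_food


# Usage: a fair-coin subject for one 500-response session.
subject = random.Random(3)
schedule = RunsPercentileSchedule(seed=4)
pellets = sum(schedule.respond(subject.choice("LR")) for _ in range(500))
print(pellets)
```

Successive windows share 19 of their 20 choices, so consecutive S scores are strongly correlated. This is consistent with criterion-meeting trials, and hence reinforcers, tending to come in clusters.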

Results and Discussion

As in Experiment 1, we first examine the runs structure of subjects' behavior and then the data on sequential dependencies among successive responses.

Runs Analyses

Figure 7 plots, for each session, the proportions of S scores whose absolute values were smaller than 1, between 1 and 2, and larger than 2. Sessions before the vertical line are from the baseline phase, in which the probability of reinforcement was .1. Sessions after it are from the runs-test phase, in which differential reinforcement operated with the same probability. In the baseline phase, Subject A showed similar proportions of S scores smaller than 1 and between 1 and 2, and only Subject D showed an increase in the proportion of scores smaller than 1. On transition to the runs-test phase, all subjects increased their proportions of scores in this range: Subjects B and C improved rapidly, whereas Subject A improved gradually. A comparison of the last five sessions of the baseline and runs-test phases shows that all subjects improved their scores. Thus, Figure 7 shows that in Experiment 2, as in Experiment 1, the behavior of all subjects was sensitive to the runs-test contingency.
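The session summaries in Figure 7 amount to binning |S| into three ranges. The following is a minimal sketch. The text does not say how scores exactly equal to 1 or 2 were assigned, so the boundary handling below is an assumption, and the function name is hypothetical.

```python
from typing import Dict, Iterable


def s_range_proportions(scores: Iterable[float]) -> Dict[str, float]:
    """Proportions of |S| below 1, from 1 to 2 (inclusive, by assumption), and above 2."""
    counts = {"<1": 0, "1-2": 0, ">2": 0}
    for s in scores:
        a = abs(s)
        if a < 1:
            counts["<1"] += 1
        elif a <= 2:
            counts["1-2"] += 1
        else:
            counts[">2"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scores to summarize")
    return {label: c / total for label, c in counts.items()}


print(s_range_proportions([0.4, -1.3, 2.6, -0.1]))
# {'<1': 0.5, '1-2': 0.25, '>2': 0.25}
```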

Fig 7
Proportions of absolute values of S in three ranges: smaller than 1, between 1 and 2, and larger than 2 in each session of Experiment 2. The area before the vertical line is baseline and that after it is the runs-test phase.

Sequential Dependency Analyses

Mutual uncertainties for the last five sessions are plotted in Figure 8, with results from the baseline and runs-test phases separated by a vertical line. Successive columns give chi-square values for T1, T2, and T3. Horizontal lines indicate the 5% critical values of chi-square; values below these lines indicate no statistically significant sequential dependency at that order. The first column (T1) shows that, except for Subject D, first-order sequential dependency was present in baseline but decreased under the runs-test contingency. The T2 and T3 columns show that sequential independence was also achieved at higher orders for Subjects A and D, whereas some dependencies remained for Subjects B and C. These results broadly agree with those of Experiment 1.
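This kind of information-theoretic test can be sketched as follows. The sketch uses the usual definitions (Attneave, 1959; Chatfield, 1973) rather than the paper's exact equations, which are not reproduced in this section. $H_m$ is the conditional uncertainty, in bits, of a response given the $m$ preceding responses, and $T_{m+1} = H_m - H_{m+1}$. The test statistic is chi-square $= 2N \ln 2 \cdot T$, with $2^m$ degrees of freedom for two alternatives. Computing both $H$ terms over the same $N$ positions makes this chi-square equal to the likelihood-ratio statistic for order $m$ versus order $m+1$. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

# 5% critical values of chi-square, keyed by degrees of freedom
CRITICAL_05 = {1: 3.841, 2: 5.991, 4: 9.488}


def conditional_uncertainty(seq: Sequence[str], m: int, start: int) -> float:
    """H_m in bits: uncertainty of seq[i] given the m preceding responses,
    estimated over positions i = start, ..., len(seq) - 1 (requires start >= m)."""
    joint: Counter = Counter()
    context: Counter = Counter()
    for i in range(start, len(seq)):
        ctx = tuple(seq[i - m:i])
        joint[(ctx, seq[i])] += 1
        context[ctx] += 1
    n = len(seq) - start
    return -sum(c / n * math.log2(c / context[key[0]]) for key, c in joint.items())


def mutual_uncertainty_test(seq: Sequence[str], order: int) -> Tuple[float, float, bool]:
    """For order 1, 2, or 3, return (T, chi-square, significant at 5%), where
    T_order = H_(order-1) - H_order and chi-square = 2 N ln 2 * T with 2**(order-1) df."""
    m = order - 1
    n = len(seq) - order  # both H terms use the same N positions
    t = conditional_uncertainty(seq, m, order) - conditional_uncertainty(seq, m + 1, order)
    chi2 = 2 * n * math.log(2) * t
    return t, chi2, chi2 > CRITICAL_05[2 ** m]


print(mutual_uncertainty_test(list("LR" * 250), 1))  # strict alternation: T1 close to 1 bit
```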

Fig 8
Mutual uncertainties for the last five sessions of each phase of Experiment 2. Horizontal lines indicate critical values of chi-square. Data points below the critical value indicate no significant difference between $H_m$ and $H_{m+1}$. See text for calculations ...

Figure 9 presents a lag analysis for the last five sessions of both phases. In the baseline phase, the lag profiles showed that all subjects favored particular response patterns, typically RR (Subjects B and C) or RRL (Subjects A and D). In the runs-test phase, these patterns gradually disappeared. For all subjects, the lag-1 probability became similar to the lag-0 probability; that is, the first-order dependency disappeared. Moreover, Subjects A and D showed almost no pattern. Subject B retained its baseline pattern in a less pronounced form, and Subject C tended to emit L at lag 2. Unlike the lag data of Experiment 1, these profiles show no bias toward either lever; instead, response probabilities were near .5.
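Lag profiles of this kind take only a few lines to compute. The caption does not spell out the conditioning event, so the definition below is an assumption. Lag 0 is the overall probability of a right response, and lag k is the probability of a right response on trial t given a right response on trial t − k. Under sequential independence, every lag should sit near the lag-0 value. The function name is hypothetical.

```python
from typing import List, Sequence


def lag_profile(seq: Sequence[str], max_lag: int = 6, target: str = "R") -> List[float]:
    """Lag 0: P(target). Lag k: P(target on trial t | target on trial t - k)."""
    profile = [sum(1 for c in seq if c == target) / len(seq)]
    for k in range(1, max_lag + 1):
        followers = [seq[t] for t in range(k, len(seq)) if seq[t - k] == target]
        hits = sum(1 for c in followers if c == target)
        profile.append(hits / len(followers) if followers else float("nan"))
    return profile


# A strict double alternation (RRLL...): lags 2 and 6 give 0.0 and lag 4 gives 1.0
print(lag_profile("RRLL" * 50))
```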

Fig 9
Conditional probability profiles of right responses for the last five sessions in the baseline and runs-test phases of Experiment 2. Each set of seven connected points, Lags 0 to 6, corresponds to one session. The horizontal dotted line represents p of ...

Finally, Figure 10 plots the relative frequencies of four-response sequences treated as units. At the start of the experiment (left column), all subjects tended to repeat responses; that is, the frequencies of LLLL and RRRR were high. Performance changed somewhat over the baseline sessions. By the end of baseline training (middle column), all 4 rats showed a common pattern: the frequencies of the double-alternation patterns LLRR and RRLL had increased, whereas those of the high-alternation patterns LLRL, LRLL, LRLR, RLRL, and RLRR remained low. This pattern was lost by the end of the runs-test phase (right column), where the profiles approximated the expected values derived by assuming randomness.
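The four-response-unit analysis can be sketched as follows. Two choices here are assumptions, not taken from the paper. First, the session is cut into consecutive, non-overlapping blocks of four responses. Second, the expected value for each unit is the product of the session's overall L and R proportions, which is what a sequentially independent, possibly biased coin would produce. For an unbiased coin, every unit has an expected relative frequency of 1/16 = .0625. The function name is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Sequence, Tuple


def four_unit_distribution(seq: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Map each of the 16 four-response units to (observed, expected) relative frequency."""
    units = ["".join(seq[i:i + 4]) for i in range(0, len(seq) - 3, 4)]  # non-overlapping
    if not units:
        raise ValueError("need at least four responses")
    counts = Counter(units)
    p_r = sum(1 for c in seq if c == "R") / len(seq)
    result: Dict[str, Tuple[float, float]] = {}
    for letters in product("LR", repeat=4):
        unit = "".join(letters)
        # Independence (with the observed bias) predicts p_R^(#R) * p_L^(#L)
        expected = p_r ** unit.count("R") * (1 - p_r) ** unit.count("L")
        result[unit] = (counts[unit] / len(units), expected)
    return result


dist = four_unit_distribution("LLRR" * 100)
print(dist["LLRR"])  # (1.0, 0.0625)
```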

Fig 10
The relative distributions of four-response units. The left column is for the first session of Experiment 2, the middle column is for the last session of baseline, and the right column is for the last session of the runs-test phase. Lines are expected ...

The results of Experiment 2 replicated those of Experiment 1. All subjects were sensitive to a reinforcement contingency based on the runs-test algorithm (Figure 7). Under this contingency, their performance came to show reduced sequential dependencies, and first-order dependencies were eliminated in all subjects (Figure 8). This outcome did not differ between Experiments 1 and 2, even though conditioned reinforcers (the stimulus lights) were removed and primary reinforcement was more strictly controlled in Experiment 2.

Our differential reinforcement procedures were designed to have no effect on response bias. Subjects in Experiment 1 showed a strong bias toward the left lever (Figure 5). In Experiment 2, by contrast, they showed almost no bias and consequently attained an approximately uniform distribution of choices between the two alternatives (Figures 9 and 10). Thus, the contingency produced sequentially independent responding regardless of whether a bias was present. Sequential independence was not a byproduct of differentially reinforcing equiprobable choices.

GENERAL DISCUSSION

The present work aimed to demonstrate a new reinforcement contingency based on run-structure analyses of successive responses in a choice task. Using the runs-test algorithm as a criterion for differential reinforcement, we showed that first-order response dependencies can be successfully removed. Higher-order dependencies were also sometimes present early in training, and these were often reduced with extended exposure to the contingency. Thus, the new contingency appeared to be effective in modifying the structure of response runs in almost all subjects.

A possible criticism concerns our use of the runs test, which was designed as a test for randomness. Equation 3 is appropriate for large samples, that is, when at least one of the response alternatives occurs more than 20 times (Siegel, 1956). In our experiment, however, the two alternatives together totaled only 20 responses. We used the runs test not as a statistical test for randomness but as a criterion for differential reinforcement. The issue, therefore, is whether our conclusions about the effects of the contingency are reliable in this context. To assess this, we relied on a nonparametric method, for which Siegel (1956) and Swed and Eisenhart (1943) prepared tables of the expected number of runs in small samples; these tables provide appropriate critical values for small samples. Suppose we compare the data in Figure 1, calculated from Equation 3, with the statistics of this nonparametric test. The nonparametric test decreases the risk of Type 1 error (i.e., rejecting a true null hypothesis of no dependency) but increases the risk of Type 2 error. In other words, compared with the small-sample tables, Equation 3 is more likely to classify a sequence as dependent and thus to withhold reinforcement. In effect, we may have imposed a more severe criterion than the runs test requires. This possibility does not present a problem for our conclusions. Rather, we note that differential reinforcement requires a sample that is not so large as to dilute the differential nature of the contingency (Alleman & Platt, 1973; Galbicka, 1988, 1994).
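The small-sample issue can be made concrete by computing the exact null distribution of the number of runs and comparing it with a normal approximation. The exact distribution is what the Swed and Eisenhart (1943) tables tabulate. The sketch below uses the standard combinatorial formula for the runs distribution together with the usual large-sample mean and variance. It is not necessarily identical to Equation 3; for example, it applies no continuity correction. The exact two-tailed p value uses one common convention: deviations from the mean at least as large as the observed one. The function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import Dict, Tuple


def exact_runs_pmf(n1: int, n2: int) -> Dict[int, Fraction]:
    """Exact P(R = r) for the number of runs when n1 L's and n2 R's are randomly ordered."""
    if n1 < 1 or n2 < 1:
        raise ValueError("both alternatives must occur at least once")
    total = math.comb(n1 + n2, n1)
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:  # k runs of each kind, starting with either kind
            ways = 2 * math.comb(n1 - 1, k - 1) * math.comb(n2 - 1, k - 1)
        else:  # k + 1 runs of one kind and k runs of the other
            ways = (math.comb(n1 - 1, k) * math.comb(n2 - 1, k - 1)
                    + math.comb(n1 - 1, k - 1) * math.comb(n2 - 1, k))
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf


def runs_p_values(n1: int, n2: int, runs: int) -> Tuple[float, float]:
    """Two-tailed p values for an observed run count: (exact, normal approximation)."""
    pmf = exact_runs_pmf(n1, n2)
    n = n1 + n2
    mean = Fraction(2 * n1 * n2, n) + 1
    deviation = abs(runs - mean)
    exact = sum(p for r, p in pmf.items() if abs(r - mean) >= deviation)
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    if var <= 0:
        raise ValueError("normal approximation undefined for these counts")
    z = float(runs - mean) / math.sqrt(var)  # no continuity correction
    return float(exact), math.erfc(abs(z) / math.sqrt(2))


# Example: 10 L and 10 R choices in a 20-response window containing 6 runs
print(runs_p_values(10, 10, 6))
```

Evaluating `runs_p_values` across run counts for a 10:10 split shows how closely the normal approximation tracks the exact small-sample tail probabilities at n = 20.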

Our procedure involved an interlocking schedule in which two experimental dimensions (K and response proportion) are related. In previous investigations, either the proportion of responses to an alternative or the number of runs has been used as the basis for differential reinforcement (Bryant & Church, 1974; Machado, 1997; Neuringer, 1986). By contrast, we combined these dimensions to construct a procedure that differentially reinforces the absence of sequential dependencies. It differs from differential reinforcement of less frequent response alternatives in that it permits one alternative to occur with high frequency. Nevertheless, performance approached equiprobability. In Experiment 2, some subjects' choices were approximately consistent with random (nonbiased coin-flipping) responding. Such findings suggest that various procedures may yield highly variable or random behavior. If so, the necessary and sufficient conditions for producing such behavior remain to be determined.
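The interlocking of runs and response proportion is visible in the large-sample runs statistic itself. The expected number of runs, 2·n_L·n_R/n + 1, depends on how responses are split between the levers, so the same run count yields different scores at different proportions. The worked example below assumes the standard Wald-Wolfowitz z (Siegel, 1956) as a stand-in for Equation 3; the function name is hypothetical.

```python
import math


def runs_z_from_counts(n_left: int, n_right: int, runs: int) -> float:
    """Large-sample runs-test z from the two response counts and the number of runs."""
    n = n_left + n_right
    mean = 2 * n_left * n_right / n + 1
    var = 2 * n_left * n_right * (2 * n_left * n_right - n) / (n * n * (n - 1))
    return (runs - mean) / math.sqrt(var)


# The same 11 runs in a 20-response window, at two different response proportions
print(round(runs_z_from_counts(10, 10, 11), 2))  # 0.0: exactly the expected number of runs
print(round(runs_z_from_counts(15, 5, 11), 2))   # 1.56: more runs than expected at 15:5
```

Eleven runs is exactly the expected number for a 10:10 split. With only five right responses, however, 11 is the largest number of runs possible, so the same count scores as an excess of alternation.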

We note two different views on reinforced sequential dependencies, reflecting two different epistemological attitudes: molar and molecular. From the molar standpoint, molar behavioral phenomena, such as allocations of behavior, response rates, and behavioral variability, are regarded as individuals or concrete particulars, as species are (Baum, 2002; Glenn & Field, 1994). From the molecular standpoint, such phenomena are regarded as abstractions or derived things. Glenn (2003) discussed them by analogy with organic evolutionary theory. Within that theory, Maynard Smith (1994) characterized the increases in complexity during the evolution of the organic world as resulting from a succession of processes, each of which became possible only once a previous level of complexity had been reached. Applied to behavior, complex behavioral phenomena are regarded as the result of repeated rounds of selection acting on phenomena produced by earlier rounds of selection. If we regard the phenomena as derived things, we would seek the causes of changes in behavioral variability in earlier rounds of selection. If, on the other hand, we regard them as concrete particulars, we would focus on the effects of the behavioral phenomena at the higher level of complexity. Regarding behavioral variability, Machado (1992, 1997) argued that the dispersion of responses across alternatives might be a derivative of more fundamental processes. This claim is reasonable because the behavioral variability was produced by differentially reinforcing less frequent response alternatives. Other researchers, by contrast, have studied variation and repetition as concrete particulars in the contexts of choice, delayed reinforcement, resistance to change, and so on (Abreu-Rodrigues et al., 2005; Doughty & Lattal, 2001; Neuringer, 1992; Odum et al., 2006; Wagner & Neuringer, 2006). These studies have also yielded useful findings.
Whereas our experiments showed effects of the runs-test contingency on sequential dependencies, studies of how sequential patterns affect complex behavioral phenomena remain for the future.

Acknowledgments

This research was supported by grants from Japan Society for the Promotion of Science. We thank Anthony McLean for invaluable suggestions and great editorial effort, Takeharu Igaki for great technical assistance, Taku Ishii and Takayuki Tanno for helpful discussions, and Allen Neuringer, Armando Machado, and Alan Silberberg for editorial help.

REFERENCES

  • Abreu-Rodrigues J, Lattal K.A, dos Santos C.V, Matos R.A. Variation, repetition, and choice. Journal of the Experimental Analysis of Behavior. 2005;83:147–168.
  • Alleman H.D, Platt J.R. Differential reinforcement of interresponse times with controlled probability of reinforcement per response. Learning and Motivation. 1973;4:40–73.
  • Attneave F. Applications of information theory to psychology. New York: Holt, Rinehart & Winston; 1959.
  • Baum W.M. From molecular to molar: A paradigm shift in behavior analysis. Journal of the Experimental Analysis of Behavior. 2002;78:95–116.
  • Blough D.S. The reinforcement of least-frequent interresponse times. Journal of the Experimental Analysis of Behavior. 1966;9:581–591.
  • Boren J.J, Moerschbaecher J.M, Whyte A.A. Variability of response location on fixed-ratio and fixed-interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior. 1978;30:63–67.
  • Bryant D, Church R.M.G. The determinants of random choice. Animal Learning & Behavior. 1974;2:245–248.
  • Chatfield C. Inference regarding Markov chain models. Journal of the Royal Statistical Society, Series C (Applied Statistics). 1973;22:7–20.
  • Chatfield C, Lemon R.E. Analyzing sequences of behavioral events. Journal of Theoretical Biology. 1970;29:427–445.
  • Cherot C, Jones A, Neuringer A. Reinforced variability decreases with approach to reinforcers. Journal of Experimental Psychology: Animal Behavior Processes. 1996;22:497–508.
  • Cohen L, Neuringer A, Rhodes D. Effects of ethanol on reinforced variations and repetitions by rats under a multiple schedule. Journal of the Experimental Analysis of Behavior. 1990;54:1–12.
  • Denney J, Neuringer A. Behavioral variability is controlled by discriminative stimuli. Animal Learning & Behavior. 1998;26:154–162.
  • Doughty A.D, Lattal K.A. Resistance to change of operant variation and repetition. Journal of the Experimental Analysis of Behavior. 2001;76:195–215.
  • Galbicka G. Differentiating the behavior of organisms. Journal of the Experimental Analysis of Behavior. 1988;50:343–354.
  • Galbicka G. Shaping in the 21st century: Moving percentile schedules into applied settings. Journal of Applied Behavior Analysis. 1994;27:739–760.
  • Gharib A, Derby S, Roberts S. Timing and the control of variation. Journal of Experimental Psychology: Animal Behavior Processes. 2001;27:165–178.
  • Gharib A, Gade C, Roberts S. Control of variation by reward probability. Journal of Experimental Psychology: Animal Behavior Processes. 2004;30:271–282.
  • Glenn S.S. Operant contingencies and the origin of cultures. In: Lattal K.A, Chase P.N, editors. Behavior theory and philosophy. New York: Kluwer Academic/Plenum Publishers; 2003. pp. 103–128.
  • Glenn S.S, Field D.P. Functions of the environment in behavioral evolution. The Behavior Analyst. 1994;17:241–259.
  • Grunow A, Neuringer A. Learning to vary and varying to learn. Psychonomic Bulletin & Review. 2002;9:250–258.
  • Jensen G, Miller C, Neuringer A. Truly random operant responding: Results and reasons. In: Wasserman E, Zentall T.R, editors. Comparative Cognition. New York: Oxford University Press; 2006. pp. 459–480.
  • Machado A. Operant conditioning of behavioral variability using a percentile reinforcement schedule. Journal of the Experimental Analysis of Behavior. 1989;52:155–166.
  • Machado A. Behavioral variability and frequency-dependent selection. Journal of the Experimental Analysis of Behavior. 1992;58:241–263.
  • Machado A. Learning variable and stereotypical sequences of responses: some data and a new model. Behavioural Processes. 1993;30:103–130.
  • Machado A. Polymorphic response patterns under frequency-dependent selection. Animal Learning & Behavior. 1994;22:53–71.
  • Machado A. Increasing the variability of response sequences in pigeons by adjusting the frequency of switching between two keys. Journal of the Experimental Analysis of Behavior. 1997;68:1–25.
  • Maynard Smith J. The major transitions in evolution. In: Cowan G, Pines D, Meltzer D, editors. Complexity: metaphors, models, and reality. Reading, MA: Addison-Wesley; 1994. SFI Studies in the Sciences of Complexity, Proc. Vol. 19.
  • McElroy E, Neuringer A. Effects of alcohol on reinforced repetitions and reinforced variations in rats. Psychopharmacology. 1990;102:49–55.
  • Miller G.A, Frick F.C. Statistical behavioristics and sequences of responses. Psychological Review. 1949;56:311–324.
  • Miller N, Neuringer A. Reinforcing variability in adolescents with autism. Journal of Applied Behavior Analysis. 2000;33:151–165.
  • Mook D.M, Jeffrey J, Neuringer A. Spontaneously hypertensive rats (SHR) readily learn to vary but not repeat instrumental responses. Behavioral and Neural Biology. 1993;59:126–135.
  • Morgan L, Neuringer A. Behavioral variability as a function of response topography and reinforcement contingency. Animal Learning & Behavior. 1990;18:257–263.
  • Morris C.J. The operant conditioning of response variability: free-operant versus discrete-response procedures. Journal of the Experimental Analysis of Behavior. 1987;47:273–277.
  • Morris C.J. The effects of lag value on the operant control of response variability under free-operant and discrete-response procedures. The Psychological Record. 1989;39:263–270.
  • Morris C.J. The effects of satiation on the operant control of response variability. The Psychological Record. 1990;40:105–112.
  • Neuringer A. Can people behave “randomly”?: The role of feedback. Journal of Experimental Psychology: General. 1986;115:62–75.
  • Neuringer A. Operant variability and repetition as functions of interresponse time. Journal of Experimental Psychology: Animal Behavior Processes. 1991;17:3–12.
  • Neuringer A. Choosing to vary and repeat. Psychological Science. 1992;3:246–250.
  • Neuringer A. Reinforced variation and selection. Animal Learning & Behavior. 1993;21:83–91.
  • Neuringer A. Operant variability: evidence, functions, and theory. Psychonomic Bulletin & Review. 2002;9:672–705.
  • Neuringer A, Deiss C, Imig S. Comparing choices and variations in people and rats: two teaching experiments. Behavior Research Methods, Instruments, & Computers. 2000;32:407–416.
  • Neuringer A, Huntley R.W. Reinforced variability in rats: effects of gender, age and contingency. Physiology & Behavior. 1991;51:145–149.
  • Nickerson R.S. The production and perception of randomness. Psychological Review. 2002;109:330–357.
  • Odum A.L, Ward R.D, Barnes C.A, Burke K.A. The effects of delayed reinforcement on variability and repetition of response sequences. Journal of the Experimental Analysis of Behavior. 2006;86:159–179.
  • Page S, Neuringer A. Variability is an operant. Journal of Experimental Psychology: Animal Behavior Processes. 1985;11:429–452.
  • Pincus S, Singer B.H. Randomness and degree of irregularity. Proceedings of the National Academy of Sciences of the United States of America. 1996;93:2083–2088.
  • Pryor K.W, Haag R, O'Reilly J. The creative porpoise: training for novel behavior. Journal of the Experimental Analysis of Behavior. 1969;12:653–661.
  • Schoenfeld W.N, Harris A.H, Farmer J. Conditioning response variability. Psychological Reports. 1966;19:551–557.
  • Shimp C.P. Reinforcement of least-frequent sequences of choices. Journal of the Experimental Analysis of Behavior. 1967;10:57–65.
  • Siegel S. Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill Book Company; 1956.
  • Stubbs D.A, Pliskoff S.S. Concurrent responding with fixed relative rate of reinforcement. Journal of the Experimental Analysis of Behavior. 1969;12:887–895.
  • Swed F.S, Eisenhart C. Tables for testing randomness of grouping in a sequence of alternatives. The Annals of Mathematical Statistics. 1943;14:66–87.
  • Tatham T.A, Wanchisen B.A, Hineline P.N. Effects of fixed and variable ratios on human behavioral variability. Journal of the Experimental Analysis of Behavior. 1993;59:349–359.
  • van Hest A, van Haaren F, van De Poll N.E. Operant conditioning of response variability in male and female Wistar rats. Physiology & Behavior. 1989;45:551–555.
  • Wagner K, Neuringer A. Operant variability when reinforcement is delayed. Learning & Behavior. 2006;34:111–123.

Articles from Journal of the Experimental Analysis of Behavior are provided here courtesy of Society for the Experimental Analysis of Behavior