J Exp Anal Behav. 2010 March; 93(2): 147–155.
PMCID: PMC2831654

Dual Effects on Choice of Conditioned Reinforcement Frequency and Conditioned Reinforcement Value

Abstract

Pigeons were presented with a concurrent-chains schedule in which the total time to primary reinforcement was equated for the two alternatives (VI 30 s VI 60 s vs. VI 60 s VI 30 s). In one set of conditions, the terminal links were signaled by the same stimulus, and in another set of conditions they were signaled by different stimuli. Choice was in favor of the shorter terminal link when the terminal links were differentially signaled but in favor of the shorter initial link (and longer terminal link) when the terminal links shared the same stimulus. Preference reversed regularly with reversals of the stimulus condition and was unrelated to the discrimination between the two terminal links during the nondifferential stimulus condition. The present results suggest that the relative value of the terminal-link stimuli and the relative rate of conditioned reinforcer presentation are important influences on choice behavior, and that models of conditioned reinforcement need to include both factors.

Keywords: conditioned reinforcement, choice, concurrent chains, key peck, pigeon

The concurrent-chains procedure has been used extensively to study choice between delayed reinforcers. During the initial link of this procedure, two (or more) choice options are presented. Selection of one option leads to its terminal link, and ultimately to food, whereas selection of the alternative option leads to a different, mutually exclusive terminal link. In the most common variation of the procedure, the initial-link schedules for the two chains are identical while the terminal-link schedules are varied. Two theoretically influential findings are: (1) when the initial-link schedules are equal, shorter initial links produce greater preference for the higher-valued terminal link than do longer initial links; and (2) when the ratio of the two terminal-link schedules is constant, greater preference occurs with longer terminal links. These findings have been critically important for delay-reduction theory, which posits that choice responding in the initial link is determined by the relative improvement in time to reinforcement signaled by the terminal-link onsets (Fantino, 1969), as shown in Equation 1. BL and BR represent the number of responses to the left and right initial links, T represents the expected time to primary reinforcement from the onset of the initial links, and tL and tR represent the average durations of the left and right terminal links.

[Equation 1]
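The equation itself is not reproduced in this copy. As a sketch, delay-reduction theory is usually stated as a relative response rate using the symbols defined above (BL, BR, T, tL, tR). Whether the original displayed this proportion form or an equivalent ratio form is an assumption.

```latex
\[
\frac{B_L}{B_L + B_R} = \frac{T - t_L}{(T - t_L) + (T - t_R)}
\]
```

Each term T - t is the reduction in expected time to food signaled by entry into that terminal link, so the alternative whose terminal link signals the larger reduction is predicted to receive the larger share of initial-link responses.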

Less frequently studied have been concurrent-chains schedules with unequal initial links. To the extent that choice behavior is determined by the conditioned reinforcement values of the different terminal-link stimuli, the choice proportion in the initial link presumably should be a function not only of the relative values of the terminal-link stimuli, but also of the relative frequency with which the terminal-link stimuli are presented. That is, if conditioned reinforcers serve as substitutes for primary reinforcers, preference should be increased by more frequent presentation of conditioned reinforcers, just as preference is increased by more frequent presentation of primary reinforcers (see Shahan, Podlesnik, & Jimenez-Gomez, 2006, for supporting evidence). Yet delay reduction theory has no representation of terminal-link presentation rate. As shown in Equation 2, Squires and Fantino (1971) included terms rL and rR for the rate of primary reinforcement provided by the left and right keys, respectively.

[Equation 2]
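The equation itself is not reproduced in this copy. A sketch of the Squires and Fantino (1971) formulation, assuming it is written in the same proportion form as the standard delay-reduction equation and with rL and rR as defined above, is:

```latex
\[
\frac{B_L}{B_L + B_R} = \frac{r_L\,(T - t_L)}{r_L\,(T - t_L) + r_R\,(T - t_R)}
\]
```

When the two keys provide equal rates of primary reinforcement (rL = rR), the rate terms cancel and this expression reduces to the unweighted delay-reduction proportion.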

The rate of primary reinforcement, however, is correlated with the rate of conditioned reinforcement in many procedures (e.g., Fantino & Davison, 1983), so these terms do not isolate an effect of conditioned reinforcement frequency. In contrast, more recent models of concurrent-chains performance, Grace's (1994) contextual choice model (CCM, Equation 3) and Mazur's (2001) hyperbolic value-added model (Equation 4), do include a role for terminal-link presentation rates in addition to a representation of the relative values of the terminal links:

[Equation 3]
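The equation itself is not reproduced in this copy. A sketch of the contextual choice model as it is usually written from Grace (1994) follows, using the symbols defined in the accompanying text. Two points are assumptions rather than statements from the original: the r terms are treated as rates (reciprocals of the average intervals), which is how the model is typically expressed, and any additional scaling exponent on Tt/Ti is taken to be 1.

```latex
\[
\frac{B_L}{B_R} = b \left(\frac{r_{1L}}{r_{1R}}\right)^{a_1}
\left[\left(\frac{r_{2L}}{r_{2R}}\right)^{a_2}\right]^{T_t/T_i}
\]
```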

In Grace's model, the ratio of behavior (BL and BR) to the left and right initial-link schedules is a function of bias (b), the average initial-link intervals between conditioned reinforcer presentations (r1L and r1R), the average terminal-link delays to primary reinforcement (r2L and r2R), sensitivities to initial-link conditioned reinforcement and terminal-link primary reinforcement rates (a1 and a2), and the ratio of the average terminal-link and initial-link durations (Tt and Ti). Mazur's model is similar to CCM, except that the right parenthetical expression represents the process of value addition, which includes a sensitivity parameter (at), and the hyperbolic value for the initial and terminal links (Vt1, Vt2, and Vi).

[Equation 4]
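The equation itself is not reproduced in this copy. A sketch of the hyperbolic value-added model, assuming the same left-hand terms as CCM (as the text above indicates) and the value-addition expression from Mazur (2001) written with the symbols Vt1, Vt2, Vi, and at, is:

```latex
\[
\frac{B_L}{B_R} = b \left(\frac{r_{1L}}{r_{1R}}\right)^{a_1}
\left(\frac{V_{t1} - a_t V_i}{V_{t2} - a_t V_i}\right)
\]
```

Here Vt1 and Vt2 are taken to be the hyperbolic values of the left and right terminal links, respectively; that left/right assignment of the subscripts is an assumption.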

Williams and Dunn (1991) provided evidence that choice between two alternatives is determined by the relative frequency of conditioned reinforcer presentations, independent of any differences in rate of primary reinforcement. Their procedure differed from conventional concurrent-chains experiments in that the terminal links for both chains were correlated with the same stimulus, in order to eliminate differences in the strength of conditioned reinforcement. Pigeons were presented with two choice alternatives, both of which led to a common center-key stimulus to which responding was reinforced according to VI 2-min food schedules. Thus, the rate of primary reinforcement, as well as the strength of conditioned reinforcement, was equated for the two choice alternatives. The critical manipulation involved superimposing additional terminal-link presentations on the two alternatives. These additional presentations occurred on a VI 30-s schedule, 80% of which were assigned to one choice alternative and 20% to the other. No food was presented following these additional presentations of the terminal-link stimulus. The choice alternative assigned the higher frequency of the additional extinction periods was reversed across conditions. Preference was strongly in favor of the choice alternative leading to the more frequent nonreinforced presentations of the terminal link stimulus, and tracked the rate of conditioned reinforcers when the contingencies for the two alternatives were reversed. In a control condition in which the additional nonreinforced stimulus presentations were associated with a different stimulus, there was no effect on preference. Thus, Williams and Dunn's results demonstrate that the frequency of conditioned reinforcement can influence choice behavior similarly to the effect of the frequency of primary reinforcement.

Mazur (1999) replicated the preference for the more frequent terminal-link presentations but observed that the preference diminished over the course of training. However, Mazur's procedure differed from that of Williams and Dunn (1991) in terms of the duration of the terminal links. In comparison to 20-s terminal links of Williams and Dunn, Mazur's terminal links were either 5 s or 15 s in duration. The duration of the terminal link is potentially important because the terminal links for the two chains, while having the same nominal stimulus, potentially could be conditionally discriminated based on which choice alternative produced them. With longer terminal links the possibility of conditional discrimination should be less likely because the peck producing terminal-link entry (the conditional cue) is temporally more separated from the trial outcome.

When differential stimuli are correlated with the separate terminal links, additional nonreinforced presentations of the terminal-link stimuli have the opposite effect. Dunn, Williams, and Royalty (1987) presented pigeons with a conventional concurrent-chains schedule in which the terminal links were equal, differentially signaled FI 15-s food schedules. The critical manipulation was the addition of extra presentations of one of the terminal links, which were associated with extinction. In some conditions the additional extinction periods were contingent on the choice response usually leading to that terminal-link stimulus; in other conditions the additional extinction periods were presented independently of responding. In both cases, preference for the choice alternative leading to the “devalued” terminal-link stimulus was substantially reduced.

The results of Dunn et al. (1987) imply that manipulations of terminal-link value can override differences in terminal-link frequency. That is, when the additional extinction presentations were contingent on the choice response, many more terminal-link presentations occurred for that choice alternative, yet preference was in the opposite direction. This presumably was because the terminal-link stimulus associated with the additional extinction presentations had diminished value, in a manner similar to it being associated with a longer time to reinforcement. In contrast, the results of Williams and Dunn (1991) demonstrate that when the same stimulus is associated with the two terminal links, preference is increased by more frequent response-contingent stimulus presentations, despite the fact that these additional presentations substantially increase the time to reinforcement for that choice alternative.

The studies just described imply that choice in concurrent chains is a function of both the frequency of terminal-link presentation and the relative value of the different terminal links, and that these two variables potentially compete or interact. Fantino (1969) reported an early example of their competition. Pigeons chose between concurrent chains in which the total time to reinforcement was the same for both chains. One chain had a VI 30-s schedule in the initial link and a VI 90-s schedule in the terminal link; the alternative chain had a VI 90-s schedule in the initial link and a VI 30-s schedule in the terminal link. Although the total time values for the two chains were equivalent, preference was strongly in favor of the chain with the shorter terminal link, even though that terminal link was presented only one-third as often. Thus, as with Dunn et al. (1987), the relative value of the terminal links was substantially more potent than their relative frequency of presentation.

The present study is a variation of the procedure used by Fantino (1969). Here we replicated his procedure (with slightly different schedule parameters), but we also investigated a second condition in which the same stimulus was used in both terminal links. Given that relative frequency of terminal-link presentations determines preference when the same stimulus occurred for all terminal links (Williams & Dunn, 1991), the issue is whether preference is reversed by changing the terminal-link stimuli from differential to nondifferential, while all of the other schedule parameters remain unchanged. When the same terminal-link stimulus is used for both chains, the delay-reduction value of the onsets of the two terminal links should be similar. Similarly, given that the time to reinforcement was equated for the two chains, the relative rate of primary reinforcement was also equal for the two chains. Preference reversal when nondifferential stimuli are used would thus require delay reduction theory to be modified to include a role for the relative frequency of the conditioned reinforcers signaling the different terminal links.
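To make the prediction at stake concrete, the following sketch computes what the standard delay-reduction proportion gives for the schedules described above. It rests on assumptions not stated in the original: that the equation has the usual form (T - tL) / ((T - tL) + (T - tR)), that both initial-link schedules run concurrently so that terminal-link entries are proportional to the initial-link VI rates, and that the function names are purely illustrative.

```python
def expected_time_to_food(initial_vis, terminal_means):
    """Expected time T (s) from initial-link onset to primary reinforcement.

    initial_vis: mean initial-link VI values (s) for the two chains.
    terminal_means: mean terminal-link durations (s) for the two chains.
    Assumes independent, concurrently running initial-link schedules.
    """
    rates = [1.0 / v for v in initial_vis]
    total_rate = sum(rates)
    time_in_initial = 1.0 / total_rate              # mean wait for any entry
    entry_probs = [r / total_rate for r in rates]   # share of entries per chain
    time_in_terminal = sum(p * t for p, t in zip(entry_probs, terminal_means))
    return time_in_initial + time_in_terminal


def delay_reduction_proportion(initial_vis, terminal_means):
    """Predicted choice proportion for the first chain under delay reduction."""
    T = expected_time_to_food(initial_vis, terminal_means)
    reductions = [T - t for t in terminal_means]
    if min(reductions) <= 0:
        # The theory treats this case separately; it does not arise here.
        raise ValueError("a terminal link signals no delay reduction")
    return reductions[0] / sum(reductions)


# First chain listed is the one with the shorter initial link.
present = delay_reduction_proportion((30, 60), (60, 30))
fantino = delay_reduction_proportion((30, 90), (90, 30))
print(round(present, 3))  # 0.2
print(round(fantino, 3))  # 0.1
```

Under these assumptions T is 70 s for the present schedules, and the predicted proportion for the chain with the shorter initial link is .20 regardless of whether the terminal-link stimuli are the same or different, because the equation contains no term for stimulus identity or for the frequency of terminal-link entry.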

In addition to testing different accounts of concurrent-chains choice behavior, the present procedure has important implications for timing-based accounts of choice behavior derived from scalar expectancy theory (Gibbon, Church, Fairhurst, & Kacelnik, 1988). Like delay reduction theory, such theories provide no role for frequency of conditioned reinforcement, in part because timing theories eschew any role for the concept of conditioned value and other associative concepts (Gallistel & Gibbon, 2000). Thus, choice ostensibly should be solely a function of the temporal parameters of the procedure. If preference is reversed by changing differentially-signaled terminal links into nondifferentially signaled terminal links, while all temporal parameters are held constant, such timing-based accounts of choice behavior would be significantly challenged.

METHOD

Subjects

Four adult pigeons served as subjects. Each had previously participated in experimental research, but had no prior experience with concurrent-chains procedures. The pigeons were maintained at approximately 85% of their free-feeding weights with mixed grain that was available during, and when necessary, following experimental sessions. The pigeons were housed in individual cages under a 12-hr light/dark cycle, with water and grit freely available.

Apparatus

Four experimental chambers (approximately 360 mm wide, 320 mm long, and 350 mm high) were used. Three translucent response keys, 25 mm in diameter, were mounted on the front intelligence panel 260 mm above the floor and 72.5 mm apart. Each key required a force of approximately 0.15 N to operate and could be illuminated from the rear by standard IEE 28-V 12-stimulus projectors. A 28-V 1-W miniature lamp, located 87.5 mm above the center response key, provided general chamber illumination. Directly below the center key and 95 mm above the floor was an opening (57 mm high by 50 mm wide) that provided access to a solenoid-operated grain hopper. When activated, the hopper remained raised for 4 s, during which time it was illuminated from above with white light by a 28-V 1-W miniature lamp. A speaker mounted above the center of the ceiling provided continuous white noise throughout the experimental sessions. A computer and MED-PC® interface, located in an adjacent room, controlled experimental events.

Procedure

Pigeons were presented with concurrent chain schedules with variable-interval (VI) components. One chain (referred to as the short–long chain) was a VI 30-s VI 60-s schedule, and the other (referred to as the long–short chain) was a VI 60-s VI 30-s schedule. All VI values were selected from a Fleshler and Hoffman (1962) progression of 10 intervals. Interval values for each trial were randomly selected, with the limitation that all values were used before they all again became available for selection.
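The interval generation described above can be sketched as follows. This is an illustration only: the formula is the one commonly attributed to Fleshler and Hoffman (1962), the function and class names are hypothetical, and nothing here is taken from the MED-PC program actually used in the experiment.

```python
import math
import random


def fleshler_hoffman(mean_s, n=10):
    """Return n intervals (s) with arithmetic mean mean_s.

    Uses t_k = mean * [1 + ln n + (n-k) ln(n-k) - (n-k+1) ln(n-k+1)],
    k = 1..n, with 0 * ln 0 taken as 0.
    """
    def xlogx(x):
        return x * math.log(x) if x > 0 else 0.0

    return [
        mean_s * (1 + math.log(n) + xlogx(n - k) - xlogx(n - k + 1))
        for k in range(1, n + 1)
    ]


class IntervalSampler:
    """Draw intervals in random order without replacement.

    Every value is used once before the full set becomes available again.
    """

    def __init__(self, intervals, rng=None):
        self._intervals = list(intervals)
        self._rng = rng or random.Random()
        self._pool = []

    def next(self):
        if not self._pool:
            # Refill and reshuffle only after all values have been used.
            self._pool = self._intervals[:]
            self._rng.shuffle(self._pool)
        return self._pool.pop()


vi30 = fleshler_hoffman(30)
sampler = IntervalSampler(vi30, random.Random(0))
one_cycle = [sampler.next() for _ in range(len(vi30))]
```

The terms of the progression telescope, so the ten intervals average exactly to the nominal VI value, and each block of ten draws from the sampler contains every interval once.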

During the initial links, red and white keylights were simultaneously presented on the left and right response keys. The initial-link schedules operated independently. Stimulus location was randomly determined, with the constraint that each stimulus appeared twice on each side key within each block of four trials. Table 1 shows the stimulus assignments for each subject. Once a choice response was effective, the side keys were darkened, a terminal-link stimulus was presented on the center key, and a terminal-link VI schedule was initiated. When a response satisfied the terminal-link schedule, reinforcement consisted of 4-s access to mixed grain, and was followed immediately by the next trial. Sessions continued for 90 min or until 50 reinforcers were delivered, whichever occurred first.

Table 1
Stimulus Assignments for the Conditions with Differential Terminal-Link Stimuli.

Signal contingencies

Conditions differed depending on whether the two chains shared the same terminal-link stimulus (nondifferential conditions) or provided different stimuli (differential conditions). During the nondifferential stimulus conditions, the terminal-link stimulus for both chains consisted of the illumination of a triangle on the center response key. During differential stimulus conditions, the terminal-link components were signaled by either a circle or a plus sign on the center response key (as shown in Table 1). A total of four conditions were conducted, with the nondifferential contingency in effect during the first and third conditions, and differential contingency in effect during the second and fourth conditions. The first condition continued for 20 sessions, and the others continued for 24 sessions.

RESULTS

The data for each subject, for the last five sessions of each condition, are presented in detail in Table 2. Mean choice proportions were always calculated by first determining the choice proportion for each session for each pigeon, and then averaging across sessions. The overall mean response proportions in favor of the VI 30-s VI 60-s chain, averaged across the two presentations of each condition, were .66 and .46 for the nondifferential and differential conditions, respectively. A paired t-test performed on the response proportion data (with the data from the two presentations of each condition averaged together) indicated that preference was significantly higher with nondifferential terminal links than with differential terminal links, t(3) = 3.85, p < .05.
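The analysis steps just described (per-session proportions averaged across sessions, then a paired t-test across the four subjects) can be sketched as below. The numbers in the example are hypothetical placeholders, not the values from Table 2, and the function names are illustrative.

```python
import math
import statistics


def mean_choice_proportion(sessions):
    """Average of per-session choice proportions.

    sessions: list of (responses_to_target_chain, responses_to_other_chain).
    The proportion is computed within each session first, then averaged.
    """
    return statistics.mean(a / (a + b) for a, b in sessions)


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# HYPOTHETICAL per-subject proportions, for illustration only.
nondifferential = [0.70, 0.62, 0.58, 0.74]
differential = [0.41, 0.44, 0.39, 0.60]
t_value, df = paired_t(nondifferential, differential)
print(f"t({df}) = {t_value:.2f}")
```

With four subjects the test has 3 degrees of freedom. Averaging per-session proportions, rather than pooling responses across sessions, gives each session equal weight regardless of how many responses it contained.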

Table 2
Mean Results from the Last Five Sessions of Each Condition.

Figure 1 shows the changes in choice proportions across blocks of four sessions for each of the four conditions. Choice proportions for 3 pigeons (143, 773, and 772) were very orderly, as preference for the VI 30-s VI 60-s chain increased regularly during the nondifferential stimulus conditions and decreased regularly during the differential stimulus conditions. For the remaining subject (6336), choice was in favor of the VI 30-s VI 60-s chain in the initial nondifferential stimulus condition, and then changed minimally when the procedure was switched to the differential stimuli in the second condition. However, choice of the VI 30-s VI 60-s chain did increase upon replication of the nondifferential stimulus condition and decreased during replication of the differential stimulus condition. Preference for the VI 30-s VI 60-s chain was generally higher for subject 6336 regardless of the stimulus condition, presumably reflecting a color bias in the initial link. For pigeons 143, 773, and 772, preference was in favor of the VI 30-s VI 60-s chain in the nondifferential stimulus conditions and in favor of the VI 60-s VI 30-s chain in the differential stimulus conditions.

Fig 1
Response proportions for the VI 30 s VI 60 s chain schedule, averaged across blocks of four sessions, for each subject in the nondifferential and differential conditions. The terminal-link stimuli were the same for the two chains during ...

Figure 2 shows the response rates during the terminal links, in order to assess the degree of discrimination between the two terminal-link schedules, and the relationship between the degree of discrimination and preference in the initial links (by comparing Figures 1 and 2). In general, discrimination in the nondifferential stimulus condition did not occur or was very weak. Pigeon 773 did show some discrimination in both replications of the nondifferential condition, and pigeon 772 showed some discrimination in the initial exposure to the nondifferential condition but none during the replication. There appears to be no consistent relation between initial-link preference and terminal-link discrimination.

Fig 2
Responses per minute during the terminal link of each chain schedule, averaged across blocks of four sessions, for each subject.

Discrimination between the terminal links in the differential stimulus condition was highly variable across sessions. Pigeons 773 and 6336 showed clear discrimination while the discrimination of pigeons 143 and 772 changed across sessions in irregular ways. There was again no clear relationship between initial-link preference and terminal-link discrimination.

DISCUSSION

The present study replicated the results of Fantino (1969) using the standard concurrent chains procedure in which the different terminal links are differentially signaled. Preference was in favor of the shorter terminal link despite the fact that the total times to reinforcement, summed over the initial and terminal links, were constant for the two choice alternatives. However, the choice proportions reported here were substantially less extreme than those reported by Fantino. Several factors may have contributed to the smaller degree of preference seen here: the possible existence of color preferences, the fact that our terminal link schedule values (30 s vs. 60 s) were less extreme than Fantino's (30 s vs. 90 s), and our use of different colors (with location varied) as initial-link stimuli, rather than the usual procedure in which the choice alternatives are presented in different spatial positions.

Unlike Mazur (1999), in the present study preference for the shorter initial-link alternative during the nondifferential stimulus condition did not decrease over training, but in fact progressively increased. This progressive increase occurred even when there was some degree of discrimination between the terminal-link schedules (e.g., Pigeon 772 in the first condition; Pigeon 773 in both nondifferential stimulus conditions). This conflict in the pattern of preference possibly can be explained by procedural differences, as Mazur (1999) used FT schedules in the terminal links and the durations of the FT schedules were relatively short (2, 5, or 15 s). Mazur (1999) also reported a greater degree of discrimination between the terminal links during his nondifferential stimulus conditions. Thus, it seems plausible that preference for the shorter initial-link alternative would be weakened to the extent the terminal-link schedules could be discriminated during the nondifferential stimulus condition. That discrimination between the terminal-link schedules is possible even in the nondifferential signal conditions implies only that the nominally common stimulus has been differentiated by the conditional cue of the last choice response, so that in fact the nondifferential stimulus conditions have been converted, at least partially, into a differential stimulus condition. The critical finding for theoretical purposes is the pattern of preference when discrimination between the terminal-link schedules does not occur.

The most important feature of the results is the effect of switching between the nondifferential and differential signals during the two terminal links. For 3 of the 4 subjects, preference reversed in an orderly fashion as a function of the change in the stimulus conditions, and the 4th subject showed the same general pattern of change seemingly superimposed on a strong stimulus preference. The issue is how the strong effect of the stimulus contingencies is best interpreted. If the terminal-link stimuli are viewed as conditioned reinforcers, the interpretation is straightforward: In the differential signal condition, the two terminal links had different conditioned reinforcement values, and the terminal link with the greater value was preferred, despite the alternative terminal link being presented more frequently. Hence, the differential values of the terminal link dominated their differential frequencies of presentation, as was the case in Dunn et al. (1987). During the nondifferential signal conditions, using the same stimulus during both terminal links effectively eliminated discrimination between the two terminal-link schedules, so that there was no difference in terminal-link value. The relative frequency of conditioned reinforcement then dominated, and the initial link with the shorter VI schedule was preferred, consistent with the results of Williams and Dunn (1991). Taken together, these results suggest that any adequate account of choice in concurrent chain schedules must consider the trade-off between the relative value of the terminal-link stimuli and their relative frequency of presentation.

The relative frequency of terminal-link presentations has important implications for the various quantitative models of choice. Delay reduction theory is most obviously challenged, as frequency of conditioned reinforcement has not been included in its various formulations, in part because relative frequency of primary reinforcement is highly correlated with frequency of conditioned reinforcement in most situations. However, this is not the case for the present experiment because the frequency of primary reinforcement was equated for the two choice alternatives. Recently, Fantino and Romanowich (2007) concluded that models of concurrent-chains behavior do not need to include a term for the rate of conditioned reinforcement. However, in addition to conflicting with the present results, this view ignores the results from previous research in which the strength of conditioned reinforcement was equated through the use of the same terminal-link stimulus in the concurrent-chains procedure (Williams & Dunn, 1991; Mazur, 1999), as well as results from a procedure in which observing responses were used to assess the effects of rate of conditioned reinforcement on choice (Shahan, Podlesnik, & Jimenez-Gomez, 2006). Fantino and Romanowich did cite both Williams and Dunn (1991) and Mazur (1999), but focused on Mazur's finding of a relationship between the degree of terminal-link discrimination and the level of preference. We do not agree that this is a valid basis for ignoring the robust effects of conditioned reinforcement frequency seen here and in previous studies in which the same stimulus was paired with both terminal links.

It is important to note that including a role for conditioned reinforcement frequency would not meaningfully violate the fundamental tenet of delay-reduction theory—that conditioned reinforcement value is determined by the reduction in time to reinforcement rather than the absolute time to reinforcement. One such formulation was identified by Fantino, Preston, and Dunn (1993), in which the delay-reduction kernel would be multiplied by the rates of conditioned reinforcement.

Both the contextual choice model (CCM) of Grace (1994) and the hyperbolic value-added (HVA) model of Mazur (2001) do include roles for the frequency of conditioned reinforcement and presumably could explain the present findings. However, an adequate test of these models would require more parameter values than we included in the present study. The major uncertainty for both models is how they would encompass the preference for the alternative with the shorter initial link during the nondifferentially signaled conditions. For CCM this presumably would be via the sensitivity parameter for the ratio of the terminal links, the limiting case being when the parameter value is zero. Equation 3 would thus reduce to the ratio of entries into the terminal links. For Mazur's HVA model, the sensitivity parameters (at) in the right side of the equation would also go to zero, in which case the ratio of the values of the two terminal links would be multiplied by the relative rate of entry into the terminal links. But here it would be necessary to average the time values across both terminal links, rather than use their separate times to reinforcement.
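The limiting case described for CCM can be illustrated with a short sketch. It assumes the usual form of the model with rate terms (reciprocals of the VI values), and it fixes b = 1, a1 = 1, and Tt/Ti = 1 purely for illustration; none of these values is a fitted parameter, and the function names are hypothetical.

```python
def ccm_ratio(r1_left, r1_right, r2_left, r2_right,
              a1=1.0, a2=1.0, tt_over_ti=1.0, b=1.0):
    """Predicted response ratio B_L / B_R under the assumed CCM form.

    r1_*: rates of terminal-link entry; r2_*: terminal-link reinforcement rates.
    """
    initial_term = (r1_left / r1_right) ** a1
    terminal_term = ((r2_left / r2_right) ** a2) ** tt_over_ti
    return b * initial_term * terminal_term


def ratio_to_proportion(ratio):
    """Convert B_L / B_R into B_L / (B_L + B_R)."""
    return ratio / (1.0 + ratio)


# "Left" is the short-long chain (VI 30-s initial link, VI 60-s terminal link).
rates = dict(r1_left=1 / 30, r1_right=1 / 60, r2_left=1 / 60, r2_right=1 / 30)

with_terminal_sensitivity = ratio_to_proportion(ccm_ratio(**rates, a2=1.0))
without_terminal_sensitivity = ratio_to_proportion(ccm_ratio(**rates, a2=0.0))
print(round(with_terminal_sensitivity, 3))     # 0.5
print(round(without_terminal_sensitivity, 3))  # 0.667
```

With a2 = 0 the terminal-link term equals 1 and the prediction depends only on the 2:1 ratio of terminal-link entries. The resulting .667 is shown only to make the reduction concrete; it is not a model fit, and as noted above an adequate test would require more parameter values than the present study included.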

It is more difficult to conceive how the present pattern of results could be interpreted without invoking conditioned reinforcement as an explanatory concept. Timing models of choice (Gibbon et al., 1988; Gallistel & Gibbon, 2000) eschew the concept of conditioned value, although to be fair, they as yet have not addressed the concurrent-chains procedure, where the concept of conditioned reinforcement seems more essential than for concurrent schedules. It is not for us to anticipate how timing theory might approach choice in concurrent chains, but the simplest version of a timing theory seems to be excluded. Given that the total times to reinforcement were equal for the two choice alternatives, the total time to reinforcement for the two chains, measured from their onset, cannot be the major determinant of choice. Indeed, Williams and Dunn (1991) demonstrated that pigeons strongly preferred the choice alternative associated with the longer time to reinforcement. Thus, separate timing processes would seemingly be required for the initial and terminal links of the separate chains. It is unclear how the different intervals would be combined, and how a combination of timed intervals might explain the preference reversal that resulted from the change between the differential and nondifferential stimulus conditions.

The fundamental shortcoming of timing theories is that they provide no clear formulation of stimulus function, as their tacit assumption has been that stimuli serve only as discriminative stimuli for upcoming times to reinforcement. Something beyond discriminative functions is clearly needed to explain the present data and the other studies here cited. Various other empirical findings further illustrate the importance of stimulus functions. For example, when each terminal link in a concurrent-chains schedule includes two different schedules that are alternately presented, terminal links in which the different schedule values are differentially signaled are strongly preferred over terminal links in which the different schedule values are not differentially signaled (e.g., Hursh & Fantino, 1974). Until timing theories address the complexities of stimulus control, they cannot qualify as a comprehensive account of behavior.

Acknowledgments

We thank Bethany Grove, Dan Coons, Dan Bryden, and Anna Nydick for their assistance with data collection.

REFERENCES

  • Dunn R, Williams B, Royalty P. Devaluation of stimuli contingent on choice: Evidence for conditioned reinforcement. Journal of the Experimental Analysis of Behavior. 1987;48:117–131.
  • Fantino E. Choice and the rate of reinforcement. Journal of the Experimental Analysis of Behavior. 1969;12:723–730.
  • Fantino E, Davison M. Choice: Some quantitative relations. Journal of the Experimental Analysis of Behavior. 1983;40:1–13.
  • Fantino E, Preston R.A, Dunn R. Delay reduction: Current status. Journal of the Experimental Analysis of Behavior. 1993;60:159–169.
  • Fantino E, Romanowich P. The effect of conditioned reinforcement rate on choice: A review. Journal of the Experimental Analysis of Behavior. 2007;87:409–421.
  • Fleshler M, Hoffman H.S. A progression for generating variable-interval schedules. Journal of the Experimental Analysis of Behavior. 1962;5:529–530.
  • Gallistel C.R, Gibbon J. Time, rate, and conditioning. Psychological Review. 2000;107:289–344.
  • Gibbon J, Church R.M, Fairhurst S, Kacelnik A. Scalar expectancy theory and choice between delayed rewards. Psychological Review. 1988;95:102–114.
  • Grace R. A contextual model of concurrent-chains choice. Journal of the Experimental Analysis of Behavior. 1994;61:113–129.
  • Hursh S, Fantino E. An appraisal of preference for multiple versus mixed schedules. Journal of the Experimental Analysis of Behavior. 1974;22:31–38.
  • Mazur J.E. Preferences for and against stimuli paired with food. Journal of the Experimental Analysis of Behavior. 1999;72:21–32.
  • Mazur J.E. Hyperbolic value addition and general models of animal choice. Psychological Review. 2001;108:96–112.
  • Shahan T.A, Podlesnik C.A, Jimenez-Gomez C. Matching and conditioned reinforcement rate. Journal of the Experimental Analysis of Behavior. 2006;85:167–180.
  • Squires N, Fantino E. A model for choice in simple concurrent and concurrent-chains schedules. Journal of the Experimental Analysis of Behavior. 1971;15:27–38.
  • Williams B.A, Dunn R. Preference for conditioned reinforcement. Journal of the Experimental Analysis of Behavior. 1991;55:37–46.
