Dynamical models based on three steady-state equations for the law of effect were constructed under the assumption that behavior changes in proportion to the difference between current behavior and the equilibrium implied by current reinforcer rates. A comparison of dynamical models showed that a model based on Navakatikyan's (2007) two-component functions law-of-effect equations performed better than models based on Herrnstein's (1970) and Davison and Hunter's (1976) equations. Navakatikyan's model successfully described the behavioral dynamics in schedules with negative-slope feedback functions, concurrent variable-ratio schedules, Vaughan's (1981) melioration experiment, and experiments that arranged equal, and constant-ratio unequal, local reinforcer rates.
Once behavior stabilizes, we say it has reached a steady state, or equilibrium. When the environment changes, behavior undergoes dynamical changes in time as it moves toward a new steady state. Most of the existing quantitative laws of effect describe how behavior relates to reinforcers at the steady state. The purpose of this article is to assess whether dynamical models of data from experiments that have studied how behavior changes over time can help us to choose among competing steady-state equations of the law of effect (LOE).
Navakatikyan (2007) proposed a component-functions model of choice behavior as an alternative to Herrnstein's (1970) quantitative law of effect. The predictions of the model compared favorably with molar models based on equations offered by Davison and Hunter (1976), McDowell (1986), Stevens (1957) and Herrnstein (1970) in describing residence-time data in interdependent concurrent variable-interval (VI) VI schedules (Alsop & Elliffe, 1988; Elliffe & Alsop, 1996). Navakatikyan's model described the way that the generalized matching law sensitivity parameter a (Baum, 1974) changed as a function of overall reinforcer rate (Alsop & Elliffe; Elliffe & Alsop). One of the features of the model is that it allows for matching, undermatching, and overmatching in the same subject to occur as a by-product of orderly changes in the absolute values of residence time.
In the present article, we continue to explore Navakatikyan's (2007) LOE equations by applying them to some dynamical data that, in our view, have been insufficiently modeled, or which allow for alternative interpretations. We will compare the dynamical models based on Navakatikyan's LOE equations with models based on the two best-competing LOE equations that were identified by Navakatikyan: the equations proposed by Herrnstein (1970) and by Davison and Hunter (1976).
The dynamical data in question are from: (a) concurrent VI VI schedules with negative-slope feedback functions (Vaughan & Miller, 1984); (b) concurrent variable-ratio (VR) VR schedules (Herrnstein & Loveland, 1975; Mazur, 1992; Mazur & Ratti, 1991); (c) concurrent VI VI schedules with complex feedback functions used to test melioration theory (Herrnstein, 1982; Herrnstein & Vaughan, 1980; Vaughan, 1981); (d) experiments with equal (Herrnstein & Vaughan; Vaughan, 1982), and (e) constant-ratio unequal local reinforcer rates (Horner & Staddon, 1987; Staddon, 1988). Not all of these reports contain full dynamical information. Some data are represented only by the resulting steady-state behavior, sometimes averaged across subjects; nevertheless, these data will be used to choose between the three LOE-based models by investigating whether dynamical models based on these equations can in principle reach the stable states reported in these experiments. Most of the data are from concurrent schedules, as the treatment of choice constitutes the major difference between Navakatikyan's (2007), Herrnstein's (1970), and Davison and Hunter's (1976) LOE equations.
In this introduction, we will describe (a) our approach to dynamical modeling; (b) the principal differences between different LOE equations; (c) the datasets; and (d) the data analyses.
In this section, we describe an approach to combining an LOE equation and a feedback function into a dynamical model, and assess the resulting dynamical models from two perspectives: how well the model fits the data, and whether the model has the equilibrium properties observed in the reported data.
We were influenced by the following considerations in developing the present approach: If we know the state of behavior at a particular moment, and we know the final steady state that will eventually be reached, we can approximate a graph of the change in a behavioral measure over time as the behavior approaches the final steady state. The final steady state of behavior is the behavior predicted by a steady-state LOE equation for a given reinforcer rate. We can assume that behavior changes towards the steady state in some proportion to the difference between the current state of behavior and the steady state corresponding to the current reinforcer rate. The modeling done here is purely descriptive, rather than mechanistic. Below is a formal description of this process.
A general LOE equation (or a set of equations associated with a set of choice alternatives) for steady-state behavior (B) as a function of reinforcer rate (R) is:
There exist some underlying differential equations with respect to time dB/dt = F(R). Currently, we do not know their nature, or they are so complicated, or there are so many of them (e.g., Dragoi & Staddon, 1999), that an analytical approach is difficult. In this case, we can use linearization, that is, an assumption that a system changes in linear proportion to the deviation from a steady state. Thus, we can write the following difference equation to predict behavior after some short time step (Δt) and to build a behavioral trajectory step-by-step:
where B*i+1 and B*i are the next and current values of behavior, respectively; B is behavior at the steady state calculated from Equation 1 using R, the current obtained reinforcer rate; kt is a dynamic constant, which is a fraction of (B − B*i) that changes per unit of time and is measured generally in min^−1; and Δt is the time step in seconds. Equation 2 says that change in behavior is directly proportional to the difference between current and steady-state behavior. Equation 2 works like this: Steady state is attained when current behavior B* reaches the value of B, so (B − B*i) becomes zero, and no further change of B* occurs. When B* > B, (B − B*i) is negative, that is, B* decreases with time until it is equal to B. If B* < B, (B − B*i) is positive, that is, B* increases until it equals B. Though linearization is most accurate near the stable state, it also allows insight into the behavior of a system that is far from equilibrium.
It is important to bear in mind that R also changes with a change of B*, as they are related by some feedback function:
where g is a feedback function of B*.
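Collected in one place, the three building blocks described above can be written as follows. This is a sketch reconstructed from the verbal definitions rather than a copy of the typeset equations: the function name f is an assumption (the text names only g), and Δt is assumed to be expressed in the same time unit as 1/kt.

```latex
\begin{align}
B &= f(R) && \text{(steady-state LOE equation, Equation 1)} \\
B^{*}_{i+1} &= B^{*}_{i} + k_t \left(B - B^{*}_{i}\right)\Delta t && \text{(dynamical update, Equation 2)} \\
R &= g\left(B^{*}\right) && \text{(feedback function, Equation 3)}
\end{align}
```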
An LOE equation (Equation 1), a feedback function (Equation 3), and a dynamical Equation 2 form the general structure of the dynamical model considered here. The block diagram for modeling, and an example for a single-key fixed ratio (FR) schedule using Herrnstein's (1970) LOE equation, are given in Figure 1. We start with some initial value of behavior at time zero. Then, according to Equation 3, we calculate reinforcer rate R, related to current behavior. In this case Equation 3 is R = B/N, where N is the number of fixed-ratio responses required. Then, we calculate the value of a steady state behavior B by an LOE equation (Equation 1), and the change of behavior for the current step, that is, ΔB = kt(B − B*i)Δt. Finally, from Equation 2, the next value of behavior, or B*i+1, is found, and the process is repeated until a stable state is reached, or a predefined number of steps have been completed. When there is a two-alternative schedule, the same scheme is applied to each alternative and the model is built around values of B1* and B2*, B1 and B2, R1 and R2 for two alternatives. The models constructed in this way will be dynamic models based on Herrnstein's (1970), Davison and Hunter's (1976), and Navakatikyan's (2007) LOE equations (see next section).
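The loop just described for the single-key FR example can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet procedure used for the analyses: the hyperbolic form B = Bmax·R/(R + k) is assumed for Herrnstein's single-key equation, the function names are hypothetical, all parameter values are arbitrary, and the time step is assumed to be in minutes so that it matches the units of kt.

```python
def herrnstein_single(r, b_max, k):
    """Assumed single-key hyperbola: steady-state response rate for reinforcer rate r."""
    return b_max * r / (r + k)


def fr_feedback(b_current, n):
    """Fixed-ratio feedback function R = B/N: one reinforcer per n responses."""
    return b_current / n


def simulate_fr(b_start, b_max, k, n, k_t, dt, steps):
    """Iterate the update rule step by step and return the trajectory of B*."""
    trajectory = [b_start]
    b_current = b_start
    for _ in range(steps):
        r = fr_feedback(b_current, n)               # feedback function (Equation 3)
        b_steady = herrnstein_single(r, b_max, k)   # steady state for this r (Equation 1)
        b_current = b_current + k_t * (b_steady - b_current) * dt  # update (Equation 2)
        trajectory.append(b_current)
    return trajectory


# Illustrative values only: FR 20, Bmax = 100 responses/min, k = 1 reinforcer/min.
trajectory = simulate_fr(b_start=10.0, b_max=100.0, k=1.0, n=20,
                         k_t=0.1, dt=1.0, steps=300)
print(f"response rate after 300 min: {trajectory[-1]:.2f} responses per min")
```

With these illustrative values, setting B = B* and solving the LOE equation and the feedback function simultaneously gives a nonzero equilibrium at B = Bmax − kN = 80 responses per min, which is where the simulated trajectory settles.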
Special points in the analyses are the equilibrium points of the models. Though we use LOE equations, or equilibrium solutions, these constitute only general solutions without feedback functions. Here, we will be looking for equilibria in the dynamical models that include feedback functions.
Most important for our analysis is the presence of stable and unstable equilibria in the model (see, for example, Staddon, 1988, pp. 304–305). A stable equilibrium attracts behavior; deviations of behavior from a stable equilibrium, at least within some range, are temporary, and behavior returns to equilibrium. An unstable equilibrium repels behavior; a small change in behavior drives the behavior further away. Water on different surfaces provides a good illustration. Water in a pool is in stable equilibrium, whereas water on top of a mountain is in unstable equilibrium. Water on the mountain slope is not in an equilibrium state; it runs down the slope. Trajectories of a behavioral measure in time converge to a stable equilibrium and diverge from an unstable equilibrium.
In the case of two-alternative schedules, we have two response measures, one for each alternative (B1 and B2). It is usual to analyze such a system with a phase portrait. A phase is a state of the system and, in the present case, a state is a pair of values of B1 and B2, that is, a point on a plane with axes B1 and B2. Such a plane is called a phase plane. A phase portrait is a geometric representation of the trajectories of a dynamical system in the phase plane.
Trajectories are lines along which a system moves through time and can be quite complex if all three dimensions of the behavior (B1, B2, and time) are represented (upper left panel of Figure 2). If we omit the time dimension, we obtain a phase portrait, where the trajectories will show only B1 and B2, and the direction in which the system moves over time can be shown by arrows (upper right panel of Figure 2). The usefulness of the phase portrait is that all trajectories are unique and cannot intersect. Thus, we need only to plot a few major trajectories to characterize a dynamical system qualitatively, as neighboring trajectories converge or diverge from the same equilibrium in a geometrically similar way. Trajectories start with some initial values of the system, but at whatever point a system starts, it cannot leave that trajectory. Again, as seen in the upper panels of Figure 2, trajectories can converge to a stable equilibrium, shown by the filled circle, and trajectories can diverge from an unstable equilibrium, marked by the unfilled circle. We cannot assess how much time is required to move along trajectories in phase portraits, but we can understand the order of the consecutive states. Once equilibrium is reached, a system stays there as long as the conditions remain constant. The lower left panel of Figure 2 shows time graphs of the response rates from a trajectory marked by a bold line in the upper panels, which starts at B1 = 30, B2 = 170. After about 300 minutes an equilibrium state is reached, and the further dynamics can be seen in time graphs only. The lower right panel of Figure 2 shows a preference trajectory related to the response rates shown in the lower left panel of Figure 2. We can see that exclusive preference was reached both from the phase portrait and from the preference graph.
Figure 3 shows examples of arbitrary equilibria in the phase plane. A stable equilibrium is usually identified by trajectories converging to it (left panel of Figure 3), while an unstable equilibrium has trajectories diverging from it in different directions (left panel of Figure 3). A particular type of unstable equilibrium, called a saddle, can mimic a stable one having some trajectories that initially approach it, but eventually move away (right panel of Figure 3). Using the same example of water on different surfaces, water on a saddle-shaped mountain flows from the top to the ridge, as if attracted by a stable equilibrium, but it cannot remain there and flows further down in another direction.
We will investigate the location and type of equilibria using a number of different techniques: (a) analytically, by solving Equations 1 and 3 simultaneously under the condition that at equilibrium B = B*; (b) graphically, by finding the intersections of feedback functions and various LOE equations, which is possible for a single-key procedure; and more frequently (c) by analyzing time graphs and phase portraits that do not have a time coordinate.
We are not concerned with finding analytic solutions to differential equations. We will assess the LOE equations in terms of their viability as descriptors of steady-state responding using goodness of fit and Bayesian information criteria (BIC) for their dynamic models; and also by analyzing the types of equilibria in the dynamic model and its corresponding data.
Herrnstein's (1970) equation is related to the body of research on matching reported over the last 40 years. It is called the strict matching law, and is the steady-state solution for the process known as melioration (Herrnstein, 1982; Herrnstein & Vaughan, 1980; Vaughan, 1981). Davison and Hunter's (1976) equation reduces to the generalized matching law (Baum, 1979; see also Lander & Irwin, 1968; Staddon, 1968).
Herrnstein's (1970) and Davison and Hunter's (1976) equations are based on the matching (or generalized matching) principle, in which the total amount of behavior (Herrnstein, 1974) is distributed according to the proportion of reinforcers obtained by emitting a behavior. Herrnstein's equation states that the absolute rate of responding on an alternative in a choice is proportional to its associated relative reinforcer rate (Herrnstein, 1970):
where B is responses per min, R is the absolute rate of reinforcers per min; Bmax is a constant (Herrnstein's k), representing “the total amount of behavior generated by all the reinforcements operating on the subject at a given time” (Herrnstein, 1974, p. 161), or the maximum overall response rate; k is a constant (Herrnstein's R0), originally representing the unknown aggregated reinforcers for responses unaccounted for in the summation in the denominator; and i is an index that covers all alternative responses measured in the situation. The constant k influences how fast the response rate rises as reinforcer rate increases: the smaller k, the faster the response rate changes. Here, we interpret the constant k as a general free parameter consistent with the interpretations of Killeen (1982, 1994), Staddon (1977) and Navakatikyan (2007), rather than as an aggregated reinforcer rate for nonmeasured responses.
For single-key schedules of reinforcement, Equation 4 is a hyperbola:
For concurrent two-alternative schedules, Herrnstein's (1970) set of LOE equations for each choice alternative are shown here (Equations 6 and 7) with addition of reinforcer bias c (as used in generalized matching) scaling R1:
where the subscripts 1 and 2 denote the first and the second alternatives.
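To make the two-alternative case concrete, the sketch below applies the same step-by-step update to both alternatives of a concurrent ratio schedule and records the path in the (B1, B2) phase plane. Several things here are assumptions rather than the article's own specification: the concurrent equations are taken to be B1 = Bmax·cR1/(cR1 + R2 + k) and B2 = Bmax·R2/(cR1 + R2 + k), the ratio feedback function is taken to be R = B/N on each alternative, the function names are hypothetical, and all parameter values are arbitrary.

```python
def herrnstein_concurrent(r1, r2, b_max, k, c=1.0):
    """Assumed two-alternative form: matching with bias c scaling R1."""
    denom = c * r1 + r2 + k
    return b_max * c * r1 / denom, b_max * r2 / denom


def simulate_conc_ratio(b1, b2, n1, n2, b_max, k, k_t, dt, steps, c=1.0):
    """Update both alternatives each time step; return the list of (B1*, B2*) points."""
    path = [(b1, b2)]
    for _ in range(steps):
        r1, r2 = b1 / n1, b2 / n2                      # ratio feedback on each alternative
        s1, s2 = herrnstein_concurrent(r1, r2, b_max, k, c)
        b1 = b1 + k_t * (s1 - b1) * dt                 # move B1* toward its steady state
        b2 = b2 + k_t * (s2 - b2) * dt                 # move B2* toward its steady state
        path.append((b1, b2))
    return path


# Illustrative values only: ratio 20 versus ratio 40, so alternative 1 is richer.
path = simulate_conc_ratio(b1=30.0, b2=60.0, n1=20, n2=40,
                           b_max=100.0, k=1.0, k_t=0.1, dt=1.0, steps=2000)
final_b1, final_b2 = path[-1]
print(f"final B1 = {final_b1:.3f}, final B2 = {final_b2:.3g}")
print(f"final preference for alternative 1 = {final_b1 / (final_b1 + final_b2):.6f}")
```

Plotting the points in `path` against each other gives one trajectory of a phase portrait. Under these assumed equations and values the trajectory ends on the B1 axis, that is, at exclusive preference for the alternative with the smaller ratio requirement.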
Davison and Hunter (1976) suggested a range of steady-state LOE equations related to generalized matching equations. Navakatikyan (2007) found the best-performing of Davison and Hunter's equations was:
where a is a sensitivity parameter. For single-key schedules of reinforcement, Equation 8 becomes:
And, for two-alternative concurrent schedule performance, with addition of bias:
One of the theoretical justifications for Herrnstein's (1970) LOE equation was an analogy drawn with Michaelis-Menten kinetics of substrate-enzyme, or drug-cell receptor reaction (Heyman, 1988; Killeen, 1982, 1994; Staddon, 1977). This type of equation considers that the number of randomly emitted responses is proportional to the time available to emit responses, and to the reinforcer rate. The time available is, in turn, limited by emitted responses, as each response takes some time. A simple hyperbolic function arises from these considerations, which is described by Equation 5 above (Killeen, 1994; Staddon, 1977). The function has been called a canonical equation (Equation 4, Killeen, 1994), and the impact of reinforcers is called either response strength (Equation 22, Staddon, 1977), or activation (A) level (A = bR, where b is a coefficient, Killeen, 1994).
However, if there are two or more reinforcer sources available, each producing its own activation level (that is, A1 = bR1 and A2 = bR2), they both compete for the available time resulting in Equation 4 above. The parallel to such an interaction between reinforcers has been called competitive inhibition in enzyme kinetics (e.g., Ainsworth, 1977).
Activation level does not necessarily have to be a linear function of reinforcer rate. It can, for example, be a simple hyperbola (A = bR/(R + d), where d is a coefficient) as in the early version of incentive theory (Equation 6, Killeen, 1982). Turning to the Davison and Hunter (1976) LOE equation, we can consider that activation level is a power function similar to Stevens' (1957) law: that is, A = bR^a, where a is a sensitivity parameter. In this case, competing reinforcer rates will result in Davison and Hunter's (1976) LOE equation (Equations 8 to 11), and the inhibition of the effect of one reinforcer by another remains a competitive inhibition.
Both Herrnstein's (1970) and Davison and Hunter's (1976) LOE equations can be considered extensions of a simple hyperbolic function with alternative reinforcers affecting the coefficient k of the hyperbola, while the coefficient Bmax remains constant. Thus we can rewrite the general Equations 4 and 8 using a new coefficient k* (“apparent k”), which is the sum of the original coefficient k and the other reinforcer rates. Herrnstein's (1970) LOE equation for the ith response is:
where k* = k + ∑Rj, and j is the index of all reinforcer rates other than the ith. Davison and Hunter's (1976) LOE equation for the ith response is:
where k* = k + ∑Rj^a.
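Written out, the two "apparent k" forms described in this passage would look as follows. This is a reconstruction from the verbal definitions (a simple hyperbola in which only k is modified by the other reinforcer rates), so the layout is an assumption even though every symbol comes from the text.

```latex
\begin{align}
B_i &= \frac{B_{\max} R_i}{R_i + k^{*}}, \qquad k^{*} = k + \sum_{j \neq i} R_j
  && \text{(Herrnstein, 1970)} \\
B_i &= \frac{B_{\max} R_i^{a}}{R_i^{a} + k^{*}}, \qquad k^{*} = k + \sum_{j \neq i} R_j^{a}
  && \text{(Davison \& Hunter, 1976)}
\end{align}
```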
The left panel of Figure 4 shows an example of a model based on competitive inhibition. Three curves are shown, each for a different level of reinforcer rate on the second alternative. As the alternative reinforcer rate R2 increases, we observe an increase in the apparent value of k, that is, B1 rises more slowly with increasing R1, while the value of the maximal response rate remains constant.
Unlike Herrnstein's (1970) and Davison and Hunter's (1976) LOE equations, Navakatikyan's (2007) model is based on a noncompetitive inhibition, using the analogy from enzyme kinetics (e.g., Ainsworth, 1977). Navakatikyan hypothesized that, for a single-response schedule, reinforcers affect responding in a way similar to Herrnstein's (1970) LOE equation (Equation 5). However, when other reinforcers are present, they decrease the maximally achievable response rate. Thus, Navakatikyan's LOE equation can be regarded as a modification of Herrnstein's (1970) hyperbola with the constant Bmax being a decreasing hyperbolic function of the other reinforcers:
where B*max is the maximal apparent response rate and kred is a constant (k-reducing).
The right panel of Figure 4 shows an example of a model based on noncompetitive inhibition. As in the left panel, the model is represented by three curves, each for a different level of reinforcer rate on the second alternative. However, for this noncompetitive inhibition model, as R2 increases, the apparent value of k remains constant, so that the ceiling of response rate is attained equally fast by all curves, but the value of the maximal response rate is decreased.
The LOE equations for the present article were selected from the range investigated by Navakatikyan (2007). The composite equation selected is the product of two hyperbolas, one increasing and the other decreasing, obtained by combining Equations 14 and 15:
and, for the case of two alternatives, with addition of response bias (c), as:
For a single alternative, the model reduces to Herrnstein's (1970) LOE equation (Equation 5) because the reducing (decreasing-hyperbola) factor in Equation 17 becomes 1 when R2 is 0.
Navakatikyan (2007) derived Equations 17 and 18 from another perspective, without using the analogy to enzyme-substrate kinetics, and without a distinction between competitive and noncompetitive inhibition. The two functions whose product comprises the model in Equation 16 were then termed component functions, and Navakatikyan hypothesized that there could be a range of such functions affecting a response unit. He suggested distinguishing reinforcers that are arguments in these functions, and referred to them as enhancing and reducing reinforcers, meaning that they respectively enhance or reduce a particular behavior. Accordingly, functions of the first category of reinforcers were termed enhancing-component functions, and functions of the second category of reinforcers were termed reducing-component functions. Behavior is a product of the component functions plus a constant:
where B is the resulting behavior, for example, response rate or residence time, Fenh and Fred are the enhancing- and reducing-component functions of enhancing and reducing reinforcers, respectively, and Ba is a baseline constant. For the current approach, Equation 19 was simplified by setting Ba to 0, thus leading us to hyperbolic-hyperbolic Equations 17 and 18 of general form:
and Ri and Rj are reinforcer rates on the current and other alternatives.
Figure 5 shows an example of the component functions and an arbitrary model resulting from their multiplication. We used enhancing and reducing descriptors for the component functions, rather than just designating them as functions for current and other reinforcers for the following reason: Commonly, the reinforcers on the current alternative can be identified as enhancing reinforcers, while reinforcers on other alternatives can be identified as reducing reinforcers, but this may not always be the case. For example, in concurrent VI VI schedules, some reinforcers that are consumed on a current alternative may originate while the subject is working on the other alternative (see, for example, MacDonall, 2005), and it may be misleading to regard only reinforcers on the current alternative as those that increase responding. Thus, we prefer to regard enhancing and reducing functions as those associated with activating and inhibiting response processes.
As discussed by Navakatikyan (2007), other types of functions also performed well as the enhancing-component function, in particular a bounded exponential and a power function with three free parameters, and we analyzed their performance during preparation of this article. However, we found that an adequate description can be achieved by the simple hyperbola (Equation 21). We did establish that the exponential and power functions Fenh = Bmax(1 − e^(−bR)) and Fenh = BR^b (where B is a constant) performed as well as Equation 21, and we will return to this finding in the Discussion.
As a consequence of their different structures, a major difference between Navakatikyan's (2007) LOE equation and Herrnstein's (1970) and Davison and Hunter's (1976) equations lies in the predictions of preference. Herrnstein's and Davison and Hunter's equations predict constant preference between two alternatives providing constant reinforcer-rate ratios irrespective of the overall reinforcer rate. Navakatikyan's equation, in contrast, allows preference to change with overall reinforcer rate even when the reinforcer ratio is held constant. This property may be crucial for understanding the changes in choice observed when different overall reinforcer rates are arranged with the same reinforcer ratio (e.g., Alsop & Elliffe, 1988; Elliffe & Alsop, 1996; Logue & Chavarro, 1987; Mazur, 1992).
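The difference in predicted preference can be checked numerically. The sketch below is an illustration under stated assumptions, not the fitted models: Herrnstein's concurrent equation is taken without bias, the component-functions equation is taken to be the product Bmax·Ri/(Ri + k) × kred/(kred + Rj) without bias, the function names are hypothetical, and the parameter values are arbitrary. Both reinforcer-rate pairs keep a 2:1 ratio while the overall rate differs tenfold.

```python
def herrnstein_pair(r1, r2, b_max=100.0, k=10.0):
    """Assumed competitive form: both reinforcer rates share one denominator."""
    denom = r1 + r2 + k
    return b_max * r1 / denom, b_max * r2 / denom


def component_pair(r1, r2, b_max=100.0, k=10.0, k_red=1.0):
    """Assumed product of an increasing hyperbola (own reinforcers) and a
    decreasing hyperbola (reinforcers on the other alternative)."""
    b1 = b_max * r1 / (r1 + k) * k_red / (k_red + r2)
    b2 = b_max * r2 / (r2 + k) * k_red / (k_red + r1)
    return b1, b2


def response_ratio(pair):
    """Preference expressed as the response ratio B1/B2."""
    b1, b2 = pair
    return b1 / b2


# Same 2:1 reinforcer ratio at a lean and a rich overall rate (reinforcers per min).
for r1, r2 in [(0.2, 0.1), (2.0, 1.0)]:
    h = response_ratio(herrnstein_pair(r1, r2))
    c = response_ratio(component_pair(r1, r2))
    print(f"R1={r1}, R2={r2}: Herrnstein B1/B2 = {h:.3f}, component functions B1/B2 = {c:.3f}")
```

Under the assumed competitive form the response ratio equals the reinforcer ratio at both overall rates, whereas under the assumed product form it shifts with overall rate whenever k and kred differ.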
In summary, our goal here is to explore further the feasibility of the component-functions model (Navakatikyan, 2007) for the law of effect in comparison to two others (Davison & Hunter, 1976; and Herrnstein, 1970) by using them to analyze data from experiments on behavioral dynamics.
To demonstrate the feasibility of our modeling approach, we start with performance in single-key schedules using the results of experiments with negative-slope feedback functions (Vaughan & Miller, 1984; see also Jacobs & Hackenberg, 2000). Then, we will attempt to model the results reported by Herrnstein and Loveland (1975), who used independent concurrent VR VR schedules. The common finding with the latter schedules is exclusive, or almost exclusive, preference for the richer alternative (e.g., Davison & McCarthy, 1988; Myerson & Miezin, 1980; Vaughan, 1982, 1985; but see also Nevin, 1982). Nevertheless, the results of Herrnstein and Loveland's experiments are more informative, as only in 4 out of 12 conditions was preference greater than 90% for the richer alternative. The effects were described by Herrnstein and Loveland as follows: “When the ratios [of VR VR schedules] summed to 60 (or 61 or 62), exclusive preference was attained with a smaller relative difference between the two ratios than when the sum was 120. A relative difference of 0.15 or thereabouts sufficed, on the average, for the smaller ratios, while a relative difference about twice as great barely sufficed for the larger ratios” (Herrnstein & Loveland, 1975, p. 109, parenthetical material added).
We then model dynamical experiments that investigated transitions in concurrent VR VR schedules (Mazur, 1992; Mazur & Ratti, 1991). These experiments investigated transitional performance when the reinforcer probability ratio for two alternatives remained constant, but overall reinforcer rate was varied (Mazur, 1992), and when the difference between reinforcer probabilities was constant, but overall probability was varied (Mazur & Ratti, 1991). In these two experiments, the change in relative response allocation over time following transitions differed, and depended on both relative and overall reinforcer rate. The results did not show a tendency for exclusive preference within the time frame of the experiments. As noted by Dragoi and Staddon (1999, p. 36), these results are not compatible with most models of acquisition such as the linear-operator model (Bush & Mosteller, 1955), the kinetic model (Myerson & Miezin, 1980), melioration theory (Herrnstein & Vaughan, 1980), and ratio invariance theory (Staddon, 1988). Other models, such as the cumulative-effect model (Davis, Staddon, Machado, & Palmer, 1993), and Daly and Daly's (1982) model also failed to generate correct predictions when applied to these data (Dragoi & Staddon). Dragoi and Staddon's acquisition-extinction theory predicted the transitions quite well, but this account predicts that preference will ultimately become exclusive.
Finally, we will model experiments that examined melioration as a dynamical principle (Herrnstein, 1982; Herrnstein & Vaughan, 1980; Vaughan, 1981) as well as some related experiments, such as experiments using concurrent VI VI schedules with equal local reinforcer rates (Herrnstein & Vaughan, Experiment 3; Vaughan, 1982) and using constant-ratio unequal local reinforcer rates (Horner & Staddon, 1987, Experiment 2; Staddon, 1988). We are not aware of any attempt to fit a model to the melioration experiment data, but a descriptive explanation was given by Silberberg and Ziriax (1985). This experiment remains a point of interest (e.g., Corrado, Sugrue, Seung, & Newsome, 2005).
To study dynamics, we used an approach that predicts average behavior. We do not consider here procedures in which an average behavioral measure is not a proper representation of behavior, for example where a feedback function window is short compared to average residence time (Davison & Alsop, 1991; Silberberg & Ziriax, 1985), or where there are complex local contingencies (Williams, 1991). Nor do we attempt to model behavior at a response-by-response level, or with a full system of differential equations (e.g., Corrado et al., 2005; Davison & Baum, 2000, 2003; Dragoi & Staddon, 1999; Gallistel et al., 2007; Lau & Glimcher, 2005).
Unlike momentary maximization, molecular maximizing and melioration (e.g., Davison 1990; Herrnstein & Vaughan, 1980; Shimp, 1966, 1992; Silberberg, Hamilton, Ziriax, & Casey, 1978; Silberberg & Ziriax, 1982, 1985; Vaughan, 1981, 1985), we do not consider our approach as an independent local or molecular mechanism to derive an LOE equation. There is no contradiction between our dynamical model and steady-state LOE equations (for related discussions see Baum, 2002; Shimp, 2004; and Williams, 1991). To the contrary, the present approach assumes that a molar equation, that is, a law of effect itself, is an equation for steady states of dynamical models, and we use this to derive the dynamics. Thus, if a dynamical model is viable, it will be compatible with an LOE equation by default. Our primary objective here is to assess the LOE equations, thus we concentrated on dynamical experiments that we considered important to be modeled, even if some of them reported insufficient data for a full-scale analysis.
We used the QuattroPro 8 spreadsheet optimizer to fit data and to calculate graphs of behavior change over time for the visual analysis of equilibria. Time steps from 1 to 4 min were usually used. If absolute measures of behavior were not available, preference measures were used to find model parameters. We often used variance accounted for (VAC) by the model to assess the quality of fit. However, as the models have a different number of adjustable parameters, it is not always sufficient to calculate VAC, as an additional free parameter will naturally increase VAC. Navakatikyan (2007) used the Akaike second-order information criterion (AICc) and the Bayesian information criterion (BIC) to take account of the different number of parameters in the model, as the common Akaike criterion (AIC) is not recommended for small samples (Burnham & Anderson, 1998). Here, we employed only BIC, because the number of data for some of the sets was too small to allow the use of AICc. We will use the BIC formula derived for series of data, that is, for models fitted individually for a series of subjects in an experimental group (McArdle, personal communication, 2005; Navakatikyan, 2007):
where N is the number of data sets or subjects; i is the index of a data set (1, 2, . . . , N); RSS is the residual sum of squares for the fitted model; and K is the number of adjustable parameters in a model plus 1. The smaller the value of the BIC measure, the better the model describes the data. Absolute values of BIC by themselves have no meaning for a given data set, as they depend on the dimension of RSS. For example, response rate measured in responses per second will produce an RSS value 60^2 (3,600) times smaller than response rate measured in responses per minute.
Conventionally, models are compared using the differences between values of BIC, which are independent of the dimension of RSS. A cutoff value of 6 for a difference in information criteria is recommended by Burnham and Anderson (1998) for a model to be considered a better data description. A cutoff difference of 10 or more means that there is virtually no support for a model with the larger BIC value being a better description (Burnham & Anderson). It is common to present the results as the differences (ΔBIC) between a model's BIC and the BIC of the best model. In the results we will designate cases with ΔBIC > 6 and ΔBIC > 10 as the presence of evidence and strong evidence for the best model.
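The claim that BIC differences do not depend on the units of RSS can be illustrated with a short calculation. The sketch below uses the common least-squares form BIC = n·ln(RSS/n) + K·ln(n) for a single data set, which is an assumption: it is not the series formula used in this article, and the RSS values, sample size, and parameter counts are invented for illustration.

```python
import math


def bic_least_squares(rss, n, k_params):
    """Common least-squares BIC for one data set: n*ln(RSS/n) + K*ln(n)."""
    return n * math.log(rss / n) + k_params * math.log(n)


def delta_bic(rss_a, k_a, rss_b, k_b, n):
    """BIC of model A minus BIC of model B, both fitted to the same n data points."""
    return bic_least_squares(rss_a, n, k_a) - bic_least_squares(rss_b, n, k_b)


# Hypothetical fits to 9 data points, with RSS in (responses per min)^2.
n_points = 9
rss_a, k_a = 150.0, 3
rss_b, k_b = 120.0, 4

per_minute = delta_bic(rss_a, k_a, rss_b, k_b, n_points)

# Re-express response rates per second: every RSS shrinks by a factor of 60**2.
scale = 60 ** 2
per_second = delta_bic(rss_a / scale, k_a, rss_b / scale, k_b, n_points)

print(f"difference with per-minute units: {per_minute:.6f}")
print(f"difference with per-second units: {per_second:.6f}")
```

Rescaling shifts each model's BIC by the same constant, n·ln(1/3600), so the difference between the two models is unchanged while the absolute values are not.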
The goal of this section was mainly to demonstrate the feasibility of our dynamical modeling approach. In Experiment 1 of Vaughan and Miller (1984), feedback functions were arranged in which an increase in response rate produced a linear decrease in reinforcer rate for a range of response rates. This single-key procedure had two components. The first component was a linear VI schedule in which response rate does not affect reinforcer rate over a wide range. The schedule was arranged by running VI schedules and storing reinforcers, rather than stopping timing when reinforcers are arranged. The feedback function for the linear VI schedule is R = min(B, 1/t), where t is the mean interval (Figure 6, upper left panel) and min is minimum. The function increases linearly from zero with reinforcer rate equaling response rate. Once the response rate reaches the level of the arranged VI reinforcer rate, the function becomes a horizontal line with zero slope. The second component was a negative-slope feedback function produced by subtracting reinforcers from the store using a parallel fixed ratio (FR) schedule. This results in the composite feedback function R = min(B, 1/t) − B/N, where N is the FR schedule ratio requirement. Figure 6 (upper right panel) shows an example of this feedback function, for which the maximum reinforcer rate can be achieved by responding less than 5 times per min.
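The two feedback functions translate directly into code. In the sketch below the function names are hypothetical, the mean interval t is assumed to be expressed in minutes so that 1/t is in reinforcers per min, and the VI 30-s and FR 20 values are chosen only as an illustration from the schedule values listed for the experiment.

```python
def linear_vi_feedback(b, mean_interval_min):
    """Linear VI feedback: R = min(B, 1/t), with t in minutes and B in responses per min."""
    return min(b, 1.0 / mean_interval_min)


def composite_feedback(b, mean_interval_min, fr_ratio):
    """Negative-slope feedback: R = min(B, 1/t) - B/N."""
    return linear_vi_feedback(b, mean_interval_min) - b / fr_ratio


# Illustration: linear VI 30 s (t = 0.5 min) combined with FR 20.
t_min, fr_n = 0.5, 20
rates = [i * 0.5 for i in range(0, 81)]  # 0 to 40 responses per min
best_rate = max(rates, key=lambda b: composite_feedback(b, t_min, fr_n))

print(f"response rate giving the highest reinforcer rate: {best_rate} per min")
print(f"reinforcer rate there: {composite_feedback(best_rate, t_min, fr_n):.2f} per min")
print(f"reinforcer rate at 40 responses per min: {composite_feedback(40.0, t_min, fr_n):.2f} per min")
```

For this combination the reinforcer rate peaks at the response rate equal to 1/t (2 responses per min) and then falls linearly, reaching zero at N/t (40 responses per min).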
In Vaughan and Miller's (1984) Experiment 1, 9 pigeons were given nine different schedules; 3 different pigeons were trained on a set of three different schedules. The schedules were combinations of three linear VI schedules, VI 30 s, 45 s and 90 s, and three FR schedules, FR 20, 40 and 60. Conditions took between 23 and 71 sessions for performance to stabilize. Equilibrium response rates produced reinforcer rates that were substantially lower than maximal, and the data were inconsistent with most simple theories of optimal performance. Vaughan and Miller suggested that the results were consistent with the assumption that reinforcement strengthens the tendency to respond, but no mechanism was offered.
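As a minimal sketch of the composite feedback function R = min(B, 1/t) − B/N, the following code evaluates it on a grid of response rates and locates the response rate that maximizes reinforcer rate. The pairing of VI 30 s with FR 20 is picked from the schedule values listed above purely for illustration; whether that pairing was one of the nine schedules actually run is not assumed, and the function name is hypothetical.

```python
def feedback(b, t_min, n_fr):
    """Composite feedback function R = min(B, 1/t) - B/N (rates per min)."""
    return min(b, 1.0 / t_min) - b / n_fr

# Illustrative pairing: VI 30 s (t = 0.5 min) with FR 20.
t_min, n_fr = 0.5, 20
rates = [i / 10 for i in range(0, 401)]   # 0.0 to 40.0 responses per min
best_b = max(rates, key=lambda b: feedback(b, t_min, n_fr))
print(best_b, feedback(best_b, t_min, n_fr))
```

With these values the reinforcer rate peaks where the response rate just reaches the arranged VI rate, and every additional response beyond that point only subtracts reinforcers through the FR component.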
The data averaged over the last five stable sessions of each condition were reconstructed from Figure 1 of Vaughan and Miller (1984). There were only three different conditions for each pigeon, so we averaged response and reinforcer rates for similar conditions.
For Navakatikyan's (2007) LOE model, the data require a single-alternative function with no bias and no reducing-component function. Thus, we used Equation 5 for Herrnstein's (1970) and Navakatikyan's (2007) LOE models, as they are identical for single-key procedures. We used Equation 9 for Davison and Hunter's (1976) LOE model. Dynamics were modeled with an arbitrary initial value of 40 responses per min and an arbitrary experiment time of 300 min. Models were optimized with respect to response rate. Data over the last 50 min of the model applications were averaged and compared with empirical data.
All three models fitted data very well, accounting for 94% to 95% of response-rate variance. There was no evidence that any one of the three models was a better model according to the values of BIC differences (Table 1), that is, all ΔBIC were less than 6. Time graphs of Herrnstein's (1970) and Navakatikyan's (2007) dynamical models are shown in Figure 6 (middle panel). Full results with the model parameters, variance accounted for, and ΔBIC are given in Appendix A (Table A1).
A graph of Herrnstein's (1970) and Navakatikyan's (2007) LOE equations and their intersections with the feedback functions (the equilibrium points) is shown in the lower panel of Figure 6 (see also Baum, 1973, Figure 5, for a similar approach). The graphs for Davison and Hunter's (1976) LOE equation were very similar and were omitted from Figure 6. There were two equilibria in each of the three models, as the LOE curve (thick line in the lower panel of Figure 6) intersects every feedback function twice. The first equilibrium is located at the origin, and is unstable. Close to the origin, the current response rate (B*) is lower than the equilibrium response rate, that is, (B − B*) > 0, and thus the response rate increases until the second equilibrium is reached. This equilibrium is stable, and it is situated far from the maximal response rate. If B* becomes higher than B, the difference (B − B*) becomes negative and B* decreases. An analytical way to find equilibria for Herrnstein's and Navakatikyan's models is given in Appendix A.
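The two equilibria and their stability can be illustrated with a small simulation. This is a sketch under stated assumptions, not the fitted model: it assumes the single-alternative LOE equation has the hyperbolic form B = Bmax·R/(R + k), integrates dB/dt = kt·(B_eq − B) with a simple Euler step, and uses hypothetical parameter values (Bmax = 60, k = 0.5, kt = 0.05) rather than the values in Table A1.

```python
def feedback(b, vi_rate, n_fr):
    """Reinforcer rate R = min(B, 1/t) - B/N, floored at zero."""
    return max(0.0, min(b, vi_rate) - b / n_fr)

def hyperbola(r, b_max, k):
    """Equilibrium response rate implied by reinforcer rate r (assumed form)."""
    return b_max * r / (r + k)

def simulate(b0, vi_rate=2.0, n_fr=20, b_max=60.0, k=0.5, kt=0.05,
             minutes=300.0, dt=0.1):
    """Euler integration of dB/dt = kt * (B_eq(R(B)) - B); returns final B."""
    b = b0
    for _ in range(int(round(minutes / dt))):
        b_eq = hyperbola(feedback(b, vi_rate, n_fr), b_max, k)
        b += dt * kt * (b_eq - b)
    return b

# Start exactly at the origin, just above it, and at 40 responses per min.
for start in (0.0, 0.01, 40.0):
    print(start, simulate(start))
```

With these illustrative values, a run started exactly at the origin stays there, whereas runs started slightly above the origin or at 40 responses per min both settle at the same interior response rate, which lies well above the rate that would maximize reinforcement.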
Thus, all three models are viable dynamic descriptions for these experiments, producing good fits and having stable equilibria located where they are observed in the data and suggesting that the modeling approach is feasible.
We will consider the data from three experiments that arranged concurrent independent VR VR schedules. Herrnstein and Loveland's (1975) experiment arranged conditions that both produced, and did not produce, exclusive preference for the richer alternative after up to 100 sessions. In Mazur's (1992) experiment, different overall reinforcer rates with the same reinforcer ratio produced preference changes in a transition session. In Mazur and Ratti's (1991) experiment, different overall reinforcer rates with the same difference in reinforcer ratio produced changed preference over single long transition sessions.
In Herrnstein and Loveland's (1975) experiment, 5 pigeons were trained on concurrent VR VR schedules in a standard two-key chamber. The sum of two VR ratios was approximately 60 in Series 1, and 120 in Series 2 and 3. The following pairs of ratios were used in Series 1: 30, 30; 25, 35; 21, 41; and 11, 50; and in Series 2 and 3: 60, 60; 50, 70; 40, 80; and 20, 100. Thus the ratios of N1 to N2 were approximately 1, 0.7, 0.5, and 0.2, where N1 and N2 are the responses per reinforcer on the VR schedules. In Series 1 and 2 reinforcers could not occur within 1.5 s of changing over, but responses still were counted. In Series 3 there was no changeover delay. Conditions lasted between 20 and 101 sessions, with less training given for more extreme ratios. The results were presented as response proportions averaged over the last 10 sessions. We reconstructed absolute values of response and reinforcer rates averaged over subjects from Herrnstein and Loveland's Figures 1, 2 and 5, and used these data for modeling.
Mazur's (1992) Experiment 1 studied pigeons' performance in transitions from equal probabilities of reinforcement for two alternatives to unequal probabilities of reinforcement. There were 50 different conditions, each consisting of three or four equal reinforcer probability (training) sessions followed by one unequal-probability (or transition) session. In equal-probability sessions, the reinforcers were arranged by running a single VR schedule that assigned a reinforcer to two alternatives with equal probability. If a reinforcer was assigned to a key, no reinforcer could be assigned to either key until that reinforcer had been collected; that is, the schedule was an interdependent concurrent VR VR schedule. This procedure was used for the first 100 responses in each transition session. After 100 responses, the schedule was switched to unequal probabilities in which two independent VR schedules were in effect on the two keys. For training sessions and the first 100 responses of transition sessions, the probability that a reinforcer would be assigned for the next response was the mean of the probabilities in the transition phase. Experiment 1 had two parts. In Part 1, there was one condition with a very large difference in reinforcer probabilities (.19 and .01) and four conditions with a 5:1 reinforcer ratio (.20/.04, .15/.03, .10/.02, .05/.01). In Part 2, the condition with a large difference of reinforcer probabilities (.19/.01) was interspersed with conditions with a 2:1 reinforcer ratio (.16/.08, .12/.06, .08/.04, .04/.02).
Results were averaged over subjects and similar conditions and presented as proportions of responses to the rich alternative per block of 100 responses in the transition session. The largest change in preference was observed for the 19:1 ratio, then for the group with the 5:1 ratio, and then for the group with the 2:1 ratio. Within groups of probabilities with the same ratio, Mazur (1992) reported that the fastest changes were associated with the larger overall probability of reinforcement. Absolute response rates were not available and the proportion of responses to the rich alternative was reconstructed from Mazur's Figures 1 and 2 for modeling.
In an experiment similar to that of Mazur (1992), Mazur and Ratti (1991) studied constant differences in two probabilities of reinforcement with different overall reinforcer rates. The experiment included 20 conditions, each consisting of two or three training sessions followed by one transition session. There were five different combinations of reinforcer probabilities. Four of these had differences in probabilities of .06 (.16/.10, .13/.07, .10/.04, .07/.01), while one combination had a larger difference in reinforcer probabilities (.19/.01). Each combination was repeated four times. Mazur and Ratti reported that preference developed more slowly when the ratio of two reinforcement probabilities was smaller (.16/.10) than when it was larger (.07/.01).
Results were again averaged over subjects and similar conditions and presented as the fractions of responses to the rich alternative per block of 500 responses for the transition session. The first block in transition sessions was 100 responses under the conditions of training sessions. The proportion of responses to the rich alternative was reconstructed from Mazur and Ratti's (1991) Figure 1.
Equations 6 and 7, 10 and 11, and 17 and 18 (Herrnstein, 1970; Davison & Hunter, 1976; and Navakatikyan, 2007) were used as steady-state LOE equations. Feedback functions were R1 = B1/N1 and R2 = B2/N2. For the Mazur (1992) and Mazur and Ratti (1991) experiments, the probabilities of reinforcer (p) were substituted by N = 1/p for the modeling, though in the description we will use the original probabilities. For the data of Herrnstein and Loveland (1975), the same LOE equations were also used for modeling the steady states without feedback functions (nondynamically) in order to compare the two approaches.
The usual (nondynamic) modeling using steady-state LOE equations was done for all 12 conditions of Herrnstein and Loveland's (1975) experiment (Appendix B, Table B1). All the steady-state models fitted with a high degree of accuracy, with VAC values of 94%, 95% and 95% for the response rate for Herrnstein's (1970), Davison and Hunter's (1976) and Navakatikyan's (2007) models, respectively. VAC values for the proportions of responses allocated to the left alternative were 96%, 98% and 98%, respectively.
The same data were then used for dynamical modeling, and the results were quite different. An initial response rate of 50 responses per min to both alternatives was used. We assumed that all sessions were of the average length reported for the last 10 sessions. Model values for the last 10 sessions were averaged and optimized against data. Models for Series 2 and 3 were virtually identical and are presented together, though these data were obtained in a slightly different number of sessions. Model parameter values and accuracy are given in Appendix B, Table B1, and accuracy is summarized in Table 1. Herrnstein's model performed more poorly than the others, accounting for only 58% of response rate variance and −54% of the variance in response proportions to the left alternative (PL). A negative value is possible for VAC, unlike R², and shows that the model performed poorly: there was a greater variance between the data and predictions than in the data themselves. Davison and Hunter's (1976) model accounted for 89% of response rate variance and 81% of PL variance. Navakatikyan's (2007) model performed better in all categories, accounting for 94% of response rate variance and 81% of PL variance. Differences in BIC (Table 1, Appendix B, Table B1) provided strong evidence (ΔBIC > 10) that Navakatikyan's model performed better than the others. The predictions for response rate at equilibria for all three models are given in Figure 7. The difference between models is especially pronounced for the lean alternative, where Herrnstein's (1970) model predicted complete extinction on this alternative for all but equal VR VR ratios, while Navakatikyan's and Davison and Hunter's models predicted this result only for the most extreme ratio.
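To make the point about negative VAC concrete, the sketch below assumes the conventional definition VAC = 1 − SS_residual/SS_total, computed directly from observed and predicted values (this definition is an assumption here; the numbers are hypothetical and are not Herrnstein and Loveland's data).

```python
def vac(observed, predicted):
    """Variance accounted for: 1 - SS_residual / SS_total (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_resid = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total

# Hypothetical response proportions showing graded, nonexclusive preference.
observed = [0.5, 0.6, 0.7, 0.8]
# A hypothetical model that predicts exclusive preference for unequal schedules.
exclusive = [0.5, 1.0, 1.0, 1.0]

print(vac(observed, observed))    # perfect prediction
print(vac(observed, exclusive))   # residual variance exceeds data variance
```

When predictions are not constrained to be a least-squares regression on the data, the residual sum of squares can exceed the total sum of squares, and VAC then falls below zero.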
The same data plotted as response proportions for the left alternative against the session number are shown in Figure 8. It is clear that, for Herrnstein's model, only the N1/N2 = 1 ratio does not immediately go to exclusive preference, but will go there eventually. Fits for Davison and Hunter's (1976) and Navakatikyan's (2007) models here are very similar, but we have to bear in mind that optimization was performed for the response rates, and not for the proportions.
For Mazur's (1992) data, the training sessions were modeled as 30-min sessions, which was long enough to reach a steady state. We treated the training sessions as independent concurrent schedules rather than as interdependent schedules because their role is merely to provide approximately equal preference between two alternatives. Models were optimized against the proportion of responses allocated to the rich alternative, and parameter, VAC, and ΔBIC values are given in Appendix B, Table B1, and in Table 1.
Herrnstein's (1970) model performed least well (VAC = 79%). Davison and Hunter's (1976) model performed better in terms of variance accounted for (VAC = 91%), and Navakatikyan's (2007) model performed best (VAC = 95%). BIC differences provided strong evidence (ΔBIC > 10) for Navakatikyan's model being the best description. The major problem with Herrnstein's and Davison and Hunter's models was that they failed to predict different dynamics for different overall reinforcer rates, while Navakatikyan's model did so (Figure 9).
For Mazur and Ratti's (1991) data, parameter values and the accuracy of the models are shown in Appendix B, Table B1 and in Table 1. Herrnstein's (1970) model again performed most poorly (VAC = 83%). Davison and Hunter's (1976) and Navakatikyan's (2007) models performed best (VAC = 91% and 92%, respectively). BIC differences show strong evidence (ΔBIC > 10) that Navakatikyan's model is better than Herrnstein's. The model dynamics are shown in Figure 10.
The dynamical model based on Herrnstein's LOE for independent concurrent VR VR performance has an equilibrium for both B1 and B2 positive only for the rare condition when N1 = cN2. To put it simply, if there is unit bias (c = 1), the schedule ratios have to be equal (N1 = N2) to maintain this equilibrium. Otherwise, the bias has to balance the inequality in the schedule ratios' requirements in order to provide an equilibrium. The phase portrait of this system is shown in the right panel of Figure 11. The equilibrium is not a point, but a line connecting two equilibria on the B1–B2 axes. The system starts with some initial condition, and then moves to the equilibrium line preserving the initial ratio B1/B2. The exact value of bias cannot realistically hold, so this equilibrium cannot be observed even if the underlying LOE equation were as Herrnstein (1970) proposed. The other possible phase portrait is shown in the left panel of Figure 11. It has a single stable equilibrium on the side toward the richer alternative, thus predicting only exclusive preference.
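Both phase portraits can be reproduced with a short simulation. The sketch assumes a two-alternative hyperbolic form with bias, B1 = Bmax·cR1/(cR1 + R2 + k) and B2 = Bmax·R2/(cR1 + R2 + k), chosen because it yields the N1 = cN2 condition stated above; the exact form of Equations 6 and 7 is not reproduced in this section, and all parameter values are hypothetical.

```python
def step(b1, b2, n1, n2, c, b_max, k, kt, dt):
    """One Euler step of dB/dt = kt * (B_eq - B) on concurrent VR VR."""
    r1, r2 = b1 / n1, b2 / n2              # VR feedback functions: R = B / N
    denom = c * r1 + r2 + k
    eq1 = b_max * c * r1 / denom           # assumed two-alternative hyperbola
    eq2 = b_max * r2 / denom
    return b1 + dt * kt * (eq1 - b1), b2 + dt * kt * (eq2 - b2)

def simulate(b1, b2, n1, n2, c=1.0, b_max=100.0, k=0.5, kt=0.05,
             minutes=2000.0, dt=0.1):
    """Run the two-alternative dynamics and return the final (B1, B2)."""
    for _ in range(int(round(minutes / dt))):
        b1, b2 = step(b1, b2, n1, n2, c, b_max, k, kt, dt)
    return b1, b2

# Unequal ratios with unit bias: responding on the lean alternative extinguishes.
print(simulate(50.0, 50.0, n1=20, n2=40))
# Equal ratios with unit bias: the initial ratio B1/B2 = 3 is preserved.
print(simulate(60.0, 20.0, n1=30, n2=30))
# Bias that exactly balances the ratios (N1 = c * N2): ratio preserved again.
print(simulate(60.0, 20.0, n1=20, n2=40, c=0.5))
```

In this sketch both response rates are multiplied by the same factor at each step whenever N1 = cN2, which is why the system slides to the equilibrium line without changing B1/B2; any other bias makes one factor persistently larger, and the other alternative decays toward zero.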
The two other models (Davison & Hunter's, 1976, and Navakatikyan's, 2007) have a stable equilibrium located away from B1 and B2 axes for some values of model parameters, and for less extreme VR VR ratios. As an example, consider phase portraits for Navakatikyan's model, the analytical considerations for which are given in Appendix B. The trajectories were derived using Navakatikyan's model parameters for the Herrnstein and Loveland (1975) data from Table B1, Appendix B. Bias c was set to 1 in order not to distort the picture. There are three distinct phase portraits. The first one shows a stable equilibrium for both B1 and B2 > 0 (left panel, Figure 12). Ratio requirements for this portrait were N1 = 13.5, N2 = 16.5. The phase portrait also has three unstable equilibria located on the B1–B2 axes: one is at (0, 0), and the others are when either B1 or B2 equals 0. If B1 or B2 equals 0, then B2 or B1, respectively, converge to two equilibria along the axes, but cannot stay there, and move to the stable equilibrium; thus they are saddle-type unstable equilibria. In the middle panel of Figure 12, a phase portrait is plotted for the same model, but for more extreme schedule ratios, namely, N1 = 6, N2 = 24. In this case, the stable equilibrium with B1 and B2 > 0 disappeared and shifted to the axis of the richer alternative. Two other equilibria are unstable. Assessment shows that the stable equilibrium disappears when the VR VR ratio becomes more extreme than about 1:3. The third phase portrait (right panel of Figure 12) is for the same model and VR VR ratio as in the first phase portrait, but with the parameter k set at 1.5. Here, instead of a stable equilibrium for B1 and B2 > 0, there is an unstable (saddle) equilibrium. At the same time, the third phase portrait has two stable equilibria on the axes. Thus, the model predicts the possibility of exclusive preference for the rich or poor alternative, depending on initial conditions.
In summary, the three dynamical models for the independent VR VR schedule experiments considered here performed differently. Herrnstein's (1970) model was not an acceptable fit to the data of Herrnstein and Loveland (1975) and Mazur (1992). It does not provide different curves for transitional data (Mazur) with the same ratio of schedule constants, but different overall reinforcer rates. Equilibrium analysis confirms that the model does not have a stable equilibrium for both B1 and B2 positive, except at a very particular value of bias, and thus generally predicts only exclusive preference. Davison and Hunter's (1976) model fitted the data better, though it did not produce different transitional curves for Mazur's experiment. Nevertheless, the model does have stable equilibria for both B1 and B2 positive. Navakatikyan's (2007) model fitted the data well, and shows the existence of equilibria for B1 and B2 > 0, as observed by Herrnstein and Loveland. It also predicts exclusive preference when the ratio of VR VR constants becomes more extreme, and allows for preference for the lean alternative given appropriate initial values as, for example, in Herrnstein (1958).
The melioration experiment described by Vaughan (1981) used 3 pigeons working on concurrent arithmetic VI VI schedules. Pecking one of two alternative keys added 2 s to the cumulative timer for an alternative. The timer for the alternative response timed, and the associated VI tape advanced, unless a reinforcer was delivered or the other key was pecked. The feedback functions were arranged so both relative and overall reinforcer rates depended on the proportion of time spent responding on the right alternative (fr). At the end of each 4-min period, this proportion was calculated and new local reinforcer rates were arranged for the next 4 min. We reconstructed the feedback functions (Figure 13) from Figures 1 and 3 of Vaughan (1981), approximating the curvilinear portions by logistic equations.
During the first 26 sessions, the feedback function called Condition a was applied (upper left panel, Figure 13). The local reinforcer rates were arranged such that if subjects chose according to melioration dynamics, relative performance fr would stabilize in the range .125 to .25, which occurred (fr = .196, .160, and .148 were observed). If fr increased above .25, the local reinforcer rate on the right alternative decreased and returned choice to the .125 to .25 range. If fr decreased below .125, the local reinforcer rate on the left alternative decreased and again returned choice to the .125 to .25 range. Overall rate of reinforcement in the .125 to .25 range was three reinforcers per min. Starting from Session 27, the feedback function called Condition b was applied (upper right panel, Figure 13). Now, local reinforcer rates above fr = .25 were arranged that would, according to melioration dynamics, push fr toward the range .75 to .875. The new range was also arranged in a way that precluded behavior drifting away from the area of .75 to .875. Overall rate of reinforcement in the .75 to .875 range was one reinforcer per min; thus, according to melioration, the feedback function induced choice to move to the lower overall reinforcer rate area. The shift toward the .75 to .875 range commenced almost immediately for Bird 1, and after 10 sessions for Bird 3. For Bird 2, though, this change required the successive introduction of two more conditions (Conditions b1 and b2) to facilitate the initial shift in behavior. Even with these additional conditions, it took another 5 sessions until Bird 2's choice changed. After a total of 84 sessions, fr of the 3 birds stabilized at the values .792, .768 and .782, respectively.
Data on time spent responding and not responding were taken for modeling from Table 1 of Vaughan (1981). To calculate reinforcer rate properly from local rates, we subtracted the fraction of time that pigeons were not responding: 16%, 27% and 12% for Birds 1 to 3, respectively. Accuracy of modeling was also checked against relative time spent on the right from Vaughan's Figure 2, but the optimization was performed using the absolute values of time spent responding.
The same three LOE equations (Herrnstein's, 1970; Davison & Hunter's, 1976; and Navakatikyan's, 2007) were used to construct dynamic models by combining them with the arranged feedback function. The time spent on an alternative was averaged over the same sessions as in Vaughan's original experiment. These were the initial Sessions 23 to 27 (Condition a), and the final Sessions 79 to 83 for all birds. Time for the transition phase was averaged over Sessions 28 to 32 for Bird 1, Sessions 61 to 65 for Bird 2 and Sessions 36 to 40 for Bird 3. To induce the transition from Condition a to b, we decided not to rely on random processes but we added a constant impulse to three consecutive 4-min intervals at the start of the transition for each bird. Initial values of times spent responding were taken equal to the values obtained in the transition phases. As there were only six values of time to optimize the models, we had to limit the scope of search for the best solution. Thus, we first obtained initial values for the parameters of the LOE equations without applying a feedback function. Then we selected an impulse value in steps of 0.5 s to induce a minimal response. Then the dynamical constant was selected to lie from 0.025 to 0.07. The parameters for the LOE equations were then optimized (Appendix C, Table C1 and Table 1). An example, using Herrnstein's and Navakatikyan's models for Bird 3 is given in Figure 14. Davison and Hunter's model for Bird 3 was almost identical to Herrnstein's and was thus omitted. The VAC of predicted time spent responding were 82%, 82%, and 95% for Herrnstein's, Davison and Hunter's, and Navakatikyan's models, respectively. BIC differences provided strong evidence (ΔBIC > 10) that Navakatikyan's model was the best description of the data. However, as the number of data points was small, this result must be taken cautiously. The VAC for the predictions of fr ranged from 85% to 86%. 
The most important result here is that reasonably successful models for all three LOE equations were built without assumptions that local reinforcer rates drive the dynamics as required by melioration.
All three models have similar phase portraits and equilibria. In Condition a there was a stable equilibrium at about fr = .125, which keeps behavior in this area. In Condition b a second stable equilibrium at about fr = .75 appeared. The two stable equilibria in Condition b are separated by an unstable one at about fr = .4. This unstable equilibrium prevents an easy transition from the first stable equilibrium to the second, as Vaughan (1981) observed. To start the transition, a fluctuation in behavior is required that results in the allocation of some additional time to the second alternative.
An experiment with equal local reinforcer rates (Experiment 3, Vaughan, 1982; also Herrnstein & Vaughan, 1980) was conducted using the same method as used for Vaughan's (1981) experiment on melioration. In Vaughan's (1982) experiment, 3 pigeons worked on independent concurrent arithmetic VI VI schedules. As described in the previous section, each VI tape advanced 2 s if the associated key was pecked. Every 4 min, the fraction of time allocated to the right alternative (fr) was calculated and used to set the overall reinforcer rate for the next 4 min, while local reinforcer rates were kept equal. The feedback function has an asymmetric maximum of overall reinforcer rate equal to two reinforcers per min at fr = .25, and a minimum rate of one reinforcer per min at the extremes. We reconstructed the function from Herrnstein and Vaughan's Figure 5.8 by quadratic approximations of the left and right parts of the function (Figure 15).
After 26 sessions, the time allocation of all 3 pigeons had reached neither exclusive preference nor the maximum overall reinforcer rate, but averaged fr = .6 (range .44 to .74). Average values of fr for Sessions 1–5, 6–10, 11–15, 16–20 and 21–26 from Herrnstein and Vaughan's (1980) Figure 5.8 were taken for modeling.
Modeling was conducted for 28-min sessions across the 26 sessions. As there were only five fr data points for each model, we limited the variation in our model parameters by setting Bmax = 50, and kt = 0.05 in all models. Models' parameter and accuracy values are given in Appendix D, Table D1 and Table 1. Fits are shown in Figure 16. Herrnstein's (1970) model again performed more poorly than the others (VAC = 66%). Davison and Hunter's (1976) and Navakatikyan's (2007) models performed better and did not differ from each other (VAC = 74% for both). BIC differences provided evidence (ΔBIC > 6) that both models were better descriptions than Herrnstein's.
Using VAC as a measure of accuracy does not accurately reflect the relative quality of the models in this case. First, data points for Bird 1, for example, did not deviate far from indifference and provide little variance to account for. Second, for Birds 2 and 3, Herrnstein's model gave reasonably good predictions because bias, c, was close to unity (0.97 for both birds; see Appendix D, Table D1). A unit value of bias would keep preference constant, while a slight deviation from unity allows for a slow transition to exclusive preference, which fitted the data even though the data themselves were not indicative of exclusive preference.
The equilibrium properties of the models are similar to those for independent concurrent VR VR schedules. Herrnstein's (1970) model has a stable equilibrium for both B1 and B2 positive only when c = 1 (Figure 17, right panel). The equilibrium, as for independent concurrent VR VR schedules, is a line. The system starts with some initial condition, then moves to the equilibrium line preserving the initial value of fr. If c < 1, the system has a stable equilibrium on the T2 axis at fr = 0; if c > 1, the system has a stable equilibrium on the T1 axis at fr = 1 (left and middle panels of Figure 17). If behavior adhered to the Herrnstein LOE equation, we would expect exclusive preference to be observed in the experiment, because c exactly equaling 1 is improbable.
All other models have stable equilibria in the area centered on fr = .5 for a range of parameter values that is similar to the ones shown in the left panel of Figure 12 (Navakatikyan's 2007 model for interdependent concurrent VR VR schedules). Thus, Davison and Hunter's (1976) and Navakatikyan's models were consistent with observed data. For Navakatikyan's model, for example, stable equilibria occur when k is smaller than kred by a factor of 2 to 2.5. A change in parameters can lead to exclusive preference in two ways: by an increase in bias to the second alternative, but not toward the first, or by an increase in the value of k. Further increases in the value of k create two stable equilibria on the axes, and one unstable equilibrium in the central region like that shown in the right panel in Figure 12. Finally, under a rare combination of parameters, a stable line equilibrium can occur. The dynamical system moves toward the equilibrium line keeping the initial value of fr constant.
Four pigeons were trained on asymmetric interdependent concurrent VR VR schedules by Horner and Staddon (1987, Experiment 2; see also Staddon, 1988). The schedule is asymmetric as the local reinforcer rate on the majority alternative was always twice that on the minority alternative. The schedules were also interdependent as the overall reinforcer rate depended on the proportion of responses to the minority alternative (fr). Probability of reward on the minority and majority alternatives were linear functions: pr = 0.066fr + 001 and pl = 2pr, respectively. Thus, the schedules were arranged so that choice proportions favoring the higher local reinforcer rate alternative (the melioration strategy) will always have the lower overall probability of reward compared to a maximization strategy (Figure 18).
The experiment consisted of 10 sessions with the left and right alternatives as the majority and minority alternatives, and then the alternatives were reversed for a further 10 sessions. In most cases, the pigeons exhibited a unimodal distribution of choice, allowing the conclusion that choice was at equilibrium. The main result was a partial preference for the majority alternative. For 6 of 8 pigeons, preference was represented by a single large modal peak of fr distribution below fr = .33, but not at exclusive preference; for the other 2 pigeons, smaller peaks were observed in the same region.
The dynamical models based on Herrnstein's (1970), Davison and Hunter's (1976) and Navakatikyan's (2007) LOE equations were constructed. As there were no dynamical data in the original article, we simply investigated the equilibrium properties of the models.
Herrnstein's (1970) model has a stable line equilibrium for positive values of both B1 and B2 for c = 0.5 only (lower panels in Figure 19). The equilibrium depends on the value of parameter k (Herrnstein's R0, which originally represented unknown aggregated reinforcers for unaccounted responses, Equations 6 and 7). As k increases, the equilibrium transforms from the line connecting two positive points on the axes, through the line connecting a positive point on the B2 axis with the origin, and is finally located at the origin. While some of the trajectories from the model with c = 0.5 can create a unimodal distribution of choice in the area of partial preference for the majority alternative, maintaining this constant bias is unlikely. For all other values of bias, the dynamical model exhibits exclusive preference. For c > 0.5, there is a preference for the majority alternative, that is, there is a stable equilibrium on the B1 axis (upper three panels in Figure 19), and for c < 0.5, the preference switches to the minority alternative. As k increases, the stable equilibrium moves to the origin. Thus, Herrnstein's model cannot account for the data.
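The critical bias of 0.5 follows from the condition for a positive equilibrium in Herrnstein's model on concurrent VR VR schedules, N1 = cN2, together with N = 1/p. The short derivation below is a sketch that assumes this condition carries over to the interdependent schedule because the ratio of the two reward probabilities is constant at every value of fr, and that alternative 1 is the majority alternative.

```latex
N_1 = c\,N_2, \qquad N_1 = \frac{1}{p_l}, \quad N_2 = \frac{1}{p_r}, \quad p_l = 2\,p_r
\;\Longrightarrow\;
c = \frac{N_1}{N_2} = \frac{p_r}{p_l} = \frac{1}{2}.
```

Any bias other than this value tips the system toward exclusive preference for one alternative, in line with the phase portraits described above.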
However, Davison and Hunter's (1976) and Navakatikyan's (2007) models had stable equilibria in the area of partial preference for fr < .33 for some range of parameter values (see left panel of Figure 12 for a similar phase portrait). In other words, the equilibria are located where the major peaks of fr distributions were observed for 6 of 8 of Horner and Staddon's (1987) pigeons. The value of reinforcer bias (c) affects the position of the equilibrium. For c > 0.5, the equilibrium is biased toward the majority alternative (fr < 0.5); for c < 0.5, it is biased toward the minority alternative.
While it is difficult to assess the accuracy of models in a way similar to the other studies investigated here, we present some summary indication of performance in Table 1. We conservatively denote success in terms of major peaks in choice distribution that are successfully described by a model without bias. Under this approach, Herrnstein's (1970) model accounts for none of the data, while the other models account for the obtained fr distributions in 6 of 8 pigeons.
Three dynamical models were constructed from Herrnstein's (1970), Davison and Hunter's (1976) and Navakatikyan's (2007) LOE equations. The idea behind the development of the dynamical models was linearization – the assumption that average behavior changes linearly in proportion to the difference between the current state of behavior and the state that is an equilibrium given current reinforcer rates. In a similar way, Hull described changes in habit strength in time as being in linear proportion to the difference between present habit strength and physiological maximum under current conditions (Hull, 1943; Spence, 1942). This idea circumvents the necessity to write the primary differential equations. We simply assume that the steady-state LOE equations are descriptions of equilibrium behavior. Behavior is attracted to this equilibrium.
There is a direct analog between our dynamical modeling and the short-term and long-term expectancies of reinforcement that drive operant behavior in models such as, for example, Dragoi and Staddon's (1999) acquisition-extinction theory. They suggested that, when short-term expectancy is greater than long-term expectancy (i.e., when reinforcement increases), the strength of operant responses increases, and vice versa. The short-term expectancy is equivalent to the behavioral measure given the current (short-term) reinforcer rate in our model, while long-term expectancy is equivalent to the current behavior rate in our model. The principle of using a steady-state formulation for the law of effect as the basis for dynamical models has the advantage of allowing a test of the behavior of molar models at the local level. What it lacks, though, is a prediction of the fluctuations that allow for sampling. As a result, in Vaughan's (1981) melioration experiment, we resorted to using an additional pulse applied to behavior in order to leave the area of one equilibrium and to start the transition toward another. This problem could be avoided if a generator of random responses were added to the model. But the advantage of not using a random generator is the possibility of fitting a model to the data without resorting to multiple simulations.
The dynamical models based on Navakatikyan's (2007) formulations for the law of effect were preferable in terms of their accuracy of description, though for some schedules they performed on par with other models (see Table 1). The accuracy of the descriptions of the dynamics and equilibria based on Navakatikyan's model was also generally high for all analyses. It is notable that the seemingly lower overall values of VAC for Herrnstein and Vaughan's (1980) and Vaughan's (1982) equal local reinforcer-rate data were caused by the nature of the behavioral measures—behavior had little variability and tended toward indifference, providing little variation to be explained.
All models performed equally well in describing the data from the single-key experiment with a negative-slope feedback function (Vaughan & Miller, 1984, Experiment 1) but, in this case, all models reduced to a similar and simpler form. Thus, the relative advantages of Davison and Hunter's (1976) and Navakatikyan's (2007) LOE equations arose principally in multi-alternative choice, where Davison and Hunter's and Navakatikyan's LOE models performed better than Herrnstein's (1970), apart from Vaughan's (1981) melioration experiment, in which Herrnstein's and Davison and Hunter's models were equivalent.
Navakatikyan's (2007) model performed better than Davison and Hunter's (1976) in describing Herrnstein and Loveland's (1975) data, Mazur's (1992) data, and Vaughan's (1981) data. The principal difference between the dynamical models is in describing Mazur's data (Figure 9). Unlike Navakatikyan's model, the models based on Davison and Hunter's and Herrnstein's LOE equations did not allow different time graphs for preference when the ratio of reward probabilities for different alternatives was the same (Figure 9). Mazur's data are particularly challenging for many other models (see the Introduction, and Dragoi & Staddon, 1999, p. 36), but they are described by Dragoi and Staddon's acquisition-extinction theory. The accuracy of description of Navakatikyan's model was considerably higher than that of Dragoi and Staddon's model (VAC = 95%, Table 1, versus 63%, calculated from Dragoi and Staddon's Figure 11). We need to be cautious about this difference, however, as we are unsure to what extent they optimized the parameters of their model. The two accounts also differ qualitatively: while the model based on Navakatikyan's LOE equation predicts that the time graphs for Mazur's data will converge to some stable-state values of response rates allocated to both alternatives, acquisition-extinction theory predicts exclusive preference beyond the time boundaries of the data (Dragoi, personal communication, 2008). Whether or not Mazur's data would have converged to exclusive preference if the length of sessions had been prolonged is difficult to predict, though the figures presented by Mazur suggest stabilization, rather than exclusive preference (see our re-creation of the data in the upper left panel of Figure 9). Nonexclusive stabilization is also suggested by the concurrent VR VR data of Herrnstein and Loveland: in most conditions, exclusive preference was not reached even after considerable training.
Surprisingly, no model had difficulty describing Mazur and Ratti's (1991) data. Even Herrnstein's (1970) model predicted data that were close to those observed (Figure 10), with an accuracy higher than that of acquisition-extinction theory (Table 1, VAC = 83% versus 57%, calculated from Dragoi and Staddon, 1999, Figure 12).
As was mentioned in the Introduction, the other forms of enhancing-component function suggested by Navakatikyan (2007) were investigated, in particular, the bounded exponential and unbounded power functions ($F_{enh} = B_{max}(1 - e^{-bR})$ and $F_{enh} = BR^{b}$). However, these models did not perform systematically differently compared to the hyperbolic model used here. Thus, we cannot select the hyperbola as the sole representative for our model on the basis of the data considered here. Nevertheless, there are indications from single-key VI schedules in rats (McDowell & Dallery, 1999) that the hyperbola performed better than both a bounded exponential function of the same form, and a bounded power function that can be expressed as $F_{enh} = B_{max}(1 - (R+1)^{b})$.
To account for the different numbers of free parameters when comparing the models, we used the Bayesian Information Criterion (BIC), which is a statistic combining accuracy of fit with a penalty for the number of model parameters. Yet, this might not be sufficient. It has been shown that quantitative models with the same number of free parameters differ in the flexibility with which they are able to describe data (Myung, Balasubramanian, & Pitt, 2000; Pitt, Kim, & Myung, 2003). Myung et al. compared two 2-parameter psychophysical models: $y = ax^{b}$ (Stevens' 1957 model) and $y = a\ln(x + b)$ (Fechner's 1860 model). When artificial data were generated from Stevens' and Fechner's models with the addition of random noise, they were recovered differently using information criteria, in particular, by BIC. If the data were generated from Stevens' model, then Stevens' model was always chosen as the better. However, if data were generated from Fechner's model, then Stevens' model was still chosen on 67% of trials.
We decided to check whether BIC was an adequate criterion to distinguish between Herrnstein's (1970), Davison and Hunter's (1976), and Navakatikyan's (2007) LOE equations. We generated 50 sets of data for each of three LOE equations in unbiased form, that is, with c = 1. The parameters of the models (Appendix B, Table B1), as well as the set of 12 pairs of R1 and R2, were taken from the results of modeling data from Herrnstein and Loveland's (1975) experiment. We created 12 values of B1 for each of 50 × 3 datasets by adding random, normally distributed noise of approximately 15% of the variation in B1. If negative values of response rate were generated, they were truncated to zero. The values of residuals in the best model for Herrnstein and Loveland's (1975) data (Navakatikyan's model, Appendix B, Table B1) were normally distributed according to D'Agostino's K2 test statistic (D'Agostino, Belanger, & D'Agostino, 1990).
All datasets were optimized using the three LOE equations. Herrnstein's (1970) and Davison and Hunter's (1976) equations were compared pair-wise with Navakatikyan's (2007). The best model was chosen according to whether the difference in BIC exceeded 6. We found that if datasets were generated by Herrnstein's equation, Herrnstein's equation was chosen by BIC as better than Navakatikyan's in 18 out of 50 cases; in 1 case Navakatikyan's equation was chosen as better. In the remaining cases, the BIC difference was less than 6. If datasets were generated using Davison and Hunter's equation, Davison and Hunter's equation was chosen by BIC as better than Navakatikyan's in 27 out of 50 cases, while only in 2 cases was Navakatikyan's equation chosen as better.
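The pair-wise selection rule can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the BIC formula is the common least-squares form with Gaussian errors, n ln(RSS/n) + p ln(n), which the text does not specify, and the function names are hypothetical. Only the threshold of 6 comes from the text.

```python
import math


def bic_least_squares(residuals, n_params):
    """BIC for a least-squares fit, assuming Gaussian errors (an assumption)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)


def choose_model(bic_a, bic_b, threshold=6.0):
    """Prefer the model with the lower BIC only if the gap exceeds the threshold."""
    diff = bic_b - bic_a
    if diff > threshold:
        return "A"
    if diff < -threshold:
        return "B"
    return "undecided"


# With 12 data points, each extra free parameter costs ln(12) BIC units.
penalty_per_parameter = math.log(12)
print(round(penalty_per_parameter, 2))
print(choose_model(10.0, 17.0), choose_model(10.0, 15.0))
```

Under this form of BIC, a model with one more free parameter must reduce the residual sum of squares enough to offset the added penalty before it can be preferred, and differences within the threshold are left undecided.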
If datasets were generated by Navakatikyan's (2007) equation, Herrnstein's (1970) equation was better than Navakatikyan's in just 1 case, while Navakatikyan's equation was chosen in 25 cases out of 50 as a better description than Herrnstein's. On these data sets, Davison and Hunter's (1976) equation was chosen over Navakatikyan's in just 2 cases, while Navakatikyan's equation was chosen over Davison and Hunter's equation as the better in 28 of 50 cases. In summary, though this simulation is limited, we can conclude that Navakatikyan's LOE equation appears not to have higher flexibility than the competing LOE equations, and that the use of BIC is supported as an effective analysis tool in this case.
Thus, the superiority of Navakatikyan's (2007) model in describing the data used here was not due to it being more flexible than the other models considered.
All models had an equilibrium state located where it was observed for the single-key experiment with negative slope (Vaughan & Miller, 1984, Experiment 1). In two-alternative procedures, all models behaved similarly for Vaughan's (1981) melioration experiment. It seems that the feedback function used in this experiment will assure that almost any LOE model will have equilibria in the same areas as reported, namely, in the area of low fr for Condition a, and the area of high fr for Condition b.
In all other two-alternative schedules investigated here, the dynamical model based on Herrnstein's (1970) LOE equation was stable in the area of nonexclusive preference only for some unique conditions that are unlikely to occur in real data. For concurrent VR VR schedules, the model had equilibria for response rates greater than zero on both alternatives only if N1 = cN2. In this case only, the equilibrium is a straight line that depends on initial conditions (Figure 11). For experiments with equal local reinforcer rates (Herrnstein & Vaughan, 1980; Vaughan, 1982, Experiment 3), equilibrium in areas of positive response rates is reached only when c = 1 (Figure 17, right panel). For the experiments with constant-ratio unequal local reinforcer rates (Horner & Staddon, 1987, Experiment 2), equilibrium in the area of positive response rates is reached only when c = 0.5 (Figure 19). The dynamical models based on Davison and Hunter's (1976) and Navakatikyan's (2007) LOE equations had stable equilibria in the area of positive response rates on both alternatives for some range of model parameters. Thus, they predict the absence of exclusive preference, as was observed in the majority of cases under consideration.
In summary, only Navakatikyan's (2007) model described the observed behavior in all cases, and in general it described them more accurately. Davison and Hunter's (1976) model was a close second, but did not describe Mazur's (1992) data effectively. As has been mentioned, there was no significant difference in performance of Navakatikyan's models if power, exponential and hyperbolic functions were used as enhancing-component functions. Thus, we cannot make a choice between them. Moreover, we do not believe that a choice between these models is important, as their success is probably produced simply by the structure of the model, the product of two-component functions.
The experiments with negative-slope feedback functions (Vaughan & Miller, 1984) were originally explained using a response-strength account, rather than an optimization account, as the overall reinforcer rate was obviously not being optimized in the study. “…it seems plausible to assume that reinforcement simply increases the tendency to respond, independent of the fact that the increase in response rate drives down the rate of reinforcement.” (Vaughan & Miller, p. 346). The success of all three LOE-based dynamical models considered here is evidence that this is the case.
Originally, nonexclusive preference in VR VR schedules (Herrnstein & Loveland, 1975) was discussed in terms of interaction between some maximizing process and matching. It was assumed that maximizing would result in exclusive preference, were it not for an additional tendency to minimize deviation from matching. Melioration also predicts exclusive preference on concurrent VR VR schedules. But independent concurrent VR VR schedules in transition (Mazur, 1992; Mazur & Ratti, 1991) produced results incompatible with most of the dynamical models, except Dragoi and Staddon's (1999) model. Mazur's (1992) model (his Equations 1 to 3) produces patterns of preference very similar to those observed, but was not designed to predict absolute response rate.
We converted the Mazur (1992) model from a stochastic into a continuous function and ran optimizations against the preference data reported in this paper and by Mazur and Ratti (1991). According to the model, the value (V) assigned to an alternative increases with each reinforcer by r(1−V) and decreases with each nonreinforcer by nV, where r and n are constants and V is bounded by 1. The average change in value is $\Delta V = pr(1-V) - (1-p)nV$, where p is the scheduled probability of a reinforcer such that p = 1/N, and N is the number of responses per reinforcer on the VR schedule. Optimization of $V_1/(V_1+V_2)$ gave VAC = 86% for Mazur's data, and 78% for Mazur and Ratti's data. However, in both cases the parameter r reached a value of $10^{-8}$, to which it was constrained, and the values of the alternatives ($V_1$ and $V_2$) were unrealistically low, in the range $10^{-5}$ to $10^{-8}$, far from the maximum of unity.
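The averaged form of this value rule is easy to examine numerically. The sketch below is illustrative only: it applies the mean change ΔV = pr(1−V) − (1−p)nV once per step (the step unit is an assumption) and compares the result with the fixed point obtained by setting ΔV = 0, namely V = pr / (pr + (1−p)n). The function names and the values of r, n, and N are hypothetical.

```python
def mean_value_step(v, p, r, n):
    """One averaged update: V + p*r*(1 - V) - (1 - p)*n*V."""
    return v + p * r * (1.0 - v) - (1.0 - p) * n * v


def equilibrium_value(p, r, n):
    """Fixed point of the averaged update (where the mean change is zero)."""
    return p * r / (p * r + (1.0 - p) * n)


def preference(v1, v2):
    """Relative value of alternative 1, V1 / (V1 + V2)."""
    return v1 / (v1 + v2)


# Hypothetical parameters: VR 10 versus VR 20, so p = 1/N for each alternative.
r, n = 0.1, 0.05
v1 = v2 = 0.0
for _ in range(1000):
    v1 = mean_value_step(v1, 1 / 10, r, n)
    v2 = mean_value_step(v2, 1 / 20, r, n)

print(round(v1, 4), round(equilibrium_value(1 / 10, r, n), 4))
print(round(preference(v1, v2), 4))
```

Because the fixed point depends on r and n only through their ratio, very small fitted values of r need not by themselves push V toward zero unless n remains comparatively large.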
Thus, the models based on Navakatikyan's (2007) LOE equation predict both absolute response rates for the data of Herrnstein and Loveland (1975) and changes in preference in the Mazur (1992) and Mazur and Ratti (1991) data. Neither matching, maximization, nor melioration (Vaughan, 1985) is needed to describe behavior in concurrent VR VR schedules.
Another result from modeling of Herrnstein and Loveland's (1975) data is worth mentioning. While all three steady-state models can be fitted nondynamically to these data accurately, with VACs in the range 94% to 96%, dynamical modeling discriminates between them, with Navakatikyan's (2007) model outperforming the others.
Neither matching, nor a simple maximization of the reinforcer rate, can explain the melioration data reported by Vaughan (1981). However, we demonstrated that all three dynamical models based on LOE equations can indeed describe these results. The explanation is that all models considered here are similar in terms of local dynamics: an increase in local response rate if one local reinforcer rate is increased and the other local reinforcer rate is kept constant. Thus, we can suggest that the original explanation of melioration as a by-product of the law of effect was correct. “If,... we assume that the strengthening of responses in one direction, and/or their weakening in the other, leads to a shift (because of these changes of strength) in the distribution of behavior such that relatively more time is spent in the locally better situation, melioration (and by implication matching) may be viewed as the outcome of the relative strengths of changeover responses within choice situations” (Vaughan, 1981, p. 148, see also Vaughan, 1982). It is worth mentioning that the dynamical model based on Herrnstein's (1970) LOE equation fits the original requirements for melioration dynamics: Local response rate follows local reinforcement rate, and choice converges to strict matching. Unfortunately, the dynamical model based on Herrnstein's LOE equation did not always perform well, and predicts exclusive preference in experiments where this result was not observed, such as experiments with equal and constant-ratio unequal local reinforcer rates (Herrnstein & Vaughan, 1980; Vaughan, 1982, Experiment 3; Horner & Staddon, 1987, Experiment 2), as well as in most conditions of Herrnstein and Loveland's (1975) study.
As we showed in the Introduction, the major difference between the structure of Herrnstein's (1970) and Davison and Hunter's (1976) LOE equations and that of Navakatikyan's (2007) LOE model is in the way that reinforcers from alternatives other than the current one decrease behavior. Navakatikyan's LOE equation assumes noncompetitive inhibition, whereas Herrnstein's and Davison and Hunter's models depend on competitive inhibition (Killeen, 1982, 1994; Staddon, 1977). In the latter, responses compete for available time. The success of Navakatikyan's model in describing the datasets considered here does not favor competitive inhibition. Indeed, in a series of experiments, Catania (1969) showed that signaling reinforcer availability on one alternative of equal concurrent VI VI schedules (thus providing more time for the alternative response to occur) did not increase response rate on the other alternative. However, if one alternative was changed to extinction, the response rate on the other alternative did increase. Both extinction and reinforcer signaling dramatically decreased response rate on the alternative on which they were arranged. Thus, response inhibition in concurrent VI VI schedules is caused by alternative reinforcers, and not by alternative responses competing for available time, supporting the approach taken by Navakatikyan.
In terms of model parameters, Herrnstein's (1970), and Davison and Hunter's (1976) LOE equations imply that maximal response rate (Bmax) remains constant when behavior on other alternatives is reinforced, while Navakatikyan's (2007) LOE model implies that Bmax decreases in accordance with a reducing-component function of other reinforcers. Navakatikyan's assumption is consistent with the multivariate rate equation (McDowell, 1980; McDowell & Kessel, 1979), which also predicts an increase in Bmax with increases in reinforcer magnitude. This result was demonstrated for varying sucrose concentration solutions as reinforcer by Dallery, McDowell, and Lancaster (2000), and for varying water deprivation by McDowell and Dallery (1999). Similarly, Hull (1943) considered the effect of reinforcer magnitude on the physiological maximum of habit strength (M') as a negatively accelerated exponential function, $M' = M(1 - e^{-kw})$, where M is the physiological maximum of habit strength under optimal conditions, w is the magnitude of the reinforcing agent, and k is a constant.
In conclusion, the linearization principle for building dynamical models proved to be a feasible approach to assess models in relation to data. As a dynamical system, the two-component functions molar model for the law of effect suggested by Navakatikyan (2007), based on the principle of noncompetitive inhibition, performed better than models based on Herrnstein's (1970) and Davison and Hunter's (1976) LOE equations. It accurately described the behavioral dynamics in experiments with negative-slope feedback functions (Vaughan & Miller, 1984), in concurrent VR VR schedules (Herrnstein & Loveland, 1975; Mazur, 1992; Mazur & Ratti, 1991), in Vaughan's (1981) melioration experiment, and in experiments with equal (Herrnstein & Vaughan, 1980; Vaughan, 1982), and constant-ratio unequal (Horner & Staddon, 1987; Staddon, 1988) local reinforcer rates. In all these experiments, Navakatikyan's law of effect formulation was shown to be an adequate explanatory principle. Further research will be needed to discover the generality of this approach.
We acknowledge the contributions to this work by Mark Stewart, James Sneyd, Wiremu Solomon and Brian McArdle.
where B is the response rate, and Bmax is the maximum response rate constant; R and 1/k are the reinforcer rate and reinforcer-rate constant in reinforcers per hour; t and N are constants for VI and FR schedules, respectively.
To find an equilibrium state, we have to solve Equations A1 and A2 simultaneously. Care has to be taken with the units of the numerical values of parameters to be compatible with units of reinforcers. B and Bmax have to be expressed, for example, in responses per hour, and t then has to be expressed in hours per reinforcer. Once a solution is found, response rate can be reconverted to the usual dimension of responses per minute.
Equation A2 simplifies to:
Solving Equations A1 and A3 for B gives:
which transforms into the quadratic Equation A4:
Equation A4 has two positive roots, the smaller of these two being relevant to the problem.
If we designate:
then response rate at the stable equilibrium is:
where B is in responses per hour.
Reinforcer rate in the stable equilibrium is obtained by substitution of B from Equation A5 into Equation A1, taking care to express B and Bmax in the same units:
|Law of effect equations||Model parameter values & accuracy|
|Davison & Hunter, 1976||80.8||2.62||0.34||0.025||94.6||1.8|
The general form of a difference equation for our models is Equation 2:
where $B^*_{i+1}$ and $B^*_i$ are the next and current values of behavior; $B$ is behavior at the steady state; $\Delta t$ is the time step; and $k_t$ is a dynamic constant. At equilibrium, $B = B^*$.
Herrnstein's (1970) LOE equations are the same as Equations 6 and 7:
Substituting the feedback function for reinforcer rate, or R = B/N, where N is the VR schedule constant:
There are three obvious equilibrium solutions related to the axes. The first one is B1 = 0, B2 = 0, and it is unstable. The second and third are: B1 = 0, B2 = Bmax − kN2 and B2 = 0, B1 = Bmax − kN1. One of them is a stable equilibrium and is related to the rich alternative; the other one is unstable. These hold for N1 ≠ cN2 and are shown by a phase portrait (Figure 11). The condition for equilibrium with B1 > 0 and B2 > 0 can be derived by dividing Equation B1 by Equation B2:
which simplifies into N1 = cN2. This equilibrium is actually not a point, but a line connecting the two equilibria on the B1 and B2 axes. The system starts with some initial condition, then it moves to the equilibrium line, preserving the initial ratio B1/B2 (see Figure 11). The required value of the bias cannot be sustained, so this equilibrium cannot be observed in real behavior.
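The axis equilibria B = Bmax − kN can be checked with a short simulation. The sketch below rests on assumptions that are not taken from the appendix: with the other alternative at zero, the LOE equation is taken to reduce to the single-alternative hyperbola B = Bmax R / (R + k); the feedback function is R = B/N; and behavior follows the linearized update toward the current steady-state value. All parameter values and names are hypothetical, with B, Bmax, and k expressed on a common time base.

```python
def hyperbola(r, b_max, k):
    """Single-alternative steady-state response rate, B = Bmax * R / (R + k)."""
    return b_max * r / (r + k)


def simulate_single_vr(b_start, b_max, k, n_ratio, k_t, dt, n_steps):
    """Linearized dynamics on one VR schedule with feedback R = B / N."""
    b = b_start
    for _ in range(n_steps):
        r = b / n_ratio                    # VR feedback function
        target = hyperbola(r, b_max, k)    # equilibrium implied by current R
        b = b + k_t * dt * (target - b)    # move a fraction of the way there
    return b


b_max, k, n_ratio = 130.0, 1.0, 20.0
final_b = simulate_single_vr(5.0, b_max, k, n_ratio, k_t=0.1, dt=1.0, n_steps=500)
print(round(final_b, 3), b_max - k * n_ratio)
```

Starting from a low response rate, the simulated rate settles at Bmax − kN, because substituting R = B/N into the hyperbola and solving for nonzero B gives exactly that value; when Bmax is less than kN, only the zero equilibrium remains.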
As in the model considered above, we use Equations 17 and 18 as steady-state LOE equations:
and, after substituting feedback function R = B/N, we obtain a pair of equations for the dynamical system:
For the further analysis we set c = 1, as its value is absorbed by N1, and can be disregarded for simplicity. We can recover bias from the solutions by substituting back N1 for cN1.
There are three solutions located on response rate (B1 and B2) axes. The first is B1 = 0, B2 = 0. The second and third are solutions for B1 = 0 or B2 = 0. If B1 = 0, then B2 = Bmax − kN2. If B2 = 0, then B1 = Bmax − kN1. For the fourth and fifth solutions a quadratic equation has to be solved. Equations B3 and B4 transform into:
Substituting Equation B6 into Equation B5 we can solve quadratic Equation B7 for B1. Once B1 is known, B2 is found from Equation B6. Omitting intermediate stages we have the following quadratic equation to solve for B1:
where coefficients a, b, c (valid only for Equation B7) are as follows:
Equation B7 always has one positive (B1 > 0, B2 > 0) solution, which can be stable or unstable (see Figure 12).
|Law of effect equations||Model parameter values & accuracy|
|Bmax||k||a or kred||c, bias||kt||B||PL or P|
|Herrnstein and Loveland, 1975|
|Herrnstein, 1970||131.8||1.03||-||0.99||0.079||58.5||44.9 x||−54.0||-|
|Davison & Hunter, 1976||146.6||0.65||0.76||0.81||0.342||88.9||16.5 x||80.5||-|
|Herrnstein, 1970||138.0||0||-||1.00||0.229||-||-||78.8||156.3 x|
|Davison & Hunter, 1976||145.5||0||0.50||0.96||0.666||-||-||91.2||64.3 x|
|Mazur & Ratti, 1991|
|Herrnstein, 1970||150.0||1.64||-||1.03||0.114||-||-||83.1||28.0 x|
|Davison & Hunter, 1976||150.0||0.00||0.67||1.04||0.334||-||-||91.2||2.7|