Rats were trained and tested in a task that mirrors critical performance aspects of the BART used in human beings. The probabilities that each add-lever press resulted in an increase in reward cache size or in trial failure were varied orthogonally between sessions and were never altered within a single session.
Figure 1 Trial schematic for the decision making under risk task. At trial onset, rats have an opportunity to accept risk to seek larger rewards by responding on an ‘add’ lever or to avoid risk and get access to that reward already earned by responding …
We first examined whether responding was independently sensitive to the levels of risk and the reinforcement rate imposed in the task. A group of modestly trained rats (~2–4 weeks of daily training on risk and no-risk contingencies; n=81) was tested in sessions in which risk and reinforcement probability were independently varied in a within-subjects design. In the 0% risk conditions, no trials ended in failure, but as risk increased from 11.1 to 16.7%, the number of failed trials increased dramatically (main effect of risk: F(1,80)=129.6, p<0.0001). The effect of reinforcement probability reached only the trend level (F(1,80)=3.4, p=0.07) and did not interact with risk (F(1,80)=0.8, p=0.3).
Number of Failed Trials Under Different Reinforcement Schedules (50–100%) and Risk Functions (0, 11.1, or 16.7%)
Regarding the average number of add-lever presses made per trial, ANOVA revealed significant main effects of risk (F(2,87)=64.6, p<0.0001) and reinforcement probability (F(1,80)=20.3, p<0.0001), along with a significant interaction between these two factors (F(2,153)=4.3, p=0.02). These effects arose because increasing risk dramatically decreased the mean number of add-lever responses made per trial, whereas lower probabilities of reinforcer accrual were associated with higher responding per trial. Moreover, the increase in responding associated with reduced probability of reward accrual was greater when risk was low, meaning that rats increased reward-seeking behavior to a larger degree when it carried no chance of reward loss. Although four separate cohorts of rats were tested to generate the dataset reported above (n=81 subjects), this large sample size was not required to statistically justify these conclusions: the same relationships were observable and significant in each of the three cohorts that were independently powered to find such an effect (ie the three of the four cohorts that comprised n=24 subjects each).
Figure 2 Risk sensitivity of performance. (a) Whether reinforcement probability was 100 or 50% per add-lever press, increasing risk led to decreased average responses per trial. (b) Free-choice behavior in probe trials intermixed with standard trials during …
A decrease in the average number of add-lever responses made per trial under increasing risk could, in principle, be accounted for solely by failures cutting short trials that would otherwise have contributed higher response counts, truncating the resulting distribution. To address this, we tested a subset of rats under conditions in which up to 10 probe trials were intermixed with 50 standard risk trials in a pseudorandom manner (roughly every 4–6 trials); during probe trials, rats were able to respond as many times as they chose before cashing out. This analysis provided a direct measure of the rats' voluntary responding in the session. Risk and reinforcement probability were again varied. Under these conditions, ANOVA revealed a significant main effect of risk (F(1,21)=3.7, p=0.02), with no main effect of reinforcement probability and no risk × reinforcement probability interaction (both Fs<0.6). The probe-trial data show that, under the 50% reinforcement condition, rats voluntarily responded less when risk was imposed, meaning that their behavior was risk sensitive in the expected manner.
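The probe-trial procedure described above can be illustrated with a short sketch. This is not the authors' scheduling routine: the function name `build_session`, the uniform draw of 4 to 6 standard trials before each probe, and the choice never to end a session on a probe are all assumptions made for the example.

```python
import random


def build_session(n_standard=50, max_probes=10, gap_range=(4, 6), seed=None):
    """Return a list of trial types for one hypothetical session.

    Assumption: a probe trial follows every run of 4-6 standard trials
    (drawn uniformly) until max_probes is reached or the standard
    trials run out.
    """
    rng = random.Random(seed)
    schedule = []
    remaining = n_standard
    probes = 0
    while remaining > 0:
        # Draw the length of the next run of standard risk trials.
        gap = min(rng.randint(*gap_range), remaining)
        schedule.extend(["standard"] * gap)
        remaining -= gap
        # Insert a probe only if standard trials remain and the cap is not hit.
        if probes < max_probes and remaining > 0:
            schedule.append("probe")
            probes += 1
    return schedule


if __name__ == "__main__":
    session = build_session(seed=0)
    print(session.count("standard"), "standard trials,",
          session.count("probe"), "probe trials")
```

Because every probe is preceded by a full run of 4 to 6 standard trials, the sketch yields between 8 and 10 probes per 50 standard trials, in keeping with the "up to 10" described in the text.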
Sensitivity to Variations in Reinforcement Probability
A recent study (Bornovalova et al, 2009) indicates that risk aversion increases as the payoff per inflation increases, leading to the hypothesis that subjects may target a desired prospect and accept only the amount of risk necessary to achieve that prospect. Consonant with this hypothesis, decreasing the probability that each add-lever press led to an additional pellet being earned (100 vs 33%) significantly increased the mean number of presses made per cash-out trial (main effect of reinforcement probability: F(2,160)=…, p<0.005; post hoc comparisons by two-tailed paired t-tests: 100 vs …, p=0.04; 100 vs …, p=0.002). That being said, most subjects were still observably risk averse under partial reinforcement conditions, making fewer than the optimal number of responses per trial (50%: 2.9±0.1; 33%: 3.0±0.1) and earning a total reward allocation below that which they were able to eat. This indicates that their risk aversion did not arise solely because they could earn enough pellets to become sated even while avoiding risk.
Figure 3 Behavioral characteristics of performance showing sensitivity to reinforcement probability and risk. (a) The theoretical relationship between the mean responses per trial and the overall size of the reward cache earned is an inverted U, with maxima at …
Optimization of Responding
The inverted-U function shows the theoretical relationship, as predicted by an ideal observer, between the average number of responses per trial and the total reward earned in a session, together with actual data from 81 adult male rats. Subjects were tested under conditions in which (1) the first add-lever press was risk free, (2) each subsequent response was associated with an 11.1% risk of trial failure, and (3) each add-lever press earned an additional pellet. These data indicate that subjects were relatively risk averse, producing an average of 2.8±0.1 responses per trial and therefore earning less total reward than they would have earned had they made ~5 responses per trial. In this sense, rats are similar to human subjects in that they exhibit risk-averse profiles when performing the BART, producing fewer than the optimal number of responses (Bornovalova et al, 2009) and earning less reward than is possible, probably because they overestimate the risk associated with the task.
The same plot also suggests that, relative to the ideal, a significant number of rats earn fewer pellets than they should; however, these curves assume that subjects produce the same number of responses on all trials in a session, in a sense optimizing their responding. In reality, subjects differ in the variability of their responding, and we hypothesized that high intra-subject, intra-session variance in the number of add-lever presses made across all 50 trials (reflecting combinations of both higher than optimal and lower than optimal trial completions) could be the cause of their inability to maximize reward receipt, particularly under high-risk conditions. To examine this point empirically, we calculated the variance of responses made per trial and divided it by the mean (to account for the fact that variance naturally increases with the mean). We compared performance (both the mean and the variance/mean of the responses per trial) for animals experiencing task conditions in which the chance of trial failure was 0% (a completely ‘safe’ control condition) or 16.7% (to impose considerable risk); the probability that an add-lever press would gain another pellet was held constant at 50%. Overall, both the mean and variance/mean measures were lower in the risk than in the no-risk condition (n=81; mean: t(80)=7.8, p<0.0001; variance/mean: t(80)=7.0, p<0.0001; two-tailed paired t-tests). The latter result suggests that subjects optimize or constrain performance as risk increases. As expected, stepwise regression showed that, when there was no risk of trial failure, the mean number of responses per trial was positively associated with pellets earned (adjusted r²=0.99, p<0.001; n=81; data not shown); in this case, the variability of responding had no significant explanatory value.
However, in the condition in which each add-lever press carried a high degree of risk, stepwise regression found that the variability of responding was negatively associated with pellets earned (adjusted r²=0.15, p<0.001), whereas the mean number of responses per trial explained no significant share of the variance. These data suggest that relatively high intra-subject, intra-session variability of responding may reflect relatively poor top-down control over behavior.
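The variance/mean measure and its relationship to pellets earned can be made concrete with a small sketch. It is illustrative only: the function names are hypothetical, the use of the sample variance (n − 1 denominator) is an assumption because the text does not state which estimator was used, and a single-predictor least-squares fit stands in for the stepwise regression reported above. The example numbers are invented and are not data from the study.

```python
from statistics import mean, variance


def dispersion_index(responses_per_trial):
    """Variance of add-lever responses per trial divided by their mean."""
    # Assumption: sample variance (n - 1 denominator).
    return variance(responses_per_trial) / mean(responses_per_trial)


def simple_regression(x, y):
    """Least-squares fit of y on x; returns (slope, adjusted r^2)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)
    # Adjust r^2 for one predictor: 1 - (1 - r^2)(n - 1)/(n - 2).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, adj_r2


if __name__ == "__main__":
    # Invented per-trial response counts for three hypothetical rats.
    sessions = [[3, 3, 3, 4, 3], [1, 6, 2, 7, 1], [2, 4, 3, 5, 2]]
    pellets = [120, 80, 100]  # invented session totals
    dispersion = [dispersion_index(s) for s in sessions]
    print(simple_regression(dispersion, pellets))
```

A negative slope in such a fit corresponds to the pattern described for the high-risk condition, in which more variable responders earned fewer pellets.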
To test the idea that high mean and/or high variance responding may be primarily attributable to an incentive process (‘wanting’ of the pellets), we tested a set of rats either as usual or after pre-feeding with reinforcer pellets. The probability that each add-lever press was reinforced was 100%, and the risk of trial failure was 11.1% per add-lever press. Pre-feeding significantly reduced the mean number of responses made per trial (n=16; t(15)=2.3, p=0.03 by two-tailed paired t-test) but had no effect on the variability of responding (t(15)=−1.3, p=0.22; two-tailed paired t-test). These data further support the notion that different psychological processes mediate these two dependent measures: the average number of presses is related to an incentive process, whereas response variability is not.
Mean Add-Lever Responses Per Trial and Response Variability Under Pre-fed or Standard Testing Conditions
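The paired comparisons reported above (for example, t(15) for n=16 rats tested under both standard and pre-fed conditions) follow the usual paired t-test arithmetic, sketched below. The function name `paired_t` and the example values are hypothetical; the sketch returns only the t statistic and its degrees of freedom, since converting these to a two-tailed p value requires the t distribution from a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(condition_a, condition_b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(condition_a) != len(condition_b):
        raise ValueError("paired samples must have equal length")
    # Work on within-subject differences, one per animal.
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


if __name__ == "__main__":
    # Invented mean responses per trial for three hypothetical rats.
    standard = [3, 4, 5]
    prefed = [2, 2, 3]
    print(paired_t(standard, prefed))
```

With 16 animals measured under both conditions, the degrees of freedom are 16 − 1 = 15, which matches the t(15) notation used in the text.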
Frontal Cortical Contributions
We next explored the function of distinct frontal cortical brain regions in aspects of risk-related responding, given recent results indicating that frontal regions contribute to a network of fore- and midbrain structures involved in decision making about risk and effective behavioral control during reward seeking (Rao et al, 2008). Rats (n=29) performed a version of the task in which reinforcement probability was fixed at 33% per add-lever press (to promote higher responding) but risk was varied across sessions (0 vs 11.1%); they were surgically cannulated, allowed to recover, and then trained. Microinfusions of a GABA agonist cocktail (0.03 nmol of muscimol and 0.3 nmol of baclofen in 0.5 μl of vehicle; MUS+BAC), given to transiently suppress neural activity, or vehicle microinfusions were made into either the mPFC, the functional analog of the human dorsolateral prefrontal cortex (Preuss, 1995; Brown and Bowman, 2002), or the OFC; the anatomical location of each infusion is shown in the corresponding figure. When considering the mean number of responses per trial, omnibus ANOVA revealed a significant interaction between brain region (mPFC vs OFC) and infusion (vehicle vs MUS+BAC) (F(1,27)=…; n=29); this interaction arose because inactivation of the OFC elicited a significant reduction in the mean number of presses per trial (11.1% risk: t(14)=…, p=0.003; 0% risk: t(14)=…, p=0.06 by two-tailed paired t-test; n=15), whereas suppression of neural activity in the mPFC failed to affect this measure (11.1%: t(13)=…, p=0.58; 0%: t(13)=…, p=0.98 by two-tailed paired t-test; n=14). The opposite effect was found when examining the variability of responding after GABA agonist infusion into these brain regions (brain region × infusion interaction: F(1,27)=…, p=0.007). Infusion of the GABA agonist cocktail into the mPFC increased the variance of responding (11.1%: t(13)=…, p=0.006; 0%: t(13)=…, p=0.11), which was unaffected after inactivation of the OFC (11.1%: t(14)=…, p=0.25; 0%: t(14)=…, p=0.28). Neither infusion affected response latencies (data not shown).
Figure 4 Dissociable contributions of the mPFC and OFC on decision making under risk. (a) Infusion locations were centered on the ventral and lateral OFC or mPFC. (b) Inactivation of the OFC (top panel), but not of the mPFC (bottom panel) decreased the mean number …
Importantly, the change in variability of responding elicited by mPFC inactivation produced a suboptimal response profile: we measured a significant, negative relationship between the variance measure and reward obtained under mPFC inactivation (adjusted r²=…, p=0.04 by simple linear regression), but not after vehicle infusion (adjusted r²=…, p=0.69). (A negative relationship between baseline response variability and pellets earned, such as the one described above, is generally observable only under high-risk conditions.) These data show a clear double dissociation between the mPFC and OFC as they relate to goal-directed decision making under risk. They also indicate that a neural circuit involving the mPFC is implicated in coordinating adaptive responding under risk in the rat, providing direct evidence that this structure participates in voluntary behavioral control in rats, as well as in human beings (Hare et al, 2009