Recent insights into the neural mechanisms of decision-making have come from investigations in behavioral economics. Participants typically decide between limited numbers of options differing in probability, risk, and amount of reward (1). Despite their success in explaining the choices animals make (2), the optimal foraging models of ecology have had little impact on cognitive neuroscience (4) or economics (5). The key foraging choice is usually not a binary one between currently available options; instead, it is whether or not to engage with options as they are encountered (2). It depends not just on 1) the value of the option encountered (encounter value) but also on estimates of 2) the environment’s average value (search value) and 3) the cost of leaving to forage for alternatives (search cost). We used functional magnetic resonance imaging (fMRI) to examine the neural mechanisms mediating foraging.
Human participants made foraging-style choices (forages) to either engage with current options of known value or search among a set of potential alternatives also of known value. All the stimuli were drawn with replacement from a set of 12 with values learned in a previous session (SOM.1.2). Pre- and post-scanning checks and analyses of choices during scanning confirmed value retention (Fig.S7). Two visual stimuli indicated reward magnitudes potentially available if the subject engaged (their weighted combination constituted the encounter value; SOM.Equations.2-4, Fig.S2). Rewards were points that translated into money on experiment completion. Six additional boxed stimuli indicated the values of the potential alternatives (search value). Choosing to search entailed a risk of paying a search cost (high, mid, or low) in loss of points, indicated by box color. If the subjects engaged, they went on to make a comparative decision between the two components that constituted the encounter option, after being informed about their associated reward probabilities. The introduction of probability information ensured decisions could only be made at this point and that forages and decisions were separated in time. When participants chose to search, new options drawn at random from the boxed alternatives were encountered. Participants searched as often as they wished but risked the same costs each time.
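The three quantities that define a forage can be made concrete with a minimal sketch. It assumes an equal weighting of the two encounter magnitudes and a plain mean over the six boxed alternatives; the actual weighting is specified in SOM.Equations.2-4 and is not reproduced here. The function names, the `weight_a` parameter, and the point values are hypothetical and serve only as illustration.

```python
from statistics import mean


def encounter_value(magnitude_a, magnitude_b, weight_a=0.5):
    """Weighted combination of the two encounter-option magnitudes.

    weight_a is a placeholder; the study's weighting is given in its
    supplementary equations and is not known here.
    """
    return weight_a * magnitude_a + (1.0 - weight_a) * magnitude_b


def search_value(alternative_magnitudes):
    """Average magnitude of the boxed alternatives (simplifying assumption)."""
    return mean(alternative_magnitudes)


def net_search_advantage(encounter, search, expected_search_cost):
    """Positive when searching looks better than engaging, on paper."""
    return search - expected_search_cost - encounter


# Hypothetical trial: two encounter stimuli, six alternatives, one cost level.
ev = encounter_value(40, 60)
sv = search_value([10, 20, 30, 40, 50, 60])
advantage = net_search_advantage(ev, sv, expected_search_cost=5)
print(ev, sv, advantage)
```

A negative advantage favors engaging. This is an idealized comparison of values only and is not a model of how participants actually chose.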
Fig.1 (A) Trials started with two central stimuli (encounter value) and six alternative stimuli (search value) in a box at the top (drawn from a set of 12 learned in a previous session), while box color indicated the current potential search cost. The horizontal bar ...
Logistic regression identified factors weighing on forages and decisions. Engaging was promoted by search costs and encounter values but discouraged by all components of search values (Fig.S7). Participants were biased against search, requiring objectively more value gain for searching than for engaging (the constant from the regression reflects subjects’ biases against searching; we call this parameter forage readiness). Decisions were influenced by reward probability and magnitude differences between options.
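A regression of this kind can be sketched on simulated data. The example below is not the authors' analysis: the generative weights, trial count, and predictor coding are assumptions chosen only so that the signs match the pattern described (costs and encounter value promote engaging, search value opposes it, and a positive constant expresses a bias toward engaging). It fits the model by plain gradient descent using the standard library alone.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical generative weights (not estimates from the study).
# Order: constant, search cost, encounter value, search value.
TRUE_WEIGHTS = [0.5, 1.0, 1.5, -1.5]


def simulate_forages(n_trials, rng):
    """Simulate engage (1) / search (0) choices from standardized predictors."""
    rows, choices = [], []
    for _ in range(n_trials):
        x = [1.0, rng.choice([-1.0, 0.0, 1.0]),
             rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        p_engage = sigmoid(sum(w * xi for w, xi in zip(TRUE_WEIGHTS, x)))
        rows.append(x)
        choices.append(1 if rng.random() < p_engage else 0)
    return rows, choices


def fit_logistic(rows, choices, learning_rate=1.0, n_iter=200):
    """Approximate maximum-likelihood fit by batch gradient descent."""
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, choices):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad)]
    return weights


rng = random.Random(0)
rows, choices = simulate_forages(1000, rng)
weights = fit_logistic(rows, choices)
names = ["constant", "search cost", "encounter value", "search value"]
for name, w in zip(names, weights):
    print(f"{name:>16}: {w:+.2f}")
```

In this coding the fitted constant plays the role of the bias term: a positive value means the simulated participant engages more often than the values alone would warrant.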
Comparison of average activity during foraging and decisions identified anterior cingulate cortex (ACC) among other regions. In decisions, the signal most commonly observed in ACC is inversely related to the value difference between chosen and unchosen options. Such inverse value difference effects have been interpreted as indicating that ACC/dorsomedial frontal cortex is a “comparator” comparing choice values. According to this theory, the region is more active when unchosen values are larger because a smaller difference between chosen and unchosen values means comparison takes longer before a choice is made (6). Related accounts emphasize an ACC role in monitoring for conflict between responses (8).
Fig.2 ACC activity was higher in forages than decisions (A), was better related to the inverse value difference (VD) during decisions than foraging (B), reflected the main effect of search value during foraging (C), and was better related to search VD than decision ...
However, our task also allowed us to test whether the ACC signal reflects the relative benefit of the alternative course of action or the value of exploring the environment. This hypothesis predicts that ACC, during forages, will stop reflecting the value of the unchosen option and will always represent the value of searching. We therefore refined the analysis (SOM.1.5) and tested for a region that demonstrated both of these effects: coding for the unchosen-chosen value difference during decisions but not forages and, on forages, coding instead for the search value. Both tests identified overlapping ACC regions. When these two effects were combined into a compound test (forage(search value − encounter value) − decision(chosen value − unchosen value)), the same ACC region was implicated.
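The compound test can be read as a single contrast over four regression weights. The sketch below is an illustration under stated assumptions, not the pipeline described in SOM.1.5: the regressor names and the example β values are hypothetical, and a real analysis would evaluate the contrast voxel-wise with an fMRI package.

```python
# Hypothetical regressor names for the four value terms in the compound test.
REGRESSORS = [
    "forage_search_value",
    "forage_encounter_value",
    "decision_chosen_value",
    "decision_unchosen_value",
]

# forage(search - encounter) - decision(chosen - unchosen), term by term.
CONTRAST = {
    "forage_search_value": 1.0,
    "forage_encounter_value": -1.0,
    "decision_chosen_value": -1.0,
    "decision_unchosen_value": 1.0,
}


def compound_effect(betas):
    """Weighted sum of regression weights under the compound contrast."""
    return sum(CONTRAST[name] * betas[name] for name in REGRESSORS)


# Made-up weights with the signs an ACC-like region would show.
example_betas = {
    "forage_search_value": 0.5,
    "forage_encounter_value": -0.25,
    "decision_chosen_value": -0.25,
    "decision_unchosen_value": 0.25,
}
effect = compound_effect(example_betas)
print(effect)
```

A region scores highly on this contrast only if it tracks search value over encounter value during forages and unchosen over chosen value during decisions, which is why one test can stand in for the two separate ones.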
We analyzed foraging signal time courses in a region centered on the overlap between foraging search value and decision value difference effects. ACC BOLD was positively correlated with the value of searching the environment and negatively correlated with the value of engaging with the current encounter option, regardless of the choice participants ultimately made. The frame of reference in which values are encoded in ACC is thus fixed in relation to response strategy, searching or engaging. This contrasts with ventromedial prefrontal cortex (vmPFC) and other regions where value is encoded in a flexible reference frame tied to the choice taken or attended (9). Comparing search value signals in ACC, we found a more rapid increase (greater slope) on search than engage choices [t(17) = −2.54, p = 0.021], consistent with earlier, stronger signals in search decisions (Fig.S8) and faster accumulation of search evidence in ACC on search choices (4). In search choices there was also an effect of search cost.
We next examined whether individual differences in ACC activity reflected differences in foraging. Behavioral variation in the influence of search value in promoting searches was correlated with neural variation in ACC search value effects, and behavioral differences in the influence of the lowest and highest alternative values were correlated with ACC activity (Fig.S5). While average search value determined search choices, it did not predict the rate at which participants repeatedly searched in pursuit of the best alternative on each trial. Such perseverative search rates were, however, predicted by ACC responses to best alternatives. Finally, we looked at the decision phase; ACC activity still reflected the search value from the prior forage, as if still encoding how good it would be to search for alternatives. Brain activity conveyed knowledge of environmental richness even during simultaneous binary decision-making when the signal was no longer relevant. Knowledge of environmental richness, which is normally pertinent to foraging but irrelevant to binary decision-making, impinges on, and impairs, simultaneous binary decision-making in behavioral experiments (5).
Despite their limitations (11) and alternative explanations of reward- and error-related activity in ACC (8), conflict and comparator-based theories remain the most influential accounts of decision-related activity in ACC. However, the presence of an average reward signal (search value), a negative effect of search cost, anchoring of value representations with respect to search/engage strategies, differential rates of search signal accumulation on search and engage trials, and correlation, across subjects, between ACC signal variance and search choice variance (Fig.S5) cannot be accommodated within comparator- and conflict-based ACC theories. Instead, we suggest ACC codes the value of switching to a course of action alternative to that which is taken or is the default. ACC supplies such a signal even when subjects are not asked to forage but to make decisions. As soon as the subject switches to the alternative the signal dissipates, but it is maintained if the course of behavior is maintained (compare red lines 2f versus 2e,h).
VmPFC encodes the value of chosen/attended options in comparison to unchosen/unattended options (9). During foraging, however, vmPFC activity only reflected the chosen option value when participants engaged, and there was no representation of search value. When subjects searched, the chosen search value was actually negatively correlated with vmPFC activity and there was no representation of encounter value. The absence of any representation of search value (the average value of the environment) and of search cost restricts any role vmPFC might play in foraging.
Fig.3 (A) VmPFC time courses during forages (conventions as ). (B) Activity better related to decision VD than to forage VD. (C) VmPFC time course for engage forages and the subsequent decision phase (conventions as ). (D) Individual peak vmPFC BOLD ...
In contrast, seconds after foraging, vmPFC played an important role in decisions. Comparison of average activity during decisions and forages and between decision and forage value differences (decision(chosen value − unchosen value) − forage(chosen value − unchosen value)) identified vmPFC. It coded negatively for the values of unchosen options and positively for the values of chosen options; it effectively encoded the value difference between options. During the transition from foraging to decisions, vmPFC rapidly changed from positively encoding both components of encounter value, weighting both in the same way as participants did behaviorally (Fig.S4), to representing the value difference between chosen and unchosen components in decisions. The reference frame in which values are encoded in vmPFC is thus flexible and concerned with the value dimensions and contrasts most pertinent to decision-making. Such a reference frame makes vmPFC suitable for goal-based (14) and multi-attribute (15) decision-making. Its importance during decisions was underlined by individual variation in vmPFC reward magnitude effects being correlated with decision accuracy.
Reward prediction error signals associated with the ventral striatum, and its interactions with orbitofrontal cortex (16), allow decision-making to change with experience. They occur even when there is little opportunity for learning (17), as in our task. We therefore examined whether forage prediction errors were also encoded by the striatum (Fig.S3) and its interactions with the ACC. Despite its weak activation with search value, the striatum exhibited post-search prediction error-like signals (positive effect of new encounter value, negative effect of previous search value). It also responded to search costs. The prediction error response had higher positive peaks in people who searched less (as if they had expected less). Across subjects, search costs activated striatum in proportion to the degree that they deterred searching.
Fig.4 Ventral striatal time courses after feedback following search forages (A). Effect of search costs when search is chosen (B). Individual peak BOLD β-weights for new encounter value (Ci) and peak BOLD β-weights for new search costs on searching ...
An ACC region overlapping with, but anterior to, the search value effect was more coupled with left ventral striatum when search costs increased and search was chosen. The coupling appeared related to disinhibition of effortful choices because the same ACC region was also more active in subjects more willing to overcome costs; individual differences in foraging readiness were associated with increased anterior ACC activation.
VmPFC and ACC have been thought to operate in sequence during choice (6), but our results suggest ACC represents choice in a manner at odds with intuitions of how comparative decisions are made. Because ACC value representations are anchored to response strategy (engage/search), our results confirm it is well placed to guide response selection. However, the different signals in ACC and vmPFC attest to independent roles in forages and decisions. The implication of ACC in foraging and its encoding of the average value of the foraging environment may facilitate understanding of the reward signal it carries (12), its prominence during exertion of effort (20), in go-no-go decisions (22), and in representing alternative and counterfactual choice values (25). Some action value learning tasks previously used to investigate ACC (12) may have been treated as foraging tasks by the animals, which may have been choosing whether to stay with the current choice or switch to an alternative. Such a perspective also makes it possible to reinterpret ACC activation recorded during exploration tasks (24) as reflecting estimates of the richness of alternatives in the environment. ACC activity is frequently recorded (27) and might reflect the value of alternative choices in other tasks and the inclination to refrain from engaging in the currently offered choice (28). Foraging entails energetic costs, and we found ACC activity also reflected the cost of foraging. ACC neurons have been shown to encode value signals that integrate both cost and reward (29). By contrast, vmPFC, a primate specialization (30), may underpin fine-grained, accurate, and flexible decision-making (6).
One sentence summary: Humans, like other animals, have evolved to forage, and anterior cingulate cortex, unlike any other brain area, contains three signals predicted by foraging theory: it signals the average richness of the environment, the cost of foraging, and the value of each choice option encountered in a reference frame invariantly tied to the foraging decision.