In stable environments, decision makers can exploit their previously learned strategies for optimal outcomes, while exploration might lead to better options in unstable environments. Here, to investigate the cortical contributions to exploratory behavior, we analyzed single-neuron activity recorded from four different cortical areas of monkeys performing a matching pennies task and a visual search task, which encouraged and discouraged exploration, respectively. We found that neurons in multiple regions in the frontal and parietal cortex tended to encode signals related to previously rewarded actions more reliably than unrewarded actions. In addition, signals for rewarded choices in the supplementary eye field were attenuated during the visual search task, and were correlated with the tendency to switch choices during the matching pennies task. These results suggest that the supplementary eye field might play a unique role in encouraging animals to explore alternative decision-making strategies.
Leathers and Olson (Reports, 5 October 2012, p. 132) draw the strong conclusion that neurons in the monkey lateral intraparietal (LIP) cortical area encode only cue salience, and not action value, during value-based decision-making. Although their findings regarding cue salience are interesting, their broader conclusions are problematic because (i) their primary conclusion is based on responses observed during a brief interval at the beginning of behavioral trials but is extended to all subsequent temporal epochs and (ii) the authors failed to replicate basic hallmarks of LIP physiology observed in those subsequent temporal epochs by many laboratories.
Although choices of both humans and animals are more strongly influenced by immediate than delayed rewards, methodological limitations have made it difficult to estimate the precise form of temporal discounting in animals. In the present study, we sought to characterize temporal discounting in rats and to test the role of the orbitofrontal cortex (OFC) in this process. Rats were trained in a novel intertemporal choice task in which the sequence of delay durations was randomized across trials. The animals tended to choose a small immediate reward more frequently as the delay for a large reward increased, and, consistent with previous findings in other species, their choice behavior was better accounted for by hyperbolic than exponential discount functions. In addition, model comparisons showed that the animals’ choice behavior was better accounted for by more complex discount functions with an additional parameter than by a hyperbolic discount function. Following bilateral OFC lesions, rats extensively trained in this task showed no significant change in their intertemporal choice behavior. Our results suggest that the rodent OFC may not always play a role in temporal discounting when delays are randomized and/or after extensive training.
orbitofrontal cortex; intertemporal choice; temporal discounting; delayed reward; decision making
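The model comparison above turns on the different shapes of the two discount functions: hyperbolic discounting, V = A/(1 + kD), falls steeply at short delays but retains value at long ones, whereas exponential discounting, V = A·e^(−kD), decays by a constant proportion per unit of delay. A minimal sketch (the parameter k is illustrative, not a value fitted to the data reported here):

```python
import math

def hyperbolic(amount, delay, k):
    # V = A / (1 + k*D): steep early decay, heavy tail at long delays
    return amount / (1.0 + k * delay)

def exponential(amount, delay, k):
    # V = A * exp(-k*D): constant proportional decay per unit of delay
    return amount * math.exp(-k * delay)

# With k = 0.2, both functions start at the full amount, but at a delay
# of 20 the hyperbolic value is 2.0 while the exponential value is ~0.18,
# which is why the two models predict different long-delay choices.
for delay in (0, 5, 20):
    print(delay, hyperbolic(10, delay, 0.2), exponential(10, delay, 0.2))
```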
Adaptive behaviors increase the likelihood of survival and reproduction and improve the quality of life. However, it is often difficult to identify optimal behaviors in real life due to the complexity of the decision maker’s environment and social dynamics. As a result, although many different brain areas and circuits are involved in decision making, evolutionary and learning solutions adopted by individual decision makers sometimes produce suboptimal outcomes. Although these problems are exacerbated in numerous neurological and psychiatric disorders, their underlying neurobiological causes remain incompletely understood. In this review, theoretical frameworks in economics and machine learning and their applications in recent behavioral and neurobiological studies are summarized. Examples of such applications in clinical domains are also discussed for substance abuse, Parkinson’s disease, attention-deficit/hyperactivity disorder, schizophrenia, mood disorders, and autism. Findings from these studies have begun to lay the foundations necessary to improve diagnostics and treatment for various neurological and psychiatric disorders.
reinforcement learning; default network; impulsivity; addiction; Parkinson’s disease; schizophrenia; autism; depression; anxiety
Social decision making is arguably the most complex cognitive function performed by the human brain. This is due to two unique features of social decision making. First, predicting the behaviors of others is extremely difficult. Second, humans often take into consideration the well-being of others during decision making, but this is influenced by many contextual factors. Despite such complexity, studies on the neural basis of social decision making have made substantial progress in the last several years. They demonstrated that the core brain areas involved in reinforcement learning and valuation, such as the ventral striatum and orbitofrontal cortex, make important contributions to social decision making. Furthermore, the contribution of brain systems implicated in theory of mind during decision making is being elucidated. Future studies are expected to provide additional details about the nature of information channeled through these brain areas.
Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain.
prefrontal cortex; neuroeconomics; reward; striatum; uncertainty
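The core computation described above — choosing actions according to their value functions and adjusting those functions from reward — can be sketched as a simple action-value learner. The learning rate, inverse temperature, and reward probabilities below are arbitrary illustrations, not values from any study discussed here:

```python
import math
import random

def simulate(n_trials=2000, alpha=0.1, beta=3.0, p_reward=(0.8, 0.2), seed=0):
    """Two-action value learner: softmax choice over value functions,
    which are then updated by reward prediction errors."""
    rng = random.Random(seed)
    q = [0.0, 0.0]                  # value function for each action
    for _ in range(n_trials):
        # softmax probability of choosing action 0
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        a = 0 if rng.random() < p0 else 1
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        q[a] += alpha * (r - q[a])  # update toward the obtained outcome
    return q

q = simulate()
# after training, the value of the richer action exceeds that of the poorer one
```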
Subjective values of actions are influenced by the uncertainty and immediacy of expected rewards. Multiple brain areas, including the prefrontal cortex and basal ganglia, are implicated in selecting actions according to their subjective values. Alterations in these neural circuits therefore might contribute to symptoms of impulsive choice behaviors in disorders such as substance abuse and attention-deficit hyperactivity disorder (ADHD). In particular, the α-2A noradrenergic system is known to have a key influence on prefrontal cortical circuits, and medications that stimulate this receptor are currently in use for the treatment of ADHD.
We tested whether the preference of rhesus monkeys for delayed and uncertain reward is influenced by the α-2A adrenergic receptor agonist, guanfacine.
In each trial, the animal chose between a small, certain and immediate reward and another larger, more delayed reward. In half of the trials, the larger reward was certain, whereas in the remaining trials, the larger reward was uncertain.
Guanfacine increased the tendency for the animal to choose the larger and more delayed reward only when it was certain. By applying an econometric model to the animal’s choice behavior, we found that guanfacine selectively reduced the animal’s time preference, increasing its choice of delayed, larger rewards, without significantly affecting its risk preference.
In combination with previous findings that guanfacine improves the efficiency of working memory and other prefrontal functions, these results suggest that impulsive choice behaviors may also be ameliorated by strengthening prefrontal functions.
temporal discounting; intertemporal choice; reward; decision making; neuroeconomics; prefrontal cortex; gambling; impulsivity; guanfacine; ADHD
Rational, value-based decision-making mandates selecting the option with the highest subjective expected value after appropriate deliberation. We examined activity in the dorsolateral prefrontal cortex (DLPFC) and striatum of monkeys deciding between smaller, immediate rewards and larger, delayed ones. We previously found neurons that modulated their activity in this task according to the animal's choice, while it deliberated (choice neurons). Here we found neurons whose spiking activities were predictive of the spatial location of the selected target (spatial-bias neurons) or the size of the chosen reward (reward-bias neurons) before the onset of the cue presenting the decision-alternatives, and thus before rational deliberation could begin. Their predictive power increased as the values the animals associated with the two decision alternatives became more similar. The ventral striatum (VS) preferentially contained spatial-bias neurons; the caudate nucleus (CD) preferentially contained choice neurons. In contrast, the DLPFC contained significant numbers of all three neuron types, but choice neurons were not preferentially also bias neurons of either kind there, nor were spatial-bias neurons preferentially also choice neurons, and vice versa. We suggest a simple winner-take-all (WTA) circuit model to account for the dissociation of choice and bias neurons. The model reproduced our results and made additional predictions that were borne out empirically. Our data are compatible with the hypothesis that the DLPFC and striatum harbor dissociated neural populations that represent choices and predeliberation biases that are combined after cue onset; the bias neurons have a weaker effect on the ultimate decision than the choice neurons, so their influence is progressively apparent for trials where the values associated with the decision alternatives are increasingly similar.
pre-deliberation decision bias; value-based decision-making; decision circuit-modeling; free-choice decision making; dorsolateral prefrontal cortex; ventral striatum; caudate nucleus; monkey single-neuron recording
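A winner-take-all circuit of the kind invoked above can be sketched with two rate units coupled by mutual inhibition. This is a generic illustration with assumed parameters, not the authors' specific model:

```python
def winner_take_all(input_a, input_b, w_inh=1.5, steps=200, dt=0.1):
    """Two rate units with mutual inhibition: the unit receiving the
    stronger input (e.g. boosted by a pre-deliberation bias) eventually
    suppresses the other."""
    ra = rb = 0.0
    for _ in range(steps):
        ra += dt * (-ra + max(0.0, input_a - w_inh * rb))
        rb += dt * (-rb + max(0.0, input_b - w_inh * ra))
    return ra, rb

# a small input advantage is enough to decide the competition
ra, rb = winner_take_all(1.0, 0.9)
```

A generic property of such circuits is that the competition takes longer to resolve, and is more easily tipped by a small bias, as the two inputs approach equality — consistent with bias signals mattering most when the two alternatives have similar values.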
Behavioral changes driven by reinforcement and punishment are referred to as simple or model-free reinforcement learning. Animals can also change their behaviors by observing events that are neither appetitive nor aversive, when these events provide new information about payoffs available from alternative actions. This is an example of model-based reinforcement learning, and can be accomplished by incorporating hypothetical reward signals into the value functions for specific actions. Recent neuroimaging and single-neuron recording studies showed that the prefrontal cortex and the striatum are involved not only in reinforcement and punishment, but also in model-based reinforcement learning. We found evidence for both types of learning, and hence hybrid learning, in monkeys during simulated competitive games. In addition, in both the dorsolateral prefrontal cortex and orbitofrontal cortex, individual neurons heterogeneously encoded signals related to actual and hypothetical outcomes from specific actions, suggesting that both areas might contribute to hybrid learning.
belief learning; decision making; game theory; reinforcement learning; reward
Limb movement is smooth and corrections of movement trajectory and amplitude are barely noticeable midflight. This suggests that skeletomuscular motor commands are smooth in transition, such that the rate of change of acceleration (or jerk) is minimized. Here we applied the methodology of minimum-jerk submovement decomposition to a member of the skeletomuscular family, the head movement. We examined the submovement composition of three types of horizontal head movements generated by nonhuman primates: head-alone tracking, head-gaze pursuit, and eye-head combined gaze shifts. The first two types of head movements tracked a moving target, whereas the last type oriented the head with rapid gaze shifts toward a target fixed in space. During head tracking, the head movement was composed of a series of episodes, each consisting of a distinct, bell-shaped velocity profile (submovement); these submovements rarely overlapped with one another. There was no specific magnitude order in the peak velocities of these submovements. In contrast, during eye-head combined gaze shifts, the head movement was often composed of overlapping submovements, in which the peak velocity of the primary submovement was always higher than that of the subsequent submovement, consistent with the two-component strategy observed in goal-directed limb movements. These results extend the previous submovement composition studies from limb to head movements, suggesting that submovement composition provides a biologically plausible approach to characterizing the head motor recruitment that can vary depending on task demand.
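The building block of this decomposition, the minimum-jerk submovement, has a stereotyped bell-shaped velocity profile given by the standard quintic polynomial. A compact sketch, with time and amplitude units left abstract:

```python
def min_jerk_velocity(t, duration, amplitude):
    """Velocity of a minimum-jerk movement of the given amplitude and
    duration at time t: zero at both endpoints, peaking at the midpoint."""
    tau = t / duration                 # normalized time in [0, 1]
    return (amplitude / duration) * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)

# overlapping submovements, as in combined gaze shifts, sum their velocities
def summed_velocity(t, submovements):
    total = 0.0
    for onset, duration, amplitude in submovements:
        if onset <= t <= onset + duration:
            total += min_jerk_velocity(t - onset, duration, amplitude)
    return total
```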
Impulsivity refers to a set of heterogeneous behaviors that are tuned suboptimally along certain temporal dimensions. Impulsive inter-temporal choice refers to the tendency to forego a large but delayed reward and to seek an inferior but more immediate reward, whereas impulsive motor responses also result when the subjects fail to suppress inappropriate automatic behaviors. In addition, impulsive actions can be produced when too much emphasis is placed on speed rather than accuracy in a wide range of behaviors, including perceptual decision making. Despite this heterogeneous nature, the prefrontal cortex and its connected areas, such as the basal ganglia, play an important role in gating impulsive actions in a variety of behavioral tasks. Here, we describe key features of computations necessary for optimal decision making, and how their failures can lead to impulsive behaviors. We also review the recent findings from neuroimaging and single-neuron recording studies on the neural mechanisms related to impulsive behaviors. Converging approaches in economics, psychology, and neuroscience provide a unique vista for better understanding the nature of behavioral impairments associated with impulsivity.
intertemporal choice; temporal discounting; basal ganglia; speed-accuracy tradeoff; response inhibition; switching
Knowledge about hypothetical outcomes from unchosen actions is beneficial only when such outcomes can be correctly attributed to specific actions. Here, we show that during a simulated rock-paper-scissors game, rhesus monkeys can adjust their choice behaviors according to both actual and hypothetical outcomes from their chosen and unchosen actions, respectively. In addition, neurons in both dorsolateral prefrontal cortex and orbitofrontal cortex encoded the signals related to actual and hypothetical outcomes immediately after they were revealed to the animal. Moreover, compared to the neurons in the orbitofrontal cortex, those in the dorsolateral prefrontal cortex were more likely to change their activity according to the hypothetical outcomes from specific actions. Conjunctive and parallel coding of multiple actions and their outcomes in the prefrontal cortex might enhance the efficiency of reinforcement learning and also contribute to their context-dependent memory.
Despite widespread neural activity related to reward values, signals related to upcoming choice have not been clearly identified in the rodent brain. Here, we examined neuronal activity in the lateral (AGl) and medial (AGm) agranular cortex, corresponding to the primary and secondary motor cortex, respectively, in rats performing a dynamic foraging task. Choice signals arose in the AGm before behavioral manifestation of the animal’s choice earlier than in any other areas of the rat brain previously studied under free-choice conditions. The AGm also conveyed significant neural signals for decision value and chosen value. In contrast, upcoming choice signals arose later and value signals were weaker in the AGl. We also found that AGm lesions made the animal’s choices less dependent on dynamically updated values. These results suggest that rodent secondary motor cortex might be uniquely involved in both representing and reading out value signals for flexible action selection.
Many of the cognitive deficits of normal aging (forgetfulness, distractibility, inflexibility, and impaired executive functions) involve prefrontal cortical (PFC) dysfunction [1–4]. The PFC guides behavior and thought using working memory [5], essential functions in the Information Age. Many PFC neurons hold information in working memory through excitatory networks that can maintain persistent neuronal firing in the absence of external stimulation [6]. This fragile process is highly dependent on the neurochemical environment [7]. For example, elevated cAMP signaling reduces persistent firing by opening HCN and KCNQ potassium channels [8, 9]. It is not known whether molecular changes associated with normal aging alter the physiological properties of PFC neurons during working memory, as there have been no in vivo recordings from PFC neurons of aged monkeys. Here we characterize the first recordings of this kind, revealing a marked loss of PFC persistent firing with advancing age that can be rescued by restoring an optimal neurochemical environment. Recordings showed an age-related decline in the firing rate of DELAY neurons, while the firing of CUE neurons remained unchanged with age. The memory-related firing of aged DELAY neurons was partially restored to more youthful levels by inhibiting cAMP signaling, or by blocking HCN or KCNQ channels. These findings reveal the cellular basis of age-related cognitive decline in dorsolateral PFC, and demonstrate that physiological integrity can be rescued by addressing the molecular needs of PFC circuits.
prefrontal cortex; working memory; aging; cAMP signaling; HCN channels; KCNQ channels; α2A adrenoceptors
In choosing between different rewards expected after unequal delays, humans and animals often prefer the smaller but more immediate reward, indicating that the subjective value or utility of reward is depreciated according to its delay. Here, we show that the neurons in the primate caudate nucleus and ventral striatum modulate their activity according to temporally discounted values of rewards with a similar time course. However, neurons in the caudate nucleus encoded the difference in the temporally discounted values of the two alternative targets more reliably than the neurons in the ventral striatum. In contrast, the neurons in the ventral striatum largely encoded the sum of the temporally discounted values, and therefore, the overall goodness of available options. These results suggest a more pivotal role for the dorsal striatum in action selection during intertemporal choice.
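The dissociation described above can be made concrete: given temporally discounted values for the two targets, the difference signal identifies the better action, while the sum reflects the overall goodness of the available options. A toy illustration with an assumed hyperbolic discount function and arbitrary numbers:

```python
def discounted_value(amount, delay, k=0.1):
    # hyperbolic temporal discounting; k is an illustrative parameter
    return amount / (1.0 + k * delay)

v_left  = discounted_value(4.0, 10.0)  # larger reward after a delay
v_right = discounted_value(1.5, 0.0)   # smaller immediate reward

difference = v_left - v_right  # caudate-like signal: which target is better
total      = v_left + v_right  # ventral-striatum-like signal: overall goodness
```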
The value of an object acquired by a particular action often determines the motivation to produce that action. Previous studies found neural signals related to the values of different objects or goods in the orbitofrontal cortex, while the values of outcomes expected from different actions are broadly represented in multiple brain areas implicated in movement planning. However, how the brain combines the values associated with various objects and the information about their locations is not known. In this study, we tested whether the neurons in the dorsolateral prefrontal cortex (DLPFC) and striatum in rhesus monkeys might contribute to translating the value signals between multiple frames of reference. Monkeys were trained to perform an oculomotor intertemporal choice task in which the color of a saccade target and the number of its surrounding dots signaled the magnitude of reward and its delay, respectively. In both DLPFC and striatum, temporally discounted values (DVs) associated with specific target colors and locations were encoded by partially overlapping populations of neurons. In the DLPFC, the information about reward delays and DVs of rewards available from specific target locations emerged earlier than the corresponding signals for target colors. Similar results were reproduced by a simple network model built to compute DVs of rewards in different locations. Therefore, DLPFC might play an important role in estimating the values of different actions by combining the previously learned values of objects and their present locations.
intertemporal choice; prefrontal cortex; reward; temporal discounting; utility
According to reinforcement learning theory of decision making, reward expectation is computed by integrating past rewards with a fixed timescale. By contrast, we found that a wide range of time constants is available across cortical neurons recorded from monkeys performing a competitive game task. By recognizing that reward modulates neural activity multiplicatively, we found that one or two time constants of reward memory can be extracted for each neuron in prefrontal, cingulate, and parietal cortex. These timescales ranged from hundreds of milliseconds to tens of seconds, according to a power-law distribution, which is consistent across areas and reproduced by a “reservoir” neural network model. These neuronal memory timescales were weakly but significantly correlated with those of the monkeys’ decisions. Our findings suggest a flexible memory system, where neural subpopulations with distinct sets of long or short memory timescales may be selectively deployed according to the task demands.
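The reward-memory traces described above behave like leaky integrators of the reward history, each with its own time constant. A minimal sketch (time constants in units of trials, chosen purely for illustration):

```python
import math

def reward_memory(rewards, tau):
    """Leaky integration of a reward sequence with time constant tau:
    a short tau tracks mainly the most recent reward, while a long tau
    accumulates rewards over many past trials."""
    decay = math.exp(-1.0 / tau)
    trace = 0.0
    for r in rewards:
        trace = decay * trace + r
    return trace

history = [1, 0, 0, 0]       # one reward followed by three unrewarded trials
short = reward_memory(history, tau=0.5)   # fast-forgetting neuron
long_ = reward_memory(history, tau=20.0)  # slow-forgetting neuron
```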
We investigated how different sub-regions of rodent prefrontal cortex contribute to value-based decision making, by comparing neural signals related to the animal’s choice, its outcome, and action value in orbitofrontal cortex (OFC) and medial prefrontal cortex (mPFC) of rats performing a dynamic two-armed bandit task. Neural signals for upcoming action selection arose in the mPFC, including the anterior cingulate cortex, only immediately before the behavioral manifestation of the animal’s choice, suggesting that rodent prefrontal cortex is not involved in advanced action planning. Both OFC and mPFC conveyed signals related to the animal’s past choices and their outcomes over multiple trials, but neural signals for chosen value and reward prediction error were more prevalent in the OFC. Our results suggest that rodent OFC and mPFC serve distinct roles in value-based decision making, and that the OFC plays a prominent role in updating the values of outcomes expected from chosen actions.
Since its first discovery in the prefrontal cortex, persistent activity during the interval between a transient sensory stimulus and a subsequent behavioral response has been identified in many cortical and subcortical areas. Such persistent activity is thought to reflect the maintenance of working memory representations that bridge past events with future contingent plans. Indeed, the term persistent activity is sometimes used interchangeably with working memory. In this review, we argue that persistent activity observed broadly across many cortical and subcortical areas reflects not only working memory maintenance, but also a variety of other cognitive processes, including perceptual and reward-based decision making.
Humans and animals often must choose between rewards that differ in their qualities, magnitudes, immediacy, and likelihood, and must estimate these multiple reward parameters from their experience. However, the neural basis for such complex decision making is not well understood. To understand the role of the primate prefrontal cortex in determining the subjective value of delayed or uncertain reward, we examined the activity of individual prefrontal neurons during an inter-temporal choice task and a computer-simulated competitive game. Consistent with the findings from previous studies in humans and other animals, the monkey’s behaviors during inter-temporal choice were well accounted for by a hyperbolic discount function. In addition, the activity of many neurons in the lateral prefrontal cortex reflected the signals related to the magnitude and delay of the reward expected from a particular action, and often encoded the difference in temporally discounted values that predicted the animal’s choice. During a computerized matching pennies game, the animals approximated the optimal strategy, known as Nash equilibrium, using a reinforcement learning algorithm. We also found that many neurons in the lateral prefrontal cortex conveyed the signals related to the animal’s previous choices and their outcomes, suggesting that this cortical area might play an important role in forming associations between actions and their outcomes. These results show that the primate lateral prefrontal cortex plays a central role in estimating the values of alternative actions based on multiple sources of information.
game theory; inter-temporal choice; reinforcement learning; utility theory; temporal discounting
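A reinforcement learner facing an exploitive opponent in matching pennies is pushed toward the unpredictable 50/50 mixture of the Nash equilibrium, because any bias in its choices is immediately punished. The sketch below pairs a simple action-value learner with an opponent that counters the learner's recent bias; all parameters are illustrative assumptions, not a reconstruction of the experiments described here:

```python
import math
import random

def play_matching_pennies(n_trials=5000, alpha=0.05, beta=2.0, seed=1):
    """The learner (the 'matcher') is rewarded when its choice matches the
    opponent's; the opponent plays the side the learner has picked less often."""
    rng = random.Random(seed)
    q = [0.0, 0.0]   # learner's action values
    bias = 0.5       # opponent's running estimate of P(learner picks 1)
    ones = 0
    for _ in range(n_trials):
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))  # softmax choice
        a = 1 if rng.random() < p1 else 0
        opp = 0 if bias >= 0.5 else 1     # opponent counters the learner's bias
        r = 1.0 if a == opp else 0.0      # matcher wins on a match
        q[a] += alpha * (r - q[a])        # reinforcement-learning update
        bias += 0.05 * (a - bias)
        ones += a
    return ones / n_trials                # fraction of trials choosing action 1
```

Because the opponent exploits any departure from randomness, the learner's overall choice frequency hovers near 0.5, the Nash equilibrium of the game.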
Game theory analyses optimal strategies for multiple decision makers interacting in a social group. However, the behaviours of individual humans and animals often deviate systematically from the optimal strategies described by game theory. The behaviours of rhesus monkeys (Macaca mulatta) in simple zero-sum games showed similar patterns, but their departures from the optimal strategies were well accounted for by a simple reinforcement-learning algorithm. During a computer-simulated zero-sum game, neurons in the dorsolateral prefrontal cortex often encoded the previous choices of the animal and its opponent as well as the animal's reward history. By contrast, the neurons in the anterior cingulate cortex predominantly encoded the animal's reward history. Using simple competitive games, therefore, we have demonstrated functional specialization between different areas of the primate frontal cortex involved in outcome monitoring and action selection. Temporally extended signals related to the animal's previous choices might facilitate the association between choices and their delayed outcomes, whereas information about the choices of the opponent might be used to estimate the reward expected from a particular action. Finally, signals related to the reward history might be used to monitor the overall success of the animal's current decision-making strategy.
prefrontal cortex; decision making; reward
Activity of neurons in the lateral intra-parietal cortex (LIP) displays a mixture of sensory, motor, and memory signals. Moreover, these neurons often encode signals reflecting the accumulation of sensory evidence that certain eye movements might lead to a desirable outcome. However, when the environment changes dynamically, animals are also required to combine the information about their previously chosen actions and the resulting outcomes appropriately to continually update the desirabilities of alternative actions. Here, we investigated whether LIP neurons encoded signals necessary to update an animal’s decision-making strategies adaptively during a computer-simulated matching pennies game. Using a reinforcement learning algorithm, we estimated the value functions that best predicted the animal’s choices on a trial-by-trial basis. We found that immediately before the animal revealed its choice, approximately 18% of LIP neurons changed their activity according to the difference in the value functions for the two targets. In addition, a somewhat higher fraction of LIP neurons displayed signals related to the sum of the value functions, which might correspond to the state value function or an average rate of reward used as a reference point. Similar to the neurons in the prefrontal cortex, many LIP neurons also encoded signals related to the animal’s previous choices. Thus, the posterior parietal cortex might be a part of the network that provides the substrate for forming appropriate associations between actions and outcomes.
decision; parietal; reward; feedback; control; cognition
Human behaviors can be more powerfully influenced by conditioned reinforcers, such as money, than by primary reinforcers. Moreover, people often change their behaviors to avoid monetary losses. However, the effect of removing conditioned reinforcers on choices has not been explored in animals, and the neural mechanisms mediating the behavioral effects of gains and losses are not well understood. To investigate the behavioral and neural effects of gaining and losing a conditioned reinforcer, we trained rhesus monkeys for a matching pennies task in which the positive and negative values of its payoff matrix were realized by the delivery and removal of a conditioned reinforcer. Consistent with the findings previously obtained with non-negative payoffs and primary rewards, the animal’s choice behavior during this task was nearly optimal. Nevertheless, the gain and loss of a conditioned reinforcer significantly increased and decreased, respectively, the tendency for the animal to choose the same target in subsequent trials. We also found that the neurons in the dorsomedial frontal cortex, dorsal anterior cingulate cortex, and dorsolateral prefrontal cortex often changed their activity according to whether the animal earned or lost a conditioned reinforcer in the current or previous trial. Moreover, many neurons in the dorsomedial frontal cortex also signaled the gain or loss occurring as a result of choosing a particular action as well as changes in the animal’s behaviors resulting from such gains or losses. Thus, primate medial frontal cortex might mediate the behavioral effects of conditioned reinforcers and their losses.
cingulate cortex; decision making; prefrontal cortex; reinforcement learning; reward; punishment; neuroeconomics
Decision-makers often face choices whose consequences unfold over time. To explore the neural basis of such inter-temporal choice behavior, we devised a novel two-alternative choice task with probabilistic reward delivery and contrasted two conditions that differed only in whether the outcome was revealed immediately or after some delay. In the immediate condition, we simply varied the reward probability of each option and the outcome was revealed immediately. In the delay condition, the outcome was revealed after a delay during which the reward probability was governed by a constant hazard rate. Functional imaging revealed a set of brain regions, such as the posterior cingulate cortex, parahippocampal gyri, and frontal pole, that exhibited activity uniquely associated with the temporal aspects of the task. This engagement of the so-called “default network” suggests that during inter-temporal choice, decision-makers simulate the impending delay via a process of prospection.
decision; fMRI; inter-temporal choice; prospection; discounting; temporal resolution of uncertainty