Human choice behaviors during social interactions often deviate from the predictions of game theory. This might arise partly from limitations in the cognitive abilities necessary for recursive reasoning about the behaviors of others. In addition, during iterative social interactions, choices might change dynamically, as knowledge about the intentions of others and estimates of choice outcomes are incrementally updated via reinforcement learning. Some of the brain circuits utilized during social decision making might be general-purpose and contribute to isomorphic individual and social decision making. By contrast, regions in the medial prefrontal cortex and temporoparietal junction might be recruited for cognitive processes unique to social decision making.
game theory; reinforcement learning; arbitration; prefrontal cortex
Dopamine D2/3 receptor signaling is critical for flexible adaptive behavior; however, it is unclear whether D2, D3, or both receptor subtypes modulate precise signals of feedback and reward history that underlie optimal decision making. Here, PET with the radioligand [11C]-(+)-PHNO was used to quantify individual differences in putative D3 receptor availability in rodents trained on a novel three-choice spatial acquisition and reversal-learning task with probabilistic reinforcement. Binding of [11C]-(+)-PHNO in the midbrain was negatively related to the ability of rats to adapt to changes in rewarded locations, but not to the initial learning. Computational modeling of choice behavior in the reversal phase indicated that [11C]-(+)-PHNO binding in the midbrain was related to the learning rate and sensitivity to positive, but not negative, feedback. Administration of a D3-preferring agonist likewise impaired reversal performance by reducing the learning rate and sensitivity to positive feedback. These results demonstrate a previously unrecognized role for D3 receptors in select aspects of reinforcement learning and suggest that individual variation in midbrain D3 receptors influences flexible behavior. Our combined neuroimaging, behavioral, pharmacological, and computational approach implicates the dopamine D3 receptor in decision-making processes that are altered in psychiatric disorders.
SIGNIFICANCE STATEMENT Flexible decision-making behavior is dependent upon dopamine D2/3 signaling in corticostriatal brain regions. However, the role of D3 receptors in adaptive, goal-directed behavior has not been thoroughly investigated. By combining PET imaging with the D3-preferring radioligand [11C]-(+)-PHNO, pharmacology, a novel three-choice probabilistic discrimination and reversal task, and computational modeling of behavior in rats, we report that naturally occurring variation in [11C]-(+)-PHNO receptor availability relates to specific aspects of flexible decision making. We confirm these relationships using a D3-preferring agonist, thus identifying a unique role of midbrain D3 receptors in decision-making processes.
addiction; computational analyses; decision-making; dopamine D3 receptors; PET; reinforcement learning
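The reversal-phase modeling described in this abstract can be illustrated with a simple reinforcement learning update in which a learning rate sets how far an option's value moves after each outcome, while separate sensitivity parameters scale positive and negative feedback. This is a minimal sketch under assumed names (`alpha`, `rho_pos`, `rho_neg`, softmax `beta`); it is not necessarily the parameterization fitted in the study.

```python
import math


def update_values(values, choice, rewarded, alpha, rho_pos, rho_neg):
    """Update the value of the chosen option after one trial (hypothetical model).

    Rewarded trials pull the value toward +rho_pos, unrewarded trials toward
    -rho_neg; alpha is the learning rate (step size).
    """
    target = rho_pos if rewarded else -rho_neg
    new_values = list(values)
    new_values[choice] += alpha * (target - new_values[choice])
    return new_values


def softmax(values, beta=1.0):
    """Convert option values into choice probabilities (numerically stable)."""
    top = max(values)
    weights = [math.exp(beta * (v - top)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Three spatial options; a single rewarded choice of option 0.
typical = update_values([0.0, 0.0, 0.0], 0, True, alpha=0.5, rho_pos=1.0, rho_neg=1.0)
# Lower sensitivity to positive feedback: the same reward moves the value less.
blunted = update_values([0.0, 0.0, 0.0], 0, True, alpha=0.5, rho_pos=0.4, rho_neg=1.0)
print(typical, blunted, softmax(typical, beta=3.0))
```

In this form, lowering either `alpha` or `rho_pos` slows how quickly a newly rewarded location gains value after a reversal, while `rho_neg` is untouched.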
Psychiatric disorders such as schizophrenia are worsened by stress, and working memory deficits are often a central feature of illness. Working memory is mediated by the persistent firing of prefrontal cortical (PFC) pyramidal neurons. Stress impairs working memory via high levels of dopamine D1 receptor (D1R) activation of cAMP signaling, which reduces PFC neuronal firing. The current study examined whether D1R-cAMP signaling reduces neuronal firing and impairs working memory by increasing the open state of hyperpolarization-activated cyclic nucleotide-gated (HCN) cation channels, which are concentrated on dendritic spines where PFC pyramidal neurons interconnect.
A variety of methods were employed to test this hypothesis. Dual immunoelectron microscopy localized D1R and HCN channels; in vitro recordings tested for D1R actions on HCN channel current (Ih); and recordings in monkeys performing a working memory task tested for D1R-HCN channel interactions in vivo. Finally, cognitive assessments following intra-PFC infusions of drugs examined D1R-HCN channel interactions on working memory performance.
Immunoelectron microscopy confirmed D1R colocalization with HCN channels near excitatory-like synapses on dendritic spines in primate PFC. Mouse PFC slice recordings demonstrated that D1R stimulation increased Ih, while local HCN channel blockade in primate PFC protected task-related firing from D1R-mediated suppression. D1R stimulation in rat or monkey PFC impaired working memory performance, while HCN channel blockade in PFC prevented this impairment in rats exposed to either stress or D1R stimulation.
These findings suggest that D1R stimulation or stress weakens PFC function via opening of HCN channels at network synapses.
Prefrontal cortex; working memory; stress; D1 dopamine receptor; cAMP; HCN channel
Specialization and hierarchy are organizing principles for primate cortex, yet there is little direct evidence for how cortical areas are specialized in the temporal domain. We measured timescales of intrinsic fluctuations in spiking activity across areas, and found a hierarchical ordering, with sensory and prefrontal areas exhibiting shorter and longer timescales, respectively. Based on our findings, we suggest that intrinsic timescales reflect areal specialization for task-relevant computations over multiple temporal ranges.
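A common way to estimate an intrinsic timescale is to fit an exponentially decaying function with an offset to the autocorrelation of spike counts at increasing time lags, R(lag) = A[exp(-lag/τ) + B]. The sketch below assumes that form and fits it to noise-free synthetic values by grid search over τ, solving for the remaining terms by linear least squares at each candidate τ; the function name, the grid, and the synthetic numbers are illustrative choices, not the study's procedure.

```python
import numpy as np


def fit_intrinsic_timescale(lags_ms, autocorr, taus_ms=np.arange(1.0, 1001.0)):
    """Fit R(lag) = A * (exp(-lag / tau) + B) by grid search over tau.

    For a fixed tau the model is linear in A and C = A * B, so each candidate
    is solved with ordinary least squares; the tau with the smallest squared
    error wins. Returns (tau, A, B).
    """
    best = None
    for tau in taus_ms:
        design = np.column_stack([np.exp(-lags_ms / tau), np.ones_like(lags_ms)])
        coef, *_ = np.linalg.lstsq(design, autocorr, rcond=None)
        sse = float(np.sum((design @ coef - autocorr) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(tau), float(coef[0]), float(coef[1]))
    _, tau, a, c = best
    return tau, a, c / a


# Synthetic autocorrelation values with a 150-ms timescale (no noise).
lags = np.arange(50.0, 551.0, 50.0)
acf = 0.4 * (np.exp(-lags / 150.0) + 0.05)
print(fit_intrinsic_timescale(lags, acf))
```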
Although human and animal behaviors are largely shaped by reinforcement and punishment, choices in social settings are also influenced by information about the knowledge and experience of other decision-makers. During competitive games, monkeys increased their payoffs by systematically deviating from a simple heuristic learning algorithm and thereby countering the predictable exploitation by their computer opponent. Neurons in the dorsomedial prefrontal cortex (dmPFC) signaled the animal’s recent choice and reward history that reflected the computer’s exploitative strategy. The strength of switching signals in the dmPFC also correlated with the animal’s tendency to deviate from the heuristic learning algorithm. Therefore, the dmPFC might provide control signals for overriding simple heuristic learning algorithms based on the inferred strategies of the opponent.
Choices of humans and non-human primates are influenced by both actually experienced and fictive outcomes. To test whether this is also the case in rodents, we examined rats' choice behavior in a binary choice task in which variable magnitudes of actual and fictive rewards were delivered. We found that the animal's choice was significantly influenced by the magnitudes of both actual and fictive rewards in the previous trial. A model-based analysis revealed, however, that the effect of fictive reward was more transient and influenced mostly the choice in the next trial, whereas the effect of actual reward was more sustained, consistent with incremental learning of action values. Our results suggest that the capacity to modify future choices based on fictive outcomes might be shared by many different animal species, but fictive outcomes are less effective than actual outcomes in the incremental value learning system.
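The dissociation between a transient fictive-reward effect and a sustained actual-reward effect can be expressed as a choice rule in which actual rewards update incrementally learned values, while the fictive reward revealed on the previous trial enters only as a one-trial term. The weights `beta` and `kappa`, the learning rate, and the function names are hypothetical; this is a sketch of the idea, not the model fitted in the study.

```python
import math


def update_actual(values, chosen, reward, alpha=0.3):
    """Incremental value update driven only by the actually received reward."""
    new_values = dict(values)
    new_values[chosen] += alpha * (reward - new_values[chosen])
    return new_values


def p_choose_left(values, prev_fictive, beta=1.0, kappa=0.5):
    """Logistic choice rule: learned values plus a one-trial fictive-reward term.

    prev_fictive maps each side to the fictive reward shown for it on the
    previous trial (0.0 for the side actually chosen). Because it is never
    stored in `values`, its influence vanishes after one trial.
    """
    z = beta * (values["left"] - values["right"])
    z += kappa * (prev_fictive["left"] - prev_fictive["right"])
    return 1.0 / (1.0 + math.exp(-z))


values = update_actual({"left": 0.0, "right": 0.0}, "left", reward=1.0)
print(values, p_choose_left(values, {"left": 0.0, "right": 2.0}))
```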
In stable environments, decision makers can exploit their previously learned strategies for optimal outcomes, while exploration might lead to better options in unstable environments. Here, to investigate the cortical contributions to exploratory behavior, we analyzed single-neuron activity recorded from four different cortical areas of monkeys performing a matching pennies task and a visual search task, which encouraged and discouraged exploration, respectively. We found that neurons in multiple regions in the frontal and parietal cortex tended to encode signals related to previously rewarded actions more reliably than unrewarded actions. In addition, signals for rewarded choices in the supplementary eye field were attenuated during the visual search task, and were correlated with the tendency to switch choices during the matching pennies task. These results suggest that the supplementary eye field might play a unique role in encouraging animals to explore alternative decision-making strategies.
Leathers and Olson (Reports, 5 October 2012, p. 132) draw the strong conclusion that neurons in the monkey lateral intraparietal (LIP) cortical area encode only cue salience, and not action value, during value-based decision-making. Although their findings regarding cue salience are interesting, their broader conclusions are problematic because (i) their primary conclusion is based on responses observed during a brief interval at the beginning of behavioral trials but is extended to all subsequent temporal epochs and (ii) the authors failed to replicate basic hallmarks of LIP physiology observed in those subsequent temporal epochs by many laboratories.
Although choices of both humans and animals are more strongly influenced by immediate than delayed rewards, methodological limitations have made it difficult to estimate the precise form of temporal discounting in animals. In the present study, we sought to characterize temporal discounting in rats and to test the role of the orbitofrontal cortex (OFC) in this process. Rats were trained in a novel intertemporal choice task in which the sequence of delay durations was randomized across trials. The animals tended to choose a small immediate reward more frequently as the delay for a large reward increased, and, consistent with previous findings in other species, their choice behavior was better accounted for by hyperbolic than exponential discount functions. In addition, model comparisons showed that the animal’s choice behavior was better accounted for by more complex discount functions with an additional parameter than by a hyperbolic discount function. Following bilateral OFC lesions, rats extensively trained in this task showed no significant change in their intertemporal choice behavior. Our results suggest that the rodent OFC may not always play a role in temporal discounting when delays are randomized and/or after extensive training.
orbitofrontal cortex; intertemporal choice; temporal discounting; delayed reward; decision making
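The model comparison in this abstract can be made concrete by writing candidate discount functions side by side and plugging them into a logistic rule for choosing between a small immediate and a large delayed reward. The two-parameter form shown is the hyperboloid 1/(1 + kD)^s, used here only as one example of a discount function with an additional parameter; the abstract does not say which forms were compared, and the choice-noise parameter `beta` is an assumption.

```python
import math


def exponential(delay, k):
    return math.exp(-k * delay)


def hyperbolic(delay, k):
    return 1.0 / (1.0 + k * delay)


def hyperboloid(delay, k, s):
    """Two-parameter generalization; s = 1 recovers the hyperbolic function."""
    return 1.0 / (1.0 + k * delay) ** s


def p_choose_large(small, large, delay, discount, beta=1.0, **params):
    """Probability of choosing the large delayed reward over the small immediate one."""
    value_large = large * discount(delay, **params)
    return 1.0 / (1.0 + math.exp(-beta * (value_large - small)))


for d in (0, 2, 8, 32):
    print(d, round(exponential(d, 0.1), 3), round(hyperbolic(d, 0.1), 3))
```

Because 1 + x ≤ e^x, a hyperbolic function with the same k never falls below the exponential one, so it predicts relatively more patience at long delays.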
Adaptive behaviors increase the likelihood of survival and reproduction and improve the quality of life. However, it is often difficult to identify optimal behaviors in real life due to the complexity of the decision maker’s environment and social dynamics. As a result, although many different brain areas and circuits are involved in decision making, evolutionary and learning solutions adopted by individual decision makers sometimes produce suboptimal outcomes. Although these problems are exacerbated in numerous neurological and psychiatric disorders, their underlying neurobiological causes remain incompletely understood. In this review, theoretical frameworks in economics and machine learning and their applications in recent behavioral and neurobiological studies are summarized. Examples of such applications in clinical domains are also discussed for substance abuse, Parkinson’s disease, attention-deficit/hyperactivity disorder, schizophrenia, mood disorders, and autism. Findings from these studies have begun to lay the foundations necessary to improve diagnostics and treatment for various neurological and psychiatric disorders.
reinforcement learning; default network; impulsivity; addiction; Parkinson’s disease; schizophrenia; autism; depression; anxiety
Social decision making is arguably the most complex cognitive function performed by the human brain. This is due to two unique features of social decision making. First, predicting the behaviors of others is extremely difficult. Second, humans often take into consideration the well-being of others during decision making, but this is influenced by many contextual factors. Despite such complexity, studies on the neural basis of social decision making have made substantial progress in the last several years. They demonstrated that the core brain areas involved in reinforcement learning and valuation, such as the ventral striatum and orbitofrontal cortex, make important contributions to social decision making. Furthermore, the contribution of brain systems implicated in theory of mind to decision making is being elucidated. Future studies are expected to provide additional details about the nature of information channeled through these brain areas.
Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal’s knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain.
prefrontal cortex; neuroeconomics; reward; striatum; uncertainty
Subjective values of actions are influenced by the uncertainty and immediacy of expected rewards. Multiple brain areas, including the prefrontal cortex and basal ganglia, are implicated in selecting actions according to their subjective values. Alterations in these neural circuits therefore might contribute to symptoms of impulsive choice behaviors in disorders such as substance abuse and attention-deficit hyperactivity disorder (ADHD). In particular, the α-2A noradrenergic system is known to have a key influence on prefrontal cortical circuits, and medications that stimulate this receptor are currently in use for the treatment of ADHD.
We tested whether the preference of rhesus monkeys for delayed and uncertain reward is influenced by the α-2A adrenergic receptor agonist, guanfacine.
In each trial, the animal chose between a small, certain and immediate reward and a larger, more delayed reward. In half of the trials, the larger reward was certain, whereas in the remaining trials, the larger reward was uncertain.
Guanfacine increased the tendency for the animal to choose the larger and more delayed reward only when it was certain. By applying an econometric model to the animal’s choice behavior, we found that guanfacine selectively reduced the animal’s time preference, increasing their choice of delayed, larger rewards, without significantly affecting their risk preference.
In combination with previous findings that guanfacine improves the efficiency of working memory and other prefrontal functions, these results suggest that impulsive choice behaviors may also be ameliorated by strengthening prefrontal functions.
temporal discounting; intertemporal choice; reward; decision making; neuroeconomics; prefrontal cortex; gambling; impulsivity; guanfacine; ADHD
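The econometric analysis in this abstract separates time preference from risk preference. A minimal sketch, assuming a hypothetical form in which expected power utility (exponent `alpha`, the risk parameter) is discounted hyperbolically by delay (rate `k`, the time parameter) and fed into a logistic choice rule with assumed noise `beta`, shows how changing `k` alone shifts choices toward the delayed reward. The abstract does not specify the study's model, so this form is illustrative only.

```python
import math


def subjective_value(magnitude, delay, prob, k, alpha):
    """Hypothetical form: expected power utility, discounted hyperbolically.

    k indexes time preference (larger k: steeper discounting);
    alpha indexes risk preference (alpha < 1: risk averse).
    """
    return prob * magnitude ** alpha / (1.0 + k * delay)


def p_choose_large(small, large, k, alpha, beta=5.0):
    """small, large: (magnitude, delay, probability) tuples."""
    diff = subjective_value(*large, k, alpha) - subjective_value(*small, k, alpha)
    return 1.0 / (1.0 + math.exp(-beta * diff))


small = (1.0, 0.0, 1.0)          # small, immediate, certain
large_certain = (2.0, 4.0, 1.0)  # larger, delayed, certain
# Lowering k while holding alpha fixed raises the preference for the delayed reward.
print(p_choose_large(small, large_certain, k=0.5, alpha=0.9),
      p_choose_large(small, large_certain, k=0.1, alpha=0.9))
```

This simplified form would also raise the value of an uncertain delayed reward when `k` falls, so it is not by itself a model of the certain-only behavioral effect reported above.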
Rational, value-based decision-making mandates selecting the option with highest subjective expected value after appropriate deliberation. We examined activity in the dorsolateral prefrontal cortex (DLPFC) and striatum of monkeys deciding between smaller, immediate rewards and larger, delayed ones. We previously found neurons that modulated their activity in this task according to the animal's choice, while it deliberated (choice neurons). Here we found neurons whose spiking activities were predictive of the spatial location of the selected target (spatial-bias neurons) or the size of the chosen reward (reward-bias neurons) before the onset of the cue presenting the decision-alternatives, and thus before rational deliberation could begin. Their predictive power increased as the values the animals associated with the two decision alternatives became more similar. The ventral striatum (VS) preferentially contained spatial-bias neurons; the caudate nucleus (CD) preferentially contained choice neurons. In contrast, the DLPFC contained significant numbers of all three neuron types, but choice neurons were not preferentially also bias neurons of either kind there, nor were spatial-bias neurons preferentially also choice neurons, and vice versa. We suggest a simple winner-take-all (WTA) circuit model to account for the dissociation of choice and bias neurons. The model reproduced our results and made additional predictions that were borne out empirically. Our data are compatible with the hypothesis that the DLPFC and striatum harbor dissociated neural populations that represent choices and predeliberation biases that are combined after cue onset; the bias neurons have a weaker effect on the ultimate decision than the choice neurons, so their influence is progressively apparent for trials where the values associated with the decision alternatives are increasingly similar.
pre-deliberation decision bias; value-based decision-making; decision circuit-modeling; free-choice decision making; dorsolateral prefrontal cortex; ventral striatum; caudate nucleus; monkey single-neuron recording
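A minimal noisy winner-take-all sketch illustrates why a weak bias input should matter mainly when the two values are similar. Two rate units receive value inputs and inhibit each other strongly enough to force a single winner; unit 1 also receives a small constant bias input. All parameter values and names are assumptions for illustration, and this is not the circuit model reported in the study.

```python
import numpy as np


def p_choose_first(v1, v2, bias, n_trials=2000, n_steps=300, w_inh=1.5,
                   tau=10.0, dt=1.0, noise=0.5, seed=0):
    """Fraction of simulated trials in which unit 1 wins a noisy two-unit race.

    Unit i receives value input v_i (unit 1 also receives `bias`); the units
    inhibit each other with weight w_inh > 1, which yields winner-take-all
    dynamics. A fixed seed gives the same noise across conditions.
    """
    rng = np.random.default_rng(seed)
    r1 = np.zeros(n_trials)
    r2 = np.zeros(n_trials)
    for _ in range(n_steps):
        n1 = noise * rng.standard_normal(n_trials)
        n2 = noise * rng.standard_normal(n_trials)
        drive1 = np.maximum(v1 + bias - w_inh * r2 + n1, 0.0)
        drive2 = np.maximum(v2 - w_inh * r1 + n2, 0.0)
        r1 = r1 + dt / tau * (-r1 + drive1)
        r2 = r2 + dt / tau * (-r2 + drive2)
    return float(np.mean(r1 > r2))


def bias_effect(v1, v2, bias=0.2, **kwargs):
    """Change in the probability of choosing option 1 caused by the bias input."""
    return p_choose_first(v1, v2, bias, **kwargs) - p_choose_first(v1, v2, 0.0, **kwargs)


print(bias_effect(1.0, 1.0), bias_effect(1.0, 2.0))  # similar vs. dissimilar values
```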
Behavioral changes driven by reinforcement and punishment are referred to as simple or model-free reinforcement learning. Animals can also change their behaviors by observing events that are neither appetitive nor aversive, when these events provide new information about payoffs available from alternative actions. This is an example of model-based reinforcement learning, and can be accomplished by incorporating hypothetical reward signals into the value functions for specific actions. Recent neuroimaging and single-neuron recording studies showed that the prefrontal cortex and the striatum are involved not only in reinforcement and punishment, but also in model-based reinforcement learning. We found evidence for both types of learning, and hence hybrid learning, in monkeys during simulated competitive games. In addition, in both the dorsolateral prefrontal cortex and orbitofrontal cortex, individual neurons heterogeneously encoded signals related to actual and hypothetical outcomes from specific actions, suggesting that both areas might contribute to hybrid learning.
belief learning; decision making; game theory; reinforcement learning; reward
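Hybrid learning of the kind described above can be sketched as a value update applied to every action on each trial: the chosen action is updated with its actual outcome and each unchosen action with its hypothetical outcome, using separate learning rates. The names `alpha_actual` and `alpha_hyp` are assumptions. With `alpha_hyp = 0` the rule reduces to simple model-free learning, and with equal rates it resembles belief learning, which weights actual and hypothetical payoffs equally.

```python
def hybrid_update(values, chosen, outcomes, alpha_actual=0.3, alpha_hyp=0.1):
    """Update all action values from one trial (hypothetical hybrid rule).

    outcomes[a] is the actual outcome for the chosen action and the
    hypothetical (foregone) outcome for every unchosen action.
    """
    new_values = list(values)
    for action, outcome in enumerate(outcomes):
        rate = alpha_actual if action == chosen else alpha_hyp
        new_values[action] += rate * (outcome - new_values[action])
    return new_values


print(hybrid_update([0.0, 0.0, 0.0], chosen=0, outcomes=[1.0, 0.0, 2.0]))
```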
Limb movement is smooth and corrections of movement trajectory and amplitude are barely noticeable midflight. This suggests that skeletomuscular motor commands are smooth in transition, such that the rate of change of acceleration (or jerk) is minimized. Here we applied the methodology of minimum-jerk submovement decomposition to a member of the skeletomuscular family, the head movement. We examined the submovement composition of three types of horizontal head movements generated by nonhuman primates: head-alone tracking, head-gaze pursuit, and eye-head combined gaze shifts. The first two types of head movements tracked a moving target, whereas the last type oriented the head with rapid gaze shifts toward a target fixed in space. During head tracking, the head movement was composed of a series of episodes, each consisting of a distinct, bell-shaped velocity profile (submovement), and these submovements rarely overlapped. There was no specific magnitude order in the peak velocities of these submovements. In contrast, during eye-head combined gaze shifts, the head movement was often composed of overlapping submovements, in which the peak velocity of the primary submovement was always higher than that of the subsequent submovement, consistent with the two-component strategy observed in goal-directed limb movements. These results extend the previous submovement composition studies from limb to head movements, suggesting that submovement composition provides a biologically plausible approach to characterizing the head motor recruitment that can vary depending on task demand.
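The bell-shaped submovements above follow from the minimum-jerk trajectory, whose velocity for a movement of amplitude D and duration T is (D/T)·30τ²(1 − τ)² with τ = t/T, peaking at 1.875·D/T. The sketch below sums such submovements to build a composite velocity trace; the onsets, durations, and amplitudes are hypothetical, and fitting real head-velocity traces would additionally require an optimization step that is not shown.

```python
def min_jerk_velocity(t, onset, duration, amplitude):
    """Velocity of one minimum-jerk submovement (bell-shaped, zero outside its window)."""
    tau = (t - onset) / duration
    if tau <= 0.0 or tau >= 1.0:
        return 0.0
    return amplitude / duration * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)


def composite_velocity(t, submovements):
    """Sum of possibly overlapping submovements, each given as (onset, duration, amplitude)."""
    return sum(min_jerk_velocity(t, *sub) for sub in submovements)


# Hypothetical gaze-shift-like head movement: a large primary submovement
# followed by a smaller, overlapping secondary one.
subs = [(0.00, 0.40, 20.0), (0.25, 0.30, 4.0)]
dt = 0.001
trace = [composite_velocity(i * dt, subs) for i in range(700)]
print(max(trace))
```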
Impulsivity refers to a set of heterogeneous behaviors that are tuned suboptimally along certain temporal dimensions. Impulsive intertemporal choice refers to the tendency to forgo a large but delayed reward and to seek an inferior but more immediate reward, whereas impulsive motor responses result when subjects fail to suppress inappropriate automatic behaviors. In addition, impulsive actions can be produced when too much emphasis is placed on speed rather than accuracy in a wide range of behaviors, including perceptual decision making. Despite this heterogeneous nature, the prefrontal cortex and its connected areas, such as the basal ganglia, play an important role in gating impulsive actions in a variety of behavioral tasks. Here, we describe key features of computations necessary for optimal decision making, and how their failures can lead to impulsive behaviors. We also review the recent findings from neuroimaging and single-neuron recording studies on the neural mechanisms related to impulsive behaviors. Converging approaches in economics, psychology, and neuroscience provide a unique vista for better understanding the nature of behavioral impairments associated with impulsivity.
intertemporal choice; temporal discounting; basal ganglia; speed-accuracy tradeoff; response inhibition; switching
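The speed-accuracy tradeoff mentioned above is often illustrated with the drift-diffusion model. For an unbiased process with drift A, noise c, and decision bounds at ±z, standard closed-form results give an error rate of 1/(1 + exp(2Az/c²)) and a mean decision time of (z/A)·tanh(Az/c²). The sketch below evaluates them with illustrative parameter values; it is offered as a general illustration, not as a model used in this review.

```python
import math


def ddm_error_rate(drift, threshold, noise=1.0):
    """Error rate of an unbiased drift-diffusion process with bounds at +/-threshold."""
    return 1.0 / (1.0 + math.exp(2.0 * drift * threshold / noise**2))


def ddm_mean_decision_time(drift, threshold, noise=1.0):
    """Mean decision time (excluding non-decision time) for the same process; drift > 0."""
    return (threshold / drift) * math.tanh(drift * threshold / noise**2)


# Lowering the threshold trades accuracy for speed.
for z in (0.25, 0.5, 1.0, 2.0):
    print(z, round(ddm_error_rate(1.0, z), 3), round(ddm_mean_decision_time(1.0, z), 3))
```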
Knowledge about hypothetical outcomes from unchosen actions is beneficial only when such outcomes can be correctly attributed to specific actions. Here, we show that during a simulated rock-paper-scissors game, rhesus monkeys can adjust their choice behaviors according to both actual and hypothetical outcomes from their chosen and unchosen actions, respectively. In addition, neurons in both dorsolateral prefrontal cortex and orbitofrontal cortex encoded the signals related to actual and hypothetical outcomes immediately after they were revealed to the animal. Moreover, compared to the neurons in the orbitofrontal cortex, those in the dorsolateral prefrontal cortex were more likely to change their activity according to the hypothetical outcomes from specific actions. Conjunctive and parallel coding of multiple actions and their outcomes in the prefrontal cortex might enhance the efficiency of reinforcement learning and also contribute to their context-dependent memory.
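Attributing hypothetical outcomes to specific actions is straightforward in rock-paper-scissors: once the opponent's choice is revealed, the outcome of every action is determined. The sketch below uses a hypothetical symmetric payoff scheme (win 1, tie 0, loss −1); the payoffs used in the study may differ.

```python
ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(action, opponent, win=1.0, tie=0.0, loss=-1.0):
    """Outcome of `action` against `opponent` under a hypothetical payoff scheme."""
    if action == opponent:
        return tie
    return win if BEATS[action] == opponent else loss


def actual_and_hypothetical(chosen, opponent):
    """Return the actual outcome of the chosen action and the hypothetical
    outcomes of the unchosen actions, keyed by action."""
    outcomes = {a: payoff(a, opponent) for a in ACTIONS}
    actual = outcomes.pop(chosen)
    return actual, outcomes


print(actual_and_hypothetical("rock", "paper"))  # rock loses; scissors would have won
```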
Despite widespread neural activity related to reward values, signals related to upcoming choice have not been clearly identified in the rodent brain. Here, we examined neuronal activity in the lateral (AGl) and medial (AGm) agranular cortex, corresponding to the primary and secondary motor cortex, respectively, in rats performing a dynamic foraging task. Choice signals arose in the AGm before the behavioral manifestation of the animal’s choice, earlier than in any other area of the rat brain previously studied under free-choice conditions. The AGm also conveyed significant neural signals for decision value and chosen value. In contrast, upcoming choice signals arose later and value signals were weaker in the AGl. We also found that AGm lesions made the animal’s choices less dependent on dynamically updated values. These results suggest that rodent secondary motor cortex might be uniquely involved in both representing and reading out value signals for flexible action selection.
Many of the cognitive deficits of normal aging (forgetfulness, distractibility, inflexibility, and impaired executive functions) involve prefrontal cortical (PFC) dysfunction [1–4]. The PFC guides behavior and thought using working memory [5], essential functions in the Information Age. Many PFC neurons hold information in working memory through excitatory networks that can maintain persistent neuronal firing in the absence of external stimulation [6]. This fragile process is highly dependent on the neurochemical environment [7]. For example, elevated cAMP signaling reduces persistent firing by opening HCN and KCNQ potassium channels [8,9]. It is not known if molecular changes associated with normal aging alter the physiological properties of PFC neurons during working memory, as there have been no in vivo recordings from PFC neurons of aged monkeys. Here we characterize the first recordings of this kind, revealing a marked loss of PFC persistent firing with advancing age that can be rescued by restoring an optimal neurochemical environment. Recordings showed an age-related decline in the firing rate of DELAY neurons, while the firing of CUE neurons remained unchanged with age. The memory-related firing of aged DELAY neurons was partially restored to more youthful levels by inhibiting cAMP signaling, or by blocking HCN or KCNQ channels. These findings reveal the cellular basis of age-related cognitive decline in dorsolateral PFC, and demonstrate that physiological integrity can be rescued by addressing the molecular needs of PFC circuits.
prefrontal cortex; working memory; aging; cAMP signaling; HCN channels; KCNQ channels; α2A adrenoceptors
In choosing between different rewards expected after unequal delays, humans and animals often prefer the smaller but more immediate reward, indicating that the subjective value or utility of reward is depreciated according to its delay. Here, we show that the neurons in the primate caudate nucleus and ventral striatum modulate their activity according to temporally discounted values of rewards with a similar time course. However, neurons in the caudate nucleus encoded the difference in the temporally discounted values of the two alternative targets more reliably than the neurons in the ventral striatum. In contrast, the neurons in the ventral striatum largely encoded the sum of the temporally discounted values, and therefore, the overall goodness of available options. These results suggest a more pivotal role for the dorsal striatum in action selection during intertemporal choice.
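The distinction between the difference and the sum of temporally discounted values can be illustrated for two targets, each offering a reward magnitude after a delay. The sketch assumes hyperbolic discounting with a hypothetical rate `k`; the difference indicates which target is better, whereas the sum indexes the overall goodness of the available options.

```python
def discounted_value(magnitude, delay, k):
    """Hyperbolically discounted value (assumed form)."""
    return magnitude / (1.0 + k * delay)


def value_signals(left, right, k):
    """left, right: (magnitude, delay) tuples.

    Returns (difference, sum) of the two discounted values.
    """
    dv_left = discounted_value(*left, k)
    dv_right = discounted_value(*right, k)
    return dv_left - dv_right, dv_left + dv_right


# Small immediate vs. large delayed reward that happen to be equally valuable.
print(value_signals((2.0, 0.0), (4.0, 4.0), k=0.25))
```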
The value of an object acquired by a particular action often determines the motivation to produce that action. Previous studies found neural signals related to the values of different objects or goods in the orbitofrontal cortex, while the values of outcomes expected from different actions are broadly represented in multiple brain areas implicated in movement planning. However, how the brain combines the values associated with various objects and the information about their locations is not known. In this study, we tested whether the neurons in the dorsolateral prefrontal cortex (DLPFC) and striatum in rhesus monkeys might contribute to translating the value signals between multiple frames of reference. Monkeys were trained to perform an oculomotor intertemporal choice in which the color of a saccade target and the number of its surrounding dots signaled the magnitude of reward and its delay, respectively. In both DLPFC and striatum, temporally discounted values (DVs) associated with specific target colors and locations were encoded by partially overlapping populations of neurons. In the DLPFC, the information about reward delays and DVs of rewards available from specific target locations emerged earlier than the corresponding signals for target colors. Similar results were reproduced by a simple network model built to compute DVs of rewards in different locations. Therefore, DLPFC might play an important role in estimating the values of different actions by combining the previously learned values of objects and their present locations.
intertemporal choice; prefrontal cortex; reward; temporal discounting; utility
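Translating object-based values into location-based values, as described in this abstract, can be sketched as a lookup from each object's color to a reward magnitude and from its number of dots to a delay, followed by discounting and assignment to the object's current location. The color-to-magnitude mapping, the delay per dot, the hyperbolic form, and the rate `k` are all hypothetical; this is not the network model used in the study.

```python
# Hypothetical mappings (not the study's values): target color -> reward magnitude,
# and each surrounding dot -> a fixed increment of reward delay.
MAGNITUDE_BY_COLOR = {"green": 2.0, "red": 4.0}
DELAY_PER_DOT_S = 1.0


def values_by_location(targets, k=0.2):
    """Translate object-based values into location-based discounted values (DVs).

    targets maps a screen location to (color, n_dots) for the object shown there.
    """
    dvs = {}
    for location, (color, n_dots) in targets.items():
        magnitude = MAGNITUDE_BY_COLOR[color]          # value learned for the object
        delay = n_dots * DELAY_PER_DOT_S               # delay cued by the dots
        dvs[location] = magnitude / (1.0 + k * delay)  # hyperbolic discounting (assumed)
    return dvs


print(values_by_location({"left": ("red", 0), "right": ("green", 5)}))
```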
According to the reinforcement learning theory of decision making, reward expectation is computed by integrating past rewards with a fixed timescale. By contrast, we found that a wide range of time constants is available across cortical neurons recorded from monkeys performing a competitive game task. By recognizing that reward modulates neural activity multiplicatively, we found that one or two time constants of reward memory can be extracted for each neuron in prefrontal, cingulate, and parietal cortex. These timescales ranged from hundreds of milliseconds to tens of seconds, according to a power-law distribution, which is consistent across areas and reproduced by a “reservoir” neural network model. These neuronal memory timescales were weakly but significantly correlated with those of the monkeys' decisions. Our findings suggest a flexible memory system, where neural subpopulations with distinct sets of long or short memory timescales may be selectively deployed according to the task demands.
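Multiplicative reward modulation with one or two memory timescales per neuron can be sketched by keeping exponentially decaying traces of past rewards, one per time constant, and scaling a baseline firing rate by their weighted sum. The trial duration, time constants, gains, and function names below are hypothetical, and the sketch omits the fitting procedure used to extract timescales from recorded neurons.

```python
import math


def reward_traces(rewards, trial_s, taus_s):
    """Exponentially decaying reward memories, one per time constant (in seconds).

    Each trial, every trace decays by exp(-trial_s / tau) and then adds the
    current reward. Returns the list of traces after each trial.
    """
    decays = [math.exp(-trial_s / tau) for tau in taus_s]
    traces = [0.0] * len(taus_s)
    history = []
    for r in rewards:
        traces = [d * m + r for d, m in zip(decays, traces)]
        history.append(list(traces))
    return history


def firing_rate(baseline, gains, traces):
    """Baseline rate scaled multiplicatively by the weighted reward memories."""
    return baseline * (1.0 + sum(g * m for g, m in zip(gains, traces)))


hist = reward_traces([1.0, 0.0, 0.0], trial_s=1.0, taus_s=[1.0, 10.0])
print(hist[-1], firing_rate(10.0, [0.5, 0.2], hist[-1]))
```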
We investigated how different sub-regions of rodent prefrontal cortex contribute to value-based decision making, by comparing neural signals related to the animal’s choice, its outcome, and action value in orbitofrontal cortex (OFC) and medial prefrontal cortex (mPFC) of rats performing a dynamic two-armed bandit task. Neural signals for upcoming action selection arose in the mPFC, including the anterior cingulate cortex, only immediately before the behavioral manifestation of the animal’s choice, suggesting that rodent prefrontal cortex is not involved in advanced action planning. Both OFC and mPFC conveyed signals related to the animal’s past choices and their outcomes over multiple trials, but neural signals for chosen value and reward prediction error were more prevalent in the OFC. Our results suggest that rodent OFC and mPFC serve distinct roles in value-based decision making, and that the OFC plays a prominent role in updating the values of outcomes expected from chosen actions.
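Trial-by-trial value signals of the kind compared across OFC and mPFC are typically derived from a reinforcement learning model of the animal's choices. The sketch below assumes a simple Q-learning model for a two-armed bandit and returns, for each trial, the decision value (left minus right), the chosen value, and the reward prediction error; the learning rate and function name are hypothetical, and the study's model may differ.

```python
def bandit_regressors(choices, rewards, alpha=0.2, q0=0.0):
    """Per-trial decision value, chosen value and reward prediction error (RPE).

    Values are read out before the outcome; only the chosen action's value is
    then updated by alpha * RPE (standard Q-learning, assumed here).
    """
    q = {"left": q0, "right": q0}
    rows = []
    for choice, reward in zip(choices, rewards):
        chosen_value = q[choice]
        rpe = reward - chosen_value
        rows.append({"decision_value": q["left"] - q["right"],
                     "chosen_value": chosen_value,
                     "rpe": rpe})
        q[choice] = chosen_value + alpha * rpe
    return rows


for row in bandit_regressors(["left", "left", "right"], [1.0, 0.0, 1.0], alpha=0.5):
    print(row)
```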