When choosing between two options, correlates of their value are represented in neural activity throughout the brain. Whether these representations reflect activity fundamental to the computational process of value comparison, as opposed to other computations covarying with value, is unknown. Here, we investigated activity in a biophysically plausible network model that transforms inputs relating to value into categorical choices. A set of characteristic time-varying signals emerged that reflect value comparison. We tested these model predictions in magnetoencephalography data recorded from human subjects performing value-guided decisions. Parietal and prefrontal signals matched closely with model predictions. These results provide a mechanistic explanation of neural signals recorded during value-guided choice, and a means of distinguishing computational roles of different cortical regions whose activity covaries with value.
Deciding upon the best course of action amongst a range of competing alternatives has been a fundamental problem addressed within the fields of economics1, psychology2, behavioural ecology3, machine learning4, and more recently, cognitive neuroscience5-8. To select the choice yielding greatest long-term reward, it has been proposed that neural circuits should take inputs reflecting the subjective value of alternatives, and compare these inputs to form a categorical decision8. Representations of value have been found in many cortical and subcortical brain regions9-18 but whether and how activity changes in these representations might constitute the decision process itself is unknown. The uncertainty is partly a consequence of not knowing how the signature of a decision would manifest itself at the level of the activity that can be recorded in a population of neurons.
One potential neuronal mechanism for value comparison is competition by mutual inhibition19,20. In this class of models, separate pools of neurons representing different options are excited by the value of their respective options, but inhibit each other such that activity only survives in the eventual winning pool. This mechanism is particularly attractive as it can be implemented in networks of neurons that respect known neurobiology20. Indeed, such models accurately predict single cell activity in the parietal cortex during perceptual decisions21.
It has been proposed that similar mechanisms might also underlie value-guided choice, but this proposal has rarely been tested empirically10,22. A key problem is that model predictions are of single-unit activity, but it is impossible to simultaneously measure this across the many brain regions that exhibit value-related activity. However, if such inhibitory mechanisms were to exhibit a characteristic signature that could be measured not in single cell activity, but in the summed activity of the local network, then we could use imaging techniques to search for this signature across the entire brain and isolate those regions fundamental to value comparison.
In this study, we adopted such an approach. We analysed a biophysically realistic network model of decision-making in order to generate predictions of the temporal dynamics of value correlates in local field potentials. We then applied the exact same analysis to source-reconstructed magnetoencephalography (MEG) data, a whole-brain human imaging technique that affords the requisite temporal resolution to test model predictions. Importantly, MEG allows coverage of signal from the entirety of neocortex, allowing for predictions to be tested in multiple brain regions simultaneously with high temporal resolution. Regions of ventromedial prefrontal and superior parietal cortex matched well with the biophysical model, implicating them in value comparison. Value correlates in other cortical regions matched poorly, implicating them in separate computational processes that covary with value.
We used a mean-field version23 of a biophysical cortical attractor network model20 to derive predictions of the temporal dynamics of activity in a cortical region that selects between inputs reflecting the value of two options. The model comprises two populations of excitatory pyramidal cells selective for each option, with strong recurrent excitation between cells of similar selectivity, and effective inhibition between the two pools mediated by inhibitory interneurons20 (see online methods). This effective inhibition mediates a competition between the two excitatory pools, with one pool ending up in a high firing attractor state (chosen option), and the other pool staying in a low firing attractor state (unchosen option). Neurons selective for option o receive inputs ro at firing rates proportional to the subjective value of that option, sEVo. The neurons further receive background noise inputs and currents from other cells in the network. Importantly, the network has very few free parameters that are not otherwise constrained by their biophysical plausibility. The behavior of single units in the network has been described elsewhere20,24; here, we focus on predictions suited to investigation with MEG25, namely behavior of the summed input currents to all pyramidal cells.
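As an illustration of the competition mechanism described above, the following is a minimal sketch of a reduced two-pool mean-field model in which each pool excites itself and effectively inhibits the other. It is not the implementation used in this study (see online methods for that): the function names, the parameter values, the crude per-step noise term and the use of the summed input current as a stand-in for the MEG-relevant signal are all assumptions made for illustration.

```python
import numpy as np

# Illustrative parameters for a reduced two-pool attractor model.
# These are assumptions for this sketch, not values taken from the paper.
A, B, D = 270.0, 108.0, 0.154      # shape of the current-to-rate curve
GAMMA, TAU_S = 0.641, 0.1          # synaptic gating rate constant, time constant (s)
J_SELF, J_CROSS = 0.2609, 0.0497   # recurrent excitation, effective inhibition (nA)
I_BACKGROUND = 0.3255              # constant background current (nA)
J_EXT = 5.2e-4                     # gain from input rate (Hz) to current (nA)


def firing_rate(x):
    """Population firing rate (Hz) for total input current x (nA).

    Undefined when A * x - B is exactly zero; this sketch does not guard it.
    """
    z = A * x - B
    return z / (1.0 - np.exp(-D * z))


def simulate_trial(rate_1, rate_2, t_max=2.0, dt=0.0005, noise_sd=0.0, seed=0):
    """Euler-integrate both gating variables for one trial.

    rate_1, rate_2: input rates (Hz), assumed proportional to option value.
    Returns (time, gating variables per pool, summed input current).
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_max / dt))
    s = np.full(2, 0.1)                       # both pools start in the low state
    ext = J_EXT * np.array([rate_1, rate_2])  # value-related input currents
    s_trace = np.empty((n_steps, 2))
    summed_input = np.empty(n_steps)
    for k in range(n_steps):
        noise = noise_sd * rng.standard_normal(2)  # crude per-step current noise
        # Each pool: self-excitation minus inhibition from the other pool.
        x = J_SELF * s - J_CROSS * s[::-1] + I_BACKGROUND + ext + noise
        s = s + dt * (-s / TAU_S + (1.0 - s) * GAMMA * firing_rate(x))
        s = np.clip(s, 0.0, 1.0)
        s_trace[k] = s
        summed_input[k] = x.sum()             # proxy for population input current
    return np.arange(n_steps) * dt, s_trace, summed_input
```

Calling `simulate_trial(40.0, 20.0)` with the default zero noise should leave the pool with the larger input in the high-activity state. Setting `noise_sd` above zero allows occasional choices of the lower-valued option.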
We simulated network behavior using a set of trials with varying sEVo (as used in the human experiment, below). We sorted trials by overall value (sEV1+sEV2; Fig. 1A top) and value difference (sEVchosen−sEVunchosen; Fig. 1A bottom). In both cases, the network attracted faster to a decision when overall value or value difference was higher, yielding the prediction of decreased reaction times (RTs) under these conditions. We tested this prediction more formally using a multiple regression in which model RTs were predicted as a function of both overall value and value difference; both variables were found to have a negative effect on RT (Fig. 1B). Intuitively, the model reaches an asymmetric attractor state more quickly when the basin of attraction for this option is larger due to larger value difference (which determines the difference between the two inputs). An increase in overall value causes the network activity to rise faster and diverge faster, also resulting in faster RT.
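The multiple regression of RT on overall value and value difference can be sketched as an ordinary least-squares fit. The function and variable names below are hypothetical, and z-scoring the regressors is an assumption of this sketch rather than a detail stated in the text.

```python
import numpy as np


def zscore(x):
    """Standardize a regressor to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def regress_rt(rt, sev_chosen, sev_unchosen):
    """Regress reaction time on overall value and value difference.

    Returns the fitted coefficients; a negative coefficient means that
    RT decreases as the corresponding regressor increases.
    """
    rt = np.asarray(rt, dtype=float)
    overall = np.asarray(sev_chosen) + np.asarray(sev_unchosen)
    difference = np.asarray(sev_chosen) - np.asarray(sev_unchosen)
    design = np.column_stack(
        [np.ones_like(rt), zscore(overall), zscore(difference)]
    )
    beta, *_ = np.linalg.lstsq(design, rt, rcond=None)
    return {
        "intercept": beta[0],
        "overall_value": beta[1],
        "value_difference": beta[2],
    }
```

Because both regressors enter the same design matrix, each coefficient reflects the effect of that variable after accounting for the other.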
We then performed a time-frequency analysis of network responses, which aided our subsequent comparison of model predictions with MEG data. We used Morlet wavelets to decompose network activity on each trial26, and regressed the decomposed data onto overall value and value difference. Network transitions typically took several hundred milliseconds to occur, and so most key model predictions were limited to frequencies ranging from approximately 2–10 Hz (Fig. 1C). Overall value had a broadband effect on model activity in the 3–9 Hz frequency range, soon after selective inputs were delivered to the network (Fig. 1C, top), whereas value difference had a later and slightly lower-frequency effect, predominantly in the 2–4.5 Hz range (Fig. 1C, bottom). The different frequencies reflect the fact that overall value affects the population synaptic input earlier and over a shorter time period than value difference. The effect of the two regressors on network responses is a reflection of the fact that network transitions occur at different speeds depending upon the input presented; thus, the network does not explicitly ‘represent’ such quantities, but these effects are a manifestation of trial-to-trial variability in the speed of the different network transitions. If we collapsed across the relevant frequencies, the temporal progression from an overall value signal to a value difference signal could be clearly seen (Fig. 1D). It was also found that on trials where the network model made an error (i.e. sEVchosen<sEVunchosen), there was an effect of overall value on the model’s activity, but no clear effect of value difference (Fig. 1D, dashed lines). (However, it should be borne in mind that ‘error’ trials inherently covered a smaller range of value differences than ‘correct’ trials, which may have caused the absence of any effect.)
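The two analysis steps described above, wavelet decomposition of each trial followed by regression across trials at every time-frequency point, can be sketched as follows. This is a generic illustration, not the pipeline used in the study: the number of wavelet cycles, the truncation of the wavelet at three standard deviations and the function names are all assumptions.

```python
import numpy as np


def morlet_power(signal, sfreq, freqs, n_cycles=5.0):
    """Time-frequency power via convolution with complex Morlet wavelets.

    Returns an array of shape (n_freqs, n_times). The signal must be
    longer than the longest (lowest-frequency) wavelet.
    """
    signal = np.asarray(signal, dtype=float)
    power = np.empty((len(freqs), signal.size))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)        # temporal width (s)
        half = int(np.ceil(3.0 * sigma_t * sfreq))    # truncate at 3 s.d.
        t = np.arange(-half, half + 1) / sfreq
        envelope = np.exp(-t**2 / (2.0 * sigma_t**2))
        # Normalize so the gain at the centre frequency is one.
        wavelet = envelope * np.exp(2j * np.pi * f * t) / envelope.sum()
        power[i] = np.abs(np.convolve(signal, wavelet, mode="same")) ** 2
    return power


def regress_power(power_trials, design):
    """Regress power onto trial-wise regressors at each time-frequency point.

    power_trials: (n_trials, n_freqs, n_times)
    design:       (n_trials, n_regressors), e.g. overall value, value difference
    Returns coefficients of shape (n_regressors, n_freqs, n_times).
    """
    n_trials = power_trials.shape[0]
    X = np.column_stack([np.ones(n_trials), design])  # add an intercept
    Y = power_trials.reshape(n_trials, -1)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta[1:].reshape((design.shape[1],) + power_trials.shape[1:])
```

Each slice of the output of `regress_power` is a time-frequency map of one regressor's effect, the kind of quantity plotted for overall value and value difference.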
The key predictions derived from the model were therefore: (i) the temporal evolution from an ‘overall’ value signal to a ‘difference’ value signal; (ii) a difference in the frequency of the response, with value difference dominating responses at lower frequencies than overall value; (iii) the presence of an overall, but little or no difference signal, on error trials.
We designed a simple value-guided choice task to test these predictions. Thirty subjects repeatedly selected between two options of differing value (Fig. 2) whilst undergoing MEG. Each option had a certain number of points available, represented by the width of an onscreen bar, and a probability of obtaining those points, represented by a percentage underneath the bar. The aim was to accumulate points (displayed on a progress bar) in order to reach a gold target, at which point monetary reward was delivered and the progress bar was reset to its initial position. To accumulate maximal returns, subjects should compute the objective Pascalian value (bar width multiplied by probability of winning, denoted EVo for option o), and select the option with the higher value on each trial. In fact, most subjects tended to overweight low probabilities of winning and underweight high probabilities, and exhibited a concave utility function, in accordance with predictions from prospect theory (supplementary Fig. S1; supplementary table S1)27. Subject RTs correlated negatively with both the difference in subjective option values (=sEVchosen−sEVunchosen) and with the overall value of the decision (=sEV1+sEV2), in line with model predictions (Fig. 1B). Fig. 3A shows a typical subject, and Fig. 3B shows the group results of a multiple linear regression of value difference (T(29)=−7.98, p<0.0005) and overall value (T(29)=−2.36, p<0.05) on reaction times across all subjects (see also supplementary Fig. S2). We also included some trials in which both reward magnitude and probability were higher on one option than the other. There was an additional bonus in speed beyond that related to value for these ‘no brainer’ trials (T(29)=−8.32, p<0.0005; Fig. 3B). Subjects were therefore faster on average on these trials than on those where probability and magnitude advocated opposing choices, and so had to be translated into a ‘common currency’ in which the two stimulus features could be equated.
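The distinction between objective and subjective value, and the definition of a ‘no brainer’ trial, can be made concrete with a short sketch. The functional forms here (a one-parameter Prelec probability weighting function and a power utility function), the parameter values, and the assumption that magnitudes are normalized to the range 0 to 1 are illustrative choices; the forms actually fitted to subjects' behavior are described in the supplementary information.

```python
import math


def prelec_weight(p, gamma):
    """Probability weighting; gamma < 1 overweights low and underweights high p."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-((-math.log(p)) ** gamma))


def utility(magnitude, alpha):
    """Power utility; alpha < 1 gives a concave function of magnitude."""
    return magnitude ** alpha


def objective_ev(magnitude, probability):
    """Pascalian value: reward magnitude multiplied by probability."""
    return magnitude * probability


def subjective_ev(magnitude, probability, alpha=0.8, gamma=0.6):
    """Subjective value: weighted probability times utility of magnitude."""
    return prelec_weight(probability, gamma) * utility(magnitude, alpha)


def is_no_brainer(option_1, option_2):
    """True if one option is higher on both magnitude and probability.

    Each option is a (magnitude, probability) pair.
    """
    (m1, p1), (m2, p2) = option_1, option_2
    return (m1 > m2 and p1 > p2) or (m2 > m1 and p2 > p1)
```

On a trial such as `(0.8, 0.3)` versus `(0.4, 0.7)`, `is_no_brainer` returns `False`: magnitude and probability favour different options, so the two attributes must be combined into a single value before they can be compared.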
There was a steady decrease in reaction time as subjects progressed through the task (Fig. 3C), without any coincident change in parameters describing choice behavior (supplementary table S2), suggesting subjects became less deliberative and more automated in their choices as they became familiar with the task.
We used linearly constrained minimum variance beamforming28 to spatially filter MEG data to locations in source space. We epoched data with respect to both stimulus onset and subject response, and focused our analyses on responses in the 2–10 Hz frequency range, in accordance with model predictions. We first used a whole-brain statistical parametric mapping analysis to look for areas showing a main effect of performing the task relative to a pre-stimulus (−300 ms to −100 ms) or post-response (+100 ms to +300 ms) baseline. We hypothesized that, in addition to areas important to stimulus valuation such as ventromedial prefrontal cortex, the stimulus-locked analysis would reveal early visual areas involved in basic stimulus processing, and the response-locked analysis would reveal areas involved in visually guided manual movements in parietal and premotor cortices29, in addition to primary motor areas.
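For readers unfamiliar with the spatial filtering step, the core of a linearly constrained minimum variance beamformer can be sketched for a single source location with a fixed orientation. This is a textbook formulation rather than the source reconstruction code used here; the scalar (fixed-orientation) leadfield, the diagonal regularization and the function names are assumptions.

```python
import numpy as np


def lcmv_weights(cov, leadfield, reg=0.01):
    """Spatial filter weights for one source with a fixed orientation.

    cov:       (n_sensors, n_sensors) sensor covariance matrix
    leadfield: (n_sensors,) forward field of the source
    reg:       regularization as a fraction of the mean sensor variance

    The weights minimize output variance subject to unit gain for the
    target source: w = C^-1 l / (l^T C^-1 l).
    """
    n_sensors = cov.shape[0]
    cov_reg = cov + reg * np.trace(cov) / n_sensors * np.eye(n_sensors)
    cinv_l = np.linalg.solve(cov_reg, leadfield)
    return cinv_l / (leadfield @ cinv_l)


def apply_beamformer(weights, sensor_data):
    """Project sensor data (n_sensors, n_times) to a source timeseries."""
    return weights @ sensor_data
```

The unit-gain constraint means that `weights @ leadfield` equals one, so activity from the target location passes through unchanged while variance from elsewhere is suppressed.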
A distributed network of areas was found to be task-sensitive at these frequencies (Fig. 4A–F; supplementary movie S1, S2). Stimulus-locked, early visual cortex activation (Fig. 4A) was followed by slowly ramping bilateral activation at the frontal pole and ventromedial prefrontal cortex (Fig. 4B). Whilst 2–10 Hz activity in these frontal regions peaked relatively late in the trial (1000 ms after stimulus onset), it ramped from a much earlier point in the trial (see Fig. 5B and supplementary Fig. S5, discussed below). Response-locked, a prolonged activation spread from a mid-posterior portion of the superior parietal lobule, which extended medially into the marginal ramus of the posterior cingulate sulcus (Fig. 4C), to a bilateral medial portion of the mid-intraparietal sulcus (IPS) (Fig. 4D). This was followed by bilateral activation of the angular/supramarginal gyri (Fig. 4D) and right premotor cortex (Fig. 4E) before finally bilateral inferior frontal sulci and primary sensorimotor cortices (Fig. 4F) were activated at the time of the response.
Having isolated areas that showed changes in activity relative to baseline, we then examined whether activity within these regions co-varied with decision values, and where this activity matched with predictions derived from the biophysical decision model. Importantly, by selecting regions based on the main effect of task versus baseline, we ensured we would not be subject to a selection bias when examining these regions for value-related activity. We also investigated activity in several a priori defined areas commonly found to be important in functional MRI studies of decision making, bearing in mind that value correlates might not be restricted to regions showing a main effect of task versus baseline. Crucially, we applied exactly the same analysis to the timeseries from the source reconstructed MEG data as we had applied to the biophysical model (Fig. 1).
We found that activity in the right posterior superior parietal lobule (pSPL) bore several hallmarks of the biophysical model (Fig. 5A). On trials where subjects chose the option with higher subjective value (‘correct’ trials), activity in pSPL showed a broad correlate of overall value between 3 and 10 Hz (p<0.0005, permutation test, cluster corrected for multiple comparisons across time), followed by a lower frequency (2–4 Hz) correlate of value difference (Fig. 5A(ii)) (p<0.01, corrected), as predicted by the model (cf. Fig. 1C). When we collapsed across the relevant frequencies (Fig. 5A(iii)), activity on these correct trials differed from error trials; error trials showed a positive correlate of overall value (dashed black line, Fig. 5A(iii)) (p<0.05, corrected), but no such positive correlate of value difference (dashed grey line in Fig. 5A(iii); compare with Fig. 1D) (p>0.5). Finally, we tested an additional model prediction: that across subjects there would be a behavioral speed-accuracy tradeoff, elicited by varying the degree of recurrent excitation in the network model, and that this would predict cross-subject variance in neural data. This prediction was also found to hold in pSPL (see supplementary information; supplementary Figs. S3–4).
We also investigated whether the main effect of task performance in this region was affected by factors shown behaviorally to modulate reaction time independently of value. We looked for changes in activity in early trials relative to late trials (where reaction time was speeded, Fig. 3C), and also compared activity in trials where reward magnitude and probability advocated opposing choices with activity on ‘no brainer’ trials (where an additional bonus to reaction time was present beyond that explained by overall value or value difference, Fig. 3B). There was some difference between the patterns of activity in pSPL on these trials; an increase in 2–5 Hz power relative to baseline that was present on the first half of trials (Fig. 5A(i) top left) was largely absent on the second half of trials (Fig. 5A(i) bottom left). A similar distinction could be seen between activity on trials where reward magnitude and probability advocated opposing choices, and a ‘common currency’ representation might need to be formed (Fig. 5A(i) top right), and ‘no brainer’ trials (Fig. 5A(i) bottom right).
We also investigated value-related activity in ventromedial prefrontal cortex (VMPFC), focusing our analyses on a subregion that has often been shown to signal value-related metrics during decision tasks11-16,30,31. Notably, there has been debate over the precise role of this region in value-guided choice6,32, perhaps triggered by the heterogeneity of responses observed there; in some functional magnetic resonance imaging (fMRI) studies it has been found to signal a difference between chosen and unchosen values14,31, whilst in others it has appeared to signal the overall value of available reward15, or the value of just the chosen option16. In VMPFC, there was an even more striking distinction between those situations where subjects would be more deliberative and exhibit slower RTs (Fig. 5B(i), top panels) versus later (Fig. 5B(i), bottom left panel) or ‘no brainer’ (Fig. 5B(i), bottom right panel) trials. VMPFC recruitment steadily decreased through the task, as could be seen more clearly when trials were further subdivided into separate quartiles of the experiment (supplementary Fig. S5). We found that this region transitioned from signaling overall value (p<0.05, corrected) to value difference (p<0.05, corrected) (Fig. 5B(ii)–(iii)) specifically if we restricted our analysis to the first half of trials in which it was task active (Fig. 5B(i)). When we directly contrasted the effect of overall value and value difference on early and late trials, we found that only the value difference signal was significantly stronger on earlier trials in this region (supplementary Fig. S6). There was not a significant effect of either overall value or value difference on error trials (Fig. 5B(iii), dashed lines), although the somewhat weaker signals in this region relative to pSPL may result from the relative insensitivity of MEG to deep, anterior sources, as opposed to posterior, superficial ones33,34, and from the analysis including only half the number of trials.
One possible concern with the differences between the first and second halves of the experiment is that they might reflect more trivial cognitive differences, such as subject fatigue, rather than a change in the cortical networks underlying choice behavior. To address this concern, we performed an additional whole-brain analysis in which we searched for regions coding more strongly for value difference in the second than in the first half of the experiment - that is, the opposite of the pattern of activity witnessed in VMPFC. A bilateral portion of the anterolateral intraparietal sulcus - more lateral than the main effect pSPL activation described above - selectively reflected value difference in the second half of trials (supplementary Fig. S7). In this region, there were also no clear differences between the main effect of task performance on early versus late trials, or on harder trials versus ‘no brainers’ (supplementary Fig. S8).
Lastly, we also searched for effects of value in other regions identified in the main effect contrast of task vs. baseline (Fig. 4), and in several regions defined a priori from previous fMRI studies of value-based choice. In these analyses, we found that several areas exhibited value-dependent activity, but none of these regions matched well with predictions from the biophysical decision model (supplementary Fig. S9). We hypothesize that the value correlates in these regions might be better described by appealing to their role in other computational processes that are likely to covary with value, such as attention or response preparation. Alternatively, it may be the case that these other regions are involved in value comparison, but do so in a manner that is different to that proposed using the biophysical modeling approach.
The cortical correlates of value during decision under risk are typically spread over a distributed network of areas, but the unique contribution of each of these areas to choice is unclear. A region involved in value comparison should receive inputs relating to the value of available options, and transform these inputs into a categorical choice. We used a biophysically plausible model that exhibits this property to derive novel predictions of the temporal dynamics of cortical activity. We applied linear regression to investigate at which timepoints and in which frequency bands value correlates could be found in network activity. These responses typically occurred in low frequencies (<10 Hz), consistent with a slow integrative process. We then applied the same analysis to source-reconstructed MEG data, in order to identify regions involved in value comparison. A distributed network of areas was task-sensitive at the relevant frequencies, but only pSPL and VMPFC closely matched predictions of the biophysical model, with the latter doing so selectively in trials early in the experiment. Other regions were found to show value correlates, but did not match closely with predictions from the biophysical model; this suggests that extensions to the model are necessary to fully capture the role of different brain regions in the task. Furthermore, MEG will be limited in its ability to resolve sources from deep brain structures that do not possess an ‘open field’ layout, such as in striatum17; thus, the role of alternative mechanisms for selection (dependent upon cortico-basal ganglia loops) cannot be addressed in the current study.
A key feature of the biophysical model is the ability to slowly integrate value-related inputs, afforded by its recurrent excitatory structure and long synaptic time constants mediated by NMDA receptors. It is not immediately obvious that value comparison should be subject to a process of integration in the same way as a noisy sensory stimulus. However, the observed distribution of reaction times fits well with a process of integration, as has been investigated more closely in previous studies that used a ‘drift diffusion’ model to predict reaction times35,36. The drift diffusion model was originally designed to make predictions of behavioral data, and has often been used to make predictions of single unit activity during perceptual choice. However, because it essentially describes differences in activity between different populations of selective cells and also ignores any non-selective activity, it is not transparent how to make predictions of imaging measures such as MEG or fMRI. In this study, we elected to use a biophysical implementation of a competition model, which makes clear and explicit predictions of the measurable data. Unsurprisingly, when we used the pseudo-variable in the DDM as a marker for integrated brain activity, we found differences between the predictions (supplementary information; supplementary Fig. S10).
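For comparison with the biophysical model, a drift diffusion process can be sketched in a few lines. The sketch assumes that the drift rate is proportional to the value difference between options, with symmetric bounds and unit noise; these choices, the parameter values and the function name are illustrative and are not taken from the supplementary analysis.

```python
import numpy as np


def simulate_ddm(drift, bound=1.0, noise_sd=1.0, dt=0.001, t_max=5.0,
                 n_trials=500, seed=0):
    """Simulate a drift diffusion process with symmetric bounds at +/- bound.

    drift: mean rate of evidence accumulation, assumed here to scale with
           value difference.
    Returns (rt, choice): choice is +1 or -1 for the upper or lower bound,
    and 0 (with rt = t_max) for trials that never reached a bound.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_max / dt))
    x = np.zeros(n_trials)                    # single decision variable
    rt = np.full(n_trials, t_max)
    choice = np.zeros(n_trials, dtype=int)
    active = np.ones(n_trials, dtype=bool)    # trials still accumulating
    for k in range(1, n_steps + 1):
        n_active = int(active.sum())
        if n_active == 0:
            break
        step = drift * dt + noise_sd * np.sqrt(dt) * rng.standard_normal(n_active)
        x[active] += step
        crossed = active & (np.abs(x) >= bound)
        rt[crossed] = k * dt
        choice[crossed] = np.sign(x[crossed]).astype(int)
        active &= ~crossed
    return rt, choice
```

The process carries one decision variable that tracks only the difference in evidence, with no term for activity common to both options. This is the point made above about why its mapping onto a summed population signal is not transparent.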
The predictions from the model also form a striking example of the distinction between two types of representation - ‘content’ and ‘functional’ representations - in cortical circuits37. To the external observer, recording with an imaging technique (or an electrode), the content of the network first appears to ‘represent’ the overall value and later the value difference between the two options. By contrast, the functional representations in the network - those used by the brain - are quite different. There is a representation of option values on the input to the network, and a representation of choice on the outputs of the network, as should be decoded by a suitable downstream observer. The reason that the network shows value-related activity is simply that the same network transitions occur faster on high value and high value difference trials. Hence, whilst neural activity in the network may covary with the overall value and value difference, this content need never be decoded by another brain region. Thus, the extent to which the network can be said to functionally ‘represent’ these two quantities in a meaningful way is questionable37.
The region in pSPL isolated as matching with model predictions is close to the cytoarchitectonic region hIP338, which may be the human homologue of the medial intraparietal area (MIP). It is also referred to as IPS4 and DIPSA, which resembles macaque MIP39. In the macaque, this region is often implicated in visually-guided movements of the forelimbs29. It may therefore have a role in integrating information in order to guide limb movements that is analogous to the role of LIP in generating saccades. This process of saccade generation is closely linked to the tracking of value associated with generating a saccade in a particular direction9,10.
The region in VMPFC is often found using fMRI to be responsive to the value of stimuli during decision tasks11-16,30,31, but its precise role has been debated6,32, perhaps as a result of the relative absence of published single-unit recording data in comparison to the nearby lateral orbitofrontal cortex5,7. In early trials, this region was found to transition from signaling overall value to signaling value difference. Strikingly, this same transition was also recently found in single-unit recordings from the most ventral portion of the striatum17, which receives a particularly dense projection from VMPFC40, and also in prefrontal cortex41. Similarly to the present study, this task required monkeys to combine two stimulus properties to form their decision, namely the reward magnitude and the delay to reward delivery. In our task, VMPFC was selectively activated in trials where subjects had to combine probability and magnitude information to choose accurately. This is also consistent with the finding that lesions to this area, but not nearby lateral orbitofrontal cortex, produce impairments in value comparison32, and more specifically produce changes in tasks where multiple dimensions have to be considered in forming a choice42.
Some previous studies have attempted to apply a modeling approach to capture signals from distributed cortical regions during choice, measured using fMRI. These studies have made predictions based either on drift diffusion models43 or biophysically plausible networks44 but the predictions of these models are heavily dependent upon whether fMRI signal is assumed to reflect activity from all timepoints including after a decision has been formed44, or whether it only reflects activity until the decision threshold is reached43. Moreover, several key predictions of these models also relate to how their activity evolves over time as a decision is made, and the slow hemodynamic response means fMRI is limited in how well it can tease apart these predictions of temporal dynamics. We argue that it is important to use a time-resolved technique, such as MEG, to test these predictions.
Biophysically-inspired models have also been used to infer the structure of connections between or within different cortical areas from M/EEG data45. However, these studies have not inferred the specific neuronal mechanism underlying a particular cognitive process, as we have proposed here. The present model performs the critical computation of transforming value-related inputs into a choice, and does so in a way that has captured single-unit activity during perceptual decision tasks. The application of this computational biophysical modeling approach may not be limited to decision making paradigms. Novel predictions might, for instance, be derived from biophysical models already designed to capture single unit data in inhibitory control or working memory processes46. In models of working memory, for instance, gamma-band (30–70 Hz) responses can be elicited46, and parametric modulation of input to these models may explain variation in gamma-band frequencies seen during working memory tasks in frontal cortex47. Alternatively, by varying internal parameters of a biophysical model, novel predictions might be derived of the effects of cross-subject variation on cortical responses measurable with M/EEG (see also supplementary discussion). Because these parameters relate to specific biophysical properties such as the density of network connectivity or the concentration of a specific neurotransmitter, it may be possible to directly relate these parameters to cross-subject variation in these properties, for instance via local measurements of neurotransmitter concentrations48, or perhaps genetic polymorphism or pharmacological challenge.
We thank V. Litvak and G. Barnes for many helpful discussions, C. Stagg and K. Friston for comments on the manuscript, T. Nichols for assistance implementing 4D cluster-based permutation testing, and S. Braeutigam and A. Rao for help with data collection. This work was supported by the Wellcome Trust (L.T.H., N.S.K., M.W.W., T.E.J.B.), CONNECT (L.T.H., T.E.J.B.), the UK MRC (M.F.S.R.) and the UK EPSRC (M.W.W.). The project “CONNECT” acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 238292.
Author Contributions. L.T.H., T.E.J.B. and M.F.S.R. designed the experiment; L.T.H. and N.S.K. collected data; A.S. and L.T.H. built models and analysed model predictions; M.W.W. wrote code for source reconstruction; L.T.H., T.E.J.B., N.S.K. and M.W.W. analysed data; L.T.H., M.F.S.R. and T.E.J.B. wrote the paper. All authors discussed the results and commented on the manuscript.
Author Information. Reprints and permissions information is available at www.nature.com/reprints.
Supplementary information is linked to the online version of the paper at www.nature.com/neuro