

HHS Public Access; Author Manuscript; accepted for publication in a peer-reviewed journal.
J Neurosci. Author manuscript; available in PMC 2012 March 14.
Published in final edited form as:
PMCID: PMC3303166

Behavioral and neural properties of social reinforcement learning


Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based upon work in non-human primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging (fMRI). Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed that ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis (social preferences, response latencies, and modeled neural responses) are consistent with reinforcement learning theory and non-human primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one’s peers in altering subsequent behavior.

Keywords: social learning, reward, ventral striatum, fMRI, reinforcement, secondary reinforcer

Introduction


Successfully navigating our social environment relies upon learning from positive and negative encounters with others and shaping future behavior toward those individuals. Psychologists have proposed that positive social exchanges are fundamentally rewarding for humans (Bandura and Walters, 1963; Baumeister and Leary, 1995; Steinberg, 2008), suggesting that learning from social interactions may draw upon basic reinforcement learning mechanisms. The present study was designed to test this hypothesis by building upon reinforcement learning studies in non-human primates and human imaging studies (Schultz et al., 1997; Fiorillo et al., 2003; McClure et al., 2003; D’Ardenne et al., 2008).

Reinforcement learning from primary (e.g. food) and secondary reinforcers (e.g. money) has been shown to engage specific neural circuitry. In its simplest form it is explained by the classic Rescorla-Wagner model (Rescorla and Wagner, 1972). According to this model, learning to associate arbitrary cues with positive outcomes results in expectations of future positive outcomes in the presence of these cues. If there are discrepancies between the expected outcome to the cue and the actual outcome, a prediction error signal is generated. Non-human primate and human imaging studies have implicated the ventral striatum and orbital frontal cortex (OFC) in prediction error signaling (Schultz et al., 1997; Berns et al., 2001; Fiorillo et al., 2003; McClure et al., 2003). Studies have shown that as cues become reliably associated with receipt of a reward, manual responses to these cues quicken over time (O’Doherty et al., 2006; Spicer et al., 2007), while others demonstrate changes in choice behaviors based on reinforcement manipulations (Daw et al., 2006; Li and Daw, 2011). The learned association generates a neural signal to the cue that previously was associated with the reward itself (Schultz et al., 1997; O’Doherty et al., 2006). The current study examines whether similar changes in behavior (response latencies) and neural circuitry engaged during basic reinforcement learning are involved during learning within a social context.

This study tests the extent to which social reinforcement learning relies on similar learning mechanisms as those employed in basic reinforcement learning. To do so, we created a task in which participants learned to differentiate three peers, each of whom was associated with a unique probability of social reinforcement (i.e., providing socially accepting feedback). Social reinforcement learning processes were evaluated at three levels of analysis – preference ratings, response latencies, and neural responses to expected cue values and prediction errors. We hypothesized that social preference ratings would become more favorable, and response latencies would become faster, toward the peer with the greatest probability of providing social acceptance to the participant. We applied a simple Rescorla-Wagner rule in behavioral and functional imaging analyses to target the neural bases of these behavioral changes, hypothesizing that the ventral striatum and OFC would code prediction error signals (Schultz et al., 1997; O’Doherty, 2007). Thus, the current study elucidates neurobiological mechanisms for key learning processes during social exchanges that shape behavior through positive interactions.

Materials & Methods


Forty-six adults (22 females; aged 18–28 years) participated in the experiment. Thirty-six completed the task during functional magnetic resonance imaging (fMRI) (19 females, aged 18–28 years, all right-handed). Three individuals in the fMRI group were excluded due to less than 80% accuracy in any condition (n = 2, 1 male) or non-compliance with the task (n = 1, male). Participants reported no history of neurological and/or psychiatric disorders in a standard screening and the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID; First et al., 2007) and imaging participants reported no contraindications for an MRI. Two participants did not complete the SCID due to time constraints. All participants provided written consent approved by the Institutional Review Board at Weill Cornell Medical College and were debriefed and compensated following their participation.

Experiment Cover Story

The experiment was conducted during two separate sessions. The first session introduced the cover story leading participants to believe they would receive actual social feedback during a task that would be completed on the second visit. Participants were shown up to five photographs of gender- and ethnicity-matched peers. They then selected three with whom they would like to interact, and rated the three peers for how likeable and attractive they looked on a scale from 1 (not very) to 10 (very). Participants also completed a personal survey where they listed information about themselves (birthday, hometown, and favorite music, TV shows, books, quotes and activities). Participants were told that each of the three selected peers would see their survey over the next few days as well as the surveys of two other supposed participants. These three peers would write notes indicating a positive interest in the participant’s survey or in one of the other two surveys. Participants were told that each of these individuals could write a small number of notes, emphasizing their limited number and enhancing the positive value of receiving a note. Participants were then scheduled for a second session.

At the second session, participants were told that the experimenters had compiled the notes from the three selected peers. During the experiment, participants would be shown how often each of the peers decided to write notes to them (termed ‘positive social reinforcement’) or to one of the other supposed participants (termed ‘no positive social reinforcement’). Although it is possible that participants experienced the ‘no positive social reinforcement’ trials as mildly rejecting, we have chosen not to adopt this interpretation because we do not have conclusive data speaking to this possibility. Rather, these operational definitions were selected for consistency with studies of basic reward learning. At the beginning of the second session, participants were also reminded that receiving a note symbolized that the peer was interested in something written in their personal survey.

Unbeknownst to the participants, peer interaction (i.e. delivery of notes) was experimentally manipulated such that each of the three peers was associated with a distinct probability of social reinforcement (Figure 1A) with: 1) “Rare” interaction defined by positive social reinforcement (notes) on 33% of the trials and no positive social reinforcement on 66% of the trials; 2) “Frequent” interaction defined by positive social reinforcement on 66% of the trials and no positive social reinforcement on 33% of the trials; and 3) “Continuous” interaction defined by positive social reinforcement on all trials (100%). The probability of reinforcement associated with each of the face stimuli was counterbalanced across participants to equate for low-level stimulus features across conditions.

Figure 1
Task Parameters. A) Three peers chosen by the participant are associated with distinct probabilities of positive reinforcement. B) Schematic of one trial within a run. The face of one peer (Cue) is displayed for 2 seconds, during which the face stimulus ...

Task Parameters

At the start of each trial (Figure 1B), a picture of one of the three peers was presented for two seconds (Cue). During the two seconds, the stimulus would wink for 500 ms with either the left or right eye, indicating that a note was ready to be passed. Participants signaled that they were ready to receive the note by pressing one of two buttons indicating whether the wink was in the left or the right eye. This behavioral component was included to ensure attention and to collect reaction time data as an index of learning about the reinforcement contingencies for each of the three peers across the experiment. After a jittered inter-stimulus interval (2, 4, 6 or 8 sec), during which a picture of a folded note was displayed, three hands appeared at the bottom of the screen with one hand holding a note for two seconds (Feedback). Participants had been instructed that if the middle hand held the note, this signified that the participant had received a note from that peer (positive social reinforcement). If the note appeared in one of the hands to the left or right of the middle hand, this signified that the note was given to someone else (no positive social reinforcement). If the participant pressed incorrectly or did not respond during the cue, no feedback was given. A jittered inter-trial interval (2, 4, 6 or 8 sec) followed, during which participants rested while viewing a fixation crosshair. Participants viewed 18 trials per run in a pseudo-randomized order with six trials per condition (Rare, Frequent, Continuous) for six runs, for a total of 108 trials, 36 trials per condition. To enhance the believability of the cover story and keep participants engaged, one of the supposed “notes” was shown between runs; these notes were generated by the experimenters and always indicated positive interest in the participant’s personal survey (e.g. ‘I love playing soccer too, and I am part of a weekend league’; ‘Where did you go when you visited Hawaii?’; ‘I also have a golden retriever’).
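The trial structure described above can be sketched as a small schedule generator. This is an illustration only, not the E-Prime implementation used in the study. It assumes that the stated probabilities are realized exactly within each run (2, 4 and 6 reinforced trials out of 6), that wink side and jitter durations are drawn uniformly, and that a plain shuffle stands in for the unspecified pseudo-randomization. The names `make_run` and `make_session` are hypothetical.

```python
import random

# Assumption: 33%, 66% and 100% map onto 2, 4 and 6 reinforced trials per run.
REINFORCED_PER_RUN = {"Rare": 2, "Frequent": 4, "Continuous": 6}
TRIALS_PER_CONDITION_PER_RUN = 6
N_RUNS = 6
JITTER_SECONDS = (2, 4, 6, 8)  # used for both the ISI and the ITI


def make_run(rng):
    """Return one run as a shuffled list of 18 trial dictionaries."""
    trials = []
    for condition, n_reinforced in REINFORCED_PER_RUN.items():
        n_unreinforced = TRIALS_PER_CONDITION_PER_RUN - n_reinforced
        for outcome in [1] * n_reinforced + [0] * n_unreinforced:
            trials.append({
                "condition": condition,
                "reinforced": outcome,  # 1 = note in the middle hand
                "wink_side": rng.choice(["left", "right"]),
                "isi_s": rng.choice(JITTER_SECONDS),
                "iti_s": rng.choice(JITTER_SECONDS),
            })
    rng.shuffle(trials)  # stand-in for the paper's pseudo-randomization
    return trials


def make_session(seed=0):
    """Return six runs generated from a seeded random number generator."""
    rng = random.Random(seed)
    return [make_run(rng) for _ in range(N_RUNS)]


session = make_session()
all_trials = [trial for run in session for trial in run]
```

Under these assumptions each condition contributes 36 trials, of which 12 (Rare), 24 (Frequent) and 36 (Continuous) are reinforced.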

To further index learning with the reaction time data at the end of the experiment, participants completed a reversal run (18 trials) after the six experimental runs while reaction times were recorded. Contingencies were reversed for the Rare and Continuous conditions such that the “Rare” peer now provided 100% reinforcement to the participant, and the “Continuous” peer now provided 33% reinforcement to the participant. The “Frequent” peer’s probability (66%) did not change.

The task was presented using E-Prime software, and the participants who completed the task during fMRI viewed images on an overhead liquid crystal display (LCD) panel with the Integrated Functional Imaging System-Stand Alone (IFIS-SA) (fMRI Devices Corporation, Waukesha, WI). E-Prime software, integrated with the IFIS system, recorded button responses and reaction times using the Fiber Optic Button Response System (Psychology Software Tools, Inc, Sharpsburg, PA).

At the end of the experiment, participants completed posttest ratings of attractiveness and likeability for each peer on the same scale used at the beginning of the experiment. To assess whether participants held explicit knowledge of the social reinforcement contingencies associated with each peer, they were asked whether any of the three peers provided positive reinforcement more often than any others. If the participant said yes, they were asked to describe what pattern they noticed, and descriptions were scored based on whether the participant accurately stated which peer provided the most, middle and least positive social feedback. Three of the 43 participants correctly ranked the three peers in this way and were thus considered explicitly aware of the social reinforcement contingencies. Participants were then debriefed regarding the cover story and the rationale of the experiment.

Image Acquisition

Participants were scanned with a General Electric Signa HDx 3.0T MRI scanner (General Electric Medical Systems, Milwaukee, WI) with a quadrature head coil. A high resolution, 3D magnetization prepared rapid acquisition gradient echo anatomical scan (MPRAGE) was acquired (256 × 256 in-plane resolution, FOV=240 mm; 124 1.5 mm sagittal slices). Functional scans were acquired with a spiral in and out sequence (Glover and Thomason, 2004) (repetition time TR = 2000 ms, echo time = 30 ms, flip angle = 90 degrees). Twenty-nine 5-mm thick contiguous coronal slices were acquired per TR, for 129 TRs per functional run with a resolution of 3.125 × 3.125 mm (64 × 64 matrix, FOV = 200 mm) covering the entire brain except for the posterior portion of the occipital lobe.

Behavioral Analysis

Change in attractiveness and likeability of the peers before and after the task was tested with a 3 (probability: Rare, Frequent, Continuous) X 2 (time: before task, after task) repeated measures analysis of variance (ANOVA) using PASW Statistics 18 software (SPSS, Chicago, IL). Attractiveness and likeability ratings for three of the 43 participants were lost due to technical error.

Reaction times were analyzed in response to the cue after the wink occurred. Reaction times were z-score transformed for each individual after removing outliers (defined as reaction times 3 standard deviations above or below the individual’s mean reaction time). Changes in reaction times and accuracy for the three conditions during the early and late trials were each tested with a 3 (probability: Rare, Frequent, Continuous) X 2 (time: first half of trials (early), second half of trials (late)) repeated measures ANOVA.
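The outlier removal and z-scoring step can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' code: it uses the sample standard deviation, applies the 3 SD rule in a single pass, and z-scores with the mean and SD of the retained trials. The function name `zscore_reaction_times` and the example values are hypothetical.

```python
from statistics import mean, stdev


def zscore_reaction_times(rts, cutoff_sd=3.0):
    """Z-score one participant's reaction times after removing outliers.

    Trials further than cutoff_sd standard deviations from the participant's
    mean are returned as None; the remaining trials are z-scored using the
    mean and SD of the retained trials (an assumption, see the lead-in).
    """
    raw_mean, raw_sd = mean(rts), stdev(rts)
    keep = [abs(rt - raw_mean) <= cutoff_sd * raw_sd for rt in rts]
    retained = [rt for rt, ok in zip(rts, keep) if ok]
    kept_mean, kept_sd = mean(retained), stdev(retained)
    return [(rt - kept_mean) / kept_sd if ok else None
            for rt, ok in zip(rts, keep)]


# Made-up reaction times in ms: 21 ordinary trials and one very slow trial.
example_rts = [480, 500, 520] * 7 + [2000]
example_z = zscore_reaction_times(example_rts)
```

In this toy example the 2000 ms trial is flagged as an outlier, and the remaining 21 trials are rescaled to mean 0 and SD 1.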

To test for reaction time modulation as a function of contingency reversal, we compared reaction times from the sixth run of the experiment to the reversal run with a 2 (probability: Rare and Continuous) X 2 (time: 6th run and reversal run) repeated measures ANOVA.

Prior research has demonstrated that not receiving reinforcement on a given trial modulates behavioral responses on the next trial (Liu et al., 2007). To determine whether reinforcement outcome influenced response latencies on the subsequent trial, we compared reaction times from trials when the participant had received positive social reinforcement on the preceding trial versus when they had not with a paired samples t-test.
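The sequential comparison described above can be sketched in two steps: split each participant's trials by the outcome of the preceding trial, then compare the two per-participant means with a paired t statistic. The sketch assumes that "preceding trial" means the immediately preceding trial regardless of which peer was shown, which the text does not specify, and it does not model trials without feedback. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_rt_by_previous_outcome(z_rts, reinforced):
    """Return (mean RT after reinforcement, mean RT after no reinforcement).

    z_rts holds z-scored reaction times (None for excluded trials) and
    reinforced holds 1/0 outcomes for the same trials, in presentation order.
    """
    after_reinforced, after_unreinforced = [], []
    for t in range(1, len(z_rts)):
        if z_rts[t] is None:
            continue
        if reinforced[t - 1]:
            after_reinforced.append(z_rts[t])
        else:
            after_unreinforced.append(z_rts[t])
    return mean(after_reinforced), mean(after_unreinforced)


def paired_t(x, y):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1
```

In use, `mean_rt_by_previous_outcome` would be called once per participant, and the two resulting lists of means passed to `paired_t`.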

Reinforcement Learning Model

We used a simple reinforcement learning algorithm (Rescorla-Wagner) to model the trial-by-trial variance in participants’ reaction times (Rescorla and Wagner, 1972). The Rescorla-Wagner rule probes learning through a prediction error (PE) signal δ, which is the difference between the experienced outcome (R: positive social feedback or no positive feedback) and expected outcome (V) for each trial. PE takes the form δ_t = R_t − V_t and is then used to update the expected outcome, weighted by a fixed learning rate α: V_{t+1} = V_t + αδ_t for a given trial t. Reaction time has been shown in previous studies to be a reliable indicator of learning contingencies, and speeding or slowing in reaction times has been associated with conditioning as predicted by reinforcement learning models (Seymour et al., 2004; Bray and O’Doherty, 2007). We thus fitted the Rescorla-Wagner model to participants’ trial-by-trial z-score transformed reaction times using a linear regression model to derive the best-fitting model parameters (α & V0). We tested the rate of learning for each subject based on his or her individual reaction time history, which yielded an average learning rate (α) of 0.15 across participants, suggesting learning effects on reaction time measures (one-sample t-test of learning rate versus null hypothesis of 0: p < 0.001). The average learning rate of participants who completed the behavioral version of the experiment was comparable to the imaging sample (p > 0.3), suggesting consistency in our model.
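A minimal sketch of the update rule, and of one way the parameters could be fitted to reaction times, is given below. The text does not describe the fitting procedure in detail, so the grid search over α and V0, the separate value estimate per peer with a shared V0, and the residual sum of squares criterion are all assumptions. The function names are hypothetical, and excluded trials are assumed to have been dropped before fitting.

```python
def rescorla_wagner(cues, outcomes, alpha, v0):
    """Return per-trial expected values V_t and prediction errors delta_t."""
    values = {}  # one running value estimate per cue (peer)
    expected, errors = [], []
    for cue, outcome in zip(cues, outcomes):
        v = values.get(cue, v0)
        delta = outcome - v               # delta_t = R_t - V_t
        expected.append(v)
        errors.append(delta)
        values[cue] = v + alpha * delta   # V_{t+1} = V_t + alpha * delta_t
    return expected, errors


def residual_sum_of_squares(x, y):
    """RSS of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:  # constant predictor: only an intercept can be fitted
        return sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def fit_learning_rate(cues, outcomes, z_rts, alphas, v0s):
    """Grid search for the (alpha, V0) pair whose V_t best predicts the RTs."""
    best = None
    for alpha in alphas:
        for v0 in v0s:
            expected, _ = rescorla_wagner(cues, outcomes, alpha, v0)
            rss = residual_sum_of_squares(expected, z_rts)
            if best is None or rss < best[0]:
                best = (rss, alpha, v0)
    return best[1], best[2]


# Worked example: one peer, three trials, alpha = 0.5, V0 = 0.
values, errors = rescorla_wagner(["A", "A", "A"], [1, 1, 0], alpha=0.5, v0=0.0)
# values -> [0.0, 0.5, 0.75]; errors -> [1.0, 0.5, -0.75]
```

Because the regression slope is left free, the fit can capture reaction times that shorten as expected value increases, which is the direction reported in this study.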

Imaging Analysis

The fMRI data analyses were performed with Analysis of Functional Neuroimages (AFNI) software (Cox, 1996). Functional data were slice-time corrected, realigned within and across runs to correct for head movement, co-registered with the high resolution anatomical scan, scaled to percent signal change units, and smoothed with a 6 mm full-width at half maximum (FWHM) Gaussian kernel. Images with movement greater than 2 mm along the x, y, or z planes were excluded from the analysis. Functional data were transformed into standard Talairach coordinate space (Talairach and Tournoux, 1988) by using the warping parameters obtained from the Talairach transformation of the high resolution anatomical scan. Talairach-transformed functional data were resampled to a resolution of 3 × 3 × 3 mm.

For imaging analysis, we generated a linear reinforcement learning model with linear regression using reaction times of all participants to obtain a single set of signed model parameters (α & V0) that best fit participants’ behavior (r = 0.19, p < 0.001). This approach has been suggested to be less susceptible to extreme parameter value estimation for individual participants and tends to be more stable (Daw et al., 2006; Bray and O’Doherty, 2007; Li et al., 2011). The learning rate (α = 0.07) defined from modeling of the behavioral data was used to generate the PE and expected outcome values that were included as parametric regressors with signed numbers in individual-subject general linear models.

A general linear model analysis was performed to estimate neural responses to stimuli as a function of reinforcement learning. Each participant’s GLM contained five task regressors: 1) cue onset times, defined as the time points at which peer faces were presented; 2) a parametric regressor paired with cue timings containing expected value estimates for each trial (Vt); 3) feedback onset times, containing values corresponding to the time points at which the note feedback was presented; 4) a parametric regressor paired with feedback onset time representing prediction error values (δt); 5) incorrect trial onset times. Task regressors were convolved with a gamma-variate hemodynamic response function. Regressors of non-interest included motion parameters and linear and quadratic trends for each run. Separate random effects group analyses were conducted on individual participant beta estimates for the parametric regressor representing prediction error values (δt) and individual participant beta estimates for the parametric regressor representing expected values to the cues (Vt).
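How a parametric regressor of this kind is built can be sketched as follows: trial-wise values (for example V_t at cue onset or δ_t at feedback onset) scale a gamma-variate response placed at each onset, and the result is sampled once per TR. This is a simplified illustration and not the AFNI implementation. The shape parameters b = 8.6 and c = 0.547, the mean-centering of the trial values, the 16 s response window, and the treatment of events as instantaneous are all assumptions; the TR of 2 s and 129 volumes per run come from the acquisition description. Function names are hypothetical.

```python
from math import exp

TR_S = 2.0    # repetition time in seconds
N_TRS = 129   # volumes per functional run


def gamma_variate(t, b=8.6, c=0.547):
    """Gamma-variate response t**b * exp(-t/c), rescaled to peak at 1.

    The peak falls at t = b * c seconds. The values of b and c are assumed.
    """
    if t <= 0:
        return 0.0
    peak_time = b * c
    return (t / peak_time) ** b * exp(b - t / c)


def parametric_regressor(onsets_s, values, n_trs=N_TRS, tr_s=TR_S,
                         response_len_s=16.0):
    """Return one run's parametric regressor, sampled at each TR.

    Trial values are mean-centered (an assumption) so that the regressor
    captures modulation around the average response to the event.
    """
    center = sum(values) / len(values)
    regressor = [0.0] * n_trs
    for onset, value in zip(onsets_s, values):
        amplitude = value - center
        for k in range(n_trs):
            t = k * tr_s - onset
            if 0 < t <= response_len_s:
                regressor[k] += amplitude * gamma_variate(t)
    return regressor


# Two hypothetical cue events with expected values 0.2 and 0.8.
example = parametric_regressor([0.0, 40.0], [0.2, 0.8])
```

In the example, the below-average event produces a negative deflection and the above-average event a positive deflection of equal size.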

To test for basic effects of prediction error during the feedback presentation, a within-subjects voxel-wise one-sample t-test was performed to identify regions demonstrating activity that positively correlated with prediction error learning signals. To identify neural responses to expected values during the cue presentation of the trials, a within-subjects voxel-wise one-sample t-test was performed to identify regions showing activity that positively correlated with expected values to the cues. Results of all whole-brain analyses were considered significant by exceeding a p-value/cluster size combination (p < 0.005/50 voxels) that corresponded to whole-brain p < 0.05, corrected for multiple comparisons as calculated with Monte Carlo simulations in AFNI.
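The cluster-extent part of this criterion can be illustrated with a short sketch: given the voxels that pass the voxel-wise threshold, keep only connected clusters of at least 50 voxels. The Monte Carlo simulation that justifies the 50-voxel extent is not reproduced here. Treating voxels as connected only when they share a face is an assumption, since the connectivity rule is not stated, and the function name is hypothetical.

```python
from collections import deque

# Assumption: voxels belong to the same cluster only if they share a face.
FACE_NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def surviving_clusters(suprathreshold, min_voxels=50):
    """Return the connected clusters containing at least min_voxels voxels.

    suprathreshold is a set of (x, y, z) voxel indices that passed the
    voxel-wise threshold (p < 0.005 in the text).
    """
    unvisited = set(suprathreshold)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:  # breadth-first search from the seed voxel
            x, y, z = queue.popleft()
            for dx, dy, dz in FACE_NEIGHBORS:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in unvisited:
                    unvisited.remove(neighbor)
                    cluster.add(neighbor)
                    queue.append(neighbor)
        if len(cluster) >= min_voxels:
            clusters.append(cluster)
    return clusters


# Toy map: a 4 x 4 x 4 block (64 voxels) and a distant 3 x 3 x 3 block (27).
large = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
small = {(x + 10, y + 10, z + 10)
         for x in range(3) for y in range(3) for z in range(3)}
kept = surviving_clusters(large | small)
```

In this toy map only the 64-voxel block is retained, because the 27-voxel block falls below the extent threshold.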

As the OFC has been implicated in prior studies of prediction error learning (Berns et al., 2001; Takahashi et al., 2009), we selected this region as an a priori structure in which to search for learning signals. Specifically, an anatomical mask of the bilateral OFC was created, encompassing Brodmann areas 11 and 47 (x = −50: 51, y = 10: 57, z = −3: −23; 22,086 mm³ voxels). Group analyses conducted within this mask applied p < 0.05 small volume corrected (svc) statistical thresholding.

Results



Likeability and Attractiveness Ratings

Enhanced social preference for peers was modulated by the probability of reinforcement experienced during the experiment, as indicated by a significant interaction between reinforcement probability and time (pre-interaction, post-interaction) on likeability ratings (F(2,78) = 5.48, p < 0.01; Figure 2A). Post-hoc analyses indicated that post-task ratings decreased linearly with decreasing interaction probability, such that peers who interacted less with the participant were rated as less likeable (linear term F(1,39) = 7.17, p < 0.02). Whereas pre-task likeability ratings were equivalent for all three peers (p’s > 0.48), after the task the Frequent (t(39) = −2.26, p < 0.03) and Continuous (t(39) = −2.68, p < 0.02) peers were rated as more likeable than the rarely reinforcing peer, though there was not a significant difference in likeability ratings after the task between the Frequent and Continuous peers (t(39) = −0.69, p > 0.49). Attractiveness ratings were not significantly modulated by task conditions (main effects of reinforcement probability, time and interactions p’s > 0.09).

Figure 2
Behavioral responses to cues. A) Likeability ratings for the three peers before engaging in the task (pre-interaction) and after the task (post-interaction). B) Reaction times to the wink for the three peers, broken down by early and late trials of the ...


Participants responded correctly to 95.63% of trials (SD = 3.54%). Response accuracy was not significantly modulated by the task conditions (main effects of reinforcement probability, time and interactions p’s > 0.29).

Reaction Time

Response latencies to the cue varied as participants learned the reinforcement contingency outcomes associated with each peer, as indicated by a significant interaction between probability of reinforcement and time (F(2,84) = 3.98, p < 0.03; Figure 2B). Post-hoc t-tests showed that whereas there was no difference in reaction times in the early trials (all p’s > 0.41), individuals were faster during the late trials of the Frequent reinforcement condition (t(42) = 2.49, p < 0.02), as compared to the Rare reinforcement condition. There was a trend toward faster responses in the Continuous reinforcement condition (t(42) = 2.01, p < 0.06) than in the Rare reinforcement condition. Overall, participants were faster during the late versus early trials (F(1,42) = 15.21, p < 0.01) and there was no main effect of probability of social reinforcement when collapsing across time (F(2,84) = 1.43, p > 0.25).

To further test for the effects of learning, we compared reaction times to Rare and Continuous reinforcement before and after reinforcement contingencies were reversed at the end of the experiment. Evidence that participants had implicitly learned the contingencies was further supported by the interaction between time (6th run versus reversal) and reinforcement probability (rarely reinforcing versus continuously reinforcing) on reaction times (F(1,42) = 10.15, p < 0.01; Figure 2C). Post-hoc tests showed a significant reaction time speeding when the Rare condition switched to delivering Continuous reinforcement (t(42) = 3.13, p < 0.01). There were no main effects of reinforcement probability (F(1,42) = 2.77, p = 0.1) or time (F(1,42) = 1.35, p > 0.25). There was also no difference in the Frequent condition (unchanged during reversal) reaction times between the last run and the reversal run (t(42) = −1.48, p > 0.15).

Additionally, we examined how reaction times changed based upon feedback from the preceding trial as another index of how the reinforcement contingencies altered behavior. We found that participants were faster on the subsequent trial after not receiving positive reinforcement (mean z-score RT: −0.01, SD: 0.13) versus when they had received positive reinforcement (mean z-score RT: 0.07, SD: 0.1; t(42) = 2.86, p < 0.01).


Prediction Errors

As indexed by the prediction error parametric regressor, prediction error signals (δt) were positively associated with activity in the rostral anterior cingulate cortex, ventral striatum, anterior insula, and OFC (see Table 1 & Figure 3). The parametric values in the general linear model encompassed positive and negative prediction errors, demonstrating that the BOLD fluctuations in these regions tracked learning signals reflecting reinforcement expectancies. Together these findings delineate an orbital frontostriatal circuit showing significantly greater activity associated with the unexpected outcomes of either receiving or not receiving positive social reinforcement.

Figure 3
Brain regions reflecting positive correlations with prediction errors. A. Circles denote activity in the ventral striatum. Image threshold p < 0.05, whole brain corrected. B. Circle denotes activity in the lateral orbital frontal cortex. Image ...
Table 1
Brain regions reflecting positive correlations with prediction errors. Coordinates represent activation clusters exceeding p < 0.05, whole-brain corrected, thresholding and are listed in Talairach & Tournoux coordinate space.

Expected Cue Values

We also examined regions of the brain that positively correlated with learning to distinguish the faces of the peers based on their differential rates of positive social reinforcement (learned cue value). Specifically, group analysis of the cue phase of trials that tracked positively with modulations of expected value (Vt) identified greater activity in the rostral anterior cingulate cortex with larger expected value (see Figure 4). No other regions survived whole brain correction. No regions within the frontostriatal circuitry of interest demonstrated negative correlations with expected value (i.e. brain regions sensitive to lower expected values) at corrected thresholding.

Figure 4
Neural activity with positive correlations with learned cue value. Activity in the rostral anterior cingulate cortex reflects a positive correlation with expected values for the cues. Image threshold p < 0.05, whole brain corrected. Statistical ...

Discussion


Repeated social exchanges shape our behavior toward others. In this experiment, we examined how different probabilities of positive interaction from distinct peers rapidly influence social learning. Within a reinforcement learning framework, we developed a novel social paradigm and demonstrated that the neural systems engaged while forming social expectations are similar to those involved in basic reward learning. This overlap in neural circuitry and function is consistent with prediction error related learning and with our hypothesis that positive social interactions can serve as secondary reinforcers, taking on the attributes of primary rewards essential to survival (e.g., food).

Over the course of the experiment, participants learned to differentiate each of the cues (peers) by their distinct reinforcement outcomes. Specifically, one peer always provided positive social reinforcement, another one frequently provided positive social reinforcement, and the third rarely provided positive social reinforcement. Ratings of likeability changed from the beginning to the end of the experiment, with less reinforcing peers becoming less likeable, and more reinforcing peers yielding higher ratings of likeability by the end of the task. By asking participants to make a simple button response during the cue presentation, we tested whether speeding of response latencies (action tendencies) indexed learned associations between a given peer and their probability of providing positive social reinforcement. As expected due to the simplicity of the task, accuracy was at ceiling and there were no statistical differences in accuracy for the three peers.

In the current study, we observed faster responses to peers who provided positive social reinforcement more often, similar to studies where participants respond more quickly to cues that reliably predict receiving a primary or secondary reward (O’Doherty et al., 2006; Spicer et al., 2007). Measuring differences in reaction times to cues to index learning differs from reinforcement studies that use modulated choice behavior as an indicator of learning (Tanaka et al., 2004; Daw and Doya, 2006; Schonberg et al., 2007). Choice tasks index changes in explicit preferences or a participant’s strategy in maximizing reinforcement, while in the current study changes in responses are thought to index differences in approach behaviors that are based upon learning from a prior history of social feedback. In addition, participants showed faster reaction times after trials that did not provide positive social reinforcement. This finding is similar to studies that demonstrate improved performance on a trial that follows receiving punishment (Hester et al., 2010) or choosing to make a bet more often after losing money than winning money (Liu et al., 2007), though the present study did not assess strategic behavior directly. Taken together, the behavioral findings demonstrate that participants learned the reinforcement contingencies and thus provide an objective index of social learning.

The changes in likeability ratings and response latencies did not appear to be conscious behavioral choices. The majority (93%) of participants were unable to articulate the reinforcement patterns, suggesting little if any explicit awareness of the reinforcement contingencies. These findings demonstrate that social preferences and actions can be influenced after only brief encounters with peers and without conscious awareness. Such rapid changes highlight the influence of positive social interactions on effectively altering subsequent behavior.

The neural correlates of these behavioral changes draw upon the same neural circuitry as that implicated in reinforcement learning (Alexander et al., 1986; Haber and Knutson, 2010). Prediction error (δt) learning engaged the ventral striatum and orbital frontal cortex, similar to previous studies using single cell recordings (Schultz et al., 1997; Fiorillo et al., 2003; Sul et al., 2010) and human imaging studies with primary reinforcers such as juice (McClure et al., 2003; O’Doherty et al., 2003; D’Ardenne et al., 2008) and secondary reinforcers such as money or attractive faces (Bray and O’Doherty, 2007; Valentin and O’Doherty, 2009). Together, these findings support a role for the orbital frontostriatal circuit in generating learning signals from positive social reinforcement and provide a neural basis for how feedback during a social interaction is flexibly updated in order to inform subsequent social expectations.

The present study is distinct in its capacity to test whether registering violations in expectations of social acceptance draws upon basic mechanisms that support prediction error learning. Using a simple Rescorla-Wagner learning model, we show that violations in expected social interaction are tightly coupled with changes in ventral striatal activity. No prior studies to our knowledge have applied a classic reinforcement learning model in the examination of learning from social reinforcers. In the social domain, studies have modeled trial-by-trial decisions about charitable donations (Hare et al., 2010) or intentions to trust a partner during economic exchanges (King-Casas et al., 2005). Furthermore, the current paradigm is distinct from previous studies that compare social acceptance to rejection (Eisenberger et al., 2003; Somerville et al., 2006; Guyer et al., 2009), as it targets the process of learning from the social feedback, rather than comparing acceptance to rejection. Therefore, the present study offers a unique explanation for how we learn from positive social interactions.

Our finding that the striatum is sensitive to expectations about receiving social feedback converges with other work targeting the neural mechanisms of social learning. Recently, Harris and Fiske (2010) showed sensitivity in this region to violations in expectations about personality trait information, and others have shown that the striatum is sensitive to violations of social group norms (Klucharev et al., 2009; Phan et al., 2010) as well as to forming predictions about investors’ decisions (King-Casas et al., 2005; Phan et al., 2010). Our results complement these studies by demonstrating a neural mechanism for how prior positive interactions with others shape our expectations for future interactions. Given the increased sensitivity of the ventral striatum to appetitive stimuli during adolescence (Galvan et al., 2006; Somerville et al., 2010), as well as the greater influence of peers during adolescence (Spear, 2000; Gardner and Steinberg, 2005), this work raises the question of how peer interaction differentially impacts learning and behavior across development, and how this may be differentially represented in the brain. Accordingly, it would be interesting to explore whether adolescents show increased sensitivity during social learning relative to children and adults.

The expected values (Vt) of the cues tracked positively with activity in the rostral anterior cingulate cortex. Previous studies have shown that the rostral anterior cingulate cortex/medial prefrontal cortex is sensitive to cues that predict reward receipt (Tanaka et al., 2004; Knutson et al., 2005; Palminteri et al., 2009) and may play a role in general learning about the value of information and in using this information for future decisions (Rushworth and Behrens, 2008). Lesion studies in non-human primates have shown that this region is important for establishing patterns of social interest in other individual male or female macaques (Rudebeck et al., 2006). Human imaging studies have shown that this region is engaged when participants choose to approach peers relative to celebrities (Guroglu et al., 2008) and when they engage in a series of actions during live relative to recorded interactions (Redcay et al., 2010). Given these studies examining social value in the anterior cingulate cortex, and the extensive literature showing a general sensitivity of this region in monitoring response conflict (Botvinick et al., 1999; Botvinick et al., 2004), our findings suggest that learning social cue values drives changes in behavior that may differ from or conflict with the cognitive demands of the situation (e.g. task demands). Over the course of the experiment, this conflict may increase as behavior is modulated in response to changing expected values. Although the current study did not find that the orbital frontostriatal circuit was sensitive to expected values, the findings in the anterior cingulate cortex may suggest a role for this region in processing behavioral tendencies towards learned social cues. These findings thus offer insight into the neural processing of quick social decisions.


Our findings provide direct evidence for how brief, positive social interactions can significantly shape social learning across three discrete measures: social preferences, behavioral actions, and neural activity. After short interactions with others, social preferences and actions can be altered, highlighting the significance of social acceptance in biasing behavior. Moreover, we show that formal computational models of reinforcement learning apply to secondary reinforcement learning in the social domain. We demonstrate that the neural circuitry involved in forming prediction error signals about receiving social reinforcement, including the ventral striatum and orbital frontal cortex, overlaps with circuitry that subserves learning about other types of rewards (e.g. food or money). Overall, the findings suggest that similar mechanisms underlie basic reinforcement learning and our ability to rapidly and flexibly update our expectations during interactions with others, which enables us to effectively navigate the social environment.


We gratefully acknowledge the assistance of the resources and staff at the Biomedical Imaging Core Facility of the Citigroup Biomedical Imaging Center at Weill Cornell Medical College. This work was supported by NIDA R01 DA018879, NIDA T-32 training grant DA007274, the Mortimer D. Sackler family, and the Dewitt-Wallace fund.


The authors report no conflicts of interest.


  • Alexander GE, DeLong MR, Strick PL. Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu Rev Neurosci. 1986;9:357–381. [PubMed]
  • Bandura A, Walters RH. Social learning and personality development. New York: Holt, Rinehart & Winston; 1963.
  • Baumeister RF, Leary MR. The need to belong: desire for interpersonal attachments as a fundamental human motivation. Psychol Bull. 1995;117:497–529. [PubMed]
  • Berns GS, McClure SM, Pagnoni G, Montague PR. Predictability modulates human brain response to reward. J Neurosci. 2001;21:2793–2798. [PubMed]
  • Botvinick M, Nystrom LE, Fissell K, Carter CS, Cohen JD. Conflict monitoring versus selection-for-action in anterior cingulate cortex. Nature. 1999;402:179–181. [PubMed]
  • Botvinick MM, Cohen JD, Carter CS. Conflict monitoring and anterior cingulate cortex: an update. Trends Cogn Sci. 2004;8:539–546. [PubMed]
  • Bray S, O’Doherty J. Neural coding of reward-prediction error signals during classical conditioning with attractive faces. J Neurophysiol. 2007;97:3036–3045. [PubMed]
  • Cox RW. AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research. 1996;29 [PubMed]
  • D’Ardenne K, McClure SM, Nystrom LE, Cohen JD. BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science. 2008;319:1264–1267. [PubMed]
  • Daw ND, Doya K. The computational neurobiology of learning and reward. Curr Opin Neurobiol. 2006;16:199–204. [PubMed]
  • Daw ND, O’Doherty JP, Dayan P, Seymour B, Dolan RJ. Cortical substrates for exploratory decisions in humans. Nature. 2006;441:876–879. [PMC free article] [PubMed]
  • Eisenberger NI, Lieberman MD, Williams KD. Does rejection hurt? An FMRI study of social exclusion. Science. 2003;302:290–292. [PubMed]
  • Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science. 2003;299:1898–1902. [PubMed]
  • First MB, Spitzer RL, Gibbon M, Williams JBW. Structured Clinical Interview for DSM-IV-TR Axis I Disorders - Non-patient Edition (SCID-I/NP, 1/2007 revision). New York: Biometrics Research, New York State Psychiatric Institute; 2007.
  • Galvan A, Hare TA, Parra CE, Penn J, Voss H, Glover G, Casey BJ. Earlier development of the accumbens relative to orbitofrontal cortex might underlie risk-taking behavior in adolescents. J Neurosci. 2006;26:6885–6892. [PubMed]
  • Gardner M, Steinberg L. Peer influence on risk taking, risk preference, and risky decision making in adolescence and adulthood: an experimental study. Dev Psychol. 2005;41:625–635. [PubMed]
  • Glover GH, Thomason ME. Improved combination of spiral-in/out images for BOLD fMRI. Magn Reson Med. 2004;51:863–868. [PubMed]
  • Guroglu B, Haselager GJ, van Lieshout CF, Takashima A, Rijpkema M, Fernandez G. Why are friends special? Implementing a social interaction simulation task to probe the neural correlates of friendship. Neuroimage. 2008;39:903–910. [PubMed]
  • Guyer AE, McClure-Tone EB, Shiffrin ND, Pine DS, Nelson EE. Probing the neural correlates of anticipated peer evaluation in adolescence. Child Dev. 2009;80:1000–1015. [PMC free article] [PubMed]
  • Haber SN, Knutson B. The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology. 2010;35:4–26. [PMC free article] [PubMed]
  • Hare TA, Camerer CF, Knoepfle DT, Rangel A. Value computations in ventral medial prefrontal cortex during charitable decision making incorporate input from regions involved in social cognition. J Neurosci. 2010;30:583–590. [PubMed]
  • Harris LT, Fiske ST. Neural regions that underlie reinforcement learning are also active for social expectancy violations. Soc Neurosci. 2010;5:76–91. [PubMed]
  • Hester R, Murphy K, Brown FL, Skilleter AJ. Punishing an error improves learning: the influence of punishment magnitude on error-related neural activity and subsequent learning. J Neurosci. 2010;30:15600–15607. [PubMed]
  • King-Casas B, Tomlin D, Anen C, Camerer CF, Quartz SR, Montague PR. Getting to know you: reputation and trust in a two-person economic exchange. Science. 2005;308:78–83. [PubMed]
  • Klucharev V, Hytonen K, Rijpkema M, Smidts A, Fernandez G. Reinforcement learning signal predicts social conformity. Neuron. 2009;61:140–151. [PubMed]
  • Knutson B, Taylor J, Kaufman M, Peterson R, Glover G. Distributed neural representation of expected value. J Neurosci. 2005;25:4806–4812. [PubMed]
  • Li J, Daw ND. Signals in human striatum are appropriate for policy update rather than value prediction. J Neurosci. 2011;31:5504–5511. [PMC free article] [PubMed]
  • Li J, Delgado MR, Phelps EA. How instructed knowledge modulates the neural systems of reward learning. Proc Natl Acad Sci U S A. 2011;108:55–60. [PubMed]
  • Liu X, Powell DK, Wang H, Gold BT, Corbly CR, Joseph JE. Functional dissociation in frontal and striatal areas for processing of positive and negative reward information. J Neurosci. 2007;27:4587–4597. [PubMed]
  • McClure SM, Berns GS, Montague PR. Temporal prediction errors in a passive learning task activate human striatum. Neuron. 2003;38:339–346. [PubMed]
  • O’Doherty JP. Lights, camembert, action! The role of human orbitofrontal cortex in encoding stimuli, rewards, and choices. Ann N Y Acad Sci. 2007;1121:254–272. [PubMed]
  • O’Doherty JP, Buchanan TW, Seymour B, Dolan RJ. Predictive neural coding of reward preference involves dissociable responses in human ventral midbrain and ventral striatum. Neuron. 2006;49:157–166. [PubMed]
  • O’Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ. Temporal difference models and reward-related learning in the human brain. Neuron. 2003;38:329–337. [PubMed]
  • Palminteri S, Boraud T, Lafargue G, Dubois B, Pessiglione M. Brain hemispheres selectively track the expected value of contralateral options. J Neurosci. 2009;29:13465–13472. [PubMed]
  • Phan KL, Sripada CS, Angstadt M, McCabe K. Reputation for reciprocity engages the brain reward center. Proc Natl Acad Sci U S A. 2010;107:13099–13104. [PubMed]
  • Redcay E, Dodell-Feder D, Pearrow MJ, Mavros PL, Kleiner M, Gabrieli JD, Saxe R. Live face-to-face interaction during fMRI: a new tool for social cognitive neuroscience. Neuroimage. 2010;50:1639–1647. [PMC free article] [PubMed]
  • Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical Conditioning II: Current Research and Theory. New York: Appleton Century Crofts; 1972. pp. 64–99.
  • Rudebeck PH, Buckley MJ, Walton ME, Rushworth MF. A role for the macaque anterior cingulate gyrus in social valuation. Science. 2006;313:1310–1312. [PubMed]
  • Rushworth MF, Behrens TE. Choice, uncertainty and value in prefrontal and cingulate cortex. Nat Neurosci. 2008;11:389–397. [PubMed]
  • Schonberg T, Daw ND, Joel D, O’Doherty JP. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making. J Neurosci. 2007;27:12860–12867. [PubMed]
  • Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. [PubMed]
  • Seymour B, O’Doherty JP, Dayan P, Koltzenburg M, Jones AK, Dolan RJ, Friston KJ, Frackowiak RS. Temporal difference models describe higher-order learning in humans. Nature. 2004;429:664–667. [PubMed]
  • Somerville LH, Heatherton TF, Kelley WM. Anterior cingulate cortex responds differentially to expectancy violation and social rejection. Nat Neurosci. 2006;9:1007–1008. [PubMed]
  • Somerville LH, Hare T, Casey BJ. Frontostriatal Maturation Predicts Cognitive Control Failure to Appetitive Cues in Adolescents. J Cogn Neurosci. 2010. [PMC free article] [PubMed]
  • Spear LP. The adolescent brain and age-related behavioral manifestations. Neurosci Biobehav Rev. 2000;24:417–463. [PubMed]
  • Spicer J, Galvan A, Hare TA, Voss H, Glover G, Casey B. Sensitivity of the nucleus accumbens to violations in expectation of reward. Neuroimage. 2007;34:455–461. [PMC free article] [PubMed]
  • Steinberg L. A Social Neuroscience Perspective on Adolescent Risk-Taking. Dev Rev. 2008;28:78–106. [PMC free article] [PubMed]
  • Sul JH, Kim H, Huh N, Lee D, Jung MW. Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron. 2010;66:449–460. [PMC free article] [PubMed]
  • Takahashi YK, Roesch MR, Stalnaker TA, Haney RZ, Calu DJ, Taylor AR, Burke KA, Schoenbaum G. The orbitofrontal cortex and ventral tegmental area are necessary for learning from unexpected outcomes. Neuron. 2009;62:269–280. [PMC free article] [PubMed]
  • Talairach J, Tournoux P. Co-planar stereotaxic atlas of the human brain. New York, NY: Thieme Medical Publishers; 1988.
  • Tanaka SC, Doya K, Okada G, Ueda K, Okamoto Y, Yamawaki S. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nat Neurosci. 2004;7:887–893. [PubMed]
  • Valentin VV, O’Doherty JP. Overlapping prediction errors in dorsal striatum during instrumental learning with juice and money reward in the human brain. J Neurophysiol. 2009;102:3384–3391. [PubMed]