Despite the ubiquity of endogenous emotions and their role in both resilience and pathology, the processes supporting their generation are largely unknown. We propose a neural component process model of endogenous generation of emotion (EGE) and test it in two functional magnetic resonance imaging (fMRI) experiments (N=32/293) where participants generated and regulated positive and negative emotions based on internal representations, using self-chosen generation methods. EGE activated nodes of salience (SN), default mode (DMN) and frontoparietal control (FPCN) networks. Component processes implemented by these networks were established by investigating their functional associations, activation dynamics and integration. SN activation correlated with subjective affect, with midbrain nodes exclusively distinguishing between positive and negative affect intensity, showing dynamics consistent with the generation of core affect. Dorsomedial DMN, together with ventral anterior insula, formed a pathway supporting multiple generation methods, with activation dynamics suggesting it is involved in the generation of elaborated experiential representations. SN and DMN both coupled to left frontal FPCN which in turn was associated with both subjective affect and representation formation, consistent with FPCN supporting the executive coordination of the generation process. These results provide a foundation for research into endogenous emotion in normal, pathological and optimal function.
From melancholic reminiscence to joyful anticipation, we frequently experience emotions caused by internal mental processes, such as thoughts and memories (Killingsworth and Gilbert, 2010). Such endogenous emotion is described as richer and more intense than emotion elicited by external events (Salas et al., 2012) and is known to play an important role in affective psychopathology, such as depression (Nolen-Hoeksema et al., 2008) and anxiety (Freeston et al., 1996). There is also evidence that the endogenous generation of positive emotional states can be used as an effective means to regulate emotional reactions to external events (Engen and Singer, 2015), and the trait tendency to do this is a predictor of psychological resilience (Tugade and Fredrickson, 2004). Thus, understanding the psychological and neural mechanisms of endogenous generation of emotion (EGE) can yield important insight into normal, pathological and even optimal emotional function.
Despite this, research into EGE has been limited, stemming mainly from behavioral studies using EGE as a method to induce emotional states. This research shows that EGE can be occasioned by a range of information-processing modalities, including mental imagery and semantic analysis of emotional information (Vrana et al., 1986), interoception of bodily signals (Philippot et al., 2002) or recall of episodic autobiographical memories (Mayberg et al., 1999). It has also been shown that EGE can effectively occur when individuals immerse themselves in hypothetical scenarios (Wilson-Mendenhall et al., 2013a). This latter finding demonstrates the theoretically important point that EGE is not limited to reinstantiation of previously experienced emotional situations but can also simulate states appropriate for novel contexts. Indeed, emotions are frequently elicited by spontaneous cognition about future events (Ruby et al., 2013), suggesting that an important use of EGE is predicting the affective relevance of hypothetical future scenarios (Baumgartner et al., 2008). Although these studies were not focused on exploring EGE as a process in its own right, they show that multiple means (e.g. different strategies or different information modalities) can be utilized to generate emotional states, dependent on the representational content of the target emotional experience. Mirroring recent constructivist theories of emotion (Barrett and Wilson-Mendenhall, 2014; Russell, 2014), this suggests that a comprehensive account of EGE needs to distinguish (i) processes supporting the generation of the hedonic or core affective quality (Wilson-Mendenhall et al., 2013b) of an endogenous emotional experience from (ii) processes supporting the formation of representations of the context to which this affective state applies or from which it stems.
Importantly, this opens the possibility that the two are mechanistically distinct, with the neural systems supporting core affect generation varying as a function of hedonic qualities, while the systems supporting representation formation vary as a function of the specific implementation of the generation process.
Presently, neuroimaging studies of EGE using comparable protocols are limited, making evaluation of this hypothesis difficult. One exception is a series of early positron emission tomography (PET) experiments in which participants generated emotional states by volitionally recalling significant emotional experiences (Pardo and Raichle, 1993; Gemar et al., 1996; George et al., 1996; Reiman et al., 1997; Kimbrell et al., 1999; Mayberg et al., 1999; Damasio et al., 2000; Liotti et al., 2000). Considered in aggregate (Supplementary Figure S1), these studies suggest the involvement of three large-scale functional networks in EGE: (i) the default mode network (DMN; Raichle and Snyder, 2007), including ventromedial prefrontal cortex (vmPFC), posterior cingulate cortex (PCC), left temporoparietal junction (TPJ) and hippocampus (HC), (ii) the extended Salience Network (SN; Seeley et al., 2007), including anterior insula (AI), dorsomedial PFC (dmPFC) and structures in basal ganglia and midbrain and (iii) the frontoparietal control network (FPCN; Spreng et al., 2010; Laird et al., 2011; Spreng et al., 2013) centered on lateral and dorsomedial prefrontal and inferior parietal cortices. There is a notable overlap between this putative neural architecture and that known to support the construction of mental representations in general: DMN is associated with numerous forms of psychological processes involving simulation based on endogenous information (Spreng et al., 2009) and appears to be involved in the integration of information about a given topic into detailed episodic representations. Interestingly, DMN does not appear to support the initial generation of the representational core that these details pertain to (Addis et al., 2007). Rather, this initial generation is thought to involve the direct activation of domain-specific and task-relevant networks (Hassabis and Maguire, 2007). In the context of emotion, the SN is a likely candidate for such a network.
Composed of cortical (AI, dmPFC), limbic [amygdala (AMY), ventral striatum (VS)] and midbrain structures [periaqueductal gray (PAG), substantia nigra/ventral tegmentum (SN/VTA)], the SN is closely associated with the representation and generation of core affect and homeostatic regulation (Seeley et al., 2007; Lindquist et al., 2012). Interestingly, DMN and SN appear to be intrinsically anticorrelated (Buckner et al., 2013; Spreng et al., 2013), strengthening the claim that they support dissociable component processes in EGE. This anticorrelation also suggests that an intermediary network coordinates and maintains activation of the SN and DMN, pointing to the need for executive processes to coordinate and maintain the generation process. Possibly, the FPCN supports this role as it is known to support adaptive cognitive control, as well as interfacing with SN (Dosenbach et al., 2008), affording a pathway by which core affective states can be generated in a goal-directed fashion. Similarly, FPCN and DMN are known to couple during goal-directed internal mentation (Spreng et al., 2010) and to be implicated in the domain-general control of retrieval processes important for representation formation (Badre and Wagner, 2007).
To the degree that this functional-process architecture holds, an interesting question is how these processes interact over the course of a given EGE event. Addis et al. (2007) have shown that the construction of endogenous simulations of events involves distinct generation and elaboration phases, with the initial phase involving retrieval of the core semantic features of the representation and the subsequent elaboration phase involving the elaboration of the core information in question with details about the specific event. If this model holds for EGE, one would expect involvement of SN primarily in the early stages of the process, corresponding to core affect serving as a semantic anchor for later elaboration efforts. Plausibly, however, the opposite could be true, such that generation involves setting up representations of emotional situations, which in turn elicit core affective states (Kross et al., 2009). A major objective of the current investigation was to establish this relationship.
The objective of this study was to investigate this and to establish a comprehensive neural component-process architecture for EGE. Based on the earlier considerations, we expected EGE to be neurally implemented by DMN, SN and FPCN. Each of these networks was hypothesized to support dissociable functional component processes. Specifically, we suggest that SN supports the generation of core affective states that serve as a guide for the formation of detailed representations via processes instantiated in the DMN, resulting in an emotional experience. Finally, we propose that FPCN supports the executive maintenance and coordination of the generation process, coupling with both SN and DMN. Importantly, as we propose they form the functional core of EGE, we expect that these networks should partake in EGE irrespective of the hedonic quality of the emotional state or the precise means or modality used to generate it. We tested this model in two experiments (N=32 and N=293) with a newly developed paradigm aimed at maximizing ecological validity and generalizability of EGE. To ensure that participants generated comparable emotional states, they were anchored using a multimodal emotion induction procedure prior to scanning. This procedure elicited multiple markers of emotional states (semantic, visual, auditory and bodily) prior to scanning, avoiding artificially biasing participants’ implementation toward particular information modalities. To maximize ecological validity and task compliance, participants were instructed to implement EGE in the way they found most efficacious. Thus, in Experiment 1 participants were given complete freedom in how they generated emotions, while in Experiment 2 they were allowed to combine four generation modalities (Semantic Analysis, Episodic and Auditory Imagery and Bodily Interoception; i.e. the endogenous analogs to the modalities used in the induction procedure), in whichever way they found most effective.
Participants then completed a cue-based fMRI paradigm (Figure 1A). Trials consisted of a Generation phase and a Modulation phase. In the Generation phase, participants used their self-selected techniques to generate positive and negative emotional states or actively attempted to remain neutral. Thus, we could distinguish the neural correlates of general emotion generation, from those supporting generation of particular implementations of generation. In the subsequent Modulation phase, participants maintained this state (Maintain condition), actively suppressed it (Regulate condition) or simply ceased their generation efforts (Cease condition; Experiment 1 only). This approach enabled us to dissociate neural systems supporting different component processes based on their activation profiles. Finally, participants provided ratings of their affective states following each trial, allowing identification of the neural correlates of generation success.
For Experiment 1, 32 participants were recruited from an in-house participant database (15 female, mean age=30.3, range 21–51, SD=9). For Experiment 2, participants were recruited in the context of the large-scale longitudinal ReSource Project (see Supplementary Materials for screening procedure). Baseline data from this study were used. Three hundred and thirty-two participants were recruited for the ReSource Project, with 305 participants completing the current paradigm. Of these, five participants were excluded on account of missing auxiliary data (post-scan questionnaire, structural MRI) and technical difficulties. Four participants reported difficulties (e.g. nausea or sleepiness) during the scanning session and were dropped from analysis. From the sample of 296 with complete data, a further three participants were removed due to aberrant behavioral report and/or unacceptable data quality after preprocessing (>1 voxel movement,>5% corrupted time points, design VIF>2), leaving a final sample of 293 (170 female, mean age=40.4, range: 20–55, SD=9.3). All participants had normal or corrected to normal vision. The study was approved by the Ethics Committee of the University of Leipzig and Humboldt University, Berlin and was carried out in compliance with the Declaration of Helsinki. All participants gave written informed consent, were paid for their participation and were debriefed after the study was completed.
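The participant flow for Experiment 2 can be checked with simple arithmetic. The short sketch below uses only the counts stated above; the variable names are illustrative and not taken from the study materials.

```python
# Participant flow for Experiment 2, using only counts stated in the text.
completed_paradigm = 305          # completed the current paradigm
missing_data_or_technical = 5     # missing auxiliary data / technical difficulties
difficulties_during_scan = 4      # e.g. nausea or sleepiness
aberrant_or_low_quality = 3       # aberrant report and/or unacceptable data quality

complete_data = completed_paradigm - missing_data_or_technical - difficulties_during_scan
final_sample = complete_data - aberrant_or_low_quality

print(complete_data, final_sample)  # 296 293
```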
Before scanning, participants underwent an automated training procedure (see Supplementary Materials for details), including a multimodal emotion induction aimed at minimizing between-participant variance in implemented emotional states. In Experiment 2, participants were also instructed in the use of four generation modalities (Semantic, Episodic, Auditory and Bodily) and asked to select the degree to which they would use each in the following experiment according to their own preferences. Additionally, participants were shown a number of neutral stimuli (e.g. pictures of scenery) and instructed to actively attain the sort of neutral emotional state these stimuli evoked (see Supplementary Materials for details). Participants were instructed to attempt to attain such states during the Neutral condition, and also when requested to downregulate their erstwhile generated emotional states. After the scanning session, participants were debriefed. In Experiment 1, verbal debriefing was done with an experimenter. In Experiment 2, participants reported the degree to which they used each of the generation modalities using a nine-point Likert scale.
Each trial (Figure 1A) started with a 4–6 s white fixation cross indicating the start of trial. Then a 10 s Generation phase was entered, in which subjects were shown a colored symbol indicating which emotional state to generate (Red minus=Negative, Green plus=Positive, Blue 0=Neutral). This was followed by a 5 s Modulation phase where participants either maintained the generation of the emotional state or downregulated it so as to attain a neutral emotional state. In the Maintain condition, the instruction symbol remained the same as in the Generation phase. In the Regulation condition, the symbol changed to a Blue 0. Finally, in Experiment 1 we included a partial-trial condition where the instruction cue changed to a fixation cross (Cease condition; Experiment 1 only). For the Neutral condition the symbol did not change but remained a Blue 0. Thus, Experiment 1 consisted of a total of seven different conditions (Maintain Positive/Negative, Regulate Positive/Negative, Cease Positive/Negative and Neutral). Experiment 2 omitted the Cease condition and thus had a total of five conditions. Experiment 1 had two runs of five trials per condition (35 per run), while Experiment 2 had a single run of 10 trials per condition (50 total). Condition sequence was pseudorandomized, ensuring no direct repetitions of conditions occurred. Finally, a 5 s fixation cross was presented followed by a 5 s presentation of a continuous Visual Analog rating Scale ranging from ‘Extremely negative’ via ‘Neutral’ to ‘Extremely positive’ [range± 251 from the neutral point (0)]. Initial cursor position was jittered randomly around the Neutral point. Participants responded using a button box and the right hand index and middle finger. Participants were instructed to report their affective state as it was at the moment of report. Stimuli were back-projected using a mirror setup. 
Task setup was identical in both experiments, except for the omission of the Cease condition in Experiment 2 due to time constraints.
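One way to obtain a condition sequence without direct repetitions is sketched below. This is an illustrative assumption, not the procedure actually used in the study: the function name, the weighted-draw strategy and the fixed seed are all hypothetical. Only the condition labels and trial counts come from the text.

```python
import random

EXP1_CONDITIONS = [
    "Maintain Positive", "Maintain Negative",
    "Regulate Positive", "Regulate Negative",
    "Cease Positive", "Cease Negative",
    "Neutral",
]
# Experiment 2 omitted the Cease condition.
EXP2_CONDITIONS = [c for c in EXP1_CONDITIONS if not c.startswith("Cease")]


def pseudorandom_sequence(conditions, trials_per_condition, rng):
    """Return a trial order in which no condition directly repeats."""
    total = len(conditions) * trials_per_condition
    while True:  # restart from scratch on the occasional dead end
        remaining = {c: trials_per_condition for c in conditions}
        sequence = []
        while len(sequence) < total:
            # Candidates: conditions with trials left that differ from the last one.
            options = [c for c in conditions
                       if remaining[c] > 0 and (not sequence or c != sequence[-1])]
            if not options:
                break  # only the just-used condition is left; try again
            # Favor conditions with more trials left to make dead ends rarer.
            weights = [remaining[c] for c in options]
            choice = rng.choices(options, weights=weights)[0]
            remaining[choice] -= 1
            sequence.append(choice)
        else:
            return sequence


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
exp1_run = pseudorandom_sequence(EXP1_CONDITIONS, 5, rng)   # 35 trials per run
exp2_run = pseudorandom_sequence(EXP2_CONDITIONS, 10, rng)  # 50 trials in one run
```

Drawing in proportion to the number of remaining trials is a design choice that keeps restarts infrequent; plain rejection sampling of shuffled sequences would also satisfy the constraint.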
For both experiments, MRI data were acquired on a 3T Siemens Verio Scanner (Siemens Medical Systems, Erlangen, Germany) using a 32-channel head-coil. High-resolution structural images were acquired using a T1-weighted 3D-MPRAGE sequence (TR=2300 ms, TE=2.98 ms, TI=900 ms, flip angle=7°, iPat=2; 176 sagittal slices, FOV=256 mm, matrix size=240×256, 1×1×1 mm voxels; total acquisition time=5.10 min). For the functional imaging, we employed a T2*-weighted gradient EPI sequence that minimized distortions in medial orbital and anterior temporal regions (TR=2000 ms, TE=27 ms, flip angle=90°, iPat=2; 37 slices tilted ~30° from the AC/PC axial plane, FOV=210 mm, matrix size=70×70, 3×3×3 mm voxels, 1 mm gap). For Experiment 2, we acquired B0 field maps using a double-echo gradient-recalled sequence with matching dimensions to the EPI images (TR=517 ms, TE=4.92 and 7.38 ms).
Preprocessing was performed using a combination of SPM12 (r6225) functions and the ArtRepair toolbox (Mazaika et al., 2005) running on Matlab 2013b. Functional images were realigned (Experiment 1) or realigned and unwarped to additionally correct for distortion using B0 field maps (Experiment 2). ArtRepair procedures were then employed, including slice wise artifact detection and repair using interpolation (art_slice; 5% cutoff), time series diagnostics (art_global) identifying and repairing via interpolation volumes showing large global intensity fluctuation (>1.3%), volume-by-volume movement exceeding 0.5mm and overall movement (>3 mm) and despiking with a 5% signal change cutoff (art_despike). T1 structural images were registered to the mean realigned volume and segmented. Using DARTEL (Ashburner, 2007) procedures, functional images were normalized and smoothed with an isotropic kernel of 8 mm FWHM.
Individual-level models included separate sets of regressors for the Generation and Modulation phase. For the Generation phase, three regressors were specified corresponding to the emotional target (Positive, Negative and Neutral) of the trial. For the Modulation phase, separate regressors were specified for each condition. Thus, the model in Experiment 1 included seven regressors [Valence (Positive and Negative) * Modulation (Maintain, Cease and Regulate)+Neutral] for the Modulation phase, for a total of 10 regressors. The model in Experiment 2, where the Cease condition was omitted, included five regressors for the Modulation phase, for a total of eight regressors.
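The regressor counts follow directly from crossing Valence with Modulation and adding the Neutral condition. The sketch below enumerates the condition regressors only (parametric modulators, the Rating regressor and nuisance regressors are not included); the label strings are illustrative, not the names used in the actual design files.

```python
from itertools import product

VALENCES = ["Positive", "Negative"]

# Generation phase: one regressor per emotional target.
GENERATION = [f"Generation: {target}" for target in VALENCES + ["Neutral"]]


def modulation_regressors(modulations):
    """Valence x Modulation regressors plus one Neutral regressor."""
    crossed = [f"Modulation: {m} {v}" for m, v in product(modulations, VALENCES)]
    return crossed + ["Modulation: Neutral"]


exp1 = GENERATION + modulation_regressors(["Maintain", "Cease", "Regulate"])
exp2 = GENERATION + modulation_regressors(["Maintain", "Regulate"])

print(len(exp1), len(exp2))  # 10 8
```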
Regressors were convolved with canonical hemodynamic response functions (HRFs) with a 10 s (Generation) or 5 s (Modulation) duration, as well as regressors specifying parametric modulations by trial-wise subjective affect ratings. An additional regressor was specified for the Rating period. Movement parameters derived from the realignment step (six regressors), their derivatives and squared values were added (24 regressors). Potential physiological confounds were controlled for by adding four additional regressors reflecting volume-wise mean signal from white matter and cerebrospinal fluid, global signal and highest-variance voxel time course.
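A count of 24 movement regressors is consistent with taking the six realignment parameters, their temporal derivatives, and the squares of both sets. The sketch below illustrates that expansion under this assumption; the text does not spell out exactly which terms were squared, and the function name and the backward-difference derivative are hypothetical choices.

```python
def expand_motion(params):
    """Expand per-volume realignment parameters into 24 nuisance columns.

    params: list of volumes, each a list of 6 realignment parameters.
    Assumed column order: 6 parameters, 6 backward-difference derivatives,
    then the squares of those 12 values.
    """
    expanded = []
    previous = params[0]  # derivative of the first volume is set to zero
    for row in params:
        derivative = [cur - prev for cur, prev in zip(row, previous)]
        base = list(row) + derivative
        expanded.append(base + [value * value for value in base])
        previous = row
    return expanded


# Toy input: three volumes with a small shift in the first parameter.
example = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0, 0.0],
]
expanded = expand_motion(example)
print(len(expanded[0]))  # 24
```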
All second-level analyses were conducted using robust regression (Wager et al., 2005), with covariates of no interest coding elected arousal level, age and gender. Second-level models for Experiment 2 additionally included regressors coding self-reported generation modality usage (four regressors) as continuous covariates.
All results were corrected for multiple comparisons using cluster extent family-wise error rate (FWEc) correction at an alpha of P < 0.05, unless otherwise indicated. Cluster extents were estimated using Monte Carlo simulation and estimated intrinsic smoothness [3DClustSim and 3DFWHMx from the AFNI package (Forman et al., 1995)], as implemented in NeuroElf. Note that peak-forming thresholds were adapted for Experiments 1 (P < 0.001) and 2 (P < 0.00005) to account for differences in sample size. Correlational and mediation results also used a less strict peak threshold of P < 0.0005.
All analyses were masked with a gray matter template derived from the DARTEL-created template, thresholded at 95% gray matter probability, supplemented by hand-drawn masks of brainstem nuclei due to poor differentiation of white from gray matter in these regions.
In Experiment 2, we adopted a data-driven approach using constrained principal components analysis (CPCA; see Woodward et al., 2013 for details) of fMRI time series using the CPCA-fMRI package (www.nitrc.org/projects/fmricpca). CPCA analysis of fMRI data is a multivariate method that involves a singular value decomposition of BOLD time series to identify functional networks followed by an estimation of BOLD change in each network over peristimulus time as a function of experimental condition. Here, we used finite impulse response (FIR) modeling to identify task-specific functional connectivity networks based on the 15 bins (i.e. 30s, allowing for hemodynamic lag) following the onset of the generation cue. Importantly, using a FIR model allows hemodynamic response (HDR) profiles to be identified for each component separately, allowing the identification of task-relevant functional connectivity networks with dissociable temporal profiles. Finally, CPCA provides HDR estimates at the individual level, allowing the resultant predictor weights to be used to explore the correlates of individual differences in component activation.
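To illustrate the FIR coding described above, the following sketch builds a design matrix with one indicator column per peristimulus bin (15 bins at TR=2 s, i.e. 30 s). It is a minimal illustration of FIR modeling in general, not the CPCA-fMRI package's implementation; the function name and the toy onsets are assumptions.

```python
TR = 2.0      # seconds, from the EPI protocol
N_BINS = 15   # 15 bins x 2 s = 30 s of peristimulus time


def fir_design(onsets_s, n_scans, tr=TR, n_bins=N_BINS):
    """Return an n_scans x n_bins matrix of 0/1 FIR indicator columns.

    Column b is 1 at the scan lying b TRs after a cue onset, so each column
    estimates the response at one peristimulus time point without assuming
    a canonical HRF shape.
    """
    design = [[0] * n_bins for _ in range(n_scans)]
    for onset in onsets_s:
        first_scan = int(round(onset / tr))  # nearest scan to the cue onset
        for b in range(n_bins):
            scan = first_scan + b
            if scan < n_scans:
                design[scan][b] = 1
    return design


# Toy example: two generation cues at 10 s and 60 s in a 50-scan run.
design = fir_design([10.0, 60.0], n_scans=50)
column_sums = [sum(row[b] for row in design) for b in range(N_BINS)]
print(column_sums)  # each bin is sampled once per cue
```

In a full analysis one such block of columns would be built per condition, which is what allows a separate response profile to be estimated for each condition.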
To differentiate components of the generation network involved in generation using a specific modality from components involved in generation in general, we followed previous work aimed at identifying the large-scale networks supporting emotion regulation performance via mediation modeling (Denny et al., 2014). First, regions whose activation during generation of emotion (relative to neutral) correlated with reported usage of a given modality were identified using robust regression. Mediation effect parametric mapping as implemented in the M3 mediation toolbox (Wager et al., 2008) was then used to investigate modality-specific and modality-general pathways of emotion generation. We performed a whole-brain search for voxels whose activity during emotion generation (relative to the neutral baseline) showed a relationship with reported use of a given modality that was mediated by the activity in regions independently correlated with usage of that modality in a robust regression model. Statistics were assessed using the bootstrapping approach implemented in the M3 toolbox (10 000 samples).
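The logic of a bootstrapped mediation test can be sketched for a single path as follows. This is a simplified illustration on simulated data, assuming ordinary least squares and a percentile bootstrap; it is not the M3 toolbox's implementation, which is applied voxel-wise and used 10 000 samples here. All names and the simulated effect sizes are hypothetical.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """Indirect effect a*b for the path x -> m -> y (ordinary least squares)."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    a = sxm / sxx  # slope of m on x
    # Slope of m in the regression y ~ m + x (two-predictor closed form).
    b = (cov(m, y) * sxx - cov(x, y) * sxm) / (smm * sxx - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot, rng, alpha=0.05):
    """Percentile bootstrap confidence interval for the indirect effect."""
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)  # resample participants with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated data only: x = modality usage, m = mediator region, y = outcome voxel.
rng = random.Random(1)
n = 300
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]   # true a = 0.5
y = [0.7 * mi + rng.gauss(0, 1) for mi in m]   # true b = 0.7

ab = indirect_effect(x, m, y)
ci_low, ci_high = bootstrap_ci(x, m, y, n_boot=1000, rng=rng)
print(round(ab, 3), round(ci_low, 3), round(ci_high, 3))
```

A mediated relationship is inferred when the bootstrap interval for the product a*b excludes zero.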
The first objective of our analyses was to establish the overall neural architecture of EGE. To achieve this, we first sought to establish the validity of our experiment by investigating subjective and physiological indices of emotional states. Next, we contrasted combined positive and negative EGE with the neutral baseline, thereby identifying the overall neural basis of EGE. We next sought to test the component process mapping proposed in the introduction in two ways: first, based on the data from Experiment 1, we enacted a contrast-based decomposition, based on a model of the activation dynamics expected for each of the component processes. To complement this, we next performed a data-driven decomposition of the data from Experiment 2 using CPCA, to identify the functional networks central in EGE. Together, the results from these three analyses allowed a description of the overall network and functional subcomponents supporting EGE in general. Following on this, the second objective of the analyses was to differentiate general EGE networks from those supporting specific implementations of EGE, such as the generation of a particular valence, or using a specific modality. By investigating how subjective ratings for positive and negative generation parametrically modulated signal, we could differentiate regions activated in a valence-general manner from those specifically supporting the generation of positive or negative emotional states. Finally, by investigating the correlation of activation with reported usage of different modalities, we could identify specific regions supporting modality-specific implementation, and, using mediation analysis, identify the networks supporting EGE modality usage. Moreover, by comparing these networks we could differentiate parts of these networks supporting specific modalities from those supporting EGE in general.
Our first objective was to validate our experimental design, using a combination of behavioral and psychophysiological measures to ascertain that participants were able to generate and regulate emotional states as measured by subjective and objective markers of emotional arousal.
Post-trial ratings were analyzed using paired t-tests, reported in Table 1. Figure 1B shows subjective ratings in each condition for Experiment 1. Relative to the Neutral baseline condition, increased reports of corresponding affect were observed for both Maintain and Cease conditions. The Cease condition also showed significantly higher ratings for both positive and negative affect compared with their respective Maintain conditions. Finally, regulation resulted in decreased ratings for both positive and negative emotion relative to their respective Maintain conditions. Figure 1C shows subjective ratings as a function of condition for Experiment 2. Relative to the neutral baseline condition, increased reports of corresponding affect were observed for both positive and negative Maintain conditions. Regulation conditions also showed decreased ratings for both positive and negative affect, relative to their respective Maintain conditions. These results demonstrate that participants were subjectively able to generate and regulate endogenous emotional states of both positive and negative valence in both experiments. Importantly, they also show that, while generated emotional states decay without active maintenance, they remain subjectively significant for at least a short time following generation, consistent with the representation of the emotional state persisting even without active generation efforts.
We next sought to establish whether participants’ generation efforts also elicited objective emotional arousal responses. To this end, we concurrently assessed elicited skin conductance levels (SCL) in Experiment 2 (see Supplementary Materials for details on data acquisition and preprocessing). Two hundred and twenty-five recordings had acceptable data quality and were used to investigate the impact of generation instructions on objective measures of emotional arousal, as well as their interaction with subjective ratings. As SCL is the most frequently reported measure in investigations of exogenously induced emotional states (Kreibig, 2010), an interaction would suggest that the elicited states can be construed as bona fide emotional states and that behavioral ratings can be taken as a proxy for emotional arousal. Using linear mixed modeling of trial-wise SCL responses during the Generation period, we predicted the trial-wise log-transformed estimates of SCL measured in microsiemens (μS) using a subject-level random intercept model. The model further included a factorial fixed effect for condition (Generate Positive, Generate Negative, Neutral) and a continuous fixed covariate for scaled trial-wise ratings of subjective affect. To control for potential learning/fatigue effects, trial number was entered as a nuisance covariate (for more detail on the effect of fatigue in the current experiment, please see Supplementary Analyses). This analysis revealed a main effect of Condition [F(2, 11012.639) =3.155, P<0.05], Rating [F(1, 11013.700) =4.625, P<0.05], as well as a Condition * Rating interaction [F(2, 11014.119) =17.815, P<0.001].
Bonferroni corrected t-tests (Figure 1D) were performed to clarify the main effect of Condition, showing that, relative to the neutral (mean=1.205, SE= 0.32) baseline condition, higher SCL levels were observed for both negative [mean=1.267, SD= 0.032; paired t-test: t(224)= 4.44, P<0.001] and positive [mean=1.268, SD=0.032; paired t-test: t(224)= 5, P <0.001] emotion generation conditions. Closer investigation of the Condition * Rating interaction showed that it consisted of a significant difference in the slopes of the rating effect between negative and positive generation [t(8770.515)=5.63, P < 0.0001]. Specifically, SCL had a negative relationship with ratings [t(4278.64) = −3.71, P < 0.001] during negative generation and a positive relationship [t(4278.64) = 4.16, P < 0.001] during positive generation. Corresponding to the bipolar scale used (Figure 1A), this shows that SCL levels increased with stronger affect ratings for both positive and negative emotion (Figure 1E). These results show that participants were capable of generating both positive and negative emotional states, as measured by both subjective and objective indices of emotional arousal, and these indices were correlated, such that behavioral report corresponded to objective physiological arousal.
Finally, we sought to explore what kind of emotional states participants elected to generate. During debriefing, participants in Experiment 2 were asked whether they generated high or low arousal exemplars of positive and negative emotional states after the experiment. Thirty-nine percent of participants reported generating high arousal positive emotional states, like joy or happiness, with the complementary 61% generating low arousal positive emotion like calmness or caring. Similarly, 29% of participants reported generating high arousal negative emotions like fear or anger, while 71% reported generating low arousal states like sadness or melancholia. All subsequent analyses in Experiment 2 control for this between-subject variance.
Our next objective was to establish whether our hypothesized three-network architecture of EGE was in evidence. To identify the neural correlates of emotion generation, we contrasted the combined Generation and Maintenance periods for both positive and negative affect generation with the Neutral baseline condition, with one-sample t-tests performed using robust regression (Wager et al., 2005). For Experiment 1, a primary cluster-forming threshold of P < 0.001, T>3.38 was used. In Experiment 2, a more stringent threshold of P < 0.00005, T>3.95 was used for the primary contrasts to account for the increased power. Using Monte Carlo simulation (Forman et al., 1995), cluster thresholds were determined to be k>40 and k>10, respectively, for FWEc α < 0.05.
In Experiment 1 (Figure 2A and Supplementary Table S1), we observed activation in core nodes of the DMN [vmPFC, PCC, left TPJ, left middle temporal gyrus (MTG) and HC] and SN [AI, dmPFC, including pre-supplemental motor area (pre-SMA) and dorsal anterior cingulate cortex (dACC)]. Activation was also observed in nodes of the SN most closely associated with hedonic processing (VS, SN/VTA), as well as cerebellar regions. Deactivations were observed in right FPCN, in addition to inferior temporal gyrus (ITG) and superior occipital gyrus. In Experiment 2 (Figure 2B and Supplementary Table S1), we observed activation and deactivation patterns substantially similar to Experiment 1, albeit markedly stronger, consistent with the increased power in Experiment 2 (N=293). Additional activation was also observable in the frontal portions of the left FPCN [bilateral inferior (IFG) and middle (MFG) frontal gyrii]. Stronger activations were observed in midbrain, including both SN/VTA and PAG, as well as hypothalamus, thalamus, basal ganglia and ventral AI. Again, deactivations centered on right FPCN and occipital regions.
These results replicate previous work and support our contention that the DMN, SN and FPCN are key components in the neural architecture supporting EGE. They also expand on that work by demonstrating that this relationship holds for EGE as it is freely implemented in the population. Finally, they suggest that EGE additionally involves the active suppression of right frontoparietal and occipital regions, explainable by the known deactivation of these regions in internally focused processing (Andrews-Hanna et al., 2014).
Differentiating the initial generation of endogenous states from their subsequent elaboration into experiential representations is commonly done by having participants report the moment at which they subjectively experience having completed the generation process (e.g. retrieval of core semantic information about an event) and begin the process of elaborated mental simulation by adding details about the context (Addis et al., 2007, 2009). In the context of EGE, achieving a similar subjective differentiation is difficult, since the emotional experiences are inherently situated in a given context (Wilson-Mendenhall et al., 2013a). We therefore took a model-based decomposition approach to test our proposed component process structure. We reasoned that the 10-second Generation period of each trial should include activation of all constituent component processes (i.e. generation, elaboration and maintenance), and that regions supporting these processes should be distinguishable by their activation dynamics in different conditions during the Modulation phase. By masking out activation attributable to either maintenance or elaboration from the 10-second Generation period, one should therefore be left with regions involved exclusively in the initial generation of emotional experiences.
Figure 3A schematically illustrates the hypothesized activation dynamics for each component as a function of condition. Specifically, we hypothesized that regions supporting the initial generation of the affective state should show early and phasic activation corresponding to their involvement in the generation of the affective core of the experiences. Importantly, they should also be largely unaffected by modulation efforts, as these should target the neural substrates of representation of the emotional experience rather than those involved in generation (Gross et al., 2011). Conversely, regions supporting the elaborated representation of emotional experiences should be affected by modulation efforts and be mainly in evidence in the later part of the trial. Moreover, given that emotional experiences tend to persist over time (Buchanan, 2007; Verduyn et al., 2009), it should be possible to dissociate the neural substrates of the elaborated mental simulation from those supporting the active maintenance of the generation process by contrasting activation in the Cease condition with the Maintenance condition (in the Modulation phase only). For these analyses alone, we used a more lenient threshold of P < 0.005 (uncorrected), k>10, due to the lower power of the component contrasts.
To begin, component contrasts were calculated according to our process dynamic logic (Supplementary Figure S2 and Table S4). The Generate>Neutral contrast for the Generation period defined the neural reference space for EGE overall (Supplementary Figure S2A). Regions involved in the elaborated representation of emotional experiences were identified by contrasting the average of the Cease and Maintain conditions with the Regulate condition (Supplementary Figure S2B). This was done because both Cease and Maintain conditions were associated with elevated subjective emotional experiences relative to both Neutral and Regulate conditions (Figure 1B). We opted to use the Regulate rather than the Neutral condition since the Regulate condition actively suppresses emotional representations. Finally, regions involved in effortful maintenance of EGE were identified by the Maintain>Cease contrast (Supplementary Figure S2C). Next, the maintenance and representation contrasts were inclusively masked with the Generate>Neutral contrast to ensure that all activations were associated with EGE. To ensure orthogonality of maintenance and representation processes, these were mutually masked (overlapping regions are reported in Supplementary Figure S2D). Finally, the Generate>Neutral contrast was exclusively masked with both maintenance and representation process contrasts, leaving only activation not attributable to either of the two.
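The order of these masking steps can be illustrated with a toy example. The sketch below is not the analysis pipeline itself. It assumes that each thresholded contrast is reduced to a set of suprathreshold voxel indices, and all indices and variable names are hypothetical.

```python
# Hypothetical suprathreshold voxel indices for each contrast (toy data).
generate = {1, 2, 3, 4, 5, 6, 7, 8}   # Generate > Neutral, Generation period
representation = {3, 4, 5, 9}         # mean(Cease, Maintain) > Regulate
maintenance = {5, 6, 7, 10}           # Maintain > Cease


def decompose(generate, representation, maintenance):
    """Apply the inclusive, mutual and exclusive masking steps in order."""
    # Inclusive masking: keep only voxels inside the EGE reference space.
    rep_in = representation & generate
    maint_in = maintenance & generate
    # Mutual masking: voxels present in both maps are set aside as overlap.
    overlap = rep_in & maint_in
    rep_only = rep_in - overlap
    maint_only = maint_in - overlap
    # Exclusive masking: generation is what remains of the reference space.
    gen_only = generate - representation - maintenance
    return gen_only, rep_only, maint_only, overlap


gen_only, rep_only, maint_only, overlap = decompose(
    generate, representation, maintenance
)
print(sorted(gen_only), sorted(rep_only), sorted(maint_only), sorted(overlap))
```

In this toy case, voxels 9 and 10 drop out at the inclusive step because they lie outside the reference space, and voxel 5 is assigned to neither process because it appears in both maps.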
The results from this masking approach are reported in Figure 3B and Supplementary Table S3. Generation was primarily associated with activation of the extended SN, including left dorsal AI, dACC, basal ganglia and brainstem regions, in addition to known mnemonic structures such as temporal pole and HC. Maintenance, conversely, uniquely activated nodes of FPCN, including left IFG, PMC and pre-SMA, in addition to occipital regions and subgenual ACC. Finally, elaborated representation uniquely activated large portions of the DMN (PCC, vmPFC, TPJ and MTG) and thalamus, in addition to the ventral AI, nucleus accumbens (NACC) and AMY—all regions traditionally thought to subserve core affect processing (Lindquist et al., 2012).
As a final step, we verified that our approach indeed identified regions with appropriate temporal dynamics by extracting FIR-fitted time courses using MarsBar for select regions in each contrast. This revealed strong correspondence between observed and hypothesized dynamics (Figure 3C).
These results conform to our hypotheses, showing that EGE is supported by at least three separable component processes, and that these roughly overlap with each of the three core intrinsic networks observed in our main contrasts. Specifically, FPCN appears primarily to support the active maintenance of generation efforts, while DMN primarily supports the representation of the generated states, as evidenced by it being the primary target for downregulation, as well as by it remaining active even in the absence of generation efforts. Finally, the cortical and midbrain aspects of SN selectively responded in a manner consistent with being involved in the initial generation of emotional states. Interestingly, we found that several regions in the limbic subcomponent of the SN (vAI, AMY and NACC) responded in a manner consistent with them supporting elaborated representations, suggesting that midbrain and limbic components of SN differ in their functional contribution to EGE.
Having found evidence consistent with our hypothesized component process mapping, our next step was to (i) establish the functional significance of these networks and (ii) investigate their dynamic interaction during EGE. To address this, we used CPCA (Woodward et al., 2013; Lavigne et al., 2015) in our larger sample in Experiment 2. Briefly, CPCA is a data-driven multivariate method that combines multiple regression with PCA to identify components of mutually correlating voxels, i.e. functional networks, involved in a task based on their specific task-related activation dynamics (see Experimental Procedure for detail). As CPCA provides individual-level estimates of activation of each network component, we could identify networks specifically predicting individual differences in EGE efficacy, thereby establishing both their involvement and their functional significance. Finally, as CPCA does not enforce spatial orthogonality on components, it allows the identification of regions partaking in multiple network components with differing temporal dynamics, as would be expected if, as hypothesized, FPCN coupled to both SN and DMN.
Eigenvalue plots indicated that six components should be extracted using the scree criterion. To differentiate components supporting general task processes (e.g. sensory processing, motor responses) from those specifically supporting EGE, we calculated the component-specific AUC of loadings in the task window (4–22s post-stimulus; allowing for 4–6s hemodynamic lag) for the Maintain condition, subtracting the Neutral baseline. This yielded individual-level estimates of overall component activation during EGE, which were orthogonalized and entered into a multiple regression model predicting individual differences in self-reported generation success (i.e. average affect ratings in the Maintain condition only). Individual differences in component activation explained a significant amount of variance in generation success [F(6, 286) = 3.124, P < 0.01, R² = 0.062], with two components directly predicting generation success. To interpret these, loading maps were thresholded at the dominant 10% of component loadings with k>30 (Lavigne et al., 2015). The first component [β = 0.153, t(292) = 2.675, P < 0.01; Figure 4A] included central nodes of the DMN (vmPFC, PCC, left TPJ) and FPCN (bilateral BA47, BA45, MFG and PMC), as well as VS (NACC, caudate) dACC, SMA/pre-SMA, mid cingulate cortex (MCC), bilateral superior temporal gyrus/transversal gyrus (STG/TRANS), left MTG and right somatosensory cortex, similar to the Representation network identified earlier. This similarity extended to activation dynamics, showing both early and sustained activation in the Modulation phase for the Maintain condition and evidence of suppression in the Regulate condition. The second component [β = 0.146, t(292) = 2.574, P = 0.01; Figure 4B] included cortical nodes of SN (bilateral AI, pre-SMA) as well as portions of the FPCN (bilateral BA47, left BA45, angular gyrus and MFG), plus thalamus, occipital cortex and superior cerebellum.
Dynamics closely resembled the Generation network identified earlier, with no observed difference between Maintain and Regulate conditions. Conjoining the individually thresholded component maps (Figure 4C and Supplementary Table S5), we found that left lateral FPCN regions (IFG, BA45/45, MFG), as well as pre-SMA and dACC were part of both components, in addition to thalamus and retrosplenial cortex. Notably, this closely overlaps with the Maintenance network identified in our model-based analyses. These results expand on our earlier model-based approach, establishing the unique functional significance of our three candidate networks in EGE. Further, they demonstrate that FPCN coupled with both SN and DMN, while the latter two did not show evidence of coupling with each other, supporting the hypothesis that FPCN coordinates activation of these two networks during EGE.
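The individual-level activation estimate described above (AUC in the 4–22s task window for the Maintain condition, with the Neutral baseline subtracted) can be sketched as follows. This is an illustration under stated assumptions rather than the CPCA implementation: it assumes one estimated time course per condition sampled in 2s bins, uses trapezoidal integration, and takes the difference between the two AUCs. The time courses and the function name `window_auc` are hypothetical.

```python
def window_auc(times, values, start=4.0, end=22.0):
    """Trapezoidal area under `values` for time points within [start, end] s."""
    pts = [(t, v) for t, v in zip(times, values) if start <= t <= end]
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for (t0, v0), (t1, v1) in zip(pts, pts[1:])
    )


# Hypothetical time courses for one participant and one component (2 s bins).
times = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
maintain = [0.0, 0.1, 0.4, 0.8, 1.0, 1.0, 0.9, 0.9, 0.8, 0.8, 0.7, 0.6, 0.3]
neutral = [0.0, 0.1, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.1]

# Baseline-corrected component activation during EGE for this participant.
ege_activation = window_auc(times, maintain) - window_auc(times, neutral)
print(round(ege_activation, 3))
```

One such value per participant and component would then serve as a predictor of generation success in the regression model.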
Having found evidence for both the functional significance and dissociability of our putative EGE networks, we next sought to establish the neural implementation of core affect generation and representation formation. Overall success at generating emotions was evaluated in the debriefing session of Experiment 2 on a 0–9 scale ranging from completely unsuccessful to completely successful. Participants reported significantly higher success at generating positive (mean=5.99) than negative (mean=5.54) emotions [paired t-test: t(292)=4.11, P < 0.001]. To avoid biasing results by potential effort or success effects, we therefore combined positive and negative valence conditions, and instead focused on trial-wise parametric modulation of activation as a function of ratings. Using robust regression, we performed one sample t-tests on parametric modulation maps separately for positive and negative trials, averaging across all conditions involving emotion generation for both Generation and Modulation periods (Supplementary Table S5). To differentiate regions supporting emotion generation success in general from valence-specific regions, we conjoined the resulting FWEc thresholded maps (Figure 5). In Experiment 1, this revealed valence-general modulation in the basal ganglia, including putamen and caudate body, while positive ratings uniquely modulated caudate head/NACC and negative ratings uniquely modulated left dorsal AI and pre-SMA. In Experiment 2, valence-general modulation was more extensive, including left frontal portions of the FPCN, particularly IFG and MFG, as well as key nodes of the SN (dorsal AI, dACC and pre-SMA) in addition to thalamus. Valence-specific modulation of activation was observed in caudate head/NACC and SN/VTA for positive affect ratings, while modulation of deactivation was observed in occipital and right lateralized frontal regions overlapping with the deactivated regions reported in the main contrast.
Negative affect ratings modulated activation in right dorsal AI and PAG. These results show that activation levels of FPCN and SN support successful emotion generation. Moreover, midbrain portions of the SN were shown to be recruited in a valence-specific fashion, consistent with the known association of these regions with domain general hedonic processing (Kringelbach and Berridge, 2009; Buhle et al., 2013). This supports our hypothesis that SN is particularly important for the generation of the affective core of EGE. Further, in line with our CPCA analysis, we find that frontal FPCN supports a general role in the generation of core affective states, possibly associated with the initiation of the generation process.
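As an illustration of how trial-wise ratings can enter a parametric modulation analysis, the sketch below mean-centres one participant's ratings so that the modulator captures rating-related variation around the average trial response. Mean-centring is a common convention for parametric modulators and is assumed here. Whether ratings were centred in this way is not stated in this section, and the ratings and function name are hypothetical.

```python
def mean_centered_modulator(ratings):
    """Return trial-wise ratings with the participant's mean removed."""
    mean = sum(ratings) / len(ratings)
    return [r - mean for r in ratings]


# Hypothetical affect ratings for six trials of one participant.
ratings = [2, 5, 3, 6, 4, 4]
modulator = mean_centered_modulator(ratings)
print(modulator)
```

Each modulator value would scale the modeled response on the corresponding trial, so that a positive weight on the modulated regressor indicates activation that increases with reported affect.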
Finally, we sought to identify the neural bases of representation formation. To this end, participants in Experiment 2 were constrained to use four specific generation modalities: (i) Semantic Analysis, involving the use of verbalized affective thoughts, (ii) Episodic Imagery, involving the generation of visual emotional imagery, (iii) Auditory Imagery, involving the generation of affective soundscapes and (iv) Bodily Interoception, involving focus on and interpretation of bodily signatures of emotional states (for precise instructions, see Supplementary Materials). These modalities corresponded to our multimodal induction procedure, ensuring participants were equally primed to use each of them, and have clear analogs in daily life [e.g. thinking self-deprecating thoughts (semantic), remembering or anticipating an emotional event (episodic), humming a sad song (auditory) or noticing a dry mouth and racing heart when making a presentation (bodily)]. Finally, participants were allowed to freely combine these modalities in whichever way they found best enabled them to generate emotional states. These four modalities can be combined to different degrees in an intuitive fashion (e.g. internal affective monolog combined with a concrete emotional episode where a specific song was playing), ensuring variance in the combinations utilized by participants.
Post-scan self-reports of the degree to which participants used each of these modalities (Figure 6B) showed that the Episodic modality was the most used (40%), followed by Semantic (24%), Bodily (21%) and Auditory (15%). Entering the degree to which each participant reported using each modality as covariates in our main robust regression analysis (Generate and Maintain>Neutral), we could identify the neural correlates of modality usage (Figure 6C). Due to the noisy nature of self-report, we used a more lenient cluster-forming threshold of P < 0.0005, T>3.32 (k>42, FWEc P < 0.05). This revealed that use of the Semantic modality was correlated with activation of the left MTG, corresponding to the inferior border of Wernicke’s area, as well as a region in the left dorsal frontal cortex. Use of the Episodic modality was correlated with signal in the anterior superior PCC, a region known to be a part of the mnemonic subsystem of the DMN (Andrews-Hanna et al., 2014). Use of the Bodily modality was correlated with signal in bilateral dorsal and mid-AI, a region known as interoceptive cortex representing bodily signals (Craig, 2011). As no significant correlations were found for the Auditory modality, we did not explore this further.
We next identified the extended neural pathways by which these regions influenced the generation network as a whole, using mediation analysis (Wager et al., 2008; Denny et al., 2014). Specifically, we implemented a mediation model (Figure 6A) consisting of a whole-brain search for voxels where the relationship between their Generate>Neutral contrast value and reported modality usage was mediated by the contrast values in the modality-specific regions identified earlier. Thus, these analyses identify voxels whose relationship with modality usage is mediated by activation of the modality-specific regions, suggesting that they are part of the functional pathway by which that modality is implemented. For each analysis, reported usage of all other modalities was entered as covariates. To identify unique pathways for each modality, mediation maps were thresholded at Z>3.25, P < 0.005, k>30 and masked exclusively with the maps for the remaining two modalities, revealing exclusive modality-specific pathways (Figure 6D and Supplementary Table S3). The Semantic pathway included left BA45 and BA22/35, approximating Broca’s and Wernicke’s areas, respectively, left temporal pole and premotor regions and dACC and anterior PCC, closely corresponding to the extended semantic system described in a recent meta-analysis (Binder et al., 2009). The Episodic pathway included the majority of the DMN, including vmPFC, PCC, left MTG and HC, bilateral angular gyrus, ventral AI and left IFG, as well as subgenual ACC extending into VS, caudate and pallidum. Finally, the Bodily pathway included regions involved in body representation, including right posterior insula (Craig, 2011), bilateral fusiform body area and left extrastriate body area (Taylor et al., 2007), in addition to MCC and PC, perigenual ACC and bilateral dorsolateral PFC, including premotor cortices.
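The logic of a single-voxel mediation test can be sketched with ordinary least squares. The example below is a simplified illustration, not the toolbox used for the whole-brain search: it assumes modality usage as the predictor x, the contrast value in the modality-specific region as the mediator m and the voxel's contrast value as the outcome y, and it omits the covariates and the significance testing of the indirect effect. All data values and function names are hypothetical.

```python
def _center(v):
    """Subtract the mean so that intercepts can be ignored."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mediation_paths(x, m, y):
    """Return OLS path estimates (a, b, c, c_prime) for the model x -> m -> y."""
    x, m, y = _center(x), _center(m), _center(y)
    sxx, smm, sxm = _dot(x, x), _dot(m, m), _dot(x, m)
    sxy, smy = _dot(x, y), _dot(m, y)
    a = sxm / sxx                            # path a: x -> m
    c = sxy / sxx                            # total effect: x -> y
    det = sxx * smm - sxm ** 2               # determinant of the normal equations
    b = (sxx * smy - sxm * sxy) / det        # path b: m -> y, controlling for x
    c_prime = (smm * sxy - sxm * smy) / det  # direct effect, controlling for m
    return a, b, c, c_prime


# Hypothetical values for six participants.
usage = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # reported modality usage (x)
seed = [1.0, 1.4, 1.1, 1.9, 2.2, 2.0]    # modality-specific region (m)
voxel = [0.5, 0.9, 0.6, 1.2, 1.5, 1.1]   # candidate pathway voxel (y)

a, b, c, c_prime = mediation_paths(usage, seed, voxel)
indirect = a * b  # mediated effect; equals c - c_prime for OLS estimates
print(round(indirect, 3), round(c - c_prime, 3))
```

A voxel would be considered part of the pathway when the indirect effect a·b is reliably different from zero, which in practice requires a resampling or similar test across participants.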
To identify modality-independent pathways, we conjoined the thresholded maps, revealing a shared pathway (Figure 6E and Supplementary Table S3) overlapping with the Representation network identified earlier, including the ventral AI and portions of the dorsomedial subsystem of the DMN (Andrews-Hanna et al., 2010), a substantial portion of the FPCN (bilateral IFG and MTG, left angular gyrus, MCC) and posterior insula. Thus, our findings support our hypothesis that DMN, together with vAI, supports representation formation in cooperation with FPCN. Importantly, the Generation network was not involved in either the general or the modality-specific pathways, supporting the hypothesized distinction between representation formation and core affect generation.
We hypothesized that EGE involves the cooperation of three core functional networks: the salience network (SN), the DMN and the FPCN, respectively, supporting core affect generation, episodic representation and executive maintenance of emotional states. In two independent samples, we found that EGE activations centered on our three candidate networks. Decomposing these networks based on hypothesized activation profiles of component processes, we found support for our process-network mapping, showing that cortical and midbrain SN primarily contributed in the initial stages of the generation process and were unaffected by subsequent modulation efforts. Activation of cortical (dorsal AI, pre-SMA) and limbic (basal ganglia) nodes of SN was modulated by subjective experience of both positive and negative emotion, while midbrain nodes of SN showed valence-specific modulation, with PAG tracking with negative affect and SN/VTA tracking with positive affect. Overall, this is strongly supportive of the SN primarily supporting the initial generation of core affect in EGE.
Conversely, DMN was activated in all conditions where participants reported elevated affect, even after active generation had ceased, and was deactivated when participants suppressed their emotional states. Notably, this pattern was not exclusive to DMN, but was also observed for limbic regions (AMY, NACC) heavily implicated in affective processing (Lindquist et al., 2012), and ventral AI, known to be associated with the intensity of emotional experience (Touroutoglou et al., 2012). Furthermore, dorsomedial DMN together with ventral AI was found to be part of a general network supporting representation formation. Overall, this supports our hypothesis that the DMN plays a central role in the representational component of EGE, expanding on it by showing that key affective regions partake in this process, which is a likely signature of the emotional nature of the representations in question.
We also found that left lateral and dorsomedial portions of FPCN, together with ITG and posterior MTG, were uniquely activated during extended generation efforts. The FPCN was also unique in being part of both components found to predict generation success in our data-driven decomposition analysis, coupling with both DMN and SN, consistent with it coordinating activation of these networks. Supporting this, left lateral frontal FPCN activation was found to predict trial-wise generation success and also to partake in the core, modality-independent pathway supporting representation formation. Thus, left lateral FPCN appears important for both the initiation and maintenance of EGE, which is consistent with the known role of this region in the cognitive control of other internal processes, like memory retrieval (Badre and Wagner, 2007) and working memory. In summary, our findings suggest that EGE is a dynamic process in which left FPCN engages cortical and midbrain portions of the SN to establish a hedonic core affective state. Concurrently, FPCN couples to DMN, key limbic and insular regions, and regions supporting specific representational content, enabling the elaboration of the core affective state into an emotional experience.
It is notable that our findings show a large degree of overlap with recent meta-analytical models of the neural bases of emotion derived from experiments using mainly exogenous, typically pictorial, stimuli (Kober et al., 2008; Lindquist et al., 2012). Based on clustering of coactivation patterns, Kober et al. (2008) described six functional networks supporting emotion processing. Interestingly, though the precise functional clustering differs somewhat, the current data suggest the involvement of at least five of these clusters, suggesting a large degree of overlap between the neural bases of exogenous and endogenous emotion. Indeed, the one component distinguishing the Kober results from the current ones primarily includes occipital regions known to be involved primarily in visual processing, as one would expect given the lack of emotionally relevant stimuli in the current experiment. As such, our findings are largely consistent with constructivist models of emotion (Lindquist and Barrett, 2008; Barrett and Wilson-Mendenhall, 2014) and suggest that the neural architecture of emotional processing is largely similar across induction modalities.
However, despite the similarity in overall architecture, the current results do suggest that EGE might involve different functional roles for specific structures. This is most notable in the case of AMY and basal ganglia. It is commonly assumed that these support generation of core affect, i.e. the qualities of valence and arousal that form the emotional foundation of experiences (Lindquist and Barrett, 2012). Although the observation in the current data that basal ganglia activity is correlated with the intensity of both positive and negative subjective affect could be taken in support of such a relationship for emotional arousal, valence generation appears to be centered in midbrain regions, consistent with their known role in reward (SN/VTA; Berridge and Kringelbach, 2013) and aversion (PAG; Buhle et al., 2013). In our data, limbic structures like AMY and NACC appear instead to support the extended representation of emotional states, and thus are more closely linked to the experience than the generation of emotion. Although this could be specific to endogenous emotions, we note that a recent meta-analysis of mainly exogenous emotion generation experiments did not find evidence for valence-specific processing in limbic regions (Lindquist et al., 2016).
A natural question our findings raise is whether they are applicable to endogenous emotions that are not actively generated but occur spontaneously. Although ultimately an empirical question, extant evidence from the study of spontaneous mental activity in general suggests that this also is supported by the coupling of FPCN, DMN and mnemonic regions (Christoff et al., 2009), in a manner similar to what we observe. However, while the general architecture is similar, one could expect that spontaneous EGE might show different dynamics. For one, while we find evidence that core affective states are elicited concurrently with representations when EGE is volitionally initiated, in spontaneous EGE the representation is likely to be a dynamic process in which occasionally affectively salient constellations appear. One possibility is that SN is then triggered via its dmPFC node, known to be associated with monitoring ongoing cognitive processes (Dosenbach et al., 2008).
An interesting aspect of our findings is the similarity of the neural networks supporting EGE to those supporting emotion regulation. Reappraisal, one of the most closely investigated and efficacious regulation strategies in the current literature, consistently shows activation in FPCN and, in particular, the same regions of the left lateral frontal cortex we found to be at the functional core of EGE (Buhle et al., 2014). This region has also been implicated in a prefrontal-subcortical pathway predicting the capacity to regulate emotion using reappraisal (Wager et al., 2008) and is thought to play a key role in the cognitive control of memory (Badre and Wagner, 2007). Interestingly, a study comparing a variety of cognitive emotion regulation strategies showed that Reappraisal uniquely activated the left FPCN (Dörfel et al., 2014). This could relate to Reappraisal requiring the active generation of new emotional meaning, and thus indicate that Reappraisal partly depends on the capacity to endogenously generate emotion. Dörfel et al. (2014) also showed that right FPCN appears to form a core regulation network utilized across strategies, thought to implement inhibitory processes. We found that these regions were strongly deactivated during EGE. This could point to a neural basis for distinguishing between the regulation and generation of emotion, an important issue in current emotion theory (Gross and Barrett, 2011). A corollary to this is whether emotion generation can itself be used as an emotion regulation technique. Recent work in our lab (Engen and Singer, 2015) suggests so, showing that meditation-based generation of compassion can be used to actively regulate emotional responses to external stimuli, and is associated with activation of largely the same network we describe here. Importantly, no activation was observable in the right FPCN in that study, suggesting that this constituted a non-inhibitory type of regulation.
Though it is unknown whether this generalizes to other generation techniques, this suggests that regulation based on counter-generation of emotion should be considered distinct from inhibitory strategies. The current findings appear to support this, showing that EGE in general appears to, if anything, involve the deactivation of right FPCN. This is of potential practical importance, as it points to the possibility of developing interventions aimed at enhancing emotion generation ability, which could facilitate coping in individuals who are unable to utilize inhibitory strategies due to either circumstance or pathology.
The large scale of the current data required the use of a relatively compressed paradigm, which could have skewed the results. For one, as we used a fixed length for the Generation phase, it is possible that some of the effects seen are attributable to anticipation of the next phase of the experiment. Future studies could avoid this by including a variable length generation phase akin to that used in previous studies of constructive memory (Addis et al., 2009). A potential limitation of our design is that we only used a single symbol (a Blue 0) to denote that participants should aim to achieve a neutral emotional state. Although piloting indicated that using a single cue was less confusing for participants than having separate regulation cues, it is possible that this could have led to participants conflating the neutral and regulation conditions.
Another limitation to the paradigm was that we had no clear indications of the precise discrete emotions the participants generated or how these fluctuated over the course of task implementation, as a function of, for example, fatigue or habituation. Future studies could investigate this by performing more detailed analysis of trial-wise contents of generated states, which could provide more information about the differences between emotional states in terms of their neural underpinnings. Similarly, a more fine-grained analysis of trial-wise variation could reveal potential fluctuations of generation strategies during repeated emotion generation and thereby provide valuable insight into the relative efficacies of generation modalities. This would also be a possible way of minimizing potential confounds stemming from fatigue/habituation. Moreover, acquiring more details on the specific scenarios used during EGE (i.e. whether they are past or future related or whether they involved specific emotional states) would allow for more nuanced modeling of how information about emotion is stored, retrieved and combined in the construction of emotional experiences.
Tania Singer, as principal investigator, received funding for the ReSource Project from a) the European Research Council under the European Community's Seventh Framework Program (FP7/2007-2013/ ERC Grant Agreement Number 205557 to T.S.), and b) from the Max Planck Society.
We are thankful to all the members of the Department of Social Neuroscience involved in the ReSource study over many years, to Astrid Ackermann, Christina Bochow, Matthias Bolz, and Sandra Zurborg for managing the large-scale longitudinal study, to Hannes Niederhausen, Henrik Grunert and Torsten Kðner for their technical support, to Sylvia Tydeks, Elisabeth Murzick, Manuela Hofmann, Sylvie Neubert, and Nicole Pampus for their help with recruitment and data collection.
Supplementary data are available at SCAN online.
Conflict of interest. None declared.