While sarcasm can be conveyed solely through contextual cues such as counterfactual or echoic statements, face-to-face sarcastic speech may be characterized by specific paralinguistic features that alert the listener to interpret the utterance as ironic or critical, even in the absence of contextual information. We investigated the neuroanatomy underlying failure to understand sarcasm from dynamic vocal and facial paralinguistic cues. Ninety subjects (20 frontotemporal dementia, 11 semantic dementia [SemD], 4 progressive nonfluent aphasia, 27 Alzheimer’s disease, 6 corticobasal degeneration, 9 progressive supranuclear palsy, 13 healthy older controls) were tested using the Social Inference – Minimal subtest of The Awareness of Social Inference Test (TASIT). Subjects watched brief videos depicting sincere or sarcastic communication and answered yes-no questions about the speaker’s intended meaning. All groups interpreted Sincere (SIN) items normally, and only the SemD group was impaired on the Simple Sarcasm (SSR) condition. Patients failing the SSR condition performed more poorly on dynamic emotion recognition tasks and had more neuropsychiatric disturbances, but had better verbal and visuospatial working memory than patients who comprehended sarcasm. Voxel-based morphometry analysis of SSR scores in SPM5 demonstrated that poorer sarcasm comprehension was predicted by smaller volume in the bilateral posterior parahippocampal cortices (PHc), the temporal poles, and the right medial frontal pole (pFWE<0.05). This study provides lesion data suggesting that the PHc may be involved in recognizing a paralinguistic speech profile as abnormal, leading to interpretive processing by the temporal poles and right medial frontal pole that identifies the social context as sarcastic and recognizes the speaker’s paradoxical intentions.
Sarcasm is a type of ironic speech in which an implicit criticism of a specific target is conveyed via contextual or paralinguistic cues. Its social function is to heighten dramatic effect (McDonald, 1999) while simultaneously increasing the perceived politeness of the speaker (Jorgensen, 1996) and decreasing the aggressiveness of the critical comment (Dews & Winner, 1995). While sarcasm can be conveyed solely through contextual cues such as counterfactual or echoic statements, and thus may be recognized in text communications, face-to-face sarcastic speech may be characterized by a specific paralinguistic profile that alerts the listener not to interpret the utterance sincerely, even in the absence of contextual information. Analysis of the vocal qualities of sarcastic speech suggests that, compared to sincere speech, it is characterized by an increased range and amplitude of fundamental voice frequency, higher emphatic stress, shorter pauses, and a caricatured lengthening of syllables (Anolli et al., 2000; Rockwell, 2007). Sarcasm is “a technique that plays with the voice, not in a natural but in a studied way” that is “both premeditated and affected” (Anolli et al., 2000). Analysis of sarcasm’s non-acoustic paralinguistic features suggests that it involves varying or flattening the range and intensity of one’s facial expression, and using techniques such as widened or rolling eyes, more rapid blinking, increased grimacing, and smirks to help alert the listener that the meaning is ironic (Attardo et al., 2003; Rockwell, 2001).
The ability to recognize sarcasm from paralinguistic cues develops earlier (around age 5) than the ability to correctly interpret sarcasm from contextual cues (around age 7) (Laval & Bert-Eboul, 2005), and multiple lines of evidence converge to suggest that the latter is a more complex and difficult task. Accordingly, patients with traumatic brain injury (TBI), schizophrenia, autism, and dementia have demonstrated deficits interpreting sarcasm from contextual cues (Bara et al., 2000; Champagne et al., 2003; Channon et al., 2005; Channon et al., 2007; Dennis et al., 2001; Leitman et al., 2006; Martin & McDonald, 2004; Rajendran et al., 2005). However, some studies using either audio or audio-visual sarcastic stimuli suggest that these deficits may persist even when subjects are presented with paralinguistic sarcasm cues (McDonald, 1996; McDonald et al., 2006; McDonald et al., 2003). Schizophrenic subjects not only fail to detect sarcasm in auditory stimuli, but are also biased toward identifying statements as sincere compared to controls (Leitman et al., 2006). The one study using dynamic stimuli to assess sarcasm comprehension in patients with frontotemporal dementia used stimuli that mixed paralinguistic and contextual cues (Kipps et al., 2009); thus, the performance of patients with neurodegenerative disease on sarcasm tasks using purely paralinguistic rather than contextual stimuli remains unknown.
While poorer recognition of paralinguistic sarcasm cues shows some correlation with emotion recognition in patient groups (Leitman et al., 2006; McDonald et al., 2006; Shamay-Tsoory, Tomer, & Aharon-Peretz, 2005), their relationship is unclear. Schizophrenic patients who show deficits recognizing paralinguistic sarcasm also perform poorly on voice prosody tasks, suggesting that voice prosody may play a significant role in sarcasm recognition (Leitman et al., 2006). Sarcasm comprehension has also been related to deficits in other cognitive areas such as slowed information processing speed, poorer working memory, reduced verbal and nonverbal new learning, and deficits in complex non-verbal executive reasoning, but the degree to which these skills are involved in the interpretation of the paralinguistic versus contextual aspects of sarcasm has never been delineated (McDonald et al., 2006).
The right temporal lobe is involved in recognizing and categorizing vocal prosody and facial cues (Allison et al., 2000), and correct interpretation of textual irony appears to be partly mediated by right temporal and dorsomedial frontal structures (Champagne et al., 2003; Eviatar & Just, 2006; Shamay-Tsoory, Tomer, & Aharon-Peretz, 2005). However, neuroanatomic studies of sarcasm recognition have primarily used text stimuli, and the anatomy underpinning paralinguistic sarcasm interpretation has not been directly studied in healthy controls or patient groups.
We investigated the neuroanatomic correlates of the ability to use paralinguistic cues to recognize sarcasm in patients with neurodegenerative disease by first testing subjects with a psychometrically validated measure of sarcasm comprehension, then performing quantitative analysis of structural MRI scans. The aim of this study was to determine the degree to which regional differences in brain volumes correspond to the ability to detect sarcasm from dynamic vocal and facial paralinguistic stimuli.
Ninety subjects were studied, including 77 patients diagnosed with one of six neurodegenerative diseases and 13 healthy older controls. Patients were recruited into the study from a dementia specialty clinic. These included 20 patients who met the Neary criteria (Neary et al., 1998) for the behavioral variant of frontotemporal dementia (bvFTD), a variant of frontotemporal lobar degeneration (FTLD) typically characterized by bilateral frontal disease and a progressive behavioral syndrome (Snowden et al., 2007); 11 with the semantic dementia (SemD) variant of FTLD, typically characterized by left anterior temporal and orbitofrontal atrophy along with profound semantic loss (Hodges & Patterson, 2007); and four with the progressive nonfluent aphasia (PNFA) variant of FTLD, typically characterized by left inferior frontal atrophy and non-fluent speech (Gorno-Tempini, Dronkers et al., 2004). In addition to the FTLD patients, 27 subjects had Alzheimer’s disease (AD), diagnosed by NINDS-ADRDA criteria (McKhann et al., 1984); six had corticobasal degeneration (CBD), defined according to the criteria of Boxer et al. (2005); and nine had progressive supranuclear palsy (PSP), diagnosed by the Litvan criteria (Litvan et al., 1996). Patient diagnosis was derived by a multidisciplinary team of neurologists, neuropsychologists, psychiatrists, and nurses, who performed extensive neurological, behavioral, neuropsychological, and neuroimaging assessments. Patients from diverse diagnostic groups with variable behavioral test scores and patterns of gray matter atrophy were included to increase variability in the sample and thus the power of the correlation analysis. Patients with a Clinical Dementia Rating (CDR) score >2 were excluded, as were subjects who were not fluent in English.
Healthy control subjects were recruited through advertisements in local newspapers and recruitment talks at local senior community centers, then underwent an extensive multidisciplinary clinical evaluation. For inclusion as healthy controls in this study, subjects had to have a normal neurologic exam, a CDR score of 0, a Mini-Mental State Examination (MMSE) score of at least 28/30, and delayed memory performance at or above the 25th percentile in both verbal and visuospatial domains.
All subjects, and where applicable their caregivers, signed an institutional review board-approved research consent form to participate in the study. Patients seen at the clinic represented a broad sample of the population in terms of ethnicity, sex, education level, and socioeconomic status, and an attempt was made to recruit all available consecutive patients for this study. Subjects’ demographic characteristics are shown in Table 1. Subjects’ mean age was 61.8 (SD ± 8.3), and they averaged 16.1 (SD ± 3.0) years of education. There were 43 males and 47 females, and the mean CDR score for all patients was 0.9 (SD ± 0.5). Groups differed significantly in sex, but not in age or education. Because sex, age, and MMSE were included as potentially confounding covariates in the imaging analyses, these were also included as covariates in all analyses of test scores.
Each subject performed the Social Inference – Minimal subtest of The Awareness of Social Inference Test (TASIT-2; McDonald, 2002). This test is designed to assess subjects’ ability to interpret naturalistic social interactions in which the speaker uses sincerity, sarcasm, or paradoxical sarcasm to communicate. Only the Sincere (SIN) and Simple Sarcasm (SSR) conditions were analyzed for this study. Subjects watch 10 brief (less than 1 minute) video vignettes in which professional actors interact, and then answer four yes-no questions about the actions, thoughts, words, and emotions of the characters. The vignettes for the Sincere and Simple Sarcasm conditions (5 each) are presented after an unscored sample video is used to instruct the subject about the task, and are intermixed throughout the test, so subjects cannot develop a “sincere” or “sarcastic” response set based on item order.
In the Sincere and Simple Sarcasm conditions, the scripted verbal content is neutral and is interchangeable between the conditions (e.g., “I’d be happy to do it. I’ve got plenty of time.”), so subjects must observe paralinguistic cues, including facial expression, voice prosody, hand and head gestures, and body posture, to determine the speaker’s intended meaning. In the Sincere condition, the speaker’s non-verbal cues are consistent with the verbal content; thus no irony is implied, and the verbal content accurately signifies the speaker’s intended meaning. If the example script above appeared in the Sincere condition, the correct interpretation would be that the speaker truly is eager to help and has enough spare time to do the work. In the Simple Sarcasm condition, the speaker uses exaggerated facial, vocal, and body language indicating sarcasm; thus the intended meaning is ironic and diverges from the manifest verbal content of the speech. If the example script above were used in the Simple Sarcasm condition, the correct interpretation would be that the speaker does not want to do the task because she is too busy. Though the neutral content is theoretically interchangeable between conditions, no script was used for more than one item during the test, so subjects could not compare a sincere and a sarcastic reading of the same script. The four yes-no questions after each item required the subject to correctly identify the speaker’s meaning (e.g., “Is Ruth trying to pressure Gary into helping her?” “Is she annoyed with him?”).
Performance on the Sincere condition was used to gauge subjects’ ability to meet the basic demands of the task, such as comprehending the actors’ speech, following the flow of the social interaction, remembering the vignette long enough to answer yes-no questions about it, and comprehending the questions themselves. For both conditions, approximately half of the questions were reverse-keyed, so a consistent “yes” or “no” response set would indicate nonsensical responding and would yield a failing (chance-level) score on both conditions.
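The response-set argument is simple arithmetic, sketched below. The sketch assumes a hypothetical answer key of 20 questions per condition (5 vignettes × 4 questions, as described above) with exactly half keyed “no”; the real TASIT answer key is not reproduced here.

```python
# Hypothetical answer key: 20 yes/no questions, exactly half keyed "no".
# This is NOT the actual TASIT key; it only illustrates the scoring logic.
ANSWER_KEY = ["yes", "no"] * 10


def score(responses, key=ANSWER_KEY):
    """Count responses that match the key (one point per correct answer)."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(r == k for r, k in zip(responses, key))


always_yes = score(["yes"] * len(ANSWER_KEY))  # yes-biased response set
always_no = score(["no"] * len(ANSWER_KEY))    # no-biased response set
perfect = score(list(ANSWER_KEY))              # fully correct responding
```

Under this balanced key, both biased response sets score 10/20, which is exactly the chance level for yes-no items.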
Testing was completed within 4 months before or after the MRI scan, and the average span of time between testing and scan was 13 days (SD ± 26 days).
Additionally, at the time of their sarcasm testing, subjects underwent testing of emotion recognition in facial, vocal, and combined modalities. Subjects were tested with 2 subtests of the Comprehensive Affect Testing System (CATS) (Froming et al., 2001): Emotional Prosody Discrimination (discriminating same or different emotional voice prosody with neutral semantic content) and Name Emotional Prosody (multiple-choice naming of emotional voice prosody with the four emotions happy, sad, frightened, and angry). To assess their ability to identify emotions with more ecologically valid, dynamic, multimodal stimuli, subjects performed an abbreviated form of the Emotion Evaluation Subtest of the TASIT. For this test, subjects watch brief (~20 s) videos of actors performing semantically neutral scripts portraying one of the seven basic emotions (happy, surprised, neutral, sad, anxious, frightened, revolted), and must choose the correct emotion from a card on which the seven options are written. To reduce fatigue in our older participants, many of whom had dementia, we administered only items 1–14, for a maximum score of 14.
Each subject also underwent 2 hours of cognitive testing, and a measure of neuropsychiatric functioning was administered through an informant interview. Standard neuropsychological measures of language, visuospatial, memory, and executive functioning were used, and are detailed in Table 2. Subjects were also evaluated with the Geriatric Depression Scale (GDS), a 30-item self-report questionnaire (Yesavage et al., 1983). Behavior was measured using the Neuropsychiatric Inventory (NPI), a caregiver interview designed to assess the frequency and severity of behaviors that commonly occur as a result of a dementia syndrome (Cummings, 1997). Cognitive and neuropsychiatric assessment occurred within 3 months of sarcasm detection testing, and the average time between assessments was 20 days (SD ± 38 days).
To determine whether subjects who performed poorly on the Simple Sarcasm task differed from the other subjects on any neuropsychological or neuropsychiatric variables, patients were divided into two groups and directly compared. Patient scores on the Simple Sarcasm test were converted to z-scores using the healthy older control group as the standardization sample. The patients were then grouped by whether they had passed or failed the Simple Sarcasm task, with failure defined as z < −1.50, i.e., performance below approximately the 7th percentile of healthy older controls. This cutoff was chosen because this level of performance signifies clinical impairment relative to controls in standard neuropsychological assessment. T-tests, using age as a covariate, were used to compare the scores of the two groups across all emotion, cognitive, and neuropsychiatric measures.
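The z-score conversion and pass/fail split can be sketched as follows. This is a minimal illustration assuming the control group's sample standard deviation is used; the score lists are hypothetical, not study data.

```python
from statistics import NormalDist, mean, stdev

FAIL_CUTOFF_Z = -1.50  # failure threshold described in the text


def z_scores(patient_scores, control_scores):
    """Standardize patient scores against the healthy-control distribution."""
    mu = mean(control_scores)
    sd = stdev(control_scores)  # sample SD of the control group (assumption)
    return [(s - mu) / sd for s in patient_scores]


def classify(patient_scores, control_scores, cutoff=FAIL_CUTOFF_Z):
    """Label each patient 'fail' if z < cutoff, otherwise 'pass'."""
    return ["fail" if z < cutoff else "pass"
            for z in z_scores(patient_scores, control_scores)]


# Percentile of the standard normal distribution at the cutoff.
cutoff_percentile = 100 * NormalDist().cdf(FAIL_CUTOFF_Z)

# Hypothetical example data (not from the study).
controls = [16, 18, 20]      # mean 18, sample SD 2
patients = [18, 15, 14, 10]  # z = 0, -1.5, -2, -4
labels = classify(patients, controls)
```

With these numbers the labels are pass, pass, fail, fail: a score exactly at z = −1.50 passes because the criterion is strict. The cutoff corresponds to about the 6.7th percentile of a normal distribution, consistent with the “7th percentile” description.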
MRI scans were obtained on a 1.5-T Magnetom VISION system (Siemens Inc., Iselin, N.J.) equipped with a standard quadrature head coil. A volumetric magnetization-prepared rapid gradient echo sequence (MPRAGE; TR/TE/TI = 10/4/300 ms, 15° flip angle) was used to obtain T1-weighted images of the entire brain in coronal orientation perpendicular to the double spin echo sequence, with 1.0 × 1.0 mm² in-plane resolution and 1.5 mm slab thickness.
The voxel-based morphometry (VBM) technique uses an image pre-processing step (spatial normalization, segmentation, modulation, and smoothing) followed by statistical analysis. Both stages were performed using the SPM5 software package (Wellcome Department of Cognitive Neurology, London; http://www.fil.ion.ucl.ac.uk/spm) running on Matlab 7.0.1 (MathWorks, Natick, MA). MRI images were pre-processed primarily using SPM5 default settings and tissue probability maps, though light cleanup of partitions was performed. Spatially normalized, segmented, and modulated grey matter images were then smoothed with a 12 mm FWHM isotropic Gaussian kernel.
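A smoothing kernel specified by its full width at half maximum (FWHM) is conventionally converted to a Gaussian standard deviation via σ = FWHM / (2√(2 ln 2)), so 12 mm FWHM corresponds to σ ≈ 5.1 mm. The sketch below shows that conversion and a normalized 1-D kernel sampled on a voxel grid. The 1.5 mm voxel size is a hypothetical value (the normalized voxel size is not stated above), and this is not SPM5's own smoothing code.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian full width at half maximum to a standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel_1d(fwhm_mm, voxel_mm, radius_sd=3.0):
    """Normalized 1-D Gaussian kernel sampled on a voxel grid.

    Applying this kernel along x, y and z in turn gives an isotropic 3-D
    Gaussian, because the Gaussian is separable.
    """
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_mm   # sigma in voxel units
    half_width = int(math.ceil(radius_sd * sigma_vox))
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2)
               for i in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]             # weights sum to 1


sigma_mm = fwhm_to_sigma(12.0)           # about 5.1 mm
kernel = gaussian_kernel_1d(12.0, 1.5)   # 1.5 mm voxels: hypothetical
```

Normalizing the weights to sum to 1 preserves the local mean grey-matter value while spreading it over neighboring voxels.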
Covariates-only statistical models were used to test the relationship between TASIT scores and voxel-wise gray matter volume. To control for subjects’ ability to comprehend the test, scores for both conditions (Sincere and Simple Sarcasm) were entered into each design matrix, as were the confounding covariates age, sex, and MMSE score (as a proxy for disease severity). Total intracranial volume (TIV) was used as a global covariate to correct for individual differences in head size. Regionally specific differences in grey matter volumes at each voxel were assessed using the general linear model, and the significance of each effect was determined using the theory of Gaussian fields (SPM5 defaults). Results were considered significant if they survived correction for family-wise error across the whole brain (pFWE<0.05).
The following contrasts were performed: 1) To look at the main effect of paralinguistic sarcasm comprehension, controlling for performance on the Sincere condition, a [0 1] t-contrast was used (with additional zeros for nuisance covariates), assuming that poorer test performance would be associated with decreased gray matter volumes. 2) To determine whether performance deficits on the Sincere condition could be localized to a specific region, a [1 0] t-contrast was used.
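The [0 1] contrast logic can be sketched at a single voxel as an ordinary least-squares fit. This is a simplified illustration, not SPM5 itself: it assumes independent, identically distributed errors; puts SIN and SSR in the first two columns; treats the nuisance covariates and TIV as ordinary columns plus a constant term (SPM's own handling of global covariates and its random-field FWE correction are not reproduced); and uses simulated data.

```python
import numpy as np


def voxel_t(X, y, c):
    """OLS t-statistic for contrast vector c at a single voxel.

    X: (n_subjects, n_columns) design matrix; y: grey-matter values at one voxel.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    pinv_x = np.linalg.pinv(X)
    beta = pinv_x @ y                              # parameter estimates
    resid = y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = (resid @ resid) / dof                 # residual variance
    var_c = sigma2 * (c @ pinv_x @ pinv_x.T @ c)   # variance of c'beta
    return float((c @ beta) / np.sqrt(var_c))


# Hypothetical simulated data for 90 subjects (not the study's data).
rng = np.random.default_rng(0)
n = 90
sin = rng.integers(12, 21, n)     # Sincere score, 0-20 scale (5 items x 4 questions)
ssr = rng.integers(4, 21, n)      # Simple Sarcasm score
age = rng.normal(62, 8, n)
sex = rng.integers(0, 2, n)
mmse = rng.integers(18, 31, n)
tiv = rng.normal(1450, 120, n)    # total intracranial volume, arbitrary units

# Column order: SIN, SSR, nuisance covariates, then a constant term.
X = np.column_stack([sin, ssr, age, sex, mmse, tiv, np.ones(n)])
# Simulated voxel in which more grey matter goes with higher SSR scores.
gm = 0.30 + 0.01 * ssr + 0.0001 * tiv + rng.normal(0, 0.02, n)

ssr_contrast = [0, 1, 0, 0, 0, 0, 0]   # "[0 1]" padded with zeros
t_ssr = voxel_t(X, gm, ssr_contrast)
```

Because SIN is in the model, the SSR contrast tests the sarcasm effect over and above general task comprehension. A positive t matches the stated hypothesis that poorer scores accompany smaller volumes. In a full VBM analysis this statistic would be computed at every voxel and then FWE-corrected.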
Because regional atrophy is not randomly distributed across this sample, but is represented in patterns of atrophy that are similar within, and to some degree across, diagnostic categories, the main effects analysis was expected to demonstrate some degree of confounding due to co-atrophy effects. This artifact, typical of VBM studies in patients with neurodegenerative disease, occurs when brain regions are identified by VBM that are not directly related to (in this case) sarcasm score, but instead are the result of disease-specific patterns of co-atrophy occurring along with regions truly related to sarcasm task performance. The superior quality of preprocessing afforded by the SPM5 software compared to SPM2 has the benefit of increasing the sensitivity of VBM analyses, but it simultaneously heightens the degree to which co-atrophy artifact appears, increasing the importance of further analysis of main effect results.
To determine the relative contribution of the regions found to be related to sarcasm in the massively univariate main effects analysis, we performed a linear regression analysis of the voxel values at each peak coordinate using the SAS 9.1 statistical program. Voxel probabilities were extracted from the smoothed, warped, modulated gray-matter images of each subject at each peak voxel that was significant in the main effects analysis. These voxel probability values were then analyzed together in linear regression analyses, including age, sex, MMSE, SIN, and TIV in each model as potentially confounding covariates, and using SSR score as the outcome variable. We used the modified Allen-Cady predictor selection technique specified by Vittinghoff et al. (2004), forcing age, sex, MMSE, SIN, and TIV into the model as covariates, and setting a very permissive inclusion threshold of p < 0.20 to ensure that brain regions showing at least a modest independent relationship to SSR score remained in the model.
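The general idea of forcing covariates into the model while pruning candidate regions can be sketched with ordinary backward elimination at the same p < 0.20 retention threshold. This is a simplification and does not reproduce the modified Allen-Cady procedure or the SAS analysis. The p-values use a normal approximation to the t distribution, which is an assumption that is reasonable only for large residual degrees of freedom. The region names and data are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def ols_pvalues(X, y):
    """Two-sided p-values for each OLS coefficient (normal approximation)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = (resid @ resid) / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    z = NormalDist()
    return [2.0 * (1.0 - z.cdf(abs(b / s))) for b, s in zip(beta, se)]


def backward_select(forced, candidates, y, p_keep=0.20):
    """Backward elimination with forced covariates.

    forced, candidates: dicts mapping a name to a 1-D array of length n.
    Forced covariates are never removed; candidate regions are dropped one at a
    time (largest p first) until every remaining candidate has p < p_keep.
    """
    kept = list(candidates)
    while kept:
        cols = [np.ones(len(y))] + list(forced.values()) + [candidates[k] for k in kept]
        X = np.column_stack(cols)
        p = ols_pvalues(X, y)[1 + len(forced):]   # p-values of candidates only
        worst = int(np.argmax(p))
        if p[worst] < p_keep:
            break
        kept.pop(worst)
    return kept


# Hypothetical simulated data (not from the study).
rng = np.random.default_rng(1)
n = 90
forced = {
    "age": rng.normal(62, 8, n),
    "sex": rng.integers(0, 2, n).astype(float),
    "MMSE": rng.integers(18, 31, n).astype(float),
    "SIN": rng.integers(12, 21, n).astype(float),
    "TIV": rng.normal(1450, 120, n),
}
true_region = rng.normal(0.5, 0.1, n)
candidates = {
    "true_region": true_region,
    "coatrophy_region": true_region + rng.normal(0, 0.05, n),  # co-atrophies
    "unrelated_region": rng.normal(0.5, 0.1, n),
}
ssr = 10 + 40 * true_region + rng.normal(0, 1.0, n)  # depends on true_region only
kept = backward_select(forced, candidates, ssr)
```

The region that co-atrophies with the true region is strongly correlated with the outcome on its own, but it adds little unique information once the true region is in the model. That is the kind of redundancy this step is meant to expose.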
The linear regression analysis uses a data-driven approach to identify and reject brain regions that appear in the VBM main effect results simply because they are statistically likely to co-atrophy with other brain regions directly related to sarcasm. However, this analysis does not rule out the possibility that significant findings hold true only in one diagnostic group and do not represent a generalizable brain-behavior relationship. This kind of illusory correlation can occur in any VBM analysis using patients from multiple neurodegenerative disease groups: if disease group membership predicts a region of atrophy (G→A), and disease group membership also predicts poor performance on the behavioral task (G→B), then that region of atrophy may appear to correlate directly with the behavior (A↔B) when the correlation is actually spurious (i.e., A and B are not directly related).
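A small simulation illustrates the G→A, G→B confound described above. In this hypothetical setup, a disease group raises both atrophy and task deficit, but there is no direct A–B link. The raw correlation is still substantial (about 0.5 under these settings). After each group's mean is removed, which is equivalent to regressing out a group dummy, the correlation falls to near zero.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def center_by_group(values, groups):
    """Subtract each group's mean (same as regressing out group dummies)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n=2000, seed=0):
    """Group drives atrophy (G->A) and deficit (G->B); A and B are unlinked."""
    rng = random.Random(seed)
    group = [rng.random() < 0.5 for _ in range(n)]  # True = hypothetical patient group
    atrophy = [2.0 * g + rng.gauss(0.0, 1.0) for g in group]
    deficit = [2.0 * g + rng.gauss(0.0, 1.0) for g in group]
    return group, atrophy, deficit


group, atrophy, deficit = simulate()
r_marginal = pearson(atrophy, deficit)                   # inflated by shared group effect
r_within = pearson(center_by_group(atrophy, group),
                   center_by_group(deficit, group))      # close to zero
```

Controlling for group removes spurious correlations like this one. It can also remove genuine effects that are confined to a single group, which is the trade-off discussed for the group-controlled analysis.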
To perform a second error check for co-atrophy, we parameterized each diagnosis (0=no, 1=yes) and entered all 7 diagnostic groups into the design matrix as confounding covariates (using 6 dummy variables to represent the 7 groups). The relationship between sarcasm score and atrophy was then examined using a [0 1 0 0 0 0 0 0 0 0 0] contrast (see Figure 2). The results of this analysis show regions of atrophy significantly related to sarcasm score only if they appear in more than one diagnostic group. These results must be considered in light of the regression results, however, because this method will improperly exclude any regions that are legitimately related to sarcasm score but that atrophy in only a single diagnostic group.
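The dummy coding and the 11-element contrast can be sketched as below. The column order (SIN, SSR, six group dummies, age, sex, MMSE) is an assumption chosen because it matches the length of the contrast in the text. The choice of healthy controls (NC) as the reference group is also an assumption. The constant term that SPM appends is not shown.

```python
# Seven diagnostic groups; healthy controls (NC) as the reference level (assumption).
GROUPS = ["NC", "bvFTD", "SemD", "PNFA", "AD", "CBD", "PSP"]
REFERENCE = "NC"
DUMMY_GROUPS = [g for g in GROUPS if g != REFERENCE]   # 6 dummy variables


def dummy_row(diagnosis):
    """0/1 indicators for one subject's diagnosis (all zeros for the reference)."""
    if diagnosis not in GROUPS:
        raise ValueError(f"unknown diagnosis: {diagnosis}")
    return [1 if diagnosis == g else 0 for g in DUMMY_GROUPS]


# Assumed column order consistent with the 11-element contrast in the text.
COLUMNS = ["SIN", "SSR"] + [f"is_{g}" for g in DUMMY_GROUPS] + ["age", "sex", "MMSE"]
ssr_contrast = [1 if name == "SSR" else 0 for name in COLUMNS]


def design_row(sin, ssr, diagnosis, age, sex, mmse):
    """One subject's row of the design matrix, in COLUMNS order."""
    return [sin, ssr] + dummy_row(diagnosis) + [age, sex, mmse]
```

Using 6 rather than 7 dummies avoids a column that would be an exact linear combination of the others together with the constant term.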
Based on evidence from functional and lesion-based studies of social and emotional functions, we hypothesized that deficits in the ability to interpret paralinguistic cues as sarcastic would correspond with grey-matter atrophy in a right temporal-frontal network (Allison et al., 2000; Gallagher & Frith, 2003; Perry et al., 2001; Rosen et al., 2002; Shamay-Tsoory, Tomer, & Aharon-Peretz, 2005). Poor performance on the more general Sincere control condition was expected to arise from deficits in different cognitive domains, depending on the functions particularly affected by each of the neurodegenerative diseases represented in our sample (e.g., impaired verbal or visual memory, working memory, semantic loss, or syntax comprehension deficits). Because the sources of failure on this condition would vary across the sample rather than localize to a single anatomic network, and because no single diagnostic group showed disproportionate deficits on the Sincere task, we hypothesized that there would be no suprathreshold clusters for the Sincere condition main effect.
An omnibus analysis of variance using a general linear model, controlling for sex, age, and MMSE, showed no significant diagnostic group differences in how subjects performed on the Sincere condition, suggesting all disease groups were able to adequately comprehend the test despite their cognitive deficits. However, there were significant differences across groups on Simple Sarcasm score (p<0.0007) (Table 1, Figure 1). SemD patients showed significantly lower Simple Sarcasm scores than controls (p<0.05 based on a post-hoc Dunnett-Hsu test controlling for sex, age, and MMSE). No other dementia group (bvFTD, PNFA, AD, CBD, PSP) showed impairment on either condition relative to normal controls. Sincere scores did not correlate significantly with Simple Sarcasm scores.
The patients were then grouped by whether they had passed or failed the Simple Sarcasm task, and the cognitive, emotion, and neuropsychiatric profiles of the two groups were compared (Table 2). Patients failing the Sarcasm task included 4 bvFTD, 8 SemD, 2 AD, and 1 PSP patient. The “Fail” group performed significantly worse than the “Pass” group on tests of dynamic emotion recognition (TASIT EET), confrontation naming, semantic fluency, and verbal recognition memory, and showed a significantly more impaired neuropsychiatric profile on the NPI. However, the “Fail” group performed significantly better than the “Pass” group on tests of visuospatial functioning, verbal and nonverbal working memory, and ability to inhibit an automatic verbal response. Patients in the “Fail” group were also significantly more likely to give correct interpretations of videos from the Sincere condition of the TASIT. There were no differences between the “Pass” and “Fail” groups on the simple emotional voice prosody naming or discrimination tasks (CATS), or on other neuropsychological tests.
The main effect of simple sarcasm comprehension (Simple Sarcasm score controlling for Sincere score) included voxels in the bilateral temporal poles, the bilateral parahippocampal gyri, the right middle temporal gyrus, the right superior frontal gyrus, and the head of the caudate (p<0.05, FWE) (Table 3 and Figure 2). These results closely resembled the pattern of right and left temporal lobe atrophy frequently seen in SemD, probably reflecting the predicted co-atrophy artifact (see Methods). Plots of voxel intensity at each of the ten significant peak voxels against total Simple Sarcasm score showed no outliers on the independent variable, and sarcasm detection scores were widely distributed across the range of voxel intensities, suggesting no restriction of range. Analysis of the main effect of performance on the Sincere task showed no significant voxels.
Voxel probability scores were extracted from the smoothed, modulated, normalized grey-matter images at each of the ten peak voxels identified in the main effect analysis. These included the right superior temporal pole (RSTP), the right parahippocampal gyrus (RPHG), the right middle temporal gyrus (RMTG), two peaks within the left parahippocampal gyrus (LPHG1, LPHG2), the left superior temporal pole (LSTP), the left inferior temporal pole (LITP), the head of the caudate (CH), and two peaks within the right superior frontal gyrus (RSFG1, RSFG2) (Table 3). When these regions and Sarcasm score were entered into a partial correlation matrix controlling for sex, age, MMSE, SIN, and TIV, all regions were significantly correlated with each other and with Sarcasm score (p<0.01), with correlations ranging from r=0.27 to r=0.81. A linear regression was then performed in which variables with no discernible unique relationship to Sarcasm score were removed (see Methods), because these variables may have been significant in the main effects result owing to disease-specific co-atrophy patterns rather than to a direct relationship with sarcasm comprehension. The variables that remained for analysis included the RPHG (24, −14, −32), LPHG2 (−22, −26, −16), LITP (−46, 12, −42), and RSFG1 (28, 64, 16).
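A partial correlation matrix like the one described above can be computed by residualizing each variable on the covariates and then correlating the residuals. The sketch below is a generic implementation under that standard definition, not the SAS code used in the study. The variable names in the usage note are hypothetical.

```python
import numpy as np


def residualize(v, covariates):
    """Residuals of v after OLS regression on an intercept plus covariates."""
    v = np.asarray(v, dtype=float)
    X = np.column_stack([np.ones(len(v))] +
                        [np.asarray(c, dtype=float) for c in covariates])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta


def partial_corr(x, y, covariates):
    """Correlation of x and y after removing the linear effect of covariates."""
    rx = residualize(x, covariates)
    ry = residualize(y, covariates)
    return float(np.corrcoef(rx, ry)[0, 1])


def partial_corr_matrix(variables, covariates):
    """Symmetric matrix of pairwise partial correlations.

    variables: dict mapping a name (e.g. a peak label or 'SSR') to a 1-D array.
    """
    names = list(variables)
    R = np.eye(len(names))
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = partial_corr(variables[names[i]], variables[names[j]], covariates)
            R[i, j] = R[j, i] = r
    return names, R
```

For this analysis, `variables` would hold the ten peak-voxel series plus the SSR score, and `covariates` the sex, age, MMSE, SIN, and TIV arrays. High off-diagonal values among regions, as reported here, signal the collinearity that makes the subsequent regression step necessary.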
When the main effects analysis described above was repeated controlling for diagnostic group membership, the left temporal lobe regions dropped out of the analysis, and more focal correlations between Sarcasm score and atrophy were seen in the right superior temporal pole, right caudate, and bilateral parahippocampal gyri. The peaks in the right superior frontal gyrus and right posterior middle temporal gyrus increased in magnitude, as measured by their t-scores (p<0.05, FWE) (see Table 3 and Figures 2 and 3).
The main effects analysis yielded significant results throughout both anterior temporal lobes that appeared similar to the atrophy pattern seen in the SemD patients, who were most likely to fail the sarcasm task. However, the two co-atrophy error checks narrowed these areas to the regions most likely to have a significant relationship with sarcasm score, statistically independent of other significant brain regions and of diagnostic group membership. The areas surviving both error checks included regions in the right and left parahippocampal gyri and the right superior frontal gyrus. The left temporal pole survived the linear regression error check, but dropped out of the shared effects analysis, most likely because this region is atrophic in only one diagnostic group, SemD. The right superior temporal pole survived the shared effects error check, but did not survive the linear regression error check, potentially because it was highly collinear with the homologous left temporal pole, which reduced the regression’s ability to separate each region’s independent contribution. Thus, though the relative contribution of the left versus right temporal pole is unclear, there is a high likelihood that one or both regions showed a significant independent relationship to sarcasm score. Other regions (inferior temporal regions, caudate, posterior middle temporal gyrus) failed at least one error check; thus the evidence supporting their independent relationship to sarcasm score is weaker.
VBM was used in patients with neurodegenerative disease and healthy older adults to correlate MRI-derived brain volumes with a measure of the ability to detect sarcasm based on paralinguistic cues. The primary finding was that lower sarcasm recognition scores corresponded most significantly with atrophy in the temporal lobes bilaterally, particularly the parahippocampal gyri and the temporal poles, as well as the right superior frontal gyrus. Subjects who failed the sarcasm recognition task performed more poorly on a realistically dynamic emotion recognition task, but not on simpler emotional prosody tasks, and had more neuropsychiatric disturbances. However, they performed significantly better than other patients at correctly interpreting sincere communication, and had significantly better verbal and visuospatial working memory. Previous studies have demonstrated a link between the right hemisphere and the ability to comprehend paralinguistic prosody in patients with neurological disorders (Brown et al., 2005; Cutica et al., 2006; Pell, 2007). However, this is the first study to use quantitative image analysis to link paralinguistic comprehension deficits more precisely with damage to specific brain structures.
To perform the sarcasm task used in this study, subjects were required to answer yes-no questions about the thoughts, words, feelings, and actions of the speakers. Subjects in all groups were able to answer a high proportion of these questions correctly when the speakers’ intentions were sincere, thus demonstrating adequate memory, working memory, syntax comprehension, and semantic comprehension to perform the task. However, a subset of patients answered questions on the sarcastic items as if they believed the speakers to be sincere. The breakdown of subjects’ ability to comprehend paralinguistic sarcasm may have occurred at different levels, including 1) initial failure to correctly process vocal and visual information so that the paralinguistic profile is recognized as atypical, 2) failure to interpret the atypical paralinguistic profile as a cue that the social context has shifted and that the speaker’s statements should no longer be interpreted literally but require additional downstream processing, or 3) failure to correctly infer the speaker’s non-literal meaning, including their thoughts, intentions, and emotions. Because this study used clinical patients and an atrophy model, rather than a dynamic model of normal function using fMRI, the study design does not allow unequivocal identification of the level or levels at which subjects’ failure occurred. However, examination of the known functions of the regions found in this study suggests that different subsets of patients may have experienced breakdown primarily at the latter two stages, involving more complex downstream interpretation of the speaker’s meaning, while early, upstream processing of suprasegmental vocal cues may have remained largely intact.
This interpretation is also supported by the finding that the patient group failing the Sarcasm condition was no more likely than the passing group to have difficulty with discriminating or naming emotional prosody on simple auditory testing.
Studies have established that the auditory profile of sarcastic speech differs from that of sincere speech: it has a higher fundamental frequency (F0) with greater range and amplitude, higher energy values, shorter pauses, and lengthened syllables (Anolli et al., 2000). These findings suggest that sarcasm is identified primarily from its temporal and spectral vocal features, the processing of which is associated with the temporal lobes (Beaucousin et al., 2007; Belin et al., 2000; Wildgruber et al., 2006). Our study found that sarcasm comprehension decreased in conjunction with atrophy in specific regions of the temporal lobe that may be involved in social signal detection and higher-level conceptual processing.
Though the parahippocampal cortex (PHc) is not commonly identified as a region involved in processing social stimuli, its volume showed a strong relationship with the ability to correctly interpret sarcastic communication. The PHc is a higher-order polymodal association area with strong afferent and efferent connections to temporal, parietal, and frontal cortices. The more lateral region of the PHc (area TF) receives visuospatial information from parietal area 7 (in the dorsal stream, or “where is it” pathway), strong unimodal inputs from visual areas V4, TEO, and TE, and projections from the agranular, dysgranular, and granular portions of the dorsal insula. The more medial PHc (area TH) receives significant projections from auditory association cortex in the superior temporal gyrus, as well as from the parainsular cortex. TF and TH both receive substantial projections from dorsal frontal areas (BA 46 and 9), the rostral anterior cingulate (BA 24 and 32), and the retrosplenial cortex (BA 30 and 29) (Suzuki & Amaral, 1994). The PHc also projects reciprocally to all of these temporal, parietal, frontal, and insular regions (Lavenex et al., 2002). Its function is distinct from that of the anterior perirhinal cortex, which is involved in object memory; a recent review of functional imaging studies suggests that the PHc is instead involved in encoding and retrieving contextual information (Diana et al., 2007). For instance, the PHc activates bilaterally when subjects view familiar objects in a novel visuospatial arrangement, but not when they view novel objects whose spatial configuration remains the same (Pihlajamäki et al., 2004).
One study showed bilateral PHc activity when subjects discriminated correct from incorrect background features of auditory stimuli at retrieval (i.e., whether the speaker’s voice was male or female when the words were encoded), but not of visually encoded information (i.e., whether the background texture of a picture was a lawn or clouds) (Peters et al., 2007). Patients with resections of either the right or left PHc do not rate dissonant musical compositions as more unpleasant than consonant music (Gosselin et al., 2006), suggesting that the PHc may be involved in deriving positive and negative valence from complex auditory cues, and that PHc damage may prevent listeners from recognizing that dissonance signals a change in the overall mood of the music. This finding has a clear parallel in the interpretation of sarcastic vocal prosody, where an unusual vocal profile signals a change in the speaker’s communicative intent.
We propose that the significant relationship between PHc volume loss and the inability to recognize sarcasm suggests that many subjects may have failed primarily at the level of social signal detection. If the paralinguistic input to the PHc from superior temporal auditory association cortex is typical, the background communicative context of the spoken words is assumed to be sincere, and the speaker’s thoughts and feelings are interpreted as matching their words. If instead the PHc recognizes the paralinguistic input as atypical, particularly if the vocal prosody adopts the dissonant and unpleasant cadences inherent to sarcasm, the listener is alerted that the background communicative context has changed. Patients who do not recognize that the sarcastic speaker is signaling a change of interpretive context will continue in the default mode of communication, incorrectly assuming that the speaker’s words are sincere, and will thus fail to initiate the additional downstream processing required to interpret the speaker’s paradoxical statements correctly. The fact that subjects failing the sarcasm detection task did not perform worse than other subjects on simple prosody discrimination tasks suggests that they may hear the prosody of sarcasm but still fail to recognize its social importance. Two further observations point toward the auditory channel: vocal cues have been more consistently and strongly associated with the identification of sarcasm in normal subjects than visual features have (Rockwell, 2001, 2007), and in our study, volume loss in the more medial TH region of the PHc, an area much more extensively interconnected with auditory than with visual cortex, correlated more strongly with failure to detect sarcasm.
The extensive afferent and efferent connections between the PHc and both insular and dorsomedial frontal cortex suggest that the PHc may provide direct bottom-up input to regions involved in higher social and emotional processing, or that these anterior regions may exert a top-down influence on the PHc, reinforcing the social salience of the sarcastic paralinguistic profile. However, this study was designed only to establish the correlation, not to identify the specific role of the PHc in sarcasm comprehension; these proposed functions are therefore speculative and should be investigated further using other research modalities.
Though both temporal poles were significantly related to sarcasm processing in the main-effects analysis, additional analysis could not clarify the relative contributions of the left and right temporal poles. Like the frontal pole, the temporal poles consist of tertiary association cortex (Mesulam, 1998) and are thus likely to be involved in downstream processing of sarcasm, most plausibly at the stage of conceptual interpretation once the paralinguistic profile of sarcasm has been detected. Both temporal poles appear to be involved in higher-level conceptual knowledge (Ralph et al., 2008), and the type of information each pole processes appears to depend partly on the modality of its inputs (e.g., social-emotional vs. linguistic). The right temporal pole is associated with processing social and emotional information (Olson et al., 2007; Phan et al., 2002) and has been linked with the ability to generate an empathic response (Rankin et al., 2006). It is probably involved in the high-level integration of social and emotional signals from multiple external and internally generated sources, yielding higher-level social conceptual knowledge (Zahn et al., 2009). Patients with damage to this area may have been more likely to fail the sarcasm task because they could not correctly read the speaker’s emotional and social intent, even if they did recognize the paralinguistic speech profile as abnormal. The left temporal pole is involved in linguistic semantic networks, and damage to this area in neurodegenerative disease results in loss of word and object knowledge (Murre et al., 2001). Patients demonstrated normal comprehension of the post-video questions in the Sincere condition, and our analyses controlled for performance on the Sincere task to remove variance associated with simple comprehension deficits.
However, correct interpretation of the sarcastic videos may have required a higher level of conceptual semantic knowledge that did not directly concern the meaning of the words in the videos or post-video questions; word-level comprehension, by contrast, was accounted for by performance on the Sincere control task. Functional imaging shows that knowledge of higher-order social concepts is primarily associated with bilateral anterior temporal cortex (Zahn et al., 2007), and loss of social conceptual knowledge in bvFTD patients has been associated with damage to the right dorsal anterior temporal region also seen in our analysis (Zahn et al., 2009). Sarcasm is a learned socio-linguistic construct (Laval & Bert-Eboul, 2005), and loss of the acquired knowledge that language can be insincere may have limited patients’ comprehension.
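The idea of controlling for Sincere-task performance can be illustrated, as a minimal sketch, by a partial correlation: a regional volume and the sarcasm score are each regressed on the Sincere score, and the residuals are correlated, so that variance shared through simple comprehension is removed. This illustrates the general principle only and is not the voxel-wise SPM5 general linear model used in the study; the function names and toy data below are hypothetical.

```python
from math import sqrt


def _mean(values):
    return sum(values) / len(values)


def residualize(y, z):
    """Residuals of y after ordinary least-squares regression on covariate z (with intercept)."""
    mz, my = _mean(z), _mean(y)
    szz = sum((zi - mz) ** 2 for zi in z)
    slope = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz
    intercept = my - slope * mz
    return [yi - (intercept + slope * zi) for zi, yi in zip(z, y)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    return pearson(residualize(x, z), residualize(y, z))


if __name__ == "__main__":
    # Hypothetical toy data: z plays the role of a Sincere-task score that
    # drives both variables; e is variation specific to each subject.
    z = list(range(8))
    e = [1, -1, 2, 0, -2, 1, -1, 0]
    volume = [zi + ei for zi, ei in zip(z, e)]
    sarcasm = [2 * zi - ei for zi, ei in zip(z, e)]
    print(round(pearson(volume, sarcasm), 3))            # raw r, about 0.71
    print(round(partial_corr(volume, sarcasm, z), 3))    # partial r, -1.0 by construction
```

In this constructed example the raw correlation is positive only because both variables depend on the covariate; once that shared dependence is removed, the remaining relationship is the opposite sign, which is the kind of confound a covariate is meant to strip out.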
Our study also found a small but significant cluster of atrophy in the right superior frontal gyrus, corresponding to Brodmann area 10, that correlated with sarcasm task performance. This area lies well downstream of the initial processing of the paralinguistic features of the communication, and is more likely to be involved in recognizing that the sarcastic speaker intends to convey a meaning other than the one their words suggest. In a recent review, Krueger et al. (2009) suggest that the anterior medial prefrontal cortex is involved in interpreting low-frequency social scripts and in deriving intentions and event outcomes from goal-directed action sequences. The frontal pole has repeatedly been linked with social perspective taking, a skill that is likely involved in correctly interpreting a sarcastic speaker’s paradoxical intentions. In a review of evidence from clinical and neuroimaging studies, Decety and Jackson (2004) suggest that dorsomedial frontal areas may facilitate perspective taking by inhibiting the default self-perspective in order to temporarily attend to, and make inferences about, the other’s point of view. These areas may also be involved in a “triadic attention” task, in which one’s own perspective, the other’s perspective, and reality must be held online and compared (Gallagher & Frith, 2003; Saxe, 2006; Zysset et al., 2002). Studies of brain-injured patients have also documented worse perspective taking and irony detection in patients with lesions to the dorsomedial and medial frontopolar areas, including BA 10 (Shamay-Tsoory et al., 2003; Shamay-Tsoory, Tomer, Berger et al., 2005; Shamay-Tsoory et al., 2004; Stone et al., 1998; Stuss et al., 2001).
The finding of significant dorsomedial, but not dorsolateral, frontal correlations is consistent with the behavioral finding that executive impairment did not predict sarcasm comprehension failure in this group of patients. In fact, patients who failed the sarcasm task had significantly better working memory and response inhibition than patients who passed. Had this study used a text-based task, or one requiring sarcasm to be inferred from non-paralinguistic contextual cues, subjects’ performance might have correlated with executive deficits in areas such as working memory and complex non-verbal reasoning.
These results raise the question of whether particular parts of this pathway are differentially affected in specific disease groups. However, VBM analyses of behavioral correlates within single diagnostic groups are methodologically unsound, because severe restriction of range in both anatomic and behavioral variability leaves inadequate power for unbiased whole-brain analyses. As a result, this study could not directly examine the neural correlates of sarcasm within single diagnostic groups.
Technical limitations notwithstanding, these data suggest that certain brain-behavior relationships have a differential impact across disease groups. Right temporal damage, particularly in neurodegenerative disease, has been associated with emotion comprehension deficits (Rosen et al., 2004), disturbed emotional expression (Mendez et al., 2006; Miller et al., 1993), lack of empathy (Mendez & Perryman, 2003; Rankin et al., 2006; Rankin et al., 2005), and general loss of social sensitivity (Gorno-Tempini, Rankin et al., 2004; Perry et al., 2001). In this study, SemD patients were more likely to fail the sarcasm recognition task than any other group, including bvFTD patients, though our anatomic analysis implicated both left and right temporal structures in this deficit. Semantic dementia is diagnosed primarily on the basis of left-temporally mediated language symptoms such as loss of word knowledge. These language comprehension deficits might have been expected to lead SemD patients to attend more closely to the non-linguistic features of the videos; instead, their performance suggests that they took the speakers’ words at face value, ignoring the sarcastic paralinguistic profile. Unexpectedly, these early SemD patients outperformed normal control subjects in correctly answering questions about characters’ thoughts, feelings, words, and actions in the sincere videos, suggesting more than adequate word comprehension for the task. The association between sarcasm comprehension and the posterior parahippocampal region suggests that some SemD patients may have failed to recognize the paralinguistic profile of sarcasm as a dissonant, unpleasant signal indicating that the social context had changed and required more careful attention and interpretation.
Anterior temporal lobe damage causing loss of the learned social concept of “sarcasm” may also have biased SemD patients towards interpreting the communications as sincere.
While temporal lobe damage is ubiquitous in SemD, behavioral variant FTD patients show a wide variety of neuroanatomic disease patterns, with only a subset developing temporal lobe involvement. However, the right superior frontal gyrus is more consistently affected in bvFTD, and this structure was also found to be important for sarcasm comprehension. Given the anatomic diversity within bvFTD, patients with temporal damage may fail to comprehend sarcasm because of upstream deficits in paralinguistic processing or semantic loss of the social concept of “sarcasm”. Patients without temporal damage, on the other hand, may show downstream failure in inferring intentions (“Theory of Mind”) because dorsomedial frontal damage reduces their ability to imagine the speaker’s perspective and hold it online for analysis. Though 20% of the bvFTD patients failed the sarcasm task, the bvFTD group as a whole did not differ significantly from healthy older control subjects on these stimuli, which relied primarily on paralinguistic rather than contextual cues. In a recent study using stimuli that combined paralinguistic and contextual cues, bvFTD patients showed much greater impairment in sarcasm comprehension (Kipps et al., 2009). This suggests that bvFTD patients would be more likely to fail a paradigm requiring them to identify sarcasm purely from contextual cues, a more developmentally and executively complex skill (Laval & Bert-Eboul, 2005).
Using VBM with structural MRI images, we delineated the anatomic substrate of the ability to recognize the paralinguistic profile of sarcastic speech in patients with neurodegenerative disease and healthy older controls. This study provides lesion data suggesting that the posterior parahippocampus may be involved in recognizing a paralinguistic speech profile as abnormal, which in turn triggers interpretive processing by the temporal poles and right medial frontal pole that identifies the social context as sarcastic and recognizes the speaker’s paradoxical intentions.
This research was supported in part by the National Institute on Aging (NIA) grants 5-K23 AG021606, 5-R01 AG029577 and 5-P01 AG19724, the State of California, Alzheimer’s Disease Research Center of California (ARCC) grant 03-75271, and the Larry L. Hillblom Foundation, Inc., grant 2002/2J.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.