Ninety subjects were studied, including 77 patients diagnosed with one of six neurodegenerative diseases and 13 healthy older normal controls. Patients were recruited into the study from a dementia specialty clinic. These included 20 patients who met the Neary criteria (Neary et al., 1998) for the frontotemporal dementia (bvFTD) variant of frontotemporal lobar degeneration (FTLD) (typically characterized by bilateral frontal disease and a progressive behavioral syndrome) (Snowden et al., 2007), 11 with the semantic dementia (SemD) variant of FTLD (typically characterized by left anterior temporal and orbitofrontal atrophy along with profound semantic loss) (Hodges & Patterson, 2007), and four with the progressive nonfluent aphasia (PNFA) variant of FTLD (typically characterized by left inferior frontal atrophy and non-fluent speech) (Gorno-Tempini, Dronkers et al., 2004). In addition to the FTLD patients, 27 subjects had Alzheimer’s disease (AD), diagnosed by NINCDS-ADRDA criteria (McKhann et al., 1984); six had corticobasal degeneration (CBD), defined according to the criteria specified in Boxer et al. (2005); and nine had progressive supranuclear palsy (PSP), diagnosed by the Litvan criteria (Litvan et al., 1996). Patient diagnosis was derived by a multidisciplinary team consisting of neurologists, neuropsychologists, psychiatrists, and nurses, who performed extensive neurological, behavioral, neuropsychological, and neuroimaging assessments. Patients from diverse diagnostic groups with variable behavioral test scores and patterns of gray matter atrophy were included to provide variability in the sample and thus increase the power of the correlation analysis. Patients with a Clinical Dementia Rating (CDR) score >2 were excluded, as were subjects who were not fluent in English.
Healthy control subjects were recruited through advertisements in local newspapers and recruitment talks at local senior community centers, then underwent an extensive multidisciplinary clinical evaluation. For inclusion as healthy controls for this study, subjects had to have a normal neurologic exam, a CDR score of 0, a Mini-Mental State Examination (MMSE) score equal to or greater than 28/30, and delayed memory performance equal to or greater than the 25th percentile in both verbal and visuospatial domains.
All subjects, and where applicable their caregivers, signed an institutional review board-approved research consent form to participate in the study. Patients seen at the clinic represented a broad sample of the population in terms of ethnicity, sex, education level, and socioeconomic status, and an attempt was made to recruit all available consecutive patients for this study. Subjects’ demographic characteristics can be seen in Table 1. Subjects’ mean age was 61.8 years (SD ± 8.3), and they averaged 16.1 (SD ± 3.0) years of education. There were 43 males and 47 females, and the mean CDR score for all non-normal subjects was 0.9 (SD ± 0.5). Statistically significant differences were seen across groups in sex, but not age or education. Because sex, age, and MMSE were included as potentially confounding covariates in the imaging analyses, these were also included as covariates in all analyses of test scores.
Table 1 General and neuropsychiatric characteristics of subject sample classified by diagnostic group. For CATS and TASIT tests, the F-statistic and p-values are for overall diagnostic group differences controlling for age, sex, and MMSE; for other measures, ...
Assessment of sarcasm detection
Each subject performed the Social Inference – Minimal subtest of The Awareness of Social Inference Test (TASIT-2) (McDonald, 2002). This test is designed to assess subjects’ ability to interpret naturalistic social interactions in which the speaker utilizes sincerity, sarcasm, or paradoxical sarcasm to communicate. Only the Sincere (SIN) and Simple Sarcasm (SSR) subtests were analyzed for this study. Subjects watch 10 brief (less than 1 minute) video vignettes in which professional actors interact, and after each vignette answer four yes-no questions about the actions, thoughts, words, and emotions of the characters. The vignettes for the Sincere and Simple Sarcasm conditions (5 each) are presented after an unscored sample video has been used to instruct the subject about the task. The two conditions are mixed together throughout the test, so subjects could not develop a “sincere” or “sarcastic” response set based on the order of the items.
In the Sincere and Simple Sarcasm conditions, the scripted verbal content is neutral and is interchangeable between the conditions (e.g., “I’d be happy to do it. I’ve got plenty of time.”), so subjects must observe paralinguistic cues, including facial expression, voice prosody, hand and head gestures, and body posture, to determine the speaker’s intended meaning. In the Sincere condition, the speaker’s non-verbal cues are consistent with the verbal content, thus no irony is implied, and the verbal content accurately signifies the speaker’s intended meaning. If the example script above appeared in the Sincere condition, the correct interpretation would be that the speaker truly is eager to help, and has enough spare time to do the work. In the Simple Sarcasm condition, the speaker uses exaggerated facial, vocal, and body language indicating sarcasm, thus their intended meaning is ironic and diverges from the manifest verbal content of their speech. If the example script above were used in the Simple Sarcasm condition, the correct interpretation would be that the speaker doesn’t want to do the task because she is too busy. Though the neutral content is theoretically interchangeable between conditions, no script was used for more than one item during the test, so subjects were not given the opportunity to compare a sincere and a sarcastic reading of the same script. The four yes-no questions after each item required the subject to correctly identify the speaker’s meaning (e.g., “Is Ruth trying to pressure Gary into helping her?” “Is she annoyed with him?”).
Performance on the Sincere condition was used to gauge subjects’ ability to perform the basic demands of the task, such as comprehending the actors’ speech, following the flow of the social interaction, remembering the vignette long enough to answer yes-no questions about it, and comprehending the questions themselves. For both conditions, approximately half of the questions were reversed, so a positive or negative response set would indicate nonsensical responding and would result in a failing (chance level) score on both conditions.
Testing was completed within 4 months before or after the MRI scan, and the average span of time between testing and scan was 13 days (SD ± 26 days).
Assessment of emotion comprehension
Additionally, at the time of their sarcasm testing, subjects underwent testing of emotion recognition in facial, vocal, and combined modalities. Subjects were tested with two subtests of the Comprehensive Affect Testing System (CATS) (Froming et al., 2001), including Emotional Prosody Discrimination (discriminating same or different emotional voice prosody with neutral semantic content), and Name Emotional Prosody (multiple-choice naming emotional voice prosody with the four emotions happy, sad, frightened, and angry). To assess their ability to identify emotions with more ecologically valid, dynamic, multimodal stimuli, subjects performed an abbreviated form of the Emotion Evaluation Subtest of the TASIT. For this test, subjects watch brief (~20 secs) videos of actors performing semantically neutral scripts portraying one of the seven basic emotions (happy, surprised, neutral, sad, anxious, frightened, revolted), and must choose the correct emotion from a card on which the seven options are written. To reduce the effects of fatigue on our elderly, demented subjects, we administered only items 1–14, for a maximum score of 14.
Each subject also underwent 2 hours of cognitive testing, and a measure of neuropsychiatric functioning was administered through an informant interview. Standard neuropsychological measures of language, visuospatial, memory, and executive functioning were used, and are detailed in Table 2. Subjects were also evaluated with the Geriatric Depression Scale (GDS), a 30-item self-report questionnaire (Yesavage et al., 1983). Behavior was measured using the Neuropsychiatric Inventory (NPI), a caregiver interview designed to assess the frequency and severity of behaviors that commonly occur as a result of a dementia syndrome (Cummings, 1997). Cognitive and neuropsychiatric assessment occurred within 3 months of sarcasm detection testing, and the average time between assessments was 20 days (SD ± 38 days).
Table 2 Neuropsychological and neuropsychiatric characteristics of non-normal subjects (N=77), stratified by performance on simple sarcasm recognition task (SSR). F-statistics are derived from general linear models controlling for age, sex, and MMSE score.
Analysis of test performance
To determine whether subjects who performed poorly on the Simple Sarcasm task differed from the other subjects on any neuropsychological or neuropsychiatric variables, patients were divided into two groups and directly compared. Patient scores on the Simple Sarcasm test were converted to z-scores using the healthy older control group as the standardization sample. The patients were then grouped by whether they had passed or failed the Simple Sarcasm task, with a cutoff for failure at z < −1.50 (i.e., patients were considered to have failed the task if they performed at less than the 7th percentile compared to healthy older normal controls, chosen because this level of performance signifies clinical impairment relative to controls in standard neuropsychological assessment). T-tests, using age as a covariate, were used to compare the scores of the two groups across all emotion, cognitive, and neuropsychiatric measures.
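The standardization and grouping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: the scores are invented, the function names `z_scores` and `classify` are hypothetical, and the use of the sample standard deviation is an assumption. Only the z < −1.50 cutoff is taken from the text.

```python
from statistics import NormalDist, mean, stdev

CUTOFF_Z = -1.50  # failure threshold stated in the text


def z_scores(patient_scores, control_scores):
    """Standardize patient scores against the control group's mean and SD."""
    mu = mean(control_scores)
    sd = stdev(control_scores)  # assumption: sample SD (the text does not specify)
    return [(score - mu) / sd for score in patient_scores]


def classify(patient_scores, control_scores, cutoff=CUTOFF_Z):
    """Label each patient 'fail' if z < cutoff, otherwise 'pass'."""
    return ["fail" if z < cutoff else "pass"
            for z in z_scores(patient_scores, control_scores)]


# Invented raw scores, for illustration only.
controls = [18, 19, 20, 17, 19, 18, 20, 19]
patients = [19, 12, 18, 9]
labels = classify(patients, controls)

# Share of a standard normal distribution lying below the cutoff, in percent.
# This is just under 7, which matches the "7th percentile" wording in the text.
percentile_below_cutoff = NormalDist().cdf(CUTOFF_Z) * 100
```

With these invented values, the second and fourth patients fall below the cutoff and would be placed in the "fail" group.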
MRI scans were obtained on a 1.5-T Magnetom VISION system (Siemens Inc., Iselin, N.J.) equipped with a standard quadrature head coil. A volumetric magnetization prepared rapid gradient echo MRI (MPRAGE, TR/TE/TI = 10/4/300 milliseconds) was used to obtain T1-weighted images of the entire brain, 15-degree flip angle, coronal orientation perpendicular to the double spin echo sequence, 1.0 × 1.0 mm² in-plane resolution and 1.5 mm slab thickness.
The voxel-based morphometry (VBM) technique utilizes an image pre-processing step (spatial normalization, segmentation, modulation, and smoothing) followed by statistical analysis. Both stages were performed using the SPM5 software package (Wellcome Department of Cognitive Neurology, London; http://www.fil.ion.ucl.ac.uk/spm) running on Matlab 7.0.1 (MathWorks, Natick, MA). MRI images were pre-processed primarily using SPM5 default settings and tissue probability maps, though light cleanup of partitions was performed. Spatially normalized, segmented, and modulated grey matter images were then smoothed with a 12 mm FWHM isotropic Gaussian kernel.
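For readers less familiar with the FWHM convention, the relation between a Gaussian kernel's full width at half maximum and its standard deviation is FWHM = 2·sqrt(2·ln 2)·σ. The short Python sketch below applies this standard identity to the 12 mm kernel; the function names are hypothetical and the snippet does not reproduce SPM5's smoothing routine.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel's FWHM to its standard deviation (same units)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_weight(distance_mm, sigma_mm):
    """Unnormalized Gaussian weight at a given distance from the kernel center."""
    return math.exp(-(distance_mm ** 2) / (2.0 * sigma_mm ** 2))


sigma = fwhm_to_sigma(12.0)             # roughly 5.1 mm for a 12 mm FWHM kernel
half_max = gaussian_weight(6.0, sigma)  # weight at half the FWHM from the center
```

By construction, the weight 6 mm from the center is half the peak weight, which is what "12 mm FWHM" means.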
VBM Analyses of Sarcasm Processing
Covariates-only statistical models were used to show the relationship between TASIT scores and voxel-wise gray matter volume. To control for subjects’ ability to comprehend the test, scores for both conditions (Sincere and Simple Sarcasm) were entered into each design matrix, as were the confounding covariates age, sex, and MMSE score (as a proxy for disease severity). Total intracranial volume (TIV) was used as a global covariate to correct for individual differences in head size. Regionally specific differences in grey matter volumes at each voxel were assessed using the general linear model, and the significance of each effect was determined using the theory of Gaussian fields (SPM5 defaults). Results were considered significant if they survived correction for family-wise error across the whole brain (pFWE < 0.05).
1. Main effect analyses: Voxel-wise regression of gray matter on SSR score
The following contrasts were performed: 1) To look at the main effect of paralinguistic sarcasm comprehension, controlling for performance on the Sincere condition, a [0 1] t-contrast was used (with additional zeros for nuisance covariates), assuming that poorer test performance would be associated with decreased gray matter volumes. 2) To determine whether performance deficits on the Sincere condition could be localized to a specific region, a [1 0] t-contrast was used.
2. First co-atrophy error check: Linear regression comparison of significant peak voxels
Because regional atrophy is not randomly distributed across this sample, but is represented in patterns of atrophy that are similar within, and to some degree across, diagnostic categories, the main effects analysis was expected to demonstrate some degree of confounding due to co-atrophy effects. This artifact, typical of VBM studies in patients with neurodegenerative disease, occurs when brain regions are identified by VBM that are not directly related to (in this case) sarcasm score, but instead are the result of disease-specific patterns of co-atrophy occurring along with regions truly related to sarcasm task performance. The superior quality of preprocessing afforded by the SPM5 software compared to SPM2 has the benefit of increasing the sensitivity of VBM analyses, but it simultaneously heightens the degree to which co-atrophy artifact appears, increasing the importance of further analysis of main effect results.
To determine the relative contribution of the various regions found to have an independent relationship to sarcasm in the massively univariate main effects analysis, we performed a linear regression analysis of the voxel values at each peak coordinate using the SAS 9.1 statistical program. Voxel probabilities were extracted from the smoothed, warped, modulated, gray-matter images of each subject at each peak voxel that was significant in the main effects analysis. These voxel probability values were then analyzed together in linear regression analyses, including age, sex, MMSE, SIN, and TIV in each model as potentially confounding covariates, and using SSR score as the outcome variable. We used the modified Allen-Cady predictor selection technique specified in Vittinghoff et al. (2004), forcing age, sex, MMSE, SIN, and TIV into the model as covariates, and setting a very permissive inclusion threshold at p < 0.20 to ensure that brain regions showing at least a modest independent relationship to SSR score remained in the model.
3. Second co-atrophy error check—shared effects analysis: Voxel-wise regression of gray matter on SSR score controlling for diagnostic group membership
The linear regression analysis uses a data-driven approach to identify and reject brain regions that appear in the VBM main effect results simply because they are statistically more likely to co-atrophy with other brain regions directly related to sarcasm. However, this analysis does not rule out the possibility that significant findings hold true only in one diagnostic group and do not represent a generalizable brain-behavior relationship. It is logically possible for this kind of illusory correlation to occur in any VBM analysis using patients from multiple neurodegenerative disease groups, because if disease group membership predicts a region of atrophy (G→A), and disease group membership also predicts poor performance on the behavior task (G→B), then that region of atrophy may appear to directly correlate with the behavior (A↔B), when that correlation is actually spurious (A←/→B).
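The logic of this illusory correlation can be illustrated with a small simulation. The Python sketch below is a toy example with invented parameters, not a model of the study data: two hypothetical groups differ in mean atrophy and mean behavioral deficit, but within each group the two variables are generated independently.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_group(rng, n, atrophy_mean, deficit_mean):
    """Draw atrophy (A) and behavioral deficit (B) independently within a group."""
    atrophy = [rng.gauss(atrophy_mean, 1.0) for _ in range(n)]
    deficit = [rng.gauss(deficit_mean, 1.0) for _ in range(n)]
    return atrophy, deficit


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Group 1: little atrophy, little deficit. Group 2: more of both (G→A and G→B).
a1, b1 = simulate_group(rng, 500, atrophy_mean=0.0, deficit_mean=0.0)
a2, b2 = simulate_group(rng, 500, atrophy_mean=3.0, deficit_mean=3.0)

r_pooled = pearson_r(a1 + a2, b1 + b2)  # strong, driven only by group membership
r_within_1 = pearson_r(a1, b1)          # near zero
r_within_2 = pearson_r(a2, b2)          # near zero
```

In this toy setting the pooled correlation is large even though atrophy and deficit are unrelated within either group, which is the pattern that controlling for diagnostic group membership is meant to expose.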
In order to perform a second error-check for co-atrophy, we parameterized each diagnosis (0=no, 1=yes) and entered all 7 diagnostic groups into the design matrix as confounding covariates (using 6 dummy variables to represent the 7 groups). Then the relationship between sarcasm score and atrophy was examined using a [0 1 0 0 0 0 0 0 0 0 0] contrast (see Figure 2). The results of this analysis show regions of atrophy significantly related to Sarcasm score only if they appear in more than one diagnostic group. These results must be considered in light of the regression results, however, because this method will improperly exclude any regions that are legitimately related to Sarcasm score, but which only atrophy in a single diagnostic group.
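The dummy coding and the 11-element contrast can be made concrete with a short Python sketch. Several details here are assumptions made for illustration: the choice of healthy controls as the reference group, the order of the nuisance columns after SIN and SSR, and all function and column names. TIV is left out of the column list because the text describes it as a global covariate rather than as an entry in the contrast.

```python
GROUPS = ["NC", "bvFTD", "SemD", "PNFA", "AD", "CBD", "PSP"]
REFERENCE = "NC"  # assumption: the text does not state which group was the reference

# Six dummy variables represent the seven groups; the reference group is all zeros.
DUMMY_GROUPS = [g for g in GROUPS if g != REFERENCE]

# Assumed column order: the two test scores, three nuisance covariates, six dummies.
COLUMNS = ["SIN", "SSR", "age", "sex", "MMSE"] + [f"dx_{g}" for g in DUMMY_GROUPS]


def dummy_code(diagnosis):
    """Return the 0/1 indicator values for one subject's diagnosis."""
    if diagnosis not in GROUPS:
        raise ValueError(f"unknown diagnosis: {diagnosis}")
    return [1 if diagnosis == g else 0 for g in DUMMY_GROUPS]


def design_row(sin, ssr, age, sex, mmse, diagnosis):
    """Build one subject's row of the design matrix."""
    return [sin, ssr, age, sex, mmse] + dummy_code(diagnosis)


def contrast_for(column):
    """Contrast vector with a 1 in the named column and 0 elsewhere."""
    return [1 if c == column else 0 for c in COLUMNS]


ssr_contrast = contrast_for("SSR")
example_row = design_row(sin=18, ssr=12, age=63, sex=1, mmse=24, diagnosis="SemD")
```

Under these assumptions `ssr_contrast` has a single 1 in the second position followed by nine zeros, matching the contrast given in the text, and every design row has 11 entries.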
Figure 2 Design matrices and transparent axial views of SPM5 “glass brain” representing results of main effect analysis (controlling for Sincere condition score, MMSE, age, sex, and TIV) and shared effect analysis (additionally controlling for ...)
Based on evidence from functional and lesion-based studies of social and emotional functions, we hypothesized that deficits in the ability to interpret paralinguistic cues as sarcastic would correspond with grey-matter atrophy in a right temporal-frontal network (Allison et al., 2000; Gallagher & Frith, 2003; Perry et al., 2001; Rosen et al., 2002; Shamay-Tsoory, Tomer, & Aharon-Peretz, 2005). Poor performance on the more general Sincere control condition was expected to occur because of deficits in different cognitive modalities, according to the functions particularly affected by any one of the multiple neurodegenerative diseases represented in our sample (e.g., impaired verbal or visual memory, working memory, semantic loss, or syntax comprehension deficits). Because no single diagnostic group showed disproportionate deficits on the Sincere task, the source of failure on this task across the sample would not be isolated to a single anatomic network; we therefore hypothesized that there would be no suprathreshold clusters for the Sincere condition main effect,