Voice-induced cross-taxa emotional recognition is the ability to understand the emotional state of another species based on its voice. In the past, induced affective states, experience-dependent higher cognitive processes, and cross-taxa universal acoustic coding and processing mechanisms have all been proposed to underlie this ability in humans. The present study sets out to distinguish the influence of familiarity and phylogeny on voice-induced cross-taxa emotional perception in humans. For the first time, two perspectives are taken into account: the self-perspective (i.e. the emotional valence induced in the listener) versus the others-perspective (i.e. correct recognition of the emotional valence of the recording context). Twenty-eight male participants listened to 192 vocalizations of four different species (human infant, dog, chimpanzee and tree shrew). Stimuli were recorded either in an agonistic (negative emotional valence) or affiliative (positive emotional valence) context. Participants rated the emotional valence of the stimuli, adopting the self- and others-perspectives, using a 5-point version of the Self-Assessment Manikin (SAM). Familiarity was assessed based on subjective ratings, objective labelling of the respective stimuli and interaction time with the respective species. Participants reliably recognized the emotional valence of human voices, whereas the results for animal voices were mixed. The correct classification of animal voices depended on the listener's familiarity with the species and the call type/recording context, whereas induced emotional states and phylogeny had less influence. Our results provide the first evidence that explicit voice-induced cross-taxa emotional recognition in humans is shaped more by experience-dependent cognitive mechanisms than by induced affective states or cross-taxa universal acoustic coding and processing mechanisms.
As language rhythm relies partly on general acoustic properties, such as intensity and duration, mastering two languages with distinct rhythmic properties (i.e., stress position) may enhance musical rhythm perception. We investigated whether competence in a second language (L2) with rhythmic properties different from those of the L1 affects musical rhythm aptitude. Turkish early learners (TELG) and late learners (TLLG) of German were compared to German late L2 learners of English (GLE) regarding their musical rhythmic aptitude. While Turkish and German differ markedly in their rhythmic and metrical properties, German and English are rather similar in this regard. To account for inter-individual differences, we measured participants' short-term and working memory (WM) capacity, melodic aptitude, and the time they spent listening to music. Both groups of Turkish L2 learners of German perceived rhythmic variations significantly better than German L2 learners of English. No differences were found between early and late learners' performance. Our findings suggest that mastering two languages with different rhythmic properties enhances musical rhythm perception, providing further evidence of shared cognitive resources between language and music.
speech rhythm; L2; musical rhythm; rhythmic aptitude; Turkish; German; English
Auditory scene analysis describes the ability to segregate relevant sounds from the environment and to integrate them into a single sound stream, using the characteristics of the sounds to determine whether or not they are related. This study aims to contrast task performance in objective threshold measurements of segregation and integration using identical stimuli, manipulating two variables known to influence streaming: inter-stimulus interval (ISI) and frequency difference (Δf). For each measurement, one parameter (either ISI or Δf) was held constant while the other was altered in a staircase procedure. This paradigm makes it possible to test within subjects across multiple conditions, covering a wide Δf and ISI range in one testing session. The objective tasks were based on across-stream temporal judgments (facilitated by integration) and within-stream deviance detection (facilitated by segregation). Results show that the objective integration task is well suited for combination with the staircase procedure, as it yields consistent threshold measurements for separate variations of ISI and Δf and is significantly related to the subjective thresholds. The objective segregation task appears less suited to the staircase procedure. With the integration-based staircase paradigm, a comprehensive assessment of streaming thresholds can be obtained in a relatively short space of time. This permits efficient threshold measurements, particularly in groups for which there is little prior knowledge of the relevant parameter space for streaming perception.
auditory scene analysis; perceptual grouping; threshold measurement; psychophysics; adaptive method; streaming; auditory streams
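The adaptive staircase logic described above can be illustrated with a minimal sketch. The following is a generic 2-down/1-up staircase, not the authors' implementation; the function name, starting level, step size, and reversal count are illustrative assumptions. The adapted level could stand for Δf with ISI held constant, or vice versa.

```python
def run_staircase(respond, start=10.0, step=1.0, n_reversals=8):
    """Generic 2-down/1-up staircase, converging on the ~70.7% correct point.

    `respond(level)` returns True for a correct response at `level`.
    One stimulus parameter (e.g. frequency difference) is adapted while
    the other (e.g. ISI) is held constant, as in the paradigm above.
    """
    level, correct_run, direction = start, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            correct_run += 1
            if correct_run == 2:          # two correct in a row: make task harder
                correct_run = 0
                if direction == +1:       # direction changed -> record a reversal
                    reversals.append(level)
                direction = -1
                level = max(level - step, 0.0)
        else:
            correct_run = 0
            if direction == -1:           # direction changed -> record a reversal
                reversals.append(level)
            direction = +1                # wrong response: make task easier
            level += step
    # threshold estimate: mean of the last reversal levels
    return sum(reversals[-6:]) / len(reversals[-6:])
```

With a deterministic simulated observer (correct whenever the level is at or above its true threshold), the track oscillates around that threshold and the reversal mean recovers it to within one step.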
While there is ample evidence on the functional and connectional differentiation of the caudate nucleus (CN), less is known about its potential microstructural subdivisions. This latter aspect, however, is critical to the local information processing capabilities of the tissue. We applied diffusion MRI, a non-invasive in vivo method with great potential for exploring the brain structure-behavior relationship, in order to characterize the local fiber structure in the gray matter of the CN. We report novel evidence of a functionally meaningful structural tripartition along the anterior-posterior axis of this region. The connectivity of the CN subregions is in line with connectivity evidence from earlier invasive studies in animal models. In addition, histological validation using polarized light imaging (PLI) confirms these results, corroborating the notion that cortico-subcortico-cortical loops involve microstructurally differentiated regions in the caudate nucleus. Methodologically, the comparison with advanced analyses of diffusion MRI shows that diffusion tensor imaging (DTI) yields a simplified view of the CN fiber architecture, which is refined by advanced high angular resolution imaging methods.
Previous research suggests that emotional prosody processing is a highly rapid and complex process. In particular, it has been shown that different basic emotions can be differentiated in an early event-related brain potential (ERP) component, the P200. Often, the P200 is followed by later, long-lasting ERPs such as the late positive complex. The current experiment set out to explore to what extent emotionality and arousal can modulate these previously reported ERP components. In addition, we also investigated the influence of task demands (implicit vs. explicit evaluation of stimuli). Participants listened to pseudo-sentences (sentences with no lexical content) spoken in six different emotions or in a neutral tone of voice while they rated either the arousal level of the speaker or their own arousal level. Results confirm that different emotional intonations are first differentiated in the P200 component, reflecting an initial emotional encoding of the stimulus, possibly including a valence tagging process. A marginally significant arousal effect was also found in this time window, with highly arousing stimuli eliciting a stronger P200 than low-arousing stimuli. The P200 component was followed by a long-lasting positive ERP between 400 and 750 ms. In this late time window, both emotion and arousal effects were found. No effects of task were observed in either time window. Taken together, the results suggest that emotion-relevant details are robustly decoded during both early and late processing stages, while arousal information is only reliably taken into consideration at a later stage of processing.
P200; LPC; ERPs; arousal; task demands; emotion; prosody
In the current event-related potential (ERP) study, we investigated how speech rhythm impacts speech segmentation and facilitates the resolution of syntactic ambiguities in auditory sentence processing. Participants listened to syntactically ambiguous German subject- and object-first sentences that were spoken with either regular or irregular speech rhythm. Rhythmicity was established by a constant metric pattern of three unstressed syllables between two stressed ones, creating rhythmic groups of constant size. Accuracy rates in a comprehension task revealed that participants understood rhythmically regular sentences better than rhythmically irregular ones. Furthermore, the mean amplitude of the P600 component was reduced in response to object-first sentences only when embedded in a rhythmically regular, but not a rhythmically irregular, context. This P600 reduction indicates facilitated processing of sentence structure, possibly due to a decrease in processing costs for the less-preferred (object-first) structure. Our data suggest an early and continuous use of rhythm by the syntactic parser and support language processing models assuming an interactive and incremental use of linguistic information during language processing.
Metrical patterning and rhyme are frequently employed in poetry but also in infant-directed speech, play, rites, and festive events. Drawing on four-line stanzas from nineteenth- and twentieth-century German poetry that feature end rhyme and regular meter, the present study tested the hypothesis that meter and rhyme have an impact on aesthetic liking, emotional involvement, and affective valence attributions. Hypotheses postulating such effects have been advocated ever since ancient rhetoric and poetics, yet they have barely been tested empirically. More recently, in the field of cognitive poetics, these traditional assumptions have been readopted into a general cognitive framework. In the present experiment, we tested the influence of meter and rhyme, as well as their interaction with lexicality, on the aesthetic and emotional perception of poetry. Participants listened to stanzas that were systematically modified with regard to meter and rhyme and rated them. Both rhyme and regular meter led to enhanced aesthetic appreciation, higher intensity in processing, and more positively perceived and felt emotions, with the latter finding being mediated by lexicality. Together, these findings clearly show that both features significantly contribute to the aesthetic and emotional perception of poetry and thus confirm assumptions about their impact put forward by cognitive poetics. The present results are explained within the theoretical framework of cognitive fluency, which links structural features of poetry with aesthetic and emotional appraisal.
meter; rhyme; emotion; aesthetics; cognitive fluency; poetry
There is an ongoing debate as to whether singing helps left-hemispheric stroke patients recover from non-fluent aphasia through stimulation of the right hemisphere. According to recent work, it may not be singing itself that aids speech production in non-fluent aphasic patients, but rhythm and lyric type. However, the long-term effects of melody and rhythm on speech recovery are largely unknown. In the current experiment, we tested 15 patients with chronic non-fluent aphasia who underwent either singing therapy, rhythmic therapy, or standard speech therapy. The experiment controlled for phonatory quality, vocal frequency variability, pitch accuracy, syllable duration, phonetic complexity and other influences, such as the acoustic setting and learning effects induced by the testing itself. The results provide the first evidence that singing and rhythmic speech may be similarly effective in the treatment of non-fluent aphasia. This finding may challenge the view that singing causes a transfer of language function from the left to the right hemisphere. Instead, both singing and rhythmic therapy patients made good progress in the production of common, formulaic phrases—known to be supported by right corticostriatal brain areas. This progress occurred at an early stage of both therapies and was stable over time. Conversely, patients receiving standard therapy made less progress in the production of formulaic phrases. They did, however, improve their production of non-formulaic speech, in contrast to singing and rhythmic therapy patients, who did not. In light of these results, it may be worth considering the combined use of standard therapy and the training of formulaic phrases, whether sung or rhythmically spoken. Standard therapy may engage, in particular, left perilesional brain regions, while training of formulaic phrases may open new ways of tapping into right-hemisphere language resources—even without singing.
left-hemispheric stroke; non-fluent aphasia; melodic intonation therapy; singing; rhythmic speech; formulaic language; left perilesional brain regions; right corticostriatal brain areas
Humans rely on multiple sensory modalities to determine the emotional state of others. In fact, such multisensory perception may be one of the mechanisms explaining the ease and efficiency with which others' emotions are recognized. But how and when exactly do the different modalities interact? One aspect of multisensory perception that has received increasing interest in recent years is the concept of cross-modal prediction. In emotion perception, as in most other settings, visual information precedes auditory information. This visual lead can facilitate subsequent auditory processing. While this mechanism has often been described in audiovisual speech perception, it has so far not been addressed in audiovisual emotion perception. Based on the current state of the art in (a) cross-modal prediction and (b) multisensory emotion perception research, we propose that it is essential to consider the former in order to fully understand the latter. Focusing on electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, we provide a brief overview of the current research in both fields. In discussing these findings, we suggest that emotional visual information may allow more reliable prediction of auditory information than non-emotional visual information. In support of this hypothesis, we present a re-analysis of a previous data set that shows an inverse correlation between the N1 EEG response and the duration of visual emotional, but not non-emotional, information. If the assumption that emotional content allows more reliable prediction can be corroborated in future studies, cross-modal prediction will prove a crucial factor in our understanding of multisensory emotion perception.
cross-modal prediction; emotion; multisensory; EEG; audiovisual
Verbal language is the most widespread mode of human communication, and an intrinsically social activity. This claim is strengthened by evidence emerging from different fields, which clearly indicates that social interaction influences human communication and, more specifically, language learning. Indeed, research conducted with infants and children shows that interaction with a caregiver is necessary to acquire language. Further evidence on the influence of sociality on language comes from social and linguistic pathologies, in which deficits in social and linguistic abilities are tightly intertwined, as is the case in autism, for example. However, studies on adult second language (L2) learning have mostly focused on individualistic approaches, partly because of methodological constraints, especially of imaging methods. The question of whether social interaction is a critical factor in adult language learning thus remains underexplored. Here, we review evidence supporting the view that sociality plays a significant role in communication and language learning, in an attempt to identify factors that could facilitate this process in adult language learning. We suggest that sociality should be considered a potentially influential factor in adult language learning and that future studies in this domain should explicitly target this factor.
language; learning; social interaction; communication; joint attention
Successful social communication draws strongly on the correct interpretation of others' body and vocal expressions. Both can provide emotional information and often occur simultaneously. Yet their interplay has hardly been studied. Using electroencephalography, we investigated the temporal development of their neural interaction in auditory and visual perception. In particular, we tested whether this interaction qualifies as true integration following multisensory integration principles such as inverse effectiveness. Emotional vocalizations were embedded in either low or high levels of noise and presented with or without video clips of matching emotional body expressions. In both high and low noise conditions, a reduction in auditory N100 amplitude was observed for audiovisual stimuli. However, only under high noise did the N100 peak earlier in the audiovisual than in the auditory condition, suggesting facilitatory effects as predicted by the inverse effectiveness principle. Similarly, we observed earlier N100 peaks in response to emotional compared to neutral audiovisual stimuli. This was not the case in the unimodal auditory condition. Furthermore, suppression of beta-band oscillations (15–25 Hz), primarily reflecting biological motion perception, was modulated 200–400 ms after the vocalization. While larger differences in suppression between audiovisual and audio stimuli in high compared to low noise levels were found for emotional stimuli, no such difference was observed for neutral stimuli. This observation is in accordance with the inverse effectiveness principle and suggests a modulation of integration by emotional content. Overall, the results show that ecologically valid, complex stimuli such as combined body and vocal expressions are effectively integrated very early in processing.
The study of emotional speech perception and emotional prosody necessitates stimuli with reliable affective norms. However, ratings may be affected by the participants' current emotional state, as increased anxiety and depression have been shown to yield altered neural responses to emotional stimuli. Therefore, the present study had two aims: first, to provide a database of emotional speech stimuli, and second, to probe the influence of depression and anxiety on the affective ratings.
We selected 120 words from the Leipzig Affective Norms for German database (LANG), which includes visual ratings of positive, negative, and neutral word stimuli. These words were spoken by a male and a female native speaker of German with the respective emotional prosody, creating a total set of 240 auditory emotional stimuli. The recordings were rated again by an independent sample of subjects for valence and arousal, yielding groups of highly arousing negative or positive stimuli and neutral stimuli low in arousal. These ratings were correlated with participants' emotional state measured with the Depression Anxiety Stress Scales (DASS). Higher depression scores were related to more negative valence ratings of negative and positive, but not neutral, words. Anxiety scores correlated with increased arousal ratings and more negative valence ratings of negative words.
These results underscore the importance of a representative distribution of depression and anxiety scores among participants in affective rating studies. The LANG-audition database, which provides well-controlled, short-duration auditory word stimuli for the experimental investigation of emotional speech, is available in Supporting Information S1.
Studies on the maturation of auditory motion processing in children have yielded inconsistent reports. The present study combines subjective and objective measurements to investigate how the auditory perceptual abilities of children change during development and whether these changes are paralleled by changes in the event-related brain potential (ERP). We employed the mismatch negativity (MMN) to determine maturational changes in the discrimination of interaural time differences (ITDs) that generate lateralized moving auditory percepts. MMNs were elicited in children, teenagers, and adults, using a small and a large ITD at stimulus offset, defined with respect to each subject's discrimination threshold. In adults and teenagers, large deviants elicited prominent MMNs, whereas small deviants at the behavioral threshold elicited only a marginal MMN or none at all. In contrast, pronounced MMNs for both deviant sizes were found in children. Behaviorally, however, most of the children showed higher discrimination thresholds than teenagers and adults. Thus, although automatic ITD detection is functional in children, their active discrimination is still limited. The lack of MMN deviance dependency in children suggests that, unlike in teenagers and adults, neural signatures of automatic auditory motion processing do not mirror discrimination abilities. The study thereby advances our understanding of children's central auditory development.
development; auditory motion processing; event-related brain potentials; MMN
How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses examined how recognition of each emotion evolves as an utterance unfolds and estimated the “identification point” for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, the data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing.
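The per-stimulus "identification point" described above can be operationalized as the earliest gate from which the listener's categorization matches the intended emotion and remains stable through the final gate. A minimal sketch of this measure follows; the function name and data layout are illustrative assumptions, not the authors' analysis code.

```python
def identification_point(responses, target):
    """Return the earliest 1-based gate index from which every response
    equals `target` through the last gate, or None if never identified.

    `responses` lists the listener's emotion judgments at successive gates.
    """
    point = None
    for gate, judgment in enumerate(responses, start=1):
        if judgment == target:
            if point is None:
                point = gate      # candidate onset of stable recognition
        else:
            point = None          # recognition lost: reset the candidate
    return point
```

Multiplying the identified gate by the mean syllable duration of the stimulus would convert the gate index into a millisecond estimate comparable to the means reported above.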
The question of whether singing may be helpful for stroke patients with non-fluent aphasia has been debated for many years. However, the role of rhythm in speech recovery appears to have been neglected. In the current lesion study, we aimed to assess the relative importance of melody and rhythm for speech production in 17 non-fluent aphasics. Furthermore, we systematically alternated the lyrics to test for the influence of long-term memory and preserved motor automaticity in formulaic expressions. We controlled for vocal frequency variability, pitch accuracy, rhythmicity, syllable duration, phonetic complexity and other relevant factors, such as learning effects or the acoustic setting. Contrary to some opinion, our data suggest that singing may not be decisive for speech production in non-fluent aphasics. Instead, our results indicate that rhythm may be crucial, particularly for patients with lesions including the basal ganglia. Among the patients we studied, basal ganglia lesions accounted for more than 50% of the variance related to rhythmicity. Our findings therefore suggest that benefits typically attributed to melodic intoning in the past could actually have their roots in rhythm. Moreover, our data indicate that lyric production in non-fluent aphasics may be strongly mediated by long-term memory and motor automaticity, irrespective of whether lyrics are sung or spoken.
non-fluent aphasia; melodic intonation therapy; basal ganglia; long-term memory; automaticity of formulaic expressions
Event-related potential (ERP) data from monolingual German speakers have shown that sentential metric expectancy violations elicit a biphasic ERP pattern consisting of an anterior negativity and a posterior positivity (P600). This pattern is comparable to that elicited by syntactic violations. However, proficient French late learners of German do not detect violations of metric expectancy in German. They also show qualitatively and quantitatively different ERP responses to metric and syntactic violations. We followed up on the question of whether (1) the latter finding results from a potential insensitivity to pitch cues in speech segmentation in French speakers, or (2) whether it is rooted in rhythmic differences between the languages. We therefore tested Spanish late learners of German, as Spanish, contrary to French, uses pitch as a segmentation cue, even though the basic segmentation unit is the same in French and Spanish (i.e., the syllable). We report ERP responses showing that Spanish L2 learners are sensitive to syntactic as well as metric violations in German sentences, independent of attention to the task, as reflected in a P600 response. Overall, their behavioral performance resembles that of German native speakers. The current data suggest that Spanish L2 learners are able to extract metric units (trochees) in their L2 (German) even though their basic segmentation unit in Spanish is the syllable. In addition, Spanish L2 learners of German, in contrast to French L2 learners, are sensitive to syntactic violations, indicating a tight link between syntactic and metric competence. This finding emphasizes the relevant role of metric cues not only in L2 prosodic but also in syntactic processing.
auditory language processing; P600; speech segmentation; trochee; L2
The basal ganglia (BG) have repeatedly been linked to emotional speech processing in studies involving patients with neurodegenerative and structural changes of the BG. However, the majority of previous studies did not consider that (i) emotional speech processing entails multiple processing steps, and (ii) the BG may engage in one rather than another of these processing steps. In the present study, we investigated three different stages of emotional speech processing (emotional salience detection, meaning-related processing, and identification) in the same patient group to verify whether lesions to the BG affect these stages in qualitatively different ways. Specifically, we explored early implicit emotional speech processing (probe verification) in an ERP experiment, followed by an explicit behavioral emotion recognition task. In both experiments, participants listened to emotional sentences expressing one of four emotions (anger, fear, disgust, happiness) or to neutral sentences. In line with previous evidence, both patients and healthy controls showed differentiation of emotional and neutral sentences in the P200 component (emotional salience detection) and in a following negative-going brain wave (meaning-related processing). However, the behavioral recognition (identification stage) of emotional sentences was impaired in BG patients, but not in healthy controls. The current data provide further support that the BG are involved in late, explicit rather than early emotional speech processing stages.
L2 syntactic processing has been investigated primarily in the context of syntactic anomaly detection, but only sparsely with syntactic ambiguity. In the field of event-related potentials (ERPs), both syntactic anomaly detection and syntactic ambiguity resolution are linked to the P600. The current ERP experiment examined L2 syntactic processing in highly proficient L1 Spanish-L2 English readers who had acquired English informally around the age of 5 years. Temporary syntactic ambiguity (induced by verb subcategorization information) was tested as a language-specific phenomenon of the L2, while syntactic anomaly resulted from phrase structure constraints that are similar in the L1 and L2. Participants judged whether a sentence was syntactically acceptable or not. Native readers of English showed a P600 in the temporarily syntactically ambiguous and syntactically anomalous sentences. A comparable picture emerged in the non-native readers of English. Both critical syntactic conditions elicited a P600; however, the distribution and latency of the P600 varied in the syntactic anomaly condition. The results clearly show that early acquisition of L2 syntactic knowledge leads to comparable online sensitivity to temporary syntactic ambiguity and syntactic anomaly in early and highly proficient non-native readers of English and in native readers of English.
ERPs; P600; L2 syntactic processing; Syntactic ambiguity; Syntactic anomaly
Studies have found that neural activity is greater for irregular grammatical items than for regular items. Findings with monolingual Spanish speakers have revealed a similar effect when making gender decisions for visually presented nouns. The current study extended previous work by examining the role of regularity in modulating differences between groups that differ in the age of acquisition of a language. Early and late learners of Spanish, matched on measures of language proficiency, were asked to make gender decisions for regular items (-o for masculine and -a for feminine) and irregular items (which can end in e, l, n, r, s, t, and z). Results revealed increased activity in left BA 44 for irregular compared to regular items in separate comparisons for both early and late learners. In addition, within-group comparisons revealed that neural activity for irregulars extended into left BA 47 for late learners and into left BA 6 for early learners. Direct between-group comparisons revealed increased activity in left BA 44/45 for irregular items, indicating the need for more extensive syntactic processing in late learners. The results revealed that processing of irregular grammatical gender leads to increased activity in left BA 44 and adjacent areas of the left IFG regardless of when a language is learned. Furthermore, these findings suggest differential recruitment of brain areas associated with grammatical processing in late learners. The results are discussed with regard to a model that considers L2 learning as emerging from the competitive interplay between two languages.