Behavioral and neurophysiological transfer effects from music experience to language processing are well established, but it is currently unclear whether linguistic expertise (e.g., speaking a tone language) benefits music-related processing and perception. Here, we compare brainstem responses elicited by tuned and detuned musical chords in English-speaking musicians and non-musicians and in native speakers of Mandarin Chinese, to determine whether enhancements in subcortical processing translate into improvements in the perceptual discrimination of musical pitch. Relative to non-musicians, both musicians and Chinese had stronger brainstem representation of the defining pitches of musical sequences. In contrast, two behavioral pitch discrimination tasks revealed that neither Chinese nor non-musicians were able to discriminate subtle changes in musical pitch with the same accuracy as musicians. Pooled across all listeners, brainstem magnitudes predicted behavioral pitch discrimination performance, but when each group was considered individually, only musicians showed connections between neural and behavioral measures. No brain-behavior correlations were found for tone language speakers or non-musicians. These findings point to a dissociation between subcortical neurophysiological processing and behavioral measures of pitch perception in Chinese listeners. We infer that sensory-level enhancement of musical pitch information yields cognitive-level perceptual benefits only when that information is behaviorally relevant to the listener.
Pitch is a ubiquitous parameter of human communication that carries important information in both music and language (Plack, Oxenham, Fay, & Popper, 2005). In music, pitches are selected from fixed hierarchical scales, and it is largely the relative relationships between these scale tones that contribute to the sense of musical key and harmony in tonal music (Krumhansl, 1990). Tone languages provide a unique opportunity for investigating linguistic uses of pitch, as these languages exploit variations in pitch at the syllable level to contrast word meaning (Yip, 2002). However, unlike music, pitch in language has no hierarchical (i.e., scalar) framework; there is no “in-” or “out-of-tune”. Consequently, the perception of linguistic pitch patterns depends largely on cues related to the specific trajectory (i.e., contour) of pitch movement within a syllable (Gandour, 1983, 1984) rather than on the distances between consecutive pitches, as it does in music (Dowling, 1978).
It is important to emphasize that linguistic pitch patterns differ substantially from those used in music: lexical tones are continuous and curvilinear (Gandour, 1994; Xu, 2006), whereas in music, pitches unfold in a discrete, stair-stepped manner (Burns, 1999, p. 217; Dowling, 1978). Moreover, specific training or long-term exposure in one domain entrains a listener to use the pitch cues associated with that domain. Neurophysiological evidence from cortical brain potentials suggests, for instance, that musicians exploit interval-based pitch cues (Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004; Krohn, Brattico, Valimaki, & Tervaniemi, 2007) while tone language speakers exploit contour-based cues (Chandrasekaran, Gandour, & Krishnan, 2007). Such “cue weighting” is consistent with each group’s unique listening experience and with the relative importance of these dimensions to music (Burns & Ward, 1978) and lexical tone perception (Gandour, 1983), respectively. Given these considerable differences, it is unclear a priori whether pitch experience in one domain would provide benefits in the other: that is, whether musical training could enhance language-related pitch processing (music-to-language transfer) or, conversely, whether tone language experience could benefit music-related processing (language-to-music transfer).
Yet a rapidly growing body of evidence suggests that the brain mechanisms governing music and language processing interact (Bidelman, Gandour, & Krishnan, 2011; Koelsch et al., 2002; Maess, Koelsch, Gunter, & Friederici, 2001; Patel, 2008; Slevc, Rosenberg, & Patel, 2009). Such cross-domain transfer effects have now been extensively reported in the direction from music to language: musicians demonstrate perceptual enhancements in a myriad of language-specific abilities, including phonological processing (Anvari, Trainor, Woodside, & Levy, 2002), verbal memory (Chan, Ho, & Cheung, 1998; Franklin et al., 2008), formant and voice pitch discrimination (Bidelman & Krishnan, 2010), sensitivity to prosodic cues (Thompson, Schellenberg, & Husain, 2004), degraded speech perception (Bidelman & Krishnan, 2010; Parbery-Clark, Skoe, Lam, & Kraus, 2009), second language proficiency (Slevc & Miyake, 2006), and lexical tone identification (Delogu, Lampis, & Olivetti Belardinelli, 2006, 2010; Lee & Hung, 2008). These perceptual advantages are corroborated by electrophysiological evidence demonstrating that both cortical (Chandrasekaran, Krishnan, & Gandour, 2009; Marie, Magne, & Besson, 2010; Moreno & Besson, 2005; Pantev, Roberts, Schulz, Engelien, & Ross, 2001; Schon, Magne, & Besson, 2004) and even subcortical (Bidelman, Gandour, et al., 2011; Musacchia, Strait, & Kraus, 2008; Wong, Skoe, Russo, Dees, & Kraus, 2007) brain circuitry tuned by long-term music experience facilitates the encoding of speech-related signals. However, whether the reverse effect exists, that is, whether language experience can positively influence music processing and perception, remains an unresolved question (e.g., Schellenberg & Peretz, 2008; Schellenberg & Trehub, 2008).
Studies investigating transfer effects in the direction of language to music are scarce. Almost all have focused on putative connections between tone language experience and absolute pitch (AP) (e.g., Deutsch, Henthorn, Marvin, & Xu, 2006; Lee & Lee, 2010), i.e., the rare ability to label musical notes without any external reference. It has been noted, however, that AP is largely irrelevant to most music listening (Levitin & Rogers, 2005, p. 26). Musical tasks predominantly involve monitoring the relative relationships between pitches (i.e., intervallic distances) rather than labeling absolute pitch values in isolation. Among language-to-music studies that have focused on intervallic aspects of music, a behavioral study by Pfordresher and Brown (2009) indicated that tone language speakers were better able than English-speaking non-musicians to discriminate the size (cf. height) of two-tone musical intervals. In an auditory electrophysiological study, Bidelman, Gandour, et al. (2011) found that brainstem responses evoked by musical intervals were both more accurate and more robust in native speakers of Mandarin Chinese than in English-speaking non-musicians, suggesting that long-term experience with linguistic pitch may transfer to the subcortical encoding of musical pitch. Together, these studies demonstrate a potential for linguistic pitch experience to carry over into nonlinguistic (i.e., musical) domains (Bidelman, Gandour, et al., 2011, p. 430; Pfordresher & Brown, 2009, p. 1385). To date, however, no study has concurrently investigated the relationship between tone language listeners’ neural processing and their behavioral perception of more complex musical stimuli (e.g., chords or arpeggios).
The aim of the present work is to determine the effect of domain-specific pitch experience on the neural processing and perception of musical stimuli. Specifically, we ask whether the previously observed superiority in Chinese listeners’ subcortical representation of musical pitch (e.g., Bidelman, Gandour, et al., 2011) provides any benefit in music perception (e.g., Pfordresher & Brown, 2009). We bridge the gap between neurophysiological and behavioral studies of language-to-music transfer by examining brainstem responses in conjunction with perceptual discrimination of the same complex musical stimuli. We have previously shown that musicians have more robust brainstem representation of the defining features of tuned (i.e., major and minor) and detuned chordal arpeggios, as well as a superior behavioral ability to detect whether these chords are in or out of tune (Bidelman, Krishnan, & Gandour, 2011). By comparing brainstem responses and perceptual discrimination of such chords in Chinese non-musicians versus English-speaking musicians and non-musicians, we are able to assess not only the extent to which linguistic pitch experience enhances preattentive, subcortical encoding of musical chords, but also whether such language-dependent neural enhancements confer any behavioral advantage in music perception.
Eleven English-speaking musicians (M: 7 male, 4 female), 11 English-speaking non-musicians (NM: 6 male, 5 female), and 11 native speakers of Mandarin Chinese (C: 5 male, 6 female) were recruited from the Purdue University student body to participate in the experiment. All participants exhibited normal hearing sensitivity at audiometric frequencies between 500 and 4000 Hz and reported no history of neurological or psychiatric illness. The groups were closely matched in age (M: 22.6 ± 2.2 yrs, NM: 22.8 ± 3.4 yrs, C: 23.7 ± 4.1 yrs) and years of formal education (M: 17.1 ± 1.8 yrs, NM: 16.6 ± 2.6 yrs, C: 17.2 ± 2.8 yrs), and all participants were strongly right-handed (laterality index > 73%) as measured by the Edinburgh Handedness Inventory (Oldfield, 1971). Musicians were amateur instrumentalists with ≥ 10 years of continuous instruction on their principal instrument (mean ± SD: 12.4 ± 1.8 yrs), beginning at or before the age of 11 (8.7 ± 1.4 yrs) (Table 1). All currently played their instrument(s). Non-musicians had ≤ 1 year of formal music training (0.5 ± 0.5 yrs) on any combination of instruments. No English-speaking participant had any prior experience with a tone language. Chinese participants were born and raised in mainland China. All Chinese participants were classified as late-onset Mandarin/English bilinguals with moderate proficiency in English, as determined by a language history questionnaire (Li, Sepanski, & Zhao, 2006). On average, they used their native language in the majority (67%) of their combined daily activities. None had received formal instruction in English before the age of 9 (age of onset: 12.5 ± 2.7 yrs), and none had more than 3 years of musical training (0.8 ± 1.1 yrs). Each participant was paid for their time and gave informed consent in compliance with a protocol approved by the Institutional Review Board of Purdue University.
Four triad arpeggios (i.e., three-note chords played sequentially) were constructed that differed only in their chordal third (Fig. 1; Supplemental material). Two sequences were exemplary arpeggios of Western music practice (major and minor); the other two were detuned versions of these chords (detuned up, +; detuned down, −) whose third was slightly sharp of the major third or slightly flat of the minor third, respectively. Individual notes were synthesized as tone complexes of 6 harmonics (sine phase), each 100 ms in duration (including a 5 ms rise-fall time). For each sequence, the three notes were concatenated to create a contiguous chordal arpeggio 300 ms in duration (Fig. 1A). The fundamental frequencies (F0s) of the three notes (i.e., chordal root, third, and fifth) of each triad were as follows (Fig. 1B): major = 220, 277, 330 Hz; minor = 220, 262, 330 Hz; detuned up = 220, 287, 330 Hz; detuned down = 220, 252, 330 Hz. In the detuned arpeggios, the mistuned third represents a +4% or −4% difference in F0 from the actual major or minor third, respectively (cf. a full musical semitone, which represents a ~6% difference in frequency). Thus, because the F0s of the first (root) and last (fifth) notes were identical across stimuli (220 and 330 Hz, respectively), the chords differed only in the F0 of their 2nd note (the third).
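The stimulus construction described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' synthesis code: the 48 kHz sampling rate, equal-amplitude harmonics, raised-cosine ramp shape, and per-note peak normalization are all assumptions, since the text specifies only the number of harmonics, sine phase, note duration, ramp time, and note F0s. The function names are hypothetical.

```python
import numpy as np

FS = 48_000        # sampling rate in Hz (assumed; not reported in the text)
NOTE_DUR = 0.100   # 100 ms per note
RAMP_DUR = 0.005   # 5 ms rise/fall
N_HARMONICS = 6    # harmonics per tone complex

# F0s (Hz) of root, third, and fifth for each triad, as listed above
TRIADS = {
    "major": (220, 277, 330),
    "minor": (220, 262, 330),
    "detuned_up": (220, 287, 330),
    "detuned_down": (220, 252, 330),
}


def tone_complex(f0, fs=FS, dur=NOTE_DUR, ramp=RAMP_DUR, n_harm=N_HARMONICS):
    """Sine-phase harmonic complex (equal amplitudes assumed) with raised-cosine ramps."""
    t = np.arange(int(round(dur * fs))) / fs
    k = np.arange(1, n_harm + 1)[:, None]           # harmonic numbers as a column
    x = np.sin(2 * np.pi * k * f0 * t).sum(axis=0)  # sum harmonics 1..n_harm
    n_ramp = int(round(ramp * fs))
    ramp_up = 0.5 * (1 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    env = np.ones_like(t)
    env[:n_ramp] = ramp_up
    env[-n_ramp:] = ramp_up[::-1]
    x = x * env
    return x / np.max(np.abs(x))  # peak-normalize each note


def arpeggio(name, fs=FS):
    """Concatenate root, third, and fifth into one contiguous 300 ms arpeggio."""
    return np.concatenate([tone_complex(f0, fs) for f0 in TRIADS[name]])
```

For example, `arpeggio("detuned_up")` returns a 300 ms array (14,400 samples at the assumed rate) whose middle 100 ms carries the 287 Hz third.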
As a window into the early stages of subcortical pitch processing, we used the scalp-recorded frequency-following response (FFR). The FFR reflects sustained phase-locked activity from a population of neural elements within the rostral midbrain (for review, see Krishnan, 2007) and provides a robust index of the brainstem’s transcription of speech (for reviews, see Johnson, Nicol, & Kraus, 2005; Krishnan & Gandour, 2009; Skoe & Kraus, 2010) and of musically relevant features (Bidelman, Gandour, et al., 2011; Bidelman & Krishnan, 2009, 2011; Bidelman, Krishnan, et al., 2011) of the acoustic signal. The FFR recording protocol was similar to that used in previous reports from our laboratory (e.g., Bidelman & Krishnan, 2009, 2010). Participants reclined comfortably in an acoustically and electrically shielded booth. They were instructed to relax, to refrain from extraneous body movement (to minimize myogenic artifacts), and to ignore the sounds they heard, and they were allowed to sleep during the FFR recording. FFRs were elicited from each participant by monaural stimulation of the right ear at 80 dB SPL through a magnetically shielded insert earphone (ER-3A; Etymotic Research, Elk Grove Village, IL, USA). Each stimulus was presented in rarefaction polarity at a repetition rate of 2.44/s. Presentation order was randomized both within and across participants. The experimental protocol was controlled by a signal generation and data acquisition system (Intelligent Hearing Systems; Miami, FL, USA).
FFRs were obtained using a vertical electrode montage, which provides the optimal configuration for recording brainstem activity (Galbraith et al., 2000). Ag-AgCl scalp electrodes placed on the midline of the forehead at the hairline (~Fz; non-inverting, active) and on the right mastoid (A2; inverting, reference) served as the inputs to a differential amplifier. Another electrode placed on the mid-forehead (Fpz) served as the common ground. The raw EEG was amplified by a factor of 200,000 and filtered online between 30 and 5000 Hz. All inter-electrode impedances were maintained at ≤ 1 kΩ. Individual sweeps were recorded using an analysis window of 320 ms at a sampling rate of 10 kHz. Sweeps containing activity exceeding ± 35 μV were rejected as artifacts and excluded from the final average. FFR waveforms were further band-pass filtered offline from 100 to 2500 Hz (−6 dB/octave roll-off) to minimize low-frequency physiologic noise and limit the inclusion of cortical activity (Musacchia et al., 2008). In total, each FFR waveform represents the average of 3000 artifact-free stimulus presentations.
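The sweep-rejection and averaging steps can be illustrated with a short NumPy sketch. It assumes sweeps are already digitized in microvolts as a (sweeps × samples) array and that a sweep is rejected if any sample exceeds ±35 μV in absolute value; the offline 100–2500 Hz filter is omitted, and the function name `average_sweeps` is hypothetical.

```python
import numpy as np

FS = 10_000                          # sampling rate (Hz), as stated above
WINDOW_MS = 320                      # analysis window per sweep (ms)
N_SAMPLES = FS * WINDOW_MS // 1000   # 3200 samples per sweep
REJECT_UV = 35.0                     # artifact criterion (± µV)


def average_sweeps(sweeps_uv, reject_uv=REJECT_UV, target_n=None):
    """Drop sweeps with any |sample| > reject_uv, then average the rest.

    sweeps_uv: array of shape (n_sweeps, n_samples), in microvolts.
    target_n: if given, average exactly the first target_n accepted sweeps.
    Returns (mean waveform, number of sweeps averaged).
    """
    sweeps_uv = np.asarray(sweeps_uv, dtype=float)
    accepted = sweeps_uv[np.max(np.abs(sweeps_uv), axis=1) <= reject_uv]
    if target_n is not None:
        if accepted.shape[0] < target_n:
            raise ValueError("not enough artifact-free sweeps")
        accepted = accepted[:target_n]
    return accepted.mean(axis=0), accepted.shape[0]
```

Passing `target_n=3000` mimics the description above, in which each FFR is the average of exactly 3000 artifact-free presentations.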
FFR encoding of information relevant to the pitch of the chord-defining note (i.e., the third) was quantified by measuring the magnitude of the F0 component in the corresponding portion of the response waveform for each melodic triad. FFRs were segmented into three 100 ms sections (15–115 ms; 115–215 ms; 215–315 ms) corresponding to the sustained portions of the response to each musical note. The spectrum of the second segment, corresponding to the response to the chord’s third, was computed by taking the Fast Fourier Transform (FFT) of a time-windowed version of its temporal waveform (Gaussian window). For each subject, the magnitude of F0 was measured as the peak in the FFT, relative to the noise floor, that fell within the frequency range expected from the input stimulus (note 2: 245–300 Hz; see stimulus F0 tracks, Fig. 1B). All FFR data analyses were performed using custom routines coded in MATLAB® 7.10 (The MathWorks, Inc., Natick, MA, USA).
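The spectral measurement can be sketched as follows (in Python rather than the MATLAB routines used in the study). Several details are assumptions rather than reported parameters: the Gaussian window width, the zero-padding length, and the definition of the noise floor, taken here as the median spectral magnitude across the 100–2500 Hz passband excluding the search band and subtracted from the peak. The function name `f0_magnitude` is hypothetical.

```python
import numpy as np

FS = 10_000                                        # FFR sampling rate (Hz)
SEGMENTS_MS = [(15, 115), (115, 215), (215, 315)]  # responses to notes 1-3
THIRD_BAND_HZ = (245, 300)                         # expected F0 range of note 2


def f0_magnitude(ffr, fs=FS, seg_ms=SEGMENTS_MS[1], band=THIRD_BAND_HZ,
                 nfft=2 ** 14, sigma_frac=0.25):
    """Return (peak frequency, peak magnitude minus noise floor) for one FFR segment.

    Assumed details: Gaussian window with SD = sigma_frac * segment length,
    zero-padding to nfft points, and a noise floor defined as the median
    magnitude in 100-2500 Hz outside the search band.
    """
    start = int(round(seg_ms[0] * fs / 1000))
    stop = int(round(seg_ms[1] * fs / 1000))
    seg = np.asarray(ffr[start:stop], dtype=float)
    seg = seg - seg.mean()                      # remove DC offset
    n = seg.size
    centered = np.arange(n) - (n - 1) / 2
    window = np.exp(-0.5 * (centered / (sigma_frac * n)) ** 2)
    # Scale so a sinusoid of amplitude A yields a peak of about A / 2
    spec = np.abs(np.fft.rfft(seg * window, nfft)) / window.sum()
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    outside = (freqs >= 100) & (freqs <= 2500) & ~in_band
    noise_floor = np.median(spec[outside])
    peak = np.flatnonzero(in_band)[np.argmax(spec[in_band])]
    return freqs[peak], spec[peak] - noise_floor
```

Applied to a synthetic 320 ms waveform containing a 277 Hz sinusoid of unit amplitude, `f0_magnitude` should return a peak near 277 Hz with a magnitude close to 0.5.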
Behavioral fundamental frequency difference limens (F0 DLs) were measured for each participant in a two-alternative forced choice (2AFC) discrimination task (e.g., Bidelman & Krishnan, 2010; Krishnan, Bidelman, & Gandour, 2010). On each trial, participants heard two sequential intervals, one containing a reference pitch (i.e., a tone complex) and one containing a comparison, in random order. The reference had a fixed F0 of 270 Hz (approximately the average F0 of all chordal stimuli; see Fig. 1); the comparison always had a higher F0. The participant’s task was to identify the interval containing the higher-sounding pitch. Following a brief training run, discrimination thresholds were measured using a two-down, one-up adaptive paradigm, which tracks the 71% correct point on the psychometric function (Levitt, 1971): the frequency difference of the comparison was decreased after two consecutive correct responses and increased after a single incorrect response. Frequency difference between reference and comparison intervals was varied using a geometric step size of between response reversals. Sixteen reversals were measured, and the geometric mean of the last 12 was taken as the individual’s F0 DL, that is, the minimum frequency difference needed to detect a change in pitch.
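The two-down, one-up track can be sketched in plain Python. Because the step size is missing from the text above, the geometric step factor of 1.5 used here is purely hypothetical, as are the starting difference of 20 Hz, the trial cap, and the function name `run_staircase`. Reversals are logged at the frequency difference where the track changes direction, and the DL is the geometric mean of the last 12 of 16 reversals, as described.

```python
import math
from statistics import geometric_mean  # Python 3.8+


def run_staircase(respond, start_delta=20.0, step_factor=1.5,
                  n_reversals=16, n_average=12, max_trials=1000):
    """Two-down, one-up adaptive track on the F0 difference (Hz).

    respond(delta) -> True if the listener picks the higher-pitched interval.
    step_factor is a hypothetical geometric step (not given in the text).
    Returns the geometric mean of the last n_average reversal points.
    """
    delta = start_delta
    correct_run = 0
    direction = None          # "down" or "up" for the most recent change
    reversals = []
    for _ in range(max_trials):
        if respond(delta):
            correct_run += 1
            if correct_run < 2:
                continue      # one correct response: keep the same difference
            correct_run = 0
            new_direction = "down"
        else:
            correct_run = 0
            new_direction = "up"
        if direction is not None and new_direction != direction:
            reversals.append(delta)
            if len(reversals) == n_reversals:
                break
        direction = new_direction
        delta = delta / step_factor if new_direction == "down" else delta * step_factor
    else:
        raise RuntimeError("staircase did not reach the requested number of reversals")
    return geometric_mean(reversals[-n_average:])
```

With a hypothetical deterministic listener who answers correctly whenever the difference is at least 5 Hz, `run_staircase(lambda d: d >= 5.0)` oscillates between the two step levels bracketing 5 Hz and returns their geometric mean (about 4.84 Hz). Real 2AFC listeners guess at 50% below threshold, so their tracks are noisier.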
Five participants from each group took part in a second pitch discrimination task, designed to mimic the FFR protocol more closely and to determine whether groups differed in their ability to detect chordal detuning at a perceptual level. Discrimination sensitivity was measured separately for the three most meaningful stimulus pairings (major/minor, major/detuned up, minor/detuned down) using a same-different task (Bidelman, Krishnan, et al., 2011). For each of these three conditions, participants heard 100 pairs of chordal arpeggios presented with an interstimulus interval of 500 ms. Half of these trials contained chords with different thirds (e.g., major/detuned up) and half were catch trials containing the same chord (e.g., major/major), in random order. After hearing each pair, participants judged whether the two chord sequences were the “same” or “different” by pressing a button on the computer. The numbers of hits and false alarms were recorded per condition. Hits were defined as “different” responses to a pair of physically different stimuli and false alarms as “different” responses to a pair in which the items were actually identical. All stimuli were presented at ~75 dB SPL through circumaural headphones (Sennheiser HD 580; Sennheiser Electronic Corp., Old Lyme, CT, USA). Stimulus presentation and response collection for both behavioral tasks were implemented in custom interfaces coded in MATLAB.
A two-way, mixed-model ANOVA (SAS®; SAS Institute, Inc., Cary, NC, USA) was conducted on F0 magnitudes derived from FFRs in order to evaluate the effects of pitch experience (i.e., musical, linguistic, none) and stimulus context (i.e., prototypical vs. non-prototypical sequence) on brainstem encoding of musical pitch. Group (3 levels; musicians, Chinese, non-musicians) functioned as the between-subjects factor and stimulus (4 levels; major, minor, detuned up, detuned down) as the within-subjects factor. An a priori level of significance was set at α = 0.05. All multiple pairwise comparisons were adjusted with Bonferroni corrections. Where appropriate, partial eta-squared (η²p) values are reported to indicate effect sizes.
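Because partial eta-squared can be recovered from an F ratio and its degrees of freedom, η²p = (F × df_effect) / (F × df_effect + df_error), the effect sizes reported in the Results can be checked directly against the reported F statistics. The short Python sketch below does this; the function name is illustrative, and the tabulated values are copied from the Results.

```python
def partial_eta_squared(f_ratio, df_effect, df_error):
    """Partial eta-squared recovered from an F ratio and its degrees of freedom."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


# (effect, F, df_effect, df_error, reported partial eta-squared) from the Results
REPORTED = [
    ("FFR: group", 14.01, 2, 30, 0.48),
    ("FFR: stimulus", 8.30, 3, 90, 0.22),
    ("FFR: group x stimulus", 2.89, 6, 90, 0.16),
    ("F0 DL: group", 19.90, 2, 30, 0.57),
    ("d': group", 13.90, 2, 12, 0.70),
    ("d': stimulus pair", 16.21, 2, 24, 0.57),
    ("d': group x stimulus pair", 3.92, 4, 24, 0.40),
]

for effect, f, d1, d2, reported in REPORTED:
    recomputed = partial_eta_squared(f, d1, d2)
    print(f"{effect:28s} recomputed {recomputed:.2f}  reported {reported:.2f}")
```

Each recomputed value agrees with the reported one to two decimal places.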
Behavioral discrimination sensitivity scores (d′) were computed from hit (H) and false alarm (FA) rates as d′ = z(H) − z(FA), where z(·) converts a rate to a z-score via the inverse of the standard normal cumulative distribution. Based on initial diagnostics and the Box-Cox procedure (Box & Cox, 1964), both d′ and F0 DL scores were log-transformed to better satisfy the normality and homogeneity-of-variance assumptions of a parametric ANOVA. Log-transformed d′ scores were then submitted to a two-way mixed-model ANOVA with group as the between-subjects factor and stimulus pair (3 levels; major/detuned up, minor/detuned down, major/minor) as the within-subjects factor. Transformed F0 DLs were analyzed using a similar one-way model with group as the between-subjects factor of interest.
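A minimal sketch of the d′ computation, using only the Python standard library, follows. It applies the formula stated above to hit and false-alarm counts. The handling of rates of exactly 0 or 1 (clipped here to 1/(2N) and 1 − 1/(2N)) is an assumption, since the text does not say how such rates were treated, and the function name is illustrative.

```python
from statistics import NormalDist

_Z = NormalDist().inv_cdf  # rate -> z-score (inverse standard normal CDF)


def d_prime(hits, n_different, false_alarms, n_same):
    """d' = z(H) - z(FA); rates of 0 or 1 are clipped by the 1/(2N) rule (assumed)."""
    def rate(count, n):
        r = count / n
        return min(max(r, 1 / (2 * n)), 1 - 1 / (2 * n))
    return _Z(rate(hits, n_different)) - _Z(rate(false_alarms, n_same))
```

Assuming a 50/50 split of the 100 pairs per condition, 42 hits on 50 "different" trials and 25 false alarms on 50 "same" trials give d′ ≈ 0.99, just below the d′ = 1 threshold used in the Results.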
FFR time-waveforms in response to the major chord are shown for the three groups in Fig. 2; time-waveforms in response to the other three stimuli were similar. For all groups, clear onset components (i.e., large negative deflections) appear at the three time marks corresponding to the onset of each note (note 1: ~17 ms, note 2: ~117 ms, note 3: ~217 ms). Relative to non-musicians, musician and Chinese FFRs have larger amplitudes during the chordal third (2nd note, ~110–210 ms), the defining pitch of the sequence; the reduced amplitude of non-musician responses in this time window indicates poorer representation of this chord-defining pitch (see also Fig. 3A).
FFR encoding of F0 for the thirds of the standard and detuned chordal arpeggios is shown in Fig. 3. Individual panels show the meaningful comparisons contrasting a prototypical musical chord with its detuned counterpart: panel A, major vs. detuned up; panel B, minor vs. detuned down. An omnibus ANOVA on F0 encoding revealed significant main effects of group [F(2, 30) = 14.01, p < 0.001, η²p = 0.48] and stimulus [F(3, 90) = 8.30, p = 0.0001, η²p = 0.22], as well as a group × stimulus interaction [F(6, 90) = 2.89, p = 0.0126, η²p = 0.16].
A priori contrasts revealed that, regardless of the eliciting arpeggio, musicians’ brainstem responses contained a larger F0 magnitude than those of non-musicians. Chinese, on the other hand, had larger F0 magnitudes than non-musicians only for the detuned stimuli. No differences were found between musicians and Chinese for any of the four triads. These group comparisons ([M = C] > NM, which holds fully for the detuned stimuli) indicate that either musical or linguistic pitch experience can benefit brainstem mechanisms implicated in music processing.
By group, F0 magnitude did not differ across triads for either musicians or Chinese, indicating that encoding in both cohorts was equally robust whether the chordal third was major or minor, in or out of tune, irrespective of the listener’s domain of pitch expertise. For non-musicians, F0 encoding was identical for the major and minor chords, two of the most regularly occurring sequences in music (Budge, 1943). However, it was significantly reduced for the detuned up and detuned down sequences in comparison to the major and minor chords, respectively. Together, these results indicate that brainstem encoding of musical pitch is impervious to changes in chordal temperament in musicians and Chinese but is diminished by chordal detuning in non-musicians.
Mean F0 DLs are shown per group in Fig. 4A. An ANOVA revealed a significant main effect of group on F0 DLs [F(2, 30) = 19.90, p < 0.0001, η²p = 0.57]. Post hoc multiple comparisons revealed that musicians obtained significantly better (i.e., smaller) DLs than musically naïve subjects (M < [C = NM]), meaning that musicians were better able to detect minute changes in absolute pitch. On average, musicians’ DLs were approximately 3–3.5 times smaller than those of the musically untrained subjects (C and NM).
Group chordal discrimination sensitivity, as measured by d′, is shown in Fig. 4B. Values index listeners’ ability to discriminate melodic triads in which only the third of the chord differed between the two stimuli of a pair. By convention, d′ = 1 (dashed line) represents threshold performance and d′ = 0 represents chance performance. An ANOVA on d′ scores revealed significant main effects of group [F(2, 12) = 13.90, p = 0.0008, η²p = 0.70] and stimulus pair [F(2, 24) = 16.21, p < 0.0001, η²p = 0.57], as well as a group × stimulus pair interaction [F(4, 24) = 3.92, p = 0.0137, η²p = 0.40]. By group, a priori contrasts revealed that musicians’ ability to discriminate melodic triads was well above threshold regardless of stimulus pair, with no significant difference between standard (major/minor) and detuned (major/up, minor/down) stimulus pairs. Chinese and non-musicians, on the other hand, achieved above-threshold performance only when discriminating the major/minor pair; they could not accurately distinguish the detuned stimulus pairs (major/up, minor/down). These results indicate that only musicians reliably perceive minute changes in musical pitch that are otherwise undetectable by musically untrained listeners (C, NM).
By condition, musicians’ ability to discriminate melodic triads was superior to that of non-musicians across all stimulus pairs. Importantly, musicians were more accurate than Chinese listeners in discriminating the detuned stimulus pairs (major/up, minor/down) but did not differ from them in major/minor discrimination. No significant differences in chordal discrimination were observed between Chinese and non-musicians.
To investigate brain-behavior connections (or the lack thereof), we regressed neural (FFR) and behavioral (F0 DL and chordal discrimination) measures against one another to determine the extent to which subcortical responses to detuned musical chords could predict perceptual performance in both behavioral tasks. Note that only n = 5 participants per group took part in the chordal discrimination task. Pearson’s correlations (r) were computed between brainstem F0 magnitudes elicited in each of the four stimulus conditions and behavioral F0 DLs. Pooled across all listeners, only brainstem responses in the detuned up condition showed close correspondence with F0 DL performance (all listeners: r = 0.57, p < 0.001). By group, only musicians showed a connection between neural and perceptual measures (musicians: r = 0.70, p = 0.01); neither Chinese (r = 0.19, p = 0.57) nor non-musicians (r = 0.34, p = 0.30) exhibited a brain-behavior correlation for this same condition (Fig. 5A). A significant association between these measures in musicians suggests that better pitch discrimination performance (i.e., smaller F0 DLs) is, at least in part, predicted by more robust FFR encoding. The fact that an FFR/F0 DL correlation is found only in musicians suggests that this brain-behavior connection is specific to musical training rather than a result of pitch experience per se (e.g., no correlation is found in the Chinese group).
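The significance of a Pearson correlation depends only on r and the number of pairs n, through t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The sketch below, which assumes SciPy is available and uses an illustrative helper name, converts an r and n into a two-tailed p-value, for example to compare the per-group correlations (n = 11) with the pooled one (n = 33).

```python
import math

from scipy import stats


def pearson_p_value(r, n):
    """Two-tailed p-value for a Pearson correlation r computed from n pairs."""
    df = n - 2
    t_stat = r * math.sqrt(df) / math.sqrt(1.0 - r ** 2)
    return 2.0 * stats.t.sf(abs(t_stat), df)
```

For example, `pearson_p_value(0.19, 11)` yields a p-value close to the one reported for the Chinese group, illustrating why a modest r is not significant with only 11 listeners.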
Pooled across groups, brainstem F0 encoding in the detuned up condition also predicted listeners’ performance in discriminating the major chord from its detuned version, as measured by d′ (r = 0.76, p = 0.001; Fig. 5B). A similar correspondence (r = 0.60, p = 0.018) was observed between F0 encoding in the detuned down condition and the corresponding behavioral condition (minor/down; data not shown). In this neural-perceptual space, musicians appear maximally separated from musically untrained listeners (Chinese and non-musicians): they have more robust neurophysiological encoding of musical pitch and maintain higher fidelity (i.e., larger d′) in perceiving minute deviations in chordal arpeggios. The correspondence across listeners between FFR encoding of out-of-tune chords and behavioral detection of such detuning suggests that an individual’s ability to distinguish pitch deviations in musical sequences is, in part, predicted by how well such auditory features are encoded at a subcortical level.
By measuring brainstem responses to prototypical and detuned musical chord sequences, we found that listeners with extensive pitch experience (musicians and tone language speakers) have enhanced subcortical representation of musically relevant pitch compared to inexperienced listeners (English-speaking non-musicians). Yet, despite the relatively strong brainstem encoding in the Chinese group (a neurophysiological language-to-music transfer effect), behavioral measures of pitch discrimination and chordal detuning sensitivity reveal that this neural enhancement does not necessarily translate into perceptual benefits as it does in musicians. Indeed, tone language speakers performed no better than (i.e., as poorly as) English-speaking non-musicians in discriminating musical pitch stimuli.
Our findings provide further evidence for plasticity induced by long-term experience with pitch. Both musicians and Chinese exhibit stronger pitch encoding of three-note chords than individuals who are musically untrained and lack any exposure to tone languages (non-musicians) (Figs. 2–3). Their FFRs show no appreciable reduction in the neural representation of pitch with parametric manipulation of the chordal third (i.e., major or minor, in or out of tune). These findings demonstrate that sustained brainstem activity is enhanced after long-term experience with pitch regardless of a listener’s specific domain of expertise (i.e., speaking Mandarin vs. playing an instrument). They also converge with previous studies demonstrating that subcortical pitch processing is not hardwired but, rather, is malleable and depends on an individual’s auditory history and/or training (Bidelman, Gandour, et al., 2011; Bidelman & Krishnan, 2010; Carcagno & Plack, 2011; Krishnan, Gandour, & Bidelman, 2010; Krishnan, Gandour, Bidelman, & Swaminathan, 2009; Musacchia et al., 2008; Parbery-Clark, Skoe, & Kraus, 2009).
It is possible that the brainstem response enhancements we observe in our “pitch experts” result from top-down influence. Extensive descending projections from the cortex to the inferior colliculus (IC), which make up the corticofugal pathway, are known to modulate and tune response properties in the IC (Suga, 2008), the presumed neural generator of the FFR in humans (Sohmer, Pratt, & Kinarti, 1977). Indeed, animal data indicate that corticofugal feedback strengthens IC response properties as a stimulus becomes behaviorally relevant to the animal through associative learning (Gao & Suga, 1998). Thus, it is plausible that during tone language or music acquisition, corticofugal influence acts to strengthen, and hence provide a response “gain” for, behaviorally relevant sounds (e.g., linguistic tones, musical melodies). Interestingly, this experience-dependent mechanism seems somewhat general, in that both tone language and musical experience lead to enhanced brainstem encoding of pitch, regardless of whether the signal is specific to music or language per se (present study; Bidelman, Gandour, et al., 2011).
In contrast to those of musicians and Chinese listeners, the brainstem responses of non-musicians are differentially affected by the musical context of the arpeggios. This context effect results in diminished magnitudes for detuned chords relative to their major/minor counterparts (Fig. 3). The more favorable encoding of prototypical musical sequences may reflect the fact that even non-musicians have extensive listening experience with the major and minor triads (e.g., Bowling, Gill, Choi, Prinz, & Purves, 2010), which are among the most commonly occurring chords in tonal music (Budge, 1943; Eberlein, 1994). Indeed, based on neural data alone, it would appear prima facie that non-musicians encode only veridical musical pitch relationships and “filter out” detunings. Yet, although their neural responses distinguish prototypical from non-prototypical musical arpeggios (Fig. 3), this differentiation appears to play no cognitive role: non-musicians show the poorest behavioral performance in pitch and chordal discrimination, especially when judging true musical chords against their detuned counterparts (Fig. 4B).
Despite their lack of experience with musical pitch patterns, Chinese non-musicians show superior encoding of chord-defining pitches relative to English-speaking non-musicians. Thus, the neurophysiological benefits afforded by tone language experience extend beyond language-specific stimuli. Music-to-language transfer effects are well documented, as evidenced by several studies demonstrating enhanced cortical and subcortical linguistic pitch processing in musically trained listeners (e.g., Besson, Schon, Moreno, Santos, & Magne, 2007; Bidelman, Gandour, et al., 2011; Bidelman & Krishnan, 2010; Marques, Moreno, Castro, & Besson, 2007; Moreno & Besson, 2005; Moreno et al., 2009; Parbery-Clark, Skoe, & Kraus, 2009; Schon et al., 2004; Wong et al., 2007). Our findings provide evidence for the reverse, language-to-music transfer effect, i.e., neurophysiological enhancement of musical pitch processing in native speakers of a tone language (cf. Bidelman, Gandour, et al., 2011; Marie, Kujala, & Besson, 2010). Thus, although the origin of one’s pitch expertise may be exclusively musical or linguistic in nature, years of active listening to complex pitch patterns, in and of themselves and regardless of domain, sharpen the neural mechanisms recruited for both musical and linguistic pitch processing.
In contrast to the neurophysiological findings, only musicians show superior behavioral performance in detecting deviations in pitch and chordal temperament. On average, musicians’ F0 DLs were approximately 3–3.5 times smaller (i.e., better) than those of the musically untrained groups (C and NM; Fig. 4A). These data are consistent with previous reports of superior pitch discrimination thresholds in musicians (Bidelman & Krishnan, 2010; Bidelman, Krishnan, et al., 2011; Kishon-Rabin, Amir, Vexler, & Zaltz, 2001; Micheyl, Delhommeau, Perrot, & Oxenham, 2006; Strait, Kraus, Parbery-Clark, & Ashley, 2010).
Though one might suppose that linguistic pitch experience would enhance a listener’s ability to detect deviations in absolute pitch (e.g., Deutsch et al., 2006), we observe no benefit of tone language experience on F0 DLs (Fig. 4A; C = NM). Our data converge with previous studies that have failed to demonstrate any advantage in absolute pitch sensitivity (i.e., improved JND) for tone language speakers (Bent, Bradlow, & Wright, 2006; Burns & Sampat, 1980; Pfordresher & Brown, 2009; Stagray & Downs, 1993). Prosodic information carried in language depends on pitch contour, direction, and height cues rather than on absolute variations in pitch per se (e.g., Gandour, 1983; for review, see Patel, 2008, pp. 46–47). As such, the inability of our Chinese listeners to exploit absolute pitch cues as well as musicians do may reflect either the irrelevance of such cues to language or, alternatively, a greater impact of musical training than of linguistic experience on an individual’s perceptual ability to discriminate pitch.
Similarly, in the chordal discrimination task, only musicians discriminate standard and detuned arpeggios well above threshold (Fig. 4B). This finding conflicts with a previous study showing that tone language speakers discriminate the size (i.e., height) of two-note musical intervals better than English-speaking non-musicians (Pfordresher & Brown, 2009). With the three-note musical chords used in the present study, non-musicians and Chinese listeners reliably discriminate major from minor chords, which differ by a full semitone. However, their discrimination fails to rise above threshold when the chords differ by less than a semitone (cf. C and NM on major/minor vs. major/up and minor/down, Fig. 4B). The apparent discrepancy with Pfordresher and Brown (2009) is likely attributable to differences in stimulus complexity and task demands. Whereas their two-note interval stimuli require a simple comparison of pitch height between sets of simultaneously sounding notes (harmonic discrimination), our three-note chordal arpeggios require the listener to monitor the entire triad over time, in addition to comparing the sizes of its constituent intervals (melodic discrimination). Thus, while tone language speakers may enjoy a perceptual advantage in simple pitch height (cf. intervallic distance) discrimination (Pfordresher & Brown, 2009), this benefit largely disappears when the listener must detect subtle changes (i.e., < 1 semitone) within a melodic sequence of pitches. This ability is, by contrast, a defining characteristic of musicians, as reflected by their ceiling performance in discriminating all chordal conditions.
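The contrast between the major/minor comparison (a full semitone) and sub-semitone detunings can be made concrete in cents, where one equal-tempered semitone equals 100 cents and the interval between frequencies f1 and f2 spans 1200·log2(f2/f1) cents. The minimal Python sketch below assumes equal temperament, a hypothetical 220 Hz root, and a hypothetical 50-cent upward shift of one chord tone; neither value is taken from the stimuli used in this study, and the helper names are illustrative only.

```python
import math


def cents(f1_hz: float, f2_hz: float) -> float:
    """Size of the interval from f1_hz up to f2_hz in cents (100 cents = 1 equal-tempered semitone)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)


def et_pitch(root_hz: float, semitones: float) -> float:
    """Frequency lying `semitones` equal-tempered semitones above root_hz."""
    return root_hz * 2.0 ** (semitones / 12.0)


ROOT_HZ = 220.0      # hypothetical root frequency, not the study's stimulus value
DETUNE_CENTS = 50.0  # hypothetical sub-semitone detuning, not the study's value

# Major triad = root, +4, +7 semitones; minor triad = root, +3, +7 semitones.
major_third = et_pitch(ROOT_HZ, 4)
minor_third = et_pitch(ROOT_HZ, 3)
# Major third raised by DETUNE_CENTS (converted to fractional semitones).
shifted_tone = et_pitch(ROOT_HZ, 4 + DETUNE_CENTS / 100.0)

# Major vs. minor: the chords differ by one full semitone in their third.
print(round(cents(minor_third, major_third), 6))   # 100.0
# Standard vs. detuned: the difference is smaller than a semitone.
print(round(cents(major_third, shifted_tone), 6))  # 50.0
```

Any detuning below 100 cents falls in the sub-semitone range that, per the behavioral data described above, only musicians discriminated above threshold.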
It may also be the case that the superiority of musicians over tone language speakers on both behavioral tasks is linked to differences in auditory working memory and attention, which are typically heightened in musically trained individuals (Pallesen et al., 2010; Parbery-Clark, Skoe, Lam, et al., 2009; Strait et al., 2010).
Our findings point to a dissociation between neural encoding of musical pitch and behavioral measures of pitch perception in tone language speakers (Fig. 5). Tone language experience may enhance brain mechanisms that subserve pitch processing in music as well as in language. These mechanisms may nevertheless support distinct representations of pitch (Pfordresher & Brown, 2009, p. 1396). We demonstrate here that, despite their enhanced brainstem representation of musical pitch (Fig. 3), tone language speakers do not gain the perceptual benefits that musicians do (Figs. 4–5). It is plausible that dedicated brain circuitry mediates musical percepts (e.g., Peretz, 2001) and is differentially activated or more highly developed depending on one’s musical training (Foster & Zatorre, 2010; Pantev et al., 1998; Zarate & Zatorre, 2008). During pitch memory tasks, for example, non-musicians primarily recruit sensory cortices (primary/secondary auditory areas), whereas musicians show additional activation in the right supramarginal gyrus and superior parietal lobule, regions implicated in auditory short-term memory and pitch recall (Gaab & Schlaug, 2003). These results highlight that similar neuronal networks are differentially recruited and tuned depending on musical experience, even under identical task demands.
Whether the robust representations observed in subcortical responses engage such cortical circuitry may depend on the cognitive relevance of the stimulus to the listener (Abrams et al., 2010; Bidelman, Krishnan, et al., 2011; Chandrasekaran et al., 2009; Halpern, Martin, & Reed, 2008). Thus, in tone language speakers, information relayed from the brainstem may not engage the higher-level cortical mechanisms that subserve the perception and cognition of musical pitch to the same degree as in musicians. Indeed, we find no association between neurophysiological and behavioral measures in either group lacking musical training (C and NM), in contrast to the strong brain-behavior correlations in musically trained listeners (Fig. 5). Our results, however, do not preclude the possibility that the brain-behavior dissociation observed in the Chinese group could be reduced or even eliminated with learning or training. That is, the neural enhancements we find in Chinese non-musicians relative to English-speaking non-musicians (Fig. 3) could facilitate or accelerate improvements in their ability to discriminate musical pitch with active training.
As in language, brain networks engaged during music processing probably involve a series of computations applied to the neural representation at successive stages of processing (Hickok & Poeppel, 2004; Poeppel, Idsardi, & van Wassenhove, 2008). Sensory information is continuously pruned and transformed along the auditory pathway depending on its functional role, e.g., whether it is linguistically (Krishnan et al., 2009; Xu et al., 2006) or musically relevant (Bidelman, Gandour, et al., 2011; Bidelman, Krishnan, et al., 2011) to the listener. Eventually, this information reaches complex cortical circuitry responsible for encoding and decoding musical percepts, including melody and harmony (Koelsch & Jentschke, 2010), and for the discrimination of pitch (Brattico et al., 2009; Koelsch, Schroger, & Tervaniemi, 1999; Tervaniemi, Just, Koelsch, Widmann, & Schroger, 2005). Yet whether these encoded features are effectively retrieved and used during music perception tasks depends on the context and functional relevance of the sound, the listener’s degree of expertise, and the attentional demands of the task (Tervaniemi et al., 2005; Tervaniemi et al., 2009).
The presence of cross-domain transfer effects at pre-attentive stages of auditory processing does not necessarily imply that such effects will carry over to the cognitive stages of processing invoked during music perception. Enhanced neural representation of pitch, while presumably necessary, is by itself insufficient to produce perceptual benefits for music listening. We infer that neurophysiological enhancements are utilized by cognitive mechanisms only when the particular demands of an auditory environment coincide with one’s long-term experience (Bidelman, Gandour, et al., 2011; Tervaniemi et al., 2009). Indeed, our data show that while both extensive music experience and extensive language experience enhance neural representations of musical stimuli, only musicians make use of this information at a perceptual level.
This article is based on part of a doctoral dissertation by the first author submitted to Purdue University in May 2011. Research supported in part by NIH R01 DC008549 (A.K.), NIDCD T32 DC00030 (G.B.), and Purdue University Bilsland dissertation fellowship (G.B.).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.