Music consists of sound sequences that require integration over time. As we become familiar with music, associations between notes, melodies, and entire symphonic movements become stronger and more complex. These associations can become so tight that, for example, hearing the end of one album track can elicit a robust image of the upcoming track while anticipating it in total silence. Here we study this predictive “anticipatory imagery” at various stages throughout learning and investigate activity changes in corresponding neural structures using functional magnetic resonance imaging (fMRI). Anticipatory imagery (in silence) for highly familiar naturalistic music was accompanied by pronounced activity in rostral prefrontal cortex (PFC) and premotor areas. Examining changes in the neural bases of anticipatory imagery during two stages of learning conditional associations between simple melodies, however, demonstrates the importance of fronto-striatal connections, consistent with a role of the basal ganglia in “training” frontal cortex (Pasupathy and Miller, 2005). Another striking change in neural resources during learning was a shift from caudal PFC earlier to rostral PFC later in learning. Our findings regarding musical anticipation and sound sequence learning are highly compatible with studies of motor sequence learning, suggesting common predictive mechanisms in both domains.
When two melodies are frequently heard in the same order, as with consecutive movements of a symphony or tracks on a rock album, the beginning of the second melody is often anticipated vividly during the silence following the first. This reflexive, often irrepressible, retrieval of the second melody, or “anticipatory imagery”, reveals that music consists of cued associations, in this case between entire melodies. Storage of these complex associations can span seconds or minutes and therefore cannot be accomplished by single neurons on the basis of, e.g., temporal combination sensitivity (Suga et al., 1978; Margoliash and Fortune, 1992; Rauschecker et al., 1995). Alternative neural mechanisms must enable the encoding and integration of sound sequences over longer intervals.
Like music processing, other forms of behavior require the integration and storage of temporal sequential hierarchies as well. This includes, for instance, speech perception, dancing, and athletic performance. Thus, some of the neural mechanisms underlying these behaviors may also be shared, and one may postulate the existence of generalized “sequence” areas that interact with auditory areas to facilitate learning of complex auditory sequences (Janata and Grafton, 2003; Schubotz and von Cramon, 2003; Zatorre et al., 2007).
How specific musical sequences are encoded and stored in the brain is a neglected area of research. The few studies that compare neural responses to familiar and unfamiliar music (Platel et al., 2003; Plailly et al., 2007; Watanabe et al., 2008) only address recognition memory and do not tap other important memory processes like recall and cued retrieval. Other studies that do address recall or retrieval confound the issue by requiring simultaneous execution of motor programs like singing or playing an instrument (Sergent et al., 1992; Perry et al., 1999; Callan et al., 2006; Lahav et al., 2007). Imagery research, by contrast, can assess the neural mechanisms of music recall independent of concurrent sensorimotor events. Imagery for isolated sounds recruits nonprimary auditory cortex and the supplementary motor area, suggesting these areas are critical for the internal sensation of sound (Halpern et al., 2004). Imagery for melodies, including cued (mental) completion of familiar melodies, is associated with a wider network of frontal and parietal regions (Zatorre et al., 1996; Halpern and Zatorre, 1999). However, it is unknown how the complex, predictive cued associations that drive anticipatory imagery, as described above, are stored, nor how these representations change with learning.
In our study, we exploit the anticipatory imagery phenomenon to assess how complex associations between musical sequences are learned and stored in the brain. Using functional magnetic resonance imaging (fMRI), we measured neural activity associated with anticipatory retrieval of a melody cued by its learned association with a previously unrelated melody, in two complementary learning situations. Experiment 1 assessed long-term incidental memorization of associated pieces of music. In Experiment 2, we targeted anticipatory imagery during short-term, conscious memorization of simple melody pairs at two stages of this short-term training. Thus, we tracked how cued associations between melodies are encoded throughout different stages of learning.
Twenty neurologically normal volunteers (10 males, mean age = 24.7) were recruited from the Georgetown University community to participate in these experiments. All were native English speakers, reported normal hearing, and were fully informed verbally and in writing of the procedures involved. Nine subjects (3 female) participated in Experiment 1; 11 subjects (7 female) participated in Experiment 2, though one participant was omitted due to excessive head movement and poor behavioral performance. Participants in Experiment 2 had at least 2 years of musical experience (mean = 6.5, SD = 4.17) in musical ensembles, lessons, or serious independent study.
Experiment 1 stimuli were digital recordings of the final 32 s of instrumental CD tracks that were either highly familiar or unfamiliar to a given subject. Experiment 2 stimuli consisted of 36 melodies constructed by one of the authors. Each was 2.5 s long, written in either C or F major, ended on either the tonic or fifth scale degree, presented in a synthetic piano timbre, and judged as highly musical by two independent, moderately trained amateur musicians (rating of at least 7 on a 10-point scale, with 10 meaning very musical and 1 meaning not at all musical). In both experiments, all stimuli were gain-normalized and presented binaurally at a comfortable volume (~75–80 dB SPL). During scans, sounds were delivered via a custom air-conduction sound system fitted with silicone-cushioned headphones specifically designed to isolate the subject from scanner noise (Magnetic Resonance Technologies).
During Experiment 1 scans, trials consisted of 32 s of music (the final seconds of familiar or unfamiliar CD tracks), followed by an 8 s period of silence. The order of music track presentation was randomized throughout each functional run. Each run consisted of nine trials, preceded by 16 s of silence during which two pre-experimental baseline EPI volumes were acquired (and discarded from further analysis). Stimuli presented during familiar trials were the final 32 s of tracks from each participant's favorite CD (familiar music, FM), for which participants reported experiencing anticipatory imagery for each subsequent track in the silent gaps between music tracks (anticipatory silence, AS). Performance on next-track anticipation (singing or humming of the opening bars of the correct next track; 75% threshold for inclusion) for familiar music was confirmed by pre-scan behavioral testing. Stimuli presented during unfamiliar trials consisted of music that the subjects had never heard before (unfamiliar music, UM). Thus, during this condition, subjects would not anticipate the onset of the following track (non-anticipatory silence, NS). Three “familiar” functional runs were alternated with three “unfamiliar” functional runs. In total, 270 functional volumes were collected for each subject, yielding a total session time of approximately one hour per subject. While in the MR scanner, subjects were instructed to attend to the stimulus being presented and to imagine, but not vocalize, the subsequent melody where appropriate.
Prior to scanning in Experiment 2, participants completed 30 min of self-directed training, in which they were asked to memorize 7 pairs of melodies. Participants were told that the goal of training was to be able to retrieve and imagine the second melody accurately and vividly when they heard the corresponding first melody. Learning was assessed during a two-part testing phase. In the first phase, participants completed a discrimination task, in which melody pairs were presented and they were asked to detect a musically valid, one-note change in melody #2. The second phase was a recall task. Here, participants heard the first melody of a pair followed by silence, during which they imagined the second melody. The second melody was then presented, and participants rated their image on accuracy and vividness. In total, this training session lasted approximately 45 minutes. Participants then completed an additional ~10 min of self-directed “refresher” training prior to the scanning session, and approximately 7–8 min more training between scan runs. Assigned trained and untrained melodies were counterbalanced across subjects.
During scan trials in Experiment 2, participants heard either a familiar melody (FM, i.e., the first melody of a trained pair), an unfamiliar melody (UM), or silence (S). Participants were not cued as to the content of upcoming trials and therefore assessed the condition tested in each trial based on the music presented, or lack thereof. In the silence following FM, participants were asked to imagine the corresponding second melody of the trained pair (anticipatory silence, AS). In silence following UM, participants simply waited until the trial’s end (non-anticipatory silence, NS). At the end of each trial, participants rated the vividness of the image, if any, on a 5-point scale (1 = no image, 5 = very vivid image), regardless of trial type. In addition to assessing the vividness of the musical imagery experienced by the participants, these ratings also served to assess learning in runs 1 and 2. There were 70 trials with familiar melodies, 70 unfamiliar trials, and 34 silent trials per run, for 2 runs. Due to time restrictions, one participant heard 44 familiar, 44 unfamiliar, and 22 silent baseline trials per run, for 2 runs.
In Experiment 1, images were obtained using a 1.5-Tesla Siemens Magnetom Vision whole body scanner. Functional echo planar images (EPIs) were separated by several seconds of silence in a sparse sampling paradigm, which reduces contamination of both the EPIs and the stimulus presentation by scanner noise (Hall et al., 1999). The timing of stimulus presentation relative to EPI acquisition was jittered in order to capture activation associated with either music (FM and UM) or silence following music (AS and NS). Functional runs consisted of 47 volumes each (TR = 8 s, TA = ~2 s, TE = 40 ms, 25 axial slices, voxel size = 3.5 × 3.5 × 4.0 mm³). Coplanar high-resolution anatomical images were acquired using a T2-weighted sequence (TR = 5 s, TE = 99 ms, FOV = 240 mm, matrix = 192 × 256, voxel size = 0.94 × 0.94 × 4.4 mm³).
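The Experiment 1 timing figures can be cross-checked with a few lines of arithmetic. The sketch below assumes one EPI volume per TR and assumes that the 270-volume total counts only the volumes retained after the two baseline volumes per run are discarded; neither assumption is stated explicitly in the text, and all variable names are illustrative.

```python
# Timing figures quoted in the text for Experiment 1.
MUSIC_S = 32         # music per trial (s)
SILENCE_S = 8        # silence per trial (s)
TRIALS_PER_RUN = 9
BASELINE_S = 16      # initial silence per run (s)
TR_S = 8             # repetition time (s)
RUNS = 6             # three familiar + three unfamiliar runs


def volumes_per_run(include_baseline: bool) -> int:
    """Number of volumes in one run, assuming one volume per TR."""
    task_s = TRIALS_PER_RUN * (MUSIC_S + SILENCE_S)
    total_s = task_s + (BASELINE_S if include_baseline else 0)
    return total_s // TR_S


acquired = volumes_per_run(include_baseline=True)    # volumes acquired per run
analyzed = volumes_per_run(include_baseline=False)   # volumes kept per run
total_analyzed = RUNS * analyzed                     # volumes kept per subject
print(acquired, analyzed, total_analyzed)
```

Under these assumptions the per-run count matches the 47 volumes reported for each functional run, and the per-subject total matches the 270 functional volumes reported for the session.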
In Experiment 2, images were acquired using a 3-Tesla Siemens Trio scanner. Two sets of functional echo planar images were acquired using a sparse sampling paradigm similar to Experiment 1 (TR = 10 s, TA = 2.28 s, TE = 30 ms, 50 axial slices, 3 × 3 × 3 mm³ resolution). Between runs, a T1-weighted anatomical scan (1 × 1 × 1 mm³ resolution) was performed.
All analyses were completed using BrainVoyager QX (Brain Innovation, Inc). Functional images from each run were corrected for motion in 6 directions, smoothed using a 3D 8 mm³ Gaussian filter, corrected for linear trends, and then high-pass filtered to remove low frequency noise. Processed functional data were coregistered with corresponding high resolution anatomical images and interpolated into Talairach space (Talairach and Tournoux, 1988).
Random effects group analyses were performed across the whole brain and in regions of interest (ROIs) using the General Linear Model (GLM), which assessed the extent to which variation in blood oxygenation level-dependent fMRI (BOLD-fMRI) signal can be explained by our experimental manipulations (i.e., predictors or regressors; Friston et al., 1995). Random effects models attempt to remove intersubject variability, thereby making the results obtained from a limited sample more generalizable to the entire population (Petersson et al., 1999). Thus, we employed random effects models in our analyses. In a single instance (AS > NS, Experiment 1), we additionally considered results of a fixed effects model, in which intersubject variability is not removed (see Results section).
In all analyses, we selected GLM predictors corresponding to the 4 conditions: FM, UM, AS, NS. In Experiment 2, we excluded trials in which participants reported imagery “errors” from all contrasts of interest. These “error” trials were entered as predictors in GLMs, but were discarded in subsequent contrasts (i.e., “predictors of no interest”). Thus, FM and AS trials in which participants gave vividness ratings less than 4 on the 5-point scale were discarded from all analyses except the parametrically weighted GLM (see below); likewise UM, NS and silent baseline trials with ratings greater than 1 were also discarded in Experiment 2.
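The Experiment 2 exclusion rule can be written as a small predicate. This is a minimal sketch of the rule as described above, not the BrainVoyager implementation; the function name and the condition label "S" for silent baseline trials are illustrative assumptions.

```python
def is_retained(condition: str, rating: int) -> bool:
    """Return True if a trial enters the contrasts of interest.

    Mirrors the rule described in the text: imagery trials (FM, AS) need a
    vividness rating of at least 4; non-imagery trials (UM, NS, silent
    baseline "S") need a rating of exactly 1 (no image).
    """
    if not 1 <= rating <= 5:
        raise ValueError("vividness ratings are on a 5-point scale")
    if condition in ("FM", "AS"):
        return rating >= 4
    if condition in ("UM", "NS", "S"):
        return rating == 1
    raise ValueError(f"unknown condition: {condition!r}")


# Hypothetical trials, for illustration only.
trials = [("AS", 5), ("AS", 3), ("NS", 1), ("NS", 2)]
kept = [trial for trial in trials if is_retained(*trial)]
print(kept)
```

Trials that fail the predicate would still be modeled, as predictors of no interest, rather than dropped from the GLM.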
In whole-brain analyses, appropriate single-voxel map thresholds were chosen for random (E1: t(8) > 3.83, p < 0.005; E2: t(9) > 3.69, p < 0.005) and fixed effects analyses (E1 AS > NS: t(2340) > 2.81, p < 0.005). Monte Carlo simulations were used to estimate the rate of false positive responses, which we then used to obtain corrected cluster thresholds for these maps at p(corr) < 0.05 (Forman et al., 1995).
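The correspondence between the quoted t values and p < 0.005 can be checked numerically. The sketch below assumes two-tailed tests (the text does not say which tail convention was used) and integrates the Student t density with Simpson's rule using only the Python standard library; the function names are illustrative.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution (lgamma avoids overflow for large df)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # probability mass between 0 and |t|
    return 1 - 2 * area


# Thresholds quoted in the text: (t value, degrees of freedom).
for t_value, df in [(3.83, 8), (3.69, 9), (2.81, 2340)]:
    print(df, round(two_tailed_p(t_value, df), 4))
```

Under the two-tailed assumption, all three thresholds sit at approximately p = 0.005, which is consistent with the single-voxel criterion stated in the text.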
Parametrically weighted GLM analyses were conducted in Experiment 2, in order to assess the relationships between the strength of the retrieved musical image indicated by behavioral ratings and BOLD-fMRI signal. Weights of the GLM predictors were adjusted to reflect vividness ratings 2–5 in a linear fashion (−1.0, −0.33, 0.33, 1.0), while all other predictors were given a weight of 1. Thus, this analysis identified voxels having BOLD-fMRI signal that was linearly correlated with vividness ratings on familiar trials.
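The linear weights quoted above follow from centering the rating scale and scaling it to the range −1 to 1. A minimal sketch, with an illustrative function name:

```python
def linear_weights(ratings):
    """Map an evenly spaced rating scale onto [-1, 1] in a linear fashion."""
    lo, hi = min(ratings), max(ratings)
    midpoint = (lo + hi) / 2
    half_range = (hi - lo) / 2
    return {r: (r - midpoint) / half_range for r in ratings}


weights = linear_weights([2, 3, 4, 5])
print({rating: round(w, 2) for rating, w in weights.items()})
```

Because the weights sum to zero, the weighted predictor captures variance that scales with vividness rather than the mean response to familiar trials.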
Region of interest (ROI) analyses were conducted in both experiments, using foci identified by relevant whole-brain GLM analyses. These random effects GLM analyses were performed in order to explore subtler relationships between our conditions and mean BOLD-fMRI signal within these predefined regions, including: 1) the relative magnitude of response to familiar and unfamiliar music conditions in ROIs sensitive to music in general and 2) learning-related differences in AS ROIs across runs in Experiment 2. Linear detrending and within-run normalization allow for signal comparisons between runs. Relevant bar graphs depict average percent signal change from baseline within the relevant ROIs with standard error across subjects.
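The quantities plotted in the bar graphs can be sketched as follows. This assumes the conventional definition of percent signal change relative to a baseline mean and the standard error of the mean across subjects; the exact BrainVoyager computation is not described in the text, and the numbers below are hypothetical.

```python
import math
import statistics


def percent_signal_change(condition_mean: float, baseline_mean: float) -> float:
    """Percent change of the condition signal relative to baseline."""
    return 100.0 * (condition_mean - baseline_mean) / baseline_mean


def mean_and_sem(values):
    """Mean and standard error of the mean across subjects."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


# Hypothetical per-subject ROI signals (arbitrary units): (condition, baseline).
subjects = [(101.0, 100.0), (204.0, 200.0), (151.5, 150.0)]
psc = [percent_signal_change(cond, base) for cond, base in subjects]
group_mean, group_sem = mean_and_sem(psc)
print(psc, group_mean, group_sem)
```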
In the first experiment, we measured neural responses associated with anticipatory imagery for highly familiar music. During the scan, participants heard the final 32 s of tracks from their favorite CD presented randomly (familiar music, FM), followed by silence in which they would experience anticipatory imagery for the following track (anticipatory silence, AS). We also presented the final 32 s of unfamiliar tracks (unfamiliar music, UM), which was again followed by a period of silence (non-anticipatory silence, NS) during which participants would not experience imagery. This design is outlined in Fig. 1, and coordinates for significant clusters in all analyses can be found in Table 1. Preliminary data from Experiment 1 have been presented previously in brief form (Rauschecker, 2005).
Listening to music, whether familiar or unfamiliar, was associated with robust activity in auditory areas across superior temporal cortex bilaterally, as well as the right amygdala (Fig. 2). This music-related activity was identified using a conjunction between those voxels significantly more active for the FM and UM conditions as compared to baseline (NS). Region-of-interest (ROI) analysis demonstrated that signal change was not different for familiar and unfamiliar music in these areas, suggesting that both the auditory cortex and, notably, the amygdala were equally involved when listening to familiar and unfamiliar music. The amygdala likely reflects emotion processing common to both familiar and unfamiliar music used in this study (Anderson and Phelps, 2001; Koelsch et al., 2006; Kleber et al., 2007).
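One common way to form such a conjunction is the minimum-statistic approach: a voxel survives only if it exceeds threshold in both contrasts. The sketch below illustrates that logic under the assumption that this is how the conjunction was formed (the text does not specify the method); the t values are hypothetical, and the threshold is the Experiment 1 random effects value quoted in the Methods.

```python
def conjunction(t_map_a, t_map_b, threshold):
    """Voxelwise conjunction: True where both contrasts exceed the threshold."""
    if len(t_map_a) != len(t_map_b):
        raise ValueError("maps must cover the same voxels")
    return [min(a, b) > threshold for a, b in zip(t_map_a, t_map_b)]


# Hypothetical t values for four voxels in each contrast against baseline.
fm_vs_baseline = [5.1, 4.2, 2.0, 6.3]
um_vs_baseline = [4.4, 1.9, 4.8, 5.0]
music_voxels = conjunction(fm_vs_baseline, um_vs_baseline, threshold=3.83)
print(music_voxels)
```

Only the first and last voxels pass in this toy example, because the others are driven by just one of the two music conditions.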
Contrasting familiar and unfamiliar music conditions directly in the whole-brain analysis, we saw that familiar music was associated with increased posterior cingulate cortex (PCC, BA 23) and parahippocampal gyrus responses (PHG, BA 35/36; Fig. 3a, red clusters), as compared to unfamiliar music. Unfamiliar music did not drive activity in any brain area significantly more than familiar music, as signified by the absence of blue clusters in Fig. 3a. ROI analyses of signal change within the PCC and PHG showed that although familiar music drove activity in these areas significantly above baseline (p < 0.0002 for both), unfamiliar music did not (p = 0.27 and 0.10, respectively; Fig. 3b). This indicates that these areas were exclusively involved in perception of highly familiar music.
Anticipatory silence (AS) was associated with significant activation in a series of right frontal regions (Fig. 4). The superior frontal gyrus (SFG, BA10) was significantly active when employing a random effects analysis at our chosen thresholds (p(corr) < 0.05). This is a finding novel to studies of musical imagery, and likely reflects processes relevant to retrieval of highly familiar, richly complex naturalistic music. A fixed effects analysis (which does not consider intersubject variability) at p(corr) < 0.05 revealed a more extensive pattern of activation, including the SFG cluster, pre-supplementary motor cortex (pre-SMA, BA6), dorsal premotor cortex (dPMC, BA6), inferior frontal gyrus (IFG, BA44/45) adjacent to ventral premotor cortex (vPMC), all within the right hemisphere. Several of these latter areas are consistent with previous work on musical imagery (Zatorre et al., 1996; Halpern and Zatorre, 1999), and serve as a complement to the novel SFG result.
Experiment 2 recreated the anticipatory imagery captured in Experiment 1 within a more controlled setting. In a relatively short 45-minute training session immediately prior to scanning, participants memorized seven pairs of short, simple melodies. BOLD-fMRI signal was then monitored while participants listened to and imagined melodies, similar to Experiment 1 (Fig. 1). Experiment 2 included two identical and sequential runs, with a second short (< 10 min) training phase occurring between these runs. This design allowed us to assess differences in patterns of brain activation between moderately learned (Run 1) and well learned (Run 2) sequences. Results of both experiments are compared in Table 1.
After the initial training session and just prior to scanning, participants were tested on discrimination and recall accuracy (self-ratings) on trained melody pairs. In discriminating the second melody of trained pairs from distracter melodies with one-note deviations, participants achieved a high level of accuracy, making on average fewer than one error (mean accuracy = 93.65%, SD = 0.07). When asked to silently recall the second melody of trained pairs after being presented with the corresponding first melody, participants rated their images as being moderately accurate (mean rating = 6.03, SD = 1.73; 10-point rating scale) and moderately vivid (mean = 5.60, SD = 1.09; 10-point rating scale). Performance on this recall task indicated a moderate level of learning, which allowed room for improvement during the in-scan task and after the additional training between Runs 1 and 2.
Vividness ratings on familiar trials (FM and AS) were significantly higher than those on unfamiliar trials (UM and NS) (significant main effect of condition, F(3,27) = 150.50, p < .001). A significant run × condition interaction indicates that this difference in vividness ratings increased between runs, as expected (F(3,27) = 7.76, p = 0.001; Run 1: familiar trials mean = 3.62, unfamiliar trials mean = 1.46; Run 2: familiar trials mean = 4.01, unfamiliar trials mean = 1.36).
Listening to both familiar and unfamiliar melodies elicited robust bilateral activation of auditory areas along superior temporal cortex (Fig. 2), as in Experiment 1. Music perception was also associated with increased activity in the pre-SMA (BA6), several foci along both hemispheres of the cerebellum, the midbrain, and visual cortex. Comparing averaged signal change within these areas using an ROI analysis, no ROI was significantly more responsive to familiar or unfamiliar melodies, and signal was significantly greater than baseline (S) for both conditions, suggesting that all areas were recruited in both conditions. Coordinates for significant clusters in all analyses can be found in Table 1.
Contrasting FM and UM conditions directly across the whole brain, the pre-SMA and cingulate motor area (pre-SMA/CMA BA6/32), left inferior parietal lobule (L IPL, BA40), and left IFG or inferior ventral premotor cortex (vPMC, BA6/44) were significantly more active for familiar than unfamiliar musical sequences (Fig. 5a). Both familiar and unfamiliar music drove activity in most of these ROIs above baseline (Fig. 5b). However, signal in the IPL was not significantly greater than baseline for the UM condition, suggesting that the area was not involved in listening to unfamiliar music.
These results suggest that the inferior vPMC, pre-SMA, CMA, and cerebellum are involved in the perception of both familiar and unfamiliar sequences. However, the frontal (vPMC, pre-SMA/CMA) and, in particular, the parietal (IPL) regions were more involved in processing of familiar musical sequences, whereas the cerebellar activity was common to both familiar and unfamiliar melodies. Furthermore, the frontal areas (vPMC and pre-SMA/CMA) were similar to those activated in Experiment 1 during imagery, which may indicate that these areas are involved in both perception and production of music.
Comparing silence following learned sequences (anticipatory silence, AS) to silence following novel melodies (NS), imagery was associated with significant activity in the SMA proper (BA6), pre-SMA (BA6), PCC (BA23), L GP/Pu, R lateral cerebellum (CB), L IPL (BA40) and thalamus (Fig. 6a,b). Unlike Experiment 1, in which only frontal areas were recruited during the AS task, Experiment 2 recruited a variety of both cortical and subcortical areas during imagery of musical sequences.
We performed an ROI analysis on these clusters in order to determine what areas are unaffected by learning (i.e., equally active in both runs) and what areas change during learning (i.e., differ in Runs 1 and 2). In the first run, in which participants were moderately familiar with melody pairs, signal was significantly greater in the SMA proper (BA6) and GP/Pu than in Run 2 (Fig. 6c). No ROI was more active in Run 2 than in Run 1. This indicates that the SMA proper and basal ganglia play a greater role in early or middle stages of learning. However, activity in all ROIs, including the SMA proper and GP/Pu, was significantly greater for the AS than the NS condition in both runs, indicating that these areas are important throughout the learning process. Interestingly, Run 2 signal was only slightly greater for the AS than NS condition in GP/Pu (p = 0.02), suggesting that the basal ganglia may be most crucial to early stages of learning (Pasupathy and Miller, 2005).
We also identified areas in which the fMRI signal correlated with in-scan behavioral vividness ratings using a parametrically weighted General Linear Model (GLM). In this way, we were able to identify those areas whose activity increased with increasingly successful covert retrieval. Two areas demonstrated a positive linear correlation with participant vividness ratings during the AS condition: right GP/Pu and L inferior vPMC (BA 6/44, Fig. 7a,b). Though the linear correlation between percent signal change and behavioral rating was positive for both, the correlation was significant in both ROIs in Run 2 but not in Run 1 (GP/Pu: r = 0.58 and 0.96, p = 0.08 and 0.0006; L vPMC: r = 0.90 and 0.97, p = 0.11 and 0.003; for Runs 1 and 2, respectively). This indicates that signal in GP/Pu and vPMC more accurately reflects imagery processes in Run 2 than in Run 1, while the areas were recruited regardless of retrieval success in Run 1. Additionally, because behavioral ratings were predictive of signal in these regions, we can be more confident in the validity of self-report measures gathered in these experiments. No area had signal negatively correlated with vividness ratings.
The results presented here not only pinpoint brain areas involved in anticipatory retrieval of musical sequences, but also explain how activity within this network changes throughout learning. Familiar music in Experiment 1 had been heard by participants many more times, and over a longer period of time, than newly composed melodies in Experiment 2. Thus, long-familiar music from Experiment 1 was likely to recruit areas involved in long-term encoding of music (including semantic and affective associations), whereas newly familiar music in Experiment 2 was more likely to be processed in more “raw” form. Specifically, our data indicate that changes in frontal cortex and basal ganglia during learning, and perhaps ultimately in parietal and parahippocampal cortex as well, are key to developing increasingly complex brain representations of sequences.
While subjects imagined melodies after short-term explicit training in Experiment 2, a variety of cortical (frontal and parietal) and subcortical (basal ganglia and cerebellum) structures were involved. However, while experiencing imagery for highly familiar music after long-term incidental exposure in Experiment 1, only frontal areas were recruited. This seems to indicate that at the end stages of musical learning, frontal cortex involvement dominates, whereas early and moderate stages require greater input from structures like the basal ganglia, which may “train” frontal networks (Pasupathy and Miller, 2005). Additionally, the cerebellum was involved in anticipatory imagery in Experiment 2, suggesting the cerebellum’s role, like that of many so-called “motor” structures, reaches beyond overt production (Callan et al., 2006; Callan et al., 2007; Lahav et al., 2007).
Imagery for highly familiar music in Experiment 1 was associated with activity in a variety of frontal regions, including SFG, IFG/vPMC, dPMC, and pre-SMA. However, only the most rostral of these, SFG, was significant in the random effects analysis, suggesting that SFG was more consistently involved during anticipatory imagery than the other frontal areas. Moreover, this rostral prefrontal area was not active when participants imagined moderately familiar music in Experiment 2, further suggesting the area’s exclusive involvement in anticipatory imagery of highly familiar music. Lateral rostral prefrontal cortex is associated with episodic and working memory (Gilbert et al., 2006), including episodic memory for music (Platel et al., 2003) and prospective memory (Burgess et al., 2003; Sakai and Passingham, 2003; Gilbert et al., 2006). A caudal-to-rostral hierarchy within prefrontal cortex has been proposed (Christoff and Gabrieli, 2000; Sakai and Passingham, 2003), with rostral areas coordinating activity in more caudal modality-specific areas (Sakai and Passingham, 2003; Rowe et al., 2007). Our results indicate that rostral prefrontal cortex is likely involved in cued recall of well-learned, highly familiar music (i.e., anticipatory imagery) as well, perhaps through interactions with more caudal prefrontal and premotor regions.
Our data indicate that inferior vPMC is involved in anticipatory cued recall of entire melodies in later stages of learning novel melody pairs (Experiment 2), as well as during retrieval of highly familiar melody associations (Experiment 1). Similar vPMC regions, as well as adjacent areas of the IFG located more rostrally, are involved in temporal integration of sound (Griffiths et al., 2000). This includes tasks requiring application of “syntactic” rules in music (Janata et al., 2002; Tillmann et al., 2006; Leino et al., 2007) and language (Friederici et al., 2003; Opitz and Friederici, 2007), and a role in congenital amusia (Hyde et al., 2007). The vPMC also responds similarly to motor, visual, and imagery tasks, indicating that it performs these integrative computations in a domain-general manner (Schubotz and von Cramon, 2004; Schubotz, 2007). Inferior vPMC involvement during anticipatory imagery in our studies supports theories that this region is amenable to predictive, anticipatory processing of sequences.
The basal ganglia (BG) were most active during the earliest stage of learning in these experiments, the first run of Experiment 2, and significantly decreased in activation in the second run of Experiment 2 to a level barely above baseline. Interestingly, there is evidence that the BG are recruited early when learning stimulus associations but are less involved as learning progresses (Pasupathy and Miller, 2005; Poldrack et al., 2005). This has led to the hypothesis that the BG may function to prepare or “train” prefrontal areas (Pasupathy and Miller, 2005). In songbirds, anterior forebrain and BG nuclei are critical in sensorimotor song learning during development (Brainard and Doupe, 2002) and modulation of song production in adults (Kao et al., 2005). Indeed, the BG and inferior frontal cortex were the only two areas in Experiment 2 that demonstrated significant correlations between behavioral ratings and BOLD-fMRI signal, further suggesting the importance of the relationship between these two regions, especially when retrieving newly learned associations.
Signal in SMA proper during anticipatory imagery was inversely related to the degree of learning in our experiments. Pre-SMA, on the other hand, was activated during both experiments (though subthreshold activation in the random effects analysis in Experiment 1 suggests less consistent recruitment). SMA proper is more closely related to primary motor areas and movement execution (Hikosaka et al., 1996; Weilke et al., 2001; Picard and Strick, 2003) and is typically active throughout sequence execution (Matsuzaka et al., 1992). Pre-SMA, by contrast, is transiently active at sequence initiation (Matsuzaka et al., 1992) and is associated with more complex motor action plans (Alario et al., 2006), including chunks of sequence items (Sakai et al., 1999; Kennerley et al., 2004). Here, pre-SMA was involved in retrieval of more complex learned representations (i.e., musical phrases or “chunks” of sequence material), while SMA was recruited during earlier stages that presumably rely more on individual note-to-note associations. This assessment is compatible with previous music imagery studies, which report pre-SMA involvement in imagery for familiar melodies (Halpern and Zatorre, 1999), and SMA involvement in imagery for single sounds and sequence repetition (Halpern and Zatorre, 1999; Halpern et al., 2004), neither of which is likely to rely on “chunking” of multiple sequence items.
Brain structures involved in perception of familiar, but not unfamiliar, music included the parahippocampal gyrus (PHG) and posterior cingulate cortex (PCC) in Experiment 1 and an inferior parietal lobule (IPL) subregion in Experiment 2. The hippocampus and surrounding medial temporal lobe have long been linked to memory processes (Schacter et al., 1998; Squire et al., 2004), including memory for music (Watanabe et al., 2008) and consolidation of learned motor sequences (Albouy et al., 2008). PCC and adjacent medial parietal cortex share connections with medial temporal regions (Burwell and Amaral, 1998; Kahn et al., 2008), and damage to PCC can result in memory deficits (Valenstein et al., 1987; Osawa et al., 2008), suggesting that the area is more directly involved in memory retrieval. Thus, the PHG and PCC are likely to be involved in recognizing familiar music after consolidation of that music into long-term memory (i.e., Experiment 1).
Lateral parietal cortex is also connected with the medial temporal lobe (Burwell and Amaral, 1998; Kahn et al., 2008) and is routinely recruited during memory tasks (Wagner et al., 2005; Cabeza et al., 2008). Lateral parietal damage, however, does not seem to confer measurable stimulus recognition deficits (Cabeza et al., 2008; Haramati et al., 2008), which has led to the idea that the IPL may be involved in the mediation of memory effects on behavior (Wagner et al., 2005) or attention to cued memories (Cabeza et al., 2008). In our study, the IPL was associated with listening to newly learned music in Experiment 2, but was also recruited during anticipatory imagery in Experiment 2, suggesting the IPL may mediate both recall and recognition of newly learned musical sequences.
Our results indicate that learning sound sequences recruits brain structures similar to those recruited by motor sequence learning (Hikosaka et al., 2002) and, hence, similar principles might be at work in both domains. Moreover, the very existence of anticipatory imagery suggests that retrieving stored sequences of any kind involves predictive readout of upcoming information prior to the actual sensorimotor event. This prospective signal may be related to the “efference copy” proposed in previous models (von Holst and Mittelstaedt, 1950; Troyer and Doupe, 2000), including so-called emulator (or “forward”) models (Grush, 2004; Callan et al., 2007; Ghasia et al., 2008). As discussed above, prefrontal cortex is a prime location for developing complex stimulus associations required to process sequences of events. This is supported both by strong frontal involvement during our task and by the caudal-to-rostral progression of frontal involvement during different stages of learning, possibly reflecting ascending levels in the processing hierarchy. Other structures are also likely to be involved in these putative internal models at various stages, including ventral premotor and posterior parietal cortex, basal ganglia, and cerebellum. Future research confirming the temporal order of involvement of these structures is needed (Sack et al., 2008), and studies of auditory-perceptual or audiomotor learning in nonhuman primates could serve as a bridge between electrophysiology in songbirds (Dave and Margoliash, 2000; Troyer and Doupe, 2000) and human imaging.
This work was funded by the National Institutes of Health (Grants R01-DC03489, F31-MH12598, and F31-DC008921 to J.P.R., B.Z., and A.M.L. respectively) and by the Cognitive Neuroscience Initiative of the National Science Foundation (Grants BCS-0350041 and BCS-0519127 to J.P.R. and a Research Opportunity Award to A.R.H.).