A tonotopic organization of the human auditory cortex (AC) has been reliably found by neuroimaging studies. However, a full characterization and parcellation of the AC is still lacking. In this study, we employed pseudo‐continuous arterial spin labeling (pCASL) to map tonotopy and voice selective regions using, for the first time, cerebral blood flow (CBF). We demonstrated the feasibility of CBF‐based tonotopy and found a good agreement with BOLD signal‐based tonotopy, despite the lower contrast‐to‐noise ratio of CBF. Quantitative perfusion mapping of baseline CBF showed a region of high perfusion centered on Heschl's gyrus and corresponding to the main high‐low‐high frequency gradients, co‐located to the presumed primary auditory core and suggesting baseline CBF as a novel marker for AC parcellation. Furthermore, susceptibility weighted imaging was employed to investigate the tissue specificity of CBF and BOLD signal and the possible venous bias of BOLD‐based tonotopy. For BOLD only active voxels, we found a higher percentage of vein contamination than for CBF only active voxels. Taken together, we demonstrated that both baseline and stimulus‐induced CBF is an alternative fMRI approach to the standard BOLD signal to study auditory processing and delineate the functional organization of the auditory cortex. Hum Brain Mapp 38:1140–1154, 2017. © 2016 Wiley Periodicals, Inc.
Neuroimaging techniques allow for the non‐invasive investigation of the functional organization of the auditory cortex (AC) in the human brain: In agreement with previous animal studies (e.g., cats and primates; Merzenich and Brugge, 1973; Merzenich et al., 1973), functional magnetic resonance imaging (fMRI) studies have found a tonotopic organization of the human early AC (Da Costa et al., 2011; Formisano et al., 2003; Humphries et al., 2010; Langers and van Dijk, 2012; Talavage et al., 2004; Woods et al., 2010). Human early AC tonotopy has been consistently described as two frequency gradients composing a high‐low‐high preferred frequency pattern located in and around the Heschl's gyrus (HG). However, its extent, orientation, and finer details are still a matter of debate. In the visual cortex, the early visual areas are parcellated by means of stimuli varying within the two dimensional extent of the visual field, called retinotopy. In contrast, in the auditory domain, a second stimulus dimension is so far missing that would enable delineation of the borders of early auditory areas.
Within the high‐low‐high frequency map, the more posterior/medial frequency gradient is considered the human homolog of monkey area A1, while the more anterior/lateral frequency gradient is considered the human homolog of monkey area R. These two areas (gradients) together are considered to define the human primary auditory cortex (PAC, also called auditory core) (Baumann et al., 2013; Moerel et al., 2014; Saenz and Langers, 2014). However, the exact extent of the PAC remains ambiguous: Some studies consider the main high‐low‐high frequency gradients to be entirely included in the PAC (Da Costa et al., 2011), while other studies attribute part of them to auditory belt regions (Humphries et al., 2010; Talavage et al., 2004). Moreover, three main theories about the orientation of the PAC coexist: The classical interpretation locates the PAC along HG and finds its foundations in cytoarchitectonic studies. The orthogonal interpretation locates the PAC across HG, assuming between‐species homology with monkeys (for which PAC was shown to run parallel to the superior temporal gyrus, STG). The oblique interpretation is a more recent theory proposing an oblique orientation of PAC with respect to HG (Baumann et al., 2013). Finally, additional tonotopic gradients outside of the HG are often observed, especially at high spatial resolution and single‐subject level (Da Costa et al., 2011; Herdener et al., 2013; Moerel et al., 2013).
Besides tonotopy, other functional organizations of AC related to sound properties have been investigated, such as tuning width and periodicity preference (Barton et al., 2012; Herdener et al., 2013; Moerel et al., 2012, 2013). Moreover, also higher order functional areas have been studied: Specifically, regions showing differential response for vocalizations with respect to other sound categories (tones, objects, environmental noises, etc.) have been reliably observed and are typically referred to as voice sensitive areas (Belin et al., 2000).
In addition to tonotopic gradients, anatomical landmarks are utilized to identify PAC. Recently, “myelin” imaging using MRI has been proposed for that purpose (De Martino et al., 2015). Primary sensory areas, such as early visual, somatosensory and auditory cortex, have been hypothesized to have higher myelin content than the surrounding brain areas, and they are detectable with T1 and T2* MRI contrasts (Bock et al., 2009; Cohen‐Adad et al., 2012; De Martino et al., 2015; Dick et al., 2012; Geyer et al., 2011; Glasser and Van Essen, 2011; Sereno et al., 2013; Sigalovsky et al., 2006). Alternatively, susceptibility weighted imaging (SWI) can be used to probe iron and myelin differences in the cortex and, additionally, to locate veins. SWI is a recent MR technique detecting susceptibility differences in the brain (Haacke et al., 2004; Reichenbach et al., 1997). SWI combines magnitude and phase information of the complex T2* weighted images to enhance the contrast of paramagnetic substances, such as deoxygenated hemoglobin and iron, with respect to the surrounding diamagnetic tissue.
FMRI studies investigating the functional organization of AC have so far employed the blood oxygenation level‐dependent (BOLD) effect as an indirect measure of neural activity. Although the BOLD signal is the standard contrast for fMRI, it has some limitations in terms of spatial specificity and of being a quantitative marker of neuronal activity. The BOLD signal arises from the combined changes of oxygen metabolism (CMRO2), cerebral blood flow (CBF), and cerebral blood volume (CBV) in response to neural activity modulations (Logothetis, 2008). It has been shown that the overall changes in the oxygenation spread from the location of the neural activity into the draining veins. At low magnetic field strength (e.g., 1.5 and 3 T) and for both gradient‐echo and spin‐echo sequences, the measured BOLD signal mostly originates from draining veins (Uludag et al., 2009). Consequently, the spatial specificity of the BOLD signal is biased by the presence of draining veins causing signal blurring and possible displacement from the actual site of neural activity (Ugurbil et al., 2003).
Besides the BOLD signal, alternative fMRI acquisition techniques exist to study brain function. For example, arterial spin labeling (ASL) techniques measure absolute CBF in addition to the BOLD signal. ASL allows quantification of both stimulation‐induced and baseline CBF as an absolute marker of the physiological state of the tissue and its changes. Studies have shown that CBF, compared to the BOLD signal, is more spatially localized to neural activity, has lower intersubject variability and is more reproducible over time (Aguirre et al., 2002; Wang et al., 2003; Tjandra et al., 2005). However, the CBF signal has a lower SNR and its quantitative estimation is challenged by MRI acquisition confounds such as transit delay, magnetization transfer and relaxation time effects. Nevertheless, the post‐labeling delay (PLD) interval (i.e., the time interval occurring between blood labeling and image acquisition, which is necessary for the labeled blood to flow from the tagging location to the imaging slab) represents an inherently silent gap in the ASL MR sequence, therefore allowing auditory stimulation in the absence of scanner noise and making ASL particularly attractive for auditory fMRI studies.
In this paper, we aim to assess – for the first time – the feasibility of tonotopic mapping of the human auditory cortex with ASL fMRI, and to compare BOLD and CBF based tonotopies. Additionally, we evaluate whether baseline quantitative CBF and SWI provide information on the location of the PAC in addition to anatomical landmarks and myelination.
Twelve healthy volunteers (six females, age range 25–33) with normal hearing took part in this experiment. Written informed consent was obtained from all participants according to the approval of the study protocol by the Ethical Committee of the Faculty of Psychology and Neuroscience, Maastricht University.
The stimulus design employed in this work (if not stated otherwise) was adapted from the study by De Martino et al. (2013; cf. Experiment 3), which investigated tonotopy in the inferior colliculus and auditory cortex using BOLD fMRI. The stimuli consisted of amplitude modulated (8 Hz, modulation depth 0.95, length 0.8 s, sample rate 44.1 kHz) tones created in MATLAB with eight carrier frequencies logarithmically spaced between 0.180 and 7.091 kHz (see Fig. 1B). Each of these eight frequencies represents one stimulus condition and, for the rest of the paper, we will refer to them as center frequencies. In order to introduce variability within each stimulus condition, for each center frequency two additional sounds were created with a frequency difference of ±10% in logarithmic scale. A total of 24 tones were therefore created (the center frequency is the middle value of each triplet): 168, 180, 193; 284, 304, 326; 480, 514, 551; 811, 869, 931; 1370, 1469, 1574; 2316, 2482, 2661; 3915, 4196, 4497; 6616, 7091, 7600 Hz. Sound onset and offset were ramped with a 10 ms linear slope and the sound energy (calculated as root mean square) was equalized.
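The frequency set described above can be reproduced with a short sketch. It is an illustration under stated assumptions, not the original MATLAB code: the "±10% in logarithmic scale" offset is interpreted here as ±0.1 octave (which matches the listed frequencies to within 1 Hz), the modulator is assumed to be sinusoidal, and the function names and the target RMS value are hypothetical.

```python
import numpy as np

FS = 44_100       # sample rate in Hz (from the text)
DURATION = 0.8    # tone length in s (from the text)
AM_RATE = 8.0     # amplitude-modulation rate in Hz (from the text)
AM_DEPTH = 0.95   # modulation depth (from the text)
RAMP = 0.010      # linear onset/offset ramp in s (from the text)


def tone_frequencies(f_lo=180.0, f_hi=7091.0, n=8, octave_step=0.1):
    """Return an (n, 3) array of [lower flank, center, upper flank] in Hz."""
    centers = np.geomspace(f_lo, f_hi, n)   # logarithmic spacing
    flank = 2.0 ** octave_step              # ASSUMPTION: +/-0.1 octave
    return np.column_stack([centers / flank, centers, centers * flank])


def am_tone(freq, target_rms=0.1):
    """Amplitude-modulated tone with linear ramps, scaled to a fixed RMS."""
    t = np.arange(int(round(FS * DURATION))) / FS
    # ASSUMPTION: sinusoidal envelope of the form 1 + depth * sin(...)
    envelope = 1.0 + AM_DEPTH * np.sin(2 * np.pi * AM_RATE * t)
    wave = envelope * np.sin(2 * np.pi * freq * t)
    n_ramp = int(round(FS * RAMP))
    ramp = np.linspace(0.0, 1.0, n_ramp)
    wave[:n_ramp] *= ramp          # linear onset
    wave[-n_ramp:] *= ramp[::-1]   # linear offset
    # Equalize energy across tones by fixing the root mean square
    return wave * target_rms / np.sqrt(np.mean(wave ** 2))


freqs = tone_frequencies()
tones = [am_tone(f) for f in freqs.ravel()]
```

Each row of `freqs` corresponds to one stimulus condition, with the center frequency in the middle column.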
In one set of experiments (subjects nr 1 to 6), stimulus conditions were presented in blocks of six TRs with one sound per TR (= 3s) (same as in De Martino et al., 2013). Each stimulus block was followed by a resting block of six TRs with no auditory stimulation. In another set of experiments (subjects nr 7 to 12), the stimulus duration was reduced to four TRs and the rest condition increased to eight TRs. Stimulus conditions were presented in randomized order and repeated twice per run. Six runs were acquired for each subject, for a total of 96 stimulus blocks.
An additional voice localizer run was acquired (except for subject 1) using the same block design as above. For this run, however, the auditory stimuli used (adapted from Bonte et al., 2013) were 1.0 s long and consisted of vocal sounds (both speech and non‐speech), other natural sounds (musical instruments, environmental and tool noises, and animal cries), or amplitude‐modulated tones (8Hz, frequency between 0.3 and 3.0 kHz). The run included eight blocks of each of these three categories for a total of 304 TRs and ~15 min duration.
During all functional runs, subjects were asked to fixate a cross in the center of the screen and passively listen to the sounds.
Measurements were performed on a 3T Prisma Siemens scanner using a 64‐channel head coil. Functional runs were acquired using pseudo‐continuous ASL (pCASL) with 2D single‐shot echo‐planar imaging (EPI) readout (TR 3s, TE 13ms, voxel size 2.5mm isotropic, 19 slices, labeling duration 1.2s, post labeling delay 1.2s, partial Fourier 7/8, GRAPPA 2; Dai et al., 2008). Labeling duration and PLD were deliberately chosen shorter than recommended by the ASL white paper (Alsop et al., 2015), whose recommendations are tailored in view of whole brain baseline perfusion measures. These choices allow us shorter TR and thus improved sampling of the hemodynamic response. At the same time, PLD was chosen long enough to allow CBF quantification in the auditory cortex, which is adequately perfused by the labeled blood after a PLD of 1.0 s (Donahue et al., 2014; Mezue et al., 2014).
This sequence is acoustically characterized by a tagging module and EPI‐train presenting a power spectrum with contribution mostly from frequencies below 3–4 kHz (peak frequencies were 990.5, 1981.0, and 2993.0 Hz for the tagging module and 925.9, 1163.0, 1809.0, and 2713.0 Hz for the EPI‐train; see Fig. 1) and relative loudness (estimated as the ratio between the RMS values of the tagging module and the EPI readout waveforms) of 0.57. Functional images were acquired while participants passively listened to the tones presented in a block design. During the stimulus blocks, the sounds were presented in the 1.2 s PLD interval (see Fig. 1A). At the beginning of the fMRI session, the stimuli were subjectively equalized for loudness, and the overall volume of the stimuli was adjusted to a comfortable intensity level. The stimuli were presented via MR‐compatible earphones (Sensimetrics S14, Malden, MA, USA). After the experiment, all subjects reported a clear hearing of the stimuli.
In order to allow absolute quantification of CBF (see below), an M0 image was acquired using the same pCASL sequence as described above but with the TR value increased to 20s.
A susceptibility weighted image was acquired with an in‐plane resolution of 0.5 × 0.5mm2 and a slice thickness of 1.0mm (TR 28ms, TE 20ms, Flip angle 15deg, GRAPPA 2, matrix size 384 × 312 × 52; Haacke et al., 2004; Reichenbach et al., 1997). Finally, a high‐resolution (1.0mm isotropic) anatomical image was acquired using an MPRAGE sequence (TR 2.4s, TE 2.18ms, TI 1040ms, Flip angle 8deg, GRAPPA 2, matrix size 224 × 224 × 192; Mugler and Brookeman, 1990).
Functional data were pre‐processed in BrainVoyager QX (Version 220.127.116.1145, 64‐bit, Brain Innovation, Maastricht, The Netherlands): The functional runs were motion‐corrected and realigned to the first volume of the fourth functional run (i.e., the image approximately at the midpoint of the session and closest in terms of acquisition order to the SWI image, which was acquired between the third and fourth functional run). 3D motion correction was performed using the “Trilinear/sinc interpolation” option, i.e., trilinear interpolation is used during the motion detection step and sinc interpolation for the actual motion correction (spatial transformation) step. Functional ASL, M0 and SWI image were coregistered to the anatomical image using a gradient‐based alignment with six‐parameter (i.e., three translation and three rotation parameters) rigid body transformation.
The anatomical image was transformed into Talairach space in order to employ the automatic segmentation implemented in BrainVoyager. The results of the automatic segmentation were visually inspected and manually corrected, whenever necessary. The obtained white‐gray matter boundary was reconstructed to produce a 3D folded cortical surface for each subject. This 3D cortical representation was used for group alignment, for both computation and visualization (after inflation) of group maps. The group cortical surfaces were produced taking into account the individual subject's cortical curvature via the “moving target group averaging approach” of the cortex‐based alignment (CBA) procedure (Fischl et al., 1999; Goebel et al., 2006). Anatomical masks of the temporal lobe and the primary auditory cortex were manually drawn in the common CBA space based on Baumann et al. (2013), Bonte et al. (2013), and Kim et al. (2000) (see Fig. 2A).
The BOLD and CBF time courses were calculated from the motion corrected ASL time course using the BrainVoyager ASL Perfusion Volume Data Processing plugin performing surround averaging and subtraction: The ASL time course is separated in a control and label time‐series. Each time‐series is temporally interpolated to obtain BOLD and CBF data points at each original TR. The subtraction of the interpolated label from the interpolated control time‐series yields the CBF time course, while the addition of the two interpolated time‐series yields the BOLD time course (Liu and Wong, 2005).
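The separation into CBF and BOLD time courses can be sketched as follows. This is a minimal illustration, not the BrainVoyager plugin: it assumes that the volumes alternate strictly between control and label (with the control first by default) and uses linear interpolation as a stand-in for whatever interpolation the plugin applies. The function name `split_asl` is hypothetical.

```python
import numpy as np


def split_asl(ts, control_first=True):
    """Split an interleaved ASL time course into CBF and BOLD time courses.

    ts: 1D array of one voxel's signal, alternating control/label volumes.
    Returns (cbf, bold), each sampled at every original TR.
    """
    ts = np.asarray(ts, dtype=float)
    idx = np.arange(ts.size)
    # ASSUMPTION: strict alternation; which image comes first is a parameter
    ctrl_idx = idx[0::2] if control_first else idx[1::2]
    lab_idx = idx[1::2] if control_first else idx[0::2]
    # Interpolate each series to all TRs (np.interp holds the edge values)
    control = np.interp(idx, ctrl_idx, ts[ctrl_idx])
    label = np.interp(idx, lab_idx, ts[lab_idx])
    cbf = control - label    # perfusion-weighted difference
    bold = control + label   # BOLD-weighted sum, as described in the text
    return cbf, bold
```

For a voxel with a constant control signal of 100 and a constant label signal of 98, this yields a flat CBF-weighted series of 2 and a flat BOLD-weighted series of 198.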
Both BOLD and CBF time‐series were transformed from the functional space to the anatomical space of each individual subject by applying the coregistration transformation previously calculated and the data was slightly upsampled to an isotropic resolution of 2mm (i.e., half the anatomical resolution). Finally, both time‐series were spatially smoothed in the volume space with a 2mm FWHM 3D Gaussian kernel (Gardumi et al., 2016) and temporally high‐pass filtered removing linear and low frequency non‐linear drifts up to 3 cycles per time course.
In order to evaluate the signal quality of the CBF and BOLD time‐series, two measures of signal‐to‐noise ratio (SNR) were employed: the temporal signal‐to‐noise ratio (tSNR) and the contrast‐to‐noise ratio (CNR). The tSNR was calculated as the mean of the time course divided by its standard deviation and averaged across all the voxels in the temporal mask of AC (see Fig. 2A). The CNR, which represents a quantity more closely linked to the functional sensitivity of the data, was calculated as the ratio between the standard deviation of the activation response and the standard deviation of the noise (Welvaert and Rosseel, 2013). In the CNR calculation, only active (as defined via a GLM analysis, see following section) voxels were included. The CNR was computed for all center frequencies together and for each of them separately. A two‐way repeated measure ANOVA, performed with contrast (CBF or BOLD signal) and center frequency (i.e., the eight stimulus conditions) as factors and a significance level α of 0.05, was used to assess differences in CNR.
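The two quality measures can be written compactly. The sketch below assumes that the activation response is taken as the GLM-fitted signal and the noise as the GLM residuals; the text does not specify this, so it is only one plausible reading of the definition. Function names are hypothetical.

```python
import numpy as np


def tsnr(data):
    """Mean temporal SNR across voxels.

    data: array of shape (n_voxels, n_timepoints).
    """
    data = np.asarray(data, dtype=float)
    per_voxel = data.mean(axis=1) / data.std(axis=1, ddof=1)
    return float(per_voxel.mean())


def cnr(fitted_response, residuals):
    """Std of the modeled activation divided by std of the noise.

    ASSUMPTION: activation = GLM fit, noise = GLM residuals.
    """
    return float(np.std(fitted_response, ddof=1) / np.std(residuals, ddof=1))
```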
In order to investigate the consistency of CBF and BOLD signal changes across subjects, we computed the coefficient‐of‐variation (COV), defined as the standard deviation divided by the mean of the percent signal change across the subjects and expressed in percentage. For each subject, the percent signal change was calculated as the ratio between the amplitude of the mean evoked response and the temporal mean of the time course using the same number of active voxels for CBF and BOLD signal.
BOLD‐ and CBF‐based tonotopic maps were independently calculated following the same two‐step procedure:
First, a general linear model (GLM) analysis was performed using one predictor for each of the eight stimulus conditions (i.e., each of the eight center frequencies) to model the respective BOLD/CBF response. The predictors were built convolving a box car function representing the stimulus block with a canonical (double gamma) hemodynamic response function (HRF) in order to account for the hemodynamic response delay. For each run, one constant predictor and the six parameters estimated by the motion correction algorithm were included in the GLM as confound predictors. The obtained BOLD/CBF statistical maps were thresholded (uncorrected t‐value>2 for the contrast: all center frequencies>baseline) to select only active voxels entering the 2nd step.
Second, for each active voxel, its BOLD signal/CBF preferred frequency was defined as the one having the highest β‐value among the BOLD signal/CBF predictors. The final single‐subject BOLD signal/CBF tonotopy was obtained by color‐coding each voxel according to its preferred frequency: with red‐to‐green‐to‐blue coding for low‐to‐medium‐to‐high frequencies taking into account the logarithmic spacing of frequency in the stimuli. For visualization, the maps were projected on the inflated cortical surfaces. The color‐coded single‐subject maps were transformed into the common CBA space and averaged to obtain a group tonotopic map (Da Costa et al., 2011; De Martino et al., 2013; Formisano et al., 2003; Herdener et al., 2013; Humphries et al., 2010; Langers et al., 2007; Moerel et al., 2012).
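The two steps can be sketched together as follows. This is an illustration under stated assumptions rather than the BrainVoyager implementation: the double-gamma shape parameters are commonly used defaults chosen here for illustration, the HRF is sampled at the TR, and the function names are hypothetical. Confound predictors (constant term, motion parameters) are simply additional columns of the design matrix.

```python
import math
import numpy as np

TR = 3.0  # repetition time in s (from the acquisition description)


def double_gamma_hrf(t, a1=6.0, a2=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF. ASSUMPTION: shape parameters are common defaults."""
    t = np.asarray(t, dtype=float)
    peak = t ** (a1 - 1) * np.exp(-t) / math.gamma(a1)
    undershoot = t ** (a2 - 1) * np.exp(-t) / math.gamma(a2)
    return peak - ratio * undershoot


def block_predictor(onsets, block_len, n_vols):
    """Boxcar (onsets and length in TRs) convolved with the HRF."""
    box = np.zeros(n_vols)
    for onset in onsets:
        box[onset:onset + block_len] = 1.0
    hrf = double_gamma_hrf(np.arange(0.0, 30.0, TR))
    return np.convolve(box, hrf)[:n_vols]


def preferred_frequency(data, design, n_freq, t_min=2.0):
    """Best-frequency index per voxel, or -1 for voxels that are not active.

    data:   (n_vols, n_voxels) time courses.
    design: (n_vols, n_freq + n_confounds); frequency predictors first.
    """
    # Step 1: GLM fit and t-statistic for "all frequencies > baseline"
    betas, *_ = np.linalg.lstsq(design, data, rcond=None)
    resid = data - design @ betas
    dof = design.shape[0] - np.linalg.matrix_rank(design)
    sigma2 = (resid ** 2).sum(axis=0) / dof
    c = np.zeros(design.shape[1])
    c[:n_freq] = 1.0
    var_c = c @ np.linalg.pinv(design.T @ design) @ c
    t = (c @ betas) / np.sqrt(sigma2 * var_c)
    # Step 2: preferred frequency = predictor with the highest beta
    best = np.argmax(betas[:n_freq], axis=0)
    return np.where(t > t_min, best, -1)
```

The returned indices can then be mapped to a red-to-green-to-blue color scale over the logarithmically spaced center frequencies.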
Results from the GLM analysis performed as first step of the tonotopy computation were used also to assess the overall response to the sound stimuli as measured by CBF and BOLD signal. To evaluate the extension of activation in the auditory cortex and its spatial consistency across subjects, probabilistic maps were computed by calculating at each vertex of the common CBA space the relative number of subjects having significant activity at that spatial location. For this analysis, a threshold of q(FDR)<0.05 (contrast: all center frequencies>baseline; FDR standing for false discovery rate) was used. The difference in number of voxels significantly active for CBF and BOLD signal was tested using a two‐tailed t‐test (with significance level α of 0.05).
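As an illustration of this analysis, the sketch below thresholds single-subject p-values with the Benjamini-Hochberg procedure and then computes, per vertex, the fraction of subjects with significant activity. The choice of Benjamini-Hochberg is an assumption (the text only states q(FDR)<0.05), and the function names are hypothetical.

```python
import numpy as np


def fdr_mask(p_values, q=0.05):
    """Boolean mask of tests surviving FDR control.

    ASSUMPTION: Benjamini-Hochberg step-up procedure.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    ordered = np.sort(p)
    passed = ordered <= q * np.arange(1, m + 1) / m
    if not passed.any():
        return np.zeros(m, dtype=bool)
    cutoff = ordered[np.nonzero(passed)[0].max()]
    return p <= cutoff


def probabilistic_map(active):
    """Fraction of subjects active at each vertex.

    active: boolean array of shape (n_subjects, n_vertices).
    """
    return np.asarray(active, dtype=bool).mean(axis=0)
```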
BOLD and CBF tonotopies were compared by calculating their spatial correlation at the single‐subject level in the native anatomical space of each subject and at group level in the CBA surface space. The significance of the correlation at single‐subject level was evaluated by estimating the null‐distribution via permutation test (N_perm=1000). For each iteration of the permutation test, the spatial correlation was computed between the original BOLD tonotopy and the permuted CBF tonotopy obtained by randomly permuting the preferred frequencies across the voxels in the tonotopic map. Here and in the rest of the paper, unless specified, by the term correlation, we refer to Pearson's correlation calculated between the two maps represented in vectors.
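A minimal sketch of such a permutation test is given below. It assumes that both maps have already been restricted to the same set of voxels and stored as vectors, and it uses a one-sided p-value with the usual +1 correction; both choices, like the function name, are assumptions made for illustration.

```python
import numpy as np


def permutation_correlation(bold_map, cbf_map, n_perm=1000, seed=0):
    """Pearson correlation between two maps with a permutation p-value.

    bold_map, cbf_map: 1D arrays of preferred frequencies over the same voxels.
    """
    rng = np.random.default_rng(seed)
    bold_map = np.asarray(bold_map, dtype=float)
    cbf_map = np.asarray(cbf_map, dtype=float)
    r_obs = np.corrcoef(bold_map, cbf_map)[0, 1]
    # Null distribution: shuffle the CBF preferred frequencies across voxels
    null = np.array([
        np.corrcoef(bold_map, rng.permutation(cbf_map))[0, 1]
        for _ in range(n_perm)
    ])
    # ASSUMPTION: one-sided test with +1 correction
    p_value = (np.sum(null >= r_obs) + 1) / (n_perm + 1)
    return r_obs, p_value
```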
A quantitative perfusion (CBF) map was estimated from the ASL data using the model proposed by the ASL white paper (Alsop et al., 2015). The assumptions of this model are: complete bolus delivery to the target tissue (i.e., PLD>ATT for pCASL, where ATT stands for arterial transit time); no venous outflow of labeled blood water (which is generally valid in humans); and T1 relaxation time of labeled spins equal to the T1 of blood (at 3T, these values are similar and therefore errors are negligible; Cavusoglu et al., 2009). Given that the assumptions are satisfied in our study, quantitative CBF was calculated in each voxel using the following equation:
$$\mathrm{CBF} = \frac{6000 \cdot \lambda \cdot \beta \cdot e^{\mathrm{PLD}/T_{1,\mathrm{blood}}}}{2 \cdot \alpha \cdot T_{1,\mathrm{blood}} \cdot M_0 \cdot \left(1 - e^{-\tau/T_{1,\mathrm{blood}}}\right)} \qquad (1)$$

where $\beta$ is obtained by estimating the full GLM ASL model (for details see Hernandez‐Garcia et al., 2010; Mumford et al., 2006). In this model, $\beta$ is a scaling parameter of the baseline CBF predictor (constructed as an alternation of −0.5 and 0.5) and is proportional to the label and control signal difference. Since CBF and BOLD signal changes related to activation are also modeled in this GLM approach, the β estimate of baseline CBF is not biased by signal changes induced by the task auditory stimulation. Other parameters in Eq. (1) are: $M_0$, representing the signal intensity of the M0 image; PLD, the post labeling delay (adjusted for each slice according to the slice acquisition order); $\tau$, the label duration; $T_{1,\mathrm{blood}}$, the longitudinal relaxation time of blood in seconds at 3T (Lu et al., 2004; Zhang et al., 2013); $\lambda$, the brain/blood partition coefficient in ml/g; and $\alpha$, the labeling efficiency for pCASL. Finally, the factor 6000 converts the units from 1/s to ml/100g/min, which is the commonly used physiological unit for quantitative CBF.
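The single-compartment pCASL model of the ASL white paper can be sketched as a small function. The default values for the blood T1 (1.65 s), partition coefficient (0.9 ml/g) and labeling efficiency (0.85) are the white paper's recommended values at 3T and are assumptions here, since the values used in this study are not legible in the text above. The function name is hypothetical.

```python
import math


def quantify_cbf(beta, m0, pld, tau=1.2, t1_blood=1.65, lam=0.9, alpha=0.85):
    """Quantitative CBF in ml/100g/min from the white-paper pCASL model.

    beta: control-label signal difference (baseline CBF estimate from the GLM)
    m0:   signal intensity of the M0 image (same units as beta)
    pld:  post-labeling delay in s (slice-specific in practice)
    tau:  label duration in s (1.2 s in this study)
    t1_blood, lam, alpha: ASSUMED white-paper defaults at 3T
    """
    numerator = 6000.0 * lam * beta * math.exp(pld / t1_blood)
    denominator = (2.0 * alpha * t1_blood * m0
                   * (1.0 - math.exp(-tau / t1_blood)))
    return numerator / denominator
```

With these defaults, a signal difference of 1% of M0 at a PLD of 1.2 s corresponds to a perfusion of about 77 ml/100g/min, and the result scales linearly with `beta`.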
The resulting quantitative CBF map was used to calculate single‐subject gray matter (GM) perfusion values by averaging the CBF values across all voxels included in a GM mask defined on the base of the individual anatomical segmentation and intersected with the ASL imaging slab. The group perfusion map was obtained as the average of the single‐subject perfusion maps after registration in the CBA space.
The BOLD and CBF time‐series calculated from the ASL voice localizer run were projected on the individual surface space and, independently, analyzed with a GLM including one predictor for each category (vocal sounds, other natural sounds and tones). The predictor was built as the convolution of the box car representing the stimulus block with a canonical (double gamma) HRF. β estimates of all predictors were calculated at each vertex of the cortical surfaces. Then, the individual β estimates were projected in the CBA surface space and a second level (i.e., multi‐subject) GLM was performed taking into account the variability across the subjects (random effects group analysis, RFX). The obtained CBF and BOLD statistical maps were thresholded (uncorrected t‐value>2; contrast: all voice localizer sounds>baseline) to select only vertices showing stimulus‐induced activity. Such vertices were included in the computation of the contrast vocal sounds>(other natural sounds+tones)/2. In this manner CBF and BOLD unthresholded voice selective maps were obtained. Their (dis)agreement was assessed by computing the correlation between the two maps. The approach presented here intentionally avoids statistically thresholding the maps at significance level, so that CBF and BOLD‐based voice selective maps can be compared while circumventing the issue of lower SNR for CBF compared to BOLD signal. Their comparison through correlation is a valid approach, but unthresholded maps have to be interpreted with caution. In the Supporting Information, the comparison of CBF and BOLD voice selective regions was performed also using statistically thresholded maps, with qualitatively similar results.
In order to quantitatively compare our results with previous findings using standard BOLD fMRI, we computed the correlation between the BOLD unthresholded voice selective map obtained in this study and that made available at http://neurovault.org/collections/33/ by Pernet et al. (2015). The correlation was calculated after transforming the latter from MNI space to our group‐specific CBA space and considering only the common vertices between the two maps.
Vein masks were generated from SWI in order to assess the tissue specificity of the BOLD and CBF activation signal. SWI takes advantage of the complementary information of T2* weighted magnitude and phase images (Haacke et al., 2004; Rauscher et al., 2006; Reichenbach et al., 1997). Because of their paramagnetic properties, venous vessels look dark on magnitude images and take negative values in the phase image. SWI uses the phase information to further suppress the magnitude intensity of venous vessels and therefore enhance their detection in the final SW image. A sliding minimum intensity projection (mIP) over two subsequent slices of the SW image was performed in order to profit from vessel connectivity while preserving the local information across the slice direction to the level of the resolution of the functional maps (i.e., 2mm). Vein masks were created by binarising the mIP‐SW image with the value 1 assigned to voxels having a SWI value lower than 1/5 of the maximum SWI value and the value 0 otherwise. The vein mask was coregistered to the individual anatomy applying the coregistration parameters estimated between the magnitude image of the SWI and the MPRAGE image of the anatomy. Finally, the coregistered vein mask was downsampled to an isotropic resolution of 2.0mm to match the resolution of the functional data (Harmer et al., 2012). The resulting vein mask (see Supporting Information Fig. S1) was visually inspected by overlaying it on the original SW image and the T1 weighted anatomy. Note that the obtained vein mask may include CSF voxels given that also CSF signal is suppressed in SWI images; however, this does not represent a concern for our analyses since we used SWI information only for voxels detected as active by GLM of CBF and/or BOLD signal and therefore most likely consisting of tissue, vessels and/or CSF containing vessels.
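The projection and binarisation steps can be sketched as follows. The sketch assumes that slices lie along the last array axis and that the maximum is taken over the projected image; coregistration and downsampling are not covered. The function name is hypothetical.

```python
import numpy as np


def vein_mask(swi, fraction=0.2):
    """Binary vein mask from a susceptibility weighted volume.

    swi: 3D array with slices along the last axis (ASSUMPTION).
    A sliding minimum intensity projection over two adjacent slices is
    followed by thresholding below `fraction` (1/5) of the maximum value.
    The output has one slice fewer than the input.
    """
    swi = np.asarray(swi, dtype=float)
    mip = np.minimum(swi[:, :, :-1], swi[:, :, 1:])
    # ASSUMPTION: the maximum is computed on the projected image
    return mip < fraction * mip.max()
```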
The fraction of active voxels, as previously defined by a GLM analysis of the BOLD and CBF signal, labeled as vein voxels in the vein mask was determined. Then, we investigated whether the presence of a vein biasing more strongly the BOLD signal than the CBF signal could explain the partial mismatch between the BOLD and CBF tonotopies. To that end, we calculated the correlation between BOLD and CBF tonotopies splitting the voxels according to the vein mask labeling.
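Both quantities are simple to compute once the maps are stored as vectors over the same voxels. The sketch below is illustrative; the function names are hypothetical and the inputs are assumed to be already coregistered and resampled to a common grid.

```python
import numpy as np


def vein_fraction(active, veins):
    """Fraction of active voxels that are labeled as vein."""
    active = np.asarray(active, dtype=bool)
    veins = np.asarray(veins, dtype=bool)
    return (active & veins).sum() / active.sum()


def split_correlation(bold_pref, cbf_pref, veins):
    """Pearson correlation between tonotopies, separately for vein and
    non-vein voxels. Returns (r_vein, r_tissue)."""
    bold_pref = np.asarray(bold_pref, dtype=float)
    cbf_pref = np.asarray(cbf_pref, dtype=float)
    veins = np.asarray(veins, dtype=bool)
    r_vein = np.corrcoef(bold_pref[veins], cbf_pref[veins])[0, 1]
    r_tissue = np.corrcoef(bold_pref[~veins], cbf_pref[~veins])[0, 1]
    return r_vein, r_tissue
```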
Figure 2A shows the two anatomically defined masks used in this study: in light blue, the temporal cortex mask including the superior temporal plane (STP), STG, superior temporal sulcus (STS) and middle temporal gyrus (MTG); in pink the primary auditory cortex mask including HG and the areas immediately anterior and posterior to it. Masks were drawn in the common CBA space based on Baumann et al. (2013), Bonte et al. (2013), and Kim et al. (2000).
Tones elicited a robust activation in the auditory cortex as measured by both BOLD and CBF signals and shown in Figure 2B by a probabilistic map of the contrast between all stimulus conditions and baseline. The activated areas included HG, planum temporale (PT), planum polare (PP), STG, and STS. BOLD activation clusters were more widespread than CBF ones, as expected because of the lower SNR of CBF signal compared to the BOLD signal. Thresholding each single‐subject statistical map with q(FDR)<0.05, the resulting number of active voxels was significantly higher for the BOLD signal with respect to CBF signal (2036±274 [range 414‐3303] for BOLD signal and 729±108 [range 140‐1458] for CBF, reported as mean±standard error across the subjects and [range min‐max value]; P<0.001, two‐tailed t‐test; see Supporting Information Fig. S2 for a boxplot of the value distributions). The tSNR, calculated averaging across all the voxels in the temporal cortex mask of AC, was 57.6±1.4 for BOLD signal and 2.3±0.1 for CBF (where the tSNR values are reported as mean±standard error calculated across the subjects). More closely linked to the functional sensitivity of the data, the CNR of the two time‐series resulted in a value of 0.206±0.014 for the BOLD signal and 0.130±0.004 for CBF (reported as mean±standard error across the subjects). CNR was also calculated separately for each center frequency and a two‐way repeated measure ANOVA was performed with contrast (CBF or BOLD signal) and center frequency (i.e., the eight stimulus conditions) as factors. Results showed a significant main effect for the contrast (P<0.001), but no significant effect for the center frequency nor significant interaction between the two factors.
The average percent signal change was 1.53±0.12% for the BOLD signal and 16.50±1.19% for CBF signal (reported as mean±standard error calculated across the subjects), resulting in a COV of 27.73% and 24.99%, respectively.
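The reported COV values can be approximately recovered from the means and standard errors, since the standard deviation across subjects equals the standard error multiplied by the square root of the number of subjects. The sketch below assumes N = 12 subjects for both signals; because the published means and standard errors are rounded, the result only approximates the reported values (roughly 27% for BOLD and 25% for CBF).

```python
import math


def cov_percent(mean, sem, n):
    """Coefficient of variation (%) from a mean and its standard error."""
    sd = sem * math.sqrt(n)   # standard deviation across subjects
    return 100.0 * sd / mean


# ASSUMPTION: N = 12 subjects contributed to both estimates
cov_bold = cov_percent(1.53, 0.12, 12)
cov_cbf = cov_percent(16.50, 1.19, 12)
```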
Tonotopic maps for CBF and BOLD signal are shown in Figure 3A (top and bottom row, respectively). The CBF tonotopy presented two reversed spatial gradients of preferred frequencies located on HG: preferred low frequencies (in red) were located on the central part of HG and preferred high frequencies (in blue) medially and posteriorly to it, forming a gradient pattern of high‐low‐high frequency. Additional gradients were located in the surrounding areas. More specifically, clusters of low frequencies were identifiable on the middle part of the STG lateral and anterior to HG, and on the posterior STG. The tonotopic patterns are similar across the left and right hemispheres. The overall layout and the spatial arrangement of the tonotopic gradients described for the CBF tonotopy are in good (qualitative) agreement with those in the BOLD tonotopy (in the corresponding panel of Fig. 3A, the two reversed gradients forming the high‐low‐high frequency pattern are indicated by white double arrows and the additional low frequency clusters by black single arrows), and similar to maps shown in previously published BOLD signal studies (Da Costa et al., 2011; Formisano et al., 2003; Humphries et al., 2010; Moerel et al., 2012). However, in the CBF tonotopy, extreme low or high preferred frequency values are less represented than in the BOLD tonotopy. A few limited mismatches between CBF and BOLD signal tonotopy are highlighted in Figure 3A by white single arrows.
The correlation between the CBF and BOLD tonotopic maps in the individual volume space was significantly above chance for all subjects but one (i.e., 11 out of 12 subjects; P<0.01 as assessed by permutation test) resulting in a mean correlation of 0.15±0.06.
The spatial correlation between the BOLD and CBF group tonotopic maps was calculated in the group‐aligned surface space and resulted in a value of 0.45. Such correlation increased to a value of 0.67 when restricting the computation to the vertices within an anatomically defined mask of primary auditory cortex (PAC; see Fig. 2A, pink mask). In this ROI, the right hemisphere showed a slightly higher correlation value than the left one (0.72 and 0.65, respectively).
A GM baseline perfusion value of 54±2 ml/100g/min was obtained as mean and standard error across the subjects. Supporting Information Figure S3 shows a multi‐slice view of the quantitative perfusion map for each subject. Figure 3B shows the group perfusion map, the histograms of the GM perfusion values at vertex level in the left and right hemisphere, and a zoomed view of the temporal lobes after thresholding (value>68 ml/100g/min) in order to highlight the region(s) with higher baseline perfusion. (Note that due to the limited brain coverage of the ASL acquisition, there are no CBF values detected for the top part of the cortex.) In left and right AC, a relatively homogeneous region with high perfusion was centered on HG and extended posteriorly and medially to it. These areas corresponded in both hemispheres to the main high‐low‐high frequency pattern observed in the previously computed tonotopic maps and to the presumed location of the PAC. The correspondence can be appreciated by contouring the high perfusion region and superimposing this contour on the tonotopy (Fig. 3A): the perfusion‐based contour “segments” the V‐shaped gradient of high‐low‐high frequency, cutting through the low frequency region.
The overlap between this high perfusion region and the anatomically defined PAC was 72.15% (Supporting Information Fig. S4).
Figure 4 shows the unthresholded voice selective maps in the CBA surface space obtained from CBF and BOLD time courses using RFX GLM and computing the contrast vocal sounds>(other natural sounds+tones)/2. The overlaid black contours and the Supporting Information Figure S5A show the voice selective regions defined as those regions showing significantly higher activation to vocal sounds compared to other natural sounds and tones. BOLD signal voice selective regions presented several peaks of voice sensitivity, in particular on mid STS (lateral to HG), posterior STS and STG, and anterior STS for the left hemisphere only. Outside the temporal lobe, a significant cluster was detected bilaterally on the inferior frontal gyrus by both CBF and BOLD voice sensitive mapping (see Supporting Information Figs. S6 and S7). This configuration is in agreement with previous studies, although an additional cluster on anterior STS in the right hemisphere is sometimes found (Belin et al., 2000; Bonte et al., 2013, 2014; Moerel et al., 2012; Pernet et al., 2015). In contrast, the extension of CBF voice selective regions was very limited, probably due to the lower CNR as suggested by below‐threshold effects (see Supporting Information Fig. S5B and section “Voice selective regions for a large range of initial vertex‐level threshold”). To overcome the limitation of low SNR for CBF, the agreement between CBF and BOLD signal voice selectivity was assessed by calculating Pearson's correlation between the two corresponding unthresholded maps. A correlation of 0.3815 (P < 0.001, two‐tailed t‐test) was found. Finally, good agreement was found between the BOLD signal voice selective map obtained in this study and that obtained by Pernet et al. (2015) using standard BOLD fMRI (Pearson's correlation of 0.4810; P < 0.001, two‐tailed t‐test).
We found no significant difference in the fraction of vein voxels when considering the whole ROI of BOLD versus CBF active voxels. However, considering the active voxels non‐overlapping between the BOLD and the CBF active ROIs, we found a significantly higher fraction of vein voxels for BOLD exclusively active voxels versus CBF exclusively active voxels (35%±2% and 27%±2%, respectively; t‐test(11)=3.6114; P=0.0041).
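The comparison above can be sketched as follows: for each subject, the vein fraction is computed among voxels active exclusively in one contrast, and the two fractions are then compared across subjects. The 11 degrees of freedom reported for 12 subjects are consistent with a paired test, but that is an inference rather than something stated here. The set-based voxel representation, function names, and toy values are hypothetical.

```python
from statistics import mean, stdev


def exclusive_vein_fractions(bold_active, cbf_active, vein):
    """Vein fraction among voxels active in only one of the two maps.

    Each argument is a set of voxel indices (hypothetical representation).
    Assumes both exclusive sets are non-empty.
    """
    bold_only = bold_active - cbf_active
    cbf_only = cbf_active - bold_active
    frac_bold = len(bold_only & vein) / len(bold_only)
    frac_cbf = len(cbf_only & vein) / len(cbf_only)
    return frac_bold, frac_cbf


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-subject values."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / n ** 0.5)
    return t_stat, n - 1


# Toy single-subject example (made-up voxel indices)
bold = {1, 2, 3, 4, 5, 6}
cbf = {5, 6, 7, 8}
veins = {1, 2, 3, 7}
fractions = exclusive_vein_fractions(bold, cbf, veins)

# Toy group-level comparison (made-up per-subject values)
t_stat, dof = paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

The P value would then be read from a t distribution with `dof` degrees of freedom, for example with `scipy.stats.ttest_rel` if SciPy is available.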
Hypothesizing a relationship between vein biasing and BOLD‐CBF tonotopy mismatch, we calculated the correlation between BOLD and CBF tonotopy splitting the voxels according to the vein mask labeling. Vein voxels had a BOLD‐CBF tonotopy correlation of 0.142±0.039 and non‐vein voxels of 0.163±0.046 resulting in a non‐significant difference (t‐test(11)=0.5169; P = 0.3077).
The present study investigated the tonotopic organization of the human auditory cortex using ASL fMRI. In contrast to standard BOLD fMRI, ASL allows the simultaneous measurement of CBF and BOLD signal. CBF has the advantage of being quantitative and physiologically meaningful, with higher spatial specificity and reproducibility, albeit with lower SNR compared to the BOLD signal. ASL has been previously employed to map the topography of a sensory system, namely retinotopy in visual cortex (Cavusoglu et al., 2012), while the current study, to the best of our knowledge, is the first to employ the ASL technique to perform tonotopic mapping in the auditory cortex.
As expected, passive listening to the stimulus tones activated the auditory cortex bilaterally in a wide range of areas, such as HG, PT, PP, STG and STS. The extent of the activated areas was significantly larger when estimated from the BOLD signal than from CBF. This difference was most likely due primarily to the different inherent SNR of the two contrasts, which is (for typical 3T human imaging parameters) three to five times lower for CBF than for BOLD signal (Cavusoglu et al., 2012). In the current study, CNR of the BOLD signal was approximately twice as large as the CNR of CBF. These numbers are in agreement with the number of voxels detected as significantly active in this study. Despite the lower SNR, we observed a lower coefficient of variation for CBF percent signal change compared to BOLD, implying a higher reproducibility of CBF values, in agreement with previous studies (Leontiev and Buxton, 2007; Tjandra et al., 2005).
We demonstrated the feasibility of tonotopic mapping using CBF signal measured with ASL technique, specifically pCASL at 3T. The CBF tonotopy clearly showed a main V‐shaped gradient of high frequencies around a low frequency region centered on HG and additional gradients in surrounding areas. The overall pattern of the tonotopic map was similar across the two hemispheres. The CBF tonotopy was in good agreement with the BOLD tonotopy obtained by the BOLD time course extracted from the same ASL signal. The BOLD tonotopy obtained from the ASL sequence, in turn, agreed very well with those of previous studies employing GE‐EPI BOLD sequences (Da Costa et al., 2011; Formisano et al., 2003; Humphries et al., 2010; Moerel et al., 2012; Saenz and Langers, 2014). Although acquired simultaneously and therefore susceptible to the same correlated artifacts (e.g., physical noise, motion, …), ideally, the BOLD signal and CBF represent physically independent modulation of the ASL signal. That is, the presence of tonotopy in both the BOLD signal and CBF provides reciprocal validation of the utility of both parameters for probing the human AC.
Despite the good correspondence between the spatial locations of the gradients between BOLD and CBF tonotopy, we observed a less steep gradient between the two extremes of the frequency scale using CBF, resulting in smaller areas activated preferentially by the lowest or highest frequencies. We attribute this discrepancy to the inherently lower SNR of CBF resulting in noisier single‐subject maps and therefore favoring intermediate frequency range due to the averaging of best frequency values involved in the calculation of group maps.
To focus our analysis on the auditory core, we defined a mask including HG and the areas immediately surrounding HG anteriorly and posteriorly (see Fig. 2A). The mask was anatomically defined in the CBA group space on the basis of previous literature and current best practice (Humphries et al., 2010; Langers and van Dijk, 2012; Moerel et al., 2014). The rationale for focusing on the auditory core is that this is the area of AC that, despite the debate about the orientation of PAC, is most consistently described and reliably interpreted across different studies at different field strengths, using different stimuli and at single‐subject and group level (Moerel et al., 2014). Interestingly, restricting the analysis to the anatomically defined PAC increased the spatial correlation between BOLD and CBF group tonotopic maps. This result is consistent with human tonotopic maps, as obtained with tones, being more reliable and more consistent across subjects in PAC than in the whole AC. In future studies, it would be interesting to see whether combining CBF‐based tonotopic mapping and natural sounds, which engage the whole auditory cortex in an ecologically valid manner (Moerel et al., 2012), increases consistency also outside PAC.
In addition to the functional ASL runs, an M0 image was acquired to allow the quantification of brain perfusion. On the basis of the quantification formula proposed by the ASL white paper (Equation (1); Alsop et al., 2015), we estimated the quantitative baseline CBF voxel‐by‐voxel using the full ASL model to obtain a perfusion estimate unbiased by the auditory activation due to the tone presentation. In the auditory cortex, we observed a region characterized by higher baseline perfusion values in the location of HG and immediate surroundings for both hemispheres. This finding is in agreement with the values of regional CBF reported by Chen et al. (2011; cf. Table 2 “Transverse temporal – young adults” and Fig. 2), although in the cited paper no specific comment was made in this regard. At least two alternative explanations of the observed higher CBF in the presumed auditory core are possible: One possible cause is the noise of the MR gradients during image acquisition. Alternatively, the relatively higher CBF may be due to higher vascularization in the auditory core and thus be independent of the MRI acquisition. In other words, even though the MR gradient noise is a stimulus for the auditory cortex, the spatial distribution of relative CBF can be preserved under the MRI conditions. Regardless of the underlying cause, we suggest that this localized area of high perfusion detected bilaterally in the auditory cortex identifies the primary auditory core (the homologues of monkey areas A1 and R). This interpretation is supported by previous findings that primary (visual, auditory, and somatosensory) areas have higher vascular density and steady‐state metabolic demands than secondary areas (Weber et al., 2008), and by the correspondence between the location of the high perfusion area and that of the main high‐low‐high frequency gradients of both BOLD and CBF tonotopies.
Interestingly, the anterior border of the high perfusion area cuts through the low preferred frequency area of the gradient. This offers a possible distinction between primary and non‐primary auditory regions that is otherwise not possible on the basis of the tonotopic information alone. In conclusion, independently of its cause, we suggest that the observed higher perfusion is spatially restricted to early auditory areas, thus allowing the parcellation of the auditory cortex.
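For readers unfamiliar with perfusion quantification, the sketch below implements the single‐compartment pCASL expression commonly quoted from the ASL white paper (Alsop et al., 2015). The study's own Equation (1) is not reproduced in this section and may differ in detail. The default parameter values are the white paper's recommended values for 3T as recalled here, and the labeling duration, PLD, and signal values in the example are hypothetical, not those of this study.

```python
import math


def pcasl_cbf(delta_m, m0, pld_s, label_dur_s,
              lam=0.9, alpha=0.85, t1_blood_s=1.65):
    """Cerebral blood flow in ml/100 g/min from a pCASL difference signal.

    delta_m      control minus label signal (same units as m0)
    m0           proton-density (M0) signal
    pld_s        post-labeling delay in seconds
    label_dur_s  labeling duration in seconds
    lam          blood-brain partition coefficient in ml/g (assumed default)
    alpha        labeling efficiency (assumed default)
    t1_blood_s   T1 of arterial blood in seconds (assumed default for 3T)
    """
    # 6000 converts ml/g/s to ml/100 g/min
    numerator = 6000.0 * lam * delta_m * math.exp(pld_s / t1_blood_s)
    denominator = (2.0 * alpha * t1_blood_s * m0
                   * (1.0 - math.exp(-label_dur_s / t1_blood_s)))
    return numerator / denominator


# Hypothetical voxel: difference signal equal to 1% of M0
cbf_example = pcasl_cbf(delta_m=1.0, m0=100.0, pld_s=1.8, label_dur_s=1.8)
```

Because the expression is linear in the difference signal, a regional threshold on CBF corresponds directly to a threshold on the M0‐normalized control minus label difference when the other parameters are held fixed.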
We investigated the tissue specificity of BOLD and CBF signals and found a significantly higher fraction of vein voxels among exclusively BOLD active voxels than among exclusively CBF active voxels. Vein voxels were defined using vein masks created from SWI images optimized to enhance venous vessels relative to the surrounding tissue. Our results are in agreement with previous studies reporting a venous bias of the BOLD signal and a higher specificity of the CBF signal to the capillary beds (Aguirre et al., 2002; Tjandra et al., 2005; Wang et al., 2003). We hypothesized that such venous bias of the BOLD signal could explain the observed BOLD signal‐CBF tonotopy mismatches, but no significant difference was found between the BOLD signal‐CBF tonotopy correlations of vein versus non‐vein voxels. Further investigations are needed to shed light on the origin(s) of the observed mismatches between BOLD signal‐ and CBF‐based tonotopy, possibly using higher spatial resolution to reduce partial volume effects between tissue, veins and CSF, in particular outside of the PAC.
To further characterize the human auditory cortex, we investigated a higher order functional property, namely voice sensitivity. Voice selective regions were investigated by contrasting responses to vocal sounds versus those to other natural sounds and tones as measured by CBF and BOLD signal computed from the ASL signal of the voice localizer run. BOLD signal defined voice selective regions were mainly located on STG and STS and presented five peaks of voice sensitivity, namely posterior and mid STS for both hemispheres and anterior STS for the left hemisphere, in good agreement with previous studies (Belin et al., 2000; Bonte et al., 2013, 2014; Moerel et al., 2012; Pernet et al., 2015). CBF defined voice selective regions, although showing a more limited extent, successfully detected three peaks corresponding to the bilateral posterior and the left mid STS clusters. The correlation analysis between the unthresholded BOLD‐ and CBF‐based voice selective maps showed relatively good agreement and further supports the hypothesis that differences in extent and number of detected peaks were most likely due to the different CNR of the BOLD and CBF signal.
The most stringent limitation of using CBF signal is its low SNR (compared to the standard BOLD signal). In this study, we assessed the CNR as a measure of the functional sensitivity of the data and we found a CNR 1.6 times lower for CBF than BOLD signal. Thus, differences between auditory processing as detected using CBF and BOLD signal can either be attributed to the differences in the biophysical origins of both signals or to differences in their CNRs. In future studies, an adequately larger number of trials could be used to overcome the ASL CNR penalty. Moreover, the labeling duration and PLD used in this study were shorter than those recommended by Alsop et al. (2015). Using a pCASL sequence with longer labeling duration and PLD might have resulted in higher SNR of the baseline CBF. Note, however, even though some of the quantitative results on image SNR and tSNR of CBF are affected by the choice of ASL parameters, the results on CBF tonotopic maps and their comparison with BOLD signal tonotopic maps are qualitatively insensitive for a wide range of these parameters.
Another limitation of using ASL techniques is the need to acquire tag‐control pairs of images, which results in an effective temporal resolution lower than the nominal TR. Moreover, the TR itself cannot be as short as in BOLD imaging because of the post‐labeling delay needed to allow the blood to reach the imaging slab. This transit time constitutes a time constraint of the order of ~700‐2800 ms depending on the region of interest (Mildner et al., 2014), which, however, enables presenting the auditory stimuli within the silent period of the delay.
On the other hand, CBF offers some important advantages such as quantification, physiological unit of measure, reproducibility and spatial specificity. Moreover, our results show that the baseline perfusion signal offers additional information to characterize AC. Most importantly, delineating the primary auditory core on the basis of the perfusion baseline map alone provides complementary and independent information to anatomical landmarks or myelin delineations. ASL perfusion baseline measurements can be performed without sound presentation and with a run duration of 3–10 minutes (depending on the spatial resolution), and therefore in a much shorter acquisition time than usual tonotopy protocols. fMRI studies interested in PAC localization, but not in tonotopic information per se, could therefore greatly benefit from perfusion baseline PAC delineation, as they could invest the time saved in the effect/task of interest. In addition, differences in baseline perfusion between populations (e.g., healthy subjects vs tinnitus patients) may be detected and be meaningful in characterizing the state of auditory processing in these populations. Furthermore, venous biases potentially confounding BOLD signal maps (such as detected in V4 in the visual cortex, see Winawer et al., 2010) may be absent in CBF maps, which therefore may yield a more faithful representation of the underlying neuronal functional architecture.
In this study, we demonstrated the feasibility of tonotopy and voice area mapping in human auditory cortex using CBF obtained with an ASL MRI sequence. We described the limitations and benefits of this approach compared to standard BOLD fMRI: CBF is characterized by a lower CNR and temporal resolution, but is a quantifiable physiological measure, has higher reproducibility, higher spatial specificity, and ASL sequences allow the simultaneous acquisition of CBF and BOLD signal and sound presentation during the silent PLD. Interpreting the perfusion baseline map and tonotopy together, we propose quantitative baseline perfusion as a novel marker to identify the primary auditory cortex.
The authors thank Federico De Martino for the implementation of the auditory stimuli and helpful discussions.