The study of nonhuman primate vocal–auditory behavior continues to provide novel insights into the origins of human language. However, data on the neural systems involved in the perception and processing of conspecific vocalizations in great apes are virtually absent in the scientific literature, yet are critical for understanding the evolution of language. Here we used positron emission tomography to examine the neurological mechanisms associated with the perception of species-specific vocalizations in chimpanzees. The data indicate right-lateralized activity in the chimpanzee posterior temporal lobe, including the planum temporale, in response to certain calls, but not others. In addition, important differences are apparent when these data are compared with those published previously from monkey species suggesting that there may be marked differences in the way chimpanzees and macaque monkeys perceive and process conspecific vocalizations. These results provide the first evidence of the neural correlates of auditory perception in chimpanzees and offer unprecedented information concerning the origins of hemispheric specialization in humans.
The evolutionary origin of human language and its neurobiological foundations have long been the object of intense scientific debate. This controversy, at least in part, can be attributed to the fact that language and its anatomical and physiological substrates do not leave indelible marks in the archaeological record. Thus, the study of extant nonhuman primate communicative signals and their neural correlates is essential for understanding the origins and evolution of human language. Although a variety of behavioral and neurophysiological techniques have been employed with monkeys and other species, such as single-cell recording and lesion studies, recent advances in functional imaging now make it possible to examine the neurological foundations of communicative systems in vivo.
It is commonly accepted that language functions are processed asymmetrically in the human brain, with the left hemisphere dominant. It is believed that this lateralization enables the rapid perception, organization, and production of the acoustically complex and semantic signals that are essential for human spoken language. However, contemporary theories also implicate the right cerebral hemisphere in the processing of a variety of language functions. For example, emotive and expressive vocalizations in humans involve the recruitment of different brain regions (Binder et al. 1997), and recent evidence has implicated the right posterior temporal lobe in the processing of affective prosody of speech (Wildgruber et al. 2005; Ethofer et al. 2006). In addition, a right hemisphere lateralization of function has been reported for the detection of emotional prosody when compared with the detection of verbal components in human speech (Buchanan et al. 2000). Thus, there seems to be a division of labor between the hemispheres with the left dealing predominantly with the semantic aspects, and the right with the prosodic elements of human language.
Humans, however, are not the only species to show asymmetries during the perception of conspecific vocalizations. Data from nonhuman primates (reviewed in Taglialatela 2007) as well as nonprimate species indicate that hemispheric specialization for the perception and processing of communicative signals may not be uniquely human (Ehret 1987; Okanoya et al. 2001; Palleroni and Hauser 2003; Boye et al. 2005; George et al. 2005). Given their evolutionary proximity to humans, the functional lateralization of communicative signals in nonhuman primates is particularly relevant to discussions of language origins. Researchers working with nonhuman primates have employed indirect behavioral measures to assess hemispheric specialization for the perception and processing of conspecific vocalizations (Petersen et al. 1978; Beecher et al. 1979; Hauser and Anderson 1994; Hauser 1998; Ghazanfar et al. 2001). The results from these studies, by and large, have indicated right ear orienting biases following the presentation of conspecific vocalizations, suggesting a left hemisphere specialization (but see Gil-da-Costa and Hauser 2006 for a possible exception). The results from these indirect behavioral measures are, to some extent, corroborated by findings in Japanese macaques indicating transient disruption in discrimination of species-specific vocalizations when monkeys received lesions to the left but not the right posterior superior temporal gyrus (STG; Heffner and Heffner 1984).
More recently, noninvasive functional neuroimaging techniques such as positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) have been used to visualize neuronal activity during passive listening to conspecific vocalizations in macaque monkeys (Gil-da-Costa et al. 2004; Poremba et al. 2004; Gil-da-Costa et al. 2006; Petkov et al. 2008). However, data on the neural systems involved in the perception and processing of conspecific vocal signals in great apes are virtually absent in the scientific literature (but see Bernston et al. 1993), yet are critical for understanding the evolution of language because of both great apes' biological similarity to humans and their more complex cognitive and communicative capacities when compared with non-ape primates. In addition, chimpanzees and other great apes have been shown to possess the ability to comprehend human spoken language, as well as use and understand both American Sign Language and an artificial language (Gardner and Gardner 1969; Rumbaugh et al. 1973; Savage-Rumbaugh et al. 1993). The absence of functional imaging data in great apes is particularly unfortunate given the fact that chimpanzees show human-like left hemisphere neuroanatomical asymmetries in both the posterior temporal lobe and the inferior frontal gyrus (Gannon et al. 1998; Cantalupo and Hopkins 2001), regions considered homologous to the human Wernicke's and Broca's areas, respectively. In humans, these regions are known to be involved in language processing, and recent data have implicated the chimpanzee inferior frontal gyrus in the production of communicative signals (Taglialatela et al. 2008).
Here we used PET to examine the neurological mechanisms associated with the perception of species-specific vocalizations in chimpanzees. Three subjects participated in 3 separate PET scanning sessions each in which they were presented with 3 distinct types of auditory stimuli, broadcast conspecific vocalizations (BCVs), proximal conspecific vocalizations (PRVs), and time-reversed vocalizations (TRVs; see Method). “Proximal” vocalizations are relatively low intensity vocalizations typically produced by individuals in close spatial proximity of conspecifics, and are seemingly directed toward these individuals. “Broadcast” vocalizations are much higher amplitude calls as compared with the PRV and are also produced by individuals in the presence of conspecifics, but appear to be directed to individuals or entities distal to the caller. We chose to divide the calls into these 2 broad categories rather than more descriptive classes based on detailed acoustic properties of the sounds (e.g., see Goodall 1986) given the fact that very little data are available on the vocal communicative repertoire of chimpanzees, particularly in reference to the neurobiological mechanisms that mediate these behaviors. It has been proposed that chimpanzee vocalizations are blended and graded among acoustically defined call categories, reflecting conflicting internal motivations (Parr et al. 2005). We hypothesized that PRVs and BCVs likely serve very different functions for the chimpanzees themselves. Therefore, this broad distinction provided a consistent, if not oversimplified, method for objectively and unambiguously classifying vocalizations into at least 2 hypothesized functional classes. TRVs were included to serve as a comparison for both the PRV and BCV conditions given that time-reversing the vocalizations preserves the spectral properties of the calls as well as their duration.
Subjects were 3 captive-born chimpanzees, 1 male and 2 females, between the ages of 20 and 23 years. All 3 subjects were reared in a nursery environment at the Yerkes National Primate Research Center (YNPRC) (Bard 1996). Currently, all 3 subjects live in small social groups (N = 2 for the male subject, N = 3 for the female subjects). All aspects of this study were conducted in accordance with ethical guidelines associated with the care and use of nonhuman primates and with the approval of the Emory University Institutional Animal Care and Use Committee.
Auditory stimuli used for all scanning sessions were composed of prerecorded vocalizations produced by chimpanzees at the YNPRC. Recordings were made with a digital video camcorder (Optura 600, Canon, Inc.) with a shotgun directional microphone (ME-66, K6 power module, Sennheiser Electronic Corporation) connected to the camera's auxiliary audio input via an XLR to 1/8” single channel cable. Digital video and audio were recorded simultaneously onto digital video cassettes (video: 30 frames per second; audio: 48 kHz, 16 bit precision). Comments were spoken directly onto the audio track of the digital videotape, and, whenever possible, written notes were made during recording. The date, time, frame, and audio sampling rate were also recorded to a separate digital track of the recording. All recordings were collected between 08:00 and 19:00. Caller identity was noted by the observer following cessation of vocalization or at the end of a vocalization bout. Vocalizations contaminated with any other sound or environmental noise were discarded.
Digital recordings were reviewed using a digital video camcorder (Optura 600, Canon, Inc.) connected to a MacBook Pro (Apple Inc., Cupertino, CA) to localize relevant vocal bouts. Once identified, digital audio and video were transferred to a MacBook Pro computer using the FireWire (IEEE-1394) output on the digital video camcorder (Optura 600, Canon, Inc.) and Final Cut Express HD software (Apple Computer, Inc.). The audio track of each unprocessed video segment was then extracted and saved as a self-contained file. Audio files were then opened and the vocal bouts isolated and saved to a separate file using signal analysis software (Raven v. 1.2.1; The Bioacoustics Research Program, Ithaca, NY).
For each scanning session, the subjects were presented with 40 vocal bouts that ranged in duration from approximately 2 to 38 s and totaled approximately 290 s. Each bout was separated from the next by 5 s of silence. The presentation of the 40 stimuli was repeated for the duration of the 40-min uptake period (~5 times). Each session included only prerecorded vocalization bouts from one of the 3 stimulus categories (BCV, PRV, and TRV). For the BCV and PRV conditions, the vocal stimuli presented to the subjects were novel. The stimuli consisted of prerecorded vocalizations from individuals at the YNPRC, but subjects were never presented with their own vocalizations.
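As a rough check on the timing described above, the following sketch estimates how many passes of the 40-bout playlist fit into the uptake period. It is an illustration only, not part of the original methods: the function name is hypothetical, and it assumes that a 5-s silence follows every bout, including the last bout before the playlist restarts.

```python
def playlist_repeats(total_bout_s, n_bouts, gap_s, uptake_min):
    """Estimate how many passes of the playlist fit in the uptake period.

    Assumption: one silent gap follows every bout, so a single pass lasts
    the summed bout duration plus n_bouts gaps.
    """
    pass_s = total_bout_s + n_bouts * gap_s
    return (uptake_min * 60) / pass_s


# Values taken from the text: ~290 s of sound, 40 bouts, 5-s gaps, 40-min uptake.
repeats = playlist_repeats(total_bout_s=290, n_bouts=40, gap_s=5, uptake_min=40)
print(round(repeats, 1))
```

Under these assumptions one pass lasts about 490 s, which is consistent with the approximately 5 repetitions reported.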
For each subject and each condition, approximately half of the vocal bout stimuli had never been used before, and half were used with all 3 of the subjects. Therefore, each subject was presented with a unique stimulus set. For the BCV condition, vocal stimuli included chimpanzee pant hoot vocalizations, both with and without the pant hoot climax component (Goodall 1986). For the PRV condition, vocal stimuli consisted of grunts and barks (including pant-grunts and pant-barks) as described by Goodall (1986). For the TRV condition, vocal stimuli consisted of half of the vocal bouts presented previously to each individual subject during the BCV and PRV conditions.
All 3 subjects participated in each of 3 passive-listening conditions. For each condition, only a single vocalization type was presented, and sessions were separated by a minimum of 5 months. Two of the subjects were presented with PRV followed by BCV and finally TRV. The remaining subject was presented with BCV, then PRV, and finally TRV. At the outset, it was not possible to determine if hearing the stimuli in reverse might have compromised the novelty of the sounds themselves. Therefore, the TRV condition was presented last for all 3 subjects given that the stimuli presented to the subjects needed to be novel (to prevent activation differences that might be related to familiarity).
For the duration of the passive-listening conditions, each subject was separated from their social group, but remained in the outside portion of their home enclosure. Prior to the administration of the ligand, an audio reference monitor (Alesis LLC, Cumberland, RI) was placed atop a wheeled utility cart and positioned approximately 2–3 m from the front of the subject's enclosure along with a closed cooler containing a cache of sugar-free flavored drink frozen cubes (1 quart plastic container containing 20–30 small frozen cubes of approximately 2 fluid ounces of sugar-free flavored drink mixture), a stool for the experimenter, and the training tool box that the chimpanzees were accustomed to seeing during prior intramuscular injection training sessions. The session began when the human experimenter approached the subject's home cage and offered them the 18F-fluorodeoxyglucose (18F-FDG) diluted in a small amount of sugar-free flavored drink mixture. The human experimenter then sat approximately 2–3 m from the front of the subject's home enclosure on a stool and remained visible to the subject for the duration of the 40-min uptake period. Beginning 2 min after the subject consumed the ligand and continuing at 4-min intervals thereafter, the experimenter would approach the subject's enclosure while calling their name, and offer the subject one frozen cube from the cooler.
Prior to scanning, chimpanzee subjects were acclimated to the testing situation and stimulus presentation apparatus and were trained using positive reinforcement techniques to present for an injection. Following passive-listening, subjects voluntarily presented for an intramuscular injection of an anesthetic agent and were transported to the PET imaging facility for image acquisition.
Subjects were administered 18F-FDG at a dose of 20 mCi. FDG was selected as the ligand because of its relatively long uptake period (~80 min) and long half-life (approximately 110 min). Thus, we capitalized on these features of 18F-FDG because they allowed for prolonged behavioral testing during the uptake period and a relatively long time frame to capture neural activity trapped in the cells between the termination of uptake and the interval of time needed to transport and scan the chimpanzees. Previous studies have used nearly identical procedures to scan other nonhuman primate species and have revealed significant and consistent patterns of PET activation (Martinez et al. 1997; Rilling et al. 2000; Kaufman et al. 2003; Rilling et al. 2007; Taglialatela et al. 2008).
Chimpanzees consumed 0.24 mL of 18F-FDG that was diluted in approximately 100 mL of a sugar-free flavored drink mixture. The subjects then participated in the passive-listening task for 40 min. Following the 40-min uptake period, chimpanzees were asked to voluntarily present for an intramuscular injection of Telazol (4 mg/kg). Once anesthetized, chimpanzees were transported to the PET imaging facility. For the duration of the PET scan, chimpanzees remained anesthetized with propofol diluted in lactated Ringer's solution and administered intravenously at a dose of ~10 mg/kg/h. After completing PET procedures, the subjects were returned to the YNPRC and temporarily housed in a single enclosure for approximately 18 h to allow the effects of the anesthesia to wear off and radioactivity to decay. Subjects were then returned to their social group.
The PET images were acquired on a High Resolution Research Tomograph (CPS HRRT; CTI/Siemens Inc., New York, NY) approximately 1 h and 35 min following ingestion of the 18F-FDG. Recall that 40 min constituted the uptake period; thus, the remaining 55 min comprised the injection of anesthesia, transport to the PET imaging facility, and the PET scan itself (approximately 30 min). Scan procedures were identical for all subjects. Chimpanzees fasted for approximately 5 h prior to 18F-FDG administration, and were rewarded with only minimal amounts of frozen sugar-free flavored drink cubes during the uptake period. Subjects were placed in the supine position inside the scanner. Six-minute transmission scans were followed by 20-min emission scans. Scan parameters were identical for all subjects: axial field of view (FOV) = 24 cm; transverse FOV = 31.2 cm; slice thickness = 1.22 mm. Transaxial spatial resolution (full-width half-maximum, FWHM) is 2.4 mm at the center and 2.8 mm at 10 cm from the center. Following scanning, a postreconstruction 2-mm smooth was applied to the images.
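To illustrate why the long half-life matters for this timeline, the sketch below computes the fraction of 18F activity that remains from physical decay alone at the approximate time of image acquisition. It is a back-of-the-envelope illustration, not the authors' calculation: the function name is hypothetical, the half-life is the approximate 110-min value given in the text, and biological uptake and clearance are ignored.

```python
def fraction_remaining(elapsed_min, half_life_min=110.0):
    """Fraction of radioactivity left after elapsed_min (physical decay only)."""
    return 0.5 ** (elapsed_min / half_life_min)


uptake_min = 40       # passive-listening period stated in the text
scan_delay_min = 95   # roughly 1 h 35 min after ingestion

at_end_of_uptake = fraction_remaining(uptake_min)
at_scan = fraction_remaining(scan_delay_min)
print(f"end of uptake: {at_end_of_uptake:.2f}, at scan: {at_scan:.2f}")
```

Under these assumptions, a little over half of the ingested activity remains when acquisition takes place.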
MR images were collected from each subject. One subject (the male subject) was imaged using a 3.0 Tesla scanner (Siemens Trio; Siemens Inc., Malvern, PA) at the YNPRC. T1-weighted images were collected using a 3D gradient echo sequence (pulse repetition = 2300 ms, echo time = 4.4 ms, number of signals averaged = 3, matrix size = 320 × 320). The 2 female subjects were imaged with a 1.5 Tesla scanner (Philips, Model 51, Bothell, WA). T1-weighted images were collected in the transverse plane using a gradient echo protocol (pulse repetition = 19.0 ms, echo time = 8.5 ms, number of signals averaged = 8, matrix size = 256 × 256). The archived MRI data were transferred to a PC running Analyze 8.0 (Mayo Clinic, Rochester, MN) software for postimage processing. MRI scans were then aligned in the axial plane and cut into 1-mm slices using Analyze 8.0.
The individual PET images were spatially aligned to their respective MR images using 3D voxel registration with a linear transformation (Analyze 8.0, Mayo Clinic). Once aligned, each subject's MRI was used to outline the brain on the PET image in each and every slice in the axial plane. An average PET activation was then calculated based on the registered activity within these slices. Once the mean activation for the whole brain had been computed, each voxel within that entire volume was divided by the mean activation in order to obtain a standardized PET image. Next, images were smoothed with a low pass filter with an isotropic 6-mm kernel. Two difference volumes were then calculated by subtracting each subject's normalized TRV volume from their normalized PRV and BCV volumes. For both scan types (PRV and BCV), the 3 difference volumes from all 3 subjects were spatially registered to one another, and 2 average PET volumes calculated. Whole-brain analysis was conducted for the 3 subjects collectively using Analyze 8.0 (Mayo Clinic). Significant areas of activation were identified by calculating t-map volumes and using a threshold value of t = 4.31 (P < 0.025; one-tailed test). Significant clusters were identified as 3 or more contiguous voxels on 3 or more consecutive 2-mm slices in the axial plane with intensity values ≥ 4.31 (i.e., 9 voxels total; 72 mm3).
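The core arithmetic of this pipeline (global-mean standardization, subtraction of the TRV baseline, and thresholding of a t value) can be sketched on toy data as follows. This is a minimal illustration rather than the Analyze 8.0 procedure: registration, smoothing, and the cluster-extent criterion are omitted, all function names and voxel values are hypothetical, and the use of a voxelwise one-sample t-test across the 3 subjects is an assumption, since the text does not specify how the t-maps were computed.

```python
import math
import statistics

T_THRESHOLD = 4.31  # threshold reported in the text (P < 0.025, one-tailed)


def normalize_to_global_mean(volume):
    """Divide every voxel by the whole-brain mean (volume is a flat list)."""
    mean = sum(volume) / len(volume)
    return [v / mean for v in volume]


def difference(condition, baseline):
    """Voxelwise condition minus baseline."""
    return [c - b for c, b in zip(condition, baseline)]


def one_sample_t(values):
    """One-sample t statistic against zero (assumed test; df = n - 1)."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


# Toy 4-voxel "volumes" for one subject (made-up numbers).
prv = [2.0, 4.0, 6.0, 8.0]
trv = [1.0, 1.0, 1.0, 1.0]
diff = difference(normalize_to_global_mean(prv), normalize_to_global_mean(trv))

# Toy difference values at a single voxel across 3 subjects (made-up numbers).
consistent_voxel = [0.50, 0.60, 0.55]
variable_voxel = [0.10, 0.90, 0.20]
print(one_sample_t(consistent_voxel) >= T_THRESHOLD)  # consistent effect
print(one_sample_t(variable_voxel) >= T_THRESHOLD)    # inconsistent effect
```

With only 3 subjects such a test has 2 degrees of freedom, so a voxel passes the threshold only when the difference is highly consistent across individuals.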
Whole-brain analyses revealed significantly greater activation in the BCV and PRV conditions compared with the TRV in a number of brain regions (see Table 1 and Fig. 1), including the cingulate gyrus, right STG, and the cerebellum. In addition, the PRV > TRV comparison revealed significant activation in the right posterior temporal lobe—including the planum temporale (PT), and bilaterally in the superior parietal gyrus. Although the BCV > TRV comparison did not reveal significant activation in the posterior temporal lobe or superior parietal areas, significant areas of activation unique to this comparison included the caudate/putamen, angular gyrus, precentral gyrus, and the left thalamus (see Table 1).
To determine if the observed activation in the right PT and other areas of the posterior temporal lobe was significantly lateralized, the average comparison volume (PRV > TRV) from all 3 subjects was flipped on the left–right axis and subtracted from the correctly oriented volume. Significant lateralized areas of activation (t ≥ 4.31) are depicted in Figure 2 overlaid on the MR image of a representative chimpanzee brain. A comparison of these images with the (PRV > TRV) activations depicted in Figure 1 indicates that the results of the lateralization analysis are largely consistent with the whole-brain data. Specifically, significantly right-lateralized activation is evident in the posterior temporal lobe, including the PT.
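The flip-and-subtract step can be sketched on a toy axial slice as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names and values are hypothetical, each row is assumed to run from one side of the brain to the other, and which end corresponds to the right hemisphere depends on the image display convention.

```python
def flip_left_right(slice_2d):
    """Mirror an axial slice across the midline by reversing each row."""
    return [list(reversed(row)) for row in slice_2d]


def lateralization_map(slice_2d):
    """Original slice minus its mirror image, voxel by voxel."""
    flipped = flip_left_right(slice_2d)
    return [
        [a - b for a, b in zip(row, flipped_row)]
        for row, flipped_row in zip(slice_2d, flipped)
    ]


# Toy slice (made-up values) with stronger signal at the high-index side.
toy_slice = [
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
]
lat = lateralization_map(toy_slice)
print(lat[0])
```

The resulting map is antisymmetric about the midline: a positive value on one side is mirrored by an equal negative value on the other, and midline voxels are zero, so only the sign on the side of interest needs to be interpreted.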
Figure 3 depicts standardized individual comparison PET volumes (PRV > TRV) for all 3 subjects overlaid on the MR image of a representative chimpanzee brain. As evident from the images, individual data are consistent with those data depicted in Figure 1. For reference, Figure 4 depicts individual PET volumes for all 3 subjects in the TRV condition.
These results provide the first evidence of the neural correlates of auditory perception in chimpanzees. Two important findings emerge from these data. First, right-lateralized activity is observed in the posterior temporal lobe including the PT when chimpanzees are presented with PRV as compared with TRV. However, similar lateralized activity is not observed during passive listening to BCV. These results suggest that a functional distinction may exist between calls classified broadly as BCV and PRV. Second, although some consistencies are evident between the results reported here and those published previously from monkey species (Poremba et al. 2004; Gil-da-Costa et al. 2006), important differences are apparent. As has been reported for monkeys, right-lateralized activity is observed in the chimpanzee STG in response to conspecific vocalizations. However, Poremba et al. (2004) reported right-lateralized activity in posterior regions of the STG nonspecifically in response to auditory stimuli, and left-lateralized activity in the temporal pole in response to conspecific vocalizations. Furthermore, Gil-da-Costa et al. (2006) reported significant activation in response to conspecific vocalizations in monkey temporoparietal, posterior parietal, and ventral premotor cortex, but did not observe any lateralized activation, even in the left temporal pole as reported previously by Poremba et al. (2004). Most recently, Petkov et al. (2008) identified a region of auditory cortex in the macaque brain that is selectively active during the perception of species-specific vocalizations. However, this region was located in the anterior temporal lobe. When compared with the data presented in this report, these results suggest that there may be marked differences in the way in which chimpanzees and macaque monkeys perceive and process conspecific vocalizations.
In contrast to monkey species, chimpanzees possess a left-lateralized anatomical homologue to the human Wernicke's area (Gannon et al. 1998; Hopkins et al. 1998). Given this leftward structural asymmetry, the rightward functional asymmetry reported here seems somewhat paradoxical. However, in humans, laterality for spoken word comprehension and PT asymmetry are not significantly associated (Eckert et al. 2006). Functional lateralization for vocal perception in chimpanzees may be similarly independent of the anatomical PT asymmetry, at least for the vocalization types used in this study.
It is important to note that all 3 subjects in this study were presented with the baseline condition (TRV) last. This was necessary because the stimuli presented to the subjects needed to be novel to prevent activation differences related to familiarity, and it is possible that even hearing the stimuli in reverse might have compromised the novelty of the sounds themselves. The presentation order was counterbalanced for the other 2 stimulus types. Potential order effects are unlikely to explain our results for 2 reasons. First, a minimum of 5 months elapsed between scanning sessions; thus, the subjects were familiar and comfortable with the paradigm, but not habituated to it. Second, given that a whole-brain approach was used to analyze these data, and that each individual scan in each condition was standardized to its respective mean metabolic activity, order effects or contamination by other extraneous factors are unlikely.
A compelling model concerning the lateralization of language functions in the human brain proposes that temporal information is processed predominantly in the left hemisphere, whereas spectral information is handled by the right (Zatorre and Belin 2001; Zatorre et al. 2002). Recently, both hemispheres have been shown to be highly sensitive to temporal information, but on different time scales (Boemio et al. 2005). Thus, the picture that emerges is one in which speech signals are represented bilaterally at the primary level, and are lateralized in higher-order regions based on the temporal information (relatively long or short) that is extracted by the relevant hemisphere (Poeppel 2003; Boemio et al. 2005). Speech utterances certainly possess information on more than a single time scale (e.g., phonemes vs. prosody), so this model plausibly accounts for both observations of differential lateralization patterns associated with verbal vs. emotional components of language, as well as for differences in lateralized activation patterns as a function of attention (Buchanan et al. 2000; Grimshaw et al. 2003; Hugdahl et al. 2003).
This model may similarly help to explain the differences in functional lateralization reported here for BCV and PRV in chimpanzees. Group level systematic structural variation has been reported in the calls of chimpanzees in the wild as well as in captivity (Arcadi 1996; Marshall et al. 1999; Crockford et al. 2004). For example, Crockford et al. (2004) report structural differences in the pant hoot vocalizations of male chimpanzees living in neighboring communities, but not between groups from a distant community. These results could not be accounted for by genetic or habitat differences, suggesting that the male chimpanzees may be actively modifying the structure of their calls to facilitate group identification (Crockford et al. 2004). Systematic variation in vocal structure has also been observed in chimpanzees with regard to social situation and communicative context (Crockford and Boesch 2003; Taglialatela et al. 2003; Hopkins et al. 2007). These data indicate that there are a number of different types of potentially meaningful information that can be encoded within a vocalization (e.g., caller identity, group affiliation, social situation, communicative context, etc.), and that not all calls in a species’ repertoire are functionally equivalent. Therefore, the right-lateralized posterior temporal lobe activity reported here following the presentation of PRV, but not BCV, may be related to the type of information contained within the calls themselves and consequently extracted by the listener. It is tempting to interpret this right-lateralized pattern of activation in the posterior temporal lobe and PT as a reflection of the neurological processing of emotion, as nonhuman primate signals are commonly accepted to be merely reflections of the affective state of the caller. However, if this is the case, it is puzzling as to why right-lateralized activity in the posterior temporal lobe is only associated with passive listening to PRV and not BCV.
If chimpanzee vocalizations are purely emotive signals, one would expect to observe similar neural activity following the presentation of all call types. As the data described above clearly indicate, this is not the case.
It is also possible that the observed differences in activity following the presentation of BCV and PRV are not due to functional differences in the calls themselves at all, but rather are mediated by differences in the acoustic properties of the sounds. Although additional data are needed to address this issue directly, we would argue that this is unlikely because neuronal metabolic activity associated with passive listening to BCV and PRV were both compared with activity in the TRV condition. Time-reversed calls are, by their nature, matched to the acoustic properties of the stimuli. Therefore, the observed differences in neuronal metabolic activity most likely reflect differences associated with the processing of the different call types in the chimpanzee brain. These results suggest that not all vocalizations are functionally equivalent, and point to a heretofore unrecognized level of complexity in the chimpanzee vocal repertoire. Future studies will focus on examining this hypothesized complexity directly, as well as the basis for the functional lateralization reported here. Notwithstanding, these results provide the first evidence of the neural correlates of vocal perception in chimpanzees as well as novel insights into the origins of hemispheric specialization in humans.
National Institutes of Health (F32DC007823 to J.P.T.; NS-36605 and NS-42867 to W.D.H.; RR-00165 to the YNPRC).
We would like to thank E. Strobert, J. Orkin, F. Stroud, K. Paul, D. Bonenberger, S. Weissman, and the veterinary and animal care staff of the YNPRC for their support; and M. Jones, D. Votaw, and J. Votaw for their technical assistance with PET imaging.
Conflict of Interest: None declared.