Although ears capable of detecting airborne sound have arisen repeatedly and independently in different species, most animals that are capable of hearing have a pair of ears. We review the advantages that arise from having two ears and discuss recent research on the similarities and differences in the binaural processing strategies adopted by birds and mammals. We also ask how these different adaptations for binaural and spatial hearing might inform and inspire the development of techniques for future auditory prosthetic devices.
The Hindu mother goddess Durga has eight hands. At times, this must earn her the envy of ordinary, mortal human mothers, who must confront the hundredfold challenges of motherhood with only two hands each. The considerable advantages that could spring from extra sets of hands are easy to imagine. And who has not occasionally wished for extra eyes, in the back of the head? Or what about ears? Is two ears a good number? Would one be enough, or should we really have more? It seems unlikely that evolution has independently ‘fine-tuned’ the number not just of hands but also of hind limbs, lungs, kidneys, gonads, eyes and ears, and found in each case that animals with two of each of these organs were invariably better adapted to their environments than those with one or three. That we, like most animals, have two ears may be due more to embryological constraints related to a bilaterally symmetric body plan than to evolutionary selection pressures. Whether two constitutes an ‘optimal’ number of ears, either for modern humans or for the many other binaural animal species in their diverse ecological niches, is uncertain.
It may seem peculiar to ask how many ears a person ideally should have. However, the advent of cochlear implant technology is turning this seemingly odd question into a point of serious and controversial debate of considerable practical importance. Possessing more than one functioning ear can certainly bring significant advantages. For example, binaural hearing greatly improves our ability to determine the direction of a sound source1. Without binaural cues, we must rely solely on monaural ‘spectral cues’ provided by the directional filtering of sounds by our outer ears2 to judge the direction of a sound source. Relying only on spectral cues results in much-reduced localization ability, whereas combining spectral and binaural cues results in remarkably accurate sound localization3. Binaural information can also improve our ability to separate sound signals from ambient background noise, a phenomenon called ‘binaural unmasking’4. Consequently, people with just a single cochlear implant often have great difficulty understanding speech in noisy, acoustically cluttered environments, whereas bilaterally implanted individuals may do better at these challenging acoustic tasks5,6. But cochlear implantation is an expensive procedure and not without risk. It is not obvious that the potential binaural advantages that might accrue from fitting two cochlear implants, rather than just one, warrant doubling the costs and the risks for each person, particularly because current cochlear implants are not optimized for binaural hearing.
The problem is that the acoustic cues that need to be exploited to reap binaural advantages are often very subtle. For example, we, like many other vertebrates, use tiny differences in the time of arrival or the intensity of sounds at each ear to help us determine sound source direction. The sound will arrive slightly earlier, and be slightly louder, in the near ear—the emphasis here is on “slightly,” as natural interaural time differences (ITDs), for example, usually amount to only a small fraction of a millisecond. We are also very sensitive to changes in the correlation of inputs to the left and right ears, a prerequisite for binaural unmasking7. To process these minimal binaural cues, our ancestors evolved sensitive tympanic ears and highly specialized auditory brainstem circuits (Fig. 1). It seems that such tympanic ears capable of receiving airborne sound evolved separately and repeatedly among the ancestors of modern frogs, turtles, lizards, birds and mammals8–10. Their ancestors, the earliest land-dwelling vertebrates, were probably sensitive to bone conduction and sound waves traveling through the ground. Thus, each of these tetrapod groups constitutes an independent ‘evolutionary experiment in hearing’. Some notably similar principles have emerged, presumably due to similar selection pressures for localizing and identifying auditory targets.
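To give a sense of how small these timing cues are, the following minimal sketch estimates the ITD for a distant source using the classic spherical-head (Woodworth) approximation. The head radius, the speed of sound and the function name are illustrative assumptions, not values taken from this article.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly 20 degrees C
HEAD_RADIUS_M = 0.0875       # assumed: typical adult human head radius


def woodworth_itd(azimuth_deg, radius_m=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Approximate ITD in seconds for a distant sound source.

    Spherical-head approximation: ITD = (r / c) * (theta + sin(theta)),
    with theta the azimuth in radians (0 = straight ahead, 90 = to one side).
    """
    theta = math.radians(azimuth_deg)
    return (radius_m / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 15, 45, 90):
        itd_us = woodworth_itd(az) * 1e6
        print(f"azimuth {az:3d} deg -> ITD {itd_us:6.1f} microseconds")
```

Under these assumptions, the largest ITD, for a source directly to one side, comes out at roughly 0.66 ms, and sources near the midline produce ITDs of only tens of microseconds.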
In both birds and mammals, early stages of the auditory pathways contain synaptic relays designed to preserve the temporal fine structure of incoming acoustic signals with great accuracy, and neural processing stages that compare inputs from the left and right ears arise early, immediately after the first synaptic relay in the cochlear nucleus. Both groups also have nuclei specialized for either the computation of ITDs or interaural level differences (ILDs). ILD-sensitive neurons are excited by input from one ear and inhibited by the other. These ‘EI neurons’ respond most strongly when sounds come from the side of the excitatory ear. In that case, the excitatory ear receives the full sound intensity, whereas the inhibitory ear sits in a ‘sound shadow’ on the far side of the head, so the resulting inhibition is small. If the sound source moves closer to the inhibitory ear, the neural firing rates decline because of greater inhibition, resulting in a rate code for sound source position. This ILD sensitivity arises in the lateral superior olive (LSO) in mammals and in IE neurons in the avian nucleus of the lateral lemniscus10. However, the head provides a significant sound shadow only if it is large compared to the wavelength of the sound, so ILD cues are most effective at relatively high frequencies; neurons tuned to high frequencies are overrepresented in the ILD-sensitive nuclei.
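The rate code described above can be illustrated with a toy EI neuron whose firing rate is a sigmoidal function of the level difference between the excitatory and the inhibitory ear. This is only a sketch: the sigmoid shape, the parameter values and the function name are assumptions chosen for illustration, not a fitted model of LSO neurons.

```python
import math


def ei_firing_rate(excitatory_level_db, inhibitory_level_db,
                   max_rate=100.0, slope_db=5.0):
    """Toy EI neuron: firing rate (spikes/s) as a function of ILD.

    The ILD is the level at the excitatory ear minus the level at the
    inhibitory ear. max_rate and slope_db are illustrative assumptions.
    """
    ild_db = excitatory_level_db - inhibitory_level_db
    return max_rate / (1.0 + math.exp(-ild_db / slope_db))


if __name__ == "__main__":
    # Source moving from the excitatory side towards the inhibitory side.
    for exc, inh in ((60, 40), (60, 50), (60, 60), (50, 60), (40, 60)):
        print(f"ILD {exc - inh:+3d} dB -> {ei_firing_rate(exc, inh):5.1f} spikes/s")
```

In this sketch the firing rate falls monotonically as the source moves towards the inhibitory ear, so a single neuron's rate carries information about source position.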
In contrast, most neurons that are sensitive to ITDs are excited by input from both ears (‘EE neurons’; see Fig. 2a). The strength of the excitation depends on the exact relative timing of the inputs. These neurons, found in the mammalian medial superior olive (MSO) or the avian nucleus laminaris, were classically thought to be organized in a ‘delay line and coincidence detector’ arrangement, known as the ‘Jeffress model’11 (see below). The model posits that individual neurons fire in response to precisely synchronized excitation from both ears, and systematically varied axonal conduction delays along the length of the nucleus serve to offset ITDs, so that each neuron is ‘tuned’ to a best ITD value that cancels the signal delays from the left and right ear (Fig. 3a,b). The Jeffress model has been particularly influential, partly because initial experimental evidence from birds provided strong support for the existence of such a delay line arrangement12,13, but also because many researchers find the manner in which this simple scheme turns systematic variations in ITD into a topographic map of sound source location very elegant and appealing. However, although it is widely thought that the Jeffress model is a good description of the avian ITD processing pathway, its relevance to the mammalian system has increasingly been questioned.
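The delay-line-and-coincidence-detector idea can be sketched in a few lines. In the toy version below, each detector delays the left-ear input by a different number of samples and multiplies it with the right-ear input; the detector whose internal delay cancels the ITD responds most strongly. The noise stimulus, the use of a summed product as a stand-in for spike coincidences, the restriction to one-sided delays and all names are assumptions made for illustration.

```python
import random


def jeffress_best_delay(itd_samples, n_samples=2000, max_delay=20, seed=0):
    """Return the internal delay (in samples) of the most active detector.

    The right-ear signal is the left-ear signal delayed by `itd_samples`
    (the sound reaches the left ear first). Detector d delays the left-ear
    input by d samples and multiplies it with the right-ear input; the
    summed product serves as its coincidence count.
    """
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Circular shifts keep the sketch short; real signals would need padding.
    left = source
    right = [source[(n - itd_samples) % n_samples] for n in range(n_samples)]

    responses = []
    for d in range(max_delay + 1):
        coincidence = sum(
            left[(n - d) % n_samples] * right[n] for n in range(n_samples)
        )
        responses.append(coincidence)
    # The position of the most active detector encodes the ITD (place code).
    return max(range(max_delay + 1), key=lambda d: responses[d])


if __name__ == "__main__":
    for itd in (0, 3, 7, 15):
        print(f"ITD of {itd:2d} samples -> best detector delay {jeffress_best_delay(itd)}")
```

Because the identity of the most active detector, rather than its firing rate, signals the ITD, an array of this kind turns timing differences into a topographic map.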
For starters, anatomical evidence for systematic delay lines in mammals is not definitive14,15. Of course, the internal delays would not necessarily have to be set up through axonal conduction delay lines, and one alternative hypothesis is that the delays might actually be of cochlear origin16. Hearing begins when the cochlea mechanically filters incoming sounds to separate out various frequency components. The mechanical filters that transduce sound into neural signals cannot respond infinitely fast, and they are said to be subject to small ‘group delays’. The group delays for low sound frequencies are somewhat larger than those for higher frequencies. Thus, if a signal from a higher-frequency neuron in the left ear arrives at an EE neuron at exactly the same time as a low frequency input from the right ear, then this would indicate that the sound came from the right, so that the extra time taken by the sound traveling to the farther (left) ear was offset by the larger group delay in the right cochlea.
However, the implementation of delay lines (axonal or cochlear) does not change the fundamental nature of Jeffress’s delay-line-and-coincidence-detector model. A more crucial question is how neurons achieve coincidence detection at the phenomenally fine temporal resolution that is required to account for behaviorally measured ITD detection thresholds. Both birds and mammals can detect ITDs as small as a few microseconds. MSO and nucleus laminaris neurons have similar anatomical and biophysical specializations, such as stereotypical bipolar dendrites, with inputs from each ear segregated onto each set of dendrites, allowing nonlinear integration between the inputs from left and right ears17. These neurons have a high density of low voltage–activated potassium channels, which speed up their synaptic dynamics, yielding excitatory postsynaptic potentials that are typically around 400 µs wide at half amplitude18,19.
Coincidence detectors seem to work by ‘cross-correlating’ sinusoidal synaptic conductances, which mirror the stimulus waveform, as seen through the ‘narrow-band filters’ that provide the input to the MSO. MSO neurons receive band-pass-filtered input that is relayed from the cochlea through the cochlear nuclei. The band-pass filtering makes them sensitive only to frequencies close to their own characteristic frequency. In other words, all sounds ‘look’ to them more or less like a sine wave at their own characteristic frequency20 (see Fig. 2b). Models suggest that when these sinusoidal inputs from each ear arrive in phase, they interfere constructively, and the binaural conductance sum becomes maximal, but when they arrive totally out of phase (worst ITD), they interfere destructively. This probably explains why ITD tuning curves measured in the auditory brainstem and midbrain have a cosine-like shape, with a period that depends on the neuron’s characteristic frequency, since this tuning curve arises as a sum of roughly sinusoidal inputs with frequencies close to the neuron’s characteristic frequency. The range of ITDs spanned between each neuron’s most and least preferred ITD value is consequently always approximately equal to half the period of the neuron’s best frequency, and it is more appropriate to think of MSO neurons as sensitive to interaural phase differences rather than to ITDs.
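The cosine-like shape of the tuning curve follows directly from correlating two sinusoids of the same frequency. The sketch below averages the product of two sinusoidal inputs over one period, which is a simplified stand-in for the binaural interaction described above; the function name and the 500 Hz example frequency are assumptions.

```python
import math


def binaural_response(itd_s, cf_hz, n_steps=1000):
    """Mean product of two sinusoidal inputs at the characteristic frequency.

    The right input lags the left by `itd_s`. Averaged over one full period,
    the result equals 0.5 * cos(2 * pi * cf_hz * itd_s).
    """
    period = 1.0 / cf_hz
    total = 0.0
    for k in range(n_steps):
        t = k * period / n_steps
        left = math.sin(2 * math.pi * cf_hz * t)
        right = math.sin(2 * math.pi * cf_hz * (t - itd_s))
        total += left * right
    return total / n_steps


if __name__ == "__main__":
    cf = 500.0  # assumed characteristic frequency in Hz (period 2 ms)
    for itd_us in (0, 250, 500, 750, 1000):
        r = binaural_response(itd_us * 1e-6, cf)
        print(f"ITD {itd_us:4d} microseconds -> response {r:+.3f}")
```

For the assumed 500 Hz input, the response is maximal at an ITD of zero and minimal at 1 ms, half the 2 ms period, which is the half-period spacing between most and least preferred ITD noted in the text.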
A recent modeling study21 illustrated that this interdependence between the shape of the ITD tuning curve and a neuron’s frequency tuning is problematic. Jeffress envisaged arrays of ITD detectors for each frequency band, each tuned to a different preferred ITD, so that the whole array could implement a sort of ‘labeled line’ population code22. For most low-frequency neurons, however, the ITD tuning curves are too broad and their peaks too blunt to make such an arrangement efficient (Fig. 3). Consequently, from an ‘optimal coding’ perspective, the peaks of the ITD tuning curves may be less relevant, and what matters is that the steepest slopes of the ITD tuning curves cover the animal’s behaviorally relevant ITD range21,23. Steep slopes mean that a small change in the stimulus causes a relatively large, easily detectable change in the neuron’s response. But tuning curves cannot be infinitely ‘tall’, and if ITD tuning curves are very steep over some part of the possible range of ITDs, then other parts of that range may have to fall on the less steep and hence less informative ‘plateaus’. Thus, a Jeffress-like arrangement, with systematically spread out tuning curve peaks, becomes computationally efficient when ITD tuning curves are so sharp and narrow that their slopes can no longer cover the range of ITDs that an animal experiences. This would be the case for the barn owl, which has ITD-tuned neurons with characteristic frequencies as high as 9 kHz and widely separated ears, and therefore large maximal ITDs, but not for the gerbil, for which ITD-sensitive neurons in the MSO rarely have characteristic frequencies greater than 2 kHz and the separation between the ears is much smaller.
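The trade-off between peaks and slopes can be made concrete with an idealized cosine tuning curve. The sketch below computes the slope of such a curve and the best ITD that would place its steepest part at the midline (ITD = 0). The cosine form, the normalization and the function names are assumptions for illustration; this is not the analysis of Harper et al.

```python
import math


def tuning_slope(itd_s, cf_hz, best_itd_s):
    """Slope (response change per second of ITD) of the idealized curve
    R(itd) = 0.5 * (1 + cos(2 * pi * cf_hz * (itd - best_itd_s)))."""
    phase = 2 * math.pi * cf_hz * (itd_s - best_itd_s)
    return -math.pi * cf_hz * math.sin(phase)


def best_itd_for_steepest_slope_at_midline(cf_hz):
    """Best ITD that puts the steepest part of the cosine at ITD = 0,
    that is, a quarter of a period away from the peak."""
    return 1.0 / (4.0 * cf_hz)


if __name__ == "__main__":
    for cf in (500.0, 2000.0, 9000.0):
        best = best_itd_for_steepest_slope_at_midline(cf)
        print(f"CF {cf:6.0f} Hz -> best ITD {best * 1e6:6.1f} microseconds, "
              f"slope at midline {tuning_slope(0.0, cf, best):8.1f} per second")
```

Under this idealization, a 500 Hz neuron would need its peak at 500 µs to place its steepest slope at the midline, whereas a 9 kHz neuron would need a peak at only about 28 µs. A best ITD that scales inversely with characteristic frequency is what a slope-based code would predict.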
Given these differences, it is an open question whether optimally efficient coding is a substantial constraint on the evolution of neural circuits. Natural selection often produces local maxima, and solutions need not be optimal as long as they are good enough24,25. In the auditory system, the narrowest information bottleneck presumably occurs in the auditory nerve. Thereafter, diverging connections in the ascending auditory pathway could mean that ever greater numbers of neurons are available to encode finite information. This would create some redundancy and reduce the necessity to make neural coding at subsequent stages optimally efficient. Nevertheless, the ‘optimality’ arguments put forward by Harper et al.21 provide a plausible, even elegant, explanation for recent experimental findings in rodents, wherein the peaks in ITD tuning curves were often found to lie outside the animal’s physiological range and tended to depend systematically on the neuron’s characteristic frequency26,27 (Fig. 3a). These observations do not fit the classic Jeffress model, in which coincidence detectors are organized to form a place map. To form such a topographic map requires ITD detectors in each frequency band to be tuned to the full range of physiological ITDs, and their tuning should therefore not depend on characteristic frequency. The new data thus argue for a population rate code rather than a map.
Are the two models for encoding ITDs irreconcilable? Not entirely: both depend on coincidence detection and convey ITDs to the midbrain through the distribution of firing rates across the population of neurons. Barn owls seem to use the information in both the peaks and the slopes of the tuning curves28, and theoretical analyses suggest that the two codes are not mutually exclusive29. Whether peaks or slopes of the tuning curves are better suited to representing ITDs from an information theoretic perspective may depend on the factors that constrain the shape of the tuning curves21,29. Animals also need not adopt the same optimal solution. For example, chickens and gerbils have similar head sizes and similar abilities to encode temporal information, and both use ITDs at relatively low frequencies (Fig. 3). The constraints on the codes should therefore be similar. But chickens have a place map of ITD in the nucleus laminaris25 (Fig. 3b), fitting Jeffress’s model closely, whereas gerbils may not.
The anatomical organization of the MSO is also more complicated than required by the basic Jeffress model. MSO neurons receive excitatory inputs from each ear, but they also receive inhibition that is precisely time-locked to the incoming auditory signals30. This inhibition is important for shaping ITD tuning curves in the MSO. Brand and colleagues27, for example, recorded responses in gerbil MSO before and after pharmacological suppression of inhibitory inputs and noted substantial changes in the shape of the ITD tuning curves, including shifts of the peak in the curve by several hundred microseconds. Several modeling studies30–32 have since explored how the interplay of excitation and inhibition might produce these shifts in the ITD tuning curve. Joris and Yin33 have recently argued against such a central role for inhibition in ITD processing. Consideration of the data published by Brand and colleagues27 led them to conclude that inhibition in the MSO might mostly affect the early ‘onset’ but not the sustained response. This conclusion may, however, be premature. Many important sound signals (for example, footsteps) are highly transient in nature and contain little or no sustained sound, meaning that the onset is arguably the most important part of the neural response34. Moreover, data recently published by Pecka et al.35 indicate that the effect of glycinergic inhibition on the shape of ITD tuning curves can persist during the sustained part of the response in the MSO. Thus, synaptic dynamics based on the interplay of precisely timed excitation and inhibition remain a credible additional or alternative mechanism to either axonal or cochlear delays in the inputs to the mammalian MSO (Fig. 3c,d).
Perhaps we are so attached to the systematic axonal delay line arrangement in Jeffress’s model because it alone automatically leads to a topographic, ‘space-mapped’ representation of best ITDs. In the barn owl, where the evidence for the implementation of a Jeffress model in the nucleus laminaris is strongest, the resulting topographic map of ITDs is passed on and maintained in subsequent processing stations, such as the central and external nuclei of the inferior colliculus and the optic tectum, where auditory and visual information is combined to direct the animal’s direction of gaze22. Mammals too have a topographic mapping of auditory space in the superior colliculus (the mammalian homolog of the optic tectum), but the evidence for place coding in the mammalian MSO looks increasingly weak, and there is no topographic space map in the central nucleus of the inferior colliculus. Rather, the mammalian auditory space map in the superior colliculus emerges gradually, under visual guidance36, as the information passes from the central nucleus of the inferior colliculus by way of the nucleus of the brachium to the superior colliculus37. Furthermore, the mammalian map in the superior colliculus is, as far as we know, based mostly on ILDs and monaural spatial cues, not ITDs38, and no topographic arrangement of spatial tuning or ITD sensitivity has ever been found in the areas of mammalian cortex thought to be involved in the perception of sound source location. The generation of a topographic map of ITD is considered by some to be a “defining feature” of Jeffress’s model33, and although it serves a clear function in the barn owl brain (where this topography is brought into register with a retinotopic representation of visual space), it is unclear whether such an ITD map exists in the mammalian brainstem or what purpose it would serve. 
Why would the mammalian auditory pathway establish a topographic representation of ITD in the MSO, only to abandon it again at the next stage of the ascending pathway?
The strategies for encoding ITDs may differ somewhat between birds and mammals because different species combine different binaural cues in different ways. For example, barn owls have highly asymmetric external ears and can use ILDs to detect sound source elevation22. They then combine ITD and ILD information to create neurons sharply tuned for location in both azimuth (sound source direction in the horizontal plane) and elevation (direction in the vertical plane)39. This is very different from mammals, which typically use both ITDs and ILDs as cues to sound source azimuth but tend to rely on ITDs mostly for low-frequency sound and on ILDs for high frequencies40. Sounds generate large ILDs only when the wavelength of the sound is small compared to the head diameter, which limits the usefulness of ILDs at low frequencies. ITDs are in principle present at all frequencies, but limitations in the ability of neurons to represent and process very rapid changes in the sound (so-called phase-locking limits and phase ambiguity) make it difficult for the brain to exploit ITDs in high-frequency sound. Therefore, our judgment of the azimuthal location of a sound source is dominated by ITDs at low frequencies and by ILDs at high frequencies. However, there is a considerable ‘overlap’, where information from both cues is combined. Presumably, this cue combination occurs as ITD and ILD information streams converge, and it would be very straightforward if ITD and ILD were both encoded using a similar population rate-coding scheme41.
The noteworthy similarities and the differences between the avian and the mammalian ITD processing pathways clearly suggest that there are many ways of localizing sounds and separating sources from background noise; these different strategies can provide inspiration for the development of new technologies such as cochlear implants. Current cochlear implant speech processors simply encode the spectral profile of incoming sounds in a train of amplitude-modulated electric pulses. In binaural implants, present-day processors work independently, so that the timing of pulses is not synchronized or coordinated between the ears, and much of the fine-grained temporal structure required for effective ITD processing is not preserved. Also, the range of different intensities that can be delivered through cochlear implants (the ‘dynamic range’) is very limited compared to natural hearing, which may affect the delivery of ILD cues. The first technical challenge will therefore be to try to overcome these limitations on the delivery of ‘natural’ ILD and ITD information, already an active research area42.
However, in the longer term, it may be beneficial to try to incorporate cues that are ‘supranatural’, at least for humans. Humans normally rely mostly on ITDs for low-frequency sounds, but low frequency–sensitive neurons are found near the apex in the mammalian cochlea, far from the round window, which makes them very difficult to access with current cochlear implant techniques. Unlike humans, barn owls can extract valuable ITD information for frequencies up to 9 kHz, and they overcome the phase ambiguity that arises when ITDs may be larger than the period of the sound wave by integrating information across frequency channels22. In principle, sophisticated bionic devices could similarly extract both ITD and ILD information at high precision over a very wide frequency range and recode it in the manner appropriate for each individual. For example, in cases where cochlear implantation failed to restore low-frequency hearing, this might involve merely translating ITD cues into enhanced ILDs, but more sophisticated approaches could also be developed.
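As a purely hypothetical illustration of what translating ITD cues into enhanced ILDs might mean, the sketch below maps a measured ITD linearly onto a level difference and clips the result to a limited range. The linear mapping, the gain, the clipping limit and the function name are all assumptions; no existing implant processor is being described.

```python
def itd_to_enhanced_ild(itd_us, gain_db_per_us=0.03, max_ild_db=20.0):
    """Map an ITD to an ILD in dB, clipped to +/- max_ild_db.

    itd_us > 0 means the left ear leads; a positive result means the left
    channel would be made louder. Gain and limit are hypothetical values.
    """
    ild_db = gain_db_per_us * itd_us
    return max(-max_ild_db, min(max_ild_db, ild_db))


if __name__ == "__main__":
    for itd in (-700, -300, 0, 300, 700):
        print(f"ITD {itd:+5d} microseconds -> ILD {itd_to_enhanced_ild(itd):+5.1f} dB")
```

A real device would also have to respect the limited dynamic range of electrical stimulation mentioned above, which is why the sketch clips the output.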
In an age when many personal stereo systems already pack powerful microprocessors, future cochlear implant processors and hearing aids could become more sophisticated and incorporate various spatial filtering and preprocessing techniques, not necessarily modeled on designs normally found in mammals. Future designs could incorporate pressure gradient receivers, as used by some insects43 and by some terrestrial vertebrates (lizards, frogs and some birds; Fig. 1). The ears of these animals are inherently directional because they are acoustically connected by a continuous airspace between the eardrums, either through the mouth cavity or through interaural canals8. This acoustical connection allows sound to reach both sides of the eardrum, which is then driven by the pressure difference between the external and internal sounds. Pressure gradient receiver ears have highly directional eardrum motion, provided the interaural coupling is strong enough. These ears perform so well that they raise the question of why mammals and some hearing specialists such as owls have independent ears at all. Several explanations have been advanced, for example that increased breathing rates interfere with tympanum motion when ears are coupled through the mouth, and that even the best pressure gradient receivers have nulls; that is, they may become insensitive to certain frequencies if sound waves arising from either side interfere destructively44. Nevertheless, given that this design of a directional receiver is also used in insect hearing, it is clearly amenable to miniaturization, and potentially, it could be incorporated in directional receivers for auditory prostheses.
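The directionality of a pressure gradient receiver can be illustrated with a minimal two-input model for a pure tone, in which each eardrum is driven by its own external pressure minus the pressure transmitted internally from the opposite ear. The lossless coupling, the internal delay, the example frequency and the function name are assumptions for illustration, not measurements from any animal.

```python
import cmath
import math


def eardrum_amplitudes(itd_s, freq_hz, internal_delay_s, coupling_gain=1.0):
    """Return (left, right) eardrum drive amplitudes for a unit-amplitude tone.

    itd_s > 0 means the sound reaches the left ear first. Each eardrum is
    driven by its own external pressure minus the pressure arriving from the
    other ear, scaled by `coupling_gain` and delayed by `internal_delay_s`.
    """
    w = 2 * math.pi * freq_hz
    p_left = cmath.exp(1j * w * itd_s / 2)     # external pressure, leading ear
    p_right = cmath.exp(-1j * w * itd_s / 2)   # external pressure, lagging ear
    internal = coupling_gain * cmath.exp(-1j * w * internal_delay_s)
    drive_left = p_left - internal * p_right
    drive_right = p_right - internal * p_left
    return abs(drive_left), abs(drive_right)


if __name__ == "__main__":
    freq = 2000.0   # assumed tone frequency in Hz
    delay = 50e-6   # assumed internal (interaural canal) delay in seconds
    for itd_us in (-50, -25, 0, 25, 50):
        left, right = eardrum_amplitudes(itd_us * 1e-6, freq, delay)
        print(f"ITD {itd_us:+3d} microseconds -> left {left:.3f}, right {right:.3f}")
```

In this idealization, an ITD of only 50 µs is converted into a very large difference in eardrum motion: when the external ITD matches the internal delay, the far eardrum is driven by two equal and opposite pressures and falls silent. This is also an example of the nulls mentioned above.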
Of course, artificial directional hearing designs would not necessarily have to be binaural. Even insects rarely have more than two ears, and sometimes only one45, which is perhaps unexpected, given that a separation (‘unmixing’) of sounds from different simultaneous sound sources can in theory easily be achieved using techniques such as independent component analysis46, provided that the number of sound receivers (ears or microphones) is as large as the number of sound sources. Perhaps bionic ears of the future will interface to elaborate cocktail-party hats that sport as many miniature microphones as there are guests at the party. The basic algorithm for independent component analysis requires that the relationship between sources and receivers be stable over time. To adapt this to mobile speakers and listeners, methods would have to be developed to track auditory streams when the sound sources and receivers move relative to each other, but that may well be a solvable problem. If so, many-eared, rather than merely binaural, devices might ultimately turn out to be optimal solutions for bionic hearing.
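To illustrate the unmixing idea, the sketch below implements a very small, kurtosis-based form of independent component analysis for the case of two receivers and two sources with a fixed mixing. It is a simplified stand-in, not the algorithm of ref. 46: the uniform-noise sources, the mixing weights, the grid search over rotation angles and all function names are assumptions for illustration.

```python
import math
import random


def _center(x):
    m = sum(x) / len(x)
    return [v - m for v in x]


def _kurtosis(y):
    """Excess kurtosis of a zero-mean, unit-variance sequence."""
    return sum(v ** 4 for v in y) / len(y) - 3.0


def unmix_two_sources(x1, x2, n_angles=90):
    """Blindly unmix two receiver signals into two source estimates."""
    x1, x2 = _center(x1), _center(x2)
    n = len(x1)
    # Covariance matrix [[a, b], [b, c]] of the two receiver signals.
    a = sum(u * u for u in x1) / n
    c = sum(v * v for v in x2) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    # Closed-form eigendecomposition of the symmetric 2x2 covariance.
    phi = 0.5 * math.atan2(2 * b, a - c)
    half_gap = math.hypot((a - c) / 2, b)
    lam1 = (a + c) / 2 + half_gap
    lam2 = (a + c) / 2 - half_gap
    cp, sp = math.cos(phi), math.sin(phi)
    # Whitening: project onto the eigenvectors and scale to unit variance.
    z1 = [(cp * u + sp * v) / math.sqrt(lam1) for u, v in zip(x1, x2)]
    z2 = [(-sp * u + cp * v) / math.sqrt(lam2) for u, v in zip(x1, x2)]
    # Grid search for the rotation giving maximally non-Gaussian outputs.
    best = None
    for k in range(n_angles):
        th = (math.pi / 2) * k / n_angles
        ct, st = math.cos(th), math.sin(th)
        y1 = [ct * p + st * q for p, q in zip(z1, z2)]
        y2 = [-st * p + ct * q for p, q in zip(z1, z2)]
        score = _kurtosis(y1) ** 2 + _kurtosis(y2) ** 2
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


def _abs_corr(u, v):
    u, v = _center(u), _center(v)
    num = sum(p * q for p, q in zip(u, v))
    den = math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))
    return abs(num / den)


def demo(seed=1, n=2000):
    """Mix two independent sources, unmix them, return recovery quality."""
    rng = random.Random(seed)
    s1 = [rng.uniform(-1, 1) for _ in range(n)]
    s2 = [rng.uniform(-1, 1) for _ in range(n)]
    # Hypothetical, time-invariant mixing at two receivers.
    x1 = [1.0 * p + 0.6 * q for p, q in zip(s1, s2)]
    x2 = [0.4 * p + 1.0 * q for p, q in zip(s1, s2)]
    y1, y2 = unmix_two_sources(x1, x2)
    direct = min(_abs_corr(y1, s1), _abs_corr(y2, s2))
    swapped = min(_abs_corr(y1, s2), _abs_corr(y2, s1))
    return max(direct, swapped)


if __name__ == "__main__":
    print(f"worst-case |correlation| with true sources: {demo():.3f}")
```

The sources are recovered only up to order, sign and scale, and the sketch relies on the mixing weights staying fixed over the whole recording, which is the stability requirement noted in the text.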
Supported by UK Biotechnology and Biological Sciences Research Council grant BB/D009758/1, UK Engineering and Physical Sciences Research Council grant EP/C010841/1, a European Union FP6 grant to J.W.H.S., and US National Institutes of Health grants DCD000436 to C.E.C. and P30 DC0466 to the University of Maryland Center for the Evolutionary Biology of Hearing.