We are constantly required to perceive, distinguish, and identify signals in our acoustic environment. A critical first stage of these processes is the encoding of the information into a robust neural code that allows efficient subsequent processing in the auditory system [1]. We investigated the properties of such a robust neural code at the level of the cortex by varying the amount of information—or entropy—in the acoustic signal.

In the context of information theory [2,3], entropy (*H*) denotes the uncertainty associated with an event and thus provides a metric to quantify information content: a rare—or uncertain—event carries more information than a common—or predictable—event. The properties of many information transmitting systems can be characterised in terms of entropy. Indeed, Shannon originally applied information entropy to describe transitional probabilities in language [2]: in English, less common letters (e.g., “k”) have a lower probability (or higher uncertainty) than more common letters (e.g., “e”), and therefore carry higher information and entropy. Similarly, entropy can be used to characterise pitch transition probabilities in simple musical melodies [4,5]. We used entropy to quantify the information content of pitch sequences.
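The letter example above can be made concrete. The sketch below (with approximate English letter frequencies as illustrative values, not taken from the sources cited) computes the surprisal −log₂(*p*) of a single event and the Shannon entropy of a whole distribution, showing that rare events carry more information and that more uniform distributions have higher entropy:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits; zero-probability events are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Approximate English letter frequencies (illustrative values):
p_e, p_k = 0.127, 0.0077

# Surprisal -log2(p): the rare letter "k" carries more information per occurrence.
surprisal_e = -math.log2(p_e)   # about 3 bits
surprisal_k = -math.log2(p_k)   # about 7 bits

# Entropy of a distribution: uniform (maximal uncertainty) vs. skewed (more predictable).
h_uniform = shannon_entropy([0.25] * 4)           # 2.0 bits
h_skewed = shannon_entropy([0.7, 0.1, 0.1, 0.1])  # below 2.0 bits
```

The same logic applies to pitch transitions: a melody whose next pitch is highly predictable has low entropy, one whose next pitch is nearly random has high entropy.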

“Fractal” pitch sequences based on inverse Fourier transforms of *f*^{–n} power spectra [6,7] provide a means to control directly the entropy of the sequence via the exponent *n*. For *n* = 0, the excursion of the pitch sequence is equivalent to fixed-amplitude, random-phase noise and thus is completely random (high entropy). In the context of information theory, the high degree of randomness in this signal does not correspond to noise that must be removed by the system, but rather to a low predictability of the stimulus, so that each individual element of the sequence contributes substantially to the information in the sequence. As *n* increases, a single stream gradually dominates the local pitch fluctuations and successive pitches become increasingly predictable (low entropy). Such stimuli are more predictable, so that each element of the sequence makes little contribution to the overall information in the stimulus. These families of pitch sequences with different values of *n* are statistical “fractals” [8] in the sense that their statistical properties are scale-independent [7]. For present purposes, the critical property of these pitch sequences is not their fractal behaviour, but the variation of entropy that is produced as *n* varies, whilst pitch range, tempo, and pitch probability remain largely constant (however, it is inherent to the system that for large exponents *n* > 4 the pitch distribution approaches a sinusoid and is consequently tilted toward the extremes of the pitch range, and that the average interval size between successive pitches decreases for increasing exponents *n*).
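A minimal sketch of this construction, assuming the standard inverse-Fourier-transform recipe for *f*^{–n} noise (random phases, amplitudes falling off as *f*^{–n/2} so that power falls off as *f*^{–n}); the quantisation onto a discrete pitch range is an illustrative assumption, not the authors' exact stimulus procedure:

```python
import numpy as np

def fractal_pitch_sequence(n_tones, exponent, n_pitches=24, seed=0):
    """Generate a 1/f^exponent pitch contour via an inverse Fourier transform.

    exponent = 0 gives white noise (successive pitches independent, high entropy);
    larger exponents give smoother, more predictable contours (low entropy).
    """
    rng = np.random.default_rng(seed)
    half = n_tones // 2 + 1                      # half-spectrum length for irfft
    freqs = np.arange(half, dtype=float)
    amps = np.zeros(half)
    amps[1:] = freqs[1:] ** (-exponent / 2.0)    # amplitude ~ f^(-n/2) => power ~ f^(-n)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=half)
    spectrum = amps * np.exp(1j * phases)        # fixed amplitudes, random phases
    contour = np.fft.irfft(spectrum, n=n_tones)  # real-valued contour of length n_tones
    # Quantise onto n_pitches discrete steps so the pitch range is the same
    # for every exponent, as described for the stimuli above.
    lo, hi = contour.min(), contour.max()
    return np.round((contour - lo) / (hi - lo) * (n_pitches - 1)).astype(int)

seq_random = fractal_pitch_sequence(64, exponent=0.0)  # high entropy
seq_smooth = fractal_pitch_sequence(64, exponent=2.0)  # low entropy, small intervals
```

Comparing the two sequences illustrates the interval-size effect noted above: the mean absolute interval between successive pitches is markedly smaller for the larger exponent.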

Entropy for pitch sequences generated with a given value of exponent *n* can be determined by computing the sample entropy (*H*_{SampEn}) [9]. Intuitively, *H*_{SampEn} is based on the conditional probability that two subsequences of length *m* that match within a tolerance of *r* standard deviations remain within a tolerance *r* of each other at the next point, *m + 1*. Explicitly, for a signal or time series of length *N*, *H*_{SampEn} is defined as:

*H*_{SampEn} = –ln[*A*_{r}(*m + 1*) / *A*_{r}(*m*)]

where *A*_{r}(*m*) (or *A*_{r}(*m + 1*)) denotes the probability that two subsequences of length *m* (or *m + 1*) match within a tolerance *r*. Two sequences “match” if their maximum absolute point-by-point difference is within a tolerance of *r* standard deviations. That is, sample entropy is essentially a measure of self-similarity: highly self-similar time series signify high redundancy and therefore low entropy, whereas time series with low self-similarity represent a high degree of uncertainty and therefore high entropy. Furthermore, sample entropy is a nonparametric measure in the sense that it does not require a priori knowledge of the true probability density function of the underlying time series. In the present case, the parameters were chosen as *m* = 2 and *r* = 0.5, and *N* is the number of tones in the pitch sequence.
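A simplified illustrative implementation of this measure (a direct pairwise-count sketch of the definition above, not an optimised or reference implementation of [9]):

```python
import numpy as np

def sample_entropy(x, m=2, r=0.5):
    """Sample entropy H_SampEn = -ln(A_r(m+1) / A_r(m)).

    A_r(k) counts pairs of length-k subsequences whose maximum absolute
    point-by-point difference is within r standard deviations of the signal
    (self-matches excluded). Defaults follow the text: m = 2, r = 0.5.
    """
    x = np.asarray(x, dtype=float)
    tol = r * x.std()

    def match_count(k):
        # All overlapping subsequences of length k, compared pairwise.
        templates = [x[i:i + k] for i in range(len(x) - k + 1)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if np.max(np.abs(templates[i] - templates[j])) <= tol:
                    count += 1
        return count

    a, b = match_count(m + 1), match_count(m)
    return -np.log(a / b) if a > 0 else np.inf

rng = np.random.default_rng(1)
noisy = rng.standard_normal(200)                    # low self-similarity
regular = np.sin(np.linspace(0, 20 * np.pi, 200))   # highly self-similar
```

As expected from the definition, the random series yields a higher sample entropy than the redundant sinusoid, mirroring the high- versus low-entropy pitch sequences used here.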

By varying information theoretic properties of pitch sequences, we address encoding mechanisms applied to sounds at a level of generic processing that is not specific to any semantic category. Even before such encoding mechanisms are engaged, the auditory system must represent spectrotemporal features of the stimulus in sufficient detail such that a number of different aspects of the stimulus can be encoded, in order to allow different types of subsequent categorical and semantic processing. In the current context, encoding constitutes the stage of analysis between the detailed representation of the spectrotemporal structure of the stimulus and the subsequent categorical analysis of abstracted acoustic forms. A single sound may be associated with more than one abstracted form: for example, we might obtain vowel, speaker, and position from a single sound, where each feature can undergo subsequent categorical and semantic processing. Here we use information theory to demonstrate encoding mechanisms in the brain that result in the abstraction of a form of the stimulus.

We hypothesise that if such encoding mechanisms are efficient, they will use less computational resource for stimuli that have a low information content compared with stimuli that have a high information content. This hypothesis is tested by measuring the functional MRI (fMRI) blood oxygenation level–dependent (BOLD) signal as an estimate of neural activity and computational resource during the encoding of auditory stimuli in which the information content is systematically varied. We further hypothesise that processing in primary auditory cortex in Heschl's gyrus (HG) corresponds to a stage at which the detailed spectrotemporal structure of sounds is represented [10–12] and at which such a relationship will not be observed. Instead, such a relationship is expected in distinct auditory association cortex in the planum temporale (PT), which we have previously characterised as a “computational hub” [13] that is required to convert spectrotemporal representations into “templates”—sparse symbolic neural representations that are the basis for categorical, semantic, and spatial processing. For example, the spectral envelope of a sound would represent such a template for vowel processing [14]. The model was developed to account for the involvement of PT in the analysis of a variety of complex sounds that can be processed categorically (speech, music, and environmental sounds), as well as of different spatial attributes (for a review, see [13]).

Here we investigate the encoding of pitch sequences that can be like melodies in their structure, but in which the structure and information content are determined by statistical rules. We sought brain areas that display a positive relationship between the information content or entropy of pitch sequences and neural activity as assessed by the BOLD signal during encoding. Specifically, we hypothesised that such a relationship exists in PT but not in earlier auditory areas.