The purpose of this article is to describe the development of translational methods by which spectrum analysis of human infant crying and rat pup ultrasonic vocalizations (USVs) can be used to assess potentially adverse effects of various prenatal conditions on early neurobehavioral development. The study of human infant crying has resulted in a rich set of measures that has long been used to assess early neurobehavioral insult due to non-optimal prenatal environments, even among seemingly healthy newborn and young infants. In another domain of study, the analysis of rat pup USVs has been conducted via paradigms that allow for better experimental control over correlated prenatal conditions that may confound findings and conclusions regarding the effects of specific prenatal experiences. The development of translational methods by which cry vocalizations of both species can be analyzed may provide the opportunity for findings from the two approaches of inquiry to inform one another through their respective strengths. To this end, we present an enhanced taxonomy of a novel set of common measures of cry vocalizations of both human infants and rat pups based on a conceptual framework that emphasizes infant crying as a graded and dynamic acoustic signal. This set includes latency to vocalization onset, duration and repetition rate of expiratory components, duration of inter-vocalization-intervals and spectral features of the sound, including the frequency and amplitude of the fundamental and dominant frequencies. We also present a new set of classifications of rat pup USV waveforms that include qualitative shifts in fundamental frequency, similar to the qualitative shifts in fundamental frequency that have previously been related to insults to neurobehavioral integrity in human infants.
Challenges to the development of translational analyses, including the use of different terminologies, methods of recording, and spectral analyses, are discussed, as well as descriptions of automated processes, software solutions, and pitfalls.
As a relatively new domain of scientific inquiry, translational research is characterized by concepts, definitions, and methods that continue to evolve. Whereas early phases of translational research frequently address how basic discoveries provide a basis for determining candidates for health applications and practice, we can also conceptualize translational research as a bidirectional process in which the methods and findings of the clinical domain of inquiry may feed back to inform the development of basic discovery. For the purposes of this paper, the concept of translational research refers to a method of scientific inquiry in which the domains of experimental research with non-human animals and correlational research with humans inform the methods and findings of one another. As the field of behavioral epigenetics continues to emerge (Lester et al., 2011), we apply this approach to understanding how variation in the prenatal environment may affect the phenotypic expression of early neurobehavioral development. In studies of human infants, determining the effects of any one prenatal environmental condition or potential teratogen on neurobehavioral development is complicated by the many possible confounding and correlated factors that may be associated with differences in maternal lifestyle, nutrition, healthcare, socioeconomic status, emotional well-being, and use of licit and illicit drugs during pregnancy. Basic experimental investigations of other species may provide control over these and other factors, but finding potentially comparable measures of neurobehavioral development that may be applicable to humans and other species is a challenge. The purpose of this paper is to describe the development of methods by which the cry sounds of both human infants and rat pups can be spectrum analyzed as a means of examining the effects of variations in the prenatal environment on the expression of early neurobehavioral development.
The spectrum analysis of human infant cry sounds and ultrasonic vocalizations (USVs) of rat pups may provide a particularly sensitive and useful assessment of neurobehavioral integrity and development. Cry vocalizations of both species during the early postnatal period are exceptional behaviors in that they are, at once, biological and social signals, critical to early survival and development (Zeskind, in press). As such, the study of the form and function of these vocalizations provides a unique window into both the biological and social processes that guide early development. The analyses of human infant crying and rat pup USVs during the early postnatal period have extensive histories that have often proceeded along different paths of inquiry, each with its own strengths and limitations. With regard to the assessment of neurobehavioral integrity, the study of human infant crying over the past 50 years has resulted in a rich set of measures that may be applied to the analysis of rat pup USVs. On the other hand, rat pup USVs have been analyzed in larger temporal contexts with a variety of more ecologically relevant conditions that may be instructive to the analysis of human infant crying. While we certainly must avoid the pitfalls of direct homologous comparisons of the form and function of the vocalizations of the two species (Dow-Edwards, 2011), the creation of a comparable set of measures by which vocalizations can be analyzed may contribute to the development of future translational research regarding the assessment of the effects of variations in the prenatal environment on infant neurobehavioral development.
The sound of infant crying during the early postnatal period can be likened to a biological siren, an acoustic signal that reflects the current organic condition of the infant and then broadcasts that condition to the social environment in a repetition of high-pitched sounds wavering in both frequency and temporal organization (Zeskind, in press). Across many mammalian species, these sounds may occur incidentally and without intent as they effectively alert the caregiving environment, facilitate location of the infant, and provide the motivational basis for responses that may contribute to early survival and development (Owren and Rendell, 2001). The communicative content, or what early cry vocalizations specifically communicate, however, has remained a continuing question. Similar to discussions in the primate literature (Owren and Rendell, 2001), investigators have disagreed about whether the USVs of young rat pups reflect specific emotional states or an acoustic byproduct of physiological changes (Blumberg and Sokoloff, 2001). After analogous questions regarding the communicative significance of human infant crying were raised several years ago, the cries of human infants are now viewed by most investigators as a graded signal that reflects the intensity of non-specific eliciting conditions (Zeskind, in press). While it is beyond the scope of this paper to fully address this issue, we will consider the cries of human infants and rat pup USVs within this latter conceptual framework. For the purposes of this discussion, we will refer to the “distress” vocalizations of both human infants and rat pups as “infant crying.” This terminology may be more intuitive with regard to the vocalizations of human infants, but the argument has been made that we can consider USVs of rat pups to be a form of crying to the extent that they are characteristic “distress” vocal sounds (Blumberg and Sokoloff, 2001).
The cry of the newborn and young infant is initiated by endogenous and exogenous sensory experiences, such as pain and hunger, which disrupt the homeostatic balance of infant arousal systems. With significant neurobehavioral reorganization between 2 and 3 months of age, the primarily reflexive cry of the newborn infant additionally becomes a social signal, shaped by caregiver responses (Emde and Gaensbauer, 1981), with changes in form and function (Murry and Murry, 1980; Zeskind, 1985). During the first couple of months, the human infant cry sound may actually be more similar to the distress vocalizations of other primates than it is to the form and function of subsequent human language (Lieberman et al., 1971). As in most well-documented primate vocal repertoires, vocalizations that induce attention and arousal often have sharp onsets, dramatic frequency and amplitude fluctuations and either shorter or longer, upward sweeps in frequency (Owren and Rendell, 2001).
The arousing and dramatic acoustic characteristics of the cry of the human infant can be seen in its temporal and spectral features. The temporal morphology of infant crying comprises a rhythmic repetition of (1) an expiratory sound, (2) a brief pause, (3) an inspiratory period, and (4) a second pause before the next expiratory sound – although there also may be coughs and smaller utterances interspersed within the repeating pattern (see Figure 1). The fundamental frequency (F0; basic pitch) of the cry is typically measured during the expiratory component of the sound as air is pushed outward past the vocal cords as part of the respiratory cycle. The harmonic structure of the cry within which the F0 occurs has been described as having one of three qualitatively different modes (Truby and Lind, 1965). Phonation is the typical cry mode and usually has an F0 ranging between 400 and 600 Hz. Hyperphonation is a second cry mode characterized by a qualitative shift in vocal production that results in an F0 ranging between 1000 and 2000 Hz and higher. Importantly, this high-pitched cry sound is frequently found in infants who have experienced a wide range of prenatal conditions that may insult the integrity of neurobehavioral organization and frequently elicits particularly strong affective responses from the caregiving environment (Zeskind and Lester, 2001). The third cry mode, dysphonation, typically occurs during periods of high infant arousal and is characterized by sonic turbulence due to aperiodic vibrations in the vocal apparatus. The aperiodic nature of the cry sound results in a lack of harmonic structure and measurable F0. Figure 2 shows a human infant cry sound containing both hyperphonated and phonated acoustic structures in the same expiratory period. Figure 3 shows human infant cry sounds with dysphonation.
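As a minimal illustrative sketch, and not the procedure of any analysis system discussed in this paper, the three cry modes can be expressed as a simple rule applied to the F0 of one expiratory segment. The function name and constants below are hypothetical. The cut-offs are taken directly from the ranges quoted above, a missing F0 (`None`) is assumed to stand for an aperiodic, dysphonated segment, and values that fall outside the quoted ranges are reported as unclassified rather than guessed.

```python
from typing import Optional

# Hypothetical cut-offs copied from the ranges quoted in the text.
PHONATION_RANGE_HZ = (400.0, 600.0)
HYPERPHONATION_MIN_HZ = 1000.0


def classify_cry_mode(f0_hz: Optional[float]) -> str:
    """Label one expiratory segment by its fundamental frequency.

    f0_hz is None when no F0 could be measured, which is assumed here
    to indicate the aperiodic (dysphonated) mode.
    """
    if f0_hz is None:
        return "dysphonation"
    if f0_hz >= HYPERPHONATION_MIN_HZ:
        return "hyperphonation"
    if PHONATION_RANGE_HZ[0] <= f0_hz <= PHONATION_RANGE_HZ[1]:
        return "phonation"
    # The text does not assign values outside the quoted ranges to a mode.
    return "unclassified"
```

In practice the boundaries between modes are described as qualitative shifts in vocal production, so a fixed threshold of this kind should be read only as a first approximation.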
The sound and rhythm of infant crying have their basis in anatomical and physiological mechanisms that produce non-specific changes in infant arousal. Several physioacoustic models have described the coordinated activity among brainstem, midbrain, and limbic systems, as well as autonomic and other neural systems, which results in variations in cry sounds (Lester, 1984; Golub and Corwin, 1985; Lester and Boukydis, 1992). Autonomic and central nervous system regulation of the respiratory cycle, for example, underlies the rhythmic temporal morphology of crying. Variations in the pitch of crying originate in the lower brainstem, which controls the tension of laryngeal muscles through the vagal complex (cranial nerves IX–XII) and phrenic and thoracic nerves. Hyperphonation reflects instability in these neural control mechanisms and is often found in infants who suffer from poor autonomic and neurobehavioral regulation. Similarly, the threshold for the initiation of crying is directly related to integrity of the autonomic nervous system and its effects on the rhythmic organization of arousal (Zeskind et al., 1996a). Due to their bases in central nervous system (CNS) innervation and autonomic nervous system (ANS) modulation, individual differences in these and other features of crying have been related to several other measures of neurobehavioral function in the infant (Porter et al., 1988; Green et al., 2000; Zeskind and Lester, 2001; Lester et al., 2002).
Based on the human auditory perception range (20 Hz–20 kHz), rat vocalizations have been categorized into two broad types, sonic and ultrasonic. USVs typically occur with a fundamental frequency above 40 kHz, 20 kHz above the range of human hearing. Infant rats tend to vocalize over a wide fundamental frequency range (up to 100 kHz), while adult rats tend to vocalize with fundamental frequencies around 22 or 55 kHz (Roberts, 1975). The differentiation is not entirely clear, however. Some would argue that the range of USVs in the neonatal rat is fairly limited because of the underdeveloped laryngeal system that plays a role in the emission of the vocalizations. Unlike most mammalian vocalizations that are produced by vibrations of the laryngeal folds, these USVs are not produced by vocal fold vibration during expiration, but by the passage of air under high pressure between constricted vocal folds (Roberts, 1972). These vocalizations may occur with durations as short as 0.03 or 0.05 s in rapidly emitted bursts comprised of varying numbers of sounds. Figure 4 shows an example of the rhythmic organization of repeated rat pup USVs. Temporal organization of USVs changes with age in several rodent species (Elwood and Keeling, 1982). The number of USVs is low soon after birth, then increases to a peak between approximately postnatal days (PND) 11–12 (Branchi et al., 1998), but varies depending on the species and eliciting condition (Sales and Smith, 1978). The number of USVs then declines across the neonatal period as pups grow larger (Naito and Tonoue, 1987). Other work indicates that while the duration of calling, bandwidth, peak frequency and spectral complexity increase with age (Brudzynski et al., 1999; Brudzynski, 2005), the dominant frequency (frequency with highest amplitude) of ultrasound production decreases linearly from 45 kHz at 2 days of age to 25 kHz at 20 days of age (Blumberg et al., 2000).
Similar to changes in the human newborn after 2months, changes in vocalizations over the first two postnatal weeks may reflect a transformation from a more reflexive behavior in response to variations in temperature to a more social behavior in response to social cues.
Similar to differences in the acoustic and melodic structure found in human infant cry sounds (Truby and Lind, 1965; Wasz-Hockert et al., 1968), rat pup USVs have been described with regard to variations in their waveforms (Brudzynski et al., 1999). Using “isolation calls” of 10- to 17-day-old rat pups, Brudzynski et al. (1999) classified USVs as having the characteristics of one of 10 possible melodic shapes, based on the quantity and quality of frequency modulations. The classification analysis can be described as reflecting an increase in the complexity of frequency modulations as categories progress from 0 to 9. As seen in Figure 5, categories proceed from compositions of simple dots or lines to single rising or falling patterns to U-shaped and W-shaped spectrographic structures. This figure also includes additional waveform classifications that we have created and will discuss later in this paper. In particular, these additional classifications include waveforms that show a sudden shift in pitch reminiscent of the sudden, qualitative shifts in pitch that are characteristic of hyperphonation in human infant cry sounds. Other than Sales and Smith’s (1978) description of “frequency steps,” defined as an instantaneous frequency change with no interruption in time, little is known about the occurrence or prevalence of this acoustic structure. Because hyperphonation is evidence of neurobehavioral dysregulation in human infants, the shifts in pitch may be particularly relevant to a translational analysis for purposes of neurobehavioral assessment. Others have also used a modified version of Brudzynski’s classification system as a proposed method to assess neurobehavioral development following prenatal malnutrition (Tonkiss et al., 2003).
There is considerable debate regarding the motivational/physiological basis of infant rat cry production (PND 1–15). A large number of studies suggest that changes in cry production reflect specific alterations in the stress state of the pup (Hofer, 1996; Branchi et al., 2001; Scattoni et al., 2009). Variations in the sounds of these vocalizations have been associated with handling, cold temperatures, isolation, and various social factors (Blumberg et al., 1992; Shair et al., 1997; Branchi et al., 2001; Hahn and Lavooy, 2005). In addition to retrieval, vocalizations may also elicit maternal consumption of pup excretions during anogenital licking (Brouette-Lahlou et al., 1992) and may directly stimulate prolactin secretions in dams, although some controversy exists (Terkel et al., 1979; Stern et al., 1984; Hashimoto et al., 2001). In this context, crying is often viewed as a motivated behavior. In contrast, a smaller number of studies suggest that cry production may strictly be an incidental byproduct of locomotion or a physiological thermoregulatory mechanism (Blumberg and Alberts, 1990; Blumberg, 1992; Blumberg et al., 1992). In this context, crying is viewed as a non-motivated behavior (for a detailed discussion, see Blumberg and Sokoloff, 2001; Blumberg and Sokoloff, 2003; Panksepp, 2003). A third perspective, a Polyvagal Theory (Porges, 2009), may help resolve the discrepancy between these seemingly contradictory approaches. This theory posits that the excitatory equilibrium that exists between the two branches of the vagus controls the physiological arousal state of the animal, and thus cry production, as well as cardiac function and thermoregulation. Importantly, the equilibrium of these branches can be modulated by the stress or arousal state of the animal. Thus, various levels of arousal or stress may alter thermoregulatory mechanisms that produce variations in the sound and organization of the vocalization. 
Resolution of this issue may require new conceptualizations of crying in the future.
The conceptualization of crying that guides the development of our translational methods emphasizes the contribution of infant crying to a synchrony of arousal between infants and caregivers (Zeskind et al., 1985). Use of the term “arousal” in the context of this paper refers to changes in the homeostatic balance between sympathetic and parasympathetic contributions to autonomic nervous system activity. This approach has four basic elements that describe the dynamic and graded signal qualities of the cry and their effects on the caregiving environment (Zeskind, in press). First, variations in the sound and temporal organization of crying result from non-specific changes in infant arousal rather than specific emotional states or eliciting conditions. Second, graded increases and decreases in infant arousal result in corresponding graded increases and decreases in the temporal and acoustic characteristics of the cry sound. Different patterns of crying that have been associated with specific emotional states may reflect the infant’s level of arousal associated with the specific environmental conditions in which they occur. Third, these graded increases and decreases in the characteristics of the cry sound result in synchronous graded changes in the intensity of the receiver’s arousal system. The Polyvagal Theory similarly emphasizes reciprocal relations between the arousal systems involved in both the production and reception of vocalizations and the perceptual advantage that mammals have by vocalizing within a frequency band to which the receiver’s anatomical characteristics, such as the middle ear, are particularly sensitive (Porges and Lewis, 2010). Fourth, these changes in the adult’s intensity of arousal result in responses to the cry sounds that are mediated by the caregiver’s own characteristics, developmental history, physiology, and context.
Although these basic elements may currently exist at different levels of conceptual and empirical development in human and comparative fields, they can provide a unified guide to our development of translational methods in the analysis of cry sounds of human infants and rat pups. The human literature is replete with studies demonstrating the four basic elements of this model (for a review, see Zeskind, in press). Among the many measures of the sounds of infant crying, variations in the fundamental frequency and temporal organization of crying have been shown to be particularly salient. Cries with a high fundamental frequency typically reflect high infant arousal (Porter et al., 1988) and then elicit the greatest intensity of response (Zeskind and Marshall, 1988; Schuetze et al., 2003). In particular, hyperphonated cries elicit very strong perceptual (Zeskind and Lester, 1978) and physiological responses (Zeskind, 1987). In the temporal domain, a curvilinear relationship may exist. Cries with either longer or shorter expiratory sounds, along with either longer or shorter pauses, both reflect the greatest infant arousal (Zeskind et al., 2006; Tutag-Lehr et al., 2007) and elicit the greatest adult arousal and perceived urgency (Zeskind et al., 1992). Adults’ responses to these changes in their arousal depend on the emotional characteristics and perceptual set of the caregiver. For example, whereas the intensity of the responses of “typical” mothers increases as the fundamental frequency of crying increases, the intensity of responses of women who are depressed or who used cocaine during pregnancy decreases to these sounds – responses suggesting increased action versus withdrawal, respectively (Schuetze et al., 2003, 2005). These and other behavioral and physiological responses to infant cries have been associated with the development of physical abuse and/or neglect, based on how the cry is perceived by the caregiver (Crowe and Zeskind, 1992; Zeskind, in press).
Rodent vocalizations have also long been viewed by some as being produced by changes in pup arousal which then result in changes in maternal arousal (Owren and Rendell, 2001). Bell (1974) suggested that graded changes in the acoustic properties of vocalizations represent quantitative changes in such signal parameters as rate, intensity, frequency, bandwidth, duration, and persistence that are related to the degree of pup arousal. The occurrence of these signals then triggers a similar degree of arousal in the dam. For example, small decreases in air temperature may result in decreases in physiological temperature and concomitant increases in the production rate of ultrasounds (Sokoloff and Blumberg, 1997). In turn, a sustained high rate of vocalizing by pups may be the most effective stimulus for maternal attention and retrieval, resulting in increased warmth (Deviterne et al., 1990; Brunelli et al., 1994; Farrell and Alberts, 2002; Zimmerberg et al., 2003; Fu et al., 2007). As a dynamic signal, the peak frequency of vocalizations of rat pups at postnatal day 7 has been shown to increase in the second minute of crying, as sustained crying continues (Tonkiss et al., 2003). It should be emphasized that, like response patterns in humans, cries may not always elicit a “typical” response in rats or other species. Changes in the intensity of arousal, across mammalian species, including rat pups, have been shown to provide the basis for many different potential responses, also depending on the developmental history of the caregiver and the context in which crying occurs (Smotherman et al., 1978; Owren and Rendell, 2001). Future work that examines how maternal responses vary with respect to variations in both the acoustic attributes of vocalizations and maternal developmental history will afford an increased understanding of the bidirectional processes underlying behavioral development.
Spectral analysis of infant crying may be particularly valuable in the study of the effects of variations in the prenatal environment on infant neurobehavioral development. Eliciting an infant’s cry has long been used as part of the newborn neurological examination to support the differential diagnosis of brain damage (Prechtl and Beintema, 1964). Whereas infants with trisomy chromosomal disorders typically have cries with lower fundamental frequencies, the hallmark of cry sounds of infants with most insults to neurobehavioral function is a higher fundamental frequency and a frequently occurring shift to a hyperphonated acoustic structure (Zeskind and Lester, 2001; LaGasse et al., 2005). In addition to measures of the spectral characteristics of crying (e.g., higher fundamental or formant frequencies), measures of the temporal morphology (e.g., shorter durational components) and production efforts (e.g., higher threshold for cry initiation and longer latency to crying) components of crying have been used to differentiate infants along a wide continuum of neurobehavioral casualty – from cases of severe brain damage (Karelitz and Fisichelli, 1962; Wasz-Hockert et al., 1968) to preterm birth and low birth weight (Michelsson, 1971; Lester and Zeskind, 1978; Corwin et al., 1992). These pioneering studies showed that the analysis of infant crying could be used to support the existing neurological or physical assessment of the infant as being at risk for poor development. However, the analysis of crying in these cases added little additional information to the evident neurobehavioral evaluation of the infant.
Subsequent research showed that the measures of infant crying previously used to differentiate cases of known neurological damage could provide additional information regarding the neurobehavioral integrity of infants who have experienced adverse prenatal conditions that put them at risk for later-detectable poor developmental outcomes. For example, infants with prenatal exposure to several licit and illicit substances, including opiates (Blinick et al., 1971; Corwin et al., 1987; Lester et al., 2002), marijuana (Lester and Dreher, 1989; Lester et al., 2002), alcohol (Nugent et al., 1996; Lester et al., 2002), and tobacco (Nugent et al., 1996) have been differentiated by a variety of measures of crying, including latency, threshold, amount of dysphonation, duration of expiratory sounds, fundamental frequency, formant frequencies, and their amplitudes (LaGasse et al., 2005). Variations in these measures may also be sensitive to different behavioral syndromes in response to similar prenatal exposures. For example, in response to prenatal cocaine exposure, some newborn infants show increased hyperphonation, dysphonation, and cry duration (Lester et al., 2002), while others show less hyperphonation and fewer cry utterances or expirations (Corwin et al., 1992). These dual syndromes may be due, respectively, to the excitatory direct effects of prenatal cocaine on nervous system activity, as compared to a depressive pattern of behavior that results from the indirect effects of concurrent malnutrition (Lester et al., 1991).
The sensitivity of spectral analyses of infant crying to insults in neurobehavioral integrity can perhaps best be demonstrated in the assessment of infants who experienced potentially adverse prenatal conditions, yet appear to be healthy and show no abnormal signs on routine physical and neurological examinations. Common prenatal conditions that have been differentiated by the analysis of infant crying among seemingly healthy infants include high numbers of prenatal complications (Zeskind and Lester, 1978), a subtle form of third trimester malnutrition common to low SES families (Zeskind, 1981; Zeskind and Lester, 1981), subclinical fetal alcohol exposure (Zeskind et al., 1996b), and prenatal exposure to maternal antidepressant-use during pregnancy (Zeskind et al., 2005, 2009). Typically, these infants have cries with a higher threshold (higher numbers of stimuli needed to elicit sustained cry), a longer latency, shorter initial expiratory components, a shorter overall duration of the crying bout and a higher fundamental frequency in the initial expiratory segments. Recent work further suggests that the dominant frequency of crying (the harmonic frequency with the highest amplitude or power) may detect the effects of infants’ withdrawal from prenatal exposure to maternal antidepressant-use (Zeskind et al., 2005). The importance of these studies is that the measures of infant crying previously used to support the differential diagnosis of brain damage can detect insults to neurobehavioral organization and integrity among seemingly healthy, full term, full birthweight infants residing in the normal newborn nursery. That is, the spectral analysis of infant crying may be able to assess whether or not a prenatal exposure or environmental condition has deleterious effects on neurobehavioral organization – in the absence of other abnormal signs.
While the utility of the analysis of rat pup vocalizations for assessing the effects of various prenatal conditions on early neurobehavioral integrity has not yet been studied in the same depth or detail as that of human infant cry sounds, the neurobiological bases of vocalizations are better known in other mammalian species than they are in human infants. Studies on communication in squirrel monkeys have found that the periaqueductal gray (PAG) is a critical region for cry elicitation (Jürgens and Richter, 1986; Jürgens, 2002) and that glutamatergic and GABAergic inputs into the PAG (Jürgens and Lu, 1993b) from limbic structures including the hypothalamus and amygdala (Jürgens, 1982; Jürgens and Lu, 1993a) are speculated to play a large role in the control of vocalizations. Rodent studies also support these structures as playing a role in vocalizing behavior (Koo et al., 2004; Borszcz, 2006; Burgdorf et al., 2007; Oka et al., 2008). The known neurobiological basis of vocalizations in mammalian species other than humans is another potential area of translational research in which the study of rat pups may inform the study of human infant crying, especially with regard to the value of the analysis of vocalizations for purposes of neurobehavioral assessment. At this time, however, no known studies have explored the neurobiological mechanism(s) in an animal model that exhibits an altered vocalizing phenotype.
A growing literature suggests that rat pup USVs are sensitive to the effects of several potentially adverse prenatal conditions. Decreased vocalization rates have been found following prenatal malnutrition (Tonkiss and Galler, 2007) and exposure to anxiolytic drugs (Gardner, 1985), SSRIs (Joyce and Carden, 1999), pesticides (Venerosi et al., 2009), alcohol (Engel and Hard, 1987; Kehoe and Shoemaker, 1991; Tattoli et al., 2001; Barron and Gilbertson, 2005), cannabinoids (Antonelli et al., 2005), ecstasy (Winslow and Insel, 1990), morphine (Carden and Hofer, 1990), cocaine (Hahn et al., 2000), and methylmercury (Elsner et al., 1990). Whereas the rates of vocalization have mostly been used to assess these effects, other vocalization measures may also be of value (Branchi et al., 2001). Shorter call duration (Elsner et al., 1990), increased latency (Venerosi et al., 2009) and increased threshold (Vathy and Komisaruk, 2002) have also been found following prenatal exposure to several substances. Using the categorical system created by Brudzynski et al. (1999), Barron and Gilbertson (2005) found that pups with prenatal alcohol exposure emitted fewer USVs in waveform categories 3, 4, and 6 than control rat pups. Similarly, using a modified version of Brudzynski’s categorization system, Tonkiss et al. (2003) found different waveform patterns in rat pups following prenatal malnutrition, as well as a higher peak frequency and durations in the vocalizations, independent of the waveform type. As such, analysis of the crying in both human infants and rat pups may be ripe for application of translational methods by which the effects of a wide range of prenatal conditions can be assessed.
Determination of how we measure, describe and, thus, characterize vocalizations of any species is based on the technologies we use. For example, one of the first known attempts to objectively quantify the rhythmic and melodic variations of human infant crying was based on the technology available at the time–musical notes on a musical staff (Gardiner, 1838). Darwin (1855/1965) later used the then current technological advancement, photographic plates, to describe variations in infant cry sounds. Today we have digital sound spectrographic systems that conduct fast-Fourier transforms (FFT) on user-selected points in human infant crying, as well as calculating an average of multiple power spectra over the duration of a spectrally less complex ultrasonic rat pup vocalization. These systems have not only significantly increased the resolution, accuracy, and objectivity of measurement, but have also allowed for the discovery of new and different aspects of vocalization patterns. There are, however, important differences both within and between the analyses of human infant crying and rat pup USVs that challenge the development of a translational analysis. These include methods by which cry sounds are elicited, recorded, and analyzed and the availability of a functionally similar set of measures that may facilitate comparisons between species.
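To make the idea of averaging multiple power spectra over the duration of a vocalization concrete, the following is a minimal sketch under stated assumptions and is not the algorithm of any spectrographic system named in this paper. It assumes non-overlapping frames, a rectangular window, and a naive discrete Fourier transform written with the standard library only (an FFT computes the same quantities more quickly). All function names and parameters are hypothetical. The dominant frequency is taken, as defined earlier, to be the frequency with the highest amplitude in the averaged spectrum.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power at DFT bins 0..N/2 of one frame (naive O(N^2) transform)."""
    n = len(frame)
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        powers.append(abs(coeff) ** 2)
    return powers


def averaged_power_spectrum(signal: Sequence[float],
                            frame_len: int) -> List[float]:
    """Average the power spectra of consecutive non-overlapping frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    spectra = [power_spectrum(f) for f in frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def dominant_frequency_hz(avg_power: Sequence[float],
                          sample_rate_hz: float,
                          frame_len: int) -> float:
    """Frequency of the strongest bin, ignoring the DC bin at index 0."""
    k = max(range(1, len(avg_power)), key=lambda i: avg_power[i])
    return k * sample_rate_hz / frame_len


# Synthetic example: a pure 40 kHz tone sampled at 250 kHz (assumed values).
rate = 250_000.0
tone = [math.sin(2 * math.pi * 40_000.0 * t / rate) for t in range(400)]
avg = averaged_power_spectrum(tone, frame_len=100)
dominant = dominant_frequency_hz(avg, rate, frame_len=100)
```

The frequency resolution of such an estimate is the sampling rate divided by the frame length (2.5 kHz in this synthetic example), which is one reason why results obtained with different analysis settings are difficult to compare directly.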
First, significant differences in the spectral complexity of human infant and rat pup cry sounds result in a differential ability to use automated systems to create a similar set of measures for the two species. As seen in Figure 1, the cry of the human infant is comprised of a complex array of sounds varying in frequency, amplitude, spectral noise, and harmonics, any number of which may have higher amplitudes than the fundamental frequency. This complexity creates a sound that is beyond the ability of commercially available sound analysis systems to automatically determine most measures of human infant crying. While one dedicated analysis system has been developed that is capable of automatically and systematically analyzing large amounts of infant cry sounds (Cry Research Inc., Brookline, MA, USA), this system is not commercially available for broad use. However, studies using this system, and the measures it creates, are widely reported in the literature and merit our attention (LaGasse et al., 2005). The measures derived and reported may be, correctly, idiosyncratic. For example, unlike other systems, the CRI system conducts analyses based on the shape and size of the newborn infant’s vocal tract and, thus, provides the only known valid reported measure of formant frequencies in the human infant cry. In comparison, rat pup USVs contain relatively less spectral complexity and can be easily subjected to automated processing for rapid analysis of large numbers of vocalizations, using such software programs as Avisoft-SASLab Pro.
Second, use of different analytic systems has also resulted in varying techniques and units of measurement that challenge the ability to directly compare results across studies. Most studies have used their own idiosyncratic methods and measures of the cry sound. For example, whereas the CRI system differentiates between hyperphonated and phonated acoustic structures for reported measures of the fundamental frequency of the human infant cry (Lester et al., 2003), others report the Peak F0, independent of the acoustic structure, as a means of describing the highest pitched sound in the cry (Zeskind et al., 2005; Tutag-Lehr et al., 2007). In this case, the average F0 determined from phonated cry sounds may be lower than the average F0 of cry sounds that include both phonated and high-pitched hyperphonated cry segments. Similarly, whereas some report fundamental frequencies based on an average of multiple power spectra in analyses of both human infant crying (Lester et al., 2003) and rat pup USVs (Brudzynski et al., 1999), others may use the Peak F0 described above (Zeskind et al., 1996a). In contrast to the latter measure of fundamental frequency at the single point where it reaches its highest value in a cry sound, the former measure of fundamental frequency is calculated on a sum and average of the fundamental frequency determined at each point (e.g., 25ms blocks) along the entire length of each vocalization, thus including the lowest to highest frequencies in the sound. While the benefit of this calculation is that it provides a description of the fundamental frequency of the entire vocalization, the mean may be differentially weighted by a greater presence of lower pitched sounds and may not identify the highest point at which F0 is emitted. 
While each measure has its pros and cons, it should be understood that calculations of a mean F0 based on an average of multiple power spectra across the human infant cry sound or rat pup USV will typically provide a lower measure of F0 than that based on the point at which the F0 reaches its highest point (in hertz).
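The difference between the two summary measures can be illustrated with a minimal sketch. The contour values below are hypothetical, and the function name `summarize_f0` is ours for illustration only; it is not part of any analysis system named in this paper.

```python
from statistics import mean

def summarize_f0(contour_hz):
    """Return (mean F0, Peak F0) for a list of per-block F0 estimates in hertz.

    Blocks where F0 could not be measured are given as None and skipped.
    """
    measurable = [f for f in contour_hz if f is not None]
    if not measurable:
        raise ValueError("no measurable F0 in contour")
    return mean(measurable), max(measurable)

# Hypothetical contour: one F0 estimate per 25-ms block of a single expiration,
# with one brief high-pitched (hyperphonated) block and one unmeasurable block.
contour = [420.0, 450.0, 480.0, None, 510.0, 1200.0, 470.0, 430.0]
mean_f0, peak_f0 = summarize_f0(contour)
print(f"mean F0 = {mean_f0:.1f} Hz, Peak F0 = {peak_f0:.1f} Hz")
```

In this made-up example the mean is pulled toward the many lower-pitched blocks, while the Peak F0 reports only the single highest block, which is why the two measures should not be compared directly across studies.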
Third, translational methods and comparisons between species are also complicated by use of different measurement techniques and terminologies traditionally employed by different research disciplines. For example, whereas in the human cry literature, Peak F0 refers to where the fundamental frequency reaches its highest point (in hertz), analyses of rat pup USVs typically refer to the Peak F0 as the frequency where the fundamental frequency has the highest power (amplitude). In this case, the Peak F0 of a human infant cry might be described as the Maximum F0 in a rat pup USV and the Peak F0 in a rat pup USV might be described as the Dominant Frequency in the cry of the human infant (although not from an averaged power spectrum; Tutag-Lehr et al., 2007). Accurately measuring the Peak F0 of the human infant cry at its highest amplitude is often precluded by the often-present case of dysphonation. Although dysphonation may appear anywhere in a cry expiration, it often appears where the infant is most aroused – which may be at the point of the highest amplitude in the expiration. Thus, determining the Peak F0 of a human infant cry sound at the point of its highest amplitude may downwardly bias the reported frequency to where the F0 is measurable (Tutag-Lehr et al., 2007).
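As a hedged illustration of this terminological difference, the sketch below computes both meanings of "Peak F0" from the same hypothetical track of fundamental frequency and amplitude. The function names and data are assumptions made for this example and do not correspond to variables in any particular software package.

```python
def f0_at_highest_frequency(blocks):
    """Human cry literature usage: the highest value (Hz) reached by F0."""
    return max(f0 for f0, _ in blocks)

def f0_at_highest_power(blocks):
    """Rat pup USV usage: the F0 value (Hz) in the block with the most power."""
    loudest = max(blocks, key=lambda block: block[1])
    return loudest[0]

# Hypothetical (F0 in Hz, relative amplitude in dB) pairs, one per analysis block.
track = [(62000.0, -30.0), (65000.0, -18.0), (71000.0, -25.0), (68000.0, -21.0)]

maximum_f0 = f0_at_highest_frequency(track)   # "Peak F0" in human cry studies
peak_power_f0 = f0_at_highest_power(track)    # "Peak F0" in rat pup USV studies
print(maximum_f0, peak_power_f0)
```

The same vocalization therefore yields two different numbers under the same label, so reports should state which definition was used.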
Fourth, the dynamic and graded nature of infant crying over the duration of a crying bout has implications for the selection of cry segments chosen for analysis. For example, spectral and temporal characteristics of only the first (Zeskind and Lester, 1978), or first three, expiratory cry sounds (Lester et al., 2003) after application of a painful stimulus have mostly been studied because they assess the infant at its highest state of arousal where the cry typically contains more hyperphonation than later cry segments (Zeskind, 1983). Other work has examined the spectral and temporal features of the cry segment that contained the Peak F0 wherever it was in the entire recorded bout of crying (Zeskind et al., 2005). While some work shows how the frequency and other characteristics of infant crying change over time (Zeskind, 1985; Green et al., 1998), this is a markedly understudied area. Similarly, while some have described changes in the rate of USVs over time as a function of eliciting conditions, such as changes in temperature (Sokoloff and Blumberg, 1997), a paucity of known work has examined whether or how these vocalizations may change over the duration of a bout. Analyses are typically conducted on the USVs that occur during a specific amount of time (e.g., first 5min) or number of vocalizations (e.g., first 60; Barron and Gilbertson, 2005). The analysis of whether and how rat pup USVs change over time would offer new insights into the diagnostic and communicative values of this acoustic signal.
The ultrasonic nature of rat pup vocalizations presents methodological challenges not evident in the analysis of human infant cry sounds. Because these sounds are, by definition, not audible to the human ear, their analysis is more difficult. While “bat detectors” translate the ultrasonic sounds of rat pup vocalizations into the audible range of the human auditory system, different results may also be obtained across studies due to the use of different kinds of bat detectors. Whereas the Direct Record type preserves the integrity of the acoustic parameters, Heterodyne and Frequency Division types convert ultrasonic sounds to an audible sound so that the user can hear a representation of the tone. However, it should be noted that heterodyne detectors (Burgdorf et al., 2000) require frequency range restriction and rely on an unknown formula to convert the sound from ultrasonic to audible, which may not allow for accurate representation of temporal and frequency characteristics. While frequency division bat detectors do not require the selection of a frequency range, they distort the sonographic quality of the sound, and thus, while commonly used for such purposes (Burgdorf et al., 2005), perhaps should not be relied upon for analyzing sonographic information. Should study hypotheses simply require an approximate count of the number of vocalizations that will occur, the Med Associates ANL-937-1 and Noldus UltraVox systems provide good solutions. While these and similar systems are sufficient for understanding if and when calls occur, the sampling rate of the Med Associates system may be less optimal for visualizing some of the more complex frequency modulations produced by rodents. Restriction of the frequency range in the Noldus system may result in “clipping” of a sound: if the frequency of a tone shifts out of range, that portion of the sound will be ignored.
Methods for recording USVs can also vary dramatically depending on the type of analysis needed for a given application. For analysis of the sonographic structure to be further discussed here, high fidelity recordings are required due to the high frequencies at which USVs can occur (fundamental frequencies often reach 90–100kHz), their varying amplitude range, and their short durations (0.03–0.05s). Thus, the sampling rate and bit value of a recording can dramatically influence the quality of data that result. In the digital representation of a simple sine wave, the resolution along the horizontal (time) axis is determined by the sampling frequency (hertz), and the resolution along the vertical (amplitude) axis by the bit value. The Nyquist Sampling Theorem (Nyquist, 2002) dictates that sampling frequencies be a minimum of twice the maximal frequency of the signal of interest. Therefore, considering that vocalizations can occur with fundamental frequencies of approximately 100kHz, a minimum recording rate of 200kS/s would be required. Sampling at higher rates (400–800kHz) will give better resolution of sonographic characteristics at the cost of increased file size. If measures of the first harmonic are also of interest, these values would need to be doubled. A higher bit value will generally allow for a higher amplitude resolution (dynamic range). Typically, a resolution of at least 14-bit should be sufficient for adequate representation of amplitude statistics.
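The arithmetic in this paragraph can be sketched as follows. The function names are ours, and the file-size estimate assumes uncompressed samples stored in whole bytes (so a 14-bit sample occupies 2 bytes), which is an assumption rather than a property of any specific recording system.

```python
import math

def min_sampling_rate_hz(max_f0_hz, include_first_harmonic=False):
    """Minimum sampling rate (samples/s): twice the highest frequency of interest."""
    highest_hz = max_f0_hz * (2 if include_first_harmonic else 1)
    return 2 * highest_hz

def bytes_per_second(sample_rate_hz, bit_depth, channels=1):
    """Approximate uncompressed data rate, assuming whole-byte sample storage."""
    bytes_per_sample = math.ceil(bit_depth / 8)
    return sample_rate_hz * bytes_per_sample * channels

print(min_sampling_rate_hz(100_000))                               # 200000
print(min_sampling_rate_hz(100_000, include_first_harmonic=True))  # 400000
print(bytes_per_second(200_000, 14) * 60 / 1e6, "MB per minute")   # 24.0
```

Under these assumptions, one minute of single-channel recording at the minimum rate already occupies about 24 MB, which is why file management becomes a practical concern at higher sampling rates.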
Aside from sampling characteristics, microphone selection can also dramatically influence the quality of recorded data. Two important characteristics of microphones pertaining to the current discussion are their field size and frequency–response curves. A narrower field of recording will generally result in lower background noise levels, and thus a higher signal to noise ratio (SNR), but will result in inaccurate representations of a call if the subject moves out of the field of recording. Wider fields reduce the likelihood of this happening, but carry the burden of reduced SNR. Microphones also typically have frequency–response curves that effectively “tune” the microphone to specific frequency ranges. When selecting a microphone, the frequency–response curve should be carefully examined to ensure that it has a relatively flat response across the entire frequency range of interest. A non-flat response will cause inaccurate representations of amplitude and varying SNR dependent upon the frequency of the sound, and will dramatically confound any amplitude data collected. In addition to microphone characteristics, amplitude data can also be confounded by the signal gain (amount of amplification). To avoid such potential errors, microphones should be calibrated to a sound of known amplitude, and recalibrated routinely to ensure continuity between subjects. Given the demands placed upon hardware to obtain such high fidelity recordings, specialized hardware is required, such as the Avisoft Ultrasound Gate at a minimum, or the assembly of a custom solution, which can result in higher fidelity recording than commercially available systems. Avisoft offers numerous solutions to record at sampling rates up to 1MS/s at 16-bit resolution, with multi-channel inputs also available to allow data collection from numerous subjects simultaneously. Avisoft also offers an assortment of microphones with excellent frequency–response curves.
Following the conceptual framework of the approach of this paper, we propose a set of measures that reflect the graded and dynamic qualities of crying that are at once sensitive to the neurobehavioral integrity of the infant and perceptually salient to the social environment. This set of measures borrows heavily from the acoustic and temporal characteristics previously developed in the human infant cry literature, but also includes the categorization of vocalization patterns developed in the comparative literature. The following description of the creation of translational measures of crying in human infants and rat pup USVs is based on pilot work in our laboratories.
Table 1 provides a list of the measures of infant crying, and their definitions, included for the development of our translational methods. The process by which these measures were obtained is described below.
For our pilot analyses, infant crying was elicited and recorded via a standard method in a quiet room, isolated from extraneous noises. Previous research has used painful stimulation such as rubber band snaps (Zeskind and Lester, 1978) and mechanical stimulators (Lester et al., 2003) to elicit the cry sound. These methods allow for control of the intensity of the eliciting condition, clear onset of the vocalization and measures of threshold (number or amount of stimulation required to elicit a standard, sustained cry) and latency (from stimulus to cry onset) and appear to “bring out” hyperphonation in infants with disrupted neurobehavioral function (Zeskind, 1983). In clinical settings, cry sounds have also been recorded during blood withdrawal from the infant (e.g., Zeskind et al., 2005), but measures of latency and threshold are more difficult to determine using this method. While the use of sudden, painful eliciting methods may still be valuable in translational research, we elicited cries by placing the infant on a cold scale used to weigh the infant. Because variations in temperature have been used to elicit vocalizations in rat pups, this method may provide a common means by which human infant and rat pup vocalizations can be more directly compared. We maintained the scale at a constant cold temperature and used an audible tone to indicate the precise moment at which the infant was placed on the scale to obtain an accurate measure of latency.
Recording the infant cry sound does not require sophisticated equipment. We used an Olympus DM-20 digital recorder (44.1kHz sampling rate). The microphone associated with this unit was held at a standard distance of 20cm vertically and approximately 3–4′′ horizontally (mid-sternum) from the infant’s mouth. We continued recording for at least 30s from the point of placement on the cold scale to be able to analyze a complete 30-s segment of crying once it began. Noting the lack of infant crying also provides important data, often reflecting poor neurobehavioral regulation. Perhaps it should be emphasized that if the infant does not cry, the intensity of the eliciting stimulus should not be increased to achieve a cry sound. We used the Multi-Speech Lab (MSL, KayPentax) software program to manually analyze the infant cry sounds for their temporal and spectral characteristics. Although manual assessment was highly time-intensive, the program provides macros and other forms by which repeated procedures can be simplified. A standard configuration file was created to preset window types and sizes and down-sample the cry from 44,100 to 22,050Hz, thus allowing for higher resolution measurements (±21Hz) of frequencies up to 11kHz. The MSL software presented a digital spectrographic display of the entire 30-s recording from which cry expiratory sounds were differentiated from non-cry utterances (NCU: fusses, whimpers) and significant non-cry utterances (SNCU: higher amplitude, non-cry sounds) and marked for analysis. Unlike previous analyses of just the first or first three expiratory sounds of infant crying, all expiratory cry sounds in the 30-s sample were subjected to temporal and spectral analysis to examine the dynamic quality of the cry sound.
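The relationship between sampling rate and frequency resolution noted above can be sketched as follows. The FFT length of 512 points is an assumption chosen because it reproduces a resolution of roughly ±21 Hz at 22,050 Hz; the actual window settings in the configuration file are not stated here.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented at a given sampling rate."""
    return sample_rate_hz / 2

def bin_width_hz(sample_rate_hz, fft_points):
    """Spacing between adjacent frequency bins of an FFT."""
    return sample_rate_hz / fft_points

FFT_POINTS = 512  # assumed transform length, for illustration only

for rate in (44_100, 22_050):
    half_bin = bin_width_hz(rate, FFT_POINTS) / 2
    print(f"{rate} Hz: up to {nyquist_hz(rate):.0f} Hz, resolution ±{half_bin:.1f} Hz")
```

With the transform length held constant, halving the sampling rate halves the bin width, at the cost of a lower upper frequency limit and an analysis window that spans twice as much time.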
Latency and all measures of the temporal morphology of the 30-s cry bout were determined with a displayed 6-s window of cry sound to increase the temporal resolution of measurement (±0.005s). The software produced a digital display of the differences in time between adjacent cursor placements. Latency was defined as the duration (in seconds) from the tone indicating placement of the infant on the cold scale until the first cry expiratory sound or SNCU. Expiration duration was defined as the duration of the expiratory sound, excluding breath holding before inspiration. The inter-cry-interval can be defined in two ways. The first definition defines this interval as the duration of time from the end of one expiratory cry sound to the beginning of the next expiratory cry sound. The benefit of this definition is that it provides a measure of the duration of pauses between expiratory periods (along with the very short inspiratory period). A second definition defines the inter-cry-interval as the duration in time from the beginning of one expiratory sound to the beginning of the next. This measure more closely approximates the inter-cry-interval produced by such automated programs as Avisoft-SASLab Pro used in the analyses of rat pup USVs.
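The temporal measures defined above can be sketched from a list of marked onsets and offsets. The times below are hypothetical cursor placements (in seconds), and the function names are ours for illustration.

```python
def latency(tone_time_s, expirations):
    """Time from the placement tone to the onset of the first marked sound."""
    return expirations[0][0] - tone_time_s

def expiration_durations(expirations):
    """Duration of each expiratory sound (offset minus onset)."""
    return [offset - onset for onset, offset in expirations]

def intervals_offset_to_onset(expirations):
    """First definition: end of one expiration to the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(expirations, expirations[1:])]

def intervals_onset_to_onset(expirations):
    """Second definition: start of one expiration to the start of the next."""
    return [nxt[0] - cur[0] for cur, nxt in zip(expirations, expirations[1:])]

# Hypothetical (onset, offset) times of three expiratory sounds; tone at 0.0 s.
marks = [(0.5, 1.5), (2.0, 3.25), (4.0, 4.75)]

print(latency(0.0, marks))               # 0.5
print(expiration_durations(marks))       # [1.0, 1.25, 0.75]
print(intervals_offset_to_onset(marks))  # [0.5, 0.75]
print(intervals_onset_to_onset(marks))   # [1.5, 2.0]
```

The onset-to-onset interval equals the expiration duration plus the offset-to-onset pause, so the two definitions can be converted into one another when both onsets and offsets are logged.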
To obtain measures of the selected spectral characteristics, an FFT was conducted on the 25-ms sample at which the F0 reached its highest point (in hertz) in each expiratory cry sound (Peak F0) across the entire 30-s sample. Four measures were obtained from the resulting power spectrum: (1) the frequency (hertz) and (2) relative amplitude (decibel) of the fundamental frequency and (3) the frequency (hertz) and (4) relative amplitude (decibel) of the dominant frequency (the peak with the highest power in the power spectrum at the Peak F0). A subjective determination of the amount of dysphonation in each cry expiratory sound was also made. We used a scale of 0–4 that has proved to be a reliable and useful method to evaluate this aspect of the acoustic structure (Tutag-Lehr et al., 2007): 0=none, 1=slight, 2=moderate, may occur at Peak F0, 3=harmonic structure mostly obscured, 4=harmonic structure totally obscured, unable to obtain frequency measures. Maximum amplitude (decibel) was determined from an energy contour of each expiratory cry.
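A minimal sketch of how the four spectral measures relate to one power spectrum is given below. It uses a synthetic 25-ms frame and a plain discrete Fourier transform with a Hann window; the signal, the window choice, and the function names are assumptions for illustration and do not reproduce the MSL software. The fundamental is supplied by the user here, mirroring the manual cursor placement described above.

```python
import cmath
import math

def power_spectrum_db(frame, sample_rate_hz):
    """Naive DFT of one frame: list of (frequency_hz, power_db) below Nyquist."""
    n = len(frame)
    # Hann window limits leakage from tones that fall between frequency bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        spectrum.append((k * sample_rate_hz / n,
                         10 * math.log10(abs(coeff) ** 2 + 1e-12)))
    return spectrum

def dominant_peak(spectrum):
    """Bin with the highest power: (dominant frequency, relative amplitude)."""
    return max(spectrum, key=lambda fp: fp[1])

def nearest_bin(spectrum, target_hz):
    """Bin closest to a user-identified frequency such as the fundamental."""
    return min(spectrum, key=lambda fp: abs(fp[0] - target_hz))

# Synthetic frame: weak 480-Hz fundamental plus a stronger harmonic at 960 Hz.
RATE = 22_050
N = int(0.025 * RATE)  # 25-ms frame
frame = [0.4 * math.sin(2 * math.pi * 480 * i / RATE)
         + 1.0 * math.sin(2 * math.pi * 960 * i / RATE) for i in range(N)]

spectrum = power_spectrum_db(frame, RATE)
f0_hz, f0_db = nearest_bin(spectrum, 480.0)
dom_hz, dom_db = dominant_peak(spectrum)
print(f"F0 {f0_hz:.0f} Hz; dominant {dom_hz:.0f} Hz, {dom_db - f0_db:.1f} dB above F0")
```

In this synthetic case the dominant frequency lies in a harmonic rather than in the fundamental, which is the situation the separate dominant-frequency measures are meant to capture.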
Unlike the more constrained environmental conditions involved in recording infant cry sounds, recording infant rat pup vocalizations can be approached from a wide range of ecologically relevant circumstances, depending on the specific questions asked. As such, we do not present specific recommendations for such details as microphone placement and other specific methodological concerns, other than addressing the challenges posed in the previous section. Based on our pilot analyses, however, we can directly address challenges regarding management, storage, and analysis of very large data files that are created by the high sampling rates necessary for recording and analysis. In cases where the first 60 USVs of older rat pups who frequently vocalize are of interest (e.g., Barron and Gilbertson, 2005), the duration of recording time will be shorter and file sizes will be significantly smaller. Longer durations of recording, perhaps as part of examining experimental manipulations of environmental conditions, will perhaps better capture the dynamic nature of the USVs. In the latter case, dividing the overall recording period into subsets of shorter durations can create smaller file sizes. Software-based triggering systems that continuously monitor data coming from the microphone, and only save data that exceeds a pre-specified amplitude threshold, may also be used. Recording is automatically terminated after a set duration in which no sounds that exceed threshold occur and begins once threshold is again exceeded. Because files do not contain long periods of silence when no vocalizations occur, this method results in smaller file sizes. Smaller file sizes are easier to manage, store, and download. On the other hand, this method produces many more files to manage and individually analyze, thus making it less optimal to analyze automatically long recording periods in one pass.
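The triggering logic described above can be sketched as follows. This is a simplified illustration operating on a list of amplitude values; the threshold, the hold duration, and the function name are hypothetical and do not describe the internals of any commercial recording software.

```python
def triggered_segments(samples, threshold, hold_samples):
    """Return (start, end) index pairs of the stretches that would be saved.

    A segment opens at the first sample whose magnitude exceeds the threshold
    and closes once `hold_samples` consecutive samples have stayed at or
    below it. The end index is exclusive.
    """
    segments = []
    start = None
    last_loud = None
    for i, value in enumerate(samples):
        if abs(value) > threshold:
            if start is None:
                start = i          # threshold exceeded: begin saving
            last_loud = i
        elif start is not None and i - last_loud >= hold_samples:
            segments.append((start, i))  # quiet for long enough: stop saving
            start = None
    if start is not None:          # recording ended while still saving
        segments.append((start, len(samples)))
    return segments

# Hypothetical amplitude values with two bursts separated by silence.
signal = [0, 0, 5, 6, 0, 0, 0, 0, 7, 0, 0, 0]
print(triggered_segments(signal, threshold=3, hold_samples=3))  # [(2, 6), (8, 11)]
```

Each returned pair corresponds to one saved file, which shows both the benefit (silence is discarded) and the cost (many small files) of this approach.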
Table 2 provides a list of the measures of rat pup USVs and their definitions included for the development of our translational methods. An asterisk by the name of the variable denotes the name of the variable in the software program, Avisoft-SASLab Pro. The process by which these measures were obtained is described below.
For our pilot work, we used a software-based triggering system to capture USV production over an extended period of time. Analysis of the classification and spectral characteristics of the USVs required each file to be subjected to three procedures. First, a sound spectrogram of each file was created to identify and differentiate USVs from incidental sounds. Such noises commonly occurred alongside vocalizations and had to be manually deleted from the file to eliminate their interference in the automated analysis of the spectral characteristics. Second, classification of USV waveforms required creation of another spectrogram of a standard duration and frequency range to maintain the temporal and frequency resolution of the spectrogram. Higher resolutions were necessary to identify the shape of frequency modulations. We down-sampled each file to 250kHz and then viewed and printed it in standard 3-s sections with a frequency range of 200kHz. Among various available software programs to be used for these first two steps, we found Adobe Audition 3 to be preferred for its resolution, clarity, and ease of use. USV waveforms were frequently difficult to classify and could arguably have been placed in more than one category. As such, a consistent set of rules by which waveforms were classified needed to be established. Reliability required extensive training and retesting in our pilot testing.
The third step in these analyses was to conduct automated assessments of the spectral and temporal features of the rat pup USVs. We found Avisoft-SASLab Pro to provide ease of use, a wide range of selectable acoustic characteristics to measure, batch file processing, powerful sound analysis options and good file management. For measurement of fundamental frequencies up to 125kHz, files were down-sampled to 250kHz for improved resolution. As seen in Table 2, in addition to the measures that were comparable to those used in the analysis of infant crying, two measures of the variability of the fundamental frequency and amplitude of each vocalization were also obtained. The variability of these measures of vocalizations may provide important information regarding the stability of neural processes and/or the degree of frequency modulation. To ensure that all measures were obtained from the fundamental frequency, an “eraser-type” cursor was used to remove harmonics from the spectrogram. To the extent that all measures of human infant crying were obtained from the fundamental frequency, removal of the harmonics in the rat pup USVs increased the translational value of the analyses. However, removal of harmonics may also systematically remove acoustic information that may differentiate groups that are being compared. For example, the frequency with the highest amplitude may sometimes occur in the first harmonic. Forcing analyses to obtain information only from the fundamental frequency may bias results in still unknown ways. Figure 6 shows a spectrogram created by Avisoft that indicates the points at which the frequency and temporal measures were obtained, along with the listing of those values to be logged.
Based on our pilot work examining the acoustic characteristics of USVs of rat pups at postnatal days (PND) 3 and 5, we created four new classifications that were not evident in the waveforms described by Brudzynski et al. (1999). Finding the presence of additional waveforms may be the result of examining the USVs of pups younger than the 10- to 17-day-old rat pups used to develop the previous classification system. As also seen in Figure 5, three of the categories describe sudden shifts in the fundamental frequency seen across sequential components in a single USV. In our classification system, this shift in the fundamental frequency may or may not have been evident in the harmonics and each component may or may not have resembled one of the original categorizations described by Brudzynski et al. (1999). Our first additional category, Category 10, is defined as a “sudden qualitative upward shift to a higher frequency where the end of the first component is parallel to the start of the second component.” In some cases, the end of the first component is connected visually to the beginning of the second component by a typically lower powered vertical line in the spectrogram. Category 11 is similar, but the shift is descending. Category 12 is defined as a single USV having multiple upward and downward shifts, typically starting with an upward shift in pitch. Category 13 is used as a “catch-all” category of otherwise undefined waveform patterns that show a more organized pattern than evident in Category 9. Figure 7 shows a spectrogram that contains examples of rat pup USVs in Category 12. These shifts in the fundamental frequency may be similar to “frequency steps” described by Sales and Smith (1978) in which mouse pup vocalizations show an instantaneous frequency change in a vertically discontinuous step with no interruption in time. However, there are no known reports of these new specific categories, in general, or in the USVs of rat pups, in particular.
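The classification described above was made by visual inspection of spectrograms. Purely as an illustration of the logic of Categories 10 through 12, the sketch below assigns a category from the start and end frequencies of the sequential components of one USV. The 5-kHz minimum jump, the treatment of ambiguous cases, and the function name are assumptions introduced for this example; they are not rules from our classification system.

```python
def classify_shift_category(components, min_jump_hz=5000.0):
    """Suggest Category 10, 11, or 12 for one USV, or None if none applies.

    `components` is a list of (start_hz, end_hz) pairs for the sequential
    components of a single USV. A shift is counted when the start of one
    component differs from the end of the previous one by at least
    `min_jump_hz` (a hypothetical threshold).
    """
    jumps = []
    for (_, end_hz), (start_hz, _) in zip(components, components[1:]):
        delta = start_hz - end_hz
        if abs(delta) >= min_jump_hz:
            jumps.append(delta)
    if len(jumps) == 1:
        return 10 if jumps[0] > 0 else 11   # single upward or downward shift
    if len(jumps) > 1 and any(j > 0 for j in jumps) and any(j < 0 for j in jumps):
        return 12                            # multiple upward and downward shifts
    return None                              # left for manual classification

print(classify_shift_category([(40000.0, 42000.0), (60000.0, 61000.0)]))  # 10
print(classify_shift_category([(60000.0, 61000.0), (40000.0, 42000.0)]))  # 11
print(classify_shift_category(
    [(40000.0, 42000.0), (60000.0, 61000.0), (41000.0, 40000.0)]))        # 12
```

A rule of this kind could at most pre-sort candidates; waveforms that arguably fit more than one category would still require trained raters.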
The measures of crying described in the preceding sections are designed to (1) reveal the dynamic and graded qualities of infant crying, (2) reflect the neurobehavioral status of human infants and rat pups, and (3) provide systematic quantification of aspects of vocalizations that may be salient to the social environment. Table 3 provides a list of how the measures of human infant crying may correspond to the measures of rat pup USVs as a basis for translational analyses. This list is intended only to offer an initial framework through which vocalizations can be compared across species. While there are many similarities across some of the corresponding measures presented in Table 3, similar measures may actually reflect different neurobehavioral processes in the two species. For example, unlike most mammalian vocalizations that are produced by vibration of the laryngeal folds, including those of human infants, the USVs of rat pups are produced by the passage of air under high pressure between constricted vocal folds. The fundamental frequencies of these vocalizations also have different neural and physiological bases and, thus, may reflect different aspects of neurobehavioral function. The translational utility of the fundamental frequency, as well as the other measures, will need to be determined in future study.
The purpose of this paper was to describe the development of translational methods by which human infant cry sounds and the USVs of rat pups can be used to assess early neurobehavioral development. To this end, we have proposed a novel set of similar measures of vocalizations based on current conceptualizations of this critical early behavior in both the human and comparative literatures. This set of measures includes several indices of the frequency and temporal organization of crying, as well as novel contributions to the classification of acoustic waveforms. The discovered shifts in frequency described in our classification system further the narrative of our understanding of the morphology of rat pup vocalizations, the value of which can only be determined in future investigations. Our provision of some of the challenges underlying a translational approach, which compares the sounds and functional significance of the vocalizations of such disparate species, only begins the discussion of the limitations to our work and this approach. Any comparison between species needs to be conducted with extreme caution. Cries of human infants and rat pups are produced by very different physiological mechanisms and differ in both form and function.
Despite the limitations to our proposed approach, a translational analysis of crying in human infants and rat pups has significant potential benefits to our understanding of how variations in prenatal conditions, including malnutrition, prenatal substance exposure and/or potential teratogens, may impact early neurobehavioral development. The conceptualization of infant crying as a graded and dynamic signal expands our view of the cry from one as a static, unitary sound to one that may contain different information at different points in time. The characteristics of the cry reflect not only the degree of infant arousal at the point of elicitation, but also the changing level of infant arousal in response to variations in both the internal and external environments of infants or pups. The measures of crying selected for this analytic approach were included because of their neurobehavioral and social significance, both of which are critical to understanding the course of subsequent development. An important value of examining infant crying, in this case, is that this behavior contributes to the development of the social environment that guides future development of the infant. As such, different maternal response patterns to the sounds of crying presented here may provide a window into the bidirectional influences between the infant and its environment. In all, the proposals in this paper should be considered a point of departure for future work that may advance our understanding of development, in general, and the contributions of variations in the prenatal environment, in particular.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The project described was supported by Award Number P01DA022446 from the National Institute on Drug Abuse. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Drug Abuse or the National Institutes of Health. Elizabeth T. Cox was supported in part by a NIDA predoctoral training grant (F31 DA030060). Matthew S. McMurray was supported in part by a NIDA predoctoral training grant (F31 DA026251).