The understanding of sociolinguistic variation is growing rapidly, but basic gaps still remain. Whether some languages or dialects are spoken faster or slower than others constitutes such a gap. Speech tempo is interconnected with social, physical and psychological markings of speech. This study examines regional variation in articulation rate and its manifestations across speaker age, gender and speaking situations (reading vs. free conversation). The results of an experimental investigation show that articulation rate differs significantly between two regional varieties of American English examined here. A group of Northern speakers (from Wisconsin) spoke significantly faster than a group of Southern speakers (from North Carolina). With regard to age and gender, young adults read faster than older adults in both regions; in free speech, only Northern young adults spoke faster than older adults. Effects of gender were smaller and less consistent; men generally spoke slightly faster than women. As the body of work on the sociophonetics of American English continues to grow in scope and depth, we argue that it is important to include fundamental phonetic information as part of our catalog of regional differences and patterns of change in American English.
Humans vary in how they produce speech, and those differences can depend on a number of factors. For instance, speech tempo can be inherently speaker-specific, pertaining to the inherent speed of articulatory movements which define unique speaker characteristics along with other variables such as voice, use of prosody, or pausing. The within-speaker variation in speech tempo is systematically affected by the length of the utterance, discourse complexity, formality, affect, mood and communication style in noisy environments or over a longer distance, to name a few. Yet, in addition to the complex interaction of the within-speaker factors, there are other powerful sources of variation in speech tempo related to social variables, most notably speaker age, gender, geographic region of origin, place of residence, education, occupation and socio-economic status. The latter group of factors determines the between-speaker variation in speech tempo, which has been a topic of fruitful investigation (e.g., Byrd, 1994; Hewlett & Rendall, 1998; Smith, Wasowicz & Preston, 1987).
Researchers have long recognized that variation in speech tempo is a way of marking individual speaker characteristics. Abercrombie (1967:7–9) and Laver & Trudgill (1991: 237ff.) have proposed “a typology of markers of identity in speech,” including three types:
This typology emphasizes that there are several quite divergent aspects of speaker identity about which speech tempo may convey information. In contrast, many of the most familiar sociolinguistic variables (negative concord, “g-dropping,” /ai/ monophthongization, /æ/ raising, quotative like or “uptalk”) are perceived by speakers in terms of relatively focused social meanings, correlating primarily with social status or educational background (negative concord or g-dropping), regional speech (monophthongization or raising), or age (like or uptalk). This complexity alone makes speech tempo an important subject for the study of language variation and change: it is necessarily interconnected with social, physical and psychological markings of speech. However, there is a basic lacuna in our understanding of speech tempo. Even the fundamental patterns of regional variation in how fast people speak have not been securely demonstrated.
This paper presents results of an experimental investigation of speech tempo which focuses primarily on regional variation in American English. In particular, the present experiments examine one pervasive stereotype about American dialects: the notion of “slow-talking Southerners” and “fast-talking Northerners” (Niedzielski & Preston, 1999; www.pbs.org/speak/speech/prejudice/attitudes).1 The examination of Southern and Northern speech is carried out in a controlled setting (reading task) as well as in an uncontrolled condition (free or spontaneous speech). To gain a fuller account of potential differences in speech tempo between the speakers from the South and the North, the study includes two additional between-speaker variables: age and gender. It is of interest whether the speech of “Southerners” (men and women, younger and older) is indeed slower than the speech of “Northerners” under two different levels of formality (reading versus speaking). On the other hand, some of the between-speaker effects are predicted to remain unaffected by regional variation, based on previous reports which are discussed below. These include the effects of age (speech tempo is expected to be faster for younger than for older speakers) and gender (speech tempo is expected to be faster for men than for women). We may also expect slower tempo in reading as compared to speaking. How well the present results conform to these predictions is assessed by measuring the articulation rate for each individual speaker in the study.
In this paper we operationally define speech tempo as the articulation rate. In the phonetics literature, the terms “speaking rate,” “speech rate,” and “articulation rate” have tended to be used interchangeably to indicate speech tempo, i.e. the pace at which a stretch of connected discourse is delivered by the speaker. The consensus today is that while speaking/speech rate and articulation rate are both defined as “the number of output units per unit of time” (Tsao, Weismer & Iqbal, 2006:1156), speaking/speech rate includes pause intervals while articulation rate does not. Articulation rate determines the pace at which speech segments are actually produced and does not take into account speaker-specific ways of conveying information, such as hesitations, pausing, emotional expressions, and so on. Speech (or speaking) rate, on the other hand, captures more “global” speaker characteristics including frequency of pausing and the use of laughter or fillers such as “you know” or “I mean” which, inserted in a stretch of discourse, cause an interruption of fluency and define a speaker-specific communication style. In this paper, measuring the articulation rate rather than speaking/speech rate will give us a better estimate of cross-dialectal differences in speech tempo as it will eliminate the use of pauses as an additional within-speaker variable. In this regard, the paper follows a more recent approach that studies regional variation in speech tempo by measuring the articulation rate rather than the speaking rate (e.g., Quené, 2008; Verhoeven, De Pauw & Kloots, 2004).
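The distinction between the two measures can be illustrated with a minimal Python sketch. The function names and the numeric values are hypothetical and serve only as an illustration; they are not data from the present study.

```python
def speech_rate(n_syllables, total_duration_s):
    """Speaking/speech rate: syllables per second, pause intervals included."""
    return n_syllables / total_duration_s


def articulation_rate(n_syllables, total_duration_s, pause_durations_s):
    """Articulation rate: syllables per second after pause time is removed."""
    articulated_time = total_duration_s - sum(pause_durations_s)
    if articulated_time <= 0:
        raise ValueError("pause time must be shorter than the total duration")
    return n_syllables / articulated_time


# Hypothetical utterance: 20 syllables in 5.0 s, including two pauses.
n_syll = 20
duration = 5.0
pauses = [0.6, 0.4]

sr = speech_rate(n_syll, duration)                # pauses count as speaking time
ar = articulation_rate(n_syll, duration, pauses)  # pauses are subtracted first
```

Because pause time is removed from the denominator, the articulation rate of an utterance can never be lower than its speech rate, and the two coincide only when the utterance contains no pauses.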
Speech tempo has been shown to vary not only across individuals but also across regional variants of the same language. Byrd (1994) provides the most compelling evidence on regional differences in speech rate (including the pauses) in American English, drawing on the TIMIT database. She finds a slower speech rate for Southerners and a faster rate for those from the North, broadly speaking, while “Army Brats” (who had lived in three or more areas) spoke the fastest. Southern and South Midland speakers produced the most pauses, and North Midland, Western, and “Army Brat” speakers produced fewer than would be expected given a random distribution, as determined by a chi-square test. As discussed below, however, the TIMIT database has serious limitations which do not always allow for a conclusive assessment of regionally defined differences in speech rate in American English. Ultimately, few studies to date have provided direct evidence that overall speech rate differs regionally.
As an example, Ray & Zahn (1990) investigated speech rate in 93 speakers from Utah, Oregon, and Washington in the Pacific Northwest, Texas and Louisiana in the Southwest, and Wisconsin and Ohio in the Upper Midwest. The speech samples were taken from public speaking and group discussion, both in university settings. Context category (public speaking versus discussion) was the only significant factor in the variation in speaking rate, to the exclusion of gender and region. The division of speakers was done entirely by the university they attended (i.e. without evidence on where they actually grew up). Even aside from that, the regions do not align with current views of American dialect areas. For instance, in Labov’s maps (e.g. Labov, Ash & Boberg, 2006:142, 148), Wisconsin typically includes large areas corresponding to North Central/“North” and Inland North; some other dialect maps also show a small patch of Midlands. In contrast, Ohio is mostly Midlands, with a band of the Inland North, along with various transition areas and small areas belonging to other dialect areas, including Southern (especially along the Ohio River).
As pointed out in a number of studies, young adults tend to speak faster than older adults (e.g. Quené, 2008; Verhoeven et al., 2004). This has also been discussed by Smith, Wasowicz & Preston (1987) and treated more recently by Yuan, Cieri & Liberman (2006, with references to earlier work). Seeking an explanation of such patterns, Ramig (1983) cited physiological factors such as “visual acuity, processing time, general neuromuscular slowing, peripheral degeneration of the speech mechanism, and psychosocial variables” (p. 224) as possible reasons why the physical condition of an elderly person affected their speaking rate. The findings by Quené (2008) indicate that older speakers produce relatively shorter phrases than younger speakers, and the difference in phrase length may explain some of the age-related effects.
With regard to gender, most available evidence suggests that men actually speak somewhat faster than women, as found for example by Byrd (1994). Yuan et al. (2006) report a small but significant difference in the same direction. Outside of North America, Whiteside (1996) examined the characteristics of read speech in three women and three men speakers with a British General Northern accent. Whiteside noted that the small sample limits the conclusions that can be drawn, but results showed that women had longer mean sentence durations, and also had higher standard deviations. They also paused more frequently than the men. Other work on gender and speech rate has been reviewed recently by Heffernan (2007).
"Rate of speaking" in the informal sense is involved in a cluster of other stereotypes, including the view that some languages are spoken faster than others (Roach, 1998) or that urban speech is faster than rural speech. Hewlett & Rendall (1998) examined the claim that urban dwellers speak more quickly than rural dwellers by comparing the speech rates of 24 speakers from the rural Orkney Islands, north of Scotland, and from urban Edinburgh. The speakers read a passage and were interviewed for several minutes about their lives. From these data, both speaking and articulation rates were calculated in syllables per second (counting only turns with ten or more syllables). They found little difference in speaking rate while the speakers were reading, but in conversation mode the Edinburgh group was slower than in their reading rate, although this difference was not significant. Articulation rate also showed no significant differences in reading rate of the two groups but the Orkney group had a significantly faster rate in conversation than the Edinburgh group. Thus, results failed to support the claim that urban speech is faster than rural speech.
Speaking rate and reading rate were directly compared by Crystal & House (1982) who noted that speakers using “less formal production” (p. 706) – that is, informal conversations rather than reading – showed more temporal syllable reductions in their speech, which increased their speaking rates. In another study, Hirose & Kawanami (2002) examined dialogue versus read speech, and focused on the prosodic differences that exist between the two, noting that “dialogue speech generally shows wider dynamic ranges in its prosodic features” (p. 97), such as in tone and rhythm, as well as a higher speech rate than read speech. Howell & Kadi-Hanifi (1991) also examined prosodic differences in reading and speaking. Six speakers spoke spontaneously (describing a room of their choice), and then three months later, after transcribing their response, investigators had those speakers read what they had responded, as well as the response of two other speakers, in order to compare speaking styles. They found that speakers produced a “larger number of short tone units” (p. 166) while reading, indicating more fragmentation and a more formal style of speech. They concluded that this variation in stress placement did not produce significant differences in speaking rate, but did observe that speaking rate “changes with the mode of delivery” (p. 168).
As this brief review indicates, factors such as regional affiliation of the speaker, age, gender and speaking style have been found to produce differences in speech tempo in a systematic way. In this study we are primarily interested in regional differences in articulation rate and their manifestations across speaker age, gender and speaking situations (reading vs. speaking). At present, we focus only on the selected social and physical markers as potential predictors of differences in the articulation rate and do not address the contribution of psychological markers. The questions underlying our investigation center on the strength of a given predictor in making broader generalizations. Specifically, can we legitimately claim that certain regional varieties of American English are spoken faster (or slower) than others? If the differences do exist, are they manifested in both speaking and reading? Do older adults always speak more slowly than younger adults? Do men always speak faster than women?
The present paper has one additional aim. Studies by Crystal & House (1988a, b) and Jacewicz et al. (2006, 2007) indicate that Northern vowels are in fact shorter than Southern vowels. This may suggest that there is a relationship between vowel duration and variation in speech tempo so that Southern vowels reflect the overall slower tempo of the Southern speech whereas the shorter Northern vowels correspond to the faster tempo of Northern speech. However, this correspondence may not hold for all vowels as pointed out by Clopper, Pisoni & de Jong (2005:1665) who found that vowel duration differences are “selective in nature,” with Southern lax vowels significantly longer than those of other Americans. The present study seeks to find a more conclusive account of regional variation in speech tempo which may also be carried on to smaller temporal units such as phrasal or segmental durations.
In the present paper, we report on a study that compares articulation rates of Northern speakers born and raised in south central/southeastern Wisconsin with those of Southern speakers born and raised in westernmost North Carolina. Subjects read sentences aloud and engaged in free conversation, providing two very different samples, and yielding clear differences between dialects. While our primary focus is on regional difference in speech tempo, our sample yields data on age — contrasting young adults (20–34 years old) with older adults (51–65 years old) — and gender — contrasting men’s and women’s speech.
A total of 94 speakers participated in the study. All were born, raised, and spent most of their lives in either South-central Wisconsin or Western North Carolina. The speakers fell into two age groups. There were 40 older adults aged 51–65 years, 20 from Wisconsin and 20 from North Carolina who were evenly divided by gender (10 men and 10 women), in each region. There were also 54 young adults aged 20–34 years. In this group, 36 speakers were the original participants of the study (18 speakers from Wisconsin and 18 from North Carolina, 9 men and 9 women in each region) and 18 additional speakers (6 from Wisconsin and 12 from North Carolina) were recorded at a later time. All Wisconsin speakers came from the Madison area and areas east, well within the parts of the state defined as Labov et al.’s (2006) “Inland North” dialect, the area characterized by the Northern Cities Shift. The North Carolina speakers came from the Sylva, Cullowhee, and Waynesville areas (Jackson, Swain and Haywood counties). Geographically, these participants yielded a highly homogenous sample of varieties of Northern and Southern speech, respectively.
The speakers were also comparable in terms of educational background and socioeconomic status. Most of the young adults were students at either University of Wisconsin-Madison or Western Carolina University. The older adults had mostly college education except for two Wisconsin speakers and six North Carolina speakers, three of whom worked as teacher’s assistants and a librarian. As Table 1 and Table 2 summarize, the occupations of the older adults in this study do not reflect a sharp regional difference in terms of socioeconomic status. To a limited extent, Wisconsin speakers are somewhat more metropolitan than North Carolina speakers. Still, these groups do not reflect a clear urban/rural split, something which has been argued to correlate with speech differences in the South (see Tillery & Bailey, 2008, and Thomas, 2008). Although some of the Wisconsin speakers grew up in Madison (which, according to U.S. Census data, reached a population of 203,704 in 2005), many of the younger and older subjects came from small towns such as Beaver Dam, Fond du Lac, Stoughton, West Bend and New Glarus. The small towns in North Carolina where our subjects grew up (Dillsboro, Waynesville, Sylva, Webster, Cullowhee and Whittier) are located close to the Great Smoky Mountains National Park, a popular tourist destination, and these small towns experience a steady flow of visitors during the summer and autumn months. Moreover, the entire region has experienced increased in-migration in recent years and the picturesque area continues to attract new residents from other states. Overall then, the subject populations in this study are generally similar in basic demographic terms.
Two types of recorded speech were obtained, read sentences and spontaneous talks. Each speaker read a set of 240 contextually and prosodically constrained sentences which were constructed to elicit variable emphasis in vowel production, which was a focus of a larger project in our lab. In these sentences, main sentence stress was systematically manipulated as in the examples below (see Appendix 1 for a complete set of the recorded sentence material):
The complete sentence material was recorded by 76 speakers (40 older adults and 36 young adults). One advantage of collecting read sentences instead of a longer passage of read discourse was that it led to better control of the stress placement and fluency in reading. That is, the sentence material created a testing condition in which all speakers emphasized a particular word and read the phrase without a pause or hesitation. More within-speaker variability can be expected in reading a longer text which introduces considerable differences in reading style, tempo, repetitions, corrections, hesitations, etc. Since we were interested in articulation rate which is linked to the speed of movement of articulators in a unit of time, fluency in reading was essential to obtaining a representative sample of speech tempo.
The second type of recorded speech consisted of a short informal and unconstrained talk, whose duration ranged from 10 to 15 minutes. Most speakers recounted stories from their lives or spoke about their families, friends, hobbies, and their daily activities. They were instructed to speak at their typical tempo and in their typical manner, and were told that the topic of their talk was not of interest to the research. Rather, they were told that the focus of the study was to examine variation in pronunciation across different regions in the United States. Recordings of a talk were obtained from each of the older speakers. However, not all young speakers who read the sentence material were recorded producing an informal talk. For that reason, 6 new Wisconsin speakers and 12 new North Carolina speakers were brought into the study. These speakers recorded the informal talks only. Altogether, the recordings of the talks were obtained from 40 older speakers and 40 young speakers (20 from Wisconsin and 20 from North Carolina in each age group, evenly divided by gender: 10 men and 10 women) for a total of 80 speakers.
Recording of sentences was controlled by a custom program written in Matlab. The sentence pairs appeared on a computer monitor in random order. The participant read the sentence pair speaking into a head-mounted microphone (Shure SM10A), placed at a 1-inch distance from the lips. The sentences were recorded directly onto a hard disk drive at a sampling rate of 44.1 kHz. Only fluently read sentences with a proper stress placement were accepted by the experimenter. The recordings were repeated as many times as needed to obtain satisfactory productions. The spontaneous talks were recorded in the same session using the Adobe Audition waveform editing program. Two female research assistants helped with data collection, one in Wisconsin and one in North Carolina. All participants within each state were thus recorded by the same experimenter. In general, the spontaneous talks produced by North Carolina speakers were mostly uninterrupted by the experimenter. The speakers clearly enjoyed sharing stories from their lives and mostly did not require prompting. Wisconsin speakers ran out of topics more often and needed a leading question when they stopped talking.
Only 120 sentences from each participant were analyzed for the present study, for a total of 9120 sentences. The second sentence in the pair was chosen because initial analyses indicated significant differences in the number of hesitations and pauses between the two sentences. The second sentence in the pair was also produced more fluently than the first by most of the participants. All acoustic waveform analyses were done using the Adobe Audition waveform editing program. The locations of sentence onsets and offsets were determined by hand and these values served as input to a Matlab program which calculated the overall sentence duration automatically, displaying the onset and offset markings in the waveform for the researcher to examine. A reliability check was performed by a second researcher on all measurements using the same Matlab program with graphical display of sentence onsets and offsets. Agreement between these two researchers was essentially 100% as all disagreements in measurements were noted and resolved. The average articulation rate for each sentence was measured in syllables per second, which was calculated by dividing the total number of syllables by the sentence duration. There were seven syllables in each sentence and the syllable count for each sentence was verified by a researcher who performed the reliability check on the whole data set. The word “No” was excluded from the analyses. Because the focus of the study was to obtain fluent productions from each speaker, the pauses between words were very sparse throughout the entire sample. Nevertheless, if found, they were edited out and subtracted from the duration of the sentence.
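The calculation for read sentences can be sketched in Python as follows. This is only an illustration under stated assumptions: the original analysis used a custom Matlab program that is not reproduced here, the function name is hypothetical, and the onset, offset and pause values are invented for the example.

```python
SYLLABLES_PER_SENTENCE = 7  # from the text: seven syllables per analyzed sentence


def sentence_articulation_rate(onset_s, offset_s, pause_durations_s=()):
    """Return syllables per second for one read sentence.

    onset_s and offset_s are the hand-marked sentence boundaries in seconds.
    Any pauses found inside the sentence are subtracted from its duration.
    """
    duration = (offset_s - onset_s) - sum(pause_durations_s)
    if duration <= 0:
        raise ValueError("sentence duration must be positive after pause removal")
    return SYLLABLES_PER_SENTENCE / duration


# Hypothetical fluent sentence lasting 2.0 s.
fluent = sentence_articulation_rate(0.25, 2.25)

# The same hypothetical sentence with a 0.25 s pause edited out.
with_pause_removed = sentence_articulation_rate(0.25, 2.25, [0.25])
```

Subtracting a pause shortens the denominator, so the second call returns a higher rate than the first for the same stretch of signal.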
To calculate articulation rate for spontaneous speech, the talks were first transcribed. Two types of orthographic transcripts were created. In the first transcript, all words and sounds (such as hesitations or laughing) coming from the speaker were written down. The second transcript focused on fluent phrases only and eliminated all non-fluent productions identified in the first transcript. For the present purposes, the phrase was defined as a string of words containing five or more syllables uttered without a pause. These phrases were numbered consecutively for each subject and the articulation rate was calculated for these fluent phrases only. An example of the second transcript is given in Appendix 2. Next, the onset and offset of each phrase were measured using a waveform editor (Adobe Audition) and the articulation rate was calculated in the same manner as for read sentences, i.e. by dividing the number of spoken syllables, as determined by the experimenter, by the duration of each phrase. After listening to each phrase and marking its temporal onsets and offsets, the experimenter counted the number of syllables it contained based on the spoken utterance (and not on its orthographic notation). A total of 4930 phrases were analyzed from all speakers and the number of syllables per phrase varied (mean = 12.7 syllables per phrase, s.d. = 6.9). As for the read sentences, a reliability check was performed by a second researcher using a dedicated custom Matlab program.
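The procedure for spontaneous talks can be sketched in the same way. The sketch below assumes that each fluent phrase is stored as a tuple of syllable count, onset and offset, and that a speaker's rate is the mean over that speaker's phrases; both the data layout and the averaging step are assumptions made for illustration, and the example values are invented.

```python
from statistics import mean

MIN_SYLLABLES = 5  # from the text: a phrase has five or more syllables


def phrase_rates(phrases):
    """Return syllables per second for every qualifying fluent phrase.

    phrases is an iterable of (n_syllables, onset_s, offset_s) tuples.
    Stretches shorter than MIN_SYLLABLES are skipped.
    """
    rates = []
    for n_syllables, onset_s, offset_s in phrases:
        if n_syllables < MIN_SYLLABLES:
            continue
        duration = offset_s - onset_s
        if duration <= 0:
            raise ValueError("phrase offset must follow its onset")
        rates.append(n_syllables / duration)
    return rates


# Hypothetical phrases from one speaker; the middle one is too short to count.
example_phrases = [(10, 0.0, 2.0), (3, 2.5, 3.0), (12, 4.0, 6.0)]

rates = phrase_rates(example_phrases)
speaker_mean = mean(rates)  # assumed per-speaker summary
```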
The overall mean articulation rate in read sentences was 3.40 syll/s (s.d.=0.42) whereas the rate in spontaneous talks was 5.12 syll/s (s.d.=0.59). The large difference between the read and spoken productions (the mean rate in reading was only 66.4% of the mean rate in talks) indicated that, in general, articulation rate in reading is much slower than in free speech. Of interest to the study, however, was whether there were significant differences within each production type as a function of speaker dialect, age, and gender. The results were initially assessed by two separate univariate analyses of variance (ANOVA). In the first ANOVA, the dependent variable was articulation rate in reading and the between-subject factors were dialect, age, and gender. In the second ANOVA, the dependent variable was articulation rate in talks and the between-subject factors remained the same. Additional analyses, if necessary, were conducted as detailed below.2
In reading, the overall means were 3.54 syll/s (s.d.=0.34) for Wisconsin speakers and 3.27 syll/s (s.d.=0.44) for North Carolina. The main effect of dialect was significant [F(1, 68) = 12.7, p = 0.001, η2 = 0.157], indicating that Wisconsin speakers demonstrated a faster articulation rate in reading as compared to North Carolina speakers. Using the same reading material and the same measurement criteria, the Wisconsin speakers read at a rate 8% faster than North Carolina speakers. For spontaneous talks, the second ANOVA also revealed a significant main effect of dialect [F(1, 72) = 28.8, p < 0.001, η2 = 0.286]. The overall mean articulation rates were 5.41 syll/s (s.d.=0.48) for Wisconsin speakers and 4.81 syll/s (s.d.=0.54) for North Carolina. Thus the articulation rate for Wisconsin speakers was 12.5% faster than for North Carolina speakers. These findings clearly show that the articulation rate, whether in reading or speaking, is faster for the regional variety of American English spoken in Wisconsin as compared to North Carolina.
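The percentage differences between the two dialect groups can be reproduced from the reported means. The sketch below assumes that the slower group (North Carolina) serves as the baseline; the text does not state the formula explicitly, so this is an assumption that happens to match the reported figures.

```python
def percent_faster(faster_rate, slower_rate):
    """Relative difference in percent, with the slower rate as the baseline."""
    return (faster_rate - slower_rate) / slower_rate * 100.0


# Reported group means in syll/s (Wisconsin vs. North Carolina).
reading_difference = percent_faster(3.54, 3.27)  # read sentences
talk_difference = percent_faster(5.41, 4.81)     # spontaneous talks
```

Rounded, these values correspond to the 8% and 12.5% differences reported for reading and for spontaneous talks, respectively.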
Figure 1 shows the mean articulation rate for young and older adults in Wisconsin and North Carolina for the two types of productions. As can be seen, young adults tend to speak faster than older adults in both reading and spontaneous talks in Wisconsin. However, North Carolina young adults tend to speak faster only in reading but not in spontaneous talks.
The ANOVA results for read sentences showed a significant main effect of age [F(1, 68) = 20.9, p < 0.001, η2 = 0.235]. Young adults’ articulation rate in reading was 11% faster than that of older adults (3.58 syll/s (s.d.=0.44) vs. 3.23 syll/s (s.d.=0.32)). For spontaneous talks, however, the main effect of age was not significant, indicating no differences in articulation rate between the young and older adults (the overall means were 5.18 syll/s (s.d.=0.58) for young adults and 5.04 syll/s (s.d.=0.60) for the older). Since the results for Wisconsin suggested some differences between the two groups (see Figure 1), we conducted additional two-way ANOVAs with the between-factors age and gender separately for Wisconsin and North Carolina talks. The results for Wisconsin showed a significant main effect of age [F(1, 36) = 5.19, p = 0.029, η2 = 0.126] although the effect size was small. Young Wisconsin adults were shown to speak 6% faster than older adults (5.58 syll/s (s.d.=0.41) vs. 5.25 syll/s (s.d.=0.50)). For North Carolina, the main effect of age was not significant.
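The effect sizes reported with the F statistics can be cross-checked from the F values and their degrees of freedom. The sketch below assumes that η2 denotes partial eta squared; the text does not say so explicitly, but the standard formula for partial eta squared reproduces the reported values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics taken from the reported results.
dialect_reading = partial_eta_squared(12.7, 1, 68)    # main effect of dialect, reading
age_reading = partial_eta_squared(20.9, 1, 68)        # main effect of age, reading
age_wisconsin_talks = partial_eta_squared(5.19, 1, 36)  # age, Wisconsin talks only
```

Rounded to three decimals, these give 0.157, 0.235 and 0.126, in agreement with the values reported alongside the corresponding F statistics.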
Altogether, the results show that the articulation rate in reading is faster for young adults as compared to older adults in both Wisconsin and North Carolina. However, in free speech, the results are less consistent. Young adults tend to speak faster than older adults in Wisconsin but not in North Carolina, where young and older adults do not show differences in the articulation rate (4.79 syll/s (s.d.=0.44) and 4.83 syll/s (s.d.=0.64), respectively).
As Figure 2 shows, the differences in articulation rate as a function of speaker gender were very small. As a general tendency, men tended to speak slightly faster than women both in reading (3.48 syll/s (s.d.=0.43) vs. 3.33 syll/s (s.d.=0.40)) and in spontaneous talks (5.2 syll/s (s.d.=0.57) vs. 5.03 syll/s (s.d.=0.61)). For read sentences, the articulation rate for men was 4.5% faster than for women. Although the ANOVA results showed a significant effect of gender [F(1, 68) = 4.06, p = 0.048, η2 = 0.056], its small effect size indicates that gender contributed very little to the variance accounted for. A near-significant interaction between gender and age [F(1, 68) = 3.85, p = 0.054, η2 = 0.054] pointed to the difference between the rates of young men and young women (3.74 syll/s (s.d.=0.42) vs. 3.43 syll/s (s.d.=0.41)). However, although young men tended to read faster than young women, the size of this effect was again very small. For spontaneous talks, the effect of gender was not significant nor were there any significant interactions between gender and any other factor.
We also conducted separate two-way ANOVAs for each dialect with the between-subject factors gender and age. The main effect of gender on either articulation rate (reading or talks) was again not significant for either Wisconsin or North Carolina. However, there was one significant age by gender interaction for read sentences for North Carolina although its effect size was small [F(1, 34) = 4.8, p = 0.035, η2 = 0.124]. This interaction indicated greater differences in the articulation rate between young men and young women as compared to older men and older women. A subsequent two-tailed independent samples t-test showed that the difference for young adults was significant [t=2.7, p = 0.015], showing that young men in North Carolina read 17% faster than young women (3.67 syll/s (s.d.=0.48) vs. 3.14 syll/s (s.d.=0.34)). The difference in speech rate between older men and older women was not significant (3.11 syll/s (s.d.=0.27) and 3.15 syll/s (s.d.=0.47), respectively).
Overall, the effects of speaker gender on articulation rate were small and were found mainly in reading where the articulation rate for men was slightly faster than for women, with the exception of North Carolina young men as compared to North Carolina young women. However, no differences as a function of speaker gender were found in the spontaneous speech where all speakers, whether young or older, spoke equally fast.
The choice of the univariate ANOVAs in the present study was motivated by the fact that not all speakers in the young adult group participated in both tasks, i.e., reading and spontaneous talks. Thus, the results for the articulation rate in read sentences and talks could not be compared directly. However, the older speakers did participate in both tasks. To address the question of whether there are in fact significant differences between the articulation rate in reading and in free speech, a within-subject ANOVA was conducted for the older adults only. In this analysis, articulation rate was the dependent variable, speaking condition (reading and talk) was the within-subject factor, and dialect and gender were the between-subject factors. The results showed a strong significant effect of speaking condition [F(1, 36) = 495.26, p < 0.001, η2 = 0.932]. For the older adults, the articulation rate in reading was significantly slower than that in spontaneous talks (3.23 syll/s (s.d.=0.32) vs. 5.04 syll/s (s.d.=0.60)). The effect of dialect was also significant [F(1, 36) = 6.69, p = 0.014, η2 = 0.157], indicating that Wisconsin speakers demonstrate a faster articulation rate than North Carolina speakers (4.29 syll/s (s.d.=0.44) vs. 3.98 syll/s (s.d.=0.41)). Across the present speaking conditions, Wisconsin speakers spoke about 8% faster. The effect of gender was not significant. There were also no significant interactions.
The present results provide a broad set of data to allow detailed comparison of articulation rate in two regional varieties of American English, one Northern and the other Southern. The study provides strong evidence that, in both reading and spontaneous speech, the articulation rate of the Northern speakers was higher than that of the Southern speakers. The difference was as much as 8% in reading and 12.5% in spontaneous talks. Considering other factors such as age and gender, it was found that young adults read faster than older adults in both the North and the South. However, in free speech, young adults tended to speak faster than older adults in the Northern dialect but not in the Southern dialect, where the articulation rates of young and older adults did not differ. The effects of gender were smaller and less consistent. In general, men tended to speak a little faster than women but the differences were negligible. There was one exception to this trend among young adults in North Carolina. In reading, the articulation rate of young Southern men was significantly higher than that of young Southern women.
Comparing the present results with those in previously published studies, we detect numerous important similarities as well as a few differences. In particular, Byrd (1994) reported notable differences in speech rate in the TIMIT database across eight broadly defined dialect regions in the United States. The corpus included read sentences only and pauses were included in the calculation of speech rate. The speakers were mostly young and mostly male. It was found that the speech rate among Southern speakers tended to be slower than among the Northern speakers. In terms of speaker gender, men spoke on average 6% faster than women (4.69 syll/s vs. 4.42 syll/s). It must be underscored that these general trends in the TIMIT database need to be interpreted with caution, given the unbalanced design in terms of speaker gender (69.5% men and 30.5% women) and broad regional distribution of speakers classified as Northern and Southern (some shortcomings of the TIMIT database are discussed in Keating, Byrd, Flemming & Todaka (1994)). Nonetheless, trends in that study and the present study are similar in terms of finding that speakers from the North spoke faster, on average, than speakers from the South. Although the effects of speaker gender were more variable in our study, the articulation rate for men was also slightly faster than for women.
The tendency for men to speak faster than women was also found in Whiteside (1996) for British Northern English. Results showed a higher articulation rate for men than for women (4.10 vs. 3.38 syll/s). The much higher rate of 21% for men may be attributed to a small sample size and perhaps to differences in the ages of the speakers, which were not reported in that study. Our results from a much larger subject pool and an extensive corpus of data do not support this finding. However, there was one exception in our data which corresponds to the finding in Whiteside (1996). In particular, the articulation rate in reading for young North Carolina men was 17% higher as compared to young North Carolina women, approximating the rate reported in Whiteside (1996). This suggests that the tendency for men to speak faster may be more variable across regional varieties and in some regions the articulation rate for men may be much higher than for women.
Turning to the articulation rate in spontaneous speech, Verhoeven et al. (2004) address the effects of dialect, gender, and age on the articulation rate (excluding silent pauses) and speaking rate (including the pauses) in free conversations by 160 Dutch-speaking teachers in the Netherlands and Belgium. The variables in that study were comparable with those in the present study although the number of speakers per region in the former was smaller (10 men and 10 women). Assessing the articulation rate, the study found significant effects of country (the Netherlands and Belgium), region, age, and gender. Men spoke 6% faster than women (4.79 vs. 4.50 syll/s) and young adults (aged below 40 years) spoke 5.4% faster than older adults (aged over 45 years). On average, articulation rate in the Netherlands was 16.2% faster than in Belgium (5.05 vs. 4.23 syll/s). Interestingly, there were no statistical differences in articulation rate between the regional varieties in Belgium. In the Netherlands, there was only one region which differed significantly from the remaining three. The significant effect of region disappeared in the assessment of the speaking rate, however, whereas men still spoke faster than women (5.2%) and younger adults also spoke 5.2% faster than older adults.
Comparing our present results with those in Verhoeven et al. (2004) for the national standard varieties of Dutch in the Netherlands and Belgium, we found a much stronger effect of region which was manifested in both read sentences and spontaneous talks of Wisconsin and North Carolina speakers. Our results for gender are generally consistent, although men in our study spoke only 3.3% faster than women and the main effect of gender was not significant. In terms of age, we found a consistency only in Wisconsin where the present young adults spoke about 6% faster than older adults in spontaneous talks. (Recall that there were no differences between the two age groups in North Carolina.)
Verhoeven et al. (2004) contrasts with work by Byrd and others just discussed in regard to areal/dialect distribution: previous studies have typically taken speakers from across broadly defined regions (Clopper et al. 2005; Byrd 1994), while Verhoeven et al. provide a picture of regional variation in two national standards and a comparison over those two broader groups. Our present study differs from both approaches in having large samples from two very different but dialectally very coherent areas, establishing systematic differences in articulation rate between one Southern and one Upper Midwestern variety of American English.
The question arises whether our reported variation in the articulation rate is of any perceptual relevance or whether the differences were too small to be detected by an ordinary listener. The results from Quené (2007) may shed some light on this issue. In his study, listeners detected a change in the articulation rate when the difference was about 5%. This Just Noticeable Difference (JND) indicates that a difference in rate of 5% or more between speech fragments could be perceived by the listener as faster (or slower). Most of the articulation rate differences reported here are clearly well above Quené’s reported JND of 5%. For most of our measures, the differences in articulation rate ranged from 6% to 17%. Except for the effects of speaker gender, the perceptual cues in our data should be available to listeners in creating impressions of how fast others speak.
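The comparison against the JND can be laid out explicitly. The following sketch collects percentage differences reported elsewhere in this paper and checks each against the 5% threshold; the dictionary labels and the function name `is_perceptible` are illustrative choices of ours, and treating the JND as a hard cutoff is a simplifying assumption.

```python
JND_PERCENT = 5.0  # threshold from Quené (2007), as cited in the text

# Percentage differences in articulation rate reported in this paper.
differences = {
    "region, reading": 8.0,
    "region, spontaneous talks": 12.5,
    "NC young men vs. young women, reading": 17.0,
    "Wisconsin young vs. older adults, spontaneous talks": 6.0,
    "gender overall": 3.3,
}


def is_perceptible(diff_percent: float, jnd: float = JND_PERCENT) -> bool:
    """Treat a difference as perceptible if it reaches the JND (a simplification)."""
    return diff_percent >= jnd


perceptible = {label: is_perceptible(value) for label, value in differences.items()}

for label, flag in perceptible.items():
    status = "at or above" if flag else "below"
    print(f"{label}: {differences[label]}% is {status} the {JND_PERCENT}% JND")
```

Under this simplification, only the overall gender difference falls below the threshold, in line with the exception noted above.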
For spontaneous talks, one issue has not been controlled for in the present study. As indicated in Quené’s recent statistical model (2008), speech tempo is strongly influenced by the length of the phrase. Because longer phrases contain more syllables than shorter phrases, they tend to be spoken faster which shortens syllable durations and increases the articulation rate. In that study, phrase length was found to vary with the speaker’s age: Older speakers produced shorter phrases compared to the younger speakers and they also tended to vary the length of their phrases more often. If so, speech tempo may be only weakly affected by speaker’s age if differences in phrase length are accounted for. This implies that the differences in the articulation rate between the Northern and Southern speakers (or differences due to speaker age and gender) may be significant but they will diminish if phrase length is included in assessing the results.
Although we did not consider phrase length in our analyses of spontaneous talks, the results for read sentences can be interpreted in the light of Quené’s findings. Those sentences contained a fixed number of syllables per phrase. Since each speaker uttered seven syllables in a phrase, it was clearly the differences in the phrase duration that affected the articulation rate. The present Wisconsin speakers read faster than the North Carolina speakers (the mean phrase durations were 2.01s and 2.20s, respectively), young adults read faster than older adults (2.0s vs. 2.20s) and men read a little faster than women (2.06s vs. 2.15s). Given the fixed number of syllables, a longer phrase duration indicates a slower articulation rate and a shorter phrase duration indicates a faster rate. The present results for reading thus clearly show that articulation rate varies as a function of speaker dialect, age and, to some extent, gender.
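With seven syllables per phrase, the relation between phrase duration and articulation rate is a simple division. The sketch below converts the mean phrase durations given above into rates. Note that a rate computed from a mean duration is not identical to the mean of individual speakers' rates, so these values are illustrative and need not match the group means reported in the results exactly.

```python
SYLLABLES_PER_PHRASE = 7  # fixed phrase length in the read sentences


def articulation_rate(duration_s: float, syllables: int = SYLLABLES_PER_PHRASE) -> float:
    """Articulation rate in syllables per second for one fluent phrase."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return syllables / duration_s


# Mean phrase durations in seconds, as reported in the text.
mean_durations = {"Wisconsin": 2.01, "North Carolina": 2.20}

rates = {group: articulation_rate(d) for group, d in mean_durations.items()}

for group, rate in rates.items():
    print(f"{group}: {rate:.2f} syll/s")
```

The shorter mean duration for Wisconsin yields the higher rate, which is the inverse relation the paragraph describes.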
Can we assume that Wisconsin speakers as well as young adults and men are simply better readers than North Carolina speakers, older adults and women? Given the large sample and significant statistical effects, we can rule out that these results came about by chance. The reading data show a systematic effect of between-subject factors. Because all reading productions were fluent, it is unclear what would account for the notion of a “better reader” (or perhaps a “faster reader”) if no pauses or hesitations were present in a prosodically constrained read phrase. Although we can only speculate at present in the absence of articulatory data, these effects could be due to differences in the articulatory paths for older and younger adults as well as for women and men, the former exhibiting more careful productions than the latter. However, the dialectal differences cannot be easily explained by the same reasoning. It might be the case that some other dialect-specific segmental properties play a role here. So far, we have found significant differences in vowel duration between Northern and Southern speech (Jacewicz et al., 2007) and significant differences in stop closure durations, Wisconsin closures being longer than North Carolina closures (Jacewicz et al., 2008). However, these findings do not yet support conclusive statements about the dialect-specific features with regard to articulation rate. More work is needed to explain which features contribute most to the perception of Southern speech being slower than Northern speech.
A reanalysis of the present data for spontaneous talks would be necessary to determine whether phrase length is a predictor of differences in articulation rate as a function of speaker dialect, age and gender. As a general rule, do Wisconsin speakers produce longer phrases which shorten syllable durations as compared to North Carolina speakers, or does the dialectal difference originate in dialect-specific temporal differences between speech segments? Since the present results for reading and spontaneous talks do not overlap for the effects of age (see the differences in free speech for young and older adults in Wisconsin and North Carolina), we may infer that these differences are due to the lack of inclusion of phrase length as a predictor in the analysis of spontaneous talks. However, more work is necessary to test the validity of this interpretation.
In conclusion, this study yielded several robust findings with regard to the articulation rate in read and spontaneous speech in the productions of mostly the same participants. First, the articulation rate of Wisconsin speakers was distinctly faster than that of North Carolina speakers. Second, young adults in Wisconsin spoke and read faster than older Wisconsinites. For North Carolina, articulation rates in reading followed the same pattern. However, in free speech, no significant differences due to speaker age were found.3 Finally, while the effects of gender were present, they were weak. It can only be suggested that men may speak faster than women under some circumstances, namely reading. This limited finding is consistent with most previous research.
As the body of work on the sociophonetics of American English continues to grow in scope and depth, we argue that it is important to include fundamental phonetic information, like these results on articulation rate, as part of our catalog of regional differences and patterns of change in American English.
This work was supported by research grant NIH/NIDCD R01 DC006871. We thank the following for help with data collection, transcription, and acoustic measurements: Mahnaz Ahmadi, Jason Fox, Janaye Houghton, Samantha Lyle, Leigh Smitley, Dilara Tepeli, and Lisa Wackler. We also thank Eric Raimy for discussion on the topic, as well as the audience at the American Dialect Society, Chicago, January 2008, where an earlier version of this paper was presented. Comments of three anonymous reviewers are greatly appreciated.
The following sets of sentences were recorded by each speaker. All 2-set sentences were randomly presented to the subject in two stimulus lists. The sentences in which the main stress falls on the first and second word position, respectively, served as distracters and were not included in the final analyses.
(The nonsense word bade was explained to the speaker as indicating “a brand of knife, a brand name.”)
(The speaker was told that bad refers to “an error or mistake.” For example, if someone makes an error, he or she might say “my bad” instead of “my mistake.”)
(The nonsense word bide was explained to the speaker as indicating “a small animal, a type of dog.”)
Example of a transcript used in calculating the articulation rate in spontaneous talks. The fluent phrases were numbered consecutively and the articulation rate was calculated for these phrases only. All hesitations, pauses and fillers (marked here in italics) were excluded from analyses.
Older North Carolina female speaker:
52) and to tell a funny story about, uh, my speech, we have a, 53) a mountain pasture where we keep our cows, our cattle in the uh, spring and summer, and this, uh, 54) man from Florida came up and bought the place next to it, 55) and so, we were up there checking on the cattle one day and he, 56) we stopped and started talking to him and, 57) he got acquainted with us and uh, 58) he was just delighted with my speech and I didn’t think there was anything wrong with it but, 59) I did not get the feeling that he was making fun of me and, 60) he kept saying that the next time they came up from Florida that he was going to, um, have, 61) bring his wife and let her hear me talk,
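The calculation over fluent phrases can be sketched as follows. The syllable counts for phrases 54 and 61 are our own orthographic counts from the transcript above, the durations are invented for illustration, and averaging per-phrase rates into a speaker mean is an assumption about the procedure rather than a description of the authors' exact method.

```python
from statistics import mean

# (phrase number, syllable count, duration in seconds)
# Syllable counts are our own counts from the transcript; durations are hypothetical.
fluent_phrases = [
    (54, 14, 2.9),  # "man from Florida came up and bought the place next to it"
    (61, 9, 1.7),   # "bring his wife and let her hear me talk"
]


def phrase_rate(syllables: int, duration_s: float) -> float:
    """Articulation rate of one fluent phrase in syllables per second."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return syllables / duration_s


# Hesitations, pauses and fillers are assumed to be excluded before this step.
per_phrase = [phrase_rate(s, d) for _, s, d in fluent_phrases]
speaker_rate = mean(per_phrase)

print(f"Speaker mean over {len(per_phrase)} fluent phrases: {speaker_rate:.2f} syll/s")
```

An alternative would be to divide total syllables by total duration across phrases; the two approaches weight long and short phrases differently.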
1 This popular stereotype is confirmed by media attention. Coverage of the 2008 U.S. presidential campaign reinforced this stereotype. Simple Google searches (December 22, 2007) for ‘fast-talking’ plus the names of two Northern candidates (Rudy Giuliani of New York, Mitt Romney of Massachusetts) yielded a total of 9,260 hits and similar searches with ‘slow-talking’ plus two Southern candidates (Mike Huckabee of Arkansas, Fred Thompson of Tennessee) yielded 3,370.
2 These analyses were carried out on articulation rate means for individual speakers in either the read sentence or the spontaneous talk conditions. We did not address potential within-speaker variation in articulation rate.
3 Note that possible ‘age-grading’ of speech or articulation rate could raise questions about the apparent time construct.
Ewa Jacewicz, Department of Speech and Hearing Science, The Ohio State University.
Robert A. Fox, Department of Speech and Hearing Science, The Ohio State University.
Caitlin O’Neill, Department of Speech and Hearing Science, The Ohio State University.
Joseph Salmons, Department of German, University of Wisconsin-Madison.