Russian maintains a contrast between non-palatalized and palatalized trills that has been lost in most Slavic languages. This research investigates the phonetic expression of this contrast in an attempt to understand how the contrast is maintained. One hypothesis is that the contrast is stabilized through resistance to two factors expected to weaken it: coarticulation between the trill and surrounding vowels, and prosodic positional weakening effects. To test this hypothesis, we investigate intrasegmental and intersegmental coarticulation and the effect of domain boundaries on Russian trills. Since trills are highly demanding articulatorily and aerodynamically, and since Russian trills are in contrast, they are expected to be highly resistant to coarticulation and to prosodic influence. This study shows, however, that phonetic variability due to domain boundaries and coarticulation is systematically present in Russian trills. Implications of the relation between prosodic position and lingual coarticulation for the Degree of Articulatory Constraint (DAC) model, Articulatory Phonology, and the literature on prosodic strength are discussed. Based on the quantitative analysis of phonetic variability in Russian trills, we propose a hypothesis about why the contrast in trills is maintained in Russian but lost in other Slavic languages. Specifically, phonological strategies used by several Slavic languages to deal with the instability of Proto-Slavic palatalized trills are present phonetically in Russian. These phonetic tendencies structure the variability of Russian trills and could be the source of contrast stabilization.
Two of the main constraints on the phonetic variability of segment realization are segmental contrast and degree of coarticulation resistance. The first constraint is phonological and depends on the structure and density of the segmental inventory in a language, while the second is based on physical and motor constraints on articulation. Specifically, dense segmental systems limit the amount of phonetic variability in segment realization (Manuel, 1990, 1999), and segments with a high degree of physical-motor constraint on their production (high resistance) exhibit less segmental variability (Recasens and Espinosa, 2009). We focus on Russian trills, which are part of a dense coronal system with a primary and a secondary contrast, and which are expected to have a high degree of coarticulation resistance, based on research on trills in other languages (Recasens and Pallarès, 1999; Recasens, 1991, 2006; Solé, 2002a). Due to the density of the contrastive system and the high degree of coarticulation resistance, these trills are expected to exhibit little phonetic variability, if the contrasts involved are not neutralized. The goal of this work is to investigate how two well-known sources of phonetic variability, coarticulation and prosodic boundary strengthening/weakening effects, interact with the two constraints on phonetic variability mentioned above. There is an extensive literature on both of these sources of phonetic variability and on how various segments exhibit variability (Edwards et al., 1991; Shattuck-Hufnagel and Turk, 1996; Hardcastle and Hewlett, 1999; Keating, 2006); however, there is little work on the interaction between coarticulation and prosody as sources of variability, or on their relation to contrast and coarticulation resistance as limits on variability.
Since the Russian segments investigated here are constrained simultaneously by linguistic contrast and high coarticulation resistance, they allow us to investigate the lower limit of variability due to coarticulation and prosodic boundaries. If the contrasts involved are not neutralized, can this variability be eliminated or minimized to an insignificant level due to the high physical-motor and linguistic constraints? This question assumes that the sources of variability and the limits on variability are in a zero-sum game, which can be won either by variability or by contrast. Is it possible, on the other hand, that the interaction between the sources and limits on variability is not zero-sum, but that marked variability and contrast can coexist? Investigation of the interaction between different sources and limits of phonetic variability allows us to begin to address these questions, which are important for a deeper understanding of the limits and structure of phonetic variability, the interaction between phonetics and phonology, and the interaction between physical-motor and linguistic constraints in phonetics.
The Russian trills /r/ and /rj/ are specified for apical and tongue dorsum gestures. As trills, they contrast with other coronal segments that have a different manner of articulation; in addition, there is a secondary place of articulation contrast between palatalized and non-palatalized trills. As trills, they also require complex conditioning of the tongue tissue to adjust the tension and inertia of the vibrating part of the tongue, as well as the pressure and flow conditions required for the tissue-flow coupling that is necessary for vibration (McGowan, 1989; Solé, 2002a,b; Barry, 1997). Two independent factors, the involvement of Russian trills in both a primary and a secondary contrast and the articulatory and aerodynamic complexity of trills, lead to the expectation that these trills should exhibit little phonetic variability, since such variability could make the trills phonetically similar to the coronal segments they contrast with. If the contrast is not neutralized next to particular vowels or in specific positions, Russian trills should therefore be highly resistant to factors that would weaken the contrast, such as coarticulation between the trill and contiguous vowels and prosodic weakening due to domain boundaries. The expected resistance of trills to coarticulatory influence has indeed been documented in other languages. Comparing Catalan trills and taps, Recasens and Pallarès (1999) found that trills are much more resistant to coarticulation than taps, and Recasens (2006) and Solé (2002a) found that even fricatives, which are usually highly resistant to coarticulation, assimilate to a contiguous trill. The force of system-based contrasts in limiting phonetic variability has been documented previously for vowels and consonants (Manuel, 1990, 1999).
Indeed, Öhman’s (1966) landmark study on coarticulation compared VCV coarticulation in American English, Swedish, and Russian and found that coarticulatory patterns in Russian differed from those in the other two languages. Öhman attributed this difference to the palatalization contrast. Although this work concerns palatalization blocking V-to-V coarticulation, not the variability of the palatal gesture itself, this very blocking makes it likely that the palatal gesture is not affected by the surrounding vowels, since the vowels themselves do not affect each other. Choi and Keating (1991) showed statistically significant but very small and “most likely imperceptible” effects of vowel-to-vowel coarticulation in Russian and Bulgarian (less so in Polish) in comparison to English, supporting Öhman’s results. In addition, Barry (1991) and Zsiga (2002) compared heterosyllabic overlap in clusters in English and Russian and both found much more overlap in English than in Russian, while Kochetov et al. (2007) compared Russian to Korean and found more overlap in Korean consonant clusters than in Russian ones. Even earlier, Trubetzkoy (1939/1969) related the density and complexity of a contrastive system to the extent of phonetic variability of trills. He compared German trills, which contrast only with the laterals, to the trills of Czech (Slavic) and Gilyak (Paleosiberian), languages that contrast trills with several more segments, and noted that the phonetic reflexes of the trill are more varied in German than in the other languages (Dresher, 2009). Indeed, empirical work on trills in some dialects of Spanish, Catalan, Arop-Lokep (Austronesian), and other languages without the secondary place contrast (Lindau, 1985; Raymond and Parker, 2005; Diaz-Campos, 2008; Recasens and Espinosa, 2007) has shown a relatively high level of variability.
It is possible that different parts of the tongue have variable resistance to coarticulation aggression from surrounding segments, or that the presence of the secondary articulation is the main controlling factor. The goal of the present study is to investigate the contrast between /r/ and /rj/ in various vocalic and boundary contexts to examine how the two trills are systematically differentiated phonetically under these sources of variability.
One aim of quantifying the interaction between the sources and limits of variability in Russian trills is to develop an understanding of why palatalization is often lost. Contrastive palatalization is pervasive in Slavic languages and in some other languages throughout the world. However, languages that use this contrast often lose it for trills (Hock, 1986; Hall, 2000; Kochetov, 2002; Zygis, 2004). This has been attributed to a physiological difficulty in articulating palatalized trills: coronal trills require the tongue dorsum to be low and somewhat back, while palatalization requires it to be high and front (Recasens and Pallarès, 1999; Hall, 2000; Kochetov, 2002; Zygis, 2004; Kavitskaya et al., 2009). The Proto-Slavic palatalized trill was unstable enough to disappear in most modern Slavic languages, through depalatalization (Serbian), fricativization (Polish), or sequencing of the coronal and palatal portions (Slovenian). But Russian and a few other Slavic languages maintain this contrast. The phonetic study presented in this work shows that the pattern of contrast-structured phonetic variability in Russian trills is similar to the phonological variability in the reflexes of palatalized trills in Slavic languages. This allows us to propose the following conjecture: the contrast is maintained through several simultaneous phonetic tendencies that mimic the phonological strategies. That is, the stability of the contrast in Russian may be due to a very particular structuring of phonetic variability, influenced by both linguistic and physical-motor limits on such variability.
The contrastive system of coronals is presented in Table 1a. Most consonants in Russian come in pairs with respect to contrastive palatalization, e.g., non-palatalized [t], as in [tomnɨj] ‘languid’, vs. palatalized [tj], as in [tjomnɨj] ‘dark’ (Timberlake, 2004). Exceptions include the glide [j], the voiceless affricates [ts] and [tʃ], and the fricative [ʒ]. Palatalization in Russian is contrastive in most environments.¹ It is generally accepted that Russian non-palatalized consonants show some amount of velarization (Halle, 1959; Trubetzkoy, 1969; Öhman, 1966; Reformatskiy, 1967; Padgett, 2001; Kochetov, 2002; among many others). A magnetic resonance imaging investigation of palatalized vs. non-palatalized stops and spirants shows considerable backing of the tongue in the non-palatalized segments (Kedrova, 2008).
Slavic languages provide a rich test case for the study of the depalatalization of the trill. The palatalization of the Proto-Slavic trilled /rj/ is affected to different degrees in almost all Slavic languages. Table 1b shows the reflexes of the plain trill and the palatalized trill in modern Slavic (see also Kavitskaya 1997).
While /r/ is preserved in all Slavic languages, /rj/ is retained in only a few of them. Table 1b demonstrates that the palatalized trill is completely lost in Belarusian (East Slavic), Polish, Czech, and Slovak (West Slavic), and Serbian, Croatian, and Macedonian (South Slavic); partially lost in Ukrainian (East Slavic), Upper Sorbian (West Slavic), and Bulgarian (South Slavic); and fully preserved in Russian (East Slavic) and Lower Sorbian (West Slavic). Given that depalatalization of the trill occurred to different extents in all subgroups of Slavic but did not affect all Slavic languages, we hypothesize that it is not a Proto-Slavic sound change but happened independently in different Slavic languages. We identify three strategies for resolving the conflict between trilling and palatalization in Slavic: depalatalization of the trill (e.g., Serbian), fricativization (e.g., Polish), and sequential realization of the coronal and palatal gestures (e.g., Slovenian).
The goal of this study is to investigate how coarticulation and prosodic boundaries affect the phonetic realization of the /r/-/rj/ contrast in Russian. In this subsection we discuss these sources of variability, and in the next subsection we pose specific hypotheses that will be tested in the study. Regarding prosodic effects, we investigate whether utterance boundaries induce the lengthening and strengthening effects seen in other languages (e.g., Gaitenby, 1967; Oller, 1973; Lehiste, 1964; Wightman et al., 2001; Keating et al., 2003; Cho and Keating, 1991; Fujimura, 2000; Byrd and Saltzman, 2003; Keating, 2004; Byrd et al., 2005). These studies have shown that consonants near a domain boundary are lengthened, and there is also evidence of consonantal strengthening near boundaries. The association between lengthening and strengthening is predicted by several theories (Lindblom, 1963; Fujimura, 2000; Byrd and Saltzman, 2003): lengthening allows more time to achieve the articulatory target, and hence leads to segments that reach their targets more fully (i.e., stronger segments). For trills, strengthening would probably consist of more vibrations per trill (Lavoie, 2001), and weakening would lead to a single vibration (tap) or to an approximant.
We also investigate consequences of gestural timing effects that are predicted to occur in onset vs. coda positions (Browman and Goldstein, 1988; Krakow, 1999; Sproat and Fujimura, 1993; Gick et al., 2006; Goldstein et al., 2007). Specifically, Sproat and Fujimura predict that the more open gesture of a complex segment is attracted toward the vowel. In the present case, the palatal (secondary) gesture would be predicted to be closer to the vowel than the primary, more constricted apical gesture. Browman and Goldstein’s theory would predict that, if Russian behaves like English, onset gestures would be in phase (0°) with the vowel, i.e., synchronous with it (termed the C-Center pattern), whereas coda gestures would be sequential with respect to the vowel. However, the theory also posits that onset consonant gestures are in an anti-phase (180°) relation with each other, so there should be some sequentiality within the onset. Kochetov (2006) examined the inter-gestural timing of two coordinated oral gestures, Lips (responsible for labialization) and Tongue Body (responsible for palatalization), in the palatalized voiceless stop /pj/ in different syllable positions, and compared it to the timing of the same gestures in the consonant sequence /p/ + /j/. Kochetov (2006) found that these consonants were timed differently in onset and coda: the gestures in the palatalized stop are more sequential in onset position and more synchronous in coda position. Specifically, he showed that the lag for onset /pj/ is always positive, meaning that the Tongue Body gesture is always delayed with respect to the Lips, whereas for coda /pj/ the lag is either positive or negative. This finding contrasts with many studies of the C-Center phenomenon showing consonantal gestures in the onset to be less sequential than those in the coda (Goldstein et al., 2007), which suggests that Russian may not show the C-Center effect.
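To make the sign convention concrete, the following Python sketch computes a lag as the Tongue Body landmark time minus the Lips landmark time, so a positive value means the Tongue Body gesture is delayed. Which kinematic landmark is compared (gesture onset, target achievement, etc.) is an assumption here, and the landmark times are invented for illustration; they are not Kochetov's (2006) measurements.

```python
def gestural_lag_ms(lips_ms: float, tongue_body_ms: float) -> float:
    """Lag of the Tongue Body landmark relative to the Lips landmark, in ms.

    Assumption: both times mark the same kind of kinematic landmark
    (e.g. gesture onset) and are measured on a common clock.
    """
    return tongue_body_ms - lips_ms


def describe_lag(lag_ms: float) -> str:
    """Translate the sign of a lag into the timing relation it implies."""
    if lag_ms > 0:
        return "Tongue Body delayed (sequential)"
    if lag_ms < 0:
        return "Tongue Body leads"
    return "synchronous"


# Invented (Lips, Tongue Body) landmark times in ms for two hypothetical tokens.
tokens = {"onset /pj/": (120.0, 155.0), "coda /pj/": (300.0, 292.0)}
for name, (lips, tb) in tokens.items():
    lag = gestural_lag_ms(lips, tb)
    print(f"{name}: lag = {lag:+.0f} ms -> {describe_lag(lag)}")
```

Under this convention, "always positive in onset, positive or negative in coda" simply describes the sign distribution of these lags across tokens.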
In this work, we investigate the timing of the apical and palatal gestures and its possible variability due to context.
Moreover, we investigate whether the predictions of Recasens’s Degree of Articulatory Constraint (DAC) model regarding the possible blending of primary and secondary gestures within the trill, and regarding coarticulatory influence from contiguous vowels, hold for Russian trills (Recasens et al., 1997; Recasens and Espinosa, 2009). DAC holds that segments with a highly constrained dorsum are subject to less effect from their context but induce greater effects on their context (high coarticulation resistance and a high DAC value), while segments with a less constrained dorsum undergo more effects from their context and induce smaller effects on it (low coarticulation resistance and a low DAC value). This model is based on a comparison of the amount of coarticulation that different Catalan consonants induce and undergo, but it has been used to account for coarticulation in other languages (Geng and Mooshammer, 2004). In later research on Catalan taps and trills in intervocalic context using electropalatographic and acoustic techniques, Recasens and Pallarès (1999) showed that taps have low coarticulation resistance, while trills have high resistance. Several works by Recasens show that among vowels, /i/, with a palatal gesture, is the strongest vocalic gesture (e.g., Recasens and Espinosa, 2009), and that trills have a highly constrained tongue body gesture (Recasens and Pallarès, 1999) and a high DAC value. However, since the palatal gesture is more open (more vocalic), it still has a lower DAC value than a consonantal gesture (Recasens and Espinosa, 2009). Recasens and Pallarès (2001) have shown for Catalan that front alveolars and alveolopalatals can assimilate to coronals with a retracted tongue body, including the trill, but not vice versa. This leads to the prediction that the palatal gesture in /rj/, which is weaker and less constrained, could be weakened by the high resistance of the apical trill component of the segment, neutralizing the contrast.
However, an investigation of X-ray images of Russian palatalized coronals, excluding trills, by Keating (1993) has shown that for /t/, but not /s/, the palatalization gesture has a retracting effect on the apical gesture. Trills, however, have higher coarticulation resistance than /t/ and would probably be less affected by palatalization.
DAC also makes predictions about intersegmental coarticulation between the trill and contiguous vowels. Due to the high level of constraint on the tongue dorsum for contrastive, aerodynamic, and physiological reasons, we expect Russian trills to be highly resistant to coarticulation.
/r/ and /rj/ in Russian have an apical and a palatal gesture. They share the apical gesture, which contrasts them with the other coronals, and they contrast in the presence or absence of a palatal gesture. The first hypothesis we investigate is that, due to the dense contrastive system and the high coarticulation resistance of the trills (two independent factors), there would be very little variability in their implementation. This hypothesis makes two predictions: 1) the apical gesture, which is shared by /r/ and /rj/, would be implemented in the same way, as a full trill, to phonetically differentiate these two trills from non-trill coronals, and would therefore not be affected by coarticulatory or prosodic effects that could weaken the similarity between the trills; 2) the palatal gesture would be implemented in the same way in /rj/ regardless of the contiguous vowel, domain boundaries, and syllabic affiliation, so that the contrast between the two trills could be maximal in all positions. The second hypothesis we investigate is that domain boundaries (initial and final) would have strengthening effects on trills, enhancing the palatal contrast between them, while domain-internal trills would be weakened, potentially neutralizing the contrast, and that coarticulation would affect the production of both types of trills. The positional effect is predicted by the positional neutralization hypothesis (Steriade, 1994) and the phonological augmentation hypothesis (Smith, 2005), which hold that contrasts are enhanced in environments that strengthen segments and neutralized in environments that weaken them. The predictions of this hypothesis are: 1) the apical gesture would be strengthened near domain boundaries, i.e., realized as a full trill, while domain-internal trills would be weakened to taps or approximants; 2) the palatal gesture would be strong and easily detectable near domain boundaries, and weaker domain-internally, potentially neutralizing the phonetic differentiation due to the palatalization contrast. We therefore examine the predictions of two opposing hypotheses, the first predicting restricted variability and the second predicting significant variability.
Before presenting the technical details of data collection and analysis, we provide the rationale for using acoustic data, rather than direct articulatory measurements, to study the effect of prosodic position and contiguous vowel on the interaction between palatalization and trilling. In a previous study, ultrasound imaging was used to investigate tongue motion during Russian alveolar stops, non-palatalized trills, and palatalized trills (Kavitskaya et al., 2009). The goal was to determine the configuration of the back of the tongue in palatalized trills vs. alveolar stops and non-palatalized trills. It was found that the back of the tongue is far more advanced in the palatalized trills than in the other two types of segments, since palatalization is accompanied by advancement of the tongue root. However, that study also made it clear that the configuration of the tongue during the palatalized trill is quite complex, due to the interaction of the palatal gesture, the dorsal gesture for surrounding vowels, and the apical trill gesture, which place conflicting demands on the body of the tongue. It is therefore quite difficult to study the effects of prosodic position and contiguous vowel by direct measurement of the tongue: it is not clear which aspect of tongue motion to measure, and the tongue tip moves at a very high frequency and over a small distance. We therefore decided, as a first step, to perform an acoustic study to determine the effect of these factors on the acoustic qualities of the palatalized and non-palatalized trills. Furthermore, one of the acoustic measures we use is the number of contacts in a trill. Due to the high rate of vibration, usually around 30 Hz, vibration cycles are difficult to measure at the sampling rates currently available to us for articulatory measures. Another acoustic measure we use is F2-F1 at different portions of the trill, as a measure of the amount of tongue backing.
This same measure has been used by Sproat and Fujimura (1993) to study tongue backing during American English /l/. Their statistical analysis showed that this acoustic measure was highly correlated with articulatory measures. The use of F2-F1 as an estimate of tongue backing is also supported by extensive research on articulatory-acoustic relations, where it has been shown that fronting of a constriction in various models of the vocal tract from around the uvular region to the palatal region increases F2-F1 (Fant, 1960). The current study is therefore part of an extended project, in which the acoustic and articulatory dimensions of the interaction between palatalization and trilling in Russian, and the factors that condition it, are examined in detail.
Nine native speakers of Russian, four female and five male, were recorded directly onto a Macintosh computer under quiet conditions, using a Mobile M-Audio preamplifier and a Rode NTG-2 shotgun microphone. The data were recorded with the Praat program (copyright 1992–2009 by Paul Boersma and David Weenink) in .wav format at a sampling rate of 44 kHz with 16-bit quantization. All participants were monolingual in Russian until at least the age of 20 and are now relatively fluent in English. Their ages ranged from 35 to 64. None of the participants reported any history of speech or hearing problems.
The participants were asked to read a corpus of real Russian words with [r] and [rj] in the following environments: word-initial, intervocalic, preconsonantal, and word-final (see the Appendix for the full list of words used). The vowels adjacent to the trills in question were [a], [u], and [i-ɨ]. There is a long-standing debate over whether [i] and [ɨ] are allophones of one vowel or separate vowels. In postconsonantal position these vowels are in complementary distribution: [i] occurs after palatalized consonants, while [ɨ] occurs after non-palatalized ones. These vowels can thus be considered allophones of the same phoneme (Timberlake 2004: 40, after Panov 1967: 58ff). Phonetically, however, [i] and [ɨ] are quite distinct: [i] is a front high vowel, while [ɨ] is a central high vowel with quite different formant values.
Most words were two syllables long; for three tokens, however, it was difficult to find real disyllabic words for the relevant conditions, so monosyllabic words were used.
One more factor originally present in the data set was stress: we varied the stress on the vowel adjacent to the trill in question. However, only tokens with a stressed vowel adjacent to the trill are considered in the analysis.
The words were presented to the speakers in randomized lists. The words appeared in Cyrillic script on a computer screen, with 3 seconds between items. If the speaker made a mistake, the problematic tokens were re-recorded at the end of the session. Ten repetitions of each word were collected from each speaker, for a total of 240 tokens per speaker (2 Consonants × 4 Positions × 3 Vowels × 10 Repetitions). (See the Appendix for the tokens used.)
Words were recorded individually, without a frame sentence, because we wanted to investigate the number of taps in initial and final trills without any coarticulation from previous or following context. However, since the words were spoken in isolation, with three-second pauses between them, prosodically the tokens constituted entire utterances. We therefore assume the effects of the prosodic boundary at the beginning and end of the word to be that of a domain boundary. However, we still use the phrase “word position” to describe the position of the trill in the word, whether it is initial, intervocalic, preconsonantal, or final.
Besides acoustic duration, two types of measurements were made on each token: an acoustic measure of the apical gesture and an acoustic measure of the secondary articulation gesture. The apical gesture was measured by counting the number of contacts in each trill. Figure 1 shows spectrograms of a male Russian speaker’s pronunciation of the words [rat] ‘glad’ and [rjat] ‘row’.
A trill contains contact portions where the vocal tract is briefly closed, which will here be called the “closed phase” or contact, interspersed with portions where vocal tract resonances can be seen, which will here be called the “open phase.” As can be seen in Figure 1, the token of [r] contains three closed-phase portions, whereas [rj] contains only one. We used both manual and automatic tracking to determine the number of contacts in each token. For manual tracking, both the waveform and the spectrogram were viewed on an expanded scale, and significant drops in energy were counted visually. In many tokens, including the one on the left of Figure 1, it was difficult to determine whether the last dip in energy was due to a closure. To be consistent, we counted a dip as a closure only if it was at least half as deep as the dips seen in previous closures. In the automatic method, root mean square (RMS) energy was measured in 3 ms windows; then 16 sine waves with frequencies between 5 and 40 Hz were used as matched filters on the squared RMS energy (RMS²), and the filter providing the highest weight was selected as the best match to the token's vibration pattern. The duration of the trill was then used to determine how many contacts (energy minima) there were. Initially, this approach performed poorly, and the filters had to be adapted for each participant. After adaptation, 88% of the automatic counts matched the manually labeled data. Of the remaining 12%, 40% (about 5% of all tokens) were determined to be accurate by visual checking, and the remaining mismatches (about 7% of all tokens) were excluded.
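The automatic contact count can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the frame length (3 ms), filter range (5–40 Hz), and number of filters (16) come from the description above, but pairing each sine with a cosine (a complex exponential, so the match does not depend on the phase of the modulation) and rounding rate × duration to get the count are assumptions. The demo signal is a synthetic amplitude-modulated tone, not speech, and is longer than a real trill so that the frequency estimate is clean.

```python
import numpy as np


def frame_energy(signal, fs, frame_ms=3.0):
    """Mean-square energy (RMS squared) in consecutive, non-overlapping frames."""
    n = int(round(frame_ms * 1e-3 * fs))
    n_frames = len(signal) // n
    frames = np.asarray(signal, dtype=float)[: n_frames * n].reshape(n_frames, n)
    energy = np.mean(frames ** 2, axis=1)
    times = (np.arange(n_frames) + 0.5) * n / fs  # frame centres in seconds
    return energy, times


def vibration_rate(signal, fs, f_lo=5.0, f_hi=40.0, n_filters=16):
    """Return the filter frequency (Hz) that best matches the energy contour."""
    energy, t = frame_energy(signal, fs)
    energy = energy - energy.mean()  # remove DC so only the modulation is matched
    freqs = np.linspace(f_lo, f_hi, n_filters)
    # Each row is a complex exponential (cosine + sine) at one filter frequency.
    weights = np.abs(np.exp(-2j * np.pi * np.outer(freqs, t)) @ energy)
    return float(freqs[np.argmax(weights)])


def contact_count(rate_hz, duration_s):
    """Number of closures implied by a vibration rate over a trill duration."""
    return int(round(rate_hz * duration_s))


fs = 44100
t = np.arange(int(0.4 * fs)) / fs
# Synthetic test signal: a 1 kHz tone whose amplitude is modulated at 26 Hz.
demo = np.sin(2 * np.pi * 1000 * t) * (1 + 0.9 * np.sin(2 * np.pi * 26 * t))
rate = vibration_rate(demo, fs)
print(rate, contact_count(rate, 0.08))
```

Because real trills last only tens of milliseconds, the frequency resolution of such a matched-filter bank is coarse, so per-speaker tuning of the filter bank and a manual cross-check remain necessary.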
To measure the acoustic consequences of the secondary articulation, F2-F1 was taken as the indicator of tongue fronting: a large F2-F1 indicates a fronted dorsum, and a small F2-F1 indicates a backed dorsum. The formants were measured at the beginning and end of both /r/ and /rj/, at the same positions within the syllables containing these segments; the specific measurement points are discussed later in this section. Formants were measured using automatic formant tracking based on multitaper spectra (Blacklock, 2004), and all formant estimates were checked visually. Initially, 27% of the estimates were erroneous. Adjusting the analysis window size, especially for female participants, improved the estimation, so that only 5% of the data had to be excluded. Multitaper spectra provide more reliable spectral estimates than LPC-based spectra (Blacklock, 2004).
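The multitaper spectral estimate underlying the formant tracking can be sketched as follows. Only the spectral-estimation step is shown; how formants were then picked from the spectrum (Blacklock, 2004) is not described here and is not reproduced. The time-bandwidth product `nw=3`, `k=5` tapers, and `nfft=1024` are illustrative assumptions, and the frame length plays the role of the analysis window size that was adjusted per speaker. The Slepian (DPSS) tapers are computed from the standard tridiagonal eigenproblem so the sketch needs only NumPy; `scipy.signal.windows.dpss` yields the same sequences up to sign and scaling.

```python
import numpy as np


def dpss_tapers(n: int, nw: float, k: int) -> np.ndarray:
    """First k Slepian (DPSS) tapers of length n with half-bandwidth nw/n.

    The leading eigenvectors of this symmetric tridiagonal matrix are the DPSS.
    """
    w = nw / n
    idx = np.arange(n)
    diag = ((n - 1 - 2 * idx) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = idx[1:] * (n - idx[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(mat)  # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T  # k tapers with the largest eigenvalues


def multitaper_spectrum(frame, fs, nw=3.0, k=5, nfft=1024):
    """Average of the k tapered periodograms of one analysis frame."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    tapers = dpss_tapers(len(x), nw, k)
    nfft = max(nfft, len(x))
    spectra = np.abs(np.fft.rfft(tapers * x, n=nfft, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, spectra.mean(axis=0)


fs = 16000
t = np.arange(400) / fs  # one 25 ms analysis frame
freqs, psd = multitaper_spectrum(np.sin(2 * np.pi * 700 * t), fs)
print(freqs[np.argmax(psd)])
```

Averaging several orthogonal tapers trades a little frequency resolution (about ±nw/duration, here ±120 Hz) for a much lower-variance spectrum, which is the property that makes multitaper spectra attractive for formant estimation.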
Based on the F2-F1 measures for each of a participant’s repetitions in a particular context (i.e., contiguous to a particular vowel), the goal was to characterize the distance between the F2-F1 distribution at a given point in the palatalized trill and that at the corresponding point in the non-palatalized trill, as a measure of the phonetic distance, or contrast, between the two within that context. The F2-F1 values for all /rj/ tokens within a context form one distribution, and those for /r/ form another. We then measured the distance between these two distributions using Cohen’s d, which is expressed in standard deviations (Cohen, 1990):

$$d = \frac{\mathrm{mean}_{r^j} - \mathrm{mean}_{r}}{\sqrt{\left(\mathrm{std}_{r^j}^2 + \mathrm{std}_{r}^2\right)/2}},$$

where the means and standard deviations (std) are taken over all tokens from each context. A Cohen’s d of .2 is interpreted as a small difference between distributions, .5 as a medium difference, and .8 and above as a large difference. Therefore, in each context and at each point in the trill, we were able to state how distinct the /r/ and /rj/ tokens were with respect to their F2-F1 measure, an estimate of tongue backing.
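As a worked illustration, the F2-F1 index and Cohen's d can be computed with Python's standard library. This is a sketch under the assumption that the pooled standard deviation is the root mean square of the two sample standard deviations (appropriate when the groups have similar numbers of tokens); the formant values are invented and do not come from the study. The label "below small" for |d| < .2 is also an assumption, since only the .2/.5/.8 thresholds are given above.

```python
import math
from statistics import mean, stdev


def f2_minus_f1(f1_hz: float, f2_hz: float) -> float:
    """Fronting index: larger values suggest a more fronted tongue dorsum."""
    return f2_hz - f1_hz


def cohens_d(rj_values, r_values) -> float:
    """Cohen's d between /rj/ and /r/ F2-F1 distributions in one context."""
    pooled_sd = math.sqrt((stdev(rj_values) ** 2 + stdev(r_values) ** 2) / 2)
    return (mean(rj_values) - mean(r_values)) / pooled_sd


def effect_size_label(d: float) -> str:
    """Conventional thresholds: .2 small, .5 medium, .8 and above large."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "below small"


# Invented (F1, F2) pairs in Hz for one context and one measurement point.
rj = [f2_minus_f1(f1, f2) for f1, f2 in [(400, 1900), (420, 2020), (410, 2110)]]
r = [f2_minus_f1(f1, f2) for f1, f2 in [(450, 1350), (470, 1470), (460, 1560)]]
d = cohens_d(rj, r)
print(round(d, 2), effect_size_label(d))
```

With these toy values the /rj/ and /r/ F2-F1 means differ by 600 Hz and both standard deviations are 100 Hz, so d = 6, far above the "large" threshold.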
The goal of this section is to show how secondary articulation, coarticulation, and prosodic boundaries affect the number of contacts, duration, and F2-F1 difference in Russian trills. Measurements are presented separately for the number of contacts, duration, and tongue fronting. Figures present averages across subjects, and tables present the statistical analyses.
For each token in the data, the number of contacts was assigned to one of three categories: 0 contacts, 1 contact, and 2–3 contacts. No tokens in the data had more than 3 contacts, and 3-contact tokens were few enough to warrant merging them with the 2-contact tokens. Tokens with 0 contacts will be termed “approximants”, those with 1 contact “taps”, and those with 2–3 contacts “full trills”. The percentage of each contact category was determined by secondary articulation, word position, and vowel. Figure 2 presents these results averaged across subjects, since the subjects are highly consistent in these percentages. Each bar in the figure is broken down into the percentages of each category. Figure 2a shows that the percentages of approximants for /r/ and /rj/ are approximately the same. For /rj/, nearly all the remaining tokens (79%) are taps, whereas for /r/, about 57% are taps and 26% are full trills. Full trills account for only 1% of /rj/ tokens. Therefore, /r/ and /rj/ are clearly not realized in the same way in terms of the number of contacts. Furthermore, the main asymmetry between the two trills is that full trilling (2–3 contacts) occurs almost exclusively for the non-palatalized trill. Figure 2b shows that full trilling occurs primarily for initial and final trills, and only rarely for intervocalic and preconsonantal trills, which have more approximant realizations. Figure 2c shows that the contiguous vowel does not affect the number of contacts, as can be seen from the similar breakdown across vowels.
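The categorization and tabulation described above can be sketched as follows. This is a minimal illustration assuming a hypothetical token format (one dict per token, with a `contacts` count and one key per factor); the demo tokens are invented. Tokens are pooled here, whereas Figure 2 averages across subjects, so in practice one would first compute percentages per speaker and then average them.

```python
from collections import Counter, defaultdict


def contact_category(n_contacts: int) -> str:
    """Map a contact count onto the three categories used in the analysis."""
    if n_contacts == 0:
        return "approximant"
    if n_contacts == 1:
        return "tap"
    if n_contacts in (2, 3):
        return "full trill"
    raise ValueError(f"unexpected contact count: {n_contacts}")


def category_percentages(tokens, factor):
    """Percentage of each contact category within each level of `factor`.

    `tokens` is an iterable of dicts with a 'contacts' key and one key per
    factor (e.g. 'consonant', 'position', 'vowel').
    """
    counts = defaultdict(Counter)
    for tok in tokens:
        counts[tok[factor]][contact_category(tok["contacts"])] += 1
    return {
        level: {cat: 100.0 * n / sum(c.values()) for cat, n in c.items()}
        for level, c in counts.items()
    }


# Invented tokens, for illustration only.
demo = [
    {"consonant": "r", "position": "initial", "contacts": 2},
    {"consonant": "r", "position": "intervocalic", "contacts": 0},
    {"consonant": "rj", "position": "initial", "contacts": 1},
    {"consonant": "rj", "position": "final", "contacts": 1},
]
print(category_percentages(demo, "consonant"))
```

Grouping by `"position"` or `"vowel"` instead of `"consonant"` gives the breakdowns corresponding to the other panels of Figure 2.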
To statistically investigate how Secondary Articulation, Vowel, and Word Position affect the number of contacts, we used the logistic regression instantiation of the Generalized Linear Model within a mixed-effects framework (Baayen, 2007), to account for the fact that the data from each participant are correlated. In the model, the error distribution is taken to be binomial to allow for binary responses, and the logit link is used to linearize probabilities. Several models have been used to analyze ordinal data (Agresti, 2002). The one we considered most appropriate for the data analyzed here is the cumulative logit model, which models the probability that the dependent variable is less than or equal to a particular value. In this case, two models were tested: 1) the Logit_1 model tests whether the factors and their interactions increase or decrease the odds of a trill having one or more contacts vs. 0 contacts, and 2) the Logit_2 model tests whether the factors and their interactions increase or decrease the odds of a trill having 2 or more contacts vs. fewer than 2 contacts (0 or 1 contacts). The results are shown in Table 2.
For the Logit_1 model, Secondary Articulation and Vowel showed no effect on the odds of having one or more contacts vs. zero contacts. For Word Position, Final, Intervocalic, and Preconsonantal trills all have a significantly higher chance than Initial trills of having 0 contacts, i.e., of being approximants.
For the Logit_2 model, /rj/ has a significantly lower chance of having 2 or 3 contacts than /r/. Also, Initial trills have a significantly higher chance of having 2 or 3 contacts than trills in all other positions. Final trills also have a significantly higher chance of having 2–3 contacts than Preconsonantal trills. There was no statistically significant interaction between Secondary Articulation and Vowel. In summary, palatalization all but prohibits full trilling (≥2 contacts); approximants (0 contacts) are more likely in intervocalic and preconsonantal positions; and full trilling is more likely at word boundaries, especially in initial trills.
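The coding of the ordinal contact variable into the two cumulative-logit comparisons can be sketched as follows. This is not the mixed-effects fit itself, which adds fixed effects for Secondary Articulation, Vowel, and Word Position and a random effect for Participant; it only shows how each token’s response is binarized for Logit_1 and Logit_2, with the empirical log odds per group as a rough descriptive check. The 0.5 continuity correction, the function names, and the token values are assumptions for illustration.

```python
import math

def cumulative_splits(n_contacts):
    """Binary responses for the two cumulative-logit comparisons.

    logit_1: 1 or more contacts (1) vs. 0 contacts (0)
    logit_2: 2 or more contacts (1) vs. 0 or 1 contacts (0)
    """
    return {"logit_1": int(n_contacts >= 1), "logit_2": int(n_contacts >= 2)}

def empirical_log_odds(outcomes):
    """Log odds of a 1 response, with a 0.5 continuity correction (an assumption)
    so that groups with no 1s or no 0s still give a finite value."""
    successes = sum(outcomes)
    failures = len(outcomes) - successes
    return math.log((successes + 0.5) / (failures + 0.5))

# Hypothetical contact counts for one participant.
tokens_by_trill = {
    "/r/": [0, 1, 2, 2, 1, 3, 1, 0],
    "/rj/": [0, 1, 1, 1, 0, 1, 1, 1],
}

for trill, tokens in tokens_by_trill.items():
    for split in ("logit_1", "logit_2"):
        outcomes = [cumulative_splits(n)[split] for n in tokens]
        print(trill, split, round(empirical_log_odds(outcomes), 2))
```

In a full analysis, the binary `logit_1` and `logit_2` columns would each serve as the response of a separate binomial mixed model.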
Figure 3 shows the normalized means of trill duration across subjects as a function of secondary articulation (Figure 3a), word position (Figure 3b), and contiguous vowel (Figure 3c). The normalization was performed by subject-centering the data, that is, the subject’s mean trill duration was subtracted from the duration of each of that subject’s tokens. Figure 3a shows that subjects do not show a sizable difference in duration as a function of secondary articulation; the difference is only about 5 ms. Figure 3b shows the mean Position effect on Duration. When the means are broken down by participant, we find that all participants produce Final trills that are longer than Intervocalic and Preconsonantal trills, except for Participant M5, whose Intervocalic and Final trills have comparable durations. Participants W2, W3, W4, M1, M2, and M4 all produce Final trills with longer durations than Initial ones, but Participant M3 has Initial trills longer than Final ones, while Participants M1 and M5 show comparable durations for Initial and Final trills. Furthermore, Initial and Final trills are consistently longer than word-medial trills. Figure 3c shows the relation between the contiguous vowel and trill duration. Across participants, trills contiguous to /a/ are not consistently longer than trills contiguous to /u/, but all participants other than W4 produce longer trills contiguous to /a/ than contiguous to /i/. Moreover, non-palatalized trills contiguous to /u/ are consistently longer than non-palatalized trills contiguous to /ɨ/.
To examine how Secondary Articulation (SA), Contiguous Vowel (V), and Word Position (WP) affect the duration of Russian trills, a linear mixed-effects model was fitted with Duration as the dependent variable; the fixed-effect factors SA (levels: Palatalization, Velarization), V (levels: /a/, /u/, /i/ for /rj/ and /a/, /u/, /ɨ/ for /r/), and WP (levels: Initial, Intervocalic, Preconsonantal, Final); and Participant as a random factor. Both main effects and interactions were estimated. Markov chain Monte Carlo simulations with 10,000 iterations were used for significance testing and for estimating 95% confidence intervals (CI) for the effect coefficients, as described in Section 2.
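The subject-centering used for Figure 3 (and, in the same way, for the F2-F1 values in Figure 4) can be sketched as below. The participant labels match those in the text, but the durations are invented, and the function name `subject_center` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def subject_center(tokens):
    """Subtract each participant's mean from that participant's tokens.

    `tokens` is a list of (participant, value) pairs, e.g. durations in ms;
    returns (participant, centered_value) pairs in the original order.
    """
    by_subject = defaultdict(list)
    for subject, value in tokens:
        by_subject[subject].append(value)
    subject_means = {s: mean(vals) for s, vals in by_subject.items()}
    return [(s, v - subject_means[s]) for s, v in tokens]

# Hypothetical trill durations (ms) for two participants.
tokens = [("W2", 60.0), ("W2", 80.0), ("M3", 40.0), ("M3", 50.0), ("M3", 60.0)]
print(subject_center(tokens))
```

Centering removes each speaker’s overall level (speaking rate for duration, vocal-tract size for formants) while keeping within-speaker differences between conditions intact.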
Table 3 shows the statistical results of the study on duration. For the SA contrast between palatalized and non-palatalized trills, there was no significant difference in duration. Trills contiguous to back vowels /u/ and /a/ are significantly longer than ones contiguous to front /i/ and central /ɨ/ but are not significantly different from each other.
Word Position shows two main types of effects: trills at the boundary are longer than trills inside the domain, and boundary-final trills are longer than boundary-initial trills for most participants.
Investigating the intrinsic relation between the number of contacts and duration raises a basic problem: the number of contacts is a categorical variable, whereas duration and F2-F1 are continuous variables. To deal with this problem, we split the duration scale into ten parts and, within each part, evaluated the odds of having 0 vs. 1 or more contacts and the odds of having 2 or more vs. 0 or 1 contacts. The 0 vs. 1–3 contacts odds ratio is an indication of the likelihood of an approximant trill, whereas the ≥2 vs. 0 or 1 contacts odds ratio is an indication of the likelihood of a fully trilled trill. Duration was correlated with the odds ratios, and the odds ratios with each other, and R², the proportion of explained variation, was then calculated for each relation. The results are given in Table 4, where the relations with high R² are highlighted in gray.
As can be seen from Table 4, within /r/, within /rj/, and across all the data, duration is a good predictor of whether a trill will be an approximant or will have one or more contacts. The longer a trill, the less likely it is to be an approximant and the more likely it is to be fully trilled. For the likelihood of having 2 or 3 contacts, duration is a good predictor when the data are analyzed as a whole and within /r/, but not within /rj/, since there is little variability in duration in that category.
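The binning procedure can be sketched as follows, under stated assumptions: the “ten parts” are taken to be equal-width duration bins, bins lacking either outcome are skipped because their odds are zero or undefined, and the odds are correlated on their raw scale. None of these details is confirmed by the text, and the function names and token values are hypothetical; the demo uses three bins only because the invented data set is small.

```python
def bin_odds(durations, contacts, event, n_bins=10):
    """Split the duration range into `n_bins` equal-width bins and return
    (bin midpoint, odds of `event`) for each bin.

    `event` is a predicate on the contact count, e.g. `lambda n: n == 0` for
    approximants or `lambda n: n >= 2` for full trills. Bins lacking either
    outcome are skipped, since their odds are 0 or undefined.
    """
    lo, hi = min(durations), max(durations)
    if hi == lo:
        raise ValueError("durations must span a non-zero range")
    width = (hi - lo) / n_bins
    tallies = [[0, 0] for _ in range(n_bins)]  # [event, non-event] per bin
    for duration, n in zip(durations, contacts):
        i = min(int((duration - lo) / width), n_bins - 1)  # max goes in last bin
        tallies[i][0 if event(n) else 1] += 1
    return [(lo + (i + 0.5) * width, yes / no)
            for i, (yes, no) in enumerate(tallies) if yes and no]

def r_squared(xs, ys):
    """Squared Pearson correlation (proportion of explained variation)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables need non-zero variance")
    return sxy ** 2 / (sxx * syy)

# Hypothetical tokens: duration (ms) and number of contacts.
durations = [10, 11, 12, 30, 31, 32, 50, 51, 52]
contacts = [0, 0, 1, 0, 1, 1, 0, 1, 2]

pairs = bin_odds(durations, contacts, event=lambda n: n == 0, n_bins=3)
midpoints = [m for m, _ in pairs]
odds = [o for _, o in pairs]
print(pairs, r_squared(midpoints, odds))
```

Here the odds of an approximant fall as duration rises, the direction reported for the real data.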
To acoustically investigate how the position of the tongue body varies in a Russian trill as a function of Secondary Articulation, Contiguous Vowel, and Word Position, we used F2-F1. The distance between the first two formants increases as the tongue body assumes a palatal position and decreases as it assumes a back position; this dimension has previously been called the sharp-plain distinction (Jakobson, Fant, and Halle, 1952), where sharp (palatalized) configurations have a large F2-F1 and plain configurations have a smaller F2-F1. We use this measure because the palatalization contrast is precisely a contrast in the front-back positioning of the tongue body, and should therefore structure F2-F1 variability. This measure is also less speaker-dependent than F2 or F1 alone (Ladefoged, 2001). Sproat and Fujimura (1993) used this measure to investigate the phasing between front and back gestures in American English /l/ and showed that it correlates highly with articulatory measures of tongue position obtained with X-ray Microbeam tracking. In the current section, we investigate F2-F1 at the beginning and at the end of the trill as a function of Secondary Articulation, Vowel, and Word Position.
Measuring F2-F1 “at the beginning of a trill” means measuring F2-F1 in a 25 ms window at the beginning of the first open phase for a 1-contact or ≥2-contact trill, or at the beginning of the approximant for a 0-contact trill, whereas an F2-F1 measure “at the end of the trill” is taken in a 25 ms window at the end of the last open phase for a 1-contact or ≥2-contact trill, or in the last 25 ms of an approximant. Specifically, we attempt to answer two questions. 1) Does palatalization span the entire trill, i.e., is it parallel to the apical gesture (synchronous), or is it sequential to the apical gesture (asynchronous)? If the two gestures are synchronous, the palatalization contrast as expressed in F2-F1 should be equally detectable at the beginning and at the end of the trill, whereas asynchronous gestural organization would show an F2-F1 contrast at one end of the trill but not at the other. We also investigate how word position and vowel affect whether the gestures are synchronous. 2) Does the influence of the syllable’s vowel span the entire trill, or only its beginning or end? We address this question by measuring the difference between the effects of each vowel pair (e.g. /a/ vs. /u/) on the beginning and end of the non-palatalized trill alone, or of the palatalized trill alone, in each word position. If the difference between a vowel pair can be detected equally at the beginning and end of the trill, then the contiguous vowel spans the entire trill, whereas if the difference can only be detected at one side of the trill, there is evidence that the vowel overlaps with only part of the trill.
Figure 4 shows the means of F2-F1 for all participants at the beginning and end of the trill for /r/ and /rj/, as well as the means for the change in F2-F1 from the beginning to the end of the trill. To allow comparison across participants with different formant ranges, F2-F1 was normalized for this figure by subtracting from each token the speaker’s mean F2-F1, calculated across all of that participant’s data. As can be seen, all participants show a large difference between F2-F1 measured at the beginning/end of a palatalized trill and that measured at the beginning/end of a non-palatalized trill. When all the trills are combined, as in Figure 4, participants seem to consistently show little change in F2-F1 from the beginning to the end of /r/, whereas /rj/ shows a small change of around 100–200 Hz. This figure combines data for all trill categories. Figure 5 shows the mean phonetic differentiation in F2-F1 (non-normalized) between /r/ and /rj/ at the initiation of the trill for approximant and tap trills. Full trills (2–3 contacts) were not included, since there are so few of them for /rj/. The data are not normalized by subject, to show that the effects are present even without normalization. At the beginning of the trill, there is still high differentiation between the two trills. Crucially, the differentiation does not depend on word position. However, the differentiation is greater next to the non-back vowel than in the back-vowel context, which could be due to back vowels having a lower F2 before the palatalized segment than before the non-palatalized one. Figure 6 shows the mean change in F2-F1 across the trill (F2-F1 at the end of the trill minus F2-F1 at the initiation of the trill).
Onset (initial and intervocalic) palatalized trills, both approximants and taps, show an increase in F2-F1, whereas coda palatalized trills show no change in F2-F1 across the trill. When viewed by position, the non-palatalized trills show almost no change in F2-F1 across the trill; however, when the data are separated by vowel, there is a positive change in F2-F1 contiguous to /a/ and /u/ only.
Tables 5, 6, and 7 show the significant results of the General Linear Mixed Model tests for F2-F1 at the beginning of the trill, at its end, and for the F2-F1 change, respectively. The Vowel and Word Position effects were investigated separately for /r/ and /rj/, since the data behave quite differently in the two classes. The significant results reflect the patterns seen for individual participants. Some Word Position effects do reach significance. Specifically, initial trills show significantly higher F2-F1 than final and preconsonantal trills at the end of palatalized trills, and initial trills also show higher F2-F1 than intervocalic and preconsonantal trills at the beginning of non-palatalized trills. However, these effects are relatively small (all less than 150 Hz), and their confidence intervals are quite wide. The individual participant data show that some participants exhibit these effects to a large extent and therefore bias the significance of the results. We will therefore not consider these effects robust. Table 7 shows that palatalized trills show an increase of about 130 Hz through the trill.
Figures 7, 8, and 9 show how vowel contrasts are exhibited at the beginning and end of the trill, as measured by the Cohen’s d distance between the F2-F1 distributions for each vowel pair at the beginning and end of each trill. Figure 7 is for the /a/-/ɨ/ and /a/-/i/ distinctions, Figure 8 for the /ɨ/-/u/ and /i/-/u/ distinctions, and Figure 9 for the /a/-/u/ distinction. The upper row in each figure shows data for /r/, and the bottom row data for /rj/. The left column in each figure shows data for 1-contact trills, and the right column data for 0-contact trills. The major generalization arising from this investigation is that the contrast between vowels is detectable in the portions of the trill close to the vowel, but is poorly detectable or not detectable at all in regions of the trill farther from the vowel. For instance, in /r/, the /ɨ/-/a/ distinction exceeds two standard deviations at all points in the trill except the beginning of initial trills and the end of preconsonantal and final trills. These points differ from all the others in that they are not contiguous to the vowel. Examination of the various vowel distinctions reveals, however, that the distinction at the beginning of initial trills is the only one that is always compromised; several vowel distinctions are clearly detectable for all participants at the end of preconsonantal and final trills. The /a/-/u/ distinction is the least detectable across contexts for /rj/, whereas the same distinction is highly detectable in /r/.
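The beginning/end measurement described above can be sketched as follows, assuming a precomputed formant track of (time, F1, F2) frames and already-segmented trill edges (start of the first open phase or approximant, end of the last open phase or approximant). The frame spacing, the half-open window convention, and the function name `edge_f2_minus_f1` are assumptions for illustration; the 25 ms window length is taken from the text.

```python
from statistics import mean

def edge_f2_minus_f1(frames, start_ms, end_ms, window_ms=25.0):
    """Mean F2-F1 (Hz) in a window at the start and at the end of a trill.

    `frames` is a list of (time_ms, f1_hz, f2_hz) formant estimates;
    `start_ms`/`end_ms` delimit the measured stretch. Returns
    (beginning, end, change), where change = end - beginning.
    """
    def window_mean(lo, hi):
        # Half-open window [lo, hi): frames exactly at `hi` are excluded.
        diffs = [f2 - f1 for t, f1, f2 in frames if lo <= t < hi]
        if not diffs:
            raise ValueError(f"no formant frames in [{lo}, {hi}) ms")
        return mean(diffs)

    beginning = window_mean(start_ms, start_ms + window_ms)
    end = window_mean(end_ms - window_ms, end_ms)
    return beginning, end, end - beginning

# Hypothetical track: one frame every 5 ms, F1 flat, F2 rising through the trill.
frames = [(t, 400, 1800 + 2 * t) for t in range(0, 60, 5)]
print(edge_f2_minus_f1(frames, 0, 60))
```

For trills shorter than 50 ms the two windows overlap, so the beginning and end measures are not independent; the sketch does not correct for this.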
To investigate the extent of coproduction of the trill with its contiguous vowel statistically, as a function of the trill’s position in the word, we tested for the interaction of Vowel and Word Position on F2-F1 at the start of the trill and at the end of the trill, within /rj/ and within /r/. The results for /rj/ are in Table 8. It can be seen that the vowel distinctions (/i/ vs. /u/) and (/i/ vs. /a/) have an effect at the beginning and end of the palatalized trill in Intervocalic, Preconsonantal, and Final positions. However, in Initial position, the vowel distinctions are significant at the end of the trill (immediately preceding the vowel), but not at the beginning of the trill. The /u/ vs. /a/ distinction has a significant effect only at the beginning of Final trills. Table 9 shows the effect of vowel distinctions on the beginning and end of /r/. The same basic pattern can be seen for this trill. The beginning of Initial /r/ shows little (ɨ/u and a/u) to no (ɨ/a) effect of the contiguous vowel distinctions, but all other positions show sizable effects.
The most basic finding in this study is that the contrastive, aerodynamic, and physiological constraints on palatalized rhotics do not prohibit coarticulatory or prosodic variability. In this section, we summarize the various generalizations that have emerged in the last section. There are four types of effects on variability that have emerged—effects of boundaries (position), coarticulation, palatalization contrast, and syllable structure. Table 10 presents each of the generalizations, together with the evidence for each generalization. Boundaries lengthen contiguous trills and strengthen their trilling (Generalizations 1 and 2). That is, the gesture that is shared by /r/ and /rj/ is not always implemented in the same way to differentiate the two trills from coronal non-trills. This result is in contradiction to the first prediction of the first hypothesis (Section 1.3), which states that the trill portion of both trills would be implemented in the same way, and supports the first prediction of the second hypothesis, which states that the apical gesture of trills would be strengthened domain-initially. That strength accompanies lengthening follows from several models: Lindblom’s undershoot model (1963), Fujimura’s (2000) C/D model, and Byrd and Saltzman’s (2003) pi-gesture model. Since boundaries lengthen a trill, there is more time for full trilling.
However, crucially, the effect of the boundaries does not alter the palatalization contrast (Generalization 5), which is present throughout the trills and does not vary markedly between initial, intervocalic, preconsonantal, and final trills, supporting prediction 2 of the first hypothesis. There is no evidence of neutralization of the palatalization contrast in word-medial contexts, contrary to the second prediction of the second hypothesis. The strengthening of palatalization in initial and final positions and its weakening in medial positions, as predicted by positional neutralization (Steriade, 1994), would enhance the contrast near the boundaries and eliminate it inside the domain. This is not what happens; instead, the contrast is maintained throughout all the trills, regardless of word position. Boundaries affect the duration and the number of vibrations in a trill, but they do not significantly alter palatalization. The lack of weakening of palatalization in word-medial trills is also surprising from the point of view of the undershoot hypothesis (Lindblom, 1963), since one would expect the shorter word-medial trills to be less able to implement a secondary contrast, or to show a weaker contrast, given the limited time available to achieve the target.
We believe that two factors are responsible for the maintenance of the palatalization contrast domain-medially. One possible explanation for the absence of palatalization strengthening at the boundaries and of palatalization weakening domain-internally is that boundaries primarily affect consonantal gestures, whereas accent affects vocalic gestures (Cho, 2002). Since the palatal gesture is more open than the apical one, it can be analyzed as a vocalic gesture, and would thereby be unaffected by the boundaries. However, this effect has not been thoroughly demonstrated; moreover, the vocalic gestures studied in this context are often in the vowel, far from the boundary, whereas the palatalization of a trill is in the consonant itself and can be adjacent to the boundary. It is therefore not clear how applicable this idea is to secondary palatalization, even though it may contribute to the result. Another factor keeping the palatalization of medial trills strong is seen in the other effect of contrast: the palatalization contrast itself reduces the chance of full trilling (Generalization 6). The palatalization contrast affects the production of the trill aspect, largely preventing full trilling in palatalized trills (Generalization 5) regardless of domain position. We believe that introducing a phonetic differentiation between the two contrasting trills in word-medial position, in the phonetic property related to the phonological property they share (the apical gesture specification), increases the likelihood that the contrasting gesture (palatalization) remains strongly differentiated phonetically. That is, from the point of view of the undershoot hypothesis, medial trills, due to their shorter duration, would not be able to achieve their primary or secondary targets as well as trills near the boundary.
But since the palatalization contrast itself weakens trilling in all positions, the secondary gesture in medial position is better able to achieve its target, because the primary gesture has been simplified from a highly demanding trill to an approximant or a tap, which are less resistant. There is therefore the possibility, which can be tested in future perception experiments, that the phonetic differentiation between /r/ and /rj/ in the number of vibrations helps maintain the palatalization contrast. The first hypothesis, presented in Section 1.3, predicts that the gesture the trills share would be produced in the same way, whereas the palatalization gesture would be maximally different between the trills. What seems to be happening is that the second prediction is achieved in part by phonetically differentiating the property the trills share (i.e., trilling).
The realization of palatalized trills as taps or approximants may also have a physical motivation. One possibility is that the palatalization gesture affects the apical gesture through retraction, or by reducing the ability to control the stiffness and inertia that allow the tip to vibrate. Palatalization would thus weaken trilling. Keating (1993), as discussed in Section 1.2, showed that palatalization retracts the apical constriction for /t/, which would support the possibility that this also happens in trills. However, she also showed that the apical gesture for /s/, which is more similar to the trill in the degree of constraint on the tongue, is not retracted. Also, based on evidence from assimilation in Catalan consonant clusters, it seems more likely that the apical gesture is stronger than the palatal gesture and is therefore unlikely to be weakened by it. Articulatory measurement of dynamic speech is necessary to establish whether the tongue tip is configured in the same way in both trills.
Coarticulation is not, in general, blocked, despite the density of the contrastive system and the high degree of constraint on the tongue body in trills (Generalizations 3 and 4). But again, although vowels affect even the “weakest” trills, those trills are still able to fully implement the palatalization contrast. The effects of coarticulation and its interaction with the boundary effects may seem surprising from the point of view of DAC, since one would not expect coarticulation to affect a segment with a high DAC value. However, the theory does not say that coarticulation cannot affect such a segment, only that the effect on such a segment would be smaller than the effect on a segment with a lower DAC value. The ability of coarticulation to affect even such highly resistant segments constitutes evidence that coarticulation is a necessary factor in speech production, which can be minimized but never eliminated. This result is not inconsistent with DAC. What is more surprising from the perspective of DAC is that there is no evidence of a weakening of the palatal gesture, even though it has a lower DAC value than the apical gesture. Such an effect would result in the weakening of palatalization, which is not observed; it would be analogous to assimilation in Catalan consonant clusters (Recasens and Pallarès, 2001). However, this does not mean that DAC makes the wrong prediction. The contrasts in the two languages are quite different, and the interactions examined here are within a consonant rather than within a consonant cluster. These data can therefore be taken as support for a difference between intersegmental and intrasegmental interaction.
Even though palatalization is consistently present from the beginning to the end of trills, its intensity seems to rise through the trill in onset positions (Generalization 8). Coda trills are palatalized throughout; however, the intensity of the palatalization does not rise in them. This is consistent with Kochetov’s (2006) findings regarding the palatalized labial stop in Russian and his preliminary data regarding palatalized trills (Kochetov, 2004). This particular timing pattern, however, does not seem to agree with some expectations about gestural timing in other languages. The rise in the intensity of palatalization in the onset is consistent with the expectation of Sproat and Fujimura (1991) that the vocalic gesture is attracted to the vowel, and with the 180° phase difference between gestures in the onset postulated by Goldstein et al. (2007). However, one might expect palatalization (F2-F1) to fall, rather than remain steady, in the coda (Sproat and Fujimura, 1991). One might also expect even greater sequencing of palatalization in the coda (Goldstein et al., 2007), since the vowel and the coda are in a 180° phase relation and the gestures within the coda are also in a 180° phase relation, which would be observable as a rise in F2-F1. However, neither of these effects occurs. We believe that further acoustic and articulatory research on the timing of the palatalization gesture in Russian is necessary before it is clear whether these patterns are part of a universal generalization.
In summary, variability due to coarticulation and prosodic boundaries is unavoidable: these are inextricable parts of speech that cannot be eliminated by phonological and physical constraints on variability. This may not seem surprising for many speech segments, but it is surprising for segments that are highly constrained physiologically and phonologically. Nevertheless, the contrasts are maintained despite the variability. The sources and limits of variability take part in a non-zero-sum game in which both achieve their effects. Moreover, there is evidence that contrast and the sources of variability interact in subtle ways in palatalized trills to maintain the contrast, despite the difficulty of maintaining it in intervocalic and preconsonantal positions due to prosodic variability.
The phonetic variability seen in this study is quite reminiscent of the phonological strategies used by the different Slavic languages to resolve the conflict between coronal and palatal gestures. Several such strategies are used. Some Slavic languages de-palatalize the trill, a phenomenon for which we see no evidence in Russian. Other Slavic languages de-trill /rj/. Russian appears not to de-trill; however, the generalizations related to the number of contacts show that /rj/ is far less likely than /r/ to have two or more contacts. Thus, there is no categorical de-trilling, but there is a phonetic tendency for /rj/ not to be fully trilled. Some Slavic languages categorically sequence the palatal gesture after the apical one. Again, Russian shows no evidence of that; indeed, the palatalization contrast is evident from the first pitch period of a trill. However, there is a tendency for palatalization to be more intense at the end of a trill than at its beginning. Therefore, the phonetic properties of palatalized trills show a delicate balance between trilled and de-trilled realizations and between parallel and sequential gestures. De-palatalization, fricativization, and sequentiality resolve the conflict between palatalization and trilling in some Slavic languages by eliminating the contrast. We therefore conjecture that the partial presence of de-trilling and gestural sequentiality in Russian plays a stabilizing role and thus assists in contrast maintenance. That is, we do not believe that Russian is moving towards neutralizing the contrast; rather, coarticulatory and prosodic variability have yielded stable, yet phonetically differentiated, sounds in contrast. In future research, we will test this conjecture by investigating the influence of prosody and coarticulation on rhotics in Slavic languages where the palatalization contrast is no longer present and comparing the results with those from Russian.
We thank D. H. Whalen, Tine Mooshammer, and Carol Fowler for many helpful comments regarding this study. All shortcomings of this work are due only to the authors. This work was supported by NIH NIDCD grant 02717.
1For instance, dentals distinguish palatalization for all vowels (before [e] non-palatalized dentals are present only in borrowings, some of which, however, are fully assimilated) and palatalized velars before [a o u] only occur in borrowings. Palatalized velars do not occur after vowels word-finally. In clusters, palatalized dental obstruents do not occur before other dentals or palatals; there is also regressive palatalization assimilation in a number of consonant clusters (for details, see Timberlake, 2004). The distribution of palatalized consonants in consonant clusters is less free than that of their non-palatalized counterparts, but the contrast nonetheless is present word-initially, intervocalically, preconsonantally, and word-finally.
2The frequencies of the filters were chosen based on reports of trill vibration frequency in different languages in the literature (Lindau, 1985; Ladefoged and Maddieson, 1996; Barry, 1997) and also based on model-based trill simulations (McGowan, 1989).