One of the aspects of major relevance to singing is the control of fundamental frequency.
The effects on pitch inaccuracy, defined as the distance in cents in equally tempered tuning between the reference note and the sung note, of the following conditions were evaluated: (1) level of external feedback, (2) tempo (slow or fast), (3) articulation (legato or staccato), (4) tessitura (low, medium or high) and (5) semi-phrase direction (ascending or descending).
The subjects were 10 non-professional singers and 10 classically-trained professional or semi-professional singers (10 males and 10 females). Subjects sang arpeggi spanning one octave and a fifth with three different levels of external auditory feedback, two tempi (slow or fast) and two articulations (legato or staccato).
It was observed that inaccuracy was greatest in the descending semi-phrase arpeggi produced at a fast tempo and with a staccato articulation, especially for non-professional singers. The magnitude of inaccuracy was also relatively large in the high tessitura relative to the low and medium tessitura for such singers. Counter to predictions, when external auditory feedback was strongly attenuated by the hearing protectors, non-professional singers showed greater pitch accuracy than in the other external feedback conditions. This finding indicates the importance of internal auditory feedback in pitch control.
With an increase in training, the singer’s pitch inaccuracy decreases.
Singers are typically required to sing with a high degree of precision in their fundamental frequency (fo). This requires constant self-monitoring of vocal output and frequent, small corrections in thyro-arytenoid and cricothyroid muscle activity. In the context of singing, it is important that pitch accuracy be maintained even when singers cannot hear their own voices, so that their performance is not impaired by a loud orchestral accompaniment or by the choral sound of fellow singers.
During a performance, a singer will often perform in several different locations, in which the acoustic conditions and the balance with the orchestra will differ. Hence, with (classical) training, a singer will learn to rely not only on external auditory feedback, i.e., the sound that the singer perceives of his or her own voice via air conduction, but also on proprioceptive feedback associated with internal (pallesthetic and kinesthetic) sensitivities. The main source of pallesthetic feedback is internal auditory feedback resulting from skull vibrations (bone conduction). Vibration of the vocal folds gives rise to a concomitant vibration in the bones of the skull, which stimulates the cochlea. There is also a perception of thoracic, facial, and other skeletal vibrations.1–2–3 The most relevant receptors of kinesthetic feedback are the laryngeal sensory receptors.4
The significance of external and internal feedback to pitch control has been considered in a few previous studies. It has been found that, in the absence of auditory feedback, pitch accuracy typically decreases.5–6–7 Hence, Ternström et al.7 argued that “proprioceptive feedback plays a less important role than the auditory feedback in the [fo] control by singers” (1988; p. 191). Elliot and Niemoeller5 and Schultz-Coulton8 found that external auditory feedback was vital to pitch accuracy, especially for adults without voice training (cf. Watts et al.9).
In a partial replication of the Ward and Burns6 study, Mürbe et al.12 found that the pitch accuracy of 28 singers who were at the beginning of their professional solo singing education decreased in the following conditions: (1) when auditory feedback was masked by noise at 105 dB(A) presented via headphones, (2) when a staccato vs. a legato articulation was used, and (3) when a fast vs. a slow tempo was used. In a second study, conducted immediately after the same subjects had completed 3 years of professional singing education, Mürbe et al.13 found the same trends with regard to masking noise, style and tempo. They reported a smaller difference between masked and unmasked conditions in the slow (40 bpm) tasks in the second set of recordings, indicating a greater reliance on internal auditory feedback after the three years of education than before the education. There was no apparent effect of education on pitch accuracy in the fast tempo (160 bpm).
The effects of interval direction on pitch accuracy were reported by Edmonson.14 He analyzed the pitch accuracy of five groups of music students (vocalists, string instrumentalists, pianists, brass instrumentalists, and woodwind instrumentalists) on four intervals (the perfect fourth, perfect fifth, major sixth and minor third). It was demonstrated that vocal pitch acuity on ascending intervals is much better than acuity on the same descending intervals.
In the present study, the effects of the singer’s level of training and the magnitude of external auditory feedback on pitch inaccuracy are investigated. In a previous study,15 the magnitude of the Lombard effect16 was investigated in the same 20 singers. It was found that trained singers were less responsive to changes in the level of the accompaniment than untrained singers, indicating less reliance on external auditory feedback. Therefore, the first prediction in the current study is that pitch accuracy is greater and less variable for trained than untrained singers across levels of external auditory feedback, tempo, articulation and semi-phrase direction (ascending and descending) conditions. Secondly, it is predicted that pitch accuracy is better in an arpeggio produced at a slow tempo, with a legato articulation and in the ascending semi-phrase than in an arpeggio produced at a fast tempo, with a staccato articulation and in the descending semi-phrase. Finally, it is hypothesized that singers’ pitch accuracy is reduced in the absence of external auditory feedback.
The use of human subjects for this research was approved by Michigan State University’s Human Research Protection Program (IRB #13-1149). Ten female and ten male singers (mean age 22.9 ±4.5 years) volunteered to take part in the experiment. The sample was divided into two groups: the first comprised non-professional singers, and the second comprised professional classical singers. The age, gender, group and voice type of the 20 subjects are reported in Table 1. The members of the non-professional group were mainly choristers in a cappella choirs, with a primarily popular repertoire. The professional singers were predominantly Master’s students in classical singing, with a primarily operatic repertoire and a mean of 7.6 years of singing lessons.
The experiment was conducted in a sound-treated booth (2.5 × 2.75 m and h=2.0 m). In the first condition, this environment was unchanged (Set Normal). In the second condition (Set Panels), two reflective panels were placed in this room at 0.5 m from the singers, 45° from the mouth axis. In this condition, external auditory feedback was increased. In the third condition (Set Hear. Protector), singers wore over-the-head, earmuff-style hearing protectors, which strongly attenuated external auditory feedback.
After an initial (guided) warm-up, consisting of five-note scales covering the singer’s range and a few repetitions of the arpeggio under study, singers performed arpeggi in three different external auditory feedback conditions. As a prompt, the first note was played on a keyboard before each arpeggio. The arpeggi were sung without musical accompaniment and without the use of falsetto. A metronome was displayed on a screen outside the sound booth and was visible but not audible.
A total of 12 tasks were recorded for each subject by means of a head-mounted microphone (HMM, Glottal Enterprises M-80), connected to a PC via a Scarlett 2i4 Focusrite soundboard. The recording software was Audacity 2.0.6. The order of Set presentation was randomized, and the order of Tempo and Articulation conditions was randomized within Set. Tempo was varied between 40 and 160 bpm. Articulation varied between Staccato and Legato. The arpeggio covered one octave and a fifth and the keys were C major for sopranos and tenors and A major for altos and basses (Figure 1). In the analysis, the notes sung by the singers were grouped according to Tessitura (Low, Medium and High), as shown in Figure 1.
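The 12 tasks per subject follow from crossing 3 Sets, 2 Tempi and 2 Articulations. A minimal sketch of the randomization described above is given here for illustration only; the function name `task_order` and the use of a seeded generator are assumptions, not the authors' procedure.

```python
import itertools
import random

SETS = ["Normal", "Panels", "Hear. Protector"]
TEMPI_BPM = [40, 160]
ARTICULATIONS = ["Legato", "Staccato"]


def task_order(seed=None):
    """Return 12 (set, tempo, articulation) tasks for one subject.

    The order of the Sets is shuffled, and the four Tempo x Articulation
    combinations are shuffled independently within each Set.
    """
    rng = random.Random(seed)  # hypothetical: a seed makes the order reproducible
    sets = SETS[:]
    rng.shuffle(sets)
    tasks = []
    for set_name in sets:
        inner = list(itertools.product(TEMPI_BPM, ARTICULATIONS))
        rng.shuffle(inner)
        tasks.extend((set_name, tempo, art) for tempo, art in inner)
    return tasks


if __name__ == "__main__":
    for task in task_order(seed=1):
        print(task)
```

Each block of four consecutive tasks shares one Set, which mirrors the statement that Tempo and Articulation were randomized within Set.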
As described above, the subjects performed in three external auditory feedback conditions. The first condition consisted of the soundproof room without reflective panels (Set Normal). The auditory feedback was at a medium–low level. The mid-frequency Reverberation Time (T30) in the room was 0.05 s and the trend over the octave bands was almost flat, as reported in Table 2. It was measured following the standard ISO 3382-2:2008.17 The Schroeder Frequency18 was 121 Hz; consequently, it was possible to evaluate the room acoustic parameters only for frequencies higher than 121 Hz. In the second condition, auditory feedback was increased by the presence of reflective panels at 0.5 m from the singers (Set Panels). The dimensions of the transparent shields of the polycarbonate panels were 56 cm by 66 cm (22 × 26”). The presence of the panels did not affect reverberation time (see Bottalico et al.15). The third condition (Set Hear. Protector) involved the lowest level of external auditory feedback and was obtained using hearing protectors. The insulation provided was 25.8 dB on average for frequencies between 250 Hz and 8 kHz (Table 2).
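The reported Schroeder Frequency can be checked against the booth dimensions and reverberation time given in the text. The sketch below assumes the common approximation f_S ≈ 2000·sqrt(T/V), with T in seconds and V in cubic metres; the text does not state which form of the formula was used, so this is a plausibility check rather than the authors' calculation.

```python
import math


def schroeder_frequency(rt_seconds, volume_m3):
    """Approximate Schroeder frequency in Hz (assumed form: 2000 * sqrt(T / V))."""
    return 2000.0 * math.sqrt(rt_seconds / volume_m3)


# Booth dimensions from the text: 2.5 m x 2.75 m floor, 2.0 m height.
volume = 2.5 * 2.75 * 2.0           # 13.75 cubic metres
f_s = schroeder_frequency(0.05, volume)  # mid-frequency T30 = 0.05 s
print(round(f_s))                   # 121
```

Under this assumption the result rounds to the 121 Hz stated above.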
MATLAB version 2014b and Praat version 5.4.01 were used for signal analysis. MATLAB was used to extract the central portion of each note in order to exclude voice attack and release effects. Fundamental frequency was estimated by means of the autocorrelation method in Praat, using Hanning windows with a temporal length of 3 divided by the value of the pitch floor, with pitch limits of ±123 cents from the reference value, with a 0.05 time step, an octave cost of 0.0025 per octave, and a voiced/unvoiced cost of 0.20. All other parameters had default values. The absolute value of the difference (distance) in cents between the produced note and the reference note is given by
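To make the analysis settings concrete, the sketch below converts the ±123 cent pitch limits into a pitch floor and ceiling in Hertz and derives the window length as 3 divided by the pitch floor. The 440 Hz reference note and the function names are illustrative assumptions; this is not a call into Praat itself.

```python
def pitch_limits(f_ref_hz, half_range_cents=123.0):
    """Return (floor, ceiling) in Hz, each half_range_cents away from f_ref_hz."""
    ratio = 2.0 ** (half_range_cents / 1200.0)
    return f_ref_hz / ratio, f_ref_hz * ratio


def window_length_s(pitch_floor_hz, periods=3.0):
    """Analysis window length in seconds: `periods` divided by the pitch floor."""
    return periods / pitch_floor_hz


# Hypothetical reference note (A4 = 440 Hz), chosen only for illustration.
floor_hz, ceiling_hz = pitch_limits(440.0)
print(f"floor   = {floor_hz:.1f} Hz")
print(f"ceiling = {ceiling_hz:.1f} Hz")
print(f"window  = {window_length_s(floor_hz) * 1000:.2f} ms")
```

Because the limits are tied to the reference note, the window shortens as the reference note rises.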
$$\left| 1200 \, \log_2 \frac{f_o}{f_{\mathrm{ref}}} \right|$$

where $f_o$ is the produced note and $f_{\mathrm{ref}}$ is the reference note in Hertz.
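The absolute distance in cents between a produced and a reference frequency can be sketched as follows, using the standard definition of 1200 cents per octave. The function name `cents_distance` is an illustrative assumption.

```python
import math


def cents_distance(f_o_hz, f_ref_hz):
    """Absolute distance in cents between a produced and a reference frequency."""
    return abs(1200.0 * math.log2(f_o_hz / f_ref_hz))


print(cents_distance(880.0, 440.0))  # one octave above the reference: 1200.0
print(cents_distance(220.0, 440.0))  # one octave below the reference: 1200.0
```

Taking the absolute value means that sharp and flat errors of the same size count equally, which is why the measure is called inaccuracy rather than deviation.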
Reference notes were established for both equal temperament and pure and just intonation; however, the results were not statistically different. In order to compare the present results with those of previous studies, results are reported for the equal temperament.
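To give a sense of how far the two kinds of reference note can differ, the sketch below compares equal-tempered intervals with the standard just-intonation ratios for the intervals of a major arpeggio. The specific ratios (5/4, 3/2, 2/1) are the textbook values and are an assumption here, since the text does not list the ratios that were used.

```python
import math

# Standard just-intonation ratios (assumed; not listed in the text).
JUST_RATIOS = {"major third": 5 / 4, "perfect fifth": 3 / 2, "octave": 2 / 1}
# Equal temperament: 100 cents per semitone.
ET_SEMITONES = {"major third": 4, "perfect fifth": 7, "octave": 12}


def interval_cents(ratio):
    """Size of a frequency ratio in cents."""
    return 1200.0 * math.log2(ratio)


def tuning_gap_cents(interval):
    """Equal-tempered size minus just-intonation size, in cents."""
    return ET_SEMITONES[interval] * 100.0 - interval_cents(JUST_RATIOS[interval])


for name in JUST_RATIOS:
    print(f"{name}: {tuning_gap_cents(name):+.2f} cents")
```

Under these assumptions the largest gap is on the major third (roughly 14 cents), while the fifth differs by about 2 cents and the octave not at all.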
Statistical analysis was conducted using R version 3.1.2. In agreement with analytic methods used for generic accuracy data, the distribution of the response variable (the distance in cents between the note produced by the singer and the reference note, i.e., pitch inaccuracy, on the basis of equal tempered tuning) was most consistent with the Gamma distribution (after being divided by 100 and with the addition of the constant 1). This distribution had a shape parameter of 34.91 and a rate parameter of 26.92, according to the log-likelihood and plots of the fit of various possible distributions (using the MASS package in R). Hence, a generalized linear mixed model (GLME) was fit with the Gamma family of distributions and the inverse link function. This function relates the conditional mean of the response to the linear predictor, i.e., it makes the regression of Y on the Xs linear. The model was fit by maximum likelihood using Laplace’s approximation, providing estimates of the regression coefficients and the standard errors of the coefficients. The model output also includes the test statistic, t, and the associated p value. The independent variables or fixed factors were Tempo, Articulation, Set, Group, and Semi-phrase direction and interactions of Tessitura and Group and Set and Group, and the random effects term was subject. The best of a set of nested models was selected on the basis of the Akaike information criterion and the results of likelihood ratio tests; the models were built using the lme4 and lmerTest packages.
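The transformation of the response and the role of the inverse link can be illustrated with a few lines of arithmetic. This is a sketch of the bookkeeping only, under the assumption that the Gamma distribution is parameterized by shape and rate with mean shape/rate; it does not fit the mixed model, and the function names are illustrative.

```python
def to_response(cents):
    """Transform pitch inaccuracy in cents to the modelled scale: cents / 100 + 1."""
    return cents / 100.0 + 1.0


def from_response(y):
    """Map a value on the modelled scale back to cents."""
    return (y - 1.0) * 100.0


def inverse_link(mu):
    """Inverse link: linear predictor eta = 1 / mu."""
    return 1.0 / mu


def mean_from_linear_predictor(eta):
    """Recover the conditional mean mu = 1 / eta from the linear predictor."""
    return 1.0 / eta


def gamma_mean(shape, rate):
    """Mean of a Gamma distribution in the shape/rate parameterization."""
    return shape / rate


# Mean implied by the reported shape (34.91) and rate (26.92), back in cents.
mu = gamma_mean(34.91, 26.92)
print(f"{from_response(mu):.1f} cents")
```

With the reported parameters the implied mean is about 29.7 cents. Note that under the inverse link a larger linear predictor corresponds to a smaller mean, so a positive coefficient indicates lower inaccuracy.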
Fligner-Killeen median tests were used to test the null hypotheses that variances in pitch inaccuracy (distance in cents between the note produced by the singer and the reference note, on the basis of equal tempered tuning) were the same for the two or more levels of each of the following variables: Group, Tempo, Articulation, and Semi-phrase direction. The p values were corrected using the Benjamini-Hochberg False Discovery Rate method.
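The Benjamini-Hochberg adjustment applied to the p values can be sketched in a few lines. The implementation below follows the usual step-up procedure (the same adjustment that R's `p.adjust` performs with `method = "BH"`); the input p values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

Each raw p value is scaled by m/rank, and the running minimum ensures that a smaller raw p value never ends up with a larger adjusted value than a larger one.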
The means, medians and standard errors of the distance in cents between the note produced by the singers (Non-professional and Professional, respectively) and the reference note by Articulation (Legato and Staccato), Tempo (Slow = 40 BPM and Fast = 160 BPM) and Semi-phrase direction (Ascending and Descending) are shown in Table 3. The means, medians and standard errors of the distance in cents between the note produced by the singers (Non-professional and Professional, respectively) and the reference note by Set (Normal, Panels and Hear. protector) are shown in Table 4.
Non-professional singers were found to have a mean pitch inaccuracy of 34.5 cents, and professionals, 25.0 cents. Mean inaccuracy was greater in the Fast Tempo (33.7 cents) than in the Slow Tempo (25.7 cents). Mean inaccuracy was greater in the Staccato Articulation (32.7 cents) than in the Legato Articulation (26.7 cents). Mean inaccuracy was greater in the Descending Semi-phrase (31.5 cents) than in the Ascending Semi-phrase (28.1 cents).
A GLME was run with a measure of pitch inaccuracy as the response variable, and the fixed factors (1) Tempo (Slow or Fast), (2) Articulation (Legato or Staccato), (3) Group (Non-professional or Professional), and (4) Semi-phrase direction (Ascending or Descending), and interactions of (5) Group and Tessitura (Low, Middle or High) and (6) Group and Set (Normal, Panels or Hearing protectors). Subject was included as a simple random effects term. Model estimates with associated standard errors and p values are given in Table 5.
Pitch accuracy tended to be better for the Professional singers than the Non-professional singers at p = 0.07 (when singing in the Slow Tempo, in the Legato Articulation in the Normal Set and in the Ascending Semi-phrase). As shown in Figure 2, the difference between the groups was especially apparent in the High Tessitura (p < 0.01), in which the Professional singers were most accurate (of the three tessitura), and the Non-professionals were least accurate. Figure 3 illustrates that, for both groups, pitch inaccuracy was reduced in the Slow relative to the Fast Tempo (p < 0.0001). As shown in Figure 4, inaccuracy was reduced in the Legato relative to the Staccato Articulation (p < 0.0001), especially for Non-professional singers. Inaccuracy was lower in the Ascending than the Descending Semi-phrase (p < 0.0001), once again especially for the Non-professional singers (Figure 5). As illustrated by Figure 6, Non-professional singers showed a lower inaccuracy in the Set in which hearing protectors were worn relative to the Normal Set (p < 0.05), while Professional singers were largely unaffected by the level of external auditory feedback.
Fligner-Killeen (median) tests with Benjamini-Hochberg corrected p values indicated that there was greater variance in pitch inaccuracy for the Non-professional than the Professional singers (F-K χ² = 77.23, df = 1, p < 0.0001), for the Fast than the Slow Tempo (χ² = 83.02, df = 1, p < 0.0001), for the Staccato than the Legato Articulation (χ² = 53.47, df = 1, p < 0.0001), and for the Ascending than the Descending Semi-phrase (χ² = 5.76, df = 1, p < 0.05). There was insufficient evidence that variances differed between Tessitura (χ² = 5.89, df = 1, p = 0.053), or Sets (χ² = 1.18, df = 1, p = 0.55).
The present study was carried out to assess the effect of training level on singing pitch accuracy and to quantify differences in pitch accuracy for vocal tasks differing in Articulation, Tempo, Semi-phrase direction and levels of external auditory feedback.
The mean pitch inaccuracy of the professional group, equal to 25.0 cents, was lower than that of the non-professional group, equal to 34.5 cents. This means that, generally, in all conditions, the performance of the professional singers was more accurate in pitch. This result, even though different from that found by other researchers,6–13 is predictable because of the increased capability of the professional vocal instrument in terms of (1) agility in the use of the laryngeal muscles and (2) control in the breath support,19 both of which will allow the singers to achieve good pitch accuracy even in the high tessitura. A higher level of agility will lead to a lower pitch inaccuracy in the staccato articulation (38.3 cents for non-professional singers and 27.1 cents for professional singers) and in the fast tempo pattern (37.6 cents for non-professional singers and 29.9 cents for professional singers), while improved breath support will lead to a lower pitch inaccuracy in the legato articulation (30.5 cents for non-professional singers and 22.9 cents for professional singers), in the slow tempo (31.3 cents for non-professional singers and 20.2 cents for professional singers) and in the highest notes (or high tessitura; 35.3 cents for non-professional singers and 22.7 cents for professional singers).
The direction of the pattern in a melody, whether ascending or descending, has been shown to be one of the factors affecting pitch accuracy. Lower values of pitch inaccuracy were found for the ascending semi-phrase (32.2 cents for non-professional singers and 24.1 cents for professional singers) than for the descending semi-phrase (37.1 cents for non-professional singers and 26.1 cents for professional singers). While there has been some disagreement between studies on the relative accuracy of ascending and descending directions, this finding is consistent with that of Edmonson.14 Better pitch accuracy was found by Edmonson for undergraduate Music students singing different intervals in the ascending direction than in the descending direction. The reason for this phenomenon could be an increase in psychological and muscular relaxation after the achievement of the highest note(s) or climax of the pattern performed. Professional singers are usually trained in maintaining a consistent muscular tension associated with the breath support during the entirety of the musical phrase, which is consistent with the smaller difference in pitch accuracy between the two semi-phrases for professional than non-professional singers (2.0 cents and 4.9 cents, respectively).
It was hypothesized that pitch inaccuracy would be greater when external auditory feedback was reduced (Set 3) than under normal conditions (Set 1). However, for non-professional singers, pitch inaccuracy was reduced when external auditory feedback was strongly attenuated by means of hearing protectors (33.1 cents in the condition with hearing protectors, 35.0 cents when feedback was increased by reflective panels, and 35.3 cents in the normal condition), although the reduction was smaller than the just noticeable difference of the human ear (5 cents).20 Different results were found for the professional group, who did not appear to be influenced by the level of external auditory feedback (24.2 cents in the condition with hearing protectors, 25.9 cents when feedback was increased by reflective panels, and 24.9 cents in the normal condition).
The present results are somewhat different from the findings of Mürbe et al.12–13 In both of the studies of Mürbe and colleagues, the masking effect of white noise at 105 dB(A) presented via headphones resulted in a decrease in pitch accuracy, and the size of this effect did not differ according to the level of training. This apparent difference between the present study and those of Mürbe et al.12–13 is arguably related to the difference in protocol. In the studies of Mürbe et al.12–13 the introduction of noise at a high level in the ear could have affected internal auditory feedback,3 which results from the transmission of laryngeal vibrations to the skeletal framework by means of extrinsic laryngeal muscles. In particular, the pressure variation introduced by the noise could have been large enough to greatly reduce the internal vibrations. Mürbe et al. stated in their earlier publication12 that only 2 of the 28 singers reported being able to hear themselves, and only in their highest notes. Hence, it is possible that the use of masking noise delivered via headphones in these studies strongly attenuated both the external and the internal feedback. During the present experiment, only the external auditory feedback was modified by the wearing of hearing protectors or the use of the reflective panels, and the internal feedback was preserved. In fact, it could be argued that the internal auditory feedback was emphasized due to the strong attenuation of the background noise by the hearing protectors. This emphasis of the internal auditory feedback improved the pitch accuracy of the non-professional group, but did not affect the pitch accuracy of professional singers, whose internal auditory feedback perception is likely to have been better developed by means of training.
Indeed, in a previous paper,15 the current authors argued that professional singers rely less on external auditory feedback and more on internal auditory feedback than do non-professional singers, as reflected by sound pressure level and singing power ratio. This finding suggested that the quality of the professional singers’ performance is less dependent on the environmental conditions in which they are singing.
It has been shown that pitch inaccuracy in singers is affected by the level of training, the tempo, articulation, semi-phrase direction (ascending or descending), and tessitura (low, medium or high), and the level of the external auditory feedback. Mean pitch inaccuracy was as high as 58 cents for the non-professional singers in staccato, fast arpeggi in the high tessitura and in normal external auditory feedback conditions, and as low as 13 cents for the professionals in legato, slow arpeggi in the high tessitura. An interesting finding concerning the role of internal and external auditory feedback in pitch accuracy was that when external feedback was strongly attenuated, there was an increase rather than a decrease in accuracy for non-professional singers. This increase may have been caused by the masking of external feedback, which arguably encouraged greater reliance on internal auditory feedback, which was preserved. Future research will better address the roles of external and internal auditory feedback in singers’ pitch accuracy in a more realistic setting. Additionally, a study comparing the results of various methods for the computation of pitch accuracy, such as methods evaluating interval precision or sharpness and flatness, will be conducted.
The kind cooperation of the singers has made this work possible. Special thanks to Professor R. Fracker and A. Lee for their assistance in recruiting singers. Thanks also to C. Gavigan and A. Lee for assistance with data entry. Research reported in this publication was supported by the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under Award Number R01DC012315. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.