This investigation used synchronous high-speed videoendoscopy (HSV) and electroglottography (EGG) to systematically study contact and separation behavior along the length of the vocal folds.
Facilitated by EGG and digital kymograms derived at 20%, 35%, 50%, 65%, and 80% of the posteroanterior length of the vocal folds, the pattern of vocal-fold contact and separation was determined for 7 female and 7 male vocally healthy subjects while producing “breathy,” “comfortable,” and “pressed” phonations.
The female subjects consistently used an anterior-to-posterior contact pattern and posterior-to-anterior separation pattern when producing a breathy or comfortable voice, with several employing a simultaneous pattern of contact and/or separation for pressed phonation. The male subjects showed more variable “zipperlike” separation patterns, but consistently used a simultaneous contact pattern for pressed voice that was also commonly used when producing comfortable phonation.
Findings indicate longitudinal phase differences in vocal-fold vibration are both common and expected in vocally healthy speakers. The implications for vocal assessment, as well as for the use and interpretation of the EGG signal, are discussed.
The vibratory undulation of the vocal-fold mucosa has long been recognized1–5 and is now known to be essential for normal phonatory function.6–9 Rather than uniform medial contact and lateral separation, there is, at least in modal-register voice, a substantial vertical phase difference in which the vocal folds first lose medial contact along their lower margins and proceed to separate toward their upper edges. As the upper margins lose contact, they continue to move laterally, even as the lower edges begin to medialize. The pattern of vocal-fold contact likewise progresses in an inferior-to-superior direction.
As a consequence of the delay between the motion of the upper and lower medial surfaces, changes in glottal geometry are quite complex. This complexity makes it necessary to distinguish between the glottis, the space between the medial surfaces of the vocal folds, and the patency of the laryngeal airway. For instance, although the airway may be occluded initially by approximated lower fold margins, a superior divergent glottis continues to close. Following maximal medial contact, the lower fold margins begin to separate and a convergent glottis develops, even though it is only after contact is lost between the upper margins that transglottal airflow is possible.
The wavelike behavior of the medial mucosa may be attributed to the mechanical and aerodynamic coupling of the vocal folds.10–15 Nonetheless, the multilayered structure of the vocal-fold mucosa, glottal geometry, and the biomechanical and aerodynamic forces all vary considerably along the length of the folds between the anterior commissure and the vocal process of each arytenoid cartilage.7,16–19 For instance, the superficial layer of the lamina propria is more voluminous at the midportion of the folds and becomes thinner anteriorly and posteriorly.20 Conversely, the intermediate and deep layers are thinnest at the midportion, becoming thicker toward the anterior and posterior ends of the folds.20,21 Consequently, the vocal fold is most pliable near the midpoint of its membranous portion. Indeed, when the vocal folds are viewed superiorly, the most vigorous wavelike movement appears to occur at, or slightly posterior to, the mid-membranous portion of the glottis.6,18,22 By contrast, when viewed from below, the greatest wavelike movement reportedly takes place in the vicinity of the anterior commissure.6 Furthermore, approximation of the posterior lower margins may be incomplete even when there is complete contact between the superior fold margins.23
Although some degree of longitudinal phase difference in glottal opening and closing has long been noted in speakers without vocal complaints, the relatively few research findings to date, based on high-speed filming or videostroboscopy usually supplemented by various glottographic signals, have been inconsistent. Baer, Löfqvist, and McGarr24 reported that glottal closure occurs nearly simultaneously along the length of the vocal folds, although fold separation commences at the midglottis, extending afterward in both a posterior and anterior direction. However, Sonesson25 and Tanabe et al.26 observed anterior-to-posterior vocal-fold separation. Childers et al.27 have likewise suggested a “zipperlike” pattern, but one in which the folds initially separate posteriorly and then progressively lose anterior contact, followed by contact that increases in an anterior-to-posterior manner. This posterior-to-anterior separation of the superior margins has also been observed by Hirano et al.22 and by Anastaplo and Karnell.28 A zipperlike contacting pattern has not been confirmed, which is unfortunate considering the crucial role of glottal closure in the acoustic excitation of the vocal tract.29–31
The disparity among the reported longitudinal phase data may be ascribed to differences in the adductory force applied by speakers and the consequent medial compression of the vocal folds. For instance, based on two vocally healthy male subjects, Hess and Ludwigs32 reported that “breathy” and “pressed” phonation were associated with posterior-to-anterior and anterior-to-posterior glottal opening, respectively, whereas “normal voice” was characterized by a “simultaneous” mid-membranous opening. Granqvist et al.33 similarly noted a pattern in which “the anterior part of the glottis opened slightly earlier than did the posterior parts” when a male subject produced “pressed” phonation.
Another issue is the sexual dimorphism of the larynx. In addition to differences in mass and thyroid angle, there are substantial differences in the morphology of the vocal folds and therefore of the glottis. In particular, after approximately 15 years of age, the relative length of the intermembranous glottis is substantially greater in the male larynx.34,35 Although Hirano, Kurita, and Nakashima35 found no appreciable sex difference in the length of the non-vibratory intercartilaginous glottis, they nonetheless reported a clearly dichotomous membranous-to-cartilaginous length ratio of 4.7 to 6.2 in men and 3.3 to 4.5 in women. Pontes et al.36 have since argued that the sex difference in “glottic proportion” contributes to the greater incidence of vocal nodules among women. At the very least, this difference is likely to have a substantial influence on vocal-fold/glottal dynamics, including those in the horizontal plane. However, with the exception of three female subjects compiled from two separate studies,24,28 all data regarding posteroanterior mucosal behavior have been derived from a small number of men.
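To make the quoted ratios concrete, a membranous-to-cartilaginous length ratio r corresponds to a membranous share of total glottal length of r/(1 + r). The following minimal Python sketch performs only that arithmetic on the ranges reported above; the function name is illustrative.

```python
def membranous_fraction(ratio: float) -> float:
    """Share of total glottal length that is membranous, given the
    membranous-to-cartilaginous length ratio r: r / (1 + r)."""
    return ratio / (1.0 + ratio)


# Ratio ranges reported by Hirano, Kurita, and Nakashima
RATIO_RANGES = {"men": (4.7, 6.2), "women": (3.3, 4.5)}

for group, (low, high) in RATIO_RANGES.items():
    print(f"{group}: {membranous_fraction(low):.1%} to {membranous_fraction(high):.1%}")
```

On these ranges the membranous glottis makes up roughly 82.5% to 86.1% of glottal length in men and 76.7% to 81.8% in women, so the two ranges do not overlap.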
The vibratory behavior of the vocal mucosa is crucial to voice quality and function. Detailing longitudinal phase differences is important not only for understanding the nature of vocal-fold vibration,33,37,38 but may also help clarify why certain types of dysphonia develop.39–41 The purpose of this study was to characterize posteroanterior phase differences in mucosal behavior for vocally healthy men and women using electroglottography (EGG) synchronized to images obtained via laryngeal high-speed videoendoscopy (HSV). The study also investigated whether longitudinal mucosal phase differences varied with the adductory force maintained during phonation.
Fourteen vocally healthy speakers (7 men and 7 women, between the ages of 22 and 29 years) were recruited from the Columbia, SC, area. None had a history of vocal pathology or any laryngeal or respiratory symptoms or complaints at the time of participation. To accommodate placement of the laryngeal endoscope and to ensure adequate illumination and visualization, each subject was asked to sustain the vowel /i/ several times, using a “comfortable,” “pressed,” or “breathy” voice. To elicit a pressed voice, subjects were instructed to phonate “as if lifting a heavy box,” whereas a breathy voice was to be associated with a “high loss of air.” Examples of pressed and breathy voice were also provided, and each subject practiced them. Visual examination of each subject’s EGG was used to judge the stability of phonation and to evaluate the adductory force used, based on a subjective assessment of the contact quotient (CQ), the proportion of the vibratory cycle during which the vocal folds are in contact.29,42,43
As shown schematically in Figure 1, the subjects were recorded using HSV precisely synchronized (synchronization error ≤ 11 µs) with acoustic and EGG signal channels. The HSV system included a monochrome Phantom v7.1 high-speed camera (Vision Research, Inc., Wayne, NJ), equipped with a JEDMED (St. Louis, MO) 70-degree 10-mm rigid laryngoscope (model 49–4072) and a KayPentax (Lincoln Park, NJ) 300-watt xenon light source (model 7152A). The video recordings were captured at 16,000 frames per second (fps) with a spatial resolution of 320 × 320 pixels and a pixel depth of 12 bits. The EGG signal was obtained using a KayPentax electroglottograph (model 6103). A head-mounted cardioid condenser microphone (model AKG C420; AKG Acoustics Harman Pro GmbH, Munich, Germany) was used for the acoustic recordings. The data acquisition system was a 10-channel 24-bit M-Audio Delta 1010 LT sound card (Avid Technology, Inc., Irwindale, CA), set to capture 96,000 samples per second. The video and data captures were therefore synchronized at a 1:6 ratio (16,000 frames versus 96,000 samples per second), so that 6 data samples could be attributed to the integration time of each corresponding video frame. Following data acquisition, three 1,000-frame video segments were selected from each subject in each voice condition for further analysis. These segments represented stable intervals that did not include vocal onset or offset behavior. The complete dataset thus comprised 42 segments (14 subjects × 3 segments) for each of the comfortable, pressed, and breathy phonatory conditions, for a total of 126 video recordings.
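The 1:6 frame-to-sample relationship and the dataset arithmetic can be sketched as follows. This assumes that data sample 0 and video frame 0 begin at the same instant, as the synchronization described above implies; the helper names are hypothetical and are not part of the authors' software.

```python
VIDEO_FPS = 16_000  # HSV frame rate (frames per second)
DATA_RATE = 96_000  # sound-card capture rate (samples per second)
SAMPLES_PER_FRAME = DATA_RATE // VIDEO_FPS  # 1:6 ratio -> 6 samples per frame


def frame_of_sample(sample_index: int) -> int:
    """Video frame whose integration time contains this EGG/audio sample
    (assumes sample 0 and frame 0 start together)."""
    return sample_index // SAMPLES_PER_FRAME


def samples_of_frame(frame_index: int) -> range:
    """The 6 data samples attributed to one video frame."""
    start = frame_index * SAMPLES_PER_FRAME
    return range(start, start + SAMPLES_PER_FRAME)


# One 1,000-frame analysis segment, expressed in time and in data samples
SEGMENT_FRAMES = 1_000
segment_ms = 1000 * SEGMENT_FRAMES / VIDEO_FPS
segment_samples = SEGMENT_FRAMES * SAMPLES_PER_FRAME

# Dataset size: 14 subjects x 3 segments per condition, 3 conditions
SUBJECTS, SEGMENTS_PER_CONDITION, CONDITIONS = 14, 3, 3
per_condition = SUBJECTS * SEGMENTS_PER_CONDITION
total = per_condition * CONDITIONS

print(SAMPLES_PER_FRAME, segment_ms, segment_samples, per_condition, total)
```

Each 1,000-frame segment therefore spans 62.5 ms, or 6,000 EGG samples, and the counts reproduce the 42 segments per condition and 126 recordings stated above.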
Following application of a subpixel technique44 to compensate for motion of the endoscope relative to the larynx, the posterior and anterior commissures were identified and digital kymograms (DKGs) were derived at five equally spaced locations along the posteroanterior axis of the vocal folds, namely, at 20%, 35%, 50%, 65%, and 80% of its length, with the most posterior DKG at 20% and the most anterior at 80% (Figure 2). At the time of extraction from the HSV recording, the DKG images were upsampled using cubic-spline interpolation to match the 96,000-Hz temporal resolution of the EGG signal. Based on the consensus of two examiners, five EGG landmarks were identified and manually tagged using customized software: (1) the apparent initiation of contact; (2) the change in slope of increasing contact; (3) the maximum contact/EGG peak; (4) the change in slope of decreasing contact/EGG “knee”;24,28,30 and (5) the apparent termination of contact. Identification of the slope changes was facilitated by examination of the first temporal derivative of the EGG waveform (dEGG). The EGG waveform and contact markers were then aligned with the contemporaneous multiline DKGs for further analysis (as shown in Figure 3).
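The line placement and temporal upsampling can be sketched as follows. This is a minimal illustration, not the authors' customized software, and it rests on stated assumptions: commissure positions are given as (x, y) pixel coordinates, each DKG line is taken perpendicular to the posteroanterior axis, and a kymogram is stored as a 2-D array with one row per video frame. Upsampling uses SciPy's `CubicSpline` with an integer factor of 6 (16,000 fps × 6 = 96,000 Hz); all function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Fractions of the posteroanterior length (0 = posterior, 1 = anterior commissure)
LINE_FRACTIONS = (0.20, 0.35, 0.50, 0.65, 0.80)


def dkg_line_centres(posterior_xy, anterior_xy, fractions=LINE_FRACTIONS):
    """Points on the glottal axis at which each DKG line is centred."""
    p = np.asarray(posterior_xy, dtype=float)
    a = np.asarray(anterior_xy, dtype=float)
    f = np.asarray(fractions, dtype=float)[:, None]
    return p + f * (a - p)


def dkg_line_direction(posterior_xy, anterior_xy):
    """Unit vector perpendicular to the glottal axis (assumed DKG line orientation)."""
    axis = np.asarray(anterior_xy, dtype=float) - np.asarray(posterior_xy, dtype=float)
    normal = np.array([-axis[1], axis[0]])
    return normal / np.linalg.norm(normal)


def upsample_kymogram(dkg, factor=6):
    """Cubic-spline interpolation of a kymogram along time (rows = frames).

    Returns (n_frames - 1) * factor + 1 rows; every `factor`-th row equals
    an original frame, because the spline passes through its knots."""
    dkg = np.asarray(dkg, dtype=float)
    t = np.arange(dkg.shape[0])
    t_fine = np.arange((dkg.shape[0] - 1) * factor + 1) / factor
    return CubicSpline(t, dkg, axis=0)(t_fine)
```

For a vertical glottis running from (100, 40) to (100, 280), `dkg_line_centres` places the five lines at y = 88, 124, 160, 196, and 232 pixels. Interpolating along the time axis only leaves the spatial (pixel) dimension of each kymogram unchanged.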
Samples elicited under the different voice conditions (i.e., “breathy,” “comfortable,” and “pressed”) were included in the analysis if the corresponding EGG met pre-established CQ criteria. CQ was computed by dividing the EGG interval between the two slope changes (marks 2 and 4) by the duration of the entire vibratory cycle (mark 1 to the subsequent mark 1), expressed as a percentage. The contact interval was defined in this way so that the CQs would be comparable to those reported in the literature29,42 and because peaks in the dEGG have been shown to correspond well to glottal opening and closing instants.28,45 All samples of elicited “breathy voice,” “comfortable voice,” and “pressed voice” used in the analysis had measured CQs of 40% or less, between 40% and 55%, and 55% or more, respectively. These ranges were determined empirically prior to this investigation and are consistent with male and female CQ data reported by Verdolini et al.46 in a study of vocal-fold impact stress.
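The CQ definition and inclusion criteria above can be expressed as a short Python sketch. Marks are treated as sample indices of the manually tagged landmarks; the function names are hypothetical, and boundary values follow the "40% or less" and "55% or more" wording, leaving "comfortable" strictly between them.

```python
def contact_quotient(mark1: float, mark2: float, mark4: float, next_mark1: float) -> float:
    """CQ (%) = contact interval (mark 2 -> mark 4) divided by the
    vibratory period (mark 1 -> subsequent mark 1)."""
    period = next_mark1 - mark1
    if period <= 0:
        raise ValueError("next_mark1 must come after mark1")
    return 100.0 * (mark4 - mark2) / period


def voice_condition(cq: float) -> str:
    """Condition implied by a measured CQ under the study's criteria."""
    if cq <= 40.0:
        return "breathy"
    if cq >= 55.0:
        return "pressed"
    return "comfortable"


def meets_criteria(elicited: str, cq: float) -> bool:
    """A sample is kept only if its CQ matches the condition that was elicited."""
    return voice_condition(cq) == elicited
```

For example, at 96 kHz a 200-Hz cycle spans 480 samples; with mark 2 at sample 8 and mark 4 at sample 248, `contact_quotient(0, 8, 248, 480)` gives 50.0%, which qualifies as comfortable voice.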
The EGG CQs and DKG-determined posterior glottal closure data are shown in Table 1. Across all subjects, the mean CQ was 31.3% (SD = 7.9), 48.1% (SD = 3.6), and 60.1% (SD = 3.7) for the breathy, comfortable, and pressed voice conditions, respectively. The CQs differed significantly across the three voice conditions (F(2,36) = 94.18, p < .001, η² = 0.84). Each of the female subjects had a posterior glottal gap, or chink, when producing the breathy sample; all but one female subject had a chink when producing the comfortable voice sample; and only one female subject had a chink when producing the pressed sample. The posterior gap was either the same size or smaller in the comfortable than in the breathy condition, and it became progressively smaller (from breathy, to comfortable, to pressed) for the sole subject (F01) who showed a chink in all three conditions. Conversely, as suggested by the literature,47,48 the male subjects were substantially less likely to have a posterior chink and, when they did, it was typically less extensive and most commonly occurred when producing a breathy voice sample. Also consistent with previous studies was the absence of a significant sex difference in CQ values,43,49 despite a slight tendency toward longer contact intervals for young adult men under similar comfortable and pressed phonatory conditions.49,50 Although posterior glottal chinks were more common in women, the female subjects did not have substantially lower CQs, largely because the measure is sensitive to the relative duration of vocal-fold contact rather than to the extent or completeness of contact.43,49
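As an internal consistency check on the reported statistics, an F ratio and its degrees of freedom determine a partial eta squared, η²ₚ = F·df_effect / (F·df_effect + df_error). The sketch below assumes that the reported η² is this quantity; it only recomputes it from the published F(2,36) = 94.18 and does not reproduce the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Values reported for the CQ comparison across voice conditions
eta = partial_eta_squared(94.18, 2, 36)
print(round(eta, 2))
```

The computed value (about 0.8395) rounds to 0.84, matching the reported effect size.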
Table 2 outlines the pattern of vocal-fold contact, gained and lost, for each of the subjects in the three voice conditions. Zipperlike anterior-to-posterior (A→P) contact and posterior-to-anterior (P→A) separation patterns were uniformly demonstrated by the female subjects for the breathy and comfortable voice samples (Figure 4), whereas several showed simultaneous contact and/or separation along the length of the vocal folds when producing a pressed voice. The contact and separation patterns were far more variable for the male subjects. For the breathy samples, three of the male subjects used the A→P contact pattern and P→A separation pattern (Figure 3) that was universal among the women. However, the men also exhibited simultaneous contact (M01), a contact pattern initiated at the midglottis and extending toward the anterior and posterior ends (M02, M03), or one initiated at the anterior and posterior extremes and subsequently extending toward the middle of the glottis (M05). For the comfortable voice samples, only two of the men showed the A→P contact and P→A separation patterns used by the female subjects. More common for the men was a simultaneous contacting pattern (Figure 5), whereas separation varied among A→P, P→A, and midglottis-initiated patterns. The male subjects showed the greatest consistency with a simultaneous contact pattern when producing pressed voice, although they were as likely to exhibit an A→P as a P→A pattern of vocal-fold separation.
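The pattern categories used above can be made explicit with a simple rule applied to the time at which an event (first contact or first separation) appears on each of the five DKG lines. This is a hypothetical sketch: the study classified patterns by inspecting the kymograms, and the `tol` threshold for calling a spread "simultaneous" is an illustrative assumption, not a value from the study. The rule looks only at where the event begins and does not verify a monotonic progression along the folds.

```python
def longitudinal_pattern(event_times, tol):
    """Classify the longitudinal order of one event across DKG lines
    ordered posterior -> anterior (20%, 35%, 50%, 65%, 80%).

    event_times: time of the event on each line (any consistent unit).
    tol: largest spread still treated as simultaneous (hypothetical threshold).
    """
    t = list(event_times)
    first = min(t)
    if max(t) - first <= tol:
        return "simultaneous"
    # Lines reaching the event within `tol` of the earliest line lead the pattern.
    lead = [i for i, x in enumerate(t) if x - first <= tol]
    posterior_leads = 0 in lead
    anterior_leads = (len(t) - 1) in lead
    if posterior_leads and anterior_leads:
        return "ends→mid"  # begins at both extremes, spreads toward the midglottis
    if posterior_leads:
        return "P→A"
    if anterior_leads:
        return "A→P"
    return "mid→ends"  # begins at the midglottis, spreads in both directions
```

For instance, contact onsets of 40, 30, 20, 10, and 0 µs from the 20% to the 80% line (with `tol=5`) classify as A→P, the contact pattern typical of the female subjects.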
Interestingly, regardless of voice condition, every subject, female or male, with an observed posterior glottal chink used a P→A separation pattern, with one exception: a male subject (M06) with a posterior gap whose separation pattern for comfortable voice began at the midglottis and extended toward the anterior and posterior ends. For breathy and pressed phonation, each male subject without a glottal chink produced a separation pattern that began at a location other than the posterior end of the glottis. Nonetheless, P→A separation patterns were observed for several female subjects producing pressed voice and for several male subjects producing comfortable voice even when there was no evidence of a static posterior glottal gap.
The identified EGG landmarks corresponded well with DKG-determined contact behavior. In general, regardless of voice condition, the first observed contact in any of the five concurrent DKGs appeared to occur synchronously with the first EGG marker. The second marker, representing the initial dEGG peak, typically occurred within 100 µs of the first and appeared to correspond to more complete longitudinal contact between the lower vocal-fold margins. The third EGG marker, placed at the EGG peak, appeared to correspond to maximum longitudinal contact between the upper vocal-fold margins, as suggested by light reflecting from the traveling mucosal wave on the superior surface. The fourth EGG marker, placed at the EGG “knee” and associated with a second dEGG peak, was often difficult to assign because a decontacting slope change is not a consistent EGG feature.24,27,28,51 When the knee was clearly identifiable, it was typically associated with the first breach in contact, independent of any posterior glottal chink when present. Likewise, the fifth EGG marker was difficult to place, because there was often no clear indication of the transition from the decontacting portion of the EGG to the loss of contact. Nonetheless, this marker was most often associated with a more complete loss of longitudinal contact between the upper vocal-fold margins as determined from the DKGs.
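Because the landmarks in this study were placed manually, any automatic procedure can at most propose candidates for review. The sketch below is one such hypothetical helper for a single EGG cycle, assuming the polarity convention in which increasing contact is plotted upward. It takes mark 2 as the largest positive dEGG peak, mark 3 as the EGG maximum, and, as a stand-in for the knee, mark 4 as the steepest decrease after the peak. It also converts sample lags to microseconds at 96 kHz, where one sample lasts about 10.4 µs, so the 100-µs spacing noted above corresponds to fewer than 10 samples.

```python
import numpy as np

FS = 96_000  # EGG sampling rate (Hz)


def samples_to_us(n_samples, fs=FS):
    """Convert a lag in samples to microseconds."""
    return 1e6 * n_samples / fs


def candidate_landmarks(egg_cycle):
    """Propose EGG marks 2-4 for one cycle (increasing contact = positive).

    mark 2: largest positive dEGG peak (steepest increase in contact)
    mark 3: EGG maximum (maximum contact)
    mark 4: most negative dEGG value after mark 3 (stand-in for the 'knee')
    """
    egg = np.asarray(egg_cycle, dtype=float)
    degg = np.gradient(egg)  # first temporal derivative, per sample
    m2 = int(np.argmax(degg))
    m3 = int(np.argmax(egg))
    m4 = m3 + int(np.argmin(degg[m3:]))
    return m2, m3, m4
```

On a real EGG cycle these candidates would still need to be checked against the waveform and the DKGs, particularly mark 4, since a decontacting slope change is not always present.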
For comfortable voice samples, the present study supports prior observations22,27,28 of a tendency toward A→P contact and P→A separation patterns. Although the present finding of a predominantly P→A pattern of vocal-fold separation in breathy voice was consistent with Hess and Ludwigs,32 the mid-glottal separation they reported for comfortable phonation was not observed in any of the female subjects and was seen in only two of the male subjects. Likewise, the A→P separation pattern for pressed voice noted by several investigators25,26,32,33 was observed in only three male subjects and in none of the women. In general, the female subjects in the present study showed the greatest consistency, demonstrating a zipperlike A→P contact and P→A separation pattern, with a minority using a simultaneous contact and/or separation pattern when producing a pressed phonation. Overall, the male subjects in the present investigation showed an increasing tendency to use a simultaneous contact pattern in comfortable and pressed voice, while demonstrating a zipperlike separation pattern. The longitudinal phase difference was primarily associated with a P→A separation pattern for breathy and comfortable voice, with an increasing likelihood of an A→P pattern when producing pressed voice.
The EGG findings of this study are consistent with previous reports that have related similar landmarks to the vertical phase characteristics of vocal-fold vibration.24,27,28,51,52 However, the present results further suggest that interpreting the EGG signal more accurately will require a better understanding of how longitudinal vocal-fold behavior contributes to the contact pattern. The results also lend support to Rothenberg’s53 early hypothesis that some of the EGG slope changes are most likely tied “to phase differences along the length of the vocal folds,” such as when “the upper margins of the vocal folds begin to separate rather suddenly at one region… and then proceed to separate more gradually along the rest of their length.” This cross-validation of EGG features against HSV-derived kymograms also supports the value of HSV as a visualization tool.
Despite the use of a limited number of multiline DKGs, the present study has demonstrated that longitudinal phase differences are common and expected in normal vocal-fold vibration. It must be recognized, however, that the findings are limited by the fact that visualization was restricted to a superior view. Substantial mucosal upheaval has been observed on the inferior fold surface.54 Viewing the vocal folds from below, Hirano et al.23 noted that vocal-fold separation is “delayed at the vocal process relative to the membranous vocal fold” and, more recently, Jiang et al.18 provided data to suggest that there is a greater posteroanterior phase difference for the lower margins of the vocal folds than for their upper edges. As recently suggested by Krausert et al.,55 because extensive contact and separation behavior may be hidden from a standard laryngoscopic view, there may be great advantage to supplementing even very sophisticated vocal-fold imaging with glottographic techniques.
Research funded by NIH grant R01 DC007640: “Efficacy of Laryngeal High-Speed Videoendoscopy.” Special thanks to Habib Moukalled for his assistance with software design and to Heather Shaw Bonilha and Terri Treman Gerlach for their assistance with data collection.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Robert F. Orlikoff, Department of Speech Pathology and Audiology, West Virginia University, Morgantown, WV, USA.
Maria E. Golla, Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, OH, USA.
Dimitar D. Deliyski, Communication Sciences Research Center, Cincinnati Children’s Hospital Medical Center, Department of Otolaryngology–Head and Neck Surgery, University of Cincinnati, Cincinnati, OH, USA.