Signal typing is central to the understanding of vocal fold vibratory patterns. Digital kymography (DKG) allows the direct observation of vocal fold vibratory patterns, and therefore, using DKG for vibratory signal typing may provide a useful complement to traditional signal typing techniques.
Video data collected from twenty larynges excised from mongrel dogs were observed using DKG in order to find examples of type 1 (nearly periodic), type 2 (subharmonic), and type 3 (aperiodic) vibratory patterns. Time series, frequency spectra, and correlation dimensions were calculated for each signal type.
The type 1 pattern showed a periodic time series of the glottal edge and a discrete frequency spectrum. The type 2 vibratory pattern displayed a time series of alternating high- and low-amplitude waves and a frequency spectrum that included a subharmonic (f0/2) frequency component. Regular and symmetric vibratory patterns were observed in the type 1 and type 2 patterns. The type 3 vibratory pattern was characterized by an aperiodic time series of the glottal edge, a broadband frequency spectrum, and irregular and asymmetric vibratory patterns. Correlation dimension estimates increased from type 1 to type 2 to type 3.
DKG imaging demonstrated an ability to assign a signal type to various laryngeal vibrations. Signal typing techniques utilizing direct observations of the vocal folds could be useful to determine valid methods for the analysis of vocal fold vibrations.
Visualization of vocal fold vibrations can be a powerful tool in the diagnosis of laryngeal pathology. A variety of laryngeal imaging methods, such as stroboscopy, digital kymography (DKG), and high-speed photography, are currently employed to accomplish this. Stroboscopy is one commonly used method, but its utility is limited to cases of periodic vibration.1–3 In order to capture aperiodic vibratory patterns of the vocal folds, DKG1–10 and high-speed imaging11–13 have emerged as effective methods of visualization. As a more financially feasible and time-efficient imaging method than high-speed imaging1–3,7–10, DKG allows for the objective evaluation and analysis of vocal fold vibratory parameters at a selected single line across the glottis1–4,6,7–11 and thus plays an important role in understanding the mechanisms of disordered voice production and enhancing the assessment of laryngeal pathology.
Titze has suggested that acoustic voice signals can be qualitatively classified into three types: the nearly periodic type 1 voice signals, the type 2 signals containing strong modulating or subharmonic frequencies, and the aperiodic type 3 signals.15 Previously, signal typing of voices has been undertaken by means of acoustic voice analyses.15–21 Traditional perturbation methods, such as jitter and shimmer, are appropriate for type 1 voice signals, but they may not be applied to type 2 and 3 signals. To further quantitatively describe type 2 and 3 acoustic voice signals, nonlinear dynamic methods have proven to be useful.15–21 However, acoustic voice signals in these studies mix the information obtained from left and right vocal fold oscillations and couple with other external factors such as vocal tract filtering and aerodynamic turbulent noise. The acoustic recording does not discriminate the dynamic differences between the two sides of the vocal folds, and it is unable to provide direct information about the vibratory dynamics of the vocal folds. Although several studies have directly observed vocal fold vibration, vocal fold vibratory patterns have not been classified into types using laryngeal imaging.
DKG may provide a solution because, unlike acoustic recording, it can directly image vocal fold vibrations and effectively visualize both periodic and aperiodic vibrations. Some previous studies employed nearly periodic vibratory patterns to quantify mucosal wave amplitude and frequency via curve fitting analysis, but curve fitting could not be used to analyze aperiodic patterns.2–7 Wurzbacher22–24 and Schwarz25 estimated the biomechanical parameters of a vocal fold model from nearly periodic glottal area measurements obtained from high-speed recordings. However, these approaches may not be applicable to aperiodic or chaotic vocal fold vibrations. In other studies, irregular vibratory patterns have been observed, but vibratory pattern typing of vocal folds has not been performed.26–28 Introducing the concept of signal typing to DKG is desirable for determining valid methods for DKG image analysis and for exploring potentially valuable new analysis methods. Direct visualization of the vocal folds via DKG could eliminate some of the limitations associated with acoustic data analysis techniques. Thus, the purpose of this study is to extend Titze’s acoustic signal typing concept to vocal fold vibratory pattern typing and to examine three types of vocal fold vibratory patterns via DKG imaging in excised larynx experiments: nearly periodic type 1, subharmonic type 2, and aperiodic type 3. The temporal and frequency characteristics of these three types of vibratory patterns of the left and right vocal folds are described. Nonlinear dynamic analysis is performed to quantify these three types of vibratory patterns. The aim is to establish DKG as a useful means of signal typing vocal fold vibrations.
The excised larynx experimental setup is illustrated in Figure 1(a). Twenty excised larynges harvested from mongrel dogs euthanized for non-research purposes were used in the experiments. Using a hose clamp, a segment of the trachea inferior to the larynx was secured to a pipe. An Ingersoll-Rand (Type 30) conventional air compressor was used to generate airflow. Prior to passage through the larynx, the air was conditioned to 35–38 °C and 95%–100% relative humidity using two ConchaTherm III heater-humidifiers (Fisher & Paykel Healthcare Inc., Laguna Hills, CA) placed in series. A three-pronged micrometer was used to secure the arytenoid cartilages and to control adduction and abduction. Using the air compressor, subglottal pressure Ps was gradually increased until phonation occurred.
Vocal fold vibrations were then recorded at subglottal pressures of 8 cm H2O, 30 cm H2O, and 40 cm H2O using a high-speed digital camera (Fastcam-ultima APX) mounted on a track system above the larynx. Although subglottal pressures of 40 cm H2O may be difficult for humans to achieve, they are easily applied to excised larynges. Our previous studies have shown that an extremely high subglottal pressure may cause irregular vocal fold vibration.20,29 The camera recorded vocal fold vibration at a rate of 4000 frames per second with a resolution of 256 × 512 pixels. A single line at the midpoint of the vocal fold was selected to create a kymographic image, as shown in Figure 1(b). A thresholding segmentation method was applied in order to extract the glottal edges from the kymographic image. A threshold was selected such that the pixel intensities in the glottis were below the threshold and vocal fold tissue pixel intensities were above it. Based on the detected glottal edges of the image series, the vibratory patterns xα(t) of the left and right vocal folds at the specific line-scan position can be derived, where the subscript α denotes the left or right vocal fold.
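The thresholding step described above can be illustrated with a minimal sketch. It assumes the kymogram is available as a list of scan-line rows (one per frame) of grey-level intensities; the function name `glottal_edges`, the toy pixel values, and the threshold of 100 are hypothetical and are not taken from the study. Which image side corresponds to the anatomical left or right fold depends on camera orientation and is also an assumption here.

```python
def glottal_edges(kymogram, threshold):
    """Return per-frame positions of the two glottal edges along the scan line.

    kymogram: list of rows, one per frame; each row lists pixel intensities.
    Pixels darker than `threshold` are treated as glottis, brighter ones as
    tissue. Frames with no glottal pixel (closed glottis) yield None.
    """
    first_edge, second_edge = [], []
    for row in kymogram:
        dark = [i for i, value in enumerate(row) if value < threshold]
        if dark:
            first_edge.append(dark[0])    # glottal pixel nearest one fold
            second_edge.append(dark[-1])  # glottal pixel nearest the other fold
        else:
            first_edge.append(None)
            second_edge.append(None)
    return first_edge, second_edge


# Toy kymogram (hypothetical values): bright tissue = 200, dark glottis = 20.
frames = [
    [200, 200, 200, 20, 20, 200, 200, 200],
    [200, 200, 20, 20, 20, 20, 200, 200],
    [200, 200, 200, 200, 200, 200, 200, 200],
]
left_edge, right_edge = glottal_edges(frames, threshold=100)
```

Each returned list is a discrete edge trajectory of the kind denoted xα(t) in the text, sampled once per frame.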
The vibratory time series xα(t) and the corresponding frequency spectra of the kymographic images of the left and right vocal folds were obtained. The kymographic image was analyzed to determine whether the vibratory pattern should be classified as Type 1, Type 2, or Type 3. These classifications of vocal fold vibratory type were developed from Titze’s classification of periodicity in acoustic signal typing.15 Vibratory patterns displaying nearly periodic movements of the left and right vocal folds and a fundamental frequency component much stronger than the other frequency components were categorized as Type 1. Vibratory patterns displaying strong subharmonic or modulating frequencies were categorized as Type 2. Lastly, vibratory patterns displaying aperiodic movements and a broadband spectrum were categorized as Type 3.
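The spectral criterion for Type 2 patterns can be illustrated numerically. The sketch below is an assumption-laden heuristic, not the classification procedure used in this study, which relied on inspection of the kymograms and spectra. It computes a naive discrete Fourier transform and compares the magnitude at half the dominant frequency with the dominant magnitude. The names `magnitude_spectrum` and `subharmonic_ratio` and both synthetic signals are hypothetical, and the dominant bin is assumed to be f0.

```python
import cmath
import math


def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0..N/2 of the mean-removed signal (naive O(N^2) DFT)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered)))
        for k in range(n // 2 + 1)
    ]


def subharmonic_ratio(x):
    """Magnitude at half the dominant frequency divided by the dominant magnitude.

    Assumes the strongest non-DC bin is f0; for an odd f0 bin the f0/2 bin is
    only approximated by integer division.
    """
    spectrum = magnitude_spectrum(x)
    f0_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return spectrum[f0_bin // 2] / spectrum[f0_bin]


# Synthetic signals (hypothetical): 64 samples, f0 at bin 8, subharmonic at bin 4.
n = 64
periodic = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
doubled = [
    math.sin(2 * math.pi * 8 * t / n) + 0.5 * math.sin(2 * math.pi * 4 * t / n)
    for t in range(n)
]
ratio_periodic = subharmonic_ratio(periodic)
ratio_doubled = subharmonic_ratio(doubled)
```

A ratio near zero is consistent with a Type 1 pattern, while a clearly nonzero ratio points to a subharmonic component. Any cut-off between the two would have to be chosen empirically, and a broadband Type 3 spectrum needs a separate criterion.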
In order to further quantitatively describe the dynamics of these vibratory patterns, nonlinear dynamic analyses were performed. Detailed descriptions of nonlinear dynamic analysis can be found in the literature.16,18,21,29–33 Phase space reconstructions and correlation dimension calculations were based on the numerical algorithms that were applied to analyze excised larynx phonations20 and human voices16,18,21. Briefly, a phase space can be reconstructed with the time delay vector,34

$$X_{i,\alpha} = \{x_\alpha(t_i),\ x_\alpha(t_i - \tau),\ \ldots,\ x_\alpha(t_i - (m-1)\tau)\},$$

where τ is the time delay and m is the embedding dimension. The time delay τ was estimated using the mutual information method.35 The correlation dimension can be calculated according to its definition,36

$$D_{2\alpha} = \lim_{r \to 0} \lim_{N \to \infty} \frac{\log_2 C_\alpha(W, N, r)}{\log_2 r},$$

where r is the radius around Xi,α and the correlation integral is calculated as37

$$C_\alpha(W, N, r) = \frac{2}{(N+1-W)(N-W)} \sum_{n=W}^{N-1} \sum_{i=0}^{N-1-n} \theta\left(r - \lVert X_{i,\alpha} - X_{i+n,\alpha} \rVert\right).$$

A more detailed description of r is given by our previous studies.38 W was set to be the time delay τ, and θ(x) satisfies

$$\theta(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0. \end{cases}$$

Because of the finite signal length and finite measurement accuracy, there is a finite region in the curve of log2 Cα(W, N, r) vs. log2 r in which the slopes initially increase but eventually converge as m increases.36,37 We derived the dimension estimate and its standard deviation (< 5%) using linear curve-fitting of the curve of log2 Cα(r) vs. log2 r in this region. The correlation dimension D2 specifies the number of degrees of freedom needed to describe a system. The estimate of D2 of stochastic white noise does not converge with increasing m, whereas D2 of a deterministic system converges to a finite value as m increases. The correlation dimension of these three types of vibratory patterns can then be obtained when the embedding dimension m is sufficiently large.
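As a rough illustration of the phase space reconstruction and correlation integral described above, the following sketch builds delay vectors, counts pairs of vectors that are at least W samples apart in time and closer than r, and estimates a slope from two radii. It is a simplification under stated assumptions: the study fitted a line over a scaling region and varied m, whereas this sketch uses only two radii and a single m. The function names, the synthetic sine signal with a period of 20.37 samples, and the radii 0.1 and 0.4 are all hypothetical.

```python
import math


def delay_embed(x, m, tau):
    """Delay vectors X_i = (x[i], x[i - tau], ..., x[i - (m - 1) * tau])."""
    start = (m - 1) * tau
    return [tuple(x[i - k * tau] for k in range(m)) for i in range(start, len(x))]


def correlation_integral(vectors, r, w):
    """Fraction of vector pairs separated by at least w samples in time
    whose Euclidean distance is below r (requires w < len(vectors))."""
    n = len(vectors)
    close = 0
    total = 0
    for i in range(n):
        for j in range(i + w, n):  # skip temporally adjacent pairs
            total += 1
            if math.dist(vectors[i], vectors[j]) < r:
                close += 1
    return close / total


def slope_estimate(x, m, tau, r_small, r_large):
    """Two-point slope of log2 C(r) vs. log2 r, with W set to tau.

    Assumes at least one pair lies within r_small, so both logs are defined.
    """
    vectors = delay_embed(x, m, tau)
    c_small = correlation_integral(vectors, r_small, tau)
    c_large = correlation_integral(vectors, r_large, tau)
    return (math.log2(c_large) - math.log2(c_small)) / (
        math.log2(r_large) - math.log2(r_small)
    )


# Synthetic nearly periodic signal (hypothetical parameters).
signal = [math.sin(2 * math.pi * t / 20.37) for t in range(400)]
d2_estimate = slope_estimate(signal, m=2, tau=5, r_small=0.1, r_large=0.4)
```

For a nearly periodic signal such as this sine wave, a slope near 1 would be expected. In practice the estimate should be repeated for increasing m and accepted only once the slope saturates.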
Figures 2(a), (b), and (c) show the type 1, 2, and 3 vibratory patterns, respectively, in the kymographic data of larynx no. 16. At a subglottal pressure Ps of 8 cm H2O, the type 1 pattern clearly displays a regular, periodic glottal area. The type 2 pattern arises at Ps = 30 cm H2O, shows double opening, and alternates between small and large mucosal wave amplitudes. In comparison with the type 1 and 2 vibratory patterns, the type 3 pattern arises at Ps = 40 cm H2O and has no apparent periodicity. All 20 recorded larynges showed qualitatively similar DKG patterns. Therefore, because larynx no. 16 is a typical representation of the group, only the results of this larynx are given here.
Figures 3(a), (b), and (c) show the time series, derived from the detected glottal edges, of the three types of kymographic data of the left and right vocal folds. Figures 4(a) through (c) display the corresponding frequency spectra of these vibratory patterns. The time series xα(t) and the frequency spectra of the DKG images clearly distinguish the three different types of vibratory patterns. The type 1 vibration shows a periodic time series, and its frequency spectrum reveals a discrete fundamental frequency f0; the type 2 vibration shows alternating peak amplitudes in the time series and a subharmonic frequency component at f0/2 in the spectrum; lastly, the type 3 vibration displays an aperiodic time series and a broadband frequency spectrum.
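The alternating peak amplitudes that mark the type 2 time series can also be checked numerically. The sketch below is a simple time-domain heuristic and not a method used in this study. It collects local maxima and compares the mean of every other peak with the mean of the remaining peaks. The names `local_maxima` and `alternation_index` and the synthetic signals are hypothetical, and the index assumes a signal with one main peak per cycle and at least two peaks overall.

```python
import math


def local_maxima(x):
    """Values of samples that exceed the previous sample and are not below the next."""
    return [x[i] for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]


def alternation_index(peaks):
    """Relative difference between the mean of even-indexed and odd-indexed peaks.

    Close to 0 when all peaks are equal; grows as high and low peaks alternate.
    Assumes positive peak values and at least two peaks.
    """
    even, odd = peaks[0::2], peaks[1::2]
    mean_even = sum(even) / len(even)
    mean_odd = sum(odd) / len(odd)
    return abs(mean_even - mean_odd) / max(mean_even, mean_odd)


# Synthetic signals (hypothetical): a 16-sample cycle, with and without a
# component at half the fundamental frequency.
periodic = [math.sin(2 * math.pi * t / 16) for t in range(128)]
doubled = [
    math.sin(2 * math.pi * t / 16) + 0.4 * math.sin(2 * math.pi * t / 32)
    for t in range(128)
]
index_periodic = alternation_index(local_maxima(periodic))
index_doubled = alternation_index(local_maxima(doubled))
```

An index near zero is consistent with type 1 behavior, while a clearly nonzero index indicates period doubling. More complicated subharmonic structure, such as period tripling, would not be captured by this two-group comparison.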
Figures 5(a) and (b) show the correlation integral log2 Cα(r) vs. log2 r of the type 1 DKG data of the left and right vocal folds, respectively, where the curves from top to bottom correspond to the embedding dimension m = 1, 2, …, 10. The slopes of these curves increase with m and approach saturation, giving dimension estimates for the type 1 vibrations of the left and right vocal folds of 1.15 ± 0.01 and 1.12 ± 0.01, respectively. Figure 6 illustrates the estimated slopes versus m for the signals in Figure 3. Unlike white noise, whose dimension estimate does not converge, the type 2 vibrations of the left and right vocal folds gave finite estimated dimensions D2α of 1.98 ± 0.01 and 1.86 ± 0.04, respectively. The D2α values of the type 3 vibrations of the left and right vocal folds were 2.95 ± 0.01 and 3.17 ± 0.03, respectively. These data show that the correlation dimension increases from type 1 to type 3 signals.
Three types of vibratory patterns were recorded using DKG imaging techniques and analyzed using time series, frequency spectra, and nonlinear dynamics. As shown in figures 2(a), 3(a), and 4(a), the type 1 vibrations of the left and right vocal folds display a periodic glottal edge series and discrete frequency spectra. The nonlinear dynamic method effectively quantified the type 1 vibrations in DKG imaging, as shown in figure 5. The estimated dimensions of the type 1 vibrations in figure 6 were close to 1, suggesting that the left and right glottal edges are dominated by one-dimensional symmetric vibrations. The results of this study corroborate the findings of several previous studies using excised canine and in vivo human larynges, showing that type 1 vocal fold vibrations are dominated by the first few orders of harmonic components.5,7,18
Type 2 and type 3 vibrations have not received the same attention in the literature as type 1 signals because of the difficulty of quantifying these more complex vibratory patterns.15,17 For this reason, subharmonic vibrations displaying period doubling were quantified in this study using nonlinear dynamic techniques. As shown in figure 2(b), subharmonic vocal fold movement characterized by period doubling can be observed in the vibratory pattern of the type 2 signal. The signal produced a time series characterized by alternating high- and low-amplitude waves, while the frequency spectrum in figure 4(b) revealed a subharmonic frequency component. The dimension estimate of the type 2 pattern in figure 6 lies between 1 and 2, indicating that two independent variables, or degrees of freedom, are needed to describe its dynamics.
The type 3 pattern shown in figure 2(c) displays aperiodic and asymmetric vibrations without any clear pattern in either amplitude or frequency (figures 3(c) and 4(c)). For such a signal from DKG imaging, traditional perturbation techniques may not be applicable, as typical perturbation analyses cannot be applied to an aperiodic glottal area time series.15,16 It is sometimes possible to qualitatively distinguish the kymographic image of a type 2 vibratory pattern from that of a type 3 pattern simply by examining the images for repeating patterns, which would indicate a type 2 signal. However, such a distinction becomes more difficult to make as the subharmonics become more complicated (e.g., signals characterized by period tripling, quadrupling, quintupling, etc.). For signals such as these, nonlinear dynamic analysis techniques can provide valuable information about the nature of the vibratory pattern. As shown in figure 6, the D2α of the three signals increases with signal complexity: the type 3 vibration clearly displays a higher D2α than the type 2 signal, which, in turn, displays a higher D2α than the simple type 1 vibratory pattern. Such quantitative analysis methods are valuable in distinguishing type 2 and 3 vibratory patterns, complementing the qualitative distinction based on the DKG images.
The use of signal typing in DKG imaging may help determine which analysis techniques would be most appropriate. For example, if an image displays the regular, periodic motion of a type 1 pattern, traditional mucosal wave curve fitting would be applicable.4–7,11,12 However, for a signal displaying a more complex pattern, such as those characteristic of type 2 and type 3 vibrations, such analysis techniques may need to be improved; nonlinear dynamic analysis is one means of accomplishing this. Svec has shown that the recognition of irregularities in glottal activity based on kymographic images may allow one to diagnose certain diseases.14 Left-right asymmetries in mucosal wave amplitude and velocity associated with asymmetric vocal fold tension may indicate laryngeal paralysis.6,14 Similarly, if incomplete glottal closure and chaotic vocal fold vibrations are observed in the DKG pattern, a vocal fold mass lesion, such as a vocal nodule or polyp, may be suspected. Valid quantitative analyses of DKG images derived from signal typing have the potential to provide a valuable tool for the assessment of laryngeal pathology.
Acoustic analyses have been established as a reliable and accurate method to assign signal types to voices.15,17,21 However, acoustic signal typing does have its limitations, such as the mixing of information from the left and right vocal folds, in addition to problems associated with external factors, such as interference from vocal tract filtering and aerodynamic turbulent noise. DKG has shown utility in signal typing and has the potential to overcome these problems by directly visualizing the motion of both vocal folds. Through direct visualization, as in figure 2, DKG displays the patterns of the left and right vocal folds without mixing them, as they would be in an acoustic waveform. DKG is not vulnerable to the errors created by vocal tract filtering and aerodynamic turbulent noise, as it observes the vocal folds from an unobstructed view, while in acoustic analysis the sound must travel from the vocal folds, through the pharynx, and through the oral cavity before it can be recorded. Also, DKG and the visual data that it collects are not affected by the ambient noise or environmental factors that may influence acoustic data. However, DKG is not without its own limitations. Adjustments of the line-scan position used to create DKG images can potentially result in differences in the appearance of the resulting kymogram.5 Despite these limitations, Type 1 vocal fold vibratory patterns may correlate with Type 1 acoustic signals because nearly periodic vibratory patterns may lead to nearly periodic voice in most instances. However, the two signal types are measured in very different ways, and acoustic methods allow non-glottal factors such as the vocal tract and aerodynamic noise to affect the acoustic signal type. In some cases, DKG signal typing may show a different result than acoustic signal typing.
For example, in some breathy voices, the vocal folds may exhibit regular and periodic “Type 1” vocal fold vibratory patterns, but incomplete glottal closure may cause a broadband noise and aperiodic voice signal, which can be classified as a “Type 3” acoustic signal. Therefore, the use of traditional acoustic methods in conjunction with DKG may represent an improved method of signal typing, mitigating the individual problems associated with the two methods and allowing comparisons to be made to ensure accurate data extraction, resulting in a more complete picture of vocal fold vibrations.
Three types of vocal fold vibrations were directly observed and quantified based on DKG imaging in excised larynx experiments and time series, frequency spectra, and nonlinear dynamic analysis methods. These techniques effectively described the dynamics of these three types of DKG vibratory patterns. The nonlinear dynamic analysis technique of correlation dimension estimation effectively distinguished between type 2 and type 3 vibratory patterns, suggesting that such techniques may provide a solution in distinguishing these often ambiguous vocal fold vibratory patterns. DKG proved to be an accurate method for typing vocal fold vibratory patterns. Its use in conjunction with other previously established acoustic signal typing methods may provide a more complete representation of vocal fold vibrations and increase signal typing reliability.
The research was supported by NIH grant number 1-R01 DC05522 from the National Institute on Deafness and Other Communication Disorders.