DISE has good test-retest reliability, especially in its evaluation of the hypopharyngeal airway. The interpretation of ICC analogs, like that of Cohen's kappa values, is controversial, but one framework has been proposed by Landis and Koch.20 Although the coefficient point estimates and their confidence intervals span multiple categories of that framework, overall the level of agreement is moderate to substantial.
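To make the category language concrete, the following is a minimal sketch of how a coefficient and its confidence interval map onto the Landis and Koch bands. The function names and the numeric values in the usage note are illustrative assumptions and are not results from this study.

```python
# Landis and Koch agreement bands (upper bound of each band, inclusive).
# Values below 0 are labeled "poor".
_BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]
_ORDER = ["poor"] + [label for _, label in _BANDS]


def landis_koch_category(coefficient: float) -> str:
    """Return the Landis and Koch label for a kappa-like coefficient."""
    if not -1.0 <= coefficient <= 1.0:
        raise ValueError("coefficient must lie in [-1, 1]")
    if coefficient < 0.0:
        return "poor"
    for upper, label in _BANDS:
        if coefficient <= upper:
            return label
    return "almost perfect"  # unreachable given the range check above


def categories_spanned(ci_lower: float, ci_upper: float) -> list[str]:
    """List every band touched by a confidence interval, lowest first."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    start = _ORDER.index(landis_koch_category(ci_lower))
    stop = _ORDER.index(landis_koch_category(ci_upper))
    return _ORDER[start:stop + 1]
```

For a hypothetical estimate of 0.62 with a confidence interval of 0.45 to 0.79, `landis_koch_category(0.62)` returns "substantial" and `categories_spanned(0.45, 0.79)` returns both "moderate" and "substantial", which is the sense in which an interval can span multiple categories.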
Test-retest reliability compares results from two distinct tests, and there are four potential comparisons that describe the variation in findings. The lowest (in magnitude) of the four estimates of reliability presented here is generally that for the ICC Reviewer-Exam, comparing findings across reviewers and exams (one reviewer's evaluations of one exam to the other reviewer's evaluations of a distinct exam). Because this incorporates multiple levels of variation, it is not surprising that the ICC analog values were the lowest, although still largely in the range of moderate-substantial agreement. There were no systematic differences in the point estimates and confidence intervals among the other three measures of test-retest reliability. Future studies incorporating more reviewers, evaluations of the same exam, examinations of each patient, or more patients may generate more precise estimates of these aspects of test-retest reliability.
DISE offers a unique structure-based assessment of the airway, compared to other commonly-used evaluation techniques. We present a region- (Analyses I and II) and structure-based (Analysis III) method to serve two major purposes of OSA surgical upper airway evaluation: characterizing the pattern of airway obstruction and selecting among treatments. We believe that identifying the primary structure contributing to obstruction in each segment of the airway serves both purposes, and the test-retest reliability of this specific assessment was higher than that for the involvement of individual structures (including those with primary and secondary roles).
There is wide variation in the DISE classification schemes presented in the literature, and we attempted to balance completeness and simplicity to describe the variation in patterns of upper airway obstruction. The upper airway does not consist of two independent regions (palate and hypopharynx), each containing various structures that can contribute to airway obstruction in isolation; instead, these two regions and the various structures have dynamic interactions that are not understood completely. Any attempt to simplify these relationships will have important deficiencies, and we anticipate revisions to our method over time. In fact, the first modification to our original method was based on the idea that it was important to determine the primary structures contributing to airway obstruction in each region (as in Analysis III).
Because surgical procedures are ultimately directed at specific structures, DISE may improve procedure selection and outcomes. This is especially true for the hypopharyngeal airway, where the evaluation, and the resulting choice among treatment options, is often a critical factor in surgical decision making. The three structures that most commonly contribute to hypopharyngeal airway obstruction are the tongue, epiglottis, and lateral pharyngeal walls, and the results for Analysis III indicate that DISE can differentiate their contributions to airway obstruction with good test-retest reliability. The array of surgical and non-surgical treatment options for the hypopharyngeal airway may exert differential effects on these various structures. For example, the genioglossus advancement and tongue radiofrequency procedures likely produce greater changes in tongue position during sleep than in the lateral pharyngeal walls. The hyoid suspension may have less effect on tongue position but may alter the behavior of the epiglottis and/or lateral pharyngeal walls during sleep. Because there is a choice among these procedures for treatment of the hypopharynx, a diagnostic test may be most valuable if it determines not only whether hypopharyngeal obstruction is present but also which structures contribute most to that obstruction.
For surgical treatment of the palatal airway, DISE may not differentiate palate vs. velopharynx lateral pharyngeal wall obstruction as well as it differentiates among the hypopharyngeal structures, based on the lower point estimates and wider confidence intervals. The implications are unclear. The most common surgical treatment for palatal obstruction in previously untreated OSA patients is uvulopalatopharyngoplasty, with tonsillectomy in most patients without previous tonsillectomy. Because a similar surgical approach is used for patients regardless of whether the soft palate or velopharynx lateral pharyngeal walls contribute more to obstruction, the question of whether a patient has palate-level obstruction or not (as in Analysis I) may be more important than determining which specific structures contribute to collapse (Analysis III). Because almost all patients in this study demonstrated palatal obstruction, the confidence intervals for Analysis I for the palate were wide, suggesting that the test-retest reliability cannot be determined precisely from this sample. Differentiating palate vs. velopharynx-level lateral pharyngeal wall obstruction based on DISE (Analysis III) appears more challenging. Again, the importance of this distinction is unclear; with the adoption of a wider variety of first-line palate procedures, this may prove more important.
This study is not without limitations. First, the confidence intervals for many ICC analog estimates were somewhat wide. We believe that the pattern of the estimates and confidence intervals is more important than any single result and that the pattern suggests that the test-retest reliability of DISE is good (moderate-substantial according to one framework). However, larger studies could encompass a broader population of OSA patients and generate estimates with narrower confidence intervals.
DISE as a diagnostic procedure has important logistical drawbacks. There are costs and risks (allergic reaction and airway obstruction) that must be balanced against the benefits of the procedure, and ultimately specific subgroups of patients may benefit most from the procedure.
Although DISE has demonstrated validity compared to a gold standard of polysomnography,10-13 the ideal fiberoptic evaluation of the airway would occur with natural sleep. Previous researchers have shown that this is cumbersome and problematic, in part due to activation of airway reflexes with instrumentation. DISE requires drug-induced sleep, and the differences from natural sleep have not been elucidated completely. Because there is likely heterogeneity in the anatomical factors that produce airway obstruction in OSA, it is reassuring that patients in this study demonstrated a diversity of obstruction patterns during DISE. Further research can compare upper airway mechanics and physiology during drug-induced and natural sleep.
The final limitation of our study is that both reviewers are experienced sleep surgeons. We examined four types of test-retest reliability and found that the correlation for ratings by two different reviewers was similar to the correlation for different evaluations and exams. The generalizability of the findings can be explored with larger studies that include more reviewers.