We present three methods of performing pattern recognition on spatiotemporal plots produced by pharyngeal high-resolution manometry (HRM).
Classification models, including the artificial neural networks (ANNs) multilayer perceptron (MLP) and learning vector quantization (LVQ), as well as support vector machines (SVM), were evaluated for their ability to identify disordered swallowing. Data were collected from twelve normal and thirteen disordered subjects swallowing 5 ml water boluses. Following extraction of relevant parameters, a subset of the data was used to train the models and the remaining swallows were then independently classified by the networks.
All methods produced high average classification accuracies, with MLP, LVQ, and SVM achieving accuracies of 96.44%, 91.03%, and 85.39%, respectively. When evaluating the individual contributions of each parameter and groups of parameters to the classification accuracy, parameters pertaining to the upper esophageal sphincter were most valuable.
Classification models show high accuracy in segregating HRM data sets and represent one method of facilitating application of HRM to the clinical setting by eliminating the time required for some aspects of data extraction and interpretation.
The pharyngeal swallow is a complex physiological event which requires muscle contraction and consequent pressure generation to move a bolus from the mouth to the esophagus (Kim et al., 1997; McConnel 1988; Cook 1991). Accurate quantification of these rapidly changing pressures requires high spatial and temporal resolution. High-resolution manometry (HRM) represents a promising clinical and research tool which is capable of capturing the detailed pressure activity during the pharyngeal swallow.
Our version of HRM (ManoScan360 High Resolution Manometry System, Sierra Scientific Instruments, Los Angeles, CA) uses 36 circumferential pressure sensors which can measure rapidly changing pressures in asymmetric structures such as the pharynx (Fox and Bredenoord, 2008). Though informative and potentially clinically valuable, it has yet to be applied routinely to the assessment of dysphagia. One reason may be difficulty extracting and interpreting the large amounts of data present in the three-dimensional spatiotemporal plot generated by HRM. An algorithm for efficient, automated interpretation of these data based on pattern recognition techniques may be valuable and facilitate increased clinical use.
Classification models, including artificial neural networks (ANNs) and support vector machines, are powerful mathematical models which can classify data into groups according to nonlinear statistical analysis (Cross et al., 1995; Baxt 1995; Santos et al., 2006). Further, ANNs can handle extremely large data sets. ANNs have been used to analyze voice and swallow events, differentiating between normal and disordered events, as well as distinguishing among different types of disorders (Cross et al., 1995; Baxt 1995; Santos et al., 2006). Acoustic analysis of pathological voice production with ANNs achieved a 93.5% success rate in the classification of unknown voice samples as normal or pathological (Boyanov and Hadjitodorov, 1997). Recently, pattern recognition of acoustic data has been used to differentiate between patients with muscle tension dysphonia and adductor spasmodic dysphonia (Schlotthauer et al., 2010). ANNs have also been used to differentiate between normal and dysphagic subjects based on swallowing acoustics (Lazareck and Moussavi, 2004). Additionally, patients have been ruled in for gastroesophageal reflux with 100% accuracy (Pace et al., 2005).
Germane to the current study, patients have been classified according to their type of esophageal dysphagia (esophageal dysmotility) based on manometric measurements with an 8 sensor Dentsleeve manometric catheter, achieving a classification accuracy of 80% (Santos et al., 2006). Though previously applied to traditional esophageal manometry, ANNs and other classification models have not been used with HRM of the pharynx. The number of data points sampled and the potential number of variables extracted increase dramatically when moving from traditional to high-resolution manometry and from measurements in a relatively simple structure, the esophagus, to a complex structure, the pharynx. As such, HRM is well-suited to analysis by ANNs and other classification models.
As a first step in this process, we determined whether pattern recognition techniques could correctly classify a swallow as normal or abnormal. We analyzed data from normal and dysphagic subjects and extracted feature vectors containing relevant parameters such as maximum pressures and timing events. The feature vectors formed a training set, which was used as the input to train several types of classification models, including a multilayer perceptron, support vector machine, and self-organizing map with learning vector quantization. These models utilize machine learning algorithms to classify swallows as normal or abnormal. Parameters were tuned to achieve a higher correct classification rate, and the components of the feature vector were examined to consider their individual contribution to classification. Thus, the purpose of this study was to determine which classification approach yielded the most accurate classification of normal versus abnormal swallowing pressure patterns, as well as to determine the relative importance of different feature sets in these classifications.
A solid-state high resolution manometer was used for all data collection (ManoScan360 High Resolution Manometry System, Sierra Scientific Instruments, Los Angeles, CA). The manometric catheter has an outer diameter of 4.1 mm and 36 circumferential pressure sensors spaced 1 cm apart. Each sensor spans 2.5 mm and receives input from 12 circumferential sectors. These inputs are averaged and a mean pressure is recorded as the pressure detected by that individual sensor. The system is calibrated to record pressures between −20 and 600 mmHg with fidelity of 2 mmHg. Data were collected at a sampling rate of 50 hertz (Hz) (ManoScan Data Acquisition, Sierra Scientific Instruments). Prior to calibration, the catheter was covered with a protective sheath to preserve sterility without the need to sterilize the catheter between uses (ManoShield, Sierra Scientific Instruments). The catheter was calibrated before each participant according to manufacturer specifications.
Twenty-five subjects participated in this study with the approval of the Institutional Review Board of the University of Wisconsin-Madison. Twelve subjects were without swallowing, neurological, or gastrointestinal disorders, while thirteen had a diagnosis of a swallowing disorder. All subjects in the disordered group reported at least one symptom of dysphagia: diet change, food sticking, cough with eating, or globus sensation. Subjects also displayed abnormalities on either fiberoptic endoscopic evaluation of swallowing (FEES) or modified barium swallow study (MBSS), as determined by their medical history. Specific clinical characteristics of the disordered subjects are presented in table 1. Subjects displayed significant variation in etiology and manifestation of dysphagia. Including a diverse subject group allowed us to evaluate the robustness of our analysis and also reflects the wide range of symptoms with which patients present to the otolaryngologist or speech-language pathologist. Participants were instructed not to eat for four hours and not to drink liquids for two hours prior to testing to avoid any potential confounding effect of satiety.
Topical 2% viscous lidocaine was applied to the nasal passages with a cotton swab and participants gargled a solution of 4% lidocaine (1 to 2 cc) for several seconds. The manometric catheter was lubricated with 2% viscous lidocaine to ease passage of the catheter through the pharynx. Once the catheter was positioned within the pharynx, participants rested for 5–10 minutes to adjust to the catheter prior to performing the experimental swallows.
For the normal subjects, a 5 ml water bolus was swallowed five times while the subject was upright with the head in the neutral position. Each water bolus was delivered to the oral cavity via syringe. Four random swallows from each normal subject were included in the analysis to ensure approximately equal numbers of normal and disordered swallows were input into the ANNs. Disordered participants swallowed 5 ml boluses between one and five times. Forty-eight swallows were analyzed for normal subjects and forty-one swallows were analyzed for disordered subjects. The number of samples in each class of a pattern recognition problem should be on the order of five times the number of features (Jain and Chandrasekaran, 1982). Our classes contain roughly this number for the full feature set, and meet or exceed it for the reduced feature sets.
Pressure and timing data were extracted using a customized MATLAB program (The MathWorks, Inc., Natick, MA) which locates peak pressures in areas of interest [velopharynx, region of the tongue base/posterior pharyngeal wall, and upper esophageal sphincter (UES)] and then calculates relevant parameters based on those points (Mielens et al., 2011). The basic workflow is automated, but the user may override program suggestions in cases of misidentification and manually select the correct manometric sensors and temporal location of the areas of interest.
Regions of interest were defined manometrically as in McCulloch et al. (2010). The velopharynx is the region of swallow-related pressure change just proximal to the area of continuous nasal cavity quiescence and extending two centimeters distally. The tongue base is the area of swallow related pressure change with a high pressure zone approximately midway between the nasopharynx and UES, with its epicenter at the high pressure point and extending two centimeters proximal and distal to that point. The UES is the midpoint of stable high pressure just proximal (rostral) to the baseline low esophageal pressure zone, extending to a point of low esophageal pressure distally and low baseline pharyngeal pressure proximally. During swallowing, the UES is mobile along the catheter, moving rostrally as much as 4 cm. We account for this movement in our analysis by treating the UES as a range of sensors, and selecting the appropriate sensor for a given time when considering specific phases of the swallow.
Data were extracted automatically as in Hoffman et al. (2010). An example of the automated analysis algorithm screen is shown in figures 1a and 1b. To locate the regions of interest, the program first locates the peak pressure values on each sensor channel. Once the range is determined, the program identifies which peaks best represent the velopharynx and tongue base. This determination is made on the profile of the peaks present within the range of interest. The velopharynx is detected by comparing the most proximal (rostral) peaks of the range, as the peaks increase continually until maximal velopharyngeal pressure is reached. After the sensor containing the velopharyngeal pressure max is identified, the peaks of the sensors immediately caudal to the maximum continually decrease to a local minimum. The region of the tongue base/posterior pharyngeal wall is then detected by comparing the sensors immediately below this local minimum, which increase until another local maximum is reached, the maximum tongue base pressure. The location of the UES is determined by computing the average resting pressure of each sensor and selecting the sensor with the highest value. Additional pressure maximums before the opening and after the closing of the UES are also of interest. To locate these maximums, allowing for the inherent movement of the UES during swallowing, the program considers up to three sensors immediately rostral to the detected UES sensor. For these sensors there are two peaks corresponding to the pre- and post-swallow UES pressure maximums on that channel, and the highest among the candidate peaks are chosen as the true pre- and post-swallow UES pressure maximums. Minimum UES pressure is also calculated by finding the point of minimum pressure between the detected pre- and post-swallow UES pressure maximums. We consider sensor channels immediately rostral to the UES resting position in order to account for movement of the UES during swallowing.
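The region-finding logic described above can be sketched in a few lines of Python. This is only an illustration of the rise-fall-rise walk along the per-sensor peak profile, not the customized MATLAB program used in the study. The function name `locate_regions`, the list layout (index 0 is the most rostral sensor), and the assumption that the range of interest runs from the first sensor to just above the UES are all hypothetical.

```python
def locate_regions(peaks, resting):
    """Return sensor indices (velopharynx, tongue_base, ues).

    peaks[i]   -- peak swallow pressure on sensor i, ordered rostral to caudal
    resting[i] -- average resting pressure on sensor i
    Assumes the range of interest starts at sensor 0 and ends above the UES.
    """
    # UES: the sensor with the highest average resting pressure.
    ues = max(range(len(resting)), key=lambda i: resting[i])

    # Velopharynx: peaks increase continually until the maximum is reached.
    vp = 0
    while vp + 1 < ues and peaks[vp + 1] > peaks[vp]:
        vp += 1

    # Caudal to that maximum, peaks decrease to a local minimum.
    trough = vp
    while trough + 1 < ues and peaks[trough + 1] < peaks[trough]:
        trough += 1

    # Tongue base: peaks below the minimum increase to another local maximum.
    tb = trough
    while tb + 1 < ues and peaks[tb + 1] > peaks[tb]:
        tb += 1

    return vp, tb, ues


# Illustrative values only (mmHg), not data from the study.
example_peaks = [10, 60, 150, 90, 40, 80, 130, 70, 30, 20]
example_resting = [0, 2, 3, 2, 1, 2, 3, 5, 60, 8]
print(locate_regions(example_peaks, example_resting))
```

A real implementation would also need to tolerate noisy, non-monotonic peak profiles, which is presumably why the program allows the user to override its suggestions.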
Timing information is calculated by measuring the time elapsed between pressure maximums, as well as the onset and offset of elevated pressure on the relevant sensor channel. Parameters including durations and the rate of pressure increase are determined based on these onset and offset points. UES activity time is calculated similarly, by calculating the difference between the post-swallow UES pressure peak and the point at which the UES pressure begins to fall. Total swallow duration is defined as the time lapse between onset of velopharyngeal pressure and the post-swallow UES pressure peak.
While maximum pressure can provide valuable information on swallowing physiology which can easily be compared to previous manometric investigations, it does not provide a complete picture of pharyngeal pressure events. Measuring the total pressure created in a specific region offers more information and, when combined with durative data, reveals more about the shape of the pressure curve and thus gives a better estimate of the pressure affecting bolus propulsion. Integrals are calculated for the area beneath the velopharynx and tongue base pressure curves, as well as above the UES minimum with the UES resting pressure as an upper limit. Temporal bounds in all cases are the onset and offset of pressure elevation or depression determined previously.
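As a rough illustration of these integrals, the sketch below applies the trapezoidal rule to a sampled pressure trace between given onset and offset samples. The 50 Hz rate comes from the methods; the function names, the choice of the trapezoidal rule, and the reading of the UES integral as the area between the resting pressure and the pressure curve are assumptions made for this example.

```python
SAMPLE_RATE_HZ = 50  # sampling rate reported for the manometer


def pressure_integral(pressures, onset, offset, rate_hz=SAMPLE_RATE_HZ):
    """Trapezoidal area (mmHg * s) under pressures[onset..offset], inclusive."""
    dt = 1.0 / rate_hz
    segment = pressures[onset:offset + 1]
    return sum((a + b) / 2.0 * dt for a, b in zip(segment, segment[1:]))


def ues_integral(pressures, onset, offset, resting, rate_hz=SAMPLE_RATE_HZ):
    """Area between the resting pressure (upper limit) and the UES curve.

    Samples above the resting pressure contribute nothing (assumed reading).
    """
    deficit = [max(resting - p, 0.0) for p in pressures]
    return pressure_integral(deficit, onset, offset, rate_hz)


# Illustrative traces only (mmHg), not data from the study.
tongue_base_trace = [0, 50, 100, 50, 0]
ues_trace = [60, 30, 0, 30, 60]
print(pressure_integral(tongue_base_trace, 0, 4))
print(ues_integral(ues_trace, 0, 4, resting=60))
```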
The pharyngeal swallow can be thought of as a traveling pressure wave, with peak pressure traveling caudally and ending at the UES. We can calculate the velocity of this pressure wave by taking the distance from the velopharyngeal pressure peak to the maximum post-swallow UES pressure peak and dividing by the time lapse between these two points.
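The velocity calculation reduces to a distance divided by a time lapse. A minimal sketch follows, assuming peaks are stored as (sensor index, sample index) pairs; the 1 cm sensor spacing and 50 Hz sampling rate are taken from the equipment description, while the function and argument names are hypothetical.

```python
SENSOR_SPACING_CM = 1.0  # sensors are spaced 1 cm apart
SAMPLE_RATE_HZ = 50      # samples per second


def wave_velocity(vp_sensor, vp_sample, ues_sensor, ues_sample):
    """Velocity (cm/s) of the pressure wave from the velopharyngeal peak
    to the post-swallow UES peak."""
    distance_cm = (ues_sensor - vp_sensor) * SENSOR_SPACING_CM
    elapsed_s = (ues_sample - vp_sample) / SAMPLE_RATE_HZ
    if elapsed_s <= 0:
        raise ValueError("UES peak must occur after the velopharyngeal peak")
    return distance_cm / elapsed_s


# Illustrative: peaks 8 sensors (8 cm) and 40 samples (0.8 s) apart.
print(wave_velocity(vp_sensor=2, vp_sample=100, ues_sensor=10, ues_sample=140))
```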
In total, 89 swallows were analyzed and the derived feature sets were used as a basis for determining models of normal and disordered swallowing. By attaching the known status of a swallow to its feature vector, machine learning techniques can be applied with the goal of modeling the relationship between the input features and the pathological status of a given swallow. These techniques share the common procedure of first being presented with the known data, going through a 'training' stage, and finally being presented with new data during a 'test' stage. The training data and testing data are kept separate in order to better gauge the generalizing ability of the classification.
Data were normalized and each variable in the data set ranged in value from −1 to 1, with a mean of 0 and a standard deviation of 1. Normalizing the data improves both the efficiency and accuracy of the algorithms, especially when using the Levenberg-Marquardt algorithm in the multi-layer perceptron technique (Saarinen et al., 1993). Additionally, principal component analysis was used to reduce dimensionality to improve generalization. The feature set was subjected to two levels of reduction, which removed features that minimally contributed to overall variation. This was done because extra features which do not significantly contribute to classification can be detrimental to correct classification rates. The two levels of reduction were compared to the full, unreduced feature set.
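The zero-mean, unit-standard-deviation part of this normalization can be sketched as a per-feature z-score. The helper below is a hypothetical illustration using only the Python standard library; it is not the study's code, and it does not cover the principal component analysis step.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Scale each feature (column) to mean 0 and standard deviation 1.

    rows -- list of feature vectors, one per swallow
    """
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # Guard against constant features, whose standard deviation is 0.
    stdevs = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)]
            for row in rows]


# Illustrative feature vectors: [peak pressure (mmHg), duration (s)].
raw = [[120.0, 0.40], [150.0, 0.55], [90.0, 0.70]]
scaled = standardize(raw)
print(scaled)
```

Standardizing matters here because pressures in mmHg and durations in seconds differ by orders of magnitude, and unscaled features would otherwise dominate distance-based methods.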
For training purposes, a five-fold cross validation was performed. As random influences may occur during the partitioning process, a more stable performance measurement was obtained by repeating each classification task twenty times and averaging over the individual results. A standard multi-layer perceptron (figure 2a) was created using sigmoidal activation functions in one hidden layer, and the number of nodes in the hidden layer was varied in increments of 5 from N=5 to N=60 to attain better performance. The Levenberg-Marquardt learning algorithm was used. The goal of the learning algorithm in this model is to modify the weights associated with the connections between the nodes (represented by lines in figure 2a) such that an input vector will produce the specified desired output vector, essentially mapping the input space onto the output classes of a normal or disordered swallow.
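The repeated cross-validation procedure can be sketched as follows. The fold count and number of repeats are passed as parameters; `evaluate` is a hypothetical callback that trains a model on the training indices and returns its misclassification rate on the held-out indices. The study's exact partitioning scheme is not described, so the shuffled split used here is an assumption.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Yield (train, test) index lists for one shuffled k-fold partition."""
    order = list(range(n_samples))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def repeated_cv_error(n_samples, k, repeats, evaluate, seed=0):
    """Average the per-fold error over `repeats` random partitions."""
    rng = random.Random(seed)
    errors = []
    for _ in range(repeats):
        for train, test in k_fold_indices(n_samples, k, rng):
            errors.append(evaluate(train, test))
    return sum(errors) / len(errors)


# 89 swallows, five folds, twenty repeats; a dummy evaluator stands in
# for actually training and testing a classifier.
print(repeated_cv_error(89, 5, 20, lambda train, test: 0.25))
```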
The second approach used was Kohonen's learning vector quantization (figure 2b) (Kohonen 1988). Learning vector quantization is a competitive learning technique, where the goal is to move 'codebook vectors' into positions where they accurately represent the structure of the input space. Codebook vectors are hypothetical input vectors which attempt to represent the feature space by locating themselves in regions containing many swallows. Individual swallows can then be classified by determining the codebook vector nearest to each, making learning vector quantization similar to a nearest neighbor clustering method. We modified the number of codebook vectors to reduce misclassifications. Noting that with large codebook sizes comes a high degree of overfitting and poor generalization, we kept the codebook size low enough to prevent each vector from simply associating with a particular subject. This allowed us to keep good generalization with new subjects.
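To make the codebook idea concrete, here is a minimal sketch of the basic LVQ1 update rule: the nearest codebook vector is pulled toward a training swallow when their labels agree and pushed away when they differ. The text does not state which LVQ variant, learning rate, or initialization was used, so those choices and all names below are assumptions.

```python
def nearest(codebook, x):
    """Index of the codebook entry whose vector is closest to x."""
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry[0], x))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def lvq1_train(codebook, samples, labels, rate=0.1, epochs=20):
    """Train in place. codebook is a list of [vector, label] pairs."""
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            i = nearest(codebook, x)
            vec, label = codebook[i]
            # Attract on a label match, repel on a mismatch.
            sign = 1.0 if label == y else -1.0
            codebook[i][0] = [v + sign * rate * (a - v)
                              for v, a in zip(vec, x)]
    return codebook


def lvq_classify(codebook, x):
    """Label of the codebook vector nearest to x."""
    return codebook[nearest(codebook, x)][1]


# Illustrative two-feature data, not taken from the study.
book = [[[0.2, 0.2], "normal"], [[0.8, 0.8], "disordered"]]
train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]]
train_y = ["normal", "normal", "disordered", "disordered"]
lvq1_train(book, train_x, train_y)
print(lvq_classify(book, [0.05, 0.1]), lvq_classify(book, [0.95, 0.9]))
```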
The third and final approach used was support vector machines (figure 2c). Support vector machines are traditionally a linear classification technique, where a hyperplane with a maximum-margin of separation between the two classes (normal and disordered) is constructed. Classification is then a simple matter of projecting a new swallow into this feature space, and determining the side of the hyperplane to which it falls. We use a non-linear approach (Boser et al., 1992) known as the kernel trick, whereby the feature space undergoes a non-linear transformation, and the hyperplane is then fit to this higher dimensional data. In particular, we use a radial basis function with a variable gamma parameter as our kernel function, which provides the transformation from our feature space into the higher dimensional space used for classification.
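The radial basis function kernel and the resulting decision rule can be written compactly. The sketch below shows only how a trained machine would classify a new swallow; fitting the support vectors, coefficients, and bias is not shown, and the numbers in the example are hypothetical placeholders rather than values from the study.

```python
import math


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, coeffs, bias, gamma):
    """Return +1 or -1 from sign(sum_i coeffs[i] * K(sv_i, x) + bias).

    coeffs[i] is the product of the dual weight and the class label
    (+1 or -1) of support vector i.
    """
    score = bias + sum(c * rbf_kernel(sv, x, gamma)
                       for sv, c in zip(support_vectors, coeffs))
    return 1 if score >= 0 else -1


# Hypothetical trained model with one support vector per class.
svs = [[0.0, 0.0], [2.0, 2.0]]
weights = [1.0, -1.0]
print(svm_decision([0.1, 0.1], svs, weights, bias=0.0, gamma=1.0))
print(svm_decision([1.9, 1.9], svs, weights, bias=0.0, gamma=1.0))
```

A larger gamma makes each support vector's influence more local, which is why gamma is treated as a tunable parameter.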
Separate from the variation of models, the feature set was selectively reduced in an attempt to discover the classification ability of various subsets of the features. These subsets included the categorical elimination of pressures, integrals, timing parameters, and the three manometrically defined regions of interest. In addition to their inclusion in these subsets, all parameters were used on their own as a singular input.
To determine the potential of each classification model as a diagnostic tool, receiver operating characteristic (ROC) analysis was performed and area under the curve (AUC) was determined.
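For reference, the AUC can be computed directly from classifier scores as the probability that a randomly chosen disordered swallow receives a higher score than a randomly chosen normal one, with ties counted as one half. The sketch below uses this pairwise formulation; the study does not state how its AUC values were computed, so this is one standard method, and the function name and label coding are assumptions.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores -- classifier outputs, higher meaning more likely disordered
    labels -- 1 for disordered (positive), 0 for normal (negative)
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative scores only.
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```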
A multilayer perceptron using the Levenberg-Marquardt training algorithm provided the lowest average error rate (3.56% across architectures with varying numbers of hidden nodes) and also performed well with a modest number of hidden nodes (2.58% N=25, where N = number of hidden nodes). Among the learning vector quantization models, codebook size (the number of hypothesized classes) had little impact on misclassification rate (average misclassification rate of 8.97%). The support vector machine models performed the worst, with an average misclassification rate of 14.61%.
Area under the receiver operating characteristic (ROC) curve for multilayer perceptron, learning vector quantization, and support vector machine were 0.95, 0.94, and 0.88, respectively (figures 4a, 4b, 4c).
Principal component analysis provided no significant benefit, and reduced performance in several instances, so the full featured data set was selected as optimal. Beyond PCA, feature subsets were also compared: eliminating features associated with the UES resulted in the greatest increase in misclassification, while eliminating the velopharyngeal measurements increased misclassification only slightly. Concerning individual parameters (table 5), the pressure maximum prior to UES opening performed the best, achieving a misclassification rate approaching that of the support vector machines (average misclassification 20.68%). The UES integral performed the worst, barely improving on randomness (average misclassification 45.6%). This analysis was done to identify individual features which contribute most strongly to correct classification. We found that the features associated with the UES were most crucial to achieving correct classification.
Subject health status (normal or disordered) was determined prior to the manometric experiment and accomplished using traditional assessments such as history and physical exam, modified barium swallow study, or fiberoptic endoscopic evaluation of swallowing. We achieved greater than 95% classification accuracy and agreement with health status determined using the aforementioned metrics. Therefore, different results were not obtained between traditional assessment tools and HRM with topical anesthetic. Also, though topical anesthesia was used in this study, it may not have significantly altered swallowing physiology with regard to our measurements (McCulloch et al., 2010). Omitting topical anesthetic in pilot experiments led to increased gagging and resting UES pressure, confounding data collection. As swallowing is a sensorimotor phenomenon, impairing pharyngeal afferent nerves could potentially alter normal physiology. However, mechanoreceptors deep to the mucosa are largely responsible for modulating swallow physiology (Ali et al., 1997) and these fibers were likely unaffected. Additionally, the oral mucosa was minimally affected, and afferent information from this area is also important to regulating swallow function. We believe that the benefit of increased subject comfort at the expense of short-term pain/temperature afferent alteration improved the reliability of our data.
Three classification model techniques were studied to determine effective discrimination between normal and disordered swallowing based on data extracted from HRM spatiotemporal plots. The ability to distinguish normal from disordered swallows is the first step in distinguishing among different specific disorders, which is the goal of this type of analysis in a clinical setting. If normal subjects present with significant variation, then the likelihood of a classifier distinguishing among disorders is low. The three classification techniques used in this study were multi-layer perceptron (MLP), learning vector quantization (LVQ), and support vector machine (SVM). The multi-layer perceptron technique performed best, achieving an average classification accuracy of 96.44%. Even the support vector machines, the lowest-performing technique, classified normal versus disordered swallows with 85.39% accuracy, which is also considered a high success rate. These results suggest that these techniques, particularly the ANNs, can effectively distinguish normal from abnormal swallowing, which could be valuable clinically.
Our efforts to improve performance by modifying the architecture of the ANN, such as increasing the number of hidden nodes and codebook sizes, had a minimal effect in most cases. This is likely a consequence of implementing measures to prevent overfitting in large networks. Increasing the number of data points available by analyzing more swallows from a larger subject pool could potentially prevent this overfitting and allow these larger networks to run longer, potentially improving accuracy and generalization to new data (i.e. different types of dysphagia).
Differences between the three techniques could point to a lack of well defined clustering in the data or could be the result of combining dysphagic subjects into a single group rather than separating them by disorder. With both learning vector quantization and support vector machines, the winner-take-all nature of the learning algorithm means that correct classification depends to a great degree on the identification of clusters associated with particular output classes. The multilayer perceptron, though clearly improved by clustered data, is not as reliant on that condition since it lacks both the competitive nature of learning vector quantization and the direct partition construction, and inherent clustering, utilized by support vector machines.
Performing a feature reduction analysis allows us to determine which parameters are most frequently affected by dysphagia. Variations in these parameters may be sensitive indicators of swallowing abnormalities. Using maximum pre-opening UES pressure as the only parameter of interest, a classification accuracy of 79.32% was obtained. The accuracy obtained using this one parameter approached that using the entire feature set, demonstrating the sensitivity of the UES to disruptions in swallowing physiology. Removing all UES-related parameters from the feature set resulted in the greatest decrease in classification accuracy (table 4), in part due to the sensitivity of the maximum pre-opening UES pressure. As the UES was the region most sensitive to physiological abnormalities, we expected the UES integral to be a powerful parameter in distinguishing normal from disordered swallows; however, classification accuracy was only 55.40%. At our modest sample size in this preliminary stage, this may be due to some subjects exhibiting hypertonicity and some subjects exhibiting hypotonicity. Additionally, our method of calculating the UES integral may have contributed to this: local pressure maximums occur far above resting UES pressure, but the integral we measured was the area above minimum UES pressure and below resting UES pressure. Extending the area of interest to include the area bounded by local pressure maximums, and thus integrating piecewise over multiple sensor channels, may increase the utility of the parameter by more accurately accounting for the movement of the UES during swallowing. Interestingly, removing velopharyngeal pressure from the feature set did not greatly affect classification accuracy (table 4), resulting in a decrease of only 1-2% depending on the classification method.
Based on the data presented in this study, UES abnormalities are likely the most common errant pressure feature associated with dysphagia, at least for our subject pool. As the UES requires fairly complex and appropriately timed sphincteric action, this is not surprising. Bolus gravitational force may be sufficient to compensate for a dysfunctional velopharynx or tongue base and elevated velopharyngeal pressure may adjust for low tongue base pressure. However, UES opening to facilitate bolus passage to the esophagus and closing to prevent regurgitation and reflux are critical aspects of a functional swallow.
Even at this preliminary stage, the pattern recognition techniques employed here appear to be clinically useful in distinguishing normal from abnormal swallowing. We recognize that the ultimate goal of a swallowing evaluation is to define the underlying physiologic abnormality that impairs successful swallow function. However, an immediate report on whether a subject’s swallow is normal or disordered could aid clinicians in patient screening and assessment based on pharyngeal HRM. The next step is to define manometric abnormalities according to dysphagia characteristics, which would be further aided by coupling HRM with videofluoroscopy. Although this study focused on differentiating normal and disordered swallows, the many features generated by our analysis of HRM data could prove able to distinguish between different types of dysphagia. The high accuracy in this preliminary study provides evidence that HRM has potential as an alternative clinical assessment tool, especially when coupled with ANN techniques.
Three classification models are presented which can be used effectively to distinguish normal from disordered swallowing based on pharyngeal high-resolution manometry. Feature reduction analysis demonstrated that the upper esophageal sphincter is a critical region for distinguishing normal versus disordered swallows in our data set. Continuing to modify the pattern recognition methods along with the use of additional disorder-specific data will refine the utility of these techniques. Even at this preliminary stage, high classification rates were achieved. As high-resolution manometry provides robust information on swallow events, applying pattern recognition methods will be useful in facilitating clinical application and enhancing assessment utility.
This research was supported by NIH grant numbers R01 DC008850 and R21 DC011130A from the National Institute on Deafness and Other Communication Disorders.
Conflicts of interest: None.