Although rheumatic heart disease is now rare in high income countries, it remains a major burden in low and middle income countries. In these regions, physicians and expert sonographers are scarce, and screening campaigns are usually performed by nomadic caregivers who can only recognise patients in an advanced phase of heart failure, with high economic and social costs. Therefore, great interest exists in developing a simple, low-cost procedure for screening valvular heart disease. With the development of computer science, the cardiac sound signal can be analysed automatically. More precisely, a panel of features characterising the acoustic signal is extracted and sent to decision-making software that provides the final diagnosis. Although no system is currently available on the market, the rapid evolution of these technologies has recently led to the activation of clinical trials. The aim of this note is to review the state of advancement of this technology (trends in feature selection and automatic diagnostic strategies), the data available on its performance in the clinical setting, and the obstacles that still need to be overcome before automated systems can be clinically and commercially viable.
Over the past few decades, the prevalence and natural history of valvular heart disease (VHD) have changed dramatically in developed nations, where rheumatic heart disease (RHD) is now uncommon and residual VHD is mostly degenerative.1 Conversely, in low and middle income countries (LMIC), RHD remains a major burden2 and VHD causes most of the cardiovascular morbidity and mortality in young people.3 In Nepal, RHD is the second most common cause of hospital admission (21%), preceded only by coronary artery disease (43%).4 In China, where the mortality rate from stroke is three times that from coronary heart disease,5–7 one out of six cases of atrial fibrillation is due to RHD.8–10 The global prevalence of VHD may even increase in the near future because of increasing population age and life expectancy in LMIC. The diagnosis and risk stratification of patients with VHD are nowadays mainly guided by echocardiography. Although low-cost portable instruments are now available, skilled physicians and expert sonographers are rare in LMIC. Furthermore, universal healthcare is often not available in LMIC, where skilled personnel often work in the private sector. Therefore, great interest exists in developing a simple, low-cost diagnostic procedure for population screening.
The function of heart valves is conventionally explored during physical examination with the stethoscope, and a diagnosis can be made accordingly. This approach has limitations because it requires training11 and because human perception of the initial qualitative alterations of heart sounds might be limited. Electronic stethoscopes can record the sound of the patient’s heart in digital format with high quality, including frequencies not audible to the human ear. Most importantly, with the development of computer science, the cardiac sound signal can now be analysed automatically. Therefore, in recent years, special attention has been paid to creating a machine that can give a trusted answer to the question: ‘does the patient have a pathological heart murmur?’ Although no system is currently available on the market, the rapid evolution of these technologies has recently led to the activation of clinical trials. Those systems follow a two-step process: first, the cardiac sound is analysed and a panel of features characterising the acoustic signal is extracted; second, the extracted features are sent to decision-making software, which provides the final diagnosis.
The aim of this note is to review various approaches currently used to extract and select features and to reach an automatic diagnosis.
The first step in the analysis of the acoustic signal is the extraction of parameters which will be used by the decisional system to reach the diagnosis. Parameters may be extracted in both the time and the frequency domain. Most researchers have segmented the cardiac cycle in the time domain. When the first (S1) and second (S2) heart sounds are identified, systole and diastole can be recognised; the presence and time location of murmurs (systolic and diastolic) can thus be used as a diagnostic element for cardiac abnormalities.12–15 Time domain analysis has proved valid and simple for distinguishing normal from abnormal heart sounds, but its value for murmur characterisation is limited. Murmurs are indeed characterised by parameters which can be extracted in the frequency domain. The range of frequencies of diastolic murmurs is usually larger than that of systolic ones. Therefore, many studies have focused on murmur characterisation in the frequency domain, and the Fourier transform (FT) and wavelet decomposition are now commonly used tools.
Signal characteristics that are not time–frequency related require a different non-linear approach.
Time related features can be extracted directly from the acoustic signal, represented in its temporal progression. In these cases, the signal does not undergo geometric or mathematical transformations; the phases of the cardiac cycle are still perfectly recognisable and identifiable. Features extracted with this approach are the time intervals between the various heart sounds, the amplitude (or intensity) and heart rate (table 1).
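The time related features listed above can be illustrated with a minimal sketch. It assumes that the onset times of S1 and S2 (in seconds) have already been located by a separate segmentation step, which is not shown; the function name and the onset values are hypothetical.

```python
from statistics import mean

def time_domain_features(s1_onsets, s2_onsets):
    """Compute simple timing features from S1 and S2 onset times (seconds).

    Assumes each S2 onset follows the S1 onset with the same index.
    """
    # Cardiac cycle length: time between consecutive S1 onsets.
    cycles = [b - a for a, b in zip(s1_onsets, s1_onsets[1:])]
    # Systole: from S1 to the following S2.
    systoles = [s2 - s1 for s1, s2 in zip(s1_onsets, s2_onsets)]
    # Diastole: from S2 to the next S1.
    diastoles = [nxt - s2 for s2, nxt in zip(s2_onsets, s1_onsets[1:])]
    return {
        "heart_rate_bpm": 60.0 / mean(cycles),
        "mean_systole_s": mean(systoles),
        "mean_diastole_s": mean(diastoles),
    }

# Hypothetical onset times for a regular rhythm (one beat every 0.8 s).
s1 = [0.0, 0.8, 1.6, 2.4]
s2 = [0.3, 1.1, 1.9, 2.7]
features = time_domain_features(s1, s2)
```

Amplitude features could be added in the same way, for example as the peak absolute sample value within each S1 or S2 segment.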
A very large number of features can be extracted by exploring the signal in the frequency domain (table 1).16 17 27 28 Magnitude and phase characterise the frequency content; power spectrum is useful to characterise periodic signals; and energy spectrum is especially useful for time limited portions of the signal.
Analyses are usually performed with the FT and the wavelet transform (WT), often combined in the same algorithm.
1) FT: The FT considers the signal as a sum of sinusoids. The short time FT (STFT) is obtained by calculating the FT of sequential portions of the time signal through a shifting window. The location of the window gives the time dimension to the frequency analysis. Window length is critical when using the STFT: a wide window gives complete frequency information but does not follow the short time variation of the physiological signal, whereas a small window reduces the frequency resolution. The STFT allows S1 and S2 to be distinguished because of their different frequency extent.17 The frequency domain also allows the range of frequencies (bandwidth) of murmurs to be characterised, which is usually larger for diastolic than for systolic ones.45 The SD of the duration of intervals between S1 and the point of maximum intensity within each cardiac cycle, the mean value of such intensity, and the distance between S1 and the beginning of the systolic murmur are three significant distinguishing features identified by El-Segaier et al.24 Information obtained in the frequency domain (envelopes of autocorrelation functions) may also allow for a method that does not require heart sound segmentation.30 A different approach is based on cepstrum analysis (the result of taking the FT of the logarithm of the estimated spectrum of a signal),18 which has been very successful as a feature vector for representing the human voice and musical signals. Unfortunately, there is too little literature on the implementation of this type of feature analysis for heart sounds, so the reliability of this approach is difficult to judge.
2) WT: This method of signal analysis provides high-resolution time and frequency information simultaneously. The main difference from the FT is that wavelets are localised in both time and frequency, whereas the standard FT is only localised in frequency. A WT plot is generally a 3D diagram in which the amplitude of each frequency of the signal is related to time.45 A large number of features can be extracted using a WT approach. Turkoglu et al35 extracted 13 coefficients using a wavelet decomposition method. Then, the STFT of wavelet coefficients, performed at seven different frequency intervals, was used to obtain 91 wavelet entropy features.35 Andrisevic et al33 used wavelet techniques to de-noise and prefilter heart sounds. The WT plot of a whole cardiac cycle was then reorganised as a vector and taken as input of the classifier.33 Choi and Zhongwei32 presented a method in which only two features were extracted from the autoregressive spectral envelope after a wavelet-based analysis of the acoustic signal. In this case, the manipulation of the signal was minimal and easy, the features being simply extracted from the plot of the envelope. Starting from an initial set of 83 features obtained by decomposing a matrix of continuous WT values, Chen et al36 selected a final set of 26 features as input of the classifier by using sequential floating forward selection. Yuenyong et al30 based their classification system on 35 features extracted with a WT approach (the values of the signal energy of 32 non-overlapping windows, the number of peaks detected, the mean distance between two consecutive peaks, and the signal energy of the whole segment). It is now recognised that WT applications perform better than FT, regardless of the decisional instrument.
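As a rough illustration of the two transforms described above, the sketch below computes an STFT with a shifting window and a multi-level wavelet decomposition in plain Python. The Hann window, the Haar wavelet, the synthetic test signal and all function names are assumptions chosen for brevity; the studies cited above do not necessarily use them.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of the discrete Fourier transform for bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def stft(signal, window_len, hop):
    """Short time FT: one magnitude spectrum per position of a Hann window."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (window_len - 1))
            for i in range(window_len)]
    frames = []
    for start in range(0, len(signal) - window_len + 1, hop):
        frame = [signal[start + i] * hann[i] for i in range(window_len)]
        frames.append(dft_magnitudes(frame))
    return frames

def haar_step(values):
    """One level of the Haar wavelet transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx, detail

def wavelet_energy_features(signal, levels):
    """Detail energy at each level, followed by the final approximation energy."""
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in current))
    return features

# Synthetic signal (not heart sound data): 50 Hz in the first half,
# 150 Hz in the second half, sampled at 1000 Hz.
fs = 1000
signal = [math.sin(2 * math.pi * (50 if i < 500 else 150) * i / fs)
          for i in range(1000)]

window_len, hop = 100, 50
spectra = stft(signal, window_len, hop)
bin_hz = fs / window_len  # frequency resolution of each spectrum

# Dominant frequency in the first and in the last window.
first_peak_hz = spectra[0].index(max(spectra[0])) * bin_hz
last_peak_hz = spectra[-1].index(max(spectra[-1])) * bin_hz

# Wavelet sub-band energies of a hypothetical 8-sample segment.
segment = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
energies = wavelet_energy_features(segment, levels=3)
```

With a 100-sample window at 1000 Hz, each spectrum has a resolution of 10 Hz; halving the window would track faster changes but coarsen the resolution to 20 Hz, which is the trade-off noted for the STFT. The wavelet energies sum to the energy of the original segment, so they can be read as a partition of signal energy across scales.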
Physiological signals often vary in a complex and irregular manner. Analysis of linear statistics such as mean values, variability measures and spectra of such signals generally does not address directly their complexity and useful information may be missed. Non-linear dynamics, of which Chaos Theory forms an important part, have been used to extract signal characteristics that are not time–frequency related. Ahlstrom et al37 combined ‘traditional’ parameters obtained in the time and frequency domains with other non-linear and chaotic features.37 Nigam and Priemer44 used a non-linear feature (simplicity index of the signal) to identify S1, S2 and murmurs in the cardiac cycle. The most used fractal parameters are the Lyapunov exponents (formally the quantity that characterises the rate of separation of infinitesimally close trajectories of a dynamical system). The maximum of these exponents determines a notion of predictability for a dynamical system. Using a database of 164 phonocardiographic recordings, Delgado-Trejos et al41 tested the accuracy of the system using several combinations of time-varying and time–frequency features, perceptual, and fractal features. Fractal type features were the most robust family of parameters (in the sense of accuracy vs computational load) for the automatic detection of murmurs, providing the best contribution to accuracy (97.17%), followed by time-varying and time–frequency (95.28%), and perceptual features (88.7%). Accuracy around 94% was reached just by using the two main features of the fractal family.41
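To make the definition of the Lyapunov exponent concrete, the sketch below estimates it for a toy dynamical system, the logistic map, rather than for a heart sound. This is an illustrative assumption only: estimating exponents from a recorded signal requires additional steps (such as reconstructing the system's trajectory from the samples) that are not shown here, and the function name and parameters are hypothetical.

```python
import math

def logistic_lyapunov(r, x0=0.2, n_iter=20000, n_skip=1000):
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x).

    The exponent is the average of log|f'(x)| along the trajectory,
    where f'(x) = r*(1 - 2x) is the local rate of separation.
    """
    x = x0
    # Discard the initial transient.
    for _ in range(n_skip):
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        stretch = abs(r * (1.0 - 2.0 * x))
        total += math.log(max(stretch, 1e-300))  # guard against log(0)
        x = r * x * (1.0 - x)
    return total / n_iter

chaotic = logistic_lyapunov(4.0)  # positive: nearby trajectories diverge
regular = logistic_lyapunov(2.5)  # negative: trajectories converge
```

A positive exponent indicates that small differences grow over time and the system is hard to predict, whereas a negative exponent indicates a predictable, convergent system.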
Early approaches based on statistical models were superseded as computer science developed and a specific branch of artificial intelligence, machine learning, emerged with its different approaches.46 47
Each diagnosis is the result of a sequence of decisions which may have different complexity. When evaluating the diagnostic performance of an automated system, the structure of the classifier deserves special attention. The structure depends on the goal of the system, which may be either to separate normal from abnormal heart sounds or to identify a specific VHD. Systems following the first approach (healthy/pathological) may be useful for mass screening camps in rural health units as a preliminary investigation tool. These systems give a single binary response,37 and performance characteristics are usually high (table 2). A second group of systems is designed to offer a more advanced diagnosis. These systems are characterised by sequential decision trees in which every stage gives a binary (YES/NO) response to a specific question: a YES answer ends the algorithm, a NO answer activates the next stage, until the diagnostic output is produced.32 The final performance characteristics of a sequential decision-maker are the product of the sensitivity/specificity of each stage. Therefore, performance characteristics, usually high at the first step (diseased/healthy), decrease as the number of arborisations (and of final decision classes) increases. Diagnostic performances (sensitivity and specificity) of proposed automated systems are reported in table 2.
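The effect of chaining decisions can be shown with a short calculation. The sketch assumes that errors at each stage are independent, which is a simplification, and the stage values are hypothetical rather than taken from any of the systems reviewed.

```python
def cumulative_rate(stage_rates):
    """Multiply per-stage rates (as fractions) along a chain of sequential decisions.

    Assumes that the stages err independently of one another.
    """
    total = 1.0
    for rate in stage_rates:
        total *= rate
    return total

# Hypothetical per-stage sensitivities of 90% each.
one_stage = cumulative_rate([0.90])
three_stages = cumulative_rate([0.90, 0.90, 0.90])
```

Under these assumptions, a single stage retains 90% sensitivity, while three stages in sequence retain about 73%, which illustrates why performance falls as decision classes are added.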
The decision algorithm proposed by El-Segaier et al,24 which uses stepwise logistic regression analysis, has 95% sensitivity and 72% specificity in distinguishing normal from pathological sounds. Other systems are based on fuzzy clustering, a process that assigns each data element a membership level in one or more classes. The system by Nigam and Priemer,44 based on a fuzzy clustering technique, had 73% sensitivity and 100% specificity in detecting the presence of a systolic murmur.44
The three main machine learning based algorithms used to classify heart sounds are K-nearest neighbour (K-nn), artificial neural network (ANN) and support vector machine (SVM).
These systems require a learning phase. The learning phase for K-nn and SVMs is a single step process in which the system is fed with a set of known normal and pathological heart sounds (training set) so that it learns to recognise pathological sounds (training phase). ANN requires a further step (test phase) on a different data set (test set) to verify that the training is unbiased and that the selected set of features is solid and fitting. When the data set is limited, the test phase can be performed iteratively by holding out one observation at a time to test the ANN trained with the remaining data (Jack-Knifing method).27 28 At the end of the learning phase, the performance of the system can be assessed in the field.
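The hold-one-out procedure described above can be sketched as follows. To keep the example short, a nearest class mean classifier stands in for the ANN; this substitution, the function names and the single-feature data set are all assumptions made for illustration.

```python
def leave_one_out_accuracy(samples, labels, train, predict):
    """Hold out each observation once, train on the rest, test on the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

# Placeholder classifier (not an ANN): nearest class mean on one feature.
def train_centroids(xs, ys):
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(members) / len(members)
    return centroids

def predict_centroid(centroids, x):
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Hypothetical single-feature data set: 0 = normal, 1 = pathological.
samples = [0.10, 0.20, 0.15, 0.90, 0.80, 0.85]
labels = [0, 0, 0, 1, 1, 1]
accuracy = leave_one_out_accuracy(samples, labels, train_centroids, predict_centroid)
```

Because every observation is tested by a model that never saw it during training, the procedure uses a small data set fully while keeping training and test data separate at each iteration.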
1) K-nn classifier: The K-nn algorithm is one of the simplest machine learning algorithms: an object is assigned to the class most common among its k nearest neighbours, which are already correctly labelled (k is a positive integer, typically small). These classifiers are typically computationally simple. Performance (accuracy) of a K-nn classifier in the screening of cardiac valve disease (healthy/diseased) was 86% when the system was fed with WT features18 and 97% with non-linear features.41
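The majority vote among nearest neighbours can be sketched in a few lines. Euclidean distance, k = 3 and the two-feature vectors below are assumptions chosen for illustration; the cited studies may use other distances, values of k and feature sets.

```python
from collections import Counter
import math

def knn_predict(train_x, train_y, query, k=3):
    """Assign the query to the majority class among its k nearest labelled neighbours."""
    # Sort labelled points by Euclidean distance to the query.
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature vectors with known labels.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
           (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
train_y = ["healthy", "healthy", "healthy",
           "diseased", "diseased", "diseased"]

pred_a = knn_predict(train_x, train_y, (0.12, 0.18))
pred_b = knn_predict(train_x, train_y, (0.88, 0.82))
```

No model is fitted in advance: all the work happens at prediction time, which is why the labelled examples themselves must be stored.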
2) ANN: ANN emulates the architecture and property of learning which characterise biological information systems, being a network of interconnected processing nodes (neurones) able to perform simple mathematical computation. A typical neurone receives inputs from the neurones feeding into it from which it generates an output. Its output is then disseminated to the neurones that it feeds into. The interconnections are weighted and modulate the inter-neuronal interactions. An ANN derives its property of emergent learning from the malleability of interconnection strengths.
ANN is the most used classifier in the automatic diagnosis of heart sounds. The system developed by DeGroff et al27 misclassified pathological cases (a case with pulmonary stenosis and a case with an atrial septal defect) belonging to pathological classes that were grossly under-represented in the training data. These results show that ANN generalisation would improve with better representation of all classes in the training data for which more data would have to be collected.
Bhatikar et al28 developed a classifier with an input layer of 252 neurones (corresponding to 252 bins in the discrete energy spectrum, with a range of 0–252 Hz and a bin size of 1 Hz), a hidden layer of 15 neurones and one binary output neurone (0, innocent; 1, pathological).28 It showed high performance in the diagnosis of ventricular septal defect (88% sensitivity; 83% specificity).
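The layer sizes reported above (252 inputs, 15 hidden neurones, one output) can be illustrated with a forward pass. This is only a sketch: the sigmoid activation, the 0.5 decision threshold, the random untrained weights and the random input spectrum are assumptions, not details taken from the cited study.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(spectrum, w_hidden, b_hidden, w_out, b_out):
    """Forward pass through one hidden layer and a single sigmoid output neurone."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, spectrum)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

N_INPUT, N_HIDDEN = 252, 15  # layer sizes reported in the text

# Untrained random weights (placeholders only).
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUT)]
            for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
b_out = 0.0

# Hypothetical energy spectrum: one value per 1 Hz bin.
spectrum = [rng.random() for _ in range(N_INPUT)]
score = forward(spectrum, w_hidden, b_hidden, w_out, b_out)
label = 1 if score >= 0.5 else 0  # 0 = innocent, 1 = pathological
```

Training would adjust the interconnection weights so that the output approaches the known label for each recording in the training set; with random weights the output here carries no diagnostic meaning.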
Pretorius et al17 combined the results of a series of six ANNs, each designed to recognise the presence or absence of a specific VHD (0, NO; 1, YES). The average sensitivity at the four auscultation sites was 60%. When using a subset of ANNs combining recordings performed at the four auscultation sites, 91% sensitivity and 94% specificity were reached. Therefore, the number of auscultation sites is also important.
3) SVM: SVMs are supervised learning models with associated learning algorithms that analyse data and recognise patterns. Given a set of training examples, each marked as belonging to one of two categories, a training algorithm builds a model that predicts whether a new example falls into one category or the other. Multi-class classification follows a hierarchical structure. Choi and Zhongwei32 created a decision-making tree composed of six binary SVM classifier modules having a set of the three features named above as input. The system recognised abnormal cardiac sound with 99.9% sensitivity and 99.5% specificity. As observed with other classifiers, performance decreases as the number of final decision classes increases, so these high performance standards were not reached in the diagnosis of single valve diseases (89.9% for aortic valve disorders, regurgitation or stenosis; 88% for mitral valve disorders, regurgitation or stenosis). Likewise, the diagnostic performance of a sequential SVM-based decision tree16 using a wide feature set (100 features) was high at the first decisional step (healthy vs pathological) (sensitivity 87.5%; specificity 94.7%), and at each single decisional step (systolic vs diastolic 89.3% and 93.4%; aortic stenosis vs mitral regurgitation 91% and 93%; aortic regurgitation vs mitral regurgitation 94.7% and 92.1%, respectively). However, the four steps were sequential, so the total accuracy of the system was 77% and 78% for systolic and diastolic murmurs, respectively.
Skilled physicians may reach a high sensitivity in the diagnosis of cardiac valve diseases with heart auscultation, an operationally simple, low-cost and non-invasive method. Automated diagnostic tools may enhance preventive care in cardiology by facilitating screening of heart diseases, especially in low resource settings where skilled personnel are rare. The financial cost of systems for heart sound assessment is well below US$1000. However, notwithstanding the good performance characteristics, no instrument is commercially available at the moment (table 3). The wide diffusion of echocardiography and the search for low-cost portable echocardiographs have probably limited investment in the development of an automated system for cardiac auscultation. In the current era, where the debated issue is ‘Is physical examination dead?’, opinion seems to lean in favour of imaging technology.48 For some patients in low resource settings, however, access to such technology may remain a dream. Likewise, in these areas, skilled doctors cannot be dedicated to screening purposes, and for a patient the possibility of meeting a skilled physician is probably also a dream. Therefore, automated systems might be particularly useful in LMIC.
Although preliminary data are encouraging, available systems still have limitations. First, almost all studies have been performed ‘in the laboratory’, testing a limited set of data, whereas the methodology requires trials in the field. Second, current studies trained and tested the classifier systems on patients with moderate to severe classes of VHD. For these patients, high cost surgery is the only option, whereas the goal should be to detect asymptomatic patients with minor alterations (mainly of the mitral valve).49 These subjects are the ideal target for the low-cost option of secondary penicillin prophylaxis, an approach recommended since the 1980s by WHO and the World Heart Federation. Although the screening of these patients is expected to be difficult,3 the low performance of clinical detection50 could be improved by the high acoustic sensitivity of new electronic stethoscopes. Systems have to be trained in this population. The problem is crucial because current systems can be expected to miss a large number of young patients who might be treated with antibiotics. The possibility that even unskilled personnel may use the device for screening purposes in remote areas of LMIC might be of high interest for regions at early stages of epidemiological transition.
Contributors: Both GM and PAM gave substantial contributions to conception, design, literature search and drafting the article. PAM revised it critically for important intellectual content. Both GM and PAM gave final approval of the version to be published.
Competing interests: None.
Provenance and peer review: Not commissioned; externally peer reviewed.