Traditional ways of teaching physical examination techniques in the first clinical courses are rather demanding in terms of teacher involvement and require a pool of patients suitable for demonstrations. For a long time, various audio-visual tools have been used to save teachers' and students' time and patients' patience. Modern WWW multimedia publishing technology allows good access to such teaching materials - and several collections of heart sounds, breath sounds, etc. already exist. The aim of our project is to design and build a comprehensive multimedia textbook of internal propedeutics that presents various physiological and pathological findings (auscultation, inspection, basic imaging) in the context of the diagnostic patient investigation - the status praesens - as it is taught in the first clinical courses.
Unlike classical textbooks, hypertext presentation allows the material to be organized into several structures reflecting various approaches: systemic (digestive, cardiovascular, etc.), nosological, differential-diagnostic, etc. Identifying and implementing the various useful approaches is the most difficult part of the task. The accompanying illustrative material is being prepared with the use of modern technologies - digital camera, scanner, video camera and digitizer, digital audio recording, etc.
In the first year of the project, the skeleton of the multimedia presentation is being constructed, corresponding to the various approaches to the subject. Concurrently, suitable illustrative material is being gathered from cases at the Internal Clinic. Various existing WWW presentations dealing with heart and breath sounds and other relevant investigations have been surveyed and listed.
Experience and feedback from other projects of this type confirm that the rather elaborate logical and technical construction of multimedia textbooks is rewarded by good acceptance among both students and teachers. However, Internet access sufficient for multimedia transfers is a necessary prerequisite. Internal propedeutics is a very suitable field for Internet-based multimedia textbooks: instant access to audio and video recordings is very welcome in the development of clinical skills. The project is supported by a grant from the Czech Universities Development Fund.
Medical Education; Distance Education; Internet; Multimedia; Internal Medicine; Physical Examination
Human heartbeat intervals are known to have nonlinear and nonstationary dynamics. In this paper, we propose a model of R–R interval dynamics based on a nonlinear Volterra–Wiener expansion within a point process framework. Inclusion of second-order nonlinearities into the heartbeat model allows us to estimate instantaneous heart rate (HR) and heart rate variability (HRV) indexes, as well as the dynamic bispectrum characterizing higher order statistics of the nonstationary non-Gaussian time series. The proposed point process probability heartbeat interval model was tested with synthetic simulations and two experimental heartbeat interval datasets. Results show that our model is useful in characterizing and tracking the inherent nonlinearity of heartbeat dynamics. As a feature, the fine temporal resolution allows us to compute instantaneous nonlinearity indexes, thus sidestepping the uneven spacing problem. In comparison to other nonlinear modeling approaches, the point process probability model is useful in revealing nonlinear heartbeat dynamics at a fine timescale and with only short duration recordings.
Adaptive filters; approximate entropy (ApEn); heart rate variability (HRV); nonlinearity test; point processes; scaling exponent; Volterra series expansion
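The second-order Volterra expansion described above can be illustrated, outside the paper's point-process framework, with a plain discrete-time least-squares fit. The sketch below is a simplified assumption of that idea: the surrogate series (a noisy logistic map, not physiological data) is chosen so that a second-order expansion can represent it exactly.

```python
import numpy as np

def volterra2_design(rr, p=2):
    """Design matrix with a constant, linear terms rr[t-1..t-p], and all
    second-order products rr[t-i]*rr[t-j] (i <= j): a truncated
    discrete Volterra-Wiener expansion."""
    rows, target = [], []
    for t in range(p, len(rr)):
        past = rr[t - p:t][::-1]              # rr[t-1], ..., rr[t-p]
        quad = [past[i] * past[j] for i in range(p) for j in range(i, p)]
        rows.append(np.concatenate(([1.0], past, quad)))
        target.append(rr[t])
    return np.array(rows), np.array(target)

# synthetic surrogate series with a purely quadratic dependence on the
# previous value (a noisy logistic map); illustrative, not real R-R data
rng = np.random.default_rng(0)
rr = [0.5]
for _ in range(400):
    rr.append(3.8 * rr[-1] * (1.0 - rr[-1]) + 0.001 * rng.standard_normal())
rr = np.array(rr)

X, y = volterra2_design(rr, p=2)
kernels, *_ = np.linalg.lstsq(X, y, rcond=None)   # first- and second-order kernels
pred = X @ kernels
print(np.corrcoef(pred, y)[0, 1] > 0.99)          # quadratic model captures the dynamics
```

A linear model of the same order cannot reproduce this series; the quadratic cross-terms are what give the expansion its ability to track nonlinear heartbeat dynamics.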
Left bundle branch block (LBBB) and right bundle branch block (RBBB) not only mask electrocardiogram (ECG) changes that reflect diseases but also indicate important underlying pathology. The timely detection of LBBB and RBBB is critical in the treatment of cardiac diseases. Inter-patient heartbeat classification is based on independent training and testing sets to construct and evaluate a heartbeat classification system. Therefore, a heartbeat classification system with a high performance evaluation possesses a strong predictive capability for unknown data. The aim of this study was to propose a method for inter-patient classification of heartbeats to accurately detect LBBB and RBBB from the normal beat (NORM).
This study proposed a heartbeat classification method through a combination of three different types of classifiers: a minimum distance classifier constructed between NORM and LBBB; a weighted linear discriminant classifier between NORM and RBBB based on Bayesian decision making using posterior probabilities; and a linear support vector machine (SVM) between LBBB and RBBB. Each classifier was used with matching features to obtain better classification performance. The final types of the test heartbeats were determined using a majority voting strategy through the combination of class labels from the three classifiers. The optimal parameters for the classifiers were selected using cross-validation on the training set. The effects of different lead configurations on the classification results were assessed, and the performance of these three classifiers was compared for the detection of each pair of heartbeat types.
The study results showed that a two-lead configuration exhibited better classification results compared with a single-lead configuration. The construction of a classifier with good performance between each pair of heartbeat types significantly improved the heartbeat classification performance. The results showed a sensitivity of 91.4% and a positive predictive value of 37.3% for LBBB and a sensitivity of 92.8% and a positive predictive value of 88.8% for RBBB.
A multi-classifier ensemble method was proposed based on inter-patient data and demonstrated a satisfactory classification performance. This approach has the potential for application in clinical practice to distinguish LBBB and RBBB from NORM of unknown patients.
Heartbeat classification; Left bundle branch block (LBBB); Right bundle branch block (RBBB); Independent component analysis (ICA); Linear discriminant classifier; Support vector machine (SVM); Ensemble
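The majority-voting combination of the three pairwise classifiers can be sketched as follows. The classifier internals are replaced by their output labels, and the tie-breaking rule is an assumption, since the abstract does not specify one.

```python
from collections import Counter

def majority_vote(votes, default="NORM"):
    """Majority vote over the class labels returned by the three pairwise
    classifiers (NORM/LBBB, NORM/RBBB, LBBB/RBBB).  With one-vs-one
    voting the winning class collects at least two votes; a 1-1-1 tie is
    resolved with a default label (an assumption, as the paper's
    tie-breaking rule is not given here)."""
    label, n = Counter(votes).most_common(1)[0]
    return label if n >= 2 else default

# e.g. the NORM/LBBB classifier says LBBB, NORM/RBBB says NORM,
# and LBBB/RBBB says LBBB: the final label is LBBB
print(majority_vote(["LBBB", "NORM", "LBBB"]))   # LBBB
print(majority_vote(["NORM", "RBBB", "LBBB"]))   # tie -> default NORM
```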
The beating of the heart is accompanied by both electrical activity and sound. Heart auscultation provides clues for diagnosing many cardiac abnormalities. Unfortunately, detecting relevant symptoms and making a diagnosis from heart sounds heard through a stethoscope is difficult. The reason GPs find this difficult is that the heart sounds are of short duration and separated from one another by less than 30 ms. In addition, the cost of false positives is wasted time and emotional anxiety for both patient and GP. Many heart diseases cause changes in heart sound, waveform, and additional murmurs before other signs and symptoms appear. Heart-sound auscultation is the primary test conducted by GPs. These sounds are generated primarily by turbulent flow of blood in the heart. Analysis of heart sounds requires a quiet environment with minimum ambient noise. To address these issues, a technique for denoising and estimating the biomedical heart signal is proposed in this investigation. Normally, the performance of a filter depends on prior information about the statistical properties of the signal and the background noise. This paper proposes Kalman filtering for denoising the heart sound signal. The cycles of heart sounds are assumed to follow a first-order Gauss–Markov process, observed with additive noise in the given measurement. The model is formulated in state-space form, enabling the use of a Kalman filter to estimate the clean cycles of heart sounds. The estimates obtained by Kalman filtering are optimal in the mean-squared sense.
heart sound; murmurs; ECG; Kalman filters; acoustic cardiac signals
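A minimal sketch of the approach, assuming a scalar first-order Gauss–Markov state observed in additive noise; the parameter values and simulated signal are illustrative, not taken from the paper.

```python
import numpy as np

def kalman_denoise(y, a=0.95, q=0.01, r=0.09):
    """Scalar Kalman filter for a first-order Gauss-Markov state
    x[k] = a*x[k-1] + w[k], observed as y[k] = x[k] + v[k].
    a, q (process noise variance) and r (measurement noise variance)
    are illustrative values."""
    xhat, p = 0.0, 1.0
    out = np.empty(len(y))
    for k, yk in enumerate(y):
        xhat, p = a * xhat, a * a * p + q      # predict
        g = p / (p + r)                        # Kalman gain
        xhat = xhat + g * (yk - xhat)          # update with the measurement
        p = (1.0 - g) * p
        out[k] = xhat
    return out

rng = np.random.default_rng(1)
x = np.zeros(400)
for k in range(1, 400):                        # simulate the clean AR(1) signal
    x[k] = 0.95 * x[k - 1] + 0.1 * rng.standard_normal()
y = x + 0.3 * rng.standard_normal(400)         # noisy observation
est = kalman_denoise(y)
print(np.mean((est - x) ** 2) < np.mean((y - x) ** 2))  # MSE is reduced
```

When the model parameters match the true signal statistics, as here, the filter output is the minimum mean-squared-error estimate of the clean signal.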
The level of bone-conducted sound in the auditory meatus is increased at low frequencies by occlusion of the meatus, for example by the earmold of a hearing aid. Physical measures of this “occlusion effect” (OE) require vibration of the skull. In previous research, either self-voicing or audiometric bone-conduction vibrators have been used to produce this vibration, with the result that the OE could not be measured for frequencies below 125 Hz. However, frequencies below this can be important for music perception by hearing aid users. The objective was to develop and evaluate a method that gives a lower-bound estimate of the OE for frequencies below 125 Hz.
A low-noise amplifier with extended low-frequency response was used to record the output of a miniature microphone inserted into the meatus of participants. The signal came from sounds of the heartbeat and blood flow of the participant, transmitted via bone-conduction through the walls of the meatus. A simultaneous recording was made of the carotid pulse to permit time-locked averaging (and hence noise reduction) of the microphone signal. Recordings were made from seven otologically and audiometrically normal participants, using clinical probe tips to produce the occlusion. Recordings were also made from an overlapping group of nine participants, using fast-setting impression material to provide a more consistent degree of occlusion. The difference in level of the recorded signal for unoccluded and occluded conditions provided a lower bound for the magnitude of the OE.
The mean OE increased with decreasing frequency, reaching a plateau of about 40 dB for frequencies below 40 Hz. For some individual recordings, the OE reached 50 dB for frequencies below 20 Hz. With occlusion, the heartbeat became audible for most participants.
The OE can be very large at low frequencies. The use of hearing aids with closed fittings, which may be employed either to prevent acoustic feedback or to allow amplification of low frequencies, may lead to an unacceptable OE. We suggest reducing the OE by the use of a seal deep in the meatus, where the wall of the meatus is more rigid.
Occlusion; earmold; hearing aid; bone conduction; heartbeat
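The time-locked averaging used to pull the bone-conducted heartbeat signal out of microphone noise can be sketched as follows; the sampling rate, trigger spacing, and waveform are hypothetical stand-ins.

```python
import numpy as np

def time_locked_average(signal, triggers, pre, post):
    """Average signal segments [t-pre, t+post) around each trigger sample
    (here, carotid-pulse times).  Components locked to the trigger add
    coherently, while uncorrelated noise averages out, falling roughly
    as 1/sqrt(number of segments)."""
    segs = [signal[t - pre:t + post] for t in triggers
            if t - pre >= 0 and t + post <= len(signal)]
    return np.mean(segs, axis=0)

rng = np.random.default_rng(2)
fs = 1000                                       # Hz, assumed sampling rate
template = np.sin(2 * np.pi * 10 * np.arange(100) / fs)   # 10 Hz burst, 100 ms
signal = 0.5 * rng.standard_normal(60_000)      # 60 s of background noise
triggers = np.arange(500, 59_000, 900)          # ~one "beat" every 0.9 s
for t in triggers:
    signal[t:t + 100] += template
avg = time_locked_average(signal, triggers, pre=0, post=100)
# the averaged waveform is far closer to the template than any raw segment
print(np.std(avg - template) < np.std(signal[500:600] - template))
```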
We studied digital stethoscope recordings in children undergoing simultaneous catheterization of the pulmonary artery (PA) to determine whether time-domain analysis of heart sound intensity would aid in the diagnosis of PA hypertension (PAH). Heart sounds were recorded and stored in .wav mono audio format. We performed recordings for 20 seconds with sampling frequencies of 4,000 Hz at the second left intercostal space and the cardiac apex. We used programs written in the MATLAB 2010b environment to analyze signals. We annotated events representing the first (S1) and second (S2) heart sounds and the aortic (A2) and pulmonary (P2) components of S2. We calculated the intensity (I) of the extracted event area (x) as , where n is the total number of heart sound samples in the extracted event and k is A2, P2, S1, or S2. We defined PAH as mean PA pressure (mPAp) of at least 25 mmHg with PA wedge pressure of less than 15 mmHg. We studied 22 subjects (median age: 6 years [range: 0.25–19 years], 13 female), 11 with PAH (median mPAp: 55 mmHg [range: 25–97 mmHg]) and 11 without PAH (median mPAp: 15 mmHg [range: 8–24 mmHg]). The P2∶A2 (P = .0001) and P2∶S2 (P = .0001) intensity ratios were significantly different between subjects with and those without PAH. There was a linear correlation (r > 0.7) between the P2∶S2 and P2∶A2 intensity ratios and mPAp. We found that the P2∶A2 and P2∶S2 intensity ratios discriminated between children with and those without PAH. These findings may be useful for developing an acoustic device to diagnose PAH.
auscultation; second heart sound; phonocardiography; machine learning
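The intensity formula itself did not survive in the text above. Purely for illustration, the sketch below assumes intensity is the mean squared amplitude over the n samples of an extracted event, one common definition that is not necessarily the authors'.

```python
import numpy as np

def intensity(x):
    """Mean squared amplitude of an extracted heart-sound event.
    Assumption for illustration only; the paper's exact formula is not
    reproduced in the abstract."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** 2)

rng = np.random.default_rng(3)
a2 = 0.2 * rng.standard_normal(200)   # stand-ins for segmented A2 and P2 events
p2 = 0.4 * rng.standard_normal(200)
s2 = np.concatenate([a2, p2])         # S2 contains both components
print(intensity(p2) / intensity(a2))  # P2:A2 intensity ratio, ~4 for these amplitudes
print(intensity(p2) / intensity(s2))  # P2:S2 intensity ratio
```

A louder pulmonary component (P2) relative to A2 or to the whole of S2 drives the ratios up, which is the behaviour the study exploits to discriminate PAH.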
Zebrafish (Danio rerio), owing to its optical accessibility and similarity to humans, has emerged as a model organism for cardiac research. Although various methods have been developed to assess cardiac function in zebrafish embryos, no method exists for assessing heartbeat regularity in blood vessels. Heartbeat regularity is an important parameter of cardiac function and is associated with cardiotoxicity in humans. Using a stereomicroscope and a digital video camera, we developed a simple, noninvasive method to measure heart rate and heartbeat regularity in peripheral blood vessels. Anesthetized embryos were mounted laterally in agarose on a slide, the caudal blood circulation of each zebrafish embryo was video-recorded under the stereomicroscope, and the data were analyzed by custom-made software. The heart rate was determined by digital motion analysis and power spectral analysis through extraction of the frequency characteristics of the cardiac rhythm. The heartbeat regularity, defined as the rhythmicity index, was determined by short-time Fourier transform analysis.
The heart rate measured by this noninvasive method in zebrafish embryos at 52 hours post-fertilization was similar to that determined by direct visual counting of ventricle beating (p > 0.05). In addition, the method was validated with a known cardiotoxic drug, terfenadine, which affects heartbeat regularity in humans and induces bradycardia and atrioventricular blockage in zebrafish. A significant decrease in heart rate was found by our method in treated embryos (p < 0.01). Moreover, there was a significant increase in the rhythmicity index (p < 0.01), which was supported by an increase in beat-to-beat interval variability (p < 0.01) in treated embryos, as shown by a Poincaré plot.
The data support and validate this rapid, simple, noninvasive method, which includes video image analysis and frequency analysis. This method is capable of measuring the heart rate and heartbeat regularity simultaneously via the analysis of caudal blood flow in zebrafish embryos. With the advantages of rapid sample preparation procedures, automatic image analysis and data analysis, this method can potentially be applied to cardiotoxicity screening assay.
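The power-spectral estimation of heart rate from a periodic motion signal can be sketched as follows; the frame rate and synthetic signal are assumptions, standing in for the caudal blood-flow motion trace.

```python
import numpy as np

def heart_rate_bpm(motion, fs):
    """Dominant frequency of a periodic motion signal, found as the peak
    of the power spectrum and converted to beats per minute."""
    motion = motion - np.mean(motion)            # remove the DC component
    spec = np.abs(np.fft.rfft(motion)) ** 2      # power spectrum
    freqs = np.fft.rfftfreq(len(motion), d=1.0 / fs)
    return 60.0 * freqs[np.argmax(spec)]

fs = 30.0                                        # frames/s, a typical camera rate
t = np.arange(0, 20, 1 / fs)                     # 20 s recording
motion = np.sin(2 * np.pi * 2.5 * t)             # 2.5 Hz beat = 150 bpm
motion += 0.2 * np.random.default_rng(4).standard_normal(len(t))
print(round(heart_rate_bpm(motion, fs)))         # -> 150
```

A short-time Fourier transform applies the same spectral estimate over a sliding window, which is how beat-to-beat irregularity (the rhythmicity index) can be tracked over time.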
The WHO case management algorithm for paediatric pneumonia relies solely on symptoms of shortness of breath or cough and tachypnoea to guide treatment; it has poor diagnostic specificity and thus tends to promote antibiotic resistance. Alternatives, including oxygen saturation measurement, chest ultrasound and chest auscultation, exist but have potential disadvantages. Electronic auscultation has potential for improved detection of paediatric pneumonia but has yet to be standardised. The authors aim to investigate the use of electronic auscultation to improve the specificity of the current WHO algorithm in developing countries.
This study is designed to test the hypothesis that pulmonary pathology can be differentiated from normal using computerised lung sound analysis (CLSA). The authors will record lung sounds from 600 children aged ≤5 years, 100 each with consolidative pneumonia, diffuse interstitial pneumonia, asthma, bronchiolitis, upper respiratory infections and normal lungs at a children's hospital in Lima, Peru. The authors will compare CLSA with the WHO algorithm and other detection approaches, including physical exam findings, chest ultrasound and microbiologic testing to construct an improved algorithm for pneumonia diagnosis.
This study will develop standardised methods for electronic auscultation and chest ultrasound and compare their utility for detection of pneumonia to standard approaches. Utilising signal processing techniques, the authors aim to characterise lung sounds and through machine learning, develop a classification system to distinguish pathologic sounds. Data will allow a better understanding of the benefits and limitations of novel diagnostic techniques in paediatric pneumonia.
We seek to characterise lung sounds associated with different respiratory illnesses in children using electronic auscultation and determine whether these sounds can be differentiated from normal through computerised lung sound analysis.
We summarise the study design and methods with standardised protocols for electronic auscultation and chest ultrasound in children.
We aim to develop a protocol for increased specificity of paediatric pneumonia diagnosis in developing countries.
Strengths and limitations of this study
Our study is limited by the case definitions available. With no gold standard for many paediatric respiratory diseases, we will rely on clinical exam findings and chest radiography.
By investigating a number of novel and commonly used diagnostic tools for a variety of respiratory diseases in children, we will gain valuable information regarding the diagnostic potential of each, with a main focus on the electronic stethoscope.
Smart Wireless Body Sensor Nodes (WBSNs) are a novel class of unobtrusive, battery-powered devices allowing the continuous monitoring and real-time interpretation of a subject's bio-signals, such as the electrocardiogram (ECG). These low-power platforms, while able to perform advanced signal processing to extract information on heart conditions, are usually constrained in terms of computational power and transmission bandwidth. It is therefore essential to identify in the early stages which parts of an ECG are critical for the diagnosis and, only in these cases, activate on demand more detailed and computationally intensive analysis algorithms. In this work, we present a comprehensive framework for real-time automatic classification of normal and abnormal heartbeats, targeting embedded and resource-constrained WBSNs. In particular, we provide a comparative analysis of different strategies to reduce the heartbeat representation dimensionality, and therefore the required computational effort. We then combine these techniques with a neuro-fuzzy classification strategy, which effectively discerns normal and pathological heartbeats with a minimal run time and memory overhead. We prove that, by performing a detailed analysis only on the heartbeats that our classifier identifies as abnormal, a WBSN system can drastically reduce its overall energy consumption. Finally, we assess the choice of neuro-fuzzy classification by comparing its performance and workload with respect to other state-of-the-art strategies. Experimental results using the MIT-BIH Arrhythmia database show energy savings of as much as 60% in the signal processing stage, and 63% in the subsequent wireless transmission, when a neuro-fuzzy classification structure is employed, coupled with a dimensionality reduction technique based on random projections.
embedded signal processing; wireless body sensor nodes; electrocardiogram; classification
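The random-projection dimensionality reduction mentioned in the abstract can be sketched as follows; the matrix scaling and dimensions are illustrative choices, not the paper's configuration.

```python
import numpy as np

def random_project(X, k, seed=0):
    """Project d-dimensional heartbeat feature vectors down to k
    dimensions with a Gaussian random matrix scaled by 1/sqrt(k).
    By the Johnson-Lindenstrauss lemma, pairwise distances are
    approximately preserved, so a classifier can work on the much
    cheaper k-dimensional representation."""
    d = X.shape[1]
    R = np.random.default_rng(seed).standard_normal((d, k)) / np.sqrt(k)
    return X @ R

rng = np.random.default_rng(5)
beats = rng.standard_normal((100, 200))   # 100 beats, 200 samples each (toy data)
low = random_project(beats, k=40)         # 5x fewer values to process per beat
d_hi = np.linalg.norm(beats[0] - beats[1])
d_lo = np.linalg.norm(low[0] - low[1])
print(abs(d_lo / d_hi - 1.0) < 0.5)       # distance roughly preserved
```

Because the projection is a single matrix multiply with a fixed matrix, it is well suited to the tight computation and memory budgets of a WBSN node.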
Several respiratory diseases are associated with specific respiratory sounds. In contrast to auscultation, computerized lung sound analysis is objective and can be performed continuously over an extended period. Moreover, audio recordings can be stored. Computerized lung sound analysis has rarely been performed in infants during the first year of life. This study was designed to determine and validate optimal cut-off values for computerized wheeze detection, based on the assessment of stored lung sound records by trained clinicians, in infants aged <1 year.
Lung sounds in 120 sleeping infants, of median (interquartile range) postmenstrual age of 51 (44.5–67.5) weeks, were recorded on 144 test occasions by an automatic wheeze detection device (PulmoTrack®). The records were retrospectively evaluated by three trained clinicians blinded to the results. Optimal cut-off values for the automatically determined relative durations of inspiratory and expiratory wheezing were determined by receiver operating curve analysis, and sensitivity and specificity were calculated.
The optimal cut-off values for the automatically detected durations of inspiratory and expiratory wheezing were 2% and 3%, respectively. These cut-offs had a sensitivity and specificity of 85.7% and 80.7%, respectively, for inspiratory wheezing, and 84.6% and 82.5%, respectively, for expiratory wheezing. Inter-observer reliability among the experts was moderate, with a Fleiss' Kappa (95% confidence interval) of 0.59 (0.57–0.62) for inspiratory and 0.54 (0.52–0.57) for expiratory wheezing.
Computerized wheeze detection is feasible during the first year of life. This method is more objective and can be more readily standardized than subjective auscultation, providing quantitative and noninvasive information about the extent of wheezing.
Lung sound; Auscultation; Phonopneumography; Wheezing; Computerized wheeze detection; Infants
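Choosing an optimal cut-off from a receiver operating characteristic analysis is commonly done by maximizing Youden's J. The abstract does not state the criterion used, so the sketch below, with invented toy data, is an assumption about one standard way to do it.

```python
import numpy as np

def best_cutoff(scores, labels):
    """Pick the threshold on 'scores' (e.g. relative wheeze duration, %)
    that maximizes Youden's J = sensitivity + specificity - 1, a common
    operating-point criterion in ROC analysis."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_j = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        sens = np.mean(pred[labels])           # true positive rate
        spec = np.mean(~pred[~labels])         # true negative rate
        if sens + spec - 1.0 > best_j:
            best_t, best_j = t, sens + spec - 1.0
    return best_t

# toy data: wheeze duration (%) with expert labels (True = wheeze present)
dur = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0, 9.0]
lab = [False, False, False, False, True, False, True, True, True, True]
print(best_cutoff(dur, lab))   # -> 2.0
```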
Manual cough counting is time-consuming and laborious; however, it is the standard against which automated cough monitoring devices must be compared. We have compared manual cough counting from video recordings with manual cough counting from digital audio recordings.
We studied 8 patients with chronic cough, overnight in laboratory conditions (diagnoses were 5 asthma, 1 rhinitis, 1 gastro-oesophageal reflux disease and 1 idiopathic cough). Coughs were recorded simultaneously using a video camera with infrared lighting and digital sound recording.
The numbers of coughs in each 8 hour recording were counted manually, by a trained observer, in real time from the video recordings and using audio-editing software from the digital sound recordings.
The median cough frequency was 17.8 (IQR 5.9–28.7) cough sounds per hour in the video recordings and 17.7 (6.0–29.4) coughs per hour in the digital sound recordings. There was excellent agreement between the video and digital audio cough rates; mean difference of -0.3 coughs per hour (SD ± 0.6), 95% limits of agreement -1.5 to +0.9 coughs per hour. Video recordings had poorer sound quality even in controlled conditions and can only be analysed in real time (8 hours per recording). Digital sound recordings required 2–4 hours of analysis per recording.
Manual counting of cough sounds from digital audio recordings has excellent agreement with simultaneous video recordings in laboratory conditions. We suggest that ambulatory digital audio recording is therefore ideal for validating future cough monitoring devices, as it can be performed in the patient's own environment.
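The limits of agreement quoted above follow the standard Bland–Altman construction (mean difference ± 1.96 SD), which is consistent with the figures reported. A sketch with hypothetical paired hourly counts:

```python
import numpy as np

def limits_of_agreement(a, b):
    """Bland-Altman analysis: mean difference and 95% limits of
    agreement (mean +/- 1.96 SD) between two counting methods."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    m, s = diff.mean(), diff.std(ddof=1)
    return m, (m - 1.96 * s, m + 1.96 * s)

# hypothetical hourly cough rates from video vs audio counting
video = [17.8, 5.9, 28.7, 12.0, 20.5, 9.3, 15.1, 24.2]
audio = [17.7, 6.0, 29.4, 12.4, 20.9, 9.2, 15.6, 24.8]
m, (lo, hi) = limits_of_agreement(video, audio)
print(round(m, 2), round(lo, 2), round(hi, 2))
```

Narrow limits around a near-zero mean difference, as in the study, indicate the two methods can be used interchangeably.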
New technologies like echocardiography, color Doppler, CT, and MRI provide more direct and accurate evidence of heart disease than heart auscultation. However, these modalities are costly, large in size, and operationally complex, and are therefore not suitable for use in rural areas, in homecare, and generally in primary healthcare set-ups. Furthermore, the majority of internal medicine and cardiology training programs underestimate the value of cardiac auscultation, and junior clinicians are not adequately trained in this field. Efficient decision support systems would therefore be very useful for helping clinicians make better heart sound diagnoses. In this study, a rule-based method based on decision trees has been developed for differential diagnosis between "clear" Aortic Stenosis (AS) and "clear" Mitral Regurgitation (MR) using heart sounds.
For the purposes of our experiment we used a collection of 84 heart sound signals, including 41 with a "clear" AS systolic murmur and 43 with a "clear" MR systolic murmur. The signals were initially preprocessed to detect the 1st and 2nd heart sounds. Next, a total of 100 features was determined for every heart sound signal, and their relevance to the differentiation between AS and MR was estimated. The performance of fully expanded and pruned decision tree classifiers was studied on various training and test datasets. In order to build a generalized decision support system for heart sound diagnosis, we divided the problem into sub-problems, each dealing with either one morphological characteristic of the heart-sound waveform or with difficult-to-distinguish cases.
Relevance analysis of the different heart sound features demonstrated that the most relevant features are the frequency features and the morphological features that describe S1, S2, and the systolic murmur. These results are compatible with the physical understanding of the problem, since AS and MR systolic murmurs have different frequency contents and different waveform shapes. By contrast, in the diastolic phase there is no murmur in either disease, so the diastolic phase signals cannot contribute to the differentiation between AS and MR.
We used a fully expanded decision tree classifier with a training set of 34 records and a test set of 50 records, which resulted in a classification accuracy (total correct/total tested) of 90% (45 correct/50 total records). Furthermore, the method proved to correctly classify both AS and MR cases, since the partial AS and MR accuracies were 91.6% and 88.5%, respectively. Similar accuracy was achieved using decision trees with a fraction of the 100 features (the most relevant ones). Pruned decision trees did not significantly change the classification accuracy, in terms of either partial or overall classification.
The present work has indicated that decision tree algorithms can be successfully used as the basis for a decision support system to assist young and inexperienced clinicians in making better heart sound diagnoses. Furthermore, relevance analysis can be used to determine a small critical subset of the initial feature set that contains most of the information required for the differentiation. Decision tree structures, if properly trained, can retain high classification accuracy on new test data sets. The classification accuracy and generalization capabilities of the fully expanded and pruned decision tree structures showed no significant difference for the examined sub-problem. The generalization capabilities of the decision tree based methods were found to be satisfactory: the decision tree structures were tested on various training and test data sets, and the classification accuracy was consistently high.
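Relevance analysis in a decision-tree setting typically ranks features by information gain, the same criterion a tree learner uses to choose splits. A minimal sketch under that assumption; the feature values and threshold are invented for illustration.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(feature, labels, threshold):
    """Information gain of splitting on feature >= threshold: the
    reduction in label entropy, used to rank feature relevance."""
    labels = np.asarray(labels)
    mask = np.asarray(feature, dtype=float) >= threshold
    h = entropy(labels)
    for part in (labels[mask], labels[~mask]):
        if len(part):
            h -= len(part) / len(labels) * entropy(part)
    return h

# toy example: a "murmur peak frequency" feature separating AS from MR
freq = [220, 240, 260, 150, 160, 170]        # Hz, illustrative values
cls = ["AS", "AS", "AS", "MR", "MR", "MR"]
print(info_gain(freq, cls, threshold=200))   # -> 1.0 (a perfect split)
```

A feature whose best split yields high gain carries most of the discriminative information, which is how a small critical subset of the 100 features can be identified.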
The automatic interpretation of electrocardiography (ECG) data can provide continuous analysis of heart activity, allowing the effective use of wireless devices such as the Holter monitor.
Materials and Methods:
We propose an intelligent heartbeat monitoring system to detect the possibility of arrhythmia in real time. We detected heartbeats and extracted features such as the QRS complex and P wave from ECG signals using the Pan–Tompkins algorithm, and the heartbeats were then classified into 16 types using a decision tree.
We tested the sensitivity, specificity, and accuracy of our system against data from the MIT-BIH Arrhythmia Database. Our system achieved an average accuracy of 97% in heartbeat detection and an average heartbeat classification accuracy of above 96%, which is comparable with the best competing schemes.
This work provides a guide to the systematic design of an intelligent classification system for decision support in Holter ECG monitoring.
heartbeat detection; heartbeat classification; decision tree; electrocardiography monitoring
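A heavily simplified, hypothetical version of the Pan–Tompkins heartbeat detection pipeline (derivative, squaring, moving-window integration, thresholding); the band-pass filter, adaptive thresholds, and refractory-period logic of the real algorithm are omitted.

```python
import numpy as np

def detect_qrs(ecg, fs):
    """Simplified Pan-Tompkins-style pipeline: derivative -> squaring ->
    moving-window integration -> fixed threshold.  The fixed threshold
    is an assumption; the full algorithm adapts it beat by beat."""
    diff = np.diff(ecg)                        # derivative emphasises QRS slope
    sq = diff ** 2                             # squaring rectifies and amplifies
    win = int(0.15 * fs)                       # ~150 ms integration window
    mwi = np.convolve(sq, np.ones(win) / win, mode="same")
    thr = 0.5 * mwi.max()                      # simple fixed threshold (assumption)
    above = mwi > thr
    # rising edges of the thresholded signal mark beat locations
    return np.flatnonzero(above[1:] & ~above[:-1])

fs = 250
t = np.arange(0, 10, 1 / fs)
ecg = 0.05 * np.sin(2 * np.pi * 1 * t)         # baseline wander
for beat in np.arange(0.5, 9.5, 1.0):          # one sharp "R wave" per second
    i = int(beat * fs)
    ecg[i:i + 5] += [0.2, 1.0, 0.2, -0.3, 0.0]
print(len(detect_qrs(ecg, fs)))                # -> 9 detected beats
```

Features such as QRS width and R-R intervals extracted at this stage are what the decision tree then uses to assign each beat to one of the 16 types.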
Acute respiratory infections are the leading cause of childhood mortality. The lack of physicians in rural areas of developing countries makes their correct diagnosis and treatment difficult. The staff of rural health facilities (health-care technicians) may not be qualified to distinguish respiratory diseases by auscultation. For this reason, the goal of this project is the development of a tele-stethoscopy system that allows a physician to receive, in real time, cardio-respiratory sounds from a remote auscultation, as well as video images showing where the technician is placing the stethoscope on the patient's body.
A real-time wireless stethoscopy system was designed. The initial requirements were: 1) the system must send audio and video synchronously over IP networks, without requiring an Internet connection; 2) it must preserve the quality of cardiorespiratory sounds and allow the binaural pieces and the chestpiece of standard stethoscopes to be adapted; and 3) cardiorespiratory sounds should be recordable at both ends of the communication. In order to verify the diagnostic capacity of the system, a clinical validation with eight specialists has been designed. In a preliminary test, twelve patients were auscultated by all the physicians using the tele-stethoscopy system, versus local auscultation using a traditional stethoscope. The system must allow listening to cardiac (systolic and diastolic murmurs, gallop sounds, arrhythmias) and respiratory (rhonchi, rales and crepitations, wheeze, diminished and bronchial breath sounds, pleural friction rub) sounds.
The design, development and initial validation of the real-time wireless tele-stethoscopy system are described in detail. The system was conceived from scratch as open-source, low-cost and designed in such a way that many universities and small local companies in developing countries may manufacture it. Only free open-source software has been used in order to minimize manufacturing costs and look for alliances to support its improvement and adaptation. The microcontroller firmware code, the computer software code and the PCB schematics are available for free download in a subversion repository hosted in SourceForge.
It has been shown that real-time tele-stethoscopy, together with a videoconference system that allows a remote specialist to oversee the auscultation, may be a very helpful tool in rural areas of developing countries.
Telemedicine; Stethoscope; Tele-stethoscopy; Wireless; Real-time; E-health; Libre software; Libre hardware; Open-source
Molecular biology has accumulated substantial amounts of data concerning the functions of genes and proteins. Information relating to functional descriptions is generally extracted manually from textual data and stored in biological databases to build up annotations for large collections of gene products. These annotation databases are crucial for the interpretation of large-scale analysis approaches using bioinformatics or experimental techniques. Due to the growing accumulation of functional descriptions in the biomedical literature, the need for text mining tools to facilitate the extraction of such annotations is urgent. In order to make text mining tools usable in real-world scenarios, for instance to assist database curators during the annotation of protein function, comparisons and evaluations of the different approaches on full-text articles are needed.
The Critical Assessment of Information Extraction in Biology (BioCreAtIvE) contest is a community-wide competition aiming to evaluate different strategies for text mining tools as applied to biomedical literature. We report on task 2, which addressed the automatic extraction and assignment of Gene Ontology (GO) annotations of human proteins using full-text articles. The predictions of task 2 are based on triplets of protein - GO term - article passage. The annotation-relevant text passages were returned by the participants and evaluated by expert curators of the GO annotation (GOA) team at the European Bioinformatics Institute (EBI). Each participant could submit up to three results for each of the sub-tasks comprising task 2. In total, more than 15,000 individual results were provided by the participants. In addition to the annotation itself, the curators evaluated whether the protein and the GO term were correctly predicted and traceable through the submitted text fragment.
Concepts provided by GO currently form the most extensive set of terms used for annotating gene products; they were therefore explored to assess how effectively text mining tools can extract those annotations automatically. Although the obtained results are promising, they are still far from the performance demanded by real-world applications. Among the principal difficulties encountered in addressing the proposed task were the complex nature of GO terms and protein names (the large range of variants used to express proteins, and especially GO terms, in free text) and the lack of a standard training set. A range of very different strategies was used to tackle this task. The dataset generated in line with the BioCreative challenge is publicly available and will open new possibilities for training information extraction methods in the domain of molecular biology.
Timbre is the attribute of sound that allows humans and other animals to distinguish among different sound sources. Studies based on psychophysical judgments of musical timbre, ecological analyses of the physical characteristics of sounds, and machine learning approaches have all suggested that timbre is a multifaceted attribute that invokes both spectral and temporal sound features. Here, we explored the neural underpinnings of musical timbre. We used a neuro-computational framework based on spectro-temporal receptive fields, recorded from over a thousand neurons in the mammalian primary auditory cortex as well as from simulated cortical neurons, augmented with a nonlinear classifier. The model was able to perform robust instrument classification irrespective of pitch and playing style, with an accuracy of 98.7%. Using the same front end, the model was also able to reproduce perceptual distance judgments between timbres as perceived by human listeners. The study demonstrates that joint spectro-temporal features, such as those observed in the mammalian primary auditory cortex, are critical to providing a representation rich enough to account for perceptual judgments of timbre by human listeners, as well as for recognition of musical instruments.
Music is a complex acoustic experience that we often take for granted. Whether sitting in a symphony hall or enjoying a melody over earphones, we have no difficulty identifying the instruments playing, following various beats, or simply distinguishing a flute from an oboe. Our brains rely on a number of sound attributes to analyze the music in our ears. These attributes can be straightforward like loudness or quite complex like the identity of the instrument. A major contributor to our ability to recognize instruments is what is formally called ‘timbre’. Of all perceptual attributes of music, timbre remains the most mysterious and least amenable to a simple mathematical abstraction. In this work, we examine the neural underpinnings of musical timbre in an attempt to both define its perceptual space and explore the processes underlying timbre-based recognition. We propose a scheme based on responses observed at the level of mammalian primary auditory cortex and show that it can accurately predict sound source recognition and perceptual timbre judgments by human listeners. The analyses presented here strongly suggest that rich representations such as those observed in auditory cortex are critical in mediating timbre percepts.
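The front end described in the two abstracts above can be loosely illustrated with a toy spectro-temporal receptive field (STRF) stage. The sketch below is illustrative only: the Gabor-style filter parameters and the energy read-out are assumptions, not the authors' model, which used measured and simulated cortical STRFs plus a nonlinear classifier.

```python
import numpy as np
from scipy.signal import convolve2d

def strf(n_freq=32, n_time=32, spectral_mod=0.25, rate=0.15, phase=0.0):
    """Toy STRF: a 2D Gabor tuned to a spectral modulation
    (cycles/channel) and a temporal rate (cycles/frame)."""
    f = np.arange(n_freq)[:, None] - n_freq / 2
    t = np.arange(n_time)[None, :] - n_time / 2
    envelope = np.exp(-(f ** 2) / (2 * (n_freq / 6) ** 2)
                      - (t ** 2) / (2 * (n_time / 6) ** 2))
    carrier = np.cos(2 * np.pi * (spectral_mod * f + rate * t) + phase)
    return envelope * carrier

def strf_features(spectrogram, strfs):
    """Project a (freq x time) spectrogram onto a bank of STRFs and
    return one energy value per filter -- the classifier's input."""
    return np.array([np.sum(convolve2d(spectrogram, k, mode="valid") ** 2)
                     for k in strfs])

# toy usage: two filters with different temporal rates, random "spectrogram"
rng = np.random.default_rng(0)
spec = rng.random((64, 100))
bank = [strf(rate=0.1), strf(rate=0.3)]
feats = strf_features(spec, bank)
```

In the full model, many such filters tile the spectral-modulation/rate plane, and the resulting feature vectors feed the nonlinear classifier that performs instrument recognition.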
Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make classification useful in practice, it is crucial to improve its accuracy while ensuring that it can run at big data scales. Many approaches use acoustic measures based on spectrogram-type data, such as the Mel-frequency cepstral coefficient (MFCC) features which represent a manually-designed summary of spectral information. However, recent work in machine learning has demonstrated that features learnt automatically from data can often outperform manually-designed feature transforms. Feature learning can be performed at large scale and “unsupervised”, meaning it requires no manual data labelling, yet it can improve performance on “supervised” tasks such as classification. In this work we introduce a technique for feature learning from large volumes of bird sound recordings, inspired by techniques that have proven useful in other domains. We experimentally compare twelve different feature representations derived from the Mel spectrum (of which six use this technique), using four large and diverse databases of bird vocalisations, classified using a random forest classifier. We demonstrate that in our classification tasks, MFCCs can often lead to worse performance than the raw Mel spectral data from which they are derived. Conversely, we demonstrate that unsupervised feature learning provides a substantial boost over MFCCs and Mel spectra without adding computational complexity after the model has been trained. The boost is particularly notable for single-label classification tasks at large scale. The spectro-temporal activations learned through our procedure resemble spectro-temporal receptive fields calculated from avian primary auditory forebrain. However, for one of our datasets, which contains substantial audio data but few annotations, increased performance is not discernible. 
We study the interaction between dataset characteristics and choice of feature representation through further empirical analysis.
Bioacoustics; Machine learning; Birds; Classification; Vocalisation; Birdsong
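The MFCC-versus-Mel comparison in the abstract above hinges on the fact that MFCCs are simply the first few DCT coefficients of each log-Mel frame; that truncation is the "manually-designed summary" the study compares against using the raw Mel spectrum directly. A minimal sketch (frame and band counts are arbitrary):

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_from_mel(log_mel_frames, n_coeffs=13):
    """MFCCs: DCT-II of each log-Mel frame, truncated to the first
    n_coeffs coefficients (one row per analysis frame)."""
    return dct(log_mel_frames, type=2, axis=1, norm="ortho")[:, :n_coeffs]

# toy input: 200 frames x 40 Mel bands
rng = np.random.default_rng(1)
log_mel = np.log(rng.random((200, 40)) + 1e-6)
mfcc = mfcc_from_mel(log_mel)
```

Dropping the higher DCT coefficients discards fine spectral detail, which is one way the paper's observation (raw Mel spectra sometimes beating MFCCs) can arise.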
Although awareness of sleep disorders is increasing, limited information is available on whole-night detection of snoring. Our study aimed to develop and validate a robust, high-performance, and sensitive whole-night snore detector based on non-contact technology.
Sounds during polysomnography (PSG) were recorded using a directional condenser microphone placed 1 m above the bed. An AdaBoost classifier was trained and validated on manually labeled snoring and non-snoring acoustic events.
Sixty-seven subjects (age 52.5±13.5 years, BMI 30.8±4.7 kg/m2, m/f 40/27) referred for PSG for obstructive sleep apnea diagnosis were prospectively and consecutively recruited. Twenty-five subjects were used for the design study; the validation study was blindly performed on the remaining forty-two subjects.
Measurements and Results
To train the proposed sound detector, >76,600 acoustic episodes collected in the design study were manually classified by three scorers into snore and non-snore episodes (e.g., bedding noise, coughing, environmental). A feature selection process was applied to select the most discriminative features extracted from time and spectral domains. The average snore/non-snore detection rate (accuracy) for the design group was 98.4% based on a ten-fold cross-validation technique. When tested on the validation group, the average detection rate was 98.2% with sensitivity of 98.0% (snore as a snore) and specificity of 98.3% (noise as noise).
Audio-based features extracted from time and spectral domains can accurately discriminate between snore and non-snore acoustic events. This audio analysis approach enables detection and analysis of snoring sounds from a full night in order to produce quantified measures for objective follow-up of patients.
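Features of the kind described, drawn from the time and spectral domains, might look like the following sketch. This is an illustrative reconstruction, not the study's feature set; in the study, such vectors were selected for discriminative power and fed to an AdaBoost classifier.

```python
import numpy as np

def snore_features(frame, fs):
    """A few time- and spectral-domain features for one acoustic episode
    (illustrative examples only, not the paper's exact features)."""
    frame = frame - frame.mean()
    energy = float(np.mean(frame ** 2))                        # time domain
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)  # zero-crossing rate per sample
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    centroid = float(np.sum(freqs * power) / np.sum(power))    # spectral centroid (Hz)
    return np.array([energy, zcr, centroid])

# toy usage: a 1-second 440 Hz tone at 16 kHz
fs = 16000
t = np.arange(fs) / fs
f = snore_features(np.sin(2 * np.pi * 440 * t), fs)
```

Each labeled episode would yield one such vector, and the AdaBoost ensemble would be trained on the design-set vectors and evaluated with ten-fold cross-validation, as described above.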
Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. To characterize such sensations quantitatively, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. It is therefore common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from sources as disparate as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with an exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that the speech and music databases have specific, distinctive code-words, while such database-specific code-words are absent from the environmental sounds. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation to our data, suggesting the existence of a common, simple generative mechanism for all considered sound sources.
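The code-word construction and the Zipf fit can be sketched as follows. The per-band quantizer here is a simplified stand-in for the psychoacoustic encoding described above (it assumes spectra normalized to [0, 1]), and the exponent is fitted by least squares in log-log coordinates.

```python
import numpy as np
from collections import Counter

def codewords(spectra, n_levels=4):
    """Quantize each short-time spectrum (rows, values in [0, 1]) into a
    discrete timbral code-word: a tuple of per-band quantization levels."""
    q = np.clip((spectra * n_levels).astype(int), 0, n_levels - 1)
    return [tuple(row) for row in q]

def zipf_exponent(words):
    """Fit the rank-frequency distribution f(r) ~ r**(-alpha) by least
    squares on log-log axes and return alpha."""
    freqs = np.array(sorted(Counter(words).values(), reverse=True), float)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope
```

On a corpus whose code-word frequencies are genuinely Zipfian, `zipf_exponent` recovers an alpha near one, the behavior the paper reports across speech, music, and environmental sounds.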
Auditory steady-state responses that can be elicited by various periodic sounds inform about subcortical and early cortical auditory processing. Steady-state responses to amplitude-modulated pure tones have been used to scrutinize binaural interaction by frequency-tagging the two ears’ inputs at different frequencies. Unlike pure tones, speech and music are physically very complex, as they include many frequency components, pauses, and large temporal variations. To examine the utility of magnetoencephalographic (MEG) steady-state fields (SSFs) in the study of early cortical processing of complex natural sounds, the authors tested the extent to which amplitude-modulated speech and music can elicit reliable SSFs.
MEG responses were recorded to 90-s-long binaural tones, speech, and music, amplitude-modulated at 41.1 Hz at four different depths (25, 50, 75, and 100%). The subjects were 11 healthy, normal-hearing adults. MEG signals were averaged in phase with the modulation frequency, and the sources of the resulting SSFs were modeled by current dipoles. After the MEG recording, intelligibility of the speech, musical quality of the music stimuli, naturalness of music and speech stimuli, and the perceived deterioration caused by the modulation were evaluated on visual analog scales.
The perceived quality of the stimuli decreased as a function of increasing modulation depth, more strongly for music than for speech; yet, all subjects considered the speech intelligible even at 100% modulation. SSFs were strongest for tones and weakest for speech stimuli; the amplitudes increased with increasing modulation depth for all stimuli. SSFs to tones were reliably detectable at all modulation depths (in all subjects in the right hemisphere, in 9 subjects in the left hemisphere) and to music stimuli at 50 to 100% depths, whereas speech usually elicited clear SSFs only at 100% depth.
The hemispheric balance of SSFs was toward the right hemisphere for tones and speech, whereas SSFs to music showed no lateralization. In addition, the right lateralization of SSFs to the speech stimuli decreased with decreasing modulation depth.
The results showed that SSFs can be reliably measured to amplitude-modulated natural sounds, with slightly different hemispheric lateralization for different carrier sounds. With speech stimuli, modulation at 100% depth is required, whereas for music the 75% or even 50% modulation depths provide a reasonable compromise between the signal-to-noise ratio of SSFs and sound quality or perceptual requirements. SSF recordings thus seem feasible for assessing the early cortical processing of natural sounds.
Auditory steady-state responses to pure tones have been used to study subcortical and cortical processing, to scrutinize binaural interaction, and to evaluate hearing objectively. In daily life, however, we encounter sounds that are physically much more complex, such as music and speech. This study demonstrates that not only pure tones but also amplitude-modulated speech and music, both perceived to have tolerable sound quality, can elicit reliable magnetoencephalographic steady-state fields. The strengths and hemispheric lateralization of the responses differed between the carrier sounds. The results indicate that steady-state responses could be used to study the early cortical processing of natural sounds.
Amplitude modulation; Auditory; Frequency tagging; Magnetoencephalography; Natural stimuli
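The core of the analysis, averaging the recorded signal in phase with the modulation frequency, can be sketched as below. This is a minimal illustration that assumes the sampling rate is an integer multiple of the modulation frequency; the actual MEG pipeline (sensor arrays, dipole modeling) is far more involved.

```python
import numpy as np

def steady_state_average(signal, fs, f_mod):
    """Average a long recording in phase with the modulation frequency:
    cut it into whole modulation cycles and average them, so activity
    phase-locked to f_mod survives while unrelated activity averages out."""
    period = int(round(fs / f_mod))
    n = (len(signal) // period) * period
    return signal[:n].reshape(-1, period).mean(axis=0)

# toy usage: a 41.1 Hz phase-locked component buried in noise
fs, f_mod = 4110.0, 41.1
rng = np.random.default_rng(2)
t = np.arange(41100) / fs                       # 10 s of "recording"
sig = np.sin(2 * np.pi * f_mod * t) + rng.normal(0, 1.0, t.size)
avg = steady_state_average(sig, fs, f_mod)
```

With N cycles averaged, the non-phase-locked noise shrinks roughly as 1/sqrt(N), which is why 90-s recordings suffice to detect SSFs even at shallow modulation depths.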
AIM: To determine the value of bowel sounds analysis using an electronic stethoscope to support a clinical diagnosis of intestinal obstruction.
METHODS: Subjects were patients who presented with a diagnosis of possible intestinal obstruction based on symptoms, signs, and radiological findings. A 3M™ Littmann® Model 4100 electronic stethoscope was used in this study. With the patients lying supine, six 8-second recordings of bowel sounds were taken from each patient from the lower abdomen. The recordings were analysed for sound duration, sound-to-sound interval, dominant frequency, and peak frequency. Clinical and radiological data were reviewed and the patients were classified as having either acute, subacute, or no bowel obstruction. Comparison of bowel sound characteristics was made between these subgroups of patients. In the presence of an obstruction, the site of obstruction was identified and bowel calibre was also measured to correlate with bowel sounds.
RESULTS: A total of 71 patients were studied during the period July 2009 to January 2011. Forty patients had acute bowel obstruction (27 small bowel obstruction and 13 large bowel obstruction), 11 had subacute bowel obstruction (eight in the small bowel and three in the large bowel), and 20 had no bowel obstruction (diagnoses of other conditions were made). Twenty-five patients (35.2%) received surgical intervention during the same admission for acute abdominal conditions. A total of 426 recordings were made, and 420 were used for analysis. There was no significant difference in sound-to-sound interval, dominant frequency, or peak frequency among patients with acute bowel obstruction, subacute bowel obstruction, and no bowel obstruction. In acute large bowel obstruction, the sound duration was significantly longer (median 0.81 s vs 0.55 s, P = 0.021) and the dominant frequency significantly higher (median 440 Hz vs 288 Hz, P = 0.003) compared with acute small bowel obstruction. No significant difference was seen between acute large bowel obstruction and large bowel pseudo-obstruction. For patients with small bowel obstruction, the sound-to-sound interval was significantly longer in those who subsequently underwent surgery than in those treated non-operatively (median 1.29 s vs 0.63 s, P < 0.001). There was no correlation between bowel calibre and bowel sound characteristics in either acute small bowel obstruction or acute large bowel obstruction.
CONCLUSION: Auscultation of bowel sounds is non-specific for diagnosing bowel obstruction. Differences in sound characteristics between large bowel and small bowel obstruction may help determine the likely site of obstruction.
Bowel sounds; Intestinal obstruction; Spectral analysis; Electronic stethoscope
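Two of the reported measures, dominant frequency and sound duration, can be approximated from a recording as follows. This is an illustrative reconstruction; the study's exact operational definitions (and its envelope threshold) are assumptions here.

```python
import numpy as np

def dominant_frequency(sound, fs):
    """Frequency (Hz) of the bin with the greatest spectral power."""
    power = np.abs(np.fft.rfft(sound)) ** 2
    return float(np.fft.rfftfreq(len(sound), 1.0 / fs)[np.argmax(power)])

def sound_duration(sound, fs, rel_threshold=0.1):
    """Time (s) over which the amplitude envelope stays above a fraction
    of its peak -- a simple stand-in for the paper's duration measure."""
    env = np.abs(sound)
    above = np.nonzero(env >= rel_threshold * env.max())[0]
    return (above[-1] - above[0] + 1) / fs

# toy usage: a 0.5-s, 300 Hz burst inside a 1-s recording at 4 kHz
fs = 4000
t = np.arange(fs) / fs
burst = np.zeros(fs)
burst[1000:3000] = np.sin(2 * np.pi * 300 * t[:2000])
```

Applied to each 8-second recording, such per-event measures yield the duration, dominant-frequency, and interval statistics compared between the obstruction subgroups above.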
Heart murmurs are the first signs of cardiac valve disorders. Several studies have been conducted in recent years to automatically differentiate normal heart sounds from heart sounds with murmurs using various types of audio features. Entropy has been used successfully as a feature to distinguish different heart sounds. In this paper, a new entropy measure was introduced to analyze heart sounds, and the feasibility of using it to classify five types of heart sounds and murmurs was shown. The entropy had previously been introduced to analyze mammograms. Four common murmurs were considered: aortic regurgitation, mitral regurgitation, aortic stenosis, and mitral stenosis. The wavelet packet transform was employed for heart sound analysis, and the entropy was calculated to derive feature vectors. Five classification schemes were applied to evaluate the discriminatory power of the generated features. The best results were achieved by BayesNet, with 96.94% accuracy. These promising results substantiate the effectiveness of the proposed wavelet packet entropy for heart sound classification.
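A wavelet packet entropy of the general kind described can be sketched with a Haar wavelet packet decomposition and the Shannon entropy of the normalized leaf-band energies. This is an assumed, simplified stand-in: the paper's specific entropy measure (originally introduced for mammograms) and its choice of wavelet differ.

```python
import numpy as np

def haar_wp_leaves(x, depth):
    """Full Haar wavelet packet decomposition: at each level, split every
    node into low-pass and high-pass half-band children.
    Requires len(x) divisible by 2**depth."""
    nodes = [np.asarray(x, float)]
    for _ in range(depth):
        nxt = []
        for n in nodes:
            nxt.append((n[0::2] + n[1::2]) / np.sqrt(2))  # low-pass child
            nxt.append((n[0::2] - n[1::2]) / np.sqrt(2))  # high-pass child
        nodes = nxt
    return nodes

def wp_shannon_entropy(x, depth=3):
    """Shannon entropy (bits) of the normalized leaf energies: low for
    signals concentrated in one band, near depth bits for white noise."""
    e = np.array([np.sum(n ** 2) for n in haar_wp_leaves(x, depth)])
    p = e / e.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```

Computing such an entropy per heart-sound segment (or per sub-band) yields the feature vectors that the classifiers above discriminate on: murmurs spread energy across bands differently from normal sounds.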
Providing quality, current cancer information to cancer patients and their families is a key function of the National Cancer Institute (NCI) Web site. This information is now provided in a predominantly text format, but could be provided in formats using multimedia, including animation and sound. Since users have many choices about where to get their information, it is important to provide the information in a format that is helpful and that they prefer.
To pilot and evaluate multimedia strategies for future cancer-information program formats for lay users, the National Cancer Institute created new multimedia versions of existing text programs. We sought to evaluate user performance and preference on these 3 new formats and on the 2 existing text formats.
The National Cancer Institute's "What You Need to Know About Lung Cancer" program was the test vehicle. There were 5 testing sessions, 1 dedicated to each format. Each session lasted about 1 hour, with 9 participants per session and 45 users overall. Users were exposed to the assigned cancer program from beginning to end in 1 of 5 formats: text paperback booklet, paperback booklet formatted in HTML on the Web, spoken audio alone, spoken audio synchronized with a text Web page, and Flash multimedia (animation, spoken audio, and text). Immediately thereafter, the features and design of the 4 alternative formats were demonstrated in detail. A multiple-choice pre-test and post-test quiz on the cancer content was used to assess user learning (performance) before and after experiencing the assigned program. The quiz was administered using an Authorware software interface writing to an Access database. Users were asked to rank from 1 to 5 their preference for the 5 program formats, and provide structured and open-ended comments about usability of the 5 formats.
Significant improvement in scores from pre-test to post-test was seen for the total study population. Average scores for users in each of the 5 format groups improved significantly. Increments in improvement, however, were not statistically different between any of the format groups. Significant improvements in quiz scores were seen irrespective of age group or education level. Of the users, 71.1% ranked the Flash program first among the 5 formats, and 84.4% rated Flash as their first or second choice. Audio was the least-preferred format, ranking fifth among 46.7% of users and first among none. Flash was ranked first among users regardless of education level, age group, or format group to which the user was assigned.
Under the pilot study conditions, users overwhelmingly preferred the Flash format to the other 4 formats. Learning occurred equally in all formats. Use of multimedia should be considered as communication strategies are developed for updating cancer content and attracting new users.
Lung cancer; Internet; multimedia; patient education; audio
Cardiolocomotor synchronization (CLS) has been well established for individuals engaged in rhythmic activity, such as walking, running, or cycling. When the frequency of the activity is at or near the heart rate, entrainment occurs. CLS has been shown in many cases to improve the efficiency of locomotor activity, improving stroke volume, reducing blood pressure variability, and lowering oxygen uptake (VO2). Instead of a 1:1 frequency ratio of activity to heart rate, an investigation was performed to determine whether harmonic coupling at other simple integer ratios (e.g., 1:2, 2:3, 3:2) could achieve any performance benefits. CLS was ensured by pacing the stride rate according to the measured heartbeat (i.e., adaptive paced CLS, or forced CLS). An algorithm was designed that determined the simplest ratio (lowest denominator) that, when multiplied by the heart rate, falls within an individualized, predetermined comfortable pacing range for the user. The algorithm was implemented on an iPhone 4, which generated a ‘tick-tock’ sound through the iPhone’s headphones. A sham-controlled crossover study was performed with 15 volunteers of various fitness levels. Subjects ran a 3 mile (4.83 km) simulated training run at their normal pace on two consecutive days (randomized: one adaptive pacing, one sham). Adaptive pacing resulted in faster run times, with subjects running an average of 26:03 ± 3:23 for adaptive pacing and 26:38 ± 3:31 for sham (F = 5.46, p < 0.05). The increase in heart rate from the start of the race, as characterized by an exponential time constant, was significantly slower during adaptive pacing, τ = 0.99 ± 0.30, compared to sham, τ = 1.53 ± 0.34 (t = -6.62, p < 0.01). Eighty-seven percent of runners found it easy to adjust their stride length to match the pacing signal, with seventy-nine percent reporting that pacing helped their performance.
These results suggest that adaptive paced CLS may have a beneficial effect on running performance and may be useful as a training aid.
Key Points:
- Sham-controlled crossover study using 15 experienced runners running 3 miles (4.83 km).
- Adaptive CLS pacing resulted in a statistically significant 35-second average decrease in run time (p < 0.05).
- Increase in heart rate during the run was significantly slower during adaptive pacing (p < 0.01).
CLS; pacing; coupling; entrainment
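The ratio search described above, finding the simplest ratio (lowest denominator) whose product with the heart rate lands in the runner's comfortable pacing range, can be sketched as follows. The search bounds are illustrative assumptions, not the authors' implementation.

```python
from fractions import Fraction

def pacing_ratio(heart_rate, cadence_min, cadence_max, max_den=6):
    """Return the simplest ratio p/q (smallest denominator first, then
    smallest numerator) such that heart_rate * p / q falls within the
    user's comfortable pacing range, or None if no such ratio exists."""
    for q in range(1, max_den + 1):
        for p in range(1, 4 * q + 1):
            r = Fraction(p, q)
            if r.denominator != q:      # skip non-reduced duplicates (e.g. 2/4)
                continue
            if cadence_min <= heart_rate * p / q <= cadence_max:
                return r
    return None

# e.g. heart rate 170 bpm with a comfortable range of 80-90 paces/min
# yields the 1:2 coupling ratio (170 * 1/2 = 85)
```

The phone would then emit its ‘tick-tock’ pacing sound at `heart_rate * ratio` beats per minute, re-evaluating as the measured heart rate drifts.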
Pneumothorax is usually diagnosed based on the attenuation of respiratory sounds on the affected side on auscultation, but this requires a skilled technique and is limited to subjective evaluation. We therefore designed a device that analyzes the frequency content of auscultatory sounds and converts it to numerical values with a computer. With this device, the bilateral sound pressure levels were compared between groups of 25 healthy subjects and 21 patients with pneumothorax to investigate its efficacy as a diagnostic tool for pneumothorax. While respiratory sounds were recorded from the bilateral precordial regions, the fast Fourier transform was applied with frequency analysis software, power spectra of the auscultatory sounds were displayed in real time, and the sound pressure levels of the two sides were compared. The difference was investigated at frequencies judged to be least influenced by cardiac sounds (200–400 Hz). No difference was observed in the control group (n = 25, P > 0.05), but respiratory sound attenuation was detectable on the affected side in the pneumothorax group (n = 21, P < 0.01 each for the paired Student's t-test and Wilcoxon signed-rank test). With a cutoff value of 8 dB, the sensitivity and specificity as a diagnostic tool for pneumothorax were 71.4% and 100%, respectively. This device could facilitate the detection of occult pneumothorax at accident scenes, in emergency rooms, and in intensive care units.
fast Fourier transform; frequency analysis; pneumothorax; real-time monitor
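The bilateral comparison in the 200–400 Hz band can be sketched as follows. This is an illustrative reconstruction of the analysis, not the device's software; the band limits and the 8 dB cutoff come from the abstract above.

```python
import numpy as np

def band_spl_db(sound, fs, f_lo=200.0, f_hi=400.0):
    """Relative sound pressure level (dB) in the 200-400 Hz band judged
    least contaminated by cardiac sounds."""
    power = np.abs(np.fft.rfft(sound)) ** 2
    freqs = np.fft.rfftfreq(len(sound), 1.0 / fs)
    return 10.0 * np.log10(power[(freqs >= f_lo) & (freqs <= f_hi)].sum())

def bilateral_difference_db(side_a, side_b, fs):
    """Band-level difference between the two sides; a difference of
    8 dB or more was the reported cutoff for attenuation suggestive
    of pneumothorax."""
    return band_spl_db(side_a, fs) - band_spl_db(side_b, fs)
```

A side whose band level sits 8 dB or more below the contralateral side would be flagged, matching the reported 71.4% sensitivity and 100% specificity at that cutoff.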