Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
IEEE Trans Biomed Eng. Author manuscript; available in PMC 2013 October 1.
Published in final edited form as:
PMCID: PMC3523127

A Real-Time Automated Point Process Method for Detection and Correction of Erroneous and Ectopic Heartbeats

Luca Citi, Member, IEEE, Emery N Brown, Fellow, IEEE, and Riccardo Barbieri, Senior Member, IEEE


The presence of recurring arrhythmic events (also known as cardiac dysrhythmia or irregular heartbeats), as well as erroneous beat detection due to low signal quality, significantly affect estimation of both time and frequency domain indices of heart rate variability (HRV). A reliable, real-time classification and correction of ECG-derived heartbeats is a necessary prerequisite for an accurate on-line monitoring of HRV and cardiovascular control. We have developed a novel point process based method for real-time R-R interval error detection and correction. Given an R-wave event, we assume that the length of the next R-R interval follows a physiologically motivated, time-varying inverse Gaussian probability distribution. We then devise an instantaneous automated detection and correction procedure for erroneous and arrhythmic beats by using the information on the probability of occurrence of the observed beat provided by the model. We test our algorithm over two datasets from the Physionet archive. The Fantasia normal rhythm database is artificially corrupted with known erroneous beats to test both the detection and correction procedure. The benchmark MIT-BIH Arrhythmia database is further considered to test the detection procedure of real arrhythmic events and compare it with results from previously published algorithms. Our automated algorithm represents an improvement over previous procedures, with best specificity for detection of correct beats, as well as highest sensitivity to missed and extra beats, artificially misplaced beats, and for real arrhythmic events. A near-optimal heartbeat classification and correction, together with the ability to adapt to time-varying changes of heartbeat dynamics in an on-line fashion, may provide a solid base for building a more reliable real-time HRV monitoring device.

Index Terms: Point-processes, heart rate variability, arrhythmias, ectopic beats, erroneous R-R intervals

I. Introduction

Heart rate variability (HRV) techniques [1]–[3] provide a window into the many physiological factors that modulate the normal heart rhythm. In particular, they have been found useful for non-invasive autonomic tone assessment in a wide range of clinical and non-clinical scenarios. In order to provide reliable results, these techniques require uninterrupted series of normal R-R intervals. Peak detection errors – when the algorithm misses a beat and/or detects one when there is none – and ectopic beats often determine abrupt changes in the R-R interval series that can lead to substantial deviations of the HRV indices.

The increasing quality of electrocardiogram (ECG) recording systems together with better R-peak detection algorithms has reduced the problem of noisy R-R series in controlled clinical and research environment. Nevertheless, this issue is still of primary importance in new applications of HRV studies, such as ambulatory monitoring in very unstructured environments and extreme conditions, and when the ECG is acquired using single-channel miniature devices, smartphones, or wearable technology.

The simple approach of discarding all sections contaminated by irregular R-R intervals might pose some problems. For example, if the rate of occurrence of mis-detections or ectopic events is not uniform but it is dependent on specific parts of the protocol (e.g., those with frequent motion artifacts) or on the physiological state of the subject (e.g., states that may result in abnormal morphology of the QRS or in higher incidence of ectopic events), then the exclusion of these periods might skew the resulting HRV indices [4]–[6]. When the goal of the research is to study HRV in extreme conditions (e.g., during rescue operations, military missions, or hiking expeditions) removing all sections contaminated by R-peak misdetections caused by motion artifacts is simply not an option.

A better approach is to attempt to correct the corrupted R-R series in order to obtain a new artifact-free R-R series reflecting the underlying HRV dynamics. This new R-R series can then be used for further processing using the HRV analysis techniques of choice. To date, the detection and correction of irregular beats has been mainly achieved by direct human expert evaluation and, more recently, by automatic techniques [4]–[16].

We have developed a novel point process based method for real-time R-R interval error detection and correction. Given an R-wave event, we assume that the length of the next R-R interval follows a time-varying history-dependent inverse Gaussian (IG) probability distribution [17]. We then use this model for the detection and correction of erroneous and arrhythmic beats. A preliminary version of the detection step of this novel approach has been briefly sketched in our previous publication [18]. In Section II-C we present the detection algorithm in detail and later in the paper we show its performance in comparison with three established methods for the detection of erroneous (Section III-A) and ectopic (Section III-C) beats. Whenever a beat is classified as erroneous, the correction step attempts to fix it according to the point-process model. In Section II-D we present the correction algorithm and in Section III-B we assess its performance in comparison with four reference methods.

A software tool that implements the algorithm described in this manuscript can be tested from our website [19]. It can be used as a preprocessing stage to detect and correct erroneous and ectopic beats in R-R series before analysing it with the chosen HRV technique [1]–[3], such as time-frequency analysis (LF/HF, …), standard time-domain methods (SDNN, SDANN, RMSSD, …), and non-linear indices (DFA, SampEn, MSE, …). Some methods, such as the Lomb spectrogram [20] do not require the correction step but might still benefit from the accurate detection capabilities of our algorithm.

II. Algorithmic Framework

A. Physiologically-based probability model of heartbeat intervals

Each cardiac contraction is initiated by the synchronous depolarization of the heart’s pacemaker cells beginning in the sinoatrial (SA) node of the right atrium and then propagating through its specialized conduction system to the left atrium and to the two ventricles. Following every depolarization, the transmembrane potentials of these cells return to their resting potentials and begin anew their spontaneous rise toward threshold [21]. Deterministic models of this integrate (rise of transmembrane potential)-and-fire (depolarization) mechanism, such as the integral pulse frequency modulation (IPFM) model, are used regularly to simulate heart beats [22]–[24] and for the analysis of HRV [25].

An elementary, stochastic integrate-and-fire model is the Gaussian random walk model with drift. We assume that the afferences to the SA node can be represented by random walk excitatory inputs responsible for the basal cardiac rhythm (B), and random walk excitatory (E) and inhibitory (I) inputs, reflecting the influence of the autonomic nervous system through the sympathetic and para-sympathetic branches. In this case, the membrane voltage equation is governed by:


where NB(t) ~ P (λB), NE(t) ~ P (λE) and NI(t) ~ P (λI) are independent Poisson processes that govern the respective times of the basal, excitatory and inhibitory inputs, while αB, αE and αI are the magnitude of steps up or down [26], [27]. Assuming V (0) = 0, m0 = αBλB, m1 = αEλEαIλI, and σ2=αB2λB+αE2λE+αI2λI, and using the properties of Poisson processes, the mean of V (t) is E [V (t)] = (m0 + m1) t [22], [23] and its variance Var[V (t)] = σ2t. A diffusion approximation of dV (t) before hitting the threshold θ is given by the Wiener process with drift:


where W(t) is a standard Wiener process (Brownian Motion). The probability density of the first passage time for this process, i.e., the times between threshold crossings, is given by the inverse Gaussian probability density [28], [29] defined as:


which can be written in the usual form after performing the change of variables μ = θ/(m0 + m1) and λ = θ2/σ2. The inverse Gaussian probability density has been used to study both neural processes [30] and heart beats [17], [31].

B. Point-process model of the heartbeat series

An ordered set {uj}j=1J of consecutive heartbeat times (for example, but not limited to, R-waves detected from an ECG) recorded in an observation interval (0, T] can be interpreted as a realization of a point process [32], a type of stochastic process for which any one realization consists of a set of isolated points. For t [set membership] (0, T], the cádlág function N(t) = max{k: ukt} is the associated counting process. Its differential, dN(t), denotes a continuous-time indicator function, where dN(t) = 1, when there is an event (such as the ventricular contraction) or dN(t) = 0, otherwise.

In this paper we assume a history dependent, time-varying, model of the the probability distribution of the waiting time until the next R-wave according to a physiologically motivated (see Section II-A) inverse Gaussian distribution. Specifically, given any beat event uk, the probability density function (PDF) of the length of the next R-R interval, τuk, is


where the shape parameter, λ, and the mean of the distribution, μ, depend on a vector of time-varying parameters θ(t) = {θ1(t), …, θP+1(t)} and a history vector Hk = {wk, wk−1, …, wk−(P−1)} containing information about P previous R-R intervals wi = uiui−1. While λ is simply λ (θ(t)) = θP+1 (t), the history-dependent mean is a regression of the past P R-R intervals with time-varying weights:


A local maximum likelihood method [17] is used to estimate the unknown time-varying parameter set θ(t). Given a local observation interval (tW, t] of duration W, we consider the subset of R-wave events within the interval, Um:n = {uk: mkn} with m = N(tW) + 1 and n = N(t). At each time t, we find the parameter vector [theta w/ tilde](t) that maximizes the local log-likelihood, given the R-wave events recorded in the local observation interval:


where ω (τ) = eατ is an exponential weighting function for the local likelihood [17], [33].

C. Detection of erroneous heartbeats

After observing the (k+1)-th R-wave, at time uk+1, it is possible to assess whether this observation is in agreement with the model resulting from the most recent parameter vector [theta w/ tilde](uk). For conciseness, in the following, let μ1 = μ(Hk, [theta w/ tilde](uk)) denote the mean for the distribution of the first R-R interval following uk, given the history and the model at time uk, and let λ1 = λ([theta w/ tilde](uk)) denote the shape parameter, given the model at time uk.

Normal beat

A straightforward way to test the hypothesis that the beat at time uk+1 is in agreement with the model is to evaluate the log probability density of observing the event uk+1 given the previous event uk, the history Hk and the model [theta w/ tilde](uk): p = log f (uk+1uk | μ1, λ1). In [14], this probability p was compared with a fixed threshold and, if lower, a beat was either inserted or deleted. While this approach works reasonably well, it has some glitches. In fact, a very low value of p might indicate an erroneous R-wave detection, an ectopic heartbeat, but also a sudden physiologic change in autonomic control. Ideally, we would like to be able to detect and possibly correct the former two cases while preserving the latter. For this reason, in this work, we take a different approach: instead of setting a threshold for the score p, we compare its value with the values obtained from alternative hypotheses, assuming that the event at time uk+1 is erroneous or arrhythmic. Below, we analyze these cases in detail while in Fig. 1 we give a schematic representation.

Fig. 1
Schematic representation of the different hypotheses that we make for the beats following uk. The plot at the top shows the IG distribution of the time of occurrence of the first beat after uk (black solid line), the second one (blue dashed line) and ...

1) Extra beat

One hypothesis that we consider is that the beat at time uk+1 is an extra event that was erroneously placed between two consecutive heartbeats. This can happen, for example, when the detection algorithm mistakenly identifies a T wave as a ventricular contraction. The score of this event is pe = log f (uk+2uk | μ1, λ1) where uk+2 is the time of the event following uk+1. If pe > p + ηe, where ηe is a pre-defined threshold, the event at time uk+1 is labelled as an extra (“e”) beat.

2) Missed beat

Another hypothesis is that between the events uk and uk+1 there was an additional event that has been missed. This can happen, for example, when the detection algorithm misses an R-wave because a superimposed artifact has drastically changed its morphology. In this case, the time elapsed between uk and uk+1 should be compatible with the prediction of the model for the sum of two R-R intervals given the history at time uk. We assume that there are two stochastic events following uk, the first at time τ′ and the second at time τ, of which only the latter was observed and mistakenly labeled as uk+1. According to our model, the distribution of the first R-R interval, τ′ − uk, is simply f (τ′ − uk | μ1, λ1). The distribution of the second R-R interval, ττ′, also depends on the first R-R interval. In fact, the unknown term τ′ − uk enters the history vector H′(τ′ − uk) = {τ′ − uk, wk, wk−1, …} and, through the regression in (1), affects the first moment of the next R-R interval, μ2 = μ(H′(τ′ − uk), [theta w/ tilde](uk)), that we write as μ2(τ′ − uk) to stress this dependency. The PDF of the second R-R interval can be written as f(ττ′| μ2(τ′ − uk), λ2) where λ2 = λ1. The total probability density function can be obtained marginalizing the unknown τ′ in the joint probability density:


Rather than attempting to solve (3), here we make the simplifying assumption that the resulting probability density function can be approximated by an IG distribution with mean μ1+2 = μ1 + μ2(μ1) and shape parameter


whose mathematical derivation is shown in the Appendix. Using these results, we can now evaluate the log probability density of observing a beat at uk+1 after a missed beat at an unknown time τ[set membership] (uk, uk+1): ps = log f (uk+1uk | μ1+2, λ1+2). If ps > p + ηs, where ηs is a pre-defined threshold, we assume that there might be a missed (skipped, “s”) beat.

3) Misplaced beat

A further hypothesis is that the beat at time uk+1 is misplaced, i.e., a beat at a different time in the interval (uk, uk+2) has been mistakenly assigned to time uk+1. This can happen, for example, when an artifact deforms the QRS complex and makes the detector misinterpret some random deflection for an R wave and miss the true R wave. For simplifying purposes, we will consider in this category also some types of ectopic beats, such as premature ventricular contractions that are followed by a complete compensatory pause. Even if, strictly speaking, ectopic beats are not mis-detections, it is often desirable to treat them this way. In fact, ectopic beats may significantly impair the results of many heart rate analysis techniques (including most time-domain and spectral estimation methods) while, on the other hand, acceptance of only ectopic-free, short-term recordings may introduce significant selection bias [4]–[6]. For this reason, it is desirable to detect irregular beats and, possibly, replace them with fictitious regular beats in order to carry on the desired analysis. A beat uk+1 is likely to be a misplaced one whenever the corresponding R-R interval does not fit the model while the sum of the two R-R intervals, uk+2uk, does. Therefore, the score for the case of a misplaced beat is: pm = log f (uk+2uk | μ1+2, λ1+2), and if pm > p + ηm, where ηm is a pre-defined threshold, we assume that there might be a misplaced (“m”) beat.

4) Two misplaced beats

This hypothesis accounts for the case that two beats in a row, uk+1 and uk+2, are misplaced. This can happen, for example, in some types of arrhythmias where two shorter beats are followed by a long compensatory pause. This case is handled similarly to the case of one misplaced beat. The beats uk+1 and uk+2 are likely to be misplaced whenever the corresponding R-R intervals do not fit the model while the sum of the three R-R intervals, uk+3uk, does. Making the approximation that uk+3uk follows an IG distribution of parameters μ1+2+3 and λ1+2+3 (defined in the Appendix), the score for the case of two misplaced beats is pt = log f (uk+3uk | μ1+2+3, λ1+2+3). If the condition for “m” holds and, additionally, pt > pm + ηt, where ηt is a pre-defined threshold, we assume that there might be two misplaced (“t”) beats. In other words, we require the first beat to be detected as misplaced (because this case is a special case of “m”), and, additionally, that the second beat is misplaced with respect to the prediction of the model for the first beat.

5) Resetting beat

The last hypothesis accounts for resetting ectopic beats, i.e., ectopic beats not followed by a compensatory pause. As a result of the absence of the compensatory pause, not only the R-R interval uk+1uk does not fit the model but neither do the sum of the two R-R intervals, uk+2uk, or the sum of three, uk+3uk. As a matter of a fact, the R-R interval after the ectopic beat, uk+2uk+1, is more likely to fit the model than any of previous hypotheses. For this reason, defining pr = log f (uk+2uk+1 | μ1, λ1), the beat uk+1 is likely to be a resetting (“r”) ectopic beat if pr > max{p, pe, ps, pm, pt} + ηr, where ηr is a pre-defined threshold.

Bootstrap detection

As the algorithm uses a sliding window of duration W to fit the model (see end of Section II-B), no model is available before time W from the beginning of the recording. A simpler bootstrap detection algorithm is used during this period to provide some results even for beats uk [set membership] (0, W ) and also to avoid training the algorithm on non-normal R-R intervals. This bootstrap algorithm is based on outlier rejection where beats whose distance from the median R-R is more than 7 times the median absolute deviation of the R-R intervals are marked as erroneous.

D. Correction of erroneous heartbeats

After an outlying beat is detected, our algorithm attempts to correct it. The correction strategy depends on the type of erroneous beat detected, as reported in detail below. Finally, the attempted solution is accepted only if it is an improvement according to the improvement check that we describe in the following section.

1) Extra beat

The case of an extra beat is simply addressed by removing the beat.

2) Missed beat

In case a missed beat is detected, a new beat is inserted between the event uk and the event uk+1. The time of the inserted event τ′ is:


where μ1, λ1, μ2(τ′ − uk) and λ2 are defined as before. Inserting an event τ′ in (uk, uk+1) splits this interval into two R-R intervals; therefore, equation (5) attempts to find the optimal τ′ accounting for the probability distribution of the first interval given the P R-R intervals preceding it (first factor of the product) but also for the probability distribution of the second R-R interval given its P previous R-R intervals (second factor), including τ′ − uk. This is an improvement over the approach in [14] where the beat was inserted at the time maximizing only the first of the two factors in (5).

3) Misplaced beat

The case of a misplaced beat is similar to the previous case. The event at time uk+1 is moved at time τ′ given by:


Equations (5) and (6) are solved numerically using a Newton-Raphson algorythm.

4) Two misplaced beats

To correct the case of two misplaced beats and find the optimal times of the two unknown beats we use a greedy algorithm. Each one of the beat times is optimized in turn, using (6) for the first beat and a similar equation for the second beat, while keeping the other fixed. This process is iterated until convergence.

5) Resetting ectopic beat

The case of a resetting beat cannot be corrected like the other cases because it would require shifting all the following beats. Therefore our algorithm defers the decision about the action to take to the user, who can choose the best strategy for the type of HRV analysis that follows (for example omit the beat or suspend adaptive causal estimation).

E. Improvement check

In the cases above, the correction step changes the series of events U = {uj} into an alternative series Û = {ûj} by removing, inserting, or moving one or two events. For the case of a resetting beat, “r”, we consider the virtual correction (that we do not take) of shifting back by uk+1uk all the beats starting from uk+1.

Before the new series is accepted, we evaluate the probability of observing Q events following uk for both series:


The corrected series is accepted only if P(ÛkP:k+Q) > P(UkP:k+Q)+ [eta w/ hat]a, where a is one of {e, s, m, t, r} depending on the action was taken to correct the beats. This ensures not only that the correction fits the past values (through Hk) but also that it is a good regressor for the future R-R intervals (through Ĥj, j > k). If the corrected series passes the test, then Û = {ûj} replaces U = {uj}, a new updated parameter vector [theta w/ tilde](uk+1) is estimated using (2), and the detection procedure starts again for the next event, uk+2 (uk+3 in the case “t”).

F. Values of the parameters

Empirical sensitivity analysis conducted on in-house recordings showed that the model provides robust results for parameters within meaningful ranges of values.

Without conducting an exhaustive research on the whole parameter space, we found that the following settings provide excellent results in most conditions: P = 5, W = 60 s, α = 0.02 s−1, ηe = 3, [eta w/ hat]e = 8, ηs = 0, [eta w/ hat]s = 4, ηm = 2, [eta w/ hat]m = 7, ηt = 8, [eta w/ hat]t = 28, ηr = 6, [eta w/ hat]r = 14, and Q = 3. These values of the parameters were also used during the analysis described in the following of this manuscript.

III. Validation and Performance Assessment

A. Detection on artificially-corrupted R-R series

We tested the performance of our detection algorithm on a set of real R-R timeseries with artificially deleted, shifted, and inserted beats and compared our results with three well-established approaches.


The first method that we implemented for comparison is the well-established algorithm by Berntson et al. [9] which uses two criteria to assess whether a heartbeat is an artifact: Maximum Expected Difference (MED) and Minimal Artifact Difference (MAD). In order to allow the algorithm to adapt to non-stationary data, for each R-event uk, MED and MAD were estimated using events in a 60 s sliding window preceding uk. The second algorithm that we used is based on the important contribution by Mateo and Laguna [13]. For each R-event uk we evaluated equation (1) from the original work (where tk corresponds to our uk) using a time-varying threshold Uk = 4.3 σk where σk is the standard deviation of the wj values obtained for events in a 60 s sliding window preceding uk. Then we classified the beat using the set of rules described in Section II.A of the referenced paper. Finally, we implemented the more recent method proposed by Rand et al. [15]. We followed the algorithm described in the original paper and used the values of parameters that they found optimal.

For this comparison, we used a subset of the Fantasia database [34] from the Physionet archive [35]. We selected all the records that, according to the annotation database, contained no more than two non-“N” heartbeats.1 We tested all four algorithms on each pristine record and recorded the number of “N” heartbeats correctly classified as such and the number of beats which triggered a false alarm. Then we tested the four algorithms’ performance in the detection of the three main types of erroneous beats: extra, missed, and misplaced beats. To test the detection of “e” (extra) beats, after inserting a beat before every beat of index k = 100 n (n [set membership] {1, 2, 3, …}) of each record, we recorded how many of these beats were correctly identified as erroneous beats. Similarly, we measured the performance in the detection of “s” (missed) beats by removing from the original record every beat of index k = 100 n.

Finally, we assessed each algorithm’s efficacy in the detection of “m” (misplaced) beats, by shifting each beat of index k = 100 n by a temporal displacement Δt. In order to account for the fact that the detectability of misplaced beats depends on the heart rate variability of the R-R series, we used a record-dependent displacement Δtr = q · RMSSDr where RMSSDr is the RMSSD measure [1] for the record r (see Fig. 2). If we think of the RMSSDr of a normal sinus series as the noise amplitude and Δtr as the amplitude of the signal we are trying to detect, q2 represents a form of signal-to-noise ratio (SNR). For obvious reasons, we capped Δtr at 0.75NN¯r where NN¯r is the average N-N interval of the record r. The test was repeated for the values of q [set membership] {2, 4, 8, 16}.

Fig. 2
The figure shows examples from three subjects of the original R-R series (continuous line with markers) with one artificially misplaced beat (dashed line) obtained as described in Section III-A. The R-R series on the left has a very low beat-to-beat variability, ...


Table I reports, for the normal series and for each artificially-corrupted series, the number of beats tested (columns “Tot #”) and the percentage of these that were correctly detected (columns labelled “Corr”) by the four algorithms. For the “m” beats, the average displacement Δt¯=meanrΔtr among all records is also reported.

Results of the detection of irregular beats on artificially-corrupted R-R series. Our method (PP) is compared with three reference methods (Rand 2007, Mateo 2003, and Berntson 1990) and the percentage of correct detection (Corr) of normal (“N”) ...

As one further goal of our algorithm is to try to correct erroneous beats, it is important that the algorithm is also able to classify the type of error (“e”, “s”, and “m”) and, as a result, the type of action needed to attempt to fix it. All algorithms considered, except the one by Berntson et al., provide this type of information. Table I reports, in the columns labelled “CT”, the percentage of beats for which each algorithm correctly classified the type of error.

These results show that our method represents an improvement over the previous algorithms. In fact it presents the best false detection rate (row “N”) with only 9 normal beats out of 60017 reported as erroneous (which correspond to 99.985% correct classification), while the second best method, by Berntson, triggered a false detection in 37 cases. Our method also shows perfect detection of missed (row “s”) and extra beats (row “e”), and better detection of misplaced beats at all levels of SNR (rows “m” of Table I and Fig. 3). In terms of sensitivity, only the method by Mateo and Laguna provides similar results to our approach but at the cost of a much lower specificity.

Fig. 3
The left side of the plot (Corr) reports the percentage of the beats shifted by Δtr = q · RMSSDr (as detailed in Section III-A) that were correctly detected as erroneous. The right side of the plot (CT) shows the percentage of the same ...

We also performed a Monte Carlo analysis of performance by randomly sampling the test beats with probability 1%, obtainining results very similar to those reported here. We decided to report the results of the deterministic sampling of the test beats (every hundredth beat), rather than those from random sampling, to support reproducibility of results and allow other scientists to test and compare their algorithms on exactly the same recordings and beats.

B. Correction on artificially-corrupted R-R series

In this section we report the results of evaluating the performance of the correction phase of our algorithm and compare it with existing methods.


We wanted to find out how accurately our model is able to estimate the unknown time of occurrence of a beat (either a missed or a misplaced beat). This was simply evaluated by using (6) to estimate the beat location u^kPP of every beat of index k = 100 n from each original record described in Section III-A and then measuring the difference, u^kPP-uk, between the predicted time of the beat and the actual time of the original beat.

We compared our algorithm with four existing methods of estimating the position of a missing beat. As first method we used the “nguess” function available from the WFDB library of the PhysioToolkit [35] and called its estimate u^kN. The second method consisted in splitting the interval around the missing beat in two identical R-R intervals by inserting a beat at time u^kH such that uk+1-u^kH=u^kH-uk-1. Then we tested the family of [delta with circumflex]N estimators given by equation (45) of [36] and found, in agreement with their findings, that the first order estimator ([delta with circumflex]1) is the one giving best results. Therefore, as third method we used the [delta with circumflex]1 estimator which returns a beat time u^kS such that the new R-R interval repeats the previous R-R interval, i.e., u^kS-uk-1=uk-1-uk-2. Finally, the fourth method that we used for comparison is a non-causal version of [delta with circumflex]1 which is simply the average of the beat time estimated using the previous R-R interval and that estimated using the next R-R interval, i.e., u^kS=uk+1+uk-1-(uk-2+uk+2)/2.


Table II reports a comparison between the different methods in terms of root-mean-square (RMS) error between the estimated time and the original beat time. The first row shows the result of pooling all 602 beats from all subjects and then evaluating the pooled RMS. The second row reports the result of finding the RMS error for each subject separately and then averaging the results. Similarly, the last row shows the median value of the RMS error for the individual subjects.

Results of the estimation of unknown beat times. The root mean square (RMS) of the error between the beat time estimated by our method u^kPP and the true beat time uk is compared with that of the other methods reported in the text.

Our algorithm (first column) clearly outperforms all four reference methods in all measures and provides accurate estimates with an RMS error of under 15 ms. This is also confirmed by the probability density function of the difference between the estimated beat time ûk and the actual time of the original beat uk (Fig. 4).

Fig. 4
Probability density function of the difference between the estimated beat time ûk and the actual time of the original beat uk for our point-process algorithm (PP) and for the four reference methods (N, H, S, S’). The PDF was estimated ...

C. Detection on arrhythmia database

As mentioned before, the algorithm presented here can also be used to detect ectopic events based on the R-R intervals alone, not for the purpose of classifying the different types of arrhythmias but simply to try to replace them with fictitious normal beats and carry on with the desired type of HRV analysis.


The performance of our algorithm in the detection of ectopic beats was evaluated on the MIT-BIH Arrhythmia database [35], [37]. For this analysis we selected the recordings having less than 10 non-normal beats in any 20-second window.2 The main reason is that we are interested in R-R series that are mildly affected by erroneous or ectopic events, and that can be recovered and further processed with HRV techniques. In a recording with an excessive number of irregular beats, the information about the underlying sinus rhythm cannot be reliably recovered and therefore the HRV indices cannot be assessed.

For each record of the MIT-BIH Arrhythmia Database considered, we ran the four algorithms described in Section III-A. As some of the algorithms use a 60 s sliding window to adaptively estimate their thresholds, we did not test the detection in the first minute of the record. We also ignored types of ectopic beats that merely affect the morphology of the contraction potential but not its timing because all the algorithm that we tested do not have access to the raw ECG data but only to the series of R-R intervals.


Table III reports the confusion matrices obtained for the four algorithms, whereas Table IV shows their performance measures. Our method outperforms the previous algorithms, especially in terms of a lower number of false detections, resulting in higher specificity and positive predicted value. Figure 5 shows a few examples of real arrhythmic events from this database and how our algorithm detects and corrects them.

Fig. 5
The figure shows examples of beats from the MIT-BIH Arrhythmia database and how they were detected and corrected by our algorithm. The markers (little star, triangle, and square) and the associated labels (“N”, “V”, “a”) ...
Confusion matrices for the point-process method presented here (PP) and for the three state-of-the-art methods considered against the experts’ annotations in the database (Annot). The label “N” represent normal beats while “≠ ...
Performance measures of the four methods.

IV. Discussion and Conclusions

We present a novel paradigm for detection and correction of ectopic and erroneous heartbeats. At the core of the method is the ability of the underlying probabilistic model to indicate with accurate precision the degree of probability of having a heartbeat at each moment in time. This feature allows to create a tree-like decisional structure based on quantitative comparisons between the observed events and the probability of having the event at that time. More specifically, we test the alternative hypotheses that: a) a beat has been correctly identified, b) a beat has been missed, c) the observed beat is spurious, d) one or two beats need to be moved to a most probable time according to the probability distribution, or e) a beat is a resetting ectopic beat. Since the model is defined in continuous time and is updated recursively in an on-line fashion, the devised algorithm is able to process the decisional tree in real-time using only short-time past information, and then finalizing the correction within at maximum three beats in the future.

Our method, similarly to the IPFM-based method, is based on a physiologically motivated model, with the further advantage that the point process framework allows for dynamically updating the model by tracking the time-varying heartbeat variations, including an objective goodness-of-fit framework which can validate the decisional outcome based on optimal model selection. Importantly, the parameter vector is here computed only to estimate the likelihood, but this is the same vector that defines the instantaneous time and frequency domain point-process HRV as in [17], [38]–[40], thus providing a simultaneous assessment of cardiovascular control that is unbiased by artifactual or arrhythmic events.

Our results provide evidence of the efficacy of the detection method in recognizing missing, spurious, mis-detected, and irregular heartbeats with remarkable accuracy even in scenarios that require discerning very small time resolution errors at millisecond scales. They further point at our method as more accurate than three previously published detection methods and four correction methods. In particular, our automated method achieves 99.985% specificity for detection of correct beats, 100% sensitivity to missed and extra beats, 96.01% sensitivity for beats artificially misplaced by 137 ms on average, and 98.73% positive prediction value for real arrhythmic events. Our algorithm represents an improvement over previous procedures: only one other method provides similar results but at the cost of a much lower specificity. Our correction procedure provides accurate estimates of the replacement time of an unobserved beat, with and error below 15 ms, outperforming other existing methods of estimating the time of missing beats.

One important feature of the technique is that it works with predetermined decisional thresholds that have been fixed once for all during our analyses, and have been demonstrated to work on a wide range of experimental recordings, with high robustness to noise and ability to cope with inter-subject variability. In particular, the algorithm can be used for exclusive pre-processing purposes of any R-R interval series and consequent application of any kind of HRV analysis. The routine is very fast, easy to run, and available online at no cost for research purposes [19]. We hope dissemination will further corroborate the robustness and efficacy of the proposed method in an even wider range of physiological and clinical settings.


The authors would like to thank Prof. Roger Mark for his valuable comments and suggestions that were of great help during the first stages of this work.

This work was supported by National Institutes of Health (NIH) through grants R01-HL084502 (R.B.) and DP1-OD003646 (E.N.B.).


An external file that holds a picture, illustration, etc.
Object name is nihms419572b1.gif

Luca Citi (S’06–M’09) was born in Pistoia, Italy, in 1979. He received a laurea (MS) degree in Electronic Engineering with major in Biomedical Engineering, in 2004 from Università degli Studi di Firenze (Florence, Italy). In 2009, he received his PhD in Biorobotics Science and Engineering jointly offered by IMT Lucca (Lucca, Italy) and Scuola Superiore Sant’Anna (Pisa, Italy).

He is currently a post-doctoral research fellow at Massachusetts General Hospital/Harvard Medical School and affiliated with the Massachusetts Institute of Technology (Cambridge, MA, U.S.A.) where he is part of the “Neuroscience Statistics Research Laboratory” led by Prof. Emery Brown, and the “Neuro–Cardiovascular Signal Processing Unit” led by Dr. Riccardo Barbieri. He works on the statistical analysis of human cardiovascular rhythms and of neural spike trains. His work on HRV focuses on devising and refining stochastic models of cardiovascular control where heart beat intervals are governed by a history-dependent point process. Other interests include signal processing in computational neuroscience and neural interfaces.

An external file that holds a picture, illustration, etc.
Object name is nihms419572b2.gif

Emery N Brown (M’01, SM’06, F’08) received the B.A. degree from Harvard College, Cambridge, MA, the M.D. degree from Harvard Medical School, Boston, MA, and the A.M. and Ph.D. degrees in statistics from Harvard University, Cambridge, MA. He is currently a Professor of computational neuroscience and health sciences and technology in the Department of Brain and Cognitive Sciences and the Division of Health Sciences and Technology, Harvard-Massachusetts Institute of Technology, the Warren M. Zapol Professor of Anesthesia at Harvard Medical School, and an Anesthesiologist in the Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital. His research interests include signal processing algorithms for the study of neural systems, functional neuroimaging, and electrophysiological approaches to study general anesthesia in humans. Dr. Brown is a member of the Association of University of Anesthesiologists. He is also a Fellow of the American Statistical Association, a Fellow of the American Association for the Advancement of Science, a member of the Institute of Medicine, and a Fellow of the American Academy of Arts and Sciences. He was a recipient of a 2007 National Institutes of Health Director’s Pioneer Award and the 2011 recipient of the National Institute of Statistical Sciences Sacks Award for Outstanding Cross-Disciplinary Research.

An external file that holds a picture, illustration, etc.
Object name is nihms419572b3.gif

Riccardo Barbieri (M’00, SM’08) was born in Rome, Italy, in 1967. He received the M.S. degree in Electrical Engineering from the University of Rome “La Sapienza”, Rome, Italy, in 1992, and the Ph.D. in Biomedical Engineering from Boston University, Boston, MA, in 1998.

Dr. Barbieri is currently Assistant Professor of Anaesthesia at Harvard Medical School, Massachusetts General Hospital, Research Affiliate at Massachusetts Institute of Technology and Principal Investigator of the Neuro-Cardiovascular Signal Processing Unit within the Neuroscience Statistics Research Laboratory. His main research interests include: modeling of biological systems (with emphasis on the cardiovascular system, the central nervous system and the autonomic nervous system); biomedical signal processing (with emphasis on cardiovascular control and cardiovascular variability signals); and signal processing in computational neuroscience (EEG, Evoked Potentials, single cell neuronal recordings).


In this Appendix we find the probability distribution of τuk which is the sum of two unknown intervals τ′ − uk and ττ′ distributed according to f(τ′ − uk | μ1, λ1) and f (ττ′ | μ2(τ′ − uk), λ2). Rather than finding the exact solution given by (3), we approximate it as an inverse Gaussian distribution of mean μ1+2 and shape parameter λ1+2, that we are about to introduce. Let us define ŵk+1 = τ′ − uk = μ1 + ε1 and ŵk+2 = ττ′ = μ2(ŵk+1) + ε2 where ε1 and ε2 are both zero-mean noise terms. The variance of ε1 is the same as that of ŵk+1, i.e., the variance of an inverse Gaussian distribution of parameters μ1 and λ1:μ13/λ1. The variance of ε2 is μ23(w^k+1)/λ2. We make the assumption that it can be approximated as μ23(μ1)/λ2 and that ε1 and ε2 are uncorrelated. Using (1) we have ŵk+2 = μ2(μ1) + [theta w/ tilde]1(uk) (ŵk+1μ1) + ε2. Therefore,


which leads to μ1+2 = E[τuk] = μ1 + μ2(μ1) and Var[τuk] = (1+ [theta w/ tilde]1(uk))2Var[ε1]+ Var[ε2]. This means μ1+23/λ1+2=(1+θ1(uk))2μ13/λ1+μ23(μ1)/λ2. Using λ1 = λ2 and solving for λ1+2, we obtain (4).

Following a similar reasoning, we obtain that the probability distribution of the sum of three unknown intervals can be modelled as an IG distribution with parameters μ1+2+3 = μ1 + μ2(μ1) + μ3(μ1, μ2) and



1The records used were: f1o01, f1o05, f1o10, f1y01, f1y03, f1y08, f2o04, f2y03, f2y06.

2The records used were: 100, 101, 103, 105, 108, 112, 113, 114, 115, 116, 117, 121, 122, 123, 215, 230.


1. Camm A, Malik M, Bigger J, Breithardt G, Cerutti S, Cohen R, Coumel P. Heart rate variability: standards of measurement, physiological interpretation and clinical use. task force of the european society of cardiology and the north american society of pacing and electrophysiology. Circulation. 1996;93(5):1043. [PubMed]
2. Acharya U, Joseph P, Kannathal N, Lim C, Suri J. Heart rate variability: a review. Med Biol Eng Comput. 2006;44:1031–1051. [PubMed]
3. Acharya U, Suri J, Spaan J, Krishnan S. Advances in cardiac signal processing. Springer Verlag; 2007.
4. Lippman N, Stein KM, Lerman BB. Comparison of methods for removal of ectopy in measurement of heart rate variability. Am J Physiol — Heart and Circulatory Physiology. 1994 Jul;267(1):H411–H418. [PubMed]
5. Kamath MV, Fallen EL. Correction of the heart rate variability signal for ectopics and missing beats. In: Malik M, Camm AJ, editors. Heart rate variability. 1995. pp. 75–85.
6. Clifford G, McSharry P, Tarassenko L. Characterizing artefact in the normal human 24-hour RR time series to aid identification and artificial replication of circadian variations in human beat to beat heart rate using a simple threshold. Proceedings of Computers in Cardiology; Memphis. Sep. 2002; pp. 129–132.
7. Cheung MN. Detection of and recovery from errors in cardiac interbeat intervals. Psychophysiology. 1981 May;18(3):341–346. [PubMed]
8. Malik M, Cripps T, Farrell T, Camm A. Prognostic value of heart rate variability after myocardial infarction. a comparison of different data-processing methods. Med Biol Eng Comput. 1989;27(6):603–611.
9. Berntson GG, Quigley KS, Jang JF, Boysen ST. An approach to artifact identification: Application to heart period data. Psychophysiology. 1990 Sep;27(5):586–598. [PubMed]
10. Weber EJM, Molenaar PCM, Molen MW. A nonstationarity test for the spectral analysis of physiological time series with an application to respiratory sinus arrhythmia. Psychophysiology. 1992 Jan;29(1):55–62. [PubMed]
11. Sapoznikov D, Luria M, Mahler Y, Gotsman M. Computer processing of artifact and arrhythmias in heart rate variability analysis. Computer methods and programs in biomedicine. 1992;39(1–2):75–84. [PubMed]
12. Brennan M, Palaniswami M, Kamen P. A new model-based ectopic beat correction algorithm for heart rate variability. Proceedings of the 23rd Annual International Conference of the IEEE EMBS; 2001. pp. 567–570.
13. Mateo J, Laguna P. Analysis of heart rate variability in the presence of ectopic beats using the heart timing signal. IEEE Trans Biomed Eng. (3) 2003 Mar;50:334–343. [PubMed]
14. Barbieri R, Brown EN. Correction of erroneous and ectopic beats using a point process adaptive algorithm. Proceedings of the 28th Annual International Conference of the IEEE EMBS; Sep. 2006; pp. 3373–3376. [PubMed]
15. Rand J, Hoover A, Fishel S, Moss J, Pappas J, Muth E. Real-Time correction of heart interbeat intervals. IEEE Trans Biomed Eng. 2007 May;54(5):946–950. [PubMed]
16. Widjaja D, Vandeput S, Taelman J, Braeken M, Otte R, Van den Bergh B, Van Huffel S. Accurate R peak detection and advanced preprocessing of normal ECG for heart rate variability analysis. Proceedings of Computing in Cardiology; Belfast. Sep. 2010; pp. 533–536.
17. Barbieri R, Matten EC, Alabi AR, Brown EN. A point-process model of human heartbeat intervals: new definitions of heart rate and heart rate variability. Am J Physiol — Heart and Circulatory Physiology. 2005;288:H424–435. [PubMed]
18. Citi L, Brown EN, Barbieri R. A point process local likelihood algorithm for robust and automated heart beat detection and correction. Proceedings of Computing in Cardiology; Hangzhou. Sep. 2011.
19. Citi L, Barbieri R. Point process detection and correction of erroneous and ectopic heart beats. 2012 [Online]. Available: [PMC free article] [PubMed]
20. Moody G. Spectral analysis of heart rate without resampling. Proceedings of Computers in Cardiology, IEEE; 1993. pp. 715–718.
21. Guyton A, Hall J. Textbook of medical physiology. Vol. 9 Saunders Philadelphia, PA: 1991.
22. Hyndman B, Mohn R. A model of the cardiac pacemaker and its use in decoding the information content of cardiac intervals. Automedica. 1975;1:239–252.
23. DeBoer R, Karemaker J, Strackee J. Description of heart-rate variability data in accordance with a physiological model for the genesis of heartbeats. Psychophysiology. 1985;22(2):147–155. [PubMed]
24. Berger R, Akselrod S, Gordon D, Cohen R. An efficient algorithm for spectral analysis of heart rate variability. IEEE Trans Biomed Eng. 1986;9:900–904. [PubMed]
25. Mateo J, Laguna P. Improved heart rate variability signal analysis from the beat occurrence times according to the ipfm model. IEEE Trans Biomed Eng. 2000;47(8):985–996. [PubMed]
26. Brown E. Methods and Models in Neurophysics. Vol. 14. Elsevier; 2005. Theory of point processes for neural systems, ser; pp. 691–726.
27. Feng J. Computational neuroscience: a comprehensive approach. CRC press; 2004.
28. Chhikara R, Folks L. The inverse Gaussian distribution: theory, methodology, and applications. Vol. 95 CRC; 1989.
29. Gerstein G, Mandelbrot B. Random walk models for the spike activity of a single neuron. Biophysical Journal. 1964;4(1):41–68. [PubMed]
30. Tuckwell H. Introduction to theoretical neurobiology: Nonlinear and stochastic theories. Vol. 2 Cambridge Univ Pr; 2005.
31. Stanley G, Poolla K, Siegel R. Threshold modeling of autonomic control of heart rate variability. IEEE Trans Biomed Eng. 2000;47(9):1147–1153. [PubMed]
32. Andersen P. Statistical models based on counting processes. Springer Verlag; 1993.
33. Loader C. Local regression and likelihood. Springer Verlag; 1999.
34. Iyengar N, Peng C, Morin R, Goldberger A, Lipsitz L. Age-related alterations in the fractal scaling of cardiac interbeat interval dynamics. Am J Physiol — Regulatory, Integrative and Comparative Physiology. 1996;271(4):R1078–R1084. [PubMed]
35. Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov P, Mark R, Mietus J, Moody G, Peng C, Stanley H. Physiobank, physiotoolkit, and physionet: Components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):e215–e220. [PubMed]
36. Solem K, Laguna P, Sörnmo L. An efficient method for handling ectopic beats using the heart timing signal. IEEE Trans Biomed Eng. 2006;53(1):13–20. [PubMed]
37. Moody GB, Mark RG. The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol Mag. 2001 Jun;20(3):45–50. [PubMed]
38. Barbieri R, Brown E. Analysis of heartbeat dynamics by point process adaptive filtering. IEEE Trans Biomed Eng. (1) 2006;53:4–12. [PubMed]
39. Chen Z, Brown E, Barbieri R. Assessment of autonomic control and respiratory sinus arrhythmia using point process models of human heart beat dynamics. IEEE Trans Biomed Eng. 2009;56(7):1791–1802. [PMC free article] [PubMed]
40. Chen Z, Brown E, Barbieri R. Characterizing nonlinear heartbeat dynamics within a point process framework. IEEE Trans Biomed Eng. 2010;57(6):1335–1347. [PMC free article] [PubMed]