An evaluation of ECG data fusion algorithms for wearable IoT sensors

In wearable sensing, accurate estimation of physiological parameters is paramount, although these signals can be corrupted by noise. The fusion of data from multiple sensor sources has the potential to enhance accuracy, even in the presence of disruptive noise. This paper introduces and compares existing state-of-the-art and novel data fusion techniques to improve the reliability of heart rate estimation. The comparisons were implemented using the MIT-BIH Arrhythmia database with additive noise signals taken from the MIT-BIH Noise Stress Test Database. In the challenging low signal-to-noise ratio (SNR) regions, the Kalman fusion and the 𝛼-trim mean filtering approaches exhibit the best performance. The Kalman fusion approach dominates when both channels are corrupted, while the 𝛼-trim mean filtering elimination algorithm takes the lead when at least one channel is clean. To exploit these strengths, we have developed an algorithm that switches between the two fusion methods based on a signal quality indicator (SQI) that serves as a surrogate SNR. This algorithm outperforms the baseline 2-channel RR-interval averaging approach by ≃54% and ≃21% at SNRs of 20 dB and −20 dB respectively. Moreover, it outperforms other cutting-edge heart rate estimation methods.


Introduction
With a rapid expansion in the deployment of Internet of Things (IoT) devices, and the ever-increasing demand for continuous and proactive health monitoring, IoT-based systems are suited to play a major role in up-and-coming health monitoring systems [1][2][3]. Cardiovascular diseases (CVDs) are often cited as the dominant contributor to the increase in demand for continuous health monitoring. The analysis of the electrocardiogram (ECG) signal is widely applied in the diagnosis of various heart disorders [4], and heart rate is one of the vital signs indicating disturbances to the cardiovascular system [5]. A wearable cardiac sensor that can acquire a user's ECG data is typically used for continuous physiological monitoring. The ECG signal is a record of the electrical activity of the heart that provides information about the circulatory system. A clean ECG contains a regular cycle of the P wave corresponding to the depolarization of the atria, the QRS complex indicating the depolarization of the ventricles, and the T wave corresponding to the repolarization of the ventricles. Any one of these repetitive complexes and waveforms can be extracted to calculate the heart rate. With developments in IoT architectures enabling the deployment of wearable health monitoring devices, research in health monitoring areas such as cardiovascular health and human activity is developing fast [6].
ECG signals can be easily corrupted by noise such as baseline drift, muscle artifacts, and electrode motion [7]. The effects of these noises corrupting the ECG signals can lead to drastic variations in the heart [...]
A. John et al.
QRS peak is usually easy to detect as it is a characteristic feature of the ECG signal with high-frequency components [16,17]. A review of photoplethysmography-based heart rate estimation methods, which are widely studied, was provided in [18]. Ballistocardiogram signal-based heart rate estimation is also discussed in the literature [19,20]. The PhysioNet/Computing in Cardiology (CinC) Challenge 2014 aimed to improve heartbeat detection from multimodal data consisting of ECG and other non-cardiovascular signals. All major results from the challenge have been discussed in [21], and the top score (as of May 2015) on the hidden test data set was 93.64%. A novel beat signal quality index (SQI) based majority voting fusion algorithm for robust heart rate (HR) estimation from cardiovascular and non-cardiovascular signals was proposed in [22]. However, the signal quality metrics used were dependent on correctly identifying the beat locations (or fiducial features), while signals of good quality are required to identify the beat locations. A probabilistic method to estimate the R-peak locations of an ECG signal using a particle filter was discussed in [23], which exhibited a mean absolute heart rate estimation error of 5.044 beats-per-minute (bpm) at −6 dB signal-to-noise ratio (SNR) on a subset of the MIT-BIH Arrhythmia dataset. However, the study was limited to simple R-peak detection for clean data, and therefore subjects exhibiting frequent ventricular ectopies or highly elevated T-segments were not considered in the performance analysis. A probabilistic model to synthesize the heart rate annotations from multiple annotators (QRS detection algorithms) from ECG signals was proposed in [24]. An unsupervised Bayesian framework to synthesize the heart rate through fusion from multiple annotators was proposed, which exhibited a mean squared error of 14.37 bpm on the 2014 PhysioNet/CinC challenge database [25]. In both these works [24,25], a precision value was required for each annotator, which may not be available in real IoT scenarios.
For beat detection from electrocardiogram (ECG) and other non-cardiovascular signals such as Electroencephalogram (EEG), Electrooculogram (EOG), and Electromyogram (EMG), a modified slope sum function and the Teager-Kaiser energy operator method were used. The performance of the majority voting fusion method was evaluated on the PhysioNet/CinC Challenge-2014 hidden test set and was found to achieve a score of 90.89% [26]. However, the signal quality metrics used were dependent on correctly identifying the beat locations, while signals of good quality are required to identify the beat locations. In [27], a novel fusion algorithm of 12-lead ECG signals is described based on the idea of a local weighted linear prediction algorithm for obtaining a better estimate of the ECG signal from 12 noisy channels taken from the CinC 2011 challenge database, which can then be used for heart rate detection. However, in this method, the issue of quantifying the quality of the fused signal is an ongoing topic of research. In [28], two methodologies were proposed for the fusion of cardiac vibration signals, obtained from force sensors mounted on a bed, to calculate heart rate. The first method focused on analyzing the cepstrum computed from the average spectra of the individual channels, while the second method applied Bayesian fusion to three interval estimators applied to each channel. The fusion algorithms were found to outperform the single-channel algorithms. However, the proposed fusion algorithm does not account for the quality of the signals used for fusion.
Signal Quality Indicators (SQIs) are an important consideration in data fusion algorithms as only those signal leads that are reliable should be used for fusion. SQI-based fusion of physiological signals for heartbeat detection has been discussed in [3,29,30]. Methods for indicating signal quality as multiple classes or continuous values that specify the acceptability of ECG signals of diagnostic quality have been discussed previously in the literature [31][32][33][34][35]. The 2011 CinC challenge aimed to improve the quality of ECG signals obtained from mobile recording platforms by indicating whether the signals were of acceptable quality or not, and the premise and key results of the challenge are discussed in [36]. In this article, we analyze the performance of various fusion algorithms with and without employing SQIs to analyze their suitability for heart rate estimation.

Contributions of this paper
This article proposes various fusion algorithms for deployment in a wearable IoT device and compares them against other fusion algorithms discussed in the literature. Various data fusion techniques to fuse R peak to R peak (RR) intervals obtained from 2-lead ECG signals are proposed, explored, and compared against each other. Some of these fusion algorithms make use of the instantaneous signal quality indices of the signals for fusion or decision-making, which are developed in this article. This article focuses on finding the best algorithms for calculating accurate RR intervals obtained from state-of-the-art peak detection algorithms. The flow diagram of the proposed methods and study is shown in Fig. 1, with the simulations and analyses carried out in Matlab. An SQI-based fusion algorithm selection model is proposed in this article, which uses a signal quality indicator as a surrogate for SNR to selectively switch between fusion algorithms. The extraction of RR intervals is discussed in Section 2. Signal Quality Indicators for determining the signal quality of the ECG signals are discussed in Section 3, followed by the discussion on Bayesian elimination of improbable RR intervals in Section 4. The various fusion algorithms explored in this article are discussed in Section 5, the performance comparison of the fusion algorithms is discussed in Section 6, and the final results are covered in Section 7.

Math notation
Typically the filter coefficient is set to 0.99 and, from the transfer function, the 3 dB cutoff frequency is calculated as $0.0016 \times f_s$, where $f_s$ is the sampling frequency. For a sampling frequency of 360 Hz, the 3 dB cutoff frequency is $f_{3\,\mathrm{dB}} \approx 0.6$ Hz.
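The cutoff figure above can be checked numerically. The sketch below assumes the low-pass filter is the first-order recursion y[n] = a·y[n−1] + (1−a)·x[n]; the source gives only the coefficient and the resulting cutoff, so this filter form is an assumption, although it does reproduce the quoted 0.0016 × fs. The function names are illustrative.

```python
import math

def lowpass_first_order(x, a=0.99):
    """Assumed filter form: y[n] = a*y[n-1] + (1-a)*x[n], seeded with x[0]."""
    y, prev = [], x[0]
    for sample in x:
        prev = a * prev + (1.0 - a) * sample
        y.append(prev)
    return y

def cutoff_3db_hz(a, fs):
    """3 dB cutoff of the recursion above, from solving |H(e^{jw})|^2 = 1/2."""
    cos_w = 1.0 - (1.0 - a) ** 2 / (2.0 * a)
    return math.acos(cos_w) * fs / (2.0 * math.pi)

fc = cutoff_3db_hz(0.99, 360.0)  # roughly 0.58 Hz, i.e. about 0.0016 * 360
```

With a closer to 1 the cutoff drops and the moving estimates react more slowly to changes in signal quality.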

Dataset
Two-channel ECG recordings are used for algorithm evaluation. The recordings are taken from the MIT-BIH arrhythmia database [37,38], which contains 48 half-hour excerpts of two-channel ambulatory ECG recordings obtained from 47 subjects studied at Boston's Beth Israel Hospital Arrhythmia Laboratory. Pre-recorded noise (baseline wander, muscle artifacts, and electrode motion noise) available from the MIT-BIH Noise Stress Test Database (NSTDB) [37][38][39] is used for algorithm evaluation. The noise recordings were made using physically active volunteers and standard ECG recorders and electrodes. The electrodes were placed on the limbs in positions in which the subjects' ECGs were not visible. Therefore, it should be noted that the noise recordings used were not simulated, but rather taken from recordings made from physically active volunteers to represent the noise that may exist in ambulatory ECG recordings.

Single channel QRS complex detection and RR sequence calculation
The QRS complex locations were extracted using two widely used QRS detection algorithms made available in the WFDB toolbox [38,40], GQRS and SQRS, from which two streams of RR intervals per ECG signal channel are calculated: $RR_1, RR_2, \ldots, RR_4$. The RR sequence is generated by taking the interval between two consecutive beat annotations. The RR interval at a given time sample is derived from the beat annotation prior to that time sample and the beat annotation that immediately follows it. The details can be observed from Fig. 2. The performance of the fusion algorithms discussed in this article is compared against the performance of the single-channel RR intervals calculated from these QRS detection algorithms. The QRS algorithms may not exhibit the same performance over all noise conditions, and their performance may not be linearly dependent on the ECG signal's quality, which may lead to difficulties in analyzing the performance of the fusion algorithms.
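The per-sample RR sequence described above can be sketched as follows. This is an illustrative reading, not the authors' Matlab code: beat annotations are assumed to be sorted sample indices, and samples before the first beat or after the last beat are left undefined (`None`), which is an assumption the source does not address.

```python
from bisect import bisect_right

def rr_per_sample(beats, n_samples, fs):
    """Per-sample RR interval (seconds) from sorted beat sample indices.

    For sample n, the RR value is the gap between the last beat at or before n
    and the first beat after n. Samples without both neighbours stay None.
    """
    rr = [None] * n_samples
    for n in range(n_samples):
        k = bisect_right(beats, n)  # number of beats at or before sample n
        if 0 < k < len(beats):
            rr[n] = (beats[k] - beats[k - 1]) / fs
    return rr

rr = rr_per_sample([10, 370, 650], n_samples=800, fs=360)
```

Because every sample between two beats carries the same RR value, a missed beat produces a long run of identical (wrong) values, which is why later stages work with unique values inside a window.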

Signal Quality Indicators
With the emergence of telemedicine and continuous monitoring of vital signs inside and outside clinical settings, the data acquired from wearable devices are often contaminated with various types of noise such as electrode contact noise, muscle noise, and motion artifacts. The data obtained may not be of good diagnostic quality and may not be usable for further processing; the need for Signal Quality Indicators (SQIs) for the successful deployment of monitoring devices has therefore become evident over the years.

Kurtosis SQI (𝑆𝑄𝐼 𝑘𝑢𝑟𝑡 )
Kurtosis is defined as the fourth standardized moment of a probability distribution. Kurtosis is a measure of the sharpness of the peak of a distribution. The commonly used equation for calculating the kSQI for a random variable $X$ is $k \triangleq \frac{E\{(X-\mu)^4\}}{\sigma^4}$. The sample kurtosis for $M$ samples of $X$ is denoted by $\hat{k}$ and is calculated as shown in (2) [32]:

$$\hat{k} = \frac{1}{M}\sum_{i=1}^{M}\left(\frac{x_i - \hat{\mu}}{\hat{\sigma}}\right)^4 \quad (2)$$

where $\hat{\mu}$ and $\hat{\sigma}$ are the empirical estimates of the mean and standard deviation of the distribution (the ECG signal analysis window). To estimate the kurtosis SQI, we propose an estimation method in which $x[n]$ is the ECG signal sample at sample point $n$ and $\hat{\mu}[n]$ is the moving mean of the record, calculated by applying the low-pass filter with coefficient 0.99 to $x[n]$. The kurtosis SQI is then calculated as shown in (5).
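As a small illustration of the windowed sample kurtosis in (2), the sketch below computes the fourth standardized moment of one analysis window. It uses the 1/M (population) variance, which is an assumption, and it does not reproduce the moving-estimate method proposed in this section.

```python
def sample_kurtosis(window):
    """Fourth standardized moment of a window, using the 1/M variance."""
    m = len(window)
    mean = sum(window) / m
    var = sum((v - mean) ** 2 for v in window) / m
    if var == 0.0:
        return float("nan")  # flat window: kurtosis is undefined
    fourth = sum((v - mean) ** 4 for v in window) / m
    return fourth / var ** 2

flat_like = sample_kurtosis([1.0, -1.0, 1.0, -1.0])   # two-level signal
peaky = sample_kurtosis([0.0] * 99 + [10.0])          # one sharp spike
```

A window dominated by a sharp, QRS-like spike yields a much larger kurtosis than a two-level or noise-like window, which is the property the kSQI relies on.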

Frequency SQI (𝑆𝑄𝐼 𝑓 𝑟𝑒𝑞 )
Frequency SQI is based on the dominant frequency band that contains ECG signal information [35]. The energy in the QRS complex is centered at 10 Hz and is 10 Hz in width. Therefore, the fSQI of an ECG signal $x$ is mathematically defined in the literature as shown in (6), where $P$ is the power spectral density of $x$. In the equation for fSQI, the numerator represents the energy of the QRS wave and the denominator represents the overall energy of the ECG signal. To calculate an fSQI at each signal sample point per record, the method proposed is as shown in (7), where $h_1[n]$ and $h_2[n]$ are the impulse responses of bandpass filters with cutoff frequencies of 5 Hz and 15 Hz, corresponding to the QRS peak frequency range, and 5 Hz and 50 Hz, corresponding to the ECG signal frequency range, respectively. The outputs of these two filters are squared to calculate the energy in the desired frequency ranges and are subsequently filtered using the low-pass filter with coefficient 0.99.
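The band-energy ratio behind the fSQI can be illustrated with a windowed spectral version. This sketch uses a plain DFT over one window rather than the per-sample bandpass filters proposed here, and it assumes the 5 to 15 Hz and 5 to 50 Hz band edges quoted for the two filters; both choices are assumptions made for illustration.

```python
import cmath
import math

def band_power(x, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes for bins with f_lo <= f <= f_hi."""
    n = len(x)
    total = 0.0
    for k in range(n // 2 + 1):
        f = k * fs / n
        if f_lo <= f <= f_hi:
            coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            total += abs(coeff) ** 2
    return total

def fsqi(x, fs, qrs_band=(5.0, 15.0), ecg_band=(5.0, 50.0)):
    """Ratio of QRS-band energy to ECG-band energy for one window."""
    denom = band_power(x, fs, *ecg_band)
    return band_power(x, fs, *qrs_band) / denom if denom > 0 else 0.0

fs = 360
in_band = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]   # 10 Hz tone
out_band = [math.sin(2 * math.pi * 30 * t / fs) for t in range(fs)]  # 30 Hz tone
```

A tone inside the QRS band gives a ratio near 1, while energy elsewhere in the ECG band (for example muscle noise) pulls the ratio toward 0.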

Total SQI
The $SQI_{kurt}$ and $SQI_{freq}$ are combined as a product to form the total SQI per channel, as shown in (8):

$$SQI[n] = SQI_{kurt}[n] \times SQI_{freq}[n] \quad (8)$$

The total SQI for an ECG signal window can be seen in Fig. 3(c) and Fig. 3 [...] Here the total SQI is averaged over a signal record of 20 s in length per sample per SNR. Ideally, the SQI values would vary linearly with SNR, but since this is not possible as signal quality can vary from 0 to ∞, a monotonically increasing function is a sufficiently good estimate of the signal quality. The total SQI, which is a product-based combination of kurtosis SQI and frequency SQI, is a sufficiently good indicator of signal quality as the function is monotonically increasing against the SNR values for all the noise conditions considered. Gaussian noise is used to indicate the SQI performance under a combination of different noises, by assuming that the noise sequences consist of independent random variables whose sum tends towards a Gaussian distribution as more variables are added, as per the central limit theorem. Moreover, other SQIs discussed in the literature focus on quality assessment through the detection of the QRS peak. However, to achieve accurate QRS detection, the acquired signal has to be of acceptable quality [33]. Therefore, QRS detection-based SQIs, or SQIs which require the detection of ECG fiducial points, are not ideal for determining the acceptability of the signal. Neither the kurtosis SQI nor the frequency SQI requires the estimation of the fiducial points (i.e., the QRS peak locations), and both are therefore less expensive in terms of the computations required.

The SQI per RR interval stream
For each channel, the signal quality indicator is calculated as shown in (8). Each RR interval stream obtained per channel (it should be noted that one or more QRS detectors can act on a single channel of ECG signal to obtain one or more RR interval streams) is assigned the signal quality indicator for that channel, which then forms $SQI_i[n]$, where $i$ corresponds to the $i$th RR interval stream and $n$ is used to indicate the SQI sample. These are then normalized to add to unity as shown in (9):

$$\overline{SQI}_i[n] = \frac{SQI_i[n]}{\sum_{j=1}^{N} SQI_j[n]} \quad (9)$$

where $N$ is the number of RR interval streams. The normalized SQIs for each channel can be seen in Fig. 3(e) and Fig. 3(f), assuming that there is only one QRS detector per channel. However, as the number of QRS detectors per channel increases, the corresponding SQI values per RR interval stream per channel reduce such that the SQIs over all RR interval streams add up to unity.
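The assignment and normalization step can be sketched for a single sample point as below. The function name and the uniform fallback for an all-zero SQI vector are illustrative assumptions.

```python
def normalize_sqi(channel_sqi, detectors_per_channel):
    """Copy each channel SQI to its RR streams, then scale to sum to 1."""
    streams = [s for s in channel_sqi for _ in range(detectors_per_channel)]
    total = sum(streams)
    if total == 0:
        # Assumed fallback: no quality information, weight streams equally.
        return [1.0 / len(streams)] * len(streams)
    return [s / total for s in streams]

# Two channels, two QRS detectors each: four RR interval streams.
weights = normalize_sqi([0.6, 0.2], detectors_per_channel=2)
```

With two detectors per channel each stream weight is half of what it would be with one detector, matching the observation that per-stream SQIs shrink as detectors are added.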

Bayesian probability calculation
The Bayes probability for each RR interval at each sample $n$ is calculated by taking the past observations from all sensors; a probability function is calculated at each time step assuming a Gaussian distribution, and this serves as the a priori probability distribution. We find the moving sample mean $\mu_{RR}[n]$ and moving sample variance $\sigma^2_{RR}[n]$ over a trailing window of length 10 s from $RR_i[n^*]$, the unique RR interval values within the trailing window of 10 s at $n$, with $K$ denoting the number of such unique RR intervals. Only unique RR intervals are considered because an RR value exists for each sample: when a beat is missed, more samples carry the same wrong RR value, and these would otherwise have more influence than the RR values obtained correctly. $\mu_{RR}[n]$ and $\sigma^2_{RR}[n]$ are used to find the probability of the current observation given the previous observations. Since we assume a Gaussian distribution for the RR intervals, the sample variance follows a chi-squared ($\chi^2$) distribution, which relates the sample variance to the variance of the Gaussian distribution (here $\chi^2_{\nu}$ denotes the chi-squared distribution with $\nu$ degrees of freedom). The variance of the distribution can therefore be estimated with a confidence of 99%, and we take the lower limit as the estimate of variance. The standard deviation estimate is calculated as shown in (13), where $K$ is the total number of unique RR intervals in the window of length 10 s and $\chi^2_{K,0.005}$ is the value of the inverse complementary $\chi^2$ cumulative distribution function with $K$ observations and a tail probability of 0.005. From this, the probability of each current RR interval is calculated as shown in (14). With that, we have a set of probabilities per RR interval, and we normalize the probabilities at each sample/RR interval index such that the probabilities add up to 1.
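A simplified sketch of this probability calculation is given below. It evaluates a Gaussian density built from the unique RR values in the trailing window and normalizes across streams. It deliberately uses the plain sample variance instead of the chi-squared based variance estimate described above, so it is an approximation under stated assumptions rather than the method itself; names and fallbacks are illustrative.

```python
import math

def bayes_probabilities(history, current):
    """Normalized Gaussian likelihoods of the current RR values.

    history: RR values (s) from all streams inside the trailing window.
    current: one RR value (s) per stream at the current sample.
    """
    unique = sorted(set(history))
    k = len(unique)
    uniform = [1.0 / len(current)] * len(current)
    if k < 2:
        return uniform  # assumed fallback: not enough data for a variance
    mean = sum(unique) / k
    var = sum((v - mean) ** 2 for v in unique) / (k - 1)
    if var == 0.0:
        return uniform
    like = [math.exp(-(rr - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
            for rr in current]
    total = sum(like)
    return [p / total for p in like] if total > 0 else uniform

probs = bayes_probabilities([0.8, 0.8, 0.82, 0.78, 0.8], [0.8, 1.6])
```

Using only unique values means a long run of identical wrong RR samples after a missed beat counts once, not hundreds of times.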

Fusion algorithms
The RR intervals calculated per detector per channel are fused to estimate the correct RR interval using some common fusion algorithms discussed in this section. We consider three broad categories of fusion methods, as indicated by their names: 1. selection of a single most probable RR value, 2. calculation of the most probable value by combination, and 3. calculation of the most probable value through elimination and combination. These fusion methods are detailed in Sections 5.1, 5.2, and 5.3.

Median
The median method extracts the median RR interval at each sample point.

RR best SQI
In the Best SQI-based RR calculation, the RR interval obtained from either channel is chosen depending on which channel has the best SQI. Of the QRS detection algorithms, the SQRS algorithm was found to exhibit the best average performance for this particular data set when the signals are noisy; therefore, the RR interval obtained from the SQRS algorithm is chosen from the channel with the highest SQI (all RR intervals from a single channel have the same SQI).

Best Bayes
The Bayesian probabilities obtained post-Bayesian filtering can be used to select the most probable RR interval, that is, the RR interval from the stream with the highest probability at each sample. In the situation where multiple RR streams have equal probabilities, the preference order is the RR interval from SQRS, followed by the RR interval from GQRS.
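The selection rule with its tie-break can be sketched as follows. The stream layout (which index belongs to which detector) and the ordering between channels within one detector are assumptions introduced for the example.

```python
def best_bayes(rr, prob, preference):
    """Return the RR value of the most probable stream.

    preference: every stream index in tie-break order (SQRS streams first).
    """
    top = max(prob)
    for i in preference:
        if prob[i] == top:
            return rr[i]
    raise ValueError("preference must list every stream index")

# Assumed layout: 0 = GQRS ch1, 1 = SQRS ch1, 2 = GQRS ch2, 3 = SQRS ch2.
rr = [0.80, 0.90, 1.00, 1.10]
choice = best_bayes(rr, [0.4, 0.4, 0.1, 0.1], preference=[1, 3, 0, 2])
```

In the tie shown, the SQRS stream on channel 1 is chosen over the equally probable GQRS stream.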

Simple weighted average
In the simple weighted averaging (SWA) method of fusion, the RR intervals obtained from the multiple detectors and channels are fused by adding the RR intervals weighted by the normalized SQIs (as per (18)).
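For one sample point, the weighted sum described above reduces to a few lines. The weights are assumed to be already normalized to sum to 1; the function name is illustrative.

```python
def simple_weighted_average(rr, sqi_norm):
    """Fuse RR values (s) from all streams using normalized SQI weights."""
    return sum(w * r for w, r in zip(sqi_norm, rr))

# Two clean-channel streams (weight 0.375 each) and two noisy ones (0.125 each).
fused = simple_weighted_average([0.8, 0.8, 1.2, 1.2], [0.375, 0.375, 0.125, 0.125])
```

Note that even a stream with a very low SQI still contributes a little to the result, which is the behaviour the elimination-based methods are designed to avoid.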

SWA with Bayesian probabilities
Although the Bayesian elimination stage is optional, we analyze the feasibility of substituting the SQIs with the Bayesian probabilities obtained in the Bayesian elimination. The Bayes probabilities can be used to indicate the suitability of the RR interval streams obtained per detector per channel for fusion. The probabilities can be normalized such that they add up to 1, as shown in (19). From the Bayesian probability-based SQI obtained in (19), the SWA can be calculated as shown in (20), with the normalized Bayesian probabilities taking the place of the normalized SQIs.

Kalman fusion
Kalman fusion is a well-known technique in data fusion based on Kalman filtering [41]. A Kalman filter essentially creates an estimate of the current state of a system and then updates the state based on observations at the current time step. The estimate of the current state is then used to update the initial prediction for the state at the next time step. Prior to the Kalman fusion stage, the algorithm first checks for possibly erroneous RR intervals among the input RR intervals $RR_i$ by checking whether the RR interval exceeds 1.7 s or falls below 0.3 s. If such values are observed, they are replaced by the closest upper or lower limit value to obtain the pre-processed RR intervals.
We use these corrected interval sequences in finding the RR interval sequence estimate $RR_{kal}[n]$, the state that we are interested in estimating, which is a scalar at time step $n$ (in general, Kalman filters have a state vector, which is what the filter learns to estimate and is continuously updated; in our case, we employ a scalar state variable, $RR_{kal}[n]$). The observation vector is formed by the pre-processed RR intervals calculated from the QRS detection algorithms at time step $n$, which is the input to the Kalman filter. For performing Kalman filtering, some important parameters are: $\sigma^2_i[n]$, the apparent noise variance of each individual RR interval detector per channel, which is represented as a row vector of length $N$; and $\sigma^2_{RR_{kal}}[n]$, the apparent noise variance of the previous estimation (at time step $n-1$), which is a scalar.
We define the noise to be the difference between the sensor values and the apparent actual values. The noise variances can be initialized to any value, and we define them for all $i$, corresponding to each set of sensor observations at time step $n$ [41]. A weighting vector is created as shown in (21), using a column vector of ones and a matrix of ones of dimension $N \times N$. Once we have all the requisite parameters at $n$, we can then calculate the RR interval estimate $RR_{kal}[n+1]$ through the Kalman filtering process as shown in (22) [41], where $RR_{kal}[1] = 0$. Next, the new noise variances $\sigma^2_i[n+1]$ and $\sigma^2_{RR_{kal}}[n+1]$ are calculated based on their errors in comparison to the actual value. In real-life applications, we do not have an actual value to use as a control, so in its stead we use the values found from the Simple Weighted Averaging ($RR_{SWA}[n]$). The detector noise variances are then updated as shown in (23), where the number of updates that have already occurred equals the time step $n$. The noise variance from the previous estimate, $\sigma^2_{RR_{kal}}$, is updated as shown in (24), where the rate parameter controls how quickly the values of the output variance should change and was set to 5 empirically.
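The pre-processing limits and the general idea of variance-weighted fusion can be sketched as below. The clipping follows the 0.3 s and 1.7 s limits stated for the pre-processing step. The update function is the generic textbook scalar Kalman measurement update applied sequentially to each stream; it is not a reconstruction of equations (21) to (24), and the variance-update rules of this section are not implemented.

```python
def clip_rr(rr, low=0.3, high=1.7):
    """Replace out-of-range RR intervals (seconds) by the nearest limit."""
    return [min(max(r, low), high) for r in rr]

def kalman_update(estimate, est_var, observations, obs_vars):
    """Generic sequential scalar Kalman measurement update (textbook form)."""
    for z, r in zip(observations, obs_vars):
        gain = est_var / (est_var + r)
        estimate += gain * (z - estimate)
        est_var = est_var * r / (est_var + r)  # equals (1 - gain) * est_var
    return estimate, est_var

obs = clip_rr([1.0, 0.8])
fused, fused_var = kalman_update(0.0, 1e9, obs, [0.01, 0.01])
```

With equal observation variances and an uninformative prior, the update reduces to a plain average, which mirrors the remark that Kalman fusion treats streams of similar reliability equally.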

𝜎-trimmed mean filter
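This filtering method makes use of the $RR_{SWA}[n]$ computed in (18) to filter out RR interval streams that deviate from the $RR_{SWA}[n]$ estimate by more than half of the standard deviation of all RR interval streams. The RR intervals that satisfy the inequality in (25) are retained and averaged as shown in (26):

$$\left| RR_i[n] - RR_{SWA}[n] \right| \le \tfrac{1}{2}\,\sigma_{RR}[n] \quad (25)$$

$$RR_{\sigma}[n] = \frac{1}{|I'|}\sum_{i \in I'} RR_i[n] \quad (26)$$

where $\sigma_{RR}[n]$ is the standard deviation of all RR interval streams at $n$, $I'$ is the set of stream indices satisfying (25), and $|I'|$ is the cardinality of the set $I'$ at $n$.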
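A one-sample sketch of this trimming rule follows. The use of the population standard deviation and the fallback to the SWA estimate when no stream survives are assumptions; the source does not specify either.

```python
import statistics

def sigma_trimmed_mean(rr, rr_swa):
    """Average the RR values lying within half a standard deviation of rr_swa."""
    sd = statistics.pstdev(rr)  # assumed: population standard deviation
    kept = [r for r in rr if abs(r - rr_swa) <= 0.5 * sd]
    if not kept:
        return rr_swa  # assumed fallback when every stream is rejected
    return sum(kept) / len(kept)

# Three consistent streams and one corrupted stream; SWA estimate of 0.85 s.
fused = sigma_trimmed_mean([0.8, 0.8, 0.8, 1.6], rr_swa=0.85)
```

In this example the 1.6 s outlier lies far outside the half-sigma band and is dropped, so the result is the plain mean of the three consistent streams.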

𝛼-trimmed mean filter
The 𝛼-trim mean filtering is a method of filtering out the $\lfloor \alpha N \rfloor$ channels or streams that are furthest below and above the median value of all $N$ RR interval streams $RR_i[n]$, where $i = 1, 2, \ldots, N$. The values that are furthest below and above the median are discarded, resulting in a new set of streams having indices $I'$. For example, if $N = 4$ and stream index 2 is removed, then $I' = \{1, 3, 4\}$. We then re-normalize the SQIs using this sub-set of streams as shown in (27). The modified simple weighted average is then calculated as shown in (28) to obtain the RR interval estimate from the 𝛼-trim mean filtering algorithm.
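The sketch below shows one possible reading of this procedure for a single sample point: the floor(alpha × N) streams furthest from the median are dropped, the SQIs of the remaining streams are renormalized, and a weighted average is taken. The exact trimming count and the equal-weight fallback are assumptions, since the original expressions were not recoverable.

```python
import math
import statistics

def alpha_trimmed_swa(rr, sqi, alpha):
    """Drop the streams furthest from the median, then SQI-weight the rest."""
    n_drop = math.floor(alpha * len(rr))
    med = statistics.median(rr)
    order = sorted(range(len(rr)), key=lambda i: abs(rr[i] - med), reverse=True)
    dropped = set(order[:n_drop])
    kept = [i for i in range(len(rr)) if i not in dropped]
    total = sum(sqi[i] for i in kept)
    if total == 0:
        return sum(rr[i] for i in kept) / len(kept)  # assumed fallback
    return sum(sqi[i] * rr[i] for i in kept) / total

# Four streams; with alpha = 0.25 exactly one stream (the 1.6 s outlier) is dropped.
fused = alpha_trimmed_swa([0.8, 1.6, 0.82, 0.78], [0.25] * 4, alpha=0.25)
```

Unlike plain SQI weighting, the rejected stream contributes nothing at all, which matters when one channel produces grossly wrong RR values.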

Bayes-eliminated fusion algorithms
Bayesian elimination is an optional stage used to eliminate any improbable RR intervals, obtained per channel per QRS detector algorithm, based on the probability distribution generated from the previous RR intervals as discussed in Section 4. The probabilities obtained per RR interval are normalized at each sample/RR interval index such that the probabilities add up to 1. We then discard intervals with probabilities below 0.05. With the remaining intervals, we recalculate the fused results from the methods discussed in Sections 5.1, 5.2, and 5.3, with the potential benefit of removing the influence of erroneous RR intervals. The fusion methods discussed in Sections 5.1, 5.2, and 5.3, when applied after the Bayesian elimination stage, are called Bayes-eliminated fusion algorithms in subsequent portions of the article.
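The elimination step can be sketched for one sample point as follows. The 0.05 threshold is taken from the text; renormalizing the surviving probabilities afterwards is an assumption made so the output can be passed straight to a weighted fusion.

```python
def bayes_eliminate(rr, prob, threshold=0.05):
    """Drop RR values whose normalized probability is below the threshold."""
    kept = [(r, p) for r, p in zip(rr, prob) if p >= threshold]
    if not kept:
        return [], []
    total = sum(p for _, p in kept)
    return [r for r, _ in kept], [p / total for _, p in kept]

rr_kept, prob_kept = bayes_eliminate([0.8, 0.82, 1.6, 0.79], [0.4, 0.35, 0.01, 0.24])
```

With four streams and probabilities that sum to 1, at least one stream always has a probability of 0.25 or more, so the surviving set is never empty in that configuration.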

Mean absolute error
For the performance evaluation of the various fusion methods discussed in Section 5, the mean of the absolute errors (MAE) between the estimated RR intervals $\hat{RR}[n]$ and the ground truth RR intervals $RR[n]$ is used as a metric. The MAE is calculated per record as shown in (29):

$$\mathrm{MAE} = \frac{1}{L}\sum_{n=1}^{L}\left| \hat{RR}[n] - RR[n] \right| \quad (29)$$

where $L$ is the length of the record. The RR intervals have a unit of seconds, and therefore the MAE has a unit of seconds.
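As a quick illustration of the metric, the per-record mean absolute error can be computed as below; the function name is illustrative and both sequences are assumed to be sampled at the same points.

```python
def mean_absolute_error(rr_est, rr_true):
    """Mean absolute difference (seconds) between two equal-length RR sequences."""
    if len(rr_est) != len(rr_true):
        raise ValueError("sequences must have the same length")
    return sum(abs(e - t) for e, t in zip(rr_est, rr_true)) / len(rr_true)

mae = mean_absolute_error([0.8, 0.9, 1.0], [0.8, 1.0, 0.8])  # errors 0, 0.1, 0.2
```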
Three different scenarios are simulated with three different types of noise added to channel 1 alone, channel 2 alone, and both channels, such that the SNR is varied from −20 dB to 20 dB, and the performance of the fusion algorithms in terms of MAE is as shown in Tables 1, 2, and 3 respectively. Amongst the selection algorithms, the median method discussed in Section 5.1.1 exhibits the best performance, which is close to the performance of the non-corrupted lead in the scenario where only one lead is corrupted by noise in the high SNR regions. In the scenario where both leads are corrupted by noise, performance in the low SNR regions (below 8 dB) is close to the best-performing selection algorithm, and is better than the performance of the individual leads without fusion. However, when only one channel is corrupted by noise, the Bayes-eliminated median method does not exhibit a performance better than the individual leads prior to fusion. The Bayes-eliminated median method could be performing poorly because the Bayesian elimination stage occasionally eliminates the median interval that would have been selected by the simple median algorithm (which is from the reliable channel). This is also an indicator of the unreliability of the QRS detection algorithms.
The Best SQI method discussed in Section 5.1.2 exhibits good performance in the low SNR regions but is outperformed by the median method in all scenarios. In the high SNR regions, its performance is comparable to the cleanest single channel. The Bayes-eliminated Best SQI method exhibits better performance compared to the Best SQI method. This could be attributed to the Bayesian elimination stage successfully eliminating RR intervals and the renormalized SQIs ensuring that the RR interval from the non-noisy channel is selected, unlike in the Bayes-eliminated median selection where the median value is directly selected.
The Best Bayes algorithm discussed in Section 5.1.3 exhibits a performance similar to the median method in the low SNR regions. The errors in performance could be attributed to high Bayes probabilities being assigned to beat values detected from channel 2 when channel 2 is noisy, because a few wrong RR intervals in the initial stages of Bayesian elimination could lead to subsequent RR intervals further away from the ground truth being assigned higher probabilities. This could also be due to the algorithm choosing just one RR interval based on the Bayesian probabilities without fusing all the RR intervals by averaging or weighted averaging. In the high SNR regions, it exhibits a mean absolute error comparable to or even less than the cleanest individual lead. In the scenario where both leads are corrupted by muscle artifacts, the Best Bayes method exhibits a performance that is significantly better than the individual leads, indicating that the Bayes probabilities are most reliable when both signals are corrupted by noise.
The Simple Weighted Average (SWA) method discussed in Section 5.2.1 exhibits good performance when only one channel is corrupted by noise, as this method takes into account the signal quality. In the case of the Bayes-eliminated SWA method, the performance further improves. This is as expected, as discussed in the case of the Best SQI method. In the scenario where both channels are corrupted by noise, both SWA and Bayes-eliminated SWA provide a better estimate of the RR interval sequence compared to the individual leads.
The simple weighted average with Bayes probabilities discussed in Section 5.2.2 outperforms the SQI-based SWA in the low SNR regions when only one channel is corrupted by noise. This is expected as the Bayes probabilities would be high for RR intervals from clean channels. This is consistent with our findings with the Best SQI method. The Best SQI method, however, considers only one channel of information, which may not be the most reliable, although the Bayes probability indicates it is. This issue is solved in the case of SWA with Bayes probabilities, and it is observed that this method outperforms the Best SQI method when only one channel is corrupted by noise. In the high SNR regions, this method exhibits the best performance amongst all methods considered and outperforms the average single-channel method.
Amongst the fusion methods that use combination-based fusion, Kalman fusion discussed in Section 5.2.3 does not perform very well in the low SNR regions when only one channel is corrupted by noise. This is because the algorithm treats all RR intervals equally, as if calculated from two signals of the same quality, even though $RR_{SWA}$ is taken as the control sequence. In the scenario where both channels are corrupted by noise, Kalman fusion exhibits the best performance in the low SNR regions as both channels are now of similar reliability. The Bayes-eliminated Kalman fusion shows a further drop in performance compared to Kalman fusion, indicating that sometimes the Bayesian elimination stage eliminates RR segments that are reliable in favor of unreliable RR segments, as discussed in the case of median filtering. The performance of the Kalman fusion algorithm could be improved upon by using improved estimates of initial conditions and update parameters such as the rate parameter in (24). However, since the Kalman fusion method is the best-performing method in the low SNR regions when both signals are corrupted by noise, and its MAE is significantly lower than that of the second-best performing algorithm, Kalman fusion is a clear winner for implementation in noisy scenarios when both channels are corrupted by noise.
The Kalman fusion algorithm and the Bayes Kalman algorithm used in this work use the RR interval estimate from the SWA algorithm as the control, and therefore would require the SQI calculation and SQI normalization steps. However, to carry out Kalman fusion, any other RR interval estimate can be used as control and therefore the SQI calculation and SQI normalization steps are not checked here.
With regard to the two elimination and combination algorithms, the 𝜎-trim mean filtering method discussed in Section 5.3.1 can be thought of as an elimination algorithm followed by a simple averaging. Therefore, its performance is very close to the performance of the SQI-based simple weighted average method in the low SNR regions when only one channel is corrupted by noise (we assume that the eliminated RR sequences are from the noisy channels). However, when a channel is corrupted by muscle artifacts, leading to very large variations in input RR intervals (as indicated by the MAE of lead 1 in Table 1 and the MAE of lead 2 in Table 2 for muscle artifacts at −20 dB), the performance is much better compared to the SWA method. This could be due to significantly erroneous RR intervals being eliminated, compared to SQI-based averaging, where even the channel with very poor SQI does contribute a little to the weighted average. The Bayes-eliminated 𝜎-trim method closely follows the performance of the 𝜎-trim method. When a single channel is corrupted by noise, both the 𝜎-trim mean filtering method and the Bayes-eliminated 𝜎-trim mean filtering method perform better than the non-corrupted lead in the high SNR regions.
When both channels are corrupted by noise, both the 𝜎-trim mean filtering method and the Bayes-eliminated 𝜎-trim mean filtering method perform better than the individual leads in estimating the RR sequence at all SNRs.
The α-trim mean filtering method discussed in Section 5.3.2 exhibits the best performance amongst all the algorithms considered in the low SNR regions when only one channel is corrupted by noise. This indicates that the median-based elimination and SQI-based combination work well in the low SNR regions, as unreliable RR intervals are eliminated and the remaining RR intervals are fused on the basis of the renormalized SQIs. The Bayes α-trim method performs poorly compared to the simple α-trim method, as good RR intervals are probably eliminated in the low SNR regions. However, in the high SNR region, Bayes-eliminated α-trim mean filtering exhibits the best performance along with SWA using Bayes probabilities, indicating that Bayes probability-based methods are highly reliable in the high SNR regions. The α-trim mean filtering method is consistently found either to be the best-performing algorithm or to perform better than the single non-corrupted lead in all scenarios and is, therefore, a suitable fusion algorithm when only one channel is corrupted by noise.
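The idea of median-based elimination followed by SQI-based combination can be sketched as follows. This is a minimal illustration under stated assumptions, not the definition given in Section 5.3.2: the function name `trim_and_fuse`, the parameter `n_drop`, and the choice of four RR candidates (two QRS detectors on each of two channels) with one SQI per candidate are all hypothetical.

```python
from statistics import median


def trim_and_fuse(rr, sqi, n_drop=1):
    """Drop the n_drop candidates farthest from the median, then fuse the
    survivors using their SQIs renormalized to sum to one."""
    if not 0 <= n_drop < len(rr):
        raise ValueError("n_drop must leave at least one candidate")
    med = median(rr)
    # Candidate indices ordered by distance from the median (closest first).
    order = sorted(range(len(rr)), key=lambda i: abs(rr[i] - med))
    keep = order[:len(rr) - n_drop]
    total = sum(sqi[i] for i in keep)
    if total == 0:
        # All surviving SQIs are zero: fall back to a plain mean.
        return sum(rr[i] for i in keep) / len(keep)
    return sum(rr[i] * sqi[i] for i in keep) / total


# Hypothetical sample: the last candidate comes from a heavily corrupted channel.
rr = [0.80, 0.82, 0.81, 1.60]   # RR candidates in seconds (illustrative)
sqi = [0.9, 0.9, 0.2, 0.2]      # one SQI per candidate (illustrative)
print(trim_and_fuse(rr, sqi))
```

Here the outlying candidate is removed before fusion, so it contributes nothing to the estimate, in contrast to a plain weighted average where a low-SQI outlier would still carry a small weight.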

Bland Altman plots
Figs. 8, 9, and 10 show the Bland Altman plots, at −20 dB and 20 dB, for the cases where channel 1 is corrupted by electrode motion noise and channel 2 is clean, channel 2 is corrupted by baseline wander and channel 1 is clean, and both channels are corrupted by muscle artifacts, respectively. The Bland Altman plot has the reference RR on the x-axis instead of the mean of the reference RR and the RR calculated by the algorithm, for ease of comparison [42]. The y-axis shows the difference $\Delta RR[n]$, which is calculated as $\Delta RR[n] = \widehat{RR}[n] - RR[n]$, where $\widehat{RR}[n]$ is the estimated RR interval and $RR[n]$ is the ground truth RR interval. The Bland Altman plot is shown as a heat map instead of points to indicate the density of points in each of the regions or bins. We chose the scenarios at −20 dB and 20 dB to showcase the fusion algorithms that exhibited the best performance in the extreme scenarios, as can be observed from Tables 1, 2, and 3. The plots are generated from all records considered in this dataset for every 10 samples. From Figs. 8, 9, and 10, we observe that there is no noticeable bias in the mean of the differences. However, we observe that the limits of agreement (99th percentile of differences, $P_{99}$, and 1st percentile of differences, $P_{1}$) of the fusion algorithms at −20 dB are much narrower than those of the leads that are corrupted by noise. Moreover, it can be seen that the best-performing fusion algorithms have a narrower spread than the individual leads at 20 dB. It can also be observed from the heat map that, for the fusion algorithms, the highest density is concentrated around $\Delta RR[n] = 0$, which demonstrates the merits of the algorithms.
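The quantities read off a Bland Altman plot of this kind (the mean of the differences, the 1st and 99th percentile limits, and the binned density used for a heat map) can be computed as in the sketch below. The function names, the bin edges, and the sample values are assumptions for illustration; the percentile interpolation rule used for the published figures is not stated, so the "inclusive" method here is only one reasonable choice.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def bland_altman_summary(estimated, reference):
    """Return the differences, their mean (bias), and the P1/P99 limits."""
    diffs = [e - r for e, r in zip(estimated, reference)]
    cuts = quantiles(diffs, n=100, method="inclusive")  # 99 cut points
    return {"diffs": diffs, "bias": mean(diffs), "p1": cuts[0], "p99": cuts[-1]}


def density_grid(x, y, x_edges, y_edges):
    """Count points per (x, y) bin; rows follow y_edges, columns x_edges.
    Points outside the edges (or exactly on the last edge) are ignored."""
    n_cols, n_rows = len(x_edges) - 1, len(y_edges) - 1
    grid = [[0] * n_cols for _ in range(n_rows)]
    for xi, yi in zip(x, y):
        col = bisect_right(x_edges, xi) - 1
        row = bisect_right(y_edges, yi) - 1
        if 0 <= col < n_cols and 0 <= row < n_rows:
            grid[row][col] += 1
    return grid


# Hypothetical heart-rate values in beats per minute.
reference = [60, 60, 60, 60]
estimated = [60, 62, 58, 61]
summary = bland_altman_summary(estimated, reference)
print(summary["bias"], summary["p1"], summary["p99"])
print(density_grid(reference, summary["diffs"], [50, 70], [-3, 0, 3]))
```

Placing the reference value on the x-axis, as done in the figures, means the grid columns correspond to true RR (or heart-rate) ranges, so a dense band of counts around a zero difference is directly visible for each range.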

Results
The key results and best strategies are summarized in Table 5. From Table 5, it can be observed that the α-trim mean filtering algorithm performs well in the high SNR regions as well as when only one channel is corrupted by noise at low SNRs. Since the α-trim mean filtering algorithm and the Bayes probability-based fusion exhibit similar performance in the high SNR regions, the α-trim mean filtering algorithm is preferred, as calculating the Bayes probabilities adds significantly to the computational overhead. When both channels are corrupted by noise at very low SNRs, the Kalman fusion algorithm exhibits the best performance. Therefore, a combination of these two methods would provide the best results. The computational complexity of the α-trim mean filtering stage is () and can be implemented on a wearable IoT device after optimizing the  [44]). Therefore, we analyze the feasibility of an SQI-based algorithm selection method based on an SQI cutoff threshold. For this, the performance of the α-trim mean filtering algorithm and the Kalman fusion algorithm was analyzed at 25 different SNR combinations when the channels are corrupted by muscle artifacts, and the SQI cutoff levels were chosen empirically. The proposed final RR interval estimate   at each sample is selected based on the SQIs of both leads at each timestamp as shown in (31).
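A minimal sketch of such an SQI-based selection rule is given below, following the strategy stated in the Conclusion: the Kalman fusion estimate is used when the SQIs of both channels fall below a threshold, and the trim mean filtering estimate is used elsewhere. The function names and the default `cutoff` value of 0.5 are hypothetical; the empirically chosen cutoff levels and the exact form of (31) are not reproduced here.

```python
def select_rr_estimate(sqi_1, sqi_2, rr_kalman, rr_trim, cutoff=0.5):
    """Pick the Kalman estimate only when both channels look unreliable."""
    if sqi_1 < cutoff and sqi_2 < cutoff:
        return rr_kalman
    return rr_trim


def select_rr_sequence(sqi_1, sqi_2, rr_kalman, rr_trim, cutoff=0.5):
    """Apply the selection rule independently at every sample."""
    return [
        select_rr_estimate(a, b, k, t, cutoff)
        for a, b, k, t in zip(sqi_1, sqi_2, rr_kalman, rr_trim)
    ]


# Hypothetical per-sample SQIs and RR estimates (seconds).
sqi_ch1 = [0.2, 0.9, 0.3]
sqi_ch2 = [0.1, 0.2, 0.8]
kalman = [1.0, 1.1, 1.2]
trim = [0.8, 0.9, 1.0]
print(select_rr_sequence(sqi_ch1, sqi_ch2, kalman, trim))
```

Because the rule only compares two SQIs against a threshold, the selection step itself adds negligible cost on top of the two fusion estimates; the dominant cost remains computing those estimates.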
We further analyze the efficacy of this method by corrupting the signal leads at 25 different SNR combinations per sample per signal in the dataset with electrode motion noise. The performance in terms of mean absolute error (MAE) and root mean square error (RMSE) is shown in Table 6. The RMSE is calculated as $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\widehat{RR}[n] - RR[n]\right)^{2}}$, where $\widehat{RR}[n]$ is the estimated RR interval, $RR[n]$ is the ground truth RR interval, and $N$ is the number of samples. From Table 6, it can be observed that in the high SNR regions, the algorithm selection method provides an estimate which is close to that obtained from the α-trim mean filtering algorithm, while in the low SNR regions, the final estimate is influenced by the Kalman fusion algorithm and is better than that of the α-trim mean filtering algorithm. It should be noted that although symmetric scenarios are expected to exhibit the same performance, the difference in the performance of the QRS detectors on lead 1 and lead 2 in the clean signal scenario (as discussed previously) leads to small differences. When both signal channels are corrupted by noise such that the signal-to-noise ratio is 20 dB, the proposed fusion algorithm exhibits a performance improvement of 53.91% compared to the RR interval estimate obtained by simple averaging of the RR intervals from both leads. When both signal channels are corrupted by noise such that the signal-to-noise ratio is −20 dB (low SNR), the proposed fusion algorithm exhibits a performance improvement of 20.83% compared to the mean RR interval estimate from both channels. Since the algorithm provides a much better estimate in the low SNR regions than that obtained from the individual channels, this method of algorithm selection can be considered quite effective. Moreover, in a very low power setting, the Kalman fusion algorithm can be omitted to bring down the computational complexity, as it is desirable only in the low SNR regions, and α-trim mean filtering can be chosen as the fusion algorithm for all scenarios with good confidence.
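The error metrics and the relative improvement figures quoted above can be computed as in the following sketch. The MAE and RMSE follow their standard definitions; the `improvement_percent` helper assumes that the reported improvement is the relative reduction in error with respect to the baseline (simple averaging of the two leads), which is an assumption since the formula is not stated in this section. All sample values are illustrative.

```python
import math


def mae(estimated, reference):
    """Mean absolute error between estimated and reference RR intervals."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)


def rmse(estimated, reference):
    """Root mean square error between estimated and reference RR intervals."""
    sq = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(sq / len(reference))


def improvement_percent(baseline_error, fused_error):
    """Relative error reduction of the fused estimate over a baseline (assumed)."""
    return 100.0 * (baseline_error - fused_error) / baseline_error


# Hypothetical RR intervals in seconds.
reference = [0.8, 0.8, 0.8]
estimated = [0.8, 0.9, 1.0]
print(mae(estimated, reference), rmse(estimated, reference))
print(improvement_percent(4.0, 3.0))
```

Since the RMSE squares each error before averaging, it penalizes occasional large RR errors more heavily than the MAE, which is why reporting both gives a fuller picture in noisy scenarios.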
The performance of the SQI-based fusion algorithm selection method is compared against other state-of-the-art RR interval estimation algorithms in Table 7 (note that there are various other heartbeat detection algorithms in the literature, but here we focus on RR interval estimation). The comparison with Nathan et al. [23] is carried out with the signals at −6 dB SNR, using the same subset of the MIT-BIH Arrhythmia dataset as in [23], and the proposed fusion method exhibits an MAE of 2.745 bpm. The proposed algorithm performs better because the RR estimation method proposed here fuses 2 ECG recordings, while in [23] only a single channel of ECG is used for RR estimation. The comparison with Xie et al. [24] and Di et al. [25] is carried out when the signals are clean, and the proposed method outperforms the methods in [24,25]. This difference in performance could be attributed to the requirement of obtaining a precision value associated with each annotator. However, direct comparison with Xie et al. [24] and Di et al. [25] is not advisable due to the different datasets used for analysis. It can be observed that in all scenarios our method outperforms the state-of-the-art.

Conclusion
The article explores various fusion algorithms to accurately estimate the RR interval, using the beats detected by two state-of-the-art QRS detection algorithms on two simultaneously recorded ECG channels. The performance of the fusion algorithms on signals corrupted by muscle artifacts, electrode motion, and baseline wander indicates the suitability of the fusion algorithms for deployment on a wearable IoT device. It was observed that the Kalman fusion algorithm exhibits the best performance when both channels have signals with low SNRs, and the α-trim mean filtering algorithm is suitable for all the other noise scenarios. Consequently, the use of SQIs as a surrogate for signal SNR for algorithm selection is proposed, and it was observed that the strategy of using Kalman fusion when the SQIs of both channels are below a threshold and α-trim mean filtering elsewhere in the input signal space is effective in obtaining the best estimate of the RR interval. When both signal channels are corrupted by noise such that the signal-to-noise ratio is 20 dB, the proposed fusion algorithm exhibits a performance improvement of 53.91% compared to the RR interval estimate obtained by simple averaging of the RR intervals obtained from the individual leads. When both signal channels are corrupted by noise such that the signal-to-noise ratio is −20 dB (low SNR), the proposed fusion algorithm exhibits a performance improvement of 20.83% compared to the mean RR interval estimate from both channels.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 2. Figure detailing how the RR sequence is extracted from the beat annotations.
(d), corresponding to two ECG channels with a portion of channel 1 corrupted by noise and channel 2 clean. Figs. 4 (a), 4 (b), 4 (c), and 4 (d) show the output total SQI at varying SNRs when the ECG signal is corrupted by (a) Electrode Motion noise, (b) Muscle Artifacts, (c) Baseline Wander, and (d) Gaussian noise, respectively.

Fig. 4. The Total SQI vs SNR plot for when the ECG signal is corrupted by (a) Electrode Motion noise, (b) Muscle Artifacts, (c) Baseline Wander noise, and (d) Gaussian noise.

Fig. 5. The plot of MAE vs SNR when Channel 1 is corrupted by Muscle Artifacts and Channel 2 is clean.


Fig. 7. The plot of MAE vs SNR when both Channel 1 and Channel 2 are corrupted by Muscle Artifacts.

Fig. 8. Bland Altman plot showing the agreement, with the reference RR interval for all records, of the RR interval information obtained from (a) Channel 1, (c) Channel 2, and (e) α-trim mean filtering when channel 1 is corrupted by electrode motion noise at −20 dB and channel 2 is clean, and (b) Channel 1, (d) Bayes α-trim mean filtering, and (f) SWA with Bayes probabilities when channel 1 is corrupted by electrode motion noise at 20 dB and channel 2 is clean. The limits of agreement of the differences are shown through $P_{99}$ and $P_{1}$ in the figure to indicate where 98% of the estimates lie. The values recorded are in beats per minute. (The y-axis limits of the plots have been adjusted for ease of comparison and, therefore, some outliers are not visible in the figures.)

Fig. 9. Bland Altman plot showing the agreement, with the reference RR interval for all records, of the RR interval information obtained from (a) Channel 2, (c) Channel 1, and (e) α-trim mean filtering when channel 2 is corrupted by baseline wander noise at −20 dB and channel 1 is clean, and (b) Channel 2, (d) Bayes α-trim mean filtering, and (f) SWA with Bayes probabilities when channel 2 is corrupted by baseline wander noise at 20 dB and channel 1 is clean. The limits of agreement of the differences are shown through $P_{99}$ and $P_{1}$ in the figure to indicate where 98% of the estimates lie. The values recorded are in beats per minute. (The y-axis limits of the plots have been adjusted for ease of comparison and, therefore, some outliers are not visible in the figures.)

Fig. 10. Bland Altman plot showing the agreement, with the reference RR interval for all records, of the RR interval information obtained from (a) Channel 1, (c) Channel 2, and (e) Kalman fusion when both channel 1 and channel 2 are corrupted by muscle artifacts at −20 dB, and (b) Channel 1, (d) Channel 2, and (f) SWA with Bayes probabilities when both channel 1 and channel 2 are corrupted by muscle artifacts at 20 dB. The limits of agreement of the differences are shown through $P_{99}$ and $P_{1}$ in the figure to indicate where 98% of the estimates lie. The values recorded are in beats per minute. (The y-axis limits of the plots have been adjusted for ease of comparison and, therefore, some outliers are not visible in the figures.)

CRediT authorship contribution statement
Arlene John: Conception and design of study, Analysis and/or interpretation of data, Drafting the manuscript, Revising the manuscript critically for important intellectual content, Approval of the version of the manuscript to be published.
Antony Padinjarathala: Conception and design of study, Analysis and/or interpretation of data, Drafting the manuscript, Approval of the version of the manuscript to be published.
Emer Doheny: Analysis and/or interpretation of data, Revising the manuscript critically for important intellectual content, Approval of the version of the manuscript to be published.
Barry Cardiff: Analysis and/or interpretation of data, Drafting the manuscript, Revising the manuscript critically for important intellectual content, Approval of the version of the manuscript to be published.
Deepu John: Conception and design of study, Revising the manuscript critically for important intellectual content, Approval of the version of the manuscript to be published.
[37][38][39] from the dataset do not undergo any pre-processing prior to being fed into the fusion stage. Pre-recorded noise (baseline wander, muscle artifacts, and electrode motion noise) available from the MIT-BIH Noise Stress Test Database (NSTDB) is used for simulating the three different scenarios [37][38][39]. Although ECG signals can be corrupted by a wide variety of noise sources, such as instrumentation noise, power-line interference, and motion artifacts, we consider only the noise signals in the MIT-BIH NSTDB. The NSTDB contains real recordings of electrode motion noise, muscle artifacts, and baseline wander and hence is a good representation of the noise corrupting ECG signals in real-life scenarios. Here, the MAE values indicate the mean of the MAE of all records in the MIT-BIH Arrhythmia database (except records 101, 117, and 200, which were dropped due to the poor performance of the SQRS and GQRS algorithms on channel 2 when the signals were clean). The algorithms exhibiting the best performance at each noise SNR are highlighted in bold and underlined in the Tables. Lead 1 outperforms lead 2 (Tables 1 and 2) in the clean signal scenario (here the signals from the dataset are assumed to be clean), as the QRS detectors work better on lead 1 than on lead 2 (due to recordings on lead 2 being mainly from the precordial ECG leads). The algorithms that exhibit better performance than the clean individual lead are highlighted in bold. The performances of the algorithms in terms of MAE when muscle artifacts corrupt channel 1, channel 2, and both channels from −20 dB to 20 dB SNR are shown in Figs. 5, 6 and 7, respectively. To aid in the comparison process, a table of steps used for each of the fusion algorithms is provided ( Table

Table 1
Mean Absolute Errors in seconds indicating the performance of fusion algorithms when lead 1 is corrupted by noise and lead 2 is clean.
a Pre Fusion lead 1 and lead 2 indicate the average of the MAEs of the RR intervals obtained from the 2 QRS algorithms per channel.

Table 2
Mean Absolute Errors in seconds indicating the performance of fusion algorithms when lead 2 is corrupted by noise and lead 1 is clean.
a Pre Fusion lead 1 and lead 2 indicate the average of the MAEs of the RR intervals obtained from the 2 QRS algorithms per channel.

Table 3
Mean Absolute Errors in seconds indicating the performance of fusion algorithms when both lead 1 and lead 2 are corrupted by noise.
a Pre Fusion lead 1 and lead 2 indicate the average of the MAEs of the RR intervals obtained from the 2 QRS algorithms per channel.

Table 4
Steps involved for RR interval estimation for each algorithm.

Table 5
Strategies for fusion methods for different scenarios.

Table 6
SQI-based algorithm selection to obtain a reliable RR interval estimate.

Table 7
Comparison with state-of-the-art RR interval estimation methods.
Moreover, since the Kalman fusion algorithm is desirable only in the low SNR regions, the computational complexity can be further reduced by employing just the α-trim mean filtering for all scenarios with good confidence.