A Self-Supervised Algorithm for Denoising Photoplethysmography Signals for Heart Rate Estimation from Wearables

Smart watches and other wearable devices are equipped with photoplethysmography (PPG) sensors for monitoring heart rate and other aspects of cardiovascular health. However, PPG signals collected from such devices are susceptible to corruption from noise and motion artifacts, which cause errors in heart rate estimation. Typical denoising approaches filter or reconstruct the signal in ways that eliminate much of the morphological information, even from the clean parts of the signal that would be useful to preserve. In this work, we develop an algorithm for denoising PPG signals that reconstructs the corrupted parts of the signal, while preserving the clean parts of the PPG signal. Our novel framework relies on self-supervised training, where we leverage a large database of clean PPG signals to train a denoising autoencoder. As we show, our reconstructed signals provide better estimates of heart rate from PPG signals than the leading heart rate estimation methods. Further experiments show significant improvement in Heart Rate Variability (HRV) estimation from PPG signals using our algorithm. We conclude that our algorithm denoises PPG signals in a way that can improve downstream analysis of many different health metrics from wearable devices.


Introduction
Photoplethysmography (PPG) is a non-invasive optical measurement technique that provides vital information about the cardiovascular system. A PPG-enabled device consists of an optical sensor that measures volumetric variations of blood circulation as a PPG signal. Modern PPG-enabled devices include a variety of technologies such as fingertip-based pulse oximeters, forehead and earlobe-based PPG sensors, and most commonly, wrist-worn smart watches (Castaneda et al., 2018). PPG monitoring can enable early detection of serious heart conditions that otherwise might go undetected (Allen et al., 2006; Pereira et al., 2020). A key application of PPG in wearable devices is the estimation of heart rate (HR) (Almarshad et al., 2022).
PPG is limited by its susceptibility to noise artifacts, including motion artifacts (MA) caused by body movements and artifacts arising from environmental factors like ambient light, sweat, and pressure (Y. Zhang et al., 2020). In order to ensure accurate HR estimates and robust diagnosis of medical conditions, it is essential to mitigate such artifacts. Methods that address this limitation for prediction of HR from PPG signals can be broadly categorized into two types. The first type estimates HR directly from the signals despite the presence of artifacts (Biswas et al., 2019; Panwar et al., 2020; Reiss et al., 2019; Shyam et al., 2019; Temko, 2017). The second type attempts to extract, denoise or reconstruct a clean signal from the noise-corrupted signal (Chang et al., 2021; Galli et al., 2018; Kasambe & Rathod, 2015; Wu et al., 2017). These methods output a clean signal that can potentially be used for multiple downstream tasks, including HR estimation, which makes them more generally useful. However, these approaches reconstruct the entire PPG signal, even if most of the signal is already artifact-free. This can distort the original signal and cause loss of morphological information even in the useful parts of the signal. Ideally, we would like a method that denoises only the noisy part of the signal, preserving the valuable information in the uncorrupted part, and provides a clean signal that can be used for accurate HR estimation and for other downstream tasks. That is the focus of the present work.
In this paper, we present a novel method for reconstructing clean PPG signals from noisy signals. It preserves the useful segments of the PPG signals that are uncorrupted, and only reconstructs the corrupted sections. This is achieved by decoupling the tasks of artifact detection and removal. We apply an artifact-detection algorithm to remove artifacts from the signal, and then use a denoising autoencoder to reconstruct the signal only in the regions where artifacts were removed. The denoised signal is then used for HR estimation using band-pass filtering and peak detection. This way, our reconstructed signals are more faithful to the truth and more useful for downstream tasks.
An interesting aspect of the proposed approach is the way it leverages publicly available data. This study relies on two sufficiently large and complex public PPG datasets, PPG-DaLiA (Reiss et al., 2019) and the Stanford dataset (Torres-Soto & Ashley, 2020). PPG-DaLiA records subjects in a wide variety of settings (e.g., walking, driving, eating, cycling). It is a high-quality dataset that contains an external means of extracting ground-truth HR using electrocardiogram (ECG) signals that are simultaneously recorded and relatively free from noise. On the other hand, the Stanford dataset is much larger, but has only PPG signals and no external way to assess ground-truth HR. Deciding how to best leverage these datasets was a challenge. We chose to use PPG-DaLiA only for out-of-sample testing purposes, since it has ECG for ground truth and thus can provide an honest assessment of HR. We extract and leverage the clean signals from within the Stanford dataset to devise a self-supervised training methodology that is able to reconstruct realistic clean signals.
The proposed method is called SPEAR (Self-supervised PPG Erase Artifacts and Reconstruct), a novel algorithm for denoising PPG signals. SPEAR learns to denoise PPG signals using the following training and evaluation paradigm, outlined in Figure 1: (1) removal of segments with artifacts using an artifact-detection algorithm, leaving only clean signal; (2) erasing random parts of the clean signal; and (3) training a denoising autoencoder to reconstruct the erased parts of the clean signal. The signal is reconstructed in such a way that only the locations that have been erased are reconstructed, and the rest of the signal is unchanged. Given a new noisy signal at test time, our method (1) applies the artifact detector, (2) erases the artifacts, and (3) reconstructs the missing pieces using the trained denoising autoencoder to form a clean signal that can be used for downstream tasks. Since it has learned to reconstruct from clean PPG signals in training, it will reconstruct clean signals during testing. We estimate HR using band-pass filtering and peak detection; this type of basic method works precisely because the PPG signal is now clean.
The experimental results reveal that traditional signal processing techniques generally achieve limited efficacy in heart rate estimation, and that supervised deep-learning methods show better estimation accuracy on the datasets they are trained on, with diminished generalizability to other datasets. SPEAR does not exhibit these limitations. Its performance on the PPG-DaLiA test set is comparable to that of deep learning methods trained on the same distribution, despite SPEAR being trained on the Stanford dataset. On a hold-out test set from the Stanford dataset, SPEAR outperforms all other methods. Most importantly, the fact that SPEAR produces clean, continuous PPG signals allows the results to be used for downstream tasks beyond heart rate estimation. This study also investigates heart rate variability (HRV) estimation as a downstream task, revealing that the accuracy of HRV estimates benefits from the utilization of the denoised PPG signals produced by SPEAR.

Related Works
Signal Quality and Artifact Detection Techniques. Various studies focus on assessing the quality of the PPG signal. Lin et al., 2019 propose a statistical approach that computes five key characteristics of the signal to determine signal quality and reject noisy outliers. Another approach (Goh et al., 2020) divides the signal into sliding windows and uses CNNs to classify whether each window contains an artifact. A study by Guo et al., 2021 approaches this task as a 1D segmentation problem and uses a convolutional network to classify noise-corrupted regions within a signal. This allows for detection of noise artifacts at a higher resolution. These approaches only detect the presence of noise and do not provide further steps for mitigating the artifacts for HR analysis. We utilize the Segade model (Guo et al., 2021) as a preprocessing step in SPEAR.
Artifact Reduction Techniques. Signal processing methods have been used to reduce artifacts in PPG signals. Specifically, discrete wavelet transforms (Kasambe & Rathod, 2015), adaptive filtering (Comtois et al., 2007; Pan et al., 2016; Wu et al., 2017) and independent component analysis (ICA) (Peng et al., 2014) have been used to perform signal denoising. Salehizadeh et al., 2016 perform sliding window-based signal denoising using spectral filtering. A recent study (Bradley & Kyriacou, 2024) proposed a non-filtering signal processing approach that uses anomaly detection to find segmentation points in the signal and remove noise artifacts. A limitation of signal processing approaches is that their performance depends on heuristic thresholds and parameters. Reiss et al. (2019) demonstrate that state-of-the-art signal processing techniques perform poorly on a larger and more comprehensive dataset (PPG DaLiA; Reiss et al., 2019) compared to the smaller IEEE dataset (Z. Zhang et al., 2015) on which they were originally tested.
Recent works have also introduced deep learning-based approaches for this problem. Lee et al. (2019) use a bidirectional recurrent autoencoder for PPG denoising trained on hand-picked clean PPG signals. DeepHeart (Chang et al., 2021) uses a denoising convolutional network followed by spectrum analysis-based calibration to perform HR estimation. In this approach, signal reconstruction is performed for small, overlapping time windows; as a result, the reconstructed clean signals cannot easily be joined together to obtain a continuous long signal. The survey by Mishra and Nirala, 2020 on PPG denoising techniques concludes that deep learning approaches perform better than signal-processing approaches for denoising signals affected by motion artifacts.
A limitation of existing noise reduction approaches is that they cannot discern when the degree of noise corruption is severe, and they may produce unexpected results if large parts of the signal are completely lost to motion artifacts (Park et al., 2022). An approach that classifies the signal into clean/noisy segments and selectively analyzes the classified sections can mitigate this issue (Park et al., 2022). Another challenge with denoising is the availability of noisy-clean signal pairs in the data, which is required for supervised learning. Such pairs are hard to obtain with PPG signals because it is not possible to record clean and noisy signals synchronously while performing certain activities. Workarounds are typically used to overcome this challenge, such as generating artificial noisy signals by adding simulated noise to clean signals (Lee et al., 2019). We will discuss how we overcome this challenge using self-supervised learning in Section 3.
Direct Heart Rate Estimation Without Denoising. A category of methods focuses on estimating heart rate (HR) directly from noisy PPG signals, without attempting to reconstruct or denoise the signal. Signal processing techniques including Wiener filtering (Temko, 2017), least-mean-square adaptive filtering (Schäck et al., 2015) and TROIKA (Z. Zhang et al., 2015) utilize accelerometer data and analyze the signals in the frequency domain. Deep learning has also been utilized for HR estimation, most commonly as a supervised learning task where ground-truth HR labels are obtained from synchronous electrocardiogram (ECG) signals. DeepPPG (Reiss et al., 2019) and PPGnet (Shyam et al., 2019) use deep convolutional networks to predict heart rate from noisy PPG signals. CorNET (Biswas et al., 2019) uses a combination of CNNs and LSTMs to predict HR from single-channel PPG signals for patient-specific models. PP-Net (Panwar et al., 2020) also uses CNNs and LSTMs for HR estimation using single-channel PPG. These approaches tend to outperform the denoising approaches for HR estimation, but do not output a denoised signal that can be utilized in downstream analysis.
Heart Rate Variability (HRV) Estimation from PPG. HRV measures the fluctuation in the time intervals between adjacent heartbeats (Shaffer & Ginsberg, 2017). HRV is used to investigate the sympathetic and parasympathetic function of the autonomic nervous system (Shaffer & Ginsberg, 2017) and has many important applications, including predicting risk of stroke (Tsuji et al., 1996), detecting arrhythmia (Tsipouras & Fotiadis, 2004) and guiding training for athletes (Singh et al., 2018). Studies that use PPG signals to estimate HRV focus only on clean signals obtained from subjects at rest (Lu et al., 2008). A recent study showed poor performance of HRV estimation from PPG signals under free-living conditions (Lam et al., 2020). Denoising PPG signals can improve the accuracy of HRV monitoring in real-world conditions, as we will show.
Denoising Autoencoders. Autoencoders are networks that learn to reconstruct their inputs from a latent representation. An autoencoder takes as input a vector x and maps it to a hidden latent representation r. The resulting latent representation r is then mapped back to a "reconstructed" vector z. The model is trained to minimize the reconstruction error between x and z. A denoising autoencoder is trained to reconstruct a clean input from a corrupted or partially destroyed one (Vincent et al., 2008). For training, the input vector is first partially corrupted, which yields a pair (x̃, x), where x̃ is the vector resulting from corrupting x. Given x̃ as input, an autoencoder network is then trained to output a reconstruction z that minimizes the reconstruction error between z and x. It is not immediately clear how one would apply a denoising autoencoder to denoise PPG signals from unlabeled data; the novelty of our approach is a framework that incorporates it.

Method
We propose a self-supervised training approach that requires only a sufficiently large collection of clean PPG signals, and does not require synchronous ECG measurements, unlike other approaches (Biswas et al., 2019; Chang et al., 2021; Panwar et al., 2020; Reiss et al., 2019; Shyam et al., 2019). Using a self-supervised training approach solves the challenge of unavailable noisy-clean training pairs required for supervised learning. This is because the model is trained to reconstruct clean signals from signals where the noise has been erased; it requires no information about the noise artifacts except for where they are located.

3.1. Training Process for SPEAR
In this subsection, we outline the methodology for training SPEAR's specialized denoising autoencoder. Figure 2 summarizes the training procedure. The process for denoising a new signal is discussed afterwards and in Figure 3.
Selecting Clean PPG Signals. The first step in preparing the training data is selecting clean PPG signals. A noise detection model is used to detect the occurrence of artifacts in the signal; any signal determined to have no corrupted regions is deemed clean. We use the Segade model (Guo et al., 2021) for this purpose (code available at: github.com/chengstark/Segade). Given a 30-second PPG signal as input, Segade predicts the regions within the signal that are corrupted by noise artifacts. Segade is the state-of-the-art segmentation model for noise detection: it outperforms other noise artifact detection methods by a large margin on the DICE score, a well-established measure of segmentation accuracy (Guo et al., 2021). Further, it has been tested on several well-known public PPG datasets (Reiss et al., 2019; Schmidt et al., 2018; Z. Zhang et al., 2015) in comparison to other approaches. The application of this model, along with experiments on SPEAR's dependence on it, is discussed in Appendix B.6.
Preparing Training Data. In the next step, a denoising autoencoder (DAE) (Vincent et al., 2008) is given a partially corrupted signal as input and trained to recover the original signal. Training a DAE requires element-wise pairs of signals (X̃, X), where X̃ is a partially corrupted or destroyed version of X. To prepare the training dataset, each clean PPG signal was partially erased at one or more continuous sub-segments. More precisely, we create mask vectors {M_i} of the same dimension as X: M_i has the value 1 everywhere except in one or more continuous patches, where it is 0. We start with M_i = 1 (a vector of all 1's). A starting point s_i and a patch length l_i were randomly selected, and the values M_i[s_i : s_i + l_i] were set to 0. The patch lengths were varied between 1 and 15 seconds, and the number of patches in a mask was set to either 1 or 2, to account for cases where a signal might be corrupted for a continuously long duration and cases where it contains multiple, shorter bursts of noise artifacts. For each signal, 10 such masks were generated, each yielding a new training sample (X ⊙ M_i, X), where X is the clean signal and ⊙ is the element-wise (Hadamard) product.
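As an illustration, the masking procedure above can be sketched in plain Python. This is a minimal sketch under the stated parameters (30-second signals at 64 Hz, 1-15 s patches, 1 or 2 patches per mask, 10 masks per signal); all function and constant names are our own, not from the paper's code.

```python
import random

FS = 64                  # sampling rate (Hz)
SIG_LEN = 30 * FS        # 30-second signals -> 1920 samples

def make_masks(sig_len=SIG_LEN, n_masks=10, fs=FS, seed=0):
    """Generate binary masks, each erasing 1 or 2 random patches of 1-15 s."""
    rng = random.Random(seed)
    masks = []
    for _ in range(n_masks):
        mask = [1.0] * sig_len
        for _ in range(rng.choice([1, 2])):           # 1 or 2 patches per mask
            patch_len = rng.randint(1 * fs, 15 * fs)  # 1-15 seconds
            start = rng.randint(0, sig_len - patch_len)
            for i in range(start, start + patch_len):
                mask[i] = 0.0
        masks.append(mask)
    return masks

def make_training_pairs(clean_signal, masks):
    """Each pair is (X ⊙ M_i, X): the erased input and the clean target."""
    return [([x * m for x, m in zip(clean_signal, mask)], clean_signal)
            for mask in masks]
```

Each clean signal thus yields 10 self-supervised training samples without any recorded noise.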
Model Architecture and Training Parameters. The denoising autoencoder consists of an encoder network that maps the input signal to its latent space, and a decoder network that reconstructs the clean signal from the latent representation. The encoder network consists of 4 convolutional layers, each followed by a ReLU activation and batch normalization. The decoder network consists of 4 Transpose Convolution-ReLU-BatchNormalization blocks. The fourth block is followed by a convolutional layer with a sigmoid activation that outputs the reconstructed signal. The encoder receives input of dimension (N, 1920), where N is the number of 30-second signals sampled at 64 Hz. Appendix A.2 provides the detailed model architecture, along with the procedure for selecting hyperparameters.
Loss was computed as the Root Mean Square Error (RMSE) between the original clean and the reconstructed signals. The model was optimized using the Adam optimizer (Kingma & Ba, 2014) and trained over 50 epochs.

3.2. Using SPEAR to Denoise PPG Signals
In Section 3.1, we discussed the training procedure for the denoising autoencoder used in our algorithm. In this section, we provide an end-to-end framework for denoising PPG signals and estimating HR using SPEAR. This process is illustrated in Figure 3.
Step 1: Artifact Detection. The first step in the signal denoising algorithm is to locate the noise artifacts. The Segade model is again used for this purpose. Similar to the preprocessing defined in Section 3.1, the signals are first split into 30-second segments sampled at 64 Hz. The signals are normalized to the [0, 1] range using min-max normalization. The Segade model receives these signal segments as input and outputs predictions for the noise artifact locations. A threshold of 0.5 is applied to determine the binary classification labels.
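The segmentation and normalization preprocessing can be sketched as follows. This is an illustrative sketch only (function names are ours); the paper's pipeline feeds the resulting 1920-sample windows to Segade.

```python
FS = 64
WIN = 30 * FS  # 1920 samples per 30-second segment

def minmax_normalize(x):
    """Scale a signal to the [0, 1] range."""
    lo, hi = min(x), max(x)
    if hi == lo:                 # constant segment: map to zeros
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]

def segment_signal(ppg, win=WIN):
    """Split a long recording into non-overlapping 30 s windows,
    normalizing each window independently (trailing remainder dropped)."""
    return [minmax_normalize(ppg[i:i + win])
            for i in range(0, len(ppg) - win + 1, win)]
```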
Signals that are excessively corrupted beyond recovery are discarded. We consider a 75% threshold of cumulative corrupted duration, beyond which a 30-second signal is considered unrecoverable. Intuitively, it may be futile to recover a 30-second signal in which 27 seconds are corrupted by noise artifacts, since too much information is lost to noise. We discuss the effect of different noise thresholds on downstream performance in Appendix B.5.
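The discard rule amounts to a simple check on the fraction of samples flagged by the artifact detector. A minimal sketch, assuming a per-sample binary artifact mask (the function name is ours):

```python
def is_recoverable(artifact_mask, threshold=0.75):
    """Keep a 30 s segment only if the fraction of samples flagged as
    artifact (mask value 1) does not exceed the threshold."""
    frac_corrupted = sum(artifact_mask) / len(artifact_mask)
    return frac_corrupted <= threshold
```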
Step 2: Signal Reconstruction. The remaining signals are used as input for the denoising autoencoder model. The locations where the Segade output has a classification label of 1 (indicating the presence of a noise artifact) are erased in the signal (set to 0). Let X_seg be the signals obtained after discarding the excessively corrupted signals in the previous step. Similarly, define Y_seg as the binary classification of noise artifacts by Segade corresponding to X_seg. Then the input to the denoising autoencoder can be obtained as:

X_in = X_seg ⊙ (1 − Y_seg)

Here, 1 is a matrix of 1's at all positions. The denoising autoencoder receives the matrix X_in as input and outputs Y_out. Intuitively, since the denoising autoencoder was trained to recover clean signals from partially corrupted ones, Y_out consists of clean PPG signals.
Step 3: Post-Processing and Heart Rate Detection. The final step is to post-process the clean signals in Y_out. First, the denoised signal is merged with the original signal X_in: the artifact-corrupted regions of X_in are replaced with the reconstructed regions of Y_out. This can be obtained as:

X_denoised = X_in ⊙ (1 − Y_seg) + Y_out ⊙ Y_seg

where Y_seg is the binary classification by Segade, which contains 1's in the noise-corrupted regions and 0's otherwise. After this, X_denoised can be utilized for downstream analysis tasks.
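The erase-and-merge logic of Steps 2 and 3 can be sketched element-wise over a single signal (the paper operates on matrices of 30-second segments; function names are ours):

```python
def erase_artifacts(x_seg, y_seg):
    """X_in = X_seg ⊙ (1 − Y_seg): zero out the artifact regions."""
    return [x * (1 - y) for x, y in zip(x_seg, y_seg)]

def merge_reconstruction(x_in, y_out, y_seg):
    """X_denoised = X_in ⊙ (1 − Y_seg) + Y_out ⊙ Y_seg:
    keep the clean samples, fill artifact regions from the DAE output."""
    return [x * (1 - y) + r * y for x, r, y in zip(x_in, y_out, y_seg)]
```

Note that only the samples flagged by Y_seg are replaced; the clean samples pass through unchanged, which is the property that distinguishes SPEAR from whole-signal reconstruction.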
To perform HR estimation, a bandpass filter with a low-end cutoff of 0.9 Hz and a high-end cutoff of 5 Hz is applied, and the signal is re-normalized to the [0, 1] range using min-max normalization. Since the resulting signal is clean, a simple peak-detection algorithm can be applied to perform HR estimation. We use the peak-detection algorithm by Elgendi et al. (2013), which is implemented in the HeartPy python package (van Gent et al., 2018).
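To make this step concrete, here is a minimal sketch of HR estimation by peak detection. The paper uses the Elgendi et al. (2013) detector via HeartPy after a 0.9-5 Hz bandpass; the naive local-maxima detector below is only an illustrative stand-in, and all names are ours.

```python
FS = 64  # sampling rate (Hz)

def detect_peaks(signal, min_dist):
    """Naive peak detector: local maxima separated by at least min_dist
    samples (a stand-in for the Elgendi et al. detector in HeartPy)."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]:
            if not peaks or i - peaks[-1] >= min_dist:
                peaks.append(i)
    return peaks

def estimate_hr(signal, fs=FS):
    """HR (bpm) from the mean peak-to-peak interval of a clean signal."""
    peaks = detect_peaks(signal, min_dist=int(0.33 * fs))  # caps HR ~180 bpm
    if len(peaks) < 2:
        return None
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(intervals) / len(intervals))
```

On a clean, denoised signal, even this simple scheme recovers the pulse rate; for example, a 1.2 Hz synthetic pulse yields an estimate close to 72 bpm.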

Experimental Setup
In this section, we describe the datasets, baselines, experimental setup and evaluation metrics for heart rate estimation.
4.1. Datasets
Two datasets were used in this study: the Stanford dataset (Torres-Soto & Ashley, 2020) and the PPG DaLiA dataset (Reiss et al., 2019); both consist of PPG recordings collected from a wrist-worn device sampled at 64 Hz. Table 1 compares the main properties of these two datasets. The Stanford dataset contains a large publicly available collection of PPG signals from wrist-worn wearables. The dataset is divided into training, validation and test sets with no subject overlap. The training set was used for training the reconstruction model. The validation set was used for hyperparameter tuning. The test set was used for testing the performance of SPEAR in comparison with baselines.
PPG DaLiA has a comprehensive data collection protocol, recording subjects of different ages while performing a variety of daily activities such as walking, cycling, and driving, among others. It also provides synchronous ECG and accelerometer recordings, which are required by some baseline algorithms. The Stanford test set and the PPG DaLiA dataset (Reiss et al., 2019) were used for out-of-sample testing and comparison against baselines.
4.2. Baselines
In this section, we introduce the 6 state-of-the-art baseline methods used for comparison. Baselines 1 and 2 use signal processing techniques and were chosen because they achieved the best performance on the IEEE Signal Processing Cup data (Z. Zhang et al., 2015). Baselines 3-6 use deep learning for HR estimation without denoising; the models are based on the works of Biswas et al. (2019), Panwar et al. (2020), Reiss et al. (2019) and Shyam et al. (2019). Although none of these studies have publicly available code, these approaches were chosen for their performance and feasibility of implementation. Implementations for other baselines (Chang et al., 2021; Comtois et al., 2007; Lee et al., 2019; Peng et al., 2014; Salehizadeh et al., 2016; Wu et al., 2017; Z. Zhang et al., 2015) are not publicly available and not reproducible.
For testing on the Stanford dataset: since this dataset does not contain ECG for ground-truth measurement, we utilized the clean signals as the source of ground truth. We introduce simulated noise into the clean signals, similar to the training procedure defined for Baseline 5, using the RRest package (Charlton, 2022). This produced clean-noisy signal pairs. Ground-truth HR was computed on the clean signal using a peak detection algorithm (Elgendi et al., 2013). Baselines 1 and 2 could not be evaluated on this dataset since they require 3-axis accelerometer data as input, which is not available in the Stanford data.
4.4. Heart Rate Variability (HRV) Estimation
In this experiment, we estimated Heart Rate Variability (HRV) from PPG signals. The goal was to evaluate whether denoised signals produced by SPEAR can be utilized for downstream tasks and provide improvements over existing methods. Several metrics are used to measure HRV. In our experiments, we focus on two: SDNN, the standard deviation of the inter-beat intervals measured in milliseconds (ms), and RMSSD, the root mean square of the successive differences between normal heartbeats, measured in ms (Shaffer & Ginsberg, 2017). A review of HRV-capable wearable devices shows that RMSSD and SDNN are the two metrics most commonly available on such devices (Hinde et al., 2021). SDNN is generally studied in clinical settings, considered the "gold standard" metric for assessing cardiac risk, and used for predicting morbidity and mortality (Shaffer & Ginsberg, 2017). HRV requires continuous measurement over a long duration; typically, a time window of 5 minutes is used for short-duration estimation (Shaffer & Ginsberg, 2017). The full PPG DaLiA dataset was used for evaluation. The signals were segmented into sliding windows of duration 5 minutes with an offset of 15 seconds. The two HRV metrics (SDNN and RMSSD) were computed from PPG signals and synchronously recorded ECG (for ground truth). Mean absolute error was computed between HRV estimates from PPG and HRV ground truth from the corresponding ECG.
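The two HRV metrics follow directly from a sequence of inter-beat (R-R) intervals; a minimal sketch of their standard formulae (function names are ours):

```python
import math

def sdnn(rr_ms):
    """Standard deviation of inter-beat (R-R) intervals, in ms."""
    mean = sum(rr_ms) / len(rr_ms)
    return math.sqrt(sum((r - mean) ** 2 for r in rr_ms) / len(rr_ms))

def rmssd(rr_ms):
    """Root mean square of successive R-R differences, in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

For example, intervals of [800, 810, 790, 805] ms give an SDNN of about 7.4 ms and an RMSSD of about 15.5 ms.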
There is a dearth of literature on directly estimating HRV from PPG, and existing denoising approaches (Chang et al., 2021) do not directly work for our HRV estimation task. This is because they perform denoising on short, overlapping (8-second) signal segments for heart rate estimation; they do not offer a way to reconstruct longer signals that can be used for HRV. Consequently, the baselines used for HR estimation could not be adapted for the HRV task, except Baseline 7. SPEAR, as well as Baseline 7, reconstruct non-overlapping 30-second signals that can be combined into continuous long-term recordings using interpolation. Thus, existing approaches based on simple peak detection can be applied for HRV estimation. For comparison, we perform HRV estimation on four variants of the PPG signals: the original signal from the dataset, the signal after applying bandpass filtering, the denoised signal from DAE_SimNoise (Baseline 7) and the denoised signal from SPEAR.
To estimate HRV from PPG, we adapt the widely used methods of Bartels et al. (2017) and Morelli et al. (2018). First, a peak detection algorithm (Elgendi et al., 2013) detects the R-peaks. Then, a moving filter is applied to remove physiologically implausible peaks. The filtering criterion differs across studies (Bartels et al., 2017; Morelli et al., 2018); we used a filter based on the Inter-Quartile Range (IQR), which rejects R-R intervals that lie outside the IQR of measured interval durations. The HRV metrics (SDNN and RMSSD) were computed using their respective formulae based on R-R intervals. The implementations of these approaches are available in the HeartPy library (van Gent et al., 2018).
4.5. Evaluation Metrics
The Mean Absolute Error (MAE) is a widely used metric in HR estimation challenges. HR is estimated on signals segmented into sliding time windows of length 8 seconds with an overlap of 6 seconds. When the signals are split into N overlapping windows, we let {BPM_est(i)}_{i=1}^{N} be the HR estimated from the PPG windows. For reconstruction techniques, this is estimated from the denoised signal. Similarly, let {BPM_true(i)}_{i=1}^{N} be the HR estimated from the ground-truth windows. Then the mean absolute error is:

MAE = (1/N) Σ_{i=1}^{N} |BPM_est(i) − BPM_true(i)|

For Heart Rate Variability, the same approach for computing estimation error is used, but the HRV metrics are computed over a 5-minute interval, based on the recommended time interval used for HRV (Shaffer & Ginsberg, 2017). The MAE is computed on the two HRV metrics, SDNN and RMSSD, using the ECG segments as ground truth.
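The windowing and error metric can be sketched as follows (an illustrative sketch of the 8 s window / 6 s overlap scheme and the MAE formula; names are ours):

```python
def sliding_windows(n_samples, fs=64, win_s=8, overlap_s=6):
    """Start indices of 8 s windows with 6 s overlap (i.e., a 2 s hop)."""
    win, hop = win_s * fs, (win_s - overlap_s) * fs
    return list(range(0, n_samples - win + 1, hop))

def mae(bpm_est, bpm_true):
    """MAE = (1/N) * sum_i |BPM_est(i) - BPM_true(i)|."""
    assert len(bpm_est) == len(bpm_true)
    return sum(abs(e - t) for e, t in zip(bpm_est, bpm_true)) / len(bpm_est)
```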


Heart Rate Estimation Results
Our main result is that the proposed algorithm performs well at the task of HR estimation from PPG signals across datasets, generally outperforming baselines. The results are summarized in Table 2. On the Stanford experiment, SPEAR outperforms all baselines and achieves the lowest MAE. In the PPG DaLiA experiment, SPEAR's MAE (5.36 bpm) is lower than that of every baseline except CNN+LSTM_HR_DaLiA (4.17 bpm), which was trained to detect HR on the PPG-DaLiA dataset. This shows that SPEAR's out-of-distribution performance is on par with fully supervised approaches trained on data from the test distribution. To measure the statistical significance of these results, paired t-tests were conducted on the HR estimation errors produced by SPEAR versus each baseline reported in Table 2. To adjust for multiple hypothesis testing against the seven baselines, a Bonferroni alpha-correction (Weisstein, 2004) was applied, yielding a target α = 0.0071 for each t-test. Each test reported a p-value < 0.0001, indicating that the observed differences in mean absolute error (MAE) between SPEAR and each baseline method are statistically significant.
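For reference, the Bonferroni-corrected per-test threshold used above is simply the family-wise significance level divided by the number of comparisons:

```python
def bonferroni_alpha(family_alpha=0.05, n_tests=7):
    """Per-test significance threshold under Bonferroni correction."""
    return family_alpha / n_tests
```

With a family-wise level of 0.05 and seven baselines, this gives 0.05 / 7 ≈ 0.0071, the threshold reported above; the observed p-values (< 0.0001) fall well below it.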
Results further indicate that deep learning-based HR estimation models (Baselines 3-6) perform better than the signal processing approaches. The models whose architecture contains both convolutional and recurrent (LSTM) layers outperform models with purely convolutional networks. However, the direct HR estimation approaches demonstrate limited generalizability across datasets. The CNN+LSTM_HR_DaLiA baseline (trained on the PPG DaLiA training data) performs very well on the PPG DaLiA test set, but shows reduced HR estimation accuracy on the Stanford test data. The CNN+LSTM_HR_Stanford baseline (trained on the Stanford training data) performs well on the Stanford test set, but poorly on the PPG DaLiA dataset. In contrast, SPEAR generalizes well to both datasets, as it has the best performance on the Stanford dataset and second-best on the PPG DaLiA dataset. It is also evident that approaches trained on PPG containing simulated noise (Baselines 5-7) show poorer performance on out-of-distribution data than SPEAR. This shows that our self-supervised technique of erasing noise artifacts, instead of simulating them, generalizes better to signals that contain noise artifacts under real-world conditions. Appendix B.1 contains an analysis of constant and variable errors, providing insights into systematic biases and measurement variability in the heart rate estimation of the compared methods.
Table 3 provides a breakdown of the HR estimation MAE results over the activities performed by the subjects. The results were computed on the PPG DaLiA dataset, which provides activity labels. To ensure sufficient data is available for each activity, we use the full PPG DaLiA dataset for testing, and report performance only for the models trained on the Stanford dataset. We note that while the Stanford dataset contains a large collection of subjects, it has relatively limited representation of high-intensity activities compared to PPG DaLiA. As a result, we expect relatively poorer performance on higher-intensity activities from methods trained on the Stanford data. For the activities "driving a car", "lunch break", "walking", and "working", SPEAR yields the smallest estimation errors by a considerable margin. Conversely, on the activity of "stair climbing", the direct HR estimation methods outperform SPEAR by a considerable margin. For the remaining activities, namely "sitting", "cycling", and "table soccer", SPEAR does not attain the lowest estimation error, but its mean absolute error closely approximates that of the most accurate baselines.
The results show that SPEAR produces accurate estimates of heart rate for subjects performing routine activities. However, it shows relatively weaker accuracy on higher-intensity activities like climbing stairs and cycling (though other algorithms performed comparably on cycling). For real-world applications, it is recommended that the algorithm's training data be expanded to include a broader variety of physical activities, including both routine movements and higher-intensity activities.
Figures 4 and 5 show examples of denoised signals produced by SPEAR. Figure 6 shows two examples from the PPG DaLiA dataset, along with a visualization of peak detection on the noisy and denoised signals. The figure demonstrates denoising under different conditions: in the first signal, the subject has a normal heart rate but introduced some motion artifacts, while in the second, the subject has an elevated heart rate. For SDNN, SPEAR achieves an improvement of approximately 62% over the original signal and 34% over the denoised signal of DAE_SimNoise. For RMSSD, SPEAR achieves an improvement of approximately 63% over the original signal and 39% over the denoised signal of DAE_SimNoise. This demonstrates that SPEAR's denoising algorithm yields significant improvements on the task of HRV estimation from PPG signals. Other supervised approaches either require ground-truth data or simulate artificial noise artifacts to train the models. Both cases limit the comprehensiveness of the training data. Training on a large and realistic dataset of clean signals enables SPEAR to learn robust denoising representations.
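As a concrete reference for the two HRV metrics discussed here, the following is a minimal sketch (our own illustration, not code from the paper) of how SDNN and RMSSD are conventionally computed from detected peak times:

```python
import numpy as np

def hrv_metrics(peak_times_s):
    """Compute SDNN and RMSSD (in ms) from peak times (in seconds).

    SDNN is the standard deviation of inter-beat (NN) intervals;
    RMSSD is the root mean square of successive interval differences.
    """
    nn = np.diff(peak_times_s) * 1000.0                 # inter-beat intervals, ms
    sdnn = float(np.std(nn))                            # overall variability
    rmssd = float(np.sqrt(np.mean(np.diff(nn) ** 2)))   # beat-to-beat variability
    return sdnn, rmssd

# Example: a perfectly regular 60 bpm pulse train has zero variability.
sdnn, rmssd = hrv_metrics(np.arange(0.0, 30.0, 1.0))
```

Because both metrics are computed from inter-beat intervals, spurious or missed peaks caused by noise artifacts directly inflate them, which is why denoising improves HRV estimates.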
Current denoising approaches reconstruct the entire signal, whereas SPEAR preserves the clean segments. This leads to better HR estimates, particularly when the source signal has only a small amount of corruption. This is evidenced by the activity-wise performance breakdown in Table 3, which shows that SPEAR achieves better estimates during daily activities like walking, working and driving. Additionally, SPEAR outputs longer (30-second) non-overlapping denoised signals that can be joined to achieve long-term continuous recordings. In comparison, other methods generally work on smaller, overlapping segments, which makes it hard to rejoin them into a continuous signal. This enables SPEAR to enhance downstream applications that require longer continuous recordings, like HRV estimation.
Applications of SPEAR. PPG technology is becoming increasingly ubiquitous with the adoption of modern wearables such as smartwatches, wristbands and smart jewellery. These devices allow users to continuously monitor heart rate throughout daily life, made possible by their simplicity of operation, cost effectiveness and comfort of use. Most modern wearable devices now also provide continuous HRV measurement (Hinde et al., 2021), which further enables many healthcare applications. PPG has several personal health applications, such as tracking blood pressure (Yoon et al., 2009) and blood oxygen saturation (Almarshad et al., 2022), monitoring sleep quality (Korkalainen et al., 2020), and guiding exercise and recovery (Singh et al., 2018; Y. Zhang et al., 2020). Continuous long-term monitoring of PPG has important clinical applications, such as diagnosing cardiovascular diseases (Allen et al., 2006) and arrhythmia (Pereira et al., 2020). However, PPG is susceptible to noise, so these applications are limited in their accuracy and reliability by the corruption of the underlying metrics obtained from the signals. Technological advances in wearable devices, sensor quality and software-based signal processing algorithms can improve the reliability of these applications (Kim & Baek, 2023). SPEAR can be integrated as a pre-processing step for any application that uses PPG recordings for predictive or analytical tasks. The SPEAR algorithm receives a continuous PPG signal of arbitrary duration, splits it into segments, rejects the few segments that are too corrupted to recover, and reconstructs the rest only in the noise-corrupted regions, preserving the useful information in the remainder of the signal. The reconstructed clean segments can then be rejoined and passed down to further tasks. Since these signals are now clean, they yield more reliable performance in downstream tasks.
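The split/reject/reconstruct/rejoin flow described above can be sketched as follows. This is a schematic illustration under our own assumptions, with the artifact detector and trained autoencoder stubbed out as placeholder callables (the toy threshold detector at the bottom is not the paper's method):

```python
import numpy as np

FS = 64                 # sampling rate (Hz)
SEG_LEN = 30 * FS       # 30-second segments, as used by SPEAR
NOISE_THRESHOLD = 0.75  # reject segments that are >75% corrupted

def spear_pipeline(signal, detect_artifacts, reconstruct):
    """Denoise a continuous PPG recording, SPEAR-style.

    `detect_artifacts(seg) -> bool mask` and `reconstruct(seg, mask) -> seg`
    stand in for the artifact detection model and the trained autoencoder.
    """
    cleaned = []
    for i in range(len(signal) // SEG_LEN):
        seg = signal[i * SEG_LEN:(i + 1) * SEG_LEN]
        mask = detect_artifacts(seg)          # True where corrupted
        if mask.mean() > NOISE_THRESHOLD:     # too corrupted to recover
            continue
        if mask.any():                        # patch only the noisy regions,
            seg = np.where(mask, reconstruct(seg, mask), seg)  # keep the rest
        cleaned.append(seg)
    # re-join accepted segments into one continuous signal
    return np.concatenate(cleaned) if cleaned else np.array([])

# Toy run: flag samples with |amplitude| > 1 as artifacts, patch them with 0.
sig = np.sin(np.arange(3 * SEG_LEN) * 2 * np.pi * 1.2 / FS)
sig[:SEG_LEN] += 5.0                          # first segment fully corrupted
out = spear_pipeline(sig,
                     detect_artifacts=lambda s: np.abs(s) > 1.0,
                     reconstruct=lambda s, m: np.zeros_like(s))
```

In the toy run, the fully corrupted first segment is rejected while the two clean segments pass through untouched, mirroring the behavior described in the text.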
Deployment. SPEAR can be deployed in personal health applications as well as clinical settings. The algorithm can be implemented in a smartphone application that integrates with the user's wearable. Such an integration only requires a device that provides access to PPG waveform data, such as the Empatica E4 or Actigraph wristwatches (see Hinde et al., 2021 for a review of the data afforded by various PPG-enabled devices). For clinical applications, the SPEAR algorithm can be integrated into existing data processing pipelines as a pre-processing step to produce clean PPG signals for use in various predictive or analytical tasks. SPEAR's code is publicly available and open source; hence, it is free to integrate into any personal or clinical application without requiring FDA approval. This serves three purposes: (1) troubleshooting can be crowdsourced, (2) it can be used as a baseline for comparison with proprietary products, and (3) developers can combine SPEAR with off-the-shelf open-source algorithms that work for clean PPG. That said, SPEAR can also be built into algorithms that apply for FDA approval and carry a greater degree of trust and reliability. Most importantly, our work will provide users the ability to glean metrics on their health without being limited to the proprietary algorithms provided by device manufacturers. This can enable the development of a variety of PPG-enabled applications available to the public.

Conclusion
We introduced a novel self-supervised learning method for eliminating noise artifacts and estimating heart rate from PPG signals collected from wrist-worn wearable devices. An advantage of our approach is that it only requires clean PPG signals for training, which allows us to use larger datasets without ground-truth labels. SPEAR outperforms baselines at HR and HRV estimation and generalizes well across datasets. This illustrates how SPEAR enables more accurate downstream analysis of many aspects of heart monitoring from wearables.
Disclosure Statement. The authors have no conflicts of interest to declare.
Acknowledgments. The present work is partially supported by NIH grant R01HL166233.

Appendix A. Implementation Details
In this section, we describe the implementations of SPEAR's model architecture and the baseline architectures used in our experiments.
Model Architecture. The denoising autoencoder in the SPEAR algorithm uses a convolutional neural network architecture. Figure A1 illustrates the model architecture for the autoencoder. The network is summarized as follows:
• The encoder consists of 4 convolutional blocks. Each block consists of a 1D convolutional layer with ReLU activation, followed by batch normalization. Each of the encoder conv layers has a stride of 2 and zero padding.
• The encoder layers have 16, 32, 64 and 128 filters respectively. The kernel sizes are 32, 64, 128 and 320 respectively.
• The decoder network consists of 5 convolutional blocks. Each of the first 4 blocks consists of a 1D convolutional transpose layer with ReLU activation, followed by batch normalization. Each of the first four conv layers has a stride of 2 and zero padding.
• The decoder layers have 128, 64, 32 and 16 filters respectively. The kernel sizes are 320, 128, 64 and 32 respectively.
• The final block consists of a convolutional layer with a single filter, kernel size 3 and stride of 1. This layer uses sigmoid activation. The output of this layer is the denoised signal, with the same dimension as the input.
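Given the strides and zero padding listed above, the temporal shapes through the network can be checked with simple bookkeeping. This sketch is our own illustration and assumes "same" padding and a 30-second input at 64 Hz (1920 samples), matching the segment length and sampling rate used elsewhere in the paper:

```python
# Each stride-2 conv halves the temporal length; each stride-2
# transposed conv doubles it. Filter counts follow the bullet list above.

def stride2_out_len(n):
    # output length of a stride-2, same-padded 1D convolution
    return (n + 1) // 2

length = 30 * 64                      # 1920 input samples (30 s at 64 Hz)
for n_filters in (16, 32, 64, 128):   # four encoder conv blocks
    length = stride2_out_len(length)
bottleneck_len = length               # latent representation: 120 steps x 128 channels

for n_filters in (128, 64, 32, 16):   # four transposed-conv decoder blocks
    length = 2 * length
output_len = length                   # final conv (1 filter, stride 1) keeps the length
```

The symmetric strides guarantee the sigmoid output has the same 1920-sample length as the input, which is what lets the denoised segment replace the original one-for-one.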
Hyperparameter Tuning. Model hyperparameters were optimized via grid search over values for the kernel sizes, number of filters and number of convolutional layers in the encoder and decoder networks. The hyperparameters were optimized based on the downstream task of minimizing mean absolute error in HR estimation on the Stanford validation dataset. The evaluated values for kernel sizes were {8, 16, 32, 64, 128, 320}. The evaluated values for number of filters were {8, 16, 32, 64, 128, 256}. We tested model architectures where successive encoder layers had either incrementally increasing or decreasing values for the size and number of filters. We also varied the number of convolutional layers by testing models with 2, 3 and 4 layers. In all tests, the decoder mirrored the encoder configuration.
The Variable Error of SPEAR is the lowest on the Stanford dataset, which shows that its HR estimation results are relatively more consistent than those of other methods. On the PPG DaLiA experiment, SPEAR's VE (9.9 bpm) is lower than that of all methods except CNN+LSTM_HR_DaLiA (6.4 bpm). Thus, SPEAR has relatively lower bias but relatively higher variability on the PPG DaLiA experiment than the best-performing method, which was trained on data from the test distribution.
B.2. Ablation and Sensitivity Analysis. In this section, we perform ablation studies to verify the effectiveness of each component of SPEAR's algorithm design. We define four variants of SPEAR:
• SPEAR-LSTM has the same architecture as SPEAR, but adds an LSTM layer in the encoder network. Since LSTM-based architectures had superior performance in the HR estimation baselines (CNN+LSTM_HR_DaLiA and CNN+LSTM_HR_Stanford), this comparison was used to see whether similar improvements would be found in our denoising model as well.
• SPEAR-N is trained such that the artifact locations are replaced with Gaussian noise instead of being set to 0. In this case, the autoencoder does not receive a 0 signal at the location of the artifact, so it reconstructs the full signal, not just the corrupted regions.
• SPEAR-Sm follows the same training procedure as SPEAR, but uses smaller kernel sizes for the convolutional layers in the model architecture.
• SPEAR-L removes the first two convolutional layers from the encoder and the last two layers from the decoder in SPEAR's model architecture.
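The difference between SPEAR's erasure strategy and the SPEAR-N variant can be illustrated with a small sketch (the helper name and toy data are our own, not part of the released code):

```python
import numpy as np

def mask_artifacts(segment, artifact_mask, mode="zero", rng=None):
    """Prepare a training input by erasing artifact regions.

    SPEAR sets corrupted samples to 0, giving the autoencoder an
    unambiguous "erased" marker; the SPEAR-N variant fills them with
    Gaussian noise instead, so the model cannot tell erased regions
    from signal and ends up reconstructing everywhere.
    """
    out = segment.copy()
    if mode == "zero":
        out[artifact_mask] = 0.0
    elif mode == "gaussian":
        rng = rng or np.random.default_rng()
        out[artifact_mask] = rng.normal(0.0, 1.0, size=int(artifact_mask.sum()))
    return out

seg = np.linspace(0.2, 0.8, 8)       # a toy "clean" segment
mask = np.zeros(8, dtype=bool)
mask[2:5] = True                     # the artifact region
zeroed = mask_artifacts(seg, mask, mode="zero")
```

In both modes the clean samples are left untouched; only what fills the artifact region differs.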
The results are shown in Table A1, indicating that SPEAR is not sensitive to changes in kernel size or the addition of Gaussian noise in the input, but it is sensitive to major ablations such as the removal of convolutional layers. We also see that adding an LSTM layer to the SPEAR architecture does not offer any improvements, and as such does not provide the same benefits of added complexity that were seen in the HR estimation baselines.
Figure A2 reports the runtime comparison. Results show that the CNN-only ML approaches are faster than the CNN+LSTM approaches. On a relatively small recording, SPEAR's runtime is better than that of the other baselines (except Kalman Filtering), but on the large Stanford dataset, SPEAR tends to take longer than the baselines. The comparatively longer runtimes of SPEAR and DAE_SimNoise can be explained by the fact that they are generative algorithms that perform signal reconstruction and use a model architecture involving encoder and decoder networks.
B.5. Effect of Noise Thresholds on HR Estimation. The first step in the SPEAR algorithm is to determine the noise levels in the signal and discard any signal segments that are too corrupted. In the analysis above, we use a noise threshold of 0.75, meaning that a signal segment is rejected if more than 75% of it is corrupted. In this section, we analyze the effect of changing this threshold. Clearly, by reducing the threshold, we are likely to achieve better heart rate estimation (this is visualized in Figure A3). But by reducing this threshold, we also lose more of the recordings that may be partially corrupted but still recoverable. We compared the effects of varying the noise threshold over several values, and measured what percentage of the recordings is eliminated for each threshold. We conducted this experiment on the PPG DaLiA dataset since it provides ground-truth ECG data. Table A3 shows the result of this experiment. We can see that most signals in this dataset are fairly noisy. The intuitive inverse relationship between the accuracy of HR estimation and how much of the noisy signal is removed is evident from the table. There isn't a clear "best" choice of noise threshold; in fact, the ideal choice should depend on the use case. For applications that require high accuracy, such as healthcare diagnosis, a lower noise threshold (like 0.5) would be preferable. On the other hand, for daily tracking, we can choose to preserve more of the recordings with a larger threshold (like 0.75) and provide HR estimates that are slightly less accurate (though still more accurate than those of other state-of-the-art methods).
We conducted another experiment to compute the Reconstruction Ratio: a metric we define to measure the degree of signal reconstruction by SPEAR. The Reconstruction Ratio represents the percentage of the input signal that SPEAR identifies as being corrupted by noise artifacts and subsequently reconstructs. Intuitively, we would expect the Reconstruction Ratio to increase as the noise threshold increases, since we allow signals with a greater level of corruption to be denoised. Table A3 reports this metric for the PPG DaLiA and Stanford test datasets.
B.6. Sensitivity to Noise Detection Accuracy. The Artifact Detection step of SPEAR uses a noise detection model to locate the artifacts within a signal. For this purpose, we chose the Segade model (Guo et al., 2021). We conducted an experiment to test the sensitivity of HR estimation to the performance of the noise detection algorithm. In the PPG DaLiA test set, for each signal, we added some false positives to Segade's predictions. Effectively, this increases the identified "noise" in the signal by predicting some of the clean regions as noisy. These predictions are then passed down the SPEAR pipeline, and heart rate is predicted after denoising. We found that by randomly adding 5% false positives to the signals, the mean absolute error in HR increases from 5.36 to 5.97 bpm. This modest increase is attributable to the fact that there is now more "noise" in the signal that the algorithm must reconstruct. Since SPEAR is able to reconstruct clean signals in the deleted noisy regions, this shows that its performance is not sensitive to false positives in the noise detector's predictions.
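A minimal sketch of the Reconstruction Ratio and of the false-positive injection used in this experiment (our own illustration; the exact injection procedure used in the paper may differ):

```python
import numpy as np

def reconstruction_ratio(artifact_mask):
    # fraction of the input signal flagged as corrupted (and thus reconstructed)
    return float(artifact_mask.mean())

def add_false_positives(artifact_mask, fraction, rng):
    """Flip a given fraction of the signal's clean samples to 'noisy',
    mimicking false positives in the artifact detector's predictions."""
    mask = artifact_mask.copy()
    clean_idx = np.flatnonzero(~mask)
    n_flip = int(round(fraction * mask.size))
    flip = rng.choice(clean_idx, size=min(n_flip, clean_idx.size), replace=False)
    mask[flip] = True
    return mask

rng = np.random.default_rng(7)
mask = np.zeros(1000, dtype=bool)
mask[:200] = True                         # 20% of the signal is corrupted
noisier = add_false_positives(mask, 0.05, rng)
```

Adding false positives only grows the region the autoencoder must reconstruct; the originally flagged samples stay flagged.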
Since our approach does not attempt to reconstruct clean parts of the signal, a limitation of SPEAR is that it may be sensitive to false negatives in the noise detector's predictions.
Transfer learning (Zhuang et al., 2021) can improve performance of the noise detection model on new data, if some noise artifact labels are available.Post-processing model calibration (Wang et al., 2023) can be used to obtain empirical prediction probabilities; the model can be tuned to classify only higher probability predictions as noise, thus decreasing the false negative rate.Though this may come at the cost of higher false positives, we showed that SPEAR is robust to false positives.
Since Segade achieves good accuracy on our experimental data, we deem experiments on these approaches out of scope for this paper.
B.7. Importance of Signal Imputation. The Signal Reconstruction step of SPEAR removes the noisy regions of the signal and imputes them using a denoising autoencoder model to construct a clean signal. This raises the question of whether the imputation step is required at all: what if we simply delete the noise artifacts and compute HR from the remaining signal?
We conducted an experiment to analyze this. We define a baseline, SPEAR-NoImpute, that runs the noise detection model and deletes the noise-corrupted regions from the signals. It then re-joins the remaining segments to form a continuous signal (this joined signal could have discontinuities of its own, since segments were joined at arbitrary positions). The corresponding ECG recordings are used to compute the ground truth. We ran the set of experiments from Section 4.3 to compare performance on HR estimation tasks with SPEAR.
We compare the results of the experiment for both the Stanford and DaLiA test datasets in Table A3. The results show that SPEAR-NoImpute performs significantly worse on heart rate estimation than SPEAR. This shows that imputation is indeed an important step for signal denoising. The reason is that signals can contain many noise artifacts at arbitrary locations (for example, Figures A6 and A7), and simply joining the signal segments leads to unpredictable positions of the R-peaks. Though the individual segments are clean, joining them together can create discontinuities at various points in the signal, leading to errors in peak detection and HR estimation.
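The SPEAR-NoImpute baseline amounts to deleting the masked samples and splicing the remainder together, which the following toy sketch illustrates (our own code, not the released implementation):

```python
import numpy as np

def no_impute(signal, artifact_mask):
    """SPEAR-NoImpute ablation: delete the noisy regions and splice the
    remaining clean samples directly together (no reconstruction)."""
    return signal[~artifact_mask]

# A clean pulse train with a burst of noise in the middle.
fs = 64
t = np.arange(10 * fs) / fs
sig = np.sin(2 * np.pi * 1.2 * t)
mask = np.zeros_like(sig, dtype=bool)
mask[300:340] = True                  # the corrupted region
spliced = no_impute(sig, mask)
# The splice joins samples 299 and 340 directly, creating a phase
# discontinuity that shifts every later peak and corrupts HR/HRV estimates.
```

Even though each retained segment is clean, the splice point itself behaves like an artifact for any peak detector, which is the failure mode the imputation step avoids.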
Table A3. Comparison in HR estimation tasks between SPEAR and its variant that does not impute the deleted noise regions.

Figure 1. A schematic diagram of the proposed algorithm SPEAR. The left branch outlines the self-supervised training procedure for the denoising autoencoder. The right branch shows how a corrupted signal is denoised by SPEAR using the trained denoising autoencoder and an off-the-shelf artifact detection tool.

Figure 2. Training: Training Procedure for the Denoising Autoencoder

Figure 3. Evaluation: Denoising Test PPG Signals Using the SPEAR Algorithm.

Figure 4. An example of denoising on a 30-second signal segment from the Stanford data (sampling rate: 64 Hz). Top: signal corrupted by noise artifacts (highlighted in red). Bottom: the denoised signal produced by SPEAR.

Figure 5. An example of denoising on a 30-second signal segment from the PPG DaLiA data (sampling rate: 64 Hz).

Figure 6. Two examples of 30-second signal segments from the PPG DaLiA dataset. In both examples, the first signal is the noisy PPG signal (heart rate HR_noisy), with the red regions highlighting the detected noise artifacts. The second signal is the denoised signal produced by SPEAR, with heart rate HR_denoised. The third signal is the ground-truth ECG signal with heart rate HR_true. The green dots indicate detected R-peaks.

Compared to the ground truth, denoising the signals achieves improved estimates over the original and bandpass-filtered signals, as seen for both SPEAR and the baseline DAE_SimNoise. SPEAR achieves the lowest error in both HRV metrics out of all the signals compared in the experiment.

Figure A1. Model Architecture for the denoising autoencoder used in SPEAR.
Figure A2. Model Runtime Comparison.

Figure A3. Relationship between the noise threshold chosen in SPEAR's Artifact Detection step and HR estimation.

The denoising examples from the Stanford dataset consist of signals that are corrupted with real noise artifacts. This is in contrast with the Stanford experiment in Section 4.3, where simulated noise was added to generate clean-noisy pairs. For visualization purposes, we demonstrate real noisy signals from the Stanford dataset.

Figure A4. Example 1 of denoising from the Stanford test set.

Figure A5. Example 2 of denoising from the Stanford test set.

Figure A6. Example 3 of denoising from the Stanford test set.

Table 1. Properties of the two datasets used in our study. A sample corresponds to a PPG signal of length 30 seconds.
• Baseline 1: Wiener Filtering and Phase Vocoder (WFPV). This baseline is based on Temko (2017). It uses the three-axis accelerometer signals to estimate the noise signature and applies a Wiener filter to attenuate noise components in the frequency domain. A Phase Vocoder is used to estimate HR. The code for this approach is publicly available at https://github.com/andtem2000/PPG.
• Baseline 2: Kalman Filtering. This baseline is based on Galli et al. (2018) and Elgendi et al. (2013). It is a filtering technique that produces a reconstructed PPG signal over small time windows. It performs signal decomposition over the PPG and three-axis accelerometer signals and reconstructs clean PPG based on the degree of correlation of the PPG with the accelerometer signals. Kalman smoothing is used for HR estimation from the reconstructed signal. The code for this approach is publicly available at https://github.com/AlessandraGalli/PPG.
• Baseline 3: CNN_HR_DaLiA. This baseline model uses supervised learning to estimate HR directly from sliding time windows over a noisy PPG signal. Our version of this baseline was based on DeepPPG (Reiss et al., 2019) and PPGNet (Shyam et al., 2019). The single-channel PPG signals in the PPG DaLiA dataset were used for training, and HR ground-truth labels were obtained from the synchronously recorded ECG.
• Baseline 4: CNN+LSTM_HR_DaLiA. This baseline follows the same approach of direct HR estimation on sliding windows as Baseline 3. Our version of this baseline is based on the PP-Net (Panwar et al., 2020) and CorNET (Biswas et al., 2019) models. It is trained using the same procedure as Baseline 3; the only difference is in the model architecture, which uses a combination of convolutional and Long Short-Term Memory (LSTM) layers.
• Baseline 5: CNN_HR_Stanford. This model is architecturally identical to Baseline 3, but was trained on the Stanford training set. Since the Stanford dataset includes a significantly larger collection of signals, it was important to establish comparisons with baselines trained on similar data as SPEAR. However, since the Stanford dataset does not contain ECG signals for ground-truth labels, the technique of clean signal selection and simulated corruption was used to generate noisy-clean training pairs. Clean signals were selected using the technique defined in Section 3.1, and simulated noise artifacts were added using the RRest toolbox (Charlton, 2022): a combination of Frequency Modulation (FM) and Baseline Wander (BW) noise was added to clean signals, while ensuring that no more than 75% of the signal is corrupted, to match the maximum degree of corruption expected by SPEAR. HR labels were generated on the clean PPG pairs using Elgendi et al. (2013)'s technique.
• Baseline 6: CNN+LSTM_HR_Stanford. This model is trained on the Stanford training set using the same procedure as Baseline 5. The model uses a combination of convolutional and LSTM layers and has an identical architecture to Baseline 4.
• Baseline 7: DAE_SimNoise. This is a denoising model based on the training approach of Lee et al. (2019). To train this model, noisy-clean pairs of PPG signals were generated by selecting clean PPG signal segments of duration 30 seconds and adding simulated noise. The noise simulation procedure was similar to that of Baselines 5-6, where a combination of FM and BW noise was added while ensuring total corruption is under 75%. A training dataset was generated from the Stanford data such that the number of samples was roughly equal to SPEAR's training set. A denoising autoencoder with an identical architecture to SPEAR was trained to denoise the signals.
(1) PPG DaLiA Experiment: The PPG DaLiA dataset was divided into a training and test set. The test set contains signals from subjects 1, 14 and 15. The signals were first split into non-overlapping 30-second segments. Segments that were more than 75% corrupted were discarded (as in our method). The corresponding ECG and three-axis accelerometer signals were segmented in the same way. The accepted signals were joined into one longer signal and used for testing the baselines. For Baselines 1 and 2, the continuous subject-wise signals were used as input. For Baselines 3-6, the signals were segmented into 8-second overlapping windows (with an overlap of 6 seconds). Heart rate estimation was performed on the PPG signals, using the ECG for ground-truth labels. (2) Stanford Dataset Experiment: The Stanford test set was used for this experiment.

Table 2. Mean Absolute Error (MAE) (± Standard Deviation) in bpm for heart rate estimation. Baselines 1 and 2 are signal processing methods and are not trained on a dataset. Baselines 3, 4, 5 and 6 were trained for heart rate estimation. SPEAR and Baseline 7 were trained for denoising. The best performance among all methods is in bold.

Table 3. Mean Absolute Error on HR estimation: activity breakdown for PPG DaLiA. Best performance highlighted in bold.

Table 4. Mean Absolute Error (± Standard Deviation) in milliseconds for HRV estimation.

Discussion of Results. Experimental results show that SPEAR often outperforms the state-of-the-art for HR estimation and generalizes across datasets better than other approaches. This is not unexpected: SPEAR's self-supervised approach allows training of the reconstruction model on a significantly larger dataset, since it only requires clean signals to train.

Table A1. Constant Error (± Variable Error) in bpm for heart rate estimation, tested on the PPG DaLiA and Stanford test datasets.

Baselines 3 (CNN_HR_DaLiA) and 5 (CNN_HR_Stanford) use a convolutional neural network for HR prediction with an identical model architecture. The model consists of two Convolutional-ReLU-MaxPool-Dropout blocks. The convolutional layers are one-dimensional with a kernel size of 9, and have 64 and 32 filters respectively. The max pooling layers have a size of 4, and dropout was used with probability 0.1. The two blocks are followed by a fully connected layer that outputs a single HR prediction. The training data consisted of overlapping time windows of PPG signals: 8-second windows were generated in a sliding-window fashion with a 2 s interval (6 s of overlap). The model was trained for 100 epochs.

B.1. Constant and Variable Error Analysis. In this section, we report and analyze the constant and variable error results from the heart rate estimation experiments. Constant Error (CE) is the mean of the measurement errors, taking into account the directionality of the error. It helps identify systematic biases in the model that may consistently skew measurements in a particular direction.
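The sliding-window generation described above can be sketched as follows (our own illustration of the stated window and stride lengths; the function name is hypothetical):

```python
import numpy as np

def sliding_windows(signal, fs=64, win_s=8, step_s=2):
    """Generate overlapping training windows as described above:
    8-second windows taken every 2 seconds (6 s of overlap)."""
    win, step = win_s * fs, step_s * fs
    n = (len(signal) - win) // step + 1
    return np.stack([signal[i * step:i * step + win] for i in range(n)])

sig = np.arange(30 * 64, dtype=float)   # a 30-second toy "signal"
wins = sliding_windows(sig)             # 12 windows of 512 samples each
```

A 30-second segment at 64 Hz yields 12 such windows, each paired with one HR label for training the direct-estimation baselines.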

Table A1. Ablation & Sensitivity Analysis Results.

Variable Error (VE) is the standard deviation of the measurement errors. It provides insight into the consistency and stability (or lack thereof) of the measurements. Table A1 reports the constant and variable errors of the compared methods from the heart rate estimation experiments on the PPG DaLiA and Stanford datasets. SPEAR has a CE of −0.03 bpm on the PPG DaLiA experiment, which indicates a lack of directional bias. It has a CE of 2.33 bpm on the Stanford experiment, which indicates a slight bias toward overestimating HR. In the Stanford experiment, the direct HR estimation baselines trained on the Stanford dataset (Baselines 5 and 6) demonstrate relatively lower bias in HR estimation, but relatively higher variability compared to SPEAR.
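For reference, CE and VE can be computed from paired predictions and ground truth as follows (our own illustration of the definitions above):

```python
import numpy as np

def constant_variable_error(hr_pred, hr_true):
    """Constant Error (CE): signed mean of errors, exposing systematic bias.
    Variable Error (VE): standard deviation of errors, exposing inconsistency."""
    err = np.asarray(hr_pred, dtype=float) - np.asarray(hr_true, dtype=float)
    return float(err.mean()), float(err.std())

# A predictor that runs exactly 2 bpm high has CE = 2 and VE = 0:
ce, ve = constant_variable_error([72, 82, 92], [70, 80, 90])
```

Unlike MAE, CE preserves the sign of the error, so over- and under-estimates do not cancel out of the variability measure.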

Table A1. Mean Absolute Error on HR estimation: subject-wise breakdown for PPG DaLiA. The best algorithm's performance for each subject is in bold.

B.4. Runtime Analysis. In this section, we compare the runtime of each of the algorithms. We ran each of the algorithms on the full PPG DaLiA and Stanford test datasets. On the PPG DaLiA dataset, we measured the average runtime of the algorithm over the subjects' full recordings. On the Stanford dataset, we computed the runtime of the algorithms over the entire test dataset, averaged over 10 runs. The experiments were conducted on a 2021 MacBook Pro with the M1 Pro chip and 16 GB of RAM.
B.3. Subject-wise Breakdown of MAE Results. Table A1 provides a subject-wise breakdown of HR estimation results for the full PPG DaLiA dataset. For fairness of comparison, we include only the baselines that did not use the PPG DaLiA dataset for training. SPEAR achieved the lowest HR estimation error on 13 out of 15 subjects, and the second-lowest error on the other 2.

Table A3. Effect of the noise threshold on testing sample size and HR estimation on the PPG DaLiA dataset. The second column shows the percentage reduction in the number of signal recordings for a given noise threshold. The third column shows the mean absolute error in HR estimation on the remaining signals.

Table A3. Effect of the noise threshold on the Reconstruction Ratio (RR).

Segade achieves strong segmentation accuracy, generalizes across multiple datasets, and is available as an open-source repository; hence, it makes an appropriate candidate for the SPEAR algorithm.