Pulse rate estimation using imaging photoplethysmography: generic framework and comparison of methods on a publicly available dataset

Objective: to establish an algorithmic framework and a benchmark dataset for comparing methods of pulse rate estimation using imaging photoplethysmography (iPPG). Approach: first we reveal essential steps of pulse rate estimation from facial video and review methods applied at each of the steps. Then we investigate performance of these methods for DEAP dataset www.eecs.qmul.ac.uk/mmv/datasets/deap/ containing facial videos and reference contact photoplethysmograms. Main results: best assessment precision is achieved when pulse rate is estimated using continuous wavelet transform from iPPG extracted by the POS method (overall mean absolute error below 2 heart beats per minute). Significance: we provide a generic framework for theoretical comparison of methods for pulse rate estimation from iPPG and report results for the most popular methods on a publicly available dataset that can be used as a benchmark.


Introduction
Heart rate (HR) is an important indicator of functional status, psycho-emotional state and health conditions in general. Traditionally HR is estimated from electrocardiogram or photoplethysmogram (PPG); however, both techniques require contact sensors, which can be disadvantageous (Perry and Watkins, 2011), whereas non-contact HR estimation is useful, for example, for detecting driver drowsiness or abnormal state (Sahayadhas et al., 2012).
For a non-contact assessment of pulse rate (equivalent of HR obtained from indirect peripheral measurements) imaging photoplethysmogram (iPPG) analysis has been proposed (Takano and Ohta, 2007; Verkruysse et al., 2008). Similarly to contact PPG, iPPG is acquired by measuring variations in the intensity of light reflected by the skin (see Allen (2007); Tamura et al. (2014) for details), but a video camera is used instead of a simple photodetector. The iPPG is then computed from a sequence of images, usually acquired from the face or palm. Theoretical underpinnings of imaging photoplethysmography are provided in Hülsbusch (2008); Kamshilin et al. (2015); Wang et al. (2017).
Rapid development of iPPG analysis (Tarassenko et al., 2014; McDuff et al., 2015; Sun and Thakor, 2016) emphasizes the importance of comparing various algorithms for iPPG-based pulse rate estimation. Theoretical comparison is complicated since algorithms for iPPG acquisition consist of multiple steps that are often non-uniformly described. For empirical comparison a publicly available benchmark dataset is required, since pulse rate estimates reported in different studies are not comparable due to the differences in experimental conditions. However, to the best of our knowledge, no suitable dataset has been proposed for iPPG benchmarking¹.
To overcome the problems of comparing algorithms we suggest a generic algorithmic framework describing main steps of iPPG-based pulse rate estimation; we discuss popular methods employed at various steps and compare their performance on a publicly available dataset (Koelstra et al., 2012), containing facial video and reference contact PPG. We report experimental results demonstrating how the choice of the methods for each step influences overall quality of pulse rate estimation.
Our framework consists of five steps². Methods used at single steps of pulse rate estimation were previously compared in (Holton et al., 2013; Cui et al., 2015; Wang et al., 2017); here we combine methods used at each of the five steps to find their optimal configurations.

Dataset Description
The Dataset for Emotion Analysis using EEG, Physiological and video signals (DEAP, Koelstra et al. (2012)) contains physiological recordings and frontal face videos of 22 human volunteers watching music videos in 40 one-minute trials. We denote trials as Px Ty, where x is the number of the participant in DEAP and y is the number of the trial. Altogether, the DEAP dataset consists of 874 one-minute trials with facial video and reference contact PPG data (37 trials for P11; 39 for P3, P5 and P14; 40 for the other participants). We reject 13 trials where a large part of the face was occluded (P4 T17; P6 T24; P12 T14, T18; P15 T12, T16, T23; P18 T4, T10; P22 T13, T18-T20), since for these videos stable iPPG acquisition was impossible; this leaves 861 trials for analysis.
Videos were recorded in DV PAL format using a SONY DCR-HC27E camcorder and transcoded to 50 FPS deinterlaced video using the h264 codec. The resolution of all the videos is 720 × 576.
Contact PPG was acquired from the left thumb. We computed reference pulse rate values from PPG by determining intervals between diastolic minima (Schäfer and Vagedes, 2013) using the method proposed in Elgendi et al. (2013)³.
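As a minimal sketch of the last step, momentary reference pulse rate follows from the intervals between successive diastolic minima as 60 divided by the interval in seconds. The function name and the minima times below are hypothetical illustration values, not DEAP data, and the minima detection itself (Elgendi et al., 2013) is not reproduced here.

```python
def pulse_rate_from_beats(beat_times_s):
    """Momentary pulse rate (BPM) from times of successive beats (seconds)."""
    intervals = [t1 - t0 for t0, t1 in zip(beat_times_s, beat_times_s[1:])]
    # One cardiac cycle per interval, hence 60 / interval beats per minute.
    return [60.0 / interval for interval in intervals]


# Hypothetical times of detected diastolic minima, in seconds.
minima = [0.0, 0.8, 1.6, 2.4, 3.4]
rates = pulse_rate_from_beats(minima)
mean_rate = sum(rates) / len(rates)
print([round(rate, 2) for rate in rates], round(mean_rate, 2))
```

Averaging such momentary values over a time window gives a reference value comparable with window-based estimates.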

Methods
In this section we propose a generic algorithmic framework of iPPG-based pulse rate estimation. It takes as an input a sequence of T RGB frames; the t-th frame for t = 1, 2, . . . , T consists of pixels given by vectors $c_{i,j}(t) = \big(r_{i,j}(t), g_{i,j}(t), b_{i,j}(t)\big)^\intercal$, where $r_{i,j}(t)$, $g_{i,j}(t)$ and $b_{i,j}(t)$ are the red, green and blue channels for the pixel with coordinates (i, j); $v^\intercal$ stands for the transposed vector v. The algorithm consists of five steps schematically shown in Fig. 1; below we consider them in detail.
1 Uncompressed video normally used for iPPG acquisition is too large for publishing on-line. An attempt to compare existing algorithms on a publicly available MAHNOB dataset  was made in (Li et al., 2014). However, this dataset seems to be unsuitable for iPPG benchmarking as the videos underwent strong compression making consistent iPPG extraction impossible (Wang et al., 2017).
2 A similar three-step framework was proposed in (Rouast et al., 2016), but that article gives an overview of iPPG acquisition while here we focus on the algorithmic details of iPPG processing steps.
3 When applying this method to contact PPG from DEAP, we found that it does not detect some diastolic minima since their amplitudes vary significantly. To alleviate this problem we introduce two modifications. First, to detect minima with varying amplitudes we determine the offset level α (Elgendi et al., 2013, Eq. (7)) not as the mean of the whole signal, but as a running mean over a window of 7 s. Second, to reject false positives that may arise from the first modification we add a post-processing step: after the method detects diastolic minima $DM_1, DM_2, \ldots, DM_N$, we reject $DM_i$ if it satisfies a rejection condition whose coefficients 2/3 and 2.3 are selected empirically.

1. For every frame t = 1, 2, . . . , T select the region of interest ROI(t) as a set of pixels containing PPG-related information, and compute average color intensities over ROI (color signals):
$$c^0(t) = \frac{1}{|ROI(t)|} \sum_{(i,j) \in ROI(t)} c_{i,j}(t), \qquad (1)$$
where |ROI(t)| is the number of pixels in ROI(t) (see Subsection 2.2.1 for ROI(t) selection).
2. Refine the color signals by pre-processing: compute $c(t) = \big(r(t), g(t), b(t)\big)^\intercal$ from $c^0(t)$ by suppressing noise and artifacts (see Subsection 2.2.2).
3. Extract raw iPPG as a combination of refined color signals:
$$iPPG^0(t) = w_r(t)\, r(t) + w_g(t)\, g(t) + w_b(t)\, b(t),$$
with weights $w_r(t), w_g(t), w_b(t) \in \mathbb{R}$ (see Subsection 2.2.3 for weights calculation).
4. Post-process the raw iPPG to obtain the final iPPG signal (see Subsection 2.2.4).
5. Estimate pulse rate from the iPPG signal.
We test several popular methods for every step of the estimation algorithm (Figure 2) in order to find out which combinations of methods provide the most precise pulse rate estimation.
Figure 2: Scheme of considered methods. Big blocks represent five steps of pulse rate estimation (see Figure 1), each box inside the block represents a sub-step and contains a list of methods used at this sub-step. We try various combinations of methods, each time taking one method for every sub-step.

Selecting Region of Interest
To compute color signals $c^0(t)$ by (1), color intensities $c_{i,j}(t)$ are averaged over ROI. As PPG-induced variations of facial color are weak in comparison with noise and artifacts, the aim of ROI selection is to choose pixels containing maximal pulsatile information, so that averaging reduces noise while preserving the iPPG signal. ROI selection consists of two sub-steps: initial choice of facial region for iPPG acquisition (ROI choice) and excluding irrelevant pixels (ROI refinement).
ROI choice. The most popular approach is to take ROI as a rectangle encompassing the whole-face region (Lewandowska et al., 2011; Poh et al., 2011; de Haan and Jeanne, 2013; Mannapperuma et al., 2014). Other popular regions are the whole face excluding the eye region (McDuff et al., 2014b; Li et al., 2014) and the forehead (Verkruysse et al., 2008).
In DEAP dataset for some participants EEG cap covers most of the forehead, which hinders using the forehead region; therefore we consider the whole-face region and the facial region below eyes. In both cases we detect facial rectangle for each frame by the commonly-used cascade classifier (Lienhart, 2000) constructed by means of the Viola-Jones algorithm (Viola and Jones, 2001). We take the width of ROI equal to 80% of the estimated face width as recommended in (Poh et al., 2011).
ROI refinement. Even when ROI is selected properly, some pixels may not contain iPPG signal. Examples include non-skin pixels (for instance, hair), over- or under-lit areas, and damaged pixels in the sensor. ROI-refinement methods are used to exclude such pixels; here we consider two of them.
First, non-skin pixels are discarded. This is an essential part of ROI refinement for DEAP since in many videos cables hang in front of participants' faces. We use simple HSV masking⁴: pixels with hue, saturation or value outside of the ranges [0°, 46°], [23, 132] and [88, 255], respectively, are considered non-skin and discarded (ranges are selected empirically as providing effective skin selection for the entire dataset).
Then we reject pixels that differ considerably from other pixels in ROI (outliers). Namely, we discard pixels (i, j) that do not satisfy the inequality of Tasli et al. (2014), which depends on a parameter γ. In (Tasli et al., 2014) γ = 3 is used; since this value does not provide effective outlier rejection for DEAP videos, we take γ = 1.5.
Another important part of ROI refinement is motion compensation (Kumar et al., 2015; Wang et al., 2015). Here we do not use it since there are no prominent head movements in videos from the DEAP dataset.
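The skin selection and the averaging over ROI can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: hue is assumed to be measured in degrees on a 0-360 scale, saturation and value on a 0-255 scale, and the pixel values and function names are hypothetical. Outlier rejection is omitted.

```python
import colorsys

HUE_RANGE = (0.0, 46.0)      # degrees (assumed 0..360 hue scale)
SAT_RANGE = (23.0, 132.0)    # assumed 0..255 scale
VAL_RANGE = (88.0, 255.0)    # assumed 0..255 scale


def is_skin(rgb):
    """Return True if an 8-bit (r, g, b) pixel falls inside all three HSV ranges."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)          # each component in [0, 1]
    hue_deg, sat, val = h * 360.0, s * 255.0, v * 255.0
    return (HUE_RANGE[0] <= hue_deg <= HUE_RANGE[1]
            and SAT_RANGE[0] <= sat <= SAT_RANGE[1]
            and VAL_RANGE[0] <= val <= VAL_RANGE[1])


def mean_color(pixels):
    """Average color over the skin pixels of one frame; None if none is left."""
    skin = [p for p in pixels if is_skin(p)]
    if not skin:
        return None
    return tuple(sum(p[k] for p in skin) / len(skin) for k in range(3))


# Hypothetical ROI: two skin-like pixels, one dark pixel, one saturated blue pixel.
roi = [(200, 150, 120), (210, 160, 130), (30, 30, 30), (0, 0, 255)]
c0 = mean_color(roi)
print(c0)
```

Applying `mean_color` to every frame yields one sample of the three color signals per frame.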

Pre-processing of Color Signals
At this step refined color signals $c(t)$ are computed from raw signals $c^0(t)$ for t = 1, . . . , T by suppressing noise and artefacts. To preserve relevant information, frequency components in the human heart rate bandwidth (40-240 beats per minute (BPM), which corresponds to approximately 0.65-4 Hz) should not be suppressed. Typical pre-processing sub-steps are detrending, band-pass and moving average filtering (see Figure 2, Step 2). They are often used in combination (Holton et al., 2013; Li et al., 2014), but some sub-steps can be omitted or applied at post-processing (Step 4, see Subsection 2.2.4).
Detrending is important since the pulsatile component of iPPG has much lower amplitude than the slowly-varying baseline (Hülsbusch, 2008). A simple detrending method consists in mean-centering and scaling the signal (MCaS, de Haan and Jeanne (2013)) with respect to an L-point running mean of the color vectors; we take L corresponding to 1 s. Using MCaS is required for many methods of iPPG extraction (see Subsection 2.2.3).
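A minimal sketch of mean-centering and scaling for one color channel is given below. It assumes that each sample is divided by the running mean and that 1 is then subtracted, and that the running window is centered and truncated at the signal edges; the window alignment and the synthetic signal are assumptions made for illustration only.

```python
import math


def running_mean(x, L):
    """L-point running mean; the window is truncated at the signal edges."""
    n = len(x)
    out = []
    for t in range(n):
        start = max(0, t - L // 2)
        stop = min(n, t - L // 2 + L)
        window = x[start:stop]
        out.append(sum(window) / len(window))
    return out


def mcas(x, L):
    """Mean-centering and scaling of one channel: (x - mean) / mean."""
    return [(v - m) / m for v, m in zip(x, running_mean(x, L))]


FS = 50          # sampling rate of the DEAP videos, Hz
L = 1 * FS       # window corresponding to 1 s

# Hypothetical green channel: baseline 100 with a 1% pulsatile component at 1.2 Hz.
green = [100.0 * (1.0 + 0.01 * math.sin(2.0 * math.pi * 1.2 * t / FS))
         for t in range(200)]
detrended = mcas(green, L)
print(round(min(detrended), 4), round(max(detrended), 4))
```

The result is dimensionless and does not depend on the overall brightness of the channel, which is why the same routine can be applied to all three channels before they are combined.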
Moving average (MA) filtering smooths the signal and suppresses high-frequency noise by replacing each sample with an M-point average. When choosing M one should take into account that an M-point MA filter suppresses frequencies $n F_{SR}/M$ for n = 1, 2, . . ., where $F_{SR}$ is the sampling rate of the signal, see (Smith, 1997, Chapter 16) for details. Since human pulse rate can reach 4 Hz, the lowest suppressed frequency $F_{SR}/M$ should lie above 4 Hz, so we recommend using $M < \frac{1}{4} F_{SR}$. For instance, $F_{SR}$ = 50 Hz requires M ≤ 12, thus we consider MA filtering with 3-, 6-, 9- or 12-point average.
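The constraint on M can be checked numerically. The sketch below uses the standard magnitude response of an M-point moving average, $|\sin(\pi f M / F_{SR}) / (M \sin(\pi f / F_{SR}))|$; the trailing alignment of the averaging window and the function names are assumptions for illustration (the magnitude response does not depend on the alignment).

```python
import math


def moving_average(x, M):
    """Trailing M-point moving average (shorter window at the start)."""
    out = []
    for t in range(len(x)):
        window = x[max(0, t - M + 1): t + 1]
        out.append(sum(window) / len(window))
    return out


def first_null_hz(M, fs_hz):
    """Lowest frequency fully suppressed by an M-point moving average."""
    return fs_hz / M


def ma_gain(f_hz, M, fs_hz):
    """Magnitude response of an M-point moving average at frequency f_hz."""
    x = math.pi * f_hz / fs_hz
    if math.isclose(math.sin(x), 0.0, abs_tol=1e-12):
        return 1.0  # f = 0 (or a multiple of fs): unit gain
    return abs(math.sin(M * x) / (M * math.sin(x)))


FS = 50.0       # sampling rate, Hz
F_MAX = 4.0     # upper bound of the heart rate bandwidth, Hz

# Largest M whose first suppressed frequency still lies above F_MAX.
largest_M = max(M for M in range(1, int(FS) + 1) if first_null_hz(M, FS) > F_MAX)
print(largest_M)
for M in (3, 6, 9, 12):
    print(M, round(first_null_hz(M, FS), 2), round(ma_gain(1.2, M, FS), 3))
```

For each tested M the loop prints the first suppressed frequency and the gain at 1.2 Hz (72 BPM), which shows how much the pulse component itself is attenuated as M grows.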
Band-pass filtering suppresses frequency components outside the heart rate bandwidth. Here we employ two commonly used filters, either the 255th order finite impulse response (FIR) filter with linear phase designed using the Hamming window (Lewandowska et al., 2011; Poh et al., 2011; Li et al., 2014) or the 5th order Butterworth infinite impulse response (IIR) filter (Sun et al., 2013).
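For illustration, a linear-phase FIR band-pass filter of order 255 (256 taps) can be designed with the textbook windowed-sinc method and a Hamming window, as sketched below. The cut-off frequencies of 0.65 and 4 Hz are taken from the heart rate bandwidth stated above; the exact design routine and scaling used in the cited studies may differ, and the Butterworth filter is not covered here.

```python
import cmath
import math


def hamming_bandpass(num_taps, f_lo_hz, f_hi_hz, fs_hz):
    """Linear-phase band-pass FIR taps: ideal response times a Hamming window."""
    def sinc(x):
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

    center = (num_taps - 1) / 2.0
    lo = 2.0 * f_lo_hz / fs_hz          # cut-offs relative to the Nyquist rate
    hi = 2.0 * f_hi_hz / fs_hz
    taps = []
    for n in range(num_taps):
        k = n - center
        # Ideal band-pass = difference of two ideal low-pass responses.
        ideal = hi * sinc(hi * k) - lo * sinc(lo * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def gain(taps, f_hz, fs_hz):
    """Magnitude of the filter frequency response at f_hz."""
    w = 2.0 * math.pi * f_hz / fs_hz
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))


FS = 50.0
taps = hamming_bandpass(256, 0.65, 4.0, FS)
for f in (0.0, 0.65, 2.0, 4.0, 10.0):
    print(f, round(gain(taps, f, FS), 3))
```

The printed gains show the pass band around typical pulse frequencies and the transition regions at both cut-offs, whose width is set by the filter length.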

Extracting Photoplethysmogram from Color Signals
This step (Figure 2, Step 3) can be represented as
$$iPPG^0(t) = w(t)^\intercal c(t),$$
where $w(t) = \big(w_r(t), w_g(t), w_b(t)\big)^\intercal \in \mathbb{R}^3$ are weights of color signals. For computing these weights the following methods are often used:
• Estimating iPPG by the green signal (G method). This approach is popular (Tarassenko et al., 2014; Cui et al., 2015) due to its simplicity; in this case $w(t) = (0, 1, 0)^\intercal$, that is $iPPG^0(t) = g(t)$.
• Estimating iPPG by the green signal while the red signal is considered as containing artefacts only (green-red difference or GRD method). Here $w(t) = (-1, 1, 0)^\intercal$, that is $iPPG^0(t) = g(t) - r(t)$. This method was first proposed in (Hülsbusch, 2008, Chapter 6) as a robust alternative to the G method.
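• Decomposing color signals into components by means of blind source separation (BSS) and choosing the component with the most prominent peak in the heart rate bandwidth. Independent component analysis (ICA) is the most popular BSS technique for iPPG computation (Holton et al., 2013; Wang et al., 2017). Here we use the JADE algorithm of ICA by Cardoso (1999) as suggested in (Poh et al., 2010, 2011)⁵.
• The CHROM method (de Haan and Jeanne, 2013) employs a model of PPG-induced variations in color intensity and defines the iPPG signal as
$$iPPG^0(t) = x_1(t) - \frac{\sigma(x_1, L)(t)}{\sigma(x_2, L)(t)}\, x_2(t), \qquad x_1(t) = 3r(t) - 2g(t), \quad x_2(t) = 1.5\,r(t) + g(t) - 1.5\,b(t),$$
where $\sigma(x_i, L)(t)$ is the L-point running standard deviation of $x_i$ for i = 1, 2. We follow (de Haan and Jeanne, 2013) in taking L corresponding to 1.6 s.
• The recently proposed POS method (Wang et al., 2017) can be considered as an improved and simplified version of CHROM:
$$iPPG^0(t) = x_1(t) + \frac{\sigma(x_1, L)(t)}{\sigma(x_2, L)(t)}\, x_2(t), \qquad x_1(t) = g(t) - b(t), \quad x_2(t) = g(t) + b(t) - 2r(t),$$
where $\sigma(x_1, L)(t)$ and $\sigma(x_2, L)(t)$ are the L-point running standard deviations of $x_1$ and $x_2$, respectively. We take L corresponding to 1.6 s as suggested in (Wang et al., 2017).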
In order to make CHROM and POS compliant with our generic framework, we introduce minor algorithmic changes not affecting the nature of the methods. Namely, we use running means and standard deviations instead of computing the iPPG signal in overlapped windows. For the considered dataset the performance of our modified versions is slightly better than that of the original methods.
Note that special pre-processing is required for some iPPG extraction methods, namely MCaS detrending for GRD, CHROM and POS and band-pass filtering for aGRD. When testing the effect of pre-processing (see Figure 2, Step 2) we always use required pre-processing with these iPPG extraction methods.
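To illustrate how such weights act on the color signals, the sketch below applies the GRD combination and the POS projection of Wang et al. (2017) to a single 1.6 s window of synthetic, already detrended color signals. It is a simplification under stated assumptions: the standard deviations are computed once over the whole window rather than as running values, and the pulse amplitudes (0.3, 0.8, 0.5) per channel as well as the common disturbance are hypothetical.

```python
import math
import statistics


def grd(r, g):
    """Green-red difference: weights (-1, 1, 0)."""
    return [gi - ri for ri, gi in zip(r, g)]


def pos(r, g, b):
    """POS projection for one window of detrended color signals."""
    s1 = [gi - bi for gi, bi in zip(g, b)]
    s2 = [gi + bi - 2.0 * ri for ri, gi, bi in zip(r, g, b)]
    sd2 = statistics.pstdev(s2)
    alpha = statistics.pstdev(s1) / sd2 if sd2 > 0 else 0.0
    return [a + alpha * c for a, c in zip(s1, s2)]


FS = 50
times = [n / FS for n in range(80)]                       # 1.6 s window
pulse = [math.sin(2.0 * math.pi * 1.2 * t) for t in times]
# Disturbance that changes all three channels equally (e.g. an intensity change).
common = [0.5 * math.sin(2.0 * math.pi * 0.3 * t) for t in times]

r = [0.3 * p + d for p, d in zip(pulse, common)]
g = [0.8 * p + d for p, d in zip(pulse, common)]
b = [0.5 * p + d for p, d in zip(pulse, common)]

ippg_grd = grd(r, g)
ippg_pos = pos(r, g, b)
print(round(max(ippg_grd), 3), round(max(ippg_pos), 3))
```

In this synthetic case both combinations cancel the common disturbance exactly, because their weights sum to zero; real artefacts are rarely equal across channels, which is where the methods start to differ.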

Post-processing of Imaging Photoplethysmogram
Post-processing (Figure 2, Step 4) improves quality of the iPPG signal and is especially necessary if noise and artifacts were not removed at pre-processing (Step 2) or if iPPG was extracted at Step 3 in a non-linear fashion (which is the case for aGRD, ICA, CHROM and POS). Here we consider three typical sub-steps of post-processing: band-pass, MA and adaptive band-pass (ABP) filtering.
Band-pass and MA filtering described in Subsection 2.2.2 as pre-processing sub-steps can also be used at post-processing (Poh et al., 2010, 2011); this results in a different iPPG signal for all considered methods of iPPG extraction except for the linear G and GRD methods. ABP filtering assumes that frequency components of the iPPG signal pertaining to pulse rate have relatively high power; weak components then correspond to noise and should be suppressed (Hülsbusch, 2008; Bousefsaf et al., 2013; Wang et al., 2015; Feng et al., 2015). Here we use a two-step wavelet filtering suggested in (Bousefsaf et al., 2016)⁶.
Modifications of iPPG signal provided by MA, band-pass and wavelet filtering are shown in Figure 3.

Estimation of Pulse Rate
We consider here four most popular methods of pulse rate estimation (Figure 2, Step 5).
• Interbeat interval (IBI) estimation is the most direct way to assess pulse rate, however this approach is rarely used for iPPG since precise IBI estimation is often problematic (Schäfer and Vagedes, 2013;Elgendi et al., 2013;Kamshilin et al., 2016). IBI corresponds to a cardiac cycle; thus momentary pulse rate is equal to the inverse IBI duration. IBI is usually defined for iPPG as time between successive systolic peaks (Schäfer and Vagedes, 2013) using some method of peak detection; here we employ method from Elgendi et al. (2013) with modifications described in Subsection 2.1, see Figure 4 for an illustration. For accurate IBI estimation we increase sampling rate of iPPG signal from 50 to 250 Hz using cubic spline interpolation as suggested in (Takano and Ohta, 2007).
• Another approach is to assess average pulse rate as the frequency corresponding to maximal power spectral density (PSD). By computing PSD over N points one estimates the average pulse rate value over time interval $\tau = N/F_{SR}$, where $F_{SR}$ is the sampling rate of the iPPG signal ($F_{SR}$ = 50 Hz for DEAP). PSD is usually estimated by Discrete Fourier Transform (DFT) or by autoregressive (AR) modeling.
DFT is a direct way to estimate PSD (Poh et al., 2011;de Haan and Jeanne, 2013). Yet, DFT is often criticized (Hülsbusch and Blazek, 2002;Holton et al., 2013) since its frequency resolution is 60/τ BPM (1/τ Hz) and leads to a crude estimation of pulse rate for τ < 20 s, while taking τ > 20 s hinders tracking of pulse rate variations. Here we use N = 1024, which results in averaging pulse rate over τ = 20.48 s.
AR modeling considers iPPG as an output of a linear system with added white noise (Takano and Ohta, 2007; Tarassenko et al., 2014); parameters of this system are estimated to compute PSD. In comparison with DFT, AR modeling yields improved resolution for short samples. We implement AR modeling using Burg's method (Matlab function pburg) and employ models either of 23rd order (for iPPG signal with wavelet filtering at Step 4) or 34th order (without wavelet filtering), as these settings provide the best pulse rate estimation (we have tested orders 5, . . . , 80).
6 First we perform continuous wavelet transform of iPPG and filter wavelet coefficients with a wide Gaussian window centered at the scale corresponding to the maximum of squared wavelet coefficients averaged over a 15 s temporal running window. Then we apply the usual Gaussian filter. The filtered signal is reconstructed by performing the inverse continuous wavelet transform. See Bousefsaf et al. (2016).
Figure 4: Contact PPG and iPPG signal (extracted using POS and post-processed by MA, band-pass and wavelet filtering) for P1 T24; red circles indicate diastolic minima for PPG and systolic peaks for iPPG detected using the algorithm from Elgendi et al. (2013) with modifications described in Subsection 2.1. Note that for the contact PPG signal interbeat intervals are estimated from diastolic minima since they are clearer and more prominent than peaks.
Since DFT and AR modeling estimate only average pulse rate, in order to make all estimates comparable, we average pulse rate estimates for IBI and CWT in windows of τ = 20.48 s.
Note that methods of pulse rate estimation have been recently compared in (Cui et al., 2015) for iPPG extracted using G method, but in that study CWT and AR modeling were not considered, while DFT was used either with long windows of 30 s resulting in low time resolution or with short windows of 2 s providing very low frequency resolution.
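The DFT-based estimate and its frequency resolution can be sketched as follows. This is an illustration, not the code used in the study: the transform is evaluated directly and only for the bins inside the 40-240 BPM band, a synthetic 72 BPM sinusoid stands in for the iPPG signal, and the function name is hypothetical.

```python
import cmath
import math


def dft_pulse_rate(ippg, fs_hz, lo_bpm=40.0, hi_bpm=240.0):
    """Pulse rate (BPM) as the frequency of the largest DFT power in the band."""
    n = len(ippg)
    bpm_per_bin = 60.0 * fs_hz / n              # frequency resolution, 60 / tau
    k_lo = math.ceil(lo_bpm / bpm_per_bin)
    k_hi = math.floor(hi_bpm / bpm_per_bin)
    best_k, best_power = k_lo, -1.0
    for k in range(k_lo, k_hi + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(ippg))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * bpm_per_bin


FS = 50.0
N = 1024
tau_s = N / FS                                   # 20.48 s
resolution_bpm = 60.0 / tau_s                    # about 2.93 BPM

# Synthetic stand-in for iPPG: a 1.2 Hz (72 BPM) sinusoid.
signal = [math.sin(2.0 * math.pi * 1.2 * t / FS) for t in range(N)]
estimate = dft_pulse_rate(signal, FS)
print(round(tau_s, 2), round(resolution_bpm, 2), round(estimate, 2))
```

Because the estimate can only fall on a multiple of 60/τ BPM, its error for a stationary pulse rate can reach half the bin width, which is the crudeness discussed above for short windows.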

Metrics
To investigate quality of pulse rate estimation, we split each trial (see Section 2.1) into epochs of 20.48 s; successive epochs start 9.88 s apart (approximately 50% overlap), which gives five epochs per one-minute trial. For each epoch i we compare the estimated average pulse rate $PR_i$ with the averaged reference value $PR^{ref}_i$. The following quantities are used to assess estimation performance for the epochs of each participant.

Mean absolute error (MAE) is given by
$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| PR_i - PR^{ref}_i \right|,$$
where N = 5 epochs per trial × number of trials and i is the number of the epoch. MAE ≈ 3 BPM was observed in (Tarassenko et al., 2014) for epochs comprising 4 heart beats (approximately 4 s) and MAE ≈ 2.5 BPM on average in (Lewandowska et al., 2011) for 30 s epochs.
Root-mean-square error (RMSE) is given by
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( PR_i - PR^{ref}_i \right)^2}.$$
RMSE is more sensitive to large estimation errors than MAE, so a small number of large errors can result in high RMSE even when MAE is low. Pulse rate estimates from uncompressed video of stationary subjects usually have RMSE in the range of 1-2 BPM for epochs of 30-60 s (Poh et al., 2011; Li et al., 2014; Bousefsaf et al., 2016)⁸.
Percentage of epochs (PE) for which pulse rate is estimated with error below 3.5 BPM⁹ is given by
$$PE_{3.5} = \frac{100\%}{N} \cdot \#\left\{ i : \left| PR_i - PR^{ref}_i \right| < 3.5 \text{ BPM} \right\}.$$
We also assess quality of the iPPG signal by signal-to-noise ratio (SNR) defined as in (de Haan and Jeanne, 2013), where $\hat{S}_i(f)$ is the spectrum of the i-th iPPG epoch computed by using DFT and $U_i(f)$ indicates whether frequency component f is attributed to the signal ($U_i(f) = 1$) or to noise ($U_i(f) = 0$). In order to make results comparable with those in (Wang et al., 2017) we take $\Delta f = \frac{50 \cdot 60}{1024} \approx 2.93$ BPM.
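The epoch layout and the three error metrics can be sketched as below. Two assumptions are made: trials last exactly 60 s, and epochs of 1024 samples start every 494 samples (9.88 s at 50 Hz), which is the spacing that fits five epochs exactly into a trial; with a 530-sample spacing only four epochs would fit. The pulse rate values are hypothetical and chosen to show how a single large error affects RMSE more than MAE.

```python
import math


def epoch_bounds(n_samples, epoch_len, step):
    """(start, stop) sample indices of all full epochs inside a trial."""
    return [(s, s + epoch_len) for s in range(0, n_samples - epoch_len + 1, step)]


def mae(est, ref):
    """Mean absolute error, BPM."""
    return sum(abs(e - r) for e, r in zip(est, ref)) / len(est)


def rmse(est, ref):
    """Root-mean-square error, BPM."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(est, ref)) / len(est))


def pe(est, ref, tol_bpm=3.5):
    """Percentage of epochs with absolute error below tol_bpm."""
    hits = sum(1 for e, r in zip(est, ref) if abs(e - r) < tol_bpm)
    return 100.0 * hits / len(est)


FS = 50
bounds = epoch_bounds(60 * FS, 1024, 494)        # assumed 60 s trial
print(len(bounds), bounds[-1])

# Hypothetical estimates and reference values (BPM) with one large error.
est = [70.0, 72.0, 75.0, 90.0]
ref = [71.0, 72.0, 74.0, 70.0]
print(mae(est, ref), round(rmse(est, ref), 2), pe(est, ref))
```

In this toy example three of four epochs are estimated within 1 BPM, yet the single 20 BPM error dominates RMSE, while PE reports that 75% of the epochs are estimated well.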

Overview
In Table 1 we present quality metrics for pulse rate estimates and iPPG extraction methods under the best pre- and post-processing (MCaS detrending is beneficial for all methods; other pre- and post-processing methods providing best results are summarized in Table 2). In all cases best results are obtained for the whole-face ROI with skin selection and outlier rejection (see Figure 2, Step 1).
The lowest estimation errors are achieved when using POS for iPPG extraction and CWT for pulse rate estimation. Altogether, values of quality metrics are comparable with those reported in the literature for pulse rate estimation from uncompressed video (see Subsection 2.3).
Below we discuss the influence of various steps on the pulse rate estimation quality. We begin with methods for iPPG extraction (see Figure 2, Step 3), since results for different methods vary considerably (Subsection 3.2). We proceed with ROI selection (Step 1), pre- and post-processing (Steps 2, 4) and finish with pulse rate estimation (Step 5) in Subsections 3.3-3.5, respectively.
8 RMSE < 1 BPM observed in (de Haan and Jeanne, 2013) is obtained for video recorded under dedicated professional illumination, which makes results incomparable with those for DEAP. On the other hand, RMSE > 7.6 BPM reported in (Li et al., 2014) for the MAHNOB dataset is too high and indicates limited usefulness of this dataset for iPPG-based pulse rate estimation.
9 In (Holton et al., 2013) the best method estimates pulse rate with error below 6 BPM for $PE_6$ = 87% of epochs. Here we are interested in the percentage of epochs for which pulse rate is estimated well; precision of 6 BPM seems insufficient for this, so we bound error by 3.5 BPM (5% of average human pulse rate 70 BPM).
Table 1: Quality metrics (averaged over all epochs and participants) for pulse rate estimates computed from iPPG with best pre- and post-processing (Table 2). Values in each cell stand for MAE (BPM) / RMSE (BPM) / $PE_{3.5}$ (%); best values of metrics for each iPPG extraction method are shown in bold.

Step 3: Imaging photoplethysmogram extraction
In Table 3 we present performance metrics for all considered iPPG extraction methods. We use pre- and post-processing (Steps 2, 4) yielding the best SNR (see Table 2) and estimate pulse rate by CWT since this method provides best results at Step 5.
Table 3: Average SNR and quality metrics for CWT pulse rate estimates from iPPG extracted using pre- and post-processing settings providing best SNR. In each cell we present values with and without wavelet filtering. For comparison we include overall SNR values from (Wang et al., 2017). Best values for each quality metric are shown in bold.
As Table 3 shows, POS has the highest signal-to-noise ratio and provides the most precise pulse rate estimation. The ranking of methods is generally in line with results reported in (Wang et al., 2017), except for GRD performing worse than ICA. We explain this difference by the sensitivity of ICA to the number of source components in the signal; light variation and motion in (Wang et al., 2017) introduce additional components to the color signals and may complicate extraction of pure iPPG by means of ICA.
Note that average SNR for the DEAP dataset is worse than values reported in (Wang et al., 2017). This might be due to the compression of videos in DEAP and to the use of professional dedicated lighting for video acquisition in (Wang et al., 2017). Figure 5 shows average values of SNR and MAE for every participant for the three best iPPG extraction methods (POS, CHROM and ICA). In most cases high SNR corresponds to low MAE, which (as expected) indicates that good quality of iPPG ensures precise pulse rate estimation.
Figure 5: … (Table 2) and CWT pulse rate estimation.

Step 1: ROI Selection
ROI choice. Results for the whole face region are better than for the region below eyes, both in terms of iPPG quality and pulse rate estimation. Namely, SNR for the signal acquired from the whole face region is higher by 0.2-0.3 dB for GRD and aGRD and by 1.1-1.9 dB for other methods, while MAE of pulse rate estimation is lower (the improvement varies from 1% for GRD and aGRD to more than 10% for other methods).
ROI refinement. Outliers rejection is always beneficial, as it increases SNR of iPPG signal (0.25-0.5 dB) and decreases MAE (10%-15% when CWT or DFT pulse rate estimation is used in combination with GRD, aGRD, CHROM or POS methods and 5%-10% otherwise).

Steps 2 and 4: Pre-processing and Post-processing
Detrending. Of the two detrending methods, MCaS always improves pulse rate estimation while SPA does not provide any positive effect (probably raw color signals in our study are too noisy for successful application of this technique). For G, aGRD and ICA methods of iPPG extraction (Step 3), using MCaS increases SNR (average increase is 1.6, 0.7 and 3 dB, respectively) and improves pulse rate estimation (MAE decreases by more than 12%). For CHROM, POS and GRD methods MCaS pre-processing is an inherent part of the method, therefore performance without MCaS was not tested.
MA filtering. Quality of the iPPG signal and of pulse rate estimation improves with increasing MA filter length M and reaches its maximum for M = 12 (MA filtering with M > 12 affects the heart rate bandwidth and was not tested). The only exception is pulse rate estimation by CWT (Step 5) from iPPG obtained by CHROM and POS methods: in this case best results are observed for M = 9. Figure 6 illustrates this effect for MAE; the effect on other quality metrics is similar.
Band-pass filtering improves quality of the iPPG signal and, in most cases, performance of pulse rate estimation, see Table 4. Surprisingly, band-pass filtering has little positive or even negative effect on pulse rate estimation by AR modeling; we cannot explain this result. The 255th order FIR filter performs slightly better than the 5th order IIR Butterworth filter; this was expected since the frequency response of the latter is slightly worse.
Wavelet filtering results in elimination of large errors in pulse rate estimation, which is reflected by a prominent decrease of RMSE (see Table 3). However, wavelet filtering only slightly improves MAE and hardly changes $PE_{3.5}$. Parameter choice for the wavelet filter deserves a separate study: preserving several harmonics of pulse rate is of interest to keep the shape of the iPPG signal.
Pre- vs post-processing. Band-pass and MA filtering are preferable at pre-processing (Step 2) when ICA or aGRD are used at Step 3 and at post-processing (Step 4) for other methods of iPPG extraction, see Table 2. For ICA, POS and CHROM using filtering at a different step considerably decreases quality of iPPG and of pulse rate estimation. This is quite unexpected since originally POS and CHROM were proposed with band-pass filtering as pre-processing (de Haan and Jeanne, 2013; Wang et al., 2017), while for ICA post-processing was recommended (McDuff et al., 2014a). For aGRD the band-pass filter is essentially a pre-processing sub-step and we do not observe any difference between using the MA filter as pre- and post-processing.

Step 5: Pulse Rate Estimation
The best results in terms of all metrics are provided by CWT. This method is especially useful since it allows estimating not only average but also momentary pulse rate (see Figure 7a).
Other tested methods for pulse rate estimation have certain drawbacks. DFT provides the second best result in terms of MAE and $PE_{3.5}$ (Table 1), but it has low frequency resolution (see Figure 7b) and the highest RMSE. IBI estimation (Figure 7c)
Figure 7: Performance of pulse rate estimation methods for iPPG signal extracted using the POS method from data P1 T24: momentary pulse rate estimated by CWT (a, smoothed by 1-s moving average) and IBI (c), spectrograms of iPPG signal estimated using DFT (b) and by AR modeling (d).

Conclusion
Let us summarize the main results of this work. We have established a generic framework for iPPG-based pulse rate estimation. Using this framework we have compared various methods of iPPG analysis for compressed video from the DEAP dataset; the best pulse rate estimation is obtained when using the following methods.
Step 1, ROI selection: whole face ROI with skin selection and outliers rejection.
Step 2, pre-processing: mean-centering and scaling; moving average filtering (for ICA) with filter length M close to $\frac{1}{4} F_{SR}$, where $F_{SR}$ is the sampling rate in Hz; band-pass filtering (for ICA and aGRD) with the 255th order FIR filter.
Step 3, iPPG extraction: POS; results for CHROM and ICA are also relatively good.