Robust respiration detection from remote photoplethysmography

Continuous monitoring of respiration is essential for early detection of critical illness. Current methods require sensors attached to the body and/or are not robust to subject motion. Alternative camera-based solutions have been presented using motion vectors and remote photoplethysmography. In this work, we present a non-contact camera-based method to detect respiration, which can operate in both visible and dark lighting conditions by detecting the respiratory-induced colour differences of the skin. We make use of the close similarity between skin colour variations caused by the beating of the heart and those caused by respiration, leading to a much improved signal quality compared to single-channel approaches. Essentially, we propose to find the linear combination of colour channels which suppresses the distortions best in a frequency band including pulse rate, and subsequently we use this same linear combination to extract the respiratory signal in a lower frequency band. Evaluation results obtained from recordings of healthy subjects who perform challenging scenarios, including motion, show that respiration can be accurately detected over the entire range of respiratory frequencies, with a correlation coefficient of 0.96 in visible light and 0.98 in infrared, compared to 0.86 with the best-performing non-contact benchmark algorithm. Furthermore, evaluation on a set of videos recorded in a Neonatal Intensive Care Unit (NICU) shows that this technique looks promising as a future alternative to current contact sensors, showing a correlation coefficient of 0.87.


Introduction
Monitoring of respiration is important in clinical care since it provides valuable information about a person's health status. An abnormal respiratory rate (RR) is a sensitive early indicator of critical illness that often accompanies, and may precede, changes in other vital signs such as heart rate, blood pressure, or a reduction in peripheral oxygen saturation (SpO2) [1]. For example, events of apnea can lead to permanent brain damage and even death. Continuous monitoring of respiration has the potential to detect such events and prevent them from occurring.
The common methods to monitor respiration are contact-based and consequently require one or more sensors attached to the body, e.g. electrodes or a belt [2]. Recently, Charlton et al. [3] assessed the performance of 314 algorithms for the estimation of RR from ECG and PPG waveforms under ideal operating conditions. They showed that most time-domain techniques perform better than frequency-domain techniques. Their superior performance may be explained by the fact that, unlike frequency-domain techniques, they do not require the respiratory signal to be quasi-stationary. The PPG waveform contains three respiratory features [4]. Of these three features, the respiratory-induced intensity variation, i.e. baseline modulation, provides the highest accuracy, and a fusion of all three features generally performed better than any single feature. It should be noted, however, that Charlton et al. used single-channel contact PPG signals and benchmarked the algorithms under ideal operating conditions. Despite their ability to measure respiration, most of these methods are cumbersome and can therefore cause stress and discomfort to the patient. Non-contact methods to monitor respiration address these issues.
In this paper, we introduce a non-contact respiration monitoring system. Alternative non-contact methods have been proposed in different ranges of the electromagnetic spectrum [5-7], e.g. radar-based methods or methods using thermal cameras. These methods, however, require expensive equipment, which limits their applicability. Additionally, non-contact methods have been documented using low-cost cameras based on motion or remote PPG (rPPG). Motion-based methods [8-10] detect the respiratory-induced movements of the chest and/or abdomen. The challenge with these methods is to differentiate between respiratory-induced movements and other movements which are not related to respiration. Furthermore, to clearly register the minute respiratory-induced chest movements, the range of viewpoints of the camera is somewhat limited. rPPG-based methods extract respiration from respiratory features present in the blood volume pulse signal [4, 11-14]. A number of approaches have been proposed to extract these features from the PPG waveform, including wavelet decomposition [15], complex demodulation [16] and auto-regression [17]. The need for fairly long time windows and assumptions on the regularity of the RR limit the applicability of these methods in real-life conditions. Furthermore, all aforementioned contact and non-contact methods for RR measurement that use the PPG signal rely on single-channel PPG waveforms, which do not allow in-band distortions, such as sensor noise and motion artifacts, to be eliminated. This is especially problematic for non-contact solutions, since these distortions are typically present in rPPG waveforms, which have a lower signal-to-noise ratio (SNR) than contact PPG waveforms. In [18], it has been shown that robust pulse rate detection is feasible when using a multi-channel camera. This solution exploits the different characteristics of cardiac-related blood volume variations and distortions, e.g. specular reflectance and motion. Inspired by this result, our proposed method also uses a multi-channel camera.
In this paper, we present a novel motion-robust, non-contact, camera-based method to extract the respiratory signal near-continuously in both visible and dark (infrared) lighting conditions by exploiting the respiratory-induced skin colour variations present in the different channels of the camera. Furthermore, we exploit the spatial redundancy of the camera to obtain a good-quality rPPG signal. From this rPPG waveform we extract the respiratory-induced baseline modulation to obtain the respiratory rate. Compared to earlier methods, the length of the time windows is significantly reduced and no assumptions on the periodicity of the respiratory signal are made, which makes it possible to detect irregular breathing patterns and even (central) apneustic events. Furthermore, the proposed method is robust to subject movements not related to respiration.

Materials and methods
In this section, we will first summarize the earlier work on rPPG and the underlying elementary physiology and optics, which are the foundation for our proposed method. Hereafter, we present the processing framework, the protocol and setup used for the creation of our dataset, a description of the benchmark algorithm and evaluation metrics, and finally the implementation details.

Fig. 1. Respiration modulates the PPG signal in three ways: 1) RIFV is a synchronization of heart rate with respiratory rate, 2) RIIV is a change in the baseline signal due to intrathoracic pressure variation, and 3) RIAV is a change in pulse strength caused by a decrease in cardiac output.

Background
The beating of the heart causes pressure variations in the arteries as the heart pumps blood against the resistance of the vascular bed. Since the arteries are elastic, their diameter changes in sync with the pressure variations. These diameter changes occur even in the smaller vessels of the skin, where the blood volume variations cause a changing absorption of the light. Photoplethysmography (PPG) uses this principle for the optical measurement of blood volume variations by capturing the reflected or transmitted light from/through the illuminated skin, resulting in a PPG waveform. Respiration modulates this PPG waveform in three ways [4], as visualized in Fig. 1:
• Respiratory induced frequency variation (RIFV) - A periodic change in pulse rate that is caused by an autonomic nervous system response. The heart rate synchronizes with the respiratory cycle (respiratory sinus arrhythmia, RSA).
• Respiratory induced intensity variation (RIIV) - A change in the baseline signal that is caused by a variation of perfusion due to intra-thoracic pressure variations.
• Respiratory induced amplitude variation (RIAV) - A change in pulse strength that is caused by a decrease in cardiac output due to reduced ventricular filling during inspiration.
Respiration, much like the contraction of the heart, also causes blood-pressure variations, as the varying pressure in the chest-abdominal area affects the pressure in the large blood vessels. Where the pulse causes volume variations mainly in the arteries, respiration also affects the pressure, and consequently the volume, in the veins. Arteries and veins have different mechanical properties. Under low pressure, veins are 10-20 times more compliant than arteries [19]. Vessel compliance (C) is defined as the ability of a blood vessel to distend and increase in volume (∆V) with increasing transmural pressure (∆P):

$$C = \frac{\Delta V}{\Delta P}$$

Transmural pressure is the difference in pressure between the two sides of a vessel wall ($\Delta P = P_{inside} - P_{outside}$). With small changes in pressure, the circulating blood inside the veins experiences large volume changes compared to the arteries because of the difference in compliance.
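As a short worked illustration of this argument (the subscripts v and a for veins and arteries are introduced here only for illustration, and the same transmural pressure change ∆P is assumed to act on both vessel types), writing the volume change as compliance times pressure change gives:

$$\frac{\Delta V_v}{\Delta V_a} = \frac{C_v\,\Delta P}{C_a\,\Delta P} = \frac{C_v}{C_a} \approx 10\text{-}20$$

Under these assumptions, a pressure variation that changes arterial volume only slightly changes venous volume by roughly an order of magnitude more, which is why a respiratory pressure swing can show up clearly in the venous blood volume.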
The effects of pressure changes and venous return caused by breathing have been studied, but contradicting observations have been found. This is probably due to the high complexity of the underlying principle, which is not fully understood yet. The volume of the thoracic cavity increases during inspiration, and therefore intra-thoracic pressure decreases, which increases lung volume and draws air into the lungs. For venous return to the heart, two large veins are present which deliver deoxygenated blood to the right atrium of the heart. The inferior vena cava (IVC) returns blood from all body regions below the diaphragm, whereas the superior vena cava (SVC) transports the venous blood from the upper part of the body to the heart. The increase in intra-abdominal and/or intra-thoracic pressure during inspiration, depending on the type of breathing, causes a partial collapse of the venae cavae. This partial collapse leads to either an increase or a decrease in venous return, depending on the pressure gradient. Although it is relevant to understand the underlying principles, useful analysis can still be performed without knowing whether the venous return increases or decreases during inspiration, since we are interested in differences of the observed amplitude rather than the sign of these changes.
Current motion-tolerant rPPG methods for pulse extraction require a multi-spectral camera, e.g. RGB, which captures blood volume variations at different wavelengths [12,20,21]. The pulsatile amplitude of the PPG waveform as a function of wavelength was simulated by Hülsbusch, who explained that the relative PPG-amplitude is determined by the contrast between the blood and the blood-free tissue [22]. The absolute PPG-amplitude as a function of wavelength was measured by Corral et al. using a spectrometer [23] and white halogen illumination. This absolute PPG spectrum PPG(w), displayed in Fig. 2(a), is related to the relative PPG-curve, RPPG(w), via the emission spectrum of the halogen illumination, I(w), and the skin-reflection spectrum, ρ_s(w):

$$RPPG(w) = \frac{PPG(w)}{I(w)\,\rho_s(w)}$$

Fig. 2. a) The measured absolute PPG spectrum of Corral [23] and the derived relative PPG spectrum, scaled to 1 at their peak locations. b) The absorption spectra of oxyhemoglobin (HbO2) and hemoglobin (Hb) [24]. Since venous blood has a different ratio of HbO2 and Hb compared to arterial blood and these chromophores have different absorption spectra, venous and arterial blood also have different absorption spectra.
These curves are simulated/measured for arterial blood with normal blood oxygenation levels. Venous blood, however, has a lower oxygenation level and therefore a different ratio of oxygenated and deoxygenated hemoglobin. Because the absorption spectra of the two chromophores differ, as can be observed from Fig. 2(b), venous blood has a slightly different absorption spectrum compared to arterial blood. In visible light, [400-700] nm, this difference is mainly in red, 600 ≤ λ ≤ 700 nm. As can be observed from the same figure, in near-infrared (NIR), λ > 700 nm, the absorption spectra of Hb and HbO2 also differ, resulting in different absorption spectra for venous and arterial blood, with the exception of the "isosbestic" point around 805 nm. It has been shown in [21] that the main PPG-contribution in the red colour channel of a video camera comes from the wavelength interval between 500 and 600 nm.
An important consequence of the above reasoning is that the linear combination of the normalized colour channels that provides the pulse signal with the best signal-to-noise ratio (SNR) is approximately the same linear combination that would also provide the respiratory signal with the best SNR. In the following section, we elaborate on this method to obtain a robust respiratory signal from an rPPG camera.

Processing framework
An overview of our proposed processing framework is visualized in Fig. 3. In the next subsections we will provide a detailed description of each processing step.
Fig. 3. [...] 3) The extracted respiratory signal is scaled based on the ratio of respiratory and pulse energies.

Tracking
The first stage of the framework is the tracking stage, where the movements of the selected Region-of-Interest (ROI), indicated with the bounding box, are tracked. For this task, the feature-based Kanade-Lucas-Tomasi (KLT) tracker is employed [25] because of its accuracy, its simplicity and the limited assumptions it makes about the underlying image. Feature points (indicated with white crosses) are calculated using the minimum eigenvalue algorithm [26]. The geometric transformation of the feature points between two consecutive frames is calculated and applied to the bounding box. This bounding box is subsequently down-sampled into equally-sized subregions.
For each frame, the spatial averages of the pixels within the ROI and within each subregion are calculated, which makes it possible to discard distorted, unreliable subregions, as will be discussed later.
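The spatial averaging step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the tracked ROI has already been cropped to an (H, W, 3) array, and the function name `subregion_means` and the 6 × 5 grid layout are hypothetical choices (the text only states that the bounding box is divided into equally-sized subregions).

```python
import numpy as np

def subregion_means(roi, rows, cols):
    """Mean colour of each cell of a rows x cols grid laid over the ROI.

    roi: array of shape (H, W, channels), the tracked bounding box of one frame.
    Returns an array of shape (rows * cols, channels).
    """
    roi = np.asarray(roi, dtype=float)
    height, width, channels = roi.shape
    # Grid boundaries; cells differ by at most one pixel when H or W
    # is not an exact multiple of rows or cols.
    ys = np.linspace(0, height, rows + 1).astype(int)
    xs = np.linspace(0, width, cols + 1).astype(int)
    means = [
        roi[ys[r]:ys[r + 1], xs[c]:xs[c + 1]].reshape(-1, channels).mean(axis=0)
        for r in range(rows)
        for c in range(cols)
    ]
    return np.array(means)

# The average over the whole ROI is simply roi.reshape(-1, 3).mean(axis=0).
```

Collecting these per-frame averages over time yields one colour trace per subregion plus one for the whole ROI.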
The temporally normalized pixel differences between two adjacent frames, t and t + 1, are computed for each (sub)region i after compensating for the spatial displacement d = (dx, dy) between the two frames. These motion-compensated, normalized pixel differences are concatenated into a collection of traces, $C_{n,i}$, each corresponding to a (sub)region, and subsequently integrated over the window length l; the result is the input for our method.
In our system we selected the face as the ROI for both physiological and practical reasons. The practical reason to use the face is that it is one of the few human body parts which is typically not covered by clothes, so that the skin is directly exposed to the camera to measure blood volume variations. One physiological reason to use the face is that blood volume variations are well measurable from this anatomic location. In their study, Tur et al. [27] revealed a collection of regions (finger, palm, face, ear) for which cutaneous perfusion is much higher than at all other locations. Another physiological reason is that the RIIVs are clearly present in photoplethysmographic signals from the face, because the distance to the heart is small compared to locations at the extremities which are typically also not covered, e.g. the hand. Nilsson [28] measured the respiratory energies in the PPG signals from multiple sensors attached to body parts including hand and forehead, and found that the respiratory energy on the forehead is approximately a factor of six larger than that on the finger. For the tracking of the face we decided to manually initialize the ROI and not to use a face detector, e.g. the commonly used Viola-Jones detector, since these are typically trained for full-view frontal upright faces, whereas our dataset also contains faces which do not meet this criterion.
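A minimal sketch of the normalization and integration steps is given below. The exact normalization is not stated here, so the sketch assumes that each frame-to-frame difference is divided by the mean level of the two frames; it also assumes the input traces are already motion-compensated by the tracker. The function names are illustrative.

```python
import numpy as np

def normalized_differences(traces):
    """Frame-to-frame differences divided by the local mean level.

    traces: array of shape (T, channels) with the spatially averaged colour
    values of one (sub)region, already motion-compensated.
    Returns an array of shape (T - 1, channels).
    """
    traces = np.asarray(traces, dtype=float)
    diff = traces[1:] - traces[:-1]
    level = 0.5 * (traces[1:] + traces[:-1])  # assumed normalization term
    return diff / level

def integrate(norm_diff):
    """Cumulative sum over the window, giving a trace relative to its first frame."""
    return np.cumsum(norm_diff, axis=0)
```

With this choice the result is independent of the overall brightness of the (sub)region: scaling all pixel values by a constant leaves the normalized differences unchanged.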

Processing
After obtaining the motion-compensated, normalized pixel differences, we aim to find the optimal linear combination to construct the cardiac pulse signal and, hereafter, the respiratory signal. The processing stage consists of two operations: 1) weights calculation, and 2) weights selection.

1) Weights calculation
The cardiac pulse signal S can be written as a linear combination of the temporally normalized colour channels $C_n$:

$$S = W C_n$$

where the weights W can be determined by blind source separation (BSS) techniques [12,29]. However, a heuristic selection criterion, e.g. based on the periodicity of the pulse signal, is then required to select the component corresponding to pulse or respiration. Two current state-of-the-art rPPG algorithms which do not require this selection criterion are 'CHROM' [20] and 'PBV' [21]. We will evaluate both methods to calculate the weights. Essentially, the weights are calculated for the filtered traces of normalized pixel differences which include the range of pulsatile frequencies. The pulse rate of a healthy adult is in the range 40-240 BPM and breathing rates are typically in the range 10-40 breaths/min. More details on the selection of these filter parameters for adults and how these have been selected for neonates can be found in Section 2.7.
In the continuation of the method description we will use the filter parameters for adults.
A filter with a pass-band of 40-240 BPM is designed and applied to the normalized pixel differences, leading to $C_{fp}$. The weights are calculated and applied to $C_{fp}$, leading to a first signal, $S_1$, which contains the cardiac pulse signal: $S_1 = W C_{fp}$. Subsequently, these weights are applied to differently filtered pixel differences, $C_{fr}$, which include only respiratory frequencies. The resulting signal, $S_2$, contains the respiratory signal: $S_2 = W C_{fr}$. We will now briefly discuss the two methods used to calculate the weights.
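The two-band use of a single weight vector can be sketched as follows. This is an illustration under stated assumptions: the band-pass filter is implemented by zeroing FFT bins (the filter design actually used is not specified here), the traces are stored as a (T, 3) array so that the product W C becomes `C @ W`, and the weight vector W is taken as given. The function names and the 10-40 breaths/min respiratory band are assumptions for the example.

```python
import numpy as np

def bandpass_fft(x, fs, lo_bpm, hi_bpm):
    """Keep only the FFT bins between lo_bpm and hi_bpm (per minute).

    x: array of shape (T, channels); fs: frame rate in Hz.
    """
    x = np.asarray(x, dtype=float)
    freqs_bpm = np.fft.rfftfreq(x.shape[0], d=1.0 / fs) * 60.0
    spectrum = np.fft.rfft(x, axis=0)
    keep = (freqs_bpm >= lo_bpm) & (freqs_bpm <= hi_bpm)
    spectrum[~keep] = 0.0
    return np.fft.irfft(spectrum, n=x.shape[0], axis=0)

def extract(C_n, W, fs, pulse_band=(40.0, 240.0), resp_band=(10.0, 40.0)):
    """Apply the same weights W to the pulse-band and respiratory-band traces."""
    C_fp = bandpass_fft(C_n, fs, *pulse_band)  # pulse frequencies
    C_fr = bandpass_fft(C_n, fs, *resp_band)   # respiratory frequencies
    S_1 = C_fp @ W  # cardiac pulse signal
    S_2 = C_fr @ W  # respiratory signal
    return S_1, S_2
```

In the described method W would be computed from the pulse-band traces before being reused on the respiratory band; here it is passed in to keep the sketch short.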

CHROM method
The chrominance-based method (CHROM) uses colour difference signals, $X_s$ and $Y_s$, in which the specular reflection component is eliminated, assuming a standardized skin-colour vector in RGB-space, [0.77, 0.51, 0.38], enabling white-balancing of the camera. Its weights result from the combination

$$S = X_s - \alpha Y_s, \quad \text{with} \quad \alpha = \frac{\sigma(X_s)}{\sigma(Y_s)},$$

with

$$X_s = [+0.77, -0.51, 0]\,C_n \quad \text{and} \quad Y_s = [+0.77, +0.51, -0.77]\,C_n \qquad (7)$$

and where the operator σ corresponds to the standard deviation. For applications in NIR, the assumption of the standardized skin-colour vector does not hold and has to be modified, resulting in different parameters for $X_s$ and $Y_s$. The method is fairly insensitive to the colour of the illumination, and the spectrum does not have to be continuous, e.g. fluorescent lamps are allowed. For more details on the CHROM algorithm, we refer to [20].
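A minimal sketch of a CHROM-style weight calculation is shown below. It uses the standard CHROM combination S = X_s − αY_s with α = σ(X_s)/σ(Y_s) and the projection vectors quoted above; any additional normalization of the weight vector is omitted, and the (T, 3) array layout and function name are assumptions of this sketch.

```python
import numpy as np

# Projection vectors for X_s and Y_s as quoted in the text (visible light, RGB).
X_W = np.array([0.77, -0.51, 0.0])
Y_W = np.array([0.77, 0.51, -0.77])

def chrom_weights(C_fp):
    """Weight vector such that C_fp @ W equals X_s - alpha * Y_s.

    C_fp: array of shape (T, 3) with pulse-band normalized RGB traces.
    """
    C_fp = np.asarray(C_fp, dtype=float)
    X = C_fp @ X_W
    Y = C_fp @ Y_W
    alpha = np.std(X) / np.std(Y)
    return X_W - alpha * Y_W
```

Because the result is an explicit weight vector rather than a signal, it can be reused unchanged on the respiratory-band traces.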

PBV method
In contrast to CHROM, the PBV method does not make assumptions on the distortions or skin colour, but suppresses all variations not aligned with the signature of the blood volume pulse, i.e. the normalized ratio of pulse amplitudes in the different colour channels, compiled in $P_{bv}$. Its weights are determined as:

$$W_{PBV} = k\,P_{bv}\,Q^{-1}$$

where Q is the covariance matrix and k the gain.
Based on earlier work in visible light [21] and NIR [30], the values for $P_{bv}$ are selected as: $P_{bv}^{RGB}$ = [0.33, 0.78, 0.53] and $P_{bv}^{675,800,840}$ = [0.29, 0.74, 0.61]. The parameters of $P_{bv}$ are, among others, dependent on the camera sensitivity and the illumination conditions. If the experimental conditions change compared to our experimental settings, re-calibration is required, particularly when narrow-spectrum light sources are applied. However, small variations in setup, e.g. using a camera with different sensitivity specifications, will not have a large impact on the performance. For more details on the determination of $P_{bv}$ we refer to [21].
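A PBV-style weight calculation can be sketched as follows. This is an illustration under assumptions: the covariance matrix Q is taken as the 3 × 3 matrix of inner products of the pulse-band traces, the weights are obtained by solving Q w = P_bv, and the gain k is chosen so that the weight vector has unit length. The (T, 3) array layout and function name are also assumptions.

```python
import numpy as np

# Blood-volume-pulse signature for RGB in visible light, as quoted in the text.
P_BV_RGB = np.array([0.33, 0.78, 0.53])

def pbv_weights(C_fp, p_bv=P_BV_RGB):
    """Weights proportional to Q^{-1} P_bv, scaled to unit length (assumed gain).

    C_fp: array of shape (T, 3) with pulse-band normalized colour traces.
    """
    C_fp = np.asarray(C_fp, dtype=float)
    Q = C_fp.T @ C_fp                # 3 x 3 covariance-like matrix
    w = np.linalg.solve(Q, p_bv)     # Q is symmetric, so this equals P_bv Q^{-1}
    return w / np.linalg.norm(w)
```

For infrared recordings the signature quoted for 675, 800 and 840 nm would be passed as `p_bv` instead.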

2) Weights selection
From the collection of weights from each (sub)region, the 'best' weights need to be selected, which are subsequently applied to the normalized differences of the entire ROI, which include only respiratory frequencies, $C_{fr}$. This is achieved by selecting the weights which provide the pulse signal with the highest SNR. These weights suppress distortions in the pulse band best and are consequently expected to be best suited to suppress distortions when extracting the respiratory signal in the respiratory frequency band. In order to calculate the SNR of each (sub)region, the correct pulse rate is required. Since not all signals will have a clear energy peak in their spectrum, a robust estimation of pulse rate is required, which will be employed for the calculation of the SNR values of all (sub)regions. From the collection of pulse traces, an average pulse trace is constructed by calculating the α-trimmed mean, where α is set to 0.7 based on experimental evaluation. Principal component analysis (PCA) is performed on the periodic pulse traces to obtain the eigenvectors, which are ranked in terms of variance. The eigenvector (among the top 5 eigenvectors) that has the best correlation with the mean pulse trace is selected to be the pulse signal after correcting the arbitrary sign of the eigenvector as:

$$P^{t \to t+l}_{pulse} = \frac{\langle P^{t \to t+l}_{eigen}, P^{t \to t+l}_{mean} \rangle}{\left| \langle P^{t \to t+l}_{eigen}, P^{t \to t+l}_{mean} \rangle \right|}\, P^{t \to t+l}_{eigen}$$

where $P^{t \to t+l}_{eigen}$ and $P^{t \to t+l}_{mean}$ indicate the eigenvector and mean pulse trace respectively, ⟨·,·⟩ corresponds to the inner product (correlation) between two vectors, and |·| denotes the absolute value operator. Pulse rate is subsequently determined by selecting the peak in the spectrum of the selected eigenvector. The SNR is defined as:

$$SNR = 10 \log_{10} \left( \frac{\sum_f \left( U_t(f)\, S_f(f) \right)^2}{\sum_f \left( (1 - U_t(f))\, S_f(f) \right)^2} \right)$$

where f is the frequency in beats per minute (BPM), $S_f = F(S)$ is the spectrum of the pulse signal, and $U_t(f)$ is a binary template window centered around the pulse rate peak and its harmonics with a predefined margin.
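The sign correction and SNR-based selection can be sketched as follows. This is a hedged illustration: the template covers the pulse-rate peak and its first harmonic, the 10 BPM margin, the Hann taper and the 40-240 BPM summation range are assumptions, and the function names are illustrative. The robust pulse-rate estimate (trimmed mean and PCA) is assumed to be available as `pulse_bpm`.

```python
import numpy as np

def fix_sign(p_eigen, p_mean):
    """Flip the eigenvector so that it correlates positively with the mean trace."""
    return np.sign(np.dot(p_eigen, p_mean)) * np.asarray(p_eigen, dtype=float)

def pulse_snr_db(s, fs, pulse_bpm, margin_bpm=10.0, band=(40.0, 240.0)):
    """Ratio (in dB) of spectral energy inside and outside a binary template."""
    s = np.asarray(s, dtype=float)
    freqs = np.fft.rfftfreq(s.size, d=1.0 / fs) * 60.0
    power = np.abs(np.fft.rfft(s * np.hanning(s.size))) ** 2
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    template = np.zeros_like(in_band)
    for harmonic in (1, 2):  # pulse-rate peak and first harmonic
        template |= np.abs(freqs - harmonic * pulse_bpm) <= margin_bpm
    template &= in_band
    signal = power[template].sum()
    noise = power[in_band & ~template].sum()
    return 10.0 * np.log10(signal / noise)

def select_weights(weights, pulse_traces, fs, pulse_bpm):
    """Return the weights whose pulse trace has the highest SNR, and that SNR."""
    snrs = [pulse_snr_db(s, fs, pulse_bpm) for s in pulse_traces]
    best = int(np.argmax(snrs))
    return weights[best], snrs[best]
```

The selected weights are then applied to the respiratory-band traces of the entire ROI.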

Scaling
To eliminate the influence of the momentary strength of the PPG signal on the amplitude of the respiratory signal, a gain factor, k, is computed as the ratio between the energy in the respiratory frequency band, e.g. 10-40 breaths/min, and the energy of the pulse signal. It is fair to assume that when the pulse amplitude doubles, the respiratory amplitude also doubles. Hence, by using the relative amplitude of respiratory energy versus pulse energy, one gets rid of the variations in pulsatility over time. Here, the pulse rate (PR) is determined by a peak detector in the frequency domain, and the margin around it is chosen based on the length of the Fourier window. After scaling, the partially overlapping time intervals are glued together with an overlap-add procedure similar to [20], using Hanning windowing on the individual intervals.
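The scaling and stitching steps can be sketched as follows. The sketch makes several assumptions that are not confirmed by the text: the pulse energy is summed over PR ± margin, the margin defaults to 10 BPM, and how the gain k is subsequently applied to the respiratory signal is left out. The function names are illustrative.

```python
import numpy as np

def band_energy(s, fs, lo_bpm, hi_bpm):
    """Spectral energy of s between lo_bpm and hi_bpm (per minute)."""
    s = np.asarray(s, dtype=float)
    freqs = np.fft.rfftfreq(s.size, d=1.0 / fs) * 60.0
    power = np.abs(np.fft.rfft(s)) ** 2
    return power[(freqs >= lo_bpm) & (freqs <= hi_bpm)].sum()

def relative_gain(s_resp, s_pulse, fs, pr_bpm, margin_bpm=10.0, resp_band=(10.0, 40.0)):
    """Ratio of respiratory-band energy to pulse energy around the pulse rate."""
    e_resp = band_energy(s_resp, fs, *resp_band)
    e_pulse = band_energy(s_pulse, fs, pr_bpm - margin_bpm, pr_bpm + margin_bpm)
    return e_resp / e_pulse

def overlap_add(windows, hop):
    """Stitch equally long, partially overlapping windows with a Hann taper."""
    n = len(windows[0])
    out = np.zeros(hop * (len(windows) - 1) + n)
    taper = np.hanning(n)
    for i, w in enumerate(windows):
        out[i * hop:i * hop + n] += taper * np.asarray(w, dtype=float)
    return out
```

Because k is a ratio of energies, doubling both the pulse and the respiratory amplitude leaves it unchanged, which is the property the scaling step relies on.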

Experimental setup
For the recording of video sequences in visible light and infrared, two separate experimental setups are used. In both setups, participants are asked to follow a particular breathing pattern visualized on a screen in front of them. The video sequences in visible light are recorded with a global shutter RGB CCD camera (type USB UI-2230SE-C of IDS) and stored in an uncompressed data format, at a frame rate of 20 frames per second (fps), with a resolution of 768 × 576 pixels and a depth of 8 bits. Recordings are made in a room with stable light conditions. Participants wear a finger sensor (pulse-oximeter), whose data are synchronized with the video frames. To include both the face and the chest region, the camera is placed at a distance of 1 meter. An illustration of the experimental setup is visualized in Fig. 4.
The experimental setup used for recordings in infrared is similar to the setup in an earlier study [30]. Three monochrome cameras (type F046B of Allied Vision) with 25 mm lenses capture at a frame rate of 15 fps, with a resolution of 640 × 480 pixels and a depth of 8 bits. Optical filters of 675, 800 and 840 nm are mounted on the cameras. The data is transferred to an acquisition PC over FireWire, where it is stored uncompressed. An illumination unit consisting of incandescent light bulbs is placed in front of the participant.

Dataset
The performance of our proposed method is evaluated on two different datasets: A) guided breathing of healthy adults in a laboratory setting, and B) spontaneous breathing of neonates in an intensive care environment. The study was approved by the Internal Committee Biomedical Experiments of Philips Research, and informed consent was obtained from each adult subject. In addition, the medical ethical research committee at Maxima Medical Center (MMC) approved the neonatal study, and informed parental consent was obtained prior to data acquisition. Guided breathing makes it possible to simulate challenging scenarios over the entire range of breathing rates, which may not be present during spontaneous breathing. In real life, during spontaneous breathing, both RR and respiratory effort vary continuously. To evaluate the performance for these scenarios, a dataset consisting of videos recorded in a NICU is created.

Guided breathing
For the guided breathing scenarios, recordings are made of three (in visible light) and one (in infrared) healthy Caucasian adult males in a sitting position. The duration of each recording is 120 or 150 seconds, depending on the scenario. The participants are asked to follow the breathing patterns displayed on a screen in front of them and to keep their head steady for all non-motion scenarios. Except for the shallow breathing scenario, participants are asked to breathe with normal air volumes. The benchmark dataset consists of 32 recordings with a total duration of 60 minutes and contains 1092 breaths in total. An overview of the breathing patterns is provided in Fig. 5.
• (A,B,C
• (E) Rapidly changing respiratory rate: 120 seconds recording, a constant RR of 15 breaths/min, with 3 events where the RR briefly increases to 35 breaths/min for 10 seconds.
• (B) Central apneustic event: 150 seconds recording, a constant RR of 20 breaths/min, where the participant is asked to hold their breath after 60 seconds for as long as possible, after which the breathing pattern is followed again.
• (B) Motion: 150 seconds recording at a constant RR of 20 breaths/min, with head movements uncorrelated with respiration after 30 seconds. Subjects were asked to move their head quasi-periodically with frequencies different from the constant breathing frequency, to ensure that the extracted respiratory signal is not induced by motion. The type of motion is translational, where we instructed the subjects to move their head within the sight of the camera, resulting in average ROI displacements of approximately 200 pixels.
• (F) Shallow breathing: 120 seconds recording with shallow breathing at a constant rate of 20 breaths/min.

Benchmark algorithm
To benchmark our proposed method, we compare the output with the state-of-the-art, using the method of Karlen et al. [4] as our benchmark algorithm. This method is the best-performing PPG-based respiration detection algorithm from the 314 respiratory algorithms evaluated by Charlton et al. [3]. Karlen et al. use a smart fusion of all three respiratory modulations. Since our method only uses the baseline modulation of the PPG signal, we compare the performance of our method with two versions of the benchmark algorithm: 1) the complete framework including all three features ($BM_F$), and 2) with only the RIIV as feature ($BM_I$). Because the algorithm can only be applied to a single waveform, we selected the wavelength with the highest SNR: the green colour channel in visible light and 840 nm in infrared.

Evaluation
To assess the performance of the proposed method, the instantaneous respiratory rates are calculated. To this end, a peak detection algorithm is applied to the interpolated respiratory signal, and the result is compared with the ground truth: the breathing pattern for guided breathing scenarios, and the ECG-derived respiratory signal for the NICU recordings. We preferred the guided breathing pattern over a possible respiration belt, as the latter suffers too much from subject motion to be a reliable reference. The Pearson correlation coefficient r, slope of linear fit B, mean absolute error (MAE), root-mean-square error (RMSE) and standard deviation (σ) are calculated. Furthermore, correlation plots are included and Bland-Altman analysis is performed to test for magnitude bias in respiratory rate differences. Here, the 95% limits of agreement were determined by [-1.96σ, +1.96σ].
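The evaluation metrics can be sketched as follows. This is a minimal illustration: the peak detector is a plain local-maximum search without the interpolation mentioned above, σ is computed as the sample standard deviation of the differences, and the limits of agreement follow the [-1.96σ, +1.96σ] definition given in the text, with the mean difference reported separately. The function names are illustrative.

```python
import numpy as np

def instantaneous_rates(signal, fs):
    """Breath-to-breath rates (per minute) from the intervals between local maxima."""
    s = np.asarray(signal, dtype=float)
    peaks = np.flatnonzero((s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:])) + 1
    return 60.0 * fs / np.diff(peaks)

def agreement_stats(est, ref):
    """Correlation, slope, error measures and limits of agreement for paired rates."""
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    diff = est - ref
    sigma = diff.std(ddof=1)
    return {
        "r": np.corrcoef(ref, est)[0, 1],     # Pearson correlation coefficient
        "B": np.polyfit(ref, est, 1)[0],      # slope of the linear fit
        "MAE": np.abs(diff).mean(),
        "RMSE": np.sqrt((diff ** 2).mean()),
        "sigma": sigma,
        "bias": diff.mean(),
        "limits": (-1.96 * sigma, 1.96 * sigma),
    }
```

For example, `agreement_stats(instantaneous_rates(s, fs), reference_rates)` would compare the rates of an extracted signal `s` with a reference, provided both contain the same number of breaths; `reference_rates` is a hypothetical name.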

Parameter settings
To determine the parameters of the bandpass filters of our method, the typical ranges of breathing and pulse rate for neonates and adults have to be taken into account. Both the respiratory and pulse rate are approximately a factor of three higher for neonates compared to adults [31]. The normal resting breathing rate of adults is in the range [12-20] breaths/min, whereas for neonates normal rates are in the range  breaths/min. Breathing rates lower than 10 breaths/min may occur for adults under extreme resting conditions; however, other rhythms, i.e. Traube-Hering-Mayer (THM) rhythms and vasomotion, then interfere with the respiratory signal. THM rhythms, which are caused by the sympathetic control of the tone of the vascular tree, have a fixed rate of about 6 min$^{-1}$, whereas vasomotion rhythms, which are slow rhythmic changes in the diameter of the small blood vessels of the microcirculatory bed, have a frequency of 4-9 min$^{-1}$ [32]. The normal pulse rate of adults is in the range [60-100] BPM, whereas for neonates it is in the range [120-160] BPM. These values are for healthy subjects under normal, resting conditions. To verify the performance of our method for realistic scenarios where rates are outside these ranges, we chose our filter parameters for $C_{fp+fr}$ as [40-240] and [100-240] BPM for adults and neonates, respectively. For $C_{fr}$, we set the filter parameters to  BPM for adults and [25-100] BPM for neonates. The number of subregions is set to 30 for all evaluated recordings and the length of the time windows is set to 8 seconds with 80% overlap. Both parameters are set heuristically.
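As a small worked example of the windowing parameters, the sketch below converts the 8-second window and 80% overlap into a window length and hop size in frames. The function name and the rounding choices are assumptions made for illustration.

```python
def window_starts(n_frames, fps, win_s=8.0, overlap=0.8):
    """Window length, hop size and start indices (all in frames)."""
    win = int(round(win_s * fps))
    hop = max(1, int(round(win * (1.0 - overlap))))
    starts = list(range(0, n_frames - win + 1, hop))
    return win, hop, starts

# At 20 fps an 8 s window spans 160 frames and advances by 32 frames (1.6 s);
# at 15 fps it spans 120 frames and advances by 24 frames.
win, hop, starts = window_starts(n_frames=2400, fps=20)
```

With these settings a new respiratory estimate becomes available every 1.6 seconds, which is consistent with the near-continuous operation described in the text.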

Implementation
The proposed algorithm is implemented in Matlab (The Mathworks, Inc.) and executed on a laptop with an Intel Core i5 2.60 GHz processor and 8 GB RAM. A rectangular ROI indicating the face region is initialized manually in the first frame of the sequence.

Guided breathing
An overview of the results for the different guided breathing scenarios is visualized in Fig. 6 for both CHROM and PBV.The evaluation results for each scenario, both in visible light and infrared, are summarized in Table 1, together with the results of the benchmarking algorithm and the overall results.Correlation plots and Bland-Altman analysis are displayed in Fig. 7.

Spontaneous breathing
The evaluation results of the neonatal dataset are displayed in Table 2. Figure 8 provides a visual comparison between both proposed rPPG methods and the reference signal derived from ECG. Correlation plots and Bland-Altman analysis are displayed in Fig. 9.

Guided breathing
• Constant breathing rate: At low respiratory rates, the breathing pattern, and consequently the respiratory signal, is typically not sinusoidal, whereas our algorithm implicitly assumes a sinusoidal signal. This may lead to the selection of local peaks and consequently to erroneous instantaneous breathing rates. However, these low rates can still be clearly observed in the frequency spectra obtained on longer time windows, as can be observed from Fig. 6(b). For normal breathing rates, high breath-to-breath accuracy is achieved, as can be observed from Fig. 7. At high breathing rates a modest reduction in breath-to-breath accuracy is observed. This can be explained by the decreased amplitude of RIIVs at increased breathing rates [33].
• Responsiveness: A large benefit of RIIV-based methods compared to RIFV-based methods is that short time windows can be used to extract the respiratory signal, which allows rapid changes in breathing rate to be tracked near-continuously without any assumptions on the periodicity of the respiratory signal. Figures 6(b) and 6(c) demonstrate this responsiveness of our method.
• Apnea: During central apnea, when breathing movements disappear, the intra-thoracic pressure variations that drive the circulatory variation synchronously with respiration are gone.Although the rhythmic RIIV signal disappears, there are fluctuations that can interfere with registration of respiration.Irregular fluctuations of low amplitude can be observed in the peripheral venous pressure and RIIV signals during apnea as can be observed in Fig. 6(d) [34].During obstructive apnea, when the airway is partly obstructed, an increase in the force of respiratory movements takes place and RIIV is more prominent.This confirms the hypothesis that, similar to motion-based methods, respiratory effort is detected with RIIV-based methods, and not the actual airflow or related modulations.
• Motion: Movement of the head causes intensity variations within the tracked ROI, which are typically stronger than the RIIVs.Most earlier proposed methods including the benchmark do not aim to suppress these distortions, but instead detect these to exclude them from the measurement.As can be observed from Fig. 6(e), when applying the weights on the normalized colour differences in the respiratory frequency band, it is possible to eliminate non-respiratory related intensity variations and accurately extract the respiratory signal.
• Shallow breathing: Shallow breathing causes reduced intra-thoracic/abdominal pressure variations compared to normal breathing, and consequently also reduced RIIVs.As a consequence, it can be observed from Fig. 6(f) that the breath-to-breath accuracy is decreased compared to Fig. 6(a).However, the constant breathing rate of 20 min -1 is still clearly detectable in the spectrogram and our method still clearly outperforms the benchmark.
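
The breath-to-breath rates discussed in the scenarios above follow from the intervals between successive peaks of the respiratory signal. The Python sketch below illustrates this under stated assumptions: the function name, the `max_rate_bpm` parameter, and the simple greedy minimum-spacing rule are hypothetical choices, not the peak detector used in this work.

```python
import numpy as np

def breath_to_breath_rates(resp_signal, fs, max_rate_bpm=40.0):
    """Instantaneous breathing rates (breaths/min) from peak-to-peak intervals.

    Hypothetical sketch: fs is the sampling rate in Hz, max_rate_bpm the
    highest breathing rate expected in the recording.
    """
    x = np.asarray(resp_signal, dtype=float)
    x = x - x.mean()

    # Candidate peaks: positive samples larger than their left neighbour
    # and at least as large as their right neighbour.
    inner = x[1:-1]
    candidates = np.flatnonzero(
        (inner > x[:-2]) & (inner >= x[2:]) & (inner > 0)
    ) + 1

    # Greedy spacing rule (assumption): drop candidates closer to the
    # previously kept peak than half the shortest expected breath period.
    min_gap = 0.5 * (60.0 / max_rate_bpm) * fs
    peaks = []
    for i in candidates:
        if not peaks or i - peaks[-1] >= min_gap:
            peaks.append(i)

    intervals = np.diff(peaks) / fs   # seconds between successive breaths
    return 60.0 / intervals           # empty array if fewer than two peaks
```

For a non-sinusoidal signal at low rates, a detector of this kind can pick up local peaks within a single breath, which is the failure mode described under "Constant breathing rate".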

Spontaneous breathing
Comparison of the performance at the two camera distances shows that distance has no large influence. Only a modest decrease in performance of 0.09 BPM is observed for the increased distance, which is likely caused by the reduced number of pixels per subregion, leading to a lower SNR. Compared to the guided breathing scenarios, the performance is somewhat worse. This was expected for a number of reasons: 1) RIIVs are reduced in supine position compared to sitting position [35], 2) the variability in both breathing rate and amplitude is much higher, and 3) the ground-truth used for evaluation is derived from ECG, which suffers from motion artifacts itself, as can be observed from Fig. 8. Consequently, not all peaks can be accurately identified. It should also be mentioned that we evaluate our algorithm based on individual breath intervals without post-processing, whereas many other algorithms, e.g. the RR oxi algorithm of Addison et al. [36], use long time-windows and may additionally average over previous estimates to arrive at a breathing rate. This may yield a smaller error, but rapid changes in breathing rate cannot be tracked and the potentially dangerous events of apnea may not be detected.
Overall, we consider the results promising for a future transition to remote monitoring of respiration. However, all results for guided breathing are obtained on young, healthy subjects. Since the neonatal study is limited, extensive validation on subjects with possible health issues is needed to prove the clinical validity of our method. A potential improvement can be identified in a hybrid system, where our rPPG-based method is combined with a motion-based method. This may eliminate the major limitations of the individual approaches: the requirement of visible skin for our method, and the requirement of visible chest/abdomen for motion-based methods.

Conclusion
We have demonstrated that respiration can be detected with a camera in both visible and dark lighting conditions by using the close similarity between pulse-induced and respiration-induced colour variations of the skin. The proposed method has been thoroughly evaluated using 52 challenging videos containing both seated adults performing guided breathing, and neonates in a supine position breathing spontaneously. For guided breathing, all typical respiratory rates in the range of 10-40 breaths/min can be detected, even when the changes in rate are rapid and transient. Furthermore, respiration can be detected during head movements, and potentially dangerous breath-holding events, e.g. during central apnea, can be clearly identified. Overall, the mean absolute error for guided breathing scenarios is 1.74 BPM and 2.27 BPM in visible light and infrared, respectively, compared to 3.55 BPM and 3.65 BPM for the best-performing benchmark algorithm. For spontaneous breathing, the breathing rate can be detected with an error of 4.72 BPM compared to 7.40 BPM for the benchmark. This result demonstrates an important step towards a non-contact alternative for the commonly used contact sensor(s), which may lead to trauma of the fragile skin of neonates. Our proposed method showed a large improvement over earlier PPG-based methods for respiration monitoring.

Fig. 3. Overview of the proposed framework for robust respiration detection from remote PPG. 1) The manually initialized bounding-box indicating the face is tracked over time and divided into equally-sized subregions. 2) The weights for each (sub)region are calculated. From this collection of weights, the best are selected based on the SNR values of the pulse signals. 3) The extracted respiratory signal is scaled based on the ratio of respiratory and pulse energies.
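$$P^{t\to t+l}_{\mathrm{selected}} = \frac{\left\langle P^{t\to t+l}_{\mathrm{eigen}},\, P^{t\to t+l}_{\mathrm{mean}} \right\rangle}{\left| \left\langle P^{t\to t+l}_{\mathrm{eigen}},\, P^{t\to t+l}_{\mathrm{mean}} \right\rangle \right|} \times P^{t\to t+l}_{\mathrm{eigen}},$$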
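
The expression above can be read as a sign correction: the inner product of the two signals divided by its absolute value is +1 or -1, so the eigen-based signal is flipped whenever it is anti-correlated with the mean-based signal. The Python sketch below shows that reading. The function name is hypothetical, treating both quantities as equal-length signal segments over the window t→t+l is an assumption, and the handling of a zero inner product (where the expression is undefined) is also an assumption.

```python
import numpy as np

def resolve_sign(p_eigen, p_mean):
    """Flip p_eigen if needed so that it correlates positively with p_mean.

    Hypothetical helper illustrating <a, b> / |<a, b>| * a for two
    equal-length signal segments.
    """
    p_eigen = np.asarray(p_eigen, dtype=float)
    p_mean = np.asarray(p_mean, dtype=float)

    inner = np.dot(p_eigen, p_mean)
    if inner == 0.0:
        # Undefined in the original expression; leaving the signal
        # unchanged is an assumption made for this sketch.
        return p_eigen
    return np.sign(inner) * p_eigen
```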

Fig. 4. Overview of the experimental setup used for the creation of the dataset.

Fig. 5. Overview of breathing patterns for guided breathing scenarios in both visible light and infrared.

Fig. 6. Results from the different breathing scenarios in visible light conditions for both proposed methods. The spectrograms are calculated with a window-size of 8 seconds.

Fig. 8. Both proposed rPPG-based methods show large agreement with the reference ECG-derived respiratory signal. The snapshots on the right illustrate the four different viewpoints and distances included in the NICU dataset.

Fig. 9. Correlation and Bland-Altman plots for spontaneous breathing. The black lines in the correlation plots indicate the linear relationship y=x.
2.4.2. Spontaneous breathing
Non-contact respiration monitoring is particularly important and interesting for neonatal monitoring, because of the sensitive skin of newborns. Therefore, a dataset containing 20 videos (in different scenes) from 2 neonates in supine position is built for evaluation and demonstration, with a total number of 588 breaths. The videos are recorded in the Neonatal Intensive Care Unit (NICU) of Máxima Medical Center (MMC, Eindhoven, The Netherlands) under visible light conditions, where the neonates are recorded from 4 different camera views: (1) zoomed top-view of head and a part of the chest, (2) wide range top-view, (3) zoomed side-view of head and a part of the chest, and (4) wide range side-view. The medical ethical research committee at MMC approved the study and informed parental consents were obtained prior to data acquisition. The medical ethical research committee at Máxima Medical Center has reviewed the research proposal and considered that the rules laid down in the Medical Research involving Human Subjects Act (also known by its Dutch abbreviation WMO), do not apply to this research proposal.

Table 1. Results from the guided breathing scenarios in both visible and infrared lighting conditions.

Table 2. Results from spontaneous breathing scenarios recorded in a NICU under visible light conditions.