Analysis and improvement of non-contact SpO2 extraction using an RGB webcam

Abstract: Peripheral oxygen saturation (SpO2), a vital physiological sign employed in clinical care, is commonly obtained with a contact pulse oximeter. With the rapid popularization of ordinary red-green-blue (RGB) webcams embedded in devices such as smartphones and laptops, non-contact SpO2 extraction using RGB webcams has broad application prospects. However, many issues remain unsolved in traditional webcam-based SpO2 extraction methods, such as the inherently low signal-to-noise ratio (SNR) of the alternating current (AC) components of RGB signals and the potential defects of using RGB signal combinations for SpO2 extraction. In this study, we examined the existing research on webcam-based SpO2 extraction techniques in depth, analyzed the practical problems in using them, and explored new ideas to solve these problems. Rather than directly using the standard deviations (SD) of the AC components for calculations, we performed blind source separation on the AC components and then used the energy coefficients retained in the mixing matrix to replace the variables required by the algorithm. Moreover, steady data were selected to compensate for the potential defects of the RGB signal combination. Through these efforts, the anti-noise capability of the algorithm was significantly enhanced, and the related defects were compensated for. The experimental results indicate that the proposed method produces reliable SpO2 estimates that could potentially, with further research, be used in real applications.


Introduction
Peripheral blood oxygen saturation (SpO2), which represents the concentration of oxygenated hemoglobin (HbO2) molecules with respect to the total hemoglobin (Hb) molecules in arterial blood, is a vital physiological sign used in clinical practice [1,2]. Currently, the pulse oximeter is the primary instrument used for non-invasive SpO2 measurements. This device, which works on the basis of the Beer-Lambert law, takes advantage of the fact that HbO2 and deoxygenated Hb absorb incident light differently at different wavelengths. The pulse oximeter emits light of two different wavelengths (red and infrared) onto the tissue surface (a finger or earlobe) and then measures the intensities of the light reflected from or transmitted through the tissue. The SpO2 values are then calculated by the oximeter from these intensities [3]. Although the pulse oximeter has many advantages, the device must be pressed against the tissue surface continuously, which can easily cause discomfort, skin irritation, and infection in patients (especially burn patients and newborns) [4].

Principle of SpO2 extraction based on IPPG
In recent years, imaging photoplethysmography (IPPG), a camera-based technique developed to measure physiological signs in a non-contact way, has gradually gained research interest. This technique collects video data of the naked skin surface and then reconstructs the physiological signs (typically heart rate (HR)) from the image frames. Owing to its remarkable advantages, IPPG has become a hot research topic in the non-contact physiological measurement field [5][6][7][8][9]. In IPPG technique, the video recorded by camera comprises a sequence of frames that contain the intensity of the light reflected from the skin. It is accepted that the intensity of the reflected light is interrelated to a certain part of the incident light that is absorbed by the tissues and the blood inside the arteries and capillaries after penetrating the skin [3,10,11]. Moreover, there is an intrinsic volume pulsation in arteries and capillaries in accordance with the cardiac cycle. Thus, this pulsation affects the absorption and is revealed in the reflected light. In general, the reflected light can be divided into pulsatile and non-pulsatile components, which are commonly considered to be alternating current (AC) and direct current (DC) components, respectively [11]. As shown in Fig. 1, for the entire reflectance component in PPG/IPPG, the AC components represent the variable absorption caused by volume pulsation in arterial blood vessels (∼0.5%), whereas the DC components represent absorption by the mean arterial (0.5∼1%), venous average (9∼10%) blood volumes and the static signals due to tissue absorption (skin, skeleton, muscle, lymph, etc.) (∼90%) [12]. Based on the principle of SpO2 extraction, the incident light is selected to have two special wavelengths, at which the absorption by HbO2 and deoxygenated Hb in the blood are similar and different, respectively [3]. Subsequently, SpO2 parameters can be derived from the AC and DC components generated from the reflected light. 
A number of teams have conducted IPPG studies to derive SpO2 based on special wavelengths combinations (for example, the combination of 650 nm and 520 nm) with the aid of special equipment [13][14][15][16][17][18][19][20][21], including high-speed and high-precision CCD cameras, narrow-band filters, and stable LED light sources. Considering the complexity of the equipment employed, these methods may not offer significant advantages over the traditional oximeter, especially in terms of convenience. With the rapid popularization of intelligent consumer products, webcams and consumer-level cameras with red-green-blue (RGB) three-color pixel sensors have been deeply integrated into our daily lives. Webcam-based IPPG techniques, with wide application prospects in health monitoring, have drawn much research attention [5][6][7][9]. Among these studies, HR monitoring has become increasingly mature in applications. However, there has been little progress in SpO2 extraction research [5][6][7][8][9]. Webcam-based SpO2 extraction is still not mature enough for real-world application and faces many challenges.

Potential defects of traditional webcam-based SpO2 studies
Owing to these broad prospects, various studies have reported the use of webcams or consumer-level cameras for SpO2 extraction [22][23][24][25][26][27][28][29][30]. A few of these works largely imitate previous studies: they simply take a two-channel combination from the three-channel RGB signals yielded by the RGB video, namely the Red_Blue (R_B) channels or occasionally the Red_Green (R_G) channels, to replace the special wavelengths combination mentioned above, without in-depth exploration. The principle of these R_B channels studies can be briefly presented as follows, with the illustration in Fig. 2.

$$R = \frac{AC_{Red}/DC_{Red}}{AC_{Blue}/DC_{Blue}} \tag{1}$$

$$SpO2 = A + B \cdot R \tag{2}$$

Here, $AC_{Red}$ and $AC_{Blue}$ are the standard deviations (SD) of the AC components of the R_B channels in the sliding window (see Fig. 2(b)), and $DC_{Blue}$ and $DC_{Red}$ are the mean values of the DC components of the R_B channels in the sliding window (see Fig. 2(c)). By moving the sliding window, the sequence of R was obtained. Synchronously, the SpO2 reference values were recorded by a commercial oximeter. Next, the empirical coefficients A and B in the prediction equation were estimated by linear fitting (see Eq. (2)). Finally, the SpO2 estimates were obtained from the prediction equation. For convenience of follow-up analysis, we defined $R_{AC} = AC_{Red}/AC_{Blue}$, $R_{DC} = DC_{Blue}/DC_{Red}$, and $R = R_{AC} \cdot R_{DC}$, so that Eq. (1) was converted into Eq. (3) as shown below.

$$R = R_{AC} \cdot R_{DC} = \frac{AC_{Red}}{AC_{Blue}} \cdot \frac{DC_{Blue}}{DC_{Red}} \tag{3}$$
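As a rough illustration of the traditional R_B computation for a single sliding window, the following Python sketch may help. It is not the implementation used in the cited studies: the brick-wall FFT band-pass, the function names `bandpass_fft` and `ratio_of_ratios`, the use of the raw window mean as the DC level, and the synthetic signals are all assumptions made for this example.

```python
import numpy as np

def bandpass_fft(x, fs, lo=0.6, hi=3.0):
    """Brick-wall band-pass by FFT masking (an assumed stand-in for the BP filter)."""
    spec = np.fft.rfft(x - np.mean(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def ratio_of_ratios(red, blue, fs):
    """R = (AC_Red / AC_Blue) * (DC_Blue / DC_Red) for one window."""
    ac_red = np.std(bandpass_fft(red, fs))    # SD of the AC component
    ac_blue = np.std(bandpass_fft(blue, fs))
    dc_red = np.mean(red)                     # window mean used as the DC level
    dc_blue = np.mean(blue)
    r_ac = ac_red / ac_blue
    r_dc = dc_blue / dc_red
    return r_ac * r_dc

# Synthetic 10 s window at 30 fps (hypothetical values, for illustration only).
fs = 30
t = np.arange(10 * fs) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)           # 72 bpm pulse
red = 150.0 + 0.6 * pulse
blue = 100.0 + 0.2 * pulse
R = ratio_of_ratios(red, blue, fs)            # (0.6 / 0.2) * (100 / 150)
```

Because the AC amplitude is taken as a plain SD, any outlier inside the pass band inflates it directly, which is the weakness discussed in the remainder of this section.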
Besides, on the basis of the principle above, some teams have tried to obtain SpO2 information with the aid of optimized algorithms. Guazzi et al. [26] introduced a Skin-Oxygen Photoplethysmographic Image Analysis (Sophia) method, which used an automated region of interest (ROI)-selection algorithm to determine changes in oxygen saturation. Alessandra et al. [27] explored a low-cost SpO2 measurement technique by employing the Eulerian video magnification method. Harnani et al. and Nakano et al. [28,29] tried to capture retinal blood vessel images using fundus cameras and attempted to determine the HR, respiration rate (RR), and SpO2 information from the images. Wang et al. [30] explored a self-adaptive singular spectrum analysis algorithm to obtain RGB signals with accurate pulse waves beneficial for the extraction of vital signs. In general, although considerable effort has been devoted to webcam-based SpO2 extraction, many studies are still limited to the framework described in Eq. (1) and Eq. (2). Fundamentally, there are two primary problems in this research field that cannot be ignored.

(1) Poor signal-to-noise ratio (SNR) of AC components. Poor SNR is a common problem in IPPG research, since RGB signals are sensitive to irregular motion artifacts of facial regions, breath artifacts, and subtle changes in ambient light (e.g., light source intensity changes or reflector movements) [11]. These interference factors are reflected in the RGB waveforms. The components of PPG/IPPG shown in Fig. 1 reveal that the AC component, which is equivalent to the blood volume pulse (BVP) signal caused by arterial pulses, accounts for a small proportion of the entire reflected light and is easily corrupted by noises/outliers. Figure 2(b) shows the approximate AC components derived from the RGB signals by using a 0.6-3 Hz band-pass (BP) filter.
To mark the noises/outliers in the AC components more clearly, we used the Hampel identifier, a function in the MATLAB toolbox. As illustrated in Fig. 3, the Hampel identifier flags many noises/outliers. Generally, the intensities of these noises/outliers exceed that of the AC components, which may interfere with the SpO2 estimates.
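For readers without MATLAB, the idea behind the Hampel identifier can be sketched in a few lines of Python. This is a minimal illustration, not the toolbox routine: the function name `hampel_outliers`, the half-window of 3 samples, the 3-sigma threshold, and the synthetic test signal are assumptions of this example.

```python
import numpy as np

def hampel_outliers(x, half_window=3, n_sigmas=3.0):
    """Flag samples that deviate from the local median by more than
    n_sigmas scaled median absolute deviations (MAD)."""
    x = np.asarray(x, dtype=float)
    scale = 1.4826  # makes the MAD comparable to an SD for Gaussian data
    flags = np.zeros(len(x), dtype=bool)
    for i in range(len(x)):
        lo = max(0, i - half_window)
        hi = min(len(x), i + half_window + 1)   # window is truncated at the edges
        window = x[lo:hi]
        med = np.median(window)
        mad = scale * np.median(np.abs(window - med))
        flags[i] = abs(x[i] - med) > n_sigmas * mad
    return flags

# Synthetic AC component with one injected spike (hypothetical data).
fs = 30
t = np.arange(10 * fs) / fs
ac = 0.5 * np.sin(2 * np.pi * 1.2 * t)
ac[100] += 5.0
flags = hampel_outliers(ac)
```

Because the test uses the local median and MAD rather than the mean and SD, a single large spike does not mask itself by inflating the threshold.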
(2) Defects in using RGB signals combinations for SpO2 extraction. In earlier reports, the wavelengths of the incident light sources and the imaging equipment were selected intentionally, such as the special wavelengths combination and professional CCD cameras with special optical filters [13][14][15][16][17][18][19][20][21]. These wavelengths have good penetrability through human tissues (that is, they pass through the dermis and other vascular-rich layers), with stable absorption coefficients for HbO2 or deoxygenated Hb near the spectral bandwidths. However, the RGB pixel sensors of webcams are specifically optimized for human visual effect, which results in fundamental defects: the corresponding spectral bandwidths of the imaging light sources are too wide, and the absorption coefficients vary within the spectral bandwidth ranges, which does not conform to the principle of SpO2 extraction [31]. Considering these deviations from the requirements, an RGB signals combination such as the commonly employed R_B channels might be defective for use in an SpO2 extraction algorithm.
Therefore, it is worth paying attention to these two problems. This paper focused on the problems associated with the traditional webcam-based SpO2 extraction methods using R_B channels combination. Through experimental analysis, we investigated the details and problems related to the algorithms that are normally ignored, and explored new ideas to overcome them.

Methods
Focusing on the inherent defects mentioned above, a new method was proposed to resolve the problems and obtain optimal SpO2 estimates. The equipment, materials, and experimental environment involved in the proposed method were selected based on ordinary application scenarios.

Materials and experimental paradigm
Two ordinary webcams (a ThinkPad P1 built-in camera and an iPhone 8 front-facing camera) were adopted in the experiments. A Philips DB12 finger-clip oximeter was selected for recording the reference values, and MATLAB R2018a was used as the analysis software. Two ambient light conditions were used in the experiments: sunlight and fluorescent lamps. The experimental paradigm was as follows:
1. The subject wore an oximeter on his/her forefinger and sat approximately 0.3-0.5 m from the camera, ensuring that his/her face ROI was of an appropriate size (about two-thirds of the whole image in height) and that the display screen of the oximeter could be captured by the camera.
2. Once the recording began, the subject kept his/her head still and breathed evenly for about 10 s (the time was freely controlled by himself/herself). Next, the subject held his/her breath and paid attention to the oximeter values for at least 30 s, waiting for a decrease in SpO2. When reaching his/her breath-holding tolerance limit, the subject slowly resumed breathing until the values returned to normal, after which the experiment was completed. The whole procedure lasted for approximately 3-6 min.
3. In the experiment, if the subject's breath-holding reached tolerance limit, and the oximeter values still did not decrease, he/she could resume breathing and finish the procedure.
4. If the subject felt discomfort beyond his/her endurance during breath-holding, he/she could stop the experiment immediately and restart or cancel it after enough rest.
Considering that holding one's breath during the experiment has certain requirements in terms of cardiopulmonary function, eight healthy young people aged between 20-27 were chosen as the subjects. In the experiments, we recorded four different videos for each subject according to the four paired-combinations based on the two webcams and the two light conditions, thus a total of thirty-two videos for the eight subjects were collected. The experiments were approved by the Anhui University Ethical Committee and carried out under supervision. Before participating in the experiments, all subjects gave their informed consent in written form.

Principle
(1) Solutions to the problem of AC components contamination.
To overcome the problem of AC components being contaminated by noises/outliers, we used blind source separation (BSS) / independent component analysis (ICA) to derive the key SpO2 parameters from the observed AC components. BSS/ICA can recover hidden source signals from observed signals by relying on their statistical characteristics when both the sources and the mixing model are unknown. In this context, the principle of the BSS/ICA algorithm can be interpreted as follows. The observed AC components are expressed as $x = [x_1, x_2, x_3]^T$, and $s = [s_1, s_2, s_3]^T$ represent the potential source signals. It is assumed that there exists a linear instantaneous mixture such as:

$$x = A s \tag{4}$$

In Eq. (4), the elements of the linear mixing matrix $A$ represent the energy distribution of the source signals $s = [s_1, s_2, s_3]^T$ among the observed signals. In the blind separation operation, it is hypothesized that $y = Wx$, where $y$ is the target approximation of $s$, and $W = A^{-1}$ is the inverse of the mixing matrix $A$. The essence of blind separation is to iterate toward the convergence of $W$ according to the statistical characteristics of $x$, and then to obtain $y$. Many IPPG studies have illustrated that BSS/ICA algorithms are suitable for separating RGB signals [32][33][34]. In our previous studies, second-order blind identification (SOBI), a BSS method based on second-order statistics, was shown to perform well in RGB signal processing [32]. Consequently, the SOBI algorithm was selected to process the three-channel observed AC components, and the mixing matrix is denoted $A_{SOBI}$ instead of $A$.

Figure 4 shows the separation results obtained by applying the SOBI algorithm to the observed AC components shown in Fig. 3. It can be seen in Fig. 4 that, among the three source signals separated from the observed AC components, $y_1$ is the target BVP source, which has fewer noises/outliers than the observed AC components of the R_B channels (see Fig. 3).
Owing to the energy normalization in the BSS/ICA operation, the separated sources do not carry the original energy/amplitude information, which remains in the mixing matrix $A_{SOBI}$ as follows.

$$A_{SOBI} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \tag{5}$$

In $A_{SOBI}$, the first column $A_{SOBI}(:, 1) = [a_{11}, a_{21}, a_{31}]^T$ is the energy distribution vector of the BVP source $y_1$ in the observed AC components, named $V_{BVP} = A_{SOBI}(:, 1)$. Among the three elements of the vector, $V_{BVP}(1) = a_{11}$ and $V_{BVP}(3) = a_{31}$ are the energy coefficients corresponding to the R_B channels. We used these two coefficients to replace $AC_{Red}$ and $AC_{Blue}$, to overcome the problem that the AC components are prone to being disturbed by noises. The negative values that sometimes appear in $V_{BVP}$ are caused by the wave reversal of the BVP in the blind separation, which does not affect the energy/amplitude analysis. Therefore, $|V_{BVP}(1)|$ and $|V_{BVP}(3)|$ were employed to replace $AC_{Red}$ and $AC_{Blue}$ respectively; that is, Eq. (3) was further converted into Eq. (6) as follows.

$$R = R_{AC} \cdot R_{DC} = \frac{|V_{BVP}(1)|}{|V_{BVP}(3)|} \cdot \frac{DC_{Blue}}{DC_{Red}} \tag{6}$$
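To make concrete the idea that the amplitude information survives in the mixing matrix, the sketch below separates a synthetic three-channel mixture and reads the AC ratio from one column of the estimated matrix. It deliberately uses AMUSE, a simpler second-order method based on a single time-lagged covariance, rather than SOBI, which jointly diagonalizes several lagged covariances. The function name `amuse`, the synthetic sources, the matrix `a_true`, and the shortcut of picking the BVP column by correlation with the known source are all assumptions of this example.

```python
import numpy as np

def amuse(x, lag=1):
    """AMUSE second-order BSS. x has shape (n_channels, n_samples).
    Returns unit-variance sources y and the estimated mixing matrix."""
    x = x - x.mean(axis=1, keepdims=True)
    n = x.shape[1]
    # Whitening from the zero-lag covariance: whiten = D^{-1/2} E^T.
    d, e = np.linalg.eigh(x @ x.T / n)
    whiten = (e / np.sqrt(d)).T
    z = whiten @ x
    # Eigenvectors of the symmetrized lagged covariance give the rotation.
    c = z[:, :-lag] @ z[:, lag:].T / (n - lag)
    _, v = np.linalg.eigh((c + c.T) / 2.0)
    unmix = v.T @ whiten                      # W
    return unmix @ x, np.linalg.inv(unmix)    # y = W x, A = W^{-1}

# Synthetic sources inside the 0.6-3 Hz band (hypothetical data).
fs = 30
t = np.arange(10 * fs) / fs
s = np.vstack([np.sin(2 * np.pi * 1.2 * t),   # pulse-like source
               np.sin(2 * np.pi * 0.8 * t),   # slow disturbance
               np.sin(2 * np.pi * 2.5 * t)])  # fast disturbance
a_true = np.array([[0.6, 0.5, 0.1],           # rows: R, G, B channels
                   [0.9, 0.3, 0.2],
                   [0.2, 0.4, 0.3]])
x = a_true @ s
y, a_est = amuse(x)

# Shortcut for the demo only: locate the pulse component using the known source.
k = int(np.argmax(np.abs(y @ s[0])))
v_bvp = a_est[:, k]
r_ac = abs(v_bvp[0]) / abs(v_bvp[2])          # |V_BVP(1)| / |V_BVP(3)|
```

In this toy case the recovered column equals the first column of `a_true` up to sign and a common scale factor, so the ratio of its red and blue entries matches the true amplitude ratio even though the separated source itself has been normalized.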
In addition, there is an inherent source permutation ambiguity in practical BSS/ICA separation. Thus, it is essential to identify the BVP source among the separation results, which is important for ensuring the robustness of the algorithm. The existing literature usually uses simple methods, such as directly specifying a fixed channel as the BVP source or identifying the BVP source from the maximum peak of the spectrum [33,34]. Considering the frequent contamination of RGB signals by noises, it is difficult to achieve ideal recognition accuracy and stability with these methods. Based on practical experience, a method combining spectrum kurtosis with minimum HR deviation was designed for accurate BVP source identification. Following the detailed flow of the method illustrated in Fig. 5, the channel number $k$ of the BVP source can be automatically selected from the separation results, and $V_{BVP}$ can then be derived from $A_{SOBI}$; that is, $V_{BVP} = A_{SOBI}(:, k)$.
(2) Solutions to the defects in R_B channels combination for SpO2 extraction.
To overcome the inherent defects in using the R_B channels combination, a section of steady data was selected for compensation. In the algorithm, a simple method was used to find approximately steady data. At the beginning of the experiment (steady breathing), the SD of the R_channel data in the sliding window was calculated, and the original RGB data in the sliding window corresponding to the minimum of three consecutive SD values was taken as the steady data $RGB_{steady}$. After 3 Hz low-pass (LP) filtering and smoothing, the mean value of the B_channel data in $RGB_{steady}$ was recorded as one target parameter, $DC_{Blue\_steady}$. Synchronously, the original $RGB_{steady}$ was filtered by the 0.6-3 Hz BP filter and processed by the SOBI algorithm to obtain the mixing matrix $A_{SOBI\_steady}$. Then, the column vector $V_{BVP\_steady}$ corresponding to the BVP was detected from $A_{SOBI\_steady}$ by using the BVP identification method, and the absolute value of the element $V_{BVP\_steady}(3)$ was recorded as the other target parameter. After that, $|V_{BVP\_steady}(3)|$ and $DC_{Blue\_steady}$ were used to replace $|V_{BVP}(3)|$ and $DC_{Blue}$ respectively, and Eq. (6) was eventually transformed into Eq. (7) to obtain the optimal SpO2 parameters $R_{AC}$, $R_{DC}$ and $R$.

$$R = R_{AC} \cdot R_{DC} = \frac{|V_{BVP}(1)|}{|V_{BVP\_steady}(3)|} \cdot \frac{DC_{Blue\_steady}}{DC_{Red}} \tag{7}$$

The flow described above is illustrated in Fig. 6.

Figure 7 shows the workflow of the proposed method for SpO2 extraction based on an RGB camera. The captured facial video is first converted into three-channel RGB signals. The commonly used RGB coherent pixel averaging method is used to generate the RGB signals from the ROIs selected on the forehead (which has relatively small motion artifacts). Next, using the sliding window (window length: 10 s; sliding step: 3 s), the sequences of $R_{AC}$ and $R_{DC}$ are calculated from the generated RGB signals, according to the flow chart in Fig. 6. Then, the sequence of $R$ is obtained as $R = R_{AC} \cdot R_{DC}$.
After that, the empirical coefficients A and B in the SpO2 prediction equation could be obtained by least-squares linear fitting between the R sequence and SpO2 reference values synchronously recorded by the finger-clip oximeter. It needs to be mentioned that, in the pre-processing module (the details are described in Fig. 6), for the steady data RGB steady and sliding-window data RGB window , we filtered them by 0.6-3 Hz BP-filter to obtain respective AC components, and synchronously processed them by 3 Hz LP-filter combined with a smooth filter in the MATLAB toolbox (the "smoothts" function based on the "exponential" method with period length "90") to obtain respective DC components. In addition, in the BVP-identification module, according to the detailed steps presented in Fig. 5, the BVP source could be automatically selected from the SOBI blind separation results, and the subject's HR values can also be extracted. Considering that the accuracy of BVP identification can be evaluated by the consistency between the HR estimates and the HR (oximeter) values, it is significant that the consistency could also be used to evaluate the robustness of R AC calculations which depend on the BVP identification.
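The windowing and steady-data selection described above can be sketched as follows. This is only one possible reading of the text: the helper names `sliding_windows`, `pick_steady_window` and `r_proposed` are hypothetical, the "three consecutive SD values" are interpreted as the first three windows of the recording, and the synthetic red-channel signal is invented for the example.

```python
import numpy as np

def sliding_windows(n_samples, fs, win_s=10.0, step_s=3.0):
    """(start, stop) sample indices of each window (10 s length, 3 s step)."""
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))
    return [(a, a + win) for a in range(0, n_samples - win + 1, step)]

def pick_steady_window(red, windows, n_candidates=3):
    """Among the first n_candidates windows, return the one whose
    R-channel SD is smallest (assumed reading of the selection rule)."""
    sds = [np.std(red[a:b]) for a, b in windows[:n_candidates]]
    return windows[int(np.argmin(sds))]

def r_proposed(v_bvp, v_bvp_steady, dc_red, dc_blue_steady):
    """R with the blue-channel terms taken from the steady window."""
    r_ac = abs(v_bvp[0]) / abs(v_bvp_steady[2])
    r_dc = dc_blue_steady / dc_red
    return r_ac * r_dc

# Synthetic 60 s red channel at 30 fps with two disturbed stretches.
fs = 30
t = np.arange(60 * fs) / fs
red = 150.0 + 0.5 * np.sin(2 * np.pi * 1.2 * t)
red[0:60] += 2.0        # disturbance that falls only in the first window
red[400:470] += 2.0     # disturbance that falls only in the third window
windows = sliding_windows(len(red), fs)
steady = pick_steady_window(red, windows)
```

In this toy recording the second window, samples 90 to 389, is the only undisturbed candidate and is therefore selected as the steady data.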

Results
For the thirty-two videos collected in the experiments, we transformed them into RGB signals using ROI pixel averaging. Then, according to the work flows shown in Fig. 5, 6, the sequences of R AC , R DC were derived from the RGB signals based on the sliding-window. After that, the sequences of R =R AC · R DC were linearly fitted with the oximeter values to obtain the empirical coefficients A and B in prediction equation. Finally, the SpO2 estimates could be achieved by the equation SpO2 = A + B · R. Figure 8 shows the statistics of the calculation results on the experimental data, including: the curves of the SpO2 parameters R AC , R DC and R sequences (see Fig. 8(a)); the result of linear fitting between R sequences and oximeter values (Dataset 1 [35]) (see Fig. 8(b)); the 95% confidence ellipse of SpO2 estimation, combined with two statistical indexes-the root-mean-square error (RMSE) and the correlation coefficient (r) (see Fig. 8(c)); and the curves of the SpO2 estimates and oximeter values (see Fig. 8(d)). As presented in Fig. 8, wave-trends emerged in curves of the R AC , R DC and R sequences. Moreover, the R sequences maintain an appropriate linear fitting with the oximeter values, and the SpO2 estimates are basically consistent with the oximeter values in trend of curves.
As mentioned before, subject's HR can be extracted from RGB signals in BVP-identification module, and the accuracy of HR estimation could be used to evaluate the robustness of R AC calculations. Fig. 9 gives the comparative curves of the HR estimates and the HR (oximeter) values. It could be observed in the curves that HR estimates maintained an excellent consistency with HR (oximeter) values-that is, the BVP-identification module achieved a good performance in BVP identification for R AC calculations.
To evaluate the performance of the proposed method, data analysis was carried out on the calculation results on the thirty-two experimental data for the eight subjects. For each data, three statistical indexes-RMSE, r and the accuracy rate of HR estimation by BVP identification (BVP-HR) were examined. As listed in Table 1, the RMSE maintains a relatively stable range, and so does the r that represents linear dependence between SpO2 estimates and SpO2 (oximeter) values. In addition, the BVP-HR achieves an excellent level-close to 100% for most data.
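The calibration and the two SpO2 indexes used here are straightforward to compute. The sketch below fits the prediction equation SpO2 = A + B · R by least squares and then evaluates RMSE and the correlation coefficient r. The function names and the numbers in the toy data are made up for illustration and are not taken from the experiments.

```python
import numpy as np

def fit_spo2_model(r_seq, spo2_ref):
    """Least-squares fit of SpO2 = A + B * R; returns (A, B)."""
    b, a = np.polyfit(r_seq, spo2_ref, deg=1)   # polyfit returns slope first
    return a, b

def evaluate(spo2_est, spo2_ref):
    """Root-mean-square error and Pearson correlation coefficient."""
    rmse = float(np.sqrt(np.mean((spo2_est - spo2_ref) ** 2)))
    r = float(np.corrcoef(spo2_est, spo2_ref)[0, 1])
    return rmse, r

# Toy data: an exactly linear relation with invented coefficients.
r_seq = np.array([0.50, 0.55, 0.60, 0.70, 0.80])
spo2_ref = 110.0 - 20.0 * r_seq
a, b = fit_spo2_model(r_seq, spo2_ref)
rmse, r = evaluate(a + b * r_seq, spo2_ref)
```

Note that fitting and evaluating on the same recording, as in this toy case, measures goodness of fit rather than predictive accuracy on unseen data.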

Comparison
Based on the experimental data, a comparison was made between the proposed method and the traditional method using the R_B channels combination. In the comparison, the principle of SpO2 extraction was changed from the proposed Eq. (7) back to Eq. (3) to reproduce the traditional method. In addition, the HR values were also extracted for comparison, using the method widely adopted in the literature that estimates HR from the spectrum peak of the G-channel (by FFT) [33,34]. Figure 10 shows the recalculation results, based on the traditional method, for the experimental data shown in Fig. 8. Figure 11 shows the comparative curves of HR monitoring obtained by three approaches: oximeter, G-channel FFT, and SOBI (BVP-identification). As for Table 1, data analysis was carried out on the recalculation results to evaluate the performance of the traditional R_B method in terms of RMSE, r, and the accuracy rate of HR estimation by G-channel FFT (G-HR (%)). The results are listed in Table 2.

The recalculation in Fig. 10 shows that the wave-trends mentioned in the previous results (see Fig. 8(a)) are missing from the curves of the R AC , R DC and R sequences in Fig. 10(a). This results in an unsatisfactory linear regression model for the R sequence and oximeter values in Fig. 10(b), and in turn leads to a significant difference between the SpO2 estimates and the oximeter values in Fig. 10(c)-(d). Thus, reliable SpO2 estimates cannot be determined from the final SpO2 prediction curve. As revealed in Table 2, the values of RMSE and r indicate a low linear correlation between the SpO2 estimates and the corresponding oximeter values, that is, poor reliability of the SpO2 linear regression equation. Besides, the accuracy rate of HR estimation by G-channel FFT is inferior to that of the SOBI (BVP-identification) approach, according to the experimental results in Fig. 11 and Table 2.
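The G-channel FFT baseline used in this comparison can be sketched as below. The restriction of the peak search to 0.6-3 Hz mirrors the band-pass range used elsewhere in this paper but is an assumption here, as are the function name `hr_from_fft` and the synthetic green-channel signal.

```python
import numpy as np

def hr_from_fft(green, fs, lo=0.6, hi=3.0):
    """Heart rate in bpm from the strongest spectral peak within [lo, hi] Hz."""
    g = np.asarray(green, dtype=float)
    g = g - g.mean()
    power = np.abs(np.fft.rfft(g)) ** 2
    freqs = np.fft.rfftfreq(len(g), d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[band][np.argmax(power[band])]

# Synthetic 10 s window at 30 fps: a 1.2 Hz pulse plus slow drift at 0.2 Hz.
fs = 30
t = np.arange(10 * fs) / fs
green = (120.0 + 0.4 * np.sin(2 * np.pi * 1.2 * t)
         + 0.1 * np.sin(2 * np.pi * 0.2 * t))
hr = hr_from_fft(green, fs)   # the 1.2 Hz peak corresponds to 72 bpm
```

With a 10 s window the frequency resolution is 0.1 Hz, that is 6 bpm, and any in-band disturbance stronger than the pulse will capture the peak; this is one plausible reason why the simple peak-picking baseline is less accurate than source identification after separation.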

Discussion
Through comparison on the experimental results, it is revealed that the traditional R_B channels combination method has an unsatisfactory performance in terms of accuracy and reliability of SpO2 estimation. Furthermore, the proposed method based on the SOBI algorithm and the steady data overcame the problems pertinent to the traditional method effectively. Nevertheless, some details need to be further discussed.
(1) Pre-processing for obtaining AC & DC components. According to the principle, the key optical information for SpO2 extraction is contained in the AC & DC components-commonly derived from RGB signals by filters. However, filters could remove some portion of the key optical information as well as the noise, which may affect the subsequent SpO2 estimation, when the filters are designed inappropriately. In the preprocessing module, for the RGB signals within the sliding-window, we filtered them by 0.6-3 Hz BP-filter to get the approximate AC components, and synchronously processed them by 3 Hz LP-filter combined with the smooth filter to obtain DC components. For AC component that could be seen as BVP component, redundancy was reserved in the parameter setting of the BP-filter based on experiences, in order to keep the key optical information contained in it as complete as possible. Moreover, there were high accuracies achieved in BVP recognition and HR estimation (see Fig. 9 and Table 1), indicating that there were seldom information damage or loss in the approximate AC components obtained by the BP-filter. While for DC components, the key optical information contained in it was the amplitudes of R_B channels waveforms, which can be properly preserved by the LP-filter and the smooth filter without damaging it. Based on the filters, the proposed method achieved a good performance in SpO2 estimation (see Table 1).
(2) Poor SNR of AC components in webcam-based SpO2 extraction. The poor SNR of the AC components is an inherent and thorny problem in webcam-based SpO2 research that had not been resolved appropriately. Many noises/outliers were found in the AC components of the R_B channels by using the Hampel identifier (see Fig. 3), and these would disturb the calculation of the SpO2 parameters. In the proposed method, the SOBI algorithm combined with BVP identification was used to extract the BVP source signal with a good SNR. Further, the energy coefficients retained in the mixing matrices A SOBI and A SOBI_steady were used to calculate the SpO2 parameters. Compared with the SD of the AC components (AC Red and AC Blue ) in the traditional R_B method, the energy distribution coefficients (|V BVP (1)| and |V BVP_steady (3)|) from the vectors V BVP and V BVP_steady achieve better noise immunity. This is significant for SpO2 parameter extraction and could provide a promising approach for further research.
(3) Defects in R_B channels combinations. The red and blue light determined by RGB pixel sensors have inherent absorption characteristics for HbO2 and deoxygenated Hb.
With the fluctuation of blood oxygen content, the AC and DC components of the RGB signals presented some remarkable features in the waveform envelopes, for example, the approximate spindles in the AC waveform envelopes (see Fig. 12(a)) and the obvious undulations in the DC waveforms (see Fig. 12(b)). In the experiment shown in Fig. 12, the subject held his breath for approximately 120 s, resulting in a decrease in HbO2 concentration and an increase in deoxygenated Hb concentration. Then, because the absorption coefficient of red light for deoxygenated Hb is greater than that for HbO2, the amplitude of the AC waveforms in the R_channel was enhanced, and a spindle emerged in the envelope (see Fig. 12(a)). Moreover, an undulation appeared in the DC waveforms of the R_channel (see Fig. 12(b)). For the B_channel, according to the requirements for SpO2 extraction, the absorption coefficients of blue light for HbO2 and deoxygenated Hb should be similar; that is, a change in the relative concentration ratio of HbO2 and deoxygenated Hb in blood should not have an apparent influence on the amplitude of the AC and DC waveforms. However, contrary to expectations, a spindle and an undulation also appeared in the B_channel. This reveals the defect of the R_B channels combination for SpO2 extraction, namely the loss of wave-trends in the curves of the R AC , R DC and R sequences (see Fig. 10(a)). Consequently, we used the steady data for compensation. In actual clinical environments, especially for many chronic diseases, it is feasible to obtain the steady data. Nevertheless, this simple method can only compensate for the lost information to a certain extent. In general, the waveform features in the R_channel (spindle and undulation) are the information carriers of SpO2, and retaining them is the key to SpO2 estimation and further research.

(4) Temporary inconsistency between SpO2 estimates and oximeter values in certain times.
In the data analysis of the experimental results, it could be observed that the curves of SpO2 estimates and oximeter values cannot achieve consistency in certain times. As shown in Fig. 13, the time point of SpO2 (oximeter) recovery was significantly later than that of the SpO2 (webcam-IPPG) recovery. There may be complicated reasons for these temporary inconsistencies. We gave the following views through the comparative analysis of the physiological signs' variations. When the SpO2 estimates began to rise (at about 150 s), the HR monitor data of the subject also rose synchronously (see Fig. 13). It might be speculated that, after a long period of breath holding, the HbO2 concentrations of the main organs have dropped sharply, then the human central nervous system (CNS) would control the heart to speed up blood supply to various organs-that is, the HR sped up. With the acceleration of HR, the blood flow volume in the blood vessels of organs increased. This emergency mechanism of CNS may bring a certain increase in HbO2 concentration and may cause transient disorders in the trend of SpO2 estimates. Furthermore, medical researchers have shown that there is a significant difference in blood vessel abundance between face and finger, which leads to the inherent difference in blood flow volume between them. Therefore, the complex mechanism of human CNS combined with the objective difference of blood flow volume in different organs may be one of the reasons for the temporary inconsistency between the SpO2 values estimated from facial video and that measured by finger-clip oximeter. In addition, there might be some undisclosed algorithms employed in the mechanisms of many ordinary commercial instruments. For example, the physiological monitoring data given by these products are always perfect without outliers, which is obviously processed by relevant adaptive smoothing filtering. 
These undisclosed algorithms will mask the transient disorders in the SpO2 estimates trend mentioned above and further affect real-time measurement. Thus, most IPPG studies of SpO2 extraction, which commonly take an ordinary commercial oximeter as the standard SpO2 reference, may have certain limitations. Put differently, the ordinary oximeter can only be used as an approximate reference for fitting the SpO2 linear regression equation. In general, the problems mentioned above need to be further explored in multidisciplinary research. For example, we would consider introducing professional medical equipment to measure SpO2 references at the same location as IPPG, in place of the ordinary finger-clip oximeter.

Although considerable work has been conducted in this study, there are still some shortcomings. First, the study could be extended to more types of webcam devices to verify their application feasibility. Unlike professional optical CCD devices, webcams may have their own special algorithms for video image processing, such as automatic light compensation and skin color optimization. All these factors would aggravate noise interference in IPPG research, especially for SpO2 extraction, which places high demands on the quality of the RGB signals. Second, for the sake of personal safety, only healthy young people were selected to participate in these experiments. Hence, more research and follow-up work are needed to develop a robust webcam-based SpO2 extraction method.

Conclusion
This paper focuses on webcam-based SpO2 extraction. To address the problems in the traditional methods, we developed a new method and achieved optimal SpO2 estimates. The problem of poor SNR in the AC components was addressed by using the energy coefficients retained in the SOBI mixing matrix. The inherent defects in using the R_B channels combination were compensated for by applying steady data. Finally, some discoveries are worth exploring further, such as the spindles and temporary inconsistencies mentioned above, which can be treated as starting points for follow-up research.