Research on oxygen saturation based on statistical analyses

We propose three indicators to characterize blood oxygen saturation (SpO2): 1. the mean, which reflects the central tendency of SpO2; 2. the standard deviation, which reflects the dispersion of SpO2; 3. the Hurst exponent, which indicates whether the time series is persistent or anti-persistent. From all participants, we select those whose physiological characteristics are nearly the same except for age, divide them into female and male groups, and analyze them visually. From this we conjecture that the pattern of the SpO2 series is related to age. A stepwise regression analysis on age, BMI, gender and smoking is then conducted to test this quantitatively. We find that the mean of SpO2 is related to age.


Introduction
Since the outbreak of COVID-19, many countries have invested substantial resources in research on the virus. Infection with the new coronavirus can lead to a decrease in blood oxygen saturation (SpO2). SD1 and SD2 represent the short-term and long-term variability of SpO2, respectively [1]. Harold Edwin Hurst, an English hydrologist, discovered long-term memory in time series in 1951 while studying changes in the water level of the Nile. In his honour, the Hurst exponent is used to characterize the long-term memory of a time series [4]. Several kinds of linear regression models exist, but some are complicated and others give unsatisfactory results.
Stepwise regression appeared in the 1960s.
In this work, indicators such as SD1 and SD2 are introduced to describe the variability of SpO2, and a stepwise regression model is then built to analyze the participants' physiological indexes.
This work analyzes the variability of SpO2 and the relationship between age and SpO2. The results may help medical institutions treat patients better.

Data
We analyze data from 36 participants, each consisting of about one hour of SpO2 recording together with some biological information; the data are listed in the appendix. Taking participant 301116B as an example, his data are shown in Fig.1.

Mean and standard deviation
Mean (M) and standard deviation (SD) reflect the central tendency and the dispersion of SpO2, respectively. This work uses the arithmetic mean.
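For reference, the two statistics can be written out as follows, where $x_i$ is the $i$-th SpO2 sample of a participant and $N$ is the number of samples. The paper does not state whether the sample ($N-1$) or population ($N$) form of the standard deviation is used; the sample form below is an assumption.

$$M = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad SD = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - M\right)^2}$$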
Because the dynamics of SpO2 are complex, linear methods alone ignore its instantaneous changes [2]. After reviewing the literature, we adopt the nonlinear method used in clinical medicine for heart rate variability (HRV) analysis to analyze changes in SpO2.
The standard deviation measures the variability of the data. More specifically, it can be decomposed into the standard deviation of the short-term variability (SD1) and that of the long-term variability (SD2) [3].
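A minimal Python sketch of SD1 and SD2, assuming they are computed from the standard Poincaré plot of successive samples $(x_n, x_{n+1})$, as in HRV analysis. The function name `poincare_sd1_sd2` and the use of the sample standard deviation (`ddof=1`) are our own choices, not taken from the paper.

```python
import numpy as np


def poincare_sd1_sd2(spo2):
    """Return (SD1, SD2) from the Poincare plot of successive SpO2 samples.

    Each point of the plot is (x[n], x[n+1]). SD1 is the spread perpendicular
    to the identity line (short-term variability); SD2 is the spread along
    the identity line (long-term variability).
    """
    x = np.asarray(spo2, dtype=float)
    x_n, x_next = x[:-1], x[1:]
    # Rotate the point cloud by 45 degrees onto axes aligned with the identity line.
    d_perp = (x_next - x_n) / np.sqrt(2.0)  # distance across the identity line
    d_para = (x_next + x_n) / np.sqrt(2.0)  # position along the identity line
    sd1 = np.std(d_perp, ddof=1)
    sd2 = np.std(d_para, ddof=1)
    return sd1, sd2
```

With this construction SD1² + SD2² is approximately 2·SD², so the two indices split the total variance into a short-term and a long-term part; a slowly drifting series gives SD2 much larger than SD1.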

Hurst exponent
The SpO2 data of the 36 individuals are treated as non-stationary time series. These series have a noise-like structure and also contain trends. To detect the self-similarity of the data and eliminate the trend components of their own evolution, the mean of the whole series is subtracted from each value and the result is accumulated (integrated), which converts the noise-like series into a random-walk series [4]. Detrended fluctuation analysis (DFA) is then applied, and the resulting deviation series is the fluctuation component [4].
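A minimal sketch of first-order DFA in Python with NumPy, following the steps above: mean subtraction, integration (cumulative sum), linear detrending in non-overlapping windows, root-mean-square fluctuation, and a log-log slope. The default window sizes (log-spaced from 4 samples to a quarter of the series length) and the function name `dfa_exponent` are assumptions; the paper does not give them. For noise-like series the fitted slope is the estimate of H; a slope above 1 would indicate that the input was already random-walk-like.

```python
import numpy as np


def dfa_exponent(series, scales=None):
    """Estimate the DFA scaling exponent (used as H for noise-like series)."""
    x = np.asarray(series, dtype=float)
    # Step 1: subtract the mean and integrate -> random-walk "profile".
    profile = np.cumsum(x - x.mean())
    n_total = len(profile)
    if scales is None:
        # Assumed default: about 20 log-spaced window sizes from 4 to N/4.
        scales = np.unique(
            np.round(np.logspace(np.log10(4), np.log10(n_total // 4), 20)).astype(int)
        )
    fluctuations = []
    for n in scales:
        t = np.arange(n)
        sq_residuals = []
        for w in range(n_total // n):
            seg = profile[w * n:(w + 1) * n]
            # Step 2: remove the local linear trend in each window.
            trend = np.polyval(np.polyfit(t, seg, 1), t)
            sq_residuals.append(np.mean((seg - trend) ** 2))
        # Step 3: root-mean-square fluctuation F(n) at window size n.
        fluctuations.append(np.sqrt(np.mean(sq_residuals)))
    # Step 4: F(n) ~ n^H, so H is the slope of log F(n) against log n.
    slope, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return slope
```

As a sanity check, uncorrelated white noise should give a value close to 0.5, which corresponds to the "no correlation" case.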
Applying DFA to an SpO2 time series yields the scaling relation $F(n) \propto n^{H}$, where $F(n)$ is the root-mean-square fluctuation at window size $n$ and $H$ is the Hurst exponent, with $0 < H < 1$. H describes the relationship between the former trend and the latter trend of a time series, as illustrated in Tab.1.
Tab.1 The relationship between the former trend and the latter trend.
Hurst exponent | H < 0.5 | H = 0.5 | H > 0.5
Relationship | Negative correlation | No correlation | Positive correlation
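Because an estimated H is never exactly 0.5, applying Tab.1 in practice needs some tolerance around 0.5. A minimal sketch, assuming a hypothetical tolerance `tol` of 0.02 (not a value from the paper):

```python
def classify_hurst(h, tol=0.02):
    """Map a Hurst exponent to the categories of Tab.1.

    `tol` is a hypothetical band around 0.5 treated as "no correlation".
    """
    if not 0.0 < h < 1.0:
        raise ValueError("H is expected to lie strictly between 0 and 1")
    if abs(h - 0.5) <= tol:
        return "no correlation"
    return "positive correlation" if h > 0.5 else "negative correlation"
```

The choice of `tol` should reflect the estimation error of H for series of the length being analyzed.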

Stepwise regression
Stepwise regression analysis is used to relate age, smoking, BMI and sex to SpO2. Let $x_1$ and $x_2$ denote age and BMI, respectively. Gender and smoking, however, are not quantitative, so they are encoded as indicator variables. Gender is defined as

$$x_3 = \begin{cases} 0, & \text{Male} \\ 1, & \text{Female} \end{cases}$$

Because smoking strongly affects the lungs (the more one smokes, the greater the lung injury), three degrees of smoking are distinguished, which requires two 0/1 indicator variables, $x_4$ and $x_5$. The variables are then introduced one at a time, and each is tested for significance.

For participant 301116B, the standard deviation of SpO2 is about 0.50%, so the variability is not strong. SD1 is less than SD2, which means the long-term variability is stronger than the short-term variability. The Hurst exponent is greater than 0.5, which indicates that the former and latter trends of SpO2 are positively correlated. In other words, if SpO2 was increasing before, it tends to keep increasing, and if it was decreasing before, it tends to keep decreasing. The same methods can be applied to the other participants.

Variability analysis
To analyze the relationship between the mean of SpO2 and its standard deviation (the total variability), we plot the average SpO2 of each participant on the x-axis against the standard deviation on the y-axis, giving Fig.2. The Pearson correlation coefficient between the two, obtained by linear fitting in MATLAB, is -0.737. There is therefore a negative linear relationship between them, which shows that the overall variability is smaller at higher SpO2 levels [1].
Fig.2 The relationship between mean oxygen saturation level and total variability.
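The same computation can be sketched in Python with NumPy instead of MATLAB: the Pearson coefficient from `np.corrcoef` and a least-squares line from `np.polyfit`. The function name is our own, and the values in the usage example are synthetic, chosen only to illustrate the call; they are not the study's data.

```python
import numpy as np


def mean_sd_relationship(means, sds):
    """Pearson r and least-squares line (slope, intercept) of SD against mean SpO2."""
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    r = np.corrcoef(means, sds)[0, 1]
    slope, intercept = np.polyfit(means, sds, 1)
    return r, slope, intercept


# Synthetic illustration: SD falls exactly linearly as mean SpO2 rises, so r = -1.
r, slope, intercept = mean_sd_relationship([94, 95, 96, 97, 98], [1.2, 1.0, 0.8, 0.6, 0.4])
```

If a p-value for the correlation is also wanted, `scipy.stats.pearsonr` returns one alongside r.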

Visual analysis
By screening gender, BMI, smoking status and other information, we selected the participants with record names 160217C, 090217B and 150317A. They are all non-smoking females with similar BMI and a sufficiently large age span, so they roughly match our ideal selection criteria.
Fig.3 Comparison of females' oxygen saturation variability.
According to their SpO2 sequence diagrams in Fig.3, 090217B, the youngest, has the highest fluctuation frequency and the largest fluctuation range, while 150317A, the oldest, has the lowest fluctuation frequency and the smallest fluctuation range.
For men, we selected the participants with record names 140317A, 301116A and 010317B. They are all non-smoking males with similar BMI and a large age range, which roughly matches our ideal selection criteria.
Fig.4 Comparison of males' oxygen saturation variability.
Plotting their SpO2 fluctuations in Fig.4, we find that, in contrast to the females, 010317B, the oldest, has the highest fluctuation frequency and the largest fluctuation range, while 140317A, the youngest, has the lowest fluctuation frequency and the smallest fluctuation range. Thus, we preliminarily conclude that SpO2 is related to age, and possibly also to gender.
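The comparisons above are made by eye. One way to put numbers on them, offered only as a hypothetical sketch, is to take the fluctuation range as maximum minus minimum and the fluctuation frequency as the number of direction changes per minute. Both definitions, the function name and the sampling-rate parameter `fs` (assumed to be 1 Hz by default) are our assumptions, not the paper's method.

```python
import numpy as np


def fluctuation_summary(spo2, fs=1.0):
    """Hypothetical (range, direction changes per minute) of an SpO2 series.

    `fs` is the assumed sampling rate in Hz.
    """
    x = np.asarray(spo2, dtype=float)
    value_range = x.max() - x.min()
    # SpO2 is often recorded in whole percent, so drop flat steps before
    # looking for changes of direction (rise -> fall or fall -> rise).
    d = np.diff(x)
    d = d[d != 0]
    turns = np.count_nonzero(np.sign(d[1:]) != np.sign(d[:-1]))
    duration_min = len(x) / fs / 60.0
    return value_range, turns / duration_min


value_range, rate = fluctuation_summary([96, 97, 96, 97, 96, 96, 98])
```

In the short example the series changes direction four times in seven seconds, so the rate is about 34 turns per minute; for the hour-long recordings, such numbers would allow the visual ranking of the three participants in each group to be checked.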

Stepwise regression results
We regress the average SpO2 on age and apply a t-test to the coefficient to obtain the corresponding p-value. The p-value is less than 0.05, indicating a significant relationship between age and average SpO2. Following the same procedure, the stepwise regression of the SpO2 standard deviation on age showed no significant relationship. We therefore conclude that age is significantly related to the mean value of SpO2.
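The procedure can be sketched as forward stepwise selection in Python with NumPy and SciPy. At each step, every remaining candidate predictor (for example age, BMI, gender and the smoking indicators) is added in turn to an ordinary least-squares model with an intercept, and the candidate with the smallest t-test p-value is kept if that p-value is below 0.05. This is an assumed reading of the paper's procedure: the removal step of full bidirectional stepwise regression is omitted, and the function names are our own.

```python
import numpy as np
from scipy import stats


def ols_pvalues(X, y):
    """OLS with intercept; return coefficients and two-sided t-test p-values."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = len(y) - X1.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)      # covariance of the estimates
    t_stat = beta / np.sqrt(np.diag(cov))
    p = 2 * stats.t.sf(np.abs(t_stat), dof)
    return beta, p


def forward_stepwise(X, y, names, alpha_in=0.05):
    """Add predictors one at a time while the best new p-value is < alpha_in."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        best_p, best_j = 1.0, None
        for j in remaining:
            _, p = ols_pvalues(X[:, selected + [j]], y)
            if p[-1] < best_p:                   # p-value of the candidate column
                best_p, best_j = p[-1], j
        if best_j is None or best_p >= alpha_in:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return [names[j] for j in selected]
```

For a single predictor, the slope and p-value from `ols_pvalues` match those reported by `scipy.stats.linregress`, which corresponds to the age-only regression described above.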

Conclusion
In conclusion, the mean, standard deviation and Hurst exponent are used to describe SpO2. For 301116B, the mean SpO2 is at a healthy level, his SpO2 variability is low, and his long-term variability is higher than his short-term variability. In addition, the former and latter trends of his SpO2 are positively correlated.
Visual analysis is used to make a preliminary judgment, which is then tested quantitatively by a stepwise regression analysis on age, BMI, gender and smoking. We find that the mean of SpO2 is related to age.