Quantitative blood glucose detection influenced by various factors based on the fusion of photoacoustic temporal spectroscopy with deep convolutional neural networks

To monitor blood glucose concentration (BGC) efficiently and accurately under the combined influence of various factors, quantitative in vitro blood glucose detection was studied using photoacoustic temporal spectroscopy (PTS) combined with a fusion deep neural network (fDNN). A photoacoustic detection system subject to five influencing factors was set up, and 625 time-resolved photoacoustic signals of rabbit blood were collected under different factor settings. In view of the sequential nature of the temporal signals, a one-dimensional convolutional neural network (1DCNN) was established to extract BGC-related features. After parameter optimization and adjustment, the mean square error (MSE) of BGC was 0.51001 mmol/L for the 125 testing samples. Then, because of the long-term dependence in temporal signals, a long short-term memory (LSTM) module was connected to enhance the prediction accuracy of BGC. With the optimal number of LSTM layers, the MSE of BGC decreased to 0.32104 mmol/L. To further improve prediction accuracy, a self-attention mechanism (SAM) module was coupled in to form an fDNN model, i.e., 1DCNN-SAM-LSTM. The fDNN model not only combines the temporal expansion of 1DCNN with the long-term memory of LSTM, but also focuses learning on the more important features of BGC. Comparison results show that the fDNN model outperforms the other six models. The determination coefficient of BGC for the testing set was 0.990, and the MSE reached 0.1432 mmol/L. The results demonstrate that PTS combined with 1DCNN-SAM-LSTM ensures higher BGC accuracy under the combined influence of various factors and greatly enhances detection efficiency.


Introduction
With rapid economic development, unhealthy dietary habits, disrupted lifestyles and high psychological stress have increased the prevalence and number of diabetes patients worldwide. Diabetes is an endocrine disorder and has become the third largest threat to human health. It is reported that over 500 million people globally suffer from diabetes and its complications. Even worse, the number of diabetes patients is expected to keep rising [1]. In recognition of this grave situation, the treatment of diabetes is receiving increasing attention. As there are currently no specific drugs or medical methods to cure diabetes, controlling blood glucose concentration (BGC) via medication is the primary approach; therefore, rapid and accurate monitoring of BGC is crucial. Traditional methods involve invasive procedures such as finger-stick or venous blood sampling, which cause tissue trauma and impose a significant physical and psychological burden, along with a risk of secondary infection. To avoid these drawbacks, the development of high-precision non-invasive monitoring methods has become a goal of blood glucose monitoring. Among the non-invasive methods, photoacoustic spectroscopy (PAS) has attracted growing attention because it combines the advantages of pure optical and pure ultrasonic methods. Because it detects photo-induced ultrasound rather than photons, it can overcome the interference of scattered light in tissues and improve monitoring accuracy [2].
To date, a number of researchers have studied photoacoustic detection of blood glucose. For example, Pai [3] constructed an embedded photoacoustic measurement instrument to measure blood glucose at 905 nm, which provided good penetration depth, measurement portability and high-speed acquisition. Aloraynan [4] developed a PAS blood glucose monitoring system using a single-wavelength quantum cascade laser at 1080 cm⁻¹. Then, a PAS blood glucose detection method based on dual quantum cascade lasers was proposed [5], which eliminated interference from moisture and other substances and thereby improved sensitivity and signal-to-noise ratio. Tanaka [6] employed differential continuous-wave PAS to study non-invasive blood glucose detection both in vivo and in vitro. The measurements were compared with those of an invasive blood glucose sensor for healthy subjects, and consistent BGC trends were obtained. Maeno [7] used mid-infrared light-induced PAS to analyze blood components. In that work, a mini photoacoustic cell was designed, and gelatin phantoms with varying glucose concentrations were used to verify the approximately linear relationship between photoacoustic spectra and absorption spectra. Cano [8] employed PAS to study the relationship between glycemia and blood photoacoustic peak-to-peak spectra over 10 weeks, and a phase-resolved method was used to obtain the evolution of the optical absorption related to hyperglycemia at 450 nm. Zhang [9] proposed a novel "guide star" assisted photoacoustic method to measure glucose noninvasively; the sensitivity and accuracy of glucose concentration measurement were improved by optimizing the optical path and combining optical absorption with ultrasonic velocity. Long [10] proposed a blood glucose photoacoustic detection algorithm based on Teager-Kaiser main energy to overcome noise and medium interference, and achieved high detection accuracy for simulated human skin and blood. Shaikh [11] developed a compact and accurate glucose concentration measurement system using photoacoustic near-infrared spectroscopy, and obtained a good correlation between glucose concentration and the electrical signal. Chen [12] proposed a bioprobe evaluation method to obtain a high fluorescence quantum yield based on PAS measurements combined with fluorescence. Ren [13] studied photoacoustic glucose detection using PAS induced by an optical parametric oscillator (OPO) pulsed laser, determined several characteristic wavelengths of glucose, and attained a root-mean-square error (RMSE) of glucose concentration of 10 mg/dl.
However, photoacoustic blood glucose detection faces challenges in reliability and consistency because of its sensitivity to environmental temperature and complicated measurement conditions, especially for in vivo blood glucose monitoring. Several researchers have investigated photoacoustic detection of blood glucose under particular influencing factors. Sim [14] studied photoacoustic detection of blood glucose by obtaining microscopic spatial information of the skin and achieved reliable prediction of BGC. Their results also showed that in vivo spectroscopic glucose monitoring is sensitive to the probing area and to skin conditions with significant inhomogeneities. Yang [15] used a 1535 nm pulsed laser to excite photoacoustic signals in glucose solution and explored the influence of glucose concentration on the photoacoustic signal. External factors affecting the photoacoustic signal, such as laser energy, liquid temperature and detection distance, were also explored, and a calibration method was proposed. Ren [16,17] investigated single-factor and multiple-factor influences on blood glucose photoacoustic detection for glucose solution and animal whole blood. Positive relationships were obtained between the photoacoustic intensity and the laser energy, temperature and glucose concentration, and negative relationships between the photoacoustic intensity and the flow velocity and detection distance. Tao [18] studied the relationship between photoacoustic amplitude and temperature under different glucose concentrations and light intensities. The temperature characteristics of the photoacoustic signals were also investigated using a model of the temperature-dependent photoacoustic intensity for glucose solution.
In general, the influence mechanisms and relationships between the photoacoustic intensity of blood glucose and the influencing factors are nonlinear and extremely complex. It is difficult for a simple linear model to predict BGC accurately in complicated environments and situations, especially for in vivo photoacoustic monitoring. For example, Long [10] employed a first-order linear regression model to describe the relationship between glucose concentration and time-frequency features, but performance degraded because the model did not fit the nonlinear relationships in actual data well. With the rapid development of artificial intelligence (AI) technologies, sophisticated problems such as natural language processing, machine vision, pattern recognition and algorithm optimization can be solved well using AI models. Besides traditional multivariate data analysis models, AI models including machine learning (ML), artificial neural networks (ANN) and deep neural networks have been used to predict BGC. Aloraynan [4] used an ensemble learning method, i.e., random subspace sampling combined with multiple individual classifiers such as K nearest neighbors (KNN), decision tree (DT) and support vector machine (SVM), to predict glucose concentration categories from raw photoacoustic signals. The model achieved a prediction accuracy of 90.4%. Then, an integrated machine learning classifier model was developed to predict glucose levels in the non-invasive photoacoustic blood glucose detection of skin samples based on dual quantum cascade lasers [5]. The model achieved a prediction accuracy of 96.7%. Zhang [8] employed multivariate data analysis methods such as principal component regression (PCR) and partial least squares regression (PLSR) to extract the features most correlated with glucose concentration and establish prediction models of BGC. Han [19] performed noninvasive blood glucose detection based on near-infrared spectroscopy combined with linear PLSR and a nonlinear stacked auto-encoder deep neural network. Ren [20] proposed PAS combined with an improved quantum particle swarm optimized wavelet neural network (QPSO-WNN) for quantitative in vitro detection of diabetes. Under the optimal parameters of the improved QPSO-WNN algorithm, the MSE of BGC reached 0.3088 mmol/L. In addition, a QPSO-optimized WNN algorithm was proposed to ensure high-accuracy classification of diabetes [21]. For continuous blood glucose prediction, Mehrad [22] presented a DNN model, i.e., a convolutional neural network combined with long short-term memory (CNN-LSTM), to predict blood glucose levels for type 1 diabetes patients, and assessed the model's accuracy and clinical acceptability over different time horizons. AI has also been applied to the quantitative and qualitative analysis of blood glucose based on other detection technologies. For example, Li [23] employed a fusion algorithm of density clustering and CNN to classify three blood glucose ranges based on ECG non-invasive monitoring technology. A gradient-weighted class activation mapping method was used to visualize the ECG signals. Pal [24] used ML and a deep neural network (DNN) to classify blood plasma glucose levels based on a non-contact speckle method. Lee [25] proposed a high-frequency ultrasound approach to diagnose diabetes, and a classification accuracy of 98% was achieved using a CNN model.
As is well known, the human body is a complex and dynamic biological system with individual differences. Photoacoustic detection of blood glucose is unavoidably influenced by other tissues or factors. Moreover, the influence is usually intricate when various factors exist simultaneously in practice. However, few previous studies considered the impacts of factors such as system parameters, skin humidity, oils and temperature. Likewise, studies applying DNNs to the quantitative and qualitative photoacoustic measurement of BGC are scarce, especially under the combined influence of various factors. In this work, photoacoustic detection of blood glucose under the combined influence of various factors was studied. A photoacoustic detection system for blood glucose was built, and five factors, i.e., laser energy, concentration, temperature, flow rate, and detection distance, were specifically considered. The time-resolved photoacoustic signals of animal whole-blood samples were obtained with varying influencing factors. In previous research [26][27][28][29], the photoacoustic peak-to-peak spectra in a certain waveband were used as the model input, and they had to be obtained at each wavelength. However, relying solely on peak-to-peak values makes it difficult to reflect the glucose content of the measured blood samples accurately and limits detection efficiency. When the response waveband of the samples is wide, collecting the photoacoustic peak-to-peak spectra takes a great deal of time, which results in low detection efficiency. To improve the detection efficiency of BGC, the photoacoustic temporal spectroscopy (PTS) method was employed, since time-resolved photoacoustic signals are sequence data that typically exhibit significant dependencies along the time dimension. Moreover, to ensure high accuracy in predicting BGC, we established a supervised fDNN model, i.e., the 1DCNN-SAM-LSTM model, which was trained on the photoacoustic temporal signals of 500 blood samples and used to predict the BGC of 125 independent testing samples. In this work, the temporal photoacoustic signals of blood samples were used directly as the input data of the fDNN model. In the fDNN model, multiple convolutional layers of the 1DCNN module were employed to extract BGC-related features from the temporal photoacoustic signals. Then, to improve the quantitative prediction performance of BGC, the long short-term memory (LSTM) module was combined with the 1DCNN, given the sequential nature and long-term dependencies of the photoacoustic temporal signals. To further enhance the prediction accuracy of BGC, a self-attention mechanism (SAM) module was coupled in to extract more finely the feature information reflecting variations in BGC. The fDNN can therefore make full use of the advantages of 1DCNN, which is suitable for handling data with spatial features, and LSTM, which is well suited to processing data with extended time features and sequential characteristics. To verify the BGC prediction performance of the fDNN model under the combined influence of various factors, the determination coefficient (R²) and mean square error (MSE) were used as the two evaluation indexes, and the results were compared with those of other models.
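The two evaluation indexes named above can be computed directly from reference and predicted concentrations. The following is a minimal sketch using their standard definitions; the function names and the five reference/predicted values are hypothetical and chosen only for illustration, not taken from the experiments.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between reference and predicted values."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted BGC values (mmol/L), for illustration only.
reference = [2.0, 5.0, 8.0, 11.0, 14.0]
predicted = [2.3, 4.8, 8.1, 10.6, 14.2]

mse = mean_squared_error(reference, predicted)
r2 = r_squared(reference, predicted)
print(f"MSE = {mse:.4f}, R^2 = {r2:.4f}")
```

An R² close to 1 together with a small MSE indicates that predictions track the reference concentrations closely across the whole range, which is why the two indexes are reported together.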
This work has three aims. The first is to propose a high-efficiency photoacoustic detection approach based on PTS technology to monitor BGC under the combined influence of various factors. The second is to establish a 1DCNN model that predicts BGC with high accuracy from the temporal photoacoustic signals of blood glucose. The third is to improve BGC prediction accuracy by constructing an fDNN model and optimizing its parameters.

Photoacoustic theory
The fundamental photoacoustic mechanism involves photo-induced ultrasound generated by thermal expansion. When a tissue is irradiated by modulated short-duration (nanosecond-level) laser pulses, a portion of the laser energy is absorbed by the tissue and deposited in the irradiated area, causing a rapid temperature fluctuation. This results in rapid volume expansion and contraction of the tissue, which ultimately forms mechanical waves known as photoacoustic signals. A sensitive ultrasound transducer (UT) can be used to capture the photoacoustic signals on the tissue surface. The amplitude of photoacoustic signals can be expressed by an empirical formula [30]: where p is the photoacoustic intensity; k is a proportionality coefficient; E is the output energy of the laser; µ_α is the optical absorption coefficient; β is the volume expansion coefficient; C_p is the specific heat capacity; v is the acoustic speed in tissue; R is the radius of the photoacoustic source; and r is the distance between the UT and the photoacoustic source. In Eq. (1), n, m, and l are constants; generally, 0.5 < n < 2, 0 < m < 1, and 0 < l < 2.
At a specific wavelength, the amplitude of photoacoustic signals can be regarded as a feature reflecting certain information. Although amplitude-based measurement methods sometimes perform well under favorable signal-to-noise ratio (SNR) conditions, this approach is susceptible to external interference such as acoustic noise from the medium, thermal noise from the transducer and electrical noise from the circuit, which easily results in data deviation. By contrast, the photoacoustic temporal signals not only reduce the susceptibility to external interference, but also contain information dependent on chemical composition, such as viscosity [31,32], thermal properties [33,34], sound propagation speed [35] and acoustic impedance [36]. Using the photoacoustic temporal signals as the input for blood glucose detection can effectively leverage this information. More importantly, compared with the photoacoustic amplitude, the temporal photoacoustic signals present not only amplitude information but also additional information, such as the time shift of characteristic peaks and changes in waveform morphology, all of which are related to differences in composition. In particular, for photoacoustic detection of blood glucose under the combined influence of various factors, the amplitude, characteristic peak time and waveform morphology all change when one or more influencing factors change, and the photoacoustic amplitude alone cannot reflect these changes. Therefore, the photoacoustic temporal signals were employed as the research data in this work.
In theory, the photoacoustic temporal signals can be represented as a damped sinusoidal signal whose shape depends on the target density, viscosity, and sound speed. The specific expression [37] of the temporal photoacoustic signals is given as follows: where k is determined by the total initial condition and is related to absorption (µ_α = εc) and the Grüneisen parameter Γ [38]; v is the acoustic speed; and a, η, ξ and ρ represent the propagation phase constant, shear viscosity, bulk viscosity and density, respectively. From Eq. (2), it can be seen that the temporal photoacoustic signal resembles a sinusoidal oscillation with an exponentially decaying amplitude. Moreover, the period or frequency of the sinusoidal oscillation and the exponential decay of the amplitude both depend on the inherent attributes of the tissue, such as the propagation phase constant (a), shear viscosity (η), bulk viscosity (ξ) and density (ρ). In general, variations in the components and concentrations in a tissue change these attributes, and finally cause a series of changes in the photoacoustic amplitude [26], wave shape, oscillation period, and characteristic peak time position of the temporal photoacoustic signals. In addition, because blood is a non-Newtonian fluid, variations in flow rate change the shear viscosity (η), bulk viscosity (ξ) and Grüneisen parameter (Γ) of blood [31], which affects the photoacoustic amplitude and temporal signals according to Eq. (2). Therefore, the photoacoustic signal can be influenced by factors including laser energy, temperature, concentration, flow rate and detection distance. By analyzing the temporal photoacoustic signals with relevant algorithms, it is possible to extract internal component information more accurately.
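The qualitative behavior described above, a sinusoidal oscillation whose amplitude decays exponentially, can be illustrated with a generic damped sinusoid. The sketch below is not Eq. (2) itself: it assumes the simplified form A·exp(−αt)·sin(ωt), and the sampling interval, oscillation frequency and decay rates are hypothetical values chosen only to make the effect visible.

```python
import math


def damped_sinusoid(t, amplitude, decay_rate, angular_freq):
    """Generic damped oscillation: A * exp(-alpha * t) * sin(omega * t)."""
    return amplitude * math.exp(-decay_rate * t) * math.sin(angular_freq * t)


# Hypothetical parameters chosen only for illustration (not fitted to blood).
dt = 1e-8                      # sampling interval in seconds (assumed)
n_samples = 1000               # signal length of 1000 points
omega = 2 * math.pi * 2.0e6    # 2 MHz oscillation (assumed)

low_damping = [damped_sinusoid(i * dt, 1.0, 2.0e5, omega) for i in range(n_samples)]
high_damping = [damped_sinusoid(i * dt, 1.0, 8.0e5, omega) for i in range(n_samples)]

# A larger decay rate shrinks the later oscillations much more strongly,
# so the tail of the waveform carries information beyond the first peak.
print(max(low_damping[500:]), max(high_damping[500:]))
```

In this simplified picture, a change in a medium property that alters the decay rate or the oscillation frequency reshapes the whole waveform, which is the kind of information a single amplitude value discards.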

1DCNN
To date, the CNN model and its variants have been used in many fields such as computer vision, natural language processing, pattern recognition, imaging and speech processing [39][40][41][42]. The main characteristic of CNN is the automatic extraction of features from input data through convolution layers and pooling layers, followed by fully connected layers to achieve classification or regression. In general, CNN is primarily used for feature recognition in two-dimensional images. Like the basic CNN structure, 1DCNN [43] involves a series of convolution layers and pooling layers and ultimately produces results through fully connected layers. However, unlike CNN, 1DCNN focuses on processing one-dimensional data, for example, one-dimensional spectra and time series data. Owing to the structural similarities, 1DCNN also offers rapid feature extraction and translation invariance of data features. Since the temporal photoacoustic signals are time series data, 1DCNN was employed to establish the quantitative model for predicting BGC under the combined impact of various factors.
In 1DCNN, since the convolution kernel is one-dimensional and larger kernels do not introduce excessive parameters and computations, a model with wider convolution kernels was employed to obtain a larger receptive field and extract sequence features comprehensively. The structure of the 1DCNN model is illustrated in Fig. 1. In this work, the collected temporal photoacoustic signals were input directly into the 1DCNN model. Multiple convolution layers and pooling layers in 1DCNN were used to extract features from each subset, with the convolution kernel dimensions matching the input vector dimensions. This design maintained consistent computational complexity even with larger convolution kernels. Here, the construction of the 1DCNN model was based on the AlexNet model [44], with adjustments made to the number of convolution-pooling layers, as well as the number and size of convolution kernels.
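The convolution-pooling stage described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the architecture of Fig. 1: the kernel width of 16, the moving-average weights, the pooling size of 2 and the ramp input are all hypothetical choices. It also shows why wide one-dimensional kernels stay cheap: a kernel of width k has only k weights per channel pair, whereas a k × k two-dimensional kernel has k².

```python
def conv1d(signal, kernel, stride=1):
    """Valid 1D cross-correlation, the operation used by CNN convolution layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Toy input: a 1000-point ramp standing in for one temporal signal.
signal = [i / 1000.0 for i in range(1000)]
kernel = [1.0 / 16] * 16          # hypothetical wide kernel (moving average)

features = max_pool1d(relu(conv1d(signal, kernel)), size=2)
print(len(features))
```

In a trained network the kernel weights are learned rather than fixed, and several such stages are stacked before the fully connected layers.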

LSTM
In general, 1DCNN models are limited in learning relationships across time series data. Recurrent neural networks (RNN) [45] have a cyclic structure that allows information to pass internally, making them suitable for handling sequential data such as time series and natural language text. However, RNNs have drawbacks, including difficulty in capturing long-term dependencies, vanishing and exploding gradients, and precision loss caused by excessively long back-propagation chains. To solve these problems, LSTM was developed and has been widely applied in many fields [46][47][48]. Therefore, to improve the quantitative prediction performance of BGC, the LSTM module was combined with the 1DCNN in this work.
Unlike RNN, LSTM contains a structure called the memory cell, which is specifically designed for storing and transmitting information. The design of this memory cell enables effective information transmission over long sequences, thereby helping to capture complex dependencies in the data. The core structure of the LSTM cell includes three gates: the input gate, forget gate and output gate, as shown in Fig. 2. These gates play a crucial role in information transmission by selectively retaining or forgetting information, which facilitates effective modeling of long sequences.
The calculations for each gate in the LSTM structure are described below. The activation value $i_t$ of the input gate is calculated using Eq. (3):
$$i_t = \sigma\left(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}\right) \tag{3}$$
where $x_t$ is the input at time step $t$ in the input sequence, and $h_{t-1}$ is the hidden state from the previous time step. $W_{ii}$ and $W_{hi}$ are the weight matrices of the input gate, and $b_{ii}$ and $b_{hi}$ are its bias terms. The function $\sigma$ represents the Sigmoid activation function, which outputs values between 0 and 1. This allows the gate to learn the degree to which it should retain or forget the input information.
The activation value $f_t$ of the forget gate is calculated using Eq. (4):
$$f_t = \sigma\left(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}\right) \tag{4}$$
where $W_{if}$, $W_{hf}$, $b_{if}$ and $b_{hf}$ have meanings similar to those of the corresponding symbols in the input gate formula, Eq. (3). The new candidate value $\tilde{C}_t$ of the memory cell is calculated as follows:
$$\tilde{C}_t = \tanh\left(W_{ic} x_t + b_{ic} + W_{hc} h_{t-1} + b_{hc}\right) \tag{5}$$
where tanh() is the hyperbolic tangent activation function, and the weights and biases are defined analogously to those in Eq. (3).
The activation value $O_t$ of the output gate is calculated using Eq. (6):
$$O_t = \sigma\left(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}\right) \tag{6}$$
The memory cell $C_t$ and the hidden state $h_t$ are then updated as follows:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad h_t = O_t \odot \tanh\left(C_t\right) \tag{7}$$
where $\odot$ denotes element-wise multiplication. These equations describe the basic computation process of LSTM, in which the input gate, forget gate, and output gate control the acceptance of new input, the forgetting of previous memory and the decision on the output, respectively. The gates operate on linear combinations of weights and inputs, processed through activation functions such as the Sigmoid or hyperbolic tangent. This design enables LSTM to handle long sequences effectively and to maintain the flow of gradients during training. However, when input sequences are excessively long, LSTM may still miss crucial information. Therefore, by relying on the feature extraction capabilities of CNN, unimportant information can be filtered out and the overall sequence dimension reduced, which improves the accuracy of the model predictions.
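The gate computations described above can be sketched as a single LSTM time step in plain Python. This is a minimal illustration of the standard LSTM cell, not the trained model from this work: the input size of 3, hidden size of 4, random weights and five-step sequence are hypothetical, and each gate uses one combined bias vector in place of the two separate bias terms.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pre_activation(gate_params, x, h_prev):
    """Compute W_x @ x + W_h @ h_prev + b for list-based matrices and vectors."""
    W_x, W_h, b = gate_params
    return [
        sum(w * v for w, v in zip(W_x[r], x))
        + sum(w * v for w, v in zip(W_h[r], h_prev))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step: gates, candidate value, then cell and hidden updates."""
    i = [sigmoid(z) for z in pre_activation(params["i"], x, h_prev)]    # input gate
    f = [sigmoid(z) for z in pre_activation(params["f"], x, h_prev)]    # forget gate
    g = [math.tanh(z) for z in pre_activation(params["c"], x, h_prev)]  # candidate value
    o = [sigmoid(z) for z in pre_activation(params["o"], x, h_prev)]    # output gate
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


def init_params(input_size, hidden_size, rng):
    """Random weights and zero biases; b stands for the sum of the two bias terms."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        name: (matrix(hidden_size, input_size),
               matrix(hidden_size, hidden_size),
               [0.0] * hidden_size)
        for name in ("i", "f", "c", "o")
    }


rng = random.Random(0)
params = init_params(input_size=3, hidden_size=4, rng=rng)
h, c = [0.0] * 4, [0.0] * 4
sequence = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)
print(h)
```

Because the cell state is updated additively through the forget and input gates, information from early time steps can persist across the sequence, which is the property that motivates pairing LSTM with the convolutional feature extractor.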

Experimental system
An in vitro photoacoustic detection system of blood glucose influenced by various factors is illustrated in Fig. 3.
To simulate blood flow in the human body, a custom blood circulation system was built, consisting of a pump, a beaker and silicone microtubes. In the experiments, the circulation system facilitates the substitution of blood samples with varying glucose concentrations. The blood can be heated using a heater, and the temperature is regulated by a temperature controller. The flow velocity of the blood can be adjusted by controlling the rotational speed of the pump. The detection distance between the UT and the blood sample can be adjusted by regulating the step distance of a stepper motor in the vertical direction. The captured time-resolved photoacoustic signals were amplified by an amplifier (ATA-5620, Aigtek Co., China) with a gain of 60 dB, followed by noise filtering via a low-pass filter (BLP-7-75+, Mini-Circuits Co., USA) with a cut-off frequency of 7 MHz. A digital oscilloscope (54642D, Agilent Co., USA) with a bandwidth of 500 MHz and an acquisition rate of 5 GSa/s was employed to collect and display the digital signals. Finally, the digital data were transmitted to the computer via a GPIB I/O card (GPIB-USB-HS, NI Co., USA). In the experiments, 625 rabbit whole-blood samples (Yikang Biotech. Ltd., China) were used. The experimental temperature was controlled at 22 ± 0.5 °C.

Experimental results
To study the influence of various factors, namely temperature, concentration, laser energy, blood flow rate, and detection distance, on blood glucose photoacoustic detection, five levels were set for each factor as follows:
(1) Temperature (T): To cover the possible range of human body temperature, the temperature was set from 34 to 42 °C at 2 °C intervals.
(2) Concentration (c): Rabbit whole blood with different BGC levels (Yikang Biotech. Co., China) was employed as the experimental samples. To represent the severity and type of diabetes in patients, the concentration was prepared from 2 to 14 mmol/L at 3 mmol/L intervals. According to clinical standards, 2 mmol/L was classified as hypoglycemia, 5 mmol/L as healthy, 8 mmol/L as mild diabetes, 11 mmol/L as moderate diabetes, and 14 mmol/L as severe diabetes.
(3) Laser energy (E): To ensure that photoacoustic signals could be generated without saturation distortion, and to make the impact of laser energy easy to observe, the laser output energy was set to 0.15 mJ, 0.26 mJ, 0.37 mJ, 0.49 mJ, and 0.62 mJ.
(4) Flow velocity (v): To observe the impact of flow rate, the flow velocity was set from 0.057 m/s to 0.219 m/s at intervals of about 0.04 m/s.
(5) Detection distance (D): To observe the impact of detection distance, the distance was set from 12.12 mm to 16.02 mm at intervals of approximately 1 mm.
In the experiments, to acquire accurate time-resolved photoacoustic signals, the parameters of the experimental system and the experimental conditions should be kept stable, for example, the excitation energy of the pulsed laser, the ambient temperature, and the location of the focused laser spot. In addition, the time-resolved photoacoustic signals of the blood samples were averaged 128 times, and the experiment for each blood sample was performed three times. The averaged time-resolved photoacoustic signals of blood, with a data length of 1000, under the single and synthetical influences of different factors are shown in Fig. 4(a)-(g), respectively.
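The effect of averaging 128 acquisitions can be illustrated with a minimal sketch. It assumes additive, zero-mean, independent noise on each acquisition, in which case the residual noise falls roughly as the square root of the number of averages. The damped-sine waveform, the noise level and the function names are hypothetical stand-ins, not the measured signals or the authors' acquisition code.

```python
import math
import random
import statistics


def average_acquisitions(acquisitions):
    """Point-wise mean of equally long acquisitions (a list of lists)."""
    n = len(acquisitions)
    return [sum(samples) / n for samples in zip(*acquisitions)]


def simulate(n_acq=128, length=1000, noise_std=0.5, seed=0):
    """Compare the residual noise of one acquisition with that of the average."""
    rng = random.Random(seed)
    # Hypothetical clean waveform: a damped sine standing in for a photoacoustic pulse.
    clean = [math.exp(-t / 200.0) * math.sin(2 * math.pi * t / 50.0)
             for t in range(length)]
    # Each acquisition is the clean waveform plus independent Gaussian noise.
    acqs = [[c + rng.gauss(0.0, noise_std) for c in clean] for _ in range(n_acq)]
    avg = average_acquisitions(acqs)
    single_err = statistics.pstdev([a - c for a, c in zip(acqs[0], clean)])
    avg_err = statistics.pstdev([a - c for a, c in zip(avg, clean)])
    return single_err, avg_err


single_err, avg_err = simulate()
print(f"residual noise: single={single_err:.3f}, averaged={avg_err:.3f}")
```

Under these assumptions the averaged trace should carry roughly 1/sqrt(128), i.e. about one eleventh, of the noise of a single acquisition.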
From the single-factor results shown in Fig. 4(a)-(e), it can be seen that the time-resolved photoacoustic signals of blood vary with each influence factor. The general trend is that the photoacoustic amplitude increases with the laser energy, temperature and concentration, and decreases with the flow velocity and detection distance. However, in Fig. 4(a)-(e), besides the amplitude, the characteristic peak time and the waveform morphology of the photoacoustic signals also reflect the impacts of the influence factors. That is, if only the amplitude is used to predict the BGC under the synthetical impacts of various factors, much useful information will be neglected, which is not conducive to ensuring the prediction accuracy of BGC. At the same time, from Fig. 4(f) and (g), it can be seen that under the synthetical influences of various factors, the effects of multiple factors on the photoacoustic signals are complicated: the amplitude change is not always caused by the concentration, and the influences of some factors on the photoacoustic signals are even stronger than that of the concentration. For example, when the laser energy is increased, the photoacoustic amplitude increases even though the concentration is decreased. When the detection distance is increased, the photoacoustic amplitude decreases even though the concentration is increased. Therefore, in order to predict BGC more comprehensively and accurately under the synthetical influences of various factors, the temporal photoacoustic signals were employed as the input data of the quantitative prediction models in this work.

Results of the 1DCNN model
In this study, the collected 625 rabbit whole blood samples were divided into a training set and a testing set in a 4:1 ratio, that is, 500 samples were randomly selected as the training set, and the remaining 125 samples were used as the testing set. Adjusting the structure and parameters of the 1DCNN is a crucial task in practical applications of deep learning because it directly impacts the predictive performance and generalization ability of the model. However, it is a very complex task, and the required time is proportional to the complexity of the model. Currently, few detailed or efficient methods have been proposed for parameter tuning, so network structures and parameters are often refined through experiments and experience.
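A random 4:1 split of this kind can be sketched as follows. The function name, the use of a fixed seed and the integer-ratio arguments are illustrative assumptions; the text does not state how the random selection was implemented.

```python
import random


def split_indices(n_samples, train_parts=4, test_parts=1, seed=42):
    """Randomly split sample indices into training and testing sets by ratio."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed only for reproducibility
    n_train = n_samples * train_parts // (train_parts + test_parts)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(625)
print(len(train_idx), len(test_idx))
```

With 625 samples this yields 500 training and 125 testing indices with no overlap between the two sets.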
To obtain the optimal network depth of the 1DCNN, eight 1DCNN models were designed with different numbers of convolution and pooling layers. Based on the input temporal photoacoustic signals, the mean square error (MSE) values of BGC were computed for these eight models, which are shown in Fig. 5(a). The MSE value can be computed according to Eq. (9), i.e.,
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \tag{9}$$
where $y_i$ and $\hat{y}_i$ are the original data and the predicted data, respectively, and $N$ is the number of samples. From Fig. 5(a), it can be observed that as the number of convolution-pooling layers increases from 1 (i.e., Model-1) to 4 (i.e., Model-4), the feature extraction capability of the 1DCNN model gradually strengthens, and the MSE on the testing set gradually decreases. Although the MSE of the testing set for the 1DCNN with 5 convolution-pooling layers (i.e., Model-5) is higher than that of the 1DCNN with 4 convolution-pooling layers (i.e., Model-4), the MSE of the testing set for the 1DCNN with 6 convolution-pooling layers (i.e., Model-6) decreases further. Moreover, after Model-6, the MSE change becomes relatively flat. To limit the complexity of the network structure and the training time of the 1DCNN model, Model-6 was selected as the structure of the 1DCNN model. The structure and parameters of the 1DCNN model are given in Table 1.
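As a small worked instance of the MSE definition, the sketch below evaluates it on five made-up reference and predicted concentrations. The numbers are illustrative only and are not taken from the experiments.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical reference and predicted concentrations in mmol/L.
reference = [2.0, 5.0, 8.0, 11.0, 14.0]
predicted = [2.5, 5.0, 7.0, 11.0, 14.5]
print(mean_squared_error(reference, predicted))
```

Here the squared errors are 0.25, 0, 1, 0 and 0.25, so the mean is 1.5 / 5 = 0.3.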
Based on the 1DCNN model with 6 convolution-pooling layers, the impact of different learning rates on the MSE of BGC on the testing set was compared. At the same time, a learning rate decay strategy was adopted: the learning rate was initially set to 0.001 and was then multiplied by 0.5 every 100 iterations. To determine the optimal number of iterations in the training process, the effect of the number of iterations on the MSE value of BGC based on the 1DCNN was investigated. From Fig. 5(b), it can be seen that the training time of the 1DCNN increases linearly with the number of iterations. When the number of iterations is 500, the MSE value of BGC for the testing set samples is the lowest. Therefore, the maximum number of iterations was set to 500. Under these parameters, the MSE of BGC on the training set for the 1DCNN model is 0.038001 mmol/L, and the MSE of BGC on the testing set is 0.51001 mmol/L. The Clarke error grid analysis graph of BGC for the testing set samples based on the 1DCNN model is presented in Fig. 6.
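The step decay described above (initial rate 0.001, halved every 100 iterations) can be written as a one-line schedule. The sketch assumes zero-based iteration counting, so the first halving takes effect at iteration 100; the exact indexing used in training is not stated in the text.

```python
def learning_rate(iteration, initial_lr=0.001, decay_factor=0.5, step=100):
    """Step-decay schedule: multiply the rate by decay_factor every `step` iterations."""
    return initial_lr * decay_factor ** (iteration // step)


for it in (0, 100, 200, 300, 400, 499):
    print(it, learning_rate(it))
```

Over 500 iterations the rate is halved four times, so the final rate is 0.001 × 0.5⁴ = 6.25 × 10⁻⁵.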
From Fig. 6, it can be found that, based on the 1DCNN, the BGC values of three of the 125 testing set samples were obviously misjudged: one sample that was originally hypoglycemic was misjudged as mild diabetes, and two samples that were originally healthy were mis-predicted as mild diabetes and moderate diabetes, respectively. In addition, one sample that was originally healthy lies close to being judged as hypoglycemic.
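How a (reference, predicted) pair is assigned to a Clarke error grid zone can be sketched as below. This is a simplified rule set of the kind commonly found in open-source implementations and is an assumption here; the zone boundaries actually used to draw the figures are not given in the text. The grid is defined in mg/dL, so the sketch converts from mmol/L with a rounded factor of 18, which is also an assumption.

```python
MGDL_PER_MMOL = 18.0  # rounded glucose conversion factor (assumption)


def clarke_zone(ref_mmol, pred_mmol):
    """Return the Clarke error grid zone ('A'..'E') for one sample (simplified rules)."""
    ref = ref_mmol * MGDL_PER_MMOL
    pred = pred_mmol * MGDL_PER_MMOL
    # Zone A: both values hypoglycemic, or prediction within 20 % of the reference.
    if (ref <= 70 and pred <= 70) or (0.8 * ref <= pred <= 1.2 * ref):
        return "A"
    # Zone E: hypo- and hyperglycemia confused with each other.
    if (ref >= 180 and pred <= 70) or (ref <= 70 and pred >= 180):
        return "E"
    # Zone C: errors that would lead to unnecessary correction.
    if (70 <= ref <= 290 and pred >= ref + 110) or \
       (130 <= ref <= 180 and pred <= 7 / 5 * ref - 182):
        return "C"
    # Zone D: failure to detect hypo- or hyperglycemia.
    if (ref >= 240 and 70 <= pred <= 180) or \
       (ref <= 175 / 3 and 70 <= pred <= 180) or \
       (175 / 3 <= ref <= 70 and pred >= 6 / 5 * ref):
        return "D"
    # Zone B: everything else (benign deviations).
    return "B"


for ref, pred in [(5, 5), (2, 8), (14, 3)]:
    print(ref, pred, clarke_zone(ref, pred))
```

Under these rules, a hypoglycemic reference of 2 mmol/L predicted as 8 mmol/L is placed in Zone D, because the hypoglycemia would go undetected.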

Results of the 1DCNN-LSTM model
To decrease the MSE of BGC for the testing set, the LSTM was incorporated into the 1DCNN model with 6 convolution-pooling layers. To determine the optimal number of LSTM layers, the effects of different numbers of layers on the MSE of BGC for the training set and testing set were investigated and compared in the 1DCNN-LSTM model. The MSE comparison results are shown in Fig. 7, where the red, blue, black, and green lines represent LSTM network structures with 1, 2, 3, and 4 layers, and the dashed and solid lines represent the MSE values on the training set and testing set samples, respectively. From Fig. 7, it can be observed that adjusting the network structure and parameters has no apparent impact on the MSE of BGC. The MSE of BGC for the training set reaches its lowest value when using a 3-layer LSTM network with 40 cells. In this case, the MSE values of the training set and testing set are 0.020966 mmol/L and 0.32104 mmol/L, respectively. Further increasing the number of network layers to 4 or modifying the network parameters does not further reduce the MSE of BGC. Based on the structure and parameters of the 1DCNN-LSTM model, the Clarke error grid analysis of BGC for the testing set samples was performed, which is presented in Fig. 8.
From Fig. 8, it can be seen that two of the 125 testing set samples were mis-predicted by the 1DCNN-LSTM model. One originally hypoglycemic case was misjudged as healthy, and one originally healthy case was misjudged as moderate diabetes. At the same time, it can be noticed that another originally hypoglycemic case deviates seriously toward the normal state. Therefore, to further improve the predictive performance of BGC, the fusion of SAM with 1DCNN-LSTM was employed in this work.

Results of the 1DCNN-SAM-LSTM model
To enhance the capability of handling sequential data and further improve the accuracy of predicting BGC, the self-attention mechanism (SAM) module was combined into the 1DCNN-LSTM model. The SAM is a resource allocation mechanism that simulates the attention process in the human brain. The core idea is to consider all positions in the input sequence simultaneously, without being constrained by the length of the sequence. In other words, each input position can attend to other positions in the sequence, and the degree of attention is controlled by weights. When it is applied to temporal signals, the SAM is more adept at capturing long-term dependencies by dynamically generating different connection weights. The computation is carried out in a Query-Key-Value (QKV) pattern [49].
In this work, a multi-head attention mechanism was employed, in which multiple parallel self-attention heads process the input sequence. The input sequence is first processed through multiple independent self-attention heads. Each head produces an independent output, and these outputs are merged into the final output. Each head learns to focus on different relationships and features in the input sequence, thereby enhancing the expressive capability of the model. The SAM module was added after the last convolution-pooling layer of the 1DCNN model and before the LSTM module. The constructed 1DCNN-SAM-LSTM model structure is shown in Fig. 9.
In the SAM module of Fig. 9, the input sequence is denoted as $X = [x_1, x_2, \ldots, x_N]$ and the output sequence as $H = [h_1, h_2, \ldots, h_N]$. The main procedure of the SAM is as follows. Firstly, $X$ is linearly mapped into three different spaces to obtain the query vector $q_i$, key vector $k_i$ and value vector $v_i$ ($i = 1, 2, \ldots, N$). The linear mapping process can be expressed as follows:
$$Q = W_q X, \qquad K = W_k X, \qquad V = W_v X$$
where $W_q$, $W_k$ and $W_v$ are the parameter matrices for the linear mapping, and $Q = [q_1, q_2, \ldots, q_N]$, $K = [k_1, k_2, \ldots, k_N]$ and $V = [v_1, v_2, \ldots, v_N]$ are the matrices formed by the query vectors, key vectors, and value vectors, respectively.
For the query vector $q_i$, utilizing the attention mechanism with key-value pairs, the output vector $h_i$ can be obtained, i.e.,
$$h_i = \mathrm{att}\big((K, V), q_i\big) = \sum_{j=1}^{N} \alpha_{ij} v_j, \qquad \alpha_{ij} = \mathrm{softmax}_j\big(s(k_j, q_i)\big)$$
where $\alpha_{ij}$ is the attention weight that position $i$ assigns to position $j$, and $s(\cdot,\cdot)$ is the attention scoring function.
The number of channels for keys (K), queries (Q), and values (V) in the self-attention mechanism directs the model's focus on different aspects of the features. The number of attention heads and the number of channels, as hyper-parameters, need to be optimized and adjusted in the model. Based on the optimal parameters of the 1DCNN-LSTM model, the errors on the training and testing sets under different numbers of attention heads (NumHeads) and channels (NumChannels) were compared, as shown in Table 2.
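The query-key-value computation can be made concrete with a minimal single-head sketch in plain Python. It assumes the scaled dot product as the scoring function, which is a common choice but is not confirmed by the text, and it uses tiny hand-written matrices in place of learned weights. A multi-head variant would run several such heads with different parameter matrices and merge their outputs.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention.

    X is a list of N input vectors x_i. Returns (H, A), where H holds the
    output vectors h_i and A[i][j] is the weight position i gives to position j.
    """
    Q = [matvec(W_q, x) for x in X]  # q_i = W_q x_i
    K = [matvec(W_k, x) for x in X]  # k_i = W_k x_i
    V = [matvec(W_v, x) for x in X]  # v_i = W_v x_i
    d_k = len(K[0])
    H, A = [], []
    for q in Q:
        # Assumed scoring function: scaled dot product between key and query.
        scores = [sum(a * b for a, b in zip(k, q)) / math.sqrt(d_k) for k in K]
        alpha = softmax(scores)
        h = [sum(a * v[d] for a, v in zip(alpha, V)) for d in range(len(V[0]))]
        A.append(alpha)
        H.append(h)
    return H, A


identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical parameter matrices
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H, A = self_attention(X, identity, identity, identity)
for h, alpha in zip(H, A):
    print([round(v, 3) for v in h], [round(a, 3) for a in alpha])
```

Each row of attention weights sums to one, so every output vector is a weighted average of the value vectors.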
After 500 training iterations on the training set samples, the 125 testing set samples were input into the trained 1DCNN-SAM-LSTM model. The results demonstrate that the 1DCNN-SAM-LSTM achieved its best performance, with MSE_C of 0.034317 mmol/L on the training set and MSE_P of 0.14320 mmol/L on the testing set, when NumHeads is 32 and NumChannels of the SAM module is 128, which is better than the previous results [17,20]. The Clarke error grid of BGC for the 125 testing set samples with the synthetical influences of various factors is depicted in Fig. 10.
From Fig. 10, it can be observed that the majority of the predicted BGC values fall within Zone A, with only one predicted sample falling into Zone D. This incorrectly predicted sample was originally hypoglycemic but was predicted to be in a healthy state.
Fig. 10. Clarke error grid of BGC for 125 testing set samples.

Comparison results of predicting BGC based on different models
To present the prediction performance of BGC with the synthetical influences of various factors based on the 1DCNN-SAM-LSTM model employed in this work, several different models were compared with it, namely SVR, BPNN, RBFNN, WNN, 1DCNN and 1DCNN-LSTM.
At the same time, to evaluate the prediction performance of BGC with the synthetical influences of various factors based on different models, two evaluation indexes were utilized, i.e., the determination coefficient (R²) and the mean square error (MSE). In general, the larger the R² and the smaller the MSE, the better the predictive performance. The MSE is given in Eq. (13), and the determination coefficient (R²) can be computed based on Eq. (14):
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \tag{13}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2} \tag{14}$$
where $y_i$ and $\hat{y}_i$ are the original data and the predicted data, respectively, $\bar{y}$ is the mean value of the original data, and $N$ is the number of samples.
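The determination coefficient can be checked on a small example in the same way as the MSE. The sketch below uses made-up reference and predicted concentrations; the values are illustrative and are not experimental data.

```python
def r_squared(y_true, y_pred):
    """Determination coefficient: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted concentrations in mmol/L.
reference = [2.0, 5.0, 8.0, 11.0, 14.0]
predicted = [2.5, 5.0, 7.0, 11.0, 14.5]
print(round(r_squared(reference, predicted), 5))
```

For these values the residual sum of squares is 1.5 and the total sum of squares about the mean of 8 is 90, so R² = 1 - 1.5/90, roughly 0.983.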
The comparative results of the different models are presented in Table 3. From Table 3, it can be seen that among these seven models, the prediction performance of BGC with the synthetical influences of various factors based on the 1DCNN-SAM-LSTM model is the best, with MSE_C of 0.034317 mmol/L on the training set samples and MSE_P of 0.14320 mmol/L on the testing set samples. Compared with the traditional machine learning method of SVR, the MSE of BGC based on 1DCNN-SAM-LSTM is reduced by 98.76% for the training set samples and 96.45% for the testing set samples. For the three traditional neural networks, i.e., the BPNN, RBFNN and WNN models, it can be seen that the performance of the WNN model is slightly worse than those of the other two. Although BPNN and RBFNN have similar performance in terms of R² and MSE, both of them fall short of the desired error level for the testing set samples. However, the use of the 1DCNN model significantly reduces the MSE and improves R² compared to SVR and the three traditional neural networks. Then, the combination of LSTM with 1DCNN results in an 87.1% reduction in MSE_C for the training set samples and a 58.04% reduction in MSE_P for the testing set samples. Finally, when the SAM module is coupled into the 1DCNN-LSTM model to establish an fDNN model, i.e., the 1DCNN-SAM-LSTM model, the performance of predicting BGC is also significantly enhanced: R² is increased to 0.99805 and MSE_C is decreased to 0.034317 mmol/L for the training set samples, and R² is increased to 0.9900 and MSE_P is decreased to 0.14320 mmol/L for the testing set samples. Overall, judged by both evaluation indexes on both the training set and the testing set, the fDNN model, i.e., the 1DCNN-SAM-LSTM model, is the optimal choice.

Discussion
To verify the generalization ability of the fDNN model in the PTS detection of blood glucose, validation experiments and model testing were performed. In the validation experiments, the time-resolved photoacoustic signals of 600 cow serum samples were collected via the established photoacoustic detection system; some of the time-resolved photoacoustic signals of serum samples with different concentrations are presented in Fig. 11(a). From Fig. 11(a), it can be seen that the photoacoustic amplitude increases with the BGC of the cow serum. From Fig. 11(b), it can be seen that the predicted BGCs of all testing set samples lie in Zone A of the Clarke error grid. The MSE of BGC for the testing set samples was 0.236 mmol/L. Then, to further validate the BGC prediction accuracy of the proposed fDNN model, the determination coefficients (R²) and MSE of the training set and testing set based on several different models were computed and compared, which are presented in Table 4. From Table 4, it can be seen that the prediction performance of BGC based on the fDNN model, i.e., the 1DCNN-SAM-LSTM model, is the best compared with those of the others. The determination coefficient (R²) and MSE of the training set based on the 1DCNN-SAM-LSTM model are 0.98175 and 0.22216 mmol/L, and those of the testing set are 0.98058 and 0.2365 mmol/L. Therefore, it is demonstrated that the fDNN model has excellent generalization ability in the PTS detection of blood glucose.

Conclusions
In this study, to explore a kind of high efficiency and accuracy photoacoustic method of measuring BGC with the synthetical influences of various factors (temperature, concentration, laser energy,
In the system, a tunable Nd:YAG pumped 532 nm optical parametric oscillator pulsed laser (OPOlette™ 532 II, OPOTEK Inc., USA) was employed as the light source. The laser has a pulse duration of 7 ns, a repetition rate of 20 Hz, and a maximum energy of approximately 2 mJ, with the output energy adjustable in the range of 0 to 100%. A line-focusing ultrasonic transducer (UT) with a central response frequency of 2.5 MHz (Doppler Co., Guangzhou, China) was used to capture the time-resolved photoacoustic signals under different conditions of the influence factors.

Fig. 4. The time-resolved photoacoustic signals of blood under different factors. (a) different laser energies; (b) different detection distances; (c) different flow velocities; (d) different temperatures; (e) different concentrations; (f) combination of different laser energies and concentrations; (g) combination of different detection distances and concentrations.

Fig. 5. MSE comparison results of 1DCNN models with different convolution-pooling layers (a), and MSE values and training time of the 1DCNN model with 6 convolution-pooling layers at different iteration times (b).

a Conv1: the first convolution-pooling layer. b FC1: the first fully connected layer.

Figure 5(b) shows the MSE values of BGC with the synthetical impacts of multiple factors based on the 1DCNN model with 6 convolution-pooling layers for the training set samples and the testing set samples, as well as the training times, at different iteration times.

Fig. 6. Clarke error grid analysis graph of BGC for the testing set samples based on 1DCNN model.

Fig. 8. Clarke error grid analysis graph of BGC for the testing set samples based on 1DCNN-LSTM model.
At the same time, the actual BGCs of all serum samples were also measured via a portable blood glucose meter (GA-3, Sinocare Co., China) and test strips. All cow serum samples were randomly divided into training set samples and testing set samples according to the same ratio of 4:1, i.e., 480 cow serum samples formed the training set and 120 cow serum samples formed the testing set. Based on the proposed fDNN model, i.e., the 1DCNN-SAM-LSTM model, the BGCs of the testing set samples were predicted. The Clarke error grid of BGC for the 120 testing set samples is shown in Fig. 11(b).

Fig. 11. Part of the time-resolved photoacoustic signals (a), and Clarke error grid of BGC for the testing set samples of cow serums (b).
blood flow rate and detection distance), PTS combined with an fDNN model was proposed. Based on the established photoacoustic detection system, the time-resolved photoacoustic signals of 625 blood samples under different influence factors were obtained. Since the temporal signals carry more information about the synthetical impacts of various factors, the time-resolved photoacoustic signals of the blood samples, rather than peak-to-peak values or amplitudes, were directly utilized to quantitatively measure the BGC of the blood samples. The 1DCNN model was trained in a supervised manner on the input temporal photoacoustic signals with a data length of 1000 and the corresponding BGC of the 500 training set samples, and the BGC of the 125 testing set samples was predicted. Meanwhile, the effect of the number of structural layers of the 1DCNN on the MSE of BGC was investigated. Under reasonable initialization parameters, a 1DCNN model with 6 convolution-pooling layers and 2 FC layers was constructed. After parameter optimization and adjustment, the MSE of BGC for the testing set was 0.51001 mmol/L. To decrease the MSE, an LSTM module was combined with the 1DCNN to better represent the time-series property of the temporal photoacoustic signals of the blood samples. With 3 LSTM layers, the MSE of BGC was decreased to 0.32104 mmol/L. Then, the SAM module was coupled into the 1DCNN-LSTM model to focus on the more important features extracted by the 1DCNN, which enables the model to better reflect the changes in BGC from the temporal photoacoustic signals with the synthetical influences of various factors. Based on the fDNN model, i.e., the 1DCNN-SAM-LSTM model, R² was enhanced to 0.990 and MSE_P was reduced to 0.1432 mmol/L for the testing set samples. Comparison results of seven different models illustrate that PTS combined with the 1DCNN-SAM-LSTM model has excellent performance in the quantitative detection of blood glucose with the synthetical influences of various factors. At the same time, the validation testing results demonstrate that the fDNN model has excellent generalization ability in the detection of blood glucose based on PTS technology.

Table 3. Comparison results of different models
a MSE_C: mean square error of the training set samples. b MSE_P: mean square error of the testing set samples.