Physiological Noise Filtering in Functional Near-Infrared Spectroscopy Signals Using Wavelet Transform and Long-Short Term Memory Networks

Activated channels of functional near-infrared spectroscopy are typically identified using the desired hemodynamic response function (dHRF) generated by a trial period. However, this approach is not possible for an unknown trial period. In this paper, an innovative method not using the dHRF is proposed, which extracts fluctuating signals during the resting state using maximal overlap discrete wavelet transform, identifies low-frequency wavelets corresponding to physiological noise, trains them using long-short term memory networks, and predicts/subtracts them during the task session. The motivation for prediction is to maintain the phase information of physiological noise at the start time of a task, which is possible because the signal is extended from the resting state to the task session. This technique decomposes the resting state data into nine wavelets and uses the fifth to ninth wavelets for learning and prediction. In the eighth wavelet, the prediction error difference between the with and without dHRF from the 15-s prediction window appeared to be the largest. Considering the difficulty in removing physiological noise when the activation period is near the physiological noise, the proposed method can be an alternative solution when the conventional method is not applicable. In passive brain-computer interfaces, estimating the brain signal starting time is necessary.


Introduction
In processing functional near-infrared spectroscopy (fNIRS) signals, a task-related hemodynamic signal cannot be identified if a physiological noise period is overlapped with the designed task period. This study proposes a novel method to identify physiological noises from the resting state and remove those noises during the task period using wavelet techniques and neural networks-based prediction. FNIRS is a brain-imaging technique that uses two or more wavelengths of light in near-infrared bands to measure changes in oxidized and deoxidized hemoglobin concentration in the cerebral cortex [1]. When a person moves, thinks, or receives an external stimulus, the nerve cells in cerebral cortical layers become excited. As the cells require more energy, the oxidized hemoglobin concentration around the nerve cells increases, and the deoxygenated hemoglobin concentration decreases [1]. Based on this principle, fNIRS can measure brain activity in real time. Because fNIRS is inexpensive, easy to use, and harmless to the human body, it has been used in brain disease diagnosis [2,3], brain-computer interface (BCI) [4], decoding sensory signals [5,6], child development [7], and psychology research [8].
An fNIRS channel consists of one source and one detector. When the light is emitted from a light source, photons pass through several layers, including the scalp, skull, cerebrospinal fluid, capillaries, and cerebral cortex, before returning to a detector. Through this process, the detected light contains various noises that make it challenging to know the hemodynamic responses. These noises include heartbeat, breathing, and motion artifacts [9]; more problematically, very low-frequency noise around 0.01 Hz has been reported [10][11][12].
In improving the accuracy of the measured signal, noise removal/reduction techniques are indispensable. Various techniques can remove physiological noises such as heartbeat, breathing, and Mayer waves. For instance, the superficial noise in the scalp can be removed using short separated channels [13], additional external devices, or applying denoising techniques such as adaptive filtering [14] and correlation analysis methods [15]. In addition, since the frequency bands of physiological noise are roughly known, a band-pass filter has become one of the most easily applied noise reduction techniques [16].
A general linear model (GLM) method has been widely used to find the task-related hemodynamic response in the fNIRS signal after preprocessing [17]. The desired hemodynamic response function (dHRF), which should be used for the GLM method, is designed considering the experimental paradigm. However, in the case that the essential frequency of the dHRF overlaps with a specific frequency of physiological noise, the conventional GLM method will not work and may result in mistaking noise for the hemodynamic response. Therefore, a new different denoising technique must be pursued.
A discrete wavelet transform (DWT) is a mathematical tool used to analyze signals in the time-frequency domain [18]. In fNIRS research, DWT has been used for denoising [19,20] and connectivity analysis [21,22]. The maximal overlap discrete wavelet transform (MODWT) is a type of DWT often used in signal processing and time series analysis [23]. It decomposes a signal into a series of wavelet coefficients at different widths and time locations. Unlike the usual DWTs, which use non-overlapping sub-signal windows to perform the wavelet decomposition, the MODWT uses overlapping sub-signal windows. This nested-window approach allows the MODWT to improve time-frequency localization and reduce the boundary effects that can occur in DWTs [24]. Due to this advantage, MODWT has been applied to a wide range of signals, including audio signals [25], weather information [26,27], and biomedical signals [28]. MODWT is powerful when the signals are abnormal or have complex frequency components.
Deep learning, a subfield of artificial intelligence, is based on artificial neural networks. In recent years, brain research has increasingly used it to analyze large, complex data sets, such as those generated by biomedical devices [29]. Research has been conducted to analyze health data such as magnetic resonance imaging (MRI) [30], electrocardiograms (ECG) [31] and electroencephalograms (EEG) [32,33], or to decode brain waves to control BCI [34,35]. Furthermore, analyzing brain neuroimaging data and identifying patterns associated with specific diseases can help with early diagnosis and personalized treatment.
Long short-term memory (LSTM) is a type of recurrent neural network (RNN) architecture designed to overcome the limitations of traditional RNNs in handling long-term dependencies in sequential data [36]. It has been used in a wide range of applications for time-series data classification and forecasting [37][38][39][40]. LSTMs are particularly useful in tasks that require modeling long-term dependencies in sequential data. LSTMs' ability to selectively remember and forget information over time is vital for accurate forecasting.
MODWT-LSTM-based prediction research has shown excellent results in predicting periodic data such as water level [41], ammonia nitrogen [42], weather [43], etc. In brain research, MODWT has been applied as a preprocessing method for EEG-based seizure detection [28,44], Alzheimer's diagnosis [45], and resting state network analysis of fMRI [46]. Since brain signals are measured in time series, active research on brain signal classification [47,48] uses LSTM. However, to our knowledge, this is the first study to predict the noise in fNIRS signals despite many of the noise components being periodic.
In this study, one thousand synthetic data are generated, assuming 600-s rest and 40-s task. Each data is decomposed into eight levels by the MODWT. Five wavelets containing low-frequency components from the 600-s data are used to train an LSTM network. The trained LSTM networks are used to predict the next 40 s, presumably the predicted signals of the low-frequency oscillations. The predicted signals are then subtracted from the task period data. For validation purposes, the predicted signal and original data are compared by calculating mean absolute errors (MAEs), and root mean square errors (RMSEs). Finally, the proposed method is demonstrated by analyzing the actual fNIRS data from humans. This paper is organized as follows: Section 2 describes the proposed method on the synthetic data, Section 3 demonstrates the proposed method with actual fNIRS data, Section 4 discusses the results of this study and its applications, and Section 5 presents conclusions.

Method Development
This section describes the development of the proposed method with the following four subsections. In the first subsection, the method of synthetic fNIRS data generation is described. The second and third subsections explain the operation of MODWT and LSTM, respectively. The fourth subsection describes the validation of the proposed method. The last subsection presents the results of the data analysis.

Synthetic fNIRS Data Generation
One thousand synthetic data are generated according to the method of Germignani et al. [49] with a sampling frequency of 8.138 Hz. For each data, thirty orders of autoregressive noise are added to the baseline noise [50]. The synthetic physiological noises include frequency ranges of 1 ± 0.1 Hz, 0.25 ± 0.01 Hz, and 0.1 ± 0.01 Hz for cardiac, respiratory, and Mayer waves, respectively. In addition, a sine wave with a frequency of 0.01 ± 0.001 Hz was generated for the very low-frequency component [11]. The amplitudes of five signals in a synthetic fNIRS signal were set randomly in the range of 0.01 to 0.03. In this paper, the resting period is set to 10 min, considering that the concerned low-frequency noise is near 0.01 Hz.
For five hundred data samples only, the desired hemodynamic function (dHRF) based on a 2-gamma function with 20 s of task and 20 s of resting state after the 10 min resting state were added. The amplitude of this signal was randomized between 0.1 and 0.35 and added to the previously generated noise. All data were set to zero at the starting point before processing the signals. Figure 1 depicts synthetic signals for various noises and the resultant HbO signal assumed.

Maximal Overlap Discrete Wavelet Transform
The discrete wavelet transform (DWT) is a signal processing technique that decomposes a signal into different frequency components at multiple levels of resolution. The DWT works by convolving the signal with a set of filters, called wavelet filters, which capture different frequency bands. The signal is decomposed into approximation and detail coefficients [19], which represent low-frequency components and high-frequency components, respectively. This decomposition is applied recursively to the approximation coefficients to obtain a multi-resolution representation. However, the DWT has several drawbacks, including the introduction of boundary artifacts due to the filtering process, the lack of shift invariance in the decomposition, and the potential loss of fine detail at higher decomposition levels. Zhang et al. (2018) [51] utilized the DWT in forecasting vehicle emissions and specifically compared four cases: The autoregressive integrated moving average (ARIMA) model, LSTM, DWT-ARIMA, and DWT-LSTM. They reported that adopting DWT improved the performance overall. Individually, between ARIMA and LSTM, LSTM performed better; between ARIMA and DWT-ARIMA, DWT-ARIMA generated improved results; between LSTM and DWT-LSTM, DWT-LSTM was superior; and between DWT-LSTM and DWT-ARIMA, DWT-LSTM demonstrated the best forecasting.
MODWT is a mathematical technique that transforms a signal into a multilevel wavelet and scaling factor. MODWT has several advantages over DWT. For example, the MODWT can be adequately defined for signals of arbitrary length, whereas the DWT is only for signals of integer length to the power of two.

Maximal Overlap Discrete Wavelet Transform
The discrete wavelet transform (DWT) is a signal processing technique that decomposes a signal into different frequency components at multiple levels of resolution. The DWT works by convolving the signal with a set of filters, called wavelet filters, which capture different frequency bands. The signal is decomposed into approximation and detail coefficients [19], which represent low−frequency components and high−frequency components, respectively. This decomposition is applied recursively to the approximation coefficients to obtain a multi−resolution representation. However, the DWT has several drawbacks, including the introduction of boundary artifacts due to the filtering process, the lack of shift invariance in the decomposition, and the potential loss of fine detail at higher decomposition levels. MODWT is a mathematical technique that transforms a signal into a multilevel wavelet and scaling factor. MODWT has several advantages over DWT. For example, the MODWT can be adequately defined for signals of arbitrary length, whereas the DWT is only for signals of integer length to the power of two.
where W j,t is the wavelet coefficient of the tth element of the jth level of the MODWT; V j,t is the scaling factor of the tth element of the jth level; and low ( ∼ g j,l ≡ g j,l /2 j 2 ) pass filters; h j,l and g j,l are the jth level DWT high-pass and lowpass filters, where L is the maximum decomposition level. The filters are determined by the mother wavelet as in the DWT [52]. The MODWT based multiresolution analysis is expressed as follows.
Bioengineering 2023, 10, 685 where A L is the approximation component and D j is the detail components (j = 1, 2, · · · , L). Figure 2 shows a scheme of MODWT-based multiresolution analysis. Figure 2 shows a scheme of MODWT−based multiresolution analysis. In this study, Sym4 was selected as the mother wavelet because it resembles the canonical hemodynamic response function. Let the number of data be N. Then, the maximum decomposition level becomes less than log2(N). Considering our case's shortest resting state of 60 s, the data size is 60 s × 8.13 Hz = 487.8. Therefore, the decomposition level in our work was selected by 8, which is the largest integer less than log2(487.8). The eight decompositions result in nine signals, of which only five signals belonging to low frequencies will be predicted.

Long Short−Term Memory
LSTM is a type of RNN architecture that addresses the vanishing gradient problem and allows for capturing long−term dependencies in sequential data. LSTM consists of memory cells that store and update information over time. The primary function of an LSTM is to use memory cells that can hold information for long periods. Memory cells can In this study, Sym4 was selected as the mother wavelet because it resembles the canonical hemodynamic response function. Let the number of data be N. Then, the maximum decomposition level becomes less than log 2 (N). Considering our case's shortest resting state of 60 s, the data size is 60 s × 8.13 Hz = 487.8. Therefore, the decomposition level in our work was selected by 8, which is the largest integer less than log 2 (487.8). The eight decompositions result in nine signals, of which only five signals belonging to low frequencies will be predicted.

Long Short-Term Memory
LSTM is a type of RNN architecture that addresses the vanishing gradient problem and allows for capturing long-term dependencies in sequential data. LSTM consists of memory cells that store and update information over time. The primary function of an LSTM is to use memory cells that can hold information for long periods. Memory cells can selectively forget or remember information based on input data and past states. This allows the network to learn and remember important information while ignoring irrelevant or redundant information. An LSTM network has three gates (input gate, forget gate, and output gate) that control the flow of information into and out of the memory cells. The input gate i(t) determines which information is stored in the memory cell c(t), the forget gate f (t) determines which information is discarded, and the output gate o(t) controls the output of the memory cell ( Figure 3) [53].  The LSTM model is represented by the following equations: The LSTM model is represented by the following equations: where c(t − 1) and c(t) are the cell states at t − 1 and t, and at each gate σ is a sigmoid function, tanh is a hyperbolic tangent activation function, and × denotes the cross product of two vectors. In this study, three LSTM layers were utilized, with the number of hidden units set to [128, 64,32], and a dropout layer was employed between the LSTM layers with a probability of 0.2 to prevent overfitting ( Figure 4). To train the LSTM network, the Adam optimizer was used with a maximum epoch of 100 and a minibatch size of 128. All data were normalized before training.  For the synthetic data, nine hundred data were randomly selected from the thousand data to train the network, and then one hundred data were tested. For actual fNIRS data, a leave−one−out method was used to avoid splitting data from the same person for training and testing. For example, to train an LSTM network to predict the 48 channels of a subject, a total of 432 channels (nine subjects × 48 channels) were used. Since the data was only 600 s long, 570 s of data were used for training, and the trained LSTM network predicted the next 30 s.

Validation
To determine the accuracy of the signal predicted by the LSTM, the mean absolute error (MAE) and root mean squared error (RMSE) were calculated compared to the original signal, divided by the signal with and without dHRF. The data was segmented, analyzed, and predicted to find the required resting−state length to achieve optimal predic- For the synthetic data, nine hundred data were randomly selected from the thousand data to train the network, and then one hundred data were tested. For actual fNIRS data, a leave-one-out method was used to avoid splitting data from the same person for training and testing. For example, to train an LSTM network to predict the 48 channels of a subject, a total of 432 channels (nine subjects × 48 channels) were used. Since the data was only 600 s long, 570 s of data were used for training, and the trained LSTM network predicted the next 30 s.

Validation
To determine the accuracy of the signal predicted by the LSTM, the mean absolute error (MAE) and root mean squared error (RMSE) were calculated compared to the original signal, divided by the signal with and without dHRF. The data was segmented, analyzed, and predicted to find the required resting-state length to achieve optimal prediction accuracy, as shown in Figure 5. MAE and RMSE can be calculated using the following equations.
where y i is the original signal,ŷ i is the predicted signal, i is the timestep, and n is the number of data. The calculated MAEs and RMSEs of the signal with and without the dHRF were compared using a two-sample t-test.

Synthetic Data Analysis
The synthetic data were decomposed into nine components using MODWT, and the components used for prediction were the fifth through ninth. The frequency of the fifth wavelet was between 0.13 and 0.26 Hz, the sixth between 0.067 and 0.13 Hz, the seventh between 0.035 and 0.067 Hz, the eighth between 0.017 and 0.035 Hz, and the ninth consisted of signals below 0.017 Hz. Figure 6 shows the prediction results of the signal with and without dHRF. The signal with dHRF showed a significant fluctuation during the task period in the low−frequency signals of Wavelets 6-9, and the predicted signal did not follow this fluctuation.

Synthetic Data Analysis
The synthetic data were decomposed into nine components using MODWT, and the components used for prediction were the fifth through ninth. The frequency of the fifth wavelet was between 0.13 and 0.26 Hz, the sixth between 0.067 and 0.13 Hz, the seventh between 0.035 and 0.067 Hz, the eighth between 0.017 and 0.035 Hz, and the ninth consisted of signals below 0.017 Hz. Figure 6 shows the prediction results of the signal with and without dHRF. The signal with dHRF showed a significant fluctuation during the task period in the low-frequency signals of Wavelets 6-9, and the predicted signal did not follow this fluctuation. Figure 7 and Table 1 show the calculated MAEs and RMSEs. In all conditions, the MAEs and RMSEs of the signal with dHRF corresponding to Wavelets 6-9 and the signal without dHRF were statistically significantly different. The only statistically significant difference between with and without dHRF was found in the RMSE of Wavelet 5 when the MODWT-LSTM analysis was performed with 300 s of data (Figure 7c). To compare the prediction results for each condition, MAEs and RMSEs for all conditions are shown in Figure 8. The error of the dHRF signal was the largest in Condition 2 (MODWT-LSTM at 600 s) and the smallest in Condition 6 (MODWT-LSTM with 60 s of data). In particular, the difference in prediction accuracy between with and without dHRF signals of Wavelet 8 was the largest in all conditions. The synthetic data were decomposed into nine components using MODWT, and the components used for prediction were the fifth through ninth. The frequency of the fifth wavelet was between 0.13 and 0.26 Hz, the sixth between 0.067 and 0.13 Hz, the seventh between 0.035 and 0.067 Hz, the eighth between 0.017 and 0.035 Hz, and the ninth consisted of signals below 0.017 Hz. Figure 6 shows the prediction results of the signal with and without dHRF. The signal with dHRF showed a significant fluctuation during the task period in the low−frequency signals of Wavelets 6-9, and the predicted signal did not follow this fluctuation.    Table 1 show the calculated MAEs and RMSEs. In all conditions, the MAEs and RMSEs of the signal with dHRF corresponding to Wavelets 6-9 and the signal without dHRF were statistically significantly different. The only statistically significant difference between with and without dHRF was found in the RMSE of Wavelet 5 when the MODWT−LSTM analysis was performed with 300 s of data ( Figure 7c). To compare the prediction results for each condition, MAEs and RMSEs for all conditions are shown in Figure 8. The error of the dHRF signal was the largest in Condition 2 (MODWT−LSTM at 600 s) and the smallest in Condition 6 (MODWT−LSTM with 60 s of data). In particular, the difference in prediction accuracy between with and without dHRF signals of Wavelet 8 was the largest in all conditions.    For the 600 s data prediction results, MAEs and RMSEs were calculated for 1 s, 3 s, 5 s, 10 s, 15 s, and 30 s (Figure 9). In all cases, there were statistically significant differences in Wavelets 6-9 between with and without dHRF. Especially for Wavelet 7, with the most significant difference at 1 s and a decrease after that, but for Wavelet 8, the difference started at 10 s and was most extensive at 15 s.  For the 600 s data prediction results, MAEs and RMSEs were calculated for 1 s, 3 s, 5 s, 10 s, 15 s, and 30 s (Figure 9). In all cases, there were statistically significant differences in Wavelets 6-9 between with and without dHRF. Especially for Wavelet 7, with the most significant difference at 1 s and a decrease after that, but for Wavelet 8, the difference started at 10 s and was most extensive at 15 s. For the 600 s data prediction results, MAEs and RMSEs were calculated for 1 s, s, 10 s, 15 s, and 30 s (Figure 9). In all cases, there were statistically significant differe in Wavelets 6-9 between with and without dHRF. Especially for Wavelet 7, with the significant difference at 1 s and a decrease after that, but for Wavelet 8, the differ started at 10 s and was most extensive at 15 s.

Human Data Application
In this section, actual fNIRS data from human subjects were used to validate the proposed method. The actual fNIRS data were obtained in the authors' previous study, but only resting state data were used [2]. In the first subsection, the fNIRS data acquisition is briefly described. The second subsection describes the results of the application of the proposed method.

fNIRS Data Acquisition
Resting state data with a data length of 10 min were selected from ten healthy subjects. The selected subjects are five males and five females (age: 68 ± 5.95 years). Prior to the experiment, each subject was fully informed about the purpose of the study. Written informed consent was obtained from each subject. The entire experiment was approved by the ethics committee of Pusan National University Yangsan Hospital (Institutional Review Board approval number: PNUYH-03-2018-003).
Hemodynamic responses in PFC were measured with a portable fNIRS device (NIRSIT; OBELAB, Seoul, Republic of Korea) equipped with 24 sources (laser diode) and 32 detectors (a total of 204 channels, including short channel separation) at a sampling rate of 8.138 Hz. NIRSIT uses two wavelengths of near-infrared light (780 nm and 850 nm) to measure concentration changes of HbO and HbR. Only 48 channels with 3 cm of channel distance out of 204 channels were used for this study.

Human Data Analysis
The prediction results for the actual HbO data are shown in Figure 10. Unlike the synthetic data, the amplitude of the ninth wavelet was significantly lower than the other wavelets. A spike appeared in all the wavelets at a particular time, presumably a motion artifact. The MODWT results differed at both ends of the wavelets for the 570 s data and the 600 s data.

Human Data Application
In this section, actual fNIRS data from human subjects were used to validate th posed method. The actual fNIRS data were obtained in the authors' previous stud only resting state data were used [2]. In the first subsection, the fNIRS data acquisi briefly described. The second subsection describes the results of the application proposed method.

fNIRS Data Acquisition
Resting state data with a data length of 10 min were selected from ten health jects. The selected subjects are five males and five females (age: 68 ± 5.95 years). Pr the experiment, each subject was fully informed about the purpose of the study. W informed consent was obtained from each subject. The entire experiment was app by the ethics committee of Pusan National University Yangsan Hospital (Institution view Board approval number: PNUYH−03−2018−003).
Hemodynamic responses in PFC were measured with a portable fNIRS device SIT; OBELAB, Seoul, Republic of Korea) equipped with 24 sources (laser diode) a detectors (a total of 204 channels, including short channel separation) at a samplin of 8.138 Hz. NIRSIT uses two wavelengths of near−infrared light (780 nm and 850 n measure concentration changes of HbO and HbR. Only 48 channels with 3 cm of ch distance out of 204 channels were used for this study.

Human Data Analysis
The prediction results for the actual HbO data are shown in Figure 10. Unli synthetic data, the amplitude of the ninth wavelet was significantly lower than the wavelets. A spike appeared in all the wavelets at a particular time, presumably a m artifact. The MODWT results differed at both ends of the wavelets for the 570 s dat the 600 s data.  Table 2 shows the results of calculating the mean and standard deviation of the and RMSEs of the predictions on the real HbO data. Among them, the average va plotted for easy comparison (Figure 11). The ninth wavelet had the slightest error b most significant standard deviation across all cases. The fifth and sixth wavelets sh increasingly significant errors until 3 s and 5 s, respectively, then decreased. The se and eighth wavelets had more significant errors as the time window increased.  Table 2 shows the results of calculating the mean and standard deviation of the MAEs and RMSEs of the predictions on the real HbO data. Among them, the average value is plotted for easy comparison (Figure 11). The ninth wavelet had the slightest error but the most significant standard deviation across all cases. The fifth and sixth wavelets showed increasingly significant errors until 3 s and 5 s, respectively, then decreased. The seventh and eighth wavelets had more significant errors as the time window increased.

Discussion
In fNIRS studies, cognitive tasks are used to evaluate cognitive abilities such as working memory, conflict processing, language processing, emotional processing, and memory encoding and retrieval [54]. For example, N−back, Stroop, and verbal fluency tasks evaluate working memory, conflict processing, language processing, etc. Such cognitive tasks are also often used to detect brain diseases such as schizophrenia, depression, cognitive impairment, attention−deficit hyperactivity disorder, etc. [55].
Cortical activations caused by cognitive tasks are investigated by a t−map, a connectivity map, or extracted features from HbO signals [2,3]. The t−map is reconstructed with t−values from the GLM method, indicating the dHRF's weight at each channel. The connectivity map is an image map of correlation coefficients between two channels, which reflects how those two channels are interrelated. Hemodynamic features such as the mean, slope, and peak value have also been used to diagnose brain diseases. Cognitive task analysis can identify activated/deactivated regions and differences between healthy and non−healthy people.
The proposed method was validated in two ways: (i) By comparing synthetic data with and without dHRF, and (ii) by predicting the resting state data. In the synthetic data, the proposed method showed statistically significant differences in the prediction errors between with and w/o dHRF. The prediction errors in human resting state data also

Discussion
In fNIRS studies, cognitive tasks are used to evaluate cognitive abilities such as working memory, conflict processing, language processing, emotional processing, and memory encoding and retrieval [54]. For example, N-back, Stroop, and verbal fluency tasks evaluate working memory, conflict processing, language processing, etc. Such cognitive tasks are also often used to detect brain diseases such as schizophrenia, depression, cognitive impairment, attention-deficit hyperactivity disorder, etc. [55].
Cortical activations caused by cognitive tasks are investigated by a t-map, a connectivity map, or extracted features from HbO signals [2,3]. The t-map is reconstructed with t-values from the GLM method, indicating the dHRF's weight at each channel. The connectivity map is an image map of correlation coefficients between two channels, which reflects how those two channels are interrelated. Hemodynamic features such as the mean, slope, and peak value have also been used to diagnose brain diseases. Cognitive task analysis can identify activated/deactivated regions and differences between healthy and non-healthy people.
The proposed method was validated in two ways: (i) By comparing synthetic data with and without dHRF, and (ii) by predicting the resting state data. In the synthetic data, the proposed method showed statistically significant differences in the prediction errors between with and w/o dHRF. The prediction errors in human resting state data also showed concordance with the results of synthetic data without dHRF. The agreement between the synthetic data without dHRF and the human resting state data demonstrates that the task-related response can also be differentiated from the proposed method.
Since the hemodynamic signal in this study consisted of 20 s of task and 20 s of rest and had a frequency of 0.025 Hz, it was expected that the eighth wavelet would show a significant difference with and without dHRF. As shown in Figure 6, the wavelet decomposition of the signal with dHRF was different from the signal without in the sixth through ninth wavelets. As expected, a statistically significant difference was found in the eighth wavelet, but the sixth, seventh, and ninth wavelets also showed significant differences. This is likely due to the decomposition of the dHRF into multiple levels when performing the MODWT.
The LSTM results show that the difference between with and without dHRF is more pronounced when the number of training data points increases. (Figure 7a,b). In addition, the smaller the number of training data points, the smaller the prediction error of the signal with dHRF and the larger the prediction error of the signal without dHRF. This is not surprising, since sufficient data is required for practical training of the LSTM.
To investigate whether the occurrence of hemodynamic signals can be predicted early, MAEs and RMSEs were estimated by dividing the predicted data into 1 s, 3 s, 5 s, 10 s, 15 s, and 30 s, and the difference in error between the seventh wavelet with and without dHRF was significant early. The difference between the eighth wavelet with and without dHRF was significant at 15 s because it took more than 10 s for the dHRF to rise to the maximum, since it takes time for the dHRF to rise.
When the proposed method was applied to real data, the error was similar to that of the synthetic data without dHRF. The lowest error occurred in the ninth wavelet, which seems to be due to the lowest signal strength of the ninth wavelet. Initially, wavelets with higher frequencies produced relatively higher errors, but the opposite was true as the prediction time increased. This suggests that as the data length varies, the results of the MODWT change as well, as this is more pronounced at both ends of the data.
Methods to estimate the hemodynamic response and remove noise from fNIRS signals include Kalman filtering [56], Bayesian filtering [57], block averaging [58], general linear models [59], and adaptive filtering [14,60,61]. In addition, initial-dip detection has also been studied for early detection of hemodynamic responses [62,63]. However, these methods rely heavily on the desired hemodynamic function as a reference signal ( Table 3). The hemodynamic signal is designed by gamma functions [64], the balloon model [65], the finite element method [66], the state-space method [67,68], etc. These hemodynamic signals are not suitable for use in unknown areas because they depend on the brain region or task being measured. However, the proposed method is differentiated from existing methods in that it does not require a reference signal and can be applied without external devices.

Conclusions
The following three implications are made: (i) Alleviating the dHRF's trap: In the conventional methods (i.e., general linear model [59], recursive estimation method [60], etc.), the brain signal is identified by comparing HbO signals with a dHRF. If the correlation coefficient between two signals is high, the measured HbO is attributed to the task. The dHRF computed by convolving a gamma function with the task period contains multiple frequencies, not a single frequency. For example, for a 20 s task followed by a 20 s rest, the dHRF has 0.025 Hz (=1/40 s), and all other components are considered noises. Such multiple frequencies are also seen from the synthetic data analysis, showing that the added 0.025 Hz dHRF affected neighboring frequency bands, see Figure 6. Therefore, if the brain signal is identified with only the dHRF, the neighboring signals are unwillingly included (which could be noises). Hence, the proposed method can alleviate the dHRF's trap.
(ii) Can handle an unknown task period: In neuroscience, fNIRS has been used to identify brain regions associated with specific tasks and to understand how neural networks function. In particular, regular examinations in daily life are essential for the early detection of cognitive decline due to brain disease or aging. Research on the classification of cognitive decline and brain disease diagnosis using fNIRS is being actively conducted. However, it is challenging to establish classification criteria because hemodynamic signals vary depending on various factors such as age and gender. In particular, it is necessary to compare behavioral data and fNIRS signals for classification, and the duration of cognitive function tests belonging to neuropsychological tests should be pre-designed. Thus, the proposed method can be used when the task period to be observed is unknown or very long.
(iii) Starting time estimation for passive BCI: Recently, passive BCI has become essential for fault-free automotive cars, pilots, etc. In this case, the brain signal's starting time has to be identified. To estimate the starting time, a moving-window approach can be adopted. If the prediction error becomes large while moving the window, the instance of a significant error can be considered as the starting time of a passive brain signal, and we can generate a BCI command.
The proposed method can overcome the variability in the resting state, which varies from person to person, by predicting the subsequent signal. The predicted signal ought to be removed from the measured signal, and the remaining signal should be analyzed for brain activity. Although the proposed method has some limitations, e.g., large volumes of training data and computation time to train the model for the first time, it is expected to play a significant role in improving the temporal resolution of fNIRS in the future.  Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data and code that support the findings of this study are openly available in https://github.com/sohyeonyoo/MODWT-LSTM.