Wavelet-based method for removing global physiological noise in functional near-infrared spectroscopy

: Functional near-infrared spectroscopy (fNIRS) is a fast-developing non-invasive functional brain imaging technology widely used in cognitive neuroscience, clinical research and neural engineering. However, it is a challenge to effectively remove the global physiological noise in the fNIRS signal. The global physiological noise in fNIRS arises from multiple physiological origins in both superficial tissues and the brain. It has complex temporal, spatial and frequency characteristics, casting significant influence on the results. In the present study, we developed a novel wavelet-based method for fNIRS global physiological noise removal. The method is data-driven and does not rely on any additional hardware or subjective noise component selection procedure. It consists of two steps. Firstly, we use wavelet transform coherence to automatically detect the time-frequency points contaminated by the global physiological noise. Secondly, we decompose the fNIRS signal by using the wavelet transform, and then suppress the wavelet energy of the contaminated time-frequency points. Finally, we transform the signal back to a time series. We validated the method by using simulation and real data at both task- and resting-state. The results showed that our method can effectively remove the global physiological noise from the fNIRS signal and improve the spatial specificity of the task activation and the resting-state functional connectivity pattern.


Introduction
Because of low-cost, portability and high ecological validity, functional near-infrared spectroscopy (fNIRS) becomes a fast-developing non-invasive functional brain imaging technology. fNIRS has been widely used in various fields, such as cognitive neuroscience [1], clinical research [2] and neural engineering [3]. fNIRS measures the hemodynamic activity in the cerebral cortex. A typical fNIRS measurement channel consists of a near-infrared light source and a light detector placed on the scalp, forming a "banana-shaped" light path in the head due to the biological tissue's diffusional scattering property. The distance between the source and the detector (S-D distance) is generally around 3 cm, which ensures the "banana-shape" reach adequate depth from the scalp and pass through the cerebral cortex [4]. Because the oxygenated hemoglobin (HbO) and the deoxygenated hemoglobin (HbR) have distinct absorption spectrum to the near-infrared light, so the concentration change parameter of the HbO and the HbR in the cerebral blood can be calculated from the intensity change of dual wavelength near-infrared lights using the modified Beer-Lambert law (MBLL) [5].
However, in practice it is a major challenge that the fNIRS measurement is often contaminated by the global physiological noise. This globally distributed physiological noise usually casts significant influence on various kinds of experiment. For example, in the task-based study, it may cover up the evoked neural activity from being detected [6][7][8], whereas it may lead to spurious global connectivity pattern in the resting-state functional connectivity study [9,10]. The global physiological noise in fNIRS mainly arises from the physiological fluctuation of the blood supply system in the superficial tissue layers including the scalp and the skull, and it also arises from the physiological vascular oscillation inside the brain [11][12][13]. It is usually the dominant component of the fNIRS signal because the photon density of the "banana-shaped" light path decays severely with the depth increases, so the fNIRS signal is much more sensitive to the hemoglobin concentration change in the superficial layers than that in the cortical layers (about 10-20 times [12]). Therefore, essentially it is a difficult weak signal detection problem to recover the neural activity from the fNIRS signal contaminated by the global physiological noise.
Several methods have been developed to remove the fNIRS global physiological noise (see [14] for a review). One frequently-used method is the band-pass filtering which removes the noise by suppressing the power in the noise frequency band [15]. The problem of this approach is that it is necessary to know the exact frequency characteristics of all physiological noise which may vary over time, space and across individuals. Another issue is if the frequency band of the noise overlaps with that of the neural activity, the filter may destroy the experimental effects of interest.
The short source-detector distance channel (sSD channel) based correction is a kind of effective method for removing the fNIRS global physiological noise [8,13,[16][17][18][19][20]. By using relatively short S-D distance (about 0.5 to 1.5 cm [12,21]), the "banana-shaped" light path can be constrained within the superficial layers rather than reaching the cerebral cortex. Therefore, the signal from the sSD channel is usually assumed as a good measure of the superficial physiological noise and is used to remove the superficial physiological noise in the normal S-D distance channel. Although this method is very effective, it has several practical shortcomings. Firstly, not all fNIRS apparatus support sSD channel. Secondly, configuring sSD channel increase the setting up time. Thirdly, the superficial physiological noise is not homogeneously distributed on the head [11], but the number of fNIRS channels is usually limited. It is impossible to allocate each normal channel with an additional sSD channel to record the noise. It is still a difficult problem to optimize the arrangement of the limited sSD channels [18,21].
The blind source separation (BSS) algorithms such as principal component analysis (PCA) [22][23][24][25] and independent component analysis (ICA) [26,27] are another option for global physiological noise removal. These methods firstly decompose the fNIRS signal into a series of source components with either the orthogonal (PCA) or the statistical independent (ICA) assumption. Then the noise components are identified by means such like energy sorting or spatial uniformity checking, based on different assumptions about the noise. Finally, the signal is separated from the noise and reconstructed. The BSS methods are good alternatives for global physiological noise removal, especially when sSD method is unavailable. However, the performance of these methods highly depends on the noise component identification procedure, which is usually under human's supervision or even needs manual selection which requires rich experience from the user. Another potential problem of the BSS methods is that they cannot handle the time lags among the physiological noise of different channels.
The wavelet transform (WT) is a powerful mathematical tool for non-stationary signals' time-frequency analysis. The wavelet transform can expand a time series into time-frequency space by using both time and frequency localized basis functions (wavelet basis). It uses low time resolution at low frequencies and high time resolution at high frequencies and thus is time-frequency adaptive. Based on WT, the wavelet transform coherence (WTC) can further give the coherence between two signals in the time-frequency space. In previous studies, WT and WTC were widely used in fNIRS data preprocessing and analysis. For example, WT was used in fNIRS signal detrending [28] and motion-induced artifact removal [29][30][31][32]. While WTC was used to calculate the time-frequency coherence between the fNIRS signal and the peripheral physiology recording signal to identify and quantify the physiological components in fNIRS signal [33,34], or between the fNIRS data and the predicted brain response to detect task activation [35]. Moreover, in recent years, WTC was also frequently used in fNIRS hyperscanning studies to investigate the inter-brain synchronization during social interaction between two or more brains e.g [36][37][38][39].
However, to our knowledge, the wavelet-based fNIRS global physiological noise removal remains vacant. In the present study, we proposed a novel wavelet-based method to remove the fNIRS global physiological noise. Our method consists of two steps. In the first step, we use WTC to automatically detect the time-frequency points contaminated by the global physiological noise. In the second step, the fNIRS signal is decomposed with WT, the wavelet energy of the contaminated time-frequency points was then suppressed, and the signal was finally reconstructed. We conducted simulation and real experiments of both task-and resting-state to validate the performance of our method.

Wavelet transform and wavelet transform coherence
Wavelet transform is a powerful time-frequency analysis technique. A wavelet is a function with zero mean and is both frequency and time localized. For example, the Morlet wavelet, employed in the present study and other fNIRS studies [33][34][35] for its good performance in time-frequency analysis, is defined as where t δ is uniform time steps and ( ) 2 , n x W n s is defined as the wavelet power [41]. The above wavelet transform also has its corresponding inverse wavelet transform (IWT) which can reconstruct the signal using time-and frequency-localized components (for mathematical details, see [42] and [43]). Based on the wavelet transform, the wavelet transform coherence can give the coherence between two time series at each point of the wavelet-divided time-scale plane, and thus can be used to analyze the co-varying characteristics between two time series with both temporal and frequency resolution. The wavelet transform coherence between two time series n x and n y is defined as where * , n n n n x y x y W W W = and * denotes complex conjugation.
2 , n n x y W is defined as the cross wavelet power. S is a smoothing operator consisted of a temporal smoother and a scale smoother [44]. The computational implements of WTC and WT/IWT have been provided in the MATLAB software (R2017a, MathWorks Inc., Massachusetts, USA).

The WT-based method for global physiological noise removal
Generally, the basic concept of the wavelet denoising is to decompose the signal into time-frequency space, and then identify those noise-related time-frequency points by using appropriate criterion and set their wavelet coefficients to zero, and finally transform it back to a time series. Our method uses the same concept. The key point is that we use WTC to identify those globally co-varying time-frequency points as contaminated by the global physiological noise. In our method, the global physiological noise is removed per channel. For each channel, the noise is removed by using a two-stage procedure. In stage one, the time-frequency points contaminated by the global physiological noise are detected by using WTC. Specifically, firstly we calculate the time-frequency distributed WTC map (also called scalogram [35]) between the current denoising channel's signal and every other channel's unfiltered signal. These WTC maps are further binarized according to the significance of the WTC value at each time-frequency pixel. Then all these WTC maps are averaged, forming a globally co-varying time-frequency map. The value of a given pixel in this averaged map reflects how globally the current channel co-varies with other channels at this time-frequency point. Finally, a denoising mask for the current channel is generated by applying a threshold k to the globally co-varying time-frequency map. In stage two, the signal of the current denoising channel is decomposed into the time-frequency space by using WT. Then the wavelet energy at those noise-contaminated time-frequency points is suppressed by applying the mask obtained in stage one to the wavelet coefficients. Finally, the signal is reconstructed by using the inverse wavelet transform. The above two stages are repeated per channel to complete the global physiological noise removal. (Fig. 1). This method utilizes the global distribution property of the physiological noise. Using this method, the time-frequency components with high globally co-varying feature are labeled as contaminated and removed, while the interested neural components are reserved because they are relatively local. Also note that because the WTC can track the co-varying feature of two signals even if there is a time delay between them, so our method can deal with the possible time lags between the physiological noise of different channels.
The selection of the threshold k is important. Indeed, k is a quantification definition of the globality of the global physiological noise. Ideally, k should be equal to 100% if assuming the global physiological noise is significantly coherent among all channels. However, the global physiological noise is not completely spatial homogeneous. Therefore, too strict definition of the globality (high k) can decrease the noise removal performance of the algorithm. On the other hand, too loose definition of the globality (low k) may lead loss of the interested neural activity because the neural components within a same functional area may be also coherent. Therefore, theoretically, if assuming the neural components are coherent within the entire interested brain regions, the threshold k should satisfy where ROI N indicates the number of the interested channels (regions of interests, ROI) and T N means the number of all the channels. In the present study, we used a simulated experiment with set/known number of ROI N and T N to validate this lower bound of k. We also conducted a real experiment to test the influence of different k value selection on the denoising performance.

Simulation experiment
In the simulation experiment, we simulated a 66-channel fNIRS measurement (6 rows by 11 columns) on the head surface. The measurement contained a 14-channel ROI (region of interest). All the channels were filled with simulated global physiological noise consisted of multiple sinusoidal signals of different frequency, including the heart beat (1 Hz), the respiration (0.3 Hz), the Mayer waves (around 0.1 Hz) and the very low-frequency oscillations (below 0.01 Hz) [35,45,46]. Moreover, to test whether our method can deal with the possible time delay of the global physiological noise spreading among different measurement channels on the head surface, a maximum of 6 s time delay was simulated, spreading from an origin point (the bottom-middle channel) outward to the other channels. We set the time lag of a channel according to its Chebyshev distance (d 1,2 = max(|x 1x 2 |, |y 1y 2 |)) from the origin point ( Fig.  2(A)).
Both task-and resting-state data set were simulated to validate the proposed method. The task-evoked neural activity signal was simulated as a boxcar function convolved with the canonical hemodynamic response function (HRF). The simulated resting-state spontaneous neural activity signal was a combine of a series of sinusoidal functions with random phase. The frequency range of these sinusoidal functions was from 0.01 Hz to 0.1 Hz, stepped by 0.001 Hz, with amplitude distributed in 1/f manner [45] (Fig. 2(B)). The simulated task-and resting-state signals were set only in the ROI channels and were mixed with the simulated global physiological noise. The amplitude ratio of the neural signal and the physiological noise was set to from 1: 8 to 1: 10, randomly across channels. A 10 dB random Gauss white noise was also added to each channel's simulated signal. The length of the simulated time series was 300 s, with a sampling rate of 10 Hz (Fig. 2(C)). Both the task-and resting-state data sets were denoised by using the proposed method with different globality threshold k to remove the global physiological noise. The recovered signals were then compared with the ground truth signals. To examine the lower bound of the globality threshold parameter k, the recovered signal in ROI derived from each k was then regressed by the ground truth signal to see whether the recovered signal can well reproduce the ground truth signal.

Real experiment
We then conducted a finger-tapping motor task experiment and a resting-state experiment to further validate the proposed method. Forty healthy right-handed young adult participants (21.7 ± 2.5 years of age, 22 males and 18 female) were recruited from Shenzhen University. All participants underwent a 10-minute resting-state session followed by a 5.6-minute task session of fNIRS recording. In the resting-state session, the participants were instructed to keep still with their eyes closed and relax their mind. In the task session, participants performed a sequential bilateral finger-tapping task containing seven task blocks with a pseudo-randomized block length of 20-30 s [47]. Informed consent was obtained from all subjects. The study protocol was approved by the Institutional Review Board at Shenzhen Key Laboratory of Affective and Social Neuroscience, Shenzhen University.
The fNIRS measurement was conducted by using a NIRScout continuous wave fNIRS system (NIRx Medical Technologies, USA). The absorptions of the near-infrared lights at two wavelengths (785 nm and 830 nm) were measured with a sampling rate of 7.8125 Hz. Two 4 × 4 probe sets were used in this study with each probe set consisting of eight laser sources and eight detectors, forming 24 measurement channels (48 channels in total). The source-detector distance was 30 mm. The two probe sets were respectively placed on the head with their center at C3 and C4 of the international 10-20 system [48] to cover the bilateral sensorimotor areas. The cortex localization of the channels was obtained by using a 3-dimentional digitizer and the NIRS-SPM software [49,50]. The oxygenated (HbO) and the deoxygenated (HbR) signals were calculated with the modified Beer-Lambert law [5], with differential pathlength factor (DPF) of 7.25 and 6.38 for 785 nm and 830 nm, respectively [51].
All the fNIRS signals were firstly detrended by fitting the first and second order polynomial functions to remove the linear and bilinear trends [27]. Then the global physiological noise was removed by using the proposed method. For the task activation calculation, the general linear model (GLM) approach was used. The regressor was made by convolving the block design with the canonical HRF. The group-level activation t-map was derived by conducting a one sample t-test on all individual β values [47,52]. For the resting-state functional connectivity (RSFC) calculation, the spontaneous neural activity was extracted by using a 0.01-0.08 Hz band-pass filter [53]. The seed channels (the sensorimotor area, left channel 5 and 9) were determined according to the task activation results. The average time course of the two seed channels was used as the seed time course to calculate the Pearson's correlation coefficients with the time course of each channel. The Fisher's z-transform was applied to the Pearson's correlation coefficients to increase the Gaussianility [54]. Then the group-level RSFC t-map was derived by using the one sample t-test [27]. For comparison, the task activation map and the RSFC map were also calculated without denoising. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) [55] were used to evaluate the performance of our method in detecting the task activation and the RSFC of the sensorimotor area. The sensorimotor template derived from the cortex localization was used as the ground truth to draw the ROC curve [56]. By using varied k values, we also explored the influence of the k value selection on the denoising performance in the real fNIRS data set. Figure 3 illustrates the recovered neural signal (the red line) and the corresponding ground truth signal (the black line) of a randomly selected ROI channel for the task-and resting-state data set, respectively, using k of 50%. The Pearson's correlation coefficient between the reconstructed signal and the ground truth signal was 0.60 for the task-state and 0.43 for the resting-state, respectively. The results showed that the recovered signal well reproduced the ground truth signal (Fig. 3). The high-frequency oscillation in the recovered signal was the residual Gauss white noise which can be easily removed by low-pass filtering. The denoising results showed that when k was greater than or equal to the ratio of the number of ROI channels to the number of total channels (20% in our simulation experiment), the recovered signal can well reproduce the ground truth signal (Fig. 4). However, when k is less than this ratio, the recovered signal cannot reproduce the ground truth signal. These results suggested that the globality threshold k should at least be the ratio of the area of the interest to the area of the total measurements to avoid damaging the interested neural activity. Fig. 4. The denoising results of different globality threshold k in the simulation experiment. The vertical coordinate was the p-value of the regression. The horizontal coordinate was the globality parameter k, represented by a ratio (the ratio means for the currently denoising channel, with how many channels of the other 65 channels a time-frequency component co-varies should it be labeled as contaminated in the present study). Note that when k was less than 20% (13/65, the number of the ROI channels divided by the number of all channels, except the currently denoising ROI channel), the recovered signal cannot significantly reproduce the ground truth signal (the dashed line indicates the significance level of 0.05).

Real experiment
The activation maps (Fig. 5, left panel) and the RSFC maps (Fig. 5, right panel) derived from the HbO signal with using our method to remove the global physiological noise (Fig. 5, second row) showed more consistency with the sensorimotor template (the first row) compared to those maps without using our method (Fig. 5, third row). ROC curve analysis showed that higher AUC indices in both the task activation map and the RSFC map derived from the corrected data using our method than those from the uncorrected data (0.81 vs. 0.78 for task activation, and 0.82 vs. 0.71 for RSFC; Fig. 6). Fig. 5. The task activation maps and the RSFC maps. The left column and the right column respectively show the group-level task activation results and the RSFC results, derived from the corrected HbO data (the second row) and the uncorrected HbO data (the third row), with the globality threshold k = 50%. The first row shows the sensorimotor template. The denoising analyses with varied threshold k showed that the best performance of our WT-based method was at k = 50% in the present data set (Fig. 7). When k < 50%, performance of the WT-based method decreased as a function of the k value. This result was consistent with that of the simulation experiment. As we predicted in the theoretical analysis, when k > 50%, the performance of the algorithm decreased as well. This may be because the physiological noise is spatially heterogeneous, therefore an over-estimated globality threshold may be too strict for noise detection. Similar result could also be found on the HbR data (see Appendix).

Discussion and conclusion
The current study developed a wavelet transform based method for fNIRS global physiological noise removal. The method consisted of two parts. The first part was the temporal-frequency global noise detection based on the wavelet transform coherence, and the second part was the signal decomposition and noise removal based on the wavelet transform and reconstruction. The results from the simulation experiment and the real experiment showed that the method can effectively remove the global physiological noise in fNIRS signal and improve the spatial specificity of the task activation and the resting-state functional connectivity.
Our method has several advantages. Firstly, compared with the traditional frequency-based filtering, the exact frequency distribution of the noise is unnecessary to know for our method because it adequately utilizes the global distribution property of the noise. The globally co-varying time-frequency components in the signal can be detected by using the wavelet transform coherence. Moreover, our method can also deal with the potential time variance of the noise based on the time-frequency analysis ability of the wavelet transform. Secondly, our method can deal with the time-lag of the physiological noise among different brain regions. The global physiological noise shows a significant spatially spreading pattern. Tong and Frederick found that the whole circulatory process of the low-frequency physiological signal averagely takes about 6 s spreading over the head [57]. And a recent study made by Zhang et al. [21] also showed that the physiological interference in superficial layers is highly consistent in a local brain region, but the consistency is significantly lower between distant brain regions. These findings suggested that the physiological noise is not time-aligned among different fNIRS channels, which may bring difficulty to the blind source separation methods such as ICA and PCA when extracting a common physiological noise component from all channels. In our method, the wavelet transform coherence can capture the co-varying characteristics between two signals with time delay between them and thus can solve this problem. Thirdly, our method is completely data-driven. It is independent of any extra hardware to record the physiological noise or manually picking up the noise component. It is simple to use and can be applied to most of the multi-channel fNIRS apparatus. These advantages make our method a promising preprocessing tool for fNIRS studies.
Our method also has some limitations. For example, the current method is based on utilizing the global property of the physiological noise. The globality threshold k plays a key role in the method. But the selection of k based on a-priori knowledge of the expected size of the ROI also relies on rich user experience. On one hand, selecting a too small k will reduce the specificity of the noise detection, might even lose the signal of the neural activity. According to the theoretical analysis and the experimental results, k should not be less than the ratio of the area of the interested functional region to the area of the total measurement. Therefore, the area of the ROI should be estimated as accurately as possible. This is relatively easy when the ROI is clearly determined. However, when the ROI is not explicit, for example, in some exploratory studies, the k value should be selected with caution. In this situation, a larger k may reduce the risk of removing the interested neural activity. On the other hand, a too large k will reduce the sensitivity of the noise detection. In the present study, the results from the real data suggested that a k of 50% can give the best performances for both task activation and RSFC detections. The results also indicated that higher k values (from 60% to 90%) could decrease the performance. However, one can still benefit from the method even if higher k values are used when compared to nonuse. In addition, the task activation AUC is relatively independent of k above 50% when compared to the RSFC AUC. This may be because the RSFC derived from HbO signal is more sensitive to the global physiological noise. Further studies are still needed to explore the optimal k value selection for different experiment types. Another limitation is that the current method can only be applied to off-line data. In the future, we plan to develop the on-line version of this method for real-time applications such as the brain-computer interface and the neurofeedback.
In conclusion, the present study proposed a promising method for fNIRS global physiological noise removal which has unique advantages. Our method can be applied to both task-based and resting-state fNIRS data, and its effectiveness was validated by using both simulation experiment and real experiment. Our method can be complementary to the sSD and the BSS methods to remove the global physiological noise, which represents a necessary evolution in fNIRS signal processing. Fig. 8. The task activation maps and the RSFC maps of HbR data. The left column and the right column respectively show the group-level task activation results and the RSFC results, derived from the corrected HbR data (the second row) and the uncorrected HbR data (the third row), with the globality threshold k = 50%. The first row shows the sensorimotor template. Fig. 9. The receiver operating characteristic (ROC) curves of HbR data. The left and the right panel show the ROC curve of the task activation map and the RSFC map, respectively (globality threshold k = 50%). The red line indicates the ROC curve of the map derived from the corrected HbR data, and the blue line indicates the ROC curve of the map derived from the uncorrected HbR data.