Spectral clustering-based resting-state network detection approach for functional near-infrared spectroscopy

: In recent years, studying the resting-state network (RSN) by using functional near-infrared spectroscopy (fNIRS) has received increased attention. The previous resting-state fNIRS studies mainly adopted the seed-based correlation and the independent component analysis to detect RSN. However, these methods have several inherent problems. For example, the seed-based correlation method relies on seed region selection and neglects the interactions among multiple regions. The ICA method usually relies on manual component selection, which requires rich experience from the experimenter. In the present study, we developed a new approach for fNIRS-RSN detection based on spectral clustering. It consists of two steps. First, it calculates the individual-level partition of the fNIRS measurement region by using spectral clustering with an automatically determined cluster number. Second, the individual-level partitioning results are further clustered. Those clusters with high group consistency are determined as RSN clusters. We validated the method by using simulated data and in vivo fNIRS data. The results showed that the proposed method was eﬀective and robust for fNIRS-RSN detection.


Introduction
Human brain is intrinsically organized into networks that consist of multiple spatially remote brain regions whose neural activities are temporally coordinated even during the resting state [1,2]. The resting-state network (RSN) is believed to serve as the foundation of many basic and critical cognitive functions [3][4][5]. Extensive evidences also suggest the relation between RSN disorders and various neurological and psychiatric diseases [6]. Typically, the researchers use functional magnetic resonance imaging (fMRI) to study RSN [7]. However, in recent years, studying RSN by using functional near-infrared spectroscopy (fNIRS) received increasing attentions [8][9][10][11][12][13][14][15][16][17][18]. Compared with fMRI, fNIRS has many unique advantages such as low-cost, portable, comfortable and insensitive to head motion, making it very suitable for resting-state studies, especially on specific participant groups such as infants [19].
In fNIRS studies, the seed-based correlation [20][21][22] and the independent component analysis (ICA) [23,24] are widely used for RSN detection. The seed-based correlation is the most classical approach for RSN detection. To find the RSN of a functional system, first, a seed region in this functional system is predefined by using structural and/or functional localizing information. Then the correlation or the regression is used to calculate the resting-state functional connectivity (RSFC) between every channel and the seed. Those brain regions showing significant RSFC with the seed region are determined as an RSN. Although the seed correlation approach is widely-used, it has several inherent shortcomings. The first shortcoming is that it only considers the relationship between other brain regions and the seed region, but ignores the interactions among those regions. The second shortcoming is that the results usually depend on the seed selection [23].
ICA is one of the blind source separation (BSS) algorithms. It can decompose fNIRS signals into a series of statistically independent source components. Then the RSN component can be picked out according to its frequency characteristic and spatial pattern. As a data-driven approach, the ICA-based fNIRS-RSN detection does not rely on seed selection, and can separate noise from neural activity signal. However, the component selection procedure of ICA is usually under human's supervision or relies on manual selection, which requires rich experience from the experimenter and may bring subjective influence to the results.
Therefore, it is of practical value to develop new fNIRS-RSN detection methods that rely less on prior information and human intervention. A direct idea is to partition the brain into clusters according to the resting-state neural activity similarity among different regions, and the brain regions partitioned in the same cluster may constitute an RSN [21,25]. To this end, the present study developed a novel data-driven approach for fNIRS-RSN detection based on spectral clustering. Spectral clustering is a cutting-edge algorithm for partitioning data into different clusters based on spectral graph theory. Compared with traditional clustering algorithms such as k-means, spectral clustering does not need to presume any distribution or prototype of the data, and thus becomes one of the most popular modern clustering algorithms [26]. Our approach is two-step. First, in the individual-level, it partitions all the fNIRS measurement channels into different clusters by using a normalized cut-based spectral clustering method with automatically determined cluster number. Second, it further partitions the individual-level clustering results into different clusters. Those clusters with high group consistency are determined as RSN clusters. This two-step procedure can detect RSN in the absence of prior information and human intervention. We validated the method by using both simulated data and in vivo fNIRS data.

Spectral clustering and normalized cut
Spectral clustering is based on graph partitioning. For resting-state fNIRS data, a graph G can be defined by a set of vertices V = {v 1 , v 2 , · · · , v n } corresponds to the n measurement channels and a set of edges E = {e ij |i, j = 1, 2, · · · , n} corresponds to the RSFC between each channel dyad (i, j). This graph can be represented by an n × n weight matrix W whose entity w ij is the RSFC value between channel i and j, and w ij = 0 when i = j. The graph's degree matrix D is a The normalized cut [22] is a popular graph partitioning method for spectral clustering. Let (α, β) be a two-way partition of G (i.e., α ∪ β = V and α ∩ β = ∅), the normalized cut method determines the partition by minimizing the loss function where Vol(α) = cut criterion simultaneously minimizes the between-cluster disassociation and maximizes the within-cluster association, therefore can avoid the tendency of separating out small cluster containing only a few points.
By introducing an indicator vector x for partition (α, β), whose element satisfies the normalized cut can be written as According to [27], minimizing Eq. (4) is equivalent to minimizing the Rayleigh quotient with the condition that y is piecewise constant and y T d = 0. It can be achieved by solving a standard eigenvalue problem ofL In the two-way case, the partition is completed by splitting the smallest nontrivial eigenvector ofL. For k-way partition, first let U ∈ R n×k be the matrix consisted of k eigenvectors that correspond to the k smallest eigenvalues ofL as columns, and then treat each row of U as a point and cluster the n points using k-means clustering. Finally, assign the original points in V to the same clusters as their corresponding rows in U.

Automatic cluster number estimation
The original normalized cut method needs the number of clusters as input parameter. In the present study, we adopt the cluster number estimation method proposed by Kong et al. [28] to automatically determine this parameter. Suppose the data intrinsically has m clusters, and U ∈ R n×k is the matrix consisted of k eigenvectors generated by the eigen-solver, let z i be the vector of the i-th row of U. According to [28], when if k = m, there will be , v i and v j belong to the same cluster, 0, v i and v j belong to different clusters.
Else, if k<m or k>m, cos(z i , z j ) will either be deviated from 1 when v i and v j belong to the same cluster or deviated from 0 when v i and v j belong to different clusters. Utilizing this, the intrinsic cluster number m can be estimated by k which makes cos(z i , z j ) maximally concentrating on 1 or 0 under the corresponding condition. In actual calculation, we can let k increases from 2 to n, and generate U ∈ R n×k for each k. let z i be the vector of the i-th row of U and compute matrix C with its entity c ij = cos(z i , z j ). Then, we binarize C to B = b ij with a small positive number ε (e.g., ε = 0.05): Let s be the sum of all the elements of matrix B The cluster number estimation can be obtained by sequentially checking the first minimal k that satisfies s k ≤ s k−1 and s k ≤ s k+1 .

Resting-state network detection
We use a two-step clustering procedure to detect RSN from the fNIRS data. Generally, first, we conduct an individual-level normalized cut clustering for every participant to partition the fNIRS channels into clusters according to the RSFC among channels. The cluster number parameter is determined automatically by using the above introduced automatic estimation algorithm. Second, we further conduct a group-level clustering to partition all the participants' individual-level cluster maps into different group-level clusters. Those group-level clusters with high group consistency are determined as RSN clusters. The flow chart of our approach is illustrated in Fig. 1. Specifically, in the individual-level, for the i-th participant of total M participants, we first calculate the Pearson's correlation coefficient between every pair of channels to construct the RSFC matrix. Then we use the normalized cut to partition all the channels into K i clusters where the cluster number K i is automatically estimated and participant-specific. Each cluster α i k (i = 1, 2, · · · , M; k = 1, 2, · · · , K i ) can be represented as a cluster map In the group-level, we first determine the group-level cluster number K group . K group is calculated as the mode of {K i } (i = 1, 2, · · · , M) (i.e., the individual-level cluster number that appears most often for all the participants). Secondly, we pool the individual-level cluster maps of every participant to get {map i k }, and then partition {map i k } into K group clusters {ξ i } (i = 1, 2, · · · , K group ) by using the normalized cut according to the between-map similarity (i.e., the Pearson's correlation coefficient between each pair of two maps in {map i k }). In this way, a resultant group-level cluster ξ i contains similar individual maps from different participants (if two or more individual maps in ξ i come from a same participant, we view them as sub-partitions of a same component and merge them to form one map). The group-level cluster in which the individual map patterns are highly consistent across participants is identified as an RSN component. To this end, we calculate the group consistency map of every ξ i by averaging all the individual maps in it Then we test the value of the group consistency map of ξ i channel-by-channel against a Binomial distribution B(M, 0.5) which corresponds to the null-situation of randomly assigning a channel of every participant to be in or out of a cluster. The obtained p-value is corrected for multiple comparison of n channels. Those significant channels in a same group-level cluster ξ i constitute an RSN.

Simulation experiment
In the simulation experiment, we simulated a 196-channel fNIRS measurement (14 rows by 14 columns) in which different number (from one to four) of RSN clusters were added. Each RSN cluster size was 5×5. The time course g(t) in each RSN channel was the linear superposition of a synthetic resting-state neural activity signal h(t) and a white Gaussian noise n(t) [29]. The time course in non-RSN channel was a pure white Gaussian noise. The synthetic resting-state neural activity signal was a combination of a series of random phase sinusoidal functions with frequency range from 0.01 Hz to 0.1 Hz, stepped by 0.001 Hz, and their amplitudes were subject to 1/f distribution [30]. The resting-state signals were identical within a same RSN cluster but different across clusters. The signal-to-noise ratio (SNR, defined as |h(t)| 2 |n(t)| 2 ) was set to 0.05 [31]. The length of the time course was 300 s, with a sampling rate of 10 Hz. For each condition, the data was generated for 100 times to simulate individual difference. We adopted the average goodness-of-fit index defined as GOF = to measure the performance, where n is the number of RSNs, C i is the set of channels in the i-th RSN andĈ i is the set of significant channels in the corresponding RSN found by our method. The operator | · | counts the element number of a set. The GOF index is in the range of [0, 1]. If the detected RSNs exactly match the real ones, GOF is 1. While if the method does not detect any real RSNs, the GOF is 0.
Moreover, we also tested the influence of different signal-to-noise ratio (SNR), different RSN volume, and the unbalanced RSN volume ratio on the performance of the method (for details, see Appendix).

In vivo experiment
We also used real resting-state fNIRS data to validate our method. The data contained forty healthy right-handed college student participants (21.7 ± 2.5 years of age, 22 males and 18 female). The participants underwent a 10-minute resting-state scan in which they were instructed to keep still with their eyes closed and relax their mind. This resting-state data was used to extract the RSN. In addition, the participants also performed a 5.6-minute sequential bilateral finger-tapping task scan containing seven task blocks with a pseudo-randomized block length of 20-30 s. This task scan data was only used for functionally localizing the seed channels for the seed-correlation analysis. Informed consent was obtained from all the participants. The study protocol was approved by the Institutional Review Board at Shenzhen Key Laboratory of Affective and Social Neuroscience, Shenzhen University. The data was from a previously published study for other purpose [32].
The fNIRS measurement was conducted by using a NIRScout continuous wave fNIRS system (NIRx Medical Technologies, USA). The absorptions of the near-infrared lights at two wavelengths (785 nm and 830 nm) were measured with a sampling rate of 7.8125 Hz. Two 4 ×4 probe sets were used in this study with each probe set consisting of eight laser sources and eight detectors, forming 24 measurement channels (48 channels in total). The source-detector distance was 30 mm. The two probe sets were respectively placed on the head with their center at C3 and C4 of the international 10-20 system to cover the bilateral sensorimotor areas. The cortex localization of the channels was obtained by using a 3-dimentional digitizer and the NIRS-SPM software [33,34] to form a sensorimotor template. The sensorimotor area in the template was defined as the channels localized within the bilateral precentral and postcentral gyri [21]. The oxygenated (HbO) and the deoxygenated (HbR) signals were calculated with the modified Beer-Lambert law [35], with differential pathlength factor (DPF) of 7.25 and 6.38 for 785 nm and 830 nm, respectively [36].
In the present study, we used the proposed spectral clustering-based RSN detection method to detect the sensorimotor network which was well reported in the previous resting-state fNIRS studies [9,21,23]. First, a 0.01-0.08 Hz band-pass filter was applied to the resting-state HbO and HbR data, respectively, to extract the spontaneous neural activity [30,37]. Then the individual-level partition maps were calculated based on the functional connectivity graph, and were further clustered to form the group consistency maps. We also used the traditional seed-based correlation method and compared the results derived from these two methods. The seed channels were determined according to the task activation results of the fingger-tapping task data (the sensorimotor area, left channel 5 and 9). The average time course of the two seed channels was used as the seed time course to calculate the Pearson's correlation coefficients with the time course of each channel. The Fisher's z-transform was applied to the Pearson's correlation coefficients to increase the Gaussianility [38]. Then the group-level RSFC t-map was derived by using the one sample t-test [21]. We compared the performance of the two methods in detecting the sensorimotor RSN by using the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) [9]. The sensorimotor template derived from the channel localization was used as the ground truth to draw the ROC curve [32]. Figure 2 illustrated the results of the simulated data containing different RSN number. The results suggested that the method could perfectly find all the RSN clusters (GOF = 1 in all conditions. p < 0.05, Bonferroni-corrected). When the ground truth RSN number was less than or equal to three, the method partitioned the data into four clusters, including both the RSN components and the noise components. When the ground truth RSN number was four, the method found four RSN components and one noise component. Figure 3 illustrated the in vivo experiment results derived from the HbO data. The individual-level cluster numbers were in the range from two to five (Fig. 3(C)). The mode was four, and was used in the group-level analysis. Figure 3(A) illustrated the group-level consistency maps of the four group-level components. One out of the four component (ξ 1 ) had channels that passed the significance test (Channel 5,8,9,12,13,16,17,20 in the left hemisphere and Channel 30,33,36,37,40,43 in the right hemisphere. p < 0.05, Bonferroni-corrected). Its group-level consistency map was highly specific to the predefined sensorimotor template (Fig. 3(B)), with a GOF index of 0.63. While the other three components (ξ 2 , ξ 3 and ξ 4 ) contained no significant channels. We further analyzed the consistency between the individual-level cluster maps in ξ 1 (see Fig. 3(D) for an example) and the sensorimotor template by calculating the GOF index. Figure 3(E) illustrated the distribution of the GOF indices for all the participants. The mean GOF was 0.34 across participant, with the standard deviation of 0.08. Similar but weaker results were found on the HbR data (see Appendix). Figure 4(A) illustrated the comparison between the RSN maps derived from the spectral clustering method and the seed-based correlation method, respectively. Compared with the result derived from the seed-based correlation method which is obviously lateralized to the seed hemisphere (the second row), the result derived from the spectral clustering-based method (the first row) showed more symmetric pattern of the bilateral sensorimotor network. Figure 4(B) showed the ROC curve analysis result. The spectral clustering-based method showed higher AUC index (AUC = 0.87, the red line) than the seed-correlation method (AUC = 0.73, the blue line).

Discussion and conclusion
The present study proposed a spectral clustering-based fNIRS-RSN detection approach which relies little prior information and human intervention, and conducted both simulation and in vivo fNIRS experiments for validation. On the one hand, our method showed good performance on the simulated data with different RSN number (Fig. 2). On the other hand, the RSN detected from the real fNIRS data showed good consistency with the sensorimotor template (Fig. 3). The ROC curve showed that our method has relatively high sensitivity and specificity (AUC = 0.87, Fig. 4). Taken together, these results suggested that the proposed method is effective for fNIRS-RSN detection.
Our approach has several advantages. First, this method is fully data-driven. It can be used when the prior information for seed selection is unavailable or the localizer task is hard to be performed, for example, when studying the RSN development of infants and young children [39]. Second, this method is simple to use. It can automatically calculate the cluster number and determine the significant RSN map without any human intervention, therefore does not require rich experience from the experimenter. Third, the result derived from our method would not be biased by the seed selection or the manual component selection procedure, therefore can provide more objective results.
Technically, the present study is based on the spectral clustering algorithm. It is theoretically more robust to the data distribution than the traditional clustering method such as k-means clustering [28] which was used in fNIRS studies to detect task activation [40] and RSFC [21]. Moreover, several previous fMRI studies also adopted the spectral clustering method for brain's subunit parcellation [29,41] and RSN detection [25]. Compared with these studies, our approach has several features. First, we adopt an automatic estimation algorithm to determine the cluster number parameter of the spectral clustering. Second, we use two-stage clustering to calculate group-level clusters as RSN candidates. Third, we use a significance test-based method to automatically determine RSN component from the group-level clusters. These features may be also helpful to improve the methods used in fMRI.
When using this approach, two issues should be noted. First, compared with ICA which can separate all kinds of noise components, our approach is more sensitive to the global physiological noise such as the superficial tissue physiological noise and the systemic physiological fluctuation. These global noise components may cause fake high correlations in the RSFC matrix and influence the spectral clustering results. Therefore, we suggest that the data should be preprocessed with global physiological noise removal methods (e.g., [32]). Second, the RSN component determination in our method is essentially based on looking for public connectivity clusters for all the participants. However, it requires careful identification to discriminate whether these public connectivity clusters represent real, meaningful RSNs. The results should be interpreted with caution, especially when prior information (e.g., anatomy template or functional activation result) is absent.
The present study also has some limitations and can be further improved in the future. For example, in our validation experiments, we did not account for the autocorrelation in the fNIRS time-series and the systemic physiological noise, both of which may affect the correlation matrix, and therefore affect the results. The measurement area of the in vivo fNIRS data used in the present study was small, covering only one RSN (the sensorimotor network). The performance of our method on larger measurement area such as whole-head needs further validation. Moreover, the current method can only calculate functional connectivity networks. In the future, we can extend it to calculate effective connectivity networks. In addition, other spectral clustering algorithms (e.g., average cut and modularity detection [29]) can be tested to optimize the performance of the method. It is also necessary to develop stricter statistical test method for group-level cluster significance determination.
In summary, the present study proposed a full-automatic, data-driven and model-free approach for fNIRS-RSN detection. both simulation experiment and in vivo experiment results validated its effectiveness and robustness. The new approach could be complementary to the seed-based correlation and the ICA methods, providing new tools for resting-state fNIRS studies. The goodness-of-fit indices derived from different SNR data. When SNR was below 0.02, the method cannot detect any RSN. When SNR was between 0.02 and 0.04, the goodness-of-fit index increased rapidly. When SNR was larger than or equal to 0.04, the method can detect all the RSNs perfectly.     9. comparison between the spectral clustering method and the seed-based correlation method for HbR data. (A) The group-level RSN maps derived from the spectral clusteringbased method (the first row) and the seed-based correlation method (the second row), respectively. The third row shows the sensorimotor template. (B) The receiver operating characteristic (ROC) curve. The red line corresponds to the ROC curve of the spectral clustering-based method, and the blue line corresponds to the ROC curve of the seed-based correlation method.