Modularization Analysis of Brain Functional Network Using Fuzzy C-means Algorithm and Correlation in Resting State

Introduction
The human brain is one of the most complex systems in nature: neurons are connected by synapses to form a complex structural brain network. Spontaneous neuronal activity, as well as the excitation or inhibition evoked by external stimuli, is transmitted through synapses to other neurons, so that different parts of the brain can coordinate with one another during physiological activities such as language, perception and emotion. To study the functional structure of the resting-state brain more clearly, researchers have constructed brain functional networks using electroencephalography (EEG), magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI) and other techniques, and have investigated the working mechanisms of the brain with complex-network analysis methods. Many studies have shown that brain networks have a modular structure [1][2][3]. It has been found that the human cortical thickness network has an organizational pattern corresponding to brain functional modules (such as vision and language). It has also been found that the network can be divided into functional sub-modules such as the visual, auditory and default networks [4]. As research has deepened, modular structure has become a focus of studies of the local functional organization of brain networks.
Many algorithms exist for dividing brain networks into modules, such as the Kernighan-Lin algorithm, the Girvan-Newman (GN) algorithm and clustering [5,6]. For example, brain components were partitioned with a greedy algorithm, revealing that a visual module exists in the resting state [7]. Based on the Newman algorithm, researchers performed a modular analysis of the resting-state brain functional networks of young and old people [8]. Among these methods, clustering is an unsupervised machine-learning method that partitions a collection of multivariate data points into meaningful clusters. Members of the same cluster share similar characteristics, while data points in different clusters are dissimilar [9]. The similarity between data points is generally measured by a distance [10][11][12]. Data points belong to the same group if they are close to each other, and evidently to different groups if the distance between them is large [13,14]. Fuzzy clustering uses a membership function to determine the degree to which each data point belongs to each group. It is therefore a soft partition, unlike hard clustering methods, which assign each data point to exactly one cluster.
A brain functional network gives an intuitive description of the dynamic activity between neurons or brain regions, with the connections between nodes representing the dynamic coordination between neural signals. In this paper, the fuzzy c-means (FCM) clustering algorithm is used to partition the brain functional networks of patients with Parkinson's disease and of normal subjects. After partitioning, the modules most closely related to the disease are selected for study. Undirected networks are then constructed within each module from correlations, and the functional structure of the nodes and networks is analysed. Furthermore, differences in the activation of specific brain regions are studied through the statistical results of the amplitude of low-frequency fluctuation (ALFF).

Materials and Methods
The brain fMRI data of the patients and the normal subjects were collected with a Siemens 3.0 T MAGNETOM Trio Tim scanner. The scanning parameters of …

The main idea of k-means is to minimize an objective function, normally chosen as the total distance between all patterns and their respective cluster centres. The solution relies on an iterative scheme that starts from arbitrarily chosen initial cluster memberships or centres. The two main steps of the k-means algorithm are the assignment of objects to clusters and the updating of the cluster centres [15]. The algorithm alternates between these two steps until the value of the objective function can no longer be reduced.
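The two alternating steps of k-means can be illustrated with a short Python sketch of the standard Lloyd iteration. This is not the authors' code. The function name `kmeans`, the initialization from randomly sampled data points and the toy points are illustrative assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: alternate nearest-centre assignment and centre update."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random as the initial centres.
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step 1: assign every point to its nearest centre (Euclidean distance).
        new_labels = [min(range(k), key=lambda i: math.dist(p, centres[i])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the objective cannot decrease further
        labels = new_labels
        # Step 2: move each centre to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centre if a cluster becomes empty
                centres[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Two well-separated groups of 2-D points (toy data).
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centres = kmeans(pts, k=2)
print(labels)
```

The loop stops when the assignments no longer change, which is the point at which the objective function cannot be reduced any further.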
FCM clustering is a soft version of k-means in which each data point has a fuzzy degree of membership in every cluster [16]. Given a set of measured vectors $x_j$ ($j = 1, 2, \ldots, n$), the FCM algorithm partitions them into $c$ clusters with prototypes (cluster centres) $G_i$ ($i = 1, 2, \ldots, c$). It does so by minimizing an objective function $J(U, G)$ with respect to the fuzzy partition matrix $U$ and the set of prototypes $G$.
In $J(U, G)$, $d(x_j, G_i)$ denotes a general distance function between the data vector $x_j$ and the cluster centre $G_i$. Because the partition is fuzzy, each element of $U = [u_{ij}]_{c \times n}$ may take any value from 0 to 1. Under the normalization rule, however, the memberships of each data point over all clusters must sum to one:

$$\sum_{i=1}^{c} u_{ij} = 1, \qquad j = 1, 2, \ldots, n \tag{4}$$

The fuzzy objective function is

$$J(U, G) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m}\, d^{2}(x_j, G_i) \tag{5}$$

where $m$ ($1 < m < \infty$) controls the fuzziness. As $m \to 1$ the partition approaches a hard partition, and as $m \to \infty$ it becomes completely fuzzy. Unless there is a special requirement, $m = 2$. To find the necessary conditions for Eq. (5) to reach its minimum under the constraints of Eq. (4), a new objective function is constructed:

$$\bar{J}(U, G, \lambda) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m}\, d^{2}(x_j, G_i) + \sum_{j=1}^{n} \lambda_j \left( \sum_{i=1}^{c} u_{ij} - 1 \right) \tag{6}$$

where $\lambda_j$ ($j = 1, 2, \ldots, n$) is the Lagrange multiplier of the $j$-th of the $n$ constraints in Eq. (4). Take $d$ as the Euclidean distance and set the derivatives of Eq. (6) with respect to all input quantities to zero. This gives the conditions for the minimum of Eq. (5):

$$u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_j - G_i \rVert}{\lVert x_j - G_k \rVert} \right)^{2/(m-1)} \right]^{-1}, \qquad G_i = \frac{\sum_{j=1}^{n} u_{ij}^{m}\, x_j}{\sum_{j=1}^{n} u_{ij}^{m}}$$

where $\lVert x_j - G_k \rVert$ is the Euclidean distance between the $j$-th vector and $G_k$, the centre of cluster $k$.

According to the above definitions, the FCM algorithm can be briefly described as follows:
(1) Set the iteration counter $r = 0$ and randomly select a set of $c$ initial centres $G$.
(2) Compute the memberships $u_{ij}$ from the current centres.
(3) Update the centres $G_i$ from the memberships.
(4) Increase $r$ by one and repeat steps (2) and (3) until the memberships no longer change appreciably.

The brain fMRI images are preprocessed in MATLAB with SPM8 (Statistical Parametric Mapping) and REST (Resting-State fMRI Data Analysis Toolkit). Preprocessing includes slice timing correction, realignment, spatial normalization, smoothing and band-pass filtering (0.01-0.08 Hz). The brain is then partitioned into 90 regions according to the AAL (Anatomical Automatic Labelling) template, 45 in each hemisphere, and the template is matched to the preprocessed fMRI images. Finally, the time series of the brain regions are extracted from the fMRI data with DPARSF (Data Processing Assistant for Resting-State fMRI). To build the $N \times N$ ($N = 90$) connectivity matrix $C$ (Figure 1a), the correlation coefficient between the time series of every pair of brain regions is calculated. The correlation coefficient $r$ is defined as

$$r = \frac{\sum_{i} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i} (X_i - \bar{X})^{2}}\, \sqrt{\sum_{i} (Y_i - \bar{Y})^{2}}}$$

where $X_i$ and $Y_i$ are the time series of nodes $X$ and $Y$, and $\bar{X}$ and $\bar{Y}$ are their respective means.

A threshold $\tau$ is selected to binarize the correlation matrix $C$, giving a binary adjacency matrix $A$ (Figure 1b). Elements below the threshold are set to zero and the surviving elements are set to one:

$$a_{ij} = \begin{cases} 1, & C_{ij} \ge \tau \\ 0, & C_{ij} < \tau \end{cases}$$

where $\tau$ is the threshold. Based on the matrix $C$, the brain functional network is partitioned into modules using fuzzy c-means, and undirected networks are then constructed within each module (Figure 1a and 1b).
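The construction of the connectivity matrix C and its binarization at a threshold τ can be sketched in pure Python as below. The function names, the three toy time series and the convention of leaving the diagonal of A at zero (no self-loops) are assumptions for illustration. In the study, the inputs would be the 90 regional time series extracted with DPARSF.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(series):
    """N x N matrix C with C[i][j] = r(series[i], series[j])."""
    n = len(series)
    return [[1.0 if i == j else pearson(series[i], series[j]) for j in range(n)]
            for i in range(n)]

def binarize(C, tau):
    """Adjacency matrix A: a_ij = 1 if C_ij >= tau, else 0 (diagonal kept at 0)."""
    n = len(C)
    return [[1 if i != j and C[i][j] >= tau else 0 for j in range(n)] for i in range(n)]

# Three hypothetical regional time series (not data from the study).
ts = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.1, 6.0, 8.2, 9.9],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
C = correlation_matrix(ts)
A = binarize(C, tau=0.3)
print(A)
```

Regions 0 and 1 are strongly positively correlated and stay connected. Region 2 is negatively correlated with both and becomes isolated at τ = 0.3.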
Centrality is used to determine the role of each node, and the node with the largest centrality is the hub of the module. Degree centrality measures how central a node is by its degree. Betweenness centrality measures how central a node is in the module in terms of the information flow passing through it. Suppose the module network has $n$ nodes and node $v_i$ has degree $w_i$. In an undirected network the out-degree and in-degree of $v_i$ are both equal to $w_i$, so the degree centrality of $v_i$ is determined by $w_i$. The betweenness centrality of node $v_i$ is defined as

$$C_B(v_i) = \sum_{j \ne i \ne k} \frac{\sigma_{jk}(i)}{\sigma_{jk}}$$

where $\sigma_{jk}$ is the number of shortest paths from node $v_j$ to $v_k$, and $\sigma_{jk}(i)$ is the number of those shortest paths that pass through $v_i$. The greater a node's centrality, the stronger its functional connections, and such a node acts as a hub of the module.
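Degree and betweenness centrality of a module network can be sketched as follows, assuming an unweighted, undirected module given as a 0/1 adjacency matrix. Betweenness is computed with Brandes' algorithm. It sums $\sigma_{jk}(i)/\sigma_{jk}$ over ordered pairs $(j, k)$ with $j \ne i \ne k$; for an undirected network, halve the result to count each unordered pair once. The function names and the star-graph example are hypothetical.

```python
from collections import deque

def degree_centrality(adj):
    """Node degree w_i of every node in an undirected 0/1 adjacency matrix."""
    return [sum(row) for row in adj]

def betweenness_centrality(adj):
    """Brandes' algorithm: sum of sigma_jk(i) / sigma_jk over ordered pairs (j, k)."""
    n = len(adj)
    nbrs = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    cb = [0.0] * n
    for s in range(n):
        # Breadth-first search from s counts shortest paths (sigma) and predecessors.
        stack, pred = [], [[] for _ in range(n)]
        sigma, dist = [0] * n, [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in nbrs[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = [0.0] * n
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb

# Star graph: node 0 (the hub) is connected to nodes 1, 2 and 3.
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(degree_centrality(star), betweenness_centrality(star))
```

In the star, every shortest path between two leaves passes through node 0, so it has both the largest degree and the largest betweenness. By either measure, it is the hub.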
The shortest path length can be used to analyse the efficiency of information transmission in a network. It describes the internal structure of the network and plays an important role in information transmission. The shortest path length $l_{ij}$ from node $v_i$ to node $v_j$ is the minimum number of edges that must be traversed from $v_i$ to $v_j$, and its reciprocal $1/l_{ij}$ is the efficiency from $v_i$ to $v_j$. The efficiency of a module $G_c$ with $n$ nodes is defined as

$$E(G_c) = \frac{1}{n(n-1)} \sum_{i \ne j} \frac{1}{l_{ij}}$$

The shorter the shortest path length, the faster information is transferred and the higher the efficiency of the network. The clustering coefficient helps to characterise the local structure of a module. In general, a network with a high clustering coefficient and a short shortest path length shows a small-world effect. In module $G$, if node $v_i$ is connected to $k_i$ nodes, the maximum number of edges among those $k_i$ nodes is $k_i(k_i - 1)/2$, denoted $n_i$. If $e_i$ edges actually exist among them, the clustering coefficient of node $v_i$ is

$$C_i = \frac{e_i}{n_i} = \frac{2 e_i}{k_i (k_i - 1)}$$

where $k_i$ is the degree of $v_i$. The average clustering coefficient of the module is

$$C = \frac{1}{n} \sum_{i=1}^{n} C_i$$
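The module-level measures above (mean shortest path length, efficiency and clustering coefficient) can be computed with breadth-first search, as in the sketch below. It assumes an unweighted, undirected module with a 0/1 adjacency matrix. Two conventions are chosen here rather than taken from the text: the mean path length of a disconnected module is returned as None, and $C_i = 0$ for nodes with fewer than two neighbours. The function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, s):
    """Hop counts from node s to every node; None marks unreachable nodes."""
    n = len(adj)
    dist = [None] * n
    dist[s] = 0
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in range(n):
            if adj[v][w] and dist[w] is None:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def path_length_and_efficiency(adj):
    """Mean shortest path length L (None if disconnected) and efficiency E."""
    n = len(adj)
    lengths, inv_sum = [], 0.0
    for i in range(n):
        for j, d in enumerate(bfs_distances(adj, i)):
            if j != i and d is not None:
                lengths.append(d)
                inv_sum += 1.0 / d  # unreachable pairs contribute 0 to E
    connected = len(lengths) == n * (n - 1)
    L = sum(lengths) / len(lengths) if connected else None
    return L, inv_sum / (n * (n - 1))

def clustering(adj):
    """Per-node C_i = e_i / n_i with n_i = k_i(k_i - 1)/2, plus the module average C."""
    n = len(adj)
    cs = []
    for i in range(n):
        nb = [j for j in range(n) if adj[i][j]]
        k = len(nb)
        # e_i: number of edges that actually exist among the k_i neighbours of node i
        e = sum(adj[a][b] for x, a in enumerate(nb) for b in nb[x + 1:])
        cs.append(e / (k * (k - 1) / 2) if k >= 2 else 0.0)
    return cs, sum(cs) / n

# Toy module: a triangle (nodes 0, 1, 2) with node 3 attached to node 2.
g = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
print(path_length_and_efficiency(g), clustering(g))
```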

Results
Each of the N = 90 rows of the matrix C (equivalently each column, since C is symmetric) is treated as a vector x_i (i = 1, 2, ..., 90). These vectors are clustered into c groups using the Euclidean distance as the dissimilarity measure. In general, c is much smaller than the number of samples but greater than one. Because FCM starts from a random initialization, the partition was repeated several times. The brain functional network was divided into eight relatively stable modules, as shown in Table 1.
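Clustering the rows of C can be sketched with a standard fuzzy c-means iteration that alternates membership and centre updates, with m = 2. Because the result depends on the random start, the sketch repeats the run with several seeds and keeps the partition with the lowest objective, mirroring the repeated partitions described above. The number of runs, the tolerance, the function names and the 5 × 5 toy matrix are assumptions, not the study's settings.

```python
import math
import random

def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0, eps=1e-12):
    """One FCM run; returns memberships U (c x n), centres G (c x dim) and objective J."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    G = [list(x) for x in rng.sample(X, c)]  # c randomly chosen samples as initial centres
    U = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d(x_j, G_i) / d(x_j, G_k))^(2/(m-1)).
        D = [[max(math.dist(x, g), eps) for x in X] for g in G]
        new_U = [[1.0 / sum((D[i][j] / D[k][j]) ** (2.0 / (m - 1.0)) for k in range(c))
                  for j in range(n)] for i in range(c)]
        # Centre update: G_i = sum_j u_ij^m x_j / sum_j u_ij^m.
        for i in range(c):
            w = [u ** m for u in new_U[i]]
            total = sum(w)
            G[i] = [sum(wj * x[d] for wj, x in zip(w, X)) / total for d in range(dim)]
        change = max(abs(a - b) for old, new in zip(U, new_U) for a, b in zip(old, new))
        U = new_U
        if change < tol:  # stop once the memberships no longer change appreciably
            break
    J = sum(U[i][j] ** m * math.dist(X[j], G[i]) ** 2 for i in range(c) for j in range(n))
    return U, G, J

def best_of_runs(X, c, runs=10, m=2.0):
    """Repeat FCM from several random starts and keep the lowest-objective result."""
    return min((fcm(X, c, m=m, seed=s) for s in range(runs)), key=lambda res: res[2])

def hard_labels(U):
    """Label each sample with the cluster in which its membership is largest."""
    return [max(range(len(U)), key=lambda i: U[i][j]) for j in range(len(U[0]))]

# Hypothetical 5 x 5 correlation matrix with two blocks; each row is one sample x_i.
C = [
    [1.00, 0.80, 0.70, 0.10, 0.00],
    [0.80, 1.00, 0.75, 0.05, 0.10],
    [0.70, 0.75, 1.00, 0.00, 0.10],
    [0.10, 0.05, 0.00, 1.00, 0.85],
    [0.00, 0.10, 0.10, 0.85, 1.00],
]
U, G, J = best_of_runs(C, c=2)
print(hard_labels(U))
```

Taking the largest membership of each row turns the soft partition into module labels. The memberships themselves remain available to show how strongly a region belongs to each module.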
The brain functional network of the normal group was divided into the same eight modules as that of the patients, with the same brain regions in each module. Brain regions strongly associated with Parkinson's disease, such as the precentral gyrus (PreCG), caudate nucleus (CAU) and lenticular nucleus, putamen (PUT), are mostly located in modules 3, 5, 6 and 7 [17,18]. Next, the Pearson correlation coefficient between any two brain regions is calculated. Because a global threshold removes some of the connections, a local threshold is chosen to binarize the correlation coefficient matrix of each module. The threshold is increased from 0 in steps of 0.05, subject to the requirement that the network remain connected, with no isolated nodes. Taking each module as a whole, its shortest path length is analysed to understand the speed of information transfer within the module, as shown in Figure 2.
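The local threshold search can be sketched as a sweep from 0 in steps of 0.05 that keeps the largest threshold at which the binarized module is still connected. The helper names and the 4 × 4 toy matrix are hypothetical. The sketch only evaluates thresholds on the 0.05 grid.

```python
from collections import deque

def is_connected(adj):
    """True if every node can be reached from node 0 (breadth-first search)."""
    n = len(adj)
    seen, queue = {0}, deque([0])
    while queue:
        v = queue.popleft()
        for w in range(n):
            if adj[v][w] and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n

def binarize(C, tau):
    """Keep edges with C_ij >= tau; the diagonal is left at zero."""
    n = len(C)
    return [[1 if i != j and C[i][j] >= tau else 0 for j in range(n)] for i in range(n)]

def critical_threshold(C, step=0.05):
    """Largest tau on the grid 0, step, 2*step, ... (up to 1) that keeps the module connected."""
    best, k = None, 0
    while k * step <= 1.0 + 1e-9:
        tau = round(k * step, 10)  # rounding removes floating-point drift in k * step
        if not is_connected(binarize(C, tau)):
            break
        best, k = tau, k + 1
    return best

# Hypothetical 4-region module correlation matrix.
M = [[1.00, 0.60, 0.22, 0.12],
     [0.60, 1.00, 0.37, 0.07],
     [0.22, 0.37, 1.00, 0.43],
     [0.12, 0.07, 0.43, 1.00]]
print(critical_threshold(M))
```

Here the chain 0-1-2-3 survives up to τ = 0.35. At 0.40 the edge between regions 1 and 2 (0.37) is removed and the module splits in two.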
As seen in Figure 2, the local thresholds of modules 3, 5, 6 and 7 have critical values of 0.3, 0.15, 0.16 and 0.25, respectively. Above these values the networks become disconnected, and the shortest path length no longer exists in the four normal modules. To keep the networks connected, these critical values are used to binarize the local correlation coefficient matrices. The resulting network structures are shown in Figures 3 and 4.
The shortest path length of the patients' networks is smaller than that of the normal subjects, which indicates that information is transmitted faster in these modules in patients. Analysing the shortest path length of each module separately for patients and normal subjects reveals differences in information transmission rate at a finer scale. For the normal group, the shortest path length of module 3 is 2.5333, and those of modules 6, 5 and 7 are 2.1795, 1.8485 and 1.8095, respectively. Information transmission is therefore slowest in module 3 and faster, to varying degrees, in the other modules. In patient module 7 the shortest path length is 1, the smallest value, so information transmission in this module is the fastest. The shortest path lengths in patient modules 3, 5 and 7 are 1.3778, 1.5758 and 1.5897, respectively, indicating relatively slower transmission in these modules.
When the threshold exceeds the critical value, the shortest path length still exists in the patients' modules. This shows that the correlation between nodes within the same module is generally lower in patients than in normal subjects. The study also found that the critical values of modules 5 and 6 are smaller than those of modules 3 and 7. This indicates that the overall correlation within modules 5 and 6 is lower than within the other two modules.
The clustering coefficient of a brain functional network is the average of the clustering coefficients of all its nodes. It can be regarded as the degree of grouping of a module, as shown in Figure 5.
As shown in Figure 5, the clustering coefficients of the normal modules are 0.5333, 0.5417, 0.5769 and 0.6489, respectively, whereas those of the patient modules are 1, 0.6195, 0.7366 and 1. The degree of grouping is therefore higher in patients, indicating a higher connection density in the patients' networks. In modules 3 and 5, the grouping level of the patients is always greater than that of the normal subjects. When the network is disconnected, the clustering coefficient of patient module 6 is lower than that of the normal module within a certain threshold range. This indicates that the clustering coefficients of some nodes in this module are increased in normal subjects. In module 7, the clustering coefficient declines with the same trend in patients and normal subjects, showing no significant difference between the two groups. The degree of grouping in patient modules 3 and 7 is therefore the largest and identical, while the other modules show relatively smaller degrees of grouping. When these modules are connected, the patients' clustering coefficients are higher and their shortest path lengths shorter than those of the normal subjects, indicating a more pronounced small-world effect.
Degree centrality helps to determine the functional connection strength and status of the nodes in a network. The greater a node's degree centrality, the stronger its influence in the network, and the node with the greatest degree centrality is the hub of the network, as shown in Figure 6.
Figure 6 shows that all nodes in patient module 3 have the same degree, meaning that their functional connection strengths are consistent. In normal module 3, nodes SFGdor.L and SFGdor.R have the largest and equal node degrees. Their functional connection strength is therefore relatively strong, and they are the hubs of the network. In patient module 5, node IPL.L has the largest node degree and hence the greatest influence, making it the hub. In normal module 5, nodes IPL.R, ANG.L and ANG.R have the same node degree, so their influence is equal. In module 6, nodes SMG.L and CAU.L have the largest node degrees in the normal and patient groups, respectively, indicating that they are the hubs and have stronger functional connections. In normal module 7, nodes INS.R, HIP.R, AMYG.R, PUT.L and PAL.R share the maximum node degree, so their functional connection strengths are equal. In patient module 7, all nodes have the same degree.
The node degrees of some regions also change considerably, including PoCG.R in module 3, IPL.L and PCUN.R in module 5, CAU.L in module 6, and ORBinf.R and MTG.L in module 7. The degrees of these nodes increase, indicating enhanced functional connection strength and an important role in the transition from the normal to the diseased state. In addition, the node degrees in patient modules 3 and 7 are greater than in the normal group, meaning that the functional connection strength in these two modules is greater in patients.

Discussion
The activation level of brain regions is usually studied with the ALFF method, in which the ALFF value reflects the strength of the blood-oxygen-level-dependent (BOLD) signal. An increase in the ALFF of a brain region indicates increased neuronal activity and a larger energy distribution. Conversely, a decrease in low-frequency amplitude indicates reduced neuronal activity and a smaller energy distribution [19,20]. A one-sample t test is performed on the ALFF values of the patient group and of the control group. This yields a statistic (t value) for the deviation of each region's activation level from the whole-brain average. The regions that differ most from the whole brain are shown in Figure 7.
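A common way to compute ALFF is to take the amplitude spectrum (square root of power) of a regional time series and average it over the low-frequency band. The sketch below assumes the 0.01-0.08 Hz band used in preprocessing, a plain DFT with 1/n amplitude scaling, and a repetition time (TR) of 2 s, since the scanning parameters are not given here. It also includes a one-sample t statistic against a reference mean (for example, the whole-brain average). All names and the synthetic signals are illustrative.

```python
import cmath
import math

def alff(ts, tr, f_lo=0.01, f_hi=0.08):
    """Mean DFT amplitude of a time series over the [f_lo, f_hi] Hz band."""
    n = len(ts)
    mean = sum(ts) / n
    x = [v - mean for v in ts]  # remove the DC (mean) component
    amps = []
    for k in range(1, n // 2 + 1):
        freq = k / (n * tr)  # frequency of DFT bin k in Hz
        if f_lo <= freq <= f_hi:
            coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            amps.append(abs(coef) / n)  # amplitude = sqrt(power), scaled by 1/n
    return sum(amps) / len(amps) if amps else 0.0

def one_sample_t(values, mu0):
    """One-sample t statistic of `values` against a reference mean mu0."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical example: 100 volumes at an assumed TR of 2 s, pure 0.05 Hz oscillations.
TR, N = 2.0, 100
weak = [1.0 * math.sin(2 * math.pi * 0.05 * t * TR) for t in range(N)]
strong = [2.0 * math.sin(2 * math.pi * 0.05 * t * TR) for t in range(N)]
print(alff(weak, TR), alff(strong, TR))
```

Doubling the amplitude of the low-frequency oscillation doubles the ALFF value. This reflects the relation between ALFF and the strength of the BOLD fluctuation described above.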
In Figure 7, red areas indicate brain regions whose activation level is higher than the whole-brain average in the resting state. The distribution of the red regions shows that normal subjects have markedly more activated regions than patients. This indicates that the activity of some regions changes considerably in the transition from the normal to the diseased state. The statistical results show that the activation level of some regions, such as PCG.R, decreases significantly [21][22][23][24][25]. This indicates abnormal functional connection in these regions and a markedly different range of energy distribution. Similarly, a two-sample paired t test (P < 0.05) is performed on the ALFF values of the patient group and the normal control group. The nodes with large changes in node degree and the brain regions strongly associated with the disease are grouped into one module. The statistical results are given in Table 2, and the network structures at different thresholds are shown in Figure 8.

A positive statistical value indicates that the activation level is higher in patients than in normal subjects, and a negative value indicates that it is lower. As Table 2 shows, the t values of nodes ORBinf.L, CAU.L, MTG.L, PUT, PAL and PCG.R are negative. These regions are therefore less activated in patients than in the normal group, and their functional connection strength is relatively weak in patients. The t values of nodes PoCG.R, IPL.L, PCUN.R, PreCG and PCG.L are positive. These regions are therefore more activated in patients than in normal subjects, and their functional connection strength is relatively large [26][27][28][29][30]. Table 2 also shows that the statistical value of the precuneus is large, which indicates a greater degree of weakening of activation in this region in patients.
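The sign convention used for Table 2 can be illustrated with a two-sample t statistic computed as patients minus controls. The text describes a two-sample paired t test. This sketch instead assumes independent groups and Student's pooled-variance form; for genuinely paired data, the one-sample statistic would be applied to the per-pair differences. The function names and values are hypothetical.

```python
import math

def two_sample_t(patients, controls):
    """Student's two-sample t statistic (pooled variance); positive means patients > controls."""
    n1, n2 = len(patients), len(controls)
    m1, m2 = sum(patients) / n1, sum(controls) / n2
    ss1 = sum((v - m1) ** 2 for v in patients)
    ss2 = sum((v - m2) ** 2 for v in controls)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def interpret(t):
    """Sign convention from the text: t > 0 means higher activation in patients."""
    if t > 0:
        return "higher in patients"
    if t < 0:
        return "lower in patients"
    return "no difference"

# Hypothetical ALFF values of one region (not data from the study).
t = two_sample_t([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
print(round(t, 4), interpret(t))
```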
Figure 8 shows that the node degrees of ORBinf.L, PCG.L, PCG.R, PoCG.R, PCUN.R, PUT.L, PAL.L and MTG.L are greater in patients than in normal subjects. The functional connection strength of these nodes is therefore greater than in controls. Moreover, the ALFF statistics of nodes ORBinf.L, PUT.L, PCG.R, PAL.L and MTG.L are negative, so we consider that the influence of these nodes is increased within the local area. The node degrees of IPL.L and CAU.L are lower than normal, so their functional influence is lower than normal. However, the statistical value of node IPL.L is greater than 0, indicating that the functional connection strength of this region increases at the whole-brain level, whereas the present results show a decline within the local area [31][32][33]. The same holds for the four selected modules. For example, in patient module 7 the node degree of ORBinf.L is greater than normal. We believe that the correlation between this region and other regions is relatively weak in patients but enhanced in normal subjects.
In this paper, the FCM algorithm was used to divide the brain functional network into modules on the basis of the correlation coefficients. The undirected networks within the modules were analysed using correlation and ALFF. The information transfer rate and the degree of grouping of each network were analysed with the shortest path length and the clustering coefficient, respectively. The functional connection strength and status of each node were studied through the node degree. To understand the distribution of activation levels across brain regions, ALFF was used to analyse the differences between patients and normal subjects. Finally, the results of the two groups were compared. Because of differences in disease severity and other random factors, continued follow-up analysis is needed to obtain more general results.