A Cooperative Spectrum Sensing Method Based on Empirical Mode Decomposition and Information Geometry in Complex Electromagnetic Environment

,


Introduction
Spectrum sensing is a key step in the cognitive radio (CR) technology [1,2].The main task is to quickly and accurately detect whether the primary user (PU) is using the spectrum.Traditional spectrum sensing methods include energy detection [3], matched filter detection [4,5], and cyclostationary feature detection [6].These methods all have shortcomings.For example, in the case of uncertain noise, the detection performance of energy detection degrades sharply.Matching filter detection needs to know the PU signal and noise information in advance.
In complex electromagnetic environments, noise uncertainty greatly impairs the performance of spectrum sensing [7,8].Therefore, it is necessary to reduce the noise and redundant information in the signals collected by the secondary users (SUs).The main challenge is that most of the signals collected by the SUs are nonstationary and nonlinear.To cope with that, traditional noise reduction methods are proposed, including short time Fourier transform, Wigner Ville distribution, and wavelet transform [9,10].However, these methods are all based on Fourier transforms, limited by the principle of uncertainty.Boudraa et al. proposed an empirical mode decomposition (EMD) algorithm, which can decompose the noise-infected signal into multiple intrinsic mode function (IMF) components [11].An EMD denoising method based on continuous mean square error criterion is proposed [12].Through analyzing the energy abrupt point Complexity between IMF components, the low-order IMF component before the mutation point is filtered out to achieve noise reduction.
With the rapid development of information geometry, the concept of statistical manifolds can be applied to the signal detection problems.These problems can be transformed into geometric problems on manifolds and then analyzed using geometric tools, so as to indirectly solve the signal detection problems.Liu et al. used information geometry for radar signal detection and proposed the matrix constant false alarm probability and a geodesic distance detector [13].Lu et al. applied the information geometry theory to spectrum sensing and obtained the close-form expression of the decision threshold using the matching methods, but the algorithm has higher complexity [14].Chen et al. added the manifold geometry measurement and obtained the decision threshold through simulation [15].However, all these methods need to derive and calculate the decision threshold, which is not only complicated but also inaccurate.
The application of machine learning in various fields has also attracted great interest of many researchers [16,17].Thilina et al. used K-means, the Gaussian mixture model (GMM) in unsupervised learning and neural network (NN), support vector machine (SVM) in supervised learning to study spectrum sensing [18].A spectrum sensing method based on signal energy is proposed, which uses K-means to classify these energy features.At the same time, the spectrum sensing performance of the algorithm under different channel models is further analyzed [19].A spectrum sensing method based on dominant features and maximum and minimum eigenvalues is proposed, and K-means and GMM are selected to form the unsupervised learning framework [20].Clustering algorithm and eigenvalue-based algorithms are also proposed.In the signal feature extraction phase, a feature extraction method is proposed to increase the number of cooperative SUs logically.Then, K-means and K-medoids are selected as the learning framework [21].
Based on the previous studies, this paper proposes a cooperative spectrum sensing method based on empirical mode decomposition and information geometry (EMDIGK), to further improve the spectrum sensing performance in complex electromagnetic environments.In the feature extraction phase, the EMD algorithm is first used to denoise the signals collected by the SUs, as to obtain the signal characteristics more accurately.We introduce the order decomposition and recombination (ODAR), interval decomposition and recombination (IDAR) to logically increase the number of SUs.Then, the covariance matrix of the split and recombined signal matrix are calculated separately, and then the covariance matrix is mapped onto the manifolds using information geometry theory, and the geodesic distance on the manifold is calculated and used as the signal feature.Then, the Kmedoids clustering algorithm is used in the training and spectrum sensing phase.The EMDIGK method proposed in this paper does not need to obtain the signal information of the communication PU in advance, so it is easy to be obtained for the labeled training data.In the experimental part, we verified the efficacy of EMDIGK through simulation.The experimental results show that EMDIGK successfully improves the spectrum sensing performance in complex electromagnetic environment.
For ease of reference, the symbols and notations used in this paper are summarized in Table 1.

The Basic System Model of Cooperative Spectrum Sensing
In the cognitive radio network (CRN), from the perspective of a single SU, the existence of the PU signal can be defined as a binary hypothesis as follows, where  0 indicates that the PU signal does not exist and  1 indicates that the PU signal exists.
() represents the signal transmitted by the PU and () represents the Gaussian noise in the environment, and  denotes the number of sampling points.Based on (1), the detection probability (  ) and the false alarm probability (  ) of the system can be defined as ( 2) and (3), respectively.
For a proposed algorithm, if   remains unchanged, a larger   indicates better performance.Similarly, a smaller   is desired when   is fixed.
In CR system, spectrum sensing is usually performed in a complex environment, and then single SU sensing is often affected by multipath loss, shadows, and hidden terminals, resulting in performance degradation of the entire system [22].In CRN, the fusion center (FC) collects the information collected by all SUs involved in cooperative sensing and then makes a final judgment based on this information [23][24][25][26].The cooperative spectrum sensing system model is shown in Figure 1.

Feature Extraction Based on EMD and Information Geometry
3.1.Feature Extraction Model.In a complex environment, it is first necessary to estimate the noise after EMD denoising to establish a representative reference point.The process of feature extraction based on empirical modal decomposition and information geometry is shown in Figure 2. To make the reference point more representative, we first collect enough noise signals, and then perform the EMD noise reduction, DAR, and covariance conversion (as shown in the dashed box in Figure 2).Then the Riemann mean solution method is used to solve the Riemann mean of these covariance matrices and is used as the reference point.Similarly, the signal matrix to be collected is also transformed by EMD, split recombination, and covariance matrix.Finally, the geodesic distances of these covariance matrices from the reference point are calculated and used as statistical features of the signal.

Empirical Modal Decomposition.
Assuming the signal acquired by the -th SU is   = [  (1),   (2), . . .,   ()], and the aim is to reduce the noise and redundant information in the collected signal, thereby improving the overall sensing system performance.Firstly, the EMD is used to reduce the noise of the signal collected by the SU.EMD can adaptively decompose any complex signal into a series of IMFs, shown as follows.
where   represents the residual.The specific steps of the EMD decomposition algorithm are listed as follows.
Step 1. Find all local maxima and local minima of signal   .
Step 2. The cubic spline interpolation method is used to fit all local maxima and local minima, respectively, to construct the maximum envelope    () and the minimum envelope    () and then calculate the average of    () and    ().Therefore, the average value  1 () can be obtained as The u-th eigenvalues of the matrix

𝑃
The number of environmental noise signal matrices R   , R

𝑝
The p-th noise signal covariance matrix after ODAR and IDAR Distance between the sensing signal and reference point R The feature extracted under the channel of the unknown PU state Step 3. By subtracting  1 () from the original signal   , the first component ℎ 1 () can be obtained as Check whether ℎ 1 () satisfies the conditions of the IMF.If so, continue to Step 4; otherwise, redo Steps 1 and 2 for ℎ 1 () to obtain the envelope value  11 () and then obtain ℎ 11 () as Execute sequentially until the -th step ℎ 1 () satisfies the conditions of the IMF, then Step 4. Subtracte   from the  1 component, resulting in the first residual  1 () as Treat  1 () as the original signal, and repeat Steps 1∼ 4 above, to obtain  2 (), and so on, until   () becomes a monotonic function or constant.
According to the characteristics of EMD, after the original signal is decomposed by EMD, several IMF components and one residual are obtained.The main component of the low-order IMF component is the high-frequency part of the signal, which contains sharp signals and noise.The highorder IMF component has less noise components, mainly the low-frequency part of the signal.According to this feature, several IMF components in the low frequency band can be selected to reconstruct the original signal to achieve the purpose of noise reduction.The reconstructed signal is shown in (11).
In the process of selecting the IMF component of the low frequency band, there must be a certain critical   component, so that the IMF component after the component is useful for the main part.Therefore, the ultimate goal of EMD decomposition is to find the   component of the criticality.A continuous mean square error criterion is proposed in [10], shown as follows.
Therefore, the signal after the EMD processing can be obtained as After all the SUs collected signal   after the EMD noise reduction, we can get a new matrix X = [x 1 , x2 , . . .,x  ], where x = [x  (1), x (2), . . ., x ()] represents the signal acquired by the -th SU after EMD.X is an  ×  matrix.
where  ( > 0) is the split parameter and  is the length of the split signal vector after splitting.The covariance matrices R  and R  are then calculated using Y ODAR and Y IDAR , shown in 3.4.Information Geometry Theory.In information geometry theory, consider a set of probability density functions ( | ), where  is an  dimensional sample associated with a random variable Ζ,  ∈ Ζ ⊆   . is an  dimensional parameter vector,  ∈ Θ ⊆   .Therefore, the probability distribution space can be described by parameter set Θ.The probability distribution function family  is as shown in R is the covariance matrix composed of the signals collected by the SUs, shown as where X denotes the signal matrix described above,  denotes the number of samples, and X  denotes the transposition of X.We can parameterize the probability distribution family by R as where  × is the open set of the  ×  dimensional vector space.According to the information geometry theory,  forms a microscopic manifold structure, which is called a statistical manifold.R is the coordinate of the manifold.Since the parameters of the manifold are covariance matrices, they can also be called matrix manifolds.Under the two hypotheses  0 and  1 , we can obtain two types of covariance matrices, R  and R  + R  .At the same time, R  and R  + R  correspond to two types of coordinates on the manifold.As the SNR increases, the difference between the two hypothetical sensing signals becomes larger, corresponding to an increase in the distance between the two points on the matrix manifold [14].In the following, based on this, we will extract the signal characteristics.

Geodesic Distance.
In information geometry, the distance between two probability distributions on a statistical manifold can be measured in a variety of ways.On the manifold, the geodesic distance is a widely accepted method for measuring the distance between two probability distributions.Due to the nature of the manifold curvature, the distance between the two points on the manifold depends on the choice of the curve between the two points.A curve that minimizes the distance between two points is defined as a geodesic, and the corresponding distance is called a geodesic distance.
Assuming any two points  1 and  2 on the manifold, the arbitrary curve between the two points is () ( 1 ≤  ≤  2 ), ( 1 ) =  1 , ( 2 ) =  2 .Then the curve distance between points  1 and  2 is expressed as [28]: where G() is the Fisher information matrix in the information geometry for statistical metrics on the manifold.
When the minimum value is obtained using (22), i.e., the minimum distance between  1 and  2 , the geodesic distance is obtained, and the corresponding curve is the geodesic.
In the cognitive radio spectrum sensing technology, the sensing signal matrix is a multivariate Gaussian distribution family with the same mean but different covariance matrices [29,30].Assume having the covariance matrices R 1 and R 2 , the geodesic distance [31] between them is as shown in where   is the -th eigenvalues of the matrix

Riemann Mean.
We perform ODAR and IDAR on the  environmental noise matrices to obtain R   ( = 1, 2, . . ., ) and R   ( = 1, 2, . . ., ) and then find their Riemann mean.The objective function Φ is shown in where D(⋅, ⋅) is the geodesic distance between two points on the manifold.The matrices R  and R  (which minimize the value of the objective function Φ(⋅)) are Riemann mean [32], which can be expressed as Assume that when  = 2, there are two points R 1 and R 2 on the manifold, and its Riemann mean R is located at the midpoint of the geodesic of the two points R 1 and R 2 on the manifold.R is calculated using When there are  points ( > 2), the Riemann mean is difficult to calculate.In this case, [31,33] have proposed the iterative calculation of the Riemann mean R using the gradient descent algorithm, shown in where  is the iteration step size and V is the iteration step.According to the above description, we can get the Riemann mean The Riemann means R  and R  are selected as reference points, and the signal matrix to be collected is also transformed by EMD, DAR, and covariance matrix to obtain R  and R  .Finally, the geodesic distances from R  to R  and R  to R  are calculated separately, and the distance is taken as the statistical characteristic of the signal.
According to (24), we can calculate the corresponding geodesic distance as where  1 and  2 represent the distance between the sensing signal and reference point R  and R  on the manifold, respectively.If the PU signal does not exist, then the distance from the reference points is small.On the contrary, if the PU signal exists, then the distance is large.According to such feature, we can intuitively analyze whether the PU is using the licensed spectrum.In order to facilitate the training of the clustering algorithm framework and test the spectrum sensing performance, a feature vector capable of reflecting the signal characteristics is needed.Therefore, the two geodesic  1 and  2 constitute a geodesic distance feature vector (GDFV

Cooperative Spectrum Sensing Based on K-Medoids Clustering
After extracting the features, we use the K-medoids clustering algorithm to perform spectrum sensing.The traditional spectrum sensing method needs to derive the threshold value to determine whether the PU exists.Nevertheless, these methods often have problems, such as inaccurate thresholds and difficulties in calculation.EMDIGK uses the K-medoids clustering algorithm as the learning framework.This algorithm is similar to the K-means clustering algorithm.Unlike K-means, the K-medoids clustering algorithm does not calculate the mean of the samples as the centroid point in the class.Instead, the actual sample point in the class is taken as the center point, and the center point has the smallest distance to all sample points in the class.Compared with the centroid point in K-means, the center point in Kmedoids has the advantage of being less affected by extreme values [34].The K-medoids algorithm is less sensitive to noise points, so that the outliers will not cause excessive deviation of the results of the division.The whole process is divided into two phases, the training and the spectrum sensing phase.
The cooperative spectrum sensing system model based on Kmedoids is shown in Figure 3.
Before training, we need to prepare a training set D: where D  is the feature vector extracted by the method described in the third section, and  represents the number of feature vectors in the training set D. Let C  denote the set of the training feature vectors belonging to class , where  = 1, 2, . . ., .
Class C  has a centroid   .The training process is as follows.
Step 1. Input the training sample set D and the number of clusters .
Step 3. Calculate the distance of each sample to each centroid and put it into the nearest class.
Step 4. Update   with   = arg min Step In (35), D represents the feature extracted under the channel of the unknown PU state.If ( D) > , it means the PU signal exists, and thus the channel is not available.Otherwise, ( D) <  means the PU signal does not exist, and the channel can be used.Parameter  is used to control the false alarm probability and the missed detection probability in the spectrum sensing process.
The overall flow of the EMDIGK algorithm proposed in this paper is as shown in Algorithm 1.

Simulation Results and Performance Analysis
In this section, we used the multicomponent signal () = cos() + cos(4 + 0.2 2 ) as the experimental simulation signal.
To ensure the accuracy of the experiment, we obtained 2,000 signal features, of which 1,000 were used as the training set and the other 1,000 were used as the test set.The test set is used to verify the spectrum sensing performance of the proposed method.We compare the detection performance under different sensing methods, and the experimental results suggest that the proposed EMDIGK-based method can obtain a better sensing performance.[14], and ED is a method using energy characteristics as a statistic.Tables 2 and 3 show the detection probabilities of the methods under different SNR when the false alarm probability is a constant.We observe that the EMDIGK method has higher detection probability than other methods.It can be calculated that the detection probability of EMDIGK is 36.6%,38.5%, 56.4%, 61.6%, and 136.0%higher than other methods under the condition of  = −14 dB and   = 0.1.That is because the EMDIGK method reduces the noise in the communication signal, thereby reducing the impact of noise on the spectrum sensing system.At the same time, the feature of the information geometry method can be used to analyze the feature values in different states more intuitively, so that a better detection performance can be obtained.

The ROC Curve with Different Number of Cooperative SUs.
Figure 8 shows the ROC curve of the EMDIGK method with different SU numbers.The number of sampling points  = 1000.When  = −18 dB, it shows that the performance is better with the increase of the number of cooperative SUs.This is because cooperative spectrum sensing can reduce the interference of various factors such as multipath fading and shadow in the propagation environment.Therefore, with the increase of the cooperative SU, the anti-interference ability of the system is stronger and the performance is better.

5.4.
The ROC Curve with Different Sampling Points.Figure 9 shows the ROC curve with different sampling points for the EMDIGK method under  = −18 dB.The number of cooperative SUs  = 2.As the number of sampling points increases, the information of the sensing signal is more complete, so the extracted features are more representative.As a result, from the figure we can see that when the number of sampling points increases, the detection performance of the EMDIGK method is also improved.

Conclusions
This paper proposes an EMDIGK method to effectively improve the spectrum sensing performance in a complex electromagnetic environment.In terms of feature extraction, firstly, the EMD algorithm is used to denoise the signals collected by the SUs, thereby reducing the impact of noise on subsequent sensing process.Further, in order to analyze this kind of problem more intuitively and simply, we adopt the information geometry theory.The signal after EMD is mapped into the statistical manifold, thereby transforming the signal detection problem into a geometric problem.Then, the corresponding geometric tools are used to extract the signal features as statistical features.Finally, the K-medoids clustering algorithm is used as a learning framework to construct a spectrum-aware classifier to achieve spectrum sensing.In the experiments, the performance of the EMDIGK algorithm is verified and analyzed, which shows its superiority over other popular methods.In our future work, we will study the combination cyclostationary feature detection with clustering algorithms and information geometry.In future work, research could be extended in the area of how to reduce the complexity of the algorithm to achieve spectrum sensing more efficiently.

Figure 1 :d 1 Geodesic: d 2 Figure 2 :
Figure 1: The basic system model of cooperative spectrum sensing.

Figure 3 :
Figure 3: System model based on the K-medoids clustering algorithm.

Figure 8 :Figure 9 :
Figure 8: The ROC curve with different numbers of SUs under  = −18 dB.
after EMD processing X A new  ×  matrix consisting of x ()  The split parameter  The length of the split signal vector after splitting Y ODAR , Y IDAR A matrix after ODAR or IDAR R  , R  Covariance matrix of Y ODAR or Y IDAR   is an  dimensional sample   is an  dimensional parameter vector Ζ A random variable Ζ Θ Θ is the probability distribution space R Covariance matrix corresponding to X R  , R  + R  Covariance matrix corresponding to X under  0 and  1  A point on the manifold () Arbitrary curve between the two points [27].Decomposition and Recombination.After the EMD processing, the signal matrix after EMD denoising is processed using ODAR and IDAR, respectively[27], as to logically increase the number of cooperative SUs and further acquire the characteristic information of the signal.Let Y ODAR and Y IDAR denote the signal matrix after ODAR and IDAR, respectively.Both matrices are  × .Y ODAR and Y IDAR are defined as follows.

Table 2 :
The   of different methods in  = −14 dB.
Effect of Clustering Algorithm.Figures4 and 5show the effect of GFV before and after clustering by K-medoids at  = −14 dB.It can be seen that under such condition, after the K-medoids clustering algorithm, the signal characteristics in two different states can be well distinguished.The red ' * 's indicate that the PU exists, while the blue 'x's represent the PU does not.The black circle and triangle in the figure represent the centroid of the noise class and the centroid of the containing PU signal class, respectively.

Table 3 :
The   of different methods in  = −16 dB.