Compressed Data Collection Method for Wireless Sensor Networks Based on Optimized Dictionary Updating Learning

Wireless sensor networks (WSNs) are composed of a large number of tiny sensors. These energy-constrained sensors are deployed in a variety of environments to collect data such as temperature, humidity, and light intensity. Suppressing the impact of environmental noise on collection accuracy and extending the lifetime of WSNs are therefore prominent issues. This article proposes an optimized dictionary updating learning-based compressed data collection algorithm (ODUL-CDC) to suppress the impact of environmental noise on the accuracy of WSN data collection and to extend the life cycle of WSNs. The proposed algorithm uses a dictionary learning method to obtain a sparse dictionary from training data. The collection error caused by environmental noise is positively correlated with the degree of self-coherence of the sparse dictionary. Therefore, a self-coherence penalty term is introduced into the dictionary updating process, which reduces over-fitting to the training data in the dictionary learning process. Moreover, the self-coherence penalty term endows the learned sparse dictionary with a low-self-coherence structure. Experimental and simulation results show that, compared with discrete cosine transform (DCT), K-SVD, and IDL learning-based data collection methods, the proposed algorithm exhibits the highest increase in recovery accuracy of 3.2% in the signal-to-noise ratio (SNR) range of 30-50 dB, the sampling ratio range of 25%-40%, and the sparsity range from 3 to 30. Furthermore, its energy consumption is significantly lower than that of the compared methods, which helps extend the network lifetime.


I. INTRODUCTION
Wireless sensor networks (WSNs) are composed of large-scale and self-organized sensor nodes that are capable of sensing, data storage, and communication [1]. WSNs have numerous applications, such as in industrial automation [2], [3], smart cities [4], traffic networks [5], military reconnaissance [6], and the measurement of environmental data such as humidity or temperature [7]. Although sensor nodes can collect, transmit, store, and perform simple processing of data, these capabilities are relatively weak. Therefore, the communication distance of each sensor node is very limited in actual deployments. Long-distance data transmission must rely on forwarding by adjacent sensor nodes, that is, multi-hop transmission. In practical applications, sensor nodes therefore not only transmit their own data but also forward the data of neighboring nodes. Because the ability of a single sensor node is very limited, it needs to cooperate with other nodes to achieve the corresponding functions and complete the given tasks. Because it is difficult to recharge or replace the limited power supply of ordinary nodes, the development of energy-efficient data collection methods is becoming crucial [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Noor Zaman.
In the typical WSNs data collection process, the sensor nodes periodically perceive the physical environment information and send the collected sensor data to the base station node via multi-hop forwarding. Sensor nodes not only transmit their own data, but also forward the data of other nodes. The nodes closer to the base station participate in more forwarding, and are more likely to fail due to energy exhaustion. This has become a bottleneck problem that affects the life cycle of WSNs.
In recent years, the new information processing theory called compressed sensing (CS) [9]-[11] has provided a novel idea to solve the energy bottleneck problem in WSN data collection. In the CS data collection method, data compression and collection are integrated into a single process, and the high computational and communication burden is transferred to the base station. Finally, in the sink node, incomplete data are reconstructed through various complex reconstruction algorithms.
Bajwa et al. [12] first proposed a WSN data collection scheme based on CS theory, which transmits analog projection signals of the sensing data through synchronous amplitude modulation to reduce data collection delay and energy loss. However, this scheme transmits analog signals and requires strict time synchronization between nodes, so it is not widely used. Haupt et al. [13] also proposed a WSN data collection method based on CS theory. This method emphasizes the management of the CS coding process and proposes a coding mechanism based on random gossip, which enables WSNs to use CS to store and recover data on multiple nodes rather than on a single sink node only. Similar to the method in [12], this method also uses synchronous amplitude modulation to transmit analog projection signals of the sensing data. Luo et al. [14] proposed compressed data gathering (CDG). CDG performs CS compression coding along the multi-hop paths of WSNs, which reduces the amount of sensor data transmitted and distributes the communication cost equally among the nodes. However, because of the compression coding mechanism, the amount of data transmitted by the terminal nodes increases. As a result, when the sensor data are not very sparse, CDG is less efficient than the original data collection method. To solve this problem, Luo et al. [15] proposed a hybrid CS solution from another perspective: the end nodes still transmit the original data, and each intermediate node decides whether to use CS coding or traditional transmission according to the amount of data to be transmitted. Wu et al. [16] proposed a data collection method for WSNs based on CS theory. This method uses a sparse measurement matrix that contains only one non-zero element in each row, which effectively reduces the number of nodes participating in each CS measurement.
In other words, it can reduce the amount of data transmission and prolong the network lifetime. Leinonen et al. [17] proposed a data collection method based on sequential compressed sensing that adopts a sliding-window processing framework. The sink node can efficiently recover sensor data from the CS measurement sequence, which greatly reduces the decoding delay. A recursive CS recovery algorithm is specially designed to make full use of the estimated values of the forward coding. A regularized weighted method is used to solve for the data to be recovered, which not only reduces the amount of data transmitted but also improves the recovery accuracy.
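The traffic trade-off between raw forwarding, CDG [14], and hybrid CS [15] described above can be made concrete with a small count. The following is a minimal sketch under simplifying assumptions that are not taken from the cited papers: a single chain of nodes, one reading per node per round, and M weighted sums per round. The function name `per_node_load` is hypothetical.

```python
from typing import Dict, List


def per_node_load(n_nodes: int, m: int) -> Dict[str, List[int]]:
    """Values sent per round by each node on a chain S_1 -> ... -> S_N -> sink."""
    positions = range(1, n_nodes + 1)
    raw = [i for i in positions]             # own reading plus i-1 forwarded readings
    cdg = [m for _ in positions]             # every node sends M weighted sums
    hybrid = [min(i, m) for i in positions]  # raw data until M values would be exceeded
    return {"raw": raw, "cdg": cdg, "hybrid": hybrid}


loads = per_node_load(n_nodes=10, m=4)
for scheme, load in loads.items():
    print(scheme, load, "total =", sum(load), "max =", max(load))
```

In this toy chain, CDG raises the load of the leaf nodes (4 values instead of 1) but caps the load of the nodes near the sink, while the hybrid rule keeps the smaller of the two at every node.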
These methods use fixed sparse dictionaries. However, the application scenarios of WSNs are diversified, and the sparse characteristics of sensor data are different under different application scenarios [18]. Fixed sparse dictionaries lack the ability to adapt to diverse scenarios, which limits the application scope of compressed data collection. In addition, WSNs are usually deployed in complex environments in which a large amount of environmental noise will inevitably affect the accuracy of data collection. Determining how to optimize the design of the compressed data collection system to reduce the impact of environmental noise is therefore a problem worth considering.
This paper presents an optimized dictionary updating learning-based compressed data collection (ODUL-CDC) algorithm. The main contributions of this work are as follows: 1. We develop a dictionary learning method to obtain a dictionary with better sparse representation ability. 2. We introduce a self-coherence penalty term to reduce the over-fitting of the sparse dictionary. 3. Experimental and simulation results demonstrate the superiority of our method in both sparse representation ability and energy saving.
The remainder of this paper is organized as follows. Section II reviews the research status of data collection methods in WSNs. The proposed method is described in detail in Section III. Section IV uses 50 nodes to test the accuracy of data collection; the proposed method is compared with the DCT, K-SVD, and IDL algorithms for sampling ratios from 10% to 40%, SNRs from 20 dB to 50 dB, and sparsity levels from 3 to 30. Section V uses 1000 nodes in MATLAB to simulate the network energy consumption; the proposed method is compared with the DCT, K-SVD, and IDL algorithms when the number of successful reconstructions ranges from 40 to 160. Section VI concludes the paper and discusses future work.

II. LITERATURE REVIEW
In order to improve the efficiency of data collection in WSNs, many methods have been proposed. Jiang et al. [19] proposed a trust-based energy-efficient data collection algorithm that realizes data acquisition for large-scale Internet of Things systems. In [20], a new energy-based heuristic, maximum coverage small lifetime (MCSL), was proposed; it gives high priority to the sensors that have the maximum residual battery life and cover a minimum of uncovered targets, and it avoids redundant covering of critical targets. Experimental results show that the algorithm performs better in terms of network lifetime in all scenarios (i.e., varying numbers of sensors and targets). Menaria et al. [21] proposed a novel fault tolerance approach for WSNs, named the node-link failure fault tolerance model (NLFFT model), to handle faults caused by either link or node failure during data transmission from the sensor to the sink or base station. The proposed algorithm improves the performance of WSNs in terms of end-to-end delay and power consumption. Li et al. [22] proposed a Data Collection scheme based on Denoising Autoencoder (DCDA). In the data training phase, a Denoising Auto Encoder (DAE) is trained to compute the data measurement matrix and the data reconstruction matrix using the historical sensed data, which makes the data collection more energy-efficient. Velmani and Kaarthick [23] proposed a Velocity Energy-efficient and Link-aware Cluster-Tree (VELCT) scheme for data collection in WSNs, which effectively mitigates the problems of coverage distance, mobility, delay, traffic, tree intensity, and end-to-end connection. Abdulaziz and Simon [24] proposed Multi-channel network Coding Clustering (MuCC) for mobile data gathering within challenging wireless environments. MuCC is novel in its use of both multi-channel and network coding techniques to improve both throughput and reliability.
VOLUME 8, 2020
Wu and Tseng [25] developed an efficient distributed wake-up scheduling scheme for data collection in a sensor network that achieves both energy conservation and low reporting latency. Ang et al. [26] proposed analytical approaches to determine the node energy consumption of mobile data collector schemes in large-scale wireless sensor networks and gave models for determining the optimal number of clusters that minimizes the energy consumption. Li et al. [27] investigated a novel optimal scheduling strategy, called EHMDP, that aims to minimize data packet loss from a network of sensor nodes in terms of the nodes' energy consumption and data queue state information. Cohen et al. [28] designed and analyzed a data collection protocol based on information theoretic principles. It provides a simple codebook construction with very simple encoding and decoding procedures. The data collection protocol considers information security without energy consumption. In [29], a time-division-multiple-access-based energy consumption balancing algorithm is proposed for general k-hop WSNs, where one data packet is collected in one cycle. The proposed algorithm performs well in terms of energy efficiency and timeslot (TS) scheduling. Caione et al. [30] proposed a fully distributed method in which each node autonomously decides on the compression and forwarding scheme to minimize the number of packets to transmit. An enhanced version of the algorithm is also introduced to take into account the energy spent in compression. Wei et al. [31] proposed a distributed clustering algorithm, Energy-efficient Clustering (EC), that determines suitable cluster sizes depending on the hop distance to the data sink, while achieving approximate equalization of node lifetimes and reduced energy consumption levels. Li et al. [32] formulated a novel scheduling optimization problem for energy harvesting mobile sensor networks that maximizes the amount of collected data under the constraints of radio link quality and energy harvesting efficiency, while ensuring fair data reception. Kang et al. [33] proposed a delay-efficient traffic adaptive (DETA) scheme for collecting data from sensor nodes with minimum energy consumption.
The DETA scheme minimizes data collection delay by constructing delay-efficient, collision-free schedule, and by using a special mechanism to enable every node to self-adapt with the changes of data traffic.
Most of the methods developed in previous WSNs CS data collection research use a fixed orthogonal basis to sparsely represent the sensor signal. Chen et al. [34] used discrete cosine transform (DCT) as the sparse representation basis of sensor signals. Xiang et al. [35] used diffusion wavelets as sparse representation bases for sensor signals. Wu et al. [36] proposed to use the difference matrix as the sparse representation basis of sensor signals via the use of the sparsity of the sensor sample values in adjacent time intervals.
However, due to the diversity of WSNs application scenarios, different scenarios have different sparse characteristics, and detailed expression is difficult with a fixed sparse basis. Moreover, the use of a fixed sparse basis requires additional human intervention. For different application scenarios, an appropriate sparse basis must be selected in advance.
Luo et al. [37] proposed the use of a variety of fixed sparse representation bases to improve the sparse representation ability of sensor signals. In this method, multiple orthogonal sparse bases are combined into an over-complete redundant basis. However, this method still depends on selecting several fixed sparse bases in advance, and it lacks the ability to adapt to various scenarios. Quern et al. [38] proposed an adaptive construction method for sparse representation bases based on principal component analysis (PCA). First, the historical data are taken as the sample matrix, and the covariance matrix of the sample matrix is then diagonalized. Finally, the eigenvector matrix obtained by diagonalization is used as the sparse representation basis of the sensor signal. However, this construction method can easily over-fit the historical data, which leads to poor sparse representation of real data outside the historical set.
To adapt to diverse and dynamic signals, a dictionary is learned from a group of training signals. The goal is to obtain a dictionary that can decompose the signals using only a few atoms. The method of optimal directions (MOD) [39] and K-SVD [40] are two well-known traditional algorithms for learning a dictionary that leads to a much more compact representation. Alsheikh et al. [41] used a dictionary learning algorithm to adaptively construct the sparse representation basis for sensor signals. Before the WSN starts normal compressed data collection, a portion of the uncompressed sensor data is transmitted as the training data for the dictionary learning algorithm.
Duarte-Carvajalino and Sapiro [42] proposed a method by which to simultaneously learn the dictionary and optimize the sampling matrix. This method is based on the minimal mutual coherence between the dictionary and the projection matrix. Kumar and Rajawat [43] presented a dictionary learning framework for fingerprinting indoor locations.
Nevertheless, these existing methods do not consider environmental noise. Therefore, this paper presents an optimized dictionary updating learning-based compressed data collection (ODUL-CDC) algorithm. The sparse dictionary is learned from the training data to improve the sparse representation ability of sensor data in different application scenarios. The reconstruction error caused by environmental noise is positively correlated with the self-coherence of the learned dictionary. Thus, the self-coherence of the learned dictionary is added as a penalty term during the dictionary updating step. The self-coherence penalty term can reduce the over-fitting of the training data in the dictionary learning process.

III. THE ODUL-CDC ALGORITHM
In this section, we propose the ODUL-CDC algorithm. In this algorithm, dictionary learning is used to adaptively construct the sparse dictionary for compressed data collection. The sparse dictionary is learned from the training data to improve the sparse representation of sensor data in different application scenarios. Furthermore, the self-coherence penalty term introduced by the ODUL-CDC algorithm reduces over-fitting to the training data in the dictionary learning process, which improves the sparse representation ability. The self-coherence penalty term also gives the learned sparse dictionary a low-self-coherence structure, which can effectively suppress the impact of environmental noise on the accuracy of data collection. The main contents of the algorithm are presented as follows.

A. COMPRESSED SENSING DATA COLLECTION MODEL
A typical compressed sensing data collection model is illustrated in Figure 1. It is assumed that there are N nodes (excluding the sink node) on a multi-hop transmission path. Each node senses the environment to obtain sensor data, and $x_i$ represents the sensor data collected by node $S_i$. In the compressed sensing data collection model, each node no longer directly transmits its own sensor data, nor directly forwards the sensor data of other nodes; instead, it transmits a weighted sum of multiple sensor data [15]. For example, node $S_1$ sends the M weighted values $\phi_1 x_1$ to its parent node $S_2$, where $\phi_1 = [\varphi_{11}, \varphi_{21}, \cdots, \varphi_{M1}]^T$ is the random measurement vector of node $S_1$. After receiving these data, node $S_2$ adds its own weighted sensor data to obtain the new weighted data $\phi_1 x_1 + \phi_2 x_2$ and sends them to its parent node $S_3$. By repeating this operation, the sink node finally receives M weighted sums, expressed as the M-dimensional column vector $y$:

$$y = \sum_{i=1}^{N} \phi_i x_i \tag{1}$$
It is expressed in matrix form as

$$y = \Phi x \tag{2}$$

where $\Phi = [\phi_1, \phi_2, \cdots, \phi_N] \in \mathbb{R}^{M \times N}$ is the measurement matrix and $x \in \mathbb{R}^{N}$ is the sensor data vector. If $x$ is sparse in some sparse dictionary $D \in \mathbb{R}^{N \times K}$, where $K$ is the number of atoms in the dictionary and $K \ge N$, then $x$ is expressed as

$$x = D\theta \tag{3}$$

where the vector $\theta = [\theta_1, \theta_2, \cdots, \theta_K]^T$ contains the sparse coefficients of $x$, with $\|\theta\|_0 = S \ll N$. A fixed orthonormal basis such as DCT, wavelets, or curvelets can serve as the dictionary.
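The hop-by-hop aggregation described above can be sketched in a few lines. This is only an illustration under stated assumptions: Gaussian measurement weights, a single chain of nodes, and the hypothetical helper names `measurement_vector` and `aggregate_along_path`. It checks that the vector arriving at the sink equals the matrix-vector product of the measurement matrix and the readings.

```python
import random


def measurement_vector(m, rng):
    """Random length-M measurement vector of one node (Gaussian weights assumed)."""
    return [rng.gauss(0.0, 1.0) for _ in range(m)]


def aggregate_along_path(readings, phis):
    """Each node adds phi_i * x_i to the partial sums received from its child."""
    partial = [0.0] * len(phis[0])
    for x_i, phi_i in zip(readings, phis):
        partial = [p + w * x_i for p, w in zip(partial, phi_i)]
    return partial  # the M weighted sums that reach the sink


rng = random.Random(0)
N, M = 8, 3
readings = [20.0 + 0.5 * i for i in range(N)]          # hypothetical sensor values
phis = [measurement_vector(M, rng) for _ in range(N)]  # column i of the measurement matrix
y = aggregate_along_path(readings, phis)

# Direct evaluation of the matrix form: entry j of y is row j of Phi times x.
y_direct = [sum(phis[i][j] * readings[i] for i in range(N)) for j in range(M)]
print(all(abs(a - b) < 1e-9 for a, b in zip(y, y_direct)))
```

Every node transmits exactly M values regardless of its position on the path, which is what balances the load between leaf nodes and nodes near the sink.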
Equation (2) can then be written in the following form:

$$y = \Phi D \theta = A\theta \tag{4}$$

where $A = \Phi D$ is called the sensing matrix. Because the number of rows in matrix A is less than the number of columns, Eq. (4) is an underdetermined system of linear equations. Without additional conditions, Eq. (4) has infinitely many solutions. CS theory points out that when θ satisfies the sparsity assumption and the sensing matrix A satisfies certain conditions, the unique exact solution θ can be obtained from Eq. (4).
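To characterize the conditions under which Eq. (4) has a unique exact solution, Candes and Tao first proposed the restricted isometry property (RIP) [27]. If there exists a constant $\delta_S \in (0, 1)$ such that

$$(1 - \delta_S)\|\theta\|_2^2 \le \|A\theta\|_2^2 \le (1 + \delta_S)\|\theta\|_2^2 \tag{5}$$

holds for every S-sparse vector θ, the matrix A is said to satisfy the RIP of order S. Equation (4) will then have a unique exact solution.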
If the condition of CS is satisfied, the estimate $\hat{\theta}$ of the sparse coefficients can be obtained by solving the following optimization problem:

$$\hat{\theta} = \arg\min_{\theta} \|\theta\|_0 \quad \text{s.t.} \quad y = A\theta \tag{6}$$

The original signal can then be recovered as $\hat{x} = D\hat{\theta}$.
The objective function in Eq. (6) is an $l_0$-norm function. Because the $l_0$-norm function is non-convex, Eq. (6) is a non-convex optimization problem. Finding its global optimal solution is NP-hard, and no polynomial-time algorithm is known. Various greedy algorithms have been proposed to obtain an approximate solution, including orthogonal matching pursuit (OMP) [45], regularized OMP (ROMP) [46], and compressive sampling matching pursuit (CoSaMP) [47].
Fortunately, under certain conditions, this problem is equivalent to the following $l_1$-minimization problem, so the recovery can be obtained with linear programming (LP) techniques:

$$\hat{\theta} = \arg\min_{\theta} \|\theta\|_1 \quad \text{s.t.} \quad y = A\theta \tag{7}$$
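To illustrate the greedy recovery mentioned above, the following is a minimal pure-Python sketch of orthogonal matching pursuit. It is a textbook-style illustration, not the implementation used in the experiments: the least-squares step is solved through the normal equations with a small Gaussian elimination routine for readability, and the 4 × 6 sensing matrix is a hand-built toy example. The names `solve_linear` and `omp` are hypothetical.

```python
def solve_linear(G, b):
    """Solve G z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(G)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def omp(A, y, sparsity, tol=1e-12):
    """Greedy estimate of a sparse theta with y ~= A theta (A is a list of rows)."""
    m, k = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(k)]
    support, coef, residual = [], [], list(y)
    for _ in range(sparsity):
        # Select the atom most correlated with the current residual.
        scores = [-1.0 if j in support
                  else abs(sum(c * r for c, r in zip(cols[j], residual)))
                  for j in range(k)]
        best = max(range(k), key=scores.__getitem__)
        if scores[best] <= tol:
            break  # residual is already explained by the selected atoms
        support.append(best)
        # Least-squares fit of y on the selected atoms (normal equations).
        sub = [cols[j] for j in support]
        gram = [[sum(u * v for u, v in zip(a, b)) for b in sub] for a in sub]
        rhs = [sum(u * v for u, v in zip(a, y)) for a in sub]
        coef = solve_linear(gram, rhs)
        fit = [sum(c * atom[i] for c, atom in zip(coef, sub)) for i in range(m)]
        residual = [yi - fi for yi, fi in zip(y, fit)]
    theta = [0.0] * k
    for j, c in zip(support, coef):
        theta[j] = c
    return theta


s = 2 ** -0.5
A = [
    [1.0, 0.0, 0.0, 0.0, s, 0.0],
    [0.0, 1.0, 0.0, 0.0, s, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, s],
    [0.0, 0.0, 0.0, 1.0, 0.0, s],
]
y = [3.0, 0.0, -2.0, 0.0]  # generated by theta = [3, 0, -2, 0, 0, 0]
theta_hat = omp(A, y, sparsity=2)
print(theta_hat)
```

For larger or ill-conditioned problems, a QR-based least-squares solver would be a safer choice than the normal equations used here.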

B. DICTIONARY LEARNING METHOD
The sparse dictionary used in the existing WSN data collection methods based on the compressed data collection model, such as DCT, is a fixed orthogonal matrix. However, the deployment environment of WSNs is complex and changeable, and sensor data in different application scenarios will have different sparse characteristics. Therefore, it is difficult to describe the sensor data with a fixed dictionary. For this reason, it is necessary to use a dictionary learning method to learn corresponding sparse dictionaries from different types of sensor data, which can improve the ability of CS data collection to adapt to diverse application scenarios.
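Here, $\{x_i\}_{i=1}^{L}$ is used as the set of training data for dictionary learning, where $x_i \in \mathbb{R}^{N}$ is a data vector and L is the number of training vectors. The data matrix $X = [x_1, x_2, \ldots, x_L] \in \mathbb{R}^{N \times L}$ is thus obtained. The general form of conventional dictionary learning methods can then be written as

$$\min_{D, C} \|X - DC\|_F^2 \quad \text{s.t.} \quad \|c_i\|_0 \le S, \; i = 1, 2, \ldots, L \tag{8}$$

where $\|\cdot\|_F$ represents the matrix Frobenius norm, $D \in \mathbb{R}^{N \times K}$ denotes the sparse redundant dictionary, $C = [c_1, c_2, \ldots, c_L] \in \mathbb{R}^{K \times L}$ denotes the sparse coefficient matrix, and S is the sparsity level.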

C. INFLUENCE OF ENVIRONMENTAL NOISE ON COLLECTION ACCURACY
In actual situations, environmental background noise interferes with the collection of compressed data, so the measurement data contain noise components. From Eqs. (2), (3), and (4), we can see that the noise is mixed with the original data x. In the data collection process, the noisy data are first sparsely represented in the dictionary. Compressed sensing is then used to collect the data. Finally, the original data are restored by the reconstruction algorithm. In this process, the greater the self-coherence of the dictionary, the lower the accuracy of data collection. To solve this problem, a self-coherence penalty term is introduced in this section so that the learned sparse dictionary has a low-self-coherence structure, which can effectively suppress the impact of environmental noise on the accuracy of data collection.
Here, $e$ denotes the random Gaussian noise vector. The mean square error (MSE) is used to evaluate the reconstruction of the sparse random vector θ:

$$\mathrm{MSE} = E_{\theta, e}\left(\|\hat{\theta} - \theta\|_2^2\right) \tag{9}$$

where $\hat{\theta}$ denotes an estimate of θ and $E_{\theta, e}(\cdot)$ denotes the expectation with respect to the joint distribution of the random vectors θ and e. The well-known oracle estimator assumes that the positions of the nonzero entries of the sparse vector θ are known as a set $\Lambda \subset \{1, 2, \cdots, K\}$. Thus, Eq. (4) with noise takes the following form:

$$y = \Phi D I_\Lambda \theta_\Lambda + e \tag{10}$$

where $I_\Lambda$ denotes the matrix obtained by keeping only the columns of the identity matrix indexed by Λ, and $\theta_\Lambda \in \mathbb{R}^{S}$ denotes the vector obtained from θ by deleting the entries outside Λ. The MSE of the oracle estimator is then given as follows [48]:

$$\mathrm{MSE}_{\mathrm{oracle}} = \sigma^2 \, \mathrm{Tr}\left(\left(I_\Lambda^T D^T \Phi^T \Phi D I_\Lambda\right)^{-1}\right) \tag{11}$$

where $\sigma^2$ is the variance of the entries of e and $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix.
To reduce the MSE caused by environmental noise, the term $\mathrm{Tr}\left(\left(I_\Lambda^T D^T \Phi^T \Phi D I_\Lambda\right)^{-1}\right)$ in Eq. (11) should be made as small as possible. This term is positively correlated with the self-coherence of the sparse dictionary. Therefore, $\|D^T D - I\|_F^2$ is introduced into dictionary learning as a self-coherence penalty term.
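The self-coherence penalty can be evaluated directly from the Gram matrix of the dictionary. The sketch below is illustrative only: the function names are hypothetical, the dictionaries are tiny hand-built examples, and the mutual coherence shown alongside the penalty is the standard largest off-diagonal Gram entry for unit-norm atoms, included for comparison.

```python
def gram(D):
    """Gram matrix D^T D of a dictionary given as a list of rows (N x K)."""
    cols = list(zip(*D))
    return [[sum(u * v for u, v in zip(a, b)) for b in cols] for a in cols]


def self_coherence_penalty(D):
    """Squared Frobenius norm of D^T D - I."""
    G = gram(D)
    k = len(G)
    return sum((G[a][b] - (1.0 if a == b else 0.0)) ** 2
               for a in range(k) for b in range(k))


def mutual_coherence(D):
    """Largest absolute inner product between distinct atoms (unit-norm columns assumed)."""
    G = gram(D)
    k = len(G)
    return max(abs(G[a][b]) for a in range(k) for b in range(k) if a != b)


s = 2 ** -0.5
D_orthonormal = [[1.0, 0.0],
                 [0.0, 1.0]]
D_redundant = [[1.0, 0.0, s],
               [0.0, 1.0, s]]  # third atom overlaps with the first two

print(self_coherence_penalty(D_orthonormal), mutual_coherence(D_orthonormal))
print(self_coherence_penalty(D_redundant), mutual_coherence(D_redundant))
```

The orthonormal dictionary gives a zero penalty, whereas the redundant one is penalized through the four off-diagonal Gram entries contributed by the overlapping atom.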

D. OPTIMIZED DICTIONARY UPDATING LEARNING
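Considering the preceding discussion, the final objective function of the ODUL-CDC problem is

$$\min_{D, C} \|X - DC\|_F^2 + \lambda \|D^T D - I\|_F^2 \quad \text{s.t.} \quad \|c_i\|_0 \le S, \; i = 1, 2, \ldots, L \tag{12}$$

where λ is the trade-off factor, $c_i$ is the i-th column of C, and S is the sparsity level. ODUL-CDC is solved with a two-step iterative approach that alternates between sparse coding and dictionary updating.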

1) SPARSE CODING
In the sparse approximation stage, the dictionary D is fixed, and the sparse representation matrix C of the training data X in the dictionary D is solved.

2) DICTIONARY UPDATING
In the dictionary updating stage, the matrix C is fixed and the new dictionary D is obtained by solving the following optimization problem:

$$\min_{D} \|X - DC\|_F^2 + \lambda \|D^T D - I\|_F^2 \tag{14}$$
Equation (14) is treated as a convex problem with a global optimal solution. Let

$$F(D) = \|X - DC\|_F^2 + \lambda \|D^T D - I\|_F^2 \tag{15}$$

When $F(D)$ reaches its minimum value, $\nabla F(D) = 0$ must hold, that is,

$$-2(X - DC)C^T + 4\lambda D\left(D^T D - I\right) = 0 \tag{16}$$

Equation (17) is then obtained from Eq. (16).
Equation (17) is used as an approximate estimate of the optimal solution of optimization problem (14); that is, the dictionary is updated by Eq. (18) during the iteration process, where $D_{k+1}$ is the dictionary obtained in the (k + 1)-th iteration. After obtaining the new dictionary $D_{k+1}$, the columns of $D_{k+1}$ should be normalized in turn. Finally, the pseudo-code of the ODUL-CDC algorithm is shown in Algorithm 1.
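The dictionary updating stage can be explored numerically with a small sketch. It assumes that the objective is the data-fit term plus λ times the self-coherence penalty, whose gradient with respect to D is −2(X − DC)Cᵀ + 4λD(DᵀD − I). Because the paper's closed-form approximate update is not reproduced here, the sketch substitutes a plain gradient step followed by column normalization; this is a stand-in technique, not the authors' update rule. All function names, the step size, and the random test matrices are hypothetical.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def minus(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def frob2(A):
    return sum(v * v for row in A for v in row)


def eye(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def objective(D, C, X, lam):
    """Data-fit term plus lam times the self-coherence penalty."""
    k = len(D[0])
    return (frob2(minus(X, matmul(D, C)))
            + lam * frob2(minus(matmul(transpose(D), D), eye(k))))


def gradient(D, C, X, lam):
    """Gradient of the objective with respect to D (C is held fixed)."""
    k = len(D[0])
    fit = matmul(minus(X, matmul(D, C)), transpose(C))
    pen = matmul(D, minus(matmul(transpose(D), D), eye(k)))
    return [[-2.0 * f + 4.0 * lam * p for f, p in zip(rf, rp)]
            for rf, rp in zip(fit, pen)]


def normalize_columns(D):
    norms = [sum(row[j] ** 2 for row in D) ** 0.5 for j in range(len(D[0]))]
    return [[v / n for v, n in zip(row, norms)] for row in D]


def gradient_step(D, C, X, lam, step):
    """One gradient step on D, then renormalize every atom to unit length."""
    G = gradient(D, C, X, lam)
    moved = [[d - step * g for d, g in zip(rd, rg)] for rd, rg in zip(D, G)]
    return normalize_columns(moved)


rng = random.Random(1)
N, K, L, lam = 3, 4, 5, 0.1
D = normalize_columns([[rng.gauss(0, 1) for _ in range(K)] for _ in range(N)])
C = [[rng.gauss(0, 1) for _ in range(L)] for _ in range(K)]
X = [[rng.gauss(0, 1) for _ in range(L)] for _ in range(N)]
D_next = gradient_step(D, C, X, lam, step=1e-3)
print(objective(D, C, X, lam), objective(D_next, C, X, lam))
```

A finite-difference comparison against `objective` is a quick way to confirm the gradient expression before using it inside an alternating loop with a sparse coding step.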

E. STEPS OF THE ODUL-CDC ALGORITHM
The ODUL-CDC algorithm consists of five steps.
1) The collection tree protocol (CTP) is employed to construct a tree network topology with the sink node as the root node. During the construction of the tree topology, node i records $C_i$, the total number of nodes in the subtree whose root node is node i. For example, in Figure 2, the total number of subtree nodes of node 6 is 5.
2) The sink node constructs a seed for random number generation and broadcasts it to every node in the network. After receiving the seed, each node combines it with its own node ID to generate a series of pseudo-random numbers as its random measurement vector.
3) According to the random seed, each node calculates the random measurement vectors of all nodes in its subtree.
4) This step is the dictionary learning stage. The nodes adopt the uncompressed transmission mode; that is, the compression ratio of data transmission is 1 and the number of measurements M is equal to the number of nodes N. When the sink node has collected enough sensor data, these data are used as the training set. The sparse dictionary D is obtained by using the dictionary learning algorithm proposed in the previous section.
5) This step is compressed data collection. The nodes transmit data according to the compressed sensing model shown in Figure 2. The sink node finally receives the weighted sum vector shown in Eq. (2). The original data of all sensors can then be recovered by the OMP algorithm.
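Steps 1) to 3) above can be sketched as follows. The sketch rests on assumptions the text does not specify: the way the seed and node ID are mixed, the Gaussian distribution of the weights, the convention that a subtree count includes the root of the subtree, and the small tree used in the example (which is not the topology of Figure 2). The function names are hypothetical.

```python
import random


def node_measurement_vector(seed, node_id, m):
    """Regenerate a node's length-M measurement vector from the broadcast seed."""
    rng = random.Random(seed * 1_000_003 + node_id)  # hypothetical seed/ID mixing
    return [rng.gauss(0.0, 1.0) for _ in range(m)]


def subtree_sizes(parent):
    """Number of nodes in the subtree rooted at each node (the node itself included).

    `parent` maps every sensor node to its parent; the sink is not a key.
    """
    sizes = {node: 1 for node in parent}
    for node in parent:
        p = parent[node]
        while p in sizes:  # climb toward the sink, crediting every ancestor
            sizes[p] += 1
            p = parent[p]
    return sizes


# Hypothetical routing tree: sink 0 <- 1 <- {2, 3}, and 2 <- {4, 5}.
parent = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
sizes = subtree_sizes(parent)
print(sizes)

# The sink and node 3 derive the same vector from the shared seed, so the
# measurement matrix never has to be transmitted over the network.
seed, M = 42, 4
at_node = node_measurement_vector(seed, node_id=3, m=M)
at_sink = node_measurement_vector(seed, node_id=3, m=M)
print(at_node == at_sink)
```

Deriving the vectors from a shared seed trades a small amount of local computation for the communication cost of distributing the measurement matrix.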
The flowchart of the proposed ODUL-CDC algorithm is illustrated in Figure 3.

IV. EXPERIMENT AND ANALYSIS
To verify the effectiveness of the proposed ODUL-CDC algorithm, simulation experiments were carried out on the WSNs experimental platform developed by the Beihang Sensor Network and Instrument Laboratory [49]. The sensor nodes and node distribution used in the experiment are respectively presented in Figures 4 and 5.
The experimental data used in the simulation experiment were temperature and humidity data recorded by 50 sensor nodes from November 5, 2019, to December 3, 2019. The recording interval was 10 min, and each node recorded 4000 points of temperature data and 4000 points of humidity data. In chronological order, the first 400 data points were used as the training set, and the last 3600 data points were used as the test set. The experimental process was divided into three steps: 1) The ODUL algorithm was used to learn the sparse dictionary D on the training data set; 2) The sparse representation ability of dictionary D was evaluated on the test data set; 3) The compressed data collection algorithm was run on the test data set, and the data collection accuracy was evaluated under different environmental noise intensities.

A. DATA COLLECTION ACCURACY
A fixed DCT matrix, the dictionary learned by K-SVD, and the dictionary learned by IDL [50] were selected as comparison objects. The specific parameters used in the experiment are reported in Table 1. The measurement matrix used for the compressed data collection was a Gaussian random matrix, and the environmental noise was artificially simulated Gaussian random noise. To reduce the impact of the randomness of the experiment, each group of experiments was repeated 50 times, and the average value was taken as the final experimental result.
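The noise injection and trial averaging described above can be sketched as follows. The sketch assumes that the SNR is defined as the ratio of the average signal power to the noise variance, which the text does not state explicitly; the test signal and the function names are hypothetical.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Add zero-mean Gaussian noise scaled to a target SNR in dB."""
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def empirical_snr_db(clean, noisy):
    """SNR actually realized by one noisy copy of the signal."""
    signal_energy = sum(v * v for v in clean)
    noise_energy = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_energy / noise_energy)


def average_over_trials(run_trial, n_trials=50):
    """Average a scalar metric over repeated random trials."""
    return sum(run_trial(t) for t in range(n_trials)) / n_trials


# Hypothetical temperature-like trace; not the experimental data.
clean = [20.0 + 2.0 * math.sin(0.01 * i) for i in range(2000)]
rng = random.Random(0)
mean_snr = average_over_trials(
    lambda t: empirical_snr_db(clean, add_noise_at_snr(clean, 40.0, rng))
)
print(round(mean_snr, 2))
```

In an actual evaluation, the metric returned by `run_trial` would be the data collection accuracy of one reconstruction rather than the realized SNR.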
The accuracy of data collection (ADC) is defined as follows.
As presented in Table 2 and Figure 6, the signal-to-noise ratio (SNR) was 40 dB, and the sampling ratio ranged from 10% to 40%. As can be seen from the figure, ODUL-CDC exhibited poor performance when the sampling ratio was low. However, when the sampling ratio was high, the ODUL-CDC dictionary outperformed the DCT, K-SVD, and IDL dictionaries in terms of the relative reconstruction error. The fixed DCT dictionary was the worst case, as its fixed structure could not sparsely represent data of varying diversity. In comparison, the ODUL-CDC, K-SVD, and IDL dictionaries were better than the DCT dictionary, as they could adapt to sparsely represent the data via training.
Similar results can be observed in Table 3 and Figure 7. As was determined from the experiments, the results of ODUL-CDC were better than those of IDL, K-SVD, and DCT because the self-coherence penalty suppressed the influence of noise on the accuracy of data collection. As presented in Table 4 and Figure 8, when the sampling ratio was 40% and the SNR was 40 dB, the ODUL-CDC dictionary outperformed the DCT, K-SVD, and IDL dictionaries in terms of the accuracy of data collection. This is because the self-coherence penalty term introduced by ODUL-CDC reduces over-fitting to the training data in the dictionary learning process, which improves the sparse representation ability. The experimental results demonstrate that the ODUL-CDC algorithm performed at least 3.2% better than the other three algorithms in terms of data collection accuracy when the sparsity ranged from 3 to 30, the SNR ranged from 30 to 50 dB, and the sampling ratio ranged from 25% to 40%.

B. IMPACT OF THE TRADE-OFF FACTOR ON THE ODUL-CDC ALGORITHM
Note that the ODUL-CDC algorithm contains a variable parameter, namely the trade-off factor λ, the value of which is bound to affect the algorithm performance. For this reason, the influence of the value of λ on the performance of the ODUL-CDC algorithm was analyzed via simulation experiments. The experimental results reported in the previous section indicated that the performance of the ODUL-CDC algorithm on the temperature data set was very similar to that on the humidity data set. Therefore, the effect of λ on the performance of the ODUL-CDC algorithm was only tested on the temperature data set.
First, the simulation experiment was conducted to analyze the influence of λ on the sparse representation ability of the ODUL-CDC algorithm. In the simulation, the sparsity was fixed to 10, the value of λ was gradually increased from 0 to 0.4, and the sparse representation error of the dictionary trained by the ODUL-CDC algorithm on the temperature data set was tested. The simulation results are presented in Figure 9. In the initial stage, with the gradual increase of λ from 0, the sparse representation error of the ODUL-CDC algorithm gradually decreased. This is because the self-coherence penalty term introduced in the ODUL-CDC algorithm started to take effect with the increase of λ from 0, which reduced the overfitting of the training data in the dictionary learning process. However, when λ was greater than a certain value (such as λ > 0.1), the sparse representation error increased as λ increased. This is because, when the value of λ was too large, the penalty effect of the self-coherence penalty term was too strong, which inhibited the sparse representation ability of the dictionary for sensor data.
The influence of λ on the data collection error of the ODUL-CDC algorithm was then analyzed via simulation experiments. In the simulation, the SNR was fixed at 10, the value of λ was gradually increased from 0 to 0.4, and the error of the compressed data collection based on the ODUL-CDC algorithm on the temperature data set was tested. The simulation results are presented in Figure 10 and are very similar to the results displayed in Figure 9.
As λ increased from 0, the data collection error gradually decreased. However, compared with the curve in Figure 9, the curve in Figure 10 drops more sharply. This is because, as λ increased from 0, the sparse representation ability of the dictionary trained by the ODUL-CDC algorithm gradually improved. More importantly, the self-coherence penalty term in the ODUL-CDC algorithm began to take effect, and the dictionary's ability to suppress environmental noise gradually increased. With these two effects combined, the decrease in the data collection error was greater than the decrease in the sparse representation error. As in Figure 9, once λ exceeded a certain value, the data collection error increased with λ. This is because, when λ was too large, the sparse representation ability of the ODUL-CDC algorithm deteriorated. In summary, the value of λ must be chosen to suit the specific application.
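Since the error first falls and then rises with λ, a simple way to choose λ for a given application is to sweep a grid and keep the value with the lowest validation error. The sketch below is a generic grid search, not a procedure taken from the paper. The helper name, the 0.05 step size, and the `evaluate_error` callable (which would train a dictionary with the given λ and return its data collection error) are all assumptions.

```python
def pick_tradeoff(candidates, evaluate_error):
    """Return (best_lambda, best_error) over the candidate trade-off values.

    `evaluate_error` is a user-supplied callable mapping lambda to an error.
    """
    scored = [(evaluate_error(lam), lam) for lam in candidates]
    best_error, best_lam = min(scored)
    return best_lam, best_error


# Candidate grid covering the range swept in the simulations (0 to 0.4);
# the 0.05 step is an arbitrary illustrative choice.
lambda_grid = [round(0.05 * i, 2) for i in range(9)]
```

In practice the error should be measured on data held out from dictionary training, so that the selected λ reflects generalization rather than fit to the training set.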

V. SIMULATION ON ENERGY CONSUMPTION AND ANALYSIS
To verify the effect of the ODUL-CDC-based WSN data collection method on the network lifetime, MATLAB [51] was used to simulate and analyze the proposed algorithm. In the simulation, nodes were randomly deployed in an area of 500 m × 500 m, and the sink node was located at the center. Each node had an initial energy of 1 J. The other simulation parameters are listed in Table 5. Let $d$ denote the distance between the sending node and the receiving node. If $d$ is less than the threshold $d_{\max}$, the free-space attenuation model applies, and the transmitting power of the sending node attenuates quadratically as $d$ increases. If $d$ is greater than $d_{\max}$, the multi-path attenuation model applies, and the transmitting power attenuates with the fourth power of $d$. Let $E_T(k, d)$ represent the energy consumed by a node to send $k$ bits of data over distance $d$, and let $E_R(k)$ represent the energy consumed by a node to receive $k$ bits of data. The following equations therefore hold:

$$E_T(k, d) = \begin{cases} k E_{TX} + k E_{Amp} d^{2}, & d < d_{\max} \\ k E_{TX} + k E_{Amp} d^{4}, & d > d_{\max} \end{cases}$$

$$E_R(k) = k E_{RX}$$

where $E_{TX}$ is the energy consumed by the sending circuit to send 1 bit of data, $E_{RX}$ is the energy consumed by the receiving circuit to receive 1 bit of data, and $E_{Amp}$ is the energy consumption of the transmission amplifier circuit. A reconstruction was regarded as successful when the relative reconstruction error was less than 0.1.

Table 6 and Figure 11 compare the energy consumption of the ODUL-CDC algorithm with that of the other dictionary learning-based data collection methods. Data collection methods based on dictionary learning must first collect the entire data as training data and then learn a new dictionary from it. Such methods therefore consume more energy in the initial stage of data collection. For this reason, the energy consumption was compared for the stage after dictionary learning was completed.
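The distance-dependent radio energy model described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' MATLAB code: the function names are hypothetical, the parameter values must come from Table 5 (none are hard-coded here), and the case d = d_max, which the text leaves unspecified, is assigned to the multi-path branch.

```python
def tx_energy(k_bits, d, e_tx, e_amp, d_max):
    """Energy to send k_bits over distance d.

    Free-space (d^2) loss below d_max, multi-path (d^4) loss otherwise.
    Treating d == d_max as multi-path is an assumption of this sketch.
    """
    exponent = 2 if d < d_max else 4
    return k_bits * e_tx + k_bits * e_amp * d ** exponent


def rx_energy(k_bits, e_rx):
    """Energy to receive k_bits."""
    return k_bits * e_rx


def relay_energy(k_bits, d, e_tx, e_rx, e_amp, d_max):
    """Energy a forwarding node spends to receive k_bits and retransmit them."""
    return rx_energy(k_bits, e_rx) + tx_energy(k_bits, d, e_tx, e_amp, d_max)
```

Every term scales linearly with the number of bits, so any reduction in the number of compressed measurements a node must send or forward translates directly into a proportional energy saving on each hop.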
Table 6 and Figure 11 demonstrate that the ODUL-CDC algorithm achieved the best energy-saving effect, as the dictionary learned with the self-coherence penalty effectively suppressed the reconstruction error of the algorithm. Therefore, for the same reconstruction accuracy, the data collection method based on ODUL-CDC needs to collect fewer CS measurements than the other three algorithms.

VI. CONCLUSION AND FUTURE WORKS
This paper presented ODUL-CDC, an energy-saving compressed data collection algorithm for WSNs based on optimized dictionary updating learning. The sparse dictionary is learned from the training data to improve the sparse representation of sensor data in different application scenarios. To reduce the recovery error caused by environmental noise, a self-coherence term of the learned dictionary is introduced as a penalty term during the dictionary updating stage. This term endows the ODUL-CDC algorithm with improved sparse representation and noise suppression abilities. The experimental results demonstrated that the ODUL-CDC algorithm exhibited the highest increase in recovery accuracy of 3.2% as compared to the DCT, K-SVD, and IDL methods when the SNR ranged from 30 to 50 dB, the sampling ratio ranged from 25% to 40%, and the sparsity ranged from 3 to 30. The simulation results showed that, compared with the data collection methods based on IDL, K-SVD, and DCT, the ODUL-CDC algorithm consumed significantly less energy at the same recovery accuracy, thereby contributing to a longer network lifetime.
In future work, since the ODUL-CDC algorithm has so far been applied only in a small environment with simple noise, it can be extended to large-scale WSNs with more complex noise. In addition, other measurement matrices, such as the sparsest measurement matrices, can be used to further reduce the energy consumption.

He is currently a Professor with the School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, China. His main research interests include wireless sensor networks and instruments.

VOLUME 8, 2020