An Adaptive Genetic Algorithm of Adjusting Sensor Acquisition Frequency

Portable meteorological stations are widely used in environment monitoring systems, but their power supply is limited because they have no wired power, especially in long-term monitoring scenarios. Reducing power consumption by choosing a suitable sensor acquisition frequency is therefore very important for wireless sensor nodes. The regularity of historical environment data from a monitoring system is analyzed, and an optimization model based on an adaptive genetic algorithm for environment monitoring data acquisition strategies is proposed to lower the sampling frequency. According to the historical characteristics, the algorithm dynamically changes the acquisition frequency so that data are collected less often, which reduces the energy consumption of the sensor. Experimental results in a practical environment show that the algorithm can greatly reduce the acquisition frequency while reproducing the environment monitoring data curve with little error compared with fixed high-frequency acquisition.


Introduction
A portable meteorological station (PMS) is equipment that automatically observes ground environment states such as temperature, humidity and illumination [1]. It is used in fields such as weather forecasting [2], environmental monitoring [3] and biometeorology [4]. Most PMSes are deployed in open, remote locations that the power grid cannot supply directly, so they rely on solar energy to store and supply power [5]. Continuous rainy or cloudy weather leads to inefficient power storage by the solar panels, thus affecting the efficiency of a PMS.
Reducing the overall power consumption is very important for prolonging the life of a sensor and ensuring the stable operation of a PMS system [6]. Large-capacity batteries are useful for prolonging the service time [7], but they increase the cost and size of the PMS. Reducing power consumption can also reduce electromagnetic radiation and thermal noise, and improve the stability of the system [8]. The existing low-power strategies usually depend on the design of low-power circuits [9], low-power sensors [10] and low-power chips [11], which improve the performance of PMSes at the hardware level [12]. However, these measures are not sufficient in practice, and more efficient methods of energy management are needed. Some studies have proposed acquisition frequency regulation methods to reduce power consumption. A harvesting and energy-aware adaptive sampling algorithm was proposed in [13]; it sets a low sampling rate and guarantees self-sustainability when the residual energy of the sensors is too low. Adaptive sampling for wireless sensors has attracted attention recently. An adaptive sampling interval adjustment (ASIA) method using a two-input single-output fuzzy logic controller was proposed in [14], adaptive power management was proposed in [15] and an adaptive sampling algorithm based on the temporal and spatial correlation of sensory data was proposed in [16]. An optimal power consumption control method of policy discrete-time queues was proposed in [17]. These algorithms consider the real-time state of the sensor and the external environment in order to change the acquisition frequency of the sensor dynamically.
From a statistical point of view, many features of weather data are strongly related to those of past days, especially recent days, so the influence of historical data on the acquisition frequency can be taken into account. Therefore, this paper models the acquisition interval mainly according to the recent rate of change of the monitoring data. The main contribution of this paper is adjusting the acquisition time intervals of wireless sensors dynamically by learning the changing tendency of the monitoring data. By learning historical weather data with a genetic algorithm model, the acquisition frequency is dynamically adjusted to reduce power consumption.

The Acquisition Data Serials
The original data cited in this paper were collected from the Breeding Base of Agasicles hygrophila in Yichang City, China. The base is located at Baishiping Village, Longzhouping Town, Changyang County. The geographic coordinates of this location are 111°13′22.7215″ E, 30°31′33.7378″ N, and the altitude is 123.3 m.
The monitoring data were collected in the winter of 2018. They include the air temperature, the air humidity and the sunlight intensity. These environment monitoring data were collected by weather sensors.
Data are usually collected at a fixed frequency [18], which is defined in the monitoring protocol. Suppose a PMS collects data N times per day at equal intervals. The acquisition time series of a given day is $T = \{T_1, T_2, \ldots, T_N\}$, so the acquisition interval of the wireless sensor is $(60 \times 24 / N)$ minutes. The collected data are expressed as $y_t$. The data series can be expressed as the set $Y = \{y_1, y_2, \ldots, y_N\}$; in particular, the data series of the jth day can be expressed as the set $Y_j = \{y_{1j}, y_{2j}, \ldots, y_{Nj}\}$. Data collected in this way are discrete and can be fitted to a continuous environment state curve with many nonlinear fitting methods.
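As a small worked example of the interval arithmetic above, the following sketch computes the acquisition interval and the equally spaced acquisition times for a given N. The function names are hypothetical, and starting the first acquisition at minute 0 of the day is an assumption, not something the text specifies.

```python
def acquisition_interval_minutes(n_per_day: int) -> float:
    """Interval between acquisitions, in minutes, for N equally spaced samples per day."""
    if n_per_day <= 0:
        raise ValueError("n_per_day must be positive")
    return 60 * 24 / n_per_day


def acquisition_times(n_per_day: int) -> list:
    """Acquisition times T_1..T_N as minutes since midnight (assumed offset of 0)."""
    step = acquisition_interval_minutes(n_per_day)
    return [i * step for i in range(n_per_day)]
```

With N = 144, as used later in the paper, the interval is 10 minutes.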
Since power consumption mainly depends on the quantity of data sent by sensors, reducing the sensor acquisition frequency is the main method of reducing energy consumption. The acquisition frequency can be adjusted by the monitoring center according to the required information granularity, which affects the control accuracy of the application system. Information granularity is decided by the number of sensor acquisitions and the intervals between adjacent acquisitions. For an application system, much of the data sent by sensors is redundant, since the environment states monitored by the sensors may change relatively little and so have little influence on information management. If the sensors avoid sending as much redundant data as possible, the energy consumption decreases directly.
As shown in Figure 1, a series of feature points can be selected to form a fitted curve that is quite similar to the initial curve formed by the full data (the set Y) at high acquisition frequency.
The selected set Y′ is a subset of the set Y, that is, $Y' \subseteq Y$. Correspondingly, the selected acquisition time set T′ is a subset of the set T, that is, $T' \subseteq T$. Wireless sensors collect environment data according to the time set T′, subject to a suitable distance D between the fitted curve formed by the collected data and the real curve. Figure 1 also shows that different fitted curves may have quite different distances D even when their numbers of selected data points are the same, which means that choosing a suitable time set is most important.

Residual Sum of Square
The distance D mentioned above should meet the requirements that an application system sets for control accuracy. For example, a greenhouse monitoring system may allow temperature fluctuations of several degrees. The distance D can be defined by the residual sum of squares (RSS) of the jth day as:
$$D_{RSS} = \sum_{i=1}^{N} \left(y_{ij} - \hat{y}_{ij}\right)^2$$
Here, $y_{ij}$ is the actual value on the curve fitted from data at very high acquisition frequency on the jth day, whereas $\hat{y}_{ij}$ is the predicted value on the jth day, which forms the optimized sequence. The lower $D_{RSS}$ is, the more similar the two curves are.
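The residual sum of squares can be sketched in a few lines. This is a minimal illustration, assuming both curves have already been evaluated at the same time points; the function name `residual_sum_of_squares` is hypothetical.

```python
def residual_sum_of_squares(actual, predicted):
    """Sum of squared differences between two curves sampled at the same time points."""
    if len(actual) != len(predicted):
        raise ValueError("both curves must be sampled at the same time points")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
```

A value of 0 means the two sampled curves coincide, and larger values mean the optimized curve deviates more from the high-frequency one.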
Take Figure 1 as an example. The primitive curve is fitted from data collected every 10 min, 144 times per day in all. Its acquisition set is $T = \{T_1, T_2, \ldots, T_{144}\}$. Curve1 and curve2 in Figure 1 are curves fitted from data at the different time series T′ and T″, respectively, both of which are subsets of T.
The quality of the fitted curve depends not only on the quantity of selected acquisition data but also on which data are selected. If too many acquisition points are selected, the wireless sensors consume more energy. If too few are selected, the fitted curve does not reflect the main features of the original environment state. A suitable subset of acquisition data should therefore be found to improve the quality of the fitted curve, which can be decided by an optimization model of suitable set selection.

Analysis of Historical Data
Take the temperature sequence as an example. Air temperature is affected by the strength of illumination, so the change of temperature is generally regular [19]. Daily temperature changes of similar days therefore have a similar trend. On the other hand, the times of sunrise and sunset change with the seasons, which shifts some feature parts of the temperature curves and changes the maximum temperature. Therefore, the fitted curves of nearby days are always similar, and the fitted curves of days separated by a long interval will differ.
Sensors 2020, 20, 990
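The Pearson correlation coefficient is usually applied to measure vector similarity. For two adjacent days it is defined as:
$$r = \frac{\sum_{i=1}^{N} \left(y_{ij} - \bar{y}_{j}\right)\left(y_{i(j-1)} - \bar{y}_{j-1}\right)}{\sqrt{\sum_{i=1}^{N} \left(y_{ij} - \bar{y}_{j}\right)^2}\,\sqrt{\sum_{i=1}^{N} \left(y_{i(j-1)} - \bar{y}_{j-1}\right)^2}}$$
where $y_{ij}$ is the data of the jth day on the fitted curve, $y_{i(j-1)}$ is the data of the (j−1)th day on the fitted curve, and $\bar{y}_{j}$ and $\bar{y}_{j-1}$ are the corresponding daily means. The output range is [−1, 1]; 0 means no correlation, a negative value means negative correlation and a positive value means positive correlation. Figure 2 shows the Pearson correlation coefficient for every two adjacent days of temperature data in a continuous week. It can be seen that the temperature data of adjacent days have an extremely strong correlation.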
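The coefficient can be computed directly from its standard definition. The sketch below uses only the Python standard library; the function name `pearson` is hypothetical, and raising an error for a constant series (where the coefficient is undefined) is a design choice of this example.

```python
import math


def pearson(day_a, day_b):
    """Pearson correlation coefficient between two equally long daily series."""
    if len(day_a) != len(day_b) or len(day_a) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_a = sum(day_a) / len(day_a)
    mean_b = sum(day_b) / len(day_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(day_a, day_b))
    var_a = sum((a - mean_a) ** 2 for a in day_a)
    var_b = sum((b - mean_b) ** 2 for b in day_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (math.sqrt(var_a) * math.sqrt(var_b))
```

Applied to the fitted temperature values of two adjacent days, a result close to 1 corresponds to the strong correlation reported for Figure 2.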

Optimization Model Based on Adjacent Periods
Figure 2 shows that the original temperature data of adjacent days are strongly correlated. The environment monitoring data curves of the previous days, up to the last day, are therefore chosen as references. The acquisition sequence for the current day can be obtained with an optimization model according to the selected curves. The temperature variation tendencies of similar periods in adjacent days are mostly similar, so they indicate the rules for the acquisition time sequence of the next day.
So, the historical data of the past several days can be used to guide the acquisition sequence of the current day. Assume that the acquisition time sequence of the current day is T′. The purpose is to obtain a better sequence; in other words, the fitted curve from data in sequence T′ should be as similar as possible to the fitted curve from data at very high acquisition frequency. Generally, if the Pearson correlation coefficients of the past adjacent days are greater than 0.8, the fitted curves of the past M days formed from the data in sequence T′ are similar.
The quality of the fitted curves is measured by a fitness function FIT computed from $y_{ij}$ and $\hat{y}_{ij}$, where $y_{ij}$ is the data set selected from the fitted curve of the jth previous day at very high acquisition frequency, $\hat{y}_{ij}$ is the data set selected from the fitted curve of the jth previous day in sequence T′, and m is the total number of selected time points.
If the fitness value is large, the error between the curve fitted from data in sequence T′ and the curve fitted from data at very high acquisition frequency may be large on the current day. The smaller the fitness value FIT, the better the acquisition sequence T′. T′ is limited by:
$$T' \subseteq T$$
There will be some error between the fitted curve formed by T′ and the real curve formed by T, and the fitness value reflects this error. The tolerance of error differs between application fields. Assume that:
$$FIT \le FIT_{max}$$
The total number n of optimized acquisition points in a selected day should be neither too small nor too big: if it is too small, it is hard to reflect the main features of the weather, and if it is too big, too much energy is wasted and the optimization steps are meaningless. So, the number n is limited by:
$$n \le N, \quad n \in \mathbb{N}^{+}$$
In summary, the optimization model of the adaptive algorithm for the environment monitoring data acquisition strategy can be described as Equation (7):
$$\min FIT \quad \text{subject to} \quad T' \subseteq T, \quad FIT \le FIT_{max}, \quad n \le N, \; n \in \mathbb{N}^{+} \qquad (7)$$

Adaptive Genetic Algorithm of Frequency Adjustment
An adaptive genetic algorithm of optimizing acquisition frequency is proposed according to the optimization model mentioned above.
In a genetic algorithm, both the crossover probability $P_c$ and the mutation probability $P_m$ have a great influence on performance, especially on convergence. A larger $P_c$ lets the population produce new individuals faster, but if it becomes too large, the retention rate of excellent individuals in the population decreases. If $P_m$ is too large, the algorithm behaves like an ordinary random search, which defeats the purpose of a genetic algorithm. So, both probabilities should be adjusted adaptively.

Genetic Coding of Time Sequence Selection
The genetic algorithm cannot operate on the parameters of the problem space directly. The problem must therefore be transformed into the genetic space, in which genes make up a chromosome (an individual) according to a certain structure. This transformation is called coding or representation.
The genetic algorithm takes each acquisition series as an individual. The genetic coding of every individual is a sequence of N binary digits, where 1 means that data are collected at this time point and 0 means that data are not collected at this time point. Therefore, the coding of the acquisition series T of a given day contains N ones (uniform acquisition), and the coding of the optimized acquisition series T′ contains n ones, one per selected acquisition time. Take Figure 1 as an example. The primitive curve is fitted from data collected every 10 min, i.e., 144 times per day, so its coding is a string of 144 ones. Curve1 and curve2 in Figure 1 are fitted according to their own acquisition strategies, each coded as a 144-digit string with ones at the selected time points.
The population is initialized randomly. The initialization steps of every individual are:
1. Generate a zero vector of length N.
2. Assign binary 1 to the first n elements.
3. Randomly shuffle the elements of the vector.
4. Obtain the coding of the individual, which has n binary 1s and N−n binary 0s.
Each run of these steps produces one random genetic individual, and all generated individuals form the initial population.
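The initialization steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the random shuffle between assigning the ones and reading off the coding is an assumption about how the randomness is introduced.

```python
import random


def random_individual(n_slots, n_selected, rng=None):
    """Binary coding of length n_slots with exactly n_selected ones at random positions."""
    if not 0 <= n_selected <= n_slots:
        raise ValueError("n_selected must lie between 0 and n_slots")
    rng = rng or random.Random()
    genes = [0] * n_slots              # zero vector of length N
    for i in range(n_selected):        # first n elements set to 1
        genes[i] = 1
    rng.shuffle(genes)                 # assumed step: spread the ones randomly
    return genes


def initial_population(size, n_slots, n_selected, rng=None):
    """Population of independently generated random individuals."""
    rng = rng or random.Random()
    return [random_individual(n_slots, n_selected, rng) for _ in range(size)]
```

Because every individual is built from the same vector of n ones and N−n zeros, the acquisition number is fixed from the start and only the timing of the acquisitions varies.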

Fitness Function
The fitness in a genetic algorithm describes the environmental adaptability of every individual: the larger the fitness, the higher the chance of survival. Some individuals represent acquisition sequences that do not meet the requirement on the acquisition frequency, that is, on the total acquisition number sum($T_i$) relative to n. Such an individual is assigned a fitness value of 0, which marks an invalid acquisition sequence. For a feasible individual, the reciprocal of the objective function can be used as the fitness function, so that a smaller FIT gives a larger fitness. So, the fitness function can be expressed as:
$$F(T_i) = \begin{cases} 1 / FIT, & T_i \text{ is feasible} \\ 0, & \text{otherwise} \end{cases}$$
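A sketch of such a fitness evaluation is given below. Two points are assumptions: an individual is treated as infeasible when its number of ones differs from n, and the error measure FIT is passed in as a hypothetical callable `objective`, since its exact form is not reproduced here.

```python
def fitness(individual, n_required, objective):
    """Reciprocal-of-objective fitness; infeasible individuals score 0."""
    # Assumed feasibility rule: the coding must contain exactly n_required ones.
    if sum(individual) != n_required:
        return 0.0
    fit = objective(individual)        # hypothetical callable returning FIT >= 0
    if fit <= 0:
        return float("inf")            # a perfect match gets the best possible score
    return 1.0 / fit
```

For example, an individual with FIT = 0.5 receives a fitness of 2, while an individual with the wrong number of acquisitions receives 0 and is unlikely to survive selection.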

Selection Operation
The selection operation of the genetic algorithm chooses the better individuals from the population and discards the worse ones. The purpose of selection is to pass the better individuals on to the next generation. In this way, the algorithm improves the average fitness of each generation. The selection operation normally uses the roulette wheel selection algorithm.
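Roulette wheel selection draws individuals with probability proportional to their fitness. The sketch below uses `random.choices` from the standard library; the function name and the uniform fallback for an all-zero population are choices made for this example.

```python
import random


def roulette_select(population, fitnesses, k, rng=None):
    """Draw k individuals, with replacement, with probability proportional to fitness."""
    if len(population) != len(fitnesses):
        raise ValueError("one fitness value is needed per individual")
    rng = rng or random.Random()
    if sum(fitnesses) <= 0:
        # Fallback for this sketch: no individual is feasible, so pick uniformly.
        return [rng.choice(population) for _ in range(k)]
    return rng.choices(population, weights=fitnesses, k=k)
```

Individuals with fitness 0, such as infeasible acquisition sequences, are never drawn as long as at least one individual has positive fitness.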

Crossover Operation
The crossover operator plays an important role in the genetic algorithm and helps to improve its searching ability. A larger crossover probability is chosen at the beginning to enrich species diversity and improve the search ability, while the crossover probability is reduced at the end to protect the best individuals from damage. The crossover probability $P_c$ is defined adaptively in terms of the constants $P_{c1}$ and $P_{c2}$, where $f_{max}$ is the maximum fitness value of the population, $f_{avg}$ is the average fitness value of each generation and $f'$ is the larger fitness value of the two individuals to be crossed. To keep the numbers of 0 genes and 1 genes constant after crossover, the crossover is executed directly only if the two individuals carry the same numbers of each gene on the left and right sides of the crossover point. If they do not, the crossover point is moved until they do, and then the crossover is executed.
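One common way to realize such an adaptive probability is a linear interpolation between the two constants, shown in the sketch below. This specific form is an assumption borrowed from the general adaptive genetic algorithm literature, not a formula confirmed by the text, and the function name is hypothetical.

```python
def adaptive_probability(f, f_avg, f_max, p_high, p_low):
    """Assumed linear rule: below-average individuals get p_high, the best gets p_low."""
    if f < f_avg or f_max <= f_avg:
        # Below-average individuals (or a fully converged population) use the
        # larger probability to keep exploring.
        return p_high
    # Scale linearly from p_high at f == f_avg down to p_low at f == f_max.
    return p_high - (p_high - p_low) * (f - f_avg) / (f_max - f_avg)
```

With the constants 0.9 and 0.6 used in the experiments, a pair whose better parent sits halfway between the average and the maximum fitness would cross with probability 0.75 under this assumed rule.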

Mutation Operation
The mutation operation of the genetic algorithm replaces a gene of the individual's chromosome coding string with another allele, which produces a new individual. It gives the genetic algorithm the ability of local search, maintains species diversity and prevents premature convergence. Similar to the crossover operation, the mutation probability $P_m$ is defined adaptively in terms of the constants $P_{m1}$ and $P_{m2}$, where $f_{max}$ is the maximum fitness value of the population, $f_{avg}$ is the average fitness value of each generation and f is the fitness of the individual to be mutated. To keep the proportion of 0 genes and 1 genes stable, two mutation points are generated. If the genes at the two mutation points are different, both mutate simultaneously; if they are the same, no mutation occurs.
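The two-point rule can be sketched as follows, assuming the two mutation points are drawn uniformly at random without repetition (the text does not say how they are chosen). Flipping two differing genes at once is the same as swapping them, so the number of ones is preserved. The function name is hypothetical.

```python
import random


def count_preserving_mutation(genes, rng=None):
    """Return a mutated copy; the number of ones never changes."""
    if len(genes) < 2:
        raise ValueError("at least two genes are needed")
    rng = rng or random.Random()
    i, j = rng.sample(range(len(genes)), 2)   # two distinct mutation points
    mutated = list(genes)
    if mutated[i] != mutated[j]:
        # Flipping both differing genes is equivalent to swapping them.
        mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated
```

In terms of the acquisition strategy, a successful mutation moves one acquisition to a different time point without changing the daily acquisition number n.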

Error Correction
There is always some error between the fitted curve and the real curve. If the new day's acquisition sequence is calculated only from the fitted curve formed on the previous day, the error accumulates. So, the error should be corrected by applying the high acquisition frequency every several days.
The number of days between corrections should be set according to the type of data collected and the maximum allowable error. If it is too small, the average acquisition frequency increases and so does the power consumption. If it is too large, the error of data acquisition becomes too large accordingly.

Main Algorithm Process
The flow of the main algorithm is shown in Figure 3. The main steps of the algorithm are:
• Step 1: Initialize the parameters, such as the allowed error $FIT_{max}$, the initial acquisition number $n_{init}$, the number of iterations i of the adaptive genetic algorithm and the interval C (in days) between error corrections.
• Step 2: Judge whether the current day is an error correction day; if it is, go to Step 6, else go to Step 3.
• Step 3: Randomly generate the initial population, which contains p individuals.
• Step 4: Obtain a new population with a series of operations: selection, crossover and mutation.
• Step 5: If the number of iterations has not reached the maximum number of iterations, repeat Step 4. Otherwise, if the maximum fitness of the individuals in the new population still does not satisfy the requirement after the maximum number of iterations, increase the acquisition number n and go to Step 3. If it does, return the current data acquisition strategy and go to Step 7.
• Step 6: Take the high-frequency acquisition sequence T as the current acquisition strategy, and reset the acquisition number n to the initial acquisition number $n_{init}$.
• Step 7: Obtain data according to the above acquisition sequence and update the fitted data of the current day.
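The control flow of the steps above can be sketched as a per-day planning function. This is only an outline under stated assumptions: `run_ga` is a hypothetical callable that performs Steps 3 to 5 for a given n and returns the best coding together with its FIT value, a correction day is assumed to be any day whose index is a multiple of C, and n is increased by a fixed hypothetical step `n_step`.

```python
def plan_acquisition(day, n_current, correction_interval, n_slots, n_init,
                     run_ga, fit_max, n_step=1):
    """Return (coding, n) for one day, following Steps 2 to 6."""
    # Step 2 and Step 6: on a correction day, sample at full frequency and reset n.
    if day % correction_interval == 0:
        return [1] * n_slots, n_init
    n = n_current
    while n < n_slots:
        coding, fit = run_ga(n)        # Steps 3-5 for the current n (hypothetical)
        if fit <= fit_max:             # allowed error satisfied: accept the strategy
            return coding, n
        n += n_step                    # otherwise raise the acquisition number
    # Fallback for this sketch: no reduced sequence met the error bound.
    return [1] * n_slots, n_slots
```

The returned n would be passed back in as `n_current` on the next day, so an increase made in Step 5 persists until the next correction day resets it.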


Effect Analysis
In order to verify the performance of the proposed algorithm, experimental sensor data at high acquisition frequency were collected for one month and taken as the test dataset. The optimized acquisition sequence is produced by the proposed algorithm in Section 3, and the quality of the result is compared with the test dataset.
Taking the temperature data in our project as an example, the data are collected N = 144 times every day without optimization. The current day's acquisition sequence is obtained from the environment monitoring data curves of the previous three days. In this experiment, M = 3, the initial acquisition number $n_{init}$ = 20, the tolerable error $FIT_{max}$ = 0.5, the number of iterations i = 100, the population size p = 1000, the correction interval C = 7 days, $P_{c1}$ = 0.9, $P_{c2}$ = 0.6, $P_{m1}$ = 0.1 and $P_{m2}$ = 0.01. Different fitting algorithms have a certain influence on the error. Because there are few data points, the fitting uses a least-squares polynomial of degree eight. The degree depends on the scale of the acquisition frequency and should not be too high if the scale is small; degree eight was selected for the experimental data as a balance between fitting accuracy and computation cost. A comparison of part of the fitted curves is shown in Figure 4. The algorithm is applied to obtain the acquisition strategy, and the low-frequency data from the optimized acquisition strategy are compared with the actual data.
The quality of the result is evaluated with four indicators, RMSE, MAE, MAPE and the Pearson correlation coefficient (r), which are defined as follows:
$$RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}$$
$$MAE = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$
$$MAPE = \frac{100\%}{m}\sum_{i=1}^{m}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
$$r = \frac{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}\,\sqrt{\sum_{i=1}^{m}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}}$$
where $y_i$ is the value on the real curve, $\hat{y}_i$ is the value on the modeled curve, $\bar{y}$ and $\bar{\hat{y}}$ are their means and m is the number of observed samples.
The first three items indicate the quality of the optimized results. Their normalized mean is taken as an integrated evaluation parameter:
$$e = \frac{RMSE + MAE + MAPE}{3}$$
where each indicator is normalized before averaging.
After testing on one month of data, Figure 5 shows that the errors of the optimized results are lower than 1 °C. The Pearson correlation coefficient, on the other hand, measures the linear relationship with the real observed values. The results in Figure 6 show that most of the correlation coefficients are greater than 0.8, which means very strong correlation.
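The three error indicators follow their standard definitions and can be computed as in the sketch below. The function names are hypothetical, and MAPE is expressed in percent; it is undefined when a real value is 0, which this example reports as an error.

```python
import math


def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")


def rmse(actual, predicted):
    """Root mean square error."""
    _check(actual, predicted)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    _check(actual, predicted)
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)
```

RMSE and MAE share the unit of the measured quantity (here °C), whereas MAPE is a percentage, which is why the indicators need to be normalized before they are averaged.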

Influenced Factors Analysis
Let the initial acquisition number $n_{init}$ = 14, with the other conditions the same as before. The comparison results are listed in Table 1. If strict data accuracy is required, a higher $F_{min}$ value needs to be set. Meanwhile, when $F_{min}$ rises, the algorithm itself needs a higher acquisition frequency n in order to reach the higher acquisition accuracy. If the initial acquisition number is large, the acquisition frequency will be too high and too much power will be consumed. On the other hand, if the initial acquisition number is low, a large number of iterations will be needed, which costs too much time. Once $F_{min}$ is fixed, choosing a suitable $n_{init}$ helps the algorithm progress.
The local original data help to find a suitable $n_{init}$. Many experiments have been carried out to evaluate the acquisition frequency n, and the mean error, maximum error and minimum error are compared.
The result in Figure 10 shows that there is a negative correlation between n and $F_{avg}$. Following the direction of this curve, a suitable $n_{init}$ can be chosen to improve the algorithm performance.

Convergence Performance Comparison of Different Algorithms
Convergence performance is very important for a genetic algorithm. Figure 11 compares the normal genetic algorithm (GA) with the adaptive genetic algorithm (AGA) proposed in this paper. It shows that the AGA performs better than the normal GA when the number of iterations is less than 80, after which the fitness of both converges to the same level.


Adaptive Analysis of Other Weather Data
The algorithm can also be applied to evaluate the humidity and illumination values. Similar to the processes of temperature data, the low-frequency acquisition data and the real data can be obtained, and comparison results are shown in Figures 12 and 13.
It shows that the algorithm has good adaptive capacity for handling monitored environment data with a high Pearson correlation relationship.


Conclusions
This paper proposes an adaptive adjustment method for the environment monitoring data acquisition strategy in order to reduce the power consumption of PMSes. The method is based on an adaptive genetic algorithm. By optimizing the daily acquisition strategy, a suitable weather curve can be fitted from data collected at a very low acquisition frequency. The experimental results in a real environment show that the algorithm can effectively optimize the acquisition strategy.
Author Contributions: F.C. conceived and designed the algorithm and drafted the paper; S.X. presided the
