Intelligent Control of Agricultural Irrigation Based on Reinforcement Learning

Traditional agricultural irrigation generally relies on flood irrigation and manual control, under which the effective utilization rate of water is only 20%–35%. With the rapid development and application of sensor technology, wireless communication, reinforcement learning and deep learning, and intelligent terminals, intelligent irrigation control that integrates these technologies to improve the utilization efficiency of water resources has become an inevitable trend and a fundamental requirement for the development of precision agriculture and facility agriculture. This paper proposes an intelligent control method for agricultural irrigation based on reinforcement learning. A deep network is constructed to extract features from raw sensor data and build the state representation for Q-learning; by exploiting the powerful data-fitting capability of deep reinforcement learning, the precision of agricultural irrigation control can be effectively improved. The effectiveness of the method is verified by training and testing the algorithm in a greenhouse plantation of a company in Hunan.


Introduction
At present, water shortage has become a worldwide problem, and the safe, efficient, and rational use of water resources has become a focus of global attention. Therefore, strengthening the management and rational allocation of water resources, and realizing intelligent irrigation management and intelligent irrigation decision-making, have important strategic significance for improving the efficiency of water resource utilization, alleviating water shortages, and achieving sustainable agricultural development.
Recently, smart irrigation has been proposed as a way to promote agricultural modernization, reduce water waste, and greatly increase food production. In particular, smart irrigation systems have introduced many advanced computer and information technologies (such as the Internet of Things, artificial intelligence, and cloud computing) into agricultural production. With the gradual popularization of 5G technology, the Internet of Things (IoT) and artificial intelligence are the two core technologies for building intelligent irrigation systems: the IoT is mainly used to automatically collect agricultural data and transmit it to a data center, while artificial intelligence technologies (such as reinforcement learning) are used to analyze the data for intelligent decision-making. With the emergence of machine learning, experts in many fields have used computers to build a large number of control models, but due to certain defects of traditional machine learning algorithms, these models often fail to achieve good decision-making results. Since 2009, reinforcement learning has attracted growing attention. Although its theoretical research is still in its early stages, it has been applied in many directions, showing great potential in typical decision-making scenarios such as board and card games and autonomous driving; it also brings a new direction of innovation to the intelligent control of agricultural irrigation.

Deep Convolutional Neural Network
In 2012, Alex Krizhevsky proposed the deep convolutional neural network model AlexNet, which won the ILSVRC 2012 competition by a significant margin, far ahead of the second-place error rate. AlexNet established the dominance of deep convolutional neural networks in computer vision and also promoted the expansion of deep learning into areas such as speech recognition, natural language processing, and reinforcement learning.
The structure of the AlexNet network is shown in Figure 1. Because two GPUs were used during training, the components in the structure diagram are split into two parts, and the structure and parameters of the two parts are identical.
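Since Figure 1 is not reproduced here, the layer geometry of AlexNet can be checked with a short calculation. The following sketch is an illustration only, not code from the paper: it traces the feature-map sizes through AlexNet's five convolutional layers using the standard formula (W − K + 2P)/S + 1, with the published layer parameters and a 227×227×3 input.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pool layer: (W - K + 2P) / S + 1."""
    return (size - kernel + 2 * pad) // stride + 1

# AlexNet conv stack, with 3x3/stride-2 max pooling after conv1, conv2, conv5.
size = 227                              # input is 227x227x3
size = conv_out(size, 11, stride=4)     # conv1, 96 filters  -> 55
size = conv_out(size, 3, stride=2)      # max pool           -> 27
size = conv_out(size, 5, pad=2)         # conv2, 256 filters -> 27
size = conv_out(size, 3, stride=2)      # max pool           -> 13
size = conv_out(size, 3, pad=1)         # conv3, 384 filters -> 13
size = conv_out(size, 3, pad=1)         # conv4, 384 filters -> 13
size = conv_out(size, 3, pad=1)         # conv5, 256 filters -> 13
size = conv_out(size, 3, stride=2)      # max pool           -> 6
flat = size * size * 256                # 6*6*256 = 9216 inputs to the first FC layer
```

The flattened 9216-dimensional vector then feeds the fully connected layers (4096, 4096, and the 1000-way output).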

Approximating the action value function with a convolutional neural network
The approximation (fitting) of the value function can be parametric or non-parametric, and parametric approximation is further divided into linear and nonlinear approximation. Here the parametric approximation method is introduced using the state value function v(s). Linear approximation refers to the case where the value function can be specified explicitly, as in formula (1), where θ is the parameter vector of the value function to be approximated and φ(s) is called the basis function:

v̂(s; θ) = θ^T φ(s)    (1)

Commonly used basis functions include polynomial functions, Fourier basis functions, and radial basis functions. Given the form of the value function, the stochastic gradient descent method can be used to update the parameter θ:

θ ← θ + α [U − v̂(s; θ)] ∇_θ v̂(s; θ)    (2)

In formula (2), U represents the update target and α is the step size; in different reinforcement learning methods, U takes different forms. For the MC method the update target is the return, U = G_t, so the parameter update formula is

θ ← θ + α [G_t − v̂(S_t; θ)] ∇_θ v̂(S_t; θ)

while for the TD(0) method the update target is the bootstrapped estimate U = R_{t+1} + γ v̂(S_{t+1}; θ), so the parameter update formula is

θ ← θ + α [R_{t+1} + γ v̂(S_{t+1}; θ) − v̂(S_t; θ)] ∇_θ v̂(S_t; θ)

The advantage of linear approximation is that there is a single optimum, so the method can converge to the global optimum. However, when applying reinforcement learning to practical problems, it is difficult to find a suitable basis function. With the rapid development of deep neural networks, their powerful fitting ability has achieved great results in many fields, so DQN introduced a deep convolutional neural network to approximate the action value function; approximating the action value function with a neural network is a form of nonlinear approximation. Here the action value function is written q(s, a; θ), where θ represents the weights of each layer of the neural network, so updating the action value function actually amounts to updating the parameter θ. The network structure used by DQN is three convolutional layers followed by two fully connected layers.
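As a concrete illustration of formulas (1) and (2), the sketch below fits a linear value function v̂(s; θ) = θ^T φ(s) with the TD(0) update on a toy two-state chain. The chain, step size, and one-hot basis functions are assumptions chosen purely for illustration, not part of the paper's system.

```python
import numpy as np

# Toy Markov reward process: state 0 -> state 1 -> terminal,
# reward 0 on the first transition, reward 1 on the second, gamma = 0.9.
# True values are therefore v(1) = 1 and v(0) = 0.9 * 1 = 0.9.
gamma, alpha = 0.9, 0.1
phi = np.eye(2)              # one-hot basis functions phi(s)
theta = np.zeros(2)          # parameters of v_hat(s; theta) = theta . phi(s)

def v_hat(s):
    return theta @ phi[s]

for _ in range(500):
    # TD(0) update: theta += alpha * (U - v_hat(s)) * grad,
    # where the gradient of a linear v_hat is just phi(s).
    theta += alpha * (0.0 + gamma * v_hat(1) - v_hat(0)) * phi[0]
    theta += alpha * (1.0 - v_hat(1)) * phi[1]   # next state is terminal: U = r
```

After a few hundred sweeps, θ approaches the true values (0.9, 1.0), showing that for this linear case the TD(0) update converges to the global optimum, as the text states.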

Training the agent using experience replay
Using a neural network to approximate the value function was already studied in the 1990s, but scholars found at the time that the value function approximated by a neural network often showed instability and non-convergence. The fundamental reason is that training a neural network requires the training data to be independent and identically distributed, whereas the data collected through reinforcement learning is highly correlated, so training on it directly often fails to converge. Drawing on the working principles of the human hippocampus, DQN designed the experience replay method to train the neural network.
During the agent's learning process, the data obtained by interacting with the environment is first stored in a replay memory, and batches are then drawn from it by random sampling and fed to the neural network. This technique effectively breaks the correlation between the data.
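The experience replay mechanism described above can be sketched as follows (the class and field names are illustrative, not from the paper): transitions are stored in a fixed-capacity memory, and training batches are drawn by uniform random sampling, which breaks the temporal correlation between consecutive samples.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)  # oldest transitions dropped first

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform random sampling decorrelates the training batch
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

# usage: store transitions while interacting, then train on random minibatches
buf = ReplayBuffer(capacity=1000)
for t in range(100):
    buf.push(t, t % 8, 0.0, t + 1, False)
batch = buf.sample(32)
```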

Test analysis
The above algorithm was verified using the data and site of an agricultural development company in Hunan. The company focuses on growing grapes and has more than 200 greenhouses. A complete online monitoring sensor suite is installed in the greenhouse: the soil moisture x1(t) is collected by a sensor every 1 s, the leaf water potential x2(t) is measured by a plant water potential meter, the leaf stomatal conductance x3(t) is measured by a dynamic porometer, and the solar radiation x4(t), soil humidity x5(t), soil humidity x6(t), relative humidity x7(t), air temperature x8(t), and wind speed x9(t) are measured by the automatic monitoring system. The DQNAgent network parameters are set as shown in Table 1. Since the DQN algorithm is only applicable to discrete action spaces while the irrigation control quantity is a continuous value, the action space is discretized here; the maximum water flow of the automatic irrigation device is 10 L/s. In one of the greenhouses, using 1 min of data as a training sample, a total of 1210 hours of data were used for training, for a total of 72,600 steps. The training results are shown in Figures 3 and 4. The curves in Figure 3 show how the Q values of the 8 discrete actions vary as the number of training steps increases. As shown in Figure 4, DQNAgent obtained its maximum cumulative return at about 9,000 interaction steps, but in a complex environment such as agricultural irrigation, the maximum cumulative return does not always represent the optimal strategy. In the subsequent training, the cumulative return of DQNAgent fluctuated greatly, while the Q values of the 8 discrete actions continued to rise and stabilized at about 55,000 steps, which shows that DQNAgent learned a more complete action distribution and that the exploration strategy fully explored the environment.
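The discretization step described above can be sketched as follows: the continuous irrigation flow of 0–10 L/s is mapped onto the 8 discrete DQN actions mentioned in the text. The exact spacing used in the paper is not given, so an evenly spaced grid is assumed here for illustration.

```python
import numpy as np

MAX_FLOW = 10.0    # maximum flow of the automatic irrigation device, L/s
N_ACTIONS = 8      # number of discrete actions used by DQN

# evenly spaced discrete flow levels (an assumption, for illustration only)
action_flows = np.linspace(0.0, MAX_FLOW, N_ACTIONS)

def flow_to_action(flow):
    """Map a continuous flow command to the index of the nearest discrete action."""
    return int(np.argmin(np.abs(action_flows - flow)))

def action_to_flow(action):
    """Recover the flow value commanded by a discrete action index."""
    return float(action_flows[action])
```

With this mapping, the DQN output (an action index from 0 to 7) can be translated directly into a valve flow setting, and logged flow commands can be converted back into training actions.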

Conclusion
This paper uses deep reinforcement learning to propose an end-to-end framework for agricultural irrigation decision-making. Experiments show that the algorithm framework based on deep reinforcement learning designed in this paper can complete irrigation decision-making well in a greenhouse cultivation environment, without relying on traditional manually observed, experience-based decision rules or a complicated debugging process. It can adapt to the needs of agricultural irrigation control in various environments, improve the accuracy of irrigation control, and save water resources.