A robot vision navigation method using deep learning in edge computing environment

In the development of modern agriculture, the intelligent use of mechanical equipment is one of the main signs of agricultural modernization. Navigation is the key technology that allows agricultural machinery to operate autonomously in its working environment, and it is a research hotspot in intelligent agricultural machinery. To meet the accuracy requirements of autonomous navigation for intelligent agricultural robots, this paper proposes a visual navigation algorithm for agricultural robots based on deep learning image understanding. The method first uses a cascaded deep convolutional network fused with hybrid dilated convolution to process the images collected by a vision system. It then extracts the route from the processed images with an improved Hough transform algorithm and adjusts the posture of the agricultural robot accordingly to realize autonomous navigation. Finally, the proposed method is verified in non-interference and noisy experimental scenes. Experimental results show that the method can perform autonomous navigation in complex and noisy environments and has good practicability and applicability.


Introduction
World agricultural production in the twenty-first century is shifting from traditional agriculture to modern agriculture [1,2]. Agriculture is a basic industry that underpins the national economy. Maximizing the utilization of agricultural resources, production, and development is the key measure of the level of modern agriculture [3,4]. For China, one of the yardsticks of modern agricultural production is the autonomy and intelligence of production machinery and equipment. The development of high-level intelligent agricultural machinery is an important direction for current agricultural development [5,6].
With the rapid development of electronic technology and intelligent algorithms, intelligent robots have been widely used in many fields, and their autonomy and intelligence continue to improve. Facing the demand for efficient production in modern agriculture, intelligent robots have also attracted considerable attention from agricultural researchers. As a new concept in agricultural machinery [7,8], agricultural robots offer large economic benefits in agricultural production and have broad market prospects. The timely research and development of a new generation of agricultural machinery, represented by agricultural robots, is of great significance for China's transition to modern agriculture [9,10].
At present, existing image semantic segmentation algorithms rely on very complex network models with a large number of parameters to compute and high hardware requirements. Optimizing the algorithm structure and reducing the dependence on hardware is therefore a current research focus, so that the technology can be better applied in practice.

Related work
A visual navigation system is the core device of an agricultural robot. A good visual navigation system uses intelligent algorithms to process and analyze the collected images, which helps the robot observe and understand the outside world and gives the mechanical equipment intelligence and autonomy. The robot vision system first captures a two-dimensional image of the three-dimensional external environment with an image acquisition device such as a camera. The two-dimensional images are then processed by intelligent algorithms to perform image segmentation, feature extraction, and other image understanding steps [11]. Finally, a symbolic description of the image is obtained to support the robot's decision on its next action. The workflow is shown in Fig. 1.
Many scholars have studied the visual navigation of agricultural robots and confirmed both the importance of vision systems for agricultural robots and the feasibility of practical applications [12][13][14]. Researchers at Kyoto University in Japan confirmed the feasibility of machine vision in agricultural mobile robot applications by extracting the HIS space of images. In HIS space, images were scanned with horizontal lines, and the least squares method was used to identify crop spacing [15]. Han et al. used the K-means clustering algorithm to obtain crop row spacing information and judged the accuracy of image processing through image comparison and evaluation in order to achieve agricultural tractor navigation [16]. The Akane team discussed an image processing method that classifies collected images based on grayscale histograms. In addition, different methods were used to distinguish between traversable and non-traversable areas in the farmland to realize the navigation of agricultural vehicles [17]. David and colleagues combined the global positioning system and an inertial navigation system with a robot vision system to solve the problem of autonomous navigation for agricultural robots and to support the sustainable intensification of large-scale agriculture [18].
Owing to its high reliability and detection accuracy, the Hough transform has been used in intelligent agricultural equipment [19][20][21]. Chen et al. addressed the effect of multiple environmental variables on machine-vision crop row recognition over the entire growth period of lettuce and green cabbage and improved the effectiveness of the crop row recognition algorithm by proposing a multi-crop row extraction algorithm based on an automatic Hough transform accumulation threshold [22]. Li et al. analyzed the principle of the Hough transform in image processing and proposed using it to process gravity and magnetic data. The linear features identified in the data correspond to information such as geological body boundaries and the planar distribution of fault structures. Calculations on a theoretical model and on actual data showed that the method extracts the boundary information of gravity and magnetic data more accurately and is robust to noise [23]. Olsen and Sogaard proposed a method that acquires RGB three-channel images by machine vision and uses the 2G-R-B operator to convert the color images into single-channel grayscale images. The horizontal position of the crop center of gravity is calculated by analyzing the images, and the least squares method is used to fit the spacing of crop positions [24]. Qun et al. designed a greenhouse robot based on machine vision that uses a watershed algorithm to segment images and convert them into binary images. Establishing the navigation path with the Hough transform significantly reduces the effect of natural light and greenhouse plastic film on image segmentation in a greenhouse environment, and the correct rate of road information extraction was 95.7% [25].
Agricultural robot research is interdisciplinary and draws on many fields. The vision navigation system plays the role of human eyes and is the precondition for the normal and stable operation of an intelligent robot. The actual agricultural production environment is complex and diverse, so the navigation accuracy required of agricultural robots is much higher than that of other industrial robots. Precise vision navigation is therefore of great significance for agricultural robots. Drawing on existing research on autonomous navigation among crops, this paper proposes a visual navigation algorithm for agricultural robots based on deep learning image understanding. The main contributions are as follows: 1) The Hough transform is improved with a subdivision algorithm, which raises the calculation efficiency of the traditional Hough algorithm and enables effective extraction of the robot path. 2) The correspondence between the image coordinate system and the actual scene coordinate system is established together with the state equation, so that the robot can adjust its posture during autonomous navigation.
The rest of this paper is organized as follows. Section 3 introduces the image processing technology of the vision system, including image segmentation and edge detection. Section 4 introduces path extraction and pose adjustment for agricultural robots. Section 5 verifies the proposed method in actual scenarios. Section 6 concludes the paper.

Image processing of farmland scenery
Efficient and good image processing is the prerequisite for agricultural robots to autonomously navigate. The main flow of image processing technology is shown in Fig. 2, which mainly includes steps such as image preprocessing, image segmentation, and feature extraction.

Image acquisition
Image acquisition is the first step in image processing. The collected images are analog images, which the vision system generally cannot process directly. This paper therefore uses a CCD image sensor in the robot vision system to convert the analog images collected by the image acquisition device into digital images and transmit them to the vision system computing center. This preserves the quality of the acquired image attributes, namely position and gray scale, which benefits subsequent image processing.

Image preprocessing
In order to provide better quality images to the vision system computing center, images are preprocessed to solve the problems of distortion and deformation caused by hardware equipment and digital-analog conversion during image acquisition and transmission.

Grayscale image
Image graying is an important method for image enhancement. It applies targeted corrections to the pixels of an image to enhance its salient features and, at the same time, expands the dynamic range and contrast so that the image appears clearer and more uniform. A piecewise linear grayscale transformation is used here to enhance the target grayscale interval and suppress the non-target grayscale intervals. With the image grayscale range set to [0, X], the linear relationship is shown in Fig. 3.
A piecewise linear transformation expands or compresses a grayscale interval by changing the coordinates of the inflection points and the slopes of the line segments.
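As an illustration of such a transformation, the following minimal Python sketch maps gray levels through a curve defined by its inflection points. The function name `piecewise_linear_gray` and the breakpoint values are hypothetical and are not taken from Fig. 3; an 8-bit image (X = 255) is assumed.

```python
import numpy as np

def piecewise_linear_gray(image, breakpoints_in, breakpoints_out):
    """Map gray levels through a piecewise linear curve given by its inflection points."""
    # np.interp interpolates linearly between consecutive (input, output) breakpoints.
    stretched = np.interp(image.astype(np.float64), breakpoints_in, breakpoints_out)
    return np.rint(stretched).astype(np.uint8)

# Hypothetical 8-bit example: stretch the target interval [80, 160] to [40, 220]
# and compress the two non-target intervals on either side.
image = np.array([[0, 80], [120, 255]], dtype=np.uint8)
result = piecewise_linear_gray(image, [0, 80, 160, 255], [0, 40, 220, 255])
```

A segment with slope greater than 1 expands its interval (here the middle one), while a slope below 1 compresses it.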

Grayscale histogram
The grayscale histogram is the simplest and most effective tool for describing the gray values of an image. It reflects how frequently each gray value occurs and is the basis of image processing.
If the gray values of the gray image h(x, y) lie in the range [0, X − 1], the histogram expression used for the equalization of h(x, y) is $\eta(g_i) = n_i / n$, $i = 0, 1, \ldots, X - 1$, where η(g_i) is the probability of gray level i, g_i is the i-th gray level, n is the total number of pixels, and n_i is the number of pixels with gray level g_i.
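The probabilities η(g_i) can be turned into an equalized image as in the sketch below. The cumulative mapping used here is the standard textbook equalization step and is an assumption, since the text only defines the histogram itself; the function name `equalize_histogram` is hypothetical.

```python
import numpy as np

def equalize_histogram(image, levels=256):
    """Equalize a gray image whose values lie in [0, levels - 1]."""
    counts = np.bincount(image.ravel(), minlength=levels)   # n_i for each gray level
    prob = counts / image.size                               # eta(g_i) = n_i / n
    cdf = np.cumsum(prob)                                    # cumulative probability
    mapping = np.rint((levels - 1) * cdf).astype(np.uint8)   # new level for each old level
    return mapping[image]

# Tiny 4-level example in which dark pixels dominate.
image = np.array([[0, 0], [1, 3]], dtype=np.uint8)
equalized = equalize_histogram(image, levels=4)
```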

Image segmentation
In actual farming scenes, the environment is complex and crops are diverse, so it is difficult to obtain ideal image segmentation results from low-level feature information alone. Deep learning has been shown to capture global feature information in images and to produce better segmentation results. In this paper, the cascaded deep residual network is improved with hybrid dilated convolution to perform image segmentation in the agricultural robot vision system, establishing a one-to-one mapping between image pixels and semantic categories.
The more layers a deep convolutional neural network has, the richer the global feature information it can extract from images [26][27][28]. However, as the network deepens, the vanishing gradients and network degradation caused by the chain rule in back propagation reduce the speed and accuracy of image segmentation. To solve this problem, we add a residual structure to the deep convolutional network, that is, a shortcut identity connection, which avoids the damage that vanishing gradients and network degradation do to segmentation in deep networks. Figure 4 shows the residual structure added to the deep network.
Let x be the input passed on from the shallower layers of the deep convolutional network and E(x) the expected output. With the shortcut connection, the input x is passed directly to the output as an initial result, so the mapping the network has to learn becomes the residual F(x) = E(x) − x, and the feature mapping is E(x) = F(x) + x. After the residual unit is added, the dimensions of the input and output are kept unchanged, and the unit adds its input element-wise to the output of the cascaded parameter layers, which keeps the input and output parameters within a reliable range. The final output is obtained through the ReLU activation function, which reduces the impact of vanishing gradients and network degradation.
We use ResNet101 as the reference network because of its greater depth and more elaborate structure. The deep residual network consists of 5 convolution modules, an average pooling layer, and a classification layer, as shown in Fig. 5. The convolution modules are conv1, conv2_x, conv3_x, conv4_x, and conv5_x. In the parameters of each convolution module, 7×7 is the size of the convolution kernel and 64 is the number of channels of the convolution kernel. Each pair of brackets denotes a residual unit, and ×3 indicates that the convolution module contains 3 residual units. Vanishing gradients and network degradation seriously affect image segmentation results. To this end, we cascade a new convolution module, conv6_x, behind the ResNet101 network to form a cascaded deep residual network. Its network structure and parameter settings are the same as those of conv5_x. We also considered adding a conv7_x module to extract global image features further, but experiments showed no improvement in semantic segmentation accuracy over the network with the cascaded conv6_x module. Therefore, as shown in Fig. 6, the final cascaded deep residual network is composed of 6 convolution modules: conv1, conv2_x, conv3_x, conv4_x, conv5_x, and conv6_x.
At the same time, dilated convolution increases the receptive field of the agricultural robot vision system while preserving image resolution [29], so hybrid dilated convolution is applied to the conv5_x and conv6_x convolution modules of the ResNet network. To avoid the influence of the "gridding" phenomenon on segmentation results, different dilation rates are set within a convolution module so that the receptive field completely covers the input feature map. Taking conv5_x as an example, the module contains 3 consecutive residual units: the dilation rate of conv5_1 is set to 1, that of conv5_2 to 2, and that of conv5_3 to 3. Because the network structure and parameter settings of conv5_x and conv6_x are the same, the dilation rates of the residual units in conv6_x are set in the same way as in conv5_x. Figure 7 is a schematic diagram of the hybrid dilated convolution structure. In summary, the proposed model improves the cascaded deep convolutional network with hybrid dilated convolution to solve the network degradation caused by too many layers, and uses the B-spline wavelet transform to detect image edges, which completes the image processing steps of the vision system and provides image data support for the subsequent autonomous navigation of the robot.
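The relation E(x) = F(x) + x can be made concrete with a small numerical sketch. Here two dense layers stand in for the convolution layers of a real residual unit, and the weights `w1` and `w2` are hypothetical; the point is only that the shortcut adds the input to the learned residual before the final ReLU.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_unit(x, w1, w2):
    """Return ReLU(F(x) + x), where F is two linear layers with a ReLU in between."""
    f = relu(x @ w1) @ w2   # residual mapping F(x); output has the same dimension as x
    return relu(f + x)      # shortcut identity connection, then activation

x = np.array([1.0, -2.0, 3.0])
zero = np.zeros((3, 3))
out = residual_unit(x, zero, zero)   # with F(x) = 0 the unit reduces to ReLU(x)
```

When the residual branch contributes nothing, the unit passes its input through the shortcut, which is why stacking such units does not by itself degrade the signal.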
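The effect of mixing dilation rates can be checked with a short one-dimensional sketch. It lists which input offsets reach a single output pixel through three stacked 3-tap dilated convolutions; the helper names are hypothetical, and the uniform rate (2, 2, 2) is included only as a comparison case that is not taken from the paper.

```python
def covered_offsets(dilation_rates, kernel_size=3):
    """Input offsets (1D) that reach one output pixel through stacked dilated convolutions."""
    half = kernel_size // 2
    offsets = {0}
    for rate in dilation_rates:
        taps = [k * rate for k in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return sorted(offsets)

def has_gridding(dilation_rates):
    """True if some pixel inside the receptive field is never sampled."""
    offsets = covered_offsets(dilation_rates)
    return len(offsets) < offsets[-1] - offsets[0] + 1

hybrid = covered_offsets([1, 2, 3])    # rates given in the text for conv5_1 to conv5_3
uniform = covered_offsets([2, 2, 2])   # same rate in every unit, for comparison
```

With rates 1, 2, 3 every offset from -6 to 6 is sampled, whereas the uniform rate of 2 reaches only even offsets over the same span and leaves holes in the receptive field.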

Multi-resolution edge detection
The B-spline wavelet transform is used to detect the outlines of large-scale regions in the images processed above, which allows a multi-resolution analysis of the image signal.
The two-dimensional image signal is transformed along the x and y directions with wavelets derived from a low-pass smoothing function ω(x, y). The transform yields two components, R1 g and R2 g, which are the gradients of the two-dimensional image along the x and y directions. The scale function and the wavelet function satisfy two-scale equations in the time domain and in the frequency domain, in which P_0 and P_1 are the filters corresponding to the scale function and the wavelet function, respectively, and are constrained by the conservation of energy in the space decomposition. In this paper, third-order B-spline wavelets (n = 4) are used, and the impulse response coefficients of P_0(z) and P_1(z) are shown in Table 1.
Due to the spatial separability of a two-dimensional image signal, the rows and columns can be separately subjected to wavelet transform according to the above algorithm to achieve multi-resolution edge detection.
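The row and column processing can be sketched at a single scale as follows. The two filters are stand-ins (a binomial low-pass filter and a central difference), not the B-spline coefficients of Table 1, and edge replication at the image border is also an assumption.

```python
import numpy as np

# Stand-in filters, NOT the coefficients of Table 1.
LOW_PASS = np.array([1.0, 2.0, 1.0]) / 4.0
HIGH_PASS = np.array([0.5, 0.0, -0.5])   # np.convolve flips it: 0.5 * (f[i+1] - f[i-1])

def filter_axis(image, kernel, axis):
    """Convolve along one axis, replicating border pixels so the size is preserved."""
    pad = len(kernel) // 2
    widths = [(0, 0), (0, 0)]
    widths[axis] = (pad, pad)
    padded = np.pad(image, widths, mode="edge")
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)

def edge_magnitude(image):
    """Gradient modulus from separable row and column filtering."""
    image = image.astype(np.float64)
    grad_x = filter_axis(filter_axis(image, LOW_PASS, 0), HIGH_PASS, 1)  # smooth in y, differentiate in x
    grad_y = filter_axis(filter_axis(image, LOW_PASS, 1), HIGH_PASS, 0)  # smooth in x, differentiate in y
    return np.hypot(grad_x, grad_y)

# Vertical step edge: left half dark, right half bright.
image = np.zeros((8, 8))
image[:, 4:] = 1.0
magnitude = edge_magnitude(image)
```

The response is nonzero only in the two columns next to the step, which is where an edge detector would place the boundary.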

Path extraction for visual navigation
The farming environment is a multi-variable, time-varying, and nonlinear complex system, which makes autonomous navigation of intelligent robots very difficult. Based on the image processing results in Section 3, an improved Hough transform is used to extract the navigation path along the crop row so that the robot's posture can be adjusted in time.

Improved Hough transform
The Hough transform works on the global characteristics of an image: the points of a straight line in the image concentrate at one point of the parameter space and form a local peak there. Detecting these peaks makes it possible to find and link line segments in the image.
The Hough transform is robust and resistant to noise, but it also requires a large amount of calculation, which affects the real-time performance of autonomous navigation. This paper therefore improves the Hough transform with the following steps:
1) Determine the parameter value range after conversion to polar coordinates. The processed image has size U × V, and the polar coordinate parameter space is (ρ, θ). The angle θ is evaluated every 2°, so the amount of calculation is 1/2 that of the traditional transform. The step is a compromise: if the polar coordinate parameters are quantized too coarsely, the clustering of votes in the parameter space is not obvious, and if they are quantized too finely, the calculation becomes cumbersome and expensive.
2) Store the sine and cosine values for 0° to 180° in an array. When a value is needed during the calculation, it is looked up directly, which is simple and quick.
3) Use a refinement (thinning) algorithm to improve the Hough algorithm. Refinement effectively reduces the amount of data after image segmentation, which reduces the calculation and shortens the calculation time.
4) Determine the parameter space peak that corresponds to the straight line in the image. First, a median filter removes noise in the parameter space. Then a few larger peak points are detected according to the phase angle and deviation characteristics of navigation. Finally, the peak point of the navigation path is determined by statistical analysis.
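The 2° step and the stored sine and cosine values can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments; the function name `hough_accumulate`, the rounding of ρ to whole pixels, and the test image are assumptions.

```python
import numpy as np

THETA_STEP_DEG = 2                                    # angular step from the text
THETAS = np.deg2rad(np.arange(0, 180, THETA_STEP_DEG))
COS_TABLE = np.cos(THETAS)                            # computed once, reused for every pixel
SIN_TABLE = np.sin(THETAS)

def hough_accumulate(binary_image):
    """Vote in (rho, theta) space for every foreground pixel of a thinned binary image."""
    height, width = binary_image.shape
    rho_max = int(np.ceil(np.hypot(height, width)))   # bound on |rho| from the image size
    accumulator = np.zeros((2 * rho_max + 1, len(THETAS)), dtype=np.int32)
    ys, xs = np.nonzero(binary_image)
    columns = np.arange(len(THETAS))
    for x, y in zip(xs, ys):
        rhos = np.rint(x * COS_TABLE + y * SIN_TABLE).astype(int)
        accumulator[rhos + rho_max, columns] += 1     # shift rows so negative rho fits
    return accumulator, rho_max

# A vertical line x = 5 in a 20 x 20 image.
image = np.zeros((20, 20), dtype=np.uint8)
image[:, 5] = 1
acc, rho_max = hough_accumulate(image)
peak_row, peak_col = np.unravel_index(np.argmax(acc), acc.shape)
best_rho, best_theta_deg = peak_row - rho_max, peak_col * THETA_STEP_DEG
```

All 20 pixels of the line vote for the same cell, so the accumulator maximum identifies the line as ρ = 5, θ = 0°.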

Obtaining navigation parameters
The navigation parameters are expressed in terms of the following quantities: L is the length of the top and bottom edges of the field of view in the actual scene, U is the distance from the camera of the agricultural intelligent device to the top and bottom edges of the field of view, and V is the width of the processed image. Once the vertices A′ and B′ are obtained in the coordinate system, the equation of the straight line A′B′ can be written, and the distance and yaw angle from the camera point to the line A′B′ then follow.
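The last step, going from the two vertices to a distance and a yaw angle, is plain analytic geometry and can be sketched as below. The coordinate convention (camera at the origin, forward direction along +y) and the sign convention are assumptions made for this example and are not stated in the paper.

```python
import math

def line_offset_and_yaw(a, b, camera=(0.0, 0.0)):
    """Signed perpendicular distance and yaw angle (degrees) of `camera` relative to line a-b."""
    (ax, ay), (bx, by), (cx, cy) = a, b, camera
    dx, dy = bx - ax, by - ay
    # Cross product of the line direction with the vector a->camera, divided by the
    # line length; positive when the camera lies to the left of the direction a->b.
    distance = (dx * (cy - ay) - dy * (cx - ax)) / math.hypot(dx, dy)
    # Angle between the line direction and the forward (+y) axis.
    yaw = math.degrees(math.atan2(dx, dy))
    return distance, yaw

# Hypothetical endpoints in millimetres: a line parallel to the heading, 30 mm to the right.
distance, yaw = line_offset_and_yaw((30.0, 0.0), (30.0, 100.0))
# A line tilted 45 degrees toward +x that passes through the camera point.
tilted_distance, tilted_yaw = line_offset_and_yaw((0.0, 0.0), (100.0, 100.0))
```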

Path extraction
The steps for extracting the navigation path of the agricultural robot with the improved Hough transform algorithm are as follows: 1) Refine the segmented images from Section 3 with a thinning algorithm; 2) Discretize the parameter space and allocate memory for each value of ρ; 3) Step θ in increments of 2° and calculate the ρ corresponding to each point (x, y) in the image; 4) Use median filtering to remove noise points in the parameter space; 5) Detect a few larger peak points according to the phase angle and deviation characteristics of navigation, and determine the peak point of the navigation path by statistical analysis.
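Steps 4) and 5) can be illustrated on a small synthetic accumulator. The 3 × 3 window, the helper names, and the accumulator values are hypothetical; the statistical analysis that selects the final peak is not specified in the text and is not reproduced here.

```python
import numpy as np

def median_filter_3x3(accumulator):
    """3 x 3 median filter built from shifted views of the edge-padded array."""
    padded = np.pad(accumulator, 1, mode="edge")
    rows, cols = accumulator.shape
    windows = [padded[r:r + rows, c:c + cols] for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0)

def top_peaks(accumulator, count=3):
    """Return (row, column) indices of the `count` largest cells."""
    flat_order = np.argsort(accumulator, axis=None)[::-1][:count]
    return [tuple(int(i) for i in np.unravel_index(k, accumulator.shape)) for k in flat_order]

acc = np.zeros((9, 9))
acc[2:5, 2:5] = 10     # broad peak of the kind left by a real line
acc[3, 3] = 12
acc[7, 7] = 50         # isolated noise spike
filtered = median_filter_3x3(acc)
candidates = top_peaks(filtered, count=3)
```

In the raw accumulator the spike would win; after filtering it is gone and the candidates all lie inside the broad peak. The example also shows a limitation: a median filter flattens sharp peaks, so a peak that occupies a single cell would be suppressed as well.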

Pose determination of robots
When an agricultural robot carries out its commanded operations, determining its own posture is the prerequisite for navigation and agricultural work. The offset angle α and offset distance γ determine the posture of the robot relative to the center line of the crop row.
Existing studies have shown that the pose adjustment of intelligent robots can be determined from the correlation between actual coordinates and image coordinates [30]. Figure 8a is a schematic diagram of the coordinates of the actual scene for the robot. The X_r axis points to the left side of the car body in the actual scene, and the Z_r axis points to the upper side along the car body center line (the car body navigation line). L_r is the center line of the crop row; γ is the robot offset distance, that is, the perpendicular distance from the camera coordinate point to L_r; α is the angle between the robot center line and the navigation line. Figure 8b is a schematic diagram of the image coordinates: the u-v coordinate system is the image coordinate system in pixels, and the x-y coordinate system is the image coordinate system in millimeters. Pixel coordinates and metric coordinates are related in matrix form using homogeneous coordinates, with σ_x and σ_y as the scale factors. Based on the Hough transform, the straight line in Fig. 8b is expressed in terms of its parameters, and the camera perspective principle relates image coordinates to actual scene coordinates through an arbitrary real number k and the angle β formed by the horizontal line of the camera and the smallest observation point on the ground. The offset angle α and offset distance γ of the agricultural robot then follow from these relations and from h, the distance between the camera and the ground.
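The relation between pixel and millimeter image coordinates can be sketched with a homogeneous matrix. The exact matrix used in the paper is not recoverable from the text, so the form below is an assumption: σ_x and σ_y are taken to be pixels per millimeter, and (u0, v0) is a hypothetical pixel position of the x-y origin.

```python
import numpy as np

def pixel_from_metric_matrix(sigma_x, sigma_y, u0, v0):
    """Homogeneous matrix mapping (x, y, 1) in millimeters to (u, v, 1) in pixels."""
    return np.array([[sigma_x, 0.0, u0],
                     [0.0, sigma_y, v0],
                     [0.0, 0.0, 1.0]])

# Hypothetical values: 4 pixels per mm and an origin at pixel (320, 240).
K = pixel_from_metric_matrix(sigma_x=4.0, sigma_y=4.0, u0=320.0, v0=240.0)
pixel = K @ np.array([10.0, -5.0, 1.0])   # (x, y) in mm -> (u, v) in pixels
metric = np.linalg.inv(K) @ pixel         # inverse mapping back to mm
```

Keeping the conversion in matrix form lets it be chained with the perspective relation between image and scene coordinates.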

Results
The experiments run on a host with a Tesla K80 GPU under Ubuntu 16.04, and the deep learning code is written with the TensorFlow framework. The camera is a Bumblebee2, a stereo vision product produced by Point Grey Research (PGR). The software environment is the Chinese edition of Windows 10 with the English edition of Microsoft Visual Studio 2012, and the main programming language is C#. This section tests the visual navigation system and analyzes the test data. The test is divided into a posture measurement error test, a non-interference navigation test, and a weed background navigation test.

Measurement and analysis for pose errors
The actual working environment is complicated, and it is difficult to measure the pose of the robot there. Because the accuracy of the robot pose calculation directly affects subsequent control actions, the pose errors are tested on simulated rice seedlings in the laboratory. The experimental design is as follows: (1) The phase angle deviation is fixed at 0, that is, the robot's median line is kept parallel to the actual direction of advance, and the robot is moved perpendicular to the direction of the seedling row. The displacement deviation range is [−40, 40] mm, data are recorded at displacement intervals of 10 mm, and the recorded data are shown in Table 2. The calculated standard deviation is 0.312 mm. (2) For the phase angle deviation, data are recorded at intervals of 5°, and the recorded data are shown in Table 3. The calculated standard deviation is 0.121°.
From the above test data, it can be seen that the results obtained by pose calculation are consistent with the measured results. And the standard deviation is small, which satisfies the measurement requirements.

Non-interference navigation test
The experiment uses plastic to simulate seedling rows and reproduces an ideal farming environment indoors. Its purpose is to verify that the navigation line is extracted correctly during image processing and thus to confirm that the visual navigation system is effective.
Two initial states of angle deviation and position deviation, (−5°, 0 mm) and (5°, −5 mm), are analyzed: the curves of the movement process are drawn and the test results are discussed.
Scenario 1: The robot motion curve for an initial angle deviation of −5° and a displacement deviation of 0 mm is shown in Fig. 9. The phase angle returns to 0° in about 1.53 s, and the displacement deviation reaches 0 mm in about 2.11 s. In the later movement, the vibration of the robot makes the deviations jitter between positive and negative values, which does not affect the overall result.
Scenario 2: The motion curve for an initial phase angle deviation of 5° and a displacement deviation of −5 mm is shown in Fig. 10. The angle deviation is large at the beginning of the movement and attenuates quickly once the movement starts. The displacement deviation reaches its peak value at about 2.61 s, and the angle deviation reaches its minimum value at about 2.34 s. After several fluctuations, both the angle deviation and the position deviation decay to 0.
The non-interference navigation test shows that the proposed method can effectively set the navigation line of the seedling row and adjust the robot posture in a timely and effective manner, which meets the accuracy requirements of autonomous navigation of agricultural robots.

Weed background navigation test
To verify the feasibility of the proposed method in actual farming scenarios, the actual environment is simulated in the laboratory with the layout shown in Fig. 11. Artificial turf is used to simulate the paddy field, the most complex of the farming environments, including duckweed and waterweed. Because the displacement deviation and phase angle deviation cannot be measured accurately in the manually laid out scenario, two random combinations are selected for experimental analysis.
Scenario 3: The motion curve for an initial phase angle deviation of −6° and a displacement deviation of −2.3 mm is shown in Fig. 12. The phase angle deviation converges to 0 and then continues to increase in the opposite direction 5 s after the movement starts. The displacement deviation converges to 0 in 2.6 s. The standard deviation of the phase angle deviation is 4.21°, and that of the displacement deviation is 5.31 mm. Because the background color is similar to the seedling color, noise remains after image processing. This causes feature point extraction and clustering errors and makes the navigation line parameters unstable. In general, however, the robot still travels along the seedling row and does not step on the seedlings.
Scenario 4: The motion curve for an initial phase angle deviation of 2.3° and a displacement deviation of 8.12 mm is shown in Fig. 13. Because the displacement deviation is large relative to the phase angle deviation, the displacement deviation converges to 0 at 3.1 s, and the phase angle deviation reaches its extreme value at 3.2 s. At 11 s, the displacement deviation increases in the positive direction, and the phase angle deviation also increases in order to correct it. Eventually, the displacement deviation converges to 0.
The navigation test with background noise shows that the proposed method can still extract the navigation line accurately when the background noise is large. The robot sets its coefficients in time and travels along the set route, which confirms the feasibility and practicability of the proposed method for autonomous navigation in complex farming environments. The proposed model improves the cascaded deep convolutional network with hybrid dilated convolution to solve the network degradation caused by too many layers in the deep network.

Results and discussion
Facing the accuracy requirements of autonomous navigation of intelligent agricultural robots, this paper proposes an agricultural robot visual navigation algorithm based on deep learning image understanding. The algorithm consists of two parts: image processing and visual navigation path extraction. Collected images are processed with a cascaded deep convolutional network and hybrid dilated convolution, which provides image data support for the subsequent autonomous navigation of the robot. In visual navigation path extraction, the Hough transform is improved with the subdivision algorithm, and the correspondence between the image coordinate system and the actual scene coordinate system is established together with the state equation so that the robot can adjust its posture during autonomous navigation. The experimental results show that the proposed method responds rapidly in both the non-interference scene and the complex noisy scene, which ensures normal and stable operation of agricultural robots. Future research will explore how well the proposed algorithm adapts to agricultural robots on the market in order to improve its scalability. A limitation is that the proposed algorithm still cannot produce very accurate segmentation results for object boundaries and small objects. To address this, deeper network structures such as ResNet152, DenseNet169, and DenseNet201 can be considered in the future, as can other deep learning techniques for image semantic segmentation, such as new variants of recurrent neural networks (RNN) and generative adversarial networks (GAN).