Autonomous Dam Surveillance Robot System Based on Multi-Sensor Fusion

Dams are important engineering facilities in the water conservancy industry. They serve many functions, such as flood control, electric power generation, irrigation, water supply, and shipping, so their long-term safety is crucial to operational stability. Because of the complexity of the dam environment, robots equipped with various sensors are a good choice to perform surveillance in place of humans. In this paper, an autonomous system design is proposed for dam ground surveillance robots, covering the general solution, electromechanical layout, sensor scheme, and navigation method. A strong and agile skid-steered mobile robot platform is designed and built, which can be controlled accurately based on an MCU and an onboard IMU. A novel low-cost LiDAR is adopted for odometry estimation. To obtain more robust localization results, two Kalman filter loops are used with the robot kinematic model to fuse wheel encoder, IMU, LiDAR odometry, and low-cost GNSS receiver data. In addition, a recognition network based on YOLO v3 is deployed to realize real-time recognition of cracks and people during surveillance. As a system, by connecting the robot, the cloud server, and the users with IoT technology, the proposed solution becomes more robust and practical.


Introduction
In the water conservancy industry, there are many fundamental engineering facilities, such as dams, water and soil conservation works, water transfer projects, shipping projects, water supply projects, hydraulic power plants, and irrigation facilities. Large dams are the most comprehensive of these facilities, often serving many functions, such as flood control, electric power generation, irrigation, water supply, and shipping [1]. Therefore, the safety of large dams is crucial to the cities and people around them.
In the past, staff had to check the environment, structure, and electromechanical facilities of dams every day. Such routine inspection is necessary to keep a dam running safely, but it cannot provide all-weather, around-the-clock monitoring, and the staff are sometimes at risk since most dams are constructed in remote rural areas. Nowadays, with the great improvement of robot technology, robots have been used in public safety, security checks, disaster rescue, and high-voltage line inspection [2][3][4][5]. Many researchers and engineers are also paying attention to utilizing underwater robots, unmanned aerial vehicles, and unmanned surface vehicles for dam surveillance and inspection.

A state machine is designed here for robot control, which includes self-check, autonomous navigation, remote control, idle, and emergency states. The robot enters the self-check state once the power is turned on; all sensors, motors, localization data, and the high-level computer are then checked. If the self-check passes, the robot enters the idle state and waits for a task command from the cloud server. Autonomous navigation is triggered once a task and its waypoints are received. The emergency state is triggered when the robot is stuck or a sensor malfunctions. The logic of the state machine is shown in Figure 2.
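The transition logic described above can be sketched as a small state machine. This is a minimal illustration; the event names (e.g. "check_passed", "task_received") and the Python structure are assumptions, not taken from the paper:

```python
from enum import Enum, auto

class RobotState(Enum):
    SELF_CHECK = auto()
    IDLE = auto()
    AUTONOMOUS_NAV = auto()
    REMOTE_CONTROL = auto()
    EMERGENCY = auto()

# Allowed transitions inferred from the description above;
# event names are illustrative placeholders.
TRANSITIONS = {
    (RobotState.SELF_CHECK, "check_passed"): RobotState.IDLE,
    (RobotState.IDLE, "task_received"): RobotState.AUTONOMOUS_NAV,
    (RobotState.AUTONOMOUS_NAV, "task_done"): RobotState.IDLE,
    (RobotState.AUTONOMOUS_NAV, "operator_override"): RobotState.REMOTE_CONTROL,
    (RobotState.REMOTE_CONTROL, "operator_release"): RobotState.IDLE,
}

def step(state, event):
    """Advance the machine; a fault from any state triggers EMERGENCY,
    and unknown events leave the state unchanged."""
    if event == "fault":
        return RobotState.EMERGENCY
    return TRANSITIONS.get((state, event), state)
```

In this sketch the emergency transition is global, matching the description that the emergency state can be triggered from any operating condition.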

Robot Platform
To realize high robustness and agile motion performance, the robot body is built on an all-aluminum chassis weighing about 10 kg in total. Two 200 W DC brushless high-torque motors are deployed to realize four-wheel drive. A 24 V, 40 Ah lithium battery is used as the power supply.
All motors are controlled by a customized MCU board, which connects to the motor drivers over a CAN bus. The maximum speed is 2 m/s, the maximum payload is 40 kg, and the maximum runtime is more than 4 h. With four off-road tires, the robot is suitable for rugged all-terrain operation. Figure 3 shows the body design.
As shown in Figure 4, for the robot body we use the forward velocity and the angular velocity in the robot frame as state variables, i.e., $[v_x\ \omega_z]^T$. From the kinematic analysis of the robot, we can find the relationship between the robot velocities and the wheel speeds.

$$\begin{bmatrix} v_x \\ \omega_z \end{bmatrix} = \frac{2\pi r}{60\,n}\begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ -\tfrac{1}{2c} & \tfrac{1}{2c} \end{bmatrix}\begin{bmatrix} W_l \\ W_r \end{bmatrix} \quad (1)$$

where $r$ is the so-called effective radius of the wheels, $n$ is the reduction ratio, $2c$ is the wheel track, and $W_l$, $W_r$ are the left and right motor speeds in rpm.

The robot body receives the velocity and angular velocity command $[v_x^d\ \omega^d]^T$ from the upper PC, and we must control the velocities to track this command. From (1), we have

$$\begin{bmatrix} W_l^d \\ W_r^d \end{bmatrix} = \frac{60\,n}{2\pi r}\begin{bmatrix} v_x^d - c\,\omega^d \\ v_x^d + c\,\omega^d \end{bmatrix} \quad (2)$$

From (2) we could calculate the desired wheel velocities directly, but the direct calculation does not work well in practice, since the model is idealized and the measured angular velocity always lags the command. Therefore, we add an onboard IMU and use the following equation to set the wheel velocities:

$$\begin{bmatrix} W_l^d \\ W_r^d \end{bmatrix} = \frac{60\,n}{2\pi r}\begin{bmatrix} v_x^d - c\,\omega_c \\ v_x^d + c\,\omega_c \end{bmatrix},\qquad \omega_c = \omega^d + k_p e_\omega + k_i \int e_\omega\,dt + k_d \frac{de_\omega}{dt} \quad (3)$$
where $k_p$, $k_i$, $k_d$ are the constants of the simple PID controller, $e_\omega = \omega^d - \omega_i$, and $\omega_i$ is the yaw-rate feedback from the onboard IMU. The practical response can be seen in Figure 5.
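The inverse kinematics of (2) and the PID correction of (3) can be sketched in code. This is a minimal illustration: the wheel radius, reduction ratio, half-track, and PID gains below are placeholder values, since the paper does not report them.

```python
import math

def wheel_rpm(v_x, omega, r=0.1, n=30, c=0.25):
    """Inverse kinematics (Eq. 2): desired left/right motor speeds in rpm.
    r: effective wheel radius [m], n: reduction ratio, 2c: wheel track [m].
    All parameter values here are placeholders, not the robot's real ones."""
    k = 60.0 * n / (2.0 * math.pi * r)   # converts m/s at the wheel to motor rpm
    w_l = k * (v_x - c * omega)
    w_r = k * (v_x + c * omega)
    return w_l, w_r

class YawRatePID:
    """Eq. (3): correct the commanded yaw rate with a PID on the IMU error."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.0, dt=0.005):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def corrected_omega(self, omega_d, omega_imu):
        e = omega_d - omega_imu               # e_w = w_d - w_i
        self.integral += e * self.dt
        deriv = (e - self.prev_err) / self.dt
        self.prev_err = e
        return omega_d + self.kp * e + self.ki * self.integral + self.kd * deriv
```

In use, the output of `corrected_omega` would be passed as `omega` into `wheel_rpm` at each control tick.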
It is simple to calculate the wheel odometry in the inertial frame. We drove the robot along a square path and back to the origin; the dead-reckoning result is shown in Figure 6. The final position estimate was $[0.1229\ {-0.2654}\ 0.03]^T$ (ground truth $[0\ 0\ 0]^T$). The total distance was about 70 m; over 10 such experiments, the RMSE (root mean square error) at the origin was 0.3135 m.
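The dead reckoning above amounts to integrating the body-frame velocities into the inertial frame. A minimal Euler-integration sketch follows; the square-path demo mirrors the experiment, but the speed, leg length, and turn timing are assumed for illustration:

```python
import math

def integrate_odometry(pose, v_x, omega, dt):
    """Dead-reckon the planar pose (x, y, yaw) in the inertial frame from
    the body-frame forward velocity v_x and yaw rate omega (Euler step)."""
    x, y, yaw = pose
    x += v_x * math.cos(yaw) * dt
    y += v_x * math.sin(yaw) * dt
    yaw += omega * dt
    return (x, y, yaw)

# Drive an ideal 10 m x 10 m square and return to the origin
# (assumed speed 1 m/s; each corner turned in place by 90 degrees).
pose = (0.0, 0.0, 0.0)
for _ in range(4):
    for _ in range(100):                                # 10 m straight leg
        pose = integrate_odometry(pose, 1.0, 0.0, 0.1)
    pose = integrate_odometry(pose, 0.0, math.pi / 2, 1.0)  # 90 deg turn
```

With perfect velocity inputs the loop closes exactly; the real experiment's 0.3135 m RMSE reflects wheel slip and measurement noise that this idealized sketch does not model.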

Sensors
The experimental robot with sensors is shown in Figure 7. From front to back, there are a GNSS receiver, a stereo camera, a Livox LiDAR, a monocular camera, a 4G LTE wireless router, an industrial computer, and a GNSS antenna. An onboard MCU controller with an IMU is located inside the body box. Motion control is handled by the onboard MCU, which also sends odometry data to the high-level industrial computer. Navigation is handled by the computer, which is connected to the cloud server.
A navigation map is built mainly from the LiDAR and odometry data using the LOAM SLAM method; the map is regarded as prior knowledge. The monocular camera in the middle is used to transfer video streaming and to perform the people and crack detection jobs. The GNSS receiver with its antenna can acquire accurate position data in wide-open areas. In addition, the LiDAR point cloud and the cameras' depth data are planned to be used for obstacle perception in the future.

Localization and Navigation
The navigation includes perception, localization, path planning, and a remote controller; the whole diagram is depicted in Figure 8. A standard 2D cost grid map is used for general environment representation, and the filtered point-cloud and depth data are projected onto the map as obstacles.
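Projecting filtered points into a cost grid can be sketched as follows. The grid size, cell resolution, and lethal-cost value below are assumptions for illustration, not the system's actual parameters:

```python
def mark_obstacles(points, resolution=0.05, size=200):
    """Rasterize filtered (x, y) points [m] into a size x size cost grid
    centred on the robot; resolution is metres per cell (values assumed)."""
    grid = [[0] * size for _ in range(size)]
    half = size // 2
    for x, y in points:
        i = int(round(x / resolution)) + half
        j = int(round(y / resolution)) + half
        if 0 <= i < size and 0 <= j < size:   # ignore points outside the map
            grid[i][j] = 100                  # lethal-cost cell
    return grid
```

A real costmap would additionally inflate obstacles by the robot's footprint so the planner keeps a safe clearance.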
The path planning method is divided into two parts. The global waypoints, pre-defined for each job, are sent to the robot by the cloud server. Local motion planning is achieved by the DWA (dynamic window approach) algorithm on the grid map.
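A minimal sketch of the DWA idea: sample velocity pairs inside the dynamic window, forward-simulate each over a short horizon, and keep the best-scoring one. Real DWA also scores obstacle clearance and velocity; the acceleration limits, horizon, and goal-distance-only cost here are illustrative simplifications (only the 2 m/s speed cap comes from the platform description above):

```python
import math

def dwa_choose(v_cur, w_cur, goal, horizon=1.0, dt=0.1,
               a_max=0.5, alpha_max=1.0):
    """Pick (v, w) from a coarse dynamic window by simulating a unicycle
    model and scoring the final distance to the goal (illustrative only)."""
    best, best_cost = (0.0, 0.0), float("inf")
    steps = round(horizon / dt)
    for dv in (-a_max, 0.0, a_max):
        for dw in (-alpha_max, 0.0, alpha_max):
            v = max(0.0, min(2.0, v_cur + dv * dt))   # 2 m/s platform limit
            w = w_cur + dw * dt
            # forward-simulate the unicycle model over the horizon
            x = y = th = 0.0
            for _ in range(steps):
                x += v * math.cos(th) * dt
                y += v * math.sin(th) * dt
                th += w * dt
            cost = math.hypot(goal[0] - x, goal[1] - y)
            if cost < best_cost:
                best, best_cost = (v, w), cost
    return best
```

With the goal straight ahead, the sketch accelerates and keeps the heading; a production planner would sample the window much more densely and add clearance and smoothness terms to the cost.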
Localization is one of the most difficult problems for dam robots, since the environment changes and the robot has to move both indoors and outdoors. A single sensor cannot provide a reliable position, while multiple sensors introduce a fusion problem. The extended Kalman filter (EKF) is a good way to solve this problem and has been deployed in many robot localization applications.

A structure with two EKF nodes is proposed here to build a robust localization system. The wheel odometry data and the IMU measurements feed the local EKF prediction step, and the LiDAR odometry feeds the local EKF update step. The output of the local EKF node feeds the global EKF prediction step, while the GNSS data, transformed into the local-origin frame, and the yaw angle from the LiDAR odometry feed the global EKF update step. Note that the local EKF node runs on the embedded MCU at 200 Hz and estimates a 16-state model in 3D space.
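The cascade of the two nodes can be illustrated with a deliberately simplified scalar Kalman filter standing in for each EKF (the real local node has 16 states and the global node estimates a 2D pose). All rates, noise values, and the simulated measurements below are made up for the demo:

```python
class ScalarKF:
    """1-D Kalman filter used as a stand-in for each EKF node."""
    def __init__(self, x0=0.0, p0=1.0, q=0.01, r=0.25):
        self.x, self.p, self.q, self.r = x0, p0, q, r

    def predict(self, u):
        self.x += u                      # motion increment (odometry / IMU)
        self.p += self.q                 # process noise grows uncertainty

    def update(self, z):
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)

# Cascade: the local node runs at the high rate; the global node takes the
# local estimate as its prediction input and corrects with (simulated) GNSS
# arriving at a lower rate, mirroring the 200 Hz / 30 Hz split above.
local, glob = ScalarKF(), ScalarKF(r=1.0)
truth = 0.0
for step in range(200):
    truth += 0.05                        # robot moves 5 cm per tick
    local.predict(0.05)                  # wheel odometry / IMU increment
    local.update(truth + 0.01)           # LiDAR odometry (slightly biased)
    glob.predict(local.x - glob.x)       # feed local estimate forward
    if step % 7 == 0:                    # GNSS arrives at a lower rate
        glob.update(truth)
```

The point of the sketch is the data flow, not the filter itself: high-rate dead reckoning keeps the estimate smooth between the sparse absolute fixes that pull it back toward the truth.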
It is important to get 3D pose information on the rugged ground surface; however, the global EKF node runs at 30 Hz on the computer and produces only a 2D pose. The robot position and orientation in the map are acquired from the global EKF update. The fusion localization structure is shown in Figure 9, where $W_l^d$, $W_r^d$ are the wheel velocity commands; $W_l$, $W_r$ are the wheel velocities fed back from the encoders; $v_x^d$, $\omega^d$ are the velocity commands generated by motion planning; $v_x$, $\omega_z$ are the wheel odometry outputs; $x$, $y$, $yaw$ are the LiDAR odometry; and $X$, $Y$, $Yaw$ are the final global pose estimate.

With the kinematic models and zero-velocity constraints, we are now ready to design a kinematic-model-based robot positioning scheme for the local EKF node. The general state update equation is

$$x_t = f(x_{t-1}, u_t) + w_t$$

where $x_t = [q_0\; q_1\; q_2\; q_3\; x\; y\; z\; V_x\; V_y\; V_z\; \delta\phi_{bias}\; \delta\theta_{bias}\; \delta\psi_{bias}\; \delta V_{x,bias}\; \delta V_{y,bias}\; \delta V_{z,bias}]^T$ is the state vector; $q_0$–$q_3$ are the quaternion components; $x$, $y$, $z$ are the position in the inertial frame; $V_x$, $V_y$, $V_z$ are the corresponding velocities; $\delta\phi_{bias}$, $\delta\theta_{bias}$, $\delta\psi_{bias}$ are the rotation zero-offsets; and $\delta V_{x,bias}$, $\delta V_{y,bias}$, $\delta V_{z,bias}$ are the acceleration zero-offsets. We have the following detailed state update equations,
where g x (k)g y (k)g z (k) are the components of gravity.
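One prediction step of such a quaternion-based state model might look like the following sketch: integrate the bias-corrected gyro rates into the quaternion, rotate the bias-corrected acceleration into the inertial frame, remove gravity, and integrate velocity and position. The conventions (z-up inertial frame, Hamilton quaternions, biases held constant during the step) are assumptions, not the paper's exact equations:

```python
import math

def quat_mult(a, b):
    """Hamilton quaternion product (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def rotate(q, v):
    """Rotate a body-frame vector v into the inertial frame: q v q*."""
    qv = (0.0, *v)
    w, x, y, z = quat_mult(quat_mult(q, qv), (q[0], -q[1], -q[2], -q[3]))
    return (x, y, z)

def propagate(q, pos, vel, gyro, accel, gyro_bias, accel_bias, dt, g=9.81):
    """One prediction step of the 16-state model (biases held constant)."""
    # bias-corrected angular rate -> small-angle quaternion increment
    wx, wy, wz = (gyro[i] - gyro_bias[i] for i in range(3))
    dq = (1.0, 0.5 * wx * dt, 0.5 * wy * dt, 0.5 * wz * dt)
    q = quat_mult(q, dq)
    norm = math.sqrt(sum(c * c for c in q))
    q = tuple(c / norm for c in q)       # re-normalize the quaternion
    # bias-corrected acceleration, rotated to inertial frame, gravity removed
    a_body = tuple(accel[i] - accel_bias[i] for i in range(3))
    ax, ay, az = rotate(q, a_body)
    az -= g                              # subtract the g_z gravity component
    vel = (vel[0] + ax * dt, vel[1] + ay * dt, vel[2] + az * dt)
    pos = tuple(pos[i] + vel[i] * dt for i in range(3))
    return q, pos, vel
```

For a level, stationary robot the accelerometer reads +g along z, so the gravity subtraction leaves the velocity and position unchanged, which is the zero-velocity behaviour the text relies on.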

The global EKF node is designed in the same way; the details are omitted here. The localization result in the practical environment can be found in Figure 10, where the body coordinate frame represents the filtered pose of the robot. The robot drove a loop in the practical environment. We can see that localization performs well even when the GNSS receiver occasionally loses lock, and the result is ready for navigation.

Localization and path planning can still fail occasionally, even with so many sensors deployed. For this reason, we designed a remote controller module: users can control the robot remotely with real-time video streaming.

Figure 11 shows an experimental environment for a surveillance job, captured by the camera on the robot. In this environment the road is narrow and the GNSS signal is weak because of the many trees; nevertheless, the proposed robot system works well. The practical results can be seen in Figure 12: the goal and the blue line are the planned global path, and the green line is the local path planning result. We can see that the robot localizes itself well and correctly finds where to go.

Environment Inspection
Dam crack identification and pedestrian detection are key technologies for the safety of the dam. We use a trained YOLO v3 model [31] to recognize them: the network is trained on an online dataset and then used to recognize dam cracks. The YOLO v3 network consists of a Darknet-53 backbone, eight DBL components, three convolutional layers, two upsampling layers, and two tensor concatenation layers. Darknet-53 includes a DBL component and five residual stages: res1, res2, res8, res8, and res4.
The DBL component contains a convolutional layer, a batch-normalization (BN) layer, and a leaky ReLU. Its convolutional parameters are kernel size 3 × 3, stride 1, same padding, and 32 output channels. The res1, res2, res8, and res4 stages contain 1, 2, 8, and 4 residual units, respectively, and each residual unit contains two DBLs and an identity mapping. The first DBL of a unit uses kernel size 1 × 1 and stride 1, with output channels equal to the number of residual units; the second uses kernel size 3 × 3, stride 2, same padding, and twice as many output channels.

The three convolutional layers are CConv1, CConv2, and CConv3, with filter sizes 512, 256, and 128, respectively. Each contains six convolutional operations that alternate between two settings: the odd-numbered operations use a 1 × 1 kernel with output channels equal to the filter size, and the even-numbered operations use a 3 × 3 kernel with output channels equal to twice the filter size. The upsampling layers Upsample1 and Upsample2 resize the input feature maps to match the size of the corresponding concat-layer inputs.
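The alternating channel pattern of a CConv block can be tabulated with a small helper. This is pure bookkeeping following the description above, not an implementation of the network:

```python
def cconv_channels(filter_size, ops=6):
    """Output-channel sequence of one CConv block as described: odd-numbered
    operations are 1x1 convs with `filter_size` output channels, even-numbered
    operations are 3x3 convs with twice that."""
    return [filter_size if i % 2 == 0 else 2 * filter_size
            for i in range(ops)]

# The three detection branches use filter sizes 512, 256, and 128.
branches = {name: cconv_channels(f)
            for name, f in (("CConv1", 512), ("CConv2", 256), ("CConv3", 128))}
```

For CConv1 this yields the sequence 512, 1024, 512, 1024, 512, 1024, which matches the six operations described in the text.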
The network structure is shown in Figure 13. This method can run detection at 10 Hz with less than 20% CPU usage. Some results are shown in Figures 14 and 15.

Conclusions
In this paper, a general robot system is proposed for dam surveillance. The robot is connected to cloud servers and terminal users through the mobile internet and an IoT network. For the robot itself, this paper introduces the mechanical layout, sensor selection, and navigation method. A simple controller and a wheel odometry calculation are proposed and achieve good performance. A two-node EKF localization framework is proposed to solve the localization problem, fusing LiDAR SLAM, wheel odometry, IMU, and GNSS signals. For unexpected circumstances, a remote controller based on real-time video streaming is deployed as an emergency supplement, and a control state machine is introduced to keep the whole system able to work at all times. A YOLO v3 network is trained and deployed to detect dam cracks and nearby people. Practical experiments show that the system works well and is capable of the surveillance job for dams. In future work, we will pay more attention to specific dam surveillance tasks, such as intrusion detection and dam deformation detection. We believe robots will greatly improve work efficiency in the water conservancy industry.
