A full freedom pose measurement method of industrial robot based on reinforcement learning algorithm

Abstract: To improve the efficiency of robot operation in industrial automation, a full-freedom pose measurement method for industrial robots based on a reinforcement learning algorithm is proposed. According to the characteristics of a two-wheel independently driven industrial robot, the attitude of the robot in its three moving modes in unconstrained space is calculated. The full-degree-of-freedom pose measurement algorithm is then built on a multi-agent reinforcement learning algorithm based on particle swarm optimization (PSO-QL). The experimental results show that the proposed method has low measurement error, high measurement efficiency, a high success rate in grabbing and obstacle avoidance, and a good application effect.


Introduction
The International Federation of Robotics (IFR) defines a robot as a semi-autonomous or fully autonomous machine that can complete work beneficial to human beings. It is called an industrial robot when it is applied to the production process, a special robot when it is applied to a special environment, and a service robot when it is applied to the family or to direct service (Xue et al. 2017). As an important part of high-end manufacturing equipment, the industrial robot has high technological added value and a wide application range. It is an important supporting technology of advanced manufacturing and an important production equipment of the information society in China. It will be of great significance to future production and social development and to the enhancement of military and national defense strength. It is expected to become another strategic emerging industry after automobiles, aircraft and computers (Ying et al. 2017). In China, the reform of state-owned enterprises is being vigorously promoted and a market economy is being implemented. Most industrial enterprises are undergoing restructuring. Advanced manufacturing technology is changing the traditional mode of production. The key to technological transformation is to improve the level of flexible production at the grass-roots level. Many enterprises are carrying out product and production transformation, which provides a market for robots (Jason et al. 2018). In order to obtain high-quality, high-efficiency and low-cost products, enterprises will choose between human labor and robots without hesitation.
Industrial robots are used in the field of industrial automation. They are usually six-DOF articulated or parallel robots with a fixed base (Luo et al. 2016). Many other kinds of robots are applied in other fields, but the technology for measuring the full-freedom pose of robots is largely universal across fields and can be used for reference. In recent years, with the continuous development of industrial robot technology and the continuous expansion of its application scope, some intelligent methods have been widely used in the full-freedom pose measurement of industrial robots (Jamshid et al. 2018). For example, Khatib uses potential field forces to solve the pose measurement problem and proposes the artificial potential field method, which defines a repulsive force between the robot and the obstacle and an attractive force between the robot and the target point, and plans a smooth path from their combined effect; MengZhong Jie et al. study the use of visual servo technology to measure the robot's pose under the condition of large target size and limited field of vision; and the localization of a mobile robot in constrained space is studied by Sungon Lee et al. (David and Josh 2017). Although the last method can measure the pose of the mobile robot, the measurement is slow and not accurate enough. More generally, although these methods can measure the robot pose, most of them cannot interact with the environment and learn independently to adapt to complex and changeable pose information, and their pose measurement period is too long.
Reinforcement learning (RL), as an online, unsupervised machine learning method, takes the feedback of the environment as its input and learns to select the optimal action to achieve its goal (Li et al. 2019). Since the end of the 1980s, great progress has been made thanks to breakthroughs in mathematical theory, and RL has become a hot direction in the field of machine learning. Reinforcement learning can be applied to any task involving sequential behavior, mainly in the fields of limited resource scheduling, robot measurement and control, and chess games (Peter 2016; Michael and Nikolaus 2018). Therefore, for pose measurement, this paper uses a multi-agent reinforcement learning algorithm based on particle swarm optimization (PSO), referred to as PSO-QL, to calculate the pose of the industrial robot from the effective attitude sensor signal, so as to improve the measurement speed and accuracy.

The full freedom pose measurement method of industrial robot based on reinforcement learning algorithm

2.1 Individual description of industrial robot
The industrial robot is driven by two independent stepping motors, and its plan view is shown in Fig. 1. Here, M0 is the center point of the robot, M1 and M2 are two smooth supporting points, R is the radius of the driving wheel, aq is the vertical distance between the two-wheeled contact point and the center point, a1 is the horizontal distance between the two-wheeled contact point and the center point, and a' is the distance between the two-wheeled contact point and the center point. The robot uses two stepping motors as drivers, and each driver is limited to two driving modes, forward rotation and stop, with no reverse capability. The robot therefore has only three moving modes: ① only the upper wheel rotates forward; ② only the lower wheel rotates forward; ③ the upper and lower wheels rotate forward at the same time.
Since any pose problem in constrained space can be decomposed into pose problems in local unconstrained space, the following assumptions are made for unconstrained space: the industrial robot driven independently by two wheels satisfies the conditions of a nonholonomic system; the wheels roll on the experimental platform without slipping; the contact deformation between the wheels and the experimental platform is ignored; and the friction between the supporting points and the experimental platform is ignored. When the CPU gives a pulse, the robot moves immediately without delay; when no pulse is given, the robot stops immediately without inertia (Gao et al. 2016).

Mobile mode of industrial robot
The attitude description of the industrial robot in unconstrained space is shown in Fig. 2.

Fig. 2 Posture description of industrial robots. Note: (1): $\mu_n$; (2): wheels on top; (3): wheels on bottom
When mode ① is adopted, the robot's attitude at time $n_1$ is given by equation (1), in which the step angle of the stepping motor and the attitude of the robot at time $n_0$ appear as parameters. When mode ② is adopted, the robot's attitude at time $n_1$ is given by equation (2). When mode ③ is adopted, the robot's attitude at time $n_1$ is given by equation (3).
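As an illustration of the three moving modes, the following Python sketch advances a planar pose (x, y, heading) by one motor step under a generic differential-drive model with pure rolling. It is not a reconstruction of equations (1) to (3): the function name `step_pose`, the parameter `half_track` (half the distance between the wheel contact points), the mapping of the upper wheel to the left side, and the midpoint-heading approximation are all assumptions made for the example.

```python
import math

def step_pose(x, y, heading, mode, wheel_radius, step_angle, half_track):
    """Advance a planar pose by one motor step (hypothetical helper).

    mode 1: only the upper wheel rotates forward
    mode 2: only the lower wheel rotates forward
    mode 3: both wheels rotate forward at the same time
    Assumption: the upper wheel is the left wheel when the robot faces +x.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    arc = wheel_radius * step_angle          # distance rolled by a driven wheel
    d_upper = arc if mode in (1, 3) else 0.0
    d_lower = arc if mode in (2, 3) else 0.0
    d_center = (d_upper + d_lower) / 2.0     # travel of the center point
    d_heading = (d_lower - d_upper) / (2.0 * half_track)
    mid = heading + d_heading / 2.0          # midpoint-heading approximation
    return (x + d_center * math.cos(mid),
            y + d_center * math.sin(mid),
            heading + d_heading)

# Illustrative values only: 5 cm wheel radius, 1.8 degree step, 20 cm track.
pose_both = step_pose(0.0, 0.0, 0.0, 3, 0.05, math.radians(1.8), 0.1)
pose_upper = step_pose(0.0, 0.0, 0.0, 1, 0.05, math.radians(1.8), 0.1)
pose_lower = step_pose(0.0, 0.0, 0.0, 2, 0.05, math.radians(1.8), 0.1)
```

Because neither wheel can reverse, modes ① and ② each combine a turn with a half-length advance of the center point, while mode ③ moves the robot straight ahead.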

Pose measurement based on PSO
The inertia weight was first introduced into the velocity evolution equation of the PSO algorithm at the IEEE International Conference on Evolutionary Computation in 1998. Equation (4) is the standard version of the particle swarm optimization algorithm, in which the two best-value terms represent the optimal value of a single particle and the global optimal value, respectively; T is the particle state vector; W is the particle's update velocity vector.
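For orientation, the inertia-weight form of PSO as it is commonly given in the literature can be written in this section's notation as follows. T and W are the state and velocity vectors named above. The symbols $\omega$ (inertia weight), $c_1$, $c_2$ (acceleration coefficients), $r_1$, $r_2$ (uniform random numbers in $[0, 1]$), $P_i$ (best state found by particle $i$) and $P_g$ (best state found by the whole swarm) are notation assumed for this sketch and may differ from the symbols used in equation (4).

```latex
\begin{aligned}
W_i(n+1) &= \omega\, W_i(n) + c_1 r_1 \bigl(P_i - T_i(n)\bigr) + c_2 r_2 \bigl(P_g - T_i(n)\bigr) \\
T_i(n+1) &= T_i(n) + W_i(n+1)
\end{aligned}
```

A larger $\omega$ carries more of the previous velocity forward and favors exploring new regions, while a smaller $\omega$ lets the two attraction terms dominate and concentrates the search around the best states found so far.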
In the traditional RL algorithm, a single Agent learns appropriate behavior through repeated trial and error while interacting with a dynamic environment, evaluates the actions it has taken, and obtains the optimal strategy through continued trial, error and selection (Francesco et al. 2017; Wu et al. 2016). The Q-learning method among RL algorithms is selected here. The Q-value updating equation of a single Agent is given by equation (5), where $\alpha$ is the learning rate; $R_{n+1}$ is the return of the environment at time $n+1$; $\gamma$ is the discount rate coefficient; and $A(u)$ is the behavior set. Q-learning selects the behavior with the highest H value in each state. For overly complex problems, however, Q-learning needs too much computing time to achieve the desired goal, whereas the PSO algorithm can speed up the optimization by properly adjusting the initial interval and the fitness function (Li et al. 2017). The PSO algorithm is therefore used to coordinate action selection among the multiple Agents of the QL algorithm.
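The single-Agent update can be sketched in Python with the standard tabular Q-learning rule, Q(s, a) ← Q(s, a) + α [R + γ max Q(s', a') − Q(s, a)]. The state labels, the action names, the dictionary-based table and the numeric values below are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new value."""
    best_next = max(q[(next_state, a)] for a in actions)  # greedy value of next state
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

def greedy_action(q, state, actions):
    """Select the behavior with the highest value in the given state."""
    return max(actions, key=lambda a: q[(state, a)])

# Hypothetical action set mirroring the three moving modes.
actions = ("upper", "lower", "both")
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_update(q, "s0", "both", reward=1.0, next_state="s1", actions=actions)
chosen = greedy_action(q, "s0", actions)
```

Each update touches a single table entry, which is why plain Q-learning can need many interactions on large problems and why the paper looks to PSO to accelerate it.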
The update of the H value in Q-learning can be accelerated by the PSO method. Substituting equation (5) into equation (4) yields the PSO-based update strategy of the H value, equation (6), which is expressed in terms of the current Agent-Q value. Fig. 3 shows the improved measurement algorithm for the industrial robot's full-freedom pose. If the update equations of PSO and Q-learning are applied at the same time, the second equation in equation (6) changes accordingly.
The attitude required for pose measurement of the industrial robot is obtained from Section 2.2, and the speed is calculated by the PSO-based multi-Agent Q-learning algorithm. The sensor of the industrial robot can measure its position. In the gaps between effective signals, the difference between the measured position signal and the calculated position signal is taken as the fitness function of the algorithm; at the time points of effective signals, the difference between all measured position components and the calculated values is taken as the fitness function to further optimize the coefficients of the algorithm. A maximum number of cycles is also set, and the optimization ends unconditionally once that number is reached. Combined with the conventional PSO algorithm steps, the specific steps of the industrial robot's full-freedom pose measurement algorithm are shown in Fig. 3.

[Fig. 3 flowchart steps: enter the loop; call the fitness function to calculate the fitness; call equations (2) and (3) to update the position vector; formula (5) and formula (6)]
The full-degree-of-freedom pose of the industrial robot is measured according to this algorithm process.
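The loop structure described above, a fitness based on the difference between measured and calculated position plus an unconditional stop at a maximum number of cycles, can be sketched in Python as follows. This is a plain PSO loop only: the coupling with the Q-learning update is omitted, and the function names, swarm size, coefficients and the two-dimensional target are assumptions chosen for the example.

```python
import random

def fitness(candidate, measured):
    """Euclidean distance between a calculated and a measured position."""
    return sum((c - m) ** 2 for c, m in zip(candidate, measured)) ** 0.5

def pso_fit(measured, n_particles=20, max_cycles=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize the position error with inertia-weight PSO (illustrative)."""
    rng = random.Random(seed)
    dim = len(measured)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                     # per-particle best state
    best_fit = [fitness(p, measured) for p in pos]
    g = min(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]         # global best state
    for _ in range(max_cycles):                        # unconditional stop at max_cycles
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i], measured)
            if f < best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f < g_fit:
                    g_fit, g_pos = f, pos[i][:]
    return g_pos, g_fit

# Hypothetical measured position used as the optimization target.
measured_position = (0.3, -0.2)
estimate, error = pso_fit(measured_position)
```

The returned error never increases from one cycle to the next, because the global best is only replaced when a particle finds a strictly smaller fitness.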

Experimental results
The industrial robot used in the experiment is a FANUC R-2000iB/210F, with an RS-232 port and a discrete signal I/O interface. Atmel's AT90S8515 is used as the processor. The processor has 8 KB of internal Flash, 512 bytes of SRAM and 512 bytes of EEPROM, which meets the experimental requirements of this paper. The experimental site is a factory that processes small industrial parts. The accuracy, efficiency and practical application effect of the method are verified by the experiments described in the following. Figure 4 shows the physical industrial robot.

Fig. 4 Experimental industrial robot diagram

3.1 Comparison of accuracy of pose measurement
Three methods, namely the method in this paper, the least squares method and the micro displacement cycle correction method, are used to measure the pose of the experimental industrial robot while it is running. From the pose results measured by each method, 100 results are randomly selected and divided evenly into 10 groups to compare the pose measurement errors. See Table 1 for the specific comparison results. From the data in Table 1, the mean root mean square error and the mean maximum error over the 10 groups are 0.32 mm and 0.84 mm for the proposed method, 1.95 mm and 4.88 mm for the least squares method, and 3.66 mm and 7.12 mm for the micro displacement cycle correction method. The proposed method therefore has the lowest error and the highest accuracy of the three when measuring the full-freedom pose of the industrial robot.
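The two summary statistics used in this comparison can be sketched in Python as follows: the errors are split evenly into groups, the root mean square error and the maximum absolute error are computed per group, and both are then averaged over the groups. The function name `group_error_stats` and the synthetic error values are assumptions for illustration and are not the data of Table 1.

```python
import math

def group_error_stats(errors, n_groups=10):
    """Return (mean RMSE, mean maximum error) over evenly sized groups."""
    size = len(errors) // n_groups
    groups = [errors[i * size:(i + 1) * size] for i in range(n_groups)]
    rmse = [math.sqrt(sum(e * e for e in g) / len(g)) for g in groups]
    max_err = [max(abs(e) for e in g) for g in groups]
    return sum(rmse) / n_groups, sum(max_err) / n_groups

# Synthetic signed errors in mm: 100 values, 10 per group (not measured data).
synthetic_errors = [0.1 * ((i % 10) - 4.5) for i in range(100)]
mean_rmse, mean_max = group_error_stats(synthetic_errors)
```

Reporting both values is useful because the RMSE summarizes the typical error within a group, while the maximum error exposes the worst single measurement.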

Efficiency comparison
To verify the measurement efficiency of the proposed method, each of the three methods is used to measure the pose of the experimental industrial robot 10 times, and the time consumed by each measurement is recorded and compared. Table 2 shows the details. Table 2 shows that each of the 10 pose measurements takes less time with the proposed method than with the other two methods. Calculated from the data in the table, the average time is 1.516 s for the proposed method, 3.567 s for the least squares method and 9.243 s for the micro displacement cycle correction method, which shows that pose measurement with the proposed method takes less time and is more efficient overall.
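The average times quoted above can be turned into relative factors with a few lines of Python. The three mean values are the ones reported in the text; the dictionary layout and variable names are assumptions for the example.

```python
# Mean measurement times in seconds, as reported in the text.
mean_time_s = {
    "proposed": 1.516,
    "least squares": 3.567,
    "micro displacement cycle correction": 9.243,
}

# Ratio of each method's mean time to that of the proposed method.
time_ratio = {name: t / mean_time_s["proposed"] for name, t in mean_time_s.items()}

for name, ratio in time_ratio.items():
    print(f"{name}: {ratio:.2f} x the time of the proposed method")
```

On these averages, the least squares method takes about 2.35 times as long as the proposed method and the micro displacement cycle correction method about 6.10 times as long.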

Comparison of the measured and actual pose moving paths of the robot
First, the robot's pose moving path during actual operation is recorded and saved, as shown in Figure 5. The pose moving paths measured by the proposed method, the least squares method and the micro displacement cycle correction method are then compared with the actual pose moving path, and the measurement results of the different methods are assessed from this comparison. Fig. 6 shows the robot's position and attitude movement path measured by the three methods. Comparing the measured paths in Figure 6 with the actual path in Figure 5 shows that the path measured by the method in this paper is almost identical to the actual pose moving path, while the other two methods deviate from it to different degrees. This shows that the method in this paper has a good measurement effect and higher measurement accuracy.

Fig. 6 Comparison of the movement paths of the positions and poses measured by the different methods

Comparison of grasping effect of industrial robot
To verify the effect of the method in practical application, the three methods are used to measure and record the position and posture of the industrial robot during the practical operation of grabbing small parts. According to the position and posture records measured by each method, the grabbing task is reset for the robot and 20 grabs are performed for each method. The numbers of successful and failed grabs are then compared. The comparison results are given in Table 3. Table 3 shows that the number of successful grabs is highest when the robot is set up with the proposed method, with a success rate of 90%. This shows that the measurement error of the proposed method is low and that the measured pose is closer to the actual pose when the robot grabs, so the grabbing task is realized better and the effect in actual application is better.

Comparison of obstacle avoidance effect of industrial robot under the proposed method
Since there are many obstacles in the actual working environment of industrial robots, this paper also analyzes the measurement of position and posture when the robot avoids obstacles. The three methods are used to measure and record the position and posture while the robot works. The recorded position and posture are used to set the obstacle avoidance path for the robot, and the actual obstacle avoidance effect is observed and compared.

Experimental parameter setting
An open corridor space in the dormitory of the processing plant is selected as the experimental site. Obstacles are set in the open space. The experimental parameters are set considering the limitations of the real environment. The specific parameters of the experiment are shown in Table 4.

Parameter: Value
Barrier spacing: 1 m
Total number of experiments: 30

Experimental results
According to the obstacle avoidance routes set by each method, 10 experiments are carried out for each method. The error between the obstacle avoidance positions of each method in the experiment and the actual operation of the robot is calculated, and the obstacle avoidance effect of each method is analyzed from these error results. The scene of obstacle setting is shown in Fig. 7, and the comparison of obstacle avoidance errors is shown in Fig. 8. Figure 8 shows that the obstacle avoidance error of the proposed method is the lowest and varies very little, which indicates that the obstacle avoidance position and posture measured by the proposed method are more accurate, that the obstacle avoidance effect in the real environment is better, and that the performance of the method is relatively stable.

4. Discussion
With the continuous development of artificial intelligence technology, the robot's ability to understand the environment has been strengthened, and intelligent algorithms for robot pose measurement have been widely used (Hao et al. 2018; Yuan et al. 2016). The application of reinforcement learning theory to robot pose measurement lets the robot interact with the environment through "trial and error", which increases the robot's ability to understand the environment (Sun et al. 2016). In view of the shortcomings of the existing methods, this paper mainly carries out the following research work. The advantages and disadvantages of existing reinforcement learning algorithms in solving the problem of robot pose measurement are compared and analyzed. PSO-QL is then adopted as the full-freedom pose measurement method for industrial robots in this paper (Anupa et al. 2016; Xu et al. 2016). Addressing the low efficiency and low accuracy of traditional algorithms, the proposed method improves both the efficiency and the accuracy of measurement.
The method is used to realize the pose measurement of industrial robots in dynamic and static environments. Starting from the practical problem of pose measurement, the proposed method is applied to a real robot in a real environment, and the effect and efficiency of the robot's pose measurement are obtained, which demonstrates the effectiveness of the proposed method. The measurement effect in static and dynamic environments is also tested through experiments to further verify the effectiveness of the proposed method in practical application.
Research on industrial robot pose measurement is closely related to its application environment and application background (Sui et al. 2017). The research in this paper still has shortcomings, and further work is needed. Future work can start from the practical application environment and the robot type (Cheng et al. 2016): the method can be applied to more complex real environments to test its accuracy and effect, and then to other types of robots, measuring their position and posture changes and checking the measurement errors, so as to further improve the measurement accuracy of the proposed method.

Conclusions
In this paper, a multi-Agent reinforcement learning algorithm based on particle swarm optimization is used to construct the algorithm process for full-freedom pose measurement of industrial robots. The proposed method is applied to the industrial robot of a small parts processing factory. The static and dynamic poses of the robot are measured, the measurement results are applied to the actual operation process, and the application effect is compared with that of the least squares method and the micro displacement cycle correction method. The comparison verifies the high accuracy and good practical application effect of the proposed method in actual measurement. In the future, we will continue to study this method to further reduce the actual measurement error, improve the accuracy of pose measurement, and improve the working efficiency of industrial robots.

Fig. 1 Floor plan of two-wheeled industrial robot
The size of the inertia weight balances the ability of global search and local search: when the inertia weight is large, the algorithm can continuously search new areas; otherwise, the algorithm focuses on searching the area where the possible optimal solution lies.

Fig. 3 Algorithm process of full-degree-of-freedom pose measurement for industrial robots

Fig. 5 Movement path of actual posture of robot

Table 1
Comparison of measurement errors of pose

Table 2
Time comparison of pose measurement (s)

Table 3
Comparison of grasping effect

Table 4
Experimental parameter setting