Priority-based state machine synthesis that relaxes behavior design of multi-arm manipulators in dynamic environments

This paper presents a framework governing multiple state machines (SMs) to generate behaviors for kinematically-coupled multi-arm robots. A SM is one of the most used tools to switch controllers in robotics, however, it represents only one motion at a certain time. Such a limit of representation not only restricts the potential of the kinematical redundancy but also significantly increases the number of situations we have to consider in behavior design. To relax the problem, this paper provides a distributed design scheme and an online synthesis method for SMs. The base of the framework is a task-priority-based control method for redundant robots. We expand it by introducing an abstract data structure specialized in robotic behaviors. The abstract reformulation naturally connects with SMs so that the framework realizes parallelism of sequential logic. A priority queue constructed in each control loop integrates SMs designed for each effector, and the framework automatically generates whole-body behaviors online. In dynamic simulations, the framework governs four SMs and achieves reaching tasks with a dual-arm manipulator at 1 kHz. It is shown that the loose coupling of the SMs yields reactive and flexible decisions in a dynamic environment. GRAPHICAL ABSTRACT


Introduction
A robot needs to utilize multiple local controllers dexterously, to execute various complex tasks. We then have to consider a mechanism for switching controllers, so to speak, a meta-control mechanism. Introducing such a mechanism is dual to controller modularization, which appears in the concept of the subsumption architecture [1] and the sequential composition [2]. Once we modularize controllers, we are released from great costs to design a perfect global controller. We only need to create multiple local controllers, which perform excellently just in some temporal and/or local situations. In this paper, we suppose that each local controller is designed for one specific task. Utilizing local controllers properly, the meta-control mechanism yields intelligent global behaviors as a result.
State machines (SMs) [3] are one of the most wellknown meta-control mechanisms. Besides, behavior trees [4], another one, can be rewritten into SMs. SMs are formulated as deterministic finite automata (DFA) [5] in the context of computer science. The DFA enable developers to write robot program concisely, and they connect local behaviors (controllers) by logic. There exists a trial that analyzes robotic behaviors managed by the DFA with the hybrid automata theory [6]. CONTACT  Reviewing the traditional SM-based meta-control, we can know that the SMs work well with a lot of robots with small degrees of freedom (DoF), such as vehicles and 6-DoF manipulators. In general, a robot with small DoF executes only one task at a certain time (e.g. locomotion or manipulation). This fact is finely compatible with the SMs: we can regard the robot motion as a single process here.
On the other hand, pure SMs would not be suitable for robots with large DoF. A robot with large DoF can execute multiple tasks simultaneously. Let us consider a dual-arm manipulator as an example. It can execute more than two tasks in parallel with its two arms. But unfortunately, the SMs destroy this fantastic parallelism because the SMs demand complete descriptions of robotic behaviors as a single state. Consider the dual-arm example and a situation when we want its left end-effector and right one to achieve tasks L and R, respectively. If both tasks are independent from each other, we do not need to specify the execution order. Nevertheless, the SMs strictly require us to decide the order for the behavior, as 'do L before R,' 'do L and R in parallel,' or 'do L after R,' etc. We are enforced to add unnecessary temporal logic to the tasks, in order to describe multiple tasks as a single process. Such a requirement of completeness directly causes the frame problem [7]. As the number of tasks increases, the completeness not only restricts autonomous robotic decisions but also explodes the number of possible situations we have to consider in the order of combination. That is, the pure SMs impose the curse of dimensionality [8] on us.
For these backgrounds, this research is motivated to relax the completeness of robotic behavior design. In this paper, we are going to build a framework in which the robot resolves the ambiguity of the behavior descriptions and decides task achievement orders autonomously. This challenge contributes not only to reduce workloads of robot programmers but also to realize autonomous decisions of robots with large DoF.
It is important to recognize that robotic motions are quite different from processes on a standard computer: robots always work in the real world and are constrained by the laws of physics. Some readers may think that we can apply parallel SMs [9] to this research. However, even if a task for the left arm and one of the right are independent, the two arms are physically coupled via the robot trunk. That is, the two motions (processes) interact at all times. As we investigate above, the structure of robot motions is significantly different from that of processes on a computer. We need to resolve the discrepancy between the tightness of physical coupling and the looseness of the task definition. Therefore, this paper will introduce a concept of a new data structure specialized in robot motions to combine multiple SMs with low-level feedback control laws in a light way. The concept of this research is closely related to hierarchical SMs [10,11]. Introducing hierarchy is an effective approach to reduce the complexity of SMs. We will construct a similar SM structure to arbitrate multiple SMs in Section 4. The advantage of the proposed hierarchy-like structure is that it can activate some SMs partially, while the standard hierarchy just switches lower SMs exclusively. In other words, the proposed framework allows non-Boolean activations of SMs by considering the DoFbased feasibility.
This paper is organized as follows. In Section 2, we revisit the priority-based motion control for redundant robots [12]. A data structure specialized in the robots hides behind the control algorithm, so we find and reformulate it in Section 3. Furthermore, we explicitly discuss the difference between logic and priorities there. In Section 4, we extend the control algorithm by connecting multiple SMs defined with the special data structure. Section 5 shows implementation examples for a dual-arm manipulator, with dynamic simulations. We observe that the proposed framework autonomously generates wholebody behaviors online. Finally, Section 6 concludes this paper.

A revisit to task-priority-based control
A robot with large DoF, such as a dual-arm manipulator and a humanoid, can try more than two tasks simultaneously. This convenient property is known as redundancy. The redundancy enables the robot to execute complex tasks which have multiple objectives, e.g. avoiding obstacles during manipulation, minimizing energy consumption, and satisfying joint limits. Furthermore, the redundancy also gives robustness around kinematic singularities to the robot control. On the other hand, the redundancy makes the robot control complex. Control problems for a redundant robot sometimes become under-determined, i.e. we get an infinite number of control law candidates. In this section, we revisit the taskpriority-based control [12], one of the most effective methods to resolve the redundancy.

Redundant robot kinematics
For the index i = 0, 1, . . . , N − 1, consider forward kinematics (FK) f i with respect to the ith effector where q ∈ R n is the robot configuration, and x ∈ R m i is the ith effector position. We can then consider the configuration space Q = {q} and the ith task space Although the FK (1) have strong nonlinearity due to revolute joints, the derivative forms a linear relationship called differential forward kinematics: Here, Let us assume that n ≥ m i ∀ i, in order to consider a redundant robot, which has large DoF. If J i is full row rank, namely rankJ i = m i , we can solve the linear The generalized inverse matrix J − i is defined as a matrix satisfying the following condition [13]: Note that J − i is not unique because the matrix J i has the null space. Thus, Equation (3) can represent all solutions of the inverse problem of (2). Although the following discussion does not depend on the choice of J − i , we select the pseudo-inverse J † i , one of the generalized inverse matrices, to use for simplicity. In implementation, we can use regularization methods to stabilize numerical calculation of the pseudo-inverse matrices (see discussion in [14] for more detail).

Algorithm of the priority-based control
We start to think about how we realize multiple task execution. In this problem, we assume that all tasks are represented by the desired effector velocityẋ i ∀ i, and the index i represents the priority order: the bigger the index, the lower the task priority. For example, the 0th taskẋ 0 is the most important to execute.
At first, the robot has to realize the 0th task. The following joint velocityq 0 is a general form of the solution to realize the most prioritized taskẋ 0 : In this equation, k 1 ∈ R n is an arbitrary vector, and the second term on the right-hand side represents the extra capacity for additional tasks. The matrix (I − J † 0 J 0 ) projects k 1 into the null space of J 0 orthogonally. Thanks to this projection, the vector k 1 does not affect the 0th task execution.
We next ask the robot to execute the 1st task as much as possible. If the objectives of the 0th and 1st tasks conflict, the 1st task will not be executed completely because of the prioritization law. On the other hand, the objectives of the 0th task and the 1st task may partially match. Therefore, the robot just needs to try the execution of (ẋ 1 − J 1 J † 0ẋ 0 ). The 0th control law, whose priority is higher, acts as a constraint here. Since the robot can only use the second term on the right-hand side of (5) for the 1st task, the suitable vector k 1 is derived as The pseudo-inverse matrix [J 1 (I − J † 0 J 0 )] † minimizes the Euclidean norm of the error between the taskẋ 1 and the solution J 1q , in the tangent space T x 1 X 1 . Setting , we can get a general form of the joint velocityq 1 which executes the 0th task completely and performs the 1st task as much as possible: k 2 ∈ R n is an arbitrary vector working similarly to k 1 . Iḟ q 1 still has the extra capacity for additional tasks, we can recursively apply the same task insertion until the capacity is filled. Algorithm 1 summarizes these procedures.
Break the For loop end if end for returnq

Reformulation with an abstract data structure
In the previous section, we review the priority-based control, following the original paper. This section reformulates the control law by introducing a new data structure abstracting a robot configuration space.
At first, we define the formal structure of a modularized local controller. Let S and E be a set of the robot state variables (e.g. joint angles and velocities) and one of the environmental information, respectively. Definition 3.1: A prioritized local controller is a 3tuple C = (ẋ, J, p) with the following attributes: • The control velocity,ẋ : S × E → R m . In this paper, this is equivalent to a local feedback law.
shows the task space and its local coordinates used for the local controller design. • The priority of the local controller p : This definition does not specify the dimension of the task space and subsumes various controllers. It enables us to treat all local controllers as abstract data objects in a unified fashion.
We next introduce a priority queue, which is used widely in robotics, especially in motion planning [15].

Definition 3.2:
A priority queue is an abstract data type providing the following three operations: (1) Push operation adds an element to the priority queue. Having the two definitions, we can reformulate the priority-based control as shown in Algorithm 1. If the set of controller {C} is static, we can regard the priority queue Q as a kinetic priority queue [16].
Note that we do not do anything special here but change our perspective. A slight difference is that the reformulated algorithm includes the priority-based sort of local controllers. By the reformulation, the prioritybased control morphs into a well-defined procedure for the expansion. Once given a set of prioritized local controllers, Algorithm 2 governs them and produces a robot behavior. The concept of the reformulation is illustrated in Figure 1. We can find that the robot DoF has an array-like data structure. Until the array becomes full, the priority-based control law repeatedly tries to insert the local control velocities into the array via the null space projection.

What the priority means
At this point, we need to correct our understanding of the priority. We sometimes confuse the used priority with the temporal priority. The priority order shows not 'what the robot should finish first' but just 'what the robot should do NOW,' namely, the priority represents spatial orders rather than temporal ones. We can easily recognize this proposition by referring to Algorithm 2: the priorities are evaluated just in each control loop. The priority often yields a temporal order of task achievements (e.g. in situations the priority order does not change for a long time), but it is just a result. This paper surely separates them: the priority in this paper is a spatial one, and we will introduce SMs to treat the temporal order in the next section.

State machine synthesis
In this section, we construct a framework for synthesizing multiple SMs. The priority-based control manner is ready to be combined with them. The framework applies some techniques used in the reactive motion planning framework named RMPflow [17], especially controller definition in task spaces and online controller synthesis.
Before the construction, we define a SM in a suitable form for this research by modifying the definition in [5]. function is called at the beginning of each control loop. It takes a prioritized local controller selected in the previous loop, the robot state, and environmental information as arguments and returns a prioritized local controller used in the current control loop. • The current controller C current ∈ C. This must be initialized to the initial controller C 0 ∈ C so that this SM functions after booting up.
While the transition is often implemented as an eventdriven action, this paper supposes that the transition function δ is called at the beginning of each control loop. This supposition makes the transitions active; passive transitions are not suitable for this controller design. At a certain time, a SM points to only one prioritized local controller corresponding to one state in the SM. We can get the pointed controller by referring to C current of each SM. 1 Up until this point, we finish the preparation to synthesize multiple SMs. Each SM provides one prioritized local controller C current at a certain time, and the prioritybased control takes a set of controllers as an argument (see Algorithm 2). Therefore, we can design a control loop described in Algorithm 3.   Figures 1 and 2 side by side, we can command the whole view of the control framework. Here, Algorithm 3 constructs both the whole-body controller and the whole-body SM simultaneously.

Algorithm 3 The control loop
It is important to note that the proposed framework includes the hierarchical control technique [18]: it is the case when all controllers have a constant priority. The proposed framework namely provides dynamic hierarchy and autonomously activates/inactivates SMs with dynamic prioritization.
Furthermore, the framework also provides parallelism of sequential logic. This parallelism is a key to making the most of the advantages of multi-arm robots. We need to focus on preceding approaches treating parallelism. Parallel hierarchical SMs was proposed in [9] and applied to the cooperation of two robots, but cases the robots are kinematically coupled are out of the scope. Besides, temporal logic can be another candidate for parallelism representations. A linear temporal logic-based planning was presented in [19], however, the reported computational time due to mixed-integer linear programming is too large to apply to real-time planning. The proposed framework satisfies both requirements, i.e. (1) realizing parallelism of sequential tasks with resolving problems due to kinematic coupling and (2) generating robotic behaviors with few computational costs in order to deal with dynamic environments. In the next section, we can observe a benefit of the parallelism with a dual-arm manipulator. The robot avoids the whole body getting stuck even if one arm gets so.
Remark that this formulation does not consider constraints such as joint limits and avoiding self-collisions. It is because we need to focus on the main interests, the abstract data structure in kinematics and internal SM behaviors in the proposed framework. On the other hand, in real problems, we need to consider the constraints for the safety of robots and environments. One candidate is filtering generated control input using quadratic programming based on control barrier functions [20], which guarantees to satisfy the inequality constraints for systems with velocity inputs. Another approach is applying hierarchical quadratic programming [21] to SM synthesis instead of the simple projection-based technique introduced in Section 2.

Implementation examples
In this section, exemplification helps us to understand how the proposed framework works effectively. As a platform, we use the NEXTAGE upper-body humanoid [22] (see Figure 3). It has 15 DoF (6 DoF for each arm, 2 for the neck, and 1 for the waist), and we fix the two neck joints unused in manipulation tasks in the following simulations. Thus, our controller uses the remaining 13 DoF. The two arms are kinematically coupled by the waist joint. Note that the robot model written in the URDF (unified robot description format) is published online [23]. We build two SMs for each hand, so the robot simultaneously holds four SMs in total. The first SM is introduced for reaching tasks and consists of three controllers. All reaching controllers use the same control law based on the classical artificial potential method [24] and only differ in their target poses. They yields the reference velocities in its task space byẋ = x target − x, where x target ∈ R 6 is the target pose of the end-effector, which includes the position and the orientation. We have to pay attention to the singularities of the Euler angle representation in a general case. The transition requests the robot to achieve reaching tasks in the placement order of the targets. We set constant priorities for the controllers as priority baselines: 1.0 for all controllers working for the left hand and 0.99 for the right hand. Thus, the robot prioritizes the left hand reaching if the right hand reaching competes. Figure 4 visualizes this SM.
The other SM aims to avoid an obstacle. This SM has one controller based on the virtual impedance method [25], and it commands only the effector position; the orientation is not controlled. Since we will set a sphere-shaped obstacle whose radius is 0.15 m, the reference velocity is calculated bẏ where x obs ∈ R 3 is the center of the obstacle. The threshold d margin = 0.25 m defines the repellent region.
Although this SM has one controller, we can naturally rewrite it into a SM with two states by parsing the conditional branch of (8). Figure 5 visualizes the rewritten SM. We design the priority as exp(1/(0.3 − x − x obs ), so that we expect the avoidance controller to function if each hand gets closer to the obstacle (less than about 0.3 m).  The proposed framework is implemented completely in C++. Choreonoid [26] provides the dynamic simulation environment with a time step 1 ms, and the framework synchronously runs. The real-time factor is 1.0 for all simulations with Intel Core i7-8750H CPU @ 2.20GHz and 16GB RAM, and one control step spends less than 36.3 μs on average. These results show that the framework can control the robot at more than 1 kHz even for real robots. All joints are controlled with high gains to follow the reference joint velocities while satisfying the joint angle limits. Note that we use the same program for control in the following two simulations. Figure 6 shows the snapshots of the robotic behavior in the free space. Only the reaching SMs are valid over this simulation because no obstacle is placed there. At first, both the robot hands reach to the targets closest to the body. They next do so to the second targets in the same way. However, the robot cannot achieve the third (farthest) reaching tasks without using the waist joint. Since we gave a rule in which the robot prioritizes the left hand rather than the right, the robot first clears the third task of the left hand. Figure 7 shows the end-effector trajectories for this simulation. Transitions of the SMs for the reaching tasks yield discrete joint velocity changes. Although this example uses the simple controllers to focus on how the proposed framework works, we can improve the convergence performance by adjusting the controller gains or applying a different controller.
On the other hand, Figure 8 shows the snapshots of the robotic behavior with a moving sphere-shaped obstacle. In contrast to the free space simulation, the proposed framework autonomously decides to tackle the feasible tasks, considering the dynamic environment. The kinetic priorities of the obstacle avoidance controllers yield interruptions for safety. This flexibility comes from the looseness of the SM coupling. It is important to note that the robot gets the correct obstacle position in this simulation because object recognition is outside the scope of this paper. Figure 9 shows the end-effector trajectories for the simulation with the dynamic obstacle. Not only transitions of the SMs for the reaching tasks but also activations and inactivations of the SMs for obstacle avoidance yield discrete joint velocity changes. Therefore, the discontinuous changes seen in the joint trajectories are more than in the previous simulation. Figure 6. The snapshots of the generated robotic behavior in the free space. Since the robot cannot achieve both of the third (farthest) reaching tasks without using the waist joint, it tries them one by one. The robot tackles the third left task first, because of our prioritization.    The state machine for one arm required in traditional robot programming. Even though this state machine can become simpler by introducing hierarchy, the state machine required by the whole-body forms the cross product of two state machines in this figure at least. If we consider the resolution of arm controller conflicts in the state machine layer, the whole-body state machine becomes much more complex.
Changing the obstacle motion, we can observe that the framework generates various orders of task achievements without any program changes. If we realize such control with conventional methods, we need to construct a horribly complete SM by fusing the four SMs together. Figure 10 is a SM generated by a combination of one reaching SM and one avoiding SM, and the SM required by the dual arm robot is much more complex. Generally, the whole-body SM requires us to consider a huge number of situations, and the number increases in the order of combination. In other words, we will face a combinatorial explosion with the conventional methods. The proposed framework elegantly avoids such a problem by entrusting the consideration to the robot.

Conclusion
This paper proposed an algorithmic framework for synthesizing multiple SMs by extending the prioritybased control technique. It significantly reduces the costs of redundant robotic behavior design. Moreover, the allowed ambiguous behavior description gives the robots a right to select controllers, and the robot flexibly decides the task achievement order. The example simulations show that the framework generates the wholebody motion of the dual-arm manipulator, both in static and dynamic environments. The result also implies the effectiveness of combinatorial symbols in the contexts of symbolic planning [27].
On the other hand, there exist a lot of things we have to tackle. The most important one is priority design. It is still heuristic in this paper, so a clear guideline for the design is required. Currently, there exists a trade-off between the difficulties of state machine design and priority design. For example, in cases the number of environmental variables (e.g. obstacle positions and shapes) is few, the priority design is easier than the state machine design. We can design priorities by considering relative behaviors of priorities in a space of environmental variables because priorities are scalar functions. The simulations shown in Section 5 satisfy the condition. Besides, the framework has discrete behaviors. System analysis, especially focusing on stability, is the next work.