A lobster-inspired multi-robot control strategy for monitoring non-stationary concentration fields

We propose a new lobster-inspired chemotaxis decentralized control strategy for monitoring a non-stationary concentration field using a team of nonholonomic mobile robots. The task of the team is to locate and trace the movement of the point (or points) with the highest field value (i.e. source), provided that the robots are not aware of the dynamics of the field and can only periodically sample the field at their locations. As an example of the concentration field we consider a population of biological species modeled by a self-organizing multi-agent system with agents acting as individuals of the population in accordance with some flocking rules. The proposed strategy combines the lobsters’ plume localization behavior and flocking mechanisms to efficiently solve the problem even with a small group of robots. Simulations and experimental works on physical unicycle robots are performed to validate the efectiveness of the approach for the cases of non-stationary fields.


Introduction
Monitoring the dynamics of a scalar physical field is one of the most common tasks performed by mobile robots in various physical environments. Regardless of their nature, most physical fields can be viewed as a field of concentrations of some substance non-uniformly distributed over the monitored area. Scalar concentration fields can have different origins: unwanted gas emission [1,2], pollution [3,4], populations of biological species [5] and others. Most of the fields studied in the literature have a clearly distinguishable source (or several sources) generating odor, which under the influence of external factors, such as wind, currents, etc., spread unevenly through the environment in the form of a plume. In this case, the process of odor source localization can be divided into three sub-tasks [6]: 1) plume finding, 2) plume tracking, and 3) estimating the actual location of the source. This scheme works quite well for fields with static sources, but it should be substantially reconsidered when sources can change their locations or when there are no sources at all [7]. A time-varying scalar field without sources arises, for example, as a result of swarm behavior of bio-populations, the distribution density of which is an object for detailed study. Studying such fields using mobile robots is a challenging problem, requiring the development of effective control strategies.
To date, a large variety of robot control algorithms and strategies have been developed to solve various problems related to inspecting and monitoring physical fields. An insight into the state of the art in the field in question can be gained from review papers [8,9]. In [8] the authors roughly divide odor source localization algorithms into four categories: gradientbased algorithms, bio-inspired algorithms, multi-robot algorithms, and probabilistic (map-based) algorithms. However, given that some bio-inspired and multi-robot algorithms implicitly use information about the field gradient, it is also justified to divide all these algorithms into two categories: gradient-based and non-gradient-based algorithms. A well-known representative of the latter is the 'Infotaxis' algorithm, which implements a behavior strategy of robots aimed to maximize the expected information gain. The 'Infotaxis' algorithm was originally proposed in [10] to overcome the main disadvantage of gradient-based strategies, which are not suitable for work under sparse sensing conditions. Then the 'Infotaxis' algorithm was repeatedly modified, for example, to deal with 2D and 3D turbulent environments [11] or be applicable to multi-robot systems [12]. Note that, despite its effectiveness in problems with sparse field measurements, it generally exhibits a slow convergence to field peaks and, as a consequence, can hardly be applied to the exploration of fields with time-varying sources. In the latter case, the gradientbased method is the most appropriate choice.
An elegant yet effective way to achieve the goals of monitoring in dynamic environments is to use bio-inspired strategies that imitate the behavior of biological organisms [8,13,14]. A popular example of this kind of algorithms is the lobster-inspired algorithm [15,16,17]. It implements a simple gradient-based strategy that can be expressed as the following rules [8]: the robot as a lobster should turn towards the direction of higher concentration sensed by two receptors, which are in the antennules, or go straight forward if the two antennules detect nearly the same concentration and the robot moves backward if neither of the two antennules detects concentration. Known implementations of the lobster-based approach involve small groups of robots and cannot be applied to any number of them. Nevertheless, the idea of mimicking lobster's foraging behavior is far from exhausted and can be extended to a group of robots that are assigned to monitor an extremely dynamic physical field. In the paper, we try to combine lobsters' behavior and flocking mechanisms, which are widespread in multi-agent systems, to effectively solve the problem of monitoring the field of concentrations formed by bio-populations.
The main contributions of this work are as follows. First, a new concentration field model is built on the basis of the flocking behavior of a multi-agent system. The model does not imply the presence of one or more sources, no matter whether they are static or dynamic, but instead, it is originated from the emergent behavior of agents, which makes changes in the distribution of concentrations (number of agents per unit area) over the search area poorly predictable. To the best of the authors' knowledge, such models have not almost ever been considered in the literature. Second, a lobster-inspired decentralized control strategy is designed for multiple nonholonomic mobile robots to solve the problem of monitoring this type of fields. The proposed control scheme considers each robot as an abdominal receptor and its neighboring robots as antennule receptors of a lobster, which makes it more reliable and scalable compared to traditional lobster-based strategies. Finally, we validate the control strategy in a simulation software and on the testbed platform developed by our team for testing purposes.
The remainder of the paper is organized as follows. In Section 2, a dynamic concentration field model based on the flocking behavior of a multi-agent system is presented and the problem of monitoring the concentration field by a group of mobile robots is stated. Section 3 introduces the main result: a cooperative lobster-inspired control strategy for multi-robot systems is presented. In Section 4, simulation and experimental results are provided to illustrate the performance of the proposed control scheme. Concluding remarks are given in Section 5.

Problem formulation
We consider a group of N mobile robots governed by the nonholonomic systeṁ where x i and y i indicate the position of the i-th robot in a global (inertial) reference frame, θ i is the robot heading angle; v i and ω i are the linear and angular velocities which are considered as the control inputs of robots. It is assumed that |v i | < v and |ω i | < ω. The task of the group is to monitor a non-stationary scalar field of concentrations given by the function For certainty, the concentration field in the paper is formed by a population of biological species modeled by a self-organizing multi-agent system in which agents correspond to representatives of the population. The dynamics of agents is described by the first-order integratorq where K is the number of agents, q a i = (x a i , y a i ) T ∈ R 2 is the vector representing the position of the agent, u a i ∈ R 2 is the vector representing the agent control inputs. The behaviour of the multi-agent system is determined by the following three Reynolds' rules [18]: (i) Collision avoidance -each agent tries to avoid collisions with its neighbors in the local neighborhood; (ii) Velocity matching -each agent tends to move in the same direction and at the same speed as its neighbors in the local neighborhood; (iii) Flock centering -each agent tends to the center of mass of the group formed by its neighbors in the local neighborhood. We use a simplified version of distributed flocking algorithms from [19] to implement these rules. It is natural to require that the agents' speed is significantly less than that of robots.
For the convenience of further development, the search area Q is divided into a given number of equal cells Q i,j , i = 1, n j = 1, m. Let a ij (t) be the number of agents in cell Q i,j at time t. Then, the state of the environment at time t can be represented as the n × m matrix The next step in building the model is to simulate the work of sensors that measure the number of agents (individuals of the population) in the vicinity of their location. For this purpose, we apply a convolution with the kernel H shown in figure 2 to the original matrix A(t). This gives the concentration matrix The size of the kernel H is a configurable parameter which depends on the characteristics of sensors. Now, the field function f (t, x, y) can be defined as   figure 3), we normalize the field values in accordance with the following formula: f * (t, x, y) = 255 r f (t, x, y), r is the brightness coefficient chosen according to the size of the convolution kernel. Figure 3 shows a concentration field that was generated by the procedure described above. The paper aims to develop a decentralized control scheme for the multi-robot system that allows it to monitor a dynamic concentration field given by the function (3), that is, the robots are required to locate and trace the point in the field that has the highest concentration.

A lobster-inspired multi-robot control strategy
The idea of the proposed approach to cooperative multi-robot control is based on mimicking the food-seeking behavior of crustaceans, particularly lobsters. Lobsters have three perturbationsensitive receptors: two on antennules and one in the abdominal region. The movement of a crustacean towards the food is conditioned by relation between signal levels of receptors in antennules and can be described by the following three rules [16].
(i) If the signal levels of both receptors differ from each other, the lobster turns toward the receptor that is more irritated; (ii) If the signal levels are about equal, the lobster continues to move straight ahead; (iii) If the signal levels are below a specified limit, it means that the trail of the food source has been lost and it is necessary to move in the opposite direction.   The abdominal receptor in this model is only responsible for determining whether a food source has been reached or not. Despite the simplicity of the rules, in many cases, they are sufficient for monitoring purposes. However, to achieve better group performance, the original lobsterbased control scheme can be modified. In what follows, we propose a cooperative control scheme that combines the lobster-inspired behavior and flocking mechanisms implemented via artificial potential fields.
In the developed approach, we consider each robot in the group as an abdominal receptor and its neighbours as antennules receptors. The robot compute the control signals based on discrete measurements of its current position q i = (x i , y i ) T ∈ Q and the field value s i ∈ R at q i as well as the measurements of its neighbors: q j and s j , j ∈ N i . Here, N i = {j : ||q i − q j || < R}, R is the interaction range.
Two potential forces act on each robot: field tracking force F f i and collision avoidance force F oi . The tracking force guides the robot along the calculated approximate gradient toward the expected extreme value of the field and is defined as According to (4), the search direction is chosen by comparing the measured field value of the robot and its neighbours. Thus, if s i is less than s j , the corresponding component of the force F f i is directed toward robot j. If the opposite is true, then this component is directed in the opposite direction, that is, from robot j to robot i. The force F oi ensure that robots keep the regular n-gon formation and that there are no collisions between them. Following [20], we define this force as where d is the desired distance between robots, L (L < d) is a design parameter that characterizes the size of the robot, c ij = c ji is a positive constant.
The resulting force F i := x di y di T is formed as the following weighted sum of the two forces defined above: where w 1 and w 2 are weighting coefficients that balance the forces and agree the units in which the distance and field values are expressed. Let q ei := e 1i e 2i e 3i T be the error vector in the reference frame linked to the i-th robot which is defined as T ei is the transformation matrix that rotates the inertial frame into the frame associated with the robot, θ di = atan2(y di , x di ). Then robot's discrete control laws can be derived (cf. [21]) as v ci (t) = sign(ṽ ci (t k )) min(|ṽ ci (t k )|, v),ṽ ci (t k ) = k 1 e 1i (t k ), ω ci (t) = sign(ω ci (t k )) min(|ω ci (t k )|, ω),ω ci (t k ) = k 2 sin e 3i (t k ) t ∈ T k , where T k = [t k , t k+1 ), t k+1 = t k + h, h is the control interval, k 1 and k 2 are positive parameters.

Simulation and experimental results
We validated the proposed approach through simulations and experimental work on the testbed platform TEMAR (Testbed Environment for Multiple Autonomous Robots) [22]. The purpose of the tests in both cases was to locate the point that has the maximum value of a time-varying field and track its movement. In the tests, we used a non-stationary field model generated by the method described in Section 2.
The first step in testing the proposed strategy is to carry out a series of simulations. For this purpose, we developed in Java a software environment that provides generation of physical fields and implementation of the control strategy. The distance from the geometric center of the group to the area with the maximum field value is taken as an indicator for evaluating the performance of the algorithm. Figure 4 shows examples of how this indicator changes with time in simulations for a non-stationary field.
The inappropriate behavior of the indicator in the first 6 seconds of the simulation for the non-stationary field is explained by the dynamic nature of the model. During this time interval, agents in the field model are forming a stable configuration, causing a rapid change in the position of the point with maximum concentration. In both cases, the robots gradually converge to the peak point and then follow it with insignificant fluctuations in its vicinity.
It should be noted that due to the gradient nature of the proposed method, there may be cases when robots lose the tracked area or cannot reach it. The first option is possible when agents in the population fall into subgroups, producing several local maxima of the field function. In this case, the robots will continue tracking the nearest separated group. The second option has not been observed in the conducted simulations.
In the second stage, we tested the approach by conducting experiments on differential unicycle robots within the testbed platform TEMAR. The testbed is based on the LEGO EV3 robots and includes both software and hardware parts, which provide component interaction and environment simulation. TEMAR has a modular architecture with all its components connected by a server application. However, this fact does not limit the platform to modeling the behavior of only centralized groups of robots. Each robot is an independent computing entity that uses the server application only to obtain additional information about its neighborhood that cannot be provided by onboard sensors. The main modules of the complex include: (i) Sensor simulation module. Although robots are equipped with a color recognition sensor with which stationary physical fields can be simulated, it is more flexible and efficient in terms of saving robot computing resources to generate such information on the server application and periodically transmit it to the robots. (ii) The precise location module is a special case of the sensor simulation module that provides robots with information about their exact location as well as the relative arrangement of robots between themselves and obstacles. Thus, this module simulates the operation of   The entire mission execution process can be roughly divided into three phases: gathering, searching, and tracking. According to the conditions of the tests conducted, the robots start the task from random positions, so the first phase consists in gathering robots into a stable formation, which in this case has the shape of a triangle. The emergence of such a formation is due to the structure of the artificial potential forces acting on robots: the collision avoidance force (5) and the field tracking force (4). After gathering into the triangle-shaped formation, the robots start moving along the field gradient towards the area of maximum concentration. Finally, once robots reach this area, they track it by changing their positions so as to remain in the vicinity of the extreme value. Figure 5 shows the above-described mission phases in one of the conducted experiments. The left part of each subfigure is obtained from the camera mounted above the scene, the right part is the corresponding graphical representation of the field created by the server application.

Conclusions
The series of tests showed the high suitability of the proposed approach for the monitoring task. The developed lobster-inspired algorithm has a high rate of convergence, allowing the search team to quickly reach the vicinity of the field point with maximum concentration. In addition, at the last phase of the mission (the tracking phase) the approach shows relatively small root mean square deviations of the group from the desired area, which allows monitoring the field for any required period. However, the proposed approach is in dire need of exploration mechanisms, which would provide a better performance with physical fields that have several local extremes of concentration. In terms of the proposed model, such fields emerge as a result of splitting the population of agents into subgroups that operate in a self-contained manner.