Swarm Behavioral Inversion for Undirected Underwater Search

Coordinating the dynamics of large groups, or swarms, of autonomous underwater vehicles in order to search a given target area can be difficult due to the plurality of the system, environmental complications, and the prolonged and indefinite duration of the patrol. This paper examines the use of swarm inversion to optimize the behavioral dynamics of a swarm of autonomous agents in a patrol search with underwater morphological and environmental constraints. In particular, the range of each agent's forward-looking sensor varies spatially, requiring more search time in dark areas to maintain a high level of surveillance. This results in a tradeoff between uniform coverage and surveillance frequency. The patrol fitness is determined via simulation feedback, and particle swarm optimization is used to invert and refine the behavior of the swarm. The tradeoffs between high search frequency and search uniformity are examined, as well as the evolved swarm’s adaptability to varying environmental conditions and robustness to changes in agent numbers. Results demonstrate that swarm inversion can yield effective agent behaviors for maintaining a presence in a given target zone despite stochastic navigation and an anisotropic environment.


Introduction
Unmanned vehicle autonomy is an increasingly active area of artificial intelligence research that seeks to implement effective decision-making algorithms in undirected vehicles. Designing the appropriate control scheme to achieve the desired vehicular response in all possible circumstances, and thus fully characterizing the explanation facility, is often a difficult if not unobtainable objective. Thus, state-of-the-art autonomous control is largely composed of ad hoc expert systems that may exhibit undesirable emergent behaviors when the operational theater is perturbed. Nevertheless, autonomous systems are necessary: direct and timely human control of all factors may be outside the capability of the operator, and deterministic control is limited in scope and adaptability and is therefore potentially vulnerable.
The details of autonomy become increasingly complex when dealing with unmanned, multi-agent systems. Large groups of autonomous, interacting agents, or swarms, can have emergent behaviors that are difficult to predict without simulation or physical implementation. However, the endeavor is worthwhile as large, interacting groups can often accomplish tasks individuals cannot, but the nature of mission execution and other subsequent emergent characteristics can be difficult to predict via analytic inspection [3,10]. Swarm technology and evolutionary techniques address this issue by offering a robust and adaptive approach.
Swarm intelligence refers to large groups of agents interacting under simple rules that exhibit some emergent behavior and has applications in communications [5,12], robotics [2], and optimization [6,17,18]. Prospective advantages of swarm intelligence include swarm robustness, plasticity, and decentralization [3], ideal characteristics for governing the interactions of a large group of autonomous vehicles. Emergent behaviors are often observed as a consequence of a given set of antecedents (e.g., sensor readings). The inversion of this process is to define criteria for the consequent or agent action and then refine the antecedents. Swarm inversion is the specific application of an optimization technique to multi-agent systems that seeks to develop optimal rules of operation for all agents by refining their behavioral responses. These responses are the focus of the optimization, since swarms that grow in capability can make the problem trivial. Variant techniques based on evolutionary algorithms have been applied to large groups, primarily in simulation and robotics [9,11,15,17,19].
Our proposed application of swarm inversion addresses the problem of dynamic undirected searches, specifically applied to an underwater patrol scenario. Here, a swarm of autonomous underwater vehicles with multi-channel sensing capabilities is given a limited amount of time to establish and maintain a presence in a given target zone. The primary differences between this scenario and similar work [8,9,11] are the inversion algorithm, the nature of the agents' control parameters, and specific underwater morphological constraints. There is no external performance behavior being sought; any emergent behavior that performs well within the imposed physics-based constraints is acceptable. We use the Combs swarm inversion method, which involves the coevolution of disjunctive fuzzy logic [7]. The reader is referred to this work for details on the inversion process as well as a discussion of prior work that leads to the use of the Combs method.
The agents we use do not leave pheromone trails [8] for other agents to find, nor will they follow waypoints or landmarks. They will not be able to communicate directly with each other or with any central controller. Although these abilities could be added, neither was deemed characteristic of the agents we have in mind for the operation of surveillance [4].
An illustrative analogy to our problem is the patrol of a field at night by agents with limited vision. The field in some areas is illuminated so that the surveillance can be performed quickly over a large area. In darker areas, more time is required to assess safety status with the same certainty. The field can be occupied at any point at any time by an enemy agent, so the same regions must be repeatedly inspected. The average overall surveillance frequency of the field can be made high by agents spending all of their time patrolling well-lit areas. This, however, leaves darker areas uninspected for long periods of time. Requiring inspection of darker areas when there are fixed resources leaves well-lit areas less frequently visited. There is therefore a tradeoff between overall average surveillance frequency and the uniformity of coverage.
There are sensible, deterministic tactics to search a given terrain. Agents can line abreast and move in formation, comb the area, or follow a pre-planned path. However, path planning in an anisotropic environment is not a trivial task. Planned paths also display behaviors that are relatively easy to observe, ascertain, and subsequently circumvent. The stochastic nature of swarms makes the patrolling agents much more difficult to predict and counter, while the robust and adaptive nature of swarm intelligence would be advantageous in execution.

Patrol scenario
The underwater environment affects an agent's ability to search its surroundings by restricting communication and obscuring visibility. Unlike surface or aerial vehicles, an underwater vehicle utilizing acoustic sensing has limited channels available and often has few forms of direct communication. Thus, their interactions are modeled here as indirect and passive; agents become aware of each other by observing proximity noise or crosstalk. The underwater environment can also contain acoustic shadow zones, areas where deviations in the sound speed profile cause refraction in acoustic transmissions, limiting an agent's effective viewable distance. For our simulation, a high-level surface attenuation map is assumed to be known or approximately calculable by the agent, whether a priori or in real time via environmental readings.

Agent morphology and coverage maps
The swarming model considered assumes a high-level environment attenuation map. Agents are modeled to have a maximum speed and yaw rate, and their acoustic sensing capabilities are approximated as a visibility arc representing the ensonified area with the highest probability of detection by that agent. As an agent travels, the previously ensonified areas are retained as a tail representing a memory component that is only known to that particular agent (Figure 1). Each tail decays exponentially and eventually requires the agent to revisit and refresh these areas.
As each agent travels, an aggregate pixel coverage map is assembled, representing, for each pixel, the combined coverage contributed by all agents. This aggregate includes the decaying memory component of each agent. After a fixed iteration interval, the scenario is terminated, and the theater's final combined mean coverage and pixel standard deviation are recorded.
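The decaying tail and the aggregate map described above can be sketched as follows. This is a minimal illustration and not the authors' C++ implementation: the function names are hypothetical, the per-step decay of 0.99 is the value quoted in the simulation setup, and combining agents by taking the per-pixel maximum is an assumption, since the text does not state how individual memories are merged.

```python
import math

DECAY = 0.99  # per-step memory decay rate quoted in the simulation setup


def step_coverage(memory, ensonified):
    """Decay one agent's private memory map, then refresh the pixels it sees now."""
    for pixel in list(memory):
        memory[pixel] *= DECAY
    for pixel in ensonified:
        memory[pixel] = 1.0
    return memory


def aggregate(memories, width, height):
    """Merge per-agent memories into one grid (per-pixel maximum is an assumption)."""
    grid = [[0.0] * width for _ in range(height)]
    for memory in memories:
        for (x, y), value in memory.items():
            grid[y][x] = max(grid[y][x], value)
    return grid


def mean_and_std(grid):
    """Return the mean coverage and the population standard deviation over all pixels."""
    values = [v for row in grid for v in row]
    mu = sum(values) / len(values)
    variance = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(variance)
```

A pixel that is seen once and never revisited falls to 0.99^k after k steps, which is what eventually forces an agent to return and refresh it.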

Visibility attenuation and interference
A high-level attenuation map is applied to the field. Each pixel in the map is assigned a value from 0 to 1 that represents a scale modifier to the agent's visibility. Lower values reduce the ensonified area of any agent on that pixel. Agents may also interfere with each other due to channel constraints. Whereas two agents in different channels will see each other if encountered, two agents operating in the same band generate crosstalk and confusion. Similarly, two agents within a close distance will generate proximity noise and overload all other acoustic signals, confusing both agents. This results in the agent's ensonified arc becoming void for that particular time step, and no contribution is made to the aggregate confidence map.
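The attenuation and interference rules above can be illustrated with a short sketch. The maximum range of 5 pixels comes from the simulation setup; the proximity radius, the `Agent` fields, and the choice to trigger crosstalk whenever a same-channel agent lies within the attenuated range are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass

MAX_RANGE = 5.0         # maximum viewable distance in pixels (simulation setup)
PROXIMITY_RADIUS = 2.0  # hypothetical distance below which proximity noise occurs


@dataclass
class Agent:
    x: float
    y: float
    channel: int


def effective_range(attenuation):
    """Scale the sensing range by the attenuation value (0 to 1) under the agent."""
    return MAX_RANGE * attenuation


def is_blinded(agent, others, attenuation):
    """Return True if the agent's ensonified arc is void for this time step."""
    reach = effective_range(attenuation)
    for other in others:
        distance = math.hypot(other.x - agent.x, other.y - agent.y)
        if distance <= PROXIMITY_RADIUS:
            return True  # proximity noise affects agents on any channel
        if distance <= reach and other.channel == agent.channel:
            return True  # crosstalk between agents sharing a band (assumed range)
    return False
```

A blinded agent would simply skip its contribution to the aggregate map for that step.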

Genomic parameterization
The evolved agent genome is an array representation of each behavioral response parameter to a given sensor. A total of 10 evolvable parameters (initialized as uniform random variables over their entire dynamic range) characterize three primary sensors. Agents have sensors for their current position in the attenuation map via a Global Positioning System (GPS) or Inertial Navigation System (INS). Agents also have sensors for interference and are aware of the general direction but not range of the offending source. Finally, agents develop a response to the closest visible agent. Disjunctive fuzzy Combs control is achieved through [...] For sensors 1 and 2, the agent responds with a unit vector in the direction of the nearest visible ally or noise source scaled by the evolved parameter value. For sensor 3, each agent retains its previous two headings and visibility levels in order to estimate the local visibility gradient of the attenuation map. The agent then responds with a unit vector in the direction of the maximum decreasing gradient, multiplied by a factor determined by the piecewise linear function and its current visibility level. The maximum decreasing gradient is calculated from the cross-product of the two previous headings, adjusted for the agent's turn direction. This allows the agent to estimate the direction of decreasing visibility.
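The scaled unit-vector response can be sketched as below. The text does not give the breakpoints of the piecewise linear function or how the 10 parameters are divided among sensors, so the five evenly spaced knots and the names `KNOTS`, `piecewise_gain`, and `response` are hypothetical; the gradient estimate itself is not reproduced here.

```python
import math

KNOTS = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical visibility breakpoints


def piecewise_gain(visibility, gains):
    """Linearly interpolate an evolved gain at the current visibility level."""
    v = min(max(visibility, KNOTS[0]), KNOTS[-1])
    for i in range(len(KNOTS) - 1):
        lo, hi = KNOTS[i], KNOTS[i + 1]
        if v <= hi:
            t = (v - lo) / (hi - lo)
            return gains[i] + t * (gains[i + 1] - gains[i])
    return gains[-1]


def response(direction, visibility, gains):
    """Return the unit vector along `direction` scaled by the interpolated gain."""
    dx, dy = direction
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return (0.0, 0.0)  # no usable direction, so no steering contribution
    gain = piecewise_gain(visibility, gains)
    return (gain * dx / norm, gain * dy / norm)
```

Under this sketch a positive gain steers the agent along the supplied direction and a negative gain steers it away, so a single evolved array can encode both attraction to and repulsion from low-visibility regions.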

Fitness function
Developing a well-tuned fitness function is imperative to optimizing the swarm's behavior for this simulation. Gaudiano et al. [8] examined evolving state transition parameters for a multi-agent system of missiles, concluding that the inversion process's performance was heavily influenced by agent initialization and fitness function, and that the formulation of the fitness function could introduce unwanted biases. Small adjustments made to the fitness function can drastically shift the inversion's solution, and each solution may have a range of fitness values due to initializations and noise. Several known strategies in developing the fitness function include the use of prior knowledge to limit the search space and fixed or de-randomized initializations [12]. To reduce the impact of initialization on the performance of the swarm, agents are initialized randomly around a fixed ring at the center of the field and given an outward initial trajectory.
The goal of this inversion is to direct agents into searching the field frequently and uniformly. However, there is an inherent imbalance between these two factors; perfect mean coverage is unobtainable due to the limited number of agents, but perfect uniformity can easily be achieved if all agents interfere with each other, contributing nothing to the aggregate map and resulting in zero standard deviation. As this solution is relatively simple to discover, a third term regarding agent interference was required to prevent the evolution from circumventing the true goal of the mission. To this end, the three major objectives were to maximize the mean coverage μ of all pixels in the zone, maximize uniformity of coverage by minimizing the standard deviation σ among all pixels on the map, and minimize the average ratio of time spent blinded by interference b. A uniformity weighting factor λ was incorporated to tweak the fitness function and direct the optimization between mean and uniformity, expressed below. The λ variable is a tactical variable chosen by the user in accordance with the degree of importance of σ. In Pareto optimization, such parameters are commonly used to tune between competing attributes in the design process.
The goal is to maximize this fitness value through simulation feedback. An exponential term was used to reshape the fitness landscape to reward higher scores.
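As an illustration of how the three objectives and the weighting factor might be combined, the following sketch uses a purely hypothetical form, exp(μ − λσ − b). It is not the paper's fitness function, which is not reproduced in this text; it is chosen only because it rises with μ, falls with σ (for positive λ) and with b, and applies an exponential reshaping as described above.

```python
import math


def patrol_fitness(mu, sigma, blind_ratio, lam):
    """Hypothetical fitness: reward mean coverage, penalize spread and blind time.

    mu          -- mean pixel coverage
    sigma       -- pixel standard deviation
    blind_ratio -- average fraction of time agents spend blinded (b)
    lam         -- uniformity weighting factor (lambda)
    """
    return math.exp(mu - lam * sigma - blind_ratio)
```

With this form, the degenerate solution in which every agent is permanently blinded (μ = 0, σ = 0, b = 1) scores exp(−1), below any swarm with modest coverage and little interference, which is the role the blindness term is meant to play.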

Parameter inversion
Under default conditions with zero behavioral responses to the environment and ally interactions, agents produce a mean, per-pixel confidence map that reflects the high-level attenuation map, as depicted in Figure 2. The shadow zone is covered poorly on average relative to higher-visibility areas. An ideal swarm model should search all areas frequently and uniformly.
[Figure caption: With no specific behavioral responses, the swarms produce non-adapted mean pixel confidence maps (lower row) that reflect their respective attenuation maps. The ideal swarm should search these areas uniformly at high mean confidence.]
A variant of Shi and Eberhart's modified particle swarm optimizer (PSO) [20] with re-initialization is utilized in optimizing the agents' response functions. Each PSO agent is a solution genome that is run through the simulation in order to evolve the fitness of the population. In general, a population size of 100 genomes searching over 200 generations is used. The PSO is used to optimize the 10 evolvable parameters of the agent behavioral response genome.
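A minimal sketch of an inertia-weight PSO of the kind introduced by Shi and Eberhart is given below. It omits the re-initialization step, whose details are not described here, and it uses a toy fitness in place of the patrol simulation; the coefficient values, the smaller default population, and the assumption that genome entries are normalized to [0, 1] are all illustrative.

```python
import random


def pso_maximize(fitness, dim, n_particles=30, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over [0, 1]^dim with an inertia-weight PSO."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia term plus cognitive and social attraction
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # clamp each parameter to its normalized range
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def toy_fitness(genome):
    """Stand-in for the simulation: peaks when every parameter equals 0.5."""
    return -sum((x - 0.5) ** 2 for x in genome)
```

In the setting described above, `fitness` would run a genome through the full patrol simulation, with `dim=10`, `n_particles=100`, and `n_iter=200`.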

Simulation setup and scenarios
For this simulation, a fixed number of homogeneous agents are initialized randomly about a ring formation at the center of a 128 × 128 pixel square theater. Agents may freely leave the field but are attracted to the center once outside theater bounds. The simulation's frame rate is fixed and corresponds to a maximum step size of 1 pixel, giving an effective maximum velocity of 1 pixel per unit time. All agents are synchronized, updating their actions simultaneously, once per frame or time step. Agents have a maximum viewable distance of 5 pixels and a memory decay rate of 0.99 per time step. The swarm is allotted 10,000 time steps to complete their patrol. In the base scenario, there are 60 agents limited to two channels. A C++ program was written to perform the simulation and execute the PSO.
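The ring initialization can be sketched as follows. The 128-pixel field and the 60 agents come from the setup above; the ring radius is not stated in the text, so the value used here is an assumption, as are the function name and the dictionary layout.

```python
import math
import random

FIELD = 128  # theater edge length in pixels (simulation setup)


def init_ring(n_agents, radius=10.0, seed=0):
    """Place agents at random angles on a central ring, each heading outward.

    `radius` is a hypothetical value; the source does not specify it.
    """
    rng = random.Random(seed)
    cx = cy = FIELD / 2.0
    agents = []
    for _ in range(n_agents):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        agents.append({
            "x": cx + radius * math.cos(theta),
            "y": cy + radius * math.sin(theta),
            "heading": theta,  # outward: same angle as the position on the ring
        })
    return agents
```

Fixing the ring and the outward heading removes most of the variation in starting geometry, leaving only the angular placement random.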
Several simulation variants are tested. First, agents evolve on a fixed map (map 1 in Figure 3) with varying values of λ in order to observe the impact of the fitness function on average confidence and coverage uniformity. The second variant introduces the different attenuation maps in Figure 3 into the training process in order to improve universal performance and demonstrate swarm adaptability. Finally, swarm robustness is examined by observing the performance of the evolved swarms with varying numbers of available starting agents.
For the second variant, there is an issue with calculating a fitness function for different visibility maps. Fitness scores are not even across maps: some maps inherently have higher mean visibilities or standard deviations, which directly influences the swarm's performance and fitness calculation. To address this issue, quick optimizations are run separately for λ = −1 and λ = 10 on each map. The fitness values for these two results are mapped to zero and one linearly. A final evolution cycling through all five maps is used to generate the solution.
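The per-map normalization can be sketched as a linear rescaling between two anchor scores. Which of the two quick optimizations supplies the zero anchor and which supplies the one anchor, and whether the normalized scores are averaged or summed across maps, are not specified in the text; the function names and the averaging are assumptions.

```python
def normalize(score, low_anchor, high_anchor):
    """Linearly map `low_anchor` to 0 and `high_anchor` to 1."""
    return (score - low_anchor) / (high_anchor - low_anchor)


def combined_fitness(scores, anchors):
    """Average normalized scores across maps (averaging is an assumption).

    scores  -- {map_id: raw fitness of one genome on that map}
    anchors -- {map_id: (low_anchor, high_anchor)} from the quick optimizations
    """
    total = 0.0
    for map_id, score in scores.items():
        low, high = anchors[map_id]
        total += normalize(score, low, high)
    return total / len(scores)
```

For example, raw scores of 2.0 and 15.0 on two maps with anchors (1.0, 3.0) and (10.0, 20.0) both normalize to 0.5, so neither map dominates the combined value despite the difference in raw scale.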

Single map evolution using multi-objective fitness
Multi-objective optimization [14] demonstrates the inversion's ability to search the shadow zone given a specific fitness function. The fitness function portrayed in (1) yields a higher fitness value for swarms that achieve high mean coverage μ, low pixel image standard deviation σ, and low average blind-time ratio b. The weighting factor λ directly influences the fitness calculation: high values of λ correspond to a higher weighting on uniformity, or low standard deviation, while low values of λ signify higher emphasis on the overall mean pixel search. Intuitively, this means that when λ is low, the agents should avoid the shadow zone as it will decrease their visibility and thus the total mean pixel coverage. When λ is high, total coverage is deemphasized, and the evolution trades higher pixel coverage mean for the improved uniformity gained by searching the shadow zone.
A qualitative examination of the behaviors of the evolved swarms demonstrates these characteristics. Figure 2(a) presents the tested attenuation map. There is a readily apparent repulsion from the shadow zone in the λ = −1 solution, demonstrated in Figure 4(a). Most agents actively turn away from the shadow zone when they encounter the 0.6 visibility threshold. These actions reflect the largely negative repulsion in the evolved genome depicted in the simulation snapshot. For λ = 10, there is a visible swarming of the shadow zone due to the various levels of attraction provided by the corresponding piecewise response genome in Figure 4 Figure 6. Despite the stochastic nature of the swarm leading to variations in final confidence maps for repeated runs, the final confidence maps are uniform. The evolution stresses lower standard deviation with increasing λ. To drive the swarm toward more uniform searches in subsequent calculations, a weighting factor of λ = 10 is used in the optimization process.
As expected, single-objective variants of the fitness function did not yield promising results. Simply maximizing the mean is insufficient, as this encourages the agents to confine their search to areas of high returns, leading to agents avoiding the shadow zone. Alternatively, maximizing the minimum pixel confidence yielded only trivial improvement over the same evolution time due to the strictness of the condition. Uniformity constraints were found to require the blindness term b, as otherwise fitness was driven to zero via interference at the expense of high coverage.

Map training and adaptability
The performance of the swarm was dependent on the visibility map. Agents that were optimized for one attenuation map did not necessarily maintain their performance on alternative fields, as demonstrated in Figure 7. This was expected, as the crafted fitness function and resulting evolutionary process were not map invariant. However, these evolutions were still useful, as they provided information on what range of μ, σ, and b an optimized swarm on a given map will yield. Various representative maps were needed in the training process in order to address adaptability, and these extremes provide a method for comparison.
Simply summing the scores on all five training maps in order to calculate a genome's fitness is insufficient. Map scores vary with structure, leading to some maps rewarding disproportionately or having relatively lenient solutions. Map 5, in particular, improves the most, leading to the agents preferentially optimizing this map, often at the expense of the others. Normalizing the individual evolved performances was observed to reduce the bias in this process, as displayed in Figure 7 (bottom row). Table 1 lists the average fitness values of 30 trials for each of the conditions. Agents trained on all the available maps with their fitness scores normalized performed consistently well on all maps. While map-specific evolutions often achieved the highest scores of any swarm for that map, they regularly underperformed on other attenuation maps.

Agent robustness
The robustness of the agents about the λ = 10 solution is depicted in Figure 8, calculated from 30 trials each. In the vicinity of 60 starting agents, there was little variation in the standard deviation of the agents. The mean confidence and average blind time do drift upward as agent numbers increase, but this is expected as more agents mean more coverage and also more opportunities for interference. However, these values do not change much. For the given circumstances, the swarm is robust and can maintain its performance despite slight variations in agent numbers.

Conclusions
Swarm inversion can be an effective tool in refining the behavior of a homogeneous group of autonomous agents in order to complete a given task, often producing clever or unexpected solutions for the problem scenario. The classical advantages of swarms are demonstrable as the resulting agents were robust to changes in initial swarm size and adaptive to changes in the attenuation map. The inversion process was capable of developing an effective and robust behavioral guide for searching the given target zone.
International Journal of Swarm Intelligence and Evolutionary Computation
Figure 8: The robustness on map 1 with λ = 10 about 60 starting agents, taken from 30 trials each. Standard deviation is consistent, while mean coverage and blind time increase with the number of agents. This is expected, as more agents mean more areas searched as well as more opportunities to encounter other agents and cause interference.
The weighting factor in the fitness function has an appreciable effect on the evolved performance of the swarm: increasing the value of λ increases the relative importance of achieving higher uniformity. Training on a wide range of maps can improve the general operation of the swarm at the cost of individual map performance.