Autonomous Multi-Robot Search for a Hazardous Source in a Turbulent Environment

Finding the source of an accidental or deliberate release of a toxic substance into the atmosphere is of great importance for national security. The paper presents a search algorithm for turbulent environments which falls into the class of cognitive (infotaxi) algorithms. Bayesian estimation of the source parameter vector is carried out using the Rao–Blackwell dimension-reduction method, while the robots are controlled autonomously to move in a scalable formation. Estimation and control are carried out in a centralised replicated fusion architecture assuming all-to-all communication. The paper presents a comprehensive numerical analysis of the proposed algorithm, including the search-time and displacement statistics.


Introduction
Search strategies for finding an emitting source of hazardous substance based on sporadic sensory cues are a topic of great importance. In the context of national security, for example, the search could be for the source of an accidental or deliberate biochemical agent release into the atmosphere [1], which would require urgent attention. Similar techniques are also used in search and rescue missions and to explain the foraging behaviour of animals. In many studies of these search techniques, the searching agent is assumed to be mobile and capable of sensing the emitted substance from the source. The sensing cues are often sporadic, fluctuating and discontinuous, due to turbulent transport through the medium over a large search domain [2]. The objective of the search is to find the emitting source in the shortest possible time.
Recent advances in biochemical sensing technologies [3,4] have made the deployment of robots to search for such emitting sources very attractive [5]. Developing autonomous robots equipped with appropriate sensing capability to explore and search in harsh and dangerous environments, such as toxic or flammable ones, is the ultimate goal of this research. Due to its importance, the topic has attracted a great deal of interest from the scientific community. A survey including a taxonomy of pre-2008 search algorithms can be found in [6]. The dominant class of methods are bio-inspired strategies [7][8][9], which mimic the use of olfactory senses by bacteria, insects and other animals to localise food sources, detect predators, or find mates. Typically, the search actions depend on sensory cues. For example, upon sensing an odor signal, male moths surge upwind in the direction of the flow. When odor information vanishes, they exhibit random cross-wind casting or zigzagging to perform a local search until the plume is reacquired. Chemotaxis algorithms are a bio-inspired class of algorithms, in which the search is carried out by climbing the concentration gradient [10]. These strategies are effective close to the source where the odor plume can be considered as a continuous cloud. In the absence of positive detections, the robot may stay in one position, move linearly, or carry out a random walk. Once the chemical is detected, motion is directed towards higher gradients of concentration. These bio-inspired techniques are simple, as they require only limited spatial perception [11], however they are mostly ad hoc. Another popular class of algorithms, not mentioned in [6], are stochastic source-seeking algorithms [12][13][14]. While these algorithms are theoretically sound, they rely on smooth gradients of concentration. In the presence of turbulence, however, where the plume of the tracer randomly breaks up into time-varying disconnected patches resulting in intermittent measurements, these stochastic source-seeking algorithms are not practical.
Recently, a class of algorithms referred to as infotaxis [15] was developed specifically for searching in turbulent flows. In the absence of a smooth distribution of concentration, caused by turbulence, this strategy directs the robot towards the highest information gain. As a theoretically principled approach, with the source-term estimation being carried out in the Bayesian framework and the robot motion control being based on information-theoretic principles, the infotaxic strategy has been adopted by a number of research groups [16][17][18][19][20][21][22][23][24].
The goal of Bayesian estimation is to construct the posterior probability density function (PDF) of the source parameter vector, which typically includes its location, release rate and size. In [15] the estimation was carried out using an approximate grid-based nonlinear filtering technique [25] on a two-dimensional parameter space consisting of the x and y Cartesian coordinates of the source location. The source-release rate was assumed known, as well as the environmental parameters. The search was carried out using a single robotic platform. Grid-based nonlinear filtering was subsequently replaced with the sequential Monte Carlo method (also known as the particle filter) in [21][22][23]26]. The particle filter is more efficient than grid-based methods and is therefore capable of estimating the source-release rate jointly with the source location. Multi-robot infotaxis were investigated in [16,22,24].
Sequential estimation of the source posterior PDF using the particle filter is problematic when the state (the source parameter vector) is random, but stationary [27]. The absence of dynamics in the state implies that the exploration of the parameter space is limited to the first time index only. Even by artificially introducing dynamics to the state (e.g. a random walk-like dynamic with a small variance), the depletion of particles is inevitable over time and causes the collapse of the PF for long-duration searches [23]. In this paper we apply a Rao-Blackwell dimension reduction method to estimate the posterior PDF of the source parameter vector. This method computes the source release-rate, conditioned on the source location, analytically and uses the Monte Carlo method only in the space of x and y coordinates of the source. As a batch method, it does not suffer from depletion of particles over time. This paper studies the multi-robot search assuming a replicated centralised fusion with all-to-all communication. The robots move in a scalable formation, where the maximum size of the formation is determined by the specified communication range. Control actions are made autonomously by all robots using entropy reduction as the information theoretic measure. In computing the reward, however, we also introduce a cost of motion. The paper analyses the statistical patterns of the search, in particular the formation displacements and its scale.
The organisation of the paper is as follows. Mathematical models are presented in Section 2. The method of source parameter estimation is described in Section 3. Robotic platform motion control is explained in Section 4. Section 5 presents the numerical results, through simulations and using an experimental dataset characterised by a turbulent flow. Finally, the conclusions are drawn in Section 6.

Robot Motion Model
In this section we describe the motion model of each individual robotic platform. Suppose there are N ≥ 1 robotic platforms, with the pose of the ith platform at time t k denoted by a vector Here r i k = (x i k , y i k ) is the robot location and φ i k is its heading. The motion of the group of robots is coordinated, i.e they move in a formation. In particular, we use a circular formation with the N robots spread approximately equally on the perimeter of a circle (the formation is controlled to be equally spread on a circle, but due to process noise in the motion model, this can be only approximately achieved), whose center at time t k is (x c k , y c k ) with the radius ρ k ∈ [r min , r max ]. The minimum radius r min is introduced to prevent the robots from colliding; the maximum radius r max is to ensure all-to-all communication amongst the platforms. The bearing of the ith platform within the formation is fixed, while its radius is allowed to vary. The position of the ith platform is defined by the offset (∆x i , ∆y i ) from the centre (x c k , y c k ). All platforms in the formation have approximately equal orientation, while the offset is given by: for i = 1, . . . , N.
The biochemical sensors are activated at time instants t k , k = 1, 2, · · · to report concentration measurements. During the interval of time [t k−1 , t k ] between two consecutive sensing instants, the formation is moving. The duration of this interval, T k = t k − t k−1 ≥ 0, is referred to as the travel time.
The assumption is that sensing is suppressed during the travel time. The motion of the formation during the interval [t k−1 , t k ] is controlled by the following input parameters: the linear velocity V k , the angular velocity Ω k , and the formation scale increment ∆ξ k = ξ k − ξ k−1 (which can be positive or negative). Given the control vector u k = [V k , Ω k , ξ k , T k ] , the dynamics of ith platform during a short integration time [t − δ, t], where t ∈ [t k−1 + δ, t k ] and δ T k , can be modelled by a Markov process with the transitional density π( The mean of this normal density, β(θ i t−δ , u k ), is a nonlinear function specified as follows: where: describes the motion due to the formation-scale change. Vector C in (2) introduces the corrections to the pose in order to remove the difference between the desired (deterministic) pose: and the actual pose (perturbed by process noise) θ i k−1 at the previous sensing time t k−1 , i.e., Without the correction term C, the platforms could drift and break the formation during the search time. Finally, Q k is the covariance matrix which accounts for stochastic disturbances in motion. Figure 1 illustrates a simulated path of a robot formation based on the described motion model. In this illustration, the formation of N = 5 robots is expanding (the scales are ξ 1 = 1, ξ 2 = 4 and ξ 3 = 8). Note that, despite random individual paths, the formation at measurement time instants t 1 , t 2 , t 3 is almost a perfect pentagon. Apart from some randomness due to process noise, the motion model keeps the formation with the fixed bearings relative to its centre and a fixed north (i.e., the formation does not rotate).

Measurement Model
Dispersion of the emitted hazardous substance in a turbulent environment is modelled using the Lagrange particle encounters model developed in [15]. Suppose that the emitting source is located at coordinates r 0 = (X 0 , Y 0 ) and its release rate, or strength, is Q 0 . The source parameter vector is denoted η = [r 0 Q 0 ] . The particles released from the source propagate with combined molecular and turbulent isotropic diffusivity D, but can also be advected by wind. The released particles have an average lifetime of τ before being absorbed. Let the average wind characteristics be the speed U and direction, which by convention, coincides with the direction of the x axis.
Suppose a spherical sensor of small radius a is mounted on the ith robot platform, whose pose at (robot location is assumed non-coincidental with the source location r 0 ). This sensor will experience a series of encounters with the particles released from the emitting source. The average rate of encounters can be modelled as follows [15]: where D, τ and U are known environmental parameters, is the distance between the source and ith sensor platform, K 0 is the modified Bessel function of the second kind of order zero, and λ = Dτ depends on environmental parameters only. The stochastic process of sensor encounters with the dispersed particles is modelled by a Poisson distribution. The probability that a sensor at location r i k encounters z ∈ Z + ∪ {0} particles (z is a non-negative integer) during a time interval t 0 is then: is the mean number of particles expected to reach the sensor at location r i k during t 0 .

Problem Specification
Let z i k denote the sensor measurement recorded by the ith robot platform at time t k . The sequence of such measurements from platform i, starting from the beginning of the search until t k is a vector Similarly, the measurements from all platforms up to time t k is a vector z 1:k = (z 1 1:k ) , · · · , (z N 1:k ) . Accordingly we can define the vector of sensor measurement locations corresponding to z 1:k as r 1:k = r 1 1 , · · · , r 1 k , · · · , r N 1 , · · · , r N k . This vector is assumed to be known. Whenever in the text we refer to the measurement vector z 1:k , we implicitly assume that it is in pair with r 1:k .
Assuming all-to-all communication between the platforms for exchanging mutually their latest measurements and positions (ignoring the time-delays and bandwidth limitations), z 1:k and r 1:k are available at every platform at time t k for processing. Thus, every platform can independently carry out centralised fusion in order to estimate the source parameter vector η. This type of fusion architecture is known as replicated centralised.
Assuming the sensor measurements, conditioned on η, are independent, the likelihood function of the measurement vector z 1:k can be written as a double product (z 1: . We adopt the Bayesian estimation framework with the goal of computing the posterior PDF p(η|z 1:k ). In this framework, in addition to (z 1:k |η), one needs to specify the prior distribution of the parameter vector π(η). Then, using the Bayes' rule, the posterior PDF is: For the described problem, (7) cannot be solved analytically and we need to apply a numeric approximation. However, it will be shown that, assuming the prior of the source strength π(Q 0 ) is a gamma distribution, the posterior p(Q 0 |z 1:k , r 0 ) can be solved analytically. Because the posterior p(η|z 1:k ), using the chain rule, can be expressed as: we will only need to apply a numeric approximation to estimate the posterior p(r 0 |z 1:k ).

Solution
It is reasonable to assume that the source strength is independent of its location, and hence π(η) = π(Q 0 )π(r 0 ). Let us adopt for π(Q 0 ) a gamma distribution, with the shape parameter κ 0 and the scale parameter ϑ 0 : Note that for suitably chosen hyperparameters κ 0 and ϑ 0 , the support of this prior can cover a large span of possible values of Q 0 . The conjugate prior of the Poisson distribution is the gamma distribution [28]. Therefore, the posterior p(Q 0 |r 0 , z 1:k ) is also a gamma distribution, p(Q 0 |r 0 , z 1:k ) = G(Q 0 ; κ k , ϑ k ), with parameters κ k and ϑ k expressed analytically as a function of r 0 and z 1:k as follows (see Appendix A): where ρ r 0 (r i k ) = t 0 R η (r i k )/Q 0 , that is: It remains to compute the posterior PDF p(r 0 |z 1:k ): Compared to (7), which is a three-dimensional PDF, the posterior in (13) is two-dimensional. This dimension reduction improves the accuracy of the numerical solution. After a few lines of mathematical derivations one can show that the analytic expression for the likelihood function g(z 1:k |r 0 ), which features in (13), is given by (see Appendix A): A plethora of techniques is available for the numerical computation of (13), from grid-based numerical integration methods to Monte Carlo methods (e.g. the Markov Chain Monte Carlo, importance sampling and population Monte Carlo). We choose a Monte Carlo method which, for a given π(r 0 ) and z 1:k , approximates the posterior PDF p(r 0 |z 1:k ) by a random sample {r (m) 0,k } 1≤m≤M as follows: Here δ(r 0 − r (m) 0,k ) denotes the delta-Dirac mass located at r (m) 0,k . As the size of the sample M → ∞, the moments of p M (r 0 |z 1:k ) converge almost surely to the moments of p(r 0 |z 1:k ). An advantage of the Monte Carlo method over the grid-based (deterministic) integration is that the former positions its integration points (i.e., samples) in the regions of high probability [29].
The basic steps in estimation of p(η|z 1:k ) expressed by (8) are summarised in Algorithm 1. The expected a posteriori point estimates of the source location and its strength can be computed from the output r as follows: Finally, the Monte Carlo method which estimates p(r 0 |z 1:k ) in line 2 of Algorithm 1, was implemented using iterated importance sampling with progressive correction (IIS-PC) [30]. Full details of the implementation are given in [31]. A desirable property of importance sampling is to draw samples from an importance density that result in sample weights with a minimal variance. The question is how to design this importance density. The key idea behind IIS-PC is to achieve the goal of drawing samples with a minimal variance in a sequential manner, by constructing a sequence of target distributions from which to draw samples. The first target distribution is the prior, while every subsequent target distribution should increasingly resemble the posterior. A target distribution which can be used in this context at iteration s = 1, . . . , H of the IIS-PC is: where g(z 1:k |r 0 ) is given by (14), Γ s = ∑ s v=1 γ v with γ v ∈ (0, 1] and Γ H = 1. Note that Γ s is an increasing function of s, upper bounded by one. As a consequence, the intermediate likelihood is broader than the true likelihood, particularly in the early stages (for small s). Thus, the sequence of target distributions in (17) gradually introduces the correction imposed by the measurement z k on the prior π(r 0 ). To derive any benefits from IIS-PC, it is required after each stage to remove the lowly weighted members of the sample and diversify the remaining ones. Lowly weighted members are removed by resampling [27], while diversification of the remaining samples is performed by Markov transitions whose stationary distribution is the target distribution p s (r 0 |z 1:k ). The outcome is a diverse sample located in the region of the parameter space where the intermediate likelihood has non-zero values. The choice of correction factors γ 1 , · · · , γ H , as well as the number of iterations H are design parameters. Further details can be found in [31].

Robot Formation Control
Suppose at time t k−1 the robotic platforms have processed all available measurements included in the vector z 1:k−1 and estimated the posterior PDF p(η|z 1:k−1 ) using the method described in Section 3. Let A ⊂ R 2 denote a designated search area, which includes the source location r 0 . The key aspect of search is, for each robotic platform of the formation, to decide autonomously where to move next within A in order to acquire a new concentration measurement at t k , denoted z k = [z 1 k , z 2 k , . . . , z N k ] . An autonomous multi-robot search can be formulated as a partially-observed Markov decision process (POMDP) [32]. The elements of a POMDP include an information state, the space of admissible actions (controls) and a reward function. The information state, adopting the Bayesian framework for estimation of source parameters, is the posterior PDF p(η|z 1:k−1 ). Current knowledge about the source position and strength is fully specified by this density. A decision in the context of search is the selection of a control vector u k ∈ U k , where U k is the space of admissible actions. Finally, the reward function maps each admissible action into a non-negative real number, which represents a measure of the action's expected information gain. An optimal strategy selects, at each time, the action with the highest reward. Admissible actions can be formed with one or multiple steps ahead. According to the motion model introduced in Section 2.1, the space of admissible actions U k is continuous with four dimensions: V k , Ω k , ξ k and T k . In order to reduce the computational complexity of the numerical optimisation, we discretise U k and consider only myopic (one step ahead) control.
If V, O, S and T denote the sets of possible discrete-values of V k , Ω k , ξ k and T k , respectively, then U k is the Cartesian product V × O × S × T. The myopic selection of the control vector at time t k is expressed as: where D p(η|z 1:k−1 ), z k (v) is the reward function. Note that the reward function depends on the future measurement z k (v), assuming the control vector v ∈ U k has been applied. In reality, this future measurement is not available (the decision has to be made at time t k−1 ), and therefore the expectation operator E with respect to the prior measurement PDF features in (18). Previous studies of search strategies [16,23] found that the reward function measuring the information gain as the entropy reduction, results in the most efficient search. However, the earlier work neglected that, while traveling, robots incur certain cost. Assuming α is the cost of travel per unit distance, we adopt the following specification for the expected reward function: where s k (v) = V k T k ≥ 0 is the travelled distance under action v, H k−1 is the current entropy (based on z 1:k−1 ), i.e., H k (z k (v)) is the future entropy (after applying the control v): and E{H k z k (v) } is its expected value, with respect to the probability mass function P{z k |z 1:k−1 } = (z k |η) p(η|z 1:k−1 )dη: The exponential term, e −α s k (v) , in (19) incorporates the cost of travel, penalising longer travel distances. The computation of entropy H k−1 in (20) is carried out as follows. Recall that p(η|z 1:k−1 ) is approximated by a random sample {(r , which leads to the approximation H k−1 ≈ − 1 M ln 1 M . The computational expense of exact computation of E{H k z k (v) } grows exponentially with N because the sum in (22) is N-dimensional: it requires consideration of all possible combinations of measurement outcomes from N platforms. Hence we resort to a Monte Carlo approximation.
For a command vector v ∈ U k , first we draw a random sample {z (j) k } 1≤j≤J from P{z k |z 1:k−1 } using the following procedure. We randomly select a sample η * k−1 from {η (m) k−1 } 1≤m≤M , and then draw N times from the likelihood (z k |η * k−1 ). By repeating this procedure S times, J = SN samples of the measurement outcomes from N platforms {z (j) k } 1≤j≤J are created. Then (22) is simply approximated as: k (v) . The search algorithm needs to decide when to stop the search and report its final estimates of source parameters, denoted r 0 and Q 0 . The termination criterion is based on the spatial spread of the random sample {r (m) 0,k } 1≤k≤M , computed as the trace of its sample covariance matrix. When this spread is below a certain threshold, denoted , the search is terminated.
The key assumption of the replicated centralised fusion architecture is that the same input data (measurements z 1:k and the corresponding locations r 1:k ) are available to all robotic platforms for data fusion (parameter estimation and robot motion control). However, this assumption is not sufficient. As the Monte Carlo method plays a role in the data fusion, we must also ensure that the same pseudo-random generators, using the same seed, are running on each individual platform. In this way all platforms reach the same decision on the motion control vector u k and subsequently apply the motion model described in Section 2.1, knowing their own identification number i in the formation.

Illustrative Run
First we illustrate a typical run of the multi-robot search algorithm using the following parameters (all physical quantities are in arbitrary units (a. u.)): The initial position of the centroid of the formation was at (200, −250). The results of a typical run are shown in Figure 2. Figure 2a shows the search area, the paths of the multi-robot formation at k = 2 and the source location at (X 0 , Y 0 ) with the contour plot of the corresponding mean rate R η of (3). The random samples {r (m) 0,k } 1≤k≤M , approximating the posterior p(r 0 |z 1:k ), are shown as brown dots. Note that the search area is more than ten times bigger than the area where the source can be sensed. Consequently, the measurements z 1:k are initially zero vectors for a long time; on this occasion for k = 1, · · · , 32. At k = 33 the first non-zero measurement is recorded by one of the sensors, resulting in the posterior PDF p(r 0 |z 1:k=33 ), approximated by the sample {r  0,k } 1≤m≤M at k = 2 and k = 33, respectively. The true source location is indicated by a pink asterisk. The contours of the mean plume are plotted with blue lines. Figure (c) shows the prior probability density function (PDF) π(Q 0 ) (red dashed line), the posterior PDF p(Q 0 |r 0 , z 1:k ) at k = 48 (green solid line), and the true value of Q 0 = 4 (blue asterisk).

Monte Carlo runs
In order to understand the performance characteristics of the search algorithm, 200 Monte Carlo runs were performed of various scenarios. Unless otherwise specified, the parameters used were the same as specified in Section 5.1, but with the bigger search area, A = 750 × 750.  Figure 3 shows the mean search time when varying the number of platforms, N, from 1 to 10. Both the source location and initial centroid position were drawn at random from the uniform distribution over the search area. There is initially a significant decrease in the mean search time as platforms are added, but this eventually levels off. Even with a large number of platforms the formation still needs to get quite close to the source before there is a significant probability of a non-zero detection on any of the sensors, so the search time becomes dominated by the time spent exploring the area before that detection.   Figure 5 shows the effect of travel cost on the formation displacements chosen by the search algorithm. The travel cost α, was increased from 0.01 to 0.02 and the results are shown for 1 and 5 search platforms. In accordance with intuition, increasing the travel cost results in a shift of the histogram towards smaller displacements. Another interesting observation can be made from Figure 5: the histograms clearly has two peaks. One corresponds to the short movements, while the other peak corresponds to the long "jumps" of 32 or more units in length. This appears to be very similar to the displacement patterns of foraging animals [2,33,34]. They too typically combine the phases of long non-sensing displacement, with short sensing (and reactive) search phases. This strategy is often referred to as intermittent search.

Search Area
Quantiles of Inv. Gaussian Dist.  Figure 6 shows a Q-Q plot with 95% confidence intervals [35], comparing the empirical PDF of the search times (for N = 1 and N = 5 robotic platforms) with a fitted inverse Gaussian distribution. The match between the empirical and the proposed theoretical PDF can be accepted with 95% confidence, if the confidence limits, shown as green lines in Figure 6, do not cross the red dashed line. The empirical search time samples were obtained with the source location fixed at (X 0 , Y 0 ) = (187.5, 187.5) and the initial robot formation centroid at [187.5, −187.5]. As found in our previous work [23], the search time for a single search platform is well modelled by an inverse Gaussian. The model, however, does not hold as strongly for N = 5 searching platforms, especially for shorter search times.
In all Monte Carlo runs of the proposed autonomous multi-robot search, the hazardous source was found and localised with accuracy determined by the termination criterion. In particular, the RMS localisation error was found to roughly correspond to √ = 2.5 a.u.

Experimental Results
The search algorithm was also evaluated on an experimental data set, collected by COANDA Research & Development Corporation using their large recirculating water channel. The source was releasing fluorescein dye at a constant rate from a narrow tube. The data is a sequence of 340 frames of instantaneous concentration field measurements in the vertical plane, sampled every 10/23 s. The size of each frame is 49 × 98 pixels, where each pixel corresponds to a square area of 2.935 × 2.935 mm 2 . The sequence of frames, in the form of a video, is included in the supplementary material; the actual dataset can be obtained from the authors on request.
The frames from this experimental dataset were scaled by a factor of 3 using bicubic scaling and placed in the top left corner of a 500 × 500 search area. The simulated measurements from the previous section were replaced with the rounded integer taken from the closest spatial and temporal sample from the experimental dataset.      Figure 6, and in our previous work [23], the search times for the experimental plume can be accurately modelled by an inverse Gaussian distribution.

Summary
The paper proposed an algorithm for autonomous search for an emitting source of hazardous material transported through the environment by a turbulent flow. The search was designed for a group of robots connected in a network with all-to-all communication.
The source parameter estimation was carried out in the Bayesian framework: the posterior density of source strength, conditioned on the source location, was carried out analytically. The posterior density of source location, on the other hand, was computed numerically, using a Monte Carlo method. The source parameter estimation, carried out in this manner, overcomes the problems encountered in previous implementations, such as the assumption that the source strength is known in using the grid-based estimation, or the depletion of particles, when the particle filter is applied.
The robots are moving in a controllable formation, with control parameters including the linear and angular velocity, the travel time and the scale of formation. The reward function for choosing the robot formation control vector was selected as the entropy reduction, with the built-in travel cost.
Numerical results, using both simulated and real data, demonstrate reliable performance: the success rate in finding the source is 100%, with localisation accuracy determined by the termination criterion. Furthermore, the analysis of the algorithm reveals: (1) a diminishing return on increasing the number of platforms in formation; (2) a linear growth of the mean search time with the search area; (3) an increase in the cost of travel resulting in shorter formation displacements; (4) the search displacements are in accordance with the intermittent search strategy; and (5) the PDF of search time for a single searcher is well-modelled by the inverse Gaussian distribution.
There are many avenues for future work. For example, one could explore distributed, rather than centralised, source parameter estimation and robot control. This could take into account a more realistic communication network with limited bandwidth and time delays. The implicit assumption in the presented work was that the search area is an open field. The search in an area with obstacles and known map would be another complementary future research direction: it would require a different dispersion model, and modifications of the parameter estimation and robot control algorithms. Finally, the search in an area with obstacles and an unknown map would require robots to be equipped with appropriate ranging sensors for localisation and mapping. Carrying out autonomously and simultaneously three functions: search, localisation and mapping, is the ultimate goal of this research. In proving Equations (10) and (11), we will use two properties of gamma distribution: 1. If random variable X ∼ G(υ, ϕ), then for any constant c > 0, the random variable cX ∼ G(υ, cϕ).
2. Suppose the prior distribution of random variable Y is G(υ, ϕ). Let n be a sample from the Poisson distributed likelihood function with parameter Y. Then, the posterior distribution of Y is G(υ + n, ϕ 1+ϕ ).
By repeating the sequence of update steps using multiple platforms N > 1 over time j = 1, 2, . . . , k, it can be shown that (10) and (11) holds for any N and any k.