A multi-target detection and position tracking algorithm based on mmWave-FMCW radar data

Then, a Kalman Filter (KF) is used to track the position of the centroid of each cluster. Finally, a Structured Branching Multiple Hypothesis Tracking (SBMHT) algorithm is applied and updated over reasonably short time intervals to decide which detected tracks should be confirmed and which should be discarded. The proposed MTT technique was validated experimentally using data sets collected from a 60-GHz TI IWR6843 radar platform. The reported results show that the developed algorithm, if properly tuned, is faster and returns more accurate results than other MTT techniques. In particular, the percentage of detection errors is negligible and the planar positioning accuracy is within about 30 cm with 90% probability when up to five targets move freely within the same room.


Introduction
Interest in indoor tracking and localisation of people and vehicles has recently gained momentum, primarily due to the widespread adoption of mobile smart devices, which provide wireless and effortless services to users [1]. Several wireless technologies are utilised for indoor agent tracking, with WiFi and Ultra-Wideband (UWB) technologies taking the lion's share [2,3]. WiFi technologies have an indisputable advantage: they are deployed everywhere to provide connectivity services. However, when they are used for localisation purposes, accuracy is far from satisfactory [4,5]. Conversely, indoor localisation based on UWB signals ensures superior accuracy and enhanced robustness [6], but it requires a specific UWB infrastructure [7,8]. Moreover, the need to wear electronic devices with a unique identifier raises understandable privacy concerns.
To address this issue, device-free, memoryless tracking methods have become increasingly popular [9,10]. Such techniques are noninvasive and allow users to move freely without having to be tethered to their identifying devices.
Millimetre-Wave Frequency Modulated Continuous Wave (mmWave-FMCW) radars have gained popularity over the last few years, mainly due to the evolution of antenna arrays [11]. Indeed, they can be used not only for indoor positioning, but also for vital signs detection [12]. A single mmWave-FMCW radar can potentially track multiple, unknown users at the same time. As a consequence, mmWave-FMCW radars do not suffer from privacy concerns [13]. Thus, they could revolutionise target localisation and tracking in indoor environments, also because their detection range is much longer than the range of Ultra-High Frequency Radio Frequency Identification (UHF-RFID) readers [14][15][16]. The data generally returned by a mmWave-FMCW radar are the coordinates of the clouds of points reflected by some target moving in the space in front of the radar's antenna array [17]. Unfortunately, this point cloud can be quite noisy. In [18], a PointNet auto-encoder neural network is adopted to enhance the point cloud of data collected from a mmWave-FMCW radar. In [19], instead, a localisation framework based on the fusion of UWB and mmWave-FMCW radar data is proposed for both outdoor and indoor scenarios. When multiple targets move in the same environment, three additional critical problems must be addressed. First of all, the amount of outliers in the data set is much greater than in the case of a single target. Therefore, removing such outliers is critically important for performance. Secondly, the radar points must be clustered and properly associated with different moving targets. Finally, the time-varying position of such clusters should consistently represent the position of the corresponding targets. Thus, to address the aforementioned issues, a Multi-Target Tracking (MTT) algorithm is needed. These kinds of algorithms are designed to simultaneously track the positions of multiple objects over time in dynamic environments [20]. Among the existing solutions, the Structured Branching Multiple Hypothesis Tracking (SBMHT) offers significant computational savings compared to other approaches [21], as will be explained in Section 2 and shown in Section 6. However, the standard SBMHT approach is not very reliable in small indoor environments. This is why in this paper we propose a solution to improve both multi-target detection reliability and tracking accuracy in indoor scenarios.
https://doi.org/10.1016/j.measurement.2024.114797 Received 10 February 2024; Received in revised form 16 April 2024; Accepted 27 April 2024
Therefore, the key elements of novelty of this work are a better outlier removal scheme before data clustering and a robust multi-hypothesis tracking approach, which requires just a preliminary tuning of the algorithm parameters in the chosen environment.
The rest of the paper is structured as follows. Section 2 presents some related work in the field of radar-based multi-target indoor positioning. Section 3 introduces the multi-target detection problem using radars, while Section 4 describes the theoretical framework and the proposed solution. Section 5 discusses the importance of parameter tuning and explains the main settings of the experimental setup. The detection and tracking performance based on several experimental results is reported in Section 6. Finally, in Section 7 the main conclusions are drawn and future work is outlined.

Related work on radar-based positioning
To discriminate the valid data from the noise floor, a heuristic threshold is usually applied to the raw mmWave-FMCW signals. The Constant False Alarm Rate (CFAR) detector is the simplest and most common (and yet powerful) filtering technique to decrease the number of cloud points to a manageable level (some further details on CFAR filtering are reported in Section 5.1) [22]. Nevertheless, the position points returned by the radar include sparse observations due to reflections that do not belong to any target. The amount of such "outliers" or "false alarms" in small rooms may be very large [23]. The combination of data sparsity, sporadic missing detections in the case of distant targets, noise and multipath propagation effects, and lack of knowledge of the true shape of the clusters of points associated with a given target makes the performance of traditional outlier detection methods dramatically poor when applied to radar data [24,25]. As suggested by Keller et al. [26], efficient methods for analysing these types of clusters often require self-adaptive thresholding mechanisms for labelling non-clustered and clustered data. The main reason is that the optimal thresholds depend on the overall density of the cluster points. Hence, a method that is appropriate for a low number of moving targets may not be suitable when the number of targets grows.
As a consequence, classic parametric clustering methods, like the K-means algorithm [27], the Gaussian Mixture Model (GMM) criterion [28], and the agglomerative hierarchical technique [29], turn out to be hardly effective for outlier detection when applied to radar data. On the other hand, supervised cluster analysis and outlier detection techniques for radar tracking have recently attracted increasing attention [30,31].
Many MTT systems based on radars typically rely on Multiple Hypothesis Tracking (MHT) techniques to overcome the aforementioned challenges and to tackle the data association problem [32]. Over the past decades, various MHT approaches have been employed for data association, detection, and tracking (see [33,34] for a comprehensive review). In [21], it is shown that the problem of making multiple data association hypotheses can be considerably mitigated by dividing the entire set of tracks and observations into separate clusters. Furthermore, many other approaches rely on the idea that a big tracking problem can be split into a set of smaller problems to be addressed independently [35]. Fontana et al. [36] propose a clustering and merging approach based on a Poisson multi-Bernoulli mixture (PMBM) filter, which looks suitable for multiple target tracking with a large number of agents. The research work by He et al. [37] introduces an innovative multi-sensor, multi-target tracking technique, followed by a fusion stage using clustering and statistical tests to group local tracks into clusters, with a global estimation based on the generalised covariance intersection (GCI) algorithm. Extensive simulations confirm its effectiveness for multi-target tracking. In [36], a data-driven clustering algorithm is adopted to divide the data association problem into sub-problems and to derive the clustered PMBM posterior density via Kullback-Leibler divergence minimisation. Although this method mitigates the computational burden when the number of targets grows, the tracking performance in the presence of occlusions is still lower than that of standard Track-Oriented Multiple Hypothesis Tracking (TOMHT) approaches.
One big challenge of the MHT-based approaches is the difficulty of ensuring consistency, as it is commonly assumed that multiple tracks in a given hypothesis cannot share the same measurements [38]. This means that if one track relied on a detected point to update the position of a target, any other track using the same point would conflict with the first association. This assumption may not hold true when targets are close to each other and when the distance between them is comparable with the sensor spatial resolution. This problem is recognised to be particularly tough [39]. Most of the existing MHT approaches are not able to provide accurate results in such situations, as the amount of false tracks can be particularly high and dominate most hypotheses. In addition, the family of MHT approaches suffers from another important drawback: the need to keep track of and to test a vast number of hypotheses. This issue persists even when most hypotheses are discarded immediately. In [40], a multi-hypotheses fractional belief propagation (MHFBP) algorithm for radar-based MTT is proposed. This technique effectively addresses computational challenges and outperforms the classic MHT, the feature-aided MHT (FA-MHT), and the MHT-belief propagation (MHT-BP), with improved tracking performance and reduced computational burden in diverse scenarios. However, even this method does not offer any effective way to reduce the number of falsely detected tracks. As correctly explained by Smiti et al. [41], the supervised cluster analysis and outlier detection approaches are more effective in detecting abnormal events when they do not derive from sporadic points that are far away from the centre of a cluster. Moreover, such solutions are also preferable in the case of small clusters with temporal and spatial local anomalies.
In our research, we noticed that the problem of clustering indoor radar data when multiple targets are present is similar to the complex molecular data analysis based on biological specimens [42]. Recently, a robust, fast and accurate outlier rejection method for microscopy data localisation was presented in [43]. That method can detect outliers in clusters of measurement data having nonuniform and non-stationary densities. Taking inspiration from this method, in this paper a novel and effective technique to remove outliers from raw radar data is presented. This approach is combined with an efficient method based on the SBMHT algorithm (i.e., a computationally efficient version of MHT) to detect and to track multiple targets [21]. As described in the following, this algorithm provides a novel solution in the panorama of indoor positioning techniques, and it is essential to ensure the robust tracking of multiple targets in the same environment.

The multi-target detection problem
Let $N_o$ be the number of stationary or moving objects in an environment. The state (i.e., the planar position) of each object $i$ at the $k$-th time step (with $i \in \mathcal{I} = \{1, \dots, N_o\}$) is described by vector $s_{i,k} = [x_{i,k}, y_{i,k}]^T$.
Let $S_k = [s_{1,k}^T, \dots, s_{N_o,k}^T]^T$ be the vector including the state of all the objects. Each data frame returned by the mmWave-FMCW radar due to the signal reflections caused by an object with coordinates $(x_i, y_i)$ includes: Doppler frequency shift, object velocity, distance and bearing angle between the radar antenna array and the object. Thus, distance, bearing angle and Doppler frequency shift measured at the $k$-th time step can be modelled as in (1). It is worth emphasising that, despite the threshold-based CFAR filtering, the raw radar data may be affected by several outliers, i.e., points due to noise or other random phenomena that are spread quite randomly throughout the field of detection. In the case of multiple targets, not only the valid radar data, but also the amount of outliers tends to grow, thus making data clustering difficult. However, the random events causing such outliers are quite uncorrelated in time, whereas the radar points related to a true target are strongly correlated in both time and space. Thus, if the radar data are accumulated over $w$ consecutive frames, the points associated with existing targets tend to exhibit a good persistence. For this reason, the set of most recent radar data (i.e., those collected in time step intervals $\mathcal{K} = \{k-w+1, \dots, k\}$) can return more accurate clustering results, as the radar points associated with existing targets can be distinguished from the outliers, as will be shown in Section 5.2.
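The frame accumulation described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the class name `FrameAccumulator`, its `push` method and the sample coordinates are hypothetical, and each frame is assumed to be a list of planar (x, y) points.

```python
from collections import deque


class FrameAccumulator:
    """Keep the radar points of the most recent frames (sliding window)."""

    def __init__(self, window: int):
        # A bounded deque drops the oldest frame automatically.
        self._frames = deque(maxlen=window)

    def push(self, points):
        """Add the (x, y) points of a new frame; return all points in the window."""
        self._frames.append(list(points))
        return [p for frame in self._frames for p in frame]


acc = FrameAccumulator(window=3)
acc.push([(1.0, 2.0)])
acc.push([(1.1, 2.0), (4.0, 0.5)])
acc.push([(1.2, 2.1)])
merged = acc.push([(1.3, 2.1)])  # the first frame has now left the window
```

Points that persist near the same location across the window (here, those around x = 1.2, y = 2.0) are the ones a later clustering step can separate from isolated detections such as (4.0, 0.5).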

Problem formulation
Given the set of measurement data $Z_k$ and assuming that each moving object lies on the same two-dimensional plane, our goal is to track the position of one or more targets over time $\forall k = 1, 2, \dots$, i.e., to estimate paths $\mathcal{P}_{i,k} = \{\hat{s}_{i,\tau}\}_{\tau=1}^{k}$, $i = 1, \dots, N_o$, while minimising the estimation error $\| s_{i,k} - \hat{s}_{i,k} \|$, $\forall k$ [44]. To this end, the measurement data $z_{m,k}$ must first be detected and correctly associated with a given object $i$ (i.e., for $m \in \mathcal{M}_{i,k}$) [34]. Since the MTT problem based on radar data does not require placing any device on the target, associating the radar data with the correct target is very hard [45,46]. In fact, solving this association problem is one of the most important contributions of this paper. The detection problem aims at finding the elements $\hat{S}_k$ of the estimated state that best approximate the actual state $S_k$ at every measurement instant. To formalise the problem, we can consider a square real-valued matrix $D_k : \hat{S}_k \times S_k \to \mathbb{R}^{n \times n}$ representing the pairwise Euclidean distance between the centroids of $\hat{S}_k$ and $S_k$. To make $D_k$ square, the smaller vector between $\hat{S}_k$ and $S_k$ is zero-padded to the length $n$ of the longer set. Also, to find the best association, a linear assignment $\pi : \hat{S}_k \to S_k$ and two index functions $\delta : S_k \times \{1, \dots, N_o\}$ and $\hat{\delta} : \hat{S}_k \times \{1, \dots, n\}$ must be sought to minimise the cost function (2). Notice that if $n = N_c \ge N_o$, the inverse function $\pi^{-1} : S_k \to \hat{S}_k$ can be simply used in (2).
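To make the association step more concrete, the following Python sketch builds a zero-padded square distance matrix and searches for the assignment with minimum total distance. It is an illustrative assumption rather than the cost function (2) itself: the function names and sample coordinates are hypothetical, and the exhaustive search over permutations is a stand-in that is only practical for a handful of targets (a Hungarian-type solver would scale better).

```python
from itertools import permutations
from math import hypot


def pad_distance_matrix(estimated, actual):
    """Pairwise Euclidean distances, zero-padded to a square n x n matrix."""
    n = max(len(estimated), len(actual))
    dist = [[0.0] * n for _ in range(n)]
    for a, (xe, ye) in enumerate(estimated):
        for b, (xa, ya) in enumerate(actual):
            dist[a][b] = hypot(xe - xa, ye - ya)
    return dist


def best_assignment(dist):
    """Exhaustive search for the permutation with minimum total distance."""
    n = len(dist)

    def total(perm):
        return sum(dist[a][perm[a]] for a in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)


estimated = [(0.1, 0.0), (2.9, 4.1), (5.0, 5.0)]  # cluster centroids, one spurious
actual = [(3.0, 4.0), (0.0, 0.0)]                 # reference positions
assignment, cost = best_assignment(pad_distance_matrix(estimated, actual))
```

In this toy case the spurious third centroid is matched to the padded (dummy) column at zero cost, so it does not distort the association of the two real targets.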

Algorithm description
A flowchart describing the proposed algorithm is shown in Fig. 1. A detailed description of the different functions is provided in the following.

Cluster analysis
The first three steps of the proposed MTT algorithm are: background (i.e., obstacle points) removal, outlier removal and clustering of the targets' data. Generally, two main approaches exist to remove the spurious radar points due to clutter and fixed obstacles in a given environment. In the first group of solutions, the region of interest is extracted from the map. As a result, all the radar points that lie outside the region of interest are regarded as obstacle points and are discarded immediately [47]. In the second group of solutions, instead, the information about the map of the room (including both obstacles and free space) wherein targets can actually move is used to update the track score. Thus, the localisation process is based on a Markov chain model [48]. In this paper, the former approach is adopted. Of course, some radar observations due to static objects may not be removed and could survive the outlier removal process. However, since these points are usually static and do not belong to any moving target, they are inherently discarded by the multi-hypotheses approach described in Section 4.3. In fact, sooner or later these points are associated with some false hypothesis that is finally rejected.
F. Shamsfakhr et al.
To address the outlier removal problem, a supervised classification approach is used in this paper [43]. Let $Z_k$ be the vector of the $w$ latest radar measurement data collected up to the $k$-th time step. Starting from $Z_k$, arrays of ordered sequences are built by using the Nearest-Neighbour Distance (NND) algorithm to determine whether a measured point obtained from (1) belongs to a cluster or is an outlier. As mentioned earlier, due to the specific characteristics of radar signals, the target distance from the radar affects the density of points assigned to the target. Thus, the distance of each point from the sensor is added to the feature space of the input data (i.e., $r$ in (1)). Using this input pattern, a standard two-layer Feed-Forward Neural Network (FFNN) classifier with 30 neurons per layer was used to distinguish valid data from outliers. This solution proved to be computationally lighter and more effective than a more complicated Long Short-Term Memory (LSTM) network, which was instead used in [43] to address a similar outlier detection problem, but in a totally different context and with more observations. The neural network hidden layer uses the Rectified Linear Unit (ReLU) activation function and the output relies on a sigmoid activation function. The choice of the ReLU activation function is motivated by both its effectiveness in promoting sparse representations and its good generality.
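The kind of input pattern described above can be illustrated with a short Python sketch that builds, for one radar point, the sorted sequence of nearest-neighbour distances plus the range from the sensor. Several details are assumptions made only for illustration: the function name `nnd_features`, the sensor placed at the origin, the fixed feature length, and the choice of padding missing neighbours with the radius value.

```python
from math import hypot


def nnd_features(points, index, radius, length):
    """Sorted neighbour distances of points[index] within `radius`,
    truncated or padded (with `radius`) to `length`, followed by the
    range of the point from a sensor assumed to sit at (0, 0)."""
    x0, y0 = points[index]
    dists = sorted(
        d
        for j, (x, y) in enumerate(points)
        if j != index
        for d in [hypot(x - x0, y - y0)]
        if d <= radius
    )
    dists = (dists + [radius] * length)[:length]
    return dists + [hypot(x0, y0)]


pts = [(0.0, 3.0), (0.1, 3.0), (0.0, 3.2), (4.0, 1.0)]
dense = nnd_features(pts, 0, radius=1.5, length=3)     # point inside a cluster
isolated = nnd_features(pts, 3, radius=1.5, length=3)  # point with no close neighbour
```

A point inside a cluster yields several small, similar distances, whereas an isolated point yields only padded values; a binary classifier such as the FFNN mentioned in the text could then be trained on vectors of this kind.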
The adopted training algorithm is based on the stochastic gradient descent method, which is designed to optimise performance and is more adaptable than other techniques in the case of binary classification tasks. Of course, the algorithm works if a sufficient number of measured points is available, as discussed later in this paper. Once the outliers are detected and removed from the observations set, the data are then grouped by the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm [49]. This algorithm is preferable to alternative clustering methods because of: (i) its simplicity, efficiency and adaptability; (ii) its robustness to noise; and, last but not least, (iii) its ability to identify clusters of varying shapes and sizes with no prior knowledge of the number of clusters [50]. Thus, for a given $Z_k$, a set of $N_c$ clusters $C_k = [c_{1,k}^T, \dots, c_{N_c,k}^T]^T$ is determined at time step $k$, with $c_{j,k} = [x^c_{j,k}, y^c_{j,k}]^T$ being the centre of each cluster. Each cluster contains a subset of refined radar data.
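For readers unfamiliar with DBSCAN, a compact pure-Python version is sketched below, together with a helper that returns the cluster centres. This is a textbook-style illustration under stated assumptions, not the code used in the paper: the function names and sample points are hypothetical, and the neighbour count includes the point itself.

```python
from math import hypot


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one label per point (-1 = noise, 0, 1, ... = cluster id)."""
    labels = [None] * len(points)

    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (x, y) in enumerate(points) if hypot(x - xi, y - yi) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb)
    return labels


def centroids(points, labels):
    """Mean position of each cluster, keyed by cluster id."""
    groups = {}
    for (x, y), lab in zip(points, labels):
        if lab >= 0:
            groups.setdefault(lab, []).append((x, y))
    return {
        lab: (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
        for lab, g in groups.items()
    }


pts = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (3.0, 3.0), (3.2, 3.0), (3.0, 3.3), (8.0, 8.0)]
labels = dbscan(pts, eps=0.5, min_pts=3)
centres = centroids(pts, labels)
```

The example uses a radius of 0.5 m and a density threshold of 3 points; the isolated point at (8, 8) is labelled as noise and contributes to no centroid.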

Filtering and prediction
In general, the clusters resulting from the previous step do not necessarily coincide with the actual targets. First of all, the number of estimated clusters $N_c$ may differ from the actual number of objects $N_o$ due to a variety of factors, including radar measurement data quality, radar resolution, the value of $N_o$ itself, the targets' locations $s_{i,k}$ (for $i = 1, \dots, N_o$) and the distance between them. Moreover, the clustering performance is affected by the multipath propagation of the backscattered signals.
As a result of these issues, the estimated centre of the $j$-th cluster tends to deviate from the actual target's position, i.e., the value of $\| c_{j,k} - s_{i,k} \|$ differs from zero and often increases. To mitigate the aforementioned problems, it is required to keep track of the set of consistent hypotheses around $c_{j,k}$ for $j = 1, \dots, N_c$ and $\forall k$, using the estimated trajectories of each target $\mathcal{P}_{i,k}$. As discussed in Section 1, MHT refers to a group of algorithms that can generate numerous hypotheses on the locations of a target. These hypotheses are evaluated by using the data received over time, with some of them being deemed more plausible than others. In the MHT algorithms, the hypotheses are updated using every new set of data. Therefore, the new hypotheses inherit the prior probabilities from their parent hypotheses. The low-probability hypotheses are gradually removed. However, the same measurement data can be associated with different tracks since the algorithm considers multiple hypotheses and, until a track is terminated, a measurement may be part of multiple hypotheses. If some radar-based measurements were lost or no longer associated with the same target and no dynamic model were used to describe the motion of the moving targets, the track would be suddenly lost as well, which is not acceptable. To avoid this problem, a basic dynamic model describing the motion of each target can be used. In this way, even if some measurements are missing, the evolution of a track can be estimated in open loop. Since trajectory $\mathcal{P}_{i,k}$ of the $i$-th target is just a sequence of estimated positions $\hat{s}_{i,k}$, a standard linear Kalman Filter (KF) can be used for this purpose, as in (3), where:
• superscript $\cdot^-$ denotes the predicted quantities;
• $T$ is the interval between two subsequent radar scans. Without loss of generality, $T$ coincides with the sampling period adopted to discretise the system dynamics;
• since we are assuming that the available measurements at time step $k+1$ are the predicted cluster positions $\hat{c}_{k+1}$ derived from the radar data, the matrix $H_{k+1}$ defines a 2 × 2 linear mapping: for instance, assuming that the cluster matches an actual object in the scene, $H_{k+1}$ is the identity matrix $I_2$.
Note that index $i$ (referring to the $i$-th target) is omitted in (3) to improve readability. As thoroughly investigated in [51] and empirically verified in [52], the Constant Velocity (CV) model (which relies on the average velocity of a target over the last $M$ steps), although simple, can describe the motion of a human target well [53]. Thus, it will be employed in this work as well. The covariance matrix of the prediction estimation errors is then given by (4), where $Q = \sigma_v^2 I_2$ is the estimation error covariance matrix of the velocities resulting from the average of the last $M$ steps. The cluster position measured as explained in Section 4.1 is denoted as in (5), where $c_{k+1}$ is the actual cluster position at time step $k+1$ and $\eta_{k+1}$ represents the white, zero-mean and normally distributed measurement errors with covariance matrix $R = \sigma_c^2 I_2$. The innovation term based on the estimated cluster position $\hat{c}_{k+1}$ returned by (3) is given by (6). Therefore, the covariance matrix associated with (6) is given by (7), and the Kalman gain is given by (8). Finally, the Kalman filter equations for updating the target location and the estimation error covariance matrix are, as usual, those in (9). Unlike a standard Kalman filter, the innovation term (6) and the covariance matrix $\Sigma_{k+1}$ are employed to compute the Mahalanobis distance (10), which takes into account the difference between the predicted target position and the new measurements. This parameter is then used for efficient gating and target-cluster association, as explained in [52].
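The prediction, gating and update steps described above can be sketched as follows. The code is a simplified illustration under explicit assumptions, not a transcription of (3)-(10): the measurement matrix is the identity, all covariance matrices are isotropic (a scalar times the 2 × 2 identity, so a single variance is propagated), and the names `TrackState`, `predict`, `update` and the numerical values are hypothetical. The gate value 9.21 is the 99% quantile of a chi-square distribution with two degrees of freedom.

```python
from dataclasses import dataclass


@dataclass
class TrackState:
    x: float
    y: float
    p: float  # variance of the position estimate (covariance = p * I2)


def predict(s, vx, vy, T, q):
    """CV prediction: move by the average velocity and inflate the covariance."""
    return TrackState(s.x + T * vx, s.y + T * vy, s.p + T * T * q)


def update(pred, cx, cy, r, gate):
    """Correct the prediction with a cluster centre if it passes the gate."""
    ix, iy = cx - pred.x, cy - pred.y  # innovation (identity measurement matrix)
    s_var = pred.p + r                 # innovation covariance = s_var * I2
    d2 = (ix * ix + iy * iy) / s_var   # squared Mahalanobis distance
    if d2 > gate:
        return pred, d2, False         # outside the gate: keep the open-loop prediction
    k = pred.p / s_var                 # Kalman gain (scalar because of isotropy)
    return TrackState(pred.x + k * ix, pred.y + k * iy, (1.0 - k) * pred.p), d2, True


s0 = TrackState(0.0, 0.0, 0.04)
pred = predict(s0, vx=1.0, vy=0.0, T=0.1, q=0.25)
est, d2, accepted = update(pred, 0.2, 0.0, r=0.09, gate=9.21)
_, d2_far, accepted_far = update(pred, 5.0, 5.0, r=0.09, gate=9.21)
```

A cluster centre close to the prediction is accepted and pulls the estimate towards it, while a distant one is rejected by the gate, so the track simply coasts on the predicted position.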

Track evaluation and hypotheses formation
After track initialisation, target detection gating and branching, a hypothesis test must be performed for each track. To this end, the SBMHT algorithm calculates a score for each track in the form of a log-likelihood ratio that relies on the prior probability of target existence, the false alarm rate, and the sequence of detected events [21]. Let $t_{l,k} \in \mathcal{T}$ be the $l$-th track in the set $\mathcal{T}$ for $l = 1, \dots, N_t$ (with $N_t \le N_c$) and $h_{l,k} \in [0, 1]$ be the probability that the $l$-th track $t_{l,k}$ is valid at time step $k$. Therefore, track confirmation and elimination are performed continuously at specific time intervals by checking the track scores recursively. The score calculation relies on Bayes' rule [54], i.e., (11),
where $\Pr[\cdot]$ is the probability that the sequence of observations assigned to the track $t_{l,k}$ represents a true target hypothesis. As explained in [39], such a posterior probability can also be expressed as the Bayesian average (12), whose terms are defined in (13). In (13), $P_D$ is the expected probability of detection, $\beta_{FA}$ is the false alarm rate, $d^2$ is the normalised Mahalanobis distance defined in (10) and $|\Sigma_{l,k}|$ is the determinant of the matrix obtained from (7). Note that the first case in (13) can be approximated as in (14) [21]. Assuming that the product $h_{l,k} \Pr[h_{l,k} \mid t_{l,k-1} \subseteq \mathcal{T}]$ is negligible, using the log-likelihood form and following the same steps described in [21,39], (12) can be rewritten recursively as in (15), with the terms defined in (16) and the initial score given by (17). Hence, the track probabilities are initialised using spatial densities that are estimated using the probabilities of a new target detection $\beta_{NT}$ and a false alarm rate $\beta_{FA}$. The initialisation of such probabilities in (17) will be discussed in the next section. The calculated score is finally used for track confirmation or elimination.
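As a rough illustration of a recursive log-likelihood-ratio score, the sketch below uses the form commonly found in the MHT literature: a positive or negative increment when a measurement is associated with the track, and a fixed penalty on a missed detection. It is an assumption that this matches the structure of (15)-(17); the function names, parameter names and numerical values are hypothetical.

```python
from math import log, pi, sqrt


def initial_score(beta_nt, beta_fa):
    """Initial score from new-target and false-alarm spatial densities (assumed form)."""
    return log(beta_nt / beta_fa)


def score_increment(p_d, beta_fa, d2=None, det_s=None, dim=2):
    """Log-likelihood ratio increment for one scan.

    d2 is the squared Mahalanobis distance and det_s the determinant of the
    innovation covariance; d2 is None when no measurement was associated.
    """
    if d2 is None:
        return log(1.0 - p_d)  # missed detection
    return log(p_d / ((2.0 * pi) ** (dim / 2) * beta_fa * sqrt(det_s))) - 0.5 * d2


score = initial_score(beta_nt=0.001, beta_fa=0.01)
hit = score_increment(0.9, 0.01, d2=0.08, det_s=0.1325 ** 2)
far_hit = score_increment(0.9, 0.01, d2=6.0, det_s=0.1325 ** 2)
miss = score_increment(0.9, 0.01)
score += hit + miss
```

With these illustrative values a well-matched measurement raises the score, a poorly matched one raises it less, and a missed detection lowers it, which is the behaviour the pruning stage relies on.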

Hypotheses evaluation and pruning
The SBMHT algorithm starts with an empty hypotheses set, which is gradually updated with the detected tracks. The track score is sequentially updated and evaluated using (15) and, for each time window of length $N_p$ (referred to as the pruning length in the following), it is compared with a decision threshold $\gamma_p$. Therefore, at each time step $k$ and for all the tracks with length $\ge N_p$, a track can either be pruned (i.e., $L_{l,k} \le \gamma_p$) or confirmed (i.e., $L_{l,k} > \gamma_p$). It should be noted that the confirmation of a track does not indicate that the corresponding track is a valid hypothesis, but only that it is consistent with the previously collected data. Moreover, the scores are reinitialised to check if the associated track is consistent with the collected radar data. On the contrary, every time that a track is pruned, it may be considered as a still valid hypothesis depending on its length. Unfortunately, the values of parameters $N_p$ and $\gamma_p$ can only be chosen heuristically. Further details on how these values are set in the specific case study considered in this paper are reported in Section 5.2.
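The confirm-or-prune decision described above amounts to a simple threshold test on tracks that are long enough. The sketch below shows only this test; the function name, the dictionary-based bookkeeping and the numerical values are hypothetical, and score reinitialisation after a decision is left out.

```python
def evaluate_tracks(scores, lengths, prune_len, threshold):
    """Split track ids into confirmed and pruned lists.

    Tracks shorter than the pruning length are left undecided.
    """
    confirmed, pruned = [], []
    for tid, score in scores.items():
        if lengths[tid] < prune_len:
            continue
        (confirmed if score > threshold else pruned).append(tid)
    return confirmed, pruned


scores = {1: 12.3, 2: -4.0, 3: 20.0}
lengths = {1: 10, 2: 10, 3: 4}
confirmed, pruned = evaluate_tracks(scores, lengths, prune_len=8, threshold=5.0)
```

Track 3 has a high score but is still too short to be evaluated, so it appears in neither list.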

Experimental setup and algorithm settings
In this section, first the experimental setup based on the chosen mmWave-FMCW radar (i.e., the Texas Instruments (TI) 60 GHz IWR6843 device) is described. Then, the criteria to set the various parameters of the proposed MTT algorithm are explained.

Platform, testing environment and data sets
The IWR6843 device is installed on the TI IWR6843AOPEVM USB-powered evaluation module (see Fig. 2). This module is equipped with an advanced Antenna-on-Package (AoP) device. Specifically designed for the 60-64 GHz frequency band, the sensor features a configurable FMCW chirp design. The AoP architecture plays a crucial role in minimising signal losses and simplifying integration: it includes 4 receiving (RX) and 3 transmitting (TX) antennas, providing a 120° azimuth and a 120° elevation field of view (FoV). Based on the available documentation [56], the maximum radar detection range in line-of-sight conditions is about 49 m and no specific environmental requirements are needed. However, at distances greater than about 10 m, the amount of missing and bad data grows drastically, which makes both multiple-target detection and outlier removal much more challenging. The front-end of the IWR6843 device scans the area in front of the antennas with FMCW chirp signals, which are reflected by possible objects and are finally received as echoes. The frequency shift between the transmitted and back-scattered signals depends on both the distance and the relative velocity (Doppler shift) of the target. A built-in C674x Digital Signal Processor (DSP) is used for preliminary signal processing. By applying a 2D-FFT, the distance from one or more objects and the related Doppler frequency shift can be estimated in the frequency domain. Since these estimates are usually very noisy, a CFAR threshold filter is applied at the output of the 2D-FFT. To this end, the range-Doppler spectral data are partitioned into cells. Each cell is regarded as a potential target location. The cells surrounding the cell under test are split into guard cells and training cells. The former are adjacent to the cell under test and are excluded from the noise estimate, so that the energy of a possible target does not leak into it, while the latter are farther away and are used to estimate the local noise level. The detection threshold is computed adaptively by scaling the average signal level in the training cells. During the experimental campaign, 15 training cells and 4 guard cells, respectively, were used while collecting the radar data. The CFAR threshold factor was instead set between 1.5 and 2. Such values were tuned heuristically to provide a good trade-off between clutter rejection and target detection sensitivity. Finally, a 3D-FFT (i.e., incorporating the azimuth angle dimension in addition to range and Doppler frequency shift) is applied to the filtered data to return the cylindrical coordinates of the cloud of points representing the detected targets. The raw position data are transferred to a PC for further processing at a rate of about 10 Hz.
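The role of guard and training cells can be illustrated with a one-dimensional cell-averaging CFAR sketch. This is a generic textbook variant, not the detector running on the radar DSP: the function name is hypothetical, the cell counts are interpreted as per-side values (an assumption), and the toy power profile is made up.

```python
def ca_cfar(power, n_train, n_guard, factor):
    """Cell-averaging CFAR on a 1-D power profile.

    n_train and n_guard are the numbers of cells used on EACH side of the
    cell under test. Returns the indices of the cells declared as detections.
    """
    hits = []
    half = n_train + n_guard
    for i in range(half, len(power) - half):
        left = power[i - half : i - n_guard]           # training cells before the guard band
        right = power[i + n_guard + 1 : i + half + 1]  # training cells after the guard band
        noise = (sum(left) + sum(right)) / (len(left) + len(right))
        if power[i] > factor * noise:
            hits.append(i)
    return hits


profile = [1.0] * 20
profile[10] = 8.0  # a single strong echo over a flat noise floor
detections = ca_cfar(profile, n_train=4, n_guard=1, factor=2.0)
```

Because the guard cells next to the cell under test are skipped, the strong echo at index 10 does not inflate the noise estimate of its immediate neighbours, and only that cell exceeds the scaled threshold.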
In the current setup, the TI IWR6843AOPEVM evaluation module was placed on a 2.20 m high post located in the middle of one of the side walls of an empty room monitored by an OptiTrack localisation system equipped with 14 calibrated cameras. The origin of the OptiTrack reference frame was calibrated to make it coincide with the position of the radar post base. The OptiTrack system is able to measure the position of ad-hoc reflective markers placed on moving targets with a standard uncertainty of about 1 mm. The area monitored by both the radar and the OptiTrack system was about 50 m². According to the data analysis reported in [57], if no spurious target detection occurs, the planar position measurement uncertainty of a single target is about ±35 cm with a 99% confidence level. This value is consistent with the positioning accuracy results obtained in the present multi-target tracking case, as will be shown in Section 6 (see Fig. 13).
We performed repeated experiments (of duration ranging from 110 s to 270 s) in 6 different scenarios. In each scenario, from 1 up to 5 people move along different paths in the monitored area. Fig. 3(a)-(c) displays three such experiments involving 2, 4 and 5 people moving simultaneously in the chosen environment. The data sets collected both along these paths and in 3 other scenarios with 1 up to 5 people moving randomly throughout the room (not shown here for the sake of brevity) were used for both algorithm configuration and testing (further details about these aspects are reported in Section 5.2).

Algorithm parameters configuration
After zero-padding either $\hat{S}_k$ or $S_k$ to the length of the longer set (as described in Sections 3 and 4), the FFNN mentioned in Section 4.1 was used for outlier detection. While training the FFNN, the key parameters of the proposed algorithm were tuned heuristically to maximise outlier detection and to minimise the false target detection rate. Out of the 6 available data sets, those collected with 1, 3 and 5 people moving in the room (18023 observations) were used for training, whereas those collected with 2 and 4 people, as well as the remaining data collected with 5 targets (16388 observations in total), were used for testing. The data set samples were tagged by a human operator as true or as outliers using the OptiTrack system as a reference. Indeed, due to the preliminary CFAR filtering, the number of outliers is limited.
The differences between the Nearest Neighbour Distances (NNDs) of nearby radar points, as well as the number of nearest neighbours with a small NND value, determine whether a given point belongs to the cluster representing a true target or is an outlier. Fig. 4 shows two examples of radar position data and the corresponding sorted NND sequences. As can be seen from Fig. 4(a)-(b), a large number of points with small NND values and a small difference between them (e.g., see the neighbours from 1 to 12, where the maximum difference is 7 cm) is likely to denote a true target. On the contrary, the few NNDs with small values shown in Fig. 4(c)-(d) highlight a critical situation, i.e., a possible outlier, since the considered radar point has only one close neighbour and a large number of distant neighbours. As mentioned earlier, since the spatial density of clusters and outliers also depends on the distance from the sensor, a neural network can be very effective in detecting outliers in ambiguous and complex indoor scenarios. Thus, one pivotal parameter to be tuned in the proposed framework is the NND radius. While larger values of the radius provide more information about the surroundings of each track, the increased neighbourhood size complicates the input pattern and makes FFNN training more challenging due to the higher dimensionality of the chosen features. To determine an optimal value of the radius while mitigating the variability due to the limited number of outliers in the considered data set, the FFNN was trained using the Levenberg-Marquardt algorithm multiple times (i.e., about 30 iterations) for increasing values of the radius. Since the optimality of the chosen radius cannot be verified theoretically, a heuristic criterion based on the true positive rate in outlier classification was adopted. Fig. 5 shows the rate of outliers that are correctly detected in the testing phase as a function of the percentile of FFNN trainings for increasing values of the radius. It can be noticed that radius values either greater than 1.5 m or smaller than 1 m significantly reduce the probability of detecting outliers correctly; hence, the NND radius was finally set to 1.5 m.

Recalling that, as discussed in Section 3, a single frame is not long enough to accumulate a sufficient amount of data for outlier detection, the optimal value of the window size was also determined heuristically by training the FFNN model with 10 different subsets of experimental data while increasing the window length. The box-and-whiskers plot of the true positive rate as a function of the window length is shown in Fig. 6. The best performance is obtained for window lengths between 5 and 7 frames. Based on these results, 6 consecutive 100-ms-long frames are used for clustering and outlier detection in the rest of this paper. The performance of the FFNN for outlier detection is finally summarised by the confusion matrices shown in Fig. 7(a)-(b) for the training and testing phases, respectively. An accuracy of approximately 90% in detecting the outliers was reached, which was deemed adequate for the employed FFNN.
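As an illustration of the input pattern discussed above, the following minimal sketch builds a sorted, zero-padded NND sequence for one radar point. It is not the authors' implementation: the function names, the sample coordinates and the fixed feature length are assumptions made only for this example, whereas the 1.5 m radius is the value reported in the text.

```python
import math

def sorted_nnd(points, index, radius):
    """Sorted distances from points[index] to the other points within `radius`."""
    px, py = points[index]
    dists = [math.hypot(qx - px, qy - py)
             for i, (qx, qy) in enumerate(points) if i != index]
    return sorted(d for d in dists if d <= radius)

def nnd_feature(points, index, radius, length):
    """Truncate or zero-pad the sorted NND sequence to a fixed length."""
    seq = sorted_nnd(points, index, radius)[:length]
    return seq + [0.0] * (length - len(seq))

# Hypothetical accumulated radar points (metres): a dense group and one stray point.
points = [(0.0, 2.0), (0.05, 2.0), (0.0, 2.1), (0.1, 2.1), (1.2, 2.0)]

# Feature length 6 is an arbitrary choice for this sketch.
cluster_feature = nnd_feature(points, 0, radius=1.5, length=6)
stray_feature = nnd_feature(points, 4, radius=1.5, length=6)
```

In this toy case the first point has three neighbours closer than 20 cm, while the stray point has no neighbour closer than 1 m, which mirrors the qualitative difference between the patterns of a true cluster and of an outlier.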
Once the outliers are detected, the data are clustered using the DBSCAN algorithm (see Section 4.1). To cluster the data through DBSCAN, the neighbourhood radius (which plays a similar role as the NND radius for outlier detection) and the density threshold (i.e., the minimum number of radar points within the neighbourhood radius to identify a point as the cluster centre [58]) must be set. Fig. 8(a)-(c) reports the accuracy of DBSCAN clustering after outlier detection on the three training data sets with 2, 4 and 5 targets (16388 observations in total) for different choices of these parameters. The results clearly show that the best performance is obtained with a neighbourhood radius of 0.5 m and a density threshold of 3. Further results, omitted for the sake of brevity, reveal that if the outliers are not removed before applying the DBSCAN algorithm, not only does the clustering accuracy drop by more than 20%, but larger values of both the neighbourhood radius and the density threshold should also be used, which negatively affects both the sensitivity and the selectivity of the clustering algorithm.
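The role of the two DBSCAN parameters can be illustrated with a minimal, self-contained sketch of the textbook algorithm, run with the values selected above (radius 0.5 m, density threshold 3). The code below is an assumption-laden illustration and not the implementation used in the experiments: the sample points are invented, and the density threshold is assumed to count the point itself, as in common library implementations.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (the point itself included)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xj - xi, yj - yi) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE              # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region_query(points, j, eps)
            if len(more) >= min_pts:       # core point: keep expanding the cluster
                queue.extend(more)
    return labels

def centroids(points, labels):
    """Mean position of each cluster, i.e. the quantity passed on to the tracker."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        if lab == NOISE:
            continue
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

# Hypothetical radar points (metres): two groups and one isolated point.
points = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.2), (1.2, 1.1),
          (3.0, 2.0), (3.1, 2.1), (2.9, 2.0), (5.0, 5.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
centres = centroids(points, labels)
```

With these inputs the two groups are returned as separate clusters and the isolated point is labelled as noise; the cluster centroids are the positions that a Kalman filter would subsequently track.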
In the Kalman filter described in Section 4.2, two key parameters to be tuned are the process and measurement standard uncertainty values. The former expresses the uncertainty associated with the process noise terms in (3) and is used to build the process noise covariance matrix in (4). The latter instead refers to the measurement uncertainty of the coordinates of the centroids of the clusters associated with each detected target and is used to build the measurement noise covariance matrix in (7). To select the values of these two parameters, the entire algorithm was repeatedly applied to three data sets while sweeping the process and measurement standard uncertainties in the intervals [0, 8] m/s and [0, 4] m, respectively. For each of the 400 resulting pairs of values, the average absolute error in estimating the true number of targets and the average detection error were estimated over 3600 time steps (considering all three data sets). The former is the average absolute difference between the numbers of ground-truth and detected targets at each time step, while the latter is the average norm of the difference between the ground-truth joint state vector and its estimate Ŝ as it results from (2). As shown by the error surfaces in Fig. 9(a)-(b), both functions reach a minimum for a measurement standard uncertainty of 1.74 m and a process standard uncertainty of 8 m/s. Moreover, the minimum of the error in the number of targets is global (see Fig. 9(a)). Note that the chosen measurement standard uncertainty is much greater than the standard deviation of the positioning error of a single radar measurement (i.e., about 12 cm). However, this setting is needed to take into account the detrimental effect of multiple targets, as confirmed by the simulation results shown in Fig. 9(a).
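The two figures of merit used in the parameter sweep can be sketched as follows. This is only a plausible reading of the definitions given in the text: the function names are hypothetical, the sample values are invented, and the ground-truth and estimated joint state vectors are assumed to have the same length at each time step.

```python
import math

def mean_abs_count_error(true_counts, detected_counts):
    """Average absolute difference between true and detected numbers of targets."""
    diffs = [abs(n - m) for n, m in zip(true_counts, detected_counts)]
    return sum(diffs) / len(diffs)

def mean_state_error(true_states, estimated_states):
    """Average Euclidean norm of the difference between joint state vectors."""
    total = 0.0
    for s, s_hat in zip(true_states, estimated_states):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(s, s_hat)))
    return total / len(true_states)

# Invented data for four time steps with three true targets.
count_error = mean_abs_count_error([3, 3, 3, 3], [3, 2, 3, 4])

# Invented joint state vectors (metres) for two time steps.
state_error = mean_state_error([(0.0, 0.0), (1.0, 1.0)],
                               [(3.0, 4.0), (1.0, 1.0)])
```

In a sweep, both quantities would be evaluated once per pair of process and measurement uncertainty values, and the pair minimising them would be retained.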
The last set of parameters to tune includes both those of the basic SBMHT method described in Section 4.3 and those used for hypotheses evaluation and pruning (see Section 4.4). In this regard, the probability P in (13) was set to 0.8 by observing the number of clusters correctly assigned to the targets, while the values 0.9 and 0.1 of the two further parameters needed in (13) and (17) were directly derived from the results of DBSCAN clustering. Finally, the values of the parameters used to prune the tracks with a low score while keeping those with high scores (namely, the number of frames and the score threshold defined in Section 4.4 for hypotheses evaluation and pruning) were found heuristically in two steps since, to the best of the authors' knowledge, no analytical criteria exist to set such parameters. First, the number of frames was gradually incremented to find the best trade-off between the detection rate of true tracks (which should be as high as possible) and the false hypotheses detection rate (which instead should be as low as possible). While in [21] the number of frames ranges between 7 and 9, we found that 11 is the best value for the considered training data set. Afterwards, the score threshold was tuned iteratively until the best detection performance was achieved for the chosen number of frames. In this case, the threshold is equal to −6.

Fig. 8. DBSCAN clustering accuracy after outlier detection. Three data sets per experiment are used in either case, i.e. while tracking 2 (a), 4 (b) or 5 (c) targets. The best values of the radius and density threshold parameters are also highlighted.
Fig. 10 shows the track scores of three targets tracked by the SBMHT algorithm as a function of time, for two different values of the score threshold. As can be seen from Fig. 10, three valid hypotheses are identified. The scores of the confirmed tracks are reinitialised at the beginning of every time window consisting of 11 steps. Note that, in the current example, the log-likelihood function of hypotheses h2 and h3 drops due to some bad measurement data which affect the update step of the Kalman filter in (9). As a consequence, the filter just relies on the prediction step (see the arrow labelled with "open loop" in Fig. 10). Observe that in Fig. 10(a) hypothesis h3 is correctly kept when the threshold is −6, while in Fig. 10(b) (which was obtained with a higher threshold, i.e., −3) hypothesis h3 is first pruned and then reinstantiated as hypothesis h4. This example shows the importance of tuning the key parameters of the SBMHT algorithm.
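The effect of the pruning threshold can be reproduced qualitatively with a toy example. The sketch below does not implement the score functions of the paper, i.e., (13) and (17); it simply assumes that a track score is already available at each step of an 11-step window and that a track is pruned as soon as its score falls below the threshold. The score trace and the function name are hypothetical.

```python
def first_pruned_step(scores, threshold):
    """Index of the first step whose score falls below the threshold, else None."""
    for step, score in enumerate(scores):
        if score < threshold:
            return step
    return None

# Hypothetical track-score trace over one window of 11 steps. The dip mimics a
# few bad measurements during which the filter relies on prediction only.
scores = [0.0, 0.8, 1.5, 0.2, -2.0, -4.5, -3.0, -1.0, 0.5, 1.2, 2.0]

kept = first_pruned_step(scores, threshold=-6.0)    # track survives the dip
pruned = first_pruned_step(scores, threshold=-3.0)  # track is dropped mid-window
```

With the lower threshold the track survives the temporary score drop, whereas with the higher one it is pruned at the sixth step and would have to be re-created as a new hypothesis once good measurements return.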
The list of parameters of the overall MTT algorithm and the values chosen in the considered case study are finally summarised in Table 1. Among them, the most critical for the algorithm performance are the NND radius and the hypotheses pruning parameters, i.e., the number of frames and the score threshold.

Experimental results
In the following, we report an experimental characterisation of the proposed approach as well as a comparison with the results obtained with other MHT techniques, i.e., the Global Nearest Neighbour (GNN) multi-object tracker [59], the Joint Integrated Probabilistic Data Association (JIPDA) algorithm [60], and the TOMHT technique [61]. The analysis is based on three of the data sets described in Section 5.1, i.e., those with two, four, and five people, respectively. Crucial to this comparison is the use of the Multi-Object Trackers Toolbox in MATLAB, which includes several built-in algorithms. As shown in [62], there are around 50 parameters to be set for the above-mentioned trackers in the Toolbox, so a full listing of these parameters cannot be reported for reasons of space. Nevertheless, the most important parameters of the benchmark algorithms used for comparison, as well as the corresponding values, are summarised in Table 2. Such parameters were empirically tuned to achieve the best possible tracking performance with the considered data sets. In all cases, the same KF based on the input constant velocity model was used to describe the targets' motion to ensure a fair comparison [51].

Fig. 11 shows a meaningful example, i.e., the number of targets detected by different MHT techniques over time in the most challenging scenario, namely when 5 people move randomly [see Fig. 3(c)]. The fluctuating behaviour of the different techniques depends on their sensitivity in accepting or rejecting current and new hypotheses. In particular, a rise in a pattern denotes the acceptance of a given hypothesis (i.e., a possible new target), whereas a decrease in the pattern means the rejection of the hypothesis (i.e., the target is no longer within the sensor range). Using Fig. 3(c) as a reference, the results in Fig. 11 confirm the resilience of the proposed solution in challenging tracking scenarios. Indeed, the fluctuating patterns obtained with the other algorithms highlight their vulnerability in complex situations, especially when the paths of different targets cross each other. On the contrary, the proposed approach exhibits stability in target detection, paving the way to consistent and accurate tracking.

Fig. 12 provides a deeper insight into the detection performance of the considered MHT techniques, as it shows the box-and-whiskers plots of the absolute detection errors over the three data sets used for testing, i.e., with 2, 4 and 5 people moving in the room at the same time. Observe that both the JIPDA and the GNN techniques exhibit visible performance limits in detecting the true number of targets, while the detection errors achieved with both the TOMHT technique and the proposed approach are almost negligible, although the TOMHT results are still affected by several outliers. This is confirmed by the mean value of the distribution of the absolute position errors (shown just below the boxes) that, in the TOMHT case, is about 60 times greater than when the proposed approach is used.

Fig. 13 shows the empirical cumulative distribution functions (CDFs) of the magnitude of the position error vector computed over three different kinds of users' trajectories. Observe that the 90th percentile of the position error is 31 cm, which is an excellent result when up to five people move simultaneously in the same room. Further results (not reported for the sake of brevity) confirm that the positioning accuracy of our approach is much better than that of the GNN method and is comparable on average to that of the TOMHT and JIPDA algorithms, which nonetheless are affected by more outliers, possibly due to their larger multi-target detection uncertainty. Also, quite importantly, the fact that the CDF curves in Fig. 13 almost overlap suggests that the performance of the proposed algorithm, once its parameters are properly tuned, is rather robust to an increasing number of targets.
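For readers who wish to reproduce this kind of analysis on their own data, the following minimal sketch shows how an empirical CDF and a percentile of the planar position error can be computed. The error samples are invented for illustration and are unrelated to the measurements reported in this paper; the nearest-rank percentile definition is an assumption, since the text does not state which estimator was used.

```python
import math

def position_errors(truth, estimates):
    """Magnitude of the planar position error vector for each sample."""
    return [math.hypot(ex - tx, ey - ty)
            for (tx, ty), (ex, ey) in zip(truth, estimates)]

def empirical_cdf(errors, value):
    """Fraction of samples whose error does not exceed `value`."""
    return sum(1 for e in errors if e <= value) / len(errors)

def percentile(errors, p):
    """Nearest-rank percentile: smallest sample with at least p% of data below or equal."""
    ordered = sorted(errors)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Invented ground-truth and estimated positions (metres) for two samples.
demo = position_errors([(0.0, 0.0), (1.0, 1.0)], [(0.3, 0.4), (1.0, 1.0)])

# Invented error samples (metres), not taken from the experiments.
errors = [0.04, 0.09, 0.11, 0.14, 0.17, 0.21, 0.24, 0.28, 0.40, 0.75]
p90 = percentile(errors, 90)
coverage = empirical_cdf(errors, p90)
```

Comparing such percentiles across data sets with different numbers of targets is one simple way to check whether the CDF curves overlap.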
As a final comparison, the box-and-whiskers plots of the computation times of the different algorithms are reported in Fig. 14. A PC equipped with an Intel Core i7 CPU running at 2.90 GHz, 32 GB of RAM and Windows 11 as the operating system was used to run all tests. The better tracking performance of the TOMHT with respect to the GNN technique is achieved at the price of a much longer processing time. When the JIPDA approach is used instead, a single detection can update multiple tracks at the same time. However, the computational burden reduction is limited by the fact that multiple hypotheses cannot be kept over multiple scans. As shown in Fig. 14, the proposed algorithm exhibits the smallest mean execution time (i.e., about 100 ms), as well as the lowest variability (i.e., about ±40 ms). The lower computational burden of the SBMHT approach underlying our method is due to the fact that it tends to forget some hypothesised tracks and does not consider all the hypotheses in every data association (which is preferable whenever the risk of false alarms is rather high). Moreover, the methodology adopted to check and prune the branches associated with the various hypotheses sequentially is particularly effective. Considering that the reported execution times of all algorithms were obtained in MATLAB using a single core of an Intel Core i7 microprocessor, considerable performance improvements could be achieved, even on a cheaper embedded platform, if the algorithm were implemented in a lower-level programming language (e.g., C/C++). In this case, the real-time processing of streams of radar data refreshed every 100 ms is definitely feasible.

Conclusions and future work
This paper presents an algorithm to detect and track multiple targets in indoor environments using the data collected from a mmWave-FMCW radar. The key feature of the proposed solution is its high accuracy in detecting and tracking multiple targets. Detection accuracy and robustness in complex scenarios (i.e., when multiple targets move simultaneously in the same room) are indeed superior to those of other state-of-the-art algorithms. This result is achieved through a more effective clustering approach (able to discard a large number of outliers) and by a final Structured Branching Multiple Hypothesis Testing (SBMHT) algorithm, which keeps the risks of both tracking non-existent targets and suddenly losing existing ones extremely low, i.e., well below 1%. On the contrary, when other MHT techniques are used, target detection errors of about ±1 unit are rather likely. As a consequence, the targets' planar positioning uncertainty may be greatly affected by such detection errors as well. Instead, the planar positioning accuracy of the proposed solution, using a 60-GHz TI IWR6843 device as a radar platform and a calibrated Optitrack vision system as a reference, is within about 30 cm with 90% probability when up to five targets move across the same room. Moreover, the proposed approach is faster and more deterministic than other existing MHT algorithms. This is a relevant achievement in view of a possible embedded implementation, which is the next goal, possibly supported by companies interested in the practical applications of a fully integrated device. Future research directions will also explore the possibility of including the data collected from other sensors (e.g., RFID passive tags or UWB signals) to perform not only target detection, but also identification. In this way, the risk of tracking a wrong target may be further decreased.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Overview of the algorithm for multiple targets detection and tracking.

Fig. 3. Three different tracking examples with (a) two, (b) four and (c) five people moving simultaneously in the room. The starting and ending points of each path are each labelled with a letter. The radar is located at the origin of the Cartesian plane.

Fig. 4. Qualitative examples of radar scans (on the left side) and corresponding neighbourhood patterns (on the right side) within circles of the chosen NND radius. The results in (a)-(b) refer to a true cluster observation, whereas those in (c)-(d) refer to the case when an outlier is considered.

Fig. 5. Empirical true positive outlier detection rate curves for different values of the NND radius as a function of the percentile of FFNN training data.

Fig. 6. Box-and-whiskers plot of the true positive outlier detection rate for increasing values of the window length. The region of interest for the optimal choice of the window length is highlighted in grey.

Fig. 9. Average absolute target detection error (a) and average target position error (b) as a function of increasing values of the process and measurement standard uncertainties.

Fig. 10. Two alternative examples of track scores when three targets are tracked by the SBMHT algorithm. The patterns in (a) show how a properly chosen threshold supports a true hypothesis formation, whereas the results in (b) confirm that excessive threshold values may increase the number of wrong hypotheses.

Fig. 12. Box-and-whiskers plots of the absolute detection errors over multiple experiments using different MHT techniques.

Fig. 13. Empirical cumulative distribution functions of the magnitude of the position error vector computed over three different kinds of users' trajectories.

Fig. 14. Box-and-whiskers plots of the execution times of different MHT algorithms.
The three corresponding terms are the uncertainty contributions associated with range, bearing and frequency shift measurements, respectively. Depending on a variety of factors, such as radar settings (e.g., frequency, bandwidth and rate of change of frequency of chirp signals), Doppler effect resolution and CFAR threshold, the amount of data frames associated with the detected targets may differ from the number of objects that are actually present in the room, and it may also change with time. Hence, considering that the objects in the scene may or may not be detected by the radar and that multiple measurements may refer to the same object, the radar measurement data in (1) can be labelled with an additional index ranging from 1 to the number of target measurements available at the given time step. For instance, in a room with 2 objects, at a given time step we may have 10 measurements: 7 of them refer to the first target, while the other 3 refer to the second one. Therefore, a joint measurement vector collecting all such measurements can be defined at each time step. Since each object may generate an arbitrary number of noisy measurements, a subset of the measurement indices includes those of the measurements pertaining to that object only. In the following, a single measurement result shall be attributed to one object only.

Table 1
List of parameters of the proposed MTT algorithm.

Fig. 11. Number of targets detected by different MHT techniques as a function of time when 5 people move randomly in the room.

Table 2
List of the most important parameters of other MTT algorithms implemented in the Multi-Object Trackers MATLAB Toolbox.
a CVEKF = Constant Velocity Extended Kalman Filter.