Radio SLAM: A Review on Radio-Based Simultaneous Localization and Mapping

The simultaneous localization and mapping (SLAM) algorithm has enabled the automation of mobile robots in unknown environments. It enables a robot to navigate along an unknown trajectory by employing sensors whose measurements are used to infer the surrounding environment and, in turn, to localize the robot. Sensor technology plays a key role because the quality of the measurements affects the overall performance of SLAM. While visual sensors, such as cameras, can capture rich features of the environment, they fail to work in low-light conditions. Radio frequency sensors, on the other hand, are invariant to light conditions; however, high-frequency signals such as millimeter wave (mm-wave) are prone to severe channel attenuation and are therefore suitable mainly for short-range indoor applications. Despite the high localization accuracy that the mm-wave frequency band has to offer, its shortcomings have limited the amount of research work carried out to enhance the performance of SLAM. Therefore, this paper aims to provide an overview of the recent developments in radio SLAM, with a specific focus on mm-wave enabled localization and SLAM methods. Some notable research work based on other radio frequency sensors is also discussed. In addition, we highlight the role of deep learning-based methods for localization and identify some of the key challenges in data-driven implementation.


I. INTRODUCTION
The key idea of simultaneous localization and mapping (SLAM) is to automate the sensing and navigation system of a robot in an unknown environment. With the help of the SLAM algorithm, a robot can reconstruct a map of its surroundings and localize itself within that map. SLAM finds applications in almost every field where automation is required, including automated robots for industry and transport, self-driving cars, indoor positioning systems, space exploration, and unmanned aerial vehicles (UAVs). Owing to the rising demand for SLAM systems, its market value is expected to increase to around $3 billion in a decade [1], thus increasing the need to develop efficient technology for the optimal performance of SLAM.
The associate editor coordinating the review of this manuscript and approving it for publication was Chen Chen.
The approach to implementing SLAM relies on the technology of the sensors employed for acquiring measurements and on the type of data generated by those sensors. For instance, visual SLAM exploits a camera as the primary sensor for collecting information about the environment in the form of images. Visual SLAM has gained much popularity over the last decade due to its ability to capture and create high-dimensional feature-based maps. The resolution of visual SLAM has been enhanced using various vision sensors such as monocular, depth, and stereo cameras [2]. With the recent advancements in data-driven methods, the potential of neural network (NN) based algorithms has further fuelled the development and research in visual SLAM. However, one of the frequent pitfalls of visual SLAM is its inability to operate at night and in low-light conditions. Besides, visual SLAM is not a suitable approach for privacy-sensitive applications.
FIGURE 1. Comparison of sensors, methods, applications, and some pros and cons involved in radio and visual SLAM.
In contrast to visual sensors, the other most prevalent sensor technology operates on electromagnetic waves. As the electromagnetic spectrum spans different frequency bands, separate sensors are available for the corresponding frequencies. Light detection and ranging (LiDAR) and ultra-wideband (UWB) sensors are well-established technologies for implementing SLAM. Other widely used sensor technologies include ultrasonic, Bluetooth, wireless fidelity (WiFi), and radio frequency identification (RFID). Using radio signal-based sensors to implement SLAM is generally referred to as radio SLAM. Not only do radio signals remain unaffected by the light conditions in an environment, but they are also deemed appropriate for preserving privacy. Therefore, radio signals are promising for overcoming the aforementioned challenges of visual SLAM. Fig. 1 highlights the differences between radio and visual SLAM in terms of sensors and applications and lists some of the pros and cons of each approach. It is also worthwhile to note a key difference in the sensor setup of visual and radio SLAM. Visual SLAM involves a classical sensor setup in which the sensor, i.e., the camera, is deployed only on the moving robot. On the other hand, since a radio sensor comprises two operating modules, i.e., a transmitter and a receiver, there can be two configurations of the sensor setup in radio SLAM depending on the requirement of the application. In a classical setup, as in visual SLAM, both the transmitter and receiver are deployed on the agent, whereas in a non-classical setup, the agent is equipped with a receiver while the base station(s) (BSs) in the environment act as the transmitter, or vice versa. The positions of the BSs can be known or unknown depending on the application.
In radio SLAM, the term 'robot' can be referred to as an agent, mobile user, user equipment (UE), or tag, depending on the application, and it is depicted by a mobile device in the figures of this paper. On the other hand, the term 'landmark' can refer to an object, obstacle, node, anchor node, or physical anchor node (PAN), and it is represented by a base station (BS) in the figures of this paper. We use these terms interchangeably in the rest of the paper.
VOLUME 11, 2023
The mm-wave band offers a large available bandwidth, which benefits positioning applications because it increases the resolution with which closely spaced objects can be distinguished. However, it is not suitable for long-range applications, as it faces severe attenuation due to the propagation environment [3], [4]. Therefore, mm-wave signals remain a suitable choice for indoor applications. A pioneering paper on the implementation of mm-wave SLAM was presented in 2001 [5]. In that paper, the authors proved that a solution to the SLAM problem is possible by showing that the estimated map converges to an accurate map given a sufficiently long series of observations. This was demonstrated in an experiment where the locations of a moving vehicle and the surrounding landmarks were estimated using a real mm-wave radar. The mm-wave radar operated at a frequency of 77 GHz and was mounted on the vehicle to capture measurements of the outdoor environment. The range and bearing of the landmarks were determined using frequency-modulated continuous-wave (FMCW) radar technology and served as inputs to the extended Kalman filter (EKF) algorithm. The EKF was then used to track the locations of both the vehicle and the landmarks. In addition, the authors provided a data association mechanism with which non-stationary detected landmarks were discarded by analyzing the range and bearing of each landmark. The resulting position estimation error was about 0.2 m for both the landmarks and the vehicle. However, this study only considered stationary landmarks in a line-of-sight (LoS) environment. The effect of multipath propagation and the resulting errors in the state estimates need to be accounted for in more practical scenarios and applications. A study in [6] implemented mm-wave based localization of clients in the presence of multiple access points (APs). 
For this purpose, the authors exploited two range-free methods, namely the triangulate-validate (TV) and angle difference of arrival (ADoA) methods, for estimating the position of the client. Both LoS and non-line-of-sight (NLoS) signals were incorporated in the experiments. In the case of NLoS signals, the true location of an obstacle was computed by exploiting virtual anchor nodes (VANs). Their results reported less than 2 m of error in localization and ±15° of error in the AoA estimates. This study was validated in simulations as well as on real data in an indoor environment. However, as a caveat, it assumes prior knowledge of the locations of the APs, the floor plan, and the obstacles in the environment. A study in [7] not only implemented localization but also proposed a method for identifying the shape of the landmarks. In this study, the AoA estimates were used to evaluate the positions of the receiver and landmarks in the case of a known environment. For the unknown environment, the estimates of the time difference of arrival (TDoA) and received signal strength (RSS) were used to solve the localization problem. Their results achieve an accuracy of around 0.075 m in the case of an unknown environment. While this study also estimates the dimensions of the obstacles, it only considers stationary obstacles in the environment. The work in [8] leveraged the built-in structure of a mobile network to implement SLAM. A device localization accuracy within sub-cm was achieved using no prior information about the environment and the network APs, also known as physical anchors (PAs). The ADoA method was reformulated to enhance the performance of localization. Their proposed approach also resulted in reduced computational complexity. For mapping, the estimate of the anchor's location was exploited to predict the static objects around the device. The approach was validated both in simulations and on hardware. 
However, this study only considered static obstacles in the environment and mapped only the walls, even in the presence of other obstacles.
In recent developments, researchers have exploited the mobile architecture to implement a personal radar for enabling SLAM. For instance, the studies in [9] and [10] presented the idea of a mobile-based mm-wave radar platform that can map the surrounding environment without the need for additional hardware. The study in [10] also evaluated the effects of different parameters, such as the signal bandwidth and the number of antennas, on the mapping performance of the personal radar.
UWB sensors have been widely accepted for indoor positioning applications due to their higher bandwidth and low manufacturing cost. UWB has been employed in various applications ranging from wireless node detection to pedestrians' motion tracking and robot localization. In numerous applications, localization of a tag is achieved with the help of multiple anchors having known positions [140], [143]. However, a study in [11] proposed an automatic localization methodology that not only localizes the tag but also computes the positional estimates of anchors using an error-state EKF. In addition, many UWB based positioning algorithms use the range-only approach to solve the SLAM problem. As the name suggests, the range-only method only exploits the range information to implement localization. Different research papers have employed the range-only method for performing robot localization [12], [13] and for estimating the positions of agent and anchor nodes [14].
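As an illustration of the range-only idea with anchors at known positions, the following is a minimal sketch rather than the method of any cited work. It assumes a 2D scene and noise-free or mildly noisy ranges, and the function name `locate_tag` is hypothetical. Subtracting the first range equation from the others removes the quadratic term, so the tag position follows from a linear least-squares problem.

```python
import numpy as np

def locate_tag(anchors, ranges):
    """Estimate a 2D tag position from ranges to anchors at known positions.

    Each range gives |p - a_i|^2 = r_i^2. Subtracting the equation of the
    first anchor cancels |p|^2 and leaves the linear system A @ p = b.
    """
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    a0, r0 = anchors[0], ranges[0]
    A = 2.0 * (anchors[1:] - a0)
    b = (r0**2 - ranges[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

# Usage: four anchors at the corners of a 10 m x 10 m room (assumed layout).
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_tag = np.array([3.0, 4.0])
ranges = np.linalg.norm(anchors - true_tag, axis=1)
estimate = locate_tag(anchors, ranges)
```

At least three non-collinear anchors are needed in 2D. With noisy ranges the linear solution is usually only a starting point for a nonlinear refinement or a filter.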
A hybrid approach, known as 'sensor fusion', combines multiple types of sensors to increase the performance of SLAM, and combinations of radio and/or visual sensors have been used extensively for this purpose. UWB and LiDAR are a common pairing, but both suffer from geometrical irregularities of the environment, which reduce the accuracy of localization and mapping. In [15], an optimization algorithm was proposed to mitigate the geometric degeneracy issue for UWB and LiDAR based SLAM. The study in [16] proposed an optimization method for multi-robot SLAM, in which UWB and LiDAR sensors were deployed on multiple robots for fast navigation and mapping of the environment. Apart from LiDAR, cameras are also widely employed alongside UWB sensors. The study in [17] presented a comparative analysis of SLAM performance between UWB sensors alone and UWB fused with a time-of-flight (ToF) camera. The ToF camera helped in identifying the anchors, which accelerated the navigation process of the robot.
Several surveys over the past decade have highlighted the advancements in SLAM technology. The surveys have been categorized according to the techniques and applications. A comprehensive review in [19] focused on the evolution of SLAM methods for robotics and discussed various components of SLAM and their associated challenges. A recent survey [20] explored machine learning (ML) based techniques for the development of visual SLAM and provided a detailed explanation of the relationships between several input-output components of SLAM. The term 'spatial machine intelligence system' introduced in that paper refers to the same idea as that of SLAM i.e., to enable robotic automation, wherein, with the help of precise measurements, a robot can successfully maintain its localization while gradually updating its surrounding map. In addition, the work in [21] focused not only on the recent data-driven developments but also covered the mathematical modelling and characterization of the theoretical bounds for achieving robust performance in SLAM. Apart from visual SLAM, several reviews have been developed covering different sensor technologies including RFID [22], [23], LiDAR [24], [25], and more recently on using terahertz (THz) band [141], [142]. In contrast to previous surveys, this survey attempts to highlight the developments and challenges in the field of radio SLAM, with a specific focus on mm-wave and UWB sensor technology. The rest of the paper is organized as follows. Section II provides a brief introduction to the basic concept of SLAM. Section III describes the basic structure for implementing radio SLAM along with its applications. Section IV highlights the techniques for parameter estimation and provides an overview of the existing literature, whereas the methods for predicting parameters are further discussed in Section V. Section VI describes suitable mapping techniques for radio SLAM. 
Section VII identifies the available datasets and hardware resources that can be used to perform experiments on the implementation of radio SLAM. Finally, Section VIII highlights the current challenges and future directions in radio SLAM.

II. SLAM: FUNDAMENTALS
The SLAM objective is designed for a robot to make inferences about its surroundings and identify its location within that environment. The robot is thus capable of exploring unknown territory on its own. In 1986 [28], probabilistic methods were first employed to solve the positioning and mapping of robots. However, the word 'SLAM' was first coined by Durrant-Whyte at the Seventh International Symposium of Robotics Research in 1995 [28]. After that, SLAM was implemented in various applications including indoor, outdoor, and undersea environments [24], [29], [30], [31].
There are several resources available to understand the fundamentals of SLAM. Tutorials by Durrant-Whyte [18], [32] and Sebastian Thrun [33] present a comprehensive overview of the basic concepts of SLAM. In addition, a book on probabilistic robotics [34] serves as a valuable resource for grasping the underlying mathematics behind the different operating blocks in SLAM.

A. MATHEMATICAL FORMULATION
It is crucial to formulate the problem mathematically to understand SLAM. The main goal is to infer the surroundings of a robot and determine its location on a map. Since both the map and the location of the robot are unknown, the robot requires onboard sensors to collect and process measurements from its surroundings and estimate the navigation parameters. An illustration of the navigation process is shown in Fig. 2. Let us denote the sensor measurement taken at time $t$ as $z_t$; then the objective of SLAM is to compute the following joint conditional probability:

$$p(x_t, m_t \mid z_t) \tag{1}$$

where $x_t$ and $m_t$ represent the estimated locations of the robot and landmarks, respectively. So, given the measurements $z_t$, SLAM seeks the probability of the robot being at location $x_t$ and surrounded by landmarks at positions $m_t$. In addition to the measurements, the robot also has motion information, known as odometry. Odometry describes the motion of a robot between the previous and the current location point. This helps the robot decide where to move next given the information from its previous location [33]. Let us now denote the odometry information obtained at time $t$ as $u_t$. Then, (1) can be modified as:

$$p(x_t, m_t \mid z_t, u_t) \tag{2}$$

There are different ways in which the implementation of SLAM can be categorized, mainly depending on the target application and its specific requirements. Some important categories of SLAM are described below.

1) ONLINE AND OFFLINE SLAM
Two approaches for processing the measurements in SLAM, namely online and offline, were introduced in [33]. In online SLAM, the locations of the robot and the map are updated based on the currently acquired measurements.
Expression (2) is the mathematical representation of the online SLAM problem. Alternatively, in offline SLAM, the robot first gathers the measurements along the path it traverses, then processes all the acquired data to reconstruct the map and the followed path. For offline SLAM, the vectors $\mathbf{z}$ and $\mathbf{u}$ comprise the sensor and odometry measurements, respectively, recorded at all time instants during the robot's motion. These measurements are used to estimate the locations of the robot and landmarks for the entire trajectory of the robot's motion. This can be mathematically expressed as follows:

$$p(\mathbf{x}, \mathbf{m} \mid \mathbf{z}, \mathbf{u}) \tag{3}$$

where $\mathbf{x}$ and $\mathbf{m}$ represent the estimated locations and map of the path traversed by the robot.
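The link between the online and offline formulations can be made explicit with a short worked relation. The time-indexed symbols $x_{1:t}$, $z_{1:t}$, $u_{1:t}$ and the static map $m$ used here are assumptions borrowed from the general probabilistic robotics literature, not notation taken from this paper. Under that notation, the online posterior is obtained by integrating the past poses out of the full (offline) posterior:

```latex
% Online SLAM posterior as a marginal of the full (offline) SLAM posterior.
% x_{1:t}: all poses up to time t, m: static map,
% z_{1:t}, u_{1:t}: all measurements and odometry up to time t (assumed notation).
\begin{equation*}
p(x_t, m \mid z_{1:t}, u_{1:t})
  = \int \!\cdots\! \int
    p(x_{1:t}, m \mid z_{1:t}, u_{1:t})
    \, dx_1 \, dx_2 \cdots dx_{t-1}
\end{equation*}
```

In practice the integrations are carried out one step at a time, which is why online methods can discard past measurements once they have been processed.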

2) ACTIVE AND PASSIVE SLAM
In most applications, either a passive or an active approach is followed to solve the SLAM problem. In the passive method, the robot requires information from other active agents to explore unknown surroundings. On the other hand, an active SLAM based robot aims to monitor the environment itself with the help of sensors embedded in the robot and it accomplishes its task without the need for other active agents in that environment [33].

III. STRUCTURE OF RADIO SLAM
Implementing radio SLAM involves two main steps: 1) obtaining measurements from the received signal, also known as the multipath propagation components (MPCs), to estimate the states of the robot and the surrounding landmarks; and 2) computing the posterior probability density function (PDF) of the state variables for tracking the locations of the robot and landmarks. For the first step, the MPC parameters typically include the AoA, angle-of-departure (AoD), TDoA, amplitude, delays, radar cross section (RCS), and received signal strength indicator (RSSI) of the received signal. Either one or a combination of these parameters can be used to compute position estimates of the robot and landmarks. In addition, the received signal may not always arrive via LoS. As the wireless signal propagates through the channel, it is reflected by various objects in the environment, a situation known as the NLoS environment. The received signal is therefore a sum of delayed multipath signals, which affects the accuracy of the estimated MPC parameters. Thus, it is important to correct errors in the MPCs to obtain a better estimate of the state of the robot and the environment's features. There are various statistical techniques for estimating the MPC parameters. Multiple signal classification (MUSIC) [35] is the most popular and widely used method for AoA estimation. Other popular conventional techniques include ESPRIT and maximum likelihood estimation (MLE). Moreover, there exist many super-resolution radio-channel parameter estimation methods that provide high-quality MPC parameters, for example [36], [37], [38], [39], [40], [41]. Some of the widely used AoA estimation techniques are discussed in detail in Section IV. 
While computing the MPC parameters may seem sufficient to get an understanding of the environment and the robot's location, the future states of the robot and the environment's features still need to be predicted, as they are required as inputs to the odometry unit to decide on the next appropriate motion step of the robot (depending on the application). Furthermore, since the received signal may contain noise due to channel impairments, the MPC parameters are not entirely accurate. To mitigate these issues, several filters have been devised that can predict the state estimates from noisy MPC measurements. For instance, the EKF [42] and particle filters (PFs) [43] are the most commonly used filters for inferring the posterior PDF of the state variables. These are discussed in detail in Section V. Fig. 3 shows the main steps, algorithms, and associated challenges involved in implementing radio SLAM.
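To make the filtering step concrete, the following is a minimal sketch of a single EKF measurement update for one range and bearing observation of one landmark. It is an illustrative textbook-style update under stated assumptions, not the filter of any cited work: the state layout `[x, y, theta, lx, ly]`, the function names, and the noise values are all hypothetical.

```python
import numpy as np

def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def ekf_update(state, P, z, R):
    """One EKF update with a range-bearing measurement z = [range, bearing].

    state = [x, y, theta, lx, ly]: robot pose and one landmark (assumed layout).
    P is the 5x5 state covariance, R the 2x2 measurement noise covariance.
    """
    x, y, theta, lx, ly = state
    dx, dy = lx - x, ly - y
    q = dx**2 + dy**2
    r = np.sqrt(q)
    # Predicted measurement from the current state estimate.
    z_pred = np.array([r, wrap_angle(np.arctan2(dy, dx) - theta)])
    # Jacobian of the measurement model with respect to the state.
    H = np.array([
        [-dx / r, -dy / r, 0.0, dx / r, dy / r],
        [dy / q, -dx / q, -1.0, -dy / q, dx / q],
    ])
    innovation = z - z_pred
    innovation[1] = wrap_angle(innovation[1])
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    new_state = state + K @ innovation
    new_state[2] = wrap_angle(new_state[2])
    new_P = (np.eye(len(state)) - K @ H) @ P
    return new_state, new_P

# Usage: a well-known robot pose and a poorly known landmark (assumed values).
state = np.array([0.0, 0.0, 0.0, 2.5, 1.5])
P = np.diag([1e-4, 1e-4, 1e-4, 1.0, 1.0])
R = np.diag([0.01**2, 0.01**2])
z = np.array([np.hypot(2.0, 1.0), np.arctan2(1.0, 2.0)])  # landmark truly at (2, 1)
new_state, new_P = ekf_update(state, P, z, R)
```

A full filter would alternate this update with a motion-model prediction step driven by odometry and would handle many landmarks plus data association. Those parts are omitted here.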

A. VIRTUAL ANCHOR NODE
VANs play a decisive role in the localization of the UE as well as the PAN, also referred to as the access node. LoS communication exists when there is no obstacle between the UE and the PAN. In such scenarios, computing the range of the PAN is straightforward, as there is no multipath propagation delay in the received signal. In the NLoS scenario, on the other hand, the signal is scattered and reflected by many obstacles before reaching the destination node, so the received signal is composed of propagation and multipath delays. The multipath delays arise from various reflective surfaces and result in multiple copies of the same transmitted signal at the receiver end. Therefore, computing the range of such nodes becomes challenging. To overcome this challenge, VANs can be employed. As the name suggests, a VAN is a virtual node that neither exists physically nor serves any wireless function between any nodes. However, it is considered to be virtually present at a mirror position of the PAN and behind the LoS obstacle [44]. The reflected signal is considered as a virtual LoS (VLoS) signal coming from the VAN. A simple illustration of a VAN is shown in Fig. 4, where $s_{\mathrm{LoS}}$, $s_{\mathrm{NLoS}}$, and $s_{\mathrm{VLoS}}$ are the LoS, NLoS, and VLoS signals, respectively, between a UE and a PAN. In addition, each reflection off an obstacle is associated with its own VAN, so a surface that gives rise to multiple reflections has as many VANs as reflections. The true location of the PAN can be computed by mirroring the position of the VAN with respect to the obstacle [6].
In radio SLAM, the problem of NLoS signals is mitigated using VANs. For instance, in the implementation of node localization using multiple APs in [6], the locations of VANs were computed using mirror plane estimation whereas the TV algorithm was employed for node position estimation. Similarly, in [7], VANs were employed to localize the receiver with high accuracy. The locations of the receiver and detected surrounding objects were computed by estimating the locations of VANs using trilateration, TDoA, and RSS.
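The mirroring relation between a PAN and its VAN can be sketched in a few lines. This is a minimal geometric illustration, assuming a 2D scene and a flat reflecting wall described by a point on the wall and its normal vector. The function name `mirror_across_wall` and the coordinates are hypothetical and do not come from the cited works.

```python
import numpy as np

def mirror_across_wall(point, wall_point, wall_normal):
    """Reflect a point across a flat wall given by a point on it and its normal."""
    point = np.asarray(point, dtype=float)
    wall_point = np.asarray(wall_point, dtype=float)
    n = np.asarray(wall_normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Signed distance from the wall; the mirror image lies at the same
    # distance on the opposite side.
    signed_distance = np.dot(point - wall_point, n)
    return point - 2.0 * signed_distance * n

# Usage: a PAN at (1, 3) and a wall along the line x = 4 (assumed geometry).
pan = np.array([1.0, 3.0])
van = mirror_across_wall(pan, wall_point=[4.0, 0.0], wall_normal=[1.0, 0.0])
# Mirroring the VAN back across the same wall recovers the PAN position.
pan_recovered = mirror_across_wall(van, wall_point=[4.0, 0.0], wall_normal=[1.0, 0.0])
```

The straight-line distance from the UE to the VAN equals the length of the single-bounce reflected path from the PAN to the UE. This is what allows the reflected signal to be treated as a virtual LoS signal.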

B. APPLICATIONS OF RADIO SLAM
The applications of SLAM in general span from autonomous vehicles, such as self-driving cars and robots, to exploring and navigating places where human reach is challenging and many indoor applications. However, it is pertinent to mention that most applications of radio SLAM are targeted towards achieving localization in wireless networks. For instance, the SLAM technique has been widely used for localizing a mobile user in the presence of multiple BSs. A similar approach is used for the positioning and navigation of robots using multiple anchor nodes. More recently, the concept of personal mobile radar is emerging. This concept aims at exploiting the built-in architecture and antenna array of a mobile for performing localization and mapping. Table 1 lists some of the relevant papers implemented for different applications in radio SLAM.

IV. ANGLE OF ARRIVAL ESTIMATION METHODS
As described in Section III, the first step is to compute the MPCs such as range, AoA, and TDoA. AoA is a widely used MPC parameter for localization, as it is robust against noise. The effect of several MPCs, such as RSS, ToA, and AoA, on the performance of localization has been studied for mm-wave signals [48]. In this study, the use of the AoA approach was shown to significantly improve localization performance. As the name suggests, AoA refers to the direction of the incident signal with respect to some reference orientation, such as the receiver's boresight direction [49]. Together with the knowledge of range or ToF, the information about AoA not only helps to infer the positions of the surrounding target objects but also provides information about the geometry of the environment. Hence, both MPCs, i.e., range and AoA, play a key role in the implementation of radio SLAM. Some of the commonly used methods for estimating AoA are briefly discussed in the following subsections.

A. CLASSICAL ESTIMATION TECHNIQUES
MUSIC, a pioneering algorithm in the field of signal processing, was introduced by Schmidt [35] (first presented in 1979 and published in journal form in 1986). MUSIC estimates signal parameters through an eigenvalue decomposition of the covariance matrix of the signals received across a multi-sensor array. Another popular subspace-based method is the estimation of signal parameters via rotational invariance techniques, commonly known as ESPRIT [50]. A simple approach for AoA estimation is to compute the TDoA [51]. An illustration of TDoA computation on a uniform linear array (ULA) of antennas is shown in Fig. 5. This technique is beneficial in that it does not require any synchronization between transmitter and receiver nodes. A study in [52] exploited TDoA together with the information about the geometry of the receiver antennas to compute the AoA. Other widely used AoA estimation techniques include the Capon method [53], [54] and MLE [55], [56], [57].
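The TDoA-based idea can be illustrated for the simplest case of two ULA elements and a far-field plane wave. This is a minimal sketch under those assumptions, with the angle measured from the array broadside. The function name `aoa_from_tdoa` and the numbers in the usage example are hypothetical. A wavefront arriving at angle θ reaches the second element later by Δt = d sin(θ) / c, so θ = arcsin(c Δt / d).

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def aoa_from_tdoa(tdoa_s, spacing_m):
    """Angle of arrival (radians from broadside) from the arrival-time
    difference between two antenna elements spaced spacing_m apart."""
    ratio = SPEED_OF_LIGHT * tdoa_s / spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = np.clip(ratio, -1.0, 1.0)
    return np.arcsin(ratio)

# Usage: elements 0.5 m apart and a wave arriving 30 degrees off broadside.
spacing = 0.5
tdoa = spacing * np.sin(np.deg2rad(30.0)) / SPEED_OF_LIGHT
theta_deg = np.rad2deg(aoa_from_tdoa(tdoa, spacing))
```

Only the time difference between the receiver's own elements is used, which is consistent with the remark above that no transmitter-receiver synchronization is needed.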
The performance of MUSIC and ESPRIT has been compared thoroughly over the last few decades [58], [59], [60]. Despite their straightforward approach, the performance of classical statistical techniques suffers from various factors. For instance, high-resolution techniques such as MUSIC and Capon are sensitive to noise and multipath-rich channels. In addition, the computational complexity of MLE grows exponentially when scanning in 3D, that is, in both azimuth and elevation directions [61]. Moreover, the number of received signals must be known in advance for both MUSIC and MLE [62], [63]. ESPRIT, on the other hand, requires a higher number of sensors for fast computation [64], and its utility is limited to a specific antenna array geometry, i.e., the ULA [65]. To overcome the aforementioned limitations of the classical techniques, many variants have been devised for MUSIC [66], [67], [68], [69], [70] and ESPRIT [65], [71], [72].
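The following minimal sketch shows the core of MUSIC on a ULA and makes the limitation noted above visible: the number of sources `n_sources` has to be supplied to split the signal and noise subspaces. It assumes narrowband far-field sources and half-wavelength element spacing. The function names and the simulation parameters are hypothetical and are not taken from the cited works.

```python
import numpy as np

def steering_vector(theta_rad, n_antennas, spacing_wavelengths=0.5):
    """ULA response to a plane wave from angle theta (measured from broadside)."""
    k = np.arange(n_antennas)
    return np.exp(-2j * np.pi * spacing_wavelengths * k * np.sin(theta_rad))

def music_spectrum(snapshots, n_sources, scan_deg, spacing_wavelengths=0.5):
    """MUSIC pseudo-spectrum from an (n_antennas, n_snapshots) complex array."""
    n_antennas, n_snapshots = snapshots.shape
    R = snapshots @ snapshots.conj().T / n_snapshots   # sample covariance
    _, eigvecs = np.linalg.eigh(R)                     # eigenvalues ascending
    noise = eigvecs[:, : n_antennas - n_sources]       # noise subspace
    spectrum = np.empty(len(scan_deg))
    for i, deg in enumerate(scan_deg):
        a = steering_vector(np.deg2rad(deg), n_antennas, spacing_wavelengths)
        # Steering vectors of true sources are nearly orthogonal to the
        # noise subspace, which produces sharp peaks.
        spectrum[i] = 1.0 / np.linalg.norm(noise.conj().T @ a) ** 2
    return spectrum

def strongest_peaks(spectrum, scan_deg, n_peaks):
    """Scan angles of the n_peaks largest local maxima, sorted ascending."""
    idx = [i for i in range(1, len(spectrum) - 1)
           if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]
    idx = sorted(idx, key=lambda i: spectrum[i], reverse=True)[:n_peaks]
    return np.sort(np.asarray(scan_deg)[idx])

# Usage: two simulated sources on an 8-element ULA (assumed scenario).
rng = np.random.default_rng(0)
true_deg = np.array([-20.0, 35.0])
n_ant, n_snap = 8, 200
A = np.stack([steering_vector(np.deg2rad(d), n_ant) for d in true_deg], axis=1)
S = (rng.standard_normal((2, n_snap)) + 1j * rng.standard_normal((2, n_snap))) / np.sqrt(2)
N = 0.05 * (rng.standard_normal((n_ant, n_snap)) + 1j * rng.standard_normal((n_ant, n_snap)))
scan = np.arange(-90.0, 90.5, 0.5)
estimated_deg = strongest_peaks(music_spectrum(A @ S + N, 2, scan), scan, 2)
```

The grid search over `scan` is what becomes expensive when both azimuth and elevation must be scanned, and strongly correlated multipath components would degrade the subspace split used here.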
A plethora of research work has been carried out to estimate AoA using the MUSIC algorithm. The work in [73] used the MUSIC and root-MUSIC algorithms to estimate AoA for mobile localization, and the experiments were validated using a software-defined radio (SDR), namely the USRP X310 with UBX-160 and TwinRX daughterboards. A proof-of-concept (PoC) for direction finding using MUSIC and ESPRIT was developed using the NI-PXIe platform. The work in [74] presented a TDoA-based method to estimate the AoA using all the available anchor nodes in the network. Computing the AoA from each node resulted in a more confident estimate of the AoA. Synchronization issues between any two nodes were resolved using the TDoA method. Apart from RF-based direction finding applications, the aforementioned classical algorithms have also been widely used for sound source separation and localization [75], [76], [77]. The next subsection describes DL techniques that have been used for AoA estimation.

B. DEEP LEARNING TECHNIQUES
With the advancements in technology over the past decade, an enormous amount of data is being generated through various devices and digital platforms. This surge of data has given rise to a new era of artificial intelligence (AI), which comprises data-driven algorithms whose performance appears to surpass that of conventional signal processing algorithms. NN based DL algorithms [78] have become significantly popular in the fields of computer vision [79], natural language processing [80], and, more recently, wireless communications [81]. In fact, many researchers are now trying to leverage the potential of DL for estimating the AoA in indoor positioning applications. There are two approaches to designing DL models for AoA estimation: regression and classification. Table 2 lists some of the regression and classification-based DL models designed for optimizing AoA estimation. However, it is important to note that each of the experiments listed in Table 2 considered a different set of experimental parameters, such as the number of source reflectors, the field of view (FoV), the signal-to-noise ratio (SNR), and the receiver's antenna array configuration, such as ULA, nonuniform linear array (NULA), uniform circular array (UCA), and symmetric nested array (SNA). Therefore, the performance of AoA estimation varies according to the set of parameters employed in the experiments. Regression and classification-based approaches are further discussed in the subsections below.

1) REGRESSION MODEL
In ML, regression refers to a method that predicts a continuous value of a quantity [82]. To put it differently, the output of the regression model is a numerical value of a random variable. For the application of AoA estimation, the output of a regression model is the predicted value of the AoA. An illustration of a regression model based on a single-layer neural network is shown in Fig. 6. The input layer contains $N$ features represented by $p_n$, whereas $w_n$ and $b$ are the weights and bias of the single-layer neural network, respectively. The output is a scalar value representing the predicted AoA.
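The single-layer regression model described above can be sketched as follows. This is an illustrative example only: it assumes a linear output with no activation function, fits the weights by ordinary least squares instead of gradient descent, and uses synthetic features. The function names `predict_aoa` and `fit_single_layer` and the data are hypothetical.

```python
import numpy as np

def predict_aoa(p, w, b):
    """Single-layer regression output: weighted sum of N features plus a bias."""
    return float(np.dot(w, p) + b)

def fit_single_layer(features, targets):
    """Fit weights and bias by least squares on (n_samples, N) features."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Append a column of ones so that the bias is learned as an extra weight.
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    params, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return params[:-1], params[-1]

# Usage with synthetic data (assumed): N = 3 features, angles in degrees.
rng = np.random.default_rng(1)
w_true, b_true = np.array([10.0, -5.0, 2.5]), 3.0
P = rng.standard_normal((50, 3))
aoa_deg = P @ w_true + b_true
w, b = fit_single_layer(P, aoa_deg)
prediction = predict_aoa(P[0], w, b)
```

Practical AoA regressors stack several nonlinear layers and are trained iteratively, but the output stage has this same form: a single continuous value instead of a class label.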
In the last couple of years, regression models have been widely used for predicting AoAs. The work in [83] presented a regression-based DL model to estimate the number of point sources and their associated AoAs. The authors made use of a dense neural network (DNN) and compared its results with other NN architectures such as the convolutional NN (CNN) and fully connected network (FCN). Moreover, they also compared their results with conventional estimators, such as MUSIC and MLE, and demonstrated that the proposed DNN outperforms them. Others proposed a hybrid approach in which features from conventional methods are fused with data-driven models. For instance, the work in [61] used the output of the MUSIC algorithm as input to the ML models to reduce the input data dimensionality and model complexity. The authors exploited different ML frameworks, including NNs, the Gaussian process, and the regression tree, to improve the estimation accuracy of the AoA. Their hybrid approach resulted in a considerable improvement over conventional methods such as MUSIC.
The work in [84] followed a similar hybrid approach to estimate the AoA; however, instead of relying on simulations, the authors employed a low-cost SDR to perform experiments and validate their results on over-the-air data. In this work, DL models such as the FCN and CNN were employed to estimate only two AoAs. The MUSIC algorithm was applied to the received in-phase (I) and quadrature-phase (Q) signals. The resulting covariance matrix was used as input to train the DL models. The dataset of IQ signals was collected using a low-cost SDR called KerberosSDR [85]. Their results showed a significant performance improvement over MUSIC and support vector regression (SVR). The problem of classifying near-field and far-field sources along with their associated range and AoA was explored in [86]. First, the received signals are converted to the frequency domain, where each peak corresponds to a reflective source. Using the locations of the peaks, the phase difference matrix is computed for each source, which is then used to train a CNN model to predict the AoA of each source. In addition, an autoencoder was employed to classify near-field and far-field sources. The autoencoder learns only the principal components of the input data and discards redundant and irrelevant information, thus enhancing the capability of the DL model to generalize to unseen data. After that, the range of near-field sources was predicted using another CNN. This three-step chain comprising regression and classification led to increased localization accuracy. A regression-based end-to-end model was proposed in [87], in which a CNN was devised to estimate the AoA using phase features from the spatial covariance matrix of the received signal. The proposed method demonstrated increased accuracy over MUSIC and the radial basis function NN (RBFNN).

2) CLASSIFICATION MODEL
In a classification model, the model predicts a probability for each label in a given set of classes. In other words, the classifier assigns a probability to each class, and the class with the highest probability is taken as the predicted class [82]. Unlike regression, the output of a classification model is a vector containing the predicted probabilities of each class, as illustrated in Fig. 6. For AoA estimation, the number of classes can be assigned according to the expected number of reflection sources, whereas the probability of each class defines the likelihood of each angle being the true source of reflection. A study in [88] considered multiple configurations of the sensor array. The authors proposed multiple DL models that work together to achieve robust AoA estimation: autoencoders were used for spatial filtering, followed by multiple layers of DNN classifiers to estimate the AoA. Another work in [89] solved the AoA problem for a NULA of antennas. The authors designed a CNN (named RFDOA-Net) to predict the AoA. RFDOA-Net contains multiple sub-modules that enhance feature extraction for better performance of the AoA estimator. They created a simulated dataset to train the model to classify 181 classes of AoA. Results were compared with MUSIC and other state-of-the-art DL models, such as ResNet and SqueezeNet. An unsupervised approach was presented in [90] to estimate the AoA using a CNN. The proposed approach did not require any annotation of the input data in advance to train the network; instead, the authors devised an $\ell_1$-norm-based loss function to optimize the weights of the network. The input to the network was the covariance matrix of the received signals impinging on a ULA of antennas. The CNN model classified AoAs based on the highest probabilities. Experiments for this work were performed and verified in simulation using synthetic datasets.
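The mapping from a classifier's output vector to an AoA estimate can be sketched as follows. This is only an illustration of the softmax-and-argmax step, not the architecture of any cited work; in particular, the 1-degree grid from -90 to 90 degrees is an assumed way of obtaining 181 classes, and `predict_aoa` is a hypothetical helper name.

```python
import math

# Assumed angle grid: one class per degree from -90 to 90 (181 classes).
ANGLE_GRID = list(range(-90, 91))


def softmax(logits):
    """Numerically stable softmax turning raw network outputs into probabilities."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_aoa(logits, n_sources=1):
    """Return the n_sources most probable (angle, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(ANGLE_GRID[i], probs[i]) for i in ranked[:n_sources]]


# Toy network output that favours 25 degrees and, less strongly, -40 degrees.
logits = [0.0] * len(ANGLE_GRID)
logits[ANGLE_GRID.index(25)] = 5.0
logits[ANGLE_GRID.index(-40)] = 3.0
top_two = predict_aoa(logits, n_sources=2)
```

Note that `n_sources` has to be supplied by the caller, which reflects the need to know the number of sources in advance.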
Furthermore, in [91], feature-to-feature-based DL models were developed to learn the complex mapping between distorted phases and clean phases of the received signals. The clean phases, reconstructed by a multidimensional CNN, are used to create the covariance matrix, which is then used to compute the DoA with conventional methods such as directional beamforming and MUSIC. The main objective of the feature-to-feature learning approach is to mitigate the effects of phase distortions caused by multipath components. Experiments were performed on real data collected from a VHF radar with 21 antenna elements. In a similar work [92], a DNN was used as a classifier to predict the AoA from the correlation matrix obtained from the input signal. It showed increased estimation accuracy compared with the MUSIC algorithm.

V. STATE PREDICTION METHODS
There are three main paradigms of SLAM algorithms that have been widely used for tracking the state variables of the robot and landmarks, depending on the complexity and requirements of the application. These paradigms are briefly discussed below:

A. KALMAN FILTERS
The KF was originally devised in 1960 and has been widely used since then to track and estimate parameters from noisy observations [93]. However, the KF tends to work only for linear systems. To handle non-linear systems, the EKF was introduced [42]. The basic idea of the EKF is to linearize the non-linear system and then apply the fundamental KF to it. For SLAM, the measurements and state transitions are almost always non-linear in nature; therefore, the EKF is commonly used as a conventional method to solve the SLAM problem. EKF-SLAM aims to compute the joint conditional probability defined in (2). This technique maintains a state vector that contains the states to be estimated, for example, the locations of the robot and the landmarks. The EKF computes the mean and the associated covariance matrix of the estimated state vector, where the covariance matrix represents the uncertainty in the estimated locations. Since the EKF is a recursive algorithm, both the state vector and the covariance matrix are updated as the robot moves through the environment. Therefore, the EKF is considered a suitable algorithm for implementing online SLAM.
In order to compute (2), one needs probability distribution models for both the robot's observation and motion [18]. The observation model is generally expressed as the probability of obtaining the measurement $z_t$ given the map $m$ and the robot's location $x_t$:
$$p(z_t \mid x_t, m)$$
On the other hand, the motion model describes the probability of the robot's current location given the previous location along with the odometry measurement. This can be expressed as:
$$p(x_t \mid x_{t-1}, u_t)$$
where $u_t$ denotes the odometry measurement. Although the EKF is a promising algorithm in terms of simplicity and robustness, the computational complexity of calculating the measurement updates increases with the number of landmarks. To be specific, the computational complexity grows quadratically with the number of landmarks, and this can affect the performance of SLAM in real time. Apart from that, EKF-SLAM relies on linearized approximations of the robot's observation and motion models, which are generally non-linear in nature. This caveat usually leads to poor performance in certain environments [18].
The work presented in [5] used a mm-wave radar for measuring the range and bearing of the vehicle and the surrounding landmarks. The states were then predicted using the EKF. A method for identifying the shape of the landmarks was presented in [7], in which power measurements of reflections were obtained at different positions by moving the mobile receiver and the state estimates were updated using the EKF. Apart from range and bearing, the study in [9] employed the RCS parameter, which was estimated using the EKF for reconstructing the map. Similar work was done in [46], in which the state vector comprised the RCS collected at each steering angle and was estimated through the EKF. However, the posterior belief for the EKF was modified to account for the inter-dependency between multiple measurements due to the antenna's beamwidth. To sense and map dynamic environments, the mm-wave based mobile sensing system developed in [94] keeps track of the moving scatterers in the environment using an interacting multiple model-based EKF. The approach was validated in both simulation and hardware.
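The predict and update recursion of the EKF described above can be sketched with a deliberately small example. The sketch below is a scalar, localization-only filter, not full EKF-SLAM: the robot moves along a line and measures its slant range to a single beacon at a known height above the origin. The beacon geometry, the noise variances `Q` and `R`, and the function name are assumptions; in EKF-SLAM the landmark positions would be stacked into the state vector and the scalars would become matrices.

```python
import math


def ekf_step(x, P, u, z, beacon_height=2.0, Q=0.01, R=0.04):
    """One EKF cycle for a 1D position x with variance P.

    u: odometry displacement, z: measured slant range to the beacon.
    """
    # Prediction with the (here linear) motion model x_t = x_{t-1} + u_t.
    x_pred = x + u
    P_pred = P + Q

    # Non-linear observation model h(x) = sqrt(x^2 + height^2).
    z_pred = math.hypot(x_pred, beacon_height)
    H = x_pred / z_pred  # Jacobian dh/dx evaluated at the prediction

    # Kalman gain and correction.
    S = H * P_pred * H + R
    K = P_pred * H / S
    x_new = x_pred + K * (z - z_pred)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new


# The filter starts with a wrong position guess and a large variance.
x, P = 0.5, 1.0
true_x = 1.0
for _ in range(5):
    true_x += 1.0
    z = math.hypot(true_x, 2.0)  # noise-free range, for a repeatable demo
    x, P = ekf_step(x, P, u=1.0, z=z)
```

The variance `P` shrinks as measurements arrive, mirroring the role of the covariance matrix as the uncertainty of the estimated locations.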

B. PARTICLE FILTERS
PF [43] is a non-parametric estimation method that employs a set of particles to represent the posterior distribution of a random variable. Each particle is considered an estimate of the true state. For instance, if we wish to estimate the location of a robot, then each particle represents one of the possible locations in the environment where the robot is likely to be found. Particles tend to survive when they have the closest estimate to the true location of the robot. Fig. 7 illustrates how particles converge to the true location of the target over multiple timesteps. The blue dot represents the aggregate of all the particles and its location represents the estimate of the target's location. The number of particles needed grows with the dimensionality of the state: more particles are required to represent a map with a high-dimensional feature space. Therefore, the computational complexity of the method increases with the number of features in the map [33]. To reduce this cost, FastSLAM [95] emerged as one of the most popular and widely used PF-based methods; it effectively mitigates the issue of expensive computations in solving the SLAM problem. Moreover, unlike EKF-SLAM, a PF works with both non-linear models and multimodal distributions, hence allowing more accurate representations of complex environments to be reconstructed. Table 3 lists the characteristics of the KF and PF [96]. A real mm-wave based indoor user positioning system was implemented in [97] using commercial off-the-shelf 802.11ad hardware, whose performance degrades due to the irregular beam shapes transmitted by the users. A PF, together with Fourier analysis, was employed to overcome the limitations posed by the cost-efficient hardware of the 802.11ad network. Another PF-based method for mobile localization in a UWB-based sensor network has been presented in [44].
The authors used a Rao-Blackwellized PF [98], [99] to estimate the MPC parameter, i.e., the range, for locating the mobile user, the surrounding VANs, and their associated PANs. Furthermore, a Rao-Blackwellized PF has been used for pedestrian tracking [47], in which the pedestrian is modelled as a moving agent and fixed anchors are placed in the surrounding environment at unknown positions. Without a priori knowledge of the positions of the anchors, the distance between the pedestrian and the anchors was estimated and tracked using a Rao-Blackwellized PF. Other works [100], [101] proposed a method for mobile localization called channel-SLAM. It employs the EKF and PF to estimate the MPC parameters of the received signal, including amplitude, AoA, and delay. Apart from the user positioning application, joint vehicle positioning and mapping using mm-wave has been implemented in [102], where a PF estimates the vehicle's states and a probability hypothesis density (PHD) filter maps the environment. The proposed method is shown to work with an unknown number of landmarks. Recent works have started to circumvent the issue of higher computational complexity that comes with the complex variants of the PF. A novel 5G mm-wave SLAM method has been proposed in [103], which demonstrated reduced computational complexity compared with that of the PHD-based Rao-Blackwellized PF. This work has been extended with the design of a novel mm-wave radio SLAM filter whose complexity has been further reduced using a Poisson multi-Bernoulli mixture filter [104].
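The basic mechanism behind all of these variants (propagate, weight, resample) can be sketched as a plain bootstrap particle filter. This is neither FastSLAM nor a Rao-Blackwellized PF; it is a one-dimensional localization toy in which the measurement is the robot's distance from a landmark at the origin. The noise levels, the initial search interval, and the function name are assumptions chosen for illustration.

```python
import math
import random


def particle_filter_1d(controls, measurements, n_particles=1000,
                       meas_sigma=0.5, motion_sigma=0.1, seed=0):
    """Bootstrap particle filter returning the final position estimate."""
    rng = random.Random(seed)
    # Initial belief: the robot could be anywhere in an assumed 20 m corridor.
    particles = [rng.uniform(0.0, 20.0) for _ in range(n_particles)]

    for u, z in zip(controls, measurements):
        # Motion update: move every particle and add process noise.
        particles = [p + u + rng.gauss(0.0, motion_sigma) for p in particles]
        # Measurement update: Gaussian likelihood of z for each particle.
        weights = [math.exp(-0.5 * ((z - p) / meas_sigma) ** 2) for p in particles]
        # Resampling: particles close to the measurement survive.
        particles = rng.choices(particles, weights=weights, k=n_particles)

    # The aggregate (mean) of the particles is the state estimate.
    return sum(particles) / n_particles


# The robot starts at 5 m and advances 1 m per step for 10 steps.
true_positions = [5.0 + k for k in range(1, 11)]
estimate = particle_filter_1d(controls=[1.0] * 10, measurements=true_positions)
```

Because every particle here is a single scalar, 1000 particles are plentiful; the count needed grows quickly once map features are added to the state, which is the cost that the methods cited above try to reduce.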

C. GRAPH SLAM
Graph-based techniques were developed [106], [107], [108] to leverage non-linear sparse optimization for solving the SLAM problem. In graph SLAM, nodes represent the locations of a moving robot and the surrounding landmarks, whereas the arcs represent the relationship between two consecutive locations $x_{t-1}$ and $x_t$ of the robot, and also between the current robot location $x_t$ and the landmark $m_i$ observed by the robot, as shown in Fig. 8. Unlike EKF-SLAM, graph SLAM can reconstruct high-dimensional maps of the environment [33]. Initially, the formulation of graph SLAM was limited to solving only the offline SLAM problem; however, online variants that can process the measurements for online SLAM have also been introduced [109].
In the context of radio SLAM, the work in [45] presented a belief propagation (BP) based joint probabilistic data association method for joint localization and mapping of the mobile agent, PAs, and VAs. Experiments involved a real indoor UWB dataset [110] for validating the proposed approach. Moreover, there are other studies in multipath-based SLAM using FGs that have exploited MPC parameters other than just the multipath delays, for example [111], [112], [113]. The study in [111] exploited not only the delays but also the AoAs from the MPCs of the radio signals to localize a mobile agent and the surrounding PAs and VAs in terms of their position, velocity, and orientation. To enable the estimation of the MPCs, a BP-based algorithm has been proposed which also exploits the complex amplitudes of the MPCs to enable probabilistic data association. A similar study carried out in [112] proposed a localization strategy for the UE in a mm-wave MIMO communication infrastructure. In this study, a BP-based algorithm has been proposed to estimate the states of the UE along with the map of the reflecting surfaces in the environment. In addition, a study in [113] addresses the problem of handling dynamic changes in the states of the agent and the map features. This has been accomplished by incorporating interacting multiple model parameters into the FG and estimating the time-varying states of the agent using the proposed BP-based algorithm. In more recent developments in radio SLAM using FGs, multipath measurements associated with the same reflective surfaces amongst base stations and higher-order reflections are fused together and represented by a single master VA [26], [27]. This approach has led to faster convergence of mapping while reducing mapping errors.
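The node-and-arc formulation of graph SLAM can be illustrated with a tiny one-dimensional pose graph solved by least squares. The sketch is not the BP-based inference used in the works cited above: it assumes a linear 1D world with unit information on every arc, so a single solve of the normal equations is exact, whereas real graph SLAM back ends iterate a non-linear solver over sparse matrices. All names and numbers are hypothetical.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def solve_graph(edges, n_nodes):
    """Least-squares node positions from arcs (i, j, z) meaning x_j - x_i = z."""
    H = [[0.0] * n_nodes for _ in range(n_nodes)]
    b = [0.0] * n_nodes
    H[0][0] += 1.0  # prior anchoring node 0 at the origin
    for i, j, z in edges:
        H[i][i] += 1.0
        H[j][j] += 1.0
        H[i][j] -= 1.0
        H[j][i] -= 1.0
        b[i] -= z
        b[j] += z
    return solve_linear(H, b)


# Nodes 0..2 are robot poses, node 3 is a landmark.
edges = [
    (0, 1, 1.1),  # odometry between pose 0 and pose 1 (noisy)
    (1, 2, 0.9),  # odometry between pose 1 and pose 2 (noisy)
    (0, 3, 3.0),  # landmark observed from pose 0
    (2, 3, 1.0),  # landmark observed from pose 2
]
estimate = solve_graph(edges, n_nodes=4)
```

The two landmark arcs close a loop through the graph, so the odometry errors are redistributed over all poses rather than accumulating along the trajectory.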

VI. MAPPING TECHNIQUES
Knowing the map of the environment is crucial for robots and mobile users to find an appropriate trajectory. Reconstructing a map of the environment requires knowledge of the positions of the surrounding landmarks. The process of inference can be carried out by estimating the states that best represent the features of the environment. In the context of wireless applications, these states can be a set of MPC parameters such as range, RCS, and AoA. The commonly used filters for state tracking include the EKF and PF. Apart from using the conventional state estimators, the study in [114] used convex optimization and the Hough detector algorithm to assign distance estimates to the correct reflectors with less combinatorial complexity. The experiments for this work were validated for LOS and single-bounce reflection signals in a convex polygon-shaped room.
Constructing a map is one of the most challenging tasks in SLAM. Since the environment, whether indoor or outdoor, is often rich in features, it is challenging to develop a methodology that can process high-dimensional data and also distinguish between several features to identify certain landmarks of interest. There are two basic categories for representing the environment, i.e., the feature-based map and the occupancy grid map (OGM). Each mapping technique is specifically designed to work best with certain sensors and for particular environment scenarios [115]. Table 4 summarizes the methods and mapping representations involved in the existing implementations of radio SLAM.

A. FEATURE-BASED MAP
Feature-based maps are mainly useful for representing outdoor environments due to the presence of ample features in the surroundings. To develop feature-based maps, the sensors must also be capable of taking high-resolution measurements that can be processed to distinguish different landmarks. Therefore, camera-based sensors are appropriate for reconstructing feature-based maps. The most successful approach for processing image data involves computer vision methods that have completely revolutionized the field of image processing. With the help of computer vision techniques, landmarks can be identified, segmented, and distinguished for a feature-rich representation of the environment. However, it is challenging to distinguish landmarks using the measurements obtained from radio sensors, which makes it difficult to employ a feature-based mapping approach in wireless positioning applications. On the other hand, the OGM is a suitable approach for creating simple maps, especially for indoor environments, using radio sensors. The OGM method is further discussed in the next subsection.

B. OCCUPANCY GRID MAP
As the name suggests, this approach divides the environment into a uniform grid, where the size of a grid cell depends on the sensor's resolution to distinguish between features. Each grid cell holds a specific occupancy value, which represents the presence or strength of a detected landmark, as illustrated in Fig. 9. In a binary grid map, each grid cell is represented either by 0, indicating air or the absence of any landmark, or by 1, indicating the presence of a landmark at that location. This is a coarse mapping approach, as it cannot differentiate between the different types of landmarks present in the environment. To overcome this problem, each grid cell can be represented by a value according to the signal strength received from that location.
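Both grid variants described above can be sketched in a few lines. The example below is an assumption-laden illustration rather than any cited implementation: detections are taken to be already expressed as (x, y, strength) tuples in the map frame, the strongest return per cell is kept, and the function name and units are hypothetical.

```python
def build_occupancy_grids(detections, cell_size, n_rows, n_cols):
    """Build a binary grid and a signal-strength grid from radio detections.

    detections: iterable of (x_m, y_m, strength_dbm) in the map frame.
    """
    binary = [[0] * n_cols for _ in range(n_rows)]
    strength = [[None] * n_cols for _ in range(n_rows)]
    for x, y, s in detections:
        row, col = int(y // cell_size), int(x // cell_size)
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            continue  # detection falls outside the mapped area
        binary[row][col] = 1  # 1 marks the presence of a landmark
        # Keep the strongest return seen in this cell.
        if strength[row][col] is None or s > strength[row][col]:
            strength[row][col] = s
    return binary, strength


detections = [
    (1.2, 0.4, -40.0),
    (1.3, 0.45, -35.0),  # same cell as above, stronger return
    (3.1, 1.7, -60.0),
    (9.0, 9.0, -50.0),   # outside a 4 m x 2 m map, ignored
]
binary, strength = build_occupancy_grids(detections, cell_size=0.5, n_rows=4, n_cols=8)
```

The cell size of 0.5 m stands in for the sensor's resolution; a finer grid only helps if the sensor can actually separate features at that scale.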
In [44], the reconstructed map shows the positions of PAs and VAs as point landmarks whose states are updated using FG and PF. Similarly, other works [100], [101] employed EKF and PF to track the MPC parameters, such as amplitude, AoA, and delay and represented the map using point obstacles.

VII. EXPERIMENTAL RESOURCES
While there are plenty of simulated implementations of radio SLAM, only a handful of PoCs have been developed so far. Developing a PoC requires an appropriate sensor device to perform the experiments. Moreover, with the rise of DL methods, a large number of datasets are required, either for simulated experiments or for developing a PoC. The following sections describe some of the available datasets and hardware devices suitable for experimenting with the implementation of radio SLAM.

A. DATASETS
With the advancements in technology over the past few decades, the explosion of data has enabled researchers to devise data-driven algorithms that can learn complex non-linear functions from the data itself and tend to be more robust than conventional analytical methods. Regarding SLAM, a few datasets have been collected and made open-source to enable researchers to devise and validate new methodologies. Most of the datasets contain sets of indoor and outdoor images of the environment, which can be used with 3D photorealistic simulators, such as AI Habitat [116], to simulate and test the robot's behavior in a particular environment. However, these image-based datasets are mainly aimed toward the implementation of visual SLAM using computer vision algorithms, since cameras are the primary sensors in this case. Nevertheless, a recent work [117] has integrated wireless propagation information into the Gibson dataset [119] with the help of wireless channel simulators, such as Remcom [118]. The authors validated the simulations on AI Habitat [116] and have released an augmented Gibson dataset [120] that also contains wireless ray tracing data along with the camera and LiDAR data. The Gibson dataset contains virtual images of several indoor places. It contains images of 572 buildings with 1447 levels and covers a total area of 211,000 m². The Gibson environment continuously generates these images in a sequence that is akin to looking at the surroundings using a real camera. The idea is therefore to provide a simulated environment that is close to the real one, so as to reduce the complexity as well as the time of deploying and validating the proposed methodologies in real-world scenarios.
Owing to the limited availability of low-cost radio transceivers, fewer over-the-air radio-based datasets are currently available for positioning applications. For AoA estimation, the study in [84] used KerberosSDR [85] as a VHF/UHF receiver to generate a dataset [121] of complex IQ signals. This dataset contains both LoS and NLoS components impinging on a ULA of antennas. Some mm-wave based datasets are also available for user localization in indoor [122] and outdoor [123] environments. A mm-wave dataset [124] based on the reflection measurements from different metallic objects at 28 GHz has been recorded using the National Instruments PXI signal transceiver. Its equivalent simulated ray-tracing data, generated through the Wireless InSite software by Remcom, is also provided. The dataset is composed of the channel impulse responses obtained in both indoor and outdoor environments. Another dataset [125] has been developed for DoA estimation. It comprises compressed signals obtained at different azimuth and elevation angles between −30° and 30°. However, the source from which the dataset has been generated is not clearly described in the paper. To foster research in 5G New Radio (NR) technology, a simulated dataset has been generated in [126], which comprises the channel frequency responses obtained for a MIMO configuration in both indoor and outdoor settings. The dataset has been generated using carrier frequencies of 3.5 GHz and 40 GHz in accordance with the 3rd Generation Partnership Project (3GPP) Release 16 (R16) standard. A few other mm-wave datasets can be found at [127]. While some of the datasets are publicly available, others require a paid subscription to access them. Table 5 summarizes some of the available datasets, which can be used for indoor positioning and radio SLAM applications.

B. SENSORS
Cost-efficient sensors play a key role in the rapid development of PoCs and experimental validation. Decawave's DW1000 [130] is a low-cost UWB transceiver, which can be employed for UWB-based positioning and localization [47], [137]. For mm-waves, there are some inexpensive evaluation kits available from Texas Instruments (TI) [138] and Joybien [139], which can be leveraged to validate mm-wave-based positioning and SLAM methods in real-world scenarios. The evaluation kits [131], [132], [134] process the received signals and provide MPC parameters such as range, AoA, Doppler, and SNR. To access the raw IQ data samples, TI provides an additional DCA1000EVM kit [133], which can be connected to the mm-wave kits from TI. Access to the raw IQ data enables researchers to develop and test their own custom signal processing algorithms for indoor positioning and radio SLAM. Table 6 summarizes some of the available transceivers that can be employed for the implementation of PoCs.

VIII. CHALLENGES AND FUTURE DIRECTIONS

A. MM-WAVE SLAM
The mm-wave band is widely used in positioning applications due to its high localization accuracy. However, it performs relatively poorly in long-range applications, because mm-wave signals suffer from higher attenuation (path loss) than low-frequency signals as they propagate through the channel. Therefore, robust algorithms that can mitigate the path loss problem are required.
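The scale of this frequency-dependent loss can be illustrated with the standard free-space path loss formula, FSPL(dB) = 20 log10(4πdf/c). The sketch below is only a first-order illustration under a free-space assumption: it ignores atmospheric absorption, blockage, and antenna gains, and the 100 m distance and the 28 GHz and 3.5 GHz carriers are example values picked from frequencies mentioned elsewhere in this review.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB between isotropic antennas."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


# Extra loss of a 28 GHz link relative to a 3.5 GHz link at the same distance.
extra_loss_db = fspl_db(100.0, 28e9) - fspl_db(100.0, 3.5e9)
```

Because the distance cancels in the difference, the penalty equals 20 log10(28/3.5), roughly 18 dB, at any range in free space.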

B. BEYOND MM-WAVE SLAM
Frequencies beyond the mm-wave band, such as the THz band, have lately been gaining attention in 6G communications and localization. Since location-aware communication is becoming the core of 6G communications, and sub-THz bands are soon to be supported by 6G, the research interest in THz based localization is increasing rapidly. An extensive review of the recent developments and challenges of THz based localization has been provided in [141]. Recent studies have validated the implementation of THz based SLAM in an indoor environment [142] and demonstrated that the THz band has the potential to outperform mm-wave based localization [141]. The THz band seems promising for future communication and localization networks; however, its challenges and appropriate scope of applications still need to be explored.

C. LIMITATIONS IN DEEP LEARNING BASED IMPLEMENTATION
As discussed earlier, DL based methods continue to outperform classical estimation methods. However, one needs to know the number of parameters to be estimated in advance. For example, in the case of AoA estimation, the number of output nodes of the regression and classification models corresponds to the number of possible reflectors or nodes in the environment. This implies that information about the source reflectors must be known in advance in order to design a DL model. For the classification model, not only must the number of source reflectors be known in advance, but the values of the AoA associated with each reflector must also be known a priori. These limitations can lead to the complete failure of such DL models if they are deployed in an unfamiliar environment.

D. OUT-OF-DISTRIBUTION PERFORMANCE
Unlike in the field of computer vision, there are only a handful of datasets available for the wireless technology domain, and those datasets are limited to specific applications. Also, the distribution of wireless signals can be affected by numerous parameters, including the SNR, wireless propagation channel, modulation type, frequency band, bandwidth, and perturbations caused by the non-linear behavior of the hardware components. In addition, these factors are further controlled by various other parameters, which makes it challenging to create a well-generalized dataset that considers the combination of all the factors. Therefore, ML and DL models tend to overfit to a dataset containing a specific distribution of parameters, and fail to generalize as soon as they encounter data with a slight distribution shift in those parameters. In DL, this is known as the out-of-distribution problem. It usually arises when a model is trained on a particular training data distribution but fails to perform well in real-world scenarios because the actual data distribution differs from that of the training dataset.
To increase the robustness of data-driven models against distribution shifts, it is imperative to devise intelligent ways of curating generalized datasets as well as to design DL models that are invariant to changes in the data distribution. Furthermore, just as CNN and transformer models are mainly designed to work with images and sequential data, respectively, a dedicated DL framework could be developed that works optimally for wireless data.

IX. CONCLUSION
This paper provides a holistic overview of radio-based methods for localization and mapping, with special emphasis on mm-wave and some UWB based methods. It extensively reviews the potential and challenges of conventional and DL-based methods used for implementing radio SLAM. While DL-based methods for localization are developing fast, methods for mapping still follow classical approaches, such as the EKF and PF, leaving room to incorporate data-driven methods for improved mapping performance. Furthermore, it is also important to note that enhanced localization accuracy using DL algorithms usually comes at the cost of generalizability. Therefore, it is necessary not only to create well-generalized datasets but also to explore avenues for designing generalized DL models for wireless data. The potential of mm-wave together with generalized data-driven models can pave the way for robust radio SLAM in the future.