Machine Learning for Millimeter Wave and Terahertz Beam Management: A Survey and Open Challenges

Next-generation wireless communication networks will benefit from beamforming gain to utilize higher bandwidths at millimeter wave (mmWave) and terahertz (THz) bands. For high directional gain, a beam management (BM) framework acquires and tracks optimal downlink and uplink beam pairs through an exhaustive beam scan. However, for narrower beams at higher carrier frequencies this leads to a huge beam measurement overhead that negatively impacts beam acquisition and tracking. Moreover, the volatility of mmWave and THz channels, random user mobility patterns, and environmental changes further complicate the BM process. Consequently, machine learning (ML) algorithms that can identify and learn complex mobility patterns and track environmental dynamics have been identified as a remedy. In this article, we provide an overview of the existing ML-based mmWave/THz BM and beam tracking techniques. In particular, we highlight key characteristics of an optimal BM and tracking framework. By surveying recent studies, we identify some open research challenges and provide recommendations that can serve as a future direction for researchers in this area.

The mmWave BM framework specified by the 3rd generation partnership project (3GPP) comprises three operations: initial beam establishment including beam sweeping, beam tracking, and beam failure recovery [4]. Beam sweeping is the process in which the base station (BS) or the user equipment (UE) covers a spatial area by sequentially transmitting analog beams from a predetermined analog beam codebook to find a suitable Tx-Rx beam pair for further communication. This is achieved on the BS side by transmitting beamformed synchronization signal blocks (SSBs) and channel state information reference signals (CSI-RSs), or on the UE side by transmitting sounding reference signals (SRSs). The receiver, the BS or the UE, then sweeps its receiving beams to measure the reference signal received power (RSRP) of the beamformed reference signal transmitted by the other entity. In the downlink, the UE measures all the SSBs and reports the one with the highest RSRP to the BS. For beam reporting during initial beam establishment, the UE initiates random access using the random access channel (RACH) resource associated with this SSB, and the BS thereby identifies the beam selected by the UE, which is then used for further communication. To enable beam tracking, the UE keeps track of the RSRP and, as soon as it falls below a predefined threshold, initiates beam probing to find an alternative beam pair. In addition, the UE regularly reports several SSBs with the highest RSRP to the BS. For beam failure detection, the UE estimates the RSRP or the block error rate (BLER) of an SSB or the CSI-RS and triggers a beam failure instance if the RSRP is below or the BLER is above a preconfigured threshold. When multiple beam failure instances are triggered, the UE searches for alternative candidate beams. If none of the alternative beams offers a better RSRP, the UE triggers beam failure recovery over the physical RACH. The BS then uses new candidate beams for transmission of the beam failure recovery response. For further details, a description of the mmWave BM framework specified by the 3GPP is given in [4] and [5].
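The threshold-based beam failure detection described above can be illustrated with a minimal sketch. The function name, the threshold value, the instance count, and the rule that the counter resets after a good measurement are all illustrative assumptions and are not taken from the 3GPP specification.

```python
def detect_beam_failure(rsrp_dbm_series, rsrp_threshold_dbm=-110.0, max_instances=3):
    """Return the sample index at which beam failure is declared, or None.

    Simplifying assumption: a beam failure instance is counted whenever the
    RSRP is below the threshold, and the counter resets on a good sample.
    """
    instances = 0
    for index, rsrp_dbm in enumerate(rsrp_dbm_series):
        if rsrp_dbm < rsrp_threshold_dbm:
            instances += 1          # one more beam failure instance
        else:
            instances = 0           # link recovered, start counting again
        if instances >= max_instances:
            return index            # UE would now search for candidate beams
    return None


# Hypothetical RSRP trace in dBm: a short dip, a recovery, then a lasting blockage.
trace = [-100.0, -112.0, -113.0, -90.0, -115.0, -116.0, -117.0]
failure_index = detect_beam_failure(trace)
```

A BLER-based variant would follow the same pattern with the comparison reversed, since a failure instance is triggered when the BLER rises above its threshold.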
For initial beam establishment, the BS and the UE resort to beam sweeping, which is carried out through an exhaustive beam scan (EBS) at Tx and Rx as shown in Fig. 1. We assume the number of beams at the BS and at the UE to be M and N, respectively. For mmWave beam measurements, EBS relies on the sequential transmission of SSBs. Determining a downlink beam pair via EBS therefore requires M · N beam measurements, an overhead that grows quickly with the number of beams at Tx and Rx. Furthermore, if beam reciprocity is not supported, the EBS must also be executed for the uplink to determine the optimum uplink beam pair. In addition, the volatility of the wireless channel due to UE mobility and environmental change makes the application of EBS infeasible, particularly at higher frequency bands. As a result, the existing BM framework will become a critical bottleneck for future wireless communication networks.
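To make the M · N measurement count concrete, the following sketch performs an exhaustive scan over a synthetic RSRP table. The table values, the codebook sizes, and the function name are hypothetical and serve only as an illustration.

```python
import numpy as np


def exhaustive_beam_scan(rsrp_dbm):
    """Return the best (Tx, Rx) beam pair and the number of measurements.

    rsrp_dbm is an M x N table with one RSRP measurement per beam pair,
    so an exhaustive scan always costs M * N measurements.
    """
    rsrp_dbm = np.asarray(rsrp_dbm, dtype=float)
    num_tx, num_rx = rsrp_dbm.shape
    flat_index = int(np.argmax(rsrp_dbm))
    best_tx, best_rx = divmod(flat_index, num_rx)
    return (best_tx, best_rx), num_tx * num_rx


# Synthetic example: M = 64 BS beams, N = 8 UE beams, one clearly dominant pair.
rng = np.random.default_rng(0)
M, N = 64, 8
rsrp = rng.normal(-100.0, 6.0, size=(M, N))
rsrp[17, 3] = -60.0
best_pair, num_measurements = exhaustive_beam_scan(rsrp)
```

With these assumed sizes a single downlink scan already needs 512 measurements, and the count doubles if the uplink pair has to be found separately.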
An alternative to the EBS is a hierarchical beamspace search [6] that uses a tree search approach to reduce the number of beam measurements during initial beam establishment. The idea is to design a multi-resolution beam codebook that on the first level scans a smaller set of wider (parent) beams and on the next level scans only the narrower (child) beams within the selected parent beam, where W and M indicate the parent wide beams and the child narrow beams, respectively. However, a fundamental limitation of this approach is a reduced beamforming gain due to the utilization of wide beams on the first level. An adaptive sequential test that learns observed signal statistics in terms of their amplitude and noise variance to speed up the beam selection was proposed in [7]. The work was further extended to a multi-user scenario in [8] and to a beam elimination strategy that accelerates the beam selection by eliminating the candidate beams that are highly unlikely to be selected [9]. Alternatively, a frequency-selective beam probing scheme that maps distinct frequencies to distinct beamformers was proposed in [10]. The main idea here is to feed different frequencies to different beamformers, which enables simultaneous testing of all available beamformers and reduces the training overhead. Although these beam selection techniques can reduce the beam measurement overhead significantly, for most practical scenarios the channel modeling assumptions required by traditional techniques are too stringent to meet. Furthermore, these schemes are mainly designed for single-user transmission, and their overhead can still be very high for multi-user transmission when considering large-scale antenna arrays. Consequently, efficient BM techniques with low beam measurement overhead are sought for mmWave/THz communications.
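The saving offered by a hierarchical search can be sketched for a two-level codebook on one side of the link. The wide-beam model used here (a parent beam observes the average gain of its children) is an assumption chosen to mimic the reduced gain of wide beams; it is not the codebook design of [6].

```python
import numpy as np


def two_level_search(narrow_gain, num_wide):
    """Two-level tree search over narrow beams grouped under num_wide parents.

    Returns the selected narrow-beam index and the number of measurements.
    Assumes the number of narrow beams is divisible by num_wide.
    """
    narrow_gain = np.asarray(narrow_gain, dtype=float)
    children_per_parent = narrow_gain.size // num_wide
    groups = narrow_gain.reshape(num_wide, children_per_parent)
    # Assumed wide-beam model: each parent sees the mean gain of its children.
    wide_gain = groups.mean(axis=1)
    parent = int(np.argmax(wide_gain))        # level 1: scan the wide beams
    child = int(np.argmax(groups[parent]))    # level 2: scan children of the winner
    return parent * children_per_parent + child, num_wide + children_per_parent


# Hypothetical codebook: 64 narrow beams under 8 wide beams, best beam is 42.
gain = np.zeros(64)
gain[42] = 1.0
best_beam, num_measurements = two_level_search(gain, num_wide=8)
```

In this toy setting the search needs 16 measurements instead of 64. In a noisy channel the averaged parent gain can point to the wrong branch, which reflects the reduced first-level beamforming gain noted above.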
The limitations of the existing BM framework are aggravated further at THz bands, where a larger number of narrower beams is formed to compensate for the severe path loss. This leads to a higher beam measurement overhead during initial beam establishment. Additionally, the narrow beamwidth at THz bands entails a more frequent beam tracking procedure that further increases the beam measurement overhead. Moreover, the utilization of wider bandwidths at THz bands may give rise to the beam split effect [11], which refers to the change in the beam angle as a function of frequency. A similar effect at mmWave bands is known as beam squint [12], but due to the narrower beamwidth it is more severe at THz bands and may cause the beams to split into different directions over different carrier frequencies, leading to a severe loss in array gain.
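The frequency dependence of the beam angle can be illustrated for a uniform linear array driven by frequency-independent phase shifters that are designed to steer towards an angle θ0 at the center frequency f_c. Under this assumed array model, the beam at frequency f points to θ(f) with sin θ(f) = (f_c / f) · sin θ0. The carrier frequency, bandwidth, and steering angle below are hypothetical values.

```python
import numpy as np


def squinted_angle_deg(theta0_deg, f_hz, fc_hz):
    """Beam direction at frequency f_hz for phase shifters designed at fc_hz.

    Assumes a uniform linear array with frequency-independent phase shifters,
    for which sin(theta(f)) = (fc / f) * sin(theta0).
    """
    sine = (fc_hz / np.asarray(f_hz, dtype=float)) * np.sin(np.deg2rad(theta0_deg))
    return np.rad2deg(np.arcsin(np.clip(sine, -1.0, 1.0)))


# Hypothetical wideband THz link: 300 GHz carrier, 30 GHz bandwidth, 45 degree beam.
fc = 300e9
freqs = np.linspace(285e9, 315e9, 5)
angles = squinted_angle_deg(45.0, freqs, fc)
```

For these values the beam direction drifts by several degrees across the band, which is large compared with the narrow beamwidths expected at THz frequencies.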
To reduce the beam tracking overhead at THz bands, the channel tracking scheme in [13] considers a predefined linear mobility model for the UE, which limits its application. To relax the assumption of a linear mobility model, BS-cooperation-aided and hierarchical multiresolution codebook-based beam tracking approaches that aim to improve the accuracy of channel tracking were proposed in [14] and [15], respectively. However, the improved accuracy comes at the cost of inter-BS cooperation. Although all of the above-mentioned traditional beam tracking approaches achieve an acceptable performance, they do not consider the beam split effect, which limits their application to narrowband systems. To mitigate the negative impact of the beam split effect, the use of true-time-delay arrays [16] or delay-phase precoding [17], [18] has been proposed. Recently, it has been shown that the degree of the beam split effect can be controlled, resulting in a controlled angular coverage that enables simultaneous generation of multiple beams at different frequencies [11], [19], [20], [21]. Although such approaches significantly reduce the BM overhead, they are either designed for slower UE mobility [11] or suffer from higher power consumption [19], [20]. As a result, sophisticated BM techniques that can cope with the challenging issues of THz band communications are needed.
It is worth mentioning that a detailed study of traditional BM approaches is beyond the scope of this paper. Thus, in the following sections more attention is devoted to machine-learning (ML)-based BM solutions for mmWave and THz communication bands. ML-induced intelligence is one of the foreseen key features of 6G and is expected to be one way to address the limitations of traditional BM approaches.

B. MACHINE LEARNING FOR BEAM MANAGEMENT
ML is the field of programming computers so that they can learn from data. Broadly, ML can be classified into three main categories: supervised learning (SL), unsupervised learning, and reinforcement learning (RL) [22]. A supervised ML model is trained with labeled input data to learn the complex patterns between the input data and the output labels, enabling it to infer accurate labels for unknown data instances. In contrast to SL, there is no labeled data in unsupervised learning and the agent tries to identify the input structure for classification purposes. An RL agent, on the other hand, interacts with the dynamic environment to learn the optimal policy and performs actions in order to maximize the cumulative feedback reward. The recent development of fast computer processors and the significant growth in the availability of large data sets fueled the rise of deep learning. Inspired by the human brain, deep learning is a subset of ML that tries to imitate the way humans acquire knowledge. More recently, a new branch of artificial intelligence (AI) called federated learning (FL) has opened the door for a new era of ML [23]. In contrast to the centralized ML techniques, FL trains a centralized ML model across multiple decentralized nodes by only sharing learned local model parameters and keeping the raw training data set where it was generated. Thus, FL ensures data security and user privacy by alleviating the need for centralized data collection and processing [24]. ML, FL, and deep learning have been successfully applied in many different areas such as natural language processing [25], computer vision [26], and speech and image recognition [27], [28], [29], where mathematical modeling has proven to be significantly difficult. On the other hand, current wireless communication networks rely on mathematical models that sometimes are not perfect representations of the systems due to their underlying assumptions. Moreover, the optimization of current wireless communication networks is becoming more and more challenging because supporting the multiplicity of heterogeneous use cases requires complicated mathematical models that are computationally inefficient. Due to this increasing complexity, researchers anticipate that ML tools can now be used to replace these complex mathematical models. A more fascinating concept in this context is to use ML for end-to-end wireless communication networks by considering the network as a single complex scenario where ML itself can design parts of the network, e.g., self-learned modulation and coding schemes [30]. However, the concept is still in its infancy and requires a huge research effort.
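The parameter-sharing principle of FL described above can be sketched with a single aggregation step. Sample-weighted averaging, in the style of federated averaging, is assumed here as the aggregation rule, since the text does not prescribe one; the parameter vectors and sample counts are made up.

```python
import numpy as np


def federated_average(local_weights, num_samples):
    """Aggregate local model parameters into a global model.

    Only parameter vectors and sample counts are exchanged; the raw training
    data stays on the nodes. Sample-weighted averaging is an assumed rule.
    """
    weights = np.asarray(local_weights, dtype=float)   # shape: (nodes, parameters)
    counts = np.asarray(num_samples, dtype=float)      # shape: (nodes,)
    return (weights * counts[:, None]).sum(axis=0) / counts.sum()


# Two hypothetical nodes with 1 and 3 local training samples, respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a full FL loop the global parameters would be sent back to the nodes for another round of local training, which is omitted here.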

C. RELATED SURVEYS AND OUR CONTRIBUTION
Some previous surveys that explored initial beam establishment and beam tracking in mmWave and THz bands include [88], [89], [90], [91], [92], [93], [94], [95], [96], for which more details are provided as follows. A comprehensive survey on medium access control protocols for THz communications in [88] discusses the initial access mechanism and specifies the requirements and challenges for different application areas. A more generic survey in [89] covers several issues of mmWave communications, including mmWave channel models, beamforming architectures, channel estimation techniques, beam alignment, and beam selection algorithms. However, both these surveys lack the study and classification of ML-based BM and beam tracking approaches and do not tackle the issues of mmWave [88] and THz [89] bands, respectively. Other surveys on mmWave communications and BM [90], [91], [92] are mainly focused on the traditional approaches and lack the study of ML-based BM and beam tracking solutions. The authors in [93] identify open challenges and present potential future research directions on several aspects of THz band communications but lack a comprehensive review of existing BM studies. A more detailed survey that covers BM at mmWave and THz bands [94] mainly focuses on traditional BM approaches and covers only a limited number of ML-based solutions. Furthermore, it lacks a comparative analysis of existing ML-based BM solutions. Other recent works on AI- and ML-based BM include [95] and [96]. The authors in [95] first provide a brief overview of the BM framework for 5G New Radio and then identify some major limitations of existing BM that can become a bottleneck for future releases of 5G and beyond-5G communication networks. Furthermore, the authors provide their recommendations for possible future research. A state-of-the-art literature survey of deep learning-based BM techniques was furnished in [96]. After motivating the need for deep learning in BM, the authors survey the most recent works by categorizing them into different research routes and finally identify some open challenges for future research in this domain. Both these articles [95], [96] provide a great overview of traditional and deep learning-based BM, respectively. Yet, the authors in [95] only highlight a few of the existing BM challenges without any survey of existing techniques, while the authors in [96] cover only a limited set of deep learning techniques in their survey.
Owing to the rapid development of ML for mmWave and THz communications, in this article we report a comprehensive overview of ML-based BM studies that not only incorporates the deep learning approaches but also includes the existing SL-, RL-, and FL-based studies for mmWave/THz BM. The unique contributions of this article can be summarized as follows:
1) By highlighting the limitations of the existing BM framework, we first identify key characteristics of an ideal BM framework for mmWave and THz bands. The identified key characteristics then serve as a baseline for the comparison of existing ML-based BM studies.
2) Existing studies are reviewed and summarized based on the ML environment, which helps future researchers understand which features have been studied for a specific learning environment.
3) We identify valuable research gaps by providing a comprehensive and comparative evaluation of the existing literature.
4) We identify challenges of the existing literature for future research and highlight approaches such as FL [23], meta-learning [97], and AI transformers [98] that appear promising for ameliorating the problems of the existing ML-based BM literature.
The rest of the paper is organized as follows. We first shed some light on the key characteristics of an ideal BM framework in Section II. Section III presents an overview of the up-to-date literature on mmWave and THz BM based on AI and ML tools. Section IV evaluates the surveyed studies by comparing them against the ideal key characteristics. Finally, we highlight some open challenges for AI- and ML-based BM at higher frequencies and provide our suggestions in Section V.

II. KEY CHARACTERISTICS OF AN IDEAL BEAM MANAGEMENT FRAMEWORK
BM at higher frequencies has to cope with several scenarios, e.g., high-speed mobility, temporary or permanent beam blockage, and interference. A graphical illustration of a typical BM scenario in an urban environment is shown in Fig. 2, which imposes several requirements on the BM framework. In this section, we list some drawbacks of the existing traditional BM framework and describe the key characteristics of an ideal one.

A. OVERHEAD AND COMPLEXITY
The existing BM framework performs exhaustive beam sweeping for beam training during initial beam establishment. As shown in Fig. 1, the beam sweeping process is quite complex and incurs a large beam measurement overhead that grows with the number of beam pairs, i.e., as M · N [91], [96], which is not suitable for low-latency handovers. Moreover, the overhead increases further if narrower beams are formed, e.g., for THz communication.
Thus, a key characteristic of an ideal BM framework is to have a lower beam measurement overhead and complexity for beam sweeping and beam probing during initial beam establishment and beam tracking, respectively.

B. SCALABILITY TO NARROWER BEAMS AT TERAHERTZ BANDS
Communication at higher mmWave and THz bands is anticipated to be one of the key enablers for 6G communication networks [3]. On the one hand, it brings the benefit of large available bandwidths, but on the other hand it imposes challenges due to higher propagation loss. To overcome this, beamforming with pencil beams must be enabled, but such beams can easily be disrupted by blockages or any change in UE direction and orientation [99]. Consequently, existing mmWave BM solutions do not work well at THz bands. Hence, considering the unique characteristics of THz bands, more accurate blockage mitigation and beam alignment methods are needed. Furthermore, the previously mentioned beam split effect causes the beams to split in different directions at different subcarriers. This leads to a severe array gain loss at THz bands and limits the extensibility of the existing BM framework.

C. SCALABILITY TO MULTI-PANEL BS AND MULTI-PANEL UE
The current BM framework considers multi-panel UEs, where only one panel can be switched on at a time. This provides resilience to the UE against blockage [100] and helps mitigate interference through directivity [101]. However, to increase robustness against multi-path fading channels and to harvest diversity and/or multiplexing gains, both BS and UE should be equipped with multiple panels that operate simultaneously. In the existing BM framework, this means that beam sweeping and tracking must be performed over all BS and UE panels, which leads to a severe increase in complexity, latency, overhead, and power consumption. An ideal BM framework thus should be scalable to a multi-panel BS and a multi-panel UE without a significant increase in complexity or latency.

D. SCALABILITY TO MOBILITY
Higher mmWave and THz band frequencies necessitate smaller cells, in which a mobile UE can experience more frequent handovers. Furthermore, higher mobility within the cell range requires a large number of probing beams within a shorter sweeping duration. Consequently, the existing BM framework does not scale well to a highly mobile UE [95].

E. ADAPTATION TO ENVIRONMENT
Traditional beam sweeping codebooks cover a spatial area by transmitting beams in all directions without any consideration of the propagation environment. BSs are usually deployed in dense urban areas where propagation is predominantly non-line-of-sight. By adapting to the propagation environment, a BS can efficiently serve non-line-of-sight UEs, resulting in enhanced coverage. Thus, an ideal BM framework must adapt to the propagation environment through a dynamic design of beam sweeping codebooks [39], [73].

F. SINR MAXIMIZATION
In the conventional BM process, a UE measures the RSRP of the received beams and reports these measurements along with the candidate beam IDs to the BS. The beam with an RSRP above a predefined threshold is then selected for further communication. RSRP measurements are performed without any consideration of interference from neighboring beams. This may lead to a low signal-to-interference-plus-noise ratio (SINR), which is a more realistic measure for wireless communication networks. An ideal BM framework thus should consider several measurements such as RSRP, SINR, error vector magnitude (EVM), and BLER to derive the decision.
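The difference between RSRP-based and SINR-based beam selection can be shown with a small numerical sketch. The power levels and the noise floor are hypothetical values chosen so that the strongest beam is also the most interfered one.

```python
import numpy as np


def select_beam_by_sinr(rsrp_dbm, interference_dbm, noise_dbm=-94.0):
    """Return the index of the beam with the highest SINR and all SINRs in dB.

    Powers are converted from dBm to mW, combined linearly, and the ratio is
    converted back to dB. The noise power is an assumed value.
    """
    signal_mw = 10.0 ** (np.asarray(rsrp_dbm, dtype=float) / 10.0)
    interference_mw = 10.0 ** (np.asarray(interference_dbm, dtype=float) / 10.0)
    noise_mw = 10.0 ** (noise_dbm / 10.0)
    sinr_db = 10.0 * np.log10(signal_mw / (interference_mw + noise_mw))
    return int(np.argmax(sinr_db)), sinr_db


# Three hypothetical candidate beams: beam 0 is strongest but heavily interfered.
rsrp = [-70.0, -72.0, -80.0]
interference = [-75.0, -95.0, -100.0]
best_by_rsrp = int(np.argmax(rsrp))
best_by_sinr, sinr_db = select_beam_by_sinr(rsrp, interference)
```

Here the RSRP criterion picks beam 0, whereas the SINR criterion picks beam 1, whose slightly lower received power is outweighed by its much lower interference.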

G. HISTORY UTILIZATION
ML techniques are well known to be good at pattern recognition. For the BM framework, this can be utilized by estimating UE mobility patterns from historical data, which can help in reducing the BM overhead. For example, the angle of departure (AoD) and angle of arrival (AoA) from previous beam measurements can be utilized to direct fewer beams towards the anticipated direction of movement. In addition, for UEs moving in a high-speed train or on highways, temporal channel correlation can be exploited for beam overhead reduction [102]. Thus, any historical information should be exploited for an efficient BM framework.
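As a minimal illustration of how past AoD estimates can narrow the beam search, the sketch below extrapolates the next AoD and probes only the beams closest to it. The linear extrapolation, the codebook, and the function name are illustrative assumptions; an ML model would replace the extrapolation step in practice.

```python
import numpy as np


def candidate_beams_from_history(aod_history_deg, beam_angles_deg, num_candidates=3):
    """Predict the next AoD and return the beams closest to the prediction.

    Assumption: the AoD continues to change at the rate observed between the
    last two measurements (simple linear extrapolation).
    """
    history = np.asarray(aod_history_deg, dtype=float)
    predicted_aod = history[-1] + (history[-1] - history[-2])
    beam_angles = np.asarray(beam_angles_deg, dtype=float)
    order = np.argsort(np.abs(beam_angles - predicted_aod), kind="stable")
    return float(predicted_aod), sorted(int(i) for i in order[:num_candidates])


# Hypothetical codebook of 25 beams from -60 to 60 degrees in 5 degree steps.
codebook = np.arange(-60, 61, 5)
predicted, candidates = candidate_beams_from_history([10.0, 14.0, 18.0], codebook)
```

In this example only 3 of the 25 beams are probed, which is the kind of overhead reduction that history utilization aims for.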

H. DEPENDABILITY ON SIDE INFORMATION
To overcome some of the limitations of the current BM framework, studies propose to utilize side information, such as the UE location or measurements from additional frequency (sub-7 GHz) communication bands [56], [57]. However, such proposals rely entirely on the availability of side information. This limits their scope, as they cannot perform initial beam establishment when such information is unavailable. An ideal BM framework thus must not fully depend on the availability of any side information but should only utilize it for the enhancement of some key performance indicators.
To summarize, a fundamental limitation of the existing BM framework is the beam sweeping and measurement overhead. Thus, an ideal BM framework has to reduce this overhead to make itself scalable to narrower beams, multi-panel BSs, and multi-panel UEs. Utilization of any side information such as location can be quite helpful in reducing the beam sweeping overhead. However, to avoid suffering from unavailability or inaccuracy of such information, an ideal BM framework should not fully rely on side information. In addition, RSRP measurements for beam determination in the existing BM framework are discarded after determining the optimal beam pair. However, such information can be stored as history and utilized to estimate UE mobility patterns. Thus, utilization of historical measurements can help reduce the beam sweeping overhead and can make the BM framework scalable to a highly mobile UE. Furthermore, the dynamic nature of the wireless channel imposes the requirement of environmental adaptability on an ideal BM framework.

III. STATE OF THE ART
In this section, we provide a brief overview of ML-based mmWave/THz BM techniques. Existing studies can be classified based on their ML algorithms into three main categories: SL, RL, and FL. Based on the utilization of the side information, SL techniques can be further categorized into side-information-assisted and non-side-information-assisted techniques as shown in Fig. 3.

A. NON-SIDE-INFORMATION-ASSISTED SUPERVISED LEARNING
Due to its simplicity, SL is the most frequently used ML technique. Besides BM, it has several applications in wireless communication networks including error correction codes, data compression, mobility management, power management, and channel estimation. Some popular SL algorithms include convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM) networks, support vector machines (SVMs), and K-nearest neighbors (KNN).

1) NEURAL NETWORKS
Fundamentally, a neural network (NN), also known as an artificial neural network (ANN), is a combination of neurons that mimics the behavior of a human brain. In such a network, each neuron takes weighted information from other nodes and performs computations according to a given set of rules to produce an output. Similar to the human brain, NNs improve their accuracy (error minimization) with the help of training data and learn to solve complex problems. Due to their success in computer networks and AI, they have been widely studied to enhance the performance of the BM process.
A NN-based ML solution that learns to adapt the codebook to a particular deployment scenario was presented in [38]. The proposed NN learns to predict the beamforming vectors based on the channel structure. Another similar approach in [39] exploits the concept of hierarchical beamspace search [6] in combination with a NN. Here the NN learns environmental characteristics to design a site-specific probing codebook as shown in Fig. 4. During the training phase, the BS first sweeps the wider beams to capture the channel matrix. The NN then utilizes this channel matrix to update the beamforming weights of the wider beams. Once the probing codebook is learned, the NN predicts the narrower beams for data transmission. Simulations, carried out via ray tracing and the DeepMIMO data set [103], indicate that the site-specific probing codebook design approach achieves higher accuracy in comparison to traditional hierarchical approaches and involves only low overhead. However, the solutions presented in [38] and [39] require channel knowledge, which due to the utilization of large antenna arrays is a high dimensional channel matrix and is difficult to acquire in mmWave/THz systems [104].
CNNs were designed to reduce the computational load of ANNs through parameter sharing and are widely used for pattern identification, image classification, and other computer vision tasks [105]. DeepBeam, a CNN-based BM solution in [40], exploits feature extraction properties of a CNN to infer the AoA and the beam ID by passively eavesdropping on the ongoing data transmission in the network. Motivated by the fact that each beam pattern introduces different impairments in the waveform, a CNN is trained to distinguish between different beam patterns by identifying these impairments in the I/Q samples of the received signal. Experimental results indicate that the DeepBeam CNN can achieve a beam prediction accuracy of up to 96% and 77% for a 5 and a 12 beam codebook, respectively. Furthermore, it was shown that for a 12 beam codebook, a 7 times reduction in latency can be achieved by avoiding EBS during initial beam establishment. Another major contribution of this work is the publicly available experimental data set [106].
To reduce the beam sweeping overhead and to enable a fast and reliable initial connection, a DNN-based algorithm called DeepIA was proposed in [41]. In contrast to multi-codebook-based BM approaches with wide and narrow beams, DeepIA uses a single beam codebook, but instead of sweeping all the beams in the codebook, the proposed solution only sweeps a small subset of beams S_M. Beam measurement reports from the UE are then used as an input for DeepIA, which predicts the best beam for the initial connection. Furthermore, a sequential feature selection approach that selects the beam subset S_M conditioned on the highest prediction accuracy was also presented in [41]. Simulation results showed that, in comparison to traditional beam-sweeping-based initial access, DeepIA can successfully capture the complex environmental patterns and can predict the best beam with high accuracy and a smaller beam measurement overhead.
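The idea of predicting the best beam of a full codebook from measurements on a small swept subset can be sketched as follows. This is not the DeepIA network of [41]: a 1-nearest-neighbor classifier is swapped in for the DNN to keep the example short, and the codebook, the swept subset, and the quadratic RSRP roll-off are hypothetical.

```python
import numpy as np

BEAM_ANGLES = np.linspace(-60.0, 60.0, 16)   # hypothetical codebook of 16 beams
SUBSET = np.array([0, 5, 10, 15])            # hypothetical swept subset of 4 beams


def rsrp_profile(ue_angle_deg):
    """Toy RSRP in dB of every beam, with an assumed quadratic roll-off."""
    ue = np.atleast_1d(ue_angle_deg).astype(float)[:, None]
    return -12.0 * ((ue - BEAM_ANGLES[None, :]) / 30.0) ** 2


# Training set: subset measurements as features, best full-codebook beam as label.
train_angles = np.arange(-60.0, 60.5, 0.5)
train_full = rsrp_profile(train_angles)
train_x = train_full[:, SUBSET]
train_y = train_full.argmax(axis=1)


def predict_best_beam(subset_rsrp):
    """1-nearest-neighbor prediction (stand-in for the DNN) of the best beam."""
    x = np.atleast_2d(subset_rsrp)
    dist = ((x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    return train_y[dist.argmin(axis=1)]


# Evaluate on random UE directions while measuring only 4 of the 16 beams.
rng = np.random.default_rng(1)
test_full = rsrp_profile(rng.uniform(-60.0, 60.0, size=500))
predicted = predict_best_beam(test_full[:, SUBSET])
accuracy = float((predicted == test_full.argmax(axis=1)).mean())
```

The toy data are noise free, so the reported accuracy says nothing about real channels; the sketch only shows the input and output structure of subset-based beam prediction.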

2) LONG SHORT-TERM MEMORY NETWORKS
The RNN approach overcomes a fundamental limitation of a feedforward ANN by introducing recurrent connections in the hidden layers. This makes RNNs capable of capturing sequential information in the input data. However, due to these recurrent connections, RNNs suffer from the exploding (when the eigenvalues of the weight matrix are greater than one) and vanishing (when the eigenvalues of the weight matrix are less than one) gradient problem [107], [108]. LSTM handles this problem through the introduction of gating functions that regulate the information flow before passing the long- and short-term memory to the next cell state [109]. Due to their capability of learning the relevant information and long-term dependencies of input data, LSTM networks have been extensively studied to enhance the BM and beam tracking process.
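The gating mechanism can be made concrete with a single forward step of a standard LSTM cell. The weight layout (input, forget, output, and candidate blocks stacked in one matrix) and the dimensions are conventions chosen for this sketch rather than details from the surveyed works.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W has shape (4H, D), U has shape (4H, H), b has shape (4H,), with rows
    stacked in the assumed order [input gate, forget gate, output gate, candidate].
    """
    H = h_prev.size
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to store
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell update
    c = f * c_prev + i * g         # long-term memory (cell state)
    h = o * np.tanh(c)             # short-term memory (hidden state)
    return h, c


# Run a short random input sequence through the cell (D = 2 inputs, H = 3 units).
rng = np.random.default_rng(0)
D, H = 2, 3
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(x, h, c, W, U, b)
```

Because the cell state is updated additively and scaled by the forget gate, gradients can flow over many time steps without being repeatedly multiplied by the same recurrent weight matrix.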
An LSTM-based beam tracking solution for AoA estimation over specific paths has been presented in [42]. For this particular problem, the LSTM uses the received signal and previous AoA estimates as an input and exploits the fact that mobility over specific paths generates sequential UE parameters that evolve over time. To train this ML network, channels are generated via the QuaDRiGa model [110], and to map to a real-world scenario, the training features and labels are perturbed by a small amount of noise. Simulation results show that the LSTM-based AoA estimation outperforms the highly optimized non-ML Kalman-filter-based AoA estimation [111] in terms of network outage probability. However, this work considers UE mobility along a straight-line path, which limits its application.
Another mmWave beam tracking approach proposed in [43] utilizes LSTM for channel tracking in a vehicular scenario as shown in Fig. 5. Under investigation is a mmWave multi-input single-output system, where a UE is served by multiple coordinated BSs. Channel tracking involves two stages, where the first stage is the online training. During this phase, all BSs estimate the channel h_i(t) through uplink pilots and determine the beamformer weights using traditional methods. The estimated channels from each BS are then transmitted to a central LSTM cloud for environment learning. Channel prediction is enabled in the second stage, during which the LSTM model predicts the channel for the next beam coherence time (an effective measure of beam alignment frequency [112]), i.e., h_i(t + 1), as shown in Fig. 5. In this way the uplink training time is reduced to half for two consecutive beam coherence intervals. Simulations are performed via the ray tracing simulator Wireless Insite [113], and the results indicate that the ML-based approach with half the overhead achieves the same transmission rate as the traditional approach. However, the proposed scheme relies on cooperation between the BSs, which limits its application. Furthermore, the predicted channel
FIGURE 6. Convolutional long short-term memory network for beam prediction [44].
To leverage the feature extraction capabilities of a CNN, it is used in combination with LSTM to extract spatial correlation in high- and low-resolution beam domain images [44]. Utilizing the concept of a multi-resolution codebook, i.e., wide and narrow beams, low-resolution beam domain images are obtained via wide beam measurements. These images are then provided as an input to an LSTM-based CNN model, which learns the mapping between low- and high-resolution beam domain images and estimates the quality of narrow beams as shown in Fig. 6. Simulation results obtained via the ray tracing simulator Wireless Insite [113] indicate that the ML-based multi-resolution codebook design approach achieves similar performance as the hierarchical beamspace search but with significantly lower overhead. Similar to [44], a CNN and an LSTM were used for beam prediction in [45]. Here, to further reduce the beam sweeping overhead, an auxiliary LSTM-based adaptive beam training strategy that selects a subset of wider beams based on the previous measurements was proposed.
Although this adaptive beam training strategy reduces the beam sweeping overhead significantly, the reduction comes at the cost of a degradation in beam prediction accuracy [45], which is not desirable. Another convolutional LSTM-based approach solves the super-resolution AoA estimation problem at THz bands by exploiting the spatial and temporal correlation among channel observations obtained at different time instants [46]. Simulation results show that the proposed approach can not only estimate the AoA with high accuracy but can also reduce the beam tracking pilot overhead.
More recently, in [47] an LSTM prediction model was fused with a sequential Bayesian estimation framework for beam tracking. At each beam training period, the LSTM model utilizes all previous channel estimates and contextual information from an inertial measurement unit at the UE to predict the a priori distribution of AoA. This predicted distribution is then utilized to get a more accurate a posteriori channel estimate through Bayesian estimation. The method is thus hybrid in nature and utilizes LSTM and an analytical measurement model for beam tracking. The model is trained via the ADAM optimizer [114] to minimize the loss function through back propagation. Simulation results show that this hybrid approach outperforms the LSTM and the Kalman-filter-based approaches in [42] and [111] in terms of bit error rate (BER) and channel estimation mean squared error (MSE).
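The fusion of a predicted a priori distribution with a new measurement can be illustrated with a scalar Gaussian example. This is a deliberately simplified stand-in and not the estimation framework of [47]: the prior mean and variance would come from the LSTM prediction, the AoA is assumed to be observed directly with additive Gaussian noise, and all numerical values are hypothetical.

```python
def gaussian_posterior(prior_mean, prior_var, measurement, meas_var):
    """Combine a Gaussian prior with a noisy direct measurement.

    Assumes measurement = true value + zero-mean Gaussian noise with variance
    meas_var. Returns the mean and variance of the Gaussian posterior.
    """
    gain = prior_var / (prior_var + meas_var)      # weight given to the measurement
    post_mean = prior_mean + gain * (measurement - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


# Hypothetical values: predicted AoA prior of 30 degrees, measured AoA of 34 degrees.
post_mean, post_var = gaussian_posterior(30.0, 4.0, 34.0, 4.0)
```

With equal prior and measurement variances the posterior mean lies halfway between prediction and measurement, and the posterior variance is half of either one, which is why a good learned prior can sharpen the estimate obtained from a single noisy measurement.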

B. SIDE-INFORMATION-ASSISTED SUPERVISED LEARNING
The beam measurement overhead can be greatly reduced if the UE can provide additional side information (e.g., location or orientation). In this context, several studies, discussed in this section, propose ML tools that predict candidate beams from the available side information and can thereby significantly reduce the mmWave/THz BM and beam tracking overhead. These studies can be subdivided on the basis of the available side information. In general, these approaches either utilize the UE location or sub-7 GHz CSI to maintain a database and train the ML algorithm to map this information to a beam prediction, as shown in Fig. 7. Recently, the use of sensory information from imaging sensors such as lidar or radar has also gained increasing interest to address the BM challenge at mmWave and THz bands.

1) LOCATION INFORMATION
A beam prediction framework with the availability of situational awareness was proposed in [48]. Motivated by the fact that vehicles are the main source of dynamic reflections in any urban environment, it was proposed to use the location of vehicles (obtained via dedicated short-range communication) to predict the received power of a beam. Simulation results obtained via Wireless Insite [113] indicate that beam prediction accuracy can be significantly improved by utilizing the contextual information. A coordinated beamforming solution was presented in [49], where authors propose to extract radio frequency signatures of the environments from the received pilots at multiple BSs. Based on the received pilots, a CNN is trained to predict the maximum achievable rate of each BS beam and then selects the beam with the highest predicted rate.
FIGURE 7. Side-information, i.e., UE location or low-frequency CSI, is used to train the ML model for beam prediction.
Location information was also exploited to resolve the beam tracking problem for vehicle-to-infrastructure communication [50], [51]. A database of beam pairs and quantized location bins was maintained in [50]. Further, a learning model for non-discrete receiver locations was proposed in [51]. However, both of these approaches consider only the location of the targeted vehicle and ignore other moving objects. To overcome this problem, an ML approach that utilizes the location information of other vehicles to identify the optimal beam pair index was proposed in [52]. It was shown that utilizing this extra information enhances the beam tracking performance. Furthermore, the impact of location inaccuracy was also studied in [52].
To enable online learning in their previous work [50], the authors proposed in [53] a MAB-based tracking framework that utilizes the location. The basic idea is to learn, for a given location, from the beam measurements obtained in the past. The learned parameters of previous beam measurements are stored at the BS and are updated with each beam alignment attempt. In this way, a database of locations and beam pairs is maintained at the BS. Whenever a UE reports a location already stored at the BS, a subset of the most suitable beams is selected. In the next phase, beam sweeping is performed over this limited subset of beams to identify the best beam index.
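The idea of a location-indexed beam database can be sketched as follows. This is a simplified illustration and not the bandit algorithm of [53]: it merely averages past RSRP reports per quantized location bin and returns the top-ranked beams as the reduced sweeping set. The class name `LocationBeamDatabase`, the bin size, and the averaging rule are hypothetical choices.

```python
from collections import defaultdict

class LocationBeamDatabase:
    """Per-location beam statistics kept at the BS (illustrative assumption)."""

    def __init__(self, bin_size_m=5.0):
        self.bin_size_m = bin_size_m
        # (bin_x, bin_y) -> beam index -> [sum of RSRP in dBm, number of reports]
        self.stats = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))

    def _bin(self, x, y):
        """Quantize a position in meters to a location bin."""
        return (int(x // self.bin_size_m), int(y // self.bin_size_m))

    def update(self, x, y, beam, rsrp_dbm):
        """Store the outcome of one beam alignment attempt."""
        entry = self.stats[self._bin(x, y)][beam]
        entry[0] += rsrp_dbm
        entry[1] += 1

    def candidates(self, x, y, k=3):
        """Return up to k beams with the highest average RSRP for this bin."""
        beams = self.stats.get(self._bin(x, y))
        if not beams:
            return None  # unseen location: fall back to a full beam sweep
        ranked = sorted(beams, key=lambda b: beams[b][0] / beams[b][1], reverse=True)
        return ranked[:k]

db = LocationBeamDatabase()
db.update(12.0, 3.0, beam=7, rsrp_dbm=-70.0)
db.update(13.5, 4.0, beam=9, rsrp_dbm=-65.0)
db.update(11.0, 2.0, beam=7, rsrp_dbm=-72.0)
db.update(14.0, 1.0, beam=2, rsrp_dbm=-90.0)
subset = db.candidates(12.5, 2.5, k=2)   # beams to sweep for a known location
unknown = db.candidates(100.0, 100.0)    # no entry stored for this location
```

Beam sweeping is then restricted to `subset`, while an unknown location triggers the conventional sweep.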
Most of the above-mentioned location-assisted approaches are designed for vehicular scenarios, where the receiver orientation is assumed to be fixed at a given location. However, this assumption does not hold for pedestrian applications, where the orientation of a receiver may change several times at a given location due to human behavior. A beam tracking method that leverages both the location and the orientation of the receiver was proposed in [54]. In particular, the receiver location and orientation are provided as input to a deep neural network, which captures the environmental structure and predicts the probability of each beam pair being the best.
Recently, the authors in [55] proposed a location-assisted ML-based beam alignment framework that predicts the optimal serving BS and narrows down the best candidate beams based on the receiver's location. For learning purposes, low-complexity ML tools, i.e., multi-layer perceptrons and random forest classifiers, were used. To train the network, a data set generated for a typical dense urban environment was used [115], which makes this approach suitable for pedestrian and vehicular applications. During the training phase, all BSs perform EBS to identify the best BS and beam pair at each location. During the operating phase, each UE transmits its location to several BSs; based on this location, each BS determines whether it is the best serving BS, and the best serving BS then selects the best serving beam.
Given the promising results of location-assisted beam tracking approaches, their performance in a real-world scenario was analyzed in [56]. Three ML algorithms (lookup table, KNN, NN) that use a known location were compared on the real-world data set DeepSense [116]. Experimental results indicated that, with an antenna array of 64 beams, location-assisted beam tracking achieves an accuracy of 99% with 66% reduced overhead. However, the beam prediction accuracy mainly depends on the size of the codebook and on the location accuracy.
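As a rough illustration of the KNN variant of location-assisted beam prediction, the following sketch predicts the beam index by majority vote among the nearest stored positions. It is not the pipeline of [56]; the function name `knn_predict_beam`, the two-dimensional coordinates, and the toy training samples are assumptions.

```python
import math
from collections import Counter

def knn_predict_beam(train, query_xy, k=3):
    """Majority vote over the k nearest training positions.

    train: list of ((x, y), best_beam_index) pairs collected offline.
    """
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query_xy))[:k]
    votes = Counter(beam for _, beam in nearest)
    return votes.most_common(1)[0][0]

# Toy training set (assumption): two clusters of positions with best beams 1 and 5.
samples = [((0.0, 0.0), 1), ((1.0, 0.0), 1), ((0.0, 1.0), 1),
           ((10.0, 10.0), 5), ((11.0, 10.0), 5), ((10.0, 11.0), 5)]

beam_near_origin = knn_predict_beam(samples, (0.5, 0.5))
beam_far = knn_predict_beam(samples, (10.2, 10.4))
```

In such a scheme the prediction quality degrades when the reported position is inaccurate, since a position error can move the query into the neighborhood of samples with a different best beam.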

2) LOW-FREQUENCY CHANNEL STATE INFORMATION
Experimental results in [117] show that in line-of-sight situations, there exists a spatial correlation between the channel gains of mmWave and sub-7 GHz interfaces. To exploit this fact, the power delay profile acquired via sub-7 GHz communication was used to obtain UE fingerprints [57]. Here, EBS is performed during the training phase so that a DNN can learn the correlation between the sub-7 GHz power delay profile and the best beam in the mmWave link. Once the training is completed, the DNN takes the sub-7 GHz power delay profile as input and returns the best beam indices for the mmWave link. Simulation results showed that the proposed approach can predict the best mmWave beam with high probability. A similar approach was presented in [58], where the authors formulate the problem as a classification task and propose to use a CNN for this multi-class classification problem.
A more detailed study on leveraging low-frequency CSI for mmWave beam prediction was presented in [59]. Here in the first step, a function was derived that maps sub-7 GHz CSI to mmWave beams and link blockages. In the second step, a deep learning framework was employed to learn this mapping. Through simulation results in Wireless Insite [113], it was established that the proposed deep learning framework can leverage the low-frequency CSI to predict blockage and mmWave beams with high accuracy. More recently, the use of previously obtained sub-7 GHz channel estimates instead of instantaneous low-frequency CSI was proposed in [60]. Furthermore, for increased accuracy of mmWave beam alignment, LSTM was used to predict the best beam in between two low-frequency CSI estimation instants. Simulation results indicate that updating the beam in between low-frequency CSI estimation instants results in enhanced beamforming gain.

3) OTHER SENSORY INFORMATION
Due to the highly directional propagation nature at mmWave and THz bands, which makes the BM problem highly dependent on the surrounding environment, the use of environmental sensing information could be leveraged for BM overhead reduction. A vision-aided beam and blockage prediction approach in [61] utilizes the visual data from cameras at mmWave BSs. Analogous to the image classification task, a residual network learns to map a camera image to a beam index. To validate the effectiveness of the proposed approach, data sets were generated via the publicly available ViWi framework [118] and through simulations it was shown that the proposed approach can predict beams and blockages with high accuracy. Another similar approach utilizes visual data collected from cameras installed on drones to enable fast beam prediction at mmWave bands [62].
Inspired by the fact that the use of position or environment sensing devices at the terminals can guide link establishment and beam tracking tasks, the use of visual sensors (cameras) [63], lidar [64], and other sensing modalities such as radar, position, or their combination [65], [66] was proposed to address the BM challenge at mmWave and THz bands. To benefit from sequential modeling capabilities, all these works use an RNN that takes the observed sequence of sensory data as input for beam prediction. Experimental validation using the real-world data set DeepSense [116] indicates that leveraging the sensory information for beam tracking results in higher prediction accuracy and reduced beam tracking overhead. Another similar approach in [67] uses multi-modal sensing information from sensors such as an ultrasonic sensor, a thermographic camera, and an infrared camera to train a fast region-based CNN [119] for beam and blockage prediction at mmWave and THz bands.

C. REINFORCEMENT LEARNING
All of the techniques discussed in Sections III-A and III-B are supervised in nature, i.e., extensive training is required to obtain the performance benefits. However, their performance is not guaranteed in scenarios that are not covered by the training data, which limits their application. For more general scenarios, online learning techniques, i.e., RL techniques, are more suitable. In this section, we discuss some of the RL-based techniques for mmWave and THz BM.

1) Q-LEARNING AND DEEP Q-NETWORKS
Q-learning is a model-free RL algorithm that learns to find the best action in a given state by evaluating the Q-value (quality) of each action [120]. The goal of Q-learning is to find the optimal policy that maximizes the cumulative feedback reward.
A Q-learning-based BM approach that tries to find the beam with the maximum received power was proposed in [68]. For this beam selection problem, the reward is based on the highest received power. During the exploration phase, the agent selects a specific group of serving beams and evaluates the reward associated with this beam group. Note that the agent may select non-optimal beams, but this helps in discovering alternative beams. During the exploitation phase, the agent always selects the beams that maximize the received power. Another line of work combines Q-learning with an auxiliary beam pair [121] to further reduce the beam search space [69]. However, a fundamental limitation of Q-learning is that it requires many iterations to converge, since all state-action pairs have to be explored, which limits its application to a fast moving UE.
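The exploration and exploitation phases of tabular Q-learning can be sketched for a toy beam selection problem. This is a minimal illustration and not the scheme of [68]: the state definition (the currently serving beam), the reward (received power in dB relative to the best beam), the hyperparameters, and the name `train_beam_q_table` are all assumptions.

```python
import random

def train_beam_q_table(num_beams, reward_fn, steps=2000,
                       alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; state = serving beam, action = next beam to use."""
    rng = random.Random(seed)
    q = [[0.0] * num_beams for _ in range(num_beams)]
    state = rng.randrange(num_beams)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(num_beams)  # explore: may pick a non-optimal beam
        else:
            action = max(range(num_beams), key=lambda a: q[state][a])  # exploit
        reward = reward_fn(action)
        next_state = action  # the chosen beam becomes the serving beam
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q

# Toy reward (assumption): received power in dB relative to the best beam, beam 5,
# losing 3 dB for every beam index of misalignment.
BEST_BEAM = 5
q_table = train_beam_q_table(8, lambda beam: -3.0 * abs(beam - BEST_BEAM))
greedy_beam = max(range(8), key=lambda a: q_table[BEST_BEAM][a])
```

Even in this eight-beam toy problem the table has 64 state-action entries, which hints at why convergence becomes slow when the number of beams or the tracking range grows.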
A major concern in beam tracking is a fast moving UE. In particular, a high UE velocity makes beam tracking more challenging due to the increased beam tracking range. For Q-learning, this implies a larger state-action space and slower convergence. To accelerate the convergence of beam tracking, the use of multiple Q-learning agents that run in parallel was proposed in [70]. Training several agents for different beam subgroups results in faster beam alignment and achieves better spectral efficiency. To further reduce the training time of Q-learning, a deep Q-network (DQN) approximates the Q-values through a NN. A DQN-based beam tracking approach in [71] adapts to environmental changes by adjusting the beam probing range, which makes it suitable for UEs with high mobility. The performance evaluation for a slow and a fast mobile UE indicates that the DQN-based approach learns and converges faster than Q-learning. Furthermore, the DQN also achieves a better sum data rate than EBS and the hierarchical beamspace search approach. Another multi-agent-based DQN approach in [72] aims at maximizing the network throughput and reducing the beam alignment overhead through antenna beamwidth optimization.
A fundamental limitation of a DQN is that in its original form it is useful for discrete and low-dimensional action spaces [122]. The deep deterministic policy gradient (DDPG) algorithm was developed to overcome this limitation for tasks with a continuous action space. DDPG uses two NNs, one for the critic and one for the actor. The actor takes the state as an input and decides the best action. The critic is essentially a Q-function and evaluates the selected action by computing the value function. Due to its capability of handling high-dimensional action spaces, the use of DDPG was proposed for BM in [73]. The main focus of this work is to develop an RL-based BM approach that can learn to adapt the beam codebook based on the surrounding environment. In principle, the scheme is quite similar to the SL-based codebook learning approaches [38], [39], but it does not require any explicit channel knowledge. Further, it also incorporates the impact of hardware impairments on the learned beam patterns. Simulation results over the DeepMIMO data set [103] show that the DDPG-based approach reduces the beam sweeping overhead by avoiding beam scanning in directions where there is no user at all. However, the proposed approach incurs a large overhead during the initial learning phase, which may have to be repeated every time the codebook needs to be relearned due to a significant environmental change.

2) MULTI-ARMED BANDITS
The multi-armed bandit (MAB) approach is an ML technique where the agent has to select an action (arm) from a set of possible arms to maximize the long-term reward. Motivated by the exploration vs. exploitation dilemma, several approaches deal with the arm selection problem, i.e., which arm to select [123]. The simplest is the greedy approach, where the agent selects the arm that has returned the best reward so far and thus only exploits whatever it has learned so far. The upper confidence bound (UCB) approach [124] instead selects the arm with the highest upper confidence bound on its reward, i.e., the empirical mean reward plus a bonus that shrinks as the arm is played more often, so that rarely played arms are still explored. Another way to encourage exploration is the ϵ-greedy approach, which forces the agent to explore other arms with probability ϵ and provides some control over exploration and exploitation. A third approach, known as Thompson sampling, relies on a Bayesian probabilistic model to update the reward distribution after each action [125]. It is very common to utilize a beta distribution for this probabilistic model. Due to the sequential nature of beam tracking, there exists an inherent exploration vs. exploitation tradeoff and the problem can be formulated as a MAB problem.
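The three arm selection rules can be contrasted in a short sketch, with each beam treated as one arm. The UCB rule shown is the common UCB1 form (empirical mean plus a confidence bonus) and the Thompson sampler assumes Bernoulli rewards with a uniform Beta(1, 1) prior; the function names and these modeling choices are assumptions and are not tied to any specific surveyed work.

```python
import math
import random

def epsilon_greedy(means, epsilon, rng):
    """With probability epsilon explore a random arm, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=lambda a: means[a])

def ucb1(means, counts, t):
    """Pick the arm with the largest mean plus confidence bonus (UCB1 form)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the bound
    return max(range(len(means)),
               key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))

def thompson_beta(successes, failures, rng):
    """Sample each arm's success probability from its Beta posterior."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])

rng = random.Random(0)
greedy_choice = epsilon_greedy([0.2, 0.9, 0.5], epsilon=0.0, rng=rng)
unplayed_choice = ucb1([0.5, 0.6], counts=[10, 0], t=11)
optimistic_choice = ucb1([0.5, 0.6], counts=[1, 100], t=101)
sampled_choice = thompson_beta([8, 1], [2, 9], rng)
```

With `counts=[1, 100]`, UCB1 prefers the first arm although its empirical mean is lower, because that arm has been played only once and its confidence bonus is therefore large.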
The first studies that formulate beam tracking as a MAB problem include [74] and [75]. A beam tracking solution that combines MAB with contextual information was proposed in [74]. In [75], MAB was utilized for beam alignment at higher speeds, i.e., in high speed railways. However, both these approaches consider each beam as an arm, which leads to misalignment over fast time-varying mmWave channels. To deal with this problem and to capture environmental changes, Zhang et al. propose in [76] to use a combination of the beam index difference and the beam sweeping subspace as an arm. This formulation helps in adapting to fast varying channels and also reduces the beam tracking overhead. Based on these changes, modified UCB and ϵ-greedy MAB approaches were proposed for beam tracking. Simulation results showed that both algorithms achieve a spectral efficiency very close to that of the perfect alignment scenario.
Another MAB-based approach in [77] proposes to use non-stationary bandits for beam tracking due to the time-varying nature of wireless channels. In order to account for non-stationarity and to utilize the historical data, two modified versions of UCB were proposed. The first algorithm, called discounted UCB, introduces a discount factor (γ) that down-weights past observations. The purpose is to follow the channel variations by assigning more weight to the most recent past. An alternative to discounted UCB is the sliding-window UCB, which assigns equal weights to all the past values inside the sliding window. Simulations were performed using the QuaDRiGa channel model [110] and it was shown that, in comparison to stationary MAB, the discounted UCB and sliding-window UCB approaches converge in a smaller number of iterations while at the same time achieving a higher throughput. Similarly, a hierarchical-codebook-based MAB in [78] uses prior channel knowledge for accelerated beam alignment at THz bands.
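The difference between the two non-stationary statistics can be sketched as follows. This is a simplified illustration rather than the exact algorithms of [77]: the bonus term is a generic square-root bonus, and the class names, `bonus_scale`, γ = 0.95, and the window length of 20 are assumptions.

```python
import math
from collections import deque

class DiscountedStats:
    """Reward statistics in which every past observation is scaled by gamma per step."""

    def __init__(self, num_arms, gamma=0.95):
        self.gamma = gamma
        self.sums = [0.0] * num_arms
        self.counts = [0.0] * num_arms

    def update(self, arm, reward):
        self.sums = [self.gamma * s for s in self.sums]
        self.counts = [self.gamma * c for c in self.counts]
        self.sums[arm] += reward
        self.counts[arm] += 1.0

    def count(self, arm):
        return self.counts[arm]

    def mean(self, arm):
        return self.sums[arm] / self.counts[arm] if self.counts[arm] > 0 else 0.0

class SlidingWindowStats:
    """Reward statistics with equal weights inside a window of recent plays."""

    def __init__(self, window=20):
        self.history = deque(maxlen=window)

    def update(self, arm, reward):
        self.history.append((arm, reward))

    def count(self, arm):
        return sum(1 for a, _ in self.history if a == arm)

    def mean(self, arm):
        rewards = [r for a, r in self.history if a == arm]
        return sum(rewards) / len(rewards) if rewards else 0.0

def ucb_select(stats, num_arms, bonus_scale=1.0):
    """UCB-style choice on top of either statistic (generic bonus, assumption)."""
    counts = [stats.count(a) for a in range(num_arms)]
    for arm, c in enumerate(counts):
        if c == 0:
            return arm  # arm has no remaining evidence: explore it again
    log_total = math.log(max(sum(counts), 1.0))
    return max(range(num_arms),
               key=lambda a: stats.mean(a) + bonus_scale * math.sqrt(log_total / counts[a]))

# Beam 0 is good for 50 steps and then becomes blocked (toy channel change).
disc, win = DiscountedStats(2), SlidingWindowStats(window=20)
for step in range(100):
    reward = 1.0 if step < 50 else 0.0
    disc.update(0, reward)
    win.update(0, reward)
```

A stationary average would still report a mean reward of 0.5 for beam 0 after this sequence, whereas both statistics above have largely forgotten the earlier good phase.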
To avoid misalignment due to UE mobility, adaptive Thompson-sampling-based MAB was proposed for beam tracking in [79]. For higher beam tracking accuracy, it was proposed to use the ACK/NACK feedback received between two consecutive SSB transmissions. This helps in selecting the best arm, i.e., the best beam and the best modulation and coding scheme, in between SSB transmissions to maximize the data rate. The a priori reward distribution for each beam is obtained during the initial beam establishment procedure and is then updated to the a posteriori reward distribution based on the ACK/NACK packets received from the UE in response to an arm selection. Further, a forgetting and a boosting factor were introduced to deal with the non-stationarity of the environment. The forgetting factor discounts the past information, while the boosting factor assigns more weight to recent observations. Intuitively, updating the beam in between the SSB scans enhances the beam tracking accuracy. Further, experimental results showed that the proposed approach offers higher throughput than a static oracle that updates the beam only after each SSB scan.
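A Beta-Bernoulli Thompson sampler driven by ACK/NACK feedback can be sketched as follows. The exact update rules of [79] are not reproduced here: the way the forgetting factor shrinks old evidence toward the uniform prior, the way the boosting factor scales a new observation, the per-beam application of forgetting, and the name `ThompsonBeamTracker` are hypothetical choices for illustration.

```python
import random

class ThompsonBeamTracker:
    """Thompson sampling over beams with ACK/NACK rewards (illustrative sketch)."""

    def __init__(self, num_beams, forgetting=0.9, boost=1.0, seed=0):
        self.forgetting = forgetting  # assumed: fraction of old evidence that is kept
        self.boost = boost            # assumed: weight of the newest ACK/NACK
        self.acks = [1.0] * num_beams   # Beta parameter for ACKs, uniform prior
        self.nacks = [1.0] * num_beams  # Beta parameter for NACKs, uniform prior
        self.rng = random.Random(seed)

    def select_beam(self):
        """Sample a success probability per beam and pick the largest sample."""
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.acks, self.nacks)]
        return max(range(len(samples)), key=lambda i: samples[i])

    def feedback(self, beam, ack):
        """Discount the beam's old evidence, then add the new observation."""
        self.acks[beam] = 1.0 + self.forgetting * (self.acks[beam] - 1.0)
        self.nacks[beam] = 1.0 + self.forgetting * (self.nacks[beam] - 1.0)
        if ack:
            self.acks[beam] += self.boost
        else:
            self.nacks[beam] += self.boost

    def posterior_mean(self, beam):
        return self.acks[beam] / (self.acks[beam] + self.nacks[beam])

tracker = ThompsonBeamTracker(num_beams=4)
for _ in range(20):
    tracker.feedback(2, ack=True)   # beam 2 keeps delivering packets
    tracker.feedback(0, ack=False)  # beam 0 is blocked
chosen = tracker.select_beam()
```

Because of the forgetting factor, the accumulated evidence per beam stays bounded, so a beam that was good in the past cannot dominate the selection indefinitely once the channel has changed.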
The beam tracking approach in [79] was further explored by Sarkar et al. in [80]. To better track mmWave channel variations, the impact of the forgetting and boosting factor was studied on beam tracking performance. It was concluded that adaptive Thompson sampling based MAB is highly sensitive to the forgetting factor value that depends on the UE mobility. To overcome this issue, an LSTM network was used to predict the optimal values of the forgetting factor. Simulation results based on the Lumos5G data set [128] indicate that by utilizing an adaptive forgetting factor the performance of adaptive Thompson-sampling-based MAB can be enhanced in terms of throughput and outage probability.

D. FEDERATED LEARNING
All of the surveyed works in Sections III-A to III-C are based on traditional centralized ML techniques and require the training data to be collected in a centralized data center by a central controller. However, due to privacy concerns and limited communication resources, it is often undesirable for the participating devices to transmit data to the centralized data center. These reasons have led to a growing interest in FL, which enables the participating devices to train an ML model without sharing the raw data [32].
Though FL has already been studied in various aspects of wireless communications [129], [130], [131], [132], the authors in [81] were the first to exploit it for BM in ultra-dense mmWave networks. The high density of smaller cells in such a network makes conventional BM methods highly complex and inefficient. To ensure reliable connectivity and to achieve high data rates, a DQN-based user-centric association scheme was proposed in [133]. In contrast, the systematic BM approach in [81] uses an FL framework to avoid the need for centralized data collection and to ensure data privacy. In this approach, a double DQN at each mmWave small cell trains a local BM model on the cleaned data set and then shares the trained model features with the macro BS for aggregation. Due to data cleaning and model-based aggregation, the proposed approach ensures privacy protection while conserving wireless resources. Simulation results showed that, in comparison to traditional approaches, the FL-based approach provides a better tradeoff between computational complexity and network throughput.
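The aggregation step of FL can be illustrated with a minimal sketch of sample-weighted federated averaging. Whether [81] uses exactly this rule is not stated here, so the weighting by local data set size, the flat parameter lists, and the name `federated_average` are assumptions.

```python
def federated_average(local_weights, num_samples):
    """Sample-weighted average of the local model parameters (FedAvg-style).

    local_weights: one flat parameter list per small cell.
    num_samples:   number of local training samples per small cell.
    """
    total = sum(num_samples)
    dim = len(local_weights[0])
    return [sum(w[i] * n for w, n in zip(local_weights, num_samples)) / total
            for i in range(dim)]

# Two hypothetical small cells share only model parameters, never raw measurements.
cell_a = [1.0, 2.0]
cell_b = [3.0, 4.0]
global_model = federated_average([cell_a, cell_b], num_samples=[1, 3])
```

Only the parameter vectors cross the backhaul in this scheme, which is what saves wireless resources compared with uploading the raw beam measurements.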

IV. EVALUATION OF MACHINE-LEARNING-BASED MILLIMETER WAVE AND TERAHERTZ BEAM MANAGEMENT STUDIES
In general, SL-based mmWave/THz BM techniques, either side-information-dependent or not, rely on extensive offline training requiring a large number of training samples. However, collecting such training data sets is often complicated and costly. Furthermore, depending on the environmental changes, these samples need to be renewed and the underlying SL algorithm needs to be retrained on the updated data set. In addition, if both the BS and the UE are equipped with multiple panels and have narrower beams (higher mmWave and THz bands), then the offline learning process needs to be completed on all the BS and UE panels for all the beams. This makes the learning process quite complex and time consuming.
Side-information-assisted mmWave/THz BM techniques mostly rely on SL techniques and thus share their drawbacks. These techniques maintain a database of location bins and associated beam IDs or a database of low-frequency CSI to mmWave channel mapping. However, maintaining such a database for all possible locations and CSI may not be possible and the database needs to be updated continuously, which in the case of SL implies retraining the algorithm to capture the environmental changes. The utilization of other sensory information for BM purposes, on the other hand, requires a large amount of visual data collection. Moreover, the transmission of the collected data to the BS, e.g., from drones to the BS in [62], incurs an additional overhead. Finally, being fully dependent on side information, these techniques suffer from inaccuracy in the side information, e.g., errors in the sensory information or in the estimated sub-7 GHz CSI.
Existing RL-based approaches for BM and tracking rely on simple RL algorithms, i.e., Q-learning and MAB. Due to the simplicity of these algorithms, it is difficult for them to identify useful patterns and to make complex decisions. Hence, more efficient and intelligent RL algorithms need to be explored for BM and tracking purposes.
For further evaluation, the data generation tools and the performance metrics used in the surveyed ML-based BM works are summarized in Table 1A and Table 1B. It can be observed that the surveyed works do not use a common performance measure to evaluate their proposed solutions, which complicates the performance comparison of these approaches. Recently, 3GPP has agreed to use beam prediction accuracy (%) as a key performance indicator to evaluate the performance of ML-based BM solutions [134]. This standardization helps in the fair comparison of newer ML-based BM solutions. Furthermore, Table 2 provides a summary of the key characteristics of existing mmWave and THz BM techniques discussed in Section II, while Table 3 lists some parameters used in Table 2. Table 2 indicates that ML-based BM solutions provide a significant reduction in beam sweeping overhead. SL solutions offer high beam prediction accuracy as they go through extensive offline training for a fixed environment. RL solutions, on the other hand, offer lower beam prediction accuracy as they have to explore several beams during online training. However, the online learning capability of RL-based BM solutions makes them more generalizable and able to adapt to the environment. Furthermore, because they alleviate the need for centralized data collection and address data privacy concerns, federated-reinforcement-learning-based BM solutions are more promising for wireless communications.
Figure 8 compares the beam measurement overhead of the surveyed works against the traditional exhaustive beam sweep. To clearly visualize their impact on the beam measurement overhead, the number of BS beams ($M$) is varied from 16 to 128 while the number of UE beams ($N$) is fixed to 16. For a fair comparison, two scenarios are considered. In the first scenario, shown in Fig. 8a, the number of wider beams at the BS is kept fixed to $W_M = 8$ for the multi-codebook-based beam management approaches [39], [44], [45], [71]. As a consequence, the resolution of the wider beams stays fixed and the initial beam can be established with the same latency irrespective of the number of BS beams. However, increasing the number of BS beams results in finer narrow beams, i.e., more narrow beams per wide beam ($M/W_M$). Furthermore, the subset of candidate beams for [40], [41], [43], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [68], [69], [73], [74], [75], [76], [77], [79], [80], [81] at the BS is selected as $S_M = M/8$. Consequently, the subset of candidate beams grows with the number of beams at the BS, which ensures that the beam prediction accuracy stays similar even as the number of BS beams increases. In the second scenario, shown in Fig. 8b, the number of narrower beams per wide beam is kept fixed to 4 for [39], [44], [45], [71]. This results in a higher number of wider beams ($W_M = M/4$) as the number of BS beams increases. Furthermore, the subset of candidate beams for [40], [41], [43], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [68], [69], [73], [74], [75], [76], [77], [79], [80], [81] at the BS is selected as $S_M = 8$, which means that the ML model always predicts the eight best beams irrespective of the number of BS beams. Fig. 8b shows that this simplification results in a reduced beam measurement overhead. However, it is worth mentioning that fixing the subset of predicted candidate beams while increasing the number of BS beams has negative implications for the beam prediction accuracy.
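The two scenarios can be made concrete with a small arithmetic sketch, where $M$ and $N$ denote the number of BS and UE beams, $W_M$ the number of wide BS beams, and $S_M$ the size of the predicted candidate subset. The counting rules below are assumptions made for illustration and are not taken from Fig. 8: the exhaustive sweep is counted as $M \cdot N$ measurements, a two-stage multi-resolution search as $(W_M + M/W_M) \cdot N$, and a sweep over a predicted subset as $S_M \cdot N$.

```python
def exhaustive_overhead(m, n):
    """Every BS beam is measured with every UE beam."""
    return m * n

def two_stage_overhead(m, n, wide_beams):
    """Assumed rule: sweep the wide beams, then the narrow beams inside the best one."""
    assert m % wide_beams == 0, "narrow beams must divide evenly among wide beams"
    return (wide_beams + m // wide_beams) * n

def subset_overhead(subset_size, n):
    """Assumed rule: sweep only the predicted candidate beams."""
    return subset_size * n

N = 16
for M in (16, 32, 64, 128):
    row = {
        "EBS": exhaustive_overhead(M, N),
        "a: W_M=8": two_stage_overhead(M, N, wide_beams=8),
        "a: S_M=M/8": subset_overhead(M // 8, N),
        "b: W_M=M/4": two_stage_overhead(M, N, wide_beams=M // 4),
        "b: S_M=8": subset_overhead(8, N),
    }
    print(M, row)
```

Under these assumed rules, the overhead of the fixed subset $S_M = 8$ does not grow with $M$ at all, which is consistent with the remark that the saving in the second scenario is paid for with prediction accuracy.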

V. EXISTING CHALLENGES AND RECOMMENDATIONS
Although ML techniques offer a significant performance gain over EBS, there are still several open challenges. In this section, we highlight these problems and suggest our recommendations for their resolution.

A. LACK OF STANDARDIZATION
After successful initial beam establishment, the main goal of BM is to ensure link stability by tracking the UE, which makes its performance dependent on the channel variations due to UE mobility. Therefore, to compare the performance of ML-based mmWave and THz BM solutions, it is crucial to have standardized mobility and rotation models for UEs. However, in the surveyed works, training and testing data sets are collected through different channel models, as shown in Table 1A and Table 1B, which makes it difficult to compare their performance. This calls for the standardized generation of training and testing data sets with standardized channel models and UE mobility and rotation patterns. Furthermore, ML functionality can be enabled on a single side (BS or UE) or on both sides (UE and BS). In the latter case, joint ML functionality is required. However, such functionality, along with the required signaling between BS and UE, still needs to be standardized. In addition, standardization bodies should also consider the interaction of ML models executed by different providers.
Recommendations: To resolve this problem, some inspiration can be taken from the ML community, which has established standardized data sets that are used as benchmarks. For example, MNIST is a well-known database for image processing [135], which is used for training and cross validation of ML-based image processing techniques. Motivated by such databases, wireless researchers can create their own training and validation data sets covering a wide range of scenarios. Furthermore, researchers should also reach a consensus on standardized UE mobility and rotation patterns for the evaluation of BM techniques. Recently, 3GPP started its discussions for Release 18 and included AI- and ML-based mmWave BM as a study item for standardization [136].

B. PARAMETER SPECIFICATION FOR DATA SET CONSTRUCTION
The performance of an ML model depends strongly on the data set used for training, validation, and testing. However, data set construction requires the specification of several simulation parameters and values, which makes the data set specific, so that the ML model cannot generalize well over these parameters. This indicates that a large number of ML models are needed for different scenarios. These models are then switched as part of the model life cycle management procedure [137]. Consequently, a large number of ML models requires more memory and power and may increase the model switching overhead, which adds another challenge for the BM framework. Furthermore, the data set distribution for training, validation, and testing needs to be identified to make a fair comparison between different ML models [137].
Recommendations: For a fair comparison, specifying the data set construction and distribution used for training, validation, and testing of the ML model may help. Furthermore, to reduce memory, power consumption, and model switching overhead, simulation parameters may be subdivided into different categories as proposed in [137].

C. MULTI-AGENT COOPERATION AND DATA PRIVACY
To accelerate the learning of a single-agent centralized ML model, in a multi-agent environment several agents train the model by transmitting the raw data to a centralized data center or a cloud. The collected data is then used to train a generalizable ML model for inference purposes. However, in a wireless communication environment, transmitting such a large amount of raw data is highly resource inefficient and raises severe privacy concerns [138]. Most of the works surveyed above are based on centralized ML models and thus suffer from these issues.
Recommendations: The limitations of centralized learning motivate the development of distributed learning frameworks that allow multiple agents to use individually collected data to train a learning model locally. One of the most promising distributed learning frameworks is FL, which shares only the learned local model parameters with a centralized model aggregation server. In the context of BM, multi-agent FL is a promising way to realize an efficient and secure multi-BS BM framework where several BSs may cooperate with each other to enable delay-intolerant handovers or to reduce inter-BS interference. To the best of our knowledge, [81] is the only contribution where FL has been exploited for BM. However, considering its potential, more research effort must be devoted to utilizing multi-agent cooperative FL for an enhanced BM framework that strengthens the privacy protection of users. Furthermore, the tradeoff between resource efficiency and the frequency of sharing local model parameters needs to be studied in detail.

D. ENVIRONMENTAL ADAPTABILITY
One of the major challenges in any wireless communication system is to adapt to environmental changes caused by UE mobility and/or mobility in the surroundings. Most of the experimental results of BM and tracking techniques are obtained from a predefined environmental setup. Such results do not guarantee that a model learned in a given environment will also perform well in another environment. This calls for more generic ML models for BM which are less sensitive to environmental changes. However, as implied by the no free lunch theorem of optimization, making a model too generic may lead to performance degradation [139]. Thus, it is still an open question which option results in better performance: a more generic ML model for various scenarios or multiple ML models for different scenarios.
Recommendations: A possible solution to this problem is to utilize both offline and online learning approaches. For example, the site-specific codebook in [39] can first be designed offline based on a simulation setup and can then adapt online to learn environmental changes. The idea of combining SL and RL is not new in the literature [140], [141], [142]. However, it has not yet been explored well for BM and tracking purposes. Recently, there has been a growing interest in meta-learning, also known as learning-to-learn, where an ML algorithm learns from its experience to improve its future performance [97], [143], [144]. Using meta-learning for BM, as shown in Fig. 9, can lead to significant performance gains, particularly in a changing environment, where the ML model can accumulate its learnings and use them to adapt to the changing environment.
FIGURE 9. Meta-learning for beam management that benefits from offline and online learnings along with optional side information (location and low-frequency CSI).

E. HIGH MOBILITY
6G is expected to operate at THz frequencies, which requires a larger number of smaller cells to benefit from beamforming gains. With reduced cell size, a UE with even moderate mobility will experience more frequent handovers to maintain link connectivity. With the current BM framework, this can lead to beam misalignment, particularly for UEs with high mobility, e.g., UEs in high-speed trains or on highways.
Recommendations: A solution to this problem lies in the utilization of historical data and contextual information. For example, a UE traveling to or from work usually follows a predefined route either on highways or on high-speed trains. Thus, historical data of UE movement can be utilized to significantly reduce the BM overhead. Furthermore, highways and high-speed train tracks usually have a fixed infrastructure around them. An ML algorithm can leverage this contextual information to predict the optimal beam sequence. In addition to the utilization of historical and contextual information, meta-learning can also be used to deal with the high mobility scenario. With meta-learning, one ML algorithm can be trained offline for a given infrastructure around highways and high-speed train tracks, while a second ML algorithm can support online learning to adapt to a high mobility scenario.
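As a rough illustration of how historical data could shrink the beam search space, the sketch below counts beam-to-beam transitions observed on earlier traversals of a route and proposes the most frequent successors of the current beam as candidates. It is a deliberately simple first-order frequency model, not a method from the surveyed works; the function names, the beam indices, and the fallback to a full sweep are all assumptions.

```python
from collections import Counter, defaultdict

def fit_beam_transitions(beam_histories):
    """Count beam -> next-beam transitions over historical traversals of a route."""
    transitions = defaultdict(Counter)
    for history in beam_histories:
        for current, nxt in zip(history, history[1:]):
            transitions[current][nxt] += 1
    return transitions

def candidate_beams(transitions, current_beam, k=2):
    """Return up to k most frequent next beams.

    An empty list means the beam was never seen, so a full sweep would be needed.
    """
    counts = transitions.get(current_beam, Counter())
    return [beam for beam, _ in counts.most_common(k)]

# Hypothetical serving-beam sequences recorded on three earlier trips along one route.
histories = [[3, 4, 5, 6], [3, 4, 5, 7], [3, 4, 5, 6]]
model = fit_beam_transitions(histories)
candidates = candidate_beams(model, current_beam=5)
print(candidates)
```

Only the returned candidates would be measured, so the sweep shrinks from the full codebook to at most `k` beams whenever the UE follows a previously observed route.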
More recently, the use of AI transformers [98] has gained significant interest in the fields of natural language processing and computer vision. Similar to an RNN, a transformer model is capable of processing sequential information and learning long-term dependencies in the input data. However, in contrast to an RNN, it can process the entire input at once, resulting in an increased training speed. Another distinguishing aspect of transformers is their ability to quickly adapt to tasks they have not been trained on, i.e., transfer learning [145]. Owing to these promising capabilities of learning long-term dependencies and transfer learning, transformers can be leveraged to deal with the high mobility scenario of BM. However, in addition to studying meta-learning and transformers for mmWave and THz BM, the complexity of these techniques must also be weighed against the performance gain.

F. MULTI-PANEL BS AND MULTI-PANEL UE
The advantages of multiple panels at BSs and UEs are twofold. First, multi-antenna diversity is one of the well-known techniques to deal with channel fading due to blockage or mobility [100]. Second, multiple antennas enable interference cancellation through directivity while maintaining the diversity order [101]. However, equipping BSs and UEs with multiple panels brings a new challenge to the BM process. With the existing BM framework, EBS needs to be performed over all BS and UE panels, which leads to significant latency. Furthermore, a BS and a UE with multiple antenna panels need to search all possible beams over all antenna combinations, which further increases the complexity. Once again, this indicates that the existing BM framework is not suitable for multi-panel BSs and UEs. In all of the surveyed works, researchers consider only single-panel BSs and UEs to simplify the problem. However, to ensure link stability at mmWave and THz bands, it is necessary to equip the BSs and UEs with multi-panel antennas.
Recommendations: To resolve this problem, a BM framework that does not rely on EBS is more attractive. A possible solution is to utilize the relative position of antenna panels for the BM process. Furthermore, an ML model that can learn the mapping between multiple antenna panels can be employed at BSs and UEs. However, considering the power constraint at the UE side, the feasibility of such a solution needs to be studied in detail. Moreover, multi-agent cooperation, where multiple agents interact in a shared environment to achieve conflicting or common goals, can be leveraged for BM with multiple panels.

G. SINR MAXIMIZATION
Narrower beams and small cells at mmWave/THz frequencies make inter-cell interference more erratic. In particular, a beam from a neighboring cell that points toward a local UE leads to severe interference, which further increases for cell-edge users. Due to this severe interference, a cell-edge UE may suffer from unnecessary handovers. Thus, a BM process must be able to identify and avoid such interference to ensure stable link connectivity for cell-edge users.
Recommendations: A promising solution to the interference problem is to have a central node organize the BM framework at the BSs. However, this may require information sharing between the BSs. An alternative to this centralized approach is a distributed approach, where information sharing is enabled only between neighboring BSs. ML tools such as FL and LSTM can then be utilized in the distributed approach to minimize the amount of information sharing: they can predict possible interference occurrences based on the UE mobility pattern so that measures can be taken to avoid or cancel this interference.

H. BEAM MANAGEMENT AT TERAHERTZ BANDS
To benefit from more bandwidth, 6G is expected to operate at THz bands. However, to compensate for the propagation loss at higher frequencies, it is necessary to have even narrower beams to benefit more from beamforming gains. Thus, moving to higher frequencies leads to larger codebooks, which further complicates the BM and tracking process. One can intuitively see that the current EBS-based BM framework will be infeasible at higher frequencies, as its overhead increases quadratically with the number of beams, as shown in Fig. 8. Furthermore, the utilization of wider bandwidth at THz bands may give rise to the beam split effect [11], resulting in an array gain loss.
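The quadratic growth can be made concrete with a small calculation. The sketch below assumes that an exhaustive search takes one measurement per Tx-Rx beam pair and that the BS and the UE use codebooks of equal size; the function name `ebs_measurements` and the codebook sizes are illustrative assumptions rather than values from Fig. 8.

```python
def ebs_measurements(num_bs_beams, num_ue_beams):
    """Measurements for an exhaustive sweep, assuming one per Tx-Rx beam pair."""
    return num_bs_beams * num_ue_beams

# Equal codebook sizes at both ends: the overhead grows with the square of the size.
overhead = {n: ebs_measurements(n, n) for n in (16, 64, 256)}
for n, count in overhead.items():
    print(f"{n} beams per side -> {count} measurements")
```

Under these assumptions, each fourfold increase in codebook size multiplies the number of measurements by sixteen.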
Recommendations: A solution to this high overhead problem is to limit the beam search space through environmental perception, which can be achieved via integrated sensing and communication (ISAC). In general, the idea of ISAC is to jointly perform communication and sensing of the environment by collecting data from different distributed platforms (cars, drones, smart watches, mobile handsets, etc.) as shown in Fig. 10 [146], [147]. Furthermore, any other side information (UE location and low-frequency CSI) can also be included for environmental sensing. An ML model can benefit from ISAC to narrow down the beam search space based on its environmental perception. However, the use of centralized ML models, where data collected from various platforms need to be transmitted to a central node, will not only cause additional data transmission overhead but will also raise severe privacy and security concerns, motivating the need for distributed learning, i.e., FL. The benefit of FL can be seen in two major aspects. First, FL ensures data privacy by never sharing the raw collected data. Second, the quality of the collected data can be ensured by testing the local models prior to aggregation [148]. To address the issue of array gain reduction due to the beam split effect, traditional approaches control the angular coverage of frequency-dependent beams to track multiple users in each time slot, resulting in reduced beam training overhead [11]. However, the overhead may still be quite significant for a highly mobile UE. To deal with this problem, ML tools such as meta-learning and AI transformers can be leveraged, as discussed previously in Section V-E.

I. UTILIZATION OF SIDE INFORMATION
Most of the side-information-assisted BM and tracking techniques surveyed in this work depend solely on the availability of side information. However, if such information is unavailable, EBS may need to be performed. Furthermore, any inaccuracy in the side information (location, low-frequency CSI) may lead to misalignment.
Recommendations: To resolve this problem, ML solutions that are less sensitive to the accuracy of side information are desired. For example, RL solutions can leverage online learning to reduce the beam search space in the first step. In the second step they can use this side information to identify the best beam pair or to further reduce the beam search space. Thus, hybrid solutions as shown in Fig. 9 are needed.

J. PERFORMANCE VS COMPLEXITY TRADEOFF
Another motivation for a better BM process is complexity. The ML approaches discussed above lead to better performance in terms of the number of beams to sweep. However, the computational complexity and the response time of such approaches have not been studied in detail.
Recommendations: Due to the parallel computing capabilities of a graphics processing unit, it has been reported that trained NNs can make predictions in milliseconds [149]. However, such an evaluation is not available for ML algorithms in BM. Researchers therefore need to evaluate the performance vs complexity tradeoff of the underlying ML-based BM framework.

VI. CONCLUSION
The use of AI and ML techniques has gained significant attention for BM at mmWave and THz bands. Due to their capability of extracting and tracking nonlinear environmental characteristics, several AI- and ML-based BM solutions have been proposed in the literature. This article summarizes the key contributions of these solutions and compares them against the key characteristics of an ideal BM framework. Though existing AI- and ML-based BM solutions promise enhancements in terms of reduced beam measurement overhead and better beam tracking capability, there are still several open challenges. First and foremost, the lack of standardization for UE mobility and rotation patterns, as well as the unknown data set distribution for training, validation, and testing, makes the comparison of these techniques difficult. Thus, there is a strong need for standardization activities for an enhanced AI- and ML-based BM framework. Furthermore, for accelerated initial beam establishment and for reduced beam measurement overhead, existing solutions rely on SL methods, which do not generalize well and may suffer severe performance degradation under environmental changes. In this regard, RL methods are more promising due to their online learning capability for better tracking of environmental fluctuations. However, most of the existing RL-based BM solutions rely on simple RL methods and may suffer from slow convergence, resulting in slower initial beam establishment. A possible solution for fast convergence is to use a pre-trained ML model that can be further fine-tuned online by RL methods. Moreover, for faster convergence and for privacy protection, multi-agent federated RL can be used. Thus, ML-based BM solutions that combine the benefits of SL, RL, and FL are more suitable. Furthermore, any kind of contextual or side information can also be utilized to enhance environmental adaptability and high mobility support.
Similar to the conventional BM framework, existing AI- and ML-based BM solutions resort to RSRP measurements for beam identification. This may lead to the selection of suboptimal beams because other measurements such as SINR, EVM, or BLER are not considered. Thus, the consideration of additional performance measures is crucial for the AI- and ML-based mmWave/THz BM framework. Another important aspect is the complexity vs accuracy tradeoff. The works surveyed in this article indicate a significant reduction in the beam measurement overhead, but the computational complexity and the response time of these techniques need to be studied in detail. Furthermore, AI- and ML-based BM techniques that reduce the beam sweeping overhead by predicting a smaller subset of candidate beams may suffer from reduced beam prediction accuracy. Thus, for AI- and ML-based BM solutions, it is necessary to investigate the beam sweeping overhead vs beam prediction accuracy tradeoff.
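One simple way to quantify the beam sweeping overhead vs beam prediction accuracy tradeoff is to evaluate top-k accuracy for different candidate subset sizes k. The following sketch is an assumed evaluation setup rather than a procedure from the surveyed works: `scores` stands for a hypothetical model's per-beam scores, and the overhead is expressed as the fraction of the codebook that still has to be swept.

```python
import numpy as np

def topk_tradeoff(scores, best_beam, k):
    """Return (top-k accuracy, sweeping overhead fraction) for a given k.

    scores:    (num_samples, num_beams) predicted per-beam scores (hypothetical model output)
    best_beam: (num_samples,) index of the true best beam for each sample
    """
    topk = np.argsort(scores, axis=1)[:, -k:]          # k highest-scoring beams per sample
    hits = (topk == best_beam[:, None]).any(axis=1)    # is the true best beam a candidate?
    return hits.mean(), k / scores.shape[1]

# Toy scores for four samples and a four-beam codebook.
scores = np.array([[0.1, 0.6, 0.2, 0.1],
                   [0.4, 0.1, 0.3, 0.2],
                   [0.2, 0.2, 0.1, 0.5],
                   [0.3, 0.4, 0.2, 0.1]])
best_beam = np.array([1, 2, 3, 0])
acc1, ovh1 = topk_tradeoff(scores, best_beam, k=1)
acc2, ovh2 = topk_tradeoff(scores, best_beam, k=2)
print(acc1, ovh1, acc2, ovh2)
```

In this toy data, sweeping one predicted beam finds the best beam for half of the samples, while sweeping two finds it for all of them at twice the overhead.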
To conclude, the existing BM framework can significantly benefit from AI and ML by learning dynamic environments and by adapting to unpredictable challenges in an intelligent and automated fashion. However, there are several open challenges that hinder the mature application of ML in BM at mmWave and THz bands. In addition to surveying the existing literature and identifying the open research challenges, in this work we also provide our recommendations that can help to mitigate these challenges. Our contribution in the form of this survey can serve as a guideline for researchers to develop a better AI- and ML-based BM framework that fulfills the requirements of next-generation wireless communication networks.
After more than one year at the Barkhausen Institut, Dresden, Germany, where he studied rateless codes in the context of multi-connectivity, he is currently the Research Group Leader with the Vodafone Chair and focuses on resilience of wireless communications systems. His research interests include flow-level modeling and the application of queuing theory on communications systems with respect to ultra-reliable low-latency communications.
GERHARD FETTWEIS (Fellow, IEEE) received the Ph.D. degree under the supervision of H. Meyr from RWTH Aachen University, in 1990. After postdoctoral work at IBM Research, San Jose, CA, USA, he joined TCSI, Berkeley, USA. Since 1994, he has been the Vodafone Chair Professor with Technische Universität Dresden. Since 2018, he has also headed the new Barkhausen Institute. In 2019, he was elected into the DFG Senate (German Research Foundation). He researches wireless transmission and chip design, coordinates 5G Lab Germany, has spun out 17 tech and three non-tech startups. He is a member of the German Academy of Sciences (Leopoldina) and the German Academy of Engineering (acatech).