Airborne Integrated Access and Backhaul Systems: Learning-Aided Modeling and Optimization

The deployment of millimeter-wave (mmWave) 5G New Radio (NR) networks is hampered by the properties of the mmWave band, such as severe signal attenuation and dynamic link blockage, which together limit the cell range. To provide a cost-efficient and flexible solution for network densification, 3GPP has recently proposed integrated access and backhaul (IAB) technology. As an alternative approach to terrestrial deployments, the utilization of unmanned aerial vehicles (UAVs) as IAB-nodes may provide additional flexibility for topology configuration. The aims of this study are to (i) propose efficient optimization methods for airborne and conventional IAB systems and (ii) numerically quantify and compare their optimized performance. First, by assuming fixed locations of IAB-nodes, we formulate and solve the joint path selection and resource allocation problem as a network flow problem. Then, to better benefit from the utilization of UAVs, we relax this constraint for the airborne IAB system. To efficiently optimize the performance for this case, we propose to leverage a deep reinforcement learning (DRL) method for specifying airborne IAB-node locations. Our numerical results show that the capacity gains of airborne IAB systems are notable even in non-optimized conditions but can be improved by up to 30% under joint path selection and resource allocation and, even further, when considering aerial IAB-node locations as an additional optimization criterion.

I. INTRODUCTION
The commercial 5G New Radio (NR) cellular systems operating at sub-6 GHz frequencies are already being deployed across the globe. However, the proliferation of millimeter-wave (mmWave)-based systems that promise to fulfill the 5G NR data rate requirements at the access interface is still in its early stages, with only a few deployments worldwide [1]. Among the reasons for the slower adoption of mmWave 5G NR systems are severe signal attenuation, constrained base station output power, and possible dynamic link blockage [2], [3]. Propagation and blockage phenomena not only drastically limit cell coverage but also negatively affect service reliability [4], [5]. Although the recently proposed techniques such as bandwidth reservation [6] and 3GPP inter- and intra-band multi-connectivity options [7], [8] promise to partially alleviate these challenges, mmWave-based systems inherently require dense deployments of 5G NR base stations. The latter may be infeasible due to extreme capital expenditures and unavailability of fiber connections to ensure wired backhauling.
Nikita Tafintsev, Dmitri Moltchanov, and Mikko Valkama are with Tampere University, 33100 Tampere, Finland (e-mail: nikita.tafintsev@tuni.fi; dmitri.moltchanov@tuni.fi; mikko.valkama@tuni.fi).
Digital Object Identifier 10.1109/TVT.2023.3293171
To address this challenge, the integrated access and backhaul (IAB) technology recently proposed by 3GPP promises to deliver a flexible solution for network densification and thus decrease the capital and operational expenditures associated with mmWave 5G NR deployments [9]. This is achieved by employing wireless backhauling to 5G NR base stations without fiber connectivity, named IAB-nodes. Performance improvements for both coverage extension and capacity boosting, as well as cost reductions brought by the use of IAB technology, have been demonstrated in several recent studies [10], [11], [12]. Various cellular operators have already indicated interest in implementing IAB systems within their 5G networks [13]. IAB is expected to be employed in up to 10-20% of 5G sites [14].
The application of IAB technology in dense urban areas is, however, challenging [15]. This is because the efficient utilization of mmWave-based IAB-nodes requires line-of-sight (LOS) conditions over the wireless backhaul channel, i.e., between IAB-node and IAB-donor, as well as over the access channel, i.e., between IAB-node and user equipment (UE). From the network operator's perspective, this poses a significant issue in dense urban areas, which are characterized by limited base station mounting locations and high building density. Further, both channels suffer from harsh propagation conditions at mmWave frequencies. At the access interface, human and vehicle blockers may occlude the propagation path between the IAB-node and the UEs, thus resulting in either a data rate drop or a service outage.
The use of unmanned aerial vehicles (UAVs), also known as drones, as IAB-nodes is one of the alternative options for improving mmWave-capable IAB deployments in dense urban areas as suggested in [16], [17]. The main advantages stem from higher deployment altitudes and better potential for dynamic navigation. These functionalities may decisively benefit the LOS conditions for both the access and the backhaul channels of the mmWave system. However, the usage of UAV-mounted IAB-nodes brings along additional challenges to system designers. Particularly, ensuring optimized performance already in terrestrial mmWave-based IAB systems may require mixed-integer non-linear programming (MINLP) formulations, as further discussed in Section IV. An additional degree of freedom related to the less constrained deployment of airborne IAB-nodes complicates the problem further by calling for new efficient low-complexity solutions demanded by network operators.
Fig. 1. IAB-based system topology.
The main goal of this study is to systematically characterize the coverage and capacity gains provided by different IAB deployment options beyond the 3GPP baseline. To this aim, we conduct a detailed performance analysis and optimization campaign by assessing various implementation alternatives for mmWave-based IAB networks. Departing from the conventional terrestrial IAB deployments, we proceed with airborne UAV-based IAB layouts. First, we solve the end-to-end UE throughput optimization problem for static IAB networks, without dynamic relocations of IAB-nodes, and show that this is an MINLP formulation, which is known to have exponential complexity. Second, to solve the location optimization problem for airborne IAB-nodes, we continue by proposing a deep reinforcement learning (DRL)-based method.
The main contributions of this article are as follows.
• We formulate and solve a joint resource allocation and path selection optimization problem for the IAB system with predetermined locations of IAB-nodes by taking into account multi-beam and multi-connectivity capabilities, as well as the half-duplex constraint.
• We further account for the deployment of airborne IAB-nodes and apply the Deep Deterministic Policy Gradient (DDPG), a DRL algorithm that efficiently solves the optimization problem at hand with an additional degree of freedom: the locations of airborne IAB-nodes. Our solution is capable of determining these by simultaneously ensuring efficient resource allocation, appropriate path selection, and low complexity beneficial for system operators.
• To validate our solution methods, we conduct a system-level performance evaluation campaign to compare the alternative IAB network densification strategies under various environmental conditions and system parameters.
The rest of this article is organized as follows. We start with the technological aspects of IAB networks and UAV capabilities in Section II. Then, we introduce our system model in Section III. The joint path selection and resource allocation problem is detailed in Section IV. Further, the DRL algorithm is presented in Section V. Finally, the considered deployment schemes and selected numerical results are provided in Section VI. Conclusions are drawn in Section VII.

A. IAB Technology
IAB was introduced as a study item for 5G NR within the scope of 3GPP Release 15. Currently, the enhancements are being ratified for Release 18.
IAB technology uses 5G NR capabilities not only for access links between base stations and UEs but also for backhaul links between base stations. IAB supports both sub-6 GHz and mmWave spectrum and can operate in standalone (SA) or non-SA (NSA) regimes [9]. In practice, IAB is highly relevant for mmWave, where backhaul links can leverage larger volumes of spectrum and further benefit from massive beamforming. Also, 3GPP considers IAB networks under both in-band and out-of-band operating modes. In the in-band case, access and backhaul links are multiplexed within the same frequency band, whereas in the out-of-band case, access and backhaul links use separate frequency bands. Our main focus here is on the in-band operating mode, since it utilizes the scarce spectrum resources more efficiently.
The in-band operating mode involves the need for multiplexing both the access and the backhaul traffic within the same frequency band, which requires half-duplex operation. Hence, the radio resources need to be divided orthogonally between the access and the backhaul links by using a centralized or decentralized coordination mechanism. As considered in [9], IAB is expected to rely on time-division multiplexing (TDM). A TDM network is configured with a pattern for the time domain allocation of downlink (DL) and uplink (UL) resources, as discussed below.
The IAB architecture is designed to reuse the existing 5G NR functions and interfaces to minimize the impact on the core network. In particular, IAB systems utilize the Central Unit (CU) and Distributed Unit (DU) split architecture, which enables efficient multi-hop support (see Fig. 1). Specifically, each IAB-node has a Mobile Termination (MT), which is used for wireless backhauling toward the IAB-donor, and a DU, which is employed for connectivity with the UEs. The CU at the IAB-donor handles all the control and upper-layer operations, while the time-critical and lower-layer functionalities are located at the IAB-nodes. The IAB-nodes connect to the IAB-donor via the NR F1* interface that serves for backhaul connectivity and topology formation. They can periodically transmit information on the traffic load and channel quality of their backhaul links. This enables the CU to form the topology of an IAB network, which can then be dynamically adapted to maintain service continuity.
Academic research on wireless backhauling has proliferated recently. In [12], the authors investigated the benefits and challenges of the IAB technology. They studied the robustness of IAB systems to blockage, weather conditions, tree foliage, and rain, as well as compared the performance of IAB networks with that of hybrid IAB/fiber-backhauled systems. In [18], the authors studied the effects of deployment optimization on the coverage of IAB networks with constrained IAB-node placement. Spectrum allocation for access and backhaul links in a single-hop IAB setup was addressed in [19]. The authors proposed a DRL-based framework to control real-time spectrum allocation in different scenarios. Notably, that work assumed that IAB-nodes are capable of full-duplex operation.
The authors of [20] developed resource allocation algorithms to maximize weighted sum-rate performance in multi-hop IAB networks. They also proposed an approach to numerically establish optimal IAB-node locations by leveraging the proposed resource allocation solutions. In [21], a semi-centralized resource partitioning scheme for IAB networks was contributed. The authors developed an algorithm based on the maximum weighted matching, the complexity of which is linear in the number of IAB-nodes. The authors of [22] considered backhaul bandwidth partition strategies in mmWave-capable IAB systems. They proposed an analytical framework to investigate various options for bandwidth partitioning.
The challenge of path selection and rate allocation in self-backhauled mmWave networks was addressed in [23]. The authors proposed a multipath scheduling scheme by incorporating latency constraints and traffic splitting techniques. In [24], traffic forwarding strategies for mmWave-based IAB networks were investigated. The authors considered various path selection schemes and compared their performance. The authors of [25] solved the topology formation problem as a graph optimization problem using a combination of DRL and graph embedding. None of these past studies, however, addressed the problem of joint path selection and resource allocation by capturing the features of airborne mmWave-based IAB deployments.
For completeness, we also note that IAB is not the only option aimed at improving the mmWave 5G NR system performance. For example, the concept of cell-free networks has recently been introduced [26]. Such networks, also known as distributed antenna systems, aim to enhance the system performance by eliminating traditional cell boundaries and by jointly processing signals from multiple distributed access points. Both IAB and cell-free concepts offer unique advantages and can be used in different scenarios depending on the specific requirements and constraints of a radio network deployment.

B. UAV Support
Vendors and network operators are increasingly interested in leveraging UAVs across numerous emerging applications [27], [28]. As their attention grows, the standardization community aims to improve the support for UAVs by augmenting their wireless capabilities via cellular systems. In Release 17, 3GPP introduced system architecture enhancements to support efficient integration of airborne networks [29]. These include various services, such as identification, authentication, authorization, and tracking of UAVs. Release 18 is expected to further improve co-existence with terrestrial users.
Going forward, 3GPP does not exclude mobile IAB-nodes in its future releases as per [9]. Mobile relay nodes were already studied in the past by 3GPP Release 12 [30]. As part of initial studies for Release 18, a recent 3GPP technical report introduces several 5G use cases and requirements for mobile relays mounted on vehicles [31]. Also, 3GPP Release 18 work on IAB [32] introduces enhanced operation with mobile IAB-nodes. Even though UAV-based relay nodes may not be a part of the Release 18 discussions, 3GPP is likely to extend the support for UAVs toward new areas and emerging use cases in the forthcoming studies. In our work, we go beyond today's 3GPP perspective and consider IAB-enabled UAV nodes capable of providing radio access to the UEs. The integration of mmWave communications with IAB and UAV-assisted wireless networks inherits multiple challenges. These include mmWave channel propagation characteristics, fast beamforming training and tracking, directionality, and various mmWave blockage effects.
Different UAV networks have been extensively studied recently. The respective research showed that by utilizing UAVs as IAB-nodes, significant improvements can be achieved in the connectivity, coverage, and capacity of wireless systems [33], [34]. Also, moving nodes may enhance the topology flexibility and optimize the number of hops in a radio network [35]. The authors of [36] analyzed the coverage gains of utilizing UAVs as hovering relays in mmWave-capable IAB scenarios. They used a ray-tracing simulation tool and demonstrated an improved coverage as compared to the baseline scenario solely with an IAB-donor. In [37], the authors optimized UE resource allocations, their associations, and UAV-mounted base station placement in an in-band IAB system. They proposed a sub-optimal solution by iteratively solving the UE association problem and the UAV placement problem.
The authors of [38] proposed an optimization framework for deployment and mobility control of a number of UAVs for energy-efficient communications. Further, in [39], they offered a framework to benefit the services delivered to users based on the maximum possible hover time. In [40], a mission-critical scenario was considered, wherein UAV base stations employing IAB technology were deployed to provide coverage for users. The authors proposed to jointly optimize the antenna configurations and the 3D locations of UAVs by utilizing a DRL algorithm. An approach for improving UAV coverage and connectivity by using DRL was proposed in [41]. However, feasible optimization techniques were not investigated for UAV-based IAB networks. Addressing the indicated gap, this work offers a method that can handle complex formulations having multiple airborne IAB-nodes equipped with mmWave capabilities.

III. SYSTEM MODEL
In this section, we introduce our system model by specifying its components. We start with the target scenario and the considered deployment options, including terrestrial and airborne IAB systems. Then, we proceed by introducing the propagation and blockage models, as well as the IAB network procedures. Finally, we present our metrics of interest.

A. Deployment and Connectivity Models
We consider an IAB-based heterogeneous scenario with cellular macro and micro layers as specified by 3GPP in [9]. We assume that the layout has a constant number of active UEs uniformly distributed within the cell, where each UE generates elastic traffic by employing a full-buffer traffic model. In this case, UE data rates vary according to the network conditions and the buffers of packet flows always have unlimited amounts of data to transmit. We consider an orthogonal frequency-division multiple access (OFDMA) system, where the total available bandwidth $B$ is divided across sub-carriers, while the time slot duration is $\Delta$.
Continuous variables $x_{np}$ represent time-frequency resources and realize a part of the demand of UE $n$ over path $p$. Binary variables $u_{np} \in \{0, 1\}$ associated with each of the variables $x_{np}$ indicate if path $p$ is selected. The height of UEs associated with pedestrians is assumed to be $h_U$, while the height of the IAB-donor is $h_D$ and the height of IAB-nodes is $h_N$. To model the antenna array setups for the IAB-donor, IAB-nodes, and UEs, we employ planar uniform rectangular antenna arrays with $N_V \times N_H$ elements by following the evaluation assumptions in [9].
In our study, we consider and compare several IAB deployment options as outlined below.
• Baseline: This is a conventional cellular deployment option for 5G NR. It is characterized by the inter-site distance, $R$, and may be utilized in what follows for benchmarking purposes.
• Terrestrial IAB system: This deployment considers the use of terrestrial IAB-nodes mounted on the typical sites, e.g., building walls or lampposts (see Fig. 2(a)). In addition to the inter-site distance $R$, this layout is characterized by the number of IAB-nodes following the 3GPP guidelines in [9], i.e., IAB-nodes are assumed to be randomly and uniformly distributed over the area.
• Airborne IAB system: Finally, as the airborne deployment option, we consider UAV-mounted IAB-nodes that can be freely positioned in 3D space over the area of interest (see Fig. 2(b)). Similar to the terrestrial IAB deployment, this layout is characterized by the number of airborne IAB-nodes.

B. Propagation Models
To capture mmWave radio propagation, we utilize the 3GPP Urban Macro (UMa) channel model for IAB-donor-to-UE and IAB-donor-to-terrestrial IAB-node interfaces, and the 3GPP Urban Micro (UMi) street canyon channel model for terrestrial IAB-node-to-UE interfaces [3]. Accordingly, the LOS path loss, measured in dB, follows the expressions of [3], in which $h_E$ is the effective environment height. The non-LOS (NLOS) path loss contains the unknown $L^{T}_{N}$, which is estimated separately for the UMa model and, as in (5), for the UMi model.
To characterize the radio propagation with aerial IAB-nodes, we use the 3GPP UMa with aerial vehicles (UMa-AV) channel model for IAB-donor-to-aerial IAB-node interfaces [42], which specifies both the LOS and the NLOS path loss in dB. Similarly, to model the path loss for aerial IAB-node-to-UE interfaces, we employ the 3GPP Rural Macro (RMa) channel model. Its LOS expression depends on the breakpoint distance and includes the terms $-\min(0.044\,h^{1.72}, 14.77) + 0.002\,d_3 \log_{10} h$, where $h$ is the average building height. Finally, the NLOS path loss in the RMa model contains the unknown component $L^{\mathrm{UAV}}_{\mathrm{RMa},N}$, which depends on $W$, the average street width.
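To make the structure of the terrestrial path loss model concrete, the following minimal sketch evaluates the UMa LOS and NLOS path loss. It assumes the standard two-slope expressions of 3GPP TR 38.901 rather than the exact notation of this article; the function names, the argument names, and the default effective environment height of 1 m are illustrative assumptions.

```python
import math

C = 3.0e8  # propagation velocity in m/s, as used in TR 38.901


def uma_los_path_loss_db(d2d_m, h_bs_m, h_ut_m, fc_ghz, h_e_m=1.0):
    """UMa LOS path loss in dB (assumed TR 38.901 two-slope form).

    h_e_m is the effective environment height; 1 m is an assumed default.
    """
    d3d = math.hypot(d2d_m, h_bs_m - h_ut_m)
    # Breakpoint distance computed from the effective antenna heights.
    d_bp = 4.0 * (h_bs_m - h_e_m) * (h_ut_m - h_e_m) * fc_ghz * 1e9 / C
    if d2d_m <= d_bp:
        return 28.0 + 22.0 * math.log10(d3d) + 20.0 * math.log10(fc_ghz)
    return (28.0 + 40.0 * math.log10(d3d) + 20.0 * math.log10(fc_ghz)
            - 9.0 * math.log10(d_bp ** 2 + (h_bs_m - h_ut_m) ** 2))


def uma_nlos_path_loss_db(d2d_m, h_bs_m, h_ut_m, fc_ghz, h_e_m=1.0):
    """UMa NLOS path loss in dB: never below the LOS value at the same point."""
    d3d = math.hypot(d2d_m, h_bs_m - h_ut_m)
    nlos = (13.54 + 39.08 * math.log10(d3d) + 20.0 * math.log10(fc_ghz)
            - 0.6 * (h_ut_m - 1.5))
    return max(uma_los_path_loss_db(d2d_m, h_bs_m, h_ut_m, fc_ghz, h_e_m), nlos)


# Example: donor at 25 m, UE at 1.5 m, 100 m apart, 28 GHz carrier.
los_db = uma_los_path_loss_db(100.0, 25.0, 1.5, 28.0)
nlos_db = uma_nlos_path_loss_db(100.0, 25.0, 1.5, 28.0)
```

The sketch does not enforce the distance and height ranges for which the model is specified, so inputs should be kept within typical urban macro values.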

C. Blockage Models
In our study, we explicitly consider two types of blockage: (i) blockage by large-scale stationary objects such as buildings and (ii) blockage by small-scale dynamic objects such as humans. On the backhaul side, only the former type is relevant. Specifically, for terrestrial IAB-nodes, the LOS probability at the 2D distance $d_2$ is specified separately for the UMa and the UMi models. For the backhaul interfaces at airborne IAB-nodes, we capture possible link blockage by using the UMa-AV model, whose LOS probability is expressed through the variables $p_1$ and $d$. Similarly, a dedicated LOS probability is used for the RMa model.
In contrast to the backhaul links, both types of blockage are relevant at the mmWave access interfaces. In particular, to model the blockage by large-scale stationary objects for terrestrial systems, we utilize (14), as well as (17) for aerial IAB-nodes. The human-body blockage at the access interface is captured by employing the blockage framework from [43], where human bodies are modeled as cylinders. We assume 15 dB of additional blockage-induced impairment. Particularly, the human-body blockage probability depends on $\lambda_p$, the density of pedestrians, on $h_p$ and $r_p$, their height and radius, respectively, on $h_B$, the height of the base station, and on $h_U$, the height of the UE. The overall blockage probability can then be calculated by using $P_{\mathrm{LOS}}$, the LOS probability specified in (14) and (17).
As IAB functionality has been included in the recent release of 3GPP NR specifications, we assess the performance of the considered schemes by following the corresponding evaluation assumptions. Particularly, to achieve compliance with the 3GPP IAB specifications, we employ the channel models given in [3] and [42]. However, the developed framework is not limited to 3GPP propagation and blockage considerations. Therefore, these can be extended appropriately. For example, [44] provides an overview of mmWave propagation parameters and channel models. Further, instead of the 3GPP blockage model, one can utilize ITU-R [45] formulations or a recently proposed model in [46].
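For illustration, the large-scale (building) blockage on terrestrial links can be sketched as below. The code assumes the standard TR 38.901 LOS probability expressions for the UMa and UMi street canyon scenarios; the function names and the default UE height are assumptions, and human-body blockage is deliberately left out.

```python
import math


def p_los_uma(d2d_m, h_ut_m=1.5):
    """UMa LOS probability versus 2D distance (assumed TR 38.901 form)."""
    if d2d_m <= 18.0:
        return 1.0
    # Height-dependent correction; it vanishes for UEs at or below 13 m.
    c = 0.0 if h_ut_m <= 13.0 else ((h_ut_m - 13.0) / 10.0) ** 1.5
    base = 18.0 / d2d_m + math.exp(-d2d_m / 63.0) * (1.0 - 18.0 / d2d_m)
    return base * (1.0 + c * 1.25 * (d2d_m / 100.0) ** 3
                   * math.exp(-d2d_m / 150.0))


def p_los_umi(d2d_m):
    """UMi street canyon LOS probability versus 2D distance."""
    if d2d_m <= 18.0:
        return 1.0
    return 18.0 / d2d_m + math.exp(-d2d_m / 36.0) * (1.0 - 18.0 / d2d_m)


# Example: LOS probabilities at a 2D distance of 100 m.
uma_100 = p_los_uma(100.0)
umi_100 = p_los_umi(100.0)
```

Both probabilities equal one up to 18 m and then decay with distance, with the UMi curve falling faster than the UMa curve.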

D. Network Procedures
In this study, we consider the in-band IAB operating mode, which assumes the utilization of the same frequency band for both backhaul and access links. In such networks, where the DU and MT parts of an IAB-node function over the same spectrum, it is necessary to ensure coordination of time-domain resources between the DU and the MT. The time-domain coordination is essential to avoid a full-duplex mismatch, where the transmissions to be received by the MT suffer severe interference from the DU transmissions, and the DU reception suffers interference from the MT transmissions.
Accordingly, the overall transmission duration may be divided into two time intervals (see Fig. 3), where we assume the following pattern: during the first time interval, IAB-nodes are in the receiving state; in the second time interval, they switch to the transmitting state. We also require that the IAB-donor operates in a multi-beam mode, where multiple simultaneous beams can be created, while each radio of the IAB-nodes supports only one beam. We consider the system in its steady state, where each UE is to follow the procedure specified in [9] to register with the network by establishing a control channel to the IAB-donor or IAB-nodes. Once a connection is established, the association point may be changed upon an explicit request from the CU.

E. Metrics of Interest and Approaches
In our work, we primarily concentrate on user-centric metrics, including the average UE data rate and outage probability. To improve these parameters, we consider two options: (i) conventional optimization and (ii) a DRL-aided approach. The former is capable of jointly optimizing the routing paths and the resource allocations. Particularly, it is applicable when the locations of IAB-nodes are fixed and do not change throughout the IAB network operation. However, the approach itself leads to an MINLP problem, which is known to be NP-hard.
To address the features of airborne IAB deployments, which go beyond the capabilities of classical optimization methods, we then develop a DRL-aided solution that is suitable for the optimization of dynamic IAB systems. Our framework determines airborne IAB-node locations and then optimizes the routing paths and the resource allocations, which permits a practical implementation. We also assume a certain navigation strategy of the aerial IAB-nodes, where they take a 'snapshot' of the current UE channel qualities and then update their positions according to the proposed algorithm as detailed below.

IV. JOINT PATH SELECTION AND RESOURCE ALLOCATION
To formulate our target optimization problem, we consider a single-cell network topology comprising $M$ IAB-nodes with index $m = 1, 2, \ldots, M$, see Fig. 4. Also, we denote by $N$ the number of UEs in the system with index $n = 1, 2, \ldots, N$. The UEs may establish connections with the IAB-nodes or the IAB-donor. To differentiate between these groups of UEs, we consider two subsets, $X_1$ and $X_2$, which consist of UEs connected to IAB-nodes and to the IAB-donor, respectively. Their data rates are expressed in bits per second (bps) and denoted as $H_n$, $n = 1, 2, \ldots, N$. The values of $H_n$ are not known in advance due to the assumption on full-buffer traffic demands.
We further denote by $p$ the candidate paths for UE $n$, i.e., $p = 1, 2, \ldots, P_n + 1$, where $P_n + 1$ is the total number of the available paths. Note that in the considered IAB system, a path may consist of several links. Therefore, there are two types of paths: those containing a link to the IAB-donor and those traversing through the IAB-nodes and comprising two links. To differentiate between these paths, we assume that the last path in the set is always the one to the IAB-donor.
In our optimization problem, there are two types of variables. First, to represent the radio resource allocations, let $x_{np}$ be continuous variables implementing a part of the demand of UE $n$ over path $p$. These terms are measured in Hz·s and represent time-frequency resources. Second, we introduce binary variables $u_{np} \in \{0, 1\}$ associated with each of the variables $x_{np}$. These take the value 1 if the corresponding path is selected and 0 otherwise. The product $x_{np} u_{np}$ leads to an MINLP problem, which is known to be NP-hard [47].
We begin by determining the achievable UE data rates, which ensure that all of the demands are fully realized by using the variables $x_{np}$ and $u_{np}$. The physical meaning of the UE demand may be interpreted as the UE data rate. For the subset of UEs in $X_1$, these are defined in (21), where $\Delta$ is the time slot duration and $s_{np}$ is the spectral efficiency of using path $p$ for UE $n$. The latter is calculated in (22) from $S_{np}$, the signal-to-interference-plus-noise ratio (SINR) of the access link. The data rates for the subset of UEs $X_2$ are defined similarly.
Further, the link constraints in (24) involve the binary variables $u_{np}$ and $k_n$, a predetermined maximum number of paths for UE $n$. Note that the constraint in (24) ensures that only $k_n$ binary variables associated with a given UE $n$ are equal to 1 and, together with the constraint in (21), implies that the paths corresponding to non-zero binary variables carry all the traffic demand. The IAB-donor and each IAB-node have limited time-frequency resources at the radio access interface. This fact is reflected in the capacity constraints in (25), which sum the allocations over $n \in X_1$ with $p = m$ and are bounded through $B$, the total bandwidth. Note that (25) does not incorporate the backhaul limitations. Finally, we introduce the backhaul data rate constraints, where $s_m$ is the spectral efficiency of the backhaul links from the IAB-donor to IAB-nodes that can be calculated similarly to the spectral efficiency in (22).
We further introduce our objective function. An objective function can be formulated for various fairness criteria, e.g., max-min or proportional fairness. To derive a solution for max-min fairness, the objective function maximizes the minimum UE data rate, $\max \min_n H_n$. This problem can alternatively be expressed by maximizing an additional variable $z$ that is a lower bound for each of the data rates, i.e., $H_n \geq z$ for all $n$.
The problem in question can be solved by using exact algorithms such as branch-and-cut or branch-and-bound [48], [49]. Approximation approaches such as simulated annealing or evolutionary algorithms are also applicable [47]. For a solution algorithm, we utilize here the advanced process optimizer (APOPT) algorithm of the GEKKO optimization suite as the internal MINLP solver [50]. APOPT is an active-set sequential quadratic programming solver that employs the branch-and-bound method and the warm-start approach to speed up successive nonlinear programming solutions.
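The combinatorial nature of the path selection can be illustrated with a deliberately simplified sketch. It is not the APOPT-based method described above: it enumerates every single-path selection (i.e., it assumes $k_n = 1$), and it ignores the backhaul constraints, the half-duplex time split, and interference. Under these assumptions, for a fixed selection the max-min rate at a serving point with bandwidth $B$ has the closed form $B / \sum_n (1/s_{n})$. All names in the code are hypothetical.

```python
from itertools import product


def best_max_min_assignment(spectral_eff, bandwidth_hz):
    """Brute-force max-min path selection for a toy access-only instance.

    spectral_eff[n][j] is the spectral efficiency (bits/s/Hz, > 0) of UE n
    on serving point j; by convention here the last index is the IAB-donor.
    Returns (best common rate in bits/s, tuple with one point index per UE).
    """
    n_ues = len(spectral_eff)
    n_points = len(spectral_eff[0])
    best_rate, best_choice = 0.0, None
    # The search space has n_points ** n_ues selections, which is why exact
    # methods such as branch-and-bound are needed for realistic sizes.
    for choice in product(range(n_points), repeat=n_ues):
        inv_sum = [0.0] * n_points
        for n, j in enumerate(choice):
            inv_sum[j] += 1.0 / spectral_eff[n][j]
        # Equalising rates at each point gives z_j = B / sum_n (1 / s_nj);
        # the system-wide minimum rate is the smallest z_j over used points.
        rate = min(bandwidth_hz / s for s in inv_sum if s > 0.0)
        if rate > best_rate:
            best_rate, best_choice = rate, choice
    return best_rate, best_choice


# Two UEs, one IAB-node (index 0) and the IAB-donor (index 1), B = 1 Hz.
rate, choice = best_max_min_assignment([[4.0, 1.0], [1.0, 4.0]], 1.0)
```

In this toy instance, each UE is served by the point that offers it the higher spectral efficiency, since sharing one point would reduce the common rate.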

A. Preliminary Background
The challenge of allocating the time-frequency resources for both fronthaul and backhaul traffic, while at the same time exploiting the spatial diversity enabled by airborne IAB-nodes, is the core of this section. Each IAB-node needs to be able to (i) offer the best possible communication performance to the ground users under its coverage, (ii) relay user traffic to the IAB-donor, and (iii) position itself so that the overall cell capacity is maximized. Aside from the inherent complexity of this problem, the fact that the channels between the IAB-node and the UEs are not known beforehand makes the use of traditional optimization methods challenging, as they typically rely on pre-determined system parameters and strong assumptions about the environment. A solution in which the IAB-donor decides on the IAB-node locations in advance may be ineffective in counteracting the inherent system dynamics [51].
Reinforcement Learning (RL) is a well-known data-driven approach for learning via an exploration-exploitation strategy, which supports adaptive control policies for dynamic systems [52].
The key aspect of RL involves modeling the system at hand as a Markov Decision Process (MDP). The optimal course of action for a learning agent to transition from any state to the best one is then determined by interactions with its environment. This is achieved by maximizing the expected rewards.
In this work, we adopt an actor-critic DRL method that helps airborne IAB-nodes choose where to position themselves to improve the total cell capacity. DRL algorithms incorporate deep learning to solve the task at hand, where a neural network is used to approximate a value or a policy function. They have been used successfully for optimizing resource allocations in dynamic networks [53]. Hence, DRL methods may provide a promising solution when the state space or the action space is too large to be fully known. Their ability to learn from experience allows them to adapt to the changing network conditions and potentially discover novel strategies that might not be considered by a predefined optimization framework. In the following subsection, we detail how the proposed solution operates.

B. Proposed DRL-Based Method
In this section, we present our proposed DRL-based method for airborne IAB-node deployment. The foundation for our scheme is a state-of-the-art actor-critic approach known as Deep Deterministic Policy Gradient (DDPG). Conventional actor-critic solutions leverage two neural networks: (i) actor-network and (ii) critic-network. The actor-network aims to approach the optimal policy given the current and past observations of the actor's states and chooses the most appropriate action to maximize a sum of the expected rewards. The critic-network evaluates the chosen action value. It does so by taking the current state as input and outputting a value estimate. This estimate is then used to update the policy by adjusting the actor's parameters with the aim of improving the policy performance. In this context, the basic DRL technique is deep Q-learning, which employs a Deep Q-Network (DQN) to estimate the Q-value for each state-action pair. However, DQN is designed for limited discrete action spaces. In contrast, DDPG is suitable for continuous action spaces and combines ideas from DQN and Deterministic Policy Gradient (DPG) by enabling experience replay and slow-learning target networks [54].
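The two mechanisms that DDPG borrows from DQN, experience replay and slow-learning target networks, can be sketched in a few lines. The code below is only an illustration under simplifying assumptions: network weights are represented as flat lists of floats, and the class name `ReplayBuffer`, the function `soft_update`, and the step size `tau` are hypothetical names that do not come from the article.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest samples are dropped first

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation of consecutive steps.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def soft_update(target_weights, online_weights, tau):
    """Move each target weight a small step tau towards the online weight."""
    return [(1.0 - tau) * t + tau * w for t, w in zip(target_weights, online_weights)]


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3)
    for step in range(5):
        buf.store(step, 0.0, 1.0, step + 1)
    print(len(buf.buffer), buf.sample(2))

    target, online = [0.0, 0.0], [1.0, -1.0]
    for _ in range(3):
        target = soft_update(target, online, tau=0.1)
    print(target)
```

With a small `tau`, the target weights trail the online weights slowly, which keeps the learning targets stable between updates.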
The objective of the DRL agent is to maximize the aggregate data rate in a cell based on the IAB-node backhaul link quality, its load, and the coverage probability that it experiences at each location. The DRL agent thus observes the states of the IAB network, finds the best action, and deploys it by sending commands to move the airborne IAB-nodes around. Below, we proceed to specify actor-critic networks, states, actions, and rewards utilized by our algorithm.
1) The DRL Agent. States, Actions, Rewards: Each RL solution requires a definition of its states (observable parameters), actions (controllable parameters), and rewards. In this work, the state space at each epoch $t$ consists of three parts and has a cardinality of $3M$. We note that the DRL agent makes its decisions essentially based on the current load and the distance to the IAB-donor.
The action space comprises two parts, both of which assume continuous values: the moving direction and the normalized travel distance of each IAB-node. Finally, the reward at decision epoch $t$ is defined as where the reward function $r_t$ measures the instantaneous performance of all the UEs connected to each IAB-node and the backhaul link quality. Practically, as the UEs connect to their nearest IAB-node, every time an action is taken, two effects occur:
• a new set of UEs connects to the IAB-node;
• the instantaneous data rates of the served UEs are recorded, by also considering the backhaul performance, which is determined by the spectral efficiency of the backhaul link between the IAB-node and the IAB-donor.
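How a continuous action translates into a new IAB-node position can be sketched as follows. The article defines the action as a moving direction and a normalized distance in $[0, 1]$, where 0 means hovering and 1 means moving by the maximum distance $d_{\max}$. The code assumes, as a simplification, that the movement takes place in the horizontal plane at a fixed altitude and that no cell-boundary check is applied; the function and argument names are hypothetical.

```python
import math


def apply_action(position, theta, d_norm, d_max):
    """Return the new 2D position of one airborne IAB-node.

    theta  -- moving direction in radians, expected in [0, 2*pi]
    d_norm -- normalized travel distance in [0, 1]; 0 means hover in place
    d_max  -- maximum distance covered in one decision epoch (meters)
    """
    if not 0.0 <= d_norm <= 1.0:
        raise ValueError("d_norm must lie in [0, 1]")
    x, y = position
    step = d_norm * d_max  # actual travel distance in meters
    return (x + step * math.cos(theta), y + step * math.sin(theta))


if __name__ == "__main__":
    start = (100.0, 50.0)
    print(apply_action(start, theta=0.0, d_norm=0.0, d_max=30.0))          # hover
    print(apply_action(start, theta=math.pi / 2, d_norm=1.0, d_max=30.0))  # full step
```

Keeping both action components bounded and continuous is what makes a deterministic policy gradient method a natural fit for this control task.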
2) The Critic Network: The critic is implemented as a DQN, while we denote its artificial neural network (ANN) as $Q(s, a \mid \theta^{Q})$, where $\theta^{Q}$ are the weights of the critic's ANN. The critic network is updated by minimizing a loss function in the following form where $y_t$ is the target value that can be obtained as where $\gamma$ is a discount factor.
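In the standard DDPG formulation [54], which the description of the critic appears to follow, the loss and its target take the form given below. The mini-batch $\mathcal{B}$ of $N$ sampled transitions and the target networks $Q'$ and $\pi'$ with weights $\theta^{Q'}$ and $\theta^{\pi'}$ are notation assumed for this sketch and may differ from the symbols used in the article.

```latex
L(\theta^{Q}) = \frac{1}{N} \sum_{t \in \mathcal{B}}
  \bigl( y_t - Q(s_t, a_t \mid \theta^{Q}) \bigr)^{2},
\qquad
y_t = r_t + \gamma \, Q'\bigl( s_{t+1}, \pi'(s_{t+1} \mid \theta^{\pi'}) \mid \theta^{Q'} \bigr).
```

The target $y_t$ is computed with the slowly updated target networks rather than with the networks being trained, which is what stabilizes the regression.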
3) The Actor Network: The actor is implemented as a parametric function $\pi(s \mid \theta^{\pi})$, where $\theta^{\pi}$ represents the actor's ANN weights. The actor policy is updated by computing the gradient as
4) The Actor-Critic Neural Network: In our implementation, we utilize a two-layer fully-connected feedforward neural network for the actor-network, which incorporates 400 and 500 neurons in the first and the second layer, respectively, and has rectified linear unit (ReLU) as its activation function. Similarly, for the critic-network, we employ a two-layer fully-connected feedforward neural network with 400 and 300 neurons in the hidden layers and ReLU as the activation function. These ANNs are implemented by using TensorFlow 2.4. Our DRL-based method is formally presented in Algorithm 1.
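For the actor update in item 3), the deterministic policy gradient used by standard DDPG [54] can be written as shown below. This is a sketch under the assumption that the article follows the original formulation; the mini-batch $\mathcal{B}$ of size $N$ is notation introduced here.

```latex
\nabla_{\theta^{\pi}} J \approx \frac{1}{N} \sum_{t \in \mathcal{B}}
  \nabla_{a} Q(s, a \mid \theta^{Q}) \Big|_{s = s_t,\; a = \pi(s_t \mid \theta^{\pi})}
  \, \nabla_{\theta^{\pi}} \pi(s \mid \theta^{\pi}) \Big|_{s = s_t}.
```

The first factor asks the critic how the action value changes with the action, and the second factor propagates that change to the actor's weights.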

VI. NUMERICAL RESULTS
In this section, we systematically compare the coverage and capacity performance of alternative IAB implementation strategies. To this aim, we start by introducing these strategies based on the deployment options specified in Section III and the optimization frameworks developed in Section IV and Section V. Then, we proceed by comparing the baseline scenario against the terrestrial and airborne IAB deployments by using our joint path selection and resource allocation methodology introduced in Section IV. Finally, we characterize the gains in the airborne IAB layout after the optimization of IAB-node locations, routing paths, and resource allocations.

Algorithm 1 (partial listing)
Perform action $a_t$, obtain $s_{t+1}$ and $r_t$
Store transition sample $(s_t, a_t, r_t, s_{t+1})$
Update the weights
Update the weights $\theta^{\pi}$ of $\pi(\cdot)$
Update the target ANNs
end for
end for
To reflect the technology options usable by system operators for the purposes of network densification, we consider several potential strategies characterized by different types of IAB deployments as well as by various optimization algorithms. The deployment strategies are summarized in Table II. Initially, in Section III, we introduced the deployment alternatives for network densification, including terrestrial and airborne IAB systems. Although the associated optimization problem is NP-complete, both strategies can benefit from the joint path selection and resource allocation framework proposed in Section IV. However, in the case of airborne IAB systems, this framework does not allow for the optimization of IAB-node positions. Therefore, to extend it and adjust the airborne IAB-node locations, while keeping the complexity of the resultant solution feasible, we developed the DRL-based algorithm described in Section V. Finally, we added an exhaustive search option to the pool of our considered strategies.
The default system parameters utilized throughout this section are collected in Table I. In what follows, we also consider different IAB deployment areas as an additional parameter of interest. To offer accurate parametrization, we rely upon realistic measurements of radio deployments provided by ITU-R in [45]. Particularly, we differentiate between four distinct area types: (i) suburban, (ii) urban, (iii) dense urban, and (iv) highrise urban. We derive the relevant parameters of the considered urban deployments from the density of buildings, the ratio of the land area covered by them to the total area, as well as the variable representing the height distribution. These parameters are summarized in Table III.

A. Conventional Optimization
Since one of the main targets of the IAB technology is to improve 5G NR coverage in complex environments such as city centers, we start by investigating the introduced densification strategies with respect to the coverage extension capability by utilizing the optimization framework developed in Section IV. To this end, Fig. 5 shows the UE outage probability depending on the type of the deployment area for Baseline; Ter-IAB, RSRP; Ter-IAB, PS+RA; UAV-IAB, RSRP; and UAV-IAB, PS+RA densification strategies given the cell radius of R = 300 m, 3 IAB-nodes, 30 UEs, and the uniformly distributed UAV altitude in the range of (25, 50) m.
Analyzing the obtained results, one may observe that all the densification strategies lead to higher outage probability as the deployment area becomes denser in terms of the number of buildings, while buildings themselves become higher, see Table III for the parameters of the environment. This behavior is caused by link blockages at both access and backhaul interfaces that occur much more frequently in the deployments with larger average building heights. Further, considering non-optimized deployments, Ter-IAB, RSRP and UAV-IAB, RSRP, one may notice a similar trend, i.e., denser deployments yield worse performance in terms of the outage probability.
The gain obtained by densifying the system with IAB capabilities is, however, different and heavily depends on the considered deployment area. Particularly, it amounts to 50 % for suburban areas and diminishes to only 9 % in highrise urban cases. It is important to note that the benefits of utilizing the airborne densification strategy are rather marginal for all the considered deployment options although they are significantly higher in suburban conditions as compared to urban and highrise urban alternatives. This is explained by the fact that the outage probability is mostly determined by link blockage that is likely to occur in denser deployments and over the areas with taller buildings. Finally, we do not observe any gains in terms of the outage probability when densification is performed in an optimized manner, i.e., for Ter-IAB, PS+RA and UAV-IAB, PS+RA strategies. This is because the optimization routine does not affect the locations of IAB-nodes in either case.
We now proceed with the average UE data rate assessment, which is shown in Fig. 6 for Baseline; Ter-IAB, RSRP; Ter-IAB, PS+RA; UAV-IAB, RSRP; and UAV-IAB, PS+RA densification strategies given the cell radius of R = 300 m, 3 IAB-nodes, 30 UEs, and the uniformly distributed UAV altitude in the range of (25, 50) m. Here, we see that the average UE data rate increases for the baseline scheme as we densify the deployment area. The rationale is that the number of UEs experiencing outage increases, see Fig. 5, and thus the UEs in the connected state receive a higher share of the radio resources. Understanding the presented results further, one may notice a similar trend for the non-optimized densification strategies, Ter-IAB, RSRP and UAV-IAB, RSRP, which implies that the data rate should not be considered separately from the outage probability. Here, the airborne IAB deployment characterized by a lower outage probability is associated with a smaller increase in the considered parameter as more UEs reside in the connected state.

Fig. 7. Outage probability as a function of network cell size.
In Fig. 6, the particularly important results for the system operators are those related to the optimization-based densification, where the CU may control the paths and allocate the radio resources across the network in an optimized manner. As one may infer from the presented results, Ter-IAB, PS+RA allows for obtaining up to 30 % improvement on top of the non-optimized strategy Ter-IAB, RSRP. The average UE data rate gain when utilizing the UAV-IAB, PS+RA scheme is somewhat lower as this strategy is characterized by a smaller outage probability and thus a higher number of UEs in the connected state. Notably, these relative gains remain similar for all the considered deployment areas. Also, as one may learn, the relative increase in the average UE data rate when addressing denser deployment conditions is similar for UAV-IAB, PS+RA and UAV-IAB, RSRP strategies and for Ter-IAB, PS+RA and Ter-IAB, RSRP options. This is due to the fact that they have nearly the same outage probabilities.
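The interplay between outage and the average data rate of connected UEs noted above can be seen from a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that a node splits its bandwidth equally among the connected UEs and that all of them see the same spectral efficiency; the function name and the numbers are hypothetical and do not reproduce Fig. 6.

```python
def mean_connected_rate(bandwidth_hz, spectral_eff, n_ues, outage_prob):
    """Average rate of a connected UE under equal bandwidth sharing."""
    if not 0.0 <= outage_prob < 1.0:
        raise ValueError("outage_prob must lie in [0, 1)")
    n_connected = n_ues * (1.0 - outage_prob)  # expected number of served UEs
    return spectral_eff * bandwidth_hz / n_connected


if __name__ == "__main__":
    for p_out in (0.2, 0.5, 0.8):
        rate = mean_connected_rate(400e6, 2.0, n_ues=30, outage_prob=p_out)
        print(f"outage {p_out:.0%}: {rate / 1e6:.1f} Mbit/s per connected UE")
```

Under these assumptions the per-UE rate grows as the outage probability grows, simply because fewer UEs share the same resources. This is why a scheme with lower outage can show a smaller average rate gain.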
Another essential parameter for the network operators is the cell size. To reflect its effect on the considered densification strategies, Figs. 7 and 8 illustrate the outage probability and the average UE data rate for varying IAB-donor numbers in dense urban conditions, 3 IAB-nodes, and 30 UEs. Assessing the reported results, we observe that the outage probability decreases dramatically with higher IAB-donor density. However, even for higher values of the deployment density, it remains relatively large for the Baseline scheme, where no IAB-nodes are utilized. The use of terrestrial IAB-nodes in the system yields a corresponding decrease in the outage probability. However, the benefit of utilizing an airborne IAB deployment is greater. Proceeding with the data rate evaluation, we conclude that an increase in the density of base stations (gNBs) leads to a higher average UE data rate. Employing IAB-nodes in the non-optimized case with Ter-IAB, RSRP and UAV-IAB, RSRP densification strategies brings along a significant data rate boost. As one may observe, this benefit grows with increased gNB density and reaches 83 % and 95 % for Ter-IAB, RSRP and UAV-IAB, RSRP schemes, respectively. At the same time, the relative gain of utilizing airborne IAB-nodes is rather moderate and remains nearly constant for the considered deployment density. However, a much more profound improvement comes from the deployment optimization by using Ter-IAB, PS+RA and UAV-IAB, PS+RA schemes. These gains slightly increase with the growing gNB density and reach 128 % and 150 % for 25 gNB/km² by applying Ter-IAB, PS+RA and UAV-IAB, PS+RA strategies, respectively.

B. DRL-Based Method
The results demonstrated above clearly indicate that IAB-based network densification allows for achieving considerable performance gains in terms of both outage probability and average UE data rate. Moreover, the use of joint path selection and resource allocation improves these metrics even further. However, in aerial IAB deployments, the 3D locations of airborne IAB-nodes are an additional parameter that affects the system performance. We, therefore, investigate the effects of this parameter by applying our DRL-based approach developed in Section V, which is capable of optimizing the airborne IAB-node positions.
A crucial design parameter for the system operators is the number of IAB-nodes inside the coverage area of an IAB-donor, which directly affects the degree of network densification. Observe that this case represents another type of network densification as compared to an increase in the gNB density, since the latter keeps the number of IAB-nodes within the coverage of an IAB-donor constant, as illustrated in Figs. 7 and 8. To this aim, Figs. 9 and 10 demonstrate the outage probability and the average UE data rate depending on the number of IAB-nodes for the cell size of R = 300 m, dense urban environment, 30 UEs, and four of the considered schemes, Baseline; UAV-IAB, PS+RA; UAV-IAB, A2C, LOC; and UAV-IAB, A2C, LOC+PS+RA.
Analyzing the reported data, one may observe that the outage probability decreases as the network is being densified, i.e., more IAB-nodes are being added inside the coverage of the IAB-donor. It is important to note that the schemes capable of optimizing the IAB-node locations, UAV-IAB, A2C, LOC and UAV-IAB, A2C, LOC+PS+RA, result in a slightly better outage performance. However, these gains are marginal in the dense urban environment. On the other hand, the impact of optimized airborne deployment is notable in terms of the average UE data rate as indicated in Fig. 10. Here, we see that our proposed use of DRL techniques enables a higher average UE data rate as compared to the optimized path selection and resource allocation scheme utilized in UAV-IAB, PS+RA. Conducting location optimization, path selection, and resource allocation according to our extended DRL algorithm allows for increasing the data rate even further. However, these additional gains are marginal, which implies that location optimization in airborne IAB deployments unlocks most of the benefits.
Finally, we explore the response of our system to the number of UEs in Figs. 11 and 12 by quantifying the outage probability and the average UE data rate, respectively, for Baseline; UAV-IAB, PS+RA; UAV-IAB, A2C, LOC; and UAV-IAB, A2C, LOC+PS+RA densification strategies, different numbers of UEs in the range of (20, 100), cell size of R = 100 m, 3 IAB-nodes within the coverage of an IAB-donor, and dense urban deployment conditions. The main observations that one may infer from these figures are quantitative in nature. First, note that all of the trends revealed previously hold here for different numbers of UEs, e.g., the UAV-IAB, A2C, LOC densification scheme outperforms its counterpart UAV-IAB, PS+RA option across the entire range of the numbers of UEs, while the location, path, and resource optimization in UAV-IAB, A2C, LOC+PS+RA provides further benefits. However, as the number of UEs increases, both the data rate and the outage probability slightly degrade.

VII. CONCLUSION
In this article, we evaluate the coverage and capacity performance of two mmWave-based IAB system design options, namely, terrestrial and airborne. For this purpose, we propose two optimization methods based on the conventional optimization theory and DRL by taking into account the features of mmWave-based IAB network operation such as the half-duplex constraint. The first method allows for joint optimization of the routing paths and the resource allocations, while the second one can also incorporate the locations of airborne IAB-nodes.
Our results demonstrate that the terrestrial IAB deployments can offer significant benefits in terms of outage probability, but these gains heavily depend on the type of deployment area and may be rather moderate for dense city centers. The relative capacity improvements when utilizing joint path selection and resource allocation in IAB networks are consistent across various deployment types characterized by different building densities and heights, and may reach 30 %. The UE data rates can be increased by up to 128 % when employing joint path selection and resource allocation, and further by up to 150 % when considering the airborne IAB-node locations as an additional optimization parameter.
Node position optimization in airborne IAB systems also offers better outage performance, but the gains are not dramatic. The UE data rates are higher when utilizing the location information for further optimization as compared to when optimizing path selection and resource allocation with random positions of IAB-nodes, which highlights the importance of tracking where the nodes are. Finally, optimizing the airborne IAB-node locations with an added consideration of path selection and resource allocation enables a further increase in the average UE data rates. The more computationally efficient DRL-aided approach considered in this article can be utilized in practical IAB deployments to better leverage the potential gains.

Manuscript received 29 April 2022; revised 2 January 2023 and 20 June 2023; accepted 20 June 2023. Date of publication 7 July 2023; date of current version 19 December 2023. This work was supported in part by the Academy of Finland through projects RADIANT, IDEA-MILL and SOLID and in part by JAES Foundation through STREAM project. The work of Dmitri Moltchanov was supported by the Academy of Finland through project HARMONIOUS (Machine learning methods and algorithms for 6G terahertz cellular access). The review of this article was coordinated by Dr. Tao Dusit Niyato. (Corresponding author: Nikita Tafintsev.)

Fig. 5. Outage probability as a function of deployment area type.

Fig. 6. Average UE data rate as a function of deployment area type.

Fig. 8. Average UE data rate as a function of network cell size.

Fig. 9. Outage probability as a function of the number of IAB-nodes.

Fig. 10. Average UE data rate as a function of the number of IAB-nodes.

Fig. 11. Outage probability as a function of the number of UEs.

Fig. 12. Average UE data rate as a function of the number of UEs.
Airborne Integrated Access and Backhaul Systems: Learning-Aided Modeling and Optimization

Nikita Tafintsev, Graduate Student Member, IEEE, Dmitri Moltchanov, Alessandro Chiumento, Member, IEEE, Mikko Valkama, Fellow, IEEE, and Sergey Andreev, Senior Member, IEEE

$d_2$ and $d_3$ are the 2D and 3D distances, respectively, $k_1 = 20$ and $k_2 = 10$ for the UMa model, $k_1 = 21$ and $k_2 = 9.5$ for the UMi street canyon model, $f_c$ is the carrier frequency, $h_B$ and $h_U$ are the actual antenna heights, $d_B$ is the breakpoint distance calculated as $d_B = 4 h'_B h'_U f_c / c$, where $c = 3 \cdot 10^{8}$ m/s is the speed of light, while $h'_B$ and $h'_U$ are the effective antenna heights at the base station and the UE, respectively. These effective antenna heights $h'_B$ and $h'_U$ are computed as $h'_B$ …

• … $2\pi$]: the angle or moving direction for each IAB-node $m$;
• $d_t^m \in [0, 1]$: the normalized distance for each IAB-node $m$. If $d_t^m = 0$, the IAB-node hovers at its current location; alternatively, it moves to a certain $d_t^m$; when $d_t^m = 1$, it moves for the maximum distance $d_{\max}$.

TABLE I DEFAULT SYSTEM PARAMETERS FOR NUMERICAL ASSESSMENT