On Cost-Effective, Reliable Coverage for LoS Communications in Urban Areas

The use of ultra high frequencies in 5G and future networks to improve transmission speeds and capacity requires that users’ equipment remain in Line of Sight with the access antennas most of the service time. This requirement implies a change in perspective to plan the coverage: Antennas cannot be placed on roofs or remote antenna sites, and a robust coverage is based on multi-antenna visibility from any point. This paper tackles the problem of public street coverage in urban areas with a data-driven methodology. Starting from 3D digital maps, we formalize the problem of antenna placement as a set coverage problem and leverage powerful heuristics to implement a general algorithm that allows the exploration of different policies, returning the detailed coverage, the antenna placement, and the cost of the coverage. Results on 15 areas in 3 Italian cities show the properties of different policies and confirm for the first time on large-scale real data the feasibility of Line of Sight communications with a sustainable number of antennas per km².


I. INTRODUCTION
THE performance boost of next generation access networks, 5G and 6G included, is based mostly on the densification of base stations and the use of ultra high frequencies (mmWave, TeraHertz, up to optical frequencies) to provide increased capacity to users' equipment. The consequence is that communications require Line-of-Sight (LoS) (or near-LoS) links, since at these frequencies any obstacle (walls, vehicles, but also humans) is an almost impassable barrier. In these conditions, network reliability requires that each point in a public space is covered by multiple antennas to compensate dynamically for the shadowing of LoS links, which exacerbates the well-known requirement of a higher density of base stations compared to 4G. Manufacturers estimate that 5G will require more than 100 base stations per km², up from the 10 per km² of LTE [1], and 6G will require even more. Works in the literature suggest densities of more than a hundred next Generation NodeB (gNB) per km² [2].
The placement of antennas (gNB from now on) for LoS communications in urban areas is a very complex task, given the complexity of the 3D topology defined by the building layout; yet computing the coverage, its redundancy for reliability, and the cost to achieve it is of paramount importance for the success of next generation mobile networks. gNB should be moved from antenna sites and roof-based positions to building facades, light poles, or other suitable positions that grant LoS on streets and public spaces, yet there is a lack of scientific evidence on the feasibility and cost-effectiveness of achieving redundant LoS coverage in urban areas. Traditional coverage studies and analyses are based on Non Line-of-Sight (NLoS) propagation models devised for lower frequencies, where diffraction around corners and reflections have very different characteristics from mmWave. The modeling of the 3D characteristics of cities is thus very different; moreover, antennas at lower frequencies allow over-the-roof placement with good coverage on the ground, which is not possible at mmWave frequencies, completely changing the rules of network coverage and planning. This paper uses a novel 3D approach to estimate the w-coverage, i.e., the coverage with multiplicity w, and the cost of its deployment. We study heuristics developed for set coverage and minimum partial set multi-cover problems (both known to be NP-complete) and adapt them to the problem at stake. The solution to these problems returns the optimal placement of gNB for LoS communications in urban areas, and we propose different metrics (or score functions) that allow adapting the heuristic solution to target different goals of the coverage.
The knowledge base consists of LiDAR-derived open datasets that describe the 3D shapes of buildings. This paper analyzes parts of 3 Italian cities (Trento, Firenze, and Napoli), where we selected 5 central areas with an average size of 0.7 km², for a total of 15 different locations. These three cities, even if all in Italy, have different characteristics, allowing good insight and general conclusions stemming from the results obtained. Extension to other cities and urban areas is straightforward once the LiDAR-based data are available. In an Open Science effort, we publish both the code and the datasets [3] in public repositories.
We consider placing antennas on the building facades, and we use state-of-the-art ray-tracing techniques based on GPU computation to estimate the LoS coverage, with its multiplicity, on the ground area identified by public streets. gNB are assumed to be able to cover the entire area seen from their placement site, possibly with more than a single antenna if required. In other words, a gNB on the wall of a building has a 180° visibility, while if mounted on a corner it has a 270° one. We devise a general algorithm that accepts different score metrics for the candidate gNB positions, thus enabling the exploration of different policies in pursuing a given w-coverage, where w = 1, 2, . . . expresses the level of redundancy requested for reliability. For instance, w = 3 instructs the algorithm to pursue a coverage where each point in the streets has visibility toward 3 gNB.
Given the score function describing the coverage policy selected, we evaluate the coverage (the actual outcome of the algorithm), and the cost necessary to achieve it, offering significant insight on the problem of robust coverage for LoS communications, ranging from the effective number of gNB required to achieve a given coverage (e.g., 90 or 95%) to building selection strategies to reduce the coverage cost. This paper's contribution addresses two different aspects of coverage planning related to mmWave and in general ultra-high frequency technologies that require LoS for good performance.
The first one relates to the algorithms available to solve the multi-coverage problem to increase robustness and resilience of the network. We consider state-of-the-art algorithms derived from visibility analysis problems and adapt them to tackle communication networks' planning, showing how to modify them and how they affect the network design.
The second contribution is, to the best of our knowledge, the first quantitative analysis on real urban data of the required density of gNB to achieve a robust and reliable coverage with mmWave links. The results we present show that in general between 60 and 120 gNB per km² are sufficient for LoS coverage in public streets when the maximum distance between the user equipment and the gNB for LoS communications is set to 300 m [4]. This is consistent with the estimate of 100 gNB per km² often mentioned by operators [1] and academics [5].
The remaining part of the paper is organized as follows. First, Sect. II reviews the state of the art and the work related to our contribution. Next, Sect. III formalizes the problem, introduces the notation we use, and reduces it to a visibility analysis problem, allowing the use of powerful tools and heuristics to solve it. Sect. IV describes the optimization objectives of coverage problems together with the constraints that limit the solution space. Sects. V and VI introduce state-of-the-art heuristics that we adapt to the specific problem at stake, also extending them to reduce the cost of coverage. Sects. VII and VIII define the metrics describing the fitness and robustness of the solutions found and present numerical results for the three Italian cities we have considered. Sect. IX closes the paper with a final discussion.

II. STATE OF THE ART
The background of this contribution is rooted in different but related research areas: cellular networks, land and population coverage, foundational results on set coverage, and sensor and camera placement problems. In fact, the LoS requirement significantly changes the definition of the network coverage problem itself, which essentially becomes a problem of covering extremely complex topologies (the 3D maps of cities) that are represented digitally, with a resolution that can be as good as one square meter or even better. With this representation, the task of covering urban areas can be naturally mapped onto (very large) set covering problems, thus allowing the exploitation of powerful heuristics developed for this class of NP-complete problems. At the same time, the wealth of results and methodologies coming from cellular network design must not be disregarded, even if NLoS coverage and LoS coverage are subject to different modeling and solution techniques. Similar problems have also been addressed in the study of sensor and camera placement for monitoring purposes.

A. Cellular Networks Coverage
The placement of base transmitting stations has been widely studied since the deployment of the first mobile networks [6]. The densification of gNB required by next generation networks has brought the topic back into the limelight, but the specific topic of LoS coverage of public areas has not received vast attention yet; even very recent works such as [5] base their analysis on antenna directionality but disregard the 3D geometry of the problem. Anjinappa et al. [7] investigate the optimal placement of gNB and passive reflectors in two urban areas; however, they do not consider any cost model, nor do they try to account for the Capital Expenditure (CapEx) needed to deploy such a network. Two other works, [8] and [9], focus on the placement of wall-mounted gNB and approach the problem using computational geometry in two dimensions, thus disregarding the impact of buildings and the height of the antennas. Zhang et al. [10] tackle the problem of gNB placement trying to minimize the outage probability by studying a regular, Manhattan-style urban topology, while Haile et al. [2] use an approach similar to ours, but for NLoS communications. Another branch of research focuses on the placement of Unmanned Aerial Vehicle (UAV) base stations [11], which, however, has different requirements compared to our problem.

B. Set Covering Algorithms
The family of set covering problems is a classical one in computer science. Given a set Λ and a collection Ω = {σ_j | j ∈ [0, N]} where each σ_j is a subset of elements of Λ, the typical problem is to find a minimal subset Ω′ ⊂ Ω that covers the entire Λ. This problem is known to be NP-complete, but polynomial-time heuristics exist with bounded errors [12], [13]. In our case, Λ is the set of points on a public street and σ_j is the set of points covered by a gNB placed at a certain position. There are two differences compared to the classical problem. The first is that, to achieve reliability, we want to cover each element of Λ at least w times; the second is that we do not impose total coverage, but only the coverage of a percentage of points. This is called a partial set multi-cover problem and is less studied than the previous one. Ran et al. address the difficulty of the problem [14], while recent works [15], [16] provide heuristics with bounded error that still require solving an integer linear problem at each step, which is computationally infeasible in our context.

C. Sensors and Camera Placement
The research on sensor networks has addressed a similar problem. Sensors are assumed to be able to perceive events in a certain area around them, and one problem is to find the minimal deployment of sensors that covers each point at least w times. This problem is not new [17] and has received a lot of attention [18], but the constraints and the solutions are very different from network coverage: The visibility regions of urban public spaces are not unit circles, nor are they stochastic regions; furthermore, there is no reason to keep points connected, because they are not communicating with one another, and our points are contiguous, not randomly distributed in a 2D space. All the available solutions are based on geometrical considerations that do not apply to our case.
A last area of research we mention as part of set covering is the problem of placing the minimal set of cameras to monitor a certain area, or a certain number of objects in space [19]. The problem is only partly similar because cameras can be oriented, they can pan and zoom, and deployments are in the order of tens of cameras, while we have tens of thousands of potential locations. However, given the similarity to network coverage, we adapt the heuristic proposed in [20] that tries to achieve a fair coverage of selected targets from multiple cameras.

III. PROBLEM STATEMENT
The optimal placement of gNB for LoS communications requires an efficient solution to determine the visibility between any two points. The solution to the problem is enabled by two innovations. First, public administrations publish 3D shapes of cities obtained using Laser Imaging Detection and Ranging (LiDAR) technology as open data (generically called Digital Elevation Model (DEM)). These models can reach a horizontal precision down to 30 cm and a vertical precision in the order of 10 cm. Second, GPU computing allows the efficient computation of LoS existence between two points in a 3D space for thousands of potential gNB on a high precision DEM at reasonable costs.
In our research, we take advantage of visibility analysis, a methodology that, given a point in space, provides the points on the DEM that are in LoS. The outcome of the visibility analysis is usually called viewshed, and the algorithms are called viewshed algorithms. We have implemented a viewshed algorithm [21] using Numba [22], a Python compiler that allows producing object code for Nvidia GPUs. We have introduced this technique in preparatory works to study the connections between gNB [23] while here we describe it applied to ground coverage.
The DEM D we consider is a set of points in space expressed by their 3D coordinates, with a sampling precision of 1 point per square meter (for the main notation used in the paper see Table I). The coordinates are relative to the area, so a point p in D is given by its coordinates (x, y, z), where x and y are integer values and z is a real number. D can thus be represented by an elevation matrix E, where E_{x,y} is the element of the matrix with indices x and y, that is, the elevation of the point with planar coordinates (x, y). E is a bi-dimensional matrix, and every point p in D corresponds to a triplet (x, y, E_{x,y}). In some cases we may refer to a point p elevated from the 3D surface (z > E_{x,y}); the context will make it clear. Since the space of (x, y) coordinates is sampled, we can enumerate the points, and we often refer to points p_i or p_j. Given D, let Υ be a function that takes in input two points p_i and p_j and returns 1 if there is LoS between p_i and p_j, and 0 otherwise. Computing Υ(p_i, p_j) is not a trivial task, as it requires calculating the segment between the two points and checking whether it intercepts the surface identified by D zero times (thus, there is LoS) or an even number of times, as p_i and p_j are on or above the surface identified by D (then there is no LoS).
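A minimal sketch of the Υ predicate over an elevation matrix can clarify the segment test. This is an illustrative sampled approximation (function and variable names are ours, not the paper's implementation): we walk the segment between p_i and p_j one grid cell at a time and declare LoS only if the ray never dips below the surface.

```python
import numpy as np

def los(E, p_i, p_j):
    """Approximate Upsilon(p_i, p_j): 1 if the segment between the two
    (x, y, z) points stays on or above the DEM surface E, else 0.

    E[x, y] is the ground elevation at integer planar coordinates."""
    (x0, y0, z0), (x1, y1, z1) = p_i, p_j
    # Sample roughly once per grid cell along the segment.
    n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
    t = np.linspace(0.0, 1.0, n)
    xs = np.clip(np.round(x0 + t * (x1 - x0)).astype(int), 0, E.shape[0] - 1)
    ys = np.clip(np.round(y0 + t * (y1 - y0)).astype(int), 0, E.shape[1] - 1)
    zs = z0 + t * (z1 - z0)                 # height of the ray at each sample
    return int(np.all(zs >= E[xs, ys]))     # 1 iff the ray never goes below E

# A 10 m wall at x = 5 blocks a ground-level link but not an elevated one.
E = np.zeros((11, 11))
E[5, :] = 10.0
print(los(E, (0, 5, 1.5), (10, 5, 1.5)))    # 0: the wall blocks the link
print(los(E, (0, 5, 12.0), (10, 5, 12.0)))  # 1: the ray clears the wall
```

The GPU viewshed implementation in the paper performs this test for all points of the DEM at once; the sketch above only conveys the geometry of a single ray.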
Given a point p_i, we can define the viewshed from p_i as the binary matrix V_i of size m_x × m_y such that, for every point p_j = (x, y, E_{x,y}), V_i_{x,y} = Υ(p_i, p_j). If p_i is the position of a gNB, then V_i represents the points in D that have direct line of sight with the gNB. We say that point p_j is covered by gNB i if V_i_{x,y} = 1, and we say that p_j has w-coverage if, given a certain placement of gNB, it is covered by at least w gNB. The goal of our contribution is to study the deployment of gNB that maximizes the w-coverage of users' equipment placed at 1.5 m from the ground, given a specific density of gNB per km².
In this initial work we only consider buildings, disregarding the effect of trees, lamps, vehicles, and any other static or moving object that is not found in the public datasets. This choice leads to an optimistic evaluation of the LoS probability, which is instead influenced also by these objects, as studied recently in [24]. Adding a probabilistic LoS model to our study is not difficult, but, unless the model can be tuned with realistic parameters for the area, it can also be misleading. The results we obtain and present in Section VIII are thus related to the coverage and not to the (complement of the) outage probability: They represent the upper bound of the probability that a user in the area can receive the service through a mmWave communication interface.

Fig. 1. Dilation of a building r with its perimeter points P_r in blue (each pixel represents a point) and the rasterized building shape S_r in black.

IV. PROBLEM CONSTRAINTS AND OPTIMIZATION GOALS
As LoS communications prevent placing gNB on roofs, we assume their deployment on the facades of buildings, a location that is already used by telecommunication companies to install fiber-to-the-home splitters and 4G small cells.
Let A(s_r) be the area contained in a polygon s_r that represents the 2D perimeter of a building, which we obtain from open datasets, and let R be the total number of buildings in the area. Given a building r whose 2D perimeter is represented by s_r, we implement a function that transforms the polygon into a binary matrix S_r of the same dimensions of E, so that:

S_r_{x,y} = 1 if (x, y) ∈ A(s_r), 0 otherwise.    (2)

In order to isolate the border points of the building, we use a morphological operation δ() called dilation, which dilates the binary matrix of the polygon by one unit in every direction [25] and sets to 1 the corresponding coordinates. Then, given the matrix S_r, we define its perimeter using a matrix P_r as:

P_r = δ(S_r) − S_r.    (3)

Figure 1 shows the graphical representation of the matrices S_r and P_r. We assign to every point on the perimeter a height from the ground z_r that is the minimum between the height of the building minus one meter and 10 m from the street level, and finally we define the set of coordinates C_r as:

C_r = {(x, y, z_r) | P_r_{x,y} = 1}.    (4)

Eq. (4) identifies a set of potential positions for gNB on the facade of building r. Each point lies on the border of the building and is placed at the height of the building, or at 10 m from the ground if the building is taller than 10 m (as the ITU recommendations for micro cells in urban areas suggest [26]). In every area we have roughly 500 to 700 buildings; the set of points p_i ∈ ∪_r C_r is the set of potential locations where to place our gNB. As the average number of points per building perimeter is roughly 100, we have a total of 50,000 to 70,000 potential gNB locations.
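The dilation-based perimeter extraction can be sketched with plain NumPy (an illustrative implementation of the δ() operator, not the paper's code): dilating the rasterized shape by one cell in every direction and subtracting the shape leaves the one-cell ring of candidate facade positions.

```python
import numpy as np

def dilate(S):
    """Binary dilation by one cell in every direction (the delta() operator):
    OR together the shape and all eight of its one-cell shifts, via a
    zero-padded copy so the shifts never wrap around the grid."""
    P = np.pad(S, 1)
    D = np.zeros_like(S)
    n, m = S.shape
    for dx in range(3):
        for dy in range(3):
            D |= P[dx:dx + n, dy:dy + m]
    return D

def perimeter(S_r):
    """P_r: the ring of candidate facade positions around the shape S_r."""
    return dilate(S_r) & ~S_r

# A 4x4 building footprint rasterized into an 8x8 grid.
S = np.zeros((8, 8), dtype=bool)
S[2:6, 2:6] = True
print(perimeter(S).sum())   # 20: the cells ringing the 4x4 footprint
```

Libraries such as scipy.ndimage offer the same morphological operation directly; the explicit shift loop above only makes the one-unit dilation of [25] visible.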

A. Defining the Ground Visibility Matrix
We use street maps provided by OpenStreetMap, where each road is identified by a mono-dimensional line that represents the center of the street. We expand the line to make it a 2D surface, and we call Λ the set of all (x, y) coordinates of the points on the street. Let us now consider all points p_j of coordinates (x, y, E_{x,y} + 1.5) where (x, y) ∈ Λ. The point p_j is a point in the street elevated by 1.5 m from the ground, a common assumption for the position of a mobile terminal. Assume p_i is the position of a gNB; we call σ_i an m_x × m_y matrix whose elements are defined as:

σ_i_{x,y} = Υ(p_i, p_j) if (x, y) ∈ Λ and d(p_i, p_j) ≤ d_max, 0 otherwise,    (5)

where d(p_i, p_j) is the Euclidean distance between two points and d_max is a technology-dependent, arbitrary maximum communication distance. σ_i is a binary matrix that represents all the points in the street, elevated by 1.5 m, that are in line of sight with the gNB placed at p_i and at a maximum distance d_max. We generally refer to σ_i as the ground visibility matrix from point p_i. Figure 2 shows the graphical rendering of a matrix σ_i.
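Given a precomputed viewshed, building the ground visibility matrix is a matter of masking; a minimal sketch (names and the all-visible viewshed below are illustrative assumptions, not the paper's code):

```python
import numpy as np

def ground_visibility(V_i, street_mask, p_i, d_max):
    """sigma_i: the street points in LoS of the gNB at p_i within d_max.

    V_i is the binary viewshed from p_i, street_mask flags the (x, y)
    cells of Lambda; grid spacing 1 m, terminals at 1.5 m above ground."""
    x0, y0, z0 = p_i
    xs, ys = np.indices(V_i.shape)
    dist = np.sqrt((xs - x0) ** 2 + (ys - y0) ** 2 + (1.5 - z0) ** 2)
    return V_i & street_mask & (dist <= d_max)

# Toy case: a gNB at 10 m height with an all-ones viewshed; only the
# street row survives the mask, and a tight d_max trims distant cells.
V = np.ones((5, 5), dtype=bool)
street = np.zeros((5, 5), dtype=bool)
street[2, :] = True
sigma = ground_visibility(V, street, (2, 0, 10.0), d_max=9.0)
```

In the real pipeline V_i comes from the GPU viewshed computation; the mask-and-threshold step above is the cheap part that turns a viewshed into the σ_i used by the coverage algorithms.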
Given all the potential locations of gNB, i.e., the union of all the sets C_r, let Ω be the collection of all potential visibility matrices:

Ω = {σ_i | p_i ∈ ∪_r C_r},    (6)

and let k be the number of gNB we want to install in that specific area. The problem can be expressed as the search for a subset Ω′ ⊆ Ω that maximizes the w-coverage, with |Ω′| ≤ k. The algorithm used to compute the viewshed from p_i [27] has a worst-case complexity O(|E|²), as it has to check all the points in the DEM. However, we limit the maximum distance of each ray to d_max, which leads to a complexity O(d_max⁴) for each viewshed. Since we compute a different viewshed for each potential location in the set C_r, the overall complexity for the computation of the set Ω is O(|C_r| · d_max⁴). We start by defining 1-coverage, which is intuitive, and we then extend the approach to arbitrary values of w.

B. Maximizing 1-Coverage
Let |·|_0 be the L_0 norm of a matrix (the number of non-zero elements); the problem of maximizing 1-coverage can be formalized as the search for a subset Ω′ ⊆ Ω such that:

Ω′ = argmax_{Ω′ ⊆ Ω, |Ω′| ≤ k} | Σ_{σ_i ∈ Ω′} σ_i |_0.    (7)

This is a classical maximum coverage problem, in which we have a collection Ω of sets, each set has elements in Λ, and, given k, we need to find the k sets that cover the largest number of elements of Λ. Being a set covering problem, it is NP-complete and cannot be solved exactly when |Ω| and |Λ| are in the order of tens of thousands. We leverage a polynomial heuristic with bounded error [28] that allows finding a semi-optimal solution in a reasonable time, extending it to fit our problem. Since the maximum coverage problem can be seen as a special case of the maximization of submodular functions with a cardinality constraint, the error of this polynomial heuristic is bounded and matches the theoretical bound [12], [29].
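The objective of Eq. (7) is cheap to evaluate for any candidate subset: sum the binary matrices element-wise and count the non-zero entries. A toy check with illustrative data:

```python
import numpy as np

# Three toy ground visibility matrices over a 1x4 strip of street points.
sigmas = [np.array([1, 1, 0, 0], dtype=np.uint8),
          np.array([0, 1, 1, 0], dtype=np.uint8),
          np.array([0, 0, 0, 1], dtype=np.uint8)]

def coverage_1(subset):
    """L0 norm of the element-wise sum: the number of street points
    covered by at least one gNB in the subset."""
    return int(np.count_nonzero(sum(subset)))

print(coverage_1(sigmas[:2]))   # 3 points reached by the first two gNB
print(coverage_1(sigmas))       # 4: full 1-coverage of the strip
```

The hard part, of course, is not evaluating Eq. (7) but choosing the subset, which is where the greedy heuristic of [28] comes in.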

The interpretation of Eq. (7) is straightforward: given a number k of gNB that can be deployed to cover a certain area, we want to have visibility from at least 1 gNB at the largest possible number of points in the public streets.

C. Maximizing w-Coverage
The 1-coverage is sufficient to claim that an area is covered, but it does not provide reliability, because there can be obstacles that obstruct the LoS with the gNB, the most obvious one being the person holding the terminal, but also other people, cars, trucks, and so forth. The goal of a good coverage plan must be to provide 1-coverage to the largest area, but also a reliable service to mobile users; thus the problem is better formulated as a w-coverage, with a suitable w.
Maximizing w-coverage can be formulated as finding Ω′ such that:

Ω′ = argmax_{Ω′ ⊆ Ω, |Ω′| ≤ k} | min(Φ, w) |_1,    (8)

where Φ = Σ_{σ_i ∈ Ω′} σ_i is a matrix populated with the multiplicity of the coverage for each point, and the minimum is taken element-wise. This is an extension of the set covering problem known as the minimum partial set multi-cover problem; it is NP-complete [16], and no efficient heuristics are yet known. To solve this problem, we start from a heuristic proposed for multi-camera visibility [20] and, after analyzing the specific problem and goal, we propose a simpler one with comparable performance.

Algorithm 1 Generic Heuristic for w-Coverage
1: procedure Γ(Ω, k, w)
2:     C ← 0
3:     Ω′ ← ∅
4:     for i ← 0 to k do
5:         h ← −∞
6:         for σ_j ∈ Ω do
7:             C* ← C +_w σ_j
8:             h_j ← ρ(C*, k)                ▷ Calculate score
9:             if h_j > h and σ_j ∉ Ω′ then
10:                σ ← σ_j; h ← h_j
11:            end if
12:        end for
13:        C ← C +_w σ                       ▷ Update covered elements
14:        Ω′ ← Ω′ ∪ {σ}
15:    end for
16:    return Ω′
17: end procedure

V. A GENERIC HEURISTIC FOR w-COVERAGE
The goal of w-coverage is to guarantee a robust and efficient coverage, but in communications it has many subtleties that need to be considered, in particular, the problem that a complete w-coverage in real cities is probably impossible or simply too costly to pursue, so one has to take into account also the points where the coverage is smaller than w, solving dilemmas as, for instance, is it better to add an antenna that improves the 1-coverage in n points or one that improves the w coverage in m other points?
To achieve this goal we wish to have a single approximation algorithm where different metrics (or score functions) can be applied to explore different balances of the coverage. Algorithm 1 describes the heuristic we propose to tackle the w-coverage problem, extending the heuristic algorithm for set coverage proposed in [28].
Let us first introduce the notation used in the algorithm. Let 0 be the matrix of dimension m_x × m_y of all zeros. We also introduce the sum-capped-to-w operator, with symbol +_w, so that, given matrices B and D, A = B +_w D is a matrix whose elements are:

A_{x,y} = min(B_{x,y} + D_{x,y}, w).

Algorithm 1 works as follows. It takes in input a set of ground visibility matrices Ω, each one corresponding to a point p_j where a gNB can be placed, a number k of gNB to be placed, and the desired value for w. It returns a subset Ω′ ⊂ Ω corresponding to the set of viewsheds (one-to-one mapped to gNB positions) that provide a close-to-optimal w-coverage. Line 2 initializes the matrix C to 0. Only the points on public streets are considered by the algorithm; the others remain untouched at zero. Line 3 initializes the set Ω′ to the empty set. Every iteration of the loop beginning at Line 4 chooses a new viewshed σ_j (corresponding to a possible gNB location) to be added to Ω′. To select this viewshed, every iteration of the loop beginning at Line 6 evaluates the additional coverage offered by every possible viewshed, i.e., every possible candidate position for a gNB. C* is defined at Line 7 as the matrix of the points covered if the candidate viewshed σ_j is added to the set. Every element of C* is thus the result of the element-by-element sum, capped to w, of the already existing coverage plus the viewshed σ_j. The score function ρ(C*, k) at Line 8 is the core of the algorithm, as it defines the metric by which candidate gNB locations are ranked. Sect. V-A discusses three different score functions we propose and evaluate.
The score functions are defined so that a better coverage obtained by adding σ_j corresponds to an algebraically higher value of h_j. Line 9 checks the score and, if the score for candidate σ_j is greater than the largest found so far, the candidate σ and its corresponding score h are updated. Finally, Lines 13 and 14 update the coverage C and the list of selected viewsheds Ω′; at the end, the algorithm returns Ω′, which also yields the selected gNB positions.
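The loop structure above can be sketched compactly in Python (an illustrative simplification, not the released implementation; the score function is passed in, mirroring the role of ρ()):

```python
import numpy as np

def capped_sum(B, D, w):
    """The +_w operator: element-wise sum capped to w."""
    return np.minimum(B + D, w)

def greedy_w_coverage(Omega, k, w, score):
    """Greedy selection of k viewsheds from Omega (Algorithm 1 sketch).

    Omega: list of uint8 ground visibility matrices (the sigma_j).
    score: the pluggable rho(C*, k) metric; algebraically higher is better.
    Returns the indices of the selected viewsheds (the set Omega')."""
    C = np.zeros_like(Omega[0])             # current capped coverage
    selected = []
    for _ in range(k):                      # place k gNB, one per iteration
        best_h, best_j = -np.inf, None
        for j, sigma in enumerate(Omega):   # score every candidate position
            if j in selected:
                continue
            h = score(capped_sum(C, sigma, w), k)
            if h > best_h:
                best_h, best_j = h, j
        C = capped_sum(C, Omega[best_j], w) # commit the best candidate
        selected.append(best_j)
    return selected

# With w = 1 and a plain coverage-weight score, this reduces to the
# classical greedy maximum-coverage heuristic of [28].
score_cm = lambda C, k: C.sum()
```

Swapping `score` changes the coverage policy without touching the two nested loops, which is exactly the flexibility Algorithm 1 is designed for.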

A. Score Functions
The final outcome of Algorithm 1 depends on the score function ρ(), which specifies, step by step, the metric for the selection of the best σ_j. These functions express how the algorithm pushes towards 1-, 2-, . . ., or w-coverage at each iteration. We introduce here three functions that assign a different importance to the different levels of coverage; Sect. VIII analyzes their impact on the final outcome.

1) w-Coverage Maximization (w-CM):
The first score we consider is a simple 1-norm (sum of all the elements) on the candidate coverage C* computed at Line 7 in Algorithm 1:

ρ_CM(C*) = |C*|_1.    (9)

For w = 1, applying Eq. (9) in Algorithm 1 solves the canonical maximum set coverage problem. For w > 1 it returns the global weight of the elements of C*; thus Algorithm 1 ends up maximizing the weighted coverage, without any attempt to prioritize the points that have zero coverage over the ones that have coverage between 1 and w − 1. Notice that the elements of C* are capped to w, so that points whose coverage is larger than w do not influence the score. Besides its simplicity, this naive function completely disregards the difference between a point that is not served at all and one that already has coverage, but whose coverage is improved by adding the considered viewshed.

2) w-Coverage Fairness (w-CF):
Reducing the difference between different levels of coverage resembles a problem of fairness; thus we recall the well-known fairness index proposed by Jain et al. [30], which, applied to our problem, can be written in terms of matrix norms as:

FI(C*) = |C*|_1² / (|Λ| |C*|_2²),    (10)

where the 2-norm is the square root of the sum of the squared elements of the matrix. The fairness pushes all elements of C* towards a similar value, but alone it does not provide a useful score, since we are interested in a wide, fair coverage and not only in a fair one (all-0 coverage is perfectly fair). This problem has been analyzed in the context of visual coverage in sensor networks in [20], and we adopt the same cost function proposed there, which, adapted to the problem and with the notation of this paper, can be formalized as:

ρ_CF(C*) = FI(C*) · |C*|_1 / (|Λ| w).    (11)

FI(C*) is weighted by the ratio between the 1-norm of the coverage and the target w-coverage (|Λ| w), so that the first factor of Eq. (11) tries to balance the coverage and the second tries to extend it.

3) w-Coverage Gap (w-CG):
The score defined by Eq. (11) is composite (a multiplication of two factors aiming at different goals), and its interpretation is not always straightforward; thus we propose a third, simpler metric whose goal is to weight the gap between the actual coverage and the target coverage with a quadratic function. Let 1_Λ be the m_x × m_y matrix with 1 in all the positions that need to be covered and 0 otherwise. The target coverage can be expressed simply as 1_Λ · w, and the score becomes:

ρ_CG(C*) = −|1_Λ · w − C*|_2².    (12)

We use the squared value of |·|_2 for computational efficiency.
Since this score measures a gap from the intended coverage, we use its negative value so that Algorithm 1 remains consistent in selecting the algebraic maximum. When w = 1, Eq. (9) and Eq. (12) yield exactly the same selection, while Eq. (11) returns a different numeric result (values are divided by |Λ|²), but the obtained ranking is the same as with the other two. For this reason, Sect. VIII presents only 1-CM results, while for other values of w all metrics are reported.
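The three scores are a few lines each. In the illustrative sketch below (not the paper's code) we assume C* has already been restricted to the |Λ| street points and flattened into an integer vector, so the matrix norms become plain sums:

```python
import numpy as np

# C: candidate capped coverage over the |Lambda| street points
# (a flat integer vector with values in 0..w); w: target multiplicity.

def score_cm(C, w):
    """w-CM, Eq. (9): total capped coverage weight |C*|_1."""
    return float(C.sum())

def score_cf(C, w):
    """w-CF, Eq. (11): Jain's fairness index FI(C*) weighted by the
    coverage extension ratio |C*|_1 / (|Lambda| w)."""
    n = C.size                               # |Lambda|
    s1, s2 = C.sum(), np.square(C).sum()     # 1-norm, squared 2-norm
    fairness = (s1 * s1) / (n * s2) if s2 else 0.0
    return fairness * s1 / (n * w)

def score_cg(C, w):
    """w-CG, Eq. (12): negative squared gap |1_Lambda w - C*|_2^2."""
    return -float(np.square(w - C).sum())
```

Any of the three can be handed to the generic heuristic as ρ(); only the ranking they induce matters, not their absolute values.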

B. Complexity
Algorithm 1 is composed of two nested loops: the first one iterates over the number k of gNB to be deployed, and the second one iterates over the number of potential locations |Ω|. At each inner iteration the algorithm computes a score h_j by calling the function ρ. All score functions loop over all the values of C*, so they have time complexity O(|C*|). This leads to a worst-case time complexity of O(k |Ω| |C*|).
The memory required to execute the algorithm is bounded by the number of ground points (Λ), as the algorithm allocates a bi-dimensional matrix of size (|Λ|, |Ω|) (Line 2). At Line 7 we allocate a new temporary matrix C* of the same size. Since we only need to store a small counter value for each cell of the matrix, we can use the smallest datatype available, which is uint8. Thus the memory footprint is 2 |Λ| |Ω| bytes. The memory footprint of ρ is constant.

VI. THREE-STEP HEURISTIC
The heuristic presented in the previous section finds what we call the semi-optimal placement for the gNB. Given a certain number k of gNB that the operator can afford to place, it provides the best locations where these gNB should be placed. Iterating on the values of k, we can answer the question: what is the minimum number of gNB, and what are their positions, to obtain w-coverage of a certain fraction of the points on the street, e.g., 95%? However, Algorithm 1 treats every σ_j in Ω the same way, irrespective of the building it corresponds to. This means that, if convenient, the algorithm can place one gNB per building, or even all gNB on the same building. Albeit theoretically possible, these configurations would be practically impossible or not cost-effective. To produce realistic results we need to introduce two practical constraints based on the cost estimation of a gNB deployment.
In a study done in 2020, Oughton and Russell calculated the cost of a 5G small cell to be $20,100, of which only $3,380 were accountable to the radio,³ while the rest includes several fixed costs for construction, hardware, the backhaul fiber network, etc. that are needed to realize the gNB [31]. We assume that a multi-radio small cell can be realized keeping the fixed cost constant and increasing the number of radios up to a maximum of 5 per gNB. This is a free parameter of our algorithm and can be adjusted to any other value.
We designed a heuristic that applies Algorithm 1 in three steps to limit the number of buildings on which to place gNB and the number of radios per gNB: First, for every building, it finds the best 5 points that maximize the building visibility; second, it finds the top g buildings ordered by the cumulative visibility provided by their best 5 points; third, among the best 5 points of the top g buildings, it finds the overall best k locations for the gNB. The choice of g is key for the final result: if the target of the operator is to provide w-coverage to a certain percentage of the points on the ground and g is too small, the objective may not be reachable. On the other hand, the larger g is, the higher the cost of the deployment. The optimal g is of course scenario-dependent and can be found by iterating on g with our heuristic. Since reducing the number of buildings (and thus the cardinality of Ω) makes the 3-step heuristic faster than the original one, the optimal g for an area of roughly 0.7 km² with 135 gNB/km² can be computed in around 1 minute.
Note that the formulation of the Γ() function in Algorithm 1 is extremely generic: it takes a set Ω of viewsheds as input and returns another set of viewsheds. We can thus change the inputs Ω and k and apply the algorithm three times, obtaining semantically different results without the need to modify its internals.

A. First Step: Point Selection
At the first step we limit Ω to the set of all points from a single building. Let Ω_r be the set of all the visibility matrices from points on the (dilated) perimeter of building r. We apply Γ() with Ω_r as the first argument, limiting the choice to 5 points: Γ(Ω_r, 5) returns the viewsheds of the best 5 points in the building. We repeat this step for every building r.
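The per-building point selection can be sketched with a plain greedy maximum-coverage routine. This is an illustrative stand-in: the names (`gamma`, `omega_r`) and the toy viewsheds are assumptions, and the paper's Algorithm 1 uses its own score functions rather than this simple marginal-gain rule.

```python
import numpy as np

def gamma(viewsheds, k):
    """Greedily pick the k viewsheds that maximize the union of covered points."""
    chosen = []
    covered = np.zeros_like(next(iter(viewsheds.values())), dtype=bool)
    candidates = dict(viewsheds)
    for _ in range(min(k, len(candidates))):
        # marginal gain of each candidate w.r.t. the points covered so far
        best = max(candidates, key=lambda p: int((candidates[p] & ~covered).sum()))
        covered |= candidates[best]
        chosen.append(best)
        del candidates[best]
    return chosen, covered

# Toy building with 4 candidate points on its perimeter and 6 ground points.
omega_r = {
    "p0": np.array([1, 1, 0, 0, 0, 0], dtype=bool),
    "p1": np.array([0, 1, 1, 1, 0, 0], dtype=bool),
    "p2": np.array([0, 0, 0, 1, 1, 0], dtype=bool),
    "p3": np.array([1, 0, 0, 0, 0, 1], dtype=bool),
}
points, cov = gamma(omega_r, k=3)
```

Repeating this call once per building yields the per-building candidate lists used by the next step.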

B. Second Step: Building Selection
Let Φ_r be the ground visibility matrix of building r, i.e., the visibility matrix from the best 5 points on the facade of building r. Φ_r is defined as the logical OR (∨) of the 5 binary visibility matrices returned by the first step:

Φ_r = σ_{r,1} ∨ σ_{r,2} ∨ σ_{r,3} ∨ σ_{r,4} ∨ σ_{r,5}    (15)

and let Ω_R = {Φ_r ∀ r ≤ R} be the set of the visibility matrices of all buildings. Invoking the function Γ() with Ω_R and g as input, we obtain the cumulative viewsheds of the best g buildings. Since the areas in which we run the experiments are different and contain a different number R of buildings, in practical terms it is more convenient to refer to the percentage X of buildings to be used, where g = (X/100) · R. We then define:

Ω_X^R = Γ(Ω_R, g)    (16)

that is, the set of the best cumulative viewsheds from a percentage X of the buildings of the area.
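A minimal sketch of the second step, assuming toy data: each building's best-5 viewsheds are OR-ed into a cumulative matrix Φ_r (Eq. (15)), and the buildings are then ranked. Here the ranking criterion is the plain per-building visibility count, whereas the paper applies Γ() on the set of cumulative viewsheds; the variable names are illustrative.

```python
import numpy as np

# building -> list of binary viewsheds of its best points (toy data,
# fewer than 5 points per building for brevity)
best5 = {
    "A": [np.array([1, 1, 0, 0], bool), np.array([0, 1, 1, 0], bool)],
    "B": [np.array([0, 0, 0, 1], bool)],
}

# Eq. (15): cumulative viewshed of each building as the logical OR
phi = {r: np.logical_or.reduce(v) for r, v in best5.items()}

R = len(best5)
X = 50                          # keep X% of the buildings
g = max(1, round(X / 100 * R))
top_g = sorted(phi, key=lambda r: int(phi[r].sum()), reverse=True)[:g]
```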

C. Third Step: gNB Selection
Step two returns the set of cumulative viewsheds obtained considering the best 5 points on X% of the buildings, and we use it to select the set B_X of the indices of the best buildings:

B_X = {r ≤ R : Φ_r ∈ Γ(Ω_R, g)}    (17)

and thus we can define the input to the third and last step of the heuristic:

Ω_X = ⋃_{r ∈ B_X} Γ(Ω_r, 5)

Ω_X contains the viewsheds of the best 5 points (selected in step 1) of the X% of buildings with the best cumulative viewsheds (selected at step 2). As an example, consider an area with 600 buildings, each of which has an average of 100 points on its border. If X = 10 then g = 60, and the third step of our heuristic explores at most |Ω_X| = 60 × 5 = 300 locations out of the 60,000 available.
Finally, we apply Γ() again and obtain:

Ω_{X,λ} = Γ(Ω_X, k)

Ω_{X,λ} contains exactly k viewsheds that correspond to the k points in space where the gNB are placed. These places represent the semi-optimal choice among Ω_X, with a maximum of 5 radios per building and X% of the buildings. Similarly to what we did passing from g to X, it is more practical to target a desired density λ of gNB per square km rather than a number k. So, given a certain area and λ we have:

k = λ · area

We are now able to compare the coverage obtained in heterogeneous areas with a target density of gNB per square km, using X as a tuning parameter to obtain the most favorable cost/performance trade-off.
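The sizing arithmetic of the three steps can be checked with the worked numbers from the text. The rounding choices (plain `round` for g, `ceil` for k) are assumptions, since the text simply writes g = (X/100)·R and k = λ·area.

```python
import math

# Numbers from the example in the text: 600 buildings, ~100 points each.
R, points_per_building = 600, 100
X = 10
g = round(X / 100 * R)            # 60 buildings survive step 2
pool = g * 5                      # step 3 explores at most 300 locations
total = R * points_per_building   # out of 60,000 available points

# Target density to number of gNB for a ~0.7 km^2 area at 135 gNB/km^2
area_km2 = 0.7
lam = 135
k = math.ceil(lam * area_km2)     # gNB to place in step 3 (ceil is assumed)
```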
Note that while the original heuristic has a bounded error, our three-step heuristic loses this theoretical property. However, when used with X = 100% its effect is to pick the best 5 points for each building and then apply the bounded heuristic to this restricted set of points. This reduces the dimension of the problem, but also corresponds to a practical constraint in real deployments, since the number of gNB per building cannot grow arbitrarily. The application of the three-step heuristic with X = 100% thus maintains the bounded error, and Sect. VIII shows that results with X = 4% are very close, with a strong reduction of the cost of the infrastructure. Finally, since Algorithm 1 has polynomial complexity, its three-step application is again of polynomial complexity.

VII. EXPERIMENTS SET UP AND METRICS
As already mentioned, we evaluate 5 areas for each of three densely populated Italian cities, for a total of 15 scenarios. Trento, Firenze, and Napoli have quite different urban textures, so they represent a valid sample of different urban areas. The average size of the areas is roughly 0.7 km² and is limited by the computational power we had access to, as obtaining Ω_{X,λ} is a computationally intensive task that requires handling tens of thousands of large integer matrices. To speed up the algebra on large matrices we implemented the algorithms on an NVIDIA V100 GPU equipped with 5120 CUDA cores, which enables parallelization of matrix operations. The limit is given by the 16 GB of RAM of the GPU, which restricts computation to areas of up to 1 km². Results can be extended to larger areas straightforwardly with more resources, and with some effort by improving the memory space optimization.
Placing the gNB in a constrained area causes a border effect: points on the border of the area can be covered only by the gNB placed inside the area, but not by the ones outside it, which would contribute to the real-world coverage. For this reason, we enlarge the area where gNB can be placed with a guard band of width d_max/2 m, as shown in Fig. 3 (light gray area), but we measure coverage only in the inner area (dark gray area).
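The guard band construction reduces to simple bounding-box arithmetic. This is a minimal sketch assuming an axis-aligned rectangular area; the coordinates and the `in_inner` helper are illustrative, while d_max = 300 m matches the maximum link length stated later in the paper.

```python
# gNB may be placed in the bounding box enlarged by d_max/2 on every side,
# while coverage is measured only inside the inner box. Coordinates in meters.
d_max = 300.0
guard = d_max / 2.0

inner = (0.0, 0.0, 800.0, 900.0)   # xmin, ymin, xmax, ymax of the study area
outer = (inner[0] - guard, inner[1] - guard,
         inner[2] + guard, inner[3] + guard)

def in_inner(x, y):
    """True if (x, y) lies in the area where coverage is measured."""
    return inner[0] <= x <= inner[2] and inner[1] <= y <= inner[3]
```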

A. Evaluation Metrics
To evaluate the performance of the different algorithms and their parameters we calculate the effective coverage, that is, the sum of all the viewsheds σ_i ∈ Ω_{X,λ}:

Φ_{X,λ} = Σ_{σ_i ∈ Ω_{X,λ}} σ_i

This matrix, with the same dimensions of E, contains values ranging from 0 (no LoS with any gNB) to k = λ · area (LoS with all gNB). Fig. 4 shows the effective coverage on a sample area of Napoli. Every experiment is repeated varying λ, X and w as reported in Table II; in graphs we report the confidence interval on the 15 areas with confidence level α = 0.95 as error bars.
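The effective coverage is just the element-wise sum of the selected binary viewsheds; each cell then counts how many gNB see that ground point. A minimal sketch with toy 2×2 viewsheds:

```python
import numpy as np

# Three binary viewsheds selected by the heuristic (toy data).
viewsheds = [
    np.array([[1, 0], [1, 1]]),
    np.array([[1, 1], [0, 0]]),
    np.array([[0, 0], [1, 0]]),
]

# Effective coverage: per-point count of gNB in LoS, ranging 0..k.
effective = np.sum(viewsheds, axis=0)
```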
1) Coverage: The coverage metric, analyzed as a function of X, counts the number of points with w-coverage, normalized by the total number of points on the public street:

c_w(λ) = γ_w(Φ_{X,λ}) / |Λ|

where γ_w(Φ_{X,λ}) is a function that counts the number of elements of Φ_{X,λ} that are greater than or equal to w, and Λ is the set of points on the public street. In the numerical results we show c_1(λ), which expresses the portion of streets covered by at least one gNB, and c_3(λ), the portion of streets covered with higher reliability (at least 3 gNB).
We also show the coverage distribution on the points of Λ, i.e., the Empirical Probability Density Function (e-pdf) of the coverage values.
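The coverage metric can be sketched in a few lines: γ_w reduces to a thresholded count over the effective-coverage values of the street points. The sample values below are illustrative, not measured results.

```python
import numpy as np

# Per-point gNB counts restricted to street points (toy data, 8 points).
effective = np.array([0, 1, 3, 2, 4, 0, 1, 3])

def c(w, eff):
    """Fraction of street points seen by at least w gNB: gamma_w / |points|."""
    return int((eff >= w).sum()) / eff.size

c1, c3 = c(1, effective), c(3, effective)
```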
2) Resistance to Obstruction: Fig. 5 depicts the ground projection of the rays that connect a point p to three gNB in two different cases. In both scenarios, an obstacle that obstructs an angle α_2 + α_3 may totally shadow p, but in the right figure this is far more likely than in the left one, so w-coverage alone does not necessarily imply resistance to obstruction. We define α_c(p) = 360° − max(α_1, α_2, α_3) as a measure of robustness against shadowing. It is easy to show that a single object that obstructs an angle smaller than α_c(p) cannot totally prevent LoS with some gNB, so the larger α_c(p) is, the better. Let Λ_2 be the set of points with coverage larger than or equal to 2; then

O_R = |{p ∈ Λ_2 : α_c(p) > 45°}| / |Λ_2|

provides the fraction of points for which one obstacle that obstructs 45° does not prevent LoS. In the absence of finer metrics, O_R provides a heuristic measure of how well the coverage can resist shadowing.
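A minimal sketch of the robustness computation, assuming each covered point carries the compass bearings (in degrees) of its visible gNB; the bearings below are toy data and `alpha_c` is an illustrative name. The angle α_c is 360° minus the largest circular gap between consecutive bearings, i.e., the smallest arc an obstacle must span to block every gNB at once.

```python
import numpy as np

def alpha_c(bearings):
    """360 degrees minus the largest angular gap between gNB bearings."""
    b = np.sort(np.asarray(bearings, dtype=float) % 360.0)
    gaps = np.diff(np.append(b, b[0] + 360.0))  # circular gaps between rays
    return 360.0 - gaps.max()

# Bearings of visible gNB for three (>=2)-covered toy points.
points = [[10.0, 200.0], [0.0, 20.0], [0.0, 120.0, 240.0]]

# O_R: fraction of points whose alpha_c exceeds the 45-degree threshold.
o_r = sum(alpha_c(p) > 45.0 for p in points) / len(points)
```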

3) Total Cost:
The cost function we use to evaluate the CapEx for the deployment of the network follows a cost model conceived for 5G networks [31]. From the model we extract a fixed part c_gNB for the deployment of a gNB in a building, and a variable part c_radio for each radio interface of the node:

cost(λ) = N_gNB · c_gNB + k · c_radio

where N_gNB counts the number of buildings on which at least one gNB has been deployed and k = λ · area. The cost() value depends on the desired density of gNB per km², so it is expressed as a function of λ, but it is also influenced by X. We also define upper and lower bounds of the cost function:

UB = k · (c_gNB + c_radio);    LB = c_gNB + k · c_radio

The upper bound UB is given by the worst case where every coverage point is a single-antenna gNB on a different building, while the lower bound LB is given by the (practically impossible) network where all radios are placed on a single building.
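The cost model and its bounds can be sketched directly from the figures given earlier in the section. Splitting the $20,100 small-cell total into a fixed part plus one radio is an assumption about how the paper derives c_gNB from [31]; the function names are illustrative.

```python
# Figures from the text: $20,100 per small cell, of which $3,380 is the radio.
c_radio = 3380.0
c_gnb = 20100.0 - c_radio        # fixed per-building part (assumed split)

def cost(n_buildings, k):
    """CapEx: fixed cost per used building plus one radio cost per gNB."""
    return n_buildings * c_gnb + k * c_radio

def bounds(k):
    """Lower/upper bounds: all radios on one building vs. one per building."""
    lb = c_gnb + k * c_radio     # single building hosts all k radios
    ub = k * (c_gnb + c_radio)   # every radio on its own building
    return lb, ub

lb, ub = bounds(k=10)
```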

4) Marginal Cost:
To estimate the Return on Investment (RoI) of deploying a robust network, we also evaluate the cost-effectiveness of adding new gNB to further improve the coverage. We evaluate the cost of a deployment with growing densities λ_i. Since the heuristic is deterministic, if λ_{i−1} < λ_i then Ω_{X,λ_{i−1}} ⊂ Ω_{X,λ_i}, and the difference cost(λ_i) − cost(λ_{i−1}) is exactly the marginal cost of the added gNB. Similarly, c_w(λ_i) − c_w(λ_{i−1}) is the marginal increase of relative coverage. Thus we can define the incremental cost metric m_c:

m_c(λ_i) = (cost(λ_i) − cost(λ_{i−1})) / (A_w(λ_i) − A_w(λ_{i−1}))    (26)

where A_w(λ) is the street area (in m²) with w-coverage at density λ, proportional to c_w(λ). This provides an efficiency metric ($/m²) to estimate how cost-effective it is to increase the density of gNB to improve coverage.
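The marginal cost is a plain difference quotient between consecutive density steps. A minimal sketch in the spirit of Eq. (26); all numbers below are illustrative, not measured results.

```python
def marginal_cost(cost_prev, cost_next, area_prev_m2, area_next_m2):
    """Extra CapEx per extra covered square meter between two densities."""
    return (cost_next - cost_prev) / (area_next_m2 - area_prev_m2)

# Toy step from lambda_{i-1} to lambda_i: +$200k CapEx, +40,000 m^2 covered.
m_c = marginal_cost(cost_prev=1.0e6, cost_next=1.2e6,
                    area_prev_m2=500_000.0, area_next_m2=540_000.0)
```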

VIII. RESULTS
We present the results following the same order used for the metrics: coverage, total cost, and marginal cost. Fig. 6 reports c_w(λ) for X = 2, 4, 100% and w = 1, 3. Each point is the average over all 15 areas we consider; vertical bars are the 95% confidence intervals. Since the areas never overlap and are thus independent, we can safely consider their average coverage as the outcome of i.i.d. random variables, and the confidence interval can be computed with a Student-t distribution with 14 degrees of freedom. The confidence intervals are reasonably compact, thus we deem the results reliable.
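The confidence intervals can be sketched as follows. The sample values are illustrative, and the Student-t quantile for n − 1 = 14 degrees of freedom at the 97.5th percentile (≈ 2.1448) is tabulated in the code, since the Python standard library has no t-distribution.

```python
import math
import statistics

# One coverage value per area (15 areas); toy numbers, not measured results.
samples = [0.91, 0.88, 0.93, 0.90, 0.89, 0.92, 0.87, 0.94,
           0.90, 0.91, 0.88, 0.93, 0.89, 0.92, 0.90]

t_crit = 2.1448                      # t_{0.975, 14}, tabulated value
mean = statistics.mean(samples)
half = t_crit * statistics.stdev(samples) / math.sqrt(len(samples))
ci = (mean - half, mean + half)      # 95% confidence interval
```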

A. 1-Coverage
If we focus on the first row (1-coverage), the plots show one key point of our analysis: it is possible to obtain a very large c_1 in urban areas with λ below, or very close to, 100. This confirms that in urban areas a very high LoS coverage is achievable with a number of gNB that is close to what is expected (roughly ten times what is used for 4G, so λ ≈ 100). We see in fact that with X = 100 and with X = 4 all the curves reach c_1 = 80% (λ = 45) and c_1 = 90% (λ = 75), and 1-CM and 3-CG reach c_1 = 95% (λ = 105). We stress that these results were obtained with realistic assumptions: a limited, realistic number of devices per building and a precise ray-tracing model based on real data. All curves follow the trend expected in a set covering problem, with a steep initial rise followed by a saturation phase.
1-CM provides the highest 1-coverage, which is consistent with its design, while it is important to note that 3-CG, when evaluated on c_1, dominates the other 3-coverage strategies for values of c_1 > 80%. The difference is not very large, but it is consistent in the saturation phase, in which it is extremely expensive to gain even a single percentage point; thus this very simple score function serves well both coverage and robustness. We also see that with X = 2% all strategies stay below c_1 = 90%, while, as said, with X = 4% we can achieve results that are very close to X = 100%.

1) n-Coverage and Reliability:
Let us now focus on the 3-coverage row, which confirms that X = 2% is outperformed by the other two configurations (thus, we do not comment on results with X = 2% any further). It is evident that a remarkable difference between 1-CM and the 3-coverage strategies exists. Among the latter, 3-CM performs slightly better than the others, as it focuses only on 3-coverage and disregards other possible goals. In this case, the difference in the saturation point between X = 4 and X = 100 is more evident, as with X = 4 the highest 3-coverage is 84%.
The take-away from this set of results is that if an operator wants to provide just 1-coverage, it should use 1-CM, and can reach 95% coverage with λ = 75 and X = 4%. This is the most cost-effective solution, as shown later, but obviously also the most fragile. If the operator is interested only in 3-coverage, because its target customers need extremely high reliability, then the best choice is 3-CM, calibrating X on the desired coverage. Most likely, the operator is interested primarily in 1-coverage (because it enables the service to people) and with lower priority in 3-coverage (for users that need high reliability). In this case, 3-CF and 3-CG are the best choices, as they perform very close to 1-CM in terms of 1-coverage while being very close to 3-CM, and outperforming 1-CM, in terms of 3-coverage.
To better understand the behavior of the coverage as a function of the gNB density, and why it tends to saturate before reaching 100% coverage, Fig. 7 shows the e-pdf of the coverage multiplicity for X = 4% and three values of λ. First, it is clear that to reach a good 1-coverage some areas are covered many times, with the tail of the coverage distribution growing significantly. Second, 3-CM penalizes the 1-coverage to the point that the probability of 0-coverage is higher than the probability of strictly 1-coverage. The mass of the distribution is different between 1-CM and the 3-score functions, but the tails remain very close to one another, suggesting that there are geometric properties that force a very high degree of coverage in some areas when trying to cover the areas that have remained uncovered.

Fig. 8 shows the robustness to obstruction O_R for all points p ∈ Λ_2 for different λ values. We report the bar for 1-CM for completeness, but previous results show that 2-coverage for 1-CM is poor (for each reported λ, slightly more than 50% of the 2-coverage of the worst of the other strategies), thus its O_R should be weighted by its lower 2-coverage and is not really meaningful. The other strategies perform similarly, with a fairly high fraction of points (between 63% and 77%) that resist shadowing from an obstacle that obstructs a 45-degree angle, with 3-CG better than the others in three cases. On the other hand, increasing the density of gNB by a factor of 3 improves O_R by only about 10%, which indicates that to improve this result a different heuristic that takes this specific objective into account should be considered.
Finally, Fig. 9 presents the E-CDF of the length of the link to the closest gNB for every covered point. We report this metric to show that for any value of λ the vast majority of the best links are below 100 meters in length, and 95% of them are below 214 meters (less than 200 meters for higher densities), so our choice of setting the maximum link length to 300 meters only marginally impacts the results.

Fig. 10 shows the cost of each deployment as a function of λ for X = 4% and X = 100%. Changing X gives rise to very different trends, with the cost for X = 4% growing remarkably more slowly than the cost for X = 100%. This is the beneficial effect of limiting the number of buildings. Fig. 11 reports the trends of the cost using the coverage as a free variable, with vertical lines to highlight key coverage values (90% and 95% in c_1 and 84% in c_3). The cost for X = 100% is always higher than the cost for X = 4% at the same coverage: our heuristic with X = 4% may need a higher λ compared to X = 100% (as Fig. 6 shows); however, the total cost for X = 100% is higher because it uses more buildings. These results highlight two facts. First, the cost to ensure nearly complete w-coverage increases very fast, with a sort of asymptotic behavior, independently of w. Second, and rather obvious, including all buildings in the selection allows a better coverage, again independently of w.

B. Total Cost
We quantify this increase in Table III, reporting the relative cost difference between X = 4% and X = 100% with the 1-CM and the 3-CG strategies at the same coverage value (the intercept of the vertical lines and the curves in Fig. 11). Not limiting the number of buildings produces an increased cost of 19% and 36% for c_1, and a probably unacceptable 70% for c_3.

Fig. 12 reports the marginal cost per m², m_c(λ). Each point is the ratio between the increased cost and the increased covered area (Eq. (26)). The figure shows that for c_1 (left plots) m_c monotonically increases for the 1-CM strategy, with very high costs when the gNB density becomes very high (λ = 160, 180). This is consistent with the design of the metric, as it is trying to cover every single uncovered point at the cost of placing new gNB that cover just a few square meters. The other metrics instead grow up to a certain coverage and then oscillate with a noisier trend. This is an artifact of the metric, as the incremental cost grows monotonically as in Fig. 10, but the incremental covered area does not grow monotonically for strategies that use w = 3 and do not simply try to maximize c_1.

C. Marginal Cost
Looking at the right column, we see that the marginal cost per m² of c_3 has a different trend. Initially, the cost for all strategies decreases, as the 3-coverage is very low (see Fig. 6) and it is easy to improve. When the coverage curves start to saturate, m_c increases. The 1-CM strategy has a lower m_c simply because it has a lower c_3 and thus it is not saturating even at high values of λ.
Our approach makes it possible to obtain these data that are vital for operators, because they help to identify a realistic maximum value for coverage, beyond which it is not costeffective to add more gNB.

IX. CONCLUSION
The increased demand for bit-rate and the growth of the number of devices have pushed the designers of 5G and 6G to use high frequencies (mmWave or THz), a direction that is surely irreversible. At high frequencies the communication takes place primarily in LoS, and we know that this implies a densification of base stations. Yet, we do not know how to plan a robust access network that uses LoS communications, we do not have evidence to state whether this approach is actually feasible at large scale, and we do not know if it is cost-effective.
This paper provides some initial answers to these open issues using a data-driven approach to plan a LoS network in urban areas. We exploited the availability of open data and fast GPU-based computation, and we implemented several algorithms to study the realization of robust coverage of open areas, showing that the problem can be solved with a reasonable density of gNB per km². We have shown that the problems of maximizing 1-coverage and 3-coverage are competing ones, and there are no algorithms in the literature that we can use to solve them at the needed scale.
The new heuristics we propose prove to be the best trade-off between 1-coverage (the minimal requirement for an operator) and 3-coverage, and finally, we have evaluated the cost of our solution using state-of-the-art estimations.
We believe the approach we propose can open the way to new research, improving on our own results and introducing several other challenges. As an example, we mention the use of Integrated Access and Backhaul (IAB) to further reduce the cost of each gNB, or the design of efficient solutions for rural areas, where the cost per user is far larger than in urban areas. Data-driven design can tackle those challenges and, to support the research community in making these steps, we publish all our code and data together with the paper.