Low-Complexity Beam Allocation for Switched-Beam Based Multiuser Massive MIMO Systems

This paper addresses the beam allocation problem in a switched-beam based massive multipleinput-multiple-output (MIMO) system working at the millimeter wave (mmWave) frequency band, with the target of maximizing the sum data rate. This beam allocation problem can be formulated as a combinatorial optimization problem under two constraints that each user uses at most one beam for its data transmission and each beam serves at most one user. The brute-force search is a straightforward method to solve this optimization problem. However, for a massive MIMO system with a large number of beams N , the brute-force search results in intractable complexity O(N), where K is the number of users. In this paper, in order to solve the beam allocation problem with affordable complexity, a suboptimal low-complexity beam allocation (LBA) algorithm is developed based on submodular optimization theory, which has been shown to be a powerful tool for solving combinatorial optimization problems. Simulation results show that our proposed LBA algorithm achieves nearly optimal sum data This paper was presented in part at the IEEE International Conference on Communications (ICC), London, June 2015. J. Wang, H. Zhu, N. J. Gomes, and J. Wang are with the School of Engineering and Digital Arts, University of Kent, Canterbury, Kent, CT2 7NT, UK (email: {jw712, h.zhu, n.j.gomes, j.z.wang}@kent.ac.uk). L. Dai is with the Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon Tong, Hong Kong, China (email: lindai@cityu.edu.hk).


Low-Complexity Beam Allocation for
Switched-Beam Based Multiuser Massive MIMO Systems communication systems. In order to meet this challenge, there has been a great interest in moving to millimeter wave (mmWave) spectrum where a wide range of bandwidth is available, which also enables deploying a massive number of antennas at the base-station (BS) [1]- [10].
With a massive number of antenna elements at the BS (popularly known as massive multiple-input-multipleoutput (MIMO)), a few advantages can be harvested, e.g., the random channel vectors from users to the BS become pairwisely orthogonal, the effect of small-scale fading can be averaged out, and the transmit power can be reduced significantly [9], [10]. Thanks to the aforementioned benefits, massive MIMO has been also adopted in wireless sensor networks recently [11], [12]. Moreover, by employing massive MIMO in mmWave communication systems, narrow and highgain beams can be formed to overcome the severe propagation loss of mmWave signals for establishing reliable links. Therefore, massive MIMO beamforming has been seen as a promising key technology for 5G cellular networks [1].
Generally, beamforming technologies can be divided into two categories: digital beamforming [13] and analog beamforming [14]. Digital beamforming is often employed in the conventional communication systems where beams are formed by digitally adjusting the amplitudes and phases of transmitted signals, which is flexible and programmable. However, performing digital beamforming requires individual radio frequency (RF) chain for each antenna element. For the massive MIMO systems, even though with very good performance, deploying a massive number of RF chains to enable digital beamforming is of high cost and with high power consumption at mmWave frequencies. By contrast, analog beamforming is implemented just by using a number of independent phase shifters on the BS antennas at the RF part, i.e., only one RF chain is needed for a single data stream, which is much simpler and cheaper. Therefore, more and more attention has been paid on the hybrid analog-digital beamforming [15]- [20] and pure analog beamforming [21]- [23].
For analog beamforming based systems, a switched-beam scheme along with beam selection turned to be a popular technique for data transmission through beams due to its simplicity [24]- [32]. In this scheme, a fixed number of beams are generated and pointed to different predetermined directions to cover the whole cell. The Butler method is a representative method to create fixed beams [25]. In the switched-beam based This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/ multiuser systems, each user uses only one beam to transmit and/or receive its data signal and switches its beam from one time slot to another according to the strength of the line-ofsight (LOS) signal to achieve efficient beam switching. As the switched-beam scheme requires LOS, it has been mainly investigated in satellite communication systems [26]- [28]. For cellular systems, the study was limited in wideband code division multiple access (WCDMA) mobile networks [29], [30]. Recently, due to the potential of high capacity by exploiting the mmWave bands, the switched-beam scheme has been also studied in the mmWave communication systems where LOS often exists [31], [32].
For such a switched-beam based multiuser system, a key challenge is how to assign multiple beams to multiple users to achieve a high sum data rate. This beam allocation problem shares similarities to that in the conventional random beamforming based systems [33]- [36]. Specifically, in a random beamforming based system, to exploit multiuser diversity gain, the number of users K is assumed to be much larger than the number of beams N, i.e., K N. By assuming that all the N beams are used with equal power allocation, each user can calculate the received signal-to-interference-plusnoise ratios (SINRs) on the N beams and then feeds back the SINRs and the corresponding beam indices to the BS. After receiving feedback from all users on all beams, the BS assigns each beam to the best user with the highest SINR to maximize the sum data rate [33]- [35]. In this scenario, the beam allocation problem is reduced to N independent user selection problems, i.e., selecting the proper user for each beam to serve.
However, the beam allocation algorithms in conventional random beamforming based systems [33]- [35] cannot be directly adopted in switched-beam based massive MIMO systems. The main reason is that in massive MIMO systems, the number of beams can be much larger than the number of users, i.e., N K , and thus some of the beams may not be used for data transmission, and the beams used for data transmission could vary when the channel condition changes. As a result, it is impossible for each user to calculate its received SINR on each beam due to the lack of information of active beams. Moreover, even under the condition N K , it is shown in [36] that using all the beams for transmission is not always optimal due to strong inter-beam interference. Several suboptimal beam selection algorithms were further proposed to maximize the sum data rate [36], among which the simplest greedy algorithm still has the complexity O(K N 2 ), which could be very high when the number of beams N is large.
This paper aims to develop a low-complexity beam allocation algorithm to maximize the sum data rate of a switchedbeam based multiuser massive MIMO system where a massive number of N fixed beams are formed by using the Butler method with a uniform linear array of N antenna elements to serve K users, i.e., N K . The beam allocation problem is formulated as a combinatorial optimization problem with two constraints including that each user can be served at most by one beam and each beam can serve at most one user. As the brute-force search leads to the complexity O(N K ) to obtain the optimal solution, which is intractable when the number of beams N is large, in this paper, a suboptimal low-complexity beam allocation (LBA) algorithm is proposed based on submodular optimization theory, which is a powerful tool for solving combinatorial optimization problems [37]. Specifically, the original optimization problem is first reformulated as a non-monotone submodular maximization problem under two partition matroid constraints, which still has high computational complexity due to the non-monotone objective function. To reduce the complexity, the non-monotone submodular maximization problem is further decoupled into two sub-problems, including a beam-user association sub-problem and a beam allocation sub-problem which is a monotone submodular maximization problem subject to a single partition matroid constraint and can be efficiently solved by a greedy algorithm. The LBA algorithm is then proposed by combining the solutions of these two sub-problems. Simulation results show that compared with the optimal brute-force search, our LBA algorithm can achieve nearly the same sum data rate, but only with the complexity O(K log N).
Note that to maximize the sum data rate, some users might not be served, which causes delay for the unserved users. It is therefore of great importance to study how many users can be simultaneously served by the system. This performance is indicated in the paper by service ratio, which is defined as the ratio of the number of served users to the total number of users. An explicit expression of the average service ratio (i.e., the service ratio averaged over users' positions) is obtained and shown to be a monotonic increasing function of the ratio of the number of beams N to the number of users K . Simulation results verify that the analytical result serves as a good approximation of the average service ratio, which sheds important insights on the service delay.
The remainder of this paper is organized as follows. Section II introduces the system model and problem formulation. A low-complexity beam allocation algorithm is proposed in Section III, followed by the simulation results and discussions provided in Section IV. Concluding remarks are summarized in Section V.
Throughout this paper, E[·] denotes the expectation operator. x ∼ CN (u, σ 2 ) denotes a complex Gaussian random variable with mean u and variance σ 2 . |X| denotes the cardinality of set X. 2 X denotes the power set of set X. X ∩ Y and X ∪ Y denote the intersection and union of set X and set Y , respectively. X \ Y denotes the relative compliment of set Y in set X. ∅ denotes the empty set. n k denotes a binomial coefficient, i.e., the number of ways to choose k elements from a set of n distinct elements. n k denotes a Stirling number of the second kind, i.e., the number of ways to partition a set of n distinct elements into k non-empty subsets.

II. SYSTEM MODEL AND PROBLEM FORMULATION
As shown in Fig. 1(a), the downlink transmission is considered for a multiuser switched-beam based system with K users and a BS with a linear array of N equally spaced identical isotropic antenna elements to form N fixed beams. It is assumed that K users are uniformly distributed within a circular cell with unit radius, and each user is equipped with a single antenna. 1 The BS is located at the center of the cell and all the BS antenna elements are equally spaced at distance d = 0.5λ, where λ is the propagation wavelength. (ρ k , θ k ) denotes the location of user k.
By applying the Butler method to form the beams with N = 2 i (where i ≥ 1 is an integer), the normalized array factor of any beam n, n = 1, 2, · · · , N, with respect to an angle of departure (AoD) θ of signal is given by [24] where β n = − N + 1 2 + n π.
(2) Fig. 1(b) illustrates the array pattern generated according to (1) and (2) when N = 16, where the beam index increases from 1 to N from the left-hand side to the right-hand side. Consider user k as the reference user. By assuming an LOS channel at the mmWave frequencies, unlike multipath channels in conventional cellular systems [38]- [40], the AoD of the received signal at user k is θ k and the corresponding received power can be written as [41] where p n denotes the transmit power allocated on beam n. ρ k is the distance from the cell center to user k and α is the path-loss exponent. D n (θ ) denotes the directivity 2 of beam n with regard to an AoD θ , given by [24] Illustration of beam allocation in switched-beam based massive MIMO systems. "x" represents a user. A beam is allocated to the user in the same color. N = 16, K = 5.
Appendix A shows that (4) can be further reduced to (A.5), i.e., In (3), c k,n ∈ {0, 1} denotes the beam allocation indicator. If beam n is allocated to user k, c k,n = 1; otherwise, c k,n = 0. With beam switching, each user can only use one beam for its data transmission, i.e., N n=1 c k,n ≤ 1. Moreover, to avoid severe intra-beam interference, each beam can be used at most by one user, i.e., K k=1 c k,n ≤ 1. For a massive MIMO system where the number of beams N is much larger than the number of users K , only some of the beams will be used for data transmission, as illustrated in Fig. 2. Let N s denote the number of allocated beams, given by which is no larger than K as each user can only select at most one beam for its data transmission. Assume that the total transmit power is fixed at P t , and equally allocated to the beams selected for data transmission. The transmit power allocated on beam n is then given by By assuming that the total system bandwidth is normalized to unity, the achievable data rate of user k can be written as where σ 2 0 is the variance of the additive white Gaussian noise (AWGN), and I k is the inter-beam interference power received at user k, which is given by In this paper, we aim at developing a beam allocation algorithm for maximizing the sum data rate of switched-beam based massive MIMO systems, which can be formulated as where (10b) and (10c) follow the constraints that each user can select at most one beam to transmit, and each beam can be used by at most one user to avoid severe intra-beam interference, respectively. For the combinatorial optimization problem given by (10a)-(10d), a brute-force search among all (N + 1) K possible beam allocations leads to unaffordably high complexity in massive MIMO systems with a large number of beams N. In the literature [43]- [47], a widely adopted method for solving combinatorial optimization problems is to relax the indicator c k,n into a continuous variable between 0 and 1, and convert the objective function into a convex one, which could then be efficiently solved by the convex optimization algorithms. In our case, however, there are N × K indicators to be optimized, which still results in prohibitively high computational complexity for large N even with relaxation. As a result, in this paper, we resort to submodular optimization, which has been shown to be a powerful tool for solving combinatorial optimization problems [37]. In the following section, the beam allocation problem will be reformulated into a submodular maximization problem subject to matroid constraints.

III. BEAM ALLOCATION DESIGN BASED ON SUBMODULAR OPTIMIZATION
Before reformulating our beam allocation problem into a submodular optimization problem, let us first present the definitions of submodular functions and matroids given in [37] as follows.

A. Basic Definitions
Definition 1: Let U be a finite ground set, and 2 U be the power set of U (i.e., the set of all subsets of U , including the empty set and U itself). A set function f (S) with the input S ⊆ U (i.e., S ∈ 2 U ) and a real value output, denoted by for any S, T ⊆ U . An equivalent definition of a submodular function is that for any S ⊆ T ⊆ U.
where U is a finite ground set and I ⊆ 2 U is a collection of subsets of U with the following properties: (1) I is nonempty, in particular, ∅ ∈ I ; (2) I is downward closed; i.e., for each X ⊆ Y ∈ I , we have X ∈ I ; (3) If X, Y ∈ I and |X| > |Y |, then there exists an element Note that I is also called independent sets. Intuitively, for any set X ∈ I , the elements in X are independent of each other.
Particularly, a partition matroid is a matroid where the ground set U is partitioned into some disjoint sets, U 1 , U 2 , · · · , U l , and (14) for some given parameters w 1 , w 2 , · · · , w l .
Based on the above definitions, the original optimization problems given by (10a)-(10d) will be reformulated in the following subsection.

B. Problem Reformulation
Let us first define the ground set U as and the beam allocation set S as a subset of U such that u k,n ∈ S if beam n is allocated to user k, i.e., c k,n = 1, ∀k, n; otherwise, u k,n / ∈ S. For any beam allocation set S ⊆ U , the objective function of (10a) can then be written as according to (3) and (6)-(9).
The constraints can be written as an intersection of two partition matroids on the ground set U . Specifically, let us partition the ground set U into K disjoint subsets, 2 , · · · , u k,N } is the set containing all the possible beam allocations of user k, and the superscript denotes that the ground set is partitioned according to the user index. Since the beam allocation indicator c k,n = 1 if u k,n belongs to the beam allocation set S, i.e., u k,n ∈ S, the constraint given in (10b) can be written as S ∈ I U , where According to the definition given by (14), it is clear from (17) that M U = (U, I U ) is a partition matroid. Similarly, by partitioning the ground set U into N disjoint subsets according to the beam index, i.e., U B n = {u 1,n , u 2,n , · · · , u K ,n }, n = 1, 2, · · · , N, constraint (10c) can also be written as a partition matroid constraint, S ∈ I B , with denotes the corresponding partition matroid. The optimization problem formulated in (10a)-(10d) can then be rewritten as max S⊆U u k,n ∈S (19c) Appendix B shows that the objective function of (19a) is not submodular due to the existence of the inter-beam interference. By ignoring the inter-beam interference, it can be further relaxed into the following form with a submodular objective function 3 : The proof of the submodularity of the objective function in (20a) can be found in Appendix B.
For maximization problems with a non-negative submodular objective function and multiple matroid constraints, a localsearch based algorithm has been proposed in [48], which achieves at least 1 p+2+ 1 p + of the optimal result, where > 0 is a given parameter with small value and p is the number of matroid constraints. It is also shown in [48] that this algorithm requires at most O 1 pq 4 log q local operations, where q is the size of the ground set. In our case, with the size of the ground set q = |U | = K N and the number of matroid constraints p = 2, the number of required local operations is O 1 (K N) 4 log(K N) , which still results in high complexity when the number of beams N is large. In the following subsection, the beam allocation problem given by (20a)-(20c) will be decoupled into two sub-problems, based on which a low-complexity beam allocation algorithm will be proposed.

C. Low-Complexity Beam Allocation (LBA)
1) Problem Decomposition: In this subsection, the beam allocation optimization problem given by (20a)-(20c) will be decoupled into two sub-problems: (1) beam-user association for each user, and (2) beam allocation.
For the first sub-problem, let us define S U k as a subset of U U k = {u k,1 , u k,2 , · · · , u k,N } of user k. For each user, we aim at maximizing its achievable data rate by properly choosing S U k . The corresponding optimization problem can be written as where (21b) follows the constraint given in (20b), i.e., each user can be only associated with one beam.
In the objective function of (21a), only the directivity D n (θ k ) depends on S U k . Therefore, we have where the index of user k's associated beam, π(k), is given by That is, each user is associated with the beam with the largest directivity.
Based on the solution of the first beam-user association sub-problem, the ground set of possible beam allocations is reduced to Then the second beam allocation sub-problem can be written as where set I of matroid M = (U , I ) is given by with U n = U ∩ U B n . Appendix C shows that the objective function of (25a) is a monotone submodular function on the ground set U when the transmit signal-to-noise ratio (SNR) P t /σ 2 0 > K N sin 2 π 2N K K −1 K −1 for largeK ,N ≈ 6.707K /N, where N is the number of beams and K is the number of users. For a massive MIMO system with N K , the optimization problem given by (25a)-(25b) is a monotone submodular function maximization problem under a single matroid constraint, which can be effectively solved by a greedy algorithm that achieves at least 1 2 of the optimal result [49]. Specifically, the algorithm starts with an empty beam allocation set S, and adds one element with the highest marginal gain at each step to set S while the updated S is still an element of I . Note that to maximize the objective function of (25a), elements in U should be included in the beam allocation set S as far as possible while S still satisfies constraint (25b), i.e., each beam can only be used at most by one user. Therefore, we can conclude that the greedy algorithm is equivalent to allocating each associated beam to its best user with the largest received signal power.
2) LBA Algorithm Design: By combining the solutions of the above two sub-problems, a two-step beam allocation algorithm is proposed: (1) Each user is associated with its best beam with the largest directivity; (2) Each associated beam is allocated to its best associated user with the largest received signal power. The detailed description of this two-step algorithm is presented in Algorithm 1. As Fig. 3 illustrates, in the first step, each user is associated with the beam with the highest directivity, which is highlighted in the same color.
In the second step, if a beam is associated with more than one user (such as the blue beam), then it is allocated to the user with the highest received signal power.
It is clear from Algorithm 1 that in the first step of the proposed LBA algorithm, there are K iterations. In each iteration, the number of comparisons required to find the best beam of that user is log 2 N by adopting the binary search. In the second step, as there are K users in total, at most K −1 comparisons are needed when all the users are associated with the same beam. Therefore, the maximum number of comparisons required by our proposed LBA algorithm is K log 2 N + K − 1. Compared with the optimal brute-force search, it can be clearly seen that by relaxing the objective function and decoupling the problem into two sub-problems, the computational complexity is significantly reduced from

IV. SIMULATION RESULTS AND DISCUSSIONS
In this section, simulation results will be presented to demonstrate the performance of the proposed LBA algorithm. As described in Section II, we assume that K users are uniformly distributed in a circular cell with unit radius, and a linear antenna array with N identical isotropic antenna elements is placed at the center of the cell.  from Fig. 4 that our proposed LBA algorithm can achieve very close sum data rate to that of the optimal brute-force search under most realizations. Since the sum data rate closely depends on the locations of users {r k : k = 1, 2, · · · , K } as shown in Fig. 4, we further focus on the average sum data ratē R s E {r k :k=1,2,··· ,K } K k=1 R k , in the following discussion. Specifically, Fig. 5 presents the simulation results of the average sum data rateR s over 1000 realizations of users' positions for both the proposed LBA algorithm and the optimal brute-force search under varying values of the transmit SNR P t /σ 2 0 . It can be clearly seen from this figure that the average sum data rateR s first increases with P t /σ 2 0 , and then becomes saturated for large P t /σ 2 0 as the system becomes interference-limited. A small gap of the average sum data rate between the optimal brute-force search and the proposed LBA algorithm can be observed for smaller values of P t /σ 2 0 , while the gap becomes noticeable when the transmit SNR P t /σ 2 0 is higher than 20dB. In this case, as the system operates at the interference-limited region, the rate gap becomes significant due to ignoring the effect of inter-beam interference in our proposed LBA algorithm. Fig. 6 further presents the simulation results of the average sum data rateR s with both the optimal brute-force search and our proposed LBA algorithm under varying values of the number of users K and the number of beams N when the transmit SNR P t /σ 2 0 is fixed at 20dB. Note that as the complexity of the brute-force search algorithm O(N K ) sharply increases with K and N, we only present the optimal average sum data rate for K ≤ 6 in Fig. 6(a), and for N ≤ 128 in Fig. 6(b). It can be seen from Fig. 6(a) that with the number of beams N fixed at 64, the average sum data rateR s increases with the number of users K as more users can be served for a larger K . With the number of users K fixed at 4 in Fig. 6(b), the average sum data rateR s logarithmically increases with the number of beams N thanks to the enhanced beam gain. It can be also clearly observed in both Fig. 6(a) and Fig. 6(b) that our proposed LBA algorithm achieves nearly the same average sum data rate as the optimal brute-force search. The gap is slightly enlarged when the number of beams N is small in Fig. 6(b) because with a small N, the beams are very wide and the overlap of two adjacent beams is large, which results in high inter-beam interference for each user.

B. Service Ratio
It should be noted that with the proposed LBA algorithm, as shown in Fig. 3, some users might not be served, since multiple users might be associated with the same beam, and only one of them can be served. In this section, how many users can be served simultaneously by using our LBA algorithm will be evaluated.
Specifically, let us first define the service ratio, P s , as the ratio of the number of served users to the total number of users K , which is given by Here N s is also the number of beams allocated for data transmission, which is given in (6). As the beam allocation result closely depends on the positions of users, similar to the average sum data rate, we further define the average service ratio asP Note thatP s sheds important light on the service delay performance as well. A largerP s indicates that more users can be simultaneously served by the system, and thus shorter average service delay can be expected for each user. Appendix D shows that by approximating the beam allocation problem as a ball-dropping problem, the average service ratioP s can be obtained as and for a large number of beams N, we havē (30) indicates that the average service ratioP s is solely determined by the ratio of the number of beams N to the number of users K . As Fig. 7 illustrates, the average service ratioP s increases as the ratio N/K increases. Intuitively, with a larger N/K , the average number of beams that could be used by each user increases, leading to a higherP s . Specifically, for a small N/K 1, it can be easily obtained from (30) thatP s ≈ N/K . As N/K → ∞, the average service ratiō P s → 1. We can clearly see from Fig. 7 that the average service ratio is above 0.9 when N/K is larger than 5, which indicates that over 90% of the users can be simultaneously served. The analysis is verified by the simulation results in Fig. 7.
Recall that it is shown in Fig. 6(b) that the average sum data rateR s increases as the number of beams N increases. Therefore, with the number of users K fixed, both the sum data rate and the service ratio can be improved by increasing the number of beams N. By contrast, for a given number of beams N, although the average sum data rateR s increases with the number of users K as shown in Fig. 6(a), the average   service ratioP s decreases as K increases. In this case, the number of users K determines a trade-off between the sum data rate and the service ratio, which can be clearly observed from Fig. 8.

V. CONCLUSION
In this paper, the beam allocation problem in switched-beam based massive MIMO systems has been studied to maximize the sum data rate. Based on submodular optimization theory, a low-complexity suboptimal beam allocation algorithm has been proposed, which achieves nearly optimal sum data rate with complexity O(K log N) with respect to the number of users K and the number of beams N. As some users might not be served for the sake of maximizing the sum data rate, the average service ratio, i.e., the average percentage of users that can be served simultaneously, is further evaluated. The analysis shows that the average service ratio is solely determined by the ratio of the number of beams N to the number of users K . Simulation results corroborate that the average service ratio can be greatly improved by increasing the ratio N/K , which is above 90% when N/K exceeds 5.
Note that in this paper, as we aim at maximizing the sum data rate from the system's perspective, not all the users can be simultaneously served. For practical systems, however, it is equally important to serve as many users as possible and consider the fairness issue among users. To balance the sum data rate and service ratio, and ensure fairness among users with varying locations, some service ratio constraint and individual data rate requirements for the users can be added into the beam allocation problem, which deserves much attention and should be carefully investigated in the future work.

APPENDIX A DERIVATION OF (5)
From (1), the denominator of (4) can be written as π 0 sin 2 (0.5Nπ cos ψ − β n ) As N is a power of 2 and sin(2x) = 2 sin x cos x, we have which can be further expanded by substituting cos 2 into it as (A.3) shown at the top of this page.
By combining (4) Recall that the objective function of (19a) is given by (B.1) To prove that R s (S) is not submodular, we only need to find a counter example where (11) does not hold.
Let us consider a special case: for any two sets S, T ⊂ U with |S| ≥ 2 and |T | ≥ 2, their intersection S ∩ T = {u k * ,n * }, and the transmit SNR P t /σ 2 0 → ∞. In this case, the inter-beam interference suffered by each user would be much larger than the noise for both set S and set T . Then the achievable sum data rate with respect to S and T can be obtained as and R s (T ) respectively, both of which have finite values. In contrast, for the intersection S ∩ T = {u k * ,n * }, as only beam n * is used by user k * in the whole system, there is no inter-beam interference. Therefore, we have Since the sum of R s (S) and R s (T ) is finite, it is obvious that According to (11), we can conclude that the objective function of (19a) is not a submodular function on the ground set U .

B. Proof of Submodularity of Objective Function in (20a) on Ground Set U Proof:
Let R ap s (S) denote the objective function of (20a) as Then for any S ⊆ T ⊆ U , and u j,l ∈ U \ T , we have Similarly, To show that the objective function of (20a), i.e., R ap s (S), is submodular, according to (12), we only need to show that (B.7) is larger than (B.8), i.e, (C.1) Proof: Denote the objective function of (25a) as R ap s (S). Since R ap s (S) is shown to be a submodular function on the ground set U in Appendix B and U is a subset of U , R ap s (S) is also submodular on the ground set U .
Moreover, for any S ⊂ T ⊆ U , we have Let u k ,n = arg min u k,n ∈T \S log 2 By substituting (C.3) to (C.2), we have For u k ,n ∈ U , beam n is the best beam with the largest directivity of user k , i.e., D n (θ k ) ≥ D n (θ k ), ∀n = n , n = 1, 2, · · · , N. It is clear from Fig. 1(b) that the minimum value of D n (θ k ) is achieved at the crosspoint of two adjacent beams. Let l and l + 1 denote the indices of any two adjacent beams, and φ denote the AoD measured at the crosspoint. Then we have and According to (1), (2), and (5), D l (φ) can be obtained as It can be then obtained from (C.6) and (C.7) that i.e., By substituting (C.9) into (C.7), D l (φ) is obtained as (C.10) By combining (C.5) and (C.10), we further have (C.11) With the distance from user k to the cell center ρ k ≤ 1, by combining (C.1), (C.4) and (C.11), we have According to the definition of a monotone set function presented in Section III-A, we can conclude that R ap s (S), i.e., the objective function of (25a), is a monotone submodular function on the ground set U .

APPENDIX D DERIVATION OF (29) AND (30)
A. Derivation of (29) For given total number of users K , the average service ratioP s can be written as To find the probability mass function of the number of allocated beams N s , let us first consider the following balldropping problem: Assume that each ball is dropped into N distinct boxes with equal probability 1/N. What is the probability that there are N s non-empty boxes after dropping K balls? The ball-dropping problem shares similarities to our beam allocation problem because by regarding beams as boxes and users as balls, associating each user with a beam is equivalent to dropping each ball into a box. The probability that N s beams are used is then the probability that N s boxes are non-empty. Note that here an implicit assumption is that each user has equal probability 1/N to be associated with any beam, which is a good approximation only when the number of beams is large, i.e., each beam approximately has equal width. For the ball-dropping problem, the total number of possible combinations by dropping K distinct balls into N distinct boxes is N K . Note that a Stirling number of the second kind K N s denotes the number of ways to partition a set of K distinct objects into N s non-empty subsets [50]. Then the number of combinations that N s boxes are non-empty is given by N N s N s ! K N s . Therefore, the corresponding probability that there are N s non-empty boxes after dropping K balls can be obtained as where the Stirling number of the second kind, K m , is given by [50]

B. Derivation of (30)
Note that a Stirling number of the second kind K m obeys the recurrence relation [50]  The average service ratioP s can be then obtained by combining (D.6) and (D.8) as