Hybrid Precoding and Combining Based on Rate Balancing for Wideband mmWave Multiuser MIMO Systems

Considering wideband millimeter wave (mmWave) multiuser multiple-input multiple-output (MIMO) systems, we propose a new hybrid precoding and combining method for rate balancing among users. The orthogonal frequency division multiple access (OFDMA) scheme is employed for multiuser transmission, and a greedy-based algorithm is developed for subband allocation under the max-min user rate criterion. A new algorithm is derived that optimizes the radio frequency (RF) precoder and combiners under the constant-modulus constraints as well as the baseband precoders and combiners under the rate balancing power allocation criterion. The proposed algorithm iteratively updates the RF precoder and combiners using the conjugate gradient method, and the baseband precoders and combiners through the singular value decomposition and numerical gradient-based power allocation among users and subcarriers. The complexity of the proposed method is compared to existing wideband hybrid processing schemes through analysis and numerical runtime measurement. The convergence of the proposed algorithm is verified via numerical simulations in mmWave MIMO-OFDMA systems. It is also shown through numerical simulations that the proposed wideband hybrid processing method outperforms the existing wideband hybrid techniques in terms of the minimum user rate, regardless of the signal-to-noise ratio, the number of antennas, and the number of users. Moreover, simulation results show that the proposed scheme is more advantageous than the conventional hybrid processing methods under channel uncertainty.


I. INTRODUCTION
The use of large bandwidths in millimeter wave (mmWave) channels is very attractive as a means to accommodate the rapid growth in the volume of wireless data, and thus mmWave transmission techniques have been adopted in the mobile network standards for fifth-generation (5G) and beyond [1], [2], [3], [4], [5]. The severe path loss in mmWave bands can be mitigated by large-scale transmit and receive antennas exploiting the short wavelength. Additionally, spatial multiplexing can be realized by concurrently
The associate editor coordinating the review of this manuscript and approving it for publication was Olutayo O. Oyerinde.
Fully digital baseband precoding requires excessive power consumption and cost due to the large-scale antennas in a mmWave system. To overcome this difficulty, analog-digital hybrid architectures have been widely investigated under the constraint on the number of radio frequency (RF) chains. Initial studies on hybrid processing focused on the development of hybrid precoding and combining techniques in narrowband frequency-flat mmWave channels [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23]. The orthogonal matching pursuit algorithm was employed for sparse hybrid precoding and combining along with an adaptive technique for channel parameter estimation [9], [10], and the required number of RF chains was theoretically analyzed for hybrid processing [11]. Assuming continuous phase shifts in analog processing, hybrid precoders and combiners were designed based on the alternating minimization algorithms [12], [13], [14] and the matrix factorization techniques [15], [16]. Moreover, more practical hybrid precoding schemes were studied considering low-resolution phase shifters [17], [18], [19], [20] and codebook-based feedback of precoding information [21], [22], [23]. The hybrid processing architecture has been designed for mmWave multiuser MIMO (MU-MIMO) systems with narrowband channels in terms of maximizing the sum rate [22], [23], [24], [25], [26], [27], [28] and the energy efficiency [22]. For reducing the feedback overhead, hybrid precoding methods were developed in combination with predetermined codebooks for multiuser transmission systems with a single receiver antenna [22] and multiple receiver antennas [23], respectively.
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
For mitigating inter-user interference in MU-MIMO systems, hybrid analog-digital precoders and combiners were constructed based on the minimum mean square error (MMSE) criterion in [24] and [25], and RF precoding with phase shifts was combined with baseband processing based on channel diagonalization techniques in [26], [27], and [28]. On the other hand, rate balancing precoding schemes were derived under the MMSE criterion to ensure rate fairness among users in the downlink of MU-MIMO channels [29], [30]. By applying the rate balancing approach to mmWave downlink MU-MIMO systems, hybrid precoders and combiners were developed in terms of maximizing the minimum user rate obtained by the zero-forcing (ZF) and MMSE criteria [31], [32].
Hybrid processing techniques have been further extended for wideband mmWave MIMO systems by employing dynamic subarray assignment considering partially connected architectures based on a single-user orthogonal frequency division multiplexing (OFDM) system with multiple antennas [33], [34]. For mmWave MIMO-OFDM systems, the alternating optimization algorithm was adopted to design the analog precoder for single-user and multiuser transmission in [12] and [35], respectively, and a low-complexity hybrid precoding scheme was devised using hierarchical beam search and dynamic beam assignment [36]. Wideband precoding methods have been further studied in order to improve the sum rate by using an iterative algorithm for the analog precoder design [37]; to design the codebook for hybrid processing with limited feedback [38]; to reduce the design complexity for hybrid processing using machine learning approaches [39]; and to develop a hardware-efficient hybrid beamforming architecture considering dynamic antenna subarrays and low-resolution phase shifters [40]. Recently, hybrid precoding methods for MIMO-OFDM systems in mmWave and/or sub-Terahertz bands have been derived in terms of maximizing the achievable sum rate, by exploiting the long-term channel covariance matrix and the angle of departure (AoD) information in [41] and by considering the beam squint in point-to-point transmission [42], [43] and multiuser transmission [44], respectively. On the other hand, the optimality of semi-unitary precoding and combining was investigated accounting for the mmWave channel scattering condition and the beam squint over frequency-selective channels [45], [46], and the channel state information (CSI) of frequency-selective channels was obtained through compressed sensing-based estimation techniques in block fading channels [47], [48] as well as channel tracking based on sparsity-constrained maximum likelihood estimation in time-varying channels [49].
Prior work on mmWave hybrid processing for single-user MIMO-OFDM systems has addressed the design of hybrid precoders under practical implementation issues such as partially connected subarray architectures [33], [34], [39], finite resolution of analog phase shifters [38], [40], and beam squint [42], [43]. These hybrid techniques have been further extended to mmWave multiuser MIMO-OFDM systems, mainly with a focus on sum rate maximization [35], [36], [37], [41], [44]. When the sum rate maximization criterion is used, the matrix factorization scheme derived for narrowband hybrid precoding can be exploited to design the hybrid precoders for mmWave multiuser MIMO-OFDM systems by minimizing the Euclidean distance between the fully digital precoder and the hybrid precoder. In contrast, when rate balancing is considered to ensure fairness among users, the RF precoder for wideband hybrid processing is adjusted to increase the rate of the user with the worse channel gain, and therefore the narrowband hybrid precoding method cannot be applied to the design for multiuser MIMO-OFDM systems. Motivated by this fact, this paper focuses on rate balancing among users, considering the orthogonal frequency division multiple access (OFDMA) scheme with multiple transmit and receive antennas that exploits hybrid precoding and combining for multiuser downlink transmission. To this end, we propose a new wideband hybrid processing method in which the RF precoder is common to all users, the RF combiner is separately designed for each user, and the baseband precoders and combiners are designed for individual subcarriers in terms of maximizing the minimum user rate. The contributions of this paper are summarized as follows.
• For MIMO-OFDMA transmission, a greedy-based subband allocation algorithm is developed for rate balancing among users. To increase the minimum user rate, we select the user with the minimum channel gain and assign the subband with the highest achievable rate to the selected user. This procedure is repeated until all subbands are allocated.
• We formulate an optimization problem that maximizes the minimum user rate when downlink data streams are simultaneously transmitted to multiple users using the OFDMA scheme with hybrid precoding and combining. Based on the conjugate gradient method, we derive a new iterative algorithm to update the RF precoder for the transmitter and the RF combiners for the user receivers, respectively. Moreover, we devise a new baseband precoding and combining method in combination with power allocation across users and subcarriers in terms of maximizing the minimum user rate, given the RF precoder and combiners.
FIGURE 1. MIMO-OFDMA system supporting hybrid precoding and combining when the transmitter with $K$ subcarriers, $N_t$ antennas, and $N_t^{\mathrm{RF}}$ RF chains conveys multiple data streams to $M$ users with $N_r$ antennas and $N_r^{\mathrm{RF}}$ RF chains.
• The complexity of the proposed method is analyzed and compared with existing wideband hybrid processing methods. Also, through numerical simulations in mmWave MIMO-OFDMA systems, the convergence of the proposed algorithm is verified by presenting the growth of the minimum user rate with the increment of the number of iterations, and the convergence speed and runtime of the proposed method are compared with those of existing wideband hybrid processing schemes.
• To validate the proposed method, simulation results are presented that compare the performance of various wideband hybrid processing methods under perfect and imperfect CSI conditions. It is demonstrated that the proposed method achieves substantial gain compared to existing wideband hybrid processing methods in terms of the minimum user rate, irrespective of the signal-to-noise ratio (SNR) value, the number of transmit antennas, the number of users, and the CSI conditions.
The remainder of this paper is organized as follows. Section II introduces a downlink wideband mmWave MIMO-OFDMA system using hybrid precoding and combining. In Section III, we develop a subband allocation algorithm for MIMO-OFDMA transmission and propose a new design method for RF and baseband processing including power allocation for rate balancing. Complexity analysis and numerical simulation results are provided in Sections IV and V, respectively. Finally, the conclusions are presented in Section VI.
Notations: Superscripts $T$, $H$, $*$, and $-1$ denote transposition, Hermitian transposition, complex conjugate, and inversion, respectively, for any scalar, vector, or matrix. $|x|$ means the absolute value of $x$; the notations $|\mathbf{X}|$, $\|\mathbf{X}\|$, and $\|\mathbf{X}\|_F$ denote the determinant, $\ell_2$-norm, and Frobenius norm of matrix $\mathbf{X}$, respectively; $\mathbf{I}_m$ represents an $m \times m$ identity matrix; $\mathbf{0}_{m \times n}$ and $\mathbf{1}_{m \times n}$ denote the $m \times n$ zero matrix and all-ones matrix, respectively; $\mathrm{tr}(\mathbf{A})$ is the trace of matrix $\mathbf{A}$; $\mathrm{diag}(\mathbf{x})$ returns a diagonal matrix whose main diagonal elements are equal to $\mathbf{x}$; $\mathrm{blkdiag}(\cdot)$ stands for a block-diagonal matrix with matrices on its diagonal; $[\mathbf{A}]_{p,q}$ denotes the $(p, q)$th element of matrix $\mathbf{A}$; • and ⊗ are the Hadamard and Kronecker matrix products; $x \sim \mathcal{CN}(0, \sigma^2)$ means that a random variable $x$ conforms to a complex normal distribution with zero mean and variance $\sigma^2$; and $E[x]$ stands for the expectation of a random variable $x$.

II. SYSTEM MODEL FOR MIMO-OFDMA
Fig. 1 presents a MIMO-OFDMA system with hybrid precoding and combining for multiuser downlink transmission, when the transmitter with $K$ subcarriers, $N_t$ antennas, and $N_t^{\mathrm{RF}}$ RF chains transfers multiple data streams to $M$ users with $N_r$ antennas and $N_r^{\mathrm{RF}}$ RF chains, respectively. For notational convenience, it is assumed that all users have the same number of receive antennas, RF chains, and data streams. Note that the proposed wideband hybrid processing method in Section III can also be applied to a MIMO-OFDMA system whose receivers have different numbers of antennas and RF chains. The entire frequency band is divided into multiple subbands consisting of contiguous subcarriers, and the subbands are assigned to users for OFDMA-based transmission. $\mathcal{S}_m$ and $K_m$ denote the set of subcarrier indices assigned to user $m$ and the number of subcarriers assigned to user $m$ (i.e. $K_m$ is the cardinality of $\mathcal{S}_m$), respectively.
Let us define $\{\mathbf{H}_m(d) \in \mathbb{C}^{N_r \times N_t};\ 1 \le m \le M,\ d = 0, 1, \cdots, D-1\}$ as the frequency-selective discrete-time MIMO channel for the $d$-th tap of user $m$, where $D$ is the number of taps for discrete-time channels. $\{\mathbf{H}_m(d)\}$ are modeled by using the transmit and receive antenna array response vectors, complex path gains, time-domain path delays, and discrete-time filter responses, as shown in [33], [34], [35], [36], [37], [38], [39], [40], [43], [44], [45], [46]. Then, by taking the $K$-point discrete Fourier transform (DFT), the MIMO channel corresponding to subcarrier $k$ can be obtained as
$$\mathbf{H}_{m,k} = \sum_{d=0}^{D-1} \mathbf{H}_m(d)\, e^{-j 2\pi k d / K} \tag{1}$$
where $1 \le m \le M$ and $1 \le k \le K$. It is assumed that the frequency-domain channels for all users, $\{\mathbf{H}_{m,k}\}$, are available at the transmitter. A time division duplexing system can estimate the CSI $\{\mathbf{H}_{m,k}\}$ from the uplink reference signals using the channel reciprocity [47], [48], [49], and a frequency division duplexing system can obtain the CSI through the feedback from users [22], [38]. CSI errors can be caused by imperfect channel estimation and/or outdated CSI, and the effect of CSI uncertainty is evaluated through numerical simulations in Section V-C. Each subcarrier transmits $N_s$ data streams using the baseband precoding followed by the inverse DFT (IDFT) and the RF precoding. Specifically, the transmit symbol vector for subcarrier $k$ is expressed as
$$\mathbf{x}_k = \mathbf{F}\mathbf{G}_k\mathbf{s}_k \tag{2}$$
where $\mathbf{F} \in \mathbb{C}^{N_t \times N_t^{\mathrm{RF}}}$ is the RF precoder whose elements have a constant amplitude, i.e. $|[\mathbf{F}]_{p,q}| = 1/\sqrt{N_t}$, $\mathbf{G}_k \in \mathbb{C}^{N_t^{\mathrm{RF}} \times N_s}$ is the baseband precoder for subcarrier $k$, and $\mathbf{s}_k \in \mathbb{C}^{N_s \times 1}$ is the modulated symbol vector for subcarrier $k$ satisfying $E[\mathbf{s}_k\mathbf{s}_k^H] = \mathbf{I}_{N_s}$. The RF precoder and baseband precoders meet the transmit power constraint given by
$$\mathrm{tr}(\mathbf{F}\mathbf{R}\mathbf{F}^H) \le P \tag{3}$$
where $\mathbf{R} = \sum_{k=1}^{K} \mathbf{G}_k\mathbf{G}_k^H$ and $P$ is the maximum transmit power.
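The conversion from channel taps to per-subcarrier channels described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `subcarrier_channels`, the nested-list matrix layout, and the exponent convention $e^{-j2\pi kd/K}$ with $k = 1, \cdots, K$ are assumptions made here for the example.

```python
import cmath

def subcarrier_channels(taps, K):
    """Return K frequency-domain channel matrices from D time-domain taps.

    taps[d][r][t] holds tap d of an N_r x N_t channel (nested lists of complex).
    The phase convention exp(-j*2*pi*k*d/K), k = 1..K, is an assumption.
    """
    D = len(taps)
    n_r = len(taps[0])
    n_t = len(taps[0][0])
    channels = []
    for k in range(1, K + 1):
        H_k = [[0j] * n_t for _ in range(n_r)]
        for d in range(D):
            # every entry of tap d gets the same phase rotation on subcarrier k
            phase = cmath.exp(-2j * cmath.pi * k * d / K)
            for r in range(n_r):
                for t in range(n_t):
                    H_k[r][t] += taps[d][r][t] * phase
        channels.append(H_k)
    return channels
```

With a single tap ($D = 1$) every subcarrier sees the same matrix, which is the frequency-flat narrowband case; additional taps make the channel vary across subcarriers.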
For simplicity, we assume that each subband (or subcarrier) is allocated to only one user. At the receiver, each user conducts the RF combining followed by the $K$-point DFT and baseband combining. When subcarrier $k$ has been assigned to user $m$, the corresponding received signal is given by
$$\mathbf{y}_{m,k} = \mathbf{V}_k^H\mathbf{W}_m^H\mathbf{H}_{m,k}\mathbf{x}_k + \mathbf{V}_k^H\mathbf{W}_m^H\mathbf{n}_{m,k} \tag{4}$$
where $\mathbf{V}_k \in \mathbb{C}^{N_r^{\mathrm{RF}} \times N_s}$ is the baseband combiner for subcarrier $k$, $\mathbf{W}_m \in \mathbb{C}^{N_r \times N_r^{\mathrm{RF}}}$ is the RF combiner for user $m$, and $\mathbf{n}_{m,k} \in \mathbb{C}^{N_r \times 1}$ is the noise vector for subcarrier $k$ of user $m$ whose elements are independent and identically distributed (i.i.d.) complex Gaussian variables with zero mean and variance $\sigma_{m,k}^2$, i.e. $\mathbf{n}_{m,k} \sim \mathcal{CN}(\mathbf{0}, \sigma_{m,k}^2\mathbf{I}_{N_r})$. As in the RF precoder, the elements of $\{\mathbf{W}_m\}$ have a constant amplitude and adjustable phases, i.e. $[\mathbf{W}_m]_{p,q} = \frac{1}{\sqrt{N_r}} e^{j\varphi_{p,q}^{(m)}}$. By substituting (2) into (4), we have
$$\mathbf{y}_{m,k} = \mathbf{V}_k^H\mathbf{W}_m^H\mathbf{H}_{m,k}\mathbf{F}\mathbf{G}_k\mathbf{s}_k + \mathbf{V}_k^H\mathbf{W}_m^H\mathbf{n}_{m,k} \tag{5}$$
where $k \in \mathcal{S}_m$. Following the approach in [33] and [40], the achievable rate for user $m$ is given by (6). Considering the transmit power constraint in (3) and the constant-modulus constraints of the RF processing matrices, the problem of maximizing the minimum user rate is formulated as (7).

III. PROPOSED HYBRID PROCESSING FOR MIMO-OFDMA
In this section, we introduce a subband allocation scheme for maximizing the minimum user rate. Then, considering the constant modulus phase shifting operation of analog circuits, we derive a new algorithm to design the RF precoder and combiners under the rate fairness criterion. Also, we develop a new design method for rate balancing baseband precoding and combining. Fig. 2 describes the overall procedure for the proposed wideband hybrid processing method whose details are provided in the following subsections.

A. SUBBAND ALLOCATION FOR RATE BALANCING
This subsection develops a simple greedy-based subband allocation algorithm accounting for the max-min user rate criterion. Suppose that the number of subbands, $N_B$, is equal to or greater than the number of users for OFDMA transmission, i.e. $N_B \ge M$, in order to assign at least one subband to each user. The subband allocation method is presented in Algorithm 1, where $\mathcal{B}_s$ is the set of subcarrier indices included in subband $s$. Initially, we define the set of user indices without subbands as $\mathcal{M}$, the set of non-allocated subband indices as $\mathcal{A}$, and the set of subcarrier indices assigned to user $m$ as $\mathcal{S}_m$. We allocate one subband per iteration to the user with the minimum channel gain, and sequentially repeat the procedure until all subbands are assigned. At the $i$th iteration, if there exists any user with no assigned subbands (i.e. $i \le M$), we compute the per-user subband channel gains $g(m, s)$ and the sum channel gains $\tilde{g}(m)$ for non-allocated users ($m \in \mathcal{M}$) and non-allocated subbands ($s \in \mathcal{A}$). Then, we select the user with the minimum channel gain from the non-allocated users to ensure rate fairness among users. When at least one subband has been assigned to every user (i.e. $M < i \le N_B$), the sum channel gains of the non-allocated subbands are evaluated for all users ($m = 1, 2, \cdots, M$), and the user with the minimum gain is selected. In both cases, we assign the subband with the maximum channel gain among all available subbands to the selected user $m_o$. Considering the allocation result at the $i$th iteration, we update the sets $\mathcal{S}_{m_o}$, $\mathcal{M}$, and $\mathcal{A}$. This procedure is repeated until all subbands are allocated.
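The greedy procedure described above can be sketched as follows. This is a hedged illustration rather than the paper's Algorithm 1: the function name `allocate_subbands` is hypothetical, the per-user subband gains are taken as a precomputed table `gain[m][s]` (how $g(m, s)$ is computed from the channels is not reproduced here), and ties are broken by the lowest index, which is an assumption.

```python
def allocate_subbands(gain):
    """Greedy max-min oriented subband allocation.

    gain[m][s] is the channel gain of user m on subband s (assumed given).
    Returns a dict mapping each user index to its list of subband indices.
    """
    num_users = len(gain)
    num_subbands = len(gain[0])
    if num_subbands < num_users:
        raise ValueError("need at least one subband per user (N_B >= M)")
    unserved = set(range(num_users))      # users without a subband yet
    free = set(range(num_subbands))       # subbands not allocated yet
    assigned = {m: [] for m in range(num_users)}
    while free:
        # first serve users with no subband, afterwards consider all users
        candidates = sorted(unserved) if unserved else list(range(num_users))
        # user whose sum gain over the remaining subbands is smallest
        m_o = min(candidates, key=lambda m: sum(gain[m][s] for s in free))
        # give that user its best remaining subband
        s_o = max(sorted(free), key=lambda s: gain[m_o][s])
        assigned[m_o].append(s_o)
        unserved.discard(m_o)
        free.remove(s_o)
    return assigned
```

For example, with `gain = [[5, 1, 3], [1, 2, 0.5]]` the weaker user 1 is served first and receives subband 1, user 0 then takes subband 0, and the last subband goes to user 1 again because its remaining sum gain is smaller.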

B. DESIGN OF HYBRID PRECODERS AND COMBINERS
Given the subband allocation to users, S 1 , S 2 , · · · , S M , we derive a new hybrid processing method for mmWave MIMO-OFDMA systems. Due to the nonconvex constraints in (7a) and (7b), it is difficult to directly solve (7) to find an optimal solution for hybrid precoding and combining.
Algorithm 1: Subband Allocation Scheme for Rate Balancing in MIMO-OFDMA Systems. Recovered steps: for each $m \in \mathcal{M}$ and each $s \in \mathcal{A}$, compute the per-user subband channel gains; find the user index with the minimum channel gain, $m_o = \arg\min_{m \in \mathcal{M}} \tilde{g}(m)$; get the sum gains.
To make the optimization problem more tractable, we divide (7) into several subproblems by employing the block coordinate descent (BCD) method. Also, the maximum transmit power is used at the optimal point for maximizing the minimum user rate, and thus the constraint (7d) is changed to $\mathrm{tr}(\mathbf{F}\mathbf{R}\mathbf{F}^H) = P$. Firstly, given $\{\mathbf{G}_k\}$, $\{\mathbf{W}_m\}$, and $\{\mathbf{V}_k\}$, the subproblem for finding an optimal RF precoder is expressed as (8), where $r_m(\mathbf{F})$ means the achievable rate of user $m$ expressed as a function of $\mathbf{F}$ from (6). The elements of $\mathbf{F}$ satisfy the constant-modulus constraints in (8b), and the RF precoder and baseband precoders jointly meet the transmit power constraint in (8c). Here, when $\mathbf{F}$ is adjusted to increase the minimum user rate, the scale of $\mathbf{G}_k$ needs to be changed to satisfy (8c), and this makes it difficult to optimize $\mathbf{F}$ without regard to $\{\mathbf{G}_k\}$. To avoid this problem, we separate $\{\mathbf{G}_k\}$ into two parts as follows:
$$\mathbf{G}_k = \rho\,\tilde{\mathbf{G}}_k \tag{9}$$
where $\rho$ is a scaling factor irrespective of the subcarrier index $k$ and $\tilde{\mathbf{G}}_k$ is a normalized baseband precoder for subcarrier $k$. Note that $\{\tilde{\mathbf{G}}_k\}$ are fixed and only $\rho$ is a function of $\mathbf{F}$ in the resulting problem (10).
To find the optimal solution of (10), we derive an iterative algorithm. For any RF precoder $\mathbf{F}$, we can find the minimum user rate $r_{\min}$ and the corresponding user index $n$ as in (11), i.e. $r_{\min} = \min\{r_1(\mathbf{F}), \cdots, r_M(\mathbf{F})\} = r_n(\mathbf{F})$. If we neglect (10b), the problem (10) can be represented as a non-smooth generalized quadratic matrix programming (GQMP) problem following the definition in [50], and $r_m(\mathbf{F})$ for any $m$ can be approximated as a concave function using the lower bound and upper bound as shown in Section II-B of [50]. Furthermore, $r_n(\mathbf{F})$ can also be approximated as a concave function because it is defined by selecting the minimum among $\{r_1(\mathbf{F}), \cdots, r_M(\mathbf{F})\}$ [51]. Using this property, we find the Euclidean gradient of $r_n(\mathbf{F})$ and then derive an iterative update procedure based on the conjugate gradient method that accounts for the constraints (10b). By substituting (10c) into (6), the minimum user rate is expressed as in (12). An optimal $\mathbf{F}$ maximizing $r_n(\mathbf{F})$ can be iteratively found by the conjugate gradient method in [51]. To do this, the Euclidean gradient of $r_n(\mathbf{F})$ with respect to $\mathbf{F}$ is obtained as in (13), where $c_{n,k} = \frac{\sigma_{n,k}^2}{P}\mathrm{tr}(\mathbf{F}\mathbf{R}\mathbf{F}^H)$, $\mathbf{Q}_k = c_{n,k}^{-1}\mathbf{A}_k\mathbf{F}\mathbf{R}_k\mathbf{F}^H\mathbf{A}_k^H$, and $\lambda_{k,\ell}$ is the $\ell$th singular value of $\mathbf{A}_k\mathbf{F}\tilde{\mathbf{G}}_k$. Taking into account the constant-modulus constraints (10b), we compute the Riemannian gradient through the orthogonal projection of the Euclidean gradient $\nabla r_n(\mathbf{F})$ in (13) onto the tangent space [52], where $\mathrm{Re}(x)$ means the real part of a complex $x$. Now, by employing the update procedure of the conjugate gradient method, the gradient direction matrix for the $i$th iteration, $\mathbf{D}(i)$, is adjusted as in (15), where $\mathbf{F}(i)$ is the RF precoder at the $i$th iteration and $\beta$ is a step-size parameter. Using (15), the RF precoder is projected to a constant-modulus matrix conforming to (10b) as in (16), where $1 \le p \le N_t$, $1 \le q \le N_t^{\mathrm{RF}}$, and $\alpha$ is a step-size parameter to update the RF precoder.
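The three ingredients of one RF precoder update (tangent-space projection of the gradient, conjugate direction, and projection back to constant modulus) can be sketched as below. This is an illustrative sketch under stated assumptions, not the paper's equations (14)-(16): the function names are hypothetical, the Euclidean gradient is taken as an input, the direction update uses a simple fixed-$\beta$ form, and the previous direction is reused without transport to the new tangent space.

```python
def riemannian_gradient(F, egrad):
    """Project a Euclidean gradient onto the tangent space at F, entry by entry."""
    rgrad = []
    for f_row, g_row in zip(F, egrad):
        row = []
        for f, g in zip(f_row, g_row):
            # drop the radial component of g, i.e. the part that would change |f|
            radial = (g * f.conjugate()).real / (abs(f) ** 2)
            row.append(g - radial * f)
        rgrad.append(row)
    return rgrad

def cg_direction(rgrad, prev_direction, beta):
    """Assumed direction rule: new gradient plus beta times the old direction."""
    return [[g + beta * d for g, d in zip(g_row, d_row)]
            for g_row, d_row in zip(rgrad, prev_direction)]

def retract(F, direction, alpha, modulus):
    """Step along the direction, then rescale every entry to the fixed modulus."""
    updated = []
    for f_row, d_row in zip(F, direction):
        row = []
        for f, d in zip(f_row, d_row):
            z = f + alpha * d
            row.append(modulus * z / abs(z) if abs(z) > 0 else f)
        updated.append(row)
    return updated
```

Because the rescaling only changes the magnitude of each entry, the phases remain the free variables, which matches the phase-shifter interpretation of the RF precoder. The same three functions apply unchanged to an RF combiner.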
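As a next step, given $\mathbf{F}$ and $\{\mathbf{G}_k\}$ for hybrid precoding and the baseband combiners $\{\mathbf{V}_k\}$, the subproblem for optimizing the RF combiners $\{\mathbf{W}_m\}$ is formulated as (17). Note that $r_m(\mathbf{W}_m)$ is the achievable rate for user $m$ expressed as a function of $\mathbf{W}_m$, which is separately computed for each user using pre-determined $\{\mathbf{G}_k;\ k \in \mathcal{S}_m\}$ and $\{\mathbf{V}_k;\ k \in \mathcal{S}_m\}$. Specifically, the achievable rate for user $m$ is given by (18). In a similar approach to the RF precoder, the RF combiner can be adjusted by using the conjugate gradient method. As before, the Euclidean gradient of $r_m(\mathbf{W}_m)$ is expressed as in (19), and the Riemannian gradient corresponding to $\nabla r_m(\mathbf{W}_m)$ is computed as in (20). Moreover, we update the RF combiner by projecting to a constant-modulus matrix conforming to (17b) as in (21), where $1 \le p \le N_r$, $1 \le q \le N_r^{\mathrm{RF}}$, and $\mathbf{D}_m(i)$ is the gradient direction matrix for $\mathbf{W}_m$ at the $i$th iteration, which is iteratively computed as in (22). Note that the procedure in (19)-(22a) is separately conducted for each user to update all RF combiners.
Now, given the RF precoder $\mathbf{F}$ and the RF combiners $\{\mathbf{W}_m\}$, we design the baseband precoders and combiners. From (7), the subproblem for optimizing $\{\mathbf{G}_k\}$ and $\{\mathbf{V}_k\}$ is formulated as (23). Considering the RF combining at the receiver, let us denote the baseband combiner for subcarrier $k$ of user $m$ as
$$\mathbf{V}_k = (\mathbf{W}_m^H\mathbf{W}_m)^{-\frac{1}{2}}\bar{\mathbf{V}}_k \tag{24}$$
where $(\mathbf{W}_m^H\mathbf{W}_m)^{-\frac{1}{2}}$ is a whitening filter for the RF combined signal and $\bar{\mathbf{V}}_k \in \mathbb{C}^{N_r^{\mathrm{RF}} \times N_s}$ is the baseband combining matrix except the whitening filter. By substituting (24) into (5), the received signal can be rewritten as
$$\mathbf{y}_{m,k} = \bar{\mathbf{V}}_k^H\tilde{\mathbf{H}}_{m,k}\mathbf{G}_k\mathbf{s}_k + \bar{\mathbf{V}}_k^H\tilde{\mathbf{n}}_{m,k} \tag{25}$$
where $\tilde{\mathbf{H}}_{m,k} = (\mathbf{W}_m^H\mathbf{W}_m)^{-\frac{1}{2}}\mathbf{W}_m^H\mathbf{H}_{m,k}\mathbf{F}$ is the effective channel and $\tilde{\mathbf{n}}_{m,k} = (\mathbf{W}_m^H\mathbf{W}_m)^{-\frac{1}{2}}\mathbf{W}_m^H\mathbf{n}_{m,k}$ is the effective noise vector whose distribution is the same as that of $\mathbf{n}_{m,k}$. By singular value decomposition (SVD), the effective channel $\tilde{\mathbf{H}}_{m,k}$ is factorized as in (26), where $\tilde{\mathbf{V}}_k \in \mathbb{C}^{N_r^{\mathrm{RF}} \times N_s}$ and $\tilde{\mathbf{U}}_k \in \mathbb{C}^{N_t^{\mathrm{RF}} \times N_s}$ are the matrices composed of the left and right singular vectors corresponding to the $N_s$ largest singular values of $\tilde{\mathbf{H}}_{m,k}$, respectively, and $\mathbf{\Lambda}_k \in \mathbb{R}^{N_s \times N_s}$ is a diagonal matrix whose diagonal elements are the $N_s$ largest singular values of $\tilde{\mathbf{H}}_{m,k}$.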
From (25) and (26), $\mathbf{G}_k$ and $\bar{\mathbf{V}}_k$ are given by
$$\mathbf{G}_k = \tilde{\mathbf{U}}_k\mathbf{P}_k^{\frac{1}{2}}, \qquad \bar{\mathbf{V}}_k = \tilde{\mathbf{V}}_k \tag{27}$$
where $\mathbf{P}_k \in \mathbb{R}^{N_s \times N_s}$ is a diagonal matrix for power allocation of subcarrier $k$.
Using $\mathbf{G}_k$ and $\bar{\mathbf{V}}_k$ in (27), the achievable rate for user $m$ can be expressed as a function of $\{\mathbf{P}_k\}$ as in (28), where $\mathbf{D}_k = (1/\sigma_{m,k}^2)\mathbf{\Lambda}_k^2$. Again, by substituting (27a) into (23b), we have $\sum_{k=1}^{K}\mathrm{tr}(\mathbf{C}_k\mathbf{P}_k) = P$, where $\mathbf{C}_k = \mathrm{diag}([c_{k,1}, c_{k,2}, \cdots, c_{k,N_s}])$ and $c_{k,\ell}$ is the $(\ell, \ell)$th element of $\tilde{\mathbf{U}}_k^H\mathbf{F}^H\mathbf{F}\tilde{\mathbf{U}}_k$. Here, let us denote the transmit power to user $m$ as $p_m = \sum_{k \in \mathcal{S}_m}\mathrm{tr}(\tilde{\mathbf{P}}_k)$, where $\tilde{\mathbf{P}}_k = \mathbf{C}_k\mathbf{P}_k$. From (28), the achievable rate for user $m$ can be rewritten as in (31), where $\tilde{\mathbf{P}}_k$ is determined by the water-filling solution using the diagonal elements of $\mathbf{D}_k\mathbf{C}_k^{-1}$ and $p_m$. Using (31), the optimization problem in (23) is reformulated as the power allocation problem (32). Here, finding the optimal $\{p_m\}$ includes determining $\{\tilde{\mathbf{P}}_k\}$ in (30), which means the power allocation across the subcarriers assigned to user $m$. The power allocation problem in (32) can be solved by the numerical gradient-based method described as Algorithm 2 in [31]. As mentioned before, $\tilde{\mathbf{P}}_k$ corresponding to the optimal $p_m$ is determined by the water-filling algorithm. Finally, from (24) and (27), the baseband precoders and combiners are updated as in (33).
In the proposed algorithm, the hybrid precoders and combiners are designed by repeating the optimization of $\mathbf{F}$ and $\{\mathbf{W}_m\}$ using the conjugate gradient methods in (16) and (21), respectively, followed by the update of $\{\mathbf{G}_k\}$ and $\{\mathbf{V}_k\}$ using the SVD and the rate balancing power allocation in (26) and (33). The overall design procedure of the proposed wideband hybrid precoders and combiners is summarized as Algorithm 2, where $\epsilon$ means the tolerance for termination and $T$ is the maximum number of iterations. Note that the convergence of the proposed algorithm is demonstrated via numerical simulations in Section V-A.
Algorithm 2: Design of the proposed wideband hybrid precoders and combiners. Recovered steps: $i = i + 1$; using (11), get the minimum user rate $r_{\min}(i)$ and find the corresponding user index with the minimum rate, $n$; define $\{\tilde{\mathbf{U}}_k\}$ and $\{\bar{\mathbf{V}}_k\}$ in (27).
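The two-level power allocation described above (water-filling inside each user, rate balancing across users) can be illustrated with a small sketch. It is not the numerical gradient-based method of [31]: the outer loop below substitutes a plain bisection on the common rate, the function names are hypothetical, and `gains` stands for the positive effective per-stream gains of one user (the role played by the diagonal elements of $\mathbf{D}_k\mathbf{C}_k^{-1}$).

```python
import math

def water_filling(gains, budget):
    """Powers maximizing sum(log2(1 + g * p)) subject to sum(p) = budget, p >= 0."""
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    powers = [0.0] * len(gains)
    for n_active in range(len(gains), 0, -1):
        active = order[:n_active]
        level = (budget + sum(1.0 / gains[i] for i in active)) / n_active
        # accept this water level once the weakest active stream gets power >= 0
        if level - 1.0 / gains[active[-1]] >= 0:
            for i in active:
                powers[i] = level - 1.0 / gains[i]
            break
    return powers

def user_rate(gains, power):
    """Rate of one user when its power is spread by water-filling."""
    alloc = water_filling(gains, power)
    return sum(math.log2(1.0 + g * p) for g, p in zip(gains, alloc))

def power_for_rate(gains, target, upper, iters=60):
    """Smallest power in [0, upper] reaching the target rate (bisection)."""
    lo, hi = 0.0, upper
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if user_rate(gains, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

def balance_rates(user_gains, total_power, iters=60):
    """Per-user powers that equalize user rates while using the total power."""
    lo = 0.0
    hi = min(user_rate(g, total_power) for g in user_gains)
    for _ in range(iters):
        t = 0.5 * (lo + hi)
        needed = sum(power_for_rate(g, t, total_power) for g in user_gains)
        if needed <= total_power:
            lo = t      # common rate t is affordable, try a higher one
        else:
            hi = t
    return [power_for_rate(g, lo, total_power) for g in user_gains]
```

Since each user's rate grows monotonically with its power, the max-min solution equalizes the user rates at full power, so the user with weaker gains ends up with the larger share of the budget.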

IV. COMPLEXITY ANALYSIS
In this section, we compare the complexity order of the proposed method with those of existing wideband hybrid processing methods. Because Algorithm 2 has much higher computational complexity than Algorithm 1, we only consider the complexity order of Algorithm 2. For simplicity, it is assumed that the same number of subcarriers is assigned to all users, i.e. $K_m = K/M$, and also assumed that $N_t \gg N_t^{\mathrm{RF}}$, $N_r \gg N_r^{\mathrm{RF}}$, and $N_t \ge N_r$. The complexity of the proposed Algorithm 2 is summarized in Table 1, where $J_1$ denotes the number of iterations until convergence. Since $N_t \gg N_t^{\mathrm{RF}}$ and $N_r \gg N_r^{\mathrm{RF}}$, the overall complexity order of the proposed method is given by $O(J_1 K N_t N_r N_s)$.
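To give a feel for the magnitude of this order, the short sketch below evaluates the product $J_1 K N_t N_r N_s$ for one configuration. The helper name `proposed_order` is hypothetical, the values $K = 132$, $N_t = 64$, $N_r = 32$, and $N_s = 2$ are taken from the simulation setup in Section V, and $J_1 = 80$ is an assumed iteration count borrowed from the convergence discussion of Fig. 4. The result is an operation-count order, not a measured runtime.

```python
def proposed_order(j1, k, n_t, n_r, n_s):
    """Leading-order operation count J1 * K * N_t * N_r * N_s."""
    return j1 * k * n_t * n_r * n_s

base = proposed_order(j1=80, k=132, n_t=64, n_r=32, n_s=2)
doubled = proposed_order(j1=80, k=132, n_t=128, n_r=32, n_s=2)
print(base)            # 43253760
print(doubled / base)  # 2.0, the order is linear in N_t
```

Doubling the number of transmit antennas doubles the count, whereas a method whose order is quadratic in $N_t$ would see a fourfold increase for the same change.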
For comparison with the proposed hybrid method, Table 2 presents the time complexity of existing hybrid processing methods when the algorithms are extended to a MIMO-OFDM system with $K$ subcarriers. Here, $J_2$ denotes the number of iterations for convergence and $N_{cb}$ is the codebook size for RF precoding. It is assumed that the codebook size for RF precoding is greater than or equal to that for RF combining in the codebook-based hybrid method. When $J_1 \approx J_2$ and $N_{cb} \approx N_r$, the complexity order of the proposed hybrid method is similar to those of the phase extraction alternating optimization (PE AO) hybrid method and the codebook-based hybrid method. Moreover, the complexity of the proposed method is proportional to $N_t$, whereas the Broyden-Fletcher-Goldfarb-Shanno (BFGS) hybrid method requires a number of operations proportional to $N_t^2$. When $N_t \approx N_r$ and $J_1 > N_t$, the channel averaging-based method has a lower complexity order than the proposed scheme. In Section V-D, the time complexity of the proposed method is compared to existing wideband hybrid processing schemes in terms of the runtime.

V. SIMULATION RESULTS
This section presents numerical simulation results for the convergence of the proposed method and compares the performance of the proposed method (Algorithms 1 and 2) with those of existing wideband hybrid processing techniques in terms of the minimum user rate and the runtime. Specifically, we consider the following processing methods for MIMO-OFDMA systems.
• Fully digital processing: Algorithm 1 is used for subband allocation to users. Each subcarrier uses the fully digital optimal precoder and combiner obtained by SVD of H m,k . The numerical gradient-based method in [31] is used for rate balancing power allocation among users in combination with the water-filling algorithm. This method presents the performance upper bound of the MIMO-OFDMA system with Algorithm 1.
• Proposed hybrid method: Algorithm 1 is used for subband allocation to users. The hybrid precoders and combiners are designed by Algorithm 2 for ensuring rate fairness among users.
• BFGS hybrid method [16], [33]: time-division multiple access (TDMA) is used to support multiple users and all subcarriers are allocated to one user in each time interval for MIMO-OFDM transmission. The BFGS algorithm in [16] is employed to design the RF precoder and combiners, and the SVD-based scheme in [33] is used to design the baseband precoders and combiners. (Footnote 2: The existing hybrid processing methods cannot be directly applied to the MIMO-OFDMA system, because the conventional schemes are based on the matrix factorization approach in [9] to design the RF and baseband precoders from the fully digital precoder under the least squares criterion. Therefore, we use a MIMO-OFDM system along with TDMA for multiuser transmission.)
• PE AO hybrid method [12]: TDMA is used to support multiple users and the MIMO-OFDM transmission is used in each time interval. The common RF processors and the baseband processors for MIMO-OFDM are designed by utilizing the SVD-based phase extraction (PE) scheme and the alternating optimization (AO) method in [12].
• Codebook-based hybrid method [38]: TDMA is used to support multiple users and the MIMO-OFDM transmission is used in each time interval. The codebook-based selection method in [38] is used to determine the RF precoder and combiner. Also, the baseband precoder is designed by the Gram-Schmidt-based greedy algorithm in [38] and the baseband combiner is designed by SVD of the effective channel.
• Channel averaging-based method [49]: TDMA is used to support multiple users and the MIMO-OFDM transmission is used in each time interval. As in [49], the RF precoder and combiners are designed in a similar manner to a single-carrier system by averaging the frequency-domain multiple channels. Given the RF precoder and combiners, the baseband precoders and combiners are designed by the SVD-based approach.
Numerical simulations have been conducted with $N_r = 32$, $N_t^{\mathrm{RF}} = N_r^{\mathrm{RF}} = 4$, and $N_s = 2$. The number of subcarriers $K$ is set to 132 (each subcarrier represents a resource block (RB) of 5G New Radio in [5], assuming that the channel is identical within an RB). For Algorithm 1, the number of subbands is equal to the number of users (i.e. $N_B = M$), and the same number of subcarriers is assigned to all users (i.e. $K_1 = K_2 = \cdots = K_M$). In the proposed Algorithm 2, we set $\alpha = 2$, $\beta = 0.5$, $\epsilon = 0.0001$, and $T = 300$. The maximum number of iterations is set to 300 for the BFGS hybrid method, the PE AO hybrid method, and the channel averaging-based method. For the codebook-based hybrid method, two codebooks were designed with 512 quantized phase shifting vectors considering the angle-of-departure (AoD) and angle-of-arrival (AoA) for the RF precoding and combining, respectively.
To generate the wideband mmWave MU-MIMO channels, we used the Saleh-Valenzuela model for representing geometric channel parameters as in [33], [34], [35], [36], [37], [38], [39], [40], [41], [49] in combination with the tapped delay line A (TDL-A) model in [53] for the channel delay profile. The channel parameters are set as follows: the carrier frequency is 28 GHz; the bandwidth is 200 MHz with 132 RBs; the sampling rate is 245.76 Msps; the transmitter has an $8 \times n$ planar antenna ($4 \le n \le 18$); the users have $4 \times 8$ planar antennas; the number of clusters is 3; the number of subpaths per cluster is 8; the AoD and AoA for each cluster are uniformly distributed from $-\pi$ to $\pi$ in the azimuth direction and from $-0.5\pi$ to $0.5\pi$ in the elevation direction, respectively; the subpath angular spread is set to $\pi/64$ and $\pi/16$ for the transmitter and receiver, respectively; and the inter-element spacing is equal to half a wavelength for both the transmitter and receiver. The average channel gains are set asymmetrically with 10 dB deviation to reflect the distance variation from the transmitter to the receiver. Specifically, the channel gains are set using a random variable $\zeta_k$ that is uniformly distributed in the range of (0.1, 1.0). For simplicity, it is assumed that the noise variance is identical for all subcarriers and all users (i.e. $\sigma_{m,k}^2 = \sigma^2$ for $\forall m, k$), and the nominal signal-to-noise ratio (SNR) is defined as the total transmit power over the noise variance, i.e. $P/\sigma^2$. Fig. 3 shows the transient user rate obtained from an instantaneous channel realization, and Figs. 4-7 present the average of the minimum user rate acquired by averaging over more than 100 independent channel realizations.
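The bandwidth, RB count, and sampling rate listed above can be cross-checked with simple arithmetic. The sketch below assumes a 120 kHz subcarrier spacing, which the text does not state, together with the 12 subcarriers per RB used in 5G New Radio; the constant names are chosen here for illustration.

```python
SCS_HZ = 120_000              # assumed subcarrier spacing (not given in the text)
SUBCARRIERS_PER_RB = 12       # width of one NR resource block in subcarriers
NUM_RB = 132                  # number of RBs stated in the text
BANDWIDTH_HZ = 200_000_000    # channel bandwidth stated in the text
SAMPLE_RATE_HZ = 245_760_000  # sampling rate stated in the text

# spectrum actually occupied by the RBs under the assumed spacing
occupied_hz = NUM_RB * SUBCARRIERS_PER_RB * SCS_HZ
# FFT size implied by the sampling rate under the assumed spacing
fft_size = SAMPLE_RATE_HZ // SCS_HZ

print(occupied_hz)  # 190080000
print(fft_size)     # 2048
```

Under this assumption the 132 RBs occupy 190.08 MHz, which fits within the 200 MHz bandwidth, and the sampling rate corresponds to a power-of-two FFT size.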

A. CONVERGENCE OF PROPOSED ALGORITHM
Using the CSI available at the transmitter, the subbands are allocated to users by Algorithm 1. In Fig. 3, the convergence of the proposed Algorithm 2 is verified through numerical simulations, when $M = 4$, $N_t = 64$, and SNR = 0 or 10 dB. When the number of iterations is less than 10, the achievable rates for all users grow rapidly with the iterations, and the minimum user rate converges to a steady-state value when the number of iterations is greater than 150. The difference of user rates for SNR = 10 dB is larger than that for SNR = 0 dB, because the hybrid processing gain is prominent in the high SNR region. Also, the steady-state user rate is higher in the high SNR regime. For these reasons, the minimum user rate converges faster in the low SNR region than in the high SNR regime.
Fig. 4 compares the convergence characteristics of various wideband hybrid processing methods when $M = 4$, $N_t = 64$, and SNR = 15 dB. The fully digital processing and the codebook-based hybrid method obtain the minimum user rate in closed form without iterations. The proposed hybrid method requires more iterations than the existing wideband hybrid methods. For example, in Fig. 4, the proposed method and the BFGS method converge to their peak values after about 80 and 50 iterations, respectively. In the proposed hybrid method, the precoder is updated along the conjugate gradient direction of the user with the lowest achievable rate for rate balancing. This approach reduces the computational complexity of each iteration, yet requires additional iterations because the user with the minimum achievable rate changes from one iteration to another.
The PE AO hybrid method and the channel averaging-based method converge faster than the BFGS hybrid method and the proposed method, because the PE AO method designs the baseband precoders and combiners via SVD-based approximation and the channel averaging-based method constructs the RF precoder and combiners by simply averaging channel correlation matrices. Despite its slower convergence, the proposed hybrid method achieves a much higher minimum user rate than the existing wideband hybrid processing schemes in the steady state. In the following, the performance will be compared in detail.

B. PERFORMANCE EVALUATION UNDER PERFECT CSI
Considering the rate balancing criterion, the minimum user rate of the proposed method is compared to that of the existing wideband hybrid processing methods, when the transmitter has perfect CSI for all users. The minimum user rate is shown for various wideband hybrid processing techniques according to the SNR in Fig. 5 and the number of transmit antennas in Fig. 6, when $M = 4$. The proposed scheme designs the analog precoder considering the power allocation among users for maximizing the minimum user rate as well as the water-filling-based power assignment within a subband, whereas the existing wideband hybrid techniques determine the analog precoder via the matrix factorization technique without power allocation among subcarriers. Therefore, the proposed hybrid method outperforms the BFGS hybrid method, the PE AO hybrid method, the codebook-based hybrid method, and the channel averaging-based method, irrespective of the SNR region and the number of transmit antennas. Compared to the fully digital processing, which represents the performance upper bound, the proposed method exhibits a performance loss. Considering that, in the proposed method, the RF precoder is common to all subcarriers and each user likewise uses a common RF combiner, this performance loss seems reasonable.
Fig. 7 compares the minimum user rate of the proposed method with those of the conventional wideband hybrid processing schemes across the number of users, when $N_t = 64$ and SNR = 5 dB. In this case, the entire set of 132 subcarriers is divided into $M$ subbands whose sizes are as equal as possible. For example, when $M = 7$, the subbands include 19, 19, 19, 19, 19, 19, and 18 subcarriers, respectively. For all hybrid processing methods, the minimum user rate decreases as the number of users increases, because the number of subcarriers assigned to one user is inversely proportional to $M$. As in Figs. 5 and 6, the proposed hybrid method performs better than the existing wideband hybrid processing methods regardless of the number of users. The performance gap between the fully digital processing and the proposed scheme is about 0.4 to 0.5 bps/Hz when $2 \le M \le 8$. As the number of users increases, the performance difference between the proposed method and the conventional wideband hybrid processing techniques gradually decreases due to the reduction of overall user rates.
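The near-equal division of the 132 subcarriers into $M$ subbands can be sketched in a few lines of Python. The helper name `split_subcarriers` is hypothetical, and the sketch only computes subband sizes; which subband is given to which user is decided by Algorithm 1 and is not modeled here.

```python
def split_subcarriers(num_subcarriers, num_subbands):
    """Return subband sizes that differ by at most one subcarrier."""
    base, extra = divmod(num_subcarriers, num_subbands)
    # The first `extra` subbands receive one additional subcarrier.
    return [base + 1 if i < extra else base for i in range(num_subbands)]


# Sizes for 132 subcarriers and 7 users, as in the example for Fig. 7.
sizes = split_subcarriers(132, 7)
```

For $M = 7$ this reproduces the sizes quoted above (six subbands of 19 subcarriers and one of 18), and for $M = 4$ it gives four subbands of 33 subcarriers each.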

C. IMPACT OF CSI UNCERTAINTY
This subsection evaluates the impact of CSI errors on the performance of various wideband hybrid processing methods, including the proposed scheme. Imperfect channel estimation and/or outdated CSI can lead to CSI errors. When the channel estimation is carried out in the frequency domain, the estimated channel for subcarrier $k$ of user $m$ can be expressed as
$$\hat{H}_{m,k} = H_{m,k} + E_{m,k}, \tag{34}$$
where $E_{m,k} \in \mathbb{C}^{N_r \times N_t}$ is a CSI error matrix whose elements are i.i.d. Gaussian random variables with zero mean, $1 \le m \le M$, and $1 \le k \le K$. Employing the normalized mean square error (NMSE), the average power of CSI errors relative to the corresponding channel gains is denoted as $\sigma_{\epsilon,m}^2$ for user $m$. Considering a practical MIMO-OFDMA system with CSI errors, we perform Algorithm 1 for subband allocation and Algorithm 2 for hybrid processing using $\hat{H}_{m,k}$ in (34) instead of the perfect CSI $H_{m,k}$. For comparison, the precoders and combiners of the fully digital processing and the existing hybrid processing methods are also designed using $\hat{H}_{m,k}$. Fig. 8 presents the minimum user rate according to the NMSE of CSI errors, when $M = 4$, $N_t = 64$, and SNR = 20 dB. In order to simplify numerical simulations, it is assumed that the NMSE is identical for all users, i.e., $\sigma_{\epsilon,1}^2 = \cdots = \sigma_{\epsilon,M}^2 = \sigma_\epsilon^2$. As the NMSE $\sigma_\epsilon^2$ increases, the minimum user rate decreases in all hybrid precoding methods. In particular, when $\sigma_\epsilon^2 > 0.3$, the minimum user rate rapidly decays in all hybrid processing schemes. As in the simulation results under perfect CSI, the proposed hybrid method outperforms the conventional wideband hybrid processing techniques such as the BFGS hybrid method, the PE AO method, the codebook-based method, and the channel averaging-based method, regardless of the NMSE. The performance degradation due to the CSI error is similar in all wideband hybrid processing schemes including the proposed method.
Moreover, compared to the fully digital processing, the performance loss of the proposed scheme is the largest when $\sigma_\epsilon^2 = 0.01$ and gradually decreases as the NMSE increases, because the minimum user rate of the fully digital processing diminishes faster than that of the proposed method.
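To make the CSI error model concrete, the following Python sketch perturbs a channel matrix with i.i.d. zero-mean complex Gaussian errors scaled to a target NMSE. It assumes an additive model in which the estimate equals the true channel plus an error matrix, and it normalizes the error power by the squared Frobenius norm of each channel realization. The function name `add_csi_error` and this per-realization normalization are illustrative assumptions and may differ from the exact definition used for Fig. 8.

```python
import numpy as np


def add_csi_error(channel, nmse, rng):
    """Return a noisy channel estimate with the given target NMSE.

    Assumption: the expected error power equals nmse * ||channel||_F^2.
    """
    # Per-element error variance that yields the target NMSE on average.
    variance = nmse * np.linalg.norm(channel, "fro") ** 2 / channel.size
    # Circularly symmetric complex Gaussian: half the variance per real dimension.
    noise = rng.standard_normal(channel.shape) + 1j * rng.standard_normal(channel.shape)
    return channel + np.sqrt(variance / 2) * noise


# Example with N_r = 32, N_t = 64 and a placeholder random channel.
rng = np.random.default_rng(0)
H = (rng.standard_normal((32, 64)) + 1j * rng.standard_normal((32, 64))) / np.sqrt(2)
H_hat = add_csi_error(H, nmse=0.1, rng=rng)
empirical_nmse = np.linalg.norm(H_hat - H, "fro") ** 2 / np.linalg.norm(H, "fro") ** 2
```

With many matrix entries, `empirical_nmse` should lie close to the target value, which gives a quick sanity check before the noisy estimate is passed to the subband allocation and hybrid processing steps.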

D. RUNTIME COMPARISON
To compare the time complexity of various wideband hybrid processing methods, the average runtime is measured through numerical simulations. In the simulation, we implemented the hybrid processing algorithms using MATLAB R2022b and executed them on a server with an i7-12700 4.9 GHz CPU, 16 GB RAM, and a 64-bit operating system. Fig. 9 compares the average runtime of the proposed hybrid method with that of the existing wideband hybrid processing schemes according to the number of antennas, when $M = 4$ and SNR = 5 dB. Every value was obtained by averaging the runtime over at least 100 independent simulations. The execution time of the proposed method includes the procedures for Algorithms 1 and 2, and the runtime of the conventional hybrid processing schemes is the average value for TDMA-based hybrid processing of $M$ users. From the complexity analysis in Table 2, the proposed method, the PE AO hybrid method, and the codebook-based method have the complexity order $O(N_t)$, whereas the BFGS hybrid method and the channel averaging-based method have the complexity order $O(N_t^2)$. As expected from the complexity analysis, the runtime of the BFGS hybrid method grows rapidly as the number of transmit antennas increases. The runtime of the channel averaging-based method grows more slowly than that of the BFGS scheme, because we set $J_1 = J_2 > K$ in the simulation and the channel averaging process is suitable for parallel operations. Overall, the runtime of the proposed hybrid method is similar to that of the PE AO hybrid method and the codebook-based hybrid method, and it requires less computation time than the BFGS hybrid method. The proposed method requires more runtime than the channel averaging-based method, yet it achieves substantial rate gains over the channel averaging-based method, as shown in Figs. 5-8.

VI. CONCLUSION
For mmWave MIMO-OFDMA systems, a greedy-based subband allocation algorithm was developed and a new design method was proposed for hybrid precoding and combining under the max-min rate criterion. In order to increase the minimum user rate, the proposed method iteratively updates the RF precoder and combiners using the conjugate gradient method and adjusts the baseband precoders and combiners using SVD and rate balancing power allocation. The proposed method achieves a higher minimum user rate than the existing wideband hybrid processing schemes, while requiring a complexity order comparable to that of the conventional hybrid techniques. The proposed method can be exploited for hybrid precoding and combining in future 5G-Advanced and 6G mobile networks operating in mmWave and sub-Terahertz bands. For future studies, it is important to take into account practical imperfections such as CSI uncertainty associated with the implementation of mmWave wideband systems. Moreover, the proposed wideband hybrid processing technique can be combined with an intelligent reflecting surface (IRS) to enhance the link performance of future wireless systems, and the joint optimization of the hybrid precoders, the hybrid combiners, and the phase shifts of IRS elements is a promising topic for future research.