Iterative Analog–Digital Multi-User Equalizer for Wideband Millimeter Wave Massive MIMO Systems

Most of the previous work on hybrid transmit and receive beamforming has focused on narrowband channels. Because millimeter-wave channels are expected to be wideband, it is crucial to develop efficient solutions for frequency-selective channels. In this regard, this paper proposes an iterative analog–digital multi-user equalizer for the uplink of wideband millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems. By iterative equalizer we mean that both the analog and digital parts are updated using the estimates obtained at the previous iteration as input. The proposed iterative analog–digital multi-user equalizer is designed by minimizing the sum of the mean square errors of the data estimates over the subcarriers. We assume that the analog part is fixed for all subcarriers, while the digital part is computed on a per-subcarrier basis. Due to the complexity of the resulting optimization problem, a sequential approach is proposed to compute the analog phase shifter values for each radio frequency (RF) chain. We also derive an accurate semi-analytical approach for obtaining the bit error rate (BER) of the proposed hybrid system. The proposed solution is compared with other hybrid equalizer schemes recently designed for wideband mmWave massive MIMO systems. The simulation results show that the performance of the developed analog–digital multi-user equalizer is close to that of its full-digital counterpart and that it outperforms the previous hybrid approach.


Introduction
The underutilized millimeter-wave (mmWave) frequency spectrum has been explored for future wideband cellular communication networks because the conventional sub-6 GHz bands are overcrowded [1]. Together with mmWave, the use of a large (massive) number of antennas enables higher data rates in future wireless networks [2]. Therefore, mmWave communications and massive MIMO (mMIMO) are considered two key technologies for future 5G communications [3].
The combination of mmWave with mMIMO is very attractive because mmWave signals have a smaller wavelength than those of current communication systems, so more antennas can be packed into the same volume [4]. This combination offers more degrees of freedom, but it also leads to more correlated channels [5]; thus, new and efficient beamforming and spatial multiplexing techniques must be exploited at both the transmitter and the receiver [6]. Furthermore, the power consumption and high cost of analog-to-digital converters (ADCs), digital-to-analog converters (DACs), mixers

Related Works
In this section, we briefly review the state of the art on hybrid analog-digital architectures. Beamforming approaches designed for narrowband single-user mmWave communication systems have been proposed in [11][12][13]. In particular, the authors of [11] designed a precoder based on a bird swarm algorithm with matrix-inversion bypass, to avoid the high complexity of the orthogonal matching pursuit method, namely its implicit matrix inversion. In [12], a general solution for single-user narrowband systems was proposed to convert any existing precoder/combiner designed for the fully digital structure into an analog-digital precoder/combiner for the hybrid structure. The nonconvex problem is decoupled into a series of convex subproblems, which are then solved by a singular value decomposition-based technique to obtain an initial solution. Next, the phase increment of each entry of the RF precoder/combiner from one iteration to the next is restricted. The approaches discussed above assume a fully connected hybrid architecture, where each RF chain is connected to all antennas. However, this architecture may require a large number of connections; one possible solution is the subconnected hybrid architecture presented in [13], where each RF chain is connected only to a subset of antennas.
Transmit and receive beamforming approaches designed for narrowband multi-user mmWave communication systems have also been proposed in [14][15][16][17][18][19][20][21]. The authors of [14] designed an uplink receive beamformer for single-antenna user terminals (UTs) that handles the multi-user interference in two stages: analog and digital. In [15], four different digital precoding techniques combined with a hybrid beamformer were proposed for multicell systems to reduce inter-beam interference. In [16], an iterative matrix decomposition based on the subspace projection of the angles of departure (AoD) of the channel paths was proposed to perform block diagonalization in downlink narrowband mmWave channels, in order to obtain interference-free channels for the UTs. The authors of [17] designed an iterative hybrid precoder and combiner algorithm by exploiting the duality of the uplink and downlink multi-user mmWave MIMO channels. A hybrid weighted minimum mean square error (MMSE) precoding and combining scheme was proposed in [18]; in this algorithm, the hybrid precoders and combiners are optimized and updated iteratively to minimize a weighted sum-MSE cost function. The authors of [19] proposed a linear detector for uplink systems in which, to avoid the computationally expensive direct matrix inversion of the MMSE detector, an improved Jacobi iterative algorithm is used. This algorithm accelerates the convergence rate of the signal detection process and applies a whole-correction method to update the iterative process. In [20], a precoder was proposed for a downlink multi-user system with a partially connected hybrid beamforming architecture, where the analog part of the precoder is composed of both variable and constant phase shifters. Additionally, the sum rate was considered as a metric, and a greedy algorithm was employed to reduce complexity, since the exact solution of the combinatorial problem is mathematically intractable. A hybrid iterative block space-time equalizer based on IB-DFE principles [22] was proposed in [21].
Sensors 2020, 20, 575
The previous works mainly focused on hybrid beamforming approaches for either single-user or multi-user narrowband systems. However, the design of hybrid solutions for wideband mmWave mMIMO communications is of paramount importance. Hybrid approaches specifically designed for single-user systems can be found in [23][24][25]. Hybrid precoding solutions and codebooks for limited-feedback wideband mmWave systems were discussed in [23], where it was assumed that the digital precoding is performed in the frequency domain and can differ across subcarriers, while the analog precoder is constant over the subcarriers. To flatten the fading channel over a wide band and maximize the system capacity, a capacity maximization algorithm constrained by the signal-to-interference ratio was proposed in [24] to design the precoder and the combiner. A closed-form solution for fully connected hybrid analog/digital precoding in orthogonal frequency division multiplexing (OFDM) frequency-selective mmWave single-user systems was developed in [25]. This approach was then extended to the partially connected case, and a novel technique that dynamically constructs the hybrid subarrays based on the long-term channel characteristics was proposed. Recently, solutions for wideband mmWave multi-user massive MIMO-OFDM systems were proposed for the downlink in [26] and for the uplink in [27][28][29][30][31]. In [26], hybrid precoders for downlink OFDM wideband mMIMO systems were designed to minimize the total transmit power of the base station, subject to both signaling coverage constraints and user data rate requirements.
The authors of [27] designed a joint spatio-radio resource allocation scheme and three hybrid precoding algorithms for systems with limited feedback: (1) a hybrid precoder with user-beam selection to maximize the sum proportional rate; (2) a low-complexity suboptimal solution using limited statistical channel state information (CSI) feedback; and (3) a k-means algorithm based on unsupervised machine learning. In [28], a hybrid linear equalizer that minimizes the average BER over all subcarriers was designed for sub-connected hybrid architectures. A two-step hybrid equalizer, also for sub-connected architectures but with dynamic subarray antennas, was designed in [29]: in the first step, the antennas are dynamically mapped to the RF chains, and in the second step, an iterative digital equalizer is designed. The two-step approach was also applied in [30,31], in both cases for fully connected architectures, using the constant envelope OFDM (CE-OFDM) modulation technique in [30] and SC-FDMA in [31].

Contributions
The major novelty of this work is the design of a fully iterative hybrid multi-user equalizer, in which both the analog and digital parts of the equalizer are computed iteratively, allowing better performance than the two-step approaches with a fixed analog part proposed by the authors in [29][30][31]. Both the analog and digital parts are derived by minimizing the sum of the mean square errors, which can be shown to be equivalent to minimizing a weighted difference between the hybrid and the fully digital equalizers. For this, we assume that the analog part is constant over all subcarriers and that the digital part is computed on a per-subcarrier basis. Due to the complexity of the optimization problem, we propose an approach that sequentially computes the analog phase shifters for each RF chain, i.e., we first compute the analog coefficients for RF chain 1, then for RF chain 2, and so on. The computational complexity of the proposed fully iterative analog-digital equalizer is higher than that of the two-step approaches [29][30][31], but its performance is clearly better. Moreover, the simulation results show that the proposed scheme achieves a performance close to that of the fully digital equalizer.
The remainder of this paper is organized as follows: Section 2 describes the system model adopted in this work. Section 3 describes the analog precoders employed at each UT. In Section 4, we design the hybrid iterative analog-digital multi-user space-frequency equalizer. Section 5 describes the main performance results. Finally, the conclusions are presented in Section 6.

Notations
For any matrix $\mathbf{A}$, denoted by boldface capital letters, or any column vector $\mathbf{a}$, denoted by boldface lowercase letters, $\mathrm{tr}(\cdot)$, $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ denote the trace, conjugate, transpose, and Hermitian of a matrix, respectively. The operator $\mathrm{sign}(a)$ represents the sign of a real number $a$; it is applied element-wise to matrices, and $\mathrm{sign}(c) = \mathrm{sign}(\Re(c)) + j\,\mathrm{sign}(\Im(c))$ if $c \in \mathbb{C}$. The functions $\Re(c)$ and $\Im(c)$ represent the real and imaginary parts of $c$, respectively. $\{\alpha_l\}_{l=1}^{L}$ represents a sequence of length $L$. The function $\mathrm{diag}(\mathbf{a})$ gives a diagonal matrix $\mathbf{A}$ whose diagonal entries are equal to the vector $\mathbf{a}$, and $\mathrm{diag}(\mathbf{A})$ gives a vector equal to the diagonal entries of the matrix $\mathbf{A}$. The vector $\mathbf{a} = [\mathbf{a}_q]_{1\le q\le Q_1} \in \mathbb{C}^{Q_1Q_2}$ and the matrix $\mathbf{A} = [\mathbf{A}_q]_{1\le q\le Q_1} \in \mathbb{C}^{Q_1Q_2\times L}$ are the concatenations of the vectors $\mathbf{a}_q \in \mathbb{C}^{Q_2}$ and the matrices $\mathbf{A}_q \in \mathbb{C}^{Q_2\times L}$, respectively. The element in row $n$ and column $l$ of $\mathbf{A}$ is denoted by $\mathbf{A}(n,l)$. The matrix $\mathbf{I}_N$ is the $N\times N$ identity matrix. Finally, the indices $t$, $k$, and $u$ represent the time domain, the subcarrier in the frequency domain, and the user terminal, respectively.

System Model Characterization
In this manuscript, we consider an uplink mmWave system with $U$ users sharing the same radio resources, each equipped with a single RF chain and $N_{tx}$ transmit antennas and transmitting a single data stream per subcarrier. The base station is equipped with $N_{RF}^{rx}$ RF chains and $N_{rx}$ receive antennas, with $U \le N_{RF}^{rx} \le N_{rx}$ [14]. The considered uplink system uses SC-FDMA as the access technique, with $N_c$ available subcarriers.
We consider the clustered channel discussed in [23], where the delay-$d$ MIMO channel matrix of the $u$th user, $\mathbf{H}_{u,d}$, is the sum of the contributions of $N_{cl}$ clusters, each contributing $N_{ray}$ propagation paths, as in (1), where $\alpha^{u}_{q,l}$ is the complex path gain of the $l$th ray in the $q$th scattering cluster, and a raised-cosine filter is adopted for the pulse-shaping function $p_{rc}(\cdot)$ for $T_S$-spaced signaling, as in [23]. The $q$th cluster has a time delay $\tau^{u}_{q}$, while each ray $l$ of the $q$th cluster has a relative time delay $\tau^{u}_{q,l}$. The angles $\phi^{rx,u}_{q,l}$, $\theta^{rx,u}_{q,l}$ and $\phi^{tx,u}_{q,l}$, $\theta^{tx,u}_{q,l}$ are the azimuth and elevation angles of arrival and departure, respectively. For instance, $\phi^{rx,u}_{q,l}$ has a Laplacian distribution with mean $\phi^{rx,u}_{q}$, uniformly distributed in $[0, 2\pi]$, and variance $\sigma^2$; the remaining angles have similar distributions. The path delays are uniformly distributed in $[0, DT_s]$, and the angles follow the random distribution described in [23], such that $\mathbb{E}[\|\mathbf{H}_{u,d}\|_F^2] = N_{rx}N_{tx}$. Finally, the vectors $\mathbf{a}_{rx,u}$ and $\mathbf{a}_{tx,u}$ denote the receive and transmit array response vectors, respectively. For a uniform planar array (UPA) in the yz-plane with $N_y$ and $N_z$ elements on the y and z axes, respectively, the array response vector is given by

$$\mathbf{a}(\phi,\theta) = \frac{1}{\sqrt{N_yN_z}}\left[1, \ldots, e^{j\frac{2\pi}{\lambda}\gamma\left(m\sin\phi\sin\theta + n\cos\theta\right)}, \ldots, e^{j\frac{2\pi}{\lambda}\gamma\left((N_y-1)\sin\phi\sin\theta + (N_z-1)\cos\theta\right)}\right]^T, \tag{2}$$

where $\lambda$ is the wavelength, $\gamma$ is the inter-element spacing, $0 \le m < N_y$, and $0 \le n < N_z$. The uniform linear array (ULA) is obtained by setting $N_y = 1$ or $N_z = 1$. $\rho_{PL}$ denotes the path loss between the transmitter and the receiver separated by a distance $d$, such that [32]

$$\rho_{PL,dB} = \rho_{PL0,dB} + 10\,n_p\log_{10}\left(\frac{d}{d_0}\right) + S_{\sigma_s},$$

where $\rho_{PL,dB} = 10\log_{10}(\rho_{PL})$ and $d_0$ is the reference distance. $\rho_{PL0,dB}$ represents the free-space loss at distance $d_0$, $n_p$ is the path-loss exponent, and $S_{\sigma_s}$ is the log-normal shadowing with a standard deviation of $\sigma_s$ (dB). The frequency-domain channel $\mathbf{H}_{u,k} \in \mathbb{C}^{N_{rx}\times N_{tx}}$ of the $u$th user at subcarrier $k$ is given by

$$\mathbf{H}_{u,k} = \sum_{d=0}^{D-1}\mathbf{H}_{u,d}\,e^{-j\frac{2\pi k d}{N_c}},$$

which can also be expressed as

$$\mathbf{H}_{u,k} = \mathbf{A}_{rx,u}\boldsymbol{\Delta}_{u,k}\mathbf{A}_{tx,u}^H,$$

where $\boldsymbol{\Delta}_{u,k}$ is a diagonal matrix whose entries $(q,l)$ correspond to the path gains of the $l$th ray in the $q$th scattering cluster.
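The matrices $\mathbf{A}_{tx,u} = [\mathbf{a}_{tx,u}(\phi^{tx,u}_{1,1},\theta^{tx,u}_{1,1}), \ldots, \mathbf{a}_{tx,u}(\phi^{tx,u}_{N_{cl},N_{ray}},\theta^{tx,u}_{N_{cl},N_{ray}})]$ and $\mathbf{A}_{rx,u} = [\mathbf{a}_{rx,u}(\phi^{rx,u}_{1,1},\theta^{rx,u}_{1,1}), \ldots, \mathbf{a}_{rx,u}(\phi^{rx,u}_{N_{cl},N_{ray}},\theta^{rx,u}_{N_{cl},N_{ray}})]$ hold the transmit and receive array response vectors of the $u$th user, respectively.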
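The channel quantities above can be prototyped numerically. The following Python sketch (standard library only) implements the UPA array response (2), the log-distance path-loss law with shadowing, and the per-subcarrier channel as a DFT of the delay taps. It assumes half-wavelength spacing ($\gamma/\lambda = 0.5$), unit-norm array vectors, elements stacked with $m$ as the outer index, and the DFT convention $e^{-j2\pi kd/N_c}$; the function names are hypothetical and not taken from the paper.

```python
import cmath
import math


def upa_response(phi, theta, n_y, n_z, spacing=0.5):
    """Array response of a UPA in the yz-plane following (2).

    spacing is gamma / lambda (assumed 0.5). Elements are stacked with m as
    the outer index (assumption). n_y = 1 or n_z = 1 yields a ULA.
    """
    scale = 1.0 / math.sqrt(n_y * n_z)          # unit-norm vector
    kd = 2.0 * math.pi * spacing
    return [
        scale * cmath.exp(1j * kd * (m * math.sin(phi) * math.sin(theta) + n * math.cos(theta)))
        for m in range(n_y)
        for n in range(n_z)
    ]


def path_loss_db(d, d0, pl0_db, n_p, shadowing_db=0.0):
    """rho_PL,dB = rho_PL0,dB + 10 n_p log10(d / d0) + S_sigma_s.

    shadowing_db is one realization of S_sigma_s (0 gives the mean path loss).
    """
    return pl0_db + 10.0 * n_p * math.log10(d / d0) + shadowing_db


def frequency_channel(taps, k, n_c):
    """H_{u,k} = sum_d H_{u,d} exp(-j 2 pi k d / N_c); taps[d] is H_{u,d} as a list of rows."""
    rows, cols = len(taps[0]), len(taps[0][0])
    h_k = [[0j] * cols for _ in range(rows)]
    for d, h_d in enumerate(taps):
        w = cmath.exp(-2j * math.pi * k * d / n_c)   # phase rotation of tap d at subcarrier k
        for r in range(rows):
            for c in range(cols):
                h_k[r][c] += h_d[r][c] * w
    return h_k


a = upa_response(0.3, math.pi / 2, 4, 4)
print(sum(abs(x) ** 2 for x in a))                # squared norm of the response vector
print(frequency_channel([[[1.0]], [[1.0]]], 2, 4))  # two equal taps cancel at k = N_c / 2
```

For example, `path_loss_db(100.0, 1.0, 61.4, 4.17, random.gauss(0.0, 9.0))` would draw one shadowed path loss with $n_p = 4.17$ and $\sigma_s = 9$ dB; the 61.4 dB reference loss is only a placeholder.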
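At the $k$th subcarrier, the received signal is given by

$$\mathbf{y}_k = \sum_{u=1}^{U}\mathbf{H}_{u,k}\mathbf{x}_{u,k} + \mathbf{n}_k,$$

where $\mathbf{n}_k \in \mathbb{C}^{N_{rx}}$ denotes zero-mean Gaussian noise with variance $\sigma_n^2$, and $\mathbf{x}_{u,k} \in \mathbb{C}^{N_{tx}}$ represents the discrete transmitted complex baseband signal of the $u$th user at subcarrier $k$.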
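A direct transcription of the received-signal model at one subcarrier is sketched below, assuming the single-stream model $\mathbf{x}_{u,k} = \mathbf{f}_{a,u}c_{u,k}$ (a pure analog precoder $\mathbf{f}_{a,u}$ carrying the scalar $c_{u,k}$). Matrices are plain Python lists of rows, and the noise vector is passed in explicitly so the example stays deterministic; `received_signal` is a hypothetical name.

```python
def received_signal(channels, precoders, symbols, noise):
    """y_k = sum_u H_{u,k} f_{a,u} c_{u,k} + n_k at a single subcarrier k.

    channels[u]: N_rx x N_tx matrix H_{u,k} (list of rows); precoders[u]: f_{a,u};
    symbols[u]: c_{u,k}; noise: n_k (length N_rx).
    """
    y = [complex(v) for v in noise]
    for h_u, f_u, c_u in zip(channels, precoders, symbols):
        for r in range(len(y)):
            # r-th entry of the effective channel H_{u,k} f_{a,u}, scaled by the user's symbol
            y[r] += sum(h_u[r][t] * f_u[t] for t in range(len(f_u))) * c_u
    return y


# Two single-antenna users observed by two receive antennas, noiseless
print(received_signal([[[1], [0]], [[0], [1]]], [[1], [1]], [1 + 1j, -1], [0, 0]))
```

Each user contributes through its effective channel vector $\mathbf{H}_{u,k}\mathbf{f}_{a,u}$, which is what the receiver ultimately equalizes.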

Transmitter Design
In this section, we describe the proposed transmitter; the receiver design is discussed in the next section. The block diagram of the $u$th user terminal is presented in Figure 1. We consider M-QAM constellations, where the data symbols $s_{u,t}$ have $\mathbb{E}[|s_{u,t}|^2] = \sigma_u^2$. The sequence $\{s_{u,t}\}_{t=1}^{N_c}$ is divided into $R$ data blocks of size $S = N_c/R$, where $\{s_{u,t}\}_{t=(r-1)S+1}^{rS}$ denotes the $r$th data block, and its DFT, $\{c_{u,k}\}_{k=(r-1)S+1}^{rS}$, transforms it from the time domain to the frequency domain. After this time-frequency transformation, the frequency-domain data are interleaved and mapped to the OFDM symbols. Finally, the cyclic prefix (CP) is inserted in the blocks associated with each RF chain. Since the proposed hybrid equalizers detect each S-length data block independently, we focus on a single S-length data block to simplify the mathematical formulation [10]. Therefore, hereinafter the index $r$ is omitted and the formulation is done for a single frequency-domain S-length block.
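The block segmentation and time-to-frequency transformation described above can be sketched as follows. The sketch assumes a unitary S-point DFT (scaled by $1/\sqrt{S}$) and omits the interleaving, subcarrier mapping, and CP insertion; `sc_fdma_blocks` is a hypothetical name.

```python
import cmath
import math


def sc_fdma_blocks(s, block_size):
    """Split {s_{u,t}} into R = N_c / S blocks and return the S-point DFT {c_{u,k}} of each."""
    if len(s) % block_size:
        raise ValueError("N_c must be a multiple of S")
    blocks = []
    for r in range(len(s) // block_size):
        blk = s[r * block_size:(r + 1) * block_size]   # r-th time-domain block (0-based r)
        blocks.append([
            # unitary 1/sqrt(S) normalization is an assumption of this sketch
            sum(blk[t] * cmath.exp(-2j * math.pi * k * t / block_size) for t in range(block_size))
            / math.sqrt(block_size)
            for k in range(block_size)
        ])
    return blocks


# N_c = 8 symbols split into R = 2 blocks of S = 4
print(sc_fdma_blocks([1, 0, 0, 0, 0, 1, 0, 0], 4))
```

With the unitary scaling, each block keeps its energy, so time-domain and frequency-domain error variances coincide, which is convenient when MSEs are later averaged over subcarriers.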

The analog precoder is modeled mathematically by the vector $\mathbf{f}_{a,u} \in \mathbb{C}^{N_{tx}}$. Because of hardware constraints, only analog phase shifters are employed, which forces all elements of $\mathbf{f}_{a,u}$ to have the constant amplitude $|\mathbf{f}_{a,u}(n)| = 1/\sqrt{N_{tx}}$. The analog precoder is independent of the subcarrier index $k$, i.e., it is the same for all subcarriers. These assumptions are followed by most of the recent works on hybrid massive MIMO mmWave systems [23][24][25].
The discrete transmitted complex baseband signal $\mathbf{x}_{u,k} \in \mathbb{C}^{N_{tx}}$ of the $u$th user at subcarrier $k$ may be expressed as

$$\mathbf{x}_{u,k} = \mathbf{f}_{a,u}c_{u,k},$$

where $c_{u,k} \in \mathbb{C}$.
Since the aim is to design a multi-user equalizer, in this paper we consider the pure analog precoders discussed in [29,31], which are briefly described here. In the first case, presented in [31], we consider that the UT has no access to CSI, which simplifies the overall design; in the second case, designed in [29], we consider an analog precoder based on partial CSI at the transmitter, i.e., only the average AoDs are assumed to be known at the transmitters.

Analog Precoder Design: No CSI at User Terminals
In this case, the analog precoder vector of the $u$th user is generated randomly according to [31].
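A minimal sketch of a no-CSI precoder is given below. It assumes that the random precoder draws i.i.d. phases uniformly in $[0, 2\pi)$ and applies the constant amplitude $1/\sqrt{N_{tx}}$ required by the phase shifters; the exact random law used in [31] may differ, and `random_phase_precoder` is a hypothetical name.

```python
import cmath
import math
import random


def random_phase_precoder(n_tx, rng):
    """f_{a,u}(n) = exp(j theta_n) / sqrt(N_tx), theta_n ~ U[0, 2 pi) i.i.d. (assumed form)."""
    return [cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) / math.sqrt(n_tx) for _ in range(n_tx)]


f = random_phase_precoder(16, random.Random(7))   # seeded for reproducibility
print([round(abs(x) ** 2, 6) for x in f[:4]])      # every entry has power 1 / N_tx
```

Passing an explicit `random.Random` instance keeps each user's precoder reproducible across simulation runs.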

Analog Precoder Design: Average AoD knowledge at User Terminals
For the second case, it is assumed that the $u$th user knows the average AoDs, $\phi^{tx,u}_q$ and $\theta^{tx,u}_q$, $q = 1,\ldots,N_{cl}$, of each cluster of its own channel. Based on this knowledge, user $u$ computes the correlation matrix $\bar{\mathbf{A}}_{tx,u}\bar{\mathbf{A}}_{tx,u}^H$, where the matrix $\bar{\mathbf{A}}_{tx,u}$, given in [29], is built from the vectors $\mathbf{a}_{tx,u}(\theta^{u}_{q})$ computed from (2) for the UPA case. Let $\bar{\mathbf{A}}_{tx,u}\bar{\mathbf{A}}_{tx,u}^H = \boldsymbol{\Lambda}_{tx,u}\boldsymbol{\Sigma}_{tx,u}\boldsymbol{\Lambda}_{tx,u}^H$ be the eigenvalue decomposition of this correlation matrix; the analog precoder vector of the $u$th user is then set according to (10) [29], with $1 \le n \le N_{tx}$. As mentioned, only partial CSI (the average AoDs $\phi^{tx,u}_q$ and $\theta^{tx,u}_q$) is required at each UT; it may be obtained at the base station, quantized with $b$ bits per angle, and then sent to each user terminal through a feedback channel. For instance, for a channel with $N_{cl} = 4$ and $b = 4$, this results in a feedback of just $2 \times 4 \times 4 = 32$ bits. Notice that channel correlation-based approaches would require the feedback and/or estimation of the full correlation matrix, which has $N_{tx}^2$ entries. Therefore, the proposed analog precoder has a clear advantage over the several channel correlation SVD-based beamforming schemes proposed in the literature [25], since it has both low complexity and low CSI requirements.
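The PCSI construction can be sketched under simplifying assumptions: a ULA (the $N_y = 1$ case of (2)) with half-wavelength spacing and elevation angles only, the dominant eigenvector of $\bar{\mathbf{A}}_{tx,u}\bar{\mathbf{A}}_{tx,u}^H$ obtained by power iteration, and a phase-only projection of that eigenvector as the precoder. Taking the phases of the dominant column of $\boldsymbol{\Lambda}_{tx,u}$ is our reading of (10), not a verified reproduction; all function names are hypothetical. The helper `feedback_bits` reproduces the $2N_{cl}b$ count from the text.

```python
import cmath
import math


def ula_response(theta, n, spacing=0.5):
    """ULA case of (2) (N_y = 1): element i has phase 2*pi*spacing*i*cos(theta)."""
    return [cmath.exp(2j * math.pi * spacing * i * math.cos(theta)) / math.sqrt(n) for i in range(n)]


def pcsi_precoder(avg_aods, n_tx, iters=200):
    """Phase-only precoder from the dominant eigenvector of A_bar A_bar^H (assumed reading of (10))."""
    cols = [ula_response(t, n_tx) for t in avg_aods]   # one column of A_bar per cluster
    corr = [[sum(c[i] * c[j].conjugate() for c in cols) for j in range(n_tx)] for i in range(n_tx)]
    # Power iteration started from the strongest column of the Hermitian PSD matrix,
    # so the starting vector lies in its range and never maps to zero
    j0 = max(range(n_tx), key=lambda j: sum(abs(corr[i][j]) ** 2 for i in range(n_tx)))
    v = [corr[i][j0] for i in range(n_tx)]
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n_tx)) for i in range(n_tx)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    # Keep only the phases so that every entry has modulus 1 / sqrt(N_tx)
    return [cmath.exp(1j * cmath.phase(x)) / math.sqrt(n_tx) for x in v]


def feedback_bits(n_cl, b):
    """Two average angles (azimuth and elevation) per cluster, b bits each."""
    return 2 * n_cl * b


f = pcsi_precoder([0.8, 1.9, 2.5], 8)   # hypothetical average AoDs (rad) for N_cl = 3
print(feedback_bits(4, 4))
```

With a single cluster, the correlation matrix has rank one and the sketch returns the cluster's array response up to a common phase, i.e., the precoder steers exactly toward the average AoD.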

Analog-Digital Receiver Design
In this section, we first derive the proposed fully iterative multi-user equalizer and then present a complexity comparison with the suboptimal two-step approach designed for fully connected architectures in [31].

Iterative Analog-Digital Equalizer
We assume a hybrid iterative S-length block space-frequency decoder at the receiver, as shown in Figure 2. At the $k$th subcarrier, the equalized signal of all users, $\tilde{\mathbf{c}}^{(i)}_k$, is given by (11), where $\mathbf{W}^{(i)}_a \in \mathbb{C}^{N_{rx}\times N_{RF}^{rx}}$ denotes the analog part of the feed-forward matrix at the $i$th iteration, and $\mathbf{W}^{(i)}_{d,k}$ and $\mathbf{B}^{(i)}_{d,k} \in \mathbb{C}^{U\times U}$ denote the digital parts of the feed-forward and feedback matrices, respectively, computed at the $i$th iteration for the $k$th subcarrier. The vector $\hat{\mathbf{c}}^{(i-1)}_k$ is the FFT of the block of time-domain estimates, conditioned on the detector output, of each user $u$ at the previous iteration, and $\hat{s}_{u,t}$ denotes the hard decision on the data symbols of user $u$ at iteration $i$.
Figure 2. Proposed receiver structure.
The received signal is first processed by the analog phase shifters, with $|\mathbf{W}_a(n,l)|^2 = N_{rx}^{-1}$, followed by baseband processing over the $N_{RF}^{rx}$ RF chains. The digital baseband processing includes a closed feedback loop that employs a forward path and a feedback path for each subcarrier.
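In the forward path, the signal first passes through the linear filter $\mathbf{W}^{(i)}_{d,k}$. Assuming that $\mathbf{c}_k$ is approximately Gaussian distributed, and since the input-output relationship between $\mathbf{c}_k$ and $\hat{\mathbf{c}}^{(i)}_k$ is memoryless, by the Bussgang theorem [33], $\hat{\mathbf{c}}^{(i)}_k$ is approximately given by

$$\hat{\mathbf{c}}^{(i)}_k \approx \boldsymbol{\Psi}^{(i)}\mathbf{c}_k + \hat{\boldsymbol{\epsilon}}^{(i)}_k,$$

where $\hat{\boldsymbol{\epsilon}}^{(i)}_k$ is a zero-mean error vector uncorrelated with $\mathbf{c}_k$, $k \in \{1,\ldots,S\}$, and $\boldsymbol{\Psi}^{(i)} \in \mathbb{C}^{U\times U}$ is a diagonal matrix whose $u$th element gives a blockwise reliability measure of the estimates of the $u$th block at the $i$th iteration [22]. The coefficients of each block can be estimated at the receiver, as discussed in [10].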
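To illustrate what $\boldsymbol{\Psi}^{(i)}$ measures, the sketch below computes a genie-aided empirical Bussgang coefficient $\psi = \sum \hat{c}\,c^* / \sum |c|^2$ for one user's QPSK block with hard decisions (the complex sign from the Notations). It uses the transmitted symbols, which the receiver does not know; practical estimators are discussed in [10]. Because a unitary DFT preserves inner products, computing the ratio on the time-domain symbols gives the same value as on the frequency-domain samples. The noise level, block length, and function names are arbitrary, hypothetical choices.

```python
import math
import random


def sgn(x):
    """Real sign function returning -1, 0 or 1."""
    return (x > 0) - (x < 0)


def csign(z):
    """Complex sign from the Notations: sign(Re z) + j sign(Im z)."""
    return complex(sgn(z.real), sgn(z.imag))


def reliability(tx, est):
    """Empirical Bussgang coefficient psi = sum(est * conj(tx)) / sum(|tx|^2) (real part)."""
    num = sum(e * t.conjugate() for e, t in zip(est, tx))
    return (num / sum(abs(t) ** 2 for t in tx)).real


rng = random.Random(1)
tx = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2) for _ in range(4096)]
noisy = [t + complex(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for t in tx]
psi_noisy = reliability(tx, [csign(y) / math.sqrt(2) for y in noisy])   # unreliable decisions
psi_clean = reliability(tx, [csign(t) / math.sqrt(2) for t in tx])      # perfect decisions
print(psi_clean, psi_noisy)
```

A coefficient near 1 means the feedback estimates are trustworthy, while $\psi = 0$ (as assumed at the first iteration) disables the feedback contribution.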
The error between the estimated signal before the S-IFFT, $\tilde{\mathbf{c}}^{(i)}_k$, given by (11), and the transmitted signal after the S-FFT, $\mathbf{c}_k$, may be expressed by (13), where $\mathbf{H}_k = [\mathbf{H}_{1,k}\mathbf{f}_{a,1},\ldots,\mathbf{H}_{U,k}\mathbf{f}_{a,U}]$ is the effective channel of all users at subcarrier $k$. From (13), we can identify three error terms: (1) the residual intersymbol interference (ISI); (2) the error due to the incorrect estimate $\hat{\mathbf{c}}_k$ of the signal $\mathbf{c}_k$; and (3) the channel noise. From (13), as shown in Appendix A, we obtain the corresponding mean square error (14) for the $k$th subcarrier.

From (14), we can obtain a semi-analytical BER approximation for an M-QAM constellation with Gray mapping, given by (15) [34], where $\mathrm{MSE}^{(i)}_{k,u}$ is the mean square error on the samples $\tilde{c}^{(i)}_{k,u}$ at iteration $i$. The optimization problem is given by (16), where $\mathcal{W}_a$ denotes the set of feasible analog coefficients, i.e., the $N_{rx}\times N_{RF}^{rx}$ matrices with constant-magnitude entries. The amplitude constraint in (16) is needed because minimizing the MSE alone may lead to biased estimates [10]. Note that the optimization in (16) uses the average MSE over the $S$ subcarriers as its metric. Because the feedback matrix $\mathbf{B}^{(i)}_{d,k}$ is independent of the constraints of (16), the digital feedback matrix can be designed by minimizing the MSE without these constraints. From the KKT conditions of this problem, i.e., by setting the derivative of $\sum_{k=1}^{S}\mathrm{MSE}^{(i)}_k$ with respect to $\mathbf{B}^{(i)}_{d,k}$ to zero, we obtain (18). Replacing (18) in (14), $\mathrm{MSE}^{(i)}_k$ is, up to a constant, equal to (19) [29], where $\mathbf{W}^{(i)}_{fd,k}$ denotes the non-normalized full-digital equalizer [29]. Therefore, the analog and digital parts of the feed-forward matrix are the solution of the optimization problem (22). As (22) is nonconvex, an optimal solution is difficult to obtain. Nevertheless, in the following section, we propose a method to obtain an approximate solution to this optimization problem.
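A sketch of how per-subcarrier MSE values can be mapped to a BER estimate is shown below. It assumes the textbook Gray-coded square M-QAM approximation $P_b \approx \frac{4}{\log_2 M}\left(1 - \frac{1}{\sqrt{M}}\right)Q\left(\sqrt{3/((M-1)\sigma^2)}\right)$, with $\sigma^2$ the subcarrier-averaged MSE of user $u$ (unit-energy symbols and a Gaussian error after a unitary IDFT). This is a standard form and not necessarily identical to (15); the function names are hypothetical.

```python
import math


def q_func(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber_mqam_from_mse(mse_k, m):
    """Approximate BER of Gray-mapped square M-QAM for one user from its per-subcarrier MSEs.

    The time-domain error variance is taken as the average of MSE_{k,u} over the S subcarriers.
    """
    sigma2 = sum(mse_k) / len(mse_k)
    bits = math.log2(m)
    return (4.0 / bits) * (1.0 - 1.0 / math.sqrt(m)) * q_func(math.sqrt(3.0 / ((m - 1) * sigma2)))


print(ber_mqam_from_mse([0.1] * 64, 4))   # QPSK with an average MSE of 0.1
```

For QPSK the expression reduces to $Q(1/\sqrt{\sigma^2})$, so only the average MSE over the block matters, not how it is distributed across subcarriers.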

Digital Feed-Forward Equalizer Design
First, we compute the digital feed-forward part of the equalizer as a function of the analog matrix $\mathbf{W}^{(i)}_a$. According to (22), for a given analog equalizer matrix $\mathbf{W}^{(i)}_a$, the optimal digital part of the equalizer for the $k$th subcarrier is the solution of a convex optimization problem, whose solution is given by (24) and (25) [29], where $\Omega_d$ is used to normalize the received power.
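At the first iteration, when the reliability matrix is zero and, as stated in the description of Algorithm 1, the digital part reduces to the standard MMSE equalizer, a fixed analog matrix turns the digital design into a linear MMSE problem on the effective channel $\mathbf{W}_a^H\mathbf{H}_k$ with noise covariance $\sigma_n^2\mathbf{W}_a^H\mathbf{W}_a$. The pure-Python sketch below computes $\mathbf{W}_{d,k} = (\mathbf{W}_a^H(\mathbf{H}_k\mathbf{H}_k^H + \sigma_n^2\mathbf{I})\mathbf{W}_a)^{-1}\mathbf{W}_a^H\mathbf{H}_k$, assuming unit-power symbols and that $\mathbf{H}_k$ collects the users' effective channels $\mathbf{H}_{u,k}\mathbf{f}_{a,u}$. The normalization $\Omega_d$ and the exact form of (24) and (25) are not reproduced, and the helper names are hypothetical.

```python
def herm(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def solve(a, b):
    """Solve a @ x = b for a square, nonsingular a (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mmse_digital(w_a, h, noise_var):
    """W_d = (W_a^H (H H^H + sigma_n^2 I) W_a)^{-1} W_a^H H for a fixed analog W_a."""
    g = matmul(herm(w_a), h)        # N_RF x U effective channel after the phase shifters
    gram = matmul(herm(w_a), w_a)   # W_a^H W_a: noise correlation introduced by W_a
    lhs = matmul(g, herm(g))
    lhs = [[lhs[i][j] + noise_var * gram[i][j] for j in range(len(g))] for i in range(len(g))]
    return solve(lhs, g)            # N_RF x U digital feed-forward matrix


w_d = mmse_digital([[1, 0], [0, 1]], [[1, 0.5], [0.2, 1]], 0.1)
```

As $\sigma_n^2 \to 0$ with a square, invertible effective channel, the combined response $\mathbf{W}_d^H\mathbf{W}_a^H\mathbf{H}_k$ tends to the identity (zero forcing), which is a quick sanity check for any implementation.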

Analog Feed-Forward Equalizer Design
To optimize the analog part of the equalizer, we consider an iterative procedure that, at step $r$, selects column $r$ of matrix $\mathbf{W}^{(i)}_a$ from the dictionary $\mathbf{A}_{rx} = [\mathbf{A}_{rx,1},\ldots,\mathbf{A}_{rx,U}] \in \mathbb{C}^{N_{rx}\times N_{cl}N_{ray}U}$, with $\mathbf{A}_{rx,u}$ defined in Section 2. Please note that, at each iteration $i$, the matrix $\mathbf{W}^{(i)}_a$ is computed sequentially in $N_{RF}^{rx}$ steps, one for each RF chain, i.e., we first compute the analog coefficients for RF chain 1, then for RF chain 2, and so on.
At step $r$, $r = 1,\ldots,N_{RF}^{rx}$, the first $r-1$ columns of the analog matrix, $\mathbf{W}^{(i)}_{a,r-1}$, were already selected in the previous steps, and $\mathbf{W}^{(i)}_{d,k,r-1}$ is set to its optimal value according to (22), i.e., it is given by (24). Replacing (27) in (19), problem (22) can be rewritten as (28), where $\mathcal{F}_a$ represents the dictionary defined by the columns of $\mathbf{A}_{rx}$ and $\mathbf{W}^{(i)}_{ad,k,r-1}$ is the residue matrix. Note that the normalization amplitude constraint in (22) was removed in (28) because it is already taken into account in the derivation of the digital part of the equalizer (see (24) and (25)). As described in Appendix B, the optimization problem (28) is equivalent to maximizing the metric (29). Therefore, $\mathbf{w}^{(i)}_{a,r}$ is selected as the element of the codebook that maximizes this metric. As the elements of codebook $\mathcal{F}_a$ are the columns of matrix $\mathbf{A}_{rx}$, the index of the best element, denoted by $n_{opt,r}$, may be extracted as in (30), which depends on the matrices $\mathbf{\Pi}_{k,r}$.
The pseudo-code of the proposed hybrid fully iterative receiver is presented in Algorithm 1, where $\boldsymbol{\Psi}^{(0)} = \mathbf{0}_U$ is assumed initially, since no estimates are available at the first iteration. For $\boldsymbol{\Psi}^{(0)} = \mathbf{0}_U$, the iterative digital part of the equalizer reduces to the standard MMSE equalizer. The algorithm can be summarized as follows. There is an outer loop with $i_{max}$ iterations and an inner loop with $N_{RF}^{rx}$ steps. Consider iteration $i$ of the outer loop. In this iteration, the non-normalized full-digital equalizer is computed and the residue matrix is set to the trivial value $\mathbf{W}^{(i)}_{ad,k,0} = \mathbf{0}$. Next, in the inner loop, we select the best column from the dictionary for RF chain $r$ to build the analog part of the feed-forward hybrid equalizer and compute the digital part according to (24). Then, the residue matrix is updated, and these steps are repeated for $r = 1,\ldots,N_{RF}^{rx}$, i.e., until the full analog and digital parts of the feed-forward hybrid equalizer matrix are obtained. After that, we compute the digital feedback matrix and estimate the transmitted data symbols. Finally, we update $\hat{\mathbf{c}}^{(i)}_k$ for use in the next iteration.
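The sequential, one-RF-chain-at-a-time selection can be sketched as a greedy loop in the spirit of simultaneous orthogonal matching pursuit. This is an unweighted stand-in, not the paper's metric: at step $r$ it picks the dictionary column $\mathbf{a}$ maximizing $\sum_k\|\mathbf{a}^H\mathbf{R}_k\|^2$, where $\mathbf{R}_k = \mathbf{W}_{fd,k} - \mathbf{W}_a\mathbf{W}_{d,k}$ is our notation for the current residual, refits the digital parts by least squares, and updates the residuals. The weights $\mathbf{\Pi}_{k,r}$ of (30) and the normalization of (24) and (25) are omitted, and the dictionary and equalizers in the usage example are synthetic; all names are hypothetical.

```python
import cmath
import math


def herm(a):
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def solve(a, b):
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def greedy_analog(dictionary, w_fd, n_rf):
    """Pick n_rf dictionary columns sequentially; w_fd is a list over k of N_rx x U matrices."""
    residues = [[row[:] for row in w] for w in w_fd]   # W_a is empty at the start
    chosen, w_d = [], []
    for _ in range(n_rf):
        # Unweighted score sum_k ||a^H R_k||^2 for every candidate column a
        scores = [sum(abs(v) ** 2 for res in residues
                      for v in matmul([[x.conjugate() for x in a]], res)[0]) for a in dictionary]
        chosen.append(max(range(len(dictionary)), key=scores.__getitem__))
        w_a = [[dictionary[j][n] for j in chosen] for n in range(len(dictionary[0]))]  # N_rx x r
        gram = matmul(herm(w_a), w_a)
        w_d = [solve(gram, matmul(herm(w_a), w)) for w in w_fd]   # least-squares digital parts
        residues = []
        for w, d_k in zip(w_fd, w_d):
            approx = matmul(w_a, d_k)
            residues.append([[w[i][j] - approx[i][j] for j in range(len(w[0]))]
                             for i in range(len(w))])
    return chosen, w_d, residues


n_rx = 8
# Synthetic dictionary: 4 orthonormal ULA columns (phase pi*n*cos(theta), cos(theta) = j/4)
dictionary = [[cmath.exp(1j * math.pi * n * j / 4) / math.sqrt(n_rx) for n in range(n_rx)]
              for j in range(4)]
mixes = [[[1, 0.5j], [0.2, 1]], [[0.3, 1], [1, -0.4j]]]
# Full-digital equalizers (two subcarriers, U = 2) lying in the span of columns 1 and 3
w_fd = [[[dictionary[1][n] * m[0][c] + dictionary[3][n] * m[1][c] for c in range(2)]
         for n in range(n_rx)] for m in mixes]
chosen, w_d, res = greedy_analog(dictionary, w_fd, n_rf=2)
print(chosen)
```

When the full-digital equalizers lie in the span of a few dictionary columns, the greedy loop recovers those columns and drives the residual to zero, which is the behavior the sequential design relies on.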

Complexity Comparison
In this section, we compare the complexity of the proposed algorithm with that of the two-step approach proposed in [31]. This analysis may be divided into two parts, one for the computation of the analog component and the other for the digital component of the equalizer. In Algorithm 1, both the digital and analog components are computed in each iteration $i$, whereas in the two-step approach only the digital component is updated in each iteration and the analog part is computed only once. The update of the digital component requires the inversion of a $U\times U$ matrix and therefore has complexity $O(U^3)$. The computation of the analog component requires the evaluation of the metric in (27) for all elements of the codebook $\mathcal{F}_a$. The complexity of the product between a $U\times N_{rx}$ matrix and an $N_{rx}$-length vector is $O(N_{rx}U)$. As codebook $\mathcal{F}_a$ has $N_{CB} = N_{cl}N_{ray}U$ elements, the complexity of the metric evaluation is $O(N_{CB}N_{rx}U)$. As previously mentioned, both the analog and digital components are updated in each iteration of Algorithm 1, so its computational complexity is $O(i_{max}(U^3 + N_{CB}N_{rx}U))$, where $i_{max}$ denotes the maximum number of iterations. For the two-step approach, only the digital part is updated in all $i_{max}$ iterations, so its computational complexity is $O(i_{max}U^3 + N_{CB}N_{rx}U)$. As the number of receive antennas is larger than the number of users and $N_{CB} = N_{cl}N_{ray}U \ge U$, it follows that $N_{CB}N_{rx}U \ge U^3$. Therefore, the complexities of Algorithm 1 and the two-step approach simplify to $O(i_{max}N_{CB}N_{rx}U)$ and $O(N_{CB}N_{rx}U)$, respectively. Hence, the two-step approach is approximately $i_{max}$ times faster than Algorithm 1. However, as shown in the next section, the proposed fully iterative analog-digital equalizer clearly outperforms the two-step approach.
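The dominant-term counts above can be checked with a few lines of arithmetic. The parameter values in the usage line ($i_{max} = 4$, $U = 4$, $N_{rx} = 64$, $N_{cl} = N_{ray} = 4$) are hypothetical and not taken from Table 1, and the constants hidden by the $O(\cdot)$ notation are ignored; `dominant_costs` is a hypothetical name.

```python
def dominant_costs(i_max, u, n_rx, n_cl, n_ray):
    """Operation-count proxies for Algorithm 1 and the two-step approach (constants ignored)."""
    n_cb = n_cl * n_ray * u                    # codebook size N_CB = N_cl N_ray U
    analog = n_cb * n_rx * u                   # metric evaluation over F_a: O(N_CB N_rx U)
    digital = u ** 3                           # U x U matrix inversion: O(U^3)
    algorithm1 = i_max * (digital + analog)    # both parts updated in every iteration
    two_step = i_max * digital + analog        # analog part computed only once
    return algorithm1, two_step, algorithm1 / two_step


alg1, two, speedup = dominant_costs(i_max=4, u=4, n_rx=64, n_cl=4, n_ray=4)
print(alg1, two, speedup)
```

With these values the ratio is about 3.95, i.e., close to $i_{max}$, as predicted by the simplified expressions.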
Additionally, in the next section, we also compare the performance with that of the algorithm of [14], extended here to broadband SC-FDMA systems. Its Gram-Schmidt orthogonalization requires $N_{rx}U^2$ multiplications and $U(U-1)/2$ divisions, i.e., $O(N_{rx}U^2 + U(U-1)/2)$. Then, the analog equalizer matrix is computed with $N_{rx}U$ multiplications, $2N_{rx}U$ divisions, and $N_{rx}U$ square roots, i.e., $O(4N_{rx}U)$. Finally, the digital part of the equalizer uses the linear MMSE, which is equivalent to iteration 1 of Algorithm 1, i.e., $O(U^3)$. The total complexity is therefore $O(N_{rx}U^2 + U(U-1)/2 + 4N_{rx}U + U^3)$. Neglecting the $U(U-1)/2$ term and dividing the complexity of Algorithm 1, $O(i_{max}N_{cl}N_{ray}UN_{rx}U)$, by this total, the algorithm of [14] is approximately $i_{max}N_{cl}N_{ray}U/(U^2N_{rx}^{-1} + U + 4)$ times faster than Algorithm 1.
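Similarly, the speed-up of [14] relative to Algorithm 1 can be compared with the closed form given above, again with hypothetical parameters; the small difference comes from the neglected $U(U-1)/2$ term. The function name is hypothetical.

```python
def speedup_vs_ref14(i_max, u, n_rx, n_cl, n_ray):
    """Direct operation-count ratio versus the closed-form approximation."""
    algorithm1 = i_max * (n_cl * n_ray * u) * n_rx * u                # O(i_max N_CB N_rx U)
    ref14 = n_rx * u ** 2 + u * (u - 1) / 2 + 4 * n_rx * u + u ** 3  # total cost of [14]
    closed_form = i_max * n_cl * n_ray * u / (u ** 2 / n_rx + u + 4)
    return algorithm1 / ref14, closed_form


exact, approx = speedup_vs_ref14(i_max=4, u=4, n_rx=64, n_cl=4, n_ray=4)
print(exact, approx)
```

For these hypothetical parameters, the direct ratio is about 30.9 and the closed form about 31.0.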

Performance Results
In this section, we present the BER performance of the proposed receiver structure for both analog precoders designed in Section 2, with the parameters presented in Table 1. The results presented in Figures 3-10 do not consider either path loss or shadowing effects between the UTs and the base station; these effects are evaluated in Figures 11 and 12, for $n_p = 4.17$ and $\sigma_s = 9$ dB [32]. We consider the wideband mmWave channel model defined in (1), perfect synchronization, and perfect CSI at the receiver side. To evaluate the proposed hybrid multi-user equalizers, we consider the analog precoders discussed in Section 2: the analog precoder is generated either randomly according to (8), referred to here as the non-CSI precoder (NCSI precoder), or on the basis of the average AoD of each cluster according to (10), referred to as the partial CSI precoder (PCSI precoder). The performance results are presented in terms of the average BER as a function of $E_b/N_0$, where $E_b$ denotes the average bit energy and $N_0$ denotes the one-sided noise power spectral density. It was assumed that $\sigma_1^2 = \ldots = \sigma_U^2 = 1$ and that the average $E_b/N_0$ is identical for all users $u \in \{1,\ldots,U\}$ and is given by $E_b/N_0 = \sigma_u^2/(2\sigma_n^2) = \sigma_n^{-2}/2$.
Figure 11. Performance comparison between Algorithm 1 and the two-step approach for the NCSI precoder with ULA configuration, path loss, and shadowing and QPSK modulation.
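In a simulation, the $E_b/N_0$ definition above fixes the noise variance. A small sketch, assuming unit-power symbols ($\sigma_u^2 = 1$) and circularly symmetric complex Gaussian noise whose variance $\sigma_n^2$ is split equally between the in-phase and quadrature components; `noise_variance` and `add_noise` are hypothetical helper names.

```python
import math
import random


def noise_variance(ebn0_db, sigma_u2=1.0):
    """Invert Eb/N0 = sigma_u^2 / (2 sigma_n^2) for a target Eb/N0 given in dB."""
    return sigma_u2 / (2.0 * 10.0 ** (ebn0_db / 10.0))


def add_noise(samples, ebn0_db, rng):
    """Add complex Gaussian noise of total variance sigma_n^2 (half per real dimension)."""
    std = math.sqrt(noise_variance(ebn0_db) / 2.0)
    return [x + complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for x in samples]


for ebn0_db in (0, 5, 10):
    print(ebn0_db, noise_variance(ebn0_db))
```

Sweeping `ebn0_db` and calling `add_noise` on the received samples reproduces the kind of $E_b/N_0$ axis used in the BER figures.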
First, let us analyze the results of the proposed fully iterative analog-digital equalizer (Algorithm 1 in the legends) for the two precoders, with no path loss or shadowing effects between the UTs and the base station. The curve of the fully digital equalizer discussed in [29] is included because it can be regarded as a lower bound on the BER of hybrid architectures. The results are also compared with the semi-analytical BER approximation (15); for clarity, the theoretical curves are shown only for the first and fourth iterations. Figures 3 and 4 show that the theoretical curves almost overlap with the simulated ones, which means that our semi-analytical approach is quite accurate. As also observed in Figures 3 and 4, the performance improves in both cases as the number of iterations increases. These figures also show that the performance gain from iteration 1 to iteration 2 is larger than that from iteration 3 to iteration 4. This occurs because most of the residual multi-user and intersymbol interference is mitigated from the first to the second iteration. From the third to the fourth iteration, there is still a benefit from residual interference removal; however, the gain is smaller because most of the interference was previously removed. We can also verify that the gap between the NCSI (Figure 3) and PCSI (Figure 4) precoders at a target BER of $10^{-3}$ is 5.3 dB for the fourth iteration, i.e., knowing only the average AoD information at the UTs significantly improves the average BER performance of the proposed hybrid equalizers. Let us now compare the average BER performance of the proposed Algorithm 1 with that of the fully digital iterative multi-user equalizer.
Figures 3 and 4 show that, for iteration 4, the proposed Algorithm 1 almost achieves the fully digital bound, i.e., the proposed receive and transmit structures are sufficiently efficient to overcome the constraints imposed by the hybrid architectures.
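Gaps such as the 5.3 dB difference between the NCSI and PCSI precoders, or the distance between Algorithm 1 and the fully digital bound, are measured horizontally between BER curves at a target BER. A minimal sketch of one common way to do this from tabulated points, assuming monotonically decreasing curves sampled on a shared $E_b/N_0$ grid, interpolates $\log_{10}(\text{BER})$ linearly. The helper names are hypothetical, and the curves in the example are synthetic, not the paper's results.

```python
import numpy as np


def ebn0_at_target_ber(ebn0_db, ber, target: float = 1e-3) -> float:
    """Eb/N0 (dB) at which a decreasing BER curve crosses `target`.

    Interpolates linearly in log10(BER) versus Eb/N0 in dB. np.interp clamps
    outside the tabulated range, so the target must lie inside the curve.
    """
    ebn0_db = np.asarray(ebn0_db, dtype=float)
    log_ber = np.log10(np.asarray(ber, dtype=float))
    # np.interp needs increasing sample points, so reverse the decreasing log-BER axis.
    return float(np.interp(np.log10(target), log_ber[::-1], ebn0_db[::-1]))


def gap_db(ebn0_db, ber_a, ber_b, target: float = 1e-3) -> float:
    """Extra Eb/N0 (dB) that curve A needs over curve B at the target BER."""
    return ebn0_at_target_ber(ebn0_db, ber_a, target) - ebn0_at_target_ber(
        ebn0_db, ber_b, target
    )


# Synthetic, made-up curves: A is B shifted right by exactly 2 dB.
grid = np.arange(2.0, 15.0, 1.0)
ber_b = 0.5 * 10.0 ** (-grid / 3.0)
ber_a = 0.5 * 10.0 ** (-(grid - 2.0) / 3.0)
print(round(gap_db(grid, ber_a, ber_b), 6))  # 2.0
```

Interpolating in the log domain matches how BER curves are plotted; interpolating the BER values linearly would generally bias the crossing point between widely spaced simulation points.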
In Figures 5 and 6, we compare the performance of the proposed iterative hybrid multi-user equalizer against that of the two-step approach proposed in [31] and of the scheme proposed in [14], where the analog part is computed by applying the Gram-Schmidt (GS) method and the digital part uses an MMSE equalizer, referred to here as the GS/MMSE approach. Starting with the first iteration, we can see that the BER performance of the two-step approach and that of Algorithm 1 are similar: for iteration 1, both algorithms assume $\boldsymbol{\Psi}^{(0)} = \mathbf{0}_U$, and they are therefore equivalent. Focusing now on the fourth iteration, the performance of the two-step approach is far from that of Algorithm 1. This happens because in the two-step approach only the digital part is computed iteratively, whereas in the proposed algorithm both the analog and digital parts are computed in each iteration, which makes it more efficient at removing the interference. The figures also show that the performance of the GS/MMSE approach is approximately the same as that obtained by our schemes for the first iteration. This can be explained by the fact that, in our schemes, the digital part for the first iteration reduces to the MMSE equalizer. Now, let us analyze the results presented in Figures 7 and 8, obtained with 16-QAM and the PCSI precoder. Figure 7 presents the results for Algorithm 1 for 1 up to 6 iterations. As for QPSK, the performance improves as the number of iterations increases. Comparing these results with those obtained in Figure 4 for QPSK, we observe a performance penalty, since 16-QAM is more prone to errors and therefore more iterations are required to remove the residual multi-user and intersymbol interference. However, as in the QPSK case, Algorithm 1 almost achieves the fully digital bound for 16-QAM, requiring only a few iterations.
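The observation that the digital part reduces to an MMSE equalizer in the first iteration can be illustrated with a generic linear MMSE filter computed per subcarrier for one fixed analog combiner. The sketch below is not the paper's update rule (it omits the $\boldsymbol{\Psi}$ terms used in later iterations); it assumes unit-power, uncorrelated symbols and uses a hypothetical random-phase combiner as a placeholder for the analog part. Because such a combiner is not unitary, the noise after it is coloured, with covariance $\sigma_n^2 \mathbf{A}^H\mathbf{A}$, and the filter accounts for this.

```python
import numpy as np


def random_phase_combiner(n_ant: int, n_rf: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical placeholder analog combiner: unit-modulus phase shifts / sqrt(n_ant)."""
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_ant, n_rf))
    return np.exp(1j * phases) / np.sqrt(n_ant)


def mmse_digital_equalizers(H: np.ndarray, A: np.ndarray, sigma_n2: float) -> np.ndarray:
    """Per-subcarrier linear MMSE digital matrices for one analog combiner A.

    H: (K, N, U) frequency-domain channels, A: (N, N_rf) shared by all K subcarriers.
    Returns W with shape (K, N_rf, U); the data estimate on subcarrier k is
    W[k].conj().T @ A.conj().T @ y_k, assuming unit-power, uncorrelated symbols.
    """
    # Noise after analog combining is coloured: covariance sigma_n^2 * A^H A.
    R = sigma_n2 * (A.conj().T @ A)
    W = np.empty((H.shape[0], A.shape[1], H.shape[2]), dtype=complex)
    for k, Hk in enumerate(H):
        Heff = A.conj().T @ Hk  # effective N_rf x U channel seen by the digital part
        W[k] = np.linalg.solve(Heff @ Heff.conj().T + R, Heff)
    return W


# Example with hypothetical sizes: 8 subcarriers, 32 antennas, 8 RF chains, 4 users.
rng = np.random.default_rng(0)
K, N, N_RF, U = 8, 32, 8, 4
H = (rng.standard_normal((K, N, U)) + 1j * rng.standard_normal((K, N, U))) / np.sqrt(2.0)
A = random_phase_combiner(N, N_RF, rng)
W = mmse_digital_equalizers(H, A, sigma_n2=0.1)
print(W.shape)  # (8, 8, 4)
```

Solving a linear system rather than forming an explicit inverse is the usual numerically preferable choice; the single matrix `A` reused for every `k` mirrors the constraint that the analog part is common to all subcarriers.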
Figure 8 presents the results for Algorithm 1 and the two-step approach, for the first and fourth iterations, and for the GS/MMSE approach, for the first iteration. Comparing these results with those presented in Figure 6 for QPSK, the same conclusions can be reached.
In Figures 9 and 10, we present the results for the same schemes as in Figures 3 and 4, but now considering the UPA configuration. These figures show that the theoretical curves almost overlap with the simulated ones, which means that our semi-analytical approach is also quite accurate for the UPA configuration. Moreover, the performance of Algorithm 1 is very close to the fully digital bound, with a smaller penalty than in the ULA case. This occurs because the proposed equalizer exploits the channel correlation; since the channel correlation is higher with the UPA configuration than with the ULA, the proposed equalizer is more efficient.
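The argument that the UPA yields a more correlated channel can be illustrated, under assumed conventions, by comparing array responses. The sketch below assumes half-wavelength spacing, 64 elements in both arrays (a 64-element ULA versus an $8 \times 8$ UPA), and a common textbook UPA response model; these conventions may differ from the array models defined in Section 2, and the function names are hypothetical. For two paths 2 degrees apart in azimuth at the same elevation, the shorter horizontal aperture of the UPA makes the two responses much harder to distinguish.

```python
import numpy as np


def ula_response(n: int, phi: float) -> np.ndarray:
    """Unit-norm response of an n-element, half-wavelength ULA for azimuth phi."""
    return np.exp(1j * np.pi * np.arange(n) * np.sin(phi)) / np.sqrt(n)


def upa_response(ny: int, nz: int, phi: float, theta: float) -> np.ndarray:
    """Unit-norm response of an ny x nz half-wavelength UPA in the y-z plane.

    Assumed convention: phi is the azimuth and theta the angle from the z-axis.
    """
    a_y = np.exp(1j * np.pi * np.arange(ny) * np.sin(phi) * np.sin(theta))
    a_z = np.exp(1j * np.pi * np.arange(nz) * np.cos(theta))
    return np.kron(a_y, a_z) / np.sqrt(ny * nz)


def correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Magnitude of the inner product |a^H b| between two unit-norm responses."""
    return float(abs(np.vdot(a, b)))


# Two paths 2 degrees apart in azimuth, same elevation, 64 elements in both arrays.
phi1, phi2, theta = 0.0, np.deg2rad(2.0), np.pi / 2
rho_ula = correlation(ula_response(64, phi1), ula_response(64, phi2))
rho_upa = correlation(upa_response(8, 8, phi1, theta), upa_response(8, 8, phi2, theta))
print(f"ULA: {rho_ula:.2f}, UPA: {rho_upa:.2f}")
```

In this configuration the ULA responses are only weakly correlated, while the UPA responses remain strongly correlated; other angle pairs, in particular those near sidelobe spacings, can behave differently.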
Finally, in Figures 11 and 12, we evaluate the impact of the path loss and shadowing effects on the performance. As can be seen, the proposed algorithm outperforms the two-step approach with just four iterations and almost achieves the fully digital bound. In general, we can draw the same conclusions as for the scenarios without path loss and shadowing effects.
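The path loss and shadowing parameters used in Figures 11 and 12 ($n_p = 4.17$ and $\sigma_s = 9$ dB, following [32]) fit the standard log-distance model with log-normal shadowing. The following is a minimal sketch under that assumption only: the exact form used in the paper is not given in this section, and the reference distance $d_0$, the intercept, and the UT distances below are hypothetical placeholders.

```python
import numpy as np


def large_scale_gain_db(d, n_p=4.17, sigma_s_db=9.0, d0=1.0, pl0_db=0.0, rng=None):
    """Log-distance path loss with log-normal shadowing, returned as a gain in dB.

    PL(d) = pl0_db + 10 * n_p * log10(d / d0) + X, with X ~ N(0, sigma_s_db^2).
    The reference distance d0 and intercept pl0_db are hypothetical placeholders.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(d, dtype=float)
    shadowing_db = rng.normal(0.0, sigma_s_db, size=d.shape)
    return -(pl0_db + 10.0 * n_p * np.log10(d / d0) + shadowing_db)


def apply_large_scale_gain(H_u: np.ndarray, gain_db: float) -> np.ndarray:
    """Scale user u's channel amplitude by 10^(gain_db / 20)."""
    return H_u * 10.0 ** (gain_db / 20.0)


# Hypothetical UT distances (in metres) and one large-scale gain draw per UT.
rng = np.random.default_rng(0)
distances = rng.uniform(10.0, 100.0, size=4)
gains_db = large_scale_gain_db(distances, rng=rng)
print(np.round(gains_db, 1))
```

Drawing one shadowing value per UT and keeping it fixed over the subcarriers reflects the fact that these are large-scale effects, in contrast with the frequency-selective small-scale channel.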

Conclusions
In this paper, we developed a fully iterative analog-digital multi-user equalizer approach for wideband mmWave mMIMO SC-FDMA systems. In the proposed approach, the digital and analog parts are computed to efficiently remove the residual multi-user and inter-symbol interference. The equalizer was designed using the sum of the MSEs over the subcarriers as the minimization metric. In the design, it was assumed that the analog part is constant over the subcarriers because of hardware constraints.
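To illustrate the design metric and the hardware constraint together (one analog combiner for all subcarriers, a digital part per subcarrier), the sketch below evaluates the sum over subcarriers of the MMSE error for a given combiner and then applies a hypothetical greedy refinement that visits one RF chain at a time. It assumes unit-power symbols and quantized phase shifters. The refinement is only loosely inspired by the sequential, per-RF-chain idea mentioned in the abstract: it is not Algorithm 1, and it ignores the iterative $\boldsymbol{\Psi}$ terms.

```python
import numpy as np


def sum_mse(H: np.ndarray, A: np.ndarray, sigma_n2: float) -> float:
    """Sum over subcarriers of the MMSE error for a single analog combiner A.

    H: (K, N, U) channels, A: (N, N_rf) shared by all subcarriers. With the per-
    subcarrier MMSE digital part and unit-power symbols, the error covariance on
    subcarrier k is (I_U + Heff^H R^{-1} Heff)^{-1}, with R = sigma_n^2 A^H A.
    """
    R = sigma_n2 * (A.conj().T @ A)
    n_users = H.shape[2]
    total = 0.0
    for Hk in H:
        Heff = A.conj().T @ Hk
        error_cov = np.linalg.inv(np.eye(n_users) + Heff.conj().T @ np.linalg.solve(R, Heff))
        total += float(np.real(np.trace(error_cov)))
    return total


def greedy_phase_refinement(H, A, sigma_n2, n_levels=8):
    """Hypothetical sequential refinement (not the paper's algorithm).

    Visits one RF chain at a time; each phase shifter keeps the best of its current
    value and n_levels quantized phases, so the sum-MSE never increases.
    """
    A = A.copy()
    n_ant, n_rf = A.shape
    levels = np.exp(2j * np.pi * np.arange(n_levels) / n_levels) / np.sqrt(n_ant)
    best = sum_mse(H, A, sigma_n2)
    for r in range(n_rf):
        for i in range(n_ant):
            for value in levels:
                previous = A[i, r]
                A[i, r] = value
                cost = sum_mse(H, A, sigma_n2)
                if cost < best:
                    best = cost  # keep the improvement
                else:
                    A[i, r] = previous  # revert to the previous phase
    return A, best


# Example with hypothetical sizes: 4 subcarriers, 16 antennas, 4 RF chains, 2 users.
rng = np.random.default_rng(0)
K, N, N_RF, U = 4, 16, 4, 2
H = (rng.standard_normal((K, N, U)) + 1j * rng.standard_normal((K, N, U))) / np.sqrt(2.0)
A0 = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (N, N_RF))) / np.sqrt(N)
A_ref, cost = greedy_phase_refinement(H, A0, sigma_n2=0.1)
print(sum_mse(H, A0, 0.1), cost)
```

Because every accepted update lowers the same sum-MSE that is evaluated over all subcarriers, the refined combiner remains a single frequency-flat matrix with unit-modulus entries.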
The results showed that the proposed multi-user equalizer is very efficient at removing the multi-user and inter-symbol interference, achieving a performance close to that of its fully digital counterpart for both antenna array configurations, ULA and UPA, and for scenarios with and without path loss and shadowing effects. Furthermore, the proposed iterative approach outperforms the previously proposed two-step approach, at the cost of some additional complexity. Therefore, the proposed iterative hybrid multi-user equalizer could be an interesting approach for practical scenarios that require highly reliable links.

Funding: This work is supported by the Project MASSIVE5G (PTDC/EEI-TEL/30588/2017), the project UID/EEA/50008/2019; by the European Regional Development Fund (FEDER), through the Competitiveness and Internationalization Operational Programme (COMPETE 2020), Regional Operational Program of Lisbon, Fundação para a ciência e Tecnologia; PES3N: Soluções Energeticamente Eficientes para Redes de Sensores Seguras - POCI-01-0145-FEDER-030629, and FCT grant for the first author (SFRH/BD/129395/2017).

Conflicts of Interest:
The authors declare no conflict of interest.