Channel Estimation and Data Detection Analysis of Massive MIMO with 1-Bit ADCs

We present an analytical framework for the channel estimation and the data detection in massive multiple-input multiple-output uplink systems with 1-bit analog-to-digital converters (ADCs) and i.i.d. Rayleigh fading. First, we provide closed-form expressions of the mean squared error (MSE) of the channel estimation considering the state-of-the-art linear minimum MSE estimator and the class of scaled least-squares estimators. For the data detection, we provide closed-form expressions of the expected value and the variance of the estimated symbols when maximum ratio combining is adopted, which can be exploited to efficiently implement minimum distance detection and, potentially, to design the set of transmit symbols. Our analytical findings explicitly depend on key system parameters such as the signal-to-noise ratio (SNR), the number of user equipments, and the pilot length, thus enabling a precise characterization of the performance of the channel estimation and the data detection with 1-bit ADCs. The proposed analysis highlights a fundamental SNR trade-off, according to which operating at the right noise level significantly enhances the system performance.


I. INTRODUCTION
The migration of operating frequencies from first- to fourth-generation wireless systems, i.e., from 800 MHz to the sub-3 GHz range, did not bring major changes in terms of signal propagation. The current fifth generation (5G) features a more pronounced transition in this respect by operating at sub-6 GHz frequencies and, eventually, up to 30 GHz with the objective of boosting the data rates. Following this trend, beyond-5G systems will exploit the large amount of bandwidth available in the mmWave band (i.e., 30 GHz-300 GHz) and raise the operating frequencies up to 1 THz [2]. In this context, maintaining the same signal-to-noise ratio (SNR) over a given distance will require larger antenna arrays and increasingly sharp beamforming to spatially focus the signal power. Although the short wavelength at mmWave and sub-THz frequencies allows many antennas to be packed into a very small area, realizing fully digital, high-resolution massive multiple-input multiple-output (MIMO) arrays remains prohibitive in practice [3], [4].
As in the system model illustrated in Fig. 1, each base station (BS) antenna is generally equipped with a dedicated radio-frequency (RF) chain that includes complex, power-hungry analog-to-digital/digital-to-analog converters (ADCs/DACs) [4]. In this setting, while the transmit power can be made inversely proportional to the number of antennas, the power consumed by each ADC/DAC scales linearly with the sampling rate and exponentially with the number of quantization bits [5]-[9]. Another limiting factor is the amount of raw data exchanged between the remote radio head (RRH) and the base-band unit (BBU), which scales linearly with both the sampling rate and the number of quantization bits [10]-[12]. For these reasons, adopting low-resolution ADCs/DACs with 1 to 4 quantization bits as opposed to the typical 10 or more [13] enables the implementation of massive MIMO arrays comprising hundreds (or even thousands) of antennas, which are necessary to operate in the mmWave and sub-THz bands [10]. In this regard, 1-bit ADCs/DACs are particularly appealing due to their minimal power consumption and complexity since they only evaluate the sign of the input signal [5]. Such a coarse quantization is especially motivated at very high frequencies, where high-order modulations are not essential.
There is a large body of literature on massive MIMO with low-resolution and 1-bit ADCs/DACs, ranging from performance analysis to channel estimation and precoding design. The capacity of the low-resolution and the 1-bit quantized MIMO channel is characterized in [14] and [5], respectively, whereas [15] shows that replacing even a small number of high-resolution ADCs with 1-bit ADCs entails a modest performance loss. The performance-quantization trade-off in orthogonal frequency-division multiplexing (OFDM) uplink systems is studied in [6], which shows that using 4 to 6 quantization bits involves almost no performance loss compared with infinite-resolution ADCs.
The spectral efficiency of single-carrier and OFDM uplink systems with 1-bit ADCs is analyzed in [16]. The problem of multi-user detection is considered, e.g., in [17] and [18] for low-resolution and 1-bit ADCs, respectively, whereas [19] focuses on the joint channel estimation and data detection. An efficient iterative method for near maximum likelihood detection with 1-bit ADCs is proposed in [7]. The work in [8] analyzes the channel estimation and the uplink achievable rate with 1-bit ADCs. In addition, it proposes a linear minimum mean squared error (MMSE) channel estimator based on the Bussgang decomposition, which allows the nonlinear quantization function to be reformulated as a linear function with identical first- and second-order statistics [20]: we refer to this estimator as the Bussgang linear MMSE (BLM) estimator. A similar analysis is presented in [21] for the downlink direction. The work in [10] extends some of the results derived in [8], [16] for 1-bit ADCs to the multi-bit case. Specifically, it presents a throughput analysis of uplink systems and proposes a linear channel estimator based on the Bussgang decomposition with low-resolution ADCs. The channel estimation with 1-bit ADCs when the quantization threshold is not known is studied in [22]. The channel estimation exploiting the angular and delay structure is considered in [23] and [24] for low-resolution and 1-bit ADCs, respectively, whereas [25] exploits the temporal correlation for 1-bit ADCs. A recent line of works employs machine learning techniques in scenarios where obtaining accurate channel state information with low-resolution ADCs is impractical (see, e.g., [26], [27]). The benefits of oversampling for 1-bit quantized uplink systems are investigated in [28], [29]. The performance of linear precoding schemes for downlink systems with 1-bit DACs is analyzed in [9].
A similar analysis with multi-bit DACs is presented in [11] considering both linear and nonlinear precoding, and in [12] considering linear precoding with oversampling in OFDM downlink systems. Lastly, [30] proposes a general optimization framework for downlink precoding with 1-bit DACs and constant envelope assuming quadrature amplitude modulation (QAM) transmit symbols.

A. Contribution
This paper broadens prior analytical studies on the channel estimation and the data detection in massive MIMO uplink systems with 1-bit ADCs. On the one hand, existing works do not provide a precise characterization of the performance of the channel estimation with 1-bit ADCs with respect to key system parameters such as the SNR, the number of user equipments (UEs), and the pilot length. We fill this gap by analyzing the mean squared error (MSE) of the channel estimation along with its asymptotic behavior at high SNR assuming independent and identically distributed (i.i.d.) Rayleigh fading channels among the UEs. In this respect, we consider the BLM estimator in [8] as well as the class of scaled least-squares (LS) estimators, such as the one proposed in [16] (which can be obtained from the former by ignoring the temporal correlation of the quantization distortion). On the other hand, in the context of data detection with 1-bit ADCs, the statistical properties of the estimated symbols have not been characterized by existing works. In this regard, an interesting SNR trade-off was observed in [10], whereby the estimated symbols resulting from transmit symbols with the same phase overlap at high SNR; however, this aspect has not been formally described in the literature. We fill this gap by analyzing the expected value and the variance of the estimated symbols along with their asymptotic behavior at high SNR.¹ Our results on both the channel estimation and the data detection ultimately impact the symbol error rate (SER) and thus provide important practical insights into the design and the implementation of 1-bit quantized systems.
¹ This second part of the paper complements the results of [1] with a detailed analysis of the normalized variance of the estimated symbols (also by means of tractable upper bounds), while the first part is entirely new.
The contributions of this paper are summarized as follows:
• For the channel estimation with 1-bit ADCs, we derive closed-form expressions for the BLM estimator in [8] and for the class of scaled LS estimators, such as the one proposed in [16]. This enables a precise characterization of the performance of the channel estimation with respect to the SNR, the number of UEs, and the pilot length. Furthermore, we show that, in the case of i.i.d. Rayleigh fading channels among the UEs, the BLM estimator can be simplified as a scaled LS estimator with UE-specific scaling factors and that using a common optimized scaling factor for all the UEs entails a negligible performance loss.
• For the data detection with 1-bit ADCs, we characterize the statistical properties of the estimated symbols by deriving closed-form expressions of the expected value and the variance when maximum ratio combining (MRC) is adopted at the BS. These results can be exploited to efficiently implement minimum distance detection (MDD) and, potentially, to design the set of transmit symbols to further improve the data detection performance.
• Building on the proposed analysis, we provide a thorough discussion on the effect of 1-bit quantization on both the channel estimation and the data detection. For each of the two aspects, we describe a fundamental SNR trade-off, according to which operating at the right noise level significantly enhances the system performance. In this respect, the optimal transmit SNR for the channel estimation is shown to decrease as the pilot length increases.
Outline. The rest of the paper is structured as follows. Section II introduces the system model with 1-bit ADCs. Sections III and IV present our performance analysis results on the channel estimation and the data detection, respectively, each including dedicated numerical results and discussions. Finally, Section V summarizes our contributions and draws some concluding remarks.
Notation. $A = (A_{m,n})$ specifies that $A_{m,n}$ is the $(m, n)$th entry of matrix $A$; likewise, $a = (a_n)$ specifies that $a_n$ is the $n$th entry of vector $a$. $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^*$ represent the transpose, Hermitian transpose, and conjugate operators, respectively. $\mathrm{Re}[\cdot]$ and $\mathrm{Im}[\cdot]$ denote the real part and imaginary part operators, respectively, whereas $j$ is the imaginary unit. $E[\cdot]$ and $V[\cdot]$ are the expectation and variance operators, respectively. $I_N$, $0_N$, and $1_N$ denote the $N$-dimensional identity matrix, all-zero vector, and all-one vector, respectively. $\mathrm{Diag}(\cdot)$ produces a diagonal matrix with the entries of the vector argument on the diagonal, $\mathrm{sgn}(\cdot)$ is the sign function, and $\mathrm{vec}[\cdot]$ is the vectorization operator. The Kronecker product is denoted by $\otimes$ and $\{\cdot\}$ is used to represent sets. Lastly, $\mathcal{CN}(0, 1)$ is the complex normal distribution with zero mean and unit variance, whereas $\mathcal{N}(0_N, \Sigma)$ is the real $N$-variate normal distribution with zero mean and covariance matrix $\Sigma$.
Reproducible research. All the numerical results can be reproduced using the MATLAB code and data files available at: https://github.com/italo-atzeni/Ch_Est_Data_Det_1-Bit.
II. SYSTEM MODEL
Consider the scenario depicted in Fig. 1, where a BS with $M$ antennas serves $K$ single-antenna UEs in the uplink. Let $H \triangleq (H_{m,k}) \in \mathbb{C}^{M \times K}$ denote the uplink channel matrix: assuming i.i.d. Rayleigh fading (as, e.g., in [8], [10]-[12]), the entries of $H$ are distributed independently as $\mathcal{CN}(0, 1)$. Each UE transmits with power $\rho$ and the additive white Gaussian noise (AWGN) at the BS has unit variance: thus, $\rho$ can be interpreted as the transmit SNR. Note that the same transmit SNR is assumed for the two phases of channel estimation and uplink data transmission. Each BS antenna is connected to two 1-bit ADCs, one for the in-phase component and one for the quadrature component of the receive signal. Therefore, according to [10], we introduce the 1-bit quantization function $Q(\cdot) : \mathbb{C}^{L \times N} \to \mathcal{Q}$, with
$$Q(A) \triangleq \sqrt{\frac{\rho K + 1}{2}} \big( \mathrm{sgn}(\mathrm{Re}[A]) + j \, \mathrm{sgn}(\mathrm{Im}[A]) \big) \quad (1)$$
and where $\mathcal{Q} \triangleq \sqrt{\frac{\rho K + 1}{2}} \{\pm 1 \pm j\}^{L \times N}$ is the set containing the scaled symbols of the quadrature phase-shift keying (QPSK) constellation.
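As a minimal sketch of the quantization step described above, the following Python function applies the sign operation separately to the real and imaginary parts and rescales the result by $\sqrt{(\rho K + 1)/2}$, i.e., the scaling of the output set $\mathcal{Q}$. The function names and the list-of-rows matrix representation are illustrative assumptions and are not taken from the paper's MATLAB code.

```python
import math


def sgn(v: float) -> float:
    """Sign function; zero is mapped to +1 (an arbitrary convention for a measure-zero event)."""
    return 1.0 if v >= 0 else -1.0


def quantize_1bit(a, rho: float, num_ues: int):
    """Apply the 1-bit quantizer entrywise to a complex matrix given as a list of rows.

    Every output entry is a QPSK symbol scaled by sqrt((rho * K + 1) / 2),
    so its squared magnitude equals rho * K + 1.
    """
    scale = math.sqrt((rho * num_ues + 1) / 2)
    return [[scale * complex(sgn(x.real), sgn(x.imag)) for x in row] for row in a]
```

For instance, with `rho=1.0` and `num_ues=1` the scaling equals one, so `quantize_1bit([[0.3 - 2j, -1 + 0.1j]], 1.0, 1)` returns the plain QPSK symbols `1 - 1j` and `-1 + 1j`.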

A. Channel Estimation
In the channel estimation phase, the UEs simultaneously transmit their uplink pilots of length $\tau$. Let $P \triangleq (P_{u,k}) \in \mathbb{C}^{\tau \times K}$ denote the pilot matrix whose columns correspond to the pilots used by the UEs, with $|P_{u,k}|^2 = 1$, $\forall u, k$. We assume $\tau \geq K$ and orthogonal pilots among the UEs, so that $P^H P = \tau I_K$. Hence, the receive signal at the BS prior to quantization is given by
$$Y_p \triangleq \sqrt{\rho} H P^H + Z_p \in \mathbb{C}^{M \times \tau} \quad (2)$$
where $Z_p \triangleq (Z_{m,u}) \in \mathbb{C}^{M \times \tau}$ is the AWGN term with entries distributed as $\mathcal{CN}(0, 1)$. Then, at the output of the ADCs, we have
$$R_p \triangleq Q(Y_p) \in \mathbb{C}^{M \times \tau} \quad (3)$$
with $R_p = (R_{m,u})$, which is used by the BS to estimate $H$.
Some comments are in order. First, correlating the quantized receive signal $R_p$ in (3) with $P$, as done in (5) below, results in residual pilot contamination even when the pilots are orthogonal (see, e.g., [16]). Second, the pilots should preferably be chosen such that the phases of their entries span an interval $[\eta, \eta + \frac{\pi}{2}]$, with $\eta \in [0, 2\pi]$, so as to accurately estimate the phases (especially at high SNR). This is explained in Appendix I, which provides a detailed discussion on the channel estimation with 1-bit ADCs.
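The orthogonality condition $P^H P = \tau I_K$ can be checked numerically for a concrete pilot family. The sketch below assumes pilots taken from the first $K$ columns of the $\tau$-point DFT matrix (the choice used for the numerical results in Section III-C); the helper names are hypothetical.

```python
import cmath


def dft_pilots(tau: int, num_ues: int):
    """Return a tau x K pilot matrix: the first K columns of the tau-point DFT matrix.

    All entries have unit modulus, as required for the pilot matrix.
    """
    return [[cmath.exp(-2j * cmath.pi * u * k / tau) for k in range(num_ues)]
            for u in range(tau)]


def gram(p):
    """Return P^H P as a K x K list of lists."""
    tau, num_ues = len(p), len(p[0])
    return [[sum(p[u][a].conjugate() * p[u][b] for u in range(tau))
             for b in range(num_ues)]
            for a in range(num_ues)]
```

Calling `gram(dft_pilots(8, 3))` gives a matrix that equals `8` on the diagonal and zero elsewhere up to floating-point error, which is the condition assumed in the text.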
The LS estimator for 1-bit ADCs, which correlates the quantized receive signal $R_p$ in (3) with $P$, was first presented in [31]. Then, a linear MMSE estimator based on the Bussgang decomposition (see [20]), which we refer to as BLM estimator, was proposed in [8]. According to this, $h \triangleq \mathrm{vec}[H] \in \mathbb{C}^{MK \times 1}$ is estimated as in (4),² with $\bar{P} \triangleq P \otimes I_M \in \mathbb{C}^{M\tau \times MK}$, $r_p \triangleq \mathrm{vec}[R_p] \in \mathbb{C}^{M\tau \times 1}$, and where $\Sigma_p \triangleq E[r_p r_p^H] \in \mathbb{C}^{M\tau \times M\tau}$ denotes the covariance matrix of $r_p$. A linear estimator with a simpler structure can be obtained from (4) by ignoring the temporal correlation of the quantization distortion, which implies that the off-diagonal entries of $\Sigma_p$ are zero. Such a scaled LS estimator was proposed in [16] (and later extended to the case of multi-bit ADCs in [10]), whereby $H$ is estimated as in (5), where the scaling factor $\Psi$ (common for all the UEs) is defined in (6). We point out that $\Psi$ in (6) implicitly depends on the channel distribution. Note that, when $\tau = K$, (5) coincides with (4) since $\Sigma_p = (\rho K + 1) I_{MK}$; otherwise, (5) accurately approximates (4) at low SNR or when $K$ is large [10]. In Section III, we analyze the performance of the BLM estimator in (4) and of the class of scaled LS estimators, such as the one in (5). Moreover, we highlight the relationship between these two in the case of i.i.d. Rayleigh fading channels among the UEs.
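The Bussgang decomposition mentioned above replaces the quantizer by a linear gain plus a distortion term that is uncorrelated with the input. The following Monte Carlo sketch estimates that gain for a single received sample, under the assumptions that the sample is distributed as $\mathcal{CN}(0, \rho K + 1)$ (which holds on average over i.i.d. Rayleigh fading with unit-modulus pilots) and that the quantizer output is scaled by $\sqrt{(\rho K + 1)/2}$. Under these assumptions the gain should be close to $\sqrt{2/\pi} \approx 0.798$ for any $\rho$ and $K$. The function name is illustrative.

```python
import math
import random


def bussgang_gain(rho: float, num_ues: int, n: int = 100_000, seed: int = 0) -> float:
    """Estimate E[Re(r y*)] / E[|y|^2] for r = Q(y), with y ~ CN(0, rho * K + 1)."""
    rng = random.Random(seed)
    var_y = rho * num_ues + 1          # variance of each unquantized sample
    scale = math.sqrt(var_y / 2)       # quantizer output scaling
    std = math.sqrt(var_y / 2)         # std of the real and imaginary parts of y
    num = 0.0
    den = 0.0
    for _ in range(n):
        y = complex(rng.gauss(0, std), rng.gauss(0, std))
        r = scale * complex(1 if y.real >= 0 else -1, 1 if y.imag >= 0 else -1)
        num += (r * y.conjugate()).real
        den += abs(y) ** 2
    return num / den
```

The fact that the gain does not depend on $\rho$ or $K$ is a consequence of the chosen output scaling, which keeps the quantized samples at the same power as the unquantized ones.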

B. Uplink Data Transmission
Let $x_k \in \mathbb{C}$ be the transmit symbol of UE $k$, with $E[|x_k|^2] = 1$ and $x \triangleq (x_k) \in \mathbb{C}^{K \times 1}$. The receive signal at the BS prior to quantization is given by
$$y \triangleq \sqrt{\rho} H x + z \in \mathbb{C}^{M \times 1} \quad (7)$$
where $z \triangleq (z_m) \in \mathbb{C}^{M \times 1}$ is the AWGN term with entries distributed as $\mathcal{CN}(0, 1)$. Then, at the output of the ADCs, we have
$$r \triangleq Q(y) \in \mathbb{C}^{M \times 1} \quad (8)$$
and the BS obtains a soft estimate of $x$ as
$$\hat{x} \triangleq V^H r \in \mathbb{C}^{K \times 1} \quad (9)$$
where $V \in \mathbb{C}^{M \times K}$ is the combining matrix adopted at the BS. Finally, the data detection process associates each estimated symbol to a transmit symbol, e.g., via MDD. In Section IV, we focus on characterizing the statistical properties of the estimated symbols when MRC is adopted at the BS.
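The final detection step can be sketched as a nearest-reference-point rule. The snippet below is a generic illustration of MDD, assuming that a reference point in the complex plane is available for each transmit symbol (for example, the expected value of the corresponding estimated symbol); the function name and the dictionary layout are hypothetical.

```python
def minimum_distance_detect(x_hat: complex, reference_points: dict) -> complex:
    """Return the transmit symbol whose reference point is closest to the soft estimate.

    reference_points maps each candidate transmit symbol to the point in the
    complex plane against which soft estimates are compared.
    """
    return min(reference_points, key=lambda s: abs(x_hat - reference_points[s]))
```

For example, with reference points `{1 + 1j: 2 + 2j, -1 - 1j: -2 - 2j}`, the soft estimate `1.5 + 0.5j` is mapped to the transmit symbol `1 + 1j`.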

III. CHANNEL ESTIMATION WITH 1-BIT ADCS
In this section, we are interested in characterizing the performance of the channel estimation with respect to the different parameters when 1-bit ADCs are adopted at each BS antenna (see Section II-A). In doing so, we consider the BLM estimator in (4) and the class of scaled LS estimators, such as the one in (5).

A. MSE of the Channel Estimation
The (normalized) MSE of the channel estimation when the BLM estimator is used is given by
$$\mathrm{MSE}_{\mathrm{BLM}} \triangleq \frac{1}{MK} E\big[ \| \hat{h}_{\mathrm{BLM}} - h \|^2 \big] \quad (10)$$
with $\hat{h}_{\mathrm{BLM}}$ defined in (4). In [8, Eq. (17)], the closed-form expression of (10) was derived for the case of $\tau = K$, which gives
$$\mathrm{MSE}_{\mathrm{BLM}} = 1 - \frac{2}{\pi} \frac{\rho K}{\rho K + 1}. \quad (11)$$
Note that the above expression is lower bounded by $1 - \frac{2}{\pi} \simeq 0.363$, which is achieved in the limit of $\rho K \to \infty$. Hence, in realistic scenarios (especially for small values of $K$), using a pilot length that is equal to the number of UEs results in quite inaccurate channel estimates. In general, $\tau$ should be sufficiently large to compensate for the low granularity of the ADCs, as detailed in Appendix I. For this reason, we assume that the pilot matrix $P$ is chosen such that $PP^H$ is circulant³ and derive a closed-form expression of (10) that is valid for any value of $\tau$ and $K$.
Theorem 1. Suppose that the BLM estimator in (4) is used and that $P$ is chosen such that $PP^H$ is circulant. Then, the MSE of the channel estimation in (10) is given by (12), where we have defined $\delta_k$ as in (13).
Proof: See Appendix II.
The result of Theorem 1 enables a precise characterization of the performance of the BLM estimator with respect to the transmit SNR $\rho$, the number of UEs $K$, and the pilot length $\tau$. The parameter $\delta_k$ in (13) is roughly proportional to $\tau(\tau - 1)$ and, for a fixed $\tau$, decreases with $K$. When $\tau = K$, (12) recovers the expression in (11). Moreover, the choice of the pilot matrix $P$ affects the MSE of the channel estimation through the parameters $\{\delta_1, \ldots, \delta_K\}$. Lastly, we point out that, if $PP^H$ is not circulant, $\mathrm{MSE}_{\mathrm{BLM}}$ can be computed via the more involved and less insightful expression in (68) (see Appendix II).
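The $\tau = K$ special case can be checked with a short Monte Carlo experiment. The sketch below treats $\tau = K = 1$ and makes two assumptions that are worked out here rather than copied from the source: the closed form is taken as $1 - \frac{2}{\pi} \frac{\rho K}{\rho K + 1}$ (consistent with the limit $1 - \frac{2}{\pi}$ quoted in this section), and the estimate is the quantized sample multiplied by the linear MMSE scaling $\sqrt{2\rho/\pi}/(\rho + 1)$ that follows from a Bussgang gain of $\sqrt{2/\pi}$. Function names are illustrative.

```python
import math
import random


def closed_form_mse(rho: float, num_ues: int) -> float:
    """Assumed tau = K closed form: 1 - (2 / pi) * rho * K / (rho * K + 1)."""
    return 1 - (2 / math.pi) * rho * num_ues / (rho * num_ues + 1)


def simulated_mse_single_ue(rho: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo MSE for K = tau = 1 with pilot p = 1 and a scaled 1-bit observation."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)                                # std per real dimension of CN(0, 1)
    scale = math.sqrt((rho + 1) / 2)                  # quantizer output scaling
    gain = math.sqrt(2 * rho / math.pi) / (rho + 1)   # assumed linear MMSE scaling
    err = 0.0
    for _ in range(n):
        h = complex(rng.gauss(0, s), rng.gauss(0, s))
        z = complex(rng.gauss(0, s), rng.gauss(0, s))
        y = math.sqrt(rho) * h + z
        r = scale * complex(1 if y.real >= 0 else -1, 1 if y.imag >= 0 else -1)
        err += abs(gain * r - h) ** 2
    return err / n
```

Comparing `simulated_mse_single_ue(rho)` with `closed_form_mse(rho, 1)` for a few values of `rho` gives a quick sanity check of the floor at $1 - \frac{2}{\pi}$: no matter how large `rho` is, a single 1-bit observation per channel entry cannot push the MSE below that value.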
We now show that, in the case of i.i.d. Rayleigh fading channels among the UEs and when $PP^H$ is circulant, the BLM estimator can be simplified as a scaled LS estimator with UE-specific scaling factors.
Corollary 1. Suppose that $P$ is chosen such that $PP^H$ is circulant. Then, the BLM estimator in (4) can be simplified as in (15).
Proof: See Appendix III.
The result of Corollary 1 states that, under the above assumptions, the BLM estimator can be implemented without inverting, or multiplying by, the $M\tau$-dimensional matrix $\Sigma_p$, where $M\tau$ can be quite large at mmWave and sub-THz frequencies. In particular, $P$ diagonalizes $\Sigma_p$ when $PP^H$ is circulant, which greatly simplifies the structure of (4). Lastly, we point out that, in the case of correlated channels, Corollary 1 does not generally hold as the channel covariance matrix is embedded into $\Sigma_p$ and the latter is not diagonalized by $P$.
Let us move our focus to the class of scaled LS estimators, such as the one in (5); recall that, unlike the simplified expression of the BLM estimator in (15), scaled LS estimators are characterized by a common scaling factor for all the UEs. In this regard, we first derive the closed-form expression of the MSE of the channel estimation for an arbitrary scaling factor.
Theorem 2. Suppose that the scaled LS estimator in (5) is used with arbitrary $\Psi$. Then, the MSE of the channel estimation is given by (18), where we have defined $\Delta$ as in (19), with $\delta_k$ defined in (13).
Proof: The expression in (18) is obtained from the proof of Corollary 1 by replacing the UE-specific scaling factors $\{\Psi_1, \ldots, \Psi_K\}$ in (99) with the common scaling factor $\Psi$.
In particular, when $\Psi$ defined in (6) is used, (18) becomes (20). Note that, when $\tau = K$, (20) recovers the expression of the MSE of the BLM estimator in (11). Now, if we consider the scaling factor $\Psi$ as a tuning parameter, we can minimize $\mathrm{MSE}_{\mathrm{SLS}}$ in (18) by optimizing over $\Psi$. As a result, we obtain the optimal estimator within the class of scaled LS estimators.

Corollary 2. Suppose that the scaled LS estimator in (21) is used, where the optimized scaling factor is obtained by minimizing (18) with respect to $\Psi$ and is defined in (22), with $\Delta$ defined in (19). Then, the MSE of the channel estimation is given by (24).
Proof: Since (18) is a convex function of $\Psi$, the scaling factor in (22) can be obtained by setting the derivative of (18) with respect to $\Psi$ to zero. Then, replacing $\Psi$ in (18) with the scaling factor in (22) yields the expression in (24).
Note that the scaling factor in (22) implicitly depends on the channel distribution, as does $\Psi$ in (6). The result of Corollary 2 shows that the optimal scaled LS estimator (in terms of MSE of the channel estimation) is not the one that simply ignores the temporal correlation of the quantization distortion from the BLM estimator (see (5)-(6)); instead, a simple optimization over the scaling factor can significantly improve the channel estimation accuracy. When $\tau = K$, the optimal scaled LS estimator in (21) coincides with the estimator in (5), and, in turn, with the BLM estimator in (4): in fact, $\tau = K$ implies that the scaling factor in (22) reduces to $\Psi$ in (6) and (24) recovers the expression in (11). On the other hand, the estimator in (21) should always be preferred to the estimator in (5) when $\tau > K$, and the performance gap between the two widens with $\tau - K$. The improved performance of (21) over (5) is also suggested by the resemblance between $\mathrm{MSE}_{\mathrm{BLM}}$ in (12) and the MSE in (24), where the latter can be obtained from the former by replacing $\delta_k$ with $\Delta$, $\forall k$. Remarkably, in Section III-C, we show that the optimal scaled LS estimator in (21) entails a negligible performance loss with respect to the BLM estimator.
It is of particular interest to study the asymptotic behavior of the MSE of the channel estimation at high SNR.
Corollary 3. From Theorems 1 and 2 and from Corollary 2, in the limit of $\rho \to \infty$, the MSE expressions in (12), (20), and (24) converge to (26), (27), and (28), respectively, where the high-SNR counterparts of $\delta_k$ and $\Delta$ are defined in (29) and (30), respectively.
The results of Corollary 3 show that arbitrarily increasing the transmit SNR is detrimental to the performance of the channel estimation, since the right amount of noise is necessary to recover the difference in amplitude between channel entries (see Appendix I). This is in stark contrast with the case of infinite-resolution ADCs, where boosting $\rho$ produces the same beneficial noise-averaging effect as increasing $\tau$. In the next section, we also discuss the asymptotic behavior at low SNR.

B. Tractable Upper Bounds
The MSE expressions derived so far depend on the specific pilot choice through the parameters $\{\delta_1, \ldots, \delta_K\}$ in (13) or $\Delta$ in (19). To gain more practical insights, we now consider the single-UE case (i.e., $K = 1$) and derive tractable upper bounds that are independent of the pilot choice. We begin by pointing out that, when $K = 1$, we have $\delta_k = \Delta$ and, thus, $\mathrm{MSE}_{\mathrm{BLM}}$ in (12) is equal to the MSE in (24). Let $p \triangleq (p_u) \in \mathbb{C}^{\tau \times 1}$ denote the pilot used by the UE. In this setting, $\Delta$ in (19) can be simplified as in (31) and upper bounded as in (32), where the upper bound in (32) is obtained by fixing $p$ such that $p_u \in \{\pm\beta, \pm j\beta\}$, $\forall u$, with $\beta \in \mathbb{C}$ and $|\beta|^2 = 1$; in the rest of the paper, when referring to this case, we will simply use $p = 1_\tau$. Indeed, for a given pilot length $\tau$, such a structure of $p$ represents the worst possible pilot choice since it maximizes the MSE of the channel estimation in (12) and (20). As detailed in Appendix I, this effect is particularly detrimental at high SNR and, in the limit of $\rho \to \infty$, each channel entry is reduced to a scaled symbol of the QPSK constellation regardless of the value of $\tau$.
Hence, plugging (32) into (12) and (26) yields (33) and (34), respectively. In addition, considering (33) in the limit of $\tau \to \infty$, we have (35). Likewise, plugging (32) into (20) and (27) yields (36) and (37), respectively. Moreover, considering (36) in the limit of $\tau \to \infty$, we have (38).
Some comments are in order. First, when $\tau = 1$, both (33) and (36) recover the expression in (11) with $K = 1$. Second, (34) does not depend on $\tau$ since, in the absence of noise, estimating the channel repeatedly over the same pilot symbol does not bring any benefit. Third, it can be demonstrated that (33) and (36) are quasiconvex functions of $\rho$ and, as such, they have a unique minimum. This defines a clear SNR trade-off, according to which operating at the right noise level enhances the channel estimation accuracy. In particular, as discussed in Appendix I, we have that:
• At low SNR, the channel estimates are corrupted by the strong noise;
• At high SNR, the difference in amplitude between channel entries cannot be recovered.
In general, when $\tau > 1$, the value of $\rho$ that minimizes (33) satisfies the condition in (39). Since the left-hand side of (39) monotonically increases with the transmit SNR, this optimal value of $\rho$ decreases as the left-hand side of (39) decreases, i.e., as $\tau$ increases. This means that using longer pilots allows operation at lower SNR, as the noise can be averaged out more efficiently. This interdependence between the optimal $\rho$ and $\tau$ can also be observed from (35): in the limit of $\tau \to \infty$, since $\lim_{w \to 0} \frac{w}{\arcsin(w)} = 1$, it follows that the MSE tends to $0$ as $\rho \to 0$. Lastly, it is shown in Section III-C that the upper bounds obtained by fixing $p = 1_\tau$ are remarkably tight at low SNR and up to the region around the optimal value of $\rho$. Therefore, the above observations also apply to the general case.
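The SNR trade-off described above can be reproduced with a small simulation. The sketch below considers $K = 1$ and the constant pilot $p = 1_\tau$, sums the 1-bit observations, and applies a common scaling that is fitted empirically on the simulated data. This fitted scaling is a stand-in for the optimized scaling factor of the paper, not its closed form, and the function names are illustrative. The quantizer output scaling is omitted because it is absorbed by the fitted scaling.

```python
import math
import random


def one_bit(y: complex) -> complex:
    """Unscaled 1-bit quantizer: sign of the real and imaginary parts."""
    return complex(1 if y.real >= 0 else -1, 1 if y.imag >= 0 else -1)


def mse_constant_pilot(rho_db: float, tau: int = 32, n: int = 4000, seed: int = 0) -> float:
    """Empirical normalized MSE for K = 1 and p = 1_tau at transmit SNR rho_db (in dB)."""
    rng = random.Random(seed)
    amp = math.sqrt(10 ** (rho_db / 10))
    s = math.sqrt(0.5)                      # std per real dimension of CN(0, 1)
    hs, cs = [], []
    for _ in range(n):
        h = complex(rng.gauss(0, s), rng.gauss(0, s))
        corr = 0j
        for _ in range(tau):
            z = complex(rng.gauss(0, s), rng.gauss(0, s))
            corr += one_bit(amp * h + z)    # correlate with the all-one pilot
        hs.append(h)
        cs.append(corr)
    # Least-squares fit of a single complex scaling (in-sample, hence slightly optimistic).
    w = sum(c.conjugate() * h for c, h in zip(cs, hs)) / sum(abs(c) ** 2 for c in cs)
    return sum(abs(h - w * c) ** 2 for c, h in zip(cs, hs)) / n
```

Evaluating `mse_constant_pilot` at a very low, a moderate, and a very high SNR (e.g., -20 dB, 0 dB, and 40 dB) should show the MSE being smallest at the moderate value, and approaching the $1 - \frac{2}{\pi}$ floor at high SNR, where all $\tau$ observations of a channel entry coincide.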

C. Numerical Results and Discussion
We now focus on the performance evaluation of the channel estimation with 1-bit ADCs with respect to the different parameters based on the analytical results presented in Sections III-A and III-B. In this regard, when $K > 1$, we choose a pilot matrix $P$ composed of the first $K$ columns of the $\tau$-dimensional DFT matrix. On the other hand, when $K = 1$, we use either the pilot
$$p = \big[ 1, e^{-j \frac{\pi}{2\tau}}, e^{-j \frac{2\pi}{2\tau}}, \ldots, e^{-j \frac{(\tau - 1)\pi}{2\tau}} \big]^T \in \mathbb{C}^{\tau \times 1} \quad (40)$$
whose entries are equispaced on the first quadrant of the unit circle, or $p = 1_\tau$: these represent the best and the worst possible pilot choices, respectively (see Section III-B and Appendix I for more details on the pilot choice). We thus consider the expressions of $\mathrm{MSE}_{\mathrm{BLM}}$ derived in (12), resulting from the BLM estimator in (4) (see [8]), $\mathrm{MSE}_{\mathrm{SLS}}$ derived in (20), resulting from the scaled LS estimator in (5) (see [16]), and the MSE derived in (24), resulting from the optimal scaled LS estimator in (21). These are compared with Monte Carlo simulations with $10^6$ independent channel realizations. For the latter, we fix $M = K$ to generate the channel matrices, although the value of $M$ does not affect the analytical and numerical results in any way. Fig. 2 illustrates the MSE of the channel estimation against the transmit SNR $\rho$, also including the asymptotic MSE expressions in (26)-(28). Fig. 2(a) considers $K = 4$ and $\tau = 32$ and shows that $\mathrm{MSE}_{\mathrm{BLM}}$ is 5.3% lower than $\mathrm{MSE}_{\mathrm{SLS}}$ in (20) at high SNR. Remarkably, the performance loss associated with the optimal scaled LS estimator, i.e., with the MSE in (24), with respect to $\mathrm{MSE}_{\mathrm{BLM}}$ is negligible and can only be noticed by examining the relative MSE difference in Fig. 2(b), which reaches its maximum of about $4 \times 10^{-4}$ in the limit of $\rho \to \infty$. Hence, in the case of i.i.d. Rayleigh fading channels among the UEs, the scaled LS estimator with a common optimized scaling factor for all the UEs essentially achieves the same accuracy as the BLM estimator.
Lastly, we highlight the SNR trade-off described in Sections III-A and III-B as well as in Appendix I, whereby the MSE of the channel estimation exhibits a valley at about $\rho = 3$ dB. Fig. 2(c) considers $\tau = 128$ and different values of $K$. Here, the gap between $\mathrm{MSE}_{\mathrm{BLM}}$ and $\mathrm{MSE}_{\mathrm{SLS}}$ in (20) widens as $\tau - K$ increases, reaching 8.5% for $K = 8$ at high SNR. In this respect, at high SNR, $\mathrm{MSE}_{\mathrm{SLS}}$ for $K = 8$ surpasses its counterpart for $K = 16$: in fact, (26)-(28) are not monotonically increasing with $K$ due to the fact that the high-SNR counterpart of $\delta_k$ in (29) is a decreasing function of $K$. Moreover, the SNR trade-off appears more evident for small values of $K$. Fig. 2(d) considers the single-UE case, showing that the upper bounds in (33) and (36) obtained by fixing $p = 1_\tau$ are remarkably tight at low SNR and up to the region around the optimal transmit SNR. Note that the optimal value of $\rho$ with $p = 1_\tau$ satisfies the condition in (39) and gives an accurate approximation of the optimal value of $\rho$ with the pilot in (40). Fig. 3 plots the MSE of the channel estimation against the pilot length $\tau$. The transmit SNR is fixed to $\rho = 10$ dB in Fig. 3(a)-(c), whereas Fig. 3(d) considers the optimized transmit SNR (we recall that $\rho$ should be reduced as $\tau$ increases to enhance the channel estimation accuracy). Fig. 3(a) considers $K = 4$ and shows that $\mathrm{MSE}_{\mathrm{BLM}}$ is 10% lower than $\mathrm{MSE}_{\mathrm{SLS}}$ in (20) at $\tau = 128$. Furthermore, as in Fig. 2, the MSE in (24) closely matches $\mathrm{MSE}_{\mathrm{BLM}}$ for any value of $\tau$, which means that the optimal scaled LS estimator essentially achieves the same accuracy as the BLM estimator. Fig. 3(b) considers different values of $K$, showing that the gap between $\mathrm{MSE}_{\mathrm{BLM}}$ and $\mathrm{MSE}_{\mathrm{SLS}}$ in (20) widens as $\tau - K$ increases and reaches 7.8% for $K = 8$ and $\tau = 128$. Fig. 3(c) examines the case where the number of UEs grows together with the pilot length. In this setting, the gap between $\mathrm{MSE}_{\mathrm{BLM}}$ and $\mathrm{MSE}_{\mathrm{SLS}}$ in (20) is roughly constant and increases with the ratio $\frac{\tau}{K}$, reaching about 5% for $\frac{\tau}{K} = 8$. Lastly, Fig. 3(d) considers the single-UE case and the upper bound on $\mathrm{MSE}_{\mathrm{BLM}}$ in (33) obtained by fixing $p = 1_\tau$, which is optimized over the transmit SNR for each $\tau$. As discussed in Section III-B, the optimal value of $\rho$ satisfies the condition in (39) and decreases as $\tau$ increases.
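The impact of the pilot choice at high SNR can also be illustrated numerically for $K = 1$. The sketch below compares the constant pilot $p = 1_\tau$ with a pilot whose phases are equispaced over an interval of width $\frac{\pi}{2}$, as in (40). It assumes the pilot-phase signal model $y_u = \sqrt{\rho}\, h\, p_u^* + z_u$ followed by correlation with $p_u$, and it uses an empirically fitted common scaling instead of the closed-form scaling factors of the paper. All names are illustrative.

```python
import cmath
import math
import random


def quarter_circle_pilot(tau: int):
    """Pilot with phases equispaced over an interval of width pi / 2, as in (40)."""
    return [cmath.exp(-1j * u * math.pi / (2 * tau)) for u in range(tau)]


def one_bit(y: complex) -> complex:
    """Unscaled 1-bit quantizer: sign of the real and imaginary parts."""
    return complex(1 if y.real >= 0 else -1, 1 if y.imag >= 0 else -1)


def mse_for_pilot(pilot, rho_db: float, n: int = 3000, seed: int = 0) -> float:
    """Empirical normalized MSE for K = 1 with the given pilot and a fitted common scaling."""
    rng = random.Random(seed)
    amp = math.sqrt(10 ** (rho_db / 10))
    s = math.sqrt(0.5)                      # std per real dimension of CN(0, 1)
    hs, cs = [], []
    for _ in range(n):
        h = complex(rng.gauss(0, s), rng.gauss(0, s))
        corr = 0j
        for p in pilot:
            z = complex(rng.gauss(0, s), rng.gauss(0, s))
            corr += one_bit(amp * h * p.conjugate() + z) * p
        hs.append(h)
        cs.append(corr)
    # Least-squares fit of a single complex scaling (in-sample, hence slightly optimistic).
    w = sum(c.conjugate() * h for c, h in zip(cs, hs)) / sum(abs(c) ** 2 for c in cs)
    return sum(abs(h - w * c) ** 2 for c, h in zip(cs, hs)) / n
```

At 40 dB with $\tau = 32$, `mse_for_pilot(quarter_circle_pilot(32), 40)` should come out clearly below `mse_for_pilot([1 + 0j] * 32, 40)`: the rotated pilot symbols let the BS resolve the channel phase finely even without noise, whereas the constant pilot only reveals the quadrant of each channel entry.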

IV. DATA DETECTION WITH 1-BIT ADCS AND MRC
In this section, we are interested in characterizing the performance of the data detection with respect to the different parameters when 1-bit ADCs are adopted at each BS antenna (see Section II-B). In this regard, we consider the scenario where the BS uses the BLM estimator in (4) in the channel estimation phase and the MRC receiver in the data detection phase (see also [8]). Assuming that $PP^H$ is circulant and building on Corollary 1, the combining matrix is given by $V = \hat{H}_{\mathrm{BLM}}$ and we can write the soft estimate in (9) as in (42). In the following, we focus on the single-UE case (i.e., $K = 1$) and characterize the statistical properties of the estimated symbols. We recall that the BLM estimator is equivalent to the optimal scaled LS estimator in (21) when $K = 1$, as detailed in Section III-B. Note that, in a multi-UE massive MIMO context with infinite-resolution ADCs, MRC asymptotically becomes the optimal receive strategy as the number of BS antennas increases. However, when the MRC receiver results from the quantized channel estimation, it cannot be perfectly aligned with the channel matrix, resulting in residual multi-UE interference. Hence, the following analysis of the single-UE case does not consider this interference; nonetheless, this can be straightforwardly included at the expense of more involved and less insightful expressions, which is left for future work.

A. Expected Value and Variance of the Estimated Symbols
Let $x \in \mathcal{S}$ be the transmit symbol of the UE, where $\mathcal{S} \triangleq \{s_1, \ldots, s_L\}$ denotes the set of transmit symbols, with $s_\ell \in \mathbb{C}$, $\forall \ell$; for instance, $\mathcal{S}$ may correspond to the QPSK or 16-QAM constellation. To facilitate the data detection process at the BS, for each transmit symbol $s_\ell \in \mathcal{S}$, we are interested in deriving the closed-form expression of the expected value of the resulting estimated symbol $\hat{s}_\ell$.
with $\Delta$ given in (31) and where $\mathsf{E}_\ell$ is derived in closed form in (43).
Proof: See Appendix V.
The result of Theorem 4 quantifies the dispersion of the estimated symbols about their expected value, which results from the 1-bit quantization applied to both the channel estimation (through the MRC receiver) and the uplink data transmission. This dispersion is not isotropic and assumes different shapes for different transmit symbols, as illustrated in Section IV-C (see also [32]). Some additional comments are in order. First, $V_\ell$ decreases as $|s_\ell|$ increases due to the negative term on the right-hand side of (44): this is somewhat intuitive since the transmit symbols that lie further from the origin are less subject to noise. Second, although $V_\ell$ increases linearly with the number of BS antennas $M$, the normalized variance $V_\ell / |E_\ell|^2$ (which expresses the relative dispersion of the estimated symbols about their expected value) is inversely proportional to $M$, since $|E_\ell|^2$ scales as $M^2$. Third, the combined results of Theorems 3 and 4 can be exploited to design the set of transmit symbols $\mathcal{S}$ by jointly minimizing the relative dispersion and the overlap between different symbols after the estimation, which is left for future work. Lastly, in the context of MDD via Voronoi tessellation described above, one can utilize the variance derived in (44) to further refine the detection regions [1].

It is of particular interest to study the asymptotic behavior of the expected value and variance of the estimated symbols at high SNR.

and with $\Delta$ defined in (30), which can be simplified for $K = 1$ as

Corollary 4 formalizes a behavior of the estimated symbols that was observed in [10]. From (45), it emerges that, at high SNR, all the estimated symbols lie on a circle around the origin and their amplitude no longer conveys any information. As a consequence, the estimated symbols resulting from transmit symbols with the same phase become indistinguishable in terms of their expected value, which depends only on $\mathrm{Re}[s_\ell] / |s_\ell|$ and $\mathrm{Im}[s_\ell] / |s_\ell|$. For instance, if $\mathcal{S}$ corresponds to the 16-QAM constellation as in Section IV-C, the inner estimated symbols become indistinguishable from the outer estimated symbols with the same phase. Furthermore, according to (46), these estimated symbols become identical also in terms of variance. In light of this, blindly minimizing the (normalized) variance of the estimated symbols is not the key to enhancing the system performance. Instead, the variance (which roughly decreases with the transmit SNR) should be minimized alongside the overlap between different symbols after the estimation (which generally increases with the transmit SNR). This determines a clear SNR trade-off, according to which operating at the right noise level enhances the data detection accuracy and thus reduces the SER. In the next section, we also discuss the asymptotic behavior at low SNR.
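The two scaling behaviors discussed above (normalized variance inversely proportional to $M$, and loss of amplitude information at high SNR) can be checked with a small Monte Carlo experiment. The following Python sketch is illustrative only and does not reproduce the closed-form expressions of Theorems 3 and 4: it assumes a single UE, i.i.d. Rayleigh fading, unit-scaled 1-bit quantization, an unscaled LS-type channel estimate, and a pilot with equispaced phases on $[0, \pi/2)$. The function names (`one_bit`, `crandn`, `estimated_symbols`) and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def one_bit(y):
    """1-bit quantization of the real and imaginary parts (unit scaling assumed)."""
    return np.sign(y.real) + 1j * np.sign(y.imag)

def crandn(rng, *shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def estimated_symbols(s, M, tau, rho_db, n, rng):
    """Return n soft estimates of the transmit symbol s after 1-bit MRC (single UE)."""
    rho = 10 ** (rho_db / 10)
    # Assumed pilot: equispaced, non-repeating phases on [0, pi/2).
    p = np.exp(1j * np.pi * np.arange(tau) / (2 * tau))
    h = crandn(rng, n, M)  # i.i.d. Rayleigh fading, one row per realization
    # Channel estimation: quantize sqrt(rho) * h_m * conj(p_u) + noise, then correlate with p.
    r_p = one_bit(np.sqrt(rho) * h[:, :, None] * p.conj() + crandn(rng, n, M, tau))
    h_hat = r_p @ p / tau  # unscaled LS-type estimate (the scaling is an assumption)
    # Data transmission: quantize sqrt(rho) * h * s + noise, then apply MRC.
    r = one_bit(np.sqrt(rho) * h * s + crandn(rng, n, M))
    return np.sum(h_hat.conj() * r, axis=1)

def normalized_variance(x):
    """Sample variance divided by the squared magnitude of the sample mean."""
    return np.var(x) / np.abs(np.mean(x)) ** 2

rng = np.random.default_rng(0)
inner = (1 + 1j) / np.sqrt(10)  # two 16-QAM symbols sharing the same phase
outer = (3 + 3j) / np.sqrt(10)

# Normalized variance for M = 64 versus M = 256 (same SNR and pilot length).
nv_ratio = (normalized_variance(estimated_symbols(outer, 64, 8, 5.0, 600, rng))
            / normalized_variance(estimated_symbols(outer, 256, 8, 5.0, 600, rng)))

def amplitude_ratio(rho_db):
    """Ratio |mean outer estimate| / |mean inner estimate| at a given transmit SNR."""
    e_out = np.mean(estimated_symbols(outer, 128, 8, rho_db, 600, rng))
    e_in = np.mean(estimated_symbols(inner, 128, 8, rho_db, 600, rng))
    return np.abs(e_out) / np.abs(e_in)

amp_ratio_low = amplitude_ratio(0.0)    # amplitudes still distinguishable
amp_ratio_high = amplitude_ratio(30.0)  # amplitudes collapse onto one circle
print(nv_ratio, amp_ratio_low, amp_ratio_high)
```

Under these assumptions, `nv_ratio` is expected to be close to 4 (the ratio of the two array sizes), `amp_ratio_low` to be well above 1, and `amp_ratio_high` to be close to 1, in line with the discussion of Theorem 4 and Corollary 4.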

B. Tractable Upper Bounds
As done in Section III-B for the MSE of the channel estimation, tractable upper bounds on the normalized variance of the estimated symbols, i.e., bounds that do not depend on the specific pilot choice, can be obtained by fixing $\mathbf{p} = \mathbf{1}_\tau$, since such a structure of $\mathbf{p}$ represents the worst possible pilot choice (see Section III-B and Appendix I). Hence, plugging (32) into (44) and (46)

(48) is a quasiconvex function of $\rho$ and, as such, it has a unique minimum that defines a further SNR trade-off. It is shown in Section IV-C that this SNR trade-off, which is inherited from the channel estimation phase through the MRC receiver, is not as significant as the one described in Corollary 4. In fact, the normalized variance of the estimated symbols roughly decreases with $\rho$; on the other hand, the difference in amplitude between symbols cannot be recovered if $\rho$ is too high. Lastly, while for the channel estimation a reduction of the transmit SNR can be compensated by increasing the pilot length (see Section III-B), a low transmit SNR in the uplink data transmission phase inevitably results in a high normalized variance of the estimated symbols (see, e.g., Fig. 6). In this respect, we point out that the system performance can be further enhanced by optimizing the transmit SNR separately for the two phases of channel estimation and uplink data transmission.

C. Numerical Results and Discussion
We now focus on the performance evaluation of the data detection with 1-bit ADCs with respect to the different parameters using the analytical results presented in Sections IV-A and IV-B. In this regard, we assume that, in the channel estimation phase, the BS uses the BLM estimator in (4), which is equivalent to the optimal scaled LS estimator in (21) when $K = 1$, and that it uses the MRC receiver in the data detection phase. We thus consider the expressions of $E_\ell$ and $V_\ell$ derived in (43) and (44), respectively, for the single-UE case (i.e., $K = 1$). As in Section III-C, we use the pilot defined in (40) and $\mathbf{p} = \mathbf{1}_\tau$. Moreover, we specifically analyze the scenario where the set of transmit symbols $\mathcal{S}$ corresponds to the 16-QAM constellation, i.e., $\mathcal{S} = \frac{1}{\sqrt{10}} \{\pm 1 \pm j, \pm 1 \pm j3, \pm 3 \pm j, \pm 3 \pm j3\}$, which is normalized such that $\frac{1}{L} \sum_{\ell=1}^{L} |s_\ell|^2 = 1$; however, we remark that our analytical framework is valid for any choice of $\mathcal{S}$.

Fig. 4 plots the estimated symbols for different settings, where each 16-QAM symbol is transmitted over $10^2$ independent channel realizations and the pilot in (40) is used in the channel estimation phase. The expected value of the estimated symbols is computed as in Theorem 3: this matches the sample average of the estimated symbols for each 16-QAM transmit symbol and can be used to efficiently implement MDD. Comparing Fig. 4(a)-(c), which consider the same transmit SNR and pilot length, the relative dispersion of the estimated symbols about their expected value reduces as the number of BS antennas grows from $M = 64$ to $M = 256$. In fact, a higher granularity in the antenna domain allows summing the contributions of a larger number of independent channel entries. On the other hand, comparing Fig. 4(b) and (d), which consider the same number of BS antennas and transmit SNR, the relative dispersion of the estimated symbols about their expected value slightly intensifies as we decrease the pilot length from $\tau = 32$ to $\tau = 8$. This stems from the overall diminished accuracy of the channel estimate used to compute the MRC receiver for each channel realization. Lastly, comparing Fig. 4(b) and (e)-(f), which consider the same number of BS antennas and pilot length, the estimated symbols resulting from the 16-QAM transmit symbols with the same phase, i.e., $\pm \frac{1}{\sqrt{10}} (1 \pm j)$ and $\pm \frac{1}{\sqrt{10}} (3 \pm j3)$, get closer as the transmit SNR increases from $\rho = 0$ dB to $\rho = 10$ dB and they almost fully overlap when $\rho = 20$ dB. This behavior was observed in [10] and is formalized in Corollary 4, according to which such estimated symbols become identical in terms of both their expected value and variance at high SNR. In this respect, the SNR trade-off described in Sections IV-A and IV-B is quite evident: while the normalized variance of the estimated symbols roughly decreases with $\rho$, the difference in amplitude between symbols cannot be recovered if $\rho$ is too high. For the 16-QAM constellation, this produces a SER of about 25% since there are four pairs of indistinguishable estimated symbols (see also Fig. 8). In summary, having independent phases between the channel entries and operating at the right noise level are crucial to accurately estimate the phases and the amplitudes, respectively; we refer to Appendix I and to the related discussion in [10] for more details.
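The 25% figure quoted above follows from a simple counting argument, which the short Python sketch below makes explicit. It assumes that, at high SNR, symbols sharing the same phase are completely indistinguishable (so that a detector can do no better than guess uniformly within each group) and that no other detection errors occur; the helper name `phase_key` is a hypothetical choice for this illustration.

```python
from collections import Counter
from math import gcd

def phase_key(re, im):
    """Reduce an integer constellation point to a canonical direction (same key = same phase)."""
    g = gcd(abs(re), abs(im))
    return (re // g, im // g)

# Unnormalized 16-QAM constellation; the 1/sqrt(10) scaling does not affect the phases.
levels = (-3, -1, 1, 3)
constellation = [(a, b) for a in levels for b in levels]

# Group the symbols by phase.
groups = Counter(phase_key(a, b) for a, b in constellation)

# In a group of k indistinguishable symbols, a uniform guess is wrong with
# probability 1 - 1/k, i.e., k - 1 expected errors over the k symbols of the group.
ser_floor = sum(size - 1 for size in groups.values()) / len(constellation)
num_pairs = sum(1 for size in groups.values() if size == 2)
print(num_pairs, ser_floor)
```

The four pairs are the diagonal symbols of each quadrant: eight of the sixteen symbols are each detected correctly with probability 1/2, which gives $8 \times \frac{1}{2} / 16 = 25\%$.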
Let us now examine the behavior of the variance of the estimated symbols derived in Theorem 4, which we compare with Monte Carlo simulations with $10^6$ independent channel realizations. Fig. 6 considers $M = 128$ and $\tau = 32$, showing that $V_\ell / |E_\ell|^2$ generally diminishes with the transmit SNR $\rho$ except for the SNR trade-off exhibited with $\mathbf{p} = \mathbf{1}_\tau$; here, the asymptotic expressions in (46) and (49) are also included. Despite this trend, we recall that the difference in amplitude between symbols cannot be recovered if $\rho$ is too high, as discussed in the previous paragraph for Fig. 4: thus, arbitrarily increasing the transmit SNR is detrimental to the system performance. Lastly, Fig. 7 considers $M = 128$ and $\rho = 10$ dB, showing how $V_\ell / |E_\ell|^2$ reduces with the pilot length $\tau$.
We conclude this section by investigating the combined impact of the channel estimation and the data detection with 1-bit ADCs on the system performance in terms of SER, which we compute numerically via Monte Carlo simulations with $10^6$ independent channel realizations. In this context, the symbols are decoded by means of MDD aided by the result of Theorem 3. Fig. 8 illustrates the SER against the transmit SNR $\rho$, with $M = 128$ and $\tau = 32$. Here, the SNR trade-off appears quite evident, whereby the SER decreases until it reaches its minimum at about $\rho = 5$ dB (where the upper bound obtained with $\mathbf{p} = \mathbf{1}_\tau$ proves to be remarkably tight) before escalating again. Then, the SER asymptotically reaches 25% at high SNR, where the inner estimated symbols of the 16-QAM constellation become indistinguishable from the outer estimated symbols with the same phase (see also Fig. 4(f)). We remark that the SER can be further reduced by optimizing the transmit SNR separately for the two phases of channel estimation and uplink data transmission, which is left for future work.
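The SNR trade-off in terms of SER can be reproduced qualitatively with a small simulation. The Python sketch below is not the setup used for Fig. 8: it uses far fewer channel realizations, an unscaled LS-type channel estimate, an assumed pilot with equispaced phases on $[0, \pi/2)$, and, most importantly, sample means of the estimated symbols as MDD centroids in place of the closed-form expected values of Theorem 3. The function names (`one_bit`, `crandn`, `ser_mdd`) and default parameters are hypothetical.

```python
import numpy as np

def one_bit(y):
    """1-bit quantization of the real and imaginary parts (unit scaling assumed)."""
    return np.sign(y.real) + 1j * np.sign(y.imag)

def crandn(rng, *shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def ser_mdd(rho_db, M=128, tau=32, n=300, seed=0):
    """Monte Carlo SER of 16-QAM with 1-bit ADCs, MRC, and centroid-based MDD."""
    rng = np.random.default_rng(seed)
    rho = 10 ** (rho_db / 10)
    levels = np.array([-3, -1, 1, 3])
    S = (levels[:, None] + 1j * levels[None, :]).ravel() / np.sqrt(10)  # 16-QAM
    # Assumed pilot: equispaced, non-repeating phases on [0, pi/2).
    p = np.exp(1j * np.pi * np.arange(tau) / (2 * tau))
    h = crandn(rng, n, M)
    # Channel estimation from 1-bit quantized pilots (unscaled LS-type estimate).
    r_p = one_bit(np.sqrt(rho) * h[:, :, None] * p.conj() + crandn(rng, n, M, tau))
    h_hat = r_p @ p / tau
    # Soft estimates via MRC for every transmit symbol: array of shape (16, n).
    s_hat = np.stack([
        np.sum(h_hat.conj() * one_bit(np.sqrt(rho) * h * s + crandn(rng, n, M)), axis=1)
        for s in S
    ])
    # Stand-in for Theorem 3: sample means used as the MDD reference points.
    centroids = s_hat.mean(axis=1)
    dist = np.abs(s_hat[:, :, None] - centroids[None, None, :])  # shape (16, n, 16)
    decided = dist.argmin(axis=2)
    return float(np.mean(decided != np.arange(S.size)[:, None]))

ser_mid = ser_mdd(5.0)    # near the SNR reported as optimal for M = 128, tau = 32
ser_high = ser_mdd(30.0)  # high SNR: same-phase symbols overlap
print(ser_mid, ser_high)
```

With these assumptions, `ser_high` is expected to sit near the 25% floor while `ser_mid` is expected to be markedly lower; because the centroids are estimated from the same samples that are then detected, the values are slightly optimistic compared with a detector based on the closed-form expected values.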

V. CONCLUSIONS
This paper presents an analytical framework for the channel estimation and the data detection in massive MIMO uplink systems with 1-bit ADCs. First, we provide a precise characterization of the MSE of the channel estimation with respect to different parameters. In addition, we show that, for i.i.d. Rayleigh fading, the BLM estimator can be simplified as a scaled LS estimator with UE-specific scaling factors and that using a common optimized scaling factor for all the UEs entails no noticeable performance loss. For the data detection, we characterize the expected value and the variance of the estimated symbols when MRC is adopted. These results can be exploited to efficiently implement MDD and to properly design the set of transmit symbols. The proposed analysis gives important practical insights into the design and the implementation of 1-bit quantized systems. In particular, it highlights a fundamental SNR trade-off, according to which arbitrarily increasing the transmit SNR is detrimental to the system performance. In this respect, the optimal transmit SNR for the channel estimation is shown to decrease as the pilot length increases. Future work will consider extensions of the proposed analytical framework to more realistic channel models (for the channel estimation) and to the multi-UE case (for the data detection), as well as an SER-optimal design of the set of transmit symbols capitalizing on our data detection analysis.

ACKNOWLEDGMENTS
The authors would like to thank the anonymous reviewers, whose comments and suggestions helped to improve the paper.

APPENDIX I FUNDAMENTALS OF CHANNEL ESTIMATION WITH 1-BIT ADCS
Assuming $K = 1$, let $\mathbf{h} \triangleq (h_m) \in \mathbb{C}^{M \times 1}$ and $\mathbf{p} \triangleq (p_u) \in \mathbb{C}^{\tau \times 1}$ denote the uplink channel vector and the pilot, respectively, of the UE. When a scaled LS estimator (such as the one in (5)) is used, the channel estimate $\hat{\mathbf{h}} \triangleq (\hat{h}_m)$ is obtained as

Let $h_m = \alpha_m e^{j \theta_m}$, with $\vartheta_m \triangleq \theta_m \bmod \frac{\pi}{2}$, and let $p_u = e^{j \phi_u}$ (recall that $|p_u|^2 = 1$, $\forall u$). Assuming $\rho \to \infty$, the phase of $h_m$ can be estimated from $Q(h_m \mathbf{p}^{\mathrm{H}}) \mathbf{p}$ as detailed in (53)-(54), i.e., $Q(e^{j(\theta_m - \phi_u)})$ shifts quadrant according to the phase of $p_u$. Assuming that the entries of $\mathbf{p}$ span the unit circle, in the limit of $\tau \to \infty$, we obtain (55)-(56), where (56) follows from

Finally, from (56), we have

which yields

Hence, the phase of $h_m$ can be estimated accurately if $\tau$ is sufficiently large and the pilot symbols span the unit circle. Nonetheless, from (53)-(54), it is straightforward to see that $Q(e^{j(\theta_m - \phi_u)}) e^{j \phi_u} = Q(e^{j(\theta_m - \phi_u \mp \frac{\pi}{2})}) e^{j(\phi_u \pm \frac{\pi}{2})} = Q(e^{j(\theta_m - \phi_u \mp \pi)}) e^{j(\phi_u \pm \pi)}$, i.e., shifting the phase of the pilot symbol by a multiple of $\frac{\pi}{2}$ does not add any information about the phase of $h_m$ when $\rho \to \infty$. As a consequence, the best possible pilot choice features equispaced and non-repeating phases on an interval $[\eta, \eta + \frac{\pi}{2}]$, with $\eta \in [0, 2\pi]$ (one such choice is the pilot in (40)). On the other hand, the worst possible pilot choice is given by fixing $\mathbf{p}$ such that $p_u \in \{\pm \beta, \pm j \beta\}$, $\forall u$, with $\beta \in \mathbb{C}$ and $|\beta|^2 = 1$ (one such choice is $\mathbf{p} = \mathbf{1}_\tau$). Note that fixing $\mathbf{p} = \mathbf{1}_\tau$ with $\rho \to \infty$ would reduce each channel entry to a scaled symbol of the QPSK constellation regardless of the value of $\tau$, whereas with finite $\rho$ the phase of $h_m$ can still be estimated by exploiting the independent noise realizations over the pilot symbols.
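The role of the pilot phases in the noise-free regime ($\rho \to \infty$) can be verified numerically. The Python sketch below is an illustration under stated assumptions: the quantizer is taken with unit scaling, the "spread" pilot uses phases $\phi_u = \pi u / (2\tau)$ on $[0, \pi/2)$ as one possible instance of equispaced non-repeating phases (not necessarily the pilot in (40)), and the names `one_bit` and `phase_error` are hypothetical.

```python
import numpy as np

def one_bit(y):
    """1-bit quantization of the real and imaginary parts (unit scaling assumed)."""
    return np.sign(y.real) + 1j * np.sign(y.imag)

def phase_error(theta, pilot):
    """Noise-free phase error of sum_u Q(e^{j(theta - phi_u)}) e^{j phi_u} for each theta."""
    est = one_bit(np.exp(1j * theta)[:, None] * pilot.conj()) @ pilot
    return np.abs(np.angle(est * np.exp(-1j * theta)))

rng = np.random.default_rng(0)
tau = 32
theta = rng.uniform(0, 2 * np.pi, 1000)  # channel phases
phi = rng.uniform(0, 2 * np.pi, 1000)    # arbitrary pilot phases

# Shifting the pilot phase by pi/2 leaves the de-rotated quantized sample unchanged.
base = one_bit(np.exp(1j * (theta - phi))) * np.exp(1j * phi)
shifted = one_bit(np.exp(1j * (theta - phi - np.pi / 2))) * np.exp(1j * (phi + np.pi / 2))
identity_holds = bool(np.allclose(base, shifted))

# Equispaced phases on [0, pi/2) versus the all-ones pilot.
spread = np.exp(1j * np.pi * np.arange(tau) / (2 * tau))
ones = np.ones(tau, dtype=complex)
err_spread = phase_error(theta, spread).max()
err_ones = phase_error(theta, ones).max()
print(identity_holds, err_spread, err_ones)
```

For this particular spread pilot, the noise-free phase error is bounded by half the phase spacing, i.e., $\pi / (4\tau)$, whereas the all-ones pilot maps every channel entry to a QPSK point and leaves a phase error of up to $\pi / 4$ regardless of $\tau$.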
The right-hand side of (63) does not include any information about the amplitude of $h_m$ due to the assumption that $\rho \to \infty$. Assuming now finite $\rho$ and, for simplicity, $\mathbf{p} = \mathbf{1}_\tau$, the amplitude of $h_m$ can be estimated from (64)

In the limit of $\tau \to \infty$, we have

where erf(w)

where (68) follows from applying E[hr H p ] = 2 π ρP T ; note that a similar MSE expression appears in [33, Eq. (48)]. Now, as detailed in Appendix II-A, we can express the covariance matrix of $\mathbf{r}_p$ as

Furthermore, let $\mathbf{p}_k \in \mathbb{C}^{\tau \times 1}$ denote the $k$th column of $\mathbf{P}$.
Hence, we can write

where (72) results from the fact that, if $\mathbf{P}$ is chosen such that $\mathbf{P} \mathbf{P}^{\mathrm{H}}$ is circulant, $\boldsymbol{\Phi}$ is also circulant and, as a consequence, so is its inverse: in this case, $\mathbf{P}$ diagonalizes both $\boldsymbol{\Phi}$ and its inverse, which implies $\mathbf{p}_k^{\mathrm{T}} \boldsymbol{\Phi} \mathbf{p}_i^{*} = 0$, $\forall k \neq i$. Finally, plugging (72) into (68)

(12) is obtained by observing that $\mathbf{p}_k^{\mathrm{T}} \boldsymbol{\Phi} \mathbf{p}_k^{*} = \tau + \delta_k$, with $\delta_k$ defined in (13).

A. Derivations of (69)
In this section, we derive the closed-form expression of $\boldsymbol{\Sigma}_p$. To this end, we introduce the following definitions:

Moreover, we present the following proposition, which will also be used in Appendix IV.
Since each term in the summation of (99) is a convex function of $\Psi_k$, the expression of the UE-specific scaling factor in (16) can be obtained by setting $\frac{\mathrm{d}}{\mathrm{d}\Psi_k}$(99) $= 0$. Finally, plugging (16) into (99) yields the MSE of the BLM estimator in (12