Semi-blind Channel Estimation and Data Detection for Multi-cell Massive MIMO Systems on Time-Varying Channels

We study the problem of semi-blind channel estimation and symbol detection in the uplink of multi-cell massive MIMO systems with spatially correlated time-varying channels. An algorithm based on expectation propagation (EP) is developed to iteratively approximate the joint a posteriori distribution of the unknown channel matrix and the transmitted data symbols with a distribution from an exponential family. This distribution is then used for direct estimation of the channel matrix and detection of data symbols. A modified version of the popular Kalman filtering algorithm referred to as KF-M emerges from our EP derivation and it is used to initialize the EP-based algorithm. Performance of the Kalman smoothing algorithm followed by KF-M is also examined. Simulation results demonstrate that channel estimation error and the symbol error rate (SER) of the semi-blind KF-M, KS-M, and EP-based algorithms improve with the increase in the number of base station antennas and the length of the transmitted frame. It is shown that the EP-based algorithm significantly outperforms KF-M and KS-M algorithms in channel estimation and symbol detection. Finally, our results show that when applied to time-varying channels, these algorithms outperform the algorithms that are developed for block-fading channel models.


I. INTRODUCTION
S INCE the length of the pilot sequence must be no less than the number of transmitting antennas, for massive MIMO systems [1], [2] pilot-based channel estimation in the downlink (DL) is very challenging. The increased length of pilot sequences results in low spectral efficiency, and the time required for data and pilot transmission may exceed the coherence time of the channel.
In time division duplexing (TDD), channel state information (CSI) can be estimated in the uplink (UL) from pilot transmissions. For users employing a single antenna, the length of the pilot sequence only needs to scale with the number of users in the cell which is typically much smaller than the number of base station (BS) antennas. Invoking the channel reciprocity property, the UL CSI is used to obtain the DL CSI which can then be used for precoding [3]- [5].
However, even in this case, due to the limited number of orthogonal pilot sequences, pilots must be shared among the users in neighboring cells resulting in the so-called pilot contamination problem [6].
To overcome the effects of pilot contamination, several blind channel estimation schemes have been proposed in recent years [7]- [10]. Blind methods, however, suffer from phase ambiguities in the demodulated symbols, requiring a pilot symbol and user label in order to resolve these ambiguities [8]. In contrast, adding a few additional training symbols in a semi-blind approach significantly improves the channel estimation accuracy compared to the blind methods (see Fig.  1 in [11], [12]). Semi-blind channel estimation methods are proposed and analyzed in [13] and [14]. The asymptotic results in [13] show that with a semi-blind approach the effective length of pilot symbols increases without affecting the system's throughput suggesting that this approach can mitigate the effect of pilot contamination in multi-cell massive MIMO systems. In [15] the authors present a semi-blind joint channel estimation and data detection method based on the regularized alternating least-square (R-ALS) method. Semi-blind algorithms are also proposed in [16], [17].
In the algorithms described above, the massive MIMO channel is assumed to be spatially uncorrelated, as well as static during the transmission time of a frame. Due to their large number of antennas, the BS's in massive MIMO systems have fine angular resolutions making some spatial directions more likely than others [18], [19]. This results in spatial correlation in a user's channel vector to the BS. Secondly, while the assumption of static channel (the so-called block-fading) model is valid for stationary or low-mobility users, it breaks down for high-mobility users where the mobile channel is subject to time variations due to Doppler spread. While for block-fading channels CSI estimation is only required at the start of each frame, for time-varying channels, CSI estimates need to be updated instantaneously throughout the frame [5], [20].
For a time-varying massive MIMO channel, the data rates achievable by a linear MMSE estimator is studied in [21]- [24] and interpolation-based channel estimation schemes are presented in [25]- [27]. The performance of interpolationbased methods is strongly dependent on the number of the missing channel coefficients between the pilots and the length of the pilots within the frame. In the approach in [27], referred to as FIT-CP, the missing channel coefficients are predicted using a first-order Taylor expansion model. Simulation results show that FIT-CP outperforms the classical interpolation-based method. Kalman filter (KF) is popular for estimating time-varying channels with linear signal models [24] and has been developed for massive MIMO channel estimation in [28]. Simulation results are presented for channels with normalized Doppler shifts up to 0.03. KF is also used to estimate time-varying sparse massive MIMO channel in [29]. In [30] deep convolutional neural network (DCNN) is proposed for channel estimation in hybrid massive MIMO systems with time-varying channels. It is shown that the DCNN-based channel estimator reduces the required pilotoverhead to 1/3. In the above algorithms channel estimates are based on the transmitted pilot symbols alone. As shown in [31], for the same number of pilot symbols, the semi-blind methods which also exploit the transmitted data symbols outperform the pilot-only based methods.

A. CONTRIBUTIONS
We consider semi-blind joint channel estimation and data detection in multi-cell massive MIMO systems for a spatiallycorrelated time-varying channel. This paper is a substantial extension of [32] which was presented at the International Conference on Communications (ICC), June 2021. In the following we summarize the contributions made in this paper and point out the content which is not included in [32].
• The channel vector from a user to the BS is modeled as a complex circularly-symmetric Gaussian vector with a given (known) correlation matrix [1]. As in [21], [22], [24] the temporal correlation of the channel is modeled by a first-order AR process with its correlation corresponding to the Jakes' model [33]. A semi-blind method is developed for joint channel estimation and data detection in which the users transmit a few pilot symbols (on the order of the number of users) at the beginning of each frame. The proposed method is based on the EP algorithm and iteratively approximates the joint a posterior distribution of the channel matrix and the transmitted data symbols with a distribution from an exponential family. This distribution is then used for direct estimation of the channel matrix and detection of the data symbols.
As mentioned previously, this algorithm was presented in part in our conference paper [32]. However, [32] does not include the main steps in the derivation of the proposed method including the factor graph-based illustrations of the EP algorithm in Figs. 1-3 which help explain the derivation of the main algorithm. The summary of the algorithm as Algorithm 1, Appendices A and B, Remark 1 on the computational complexity of the algorithm, and Subsection III.A entitled "Reducing computational complexity" are also not included in [32]. • A modified (semi-blind) version of Kalman filtering (KF) algorithm referred to as KF-M is also proposed and its performance is evaluated. KF-M emerges from our EP derivation and is used to initialize the EPbased algorithm. The backward recursion equations of Kalman smoother are also a part of our EP derivations. The performance of the Kalman smoothing algorithm followed by a single pass of KF-M is also presented here and is denoted as KS-M. To benchmark the performance of semi-blind KF-M, KS-M, and EP, we also present the performance of the Kalman filter and smoother in a pure training mode (TM) when the entire frame is composed of known pilot symbols and only channel estimation is performed. These two cases, are referred to as KF-TM and KS-TM. In channel estimation, KF-TM provides a lower bound for the semi-blind KF-M, and KS-TM provides a lower bound for the semi-blind KS-M and EP. We also present the symbol error rate (SER) performance of the MMSE estimator with known CSI (denoted PCSI) for comparison with SER performance of the proposed algorithms. Channel estimation error and SER versus the antenna array size and the length of the data frame are presented in [32]. For completeness these results are also included here in Figs. 4-5 and 10-11. Additional results on channel estimation error and SER versus the number of antennas for spatially correlated channels and different cases of temporal correlation are presented in Figs. 6-7. Channel estimation error and SER versus interference cross gain for a multi-cell system is presented in Figs. 8-9 and the performance of the proposed method versus the Doppler frequency is presented in Figs 12-13. • [32] does not include any comparisons with algorithms designed for time-varying channels. In this paper we compare our results with those from the interpolationbased algorithm FIT-CP and show that KF-M, KS-M and EP all significantly outperform FIT-CP. In addition, to demonstrate that the semi-blind algorithms developed under the assumption of a block-fading channel are not suitable for a time-varying channel model, we compare our results with those from [13] and [15].

B. ORGANIZATION AND NOTATIONS
The rest of this paper is organized as follows. Section II describes the system model of a time-varying multicell massive MIMO system. The semi-blind EP algorithm for this system is derived in Section III. Simulation results are discussed in Section IV and Section V concludes this work. Throughout this paper, small letters (x) are used for scalars, bold small letters (x) for vectors, and bold capital letters (X) for matrices. R and C represent the set of real and complex numbers, respectively. The superscripts (.) T , (.) H , (.) * , and (.) −1 represent transpose, Hermitian transpose, complex conjugate, and matrix inverse, respectively. Also, ⊗ denotes the matrix Kronecker product. For a pdf p(.), E p denotes the expectation operator with respect to p(.). I N denotes the N × N identity matrix. Finally, vec(X) and ||x|| denote the vectorization of the matrix X and the 2 norm of the vector x, respectively.

II. SYSTEM MODEL
We consider a multi-user MIMO network made up of L cells each with its own BS and with K users in each cell. Every BS has M antennas and each user has a single-antenna transceiver. At time t, the channel gain between the m-th antenna of the l-th BS and the k-th user present in the i-th cell is denoted by h limk (t) and written as where g limk (t) models the fast-fading channel between the k-th user in cell i and the m-th antenna of BS l, and β lik models the large-scale fading incurred by the geometric attenuation and shadowing effects. We assume that β lik is a known constant which is independent of the antenna index m.
The fast-fading channel g limk (t) is considered to be a wide-sense stationary complex Gaussian process with zero mean and unit power. Using the Jakes' model [33], the time autocorrelation of g limk (t) is given by in which, J 0 (.) is the zero-order Bessel function of first kind and f d limk represents the normalized maximum Doppler shift corresponding to the channel between the m-th antenna of cell l and the k-th user in cell i. The time autocorrelation function of h limk (t) is then given by Let g lik (t) [g li1k (t), g li2k (t), . . . , g liM k (t)] T represent the M × 1 fast-fading channel vector from the k-th user in cell i to the l-th BS antenna array. We assume that the elements in g lik (t) are correlated with the correlation . The overall channel gain between the l-th BS and the users in cell i is given by where D li diag{β li1 , β li2 , . . . , β liK }. It is assumed that the users' channels to the BS are uncorrelated. In particular this implies that The signal vector received at BS l at time t is given by where s i (t) = [s i1 (t), s i2 (t), . . . , s iK (t)] T represents the transmitted symbols by all the users in the i-th cell. We assume that the symbols s ij (t) belong to an M-ary modulation constellation, denoted by A M , and have zero mean with average energy E s . We also assume that E[s i (t)s j (t) H ] = E s δ ij I K , i.e., the symbols s ij (t) are independently selected from A M . The noise term at the l-th BS is modeled with a circularly symmetric complex Gaussian distribution, i.e., where w l (t) has zero mean and correlation matrix If the modulation constellation A M is PSK, then w l (t) is a circularly symmetric complex-valued Gaussian vector, i.e., w l (t) ∼ CN (w l |0, R w ). For other modulations schemes such as QAM, assuming that KL is large and invoking the central limit theorem we assume that w l (t) ∼ CN (w l |0, R w ).

VOLUME 4, 2019
To keep the notations uncluttered, in the following we drop the subscript l from the signal model and write the received vector at time t at the l-th BS as where H t is the channel gain matrix, s t ∈ A K M is the transmitted symbols by the K users, and w t is the complex Gaussian disturbance with covariance matrix given by (7).
By applying the vectorization property as in [7], (8) can also be written as where we define S t = s T t ⊗ I M and h t = vec(H t ). Autoregressive (AR) process has been widely used to model the time evolution of the channel matrix [36]- [39]. Since a higher-order AR model can be written as a first-order model in matrix state-space form [24], [38], we use a firstorder AR model (AR(1)) for the time-varying vector channel given by where A is a diagonal matrix with the elements on the diagonal denoted by [A] n,n = a n , for n = 1, 2, . . . , M K.
The variable a n is the AR(1) coefficient corresponding to the n-th channel between a user and a BS antenna in the cell. We let a n = J 0 (2πf d n ) in which f d n is the normalized maximum Doppler shift for channel n 1 . The innovation process {v t } is an iid circularly symmetric complex Gaussian process with v t ∼ CN (0, Q). To include the spatial correlation of the channel vector, we set Q = R 1/2 The matrix Q v is diagonal with elements on the diagonal given by [Q v ] n,n = σ 2 n , for n = 1, 2, . . . , M K. To match the autocorrelation function in (3) such that the average power of the channel coefficient in (1) is equal to the large-scale fading coefficient, we set σ 2 n = (1 − a 2 n ). Consider a transmitted frame of length T which is comprised of T p pilot symbols in the beginning followed by We are interested in a detector where, having received Y, the unknown channel vectors in H and the unknown transmitted symbols in S d are jointly estimated. The posterior joint distribution of S and H is given by in which p(s t ) is the probability mass function (pmf) of the transmitted vector s t and by convention we set p(s t ) = 1 for t = 1, . . . , T p and p(h 1 ) = p(h 1 |h 0 ). From (9) we have The optimum receiver implements the maximum a posteriori rule according to the following Finding the optimum solution in (13) is very difficult and requires multidimensional integration. The proposed EP algorithm in the next section exploits the multiplicative nature of (11) to approximate the conditional distribution of (S, H) such that the marginals can be easily calculated.

III. SEMI-BLIND EP FORMULATION
In this section we develop the EP algorithm for semi-blind detection in for spatially correlated fast fading channels. For a review of the EP algorithm we refer to [41], [42]. Let F denote a family of exponential distributions. Similar to [39] we exploit the factorized structure of (11) to approximate the posterior distribution p(S, H|Y) with the following distributions from F. (11) and following [39], we use the following product form for q(S, H): Small rectangles represent factor nodes and circles represent variable nodes. A plate (big rectangle) notation is used to represent a repetition of variables in the subgraph. (16) and (17) into (15), and letting q F

Now inserting
From (18) and (14), the approximating posterior distribution q t (s t , h t ) is given by Note that in (19) we have p(s t ) = 1 for all t = 1, . . . , T p . The true posterior distribution in (11) and its approximation in (18) are illustrated with factor graphs in Fig. 1. Updating the factors q F t (.), q R t (.) and q O t (.) in (19) result in forward, reverse, and observation messages, respectively, which propagate through the factor graph in Fig. 1 (a) until a good approximation to the true posterior is obtained.
With EP, we update the factors ensuring that they belong to the exponential family F. As a result, their product q(S, H) also belongs to F and can be effectively used for maximum a posteriori estimation. To this end, we select the distributions and q R t (h t ) from the exponential family F and q O t (s t ) to be discrete. Since p(s t ) is also discrete, it follows that (19) and (18) are from the exponential family. In the following, we update these factors.
First, we ignore the beginning portion of the frame and compute q O t (s t ) and q O t (h t ) for the blind part of the frame, i.e., for t = T p + 1, . . . , T . As shown in Fig. 2 we define the cavity distribution as For the exponential family F, we consider the family of multivariate Gaussian distributions. More specifically, we let q \O FIGURE 3: Steps to update q R t−1 (h t−1 ) and q F t (h t ): (a) Eliminate q R t−1 (h t−1 ) and q F t (h t ) to find the cavity distributions q \R t−1 (h t−1 ) and q \F t (h t ) as in (33), (b) Use p(h t |h t−1 ) to define the intermediate posterior distribution r t (h t−1 , h t ) as in (34), and (c) Project r t (h t−1 , h t ) onto F and update q F t (h t ), q R t−1 (h t−1 ) as in (40), (41), and (43).

VOLUME 4, 2019
functions (pmf) of their corresponding random variables. Denoting A K M {a 1 , a 2 , · · · , a M K }, the pmf is defined as Next, the hybrid posterior distribution is defined by combining the t th factor in the likelihood function p(Y|S, H), namely p(y t |s t , h t ), with (20) to get the following intermediate approximate posterior: where the normalization factor Z t is given in Appendix A.
Next we projectq t (s t , h t ) onto the closest distribution (in the sense of Kullback- where KL(· ·) denotes the KL divergence. It is shown in [7] that the above optimization problem can be divided into the following two separate optimizations: q t (s t ) = arg min whereq t (h t ) andq t (s t ) are the marginal distributions of h t and s t , respectively, derived from their joint distribution q t (s t , h t ).
The solution to (24) is obtained from the so-called moment matching property [7]. Since the approximated posterior q t (h t ) ∈ F, we assume q t (h t ) ∼ CN (h t |m t , V t ). The moment matching property implies The values of m t and V t are given by the Lemma in Appendix B. Therefore q O t (h t ) can be updated as As discussed in [39], during the iterations of the algorithm, V O t may become singular. Therefore, to avoid numerical issues we write the mean and covariance as the natural parameters as follows: The solution to (25) is to match the pmf of the posterior distribution q t (s t ) to the marginalq t (s t ) which results in Therefore, q O t (s t ) is updated by We now consider the beginning of the frame for t = 1, . . . , T p for which we do not compute q O t (s t ) as s t 's are the known pilot symbols. To update q O t (h t ), given q \O t (h t ) the posterior factor q t (h t ) can be directly updated from (61) and (63) by using It should be noted that for this part of the frame, the summations in (60), (62), and (64) reduce to a single term corresponding to the known pilot symbol s t . Following this, q O t (h t ) can be computed from (29) and (30).
Next, we need to update q F t (h t ) and q R t (h t ) for the entire frame. Following the steps summarized in Fig. 3 we define the following intermediate distribution where where and the means are related by Next we project r t (h t−1 , h t ) onto F by minimizing the following KL divergence to get As in the case of the optimization in (23), the above problem can also be decomposed into two separate problems. In each one, we minimize the KL divergence between q k (h k ) ∀ k ∈ {t, t − 1} and the respective marginal distribution Update mt using (52) and Vt using (53). end as follows:  obtained from r t (h t−1 , h t ) by integrating out the other variable. Setting q t (h t ) to the marginal distribution in the KL optimization problem, q F t (h t ) can be updated from where To update q R t (h t ), we follow the Kalman smoothing derivation to directly incorporate it into q t (h t ). Towards this, we first compute the term q \R t (h t ) as follows Next, since (38) is related to the marginal distribution of h t−1 , we obtain the mean of the posterior distribution q t−1 (h t−1 ) by solving for µ t−1 . By substituting µ t = m t , Λ t−1,t−1 and Λ t−1,t from (35)- (36), it can be shown that Using results from Kalman smoothing the above can be reduced as follows. where The update equations are obtained by substituting (50) and (51) into (46) and (47) and adjusting the notation to update the tth factors:

A. REDUCING COMPUTATIONAL COMPLEXITY
For t = T p + 1, . . . , T , computation of (62) and (64) require summation of M K terms which is computationally challenging. Since pilot symbols are transmitted for t = 1, · · · , T p , even in the first pass of the algorithm, m \O t provides a reasonably good estimate for h t for t = T p + 1, · · · , T . Therefore, the terms in V \O t and Σ t become smaller. As a result, the PDF CN y t |S t m \O t , Σ t becomes narrow and all the summands in (62) and (64) become negligible except for the single term in which y t is close to S t m \O t . To find the dominant term we can use either one of the two following methods: a) MMSE estimator: where is the inverse of the vec(.) operation, and followed by the hard decision b) ML estimator : Onceŝ t is computed from above, then (62) and (64) can be approximated with  in the smoothing pass. Therefore the computational complexity of the proposed algorithm is O(nT (M 3 K 3 + M 2 K 2 )) where n is the number of iterations of EP. This is the same as that of the EP algorithm in [7].

IV. SIMULATION RESULTS
In this section, we evaluate the performance of the proposed semi-blind EP algorithm for joint channel estimation and symbol detection through simulations. We consider a cellular system with L = 4 cells and K = 8 users in each cell. The performance of the algorithms in the first cell is presented. The large-scale fading coefficients are set to β 11k = 1 and β 1ik = α, k = 1, . . . , K and i = 2, 3, 4. By varying the value of the cross gain α we study the effect of pilot contamination and inter-cell interference on channel estimation and symbol detection [43], [44].
Using the Kronecker model, [35], [42] for the spatial correlation matrices R 1ik , we assume [R 1ik ] m,n = r(m−n), m, n = 1, 2, . . . , M and i = 1, 2, 3, 4, where r(m) = (ρ) |m|2 . The transmitted frame of length T is composed of T p pilots symbols in the beginning followed by the T d unknown data symbols. QPSK modulation with average symbol energy E s = 0 dB is assumed for both pilot symbols and the data symbols. Hadamard code is employed to ensure that the pilot symbols of users in the first cell are orthogonal. Note that with our assumed signal model in (9), orthogonality between the pilot sequences in the first and neighboring cells is not assumed. The time-varying channel vectors for all the users in the first cell are generated according to (10) with an initial Gaussian prior distribution on h 0 with zero mean and covariance matrix R h . Further, it is assumed that all the users in the first cell have the same normalized Doppler shift f d and therefore the diagonal components of matrix A in (10) are set to J 0 (2πf d ) 3 .
Algorithm 1 is initialized with m \R 0 = 0 and V \R 0 = R h . Further, since EP converges in a few iterations, we set the maximum number of iterations to n = 10 and the error tolerance for terminating the algorithm to = 10 −6 .
The symbol error rate (SER) is averaged over all the users in the first cell and the channel estimation accuracy is measured using the following normalized error, We note that (57) and (58), together with (42), (61), and (63) represent the prediction and time-update equations of the Kalman filtering algorithm where the unknown data symbols are estimated from density maximization. Therefore we refer to the initial forward pass of our algorithm as the modified Kalman filter (KF-M). The performance of this semi-blind version of Kalman filter which emerges from our EP derivations is also presented. Further, (52) and (53) represent the backward recursion equations of Kalman smoother. Thus, the performance of the Kalman smoothing algorithm followed by a single pass of KF-M is also presented here for comparison and is denoted by KS-M. To benchmark the performance of KF-M, KS-M, and EP, we also present the performance of the Kalman filter and smoother in a pure training mode (TM) when the entire frame is composed of known pilot symbols and only channel estimation is performed. These two cases, are referred to as KF-TM and KS-TM. In channel estimation, KF-TM provides a lower bound for KF-M, and KS-TM provides a lower bound for KS-M, and EP. Finally, we also plot the SER performance of the MMSE estimator with known CSI (denoted PCSI) for comparison with SER performance of the proposed algorithms.    Fig. 4 also shows the performance of the semi-blind expectation maximization (SB-EM) and the regularized alternating least-square (R-ALS) algorithms proposed in [13] and [15], respectively, for the channel described above using T p = 8 pilot symbols. We observe that for f d = 0.01, both SB-EM and R-ALS perform better than the KF-M and KF-TM algorithms, but worse than the KS-M and EP. The improvement in performance over KF-M and KF-TM is because both SB-EM and R-ALS estimate the channel using the entire received frame, whereas at any given time, KF-M and KF-TM update the channel estimates using the received signals up to the present time. Since KS-M, KS-TM, and EP use the entire received frame as well in a smoothing pass, they outperform SB-EM and R-ALS. Further, we observe that in the case of f d = 0.04, where the channel is more time-varying, the performance of R-ALS and SB-EM is worse than all the other algorithms.  Figs. 4 and 5), ρ = 0.4 and ρ = 0.9 confirms the expected result that as ρ increases, channel diversity decreases resulting in the degradation of system performance. In addition, spatial correlation reduced the level of channel hardening. In conclusion, as spatial correlation increases, a larger number of antennas are required to achieve the same level of performance [1].
In Figs. 8 and 9, we study the effect of pilot contamination on channel estimation error and SER. To this end, we vary the cross gain α for the cases of M = 32, 64. As expected, as α increases, the performance of the algorithms degrades. However, as the figures show, EP significantly outperforms KF-M and KS-M. Moreover, there is a significant improvement as M increases from 32 to 64.
In Figs. 10 and 11 we show the performance of the semiblind algorithms versus the number of data symbols T d in the transmitted frame. Two different cases of inter-cell interference described by α = 0.3 and α = 0.4 are considered. It is observed that as T d increases, the channel estimation error and SER performance of all algorithms improves. This improvement is due to the fact that our semi-blind approach uses the T d data symbols as virtual pilot symbols in estimating the channel. Thus for a fixed number of antennas M , using a larger frame duration T d can mitigate the impact of pilot contamination and time-varying channels with a pilot-overhead much less than 1/9. For example for T d = 136, the pilot overhead is only 5.5%. In these figures, the improvement in EP's performance is better than both KF-M and KS-M. This is due to the fact that EP updates the channel estimate at each time instant by incorporating the forward, reverse, and observation messages. In contrast, KF-M does not include the reverse and observation messages and KS-M does not include the observation messages. Hence as T d increases, the channel    Finally, in Figs. 12 and 13, we study the performance of KF-M, KS-M, and EP versus the normalized Doppler shift f d . We consider two cases of inter-cell interference, namely α = 0.1 and α = 0.3. It is observed that for both cases as f d increases (the temporal correlation among the channel vectors decreases), the performance of all algorithms degrades. It is interesting to note that for f d ≤ 0.005, the performance of the algorithms is almost constant with f d . In essence for these values of f d , the channel may be assumed to be time-invariant (block fading). For the parameters listed in 4 , this translates to mobile velocities less than 20 Km/h. The channel estimation performance of the interpolation-based algorithm FIT-CP [27] is also shown in these figures for T p = 8 and T p = 32 pilot symbols. Note that since FIT-CP performs interpolation in the time-domain, it does not take the channel spatial correlation into account. It can be seen that for the same number of pilot symbols (T p = 8), FIT-CP performance is much worse than the KF-M, KS-M, and EP. For example for f d = 10 −2 , EP outperforms FIT-CP by more than 10 dB in channel estimation. The corresponding SER values are 5 × 10 −3 and 2 × 10 −1 for EP and FIT-CP, respectively. It is noteworthy that increasing the number of pilots to T p = 32 improves the performance of FIT-CP only slightly while increasing the pilot overhead from 1/9 to 1/3.

V. CONCLUSIONS
In this paper, we propose a semi-blind algorithm for joint channel estimation and symbol detection in multi-cell massive MIMO systems for spatially and temporally correlated VOLUME 4, 2019 channels. The expectation propagation (EP) algorithm is developed to approximate the a posteriori distribution of the channel matrix and data symbols with a distribution from an exponential family. The latter is then used to directly estimate the channel and detect the data symbols. A modified (semiblind) version of the classical Kalman filtering algorithm (referred to as KF-M) which emerges from our EP derivations is also proposed. Performance of Kalman Smoothing algorithm followed by KF-M (referred to as KS-M) is also examined. Simulation results show that the performance of KF-M, KS-M, and EP algorithms improves with the increase in the number of base station antennas M and the number of the data symbols, T d , in the transmitted frame. Thus for a fixed M , using a large T d can mitigate the effect of pilot contamination and time-varying channels in multi-cell systems with a pilot-overhead of around 5%. Moreover, the EPbased algorithm significantly outperforms the interpolationbased algorithms such as FIT-CP and the semi-blind KF-M and KS-M algorithms. .