Doppler Spread Estimation Based on Machine Learning for an OFDM System

In this paper, we propose a Doppler spread estimation approach based on machine learning for an OFDM system. We present a carefully designed neural network architecture that achieves good performance in a mixed-channel scenario in which channel characteristic variables such as the Rician K-factor, azimuth angle of arrival (AOA) width, mean direction of azimuth AOA, and channel estimation errors are randomly generated. For preprocessing the channel state information (CSI) collected under the mixed-channel scenario, we propose the averaged power spectral density (PSD) sequence as high-quality training data for machine learning-based Doppler spread estimation. We detail the intermediate mathematical derivations of the machine learning process, making it easy to graft the derived results onto other wireless communication technologies. Through simulation, we show that the machine learning approach using the averaged PSD sequence as training data outperforms the machine learning approach using the channel frequency response (CFR) sequence as training data, as well as two existing Doppler spread estimation approaches.


Introduction
Information about the Doppler spread can be used for handoff, adaptive modulation, equalization, power control, etc., in wireless communication systems. Various Doppler spread estimation approaches have been studied for isotropic scattering and Rayleigh fading channels [1][2][3][4][5][6]. The Doppler spread estimation approaches of [5,6] for single-antenna systems have been shown to outperform earlier approaches in isotropic scattering and Rayleigh fading channels. However, the performance of approaches designed for isotropic scattering and Rayleigh fading channels degrades when the channel is shaped by nonisotropic scattering and line-of-sight (LOS) components. Therefore, it is still worthwhile to develop new techniques for nonisotropic scattering and Rician fading channels in single-antenna systems. Meanwhile, machine learning has been the focus of extensive research in recent years because of its empirical success in various fields [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24]. Notably, machine learning has been used for visual object recognition and speech recognition [7,8]. Machine learning-based approaches have also been successfully applied to wireless communication systems [9][10][11][12][13][14][15][16][17][18]. Recently, efforts have been made to apply machine learning to angle of arrival (AoA) estimation [19][20][21][22] and to indoor positioning [23,24]. In [25], machine learning was used to improve human detection and activity classification based on micro-Doppler signatures, i.e., Fourier transforms of Doppler radar data measured under LOS conditions. In [26], machine learning was used to improve vehicle collision avoidance services based on Doppler profiles, i.e., spectral representations of temporal non-line-of-sight (NLOS) Doppler energy.
In [27], machine learning was used to estimate the Doppler spread from channel state information (CSI) in a high-speed train (HST) system. The channel considered in [27] differs from a typical mobile radio communication channel in that it features a LOS component determined by the position of the HST, which moves along a fixed track relative to adjacent radio head units. To the best of our knowledge, machine learning has not been used for Doppler spread estimation in general mobile radio communication channels. This motivates the study in this paper of applying machine learning to Doppler spread estimation for mobile radio communication channels.
Machine learning can work efficiently even when the system model is unknown or its parameters cannot be accurately estimated. Doppler spread estimation techniques designed for specific channel conditions do not work well in real-world channel environments, where channel characteristic variables such as the Rician K-factor, azimuth angle of arrival (AOA) width, mean azimuth AOA, and channel estimation error vary arbitrarily. Therefore, it is of interest whether additional performance gains can be obtained by applying machine learning to Doppler spread estimation, especially when the channel characteristic variables are randomly generated. As noted in [28], machine learning requires a very large amount of training data to effectively train the weights of a neural network for accurate classification; in other words, machine learning does not perform well when the training data are insufficient. However, it is often difficult to obtain a very large amount of training data in real mobile radio communication systems. Given a limited amount of training data, preprocessing it into high-quality training data through feature selection and feature extraction can improve machine learning performance [29,30]. This motivated us to study how to preprocess the collected CSI into high-quality training data that improve machine learning performance for Doppler spread estimation.
In this paper, we propose a Doppler spread estimation approach based on machine learning for an OFDM system. We present a carefully designed neural network architecture that achieves good performance in a mixed-channel scenario in which channel characteristic variables such as the Rician K-factor, azimuth AOA width, mean direction of azimuth AOA, and channel estimation errors are randomly generated. For preprocessing the CSI collected under the mixed-channel scenario, we propose the averaged power spectral density (PSD) sequence as high-quality training data for machine learning-based Doppler spread estimation. We detail the intermediate mathematical derivations of the machine learning process, making it easy to graft the derived results onto other wireless communication technologies. Through simulation, we show that the machine learning approach using the averaged PSD sequence as training data outperforms the machine learning approach using the channel frequency response (CFR) sequence as training data, as well as two existing Doppler spread estimation approaches. The main contributions of this paper are summarized as follows: (1) this paper applies machine learning to Doppler spread estimation for mobile radio communication channels in a mixed-channel scenario; (2) the machine learning process detailed in this paper can be easily applied to other wireless communication technologies as well.
The rest of this paper is organized as follows. Section 2 describes the system model. Section 3 proposes two machine learning approaches and presents comprehensive algorithms for grafting machine learning into Doppler spread estimation. Section 4 presents the simulation results and compares the performance of the proposed and conventional approaches. Finally, Section 5 provides concluding remarks.

System Model
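We consider an OFDM system with FFT size $N_F$. We assume that the number of OFDM symbols required to compute the PSD function is $N_{data}$ and that the number of test user equipments (TUEs) required to collect training data for machine learning is $N_{TUE}$. The channel impulse response of the $m$-th TUE, for $m = 1, 2, \cdots, N_{TUE}$, is modeled as a tapped delay line with $L$ taps,

$$\mathbf{h}^{(m)}[n] = \left[h_0^{(m)}[n],\, h_1^{(m)}[n],\, \cdots,\, h_{L-1}^{(m)}[n]\right]^T,$$

for $n = 0, 1, \cdots, N_{data} - 1$, where $n$ denotes the OFDM symbol index and $h_l^{(m)}[n]$ denotes the coefficient of the $l$-th channel path. To consider the effects of nonisotropic scattering and LOS channel components on Doppler spread estimation, we model the channel coefficients as in [1] by

$$h_l^{(m)}[n] = \sqrt{\Omega_{h,l}}\left(\sqrt{\frac{1}{K_r^{(m)}+1}}\, h_{l,a}^{(m)}[n] + \sqrt{\frac{K_r^{(m)}}{K_r^{(m)}+1}}\, h_{l,b}^{(m)}[n]\right),$$

where $h_{l,a}^{(m)}[n]$ and $h_{l,b}^{(m)}[n]$ denote the non-LOS and LOS components, respectively, $K_r^{(m)}$ denotes the Rician K-factor on a linear scale, i.e., the ratio of the LOS component power to the non-LOS (fluctuating) component power, and $\Omega_{h,l}$ denotes the power delay profile. For Rician fading, most estimated K-factor values in urban areas are less than 3 dB [31]. For the simulation, we assume an exponential power delay profile in which $\Omega_{h,l}$, for $l = 0, \cdots, L - 1$, decays exponentially with the tap index $l$ with decay parameter $\rho = 0.8$. The autocorrelation function of $h_{l,a}^{(m)}[n]$ was presented in [1,5,32] as

$$r_{h_{l,a}^{(m)}}[n] = \frac{I_0\left(\sqrt{\left(\kappa^{(m)}\right)^2 - 4\pi^2\left(f_D^{(m)}\right)^2 n^2 T_S^2 + j4\pi\kappa^{(m)} f_D^{(m)} n T_S \cos\alpha^{(m)}}\right)}{I_0\left(\kappa^{(m)}\right)},$$

where $f_D^{(m)}$ denotes the Doppler spread, $\kappa^{(m)}$ denotes the azimuth AOA width factor, which controls the width of the azimuth AOA, $\alpha^{(m)}$ denotes the mean direction of the azimuth AOA, $T_S$ denotes the OFDM symbol period, and $I_0(\cdot)$ denotes the zeroth-order modified Bessel function of the first kind. The LOS component of the $l$-th path, $h_{l,b}^{(m)}[n]$, is a complex sinusoid at the Doppler frequency $f_D^{(m)}\cos\alpha_0^{(m)}$, where $\alpha_0^{(m)}$ denotes the AOA of the LOS component, and its phase is a random parameter uniformly distributed between 0 and $2\pi$. The autocorrelation function of $h_l^{(m)}[n]$ was presented in [5,32] as

$$r_{h_l^{(m)}}[n] = \Omega_{h,l}\left(\frac{1}{K_r^{(m)}+1}\, r_{h_{l,a}^{(m)}}[n] + \frac{K_r^{(m)}}{K_r^{(m)}+1}\, e^{j2\pi f_D^{(m)} n T_S \cos\alpha_0^{(m)}}\right).$$

The CFR coefficient over the $k$-th subcarrier is given by the discrete Fourier transform (DFT) of $\mathbf{h}^{(m)}[n]$,

$$H_k^{(m)}[n] = \sum_{l=0}^{L-1} h_l^{(m)}[n]\, e^{-j2\pi k l/N_F},$$

for $k = 0, 1, \cdots, N_F - 1$.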
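The non-LOS autocorrelation above can be evaluated numerically once $I_0(\cdot)$ of a complex argument is available. The sketch below is our illustration, not the authors' simulator: it assumes the von Mises form written above, and it evaluates $I_0$ from the power series $\sum_k (z^2/4)^k/(k!)^2$, which depends only on $z^2$ and therefore sidesteps choosing a branch of the complex square root. The function names are hypothetical.

```python
import math


def bessel_i0_from_square(z2: complex, max_terms: int = 500) -> complex:
    """I_0(z) for complex z, computed from z**2 via the power series
    I_0(z) = sum_k (z**2 / 4)**k / (k!)**2, so no square-root branch is needed."""
    q = z2 / 4.0
    term = complex(1.0, 0.0)
    total = term
    for k in range(1, max_terms):
        term *= q / (k * k)  # ratio between consecutive series terms
        total += term
        if abs(term) < 1e-17 * abs(total):
            break
    return total


def nonlos_autocorr(n: int, f_d: float, t_s: float,
                    kappa: float, alpha_deg: float) -> complex:
    """Normalized autocorrelation of the non-LOS component at lag n under
    the von Mises AOA model (assumed form, see the lead-in)."""
    a = 2.0 * math.pi * f_d * n * t_s  # 2*pi*f_D*n*T_S
    z2 = complex(kappa ** 2 - a ** 2,
                 2.0 * kappa * a * math.cos(math.radians(alpha_deg)))
    return bessel_i0_from_square(z2) / bessel_i0_from_square(complex(kappa ** 2, 0.0))
```

With $\kappa = 0$ (isotropic scattering) the expression reduces to $J_0(2\pi f_D n T_S)$, the classical Clarke autocorrelation, which makes a convenient sanity check.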
The OFDM demodulated signal over the $k$-th subcarrier of the $n$-th OFDM symbol for the $m$-th TUE can be written as

$$Y_k^{(m)}[n] = H_k^{(m)}[n]\, X_k^{(m)}[n] + W_k^{(m)}[n],$$

where $X_k^{(m)}[n]$ denotes the transmitted symbol and $W_k^{(m)}[n]$ denotes zero-mean circularly symmetric complex Gaussian noise with variance $\sigma^2$. The SNR is defined as $1/\sigma^2$, assuming that the transmitted symbol has unit average power. If the number of pilot symbols in the OFDM block is $N_P$, the CFR coefficient over the subcarrier of the $p$-th pilot symbol, for $p = 0, 1, \cdots, N_P - 1$, can be estimated by the least squares method in [33] as

$$\hat{H}_{n_P}^{(m)}[n] = \frac{Y_{n_P}^{(m)}[n]}{X_{n_P}^{(m)}[n]},$$

where $n_P$ denotes the subcarrier index of the $p$-th pilot symbol. We use the minimum mean square error (MMSE)-based channel interpolation method in [34] to estimate the CFR coefficients $\{\hat{H}_k^{(m)}[n]\}_{0\le n\le N_{data}-1,\, 0\le k\le N_F-1}$. With this channel estimation method, the channel estimation error increases as the SNR decreases. The ideal PSD function of the channel, $S_{ideal,l}^{(m)}[u]$ for $u = 0, 1, \cdots, N_{data}/2 - 1$, can be obtained as in [35] by taking the Fourier transform of $r_{h_l^{(m)}}[n]$. The Doppler spread index is defined by

$$u_D^{(m)} = f_D^{(m)}\, T = \frac{f_D^{(m)}}{\Delta f^{(m)}},$$

where $T\,(= N_{data} T_S)$ denotes the period over which $N_{data}$ consecutive OFDM symbols are collected and $\Delta f^{(m)}\,(= 1/T)$ denotes the frequency resolution with which the Doppler spread index is determined from the Doppler spread. Since base stations (BSs) in macrocells generally experience severe nonisotropic scattering, $S_{ideal,l}^{(m)}[u]$ attains its maximum at $u_{max,l}^{(m)} = u_D^{(m)}\cos\alpha^{(m)}$ [5]. For severe nonisotropic scattering, it is reasonable to assume that $\alpha^{(m)}$ … [5]. For severe nonisotropic scattering, the angle spread $\theta^{(m)}$ is mostly less than 30°, and for very severe nonisotropic scattering, it is less than 10° [5]. If $\kappa^{(m)}$ is not too small, it can be approximated as $\kappa^{(m)} = \left(360°/(\pi\theta^{(m)})\right)^2$ [37]. We assume that the transmitted signal undergoes severe nonisotropic scattering and that $\theta^{(m)}$ is uniformly distributed between 10° and 30°.
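The following helpers make the definitions above concrete: the approximation of $\kappa$ from the angle spread $\theta$, the frequency resolution $\Delta f = 1/(N_{data}T_S)$, and the Doppler spread index $u_D = f_D T$. This is a hypothetical sketch; rounding $u_D$ to the nearest integer bin is our assumption, and the value $N_{data} = 1000$ used below is illustrative because the paper's Table 1 is not reproduced here.

```python
import math


def kappa_from_angle_spread(theta_deg: float) -> float:
    """Approximate von Mises width factor: kappa = (360 deg / (pi * theta))**2."""
    return (360.0 / (math.pi * theta_deg)) ** 2


def frequency_resolution(n_data: int, t_s: float) -> float:
    """Delta f = 1 / T with T = N_data * T_S, in Hz."""
    return 1.0 / (n_data * t_s)


def doppler_spread_index(f_d: float, n_data: int, t_s: float) -> int:
    """u_D = f_D * T, rounded to the nearest bin (the rounding is assumed)."""
    return round(f_d / frequency_resolution(n_data, t_s))


if __name__ == "__main__":
    # theta = 10 deg and 30 deg bound the simulated angle spreads.
    print(kappa_from_angle_spread(10.0), kappa_from_angle_spread(30.0))
    print(doppler_spread_index(100.0, 1000, 0.001))
```

With $T_S = 1$ ms and the illustrative $N_{data} = 1000$, $\Delta f = 1$ Hz, so the largest simulated Doppler spread of 100 Hz maps to $u_D = 100 = N_{data}/10$, and the sampling condition $1/T_S = 1000$ Hz $\ge 2 \times 100$ Hz holds. The angle-spread range of 10° to 30° corresponds to $\kappa$ between roughly 131.3 and 14.6.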
Although there are several ways to estimate the PSD function [38], periodogram-based nonparametric techniques are often used because of their simplicity. With the periodogram-based nonparametric technique, multiple PSD functions can be computed from multiple series of CFR coefficients; the $k$-th PSD function, computed from the series of the $k$-th estimated CFR coefficients $\{\hat{H}_k^{(m)}[n]\}_{0\le n\le N_{data}-1}$, can be written as

$$S_k^{(m)}[u] = \frac{1}{N_{data}}\left|\sum_{n=0}^{N_{data}-1}\hat{H}_k^{(m)}[n]\, e^{-j2\pi u n/N_{data}}\right|^2,$$

for $k = 0, 1, \cdots, N_F - 1$ and $u = 0, 1, \cdots, N_{data}/2 - 1$. Because of channel estimation errors and the limited number of channel coefficients used to compute the PSD function, the accuracy of the PSD function decreases. With an inaccurate PSD function, the maximum value of $S_k^{(m)}[u]$ given in (12) … value. Therefore, it is important to set $\Delta f^{(m)}$ to a small value to improve the performance of the nonparametric Doppler spread estimation approach. For $\Delta f^{(m)}$ to be small with $T_S$ fixed, $N_{data}$ must be large, because $\Delta f^{(m)} = 1/(N_{data}T_S)$. The sampling theorem states that, in the time dimension, the channel sampling rate must be at least twice the Doppler spread, i.e., $1/T_S \ge 2f_D^{(m)}$. … The neural network consists of an input layer with $N_{in}$ nodes (or neurons), an output layer with $N_{out}$ nodes, and three hidden layers. Since $0 \le u_D^{(m)} \le N_{data}/10$, we set $N_{out}$ to $N_{data}/10$ and let the output layer act as a multiclass classifier that generates a one-hot encoded binary sequence consisting of multiple zeros and a single one. The position of the single one in the one-hot encoded sequence represents $u_D^{(m)}$. The output signals of the three hidden layers are processed with the batch normalization function [39], the rectified linear unit (ReLU) activation function [41], and the dropout function [40]. The output signal of the output layer is processed with a softmax regression (SoftMax) activation function [42].
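The periodogram step can be sketched as follows. This is an illustrative implementation, not the authors' code: it computes $S_k[u]$ for the positive-frequency half $u = 0, \cdots, N_{data}/2 - 1$ with the $1/N_{data}$ normalization of the standard periodogram, and it forms an averaged PSD sequence by averaging over all subcarriers, which is our assumption about how the averaging used later is performed.

```python
import cmath
import math


def periodogram(h_k):
    """PSD of one subcarrier's CFR series: |sum_n H[n] e^{-j2*pi*u*n/N}|^2 / N,
    returned for u = 0, ..., N/2 - 1 (positive-frequency half)."""
    n_data = len(h_k)
    psd = []
    for u in range(n_data // 2):
        acc = sum(h * cmath.exp(-2j * math.pi * u * n / n_data)
                  for n, h in enumerate(h_k))
        psd.append(abs(acc) ** 2 / n_data)
    return psd


def averaged_psd(cfr):
    """Average the per-subcarrier periodograms; cfr[k][n] holds H_k[n].
    Averaging over every subcarrier is an assumption of this sketch."""
    per_k = [periodogram(row) for row in cfr]
    return [sum(col) / len(per_k) for col in zip(*per_k)]
```

For a pure tone at bin 5 on every subcarrier, the averaged PSD has a single peak of height $N_{data}$ at $u = 5$ and is essentially zero elsewhere; averaging over subcarriers reduces the variance that channel estimation noise adds to each individual periodogram.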
Since a bias factor of 1 is appended to every sequence before it is multiplied by a weight matrix, the dimensions of the four weight matrices, $\mathbf{W}_1$, $\mathbf{W}_2$, $\mathbf{W}_3$, and $\mathbf{W}_4$, are chosen as $(N_{mid}-1)\times(N_{in}+1)$, $(N_{mid}-1)\times N_{mid}$, $(N_{mid}-1)\times N_{mid}$, and $N_{out}\times N_{mid}$, respectively, where $N_{mid}$ is set to 1000. We randomly initialize all components of $\mathbf{W}_1$, $\mathbf{W}_2$, $\mathbf{W}_3$, and $\mathbf{W}_4$ using a Gaussian distribution with zero mean and standard deviation $\sqrt{2/N_{col}}$ (i.e., variance $2/N_{col}$), as in [43], where $N_{col}$ denotes the number of columns of the matrix being initialized. The CFR can be regarded as raw CSI, and the averaged PSD sequence as processed CSI. In this paper, as two representative machine learning approaches, we consider one that uses the CFR sequence (i.e., raw CSI) as training data and one that uses the averaged PSD sequence (i.e., preprocessed CSI) as training data. We name these two approaches ML1 and ML2, respectively. In ML1, the training data are selected from CFR sequences of length $N_F \times N_{data}$ (i.e., $N_{in} = N_F \times N_{data}$). The $m$-th training sample, $\mathbf{s}_m$, for $m = 1, 2, \cdots, N_{TUE}$, is formed from the estimated CFR coefficients of the $m$-th TUE.
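The weight-matrix shapes and the He-style initialization described above can be sketched in a few lines. This is a hypothetical helper for illustration: it uses small layer sizes so that it runs quickly, whereas the text uses $N_{mid} = 1000$.

```python
import math
import random


def init_weights(n_in, n_mid, n_out, seed=0):
    """Create W1..W4 with the dimensions given in the text and Gaussian
    entries of standard deviation sqrt(2 / number_of_columns)."""
    rng = random.Random(seed)
    shapes = [(n_mid - 1, n_in + 1),  # W1: bias-augmented input -> hidden 1
              (n_mid - 1, n_mid),     # W2: bias-augmented hidden 1 -> hidden 2
              (n_mid - 1, n_mid),     # W3: bias-augmented hidden 2 -> hidden 3
              (n_out, n_mid)]         # W4: bias-augmented hidden 3 -> output
    weights = []
    for rows, cols in shapes:
        std = math.sqrt(2.0 / cols)
        weights.append([[rng.gauss(0.0, std) for _ in range(cols)]
                        for _ in range(rows)])
    return weights
```

Each hidden layer outputs $N_{mid} - 1$ values; appending the bias factor restores the length $N_{mid}$ expected by the next matrix, which is why only $\mathbf{W}_1$ depends on $N_{in}$.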

Proposed Doppler Spread Estimation
In ML2, the training data are selected from the averaged PSD sequences of length $N_{data}/2$ (i.e., $N_{in} = N_{data}/2$). The $m$-th training sample, $\mathbf{s}_m$, for $m = 1, 2, \cdots, N_{TUE}$, is the averaged PSD sequence computed from the estimated CFR coefficients of the $m$-th TUE. The following is a summary of the notations used in the machine learning algorithms: … (vii) $\mathrm{randn}(N_1, N_2)$ represents a randomly generated $N_1 \times N_2$ matrix whose components follow a Gaussian distribution with zero mean and unit variance; (viii) $(\cdot)^T$ represents the transpose operator; (ix) for an input vector $\mathbf{x}$, $f_{Sum}(\mathbf{x})$ represents a function whose output is the sum of all components of $\mathbf{x}$ … A full overview of the proposed machine learning algorithm is given in Algorithm 1. In both ML1 and ML2, the forward signal propagation algorithm, the backward error propagation algorithm, and the parameter update algorithm are performed $N_{epoch} \times N_{TUE}$ times in two nested "for" loops. In the forward signal propagation algorithm, the input vector $\mathbf{x}_{in}$ is initialized with $\mathbf{s}_m$. The output vector $\mathbf{y}_{out}$ resulting from $\mathbf{x}_{in}$ is used to compute the final loss (or final cost) $\xi$. To compute $\xi$, the sequence $\mathbf{d} = f_{SoftMax}(\mathbf{t}_m)$, which satisfies $\sum_{n=1}^{N_{out}} d(n) = 1$, is prepared in advance, where $\mathbf{t}_m$ denotes the $m$-th target Doppler spread index sequence, written as a one-hot encoded binary sequence consisting of multiple 0's and a single 1. For example, $\mathbf{t}_m = [0, \cdots, 0, 1, 0, \cdots, 0]^T$, where the index of the single 1 in $\mathbf{t}_m$ represents the Doppler spread index defined in (11). All data that must be prepared in advance to run the proposed machine learning algorithm for Doppler spread estimation can be arranged in two matrices: first, the target Doppler spread index sequence matrix, $\mathbf{T} = [\mathbf{t}_1, \mathbf{t}_2, \cdots, \mathbf{t}_{N_{TUE}}]$, and second, the training data matrix, $\mathbf{S} = [\mathbf{s}_1, \mathbf{s}_2, \cdots, \mathbf{s}_{N_{TUE}}]$. To gather information about $\mathbf{T}$, the BS can have a TUE move at a constant velocity $v$ and have that TUE report its Doppler spread index, $u_D = T \cdot f_c \cdot v/c$, where $c = 3 \times 10^8$ m/s and $f_c$ is the carrier frequency.
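The target preparation can be illustrated with a short sketch. It is a hypothetical example: it assumes that output position $i$ corresponds to Doppler spread index $u_D = i$, and it uses an illustrative observation period $T = 1$ s (for example, $N_{data} = 1000$ symbols of 1 ms). Note that applying SoftMax to a one-hot vector, as the text does for $\mathbf{d}$, yields a softened target whose peak equals $e/(e + N_{out} - 1)$ rather than 1.

```python
import math


def one_hot(u_d, n_out):
    """Target sequence t_m: zeros with a single 1 at position u_d."""
    if not 0 <= u_d < n_out:
        raise ValueError("Doppler spread index outside the output range")
    t = [0.0] * n_out
    t[u_d] = 1.0
    return t


def softmax(x):
    """Numerically stable SoftMax."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]


def reported_index(period, f_c, v, c=3e8):
    """Index reported by a TUE moving at speed v: u_D = T * f_c * v / c."""
    return round(period * f_c * v / c)
```

For example, a TUE moving at 10 m/s with $f_c = 2.4$ GHz and $T = 1$ s reports $u_D = 80$, and $\mathbf{d} = f_{SoftMax}(\mathbf{t}_m)$ then places its largest value at position 80.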
To gather information about $\mathbf{S}$, the BS needs to estimate the CSI of that TUE and use the CSI to compute the training data. This method of collecting information at the BS entails system overhead. However, unless the Doppler spread statistics change significantly, reusing the trained machine learning weights for Doppler spread estimation can reduce this overhead. Since it is difficult to collect measured CSI at the various times and places needed to produce different Doppler spreads, in this paper the CSI required for machine learning and for performance evaluation is generated through simulation.
Algorithm 1 (excerpt)
Input: $\mathbf{S}$, the data of the averaged PSD sequences, and $\mathbf{T}$, the data of the target Doppler spread index sequences
Fixed parameters: $\alpha_A = 0.001$, $\alpha_B = 0.01$, $R_D = 0.02$
Weight matrix initialization: …

Within the two nested "for" loops of Algorithm 1, three subalgorithms, namely "Forward Signal Propagation" (Algorithm 2), "Backward Error Propagation" (Algorithm 4), and "Parameter Update" (Algorithm 6), are executed iteratively. In Algorithm 2, the training data are propagated through the proposed neural network structure. First, a bias factor of 1 is appended to the input sequence $\mathbf{x}_{in}$ to produce the augmented input sequence $\breve{\mathbf{x}}_1$. Then, $\breve{\mathbf{x}}_1$ is multiplied by $\mathbf{W}_1$ to produce $\hat{\mathbf{x}}_1 = \mathbf{W}_1\breve{\mathbf{x}}_1$, which is batch-normalized to yield $\bar{\mathbf{x}}_1$. The batch normalization function [39] is given in Algorithm 3. Then, $\bar{\mathbf{x}}_1$ is activated by the ReLU function [41] to yield $\tilde{\mathbf{x}}_1 = f_{ReLU}(\bar{\mathbf{x}}_1)$, where $f_{ReLU}(x) = \max(0, x)$ is applied componentwise. Then, $\tilde{\mathbf{x}}_1$ is passed through the dropout function [40], resulting in $\mathbf{x}_2$. Dropout mitigates overfitting by omitting hidden nodes with a predefined probability. The dropout function can be written as

$$\mathbf{x}_{i+1} = \frac{1}{1-R_D}\,\mathbf{z}_{D,i}\odot\tilde{\mathbf{x}}_i,$$

where $R_D$ denotes the dropout rate, set to 0.02, and $\mathbf{z}_{D,i}$, for $i = 1, 2, 3$, denotes a randomly generated vector of zeros and ones in which the number of zeros equals the product of $R_D$ and the number of components of $\tilde{\mathbf{x}}_i$. The mask is scaled by $1/(1-R_D)$ so that $\mathbf{x}_{i+1}$ keeps the same average power as $\tilde{\mathbf{x}}_i$. Then, $\breve{\mathbf{x}}_2$ is obtained by appending a bias factor of 1 to $\mathbf{x}_2$. Repeating the same process yields $\breve{\mathbf{x}}_3$ and $\breve{\mathbf{x}}_4$. Then, $\mathbf{W}_4$ is multiplied by $\breve{\mathbf{x}}_4$ to produce $\hat{\mathbf{x}}_4$. Finally, $\mathbf{y}_{out}$ is obtained by activating $\hat{\mathbf{x}}_4$ with the SoftMax function [42], i.e., $\mathbf{y}_{out} = f_{SoftMax}(\hat{\mathbf{x}}_4)$, where

$$y_{out}(n) = \frac{e^{\hat{x}_4(n)}}{\sum_{n'=1}^{N_{out}} e^{\hat{x}_4(n')}},$$

for $n = 1, \cdots, N_{out}$. In Algorithm 4, the back-propagation error is derived using the backward error propagation technique [44,45].
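The forward signal propagation steps can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: batch normalization is applied across the components of a single vector (one training sample at a time, as in the text) with scalar $\gamma$, $\beta$ and an assumed $\varepsilon = 10^{-5}$; dropout zeroes exactly $\mathrm{round}(R_D \times \text{length})$ components and rescales by $1/(1-R_D)$. All function names are hypothetical.

```python
import math
import random


def matvec(w, x):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize the components of one vector, then scale and shift."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [gamma * (v - mu) / math.sqrt(var + eps) + beta for v in x]


def dropout(x, rate, rng):
    """Zero round(rate * len(x)) random components, rescale the rest."""
    n_zero = round(rate * len(x))
    mask = [1.0] * len(x)
    for i in rng.sample(range(len(x)), n_zero):
        mask[i] = 0.0
    return [m * v / (1.0 - rate) for m, v in zip(mask, x)], mask


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]


def forward(x_in, weights, gammas, betas, rate=0.02, rng=None):
    """Three hidden layers (bias, W_i, BN, ReLU, dropout) + SoftMax output."""
    rng = rng or random.Random(0)
    x = list(x_in)
    for i in range(3):
        x_aug = x + [1.0]                     # append bias factor 1
        x_hat = matvec(weights[i], x_aug)     # W_i times augmented input
        x_bar = batch_norm(x_hat, gammas[i], betas[i])
        x_til = [max(0.0, v) for v in x_bar]  # ReLU
        x, _ = dropout(x_til, rate, rng)
    x_hat4 = matvec(weights[3], x + [1.0])    # output layer on augmented x_4
    return softmax(x_hat4)
```

The returned vector is a probability distribution over the $N_{out}$ Doppler spread indices, which is what the cross-entropy loss in Algorithm 4 compares against $\mathbf{d}$.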
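To determine the final cost (or final loss) $\xi$, the cross-entropy function [42] is applied to the two input vectors $\mathbf{y}_{out}$ and $\mathbf{d}$ as

$$\xi = -\sum_{n=1}^{N_{out}} d(n)\ln y_{out}(n).$$

When an error of amplitude 1 is back-propagated from the output of the cross-entropy function to the $n$-th input of the SoftMax function, the induced back-propagation error is

$$\frac{\partial\xi}{\partial\hat{x}_4(n)} = y_{out}(n) - d(n),$$

where we used $\sum_{n} d(n) = 1$. Therefore, if an error of amplitude 1 propagates back from the output of the cross-entropy function to the input vector of the SoftMax function, the resulting back-propagation error can be written as $\mathbf{e}_4 = \mathbf{y}_{out} - \mathbf{d}$. By the chain rule [46], the back-propagation error generated at the input of $\mathbf{W}_4$ can be written as $\mathbf{W}_4^T\mathbf{e}_4$. Taking into account the bias factor contained in $\breve{\mathbf{x}}_4$, we remove its last component, which gives $[\mathbf{I}_{N_{mid}-1}\;\;\mathbf{0}_{N_{mid}-1}]\,\mathbf{W}_4^T\mathbf{e}_4$, where $\mathbf{0}_{N_{mid}-1}$ denotes a zero column vector of length $N_{mid}-1$. Therefore, the back-propagation error at the input of the dropout function can be written as

$$\hat{\mathbf{e}}_3 = \frac{1}{1-R_D}\,\mathbf{z}_{D,3}\odot[\mathbf{I}_{N_{mid}-1}\;\;\mathbf{0}_{N_{mid}-1}]\,\mathbf{W}_4^T\mathbf{e}_4.$$

Since $\tilde{\mathbf{x}}_3 = f_{ReLU}(\bar{\mathbf{x}}_3)$, we can write

$$\frac{\partial\tilde{x}_3(n)}{\partial\bar{x}_3(m)} = \begin{cases}1, & n = m \text{ and } \bar{x}_3(n) > 0,\\ 0, & \text{otherwise,}\end{cases}$$

for $n, m = 1, 2, \cdots, N_{mid}-1$. From these results, the back-propagation error at the input of the ReLU function can be written as $\mathbf{e}_3 = \hat{\mathbf{e}}_3\odot\mathbb{1}(\bar{\mathbf{x}}_3 > 0)$, where $\mathbb{1}(\cdot)$ denotes the componentwise indicator function. The backward error propagation function for batch normalization is defined in Algorithm 5, whose detailed derivation is provided in the Appendix. Applying $\mathbf{e}_3$ to that function yields the back-propagation error $\breve{\mathbf{e}}_3$ at the input of the batch normalization function. Following the same process, $\breve{\mathbf{e}}_2$ and $\breve{\mathbf{e}}_1$ can also be obtained. In Algorithm 6, the batch normalization parameters $\gamma_1$, $\beta_1$, $\gamma_2$, $\beta_2$, $\gamma_3$, and $\beta_3$ and the weight matrices $\mathbf{W}_1$, $\mathbf{W}_2$, $\mathbf{W}_3$, and $\mathbf{W}_4$ are updated by stochastic gradient descent [47]. The batch normalization parameters $\gamma_i$ and $\beta_i$, for $i = 1, 2, 3$, are updated according to

$$\gamma_i \leftarrow \gamma_i - \alpha_A\, d\gamma_i,\qquad \beta_i \leftarrow \beta_i - \alpha_A\, d\beta_i,$$

where $\alpha_A$ denotes the learning rate, set to 0.001. The values of $d\gamma_i$ and $d\beta_i$ are generated by the backward batch normalization function, whose detailed derivation is presented in the Appendix.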
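Two of the gradients used in the backward pass are easy to verify numerically: the SoftMax/cross-entropy error $\mathbf{y}_{out} - \mathbf{d}$ and the closed-form batch normalization gradient of Algorithm 5. The sketch below is an illustration under the same assumptions as before (normalization across the components of one vector, scalar $\gamma$ and $\beta$, $\varepsilon = 10^{-5}$); the function names are hypothetical, and correctness can be confirmed by comparing against central finite differences.

```python
import math


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(y, d):
    return -sum(di * math.log(yi) for yi, di in zip(y, d))


def ce_grad(z, d):
    """d xi / d z for xi = cross_entropy(softmax(z), d) with sum(d) = 1."""
    return [yi - di for yi, di in zip(softmax(z), d)]


def bn_forward(x, gamma, beta, eps=1e-5):
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [gamma * (v - mu) / math.sqrt(var + eps) + beta for v in x]


def bn_backward(x, gamma, e, eps=1e-5):
    """Closed-form backward pass of bn_forward; e = dLoss/d(output).
    Returns (dLoss/dx, dgamma, dbeta)."""
    m = len(x)
    mu = sum(x) / m
    var = sum((v - mu) ** 2 for v in x) / m
    xc = [v - mu for v in x]
    s_e = sum(e)                                   # f_Sum(e)
    s_xe = sum(c * ei for c, ei in zip(xc, e))     # f_Sum((x - mu) * e)
    scale = gamma / (m * math.sqrt(var + eps))
    dx = [scale * (m * ei - s_e - s_xe / (var + eps) * c)
          for ei, c in zip(e, xc)]
    return dx, s_xe / math.sqrt(var + eps), s_e
```

The test below perturbs each input by $\pm h$ and checks that the finite-difference slopes match these closed-form expressions.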
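Algorithm 5: Backward error propagation of the batch normalization function, $[\breve{\mathbf{e}}, d\gamma, d\beta] = f_{BackwardBatchNorm}(\mathbf{x}, \gamma, \mathbf{e})$. Its final step computes

$$\breve{\mathbf{e}} = \frac{\gamma}{M\sqrt{\sigma^2+\varepsilon}}\left(M\mathbf{e} - f_{Sum}(\mathbf{e})\,\mathbf{1}_M - \frac{f_{Sum}\left((\mathbf{x}-\mu)\odot\mathbf{e}\right)}{\sigma^2+\varepsilon}\,(\mathbf{x}-\mu)\right)$$

and returns $[\breve{\mathbf{e}}, d\gamma, d\beta]$, where $M$, $\mu$, and $\sigma^2$ denote the length, mean, and variance of $\mathbf{x}$, and $\varepsilon$ is a small positive constant.

$\mathbf{W}_4$ is updated according to the stochastic steepest descent method as

$$\mathbf{W}_4 \leftarrow \mathbf{W}_4 - \alpha_B\frac{\partial\xi}{\partial\mathbf{W}_4},$$

where $\alpha_B$ denotes the learning rate, set to 0.01. To find $\partial\xi/\partial\mathbf{W}_4$, we compute, for each component $W_4(i,j)$,

$$\frac{\partial\xi}{\partial W_4(i,j)} = \mathbf{e}_4^T\frac{\partial\hat{\mathbf{x}}_4}{\partial W_4(i,j)},\qquad \frac{\partial\hat{\mathbf{x}}_4}{\partial W_4(i,j)} = \breve{x}_4(j)\,\mathbf{u}_i,$$

where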
$\mathbf{u}_i$ denotes a vector of length $\mathrm{len}(\hat{\mathbf{x}}_4)$ whose $i$-th component is 1 and whose other components are 0. Substituting (37) into (36) gives $\partial\xi/\partial W_4(i,j) = e_4(i)\,\breve{x}_4(j)$. Stacking these results for all indices $i$ and $j$ gives $\partial\xi/\partial\mathbf{W}_4 = \mathbf{e}_4\breve{\mathbf{x}}_4^T$, and substituting this into (35) yields

$$\mathbf{W}_4 \leftarrow \mathbf{W}_4 - \alpha_B\,\mathbf{e}_4\breve{\mathbf{x}}_4^T.$$

Following the same process, the update formula for $\mathbf{W}_i$, $i = 1, 2, 3$, can be derived as

$$\mathbf{W}_i \leftarrow \mathbf{W}_i - \alpha_B\,\breve{\mathbf{e}}_i\breve{\mathbf{x}}_i^T.$$

Note that the BS can train the weight matrices offline using Algorithm 1 and then use the trained weight matrices to estimate the Doppler spread of a target user equipment (UE) online. When the weight matrices are trained by ML1, the BS finds the Doppler spread of the target UE by computing the CFR sequence from the CSI of the target UE and using it as $\mathbf{x}_{in}$ in Algorithm 2 to generate $\hat{\mathbf{x}}_4$. When the weight matrices are trained by ML2, the BS instead computes the averaged PSD sequence from the CSI of the target UE, uses it as $\mathbf{x}_{in}$, and applies Algorithm 2 to find $\hat{\mathbf{x}}_4$. Since the index of the maximum component of $\hat{\mathbf{x}}_4$ represents the Doppler spread index $u_D$, the Doppler spread of the target UE is given by $f_D = u_D/T$, based on (11).
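The rank-one weight update and the online inference rule can be sketched as below. Both functions are hypothetical helpers: `sgd_update` applies $\mathbf{W} \leftarrow \mathbf{W} - \alpha\,\mathbf{e}\breve{\mathbf{x}}^T$ for any layer, and `estimate_doppler` assumes that output position $i$ corresponds to $u_D = i$ before mapping the index to hertz with $f_D = u_D/T$.

```python
def sgd_update(w, err, x_aug, lr):
    """W <- W - lr * err x_aug^T (rank-one stochastic gradient step)."""
    return [[wij - lr * ei * xj for wij, xj in zip(row, x_aug)]
            for row, ei in zip(w, err)]


def estimate_doppler(x_hat4, period):
    """Take the index of the largest output as u_D and return f_D = u_D / T."""
    u_d = max(range(len(x_hat4)), key=lambda i: x_hat4[i])
    return u_d / period
```

Because SoftMax is monotonic, taking the maximum of $\hat{\mathbf{x}}_4$ or of $\mathbf{y}_{out}$ gives the same index, so the SoftMax evaluation can be skipped at inference time.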

Simulation Results
In the simulation, the OFDM system parameters are chosen as $N_F = 128$, $L = 5$, and $T_S = 0.001$ s. The number of pilot subcarriers is $N_P = 8$, so the spacing between two neighboring pilot subcarriers is $N_F/N_P = 16$ subcarriers. The machine learning settings for the two machine learning approaches, ML1 and ML2, are summarized in Table 1.
To evaluate the Doppler spread estimation performance, we consider the root mean square error (RMSE),

$$\mathrm{RMSE} = \sqrt{\frac{1}{N_M}\sum_{m=1}^{N_M}\left(\hat{f}_D^{(m)} - f_D^{(m)}\right)^2},$$

and the normalized RMSE (NRMSE), where $\hat{f}_D^{(m)}$ denotes the estimated Doppler spread at the $m$-th TUE and $N_M$ denotes the number of Monte Carlo runs, set to $10^5$. For comparison, the two previous Doppler spread estimation approaches introduced in [5,6] are also considered and are named PREV1 and PREV2, respectively. PREV1 estimates the Doppler spread from the maximum position of the maxima of the multiple PSD functions derived from the channel multipaths [5]. PREV2 estimates the Doppler spread from the maximum position among the averaged PSD values that exceed 10% of the maximum averaged PSD value [6].
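The error metrics can be computed as in the sketch below. The RMSE follows the definition above; for the NRMSE, normalizing each trial's error by its true Doppler spread is our assumption about the normalization, and trials with $f_D = 0$ would have to be excluded under that choice.

```python
import math


def rmse(estimates, truths):
    """Root mean square error of Doppler spread estimates, in Hz."""
    n = len(truths)
    return math.sqrt(sum((fh - f) ** 2 for fh, f in zip(estimates, truths)) / n)


def nrmse(estimates, truths):
    """RMSE of the relative error (fh - f) / f (assumed normalization)."""
    n = len(truths)
    return math.sqrt(sum(((fh - f) / f) ** 2
                         for fh, f in zip(estimates, truths)) / n)
```

For a fixed true Doppler spread, as in each point of a curve plotted against $f_D$, this NRMSE equals the RMSE divided by $f_D$.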
In Figure 2, we evaluated the RMSEs of ML1, ML2, PREV1, and PREV2 as a function of SNR in a Rayleigh, isotropic-scattering channel scenario, where $f_D$, $K_r$, $\kappa$, and $\alpha$ were set to 40 Hz, 0, 0, and 0°, respectively. The same channel scenario was used to generate the CSI for training ML1 and ML2. The curve for ML2 is not shown in the figure because all RMSEs for ML2 were zero; that is, unlike the other approaches, ML2 estimated the Doppler spread perfectly at all simulated SNRs. PREV2 outperformed PREV1 and ML1 at SNRs above −0.8 dB but degraded as the SNR decreased, performing worse than PREV1 and ML1 at SNRs below −2.5 dB. PREV2 performs poorly at low SNR because the maximum position among the averaged PSD values exceeding 10% of the maximum averaged PSD value becomes more uncertain when the PSD values are corrupted by a low SNR or large channel estimation errors. Although ML1 has a higher implementation complexity than ML2, it performed much worse than ML2 because its weight matrices could not be effectively trained with only 2000 (= $N_{TUE}$) CFR sequences. This indicates that, when the amount of training data is limited, averaged PSD sequences are more effective training data for machine learning-based Doppler spread estimation than CFR sequences.
Depending on the location and velocity of the UE and the scattering environment between the UE and the BS, channel characteristic variables such as SNR, $f_D$, $\theta$, $K_r$, and $\alpha$ can vary randomly. To make ML1 and ML2 work well anytime and anywhere, in the following simulations ML1 and ML2 were trained on channels generated under a mixed-channel scenario in which SNR, $f_D$, $\theta$, $K_r$, and $\alpha\,(= \alpha_0)$ are uniformly distributed between −6 dB and 30 dB, between 0 Hz and 100 Hz, between 10° and 30°, between 0 and 2, and between −180° and 180°, respectively.

Wireless Communications and Mobile Computing

In Figure 3, we evaluated the RMSEs of ML1, ML2, PREV1, and PREV2 as a function of SNR for channels whose characteristic variables $f_D$, $K_r$, $\theta$, and $\alpha$ are uniformly distributed between 0 Hz and 100 Hz, between 0 and 2, between 10° and 30°, and between −180° and 180°, respectively. $\alpha_0$ was chosen equal to $\alpha$, as in [36]. ML2 outperformed the other approaches at all SNRs because its weight matrices were effectively trained in the mixed-channel scenario. ML2 yielded RMSE values of less than 21 Hz for SNRs above 0 dB, which means that, at a carrier frequency of 2.4 GHz and an SNR above 0 dB, the user speed estimation error is less than 9.45 km/h. In contrast, ML1 performed worst for SNRs above −1 dB because the 2000 CFR sequences were not sufficient to properly train its weight matrices. Since ML1 used unprocessed CSI (i.e., CFR sequences) for Doppler spread estimation, it yielded a constant RMSE curve. PREV2 outperformed PREV1 at SNRs above 6 dB but degraded as the SNR decreased, performing worse than PREV1 at SNRs below 6 dB.
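The speed figure quoted above follows from inverting the Doppler relation $f_D = f_c v/c$ (the same relation behind the TUE report $u_D = T f_c v/c$). A worked check, assuming the 21 Hz RMSE bound translates linearly into a speed error:

$$
\Delta v = \frac{\Delta f_D\, c}{f_c} = \frac{21\ \text{Hz} \times 3\times 10^{8}\ \text{m/s}}{2.4\times 10^{9}\ \text{Hz}} = 2.625\ \text{m/s} = 2.625 \times 3.6\ \text{km/h} = 9.45\ \text{km/h}.
$$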
As in Figure 2, PREV2 performs poorly at low SNR because the maximum position among the averaged PSD values exceeding 10% of the maximum averaged PSD value becomes more uncertain when the PSD values are corrupted by a low SNR or large channel estimation errors.
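The threshold rule attributed to PREV2 can be sketched to show why spurious PSD bumps hurt it. This is our reading of the one-sentence description in the text ("the maximum position of the averaged PSD values greater than 10% of the maximum"), not the implementation of [6]; the function name is hypothetical.

```python
def prev2_style_index(avg_psd, ratio=0.1):
    """Largest frequency bin whose averaged PSD exceeds ratio * max(PSD).
    Our reading of the PREV2 rule; the Doppler spread is then u / T."""
    threshold = ratio * max(avg_psd)
    above = [u for u, p in enumerate(avg_psd) if p > threshold]
    return max(above)
```

For the sequence `[5.0, 4.0, 3.0, 1.0, 0.2, 0.3, 0.05]` the rule returns bin 3, but raising a single noise-affected value to 0.6 (bin 5) pushes the estimate out to bin 5. A single spurious value above the threshold is enough to move the estimate, which is consistent with the low-SNR degradation described above.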
In Figure 4, we evaluated the NRMSEs of ML1, ML2, PREV1, and PREV2 as a function of $f_D$ for channels whose characteristic variables SNR, $K_r$, $\theta$, and $\alpha\,(= \alpha_0)$ are uniformly distributed between −6 dB and 30 dB, between 0 and 2, between 10° and 30°, and between −180° and 180°, respectively. ML2 outperformed the other approaches for all values of $f_D$ because its weight matrices were effectively trained in the mixed-channel scenario. ML1 yielded a constant NRMSE curve because it used unprocessed CSI (i.e., CFR sequences) for Doppler spread estimation, and it performed worst when $f_D$ was greater than 53 Hz. PREV2 performed worse than PREV1 at all $f_D$ values because PREV2 tended to yield very large NRMSEs at low SNRs, which made its overall performance worse than that of PREV1 when the SNR was uniformly generated between −6 dB and 30 dB.

Figure 2: Comparison of RMSE in terms of SNR when $f_D$ was 40 Hz, $K_r$ was 0, $\kappa$ was 0, $\alpha_0$ was 0°, and $\alpha$ was 0°.

Figure 3: Comparison of RMSE in terms of SNR when $f_D$, $K_r$, $\theta$, and $\alpha\,(= \alpha_0)$ are uniformly distributed between 0 Hz and 100 Hz, between 0 and 2, between 10° and 30°, and between −180° and 180°, respectively.
In Figure 5, we evaluated the RMSEs of ML1, ML2, PREV1, and PREV2 as a function of $\theta$ for channels whose characteristic variables SNR, $f_D$, $K_r$, and $\alpha\,(= \alpha_0)$ are uniformly distributed between −6 dB and 30 dB, between 0 Hz and 100 Hz, between 0 and 2, and between −180° and 180°, respectively. Note that $\kappa$, which represents the azimuth AOA width, can be expressed in terms of $\theta$ through $\kappa = \left(360°/(\pi\theta)\right)^2$ [37]. ML2 outperformed the other approaches at all values of $\theta$ because its weight matrices were effectively trained in the mixed-channel scenario. PREV1 outperformed ML1 and PREV2 for all $\theta$ values. As $\theta$ increased, the performance of PREV2 decreased, which can be explained as follows. As $\theta$ increases, the channel autocorrelation takes a sharper shape according to (5). With a sharper channel autocorrelation, the PSD function becomes smoother according to (10), resulting in a smaller maximum PSD value. Therefore, the maximum position among the averaged PSD values exceeding 10% of the maximum averaged PSD value becomes more uncertain for larger $\theta$, especially when the PSD values are corrupted by a low SNR or large channel estimation errors.

Figure 7: Comparison of RMSE in terms of $\alpha\,(= \alpha_0)$ when SNR, $f_D$, $\theta$, and $K_r$ are uniformly distributed between −6 dB and 30 dB, between 0 Hz and 100 Hz, between 10° and 30°, and between 0 and 2, respectively.

In Figure 6, we evaluated the RMSEs of ML1, ML2, PREV1, and PREV2 as a function of the Rician K-factor $K_r$ for channels whose characteristic variables SNR, $f_D$, $\theta$, and $\alpha\,(= \alpha_0)$ are uniformly distributed between −6 dB and 30 dB, between 0 Hz and 100 Hz, between 10° and 30°, and between −180° and 180°, respectively. ML2 outperformed the other approaches at all $K_r$ values because its weight matrices were effectively trained in the mixed-channel scenario. PREV2 degraded as $K_r$ increased, which can be explained as follows.
As $K_r$ increases, the fixed (LOS) channel component becomes more significant than the fading (fluctuating) channel component. The stronger the fixed channel component, the less precise it becomes to find the Doppler spread from the maximum position among the averaged PSD values exceeding 10% of the maximum averaged PSD value. Therefore, the performance of PREV2 degrades as $K_r$ increases. At high SNR, PREV1, which estimates the Doppler spread from the maximum position of the maxima of the multiple PSD functions derived from the channel multipaths, outperformed ML1 and PREV2 because, with high probability, at least one channel multipath has a strong fading component [5], and PREV1 can benefit from that multipath. Very small values of $K_r$ cause very large RMSEs in PREV1 and PREV2. This is because, for very small $K_r$ values, a low SNR or a large channel estimation error greatly deteriorates their Doppler spread estimation performance and thus causes a large overall Doppler spread estimation error when the SNR is uniformly generated.
In Figure 7, we evaluated the RMSEs of ML1, ML2, PREV1, and PREV2 as a function of the mean direction of the azimuth AOA, $\alpha$, for channels whose characteristic variables SNR, $f_D$, $\theta$, and $K_r$ are uniformly distributed between −6 dB and 30 dB, between 0 Hz and 100 Hz, between 10° and 30°, and between 0 and 2, respectively. $\alpha_0$ was chosen equal to $\alpha$, as in [36]. ML2 outperformed the other approaches for most $\alpha$ values. However, for $165° < |\alpha| < 180°$, ML2 performed worse than PREV1 and PREV2. This is because ML2 trained its weight matrices to improve the overall Doppler spread estimation performance over all $\alpha$ values while effectively avoiding overfitting. PREV1, PREV2, and ML2 produced RMSE curves that fluctuate with $\alpha$ because they use PSD sequences for Doppler spread estimation; note that the maximum position of the PSD function fluctuates because of the $\cos\alpha_0$ term in (6). In contrast, ML1 yielded a constant RMSE curve because it used unprocessed CSI (i.e., CFR sequences) for Doppler spread estimation. Because PREV1 and PREV2 assume $\alpha_0 = 0°$ when estimating the Doppler spread, their performance deteriorated as $\alpha_0\,(= \alpha)$ moved from 0° to 90°.
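The effect of assuming $\alpha_0 = 0°$ can be quantified with the peak location $u_{max} = u_D\cos\alpha$ stated in the system model. The sketch below evaluates the bias of a simplified, hypothetical peak-reading rule that takes the PSD peak position itself as $u_D$; it is not PREV1 or PREV2, only an illustration of why such rules lose accuracy as $\alpha$ moves away from 0°.

```python
import math


def psd_peak_bin(u_d: float, alpha_deg: float) -> float:
    """Location of the ideal-PSD maximum, u_max = u_D * cos(alpha)."""
    return u_d * math.cos(math.radians(alpha_deg))


def peak_reading_bias(u_d: float, alpha_deg: float) -> float:
    """Bias (in bins) of a rule that assumes alpha = 0 and reads the
    PSD peak position directly as the Doppler spread index."""
    return psd_peak_bin(u_d, alpha_deg) - u_d
```

For $u_D = 40$, the bias is 0 bins at $\alpha = 0°$, about $-20$ bins at 60°, and about $-40$ bins at 90°, where the peak collapses to zero frequency.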
In Figures 8(a) and 8(b), we investigated the impact of N TUE on the performance of ML2. In Figure 8(a), we evaluated the RMSE in terms of SNR using the same simulation parameters as given in Figure 3. In Figure 8(b), we evaluated the NRMSE in terms of f D using the same simulation parameters as given in Figure 4. It can be seen from

Conclusion
A neural network structure in machine learning for Doppler spread estimation was proposed for an OFDM system. The weight matrix update algorithm was derived with the help of the backward error propagation technique and the stochastic steepest descent method. Numerical simulations have shown that averaged PSD sequences can be used to effectively train machine learning weights for Doppler spread estimation. The proposed machine learning approach using averaged PSD sequence as training data outperformed other Doppler spread estimation approaches under various channel conditions.