A Structured Sparse Bayesian Channel Estimation Approach for Orthogonal Time—Frequency Space Modulation

Orthogonal time–frequency space (OTFS) modulation has been advocated as a promising waveform for achieving integrated sensing and communication (ISAC) due to its superiority in high-mobility adaptability and spectral efficiency. In OTFS modulation-based ISAC systems, accurate channel acquisition is critical for both communication reception and sensing parameter estimation. However, the existence of the fractional Doppler frequency shift spreads the effective channels of the OTFS signal significantly, making efficient channel acquisition very challenging. In this paper, we first derive the sparse structure of the channel in the delay Doppler (DD) domain according to the input and output relationship of OTFS signals. On this basis, a new structured Bayesian learning approach is proposed for accurate channel estimation, which includes a novel structured prior model for the delay-Doppler channel and a successive majorization–minimization (SMM) algorithm for efficient posterior channel estimate computation. Simulation results show that the proposed approach significantly outperforms the reference schemes, especially in the low signal-to-noise ratio (SNR) region.


Introduction
Future mobile communication systems must be better suited to serve a variety of developing applications in environments with high levels of mobility, including loworbit satellites, manned and unmanned aircraft, manned and unmanned vehicles, and high-speed trains [1][2][3]. In these high-mobility scenarios, mobile channels exhibit the characteristics of doubly dispersive channels due to the influence of the Doppler effect and the multipath propagation effect. Orthogonal frequency division multiplexing (OFDM) technology, which is currently widely used in fourth-generation mobile communication systems, fifth-generation mobile communication systems [4,5], wireless local area networks (WLANs), digital video broadcasting (DVB), and other broadband transmission systems, can overcome intersymbol interference (ISI) caused by time dispersion [6]. Of course, the studies on OFDM systems over doubly dispersive channel are also plentiful. Some researchers [7,8] have studied the channel estimation and data detection problems of OFDM systems over doubly dispersive channels, and the simulation results showed that the performance of the proposed method is close to the ideal case with perfect channel state information. However, due to the strict orthogonality among the subcarriers of OFDM systems, the orthogonality will be greatly destroyed by the channel frequency dispersion effect caused by frequency bias and fast time-varying channels. This results in serious intercarrier interference (ICI), which greatly degrades system performance and makes it impossible to provide efficient and reliable services. Therefore, OFDM is highly sensitive to frequency offset and channel frequency dispersion [9].
Orthogonal time-frequency space (OTFS) is a new modulation technology that can adapt to the time-frequency double dispersion propagation characteristics of channels in a high-speed mobile environment; it has attracted extensive attention in industry [10][11][12]. A time-varying channel in the time-frequency (TF) domain is changed into a time-invariant channel in the DD domain using OTFS modulation. After such a two-dimensional transformation, the transmitted data symbols are subjected to minor subcarrier interference, and all symbols in the same OTFS data frame experience slow fading independent of time selectivity. OTFS modulation is carried out in the DD domain.
At present, OTFS modulation has made certain research progress in many aspects, such as system performance analysis and modulation and demodulation algorithms. It is assumed in [10] that the ideal pulse-forming waveform satisfies orthogonal criteria in both time and frequency, but in reality, this is not possible because of Heisenberg's uncertainty principle. As a result, the biorthogonality assumption was loosened, and an OFDM-based OTFS modulation system was presented in [13,14]. This system takes into account the cyclic prefix (CP) of each OFDM symbol in the OTFS frame as well as the non-ideal rectangular pulse-shaping waveform. This benefit stems from the simplicity of implementation, but it also introduces the issue of high out-of-band radiation causing interference with adjacent channels. At the moment, frequency local pulse shaping [15] and time-domain windowing technology [16] are the two major techniques for maximizing out-of-band attenuation by lowering out-of-band radiation. Compared with the TF domain, the advantage of the DD domain is that only a small area, which is based on the channel response that will be introduced in the second part of the system model, is needed to describe the channel, and it has stronger sparsity for better channel estimation.
Currently, the main technical research on OTFS modulation involves the estimation of channel state information and data detection after estimation. Ref. [17] proposed an embedded pilot-assisted channel estimation scheme. The delay-Doppler plane's arrangement rules for data symbols, pilot symbols, and guard symbols were created to successfully prevent pilot symbols and data symbols from interfering with one another during each OTFS frame. The pilots were created for multipath channels with integer and fractional Doppler shifts using the OTFS modulation technology. However, if only one pilot symbol was inserted, it caused high peak-to-average-power ratio (PAPR) problems. For example, in this article, pilot power is greater than data power. In [18], a channel estimation and data detection framework based on a superposition pilot was proposed. This framework superimposes a low-power pilot on data symbols in the DD domain, thus increasing the space of data transmission and effectively improving the spectral efficiency of OTFS modulation transmission. Such a design induces high computation complexity, and the interference between data and pilot symbol also reduces the estimated accuracy.
As emphasized in [19], integer approximation cannot adequately simulate the channel's noninteger delay and Doppler shift. Therefore, using an orthogonal matching pursuit approach based on binary refinement estimation, which has a tolerable computing complexity and much lower estimated normalized mean square error, has been suggested. According to the sparseness of the channel in the delay-Doppler domain, ref. [20] suggested a new pilot mode and a channel estimate method based on sparse Bayesian learning (SBL). In this new pilot pattern, the guard interval was not set, and the data and pilot had the same energy. To update the parameters in the previous model using the expectation maximum (EM) algorithm, a sparse Bayesian learning framework was provided. Channel estimation was treated as a sparse recovery problem. To a certain extent, the pilot symbol power cost was reduced, as was the pilot symbol power consumption, and the noise interference was weakened.
In this paper, to reduce the pilot cost and improve the accuracy of channel estimation, we treat the channel problem of estimating the DD domain as a sparse recovery problem and derive the sparse model expression from the input-output relationship of the DD domain. We provide a new sparse Bayesian framework based on the channel's sparse features, and the block successive majorization-minimization (SMM) technique is employed to handle the problem of the new prior structure. The simulation results show that, in comparison to existing reference approaches, our suggested method greatly reduces the noise interference in the situation of a low signal-to-noise ratio.
The rest of the paper is organized as follows. The OTFS modulation system model is described in Section 2. The structured sparse Bayesian approach for channel estimation is presented in Section 3. The simulation results are reported in Section 4, and Section 5 concludes the paper.

System Model
We consider an OTFS modulation system with M symbols and N subcarriers. The transmitter and receiver are equipped with a single antenna. The OTFS modulation frame is shown in Figure 1.

OTFS Modulation and Demodulation
The transmitter transforms the information symbol x[k, l] ∈ C M×N from the DD domain to the time-frequency domain using the inverse symplectic finite Fourier transform (ISFFT).
At this time, X[n, m] generates a time waveform s(t) using a transmission pulse g tx (t). This transform is called the Heisenberg transform.
where ∆ f and T are the subcarrier spacing and symbol period, respectively. After that, the signal s(t) is transmitted through a fast time-varying channel with a complex baseband channel impulse response h(τ, ν), which specifies the channel response to an impulse with delay τ and Doppler ν.
The channel only has a few parameters in the delay-Doppler domain. The sparse representation of channel h(τ, ν) is given as where P denotes the number of paths for propagation. δ() denotes the Dirac delta function, whereas h i , τ i and ν i stand for the path gain, delay, and Doppler shift associated with the i-th path, respectively. We choose the delay and Doppler taps for the i-th path as follows: where the integers l τ i , k ν i represent the delay tap and Doppler tap indices corresponding to the delay τ i and Doppler frequency ν i , respectively, and κ ν i ∈ [−0.5, 0.5]. We do not need to examine the impact of the fractional delay because the resolution of the time axis is sufficient to approximate the path delay to the nearest integer grid point. The received signal r(t) is given by the Wigner transform and is realized by the receiving matching filter of impulse response g rx , obtaining the time-frequency domain symbol Y[n, m].
Finally, the symplectic finite Fourier transform (SFFT) is used to convert the information symbols Y[n, m] in the TF domain into the symbols y[k, l] in the DD domain .
When the transmitter pulse g tx and the receiver pulse g rx are ideal pulse functions, the relationship between the transmitted signal and the received signal in the delay-Doppler domain can be written as where v[k, l] denotes the system noise in the DD domain. We set the proper Q to counteract the impact of fractional Doppler. The delay-Doppler plane is discretized to an M × N grid. According to Equation (7), due to the influence of the doubly dispersive channel, the signal received at the k-th Doppler point is subjected to the superposition of signals from the 2Q + 1 (k − Q to k + Q) Doppler points. In other words, the received signal y[k, l] is a linear combination of the sent signal Ω = ∑ P i=1 2Q + 1.

Sparse Delay-Doppler Domain Channel Model
To implement channel estimation, we place L = (2k max + 2Q + 1)(l max + 1) pilot symbols x p [k, l] in the DD domain. The greatest delay τ max and Doppler ν max related to the delay and Doppler taps are defined as k max and l max . In addition, we formulate Equation (7) in a different form, as follows: After a series of formula calculations, we finally obtain the matrix form of the formula as where Φ ∈ C S×L , h ∈ C L×1 , and V ∈ C S×1 are the received signal, transmitted signal. and Gaussian noise in the DD domain, respectively. S is the size of the channel estimation area. A simple proof is provided in Appendix A.

Structured Sparse Bayesian Approach for Channel Estimation
In this section, according to the sparse characteristics of the channel in the DD domain, we use Equation (8) to introduce the improved sparse Bayesian algorithm in detail. First, we briefly introduce the design of the pilot pattern at the receiver and transmitter. Second, we consider the sparse characteristics of the channel and design an improved sparse Bayesian algorithm. Finally, the complexity of the proposed algorithm is discussed.

Pilot Placement and Pattern Design
The power of the pilot symbol in the embedded pilot channel estimate approach is frequently substantially larger than that of the data symbol, which is unrealizable in actual applications. Only inserting one pilot symbol will result in a high PARP issue [17]. Meanwhile, unlike [20], we have set up a protection interval symbol to avoid interference between data and pilot symbols. Here, we set the pilot symbol power to be the same as the data symbol power, transforming the channel estimation problem into a sparse channel recovery problem. This not only effectively solves the problem of high pilots power but also avoids interference between data that affects estimation results.
We arrange the pilot and data symbols in the delay-Doppler grid for OTFS frame transmission, as shown in Figure 2. We select k ∈ [k p − k max , k p − k max ] and l ∈ [l p , l p + l τ ] as the pilot symbols. The other fields are guard symbols and data symbols.
Usually, (k p , l p ) are placed onto ( N 2 , M 2 ). l τ is used to control the proportion of pilots in the whole transmission symbol. The final channel estimation region is set to k ∈ [k p − k max − Q, k p − k max + Q] and l ∈ [l p , l p + 2l τ ]. In other words, the size of the channel estimation section S = (4k max + 2Q + 1)(l max + l τ + 1), as shown in Figure 3.

Structured Sparse Bayesian Channel Estimation Problem Formulation
We present the channel coefficient characteristics in the DD domain in Figure 4. According to the characteristics of channel sparsity, we propose a new estimation algorithm. We provide an example of the DD domain channel corresponding to (7). As shown in Figure 4, the DD domain channel only has 4 non-zero responses over the whole DD channel, whose coordinates align with the associated delay and Doppler shifts. Therefor, estimating channel h can be viewed as a sparse recovery problem. Although the traditional sparse Bayesian algorithm can be used here, there are only values on each path according to the channel sparsity in the DD domain, and when a value is detected, it means that there are values before and after the value, so the traditional algorithm does not reflect this feature.
As above, in the design of the diagonal matrix D l , the design of the parameter α l,k not only affects the kth element of channel h to be estimated; it also affects its neighbouring elements. As α l,k tends to infinity, the k-th, (k + 1)th, and (k − 1)th elements are simultaneously driven to zero. The distribution of zero elements exhibits a blocky distribution. On the other hand, by dividing the whole h into l max column vectors, the function of γ is used to distinguish the difference between the different column vectors.
According to Equation (9), the posterior distribution of h can be expressed as p(h|y, α l,k , γ) ∝ p(y|h)p(h|α l,k , γ) (15) where the likelihood function of y is p(y|h) , which is By substituting Equations (13) and (16) into Formula (15), it is not difficult to prove that this is a posterior distribution that follows a complex Gaussian distribution through mathematical calculation, and the mean and covariance are as follows: Here, as in the traditional sparse Bayesian algorithm, we express the Gamma hyperprior with respect to α as follows: where a and b are fixed parameters. In this section, we introduce the small positive numbers a and b a = b = 10 −4 . Such a configuration encourages a flat prior for α. We also select a Dirichlet hyperprior for γ, where u = [u 1 , u 2 , . . . , u l max ] is a fixed parameter. C(u) = Γ(∑ lmax l=1 u l ) Γ(u 1 )···Γ(u l ) is the normalization constant. The role of γ is to model the relative difference between different sub-vectors u. By exploiting the common sparsity structure, the suboptimal solutions in the conventional methods that predict a value of zero for one element of h [l] and predict a nonzero value for the element of h [l =l] in the same position can be eliminated. Since the expectation of γ with respect to the distribution (19) is given by E[γ] = u l /∑ l max l =1 u l , we can interpret the parameter u, which provides a initial guess as to the relative difference between the submodules of h. The parameter that provides an initial estimation of the relative difference between the submodules of h can be interpreted as u. Note that a default setup for u can be u l = 1 for l = 1, . . . , l max , which indicates an initial assumption of different delay modules.
The maximum posterior (MAP) estimation of h can be derived by its posterior mean in Equation (17) as long as α l,k . As a result, in the sections that follow, we concentrate on obtaining the ideal α l,k and γ by solving the MAP issue.
Since we proposed a new structured prior model for h, it is clear that the above problems cannot be solved if we continue to use the traditional sparse Bayesian algorithm. Therefore, we provide a solution using the block SMM algorithm.

SMM Algorithm for Posterior Channel Estimate Evaluation
The proposed block SMM algorithm is similar to the expected EM algorithm. It has two main steps. The first step is to replace the optimization step with the expected step, and its purpose is to find the alternative function of the objective function, that is, the local approximate solution. The second step is the minimization step, in which the alternative function found in the first step is minimized. While the block SMM technique divides the parameters to be estimated into many blocks for alternate optimization, the EM algorithm updates all parameters concurrently. This is helpful when solving structured SBL is the only obstacle to successfully minimizing all of the proxy function's parameters.
After a series of mathematical calculations, the specific algorithm is as follows. (1) Step 1: A substitute for the objective function − log p(α l,k .γ , y) is required at the corresponding step of the (i + 1)th iteration. In this study, we make use of the surrogate function proposed by [21], which meets the requirements stated in (21), that is, a simple proof is provided in Appendix B.
G(α l,k , γ, y|α where the final inequation obeys Jensen's inequality, and when the inequation satisfies (α l,k , γ) = (α (i) l,k ,γ (i) ), the equal sign is true. By substituting (15) into (23), we can obtain G(α l,k , γ, y|α where the expectation is in relation to p(h|y,α (i) l,k ,γ (i) ). const denotes that it is not related to α l,k or γ. (2) Step 2: To minimize the surrogate function, the parameters α l,k are changed sequentially in the corresponding step of the (i + 1)th iteration.
To update α l,k ; γ is regarded as a constant when α l,k is estimated. By substituting (13) and (18) into (24), we obtain G(α l,k ,γ (i) , y|α where Equation (17) is used to compute∑ ∑ ∑ andμ, and (α l,k , γ) is the result obtained from the i-th iteration. Next, we consider the derivative of (25) with regard to α l,k to analyse the optimality condition.
Specifically, when k = 1 where MAX = 2k max + 2Q + 1. Because α l,k is coupled with the adjacent terms α l,k−1 and α l,k+1 in the term ξ k , we are unable to acquire α l,k by setting (26) to zero. Here, we have α l,k , α l,k+1 , α l,k−1 . The expression for Equation (26) is By setting (29) to zero, we can obtain Here, we do not address special cases (when k = 1 and k = MAX).
To update γ, by leaving out the terms that are unrelated to γ and inserting (13) and (19) into (24), we obtain where α l,k is fixed at its latest estimation α l,k = α (i+1) l,k . In relation to γ, the derivative of (31) can be written as The best value of γ can be attained by setting (32) to zero aŝ Step 1 and Step 2 are performed iteratively until convergence. Algorithm 1 is a summary of the SMM algorithm presented in this paper.

Input: threshold ε, maximum iterations iter max
Setting the initial value α k and γ as the first loop value. Output: Channel estimation h. 1: Step 1: the i-th iteration α (i) l,k ,γ (i) as fixed parameters α l,k , γ. Update the posterior mean and covariance matrix of h using (16). 2: Step 2: Update α l,k by (29).
Update γ by (32). Afterwards, the algorithm will be further studied and introduced into the MIMO system. The dimensions of the channel change from the SISO system to the MIMO system, and the channel dimension increases linearly with the increase of the number of antennas. The sparse structure of the channel will therefore expand from one-dimensional to multi-dimensional.
The calculation of the posterior µ and covariance matrix Σ of h in the MA-step in Algorithm 1 accounts for the majority of the computing effort for each iteration iter max . Therefore, the maximum computational complexity order of the structured SBL-based channel estimation algorithm is O iter max L 3 S 3 .

Simulation Results
In this section, we verify the performance of our proposed algorithm by the normalized mean square error (NMSE). The NMSE was defined as h −ĥ 2 2 / h 2 2 . We set the parameters as follows (see Table 1): the number of symbols N = 64, the number of subcarriers M = 128, the carrier frequency was 15 kHz, and the subcarrier frequency was 4 × 10 9 Hz. The maximum Doppler shift varied according to the speed, and the maximum delay value was l τ = 10. k max was a parameter related to speed. First, we compared the performance of different algorithms under different signalto-noise ratios (SNR). Here, we used the reference algorithm for the least square (LS) algorithm, the orthogonal matching pursuit (OMP) algorithm, and the conventional SBL algorithm [22]. The user speed was 250 km/h, and l τ = 10. As shown in Figure 5, in the case of a low signal-to-noise ratio, our proposed structured SBL algorithms have a significant advantage over other algorithms. Specifically, our algorithm better reduces interference. Then, we compared the cases where l τ is constantly changing in different delay regions, as shown in Figure 6. In this OTFS modulation system, we set the SNR to 20 dB, the user speed to 250 km/h, and Q = 4. The figure shows that the structured SBL algorithm we proposed maintains certain advantages. As the delay range l τ continues to expand, the estimation results become increasingly accurate. We can see that under the same estimation error, compared to other algorithms, the structured SBL has the smallest value in delay dimension, which means that the pilot proportion is the smallest. For example, under the condition of a mean square error of 10 −2 , the values of the delay dimension l τ of the structured SBL and conventional SBL algorithm are 7 and 8, respectively. The pilot overhead (except guard symbols) is η = (2k max +1)(l τ +1)

M×N
. The pilot overheads of the structured SBL and the conventional SBL are about 1.07% and 1.21%, respectively. Next, we set the different speeds for performance comparison in Figure 7. Here, we set the speed variation range to [150,350]; the SNR was 20 dB, Q = 4, and l τ = 5. The figure shows that the channel estimation results are affected to some extent with increasing speed. The higher the speed is, the slightly worse the accuracy of the estimation becomes. However, in terms of the trend of the curve, the curve is smooth overall with little change, and the LS algorithm is highly sensitive to changes in speed. On the other hand, the structured SBL algorithm maintains high robustness.

Conclusions
In this paper, we derived a sparse structure model formula from the input-output relationship of the DD domain. Based on the characteristics of the channel's sparse structure, we treated channel estimation as a sparse recovery problem. In addition, we used block and continuous alternating optimization to solve the parameter problem of block distribution, which further improved the accuracy of channel estimation. Finally, through simulation analysis, we verified that the proposed algorithm has certain advantages over the traditional SBL algorithm. Under the condition of a low signal-to-noise ratio, the estimation accuracy was higher, and the influence of interference was diminished.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In an OTFS system, the input-output relation of DD domain is denoted as