Long-Block Differential Spatial Modulation

Spatial modulation (SM) conveys additional data bits through antenna selection and avoids synchronization and interference among antennas. Differential SM (DSM) is a noncoherent variant of SM, but it is less bandwidth efficient than SM. This paper proposes a novel DSM scheme called long-block DSM (LB-DSM), whose block is longer than the original DSM's. Each block of LB-DSM consists of two parts. One part is the block of the original DSM, and the other is similar to SM. With the aid of the latter part, the data rate of LB-DSM is higher than that of DSM. Besides, we propose bit mapping and low-complexity noncoherent maximum-likelihood detection for LB-DSM, with which LB-DSM is less complicated than DSM. In addition to higher bandwidth efficiency and lower decoding complexity, distance analysis and computer simulation results indicate that LB-DSM provides better error performance than DSM for 8PSK (eight-ary phase shift keying). Furthermore, the idea of LB-DSM is applied to generalized DSM (GDSM), which activates more than one transmit antenna at a time. The resulting scheme, called long-block GDSM, offers better bandwidth efficiency and error performance than GDSM.


I. INTRODUCTION
Multi-antenna techniques can increase transmission rates or diversity order and thus are widely used in wireless communication systems. Spatial modulation (SM) is a multi-antenna technique that activates a single transmit antenna at a time [1], [2], [3], [4], [5], [6]. By selecting transmit antennas, SM can send additional data bits. Besides, SM needs only a single radio-frequency chain, which avoids synchronization and interference among transmit antennas. Differential SM (DSM) is a noncoherent scheme of SM that needs neither pilot overhead nor channel estimation [7], [8]. There are restrictions on selecting antennas in DSM, so the data rate of DSM is lower than that of SM with the same signal constellation.
Various DSM schemes have been proposed. In general, they can be classified into three categories: (1) to reduce detection complexity, e.g., [9], [10], [11], [12], and [13], (2) to increase diversity order, e.g., [14], [15], [16], [17], and [18], (3) to increase transmission rates, such as differential quadrature spatial modulation (DQSM) [19], rectangular DSM [20], and DSM using amplitude-phase shift keying (APSK) [21], [22], [23]. In this paper, we propose a novel DSM scheme with increased transmission rates. In the proposed scheme, each transmitted block, composed of two parts, is longer than the original DSM's and thus is called long-block DSM (LB-DSM). One part is the block of DSM which contains reference symbols for the next transmitted block. The other part does not restrict activating antennas, so its data rate is identical to SM's. Consequently, LB-DSM is more bandwidth efficient than the original DSM.
The associate editor coordinating the review of this manuscript and approving it for publication was Abdel-Hamid Soliman.
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
However, there are three challenges to realizing the idea of LB-DSM. The first one is to perform differential encoding (DE), because all data matrices of DE have been square so far, i.e., the block length is identical to the number of transmit antennas. The second one is to map data bits to the two parts jointly. The last one is to reduce the complexity of noncoherent maximum likelihood (ML) detection. In this paper, we propose solutions to these challenges. In [11], we indicated that the DE of DSM is equivalent to the DE of DPSK executed several times, while the antenna selection of the data matrix is equivalent to selecting reference symbols of the previous block. Hence, we propose that all symbols of the two parts in LB-DSM select reference symbols of the previous block to perform the DE of DPSK. Besides, we offer various methods of bit mapping and corresponding low-complexity noncoherent ML detection. Regarding the number of real-valued multiplications at the receiver, LB-DSM with the proposed methods is less complicated than DSM. In addition, the distances of codeword pairs are analyzed. We find that LB-DSM does not increase the minimum distance of DSM. For QPSK, compared with DSM, LB-DSM has more nearest neighbors, so its error performance is worse. But for 8PSK, LB-DSM does not increase the number of nearest neighbors, and simulation results show that the error performance of LB-DSM is better than that of DSM at medium and high SNRs (signal-to-noise ratios).
The generalized DSM (GDSM) scheme proposed in [24] uses Alamouti's space-time block code (STBC) [25] to construct data matrices, so the transmit diversity order is two. Because the GDSM scheme permutes blocks of STBC only, there are only a few possible permutations of blocks of STBC. Consequently, the transmission rate of the GDSM scheme is low.
Similar to the LB-DSM, we propose a new GDSM scheme called long-block GDSM (LB-GDSM). Compared to GDSM in [24], LB-GDSM provides higher data rates and better error performance.
In DQSM, the in-phase and quadrature parts select antenna bits independently, so DQSM carries more antenna bits than DSM. However, the phases of the received signals are affected by channel coefficients, and noncoherent receivers do not have information on channel coefficients. Consequently, noncoherent receivers cannot recover the in-phase and quadrature parts of the transmitted signals. Therefore, it is incorrect for the receiver of DQSM to use the in-phase and quadrature signals. On the other hand, in rectangular DSM, the number of time slots per transmitted block is smaller than the number of transmit antennas, so its bandwidth efficiency is excellent. The probability of not activating any specific antenna during a finite number of transmitted blocks is nonzero, so this transmit antenna may not be used in a limited number of reference blocks. Thus, finite-block differential detection may fail. Hence, only recursive decision-feedback differential detection [26], which uses all previously received blocks, including the initial reference block (activating all transmit antennas), with a forgetting factor, can be applied to the transmitter of rectangular DSM. The proposed LB-DSM is designed for two-block differential detection, so it is not compared with rectangular DSM, which uses multi-block differential detection. The signal of LB-DSM can be either M-ary phase-shift keying (MPSK) or APSK, so LB-DSM using APSK is more bandwidth efficient than DSM using APSK.
The main contributions of this paper are summarized as follows.
1) A new bandwidth-efficient DSM scheme with new DE is proposed.
2) Bit-mapping methods of DSM are designed.
3) Low-complexity noncoherent ML detectors are proposed.

The remainder of this paper is organized as follows. In Sec. II, we briefly review DSM. After that, the transmitter of LB-DSM, including differential encoding and bit mapping, is proposed in Sec. III. Subsequently, low-complexity ML detectors are proposed in Sec. IV, and the bound of the error probability is derived and the distances of codeword pairs are analyzed in Sec. V. Then we propose LB-GDSM in Sec. VI. Simulation results are presented and discussed in Sec. VII. Finally, the conclusion is given in Sec. VIII.
Notation: $(\cdot)^{\dagger}$ and $\|\cdot\|$ denote the conjugate transpose and the Frobenius norm of a matrix, respectively. $\mathrm{diag}\{\cdot\}$ represents the operation from a row vector to a diagonal matrix. $I_n$ and $0_n$ denote the $n \times n$ identity matrix and the $n \times n$ null matrix, respectively. $\lfloor\cdot\rfloor$ denotes the floor function. $\mathcal{CN}(0, \sigma^2)$ denotes the zero-mean complex Gaussian distribution with variance $\sigma^2$.

II. REVIEW OF DSM
Consider a communication system with $N_T$ transmit antennas and $N_R$ receive antennas, where the channels between antenna pairs are Rayleigh-fading and independent of each other. Each transmission block contains $T$ time slots, and the transmitted signals are represented by an $N_T \times T$ matrix $S(t)$. The received signals of the $t$th block are represented by an $N_R \times T$ matrix $Y(t)$ given by

$$Y(t) = H(t)S(t) + N(t), \tag{1}$$

where $H(t)$ is the $N_R \times N_T$ matrix of channel coefficients whose entries are $\mathcal{CN}(0, 1)$, and $N(t)$ is the $N_R \times T$ matrix of additive white Gaussian noise (AWGN) with $\mathcal{CN}(0, N_0)$ entries.
In the original DSM scheme, $T = N_T$. In each block, data bits first form an $N_T \times N_T$ matrix

$$X(t) = A(t)\,\mathrm{diag}\{x(t)\}, \tag{2}$$

where $A(t)$ is an $N_T \times N_T$ antenna-index matrix with entries 0 and 1, and $x(t)$ is a $1 \times N_T$ vector representing $N_T$ data symbols. Then the transmitted matrix is obtained by differential encoding

$$S(t) = S(t-1)X(t). \tag{3}$$

At the receiver, the noncoherent ML detector determines $X(t)$ by

$$\hat{X}(t) = \arg\min_{X} \|Y(t) - Y(t-1)X\|^2. \tag{4}$$

Only one transmit antenna is activated at each time slot, and each transmit antenna is activated exactly once in each block. Hence, there is exactly one nonzero entry in each row and each column of $S(t)$, and the same holds for $X(t)$ and $A(t)$. Consequently, the number of permutations of the antenna indices in each block is $N_T!$, of which $Q = 2^{\lfloor \log_2 N_T! \rfloor}$ permutations are used for $A(t)$. The signal constellation, denoted by $\mathcal{S}$, is MPSK where $M = 2^m$ and $m$ is an integer. For the $t$th block, $\log_2 Q$ data bits determine $A(t) \in \{A_0, A_1, \cdots, A_{Q-1}\}$ and $mN_T$ data bits decide $x(t) = [x_1(t), x_2(t), \cdots, x_{N_T}(t)]$ where $x_i(t) \in \mathcal{S}$ for $i \in \{1, 2, \cdots, N_T\}$, so the spectral efficiency is $(\lfloor \log_2 N_T! \rfloor + mN_T)/N_T$ bits/sec/Hz.

The transmitted symbol at the $k$th time slot of the $t$th block, denoted by $s_k(t)$, is the nonzero element of the $k$th column in $S(t)$. For $A(t)$ and $A_q$ where $q \in \{0, 1, \cdots, Q-1\}$, the reference orders $p(t) = [p_1(t), p_2(t), \cdots, p_{N_T}(t)]$ and $p_q = [p_q^1, p_q^2, \cdots, p_q^{N_T}]$ are defined in [11], where $p_k(t)$ and $p_q^k \in \{1, 2, \cdots, N_T\}$ ($k \in \{1, 2, \cdots, N_T\}$) represent the position of 1 in the $k$th column of $A(t)$ and $A_q$, respectively. According to (3), the $k$th column of $S(t)$ is $S(t-1)$ multiplied by the $k$th column of $X(t)$. Because the only nonzero element in the $k$th column of $X(t)$ is the $p_k(t)$th element, we have

$$s_k(t) = s_{p_k(t)}(t-1)\,x_k(t) \tag{5}$$

for $k \in \{1, 2, \cdots, N_T\}$. The activated antenna for $s_k(t)$ is the antenna used by $s_{p_k(t)}(t-1)$. Therefore, differential encoding of DSM, (3), is equivalent to differential encoding of DPSK, (5), executed $N_T$ times independently according to $p(t)$.
In other words, differential encoding (3) can be realized without matrix multiplication. The reference symbols of differential encoding (5) are the symbols in S(t − 1), and the order of the reference symbols is specified by the reference order p(t). Unlike coherent SM that selects orders of antenna activation, DSM chooses orders of reference symbols of the previous block.
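The equivalence between matrix-based differential encoding and column-wise DPSK encoding can be checked numerically. The following is a minimal sketch, assuming NumPy, 0-based reference orders, an identity initial reference block, and hypothetical function names (`encode_matrix`, `encode_dpsk`) that do not come from the paper.

```python
import numpy as np

def dsm_data_matrix(p, x):
    """Build X = A diag{x}: column k holds x[k] in row p[k] (0-based)."""
    n_t = len(p)
    X = np.zeros((n_t, n_t), dtype=complex)
    for k in range(n_t):
        X[p[k], k] = x[k]
    return X

def encode_matrix(S_prev, p, x):
    """Matrix form of differential encoding: S(t) = S(t-1) X(t)."""
    return S_prev @ dsm_data_matrix(p, x)

def encode_dpsk(S_prev, p, x):
    """DPSK form: column k reuses column p[k] of S(t-1), rotated by x[k]."""
    S = np.zeros_like(S_prev)
    for k in range(len(p)):
        S[:, k] = S_prev[:, p[k]] * x[k]
    return S

n_t, M = 4, 8
rng = np.random.default_rng(0)
S_prev = np.eye(n_t, dtype=complex)  # assumed initial reference block
for _ in range(5):
    p = rng.permutation(n_t)                                # reference order
    x = np.exp(2j * np.pi * rng.integers(M, size=n_t) / M)  # 8PSK symbols
    S_a = encode_matrix(S_prev, p, x)
    S_b = encode_dpsk(S_prev, p, x)
    assert np.allclose(S_a, S_b)
    S_prev = S_a
```

The DPSK form touches only one column per time slot, which illustrates why no matrix multiplication is needed at the transmitter.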

III. THE TRANSMITTER OF LB-DSM
A. DIFFERENTIAL ENCODING
In DSM, all $N_T$ symbols in a block are reference symbols of the next block. Consequently, in one block, each transmit antenna must be activated exactly once, so the number of possible $A(t)$ is only $N_T!$. In SM, without such a limitation, the number of antenna-selection patterns is larger than that in DSM, namely $N_T^{N_T}$ antenna patterns for $N_T$ time slots. In the proposed LB-DSM scheme, in each block, $N_T$ symbols are the reference symbols of the next block, and the remaining $T - N_T$ symbols are uncorrelated DPSK symbols that freely select their reference symbols in the previous block. In other words, each block is divided into two parts: one part is identical to the block of DSM, and the other part does not have any restriction on selecting antennas, which is similar to SM.
The data matrix $X(t)$ is also formed by (2), but both $X(t)$ and $A(t)$ are $N_T \times T$ matrices and $x(t)$ is a $1 \times T$ vector. We write $X(t) = [X^{(1)}(t)\ X^{(2)}(t)]$, where $X^{(1)}(t)$ is an $N_T \times (T - N_T)$ data matrix and $X^{(2)}(t)$ is an $N_T \times N_T$ reference data matrix for the next block. The coherence time needed in the detection, which is one plus the maximal difference of time indexes of two received symbols used in each detection, is $2N_T$ in the original DSM scheme. In order to minimize the coherence time needed in LB-DSM, we put the reference matrix near the next block, i.e., $X^{(2)}(t)$ is put at the end of $X(t)$. By doing so, the coherence time needed in the detection of LB-DSM is $N_T + T$, which is larger than the coherence time needed in the detection of DSM. Similarly, we have $S(t) = [S^{(1)}(t)\ S^{(2)}(t)]$ and $p(t) = [p^{(1)}(t)\ p^{(2)}(t)]$. Because $S^{(2)}(t)$ is the reference matrix for the next block, each transmit antenna should be activated exactly once in it. Therefore, the number of permutations of $p^{(2)}(t)$ for LB-DSM is identical to that for DSM, which is $N_T!$. On the other hand, $p^{(1)}(t)$ has no such restriction, so the number of its patterns is $N_T^{T-N_T}$. The sets of all possible $p^{(1)}(t)$ and $p^{(2)}(t)$ are denoted by $\{p^{(1)}_0, p^{(1)}_1, \cdots, p^{(1)}_{N_T^{T-N_T}-1}\}$ and $\{p^{(2)}_0, p^{(2)}_1, \cdots, p^{(2)}_{N_T!-1}\}$, respectively.

At the transmitter, there are two ways to obtain $S(t)$: by either $p(t)$ or $A(t)$. For the former, the transmitted signal $s_k(t)$ where $k \in \{1, 2, \cdots, T\}$ is obtained by differential encoding of DPSK, i.e., $s_k(t) = s^{(2)}_{p_k(t)}(t-1)\,x_k(t)$, where $s^{(2)}_j(t-1)$ denotes the nonzero element of the $j$th column of $S^{(2)}(t-1)$. For the latter, $X(t)$ is obtained from $A(t)$ and $x(t)$ by (2), and then $S(t)$ is obtained by differential encoding of DSM using the reference matrix $S^{(2)}(t-1)$, i.e.,

$$S(t) = S^{(2)}(t-1)X(t). \tag{7}$$

The two approaches are equivalent, but the latter requires matrix multiplication and thus is more complicated than the former, especially if $N_T$ is large. Figure 1 is a schematic explaining the differential encoding of LB-DSM. Compared with DSM, LB-DSM increases the time duration of the transmitted block and thus increases the delay in detection. But the increased delay in detection is slight and thus usually tolerable.
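The two ways of obtaining the transmitted block can be compared in a short simulation. This is only a sketch under stated assumptions: NumPy, 0-based reference orders, an initial block whose last $N_T$ columns are the identity, and a hypothetical helper name `lb_dsm_encode`.

```python
import numpy as np

def lb_dsm_encode(S_prev, p, x, n_t):
    """Encode one LB-DSM block column by column (DPSK form).

    S_prev: N_T x T previous block; its last N_T columns are the reference part.
    p:      length-T reference order, 0-based indices into the reference part.
    x:      length-T unit-modulus data symbols.
    """
    S2_prev = S_prev[:, -n_t:]
    return S2_prev[:, p] * x  # column k = column p[k] of the reference part times x[k]

n_t, T, M = 3, 6, 8
rng = np.random.default_rng(1)
# Assumed initial block: empty first part, identity reference part.
S_prev = np.hstack([np.zeros((n_t, T - n_t)), np.eye(n_t)]).astype(complex)
for _ in range(4):
    p1 = rng.integers(n_t, size=T - n_t)  # unrestricted part, as in SM
    p2 = rng.permutation(n_t)             # reference part, a permutation as in DSM
    p = np.concatenate([p1, p2])
    x = np.exp(2j * np.pi * rng.integers(M, size=T) / M)
    S = lb_dsm_encode(S_prev, p, x, n_t)
    # Matrix form: build the N_T x T data matrix and multiply by the reference part.
    X = np.zeros((n_t, T), dtype=complex)
    X[p, np.arange(T)] = x
    assert np.allclose(S, S_prev[:, -n_t:] @ X)
    S_prev = S
```

Because the reference part of every block is a permutation, each antenna appears exactly once among the last $N_T$ columns, so the next block always finds a reference symbol for every antenna.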

B. MAPPING FROM DATA BITS TO THE REFERENCE ORDER
If joint mapping for $p^{(1)}(t)$ and $p^{(2)}(t)$ carries more data bits than separate mapping, i.e., $N = \lfloor \log_2 (N_T^{T-N_T} N_T!) \rfloor > \lfloor \log_2 N_T^{T-N_T} \rfloor + \lfloor \log_2 N_T! \rfloor$, joint mapping from $N$ data bits to $p(t)$ is used; otherwise, the mapping of $p^{(1)}(t)$ and $p^{(2)}(t)$ is separate.
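The choice between joint and separate mapping depends only on $N_T$ and $T$, so it can be tabulated directly. The sketch below assumes the pattern counts $N_T^{T-N_T}$ and $N_T!$ for the two parts; the function names are hypothetical, and the printed values are plain arithmetic rather than a reproduction of Table 1.

```python
from math import factorial

def floor_log2(n):
    """Exact floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1

def antenna_bits(n_t, T):
    """Return (joint, separate) numbers of antenna bits per LB-DSM block."""
    free = n_t ** (T - n_t)  # patterns of the unrestricted part
    perm = factorial(n_t)    # patterns of the reference part
    joint = floor_log2(free * perm)
    separate = floor_log2(free) + floor_log2(perm)
    return joint, separate

for n_t in (3, 4, 5):
    for T in (2 * n_t, 3 * n_t):
        joint, separate = antenna_bits(n_t, T)
        choice = "joint" if joint > separate else "separate"
        print(f"N_T={n_t}, T={T}: joint={joint}, separate={separate} -> {choice}")
```

Integer bit lengths are used instead of floating-point logarithms so that the floor is exact even for large pattern counts.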
In the case that $N_T$ is a power of two, the mapping of $p^{(1)}(t)$ and $p^{(2)}(t)$ should be separate because joint mapping does not increase the data rate. Besides, the mapping of $p^{(1)}(t)$ in this case is very simple: it is performed symbol by symbol independently. In total, $\log_2 N_T^{T-N_T} = (T - N_T) \log_2 N_T$ data bits for $p^{(1)}(t)$ are divided equally into $T - N_T$ groups. For each $k \in \{1, 2, \cdots, T - N_T\}$, the value of $p_k(t)$ is determined by the $\log_2 N_T$ data bits of a group. For simplicity of reference, this mapping is called mapping I of $p^{(1)}(t)$.
If the mapping of $p^{(1)}(t)$ and $p^{(2)}(t)$ is separate and $N_T$ is not a power of two, a mapping method of $p^{(1)}(t)$, which is similar to the bit mapping in [12] and [18], is proposed. First Therefore, the relation between b 1 and p (1) , p (2) This mapping is called mapping II of $p^{(1)}(t)$.
A method of joint bit mapping for $p^{(1)}(t)$ and $p^{(2)}(t)$, which is similar to the above one, is proposed in the following. First, $N$ data bits form an integer, denoted by $b$. Dividing $b$ by $N_T^{T-N_T}$ gives a quotient of $q$ with a remainder of $r$, i.e., $b = qN_T^{T-N_T} + r$. The reference orders $p^{(1)}(t)$ and $p^{(2)}(t)$ are $p^{(1)}_r$ and $p^{(2)}_q$, respectively. If $N_T^{T-N_T}$ is not large, $p^{(1)}_r$ can be obtained by a look-up table directly; otherwise, to reduce the complexity, mapping II of $p^{(1)}(t)$ with $b_1 = r$ can be used.
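The quotient-and-remainder step of joint mapping can be sketched as follows. The enumeration orders used here (base-$N_T$ digits for the unrestricted part, lexicographic permutations for the reference part) are assumptions made for illustration and are not necessarily the orders used in the paper; `joint_map` is a hypothetical name, and entries are 0-based.

```python
from itertools import permutations
from math import factorial

def joint_map(bits, n_t, T):
    """Map a bit string to (p1, p2) through a quotient and a remainder."""
    free = n_t ** (T - n_t)
    n_bits = (free * factorial(n_t)).bit_length() - 1  # floor(log2(free * n_t!))
    assert len(bits) == n_bits
    b = int(bits, 2)
    q, r = divmod(b, free)  # q selects p2, r selects p1
    # Assumed enumeration of p1: base-n_t digits of r, least significant first.
    p1 = []
    for _ in range(T - n_t):
        r, digit = divmod(r, n_t)
        p1.append(digit)
    # Assumed enumeration of p2: q-th permutation in lexicographic order.
    p2 = list(list(permutations(range(n_t)))[q])
    return p1, p2

# All 2^7 inputs for N_T = 3, T = 6 map to distinct reference orders.
outputs = {tuple(map(tuple, joint_map(format(b, "07b"), 3, 6))) for b in range(128)}
assert len(outputs) == 128
```

Since the integer formed by the data bits is smaller than $N_T^{T-N_T} N_T!$, the quotient is always a valid permutation index.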
Examples for N T = 3, 4 and 5 are given below and their rates of antenna bits are listed in Table 1.

C. DISCUSSION ON LB-DSM WITH HIGHER RATES
It is possible to design LB-DSM with even higher data rates. In a valid pattern of $p(t)$, to provide reference symbols of all $N_T$ transmit antennas, all $N_T$ integers $\{1, 2, 3, \cdots, N_T\}$ should be found in the entries of $p(t)$. In other words, invalid patterns are patterns in which at least one integer in $\{1, 2, 3, \cdots, N_T\}$ cannot be found. Hence, by the inclusion-exclusion principle, the number of all valid patterns of $p(t)$ is $\sum_{i=0}^{N_T} (-1)^i \binom{N_T}{i} (N_T - i)^T$. However, the positions of reference symbols are unknown to the receiver, so the problem of this scheme is the detection. There are two possible solutions: either trying all received symbols in the previous block as the reference symbols or using the detected values of the previous block. The former increases the number of candidates, and the latter may cause error propagation, so both approaches result in unsatisfactory error performance.
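The number of valid patterns can be checked against a brute-force enumeration for small parameters. This sketch uses the standard inclusion-exclusion count of length-$T$ patterns that contain every antenna index, which may be written differently in the paper; the function names are hypothetical.

```python
from itertools import product
from math import comb, factorial

def valid_patterns(n_t, T):
    """Inclusion-exclusion count of length-T patterns that use all n_t indices."""
    return sum((-1) ** i * comb(n_t, i) * (n_t - i) ** T for i in range(n_t + 1))

def valid_patterns_brute(n_t, T):
    """Direct enumeration, feasible only for small n_t and T."""
    return sum(1 for p in product(range(n_t), repeat=T) if len(set(p)) == n_t)

def lb_dsm_patterns(n_t, T):
    """Patterns of LB-DSM: free first part times a permutation in the last part."""
    return n_t ** (T - n_t) * factorial(n_t)

assert valid_patterns(3, 6) == valid_patterns_brute(3, 6)
print(valid_patterns(3, 6), lb_dsm_patterns(3, 6))
```

Every LB-DSM pattern is a valid pattern, so the second count never exceeds the first.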

IV. REDUCED-COMPLEXITY ML DETECTORS
We represent the received matrix and the noise matrix by $Y(t) = [Y^{(1)}(t)\ Y^{(2)}(t)]$ and $N(t) = [N^{(1)}(t)\ N^{(2)}(t)]$, respectively, where $Y^{(1)}(t)$ and $N^{(1)}(t)$ are $N_R \times (T - N_T)$, and $Y^{(2)}(t)$ and $N^{(2)}(t)$ are $N_R \times N_T$. With (7), (1) becomes

$$Y(t) = H(t)S^{(2)}(t-1)X(t) + N(t). \tag{11}$$

Subsequently, using $H(t) = H(t-1)$ and $Y^{(2)}(t-1) = H(t-1)S^{(2)}(t-1) + N^{(2)}(t-1)$, (11) can be rewritten as

$$Y(t) = Y^{(2)}(t-1)X(t) + N'(t), \tag{12}$$

where $N'(t) = N(t) - N^{(2)}(t-1)X(t)$. Each entry in $N'(t)$ is $\mathcal{CN}(0, 2N_0)$, whose variance is twice that of the entries in $N(t)$. Therefore, the noncoherent ML detection is

$$\hat{X}(t) = \arg\min_{X} \|Y(t) - Y^{(2)}(t-1)X\|^2. \tag{13}$$

The brute-force ML detection needs to test all possible $X(t)$ in (13) individually. The complexity of the detectors is evaluated in terms of the number of real-valued multiplications. For complex numbers $x$ and $y$, $xy$ and $|x|^2$ need four and two real-valued multiplications, respectively. In (13), because there is only one nonzero entry in each column of $X(t)$, $Y^{(2)}(t-1)X(t)$ requires $4N_R T$ real-valued multiplications. In addition, the Frobenius norm in (13) needs $2N_R T$ real-valued multiplications, so a total of $6N_R T$ real-valued multiplications are necessary for calculating (13) once. Consequently, the brute-force ML detector has to do $6N_R 2^N \approx 6N_R N_T^{T-N_T} N_T!$ real-valued multiplications per time unit, which increases exponentially with $T$.
Although the low-complexity ML detection proposed in [11] can reduce the complexity, it still has to evaluate all candidates of $p(t) = [p^{(1)}(t)\ p^{(2)}(t)]$, which is rather complicated. In this section, instead of jointly detecting $p^{(1)}(t)$ and $p^{(2)}(t)$, we consider detecting them separately, which is less complicated.
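For reference, the brute-force noncoherent ML search that the reduced-complexity detectors aim to avoid can be sketched as below. This is an illustrative baseline only, not the proposed separate detector: it assumes NumPy, 0-based reference orders, a noise-free channel that stays constant over two blocks, and tiny parameters so that the exhaustive search stays small. All function names are hypothetical.

```python
import numpy as np
from itertools import permutations, product

def candidates(n_t, T, M):
    """Yield every (p, x): free first part, permutation last part, MPSK symbols."""
    psk = np.exp(2j * np.pi * np.arange(M) / M)
    for p1 in product(range(n_t), repeat=T - n_t):
        for p2 in permutations(range(n_t)):
            for idx in product(range(M), repeat=T):
                yield np.array(p1 + p2), psk[list(idx)]

def ml_detect(Y, Y2_prev, n_t, M):
    """Exhaustive search minimizing ||Y(t) - Y2(t-1) X||^2 over all candidates."""
    T = Y.shape[1]
    best, best_metric = None, np.inf
    for p, x in candidates(n_t, T, M):
        # Y2_prev[:, p] * x equals Y2_prev @ X because X has one nonzero per column.
        metric = np.linalg.norm(Y - Y2_prev[:, p] * x) ** 2
        if metric < best_metric:
            best, best_metric = (p, x), metric
    return best

# Noise-free check with N_T = N_R = 2, T = 3, QPSK.
n_t, n_r, T, M = 2, 2, 3, 4
rng = np.random.default_rng(2)
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
S2_prev = np.eye(n_t, dtype=complex)  # assumed reference part of the previous block
p_true = np.array([1, 1, 0])          # free entry first, then a permutation
x_true = np.exp(2j * np.pi * np.array([1, 3, 2]) / M)
Y2_prev = H @ S2_prev
Y = H @ (S2_prev[:, p_true] * x_true)
p_hat, x_hat = ml_detect(Y, Y2_prev, n_t, M)
```

Even in this small case the search visits every combination of reference order and symbols, which illustrates why the candidate count grows quickly with $T$.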

V. PERFORMANCE ANALYSIS
The error performance of DSM is determined by the coherent distances of the data matrix $X(t)$ [27]. Let $\{X_1, X_2, \cdots, X_n\}$ denote the set of all possible matrices of $X(t)$, where $n = 2^{\lfloor \log_2 (N_T^{T-N_T} N_T!) \rfloor + mT}$. Define $D_{ij} = (X_i - X_j)(X_i - X_j)^{\dagger}$, whose rank is denoted by $r_{ij}$, and let the nonzero eigenvalues of $D_{ij}$ be $\lambda_{ij1}, \ldots, \lambda_{ijr_{ij}}$. The transmit diversity order, denoted by $r_{\min}$, is the minimum value of $r_{ij}$, i.e., $r_{\min} = \min_{1 \le i < j \le n} r_{ij}$ [28]. The squared distance between $X_i$ and $X_j$ is given by (20), and the coding advantage, which only considers pairs with $r_{ij} = r_{\min}$, is defined as in [28]. Apparently, for LB-DSM, the transmit diversity order $r_{\min}$ is 1.

To compute $d^2_{\min}$, consider codeword pairs which may result in the minimum distance $d_{\min}$. Assume $T = 2N_T$. Let $X_2$ and $X_3$ differ from $X_1$ in only one nearest MPSK symbol, and let $X_4$ and $X_5$ differ from $X_1$ in the least antenna pattern of $X^{(1)}(t)$ and $X^{(2)}(t)$, respectively. The minimum distance $d_{\min}$ is due to either differing in only one nearest MPSK symbol, which is $d(X_1, X_2) = d(X_1, X_3)$, or differing in the least antenna pattern, which is $d(X_1, X_4)$ or $d(X_1, X_5)$. The values of $d^2(X_1, X_4)$ and $d^2(X_1, X_5)$ are 2 and 4, respectively, and the value of $d^2(X_1, X_2) = d^2(X_1, X_3)$ depends on the signal constellation $\mathcal{S}$: it is 2 and $2 - \sqrt{2} = 0.586$ for $M = 4$ and 8, respectively. We find that changing the value of $T$ does not affect these values. Consequently, in the case of QPSK, the difference of antenna patterns from $X^{(2)}(t)$ does not result in $d^2_{\min} = 2$, but some different antenna patterns from $X^{(1)}(t)$ result in $d^2_{\min}$; while in the case of 8PSK, no different antenna patterns from $X^{(1)}(t)$ and $X^{(2)}(t)$ result in $d^2_{\min} = 0.586$. Notice that the minimum distance and the number of nearest neighbors (the codewords with the minimum distance) dominate the error performance at high signal-to-noise ratios (SNRs). Therefore, for QPSK, LB-DSM increases the number of nearest neighbors compared with the original DSM, so we expect that LB-DSM has worse error performance than DSM.
For 8PSK, because LB-DSM does not increase the number of nearest neighbors and has higher data rates than DSM, we expect that LB-DSM has better bit error rates (BERs) vs. $E_b/N_0$ than DSM at high SNRs.
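The distance values quoted in this section can be reproduced numerically. The sketch below assumes NumPy and uses hypothetical example codewords for $N_T = 2$ and $T = 4$ (not necessarily the codewords defined in the paper); it reports the nonzero eigenvalues of $(X_i - X_j)(X_i - X_j)^{\dagger}$, which for these rank-one differences coincide with the squared Frobenius distance.

```python
import numpy as np

def data_matrix(p, x, n_t):
    """N_T x T data matrix with x[k] in row p[k] of column k (0-based)."""
    X = np.zeros((n_t, len(p)), dtype=complex)
    X[p, np.arange(len(p))] = x
    return X

def nonzero_eigs(Xi, Xj, tol=1e-9):
    """Nonzero eigenvalues of D = (Xi - Xj)(Xi - Xj)^H."""
    D = (Xi - Xj) @ (Xi - Xj).conj().T
    ev = np.linalg.eigvalsh(D)
    return ev[ev > tol]

n_t = 2
ones = [1, 1, 1, 1]
X_ref = data_matrix([0, 0, 0, 1], ones, n_t)     # reference codeword
X_ant1 = data_matrix([1, 0, 0, 1], ones, n_t)    # one antenna change in the first part
X_ant2 = data_matrix([0, 0, 1, 0], ones, n_t)    # swap of two antennas in the last part

for M in (4, 8):
    x_near = [np.exp(2j * np.pi / M), 1, 1, 1]   # one symbol moved to its nearest neighbor
    X_sym = data_matrix([0, 0, 0, 1], x_near, n_t)
    print(M, nonzero_eigs(X_ref, X_sym))
print(nonzero_eigs(X_ref, X_ant1), nonzero_eigs(X_ref, X_ant2))
```

All three kinds of codeword pairs give a rank-one difference, consistent with a transmit diversity order of 1.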

VI. LONG-BLOCK GENERALIZED DIFFERENTIAL SPATIAL MODULATION
In the GDSM scheme in [24], where $N_T$ is an even integer, the data matrix for the $t$th block is $X(t) = A(t)\,\mathrm{diag}\{X_1(t), X_2(t), \cdots, X_{N_T/2}(t)\}$, where $X_1(t), X_2(t), \cdots, X_{N_T/2}(t)$ are Alamouti's STBC blocks placed on the block diagonal, and $A(t) \in \{A_q \mid q = 1, 2, \cdots, Q\}$ is composed of $I_2$ and $0_2$. The transmitted matrix of GDSM is also obtained by the differential encoding (3), and the noncoherent ML detection is (4) as well. The number of permutations of $I_2$ and $0_2$ in each $A(t)$ is $(N_T/2)!$, so the value of $Q$ is $Q = 2^{\lfloor \log_2 (N_T/2)! \rfloor}$. For $N_T = 4$, only one data bit determines $A(t)$ because there are only $2! = 2$ permutations. For $N_T = 6$, there are $3! = 6$ permutations, so $A(t)$ carries two data bits only.
In the proposed LB-GDSM, we have $A(t) = [A^{(1)}(t)\ A^{(2)}(t)]$, $X(t) = [X^{(1)}(t)\ X^{(2)}(t)]$ and $S(t) = [S^{(1)}(t)\ S^{(2)}(t)]$, where $A^{(1)}(t)$, $X^{(1)}(t)$ and $S^{(1)}(t)$ are $N_T \times (T - N_T)$, and $A^{(2)}(t)$, $X^{(2)}(t)$ and $S^{(2)}(t)$ are $N_T \times N_T$. Because $A^{(1)}(t)$ is the coherent antenna-index matrix, coherent generalized spatial modulation (GSM) schemes that use Alamouti's STBC can be applied to $A^{(1)}(t)$. For simplicity, we consider the simplest GSM scheme as follows. Because two transmit antennas are used for sending an STBC block, transmit antennas are divided into $N_T/2$ groups, each containing two transmit antennas. For every two time slots, the number of possibilities of selecting antennas is $N_T/2$. In other words, in every two columns of $A^{(1)}(t)$, there are one $I_2$ and $N_T/2 - 1$ copies of $0_2$. Therefore, the number of all possibilities for $A^{(1)}(t)$ is $(N_T/2)^{(T-N_T)/2}$. Consequently, if MPSK symbols in $X(t)$ and 1 and 0 in $A(t)$ are replaced by Alamouti's STBC, $I_2$ and $0_2$, respectively, DSM becomes GDSM in [24] and LB-DSM becomes LB-GDSM. Therefore, the spectral efficiency of LB-GDSM with $N_T$ transmit antennas and $T$ time slots is identical to the spectral efficiency of LB-DSM with $N_T/2$ transmit antennas and $T/2$ time slots. The bit mapping and reduced-complexity ML detectors proposed for LB-DSM can be applied to LB-GDSM straightforwardly. Notice that there are some complicated schemes of coherent GSM using STBC, e.g., [29], [30], [31], [32], and [33], and perhaps applying these GSM schemes to LB-GDSM can further improve the spectral efficiency at the price of increased complexity.

VII. SIMULATION RESULTS
Figure 2 presents simulation results of the LB-DSM examples using QPSK with $N_R = N_T$. The transmission rates are those in Table 1 plus 2 bits/sec/Hz. For the same $N_T$, all curves of LB-DSM are worse than those of DSM, as expected in Sec. V. Simulation results of the LB-DSM examples using 8PSK are shown in Fig. 3, whose transmission rates are those in Table 1 plus 3 bits/sec/Hz.
For the same $N_T$, at high SNRs, all curves of LB-DSM outperform the curves of DSM, and LB-DSM with $T = 3N_T$ is slightly better than LB-DSM with $T = 2N_T$. At high SNRs, the gap between LB-DSM and DSM for $N_T = 4$ and 5 is more significant than for $N_T = 3$. The gain of LB-DSM over DSM at high SNRs is due to higher data rates, as indicated by the distance analysis in Sec. V. Compared with DSM, in addition to the advantage of higher bandwidth efficiency, LB-DSM has better error performance except at low SNRs.
Suppose the signal constellation is higher-order, such as 16APSK. In that case, the distance between two antenna patterns of $X^{(2)}(t)$ is larger than the minimum distance caused by the two nearest points in the signal constellation. Consequently, LB-DSM would have better BERs vs. $E_b/N_0$ than DSM at high SNRs, as in the case of 8PSK, which we explain in Sec. V. The signal constellation of 16APSK is shown in Fig. 4, where the ring ratio $r_1/r_0$ used is two. Fig. 5 presents simulation results of the LB-DSM examples using 16APSK, which are similar to those of 8PSK. The transmission rates are those in Table 1 plus 4 bits/sec/Hz. Simulation results of GDSM and LB-GDSM using QPSK with $N_R = N_T$ are shown in Fig. 6. We find that all LB-GDSM curves outperform GDSM except at low SNRs, and LB-GDSM with $T = 3N_T$ is slightly better than LB-GDSM with $T = 2N_T$.

VIII. CONCLUSION
In this paper, we have proposed a novel DSM scheme called LB-DSM, which can be viewed as a combination of DSM and SM. Benefiting from the part of SM, LB-DSM is more bandwidth efficient than DSM. We also propose low-complexity methods of bit mapping and ML decoding for LB-DSM. In addition to the advantages of higher data rates and lower decoding complexity, both distance analysis and simulation results show that LB-DSM using 8PSK has better error performance than DSM using 8PSK at high SNRs. Besides, LB-GDSM also offers higher data rates and better BERs than GDSM.