Power Allocation and Low Complexity Detector for Differential Full Diversity Spatial Modulation Using Two Transmit Antennas

Differential full diversity spatial modulation (DFD-SM) is a differential spatial modulation (DSM) scheme that makes use of a cyclic unitary M-ary phase shift keying (M-PSK) constellation to achieve diversity gains at both the transmitter and receiver. In this paper, we extend the power allocation concept of generalized differential modulation (GDM) to DFD-SM to improve its block error rate (BLER). A novel power allocation scheme is formulated, and its optimum power allocation is derived. An asymptotic upper bound is presented for the new scheme and results are verified through Monte Carlo simulations. For a large enough frame length, the proposed scheme can almost achieve coherent performance. We also propose a low complexity detection scheme for DFD-SM. We evaluate the computational complexity of the maximum-likelihood (ML) detector and compare it to that of the proposed algorithm. It is shown that the complexity of our scheme is independent of the constellation size. Numerical simulations of the BLER are presented, and it can be seen that the proposed scheme provides near-ML performance throughout the entire signal-to-noise ratio (SNR) range, with a complexity reduction of about 55% and 52% for one and two receive antennas, respectively, in the high SNR region.


Introduction
Spatial modulation (SM) [1] is an efficient multiple-input multiple-output (MIMO) system which has a low complexity implementation. Coherent SM generally requires full knowledge of the channel state information (CSI), which adds to the complexity of implementing the system at the receiver. Coherent systems are also susceptible to pilot overhead and estimation errors [2]. Non-coherent systems do not require CSI and are thus less complex to implement at the receiver; however, they suffer from an error performance penalty when compared to coherent systems. As such, to combat pilot overhead and estimation errors, multiple differentially encoded SM (DSM) systems have been introduced in [3], [4], [5], [6].
Bian et al. in [3] introduced the concept of an $N_R \times 2$ DSM system, where $N_R$ is the total number of receive antennas, and 2 is the total number of transmit antennas. In DSM, communication is carried out block-wise. Two antenna matrices are created, which encode the space and time dimensions of the two M-ary phase shift keying (M-PSK) symbols to be transmitted in two time slots. At any given time slot, only one transmit antenna is active. A recursive formula to differentially encode the transmit symbols is introduced. An ML detector is derived which estimates the transmitted symbols without the need for CSI. The detector searches through a total of $2M^2$ possible combinations in order to find the optimum solution.
Bian et al. in [4] further extended the work of [3] to an $N_R \times N_T$ DSM system, where $N_T$ is the total number of transmit antennas. A design for antenna selection is introduced to accommodate the increase in the number of antenna configurations. This increases the system's spectral efficiency. The ML detector has to search through a total of $2^{\lfloor \log_2 N_T! \rfloor} M^{N_T}$ possible combinations to find the optimal solution, where $\lfloor \cdot \rfloor$ denotes the floor function. The system's results are compared with those of conventional SM, and it can be seen that DSM only suffers from a 3 dB penalty [4].
Ishikawa in [5] introduced a unified DSM architecture. In order to attain a diversity gain, the number of symbols employed per antenna-index block is a design variable. It can be seen that, based on this design, a flexible rate-diversity tradeoff is achieved [5].
Zang et al. in [6] proposed an $N_R \times 2$ DSM scheme which uses a cyclic M-PSK constellation to achieve full diversity, i.e. transmit and receive diversity. In order to achieve a transmit diversity gain, the data rate has to be lowered, which coincides with [5]. The system transmits the same symbol over the two time slots. The ML detector searches a total of $2M$ combinations in order to find the optimal solution.
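To make the three search-space sizes quoted above concrete, the following minimal sketch evaluates them for a given constellation size. The function names are hypothetical and introduced only for illustration; the formulas are the ones stated in the text, read as $2^{\lfloor \log_2 N_T! \rfloor} M^{N_T}$ for [4] and $2M$ for [6].

```python
import math


def dsm_search_size(n_t: int, m: int) -> int:
    """ML candidates for the N_R x N_T DSM scheme, as quoted from [4]."""
    # floor(log2(N_T!)) computed exactly on the integer factorial.
    exponent = math.factorial(n_t).bit_length() - 1
    return (2 ** exponent) * m ** n_t


def dfd_sm_search_size(m: int) -> int:
    """ML candidates for the N_R x 2 full diversity scheme, as quoted from [6]."""
    return 2 * m


if __name__ == "__main__":
    for m in (4, 8, 16):
        print(m, dsm_search_size(2, m), dfd_sm_search_size(m))
```

For $N_T = 2$ the general expression reduces to $2M^2$, which agrees with the count given for [3].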
In order to improve the error performance of conventional differential modulation (CDM), a generalized differential modulation (GDM) scheme is introduced in [7], [8]. In GDM, a frame is split up into two parts, namely a reference part and a normal part. Both the reference and normal parts convey information. The reference part differentially encodes the normal part in the current frame and the reference part in the next frame. The system allocates more power to the reference part in order to improve the system's error performance. For a large enough frame length, the error performance of GDM can almost approach that of coherent detection. The optimal power allocation of GDM for two-way amplify-and-forward relaying [7] differs from that of space-time block codes [8], as it depends on the statistics of the differential modulation scheme. However, the power allocation scheme proposed in [8] can be applied to any differential modulation scheme, as it is only dependent on the structure of the received signal in CDM.
The work of [7], [8] motivates us to extend the power allocation concept of GDM to differential full diversity spatial modulation (DFD-SM) to further improve its error performance. We also propose a low complexity detection algorithm for DFD-SM, as only the ML detector is discussed in the literature.
The paper is organized as follows: Section 2 is broken down into 3 subsections. Section 2.1 gives a brief overview of conventional DFD-SM and introduces its system model. Section 2.2 introduces the proposed scheme and Sec. 2.3 discusses the power allocation of the proposed system. In Sec. 3, the optimum power allocation and asymptotic upper bound on the block error rate (BLER) are derived. Section 4 introduces the low complexity detection scheme for conventional DFD-SM and Sec. 5 compares the complexity of the proposed detection scheme against the optimal detector. Section 6 provides the simulation results and discussion and, finally, Sec. 7 concludes the paper.
Notation used in this paper: Bold upper/lower case letters represent matrices/vectors. $(\cdot)^T$, $(\cdot)^H$ and $(\cdot)^*$ represent the transpose, Hermitian and complex conjugate operations, respectively. $\mathbf{X}(i, j)$ denotes the element located at the $i$-th row and $j$-th column of matrix $\mathbf{X}$, and $\mathrm{Tr}(\mathbf{X})$ denotes the trace operation, which is the sum of all elements on the main diagonal of matrix $\mathbf{X}$. $\Re(z)$ and $\angle z$ denote the real part and the phase of the complex number $z$, respectively. $\arg\max$ returns the argument that maximizes the expression passed to it and $(\cdot)!!$ denotes the double factorial operator. The modulo operation is represented as $\mathrm{mod}(x, y) = x - y \lfloor x/y \rfloor$ and $\mathrm{round}(x)$ rounds $x$ up or down to the nearest integer.

System Model
In this section, the system model of conventional DFD-SM is first introduced. Based on GDM, we then discuss the system model of the proposed scheme. Finally, we discuss the power allocation of the proposed DFD-SM system.

Conventional DFD-SM System
The conventional DFD-SM system is represented in Fig. 1 [6]. The system consists of two transmit antennas and $N_R$ receive antennas. Let $\mathbf{H} = [\mathbf{h}_1 \; \mathbf{h}_2]$ and $\mathbf{N} = [\mathbf{n}_1 \; \mathbf{n}_2]$ denote the $N_R \times 2$ fading channel matrix and the $N_R \times 2$ additive white Gaussian noise (AWGN) matrix, respectively, where $\mathbf{h}_i$ and $\mathbf{n}_i$, $i \in \{1, 2\}$, are $N_R \times 1$ column vectors. The entries of $\mathbf{h}_i$ and $\mathbf{n}_i$ are independent and identically distributed (i.i.d.) complex Gaussian random variables with zero mean and a variance of 0.5 and $\sigma_n^2/2$ per dimension, respectively. The transmitted symbols are drawn from a unit-energy M-PSK constellation. The system's average signal-to-noise ratio (SNR) for conventional DFD-SM is therefore defined as $\gamma = 1/\sigma_n^2$.
In conventional DFD-SM, communication is carried out block-wise. Two antenna index matrices, $\mathbf{A}_0$ and $\mathbf{A}_1$, are defined according to [6], where $\phi$ is the rotation angle to be optimized to achieve transmit diversity. The codebook, as seen in Fig. 1, is defined to be the set of $M$ distinct unitary matrices chosen from a cyclic signal constellation whose $l$-th element is of the form $\mathbf{V}_l = \mathrm{diag}\left(e^{j2\pi u_1 l/M}, e^{j2\pi u_2 l/M}\right)$, where the parameters $u_1$ and $u_2$ are optimized to achieve full transmit diversity [6]. $\mathbf{V}_l$ consists of a single information carrying M-PSK symbol over the two symbol durations. It can be seen that only a single antenna is activated for each symbol duration. From the analysis provided in [6], it was found that $\phi = \pi/4$, $u_1 = 1$ and $u_2 = 7$ for $M = 16$, which will be used in this paper.
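In the $t$-th block, $\log_2(M) + 1$ information bits are mapped to $(q, l)$, where $q \in \{0, 1\}$ indicates the selected antenna activation order matrix $\mathbf{A}_q$ and $l \in \{0, 1, \ldots, M-1\}$ indicates the selected unitary matrix $\mathbf{V}_l$ [6]. A $\mathbb{C}^{2 \times 2}$ signal matrix $\mathbf{S}^{(t)}$, which encodes the space and time dimensions, is defined as [6]
$$\mathbf{S}^{(t)} = \mathbf{A}_q \mathbf{V}_l. \tag{2}$$
The signal matrix is differentially encoded in a $\mathbb{C}^{2 \times 2}$ space-time matrix $\mathbf{X}^{(t)}$ as [6]
$$\mathbf{X}^{(t)} = \mathbf{X}^{(t-1)} \mathbf{S}^{(t)}. \tag{3}$$
In the first block, the differentially encoded matrix is set as $\mathbf{X}^{(0)} = \mathbf{I}_2$ for simplicity, where $\mathbf{I}_2$ represents the $2 \times 2$ identity matrix.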
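The received signal in the $t$-th block is given by
$$\mathbf{Y}^{(t)} = \mathbf{H}^{(t)} \mathbf{X}^{(t)} + \mathbf{N}^{(t)}. \tag{4}$$
Assuming quasi-static fading, $\mathbf{H}^{(t)} = \mathbf{H}^{(t-1)}$, the ML detector can be derived as [6]
$$(\hat{q}, \hat{l}) = \arg\max_{q, l} \; \Re\left\{ \mathrm{Tr}\left( \mathbf{Y}^{(t)H} \mathbf{Y}^{(t-1)} \mathbf{A}_q \mathbf{V}_l \right) \right\}. \tag{5}$$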
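The encoding and detection chain described in this subsection can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, under stated assumptions: the antenna matrices are taken here as $\mathbf{A}_0 = \mathbf{I}_2$ and $\mathbf{A}_1 = e^{j\phi}$ times the $2 \times 2$ exchange matrix (an assumption, since their definition is given in [6] and not reproduced here), the channel is noise-free and constant, and all function names are hypothetical.

```python
import cmath
import math
import random

# Parameters quoted in the text for 16-PSK (from [6]).
M, U1, U2, PHI = 16, 1, 7, math.pi / 4


def matmul(a, b):
    """Product of two small complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hermitian(a):
    """Conjugate transpose of a nested-list matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def signal_matrix(q, l):
    """S = A_q V_l with V_l = diag(e^{j2pi u1 l/M}, e^{j2pi u2 l/M}).

    ASSUMPTION: A_0 is the identity and A_1 = e^{j phi} [[0, 1], [1, 0]].
    """
    v1 = cmath.exp(2j * math.pi * U1 * l / M)
    v2 = cmath.exp(2j * math.pi * U2 * l / M)
    if q == 0:
        return [[v1, 0], [0, v2]]
    rot = cmath.exp(1j * PHI)
    return [[0, rot * v2], [rot * v1, 0]]


def ml_detect(y_now, y_prev):
    """Exhaustive search over all 2M pairs (q, l) of Re{Tr(Y^H Y_prev A_q V_l)}."""
    gram = matmul(hermitian(y_now), y_prev)  # computed once per block
    best, best_metric = None, -math.inf
    for q in (0, 1):
        for l in range(M):
            trial = matmul(gram, signal_matrix(q, l))
            metric = (trial[0][0] + trial[1][1]).real
            if metric > best_metric:
                best, best_metric = (q, l), metric
    return best


def simulate(symbols, n_r=2, seed=1):
    """Differentially encode `symbols`, pass them through a fixed noise-free
    Rayleigh channel, and detect each block from two consecutive blocks."""
    rng = random.Random(seed)
    sigma = math.sqrt(0.5)  # variance 0.5 per real dimension
    h = [[complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(2)]
         for _ in range(n_r)]
    x = [[1, 0], [0, 1]]  # X^(0) = I_2
    y_prev = matmul(h, x)
    detected = []
    for q, l in symbols:
        x = matmul(x, signal_matrix(q, l))  # X^(t) = X^(t-1) S^(t)
        y_now = matmul(h, x)
        detected.append(ml_detect(y_now, y_prev))
        y_prev = y_now
    return detected


if __name__ == "__main__":
    sent = [(q, l) for q in (0, 1) for l in range(M)]
    print(simulate(sent) == sent)
```

Because the signal matrices are unitary, maximizing the trace metric is equivalent to minimizing the distance between the current block and the previous block multiplied by the candidate signal matrix, which is why no channel knowledge is needed.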

Proposed DFD-SM System
In this subsection, we extend the power allocation concept of GDM to the conventional DFD-SM system. In [8], the differential detector is considered to have an estimation-detection structure. This implies that the previous received block, $\mathbf{Y}^{(t-1)}$, is used as an estimate of the fading channel matrix, $\mathbf{H}^{(t)}$, in order to coherently detect the information conveyed in the current received block, $\mathbf{Y}^{(t)}$ [8]. We exploit this property in our new scheme.
We assume each frame contains (K + 1) blocks.K blocks, defined as normal blocks, will convey information.The first block transmitted in a frame, defined as a reference block, will serve as reference to the next K blocks in the frame.The normal and reference blocks are transmitted with unequal power, with more power allocated to the reference block.
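The reference (first) block transmitted in a frame is given by
$$\mathbf{Y}_{\mathrm{ref}} = \mathbf{H} \mathbf{X}_{\mathrm{ref}} + \mathbf{N}_{\mathrm{ref}}, \tag{6}$$
where the entries of $\mathbf{N}_{\mathrm{ref}}$ are i.i.d. complex Gaussian random variables with $\mathcal{CN}(0, \sigma_{\mathrm{ref}}^2)$ distribution in the reference block. Thus, the reference block has an average SNR of $\gamma_{\mathrm{ref}} = 1/\sigma_{\mathrm{ref}}^2$. This block provides the channel estimation for the next $K$ blocks in the frame. We further denote the $K$ received normal blocks as
$$\mathbf{Y}_{\mathrm{norm}}^{(t)} = \mathbf{H} \mathbf{X}_{\mathrm{norm}}^{(t)} + \mathbf{N}_{\mathrm{norm}}^{(t)}, \quad 1 \le t \le K, \tag{7}$$
where $\mathbf{X}_{\mathrm{norm}}^{(t)} = \mathbf{X}_{\mathrm{ref}} \mathbf{S}^{(t)}$. The entries of $\mathbf{N}_{\mathrm{norm}}^{(t)}$ are also i.i.d. complex Gaussian random variables with $\mathcal{CN}(0, \sigma_{\mathrm{norm}}^2)$ distribution in the normal block. Thus, the normal block has an average SNR of $\gamma_{\mathrm{norm}} = 1/\sigma_{\mathrm{norm}}^2$. Since $\mathbf{X}_{\mathrm{norm}}^{(t)} = \mathbf{X}_{\mathrm{ref}} \mathbf{S}^{(t)}$, $1 \le t \le K$, we can re-write the received signal in (7) as
$$\mathbf{Y}_{\mathrm{norm}}^{(t)} = \mathbf{Y}_{\mathrm{ref}} \mathbf{S}^{(t)} + \mathbf{N}_{\mathrm{norm}}^{(t)} - \mathbf{N}_{\mathrm{ref}} \mathbf{S}^{(t)}. \tag{8}$$
The ML detector can now be derived similar to [6] as
$$(\hat{q}, \hat{l}) = \arg\max_{q, l} \; \Re\left\{ \mathrm{Tr}\left( \mathbf{Y}_{\mathrm{norm}}^{(t)H} \mathbf{Y}_{\mathrm{ref}} \mathbf{A}_q \mathbf{V}_l \right) \right\}. \tag{9}$$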
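The re-writing of the normal block in terms of the reference block can be spelled out step by step. The following is a sketch of that substitution, assuming the channel $\mathbf{H}$ is constant over the frame and that the reference block has the form $\mathbf{Y}_{\mathrm{ref}} = \mathbf{H}\mathbf{X}_{\mathrm{ref}} + \mathbf{N}_{\mathrm{ref}}$:

```latex
\begin{aligned}
\mathbf{Y}_{\mathrm{norm}}^{(t)}
  &= \mathbf{H}\,\mathbf{X}_{\mathrm{ref}}\,\mathbf{S}^{(t)} + \mathbf{N}_{\mathrm{norm}}^{(t)} \\
  &= \left(\mathbf{Y}_{\mathrm{ref}} - \mathbf{N}_{\mathrm{ref}}\right)\mathbf{S}^{(t)} + \mathbf{N}_{\mathrm{norm}}^{(t)} \\
  &= \mathbf{Y}_{\mathrm{ref}}\,\mathbf{S}^{(t)} + \mathbf{N}_{\mathrm{norm}}^{(t)} - \mathbf{N}_{\mathrm{ref}}\,\mathbf{S}^{(t)} .
\end{aligned}
```

Since $\mathbf{S}^{(t)}$ is unitary, the entries of $\mathbf{N}_{\mathrm{ref}}\mathbf{S}^{(t)}$ keep the variance $\sigma_{\mathrm{ref}}^2$, so the detector sees two independent noise terms with variances $\sigma_{\mathrm{norm}}^2$ and $\sigma_{\mathrm{ref}}^2$.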

Power Allocation
In GDM [7], [8], both schemes allocate a very large portion of power to the reference blocks in the frame. In order to ensure an average transmit power $P$ at the transmitter, the system needs to abide by the following power constraint [7], [8]
$$P_1 + (L - 1) P_2 = L P, \tag{10}$$
where $P_1$ is the power allocated to the reference block, $P_2$ is the power allocated to the normal block and $L$ is the number of blocks in a frame. The optimal power allocation for each scheme is found by formulating a minimization problem based on (10), as well as the statistics of the modulation scheme. In [7], the power allocation problem is formulated into a function of one variable based on the performance analysis of the system, after which the derivative is taken in order to find the optimal solution, whereas in [8], the Lagrange multiplier method is used to find the optimum power allocation. Similar to GDM, we introduce a type of power allocation to DFD-SM.
Defining the average transmit power constraint (10) in terms of the system's average SNR, $\gamma$, for our proposed scheme, we have $\gamma_{\mathrm{ref}} + K \gamma_{\mathrm{norm}} = (K + 1) \gamma$. We propose a novel power re-allocation scheme. First, we remove a fraction of power, denoted as $\alpha$, from each of the normal blocks in the frame. This can be represented mathematically, in terms of the system's average SNR, as $\gamma_{\mathrm{norm}} = (1 - \alpha) \gamma$. We then re-allocate this fraction of power from all $K$ normal blocks to the reference block, i.e. $\gamma_{\mathrm{ref}} = (1 + K \alpha) \gamma$. It can be seen that the power allocated to the reference block is greater than that of the normal blocks, so as to improve channel estimation and hence reduce errors. The proposed scheme does not require any information on the statistics of the system and can therefore be applied to any differential modulation scheme.
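As a quick numerical illustration of the re-allocation rule, the sketch below computes the two SNRs for a given $\alpha$ and confirms that the frame-average constraint is preserved. The function name and the example values are hypothetical; SNRs are in linear scale, not dB.

```python
import math


def power_split(gamma: float, k: int, alpha: float):
    """Return (gamma_ref, gamma_norm) after moving a fraction alpha of the
    power of each of the K normal blocks to the reference block."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    gamma_norm = (1.0 - alpha) * gamma
    gamma_ref = (1.0 + k * alpha) * gamma
    return gamma_ref, gamma_norm


if __name__ == "__main__":
    gamma, k, alpha = 100.0, 100, 0.05  # 20 dB average SNR, illustrative alpha
    g_ref, g_norm = power_split(gamma, k, alpha)
    # The frame-average SNR is unchanged: g_ref + K * g_norm == (K + 1) * gamma
    print(g_ref, g_norm, math.isclose(g_ref + k * g_norm, (k + 1) * gamma))
```

Setting $\alpha = 0$ recovers the conventional scheme, in which every block is transmitted with the same power.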

Optimal Power Allocation and Asymptotic Error Performance Analysis
In this section we find the optimal power allocation for the proposed DFD-SM scheme and then derive an upper bound on the BLER. Following the normal and reference SNRs defined in the previous section, and making the noise variance the subject of the formula, we have
$$\sigma_{\mathrm{ref}}^2 = \frac{1}{(1 + K\alpha)\gamma}, \qquad \sigma_{\mathrm{norm}}^2 = \frac{1}{(1 - \alpha)\gamma}. \tag{11}$$
If we analyze the received signal in (8), the signal contains two noise terms with two different variances. We define the effective noise variance, which is the coherent equivalent noise variance, as
$$\sigma_{\mathrm{eff}}^2 = \sigma_{\mathrm{ref}}^2 + \sigma_{\mathrm{norm}}^2 = \frac{1}{(1 + K\alpha)\gamma} + \frac{1}{(1 - \alpha)\gamma}. \tag{12}$$
We can find the optimal $\alpha$ by minimizing the noise variance, i.e. taking the derivative of (12) with respect to $\alpha$ and equating it to 0. After some algebraic manipulations, we find that
$$\alpha_{\mathrm{opt}} = \frac{\sqrt{K} - 1}{K + \sqrt{K}}. \tag{13}$$
When selecting the value of $K$, one should be mindful of the peak to average power ratio (PAPR) that can be supported in a practical implementation [8], as well as the block duration for which the channel remains constant.
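The minimization described above can be checked numerically. The sketch below assumes that the effective noise variance is the sum of the reference and normal noise variances, $1/((1 + K\alpha)\gamma) + 1/((1 - \alpha)\gamma)$, and compares the closed form $(\sqrt{K} - 1)/(K + \sqrt{K})$ that follows from setting its derivative to zero against a brute-force grid search. The function names are hypothetical.

```python
import math


def effective_noise_variance(alpha: float, k: int, gamma: float = 1.0) -> float:
    """Sum of the reference and normal noise variances (ASSUMED form)."""
    return 1.0 / ((1.0 + k * alpha) * gamma) + 1.0 / ((1.0 - alpha) * gamma)


def alpha_opt(k: int) -> float:
    """Closed-form minimizer obtained from d(sigma_eff^2)/d(alpha) = 0."""
    return (math.sqrt(k) - 1.0) / (k + math.sqrt(k))


def alpha_grid_search(k: int, steps: int = 100_000) -> float:
    """Brute-force minimizer over alpha in [0, 1) on a uniform grid."""
    grid = (i / steps for i in range(steps))
    return min(grid, key=lambda a: effective_noise_variance(a, k))


if __name__ == "__main__":
    for k in (100, 500):
        print(k, round(alpha_opt(k), 4), round(alpha_grid_search(k), 4))
```

For $K = 100$ and $K = 500$ the closed form evaluates to approximately 0.0818 and 0.0409, the values used in the simulation section.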
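Based on the optimal power allocation, we can derive the effective SNR and thus further derive the asymptotic BLER. The effective SNR can be derived following (12) as $\gamma_{\mathrm{eff}} = 1/\sigma_{\mathrm{eff}}^2$. Substituting $\alpha_{\mathrm{opt}}$ from (13), this is found to be
$$\gamma_{\mathrm{eff}} = \frac{K + 1}{\left(\sqrt{K} + 1\right)^2} \, \gamma. \tag{14}$$
By providing one reference block with a high SNR for the $K$ normal blocks, the effective SNR per block is increased.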
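To give a feel for the size of the effective SNR improvement, the following sketch evaluates $\gamma_{\mathrm{eff}}/\gamma$ in dB for a few frame lengths. It assumes the effective noise variance is the sum $1/((1 + K\alpha)\gamma) + 1/((1 - \alpha)\gamma)$ and uses the minimizer $\alpha = (\sqrt{K} - 1)/(K + \sqrt{K})$; the function names are hypothetical, and $\alpha = 0$ is used to represent equal power in all blocks.

```python
import math


def alpha_opt(k: int) -> float:
    """Minimizer of the assumed effective noise variance."""
    return (math.sqrt(k) - 1.0) / (k + math.sqrt(k))


def effective_snr_ratio(k: int, alpha: float) -> float:
    """gamma_eff / gamma for a given power fraction alpha (ASSUMED model)."""
    return 1.0 / (1.0 / (1.0 + k * alpha) + 1.0 / (1.0 - alpha))


def to_db(x: float) -> float:
    return 10.0 * math.log10(x)


if __name__ == "__main__":
    print("equal power:", round(to_db(effective_snr_ratio(100, 0.0)), 2), "dB")
    for k in (100, 500):
        ratio = effective_snr_ratio(k, alpha_opt(k))
        print("K =", k, ":", round(to_db(ratio), 2), "dB relative to gamma")
```

Under this model the ratio at the optimum simplifies to $(K + 1)/(\sqrt{K} + 1)^2$, which tends to 1 as $K$ grows, in line with the statement that a long frame approaches coherent performance.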
From [6], the asymptotic upper bound on the BLER for conventional DFD-SM is given by (15), where $\Lambda(u, l, \phi) = \sin\left(\frac{\pi u l}{M} - \phi\right)$ and $\Delta(u_1, u_2, l) = \sin\left(\frac{\pi u_1 l}{M}\right) \sin\left(\frac{\pi u_2 l}{M}\right)$. Substituting (14) into (15) yields the upper bound (16) for the proposed power allocation scheme.

Low Complexity Detection Scheme
In conventional DFD-SM, the ML detector seen in (5) searches through a total of $2M$ possible combinations made up of all codebook and antenna index elements. The authors in [6] suggest that the high complexity of the detector in their scheme is outweighed by the performance gains over the systems against which it was compared. In DFD-SM, there exists a symmetric relationship between the two symbols contained in each codebook entry. We exploit this relationship and propose a low complexity detection algorithm in this section. In the proposed detection algorithm, we first estimate the received symbols based on the activated antennas. We then estimate which element of codebook $\mathbf{V}$ was transmitted based on the estimated received symbols. Using these estimates, we reduce the number of elements that need to be tested by the ML detector, thereby reducing the complexity of the conventional scheme. The proposed detection scheme comprises three steps. In the first two steps, we assume $\mathbf{A}_q = \mathbf{A}_0$ and $\mathbf{A}_q = \mathbf{A}_1$, respectively, and apply our algorithm for each case. In the final step, we choose the most likely solution.
Step 1: Low Complexity Detection for $\mathbf{A}_0$. Consider a symbol in an M-PSK constellation, of the form given in (17). We will denote the symbols found at element $\mathbf{V}_l$ in codebook $\mathbf{V}$ for DFD-SM as $s_{b_0}(l)$ and $s_{p_0}(l)$, as in (18). It can be seen that there exists a relationship between the phases of $s_{b_0}$ and $s_{p_0}$, i.e. if the phase of $s_{b_0}(l)$ is $2\pi u_1 l/M$, the phase of $s_{p_0}(l)$ is $2\pi u_2 l/M$. This relationship can be seen in Fig. 2. Following (2), the signal matrix is given by (19). After obtaining the received signal in (4), we solve for the estimates $\hat{s}_{b_0}(l)$ and $\hat{s}_{p_0}(l)$ as in (20) and (21). We can then proceed to find their indices using a modified version of [9], given in (22) and (23). Substituting (22) and (23) into (17), we can solve for the transmitted symbols. Note that the solutions to (22) and (23) are of the form $l_b = \mathrm{mod}(u_1 l', M)$ and $l_p = \mathrm{mod}(u_2 l'', M)$, where $l_b, l_p, l', l'' \in \{0, 1, \ldots, M-1\}$ and $l'$ and $l''$ are indices of the codebook. In essence, we are finding the index of codebook $\mathbf{V}$ using the index of the estimated symbol. The received signal is distorted by the effects of fading and AWGN, and as a result the calculated index may lie in the wrong decision region. To improve the error performance of our detection scheme, we utilize a type of nearest neighbor algorithm: we add and subtract $\pi/M$ to both $Q_b$ and $Q_p$ independently in order to more accurately determine the index, as given in (24) and (25).
Thereafter, we use (22) and (23) to obtain our estimates. We denote the two estimates obtained from $\hat{Q}_{b_i}$ as $l'_i$ and those obtained from $\hat{Q}_{p_i}$ as $l''_i$, $i = 1, 2$. Note that the solutions of (22) and (23) using (24) and (25), respectively, are of the form $\mathrm{mod}(u_1 l'_i, M)$ and $\mathrm{mod}(u_2 l''_i, M)$. In order to determine the $l'_i$ and $l''_i$ solutions, we make use of look-up tables, which can be constructed using Fig. 2. If $l' = l''$, it can be observed that the estimates of (22) and (23) lie in the same set. In this case, $\{l'_1, l'_2\} \in \{l''_1, l''_2\}$ and we only need to test two elements of codebook $\mathbf{V}$, using the ML detector in (26). However, if $l' \neq l''$, we find that the estimates of (22) and (23) do not lie in the same set. In this case, $\{l'_1, l'_2\} \notin \{l''_1, l''_2\}$ and we have to test four elements of codebook $\mathbf{V}$, using the ML detector in (27). We store the maximum value of the metric obtained from (26) or (27) as $d_1$.
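The two ingredients used in this step, a look-up table that inverts $\mathrm{mod}(u l, M)$ and a nearest-neighbor search obtained by shifting the phase by $\pm\pi/M$, can be illustrated as follows. This is a simplified reading rather than the exact procedure of (22) to (27): it assumes that the index is obtained by rounding the phase divided by $2\pi/M$, it stops at producing candidate sets, and all names are hypothetical.

```python
import math

M, U1, U2 = 16, 1, 7  # values quoted in the text for 16-PSK


def build_lookup(u: int, m: int = M) -> dict:
    """Map a PSK symbol index mod(u*l, m) back to the codebook index l."""
    table = {(u * l) % m: l for l in range(m)}
    if len(table) != m:
        raise ValueError("u must be coprime with m for a one-to-one table")
    return table


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with ties going upwards."""
    return math.floor(x + 0.5)


def direct_index(phase: float, m: int = M) -> int:
    """ASSUMED index rule: nearest PSK index to the estimated phase."""
    return round_half_up(phase * m / (2.0 * math.pi)) % m


def neighbour_indices(phase: float, m: int = M) -> list:
    """PSK indices obtained after shifting the phase by -pi/m and +pi/m."""
    shifts = (-math.pi / m, math.pi / m)
    return sorted({direct_index(phase + d, m) for d in shifts})


def candidate_sets(phase_b: float, phase_p: float):
    """Codebook index candidates from the first and second symbol estimates."""
    lut_b, lut_p = build_lookup(U1), build_lookup(U2)
    from_b = sorted(lut_b[i] for i in neighbour_indices(phase_b))
    from_p = sorted(lut_p[i] for i in neighbour_indices(phase_p))
    return from_b, from_p


if __name__ == "__main__":
    l_true, step = 3, 2.0 * math.pi / M
    # Phases of the two symbols for codebook index 3, with a small offset.
    print(candidate_sets(U1 * l_true * step + 0.05, U2 * l_true * step + 0.05))
```

With $u_2 = 7$ and $M = 16$ the table is one-to-one because 7 and 16 are coprime, so every estimated symbol index maps to exactly one codebook index. The rule that selects two or four final candidates from these sets is the one described in the text and is not reproduced in this sketch.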
Step 2: Low Complexity Detection for $\mathbf{A}_1$. For $\mathbf{A}_q = \mathbf{A}_1$, (18) is now modified to include the rotation angle $\phi$, as given in (28). For the case of $\phi = \pi/4$ and $M = 16$, the rotation simply adds two to the index of the $\mathbf{A}_0$ case, as represented in Fig. 3. Following (2), the signal matrix is given by (29). Equations (20) and (21) now become (30) and (31). We then carry out all steps performed from (22) to (25), using look-up tables constructed from Fig. 3. The ML detectors from (26) and (27) now become (32) and (33), respectively. We store the maximum value of the metric obtained from (32) or (33) as $d_2$.
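The index shift of two can be verified directly. The identity below is a sketch that assumes the rotation acts on each M-PSK symbol as a multiplication by $e^{j\phi}$, with $u$ standing for either $u_1$ or $u_2$:

```latex
e^{j\phi}\, e^{j 2\pi u l / M}
  = e^{j 2\pi \left(u l + \frac{\phi M}{2\pi}\right) / M},
\qquad
\left.\frac{\phi M}{2\pi}\right|_{\phi = \pi/4,\; M = 16} = \frac{(\pi/4)\cdot 16}{2\pi} = 2 .
```

The rotated symbol therefore still lies on the 16-PSK grid, two positions ahead of its unrotated counterpart, which is why the same index search and look-up table approach can be reused for this case.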
Step 3: Detection. Based on $d_1$ and $d_2$, the solution associated with the greater of the two provides the estimate of the transmitted bits.
The proposed low complexity detection scheme is for conventional DFD-SM [6]. The algorithm can be modified for use with the new power allocation scheme by replacing $\mathbf{Y}^{(t-1)}$ with $\mathbf{Y}_{\mathrm{ref}}$ in the above equations; however, for the purposes of this paper, we only analyze it for the conventional scheme. The algorithm is summarized below.
i. Perform $\mathbf{A}_0$ detection
a. Solve for $\hat{s}_{b_0}$ and $\hat{s}_{p_0}$ using (20) and (21).
b. Solve for $l'$ and $l''$ from (22) and (23).
c. Obtain $\hat{Q}_{b_i}$ by using (24), then solve for $l'_i$ using (22).
d. Obtain $\hat{Q}_{p_i}$ by using (25), then solve for $l''_i$ using (23).
e. If $l' = l''$, use the ML detector in (26). Else, use the ML detector in (27).
f. Save the solution of the ML detector as $(0, l_{A_0})$, along with its maximum value as $d_1$.
ii. Perform $\mathbf{A}_1$ detection
a. Solve for the symbol estimates using (30) and (31).
b. Solve for $l'$ and $l''$ from (22) and (23).
c. Obtain $\hat{Q}_{b_i}$ by using (24), then solve for $l'_i$ using (22).
d. Obtain $\hat{Q}_{p_i}$ by using (25), then solve for $l''_i$ using (23).
e. If $l' = l''$, use the ML detector in (32). Else, use the ML detector in (33).
f. Save the solution of the ML detector as $(1, l_{A_1})$, along with its maximum value as $d_2$.
iii. Select the solution corresponding to the greater of $d_1$ and $d_2$.

Computational Complexity
In this section, we analyse the computational complexity of the proposed scheme and compare it to the optimal detection scheme. We use the concept of computational complexity discussed in [10], which is defined as the total number of real-valued multiplications in a given algorithm.
We first derive the computational complexity of the optimal detection scheme found in (5).
i. $\mathbf{Y}^{(t)H} \mathbf{Y}^{(t-1)}$ is a $2 \times 2$ matrix and needs to be computed once. It requires $4N_R$ complex multiplications, which equates to $16N_R$ real multiplications.
ii. When $\mathbf{A}_q = \mathbf{A}_0$, each trial of $\mathbf{A}_0 \mathbf{V}_l$ requires 2 multiplications of a complex number by a real number, which results in a total of 4 real multiplications.
iii. When $\mathbf{A}_q = \mathbf{A}_1$, each trial of $\mathbf{A}_1 \mathbf{V}_l$ requires 2 multiplications of a complex number by another complex number, which results in a total of 8 real multiplications.
iv. Computing $\mathbf{Y}^{(t)H} \mathbf{Y}^{(t-1)} \mathbf{A}_q \mathbf{V}_l$ requires 4 complex multiplications, resulting in 16 real multiplications.
v. The trace and real operations require no multiplications.
With $M$ trials for each antenna order, the computational complexity for $\mathbf{A}_0$ is $C_{\mathrm{ML1}} = 16N_R + 20M$ and that for $\mathbf{A}_1$ is $C_{\mathrm{ML2}} = 16N_R + 24M$. Eliminating all common steps between $C_{\mathrm{ML1}}$ and $C_{\mathrm{ML2}}$, the total computational complexity of the optimal detector for conventional DFD-SM is $C_{\mathrm{ML}} = 16N_R + 44M$.
For the derivation of the proposed detection scheme's computational complexity, we consider the steps for $\mathbf{A}_0$, unless otherwise stated.
i. Computing both $\hat{s}_{b_0}$ and $\hat{s}_{p_0}$ requires a total of $2N_R$ complex multiplications, translating to $8N_R$ real multiplications.
ii. The calculation of $l'$ and $l''$ requires 6 real multiplications in total. This is broken down into 2 real multiplications for computing both $Q_b$ and $Q_p$, while the modulo operation for finding both $l_b$ and $l_p$ requires a total of $2 \times 2$ real multiplications. We treat $2\pi/M$ as a constant. $l'$ and $l''$ are found from $l_b$ and $l_p$, respectively, using look-up tables and hence require no multiplications.
iii. The addition and subtraction of $\pi/M$ (treated as a constant) to $Q_b$ and $Q_p$ (computed in step ii) requires no multiplications. Solving for the four neighbor indices using the modulo operation requires a total of $4 \times 2$ real multiplications. $l'_1$, $l'_2$, $l''_1$ and $l''_2$ are then obtained by using look-up tables and hence involve no real multiplications.

[...] and never 4; as a result, $\mu \in \{6, 8\}$. We will assume that $\mu = 6$ in the high SNR region and $\mu = 8$ in the low SNR region to aid discussion, although this will not always hold true. The computational complexity of both schemes is summarized in Tab. 1.

Detection scheme       Real-valued multiplications
ML detector (5)        $16N_R + 44M$
Proposed detector      $32N_R + 28 + 44\mu$, $\mu \in \{6, 8\}$

Tab. 1. Complexity order for DFD-SM detection schemes.
Figures 4 and 5 show the computational complexity of the detection schemes as a function of $N_R$ and $M$, respectively. It can be seen from Fig. 4 that the complexity of our proposed scheme is highly dependent on $N_R$ and increases as $N_R$ increases for a 16-PSK constellation. However, for a practical number of receive antennas, the proposed scheme still requires fewer real multiplications than the ML detector. Since the proposed algorithm reduces the search space of the optimal detector to either six or eight estimates, it is recommended that the proposed detection scheme be used for constellations of order $M \geq 16$. Figure 5 highlights the fact that the complexity of the proposed detection scheme is independent of the constellation size $M$, as the computational complexity of 16-, 32-, 64- and 128-PSK constellations is evaluated for $N_R = 2$. For $M = 16$ and $N_R = 1$, our algorithm demonstrates a 55% reduction in computational complexity in the high SNR region as compared to the optimal detector, and a 43% reduction in the low SNR region. For $M = 16$ and $N_R = 2$, a 52% reduction is realized in the high SNR region, and a 40% reduction in the low SNR region.
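The quoted reductions follow directly from the two expressions in Tab. 1. A short sketch, with hypothetical function names, that evaluates them:

```python
def ml_multiplications(n_r: int, m: int) -> int:
    """Real multiplications of the ML detector, 16*N_R + 44*M (Tab. 1)."""
    return 16 * n_r + 44 * m


def proposed_multiplications(n_r: int, mu: int) -> int:
    """Real multiplications of the proposed detector, 32*N_R + 28 + 44*mu."""
    if mu not in (6, 8):
        raise ValueError("mu is restricted to 6 or 8 in Tab. 1")
    return 32 * n_r + 28 + 44 * mu


def reduction_percent(n_r: int, m: int, mu: int) -> float:
    """Relative saving of the proposed detector over the ML detector."""
    return 100.0 * (1.0 - proposed_multiplications(n_r, mu) / ml_multiplications(n_r, m))


if __name__ == "__main__":
    for n_r in (1, 2):
        for mu, region in ((6, "high SNR"), (8, "low SNR")):
            print(n_r, region, round(reduction_percent(n_r, 16, mu)))
```

Because the proposed count does not contain $M$, the saving grows with the constellation size, while the $32N_R$ term means the advantage narrows as receive antennas are added.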

Simulations
For the simulations, we assumed a quasi-static Rayleigh fading channel. The simulations were performed for one and two receive antennas. Firstly, we compared the new power allocation system against conventional DFD-SM in Figs. 6 and 7. The upper bound derived in (16), as well as DFD-SM with coherent transmission/detection, are also included in Figs. 6 and 7. We choose frame lengths of $K = 100$ and $K = 500$ for comparison. The BLER is plotted against the average SNR $\gamma$ (in dB) for the proposed scheme. At a BLER of $10^{-4}$, we see that the proposed scheme outperforms the conventional scheme by approximately 2 dB and is shown to be 1 dB behind the coherent scheme for $K = 500$. The proposed scheme is seen to obtain a gain of about 0.4 dB when the frame length, $K$, is increased from 100 to 500. Since $\alpha$ is a function of $K$, it can be seen that as the frame length increases, the power allocated to the reference block increases. This provides better channel estimation for the normal blocks, and thus better error performance. For a large enough $K$, the proposed scheme can approach the performance of coherent transmission/detection. The bound of (16) is observed to be tight at high SNR.
We next verify that $\alpha_{\mathrm{opt}}$ found in (13) allows for optimal error performance. Using (13), we have $\alpha_{\mathrm{opt}} = 0.0409$ for $K = 500$ and $\alpha_{\mathrm{opt}} = 0.0818$ for $K = 100$. Fig. 8 contains a plot of $P_{\mathrm{BLER}}^{\mathrm{new}}$ from (16) as a function of $\alpha$, at $\gamma = 30$ dB for $N_R = 1$ and $\gamma = 20$ dB for $N_R = 2$, respectively. From Fig. 8, we observe that the BLER is at a minimum when $\alpha = \alpha_{\mathrm{opt}}$.
Finally, we compare the performance of our proposed low complexity (LC) detection scheme for conventional DFD-SM against the optimal detector found in (5) in Fig. 9. It is observed that the proposed algorithm provides near-ML performance throughout the entire SNR range.

Conclusion
In this paper we have provided a new power allocation scheme for DFD-SM, based on GDM. The optimal power allocation and theoretical upper bound on the BLER were derived. It was shown that the proposed scheme outperforms the conventional scheme and narrows the gap between conventional differential detection and coherent detection. A low complexity detection scheme for conventional DFD-SM was also introduced. The computational complexity of the optimal detector and the proposed detector were presented, with the proposed scheme providing approximately a 55% and 52% complexity reduction for one and two receive antennas, respectively. Numerical simulations show that the proposed scheme provides near-ML performance throughout the entire SNR range.