Uplink Non-Orthogonal Multiple Access With Golden Codeword Constellation

Non-orthogonal multiple access (NOMA) is a technique to improve spectral efficiency. In uplink NOMA (UL-NOMA) systems, multiple mobile users are globally synchronized to share the same time and frequency resources, and transmit their own independent symbols to the base station (BS). This paper proposes a UL-NOMA system with Golden codeword constellation (GCC). In the proposed UL-NOMA system, two users, the center user and the edge user, transmit their own independent Golden codewords to the BS. Compared to conventional UL-NOMA systems, the proposed UL-NOMA system not only preserves the spectral efficiency, but also improves error performance. The fast essentially maximum likelihood (FE-ML) detection with dynamic signal detection subset (DSDS) is proposed to decode the Golden codewords. A lower bound on error performance for both the center user and the edge user is further derived. Simulation results show that the derived lower bound predicts the error performance of UL-NOMA with GCC well. Simulation results also show that the proposed UL-NOMA system outperforms the conventional UL-NOMA system by at least 2 dB in signal-to-noise ratio (SNR) for both the center user and the edge user at a bit error rate of $2 \times 10^{-5}$. Finally, both complexity analysis and simulation results show that the proposed FE-ML with DSDS results in a 68% complexity reduction compared to the FE-ML with SDS at an SNR of 23 dB for the center user transmitting 64QAM and the edge user transmitting 16QAM.


I. INTRODUCTION
One of the challenges in the 5th and beyond-5th generation wireless communication systems is the requirement for very high data transmission rates [1], [2]. One technique to improve the data transmission rate is multiplexing. In general, multiplexing may be implemented in orthogonal multiple access (OMA) wireless communication systems, in which multiple users are assigned different radio resources, for example, frequency or time. However, radio resources are becoming scarcer. Recently, extensive research has focused on another technique to improve the data transmission rate, non-orthogonal multiple access (NOMA). The NOMA technique can be employed in both downlink and uplink wireless communication systems, which are denoted as downlink NOMA (DL-NOMA) systems and uplink NOMA (UL-NOMA) systems, respectively. In DL-NOMA systems, all mobile users share the same radio resources. Symbols to be transmitted to different users are multiplexed by superposition coding at the base station (BS). At the receiver, each user detects its own signal either by treating the other users' signals as interference or by using successive interference cancellation (SIC) [3]. In UL-NOMA systems, all users share the same time-frequency resources to transmit their signals to the BS. At the BS receiver, either SIC or joint maximum likelihood (ML) detection is employed to detect the different users' signals [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Rongbo Zhu.
Since NOMA provides high spectral efficiency, it has recently become a hot topic in wireless communications research. As in conventional wireless communication systems, reducing signal detection complexity and improving error performance for different users are two of the research areas in NOMA systems. In this paper, we are mainly interested in these two areas for UL-NOMA systems.
Diversity is one of the techniques to improve error performance in wireless communication systems. In the literature, diversity gain is achieved in cooperative UL-NOMA systems. For example, a relay-aided UL-NOMA system is proposed in [5], in which an Alamouti codeword matrix of the transmitted symbols is generated at the receiver, achieving a diversity gain of two. In [6], the space-time line code has been applied to UL-NOMA to achieve spatial diversity without channel state information (CSI) at the receiver. However, it is assumed that the CSI is known at the transmitter, and spatial diversity is achieved through precoding at the transmitter. Spatial diversity can also be achieved with the Golden code, which is a linear dispersion space-time block code (LD-STBC) with two transmit antennas [7], [8]. The Golden code is a full-rate and full-diversity STBC, where the rate is defined as the number of transmitted symbols per antenna use per transmission time slot. In the Golden code system, the transmitted symbols are Golden codewords, not the conventional $M$-ary quadrature amplitude modulation ($M$QAM) symbols or $M$-ary phase shift keying ($M$PSK) symbols. Golden codewords have been applied in single-input multiple-output (SIMO) systems to improve error performance [9]. In this paper, we apply the Golden codewords to UL-NOMA systems to improve error performance. The proposed system is named UL-NOMA with Golden codeword constellation (GCC). The proposed UL-NOMA system with GCC has the same spectral efficiency as the conventional UL-NOMA system.
We only take into account two mobile users in the proposed UL-NOMA system, but the two-user UL-NOMA system can be directly extended to a multiuser UL-NOMA system. The two mobile users are the center user and the edge user, which are denoted as User c and User e. Both User c and User e are globally synchronized to transmit their own symbols to the BS. Each user in the proposed UL-NOMA system transmits one pair of Golden codewords in two transmit time slots. It is assumed that the modulation orders of User c and User e may be different. Let $M_c$ and $M_e$ denote the modulation orders of User c and User e, respectively. Since the two pairs of transmitted Golden codewords contain four complex-valued symbols, the ML detection complexity at the BS is proportional to $O(M_c^2 M_e^2)$, which is extremely high. Hence, reduced complexity detection algorithms are required for applications of the proposed UL-NOMA with GCC. At the BS, the received signal model for the proposed UL-NOMA system is the same as that of the conventional Golden code system in fast fading channels. Here, fast fading means that each channel coefficient only lasts one time slot and takes another independent complex value in the next time slot. Hence, all signal detection algorithms for the conventional Golden code can be applied to detect the signals for the proposed UL-NOMA with GCC. Reduced complexity detection algorithms for the conventional Golden code have been studied in the literature. There are two main types of detection algorithms for the Golden code, the fast essentially ML (FE-ML) and sphere decoding (SD). The FE-ML detection algorithm was proposed in [10]. Recently, two new reduced complexity detection algorithms have also been proposed in [11]. These two detection algorithms are the FE-ML with signal detection subset (SDS) and the SD with SDS. Let $\bar{L}_c$ and $\bar{L}_e$ be the average cardinalities of the SDSs for User c and User e, respectively.
If the FE-ML with SDS is applied in the proposed UL-NOMA system, its detection complexity is $O(2 \times \bar{L}_c \bar{L}_e)$, with $\bar{L}_c < M_c$ and $\bar{L}_e < M_e$, which is much smaller than the ML detection complexity $O(M_c^2 M_e^2)$. The detection complexity of the SD with SDS is also greatly reduced compared to the conventional SD. Very recently, the SD with sorted SDS (SSDS) has been proposed [12]. The detection complexity of the SD with SSDS is even smaller than that of the SD with SDS. In this paper, we only investigate the FE-ML with SDS to reduce the detection complexity for the proposed UL-NOMA system.
The detection complexity of the FE-ML with SDS mainly depends on the average cardinality of the SDS. The construction of the SDS is based on the estimated complex signal. In order to further reduce the detection complexity, an FE-ML with dynamic SDS (DSDS) is proposed in this paper. The DSDS is constructed based on the received in-phase and quadrature components, not the estimated complex signal. In contrast to the SDS in [11], the average cardinality of the DSDS decreases as the signal-to-noise ratio (SNR) increases. Thus, the detection complexity of the proposed FE-ML with DSDS is further reduced as the SNR increases.
The error performance of UL-NOMA systems has been analyzed in the literature. For example, the exact closed-form bit error probability (BEP) expression for a UL-NOMA system with QPSK in the presence of additive white Gaussian noise (AWGN) was derived in [13]. The error performance of the UL-NOMA system with a single receive antenna at the BS was studied for the fading channel in [3]. An upper bound on the error performance for the UL-NOMA with joint ML detection was derived in [4].
Since the received signal model for the proposed UL-NOMA system is identical to that of the conventional Golden code system, the error performance analysis of the Golden code may also be used to analyze the error performance of the proposed UL-NOMA system. The lower bound on the error performance for the conventional Golden code has been derived in [11]. Thus, in this paper, the lower bound on the error performance will be adopted to analyze the error performance for the proposed UL-NOMA system with GCC.
Based on the above discussion, the main contributions of this paper are summarized as:
• The UL-NOMA system with GCC is proposed. Compared to the conventional UL-NOMA system, the proposed UL-NOMA system not only preserves the spectral efficiency, but also improves the error performance.
• The theoretical lower bound on the average BEP (ABEP) of each user is derived for the proposed UL-NOMA system.
• The FE-ML with DSDS is proposed to further reduce the detection complexity.
VOLUME 9, 2021
The remainder of this paper is organized as follows: in Section II, the system model of the proposed UL-NOMA with GCC is presented. In Section III, the theoretical lower bound on the ABEP of each user is formulated for the proposed UL-NOMA system with GCC. The FE-ML with DSDS is presented in Section IV. Simulation results are demonstrated in Section V. Finally, the paper is concluded in Section VI.
Notation: Bold lowercase and uppercase letters are used for vectors and matrices, respectively. $[\cdot]^T$, $(\cdot)^H$, $|\cdot|$ and $\|\cdot\|_F$ represent the transpose, Hermitian, Euclidean norm and Frobenius norm operations, respectively. $D(\cdot)$ is the constellation demodulator function. $(\cdot)^{-1}$ is the inverse. $E\{\cdot\}$ is the expectation operation. $j$ is the imaginary unit with $j^2 = -1$.

II. SYSTEM MODEL
The key component of the proposed UL-NOMA system is the Golden codeword. In this section, we briefly present the concepts of the Golden code and the Golden codeword, and then describe the proposed UL-NOMA system in detail.

A. THE GOLDEN CODE
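The Golden code is an LD-STBC with two transmit antennas and two or more receive antennas [7]. The Golden code achieves full rate and full diversity. Its encoder takes four complex-valued symbols and generates four super-symbols. The Golden code transmission matrix is given by [7]:
$$X = \frac{1}{\sqrt{5}} \begin{bmatrix} \alpha(s_1 + s_2\theta) & \alpha(s_3 + s_4\theta) \\ j\bar{\alpha}(s_3 + s_4\bar{\theta}) & \bar{\alpha}(s_1 + s_2\bar{\theta}) \end{bmatrix} \tag{1}$$
where $\theta = \frac{1+\sqrt{5}}{2}$, $\bar{\theta} = 1 - \theta$, $\alpha = 1 + j\bar{\theta}$ and $\bar{\alpha} = 1 + j\theta$. In (1), there are four super-symbols, $\frac{1}{\sqrt{5}}\alpha(s_1 + s_2\theta)$, $\frac{1}{\sqrt{5}}\alpha(s_3 + s_4\theta)$, $\frac{1}{\sqrt{5}}j\bar{\alpha}(s_3 + s_4\bar{\theta})$ and $\frac{1}{\sqrt{5}}\bar{\alpha}(s_1 + s_2\bar{\theta})$. In this paper, we refer to these super-symbols as the Golden codewords. There are two pairs of super-symbols, $\{\frac{1}{\sqrt{5}}\alpha(s_1 + s_2\theta), \frac{1}{\sqrt{5}}\bar{\alpha}(s_1 + s_2\bar{\theta})\}$ and $\{\frac{1}{\sqrt{5}}\alpha(s_3 + s_4\theta), \frac{1}{\sqrt{5}}j\bar{\alpha}(s_3 + s_4\bar{\theta})\}$, in the conventional Golden code. In this paper, only the pair of super-symbols $\{\frac{1}{\sqrt{5}}\alpha(s_1 + s_2\theta), \frac{1}{\sqrt{5}}\bar{\alpha}(s_1 + s_2\bar{\theta})\}$ is applied in the proposed UL-NOMA system. Since only the amplitudes, not the phases, of $\alpha$ and $\bar{\alpha}$ affect the average transmit signal power, and further affect the error performance of the proposed system, we use $\frac{1}{\sqrt{5}}|\alpha|(s_1 + s_2\theta)$ and $\frac{1}{\sqrt{5}}|\bar{\alpha}|(s_1 + s_2\bar{\theta})$.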
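As an illustration of the Golden codeword pair, the following minimal Python sketch maps two symbols to one pair of codewords and numerically checks two properties used in this paper: the identities $|\alpha|\theta = |\bar{\alpha}|$ and $|\bar{\alpha}|\bar{\theta} = -|\alpha|$, and unit average codeword power for unit-power input symbols. The function names and the QPSK test constellation are assumptions made for this example only.

```python
import math

THETA = (1 + math.sqrt(5)) / 2      # golden ratio
THETA_BAR = 1 - THETA               # conjugate root of x^2 - x - 1 = 0
ALPHA = 1 + 1j * THETA_BAR
ALPHA_BAR = 1 + 1j * THETA


def golden_codeword_pair(s1, s2):
    """Map two symbols to one Golden codeword pair, using |alpha| and |alpha_bar|."""
    x1 = abs(ALPHA) * (s1 + s2 * THETA) / math.sqrt(5)
    x2 = abs(ALPHA_BAR) * (s1 + s2 * THETA_BAR) / math.sqrt(5)
    return x1, x2


def average_codeword_power(symbols):
    """Average |x|^2 of each codeword over all equiprobable symbol pairs."""
    pairs = [golden_codeword_pair(a, b) for a in symbols for b in symbols]
    p1 = sum(abs(x1) ** 2 for x1, _ in pairs) / len(pairs)
    p2 = sum(abs(x2) ** 2 for _, x2 in pairs) / len(pairs)
    return p1, p2


# Hypothetical unit-power QPSK set used only to exercise the functions.
QPSK = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]
```

Calling `average_codeword_power(QPSK)` returns two values that both equal 1 up to floating-point rounding, which is consistent with the unit-power assumption on the transmitted Golden codewords.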

B. SYSTEM MODEL OF THE UL-NOMA SYSTEM WITH GCC
In this paper, we consider a UL-NOMA system with two users, which has been documented in the literature [3]. The UL-NOMA system consists of one BS with $N_r$ receive antennas and two mobile users, User c and User e, in a cell. User c and User e are each equipped with a single transmit antenna. User c is at the center of the cell, while User e is at the edge of the cell. Consequently, User c has a larger average channel gain than User e.
User c and User e in the UL-NOMA system share the same time and frequency resources. It is assumed that both User c and User e are globally synchronized to transmit their individual symbols to the BS. Both User c and User e also have their individual transmit power constraints. The received signal vector $y_i \in \mathbb{C}^{N_r \times 1}$ at the BS in time slot $i$, $i \in [1:2]$, is given by:
$$y_i = \sqrt{P_c}\, h_{c,i}\, x_{c,i} + \sqrt{P_e}\, h_{e,i}\, x_{e,i} + n_i \tag{2}$$
where $h_{u,i} \in \mathbb{C}^{N_r \times 1}$ is the channel fading coefficient vector between the BS and User $u$, with $u \in [c, e]$, $P_u$ is the average transmit power for User $u$, and $n_i \in \mathbb{C}^{N_r \times 1}$ is the AWGN vector.
For convenience of the following discussion, let $y_i = [y_i^1\ y_i^2\ \cdots\ y_i^{N_r}]^T$, $h_{u,i} = [h_{u,i}^1\ h_{u,i}^2\ \cdots\ h_{u,i}^{N_r}]^T$ and $n_i = [n_i^1\ n_i^2\ \cdots\ n_i^{N_r}]^T$. The entries $h_{u,i}^k$ and $n_i^k$ of $h_{u,i}$ and $n_i$ are independent and identically distributed (i.i.d.) complex Gaussian random variables (RVs) distributed as $CN(0, \sigma_u^2)$ and $CN(0, \sigma_n^2)$, respectively. It is assumed that the channel coefficient $h_{u,i}^k$ is fast fading, which means that each channel coefficient $h_{u,i}^k$ only lasts one time slot and takes another independent value in the next time slot.
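Let $\Omega_{G,u}$ be the signal set of the Golden codewords for User $u$. In (2), $x_{c,i}$ and $x_{e,i}$ are the modulated Golden codewords to be transmitted to the BS, where $x_{c,i} \in \Omega_{G,c}$ and $x_{e,i} \in \Omega_{G,e}$. For User $u$, $x_{u,i}$ is given by:
$$x_{u,1} = \frac{1}{\sqrt{5}}|\alpha|(s_u^1 + s_u^2\theta), \tag{3.1}$$
$$x_{u,2} = \frac{1}{\sqrt{5}}|\bar{\alpha}|(s_u^1 + s_u^2\bar{\theta}). \tag{3.2}$$
In (3.1) and (3.2), $s_u^i \in \Omega_u$, where $\Omega_u$ is the signal set of QAM/PSK with modulation order $M_u$. In this paper, we only consider square QAM. In order to achieve reliable communications for User e, we consider $M_e \le M_c$ because User e experiences more severe channel conditions. In the system model, it is also assumed that $E\{|s_c^i|^2\} = E\{|s_e^i|^2\} = 1$, from which it follows that $E\{|x_{u,i}|^2\} = 1$. Similar to the discussion in [13], the instantaneous SNR of branch $k$ for User $u$ in (2) is defined as [...]. Substituting (3.1) and (3.2) into (2), the received signal given by (2) may be rewritten as:
$$y_1 = \frac{|\alpha|}{\sqrt{5}}\{\sqrt{P_c}\, h_{c,1}(s_c^1 + s_c^2\theta) + \sqrt{P_e}\, h_{e,1}(s_e^1 + s_e^2\theta)\} + n_1, \tag{4.1}$$
$$y_2 = \frac{|\bar{\alpha}|}{\sqrt{5}}\{\sqrt{P_c}\, h_{c,2}(s_c^1 + s_c^2\bar{\theta}) + \sqrt{P_e}\, h_{e,2}(s_e^1 + s_e^2\bar{\theta})\} + n_2. \tag{4.2}$$
Grouping the four transmitted symbols into two pairs $S_1$ and $S_2$, (4.1) and (4.2) may be further written as:
$$Y = G_1 S_1 + G_2 S_2 + N \tag{5}$$
where $G_1$ and $G_2$ are the equivalent channel matrices of the two symbol pairs and $N$ is the stacked noise vector. The received signal model given by (5) will be used to discuss the FE-ML with SDS in Section IV.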
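The received signals given by (4.1) and (4.2) may also be written in matrix form as:
$$Y = GS + N \tag{6}$$
where $Y \in \mathbb{C}^{2N_r \times 1}$ stacks $y_1$ and $y_2$, $G \in \mathbb{C}^{2N_r \times 4}$ is the equivalent channel matrix, $S = [s_1\ s_2\ s_3\ s_4]^T$ contains the four transmitted symbols of User c and User e, and $N \in \mathbb{C}^{2N_r \times 1}$ stacks $n_1$ and $n_2$. The received signal model given by (6) will be used to discuss the QR decomposition based signal detection in Section IV.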
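To make the fast-fading uplink model concrete, the sketch below generates one time slot of the received vector under the common superposition form $y_i = \sqrt{P_c} h_{c,i} x_{c,i} + \sqrt{P_e} h_{e,i} x_{e,i} + n_i$, which is an assumption of this example rather than a restatement of the authors' simulator. Channel and noise entries are drawn as i.i.d. complex Gaussian samples, and a fresh channel vector is drawn on every call, mimicking a coefficient that lasts only one time slot. All function and parameter names are hypothetical.

```python
import math
import random


def complex_gaussian(variance, rng):
    """Draw one CN(0, variance) sample (variance split equally over I and Q)."""
    sigma = math.sqrt(variance / 2)
    return complex(rng.gauss(0, sigma), rng.gauss(0, sigma))


def received_signal(x_c, x_e, p_c, p_e, var_c, var_e, var_n, n_r, rng):
    """Return (y, h_c, h_e) for one time slot at a BS with n_r receive antennas."""
    h_c = [complex_gaussian(var_c, rng) for _ in range(n_r)]
    h_e = [complex_gaussian(var_e, rng) for _ in range(n_r)]
    y = []
    for k in range(n_r):
        noise = complex_gaussian(var_n, rng)
        y.append(math.sqrt(p_c) * h_c[k] * x_c
                 + math.sqrt(p_e) * h_e[k] * x_e + noise)
    return y, h_c, h_e
```

Calling `received_signal` twice with the two codewords of each user gives the two received vectors that a detector would stack into a single observation.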

III. LOWER BOUND ABEP ANALYSIS OF THE UL-NOMA SYSTEM WITH GCC
The error performance of the UL-NOMA system with QPSK modulation has been discussed in [3], [4] and [13]. To the best of the authors' knowledge, the error performance analysis of the UL-NOMA system with high-order QAM and PSK modulations has not been reported in the open literature. The received signals in (2) show that two pairs of Golden codewords are transmitted during two time slots. Equivalently, the received signals in (2) contain four QAM symbols transmitted during two time slots. User c transmits two QAM symbols, $s_c^1$ and $s_c^2$, while User e transmits two QAM symbols, $s_e^1$ and $s_e^2$. The received signal model for the proposed UL-NOMA system is therefore identical to that of the conventional Golden code system. The ABEP for the conventional Golden code has been analyzed in [11]. Hence, in this section, the lower bound on the ABEP in [11] will be adopted to analyze the ABEP for the proposed UL-NOMA system with GCC.
For the error performance analysis, (2) is rewritten as:
$$y_1 = \sqrt{P_c}\, h_{c,1}\, x_{c,1} + \sqrt{P_e}\, h_{e,1}\, x_{e,1} + n_1, \tag{7.1}$$
$$y_2 = \sqrt{P_c}\, h_{c,2}\, x_{c,2} + \sqrt{P_e}\, h_{e,2}\, x_{e,2} + n_2. \tag{7.2}$$
Appendix B in [11] proves that the bounded pairwise error probability (PEP) of the Golden code may correspond to the assumption that at high SNR only one pair of Golden codewords is detected with errors, while the other pair of Golden codewords is detected correctly. This paper also uses Appendix B in [11] to derive the ABEP for the proposed UL-NOMA system with GCC.
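Suppose that the pair of Golden codewords $\{x_{c,1}, x_{c,2}\}$ is detected with errors, while the other pair of Golden codewords $\{x_{e,1}, x_{e,2}\}$ is detected correctly. Then, after removing the contribution of the correctly detected pair, (7.1) and (7.2) may be simplified as:
$$y_1 = \sqrt{P_c}\, h_{c,1}\, x_{c,1} + n_1, \qquad y_2 = \sqrt{P_c}\, h_{c,2}\, x_{c,2} + n_2.$$
Let $u \in [c, e]$. In general, the equivalent received signals to derive the lower bound on the ABEP for User $u$ can be written as:
$$y_1 = \sqrt{P_u}\, h_{u,1}\, x_{u,1} + n_1, \tag{11.1}$$
$$y_2 = \sqrt{P_u}\, h_{u,2}\, x_{u,2} + n_2. \tag{11.2}$$
Appendix B in [11] further proves that the bounded PEP may correspond to the assumption that at high SNR only one QAM symbol in a Golden codeword pair is detected with errors, while the other QAM symbol in the Golden codeword pair is detected correctly.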
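Suppose $s_u^1$ is detected with errors, while $s_u^2$ is detected correctly. Then (11.1) and (11.2) may be simplified as:
$$\begin{aligned} y_1 &= \sigma_1\sqrt{P_u}\, h_{u,1}\, s_u^1 + n_1, \\ y_2 &= \sigma_2\sqrt{P_u}\, h_{u,2}\, s_u^1 + n_2, \end{aligned} \tag{12}$$
where $\sigma_1 = |\alpha|/\sqrt{5}$ and $\sigma_2 = |\bar{\alpha}|/\sqrt{5}$, so that $\sigma_1^2 + \sigma_2^2 = 1$. Similarly, suppose $s_u^2$ is detected with errors, while $s_u^1$ is detected correctly. Taking into account $|\alpha|\theta = |\bar{\alpha}|$ and $|\bar{\alpha}|\bar{\theta} = -|\alpha|$, we have:
$$\begin{aligned} y_1 &= \sigma_2\sqrt{P_u}\, h_{u,1}\, s_u^2 + n_1, \\ y_2 &= -\sigma_1\sqrt{P_u}\, h_{u,2}\, s_u^2 + n_2. \end{aligned} \tag{13}$$
The equivalent models of error performance analysis in either (12) or (13) can be regarded as the transmission of either $s_u^1$ or $s_u^2$ over non-identical fading channels with fading variances $\sigma_1^2\sigma_u^2$ and $\sigma_2^2\sigma_u^2$, respectively. Hence, the maximal ratio combining (MRC) technique with non-identical fading channels [14] can be used to derive the error performance of the above equivalent model.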
Based on the exact symbol error probability of $M$QAM in (8.10) in [14], and the approximation of the Gaussian Q-function using the trapezoidal rule, the lower bound on the ABEP of User $u$ in the proposed UL-NOMA system may be derived as (14), where $n \ge 10$ is the number of partitioning intervals in the numerical integration, $\bar{\gamma} = P_u$ [...] and $a = 1 - 1$ [...].
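The role of the trapezoidal rule can be illustrated on the Gaussian Q-function itself. The sketch below is a generic illustration and not a reproduction of the bound in (14): it assumes Craig's integral form $Q(x) = \frac{1}{\pi}\int_0^{\pi/2} \exp\left(-\frac{x^2}{2\sin^2\phi}\right) d\phi$ for $x > 0$, applies the trapezoidal rule with $n$ intervals, and compares the result with the closed form based on `math.erfc`. The function names are chosen for this example.

```python
import math


def q_trapezoid(x, n=10):
    """Trapezoidal approximation of Q(x), x > 0, using Craig's integral form.

    The integration range [0, pi/2] is split into n equal intervals.
    The integrand tends to 0 at phi = 0, so that endpoint contributes nothing.
    """
    total = 0.5 * math.exp(-x * x / 2)            # endpoint phi = pi/2, weight 1/2
    for k in range(1, n):                         # interior nodes phi_k = k*pi/(2n)
        sin_sq = math.sin(k * math.pi / (2 * n)) ** 2
        total += math.exp(-x * x / (2 * sin_sq))
    return total / (2 * n)                        # (1/pi) * step size pi/(2n)


def q_exact(x):
    """Reference value Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))
```

Because the integrand is very smooth, a small number of intervals already tracks `q_exact` closely for moderate arguments, which is in line with the requirement $n \ge 10$ stated above.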

IV. LOW COMPLEXITY DETECTION ALGORITHMS FOR THE UL-NOMA SYSTEM WITH GCC
The optimal detection for the UL-NOMA with GCC is the ML detection. However, the complexity of the ML detection is proportional to $O(M_c^2 M_e^2)$. One of the detection schemes with near-ML error performance is the FE-ML with SDS [11], in which the SDS is constructed based on the nearest neighbors of the estimated complex signal. In order to further reduce the detection complexity of the FE-ML with SDS, a new approach is proposed to construct an SDS with small cardinality. This SDS is constructed based on the in-phase and quadrature components of the received complex signal, and is therefore dynamic. In this section, the QR decomposition based signal detection is presented first. The FE-ML with SDS is then described. Finally, the proposed approach to construct the SDS with small cardinality is explained.

A. QR DECOMPOSITION BASED SIGNAL DETECTION
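The QR decomposition based signal detection has been documented in the literature and is presented below. In the detection of the Golden codewords transmitted by User c and User e, we assume that the CSI is fully known at the BS. Based on the complex QR decomposition of $G$ in (6), we have:
$$G = QR \tag{15}$$
where $Q \in \mathbb{C}^{2N_r \times 2N_r}$ is a unitary matrix and $R \in \mathbb{C}^{2N_r \times 4}$, $R = [R_1^T\ R_2^T]^T$, where $R_2$ is a zero matrix of dimension $(2N_r - 4) \times 4$ and $R_1$ is a $4 \times 4$ upper-triangular matrix with non-negative real diagonal elements, which is given by:
$$R_1 = \begin{bmatrix} r_{1,1} & r_{1,2} & r_{1,3} & r_{1,4} \\ 0 & r_{2,2} & r_{2,3} & r_{2,4} \\ 0 & 0 & r_{3,3} & r_{3,4} \\ 0 & 0 & 0 & r_{4,4} \end{bmatrix} \tag{16}$$
Substituting (15) and (16) into (6), and multiplying both sides by $Q^H$, we have:
$$Z = Q^H Y = RS + Q^H N \tag{17}$$
Let $z_k$ denote the $k$th entry of $Z$. From the fourth row of (17), we have:
$$v_4 = \frac{z_4}{r_{4,4}} \tag{18}$$
where $v_4$ is equivalent to the noise corrupted $s_4$. The estimation of $s_4$ is given by:
$$\hat{s}_4 = D(v_4) \tag{19}$$
In general, we have:
$$v_k = \frac{1}{r_{k,k}}\left(z_k - \sum_{l=k+1}^{4} r_{k,l}\,\hat{s}_l\right), \quad k \in [1:3] \tag{20}$$
Again, $v_k$ is equivalent to the noise corrupted $s_k$. The estimation of $s_k$, $k \in [1:3]$, is given by:
$$\hat{s}_k = D(v_k) \tag{21}$$
The detection complexity of the above QR decomposition based signal detection is low. However, its error performance is poor. In the next subsection, the estimated $\hat{s}_k$ will be used to construct the SDS for the FE-ML with SDS scheme.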
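The back-substitution idea behind the QR decomposition based detection can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a thin QR factorization computed by modified Gram-Schmidt (rather than Householder transformations with a full unitary $Q$), which yields the same upper-triangular block with positive real diagonal when the matrix has full column rank, and it models the demodulator $D(\cdot)$ as a nearest-point search. All function names are hypothetical.

```python
import math
import random


def thin_qr(G):
    """Modified Gram-Schmidt QR of a tall complex matrix (list of rows).

    Returns Q as a list of orthonormal columns and R as an n x n upper-triangular
    matrix with positive real diagonal (G is assumed to have full column rank).
    """
    m, n = len(G), len(G[0])
    Q, R = [], [[0j] * n for _ in range(n)]
    for j in range(n):
        v = [G[i][j] for i in range(m)]
        for i, q in enumerate(Q):
            R[i][j] = sum(qk.conjugate() * vk for qk, vk in zip(q, v))
            v = [vk - R[i][j] * qk for qk, vk in zip(q, v)]
        norm = math.sqrt(sum(abs(vk) ** 2 for vk in v))
        R[j][j] = norm
        Q.append([vk / norm for vk in v])
    return Q, R


def nearest(v, constellation):
    """Stand-in for the demodulator D(.): closest constellation point to v."""
    return min(constellation, key=lambda s: abs(v - s))


def qr_detect(G, y, constellations):
    """Detect symbols from the last to the first by back-substitution."""
    Q, R = thin_qr(G)
    n = len(R)
    z = [sum(qk.conjugate() * yk for qk, yk in zip(q, y)) for q in Q]  # Q^H y
    s_hat, v = [0j] * n, [0j] * n
    for k in range(n - 1, -1, -1):
        interference = sum(R[k][l] * s_hat[l] for l in range(k + 1, n))
        v[k] = (z[k] - interference) / R[k][k]     # noise-corrupted s_k
        s_hat[k] = nearest(v[k], constellations[k])
    return s_hat, v


def random_matrix(rows, cols, rng):
    """Hypothetical helper: matrix of i.i.d. complex Gaussian entries."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]
```

The returned `v` values are the soft quantities that a subset-based detector can inspect, while `s_hat` holds the hard decisions. Passing one constellation per symbol position allows the two users to use different modulation orders.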

B. THE FE-ML WITH SDS
The FE-ML with SDS was introduced in [11] and is briefly described below.
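The FE-ML with SDS is based on the received signal model given by (5). Ignoring the noise $N$ in (5), the pair $S_1$ of symbols can be estimated, given the pair $S_2$ of symbols:
$$\hat{S}_1 = D\left((G_1^H G_1)^{-1} G_1^H (Y - G_2 S_2)\right) \tag{22}$$
where $G_1$ and $G_2$ are the equivalent channel matrices multiplying $S_1$ and $S_2$ in (5). Alternatively, the pair $S_2$ of symbols can be estimated, given the pair $S_1$ of symbols:
$$\hat{S}_2 = D\left((G_2^H G_2)^{-1} G_2^H (Y - G_1 S_1)\right) \tag{23}$$
The complexity of (22) or (23) can be reduced by restricting the given pair to the SDSs of the symbols estimated in (19) and (21). The SDS has been defined in [11], and is given in the following Definition 1.
Definition 1: Given an $i$th symbol $s_i$, the $i$th SDS is defined as $\Omega(s_i, \delta) = \{s_j : |s_j - s_i|^2 \le \delta,\ j \in [1:M_u]\}$.
For $M_u \ge 16$, the detection complexity in (22) or (23) can be greatly reduced if the SDS is used to replace the entire signal set.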
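The FE-ML with SDS is summarized as follows:
Initialization: Construct an SDS for each symbol $s_i$, $i \in [1:M_u]$, given $\delta$.
Step 1: Perform the QR decomposition of $G$ in (15) and compute $Z = Q^H Y$ in (17).
Step 2: Estimate $\hat{s}_i$, $i \in [1:4]$, using (18) to (21).
Step 3: Using the SDSs of the estimated symbols as the candidate sets, compute (24.1), (24.2) and (25.1), (25.2).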
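The initialization step can be illustrated by building the SDS of Definition 1 for every point of a square QAM constellation. The sketch below assumes an unnormalized integer grid (levels $\pm 1, \pm 3, \ldots$, so that the minimum squared distance is 4) and a threshold $\delta = 4$, so that each subset contains the symbol itself and its nearest neighbors. These scaling choices and the function names are assumptions made for this example.

```python
import math


def square_qam(m):
    """Unnormalized square M-QAM grid with odd-integer levels (spacing 2)."""
    side = int(round(math.sqrt(m)))
    levels = [2 * i - (side - 1) for i in range(side)]
    return [complex(a, b) for a in levels for b in levels]


def build_sds(constellation, delta):
    """Map each symbol s_i to its subset {s_j : |s_j - s_i|^2 <= delta}."""
    table = {}
    for s_i in constellation:
        table[s_i] = [
            s_j for s_j in constellation
            if (s_j.real - s_i.real) ** 2 + (s_j.imag - s_i.imag) ** 2 <= delta
        ]
    return table


def average_cardinality(table):
    """Average subset size over all symbols of the constellation."""
    return sum(len(subset) for subset in table.values()) / len(table)
```

With these assumptions, `average_cardinality(build_sds(square_qam(m), 4))` evaluates to 3, 4 and 4.5 for `m` equal to 4, 16 and 64, since corner, edge and interior points have 3, 4 and 5 members, respectively.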

C. THE FE-ML WITH DSDS
In the above FE-ML with SDS, the SDS is constructed based on the estimated symbols $\hat{s}_i$, $i \in [1:4]$. Based on Definition 1, the SDS is constant for a given $\delta$. In this subsection, we will present the method for constructing a DSDS and then present the FE-ML with DSDS.
The SDS can also be constructed based on the real and imaginary parts of the received $v_k$, $k \in [1:4]$, given in (18) and (20). As discussed in the QR decomposition based signal detection in Section IV.A, $v_k$ is equivalent to the noise corrupted $s_k$, which is given by:
$$v_k = s_k + w_k \tag{26}$$
where $w_k$ is the equivalent noise term.
Let $v_k = v_k^I + jv_k^Q$, $s_k = s_k^I + js_k^Q$ and $w_k = w_k^I + jw_k^Q$. Based on (26), we have:
$$v_k^p = s_k^p + w_k^p \tag{27}$$
where $p \in [I, Q]$. In (27), the probability that $s_k^p - \varepsilon \le v_k^p < s_k^p + \varepsilon$, where $\varepsilon > 0$, becomes larger at high SNR. We can therefore partition the received signal range into more segments to construct in-phase and quadrature detection subsets, which reduces the cardinality of the complex SDS. Hence, the detection complexity of the FE-ML with DSDS will be further reduced. Now we use 4 amplitude shift keying (4ASK) modulation as an example to explain the construction of the in-phase and the quadrature detection subsets. The signal partitions for the received 4ASK signals are shown in FIGURE 1. FIGURE 1 shows that the received 4ASK signal range is partitioned into 7 segments. Based on the 7 segments, the in-phase and quadrature SDSs, $\Omega^I_{4ASK}(v_k^I, \varepsilon)$ and $\Omega^Q_{4ASK}(v_k^Q, \varepsilon)$, are obtained. Finally, based on $\Omega^I_{4ASK}(v_k^I, \varepsilon)$ and $\Omega^Q_{4ASK}(v_k^Q, \varepsilon)$, it is easy to construct the complex SDS $\Omega_{16}(v_k, \delta)$.
Clearly, the SDS proposed in this section is dynamic. Lastly, the steps of the FE-ML with DSDS are exactly the same as those of the FE-ML with SDS, except that a different SDS is used. The SDS in the FE-ML with SDS is based on the estimated signal, while the SDS in the FE-ML with DSDS is based on the received signal.
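One plausible way to realize a 7-segment partition for 4ASK is sketched below. The exact segment boundaries of FIGURE 1 are not recoverable here, so the rule used is an assumption: a received component within $\varepsilon$ of a level keeps only that level, a component between two levels keeps both neighbors, and a component beyond the outermost level keeps only that level. For levels $\{-3, -1, 1, 3\}$ this gives exactly seven segments. The function names are hypothetical.

```python
def ask_subset(v, levels, eps):
    """Candidate ASK levels for one real received component v (assumed rule)."""
    levels = sorted(levels)
    for a in levels:
        if a - eps <= v < a + eps:        # close to a level: keep only that level
            return [a]
    if v < levels[0]:                     # beyond the lowest level
        return [levels[0]]
    if v > levels[-1]:                    # beyond the highest level
        return [levels[-1]]
    lower = max(a for a in levels if a < v)
    upper = min(a for a in levels if a > v)
    return [lower, upper]                 # between two levels: keep both neighbors


def dsds(v, levels, eps):
    """Complex dynamic subset as the product of the in-phase and quadrature subsets."""
    return [complex(i, q)
            for i in ask_subset(v.real, levels, eps)
            for q in ask_subset(v.imag, levels, eps)]
```

Under this rule the complex subset for 16QAM has 1, 2 or 4 members. As the SNR grows, the received components fall near a level more often, so the single-member case becomes more frequent, which matches the qualitative behavior described above.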

D. COMPLEXITY ANALYSIS OF THE FE-ML WITH DSDS
In this subsection, we analyze the detection complexity of the FE-ML with DSDS. The detection complexity is formulated in terms of floating point operations (flops) [15]. The FE-ML with DSDS algorithm includes three steps. We analyze the computational complexity for each step.

1) COMPUTATIONAL COMPLEXITY OF STEP 1
In Step 1, we need to perform $G = QR$, the QR decomposition in (15). The computational complexity of the QR decomposition has been shown in [15]. In (6), $G$ is a $2N_r \times 4$ matrix. The QR decomposition is done by applying the Householder unitary transformations, which requires $2 \times 4^2 (2N_r - \frac{4}{3})$ complex operations. In Step 1, we also need to compute $Z = Q^H Y$ in (17). $Q$ is a $2N_r \times 2N_r$ matrix and $Y$ is of dimension $2N_r \times 1$. This requires $2N_r \times (2N_r)^2$ multiplications and $2N_r \times (2N_r) \times (2N_r - 1)$ additions. The overall number of flops for Step 1 follows by combining these counts.

2) COMPUTATIONAL COMPLEXITY OF STEP 2
Since the constellation demodulation function represents a one-to-one mapping, we ignore the computational complexity of the constellation demodulation function in this paper. Estimations of $\hat{s}_1$, $\hat{s}_2$, $\hat{s}_3$ and $\hat{s}_4$ cost 1, 3, 5 and 7 flops, respectively. The overall number of flops for estimating $s_i$, $i \in [1:4]$, is $1 + 3 + 5 + 7 = 16$.

3) COMPUTATIONAL COMPLEXITY OF STEP 3
In this step, we need to compute (24.1), (24.2) and (25.1), (25.2). As before, we ignore the computational complexity of the constellation demodulation function. Since the computation of (24.1) and (24.2) is identical to the computation of (25.1) and (25.2), we only need to analyze (24.1) and (24.2). The computational complexities of the matrix terms have been analyzed in Table 1 and Table 2 in [11], from which the number of flops of (24.1) is obtained. Define $\bar{L} = \bar{L}_c \bar{L}_e$, where $\bar{L}_c$ and $\bar{L}_e$ are the average cardinalities of User c's SDS and User e's SDS, respectively. The number of flops of (24.2) is $(20N_r - 1)\bar{L}$. The overall number of flops of Step 3 accounts for both (24.1) and (24.2). If the SDS only contains the nearest neighbors, the average cardinality $\bar{L}_c$ or $\bar{L}_e$ is 3, 4 and 4.5 for 4QAM, 16QAM and 64QAM, respectively. However, the proposed new SDS is dynamic. Given a modulation order, the average $\bar{L}$ becomes smaller as the SNR increases. We will find the average $\bar{L}$ through simulation.
Finally, the overall number of complex operations per detected symbol imposed by the FE-ML with SDS or DSDS is obtained from the flops of Steps 1 to 3. [...] for the (c:64QAM, e:16QAM) UL-NOMA system in the FE-ML with DSDS, which are shown in FIGUREs 2 and 3.
From FIGUREs 4 and 5, it is evident that the detection complexity of the FE-ML with SDS is almost constant, but the detection complexity of the FE-ML with DSDS decreases as $\text{SNR}_c$ increases. This is because $\bar{L}$ decreases as $\text{SNR}_c$ increases.
We define the percentage of complexity reduction for the FE-ML with DSDS compared to the FE-ML with SDS [11] as:
$$\beta = \frac{\mu^{SDS}_{FE-ML} - \mu^{DSDS}_{FE-ML}}{\mu^{SDS}_{FE-ML}} \times 100\%$$
where $\mu^{SDS}_{FE-ML}$ and $\mu^{DSDS}_{FE-ML}$ are the overall numbers of complex operations for the FE-ML with SDS and DSDS, respectively.
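The percentage of complexity reduction is a relative difference, which the following short sketch evaluates. It assumes the usual form $100 \times (\mu^{SDS} - \mu^{DSDS}) / \mu^{SDS}$. The function name and the operation counts in the usage note are hypothetical and are not taken from the simulation results.

```python
def complexity_reduction(mu_sds, mu_dsds):
    """Percentage reduction of the DSDS operation count relative to the SDS count."""
    if mu_sds <= 0:
        raise ValueError("mu_sds must be positive")
    return 100.0 * (mu_sds - mu_dsds) / mu_sds
```

For example, with made-up counts of 1000 operations for the SDS and 320 for the DSDS, `complexity_reduction(1000, 320)` gives a 68% reduction.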
From FIGUREs 4 and 5, it is found that $\beta = 55\%$ for the (c:16QAM, e:4QAM) UL-NOMA at 16 dB and $\beta = 68\%$ for the (c:64QAM, e:16QAM) UL-NOMA at 23 dB.
From FIGUREs 6 to 9, it is easily seen that the proposed UL-NOMA system with GCC achieves diversity compared to the conventional UL-NOMA system. At a BER of $2 \times 10^{-5}$, both User c and User e in the proposed UL-NOMA achieve at least 2 dB gain compared to the conventional UL-NOMA system. The proposed FE-ML with DSDS matches the error performance of the FE-ML with SDS, but its detection complexity is much lower. Finally, it is also observed that the error performance lower bounds based on (14) predict the simulation results very well.

VI. CONCLUSION
In order to improve the error performance of the UL-NOMA system, the UL-NOMA with GCC was proposed in this paper. The proposed UL-NOMA with GCC not only preserves the spectral efficiency, but also improves error performance because it achieves diversity. Simulation results showed that both User c and User e in the proposed UL-NOMA with DSDS achieve at least 2 dB gain at a BER of $2 \times 10^{-5}$. Since the received signal model at the BS is identical to the received signal model of the conventional Golden code, the FE-ML with SDS was also investigated. Consequently, a new FE-ML with DSDS was proposed, whose detection complexity decreases as the SNR increases. For example, in the proposed (c:16QAM, e:4QAM) UL-NOMA system with GCC, the proposed FE-ML with DSDS and four receive antennas results in a 55% complexity reduction compared to the FE-ML with SDS at 16 dB.