M-Ary QAM Asynchronous-NOMA D2D Network With Cyclic Triangular-SIC Decoding Scheme

The complexity of successive interference cancellation at the receiver is a challenging issue in conventional non-orthogonal multiple access-assisted massive wireless networks. The computational complexity of decoding increases exponentially with the number of users. Further, under realistic channel conditions, a synchronous non-orthogonal multiple access scheme is impractical for uplink device-to-device communications. In this paper, an asynchronous non-orthogonal multiple access-based cyclic triangular successive interference cancellation scheme is proposed for a massive device-to-device network. The proposed scheme reduces the decoding complexity, energy consumption, and bit error rate of a superimposed signal received in an outband device-to-device network. More specifically, the scheme consists of three consecutive stages: optimization, decoding, and re-transmission. In the optimization stage, a dual Lagrangian objective function is defined to maximize the number of data symbols decoded at the receiver by determining an optimal interference cancellation triangle under co-channel interference and data rate constraints. In the decoding stage, the data in the optimal interference cancellation triangle are decoded using the conventional triangular successive interference cancellation technique. Next, the remaining users' data are decoded in subsequent iterations of the proposed scheme using the retransmissions from those users. Exploiting the successive interference cancellation characteristics, the performance of the proposed device-to-device network is evaluated in terms of energy efficiency, bit error rate, computational complexity, and decoding delay. Moreover, the proposed decoding scheme is compared with the conventional triangular successive interference cancellation decoding scheme to demonstrate its superiority.


I. INTRODUCTION
The forthcoming beyond-5G/6G network is expected to provide reliable, error-free, and energy-efficient services with ubiquitous connectivity [1], [2], [3]. At the same time, the growing use of new wireless devices is increasing mobile traffic a thousandfold compared with existing networking systems [4]. In this context, widely used 5G Orthogonal Multiple Access (OMA) schemes such as Orthogonal Frequency Division Multiple Access (OFDMA) [5] and Code Division Multiple Access (CDMA) [6] can support a large number of devices by using the existing network resources effectively. In OFDMA, the channel bandwidth is divided among orthogonal subcarriers so that the interference between subcarriers is minimized. Since the number of subcarriers within the bandwidth is limited, OFDMA suffers from spectrum limitations in public networks.
The associate editor coordinating the review of this manuscript and approving it for publication was Xiaofan He.
The Non-Orthogonal Multiple Access (NOMA) scheme has been included in 3GPP as a multiple access scheme for B5G/6G [7]. NOMA has the potential to enhance spectral efficiency by allowing multiple users to access the same subcarrier Resource Blocks (RBs) simultaneously [8]. The fundamental concept of NOMA is based on superposition coding and successive decoding: the signals of multiple users are superposition-coded over the same subcarrier with different power levels, which enables the receiver to decode them using the Successive Interference Cancellation (SIC) technique [9], [10]. At the receiver, SIC decodes the users' data in descending order of Signal-to-Noise Ratio (SNR), starting from the user with the strongest channel condition and ending with the last user. Once the strongest signal is decoded directly, the detected data are passed to an iterative SIC algorithm. Next, the strongest signal is reconstructed using prior knowledge of the Channel State Information (CSI) and the modulation scheme used for transmission. Finally, the reconstructed signal is subtracted from the received superimposed symbol to remove its interference and improve the accuracy of decoding the remaining user signals. However, this SIC decoding procedure is applicable only to time-synchronous transmissions, which limits NOMA performance because it neglects the additional interference from overlapping symbols [7], [8], [9], [10], [11], [12], [13], [14].
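The synchronous SIC steps described above (decode the strongest user, reconstruct it from CSI and the known modulation, subtract, repeat) can be sketched as follows. This is a minimal, noise-free illustration that assumes QPSK (4-QAM) symbols, perfect CSI, and a single received sample; the function names and the 0.8/0.2 power split used in the usage note are hypothetical choices, not parameters of this paper.

```python
import math

# QPSK (4-QAM) constellation with unit average energy.
QPSK = [complex(i, q) / math.sqrt(2) for i in (-1, 1) for q in (-1, 1)]


def hard_decision(z: complex) -> complex:
    """Return the QPSK point closest to z (minimum Euclidean distance)."""
    return min(QPSK, key=lambda p: abs(z - p))


def sic_decode(y: complex, h: list, powers: list) -> dict:
    """Synchronous power-domain SIC on one received sample.

    Users are decoded in descending order of received power |h_k|^2 P_k.
    Returns a dict mapping user index -> decided symbol.
    """
    order = sorted(range(len(h)), key=lambda k: abs(h[k]) ** 2 * powers[k], reverse=True)
    residual = y
    decided = {}
    for k in order:
        amp = h[k] * math.sqrt(powers[k])
        # Equalize the strongest remaining user and slice to the constellation,
        # treating the not-yet-decoded users as interference.
        decided[k] = hard_decision(residual / amp)
        # Reconstruct that user's contribution from CSI and the known
        # modulation, then subtract it before decoding the next user.
        residual -= amp * decided[k]
    return decided
```

For example, with h = [1, 1] and powers [0.8, 0.2], `sic_decode(math.sqrt(0.8) * x0 + math.sqrt(0.2) * x1, [1, 1], [0.8, 0.2])` recovers both QPSK symbols, because the weaker user's interference never pushes the stronger user's equalized sample across a decision boundary. With asynchronous arrivals this one-sample view breaks down, which is the limitation noted above.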
In D2D networks, the users are geographically distributed and their signals propagate over different paths that experience distinct channel effects [15]. As a consequence, the signals arrive at the receiving terminal with varying time offsets, and hence time-synchronous data reception is not possible at the receiver of a D2D network [16]. In [17], an Asynchronous NOMA (A-NOMA) scheme has been proposed that accounts for the co-channel interference induced while decoding the desired user's data symbols at the receiver. Moreover, the studies in [11], [17], [18], [19], and [20] have shown that the A-NOMA scheme can improve the decoding ability of the receiver as well as the D2D network performance. These studies claim that the asynchrony between user signals can in fact enhance spectral efficiency. In [21], it has been proven that asynchrony in the transmission can enhance signal detection at the receiver in an uplink A-NOMA scheme, even under equal-power asynchronous transmissions. In [22], it has been shown that an optimal mismatch between user signals in uplink A-NOMA can enhance the throughput and energy efficiency. The authors in [18] have proposed and experimentally implemented an A-NOMA scheme for an uplink optical access scenario, which reduces the Bit Error Rate (BER) by one order of magnitude compared with synchronous NOMA schemes. Moreover, in [19], uplink A-NOMA with a sufficiently large data frame length has been shown to outperform synchronous NOMA in terms of sum throughput.
Beyond spectral efficiency and reliability, one of the main targets of future 6G D2D networks is to optimize their Energy Efficiency (EE) [1], [3], [23], [24], [25]. Massive Device-to-Device (D2D) networks are in demand today due to their key merits, namely i) an infrastructure-free network, ii) the ability to operate with limited resources, and iii) ubiquitous connectivity to end users. However, a massive NOMA-assisted D2D network conventionally suffers from high complexity during the decoding phase [26], [27]. The authors in [28] have shown that the decoding complexity of the NOMA scheme increases when the number of users exceeds three. Such an increase in complexity reduces the energy efficiency of D2D networks. The energy consumption at mobile devices can be higher, particularly in energy-limited D2D communication scenarios such as emergency disasters, since such devices cannot flexibly cope with energy-consuming computations [2], [29]. In addition, NOMA-based massive D2D networks face limited resources for allocation and a complex SIC decoding procedure. Hence, reliable and efficient A-NOMA for massive D2D networks remains understudied.
Recently, in [17], an iterative signal processing-based Triangular-SIC (T-SIC) technique has been proposed to enhance the spectral efficiency of A-NOMA uplink transmissions. Owing to the triangular pattern used for decoding symbols in T-SIC, the interference caused by multiple asynchronous data symbols on each of the desired user's symbols is taken into account, which enhances symbol detection and residual interference cancellation. Therefore, the T-SIC technique can also be exploited to optimize the decoding ability and spectral efficiency of an A-NOMA D2D network. To the best of the authors' knowledge, it has not yet been used in D2D networks. The overall objective of this paper is to investigate and optimize the performance of massive A-NOMA D2D transmissions under a modulation such as M-ary QAM. The four contributions of this work are summarized as follows:
• A novel binary optimization algorithm is proposed to decide the optimal combination of data symbols in the received superimposed signal to be decoded under maximum co-channel interference and minimum data rate constraints. The corresponding optimization problem is a hard-constrained problem due to its binary optimization variables and non-linear constraints. Hence, the binary variables and non-linear constraints are reformulated into continuous variables and linear constraints, and a Lagrangian dual objective function is formed, which can then be solved efficiently by applying the Lagrangian dual algorithm.
• A new Cyclic T-SIC scheme is proposed to ensure that each user's data are decoded in consecutive iterations, exploiting the retransmissions by users whose data were not decoded in a prior iteration of the optimization. The EE and BER performance of Cyclic T-SIC is shown to be superior to that of Conventional T-SIC (Conv T-SIC).
• The optimal trade-off between the BER and EE of the proposed Cyclic T-SIC is studied and compared with Conv T-SIC. Cyclic T-SIC achieves a higher EE and a lower BER at a lower transmit SNR, a lower received power ratio, and a higher asynchrony among user signals compared with the competing Conv T-SIC scheme.
• The computational complexity of the proposed Cyclic T-SIC over n iterations is lower than that of the Conv T-SIC scheme when the error tolerance for algorithm termination, ϵ, is on the order of 10^{-1}. Further, the total simulation delay is shown to be lower for Cyclic T-SIC than for Conv T-SIC owing to its reduced computational complexity.
The remainder of this paper is organized as follows. Section II presents the A-NOMA D2D system model, including the received signal structure. Section III reviews the existing Conv T-SIC scheme in Section III-A and presents the proposed Cyclic T-SIC scheme in Section III-B. Next, the performance analysis is given in Section IV, and the numerical results are analyzed in Section V. Finally, conclusions are drawn in Section VI.

II. SYSTEM MODEL
In this section, the D2D A-NOMA signal model and its preliminaries are presented. An A-NOMA-assisted D2D network is considered, as shown in Figure 1, with K geographically distributed transmitting users, indexed by k ∈ {1, …, K}, where K is the maximum number of transmitters, and one receiving terminal, R_X, within a small neighborhood. It is assumed that there are N subcarriers and that K ≥ 1 NOMA users share each subcarrier. The transmitted signals are assumed not to be aligned at the receiver, and hence the channel is symbol-asynchronous. Therefore, a power-domain A-NOMA scheme-assisted decoding method is applied in this work, where the waveform is based on Orthogonal Frequency Division Multiplexing (OFDM) [12], [16], [17], [19], [30], [31].
Moreover, due to the timing offset between users in A-NOMA, inter-carrier interference (ICI) can occur and distort the resulting OFDM frequency components. The ICI at a subcarrier is formulated as follows. Consider an OFDM signal at time t, modeled as
$$x(t) = \sum_{n=0}^{N-1} X[n]\, e^{j 2\pi f_n t}, \qquad 0 \le t \le T_N,$$
where f_n is the frequency of the n-th subcarrier and j denotes the imaginary unit. Moreover, T_N is the symbol time, and X[n] is the signal transmitted over the n-th subcarrier. Furthermore, the frequency offset due to asynchrony in the A-NOMA signal introduces a multiplicative time-varying distortion, represented as $\beta(t) = e^{j 2\pi \rho f t}$, where $\rho = \delta f / f$ is the normalized frequency offset. As a result, the ICI on the m-th subcarrier is modeled as in [17], where the index difference between the interfering and desired subcarriers gives the distance of the interfering subcarrier to the desired subcarrier. Meanwhile, in NOMA systems, the Multi-Access Interference (MAI) given in (4) is another interference that can distort the OFDM symbol of the desired user. In A-NOMA, the ICI is comparatively smaller than the MAI [17]. Hence, in this analysis, the MAI is considered the dominant source of interference, and a single subcarrier is analyzed.
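The ICI mechanism can be checked numerically with a small discrete-time sketch. It assumes that f in β(t) is the subcarrier spacing 1/T_N, so that at the t-th of N samples the distortion becomes exp(j2πρt/N); this is the standard frequency-offset model, not the exact ICI expression of [17]. Only subcarrier m carries a unit symbol, and the sketch measures how much power stays on m and how much leaks to the other subcarriers.

```python
import cmath
import math


def ofdm_ici_power(n_sub: int, rho: float, m: int = 0) -> tuple:
    """Load only subcarrier m with a unit symbol, apply a normalized frequency
    offset rho, and return (power kept on m, power leaked to other subcarriers)."""
    X = [0j] * n_sub
    X[m] = 1 + 0j
    # IDFT: x[t] = (1/N) * sum_n X[n] exp(j 2 pi n t / N)
    x = [sum(X[n] * cmath.exp(2j * math.pi * n * t / n_sub) for n in range(n_sub)) / n_sub
         for t in range(n_sub)]
    # Multiplicative distortion beta at sample t: exp(j 2 pi rho t / N)
    x = [x[t] * cmath.exp(2j * math.pi * rho * t / n_sub) for t in range(n_sub)]
    # DFT back to the subcarrier domain
    Y = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n_sub) for t in range(n_sub))
         for k in range(n_sub)]
    kept = abs(Y[m]) ** 2
    leaked = sum(abs(Y[k]) ** 2 for k in range(n_sub) if k != m)
    return kept, leaked


def kept_power_closed_form(n_sub: int, rho: float) -> float:
    """Standard result |S(0)|^2 = (sin(pi rho) / (N sin(pi rho / N)))^2."""
    if rho == 0:
        return 1.0
    return (math.sin(math.pi * rho) / (n_sub * math.sin(math.pi * rho / n_sub))) ** 2
```

Because |β(t)| = 1, the kept and leaked powers always sum to one; with ρ = 0 nothing leaks, while with N = 16 and ρ = 0.5 more than half of the power leaks into other subcarriers.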
The received signal at R_X for the k*-th user's s-th symbol is given by [17]
$$y_{k^*}[s] = h_{k^*}[s]\sqrt{P_{k^*}}\, X_{k^*}[s] + \eta_{k^*}[s] + n_0,$$
where X_{k*}[s] denotes the k*-th user's s-th data symbol, a complex symbol output by an M-ary QAM symbol mapper, and P_{k*} is the transmit power of X_{k*}[s], which is the same for all symbols of the k*-th user over the transmission duration. Hence, the signal transmitted by the k*-th user for the s-th symbol is √P_{k*} X_{k*}[s]. The frequency response on one subcarrier over one symbol period is considered flat and is assumed to follow independent and identically distributed (i.i.d.) Rayleigh fading [32], [33], [34]. The symbol time is assumed to be much shorter than the channel coherence time. Hence, h_{k*}[s] is constant for a block of symbols during a transmission period. Further, n_0 is the Additive White Gaussian Noise (AWGN) at the receiver with variance σ², and η_{k*}[s] is the total interference to the k*-th user's s-th symbol [17].
This total interference can be written as
$$\eta_{k^*}[s] = \sum_{i \neq k^*} \sum_{\varsigma} \Delta_{k^*,i}[\varsigma]\, e^{j\theta_{k^*,i}}\, h_i \sqrt{P_i}\, X_i[\varsigma],$$
where e^{jθ_{k*,i}} represents the phase mismatch of the i-th user's signal relative to the k*-th user's signal, and Δ_{k*,i}[ς] denotes the duration for which the i-th user's ς-th symbol overlaps with the desired symbol, as a percentage of the total symbol period T_sym. Research works such as [35], [36], and [37] have addressed the problem of estimating the time and carrier offsets, which allows Δ_{k*,i} and θ_{k*,i} to be estimated with high reliability in D2D communications using time-of-arrival measurements.
Following the standard SIC procedure, after subtracting the reconstructed interference η̃_{k*,i}[s, ς] of all overlapping interferers' symbols, the interference-cancelled signal for the desired symbol is expressed in terms of the desired signal, residual interference, and noise as [17]
$$y_{k^*}[s] - \sum_{i \neq k^*}\sum_{\varsigma} \tilde{\eta}_{k^*,i}[s,\varsigma] = h_{k^*}[s]\sqrt{P_{k^*}}\, X_{k^*}[s] + \Big(\eta_{k^*}[s] - \sum_{i \neq k^*}\sum_{\varsigma} \tilde{\eta}_{k^*,i}[s,\varsigma]\Big) + n_0,$$
where the bracketed term is the latest residual interference to the desired symbol, whose model is given in [17]. This residual interference is used in the A-NOMA T-SIC scheme to improve decoding reliability.
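The received-signal and cancellation steps can be illustrated with a small sketch. It assumes, for illustration only, that an interferer delayed by a fraction τ of T_sym overlaps the desired symbol with its previous symbol for τ and its current symbol for 1 − τ, and that the channels are constant over the block; the dictionary keys and function names are hypothetical. When every interfering symbol is estimated correctly the residual vanishes, and a wrong estimate leaves an overlap-weighted residual.

```python
import cmath
import math


def overlap_terms(tau: float) -> dict:
    """Hypothetical two-symbol overlap: shift -1 (previous symbol) overlaps for
    tau, shift 0 (current symbol) overlaps for 1 - tau."""
    return {-1: tau, 0: 1.0 - tau}


def received_sample(s: int, desired: dict, interferers: list, noise: complex = 0j) -> complex:
    """y[s] = h sqrt(P) X[s] + eta[s] + n0 for one desired symbol.

    desired: keys h, P, X (list of symbols).
    interferers: dicts with keys h, P, X, tau, theta.
    """
    y = desired["h"] * math.sqrt(desired["P"]) * desired["X"][s]
    for u in interferers:
        for shift, frac in overlap_terms(u["tau"]).items():
            y += frac * cmath.exp(1j * u["theta"]) * u["h"] * math.sqrt(u["P"]) * u["X"][s + shift]
    return y + noise


def cancel(y: complex, s: int, interferers: list, estimates: list) -> complex:
    """Subtract the interference reconstructed from estimated symbols X_hat."""
    for u, x_hat in zip(interferers, estimates):
        for shift, frac in overlap_terms(u["tau"]).items():
            y -= frac * cmath.exp(1j * u["theta"]) * u["h"] * math.sqrt(u["P"]) * x_hat[s + shift]
    return y
```

The difference between `cancel(...)` and the desired term h√P X[s] is exactly the bracketed residual interference above (plus noise, which is zero by default here).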

III. PROBLEM FORMULATION
First, the Conventional T-SIC (Conv T-SIC) scheme [17], which enhances the decoding performance of A-NOMA uplink transmissions, is presented. Second, a new Cyclic T-SIC scheme for A-NOMA D2D networks is proposed.

A. CONV T-SIC DECODING SCHEME
The Conv T-SIC decoding scheme [17] is applied at the receiving terminal R_X of a communication system containing K transmitters, as shown in Figure 1. The T-SIC decoding procedure starts once an asynchronous superimposed signal with misaligned data symbols is received at R_X. First, an Interference Cancellation (IC) triangle is constructed by exploiting the triangular pattern of the detected data symbols.
The weakest received symbol is added to the IC triangle as the last symbol to be detected. Next, the symbols that overlap with the weakest user's symbol are added to the IC triangle. Then, all the symbols that overlap the second-weakest user's symbols are included in the IC triangle. This procedure is repeated until the strongest user's symbols are entered into the IC triangle. Once such an IC triangle is constructed for the n transmissions received from users k_1 to k_n, the first symbol of the strongest user, k_1, is decoded, followed by the consecutive symbols of k_1. Afterwards, the first symbol of the second-strongest user, k_2, is decoded by subtracting the previously estimated symbols of k_1. Then, the second symbol of user k_2 is decoded by subtracting the previously estimated symbols of k_1 and k_2. Next, the first symbol of user k_3 is decoded by subtracting all the previously estimated symbols of k_1 and k_2 from the received signal Y. Similarly, the remaining symbols of k_3, up to those of user k_n, are decoded by subtracting the previously estimated symbols. Moreover, Conv T-SIC [17] is repeated iteratively between users a fixed number of times, N_T-SIC. Specifically, Conv T-SIC not only uses strong user signals to decode weak user signals but also iteratively uses weak user signals to decode strong user signals. In the Conv T-SIC scheme [17], N_T-SIC lies within the range 1 ≤ N_T-SIC ≤ N_T-SIC,max and is used primarily to improve the T-SIC accuracy in terms of BER. However, Conv T-SIC may not be optimal for decoders, particularly in massive D2D communication networks with energy-limited nodes. In the following section, a Cyclic T-SIC decoding scheme is proposed for A-NOMA to improve its decoding efficiency in terms of energy consumption, reliability, complexity, and delay.
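The IC triangle construction and decoding order described above can be sketched as a list of (user, symbol) pairs. This minimal sketch assumes users are indexed 1 to K from strongest to weakest and models a single pass only (the N_T-SIC repetitions are not modeled); the function name is hypothetical.

```python
def ic_triangle(num_users: int) -> list:
    """Decoding order of the symbols inside one IC triangle.

    Users are indexed 1..K from strongest to weakest. Building the triangle
    back from the weakest user's single symbol, user k ends up with
    K - k + 1 consecutive symbols. Symbols are decoded user by user
    (strongest first) and symbol by symbol, each after subtracting the
    previously decoded symbols that overlap it.
    """
    order = []
    for k in range(1, num_users + 1):
        for s in range(1, num_users - k + 2):
            order.append((k, s))
    return order
```

For three users, `ic_triangle(3)` returns `[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)]`, and a full triangle for K users contains K(K + 1)/2 symbols (55 for K = 10).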

B. CYCLIC T-SIC DECODING SCHEME
A Cyclic T-SIC scheme is proposed for A-NOMA D2D decoders, comprising three stages: i) optimization, ii) decoding, and iii) re-transmission.

1) OPTIMIZATION STAGE
Suppose a superimposed signal is received at the receiving terminal R_X. To reduce the decoder energy consumption while maintaining reliability, R_X solves the binary optimization problem in (7), where a decision vector of the data symbols to be decoded is introduced as D_u = [D^u_k]_{k∈K}, with D^u_k = 1 if the data symbols of the k-th user are selected to be decoded. Further, the duration for which the i-th user's symbol overlaps with the desired k*-th user's symbol, the co-channel interference threshold, and the channel bandwidth are denoted by Δ_{k*,i}, I_th, and B, respectively. The transmission power of the k-th user, the channel gain of the k-th user, the thermal noise variance, the transmission power of the i-th user, and the channel gain of the i-th user are denoted by P_k, g_k, σ², P_i, and g_i, respectively. Also, R_min is the minimum rate threshold defined for the communication system. The main aim of the optimization is to maximize the number of decoded data symbols while minimizing the computational energy consumption; hence, the objective function is formulated using Lemma 1.
Proposition 1: D^u_k is approximated as binary, D^u_k ∈ {0, 1}, by using a difference of two convex functions/sets constraint (D_b) as follows [38]: □
Lemma 1: The number of symbols selected from an IC triangle for each k-th user, N^sym_k, is given as
$$N^{sym}_k = (K - k + 1)\, D^u_k. \qquad (9)$$
Proof: The total number of symbols included in an IC triangle is 1 + 2 + ⋯ + K = K(K + 1)/2. Since each data symbol is received over an equal T_sym, a symbol overlaps with at most two symbols of each other user. Hence, starting from the weakest user's symbol, the number of symbols per user increases from 1 to K. Thus, the k-th user contributes (K − k + 1) symbols, where k = 1 denotes the strongest user. Moreover, the total number of users whose data are decoded is k_opt = Σ_{k=1}^{K} D^u_k. □
The constraint in (7d) corresponds to the minimum rate, and (11f) corresponds to the maximum allowable co-channel interference. Note that the optimization problem in (7a) involves non-linear constraints, which makes it a hard-constrained problem. Hence, it is reformulated by relaxing its binary variables and non-convex constraints. The binary constraint in (7b) is transformed into the linear constraints given in (11b) and (11c). Such a constraint reformulation ensures that D^u_k is approximated as binary [39] using Proposition 1. Further, the non-linear minimum rate constraint in (7d) is converted into the linear constraint in (10). The relaxed binary optimization problem with the reformulated convex constraints is given in (11). The reformulated convex optimization problem in (11a) is solved efficiently by applying the Lagrangian dual algorithm using Proposition 2.
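For reference, a common difference-of-convex way to encode binary decision variables is shown below, together with the triangle count used in Lemma 1. We assume this is the type of constraint Proposition 1 refers to; the exact form in [38] is not reproduced in this section.

```latex
0 \le D^u_k \le 1 \;\; \forall k,
\qquad
\sum_{k=1}^{K}\Big(D^u_k - \big(D^u_k\big)^2\Big) \le 0
\quad\Longleftrightarrow\quad
D^u_k \in \{0,1\} \;\; \forall k,
\qquad\qquad
\sum_{k=1}^{K} (K - k + 1) = \frac{K(K+1)}{2}.
```

On [0, 1] each term D − D² is non-negative and vanishes only at 0 or 1, so the sum constraint forces every D^u_k to be binary. Its left-hand side is a linear function minus a convex quadratic, that is, a difference of convex functions, which is what allows the relaxed problem to be handled with convex tools.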
Proposition 2: The Lagrangian optimization problem is formed as in (12), where λ, δ, φ, and µ are the Lagrangian multipliers corresponding, respectively, to the constraints of the optimization problem in (11a). Proof: The Lagrangian objective function of the problem in (11a) is formed by adding the multiplier-weighted constraints to its objective. □ Note that the objective function and all the constraints of the dual problem are linear with respect to the Lagrangian multipliers. Thus, the dual problem is convex over the dual variables λ, δ, φ, and µ, which can be optimized through a one-dimensional search algorithm. The optimal D_u can then be obtained by running a gradient descent algorithm [41]. First, the gradient of the Lagrangian function with respect to D^u_k is formulated as follows.
VOLUME 11, 2023
Further, D^u_k is updated as follows, where β denotes the iteration step size of D_u. After the optimal D_u is found using the gradient descent algorithm, the optimal λ can be obtained. However, the dual function in (12a) is not guaranteed to be differentiable. Hence, an iterative subgradient-based scheme is used to obtain the optimal λ, δ, φ, and µ [3], using Proposition 3. Proposition 3: The subgradient of the dual function, g(λ, δ, φ, µ) = max_{D_u} L(D_u, λ, δ, φ, µ), with respect to λ, δ, φ, and µ can be derived as follows. Further, the dual variables λ, δ, φ, and µ are updated according to the following expression.
where α denotes the iteration step size of λ, δ, φ, µ. The approach to solve the optimization problem in (7a) is summarized in Algorithm 1.
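The primal-dual loop summarized in Algorithm 1 can be sketched on a deliberately simplified problem. The sketch assumes a linear objective Σ w_k D_k with weights w_k = K − k + 1 (Lemma 1), a single linear interference budget Σ a_k D_k ≤ I_th with hypothetical per-user loads a_k, the rate constraint replaced by a fixed eligibility filter r_k ≥ R_min, and the DC penalty with a fixed weight φ. It is an illustration of projected gradient steps on D_u (step β) combined with subgradient steps on one multiplier (step α), not the paper's exact Lagrangian.

```python
def select_users(weights, loads, rates, r_min, i_th,
                 alpha=0.05, beta=0.1, phi=0.5, outer=200, inner=50):
    """Hypothetical primal-dual sketch of a relaxed user-selection problem.

    weights[k]: symbols gained by decoding user k (e.g. K - k + 1).
    loads[k]:   co-channel interference added if user k is selected.
    rates[k]:   achievable rate of user k; users below r_min are excluded.
    i_th:       co-channel interference budget.
    Returns the best feasible 0/1 decision vector D_u found.
    """
    num = len(weights)
    eligible = [r >= r_min for r in rates]     # stands in for the rate constraint
    d = [0.5 if e else 0.0 for e in eligible]  # relaxed D_u in [0, 1]
    lam = 0.0                                  # multiplier of the interference constraint
    best, best_val = [0] * num, 0.0            # selecting nobody is always feasible
    for _ in range(outer):
        # Primal step: projected gradient ascent on L(D, lam). The penalty
        # -phi * sum(d - d^2) from the DC binary constraint pushes d to 0 or 1.
        for _ in range(inner):
            for k in range(num):
                if not eligible[k]:
                    continue
                grad = weights[k] - lam * loads[k] - phi * (1.0 - 2.0 * d[k])
                d[k] = min(1.0, max(0.0, d[k] + beta * grad))
        # Round and keep the best feasible binary decision seen so far.
        x = [1 if dk >= 0.5 else 0 for dk in d]
        if sum(a * xk for a, xk in zip(loads, x)) <= i_th:
            val = sum(w * xk for w, xk in zip(weights, x))
            if val > best_val:
                best, best_val = x, val
        # Dual step: the dual subgradient w.r.t. lam is i_th - sum(loads * d);
        # step against it and project onto lam >= 0.
        lam = max(0.0, lam + alpha * (sum(a * dk for a, dk in zip(loads, d)) - i_th))
    return best
```

With weights [3, 2, 1], unit loads, and I_th = 2, the multiplier rises until the weakest user is dropped, giving [1, 1, 0]; with I_th = 3 all users are kept. Keeping the best feasible rounded vector guards against the oscillation that plain subgradient methods can show.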

2) DECODING AND RE-TRANSMISSION STAGES
Once the optimal set of data symbols is derived using the proposed optimization in (11a), these symbols are decoded using the Conv T-SIC scheme. Next, the R_X terminal keeps listening to the same subcarrier and receives retransmissions from the remaining users. Note that, in outband D2D emergency scenarios, user signal retransmissions are necessary in order not to lose any critical data [42]. In addition, it is assumed that the network users follow a D2D protocol such as M-HELP [43], which is used for D2D emergency call transmissions. An M-HELP-enabled UE retransmits its own data after a fixed time interval when its data have not been forwarded up to a predefined limit, n_RS, by neighboring devices. Here, n_RS is counted by each transmitting user and corresponds to the number of times its own data have been relayed by the neighborhood. The data signals retransmitted by the transmitters share the same subcarrier and are superimposed, so that they can be decoded at R_X using the optimization problem in (11a). This process is repeated until the data of all K users are decoded using the proposed optimization-assisted T-SIC.
In summary, the procedure followed at the transmitting UEs is given in Algorithm 2, where n_Rt gives the data signal retransmission count. The Cyclic T-SIC scheme performed at the R_X terminal is summarized in Algorithm 3. The method of decoding each user's data using the proposed Cyclic T-SIC scheme is illustrated in Figure 2. First, a superimposed signal comprising the data of K users is received over a duration T_0. The optimization problem in (11a) is used to derive the optimal data symbol combination that forms an IC triangle. These data symbols are decoded using the T-SIC decoding scheme. In this case, the received data symbols of some users may be left undecoded. Hence, after the T_0 duration, retransmissions from such users occur after a time interval t_r and are received as superimposed signals. The Cyclic T-SIC decoding procedure is repeated for each such received signal, for an arbitrary number r of iterations, until each user's data are successfully decoded. Note that retransmissions are not required in scenarios where all the users' data are decoded in one Cyclic T-SIC iteration. Furthermore, the performance of such a D2D A-NOMA decoding scheme in terms of energy efficiency, reliability, computational complexity, and delay has been understudied, and it is analyzed in Section IV.
Re-transmitting in the same resource in the next period
5 until n_RS ≥ RS_threshold or n_Rt ≥ n_rth;
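The stopping rule that survives from Algorithm 2 can be sketched as a small transmitter-side loop. The sketch assumes that relay_counts[t] is the cumulative n_RS observed after period t and that the initial transmission is not counted as a retransmission; the function name and inputs are hypothetical.

```python
def transmitter_schedule(relay_counts: list, rs_threshold: int, n_rth: int) -> int:
    """Hypothetical transmitter loop: after each period, stop if the data have
    been relayed enough (n_RS >= RS_threshold) or the retransmission limit
    is reached (n_Rt >= n_rth); otherwise retransmit in the same resource in
    the next period. Returns the retransmission count n_Rt."""
    n_rt = 0
    for n_rs in relay_counts:
        if n_rs >= rs_threshold or n_rt >= n_rth:
            break
        n_rt += 1  # re-transmit in the same resource in the next period
    return n_rt
```

For instance, with relay counts [0, 1, 3, 5] and RS_threshold = 3, the UE retransmits twice before neighbors have relayed its data enough; a retransmission limit n_rth = 1 would stop it after one retransmission.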

Algorithm 3 At the Receiver: Cyclic T-SIC Decoding
Data: Total number of users, K, sharing the same resource
1 Initialize k_opt = 0
2 repeat
3   Receive the superimposed signal of n (≤ K) users' data
4   Derive the optimal D_u using (11a)
5   Decode the data of k_opt (≤ n) users
6 until all K users' data are decoded;
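The repeat-until loop of Algorithm 3 can be simulated end to end if the optimization step is replaced by a simple stand-in. Here a greedy rule (admit pending users, assumed to be ordered strongest first, while their summed interference load fits I_th) replaces problem (11a); the loads and function names are hypothetical. The sketch shows how each round decodes k_opt users and the rest retransmit until everyone is decoded.

```python
def greedy_select(pending: list, loads: list, i_th: float) -> list:
    """Stand-in for problem (11a): admit pending users in the given order
    (assumed strongest first) while the summed interference load fits i_th."""
    chosen, used = [], 0.0
    for u in pending:
        if used + loads[u] <= i_th:
            chosen.append(u)
            used += loads[u]
    return chosen


def cyclic_tsic(num_users: int, loads: list, i_th: float, max_rounds: int = 100) -> list:
    """Receive -> select -> decode, repeated on retransmissions until every
    user's data is decoded. Returns the users decoded in each round."""
    pending = list(range(num_users))  # users whose data are still undecoded
    rounds = []
    while pending:
        if len(rounds) >= max_rounds:
            raise RuntimeError("retransmission limit reached")
        decoded = greedy_select(pending, loads, i_th)
        if not decoded:
            raise ValueError("no pending user fits the interference budget")
        rounds.append(decoded)  # len(decoded) plays the role of k_opt
        # Decoded users stop; the others retransmit after t_r on the same subcarrier.
        pending = [u for u in pending if u not in decoded]
    return rounds
```

With five users of unit load and I_th = 2, the loop decodes users [0, 1], then [2, 3], then [4], so three Cyclic T-SIC iterations are needed and the k_i sum to K.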

IV. PERFORMANCE ANALYSIS
In this section, the EE, average BER, computational complexity, and simulation delay of the proposed Cyclic T-SIC method are investigated.

A. EE ANALYSIS
The EE of the A-NOMA D2D scheme can be defined as the ratio between the total achievable rate and the total power consumption of the communication system [3]. In this definition, E_c is the total energy consumption per SIC decoding cycle, as defined in Lemma 2, P_circuit represents the power dissipated by the user device hardware circuits [3], [40], and ε denotes the reciprocal of the transmitter power amplifier drain efficiency.
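Since the EE expression itself is not reproduced here, the sketch below assumes the form EE = Σ R_k / (ε Σ P_k + P_circuit + E_c/T_cycle), where the hypothetical cycle duration T_cycle converts the decoding energy per cycle into power. Rates are Shannon rates under ideal SIC with users decoded strongest first, which is an illustrative assumption rather than the paper's rate model.

```python
import math


def energy_efficiency(gains, powers, bandwidth, noise_var, p_circuit, eps, e_c, t_cycle):
    """Hypothetical EE = sum rate / (eps * sum(P_k) + P_circuit + E_c / T_cycle).

    gains[k] = g_k and powers[k] = P_k. Rates assume ideal SIC with users
    decoded strongest first, so each user sees interference only from weaker users.
    """
    rx = sorted((g * p for g, p in zip(gains, powers)), reverse=True)
    sum_rate = 0.0
    for idx, s in enumerate(rx):
        interference = sum(rx[idx + 1:])
        sum_rate += bandwidth * math.log2(1.0 + s / (interference + noise_var))
    total_power = eps * sum(powers) + p_circuit + e_c / t_cycle
    return sum_rate / total_power
```

Under ideal SIC the sum rate telescopes to B log2(1 + Σ g_k P_k / σ²), so reducing E_c per cycle is the main lever a decoding scheme has on EE in this form, which is consistent with the role E_c plays in the discussion of Cyclic T-SIC.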

Lemma 2: The total energy consumption per decoding instance is denoted by,
where E_max is the initial maximum energy in a user device, N_T-SIC is the number of times the same data symbols are repeatedly decoded, n_sym1 is the number of symbols decoded in the initial decoding instance, and E_initial is the remaining energy after that decoding instance. Moreover, the total number of symbols decoded at instance t is n_sym(t) = Σ_{k=1}^{K} N^sym_k(t). Also, E_{t−1} and E_t are the residual energy levels at the (t − 1)-th and t-th instances. Further, n_sym2 is the maximum number of symbols that can be decoded with E_max, and E_final is the remaining energy after decoding those n_sym2 symbols. The behavior of this energy dissipation is illustrated in Figure 3. Proof: The energy consumed by decoding causes an exponential decay of the remaining energy that depends on n_sym [44]. The remaining energy after the initial decoding instance is given by (26), where α denotes the attenuation of energy, which depends on the receiver's physical properties and processing efficiency. Moreover, the final remaining energy at the end of decoding n_sym2 symbols is given by (27). From (26) and (27), the result follows. □ Hence, the energy consumption per T-SIC decoding cycle is formulated for a specific Σ_{k=1}^{K} N^sym_k and n_iter.
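The exponential energy model in Lemma 2 can be sketched as follows. This assumes the form E_t = E_{t−1} exp(−α N_T-SIC n_sym(t)), with α calibrated from E_final = E_max exp(−α N_T-SIC n_sym2); equations (26) and (27) are not reproduced here, so this is an illustrative reading of the lemma rather than its exact statement.

```python
import math


def attenuation(e_max: float, e_final: float, n_sym2: int, n_tsic: int = 1) -> float:
    """Calibrate alpha from E_final = E_max * exp(-alpha * N_T-SIC * n_sym2)."""
    return math.log(e_max / e_final) / (n_tsic * n_sym2)


def energy_per_cycle(e_prev: float, n_sym: int, alpha: float, n_tsic: int = 1) -> tuple:
    """Energy spent decoding n_sym symbols N_T-SIC times, starting from residual
    energy e_prev. Returns (E_{t-1} - E_t, E_t)."""
    e_next = e_prev * math.exp(-alpha * n_tsic * n_sym)
    return e_prev - e_next, e_next
```

For example, if decoding 100 symbols halves a 100-unit budget, `attenuation(100.0, 50.0, 100)` gives α = ln 2 / 100, and decoding fewer symbols per cycle (as Cyclic T-SIC does by selecting k_opt users) spends correspondingly less energy per cycle.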

B. BER ANALYSIS
The average theoretical BER of the k*-th user's data, P_bit,k*, in asynchronous transmissions, as formulated in [17], is updated as follows. First, considering the latest detected symbols, the Signal-to-Interference-plus-Noise Ratio (SINR) of the s-th symbol of the k*-th user is expressed with respect to η̃_{k*}[s] as [17], where z represents the latest detection status of the interfering symbols, i.e., correct or erroneous. Further, the error probability of the s-th symbol detection can be expressed in terms of (γ_k[s] | z, h_{k*}, Δ_{k*}[s]) as [17], where Q(·) represents the Q-function and M denotes the order of M-QAM [45].
where d_{e,k*} is half the distance between the two nearest constellation points, given by $d_{e,k^*} = \sqrt{3 P_{k^*} |h_{k^*}|^2 / (2(M-1))}$ [46]. Further, the conditional error probability of the s-th symbol detection can be obtained as [17], where P(e_{k*}[s]) and P(e_{k*}[s] | α_{k*}, Δ_{k*}[s]) denote, respectively, the probability of error and the conditional probability of error of the s-th symbol of the k*-th user. Here, the s-th symbol of user k* is the desired symbol to be detected, and M is the modulation order. Further, h_{k*} denotes the fading magnitude for the s-th symbol of the k*-th user, and Δ_{k*}[s] denotes the percentage of the symbol time for which the overlapping symbols of the i-th user overlap the s-th symbol of the k*-th user. Moreover, P_z^i denotes the probability of the i-th permutation of z, 1 ≤ i ≤ 2^{2(k_opt−1)}, and Pr_{i,k}[ζ] denotes the probability that the detection is correct or in error.
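The averaging over the 2^{2(k_opt−1)} detection-status permutations z can be sketched as follows. The sketch uses the standard exact SER of square M-QAM in AWGN, with residual interference treated as extra Gaussian noise (an assumption); its argument √(3γ/(M−1)) equals d_e divided by the per-dimension noise standard deviation, with d_e as given above. The residual model (a wrongly detected interfering symbol leaves its overlap-weighted power, a correct one leaves none) and all names are hypothetical simplifications of [17].

```python
import itertools
import math


def q_func(x: float) -> float:
    """Gaussian Q-function, Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ser_mqam(sinr: float, m: int) -> float:
    """Exact SER of square M-QAM in AWGN (interference treated as Gaussian):
    4cQ(a) - 4c^2 Q(a)^2 with c = 1 - 1/sqrt(M), a = sqrt(3 * SINR / (M - 1))."""
    c = 1.0 - 1.0 / math.sqrt(m)
    q = q_func(math.sqrt(3.0 * sinr / (m - 1)))
    return 4.0 * c * q - 4.0 * c * c * q * q


def average_ser(signal_power, noise_var, overlaps, p_err, m):
    """Average SER over all 2^len(overlaps) detection-status permutations z.

    overlaps[j]: overlap-weighted received power of the j-th interfering symbol
                 (2(k_opt - 1) entries); hypothetically, a wrong detection leaves
                 this power as residual interference and a correct one leaves none.
    p_err[j]:    probability that the j-th interfering symbol was detected in error.
    """
    total = 0.0
    for z in itertools.product((0, 1), repeat=len(overlaps)):  # 1 = erroneous
        prob, residual = 1.0, 0.0
        for zj, ov, pe in zip(z, overlaps, p_err):
            prob *= pe if zj else 1.0 - pe
            residual += ov if zj else 0.0
        total += prob * ser_mqam(signal_power / (residual + noise_var), m)
    return total
```

At zero SINR the SER reduces to 1 − 1/M (0.75 for 4-QAM), a quick sanity check. With Gray mapping, the bit error rate is commonly approximated as SER / log2(M) at moderate to high SNR.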

C. COMPUTATIONAL COMPLEXITY ANALYSIS
The computational complexity of decoding the data of K users under Conv T-SIC and Cyclic T-SIC is presented as follows.
1) Conv T-SIC: The total interference to the desired (s-th) symbol of the k*-th user, η_{k*}[s], is successively cancelled in the Conv T-SIC process; hence, the maximum computational complexity depends on the number of user devices, K, and is approximately of order O(K²) [3].
2) Cyclic T-SIC: The decoding process of the proposed method has a complexity of O(k_opt²), where k_opt ≤ K. In addition, the computational complexity of the stochastic gradient descent-based optimization in the Cyclic T-SIC scheme is of order O(log(1/ϵ)) [3]. Thus, the proposed Cyclic T-SIC method has a computational complexity of O(log(1/ϵ) k_opt²) per decoding iteration. Lemma 3 shows that the computational complexity of the Cyclic T-SIC scheme is lower than that of the Conv T-SIC scheme.
Lemma 3: The computational complexity of Cyclic T-SIC decoding of k_i users' data over n iterations is less than that of Conventional SIC decoding of K users' data in one iteration [3]. Note that k_i corresponds to k_opt in the i-th iteration and that the k_i sum to K.
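The comparison behind Lemma 3 can be checked numerically at the level of orders of growth. This sketch ignores the constants hidden by the O(·) notation and uses the natural logarithm (the base only rescales the log(1/ϵ) factor); both are assumptions. Because the k_i sum to K, Σ k_i² ≤ K², so the outcome hinges on the log(1/ϵ) factor, in line with the contribution stating the advantage for ϵ on the order of 10^{-1}.

```python
import math


def conv_tsic_ops(num_users: int) -> float:
    """Order-of-growth cost of Conv T-SIC on K users in one pass: K^2."""
    return float(num_users ** 2)


def cyclic_tsic_ops(k_per_iter: list, eps: float) -> float:
    """Order-of-growth cost of Cyclic T-SIC: sum_i log(1/eps) * k_i^2."""
    return sum(math.log(1.0 / eps) * k * k for k in k_per_iter)
```

For K = 10 users decoded as k_i = [4, 3, 3], `cyclic_tsic_ops([4, 3, 3], 0.1)` is about 78, below the 100 of `conv_tsic_ops(10)`, whereas a much tighter tolerance such as ϵ = 10^{-3} raises it to about 235 and reverses the comparison.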

V. ANALYSIS OF THE NUMERICAL SIMULATION RESULTS
In this section, numerical simulation results are presented to demonstrate the performance of the proposed system model and algorithm. User terminals are randomly located within a radius of 100 m of the corresponding receiving UE. For the Monte Carlo simulation analysis, 10^3 random A-NOMA D2D setups are considered. The parameters used for the simulation analysis are summarized in TABLE 1; these values can be modified to suit the specific scenario under consideration. Furthermore, the performance of both Conv T-SIC and Cyclic T-SIC is evaluated under the condition that each IC triangle formed across the received asynchronous NOMA signal is decoded for one iteration, i.e., N_T-SIC = 1.
Figure 4 shows that the Energy Efficiency (EE) obtained via the proposed optimization algorithm converges to a stable value within a fixed number of iterations. However, the converged EE value decreases as K increases: with increasing K, the total n_sym to be decoded increases, which considerably raises E_c and reduces the EE. Furthermore, the number of iterations required for the EE to converge increases with K, mainly because the size of the vector D_u grows with K. Figure 5 depicts the EE versus K for Cyclic and Conv T-SIC under different I_th values. The EE decreases as K increases, because the total number of symbols n_sym grows with K and results in a higher E_c. Moreover, as I_th increases, n_sym increases and the EE decreases. Since Conv T-SIC does not optimize the set of users to be decoded, it is not affected by I_th. Furthermore, in Conv T-SIC, the data of all K users are decoded in one decoding iteration, which leads to more computations and thereby a higher E_c than in Cyclic T-SIC. It is noteworthy that in Figure.
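The random placement used in the Monte Carlo setups (users within a 100 m radius of the receiving UE, with Rayleigh fading) can be sketched as follows. Drawing r = R√u makes users uniform over the disk area rather than crowded near the center. The path-loss exponent of 3 and the 1 m minimum distance are hypothetical choices, not values from TABLE 1.

```python
import math
import random


def drop_users(num_users: int, radius: float = 100.0, rng=None) -> list:
    """Place users uniformly over a disk (metres) centred on the receiving UE;
    r = R * sqrt(u) gives a density that is uniform in area."""
    rng = rng or random.Random()
    pts = []
    for _ in range(num_users):
        r = radius * math.sqrt(rng.random())
        angle = 2.0 * math.pi * rng.random()
        pts.append((r * math.cos(angle), r * math.sin(angle)))
    return pts


def rayleigh_gain(distance: float, path_loss_exp: float = 3.0, rng=None) -> float:
    """|h|^2 = d^-a * |g|^2 with g ~ CN(0, 1); exponent and 1 m floor are hypothetical."""
    rng = rng or random.Random()
    g = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
    return max(distance, 1.0) ** (-path_loss_exp) * abs(g) ** 2
```

Repeating `drop_users(K)` and drawing fresh gains 10^3 times would give one set of random A-NOMA D2D setups of the kind averaged over in this section.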
Figure 6 depicts the variation of EE with the received power ratio, ν. The EE increases with ν in both the Cyclic and Conv T-SIC schemes: P_k increases with ν, which increases γ_k and the average sum throughput and thus improves the EE. Moreover, Cyclic T-SIC shows a significant improvement over Conv T-SIC, because a larger ν makes it easier for the optimization algorithm to derive D_u owing to the distinct P_k levels of the users. Hence, the derived D_u enables the optimal data symbols to be decoded. Also, a convergence in EE in the Cyclic T-SIC is observed when
The BER of both schemes increases with K, since with a higher K the γ_k difference among user data decreases. This lowers the accuracy of the received data detection, which affects the decoding process. Moreover, the average BER of Cyclic T-SIC is lower than that of Conv T-SIC. The main reason is that the Cyclic T-SIC scheme derives the optimal k_opt to decode based on the derived D_u. The distinct γ_k values among the user data decoded in the Cyclic T-SIC scheme increase the decoding accuracy. Further, Cyclic T-SIC decodes all K users' data over successive iterations, taking into account the γ_k of each user's data. Conversely, Conv T-SIC decodes these data in a single iteration, which increases the BER. Figure 8b depicts the BER versus ν for the nearest user, the middle user, and the farthest user. A significant BER gap is seen between the nearest user and the middle user. As ν increases, a comparable γ_k level difference is seen between the near user and the far user. Hence, the decoding accuracy reduces as the γ_k gap increases between consecutive users. The highest γ_k level is received from the nearest user. For ν < 2, Cyclic T-SIC has a higher BER than Conv T-SIC.
This is because, when ν < 2, the interference experienced by the nearest user is higher than the interference estimated by Cyclic T-SIC. With the optimal D_u of Cyclic T-SIC, only a set of selected users is decoded, under the assumption that the interference from the remaining users is minimal. Hence, the decoding error is higher than with Conv T-SIC. Meanwhile, for ν ⪆ 2, the interference from the remaining users on the nearest user is less significant, so the decoding accuracy is high and similar to that of Conv T-SIC. Furthermore, for the middle and farthest users, the BER of Cyclic T-SIC is lower than that of Conv T-SIC. For these users, the γ_k gap between consecutive users is small, so Cyclic T-SIC decodes their data during a successive iteration of cyclic decoding, using the retransmissions from the transmitters. In that iteration, the γ_k levels of the signals received from the middle and farthest users are sufficiently high, since the stronger signals no longer interfere. Figure 9a depicts the BER against the transmit SNR under the Cyclic and Conv T-SIC schemes. The BER of Cyclic T-SIC is lower than that of Conv T-SIC when the transmit SNR is < 23 dBm, because deriving the optimal D_u from the γ_k level of each kth user and the I_th constraint increases the decoding accuracy. Meanwhile, when the transmit SNR ⪆ 23 dBm, the BER of Cyclic T-SIC exceeds that of Conv T-SIC. The difference between the γ_k levels of the users is less significant when all users transmit at SNR ⪆ 23 dBm; hence, the optimal D_u derivation is less accurate, which increases the BER of Cyclic T-SIC. Moreover, Figure 9b shows that the EE of the proposed system is higher than that of Conv T-SIC, because the optimal D_u reduces E_c per decoding iteration. In addition, at a lower transmit SNR, Cyclic T-SIC gains more EE over the Conv T-SIC scheme. Jointly considering the results obtained in Figure.
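The dependence of per-user BER on the received power ratio ν can be illustrated with a simplified Monte-Carlo sketch. It assumes synchronous reception, BPSK in place of M-ary QAM, AWGN only, and a hypothetical power profile in which consecutive users' received powers differ by the factor ν. It uses plain SIC (strongest user first), not the Cyclic T-SIC scheme or its optimization of D_u. The function name `sic_ber` and its parameters are illustrative.

```python
import math
import random


def sic_ber(k_users, nu, snr_db_weakest, n_symbols=20000, seed=1):
    """Monte-Carlo BER per user for power-domain NOMA with plain SIC (BPSK, AWGN).

    User 0 is the nearest (strongest). Hypothetical power profile:
    P_k = nu**(K-1-k) * P_min with P_min = 1, so consecutive users differ by nu.
    """
    rng = random.Random(seed)
    amps = [math.sqrt(nu ** (k_users - 1 - k)) for k in range(k_users)]
    # The weakest user's SNR fixes the noise level: P_min / sigma^2 = SNR.
    noise_std = math.sqrt(1.0 / 10 ** (snr_db_weakest / 10))
    errors = [0] * k_users
    for _ in range(n_symbols):
        bits = [rng.choice((-1, 1)) for _ in range(k_users)]
        y = sum(a * b for a, b in zip(amps, bits)) + rng.gauss(0.0, noise_std)
        for k in range(k_users):  # decode the strongest remaining user, then cancel it
            b_hat = 1 if y >= 0 else -1
            errors[k] += b_hat != bits[k]
            # A wrong decision is subtracted too, so errors propagate to weaker users.
            y -= amps[k] * b_hat
    return [e / n_symbols for e in errors]
```

For example, with `nu=1.0` (equal powers) the nearest user's BER comes out close to 0.25, because the sign of the sum of three equal-power BPSK symbols follows their majority, whereas `nu=4.0` at a 20 dB weakest-user SNR separates the levels enough for all three users to be decoded almost error-free.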
Figure 10a shows the BER against the relative symbol time offset, φ, between users under the Cyclic and Conv T-SIC schemes. The BER worsens as φ increases, since the co-channel interference increases. However, Cyclic T-SIC achieves a lower BER because it derives the optimal D_u to select the n_sym decoded per iteration based on I_th. Figure 10b shows that the EE of Cyclic T-SIC is higher than that of Conv T-SIC, even though the co-channel interference between users increases with φ. Optimizing the number of user data decoded based on I_th limits the k_opt decoded per iteration. This reduces the interference among the decoded symbols and improves the EE of Cyclic T-SIC compared to Conv T-SIC. Further, a lower BER and a higher EE are obtained with the Cyclic T-SIC scheme even as φ increases the asynchrony among the received data signals. In addition, EE starts converging once φ ⪆ 0.3. BER converges since I_th and the minimum rate thresholds are reached with K = 10; hence the same D_u is obtained and n_sym reaches a maximum limit.
In Cyclic T-SIC, EE converges to 1.3 × 10⁵, 8.3 × 10⁴, and 6.4 × 10⁴ bits per Joule for K = 5, K = 10, and K = 15, respectively, at transmit SNR = 23 dBm and φ = 50%, confirming the consistency between the results in Figure 9b and Figure 10b. Similarly, in Conv T-SIC, EE converges to 7 × 10⁴, 3.6 × 10⁴, and 2 × 10⁴ bits per Joule for K = 5, K = 10, and K = 15, respectively, under the same conditions. Figure 10b depicts the BER of the nearest, middle, and farthest users against φ. In general, the BER worsens as φ increases, since the co-channel interference increases. The BER of the nearest user is nearly the same under both schemes, since the nearest user's data has a significant γ_k level in both, which increases the accuracy of decoding its data. Moreover, for both the middle and farthest users, Cyclic T-SIC achieves a significantly lower BER than Conv T-SIC, because their data are decoded in successive iterations of Cyclic T-SIC using retransmissions, with the optimal D_u derived and the n_sym decoded per iteration selected based on I_th.
Moreover, consistent results are obtained in Figure 8a, Figure 8b, Figure 9a, and Figure 10a under K = 10, ν = 2, transmit SNR = 23 dBm, and φ = 50%. Under Cyclic T-SIC, the BER of the nearest, middle, and farthest users converges to approximately 2.2 × 10⁻⁷, 0.037, and 0.14, respectively. Under Conv T-SIC, the corresponding values are approximately 2.2 × 10⁻⁷, 0.15, and 0.68. Figure 11a depicts the theoretical computational complexity against K under the Cyclic and Conv T-SIC schemes.
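The relative gains implied by the reported converged values can be checked with a few lines of arithmetic. The numbers below are copied from the text; the helper names `ee_gain` and `ber_reduction_pct` are illustrative.

```python
# Converged EE values reported in the text (bits/J, transmit SNR = 23 dBm, phi = 50%).
EE_CYCLIC = {5: 1.3e5, 10: 8.3e4, 15: 6.4e4}
EE_CONV = {5: 7e4, 10: 3.6e4, 15: 2e4}

# Converged BER values reported in the text (K = 10, nu = 2, SNR = 23 dBm, phi = 50%).
BER_CYCLIC = {"nearest": 2.2e-7, "middle": 0.037, "farthest": 0.14}
BER_CONV = {"nearest": 2.2e-7, "middle": 0.15, "farthest": 0.68}


def ee_gain(k):
    """How many times higher the Cyclic T-SIC EE is than the Conv T-SIC EE for K = k."""
    return EE_CYCLIC[k] / EE_CONV[k]


def ber_reduction_pct(user):
    """Relative BER reduction of Cyclic T-SIC over Conv T-SIC, in percent."""
    return 100.0 * (BER_CONV[user] - BER_CYCLIC[user]) / BER_CONV[user]


if __name__ == "__main__":
    for k in sorted(EE_CYCLIC):
        print(f"K = {k}: EE gain x{ee_gain(k):.2f}")
    for user in ("nearest", "middle", "farthest"):
        print(f"{user}: BER reduction {ber_reduction_pct(user):.2f}%")
```

Running it prints EE gains of x1.86, x2.31, and x3.20 for K = 5, 10, and 15, and BER reductions of 0.00%, 75.33%, and 79.41% for the nearest, middle, and farthest users, so the EE advantage of Cyclic T-SIC widens as K grows.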
The Cyclic T-SIC curve shows considerably lower complexity than the Conv T-SIC curve. In Cyclic T-SIC, the user data are decoded in sequential iterations, where only k_opt ≤ K users' data are decoded per iteration. In contrast, Conv T-SIC decodes all K users' data in a single iteration. Hence, Cyclic T-SIC reduces the complexity by 56.14% relative to Conv T-SIC in this case.
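The per-iteration splitting that drives the complexity reduction can be sketched with a toy cost model. The sketch assumes the optimizer has already produced the k_opt value of each iteration (passed in as a list) and uses a hypothetical cost of k(k+1)/2 cancellation steps for a k-user triangle. This is not the complexity expression behind the reported 56.14%, and `cyclic_schedule`, `triangle_cost`, and `total_cost` are illustrative names.

```python
from typing import Callable, List


def cyclic_schedule(k_total: int, k_opt_per_iter: List[int]) -> List[int]:
    """Number of users decoded in each iteration, clipped to the users still remaining.

    Users not decoded in an iteration are assumed to retransmit and are
    decoded in a later iteration.
    """
    remaining = k_total
    schedule = []
    for k_opt in k_opt_per_iter:
        if remaining == 0:
            break
        k = min(k_opt, remaining)
        schedule.append(k)
        remaining -= k
    if remaining:
        raise ValueError("k_opt_per_iter does not cover all users")
    return schedule


def triangle_cost(k: int) -> int:
    """Hypothetical cost: one cancellation step per (user, stronger-or-same user) pair."""
    return k * (k + 1) // 2


def total_cost(schedule: List[int], cost: Callable[[int], int] = triangle_cost) -> int:
    """Sum of the per-iteration costs of a decoding schedule."""
    return sum(cost(k) for k in schedule)
```

With K = 10, a single Conv-style iteration costs `total_cost([10]) == 55` units, whereas the hypothetical cyclic schedule `[4, 3, 2, 1]` costs 10 + 6 + 3 + 1 = 20 units. Any cost that is superadditive in k, as this quadratic one is, makes such a split cheaper than decoding all K users at once.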
Figure 11b depicts the total simulation delay against K under the Cyclic and Conv T-SIC schemes. The average simulation delay of Cyclic T-SIC is lower than that of Conv T-SIC. As K increases, the decoding delay of Conv T-SIC increases. In contrast, the delay is lower in Cyclic T-SIC, since k_opt ≤ K per decoding iteration, and it decreases over successive iterations as the k_opt data decoded in each iteration decreases. Overall, Cyclic T-SIC achieves a 22.44% reduction in simulation delay compared to Conv T-SIC in this case. Note that the complexity reduction of the proposed scheme in Figure 11a is significantly larger than the simulation-delay reduction in Figure 11b. The complexity analysis is performed under theoretical assumptions, whereas the simulation delay is measured as the processing delay of the computer used. Hence, the measured decoding delay of the proposed scheme varies with the processing speed and other hardware delays, as seen in Figure 11b.
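The distinction between theoretical complexity and measured delay can be made concrete with a small timing harness: wall-clock measurements depend on the machine, so they are best summarized over repeated runs. `measure_delay` is a hypothetical helper built on Python's standard `time.perf_counter` and `statistics.median`; it is not necessarily how the delays in Figure 11b were measured.

```python
import statistics
import time


def measure_delay(fn, *args, repeats=20):
    """Median wall-clock time in seconds of fn(*args) over several runs.

    Unlike an operation count, this value depends on the CPU, interpreter,
    caches, and background load, so it can differ between machines.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)
```

The median is used rather than the mean so that a single run slowed by an unrelated background task does not dominate the reported delay.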

VI. CONCLUSION
In this paper, we have designed a Cyclic T-SIC scheme to reduce the decoding complexity, energy consumption, and bit error rate of a superimposed signal received in a massive A-NOMA-enabled D2D network. The performance of the proposed scheme was verified through computer-based Monte-Carlo simulation. Cyclic T-SIC showed a significant EE improvement over Conv T-SIC. The optimization algorithm in Cyclic T-SIC enabled the receiver terminal to derive the optimal symbols to decode under the co-channel interference and data rate constraints. Further, Cyclic T-SIC was observed to achieve a lower BER and a higher EE owing to the optimization algorithm, even when the asynchrony among the received data signals is φ ⪆ 0.3. Furthermore, the EE and BER of Cyclic T-SIC were observed to converge when ν ≥ 5.0 and the transmit SNR ≥ 23 dBm, due to the thresholds in the optimization algorithm that limit the k_opt decoded per iteration. Also, the decoding complexity and delay were lower in Cyclic T-SIC, since k_opt ≤ K per decoding iteration. As a future direction, investigating the performance of the proposed scheme for 5G Ultra-Reliable Low Latency Communications (URLLC) is an interesting extension of this work.