Joint uplink and downlink resource allocation for device to device communications

With the ever-increasing number of intelligent devices, extensive research effort has gone into efficient resource allocation for underlay device to device communications. Centralised and distributed approaches have been proposed to improve network performance through uplink and downlink schemes. However, full-duplex resource allocation remains challenging. Here, a joint uplink–downlink resource block assignment for device to device communications is proposed. The devices' geometric distribution is formulated via the Poisson point process, and the optimization problem is modelled as a mixed strategy non-cooperative game. The proposed resource allocation scheme allows many device to device communication pairs to share the same cellular resource block without restricting their number in advance; the number is determined dynamically according to the channel's capacity and interference condition. Conversely, a device to device communication transmitter can simultaneously be allocated many resource blocks according to its bandwidth requirement and the number of its antennas. Furthermore, the optimal amount of data and power transmitted by each device is calculated. The efficiency of this algorithm is evaluated through computer simulations, and the results demonstrate good performance in spectral and energy efficiency with relatively low complexity and convergence time.


INTRODUCTION
The growing number of smart devices and the content sharing between users result in a tremendous increase in wireless data traffic and local service demands. According to [1], there will be more than 30 billion connected devices by the end of 2021, and this trend will keep growing in the next decade. However, with the limited availability of the cellular spectrum and the slow progress in network infrastructure development, the conventional cellular system has become incapable of satisfying these increasing demands.
Facing this situation, device to device (D2D) communication was integrated as one of the key features of 5G mobile networks [2]. It has drawn significant attention for its potential to improve the system performance and the user experience [3]. Its benefits mainly stem from the devices' proximity, the single-hop nature of the communication link, and the reuse of radio resources [4].
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. IET Communications published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
A general session setup of a D2D communication includes the following steps [3]. First, a D2D user initiates a communication request. Then, the eNodeB checks whether the receiver is in proximity. If criteria such as the distance, the interference situation, and the channel quality are satisfied, the eNodeB triggers a direct D2D link between the transmitter and the receiver in overlay or underlay mode [2]. In overlay mode, the communicating pairs use dedicated resource blocks (RBs) to route their data, whereas in underlay mode they dynamically reuse resources occupied by the primary cellular users [3]. The first mode easily splits the spectrum between cellular and D2D communications; however, it wastes resources if the reserved spectrum exceeds the D2D demands. The second enhances the spectrum reuse and the system capacity, yet it generates severe interference between co-channel communications if the resource allocation is performed randomly [4].
To tackle this issue, this paper focuses on a set of D2D challenges and proposes solutions for efficient and scalable underlay D2D communication. We try to answer the following questions: How can the throughput demands of both cellular users (CUEs) and D2D users (DUEs) be synchronized and met without relying on the eNodeB? How can the co-channel interference in uplink and downlink channels be managed? And which measures maximize the network spectral and energy efficiency?
Extensive research efforts have been spent on solving these problems either in a centralized or distributed fashion.
Centralized approaches assume channel state information (CSI) knowledge at the eNodeB. They rely on several tools, such as stochastic geometry [5,6], centralized graph-theoretic approaches [7,8], and mixed-integer programming [9,10], to enhance the network capacity.
Paper [10] considers a centralized power control scheme to maximize the cellular and D2D channel quality. The authors analyse the non-convexity of the problem and propose an alternative sub-optimal solution. Ref. [11] investigates a novel scheme to maximize the sum-rate in one cellular and one D2D scenario. Whereas in [12,13], they consider a partial CSI knowledge of the D2D links and suggest a solution to maximize the device's throughput.
Ref. [14] proposes an interference control mechanism in uplink subcarriers. The D2D pairs broadcast their signal to interference plus noise ratio (SINR) through a control channel. If the received SINR at the eNodeB level is higher than the maximum threshold, it stops scheduling cellular users on the occupied resource blocks.
Inversely, in refs. [15,16], the cellular users measure their power levels and forward them to the eNodeB via a dedicated control channel. Consequently, it avoids allocating the same RBs to high-power D2D transmitters.
Paper [17] proposes a downlink RA scheme to maximize the sum rate. The original concave problem is transformed into a convex formula and solved via approximations [18]. Paper [19] investigates a centralized spectrum and power control scheme. The optimization problem is also formulated as a concave formula and solved via non-linear programming [20].
Limited research investigated centralized full-duplex RA [21][22][23] to manage the co-tier/cross-tier interference. The authors of [21] propose a suboptimal approach, whereas, in paper [23], they use a heuristic suboptimal algorithm to solve the problem. Paper [24] presents a centralized two-step scheme to enhance the spectrum reuse [25]. However, a D2D pair can share only one cellular subcarrier either in uplink or downlink. Moreover, the spectral and energy efficiency were not analysed.
The former centralized approaches assume knowledge of the CSI at the eNodeB. However, this assumption is not practical when the channels vary rapidly with time [10]. Besides, the CSI reporting requires additional resources, while only a limited number is available for network control [13]. It also increases the complexity and generates significant signalling overhead between the eNodeB and the connected devices [26,27].
Hybrid methods proposed joint centralized-distributed RA [28][29][30] to address these issues. However, fully distributed approaches have gained more merit. They generally define a utility function that represents every device's preference [31], then develop distributed resource allocation algorithms via pricing, auctions [32], cooperative [33], and non-cooperative games [34]. They combine the advantages of low information sharing, traffic offload, and low complexity [35,37].
In this context, refs [36,37] propose a two-stage energy-efficient scheme. The first stage deals with the uplink resource allocation, and the second controls the devices' power. Finally, the problem is solved using graph theory. Paper [38] proposes a distributed power control scheme to maximize the uplink D2D SINR in a massive MIMO system. Paper [25] applies a joint mode selection, RA, and power control scheme to minimize the interference at the D2D level.
Ref. [39] proposes a distributed game-theoretic scheme to solve the uplink RA problem in a multi-cell scenario. In [40], the uplink RB assignment between cellular and DUEs is performed based on a Q-learning solution. Paper [41] evaluates the resource allocation in a cognitive D2D network, whereas many other papers consider approximations such as stochastic geometry to manage the uplink spectral resources [42][43][44].
In our previous work [45][46][47], we also focused on distributed uplink RA to avoid the eNodeB downlink interference. In [45], we proposed an interference-aware algorithm based on a mixed-strategy game. This approach prioritizes the RBs occupied by distant cellular users. In [46] and [47], we added a power control stage to find the transmission power and RB that maximize the network EE.
Paper [50] investigates a downlink RA scheme based on eNodeB power mitigation. Paper [51] considers only D2D power control to maximize the system EE, whereas refs. [52][53][54] yield the joint EE and SE maximization for D2D pairs.
Refs. [55,56] propose joint uplink/downlink bipartite matching to optimize the EE in a relay-assisted D2D scenario [44]. They investigate two-stage algorithms to reduce the complexity [25,60]. The first stage manages the power consumption, and the second allocates the resources between the active D2D pairs. However, they ignored the QoS requirements of CUEs [57][58][59].
In [56], the authors proposed a maximum weight bipartite matching [25] to select the RB requiring the minimum power transmission. They suggest a distributed power-based scheme [58] to deal with the interference problem. However, they also focused on D2D performance maximization and ignored the primary cellular communications [59,60].
In light of the related works, we investigate a joint uplink/downlink RA to maximize the total spectral and energy efficiency of the network. Our main contributions are summarised as follows:
(i) The proposed algorithm maximizes the spectrum reuse in both uplink and downlink subcarriers.
(ii) It simultaneously addresses two critical issues in cellular networks: the interference and the power consumption. It improves the cellular capacity and further enhances the network's energy efficiency.
(iii) Given that centralized RA algorithms require additional data broadcast and generate significant signalling overhead, we use a distributed algorithm based on a mixed strategy game.
(iv) Many distributed RA approaches adopt pure strategy games in which a UE can select only one RB. Consequently, the devices must run the algorithm each time slot to be allocated a new RB. This process struggles to converge, especially when the number of devices increases.
(v) The proposed RA algorithm allows many D2D pairs to share the same cellular RB but does not restrict their number in advance. It is set dynamically according to the channel's capacity and interference condition. Besides, a D2D pair can simultaneously be allocated many RBs according to its bandwidth requirement and antennas.
(vi) Each time slot, many devices can access the cell sequentially and play the game.
(vii) The proposed approach calculates the optimal amount of data and power transmitted by each device. To the best of our knowledge, this is the first study that provides such results.
(viii) Since the available bandwidth and the devices' transmission power are essential to satisfy the QoS requirements, the proposed algorithm determines the optimal RB, data and transmission power for each D2D pair based on the UE density, the UE locations and the co-channel devices' power.
(ix) The proposed algorithm reduces the calculation and the complexity. Moreover, it guarantees a fast convergence to the mixed strategy Nash equilibrium (MSNE) due to the limited amount of information broadcast.
(x) Simulation results prove the efficiency of the proposed approach even with a large number of devices.
The rest of the paper is organized as follows. Section 2 presents our system model. In Section 3, we formulate the joint spectral efficiency (SE) and energy efficiency (EE) maximization problem as a mixed strategy non-cooperative game. Section 4 explains the proposed algorithm and solves the RA problem. Section 5 analyses the system performance via computer simulations and compares the obtained results with existing approaches. Finally, Section 6 concludes the paper and proposes some perspectives for future work.

SYSTEM MODEL
The system model comprises two device types, namely cellular UEs (CUEs) and D2D UEs (DUEs). Each CUE is allocated an orthogonal link and communicates with the base station via uplink/downlink channels, whereas DUEs communicate directly in pairs and reuse the same cellular RBs. Accordingly, the achievable data rate of the ith D2D receiver in the RB c is:
$$R_i^c = W \log_2\left(1 + S_i^c\right)$$
where $S_i^c$ and $W$ are the SINR of the ith D2D receiver at the RB c and the channel bandwidth.
Let N be the total number of UEs in the cell, N = {C, D}, where C denotes the number of cellular users and D the number of D2D pairs. The spatial distributions of C and D follow independent homogeneous Poisson point processes (PPP) with intensities $\lambda_c$ and $\lambda_d$ respectively.
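The PPP deployment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulator: the function name `sample_ppp_disk`, the circular cell shape, the 500 m radius and the two expected device counts are assumptions chosen for the example.

```python
import numpy as np

def sample_ppp_disk(intensity, radius, rng):
    """Sample a homogeneous PPP on a disk; returns an (n, 2) array of x, y."""
    area = np.pi * radius ** 2
    n = rng.poisson(intensity * area)           # Poisson-distributed point count
    r = radius * np.sqrt(rng.uniform(size=n))   # sqrt keeps the density uniform in area
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

rng = np.random.default_rng(0)
radius = 500.0                                  # metres (assumed cell radius)
lambda_c = 10 / (np.pi * radius ** 2)           # about 10 CUEs expected in the cell
lambda_d = 18 / (np.pi * radius ** 2)           # about 18 D2D transmitters expected
cues = sample_ppp_disk(lambda_c, radius, rng)
dues = sample_ppp_disk(lambda_d, radius, rng)
print(len(cues), "CUEs and", len(dues), "DUEs generated")
```

Because the two processes are sampled independently, the actual counts vary from run to run around their expected values.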

Uplink and downlink SINR distribution
In a dense network, there is no dominant line-of-sight propagation, so Rayleigh fading is the most applicable model. We assume independent and identically distributed (iid) Rayleigh fading for all the network channels.
Let $g^{Bc}_{i}$ be the channel gain between the eNodeB and the ith CUE, $g^{Bd}_{j}$ the fading coefficient between the eNodeB and the jth DUE, $g^{cd}_{ij}$ the one between the ith CUE and the jth D2D pair, $g^{dd}_{jk}$ the fading coefficient between the jth DUE and the kth DUE, and $g^{dd}_{jj}$ the channel gain between the transmitter and the receiver of the jth D2D pair.
Considering the large-scale fading effects, the interference channel gain between the jth D2D pair and the ith cellular channel in the RB c can be modelled by:
$$g^{cd}_{ij} = |h_{i,j}|^2 \, d_{ij}^{-\alpha}$$
where $d_{ij}$ is the distance between the D2D pair and the CUE, $h_{i,j}$ is a complex Gaussian channel coefficient satisfying $h_{i,j} \sim \mathcal{CN}(0, 1)$, $\alpha$ represents the channel path loss exponent and $N_0$ is the average power of the additive white Gaussian noise (AWGN). We assume symmetry between the downlink and the uplink channels. Accordingly, the communication channel can be modelled as:
$$P_r = P_t \, |h|^2 \, d^{-\alpha}$$
where $P_t$ and $P_r$ are the transmitted and received power.
(i) For a CUE: the interference comes only from the co-channel D2D pairs. (ii) For a D2D receiver: the uplink interference comes from the co-channel D2D communications and the primary CUE, and the downlink interference comes from the co-channel D2D communications and the eNodeB.
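The interference sources listed above can be turned into a numerical SINR with a short sketch. It assumes the common model in which the power gain is $|h|^2 d^{-\alpha}$ with $h \sim \mathcal{CN}(0,1)$, the SINR is the useful received power over noise plus summed co-channel interference, and the rate follows the Shannon formula. The function names and every numerical value (distances, powers, noise level, 180 kHz bandwidth) are illustrative assumptions.

```python
import numpy as np

def rayleigh_gain(distance, alpha, rng):
    """Channel power gain |h|^2 * d^(-alpha) with h ~ CN(0, 1)."""
    h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2.0)
    return np.abs(h) ** 2 * distance ** (-alpha)

def sinr(p_signal, g_signal, p_interf, g_interf, noise):
    """Useful received power over noise plus summed co-channel interference."""
    interference = float(np.dot(p_interf, g_interf))
    return p_signal * g_signal / (noise + interference)

def rate(bandwidth, sinr_value):
    """Shannon rate in bit/s for a bandwidth in Hz and a linear SINR."""
    return bandwidth * np.log2(1.0 + sinr_value)

# Uplink case for one D2D receiver: the only interferer is the co-channel CUE.
rng = np.random.default_rng(1)
g_dd = rayleigh_gain(20.0, 4.0, rng)     # D2D transmitter to its receiver, 20 m
g_cd = rayleigh_gain(150.0, 4.0, rng)    # co-channel CUE to the D2D receiver, 150 m
s = sinr(0.1, g_dd, [0.2], [g_cd], noise=1e-13)
r = rate(180e3, s)
print(f"SINR = {s:.3g}, rate = {r:.3g} bit/s")
```

For the downlink case the same `sinr` call applies, with the eNodeB power and gain replacing those of the CUE in the interferer lists.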
In the rest of the paper, we denote the interference sensed at the ith CUE as $I^{c}_{i,c}$ and that received by the jth D2D pair in the uplink and the downlink subcarriers as $I^{U,c}_{j,d}$ and $I^{D,c}_{j,d}$ respectively. A D2D pair can reuse downlink or uplink cellular channels. Therefore, we obtain the following interference model for the CUE and the D2D receiver.
In the downlink mode:

Joint spectral and energy efficiency
Consequently, the joint uplink/downlink spectral efficiency SE of the ith CUE can be expressed by:
A cellular UE is allocated one RB, whereas a D2D link can be allocated many RBs. Accordingly, the achievable data rate of the jth D2D pair is the sum of the data rates of its allocated RBs, and its joint uplink/downlink system capacity can be expressed as:
The overall network system capacity is given by:
The network SE is maximized when all the transmitted signals arrive at the receiver with the maximum SINR. Accordingly, the system SE and EE optimization can be expressed respectively as:
Subject to:
The SINR is constrained by a minimum threshold to guarantee the QoS. Besides, the transmission powers are kept under a maximum value to limit the energy consumption and reduce the interference.
The objective function defined in Equation (16) is non-convex. It is an NP-hard mixed-integer programming problem.
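To give a feel for the SE/EE trade-off behind this problem, the sketch below evaluates a simple energy efficiency ratio over a range of transmission powers. It is only an assumption-based illustration: EE is taken as the spectral efficiency divided by the consumed power, and the fixed circuit power `p_circuit`, the gain and the noise-plus-interference level are hypothetical values that do not come from the paper.

```python
import numpy as np

def energy_efficiency(p_tx, gain, noise_plus_interf, p_circuit):
    """SE (bit/s/Hz) divided by the total consumed power (W)."""
    se = np.log2(1.0 + p_tx * gain / noise_plus_interf)
    return se / (p_tx + p_circuit)

powers = np.linspace(0.001, 0.2, 200)        # candidate transmit powers in watts
ee = energy_efficiency(powers, gain=1e-9, noise_plus_interf=1e-11, p_circuit=0.05)
best = powers[np.argmax(ee)]
print(f"EE peaks at roughly {best * 1e3:.0f} mW in this toy setting")
```

Under these assumptions the EE first rises with the power and then falls, so the maximizer lies strictly inside the search interval rather than at the power limit.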

MIXED STRATEGY GAME FORMULATION
Game theory has proven to be an efficient tool to model the strategic interaction among rational decision-makers. It addresses games in which each player's gains or losses depend on those of the other participants [45].
The optimal outcome of the game is the Nash equilibrium. It describes a situation where no player can unilaterally improve its payoff while the others' strategies remain unchanged [46].
Pure strategy games have been extensively used to analyse interactive and conflicting decisions in cellular networks. However, convergence is not always guaranteed, especially when the number of players increases. A mixed strategy game, in contrast, can appropriately fit the requirements of the studied problem.

Game theory model
We model the proposed problem via a mixed strategy non-cooperative game. Each UE is a rational player that acts independently of the others and aims to maximize its payoff. The strategy sets of the ith CUE and of the C∖{i} other cellular UEs are denoted $s_i^c$ and $s_{-i}^c$ respectively.
The strategy $s_j^d$ of the jth D2D pair depends on all the strategies taken by the other UEs, $s_i^c$, $s_{-i}^c$, and $s_{-j}^d$. The utility function $U_j$ of the jth player is a mathematical representation of its preferences. It assigns a value (payoff) to each alternative: if the player prefers the strategy S1 to S2, its payoff for S1 is higher than its payoff for S2. We define the utility function of the ith CUE as: And, the utility function of the jth D2D pair as: Similarly, the ith device power mitigation is expressed through the utility function $u_i^{pow}$ as:

Mixed strategy payoff
The mixed strategy $\sigma_i$ of the ith player is a probability distribution over its pure strategy space $S_i$:
$$\sigma_i = (\sigma_i^1, \sigma_i^2, \dots), \quad \sigma_i^j \ge 0, \quad \sum_j \sigma_i^j = 1$$
where $\sigma_i^j$ is the probability assigned to the pure strategy $S_i^j$. $\Sigma = \prod_{i \in N} \Sigma_i$ is the strategy space of the game; that is, when the UEs play a mixed strategy profile $\sigma$, the probability that the pure strategy combination $s = (s_1, \dots, s_N)$ occurs is $\prod_{i \in N} \sigma_i(s_i)$.
In pure strategy games, a player makes only one choice without involving chance or probability. In a mixed strategy game, all the players' payoffs become random variables, and the mixed strategy payoff of the ith player $u_i(\sigma)$ is defined as:
$$u_i(\sigma) = \sum_{s \in S} \Big( \prod_{j \in N} \sigma_j(s_j) \Big) u_i(s)$$
where $u_i(s)$ is the ith player's payoff in the pure strategy space S. Given that a UE's payoff depends not only on its own strategy but also on the strategies taken by the other UEs, a mixed strategy profile is a combination of $\sigma_i$ and $\sigma_{-i}$.
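The mixed strategy payoff can be computed directly as an expectation over all pure strategy combinations. The sketch below follows the textbook definition, assuming the players randomize independently; the helper name `expected_payoff` and the two-player matching-pennies payoff table are illustrative assumptions, not the utility functions of this paper.

```python
import itertools
import numpy as np

def expected_payoff(payoff, sigmas):
    """Expected payoff of one player under independent mixed strategies.

    payoff : array with one axis per player, payoff[s_1, ..., s_N] = u_i(s)
    sigmas : list of 1-D probability vectors, one per player
    """
    total = 0.0
    for combo in itertools.product(*(range(len(s)) for s in sigmas)):
        # Probability that this pure strategy combination is played.
        prob = np.prod([sigmas[k][a] for k, a in enumerate(combo)])
        total += prob * payoff[combo]
    return total

# Toy payoff table of the first player in a two-player, two-strategy game.
u1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
pure_first = np.array([1.0, 0.0])
print(expected_payoff(u1, [uniform, uniform]))       # both players mix uniformly
print(expected_payoff(u1, [pure_first, pure_first])) # both play their first strategy
```

The loop enumerates every combination, so the cost grows exponentially with the number of players; it is meant to show the definition rather than to scale.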

Mixed strategy Nash equilibrium
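Definition 1: A mixed strategy profile σ* is called a mixed strategy Nash equilibrium (MSNE) of the game Γ if:
$$u_i(\sigma_i^*, \sigma_{-i}^*) \ge u_i(\sigma_i, \sigma_{-i}^*), \quad \forall \sigma_i \in \Sigma_i, \ \forall i \in N$$
The MSNE is reached when no player in the game can increase its payoff by unilaterally changing its strategy while the other players keep theirs unchanged.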
Theorem (Nash 1950): Every finite n-player game in strategic form has a mixed strategy Nash equilibrium.
According to Nash's theorem, there is at least one mixed strategy equilibrium for any finite game in strategic form. The MSNE σ* is mathematically represented by the combination of the optimal strategies of all the players in the game:
$$\sigma^* = (\sigma_1^*, \sigma_2^*, \dots, \sigma_N^*)$$
with
$$\sigma_i^* \in \arg\max_{\sigma_i \in \Sigma_i} u_i(\sigma_i, \sigma_{-i}^*)$$
The MSNE represents the optimal resource block and power allocation of all the devices in the cell.
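The equilibrium condition can be checked numerically. The sketch below does so for a two-player game only, which is a simplification of the multi-player setting considered here; the function name `is_msne` and the matching-pennies payoffs are assumptions for illustration. Because a player's expected payoff is linear in its own mixed strategy, it is enough to test deviations to pure strategies.

```python
import numpy as np

def is_msne(A, B, x, y, tol=1e-9):
    """Check the MSNE condition in a two-player game.

    A[i, j], B[i, j] : payoffs of the row and column player
    x, y             : mixed strategies of the row and column player
    """
    u_row = x @ A @ y
    u_col = x @ B @ y
    best_row = np.max(A @ y)   # best pure deviation of the row player
    best_col = np.max(x @ B)   # best pure deviation of the column player
    return bool(best_row <= u_row + tol and best_col <= u_col + tol)

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # toy zero-sum game
B = -A
half = np.array([0.5, 0.5])
print(is_msne(A, B, half, half))                    # uniform mixing by both players
print(is_msne(A, B, np.array([1.0, 0.0]), half))    # row player commits to one action
```

In this toy game the uniform profile passes the test, whereas the profile in which the row player commits to a single action fails because the column player can then gain by deviating.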

PROPOSED MSJUD APPROACH
We consider a cellular network with C cellular and D D2D pairs sharing the same bandwidth. During the pair discovery step, the eNodeB broadcasts the players' coordinates. Accordingly, a DUE can calculate the interference sensed by each cellular link and co-channel D2D pair, and can easily derive the SINR combinations from the distances and the random power distributions. This information serves to develop the strategic form of the game.
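Since the interference terms are derived from the broadcast coordinates, the first practical step is a distance matrix between device groups. A minimal sketch, assuming planar (x, y) coordinates in metres and a hypothetical helper name `pairwise_distances`:

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distances between every row of a (n, 2) and every row of b (m, 2)."""
    diff = a[:, None, :] - b[None, :, :]    # broadcast to shape (n, m, 2)
    return np.linalg.norm(diff, axis=-1)    # shape (n, m)

# Hypothetical coordinates: two CUEs and one D2D receiver.
cue_xy = np.array([[0.0, 0.0], [3.0, 4.0]])
d2d_rx_xy = np.array([[0.0, 0.0]])
dist = pairwise_distances(cue_xy, d2d_rx_xy)
print(dist)
```

Each entry of `dist` can then feed a path-loss model to obtain the interference gain of the corresponding link.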

Mathematical formulation
The strategic form of the proposed n-person non-cooperative game is represented by:
$$\Gamma = \langle N, \{s_i\}_{i \in N}, \{u_i\}_{i \in N} \rangle$$
where N denotes the number of players, $s_i$ is the pure strategy space and $u_i$ is the utility function of the ith player.
The number of possible pure strategies in the game is $P = C \times D$. The space of the pure strategy combinations is the Cartesian product $S = s_1 \times s_2 \times \dots \times s_D$, and all the possible pure strategy combinations can be grouped, one per row, in the $MP_{cs}$ matrix. For instance, suppose a cell with four D2D pairs and three CUEs: $D = 4$ and $|s_1| = |s_2| = |s_3| = |s_4| = 3$. So, the number of pure strategies is $C \times D = 12$ and the number of pure strategy combinations is $P_{cs} = C^D = 3^4 = 81$.
The pure strategy combination matrix $MP_{cs}$ is $P_{cs} \times D$, so the ith player can calculate its payoff according to each combination.
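The example with three CUEs and four D2D pairs can be reproduced by enumerating the combinations explicitly. The sketch assumes each D2D pair picks exactly one of the C cellular RBs per combination; the variable name `mp_cs` mirrors the combination matrix but is otherwise an illustrative choice.

```python
import itertools
import numpy as np

C, D = 3, 4   # three cellular RBs, four D2D pairs

# One row per pure strategy combination, one column per D2D pair;
# entry (r, j) is the index of the RB chosen by pair j in combination r.
mp_cs = np.array(list(itertools.product(range(C), repeat=D)))

print(mp_cs.shape)   # (81, 4)
print(mp_cs[:3])     # first three combinations
```

The row count equals $C^D$, which is why exhaustive enumeration becomes impractical as the number of pairs grows.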
$MU_{cs}$ characterises the payoff matrix of all the players in the game. It is a ($P_{cs} \times D$) matrix, and $u_{ij}$ represents the payoff of the ith player at the jth strategy considering the other players' strategies.
By the same reasoning, we investigate the MSNERA and MSNEPow. The first is a ($C \times D$) matrix that represents the optimal RB allocation of all the devices in the game, and the second provides their optimal transmission power.
The ith column of the final MSNE matrix represents the optimal RB allocation $\sigma_i^*$ of the ith D2D pair; its jth entry is the optimal probability assigned to the jth pure strategy played by the ith D2D pair.
The optimal mixed strategy probabilities associated with a player's pure strategies sum to one: $\sum_{k=1}^{P} \sigma_i^k = 1$. Let $u_i^{\max}(\sigma_i, \sigma_{-i})$ be the optimal payoff of the ith D2D pair when the other players' strategies are fixed at $\sigma_{-i}$.
Theorem: A necessary and sufficient condition for σ to be a Nash equilibrium of the game Γ is to be an optimal solution of the following minimization problem: subject to:

MSJUD Algorithm
The following algorithm summarizes the proposed approach. It requires only the devices' coordinates and the random power distribution to calculate the $MU_{cs}$ matrix and compute the equilibrium. Every time slot, new CUEs and D2D pairs can access the cell and play the game.
All the active DUEs compete for the best RBs. The eNodeB broadcasts the locations and RB allocations of CUEs to D2D in the pair discovery step.
The algorithm iterates until reaching the equilibrium. For any j th D2D pair and at each iteration, the resolution of the utility function provides the optimal resource allocation.
The variable n represents the DUE index. It is initialized to 1 and incremented when the jth DUE payoff converges to its optimal strategy. The iteration index it is also initialized to 1, and $it_{max}$ represents the maximum number of iterations allowed to reach convergence. However, if the optimal SE is reached at $it < it_{max}$, the loop terminates and the algorithm moves to the next D2D pair. The threshold $\Delta = 10^{-3}$ is compared with the difference between the EE obtained at the current iteration and the previous one. If the difference is less than Δ for all the devices, the algorithm has converged.
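The stopping rule described above can be sketched as a generic loop. This is not the MSJUD algorithm itself: `update` stands for whatever step recomputes the allocation and returns the new metric, and `toy_update` is a hypothetical placeholder that simply moves halfway towards a fixed point. Only the threshold Δ = 10⁻³ and the iteration cap come from the text.

```python
def iterate_until_converged(update, state, delta=1e-3, it_max=100):
    """Repeat `update` until the tracked metric changes by less than delta."""
    previous = None
    for it in range(1, it_max + 1):
        state, metric = update(state)
        if previous is not None and abs(metric - previous) < delta:
            return state, metric, it       # converged before the cap
        previous = metric
    return state, previous, it_max         # cap reached without convergence

def toy_update(value):
    """Placeholder step: move halfway towards 1.0 and report the new value."""
    new_value = value + 0.5 * (1.0 - value)
    return new_value, new_value

state, metric, iterations = iterate_until_converged(toy_update, 0.0)
print(f"stopped after {iterations} iterations with metric {metric:.4f}")
```

In a multi-device setting the same test would be applied to every device's metric, and the loop would stop only when all differences fall below Δ.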
According to this process, the optimal payoff matrices of the cellular and the D2D pairs, payoffRACell and payoffRAD2D, are obtained. MSNERA represents the optimal resource allocation matrix, $u_h^{pow}$ is the power utility function of the hth player and MSNEPow[h] is its optimal transmission power.
The proposed distributed algorithm allocates the RBs that maximize the SE/EE trade-off. It operates with a sequential UE access.
After executing the algorithm on a set of random PPP distributions, an average of seven iterations ensures the convergence. The total time taken from generating the random cellular environment to obtain the final MSNERA and MSNEPow matrices is 39.669342 s.
The explanation and the proof of the Nash equilibrium are detailed in Appendix D.

Second Stage MSJUD:
For h = 1 to N do

ANALYTICAL AND SIMULATION RESULTS
In this section, we used MATLAB to analyse the performance of the proposed algorithm. The system model consists of a cell with a 500 m radius and a set of UEs randomly scattered inside. We simulate the geometric distribution of both cellular UEs and DUEs through a Poisson point process (PPP) with intensities $\lambda_c = C$ and $\lambda_d = D$ respectively. After the peer discovery step, close DUEs can form D2D pairs and communicate directly via the cellular RBs. The channel gain coefficient of all the links is small-scale Rayleigh fading with path loss and log-normal shadowing. We used this formula to calculate the gain:
$$g = |h|^2 \, d^{-\alpha}$$
where $d$ is the link distance, $h$ is a complex Gaussian channel coefficient satisfying $h \sim \mathcal{CN}(0, 1)$ and $\alpha = 4$ is the path-loss exponent for multipath fading and shadowing.
Figure 2 shows the locations of D = 18 D2D pairs and C = 10 cellular UEs generated in one simulation. The maximum distance between a D2D transmitter and receiver is 30 m. The D2D transmitters are represented by filled red dots, the receivers by empty circles and the CUEs by blue diamonds. Each UE is identified by its type and id; C: cellular UE, T: D2D transmitter, and R: D2D receiver. From the random PPP distribution, we can easily derive the distances between a D2D transmitter and receiver and between a D2D pair and any CUE. Initially, the D2D pairs are allocated random resource blocks and transmission powers, as shown in Appendix A, Table A1. The number of RBs allocated simultaneously to a D2D link can reach 2 × C.
However, to give clear insights into our mathematical formulation, we set $RB_{max} = 6$. This means a D2D pair can simultaneously reuse six uplink/downlink channels.
Accordingly, we obtain a total number of P = 108 pure strategy combinations representing all the possible D2D allocations of the uplink/downlink channels.
Through the objective function in Equation (16) and based on the mixed strategy combination matrix ($MP_{cs}$), we derive the $MU_{cs}$ matrix. It represents the finite strategic form of the game. Next, we execute our MSJUD algorithm.

FIGURE 3
The SE corresponding to the iteration number

Appendix B, Table B1 represents the optimal cellular RBs obtained through the proposed MSJUD algorithm. We observe that the D2D links change their random RBs and power allocation to select the optimal ones obtained at the equilibrium of the game.
Appendix C Table C1 represents the MSNERA matrix. It shows the probability of the RBs allocation corresponding to each D2D pair. Contrary to the pure strategy approaches, where every D2D is allocated only one RB, the proposed algorithm calculates the optimal probabilities assigned to each D2D link.
Take the example of the fifth D2D pair. Its optimal RB allocation derived from Appendix C, Table C1 is (0.2971, 0.2695, 0.2183, 0.1739, 0.03080, 0.0032). That is, it should divide its data according to these probabilities before routing it via the optimal RBs listed in Appendix B, Table B1.
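One possible reading of this data division is a proportional split of the payload across the six RBs, sketched below. The payload size and the rounding rule are assumptions made for the example. The six listed values sum to about 0.993 rather than exactly 1, so the sketch renormalizes them first.

```python
import numpy as np

# Equilibrium probabilities of the fifth D2D pair, as listed in the text.
probs = np.array([0.2971, 0.2695, 0.2183, 0.1739, 0.0308, 0.0032])
weights = probs / probs.sum()            # renormalize so the shares sum to one

payload_bits = 1_000_000                 # hypothetical payload size
shares = np.floor(weights * payload_bits).astype(int)
shares[0] += payload_bits - shares.sum() # assign the rounding remainder to the first RB

for rb_index, bits in enumerate(shares, start=1):
    print(f"RB {rb_index}: {bits} bits")
```

An alternative reading would draw the RB of each packet at random with these probabilities, which matches the proportional split on average.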
At the equilibrium, each D2D pair is allocated the RBs occupied by cellular and co-channel D2D devices that are far from it. These results match the initial objective functions in Equations (15) and (16) very well: the SE/EE requirement is satisfied when the signal channel gain is larger than the interference gain, and when the D2D transmitter and receiver are close to each other but far from the other interference sources.
The same table shows a remarkable increase in the spectrum utilization rate. The active D2D pairs are allocated most of the RBs. Accordingly, more traffic is offloaded from the eNodeB, and the network capacity is greatly enhanced.
Figure 3 reveals the performance characteristics of the proposed approach. It represents the normalized average EE as a function of the game iteration number. We compare the results obtained through the "second stage MSJUD", which includes the interference and power control steps, the "first stage MSJUD", the distributed pure strategy EE (labelled as "pure strategy UL/DL"), and the random power allocation (named "Random RA").
The results are averaged over 1000 Monte Carlo simulations and normalized by the maximum value. The average EE of the proposed MSJUD algorithm converges to 0.57 after only seven iterations, whereas the spectral-efficient stage converges to 0.49. The EE pure strategy algorithm converges to 0.37 after 14 iterations, and the random RA algorithm fluctuates around 0.26.
The proposed two-stage algorithm significantly outperforms the other approaches. The random algorithm provides the worst EE performance as the RA and transmission powers are randomly selected. The first stage MSJUD algorithm gives intermediate results because each UE is self-interested and wants to maximize its SE rather than its EE. Furthermore, the power consumption is ignored in the optimization process. The pure strategy EE algorithm performs worse than the first stage MSJUD because a D2D pair can be allocated only one RB.
The pure strategy and auction games require additional transformations of the utility function to reach the Nash equilibrium. They present convergence problems, especially as the number of players increases, and often exceed the acceptable convergence time, while the eNodeB needs to make RB decisions every time slot.
Figure 4 displays the averaged energy efficiency/spectral efficiency trade-off corresponding to four resource allocation schemes: the first stage MSJUD, the second stage MSJUD, the distributed pure strategy EE, and the random power allocation.
Again, the proposed two-stage MSJUD provides the highest EE/SE results. The first stage MSJUD selects only the optimal uplink and downlink channels that maximize the spectral efficiency without considering the power control.
The pure strategy EE approach performs worse than the first stage MSJUD because the EE gain achieved by decreasing the transmission power is not always able to compensate for the interference caused by co-channel cellular UE, especially when the DUEs reduce their transmission power.
Simulation results show that there is always a maximum SE value that bounds the highest achievable EE.
In all the presented schemes, the EE gain achieved by the SE increase was not able to compensate for the EE loss caused by the transmission power increase: the EE first increases with the SE, then decreases as the devices require more energy to route the additional data.

FIGURE 5
The EE corresponding to the transmitter power

Figure 5 displays the averaged energy efficiency corresponding to the DUEs' transmission power. In all the presented approaches, the EE first increases with the transmission power, then decreases beyond a threshold t: t = 9 dBm for the proposed MSJUD, and t = 19.5 dBm, t = 19.7 dBm, and t = 20 dBm for the first stage MSJUD, the pure strategy EE, and the random algorithm respectively. The EE decrease is due to the co-channel interference caused by the co-channel devices' power.
The same figure confirms that the proposed MSJUD scheme outperforms the others in terms of SE and EE. Again, the first stage MSJUD exceeds the pure strategy power control scheme. This performance results from the significant SE gain obtained through the cellular resource reuse.
This observation matches the EE formula given in Equation (13) very well: the SE is the numerator of the EE formula.
The random strategy provides the worst results because both the resource and power allocation are performed arbitrarily. The same figure reveals that low CUE transmission power can always improve the D2D SINR. Therefore, when serving D2D users, the algorithm considers both the CUE transmission power and location.
Figure 6 presents the average EE corresponding to the D2D pair distance. In both uplink and downlink transmission, there is no interference between CUEs due to the orthogonality. We observe from Figures 7 and 8 that higher CUE and eNodeB powers cause large interference to the residual D2D pairs.
When the eNodeB power is fixed in the downlink channels, the shorter the distance between a CUE and the eNodeB, the higher the interference sensed by the D2D pairs sharing the same cellular RB.
To transmit the uplink data to the eNodeB with the required QoS, far CUEs must transmit with higher power. Accordingly, the interference to co-channel D2D pairs increases, and their SINR decreases as displayed in Figure 7.

FIGURE 6
The EE corresponding to the D2D pair distance

FIGURE 7
The uplink SINR of D2D pairs corresponding to their location in the cell

When considering a joint uplink/downlink D2D resource allocation, the optimal scheme provides the best compromise between the SINR and the devices' consumed power.
In the downlink scheme, as plotted in Figure 8, the interference sensed by a DUE is generated by the eNodeB and the other co-channel D2D pairs. Therefore, the closer the co-channel CUE is to the eNodeB, the stronger the signal it receives from it, and the lower the co-channel D2D SE/EE. This degradation grows further if the D2D pair is close to the CUE. That is why, at the equilibrium (see Appendix B), DUEs around the cell centre share downlink RBs with cellular UEs that are far from them and close to the cell edge, for instance (D5,C4), (D6,C8), (D11,C5), (D17,C1).
Many co-channel D2D pairs and large distances between the transmitter and receiver degrade the SE. For this reason, the MSJUD algorithm balances the D2D RA load between far ... Inversely, in the uplink, D2D pairs select the RBs occupied by CUEs close to the eNodeB. The CUEs apply the same reasoning. Due to the initial random power distribution, there are high-power and low-power devices. Low-power CUEs, especially those located at the cell edge, may be subjected to interference from high-power co-channel D2D pairs, and vice versa.
In the uplink, the CUEs need more spectrum and power to route their data to the eNodeB. For this reason, at the equilibrium, the proposed algorithm reduces the number of co-channel D2D pairs and their transmission powers. The downlink subcarriers can support more D2D links than the uplink.
CUEs at the cell edge do not share their RBs with nearby D2D pairs. MSJUD allows only far and low-power DUEs to share these RBs.
As expected, in the uplink channels, a CUE closer to the eNodeB has a better SE than one farther away. When a D2D communication uses cellular uplink resources, the transmit power of the DUEs significantly influences both the CUEs and the co-channel D2D pairs, which increases the interference to all the other UEs. Other parameters, such as the number of co-channel DUEs and the distance constraints, affect the D2D performance as in the downlink case. The second stage of MSJUD was implemented to limit the transmit power of the DUEs.
Both the obtained analytical and simulation results confirm the performance of the MSJUD algorithm. It provides a spectrum reuse rate above 90% and satisfies the main requirements of 5G networks.

CONCLUSION
This paper presents a new approach to jointly optimize the uplink and downlink resource allocation of D2D communications underlaying 5G networks. We modelled the optimization problem as a mixed strategy non-cooperative game. Simulation results demonstrate good spectral and energy performances with a relatively low convergence time.
At the same time, the outputs of the proposed algorithm open many perspectives for future work. We considered a traditional homogeneous network in which all the users have the same utility function. However, the more challenging issues in a 5G network are the densification of heterogeneous networks (Het-Nets) and the devices' mobility. More work is therefore needed to integrate these criteria into the utility function.
Moreover, the eNodeB transmits at a high power and generates significant interference to the close D2D communications. Meanwhile, many devices are experiencing inferior channel conditions at the cell edge. As a solution, we propose to proceed with a guard area around the eNodeB and suggest extending the coverage with various relay nodes.

(15) and (16) are non-convex. Yet, they can be transformed into concave functions by using the non-linear fractional programming developed in [25]. Take the example of the ith D2D pair. We define its maximum payoff as:

$b_i$ is the best response of the $i$th D2D transmitter given the other UEs' strategies. It represents the maximum payoff that can be reached by this player.
Here, optimization means that each player maximizes his payoff when the others play their Nash equilibrium strategies. Consequently, player $i$ attempts to minimize the gap between its optimal payoff $b_i$ and the payoff obtained by a possible combination of mixed strategies. The following theorem can be proved:
This theorem shows that the transformed problem, with an objective function in subtractive form, is equivalent to the non-convex problem in fractional form; that is, they lead to the same optimum solution.
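The equivalence between the fractional and the subtractive forms can be sketched with a Dinkelbach-style iteration, assuming that this is the kind of non-linear fractional programming referred to in [25]. The example below is a single-link toy problem and not the utility function of this paper: the link gain, the noise power, the circuit power, and all function names are hypothetical. The ratio SE/(p + p_circuit) is maximized by repeatedly solving the subtractive problem max SE(p) - q (p + p_circuit) and updating q until the subtractive optimum reaches zero.

```python
import math

def spectral_efficiency(p, gain, noise):
    """SE of a single interference-free link (bit/s/Hz); an illustrative model."""
    return math.log2(1.0 + gain * p / noise)

def energy_efficiency(p, gain, noise, p_circuit):
    """Fractional objective: SE divided by the total consumed power."""
    return spectral_efficiency(p, gain, noise) / (p + p_circuit)

def best_power(q, gain, noise, p_max):
    """Maximizer of the concave subtractive objective SE(p) - q * (p + p_circuit).
    Setting its derivative to zero gives p = 1 / (q ln 2) - noise / gain,
    which is then clipped to the interval [0, p_max]."""
    if q <= 0.0:
        return p_max
    p = 1.0 / (q * math.log(2.0)) - noise / gain
    return min(max(p, 0.0), p_max)

def dinkelbach_ee(gain, noise, p_circuit, p_max, tol=1e-9, max_iter=100):
    """Return (p, q), where q approximates the maximum EE and p attains it."""
    q = 0.0
    p = p_max
    for _ in range(max_iter):
        p = best_power(q, gain, noise, p_max)
        # Optimal value of the subtractive problem for the current q.
        gap = spectral_efficiency(p, gain, noise) - q * (p + p_circuit)
        if gap < tol:  # zero gap: q is the optimal ratio
            break
        q = energy_efficiency(p, gain, noise, p_circuit)
    return p, q

p_opt, q_opt = dinkelbach_ee(gain=2.0, noise=1.0, p_circuit=0.5, p_max=10.0)
print(p_opt, q_opt)
```

The iteration stops when the subtractive problem has an optimal value of zero, which is the condition under which both forms share the same optimum.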
The optimization problem of this player can be modelled as follows:
where the combination of the other players' mixed strategies with $s_i^j$ denotes the mixed-strategy combination in which player $i$ plays his $j$th pure strategy, that is, the probability assigned to the $j$th pure strategy is one.
($P_i$) is a non-linear programming problem. Applying the Karush-Kuhn-Tucker (KKT) optimality conditions, we can obtain a local optimum satisfying the KKT first-order necessary and sufficient conditions, and derive the solution.
Lemma 1: A necessary and sufficient condition for a mixed-strategy profile to be a Nash equilibrium of the game $G$ is:
If such a profile exists, then it is nothing other than the optimal solution of the non-linear programming problems ($P_i$) for $i \in N$, with a global optimal value equal to zero.
In the rest, we show that the Nash equilibrium strategy is an optimal solution of a single optimization problem.
Theorem 2: A necessary and sufficient condition for a mixed-strategy profile to be a Nash equilibrium of the game $G$ is that it is an optimal solution of the following minimization problem:
The optimal value of this problem is 0, and $b_i$ at the optimal point gives the expected payoff of player $i$.
Proof of Theorem 2: In light of Lemma 1, the feasible set of ($P_i$) is non-empty, since every finite non-cooperative game has at least one Nash equilibrium.
Thus, if a profile is a Nash equilibrium, it is feasible for ($P_i$):
It follows that this profile is an optimal solution of ($P_i$). Conversely, suppose that a profile together with the payoffs $b_1^*, \dots, b_n^*$ is an optimal solution of ($P_i$); then it satisfies (a), (b) and (c) of Equation (D3).
By Nash's existence theorem, there must exist at least one profile with payoffs $b_1, \dots, b_n$ that is a global minimum of ($P_i$), verifying:
Consequently, the optimal profile is a Nash equilibrium of the game. On account of Lemma 1, the payoff $b_i^*$ is the optimal expected payoff of player $i$.
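The zero-gap characterization used in Lemma 1 and Theorem 2 can be checked numerically on a small game. The sketch below assumes the usual gap formulation, in which $b_i$ is the best payoff player $i$ can obtain with a pure strategy against the others' mixed strategies and the objective sums $b_i$ minus the expected payoff over the players. It is restricted to two players with payoff matrices, and the matching-pennies payoffs and function names are illustrative choices, not data from this paper.

```python
def expected_payoff(payoff, x_row, x_col):
    """Expected payoff of a bimatrix entry under mixed strategies x_row, x_col."""
    return sum(x_row[j] * x_col[k] * payoff[j][k]
               for j in range(len(x_row)) for k in range(len(x_col)))

def nash_gap(payoff_1, payoff_2, x1, x2):
    """Sum over both players of (b_i - expected payoff), where b_i is the best
    pure-strategy payoff of player i against the opponent's mixed strategy.
    The value is non-negative and equals zero exactly at a Nash equilibrium."""
    # Payoff of each pure strategy of player 1 (rows) against x2.
    pure_1 = [sum(x2[k] * payoff_1[j][k] for k in range(len(x2)))
              for j in range(len(x1))]
    # Payoff of each pure strategy of player 2 (columns) against x1.
    pure_2 = [sum(x1[j] * payoff_2[j][k] for j in range(len(x1)))
              for k in range(len(x2))]
    b1, b2 = max(pure_1), max(pure_2)
    return (b1 - expected_payoff(payoff_1, x1, x2)) + \
           (b2 - expected_payoff(payoff_2, x1, x2))

# Matching pennies: the only equilibrium is uniform mixing by both players.
u1 = [[1.0, -1.0], [-1.0, 1.0]]
u2 = [[-1.0, 1.0], [1.0, -1.0]]
print(nash_gap(u1, u2, [0.5, 0.5], [0.5, 0.5]))  # gap at the equilibrium
print(nash_gap(u1, u2, [1.0, 0.0], [0.5, 0.5]))  # gap away from it
```

A solver would minimize this gap over the probability simplices; the sketch only evaluates it, which is enough to verify that a candidate profile is an equilibrium.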
Computing the solution of the formulated n-person non-cooperative finite game is equivalent to solving the non-linear optimization problem ($P_i$). Its constraints and objective function are polynomials, and the number of its variables equals the number of players plus the total number of available pure strategies in the game. To solve this problem, we coded a sub-algorithm in MATLAB, using the sequential quadratic programming based quasi-Newton method [40,42].
The input of the sub-algorithm is the matrix $M$ representing the payoffs $u_i^j$, and its outputs are the variables $x_i$.