Fundamental Limits of Non-Orthogonal Multiple Access (NOMA) for the Massive Gaussian Broadcast Channel in Finite Block-Length

Superposition coding (SC) has been known to be capacity-achieving for the Gaussian memoryless broadcast channel for more than 30 years. However, SC has regained interest in the context of non-orthogonal multiple access (NOMA) in 5G. From an information theory point of view, SC is capacity-achieving in the broadcast Gaussian channel, even when the number of users tends to infinity. However, using SC has two drawbacks: the decoder complexity increases drastically with the number of simultaneous receivers, and the latency is unbounded since SC is optimal only in the asymptotic regime. To evaluate these effects quantitatively in terms of fundamental limits, we introduce a finite time transmission constraint imposed at the base station, and we evaluate the fundamental trade-offs between the maximal number of superposed users, the coding block-length and the block error probability. The energy efficiency loss due to these constraints is evaluated analytically and by simulation. Orthogonal sharing appears to outperform SC for hard delay constraints (equivalent to short block-lengths) and in the low spectral efficiency regime (below one bit per channel use). These results are obtained by combining stochastic geometry and finite block-length information theory.


Introduction
The Internet of things (IoT), connecting objects instead of humans, is one of the major applications of 5G and future generations of communications systems. Moreover, the transition towards machine-to-machine communications calls for an important shift in the theoretical modeling of these systems. Indeed, the IoT paradigm relies on bursty but massive distributed communications to comply with the transmission requests of billions of communicating objects spread over a large area, each transmitting only a few packets per day, month or even year. In such a scenario, the classical fundamental limits of communication systems derived using the tools introduced by Claude E. Shannon [1] need to be revised. From this perspective, the capacity, or the capacity region in the case of multi-user communications, becomes less important relative to other metrics [2,3].
Shannon's seminal second theorem established the capacity of the additive white Gaussian noise (AWGN) channel, which can also be expressed as the fundamental trade-off between energy efficiency (EE) η_E and spectral efficiency (SE) η_S [4]. Let us define η_S := R/W, where R and W are, respectively, the rate in bits/s and the channel bandwidth in Hz. In an ideal system, the duration of a channel use (one symbol in a narrowband transmission) is 1/W, and η_S can be expressed in bits per channel use (bpcu). Now, letting the energy E be normalized with respect to the noise power density N_0, the energy efficiency is defined as η_E := RT/(E/N_0), where T corresponds to the transmission duration. By using the definition of η_S, the following relation holds: η_E = η_S σ²/P, with the average power P = E/T and σ² = W N_0. Therefore, the energy efficiency can alternatively be thought of as a power efficiency metric. In the following, by a slight abuse of notation, we will give η_E the dimension of bits per relative power unit (bppu).
Using this notation, Shannon's channel coding theorem can be written as the fundamental EE-SE trade-off of Equation (1). This trade-off is achievable only in the asymptotic regime, i.e., when the encoding spreads over an infinite number of channel uses (c.u.).
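As a numerical illustration, the trade-off can be traced by sweeping the SNR γ = P/σ². The sketch below assumes the convention η_S = log2(1 + γ) per complex channel use (a per-real-dimension convention would simply introduce a factor 1/2); η_E = η_S/γ follows from the definitions above.

```python
import math

def ee_se_point(gamma: float):
    """One point of the asymptotic EE-SE trade-off, parametrized by the SNR.

    Assumed convention: eta_S = log2(1 + gamma) per complex channel use;
    eta_E = eta_S * sigma^2 / P = eta_S / gamma, as defined in the text.
    """
    eta_s = math.log2(1 + gamma)
    return eta_s, eta_s / gamma

# Sweeping gamma traces the Shannon EE-SE frontier; at low SNR the EE tends
# to 1/ln(2) bppu, i.e., the classical -1.59 dB minimal energy per bit.
print(ee_se_point(1.0))      # (1.0, 1.0)
print(ee_se_point(1e-8)[1])  # close to 1/ln(2) ~ 1.4427
```

Note that the pair (η_S, η_E) automatically satisfies η_E = η_S/(2^{η_S} − 1) under this convention, which is the usual closed form of the asymptotic EE-SE frontier.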
This asymptotic result relies on two assumptions which are no longer valid in the context of the IoT paradigm:
• the traffic is characterized by a continuous data flow;
• the encoding spreads over an infinite number of channel uses and is, hence, free of any latency constraint.
While modeling IoT packets consisting of a few information bits under an ultra-low latency constraint (ULLC), the asymptotic regime becomes irrelevant. The first attempts to derive fundamental limits in the non-asymptotic regime date back to Feinstein and Shannon in the 1950's [5,6]. They provided an achievability bound on the rate considering maximal and average decoding error probability, respectively. A refinement of these results including cost constraints on codewords was provided by Gallager [7]. The problem of achievability and converse bounds on the rate in the finite block-length (FBL) regime has recently received renewed interest with the work of Polyanskiy et al. [8], who studied the fundamental limits of the point-to-point AWGN channel. This has paved the way to study latency and reliability constraints from a fundamental point of view. One of the major results in [8] is the asymptotic expansion of the achievable rate R,

R ≈ C(γ) − sqrt(V(γ)/n) Q⁻¹(ε),    (2)

where C(γ) = (1/2) log2(1 + γ) is the channel capacity in bpcu per dimension under the signal-to-noise ratio γ, V(γ) = γ(γ+2)/(2(γ+1)²) (log2 e)² is the channel dispersion, defined as the variance of the information density between emitted and received codewords, n is the number of c.u., ε is the error probability (average or maximal), and Q(x) = (1/√(2π)) ∫_x^∞ exp(−y²/2) dy. This work has been extended to the multi-antenna case and fading channels in [9][10][11].
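The normal approximation above is easy to evaluate numerically; the sketch below implements C, V and Q⁻¹ with only the standard library (the bisection-based Q⁻¹ is an implementation convenience, not part of [8]).

```python
import math

def q_inv(eps: float) -> float:
    # Inverse Gaussian tail function Q^{-1}; Q(x) = 0.5*erfc(x/sqrt(2)).
    # Solved by bisection to stay within the standard library.
    lo, hi = -40.0, 40.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 0.5 * math.erfc(mid / math.sqrt(2)) > eps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rate_fbl(gamma: float, n: int, eps: float) -> float:
    """Normal approximation R ~ C - sqrt(V/n) * Q^{-1}(eps), per real dimension."""
    c = 0.5 * math.log2(1 + gamma)
    v = gamma * (gamma + 2) / (2 * (gamma + 1) ** 2) * math.log2(math.e) ** 2
    return c - math.sqrt(v / n) * q_inv(eps)

# The finite block-length penalty vanishes as n grows:
for n in (100, 1000, 10_000):
    print(n, rate_fbl(1.0, n, 1e-3))   # capacity C(1) = 0.5 bpcu per dimension
```

The printed rates approach C(1) = 0.5 as n increases, which is exactly the latency-reliability cost this paper quantifies for the multi-user cell.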
The work initiated in [8] has an impact far beyond theory and is of great interest for practical specifications for IoT networks, since the expression in Equation (2) links three fundamental constraints, identified as critical for IoT; i.e., reliability, latency or spectral efficiency, and energy efficiency [12].

Multi-User Finite Block-Length: State of the Art
Information theory has been proved to be a powerful tool to establish fundamental limits of point-to-point (P2P) or multi-user communication systems including the multiple access channel (MAC) and the broadcast channel (BC). These models fit well with the uplink and the downlink in a wireless cell, respectively. In the asymptotic regime, the exact characterization of achievable rate regions has been obtained [2].
According to [13], the Gaussian MAC and BC capacity regions are dual of each other under the transferable power hypothesis. This hypothesis means that in the uplink, the sum-power is constrained but not the individual powers. This MAC with transferable power represents an optimistic model, but it guarantees that BC bounds constitute outer bounds for the MAC, while providing a more tractable expression.
FBL information theory was initially extended to MAC and BC scenarios in [14]. The Gaussian MAC has been particularly investigated in [15,16], among others, and achievable rate regions have been characterized. Interesting results on the dispersion of the Gaussian BC are also reported in [17]. Unsal et al. have also investigated the Gaussian BC dispersion with superposition coding (SC) [18], leading to an achievable bound. However, several issues may limit the applicability of these bounds to the IoT context, starting with the definition of the decoding error probability, which is often a joint probability and thus not suitable for massive connectivity in an IoT context. Moreover, the achievability bound defined over a joint-rate region also limits the insights an IoT operator may extract from these expressions, and existing results are often limited in the number of users considered.
Fundamental bounds with many MAC users have recently been investigated from complementary perspectives. The authors in [19] gave bounds on the joint decoding error probability and capacity region when the number of users grows exponentially or sub-exponentially fast with respect to the number of c.u. and when their communications are asynchronous. The asynchronism and the number of users are linked exponentially with the number of c.u. The main conclusion is that reliable transmission (i.e., vanishing error probability) is impossible when the asynchronism is much larger than the number of users, but it remains possible when the number of users is sub-exponential with respect to the number of c.u. However, the authors focused on the joint decoding error probability and typicality-based decoders. The authors of [20] studied a problem similar to that in [19], but when the number of users K grows linearly with the number of c.u. n, i.e., K = µn, where µ is the user density. Moreover, the authors considered the per-user decoding error probability criterion, which is a much more relevant metric than the joint decoding error probability when the number of users is large. The authors gave achievable and converse bounds on the minimal energy per bit for which reliable communication is possible, i.e., vanishing error probability, in the many-user MAC. However, they did not consider second-order expansions as introduced by [8] for Gaussian channels. In [21], the authors defined the many-access channel, which considers a large number of users in the MAC, and they studied the performance when the number of users grows. This work has been further extended in [22] and provided a fundamental limit for the sum-rate. However, this model is not connected to the physical parameters of the radio cell. In [23], the authors explored the fundamental limit of massive access, taking into account random packet arrivals and decoding error probabilities.
Their model is quite realistic and complementary to our work: the finite block-length regime is not considered, but specific random access policies are evaluated. The impact of random policies has also been investigated in [24], where information aging is controlled.
Compared to these contributions, our work introduces the use of the spatial continuum broadcast and multiple access Gaussian channels (SCBC and SCMAC) [25] to model a spatial density of users and physical channel parameters, associated with the finite block-length analysis, to introduce latency and decoding error probability constraints. This model allows one to obtain an achievable bound for the symmetric rate case in the finite block-length regime.

Contributions and Related Work
In the context of IoT, the dense deployment of a large number of nodes in a finite area implies reconsidering the BC/MAC with a spatial distribution of the nodes leading to SCBC/SCMAC models [25,26], well adapted to represent NOMA cellular systems. This new model provided the fundamental EE-SE trade-off under equal-rate conditions in the asymptotic regime. This trade-off can be interpreted as an equivalent of the asymptotic Shannon capacity for a wireless cell with an ultra-dense distribution of users, when every user requests the same rate. The minimal power requested to satisfy a continuum of users using NOMA has been derived with SC in the asymptotic regime.
While these results provide interesting insights on the maximal load of dense cells, latency and reliability were left out of the study. Hence, the critical question for IoT networks relates to estimating the price of latency and reliability constraints. As we shall see using the FBL formalism, latency and reliability constraints come essentially at the cost of a reduction of the EE-SE region.
To measure this cost, two complementary issues are investigated in this paper. Firstly, to avoid the infinite transmission time induced by the asymptotic regime, a finite time constraint is introduced in the model, following [27] where the idea was introduced. The formal proof is provided, with a discussion on its tightness based on simulation results. Secondly, the transmission errors associated with transmitting small packets over finite time slots are modeled rigorously in the FBL regime. These two contributions allow us to establish an achievable latency-reliability trade-off.
The core of our contribution lies in deriving the minimal requested power to serve a large number of users when the number of channel uses does not tend to infinity. That is, only a finite number of users can be superposed, in contrast to [25,26]. This approach introduces a scheduling problem that can be reduced to a simpler splitting problem. Moreover, we show that for a given number of superposition levels, i.e., a finite number of splits of the cell, the scheduling order of two users belonging to the same level does not have any influence on the minimal power required to serve the requested rate density.
The remainder of the paper is organized as follows. Section 2 introduces the notations and the system model. Section 3 reviews our previous results on the asymptotic SCBC. In Section 4, a finite time transmission (FTT) constraint is introduced. Section 5 deals with the decoding error probabilities associated with the FBL regime and estimates their impact on the achievable EE-SE trade-off. Finally, Section 6 draws conclusions and discusses future work.

System Model
The model described below was first presented in [25].

Model and Parameters
A unique cell area denoted by Ω ⊂ R² is served by a unique base station (BS). In this paper, a unique cell is considered without inter-cell interference. For a multi-cell extension, the reader may refer to [28], where the inter-cell interference from a Poisson point process was considered, or to [29], where a fluid model was used. Further, the impact of the cell geometry distribution has been explored in [30], but the association of the spatial continuum multi-cell model and FBL is beyond the scope of this paper. (Ω, A, m) denotes the corresponding measurable space, with A the Borel σ-algebra on Ω and m the Lebesgue measure. Without loss of generality, the BS is assumed to be located at the point (0, 0).
The measurable space (Ω, A, m) can be extended to (Ω × T, A′, m′), where T = R⁺ represents the time, A′ is the Borel σ-algebra on Ω × T and m′ is the associated Lebesgue measure.
Let U(x, t) be defined as a Poisson point process (PPP) on Ω × T, which represents the packet request arrivals at position x and time t. Thanks to the stationarity of the PPP, for any subset B ∈ A, the average number of user requests per time unit for one realization ũ(x, t) is proportional to the measure m(B). The global average number of requests per time unit associated with the whole cell service area Ω is denoted by U_T.
Definition 1 (Requested rate density). The requested rate density ρ : Ω → R is a Borel measurable function that represents the information rate spatial density ρ(x) requested at point x.
The rate density is expressed in bits per channel use (bpcu) per m². A quantity measured in bpcu can indeed be converted into a physical rate for a real system using the number of channel uses per time unit, which relies on system parameters such as bandwidth, slots or frames. Clearly, a channel use can be interpreted as a resource element (RE) in an orthogonal frequency division multiple access system. However, its meaning is more general and may also correspond to one channel unit in any other access technology.
In addition, we are interested in this paper in the transmission of small packets. The first underlying idea is the transmission of time-constrained small information quantities with no recurrent flows. This aspect is conventional in most papers related to massive access for IoT. The second idea, which relies on an information theory view, considers that one packet is transmitted within a small number of channel uses (typically fewer than a few hundred). Under this assumption, the classical asymptotic regime used in information theory (e.g., [23,24,31]) does not hold, and the finite block-length regime needs to be used [19][20][21][22]. This constraint increases the difficulty of the mathematical analysis, but it also gives access to the fundamental latency versus reliability trade-off.
For the sake of simplicity, we further assume that all packets transport the same information quantity (in bits), denoted by I_0. This scenario is referred to as symmetric information, by analogy with the widely used property called symmetric rates in information theory. This assumption allows us to keep the mathematical model tractable and is reasonable in many IoT applications.
Under this assumption, the requested rate density of the cell relies directly on the user spatial density, ρ(x) = I_0 u(x)/N_cu, where N_cu is the number of channel uses per time unit and represents the bandwidth allocated to the system. The cell sum-rate per channel use is called the spectral efficiency (SE) of the cell, η_S = ∫_Ω ρ(x) dx. For the symmetric information scenario, one has η_S = I_0 U_T/N_cu. In order to connect rate estimates to the physical parameters of the cell, let us define the equivalent noise as the virtual noise level referenced back to the BS, where it matters for power allocation.
Definition 2 (Equivalent noise distribution). In a given radio cell in the downlink, for any receiver located at position x, the equivalent noise power is given by ν(x) = σ²/g(x), where σ² is the receiver noise power and g(x) is the channel power gain associated with this position.
Without any fading or shadowing, the maximal equivalent noise ν_M is obtained at the cell edge, while the minimal equivalent noise ν_m is obtained in the near field of the BS. Note that shadowing and fading are removed from the analysis so as not to clutter the main output of the study, but the analysis is general and can easily be extended to account for fading and shadowing.
The requested rate density ρ(x) is distributed with respect to the equivalent noise associated with each request. Consider the following functions:

G_ν(ν) := (1/η_S) ∫_Ω ρ(x) 1[ν(x) ≥ ν] dx,    f_ν(ν) := −dG_ν(ν)/dν,

where 1[.] is the indicator function.
G_ν(ν) represents the probability with which a packet request is made with an equivalent noise above ν (the noisiest requests). This is nothing but the complementary cumulative distribution function (ccdf) of the equivalent noise with respect to the rate requests. Its derivative, f_ν(ν), is therefore the probability density function (pdf) of ν with respect to the rate distribution. The meaning of f_ν(ν) and G_ν(ν) is illustrated in Figure 1 for a circular cell. These definitions provide the key elements to characterize the set of rate distributions that are achievable under power, latency, spectral efficiency and energy efficiency constraints.

Reference Scenario
Although the model is general, the analytical results will be illustrated, for the sake of clarity, on the simplified reference scenario described below.
The unique cell covers a disk of radius R_c in the downlink mode. A simple power-law pathloss and omnidirectional antennas are considered, with no shadowing. Hence, the channel gain is written as g(x) = g_0 |x|^(−α), where g_0 and α represent, respectively, a reference pathloss and the attenuation slope, and |x| is the geometric distance from the point x to the BS at (0, 0). For numerical results, the following values are used: α = 3.65 and g_0 = σ². Additionally, the transmission power is constrained by a maximal power, i.e., P ≤ P_M. The rate demand is uniformly distributed, i.e., ρ(x) = ρ_0, ∀x ∈ Ω, which relies on two assumptions: the symmetric rate hypothesis and a uniform spatial distribution of requests (u(x) = u_0). It follows that η_S = m(Ω)ρ_0 = πR_c²ρ_0. Under these assumptions, the equivalent noise distribution introduced in the former section is given by G_ν(ν) = 1 − (ν/ν_c)^(2/α), with ν_c = (σ²/g_0) R_c^α the equivalent noise at the cell edge. The reader may refer to [25] for technical details.
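The noise distribution of this reference scenario can be cross-checked by Monte Carlo; the sketch below assumes the uniform-disk model above, for which the ccdf of the equivalent noise is expected to take the form G_ν(ν) = 1 − (ν/ν_c)^(2/α). All numerical values are illustrative.

```python
import math
import random

# Monte-Carlo sketch of the equivalent-noise ccdf G_nu for the reference
# scenario: uniform requests on a disk of radius Rc, pathloss g(x) = g0*|x|^-alpha,
# equivalent noise nu(x) = sigma^2/g(x). All numbers are illustrative.
random.seed(0)
Rc, alpha, sigma2, g0 = 1.0, 3.65, 1.0, 1.0
nu_c = sigma2 / g0 * Rc ** alpha          # equivalent noise at the cell edge

def draw_nu() -> float:
    r = Rc * math.sqrt(random.random())    # uniform point on the disk
    return sigma2 / g0 * r ** alpha

samples = [draw_nu() for _ in range(200_000)]
nu = 0.5 * nu_c
g_emp = sum(s >= nu for s in samples) / len(samples)
g_th = 1 - (nu / nu_c) ** (2 / alpha)      # expected closed form (uniform disk)
print(g_emp, g_th)                          # the two values should be close
```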

Superposition Coding
SC is capacity achieving for Gaussian BC with successive interference cancellation (SIC) [32]. For a given set of N u users ordered according to their channel quality, i.e., from the strongest to the weakest, user 1 can only decode its own signal after having decoded the signals sent to users 2 to N u , user 2 decodes its own signal after having decoded signals from users 3 to N u and so on. To make the rest of the paper clear, the main steps of SC are herein reviewed.
For two coded messages of length n, X₁ⁿ and X₂ⁿ, assumed to be randomly drawn according to two independent distributions P_{X₁}ⁿ and P_{X₂}ⁿ, with average powers P₁ and P₂, respectively, the following holds. All decoding steps are performed in an equivalent Gaussian channel, where Z_iⁿ ∼ N(0, σ² I), ∀i ∈ {1, 2}, according to the following:
• The second user, with the largest equivalent noise, decodes its own signal in a Gaussian channel where the superposed signal X₁ⁿ + X₂ⁿ is received over an equivalent noise of power ν₂. For this receiver, the power of the equivalent additive Gaussian noise is P₁ + ν₂, and its maximum achievable rate in the asymptotic regime, i.e., n → ∞, is C(P₂/(P₁ + ν₂)).
• The first user, with the smallest equivalent noise, performs two decoding iterations. It first decodes the second user's message in a channel where the power of the additive Gaussian noise is P₁ + ν₁, so that the achievable data rate for this step is C(P₂/(P₁ + ν₁)). Then, after canceling the second user's signal, receiver one decodes its own signal and achieves its full data rate, i.e., C(P₁/ν₁), in the asymptotic regime.
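These two-user power allocations can be made concrete in a few lines; the sketch below assumes the per-real-dimension rate convention C(γ) = (1/2) log2(1 + γ), so that a common per-user rate η_u corresponds to the SINR target γ* = 2^(2η_u) − 1.

```python
import math

def sc_powers(eta_u: float, nu1: float, nu2: float):
    """Two-user SC/SIC power allocation for a common target rate eta_u.

    nu1 < nu2 are the equivalent noise powers (user 1 is the strongest).
    """
    g = 2 ** (2 * eta_u) - 1     # common SINR target gamma*
    p1 = g * nu1                 # message 1: decoded interference-free after SIC
    p2 = g * (p1 + nu2)          # message 2: sees P1 as extra Gaussian noise
    return p1, p2

eta_u, nu1, nu2 = 0.5, 1.0, 4.0
p1, p2 = sc_powers(eta_u, nu1, nu2)
# Both links indeed achieve eta_u in the asymptotic regime:
r1 = 0.5 * math.log2(1 + p1 / nu1)          # user 1, own signal after SIC
r2 = 0.5 * math.log2(1 + p2 / (p1 + nu2))   # user 2, message 2 with interference
print(p1, p2, r1, r2)
```

Note that user 1 can always perform the intermediate decoding step, since C(P₂/(P₁ + ν₁)) ≥ C(P₂/(P₁ + ν₂)) whenever ν₁ < ν₂.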

Fundamental Trade-Off with SC
When SC is used, the BS waits for a time T to aggregate a set of packet requests that are transmitted in the next slot to the corresponding nodes in n channel uses. Under no latency constraint, the time T can be taken arbitrarily large, allowing n → ∞, which corresponds to the asymptotic regime. The study of this regime leads to the access capacity region defined in [25] as the set of rate spatial distributions ρ(x) for which an encoder-decoder pair exists such that the transmission error tends to 0 when T tends to infinity.
In comparison with Shannon's asymptotic regime, our model adds a complementary parameter: when n → ∞, the number of randomly distributed nodes (each node represents a message request) also tends to infinity. Since we do not consider individual rates, but individual fixed information quantities I_0, the sum-rate converges to U_T I_0, while individual rates tend to 0, as they are equal to I_0/n. This is illustrated in Figure 2. It is worth noting that the cell's sum-rate tends to its average spectral efficiency. In [25], based on an iterative splitting process, the maximal sum-rate the cell can achieve when a continuum of users is considered has been established. The corresponding fundamental limit is expressed as

Theorem 1 (GSCBC fundamental limit [25]). The achievable EE-SE trade-off for a given rate spatial density ρ(x) is given by the minimal transmission power

P_m = a η_S ∫ t f_ν(t) e^(a η_S G_ν(t)) dt,    (16)

where the integral spans [ν_m, ν_M], a = 2 log(2) and η_E is the energy efficiency.

Figure 2. The asymptotic regime is obtained at the limit when n → ∞. The cell spectral efficiency is kept constant, but the number of packets transmitted simultaneously tends to infinity. Each packet of I_0 bits is spread over n channel uses, and the individual spectral efficiency tends to 0.
This result can be applied to the reference scenario described in Section 2.2 by using Equation (9) in Equation (16), which yields Equation (17), with 1F1(a; b; x) the confluent hypergeometric function ([33], Section 9.21). P̄_m is the minimal transmission power required at the BS to serve the rate spatial density ρ(x). The fundamental EE-SE limit of the corresponding cell is provided by Equation (17). Given the power normalized by the equivalent noise at the cell edge, p_r = P̄_m/ν_c, the EE in bppu is defined as η_E = η_S/p_r, leading to the fundamental EE-SE limit of Equation (18). The EE should be understood as the total number of bits the base station can transmit under a transmission power constraint expressed as the relative sum-power received by an edge user. So, the term 1/η_E plays, for the symmetric SCBC, the role of the classical E_b/N_0 of a point-to-point link. Clearly, Equation (18) is the counterpart, for the symmetric SCBC, of Shannon's second theorem in Equation (1). The symmetric SCBC capacity C(γ) is obtained by inverting Equation (17) with respect to η_S and denoting by γ = P̄_m/ν_M the SNR at the cell edge.

Fundamental Trade-Off with Orthogonal Sharing
A classical alternative to SC is to exploit orthogonal multiple access (OMA), e.g., by time division. In this case, to maximize the symmetric information, the BS allocates a fixed number of channel uses to each packet, and the transmission power used for a node at x is adapted to preserve the spectral efficiency.

Lemma 1 (Achievable bounds with OMA). In a single cell under the spatial continuum model, the fundamental EE-SE trade-off achievable by orthogonal multiple access is as follows.

Proof. See Appendix A.
The corresponding curve is given as a baseline in Figure 4, where the EE-SE limit is represented. The blue curve represents the EE-SE fundamental limit achievable with OMA for α = 3.65, and the red curve represents the fundamental limit established with NOMA (SC), plotted using Theorem 1.
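A toy two-user computation illustrates why SC dominates OMA in the asymptotic regime. The OMA model below (each user active during half of the frame at twice the rate, per-real-dimension convention) is a simplifying assumption for illustration only, not the spatial-continuum bound of Lemma 1.

```python
def oma_sum_power(eta_u: float, nu1: float, nu2: float) -> float:
    # Each user gets half of the channel uses, hence must sustain rate 2*eta_u
    # while active; the power is averaged over the whole frame.
    g = 2 ** (2 * (2 * eta_u)) - 1          # SINR target for the doubled rate
    return 0.5 * g * nu1 + 0.5 * g * nu2

def sc_sum_power(eta_u: float, nu1: float, nu2: float) -> float:
    # Superposition-coding allocation from the previous section.
    g = 2 ** (2 * eta_u) - 1
    p1 = g * nu1
    return p1 + g * (p1 + nu2)

for eta_u in (0.25, 1.0):
    print(eta_u, oma_sum_power(eta_u, 1.0, 4.0), sc_sum_power(eta_u, 1.0, 4.0))
```

In this asymptotic setting SC never requires more sum-power than the OMA baseline; the OMA advantage reported in the abstract appears only once finite block-length penalties are accounted for (Section 5).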
To sum up, this section reported the fundamental EE-SE limit of the SCBC in the asymptotic regime, as derived in [25]. Note that the asymptotic regime refers to a doubly asymptotic regime. Indeed, when n → ∞, it follows that ε → 0, but with the SCBC, the number of nodes transmitting simultaneously also tends to infinity, with individual rates going to 0. However, the sum-rate converges to the SCBC capacity.

Finite Time Transmission Constrained Model
The objective of this paper is to introduce a transmission time constraint into the former model to obtain achievable bounds of NOMA under more realistic assumptions than the doubly asymptotic regime.
Consider the situation in which each packet has to be transmitted in a finite time T ∈ N, i.e., within a finite number of channel uses. For the moment, we still consider arbitrarily low error probabilities, sustained with γ and n sufficiently high. This hypothesis will be relaxed in Section 5.

FTT Formulation
This constraint can be formalized as follows:

Definition 3 (Finite Time Transmission Constraint).
A multi-user network with packets of I 0 bits, with I 0 ∈ N, is said to be FTT constrained when each transmission lasts at most n * channel uses.
This definition only imposes that a packet of I_0 bits is transmitted in at most n* channel uses, but the queuing delay is not controlled. The FTT constraint is then a necessary but not sufficient condition for delay-constrained transmissions. The FTT constraint nevertheless provides interesting insights. For instance, it makes it possible to set up the transmission duration of each packet, thereby controlling the activity time of each receiver.
To assess the symmetric rate fundamental limit of the cell under the FTT constraint, let us recall that the average spectral efficiency (i.e., the sum-rate) of the cell, denoted η̄_s, shall be equal to η̄_s = I_0 U_T/N_cu. When the BS transmits a packet of I_0 bits in n* channel uses, the individual rate for this packet is η_u = I_0/n*. In order to achieve the target spectral efficiency η̄_s, the BS has to use SC to transmit several packets simultaneously. Therefore, the FTT constrained problem is equivalent to the following scheduling problem:

Definition 4 (SC scheduling policy). The following are given:
• a frame of N_cu channel uses of duration T, itself divided into L slots; each slot s_l, ∀l ∈ {1, 2, . . . , L}, contains n* channel uses, so that N_cu = L n*;
• a BS queue containing a random number of packets to be transmitted to a set of nodes N_U, selected according to the PPP U(x, t) restricted to the subset Ω × T.
A SC scheduling policy selects a subset of users N_u(l) ⊂ N_U for each slot s_l, ordered by increasing equivalent noise, i.e., ν_{k+1}(l) > ν_k(l). Decoding is performed at each user according to the SC technique.
The number of users associated with each slot s_l is denoted by N_u(l) = |N_u(l)|, and the corresponding spectral efficiency is η_s(l) = N_u(l) η_u.

Optimal Scheduling Policy
We now propose to determine an optimal scheduling policy in the asymptotic regime.
Definition 5 (Optimal scheduling policy). A scheduling policy for the PPP U(x, t) over Ω × T is asymptotically optimal under a FTT constraint, if all user requests are served within n * channel uses at most, and if the transmission power is minimal over all possible scheduling policies, when T → ∞.
Note this asymptotic regime is conditioned on the FTT constraint and is, thus, more constrained than the regime studied in Section 3.
Let γ_k(l) be the effective signal-to-interference-plus-noise ratio (SINR) of node u_k(l), defined as γ_k(l) = P_k(l)/(Σ_{j=1}^{k−1} P_j(l) + ν_k(l)). This SINR is effective when the appropriate decoding order is used, thanks to the superposition coding principle.
The BS transmission power for slot l is then P_m(l) = Σ_k P_k(l). In this symmetric rate setup, where all nodes require the same SINR γ*, the following lemma holds (the slot index l is omitted for the sake of clarity).

Lemma 2 (Minimal sum-power with SC). Numbering the nodes in reverse order (index k̄), i.e., from the farthest to the nearest, the sum-power satisfies

P_m = Σ_{k̄=1}^{N_u} d_k̄ ν_k̄,  with  d_k̄ = γ*(1 + γ*)^(k̄−1).    (26)
Proof. The proof relies on the decomposition of P_k according to Equation (25), i.e., P_k = Σ_{j=1}^{k} c(k, j) ν_j, where the coefficients c(k, j), represented in Table 1, can be computed recursively, with c(k, k) = γ*, c(k, k − 1) = γ*², and the following recursion for j < k − 1:

c(k, j) = (1 + γ*) c(k − 1, j).    (27)

In Table 1, each row represents the decomposition of the power of one message with respect to the equivalent noises ν_j of all users. In parallel, the sum-power can be computed column-wise first, leading to P_m = Σ_{k=1}^{N_u} d(k) ν_k, with d(k) = Σ_{i=k}^{N_u} c(i, k). Using Equation (27), these coefficients can straightforwardly be rewritten as c(i, k) = γ*²(1 + γ*)^(i−k−1) for i > k. Then, with Equation (28), one obtains d(k) = γ*(1 + γ*)^(N_u−k), leading to the final expression P_m = Σ_k γ*(1 + γ*)^(N_u−k) ν_k. Now, numbering the nodes in the reverse order (denoted k̄ for clarity), i.e., from the farthest to the nearest one, ends the proof.
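The decomposition used in this proof can be verified numerically: the sketch below compares the direct SIC power recursion P_k = γ*(Σ_{j<k} P_j + ν_k) with the column-wise closed form d(k) = γ*(1 + γ*)^(N_u−k), on arbitrary illustrative numbers.

```python
# Numerical check (sketch) of the power decomposition in the proof of Lemma 2.
gamma = 0.8                      # common SINR target gamma*
nu = [1.0, 1.7, 2.9, 5.3]        # equivalent noises, user 1 the strongest
Nu = len(nu)

# Direct SIC recursion: each message sees the already-allocated powers as noise.
p, total = [], 0.0
for k in range(Nu):
    pk = gamma * (total + nu[k])
    p.append(pk)
    total += pk

# Column-wise closed form: d(k) = gamma*(1+gamma)**(Nu-k) (1-based k).
d = [gamma * (1 + gamma) ** (Nu - 1 - k) for k in range(Nu)]
pm_closed = sum(dk * nk for dk, nk in zip(d, nu))
print(total, pm_closed)          # the two expressions coincide
```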
This lemma shows that P_m is a linear combination of the equivalent noises ν_k̄ weighted by the coefficients d_k̄. Each coefficient depends only on γ* and grows exponentially with k̄. The optimal strategy, which minimizes P_m, should obviously allocate the users according to their channel quality.
We draw the reader's attention to the fact that the k̄-th term of the sum in Equation (26) should not be interpreted as the power used to transmit the k̄-th message, but as the additional power induced by the k̄-th equivalent noise level in the sum-power. The power associated with each message is given by Equation (25). Nevertheless, the linear relation of Equation (26) in Lemma 2 is more appropriate to demonstrate the optimality of the proposed scheduling policy.
Consider a set of L slots and a set of users N_U requesting a message, with N_U = |N_U|. A scheduling policy associates N_u(l) users with each slot s_l. Let us recall that, according to our definition, the users u_1(l), . . . , u_{N_u(l)}(l) are ordered with respect to their equivalent noise. Using the notation of Lemma 2, we refer to k̄ as the coding level. A message encoded at level k̄ means that the corresponding receiver needs to decode the packets of lower levels first.

Definition 6 (Natural ordering policy). Assume that N_u(l) = N_u, ∀l, so that N_U = L × N_u. This comes without loss of generality, as shown at the end of this section.
Let the nodes in N_U now be ordered from the strongest to the weakest equivalent noise, from u_1 to u_{N_U}. The natural ordering policy proceeds by assigning the users u_1 to u_L to the first coding level over the L slots. Once the first coding level is filled, the second level is filled, and so on up to the last coding level. It follows that user u_{(k̄−1)L+l} is assigned to slot s_l at coding level k̄. This scheduling policy is illustrated in Table 2.
Theorem 2 (Optimal scheduling). For a given set of users indexed by {1, . . . , N U } and ordered from the strongest to the weakest equivalent noise, the natural ordering policy is optimal with respect to the average transmission power.
Proof. From Lemma 2, it follows that the natural ordering must be used. The remaining question is about the repartition of the users through the different slots.
To prove that the natural ordering policy is optimal, let us consider another policy, for which one of the first L users, denoted u_i, is not allocated to the first coding level but to level k̄_i. Then, there exists a user u_j allocated to the first coding level such that j > L. A simple permutation, denoted π(u_i, u_j), is then sufficient to reduce the sum-power, since the equivalent noise of u_j is lower than that of u_i; the power difference between the two policies is (d_1 − d_{k̄_i})(ν_i − ν_j), which is strictly negative. So, starting from any policy, moving all of the L first users to the first coding level with such permutations reduces the sum-power. Then, proceeding the same way for the higher-order coding levels also reduces the power. At the end of these permutations, each coding level k̄ contains the same users as in the natural ordering policy.
It should also be noted that any permutation between two users at the same coding level leaves the sum-power unchanged; the natural ordering policy is therefore one of the optimal policies. When performing such a permutation, the individual power allocated to each message may change, but with no impact on the sum-power. This completes the proof.
Finally, if N_U = L × N_u does not hold, the same permutations can be used, and one obtains a policy in which the last coding level is only partially filled. In this case, the number of levels (or superposition codes) is given by η_S/η_u.
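The permutation argument above can be checked exhaustively on a toy instance. The sketch below is only an illustration, not the paper's model: it assumes a common per-level SNR target g and the standard SC power recursion P_k̄ = g(ν_k̄ + Σ_{j̄>k̄} P_j̄), with hypothetical noise values, and verifies that the natural split attains the minimal sum-power over all slot assignments.

```python
# Exhaustive check of the natural ordering policy on a toy instance:
# N_U = 4 users, L = 2 slots, N_u = 2 coding levels per slot.
# Assumed model (illustration only): common per-level SNR target g; within
# a slot, the level with the strongest noise is decoded first, so
#   P_last = g * nu_last,   P_k = g * (nu_k + sum of the later powers).
from itertools import combinations

def slot_power(noises, g=1.0):
    """Minimal sum-power of one SC slot under the assumed model."""
    total = 0.0
    for nu in sorted(noises):        # allocate from the last level upwards
        total += g * (nu + total)    # P_k = g * (nu_k + interference)
    return total

def policy_power(slots, g=1.0):
    return sum(slot_power(s, g) for s in slots)

noises = [4.0, 3.0, 2.0, 1.0]        # hypothetical, strongest noise first

# natural ordering: u1, u2 -> first level of slots 1, 2; u3, u4 -> second
natural = policy_power([[4.0, 2.0], [3.0, 1.0]])

# exhaustive search over all ways to split the four users into two slots
best = min(policy_power([list(pair), [n for n in noises if n not in pair]])
           for pair in combinations(noises, 2))

assert natural == best               # the natural split is sum-power optimal
```

Note also that the split {4.0, 1.0}, {3.0, 2.0}, a within-level permutation of the natural one, yields the same sum-power, as stated in the proof.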

Optimal Scheduler When T → ∞
According to the previous result, the optimal scheduler transmits at each round to exactly N_u users, with N_u = η_S/η_u. Let us assume this ratio is an integer; if not, alternating rounds with ⌊η_S/η_u⌋ and ⌊η_S/η_u⌋ + 1 users may be used. The optimal scheduling policy is enforced by partitioning the cell into N_u subsets of equal sum-rate, B̄_k := {x; ν(x) ∈ [ν_k; ν_{k−1})}, as illustrated in Figure 3 for a regular circular cell. The thresholds ν_k are defined with ν_0 = ν_M and ν_{N_u} = ν_m, and such that |B̄_k| = U_T/N_u, ∀k, with U_T the total number of requests. Once this partition is done, at each slot the BS picks one user per subset and transmits to these users with SC. This scheduler achieves the minimal average power when T and L tend to infinity. Indeed, the partition B^(∞) converges to a partition in which all B̄_k have equal surface (due to the properties of the uniform PPP model), and the asymptotic average power P̄ can be expanded in terms of ν̄_k, the average equivalent noise over the k-th subset.
Using the expression of d̄_k and expanding γ*, one obtains the expansion of Equation (38); the last approximation holds when the number of subsets is sufficiently large that aη_u ≪ 1.
Interestingly, this result can be compared to the fundamental limit established in Theorem 1. The expressions are similar, except that the continuous integral has been replaced by a discrete sum and the equivalent noise ν by ν̄_k. The term aη_S G_ν(t) in the exponential is replaced by its discrete version aη_u k, and f_ν(t) by 1/N_u. It is then straightforward to show that Theorem 1 is obtained as the limit of Equation (38) when N_u tends to infinity, i.e., when the constraint n* → ∞, which proves the doubly asymptotic optimality of this scheduler.
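The threshold partition above can be sketched numerically. The sketch below is only illustrative and rests on stated assumptions rather than the paper's derivation: a unit-disk cell with g_0 = σ² = 1, equivalent noise ν(x) = r^α with α = 3.65 (the values of the application example below), and thresholds ν_k taken as empirical quantiles so that each subset B̄_k holds U_T/N_u requests.

```python
# Illustrative construction of the equal-population partition of a cell.
# Assumptions (hypothetical): unit-disk cell, g0 = sigma2 = 1,
# equivalent noise nu(x) = r**alpha with alpha = 3.65, U_T uniform requests.
import math
import random

random.seed(0)
alpha, N_u, U_T = 3.65, 4, 10_000

# uniform requests in the unit disk: r = sqrt(U) with U uniform in [0, 1)
nus = sorted(math.sqrt(random.random()) ** alpha for _ in range(U_T))

# thresholds nu_k as empirical quantiles: each subset holds U_T / N_u requests
cut = U_T // N_u
thresholds = [nus[k * cut] for k in range(1, N_u)]

subsets = [[] for _ in range(N_u)]
for nu in nus:
    k = sum(nu >= t for t in thresholds)   # subset index of this request
    subsets[k].append(nu)

# at each slot, the BS would pick one user per subset and serve them with SC
assert [len(s) for s in subsets] == [cut] * N_u
```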

Application Example
The former analytical results are applied to the reference scenario of Section 2.2 and represented in Figure 4 with the cross curves for different numbers of SC layers (indicated by N_u). The path loss exponent is α = 3.65, and the reference path loss and the noise power are normalized, i.e., g_0 = σ² = 1. The two asymptotic curves are given in blue for OMA (Lemma 1) and in red for NOMA (Theorem 1). The orange curve with diamonds corresponds to 2-user NOMA. The green, cyan and magenta curves with squares, stars and circles are obtained with 4-, 8- and 16-user NOMA, respectively. These curves are obtained by applying Equation (37), where η_u is simply the total spectral efficiency η_S divided by the number of users, since all users receive the same amount of information. These curves highlight the performance loss when the number of superposed codes is equal to or lower than 4. The model also shows that the fundamental limit established for NOMA in the asymptotic regime is almost achievable with a reasonable number of coding levels: 90% of the gain is achieved with only 4 coding levels, and even 30% of the gain is achieved with 2 coding levels.
In addition, the capacity of the cell is represented in Figure 5 under the same conditions. In both figures, one can note the sub-optimality of OMA in the doubly asymptotic regime, as well as the quick convergence to the optimal performance, i.e., the EE-SE Pareto front in Figure 4 and the asymptotic cell capacity in Figure 5, as the number of cell partitions grows.
Both figures highlight the interest of the fundamental limit given by Theorem 1, which is almost achievable with a NOMA strategy.

Finite Block-Length (FBL) Constrained Model
The last step addressed in this section, needed to compare practical NOMA schemes to the fundamental limit, is to relax the error-free assumption and cope with the FBL regime, which is more appropriate for small packets.
We herein develop an approximation of the achievability bound with a NOMA scheme (SC) by exploiting the normal approximation derived in [8] for a point-to-point transmission and reviewed in Equation (2).
For a fixed number of bits I_0 to be transmitted, Equation (2) provides a relationship between P_m (through γ), n and ε. We first consider a 2-user BC before generalizing to the N_u-user BC. In the following, we denote by ε_{i,j} the decoding error probability of message j at user i.
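This relationship can be inverted numerically. The sketch below is a hedged illustration of that computation: it keeps only the second-order term of the normal approximation (dropping the O(log n/n) correction), with C(γ) = log₂(1+γ) and dispersion V(γ) = γ(γ+2)/(2(γ+1)²) · log₂(e)², and finds the minimal SNR by bisection, exploiting the monotonicity in γ discussed below.

```python
# Inverting the normal approximation: minimal SNR gamma* for (I0, n, eps).
# Assumption: only the second-order term of Equation (2) is kept.
import math
from statistics import NormalDist

LOG2E = math.log2(math.e)
Qinv = NormalDist().inv_cdf              # Q^{-1}(eps) = inv_cdf(1 - eps)

def achievable_bits(g, n, eps):
    """Approximate bits decodable at SNR g over n channel uses with error eps."""
    C = math.log2(1.0 + g)
    V = g * (g + 2.0) / (2.0 * (g + 1.0) ** 2) * LOG2E ** 2
    return n * C - math.sqrt(n * V) * Qinv(1.0 - eps)

def min_snr(I0, n, eps):
    """Smallest g with achievable_bits(g, n, eps) >= I0 (log-scale bisection)."""
    lo, hi = 1e-9, 1e9
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if achievable_bits(mid, n, eps) < I0:
            lo = mid
        else:
            hi = mid
    return hi

# example: I0 = 40 bits, n = 100 channel uses, eps = 1e-3
gamma_star = min_snr(40, 100, 1e-3)
# the FBL penalty: gamma* exceeds the asymptotic requirement 2**(I0/n) - 1
assert gamma_star > 2 ** 0.4 - 1
```

Given γ*, the minimal power for a user with equivalent noise ν follows as P = γ*ν, plus the interference term once SC is introduced.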

Achievable Minimal Power for the 2-User Gaussian BC
Recall that, in our setup, the BS aims at transmitting two independent packets of I_0 bits each to two users over Gaussian channels, in at most n channel uses and with an average individual error probability lower than ε* for each user, i.e., ε_i ≤ ε*, ∀i ∈ {1, …, N_u}.
Considering the target rate R* = I_0/n and assuming the interference caused by the other user to be Gaussian, we obtain the constraint of Equation (40) for the weakest user. Contrary to the asymptotic situation described in Section 4, the target SNR value needs to be adapted for each user as a consequence of the SC technique.
Considering user 2, γ*_2 is obtained as the unique solution of Equation (40) for a given tuple (n, ε*, I_0); the solution is unique because this equation is monotonically increasing with respect to γ. This imposes a relation between P_1 and P_2. Considering now user 1, it first decodes message 2 with a lower error probability, noted ε_{1,2} < ε*, because its SNR is stronger.
By the union bound, the decoding error probability of user 1 is bounded by the sum of the decoding error probabilities associated with the two messages, ε_1 ≤ ε_{1,1} + ε_{1,2}. Then, to keep a global error probability lower than ε*, the error probability on its intended message should satisfy ε_{1,1} ≤ ε* − ε_{1,2} (42).
Therefore, the minimum required SNR for the strongest user, γ*_1, is the solution of Equation (43); it is larger than γ*_2 because the error constraint is tighter. Solving these equations provides the minimal transmission powers P_1 and P_2. Although an analytic expression cannot be written, numerical computation is straightforward.
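Since no closed form exists, the 2-user solution can be computed numerically. The sketch below is a hedged illustration under the truncated normal approximation, with hypothetical equivalent noises ν_1 = 4 and ν_2 = 8 (user 1 stronger) and the parameters (I_0 = 40, n = 100, ε* = 10⁻³); a small fixed-point loop is used because ε_{1,2} depends on the powers it helps determine.

```python
# 2-user SC power computation (illustrative; normal approximation only).
import math
from statistics import NormalDist

LOG2E = math.log2(math.e)
ND = NormalDist()

def _cv(g):
    """AWGN capacity (bits) and dispersion at SNR g."""
    return math.log2(1.0 + g), g * (g + 2.0) / (2.0 * (g + 1.0) ** 2) * LOG2E ** 2

def min_snr(I0, n, eps):
    """Minimal SNR delivering I0 bits in n uses with error eps (bisection)."""
    lo, hi = 1e-9, 1e9
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        C, V = _cv(mid)
        if n * C - math.sqrt(n * V) * ND.inv_cdf(1.0 - eps) < I0:
            lo = mid
        else:
            hi = mid
    return hi

def err_prob(g, n, I0):
    """Decoding error probability of I0 bits at SNR g in n uses."""
    C, V = _cv(g)
    return 1.0 - ND.cdf((n * C - I0) / math.sqrt(n * V))

# hypothetical noises: user 1 stronger (lower equivalent noise) than user 2
nu1, nu2, I0, n, eps_star = 4.0, 8.0, 40, 100, 1e-3

g2 = min_snr(I0, n, eps_star)          # gamma*_2: the weak user's constraint
P1 = 0.0
for _ in range(20):                    # fixed point: eps_12 depends on P1
    P2 = g2 * (nu2 + P1)               # P2 / (nu2 + P1) = gamma*_2
    eps12 = err_prob(P2 / (nu1 + P1), n, I0)  # user 1 decoding message 2
    g1 = min_snr(I0, n, eps_star - eps12)     # tightened budget eps* - eps_12
    P1 = g1 * nu1                      # after SIC, user 1 sees nu1 only

assert eps12 < eps_star and g1 >= g2 and P2 > P1 > 0
```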

Impact of the Power Sharing between P_1 and P_2
In Equation (40), we determined the minimal power achieving the error target of user 2. However, using a larger power P_2 can be justified from a theoretical point of view, since it would reduce ε_{1,2}, thereby allowing P_1, the solution to Equation (43), to be reduced. The influence of reducing ε_2 on the sum-power is illustrated in Figure 6 for the simulation parameters described in Section 2.2 and a target individual error probability ε* = 10⁻³. The sum-power is plotted for two information sizes and block-lengths, (I_0 = 40, n = 100) and (I_0 = 400, n = 1000). Each curve is obtained with the users positioned at distances r_1 and r_2 from the BS.
The reference solution obtained with ε_2 = ε* is on the right of each plot (indicated with a plain circle). A sum-power reduction by increasing P_2 and reducing P_1 exists, but it is significant only when r_1/r_2 approaches 1. Clearly, when SC is used for users at significantly different positions, the reference solution is nearly optimal. This is because, when the SNRs of the two users are sufficiently different, ε_{1,2} ≪ ε_2 = ε*, so the impact of ε_{1,2} in Equation (43) is negligible.
Figure 6. Minimal sum-power P_m = P_1 + P_2 for the 2-user Gaussian BC with SC, for different ratios between BS-user distances r_1/r_2 and with respect to the decoding error probability ε_2 of user 2. The plain circles indicate the reference solution, ε_2 = ε*. Reducing ε_2 increases P_2 but permits a decrease in P_1; when the reduction in P_1 exceeds the increase in P_2, a power gain is obtained. This gain is more significant for small packets. Note that the vertical scale has been magnified to highlight the sum-power variations, which are relatively small compared to the absolute sum-power values.

Achievable Power for the N-User BC
Extending the former result to the N-user Gaussian BC with SC is straightforward when the power of each user is optimized according to Equation (43). At each level, an additional penalty on the error is introduced, and γ*_k is the solution of Equation (45). The sum introduced in the Q⁻¹ function shows how the error probabilities accumulate, which is the key issue of the SC approach in the FBL regime.
The consequence for the sum-power follows from iterative relations in which γ*_k is the solution of Equation (45). The NOMA-achievable EE-SE trade-off for two block-lengths (n = 100 or n = 1000) and an individual error probability threshold ε* = 10⁻³ is represented in Figure 7, with the iterative power allocation described above and a number of coding levels N_u ∈ {1, 2, 4, 8, 16}. The EE-SE trade-off of OMA in the asymptotic and FBL regimes is also plotted for reference.
Clearly, for small block-lengths (n = 100, Figure 7a), the achievable region shrinks the most with 16-user SC due to the impact of error accumulation. The best FBL SC configuration is 4-user SC (green curve) in the moderate to high spectral efficiency regime, where 2-user SC is almost optimal as well. In the low spectral efficiency regime (below one bit per channel use), OMA (dotted blue line) outperforms NOMA. When the block-length is larger (n = 1000, Figure 7b), OMA remains optimal in the low spectral efficiency regime.
However, in the moderate to high spectral efficiency regime, the degradation of SC reduces significantly, and all NOMA schemes outperform OMA. In this situation, 4-user or 8-user SC performs best.
An important conclusion is that SC is inappropriate for very small packets at low SNR. This is in line with [20], which pointed out the better performance of OMA, compared to a full decoder, when the density of users μ = K/n ≪ 1, with K the number of users. Note that [20] considered a MAC scenario while we consider a BC in this paper. However, thanks to the MAC-BC duality, the conclusions can readily be transposed to the MAC scenario because, in both cases, successive decoding is used as a baseline.

Conclusions
In this paper, we proposed an analytic model to evaluate the performance of NOMA with many users when the transmission time is constrained and when small packets are transmitted. To this end, we merged the spatial continuum model introduced in [25] with the finite block-length second-order rate expansion introduced by Polyanskiy et al. [8].
We first show that the fundamental limit obtained with the spatial continuum model is relevant, as it can be approached with a reasonable number of superposition coding layers when the messages are transmitted over large block-lengths. This result justifies using the proposed fundamental limit (Theorem 1) to optimize the design of cellular networks for NOMA IoT cells.
By exploiting an SC scheme in FBL, we further show the performance degradation when n falls below 1000. However, it is worth mentioning that our FBL analysis relies on assumptions that prevent us from claiming that NOMA is necessarily worse than OMA in FBL: (i) we used the normal approximation, (ii) we imposed an SC strategy, and (iii) we used a sub-optimal reference power allocation in SC.
Even if the normal approximation has been observed to be tight by simulation, the classical Berry-Esseen bounds are not sufficient to prove this tightness [8,16,17]. The recent paper [34] explores the tightness of saddle-point approximations for the P2P channel and could be used in the future to derive tighter bounds for the N-user BC. Nevertheless, additional simulations, not presented here for the sake of conciseness, show that the degradation of SC in the FBL regime is not due to this approximation.
Concerning the SC strategy, which is clearly responsible for the performance degradation due to the successive decoding algorithm, an open question is whether a dirty paper coding technique in FBL could outperform SC. Answering this question may rely on [17,35].

Acknowledgments:
The authors thank H. Vincent Poor for his insights in the development of the initial model. They also thank all members of the ANR project Arburst, especially Laurent Clavier, who contributed to the development of this paper through insightful comments. The authors also thank Samir Perlaza for his suggestions and proofreading.

Conflicts of Interest:
The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript: