Deep Learning-Based Blind Multiple User Detection for Grant-free SCMA and MUSA Systems

Massive machine-type communications (mMTC) in 6G requires supporting a massive number of devices with limited resources, posing challenges for efficient random access. Grant-free random access and uplink non-orthogonal multiple access (NOMA) are introduced to increase the overload factor and reduce both transmission latency and signaling overhead in mMTC. Sparse code multiple access (SCMA) and multi-user shared access (MUSA) are introduced as advanced code-domain NOMA schemes. In grant-free NOMA, machine-type devices (MTDs) transmit information to the base station (BS) without a grant, making it a challenging task for the BS to identify the active MTDs among all potential devices. In this paper, a novel pre-activated residual neural network-based multi-user detection (MUD) scheme for grant-free SCMA and MUSA systems in an mMTC uplink framework is proposed to jointly identify the number of active MTDs (the received signal's sparsity) and the active MTDs themselves in the absence of channel state information. A novel residual unit is designed to learn the properties of multi-dimensional SCMA codebooks, MUSA spreading sequences, and the corresponding combinations of active devices with diverse settings. The proposed scheme learns from a labeled dataset of received signals and identifies the active MTDs without any prior knowledge of the device sparsity level. A calibration curve is evaluated to verify the model's calibration. The application of the proposed MUD scheme is investigated in an indoor factory setting using four different mmWave channel models. Numerical results show that when the number of active MTDs in the system is large, the proposed MUD achieves a significantly higher probability of detection compared to existing approaches over the signal-to-noise ratio range of interest.


I. INTRODUCTION
Massive machine-type communications (mMTC) is foreseen as one of the leading service classes for sixth-generation (6G) wireless communication systems [3]. mMTC supports a wide range of industrial, medical, commercial, defense, and general public applications in the Internet-of-Things domain. mMTC focuses on supporting an uplink-dominated massive number of low-power and low-complexity devices that sporadically transmit short data packets [4] at low transmission rates during short active periods. Therefore, the conventional four-step scheduling-based multiple access schemes, in which the base station (BS) allocates orthogonal time/frequency resources to each device, are inefficient because of the heavy signaling overhead and high access delay [5].

A. State-of-the-art
Grant-free non-orthogonal multiple access (NOMA) schemes have been proposed as a promising solution [6] to overcome the above-mentioned limitations. Grant-free access allows the machine-type devices (MTDs) to communicate pilot and information symbols without acquiring a transmission grant, thereby significantly reducing the signaling overhead. NOMA techniques have been extensively investigated to potentially support a massive number of MTDs by sharing a limited amount of time and frequency resources in a non-orthogonal manner [6]. This approach causes inter-user interference because orthogonality is violated. Therefore, NOMA applies device-specific non-orthogonal sequences to mitigate the inter-user interference [6].
In this regard, several signature-based NOMA schemes have been proposed based on device-specific codebook structures, interleaving patterns, delay patterns, scrambling sequences, and spreading sequences [7]. From this perspective, sparse code multiple access (SCMA) was designed by Huawei as a code-domain NOMA technique [8], and multi-user shared access (MUSA) was introduced by ZTE [9] as a spreading-sequence-based NOMA scheme.
Recognizing the underlying challenge of grant-free access, namely that each MTD transmits information without scheduling, a mechanism is required at the BS to detect the active MTDs among all potential devices in the network. This procedure is defined as multi-user detection (MUD).
In an mMTC network, a single MTD is not active for long periods, and only a few devices are active in a particular time frame. Due to the infrequent nature of mMTC traffic, the activity vector, which marks the active devices out of all available MTDs, can be represented as a sparse vector. This problem can be formulated as a sparse signal recovery problem by considering the sparsity of the activity vector. Furthermore, based on the antenna configuration of the BS, this problem can be regarded as a single measurement vector (SMV) MUD problem when the BS has a single antenna, or a multiple measurement vector (MMV) MUD problem when the BS has multiple antennas. Solving this problem involves two sub-problems: i) how many devices are active, and ii) which devices they are.
A number of studies based on compressive sensing (CS) theory have recently been proposed to exploit the sparse characteristic of device activity [10]-[14]. The study [10] proposes a maximum a posteriori probability (MAP) based approach to identify active devices and their corresponding data symbols simultaneously. The authors in [11] proposed a generalized MMV-based CS approximate message passing (AMP) algorithm for joint active user detection (AUD) and channel estimation (CE) by considering the sporadic traffic nature and the virtual angular-domain sparsity of massive multiple-input multiple-output (MIMO) channels. The work in [12] investigates the structured sparsity of the active users and proposes a low-complexity structured CS-based iterative algorithm to detect the active users and data jointly. To improve the MUD performance, the authors in [13] propose a prior-information-aided adaptive subspace pursuit (PIA-ASP) algorithm to exploit the intrinsic temporal correlation of active user support sets over several continuous time slots. Moreover, a dynamic CS-based MUD is proposed in [14] to detect both active users and their corresponding data jointly over several continuous time slots by exploring the temporal correlation of the active user sets. In addition, the designers of SCMA and MUSA initiated the work on blind MUD [15], [16]. Reference [15] proposes an iterative algorithm to detect the active pilots and a decoding mechanism for the users' data. In [16], the authors present MUD without a reference signal at the BS for MUSA. First, the spreading codes used by the active users are estimated by an iterative algorithm. Then the active users are detected by blind equalization.
With a high overloading ratio (OR), where the number of devices is higher than the number of resources, the performance of the previously mentioned CS-based approaches [10]-[16] degrades due to the increased correlation between the columns of the sensing matrix. Furthermore, the performance degradation grows as the sparsity of the input vector increases in mMTC. Besides, conventional CS-based solutions strongly depend on the channel estimation quality; however, perfect channel estimation cannot be accomplished in practical mMTC. The aforementioned iterative algorithms also take a considerable amount of time to converge, which increases the communication latency. Even though a significant amount of literature has been published based on conventional algorithms, these works have not entirely addressed the realistic nature of MUD for mMTC. Taken together, developing a practically feasible, scalable MUD scheme for grant-free SCMA and MUSA for mMTC is a challenging open problem. Cutting-edge explorations and expansions of machine learning have started to address critical problems in wireless communication, driven by the advancement of computational capabilities and algorithm complexities [17]. From this viewpoint, two studies have attempted to explore the MUD problem by deep learning, e.g., [18], [19]. Both studies propose MUD for a grant-free low-density signature (LDS) scheme. In [18], the authors present deep learning-based parallel receivers with a softmax estimator for each distinct sparsity level. However, sparsity estimation based on softmax thresholding performs better only for specific sparsity levels where the predefined threshold value satisfies the estimation conditions. Furthermore, designing a distinct MUD strategy for each sparsity level is impractical, and it limits the algorithm's scalability.
From a theoretical point of view, softmax takes the values of the preceding layer and produces a probability distribution whose output values are interrelated. However, the initial problem formulation consists of independent devices; therefore, using softmax is not a proper approach. Moreover, relying on ensemble learning risks carrying a high bias toward its aggregate and is undoubtedly expensive in computational and implementation complexity. In the study [19], the authors fix the number of active devices during training, which does not enable the deep neural network (DNN) to learn the entire codebook of the network. Therefore, it induces misdetection during practical implementation. The approaches in [18], [19] are unsatisfactory because they do not wholly consider realistic mMTC scenarios and deep learning design criteria. Consequently, there is a need to develop a practically feasible deep learning-based MUD for the SCMA and MUSA schemes.

B. Contribution
The major contributions of this study are as follows:
• We propose a novel pre-activated residual neural network (ResNet)-based blind DNN-MUD architecture for two different grant-free NOMA schemes: SCMA codebooks and MUSA spreading sequences with diverse ORs. Our proposed architecture learns the correlation between the received signal and the device-specific codebooks for SCMA and spreading sequences for MUSA during offline training, then jointly detects the received signal's sparsity and the corresponding active devices by a sigmoid estimation online without channel state information (CSI).
• We formulate an SMV sparse signal recovery problem for jointly recovering the sparsity and the active devices. Further, we re-structure the received measurement vector by separating the real and imaginary components and stacking them as an input vector to construct the training data, while annotating the status of the MTDs to generate the training labels. The training data and corresponding labels are given to the DNN to learn the internal parameters, including the codebook and spreading sequence entries and the combinations of active devices in the mMTC system.
• Since there are no readily available SCMA codebooks and MUSA spreading sequences for massive devices, we design SCMA codebooks for different ORs with constellation rotation and interleaving based on [8], [20] and generate a feasible set of MUSA spreading sequences based on [9]. However, it is challenging to choose the best MUSA spreading sequences from the available set; therefore, we propose a scalable heuristic algorithm to determine the spreading sequences with the required orthogonality factor among the potential set to reduce the correlation between the MUSA spreading codes.
• We further expand our DNN algorithm to address the MUD of the MMV system, where we separate the real and imaginary elements of each antenna measurement and serially stack them to construct the input training data. However, we use the same SMV technique to annotate the training labels.
• We broadly evaluate the proposed DNN-MUD performance with a comparison of existing well-known algorithms such as least squares-block orthogonal matching pursuit (LS-BOMP) [21], complex-AMP (C-AMP) [22], and stagewise orthogonal matching pursuit (stOMP) [23] algorithms. Also, we examine the SMV scenario for the mmWave indoor factory environment [24] to validate the performance of the DNN-MUD in realistic circumstances.
• We derive the computational complexity of the proposed DNN-MUD architecture and provide the complexity of the previously mentioned well-known algorithms to compare the collective effectiveness with MUD.
The structure of our paper is as follows. Section II presents the system model and problem formulation, including the concept, generation, and code selection of SCMA codebook and MUSA spreading sequences. Section III describes the deep learning approach and MUD structure for SMV and MMV systems. Section IV shows the complexity, training, and testing data generation. Section V presents the simulation setup, parameters, and numerical results. Finally, Section VI concludes the study.
Notations: Boldface uppercase, boldface lowercase, and lowercase letters represent matrices, vectors, and scalars, respectively, whereas calligraphic letters denote sets. R and C denote the space of real and complex numbers, respectively. The operations (·)^H and (·)^T denote conjugate transpose and transpose, respectively. ℜ(s) and ℑ(s) are the real and imaginary parts of a complex number s, respectively. I denotes the identity matrix, whose size is evident from the context.
In addition, the complex Gaussian distribution with zero mean and variance σ²_w is represented by CN(0, σ²_w). The Hadamard (element-wise) product operator is denoted by ∘. Also, the absolute value of the complex number x is represented by |x|, and the Euclidean norm of the vector x is denoted by ‖x‖.

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. System Model
Consider the uplink grant-free NOMA system of an mMTC network where a set N of N randomly distributed MTDs is served by a BS. In the SMV scenario, the BS and MTDs are each equipped with a single antenna; in the MMV scenario, the BS is equipped with multiple antennas, and the MTDs are equipped with a single antenna. In this study, we focus on the overloaded scenario, where the number of MTDs is higher than the number of available uplink radio resources. Since the MTDs share information spontaneously without scheduling, the BS needs to classify the active MTDs, i.e., determine the number of active devices, their identity, and the transmitted data.

B. Problem Formulation
We define the binary state indicator (active or inactive) a_i of the i-th MTD as a_i = 1 if the i-th MTD is active, and a_i = 0 otherwise. The probability of the active state of the i-th MTD is p_i, which is independent of the other devices in the cell. The received signal y ∈ C^K at the BS can be expressed as

y = Σ_{i=1}^{N} a_i h_i s_i x_i + w,

where s_i ∈ C^K is the spreading sequence vector of the i-th MTD (device-specific codeword vector for SCMA and complex spreading sequence for MUSA), h_i is the complex uplink channel coefficient from the i-th MTD to the BS, which includes both small-scale and distance-dependent large-scale fading, x_i is the transmit symbol of the i-th MTD, and w ∈ C^K ∼ CN(0, σ²_w I) is the complex Gaussian noise vector. As specified earlier, at the beginning of the transmission, the BS does not have knowledge of the number of active devices and the corresponding spreading sequences.
Therefore, the BS performs MUD to identify them. In particular, all the active MTDs transmit the pilot symbol x_{p,i} to the BS in order to assist it in detecting the active devices and the corresponding spreading sequences (s_i). After that, it estimates the channel coefficients (h_i). Finally, the BS decodes the J data symbols x^[1]_{d,i}, . . . , x^[J]_{d,i} transmitted by the active MTDs. The transmission spans J + 1 slots, where J is selected such that the channel coherence time is greater than the duration of J + 1 transmission slots. The pilot measurement vector y_p ∈ C^K is given by

y_p = Σ_{i=1}^{N} a_i h_i s_i x_{p,i} + w.

Let us define φ_i = s_i x_{p,i} and Φ = [φ_1, . . . , φ_N]; we then have

y_p = Φ(a ∘ h) + w,

where a = [a_1, . . . , a_N]^T is the activity vector and h = [h_1, . . . , h_N]^T is the channel vector. Let ϕ = a ∘ h. We can rewrite (4) as

y_p = Φϕ + w.

We consider a fraction of the MTDs (n out of N) to be active at any given time, so the accumulated sparse vector ϕ has n non-zero elements. Therefore, recovering the active devices from the received vector y_p amounts to an n-sparse signal recovery problem.
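The measurement model above can be sketched numerically. In this minimal sketch, the sensing matrix is a random complex stand-in for the actual SCMA codewords or MUSA spreading sequences, and all dimensions (N, K, n, sigma_w) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: N devices, K resources, n active devices.
N, K, n = 12, 4, 3

# Sensing matrix Phi with columns phi_i = s_i * x_{p,i}; a random complex
# placeholder here, standing in for the designed codewords/sequences.
Phi = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * K)

# Activity vector a (n ones) and channel vector h; phi = a ∘ h is n-sparse.
a = np.zeros(N)
a[rng.choice(N, size=n, replace=False)] = 1
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
phi = a * h

# Received pilot measurement y_p = Phi @ phi + w with complex Gaussian noise.
sigma_w = 0.1
w = sigma_w * (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
y_p = Phi @ phi + w
```

Recovering which entries of phi are non-zero from y_p alone is exactly the n-sparse recovery task the DNN-MUD is trained for.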

C. Codebook-SCMA
SCMA is designed as a non-orthogonal multi-dimensional structured codebook with layer-specific shaping gain, where the shaping gain is achieved by utilizing QAM modulation [25]. The fundamental approach of SCMA is to directly map the incoming MTD's bits into a codeword vector, where bit-to-symbol mapping and spreading are coupled together. Furthermore, SCMA codewords are sparse, where the non-zero entries in the codeword are the orthogonal channel resources assigned to the specific MTD for uplink communication [26]. The number of non-zero elements in the codeword is defined as the diversity order of the codeword. Moreover, the same diversity order is assigned to different devices. In this work, we design the codebooks based on [8], [20]. The design includes constellation rotation and interleaving. The multi-dimensional SCMA codebook of the i-th MTD, S_i ∈ C^{K×R}, is defined as

S_i = V_i △_i M_c,

where R is the number of constellation points, V_i ∈ B^{K×U} is the U-dimensional SCMA binary mapping matrix of the i-th MTD, △_i is the constellation operator of the i-th MTD, and M_c is the mother constellation. The matrix V_i can be generated by inserting K − U all-zero row vectors among the rows of I_U [8]. Furthermore, the SCMA design has the following properties: the number of supported MTDs is N = C(K, U), and the number of MTDs connected to the same resource is d_f = C(K−1, U−1), where C(·,·) denotes the binomial coefficient. Details of △_i and M_c are described in the subsections below. The design of SCMA consists of two major parts: designing the multi-dimensional mother codebook and constructing the codebook based on the number of MTDs.
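The two combinatorial properties above can be checked directly; the (K, U) = (4, 2) values below are the classic 150%-overload SCMA configuration, used here only for illustration:

```python
from math import comb

# SCMA structural properties from the text: with K resources and diversity
# order U, the number of supported MTDs is N = C(K, U), and d_f = C(K-1, U-1)
# MTDs collide on each resource element.
K, U = 4, 2
N = comb(K, U)            # number of codebooks / supported MTDs
d_f = comb(K - 1, U - 1)  # MTDs sharing each resource

# For K = 4, U = 2: N = 6 MTDs over 4 resources (OR = 150%), d_f = 3.
print(N, d_f)
```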
1) Design of SCMA mother codebook: First, a two-dimensional lattice (Z²) with R points is defined as the first dimension of the mother constellation; the steps of the design and the theory of the lattice constellation are described in [8]. Then, we perform Gray mapping for each element in the first-dimension set. After that, the remaining required dimensions are generated by rotating the first dimension with the corresponding Gray mapping, which gives the complete mother constellation. However, this constellation is not efficient enough to control the effect of channel fading [20]. Therefore, to improve the energy efficiency of the codeword and reduce the peak-to-average power ratio, the authors in [20] propose to use interleaving. Furthermore, they claim that this interleaving can improve the performance in fading channels. During interleaving, the even rows of the initial constellation are reordered as given in [20] while the odd rows are kept as before, which provides the final mother constellation (M_c).
2) Generation of factor graph for a separate codebook: The next step is to construct a device-specific codebook for each MTD. Two significant conditions must be followed while constructing the device-specific codeword: retaining the Euclidean distance profile and the structure of the mother codebook. The authors in [8], [20] propose to use a Latin square and a factor graph-based constellation to ensure these conditions. The factor graph representation describes the relation between the MTDs and the resource elements; the i-th column of the factor graph matrix F indicates the U resources occupied by the i-th MTD. Next, we replace the d_f non-zero elements of each row of the factor graph matrix F with each of the d_f phase rotation angles ̺_i according to the Latin structure. This ensures that any given phase rotation angle does not appear twice within the same row or column. The phase rotation angle ̺_i is defined as in [27]. As an example, for d_f = 4, the phase rotation angles will be (̺_1, ̺_2, ̺_3, ̺_4), and the corresponding updated factor graph matrix is denoted by F̃. Finally, the device-specific constellation operator △_i ∈ C^{U×U} for the i-th MTD can be defined as △_i = diag(f̃_i), where f̃_i is the i-th column of the updated factor graph matrix F̃ without its zero entries.

D. Spreading sequence-MUSA
MUSA spreading sequences are designed to support massive MTDs in a dedicated amount of radio resources. In the MUSA scheme, each MTD information is spread with a group of low cross-correlation complex spreading sequences. BS can generate a large set of possible MUSA codes with specific M-ary values and required code lengths (number of radio resources).
Also, MUSA codes are designed to support a high OR with small code lengths to handle power consumption and delay efficiently. Furthermore, the M-ary value and the code length define the space of available MUSA sequences [9]. MUSA chooses the real and imaginary parts of the complex spreading elements independently. With M = 2, where the set {1, −1} is used for each part, only 4 complex elements can be produced. We consider only M = 3 codes, with real and imaginary parts drawn from {−1, 0, 1}, which is recommended by the MUSA designers [9] and produces an adequate number of codes.
After generating the complex spreading elements, we can create the space of all possible MUSA sequences by permuting them according to the required code length. Therefore, the set M of 9^K complex sequences can be generated from M = 3 codes. However, we need only N MUSA spreading sequences. Among all available sequences, some are highly cross-correlated, which reduces the MUD performance. Rather than randomly choosing N columns from the set of all possible MUSA sequences, we propose a scalable heuristic algorithm to choose the required number of low cross-correlation sequences, which is summarised in Algorithm 1. Given a cross-correlation threshold ρ, the proposed algorithm selects a subset M̃ of M such that the mutual cross-correlation between any two elements of M̃ is less than or equal to ρ.
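A minimal sketch of this selection idea follows (a greedy interpretation, not necessarily the paper's exact Algorithm 1): enumerate the M = 3 sequence space and keep a sequence only if its normalized cross-correlation with every already-kept sequence stays at or below the threshold ρ. The values K = 3, N = 6, ρ = 0.6 are illustrative:

```python
import itertools
import numpy as np

def musa_sequences(K):
    """All non-zero length-K MUSA sequences whose real and imaginary parts
    are drawn from {-1, 0, 1} (M = 3), giving up to 9**K candidates."""
    elems = [re + 1j * im for re in (-1, 0, 1) for im in (-1, 0, 1)]
    return [np.array(s) for s in itertools.product(elems, repeat=K)
            if np.any(np.array(s))]

def select_low_corr(seqs, N, rho):
    """Greedy sketch of the selection: keep a (normalized) sequence only if
    its cross-correlation with every selected sequence is <= rho."""
    selected = []
    for s in seqs:
        s = s / np.linalg.norm(s)
        if all(abs(np.vdot(s, t)) <= rho for t in selected):
            selected.append(s)
        if len(selected) == N:
            break
    return selected

chosen = select_low_corr(musa_sequences(3), N=6, rho=0.6)
# Every selected pair obeys the cross-correlation threshold.
assert all(abs(np.vdot(u, v)) <= 0.6 + 1e-9
           for u, v in itertools.combinations(chosen, 2))
```

Exhaustive enumeration of 9^K sequences is only feasible for small K; for longer codes, the same accept/reject test can be applied to randomly sampled candidates instead.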

A. Proposed Solution Approach
This study aims to pave the way for the practical implementation of mMTC traffic MUD under the grant-free access scheme. Considering the sporadic nature of MTD activation pattern in mMTC traffic, only a small subgroup of the entire group of MTDs is active in a given time.
By employing this fact, the MUD problem can be expressed as deep learning-based multi-label classification. As the name suggests, multi-label classification detects the mutually non-exclusive multiple labels associated with a single data item. In our problem, the labels are the active and inactive states of the MTDs, and the data is the received signal at the BS.
The objective of the DNN-based MUD is to detect the corresponding labels of ϕ. Here, the number of labels and their positions give the number of active devices and their codebooks for SCMA or spreading sequences for MUSA. Therefore, we do not focus on the recovery of the non-zero component values. A major advantage of our proposed scheme is that, since we train our network with sufficient training data, we do not need to estimate the channel before detection.
Therefore, we introduce a ResNet-based architecture to solve the above-mentioned tasks jointly.
This architecture consists of several hidden nodes connecting the input layer to the output layer.
Here each data item from the training data (y p ) set is tagged with the corresponding label (ψ).
In order to solve the two subtasks, the proposed model learns the mapping function associating the input nodes and the related labels by updating the hidden parameters through the backpropagation process. Clearly, it is the nonlinear mapping g between y_p and the sparse vector elements of ϕ.
Subsequently, we reformulate the problem (6) as

ϕ̂ = g(y_p; ω, θ),

where ω is the set of weights of the hidden layers, and θ is the set of biases of the hidden layers of the neural network. The explicit intention of this DNN-MUD is to obtain g, characterized by ω and θ given y_p, nearest to the unknown actual mapping function g*. Therefore, we model our DNN-MUD to thoroughly understand the correlation of the sensing matrix Φ. Mutual coherence is the measure of the correlation between two columns of a matrix, defined as

μ(Φ) = max_{i≠j} |φ_i^H φ_j| / (‖φ_i‖ ‖φ_j‖).

In order to learn the correlation structure of the matrix Φ and to improve the MUD performance, we consider the restricted isometric property (RIP) [28] of the sensing matrix Φ in the design of the architecture. There exists a restricted isometric constant (RIC) δ_s ∈ (0, 1) satisfying

(1 − δ_s) ‖ϕ‖² ≤ ‖Φϕ‖² ≤ (1 + δ_s) ‖ϕ‖²

for every n-sparse vector ϕ. Hence, a smaller δ_s achieves better MUD performance. Therefore, we consider the minimization of the RIC in the design of our DNN-MUD architecture.
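The mutual-coherence measure above can be computed directly; the random complex Φ below is only a placeholder for the actual sensing matrix, and the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 8, 16
Phi = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2 * K)

def mutual_coherence(Phi):
    """mu(Phi) = max_{i != j} |phi_i^H phi_j| / (||phi_i|| ||phi_j||),
    the largest normalized correlation between distinct columns."""
    G = Phi.conj().T @ Phi                     # Gram matrix of the columns
    norms = np.linalg.norm(Phi, axis=0)
    C = np.abs(G) / np.outer(norms, norms)     # normalized correlations
    np.fill_diagonal(C, 0.0)                   # ignore self-correlations
    return C.max()

mu = mutual_coherence(Phi)
```

A smaller mu (and, correspondingly, a smaller RIC δ_s) makes the sparse recovery easier, which is why the codebook and sequence designs in this paper target low cross-correlation.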

B. MUD and Sparsity Estimation Architecture for SMV
The pilot measurement vector y_p is a complex vector that cannot be given directly as input to the DNN because our labels are not in complex form. Therefore, we split the real and imaginary parts separately and stack them as the vector input to the system,

ỹ_p = [ℜ(y_p)^T, ℑ(y_p)^T]^T ∈ R^{2K},

and the initial fully connected (FC) layer computes z = W_in ỹ_p + b_in, where W_in ∈ R^{υ×2K} is the initial weight, b_in ∈ R^{υ×1} is the initial bias, and υ is the width of the hidden nodes. After that, D resulting vectors are assembled in the mini-batch B.
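The real/imaginary stacking step can be sketched as:

```python
import numpy as np

def stack_real_imag(y_p):
    """Map the complex pilot vector y_p in C^K to the real-valued DNN input
    in R^{2K}: the real parts first, then the imaginary parts."""
    return np.concatenate([y_p.real, y_p.imag])

y_p = np.array([1 + 2j, -0.5 - 1j])   # toy K = 2 measurement
x = stack_real_imag(y_p)
```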
Then, we add the batch normalization layer, where each element z_j is normalized as

z̃_j = γ (z_j − μ_B,j) / √(σ²_B,j) + η,

where γ and η are the scaling and shifting parameters, respectively, and μ_B,j and σ²_B,j are the batch-wise mean and variance, respectively. The function of the normalization layer is to ensure that the input distribution has a fixed mean and variance, which improves the learning of the DNN.
Also, larger variances in the training data reduce the learning of internal features from the input signal. Therefore, batch normalization controls the distortion introduced by the various wireless channels and noise. Then, a nonlinear activation function is applied to the output of the batch normalization layer to determine whether the information passing through the hidden unit is activated. We add a ReLU activation unit [30] before inputting the data to the identity blocks as the nonlinear activation function. The ReLU activation function is expressed by σ(x) = max(x, 0). Therefore, the input to the first hidden layer is σ(z̃). After the ReLU activation, the output vector passes through several identical blocks. Every block consists of four dense layers, four batch normalization layers, three ReLU activation units, a dropout layer, and a residual connection. Inside the identical block, the output of the first three layers follows (16) and (17), where P_dr is the dropout probability of the Bernoulli random variable. In complex DNN models, there is a possibility that the model learns the noise in the training data and performs well during the training phase but fails to detect during the testing phase, where the noise is different. To avoid this, we add another dense layer with an L2 regularizer after the dropout layer, where the regularizer introduces penalties on the specific layer's parameters to avoid high variance. Here, we intend to introduce a decay function on the layer's weights, adding a summed value to the regular loss function. Therefore, the updated loss function is given by

L̃(·) = L(·) + ǫ Σ_{l=1}^{L} ‖W^l_2‖²,

where L(·) is the loss function, W^l_2 ∈ R^{υ×υ} is the weight matrix of the second dense layer of the l-th hidden block, and ǫ is the regularization ratio. During the training process, the L2 regularizer applies a further subtraction to the current weights of the specific layer based on the value of ǫ, which further improves the learning quality of the DNN.
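A rough numpy sketch of one pre-activated identity block follows. The layer counts and exact ordering are simplified relative to the paper's Fig. 2, the batch-norm statistics are computed per mini-batch rather than with running averages, and all shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(x):
    return np.maximum(x, 0.0)

def batch_norm(Z, gamma=1.0, eta=0.0, eps=1e-5):
    # Normalize each feature over the mini-batch, then scale/shift (gamma, eta).
    mu = Z.mean(axis=0)
    var = Z.var(axis=0)
    return gamma * (Z - mu) / np.sqrt(var + eps) + eta

def residual_block(Z, weights, p_dr=0.5, train=True):
    """Pre-activated identity block sketch: BN -> ReLU -> dense repeated,
    inverted dropout before the last dense layer, then the identity
    connection and a closing ReLU, as described in the text."""
    out = Z
    for W in weights[:-1]:
        out = relu(batch_norm(out)) @ W
    if train:  # inverted dropout keeps the expected activation scale
        mask = rng.random(out.shape) > p_dr
        out = out * mask / (1.0 - p_dr)
    out = relu(batch_norm(out)) @ weights[-1]
    return relu(out + Z)  # residual connection + ReLU at the block's end

D, v = 32, 16  # mini-batch size and hidden width (illustrative)
Z = rng.standard_normal((D, v))
Ws = [rng.standard_normal((v, v)) * 0.1 for _ in range(4)]
out = residual_block(Z, Ws, train=False)
```

The identity connection `out + Z` is what lets gradients flow directly across blocks, which is the usual motivation for the ResNet structure adopted here.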
After that, we continue with the layers shown in Fig. 2. Specifically, we include a ReLU activation at the identity block's end to refine the path from one identical block to another. Therefore, activated units are connected to improve the contribution of every identical block in the DNN-MUD. After passing through all identical hidden layers, the last FC layer produces N output values, equal to the total number of MTDs.
The output vector is given by

z_out = W_out z̃^(d)_L + b_out,

where z̃^(d)_L is the output of the hidden block L, and W_out ∈ R^{N×υ} and b_out ∈ R^{N×1} are the corresponding weight and bias of the last FC layer, respectively. After that, the sigmoid function produces N independent probabilities (p̂_1, . . . , p̂_N) based on the previous layer. The probability of the j-th MTD being active is given by

p̂_j = 1 / (1 + exp(−z_out,j)).

Finally, the DNN detects the active devices based on the sigmoid probabilities: the output estimation Ω̂ is the set of devices whose probabilities exceed the detection threshold. Here, the value of n̂ = |Ω̂| and the corresponding positions solve the previously stated two sub-problems. DNN-MUD learns the sparsity through the properly annotated training labels during the training phase. Therefore, DNN-MUD satisfies (23) and detects the active MTDs from the test data set.
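The sigmoid estimation and thresholding can be sketched as follows; the 0.5 threshold is an assumed operating point, not a value stated in the text:

```python
import numpy as np

def detect_active(z_out, threshold=0.5):
    """Sigmoid estimation: N independent activity probabilities, thresholded
    to obtain the estimated active set (and thus the sparsity n-hat)."""
    p_hat = 1.0 / (1.0 + np.exp(-z_out))
    active = np.flatnonzero(p_hat >= threshold)
    return p_hat, active

# Toy final-layer outputs for N = 5 devices.
z_out = np.array([-3.0, 2.2, 0.1, -0.4, 4.0])
p_hat, active = detect_active(z_out)
```

Because each probability is independent, the same pass yields both sub-problem answers at once: the size of `active` is the estimated sparsity, and its entries are the detected device indices.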

C. MUD and Sparsity Estimation Architecture for MMV
Given that BSs are not limited by energy and computational complexity constraints, BSs in many practical network deployments are equipped with multiple antennas. We consider a BS equipped with X > 1 antennas and MTDs equipped with a single antenna. Accordingly, the BS receives multiple measurement vectors for each transmission, so the previously mentioned SMV-based MUD becomes MMV-based MUD. Considering these facts, the received signal at the BS can be written as

Y_p = ΦΨ + W,

where Y_p = [y_p,1, . . . , y_p,X], Ψ = [ϕ_1, . . . , ϕ_X], and W is the corresponding noise matrix. Here, the set of active MTDs at a given time slot is the same for all X antennas. Therefore, the sparsity of (ϕ_1, . . . , ϕ_X) is the same.
Hence, the MMV scenario gives additional information about the different channel parameters to support MUD. Therefore, we only change the input data to our DNN-MUD while the labels remain unchanged. In this subsection, we generate the set D̃ of D training data items (y_p,1, . . . , y_p,X) and split each of the vectors in the same way as for SMV. We split the real and imaginary parts of the vectors separately, stack them according to the antenna order, and create the vector input to the system,

ỹ_p = [ℜ(y_p,1)^T, ℑ(y_p,1)^T, . . . , ℜ(y_p,X)^T, ℑ(y_p,X)^T]^T ∈ R^{2KX}.

After that, we use the same architecture as the SMV one shown in Fig. 2 to support the MMV scenario with different layer parameters. In MMV, we have an X-fold increase in the training data for the same label, which leads to improved MUD with lower training overhead for the same performance level.
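The antenna-wise stacking for the MMV input can be sketched as:

```python
import numpy as np

def stack_mmv(Y_p):
    """Stack real and imaginary parts antenna by antenna: Y_p is K x X (one
    column per antenna); the DNN input is the length-2KX vector
    [Re(y_p,1); Im(y_p,1); ...; Re(y_p,X); Im(y_p,X)]."""
    return np.concatenate([np.concatenate([col.real, col.imag]) for col in Y_p.T])

K, X = 3, 2  # toy dimensions: 3 resources, 2 antennas
Y_p = np.arange(K * X).reshape(K, X) + 1j * np.ones((K, X))
x = stack_mmv(Y_p)
```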

A. Complexity Analysis
It is necessary to investigate the computational complexity of the proposed DNN-MUD to examine the practical feasibility of the algorithm in mMTC networks. In this regard, we calculate the floating-point operations (FLOPs), which include all floating-point operations of the FC layers, batch normalization layers, activation functions, dropout, and sigmoid function. First, the FLOPs of the initial FC layer (16), which includes the matrix multiplication of W_in ∈ R^{υ×2K} with ỹ^(d)_p ∈ R^{2K×1} and a vector addition of b_in ∈ R^{υ×1}, are given in (26). Second, the FLOPs of the batch normalization layer (17), which involves element-wise addition, element-wise multiplication, scaling, and shifting, are given in (27). Then, the FLOPs of the pre-activation ReLU unit are stated in (28). Next, there are L hidden layers involved in the design. Each hidden layer contains four dense layers, four batch normalization layers, three ReLU activation units, a dropout layer (FLOPs = υ), and an identity connection (FLOPs = υ). Therefore, the total FLOPs of the hidden layers are defined in (29). Also, the FLOPs of the final FC layer with W_out ∈ R^{N×υ} and b_out ∈ R^{N×1} are given in (30). Finally, the FLOPs of the sigmoid layer (22), which contains four separate operations, are defined in (31). Collectively, the computational complexity of the proposed DNN-MUD in FLOPs is derived by summing (26) to (31). We further investigate the computational complexity of the stOMP [23], [31], LS-BOMP [21], and C-AMP [22], [32] algorithms. The complexities in FLOPs are presented in Table I, where τ is the number of iterations in the C-AMP algorithm. Comparing the computational complexities in Table I, it is clear that the complexity of the DNN-MUD depends on the internal parameters of the DNN, while stOMP, LS-BOMP, and C-AMP rely on the system parameters. Therefore, using DNN algorithms in mMTC systems is feasible and practical. Furthermore, the complexity of the DNN-MUD is significantly smaller than that of the LS-BOMP and C-AMP algorithms.
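The FLOP accounting can be illustrated with a small counting function. The per-operation costs below (BN ≈ 4 FLOPs per unit, ReLU ≈ 1, dropout ≈ 1, identity add ≈ 1, sigmoid ≈ 4) follow a common convention and are assumptions; the paper's exact expressions (26)-(31) are not reproduced here and may differ:

```python
def dense_flops(n_in, n_out):
    # Matrix-vector product: n_out dot products of length n_in
    # (2*n_in - 1 FLOPs each), plus the bias addition.
    return n_out * (2 * n_in - 1) + n_out

def dnn_mud_flops(K, N, v, L):
    """Illustrative forward-pass FLOP count for the DNN-MUD, following the
    layer inventory in the text: input FC + BN + ReLU, then L hidden blocks
    (4 dense, 4 BN, 3 ReLU, dropout, identity add), then output FC + sigmoid."""
    total = dense_flops(2 * K, v) + 4 * v + v          # input FC, BN, ReLU
    per_block = 4 * dense_flops(v, v) + 4 * (4 * v) + 3 * v + v + v
    total += L * per_block
    total += dense_flops(v, N) + 4 * N                 # output FC, sigmoid
    return total

flops = dnn_mud_flops(K=4, N=12, v=128, L=4)  # all parameter values illustrative
```

As the function makes explicit, the count scales with the network width v and depth L rather than with the channel or the iteration count of a recovery algorithm, which is the feasibility argument made in the text.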

B. Training Data Generation and Implementation
The proposed DNN-MUD is a supervised learning scheme, more specifically a multi-label classification model that takes labeled data as input and learns the mapping between the data and the corresponding labels. A well-trained and calibrated model can detect the labels of a new set of data.
Therefore, a sufficient amount of training data is required for the convergence of the DNN model.
We validate that the amount of data is adequate via the loss function of the DNN model. The proposed model can learn the codebook parameters from the actual received signal; however, we can also train, validate, and test the DNN using synthetically generated data. An industrial application can directly feed the received signal with proper labels to the DNN model, which is feasible since this application targets uplink communication in mMTC.
In the generated data, the received signal contains the sparse input vector ϕ and the sensing matrix Φ, where all channel properties and randomness of the environment reside in the sparse vector, not in the sensing matrix. The core learning component of the DNN depends on the codebook for SCMA and the spreading sequences for MUSA, which are known to the BS a priori.
Hence, synthetically generated data does not degrade the performance of the proposed DNN-MUD. Consequently, we generate the SCMA codebook and MUSA spreading sequences for the different simulation requirements, and then create the channel vector and the random noise vector w.
Finally, the training data is produced using the received-signal model.
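The data-generation pipeline described above can be sketched as follows. The dimensions, the random complex sensing matrix standing in for the SCMA/MUSA codewords, and the Rayleigh-fading symbols are illustrative assumptions, not the paper's exact codebooks.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_sample(Phi, n_active, snr_db):
    """One labeled training pair: received-signal features -> active-device label.

    Phi : (K, N) sensing matrix (SCMA codewords / MUSA spreading sequences).
    """
    K, N = Phi.shape
    label = np.zeros(N)
    active = rng.choice(N, size=n_active, replace=False)
    label[active] = 1.0
    # sparse transmit vector: channel gain times symbol, active devices only
    x = np.zeros(N, dtype=complex)
    x[active] = (rng.standard_normal(n_active)
                 + 1j * rng.standard_normal(n_active)) / np.sqrt(2)
    y = Phi @ x
    # add AWGN at the requested SNR
    noise_var = np.mean(np.abs(y) ** 2) / 10 ** (snr_db / 10)
    y = y + np.sqrt(noise_var / 2) * (rng.standard_normal(K)
                                      + 1j * rng.standard_normal(K))
    # real-valued DNN input: stacked real and imaginary parts (length 2K)
    return np.concatenate([y.real, y.imag]), label

# illustrative dimensions: K = 4 resources, N = 6 devices (OR = 150%)
Phi = (rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))) / np.sqrt(2)
features, label = generate_sample(Phi, n_active=2, snr_db=10)
```

Repeating `generate_sample` over random activity patterns and SNRs yields the labeled dataset; since the randomness sits in the sparse vector and noise, the same fixed Φ is reused across samples, mirroring the argument above.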

V. SIMULATION RESULTS
We explore the performance of the proposed DNN-MUD for SCMA and MUSA systems in this section. CSI is not assumed at the BS, since the BS does not know the active users a priori and cannot extract the CSI from pilot sequences. First, we show the system-level performance of the DNN to ensure correctness and reliability. Then, we present the outcomes for the SMV scenario in comparison with LS-BOMP [21], C-AMP [22], and stOMP [22]. The simulation parameters are listed in Table II:

Number of hidden layers (L): 4
Learning rate: 1 × 10^−3
Batch size: 1 × 10^3
Epochs: 50
Dropout, P_dr: 0.5

We use binary cross-entropy as the loss function and the Adam optimizer to ensure the convergence of the model.
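To make the hidden-layer structure concrete (normalization and ReLU applied before the dense layers, with an identity connection), the following numpy sketch implements one pre-activation residual unit. The layer width, the random weights, and the placement of the single dense path without a ReLU are placeholder assumptions chosen to match the stated per-layer counts, not the paper's trained model.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Inference-style normalization over the feature vector."""
    return gamma * (x - x.mean()) / np.sqrt(x.var() + eps) + beta

def pre_act_residual_unit(x, weights):
    """Pre-activation residual unit: BN (and ReLU) applied before each
    dense layer, with an identity (skip) connection. With four weight
    matrices this gives 4 dense, 4 BN, and 3 ReLU per unit, matching
    the per-layer operation counts described in the text."""
    h = weights[0] @ batch_norm(x)              # first dense path: BN only
    for W in weights[1:]:
        h = W @ np.maximum(batch_norm(h), 0.0)  # BN -> ReLU -> dense
    return h + x                                 # identity connection

rng = np.random.default_rng(1)
v = 16                                           # layer width (placeholder)
weights = [0.1 * rng.standard_normal((v, v)) for _ in range(4)]
out = pre_act_residual_unit(rng.standard_normal(v), weights)
```

Stacking L = 4 such units between the input and output FC layers, with dropout during training and a sigmoid output, reproduces the architecture summarized in Table II.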

A. Performance of MUSA sequence selection algorithm
First, we consider the performance of the proposed MUSA sequence selection algorithm by plotting the correlation among the selected codewords. Fig. 3 compares the correlation of the MUSA sequences selected by the proposed Algorithm 1 (Fig. 3a) against a random selection of MUSA sequences from the available set (Fig. 3b). The proposed algorithm shows a high degree of auto-correlation and low cross-correlation for the selected MUSA sequences, as it chooses sequences that minimize the pairwise cross-correlation within the selected set.
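The cross-correlation criterion can be illustrated with a simple greedy selection; this is a hedged sketch of the idea, not a reproduction of Algorithm 1, and the real-valued ternary candidate pool is an illustrative stand-in for the complex MUSA sequence set.

```python
import numpy as np

def select_sequences(pool, n_select):
    """Greedily pick spreading sequences with low mutual cross-correlation.

    pool : (M, K) candidate sequences (rows), assumed unit-norm.
    Each step adds the candidate whose worst-case correlation against the
    already-selected set is smallest.
    """
    chosen = [0]                                 # seed with the first candidate
    while len(chosen) < n_select:
        best, best_score = None, np.inf
        for i in range(pool.shape[0]):
            if i in chosen:
                continue
            score = max(abs(np.vdot(pool[i], pool[j])) for j in chosen)
            if score < best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

# candidate pool: length-4 ternary {-1, 0, 1} sequences, row-normalized
rng = np.random.default_rng(2)
pool = rng.choice([-1.0, 0.0, 1.0], size=(20, 4))
pool = pool / np.maximum(np.linalg.norm(pool, axis=1, keepdims=True), 1e-9)
idx = select_sequences(pool, n_select=6)
```

Plotting `abs(pool[idx] @ pool[idx].T)` gives the kind of correlation matrix compared in Fig. 3: ones on the diagonal (auto-correlation) and small off-diagonal entries (cross-correlation).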

B. System Performance of DNN
We evaluate the performance of the DNN-MUD under perfect detection of active devices. Fig. 4a and Fig. 4b show the calibration curves for the SCMA and MUSA systems, respectively.
A calibration curve compares the expected output and the predicted output of the system, and it is one of the standard approaches to evaluating system performance in multi-label classification. Our calibration curve clearly shows that the DNN-MUD is well calibrated for OR = 150%. In contrast, a specific mechanism is required to improve the detection performance at higher ORs such as 300%, as discussed in subsection D.
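A calibration (reliability) curve bins the predicted probabilities and compares each bin's mean prediction against the empirical frequency of the positive class; a well-calibrated model lies on the diagonal. A minimal sketch, with synthetic perfectly-calibrated predictions standing in for the DNN's sigmoid outputs:

```python
import numpy as np

def calibration_curve(probs, labels, n_bins=10):
    """Mean predicted probability vs. empirical positive frequency per bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    mean_pred, frac_pos = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if mask.any():
            mean_pred.append(probs[mask].mean())   # x-axis of the curve
            frac_pos.append(labels[mask].mean())   # y-axis of the curve
    return np.array(mean_pred), np.array(frac_pos)

# synthetic check: labels drawn with exactly the predicted probability,
# so the resulting curve should hug the diagonal
rng = np.random.default_rng(3)
p = rng.uniform(size=20000)
y = (rng.uniform(size=20000) < p).astype(float)
mean_pred, frac_pos = calibration_curve(p, y)
```

For the multi-label MUD output, the per-device sigmoid scores and binary activity labels are flattened into `probs` and `labels` before binning.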

C. MUD for the SMV Scenario
First, we study the probability of detection of the proposed DNN for both SCMA and MUSA systems and show the results in Fig. 5. We observe that the DNN-MUD outperforms the stOMP, LS-BOMP, and C-AMP algorithms for both SCMA and MUSA systems, even when the OR increases from 150% to 300%. In MUSA systems, stOMP is able to detect active devices to a certain extent with known sparsity, but it is nevertheless unable to beat the DNN-MUD. Apart from the recall value, we evaluate precision, binary accuracy, and AUC to ensure the trustworthiness of the DNN, and plot the results for the SCMA and MUSA systems in Fig. 6 and Fig. 7, respectively. Combining the outcomes of Fig. 5 with Fig. 6a and Fig. 7a, where the values for both ORs exceed 94%, indicates strong overall performance. Likewise, the AUC approaches 1 as the SNR increases, which confirms the detection accuracy of our proposed approaches.
Second, we explore the probability of detection versus the number of active MTDs at SNR = 20 dB. We observe that the DNN-MUD is superior to its counterparts over the entire region.
Specifically, the examination was carried out for up to 10% and 30% of active devices out of all devices in the SCMA and MUSA systems, respectively, whereas the mMTC requirement is around 5% to 10% of total devices being active in a time frame. Furthermore, we observe that the detection probability of stOMP is around 50% and that of C-AMP around 47% for OR = 150% in SCMA systems, while the DNN reaches more than 95%. This shows the robustness of the DNN-MUD scheme as the number of active devices increases. According to the present results, when n increases, the mutual coherence of the system rises sharply, leading to performance degradation. However, the DNN-MUD can handle this issue by learning the sensing matrix during the training phase, while stOMP, LS-BOMP, and C-AMP cannot.
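The probability of detection reported in these figures is, in multi-label terms, the recall over active devices: the fraction of truly active MTDs whose sigmoid output exceeds the decision threshold. A minimal sketch of the metric, with a hypothetical threshold of 0.5:

```python
import numpy as np

def detection_probability(y_true, y_pred, threshold=0.5):
    """Fraction of truly active devices flagged active by the detector
    (recall over the active-device labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    detected = np.asarray(y_pred) >= threshold
    return (y_true & detected).sum() / y_true.sum()

# toy example: devices 0, 2, 3 are active; the detector misses device 3
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6]
pd = detection_probability(y_true, y_pred)
```

Averaging this quantity over many test samples at a fixed SNR gives one point on the curves in Fig. 5.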

D. MUD for the MMV Scenario
Next, we evaluate the performance of the DNN-MUD for MMV systems. Fig. 9 presents the variation of the probability of detection versus SNR with three different antenna configurations for both SCMA and MUSA systems; when X increases, the spatial diversity increases, improving the detection probability. For example, at SNR = 10 dB the probability of detection improves from 81% to 97% when X changes from X = 1 to X = 2 for SCMA codebooks with 300% OR. Likewise, for MUSA spreading sequences with 300% OR, the performance improves from 73% to 92%. Moreover, the probability of detection reaches 100% for both SCMA and MUSA systems with 150% and 300% ORs when X goes to X = 4 at SNR = 15 dB. Here, we design our DNN with twice the number of input neurons when X increases from one to two and from two to four, to exploit the spatial diversity of the MMV system. Furthermore, the increased number of input neurons improves the learning capability of the DNN.
Notably, the MMV system is a proactive solution to the issue observed and discussed in Fig. 4 for higher ORs. Based on these outcomes, we can state that increasing the number of antennas at the BS is an ideal way to overcome the drawbacks of higher ORs in SCMA and MUSA systems, which is applicable to mMTC in future wireless communication requirements.

E. MUD for the mmWave Indoor Factory Environment
In this subsection, we consider a next-generation mMTC application environment. The 5G NR spectrum includes the 26 GHz mmWave band, while IEEE 802.11ad/ay is defined for the 60 GHz band (known as WiGig or 60 GHz Wi-Fi). Future beyond-5G/6G wireless networks are expected to operate at even higher frequency bands [3]. Thus, there is a requirement to investigate high-frequency channel models. In that respect, we pose the SMV DNN-MUD problem for mmWave channel models based on the 3GPP specifications. We consider the indoor factory (InF) environment with various sizes and various density levels of machinery. 3GPP proposes five types of channel models for InF; we consider four of them: sparse clutter with low BS height (InF-SL), dense clutter with low BS height (InF-DL), sparse clutter with high BS height (InF-SH), and dense clutter with high BS height (InF-DH). In all four InF scenarios, we consider line of sight (LOS) and non-LOS (NLOS) conditions based on the LOS probabilities defined by 3GPP [24]. Accordingly, the 3D distance r_3D between the BS and the MTD is given as

r_3D = sqrt(r_2D² + (h_BS − h_MTD)²),

where r_2D is the 2D distance between the BS and the MTD, h_BS is the height of the indoor BS, and h_MTD is the height of the location where the MTD is deployed. The LOS pathloss of all four InF scenarios is defined as P L_LOS = 31.84 + 21.50 log10(r_3D) + 19.00 log10(f_c) + χ_{σ_LOS}, where f_c is the normalized carrier frequency and σ_LOS = 4.3 is the corresponding shadow fading standard deviation. The NLOS pathloss is defined as P L = a + b log10(r_3D) + 20.00 log10(f_c) + χ_{σ_NLOS}, where the values of a and b vary with the InF scenario; the values of a, b, and the shadow fading standard deviations σ_NLOS are given in Table III. The LOS probability of all four InF scenarios is defined as

Pr_LOS = exp(−r_2D / k),

where k is given by

k = −r_c / ln(1 − ς) for InF-SL and InF-DL, and
k = (−r_c / ln(1 − ς)) · (h_BS − h_MTD) / (h_c − h_MTD) for InF-SH and InF-DH,

where r_c is the clutter size, ς is the clutter density, and h_c is the effective clutter height. We train our model for all four types of InF scenarios to detect the active MTDs. The InF environment parameters are presented in Table III.
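The distance, pathloss, and LOS-probability relations for the InF scenarios can be sketched as follows. The shadow-fading term is omitted, and the numeric clutter parameters in the example are assumed for illustration rather than taken from Table III.

```python
import numpy as np

def r3d(r2d, h_bs, h_mtd):
    """3D BS-MTD distance from the 2D distance and the height difference."""
    return np.sqrt(r2d**2 + (h_bs - h_mtd)**2)

def pl_los(r3d_m, fc_ghz):
    """InF LOS pathloss in dB (shadow-fading term omitted)."""
    return 31.84 + 21.50 * np.log10(r3d_m) + 19.00 * np.log10(fc_ghz)

def p_los(r2d, r_c, density, h_bs=None, h_mtd=None, h_c=None, high_bs=False):
    """LOS probability exp(-r2d / k); k depends on the clutter geometry.

    high_bs=False -> InF-SL / InF-DL (BS below the clutter)
    high_bs=True  -> InF-SH / InF-DH (BS above the clutter)
    """
    k = -r_c / np.log(1.0 - density)
    if high_bs:
        k *= (h_bs - h_mtd) / (h_c - h_mtd)
    return np.exp(-r2d / k)

# example with assumed InF-DH-like parameters: dense clutter, high BS
d = r3d(r2d=50.0, h_bs=8.0, h_mtd=1.5)
loss = pl_los(d, fc_ghz=26.0)
p = p_los(r2d=50.0, r_c=2.0, density=0.6,
          h_bs=8.0, h_mtd=1.5, h_c=6.0, high_bs=True)
```

As expected from the model, the LOS probability decays with distance, and denser clutter (larger ς) shrinks k and hence the LOS probability at a given distance.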
Here, we define different r_2D ranges for all InF scenarios to compare the performances at specific SNR values while maintaining the channel model condition 1 ≤ r_3D ≤ 600 m given in [24]. Furthermore, we use the same ORs, the same N in both SCMA and MUSA, and the same p_i, noise spectral density, noise figure at the BS, and DNN parameters as in the previous subsections. Finally, we simulate the four different InF environments and evaluate the performance of the DNN-MUD for the SMV system to verify the algorithm's applicability in future smart factories. Fig. 11 presents the probability of detection versus SNR for InF-SL, InF-SH, InF-DL, and InF-DH. We observe that InF-DH achieves the highest overall performance in both SCMA and MUSA scenarios because of its higher LOS probability. Also, due to the difference in ς between InF-DH and InF-SH, InF-SH reaches the second-highest overall performance.
Likewise, the performance of InF-DL and InF-SL follows the same argument. The low computational complexity of the DNN-MUD further highlights the significance of the proposed approach, encouraging less complex DNN-powered architectures for future mMTC applications.