Combined Deep Learning and SOR Detection Technique for High Reliability in Massive MIMO Systems

In this paper, a novel iterative detection technique that combines deep learning (DL) and the approximated algorithm of successive over relaxation (SOR) is proposed to achieve high reliability and reduce the computational complexity. Recently, as the demanded data rates increase, the massive multiple-input and multiple-output (MIMO) system has drawn attention in wireless communication. In massive MIMO, the implementation of traditional detectors for high reliability has become impractical, and the reduction for the complexity of detectors has emerged as a practical implementation challenge. The existing DL-based detection technique of orthogonal approximate message passing network (OAMPNet) can provide high detection performance. However, the computational complexity is too high for the implementation in massive MIMO systems. The proposed detection technique uses SOR algorithm to reduce the computational complexity, and the relaxation parameter of SOR is adaptively determined by a learning algorithm. A non-linear estimator using the DL algorithm is combined with the SOR algorithm to achieve high reliability, and regardless of the size of the MIMO system, only the size of the DL architecture determines the complexity of the non-linear estimator. Simulation results show that the proposed detector outperforms the conventional linear detector based on minimum mean square error (MMSE) and achieves high reliability with lower complexity than OAMPNet in various channel environments with spatial correlation.


I. INTRODUCTION
Multiple-input multiple-output (MIMO) is a critical technique that enables independent transmission of multiple data streams to increase throughput and provides diversity gain to improve link reliability. The use of multiple antennas is essential in fifth-generation (5G) wireless communication to address the explosive growth of data traffic that accompanies the fast growth of wireless communications [1]. Because of the high spectral efficiency that MIMO systems can achieve, the large-scale MIMO system known as massive MIMO has been proposed as a key technology in 5G wireless communication [2], [3]. To obtain the multiplexing gain that increases the channel capacity in proportion to the number of antennas, the receiver has to accurately recover the data sent by the transmitter, and this requires a well-designed detection technique. Recently, smart applications utilizing artificial intelligence and machine learning have been mentioned as a new paradigm towards sixth-generation (6G) wireless communication [4]. Following the success of deep learning (DL) in various fields, many studies have applied DL algorithms to communication systems, including the physical layer, and this has led to the emergence of DL-based detection techniques [5]-[8]. The goal of MIMO detection is to infer the original signal sent by the transmitter. The best-known optimal detection technique is the maximum likelihood (ML) detector [9]. However, the complexity of the exhaustive search in the ML detector grows exponentially with the number of transmit antennas, with the modulation order as the base. To balance computational complexity and accuracy, suboptimal detection techniques have been developed, and they are classified into linear and non-linear algorithms [10].
The representative low-complexity linear detection techniques are based on the zero-forcing (ZF) and minimum mean square error (MMSE) algorithms. Non-linear detection techniques are more advanced and involve more complex signal processing; e.g., the interference-cancellation-based MIMO detectors include ordered successive interference cancellation (OSIC) and the decision feedback equalizer (DFE) [11], [13]. Iterative detectors such as approximate message passing (AMP) provide asymptotically optimal detection performance for large independent and identically distributed (i.i.d.) Rayleigh fading channels [12]. The tree-search-based MIMO detectors known as QR decomposition-based M (QRDM) and sphere decoding (SD) reduce the computational complexity by exploring a limited number of reference signals, in contrast to the brute-force search of the ML detector [13]-[15].
With the advances in hardware devices and big data, DL algorithms have drawn much attention in various fields including speech processing and computer vision. The deep neural network (DNN), a representative DL algorithm, has the potential to optimize communication systems, and it was shown that DL-based detection techniques can achieve the same optimal detection performance as the ML detector [16]. DL-based detection techniques can be divided into two categories: data-driven and model-driven algorithms. Data-driven detectors replace conventional detection techniques with classic fully connected DNN layers that can solve the detection problem without communication expertise. Purely data-driven detectors, which design the entire signal processing chain as a neural network architecture trained end to end, are therefore regarded as a black box [17], [18]. A completely learned data-driven detector provides better accuracy than conventional detection techniques without requiring accurate channel information, and it can be adaptively trained and operated for various channels. However, a data-driven detector with many trainable parameters to optimize for a specific channel requires many training samples. In [19], a model-based iterative successive interference cancellation (SIC) detector was redesigned as a data-driven detector to fully exploit the potential of the DNN, and an algorithm that can learn with fewer training samples was proposed. To solve the aforementioned problems of treating the DL algorithm as a black box and requiring a large number of training samples, model-driven detectors are designed to improve detection performance by combining existing detection techniques with trainable parameters, and such an architecture is commonly referred to as a network.
In [20], the first model-driven detector, known as detection network (DetNet), was designed by unfolding iterations of a projected gradient descent algorithm into a neural network. DetNet provides high detection performance for ideal channels and low-order modulations of binary phase shift keying (BPSK) and quadrature phase shift keying (QPSK), but still has many trainable parameters. In addition, DetNet has poor accuracy for correlated channels and high-order modulations. In [21], [22], a model-driven detector was proposed by unfolding the orthogonal AMP (OAMP) of iterative detector into a neural network. OAMPNet achieves high detection performance even for correlated channels and high-order modulations, and has only two trainable parameters per one iteration compared to DetNet.

A. MOTIVATION AND CONTRIBUTIONS
For massive MIMO systems, most existing detection techniques, including the DL-based OAMPNet, are too complex to be implemented. In addition, near-optimal detection performance can be achieved with conventional linear detectors thanks to the channel hardening property, which makes fading channels behave like a deterministic channel. To achieve the same performance as the linear MMSE detector at reduced computational complexity, many iterative linear detectors utilizing the channel hardening property have been proposed, and these approaches avoid the explicit calculation of the matrix inversion [23]. Iterative algorithms for solving linear systems are generally classified into two categories: stationary and Krylov subspace-based algorithms. Typical detectors using stationary algorithms include the Jacobi, Gauss-Seidel (GS), and successive over relaxation (SOR) detectors [24], [25]. The representative detector based on the Krylov subspace is the conjugate gradient (CG) detector [26]. Moreover, to balance computational complexity and accuracy by using DL algorithms for massive MIMO detection, MMNet, the learned CG network (LcgNet), and a Jacobi-based detection network were proposed in [27]-[29]. Inspired by these studies on the implementation issue of massive MIMO detectors, this paper proposes a novel DL-based detection technique that can extend the detection performance and benefits of OAMPNet to massive MIMO systems. The proposed detector follows the iterative framework of MIMO detection used by AMP, OAMP, and OAMPNet.
The main contributions of this paper can be summarized as follows:
• In the iterative framework, a linear estimator and a non-linear estimator are iterated as one block. In this paper, the SOR detector is introduced as the linear estimator of the proposed detector to alleviate the implementation complexity problem. With the aid of the relaxation parameter, the SOR detector can provide MMSE detection performance and remains robust even when the channel hardening property becomes weak.
• The detection algorithms of the iterative framework require studies on an appropriate non-linear estimator that can improve the detection performance of a specific linear estimator. For example, the OAMP detector uses an MMSE-based linear estimator and a denoising-based non-linear estimator. This paper proposes a DNN-based denoiser as a novel non-linear estimator for the SOR detector and shows that the proposed denoiser can be combined with the SOR detector through the end-to-end learning algorithm. The DL algorithm optimizes the proposed detector for specific channel environments and makes it possible to detect original signals with high detection performance.
• In the simulation results, the convergence analysis for the proposed detector is presented to select the appropriate number of iterations, and the high detection performance of the proposed detector and its adaptability to various channels are verified. Although the number of parameters to be learned per iteration increases compared to OAMPNet, simulation results show that the proposed detector is highly robust to channels with spatial correlation because the additional training parameters give it high flexibility.
• The computational complexity is mainly related to the implementation issue for massive MIMO. This paper presents a computational complexity comparison between the proposed detector and OAMPNet, which supports that the proposed detector can significantly reduce the computational complexity compared to OAMPNet in massive MIMO systems.

B. PAPER OUTLINE
The rest of this paper is organized as follows. In Section II, the system model is presented, and the iterative linear detector and OAMPNet are introduced. Section III presents the main idea of the proposed detector and details the training process and the computational complexity. Section IV presents the implementation details, the convergence and learning curve analyses, and the performance evaluation for the proposed detector, and Section V provides the conclusion.

C. NOTATIONS
In this paper, lowercase and uppercase letters, bold lowercase letters, and bold uppercase letters are used for scalars, vectors, and matrices, respectively. $a_i$ and $A_{i,j}$ denote the $i$-th element of the vector $\mathbf{a}$ and the $(i, j)$-th element of the matrix $\mathbf{A}$. $(\cdot)^H$ denotes the conjugate transpose of a matrix of arbitrary size. $E$ denotes the expectation operation. $\mathbb{R}$ and $\mathbb{C}$ denote the sets of real and complex values. $|\cdot|$ denotes the cardinality of a set. $\|\cdot\|$ denotes the Euclidean norm of a vector. $\mathrm{tr}(\cdot)$ denotes the trace operation. $\mathbf{I}_n$ denotes the identity matrix of size $n$.

A. SYSTEM MODEL
For uplink massive MIMO systems, it is considered that the base station (BS) with $N_b$ antennas simultaneously communicates with $N_u$ single-antenna user equipments (UEs). The received signal $\mathbf{y} \in \mathbb{C}^{N_b \times 1}$ at the BS is expressed as follows,
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$
where $\mathbf{H} \in \mathbb{C}^{N_b \times N_u}$ is the channel matrix, $\mathbf{n} \in \mathbb{C}^{N_b \times 1}$ is the i.i.d. complex Gaussian noise with zero mean and variance $\sigma^2$, and $\mathbf{x} \in S^{N_u \times 1}$ is the transmitted symbol vector, where $S$ denotes the set of constellation points of quadrature amplitude modulation (QAM). All points of $S$ are normalized to unit average power, and it is assumed that accurate channel information is known at the BS.
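To handle complex values in the implementation of the DL algorithm, the equivalent real-valued representation, which is twice the original size, can be expressed for an arbitrary column vector $\mathbf{a}$ and matrix $\mathbf{A}$ as follows,
$$\tilde{\mathbf{a}} = \begin{bmatrix} \Re(\mathbf{a}) \\ \Im(\mathbf{a}) \end{bmatrix}, \quad \tilde{\mathbf{A}} = \begin{bmatrix} \Re(\mathbf{A}) & -\Im(\mathbf{A}) \\ \Im(\mathbf{A}) & \Re(\mathbf{A}) \end{bmatrix}, \tag{2}$$
where $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary parts. The real-valued representation of equation (1) is as follows,
$$\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\tilde{\mathbf{x}} + \tilde{\mathbf{n}}. \tag{3}$$
For detecting received signals, the ML detector provides optimal detection performance by solving the following optimization problem, which is known to be NP-hard,
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in S^{N_u}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2, \tag{4}$$
where $\hat{\mathbf{x}}$ is the candidate with the minimum distance between the received signal and all symbol combinations of the constellation transformed by the channel matrix. However, its computational complexity has motivated a variety of research on non-linear detectors with near-ML performance and lower complexity.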
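The equivalence between the complex-valued model and its real-valued representation can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation; the function names `to_real_vector` and `to_real_matrix`, the small system size, and the QPSK symbols are assumptions chosen only for illustration.

```python
import numpy as np

def to_real_vector(a):
    """Stack real and imaginary parts: [Re(a); Im(a)]."""
    return np.concatenate([a.real, a.imag])

def to_real_matrix(A):
    """Build the block matrix [[Re(A), -Im(A)], [Im(A), Re(A)]]."""
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

rng = np.random.default_rng(0)
N_b, N_u = 8, 4  # small illustrative sizes, not the paper's configuration

# i.i.d. Rayleigh channel, QPSK symbols, and complex Gaussian noise (assumed setup)
H = (rng.standard_normal((N_b, N_u)) + 1j * rng.standard_normal((N_b, N_u))) / np.sqrt(2)
x = (rng.choice([-1.0, 1.0], N_u) + 1j * rng.choice([-1.0, 1.0], N_u)) / np.sqrt(2)
n = 0.1 * (rng.standard_normal(N_b) + 1j * rng.standard_normal(N_b))
y = H @ x + n

# Real-valued counterparts, each twice the original size
y_r, H_r = to_real_vector(y), to_real_matrix(H)
x_r, n_r = to_real_vector(x), to_real_vector(n)

# The real-valued model should reproduce the complex one up to rounding error
max_residual = np.max(np.abs(y_r - (H_r @ x_r + n_r)))
print(H_r.shape, max_residual)
```

The block structure follows from expanding the complex product into its real and imaginary parts, so a real-valued network can process the same linear model without complex arithmetic.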

B. ITERATIVE LINEAR DETECTOR
For massive MIMO systems, high detection performance can be achieved by a linear solution of equation (4) as follows,
$$\hat{\mathbf{x}} = \mathbf{A}^{-1}\mathbf{H}^H\mathbf{y}, \tag{5}$$
where $\mathbf{A} = \mathbf{H}^H\mathbf{H}$ is based on ZF, and $\mathbf{A} = \mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}_{N_u}$ is based on MMSE. The channel hardening property of massive MIMO systems makes the matrix $\mathbf{H}^H\mathbf{H}$ diagonally dominant, which means that a larger system size and a larger ratio $\rho = N_b/N_u$ make the channel more deterministic. In addition, the channel hardening property allows the matrix inversion $\mathbf{A}^{-1}$, which has $O(N_u^3)$ complexity, to be replaced by an iterative approach for solving the linear system.
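The effect of the ratio $\rho$ on diagonal dominance can be illustrated with a short NumPy sketch. It is an illustration under assumed conditions (i.i.d. Rayleigh channels, one random draw per configuration); the helper names `mmse_detect`, `offdiag_to_diag_ratio`, and `rayleigh` are hypothetical and not taken from the paper.

```python
import numpy as np

def mmse_detect(H, y, sigma2):
    """Solve (H^H H + sigma2 I) x = H^H y without forming the inverse explicitly."""
    A = H.conj().T @ H + sigma2 * np.eye(H.shape[1])
    return np.linalg.solve(A, H.conj().T @ y)

def offdiag_to_diag_ratio(H):
    """Mean over rows of sum_{j != i} |G_ij| / |G_ii| for the Gram matrix G = H^H H."""
    G = np.abs(H.conj().T @ H)
    diag = np.diag(G)
    return float(np.mean((G.sum(axis=1) - diag) / diag))

def rayleigh(N_b, N_u, rng):
    """i.i.d. Rayleigh fading channel with unit-variance complex entries."""
    return (rng.standard_normal((N_b, N_u)) + 1j * rng.standard_normal((N_b, N_u))) / np.sqrt(2)

rng = np.random.default_rng(1)
ratio_massive = offdiag_to_diag_ratio(rayleigh(64, 8, rng))  # rho = 8
ratio_square = offdiag_to_diag_ratio(rayleigh(8, 8, rng))    # rho = 1
print(ratio_massive, ratio_square)

# Sanity check of the linear solution: without noise, the ZF case recovers x
H = rayleigh(64, 8, rng)
x = rng.choice([-1.0, 1.0], 8) + 0j
x_zf = mmse_detect(H, H @ x, sigma2=0.0)
```

A smaller ratio of off-diagonal to diagonal magnitude indicates a Gram matrix that is closer to diagonal, which is the situation in which iterative approximations of the inverse are expected to work well.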
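The GS detector, which uses one of the promising approximation algorithms, can achieve MMSE performance without the matrix inversion operation. The MMSE-based matrix $\mathbf{A}$ is divided into three parts as follows,
$$\mathbf{A} = \mathbf{L} + \mathbf{A}_{diag} + \mathbf{L}^H, \tag{6}$$
where $\mathbf{L}$, $\mathbf{A}_{diag}$, and $\mathbf{L}^H$ are the strictly lower triangular matrix, the diagonal matrix, and the strictly upper triangular matrix, respectively. The GS detector is expressed as follows,
$$\mathbf{x}^{(n)} = (\mathbf{A}_{diag} + \mathbf{L})^{-1}\left(\mathbf{z} - \mathbf{L}^H\mathbf{x}^{(n-1)}\right), \tag{7}$$
where $\mathbf{z} = \mathbf{H}^H\mathbf{y}$ is defined as the received signal vector after the matched filter, and $\mathbf{x}^{(n)}$ is the estimate of $\mathbf{x}$ after $n$ iterations. The above equation can be expressed in element form as follows,
$$x_i^{(n)} = \frac{1}{A_{i,i}}\left(z_i - \sum_{j<i} A_{i,j}x_j^{(n)} - \sum_{j>i} A_{i,j}x_j^{(n-1)}\right), \tag{8}$$
where $x_i^{(n)}$ is the $i$-th element of the estimated signal after $n$ iterations. According to equation (8), the previously estimated elements are fed back to calculate the following estimated element, and this feedback architecture improves the approximation performance.
VOLUME 4, 2016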
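Iterative linear detectors based on approximation algorithms suffer from drastic performance degradation in situations where the diagonal dominance is weakened. Therefore, to alleviate the problem by adjusting the weight of the diagonal elements in the approximation, a relaxation parameter $w$ is combined with the GS detector as follows,
$$\mathbf{x}^{(n)} = \left(\frac{1}{w}\mathbf{A}_{diag} + \mathbf{L}\right)^{-1}\left(\mathbf{z} + \left(\left(\frac{1}{w} - 1\right)\mathbf{A}_{diag} - \mathbf{L}^H\right)\mathbf{x}^{(n-1)}\right). \tag{9}$$
The above equation is called the SOR detector, and its element form is expressed as follows,
$$x_i^{(n)} = (1 - w)\,x_i^{(n-1)} + \frac{w}{A_{i,i}}\left(z_i - \sum_{j<i} A_{i,j}x_j^{(n)} - \sum_{j>i} A_{i,j}x_j^{(n-1)}\right). \tag{10}$$
Although the relaxation parameter improves the detection performance, it is difficult to select an appropriate relaxation parameter for various channels.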
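The element-wise SOR update can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: it uses the textbook SOR sweep on $\mathbf{A}\mathbf{x} = \mathbf{z}$, in which each element is relaxed between its previous value and the GS update, and the function name `sor_detect`, the system size, the noise variance, and the value $w = 1.1$ are assumptions. Setting $w = 1$ reduces the sweep to the GS detector.

```python
import numpy as np

def sor_detect(H, y, sigma2, w=1.0, n_iter=5):
    """Element-wise SOR sweeps on A x = z with A = H^H H + sigma2 I; w = 1 gives GS."""
    A = H.conj().T @ H + sigma2 * np.eye(H.shape[1])
    z = H.conj().T @ y  # matched filter output
    x = np.zeros_like(z)
    for _ in range(n_iter):
        for i in range(z.size):
            # x[:i] already holds the current sweep, x[i+1:] the previous sweep
            residual = z[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]
            x[i] = (1.0 - w) * x[i] + w * residual / A[i, i]
    return x

rng = np.random.default_rng(2)
N_b, N_u, sigma2 = 64, 8, 0.1  # illustrative values (assumption)
H = (rng.standard_normal((N_b, N_u)) + 1j * rng.standard_normal((N_b, N_u))) / np.sqrt(2)
x_true = (rng.choice([-1.0, 1.0], N_u) + 1j * rng.choice([-1.0, 1.0], N_u)) / np.sqrt(2)
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(N_b) + 1j * rng.standard_normal(N_b))
y = H @ x_true + noise

# Reference: exact MMSE solution obtained with a direct solver
x_mmse = np.linalg.solve(H.conj().T @ H + sigma2 * np.eye(N_u), H.conj().T @ y)

# Many sweeps are used here only to confirm that the fixed point is the MMSE solution
err_gs = np.linalg.norm(sor_detect(H, y, sigma2, w=1.0, n_iter=100) - x_mmse)
err_sor = np.linalg.norm(sor_detect(H, y, sigma2, w=1.1, n_iter=100) - x_mmse)
print(err_gs, err_sor)
```

Because each sweep only needs products with one row of $\mathbf{A}$ at a time, no matrix inverse is formed; the number of sweeps and the choice of $w$ determine how closely the MMSE solution is approximated.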

C. ITERATIVE ARCHITECTURE FOR OAMPNET
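The MIMO detector can be modeled as iteratively combined linear and non-linear estimators, as in the AMP detector, which provides a computationally tractable option for large system dimensions. In [12], it was proven that the AMP detector can achieve near-optimal detection performance by solving equation (4) with low complexity for i.i.d. Rayleigh fading channels. However, the AMP detector does not work well in ill-conditioned environments including correlated channels. Therefore, to apply the AMP architecture to various channels, the OAMP detector uses an MMSE-based linear estimator as follows [30],
$$\mathbf{W}_k = \frac{N_u}{\mathrm{tr}(\hat{\mathbf{W}}_k\mathbf{H})}\hat{\mathbf{W}}_k, \tag{11}$$
$$\hat{\mathbf{W}}_k = v_k^2\mathbf{H}^H\left(v_k^2\mathbf{H}\mathbf{H}^H + \sigma^2\mathbf{I}_{N_b}\right)^{-1}, \tag{12}$$
where the non-linear estimate $v_k^2$, called state evolution, is calculated as follows,
$$v_k^2 = \frac{\|\mathbf{y} - \mathbf{H}\mathbf{x}_k\|^2 - N_b\sigma^2}{\mathrm{tr}(\mathbf{H}^H\mathbf{H})}. \tag{13}$$
Furthermore, OAMPNet improves the detection performance through the DL algorithm and is expressed for $k$ iterations as follows,
$$\mathbf{r}_k = \mathbf{x}_k + \gamma_k\mathbf{W}_k(\mathbf{y} - \mathbf{H}\mathbf{x}_k), \tag{14}$$
$$\mathbf{x}_{k+1} = \eta_k(\mathbf{r}_k, \sigma_k^2), \tag{15}$$
where $\gamma_k$ is the trainable parameter for the linear estimator, $\eta_k$ is the non-linear estimation function, and the linear estimator can be equivalently expressed as $\mathbf{r}_k = \mathbf{x}_k + \mathbf{w}_k$, affected by the i.i.d. Gaussian noise $\mathbf{w}_k$ with zero mean and variance $\sigma_k^2$. For the non-linear estimator, the variance $\sigma_k^2$ should be calculated iteratively as follows,
$$\sigma_k^2 = \frac{1}{N_u}\mathrm{tr}(\mathbf{C}_k\mathbf{C}_k^H)\,v_k^2 + \frac{\theta_k^2\sigma^2}{N_u}\mathrm{tr}(\mathbf{W}_k\mathbf{W}_k^H), \tag{16}$$
where $\mathbf{C}_k = \mathbf{I}_{N_u} - \theta_k\mathbf{W}_k\mathbf{H}$ and $\theta_k$ is the trainable parameter for the non-linear estimator. In order to eliminate the noise $\mathbf{w}_k$, the non-linear estimator $\eta_k$ is used as the optimal denoiser for the i.i.d. Gaussian noise as follows,
$$\eta_k(r_i, \sigma_k^2) = E\{x_i \mid r_i, \sigma_k^2\} = \sum_{s \in S} s\,p(s \mid r_i, \sigma_k^2), \tag{17}$$
where $p(s \mid r_i, \sigma_k^2) \propto \exp\left(-|r_i - s|^2/\sigma_k^2\right)$. OAMPNet, with two trainable parameters per iteration, has very high detection performance for various channels. However, the MMSE-based linear estimator and the noise variance calculation per iteration require high complexity for large system dimensions, and this becomes an obstacle that prevents the application of OAMPNet to massive MIMO systems.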
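The optimal denoiser for i.i.d. Gaussian noise is an element-wise posterior mean over the constellation points. The sketch below illustrates this under stated assumptions: the noise is modeled as circularly symmetric complex Gaussian with total variance $\sigma_k^2$ per element, all constellation points are equally likely, and the helper names `qam_constellation` and `gaussian_denoiser` are hypothetical.

```python
import numpy as np

def qam_constellation(M):
    """Square M-QAM points normalized to unit average power."""
    m = int(np.sqrt(M))
    levels = np.arange(-(m - 1), m, 2, dtype=float)
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))

def gaussian_denoiser(r, sigma2, S):
    """Posterior mean E{x | r} for r = x + w with w ~ CN(0, sigma2), x uniform on S."""
    d2 = np.abs(r[:, None] - S[None, :]) ** 2
    logits = -d2 / sigma2
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)            # posterior probability of each point
    return p @ S

S = qam_constellation(16)
r = np.array([S[3] + 0.01, S[7] - 0.01j])  # slightly perturbed constellation points
x_hat = gaussian_denoiser(r, sigma2=1e-3, S=S)
print(x_hat)
```

For a small noise variance the posterior concentrates on the nearest constellation point, whereas for a large variance the output shrinks towards the constellation mean, which is why the variance has to be tracked at every iteration.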

III. PROPOSED ITERATIVE DETECTOR
In this section, the modified SOR detector is first presented as the linear estimator, and then the detailed description of the DNN-based denoiser as the non-linear estimator is presented. The proposed detector combines the SOR detector and the DL algorithm to achieve the detection performance of OAMPNet while reducing the computational complexity in massive MIMO systems. Finally, the training process and complexity of the proposed detector are discussed.

A. MODIFIED SOR DETECTOR
In order to achieve high detection performance by using the learning algorithm, an SOR detector with trainable parameters is proposed. To avoid confusion about the term iteration, this paper uses the term layer for one iteration of the proposed detector, and linear iteration for one iteration of the SOR detector. The proposed detector using the modified SOR detector is expressed for the $k$-th layer as follows,
where $n$ is the iteration number of the modified SOR detector for the linear estimator, $\theta_k^{(n)} \in \mathbb{R}$ is the $n$-th weight parameter, which is usually set to the default value of 1 in existing iterative linear detectors, $w_k \in \mathbb{R}$ is the relaxation parameter for the linear estimator of the $k$-th layer, and $\eta_k$ denotes the DNN-based denoiser described in the following subsection.
The initial vector $\mathbf{x}_1$ is the zero vector. The weights and relaxation parameters are optimized through the training process, and therefore the modified SOR detector can obtain adequate flexibility. This addresses the problem of selecting an appropriate relaxation parameter. Although the trainable parameters increase the computational complexity, the modified SOR detector still requires lower complexity than the linear estimator of OAMPNet, which calculates the matrix inversion.

B. DNN-BASED DENOISER
The SOR detector provides a result different from the linear MMSE detector when the number of iterations is insufficient, and it is difficult to mathematically characterize the noise of the SOR detector in order to utilize the denoiser for i.i.d. Gaussian noise. For simplicity, equation (17) is referred to as the Gaussian denoiser. The DNN-based denoiser, which implements the Gaussian denoiser as a data-driven algorithm, is designed to be combined with the modified SOR detector. In general, the DNN has multiple inputs and a fully connected layer architecture that consists of input and output layers, $l$ hidden layers, and neurons. However, the Gaussian denoiser is implemented as an element-wise denoising function. Therefore, in order to imitate the Gaussian denoiser as a DNN, the DNN-based denoiser follows the single-input architecture depicted in Fig. 1 and can be expressed as follows,
$$\eta_k = \zeta \circ \psi \circ \Lambda_{l+1} \circ \phi \circ \Lambda_l \circ \cdots \circ \phi \circ \Lambda_1, \tag{21}$$
where $\circ$ denotes the composition of subsequent function operations, $\Lambda_l$ is the $l$-th dense layer, i.e., a fully connected layer calculating an output of size $N_o$ from an input of size $N_i$ with weight $\mathbf{W}_l \in \mathbb{R}^{N_o \times N_i}$ and bias $\mathbf{z}_l \in \mathbb{R}^{N_o}$, $\phi$ is the element-wise rectified linear unit (ReLU) activation function, $\psi$ is the softmax function, and $\zeta$ is the proposed function to obtain the result of the non-linear estimator. For high flexibility of the non-linear estimator, one hidden layer consists of a dense layer and the ReLU activation function, and this can achieve high detection performance with a sufficiently large number of training samples. The output layer uses a dense layer with $|S|$ outputs and the softmax function to represent the posterior probability of the Gaussian denoiser. The vector $\mathbf{z} \in \mathbb{R}^{|S|}$ represents the result of the output dense layer, and after the calculation of the softmax function, the output result is expressed as follows,
$$\psi(\mathbf{z}) = \left[\frac{\exp(z_1)}{\sum_{j=1}^{|S|}\exp(z_j)}, \ldots, \frac{\exp(z_{|S|})}{\sum_{j=1}^{|S|}\exp(z_j)}\right]^T. \tag{22}$$
To obtain a single value from the single-input DNN, the result of the non-linear estimator is calculated as follows,
$$\zeta(\psi(\mathbf{z})) = \mathbf{s}^T\psi(\mathbf{z}), \tag{23}$$
where $\mathbf{s} \in \mathbb{C}^{|S| \times 1}$ is the vector representing all points of $S$, and this formula is very similar to the Gaussian denoiser. The proposed denoiser, being a data-driven algorithm, requires a number of training samples that depends on the number of neurons, which in turn determines the denoising performance and the computational complexity. The number of neurons therefore controls the balance between detection performance and computational complexity, and it is very important to choose an appropriate size and number of hidden layers. The number of neurons required to achieve satisfactory performance can differ slightly between channels, and high-order modulation requires higher denoising accuracy than general-order modulation. In this paper, the size and number of hidden layers for the proposed detector are fixed according to the modulation order. For general-order modulation such as 16QAM, the $1 \times 30$ first layer and the $30 \times 20$ second layer are used as hidden layers, as illustrated in Fig. 2(a). For high-order modulation such as 64QAM, a $30 \times 30$ hidden layer is added to improve the denoising performance, as illustrated in Fig. 2(b). The DNN-based denoiser replaces the calculation of the noise variance in equation (16) with the training process, and the high flexibility of the DNN allows the design of various linear estimators such as the modified SOR detector.
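The forward pass of such a single-input denoiser can be sketched with plain NumPy. The sketch is illustrative only: the weights are random and untrained, so the output values carry no meaning, the helper names `init_denoiser`, `softmax`, and `dnn_denoiser` are hypothetical, and applying a real-valued scalar input together with the complex 16QAM point vector $\mathbf{s}$ is an assumption about how the pieces fit together.

```python
import numpy as np

def init_denoiser(hidden_sizes, n_out, rng):
    """Random (untrained) weights and zero biases for a single-input dense network."""
    sizes = [1] + list(hidden_sizes) + [n_out]
    return [(0.1 * rng.standard_normal((n_o, n_i)), np.zeros(n_o))
            for n_i, n_o in zip(sizes[:-1], sizes[1:])]

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dnn_denoiser(r, params, s):
    """Apply the same small network to every element of r independently."""
    h = np.asarray(r, dtype=float).reshape(-1, 1)  # one scalar input per sample
    for W, b in params[:-1]:
        h = np.maximum(h @ W.T + b, 0.0)           # hidden layer: dense + ReLU
    W, b = params[-1]
    p = softmax(h @ W.T + b)                       # |S| outputs read as probabilities
    return p @ s, p                                # weighted sum of constellation points

rng = np.random.default_rng(4)
levels = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(10.0)
s = (levels[:, None] + 1j * levels[None, :]).ravel()  # unit-power 16QAM points

params = init_denoiser([30, 20], s.size, rng)          # 1x30 and 30x20 hidden layers
hidden_weights = sum(W.size for W, _ in params[:-1])
output_weights = params[-1][0].size
x_hat, p = dnn_denoiser(np.array([-0.9, -0.3, 0.0, 0.3, 0.9]), params, s)
print(hidden_weights, output_weights, x_hat.shape)
```

Since the network acts on one element at a time, its size is independent of $N_b$ and $N_u$; only the hidden-layer sizes and $|S|$ determine the number of weights.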

C. TRAINING PROCESS
The trainable parameters of the proposed detector are optimized by offline training, and the learning algorithm is implemented in TensorFlow using the adaptive moment estimation (ADAM) optimizer [31]. For the training process, an appropriate loss function is employed to maximize the improvement of detection performance that can be achieved through the learning algorithm, and this loss function plays a critical role in controlling the learning direction and updating the trainable parameters. In order to obtain optimal trainable parameters, the mean squared error (MSE) loss function based on end-to-end learning is used and expressed as follows,
$$\mathcal{L} = \frac{1}{M}\sum_{m=1}^{M}\left\|\tilde{\mathbf{x}}^{(m)} - \tilde{\mathbf{x}}_{K+1}^{(m)}\right\|^2, \tag{24}$$
where $M$ is the number of training samples, $K$ is the number of layers of the proposed detector, $\tilde{\mathbf{x}}_{K+1}^{(m)}$ is the output of the $K$-th layer for the $m$-th sample, and the real-valued representation $\tilde{\mathbf{x}}$ is used for simple learning. The number of training parameters, which is mainly determined by the DNN-based denoiser, affects the number of required training samples and the computational complexity.

D. COMPLEXITY ANALYSIS
In this subsection, an approximate complexity comparison between the proposed detector and OAMPNet is presented, excluding the similar operations in equations (17) and (23). The proposed algorithm uses $N_{lin}$ linear iterations for the modified SOR detector. In order to calculate the complexity of the linear estimator, the element form of the modified SOR detector is expressed as follows,
The complexity comparison is based on the number of real multiplications. For the complexity analysis, one complex-by-real multiplication and one complex-by-complex multiplication are equivalent to two and four real multiplications, respectively. The complexity of the iterative detectors in one layer can be analysed in terms of the operation with the highest computational complexity. The linear estimator of OAMPNet requires the matrix inversion operation of $O(8N_b^3)$. The number of real multiplications required for one iteration of the modified SOR detector is calculated as $O(4N_u^2 + 16N_u)$ in equation (25). The non-linear estimator of OAMPNet has a complexity of $O(8N_b^3)$ for the $\mathbf{W}_k\mathbf{W}_k^H$ operation in equation (16). The number of real multiplications required for the DNN-based denoiser is equivalent to the number of weights of the DNN, which uses 630 and 1530 weights in the hidden layers for general-order and high-order modulations, respectively, and $20 \times |S|$ weights in the output layer. The complexity of the proposed detector is much lower than that of OAMPNet, and this makes the implementation of the proposed detector reasonable in massive MIMO systems. The complexity comparison is summarized in Table 1.
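The counts quoted in this subsection can be reproduced with simple arithmetic. The sketch below evaluates only the leading-order expressions stated in the text for one example configuration ($N_b = 64$, $N_u = 16$); the function names are hypothetical, and the figures are operation counts, not measured run times.

```python
def dnn_weight_count(hidden_sizes, n_out, n_in=1):
    """Return (hidden-layer weights, output-layer weights) of the single-input denoiser."""
    sizes = [n_in] + list(hidden_sizes)
    hidden = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    return hidden, sizes[-1] * n_out

def sor_mults_per_iteration(N_u):
    """Leading-order real multiplications for one modified SOR iteration."""
    return 4 * N_u ** 2 + 16 * N_u

def inversion_mults(N_b):
    """Leading-order real multiplications quoted for the OAMPNet matrix operations."""
    return 8 * N_b ** 3

weights_16qam = dnn_weight_count([30, 20], n_out=16)      # 1x30, 30x20, then 20x|S|
weights_64qam = dnn_weight_count([30, 30, 20], n_out=64)  # extra 30x30 hidden layer
sor_cost = sor_mults_per_iteration(16)
inv_cost = inversion_mults(64)
print(weights_16qam, weights_64qam, sor_cost, inv_cost, inv_cost // sor_cost)
```

The hidden-layer totals follow directly from the layer sizes: $1 \cdot 30 + 30 \cdot 20 = 630$ for 16QAM and $1 \cdot 30 + 30 \cdot 30 + 30 \cdot 20 = 1530$ for 64QAM.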

IV. SIMULATION RESULTS
In this section, the implementation details of the system configuration, channel models, training, and compared detectors are discussed to evaluate the proposed detector. Then, the convergence and learning curve analyses for the proposed detector are presented according to the number of layers and linear iterations. Finally, the performance evaluation of the proposed detector is presented compared to existing detectors.

A. IMPLEMENTATION DETAILS
In order to model a massive MIMO system, $N_b$ is fixed at 64 according to the minimum condition mentioned in [3], and $N_u$ values of 8 and 16 are used for the performance comparison at various ratios. The simulated channels are the i.i.d. Rayleigh fading channel and the three-dimensional (3D) channel based on the widely used Saleh-Valenzuela model [32]. In addition to the i.i.d. Rayleigh fading channel as an ideal stochastic model, the 3D statistical channel with more realistic statistics is presented to show the adaptability of the proposed detector. Equations for the 3D statistical channel model are detailed in [32]. For the 3D channel, it is important to set the parameters of the channel model, such as the number of scatterers assigned to a cluster. The angular standard deviation (ASD) parameter determines the interval over which the azimuth angles of arrival and departure (AoA and AoD) and the zenith angles of arrival and departure (ZoA and ZoD) are randomly generated. These factors affect the degree of correlation of 3D channel environments, and a restrictive condition such as 10° ASD deteriorates the channel hardening property. In addition, the 3D channel simulations consider not only the uniform linear array (ULA) of BS antennas but also the uniform planar array (UPA) arranged in a square (i.e., the 64 BS antennas are arranged as 8 × 8). Therefore, simulation results for the 3D channel with spatial correlation can show the adaptive learning capability and high detection performance of the proposed detector. In the simulation, the propagation environment of the 3D channel adopts the non-line-of-sight (NLOS) environment, a sufficient number of 20 scatterers, half-wavelength spacing between adjacent BS antennas, and 30° ASD for moderate spatial correlation in both ULA and UPA environments. Furthermore, to provide performance comparisons under strong spatial correlation, the ULA environment with 10° ASD is additionally considered.
The performance evaluation is performed with the proposed detector after sufficient training, and the datasets consist of $(\tilde{\mathbf{x}}, \tilde{\mathbf{y}})$ pairs generated from random realizations of the channel matrix $\tilde{\mathbf{H}}$, the modulated symbol vector $\tilde{\mathbf{x}}$ drawn from the QAM constellation, and the received noise vector $\tilde{\mathbf{n}}$ in equation (3). The noise variance $\sigma^2$ is calculated from the signal-to-noise ratio (SNR), and the symbol error rate (SER) is presented as the detection performance according to the received SNR, which is defined as follows,
$$\mathrm{SNR} = \frac{E\{\|\mathbf{H}\mathbf{x}\|^2\}}{E\{\|\mathbf{n}\|^2\}}.$$
The offline training is carried out with training samples that are uniformly generated over the simulated SNR range for all training batches. For all channel environments, the proposed detector is equally trained for 20K iterations with a batch size of 500 samples. In order to optimize the trainable parameters, the learning rate is set to 0.003, and an exponential decay with a rate of 0.97 is applied after 1K iterations. The training is repeated for each modulation order, system configuration, and channel environment. For the comparisons of detection performance, the MMSE detector shows the linear detection performance, and the QRDM detector with full reference signal candidates, which has near-ML performance, is presented to provide a baseline of the optimal detection performance in all simulations. Furthermore, the MMSE-based DFE and AMP detectors are presented to show the non-linear detection performance in simulations for the i.i.d. Rayleigh fading channel, and the AMP detector is updated for 50 iterations. For the performance comparison among the DL-based detectors, OAMPNet with 10 layers and DetNet with 30 layers are presented in all simulations. OAMPNet is trained for 50K iterations with a batch size of 1K and a learning rate of 0.0008, and DetNet is trained for 100K iterations with a batch size of 3K and a learning rate of 0.0003. For more details on the comparison networks, including the learning parameters, the reader is referred to [20], [21].
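The dataset generation described above can be sketched for the i.i.d. Rayleigh case as follows. This is an assumption-laden illustration, not the authors' TensorFlow pipeline: the received SNR is taken as the ratio of average received signal power to average noise power, so that unit-power symbols and unit-variance channel entries give $\sigma^2 = N_u/\mathrm{SNR}$; the helper names `qam16` and `generate_batch` are hypothetical, and the 3D channel model is not covered.

```python
import numpy as np

def qam16():
    """Unit-average-power 16QAM constellation."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(10.0)
    return (levels[:, None] + 1j * levels[None, :]).ravel()

def generate_batch(batch, N_b, N_u, snr_db_range, S, rng):
    """Draw (H, x, y) with a per-sample SNR uniform over snr_db_range (in dB)."""
    shape = (batch, N_b, N_u)
    H = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    x = S[rng.integers(0, S.size, size=(batch, N_u))]
    snr_db = rng.uniform(snr_db_range[0], snr_db_range[1], size=batch)
    # Assumed definition: SNR = E||Hx||^2 / E||n||^2 = (N_b N_u) / (N_b sigma2)
    sigma2 = N_u / 10.0 ** (snr_db / 10.0)
    noise = rng.standard_normal((batch, N_b)) + 1j * rng.standard_normal((batch, N_b))
    n = np.sqrt(sigma2 / 2.0)[:, None] * noise
    y = np.einsum("bij,bj->bi", H, x) + n
    return H, x, y, sigma2

rng = np.random.default_rng(3)
H, x, y, sigma2 = generate_batch(500, 64, 16, (6.0, 16.0), qam16(), rng)

# Empirical checks of the assumed power normalization
Hx = np.einsum("bij,bj->bi", H, x)
signal_power = np.mean(np.sum(np.abs(Hx) ** 2, axis=1)) / (64 * 16)
noise_ratio = np.mean(np.sum(np.abs(y - Hx) ** 2, axis=1) / (64 * sigma2))
print(signal_power, noise_ratio)
```

Both printed values should be close to 1 if the normalization matches the assumed SNR definition; the complex arrays can then be converted to the real-valued representation before being fed to the network.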
For compared DL-based detectors, all training datasets are generated randomly from the simulated SNR range and assume the same condition for the training sample. Finally, in order to briefly summarize the implementation details, the setting parameters for training and simulation are given in Table 2.

B. CONVERGENCE AND LEARNING CURVE ANALYSES
The convergence rate of the proposed detector depends on the number of linear iterations and affects the computational complexity. According to the complexity analysis in Section III, the non-linear estimator of the proposed detector accounts for an overwhelming proportion of the computational complexity of one layer. Therefore, using fewer layers and increasing the convergence rate with a larger number of linear iterations is more efficient in terms of computational complexity than using more layers to achieve the same detection performance. In Figs. 3-4, simulations to analyse the convergence property and learning curve of the proposed detector are performed in a 3D channel with a single base station with a ULA arrangement. The number of UEs is set to 16, and 10° ASD is considered to investigate the robustness of the proposed detector in a more challenging environment with strong spatial correlation. The proposed detector is trained on datasets randomly generated in the simulated ranges of 6-16 dB SNR for 16QAM and 12-22 dB SNR for 64QAM, and the SER and loss are shown for the maximum SNR of each range. Fig. 3(a) shows the SER performance of the proposed detector according to the number of layers for 16QAM. In order to evaluate the effect of linear iterations on the convergence rate, the proposed detectors with 1 to 5 linear iterations are presented in Fig. 3(a). The proposed detector with 1 linear iteration shows a very low convergence rate at 5 layers and beyond, and there is a large performance gap compared to the detectors with more than 1 linear iteration. This means that 1 linear iteration is very inefficient for channels with strong spatial correlation in reaching the best detection performance that the proposed detector can achieve. On the other hand, the SER performance of the proposed detector with 5 linear iterations converges to its best value using only 5 layers. Fig. 3(b) shows the SER performance of the proposed detector according to the number of linear iterations for 64QAM. This result is presented to observe the detection performance limit that can be achieved by increasing the linear iterations with a fixed number of layers. Since high-order modulation such as 64QAM requires higher accuracy from the iterative linear detector, it is easy to observe the performance difference according to the linear iterations. In Fig. 3(b), the proposed detector with a single layer has poor detection performance compared to the detectors with more than 1 layer, which shows that an appropriate number of layers should be selected. The improvement in the performance limit diminishes as the number of layers grows, and the proposed detector approaches the maximum achievable performance when the number of layers increases from 4 to 5. These results show that the proposed detector can achieve high performance by adjusting the number of layers and linear iterations even for high-order modulation and the correlated channel. Fig. 4 shows the loss of the proposed detector with 5 layers and 5 linear iterations according to the number of training iterations. The loss used to observe the learning curve is calculated by equation (24). In this result, a validation batch size of 20K samples is used, and the validation is performed every 200 training iterations. The loss decreases significantly before 10K training iterations and then fluctuates with varying amplitude. The variations in the learning curve, which does not decrease monotonically, are caused by randomly generated datasets with different channel condition numbers that affect the detection performance, and the learning curve appears to converge. This means that training the proposed detector for 20K iterations can minimize the loss and provide high detection performance.

C. PERFORMANCE EVALUATION
Simulation results show the detection performance for 16QAM as a general modulation and 64QAM as a high-order modulation, and simulations are performed over an SNR span of 10 dB to clearly show SER differences. The number of layers of the proposed detector is fixed at 5, and the number of linear iterations is selected based on the convergence analysis for the various simulations. Fig. 5 shows the SER performance comparisons on different channel models where the number of UEs is set to 16. In the performance comparison of Fig. 5(a), the i.i.d. Rayleigh fading channel is assumed, and all comparison detectors show performance differences within about 1 dB SNR for 16QAM. However, for 64QAM, AMP shows a severe robustness problem in the range of 18-20 dB SNR, and DetNet has poor detection performance compared to MMSE. On the other hand, the proposed detector provides better detection performance for both modulations than MMSE and the non-linear detectors except for QRDM. For the proposed detector of Fig. 5(a), 1 and 3 linear iterations are used for 16QAM and 64QAM, respectively, and this is sufficient to achieve the optimal detection performance since the i.i.d. Rayleigh fading channel is ideal for the channel hardening property that affects the approximation performance of the iterative linear method. In Fig. 5(b), the 3D channel with a single BS of ULA arrangement and an ASD of 30° is assumed, and the performance comparison of the DL-based detectors is presented. In Fig. 5(b), the SER performances of the QRDM and MMSE detectors are similar to those in Fig. 5(a), within approximately 1 dB SNR difference, and therefore the same numbers of linear iterations are adopted for the proposed detector. The proposed detector and OAMPNet provide nearly the same detection performance as the QRDM detector for both modulations, but DetNet still shows a large performance loss for 64QAM. This means that DetNet has limits for high-order modulation.
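The channel hardening argument above can be made concrete with a small numerical sketch. It compares how close the Gram matrix of the channel is to a diagonal matrix for an i.i.d. Rayleigh channel and for a spatially correlated one. The exponential correlation model across BS antennas, the value `rho = 0.9`, and the helper names are assumptions chosen for illustration only; they are not the 3D channel model with ASD used in the simulations of this paper.

```python
import numpy as np

def offdiag_ratio(H):
    """Off-diagonal to diagonal energy ratio of the Gram matrix H^H H."""
    G = H.conj().T @ H
    diag_energy = np.sum(np.abs(np.diag(G)) ** 2)
    total_energy = np.sum(np.abs(G) ** 2)
    return (total_energy - diag_energy) / diag_energy

def rayleigh(n_b, n_u, rng):
    """i.i.d. Rayleigh fading channel with unit-variance complex Gaussian entries."""
    return (rng.standard_normal((n_b, n_u)) + 1j * rng.standard_normal((n_b, n_u))) / np.sqrt(2)

def correlated(n_b, n_u, rho, rng):
    """Hypothetical exponential correlation across BS antennas: R[i, j] = rho**|i - j|."""
    idx = np.arange(n_b)
    R = rho ** np.abs(idx[:, None] - idx[None, :])
    # Colouring with the Cholesky factor gives columns with covariance R.
    return np.linalg.cholesky(R) @ rayleigh(n_b, n_u, rng)

def average_ratios(n_b=128, n_u=16, rho=0.9, trials=50, seed=0):
    """Average off-diagonal ratio for the i.i.d. and the correlated channel."""
    rng = np.random.default_rng(seed)
    iid = np.mean([offdiag_ratio(rayleigh(n_b, n_u, rng)) for _ in range(trials)])
    corr = np.mean([offdiag_ratio(correlated(n_b, n_u, rho, rng)) for _ in range(trials)])
    return iid, corr

if __name__ == "__main__":
    iid, corr = average_ratios()
    print(f"i.i.d. Rayleigh: {iid:.3f}, correlated: {corr:.3f}")
```

A smaller ratio means the Gram matrix is closer to diagonal, which is the regime in which an iterative linear method needs only a few iterations; under the assumed correlation model the ratio is larger, which is in line with the larger number of linear iterations adopted for the correlated channels.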
The QRDM algorithm detects symbol combinations by minimizing the accumulated squared Euclidean distance in a tree-based search, and it efficiently achieves near-ML detection performance depending on the number of reference signal candidates. However, the QRDM detector with full reference signal candidates still has high computational complexity compared to the other detectors, because the cost of calculating the squared Euclidean distances is proportional to $|S|^2$ for each layer of a tree of depth $N_u$, in addition to the channel matrix inversion and QR decomposition. These results show that the proposed algorithm can achieve performance equivalent to OAMPNet and the QRDM detector with lower complexity under the ideal channel model. In addition, this means that the DNN-based denoiser is successfully combined with the modified SOR detector. Fig. 6 shows the SER performance comparisons for a more correlated channel with different numbers of UEs. To model a highly correlated channel, only the ASD is changed to 10° in the channel environment of Fig. 5(b), which makes this setting equivalent to the channel environment in Figs. 3 and 4. Therefore, the proposed detector adopts 5 layers and 5 linear iterations based on the results of the convergence and learning curve analyses. Fig. 6(a) shows that DetNet suffers from the robustness problem in the 10-12 dB SNR range and shows even lower detection performance than MMSE at 12 dB SNR. In addition, DetNet still suffers a large performance loss for 64QAM, as in Fig. 5. On the other hand, the proposed detector achieves performance close to that of QRDM for both modulations. The gap in SER performance from QRDM is very small and becomes slightly wider as the number of UEs increases to 16. However, the proposed detector in Fig. 6(b) still provides higher detection performance than OAMPNet, and its SER performance differs from QRDM only within about 1 dB SNR for both modulations.
Fig. 7 shows the SER performance comparisons for the 3D channel with a single BS of UPA arrangement and an ASD of 30°, where the UPA arrangement of the BS structurally increases the spatial correlation of the channels. In Fig. 7(a), the SER performance trends of all comparison detectors are very similar to those in Fig. 6(a), and the proposed detector with the same number of layers and linear iterations provides high detection performance in the same SNR range. Although the performance gap of the proposed detector from QRDM is slightly wider in Fig. 7(b) than in Fig. 6(b), the proposed detector still achieves SER performance close to QRDM, within approximately 1 dB SNR difference. The performance comparisons in Fig. 7 show that the proposed detector is robust to channels with spatial correlation and can achieve better performance than OAMPNet. Fig. 8 shows the number of real multiplications for iterative detectors of K layers according to Table 1. For the complexity comparison, the number of UEs is set to 16, and 1 and 5 linear iterations are used for the proposed detector. Fig. 8 shows that the difference in complexity caused by the number of linear iterations is relatively small compared to the increase in complexity as the proposed detector becomes deeper. The complexity of OAMPNet, which is mainly determined by the inversion of a matrix of size $N_b$, is very high for massive MIMO. Conversely, the complexity of the proposed detector is mainly determined by $N_u$ and the architecture size of the DNN-based denoiser. This result shows that the proposed detector can greatly reduce the computational complexity: even with 10 layers, it requires about 7-15× fewer real multiplications than OAMPNet with a single layer.
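For reference, the tree search of the QRDM baseline discussed in this section can be sketched as follows. The sketch performs a QR decomposition of the channel, walks the tree from the last layer to the first, and keeps the `M` candidate paths with the smallest accumulated squared Euclidean distance at each layer. It is a generic M-algorithm sketch under stated assumptions, not the exact baseline implementation used in the simulations: the function names, the absence of any layer ordering or other preprocessing, and the noiseless usage example are all illustrative choices.

```python
import numpy as np

def qam16():
    """Unit-average-energy 16QAM constellation."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(10.0)

def qrdm_detect(H, y, constellation, M):
    """Breadth-first tree search that keeps the M best paths per layer."""
    n_u = H.shape[1]
    Q, R = np.linalg.qr(H)                 # reduced QR: R is n_u x n_u upper triangular
    z = Q.conj().T @ y                     # rotated observation, z = R x + rotated noise
    paths = np.zeros((1, n_u), dtype=complex)   # surviving symbol hypotheses
    metrics = np.zeros(1)                       # accumulated squared distances
    for i in range(n_u - 1, -1, -1):
        # Contribution of the layers already decided on each surviving path.
        interference = paths[:, i + 1:] @ R[i, i + 1:]
        # Residual for every (survivor, candidate symbol) pair at layer i.
        resid = z[i] - interference[:, None] - R[i, i] * constellation[None, :]
        cand = metrics[:, None] + np.abs(resid) ** 2
        flat = cand.ravel()
        keep = np.argsort(flat)[:M]        # indices of the M smallest metrics
        parent, sym = np.unravel_index(keep, cand.shape)
        paths = paths[parent]              # fancy indexing returns a copy
        paths[:, i] = constellation[sym]
        metrics = flat[keep]
    return paths[0]                        # path with the smallest final metric

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    S = qam16()
    H = (rng.standard_normal((32, 4)) + 1j * rng.standard_normal((32, 4))) / np.sqrt(2)
    x = rng.choice(S, 4)
    x_hat = qrdm_detect(H, H @ x, S, M=len(S))  # noiseless check
    print("recovered transmitted symbols:", np.allclose(x_hat, x))
```

With `M` equal to the constellation size, each layer evaluates one metric per survivor and candidate symbol, which is the $|S|^2$ cost per layer referred to above.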

V. CONCLUSION
In this paper, a novel iterative detector is proposed for high reliability in massive MIMO systems. In order to alleviate the complexity problem and extend OAMPNet benefits to massive MIMO systems, the proposed detector utilizes the SOR detector and the DL algorithm, where the difference from OAMPNet is that it uses the iterative linear detector and the DNN-based denoiser. The DNN-based denoiser is designed