Deep Learning Approach in DOA Estimation: A Systematic Literature Review

In array signal processing, the direction of arrival (DOA) of the signal source has drawn broad research interest because of its wide applications in fields such as sonar, radar, communications, medical detection, and electronic countermeasures. In recent years, the application of deep learning (DL) to DOA estimation has achieved great success. This study provides a systematic review of research on DOA estimation using deep neural network methods. We manually selected twenty-five papers related to this research from five prominent databases (SpringerLink, IEEE Xplore, ScienceDirect, Scopus, and Google Scholar) for exploration. Six questions describing the overall trend of DOA estimation using deep learning are put forward. Then, we answered these questions by reviewing the literature. This review is helpful for researchers in this field because it provides more specific and comprehensive information needed for future research. Specifically, we first analyzed the background of the selected papers, including the type of publication, the number of citations, and the country of origin. Then, the DL technology used in DOA estimation is systematically analyzed, including the purpose of using DL in DOA estimation, various DL models (convolutional neural network, deep neural network, and combination network), and various DOA estimation schemes. Finally, various evaluation criteria (root-mean-squared error, accuracy, and mean absolute error) are used to evaluate the DL technology in DOA estimation, and various factors (signal-to-noise ratio, number of snapshots, number of antennas, and number of signal sources) affecting DOA estimation are analyzed. Based on our findings, we believe that deep learning can perform DOA estimation well, and there is still room for improvement in deep learning technology. The factors affecting DOA estimation identified in this study can serve as directions for in-depth research.


Introduction
Early DOA estimation has its origins in conventional beamforming (CBF) [1], which directly extends the traditional Fourier spectrum estimation method from time-domain signal processing to spatial-domain signal processing, so that the array angular resolution is restricted by the Rayleigh limit. The Rayleigh limit means that two signals can be distinguished only when the angular separation between the two far-field signal sources is greater than the antenna beamwidth. The Capon method [2] minimizes the output energy in the interference directions while keeping the output energy in the desired direction constant. This method does not require the number of sources in advance and is robust, but its resolution is not high enough. Eigen-subspace methods can break the Rayleigh limit. The Pisarenko method is a harmonic analysis method [3]. It obtains the signal subspace and noise subspace by performing eigenvalue decomposition or singular value decomposition on the array covariance matrix and exploits the orthogonality between the two subspaces. The eigenvector corresponding to the smallest eigenvalue is taken as the noise subspace, and a high-precision DOA estimate of the target is obtained at a small computational cost. However, this algorithm is limited because it is only suitable when the number of array elements exceeds the number of signal sources by one. Among the subspace methods with super-resolution capabilities, the most promising are the multiple signal classification (MUSIC) method [4], which is based on the noise subspace, and the estimation of signal parameters via rotational invariance techniques (ESPRIT) [5].
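As an illustration of the noise-subspace idea described above, the following is a minimal sketch of a MUSIC pseudo-spectrum for a uniform linear array. It is not taken from any of the reviewed papers: the half-wavelength spacing, the simulated data, and all function names (steering_vector, simulate_ula, music_spectrum, pick_peaks) are assumptions made for this example, and the number of sources is assumed known.

```python
import numpy as np

def steering_vector(theta_deg, num_elements, spacing=0.5):
    """ULA steering vector; spacing is in wavelengths (half-wavelength assumed)."""
    k = np.arange(num_elements)
    return np.exp(-2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def simulate_ula(angles_deg, num_elements=8, num_snapshots=200, snr_db=10.0, seed=0):
    """Hypothetical test data: uncorrelated unit-power sources in white noise."""
    rng = np.random.default_rng(seed)
    steering = np.stack([steering_vector(a, num_elements) for a in angles_deg], axis=1)
    shape_s = (len(angles_deg), num_snapshots)
    signals = (rng.standard_normal(shape_s) + 1j * rng.standard_normal(shape_s)) / np.sqrt(2)
    shape_n = (num_elements, num_snapshots)
    noise = (rng.standard_normal(shape_n) + 1j * rng.standard_normal(shape_n)) / np.sqrt(2)
    return steering @ signals + 10 ** (-snr_db / 20) * noise

def music_spectrum(snapshots, num_sources, grid_deg, spacing=0.5):
    """MUSIC pseudo-spectrum from an (elements x snapshots) data matrix."""
    num_elements, num_snapshots = snapshots.shape
    # Sample covariance matrix of the array output.
    cov = snapshots @ snapshots.conj().T / num_snapshots
    # eigh returns eigenvalues in ascending order, so the first
    # (num_elements - num_sources) eigenvectors span the noise subspace.
    _, eigvecs = np.linalg.eigh(cov)
    noise_subspace = eigvecs[:, : num_elements - num_sources]
    spectrum = np.empty(len(grid_deg))
    for i, theta in enumerate(grid_deg):
        a = steering_vector(theta, num_elements, spacing)
        # Steering vectors of true sources are nearly orthogonal to the noise subspace.
        spectrum[i] = 1.0 / np.linalg.norm(noise_subspace.conj().T @ a) ** 2
    return spectrum

def pick_peaks(spectrum, grid_deg, num_sources):
    """Return the angles of the strongest local maxima, sorted ascending."""
    idx = [i for i in range(1, len(spectrum) - 1)
           if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]
    idx.sort(key=lambda i: spectrum[i], reverse=True)
    return sorted(float(grid_deg[i]) for i in idx[:num_sources])

grid = np.arange(-90.0, 91.0, 1.0)
data = simulate_ula([-20.0, 30.0])
estimates = pick_peaks(music_spectrum(data, 2, grid), grid, 2)
print(estimates)
```

The grid search makes the cost grow with the angular resolution, which is one of the computational burdens that the learning-based methods reviewed later try to avoid.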
Traditional DOA estimation methods are generally based on a physical array structure, so they have certain limitations. For virtual antenna array technology, the signal received by the actual array antenna can be used to construct the signal of the virtual array element [6]. It increases the degree of freedom of the array and expands the array aperture, which allows it to handle more signal sources and improve the performance of DOA estimation.
This technology mainly includes the interpolation transformation method and the fourth-order statistics method. The interpolation transformation method divides the transformation area according to the approximate direction of the signal received by the array antenna. Then it obtains the transformation matrix from the array manifold vectors of the virtual array and the actual array and calculates the virtual transformation covariance matrix, ultimately achieving the virtual array transformation [7]. The fourth-order statistics method treats the fourth-order statistics of the data received by the array as a cross-correlation between specific actual array elements and the corresponding virtual array elements. Then, it replaces the second-order covariance matrix with the constructed fourth-order covariance matrix [8]. These methods further enhance the performance of DOA estimation because they break through the limitations that physical arrays impose on DOA estimation. In addition, there is a DOA estimation algorithm based on compressed sensing theory, which converts the DOA estimation problem into a row-sparse matrix reconstruction problem and reduces the amount of calculation by singular value decomposition.
In practical applications, the number of signal sources is usually unknown. However, except for a few methods (such as the Capon method), most DOA estimation methods require prior knowledge of the number of signal sources. If the estimated number of signal sources does not match the actual number, the orthogonality of the signal subspace and noise subspace will be affected.
This severely affects the accuracy of DOA estimation. In response to this limitation, many scholars have proposed classical methods for estimating the number of signal sources. Akaike [9] proposed a method based on the Akaike information criterion (AIC). Wax and Ziskind [10] proposed a method based on the minimum description length (MDL). However, the AIC and MDL criteria are only applicable to Gaussian white noise. In a practical environment, colored noise is more common than Gaussian white noise. Therefore, these two methods are not applicable under colored noise. By contrast, the RAIC [11] and RMDL [12] estimation methods, which apply diagonal loading to the information-theoretic criteria, can effectively suppress the divergence of noise eigenvalues to smooth colored noise, with good estimation performance for any array geometry. The Gerschgorin disk estimation (GDE) method proposed by Wu et al. [13] can estimate the number of signal sources for an unknown array signal against both white-noise and colored-noise backgrounds. The GDE method does not use the signals from all sensors, so when the number of sources is large, the probability of correct detection decreases rapidly due to insufficient degrees of freedom.
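To make the information-theoretic criteria concrete, the sketch below estimates the number of sources from the eigenvalues of the sample covariance matrix using the classical eigenvalue-based forms of AIC and MDL. It assumes white Gaussian noise, which is exactly the condition under which the text says these criteria apply; the function names and the simulated 8-element array are assumptions for illustration only, not the implementation of [9] or [10].

```python
import numpy as np

def estimate_num_sources(snapshots, criterion="mdl"):
    """Pick the source count that minimizes AIC or MDL (white-noise assumption)."""
    num_elements, num_snapshots = snapshots.shape
    cov = snapshots @ snapshots.conj().T / num_snapshots
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order
    eigvals = np.maximum(eigvals, 1e-12)              # guard the logarithm
    scores = []
    for k in range(num_elements):
        tail = eigvals[k:]  # candidate noise eigenvalues if there are k sources
        # log(geometric mean / arithmetic mean); zero when the tail is flat.
        log_ratio = np.mean(np.log(tail)) - np.log(np.mean(tail))
        log_likelihood = num_snapshots * (num_elements - k) * log_ratio
        free_params = k * (2 * num_elements - k)
        if criterion == "aic":
            scores.append(-2.0 * log_likelihood + 2.0 * free_params)
        else:
            scores.append(-log_likelihood + 0.5 * free_params * np.log(num_snapshots))
    return int(np.argmin(scores))

def simulate_white_noise_case(angles_deg, num_elements=8, num_snapshots=500,
                              snr_db=10.0, seed=1):
    """Hypothetical half-wavelength ULA data with white Gaussian noise."""
    rng = np.random.default_rng(seed)
    k = np.arange(num_elements)[:, None]
    steering = np.exp(-1j * np.pi * k * np.sin(np.deg2rad(np.asarray(angles_deg)))[None, :])
    shape_s = (len(angles_deg), num_snapshots)
    signals = (rng.standard_normal(shape_s) + 1j * rng.standard_normal(shape_s)) / np.sqrt(2)
    shape_n = (num_elements, num_snapshots)
    noise = (rng.standard_normal(shape_n) + 1j * rng.standard_normal(shape_n)) / np.sqrt(2)
    return steering @ signals + 10 ** (-snr_db / 20) * noise

x = simulate_white_noise_case([-20.0, 30.0])
mdl_count = estimate_num_sources(x, "mdl")
aic_count = estimate_num_sources(x, "aic")
print(mdl_count, aic_count)
```

Both criteria rely on the noise eigenvalues being nearly equal, which is why they break down when the noise is colored and the noise eigenvalues spread out.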
Although researchers have proposed a number of DOA estimation methods, a method that estimates DOA accurately under all practical conditions has yet to be found. With the development of artificial intelligence (AI), estimating DOA with AI has the potential to yield remarkable results. Compared with traditional algorithms, DOA estimation algorithms based on deep learning greatly improve estimation performance and generalization, but there is still room for improvement under high noise or strong reverberation conditions. In this paper, we present a systematic literature review (SLR) of deep learning approaches to DOA estimation. This paper is organized as follows. Section 2 introduces deep learning. Section 3 presents the methodology used to perform the systematic literature review. Section 4 analyzes the current research status of DOA estimation. Section 5 examines the different architectures of DL used in DOA estimation. Section 6 compares the performance of various methods of DOA estimation using deep learning. Finally, the paper is concluded in Section 7.

Deep Learning
Machine learning is a process by which machines use artificial neural networks (ANNs) to compute large amounts of Internet-based data to autonomously simulate the process of human learning and ultimately make smart decisions. An ANN is an adaptive nonlinear dynamic network system composed of a large number of neurons connected to each other. It is a simulation and approximation of biological neural networks. As shown in Figure 1, the MP model [14] is the first mathematical model of neurons. Later, Rosenblatt [15] used the linear optimization method to simulate the nervous system of human learning and proposed a single-layer perceptron model. However, this model cannot deal with linearly inseparable problems. To make up for this shortfall, Rumelhart et al. [16] proposed the backpropagation (BP) neural network, which is a multilayer feedforward neural network trained according to the error backpropagation algorithm. However, when the number of network layers increases, the BP neural network was found to have problems such as local optima, overfitting, and vanishing gradients, which restrict its use. In addition, some other excellent shallow models have emerged, such as support vector machines (SVM) [17] and Gaussian mixture models (GMM) [18]. But they fail to address problems of machine learning such as overfitting [19], the curse of dimensionality [20], and theoretical guarantees [21]. To solve these problems, a new branch of machine learning was produced: deep learning. Because such learning does not require manual feature design or extraction and can learn features from data autonomously, it is also called representation learning.
Deep learning originated from machine learning and statistical mechanics. Based on the discrete Hopfield network [22], Ackley et al. [23] used the Boltzmann distribution to propose the Boltzmann machine (BM), which introduces statistical probability into the neuron state changes. The equilibrium state of the network follows the Boltzmann distribution, and the optimal solution is sought through the simulated annealing algorithm. In the following year, Smolensky [24] proposed the restricted Boltzmann machine (RBM) by defining the BM as a two-layer network, namely, the visible unit layer and the hidden unit layer. In addition, it is stipulated that the neurons in the same layer are independent of each other while those in different layers are connected to each other [25], as shown in Figure 2. Hinton and Salakhutdinov [26] proposed connecting RBMs in series to form a deep belief network (DBN) consisting of multiple stacked RBMs and a backpropagation (BP) network. The training process is divided into two steps: pretraining and fine-tuning. In the pretraining process, the output of the RBM of the previous layer is used as the input for the next layer, and the unsupervised greedy method is used to train each RBM from bottom to top, with the weights being updated during training. The weights obtained after pretraining are used as the initial weights of the DBN, and then the entire network is adjusted by the error between the top-level output and the expected output. Increasing the number of hidden layers of the RBM yields a deep Boltzmann machine (DBM), which uses the contrastive divergence (CD) algorithm [27] to greatly improve the training speed. The network structures of the DBN and DBM are shown in Figure 3 [26].
Convolutional neural networks (CNNs) are currently more popular than other artificial neural networks. The basic structure of the CNN consists of an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer [28]. The convolutional layer is composed of many feature maps, with each map consisting of many neurons. Each neuron is connected to a local region of the feature map of the previous layer by a convolution kernel that serves as a weight matrix [29]. Then the weighted sum of the local region is passed to a nonlinear function to obtain the output value of each neuron in the convolutional layer. The weights of the CNN are shared between the same input feature map and output feature map. The convolutional layers extract different features of the input through convolution operations, with the first layer extracting low-level features and higher layers extracting more advanced features. The convolutional layer is followed by the pooling layer, which is also composed of many feature maps. Each of these maps corresponds uniquely to one feature map of the convolutional layer, so the number of feature maps is unchanged. The pooling layer, which performs a secondary extraction of features, obtains spatially invariant features by reducing the resolution of the feature map [30]. Each neuron in this layer performs a pooling operation on its local receptive field. Commonly used pooling methods include maximum pooling, which takes the largest value in the local receptive field, and average pooling, which takes the average of the values in the local receptive field [31]. In these commonly used pooling methods, the local receptive fields of different neurons in the same feature map of the pooling layer do not overlap. In overlapping pooling, by contrast, there is an overlapping area between adjacent pooling windows [32].
The pooling layer reduces the number of neurons through the pooling operation and thereby reduces the amount of calculation of the network model. In a CNN, several convolutional layers and pooling layers are followed by one or more fully connected layers. In the fully connected layer, each neuron is fully connected to all neurons in the previous layer, and the activation function of its neurons is generally the ReLU function [33]. This layer can integrate the category-discriminative information from the convolutional layer or the pooling layer [34]. Then, its output value is passed to an output layer for classification using softmax regression [35].
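The pooling operations described above can be sketched in a few lines. The function below is an illustrative helper (the name pool2d and its parameters are assumptions, not an API of any particular framework) that covers maximum pooling, average pooling, and overlapping pooling, where the stride is smaller than the window.

```python
import numpy as np

def pool2d(feature_map, window=2, stride=2, mode="max"):
    """Pool a 2-D feature map; stride < window gives overlapping pooling."""
    height, width = feature_map.shape
    out_h = (height - window) // stride + 1
    out_w = (width - window) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    pooled = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Local receptive field of output neuron (i, j).
            patch = feature_map[i * stride:i * stride + window,
                                j * stride:j * stride + window]
            pooled[i, j] = reduce_fn(patch)
    return pooled

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fmap, mode="max"))            # [[ 5.  7.] [13. 15.]]
print(pool2d(fmap, mode="avg"))            # [[ 2.5  4.5] [10.5 12.5]]
print(pool2d(fmap, window=3, stride=1))    # overlapping: [[10. 11.] [14. 15.]]
```

In the non-overlapping cases the 4 x 4 map shrinks to 2 x 2, which is the reduction in neurons and computation that the text attributes to the pooling layer.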

Mobile Information Systems
Recurrent neural networks (RNNs) can be used to process and predict sequence data. They overcome many limitations of traditional machine learning methods on input and output data and can be applied well when there is a certain time dependence in the data. Therefore, the RNN is an important model for deep learning. The long short-term memory (LSTM) network proposed by Hochreiter and Schmidhuber [36] is now the most effective sequence model in practical applications. LSTM is an improvement on the RNN. It uses three gate structures to control the state and output at different times, namely, the input gate, the output gate, and the forget gate. The LSTM architecture described by Graves and Schmidhuber [37] is shown in Figure 4. LSTM mitigates gradient vanishing by combining short-term memory with long-term memory through the gate structure. Each gate can be regarded as a fully connected layer that stores and updates information, and its activation function is the sigmoid function. The sigmoid function outputs a value between 0 and 1 to indicate how much information can pass through the gate at the current moment. A value of 0 means that no information can pass, and 1 means that all information can pass. The forget gate is a key component of the LSTM unit; it controls which information should be retained and which should be forgotten, thus avoiding the gradient vanishing and explosion caused by backpropagation through time. The input gate controls the amount of information that flows into the memory unit from the current input data, and the output gate determines the amount of information that the memory unit outputs in the current state.
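The gate mechanism described above can be written as a single time step of a standard LSTM cell. The sketch below is illustrative only: the stacked weight layout (forget, input, output, candidate) and the names lstm_step, W, and b are assumptions chosen for compactness, not the formulation of [36] or [37].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4 * hidden, input + hidden); its row blocks
    are assumed to be ordered as forget gate, input gate, output gate, candidate."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    forget_gate = sigmoid(z[0:hidden])               # 0 = discard, 1 = keep old memory
    input_gate = sigmoid(z[hidden:2 * hidden])       # how much new content enters
    output_gate = sigmoid(z[2 * hidden:3 * hidden])  # how much memory is exposed
    candidate = np.tanh(z[3 * hidden:])              # proposed new memory content
    c = forget_gate * c_prev + input_gate * candidate
    h = output_gate * np.tanh(c)
    return h, c

# Example: a wide-open forget gate and a closed input gate preserve the memory cell.
hidden_size, input_size = 3, 2
W = np.zeros((4 * hidden_size, input_size + hidden_size))
b = np.zeros(4 * hidden_size)
b[:hidden_size] = 50.0                   # forget gate close to 1
b[hidden_size:2 * hidden_size] = -50.0   # input gate close to 0
c_prev = np.array([1.0, -2.0, 0.5])
h, c = lstm_step(np.ones(input_size), np.zeros(hidden_size), c_prev, W, b)
print(c)
```

Because the memory cell is updated additively through the forget gate rather than through repeated matrix multiplication, gradients can flow across many time steps, which is the mitigation of gradient vanishing mentioned in the text.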

Review Methodology
In this research work, twenty-five papers related to the topic have been selected. Five prominent databases including SpringerLink, IEEE Xplore, ScienceDirect, Scopus, and Google Scholar have been explored. Six questions describing the overall trend of DOA estimation using deep learning are put forward in the research questions section. Based on the guidelines provided by Kitchenham and Charters [38] (as shown in Figure 5), three main steps have been undertaken to carry out this SLR work, namely, planning, conducting, and reporting, and each step consists of several activities.

Research Questions.
The proposed systematic review aims to investigate how DOA estimation can benefit from the application of DL technology. The research questions are listed below:
RQ1: What is the distribution of publication types related to the topic area available from the databases over the last seven years?
RQ2: Why is DL technology applied to DOA estimation?
RQ3: Which DL techniques are applied to DOA estimation?
RQ4: What are the key aspects of DOA estimation?
RQ5: What evaluation criteria are used for DOA estimation, and how do they perform?
RQ6: What factors affect DOA estimation, and how do they affect it?

Study Identification and Selection.
The search phrases are divided into two groups: DOA estimation and deep learning. The search string combines any term related to signal processing (e.g., "DOA estimation," "source number estimation," "source number enumeration," and "direction of arrival," joined with the operator OR) with any term related to deep learning (e.g., "artificial neural network," "human convolutional neural network," "deep learning," "CNN," "DNN," and "RNN"). The search platforms chosen were the EZAccess Portal (Malaysia Putra University Library Database) and Google Scholar. The former contains many well-known databases, namely, SpringerLink, IEEE Xplore, ScienceDirect, and Scopus. The latter covers a wide range of academic literature, making it easier to search.
Using the above string combination in the databases, a total of 2,499 papers were returned. Table 1 shows the distribution of papers in each database. Duplicated papers were deleted, and a total of 2,444 papers were excluded using the exclusion criteria (not written in English, repetitions, books, inaccessible papers, works not in the shipping area, and papers less relevant to the research direction of the review). During the extraction process, the remaining 55 papers were analyzed. Given the large number of papers, we analyzed their abstracts and selected the 25 most relevant papers for review.

Analysis of Current Research
This section introduces the basic information of the related papers and provides the answer to RQ1 based on the following findings: (1) Figure 6 shows the number of related articles from 2015 to March 2021. As can be seen, since 2017, the use of deep learning in DOA estimation studies has become increasingly popular. (2) The number of journal publications from each country is presented in Figure 7. Among them, papers from China account for the vast majority, with a total of 17 papers, showing that China attaches the most importance to this research direction. This is followed by two papers each from India, the USA, and Germany, and one each from Japan and Turkey.
Figure 4: LSTM unit structure.
(3) Table 2 shows our statistics on the number of citations and publication types for the 25 related papers. As can be seen from the table, 7 papers were published in conferences, and the other 18 papers were published in journals. We included the 7 conference papers because there were not enough journal papers to select from, as the application of deep learning in DOA estimation is not yet widespread. (4) Citation count is a reference indicator for published papers. The higher the citation count, the more valuable the work is supposed to be. Table 3 provides statistics on the number of times the relevant papers have been cited. It can be seen from the table that there are 14 papers with a citation count less than 5, 2 papers with a citation count of at least 5 and less than 10, and 9 papers with a citation count of at least 10. Thus, nearly half of the papers (11 of 25) have a citation count of at least 5. In addition, it should be noted that some papers have a zero citation count due to their short publication period. Among them, the work by Huang et al. [57] has been cited 321 times; the work by Chakrabarty and Habets [61] has been cited 114 times; the work by Liu et al. [58] has been cited 83 times; and the work by Chakrabarty and Habets [54] has been cited 70 times.

Deep Learning in DOA Estimation
Research on the related papers shows that different DL models, frameworks, and algorithms are applied to DOA estimation. This section answers the research questions RQ2, RQ3, and RQ4. To this end, the advantages of DL applied to DOA estimation compared to traditional DOA estimation methods are emphasized. Then, we introduce the various DL techniques used in DOA estimation. Finally, we summarize the various scenarios in DOA estimation.

Figure 5: A systematic literature review process [38]. The figure groups the following activities under three phases (planning, conducting, and reporting): background searching for current and upcoming reviews; clear definition of the research questions; definition of search terms; selection of database; definition of inclusion and exclusion criteria; identification of research papers; preliminary selection of the papers; application of paper selection criteria; data management; analysis of the research situation; evaluation of neural network models; analysis of variables; description of the key aspects of DOA estimation; final report of the SLR.

Purpose of Using DL.
To answer RQ2, we analyzed the disadvantages of the traditional DOA estimation methods and the advantages of the DL method. Traditional DOA estimation algorithms are generally limited by many factors. For example, received coherent signals cause the signal subspace and the noise subspace to leak into each other, which reduces the accuracy of many classical subspace-based DOA estimation algorithms. In addition, the resolution of the DOA estimation algorithm is limited by the physical aperture of the array, and the maximum number of resolvable targets is limited by the number of elements. Besides, in practical applications, the number of signal sources is often unknown, and most DOA estimation methods need to know the number of signal sources in advance. Otherwise, the estimates of the signal subspace and the noise subspace will be inaccurate, which affects their orthogonality and ultimately the DOA estimation accuracy. Compared with traditional methods, the DL method converts DOA estimation into a pattern recognition problem. DOA estimation is carried out by extracting features from the signal data, which overcomes the disadvantages of the traditional DOA estimation algorithms and improves the accuracy of DOA estimation.

Deep Learning Techniques.
To answer RQ3, the DL techniques used in the selected papers were reviewed. Table 4 summarizes the array model, DL model, activation function, number of network layers, and description of the network structure used in the considered papers. The information given in Figures 8-10 provides the answer to RQ3. Figure 8 summarizes the array models used in DOA estimation. It can be seen from the figure that the uniform linear array (ULA) is used the most (81%), followed by the uniform circular array (UCA) (14%), and SMA (5%) is also used in speech DOA estimation. Figure 9 shows the distribution of DL techniques applied in the selected papers. The most used DL model is CNN (40%), followed by DNN (36%). Some combination networks (12%) are also used, such as CNN-RNN, CNN-LSTM, and DNN-SVM. There are also some rarely used network models (12%), such as DFN, RNN, and SVM. Figure 10 summarizes the activation functions used in the DL models for DOA estimation. It can be seen from the figure that the most used activation function is ReLU (57%), followed by Sigmoid (19%), Tanh (10%), and others (14%).
In summary, ULA is the most frequently used array model by researchers; CNN and DNN are the most frequently used DL models; and ReLU is the most commonly used activation function.

DOA Estimation in Signal Processing.
DOA estimation is an important task in signal processing, and it has a wide range of applications in fields such as radar and sonar. Xiang et al. [51] analyzed the unknown multipath signal and concluded that it severely distorts the phase characteristic distribution of the desired signal. The designed supervised deep neural network is used for phase enhancement, thereby effectively reducing the phase distortion. Verification on real data shows that this method effectively improves DOA estimation. Goodman et al. [52] evaluated two new techniques for estimating the direction of arrival of RF sources: constrained integer optimization and deep learning. They found that deep learning is more robust to significant calibration errors. To adapt DOA estimation to the urban environment, Shi et al. [42] proposed a complex-valued convolutional neural network (CCNN). Experiments show that CCNN converges faster than CNN and achieves higher DOA estimation accuracy. Xiang et al. [47] proposed a supervised CNN phase enhancement model, which can reduce phase distortion by enhancing the phase characteristics. Cong et al. [46] proposed a DNN-based DOA estimation framework that includes an autoencoder, a feedforward network, a network parameter database, and a collection of directed acyclic graph networks (DAGN). Among them, the autoencoder acts as a noise filter, and each subnet of the DAGN is composed of a convolutional neural network (CNN) and two bidirectional long short-term memory (BiLSTM) networks. Simulation shows that the DOA estimation performance of this network is better than that of the traditional subspace algorithm. Yao et al. [45] proposed a DOA estimation model based on a recurrent neural network. With the help of Toeplitz matrix reconstruction, the model can estimate DOA for signals with unknown signal sources. However, this model does not perform well in an environment with a low signal-to-noise ratio and colored noise. Elbir [44] designed multiple CNNs such that each CNN is dedicated to one angular subregion and learns the multiple signal classification (MUSIC) spectrum of that subregion. This method reduces the amount of calculation and improves the accuracy of DOA estimation. Kase et al. [59] designed a stacked DNN with multiple single-layer neural networks. The lower triangular part of the correlation matrix of the received signal vector is used as input to train the DNN. The simulation results show that the DNN designed for a specific scenario has good DOA estimation performance. Liu et al. [39] designed multiple CNNs based on the number of array elements and used covariance matrices split into real and imaginary parts for training. After learning from a large amount of data, this method can effectively identify the direction of underwater acoustic signals. Liu et al. [58] proposed a DNN framework, which consists of a multitask autoencoder and a series of parallel multilayer classifiers. The encoder and the classifiers are trained on different data sets. The function of the autoencoder is to decompose the input into multiple components in different spatial subregions. Simulation shows that this method copes well with array imperfections, but in practical applications, it faces a significant challenge as it requires a large amount of labelled data for training. Chen et al. [43] proposed a DNN framework for DOA estimation of radio waves. The network is divided into a detection network and a DOA estimation network. For the detection network, the search area of the antenna array is divided into several sectors, each of which corresponds to a DOA estimation network. The detection network detects the signal radiated from each sector. According to the detection result, one or more DOA estimation networks can be activated for DOA estimation. Simulation shows that, compared with the traditional method, this method streamlines the calculation, improves the estimation accuracy, and has excellent generalization ability. Huang et al. [57] proposed a novel DNN framework for super-resolution DOA estimation and channel estimation through offline learning and online learning. Offline learning uses simulated data for training under different channel conditions, and online learning obtains the corresponding output data based on the current input data. Experiments have confirmed that methods based on deep learning can achieve better DOA estimation than traditional methods. Xiao et al. [41] proposed the DeepFPC network structure, which is similar to the deep residual network. DeepFPC has high sparse signal recovery performance and good DOA estimation performance under low SNR.

Table 2 (recovered rows; the publication-type column is not recoverable): number of citations of the related papers.
(reference not recoverable): 3
Shi et al. [42]: 0
Chen et al. [43]: 0
Elbir [44]: 12
Yao et al. [45]: 2
Cong et al. [46]: 0
Xiang et al. [47]: 1
Zhu et al. [48]: 0
Rogers et al. [49]: 0
Yang et al. [50]: 6
Xiang et al. [51]: 6
Goodman et al. [52]: 0
Fu et al. [53]: 2
Chakrabarty and Habets [54]: 70
Wajid et al. [55]: 3
Pan et al. [56]: 3
Huang et al. [57]: 321
Li et al. [31]: 15
Liu et al. [58]: 83
Kase et al. [59]: 13
Wang et al. [60]: 12
Chakrabarty and Habets [61]: 114
Zheng et al. [62]: 10
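Several of the networks summarized above take the array covariance (correlation) matrix as input, either split into real and imaginary parts or reduced to its lower triangular part. The sketch below shows one plausible way to build such real-valued inputs from complex snapshots. The function names are hypothetical, and excluding the diagonal from the lower-triangular features is an assumption, since the text does not specify it.

```python
import numpy as np

def sample_covariance(snapshots):
    """Sample covariance of an (elements x snapshots) complex data matrix."""
    return snapshots @ snapshots.conj().T / snapshots.shape[1]

def covariance_to_channels(cov):
    """Stack real and imaginary parts as a 2-channel real tensor of shape (2, M, M)."""
    return np.stack([cov.real, cov.imag], axis=0)

def lower_triangle_features(cov):
    """Real feature vector from the strictly lower-triangular entries
    (real parts followed by imaginary parts). Dropping the diagonal is an assumption."""
    rows, cols = np.tril_indices(cov.shape[0], k=-1)
    entries = cov[rows, cols]
    return np.concatenate([entries.real, entries.imag])

rng = np.random.default_rng(0)
snapshots = rng.standard_normal((4, 100)) + 1j * rng.standard_normal((4, 100))
cov = sample_covariance(snapshots)
cnn_input = covariance_to_channels(cov)    # e.g., input to a CNN
dnn_input = lower_triangle_features(cov)   # e.g., input to a fully connected DNN
print(cnn_input.shape, dnn_input.shape)
```

Because the covariance matrix is Hermitian, the lower triangle already carries all off-diagonal information, so the vector form halves the input size relative to the full two-channel matrix.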

Speech DOA Estimation.
Speech DOA estimation can be applied to distant automatic speech recognition. Varanasi et al. [40] proposed a network architecture for DOA estimation by discussing the azimuth angle and source height information present in the amplitude and phase characteristics of spherical harmonics. They also extended the DOA estimation method to a dense DOA search grid. Training and testing were performed using data sets of simulated and real environments, respectively, and performance evaluation showed that DOA estimation was improved even in noisy and reverberant environments. Fu et al. [53] proposed a new blind DOA estimation method that uses the 2D convolution nonnegative matrix factorization method to generate a new array signal to estimate the azimuth angle of the reverberation signal. Wajid et al. [55] proposed using a recurrent neural network (RNN) model to learn features similar to those used in DAS beamforming. The results show that the DOA estimation result based on the RNN is better than that of DAS beamforming. Li et al. [63] developed a supervised learning algorithm combining a CNN and a long short-term memory (LSTM) network for DOA estimation. Retraining the model with new data makes the method robust to noise and reverberation and allows it to adapt quickly to new microphone arrays. Chakrabarty and Habets [54] proposed using a supervised CNN to estimate the DOA of multiple speakers. This method formulates multispeaker DOA estimation as a multiclass multilabel classification problem, in which each class is treated as a separate binary classification problem. This method can accurately locate the speaker in a dynamic acoustic scene. Zheng et al. [62] proposed using different values of SNR and noise to train DNNs, which achieved higher DOA accuracy at low SNR and improved the intersensor data ratio (ISDR) performance of a single acoustic vector sensor (AVS) in a noisy environment. Wang et al. [60] proposed using acoustic vector sensors (AVSs) to estimate the DOA of multiple voice sources through clustering of data ratios between sensors. This method designs a connection between a DNN and an SVM. Using a soft mask learner, the time-frequency points (TD-TFP) dominated by the target speech can be extracted under different noise and reverberation conditions, thus improving the estimation performance. Chakrabarty and Habets [61] proposed using a CNN as a classifier for wideband DOA estimation. Their method uses synthesized noise signals to train the CNN. It directly feeds the phase component of the short-time Fourier transform coefficients of the received microphone signals into the CNN and performs wideband DOA estimation. Experimental evaluation shows that the method has good robustness to noise and small disturbances.
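The multilabel classification view of multispeaker DOA estimation mentioned above can be illustrated with a small label-encoding sketch. It assumes a discretized angle grid and per-class output probabilities (for example, from sigmoid outputs); the 5-degree grid, the 0.5 threshold, and the function names are assumptions for illustration and are not taken from [54].

```python
import numpy as np

def encode_doas(angles_deg, grid_deg):
    """Multi-hot training label: 1 at the grid class nearest to each source angle."""
    label = np.zeros(len(grid_deg))
    for angle in angles_deg:
        label[np.argmin(np.abs(grid_deg - angle))] = 1.0
    return label

def decode_doas(probabilities, grid_deg, threshold=0.5):
    """Treat every angle class as an independent binary decision."""
    return [float(a) for a in grid_deg[probabilities >= threshold]]

grid = np.arange(0.0, 181.0, 5.0)          # 37 angle classes, 0 to 180 degrees
label = encode_doas([32.0, 90.0], grid)    # two simultaneous speakers
print(decode_doas(label, grid))            # [30.0, 90.0]
```

Because each class is decided independently, the number of active sources does not have to be fixed in advance; it follows from how many class probabilities exceed the threshold.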

Signal Source Number Estimation.
Most existing DOA estimation algorithms require prior knowledge of the number of signal sources, so estimating the number of signal sources is the primary task of DOA estimation. If the estimated number differs from the actual one, the DOA estimation will be affected. Pan et al. [56] proposed a source number enumeration model (M-UCA) for a UCA with M antennas. This model can estimate up to M − 1 signal sources by extracting features from the instantaneous phase of the array signal and then using an SVM to classify signals with different numbers of sources. Yang et al. [50] proposed using a regression network (ERNet) and a classification network (ECNet) for source number detection. The signal's covariance matrix is taken as the input and the number of signal sources as the data label for training. This data-driven method can automatically learn the threshold used to separate the signal and noise eigenvalues and, unlike traditional methods, does not rely on a Gaussian assumption in its derivation. Simulation experiments have verified the effectiveness of this method. Rogers et al. [49] designed a 15-layer deep learning network with parametric rectified linear units, which uses eigenvalues and spatially smoothed covariance matrix entries as inputs to estimate the number of sources of narrowband signals. Although this literature survey is preliminary, it suggests that deep learning can offer better source number estimation capabilities.
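As a rough illustration of the data-driven setup described above, in which a covariance matrix is the network input and the source count is the label, the following sketch builds one training pair from simulated snapshots. The half-wavelength uniform linear array, the noise level, the feature layout (diagonal plus real and imaginary parts of the upper triangle), and the function names are assumptions made here; the reviewed papers may arrange their inputs differently.

```python
import cmath
import math
import random


def sample_covariance(snapshots):
    """R = (1/T) * sum_t x(t) x(t)^H for T length-M complex snapshots."""
    T = len(snapshots)
    M = len(snapshots[0])
    return [[sum(x[i] * x[j].conjugate() for x in snapshots) / T
             for j in range(M)] for i in range(M)]


def covariance_features(R):
    """Flatten a Hermitian M x M matrix into M*M real-valued features."""
    M = len(R)
    feats = [R[i][i].real for i in range(M)]       # real diagonal
    for i in range(M):
        for j in range(i + 1, M):                  # upper triangle only
            feats.extend([R[i][j].real, R[i][j].imag])
    return feats


# Simulate 2 sources on a 4-antenna array (all values are assumptions).
rng = random.Random(0)
M, T, angles_deg = 4, 200, [-10.0, 25.0]
snapshots = []
for _ in range(T):
    x = [0j] * M
    for theta in angles_deg:
        s = complex(rng.gauss(0, 1), rng.gauss(0, 1))   # source symbol
        step = -math.pi * math.sin(math.radians(theta))
        for k in range(M):
            x[k] += s * cmath.exp(1j * step * k)        # steering phase
    snapshots.append([v + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1))
                      for v in x])

R = sample_covariance(snapshots)
features, label = covariance_features(R), len(angles_deg)
print(len(features), label)  # prints 16 2
```

Because the covariance matrix is Hermitian, the lower triangle carries no extra information, so M × M real numbers are enough to represent it.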

Performance Metrics and Explanatory Variables
This section aims to answer the research questions RQ5 and RQ6. For this purpose, the performance metrics of the related papers are counted, and the factors affecting DOA estimation are analyzed.

Performance Metrics.
Evaluating the performance of the DL model is important for verifying the quality of the DOA estimation algorithm. To answer RQ5, we summarized the performance of the DL models in the considered papers. As shown in Table 5, several criteria are used to evaluate the performance of the DL models. Figure 11 helps answer RQ5 by showing the distribution of the evaluation criteria used by researchers. The most commonly used evaluation criteria are root-mean-squared error (RMSE), accuracy (A), and mean absolute error (MAE), accounting for 44%, 24%, and 16%, respectively. In addition, some evaluation criteria are used less often, such as mean squared error (MSE), gross error (GE), mean error (ME), and average accuracy (AA). The following is a brief description of the frequently used criteria. The root-mean-squared error measures the deviation between the observed value and the true value. The mean absolute error avoids the mutual cancellation of positive and negative errors, so it reflects the actual size of the estimation error. We assume: (i) y = true value; (ii) h(x) = observed value; and (iii) m = number of observations. Then the expressions of RMSE and MAE are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(h(x_i) - y_i\right)^{2}},$$

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|h(x_i) - y_i\right|.$$

Accuracy is the proportion of samples that are predicted correctly. It can be expressed as follows:

$$A = \frac{\text{number of correctly predicted samples}}{\text{total number of samples}}.$$

The gross error is an error other than a random error or systematic error. The mean error is the average of all errors in a group. The average accuracy refers to the average of the accuracy rates of all categories.
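The three most frequently used criteria described above can be computed in a few lines. The sketch below follows the stated notation (y for the true value, h(x) for the observed value, m for the number of observations). The tolerance-based notion of a "correct" estimate in `accuracy` and the sample angles are assumptions for illustration, since each reviewed paper defines correctness in its own way.

```python
import math


def rmse(y_true, h_x):
    """Root-mean-squared error between true and observed values."""
    m = len(y_true)
    return math.sqrt(sum((h - y) ** 2 for y, h in zip(y_true, h_x)) / m)


def mae(y_true, h_x):
    """Mean absolute error; signed errors cannot cancel each other."""
    m = len(y_true)
    return sum(abs(h - y) for y, h in zip(y_true, h_x)) / m


def accuracy(y_true, h_x, tol=0.0):
    """Fraction of estimates within `tol` of the true value.

    The tolerance is an assumption: with tol=0.0 this is the usual
    classification accuracy on discrete angle classes.
    """
    hits = sum(1 for y, h in zip(y_true, h_x) if abs(h - y) <= tol)
    return hits / len(y_true)


# Hypothetical true and estimated DOAs in degrees.
true_doa = [10.0, -20.0, 35.0, 0.0]
est_doa = [11.0, -20.0, 33.0, 0.0]

print(mae(true_doa, est_doa))                 # prints 0.75
print(accuracy(true_doa, est_doa, tol=1.0))   # prints 0.75
print(rmse(true_doa, est_doa))                # sqrt(1.25), about 1.118
```

RMSE weights the single 2° error more heavily than MAE does, which is why the two values differ on the same data.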
Comparing the methods, Fu et al. [53] achieved the best performance when RMSE is used as the evaluation criterion, with an RMSE of 0. When A is used as the evaluation criterion, Yang et al. [50] achieved the best performance, with an A of 100%. When MAE is used as the evaluation criterion, Chakrabarty and Habets [54] achieved the best performance, with an MAE of 0.6. However, the performance of the different methods is influenced by various factors, which leads to performance variations. The factors affecting DOA estimation are described in the next section.

Factors Affecting the DOA Estimation.
The result of DOA estimation is related to the incident signal source and the environment in which it is applied. This section aims to answer RQ6, so we first count the various factors that affect DOA estimation and then analyze how each factor influences it. The DL method of DOA estimation is affected not only by the DL model but also by the data set. The data set is determined by the characteristics of the incident signal. Table 6 counts the various factors affecting DOA estimation in the papers reviewed, including signal-to-noise ratio (SNR), as well as the number of snapshots, antennas, and signal sources. The following is an analysis of how each factor affects DOA estimation.
The Impact of SNR on DOA Estimation.
SNR directly affects the performance of super-resolution DOA estimation algorithms. As shown in Table 6, increasing the SNR from −5 dB to 5 dB in [51] reduces the RMSE from 0.08 to 0.02. Increasing the SNR from −5 dB to 5 dB in [53] reduces the RMSE from 3.3 to 0.3, while the RMSE drops from 1.3 to 0.28 when the SNR increases from −10 dB to 10 dB in [43]. This relationship between SNR and RMSE shows that the DOA estimation performance of an algorithm improves as the signal's SNR increases. Therefore, improving the DOA estimation performance under low SNR conditions is the primary task of high-resolution DOA estimation algorithms.
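When training or test data are simulated at a prescribed SNR, as in the comparisons above, the noise level has to be derived from the SNR in decibels. The following sketch shows one common way to do this for complex-valued samples. The circular complex Gaussian noise model, the test signal, and the function names are assumptions, not a procedure taken from the reviewed papers.

```python
import math
import random


def noise_sigma(signal_power, snr_db):
    """Per-component std of complex Gaussian noise for a target SNR.

    SNR_dB = 10 * log10(P_signal / P_noise); the noise power is split
    equally between the real and imaginary parts.
    """
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return math.sqrt(noise_power / 2.0)


def add_noise(signal, snr_db, rng):
    """Return a noisy copy of a complex signal at the requested SNR."""
    power = sum(abs(v) ** 2 for v in signal) / len(signal)
    sigma = noise_sigma(power, snr_db)
    return [v + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for v in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB computed from the clean and noisy signals."""
    p_signal = sum(abs(c) ** 2 for c in clean)
    p_noise = sum(abs(n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(p_signal / p_noise)


rng = random.Random(0)
# Hypothetical unit-power complex exponential as the clean signal.
clean = [complex(math.cos(0.1 * t), math.sin(0.1 * t)) for t in range(20000)]
for snr_db in (-5.0, 5.0):
    noisy = add_noise(clean, snr_db, rng)
    print(snr_db, round(measured_snr_db(clean, noisy), 2))
```

A 10 dB increase in SNR corresponds to a tenfold reduction in noise power, which gives a sense of scale for the RMSE changes reported in Table 6.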

The Impact of the Number of Snapshots on DOA Estimation.
The number of snapshots is defined differently depending on the domain: in the time domain, it is the number of sampling points; in the frequency domain, it is the number of time subsegments of the discrete Fourier transform. As shown in Table 7, if the number of snapshots in [45] increases from 20 to 100, the MAE drops from 1.6 to 0.1. Increasing the number of snapshots from 50 to 400 in [46] reduces the RMSE from 0.4 to 0.1. Increasing the number of snapshots from 10 to 1,000 in [50] increases the accuracy from 91% to 100%. This relationship between the number of snapshots and the evaluation criteria shows that the DOA estimation performance improves as the number of snapshots increases.
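The trend described above can be reproduced qualitatively with a small simulation. Note that the estimator below is a classical phase-difference estimator for a single source, swapped in here for brevity; it is not one of the DL methods reviewed. The half-wavelength uniform linear array, the 0 dB SNR, the 20° source angle, and all function names are likewise assumptions.

```python
import cmath
import math
import random


def simulate(theta_deg, n_antennas, n_snapshots, snr_db, rng):
    """Snapshots of one unit-power source on a half-wavelength ULA."""
    step = -math.pi * math.sin(math.radians(theta_deg))
    sigma = math.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)
    snaps = []
    for _ in range(n_snapshots):
        s = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))  # random phase
        snaps.append([s * cmath.exp(1j * step * k)
                      + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
                      for k in range(n_antennas)])
    return snaps


def estimate_doa(snaps):
    """Estimate the DOA from the average adjacent-antenna phase shift."""
    r = sum(x[k + 1] * x[k].conjugate()
            for x in snaps for k in range(len(x) - 1))
    sin_theta = max(-1.0, min(1.0, -cmath.phase(r) / math.pi))
    return math.degrees(math.asin(sin_theta))


def rmse_for(n_snapshots, trials=50, theta_deg=20.0, seed=0):
    """RMSE in degrees over repeated noisy trials."""
    rng = random.Random(seed)
    errors = [estimate_doa(simulate(theta_deg, 8, n_snapshots, 0.0, rng))
              - theta_deg for _ in range(trials)]
    return math.sqrt(sum(e * e for e in errors) / trials)


for n in (20, 100, 400):
    print(n, round(rmse_for(n), 3))
```

Averaging over more snapshots suppresses the noise in the correlation estimate, so the RMSE is expected to shrink as the snapshot count grows, in line with the pattern reported in Table 7.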

The Impact of the Number of Antennas and the Number of Signal Sources on DOA Estimation.
The number of antennas in the array and the number of sources of the incident signal also affect the DOA estimation. Table 8 shows the relationship between the ratio of the number of signal sources to the number of antennas and the evaluation criteria. If the ratio in [41] increases from 1/20 to 5/20, the MAE increases from 0.35 to 1.02. In [51], if the ratio increases from 1/20 to 1/10, the accuracy drops from 100% to 97%; if the ratio in [44] increases from 1/8 to 3/8, the RMSE increases from 0.01 to 0.018. This relationship shows that the DOA estimation performance degrades as the ratio increases. That is to say, the smaller the number of signal sources and the greater the number of antennas, the better the DOA estimation.

Conclusions
Deep learning (DL) has been successfully applied in many fields due to its powerful capabilities. Therefore, this article presents a systematic literature review of the main papers on DOA estimation using DL technology. This research first conducted a brief analysis of the 25 selected papers, including the type of publication, the number of citations, and the country of origin. Then, a systematic analysis of the DL techniques used in DOA estimation was presented, including the purpose of using DL in DOA estimation, various DL models, and various DOA estimation scenarios. Finally, the DL technology in DOA estimation was evaluated, and various factors affecting DOA estimation were analyzed.
RQ2: Why is DL technology applied to DOA estimation?
DL carries out DOA estimation by extracting features from the signal data, which overcomes the disadvantages of traditional DOA estimation algorithms and improves the accuracy of the DOA estimation.
RQ3: Which DL techniques are applied to DOA estimation?
ULA is the most frequently used array model by researchers; CNN and DNN are the most frequently used DL models; and ReLU is the most commonly used activation function.
RQ4: What are the key aspects of DOA estimation?
(i) DOA estimation is an important task in signal processing, and it has a wide range of applications in fields such as radar and sonar. (ii) Speech DOA estimation can be applied to distant automatic speech recognition. (iii) The estimation of the number of signal sources is the primary task of DOA estimation. If the estimated number differs from the actual one, the DOA estimation will be affected.
RQ5: What evaluation criteria are used for DOA estimation, and how do they perform?
The most commonly used evaluation criteria are root-mean-squared error (RMSE), accuracy (A), and mean absolute error (MAE), accounting for 42%, 27%, and 16%, respectively. Fu et al. [53] achieved the best performance when RMSE is used as the evaluation criterion, with an RMSE of 0. When A is used as the evaluation criterion, Yang et al. [50] achieved the best performance, with an A of 100%. When MAE is used as the evaluation criterion, Chakrabarty and Habets [54] achieved the best performance, with an MAE of 0.6.
RQ6: What factors affect DOA estimation, and how do they affect it?
Various factors affecting DOA estimation in the papers are reviewed, including signal-to-noise ratio (SNR), as well as the number of snapshots, antennas, and signal sources. (i) As the signal's SNR increases, the DOA estimation performance of the algorithm also improves. (ii) The DOA estimation performance improves as the number of snapshots increases. (iii) The smaller the number of signal sources and the greater the number of antennas, the better the DOA estimation.

Final Remarks.
The application of deep learning technology to DOA estimation has achieved good results. This paper reviews the methods used in this research and the results they have achieved. It is hoped that this paper can provide an overview for researchers interested in this field.

Data Availability
The data generated and analyzed during the current study are available in the public domain.

Conflicts of Interest
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.