Signal Property Information-Based Target Detection with Dual-Output Neural Network in Complex Environments

The performance of traditional model-based constant false-alarm rate (CFAR) detection algorithms can suffer in complex environments, particularly in scenarios involving multiple targets (MT) and clutter edges (CE), due to an imprecise estimation of the background noise power level. Furthermore, the fixed threshold mechanism commonly used in single-input single-output neural networks can result in performance degradation when the scene changes. To overcome these challenges and limitations, this paper proposes a novel approach, a single-input dual-output network detector (SIDOND) using data-driven deep neural networks (DNN). One output is used for signal property information (SPI)-based estimation of the detection sufficient statistic, while the other is utilized to establish a dynamic-intelligent threshold mechanism based on the threshold impact factor (TIF), where the TIF is a simplified description of the target and background environment information. Experimental results demonstrate that SIDOND is more robust and performs better than model-based and single-output network detectors. Moreover, a visual explanation technique is employed to explain the working of SIDOND.


Introduction
Target detection is essential to radar signal processing and plays a vital role in many sensing fields. For radar systems, it means deciding whether the radar data represent an echo coming from a target; the presence of a target prompts the system to engage in further processing [1]. However, the robustness of detection algorithms may suffer due to the complexity and dynamic variability of the environment, which can be broadly categorized into three scenarios [2]. The first is the homogeneous background, in which stationary background noise exists throughout the reference window. The second is the clutter edge model, which describes the transition areas between different background regions. The third scenario is multiple targets, which represents two or more spatially close targets in the detection window.
Radar target detection can be achieved by either model-based or data-driven detectors: the former employ statistical models to build a likelihood-ratio test (LRT), while the latter transform the task of target detection into a classification problem. According to the Neyman-Pearson criterion, the model-based constant false-alarm rate (CFAR) technique can maintain a constant probability of false alarm (P_fa) while maximizing the probability of detection (P_d); it provides an adaptive detection threshold for the LRT by estimating the background noise power level (BNPL) of the cell under test (CUT) using reference cells adjacent to the CUT. According to the BNPL estimation strategy, CFAR algorithms can be divided into three categories: mean-level (ML), ordered-statistics (OS), and adaptive CFAR. The ML CFAR algorithms, such as cell-averaging CFAR (CA-CFAR) [3], smallest-of CFAR (SO-CFAR) [4], and greatest-of CFAR (GO-CFAR) [5], estimate the BNPL by weighted averaging of the leading, lagging, or entire reference window samples.
Furthermore, the proposed method exploits all the available information in the detection window, including target echoes, interfering echoes, clutter, and other relevant data, to determine the optimal threshold for the SPI-based detector. The TIF is a compact representation of the significant information in the detection window. A higher TIF value reflects the presence of more discernible SPI in the detection window, leading to a lower threshold requirement for the detector. By employing a dynamic-intelligent threshold mechanism, the proposed SIDOND can effectively enhance the target detection performance in complex environments.
The main contributions of this paper are as follows:
1. The proposed single-input dual-output network detector (SIDOND) is a promising approach to extracting both the target and background environment features without significantly increasing network capacity and training complexity.
2. The dynamic-intelligent threshold mechanism can adaptively adjust the threshold based on the estimated target and environmental information, which enhances detection performance in complex environments while maintaining a low false-alarm rate.
3. The CNN based on a periodic activation function and a particular initialization strategy can effectively avoid the gradient-vanishing problem of deep networks, which improves the convergence speed and network performance in the target detection task.
The remaining sections of this paper are organized as follows. In Section 2, the model of target echoes is introduced, and the target detection task is formulated. Section 3 analyzes the methodology and structure of the proposed SIDOND. Section 4 presents simulation results under various conditions, including multiple targets, clutter edges, and complex environments. Finally, Section 5 provides conclusions.
Some symbols used in this paper are explained as follows. Boldface characters represent vectors or matrices. N(µ, σ²) denotes a normal distribution with mean µ and variance σ². ⊙ stands for the Hadamard product, ⊗ stands for the convolution operation, and (·)^T represents the transpose.

Signal Model
The echo signal of a target in a radar system can be approximately modeled as [1]

x(t) = k A(t − 2R_0/c) exp(j2π(2v/λ)t) + n(t),

where k contains all of the factors related to amplitude in the radar range equation, R_0 is the distance from the target to the radar, and A(t) is the baseband transmit waveform. v is the target radial velocity, c represents the propagation speed of electromagnetic waves, and λ is the wavelength. n(t) denotes clutter and noise. The signal waveform A(t) is the critical information feature for target detection. Without loss of generality, the linear frequency modulated (LFM) signal is used as the transmit waveform, which can be stated as

A(t) = exp(jπβt²/τ), 0 ≤ t ≤ τ,

where β is the bandwidth and τ is the pulse width. The received signal is sampled at frequency F_s, with the corresponding sampling interval T_s = 1/F_s. In the received data x, the target affects the q-th to the (q + L − 1)-th samples, where q = 2R_0 F_s/c and L = τF_s. Then, the echo can be stated as x_q, x_{q+1}, x_{q+2}, ..., x_{q+L−1}, where

x_{q+l} = k A_l exp(f_v l) + n_{q+l}, l = 0, 1, ..., L − 1,

with A_l = exp(jπβ(T_s l − 2R_0/c)²/τ) and f_v = j2π(2v/λ)T_s. Then, the radar echo data segment x_q^L = [x_q, x_{q+1}, x_{q+2}, ..., x_{q+L−1}] related to the target can be simplified as in [19]. It should be stated that the detection window length must be set to the waveform length L in order to acquire the complete information of the transmit waveform. In complex environments, each detection window contains not only the target echo but also interfering echoes, clutter edges, and noise. Based on the detection principle, two hypotheses are defined for the target echo: the null hypothesis H_0 and the non-null hypothesis H_1 [1]. H_1 means that the detection window contains a complete transmit waveform. Figure 1 shows a schematic diagram of these hypotheses. Figure 1. The sensor echo model.
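As a concrete illustration of the sampled waveform above, the following sketch generates a baseband LFM pulse in NumPy; the parameter values for β, τ, and F_s are illustrative placeholders, not the settings of Table 1.

```python
import numpy as np

def lfm_waveform(beta=10e6, tau=10e-6, fs=20e6):
    """Sampled baseband LFM pulse A(t) = exp(j*pi*beta*t^2/tau), 0 <= t < tau.

    beta: sweep bandwidth (Hz), tau: pulse width (s), fs: sampling rate (Hz).
    The number of samples L = tau * fs matches the detection window length
    discussed in the text.
    """
    L = int(round(tau * fs))           # samples covered by the pulse
    t = np.arange(L) / fs
    return np.exp(1j * np.pi * beta * t**2 / tau)

A = lfm_waveform()                     # here L = 200 samples
# An LFM pulse has a constant envelope: |A[l]| = 1 for every sample.
```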
In Figure 1, the translation symbol Ξ(s, n) represents a shift of the vector s to the left (n < 0) or right (n > 0) by n sampling points. Then, the detection problem can be written as the binary hypothesis test in (5), where N_I is the number of interfering targets, n_C represents the sudden change of clutter power, I and I_C are the shift samples at the edges of the interference and clutter, and −L ≤ I, I_C ≤ L − 1. It can be seen that the SPI exists in the detection window.

Posterior Probability Detector
According to the Bayesian detection criteria, the problem in (5) can be solved by constructing a likelihood ratio detector, where f(x|H_1) and f(x|H_0) are the probability densities of x under H_1 and H_0, respectively, Λ(x) is the likelihood ratio, and P(H_0) and P(H_1) represent the prior probabilities of H_0 and H_1. According to Bayes' theorem, (6) can be recast in terms of the posterior probabilities P(H_1|x) and P(H_0|x), with P(H_1|x) + P(H_0|x) = 1. The decision rule (7) can then be rewritten as (8), where the posterior probability is the sufficient statistic and Λ'(·) is the map between the posterior probability and x. Theoretically, the detection threshold η can be found from the false-alarm constraint in (9). In radar systems, the estimation of the detection threshold η is frequently accomplished using samples in reference cells, which are independent and identically distributed with the noise in the CUT. Nevertheless, in complex environments, it is crucial to have a threshold that is dynamic and adaptive to the changing conditions. Appropriate threshold selection involves a trade-off between maintaining a constant false-alarm rate and maximizing the probability of target detection. To address these challenges, a dual-output network structure is utilized to dynamically adjust the threshold in complex environments, which provides an effective way to estimate the threshold and enhance target detection performance.
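The equivalence between thresholding the likelihood ratio and thresholding the posterior probability can be sketched with a toy mapping; the function name and the prior values below are illustrative, not from the paper.

```python
def posterior_from_lrt(lam, p1):
    """Map a likelihood ratio Lambda(x) to the posterior P(H1|x) via
    Bayes' theorem: P(H1|x) = Lambda*P(H1) / (Lambda*P(H1) + P(H0)).

    The mapping is monotone increasing in Lambda, so comparing the
    posterior against a threshold eta is equivalent to comparing the
    likelihood ratio against a correspondingly transformed threshold.
    """
    p0 = 1.0 - p1
    return lam * p1 / (lam * p1 + p0)
```

With equal priors, Λ(x) = 1 maps to a posterior of exactly 0.5, and larger likelihood ratios map monotonically toward 1.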

Target Detection Using the SIDOND
The proposed SIDOND is presented in Figure 2, which depicts its architecture and flow graph. The raw data are first pre-processed and then input into the PBCN, a CNN based on the periodic activation function (PAF), to extract the intrinsic features of the SPI. The TIF is then obtained through a TIF estimator based on a fully connected network (FCN), called the TIFEFCN, which takes both the combined features and the pre-processed data as its input. Additionally, the SPI feature is fed to the detection sufficient statistic estimator based on an FCN (DSSEFCN). Finally, the estimated sufficient statistic is compared with the threshold η, which is determined from the TIF and the predefined P_fa, to complete the detection task. It should be noted that, unlike conventional classification algorithms aimed solely at achieving high accuracy, the proposed algorithm focuses on maximizing the detection probability while maintaining an approximately constant P_fa.

The PBCN for SIDOND
The primary component of the PBCN is the CNN, which has demonstrated remarkable performance in various fields, including computer vision [20,21], medical diagnosis [22], target recognition [23][24][25], and signal detection. This study selects the CNN as the feature extractor for the SPI, with the PAF used as the activation function. The use of a non-linear activation function is a fundamental aspect of neural network architectures, as it allows the network to model complex non-linear relationships between inputs and outputs; in particular, non-linear activation functions give neural networks their excellent fitting capability. Furthermore, this paper proposes a new CNN initialization scheme based on the PAF, which maintains the distributions of the outputs and inputs of different layers to achieve faster and better convergence while avoiding undesirable situations such as gradient vanishing in a deep CNN.
The convolutional layer that employs the PAF is called the SICLayer, and its design and operation are illustrated in Figure 3. The layer has four parameters: N_o, the dimension of the output data; N_k, the size of the convolution kernel; N_s, the convolution step length; and N_w, the expansion factor, which can be set to a higher value in the first few layers to preserve more comprehensive feature information. The input of the l-th SICLayer is Z^l with dimension N_i × H_i, the output is Ẑ^l with dimension N_o × H_o, and the size of the convolution kernel is 1 × N_k^l, where the required convolution parameters are the weight w^l with dimension N_o × N_k^l × N_i and the bias b^l with dimension N_o × 1. The output of the convolution is then

d^l_{n,h} = Σ_{i=1}^{N_i} Σ_{k=1}^{N_k^l} w^l_{n,k,i} Z^l_{i,(h−1)N_s+k} + b^l_n.

The subscripts specify the position of the element in the raw data; for example, Ẑ^l_{n,:} represents all elements of Ẑ^l whose first index is n. Afterwards, the output of the SICLayer is

Ẑ^l = sin(N_w d^l).

The performance of a network is significantly affected by its initialization [26]. The convolution kernel is represented by a weight matrix w with three dimensions: the input depth H_i, the kernel length N_k, and the output depth H_o.
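A minimal NumPy sketch of one SICLayer forward pass, under the shape conventions stated above. A plain correlation loop is used instead of an optimized convolution, and `siclayer_forward` with its default hyper-parameters is illustrative, not the paper's implementation.

```python
import numpy as np

def siclayer_forward(Z, w, b, Nw=1.0, Ns=1):
    """One SICLayer pass: 1-D convolution followed by sin(Nw * d).

    Shapes follow the text: Z is (Ni, Hi), w is (No, Nk, Ni), b is (No,).
    Ns is the stride and Nw the expansion factor of the periodic activation.
    """
    No, Nk, Ni = w.shape
    Hi = Z.shape[1]
    Ho = (Hi - Nk) // Ns + 1
    out = np.empty((No, Ho))
    for n in range(No):
        for h in range(Ho):
            patch = Z[:, h * Ns:h * Ns + Nk]      # (Ni, Nk) input window
            out[n, h] = np.sum(w[n] * patch.T) + b[n]
    return np.sin(Nw * out)                        # periodic activation
```

Because the activation is a sine, every output value lies in [−1, 1] regardless of the pre-activation magnitude.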

The initialization of w obeys the uniform distribution w ∼ U(−c, c). In other words, the bound c is determined by the depth dimension L_i of the input data, the kernel length N_k, and the expansion factor N_w of the periodic activation function. With such an initialization, the data before the periodic activation function approximately obey the standard normal distribution, and the data after the PAF approximately obey the arcsine distribution. The proof is presented in Appendix A.
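The claimed unit variance before the activation can be checked numerically. The sketch below assumes the first-layer bound c = √(3/(N_k L_i)) derived in Appendix A; the sizes N_k and L_i are arbitrary examples, not network settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
Nk, Li = 7, 16                        # example kernel length and input depth
c = np.sqrt(3.0 / (Nk * Li))          # first-layer bound: w ~ U(-c, c)

# Each pre-activation value d_i is a sum of Nk*Li independent w*z products
# with z ~ N(0, 1); under this initialization Var[d_i] should be close to 1.
trials = 30_000
z = rng.standard_normal((trials, Nk * Li))
w = rng.uniform(-c, c, size=(trials, Nk * Li))
d = np.sum(w * z, axis=1)             # Monte Carlo samples of d_i
```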

The Structure of the DSSEFCN
The FCN is the fundamental unit of the DSSEFCN, and it has been widely applied in neural network development due to its efficacy in addressing classification and regression problems. Accordingly, the FCN is employed to construct the posterior probability estimator in this paper. The configuration of the fully connected layer (FCL) and the DSSEFCN is presented in Figure 4. Assume that the input of the l-th FCL is Z^l with dimension 1 × N_i, the output is Z^{l+1} with dimension 1 × N_o, the weight matrix w^l has dimension N_i × N_o, and the bias b^l has dimension 1 × N_o. With the activation function denoted Γ, the output is

Z^{l+1} = Γ(Z^l w^l + b^l).

The activation function commonly used in the hidden layers of an FCN is the rectified linear unit (ReLU) [27]. To retain more information and features, the LeakyReLU function [28] is used here; it returns x for x ≥ 0 and αx otherwise, where α is a small positive slope. The SoftMax [29] is used as the output-layer activation function. When the input is x of length n, the j-th SoftMax output is

SoftMax(x)_j = exp(x_j) / Σ_{i=1}^n exp(x_i).

The output of the DSSEFCN is the approximation of the sufficient statistic, and the associated expression can be defined as

P_net(H_1|x) = DSSEFCN(PBCN(x)),

where P_net(H_1|x) is the posterior probability approximated by the network. DSSEFCN(·) and PBCN(·) are cascades of multi-layer FCLs and multi-layer SICLayers, respectively.
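For reference, the two activation functions can be sketched in NumPy as follows; the slope value α = 0.01 is a common default, not necessarily the one used in the paper.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """LeakyReLU: identity for x >= 0, alpha * x for x < 0."""
    return np.where(x >= 0, x, alpha * x)

def softmax(x):
    """Numerically stable SoftMax: exp(x_j - max(x)) / sum(exp(x - max(x)));
    subtracting the maximum does not change the result but avoids overflow."""
    e = np.exp(x - np.max(x))
    return e / e.sum()
```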

Dynamic-Intelligent Threshold Mechanism
After obtaining the estimate of the posterior probability, selecting the threshold η is crucial for solving the target detection problem. It can be seen from (9) that η is determined by P_fa, P(H_1|x), and f(x|H_0). Under a Gaussian noise background, the probability density of x obeys an N-dimensional independent joint Gaussian distribution, which can be formulated as

f(x) = Π_{i=1}^N (1/√(2πσ_i²)) exp(−(x_i − µ_i)²/(2σ_i²)),

where σ_i is determined by the noise power and clutter edge. When the transmit waveform is fixed, µ_i is determined by the power of the interfering target signal. Substituting µ = [µ_1, µ_2, ..., µ_N] and σ = [σ_1, σ_2, ..., σ_N], the map to the threshold η can be formulated as in (18). From (18), the threshold η is dynamic, since the variable parameters are {σ, µ}, and its exact mathematical expression is extremely hard to derive because {P(H_1|x), σ, µ} cannot be accurately estimated under complex, changing scenarios.
To address this, the TIF is used to characterize {σ, µ}, and a mechanism that utilizes the TIF to approach the optimal threshold is proposed, which can effectively enhance P_d while ensuring the P_fa requirements. Specifically, a TIFEFCN is employed to estimate the TIF, as shown in Figure 5. The main building blocks of the TIFEFCN are the same as those of the DSSEFCN, which has been discussed in Section 3.2. The extracted features related to the posterior probability and the original data x are fed into this TIFEFCN. The TIF is categorized into a predetermined number of labels that correspond to different signal-to-noise ratio (SNR) or interference-to-noise ratio (INR) intervals. From the estimated TIF and the preset P_fa, the current threshold is determined. Thus far, the fundamental building blocks of the SIDOND architecture have been proposed: the PBCN, DSSEFCN, and TIFEFCN. In order to assess the performance of the proposed SIDOND, a single-input single-output network detector (SISOND) utilizing a PBCN and DSSEFCN is built for comparison purposes.
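Operationally, the dynamic threshold mechanism amounts to a table lookup built offline by Monte Carlo. A minimal sketch follows, with placeholder TIF labels and threshold values that are not taken from the paper; only the qualitative trend (higher TIF, lower threshold) reflects the text.

```python
# Placeholder mapping from (TIF label, preset Pfa) to a threshold on the
# estimated posterior.  In the paper this table is produced by Monte Carlo
# experiments; the numbers below are purely illustrative.
THRESHOLD_TABLE = {
    (0, 1e-4): 0.98,   # low TIF (little discernible SPI) -> high threshold
    (1, 1e-4): 0.90,
    (2, 1e-4): 0.75,   # high TIF (clear SPI) -> a lower threshold suffices
}

def dynamic_threshold(tif_label, pfa=1e-4):
    """Look up the detection threshold for the current TIF estimate."""
    return THRESHOLD_TABLE[(tif_label, pfa)]
```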

Experimental Data
All experiments in this paper are based on simulation, and all signals, unless otherwise stated, are generated by the signal model given in Section 2. The transmit signal is set as an LFM signal, and the specific parameters are shown in Table 1. The pre-processing in Figure 2 includes normalization and splitting the complex values into real and imaginary parts. In the experiments of this paper, P_fa cannot be computed directly, so a Monte Carlo strategy is used to estimate it. The Monte Carlo estimate of the false-alarm rate, P̂_fa, approximately obeys a Gaussian distribution according to the central limit theorem:

P̂_fa ∼ N(P_fa, P_fa(1 − P_fa)/K),

where K is the number of Monte Carlo trials. The false-alarm rate error is then e = P̂_fa − P_fa. Setting a tolerance error E, the probability of meeting the tolerance requirement is

P(|e| < E) = 1 − 2Q(E / √(P_fa(1 − P_fa)/K)),

where Q is the complementary Gaussian cumulative distribution function. Then, the number of trials that satisfies the tolerance with a given probability can be obtained from

K ≥ P_fa(1 − P_fa) (Q⁻¹((1 − P(|e| < E))/2) / E)².

For example, setting P_fa to 0.0001, E to 0.000025, and P(|e| < E) to 90%, the number of Monte Carlo experiments is at least 432,843.
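The trial-count calculation above can be reproduced with the standard normal quantile; `min_trials` is an illustrative helper, not code from the paper.

```python
from math import ceil
from statistics import NormalDist

def min_trials(pfa, tol, prob):
    """Smallest K such that P(|Pfa_hat - Pfa| < tol) >= prob, using the
    Gaussian approximation Pfa_hat ~ N(Pfa, Pfa*(1 - Pfa)/K)."""
    z = NormalDist().inv_cdf(0.5 + prob / 2.0)   # two-sided normal quantile
    return ceil(pfa * (1.0 - pfa) * (z / tol) ** 2)

# Pfa = 1e-4, tolerance 2.5e-5, 90% confidence -> roughly 4.3e5 trials,
# matching the figure quoted in the text.
print(min_trials(1e-4, 2.5e-5, 0.90))
```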
Training data set: A total of 1 × 10⁷ data samples were generated, with an H_1:H_0 ratio of 1:1. To facilitate feature extraction based on the SPI, the interfering target number N_I was set to 1 and the clutter edge n_C was set to 0 for training the PBCN. The SNR or INR followed a uniform distribution on the integer set [−13, 5] dB. When only noise was present, the noise power σ² followed a uniform distribution on the integer set [−5, 13] dB.
Test data set: The details of the test data set will be explained in each respective test section.

The Process of Training the Network
First, the PBCN and DSSEFCN are trained. The binary classification task corresponds to the binary label y ∈ {0, 1}: y = 0 represents H_0, and y = 1 represents H_1. Given x, the label y obeys the Bernoulli distribution

p(y|x) = P(H_1|x)^y P(H_0|x)^{1−y}.

Considering the labels and the output of the neural network, there is, correspondingly,

p_net(y|x) = P_net(H_1|x)^y P_net(H_0|x)^{1−y},

where P_net(H_1|x) + P_net(H_0|x) = 1.
Relative entropy, also known as Kullback-Leibler (KL) divergence or information divergence, is a type of statistical distance. The relative entropy measures the difference between the information entropy of the actual distribution p(y|x) and the cross-entropy of p(y|x) and p_net(y|x), representing the information loss caused by the fitting distribution:

D_KL(p ‖ p_net) = Σ_y p(y|x) log(p(y|x)/p_net(y|x)).

In (25), only p_net(y|x) can be optimized by our algorithm, so the loss function is formulated as the cross-entropy

L(W) = −E[log p_net(y|x)].

When the parameters of the PAF-based CNN and FCN are denoted W, the process of obtaining the best W can be regarded as an optimization problem; in other words, the maximum-likelihood method can be used to train the network. However, it is not feasible to obtain the global optimum of W, since (26) is highly non-convex. Nevertheless, effective gradient descent optimization methods can yield acceptable solutions. Given a training batch consisting of N output data z_1, z_2, ..., z_N and labels y_1, y_2, ..., y_N, the cost function is

J(W) = −(1/N) Σ_{n=1}^N [y_n log P_net(H_1|z_n) + (1 − y_n) log P_net(H_0|z_n)].

The back-propagation algorithm is employed to train the neural network, and the stochastic gradient descent (SGD) algorithm with an initial learning rate of 0.001 and momentum of 0.99 is used for optimization [30]. The learning rate is reduced by four-fifths every ten epochs. The experiments are conducted on TensorFlow-GPU 2.0.
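The batch cost function can be sketched directly in NumPy; the clipping constant is a numerical-stability detail added here, not part of the paper's formulation.

```python
import numpy as np

def bce_cost(p_net, y, eps=1e-12):
    """Batch cross-entropy cost for the Bernoulli labels in the text:
    -(1/N) * sum( y*log(p_net) + (1 - y)*log(1 - p_net) ),
    where p_net is the network's estimate of P(H1 | x) for each sample."""
    p = np.clip(p_net, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```

A maximally uncertain network (p_net = 0.5 everywhere) incurs a cost of log 2 per sample, the entropy of a fair coin.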
Subsequently, the Monte Carlo integration method can be utilized to derive the threshold based on a preset value of P f a .
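A minimal sketch of this step: with sufficient-statistic values computed on H_0-only Monte Carlo data, the threshold is the empirical (1 − P_fa) quantile; the function name is illustrative.

```python
import numpy as np

def threshold_from_h0(stats_h0, pfa):
    """Monte Carlo threshold selection: choose eta so that the fraction of
    H0-only sufficient statistics exceeding it matches the preset Pfa,
    i.e. the empirical (1 - Pfa) quantile of the H0 statistics."""
    return np.quantile(np.asarray(stats_h0), 1.0 - pfa)
```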
The P_d can be formulated as

P̂_d = (1/K) Σ_{k=1}^K I[P_net(H_1|x_k) > η], with x_k generated under H_1,

where I[·] denotes the indicator function. To compare the impact of different network architectures on performance, six models with varying numbers of layers and nodes were generated, and their specific parameters are presented in Table 2. Each model is trained for four epochs on the training set, and a test dataset is generated with the same parameters as the training set to evaluate the algorithm's performance. The average and peak accuracy of each model are shown in Figure 6. Notably, the performance of the PBCN is relatively consistent across different parameter settings, indicating that it is not highly sensitive to the network architecture. Additionally, when the number of layers exceeds six, the algorithm's performance improves only gradually or even deteriorates. Based on these results, the #4 network is selected for subsequent experiments. In practical applications, however, the network size can be adjusted according to hardware and other requirements.
A performance comparison of different activation functions and initialization methods was also conducted, and the results are presented in Figure 7. The PAF-activated network with the proposed novel initialization scheme is observed to converge well, as depicted in Figure 7. The accuracy of the network is evaluated on two related test sets during the training process, and the proposed PAF and initialization method outperforms the other methods on both test sets. These results suggest that the proposed method is effective in improving the performance of the network.
The TIF is quantized into discrete labels corresponding to SNR/INR intervals, as listed in Table 3. These TIF values are used as the training labels for the TIFEFCN. The network is trained with the same training parameters and loss function for 30 epochs to achieve the required accuracy. Multiple sets of data are used to conduct Monte Carlo experiments, from which a mapping table is obtained that shows the relationship among the TIF, the threshold, and P_fa.
As waveform information is a key feature in this study, the algorithm's performance is evaluated when the waveform parameters are changed. This comparison focuses on the bandwidth, sampling frequency, pulse width, and number of pulse sampling points. The model was trained with an equal number of samples, and validation was conducted after 30 epochs. The results, presented in Table 4, indicate that changes in signal bandwidth and sampling frequency do not have a significant effect on the algorithm's performance, whereas an increase in pulse width, with the corresponding increase in sampling points, improves it. This demonstrates the effectiveness of the proposed algorithm in addressing target detection under varying waveform parameters.

Performance Results
This subsection compares the performance of the proposed SIDOND algorithm with several traditional CFAR-based methods, including CA-CFAR [3], SO-CFAR [4], GO-CFAR [5], OS-CFAR [2], VI-CFAR [9], and BVI-CFAR [11], as well as the data-driven SISOND. The traditional algorithms use 20 reference cells and 6 guard cells, except for BVI-CFAR, which uses 32 reference cells. CA-CFAR, SO-CFAR, and GO-CFAR are different mean-level CFAR methods, while OS-CFAR is an ordered-statistics CFAR that uses the 15th-order statistic for background noise estimation. VI-CFAR is an adaptive CFAR method that uses two statistics, the variability index (VI) and the mean ratio (MR), with thresholds set to 4.76 and 1.806, respectively. BVI-CFAR is an adaptive CFAR method based on the VI and Bayesian interference control theory, with the VI and MR thresholds set to 5 and 3. The number of interfering targets and the clutter range partition are set to 4 and 16, a commonly used configuration in the literature. The SISOND is a single-input single-output echo-waveform-based data-driven method. The desired P_fa is set to 0.0001.
To evaluate the performance of the methods, three different scenes are considered: homogeneous background, multiple targets, and complex environment.

Homogeneous Background
The homogeneous background scene is a common scenario in radar applications, and it serves as a benchmark to evaluate the performance of different detection algorithms. To ensure the accuracy of the Monte Carlo experiment, a set of data was generated in a Gaussian white noise environment, with at least 1 × 10⁶ data points at each SNR. Figure 8 presents the results of the experiment in the single-target scenario. Traditional methods such as CA-CFAR, SO-CFAR, GO-CFAR, OS-CFAR, and VI-CFAR exhibit good detection performance, with P_fa maintained around 0.0001. However, the proposed SIDOND achieves the best detection performance among all methods, thanks to its intelligent threshold mechanism. It shows a slight advantage over the data-driven SISOND, indicating the effectiveness of the proposed method.

Multiple Targets Situation
A test dataset was generated that includes one or multiple interfering targets with amplitudes ranging from 1.0 to 1.2 times that of the target under test. The interfering targets were randomly and uniformly distributed in the reference cells. Figures 9 and 10 show that, with the exception of SIDOND, the performance of the detection algorithms deteriorates significantly when multiple interfering targets are present in the reference cells. This degradation is mainly due to the randomly distributed interfering targets in the leading and lagging windows. BVI-CFAR is the most robust of the traditional methods, as it can modify the reference cell model based on the uniformity of the reference cells, thereby changing the Bayesian statistics; under this strategy, it maintains better performance. OS-CFAR estimates the noise power by sorting the power of the reference cells, which effectively avoids the influence of interference. CA-CFAR and GO-CFAR have the worst performance among the traditional methods. SIDOND, however, can identify the interfering targets in the detection window and effectively reduce the performance loss by obtaining an intelligent threshold. SIDOND has better P_d and P_fa maintenance than SISOND, and this advantage increases with the number of interfering targets. The discontinuity observed in Figures 9b and 10b for the P_fa curves is attributed to the Monte Carlo simulation process, where P_fa becomes zero and cannot be represented in exponential coordinates. Additionally, the average performance loss is defined as the average difference of all P_d between the multiple-target and homogeneous-background cases within the range of [−15, 5] dB. Figure 11 displays the average performance loss of SIDOND and SISOND for up to seven interfering targets; the advantages of SIDOND become more apparent as the number of interfering targets increases.
Furthermore, the impact of interference power on algorithm performance is analyzed to comprehensively evaluate the algorithm's robustness. In this experiment, the target SNR is set to −2 dB, and the interference INR is varied from −15 dB to 15 dB. Figure 12 shows that VI-CFAR, SO-CFAR, and BVI-CFAR sacrifice P_fa to strengthen P_d, with BVI-CFAR exceeding the preset P_fa standard by the smallest margin among the three. SIDOND and OS-CFAR have the best robustness: OS-CFAR maintains some degree of P_d even at high INR, while SIDOND suffers almost no loss when the INR is below 5 dB.

Complex Environment
Due to the uneven distribution of clutter in the reference cells, clutter edges often cause false alarms. A set of target detection data is generated based on the principle that clutter edges are evenly distributed in the reference cells and the CNR follows a uniform distribution on [10, 20] dB. Up to four interfering targets appear in the reference cells, with random amplitudes of [1.0, 1.2] times that of the test target. The results are presented in Figure 13. Traditional methods exhibit a significant performance deterioration, especially SO-CFAR, due to an elevated false-alarm probability at the clutter edges. SISOND's performance deteriorates significantly when the target power is relatively high. In contrast, SIDOND, with its dynamic-intelligent threshold mechanism, maintains a high P_d and a low P_fa.

Visualization of the SIDOND
A visual explanation technique called gradient-weighted class activation mapping (Grad-CAM) is utilized to analyze the signal structure extracted by the SIDOND feature extractor, which makes it possible to assess the effectiveness of the feature extraction process. Grad-CAM produces a rough localization map from the gradients of the target output flowing into a convolutional layer of the PBCN. The generated map highlights the critical areas in the image or signal. In other words, the weighted feature map in the PBCN is obtained by back-propagating the gradient of the output category.
The features obtained by Grad-CAM are presented in Figure 14. In the case of an LFM signal, the matched-filter output of the target echo, or of the target and interference echoes, is shown in Figure 15, without considering the noise. Figure 14 illustrates the features in different SICLayers corresponding to the echo signals visualized by Grad-CAM in four scenarios: Target + Noise, Interference + Noise, Noise, and Target + Interference + Noise. It can be found that for the Target + Noise and Interference + Noise scenarios (the first two columns), the deeper the PBCN layers, the more their features resemble samples of the LFM matched-filter output, although with different peak positions. In the figure, the peak position and sinc shape are marked by red circles. In the noise-only scenario, the PBCN extracts no echo waveform-related information. When dealing with the Target + Interference + Noise scenario, the SICLayer appears to sample the aliased sinc function of the target and interference echoes, as illustrated in Figure 15. This implies that the proposed PBCN progressively captures the representation of the target echo.
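For 1-D features such as those in the SICLayers, Grad-CAM reduces to a channel-weighted sum of feature maps. The sketch below is a generic implementation of that reduction, not the exact hooks used for the PBCN.

```python
import numpy as np

def grad_cam_1d(feature_maps, grads):
    """Grad-CAM for 1-D features: weight each feature map by its spatially
    averaged gradient, sum over channels, then keep only positive evidence
    (ReLU).  Both inputs have shape (channels, length); the output has
    shape (length,) and localizes the samples that drove the decision.
    """
    weights = grads.mean(axis=1)                       # one weight per channel
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted channel sum
    return np.maximum(cam, 0.0)                        # ReLU
```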

Computational Complexity Analysis
The computational complexity of an algorithm is a vital metric to gauge its performance [31]. The preprocessing of traditional methods involves matched filtering, whereas SIDOND and SISOND data require normalization. The training of SIDOND involves 30 epochs using an NVIDIA Quadro P4000 GPU with 8GB of memory, which takes around 10 h. The average runtime of a single detection is presented in Table 5. The mean-level CFAR, OS-CFAR, and VI-CFAR exhibit the lowest computational complexity. However, the computational complexity of the BVI-CFAR is comparable to that of the proposed SIDOND. Furthermore, the computational efficiency of SISOND can be enhanced through parallel computing on the GPU. In practical applications, pruning techniques can be utilized to improve computational efficiency [32]. Table 5. Runtime comparison for the processing of each detector.

Conclusions
This paper proposes a novel DNN-based approach to address the problem of target detection in complex scenarios. The proposed method utilizes a single-input dual-output network architecture consisting of a convolutional neural network with a periodic activation function for feature extraction from the waveform's intrinsic structure information. Additionally, two fully connected networks are employed to estimate the sufficient statistic and the threshold impact factor, leading to a dynamic-intelligent threshold detection mechanism. The simulation results validate the efficiency and robustness of the proposed approach in challenging scenarios such as multiple targets, clutter edges, and their superposition. Furthermore, the visualization technique is adopted to demonstrate the effectiveness of the proposed network architecture.

Appendix A
The data, after batch normalization, follow the standard normal distribution, so the input of the first SICLayer is z_i ∼ N(0, 1). In the following derivation, the specific subscripts of the vectors are not considered, since the Hadamard product can replace all the steps of the convolution operation, and the output of the convolutional layer is d_i = z_i w^T. As shown in Figure 3, before the periodic activation function is applied, the data are magnified by N_w; the factor N_w in the initialization (11) is ignored here (i.e., set to 1) for simplicity.
The input of the first periodic activation function is d_i, and its variance is Var[d_i] = Var[z_i w^T] = Var[w^T] Var[z_i] [33]. When w ∼ U(−c, c), Var[w] = c²/3. In the first layer, c = √(3/(N_k L_i)), and the Central Limit Theorem under the weak Lindeberg condition [34,35] gives Var[d_i] = N_k L_i c²/3 = 1. Then d_i ∼ N(0, 1) approximately.
The output of the periodic activation function, Z_o, conforms to the arcsine distribution, which can be shown from D_i ∼ N(0, 1) and Z_o = sin(D_i). Specifically, it needs to be demonstrated that when X ∼ N(0, 1), Y = sin(X) approximately satisfies Y ∼ Arcsin(−1, 1).
Then, Y ∼ Arcsin(−1, 1), and simultaneously Z_o ∼ Arcsin(−1, 1). Next, it only remains to prove that the input D_i before each periodic activation function after the first layer conforms to the standard normal distribution. For the layers after the first, Z_i ∼ Arcsin(−1, 1) and c = √(6/(N_k L_i)). Using the Central Limit Theorem under the weak Lindeberg condition, the variance of d_i is Var[d_i] = N_k L_i Var[w^T] Var[z_i] = N_k L_i (c²/3)(1/2) = 1. (A6) The random variable D_i is thus normally distributed with mean 0 and variance 1. The initialization scheme used in this paper therefore leads to an approximately normal distribution of the data before the activation function and an approximately arcsine distribution of the data after the activation function.