Mitigation of the Effects of Unknown Sea Clutter Statistics by Using Radial Basis Function Network

In this paper, we investigate feasibility of employing Radial Basis Function (RBF) network into non-coherent detection process, for detection of targets embedded in sea clutter of unknown statistics. We particularly have in mind Croatian part of Adriatic Sea, the local sea whose clutter statistic properties are not available in open literature. Performance of the detection process employing proposed RBF network is tested with simulated clutter samples based on real sea clutter data. These data were collected under sea state conditions that represent two thirds of the total wave heights in Adriatic and are chosen to represent unknown clutter statistics due to the fact that no single probability density function equally well fits amplitude distribution of the range bins under test. It is demonstrated that, compared to the traditional [zlog(z)] method, RBF network with just four components and lognormal basis function, yields operating characteristics that better match designed ones.


Introduction
Today, there is a growing need to increase a coastline safety by monitoring gaps of inadequate coverage of the principle radars for the illegal vessels, dim and highly maneuvering by assumption, embedded in extensive sea clutter. In Croatia, there is intention to fill these gaps with lightweight and Commercial Off-The-Shelf (COTS) radar sensors, installed on Unmanned Surface Vehicles (USV). These sensors are mostly impulse radars, with mechanical scanning antenna and logarithmic receiver, operating in X-frequency band and outputting only measured intensity of returned echoes. Thus, mitigation of clutter effect is commonly performed after envelope detection usually applying threshold that relies on clutter amplitude samples. To apply this process in a full scale, accurate clutter amplitude statistics should be known, but usually, samples are used to estimate parameters of assumed (or prior) model. Traditionally, this model is compound Gaussian process of the fast time varying component and the slow time varying texture with gamma distributed intensity (or radar cross section) [1]. While the fast varying component is due to scattering from the small capillary waves (wavelength on the order of centimeter or less, with surface tension as restoration force, generated by a local wind near the surface), the slow time varying component is due to reflection from the larger gravity waves (sea or swell), with the wavelength on the order of meter or above. These waves, whose restoration force is gravity, are generated by a stable wind, distant in the case of swell and local in the case of sea. But in some cases, measured amplitude of the sea returns shows larger distribution tail than expected by the model of gamma distributed texture intensity. To fit data better, other distributions of texture intensity were introduced, such as inverse gamma [2], [3] and inverse Gaussian [4]. These distributions account for the so-called spiky returns, events primarily associated with the breaking waves [5], which affect tail, giving rise to a higher probability of false detection.
The development presented in this paper mostly concerns the Adriatic Sea which is small and enclosed sea. Therefore, it can be assumed that the clutter statistics will be significantly different in comparison to the one commonly observed for the ocean. For the Adriatic it is specific that surface wind waves are limited by fetch and wind duration, so it is mostly immature sea and waves are for this reason steeper, with shorter wavelength than their counterparts in ocean. Moreover, existence of more than 1300 islands greatly affects local wind and wave characteristics and sea depths can not be neglected in the most of the basin. During winter, dominant winds are bura (italian bora), northnorthwest to east-northeast wind and jugo (sirocco family), east-southeast to south-southeast wind. Bura blows transversely over the Dinaric Mountains which shape its focus points. Usual wind speeds are up to 20 ms −1 but can exceed 50 ms −1 . Development of bura wind waves is limited by relatively narrow fetch across the Adriatic, so the maximum wave heights are expected in range from 6.2 m to 7.2 m. Opposed to waves generated by bura, waves generated by jugo are higher and longer and already formed enter through Otranto Strait, causing more developed sea state. In stormy conditions, wind speed of jugo can reach 30 ms −1 and maximum recorded wave height during this wind was 10.8 m [6].
To the best of author's knowledge, measured clutter data and its statistics, particularly of Croatian part of the basin, do not exist in open literature. While one can expect that, qualitatively, behavior of the sea clutter returns follow statistics of measured data for fully developed sea, e.g. one can expect that shape parameter of amplitude distribution (which determines spikiness of the sea returns), is getting lower as radar resolution increases or grazing angle decreases, is lower for horizontal polarization and depends more on wind than on sea direction and slightly increases with increasing surface roughness (or increased sea state) [7], quantitative manifestation, i.e. statistics of returns depending on the sea state, significant wave height, wind speed, etc., are not known. For example, sea condition during measurements reported in [8], taken under short fetch condition (approximately 10 km), can be thought as similar to Adriatic. In mentioned work, vertical polarization clutter data, recorded outside Toulon (France), in the Mediterranean, and in Taranto bay (southern Italy), were analyzed and compared to measurements and models derived for the fully developed ocean. For some trials it was found that model with inverse gamma distributed texture intensity better fits measured data than the one with gamma distribution. In contrast to measurements conducted under fully developed ocean as reported in [9], it follows that under similar wind conditions, shape parameter of the fitted distribution is not showing dependence on azimuth angle and is generally higher.
In this paper, performance of a non-coherent detection process employing non-parametric clutter model is investigated. Non-parametric model is built from the clutter samples only, without inclusion of any prior information about clutter. This is particularly important when the clutter statistics is unknown and target is dim. Any model mismatch in this case certainly leads to degradation of detection performance. Also, a non-parametric clutter model plays significant role in track-before-detect approach [10], where clutter (and target) statistics should be accurately evaluated for the fair tracker performance. So, aim of this paper is to investigate whether the change of paradigm from the prior clutter model (that may not approximate real clutter statistics well and thus cause processing losses) to non-parametric model (built entirely from the clutter samples), is feasible in non-coherent detection process. Such model should not be limited to stationary clutter only and should override shortcomings of model mismatch. In particular, target tracking method employing such model should be robust, in a sense that, compared to a traditional method that relies on prior clutter model, its performance should not be significantly worse when prior model (in traditional method) describes real clutter manifestation well, but should be significantly better when it does not. To build such model, we take advantage of the universal approximation property of neural networks, particularly RBF network [11], applied to the specific problem of density estimation.
There are numerous examples of using neural networks in density estimation. In context of radar application, for example in [12], RBF network was used to estimate clutter and target amplitude probability densities online. The primary reason of using RBF instead of Kernel Density Estimation (KDE) network in latter case lies in the fact that KDE often results in "bumpy" estimation, especially at the distribution tail which is not acceptable, as the tail is important in deriving detection threshold. Increasing width of the kernel function, which acts as the regularization parameter, smooths the estimate, but important information contained in the distribution tail is lost. In [13], neural network, based on recorded data, was used to estimate parameters of the Ricean Inverse Gaussian (RiIG) amplitude distribution (which can not be obtained in a closed form) and estimates were further used to set detection threshold assuming this particular clutter model. Neural network was also successfully used to reduce clutter in a conventional non-coherent radar system [14].
Similar recent approaches applied in detection process can be found in [15], where detection of small objects, distributed over few resolution cells and embedded in clutter, was based on estimation of the clutter amplitude probability density, employing KDE method [16]. Another example of using KDE network in detection process can be found in [17], where network, employing different kernels, was trained offline with real clutter data. Other examples of employing neural network in detection process include approaches of learning clutter statistics from the measured data online, or training to the various expected statistics offline, e.g. [18], where RBF network, based on available clutter data, was used to model nonlinear dynamics of the sea clutter by considering it as a chaotic system. In [19], RBF and Multi Layer Perceptron (MLP) networks were used, trained with different K-distribution shape parameters and Signal to Clutter Ratios (SCR), believed to be a good description of expected clutter and signal statistics. In [20], sea-ice clutter data were analyzed and used to train neural network with the result that such trained network, applied in coherent detection of non-fluctuating target echo embedded in Weibull sea-ice clutter, demonstrated enhanced detection and robustness compared to a conventional approximation of optimum Neyman-Pearson likelihood ratio detector. Amplitude statistics of the returned echoes and their temporal correlations were used in shore-based and non-coherent surveillance radar for detection of land areas and their exclusion from monitoring. MLP neural network was employed for classification of the each resolution bin echo as return from either land or sea [21]. Amplitude statistics of land and sea returns were modeled as Weibull and K-distributed, respectively. Interesting approach in non-coherent detection of a dim and sea surface target is given in [22], where detection of such a target was enhanced by employing Convolutional Neural Network (CNN). Network was trained by sequence of two dimensional images (range bin versus pulse number), in which position of a dim target echo was labeled. Thus trained network extracts essential features of sea clutter and target echo, making detection probability higher than in conventional Constant False Alarm Rate (CFAR) methods. In [23], detection of specific target among group, in background of sea clutter, was performed on the recognition basis, applying neural network and support vector machine with RBF kernel on high-resolution range profile. Network and support vector machine were trained with generated target range profiles and generally, support vector machine showed higher recognition rates under low signal to clutter ratios. In [24], detection of target with unknown Doppler shift, embedded in sea clutter, was improved (in comparison to conventional Doppler-preprocessed CFAR), by application of MLP neural network, trained with generated clutter patterns. Application of MLP neural network reduced clutter power variation and thus maintained fixed threshold value applied to conventional non-coherent CFAR detector. Advantage of this approach is that it has no restrictions on input clutter statistics. Conducted experiment on real sea clutter data showed, in comparison to conventional Doppler-preprocessed CFAR detector, better match to the designed false detection probability, thus, this approach is better approximation to Neyman-Pearson likelihood ratio detector.
In contrast to methods described in preceding paragraph, in this paper we use online learning of the clutter statistics. Similar to [15], the proposed method is not restricted to clutter statistics and does not require offline training or prior clutter model as in [19][20][21] or clutter samples as in [18,[22][23][24]. In a more wider context of application of the proposed detection scheme in target tracking process, this is due to application of external association mechanism that separates set of samples into target echo and set of sea clutter echos, like in [12], where Viterbi based association scheme acts as target from clutter separator. This scheme is particularly suitable when targets are not dense, i.e. there is no overlapping of tracking gates between tracked targets.
Main contributions of this paper are the analysis of non-parametric clutter model accuracy and the performance of proposed non-coherent detector employing such model. Accuracy is derived based on theoretical probability distribution fit on data from illustrative real clutter example and performance is derived based on distribution's parameters of such fit. This paper is organized as follows. In the next section, we briefly introduce RBF network as density estimator. We use such form of neural network to approximate unknown clutter amplitude distribution and employ it in noncoherent detection process. In the first part of Sec. 3, we introduce real sea clutter data from the medium resolution IPIX radar [25], [26] and use it as an illustrative example for testing proposed detection scheme. In the reminder of Sec. 3, performance comparison between employment of RBF network as non-parametric model and traditional [zlog(z)] estimator [27], [28] in detection process is presented. Finally, concluding remarks and outline of future work is given in Sec. 4.

RBF Network as Density Estimator
As already mentioned in the Introduction, unknown clutter density is estimated entirely from the samples and represented in functional form by a sum of some basis densities (or components). Using such an approximation is allowed as any bounded continuous probability density can be approximated by a finite sum of component densities, to any degree of accuracy [29], which is consistent with the universal approximation property of RBF network [11] as density estimation can be viewed as a special case of function approximation. One of the advantages of RBF network over non-parametric method, such as KDE, is the possibility of compacting information by grouping samples with similar statistics into same group. By putting density estimation into classification framework like in [30], parameters of the network can be determined by a maximum likelihood approach, which can be thought as a self-organizing method.
In [31], two main issues related to density estimation using sum of component densities were addressed. First was the quality of approximation, i.e. how well a certain class of basis densities can approximate unknown density, and second one was the quality of estimation, i.e. dependence of estimator performance on data set size. From [31], it follows that, given data size and a certain class of basis densities with a method of estimating their parameters, the total expected estimation error is bounded with a term related to adequacy of basis functions to approximate unknown density and a term related to the impact of fixed data size on the estimator. This can be expressed using expectation of squared Hellinger distance d 2 H between true density p and its maximum likelihood approximationf θ n (constituted from the sum of n basis densities), as in [31], where accuracy measure depends on class F of basis densities constitutingf θ n , positive constant C depends on both the true density and the class F , m depends on maximum likelihood estimator performance and N is the number of samples used to evaluatef θ n . In contrast to commonly used Kullback-Leibler divergence as distance measure between true density and its approximation, squared Hellinger distance [31] satisfies triangular inequality, thus allowing separation of error terms in (1).
From the classification point of view, each one of the n basis densities can be thought as representative of some class ω, hence, for some l−dimensional input x, general approximation f θ n is given as where basis density f σ is radially symmetric and unimodal function, parametrization of generic low-bounded density f , where · is Euclidian norm, µ vector of centers and σ scaling. In (3), probability P(ω i ) is the prior probability of ith class occurrence, and hence, n i=1 P(ω i ) = 1. Total set of parameters is given as Since same scaling σ is used across all dimensions, a prewhitening of data is required if the number of dimensions l > 1.
Given input set of samples X = x j , j = 1, . . . , N , logarithmic likelihood is defined as Maximization of (5) with respect to parameters θ, yieldsf θ n as maximum likelihood approximation of true density p.
Equation (3) is model of RBF network and its parameters can be determined by solving (6) which is done efficiently if we treat this problem in classification framework. Since samples as learning data are not labeled, a set of random variables Z = z j , j = 1, . . . , N is introduced, where each vector z j = z j1 , . . . , z jn indicates which sample is generated by which component (z ji has value either 0 or 1). Thus, Z can be treated as missing data in Expectation-Maximization (EM) algorithm and (5) can be rewritten as [32] (θ | X, and solved by iteration. As shown in [33], at kth step, expectation becomes and the parameter θ k+1 obtained by maximization is The values obtained from (8) and (9) are values for the next iteration step. For generic Gaussian basis density from (4) follows and EM algorithm ensures global convergence since generic density (10) belongs to class of exponential functions [31]. Expectation step performs E z ji | x j , θ k , resulting in posterior probability that ith component has generated jth sample, or, following [32], where i = 1, . . . , n and j = 1, . . . , N. By applying maximization and using (12), the centers are given as and scaling as Finally, posterior probability of class ω i for the next iteration step k + 1 is given as Steps (12)- (15) are repeated until the logarithmic likelihood converges to its maximum.

Practical Example
As already reported in [2], in the case of excessive spiky returns there is evident difference in distribution tail among different texture intensity models. Consider a compound Gaussian process [1]. If texture intensity ζ is modeled with gamma distribution and fast fluctuating component with complex Gaussian process, with unity variance for in-and quadrature-phase components, resultant clutter model is the well-known K-distribution where ν and β are shape and scale parameters, respectively. In (18), Γ is gamma function and K ν is modified Bessel function of the second kind [34]. Texture intensity for amplitude distribution with higher tail than K-distribution is often modeled with inverse gamma distribution as where ν and β are again shape and scale parameters respectively. Here, the resulting clutter model is given as for which its intensity distribution p y = x 2 is known as Pareto Type II distribution.
Example shown in Fig. 1, using real sea clutter data from the medium resolution IPIX radar [25], [26], under condition of sea states 3-4 and 4-5, illustrates the dependence of distribution tail on sea state and polarization. This example is chosen deliberately as these states were reported to occur at 43% (state 3-4) and 17% (state 4-5) of time in Adriatic basin, which is about two thirds of the total wave heights in Adriatic (ignoring sea states 0 and 1-2 which produce insignificant clutter returns) [35]. It is assumed that spikiness of this local sea will not be higher than presented with this example. While model p √ GP refers to sea state 3-4, model p K refers to sea state 4-5. Parameters of the depicted distributions are given in Tab. 1 and were estimated using maximum likelihood approach (in the case of p K ) [36], and using [zlog(z)] method (for the p √ GP ) [27], neglecting thermal noise in both cases, which is valid, considering radar resolution of 0.9°in azimuth and 30 m in range.
It should be noted that there is particular reason that motivated us in choosing this data as example. It is the fact that no single probability density function equally well fits amplitude distribution of the range bins under test. Thus, it is suitable to employ RBF form of neural network to approximate unknown clutter amplitude distribution. This complex behavior of clutter is in agreement with [8], as already discussed in the Introduction.
Set of numerical trials, conducted as extension of [12] but not yet published, reveals that probability of track loss of a dim and maneuvering surface target making a 0.5g U-turn, is below 2 × 10 −1 for false detection probability lower than 1.5 × 10 −1 , shape parameter of K-distributed clutter ν = 10 (which is nearly Rayleigh clutter) and single-hit detection probability of 8 × 10 −1 . With false detection probability lower than 2 × 10 −2 , track loss reaches its plateau region of about 3 × 10 −2 . For the more heavy-tailed K-distributed clutter (ν = 0.5), probability of track loss below 2 × 10 −1 is achieved with false detection probability lower than 1.3×10 −1 and probability of track loss lower than 3 × 10 −2 is achieved with false detection probability lower than 5 × 10 −2 . Therefore, in this paper we are focused on range of false detection probabilities from 10 −3 to 10 −1 . Furthermore, through remaining of this paper, it is assumed that clutter samples are collected employing Viterbi association scheme, described in [12] (for the more details see [37]), which acts as a target from clutter discriminator. Thus, under assumption of perfectly correct associations, clutter samples are not corrupted with target echo. Echoes from land are excluded from processing by means of a digital terrain map.

Network Design
In the reminder of this subsection, we will determine the number of RBF network components. The number of network components depends on the desired accuracy, which depends on the number of samples that are in turn determined by the resolution of the radar system and any prior knowledge about spatial and temporal correlation of the clutter. Inspired by the example shown in Fig. 1, we use its data as input samples, with the parameters given in Tab. 1. As we do not have any prior knowledge about constant C and estimator performance concentrated in m (1), we will attempt to numerically derive the dependence of expectation of squared Hellinger distance (1) on (a): number of samples, given the fixed number of components and (b): on number of components, given the fixed number of samples. This is performed in both amplitude and logarithmic domain, where under term "amplitude domain", we mean that the ith mixture component in (3) is given with Gaussian basis density and under term "logarithmic domain", the ith mixture component is given as lognormal basis density which is Gaussian under transformation y = log(x). Parameters of the network were estimated using EM approach as described in Sec. 2, by iterating through steps (12)-(15) until convergence criterion was satisfied.
Difference among approximation classes is illustrated in Fig. 2. By analyzing (1), it can be seen that expectation of squared Hellinger distance decreases as the number of samples increases, but regarding number of components, it is not obvious that increasing their number would eventually lead to smaller approximation error. This is illustrated in Fig. 2b as saturation region which occurs for larger values of n. Generally speaking, for the considered type of clutter, the RBF network in logarithmic domain shows better approximation performance and saturation occurs for n = 4 components with N = 1024 samples. Figure 3 shows that the same can be expected for smaller number of samples. Important property of detector is dependency of exact false detection probability (that results from the applied method) on designed one. How number of samples influences this property for proposed detector is summarized in Tab. 2 and Tab. 3 for few key values and is also illustrated in Fig. 4, which shows one typical realization of density estimation in logarithmic domain. It can be visualized from threshold values that, for less spiky clutter (Fig. 4a), probabilities of false detection using larger number of samples are lower than designed and using smaller number of samples, are closer to designed. For more spiky clutter (Fig. 4b), opposite is true. Here, exact false detection probability is given n (x) dx, and designed one as where p(x) is true clutter distribution,f θ n (x) is its approximation and T,T θ n are corresponding threshold values. For application of proposed detector, scenario of more spiky clutter is more important than scenario of less spiky one, hence, through the rest of the paper, we will concentrate on larger number of samples.   When choosing number of samples, one must be aware of temporal and spatial correlation of the clutter. As already pointed out in [38], medium resolution radar with scanning antenna (such as conventional radar systems for maritime surveillance), results with texture temporal correlation of the order of the time-on-target and spatial correlation of the order of few tens of resolution range bins, provided that illuminated area is not highly irregular. If we denote a set of L range bins, fixed in azimuth j, as N ρ neighboring bins are correlated. Through the remaining of this subsection, we use the simplest form of correlation coefficients ρ i,ι , between range bins with indexes i and ι, both at azimuth j, as Using M such strips, the set of reference cells is finally given as Therefore, upper value between eight to sixteen such strips, separated by radar resolution in azimuth, each with N ρ = 16 range bins, is usually adequate. If clutter is assumed stationary during several antenna scans, one can collect samples from several antenna scans and thus increase the number of samples. If F such scans are used and if X i denotes the set of samples at time i, the total set of samples X at current scan k is given as where each X k−i in (27) corresponds to set (26). In this paper, we adopt samples from F = 4 to F = 8 successive antenna scans, resulting in total of N = 1024 samples.

Operating Characteristics
For the purpose of mathematical tractability, optimum Neyman-Pearson likelihood ratio test [39], in the case of non-Gaussian clutter, is usually replaced with some other nonoptimum, but sufficient test statistic that depends solely on assumed clutter model. However, under assumption of particular model, clutter can manifest differently due to model mismatch, causing detection losses and eventual track failures as model mismatch results in either too much clutter observations, or in detection threshold that is estimated higher than needed. In this work, we aim to approximate unknown clutter density with the sum of arbitrary probability functions whose statistics, and thus detection threshold, is known. This is similar to the reference method where parameters of the prior model are estimated from the clutter samples, and from the statistics of prior model with such estimated parameters, detection threshold is derived, see e.g. [40] for the K-distribution clutter model. Traditional approach in target detection is based on a binary hypothesis test, i.e. between two hypothesis H0 and H1 (H0 that target is not present and H1 that target is present), one chooses H0 if the amplitude from the cell under test is below some threshold, determined by Type I error (choosing hypothesis H1 although H0 is true) and choosing hypothesis H1 otherwise [39]. From the constraint of Type I error, which is design parameter (in radar context, this is the probability of false detection P FA , given as P FA = ∫ ∞ T p(x) dx where p(x) is the probability density for H0), in the framework of density estimation by RBF network, false detection probability becomes from which one can derive the detection threshold T θ n . In [41], detector for Pareto Type II clutter using sliding window based approach in Bayesian framework was proposed. Although true CFAR detector, it shows lower detection performance in comparison to techniques based on geometric transformation or ordered statistics that require prior knowledge of shape parameter in the former and scale parameter in the latter case [42]. Another drawback of the CFAR detector proposed in [41] is the need for numerical integration, in contrast to the method of moments which can be used in geometrical transformation and ordered statistics approach. Hence, for simplicity, in this paper we adopt traditional [zlog(z)] method as the reference method for shape parameter estimation of both the Pareto Type II (this is density (20) under transformation y = x 2 ), and the K-distributed clutter (18), although it is difficult to capture statistics of such low values of shape parameter as given in example in Fig. 1.
Simulation flowchart is shown in Fig. 5 and its parameters, selected on the basis of emphasized parameters in Tab. 1, are shown in Tab. 4. Simulation was performed using tool for numerical computation Scilab [43], running on operating system openSUSE Leap 15.1 [44] and Intel "Ivy Bridge" processor platform [45]. It was divided into seven stages: 1. Initialization with parameters as per Tab. 4.

Evaluation of reference threshold T and reference Sig-
nal to Clutter Ratio (SCR), depending on designed value of false detection probability P FA and target detection probability P D . Given set of designed false detection probabilities as P FA = P FA 1 , P FA 2 , . . . , for each P FA and given clutter distribution model p, threshold T is calculated from (23), (18) and (20), as solution of Next, one must find signal strength S (target echo power) or equivalently SCR such that, given threshold T as solution of (29), probability detection equals designed value P D . As expected sea targets are usually classified by Swerling Case 1 model [46], detection probability, in the case of gamma distributed texture intensity, is given as dζ .
(30) Equation (30) is solved numerically using the approach given in [40]. In the case of inverse gamma texture intensity distribution, detection probability is given as (31) which is solved using Gauss-Kronrod quadrature variant of Guassian quadrature method [47]. In this paper, we have chosen SCR such that designed detection probability P D always equals 8×10 −1 . For the gamma model of texture intensity, mean clutter power equals ν β/2 and for the inverse gamma model, it equals 4 β(ν−1) , conditioned on ν > 1. Therefore, signal strength (or target echo power) S, is related to SCR as 3. Generation of N clutter samples, respecting clutter distribution model p, correlation parameter N ρ and correlation coefficient given with (25). Samples are given as 34) where N (µ, σ) is normal (or Gaussian) distribution with mean µ and deviation σ and p is p Γ for the Kdistribution model (18) and p Γ −1 for model with the inverse gamma texture (20). 4. Clutter density estimation. Using [zlog(z)] approach, the shape parameter is estimated from intensity samples Y = y i | y i = x 2 i , i = 1, . . . , N . For the K-distributed clutter, neglecting thermal noise and using only one pulse, the [zlog(z)] method gives [27], and the scale parameter follows from the first moment of intensity,β If texture intensity is modeled as inverse gamma distribution, the distribution of amplitude intensity y = x 2 is known as Pareto Type II distribution, so [zlog(z)] method estimates shape parameter according to [27], again neglecting effects of thermal noise and using just one pulse. The above is valid forν > 1 and the scale parameter follows from the first moment of intensity as, Parameters θ of the RBF network were estimated using EM method in logarithmic domain (see Sec.2 and Sec.3.1), employing iteration steps (12) through (15). Also, a squared Hellinger distance d 2 H is outputted. (35) and (36) as input parameters. Detection threshold T √ GP is the solution of equation (37) and (38) as input parameters. Detection threshold T θ n for RBF network is determined by solving (28).

Detection thresholds. For the K-distribution (18), detection threshold is determined by solving
6. Operating characteristics. If one applies thresholds T θ n , T K and T √ GP (derived in Step 5) to true distributions (18) and (20), exact false and detection probabilities can be calculated for the scenario of less spiky and spiky clutter, for known SCR and for the matched and mismatched clutter model. This is done by defining threshold as function of prior clutter distribution model, and substituting it instead of T in (29), (30) and (31). 7. In this stage, by means of sample averages of exact false and detection probabilities, evaluated over J runs, we compare performance of the RBF network with the traditional method of equivalent distribution. Depicting exact false and detection probabilities versus designed ones, gives us idea about quality of approximation and feasibility of employing RBF network in detection process of target embedded in clutter with unknown statistics.    Simulation results for the both N ρ = 1 and the N ρ = 16 correlated range bins are shown in Figs. 6 and 7 for the scenario of spiky and less spiky clutter, respectively. Exact false and detection probabilities are compared with the design ones, for the method which employs RBF network and for the [zlog(z)] method which is either matched to the true distribution, or mismatched. While the index K emphasizes that the estimation of shape parameter is based on prior K-distribution according to (35) and (36), the index √ GP emphasizes that estimation is based on prior Pareto Type II distribution of intensity, where parameters are estimated from (37) and (38). Results are based on average of 100 runs for each false detection probability point, 1024 clutter samples and lognormal basis density.
Results show that, even when the [zlog(z)] method matches true inverse gamma texture intensity distribution, there is significant deviation of exact false detection probabilities from designed one. This impacts the probability of detection, too. Furthermore, the effect of range bins correlation is more pronounced in the case of spiky clutter, which is manifested as increase of the false detection probability. Although the match is not perfect, it should be noted that, in the practical P FA range from 10 −3 onward, the RBF network better matches the designed P FA than the [zlog(z)] methods, especially in the case of model mismatch. Drawback of this method is its computational load (t CPU , refer to Stage 4 in the Fig. 5). Without calculation of the squared Hellinger distance and for the specified hardware and software configuration [43][44][45] (see Sec. 3.2), it takes on average 500 ms for the EM algorithm and 0.2 ms for the [zlog(z)] method which is about 2500 times faster. This is due to more intense calculation in steps (12)-(15) than in (35)- (38) and iterative nature of EM algorithm. On average, it takes about 100 iterations to achieve convergence of logarithmic likelihood (16) to the precision of 10 −6 .
The following discrepancy of [zlog(z)] method in the case of inverse gamma texture intensity can be explained with value of shape parameter that is very close to 1, what is the limit of the estimator validity. To estimate this parameter with better accuracy, larger data sample size is required (e.g., for 1024 samples, the estimated shape parameter is 1.301, for 4096 samples, 1.245, for 16384 samples, 1.201, and so on).
On the other hand, if one is able to solve for the shape parameter using fractal moments, i.e.m 1 = x andm 1/2 = x 1/2 , and to solve the equation numerically [36], more accurate values of shape parameter would be estimated (i.e. the sample size 1024 gives 1.055, while current method gives 1.301), but this imposes too high computational demands for real-time application and is beyond the scope of this paper.

Design Issue
It may happen that, for the given design false detection probability and due to some excessive spike return as illustrated in Fig. 8, the estimated threshold value T θ n exceeds the dynamic range of the receiver. For this particular example, the designed false detection probability is 10 −4 and the estimated threshold value is T θ n ≈ 21 V, which corresponds to the exact false detection probability ≈ 10 −7 . The RBF network is evaluated in logarithmic domain using n = 4 components and N = 1024 samples. One method to counteract this issue could be censoring of the maximum sample value. For this particular example, censoring gives T θ n = 0.6 V, resulting with the exact P FA = 1.9 × 10 −4 which is closer to the designed value. This is explained with better approximation of the true distribution tail when the censored data are employed. Nevertheless, the squared Hellinger distances are approximately the same (prior censoring 0.0021 and after censoring 0.0022). For this example, estimated probability densities prior and after censoring are illustrated in Fig. 9.

Conclusion
Although probability density estimation employing RBF network is not a new concept, in this paper it is tested in a new light of embedment in a non-coherent detection process, in a scenario when the sea clutter statistics is unknown. Performance analysis, conducted in Sec. 3.2, indicates that employment of RBF network in such a clutter yields receiver operating curves that are reasonably close to designed ones, closer than the ones obtained by the traditional [zlog(z)] method which uses estimating parameters of assumed (or prior) clutter model. It is also demonstrated that RBF with lognormal basis function better approximates spiky clutter than the one with the Gaussian function.
Future work will investigate the influence of the described clutter modeling when used in process of maneuvering target tracking and, in more challenging scenario, tracking a dim target using track-before-detect scheme. Such method often relies on clutter likelihood to improve tracking performance, as in [48] for the K-distribution prior model. Based on the presented results and some preliminary tests, it can be assumed that the employment of RBF network would offer larger improvement in comparison to the prior model in a scenario when the prior model does not match well the true statistics of clutter. This will also be addressed in our future work.