Linear discriminant analysis of spatial Gaussian data with estimated anisotropy ratio

Abstract. The paper deals with the problem of classification of Gaussian spatial data into one of two populations specified by different parametric mean models and a common geometric anisotropic covariance function. In the case of unknown mean and covariance parameters, the plug-in Bayes discriminant function based on ML estimators is used. The asymptotic approximation of the expected error rate (AER) is derived in the case of unknown mean parameters and a single unknown covariance parameter, i.e., the anisotropy ratio.


Introduction
In the case of completely specified populations and a known loss function, an optimal classification rule in the sense of minimum risk is based on the Bayes discriminant function (BDF) [4]. In practical situations a complete statistical description of the populations is usually not available. Using a training sample, the unknown parameters can be estimated and plugged into the BDF; the resulting plug-in BDF is called the PBDF. The expressions for the expected error rate (ER) are very complicated even for the simplest forms of the PBDF; therefore, asymptotic approximations of the ER are especially important.
Many authors have investigated the performance of the PBDF when parameters are estimated from training samples consisting of dependent observations (see e.g., [5]). The plug-in approach to discrimination for feature observations having elliptically contoured distributions is implemented in [2]. Šaltytė and Dučinskas [7] derived the asymptotic approximation of the expected error rate when classifying an observation of a scalar Gaussian random field into one of two classes with different regression mean models and common variance. However, in all the publications listed above the correlations between the observation to be classified and the training sample were assumed to be zero. This assumption does not hold in all situations, especially when the location of the observation to be classified is close to those of the training sample. The first extension of the above-mentioned approximation to the case where the spatial correlations between the Gaussian observation to be classified and the observations in the training sample are not assumed to be zero was made in [3]. There only the trend parameters and the variance (a parameter of the covariance function) were assumed unknown. The extension of the latter approximation to the case of complete parametric uncertainty (all mean and covariance function parameters unknown) was implemented in [4]. In the present paper we derive a closed-form approximation of the expected error rate in the case of estimated mean parameters and an estimated anisotropy ratio.

The main concepts and definitions
The main objective of this paper is to classify observations of a Gaussian random field (GRF) {Z(s): s ∈ D ⊂ R^p} into one of two populations. Suppose that the model for the observation Z(s) in population Ω_j is

Z(s) = x′(s)β_j + ε(s),   (1)

where x(s) is a q × 1 vector of nonrandom regressors and β_j is a q × 1 vector of parameters, j = 1, 2. The error term ε(s) is generated by a zero-mean stationary GRF with covariance function defined by the model cov{ε(s), ε(u)} = C(s − u; θ) for all s, u ∈ D, where θ ∈ Θ is a p × 1 parameter vector, Θ being an open subset of R^p.
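For illustration, model (1) can be simulated directly. The following sketch assumes a simple isotropic exponential covariance as the choice of C(·; θ), and the regressors, locations, and parameter values are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def exp_cov(locs, sigma2=1.0, phi=2.0):
    """Example covariance C(h) = sigma2 * exp(-||h||/phi) for the error GRF
    (an illustrative, isotropic choice of C(.; theta))."""
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

n = 50
locs = rng.uniform(0, 10, size=(n, 2))          # locations s_i in D
X1 = np.column_stack([np.ones(n), locs[:, 0]])  # regressors x(s): intercept + coordinate
beta_1 = np.array([1.0, 0.5])                   # hypothetical mean parameters of Omega_1
eps = rng.multivariate_normal(np.zeros(n), exp_cov(locs))  # zero-mean stationary GRF errors
Z = X1 @ beta_1 + eps                           # observations Z(s) = x'(s) beta_1 + eps(s)
```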
Given the training sample T, consider the problem of classification of Z_0 = Z(s_0), s_0 ∈ D, into one of the two populations. Denote by S_n = {s_i ∈ D; i = 1, . . . , n} the set of locations where the training sample T′ = (Z(s_1), . . . , Z(s_n)) is taken, and call it the set of training locations. We assume a deterministic spatial sampling design, and all analyses are carried out conditionally on S_n. S_n is partitioned into the union of two disjoint subsets, i.e., S_n = S^(1) ∪ S^(2), where S^(j) is the subset of S_n containing the n_j locations of feature observations from Ω_j, j = 1, 2.
This is the case where spatially classified training data are collected at fixed locations (stations).
The n × 2q design matrix of the training sample T, denoted by X, is specified by X = X_1 ⊕ X_2, where the symbol ⊕ denotes the direct sum of matrices and X_j is the n_j × q matrix of regressors for observations from Ω_j, j = 1, 2.
So the model of the training sample is T = Xβ + E, where β = (β′_1, β′_2)′ is a 2q × 1 vector of regression parameters and E is the n × 1 vector of random errors that has the multivariate Gaussian distribution N_n(0, C(θ)).
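The direct-sum structure of X is easy to see in code; a minimal numpy sketch with hypothetical sizes:

```python
import numpy as np

n1, n2, q = 3, 4, 2                             # hypothetical sample sizes and regressor count
X1 = np.arange(1.0, n1 * q + 1).reshape(n1, q)  # regressors for observations from Omega_1
X2 = np.ones((n2, q))                           # regressors for observations from Omega_2

# Direct sum X = X1 (+) X2: an n x 2q block-diagonal design matrix,
# so that X @ (beta_1', beta_2')' stacks X1 @ beta_1 over X2 @ beta_2.
X = np.zeros((n1 + n2, 2 * q))
X[:n1, :q] = X1
X[n1:, q:] = X2
```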
Denote by c_0(θ) the n × 1 vector of covariances between Z_0 and T. Let t denote the realization of T.
For notational convenience, the argument θ is dropped from all functions in what follows.
Since Z_0 follows the model specified in (1), the conditional distribution of Z_0 given T = t, Ω_j is Gaussian with mean µ_0jt and variance σ²_0(θ), where

µ_0jt = x′_0 β_j + α′_0 (t − Xβ),   σ²_0(θ) = C_0 − c′_0 C^{-1} c_0,

with x_0 = x(s_0), α_0 = C^{-1} c_0 and C_0 denoting the marginal variance of Z_0. Under the assumption of complete parametric certainty of the populations and for known finite nonnegative losses {L(i, j), i, j = 1, 2}, the BDF minimizing the risk of classification is formed by the log-ratio of the conditional likelihoods.
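These conditional moments are the standard Gaussian conditioning formulas; a numpy sketch (the function name and toy numbers below are ours):

```python
import numpy as np

def conditional_moments(x0, beta_j, c0, C, t, X, beta, C00):
    """mu_0jt = x0' beta_j + alpha0' (t - X beta), alpha0 = C^{-1} c0,
    sigma2_0 = C00 - c0' C^{-1} c0, where C00 = var(Z_0)."""
    alpha0 = np.linalg.solve(C, c0)
    mu = x0 @ beta_j + alpha0 @ (t - X @ beta)
    sigma2 = C00 - c0 @ alpha0
    return mu, sigma2

# toy illustration (all numbers hypothetical)
n, q = 5, 2
C = 0.3 * np.ones((n, n)) + 0.7 * np.eye(n)  # covariance of T
c0 = 0.2 * np.ones(n)                        # covariances between Z_0 and T
X = np.zeros((n, 2 * q))
X[:3, 0] = 1.0                               # intercept column for Omega_1 rows
X[3:, 2] = 1.0                               # intercept column for Omega_2 rows
beta = np.array([1.0, 0.0, 0.5, 0.0])
t = np.array([1.2, 0.8, 1.1, 0.4, 0.6])
mu, sigma2 = conditional_moments(np.array([1.0, 0.0]), beta[:q], c0, C, t, X, beta, 1.0)
```

Conditioning on T = t can only reduce the variance, so sigma2 is always below the marginal variance C00.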
In practical applications not all statistical parameters of the populations are known. The PBDF is constructed by replacing the parameters in the BDF with their estimators.
Let β̂, θ̂ be the estimators of the corresponding parameters obtained from the training sample T, and let Ψ̂ denote the vector of all estimated parameters. Then the PBDF has the form

W_t(Z_0; Ψ̂) = (Z_0 − x′_0 H β̂/2 − α̂′_0 (t − X β̂)) (x′_0 G β̂)/σ̂²_0 + γ,

with H = (I_q, I_q) and G = (I_q, −I_q), where I_q denotes the identity matrix of order q and γ is the threshold determined by the losses L(i, j) and the prior probabilities of the populations.
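A direct numpy transcription of this plug-in rule (a sketch: the estimators are taken as given inputs, and the threshold gamma is left as a parameter):

```python
import numpy as np

def pbdf(z0, x0, t, X, beta_hat, c0_hat, C_hat, C00_hat, gamma=0.0):
    """Plug-in Bayes discriminant function W_t(Z_0; Psi_hat); classify
    z0 to Omega_1 when W >= 0 and to Omega_2 otherwise.
    H beta = beta_1 + beta_2 and G beta = beta_1 - beta_2."""
    q = X.shape[1] // 2
    H = np.hstack([np.eye(q), np.eye(q)])
    G = np.hstack([np.eye(q), -np.eye(q)])
    alpha0 = np.linalg.solve(C_hat, c0_hat)  # plug-in alpha_0 = C^{-1} c_0
    sigma2 = C00_hat - c0_hat @ alpha0       # plug-in sigma^2_0
    w = z0 - x0 @ (H @ beta_hat) / 2 - alpha0 @ (t - X @ beta_hat)
    return w * (x0 @ (G @ beta_hat)) / sigma2 + gamma
```

Note that when the two estimated mean vectors coincide, G β̂ = 0 and the rule reduces to the threshold alone, as it should.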

Definition 1.
Given T = t, the actual risk for the PBDF W_t(Z_0; Ψ̂) is defined as

P(Ψ̂) = Σ_{j=1}^{2} π_j Σ_{i=1}^{2} L(i, j) P_{ij}(Ψ̂),

where π_j is the prior probability of Ω_j and, for i, j = 1, 2, P_{ij}(Ψ̂) is the probability that W_t(Z_0; Ψ̂) assigns an observation from Ω_j to Ω_i. A closed-form expression of the actual risk for the PBDF in terms of the standard Gaussian distribution function is given in [4]. The expectation of the actual risk with respect to the distribution of T is called the expected risk (ER) and is designated E_T(P(Ψ̂)).
In this paper we assume that all true values of the parameters β and a single parameter of the covariance function (the anisotropy ratio) are unknown. We therefore use ML estimates of these unknown parameters to form the PBDF.

The asymptotic approximation of ER with estimated parameters
We use the maximum likelihood (ML) estimators of the parameters based on the training sample. The asymptotic properties of ML estimators established by Mardia and Marshall [6] under the increasing domain asymptotic framework (based on a growing observation region) and subject to some regularity conditions are essentially exploited. Hence, the ML estimator Ψ̂ is weakly consistent and asymptotically Gaussian [4]. We make the following assumptions: (A1) the regularity conditions of [6] hold, so that θ̂ is weakly consistent and asymptotically Gaussian; (A2) the training sample T and the estimator θ̂ are asymptotically uncorrelated (see e.g., [1]).
Note that sufficient conditions for (A1) are formulated in [6]. Under (A1) and (A2) the AER is derived in [4]. The accuracy of this type of approximation is examined in [3].
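As a concrete (hypothetical) illustration of ML estimation of the anisotropy ratio, the Gaussian log-likelihood can be profiled over β by GLS and then maximized over λ numerically; the distance parametrization d_ij = sqrt(h1² + (h2/λ)²) below is one common form of geometric anisotropy, assumed purely for the sketch:

```python
import numpy as np

def neg_loglik(lmbda, t, X, locs, phi=2.0, sigma2=1.0):
    """Negative Gaussian log-likelihood (up to a constant) with beta profiled
    out by GLS, for an exponential correlation with geometric anisotropy
    ratio lmbda: d_ij = sqrt(h1^2 + (h2/lmbda)^2)."""
    h = locs[:, None, :] - locs[None, :, :]
    d = np.sqrt(h[..., 0] ** 2 + (h[..., 1] / lmbda) ** 2)
    C = sigma2 * np.exp(-d / phi)
    Ci = np.linalg.inv(C)
    beta = np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ t)  # GLS estimate of beta
    r = t - X @ beta
    return 0.5 * (np.linalg.slogdet(C)[1] + r @ Ci @ r)

def ml_lambda(t, X, locs, grid):
    """Crude grid search for the ML estimate of the anisotropy ratio."""
    return min(grid, key=lambda l: neg_loglik(l, t, X, locs))
```

In practice a smooth optimizer would replace the grid, and σ², ϕ would be estimated jointly rather than fixed.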
We consider the case θ = λ, where λ is the anisotropy ratio. Let Δ²_0 = (x′_0 G β)²/σ²_0 denote the squared Mahalanobis distance between the conditional distributions of Z_0 given T = t in the two populations.

Lemma 1. Suppose that the observation Z_0 is to be classified by the PBDF and let assumptions (A1), (A2) hold. Then the approximation of the ER in the case of estimated unknown mean parameters and an estimated unknown anisotropy ratio is expressed in terms of the following quantities: B, the first-order partial derivative of α̂_0 evaluated at the point λ̂ = λ; the first-order partial derivative of σ̂²_0 evaluated at λ̂ = λ; and c^(1)_λ and C^(1)_λ, the first-order partial derivatives of c_0 and C evaluated at λ̂ = λ.
Proof. The proof of the lemma is similar to the proof of Theorem 1 in [4], with θ replaced by the anisotropy ratio λ. ⊓⊔ Remark 1. Suppose we have the case of an exponential spatial correlation function with geometric anisotropy.

Then B can be computed from the first-order partial derivatives of c_0 and C with respect to λ. Here ϕ is the range parameter and • represents the Hadamard product. Let the covariance have the form C = τ²I + σ²R, where R is the n × n matrix of correlations between the components of T whose (i, j)-th element is r_ij = exp(−d_ij/ϕ), with d_ij denoting the anisotropy-adjusted distance between the locations s_i and s_j.
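A sketch of this covariance structure and of the λ-derivatives entering B. The paper's analytic derivatives involve the Hadamard product; here a central finite-difference approximation stands in, and the anisotropic distance parametrization d_ij = sqrt(h1² + (h2/λ)²) is an assumption of ours:

```python
import numpy as np

def aniso_exp_corr(locs, lmbda, phi=2.0):
    """Exponential correlation r_ij = exp(-d_ij / phi) with an assumed
    geometric-anisotropy distance d_ij = sqrt(h1^2 + (h2/lmbda)^2)."""
    h = locs[:, None, :] - locs[None, :, :]
    d = np.sqrt(h[..., 0] ** 2 + (h[..., 1] / lmbda) ** 2)
    return np.exp(-d / phi)

def cov_matrix(locs, lmbda, tau2=0.1, sigma2=1.0, phi=2.0):
    """C = tau^2 I + sigma^2 R: nugget plus correlated part."""
    return tau2 * np.eye(len(locs)) + sigma2 * aniso_exp_corr(locs, lmbda, phi)

def dC_dlambda(locs, lmbda, eps=1e-6, **kw):
    """Central finite-difference approximation of the derivative C_lambda^(1)."""
    return (cov_matrix(locs, lmbda + eps, **kw)
            - cov_matrix(locs, lmbda - eps, **kw)) / (2 * eps)
```

With c_0 and its derivative obtained analogously at the location s_0, the derivative of α_0 = C^{-1}c_0 follows from the product rule as C^{-1}(c^(1)_λ − C^(1)_λ α_0).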