Asymptotic performance of the quadratic discriminant function to skewed training samples

This study investigates the asymptotic performance of the quadratic discriminant function (QDF) under skewed training samples. The main objective is to evaluate the performance of the QDF under a skewed distribution for different sample size ratios, group centroid separators and numbers of variables. Three populations $(\pi_i, i = 1, 2, 3)$ with increasing group centroid separators were considered. Multivariate normal data were simulated with MatLab R2009a. The average error rates for the sample size ratios 1:2:2 and 1:2:3 increased as the total sample size increased asymptotically in the skewed distribution when the centroid separator increased from 1 to 3. Under the skewed distribution, the QDF performed better for the sample size ratio 1:1:1 than for the other sampling ratios, and under the centroid separator $\delta = 5$.

deviation, the between-sample variability of the individual error rates of the QDF on normal or non-normal distributions was quite large, so the instability of the QDF is pronounced. The actual error rates were also considerably larger than the optimal rates in the case of zero mean difference, which is a very difficult assignment problem. The QDF for non-normal samples generally did not do substantially worse than the QDF derived from normal samples obtained after transformation. Lachenbruch et al. (1977) compared the resubstitution method and the leave-one-out method. The resubstitution method had an unacceptably high bias, whereas the leave-one-out method was far superior, generally having a much smaller bias.
Hosseini and Armacost (1992) studied the two-group discriminant problem with equal group mean vectors using several methods and mathematical formulations. For comparison, both Fisher's linear discriminant function (FLDF) and the QDF were used. Both methods performed better for multivariate non-normal distributions than for data generated from a multivariate normal distribution. All the discriminant methods generally performed better when the covariance matrices of the two populations were assumed to be unequal. The FLDF and the QDF also performed less favourably in the presence of outliers/noise than in their absence. Lachenbruch and Goldstein (1979) considered the effects of initial misclassification on the QDF. Their simulation considered two populations with equal prior probabilities, means of 0 and 2, and 2, 4 and 8 variables, with a fraction $\alpha_i$ of the $n_i$ training observations actually coming from the other population. They suggested that if initial misclassification is suspected, all sample points should be carefully checked and reassigned if needed. Krzanowski and Hand (1977) assessed error rate estimators, paying special attention to the leave-one-out method. The estimator was investigated in a simulation study, both in absolute terms and in comparison with a popular bootstrap estimator. Motivated by this, they examined an extension of leave-one-out, the leave-two-out method, with attention to its variance. As expected, the leave-two-out method yielded a slight variance reduction relative to the leave-one-out method, but not enough to make it a strong competitor.
To study the asymptotic error rates of linear, quadratic and logistic rules, Kakaï and Pelz (2010) conducted a Monte Carlo study of two-, three- and five-group discriminant analysis. The simulation took into account the overlap of the populations (e = 0.05, e = 0.1, e = 1.5), their common distribution (normal, and chi-square with 4, 8 and 12 df) and their degree of heteroscedasticity, Γ, measured by the power 1 − β of the homoscedasticity test related to Γ (1 − β = 0.05, 0.4, 0.6, 0.8). They found that the three rules gave similar error rates for normal homoscedastic populations. For non-normal populations, the quadratic rule still gave the lowest relative error, except in the two-group case, where the logistic rule was best. The quadratic and logistic rules were more influenced by the number of groups, irrespective of their low relative errors. The linear and quadratic rules were also more influenced by non-normality. The present study departs from Lachenbruch et al. (1977) by focusing on three populations, unequal sample sizes and a log-normal distribution to introduce skewness. Croux (2004) studied the influence of observations on the misclassification probability in quadratic discriminant analysis, including the effect of observations in the training sample on the performance of the associated classification rule. MacFarland (2001) investigated the exact misclassification probabilities of plug-in normal quadratic discriminant functions in the equal-means case, deriving stochastic representations of the exact distributions of the "plug-in" quadratic discriminant functions for classifying a newly obtained observation.
As the literature above shows, several researchers have done extensive work on the performance of various discriminant and classification functions under skewed or non-normal distributions. However, little attention has been paid to evaluating the performance of these classifiers for three populations under a skewed distribution with different sampling ratios, centroid separators and numbers of variables. This study therefore investigates the performance of a single classifier (i.e. the QDF) under a skewed distribution, considering different numbers of variables, sampling ratios and centroid separators for three groups/populations.

The quadratic classifier ($\Sigma_1 \neq \Sigma_2$)
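Suppose that the joint densities of $X' = [X_1, X_2, \ldots, X_p]$ for populations $\pi_1$ and $\pi_2$ are multivariate normal,

$$f_i(x) = \frac{1}{(2\pi)^{p/2} |\Sigma_i|^{1/2}} \exp\left[-\tfrac{1}{2}(x - \mu_i)' \Sigma_i^{-1} (x - \mu_i)\right], \qquad i = 1, 2,$$

and that, with prior probabilities $p_1, p_2$ and misclassification costs $c(1 \mid 2), c(2 \mid 1)$, $x$ is allocated to $\pi_1$ when

$$\frac{f_1(x)}{f_2(x)} \geq \left(\frac{c(1 \mid 2)}{c(2 \mid 1)}\right)\left(\frac{p_2}{p_1}\right). \qquad (1)$$

When the multivariate normal densities have different covariance structures, the terms in the density ratio involving $|\Sigma_i|^{1/2}$ do not cancel as they do when the covariance matrices are equal, and the quadratic forms in the exponents of $f_i(x)$ do not combine. Substituting multivariate normal densities with different covariance matrices into Eq. (1), taking natural logarithms and simplifying gives the quadratic rule (assuming equal misclassification costs): allocate $x$ to $\pi_1$ if

$$-\tfrac{1}{2} x'(\Sigma_1^{-1} - \Sigma_2^{-1})x + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})x - k \geq \ln\left(\frac{p_2}{p_1}\right),$$

where

$$k = \tfrac{1}{2}\ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + \tfrac{1}{2}\left(\mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right);$$

otherwise, $x \in \pi_2$. In terms of the Mahalanobis distance, the rule is sometimes written as: allocate $x$ to $\pi_1$ if

$$-\tfrac{1}{2} D_1^2(x) + \tfrac{1}{2} D_2^2(x) - \tfrac{1}{2}\ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) \geq \ln\left(\frac{p_2}{p_1}\right),$$

where $D_i^2(x) = (x - \mu_i)'\Sigma_i^{-1}(x - \mu_i)$ is the squared Mahalanobis distance. When $\Sigma_1 = \Sigma_2$, the function reduces to the linear classification rule. The function extends to three-group classification, where two cut-off points are required to assign observations to the three groups (Johnson and Wichern 2007).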
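The expanded quadratic form and the Mahalanobis form above are algebraically identical. A quick numerical check makes this concrete. The following is a minimal NumPy sketch with equal misclassification costs. The function names `qdf_expanded` and `qdf_mahalanobis` are hypothetical and are not taken from the study. Each function returns the left-hand side minus the right-hand side of the rule, so a value of at least 0 allocates $x$ to $\pi_1$.

```python
import numpy as np


def qdf_expanded(x, mu1, mu2, S1, S2, p1=0.5, p2=0.5):
    """Two-group QDF in expanded form; a value >= 0 allocates x to pi_1."""
    S1_inv = np.linalg.inv(S1)
    S2_inv = np.linalg.inv(S2)
    logdet1 = np.linalg.slogdet(S1)[1]   # ln|Sigma_1|
    logdet2 = np.linalg.slogdet(S2)[1]   # ln|Sigma_2|
    A = S1_inv - S2_inv                  # quadratic coefficient matrix
    b = mu1 @ S1_inv - mu2 @ S2_inv      # linear coefficient (row vector)
    k = 0.5 * (logdet1 - logdet2) + 0.5 * (mu1 @ S1_inv @ mu1 - mu2 @ S2_inv @ mu2)
    return -0.5 * x @ A @ x + b @ x - k - np.log(p2 / p1)


def qdf_mahalanobis(x, mu1, mu2, S1, S2, p1=0.5, p2=0.5):
    """The same rule written with squared Mahalanobis distances D_i^2(x)."""
    d1, d2 = x - mu1, x - mu2
    D1_sq = d1 @ np.linalg.solve(S1, d1)  # (x - mu1)' Sigma_1^{-1} (x - mu1)
    D2_sq = d2 @ np.linalg.solve(S2, d2)  # (x - mu2)' Sigma_2^{-1} (x - mu2)
    logdet1 = np.linalg.slogdet(S1)[1]
    logdet2 = np.linalg.slogdet(S2)[1]
    return -0.5 * D1_sq + 0.5 * D2_sq - 0.5 * (logdet1 - logdet2) - np.log(p2 / p1)


if __name__ == "__main__":
    # Illustrative parameters only, not values from the study.
    mu1, mu2 = np.zeros(2), np.array([0.0, 2.0])
    S1, S2 = np.eye(2), 2.0 * np.eye(2)
    x = np.array([0.0, 0.5])
    print(qdf_expanded(x, mu1, mu2, S1, S2), qdf_mahalanobis(x, mu1, mu2, S1, S2))
```

The Mahalanobis form uses `np.linalg.solve` instead of an explicit inverse. This is usually the more stable choice when $\Sigma_i$ is estimated from data.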

Simulation design
We evaluated the performance of the QDF for skewed training samples following a non-normal distribution. In the simulation procedure, correlated multivariate normal random data were generated for three populations with mean vectors $\mu_1 = (0, \ldots, 0)$, $\mu_2 = (0, \ldots, \delta)$ and $\mu_3 = (0, \ldots, 2\delta)$, respectively, using MatLab R2009a. The covariance matrices $\Sigma_i$ $(i = 1, 2, 3)$ had off-diagonal entries $\sigma_{kl} = 0.7$ ($k \neq l$) for all groups and diagonal entries $\sigma_k^2 = i$ for group $i$. Because the interest was in the performance of the QDF on skewed data, the correlated normal data were transformed into skewed data by exponentiating them, which gives log-normal data.
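The data-generating step just described can be sketched in Python/NumPy. The study itself used MatLab R2009a. This sketch assumes that the shift $(i - 1)\delta$ enters only the last coordinate, as the mean vectors above indicate, and that the exponential is applied element-wise. The helper names are hypothetical.

```python
import numpy as np


def group_covariance(p, i, off_diag=0.7):
    """Covariance for group i: sigma_kl = 0.7 for k != l, sigma_k^2 = i."""
    S = np.full((p, p), off_diag)
    np.fill_diagonal(S, float(i))
    return S


def group_mean(p, i, delta):
    """Mean vector (0, ..., 0, (i - 1) * delta) for group i = 1, 2, 3."""
    mu = np.zeros(p)
    mu[-1] = (i - 1) * delta
    return mu


def simulate_skewed_group(n, p, i, delta, rng):
    """Draw n correlated normal vectors for group i, then exponentiate them."""
    Z = rng.multivariate_normal(group_mean(p, i, delta), group_covariance(p, i), size=n)
    return np.exp(Z)  # element-wise exp -> multivariate log-normal (right-skewed)


if __name__ == "__main__":
    rng = np.random.default_rng(2009)
    n1, ratio, p, delta = 30, (1, 2, 3), 4, 1.0  # illustrative settings
    groups = [simulate_skewed_group(n1 * r, p, i, delta, rng)
              for i, r in enumerate(ratio, start=1)]
    print([g.shape for g in groups])
```

Taking `np.log` of a simulated group recovers the underlying normal sample. This gives a simple check that the means and covariances were set as intended.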
The QDF was then fitted in each case, and the leave-one-out method was used to estimate the proportion of misclassified observations. The factors considered in this study were:
1. The mean vector separator δ, set from 1 to 5, which determines the difference between the mean vectors.
2. The sample sizes. Fourteen values of $n_1$ were used: 30, 60, 100, 150, 200, 250, 300, 400, 500, 600, 700, 800, 1000 and 2000. The sample sizes $n_2$ and $n_3$ were determined by the sample size ratios 1:1:1, 1:2:2 and 1:2:3, and these ratios also determined the prior probabilities.
3. The number of variables, set at 4, 6 and 8 following Murray (1977).
4. The size of population 1 ($n_1$) was fixed throughout the study, and the sizes of populations 2 and 3 ($n_2$ and $n_3$) were determined by the sample size ratio under consideration.
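The design and the leave-one-out estimate described above can be sketched as follows. This is a minimal NumPy version, not the authors' MatLab code, and all function names are hypothetical. The sketch makes two assumptions. First, group sizes and priors are derived from the ratio as stated, so $n_2$ and $n_3$ are multiples of $n_1$ and the priors are proportional to the ratio. Second, a three-group observation is allocated to the group with the largest plug-in quadratic score $\ln p_i - \tfrac{1}{2}\ln|\Sigma_i| - \tfrac{1}{2}D_i^2(x)$. This is the standard $g$-group generalisation of the two-group rule and may differ from the two cut-off points the study mentions.

```python
import numpy as np


def group_sizes(n1, ratio):
    """n_2 and n_3 follow from n_1 and the ratio; assumes ratio[0] == 1."""
    return [n1 * r for r in ratio]


def ratio_priors(ratio):
    """Prior probabilities proportional to the sample size ratio."""
    ratio = np.asarray(ratio, dtype=float)
    return ratio / ratio.sum()


def qdf_scores(x, means, covs, priors):
    """Plug-in quadratic score ln p_i - 0.5 ln|S_i| - 0.5 D_i^2(x) per group."""
    scores = []
    for mu, S, p in zip(means, covs, priors):
        d = x - mu
        logdet = np.linalg.slogdet(S)[1]
        scores.append(np.log(p) - 0.5 * logdet - 0.5 * d @ np.linalg.solve(S, d))
    return np.array(scores)


def loo_error_rate(X, y, priors):
    """Leave-one-out proportion misclassified by the plug-in QDF.

    `priors` must be ordered like np.unique(y), e.g. labels 0, 1, 2.
    """
    groups = np.unique(y)
    wrong = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # drop observation i
        Xk, yk = X[keep], y[keep]
        means = [Xk[yk == g].mean(axis=0) for g in groups]
        covs = [np.cov(Xk[yk == g], rowvar=False) for g in groups]
        pred = groups[np.argmax(qdf_scores(X[i], means, covs, priors))]
        wrong += int(pred != y[i])
    return wrong / len(y)
```

Refitting the means and covariances for every held-out observation is simple but costs $O(n)$ refits. For the largest designs in the study, a rank-one update of the group estimates would be faster.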

Evaluating the performance of the QDF
Let $r$ denote the classification rule formed from individuals belonging to $p$-variate populations with mixture density $F$. The error rate can be defined as the overall probability of misclassification associated with the classification rule. The probability $e_{jk}(r, F)$ that $r$ allocates a random observation vector $X$ to $G_j$ when it belongs to $G_k$ is given by [McLachlan (1992) as cited in Kakaï and Pelz (2010)]

$$e_{jk}(r, F) = \Pr\{ r(X) = j \mid X \in G_k \}, \qquad j \neq k.$$
The overall error rate $e(r, F)$ associated with $r$ is

$$e(r, F) = \sum_{k=1}^{g} p_k \sum_{j \neq k} e_{jk}(r, F),$$

where $p_k$ $(k = 1, \ldots, g)$ is the prior probability of group $G_k$.
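In practice, $e_{jk}$ can be estimated by the proportion of observations from $G_k$ that the rule allocates to $G_j$. The overall error rate then follows from the prior-weighted sum. The following is a minimal sketch, assuming integer labels $0, \ldots, g-1$. The function name `error_rates` is hypothetical.

```python
import numpy as np


def error_rates(y_true, y_pred, priors):
    """Return the matrix e[j, k] and the overall error rate e(r, F).

    e[j, k] estimates the probability that the rule allocates an observation
    from G_k to G_j, i.e. the proportion of G_k allocated to G_j.
    Labels are assumed to be 0, ..., g - 1.
    """
    priors = np.asarray(priors, dtype=float)
    g = len(priors)
    e = np.zeros((g, g))
    for k in range(g):
        in_k = y_true == k
        for j in range(g):
            e[j, k] = np.mean(y_pred[in_k] == j)
    # sum_k p_k * sum_{j != k} e_jk  equals  sum_k p_k * (1 - e_kk)
    overall = float(np.sum(priors * (1.0 - np.diag(e))))
    return e, overall
```

When the priors equal the sample proportions $n_k / n$, as they do when the sampling ratios set the priors, $e(r, F)$ reduces to the plain proportion of misclassified observations.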

Results and discussion
This section presents and discusses the simulation results on the asymptotic performance of the QDF under skewed training samples.

Performance of QDF under varying sampling ratios
The results show that the average error rates for the sample size ratios 1:2:2 and 1:2:3 increased as the total sample size increased asymptotically in the skewed distribution for δ = 1 to 3, as shown in Figs. 1, 2 and 3. In Fig. 1, for δ = 1, the lowest error rates were recorded for equal sample sizes (1:1:1). The error rates decreased marginally across the numbers of variables. Performance improved with increasing Mahalanobis distance rather than with increasing sample size. The patterns of the error rates did not change significantly beyond δ = 3, as shown in Fig. 3. The average error rates for

Effects of number of variables on the performance of the QDF
The performance of the QDF changes with the number of variables. For the sample size ratio 1:1:1, the average error rates for each number of variables decreased and then curved upward as the total sample size increased, for all values of δ, as shown in Fig. 4. The average error rates for the sample ratios 1:2:2 and 1:2:3 behaved differently, as shown in Figs. 5 and 6. For δ = 1 and 2, Figs. 5 and 6 show that the average error rate of the QDF increased as the total sample size increased and decreased as the number of variables increased. For δ = 3 and 4 under the ratios 1:2:2 and 1:2:3, as the number of variables increased, the average error rate of the QDF dropped over total sample sizes of 150 to 300 and then increased as the sample size increased further, while for δ = 5 it decreased marginally. In general, the average error rate increased as the number of variables increased with increasing δ.

Effects of group centroid separator on the performance of QDF
For the sample size ratio 1:1:1 (Fig. 7), the average error rates of the skewed distribution for the individual δs generally decreased as the sample size increased. For the ratio 1:2:2 (Fig. 8), the error rates for the individual δs (centroid separators) increased marginally as the sample size increased. However, the performance of the QDF was quite poor when the centroid separator was set at δ = 1 compared with the other values of δ, since it recorded the highest error rate, 0.20, for each number of variables. As Fig. 8 also clearly indicates, the error rates of the QDF were minimised when the group centroid separator was set at δ = 5. Hence, increasing the group centroid separator reduces the misclassification rates, thereby improving the performance of the QDF under the sample ratio 1:2:2. Finally, the performance of the QDF was evaluated under the sampling ratio 1:2:3 for the three groups/populations $\pi_1$, $\pi_2$, $\pi_3$ with different group centroid separators, as shown in Fig. 9. Similar results were obtained: the QDF performed better with increasing group centroid separators, irrespective of the number of variables considered, although its performance also depended on the sample size.
Fig. 7 Average error rates of the skewed distribution for δ: $n_1:n_2:n_3$ = 1:1:1
Fig. 8 Average error rates of the skewed distribution for δ: $n_1:n_2:n_3$ = 1:2:2

Conclusion
This paper investigated the asymptotic performance of the QDF on skewed training data for three populations $(\pi_i, i = 1, 2, 3)$ with increasing group centroid separators (δ), different numbers of variables and different sample size ratios. The results indicate that the QDF performed quite poorly, with increasing error rates, under the sample ratios 1:2:2 and 1:2:3 for δ = 1 to δ = 3. The results also indicate that the QDF performs better under an equal sample size ratio (1:1:1), giving lower misclassification rates. The group error rates decreased as the group centroid separators increased. In other words, the QDF classified observations into their respective groups better when the group centroid separators were increased. Also, as the number of variables increased from 4 to 8, the average error rate of the QDF dropped under δ = 3 and 4 for the sample ratios 1:2:2 and 1:2:3. Overall, the study found that the reduction in misclassification error rates was consistently more pronounced with an increasing group centroid separator than with changing sample size ratios. The results obtained in this study (skewed distribution) show some agreement with Lachenbruch et al. (1977), who generated random samples by simulation under non-normal distributions. Johnson's system of transformations was applied component by component to the generated random samples. After the transformation, the QDF was derived, and its performance was evaluated by the estimated mean error rates, standard deviations and sample variability. In their study, the QDF recorded very high and increasing error rates and standard deviations under non-normality compared with its performance under normally distributed training samples.
In other words, they found that the QDF generally performs quite poorly on non-normal samples compared with its performance under a normal distribution.
Fig. 9 Average error rates of the skewed distribution for δ: $n_1:n_2:n_3$ = 1:2:3