
Choosing Between Two Classification Learning Algorithms Based on Calibrated Balanced \(5\times 2\) Cross-Validated F-Test

Neural Processing Letters

Abstract

The \(5\times 2\) cross-validated F-test, based on five independent replications of 2-fold cross-validation, is recommended for choosing between two classification learning algorithms. However, because a \(5\times 2\) cross-validation reuses the same data, the true degrees of freedom (DOF) of the test are lower than those of the F(10, 5) distribution given by Alpaydin (Neural Comput 11(8):1885–1892, [1]), which easily leads the test to suffer from high type I and type II errors. Moreover, the random partitions used in \(5\times 2\) cross-validation make the DOF of the test difficult to analyze. Wang et al. (Neural Comput 26(1):208–235, [2]) proposed a blocked \(3\times 2\) cross-validation that accounts for the correlation between any two 2-fold cross-validations. Building on this, the present study puts forward a calibrated balanced \(5\times 2\) cross-validated F-test following an F(7, 5) distribution, obtained by calibrating the DOF of the F(10, 5) distribution. Simulated and real data studies demonstrate that the calibrated balanced \(5\times 2\) cross-validated F-test has lower type I and type II errors than the \(5\times 2\) cross-validated F-test following F(10, 5) in most cases.
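To make the procedure concrete, the following is a minimal Python sketch (not from the paper) of how the calibrated test might be applied: five replications of 2-fold cross-validation yield ten per-fold differences in error rate between the two algorithms, from which the statistic of the Proposition in the Appendix is formed and referred to the calibrated F(f, 5) distribution. Plain shuffled splits stand in for the balanced partitions of [13], the two sklearn classifiers are arbitrary, and the correlation values \(\rho_1\) and \(\rho_2\) are illustrative placeholders (chosen so that \(f\approx 7\)); in practice they would be estimated.

```python
# Minimal sketch (not from the paper) of a calibrated balanced 5x2 CV F-test.
# ASSUMPTIONS: shuffled StratifiedKFold splits stand in for the paper's
# balanced partitions [13]; classifiers and rho values are illustrative.
import numpy as np
from scipy.stats import f as f_dist
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf_a, clf_b = GaussianNB(), DecisionTreeClassifier(random_state=0)

d = np.empty((5, 2))  # d[i, k]: error-rate difference on fold k of replication i
for i in range(5):
    skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=i)
    for k, (train, test) in enumerate(skf.split(X, y)):
        err_a = 1 - clone(clf_a).fit(X[train], y[train]).score(X[test], y[test])
        err_b = 1 - clone(clf_b).fit(X[train], y[train]).score(X[test], y[test])
        d[i, k] = err_a - err_b

rho1 = rho2 = 0.22  # placeholder correlations giving f ~ 7; estimate in practice
F = (1 - rho1) * (d ** 2).sum() / ((d[:, 0] - d[:, 1]) ** 2).sum()
dof = 10 / (1 + rho1 ** 2 + 8 * rho2 ** 2)  # calibrated numerator DOF, ~7 here
print(f"F = {F:.3f}, p = {f_dist.sf(F, dof, 5):.3f}")
```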


References

  1. Alpaydin E (1999) Combined \(5\times 2\) cv \(F\) test for comparing supervised classification learning algorithms. Neural Comput 11(8):1885–1892


  2. Wang Y, Wang R, Jia H, Li J (2014) Blocked \(3\times 2\) cross-validated t-test for comparing supervised classification learning algorithms. Neural Comput 26(1):208–235


  3. Bengio Y, Grandvalet Y (2004) No unbiased estimator of the variance of \(K\)-fold cross-validation. J Mach Learn Res 5:1089–1105


  4. Grandvalet Y, Bengio Y (2006) Hypothesis testing for cross-validation. Technical report. University of Montreal, Montreal

  5. Markatou M, Tian H, Biswas S, Hripcsak G (2005) Analysis of variance of cross-validation estimators of the generalization error. J Mach Learn Res 6:1127–1168


  6. Nadeau C, Bengio Y (2003) Inference for the generalization error. Mach Learn 52(3):239–281


  7. Dietterich T (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895–1924


  8. Yildiz OT (2013) Omnivariate rule induction using a novel pairwise statistical test. IEEE Trans Knowl Data Eng 25:2105–2118


  9. Chen W, Gallas BD, Yousef WA (2012) Classifier variability: accounting for training and testing. Pattern Recognit 45:2661–2671


  10. Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30


  11. Garcia S, Herrera F (2008) An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J Mach Learn Res 9:2677–2694


  12. Ulas A, Yildiz OT, Alpaydin E (2012) Cost-conscious comparison of supervised learning algorithms over multiple data sets. Pattern Recognit 45:1772–1781


  13. Wang Y, Li J, Li Y (2015) Measure for data partitioning in \(m\times 2\) cross-validation. Pattern Recognit Lett 65:211–217


  14. Yildiz OT, Alpaydin E (2006) Ordering and finding the best of \(K>2\) supervised learning algorithms. IEEE Trans Pattern Anal Mach Intell 28:392–402


  15. Bouckaert RR, Frank E (2004) Evaluating the replicability of significance tests for comparing learning algorithms. In: PAKDD 2004, LNAI 3056, pp 3–12

  16. Bouckaert RR (2003) Choosing between two learning algorithms based on calibrated tests. In: Proceedings of the twentieth international conference on machine learning. pp 51–58

  17. Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(2):121–167


  18. Brenneman WA, Nair VN (2001) Methods for identifying dispersion effects in unreplicated factorial experiments: a critical analysis and proposed strategies. Technometrics 43:388–404


  19. Satterthwaite FE (1946) An approximate distribution of estimates of variance components. Biom Bull 2:110–114



Acknowledgements

This work was supported by the National Natural Science Foundation and National Social Science Fund of China (61503228, 16BTJ034), the Natural Science Foundation of Shanxi Province (201601D011046), and the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase).

Author information


Corresponding author

Correspondence to Jihong Li.

Appendix

Proof of the Proposition

Denoting \(U=(\hat{\mu }_{B_1}^{(1)}, \hat{\mu }_{B_2}^{(1)}, \hat{\mu }_{B_1}^{(2)}, \hat{\mu }_{B_2}^{(2)}, \ldots , \hat{\mu }_{B_1}^{(5)}, \hat{\mu }_{B_2}^{(5)})^{T}\), we have \(U\sim N(0, \sigma ^{2}\Sigma )\) from the assumption of the Proposition, where

$$\begin{aligned} \Sigma = \left( \begin{array}{cccccc} 1 &{} \rho _{1} &{} \rho _{2} &{} \cdots &{} \rho _{2} &{} \rho _{2}\\ \rho _{1} &{} 1 &{} \rho _{2} &{} \cdots &{} \rho _{2} &{} \rho _{2}\\ \vdots &{} \vdots &{} \ddots &{} &{} \vdots &{} \vdots \\ \rho _{2} &{} \rho _{2} &{} \rho _{2} &{} \cdots &{} 1 &{} \rho _{1}\\ \rho _{2} &{} \rho _{2} &{} \rho _{2} &{} \cdots &{} \rho _{1} &{} 1 \end{array}\right) _{10\times 10}. \end{aligned}$$

The eigenvalues of \(\Sigma \) are obtained easily from \(|\lambda I-\Sigma |=0\): \(\lambda _{1}=1-\rho _{1}\) with multiplicity 5, \(\lambda _{6}=1+\rho _{1}-2\rho _{2}\) with multiplicity 4, and \(\lambda _{10}=1+\rho _{1}+8\rho _{2}\) with multiplicity 1. Thus, the real symmetric matrix \(\Sigma \) is positive definite when \(0\le \rho _{1}\le \rho _{2}<0.50\), since all three eigenvalues are then positive. From the decomposition \(\Sigma =\Sigma ^{\frac{1}{2}} \Sigma ^{\frac{1}{2}}\), \(\Sigma ^{\frac{1}{2}}\) is also positive definite.
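As a quick numerical check of this spectrum (an illustration, not part of the proof), one can construct \(\Sigma \) for sample values of \(\rho _{1}\) and \(\rho _{2}\) and compare numpy's eigenvalues with the closed forms above:

```python
# Numerical check of the eigenvalues of the 10x10 matrix Sigma
# for illustrative values of rho1 and rho2 (assumed, not from the paper).
import numpy as np

rho1, rho2 = 0.2, 0.3
Sigma = np.full((10, 10), rho2)
for i in range(5):                     # five 2x2 diagonal blocks
    Sigma[2*i:2*i+2, 2*i:2*i+2] = [[1.0, rho1], [rho1, 1.0]]

eigs = np.sort(np.linalg.eigvalsh(Sigma))
expected = np.sort([1 - rho1]*5 + [1 + rho1 - 2*rho2]*4 + [1 + rho1 + 8*rho2])
assert np.allclose(eigs, expected)     # 1-rho1 (x5), 1+rho1-2rho2 (x4), 1+rho1+8rho2 (x1)
```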

Let \(U_{1}=U/\sigma \) and \(Z=\Sigma ^{-\frac{1}{2}}U_{1}\); then \(U_{1}\sim N(0,\Sigma )\), \(Z\sim N(0, I_{10})\), and obviously \(U_{1}^{T}U_{1}=Z^{T}\Sigma Z\).

Every real symmetric matrix of order n can be diagonalized by an orthogonal matrix. Thus, an orthogonal matrix T exists such that \(T\Sigma T^{T}=\Lambda \), i.e., \(\Sigma =T^{T}\Lambda T\), where \(\Lambda \) is a diagonal matrix whose diagonal elements are the eigenvalues of \(\Sigma \).

From the properties of orthogonal matrices, \(TZ\sim N(0, I_{10})\); hence \(U_{1}^{T}U_{1}=Z^{T}\Sigma Z=Z^{T}T^{T} \Lambda TZ=\sum _{i=1}^{10}\lambda _{i}\eta _{i}^{2}\), where \(\lambda _{i}\) denotes the i-th eigenvalue of \(\Sigma \) and \(\eta _{i}\) is the i-th element of the vector TZ. By the Satterthwaite approximation, \(\sum _{i=1}^{10}\lambda _{i}\eta _{i}^{2}\), and therefore \(U_{1}^{T}U_{1}\), approximately follows a \(C \chi ^{2}(f)\) distribution, where

$$\begin{aligned} C=\frac{\sum _{i=1}^{10}\lambda _{i}^{2}}{\sum _{i=1}^{10}\lambda _{i}} =1+\rho _{1}^{2}+8\rho _{2}^{2}, f=\frac{(\sum _{i=1}^{10}\lambda _{i})^{2}}{\sum _{i=1}^{10}\lambda _{i}^{2}}=\frac{10}{1+\rho _{1}^{2}+8\rho _{2}^{2}} \end{aligned}$$

(see [18, 19]).

Note that \(\sum _{i=1}^{10}\lambda _{i}=10\), \(\sum _{i=1}^{10}\lambda _{i}^{2}=10(1+\rho _{1}^{2}+8\rho _{2}^{2})\), and hence \(fC=\sum _{i=1}^{10}\lambda _{i}=10\).
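These identities are easy to confirm numerically; the snippet below (again with illustrative \(\rho \) values) checks C, f, and fC = 10 against the eigenvalue spectrum:

```python
# Verify the Satterthwaite constants C and f from the eigenvalue spectrum.
rho1, rho2 = 0.2, 0.3                  # illustrative values, as above
lam = [1 - rho1]*5 + [1 + rho1 - 2*rho2]*4 + [1 + rho1 + 8*rho2]
s1, s2 = sum(lam), sum(x*x for x in lam)
C, f = s2 / s1, s1**2 / s2
assert abs(C - (1 + rho1**2 + 8*rho2**2)) < 1e-12
assert abs(f - 10 / (1 + rho1**2 + 8*rho2**2)) < 1e-12
assert abs(f * C - 10) < 1e-12         # fC = 10, as noted above
```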

Moreover, \(Var(\hat{\mu }_{B_1}^{(i)}-\hat{\mu }_{B_2}^{(i)})=2(1-\rho _{1})\sigma ^{2}\) and \(Cov(\hat{\mu }_{B_1}^{(i)}-\hat{\mu }_{B_2}^{(i)},\hat{\mu }_{B_1}^{(i')}-\hat{\mu }_{B_2}^{(i')}) = (\rho _{2}-\rho _{2}-\rho _{2}+\rho _{2})\sigma ^{2}=0\) for \(i\ne i'\). Letting \(U_{2}=((\hat{\mu }_{B_1}^{(1)}-\hat{\mu }_{B_2}^{(1)})/\sigma , \ldots , (\hat{\mu }_{B_1}^{(5)}-\hat{\mu }_{B_2}^{(5)})/\sigma )^{T}\), we have \(U_{2} \sim N(0,\Sigma _{2})\) and \(U_{2}^{T}U_{2} \sim 2(1-\rho _{1}) \chi ^{2}(5)\), where

$$\begin{aligned} \Sigma _{2}= \left( \begin{array}{cccc} 2(1-\rho _{1}) &{} 0 &{} \cdots &{} 0\\ 0 &{} 2(1-\rho _{1}) &{} \cdots &{} 0\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} 2(1-\rho _{1}) \end{array}\right) _{5\times 5} = 2(1-\rho _{1})I_{5}. \end{aligned}$$
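The diagonal form of \(\Sigma _{2}\) can also be verified by applying the within-block differencing contrast to \(\Sigma \) (illustrative values as before):

```python
# Check that differencing within blocks turns Sigma into 2(1-rho1) * I_5.
import numpy as np

rho1, rho2 = 0.2, 0.3                          # illustrative values, as above
Sigma = np.full((10, 10), rho2)
for i in range(5):
    Sigma[2*i:2*i+2, 2*i:2*i+2] = [[1.0, rho1], [rho1, 1.0]]

D = np.kron(np.eye(5), [1.0, -1.0])            # maps U to the five within-block differences
assert np.allclose(D @ Sigma @ D.T, 2 * (1 - rho1) * np.eye(5))
```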

Since \(U_{1}^{T}U_{1}/C\) approximately follows a \(\chi ^{2}(f)\) distribution and \(U_{2}^{T}U_{2}/(2(1-\rho _{1}))\sim \chi ^{2}(5)\), treating the two quadratic forms as independent gives

$$\begin{aligned} \frac{\sum _{i=1}^{5}\sum _{k=1}^{2}(\hat{\mu }_{B_k}^{(i)})^{2}/C / f}{\sum _{i=1}^{5}(\hat{\mu }_{B_1}^{(i)}-\hat{\mu }_{B_2}^{(i)})^{2}/(2(1-\rho _{1}))/ 5} =\frac{U_{1}^{T}U_{1}}{U_{2}^{T}U_{2}} \frac{10(1-\rho _{1})}{fC} =(1-\rho _{1})\frac{U_{1}^{T}U_{1}}{U_{2}^{T}U_{2}}\sim F(f,5) . \end{aligned}$$

We know that

$$\begin{aligned} {S_B}_{i}^2= & {} \sum _{k=1}^{2}(\hat{\mu }_{B_k}^{(i)}-\hat{\mu }_B^{(i)})^{2} =(\hat{\mu }_{B_1}^{(i)}-(\frac{\hat{\mu }_{B_1}^{(i)}+\hat{\mu }_{B_2}^{(i)}}{2}))^{2} +(\hat{\mu }_{B_2}^{(i)}-(\frac{\hat{\mu }_{B_1}^{(i)}+\hat{\mu }_{B_2}^{(i)}}{2}))^{2}\\= & {} \frac{(\hat{\mu }_{B_2}^{(i)}-\hat{\mu }_{B_1}^{(i)})^{2}}{2}, \end{aligned}$$

where \(\hat{\mu }_B^{(i)}=\frac{\hat{\mu }_{B_1}^{(i)}+\hat{\mu }_{B_2}^{(i)}}{2}\).

Therefore,

$$\begin{aligned} F=(1-\rho _{1})\frac{\sum _{i=1}^{5}\sum _{k=1}^{2}(\hat{\mu }_{B_k}^{(i)})^{2}}{\sum _{i=1}^{5}(\hat{\mu }_{B_1}^{(i)}-\hat{\mu }_{B_2}^{(i)})^{2}} =\frac{1-\rho _{1}}{2}\frac{\sum _{i=1}^{5}\sum _{k=1}^{2} (\hat{\mu }_{B_k}^{(i)})^{2}}{\sum _{i=1}^{5}{S_B}_{i}^{2}}\sim F(f,5), \end{aligned}$$

where \(f=\frac{10}{1+\rho _{1}^{2}+8\rho _{2}^{2}}.\) \(\square \)
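For concreteness, the final statistic translates directly into code. The sketch below (an illustration, not the paper's implementation) assumes the \(\hat{\mu }_{B_k}^{(i)}\) values are arranged in a 5×2 array and that \(\rho _{1}\) and \(\rho _{2}\) have been estimated elsewhere, e.g. following [2, 13]:

```python
# Sketch of the calibrated F statistic of the Proposition.
# mu_hat: 5x2 array of the estimates mu_hat_{B_k}^{(i)};
# rho1, rho2: correlation parameters, assumed estimated elsewhere.
import numpy as np
from scipy.stats import f as f_dist

def calibrated_f_test(mu_hat, rho1, rho2):
    mu_hat = np.asarray(mu_hat, dtype=float)
    s_b2 = (mu_hat[:, 1] - mu_hat[:, 0]) ** 2 / 2          # S_{B_i}^2
    F = (1 - rho1) / 2 * (mu_hat ** 2).sum() / s_b2.sum()  # statistic of the Proposition
    dof = 10 / (1 + rho1 ** 2 + 8 * rho2 ** 2)             # f; ~7 for the paper's calibration
    return F, dof, f_dist.sf(F, dof, 5)                    # p-value from F(f, 5)
```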

Cite this article

Wang, Y., Li, J. & Li, Y. Choosing Between Two Classification Learning Algorithms Based on Calibrated Balanced \(5\times 2\) Cross-Validated F-Test. Neural Process Lett 46, 1–13 (2017). https://doi.org/10.1007/s11063-016-9569-z