
Choosing Between Two Classification Learning Algorithms Based on Calibrated Balanced \(5\times 2\) Cross-Validated F-Test

Neural Processing Letters

Abstract

The \(5\times 2\) cross-validated F-test, based on five independent replications of 2-fold cross-validation, is recommended for choosing between two classification learning algorithms. However, because a \(5\times 2\) cross-validation reuses the same data, the true degrees of freedom (DOF) of the test are lower than those of the F(10, 5) distribution given by Alpaydin (Neural Comput 11(8):1885–1892, [1]), which easily leads the test to suffer from high type I and type II errors. Moreover, the random partitions used in \(5\times 2\) cross-validation make the DOF of the test difficult to analyze. Wang et al. (Neural Comput 26(1):208–235, [2]) proposed a blocked \(3\times 2\) cross-validation that accounts for the correlation between any two 2-fold cross-validations. Building on this, the present study puts forward a calibrated balanced \(5\times 2\) cross-validated F-test following an F(7, 5) distribution, obtained by calibrating the DOF of the F(10, 5) distribution. Simulated and real data studies demonstrate that the calibrated balanced \(5\times 2\) cross-validated F-test has lower type I and type II errors than the \(5\times 2\) cross-validated F-test following F(10, 5) in most cases.
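To make the procedure concrete, the following is a minimal Python sketch (not from the paper) of how the calibrated test might be applied: five replications of 2-fold cross-validation yield ten per-fold differences in error rate between the two algorithms, from which the statistic of the Proposition in the Appendix is formed and referred to the calibrated F(f, 5) distribution. Plain shuffled splits stand in for the balanced partitions of [13], the two sklearn classifiers are arbitrary, and the correlation values \(\rho_1\) and \(\rho_2\) are illustrative placeholders (chosen so that \(f\approx 7\)); in practice they would be estimated.

```python
# Minimal sketch (not from the paper) of a calibrated balanced 5x2 CV F-test.
# ASSUMPTIONS: shuffled StratifiedKFold splits stand in for the paper's
# balanced partitions [13]; classifiers and rho values are illustrative.
import numpy as np
from scipy.stats import f as f_dist
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf_a, clf_b = GaussianNB(), DecisionTreeClassifier(random_state=0)

d = np.empty((5, 2))  # d[i, k]: error-rate difference on fold k of replication i
for i in range(5):
    skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=i)
    for k, (train, test) in enumerate(skf.split(X, y)):
        err_a = 1 - clone(clf_a).fit(X[train], y[train]).score(X[test], y[test])
        err_b = 1 - clone(clf_b).fit(X[train], y[train]).score(X[test], y[test])
        d[i, k] = err_a - err_b

rho1 = rho2 = 0.22  # placeholder correlations giving f ~ 7; estimate in practice
F = (1 - rho1) * (d ** 2).sum() / ((d[:, 0] - d[:, 1]) ** 2).sum()
dof = 10 / (1 + rho1 ** 2 + 8 * rho2 ** 2)  # calibrated numerator DOF, ~7 here
print(f"F = {F:.3f}, p = {f_dist.sf(F, dof, 5):.3f}")
```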


References

  1. Alpaydin E (1999) Combined \(5\times 2\) cv \(F\) test for comparing supervised classification learning algorithms. Neural Comput 11(8):1885–1892


  2. Wang Y, Wang R, Jia H, Li J (2014) Blocked \(3\times 2\) cross-validated t-test for comparing supervised classification learning algorithms. Neural Comput 26(1):208–235


  3. Bengio Y, Grandvalet Y (2004) No unbiased estimator of the variance of \(K\)-fold cross-validation. J Mach Learn Res 5:1089–1105


  4. Grandvalet Y, Bengio Y (2006) Hypothesis testing for cross-validation. Technical report. University of Montreal, Montreal

  5. Markatou M, Tian H, Biswas S, Hripcsak G (2005) Analysis of variance of cross-validation estimators of the generalization error. J Mach Learn Res 6:1127–1168


  6. Nadeau C, Bengio Y (2003) Inference for the generalization error. Mach Learn 52(3):239–281


  7. Dietterich T (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895–1924


  8. Yildiz OT (2013) Omnivariate rule induction using a novel pairwise statistical test. IEEE Trans Knowl Data Eng 25:2105–2118


  9. Chen W, Gallas BD, Yousef WA (2012) Classifier variability: accounting for training and testing. Pattern Recognit 45:2661–2671


  10. Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30


  11. Garcia S, Herrera F (2008) An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J Mach Learn Res 9:2677–2694


  12. Ulas A, Yildiz OT, Alpaydin E (2012) Cost-conscious comparison of supervised learning algorithms over multiple data sets. Pattern Recognit 45:1772–1781


  13. Wang Y, Li J, Li Y (2015) Measure for data partitioning in \(m\times 2\) cross-validation. Pattern Recognit Lett 65:211–217


  14. Yildiz OT, Alpaydin E (2006) Ordering and finding the best of \(K>2\) supervised learning algorithms. IEEE Trans Pattern Anal Mach Intell 28:392–402


  15. Bouckaert RR, Frank E (2004) Evaluating the replicability of significance tests for comparing learning algorithms. In: PAKDD 2004, LNAI 3056, pp 3–12

  16. Bouckaert RR (2003) Choosing between two learning algorithms based on calibrated tests. In: Proceedings of the twentieth international conference on machine learning. pp 51–58

  17. Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(2):121–167


  18. Brenneman WA, Nair VN (2001) Methods for identifying dispersion effects in unreplicated factorial experiments: a critical analysis and proposed strategies. Technometrics 43:388–404


  19. Satterthwaite FE (1946) An approximate distribution of estimates of variance components. Biom Bull 2:110–114



Acknowledgements

This work was supported by the National Natural Science Foundation and National Social Science Fund of China (61503228, 16BTJ034), the Natural Science Foundation of Shanxi Province (201601D011046), and the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase).

Author information


Corresponding author

Correspondence to Jihong Li.

Appendix

Proof of the Proposition

Denoting \(U=(\hat{\mu }_{B_1}^{(1)}, \hat{\mu }_{B_2}^{(1)}, \hat{\mu }_{B_1}^{(2)}, \hat{\mu }_{B_2}^{(2)}, \ldots , \hat{\mu }_{B_1}^{(5)}, \hat{\mu }_{B_2}^{(5)})^{T}\), we have \(U\sim N(0, \sigma ^{2}\Sigma )\) from the assumption of the Proposition, where

$$\begin{aligned} \Sigma = \left( \begin{array}{cccccc} 1 &{} \rho _{1} &{} \rho _{2} &{} \cdots &{} \rho _{2} &{} \rho _{2}\\ \rho _{1} &{} 1 &{} \rho _{2} &{} \cdots &{} \rho _{2} &{} \rho _{2}\\ \vdots &{} \vdots &{} \ddots &{} &{} \vdots &{} \vdots \\ \rho _{2} &{} \rho _{2} &{} \rho _{2} &{} \cdots &{} 1 &{} \rho _{1}\\ \rho _{2} &{} \rho _{2} &{} \rho _{2} &{} \cdots &{} \rho _{1} &{} 1 \end{array}\right) _{10\times 10}. \end{aligned}$$

The eigenvalues of \(\Sigma \) are obtained easily from \(|\lambda I-\Sigma |=0\): \(\lambda _{1}=1-\rho _{1}\) with multiplicity 5, \(\lambda _{6}=1+\rho _{1}-2\rho _{2}\) with multiplicity 4, and \(\lambda _{10}=1+\rho _{1}+8\rho _{2}\) with multiplicity 1. Thus, the real symmetric matrix \(\Sigma \) is positive definite when \(0\le \rho _{1}\le \rho _{2}<0.50\), since all three eigenvalues are then positive. From the decomposition \(\Sigma =\Sigma ^{\frac{1}{2}} \Sigma ^{\frac{1}{2}}\), \(\Sigma ^{\frac{1}{2}}\) is also positive definite.
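As a quick numerical check of this spectrum (an illustration, not part of the proof), one can construct \(\Sigma \) for sample values of \(\rho _{1}\) and \(\rho _{2}\) and compare numpy's eigenvalues with the closed forms above:

```python
# Numerical check of the eigenvalues of the 10x10 matrix Sigma
# for illustrative values of rho1 and rho2 (assumed, not from the paper).
import numpy as np

rho1, rho2 = 0.2, 0.3
Sigma = np.full((10, 10), rho2)
for i in range(5):                     # five 2x2 diagonal blocks
    Sigma[2*i:2*i+2, 2*i:2*i+2] = [[1.0, rho1], [rho1, 1.0]]

eigs = np.sort(np.linalg.eigvalsh(Sigma))
expected = np.sort([1 - rho1]*5 + [1 + rho1 - 2*rho2]*4 + [1 + rho1 + 8*rho2])
assert np.allclose(eigs, expected)     # 1-rho1 (x5), 1+rho1-2rho2 (x4), 1+rho1+8rho2 (x1)
```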

Let \(U_{1}=U/\sigma \) and \(Z=\Sigma ^{-\frac{1}{2}}U_{1}\); then \(U_{1}\sim N(0,\Sigma )\), \(Z\sim N(0, I_{10})\), and obviously \(U_{1}^{T}U_{1}=Z^{T}\Sigma Z\).

Every real symmetric matrix of order n can be diagonalized by an orthogonal matrix. Thus, an orthogonal matrix T exists such that \(T\Sigma T^{T}=\Lambda \), i.e., \(\Sigma =T^{T}\Lambda T\), where \(\Lambda \) is a diagonal matrix whose diagonal elements are the eigenvalues of \(\Sigma \).

From the properties of orthogonal matrices, \(TZ\sim N(0, I_{10})\); hence \(U_{1}^{T}U_{1}=Z^{T}\Sigma Z=Z^{T}T^{T} \Lambda TZ=\sum _{i=1}^{10}\lambda _{i}\eta _{i}^{2}\), where \(\lambda _{i}\) denotes the i-th eigenvalue of \(\Sigma \) and \(\eta _{i}\) is the i-th element of the vector TZ. By the Satterthwaite approximation, \(\sum _{i=1}^{10}\lambda _{i}\eta _{i}^{2}\), and therefore \(U_{1}^{T}U_{1}\), approximately follows a \(C \chi ^{2}(f)\) distribution, where

$$\begin{aligned} C=\frac{\sum _{i=1}^{10}\lambda _{i}^{2}}{\sum _{i=1}^{10}\lambda _{i}} =1+\rho _{1}^{2}+8\rho _{2}^{2}, f=\frac{(\sum _{i=1}^{10}\lambda _{i})^{2}}{\sum _{i=1}^{10}\lambda _{i}^{2}}=\frac{10}{1+\rho _{1}^{2}+8\rho _{2}^{2}} \end{aligned}$$

(see [18, 19]).

Note that \(\sum _{i=1}^{10}\lambda _{i}=10\), \(\sum _{i=1}^{10}\lambda _{i}^{2}=10(1+\rho _{1}^{2}+8\rho _{2}^{2})\), and hence \(fC=\sum _{i=1}^{10}\lambda _{i}=10\).
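These identities are easy to confirm numerically; the snippet below (again with illustrative \(\rho \) values) checks C, f, and fC = 10 against the eigenvalue spectrum:

```python
# Verify the Satterthwaite constants C and f from the eigenvalue spectrum.
rho1, rho2 = 0.2, 0.3                  # illustrative values, as above
lam = [1 - rho1]*5 + [1 + rho1 - 2*rho2]*4 + [1 + rho1 + 8*rho2]
s1, s2 = sum(lam), sum(x*x for x in lam)
C, f = s2 / s1, s1**2 / s2
assert abs(C - (1 + rho1**2 + 8*rho2**2)) < 1e-12
assert abs(f - 10 / (1 + rho1**2 + 8*rho2**2)) < 1e-12
assert abs(f * C - 10) < 1e-12         # fC = 10, as noted above
```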

Moreover, \(Var(\hat{\mu }_{B_1}^{(i)}-\hat{\mu }_{B_2}^{(i)})=2(1-\rho _{1})\sigma ^{2}\) and \(Cov(\hat{\mu }_{B_1}^{(i)}-\hat{\mu }_{B_2}^{(i)},\hat{\mu }_{B_1}^{(i')}-\hat{\mu }_{B_2}^{(i')}) = (\rho _{2}-\rho _{2}-\rho _{2}+\rho _{2})\sigma ^{2}=0\) for \(i\ne i'\). Letting \(U_{2}=((\hat{\mu }_{B_1}^{(1)}-\hat{\mu }_{B_2}^{(1)})/\sigma , \ldots , (\hat{\mu }_{B_1}^{(5)}-\hat{\mu }_{B_2}^{(5)})/\sigma )^{T}\), we have \(U_{2} \sim N(0,\Sigma _{2})\) and \(U_{2}^{T}U_{2} \sim 2(1-\rho _{1}) \chi ^{2}(5)\), where

$$\begin{aligned} \Sigma _{2}= \left( \begin{array}{cccc} 2(1-\rho _{1}) &{} 0 &{} \cdots &{} 0\\ 0 &{} 2(1-\rho _{1}) &{} \cdots &{} 0\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} 2(1-\rho _{1}) \end{array}\right) _{5\times 5} = 2(1-\rho _{1})I_{5}. \end{aligned}$$
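The diagonal form of \(\Sigma _{2}\) can also be verified by applying the within-block differencing contrast to \(\Sigma \) (illustrative values as before):

```python
# Check that differencing within blocks turns Sigma into 2(1-rho1) * I_5.
import numpy as np

rho1, rho2 = 0.2, 0.3                          # illustrative values, as above
Sigma = np.full((10, 10), rho2)
for i in range(5):
    Sigma[2*i:2*i+2, 2*i:2*i+2] = [[1.0, rho1], [rho1, 1.0]]

D = np.kron(np.eye(5), [1.0, -1.0])            # maps U to the five within-block differences
assert np.allclose(D @ Sigma @ D.T, 2 * (1 - rho1) * np.eye(5))
```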

Since \(U_{1}^{T}U_{1}/C\) approximately follows a \(\chi ^{2}(f)\) distribution and \(U_{2}^{T}U_{2}/(2(1-\rho _{1}))\sim \chi ^{2}(5)\), treating the two quadratic forms as independent gives

$$\begin{aligned} \frac{\sum _{i=1}^{5}\sum _{k=1}^{2}(\hat{\mu }_{B_k}^{(i)})^{2}/C / f}{\sum _{i=1}^{5}(\hat{\mu }_{B_1}^{(i)}-\hat{\mu }_{B_2}^{(i)})^{2}/(2(1-\rho _{1}))/ 5} =\frac{U_{1}^{T}U_{1}}{U_{2}^{T}U_{2}} \frac{10(1-\rho _{1})}{fC} =(1-\rho _{1})\frac{U_{1}^{T}U_{1}}{U_{2}^{T}U_{2}}\sim F(f,5) . \end{aligned}$$

We know that

$$\begin{aligned} {S_B}_{i}^2= & {} \sum _{k=1}^{2}(\hat{\mu }_{B_k}^{(i)}-\hat{\mu }_B^{(i)})^{2} =(\hat{\mu }_{B_1}^{(i)}-(\frac{\hat{\mu }_{B_1}^{(i)}+\hat{\mu }_{B_2}^{(i)}}{2}))^{2} +(\hat{\mu }_{B_2}^{(i)}-(\frac{\hat{\mu }_{B_1}^{(i)}+\hat{\mu }_{B_2}^{(i)}}{2}))^{2}\\= & {} \frac{(\hat{\mu }_{B_2}^{(i)}-\hat{\mu }_{B_1}^{(i)})^{2}}{2}, \end{aligned}$$

where \(\hat{\mu }_B^{(i)}=\frac{\hat{\mu }_{B_1}^{(i)}+\hat{\mu }_{B_2}^{(i)}}{2}\).

Therefore,

$$\begin{aligned} F=(1-\rho _{1})\frac{\sum _{i=1}^{5}\sum _{k=1}^{2}(\hat{\mu }_{B_k}^{(i)})^{2}}{\sum _{i=1}^{5}(\hat{\mu }_{B_1}^{(i)}-\hat{\mu }_{B_2}^{(i)})^{2}} =\frac{1-\rho _{1}}{2}\frac{\sum _{i=1}^{5}\sum _{k=1}^{2} (\hat{\mu }_{B_k}^{(i)})^{2}}{\sum _{i=1}^{5}{S_B}_{i}^{2}}\sim F(f,5), \end{aligned}$$

where \(f=\frac{10}{1+\rho _{1}^{2}+8\rho _{2}^{2}}.\) \(\square \)
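For concreteness, the final statistic translates directly into code. The sketch below (an illustration, not the paper's implementation) assumes the \(\hat{\mu }_{B_k}^{(i)}\) values are arranged in a 5×2 array and that \(\rho _{1}\) and \(\rho _{2}\) have been estimated elsewhere, e.g. following [2, 13]:

```python
# Sketch of the calibrated F statistic of the Proposition.
# mu_hat: 5x2 array of the estimates mu_hat_{B_k}^{(i)};
# rho1, rho2: correlation parameters, assumed estimated elsewhere.
import numpy as np
from scipy.stats import f as f_dist

def calibrated_f_test(mu_hat, rho1, rho2):
    mu_hat = np.asarray(mu_hat, dtype=float)
    s_b2 = (mu_hat[:, 1] - mu_hat[:, 0]) ** 2 / 2          # S_{B_i}^2
    F = (1 - rho1) / 2 * (mu_hat ** 2).sum() / s_b2.sum()  # statistic of the Proposition
    dof = 10 / (1 + rho1 ** 2 + 8 * rho2 ** 2)             # f; ~7 for the paper's calibration
    return F, dof, f_dist.sf(F, dof, 5)                    # p-value from F(f, 5)
```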

Cite this article

Wang, Y., Li, J. & Li, Y. Choosing Between Two Classification Learning Algorithms Based on Calibrated Balanced \(5\times 2\) Cross-Validated F-Test. Neural Process Lett 46, 1–13 (2017). https://doi.org/10.1007/s11063-016-9569-z