Learning vector quantization classifiers for ROC-optimization

Villmann, T.; Kaden, M.; Hermann, W.; Biehl, M.

doi:10.1007/s00180-016-0678-y

Original Paper
Published: 27 August 2016

Computational Statistics, volume 33, pages 1173–1194 (2018)

T. Villmann¹,
M. Kaden¹,
W. Hermann² &
…
M. Biehl³

Abstract

This paper proposes a variant of the generalized learning vector quantizer (GLVQ) that explicitly optimizes the area under the receiver operating characteristic (ROC) curve for binary classification problems instead of the classification accuracy, which is frequently inappropriate for classifier evaluation. This is particularly important for overlapping class distributions, where the user has to decide on the trade-off between a high true-positive rate and a low false-positive rate. The model retains the idea of prototype-based learning vector quantization trained by stochastic gradient descent. For this purpose, a GLVQ-based cost function is presented that describes the area under the ROC curve in terms of a sum of local discriminant functions. This cost function reflects the rank statistics underlying ROC analysis, which enter the design of the prototype-based discriminant function. The resulting learning scheme for the prototype vectors uses structured inputs, i.e. ordered pairs of data vectors from the two classes.
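
The link between the area under the ROC curve and ordered pairs of data vectors can be illustrated with a minimal Python sketch. It is not the authors' cost function: the function names, the squared Euclidean distance, the sigmoid smoothing and its steepness `theta` are assumptions chosen for illustration only. The sketch scores each sample with a GLVQ-style discriminant, counts correctly ordered (positive, negative) pairs in the sense of the Wilcoxon–Mann–Whitney statistic, and shows one possible smooth surrogate of that count.

```python
import numpy as np

def glvq_score(x, protos_pos, protos_neg):
    """GLVQ-style discriminant in [-1, 1]; larger means 'more positive'.

    Assumption: squared Euclidean distance to the closest prototype of
    each class. Undefined if x coincides with prototypes of both classes.
    """
    d_pos = np.min(np.sum((protos_pos - x) ** 2, axis=1))
    d_neg = np.min(np.sum((protos_neg - x) ** 2, axis=1))
    return (d_neg - d_pos) / (d_neg + d_pos)

def pairwise_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))

def soft_auc(scores_pos, scores_neg, theta=10.0):
    """Sigmoid-smoothed pair count (illustrative surrogate, differentiable)."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(1.0 / (1.0 + np.exp(-theta * diff)))

if __name__ == "__main__":
    # Hypothetical toy data: one prototype per class in two dimensions.
    protos_pos = np.array([[1.0, 1.0]])
    protos_neg = np.array([[-1.0, -1.0]])
    x_pos = np.array([[0.8, 1.2], [-0.2, 0.1]])
    x_neg = np.array([[-1.1, -0.7], [0.3, -0.1]])
    s_pos = np.array([glvq_score(x, protos_pos, protos_neg) for x in x_pos])
    s_neg = np.array([glvq_score(x, protos_pos, protos_neg) for x in x_neg])
    print(pairwise_auc(s_pos, s_neg), soft_auc(s_pos, s_neg))
```

Because the smoothed count depends on the prototypes only through score differences of ordered pairs, its gradient could be followed by stochastic gradient descent over such pairs, which is the general idea the abstract describes.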

Notes

To ensure convergence of stochastic gradient descent learning (SGDL), the learning rate must satisfy \(\sum _{t=1}^{\infty }\varepsilon \left( t\right) =\infty \) and \(\sum _{t=1}^{\infty }\varepsilon ^{2}\left( t\right) <\infty \) (Robbins and Monro 1951; Graf and Luschgy 2000). In practice, the learning rate is frequently set to a small positive constant, i.e. \(\varepsilon \left( t\right) =\varepsilon \ll 1\), which also yields convergent behavior without loss of quality (Haykin 1994). Unless stated otherwise, we use this latter option.
The algorithm is implemented in MATLAB™ as the ROCGMLVQ package (version 1.7). It is available from the authors on request or via the webpage https://www.cb.hs-mittweida.de/webs/villmann/research/tools-data.html.
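
The two learning-rate options in the first note can be compared with a small Python sketch. It is unrelated to the ROCGMLVQ package: the schedule \(\varepsilon(t)=\varepsilon_0/(1+t/\tau)\), the parameter values and the toy objective are assumptions chosen for illustration. The decaying schedule satisfies both summability conditions, while the constant rate is the practical shortcut mentioned in the note.

```python
def decaying_rate(t, eps0=0.5, tau=10.0):
    """eps(t) = eps0 / (1 + t / tau) (assumed schedule).

    Behaves like 1/t for large t, so the sum of eps(t) diverges
    while the sum of eps(t)^2 stays finite.
    """
    return eps0 / (1.0 + t / tau)

def constant_rate(t, eps=0.01):
    """Small constant learning rate, eps << 1 (assumed value)."""
    return eps

def sgd_mean(samples, rate):
    """Stochastic gradient descent on the toy cost E[(w - x)^2] / 2.

    The gradient for one sample x is (w - x), so each step moves the
    estimate w towards the current sample by a fraction rate(t).
    """
    w = 0.0
    for t, x in enumerate(samples, start=1):
        w -= rate(t) * (w - x)
    return w

if __name__ == "__main__":
    # Hypothetical noisy stream alternating around the value 2.0.
    samples = [2.0 + (0.5 if t % 2 else -0.5) for t in range(2000)]
    print(sgd_mean(samples, decaying_rate))
    print(sgd_mean(samples, constant_rate))
```

With the decaying schedule the fluctuations of the estimate shrink over time, whereas a constant rate leaves a residual fluctuation whose size scales with the chosen constant.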

References

Ataman K, Street WN, Zhang Y (2006) Learning to rank by maximizing AUC with linear programming. In: Proceedings of the IEEE international joint conference on neural networks (IJCNN). IEEE Press, pp 123–129
Baldi P, Brunak S, Chauvin Y, Andersen C, Nielsen H (2000) Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 16(5):412–424
Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127
Berger JO (1993) Statistical decision theory and Bayesian analysis. Springer series in statistics, 3rd edn. Springer, New York
Biehl M, Hammer B, Merényi E, Sperduti A, Villmann T (2011) Learning in the context of very high dimensional data (Dagstuhl Seminar 11341). Dagstuhl Rep 1(8):67–95
Biehl M, Kaden M, Stürmer P, Villmann T (2014) ROC-optimization and statistical quality measures in learning vector quantization classifiers. Mach Learn Rep, 8(MLR-01-2014):23–34, ISSN:1865-3960, http://www.techfak.uni-bielefeld.de/~fschleif/mlr/mlr_01_2014.pdf
Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
Blake CL, Merz CJ (1998) UCI repository of machine learning databases. University of California, Dep. of Information and Computer Science, Irvine. http://www.ics.edu/mlearn/MLRepository.html
Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn 30(7):1149–1155
Brefeld U, Scheffer T (2005) AUC maximizing support vector learning. In: Proceedings of ICML 2005 workshop on ROC analysis in machine learning, pp 377–384
Calders T, Jaroszewicz S (2007) Efficient AUC optimization for classification. In: Kok JN, Koronacki J, de Mantaras R Lopez, Matwin S, Mladenic D, Skowron A (eds) Knowledge discovery in databases: PKDD 2007, volume 4702 of LNCS. Springer-Verlag, Berlin, pp 42–53
Cortes C, Vapnik V (1995) Support vector network. Mach Learn 20:1–20
Crammer K, Gilad-Bachrach R, Navot A, Tishby N (2003) Margin analysis of the LVQ algorithm. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing (Proc. NIPS 2002), vol 15. MIT Press, Cambridge, pp 462–469
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other Kernel-based learning methods. Cambridge University Press, Cambridge
Duda RO, Hart PE (1973) Pattern classification and scene analysis. Wiley, New York
Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27:861–874
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188
Graf S, Luschgy H (2000) Foundations of quantization for random vectors. LNM-1730. Springer, Berlin
Güvenir HA, Kurtcephe M (2013) Ranking instances by maximizing the area under ROC curve. IEEE Trans Knowl Data Eng 25(10):2356–2366
Hammer B, Strickert M, Villmann T (2005) On the generalization ability of GRLVQ networks. Neural Process Lett 21(2):109–120
Hammer B, Nebel D, Riedel M, Villmann T (2014) Generative versus discriminative prototype based classification. In: Villmann T, Schleif F-M, Kaden M, Lange M (eds) Advances in self-organizing maps and learning vector quantization: proceedings of 10th international workshop WSOM 2014, Mittweida, volume 295 of advances in intelligent systems and computing. Springer, Berlin, pp 123–132
Hammer B, Villmann T (2002) Generalized relevance learning vector quantization. Neural Netw 15(8–9):1059–1068
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic. Radiology 143:29–36
Hanley JA, McNeil BJ (1983) A method of comparing the area under receiver operating characteristic curves derived from the same case. Radiology 148(3):839–843
Haykin S (1994) Neural networks. A comprehensive foundation. Macmillan, New York
Hermann W, Barthel H, Hesse S, Villmann Th, Wagner A (2002) Korrelation der motorisch evozierten Potentiale mit dem striatalen Glukosestoffwechsel bei Patienten mit einem Morbus Wilson. Aktuelle Neurol 5:242–246
Hermann W, Barthel H, Hesse S, Grahmann F, Kühn H-J, Wagner A, Villmann Th (2002) Comparison of clinical types of Wilson’s disease and glucose metabolism in extrapyramidal motor brain regions. J Neurol 249(7):896–901
Hermann W, Villmann Th, Grahmann F, Kühn HJ, Wagner A (2003) Investigation of fine motoric disturbances in Wilson’s disease. Neurol Sci 23(6):279–285
Herschtal A, Raskutti B (2004) Optimising area under the ROC curve using gradient descent. In: Proceedings of the 21st international conference on machine learning. Banff, pp 49–56
Huaichun W, Dopazo J, Carazo JM (1998) Self-organizing tree growing network for classifying amino acids. Bioinformatics 14(4):376–377
Huang J, Ling CX (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 17(3):299–310
Kaden M, Hermann W, Villmann T (2014) Optimization of general statistical accuracy measures for classification based on learning vector quantization. In: Verleysen M (ed) Proceedings of European symposium on artificial neural networks, computational intelligence and machine learning (ESANN’2014). Louvain-La-Neuve, Belgium, pp 47–52
Kaden M, Lange M, Nebel D, Riedel M, Geweniger T, Villmann T (2014) Aspects in classification learning—review of recent developments in learning vector quantization. Found Comput Decis Sci 39(2):79–105
Kaden M, Riedel M, Hermann W, Villmann T (2015) Border-sensitive learning in generalized learning vector quantization: an alternative to support vector machines. Soft Comput 19(9):2423–2434
Kästner M, Riedel M, Strickert M, Hermann W, Villmann T (2013) Border-sensitive learning in kernelized learning vector quantization. In: Rojas I, Joya G, Cabestany J (eds) Proceedings of the 12th international workshop on artificial neural networks (IWANN), volume 7902 of LNCS. Springer, Berlin, pp 357–366
Keilwagen J, Grosse I, Grau J (2014) Area under precision-recall curves for weighted and unweighted data. PLoS One 9(3):1–13
Kohonen T (1990) Improved versions of learning vector quantization. In: Proceedings of IJCNN-90, international joint conference on neural networks, vol I. Piscataway, IEEE Service Center, San Diego, pp 545–550
Kohonen T (1986) Learning vector quantization for pattern recognition. Report TKK-F-A601, Helsinki University of Technology, Espoo
Kohonen T (1988) Learning vector quantization. Neural Netw 1(Supplement 1):303
Kohonen T (1992) Learning-vector quantization and the self-organizing map. In: Taylor JG, Mannion CLT (eds) Theory and applications of neural networks. Springer, London, pp 235–242
Kohonen T (1995) Self-organizing maps, volume 30 of Springer series in information sciences. Springer, Berlin, Heidelberg (Second Extended Edition 1997)
Landgrebe TCW, Tax D, Paclík P, Duin RPW (2006) The interaction between classification and reject performance for distance-based reject-option classifiers. Pattern Recogn Lett 27:908–917
Lasko TA, Bhagwat JG, Zou KH, Ohno-Machado L (2005) The use of receiver operating characteristic curves in biomedical informatics. J Biomed Inf 38:404–415
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444
Mann HB, Whitney DR (1947) On a test whether one of two random variables is stochastically larger than the other. Ann Math Stat 18:50–60
McLachlan GJ (1992) Discriminant analysis and statistical pattern recognition. Wiley series in probability and mathematical statistics: applied probability and statistics. Wiley, New York
Mitchell T (1997) Machine learning. McGraw-Hill, New York
Nebel D, Villmann T (2015) Median-LVQ for classification of dissimilarity data based on ROC-optimization. In: Verleysen M (ed) Proceedings of the European symposium on artificial neural networks, computational intelligence and machine learning (ESANN’2015). Louvain-La-Neuve, Belgium, pp 1–6
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco
Rakotomamonjy A (2004) Optimizing area under ROC curve with SVMs. In: Proceedings of the workshop on ROC analysis in artificial intelligence, Hamburg, pp 71–80
Rijsbergen CJ (1979) Information retrieval, 2nd edn. Butterworths, London
Robbins H, Monro S (1951) A stochastic approximation method. Ann Math Stat 22:400–407
Sachs L (1992) Angewandte Statistik, 7th edn. Springer Verlag, Berlin
Santos-Pereira CM, Pires AM (2005) On optimal reject rules and ROC curves. Pattern Recogn Lett 26:943–952
Sato A, Yamada K (1996) Generalized learning vector quantization. In: Touretzky DS, Mozer MC, Hasselmo ME (eds) Advances in neural information processing systems 8. Proceedings of the 1995 conference. MIT Press, Cambridge, pp 423–429
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
Schneider P, Hammer B, Biehl M (2009) Adaptive relevance matrices in learning vector quantization. Neural Comput 21:3532–3561
Schölkopf B, Smola A (2002) Learning with Kernels. MIT Press, Cambridge
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis and discovery. Cambridge University Press, Cambridge
Steinwart I (2001) On the influence of the kernel on the consistency of support vector machines. J Mach Learn Res 2:67–93
Strickert M, Schleif F-M, Seiffert U, Villmann T (2008) Derivatives of Pearson correlation for gradient-based analysis of biomedical data. Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial 37:37–44
Strickert M, Keilwagen J, Schleif F-M, Villmann T, Biehl M (2009) Matrix metric adaptation linear discriminant analysis of biomedical data. In: Cabestany J et al (eds) Proceedings international workshop on artificial neural networks (IWANN) 2009, volume 5517 of LNCS. Springer, Heidelberg, pp 933–940
Vapnik V (1998) Statistical learning theory. Wiley, New York
Villmann T, Haase S, Kaden M (2015) Kernelized vector quantization in gradient-descent learning. Neurocomputing 147:83–95
Villmann T, Kaden M, Nebel D, Biehl M (2015) Learning vector quantization with adaptive cost-based outlier-rejection. In: Azzopardi G, Petkov N (eds) Proceedings of 16th international conference on computer analysis of images and patterns, CAIP 2015, Valletta, Malta, volume Part II of LNCS 9257. Springer, Berlin, Heidelberg, pp 772–782
Villmann T, Kaden M, Bohnsack A, Saralajew S, Villmann J-M, Drogies T, Hammer B (2016) Self-adjusting reject options in prototype based classification. In: Merényi E, Mendenhall MJ, O’Driscoll P (eds) Advances in self-organizing maps and learning vector quantization: proceedings of 11th international workshop WSOM 2016, volume 428 of advances in intelligent systems and computing. Springer, Berlin, Heidelberg, pp 269–279
Villmann T, Schleif F-M, Kaden M, Lange M (eds) (2014) Advances in self-organizing maps and learning vector quantization - proceedings of the 10th international workshop, WSOM 2014, Mittweida. Number 295 in Advances in intelligent systems and computing. Springer, Heidelberg
Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics 1:80–83
Yan L, Dodier R, Mozer MC, Wolniewicz R (2003) Optimizing classifier performance via approximation to the Wilcoxon–Mann–Whitney statistics. In: Proceedings of the 20th international conference on machine learning. AAAI Press, Menlo Park, pp 848–855
Yu G, Russell W, Schwartz R, Makhoul J (1990) Discriminant analysis and supervised vector quantization for continuous speech recognition. In: ICASSP-90, international conference on acoustics, speech and signal processing, volume II, pp 685–688, Piscataway. IEEE, IEEE Service Center

Author information

Authors and Affiliations

Computational Intelligence Group, University of Applied Sciences Mittweida, Mittweida, Germany
T. Villmann & M. Kaden
Abt. Neurologie, Paracelsus-Klinikum Zwickau, Zwickau, Germany
W. Hermann
Johann-Bernoulli-Institute for Mathematics and Computer Sciences, University Groningen, Groningen, The Netherlands
M. Biehl

Corresponding author

Correspondence to T. Villmann.

About this article

Cite this article

Villmann, T., Kaden, M., Hermann, W. et al. Learning vector quantization classifiers for ROC-optimization. Comput Stat 33, 1173–1194 (2018). https://doi.org/10.1007/s00180-016-0678-y

Received: 29 May 2015
Accepted: 17 August 2016
Published: 27 August 2016
Issue Date: September 2018
DOI: https://doi.org/10.1007/s00180-016-0678-y
