Data Similarity in Classification and Fictitious Training Data Generation

  • Conference paper
Operations Research Proceedings 2008

Summary

In this paper we explore the possibility of deriving consensus rankings by solving consensus optimization problems, characterizing consensus rankings as suitable complete order relations that minimize the average Kemeny-Snell distance to the individual rankings. This optimization problem can be expressed as a binary programming (BP) problem, which can typically be solved reasonably efficiently. The underlying theory is discussed in Sect. 1. Applications of the proposed method, given in Sect. 2, include a comparison to other mathematical programming (MP) approaches using the data set of Tse [9], and the establishment of a consensus ranking of marketing journals identified by domain experts from a subset of the Harzing journal quality list [2]. In Sect. 3 we discuss computational details and present the results of a benchmark experiment comparing the performance of the commercial solver CPLEX to three open source mixed integer linear programming (MILP) solvers.
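To make the objective in the summary concrete, the following is a minimal sketch of a consensus ranking as the candidate ordering with the smallest average Kemeny-Snell distance to the individual rankings. It is an illustration under stated assumptions, not the paper's method: it enumerates all permutations instead of solving the BP formulation, it handles only strict rankings without ties, and the function names and the toy data are hypothetical.

```python
from itertools import combinations, permutations


def kemeny_snell_distance(r1, r2):
    """Kemeny-Snell distance between two strict rankings (tuples, best first).

    Assumption: no ties. With pairwise scores of +1/-1, every pair of items
    ordered differently in the two rankings contributes 2 to the distance.
    """
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    discordant = sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )
    return 2 * discordant


def consensus_ranking(rankings):
    """Brute-force consensus: the permutation with minimal average distance.

    Enumeration is only feasible for a handful of items; it stands in here
    for the binary programming formulation mentioned in the summary.
    """
    items = sorted(rankings[0])

    def total_distance(candidate):
        return sum(kemeny_snell_distance(candidate, r) for r in rankings)

    best = min(permutations(items), key=total_distance)
    return best, total_distance(best) / len(rankings)


# Hypothetical toy data: three individual rankings of three items.
rankings = [("A", "B", "C"), ("A", "C", "B"), ("B", "A", "C")]
best, mean_distance = consensus_ranking(rankings)
print(best, mean_distance)
```

Because the number of permutations grows factorially with the number of items, this enumeration only serves to show what is being minimized; a problem of realistic size would need an integer programming solver of the kind benchmarked in Sect. 3.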

References

  1. Abu-Mostafa, Y.S. (1995): Hints. Neural Computation 7, 639–671.

  2. DeCoste, D. and Schölkopf, B. (2002): Training Invariant Support Vector Machines. Machine Learning 46, 161–190.

  3. Duin, R.P.W. and Pekalska, E. (2007): The Science of Pattern Recognition. Achievements and Perspectives. In: Duch, W., Mandziuk, J. (Eds.): Challenges for Computational Intelligence. Studies in Computational Intelligence, Springer.

  4. Schebesch, K.B. and Stecking, R. (2005): Support vector machines for credit applicants: detecting typical and critical regions. Journal of the Operational Research Society 56(9), 1082–1088.

  5. Schölkopf, B. and Smola, A. (2002): Learning with Kernels. The MIT Press, Cambridge.

  6. Schölkopf, B., Burges, C. and Vapnik, V. (1996): Incorporating Invariances in Support Vector Learning. In: von der Malsburg, C., von Seelen, W., Vorbrüggen, J.C., Sendhoff, B. (Eds.): Artificial Neural Networks – ICANN'96. Springer Lecture Notes in Computer Science, Vol. 1112, Berlin, 47–52.

  7. Stecking, R. and Schebesch, K.B. (2003): Support Vector Machines for Credit Scoring: Comparing to and Combining with some Traditional Classification Methods. In: Schader, M., Gaul, W., Vichi, M. (Eds.): Between Data Science and Applied Data Analysis. Springer, Berlin, 604–612.

  8. Stecking, R. and Schebesch, K.B. (2005): Informative Patterns for Credit Scoring Using Linear SVM. In: Weihs, C. and Gaul, W. (Eds.): Classification - The Ubiquitous Challenge. Springer, Berlin, 450–457.

  9. Stecking, R. and Schebesch, K.B. (2006): Comparing and Selecting SVM-Kernels for Credit Scoring. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (Eds.): From Data and Information Analysis to Knowledge Engineering. Springer, Berlin, 542–549.

  10. Stecking, R. and Schebesch, K.B. (2008): Improving Classifier Performance by Using Fictitious Training Data? A Case Study. In: Kalcsics, J., Nickel, S. (Eds.): Operations Research Proceedings 2007. Springer, Berlin, 89–94.

Author information

Corresponding author

Correspondence to Ralf Stecking.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stecking, R., Schebesch, K.B. (2009). Data Similarity in Classification and Fictitious Training Data Generation. In: Fleischmann, B., Borgwardt, KH., Klein, R., Tuma, A. (eds) Operations Research Proceedings 2008, vol 2008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00142-0_64
