A Data Driven Stopping Criterion for Evolutionary Instance Selection

  • Conference paper
Advances in Computational Intelligence Systems

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 513))

Abstract

Instance-based classifiers, such as k-Nearest Neighbors, predict the class of a new observation from a distance or similarity measure between that observation and the stored training data. Because of the required distance calculations, classifying new instances becomes computationally expensive as the number of training observations grows. Instance selection techniques have therefore been proposed to improve instance-based classifiers by reducing the number of training instances that must be stored to achieve adequate classification rates. Among the available methods, an evolutionary algorithm has produced some of the best results with regard to data reduction and preservation of classification accuracy. Unfortunately, that performance comes at the cost of longer computation times than classic instance selection techniques require. In this work we introduce a new stopping criterion for the evolutionary algorithm that depends on the convergence of its fitness function. Experiments show that the new criterion reduces computation time while achieving comparable performance.

DISTRIBUTION A: Approved for public release: distribution unlimited: 02 May 2016. Case # 88ABW-2016-2258l.
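
The abstract does not spell out the criterion itself, so the following is only a minimal sketch of the general idea: an evolutionary search over instance subsets that halts once its best fitness stops improving. Everything specific here is an assumption made for illustration, not the paper's method: the function names, the weighted fitness with `alpha`, the `window` and `tol` parameters, and the simple elitist search with uniform crossover.

```python
import random

def has_converged(history, window=20, tol=1e-4):
    """Assumed rule: stop when the best fitness gained less than `tol`
    over the last `window` generations."""
    if len(history) <= window:
        return False
    return history[-1] - history[-1 - window] < tol

def one_nn_accuracy(mask, X, y):
    """Leave-one-out 1-NN accuracy using only the instances kept by `mask`."""
    selected = [i for i, keep in enumerate(mask) if keep]
    correct = 0
    for i, xi in enumerate(X):
        candidates = [j for j in selected if j != i]
        if not candidates:
            continue  # nothing to compare against, counted as a miss
        nearest = min(
            candidates,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)

def fitness(mask, X, y, alpha=0.5):
    """Weighted sum of accuracy and reduction rate (alpha is an assumption)."""
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * one_nn_accuracy(mask, X, y) + (1 - alpha) * reduction

def evolve(X, y, pop_size=20, max_gens=500, window=20, tol=1e-4, seed=0):
    """Evolve boolean selection masks; return the best mask and fitness history."""
    rng = random.Random(seed)
    n = len(X)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(max_gens):
        scored = sorted(pop, key=lambda m: fitness(m, X, y), reverse=True)
        history.append(fitness(scored[0], X, y))
        if has_converged(history, window, tol):
            break  # data-driven stop instead of a fixed generation budget
        parents = scored[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            flip = rng.randrange(n)
            child[flip] = not child[flip]  # single-bit mutation
            children.append(child)
        pop = parents + children
    return scored[0], history
```

Calling `evolve(X, y)` on a list of feature vectors `X` and labels `y` returns the best selection mask together with the per-generation best fitness. The length of that history shows how many generations actually ran before the convergence check fired or the `max_gens` cap was reached.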

Author information

Corresponding author

Correspondence to Walter D. Bennette.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bennette, W.D. (2017). A Data Driven Stopping Criterion for Evolutionary Instance Selection. In: Angelov, P., Gegov, A., Jayne, C., Shen, Q. (eds) Advances in Computational Intelligence Systems. Advances in Intelligent Systems and Computing, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-319-46562-3_26

  • DOI: https://doi.org/10.1007/978-3-319-46562-3_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46561-6

  • Online ISBN: 978-3-319-46562-3

  • eBook Packages: Engineering; Engineering (R0)
