A Data Driven Stopping Criterion for Evolutionary Instance Selection

Bennette, Walter D.

doi:10.1007/978-3-319-46562-3_26

Walter D. Bennette

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 513)

Abstract

Instance-based classifiers, such as k-Nearest Neighbors, predict the class of a new observation from a distance or similarity measure between that observation and the stored training data. Because of the required distance calculations, classifying new instances becomes computationally expensive as the number of training observations grows. Instance selection techniques have therefore been proposed to improve instance-based classifiers by reducing the number of training instances that must be stored to achieve adequate classification rates. Among the available methods, an evolutionary algorithm has produced some of the best results in terms of data reduction and preservation of classification accuracy. Unfortunately, this performance comes at the cost of longer computation times than classic instance selection techniques require. In this work we introduce a new stopping criterion for the evolutionary algorithm that depends on the convergence of its fitness function. Experiments show that the new criterion reduces computation time while achieving comparable performance.

DISTRIBUTION A: Approved for public release: distribution unlimited: 02 May 2016. Case # 88ABW-2016-2258l.
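
The idea in the abstract, stopping the evolutionary search once its fitness function has converged, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the chapter's actual method: the convergence rule (`has_converged`, with hypothetical parameters `window` and `tol`), the fitness (an equally weighted sum of 1-NN accuracy and data reduction, with an assumed weight `alpha`), and the simple one-parent mutation loop that stands in for the full evolutionary algorithm.

```python
import random


def has_converged(history, window=20, tol=1e-3):
    """True when the best fitness gained less than `tol` over the last
    `window` generations. Assumed rule, not the chapter's criterion."""
    if len(history) <= window:
        return False
    return history[-1] - history[-1 - window] < tol


def nn_accuracy(subset, data):
    """Leave-one-out 1-NN accuracy on `data`, searching only `subset`."""
    correct = 0
    for i, (x, y) in enumerate(data):
        # Exclude the query point from its own neighbour search.
        candidates = [j for j in subset if j != i]
        if not candidates:
            continue
        nearest = min(
            candidates,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, data[j][0])),
        )
        correct += data[nearest][1] == y
    return correct / len(data)


def fitness(mask, data, alpha=0.5):
    """Weighted sum of accuracy and reduction (alpha is an assumed weight)."""
    subset = [i for i, keep in enumerate(mask) if keep]
    reduction = 1.0 - len(subset) / len(data)
    return alpha * nn_accuracy(subset, data) + (1 - alpha) * reduction


def evolve(data, max_gens=500, window=20, tol=1e-3, seed=0):
    """Search for a subset mask; stop early once fitness has converged."""
    rng = random.Random(seed)
    best = [rng.random() < 0.5 for _ in data]
    best_fit = fitness(best, data)
    history = [best_fit]
    for gen in range(1, max_gens + 1):
        # Flip each bit with probability 1/n (a stand-in mutation operator).
        child = [(not g) if rng.random() < 1.0 / len(data) else g for g in best]
        child_fit = fitness(child, data)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
        history.append(best_fit)
        if has_converged(history, window, tol):
            break  # data-driven stop instead of running all max_gens
    return best, best_fit, gen
```

Calling `evolve(data)` with `data` as a list of `(feature_tuple, label)` pairs returns the selected mask, its fitness, and the number of generations actually run. A fixed generation budget would always spend `max_gens` fitness evaluations, whereas the convergence check ends the run as soon as further generations stop paying off.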

Author information

Authors and Affiliations

The United States Air Force, 525 Brooks Road, Rome, NY, 13441, USA
Walter D. Bennette

Corresponding author

Correspondence to Walter D. Bennette.

Editor information

Editors and Affiliations

School of Computing and Communications, Lancaster University, Bailrigg, Lancaster, United Kingdom
Plamen Angelov
School of Computing, University of Portsmouth, Portsmouth, Hampshire, United Kingdom
Alexander Gegov
School of Comp. Sci. & Digital Media, Robert Gordon University, Aberdeen, United Kingdom
Chrisina Jayne
Ins. of Mathematics, Physics & Comp. Sci, Aberystwyth University, Aberystwyth, United Kingdom
Qiang Shen

About this paper

Cite this paper

Bennette, W.D. (2017). A Data Driven Stopping Criterion for Evolutionary Instance Selection. In: Angelov, P., Gegov, A., Jayne, C., Shen, Q. (eds) Advances in Computational Intelligence Systems. Advances in Intelligent Systems and Computing, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-319-46562-3_26

DOI: https://doi.org/10.1007/978-3-319-46562-3_26
Published: 07 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46561-6
Online ISBN: 978-3-319-46562-3
eBook Packages: Engineering, Engineering (R0)
