Instance Selection Techniques for Large Volumes of Data

Cubillos, Marco Antonio Peña; Ballesteros, Antonio Javier Tallón

doi:10.1007/978-3-031-48232-8_49

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14404)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Instance selection (IS) is a vital preprocessing step, particularly for addressing the complexity of high-dimensional problems. Its primary goal is to reduce the number of data instances by eliminating irrelevant and superfluous data while maintaining high classification accuracy. As a filtering mechanism, IS retains the essential instances and discards those that hinder learning. This refinement optimizes classification algorithms, enabling them to handle extensive datasets effectively. This research presents IS as a promising avenue for strengthening the effectiveness of classification in various real-world applications.
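
The abstract describes IS as a filter that drops "hindering" (noisy) and "superfluous" (redundant) instances while preserving accuracy. The preview does not say which algorithms the authors evaluate. The sketch below therefore uses two classical techniques purely as an illustration, not as the chapter's method:

- Wilson's Edited Nearest Neighbour (ENN) removes instances that their k nearest neighbours misclassify.
- Hart's Condensed Nearest Neighbour (CNN) then keeps only the instances that 1-NN needs in order to classify the rest correctly.

The helper names (`knn_predict`, `enn`, `cnn`, `accuracy`, `make_blobs`), the synthetic two-class data, the 5% label noise and k = 3 are all assumptions. The brute-force neighbour search is O(n²), so this is a sketch for small data, not a large-volume implementation.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_predict(X: Sequence[Point], y: Sequence[int], query: Point,
                k: int = 1, exclude: int = -1) -> int:
    """Majority label among the k instances of X nearest to `query`.

    `exclude` skips one index (leave-one-out, used by ENN). Distance ties are
    broken by index; vote ties favour the label of the nearer neighbour.
    """
    ranked = sorted((math.dist(x, query), i) for i, x in enumerate(X) if i != exclude)
    votes = Counter(y[i] for _, i in ranked[:k])
    return votes.most_common(1)[0][0]


def enn(X: Sequence[Point], y: Sequence[int], k: int = 3):
    """Wilson's ENN: drop instances whose label disagrees with the majority
    of their k nearest neighbours (a noise / class-overlap filter)."""
    keep = [i for i in range(len(X)) if knn_predict(X, y, X[i], k, exclude=i) == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]


def cnn(X: Sequence[Point], y: Sequence[int]):
    """Hart's CNN: grow a store until 1-NN over the store classifies every
    instance of (X, y) correctly; redundant interior points are never added."""
    if not X:
        return [], []
    store: List[int] = [0]          # seed the store with the first instance
    in_store = {0}
    changed = True
    while changed:                  # repeat passes until a full pass adds nothing
        changed = False
        for i in range(len(X)):
            if i in in_store:
                continue
            SX = [X[j] for j in store]
            Sy = [y[j] for j in store]
            if knn_predict(SX, Sy, X[i], k=1) != y[i]:
                store.append(i)     # misclassified, so this instance is needed
                in_store.add(i)
                changed = True
    return [X[j] for j in store], [y[j] for j in store]


def accuracy(train_X, train_y, test_X, test_y) -> float:
    """1-NN accuracy of (train_X, train_y) on a held-out set."""
    hits = sum(knn_predict(train_X, train_y, x) == t for x, t in zip(test_X, test_y))
    return hits / len(test_X)


def make_blobs(n: int, seed: int, noise: float = 0.05):
    """Hypothetical toy data: two 2-D Gaussian classes, a fraction of labels flipped."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        c = 3.0 * label             # class 0 centred at (0, 0), class 1 at (3, 3)
        X.append((rng.gauss(c, 1.0), rng.gauss(c, 1.0)))
        y.append(1 - label if rng.random() < noise else label)
    return X, y


if __name__ == "__main__":
    Xtr, ytr = make_blobs(300, seed=1)
    Xte, yte = make_blobs(200, seed=2)
    Xe, ye = enn(Xtr, ytr)          # step 1: remove noisy instances
    Xs, ys = cnn(Xe, ye)            # step 2: remove redundant instances
    print(f"kept {len(Xs)} of {len(Xtr)} instances "
          f"({100 * (1 - len(Xs) / len(Xtr)):.1f}% reduction)")
    print(f"1-NN accuracy, full set:    {accuracy(Xtr, ytr, Xte, yte):.3f}")
    print(f"1-NN accuracy, reduced set: {accuracy(Xs, ys, Xte, yte):.3f}")
```

Editing before condensing is a common pairing. CNN on its own tends to keep mislabelled points, because their neighbours misclassify them and they are therefore added to the store. Running ENN first removes them and usually yields a smaller condensed set.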

Author information

Authors and Affiliations

Department of Electronic, Computer Systems and Automation Engineering, International University of Andalusia, Huelva, Spain
Marco Antonio Peña Cubillos & Antonio Javier Tallón Ballesteros

Corresponding authors

Correspondence to Marco Antonio Peña Cubillos or Antonio Javier Tallón Ballesteros.

Editor information

Editors and Affiliations

University of Évora, Évora, Portugal
Paulo Quaresma
Technical University of Madrid, Madrid, Spain
David Camacho
University of Manchester, Manchester, UK
Hujun Yin
University of Évora, Évora, Portugal
Teresa Gonçalves
Polytechnic University of Valencia, Valencia, Spain
Vicente Julian
University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros

About this paper

Cite this paper

Cubillos, M.A.P., Ballesteros, A.J.T. (2023). Instance Selection Techniques for Large Volumes of Data. In: Quaresma, P., Camacho, D., Yin, H., Gonçalves, T., Julian, V., Tallón-Ballesteros, A.J. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2023. IDEAL 2023. Lecture Notes in Computer Science, vol 14404. Springer, Cham. https://doi.org/10.1007/978-3-031-48232-8_49

DOI: https://doi.org/10.1007/978-3-031-48232-8_49
Published: 15 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48231-1
Online ISBN: 978-3-031-48232-8
eBook Packages: Computer Science, Computer Science (R0)