Parabolic Threshold Discretization for Big Data

Lounes, Naima; Remil, Zakaria; Oudghiri, Houria; Chalal, Rachid; Hidouci, Walid-Khaled

doi:10.1007/978-3-031-04826-5_7

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 468)

Included in the following conference series: World Conference on Information Systems and Technologies

Abstract

The quality of the knowledge extracted from raw data depends directly on the data pre-processing phase. Discretization is a fundamental operation in that phase: its main objective is to transform continuous attributes, which can take infinitely many values, into discrete attributes with a finite number of values. Most discretization methods proposed in the literature are not adapted to the Big Data context. In this paper, we propose a new discretization method that is adapted to Big Data. The method has been implemented, tested on several benchmark databases, and compared with well-known discretization methods. The results show that the proposed method is suited to the voluminous databases used in Big Data.
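
To make the notion of discretization concrete, the following is a minimal sketch of equal-width binning, one of the simplest unsupervised discretization schemes. It is not the parabolic threshold method proposed in the paper, whose details are not given in the abstract. The function names and the sample values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points that split the
    observed range [min, max] into intervals of equal width."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, cut_points):
    """Map each continuous value to the index of its interval.
    A value equal to a cut point falls into the upper interval."""
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical continuous attribute (illustrative values only).
ages = [18.5, 22.0, 35.7, 41.2, 58.9, 63.0]
cuts = equal_width_cut_points(ages, 3)
labels = discretize(ages, cuts)
print(cuts)
print(labels)
```

Each continuous value is replaced by one of a finite set of interval indices, which is the transformation the abstract describes. Equal-width binning needs only the minimum and maximum of the attribute, so the cut points can be computed in a single pass over the data.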

Author information

Authors and Affiliations

Ecole Nationale Supérieure d’Informatique, BP 68M, 16059, Oued-Smar, Alger, Algeria
Naima Lounes, Zakaria Remil, Houria Oudghiri, Rachid Chalal & Walid-Khaled Hidouci

Corresponding author

Correspondence to Naima Lounes.

Editor information

Editors and Affiliations

ISEG, Universidade de Lisboa, Lisbon, Portugal
Alvaro Rocha
College of Engineering, The Ohio State University, Columbus, OH, USA
Hojjat Adeli
Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania
Gintautas Dzemyda
DCT, Universidade Portucalense, Porto, Portugal
Fernando Moreira

About this paper

Cite this paper

Lounes, N., Remil, Z., Oudghiri, H., Chalal, R., Hidouci, WK. (2022). Parabolic Threshold Discretization for Big Data. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 468. Springer, Cham. https://doi.org/10.1007/978-3-031-04826-5_7

DOI: https://doi.org/10.1007/978-3-031-04826-5_7
Published: 11 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04825-8
Online ISBN: 978-3-031-04826-5
eBook Packages: Intelligent Technologies and Robotics (R0)
