Abstract
The quality of the knowledge extracted from raw data depends intrinsically on the data pre-processing phase, and discretization is a fundamental task in that phase. The main objective of discretization is to transform continuous attributes, which can take infinitely many values, into discrete attributes with a finite number of values. Most of the discretization methods proposed in the literature are not suited to the Big Data context. In this paper, we propose a new discretization method designed for Big Data. The method has been implemented, tested on several benchmark databases, and compared with well-known discretization methods. The results show that the proposed method is well suited to the voluminous databases encountered in Big Data.
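To make the definition of discretization concrete, here is a minimal sketch of equal-width binning, one of the simplest unsupervised discretization schemes. It is not the Parabolic Threshold Discretization method proposed in this paper, whose details are not given in the abstract; the function name `equal_width_discretize`, the interval count `k`, and the sample values are illustrative assumptions. The sketch also holds all values in memory, so it does not address the Big Data setting the paper targets.

```python
from typing import List, Sequence


def equal_width_discretize(values: Sequence[float], k: int) -> List[int]:
    """Map each continuous value to an interval index in 0..k-1, using k
    equal-width intervals spanning [min(values), max(values)].

    Illustrative equal-width binning only; not the paper's method.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: every value falls into a single interval.
        return [0 for _ in values]
    width = (hi - lo) / k
    # min(..., k - 1) places the maximum value in the last interval
    # instead of producing the out-of-range index k.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical continuous attribute (e.g. ages) reduced to 3 discrete values.
ages = [22.5, 35.0, 47.3, 51.9, 63.2, 70.0]
print(equal_width_discretize(ages, 3))  # → [0, 0, 1, 1, 2, 2]
```

The infinite range of the continuous attribute is replaced by the finite set {0, 1, 2}. Equal-width binning ignores class labels; supervised methods such as entropy-based or CAIM-style discretizers instead choose cut points using class information.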
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lounes, N., Remil, Z., Oudghiri, H., Chalal, R., Hidouci, WK. (2022). Parabolic Threshold Discretization for Big Data. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 468. Springer, Cham. https://doi.org/10.1007/978-3-031-04826-5_7
DOI: https://doi.org/10.1007/978-3-031-04826-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04825-8
Online ISBN: 978-3-031-04826-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)