Parabolic Threshold Discretization for Big Data

  • Conference paper
Information Systems and Technologies (WorldCIST 2022)

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 468))

Abstract

The quality of the knowledge extracted from raw data depends intrinsically on the data pre-processing phase. Discretization is a fundamental function of this phase: its main objective is to transform continuous data, which can take infinitely many values, into a finite set of discrete values. Most of the discretization methods proposed in the literature are not adapted to the Big Data context. In this paper, we propose a new discretization method that is adapted to Big Data. The method has been implemented, tested on several benchmark databases, and compared with well-known discretization methods. The results show that the proposed method is suited to the voluminous databases used in Big Data.
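
To make the notion of discretization concrete, the sketch below maps continuous values to a finite set of interval indices using plain equal-width binning. This is a generic textbook baseline chosen only for illustration; it is not the parabolic threshold method proposed in the paper, whose details are not given in this preview. The function names, the sample values, and the choice of three bins are all assumptions.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points of an equal-width partition.

    Generic baseline for illustration only, not the paper's method.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in."""
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical sample of a continuous attribute.
ages = [18.0, 22.5, 31.0, 47.2, 58.9, 64.0, 78.0]
cuts = equal_width_cut_points(ages, n_bins=3)
labels = discretize(ages, cuts)
print(cuts)
print(labels)
```

Equal-width binning needs only the minimum and maximum of the attribute, which is one reason it is often used as a simple point of comparison; supervised methods choose cut points using class labels instead.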

Author information

Corresponding author

Correspondence to Naima Lounes.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lounes, N., Remil, Z., Oudghiri, H., Chalal, R., Hidouci, WK. (2022). Parabolic Threshold Discretization for Big Data. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 468. Springer, Cham. https://doi.org/10.1007/978-3-031-04826-5_7
