DOI: 10.1145/3587828.3587858
research-article

The Use of Dynamic n-Gram to Enhance TF-IDF Features Extraction for Bahasa Indonesia Cyberbullying Classification

Published: 20 June 2023

ABSTRACT

Detecting cyberbullying in a sentence or utterance is challenging because of syntactic and lexical (meaning) variation. Term Frequency-Inverse Document Frequency (TF-IDF) extracts textual features by producing thematic candidate terms from word-occurrence statistics. However, these candidates are generated without considering the relationships between terms as constituent elements of the language's syntax. This study discusses a TF-IDF feature extraction model that uses the n-Gram approach to produce candidate features based on a specified term relationship. It also discusses the application of thresholding to form dynamic n-Gram segments. Furthermore, the dynamic n-Gram model in TF-IDF feature extraction can be used in cyberbullying classification to handle variations in the syntax and meaning of sentences and utterances in Bahasa Indonesia.
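The general idea of combining n-Gram candidates with a threshold before TF-IDF weighting can be illustrated with a minimal sketch. The abstract does not state the thresholding rule, so everything specific here is an assumption made for illustration: the document-frequency threshold `min_df`, the maximum segment length `max_n`, the function names, the whitespace tokenizer, and the toy sentences. It should not be read as the authors' method.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def dynamic_ngram_terms(docs, max_n=3, min_df=2):
    """Keep every unigram, plus any longer n-gram whose document frequency
    reaches min_df. The threshold rule is a hypothetical stand-in."""
    tokenized = [d.lower().split() for d in docs]

    # Document frequency of each n-gram with n >= 2.
    df = Counter()
    for toks in tokenized:
        seen = set()
        for n in range(2, max_n + 1):
            seen.update(ngrams(toks, n))
        df.update(seen)
    kept = {gram for gram, count in df.items() if count >= min_df}

    # Build the per-document term list: unigrams plus surviving n-grams.
    term_lists = []
    for toks in tokenized:
        terms = list(toks)
        for n in range(2, max_n + 1):
            terms.extend(g for g in ngrams(toks, n) if g in kept)
        term_lists.append(terms)
    return term_lists


def tfidf(term_lists):
    """Plain TF-IDF: tf = count / document length, idf = log(N / df)."""
    n_docs = len(term_lists)
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    # Toy sentences, invented for this example only.
    docs = ["kamu bodoh sekali", "dia bodoh sekali", "kamu baik sekali"]
    terms = dynamic_ngram_terms(docs, max_n=3, min_df=2)
    print(terms[0])  # ['kamu', 'bodoh', 'sekali', 'bodoh sekali']
    for vector in tfidf(terms):
        print(vector)
```

In this toy run only the bigram "bodoh sekali" occurs in at least two documents, so it is the only multi-word segment that survives the threshold, and it receives its own TF-IDF weight alongside the unigrams. A term that appears in every document, such as "sekali", gets weight zero under this unsmoothed IDF.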


Published in

ICSCA '23: Proceedings of the 2023 12th International Conference on Software and Computer Applications
February 2023
385 pages
ISBN: 9781450398589
DOI: 10.1145/3587828

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

research-article, Research, Refereed limited