The Use of Dynamic n-Gram to Enhance TF-IDF Features Extraction for Bahasa Indonesia Cyberbullying Classification

Authors:
Yudi Setiawan (ORCID: 0000-0001-8660-8508)
Nur Ulfa Maulidevi (ORCID: 0000-0001-6624-4153)
Kridanto Surendro (ORCID: 0000-0003-1705-1202)

School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia

ICSCA '23: Proceedings of the 2023 12th International Conference on Software and Computer Applications, February 2023, Pages 200–205. https://doi.org/10.1145/3587828.3587858

Published: 20 June 2023

ABSTRACT

Detecting cyberbullying in a sentence or utterance is challenging because of syntactic and lexical (meaning) variation. Term Frequency-Inverse Document Frequency (TF-IDF) performs textual feature extraction, producing thematic candidate features based on word-occurrence statistics. However, these candidates are generated without considering the relationships between terms as constituent elements of the language's syntax. This study discusses a TF-IDF feature extraction model that uses an n-Gram approach to select candidate features based on a specified term relationship. It also discusses applying thresholding to form dynamic n-Gram segments. The dynamic n-Gram model in TF-IDF feature extraction can then be used in cyberbullying classification to handle variations in the syntax and meaning of sentences and speech in Bahasa Indonesia.
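
The abstract does not spell out the thresholding rule, so the following is only a minimal sketch of one possible reading: adjacent tokens are merged into a variable-length n-gram whenever their pair frequency in the corpus reaches a threshold, and TF-IDF is then computed over the resulting segments. The function names (`tokenize`, `count_bigrams`, `dynamic_segment`, `tfidf`), the frequency-based merge criterion, the `threshold` and `max_n` parameters, and the three-sentence toy corpus are all assumptions made for illustration, not the authors' method or data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for illustration)."""
    return text.lower().split()


def count_bigrams(docs):
    """Count adjacent token pairs across all tokenized documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def dynamic_segment(tokens, bigram_counts, threshold=2, max_n=3):
    """Greedily extend a segment while the next adjacent pair is frequent.

    Assumption: a pair seen at least `threshold` times in the corpus is
    treated as related, so its tokens stay in one n-gram (at most `max_n`
    tokens long). Other relatedness criteria could be swapped in here.
    """
    segments, current = [], []
    for token in tokens:
        if (current and len(current) < max_n
                and bigram_counts[(current[-1], token)] >= threshold):
            current.append(token)
        else:
            if current:
                segments.append(" ".join(current))
            current = [token]
    if current:
        segments.append(" ".join(current))
    return segments


def tfidf(segmented_docs):
    """Plain TF-IDF: tf = count / document length, idf = log(N / df)."""
    n_docs = len(segmented_docs)
    df = Counter()
    for doc in segmented_docs:
        df.update(set(doc))
    weights = []
    for doc in segmented_docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy corpus, invented for this sketch only.
corpus = ["kamu bodoh sekali", "dasar kamu bodoh", "kamu pintar sekali"]
docs = [tokenize(text) for text in corpus]
bigram_counts = count_bigrams(docs)
segmented = [dynamic_segment(tokens, bigram_counts) for tokens in docs]
weights = tfidf(segmented)

# "kamu bodoh" occurs twice as a pair, so it becomes a single feature:
# segmented[0] -> ['kamu bodoh', 'sekali']
# segmented[2] -> ['kamu', 'pintar', 'sekali']
```

With a fixed n, "kamu bodoh" and "kamu pintar" would both be split the same way; here the segment length varies per position, which is the sense of "dynamic" assumed in this sketch. The resulting weight dictionaries could be fed to any standard classifier as feature vectors.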

Published in

ICSCA '23: Proceedings of the 2023 12th International Conference on Software and Computer Applications
February 2023
385 pages
ISBN: 9781450398589
DOI: 10.1145/3587828

Copyright © 2023 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Cyberbullying classification
Dynamic n-Gram
Features extraction
TF-IDF
Qualifiers
- research-article
- Research
- Refereed limited