Quora Insincere Questions Classification Using Attention Based Model

Chakraborty, Snigdha; Wilson, Megan; Assi, Sulaf; Al-Hamid, Abdullah; Alamran, Maitham; Al-Nahari, Abdulaziz; Mustafina, Jamila; Lunn, Jan; Al-Jumeily OBE, Dhiya

doi:10.1007/978-981-99-0741-0_26

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 165)

Included in the following conference series:

The International Conference on Data Science and Emerging Technologies

Abstract

Online platforms have evolved into an unparalleled storehouse of information. People use social question-and-answer websites such as Quora, Formspring, Stack Overflow, Twitter, and Beepl to ask questions, clarify doubts, and share ideas and expertise with others. A major issue with such Q&A websites is the growing number of inappropriate and insincere comments from users with no genuine motive: some users share harmful and toxic content intended to make a statement rather than to find helpful answers. In natural language processing (NLP), Bidirectional Encoder Representations from Transformers (BERT) has been a game-changer. It has dominated performance benchmarks and spurred researchers to experiment with and develop similar models, which in turn has produced lighter language models that maintain efficiency and performance. This study used pre-trained state-of-the-art language models to determine whether posted questions are sincere or insincere with limited computation. To cope with the high computational cost of NLP models, BERT, XLNet, StructBERT, and DeBERTa were trained on three samples of the data. The results show that, even with limited resources, recent transformer-based models outperform previous studies. Of the four models, DeBERTa stands out, achieving the highest balanced accuracy (80%), macro F1-score (0.83), and weighted F1-score (0.96).
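The three metrics reported above behave quite differently when one class is much larger than the other, as is typical when insincere questions are a small minority. The following Python sketch computes them from scratch using their standard definitions: balanced accuracy as the mean of per-class recall, macro F1 as the unweighted mean of per-class F1, and weighted F1 as the mean of per-class F1 weighted by class support. The labels are a hypothetical toy example (0 = sincere, 1 = insincere), not the study's data, and the function names are illustrative only.

```python
from collections import Counter


def per_class_recall_and_f1(y_true, y_pred, label):
    """Return (recall, f1) for one class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


def summary_metrics(y_true, y_pred):
    """Balanced accuracy, macro F1 and weighted F1 over the classes present in y_true."""
    support = Counter(y_true)  # number of true examples per class
    labels = sorted(support)
    scores = {lab: per_class_recall_and_f1(y_true, y_pred, lab) for lab in labels}
    n = len(y_true)
    return {
        # mean of per-class recall
        "balanced_accuracy": sum(scores[lab][0] for lab in labels) / len(labels),
        # unweighted mean of per-class F1
        "macro_f1": sum(scores[lab][1] for lab in labels) / len(labels),
        # per-class F1 weighted by how many true examples each class has
        "weighted_f1": sum(scores[lab][1] * support[lab] for lab in labels) / n,
    }


# Hypothetical toy labels: 90 sincere (0) and 10 insincere (1) questions.
y_true = [0] * 90 + [1] * 10
# The classifier wrongly flags 2 sincere questions and misses 5 insincere ones.
y_pred = [0] * 88 + [1] * 2 + [1] * 5 + [0] * 5

for name, value in summary_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
# balanced_accuracy: 0.739
# macro_f1: 0.775
# weighted_f1: 0.924
```

On this toy data, a classifier that catches only half of the insincere questions still scores a weighted F1 of about 0.92, because the weighted average is dominated by the majority class; macro F1 and balanced accuracy expose the weaker minority-class performance. Reporting all three therefore gives a fuller picture than any single score.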

Author information

Authors and Affiliations

Faculty of Engineering and Technology, Liverpool John Moores University, Liverpool, UK
Snigdha Chakraborty, Jan Lunn & Dhiya Al-Jumeily OBE
School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
Megan Wilson & Sulaf Assi
Saudi Ministry of Health, Najran, Saudi Arabia
Abdullah Al-Hamid
Al-Qadisiyah University, Al-Qadisiyah, Iraq
Maitham Alamran
Kazan Federal University, Kazan, Russia
Jamila Mustafina
UNITAR Graduate School, UNITAR International University, Petaling Jaya, Selangor, Malaysia
Abdulaziz Al-Nahari

Corresponding author

Correspondence to Sulaf Assi.

Editor information

Editors and Affiliations

UNITAR Graduate School, UNITAR International University, Selangor, Malaysia
Yap Bee Wah
University of Tennessee, Knoxville, TN, USA
Michael W. Berry
Institute for Big Data Analytics and Artificial Intelligence, Universiti Teknologi MARA (UiTM), Shah Alam, Selangor, Malaysia
Azlinah Mohamed
School of Computer Science and Mathematics, Liverpool John Moores University, Liverpool, UK
Dhiya Al-Jumeily

About this paper

Cite this paper

Chakraborty, S. et al. (2023). Quora Insincere Questions Classification Using Attention Based Model. In: Wah, Y.B., Berry, M.W., Mohamed, A., Al-Jumeily, D. (eds) Data Science and Emerging Technologies. DaSET 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 165. Springer, Singapore. https://doi.org/10.1007/978-981-99-0741-0_26

DOI: https://doi.org/10.1007/978-981-99-0741-0_26
Published: 01 April 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0740-3
Online ISBN: 978-981-99-0741-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
