
A systematic review of Arabic text classification: areas, applications, and future directions

  • Application of soft computing
Soft Computing

Abstract

Text classification is the automated assignment of predefined labels or categories to textual data. A review of the existing literature on Arabic text classification (ATC) reveals that most research concentrates on methodologies and approaches, with no thorough evaluation of ATC. Consequently, this systematic review aims to offer a comprehensive understanding of the state of the art in ATC, illuminate the present challenges, and discuss prominent trends in large-scale research. From a collection of 2875 studies, 60 satisfied the eligibility criteria and were rigorously analyzed. The selected studies were divided into three categories: topic areas, tasks/applications, and ATC phases. The topic areas were classified into six primary sectors: healthcare; legal; security and cybersecurity; history, culture and religion; social media; and agriculture. The ATC tasks/applications were classified into nine groups: gender identification, author identification, disease detection, threat and spam detection, dialect identification, hierarchical categorization, news article classification, web page clustering, and question classification. The ATC phases were organized into five categories: corpus creation, preprocessing (stemming and tokenization), feature selection, feature extraction, and classifiers/approaches. The review emphasizes the solutions proposed in each ATC study and offers insights for future research. It also underscores the potential applications of ATC in addressing current challenges across various industries and highlights the significance of developing a benchmark dataset for ATC to facilitate model comparison. The review concludes by proposing areas where further research is required, such as addressing the unbalanced dataset issue, enhancing the preprocessing phase, and exploring the role of human factors in the use of ATC systems.

Data availability

The data presented in this study are available on request from the authors.

Funding

No funding was received for this study.

Author information

Corresponding author

Correspondence to Mostafa Al-Emran.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wahdan, A., Al-Emran, M. & Shaalan, K. A systematic review of Arabic text classification: areas, applications, and future directions. Soft Comput 28, 1545–1566 (2024). https://doi.org/10.1007/s00500-023-08384-6

  • DOI: https://doi.org/10.1007/s00500-023-08384-6
