ARTC: feature selection using association rules for text classification

  • Original Article
Neural Computing and Applications

Abstract

Feature vectors are extracted to represent objects in many classification tasks, such as text classification. The high dimensionality of these raw feature vectors reduces both classification efficiency and accuracy. Therefore, reducing the size of the feature vectors by selecting the relevant features that best represent the objects is an important aspect of text classification. Feature selection not only reduces the dimensionality of the feature vectors but also produces more efficient classification models with higher predictive power. In this paper, we propose ARTC, an effective feature selection method for classifying text documents that is based on the extraction of association rules. The extracted association rules discover the hidden relationships and correlations between relevant words within the textual documents of a class and across different classes. Consequently, each class of documents is represented by a small set of contrasting features that are more effective in text classification. Our experiments show that ARTC outperforms other relevant techniques in terms of classification performance and efficiency.
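The abstract describes selecting a small set of contrasting word features from association rules mined within and across classes. The exact ARTC algorithm is given in the full paper and is not reproduced here; what follows is a minimal sketch of a generic class-association-rule filter under our own assumptions: documents are bags of words, rules have the form itemset → class, support is measured within the class, confidence is measured over the whole corpus, and itemsets hold at most two words. The function names `mine_class_rules`, `select_features`, and `to_binary_vector`, the thresholds, and the toy corpus are hypothetical and are not ARTC's published definitions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sketch of class-association-rule feature selection.
# Assumptions (not taken from the ARTC paper): documents are bags of words,
# rules have the form itemset -> class, itemsets hold at most max_len words.


def mine_class_rules(docs, labels, min_support=0.5, min_confidence=0.8, max_len=2):
    """Return (itemset, label, support, confidence) tuples that pass both thresholds.

    support    = share of the class's documents that contain the itemset
    confidence = share of all documents containing the itemset that carry the label
    """
    class_size = Counter(labels)
    overall = Counter()                               # itemset -> documents containing it
    per_class = {c: Counter() for c in class_size}    # label -> itemset -> count
    for doc, label in zip(docs, labels):
        words = sorted(set(doc))                      # sorted so itemset tuples are canonical
        for k in range(1, max_len + 1):
            # Brute-force enumeration of all k-word itemsets (no Apriori pruning).
            for itemset in combinations(words, k):
                overall[itemset] += 1
                per_class[label][itemset] += 1
    rules = []
    for label, counts in per_class.items():
        for itemset, n in counts.items():
            support = n / class_size[label]
            confidence = n / overall[itemset]
            if support >= min_support and confidence >= min_confidence:
                rules.append((itemset, label, support, confidence))
    return rules


def select_features(rules):
    """Reduced vocabulary: every word that occurs in at least one retained rule."""
    return sorted({word for itemset, _, _, _ in rules for word in itemset})


def to_binary_vector(doc, features):
    """Represent a document by presence/absence of the selected features only."""
    present = set(doc)
    return [1 if f in present else 0 for f in features]


# Toy corpus (invented for illustration).
docs = [
    ["ball", "goal", "team"],
    ["goal", "team", "coach"],
    ["vote", "party", "team"],
    ["vote", "party", "election"],
]
labels = ["sport", "sport", "politics", "politics"]

rules = mine_class_rules(docs, labels, min_support=1.0, min_confidence=1.0)
features = select_features(rules)
print(features)                                  # ['goal', 'party', 'team', 'vote']
print(to_binary_vector(docs[0], features))       # [1, 0, 1, 0]
```

In this toy corpus, "team" on its own is rejected because it also occurs in a politics document, yet it survives through the pair {goal, team}, which occurs only in sport documents; a filter that scores single words in isolation would not capture that correlation. The brute-force pair enumeration grows quadratically with document length, so a practical implementation would more likely use candidate pruning such as Apriori or FP-growth.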

Figs. 1–6

References

  1. Wang R, Chow C-Y, Kwong S (2016) Ambiguity-based multiclass active learning. IEEE Trans Fuzzy Syst 24(1):242–248

  2. Makkar A, Garg S, Kumar N, Hossain MS, Ghoneim A, Alrashoud M (2020) An efficient spam detection technique for IoT devices using machine learning. IEEE Trans Industr Inf 17(2):903–912

  3. Kanimozhi U, Sannasi G, Manjula D, Arputharaj K (2021) A user preference tree based personalized route recommendation system for constraint tourism and travel. Soft Comput, pp 1–20

  4. Basiri ME, Nemati S, Abdar M, Cambria E, Acharya UR (2021) ABCDM: an attention-based bidirectional CNN-RNN deep model for sentiment analysis. Futur Gener Comput Syst 115:279–294

  5. Peng H, Li J, Song Y, Yang R, Ranjan R, Yu PS, He L (2021) Streaming social event detection and evolution discovery in heterogeneous information networks. ACM Trans Knowl Discov Data (TKDD) 15(5):1–33

  6. Cai J, Luo J, Wang S, Yang S (2018) Feature selection in machine learning: a new perspective. Neurocomputing 300:70–79

  7. Sheydaei N, Saraee M, Shahgholian A (2015) A novel feature selection method for text classification using association rules and clustering. J Inf Sci 41(1):3–15

  8. Şahin DO, Kılıç E (2019) Two new feature selection metrics for text classification. Automatika 60(2):162–171

  9. Al Aghbari Z, Junejo IN (2015) DisCoSet: discovery of contrast sets to reduce dimensionality and improve classification. Int J Comput Intel Sys 8(6):1178–1191

  10. Uysal AK, Gunal S (2014) Text classification using genetic algorithm oriented latent semantic features. Expert Sys Appl 41(13):5938–5947

  11. Kim K, Zang SY (2019) Trigonometric comparison measure: a feature selection method for text categorization. Data Knowl Eng 119:1–21

  12. Lee J, Yu I, Park J et al (2019) Memetic feature selection for multilabel text categorization label frequency difference. Inf Sci 485:263–280

  13. Labani M, Moradi P, Ahmadizar F et al (2018) A novel multivariate filter method for feature selection in text classification problems. Eng Appl Artif Intel 70:25–37

  14. Webb GI (2007) Discovering significant patterns. Mach Learn 68:1–33

  15. Song M, Song IY, Hu X, Allen RB (2007) Integration of association rules and ontologies for semantic query expansion. Data Knowl Eng 63:63–75

  16. Kaoungku N, Suksut K, Chanklan R, Kerdprasop K, Kerdprasop N (2017) Data classification based on feature selection with association rule mining. In: International MultiConference of Engineers and Computer Scientists, Hong Kong

  17. Xie J, Wu J, Qian Q (2009) Feature selection algorithm based on association rules mining method. In: Eighth IEEE/ACIS International Conference on Computer and Information Science

  18. Hadi WE, Aburub F, Alhawari S (2016) A new fast associative classification algorithm for detecting phishing websites. Appl Soft Comput 48:729–734

  19. Alwidian J, Hammo B, Obeid N (2020) Enhanced CBA algorithm based on apriori optimization and statistical ranking measure. In: Proceedings of the 28th International Business Information Management Association (IBIMA) Conference on Vision, pp 4291–4306

  20. Hadi WE, Al-Radaideh QA, Alhawari S (2018) Integrating associative rule-based classification with naive Bayes for text classification. Appl Soft Comput 69:344–356

  21. Geng X, Liang Y, Jiao L (2021) EARC: Evidential association rule-based classification. Inf Sci 547:202–222

  22. Fernandez-Basso C, Ruiz MD, Martin-Bautista MJ (2021) Spark solutions for discovering fuzzy association rules in big data. Int J Approximate Reason 137:94–112

  23. Shang H, Lu D, Zhou Q (2021) Early warning of enterprise finance risk of big data mining in internet of things based on fuzzy association rules. Neural Comput Appl 33(9):3901–3909

  24. Li C, Li W (2021) Automatic classification algorithm for multisearch data association rules in wireless networks. Wirel Commun Mob Comput 2021

  25. Geng X, Liang Y, Jiao L (2021) ARC-SL: association rule-based classification with soft labels. Knowl-Based Syst 225:107116

  26. Geng X, Liang Y, Jiao L (2021) Evidential association classification for high-dimensional data. In: 2021 IEEE 6th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), pp 100–105

  27. Abu-Arqoub M, Hadi W, Ishtaiwi A (2021) ACRIPPER: a new associative classification based on RIPPER algorithm. J Inf Knowl Manag 20(01):2150013

  28. Khedr AM, Al Aghbari Z, Al Ali A, Eljamil M (2021) An efficient association rule mining from distributed medical databases for predicting heart diseases. IEEE Access 9:15320–15333

  29. Annapureddy P, Franco Z, Madiraju P, Ahamed SI, Flower M, Hossain MF, Winstead O (2021) Identifying precursors to long-term crisis in veterans using associative classifier. In: 2021 IEEE International Conference on Big Data (Big Data), pp 4633–4642

  30. Wang CH, Lee TY, Hui KC, Chung MH (2019) Mental disorders and medical comorbidities: association rule mining approach. Perspect Psychiatr Care 55(3):517–526

  31. Rohidin D, Samsudin NA, Deris MM (2020) Association rules of fuzzy soft set based classification for text classification problem. J King Saud Univ Comput Inf Sci

  32. Shao Z, Li Y, Wang X, Zhao X, Guo Y (2020) Research on a new automatic generation algorithm of concept map based on text analysis and association rules mining. J Ambient Intell Humaniz Comput 11(2):539–551

  33. Jabri S, Dahbi A, Gadi T, Bassir A (2018) Ranking of text documents using TF-IDF weighting and association rules mining. In: 4th International Conference on Optimization and Applications, pp 1–6

  34. Puri S, Singh SP (2019) An efficient Hindi text classification model using SVM. In: Computing and Network Sustainability, Singapore, pp 227–237

  35. Al Aghbari Z, Saeed M (2021) Leveraging association rules in feature selection to classify text. In: 4th International Conference on Computer Networks and Inventive Communication Technologies, India

Acknowledgements

The authors published preliminary results of this work in a short conference version [35]. This version includes a more detailed analysis of the proposed algorithms as well as additional experiments and comparisons.

Author information

Corresponding author

Correspondence to Mozamel M. Saeed.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Zaher Al Aghbari contributed equally to this work.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saeed, M.M., Al Aghbari, Z. ARTC: feature selection using association rules for text classification. Neural Comput & Applic 34, 22519–22529 (2022). https://doi.org/10.1007/s00521-022-07669-5

  • DOI: https://doi.org/10.1007/s00521-022-07669-5
