Abstract
Many classification tasks, such as text classification, represent objects as feature vectors. The high dimensionality of these raw feature vectors reduces both classification efficiency and accuracy. Reducing the size of feature vectors by selecting the relevant features that best represent the objects is therefore an important aspect of text classification. Feature selection not only reduces the dimensionality of the feature vectors but also produces more efficient classification models with higher predictive power. In this paper, we propose ARTC, an effective feature selection method for classifying text documents that is based on extracting association rules. The extracted association rules reveal hidden relationships and correlations between relevant words, both within the documents of a class and across different classes. Consequently, each class of documents is represented by a small set of contrasting features that are more effective for text classification. Our experiments show that ARTC outperforms other relevant techniques in both classification performance and efficiency.
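The abstract does not spell out ARTC's rule-mining or selection steps, so the following is only a minimal Python sketch of the general idea, assuming documents are already tokenized into word lists. It mines class association rules of the form {words} → class (a standard construct in associative classification, not necessarily ARTC's exact procedure) and keeps, for each class, the words that appear in a rule whose confidence clears a threshold. The function names `frequent_itemsets` and `select_features`, the thresholds `min_support`, `min_conf`, and `max_len`, and the toy corpus are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical illustration of association-rule-based feature selection;
# this is NOT the ARTC algorithm from the paper, only the general idea.


def frequent_itemsets(docs, min_support, max_len=2):
    """Return word itemsets (sizes 1..max_len) whose document support
    within `docs` is at least `min_support` (a fraction in [0, 1])."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc))  # a document counts each itemset once
        for k in range(1, max_len + 1):
            counts.update(combinations(words, k))
    n = len(docs)
    return {items for items, c in counts.items() if c / n >= min_support}


def select_features(docs, labels, min_support=0.6, min_conf=0.8, max_len=2):
    """Per class, keep the words that appear in a class association rule
    `itemset -> class` with confidence >= min_conf over the whole corpus."""
    doc_sets = [set(d) for d in docs]
    features = {}
    for cls in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == cls]
        keep = set()
        for items in frequent_itemsets(class_docs, min_support, max_len):
            # Labels of every document (any class) that contains the itemset.
            covering = [y for s, y in zip(doc_sets, labels) if s.issuperset(items)]
            confidence = covering.count(cls) / len(covering)
            if confidence >= min_conf:
                keep.update(items)
        features[cls] = keep
    return features


# Hypothetical toy corpus.
docs = [
    ["ball", "goal", "team"],
    ["goal", "team", "match"],
    ["vote", "party", "team"],
    ["vote", "election", "party"],
]
labels = ["sport", "sport", "politics", "politics"]
selected = select_features(docs, labels)
print(selected)
```

On this toy corpus the sketch keeps goal and team for sport and party and vote for politics, discarding ball, match, and election as infrequent within their class. The single-word rule team → sport fails the confidence threshold (only 2 of the 3 documents containing team are sport), yet team survives through the pair rule {goal, team} → sport; with `max_len=1` it would be dropped. This shows, under the sketch's assumptions, how co-occurring words can carry discriminative power that a single word lacks.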
References
Wang R, Chow C-Y, Kwong S (2016) Ambiguity-based multiclass active learning. IEEE Trans Fuzzy Syst 24(1):242–248
Makkar A, Garg S, Kumar N, Hossain MS, Ghoneim A, Alrashoud M (2020) An efficient spam detection technique for IoT devices using machine learning. IEEE Trans Industr Inf 17(2):903–912
Kanimozhi U, Sannasi G, Manjula D, Arputharaj K (2021) A user preference tree based personalized route recommendation system for constraint tourism and travel. Soft Comput, pp 1–20
Basiri ME, Nemati S, Abdar M, Cambria E, Acharya UR (2021) ABCDM: an attention-based bidirectional CNN-RNN deep model for sentiment analysis. Futur Gener Comput Syst 115:279–294
Peng H, Li J, Song Y, Yang R, Ranjan R, Yu PS, He L (2021) Streaming social event detection and evolution discovery in heterogeneous information networks. ACM Trans Knowl Discov Data (TKDD) 15(5):1–33
Cai J, Luo J, Wang S, Yang S (2018) Feature selection in machine learning: a new perspective. Neurocomputing 300:70–79
Sheydaei N, Saraee M, Shahgholian A (2015) A novel feature selection method for text classification using association rules and clustering. J Inf Sci 41(1):3–15
Şahin DO, Kılıç E (2019) Two new feature selection metrics for text classification. Automatika 60(2):162–171
Al Aghbari Z, Junejo IN (2015) DisCoSet: discovery of contrast sets to reduce dimensionality and improve classification. Int J Comput Intell Syst 8(6):1178–1191
Uysal AK, Gunal S (2014) Text classification using genetic algorithm oriented latent semantic features. Expert Syst Appl 41(13):5938–5947
Kim K, Zang SY (2019) Trigonometric comparison measure: a feature selection method for text categorization. Data Knowl Eng 119:1–21
Lee J, Yu I, Park J et al (2019) Memetic feature selection for multilabel text categorization label frequency difference. Inf Sci 485:263–280
Labani M, Moradi P, Ahmadizar F et al (2018) A novel multivariate filter method for feature selection in text classification problems. Eng Appl Artif Intell 70:25–37
Webb GI (2007) Discovering significant patterns. Mach Learn 68:1–33
Song M, Song IY, Hu X, Allen RB (2007) Integration of association rules and ontologies for semantic query expansion. Data Knowl Eng 63:63–75
Kaoungku N, Suksut K, Chanklan R, Kerdprasop K, Kerdprasop N (2017) Data classification based on feature selection with association rule mining. In: International MultiConference of Engineers and Computer Scientists, Hong Kong
Xie J, Wu J, Qian Q (2009) Feature selection algorithm based on association rules mining method. In: Eighth IEEE/ACIS International Conference on Computer and Information Science
Hadi WE, Aburub F, Alhawari S (2016) A new fast associative classification algorithm for detecting phishing websites. Appl Soft Comput 48:729–734
Alwidian J, Hammo B, Obeid N (2020) Enhanced CBA algorithm based on apriori optimization and statistical ranking measure. In: Proceedings of the 28th International Business Information Management Association (IBIMA) Conference on Vision, pp 4291–4306
Hadi WE, Al-Radaideh QA, Alhawari S (2018) Integrating associative rule-based classification with naive bayes for text classification. Appl Soft Comput 69:344–356
Geng X, Liang Y, Jiao L (2021) EARC: Evidential association rule-based classification. Inf Sci 547:202–222
Fernandez-Basso C, Ruiz MD, Martin-Bautista MJ (2021) Spark solutions for discovering fuzzy association rules in big data. Int J Approximate Reason 137:94–112
Shang H, Lu D, Zhou Q (2021) Early warning of enterprise finance risk of big data mining in internet of things based on fuzzy association rules. Neural Comput Appl 33(9):3901–3909
Li C, Li W (2021) Automatic classification algorithm for multisearch data association rules in wireless networks. Wirel Commun Mob Comput 2021
Geng X, Liang Y, Jiao L (2021) ARC-SL: association rule-based classification with soft labels. Knowl-Based Syst 225:107116
Geng X, Liang Y, Jiao L (2021) Evidential association classification for high-dimensional data. In: 2021 IEEE 6th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), pp 100–105
Abu-Arqoub M, Hadi W, Ishtaiwi A (2021) ACRIPPER: a new associative classification based on RIPPER algorithm. J Inf Knowl Manag 20(01):2150013
Khedr AM, Al Aghbari Z, Al Ali A, Eljamil M (2021) An efficient association rule mining from distributed medical databases for predicting heart diseases. IEEE Access 9:15320–15333
Annapureddy P, Franco Z, Madiraju P, Ahamed SI, Flower M, Hossain MF, Winstead O (2021) Identifying precursors to long-term crisis in veterans using associative classifier. In: 2021 IEEE International Conference on Big Data (Big Data), pp 4633–4642
Wang CH, Lee TY, Hui KC, Chung MH (2019) Mental disorders and medical comorbidities: association rule mining approach. Perspect Psychiatr Care 55(3):517–526
Rohidin D, Samsudin NA, Deris MM (2020) Association rules of fuzzy soft set based classification for text classification problem. J King Saud Univ Comput Inf Sci
Shao Z, Li Y, Wang X, Zhao X, Guo Y (2020) Research on a new automatic generation algorithm of concept map based on text analysis and association rules mining. J Ambient Intell Humaniz Comput 11(2):539–551
Jabri S, Dahbi A, Gadi T, Bassir A (2018) Ranking of text documents using TF-IDF weighting and association rules mining. In: 4th International Conference on Optimization and Applications, pp 1–6
Puri S, Singh SP (2019) An efficient Hindi text classification model using SVM. In: Computing and Network Sustainability, Singapore, pp 227–237
Al Aghbari Z, Saeed M (2021) Leveraging association rules in feature selection to classify text. In: 4th International Conference on Computer Networks and Inventive Communication Technologies, India
Acknowledgements
The authors published preliminary results of this work as a short conference paper [35]. This version includes a more detailed analysis of the proposed algorithms, as well as additional experiments and comparisons.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Zaher Al Aghbari contributed equally to this work.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Saeed, M.M., Al Aghbari, Z. ARTC: feature selection using association rules for text classification. Neural Comput & Applic 34, 22519–22529 (2022). https://doi.org/10.1007/s00521-022-07669-5
DOI: https://doi.org/10.1007/s00521-022-07669-5