Design and analysis of text document clustering using salp swarm algorithm

The Journal of Supercomputing

Abstract

In the technological era, the exponential growth of unorganized text documents makes it increasingly difficult to retrieve the most relevant data. Document clustering is a prominent technique that organizes such content into clusters. Clustering must routinely handle text documents that contain misleading or redundant information, which degrades clustering quality. In this study, a salp swarm algorithm (SSA) is used to cluster text documents, with similarity- and distance-based measures serving as the objective function. Experimental validation shows that the SSA-based similarity and distance measurement improves the quality of text document clustering. A comparison with existing methods shows that the proposed SSA-based approach clusters text documents better in terms of accuracy, sensitivity, specificity, and F-measure.
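The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the preview does not give their objective function, so the blend of cosine distance and Euclidean distance below, the weight `alpha`, and every function name are assumptions made for illustration. Documents are assumed to be already encoded as equal-length numeric vectors (for example, term weights). The leader and follower updates follow the commonly published form of SSA (Mirjalili et al. 2017), with each salp encoding k candidate centroids.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def blended(doc, centroid, alpha):
    # Assumed objective term: weighted mix of the two measures.
    return alpha * cosine_distance(doc, centroid) + (1.0 - alpha) * euclidean(doc, centroid)


def split(position, dim):
    """View a flat salp position as a list of centroids of length dim."""
    return [position[i:i + dim] for i in range(0, len(position), dim)]


def fitness(position, docs, dim, alpha=0.5):
    """Mean blended distance from each document to its nearest centroid (lower is better)."""
    cents = split(position, dim)
    return sum(min(blended(d, c, alpha) for c in cents) for d in docs) / len(docs)


def assign(position, docs, dim, alpha=0.5):
    """Cluster label (centroid index) for each document."""
    cents = split(position, dim)
    return [min(range(len(cents)), key=lambda j: blended(d, cents[j], alpha)) for d in docs]


def ssa_cluster(docs, k, n_salps=20, n_iter=100, alpha=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(docs[0])
    n = dim * k
    # Search bounds taken from the data, repeated once per centroid.
    lb = [min(d[j] for d in docs) for j in range(dim)] * k
    ub = [max(d[j] for d in docs) for j in range(dim)] * k
    salps = [[rng.uniform(lb[j], ub[j]) for j in range(n)] for _ in range(n_salps)]
    fits = [fitness(s, docs, dim, alpha) for s in salps]
    best = min(range(n_salps), key=fits.__getitem__)
    food, food_fit = salps[best][:], fits[best]  # best solution found so far

    for l in range(1, n_iter + 1):
        # c1 shrinks over time, moving the leader from exploration to exploitation.
        c1 = 2.0 * math.exp(-((4.0 * l / n_iter) ** 2))
        for i in range(n_salps):
            if i == 0:
                # Leader: step around the food source in a random direction.
                new = []
                for j in range(n):
                    step = c1 * ((ub[j] - lb[j]) * rng.random() + lb[j])
                    new.append(food[j] + step if rng.random() >= 0.5 else food[j] - step)
            else:
                # Follower: move halfway toward the salp ahead of it in the chain.
                new = [0.5 * (a + b) for a, b in zip(salps[i], salps[i - 1])]
            # Keep every coordinate inside the search bounds.
            salps[i] = [min(max(x, lb[j]), ub[j]) for j, x in enumerate(new)]
        for s in salps:
            f = fitness(s, docs, dim, alpha)
            if f < food_fit:
                food, food_fit = s[:], f
    return food, food_fit


if __name__ == "__main__":
    toy_docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    centroids, score = ssa_cluster(toy_docs, k=2, n_iter=50, seed=1)
    print(split(centroids, 2), score, assign(centroids, toy_docs, 2))
```

Here `ssa_cluster` returns the best flat centroid vector and its score, and `assign` turns that vector into per-document cluster labels. Because the food source is replaced only when a salp improves on it, the returned score is never worse than the best member of the initial population.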

Funding

No funding is involved in this work.

Author information

Corresponding author

Correspondence to Muruganantham Ponnusamy.

Ethics declarations

Conflict of interest

The authors declare there is no conflict of interest.

Ethics approval and consent to participate

No participation of humans takes place in this implementation process.

Human and animal rights

No violation of human and animal rights is involved.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ponnusamy, M., Bedi, P., Suresh, T. et al. Design and analysis of text document clustering using salp swarm algorithm. J Supercomput 78, 16197–16213 (2022). https://doi.org/10.1007/s11227-022-04525-0

  • DOI: https://doi.org/10.1007/s11227-022-04525-0
