A Survey of Numerous Text Similarity Approach

Authors

  • Joyinee Dasgupta  Advanced Technology Centers, India
  • Priyanka Kumari Mishra  Advanced Technology Centers, India
  • Selvakuberan Karuppasamy   Advanced Technology Centers, India
  • Arpana Dipak Mahajan  Advanced Technology Centers, India

DOI:

https://doi.org/10.32628/CSEIT2390133

Keywords:

Natural Language Processing, Euclidean distance, Cosine similarity, Jaccard distance, Word embeddings, Language models, Universal Sentence Encoders

Abstract

Text similarity is one of the most common NLP use cases, and it appears in every domain. Typical applications include finding related articles, news, or genres; making search engines more efficient; and classifying related issues on a given topic. It also serves as a building block for many other text analytics use cases. Methods for solving text similarity problems have been around for a while, but the older methods have notable drawbacks, including loss of dependency information, difficulty remembering long conversations, and exploding gradients. Recent deep learning-based models pay attention to both adjacent and distant words, which strengthens their ability to learn. This white paper surveys text similarity techniques that can be applied to these use cases in practice.
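As a rough illustration of the three classical measures named in the keywords (cosine similarity, Jaccard distance, and Euclidean distance), the following minimal sketch computes each over simple word counts. The whitespace tokenizer, the bag-of-words representation, and all function names are assumptions made for this example; they are not taken from the paper, and the embedding-based and language-model approaches it surveys are not covered here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def cosine_similarity(a, b):
    """Cosine of the angle between the two term-count vectors (1.0 = same direction)."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty input
    return dot / (norm_a * norm_b)


def jaccard_distance(a, b):
    """One minus the ratio of shared unique words to all unique words."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    if not union:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(sa & sb) / len(union)


def euclidean_distance(a, b):
    """Straight-line distance between the two term-count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    vocab = ca.keys() | cb.keys()
    # Counter returns 0 for words that are missing from one text.
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in vocab))


if __name__ == "__main__":
    s1 = "the cat sat on the mat"
    s2 = "the cat lay on the mat"
    print(cosine_similarity(s1, s2))
    print(jaccard_distance(s1, s2))
    print(euclidean_distance(s1, s2))
```

Count-based measures like these ignore word order and meaning, so "sat" and "lay" are as different as any two unrelated words. That limitation is one reason the abstract points to models that take distant context into account.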

Published

2023-02-28

Section

Research Articles

How to Cite

[1]
Joyinee Dasgupta, Priyanka Kumari Mishra, Selvakuberan Karuppasamy, Arpana Dipak Mahajan, "A Survey of Numerous Text Similarity Approach," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 9, Issue 1, pp. 184-194, January-February 2023. Available at doi: https://doi.org/10.32628/CSEIT2390133