Topic Identification and Prediction Using Sanskrit Hysynset

Bafna, Prafulla B.; Saini, Jatinderkumar R.

doi:10.1007/978-981-19-2840-6_14

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 475)

Abstract

Topic modeling is well established for English corpora because plentiful resources are available, but topic modeling for Sanskrit remains comparatively unexplored. The proposed approach has four phases. The first phase constructs the Hysynset, the second builds a topic model, the third applies clustering, and the fourth performs classification and prediction. In the first phase, hypernyms-hyponyms and synonyms are grouped to reduce the dimensions and create a semantic space. The topic model is built using Latent Dirichlet Allocation (LDA), which yields very specific and informative topics because it uses the Hysynset vector space model for Sanskrit (HSVSMS). The dataset comprises more than 1100 Sanskrit stories. Document-wise topics are presented using a dendrogram obtained by applying hierarchical agglomerative clustering (HAC); a supervised model, random forest, is then used to predict the topic of a test or new document and is evaluated using classification error and accuracy. Because no standard experiments are available, the story-prediction results could not be compared with other existing work. A comparative analysis of topic identification against the existing Vector Space Model for Sanskrit (VSMS) shows that the proposed technique (HSVSMS) performs better in terms of accuracy, misclassification error, coherence score, entropy, purity, and topic titles.
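The grouping idea behind the first phase can be illustrated with a minimal sketch. It is not the authors' implementation: the word groups, the transliterated and stemmed tokens, and the names `HYSYNSET_GROUPS`, `build_lookup`, and `vectorize` are all hypothetical, chosen only to show how mapping synonyms and hypernym-hyponym pairs to one representative shrinks the term-document matrix before a topic model is fitted.

```python
from collections import Counter

# Hypothetical toy groups (transliterated); not taken from the paper's resources.
HYSYNSET_GROUPS = {
    "jala": ["jala", "udaka", "vari"],     # synonyms for "water"
    "vriksha": ["vriksha", "taru"],        # synonyms for "tree"
    "pashu": ["pashu", "simha", "gaja"],   # hypernym "animal" with hyponyms "lion", "elephant"
}

def build_lookup(groups):
    """Map every member word to the representative of its group."""
    return {word: rep for rep, members in groups.items() for word in members}

def vectorize(docs, lookup=None):
    """Return (vocabulary, count matrix); with a lookup, grouped words share one column."""
    lookup = lookup or {}
    counts = [Counter(lookup.get(tok, tok) for tok in doc.split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

# Three tiny documents of assumed, already-stemmed tokens.
docs = [
    "simha jala pibati",
    "gaja udaka pibati",
    "taru vari",
]

plain_vocab, plain_matrix = vectorize(docs)                          # one column per surface word
hs_vocab, hs_matrix = vectorize(docs, build_lookup(HYSYNSET_GROUPS))  # one column per group

print(len(plain_vocab), "->", len(hs_vocab))
print(hs_vocab)
print(hs_matrix)
```

In this toy corpus the first two documents share only one term in the plain representation but become identical once grouped, which is the kind of semantic overlap a downstream topic model or clustering step could then exploit.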

Author information

Authors and Affiliations

Symbiosis Institute of Computer Studies and Research, Symbiosis International Deemed University, Pune, India
Prafulla B. Bafna & Jatinderkumar R. Saini

Corresponding author

Correspondence to Prafulla B. Bafna.

Editor information

Editors and Affiliations

Electronics and Communication Engineering, Gnanamani College of Technology, Namakkal, India
G. Ranganathan
Czech Technical University in Prague, Prague, Czech Republic
Robert Bestak
Ryerson Communications Lab, Toronto, ON, Canada
Xavier Fernando

Cite this paper

Bafna, P.B., Saini, J.R. (2023). Topic Identification and Prediction Using Sanskrit Hysynset. In: Ranganathan, G., Bestak, R., Fernando, X. (eds) Pervasive Computing and Social Networking. Lecture Notes in Networks and Systems, vol 475. Springer, Singapore. https://doi.org/10.1007/978-981-19-2840-6_14

DOI: https://doi.org/10.1007/978-981-19-2840-6_14
Published: 02 September 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2839-0
Online ISBN: 978-981-19-2840-6
eBook Packages: Engineering, Engineering (R0)
