
Topic Modelling Based Semantic Search

  • Conference paper
  • First Online:
Data Management, Analytics and Innovation (ICDMAI 2022)

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 137)


Abstract

In today’s world, information must be retrieved from a plethora of publications and papers for writing and reviewing scientific papers, and retrieving the relevant data has become a very tedious job. The novelty of this paper is a holistic approach to retrieving information or abstracts from a corpus of published papers and artifacts, improving on the conventional methods of searching abstracts. To get better search results, a reviewer can provide phrases, sentences, or even a complete abstract as the query, and the system returns the most relevant abstracts from any large database of publications. To accomplish this, we have implemented topic modelling based semantic search, using various AI algorithms to find the abstracts nearest to a given search abstract. For this paper we have used PubMed data extensively. Along with abstract search, we also provide other inherent features that may go unnoticed in a huge corpus, such as similar words and the relations between words and phrases.
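The core idea of topic modelling based semantic search is that every abstract, and the query itself (a phrase, a sentence, or a full abstract), is mapped to a distribution over latent topics, and the abstracts closest to the query in that topic space are returned. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes LDA as the topic model (fitted with a small pure-Python collapsed Gibbs sampler instead of a library such as gensim), Hellinger distance as the closeness measure, and a brute-force scan instead of a nearest-neighbour index. The names `TinyLDA`, `tokenize`, `hellinger`, `search`, and the stop-word list are hypothetical, chosen only for this example.

```python
import math
import random
import re

# Hypothetical stop-word list for this toy example; real PubMed text needs a fuller one.
STOPWORDS = {"the", "and", "with", "for", "from", "that", "this"}


def tokenize(text):
    """Lower-case, keep alphabetic tokens longer than two characters, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if len(w) > 2 and w not in STOPWORDS]


class TinyLDA:
    """LDA topic model fitted by collapsed Gibbs sampling (illustrative, pure Python)."""

    def __init__(self, docs, num_topics=2, alpha=0.1, beta=0.01, iters=300, seed=0):
        self.k, self.alpha, self.beta = num_topics, alpha, beta
        self.rng = random.Random(seed)
        self.vocab = sorted({w for d in docs for w in d})
        self.word_id = {w: i for i, w in enumerate(self.vocab)}
        self.nkw = [[0] * len(self.vocab) for _ in range(self.k)]  # topic-word counts
        self.nk = [0] * self.k                                      # tokens per topic
        self.docs = [[self.word_id[w] for w in d] for d in docs]
        self.z, self.ndk = [], []  # per-token topic assignments, per-doc topic counts
        for d in self.docs:        # random initial assignment
            zs = [self.rng.randrange(self.k) for _ in d]
            counts = [0] * self.k
            for w, t in zip(d, zs):
                self._move(counts, w, t, +1)
            self.z.append(zs)
            self.ndk.append(counts)
        for _ in range(iters):     # Gibbs sweeps over every token
            for d, zs, counts in zip(self.docs, self.z, self.ndk):
                for i, w in enumerate(d):
                    self._move(counts, w, zs[i], -1)  # take the token out
                    zs[i] = self._sample(counts, w)   # draw a new topic for it
                    self._move(counts, w, zs[i], +1)  # put it back

    def _move(self, counts, w, t, delta):
        counts[t] += delta
        self.nkw[t][w] += delta
        self.nk[t] += delta

    def _sample(self, counts, w):
        # p(topic t) is proportional to (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
        v = len(self.vocab)
        weights = [(counts[t] + self.alpha) * (self.nkw[t][w] + self.beta)
                   / (self.nk[t] + v * self.beta) for t in range(self.k)]
        return self.rng.choices(range(self.k), weights=weights)[0]

    def theta(self, counts):
        """Smoothed topic distribution of a document, given its topic counts."""
        n = sum(counts)
        return [(c + self.alpha) / (n + self.k * self.alpha) for c in counts]

    def infer(self, tokens, iters=50):
        """Fold in an unseen query: resample only its tokens; topic-word counts stay fixed."""
        ids = [self.word_id[w] for w in tokens if w in self.word_id]  # unknown words dropped
        zs = [self.rng.randrange(self.k) for _ in ids]
        counts = [0] * self.k
        for t in zs:
            counts[t] += 1
        for _ in range(iters):
            for i, w in enumerate(ids):
                counts[zs[i]] -= 1
                zs[i] = self._sample(counts, w)
                counts[zs[i]] += 1
        return self.theta(counts)


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def search(model, query, top_n=3):
    """Return indices of the corpus abstracts whose topic mix is closest to the query's."""
    q = model.infer(tokenize(query))
    scored = sorted((hellinger(q, model.theta(c)), i) for i, c in enumerate(model.ndk))
    return [i for _, i in scored[:top_n]]
```

For example, fitting `TinyLDA([tokenize(a) for a in abstracts], num_topics=2)` on a toy corpus and calling `search(model, "EEG recordings reveal seizures in the brain")` returns the indices of the abstracts whose topic mixture is nearest to the query. The linear scan costs one distance computation per abstract; on a corpus the size of PubMed, a spatial index such as a KD-tree, or an approximate nearest-neighbour library, would be the natural replacement.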



Author information


Corresponding author

Correspondence to Mrityunjoy Panday.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Panday, M., Sahu, S. (2023). Topic Modelling Based Semantic Search. In: Goswami, S., Barara, I.S., Goje, A., Mohan, C., Bruckstein, A.M. (eds) Data Management, Analytics and Innovation. ICDMAI 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 137. Springer, Singapore. https://doi.org/10.1007/978-981-19-2600-6_20
