Abstract
Writing and reviewing scientific papers requires retrieving information from a plethora of publications, and finding the relevant material has become a tedious job. The novelty of this paper is a holistic approach to retrieving information, or abstracts, from a corpus of published papers and artifacts that improves on conventional methods of abstract search. To get better search results, a reviewer can provide phrases, sentences, or even a complete abstract as the query, and the search returns the most relevant abstracts from a large database of publications. To accomplish this, we have implemented topic-modelling-based semantic search, in which various AI algorithms find the abstracts closest to any given search abstract. We have used PubMed data extensively for this paper. Along with the matching abstracts, we also surface other inherent features that may go unnoticed in a huge corpus, such as similar words and relations between words and phrases.
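The retrieval step described in the abstract, finding the abstracts closest to a query abstract, can be illustrated with a minimal sketch. It assumes each abstract has already been reduced to a vector of topic proportions by some topic model; the abstract does not specify the model, the distance measure, or the data layout, so the Euclidean distance, the function name `nearest_abstracts`, the identifiers, and all numeric values below are hypothetical and chosen only for illustration.

```python
import heapq
import math

# Hypothetical topic mixtures (each row sums to 1). In practice these would
# come from a topic model fitted on the corpus; IDs and values are made up.
TOPIC_VECTORS = {
    "abstract_a": [0.80, 0.15, 0.05],
    "abstract_b": [0.10, 0.85, 0.05],
    "abstract_c": [0.60, 0.25, 0.15],
    "abstract_d": [0.05, 0.10, 0.85],
}


def nearest_abstracts(query_vector, topic_vectors, k=2):
    """Return the k (id, distance) pairs closest to query_vector.

    Distance is plain Euclidean distance between topic mixtures, which is
    an assumption of this sketch rather than the paper's stated choice.
    """
    distances = (
        (abstract_id, math.dist(query_vector, vector))
        for abstract_id, vector in topic_vectors.items()
    )
    # nsmallest returns the pairs sorted by ascending distance.
    return heapq.nsmallest(k, distances, key=lambda pair: pair[1])


# Topic mixture of a hypothetical query abstract.
query = [0.75, 0.20, 0.05]
results = nearest_abstracts(query, TOPIC_VECTORS, k=2)
for abstract_id, distance in results:
    print(f"{abstract_id}: {distance:.3f}")
```

A brute-force scan like this is only reasonable for small collections; for a corpus of realistic size, a spatial index or approximate nearest-neighbour structure would normally replace the linear pass.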
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Panday, M., Sahu, S. (2023). Topic Modelling Based Semantic Search. In: Goswami, S., Barara, I.S., Goje, A., Mohan, C., Bruckstein, A.M. (eds) Data Management, Analytics and Innovation. ICDMAI 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 137. Springer, Singapore. https://doi.org/10.1007/978-981-19-2600-6_20
DOI: https://doi.org/10.1007/978-981-19-2600-6_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2599-3
Online ISBN: 978-981-19-2600-6
eBook Packages: Intelligent Technologies and Robotics (R0)