Automatic PDF Files Based Information Retrieval System with Section Selection and Key Terms Aggregation Rules

Lancucki, Rafal; Polanski, Andrzej

doi:10.1007/978-3-319-23437-3_21


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 391)


Abstract

Standard approaches to knowledge extraction from biomedical literature focus on information retrieval from abstracts that are publicly available in medical databases such as PubMed. To limit the number of results up front, a suitable query can be constructed against such databases. However, for many research topics, pre-selecting a small enough set of documents can be very difficult or even impossible. Another problem stems from the large variability of the retrieved lists of publications when the keywords given to search engines are changed. In this paper we address both of these problems by proposing an algorithm and an implementation capable of working on full-text articles. We present an information retrieval system that selects separate sections of the full texts of papers and uses a rule-based search engine. We demonstrate that, for some research topics, our solution can provide much better results than finding documents by keywords and abstracts alone.
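The abstract does not spell out the algorithm, so the following is only a minimal sketch of what section selection combined with a rule-based term search could look like, not the authors' implementation. It assumes the PDF text has already been extracted to plain text, that section headings appear on their own lines, and that a rule is an AND of OR-groups of key terms. The names `SECTION_HEADINGS`, `split_sections`, `matches_rule` and `search` are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical set of headings used to split a paper; real papers vary widely.
SECTION_HEADINGS = ("abstract", "introduction", "methods", "results", "discussion")


def split_sections(text: str) -> Dict[str, str]:
    """Split plain text into sections, keyed by lower-cased heading line."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        heading = line.strip().lower()
        if heading in SECTION_HEADINGS:
            current = heading
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}


def contains_term(text: str, term: str) -> bool:
    """Case-insensitive whole-word match of a single key term."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def matches_rule(text: str, rule: List[List[str]]) -> bool:
    """Assumed rule format: every group must contribute at least one term."""
    return all(any(contains_term(text, t) for t in group) for group in rule)


def search(documents: Dict[str, str],
           selected_sections: List[str],
           rule: List[List[str]]) -> List[str]:
    """Return ids of documents whose selected sections satisfy the rule."""
    hits = []
    for doc_id, text in documents.items():
        sections = split_sections(text)
        selected = "\n".join(sections.get(s, "") for s in selected_sections)
        if matches_rule(selected, rule):
            hits.append(doc_id)
    return hits


# Toy documents, invented purely for illustration.
docs = {
    "a": "Abstract\nWe study diabetes.\nResults\nThe gene INS is associated with type 1 diabetes.\n",
    "b": "Introduction\nINS and diabetes were studied before.\nResults\nNo association found.\n",
}
rule = [["INS", "insulin"], ["diabetes", "T1D"]]

results_only = search(docs, ["results"], rule)
intro_and_results = search(docs, ["introduction", "results"], rule)
```

In this toy example, restricting the search to the Results section keeps only the document that reports the association there, whereas including the Introduction also returns the document that merely mentions the terms as background. That is the kind of difference section selection is meant to expose.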



Acknowledgments

This paper was partially financially supported by the NCN Opus grant UMO-2011/01/B/ST6/06868 to AP. Computations were performed with the use of the infrastructure provided by the NCBIR POIG.02.03.01-24-099/13 grant: GCONiI - Upper-Silesian Center for Scientific Computations.

Author information

Authors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Rafal Lancucki & Andrzej Polanski


Corresponding author

Correspondence to Rafal Lancucki.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Aleksandra Gruca
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Agnieszka Brachman
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Tadeusz Czachórski


About this paper

Cite this paper

Lancucki, R., Polanski, A. (2016). Automatic PDF Files Based Information Retrieval System with Section Selection and Key Terms Aggregation Rules. In: Gruca, A., Brachman, A., Kozielski, S., Czachórski, T. (eds) Man–Machine Interactions 4. Advances in Intelligent Systems and Computing, vol 391. Springer, Cham. https://doi.org/10.1007/978-3-319-23437-3_21


DOI: https://doi.org/10.1007/978-3-319-23437-3_21
Published: 09 September 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23436-6
Online ISBN: 978-3-319-23437-3
eBook Packages: Engineering, Engineering (R0)
