Abstract
Standard approaches to knowledge extraction from biomedical literature focus on information retrieval from abstracts that are publicly available in medical databases such as PubMed. To limit the number of initial results, a suitable query against such databases can be constructed. However, for many research topics, pre-selecting a sufficiently small set of documents can be very difficult or even impossible. Another problem stems from the large variability of the retrieved publication lists when search keywords are changed. In this paper we address both problems by proposing an algorithm and an implementation capable of working on full-text articles. We present an information retrieval system that combines selection of individual sections of full-text papers with a rule-based search engine. We demonstrate that, for some research topics, our solution can provide much better results than finding documents by keywords and abstracts alone.
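The general idea of combining section selection with a rule-based search can be illustrated by a minimal Python sketch. It is not the system described in this paper: the heading list, the function names (`split_sections`, `matches_rule`, `search`), and the simple "all of / any of" rule format are assumptions made purely for illustration, and the input is assumed to be plain text already extracted from the PDF files.

```python
import re

# Hypothetical heading names; the section list used by the actual system is not given here.
SECTION_HEADINGS = ("abstract", "introduction", "methods", "results", "discussion")


def split_sections(text):
    """Split plain text into {section_name: body}, treating heading-only lines as boundaries."""
    sections = {}
    current = None
    for line in text.splitlines():
        name = line.strip().lower()
        if name in SECTION_HEADINGS:
            current = name
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


def matches_rule(text, all_of=(), any_of=()):
    """Illustrative rule: every term in all_of must occur and, if any_of is given, at least one of its terms."""
    def has(term):
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, text, re.IGNORECASE) is not None

    return all(has(t) for t in all_of) and (not any_of or any(has(t) for t in any_of))


def search(documents, wanted_sections, all_of=(), any_of=()):
    """Return ids of documents whose selected sections satisfy the rule."""
    hits = []
    for doc_id, text in documents.items():
        sections = split_sections(text)
        selected = "\n".join(sections.get(s, "") for s in wanted_sections)
        if matches_rule(selected, all_of, any_of):
            hits.append(doc_id)
    return hits
```

Restricting the rule to, say, the results section means that a term mentioned only in passing in an abstract or introduction does not produce a hit, which is the kind of distinction that abstract-only search cannot make.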
Acknowledgments
This work was financially supported in part by the NCN Opus grant UMO-2011/01/B/ST6/06868 to AP. Computations were performed using the infrastructure provided by the NCBIR POIG.02.03.01-24-099/13 grant: GCONiI - Upper-Silesian Center for Scientific Computations.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Lancucki, R., Polanski, A. (2016). Automatic PDF Files Based Information Retrieval System with Section Selection and Key Terms Aggregation Rules. In: Gruca, A., Brachman, A., Kozielski, S., Czachórski, T. (eds) Man–Machine Interactions 4. Advances in Intelligent Systems and Computing, vol 391. Springer, Cham. https://doi.org/10.1007/978-3-319-23437-3_21
DOI: https://doi.org/10.1007/978-3-319-23437-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23436-6
Online ISBN: 978-3-319-23437-3
eBook Packages: Engineering, Engineering (R0)