Automatic PDF Files Based Information Retrieval System with Section Selection and Key Terms Aggregation Rules

  • Conference paper
Man–Machine Interactions 4

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 391)

Abstract

Standard approaches to knowledge extraction from biomedical literature focus on retrieving information from abstracts that are publicly available in medical databases such as PubMed. To limit the initial number of results, a suitable query against such databases can be constructed. However, for many research topics, pre-selecting a sufficiently small set of documents can be very difficult or even impossible. Another problem stems from the large variability of the retrieved publication lists when the search keywords are changed. In this paper we address both problems by proposing an algorithm and an implementation capable of working on full-text articles. We present an information retrieval system that selects separate sections of the full texts of papers and applies a rule-based search engine. We demonstrate that for some research topics our solution can provide much better results than finding documents only by keywords and abstracts.
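
The idea of restricting a rule-based search to selected sections of a full text can be illustrated with a minimal sketch. It is not the authors' implementation: the heading list, the function names (`split_sections`, `contains_term`, `matches_rule`), the rule form (all of some terms plus at least one of others), and the sample text are assumptions made only for illustration, and the input is assumed to be plain text already extracted from a PDF.

```python
import re

# Hypothetical heading names; the paper's actual section detection is not described here.
SECTION_HEADINGS = ("abstract", "introduction", "methods", "results", "discussion")


def split_sections(full_text):
    """Split plain text into {section_name: body}, using lines that equal a known heading."""
    sections = {}
    current = None
    for line in full_text.splitlines():
        name = line.strip().lower()
        if name in SECTION_HEADINGS:
            current = name
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


def contains_term(text, term):
    """Case-insensitive whole-word match of a key term."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches_rule(sections, selected, all_of=(), any_of=()):
    """Illustrative aggregation rule over the selected sections only:
    every term in all_of must occur, and at least one term in any_of (if given)."""
    text = "\n".join(sections.get(name, "") for name in selected)
    if not all(contains_term(text, term) for term in all_of):
        return False
    if any_of and not any(contains_term(text, term) for term in any_of):
        return False
    return True


# Made-up sample document, not taken from any real paper.
paper = """Abstract
We study gene expression.
Methods
Samples were genotyped for GENE1 in diabetes families.
Results
An association was found.
"""

sections = split_sections(paper)
print(matches_rule(sections, ["methods"], all_of=["GENE1"], any_of=["diabetes", "asthma"]))  # True
print(matches_rule(sections, ["abstract"], all_of=["GENE1"]))  # False
```

In this toy document the key term appears only in the Methods section, so a search limited to the abstract misses it while a search over the selected full-text section finds it.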

Acknowledgments

This paper was partially financially supported by the NCN Opus grant UMO-2011/01/B/ST6/06868 to AP. Computations were performed with the use of the infrastructure provided by the NCBIR POIG.02.03.01-24-099/13 grant: GCONiI - Upper-Silesian Center for Scientific Computations.

Author information

Corresponding author

Correspondence to Rafal Lancucki.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Lancucki, R., Polanski, A. (2016). Automatic PDF Files Based Information Retrieval System with Section Selection and Key Terms Aggregation Rules. In: Gruca, A., Brachman, A., Kozielski, S., Czachórski, T. (eds) Man–Machine Interactions 4. Advances in Intelligent Systems and Computing, vol 391. Springer, Cham. https://doi.org/10.1007/978-3-319-23437-3_21

  • DOI: https://doi.org/10.1007/978-3-319-23437-3_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23436-6

  • Online ISBN: 978-3-319-23437-3

  • eBook Packages: Engineering, Engineering (R0)
