An Automatic Identification and Resolution System for Protein-Related Abbreviations in Scientific Papers

  • Conference paper
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2011)

Abstract

We propose a methodology to identify and resolve protein-related abbreviations in the full texts of scientific papers, as part of a semi-automatic process implemented in our PRAISED framework. Biological acronyms are identified with a syntactic approach that exploits lexical clues and relies mostly on domain-independent metrics, yielding high recall and very low execution time. Abbreviation resolution then uses both syntactic and semantic criteria to match an abbreviation with its potential explanation, which is searched for among a number of contiguous words proportional to the abbreviation’s length. We tested the system against the Medstract Gold Standard corpus and a set of manually annotated PubMed papers, obtaining significant results and high performance while keeping the system customizable, lightweight and scalable.
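The two stages described in the abstract can be illustrated with a minimal sketch. It is not the PRAISED implementation, whose criteria are not given on this page. The parenthesis-based lexical clue, the window factor of 2, the subsequence matching rule, and the names `CANDIDATE`, `resolve` and `find_pairs` are all assumptions chosen for illustration, and the semantic criteria mentioned in the abstract are not modelled.

```python
import re

# Assumed lexical clue (not taken from the paper): a short token in
# parentheses that starts with a letter, e.g. "(EGFR)".
CANDIDATE = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def is_subsequence(chars, text):
    """Return True if chars occur in text in the same order."""
    it = iter(text)
    return all(ch in it for ch in chars)


def resolve(abbr, preceding_words, factor=2):
    """Look for an explanation among the words preceding the abbreviation.

    The search window holds factor * len(abbr) words, so it grows in
    proportion to the abbreviation's length (the factor is an assumption).
    """
    letters = [c.lower() for c in abbr if c.isalnum()]
    window = preceding_words[-factor * len(letters):]
    # Try the shortest candidate first, then extend it to the left.
    for start in range(len(window) - 1, -1, -1):
        candidate = window[start:]
        if candidate[0][0].lower() != letters[0]:
            continue  # explanation must start with the abbreviation's first letter
        if is_subsequence(letters, " ".join(candidate).lower()):
            return " ".join(candidate)
    return None


def find_pairs(text):
    """Identify candidate abbreviations and pair each with an explanation."""
    pairs = []
    for match in CANDIDATE.finditer(text):
        abbr = match.group(1)
        if not any(c.isupper() for c in abbr):
            continue  # skip parenthesised ordinary words such as "(see)"
        words = re.findall(r"[A-Za-z0-9-]+", text[:match.start()])
        expansion = resolve(abbr, words)
        if expansion:
            pairs.append((abbr, expansion))
    return pairs


sample = ("Signalling depends on the epidermal growth factor receptor (EGFR) "
          "and on heat shock protein (HSP) levels.")
for abbr, expansion in find_pairs(sample):
    print(abbr, "->", expansion)
```

A purely syntactic matcher like this one is fast but accepts any in-order letter match, which is presumably where additional semantic criteria would help filter out false pairings.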

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Atzeni, P., Polticelli, F., Toti, D. (2011). An Automatic Identification and Resolution System for Protein-Related Abbreviations in Scientific Papers. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2011. Lecture Notes in Computer Science, vol 6623. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20389-3_18

  • DOI: https://doi.org/10.1007/978-3-642-20389-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20388-6

  • Online ISBN: 978-3-642-20389-3

  • eBook Packages: Computer Science, Computer Science (R0)
