Keyword extraction and structuralization of medical reports

  • Research
  • Published in: Health Information Science and Systems

Abstract

Purpose

Rapid advances in medical technology mean that patients now routinely undergo more accurate and detailed examinations. Many examination reports are not numerical data but free-text documents that medical examiners write from instrument observations and biochemical test results. Organizing this unstructured text into a structured report would help doctors review a patient's status across the various examinations more efficiently. The structured data would also support association analysis to identify potential factors that affect a disease.

Methods

Working from pathology examination reports of renal diseases, we use the part-of-speech (POS) tagging results of natural language analysis to extract keyword phrases automatically. We then build a medical dictionary of the examination items that appear in a report; this dictionary supplies the terms retrieved to construct the structured form of the report. In addition, a probabilistic topic modeling method automatically discovers candidate keyword phrases for the examination items from the reports. Finally, we implement a system that generates the structured form of the examination items in a report according to the constructed medical dictionary.
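
As a rough illustration of the first, second, and last steps, the sketch below extracts adjective/noun phrases from POS-tagged tokens and files them under examination items by dictionary lookup. It is a minimal sketch under stated assumptions, not the system described in the paper: the tokens are assumed to be already tagged with Penn Treebank-style tags by some external tagger, the phrase pattern (a run of adjectives and nouns ending in a noun) is an assumed heuristic, and the function names, sample sentence, and dictionary entries are all hypothetical. The topic modeling step is not covered.

```python
from typing import Dict, List, Tuple

TaggedToken = Tuple[str, str]  # (word, POS tag), tags assumed Penn Treebank-style


def extract_noun_phrases(tagged: List[TaggedToken]) -> List[str]:
    """Collect maximal runs of adjectives/nouns that end in a noun."""
    phrases: List[str] = []
    run: List[TaggedToken] = []
    # A sentinel token flushes the final run.
    for word, tag in tagged + [("", "END")]:
        if tag.startswith("JJ") or tag.startswith("NN"):
            run.append((word, tag))
            continue
        # The run has ended: drop trailing adjectives so the phrase ends in a noun.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        if run:
            phrases.append(" ".join(w.lower() for w, _ in run))
        run = []
    return phrases


def structure_report(
    phrases: List[str], dictionary: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """File each phrase under every examination item whose terms it contains."""
    structured: Dict[str, List[str]] = {item: [] for item in dictionary}
    for phrase in phrases:
        for item, terms in dictionary.items():
            if any(term in phrase for term in terms):
                structured[item].append(phrase)
    return structured


# Hypothetical pre-tagged sentence and hypothetical dictionary entries.
tagged_sentence: List[TaggedToken] = [
    ("Glomeruli", "NNS"), ("show", "VBP"),
    ("mild", "JJ"), ("mesangial", "JJ"), ("expansion", "NN"),
    ("and", "CC"),
    ("severe", "JJ"), ("interstitial", "JJ"), ("fibrosis", "NN"),
    (".", "."),
]
item_dictionary: Dict[str, List[str]] = {
    "glomerulus": ["glomerul", "mesangial"],
    "interstitium": ["interstitial", "fibrosis"],
}

phrases = extract_noun_phrases(tagged_sentence)
structured = structure_report(phrases, item_dictionary)
print(structured)
```

Substring matching against the dictionary terms keeps the sketch short; a real system would likely need normalization of spelling variants and a richer dictionary than the two entries assumed here.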

Results and conclusion

Experiments show that the proposed methods effectively construct a structured form of examination reports and correctly extract the keywords of the common examination items. These techniques will support the automatic processing and analysis of medical text reports.

Acknowledgements

This research was partially supported by Project Number ASIA-105-CMUH-20, China Medical University Hospital.

Author information

Corresponding authors

Correspondence to Chin-Chi Kuo or Arbee L. P. Chen.

About this article

Cite this article

Wu, PH., Yu, A., Tsai, CW. et al. Keyword extraction and structuralization of medical reports. Health Inf Sci Syst 8, 18 (2020). https://doi.org/10.1007/s13755-020-00108-6

  • DOI: https://doi.org/10.1007/s13755-020-00108-6
