Challenges of Medical Text and Image Processing: Machine Learning Approaches

Menasalvas, Ernestina; Gonzalo-Martin, Consuelo

doi:10.1007/978-3-319-50478-0_11


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9605)


Abstract

The widespread adoption of Electronic Medical Records (EMR), together with the need to give each patient the appropriate treatment at the appropriate moment and at the appropriate cost, is creating demand for solutions that analyze EMR information automatically. However, most of the information in the EMR is unstructured: text and images. Extracting knowledge from these data requires methods for structuring this information. Despite the efforts made in Natural Language Processing (NLP), including in the biomedical domain, and in image processing, medical big data still poses several challenges. Any Named Entity Recognition (NER) process has to cope with the ungrammatical structure of clinical notes, the abbreviations they use, and evolving terminology. Moreover, abbreviations, acronyms and terms depend heavily on the language and on the specific clinical service. In the area of medical images, one of the main challenges is the development of new algorithms and methodologies that help physicians take full advantage of the information these images contain. However, the large number of imaging modalities used today for diagnosis hinders the availability of general-purpose procedures; here again, machine learning is a good approach for addressing this challenge. In this chapter, which concentrates on the problem of named entity recognition, we review previous approaches and look at future work. We also review machine learning approaches for image segmentation and annotation.
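To make the abbreviation problem concrete, the following is a minimal, dictionary-based sketch (not a machine learning method, and not the authors' approach) of a service-specific expansion step placed in front of a simple lexicon lookup. The abbreviation tables, the lexicon, and the function names `expand_abbreviations` and `tag_entities` are hypothetical illustrations. The example relies on the well-known ambiguity of "MS", which commonly stands for multiple sclerosis in neurology notes and for mitral stenosis in cardiology notes.

```python
import re

# Hypothetical, service-specific abbreviation tables: the same short form
# expands to a different long form depending on the clinical service.
ABBREVIATIONS = {
    "cardiology": {"ms": "mitral stenosis", "ra": "right atrium", "htn": "hypertension"},
    "neurology": {"ms": "multiple sclerosis", "htn": "hypertension"},
}

# Hypothetical entity lexicon mapping normalized terms to entity types.
LEXICON = {
    "mitral stenosis": "DISORDER",
    "multiple sclerosis": "DISORDER",
    "hypertension": "DISORDER",
    "right atrium": "ANATOMY",
}


def expand_abbreviations(note: str, service: str) -> str:
    """Replace whole-word abbreviations with the long forms used by `service`.

    Words not found in the service's table (or any word, if the service is
    unknown) are left exactly as written.
    """
    table = ABBREVIATIONS.get(service, {})

    def replace(match: re.Match) -> str:
        word = match.group(0)
        return table.get(word.lower(), word)

    # [A-Za-z]+ matches whole alphabetic runs, so "MS" is replaced but
    # substrings inside longer words are not.
    return re.sub(r"[A-Za-z]+", replace, note)


def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return sorted (term, entity_type) pairs found by exact lexicon lookup."""
    lowered = text.lower()
    found = []
    for term, label in LEXICON.items():
        # Word boundaries avoid matching a term inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((term, label))
    return sorted(found)


if __name__ == "__main__":
    note = "Pt w/ MS, HTN."
    for service in ("neurology", "cardiology"):
        expanded = expand_abbreviations(note, service)
        print(service, "->", expanded, tag_entities(expanded))
```

With the note "Pt w/ MS, HTN.", the neurology table produces "multiple sclerosis" and the cardiology table produces "mitral stenosis", so the same string yields different entities depending on the service. Hand-built tables like these are brittle and must be maintained per language and per service; learned models typically aim to resolve such ambiguities from the surrounding context instead, for example with supervised classifiers trained on annotated clinical notes.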




Author information

Authors and Affiliations

Center of Biomedical Technology, Universidad Politecnica de Madrid, Madrid, Spain
Ernestina Menasalvas & Consuelo Gonzalo-Martin


Corresponding author

Correspondence to Ernestina Menasalvas.

Editor information

Editors and Affiliations

Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, Austria
Andreas Holzinger


About this chapter

Cite this chapter

Menasalvas, E., Gonzalo-Martin, C. (2016). Challenges of Medical Text and Image Processing: Machine Learning Approaches. In: Holzinger, A. (ed.) Machine Learning for Health Informatics. Lecture Notes in Computer Science, vol 9605. Springer, Cham. https://doi.org/10.1007/978-3-319-50478-0_11


DOI: https://doi.org/10.1007/978-3-319-50478-0_11
Published: 10 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50477-3
Online ISBN: 978-3-319-50478-0
eBook Packages: Computer Science, Computer Science (R0)
