Abstract
The widespread adoption of Electronic Medical Records (EMR), together with the need to give the patient the appropriate treatment at the appropriate moment and at the appropriate cost, is creating demand for solutions that analyze the information in the EMR automatically. However, most of the information in the EMR is unstructured: texts and images. Extracting knowledge from this data requires methods for structuring it. Despite the efforts made in Natural Language Processing (NLP), including in the biomedical domain, and in image processing, medical big data still faces several challenges. The ungrammatical structure of clinical notes, the abbreviations used, and evolving terminology have to be tackled in any Named Entity Recognition process. Moreover, abbreviations, acronyms and terms depend heavily on the language and on the specific clinical service. In the area of medical images, one of the main challenges is the development of new algorithms and methodologies that can help the physician take full advantage of the information contained in all these images. However, the large number of imaging modalities used today for diagnosis hinders the development of general procedures; machine learning is, once again, a good approach for addressing this challenge. In this chapter, which concentrates on the problem of named entity recognition, we review previous approaches and outline future work. We also review machine learning approaches for image segmentation and annotation.
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Menasalvas, E., Gonzalo-Martin, C. (2016). Challenges of Medical Text and Image Processing: Machine Learning Approaches. In: Holzinger, A. (ed.) Machine Learning for Health Informatics. Lecture Notes in Computer Science, vol 9605. Springer, Cham. https://doi.org/10.1007/978-3-319-50478-0_11
DOI: https://doi.org/10.1007/978-3-319-50478-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50477-3
Online ISBN: 978-3-319-50478-0
eBook Packages: Computer Science (R0)