Abstract
Medical imaging coupled with image captioning is enabling the generation of accurate medical reports with minimal human intervention. In resource-constrained regions, this creates an opportunity for underserved populations to access world-class diagnostic expertise with fast turnaround. Chest X-ray images are integral to the diagnosis and treatment of respiratory disease. In this paper, we propose BeamAtt: an end-to-end deep CNN-RNN encoder-decoder framework that incorporates spatial visual attention to generate a terse diagnosis from chest X-ray films. We adopt a GRU decoder rather than the LSTM or hierarchical LSTM decoders used in prior work, and justify this choice through extensive evaluation. To surpass state-of-the-art methods built on complex architectures, we employ sampling-based techniques together with beam search optimisation at inference time, and argue that a simpler framework with intelligent decoding can achieve higher performance. We show how attention plots provide insight into the image region on which the network concentrates while generating each word token. We compare our model with recent prior art using the standard evaluation metrics BLEU-1/2/3/4, ROUGE-L, and CIDEr, and demonstrate the superiority of the proposed method. BeamAtt achieves a BLEU-1 score of 0.56 and a CIDEr score of 2.077, a significant improvement over contemporary solutions.
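The beam search decoding named in the abstract can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: `toy_step` is a hypothetical stand-in for the attention-weighted GRU decoder step, returning a fixed toy distribution over next tokens so the search procedure itself is visible. Beam search keeps the `beam_width` highest-scoring partial captions at each step instead of committing greedily to the single best next token.

```python
import math

# Toy next-token scorer standing in for a GRU decoder step: given the
# tokens generated so far, return {token: log-probability}. In a real
# captioner this would come from the attention-weighted decoder.
def toy_step(prefix):
    table = {
        (): {"no": math.log(0.6), "the": math.log(0.4)},
        ("no",): {"acute": math.log(0.8), "<eos>": math.log(0.2)},
        ("no", "acute"): {"disease": math.log(0.9), "<eos>": math.log(0.1)},
        ("no", "acute", "disease"): {"<eos>": math.log(1.0)},
        ("the",): {"<eos>": math.log(1.0)},
    }
    return table.get(prefix, {"<eos>": 0.0})

def beam_search(step_fn, beam_width=2, max_len=5):
    # Each hypothesis is (token_tuple, cumulative log-probability).
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, logp in step_fn(prefix).items():
                hyp = (prefix + (tok,), score + logp)
                # Completed hypotheses leave the beam; others compete on.
                (finished if tok == "<eos>" else candidates).append(hyp)
        if not candidates:
            break
        # Keep only the beam_width highest-scoring partial hypotheses.
        beams = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_width]
    finished.extend(beams)
    return max(finished, key=lambda h: h[1])[0]

print(beam_search(toy_step))  # → ('no', 'acute', 'disease', '<eos>')
```

Note that a purely greedy decoder would also need only one step memory, but beam search recovers sequences whose first token is not the single most likely continuation; combining it with sampling over the candidate pool, as the abstract describes, trades some of this determinism for diversity.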
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Sawarn, A., Srivastava, S., Gupta, M., Srivastava, S. (2021). BeamAtt: Generating Medical Diagnosis from Chest X-Rays Using Sampling-Based Intelligence. In: Srivastava, S., Khari, M., Gonzalez Crespo, R., Chaudhary, G., Arora, P. (eds) Concepts and Real-Time Applications of Deep Learning. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-76167-7_9
DOI: https://doi.org/10.1007/978-3-030-76167-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-76166-0
Online ISBN: 978-3-030-76167-7
eBook Packages: Intelligent Technologies and Robotics