Abstract
Automated analysis of chest radiography using deep learning has tremendous potential to enhance the clinical diagnosis of diseases in patients. However, deep learning models typically require large amounts of annotated data to achieve high performance, which is often an obstacle to medical domain adaptation. In this paper, we build a data-efficient learning framework that utilizes radiology reports to improve medical image classification performance with limited labeled data (fewer than 1000 examples). Specifically, we examine image-captioning pretraining to learn high-quality medical image representations that require fewer labeled examples for downstream training. Following joint pretraining of a convolutional encoder and transformer decoder, we transfer the learned encoder to various classification tasks. Averaged over 9 pathologies, we find that our model achieves higher classification performance than ImageNet-supervised and in-domain supervised pretraining when labeled training data is limited.
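The limited-label evaluation summarized above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not name the classification metric, so AUC is assumed here, and the helper names (`subsample`, `auc`, `macro_average`) and the placeholder pathology keys are hypothetical. The sketch draws a labeled subset of fewer than 1000 examples, scores each pathology separately, and averages the per-pathology results.

```python
import random


def subsample(examples, n, seed=0):
    """Draw a fixed-size labeled subset to mimic a limited-label setting."""
    rng = random.Random(seed)
    return rng.sample(examples, min(n, len(examples)))


def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def macro_average(per_task):
    """Unweighted mean of per-pathology scores."""
    return sum(per_task.values()) / len(per_task)


if __name__ == "__main__":
    # Hypothetical pool of labeled example ids; keep fewer than 1000 of them.
    pool = list(range(5000))
    train_ids = subsample(pool, 999)
    print(len(train_ids))

    # Toy labels and scores for two placeholder pathologies (assumption:
    # the paper averages over 9; two are shown to keep the example short).
    per_pathology = {
        "pathology_a": auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]),
        "pathology_b": auc([1, 0, 0, 1], [0.6, 0.7, 0.2, 0.9]),
    }
    print(macro_average(per_pathology))
```

Fixing the random seed keeps the sampled subset identical across pretraining strategies, so differences in the averaged score reflect the pretrained encoder rather than the choice of training examples.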
Acknowledgements
This work was supported in part by MIT Lincoln Laboratory, US Air Force, NIH NIBIB NAC P41EB015902, Wistron, IBM Watson, MIT Deshpande Center, and MIT J-Clinic.
DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited. This material is based upon work supported by the Old Program 1 under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Old Program 1. ©Massachusetts Institute of Technology. Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Quigley, K. et al. (2022). RadTex: Learning Efficient Radiograph Representations from Text Reports. In: Xu, X., Li, X., Mahapatra, D., Cheng, L., Petitjean, C., Fu, H. (eds) Resource-Efficient Medical Image Analysis. REMIA 2022. Lecture Notes in Computer Science, vol 13543. Springer, Cham. https://doi.org/10.1007/978-3-031-16876-5_3
DOI: https://doi.org/10.1007/978-3-031-16876-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16875-8
Online ISBN: 978-3-031-16876-5
eBook Packages: Computer Science, Computer Science (R0)