ABSTRACT
Patients are increasingly using the web for understanding medical information, making health decisions, and validating physicians' advice. However, most of this content is tailored to an expert audience, due to which people with inadequate health literacy often find it difficult to access, comprehend, and act upon this information. Medical text simplification aims to alleviate this problem by computationally simplifying medical text. Most text simplification methods employ neural seq-to-seq models for this task. However, training such models requires a corpus of aligned complex and simple sentences. Creating such a dataset manually is effort intensive, while creating it automatically is prone to alignment errors. To overcome these challenges, we propose a denoising autoencoder based neural model for this task which leverages the simplistic writing style of medical social media text. Experiments on four datasets show that our method significantly outperforms the best known medical text simplification models across multiple automated and human evaluation metrics. Our model achieves an improvement of up to 16.52% over the existing best performing model on SARI which is the primary metric to evaluate text simplification models.
Supplemental Material
- Emil Abrahamsson, Timothy Forni, Maria Skeppstedt, and Maria Kvist. 2014. Medical text simplification using synonym replacement: Adapting assessment of word difficulty to a compounding language. In Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations, PITR@EACL 2014, Gothenburg, Sweden, April 27, 2014. 57--65. https://doi.org/10.3115/v1/W14--1207Google ScholarCross Ref
- Viraj Adduru, Sadid A. Hasan, Joey Liu, Yuan Ling, Vivek V. Datla, Ashequl Qadir, and Oladimeji Farri. 2018. Towards Dataset Creation And Establishing Baselines for Sentence-level Neural Clinical Paraphrase Generation and Simplification. In Proceedings of the 3rd International Workshop on Knowledge Discovery in Healthcare Data co-located with the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI 2018), Stockholm, Schweden, July 13, 2018. 45--52. http://ceur-ws.org/Vol-2148/paper07.pdfGoogle Scholar
- Alan R Aronson. 2001. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program.. In Proceedings of the AMIA Symposium. American Medical Informatics Association, 17.Google Scholar
- Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. 65--72.Google Scholar
- William Coster and David Kauchak. 2011. Simple English Wikipedia: a new text simplification task. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers-Volume 2. Association for Computational Linguistics, 665--669.Google Scholar
- Mark Davies. 2014. N-grams data from the Corpus of Contemporary American English (COCA).Google Scholar
- William Hwang, Hannaneh Hajishirzi, Mari Ostendorf, and Wei Wu. 2015. Aligning Sentences from Standard Wikipedia to Simple Wikipedia. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Denver, Colorado, 211--217. https://doi.org/10.3115/v1/N15-1022Google ScholarCross Ref
- Dorothy Curtis Kandula, Sasikiran and Qing Zeng-Treitler. 2010. A semantic and syntactic text simplification tool for health content.. In AMIA annual symposium proceedings. Vol. 2010. American Medical Informatics Association.Google Scholar
- Diederik P Kingma and Jimmy Ba. 2014.Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).Google Scholar
- Chin-Yew Lin and Franz Josef Och. 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 605.Google ScholarDigital Library
- Donald AB Lindberg, Betsy L Humphreys, and Alexa T McCray. 1993. The unified medical language system. Yearbook of Medical Informatics, Vol. 2, 01 (1993), 41--51.Google ScholarCross Ref
- Carolyn E Lipscomb. 2000. Medical subject headings (MeSH). Bulletin of the Medical Library Association, Vol. 88, 3 (2000), 265.Google Scholar
- Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015).Google Scholar
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111--3119.Google Scholar
- Sergiu Nisioi, Sanja vS tajner, Simone Paolo Ponzetto, and Liviu P. Dinu. 2017. Exploring Neural Text Simplification Models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Vancouver, Canada, 85--91. https://doi.org/10.18653/v1/P17--2014Google Scholar
- Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. arXiv preprint arXiv:1904.01038 (2019).Google Scholar
- Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 311--318.Google Scholar
- Ellie Pavlick and Chris Callison-Burch. 2016. Simple PPDB: A paraphrase database for simplification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 143--148.Google ScholarCross Ref
- Matt Post. 2018. A Call for Clarity in Reporting BLEU Scores. In Proceedings of the Third Conference on Machine Translation: Research Papers. Association for Computational Linguistics, Belgium, Brussels, 186--191. https://www.aclweb.org/anthology/W18--6319Google ScholarCross Ref
- Basel Qenam, Tae Youn Kim, Mark J Carroll, and Michael Hogarth. 2017. Text simplification using consumer health vocabulary to generate patient-centered radiology reporting: translation and evaluation. Journal of medical Internet research, Vol. 19, 12 (2017), e417.Google ScholarCross Ref
- Evelina Rennes and Arne Jönsson. 2015. A tool for automatic simplification of swedish texts. In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015). 317--320.Google Scholar
- Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C Kipper-Schuler, and Christopher G Chute. 2010. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association, Vol. 17, 5 (2010), 507--513.Google ScholarCross Ref
- Matthew Shardlow and Raheel Nawaz. 2019. Neural Text Simplification of Clinical Letters with a Domain Specific Phrase Table. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 380--389. https://doi.org/10.18653/v1/P19-1037Google ScholarCross Ref
- Advaith Siddharthan. 2006. Syntactic simplification and text cohesion. Research on Language and Computation, Vol. 4, 1 (2006), 77--109.Google ScholarCross Ref
- Luca Soldaini and Nazli Goharian. 2016. Quickumls: a fast, unsupervised approach for medical concept extraction. In MedIR workshop, sigir. 1--4.Google Scholar
- Bakhtiyar Syed, Gaurav Verma, Balaji Vasan Srinivasan, Vasudeva Varma, et almbox. 2019. Adapting Language Models for Non-Parallel Author-Stylized Rewriting. arXiv preprint arXiv:1909.09962 (2019).Google Scholar
- Özlem Uzuner, Brett R South, Shuying Shen, and Scott L DuVall. 2011. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association, Vol. 18, 5 (2011), 552--556.Google ScholarCross Ref
- Raghuram Vadapalli, Bakhtiyar Syed, Nishant Prabhu, Balaji Vasan Srinivasan, and Vasudeva Varma. 2018. When science journalism meets artificial intelligence: An interactive demonstration. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 163--168.Google ScholarCross Ref
- Laurens van den Bercken, Robert-Jan Sips, and Christoph Lofi. 2019. Evaluating Neural Text Simplification in the Medical Domain. In The World Wide Web Conference (San Francisco, CA, USA) (WWW '19). Association for Computing Machinery, New York, NY, USA, 3286--3292. https://doi.org/10.1145/3308558.3313630Google Scholar
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008.Google Scholar
- Deborah X Xie, Ray Y Wang, and Sivakumar Chinnadurai. 2018. Readability of online patient education materials for velopharyngeal insufficiency. International journal of pediatric otorhinolaryngology, Vol. 104 (2018), 113--119.Google ScholarCross Ref
- Wei Xu, Chris Callison-Burch, and Courtney Napoles. 2015. Problems in current text simplification research: New data can help. Transactions of the Association for Computational Linguistics, Vol. 3 (2015), 283--297.Google ScholarCross Ref
- Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, and Chris Callison-Burch. 2016. Optimizing statistical machine translation for text simplification. Transactions of the Association for Computational Linguistics, Vol. 4 (2016), 401--415.Google ScholarCross Ref
- Sanqiang Zhao, Rui Meng, Daqing He, Saptono Andi, and Parmanto Bambang. 2018. Integrating transformer and paraphrase rules for sentence simplification. arXiv preprint arXiv:1810.11193 (2018).Google Scholar
- Zhemin Zhu, Delphine Bernhard, and Iryna Gurevych. 2010. A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics, 1353--1361.Google Scholar
Index Terms
- Leveraging Social Media for Medical Text Simplification
Recommendations
Evaluating Neural Text Simplification in the Medical Domain
WWW '19: The World Wide Web ConferenceHealth literacy, i.e. the ability to read and understand medical text, is a relevant component of public health. Unfortunately, many medical texts are hard to grasp by the general population as they are targeted at highly-skilled professionals and use ...
Extracting medical entities from social media
CHIL '20: Proceedings of the ACM Conference on Health, Inference, and LearningAccurately extracting medical entities from social media is challenging because people use informal language with different expressions for the same concept, and they also make spelling mistakes. Previous work either focused on specific diseases (e.g., ...
Best Practices in Social Media: Utilizing a Value Matrix to Assess Social Media's Impact on Health Care
This study examines the relationship of social media channel utilization (activity on blogs, content communities, and social networking sites, plus posting a social media policy) by health care organizations and the brand rating of those organizations, ...
Comments