DOI: 10.1145/3397271.3401105
research-article

Leveraging Social Media for Medical Text Simplification

Published: 25 July 2020

ABSTRACT

Patients increasingly use the web to understand medical information, make health decisions, and validate physicians' advice. However, most of this content is written for an expert audience, so people with inadequate health literacy often find it difficult to access, comprehend, and act upon. Medical text simplification aims to alleviate this problem by computationally simplifying medical text. Most text simplification methods employ neural seq-to-seq models, but training such models requires a corpus of aligned complex and simple sentences. Creating such a dataset manually is effort-intensive, while creating it automatically is prone to alignment errors. To overcome these challenges, we propose a denoising-autoencoder-based neural model that leverages the simple writing style of medical social media text. Experiments on four datasets show that our method significantly outperforms the best known medical text simplification models across multiple automated and human evaluation metrics. Our model achieves an improvement of up to 16.52% over the best-performing existing model on SARI, the primary metric for evaluating text simplification models.
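The abstract names a denoising autoencoder but does not say how the input is corrupted. As a rough illustration only, the sketch below assumes two generic noise operations often used with denoising autoencoders for text: word dropout and local shuffling. The function name `add_noise` and the parameters `drop_prob` and `max_shift` are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Optional


def add_noise(tokens: List[str], drop_prob: float = 0.1, max_shift: int = 3,
              rng: Optional[random.Random] = None) -> List[str]:
    """Corrupt a token sequence with word dropout and local shuffling.

    Assumed, generic noise model; the paper's actual corruption may differ.
    """
    rng = rng or random.Random()
    # Word dropout: drop each token independently with probability drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        kept = [rng.choice(tokens)]  # never return an empty sentence
    # Local shuffle: sort by position plus a random offset in [0, max_shift],
    # so tokens are only reordered within a small neighbourhood.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


sentence = "the patient was given medicine for high blood pressure".split()
noisy = add_noise(sentence, rng=random.Random(0))
# (noisy, sentence) would serve as one (input, target) reconstruction pair.
```

Under this assumption, a model trained to reconstruct clean social media sentences from their noised versions needs no aligned complex and simple sentence pairs, which is the constraint the abstract sets out to avoid.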

Supplemental Material

3397271.3401105.mp4 (mp4, 24.9 MB)


      • Published in

        SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
        July 2020
        2548 pages
        ISBN: 9781450380164
        DOI: 10.1145/3397271

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 792 of 3,983 submissions, 20%
