DOI: 10.1145/3594806.3594828
research-article
Open Access

Human Experts’ Perceptions of Auto-Generated Summarization Quality

Published: 10 August 2023

ABSTRACT

This study addresses automatic summaries generated with modern artificial intelligence techniques. Several mathematical methods exist for evaluating the performance of automatic summarization. Such methods are commonly used because they allow many test cases to be assessed with little human effort, whereas manual assessments are challenging and time-consuming. One question is whether the output of such measures matches human perception of summarization quality. We report a human evaluation of the automatic summarization of 22 academic texts. The unique aspect of this study is that our participants were strongly familiar with the texts, having studied them in depth. The results vary considerably, but they do not give the impression of unanimous agreement that automatic summarizations are of high quality and are trusted.
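To make concrete what a mathematical evaluation measure can look like, the following is a minimal sketch of a unigram-overlap recall score, in the spirit of overlap-based metrics such as ROUGE-1. The abstract does not name a specific metric, so the choice of metric, the function name `unigram_recall`, and the example sentences are assumptions made purely for illustration and are not the authors' method.

```python
from collections import Counter


def unigram_recall(reference: str, candidate: str) -> float:
    """Share of reference word occurrences that also occur in the candidate.

    Illustrative only: whitespace tokenization, lowercasing, no stemming.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    if not ref_counts:
        return 0.0
    # Clip each word's overlap at the number of times the candidate contains it.
    overlap = sum(min(count, cand_counts[word]) for word, count in ref_counts.items())
    return overlap / sum(ref_counts.values())


# Hypothetical reference (human-written) and candidate (machine-generated) summaries.
reference = "the study evaluates automatic summaries of academic texts"
candidate = "the study evaluates summaries of texts"
score = unigram_recall(reference, candidate)
print(f"unigram recall: {score:.2f}")
```

A score of this kind is cheap to compute over many test cases, but it only counts shared words. Whether it tracks what expert readers judge to be a good summary is the question the study examines.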


    Published in

      PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments
      July 2023
      797 pages
      ISBN: 9798400700699
      DOI: 10.1145/3594806

      Copyright © 2023 Owner/Author

      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited
