research-article

Open Access

Human Experts’ Perceptions of Auto-Generated Summarization Quality

Authors:
Maryam Lotfigolian, Department of Computer Science, Oslo Metropolitan University, Norway (ORCID 0009-0008-8490-1733)
Christos Papanikolaou, Department of Computer Science, Oslo Metropolitan University, Norway (ORCID 0009-0008-3650-9497)
Samaneh Taghizadeh, Department of Computer Science, Oslo Metropolitan University, Norway (ORCID 0009-0003-1696-7769)
Frode Eika Sandnes, Department of Computer Science, Oslo Metropolitan University, Norway (ORCID 0000-0001-7781-748X)

PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments, July 2023, Pages 95–98
https://doi.org/10.1145/3594806.3594828

Published: 10 August 2023

ABSTRACT

This study addresses automatic summaries generated with modern artificial intelligence techniques. Several mathematical methods exist for evaluating the performance of automatic summarization. Such methods are commonly used because they allow many test cases to be assessed with little human effort, whereas manual assessments are challenging and time-consuming. One question is whether the output of such measures matches human perception of summarization quality. We report a human evaluation of the automatic summarization of 22 academic texts. The unique aspect of this study is that our participants were highly familiar with the texts, having studied them in depth. The results varied considerably and do not indicate unanimous agreement that automatic summaries are of high quality and trusted.
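
To make the idea of a mathematical summarization measure concrete, the following is a minimal sketch of a unigram-overlap score in the style of ROUGE-1. The abstract does not state which measures are meant, so ROUGE-1 is used here purely as a well-known example; the function names `tokenize` and `rouge1` and the simple lowercase tokenization are assumptions made for this illustration, not the authors' method.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (illustrative choice)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(candidate, reference):
    """Return unigram precision, recall, and F1 of a candidate summary against a reference.

    Overlap uses clipped counts: a word is matched at most as many times
    as it occurs in both the candidate and the reference.
    """
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))

    # Multiset intersection keeps the minimum count of each shared word.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1("the cat sat", "the cat sat on the mat")
    print(scores)
```

A score of this kind only counts shared words, so it can rate a summary highly even when a reader who knows the source text well would judge it misleading or incomplete. That gap is the question the study raises.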

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Summarization

Published in

PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments
July 2023
797 pages
ISBN:9798400700699
DOI:10.1145/3594806

Copyright © 2023 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Artificial intelligence
Automatic summarization
ChatGPT
Evaluation
GPT-3
Language model
NLP
Quality
User perception
Qualifiers
- research-article
- Research
- Refereed limited