Investigating Text Shortening Strategy in BERT: Truncation vs Summarization

Mutasodirin, Mirza Alim; Prasojo, Radityo Eko

doi:10.1109/ICACSIS53237.2021.9631364

arXiv:2403.12799 (cs)

[Submitted on 19 Mar 2024]

Abstract: The parallelism of Transformer-based models comes at the cost of a limited maximum input length. Some studies have proposed methods to overcome this limitation, but none of them reported the effectiveness of summarization as an alternative. In this study, we investigate the performance of document truncation and summarization in text classification tasks, each with several variations, and examine how close their performance comes to that of the full text. We ran the classification tests on IndoSum, a summarization dataset based on Indonesian news articles. The summaries outperform the majority of the truncation variations and lose to only one. The best strategy found in this study is taking the head of the document; the second best is extractive summarization. This study explains these results and motivates further research on exploiting the potential of document summarization as a shortening alternative. The code and data used in this work are publicly available at this https URL.
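
As a rough illustration of the truncation side of this comparison, the sketch below shows three ways of cutting a tokenized document down to a fixed budget. Only head truncation is named in the abstract; the tail and head-plus-tail variants, the function names, and the 510-token budget (standard BERT's 512 positions minus the [CLS] and [SEP] tokens) are assumptions made for illustration and are not taken from the paper.

```python
from typing import List

# Assumed budget: 512 positions in standard BERT minus [CLS] and [SEP].
MAX_CONTENT_TOKENS = 510


def truncate_head(tokens: List[str], max_len: int = MAX_CONTENT_TOKENS) -> List[str]:
    """Keep only the first max_len tokens (the head of the document)."""
    return tokens[:max_len]


def truncate_tail(tokens: List[str], max_len: int = MAX_CONTENT_TOKENS) -> List[str]:
    """Keep only the last max_len tokens (hypothetical variant)."""
    return tokens[-max_len:] if max_len > 0 else []


def truncate_head_tail(
    tokens: List[str], max_len: int = MAX_CONTENT_TOKENS, head_len: int = 128
) -> List[str]:
    """Keep head_len tokens from the start and fill the rest of the
    budget from the end (hypothetical variant)."""
    if len(tokens) <= max_len:
        return list(tokens)
    head_len = min(head_len, max_len)
    tail_len = max_len - head_len
    tail = tokens[len(tokens) - tail_len:] if tail_len > 0 else []
    return tokens[:head_len] + tail


if __name__ == "__main__":
    doc = [f"tok{i}" for i in range(1000)]  # stand-in for a tokenized article
    print(len(truncate_head(doc)), truncate_head(doc)[:2])
    print(len(truncate_tail(doc)), truncate_tail(doc)[-2:])
    print(len(truncate_head_tail(doc)))
```

A summarization-based strategy would replace these functions with a step that produces a shorter text before tokenization; that step is not sketched here because the abstract does not specify how the summaries were generated.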

Comments:	The 13th International Conference on Advanced Computer Science and Information Systems (ICACSIS 2021)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2403.12799 [cs.CL]
	(or arXiv:2403.12799v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.12799
Related DOI:	https://doi.org/10.1109/ICACSIS53237.2021.9631364

Submission history

From: Mirza Alim Mutasodirin
[v1] Tue, 19 Mar 2024 15:01:14 UTC (155 KB)
