Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

Tang, Xiangru; Zong, Yiming; Phang, Jason; Zhao, Yilun; Zhou, Wangchunshu; Cohan, Arman; Gerstein, Mark

arXiv:2309.08963 (cs)

[Submitted on 16 Sep 2023 (v1), last revised 4 Apr 2024 (this version, v3)]

Abstract: Despite the remarkable capabilities of Large Language Models (LLMs) like GPT-4, producing complex, structured tabular data remains challenging. Our study assesses LLMs' proficiency in structuring tables and introduces a novel fine-tuning method, cognizant of data structures, to bolster their performance. We unveil Struc-Bench, a comprehensive benchmark featuring prominent LLMs (GPT-NeoX-20B, GPT-3.5, GPT-4, and Vicuna), which spans text tables, HTML, and LaTeX formats. Our proposed FormatCoT aids in crafting format-specific instructions from the intended outputs to populate this benchmark. Addressing the gap in task-centered evaluation, we propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score), to more accurately gauge LLM performance. Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains, outshining its LLM counterparts across most measures. In-depth error analysis and an ability map across six dimensions (coverage, formatting, reasoning, comprehension, pragmatics, and hallucination) highlight areas for future enhancements and suggest forthcoming research trajectories. Our code and models can be found at this https URL.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2309.08963 [cs.CL]
	(or arXiv:2309.08963v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2309.08963

Submission history

From: Xiangru Tang
[v1] Sat, 16 Sep 2023 11:31:58 UTC (2,605 KB)
[v2] Tue, 19 Sep 2023 05:58:47 UTC (2,605 KB)
[v3] Thu, 4 Apr 2024 21:57:12 UTC (10,353 KB)
