POSQA: Probe the World Models of LLMs with Size Comparisons

Shu, Chang; Han, Jiuzhou; Liu, Fangyu; Shareghi, Ehsan; Collier, Nigel

arXiv:2310.13394 (cs)

[Submitted on 20 Oct 2023]

Abstract: Embodied language comprehension holds that language understanding is not solely a matter of mental processing in the brain but also involves interaction with the physical and social environment. With the explosive growth of Large Language Models (LLMs) and their already ubiquitous presence in our daily lives, it is becoming increasingly necessary to verify their real-world understanding. Inspired by cognitive theories, we propose POSQA, a Physical Object Size Question Answering dataset of simple size comparison questions, to probe the limits and analyse the potential mechanisms of embodied comprehension in the latest LLMs.
We show that even the largest LLMs today perform poorly under the zero-shot setting. We then push their limits with advanced prompting techniques and external knowledge augmentation. Furthermore, we investigate whether their real-world comprehension derives primarily from contextual information or from internal weights, and analyse the impact of prompt formats and the reporting bias of different objects. Our results show that the real-world understanding LLMs acquire from textual data can be vulnerable to deception and confusion by the surface form of prompts, which makes it less aligned with human behaviour.

Comments:	Accepted by EMNLP 2023 Findings
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as:	arXiv:2310.13394 [cs.CL]
	(or arXiv:2310.13394v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.13394

Submission history

From: Chang Shu
[v1] Fri, 20 Oct 2023 10:05:01 UTC (7,159 KB)
