The Impact of Unstated Norms in Bias Analysis of Language Models

Kohankhaki, Farnaz; Tian, Jacob-Junqi; Emerson, David; Seyyed-Kalantari, Laleh; Khattak, Faiza Khan

Computer Science > Computation and Language

arXiv:2404.03471 (cs)

[Submitted on 4 Apr 2024 (v1), last revised 7 Apr 2024 (this version, v2)]

Title: The Impact of Unstated Norms in Bias Analysis of Language Models

Authors: Farnaz Kohankhaki, Jacob-Junqi Tian, David Emerson, Laleh Seyyed-Kalantari, Faiza Khan Khattak


Abstract: Large language models (LLMs), trained on vast datasets, can carry biases that manifest in various forms, from overt discrimination to implicit stereotypes. One facet of bias is performance disparities in LLMs, which often harm underprivileged groups, such as racial minorities. A common approach to quantifying bias is to use template-based bias probes, which explicitly state group membership (e.g., White) and evaluate whether the outcome of a task, such as sentiment analysis, is invariant to a change of group membership (e.g., changing White to Black). This approach is widely used in bias quantification. However, in this work, we find evidence of an unexpectedly overlooked consequence of using template-based probes for LLM bias quantification. We find that in doing so, text examples associated with White ethnicities appear to be classified as exhibiting negative sentiment at elevated rates. We hypothesize that this scenario arises artificially through a mismatch between the pre-training text of LLMs and the templates used to measure bias. The mismatch stems from reporting bias: unstated norms that imply group membership without explicit statement. Our finding highlights the potentially misleading impact of varying group membership through explicit mention in bias quantification.
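The template-based probing idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the templates, the group list, the tiny lexicon classifier `toy_sentiment` (a stand-in for an LLM), and the helper names `negative_rate_by_group` and `max_disparity`. It shows only the general shape of such a probe: fill each template with an explicit group term, classify the result, and compare negative-sentiment rates across groups.

```python
from typing import Callable, Dict, List

# Hypothetical templates; "{group}" marks where group membership is stated explicitly.
TEMPLATES: List[str] = [
    "The {group} customer said the service was wonderful.",
    "My {group} neighbor had a terrible day.",
    "A {group} student answered the question.",
]

# Hypothetical group terms to substitute into each template.
GROUPS: List[str] = ["White", "Black", "Asian"]

# Tiny illustrative lexicon; a real probe would query a language model instead.
NEGATIVE_WORDS = {"terrible", "awful", "bad"}


def toy_sentiment(text: str) -> str:
    """Stand-in classifier: 'negative' if any lexicon word occurs in the text."""
    tokens = {tok.strip(".,!?").lower() for tok in text.split()}
    return "negative" if tokens & NEGATIVE_WORDS else "non-negative"


def negative_rate_by_group(
    templates: List[str],
    groups: List[str],
    classify: Callable[[str], str],
) -> Dict[str, float]:
    """Fraction of filled templates classified as negative, per group."""
    rates: Dict[str, float] = {}
    for group in groups:
        preds = [classify(t.format(group=group)) for t in templates]
        rates[group] = preds.count("negative") / len(preds)
    return rates


def max_disparity(rates: Dict[str, float]) -> float:
    """Largest gap in negative rate between any two groups (0.0 means invariant)."""
    return max(rates.values()) - min(rates.values())


rates = negative_rate_by_group(TEMPLATES, GROUPS, toy_sentiment)
disparity = max_disparity(rates)
```

Because the stand-in classifier ignores the group term, its rates are identical across groups. A classifier whose output shifts when only the group term changes would produce a nonzero gap, which is the kind of signal such probes look for and which the abstract argues can be misleading when explicit group mentions are themselves atypical of the pre-training text.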

Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2404.03471 [cs.CL]
	(or arXiv:2404.03471v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.03471

Submission history

From: Dr. Faiza Khattak
[v1] Thu, 4 Apr 2024 14:24:06 UTC (884 KB)
[v2] Sun, 7 Apr 2024 21:55:38 UTC (884 KB)
