Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference

Zheng, Junhao; Ma, Qianli; Qiu, Shengjie; Wu, Yue; Ma, Peitian; Liu, Junlong; Feng, Huawen; Shang, Xichen; Chen, Haibin

arXiv:2306.10790 (cs)

[Submitted on 19 Jun 2023]

Abstract: Fine-tuning has proven to be a simple and effective technique for transferring the knowledge learned by Pre-trained Language Models (PLMs) to downstream tasks. However, vanilla fine-tuning easily overfits the target data and degrades generalization ability. Most existing studies attribute this to catastrophic forgetting and retain the pre-trained knowledge indiscriminately, without identifying which knowledge is transferable. Motivated by this, we frame fine-tuning as a causal graph and find that the crux of catastrophic forgetting lies in the missing causal effects from the pre-trained data. Based on this causal view, we propose a unified objective for fine-tuning that restores these causal effects. Intriguingly, the unified objective can be seen as the sum of the vanilla fine-tuning objective, which learns new knowledge from target data, and the causal objective, which preserves old knowledge from PLMs. Our method is therefore flexible and can mitigate negative transfer while preserving knowledge. Since endowing models with commonsense is a long-standing challenge, we implement our method on commonsense QA with a proposed heuristic estimation to verify its effectiveness. In the experiments, our method outperforms state-of-the-art fine-tuning methods on all six commonsense QA datasets and can be implemented as a plug-in module to improve the performance of existing QA models.
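
The decomposition stated in the abstract can be written compactly. The sketch below only restates that one sentence; the symbols are notation assumed here for illustration and are not taken from the paper, and the abstract does not specify the form of either term.

```latex
% Assumed notation (hypothetical, for illustration only):
%   \mathcal{L}_{\text{unified}}   : the unified fine-tuning objective
%   \mathcal{L}_{\text{fine-tune}} : the vanilla fine-tuning objective (learns new knowledge from target data)
%   \mathcal{L}_{\text{causal}}    : the causal objective (preserves old knowledge from the PLM)
\[
  \mathcal{L}_{\text{unified}}
  \;=\;
  \mathcal{L}_{\text{fine-tune}}
  \;+\;
  \mathcal{L}_{\text{causal}}
\]
```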

Comments:	ACL 2023 (oral paper)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2306.10790 [cs.CL]
	(or arXiv:2306.10790v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.10790

Submission history

From: Junhao Zheng
[v1] Mon, 19 Jun 2023 09:06:44 UTC (1,979 KB)
