Data Transformation to Construct a Dataset for Generating Entity-Relationship Model from Natural Language

Li, Zhenwen; Lou, Jian-Guang; Xie, Tao

Abstract:In order to reduce the manual cost of designing ER models, recent approaches have been proposed to address the task of NL2ERM, i.e., automatically generating entity-relationship (ER) models from natural language (NL) utterances such as software requirements. These approaches are typically rule-based ones, which rely on rigid heuristic rules; these approaches cannot generalize well to various linguistic ways of describing the same requirement. Despite having better generalization capability than rule-based approaches, deep-learning-based models are lacking for NL2ERM due to lacking a large-scale dataset. To address this issue, in this paper, we report our insight that there exists a high similarity between the task of NL2ERM and the increasingly popular task of text-to-SQL, and propose a data transformation algorithm that transforms the existing data of text-to-SQL into the data of NL2ERM. We apply our data transformation algorithm on Spider, one of the most popular text-to-SQL datasets, and we also collect some data entries with different NL types, to obtain a large-scale NL2ERM dataset. Because NL2ERM can be seen as a special information extraction (IE) task, we train two state-of-the-art IE models on our dataset. The experimental results show that both the two models achieve high performance and outperform existing baselines.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2312.13694 [cs.CL]
	(or arXiv:2312.13694v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2312.13694

Computer Science > Computation and Language

Title:Data Transformation to Construct a Dataset for Generating Entity-Relationship Model from Natural Language

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators