A Question-Answering Model Based on Knowledge Graphs for the General Provisions of Equipment Purchase Orders for Steel Plants Maintenance

Abstract: Recently, equipment replacement and maintenance, repair, and operation (MRO) optimization have substantially increased owing to the aging and deterioration of industrial plants, such as steel-making factories in Korea. Therefore, plant owners are required to quickly review equipment supply contracts, i.e.,


Purchasing Order Contracts for Equipment
A contract is defined as an agreed-upon promise or series of promises between parties for which the law imposes a legal obligation [1]. Checking contract information involves finding semantically connected contractual clauses by defining relationships between entities (contractual clauses). To search for semantically similar clauses within equipment POs among the various documents within the company, the recent technological trend of digital transformation was employed. For this purpose, a General Provisions Question-Answering Model (GPQAM) was developed, combining a knowledge graph (KG) model, based on a GP taxonomy and a lexicon with semantic concepts, with a pattern-matching-based question-answering (QA) model. The GPQAM leverages the KG framework to enhance the understanding and retrieval of interconnected contractual provisions based on their semantic relationships.
The GPQAM is based on the KG, which expresses contract content in the form of a graph so that users can understand it intuitively [20]. The KG stores the data in a structured database that can provide reliable answers to user questions. To build the GPQAM, the authors first classified the contract GP into a GP taxonomy and used these results to develop a GP lexicon. After the items in the GP lexicon are transformed into nodes, the KG is developed by establishing relationships between each class of items. To find contract information within the KG, the GPQAM first uses pattern matching to extract entities. If it fails to extract entities from the input question, it uses similarity measurement to select the node whose meaning is most similar to the question. Answers are obtained from the KG using the extracted entity or the selected node. In addition, the authors developed a GPQAM platform for users' convenience.

Literature Review
This study aims to enhance the efficiency of information retrieval by developing a GPQAM that combines KG and QA technologies to search for semantically connected contractual clauses. To this end, previous studies were classified into three categories and then investigated: studies that analyzed contracts by applying AI, studies on general QA development, and studies on QA development that expressed and utilized knowledge in KGs. The characteristics of previous studies were reviewed, and their limitations were analyzed as benchmarks for this study.

Bidding Document Analysis Applying Artificial Intelligence (AI)
Lee and Yi [21] improved prediction accuracy by developing a risk prediction model that included text data to predict uncertain risks in the bidding process of a construction project. Naji et al. [22] analyzed the causes of change orders using the AdaBoost technique to minimize the cost increase in an Iraqi construction project. Kim et al. [18] proposed an analytic hierarchy process (AHP)-fuzzy inference system (FIS) model to support decision making in the risk assessment and mitigation of overseas steel plant projects. The proposed model is a useful tool for the risk assessment of steel projects, but it has limitations in that it is based on the subjective opinions of experts. Lee et al. [23] presented an automatic model of contract-risk extraction based on natural language processing (NLP) that could automatically detect unbalanced clauses in contracts to support contract management by construction companies. Marzouk and Enaba [24] developed a dynamic text analytical for contract and correspondence (DTA-CC) model using building information modeling (BIM) to visually analyze construction project contracts and efficiently understand the obligations of each party. Son and Lee [25] developed a schedule delay estimate model (SDEM) that predicts project schedule delays by applying text mining technology to bidding documents of 13 offshore oil and gas EPC projects. However, generalization has limitations since only 13 case studies were used to develop this model. Lee et al. [26] developed a proactive risk assessment model to identify whether a clause favorable to the contractor was omitted from the contract clause modified by the owner. Losada-Maseda et al. [27] conducted a study to optimize operational expenditure (OPEX) by determining the elements that should be included in contract writing in an energy infrastructure construction project.
Choi et al. [28] developed an engineering machine learning automation platform (EMAP) to which machine learning (ML) technology and data generated in the bidding, engineering, construction, operation, and maintenance stages of an EPC project were applied, thereby strengthening the risk response at each stage of the project. Fantoni et al. [29] implemented a method to automatically detect, extract, segment, and assign information from tender documents to convert the contract terms of a railway project into technical specifications, thereby improving its performance over that of existing solutions. Choi et al. [30] improved the accuracy of risk clause extraction by developing critical risk check (CRC) and term frequency analysis (TFA) modules for the risk analysis of contractors in the invitation to bid (ITB) of the EPC project. Jang et al. [31] developed a model that classifies the level of bid price volatility as a risk factor through parameters and ML in Caltrans' bid summary and pre-bid description documents. Moon et al. [19] developed an automatic information extraction model by applying named entity recognition (NER) technology to automatically extract information from construction specifications and contributed to the automation of the construction specification review process. Park et al. [32] developed technical risk extraction (TRE) and design parameter extraction (DPE) modules by applying ML technology for technical specification risk analysis of EPC projects, thereby improving risk extraction accuracy. For the risk analysis of contractors in EPC contracts, Choi and Lee [11] developed a semantic analysis (SA) model by applying NLP technology and a risk level ranking (RLR) model applying bidirectional long short-term memory (Bi-LSTM), thereby enabling a timely response at the bidding stage. Kim et al. [12] presented a purchase order recognition and analysis system (PORAS) that uses AI to automatically detect and compare risk clauses between plant owner and supplier POs.

General Question Answering
Do et al. [33] proposed a model using the features of latent semantic indexing (LSI), Manhattan distance, and Jaccard distance to build QA systems for the Japan Civil Code. Kim et al. [34] developed legal QA systems by combining a ranking support vector machine (SVM) model and convolutional neural network (CNN) model. Sadhuram and Soni [35] developed NLP-based factoid QA systems to answer various user questions. Sinha et al. [36] developed a chatbot using an unsupervised machine learning technique to recognize diseases based on information about health conditions or symptoms provided by patients in the medical field. Veisi and Shandi [37] developed question processing, document retrieval, and answer extraction modules that applied rule-based methods and NLP technology in Persian in the medical field. Zhong et al. [38] developed an end-to-end methodology that combines NLP technology and deep learning models to improve the efficiency and effectiveness of searching queries related to building regulations. To examine long-term QA-matching technology based on deep learning for psychological counseling, Chen and Xu [39] improved the matching effect by developing a deep structured semantic model (DSSM) using a bidirectional gate recurrent unit (BiGRU) and a double attention layer. Gholami and Noori [40] presented a new solution for zero-shot open-book QA. Noraset et al. [41] developed QA systems using the Bi-LSTM model for Thai users. Song et al. [42] presented an attention model for estimating the difficulty of a given question and achieved a higher performance than previous studies and a pre-trained language model. Tasi et al. [43] developed chatbot systems to support mine safety procedures in the event of a natural disaster. An evaluation of the procedure's efficiency before and after the introduction of the system showed an average time reduction of 55.8 min.
Zhou and Zhang [44] developed a medical QA model based on bidirectional encoder representations from transformers (BERT), generative pre-trained transformer 2 (GPT-2), and text-to-text transfer transformer (T5) models, thereby showing improved performance compared to existing systems. Wang et al. [45] suggested a classification method based on a lite BERT (ALBERT) and match-long short-term memory (match-LSTM) models to improve the performance of data classification.

Knowledge Graph-Based Question Answering
Fawei et al. [46] developed systems that applied legal ontologies and rules to automate the process of judging legal cases. Gao and Li [47] developed systems using the Bi-LSTM-conditional random field (CRF) model, the term frequency inverse document frequency (TFIDF) algorithm, and Word2Vec to efficiently query traditional Chinese medicine knowledge. Huang et al. [48] developed the first Chinese legal QA system based on KG to solve the problems of existing QA systems that lack domain expertise. To build intelligent Chinese QA systems in the field of film culture, Shuai and Zhang [49] developed QA systems using the Neo4j graph database (GDB) to store data and apply the jieba word segmentation tool and naive Bayes model. Do et al. [50] developed QA systems to which KG and LSTM models were applied in Vietnamese to solve the problem of open-domain QA systems providing significant unnecessary information to users. To solve the problem that many QA methods have low entity and relation recognition effects and rely on predefined rules, Jiang et al. [51] developed BERT-Bi-LSTM-CRF and BERT-Bi-LSTM models and improved their performance in entity and relation recognition. Huang et al. [52] developed knowledge-graph-based question answering (KGQA) by applying the path-ranking algorithm (PRA), which showed improved performance over the latest method, thereby solving the problems of traditional knowledge-based QA, which relies on various historical cases and requires significant manpower. Jiang et al. [53] developed systems applying the Aho-Corasick (AC) algorithm and naive Bayes model to meet the high-efficiency QA needs of patients and doctors; however, the configuration and search speed of KG need to be improved. Li et al. [54] presented a single-layer feed-forward neural network combining a soft histogram and self-attention (SHSA) to extract the predicates included in the question, improving its performance over that of state-of-the-art solutions.
Yang et al. [55] developed KG-based intelligent QA systems using the reverse maximum matching (RMM) algorithm, CRF, and the TFIDF algorithm to solve the problems in QA of existing high schools. Yu et al. [56] built a KG with a small amount of military aircraft data and developed QA systems using the AC algorithm to solve the problem of the lack of visual query methods in small sample domain data. Li et al. [57] improved the accuracy of answers to complex questions by implementing QA systems based on the BERT model, which could answer questions related to scenic spot information. Yin et al. [58] developed a KG for hepatitis B to extract structured medical knowledge and developed QA systems using deep-learning-based Bi-LSTM and CRF models and Word2Vec. Cha et al. [59] proposed a Purchase Order Knowledge Retrieval Model (POKREM) that applied KG technology to equipment PO documents of steel plants. However, this study has limitations in that a person must recognize the contents of the PO document and then create a CSV file to generate information in the database, and questions must be made through a query language rather than natural language.
By reviewing previous studies, the authors attempted to benchmark their strengths and improve on their limitations. The review showed that a majority of the studies that analyzed contracts by applying AI extracted risk clauses from contracts using NLP technology. No research was found that applies a KG-based QA function to contracts to check contract information. Additionally, most related studies were big-data-based supervised or semi-supervised learning models.
A sufficient amount of training data is needed to train an ML model, and collecting contracts for training requires considerable time and effort. In certain domains, such as legal documents, there are limitations to collecting sufficient data owing to security issues. This study develops the GPQAM as an unsupervised method that uses a pattern-matching algorithm and a pre-trained word-embedding model, presenting a research method that can be used even in a domain with insufficient data. General QA and KGQA were compared for applying the QA function to contracts. KGQA, rather than general QA, has been widely applied in fields that require specialized knowledge, such as medicine and law. KGQA has the advantage that anyone can identify expertise intuitively and quickly because the KG stores and displays data as a graph. Moreover, because KGQA finds answers in a structured database, answer accuracy is high as long as accurate information is stored in the database. In this study, the problematic content of the contract is visually expressed by applying KG so that users can easily grasp the contract clauses and expect accurate answers to queries. Accordingly, the authors proposed a GPQAM based on KG.

Research Framework
This section describes the overall research framework. It consists of the modeling process and development environment of the GPQAM.

Research Framework and Model Process
This study identified the need for a model that efficiently enables users to find contract information in contracts and developed a GPQAM that combines KG and QA technologies. The study targeted the GP, a standard contract used for equipment purchase and installation by Company P. In general, contracts are difficult to collect for security reasons, but the GP was selected as the subject of the study because anyone can easily obtain it from Company P's purchasing-related homepage. The GPQAM was developed by dividing it into KG and QA sub-models.
To express the GP contract clauses in KG, first, the proof of concept (PoC) of this study was selected, and then the GP taxonomy was developed to express the GP hierarchically. For the development of the GP taxonomy, the provisions of the GP were classified from provisions of the same category, and then the sub-levels were sub-divided. The GP lexicon was developed based on the GP taxonomy and was used to construct the KG. The KG was developed by creating nodes by class and setting relationships and then stored in the Neo4j GDB.
QA is the sub-model that finds and answers contract information within the KG. It was first developed using a pattern-matching method for entity recognition in the question. If pattern matching was not possible, a node with the meaning most similar to the query was selected using similarity measurement. Similarity measurements were performed in the following order: preprocessing, word embedding, and Euclidean distance measurement. Using the extracted entities and selected nodes, a Cypher query statement was created, and an answer was obtained from the KG.
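The pattern-matching-first, similarity-fallback flow described above can be sketched in Python. This is a minimal illustration only: the toy word vectors, lexicon entries, and node names below are invented for the example (the study itself used the AC algorithm via ahocorapy and spaCy embeddings):

```python
import math

# Toy word vectors standing in for a pre-trained embedding model
# (the study used spaCy vectors); all words, values, and node names
# below are invented for illustration.
VECTORS = {
    "arbitration": [0.9, 0.1, 0.0],
    "dispute":     [0.8, 0.2, 0.1],
    "law":         [0.1, 0.9, 0.2],
    "governing":   [0.2, 0.8, 0.1],
}

# KG node names mapped to their (toy) vectors.
NODES = {
    "Arbitration": [0.85, 0.15, 0.05],
    "Governing Law": [0.15, 0.85, 0.15],
}

# Illustrative GP lexicon entries used for pattern matching.
LEXICON = {"arbitration", "governing law"}


def mean_vector(words):
    """Average the vectors of the words that have an embedding."""
    vecs = [VECTORS[w] for w in words if w in VECTORS]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_entity(question):
    """Pattern matching first; similarity fallback if no entity is found."""
    text = question.lower()
    # 1) Pattern matching: look for GP lexicon terms in the question.
    for term in LEXICON:
        if term in text:
            return term
    # 2) Fallback: pick the KG node closest to the averaged question vector.
    qvec = mean_vector(text.replace("?", "").split())
    return min(NODES, key=lambda name: euclidean(qvec, NODES[name]))
```

For instance, `recognize_entity("Where is arbitration held?")` matches the lexicon directly, whereas a question without a lexicon term falls through to the similarity step.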
This study was implemented in four stages: KG model (Section 4), QA model (Section 5), Test and Validation (Section 6), and Platform Development (Section 7). Section 4 describes the development of the KG of the GPQAM. Section 4.1 introduces the definition and advantages of the KG. The contract and PoC, which are the subjects of this study, are explained in Section 4.2. Section 4.3 explains the development of the GP taxonomy to express the GP as a KG in graph form. The GP taxonomy set seven groups as class 1 and, based on this, classified and systematized the sub-levels in detail. The development of the GP lexicon, by finding synonyms of the words corresponding to the lowest level of the GP taxonomy, is explained in Section 4.4. The GP lexicon was used to develop the KG, and its synonyms were used for the entity recognition of questions in QA. Section 4.5 describes the development of the KG by converting the entities of the GP lexicon into nodes and then setting the relationships between each class. The developed KG was stored in the Neo4j GDB.
Section 5 explains the QA development of the GPQAM. Section 5.1 briefly introduces the definition and classification of QA systems, the advantages of KGQA, and the development process of QA. Pattern matching is performed in Section 5.2 to extract the terms of the GP lexicon from the user's questions. The AC algorithm, a multi-pattern-matching algorithm, was used for pattern matching. Entities with the same patterns were extracted from questions by saving the patterns in a tree-type data structure composed of links and nodes. Section 5.3 explains the similarity measurements. The similarity measurement used in this study finds the two words with the most similar vector values. The process is as follows: first, the user's questions were preprocessed, and then the average value of the vectors of the meaningful words in the questions was obtained. Nodes with the meaning most similar to the questions were selected by measuring each node's vector value and the Euclidean distance in the KG. Cypher query statement generation is explained in Section 5.4. The entities extracted in Section 5.2 and the nodes selected in Section 5.3 were changed to the names of nodes with the same meaning in the KG, and then a Cypher query statement was created for use in the Neo4j GDB. Section 5.5 shows the modeling results of QA.
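The Cypher-generation step of Section 5.4 can be sketched as follows. The synonym table, the `Article` node label, the `HAS_CONTENT` relationship type, and the property names are illustrative assumptions, not the study's actual schema:

```python
# Sketch of Cypher query generation: a term extracted by pattern
# matching (or the node selected by similarity) is mapped to its KG
# node name and formatted into a Cypher query for the Neo4j GDB.
# The synonym table and schema names below are hypothetical.
SYNONYM_TO_NODE = {
    "arbitration": "Arbitration",
    "applicable law": "Governing Law",
    "governing law": "Governing Law",
}


def build_cypher(term):
    """Translate a recognized term into a Cypher query against the KG."""
    node = SYNONYM_TO_NODE[term.lower()]
    return (
        f"MATCH (a:Article {{name: '{node}'}})-[:HAS_CONTENT]->(c) "
        "RETURN c.text"
    )
```

For example, `build_cypher("applicable law")` produces a query that matches the `Governing Law` node, illustrating how a synonym in the question is normalized to a single KG node name before querying.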
Section 6 discusses the test and validation of the GPQAM developed in this study. First, to quantitatively evaluate the GPQAM, 45 test questions were created, as detailed in Section 6.1. The evaluation metrics used in this study are described in Section 6.2. Section 6.3 consists of the validation of the test results along with subject matter experts (SMEs). Section 6.4 reviews the applicability of GPQAM to the field through a focus group interview (FGI).
Section 7 describes the web server platform developed through this study. Section 7.1 shows the configuration and flow of the platform designed considering user convenience, and the description of the user interface is in Section 7.2. The GPQAM modeling process is summarized in Figure 1.

Development Environment of GPQAM
This study was implemented on the Windows 10 operating system (OS), and the GPQAM was developed by dividing it into KG and QA. Neo4j GDB was used to develop the KG of the GPQAM. Neo4j is the most popular graph DBMS according to the DB-Engines ranking [60]. The programming language used in the GPQAM QA development was Python, and the development environment was Anaconda, the world's most popular open-source Python distribution platform [61]. This study used Neo4j, the most popular GDB, and Anaconda, a Python platform, so that other researchers can easily benchmark it in the future. The libraries used in this study were the ahocorapy and spaCy libraries. ahocorapy is a Python library used to implement the AC algorithm; given a list of keywords, it can confirm whether one or more of them are present in a text [62]. spaCy is a free, open-source library for advanced NLP in Python that helps build applications that process large amounts of text [63]. spaCy was used for similarity measurements in the GPQAM. Table 1 summarizes the GPQAM development environment.

Knowledge Graph Model
This section concerns the KG of GPQAM and comprises five subsections. Section 4.1 introduces the definition and advantages of KG, and Section 4.2 explains the data used in the study. Sections 4.3 and 4.4 describe the development of the GP taxonomy and lexicon in detail. Section 4.5 shows the developed KG.
The KG development process was largely carried out in four stages: source data, GP taxonomy, GP lexicon development, and KG for GP. First, the data to be used for research were collected, and then the PoC was selected (Section 4.2). Next, the GP taxonomy was classified to express GP in the graph form of KG (Section 4.3). In addition, the GP lexicon was developed by finding synonyms of words corresponding to the lowest level of the GP taxonomy (Section 4.4). Finally, the KG was developed (Section 4.5) by establishing relationships between classes (Section 4.5.2) after converting the entities of the GP lexicon to nodes (Section 4.5.1). Figure 2 illustrates the development process of the KG model.
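The node-and-relationship construction stage can be sketched with a toy GP lexicon. All class names, entities, and relationship types below are invented for illustration, and, for brevity, the sketch links every node of a source class to every node of a target class, whereas the actual KG sets relationships per item:

```python
# Toy GP lexicon: entities grouped by taxonomy class. Class names,
# entities, and relationship types here are illustrative only.
GP_LEXICON = {
    "Article": ["Arbitration", "Governing Law"],
    "Location": ["Seoul, Korea"],
    "Rule": ["KCAB Commercial Arbitration Rules"],
}


def build_kg(lexicon, class_relations):
    """Create one node per lexicon entity, then link classes pairwise."""
    nodes = [(cls, name) for cls, names in lexicon.items() for name in names]
    edges = []
    for src_cls, rel, dst_cls in class_relations:
        for _, src in (n for n in nodes if n[0] == src_cls):
            for _, dst in (n for n in nodes if n[0] == dst_cls):
                edges.append((src, rel, dst))
    return nodes, edges


nodes, edges = build_kg(
    GP_LEXICON,
    [("Article", "HELD_IN", "Location"), ("Article", "GOVERNED_BY", "Rule")],
)
```

The resulting node and edge lists correspond to what would be loaded into the Neo4j GDB as labeled nodes and typed relationships.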

Concept of Knowledge Graph
A KG, designed as a data structure for representing knowledge and connections, primarily takes the form of a graph-based data model that represents relationships between entities [64]. Clustering orders is a technique that analyzes order data using data mining and ML techniques and forms clusters by grouping orders with similar characteristics or attributes [65]. Through this, patterns and similarities in order data can be identified. Clustering aims to classify data into distinct clusters or categories by grouping data with similar characteristics. This technique helps identify inherent structures or segments within the order data and provides valuable insights for decision making, customer segmentation, or targeted marketing strategies. However, the GPQAM developed in this study aims to automatically retrieve contract information. Therefore, a KG, which represents relationships between entities and their meanings by inferring semantic connections among data, is more suitable than clustering orders, which focuses on grouping data with similar characteristics.

Additionally, KG describes nodes and their relationships and defines the possible classes of entities and relationships in a schema. KG is primarily used for knowledge expression, reasoning, and querying, with ontology capabilities that enable understanding the meaning of sentences through the definition of relationships between entities. In this study, a GPQAM was developed to find semantically connected contractual clauses by defining relationships between entities (contract clauses).
The advantages of a KG-based GPQAM are as follows: First, the KG stores data in a structured database, such that the database contains accurate information; the GPQAM can thus provide reliable answers to user questions. Second, the KG stores and displays data in the form of a graph, so anyone can quickly and intuitively identify related information, and users can visually check highly relevant information beyond what they are looking for. The GPQAM expresses the problematic contents of the contract in the form of a graph, enabling users to intuitively grasp the contents of the contract along with highly relevant information. Third, in a relational database (RDB), tables must be extended whenever new data are added, and all connections with existing data must be considered. In a KG, however, adding, deleting, and changing data is simple: nodes are simply added and relationships established with the existing nodes. This flexibility of the KG can help extend the GPQAM to other clauses and contracts in the future. Finally, an RDB requires multiple data accesses while cross-referencing related tables for questions that must be answered through multiple references. A KG has a performance advantage over an RDB because all data are connected through relationships, so questions can be answered with fewer accesses.
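The extension step described in the third advantage can be illustrated by the Cypher statements needed to add one node and relate it to an existing node; the `Article` label and `RELATED_TO` relationship type are hypothetical, not the study's actual schema:

```python
# Adding a new clause to the KG only requires creating a node and
# relating it to existing nodes; no table schema changes are needed.
# The label and relationship type below are hypothetical.
def extend_kg_statements(new_article, related_to):
    """Return the Cypher statements that add one node and one edge."""
    return [
        f"CREATE (:Article {{name: '{new_article}'}})",
        f"MATCH (a:Article {{name: '{new_article}'}}), "
        f"(b:Article {{name: '{related_to}'}}) "
        "CREATE (a)-[:RELATED_TO]->(b)",
    ]
```

Extending the graph to a new contract clause is therefore a local operation, in contrast to the RDB case where existing tables and their connections must be reconsidered.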

Source Data: PO General Provisions
Company P is a Korean steel manufacturer, and the GP comprises the general terms and conditions of Company P's facility-purchasing PO. The GP is a contract that applies to all contracts in which Company P purchases equipment and services in relation to plant construction. It is used when the scope of investment is limited and the performance requirements are general, such as replacing an obsolete facility or purchasing a single piece of equipment [66]. Figure 3 illustrates the GP document of Company P, which serves as the source data for this study. Figure 3a represents the GP's cover page, containing the document's common information; the cover page typically includes the document's creation date, department, and contracting parties. Figure 3b displays the table of contents of the GP, which consists of 35 chapters, starting from "1. Definition" and extending to "35. Special Instructions for Foreign Corrupt Practices Act", of which 26 are contract articles. The information on the arbitration and governing law articles, which are the PoC of this study, is listed in the "22. Arbitration" and "28. Governing Law" chapters.
In this study, among the total 26 articles of the GP, the arbitration and governing law articles were selected as the GPQAM's PoC because the GPQAM development method can be expanded and applied to other articles after verification. An arbitration article is a contract article that is recommended to be resolved through arbitration when a dispute arises between the contracting parties. Arbitration can save time and money compared to litigation, a formal legal process [67], and can be conducted privately to maintain confidentiality. Furthermore, most countries under the New York Convention accept arbitration awards as the final dispute resolution procedure. The governing law clause specifies the law to be considered in interpreting a contract in the event of a dispute [68]. The interpretation may differ according to the applicable law, even for the same fact between the contracting parties.
The following are some clauses of the arbitration article: "If the two (2) parties hereto fail to amicably settle such disputes, controversies or differences within a reasonable period of time, such disputes, controversies or differences shall be submitted to arbitration to be held in Seoul, Korea under the Commercial Arbitration Rules of the Korean Commercial Arbitration Board and under the laws of the Republic of Korea".
The following is part of the governing law article: "The Contract shall be governed, interpreted and construed under the laws of the Republic of Korea".
Company P's GP stipulates that in case of a dispute, it should be referred to arbitration held in Seoul, Korea, in accordance with the Commercial Arbitration Rules of the Korean Commercial Arbitration Board (KCAB) and the laws of the Republic of Korea. The governing law is defined as the laws of the Republic of Korea.
The liquidated damages (LD) article stipulates the seller's compensation method and procedure for delayed delivery and for delayed preliminary acceptance certificate (PAC) or final acceptance certificate (FAC) issuance.

•
In the case of delayed delivery due to the Seller's reasons, a grace period of seven days for shipment and three days for air cargo is granted. However, if that period is exceeded, the buyer can claim LD equivalent to a maximum of 10% of the contract price at a rate of 0.1% per day from the first day of delay, regardless of the grace period.

•
In the case of the delay in issuing PAC or FAC due to the Seller's reasons, LD, including Delay Liquidated Damages (DLD), can be claimed by the buyer against the seller at a rate of 0.1% of the contract price per day, up to a maximum of 10%.

•
The total amount of LD for delayed delivery and delayed PAC or FAC issuance is limited to 15% of the contract price, which does not affect the buyer's other claim rights.
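As a worked example, the LD rules above can be sketched in Python. This is a minimal illustration, assuming the rates and caps stated in the GP (0.1% per day, 10% per category, 15% combined); the function names and example contract price are invented for illustration.

```python
def delivery_ld(contract_price, days_late, daily_rate=0.001, cap=0.10):
    """LD for delayed delivery: 0.1% of the contract price per day, counted
    from the first day of delay (regardless of the grace period), capped at 10%."""
    if days_late <= 0:
        return 0.0
    return min(days_late * daily_rate, cap) * contract_price

def total_ld(delivery, acceptance, contract_price, total_cap=0.15):
    """Combined LD for delayed delivery and delayed PAC/FAC issuance,
    limited to 15% of the contract price."""
    return min(delivery + acceptance, total_cap * contract_price)

# Example: a 1,000,000 contract delivered 30 days late -> roughly 30,000 in LD;
# a very long delay saturates at the 10% cap (about 100,000).
print(delivery_ld(1_000_000, 30), delivery_ld(1_000_000, 365))
```

Note how the 15% combined cap binds only when both delay and acceptance LD are near their individual 10% caps.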
The termination and assurance article specifies the rights and procedures to terminate the contract in case a contracting party violates the contract and the buyer's rights in case of contract termination due to the seller's cause.

•
In the case of a material breach of contract by the Buyer or Seller, the other party may terminate the contract immediately after notifying its counterparty in writing.

•
If the contract is terminated due to the Seller's cause, the Buyer has the right to renew the unfinished part of the contract, and the Seller must reimburse the Buyer. Even if the Buyer does not renew the incomplete part of the contract, the Seller's obligation to repay is not eliminated. The repayment amount is calculated at a reasonable market price at the end of the contract.

Table 2 lists the table of contents of the GP, which consists of a total of 26 articles. The arbitration and governing law articles were used in the development of the GPQAM as the PoC in this study.

Proof of Concept: Arbitration Article
An arbitration article is a contract article that recommends resolving disputes through arbitration, one step of the dispute resolution procedure. Arbitration is a single-trial system that can save time and money and can be conducted as a confidential hearing that guarantees the confidentiality of the contracting parties. Moreover, the decision is made by an arbitrator who is an expert in the relevant field. The New York Convention is the common name for the United Nations Convention on the Recognition and Enforcement of Foreign Arbitral Awards, adopted by representatives of 48 countries at the United Nations (UN) headquarters in New York in 1958. Arbitration awards are accepted as legally effective in member countries under the New York Convention. Accordingly, once a dispute is concluded through arbitration, no further dispute resolution procedures are permitted in most cases.
The GP stipulates that when a dispute arises between contracting parties, it must be referred to arbitration held in Seoul, Korea, in accordance with the Commercial Arbitration Rules of the KCAB and the laws of the Republic of Korea. KCAB is the only permanent statutory arbitration institution in Korea established to resolve and prevent disputes arising from domestic and international commercial transactions through arbitration [69]. The KCAB recommends the following Model Clause to increase arbitration efficiency and a speedy arbitration procedure: "Any disputes arising out of or in connection with this contract shall be finally settled by arbitration in accordance with the International Arbitration Rules of the Korean Commercial Arbitration Board".
"The number of arbitrators shall be [one/three]" "The seat, or legal place, of arbitral proceedings shall be [Seoul/Republic of Korea]" "The language to be used in the arbitral proceedings shall be [language]" The first clause concerns the arbitration rule and explains the rules to be applied in the event of a dispute. The remaining clauses provide examples of the number of arbitrators, arbitration venues, and language used in the arbitration.

Proof of Concept: Governing Law Article
The governing law article is a contract article that specifies the law to be considered when interpreting the contract in case of a dispute. International contracts represent deals between parties operating under different legal systems. Given the possibility of differences in opinion or disagreements regarding the interpretation of a contract's content, the contract should specify the law to be used as a basis for resolving disputes. The governing law is decided through agreements between parties to the contract. If a party agrees to abide by the laws of another party's country or a third country, they may find it difficult to fully grasp that country's legal system. Therefore, it is important to agree to use the laws of one's own home country as the governing law. If the parties to the contract do not reach an agreement regarding the governing law, then the governing law is determined in accordance with private international law. However, because the principles that are applied in such situations are complicated, it is important to reach an agreement on the governing law when signing a contract.
Governing law provisions generally include the phrase "without regard to conflict of law." Private international law (conflict of laws) determines which country's legal system is used as the basis for the legal relationship between contracting parties of different nationalities. If the contract stipulates that the laws of the Republic of Korea are the governing law, for instance, the contracting parties interpret the contract under South Korean law. However, if South Korean private international law (conflict of laws) were to designate the laws of the United Kingdom as the governing law, the contract would be interpreted under British law, contrary to the parties' intention to interpret their legal relationship in accordance with South Korean law. The phrase "without regard to conflict of law" is inserted to prevent this outcome: Korean law is applied as the governing law, but South Korean private international law is not.

GP Taxonomy
An ontology-based NLP approach is used to semantically connect the information of contract clauses in legal articles of complex contracts [11]. In this study, a taxonomy and lexicon were employed to incorporate the ontology concept that semantically connects contractual legal terms and contract clauses within the GP. A taxonomy is a hierarchical classification in which things are organized into groups or types [70]. An example is the European Union (EU) taxonomy, a classification system that aims to channel public and private investments into environmentally sustainable economic activities to achieve environmental goals [71]. Sovrano et al. [72] developed KGs by constructing a taxonomy to efficiently use, query, and explore knowledge. Choi and Lee [11] developed a lexical dictionary by arranging a taxonomy to extract risks from EPC contracts. In this study, a hierarchical classification system, the GP taxonomy, was developed to express the GP in the graph form of a KG. The GP taxonomy was then used to develop the GP lexicon.
This study conducted workshops with subject matter experts (SMEs), such as equipment planning engineers, equipment purchasing managers, and legal representatives with 10-15 years of experience, to develop the GP taxonomy. In the workshops, the SMEs classified all provisions of the GP, grouped provisions of the same category, and subdivided the sub-levels of each provision. Table 3 provides information on the SMEs who participated in the development of the GP taxonomy. In the resulting GP taxonomy, seven groups, namely project information, project requirements, variation, project payment, project liabilities, project rights and termination, and laws and regulations, were set as Class 1. Based on this, the sub-levels were classified in detail and systematized (Figure 4).

•
The project liabilities of Class 1 can be divided into direct damages, indirect damages, LD cap, remedy, and limit of liability. Direct damages are those suffered directly by the project owner due to a breach of contract by the project contractor, and indirect damages are any or all damages that are not direct damages. The LD cap, generally 15-20% of the contract price, is the upper limit of the LD.

•
The direct damages of Class 2 can be divided into general damages and LD. General damages, which refer to the actual damages suffered by the project owner due to a breach of contract by the project contractor, are also called actual damages. LD is a genuine pre-estimate: a fixed amount agreed upon by the contracting parties at the time of the contract to compensate the project owner in the event of a breach of contract, such as a delay in project completion. A genuine pre-estimate is fair and reasonable and based on the project owner's actual damages. Indirect damages can be divided into consequential damages and loss of profits. Damages to project stakeholders, such as equity investors, are consequential damages. Loss of profits is the loss of the project owner's expected profits owing to delays in project completion.

•
Class 3 LD can be divided into DLD, performance liquidated damages (PLD), and key personnel LD. If the project contractor fails to meet the contract delivery date or the performance required by the contract, it must compensate the project owner for the damages. DLD apply to schedule delays, and PLD to underperformance. Key personnel LD compensates for damages when key members contracted to participate in the project fail to participate regularly. For LD, the GP taxonomy was subdivided from Class 1 down to a maximum of Class 4.

Figure 4 shows the GP taxonomy systematized for this study.


GP Lexicon Development
This study used a lexicon to establish an ontological relationship that semantically connects the information in the contract and expresses it consistently. A lexicon is the vocabulary of a language or field of knowledge, together with its lexical information [73]. The GP lexicon was developed from the GP taxonomy; however, whereas the GP taxonomy only classifies the GP hierarchically, the GP lexicon also collects synonyms of the words at the lowest level, and in this respect it differs from the GP taxonomy. For the arbitration and governing law articles, the PoC of this study, Class 3 was set as the lowest level in the GP taxonomy. Synonyms of the words at this lowest level were found in the EPC contract and the GP, and the GP lexicon was developed from them. The GP lexicon was used to build the KG, and its synonyms were used for the entity recognition of questions in the QA.
In order, the dispute resolution procedure consists of negotiation, mediation, conciliation, adjudication, arbitration, and litigation. Arbitration, one of the means of alternative dispute resolution (ADR), is the step before litigation, the formal court procedure. Arbitration rules include the International Chamber of Commerce (ICC), the United Nations Commission on International Trade Law (UNCITRAL), and American Arbitration Association (AAA) rules; the ICC rules are generally applied. An arbitration venue is the physical location where an arbitration tribunal is held, and the preferred cities are Paris, London, and New York, in that order. A third party selected to make an arbitration award is called an arbitrator. An arbitration tribunal can consist of one or more arbitrators but usually comprises an odd number. Arbitration is recognized as the final dispute resolution procedure in most countries under the New York Convention. Even if a contracting party appeals the arbitration result to a court, the appeal is usually not accepted, except in extremely exceptional circumstances.
Governing law determines which country's law governs the legal relationship between the parties when the contracting parties are nationals of different countries or the performance of the contract involves multiple countries. Governing law systems can be divided into two types: common law and civil law. Common law is a legal system that originated in England and spread to English-speaking countries and British colonies; it is based on case law and is also called English law or case law. Civil law is a continental European legal system centered on Germany and France; it is also called French law, continental law, or Romano-Germanic law, and differs from common law in that it is based on written law. A civil code is a written legal code enforced in countries using the civil law system; the Napoleonic Code, for example, was created to simplify and codify all laws into one document. Table 4 illustrates some of the terms related to the arbitration and governing law articles set as the PoC in this study, together with the GP lexicon developed in this study. The GP lexicon was used for developing the KG.

Knowledge Graph Modeling for General Provisions
A KG is a large network composed of entities, their properties, and the relationships between entities [74]. GDBs used for KG development include Neo4j, JanusGraph, TigerGraph, and Dgraph [60]. Neo4j is a graph database platform [75] that uses the Cypher query language and can be accessed from software written in other languages through a transactional hypertext transfer protocol (HTTP) endpoint or the binary Bolt protocol [76,77]. Neo4j is the most popular graph DBMS according to the DB-Engines ranking [60], and it was therefore used for KG development in this study in consideration of future scalability. As a graph database management system, Neo4j stores, manages, and queries data based on the graph data model; a graph DB organizes data around entities and the relationships between them and can efficiently express interactions among complex data in various domains. In addition, Neo4j offers real-time data processing capabilities and provides immediate responses, enabling real-time exploration and analysis of relationships among data [74].

Node Creation of Knowledge Graph
Nodes are used to represent entities in a domain and can have properties consisting of key-value pairs [78]. In this study, nodes were created to express the entities of the GP lexicon in graph form. Nodes can be generated either one by one or all at once for a given class using a comma-separated values (CSV) file. In this study, nodes were generated using CSV files.
Arbitration and governing law articles, the PoC of this study, were classified up to Class 3 in the GP lexicon. The highest class of GPs was created to represent the entities corresponding to each class as nodes. Next, the nodes of Classes 1, 2, and 3 were sequentially created. Finally, the "Seoul", "KCAB Rules", and "Laws of the Republic of Korea" nodes corresponding to the terms of the GP lexicon were created. The cypher query language used to generate the nodes of Class 1 is as follows.
Load csv with headers from "file:///class1.csv" as class1
create (b1:class1 {name: class1.name, parent_class: class1.parent_class, contract: class1.contract})

Seven Class 1 nodes were created: "Project Information", "Project Requirement", "Variation", "Project Payment", "Project Liabilities", "Law and Regulation", and "Rights and Termination". These are the entities classified as Class 1 in the GP lexicon. Class 1 nodes have name and parent_class as properties: name was used to distinguish each node, and parent_class was configured so that the relationships between nodes could be set at once (Table 5). Twenty-seven Class 2 nodes, ten Class 3 nodes, and three term nodes were created in the same way as the Class 1 nodes.
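The structure of class1.csv implied by the query's column references (name, parent_class, contract) can be sketched as follows. This is an illustrative assumption: the header names come from the query, the node names from the text, but the parent_class and contract values ("GP") are invented placeholders, not the study's actual file.

```python
import csv
import io

# Two of the seven Class 1 rows; the "GP" values are illustrative assumptions.
rows = [
    {'name': 'Project Information', 'parent_class': 'GP', 'contract': 'GP'},
    {'name': 'Project Liabilities', 'parent_class': 'GP', 'contract': 'GP'},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['name', 'parent_class', 'contract'])
writer.writeheader()   # header row matches the columns the LOAD CSV query reads
writer.writerows(rows)
print(buf.getvalue())
```

Each row becomes one node when the LOAD CSV statement iterates over the file.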

Generation of Knowledge Graph Relationships
Relationships indicate the connections between nodes [78]. The relationship between nodes is a key function of a GDB, through which related data can be easily found. In this study, relationships were established between the nodes created in Section 4.5.1 for ease of data search. A relationship is directed: it connects two nodes divided into a source node and a target node. In this study, relationships between upper- and lower-class nodes were created in both directions, so that questions in both directions could be answered.
The methods of creating relationships can be divided into a method of creating relationships one by one and a method of generating all relationships at once using a CSV file. Both methods were used in this study. Relationships between nodes of Classes 0 and 1, 1 and 2, and 2 and 3 were created using CSV files. Relationships between the nodes of Class 3 and terms were created individually. The cypher query language for creating relationships between the nodes of various classes and Class 1 is as follows.
Load csv with headers from "file:///class1.csv" as class1
match (a:

The relationships created are "SubClassOf" and "contain". The relationship from nodes in Class 0 to those in Class 1 is "contain", and the relationship from nodes in Class 1 to those in Class 0 is "SubClassOf". Similarly, relationships were established between the nodes of Classes 1 and 2 and those of Classes 2 and 3.
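The match clause above is truncated in the source text. A plausible completion, assuming the parent_class property is used to look up the source node of each relationship (an illustrative reconstruction in the style of the study's other Cypher snippets, not its exact statement), might look like this:

```
Load csv with headers from "file:///class1.csv" as class1
match (a:class0 {name: class1.parent_class}), (b:class1 {name: class1.name})
create (a)-[:contain]->(b)
create (b)-[:SubClassOf]->(a)
```

Both directed relationships are created in one pass, matching the bidirectional design described above.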
In the developed KG, there are "Civil Law", "Common Law", and "Civil Code" nodes under the Class 2 "Governing Law" node. Under the "Civil Law" node, there is a "Laws of the Republic of Korea" node. Figure 5 shows the "Governing Law" node of Class 2, its sub-nodes, and relationships.
The finally constructed KG is composed of 48 nodes, comprising 1 Class 0 node, 7 Class 1 nodes, 27 Class 2 nodes, 10 Class 3 nodes, and 3 term nodes, together with 96 relationships. It was used as the database from which answers to user questions are obtained.
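The bidirectional class hierarchy around the "Governing Law" node can be pictured as a small in-memory edge list. This is only an illustration of the KG's structure, not the Neo4j storage; the node names follow Figure 5, but "Law and Regulation" as the Class 1 parent of "Governing Law" is an assumption.

```python
# (source, relationship, target) triples for the "Governing Law" branch.
EDGES = [
    ('Law and Regulation', 'contain', 'Governing Law'),
    ('Governing Law', 'SubClassOf', 'Law and Regulation'),
    ('Governing Law', 'contain', 'Civil Law'),
    ('Governing Law', 'contain', 'Common Law'),
    ('Governing Law', 'contain', 'Civil Code'),
    ('Civil Law', 'SubClassOf', 'Governing Law'),
    ('Common Law', 'SubClassOf', 'Governing Law'),
    ('Civil Code', 'SubClassOf', 'Governing Law'),
    ('Civil Law', 'contain', 'Laws of the Republic of Korea'),
    ('Laws of the Republic of Korea', 'SubClassOf', 'Civil Law'),
]

def children(node, edges=EDGES):
    """Nodes reached from `node` via a downward "contain" relationship."""
    return [t for s, r, t in edges if s == node and r == 'contain']

def parent(node, edges=EDGES):
    """Node reached from `node` via an upward "SubClassOf" relationship."""
    ups = [t for s, r, t in edges if s == node and r == 'SubClassOf']
    return ups[0] if ups else None
```

Because every "contain" edge has a mirrored "SubClassOf" edge, the hierarchy can be traversed in either direction, which is what allows questions in both directions to be answered.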

Question-Answering Model
This section discusses the QA of the GPQAM and consists of five subsections. Section 5.1 briefly introduces the definition, classification, and development processes of QA. Sections 5.2-5.4 describe the development of QA, and Section 5.5 presents the modeling results.

Question-Answering Systems
The GPQAM applies KG and QA functions for information retrieval in GP documents. A QA function is a technology that automatically answers questions asked in natural language using a prestructured database or natural language documents [79]. QA systems can be classified according to the paradigm they implement [80].

•
Information retrieval (IR) QA: Filters and ranks candidate answers after searching for them with a search engine.

•
NLP QA: Uses verbal intuition and ML methods to obtain answers from the retrieved snippets.

•
Knowledge base (KB) QA: Uses structured data sources such as ontologies. An ontology describes the conceptual representation of concepts and their relationships in a particular domain.

•
Hybrid QA: This is a high-performance QA system that uses many types of resources and is a combination of IR QA, NLP QA, and KB QA.
Among 130 QA-related studies published from 2000 to 2017, more studies used the KB and NLP QA paradigms than the IR and hybrid paradigms [81]. KGQA, a form of KB QA that finds answers in a structured data source, has the advantage of high reliability because accurate information is stored in the data source [82]. Furthermore, KGQA uses a GDB and is advantageous in terms of future scalability because data are easy to add, modify, and change [83]. In this study, the authors developed the GPQAM with KGQA technology to provide users with reliable answers and future scalability.
The QA development process consisted of three steps: pattern matching, similarity measurement, and cypher query statement generation. First, entities were extracted from the user's questions using a pattern-matching algorithm (Section 5.2). If no entity could be extracted from a question, the question was preprocessed and the average vector of its meaningful words was obtained; the node with the most similar meaning was then selected by measuring the Euclidean distance between this vector and the node vectors (Section 5.3). Finally, a cypher query statement was created using the extracted entities or the selected nodes (Section 5.4). Figure 6 shows a flowchart of the QA development process.

Pattern Matching
Pattern matching was performed to extract entities identical to the GP lexicon synonyms from user questions. Pattern matching verifies whether one or more keywords (i.e., patterns) exist in the text.
Synonyms of entities for pattern matching were developed during the GP lexicon development stage. Table 6 presents an example of entities and their synonyms for pattern matching.

Pattern matching is the entity recognition technique applied to develop the QA model. Various techniques exist for implementing entity recognition: a training-based approach using a deep learning model [58], the AC algorithm, a multipattern-matching algorithm [53], and the PhraseMatcher function of spaCy, an open-source Python library [30]. This study could not secure sufficient data to use a training-based model. Moreover, the PhraseMatcher function has the limitation that entity recognition fails unless the entities and the patterns of the question match exactly; for instance, the pattern "arbitration rule" could be identified when the question included exactly the phrase "arbitration rule", but not a variant of that phrase. Multipattern-matching algorithms include AC and Commentz-Walter; the Commentz-Walter algorithm has the advantage of being faster than AC for a small number of patterns [84]. This study applied an entity recognition technique using the AC algorithm for future scalability, despite using only a few patterns.
The AC algorithm is a multipattern-matching algorithm that extends the Knuth-Morris-Pratt (KMP) algorithm [84]. A trie, output links, and failure links are required to implement the AC algorithm. A trie is a tree-type data structure composed of links and nodes for storing patterns and searching efficiently [85]. A pattern is a string to be found, and each link of the trie carries one letter of a pattern. The trie is configured so that patterns sharing a common prefix follow the same nodes and branch onto different links where their letters differ. The AC algorithm finds patterns by matching the question against the trie: it moves between nodes while comparing the letters of the question with the letters on the links. While moving, it checks for an output link and, if one is present, reports that a pattern has occurred. An output link marks the node where a pattern ends; reaching it means the sought string has been matched. If there is no matching link as the algorithm proceeds, it moves to the root node or to another node via a failure link.
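The trie, output links, and failure links described above can be sketched in pure Python. This is a simplified illustration (the study's actual implementation uses the ahocorapy library, and the node numbering here differs from Figure 7):

```python
from collections import deque

def build_trie(patterns):
    """Build an Aho-Corasick trie with failure links and output links."""
    root = {'next': {}, 'fail': None, 'out': []}
    for pat in patterns:
        node = root
        for ch in pat:
            node = node['next'].setdefault(ch, {'next': {}, 'fail': None, 'out': []})
        node['out'].append(pat)          # output link: a pattern ends here
    queue = deque()
    for child in root['next'].values():  # depth-1 nodes fail back to the root
        child['fail'] = root
        queue.append(child)
    while queue:                         # breadth-first failure-link setup
        node = queue.popleft()
        for ch, child in node['next'].items():
            queue.append(child)
            f = node['fail']
            while f is not None and ch not in f['next']:
                f = f['fail']
            child['fail'] = f['next'][ch] if f else root
            child['out'] += child['fail']['out']
    return root

def search(root, text):
    """Return (pattern, start_index) pairs found in `text`."""
    hits, node = [], root
    for i, ch in enumerate(text.lower()):
        while node is not None and ch not in node['next']:
            node = node['fail']          # follow failure links on mismatch
        node = node['next'][ch] if node else root
        for pat in node['out']:
            hits.append((pat, i - len(pat) + 1))
    return hits
```

With the patterns "venue", "site", "spot", and "place", searching the example question "Where is the place of arbitration?" yields a single match for "place", mirroring the walkthrough below.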
The QA converts the synonyms of the GP lexicon into patterns and uses them; after entities identical to the patterns are extracted from a user question, the result is used to retrieve the answer from the KG. The following is an example of pattern matching for a question using the AC algorithm, with the synonyms "venue", "site", "spot", and "place" as patterns. ahocorapy, a Python library, was used to implement the AC algorithm, and the Python code written for trie creation is as follows.
from ahocorapy.keywordtree import KeywordTree
kwtree = KeywordTree(case_insensitive = True)
kwtree.add('venue')
kwtree.add('site')
kwtree.add('spot')
kwtree.add('place')
kwtree.finalize()

The trie is built from the patterns "venue", "site", "spot", and "place" entered in the third to sixth lines of the Python code. The process of adding "venue" to the trie is as follows. The first letter of "venue" is "v". Because the trie is initially empty, "v" is placed on the link after the root node and node 1 is created. Moving along the links in the same way, all the letters of "venue" are added, and node 5 is marked as an output link to indicate the end of the pattern. The remaining patterns are organized in the trie in the same way. The Python code to find the patterns in a question is as follows.

results = kwtree.search_all('Where is the place of arbitration?')
for result in results:
    print(result)

The question used in the example is "Where is the place of arbitration?". Matching proceeds sequentially from the first to the last letter of the question until "v", "s", or "p" is recognized. When "s" is recognized, the algorithm moves from the root node to node 6; however, since the letter following "s" is neither "i" nor "p", it returns to the root node. Matching then continues from the letter after "s", and when "p" is recognized, the algorithm moves from the root node to node 10. Since "l", "a", "c", and "e" are recognized in sequence, it reaches the output link and recognizes that a pattern match has been achieved. Since the algorithm cannot recognize "e", "ce", "ace", or "lace" from other nodes, it returns to the root node and sequentially matches the remaining letters of the question (Figure 7). If pattern matching is not possible for a question, a similarity measurement is used.

Similarity Measurement
A similarity measurement technique was used to select nodes with meanings most similar to the user's questions. The purpose of the similarity measurement in this study is to find two words with the most similar vector values. Similarity measurements were carried out in the order of preprocessing, word embedding, and Euclidean distance measurement.
Preprocessing is performed to increase the speed and efficiency of text mining before analyzing the text data. Preprocessing methods include punctuation, number, and stopword removal [25]. This study was conducted by dividing preprocessing into two stages. First, it removed punctuation from the questions and converted all text to lowercase. Then, it removed the stopwords from the questions using the stopword list in the spaCy library. The spaCy library has 326 stopwords by default. The interrogative words "who", "when", "where", "what", "how", "why", and "which" are generally meaningless in declarative texts, but have significant meaning in interrogative sentences. Accordingly, they were removed from the stopwords list. Preprocessing was completed, and meaningful words from the questions were word embedded.
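The two preprocessing stages can be sketched as follows. The abbreviated stopword set is a stand-in for spaCy's 326 defaults (loading the actual spaCy list is omitted here), with the interrogatives removed from it as described:

```python
import string

# Small stand-in for spaCy's default stopword list (326 words in the study).
STOPWORDS = {'a', 'an', 'the', 'is', 'are', 'of', 'in', 'to', 'will', 'be',
             'who', 'when', 'where', 'what', 'how', 'why', 'which'}
# Interrogatives carry meaning in questions, so remove them from the stopword set.
STOPWORDS -= {'who', 'when', 'where', 'what', 'how', 'why', 'which'}

def preprocess(question):
    # Stage 1: remove punctuation and lowercase the text.
    cleaned = question.translate(str.maketrans('', '', string.punctuation)).lower()
    # Stage 2: drop stopwords, keeping the meaningful words.
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]
```

For the example question "Where is the place of arbitration?", this leaves the meaningful words "where", "place", and "arbitration", which are then passed to word embedding.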
Word embedding is the foundation of NLP and represents words of text in R-dimensional vector space [86]. For word embedding, the authors used en_core_web_lg, which is a pretrained word embedding model provided by the spaCy library. This model used OntoNotes 5, WordNet 3.0, and Wikipedia as training data and expressed 514,000 words as a 300-dimensional vector [87]. After obtaining the vector values of meaningful words in the questions using a pre-trained word-embedding model, the average value of words was calculated.
The Euclidean distance and cosine similarity are methods for measuring the similarity between two vectors: Euclidean distance measures the distance between two vectors, whereas cosine similarity measures the cosine of the angle between them. Euclidean distance is widely used for text clustering [88]. In this study, the KG node with the vector value most similar to the question was selected using the Euclidean distance. The nodes in this study included single words and two-word phrases; for example, "Common Law", "Civil Law", and "Civil Code" are lower nodes of the "Governing Law" node. Because these phrases did not have vector values of their own, the vector values of the words constituting each phrase were obtained and averaged. However, this method is limited in that the meaning of an actual phrase cannot be accurately expressed by such an averaged vector value. The nodes selected by the similarity measurement were used to create a cypher query statement.
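The node selection step can be sketched with toy 2-D vectors standing in for en_core_web_lg's 300-dimensional embeddings. The node names are from the paper; the vector values are invented purely for illustration:

```python
import math

def mean_vector(vectors):
    """Average word vectors, used both for questions and multi-word node names."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def most_similar_node(question_vec, node_vectors):
    """Pick the KG node whose vector is nearest the question vector."""
    return min(node_vectors, key=lambda n: euclidean(question_vec, node_vectors[n]))

# Toy embeddings for three lower nodes of "Arbitration" (values invented).
nodes = {'Rule': [0.0, 1.0], 'Place': [1.0, 0.0], 'Language': [-1.0, 0.0]}
question = mean_vector([[0.9, 0.1], [0.8, -0.1]])  # averaged question-word vectors
```

Here the averaged question vector lies closest to the "Place" node, so "Place" would be selected, analogous to the worked example in Section 5.5.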

Cypher Query Statement Generation
A cypher query statement was created to query the KG. Cypher is a graph query language that allows users to store and retrieve data in a GDB [89]. This study used predefined cypher query templates divided into three categories. A cypher query statement was created by replacing the "first", "second", and "third" parts of a cypher query template with the names of nodes.
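The template-filling step can be illustrated as below. The single-relationship template and its MATCH pattern are hypothetical assumptions; the paper's three actual template categories are not reproduced here.

```python
# Hypothetical single-slot template; the study's three actual template
# categories and their exact MATCH patterns are assumptions here.
CYPHER_TEMPLATE = "MATCH (a:{first})-[r:{second}]->(b:{third}) RETURN b.name"

def build_cypher(first: str, second: str, third: str) -> str:
    """Replace the 'first', 'second', and 'third' slots of the template
    with node/relationship names extracted from the question."""
    return CYPHER_TEMPLATE.format(first=first, second=second, third=third)

print(build_cypher("Arbitration", "HAS", "Venue"))
# → MATCH (a:Arbitration)-[r:HAS]->(b:Venue) RETURN b.name
```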

Results of Modeling
For QA, a cypher query statement was finally generated. This study used pattern matching and similarity measurements for cypher query statement generation.
The following describes the process of generating a cypher query statement for QA. The question used in the example is "Where will the arbitration be held in GP?" When the question was entered, entity recognition was attempted using the AC algorithm. Because entity recognition was not possible for this question, a similarity measurement was used instead. After the input question was preprocessed, the words "where", "arbitration", and "held" remained. The vector values of the three words were obtained and averaged. The lower nodes of the "Arbitration" node are "Rule", "Place", "Language", "Tribunal", "Arbitrator", "Settlement", and "Litigation". The Euclidean distance between the question's average vector and each lower node's vector was measured, and "Place" was selected as the most similar node. A cypher query statement was then created by changing "Place" to "Venue" and entering "Arbitration" and "Venue" into the cypher query template. Table 7 lists the cypher query statements corresponding to the questions. Table 7. Questions and corresponding cypher query statements.

Questions	Cypher Query Statements
Where will the arbitration be held in GP?

The generated cypher query statement was used to find answers to questions in the KG.
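The entity-recognition step that precedes the similarity fallback can be sketched as below. This simplified substring scan is a stand-in for the Aho-Corasick matcher, and the five patterns are an illustrative subset of the GP lexicon; when a question matches only the article node (as with the venue questions), the similarity measurement supplies the missing sub-node.

```python
# Illustrative subset of GP lexicon terms used as patterns; the study
# used the Aho-Corasick (AC) multipattern-matching algorithm over the
# full lexicon, for which this substring scan is a simplified stand-in.
PATTERNS = ["arbitration", "governing law", "rule", "venue", "tribunal"]

def extract_entities(question: str) -> list[str]:
    """Return every lexicon pattern that occurs in the question."""
    q = question.lower()
    return [p for p in PATTERNS if p in q]

# A rule question matches both the article and sub-node patterns:
print(extract_entities("What are arbitration rules in GP?"))
# → ['arbitration', 'rule']

# A venue question matches only the article pattern, so the sub-node
# ("Place") must be found via the similarity measurement instead:
print(extract_entities("Where will the arbitration be held in GP?"))
# → ['arbitration']
```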

Test and Validation
This section describes the testing and validation of the GPQAM developed in this study. First, the questions and evaluation metrics used for testing are explained. Next, the test results are verified, and expert reviews are presented.

Test Data Setup
The performance evaluation of the GPQAM confirms the accuracy of the answering result of the model compared with the original GP text. Therefore, separate questions were generated to test the model. Among the GP documents of Company P used in this study, a total of 15 questions were extracted from the arbitration and governing law articles, 9 for arbitration and 6 for governing law. To evaluate the performance of the GPQAM model, more questions were needed. Accordingly, the authors generated an additional set of 30 test questions by referring to the arbitration, governing law, and LD articles in the EPC contracts of onshore petrochemical and offshore plants. In total, 45 questions were used for performance evaluation and were named the GP questionnaire.
The questions of the arbitration article were divided into arbitration venue and arbitration rule questions, and those of the governing law article were grouped under governing law. Table 8 illustrates questions and answers related to the arbitration and governing law articles, as a subset of the 45 questions of the GP questionnaire developed for this study. Table A1 in Appendix A presents the remainder of the GP questionnaire.

Evaluation Metrics for Model Test
The performance of the GPQAM was evaluated using the F1 score, derived from a confusion matrix [80]. A confusion matrix has four variables: True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN) [90]. The answer accuracy of the model can be quantitatively evaluated by calculating the precision, recall, and F1 score from these four variables. In the confusion matrix, "Positive" and "Negative" indicate the correct and incorrect answers given by the GPQAM, respectively, while "True" and "False" indicate the correct and incorrect answers as determined by humans. If both the model result and human judgment agree that the answer is correct, the result is a TP; if both agree that the answer is incorrect, it is a TN. FP and FN represent model errors resulting from incorrect answers. Table 9 presents the confusion matrix adapted to evaluate the accuracy of the GPQAM model. Table 8. A part of the GP questionnaire for performance evaluation of the GPQAM.

Question Category	Questions for GPQAM Test	Answers

Arbitration Venue (Answer: Seoul)
Where will the arbitration be held in GP?
Where will the arbitration take place in GP?
Where is the place of arbitration in GP?
What should be the place of arbitration in GP?
Where is the arbitration tribunal located in GP?
Where is the place of arbitration situated in GP?
Where is the location of arbitration in GP?

Arbitration Rule (Answers: KCAB Rules; Laws of the Republic of Korea)
What are arbitration rules in GP?
What is the rule of arbitration in GP?

Governing Law (Answer: Laws of the Republic of Korea)
What is governing law in GP?
What is the law governing GP?
What law governs GP?

Table 9. An example of a confusion matrix for GPQAM.

Results of Question Answering	Actual Correct Answer	Actual Incorrect Answer
Correct	TP	FP
Incorrect	FN	TN

For descriptions and formulas for precision, recall, and F1 score, studies by Soares et al. and Yao were referenced [80,81]. Precision is defined as the ratio of the number of answers that are actually correct to the number of answers predicted by the GPQAM to be correct, calculated as Precision = TP/(TP + FP) (Equation (1)). Recall is the ratio of the number of answers correctly predicted by the GPQAM to the actual number of correct answers, calculated as Recall = TP/(TP + FN) (Equation (2)). The F1 score is the tradeoff between precision and recall, computed as their harmonic mean: F1 = 2 × Precision × Recall/(Precision + Recall) (Equation (3)).
The model test results of the GPQAM are presented in Section 6.3.

Test Result and Validation
To evaluate the performance of the GPQAM, the test results were validated with SMEs in the related field. The SMEs comprised three engineers with 10-15 years of experience in equipment purchasing. For each question, whether the answer predicted by the GPQAM was correct or incorrect was confirmed, and the TP, TN, FP, and FN counts of the confusion matrix were constructed accordingly. No TN was observed because there was no case in which the model identified an actually incorrect answer as incorrect. Table 10 summarizes the validation results for the GPQAM; the F1 score of the GPQAM was measured to be 82.8%. Of the 45 questions used for performance evaluation, 35 were correctly answered by the GPQAM, and three of these questions had two correct answers. Of the ten questions that the GPQAM answered incorrectly, three were answered with one correct and one incorrect answer each; up to this point, 41 TPs and 3 FPs were counted. For the remaining seven questions, the GPQAM erroneously derived a wrong answer as the correct one, increasing FP and FN by one each; thus, 10 FPs and 7 FNs were counted in total.
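The reported counts can be checked against Equations (1)-(3): with 41 TPs, 10 FPs, and 7 FNs, the F1 score works out to 82.8%. A short verification:

```python
# Equations (1)-(3) applied to the reported confusion-matrix counts.
def precision(tp, fp):
    return tp / (tp + fp)          # Equation (1)

def recall(tp, fn):
    return tp / (tp + fn)          # Equation (2)

def f1(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)     # Equation (3): harmonic mean

# Counts reported for the GPQAM evaluation: 41 TP, 10 FP, 7 FN, 0 TN.
print(round(f1(41, 10, 7) * 100, 1))  # → 82.8
```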
The response accuracy of the GPQAM was examined for each question category of the GP questionnaire. A model accuracy of 86% was achieved by correctly answering 18 of the 21 questions about the arbitration venue. In some questions there was no pattern for the venue, but because the interrogative word "where" is generally used to ask about a place, the node most similar to the question was found through the similarity measurement. In the cases where the GPQAM's answers were not accurate, the question contained a pattern for tribunals rather than venues. A model accuracy of 100% was achieved by correctly answering all six questions regarding the arbitration rule, because all of these questions contained the pattern "rule". Overall, a model accuracy of 89% was achieved by correctly answering 24 of the 27 questions on the arbitration venue and rule.
The GPQAM's response accuracy for questions related to governing law was 48%, lower than for arbitration. Governing law is divided into common law, civil law, and civil code, and both common law and civil law share "governing law" as a synonym. The GPQAM therefore incorrectly returned all lower nodes of common law and civil law as correct answers to questions containing the "governing law" pattern. Furthermore, the answers were incorrect even when a question had no pattern. A similarity measurement is suitable for distinguishing between different types of nodes, such as venues, rules, and languages, but not between similar types of nodes, such as common law, civil law, and civil code. The GPQAM gave one correct and one incorrect answer to three of the nine questions on governing law, and two correct and four incorrect answers to the remaining six questions. Governing law was thus the question category most often answered incorrectly by the GPQAM. In contrast, a model accuracy of 100% was achieved by correctly answering all nine questions regarding the terms, because all of these questions contained the pattern "terms".
The authors analyzed whether the similarity measurement used in this study is applicable to articles or contracts other than the PoC. First, no answers were extracted for questions about articles other than the arbitration and governing law articles that served as the PoC in this study. This was attributed to the absence of data for those articles in the database; if the lexicon contained information for other articles, corresponding answers could be provided for questions about them. Next, similar performance evaluation results were obtained when testing EPC plant contracts instead of the GP. This suggests that EPC plant contracts and GPs have a similar composition and content, allowing for shared databases and resulting in comparable performance.
After validating the model, the authors compared it with other studies on legal contract review techniques and KG-based QA functions. Lee et al. [23] and Choi and Lee [11] proposed automatic risk extraction models using NLP techniques to review the risks of construction contracts in the bidding stage of a construction project. These studies share a common characteristic with the GPQAM proposed in this study: the application of ontology-based semantic concepts. Furthermore, examples of KG-based QA systems in various fields can be found, such as those by Yu et al. [56] and Yin et al. [58]. Research on applying AI technology to review legal contracts and check contract information is active; however, based on the review of prior studies, this study appears to be the first to apply KG-based QA functionality to contract review. In particular, the significance of this research lies in developing a more advanced model through the integration of technologies: the proposed GPQAM takes a knowledge-based approach, leveraging the taxonomy and lexicon defined based on the applied ontology concept, and additionally incorporates QA techniques, resulting in a more innovative model.

Review through FGI for System Adaptability
To review the industrial applicability of the GPQAM, a focus group interview (FGI) was conducted with GP-related SMEs. Although the sample size was small, the FGI method was applied to obtain professional information on the field applicability of the GPQAM model [91]. The FGI targeted 11 engineers with 5 to 10 years of experience in GP at Company P; Table 11 presents information on the 11 participants. The FGI covered three issues: operational usability, satisfaction with the GPQAM system, and improvement requirements. The questionnaire used for the FGI is attached as Appendix B. In response to the statement, "It would be helpful for work if GPQAM were applied to all provisions of GP or other contracts", 54.5% of respondents answered that it would be slightly helpful for their work, and 18% said they did not know. Respondents were most satisfied with the fact that the GPQAM answered various questions with similar meanings, unlike existing document search functions. The ability to easily check contract information on the platform developed in this study was also found satisfactory. However, some respondents noted that improvement was needed because the model did not accurately answer all the questions used in the test. Furthermore, respondents said that it took time and explanation for users to understand the related information provided visually.
In addition to the survey questionnaire, the authors collected the individual opinions of the respondents. The interviews uncovered the opinion that an environment in which contract information can be easily confirmed through the GPQAM would help improve the efficiency of contract work with domestic suppliers, whose contract expertise is lower than that of foreign suppliers. Additionally, in pre-contract negotiations, some contract details are changed to make the contract terms more favorable. A discussion was held regarding technology that, after converting these changes into a database, could determine the existence of unbalanced clauses when reviewing contract documents [11].

Configuration and Flow of Web Server
This study developed a separate web platform to evaluate the search accuracy of GP contract contents through the GPQAM. Apache 2.4 and Apache Tomcat 8 were used for the open source software implementation: Apache is open source web server software [92], and Tomcat is a web application server (WAS) used for developing web applications and web services [93]. The web screen was implemented using Angular, an application design framework, and Node.js, a JavaScript runtime for network applications [94,95]. Neo4j GDB and MySQL were used for the database.
The flow of the web server system is as follows. When a user enters a question into the web browser and presses the "Run" button, the query is transmitted to the WAS. The WAS generates the corresponding cypher query statement by inputting the delivered question into the Python program and transmits it to the web server. The web server executes the cypher query statement in the Neo4j GDB and displays the relevant content along with the answer in the web browser. Table 12 summarizes the platform development environment.
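The request flow above can be sketched with an injectable query runner. The cypher statement, node labels, and fake runner below are illustrative assumptions; the real system executes the statement in the Neo4j GDB (e.g., via a Neo4j driver session), with node names supplied by the pattern matching and similarity measurement steps.

```python
# Sketch of the web server flow: question in, cypher out, answer back.
# The query text, labels, and fake runner are illustrative assumptions;
# the real system would pass run_cypher a Neo4j session's run method.
def answer_question(question: str, run_cypher) -> str:
    """Generate a cypher query statement for the question and execute
    it through the injected runner."""
    # In the real pipeline the node names come from pattern matching /
    # similarity measurement; they are hard-coded here for brevity.
    query = "MATCH (a:Arbitration)-[:HAS]->(v:Venue) RETURN v.name"
    return run_cypher(query)

def fake_runner(query: str) -> str:
    """Stand-in for a Neo4j session: a one-entry 'graph' keyed by query."""
    graph = {"MATCH (a:Arbitration)-[:HAS]->(v:Venue) RETURN v.name": "Seoul"}
    return graph[query]

print(answer_question("Where will the arbitration be held in GP?", fake_runner))
# → Seoul
```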

User Interface
The user can confirm the execution result after entering what they are looking for in the GP on the web platform. When the user enters the question "Where is the place of arbitration" in the middle left of the screen, the answer "Seoul" appears below it. Furthermore, the KG implemented on the right side of the screen enables the user to intuitively identify related content through pictures. Figure 8 shows a screenshot of the web platform for GPQAM model testing, including the user's question, the answer to it, and the KG visualization. In this study, instead of targeting the entire contract clauses of the GP, the focus was on only two PoC clauses. The proposed model holds significance as it was applied and verified in actual work by developing a web platform. Additionally, the development of the web platform provided an opportunity to assess the applicability and convenience of the GPQAM in real-world scenarios.

Conclusions and Contributions
This study aims to enhance the efficiency of contract review and contract information retrieval during equipment purchase and focuses on searching for contract clauses that are semantically connected through the relationships between contract clauses. Therefore, it developed the GPQAM, which combines KG and QA technology. The GPQAM was developed in the following steps.
In the KG development stage, to express the contractual provisions of the GP as a KG, the source document was first limited to the GP. Subsequently, among the articles of the GP, the arbitration and governing law articles were selected as the PoC of this study. A GP taxonomy was then defined through workshops with SMEs to express the GP in graph form; the taxonomy divides Class 1 into seven groups and subdivides them down to Class 4. The GP lexicon was developed by finding synonyms of the words corresponding to the lowest level of the GP taxonomy. The KG was developed by converting the entities of the GP lexicon into nodes and establishing relationships between each class, and it was stored in the Neo4j GDB.
The QA model finds the information a user requests in the KG and returns an answer. It was developed using a pattern-matching method for entity recognition, employing the AC algorithm, a multipattern-matching algorithm; the patterns are the terms of the GP lexicon. The AC algorithm stores patterns in a tree-type data structure composed of links and nodes and extracts from questions the entities that match the patterns. In addition, a similarity measurement was used when pattern matching was not possible. In this study, the similarity measurement involves finding the two vectors with the most similar values: after preprocessing a question, the average vector of the meaningful words in the question was obtained, and the node with the meaning most similar to the question was selected by measuring the Euclidean distance to each node's vector in the KG. After changing the extracted entities and selected nodes to the names of nodes with the same meaning in the KG, they were entered into the predefined cypher query template to create a cypher query statement, and an answer was obtained from the KG.
For the evaluation of the GPQAM, 45 test questions related to the PoC were created, and the F1 score was used as the evaluation metric. Furthermore, a web-based test platform was developed for model evaluation, and the implementation of the KG was visually confirmed through the platform, along with the answers to the test questions. In the performance evaluation of the GPQAM, the F1 score was 82.8%. In the survey and FGI with SMEs, 54.5% responded that the GPQAM would help their work.
This study applies the recent technological trend of knowledge graph (KG) concepts to facilitate the semantic similarity-based search of contract clauses within contracts. The study possesses the following distinguishing feature: through the GPQAM, users can efficiently search for contract information during contract analysis, which is expected to contribute to expanding research in related fields.
The contributions of this study were examined in terms of theoretical, technical, and practical implications. The theoretical contribution is as follows: Engineering is an industry where the skills and experiences of engineers play a crucial role. However, in recent times, there has been an increase in the application of AI technology in engineering to reduce engineers' workload and improve work efficiency. In this study, the proposed GPQAM utilized the pre-trained word embedding model en_core_web_lg, provided by the spaCy library. Pre-trained models are trained on a large amount of data, which helps mitigate the issue of data scarcity. However, if the target data differs significantly from the data on which the model was trained, the performance may relatively decline. The performance evaluation results of the GPQAM demonstrated that pre-trained models could be effectively applied in the QA process of plant engineering documents, where obtaining sufficient training data is limited or requires consensus. The study outcomes can potentially be extended to other domains and contribute to the development of AI-driven solutions for contract analysis and management.
The technical contributions of this research are as follows. This study developed a model capable of answering various questions with similar meanings, distinguishing it from existing document retrieval research targeting contracts. Additionally, the developed model was integrated into a platform, enhancing the accessibility and usability of contract information and enabling users to easily access it.
The practical implications of this study are as follows. First, the model applies KG technology to contract documents such as the GP; a review of previous studies found no case in which a KG was built for contracts, meaning that KG, as a new technology, had not previously been applied to contract document analysis. Given its ease of use compared with general QA models, the KGQA model is expected to be adopted quickly in the future. Second, the GPQAM was developed using an unsupervised method. Collecting sufficient training data takes significant time and effort, and legal documents such as contracts have additional limitations on data collection for security reasons. This study is expected to provide a direction for developing KGQA models in the legal domain, where data collection is difficult. Third, one of the advantages of a KG using a GDB is that it is easy to add, delete, and change data that have a new relationship with the existing data, which is advantageous for scalability. Fourth, the practical implications of this study extend beyond the specific domain of plant engineering. The developed model and platform can be adapted and applied to various industries and sectors that deal with contract analysis and management, widening the scope of its practical utility and providing opportunities for organizations across different fields to benefit from improved contract analysis processes. The GPQAM developed in this study is expected to be expanded and applied to various legal contract documents, such as construction contracts, EPC contracts, and general terms and conditions, in the future. Finally, when the GPQAM is used not only by Company P but also by domestic suppliers, it will be able to supplement the contract business capabilities of domestic suppliers, which are insufficient compared to those of overseas suppliers.

Limitations and Future Works
The limitations of this study and directions for future research are as follows. First, this study constructed the KG by extracting only the critical contents from the GP based on the GP taxonomy; thus, it could not implement all contents of the GP in the KG. In other words, the details of each individual article of the GP could not be confirmed in the KG, and future studies that capture these details are needed. Second, the KG was built by manually finding related content in the GP to construct its lowest nodes. Because the PoC of this study is simple in scope, searching for related content in the GP did not take much time; however, significant time is expected to be consumed if the KG is extended to other provisions of the GP or other types of contracts. Accordingly, future research is required on how to automatically extract contract information from contracts and build the KG automatically. Third, this study used a pattern-matching method for entity recognition of the question, selecting the node most similar to the question by measuring the similarity between the question and the KG nodes when pattern matching is not possible. The nodes were single words or two-word phrases; for example, sub-nodes of "Governing law" include "Common law", "Civil law", and "Civil code". As these phrases did not have vector values of their own, the vector values of their constituent words were obtained and averaged. However, this method is limited in that the meaning of an actual phrase cannot be accurately expressed by such a vector. Therefore, additional research is required to express these phrases as vector values.
The lessons learned from this study are that significant data and training are required to improve the model's accuracy. In the future, improved model implementation is expected through additional data collection and more diverse methods.

Acknowledgments:
The authors would like to thank Sea-eun Park (a researcher at Pohang University of Science and Technology) for her academic cooperation. The views expressed in this paper are solely those of the authors and do not represent those of any official organization or research sponsor.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this paper:

Table A1. GP Questionnaire for Performance Evaluation of the GPQAM.

Question Category	Questions for GPQAM Test	Answers

Arbitration Venue (Answer: Seoul)
Where will the arbitration be held in GP?
Where will the arbitration take place in GP?
Where is the place of arbitration in GP?
What should be the place of arbitration in GP?
Where is the arbitration tribunal located in GP?
Where is the place of arbitration situated in GP?
Where is the location of arbitration in GP?

Arbitration Venue (Answer: London)
Where will the arbitration be held in AAA EPC contract?
Where will the arbitration take place in AAA EPC contract?
Where is the place of arbitration in AAA EPC contract?
What should be the place of arbitration in AAA EPC contract?
Where is the arbitration tribunal located in AAA EPC contract?
Where the place of arbitration is situated in AAA EPC contract?
Where is a location of arbitration in AAA EPC contract?

Arbitration Venue (Answer: Paris)
Where will the arbitration be held in BBB EPC contract?
Where will the arbitration take place in BBB EPC contract?
Where is the place of arbitration in BBB EPC contract?
What should be the place of arbitration in BBB EPC contract?
Where is the arbitration tribunal located in BBB EPC contract?
Where the place of arbitration is situated in BBB EPC contract?
Where is a location of arbitration in BBB EPC contract?