Semantic query expansion method based on pay-as-you-go fashion for graph model

Users' search requirements are usually vague and unclear, so the traditional keyword query method rarely returns satisfactory results. Semantic query expansion technology addresses this by using semantic information to modify and extend the user's initial search requirements, yielding more comprehensive and accurate query results. However, constructing the extended query graph is costly, and existing approaches struggle to adjust and optimize the user's query demand in time, which leads to poor query results. To solve these problems, this paper proposes a semantic expansion method for graph models based on the pay-as-you-go fashion. First, the initial query graph is constructed according to the user's search requirements, and the semantic similarities between the search requirements and knowledge bases are calculated. Then, the extended sets of attribute-values and edges are generated by sorting the similarity values in descending order. Finally, all or part of the elements in the extended sets are combined to generate the extended query graph. The method can dynamically adjust and optimize the initial query in time based on the relevant semantic information, improving query efficiency and effectiveness.


Introduction
Traditional information search technology uses keyword queries to retrieve information containing the keywords. Because search engines lack semantic support, it is difficult to obtain implicit information that does not contain the keywords but is semantically related to them [1]. In addition, because of limits in users' cognition and expression, their search requirements are often vague and unclear. To obtain appropriate query results, it is necessary to analyse and modify the initial search requirements using the relevant semantics of the background knowledge and the search content.
To solve these problems, semantic query expansion technology is usually used [2]. Semantic query expansion uses the synonyms, homonyms, and hierarchical relationships of the user's initial query words as extended phrases to synthesize a new query; the search engine then processes the new query and returns more accurate results.
At present, research on semantic query expansion has achieved many results. To avoid repeated calculation, Venetis et al. [3] proposed extending previously calculated initial queries by first decomposing the query into atoms and then incrementally processing each atom. To help develop KBC (Knowledge Base Construction) systems, Shin et al. [4] proposed an incremental reasoning method based on sampling and variational techniques. Kang et al. [5] proposed an incremental optimization method guided by semantic rules for the periodic query scenario of the data warehouse: by extending the query syntax, users can describe the repetition period and increment table of the query, and the optimized incremental query plan can be executed on major distributed computing frameworks. The work in [6] proposed dynamically expanding the query scope with feedback information and put forward a retail commodity search and recommendation method that embeds weighted TF-IDF (Term Frequency-Inverse Document Frequency) in keywords. However, existing semantic query expansion methods still have considerable room for improvement [7,8]. Most of the research lacks support for dynamic data search and has difficulty adjusting and optimizing users' search requirements in time, resulting in poor query results.
To address the above problems, this paper proposes a semantic query expansion method based on the pay-as-you-go fashion [9], a data management style for massive, heterogeneous and dynamic data. Our method can dynamically optimize and extend the search demand over graph data according to semantic information, thus improving the efficiency and effectiveness of semantic queries.

Generation of Multi-dimensional Semantic Expansion Set
We define the general graph model G as follows:
Definition 1. Graph model $G$ consists of nodes and edges, denoted as $G = (N, E)$, where $N$ is the node set $\{n_1, \ldots, n_k\}$, and each node $n_i$ is composed of attribute-value pairs, denoted as $\{(a_1, v_1), \ldots, (a_j, v_j)\}$. Let $A$ be the set of attributes and $V$ be the set of values; then each attribute-value pair satisfies $(a_x, v_x) \in A \times V$, $1 \le x \le j$. $E$ is the set of labelled edges, denoted as $E = \{(n_x, l, n_y) \mid n_x, n_y \in N \text{ and } l \in L\}$, where $L$ represents the labels of the edges.
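Definition 1 can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation: the class names `Node` and `Graph`, the choice of string-valued attributes, and the treatment of edges as directed `(source, label, target)` triples are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Node:
    """A node n_i: a name plus its attribute-value pairs."""
    name: str
    pairs: Dict[str, str] = field(default_factory=dict)  # attribute -> value


@dataclass
class Graph:
    """G = (N, E): named nodes and labelled edges (source, label, target)."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_node(self, name: str, pairs: Optional[Dict[str, str]] = None) -> None:
        self.nodes[name] = Node(name, dict(pairs or {}))

    def add_edge(self, source: str, label: str, target: str) -> None:
        # Both endpoints must already belong to the node set N.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("edge endpoints must be existing nodes")
        self.edges.add((source, label, target))

    def neighbours(self, name: str) -> Set[str]:
        """Names of nodes connected to `name` by an edge in either direction."""
        outgoing = {t for s, _, t in self.edges if s == name}
        incoming = {s for s, _, t in self.edges if t == name}
        return outgoing | incoming


# Hypothetical example data: a two-node query graph.
g = Graph()
g.add_node("laptop", {"brand": "acme", "color": "black"})
g.add_node("shop", {"city": "Oslo"})
g.add_edge("laptop", "soldBy", "shop")
```

Storing the pairs in a dictionary allows one value per attribute; a list of pairs would be the natural alternative if an attribute may carry several values.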
The generation steps of the multi-dimensional semantic expansion set are shown in Figure 1:
1) Based on the concepts/relations of the user's search requirements, construct an initial query graph on the graph model $G$.
2) Compare the concepts/relations of the search requirements with the existing concepts/relations of semantic information such as related ontologies, knowledge graphs, Wikipedia, etc. Then calculate the coarse-grained semantic similarity function $Sim_c(n_i, n_j)$ of the node vocabulary of the two. The node vocabulary mainly includes the node name and the vocabulary composed of attribute-value pairs.
$Sim_c(n_i, n_j)$ compares node names and attribute-value pairs as a whole, which is a coarse-grained comparison of node vocabulary similarity. It is calculated based on the cosine similarity formula, where nodes $n_i$ and $n_j$ respectively represent concept nodes related to the search requirements and concept nodes in the ontology, knowledge graph, Wikipedia, etc.; $0 < i \le n$, $0 < j \le m$, where $n$ and $m$ respectively represent the number of nodes to be compared in the search requirements and the number of nodes to be compared in the ontology, knowledge graph or Wikipedia. $w_k$ is the weight, $w_k \in [0,1]$, $\sum_{k=1}^{h} w_k = 1$; $x_k$ and $y_k$ respectively represent the word frequency of the $k$-th word for nodes $n_i$ and $n_j$; and $h$ is the total number of words.
3) If the coarse-grained semantic similarity $Sim_c(n_i, n_j) > t_1$ (threshold), then calculate the fine-grained semantic similarity function $Sim_f(n_i, n_j)$; otherwise go to step 2 to compare the similarity with the next node.
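In order to judge the similarity more accurately, we define the fine-grained node vocabulary similarity function $Sim_f(n_i, n_j)$ as follows:
$$Sim_f(n_i, n_j) = \alpha_1 Sim_{name}(n_i.name, n_j.name) + \alpha_2 Sim_A(A_i, A_j) + \alpha_3 Sim_V(V_i, V_j) + \alpha_4 Sim_{AV}((A_i, V_i), (A_j, V_j))$$
where $\alpha_k$ is the weight, $\alpha_k \in [0,1]$, $\sum_k \alpha_k = 1$. $Sim_{name}(n_i.name, n_j.name)$ is the node name similarity function, and $n_i.name$ and $n_j.name$ are the names of nodes $n_i$ and $n_j$ respectively. $Sim_A(A_i, A_j)$ is the attribute similarity function, and $A_i$ and $A_j$ are the attribute sets of nodes $n_i$ and $n_j$ respectively, such as $A_i = \{a_{i1}, a_{i2}, \ldots\}$ and $A_j = \{a_{j1}, a_{j2}, \ldots\}$. $Sim_V(V_i, V_j)$ is the value similarity function, and $V_i$ and $V_j$ are the values of $n_i$ and $n_j$ respectively, such as $V_i = \{v_{i1}, v_{i2}, \ldots\}$ and $V_j = \{v_{j1}, v_{j2}, \ldots\}$. $Sim_{AV}((A_i, V_i), (A_j, V_j))$ is the attribute-value pair similarity function, and $(A_i, V_i)$ and $(A_j, V_j)$ are the attribute-value pairs of $n_i$ and $n_j$ respectively, such as $\{(a_{i1}, v_{i1}), (a_{i2}, v_{i2}), \ldots\}$ and $\{(a_{j1}, v_{j1}), (a_{j2}, v_{j2}), \ldots\}$. To improve the efficiency of the similarity calculation, the coarse-grained similarity $Sim_c(n_i, n_j)$ is calculated first, and the fine-grained similarity $Sim_f(n_i, n_j)$ is calculated only if the threshold $t_1$ is exceeded.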
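The coarse-then-fine strategy can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formulas: the weighted cosine form, the uniform default weights, the use of Jaccard overlap for the attribute, value, and pair similarities, the dictionary layout of a node, and all function names are hypothetical choices.

```python
import math
from collections import Counter


def weighted_cosine(words_a, words_b, weights=None):
    """Assumed form: sum(w*x*y) / (sqrt(sum(w*x^2)) * sqrt(sum(w*y^2)))."""
    fa, fb = Counter(words_a), Counter(words_b)  # word frequencies
    vocab = sorted(set(fa) | set(fb))
    if not vocab:
        return 0.0
    if weights is None:
        # Uniform weights in [0, 1] that sum to 1 (an assumption).
        weights = {w: 1.0 / len(vocab) for w in vocab}
    dot = sum(weights[w] * fa[w] * fb[w] for w in vocab)
    na = math.sqrt(sum(weights[w] * fa[w] ** 2 for w in vocab))
    nb = math.sqrt(sum(weights[w] * fb[w] ** 2 for w in vocab))
    return dot / (na * nb) if na and nb else 0.0


def vocabulary(node):
    """Node vocabulary: the name plus the words of every attribute-value pair."""
    words = node["name"].lower().split()
    for attr, value in node["pairs"].items():
        words += attr.lower().split() + str(value).lower().split()
    return words


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def fine_similarity(n1, n2, alphas=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of name, attribute, value, and pair similarity."""
    parts = (
        weighted_cosine(n1["name"].lower().split(), n2["name"].lower().split()),
        jaccard(n1["pairs"].keys(), n2["pairs"].keys()),
        jaccard(n1["pairs"].values(), n2["pairs"].values()),
        jaccard(n1["pairs"].items(), n2["pairs"].items()),
    )
    return sum(a * p for a, p in zip(alphas, parts))


def two_stage_similarity(n1, n2, t1=0.5):
    """Return (coarse, fine); fine is None when coarse does not exceed t1."""
    coarse = weighted_cosine(vocabulary(n1), vocabulary(n2))
    if coarse <= t1:
        return coarse, None  # the fine-grained stage is skipped
    return coarse, fine_similarity(n1, n2)
```

The point of the two stages is that the inexpensive whole-vocabulary comparison filters out clearly unrelated nodes, so the four component similarities are only computed for promising candidates.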

[Figure 1. Flowchart of the generation steps of the multi-dimensional semantic expansion set.]

4) If the node name similarity $Sim_{name}(n_i.name, n_j.name) > t_2$, the similarity values are sorted in descending order, and $n_j.name$ is stored in the node name expansion set SET1 of $n_i$; if the attribute similarity $Sim_A(A_i, A_j) > t_3$, the similarity values are sorted in descending order, and $A_j$ is stored in the attribute expansion set SET2 of $n_i$; if the value similarity $Sim_V(V_i, V_j) > t_4$, the similarity values are sorted in descending order, and $V_j$ is stored in the value expansion set SET3 of $n_i$; if the attribute-value pair similarity $Sim_{AV}((A_i, V_i), (A_j, V_j)) > t_5$, the similarity values are sorted in descending order, and $(A_j, V_j)$ is stored in the attribute-value pair expansion set SET4 of $n_i$.
5) Calculate the edge similarity function $Sim_E(n_i, n_j)$. If $Sim_E(n_i, n_j) > t_6$, store the edges of $n_j$ in the edge expansion set SET5 of $n_i$. The edge similarity function $Sim_E(n_i, n_j)$ is calculated from the following quantities: $\beta$ is the weight; $|E_i|$ and $|E_j|$ respectively represent the number of edge connections of $n_i$ and $n_j$; $|E_i \cap E_j|$ represents the number of common edges of $n_i$ and $n_j$; and $|NB_i \cap NB_j|$ represents the number of common neighbor nodes of $n_i$ and $n_j$.
6) If $i = n$ and $j = m$, that is, all nodes have been compared, then return SET1-SET5; otherwise, go to step 2 and continue to compare the similarity of the next node.
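Steps 4 and 5 share one pattern: keep the candidates whose similarity exceeds a threshold, ordered from most to least similar. A minimal sketch of that pattern is given below. The function names are hypothetical, and the edge similarity shown is only one plausible way to combine common edges and common neighbours (a weighted mix of two overlap ratios); it is an assumption, not the paper's formula.

```python
def build_expansion_set(candidates, threshold):
    """Keep (element, similarity) candidates above the threshold, best first."""
    kept = [(elem, sim) for elem, sim in candidates if sim > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # descending similarity
    return [elem for elem, _ in kept]


def edge_similarity(edges_a, edges_b, nbrs_a, nbrs_b, w=0.5):
    """Assumed form: w * edge overlap + (1 - w) * neighbour overlap."""
    def overlap(x, y):
        x, y = set(x), set(y)
        return len(x & y) / len(x | y) if x | y else 0.0

    return w * overlap(edges_a, edges_b) + (1 - w) * overlap(nbrs_a, nbrs_b)


# Hypothetical candidates for the node name expansion set SET1, with t2 = 0.6.
set1 = build_expansion_set(
    [("notebook", 0.9), ("tablet", 0.4), ("ultrabook", 0.95)], threshold=0.6
)
```

Because each set is already sorted, a later stage can take a prefix of it when only the strongest expansions are wanted.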

Expansion graph construction and query processing
On the basis of the expansion sets, the semantic query expansion graph is constructed. The basic steps are shown in Figure 2:
1) If $\bigcup_{k=1}^{5} SET_k \neq \emptyset$, that is, if there is a non-empty expansion set, calculate the total semantic similarity function $Sim_T(n_i, n_j)$; otherwise, perform query processing on the initial query graph and return the query results.
$Sim_T(n_i, n_j)$ takes into account the lexical similarity of the concept nodes and the similarity of the nodes' connection edges (associations). It is calculated as a weighted combination of the two, where $\gamma_k$ is the weight, $\gamma_k \in [0,1]$, $\sum_k \gamma_k = 1$.

2) If the total semantic similarity $Sim_T(n_i, n_j) > t_7$, combine all elements of the non-empty $SET_k$ to generate an extended query graph on the initial query graph; otherwise, select some elements of the non-empty $SET_k$ for combination and generate an extended query graph on the initial query graph. Each extended query graph describes one of the user's possible query requirements.
The semantic query expansion graph is constructed as follows: the elements from the attribute-value pair expansion set SET4 (of the form $(a, v)$) in the combination are added to the attribute-value pair list of the node $n_i$. The elements from SET1, SET2, and SET3 in the combination are converted into attribute-value pairs of the form $(a, a.v)$ or $(v.a, v)$, where $a.v$ represents the value of the attribute $a$ and $v.a$ represents the attribute corresponding to the value $v$; these attribute-value pairs are then added to the attribute-value pair list of the node $n_i$. The elements from SET5 in the combination are added as edges between node $n_i$ and its neighbor nodes.
3) Conduct query processing on the extended query graph and return the query result.
In particular, the above steps involve the selection of the weights and the thresholds $t_1$ to $t_7$, which can be determined from previous practical experience or by other analysis methods, such as principal component analysis, factor analysis, regression analysis, and average analysis.
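The construction steps above can be sketched in a few lines. This is an illustrative sketch only: the two-term weighted form of the total similarity, the `max_size` cap on partial combinations, the tagging of elements as `("pair", ...)` or `("edge", ...)`, and the dictionary layout of a node are all assumptions. Elements of SET1 to SET3 are assumed to have been converted to attribute-value pairs beforehand.

```python
from itertools import combinations


def total_similarity(node_sim, edge_sim, gamma=0.5):
    """Assumed weighted combination of node vocabulary and edge similarity."""
    return gamma * node_sim + (1 - gamma) * edge_sim


def select_combinations(elements, total_sim, t7, max_size=2):
    """All elements at once if total_sim > t7, else subsets up to max_size."""
    elements = list(elements)
    if total_sim > t7:
        return [tuple(elements)]
    top = min(max_size, len(elements))
    return [c for r in range(1, top + 1) for c in combinations(elements, r)]


def expand_node(node, combo):
    """Return a copy of the node with the chosen expansion elements applied."""
    expanded = {
        "name": node["name"],
        "pairs": dict(node["pairs"]),
        "edges": set(node["edges"]),
    }
    for kind, payload in combo:
        if kind == "pair":    # (attribute, value), e.g. an element of SET4
            attr, value = payload
            expanded["pairs"][attr] = value
        elif kind == "edge":  # (label, neighbour), an element of SET5
            expanded["edges"].add(payload)
    return expanded


# Hypothetical example: one node and two expansion elements.
node = {"name": "laptop", "pairs": {"brand": "acme"}, "edges": set()}
elements = [("pair", ("color", "black")), ("edge", ("soldBy", "shop"))]
full = [expand_node(node, c) for c in select_combinations(elements, 0.9, t7=0.7)]
partial = [expand_node(node, c) for c in select_combinations(elements, 0.3, t7=0.7)]
```

Each expanded node corresponds to one candidate extended query graph, which matches the idea that every extended graph describes one possible query requirement of the user.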

Conclusion
This paper proposes a query expansion method for graph models that has dynamic and semantic characteristics. The method can dynamically optimize and extend the search requirements according to semantic information, improving the efficiency and effectiveness of semantic queries. Moreover, the method is suitable not only for our defined graph model but also for RDF (Resource Description Framework), knowledge graphs, and other similar graph models.