Graph-Based Node Finding in Big Complex Contextual Social Graphs

Graph pattern matching finds the subgraphs of a data graph that match a given pattern graph. In complex contextual social networks, where social contexts such as social relationships, social trust, and social positions act as constraints, users are often interested in the top-K matches of a specific node (denoted as the designated node) of a pattern graph rather than the entire set of graph matches. This inspires the conText-Aware Graph pattern-based top-K designated node matching (TAG-K) problem, which is NP-complete. Targeting this challenging problem, we propose a recurrent neural network (RNN)-based Monte Carlo Tree Search algorithm (RN-MCTS), which automatically balances exploring new possible matches and extending existing matches. The RNN encodes the subgraph and maps it to a policy that guides the MCTS. The experimental results demonstrate that our proposed algorithm outperforms the state-of-the-art methods in terms of both efficiency and effectiveness.


Introduction
Graph pattern matching (GPM) is widely used in many applications, such as computer vision [1], chemical structure [2], and social networks [3-6]. Formally, given a pattern graph Q and a data graph G, GPM computes the set M(Q, G) of matches of Q in G. Based on GPM, many applications focus on finding matches of a specific pattern (query) node rather than the entire set of graph matches, for example, expert recommendation [7, 8] and egocentric search [9]. This leads to the top-K designated node matching (topKP) [10] problem: given a designated node u_0 in Q, find the top-K matching nodes of u_0 in G, ranked by a quality function.
In social network applications, such as crowd-sourcing travel [11] and social network-based e-commerce [12], social queries often need to find topKP matches under the requirements of social contexts, i.e., the social positions, the social trust, and the social relationships, which have significant influence on collaboration and decision-making [13, 14]. For example, in expert recommendation, people are more willing to find experts who can build close relationships with team members. Example 1 discusses this scenario with a social network.
Example 1. A fraction of a collaboration network is given as graph G in Figure 1. Each node in G denotes a person, with attributes such as job title, e.g., project manager (PM), database developer (DB), programmer (PRG), and software tester (ST). Each edge indicates a supervision relationship; e.g., edge (A_1, D_1) indicates that A_1 supervised D_1. The number on each edge indicates the intimacy between the two persons. For simplicity, only the intimacy of the edges in G that correspond to edges of Q is given.
The discussion above leads to the conText-Aware Graph pattern-based top-K designated node matching (TAG-K) problem, which aims to find the topKP results under Multiconstrained Simulation (MCS) [12, 15]. In MCS, there are multiple end-to-end constraints on the attributes. MCS is therefore NP-complete, since finding the best solution by checking every combination of the different attributes takes exponential time. Consequently, TAG-K is NP-complete, as it subsumes the classical NP-complete multiconstrained path selection problem [12]. The existing topKP methods for the MCS problem [10, 16-19] need to enumerate all possible subgraphs that satisfy the constraints of social contexts, which is expensive in time. In addition, even though these methods can find TAG-K results, they need to be revised once the problem statement changes slightly (e.g., the values of social contexts). Therefore, the existing methods are not applicable to large-scale data graphs. In contrast, machine learning methods have the potential to be applicable across many optimization tasks by automatically discovering their own heuristics from the training data, and they can achieve stable performance without adjusting the model for different social contexts.
While supervised learning is used in most successful machine learning methods, it is not applicable to the TAG-K problem because the optimal labels are not available. However, we can compare the quality of subgraphs by using a verifier and provide reward feedback to a learning algorithm. Hence, we follow the idea of Monte Carlo Tree Search (MCTS), which is an effective way of solving NP-complete problems [12, 20]. The key idea of MCTS is to construct a search tree of states (i.e., possible subgraphs) evaluated by fast Monte Carlo simulations [21]. The value of a state is estimated as the mean outcome of the simulations. Meanwhile, a search tree is maintained to guide the direction of simulation, for which bandit algorithms can be employed to balance exploration and exploitation [22]. After a certain number of simulations, we can get TAG-K results based on the state values. This poses a major challenge: since the state space of TAG-K is large, MCTS needs to traverse most subgraphs to get accurate state values, which takes a lot of time. Inspired by the graph embedding technique, Hamilton and Ying [23] use neural networks to extract graph structural information and graph properties, which is an effective and efficient way to solve graph analytic problems. In this paper, we utilize the characteristics of experience (i.e., vector representations of found matches) to train an RNN structure, which can evaluate the "potential" of a node to be a TAG-K result without traversing it.
Our contributions are summarized as follows:
(i) We formulate TAG-K as an MCTS problem and design a neural network, trained on the subgraphs generated in the searching process, to evaluate nodes in G.
(ii) We propose the upper tree and the lower tree with optimization strategies to return TAG-K results without accessing all nodes in G.
(iii) Using real social network datasets, we experimentally verify that our algorithm outperforms the existing methods in both efficiency and effectiveness.

Isomorphism-Based topKP.
This type of topKP is based on subgraph isomorphism [24]. Tian and Patel [18] propose the concept of approximate subgraph matching, which allows node mismatches and node/edge insertions and deletions. To cope with the approximate subgraph matching problem, an index-based method called TALE is presented in [18]. In addition, Ding et al. [16] define the matching similarity between a data graph and a query graph to order results. Building on the NH-index in [18], Ding et al. [16] employ the index to prune unpromising candidate nodes for each query node. Furthermore, Zhu et al. [25] consider entire structure matching rather than substructure matching and propose an algorithm that answers the topKP similarity query using two distance lower bounds with different computational costs.

Simulation-Based topKP.
Existing isomorphism-based topKP methods are still too strict to be used in some applications, e.g., finding social experts [26] and project organization [10]. Based on graph simulation [27], Fan et al. [10] propose a novel topKP method supporting a designated pattern node u_0, which can find the topKP without computing the entire graph matching results. In addition, Chang et al. [28] study the problem of top-K tree pattern matching, where the edges in the tree are mapped to the shortest paths in G connecting the corresponding nodes, and propose an optimal enumeration paradigm [28]. However, the existing topKP methods do not consider the social contexts that are common in social network-based applications, like crowd-sourcing travel [11, 29, 30] and social network-based e-commerce [12]. Therefore, these methods cannot support the NP-complete TAG-K problem that exists in many real applications.
Recently, based on [10], Lie et al. [31] proposed a Monte Carlo-based approximation algorithm (MC-TAG-K) for TAG-K. It uses a random sampling method that cannot guarantee the algorithm's performance, and it has to traverse the data graph more than once, which is time-consuming when the data graph is large. Therefore, it is not a good solution to the TAG-K problem.

Preliminaries
In this section, we first introduce the data graph, pattern graph, and multiconstrained simulation (MCS) [12] and then propose the ranking function and TAG-K problem.

Data Graph.
A data graph is a contextual social graph (CSG) G = (V, E, L_G, ρ), where
(i) V is the set of nodes of G, and each node represents a participant of the CSG.
(ii) E is the set of edges of G, and a directed edge e ∈ E from node v to node v′ is a quad (v, v′, t, r), where v, v′ ∈ V, t ∈ [0, 1] denotes the social trust between v and v′, and r ∈ [0, 1] denotes the social intimacy degree between v and v′. The higher the values of t and r, the closer the trust and social intimacy between the two participants.
(iii) L_G is a function that assigns every node in V the social role of a specific domain.
(iv) ρ is a function that assigns every node in V a role impact factor, denoted as ρ(v) ∈ [0, 1], which illustrates the impact of the participant in a specific domain. The greater the value of ρ(v), the more professional knowledge the participant v has.
t, r, and ρ are called social impact factors. Based on the theories in Social Psychology [13], the aggregated social contexts of a path P(v_1, v_n) are denoted as A(P(v_1, v_n)) = (A_t(P(v_1, v_n)), A_r(P(v_1, v_n)), A_ρ(P(v_1, v_n))).
Figure 1(b) shows the structure of a data graph, where we can find the labels and the intimacy relationships between users.
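The definition above can be mirrored in a small data structure. The following is a minimal sketch only: the class name `ContextualSocialGraph`, its method names, and the numeric values in the usage lines are illustrative assumptions, not part of the original work.

```python
from dataclasses import dataclass, field


@dataclass
class ContextualSocialGraph:
    """Hypothetical container for a CSG G = (V, E, L_G, rho)."""
    roles: dict = field(default_factory=dict)   # L_G: node -> social role
    impact: dict = field(default_factory=dict)  # rho: node -> role impact factor
    edges: dict = field(default_factory=dict)   # (v, v') -> (t, r)

    def add_node(self, v, role, rho):
        if not 0.0 <= rho <= 1.0:
            raise ValueError("rho must lie in [0, 1]")
        self.roles[v] = role
        self.impact[v] = rho

    def add_edge(self, v, v2, t, r):
        if v not in self.roles or v2 not in self.roles:
            raise KeyError("both endpoints must be added before the edge")
        if not (0.0 <= t <= 1.0 and 0.0 <= r <= 1.0):
            raise ValueError("trust t and intimacy r must lie in [0, 1]")
        self.edges[(v, v2)] = (t, r)

    def successors(self, v):
        """Nodes reachable from v by one directed edge."""
        return [b for (a, b) in self.edges if a == v]


# Illustrative values only; they are not taken from Figure 1.
g = ContextualSocialGraph()
g.add_node("A1", "PM", 0.9)
g.add_node("D1", "DB", 0.7)
g.add_edge("A1", "D1", t=0.8, r=0.6)
```

Storing edges in a dictionary keyed by node pairs keeps the quad (v, v′, t, r) together and makes the range checks on t, r, and ρ explicit.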

Pattern Graph.
A pattern graph is a directed graph Q = (V_Q, E_Q, L_Q, λ_Q), where
(i) V_Q is the set of nodes of Q.
(ii) E_Q is the set of directed edges of Q.
(iii) L_Q is a function that assigns every node in Q a social role.
(iv) λ_Q is a function that assigns every edge (u, u′) a quad (l, λ_t, λ_r, λ_ρ), where l is the bounded length of (u, u′), which is a positive number, and λ_t, λ_r, and λ_ρ, each in the range [0, 1], denote the multiple constraints on the aggregated social contexts.
Example 3. Figure 1(a) shows a pattern graph that contains three nodes and the corresponding edges. On each edge, we can find one of the constraints on the social intimacy between nodes.

GPM-Based Multiconstrained Simulation (MCS).
where the following conditions apply:
It is known that if G matches Q, then there exists a unique maximum relation set M(Q, G) = {(u_1, v_1), (u_2, v_2), ..., (u_n, v_n)} including all pairs in S.

Ranking Function
to save all relevant paths from v to all matches v′, and then for any match v of a query node u_0, there exist a unique, maximum relevant node set and relevant edge set.
In node matching, we need to consider both the matched nodes and the corresponding pattern. Based on [31], the ranking function used to rank matches of the designated node u_0 is defined as follows: where α is used to balance the two functions; |R(u_0, v)| and |R_E(u_0, v)| are the numbers of nodes and edges in R(u_0, v) and R_E(u_0, v), respectively; and U(P_i) denotes the average social impact of P_i. In a complex social network, there are three social impact factors, i.e., trust, social intimacy, and social role impact, and they should all be considered in the modelling of node matching. Therefore, we propose the following equation: where w_t, w_r, and w_ρ are the weights of the social impact factors t, r, and ρ, respectively; w_t, w_r, and w_ρ are in the range [0, 1], and w_t + w_r + w_ρ = 1.
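To illustrate how three weights that sum to 1 can combine the social impact factors, the sketch below assumes a plain weighted sum of the aggregated trust, intimacy, and role impact values of a path. This functional form and the function name `weighted_social_impact` are assumptions made for illustration; the exact equation may differ.

```python
def weighted_social_impact(a_t, a_r, a_rho, w_t, w_r, w_rho):
    """Assumed weighted sum of aggregated trust, intimacy, and role impact."""
    weights = (w_t, w_r, w_rho)
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("each weight must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_t * a_t + w_r * a_r + w_rho * a_rho


# Illustrative inputs: 0.5*0.8 + 0.3*0.6 + 0.2*0.4
example_impact = weighted_social_impact(0.8, 0.6, 0.4, 0.5, 0.3, 0.2)
```

Because the weights are non-negative and sum to 1, the result stays in [0, 1] whenever the three aggregated values do.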

TAG-K Problem
Definition 1. Context-Aware Graph pattern-based top-K designated node finding problem (TAG-K). Given a CSG G, a pattern graph Q with a designated node u_0, and a positive integer K, TAG-K is to find a subset D = argmax_{D′} Σ_{v ∈ D′} F(u_0, v), where D′ is a subset of M_U(Q, G, u_0) and |D′| = K. That is, TAG-K is to identify a set of K matches of u_0 that maximizes the summation of the ranking function F(·). We denote C(u_0) as the set of all the candidates v in G of u_0 (i.e., v has the same label as u_0) and use F_L(u_0, v) and F_U(u_0, v) to denote the lower bound and upper bound of F(u_0, v), respectively. Based on Lemma 1, we can get TAG-K results without computing the entire M(Q, G).
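Since the objective in Definition 1 is a sum of per-node ranking values over a set of fixed size K, it is maximized by the K matches with the largest F values. A minimal sketch, assuming the ranking values are already known and stored in a dictionary (the name `top_k_matches` is hypothetical):

```python
import heapq


def top_k_matches(scores, k):
    """Return the k nodes with the largest ranking values, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


# Illustrative ranking values for three matches of u_0.
ranked = top_k_matches({"a": 0.2, "b": 0.9, "c": 0.5}, 2)
```

In the actual problem the values F(u_0, v) are expensive to compute, which is why the bounds F_L and F_U are used instead of exact scores.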

Recurrent Neural Network-Based Monte Carlo Tree Search (RN-MCTS)
In our algorithm, we build the upper tree and the lower tree of each candidate node, which are used to evaluate the upper bound F_U(·) and the lower bound F_L(·), respectively. In the searching process, F_U(·) and F_L(·) are updated, and F_L(·) gradually approaches F(·); hence, even in the worst case, RN-MCTS can still find TAG-K results. In addition, we use an RNN structure trained on the searching results to guide MCTS and propose some optimization strategies to speed up the searching process. In this section, we first introduce the tree structure of RN-MCTS, then introduce the RNN structure, and finally discuss the details of RN-MCTS combined with the RNN and the optimization strategies.

Objective Function.
Given a CSG G, a pattern graph Q, and a path P(v, v′) in G, we first propose the objective function to check whether P(v, v′) meets the constraints specified for the corresponding edge in Q. The objective function combines the aggregated multiple attribute values with the corresponding constraints. We can then inspect the value of the objective function to check the feasibility of the graph search: where u′ ∈ Q and v′ ∈ C(u′). From the objective function, we can see that if an edge

Monte Carlo Tree Structure (MCT)
Now we introduce the Monte Carlo Tree structure (MCT) of RN-MCTS. Each node in the tree represents a node in G, while the tree's edges correspond to edges of G. MCTS grows the tree structure iteratively; with each iteration, the MCT is traversed and expanded. In an MCT, a root node represents a candidate in C(u_0), and a child node represents a node connected with its parent by an edge in G. A leaf node represents either (1) a node whose paths exceed the constraints of social contexts or cannot be relevant paths or (2) a newly expanded node that has not been visited. Each MCT can be seen as the subgraph connected to a candidate node.
Each node v′ in the MCT stores the set of statistics {P(v, v′), R_ES(v, v′), R_S(v, v′), F_S(v, v′), N(v, v′)}. Here, the root node v of the MCT is a candidate, P(v, v′) is the path from v to v′ in the MCT, N(v, v′) is the number of times that v′ has been visited in the MCT, R_ES(v, v′) is a set that saves all relevant paths that start from P(v, v′) in the MCT, and R_S(v, v′) is a set that saves the corresponding relevant nodes, i.e., the end nodes of the visited relevant paths. F_S(v, v′) is defined as follows: where α is the same factor as in equation (1). In the searching process, if RN-MCTS finds a new relevant path P(v, v_j) passing through P(v, v′), P(v, v_j) and v_j will be added to R_ES(v, v′) and R_S(v, v′), respectively.
Given a candidate v of a designated node u_0 in Q, suppose all the relevant paths of R_E(u_0, v) have been visited in the MCT of v. Intuitively, F(u_0, v) can then be calculated as follows: Here, v_i is a child node of v in the MCT. That is, after a certain number of iterations, F(·) of a candidate can be approximated by the statistics of its child nodes. We thus formulate the TAG-K problem as a Monte Carlo Tree Search (MCTS) problem. Since F(·) of the root node is proportional to F_S(·) of its child nodes, the aim of MCTS is to find high values of F_S(·) in the MCTs of candidates under limited searching time.
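The per-node statistics described above map naturally onto a small record type. The sketch below is illustrative only: the class name `MCTNode`, its field names, and the helper methods are assumptions, and the computation of F_S itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class MCTNode:
    """Hypothetical record of the statistics kept at one MCT node v'."""
    graph_node: str                                    # node of G this tree node represents
    path: tuple                                        # P(v, v'): path from the root v
    relevant_paths: set = field(default_factory=set)   # R_ES(v, v')
    relevant_nodes: set = field(default_factory=set)   # R_S(v, v')
    f_s: float = 0.0                                   # F_S(v, v')
    visits: int = 0                                    # N(v, v')
    children: dict = field(default_factory=dict)

    def add_child(self, graph_node):
        """Attach a child reached by one edge of G and return it."""
        child = MCTNode(graph_node, self.path + (graph_node,))
        self.children[graph_node] = child
        return child

    def record_relevant_path(self, path):
        """Store a newly found relevant path and its end node."""
        self.relevant_paths.add(tuple(path))
        self.relevant_nodes.add(path[-1])


root = MCTNode("v", ("v",))
child = root.add_child("a")
child.record_relevant_path(("v", "a", "b"))
```

A new child starts with empty sets, F_S = 0, and N = 0, matching the initialization used in the expansion step.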

Recurrent Neural Network Policy.
Now, we introduce the RNN policy. In the searching process, the RNN is used to help RN-MCTS select child nodes. Our RNN structure is simple and consists of two layers: an RNN layer and a fully connected layer. In the MCT of v_1, for the visited path P(v_1, v_n) = 〈v_1, v_2, ..., v_n〉, we can get the sequence: where E(v) is the vector representation of v generated by DeepWalk [32]. For the path P(v_1, v_n), we set T(P(v_1, v_{n−1})) as the input and the node selection E(v_n) and F_S(v, v_n) as the output to train the RNN structure, and the loss function of a time step is defined as follows: Here, T′_i = {P′, R′_S, F′_S} is the prediction of the last time step T_{i−1}, and RMSE is the root mean squared error. Then, for the current searching path P, we define the outputs of the RNN structure (i.e., E(v_n) and F_S(v, v_n)) as E_net(P) and F_net(P), respectively.
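For reference, the root mean squared error used in the loss can be computed as below. This is the standard definition of RMSE over two equal-length vectors; how the individual RMSE terms are combined into the per-step loss is not shown here, and the function name `rmse` is only illustrative.

```python
import math


def rmse(predicted, target):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(predicted))


# Squared errors 9 and 16 average to 12.5.
example_error = rmse([0.0, 0.0], [3.0, 4.0])
```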

UCT Function.
An important problem of MCTS is balancing exploration against exploitation. Exploration promotes visiting unvisited nodes in the MCT, which means that it expands the tree's breadth more than its depth. Exploitation tends to stick to the path that has the greatest estimated value (i.e., the maximal value of F_S(·)) to find more relevant nodes. Balancing exploration and exploitation ensures that our algorithm does not overlook potential relevant paths while avoiding inefficient search over a large number of candidates.
Specifically, in each RN-MCTS iteration, the search process is rolled out from the root node by selecting child nodes according to the proposed variant of UCT [33]. During the search of a Monte Carlo Tree, we need to consider both the width and the depth of the tree search. Therefore, we propose equations (10) and (11): where c and β are the constants that control the level of exploration, P(v, v_f) is the current search path, and v_s is a child node of v_f. d(P(v, v_f), v_s) is the similarity between v_s and the prediction of the RNN, which is defined as follows: Overall, UCT initially prefers nodes that have high similarity with the RNN output E_net(·) and a low visit count N(·), but then asymptotically prefers nodes with high values of F_S(·).
In addition, in the upper trees, F_S(·) saved in the nodes is replaced by F^U_S(·), which is defined as follows: The upper bound F_U(u_0, v) is calculated by equation (7). Thus, compared with equation (6), because of the relaxation of constraints and the selection of the maximal U(·), F_U(u_0, v) > F(u_0, v). In the searching process of upper trees, the RNN structure is not used.
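The qualitative behaviour described for the UCT variant (prefer similar, rarely visited children first, then high-value children) can be sketched as follows. The score below is an assumed form in the spirit of classic UCT with a decaying prior term; it is not the paper's equations (10) and (11), and the names `uct_score` and `select_child` are hypothetical.

```python
import math


def uct_score(f_s, visits, parent_visits, similarity, c=0.5, beta=0.5):
    """Assumed score: value + visit-count bonus + decaying similarity bonus."""
    explore = c * math.sqrt(math.log(parent_visits + 1) / (visits + 1))
    prior = beta * similarity / (visits + 1)
    return f_s + explore + prior


def select_child(children, parent_visits, c=0.5, beta=0.5):
    """children maps a node name to (f_s, visits, similarity)."""
    return max(
        children,
        key=lambda n: uct_score(children[n][0], children[n][1],
                                parent_visits, children[n][2], c, beta),
    )


# Early in the search, the unvisited but similar child "b" is preferred.
early = select_child({"a": (0.9, 50, 0.1), "b": (0.1, 0, 0.9)}, parent_visits=50)
# After many visits, both bonus terms shrink and the high-value child "a" wins.
late = select_child({"a": (0.9, 1000, 0.1), "b": (0.1, 1000, 0.9)}, parent_visits=2000)
```

Both bonus terms are divided by a function of the visit count, so their influence fades as a child is visited more often and the stored value F_S dominates.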

Lower Tree.
In the lower tree, the pattern graph and the MCT are not changed, and F_L(u_0, v) is calculated by using equation (6). Hence, as the number of iterations increases, F_L(u_0, v) approaches F(u_0, v). Here, the RNN structure is used to guide the searching process.

Optimization Strategy.
In this section, we introduce three optimization strategies to reduce the searching space and speed up the searching process.
Strategy 1 (Rapid evaluate). Social graphs are typically large, with millions of nodes and billions of edges; hence, RN-MCTS first uses rapid evaluate to make preliminary estimates of the lower bounds and upper bounds of all candidates, as detailed below.
Given a designated node u_0 in Q and a candidate v in C(u_0), RN-MCTS builds the upper tree and the lower tree of v, performs one iteration on each of them, and then uses the searching results to calculate the initial values of the lower bound and upper bound. In the process of rapid evaluate, if a candidate v_c is a descendant in the MCT of v, v_c will be preferentially selected. RN-MCTS then calculates F_S(v, v_c) in the lower tree and F^U_S(v, v_c) in the upper tree, sets them as the initial values of F_L(u_0, v_c) and F_U(u_0, v_c), respectively, and puts v_c into the protection set PT. The nodes in PT will not be deleted when performing optimization strategies, and v_c will be taken out of PT the next time RN-MCTS searches the MCT of v_c. In addition, F_L(u_0, v_c) and F_U(u_0, v_c) will be updated when v_c is visited again while still in PT. Thus, RN-MCTS can get initial bounds of candidates without visiting all MCTs.

Strategy 2.
Optimization at dominating nodes. In the MCTs, there can be multiple paths ending at the same node. To obtain a near-optimal solution, the first time the searching path (denoted as P_1) reaches the node v_i, RN-MCTS marks v_i as visited and stores the value of δ(P_1); assume the parent node of v_i on P_1 is v_s. In the following iterations, suppose there is another path from the root node to v_i (denoted as P_y, with v_t as the parent node of v_i on P_y). If δ(P_1) > δ(P_y), then P_y is better than P_1, and we delete v_i as the child node of v_s; otherwise, if δ(P_1) < δ(P_y), then P_1 is better than P_y, and we delete v_i as the child node of v_t.
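The bookkeeping behind Strategy 2 can be sketched as follows: for each reached node, keep the parent and δ value of the best path seen so far, and report which parent should lose the duplicate child. The function name `update_dominating` and the dictionary layout are assumptions; ties are left untouched here because the text does not specify them.

```python
def update_dominating(best, node, parent, delta):
    """Track the lowest-delta path into `node`.

    `best` maps a node to (parent, delta) of its best known path.
    Returns the parent under which `node` should be deleted, or None.
    """
    if node not in best:
        best[node] = (parent, delta)   # first arrival: just remember it
        return None
    old_parent, old_delta = best[node]
    if delta < old_delta:
        best[node] = (parent, delta)   # new path is better: drop the old copy
        return old_parent
    if delta > old_delta:
        return parent                  # old path is better: drop the new copy
    return None                        # tie: keep both (unspecified in the text)


best_paths = {}
first = update_dominating(best_paths, "vi", "vs", 0.8)
second = update_dominating(best_paths, "vi", "vt", 0.5)
third = update_dominating(best_paths, "vi", "vx", 0.9)
```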
Strategy 3 (Reduce space). Given a designated node u_0 in Q and two candidates v_i, v_j ∈ C(u_0), after a certain number of iterations it is easy to know v_i may be a better designated node match than v_j. In order to reduce the search space, RN-MCTS performs optimizations in the following situations:
(i) Situation 1: indicates v_j has a lower probability of being among the top-K matches of u_0, then we delete v_j from C(u_0)
(ii) Situation 2: for a node v_j ∈ C(u_0), if all nodes in the upper tree of v_j have been visited and
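One common way to prune candidates with lower and upper bounds is sketched below. It assumes a standard bound-based rule: a candidate is dropped when its upper bound is below the K-th largest lower bound, unless it is in the protection set PT. This rule is an assumption for illustration and may not match the exact situations of Strategy 3; the name `prune_candidates` is hypothetical.

```python
import heapq


def prune_candidates(lower, upper, k, protected=frozenset()):
    """Return the candidates that can still be among the top-k.

    `lower` and `upper` map each candidate to its current F_L and F_U.
    """
    if len(lower) <= k:
        return set(lower)
    # K-th largest lower bound: k candidates are already at least this good.
    kth_lower = heapq.nlargest(k, lower.values())[-1]
    return {v for v in lower if upper[v] >= kth_lower or v in protected}


lower_bounds = {"a": 0.6, "b": 0.5, "c": 0.1}
upper_bounds = {"a": 0.9, "b": 0.8, "c": 0.4}
kept = prune_candidates(lower_bounds, upper_bounds, k=2)
kept_protected = prune_candidates(lower_bounds, upper_bounds, k=2, protected={"c"})
```

Here candidate "c" is removed because even its upper bound 0.4 cannot reach the second-best lower bound 0.5.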

Probability Function.
Before performing an iteration of MCTS, RN-MCTS uses the probability function to randomly select a candidate v_c, and the probability of v_c being selected is defined as follows: Here, we use a function similar to UCT, which combines exploitation with exploration, to define the weight of selection: and the constant θ controls the level of exploration.
Step 1 (Selection). In this step, given a candidate v ∈ C(u_0) as the root node of the MCT, MCTS traverses the current MCT from v using a tree policy. A tree policy uses an evaluation function (i.e., the UCT function) that prioritizes nodes with the greatest estimated values. When the searching path reaches a leaf node, if the leaf node is unvisited and has children yet to be added, MCTS transitions to the expansion step; otherwise, if the leaf node has been visited, MCTS transitions to the backpropagation step.
Step 2 (Expansion). In the expansion step, for the unvisited leaf node reached in the selection step, MCTS examines all neighbor nodes of the leaf node and adds every node with δ(·) ≤ 1 into the MCT as a child node of the leaf node; that is, the searching paths ending at these neighboring nodes are feasible. It then initializes the statistics {P(·), R_ES(·) = ∅, R_S(·) = ∅, F_S(·) = 0, N(·) = 0} of these child nodes. After that, MCTS transitions to the simulation step. If the δ(P) values of all neighboring nodes of v_c are more than 1, then v_c is a leaf node through which no relevant paths can be found, and MCTS transitions to the backpropagation step.
Step 3 (Simulation). Since the child nodes of the current node have been added to the MCT, MCTS can continue to traverse the MCT. Therefore, in this step, MCTS performs selection and expansion repeatedly until it reaches a visited leaf node and then transitions to the backpropagation step.
Step 4 (Backpropagation). Now that MCTS has reached a visited leaf node, the rest of the MCT must be updated. Starting from the leaf node, RN-MCTS traverses back to the root node. During the traversal, the statistics stored in each node of the traversal (i.e., {R_ES(·), R_S(·), F_S(·) or F^U_S(·), N(·)}) are updated: N(·) ← N(·) + 1, and R_ES(·), R_S(·), and F_S(·) or F^U_S(·) are updated as discussed before. In addition, for each node v′ in the traversal of backpropagation, we use T(v, v_c), E(v′), and F_S(v, v′) to train the RNN. Here, v_c is the parent node of v′ in the traversal.
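Weighted random selection of a candidate can be sketched as below. The weight used here (current lower bound plus a UCT-like bonus scaled by θ) is an assumed form chosen only to illustrate how exploitation and exploration can be combined and normalized into probabilities; the names `selection_probabilities` and `pick_candidate` are hypothetical.

```python
import math
import random


def selection_probabilities(lower_bounds, visits, theta=0.5):
    """Normalize assumed UCT-like weights into selection probabilities."""
    total = sum(visits.values())
    weights = {
        v: lower_bounds[v]
        + theta * math.sqrt(math.log(total + 1) / (visits[v] + 1))
        for v in lower_bounds
    }
    norm = sum(weights.values())
    if norm == 0:
        # No information yet: fall back to a uniform distribution.
        return {v: 1.0 / len(weights) for v in weights}
    return {v: w / norm for v, w in weights.items()}


def pick_candidate(probs, rng=random):
    """Draw one candidate according to its probability."""
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[v] for v in nodes], k=1)[0]


probs = selection_probabilities({"a": 0.5, "b": 0.5}, {"a": 10, "b": 0})
chosen = pick_candidate(probs, random.Random(0))
```

With equal lower bounds, the rarely selected candidate "b" receives the larger probability, which is the exploration effect controlled by θ.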
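The four steps above can be sketched as one pass over a dictionary-based tree. This is a simplified illustration, not the RN-MCTS procedure itself: it omits the δ(·) ≤ 1 feasibility test, the relevant-path statistics, and RNN training, and it bounds the path by a hypothetical `max_edges` parameter. The names `new_node` and `mcts_iteration` and the scoring callback are assumptions.

```python
def new_node(name):
    """Hypothetical tree-node record: visit count, expansion flag, children."""
    return {"name": name, "visits": 0, "expanded": False, "children": {}}


def mcts_iteration(root, graph, max_edges, score):
    """Run one selection/expansion/simulation/backpropagation pass."""
    path = [root]
    node = root
    while True:
        # Expansion: add neighbours not yet on the path while it may still grow.
        if not node["expanded"] and len(path) <= max_edges:
            on_path = {n["name"] for n in path}
            for nb in graph.get(node["name"], []):
                if nb not in on_path:
                    node["children"][nb] = new_node(nb)
            node["expanded"] = True
        if not node["children"]:
            break  # reached a leaf
        # Selection: follow the child with the highest score.
        node = max(node["children"].values(), key=lambda ch: score(ch, node))
        path.append(node)
    # Backpropagation: N <- N + 1 on every node of the traversed path.
    for n in path:
        n["visits"] += 1
    return [n["name"] for n in path]


toy_graph = {"v": ["a", "b"], "a": ["c"]}
tree_root = new_node("v")
least_visited = lambda child, parent: -child["visits"]
first_path = mcts_iteration(tree_root, toy_graph, 2, least_visited)
second_path = mcts_iteration(tree_root, toy_graph, 2, least_visited)
```

With the least-visited scoring rule, the second pass moves to the sibling branch, which is the exploration behaviour that the UCT function trades off against exploitation.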
RN-MCTS performs the above four steps for I iterations, so its time complexity is O(I × (selection + expansion + simulation + backpropagation)). Let N_c denote the average number of child nodes at each layer of the tree, and let D denote the depth of the search tree. The selection step has time complexity O(N_c), the expansion step has O(N_c), and both the simulation step and the backpropagation step have O(DN_c). Therefore, the time complexity of RN-MCTS is O(IDN_c). In addition, we can use a graph database to save the graph structure. Since not all pairs of nodes are linked in real-world social networks, we can also compress the sparse matrix of a data graph by using the Hybrid format [34] to save large graphs.
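As a worked check of the complexity bound stated above, and assuming D ≥ 1 so that the O(DN_c) terms dominate, the per-iteration and total costs sum as follows:

```latex
\begin{align*}
T_{\text{iter}} &= \underbrace{O(N_c)}_{\text{selection}}
                 + \underbrace{O(N_c)}_{\text{expansion}}
                 + \underbrace{O(D N_c)}_{\text{simulation}}
                 + \underbrace{O(D N_c)}_{\text{backpropagation}}
                 = O(D N_c), \\
T_{\text{total}} &= I \cdot T_{\text{iter}} = O(I D N_c).
\end{align*}
```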

Experiment Settings
(i) We conduct experiments on five large-scale real-world social graphs available at snap.stanford.edu. These datasets have been widely used in the literature for studies of graph pattern matching and social network analysis. The details of these datasets are shown in Table 1.
(ii) We use a popular social network generation tool, SocNetV (socnetv.org, version 2.2), to generate five query graphs. The details of these graphs are shown in Table 2, where we randomly select a node from each pattern graph as the designated node.
(iii) The average constraints of edges in the pattern (λ_t, λ_r, λ_ρ) are set to 0.1, 0.2, 0.3, 0.4, and 0.5 to ensure a high possibility of returning TAG-K designated nodes in a data graph. Otherwise, no or only a few answers might be returned by all the algorithms, making it difficult to investigate their performance.
(iv) In each of the datasets, α is set to 0.5; the number of iterations is set to 1000, 2000, 3000, 4000, and 5000; K is set to 5, 10, 15, 20, and 25; and the average maximal bounded path length of the pattern matching is set to 2, 4, 6, 8, and 10, respectively.
(v) In order to avoid bias in the tree search, we balance the search in both width and depth: in the UCT function and the probability function, the exploration parameters β, c, and θ are set to 0.5. In the RNN structure, μ is set to 0.5, and the node embeddings are encoded by DeepWalk [32] with 16 dimensions. We use the ADAM optimization algorithm with a learning rate of 0.005 during training, and we set the mini-batch size to 8.
(vi) In order to deliver fair experimental results and avoid bias from a specific setting, we calculate the average ranking function value to illustrate the performance of the proposed method.

Implementation.
In the following experiments, we compare our RN-MCTS with the basic Monte Carlo Tree Search algorithm without the RNN structure (B-MCTS) and the state-of-the-art TAG-K method, MC-TAG-K [35]. Since the three algorithms are approximate methods, it is not appropriate to directly compare the ranking function values they return. Therefore, we traverse the TAG-K results of the three algorithms and obtain their true ranking function values. Performance is evaluated by the execution time, the true ranking function values, and the difference between the true ranking function values and the approximate ranking function values.
MC-TAG-K, B-MCTS, and RN-MCTS are all implemented in MATLAB R2019a, running on a PC with an Intel Core i7-7700K 4.2 GHz CPU, 16 GB RAM, the Windows 10 operating system, and a MySQL 5.7 database. All the experimental results are averaged over five independent runs.

Exp-1: Effectiveness.
This experiment investigates the effectiveness of our RN-MCTS by comparing the average ranking function values of the top-K matches under different parameter settings. Table 3

Analysis.
The experimental results illustrate that (1) the approximate ranking function values returned by MC-TAG-K deviate greatly from the true ranking function values; (2) RN-MCTS and B-MCTS can deliver more accurate approximate ranking function values by using the neural network and MCTS; therefore, RN-MCTS and B-MCTS can get better matching results than MC-TAG-K; (3) with the assistance of the neural network and the optimization strategies, RN-MCTS can avoid searching the MCT via subgraphs that have a lower probability of satisfying the constraints and thus outperforms B-MCTS.

Exp-2: Efficiency.
This experiment investigates the efficiency of our RN-MCTS by comparing the average query processing time of MC-TAG-K, RN-MCTS, and B-MCTS under different parameter settings.
Results. Figures 4 and 5 depict the average query processing time of the three algorithms in returning different numbers of designated nodes with different parameter settings. From these figures, we can see that RN-MCTS and B-MCTS have better efficiency than MC-TAG-K. Statistically, on average, the query processing time of RN-MCTS is 75.30% less than that of MC-TAG-K and 17.30% more than that of B-MCTS.

Analysis.
The experimental results illustrate that (1) RN-MCTS can reduce the searching space and avoid visiting all nodes and edges in the data graph. Thus, RN-MCTS greatly reduces the query processing time compared with MC-TAG-K; (2) because of the RNN structure, each iteration of RN-MCTS costs more time than B-MCTS.

Data: CSG G = (V, E, L_G, ρ); query graph Q = (V_Q, E_Q, L_Q, λ_Q, u_0); candidate node set C(u_0); number of iterations I;
Perform rapid evaluate to build MCTs of C(u_0);
Use T(·), E(·), F_S(·) of rapid evaluate to train RNN;
for iteration it in [1 ... I] do
    Use probability function to select v from C(u_0); set the current node v_u = v;
    In the upper tree and lower tree of v:

Conclusion and Future Work
In this paper, we have proposed an approximate algorithm, RN-MCTS, to support a new type of context-aware graph pattern-based top-K designated node finding problem that is a cornerstone of many social network-based applications. RN-MCTS achieves a time cost of O(IDN_c), and the experiments conducted on five real-world large-scale social graphs have demonstrated the superiority of our proposed approach in terms of effectiveness and efficiency. In future work, we will extend our model to solve the dynamic node matching problem in complex social graphs.

Data Availability
The graph data used to support the findings of this study have been deposited in the Google Drive repository at https://drive.google.com/open?id=1N2-WlMtTR7aRmXw262LtCqxRW2G2VDems.

Conflicts of Interest
The authors declare that they have no conflicts of interest.