FP-GraphMiner – A Fast Frequent Pattern Mining Algorithm for Network Graphs

In recent years, graph representations have been used extensively for modelling complicated structural information, such as circuits, images, molecular structures, biological networks, weblogs, XML documents and so on. As a result, frequent subgraph mining has become an important subfield of graph mining. This paper presents a novel Frequent Pattern Graph Mining algorithm, FP-GraphMiner, that compactly represents a set of network graphs as a Frequent Pattern Graph (or FP-Graph). This graph can be used to efficiently mine frequent subgraphs, including maximal frequent subgraphs and maximum common subgraphs. The algorithm is space and time efficient, requiring just one scan of the graph database for the construction of the FP-Graph, and the search space is significantly reduced by clustering the subgraphs based on their frequency of occurrence. A series of experiments performed on sparse, dense and complete graph data sets, and a comparison with MARGIN, gSpan and FSMA using real time network data sets, confirm the efficiency of the proposed FP-GraphMiner algorithm.


Introduction
The increasing use of large communication, financial, telecommunication and social networks is providing a substantial source of problems to the graph data mining community. These differ from many traditional data mining problems in that the data records representing transactions between sets of entities are not considered independent, and the inter-transaction dependencies can be represented as trees, lattices, sequences and graphs. As a result, there has been an increasing interest in studying the properties, models and algorithms applicable to graph-structured data to address these issues.
Much current research in graph based data mining focuses on estimating the reputation and/or popularity of items in a network, mining query logs, performing query recommendations, web and social network applications, and so on. As such, the frequent subgraph discovery problem occupies a significant position among the various graph based data mining algorithms.
The problem of discovering frequent patterns can be stated as follows. Given a transaction database D consisting of a set of transactions $t_1, t_2, \ldots, t_n$ and a user-specified minimum support (or threshold) σ, the frequent pattern mining problem is to discover the complete set of patterns with a minimum support σ in D. The support σ for a given frequent pattern is defined as the ratio of the number of graphs containing the pattern to the total number of graphs. Depending on the specific problem formulation, the input transactions and pattern specification can be an itemset, a sequence, a tree, or a graph. Frequent subgraph mining is a demanding problem as there are an exponential number of subgraphs contained in a graph. For a graph with e edges, the number of possible frequent subgraphs could be as large as $2^e$. As the core operation of subgraph isomorphism testing is NP-complete, it is critical to minimise the number of subgraphs that need to be considered [9].
Although the ideas in the paper are widely applicable, in this paper we base our examples and experiments on the discovery of interesting frequent patterns from a large communication network. This has a wide range of applications such as network traffic analysis, detection of node failures in a network, routing algorithms, and so on. For example, to maximise the efficiency of a network, managing network traffic is essential. Once the most frequently used paths are identified, better routing algorithms can be devised. To achieve this, a communication network can be modelled as either an undirected or directed graph with the clients and servers (labelled by their IP addresses) as nodes and the communication channels between them as edges. Since the IP addresses in a network are unique, no two nodes have the same label.
The focus of this work is the frequent pattern mining of graphs to discover all frequent subgraphs contained in at least σ of the graphs in the database. The proposed Frequent Pattern Miner (FP-GraphMiner) algorithm efficiently calculates the frequent edges present in the various graphs, creating a special undirected graph called a Frequent Pattern-Graph (or FP-Graph). Once the graph is constructed, all frequent subgraphs can be determined for any given support. The experimental results validate the effectiveness of the proposed algorithm.
The rest of the paper is organised as follows. The remainder of this section presents the formal definitions and notations used. Section II discusses related work in the area of frequent subgraph mining while Section III describes the proposed FP-GraphMiner algorithm. Section IV provides correctness proofs of the technique while Section V provides a complexity analysis of the algorithms. Section VI deals with the empirical performance evaluation of the algorithm using synthetic datasets consisting of sparse, non-sparse (dense) and complete graph datasets, and a comparative study with MARGIN, gSpan and FSMA using real time data sets. Some conclusions are provided in Section VII. The algorithms themselves are given in an Appendix.

Definitions and Notations
The definitions and notations used in this paper are described below [3].

Labeled Graph
A labeled graph G is a 4-tuple, G = (V, E, α, β), where V is a finite set of vertices, E ⊆ V × V is a set of edges, α : V → L denotes a vertex labeling function and β : E → L denotes an edge labeling function. Edge (u, v) originates from node u and terminates at node v. For an undirected graph, (v, u) and (u, v) denote the same edge, thus β(u, v) = β(v, u).

Graph Isomorphism
Two graphs $G_1 = (V_1, E_1, \alpha_1, \beta_1)$ and $G_2 = (V_2, E_2, \alpha_2, \beta_2)$ are isomorphic if they are topologically identical to each other, that is, there is a mapping from $G_1$ to $G_2$ such that each edge in $E_1$ is mapped to a single edge in $E_2$ and vice versa. In the case of labeled graphs, this mapping must also preserve the labels on the vertices and edges.

Subgraph Isomorphism
Given two graphs $G_1$ and $G_2$, the problem of subgraph isomorphism is to find an isomorphism between $G_2$ and a subgraph of $G_1$, that is, to determine whether or not $G_2$ is included in $G_1$.

Frequent Subgraph
Given a labeled graph dataset $GD = \{G_1, G_2, \ldots, G_k\}$, the support or frequency of a subgraph g is the percentage (or number) of graphs in GD where g is a subgraph. A frequent subgraph is a graph whose support is no less than a minimum user-specified support threshold.
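The support definition above can be illustrated with a minimal sketch. It assumes the unique node labels described in the Introduction, under which testing whether g is a (non-induced) subgraph of a graph reduces to checking that every edge of g appears in that graph; edge labels are ignored. The names `norm`, `support` and the three-graph database are hypothetical and serve only as an illustration.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]

def norm(u: str, v: str) -> Edge:
    """Store an undirected edge with its endpoints in sorted order."""
    return (u, v) if u <= v else (v, u)

def support(pattern: FrozenSet[Edge], database: List[FrozenSet[Edge]]) -> float:
    """Fraction of graphs in the database whose edge set contains the pattern."""
    hits = sum(1 for graph in database if pattern <= graph)
    return hits / len(database)

# Hypothetical database of three graphs with uniquely labelled nodes.
GD = [
    frozenset({norm("a", "b"), norm("b", "c"), norm("c", "d")}),
    frozenset({norm("a", "b"), norm("b", "c")}),
    frozenset({norm("a", "b"), norm("d", "c")}),
]
g = frozenset({norm("a", "b"), norm("b", "c")})

min_support = 0.5
print(support(g, GD) >= min_support)  # g is frequent: it occurs in 2 of 3 graphs
```

Without unique labels the containment test would require a genuine subgraph isomorphism check, which is the NP-complete step the paper seeks to avoid.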

Related Work
Graphs serve as a promising means of generically modelling a variety of relations among data [4]. They can be used to effectively model the structural and relational characteristics of a variety of datasets arising in the physical sciences and chemistry (such as fluid dynamics, astronomy, structural mechanics and ecosystem modelling), the life sciences (such as genomics, proteomics and health informatics), and information security (such as information assurance, infrastructure protection and terrorist-threat prediction/identification). Much research has focused on finding patterns from a single large network [10], mining patterns using domain knowledge from bioinformatics [6], and finding frequent subgraphs [5,9,12]. A strong interdisciplinary research area in graph mining is the problem of finding frequent subgraphs present in huge graph databases. This has applications in different fields including network intrusion [11], semantic web [16,1], behavioural modelling [13] and link analysis [8,15].
A number of algorithms have used a depth-first search to enumerate candidate frequent subgraphs [22]. The gSpan algorithm builds a new lexicographic order among graphs, and maps each graph to a unique minimum DFS code as its canonical label. Based on this lexicographic order, gSpan adopts a depth-first search strategy to mine frequent connected subgraphs efficiently [21]. Other subgraph mining algorithms use a level-wise search scheme based on the Apriori property to enumerate the frequent subgraphs efficiently [7,14].
There are two common problems underpinning subgraph mining work such as this. First, the maximum common subgraph (or MCS) problem often provides a suite of benchmarking activities for assessing the performance of widely used algorithms. These include measuring the similarity between two graphs, finding maximum common edge subgraphs (MCES), and the McGregor, Durand and Pasari algorithms for determining the MCS of two given graphs [3]. Second, maximal frequent subgraph mining finds all frequent subgraphs $g_i$ such that no frequent subgraph $g_j$ exists where $g_i$ is a proper subgraph of $g_j$. A typical approach to the maximal frequent subgraph mining problem is to modify the Apriori based approach with additional pruning steps. An approach to finding the maximal frequent subgraphs from graph lattices has been discussed using the MARGIN algorithm [17,18]. It represents the search space as a graph lattice and mines the maximal frequent subgraphs while pruning the lattice space considerably. The ExpandCut algorithm recursively finds the candidate subgraphs. MARGIN explores a much smaller search space by visiting the lattice around the f-cut nodes.
The Frequent Subgraph Mining Algorithm (FSMA) finds all the subgraphs with a given minimum support in a given graph data set [20]. It uses the normalized incidence matrix to represent the subgraphs. By scanning the graph database, FSMA first finds all the frequent edges, termed 1-edge frequent subgraphs, which are then extended by adding frequent edges to get 2-edge frequent subgraphs. This procedure of subgraph extension is repeated until no more frequent subgraphs can be generated. The algorithm extends the frequent subgraphs by adding only the frequent edges instead of enumerating all subgraphs, which greatly reduces the time complexity.

The FP-GraphMiner Algorithm
Many currently proposed algorithms for mining frequently occurring patterns scan the graph database more than once during the mining process. Since in practice it is commonly disk I/O that most increases response times [2], for large graph databases, multiple scans can increase the time complexity substantially. The proposed study focuses on finding frequent subgraphs in a graph database containing a huge number of related graphs using a single database pass. The objective of this algorithm is to store the details of all frequent subgraphs in a single compact undirected graph by scanning the graph database once, and to mine all the frequent subgraphs with any support σ.
As discussed above, a communication network graph with unique node labels is considered for the study. A communication network can be characterised as a time series of graphs, with IP addresses (clients or servers) as nodes and the connections between them as edges. An edge-based array representation, which is more efficient than the vertex-based adjacency matrix representation, is used. The memory requirement of this representation is half that of the adjacency list format since it does not store an edge twice.
Each edge of the graph is represented as the 3-tuple ⟨S, D, EL⟩, where S is the source node, D is the destination node, and EL is the edge label. Each tuple is read into an Edge Array, EA, which is a collection of all the edges of the graph. For an undirected graph, the edge array has the tuples arranged in lexicographic order of source, destination and edge label. Since no edges are repeated (edges are distinct), the number of tuples in the edge array is the number of edges in the graph. The various definitions and notations used in the proposed algorithm are as follows.
Let $GD = \{G_1, G_2, \ldots, G_k\}$ be a graph database with k graphs. Each Distinct Edge, DE, is represented as DE = ⟨S, D, EL⟩.
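As a rough illustration of the edge array described above, the sketch below stores each undirected edge once as an (S, D, EL) tuple and keeps the tuples in lexicographic order. Placing the smaller endpoint first is an assumption made here so that (u, v) and (v, u) collapse to one tuple; the function name `edge_array` and the sample labels are hypothetical.

```python
from typing import Iterable, List, Tuple

EdgeTuple = Tuple[str, str, str]  # (S, D, EL): source, destination, edge label

def edge_array(raw_edges: Iterable[EdgeTuple]) -> List[EdgeTuple]:
    """Build a sorted, duplicate-free edge array for an undirected graph."""
    # Put the smaller endpoint first so (u, v) and (v, u) become the same tuple.
    canonical = {(min(s, d), max(s, d), el) for s, d, el in raw_edges}
    # Tuples compare lexicographically: source, then destination, then label.
    return sorted(canonical)

EA = edge_array([("b", "a", "x"), ("a", "c", "y"), ("a", "b", "x")])
print(EA)  # [('a', 'b', 'x'), ('a', 'c', 'y')]
```

The length of the returned list equals the number of edges in the graph, as the text states.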

BitCode of a Distinct Edge
Let m be the number of distinct edges of the k graphs. The BitCode of a distinct edge $DE_i$, denoted $BitCode(DE_i)$, $1 \le i \le m$, is a k-length bit string, each bit corresponding to a graph in GD, consisting of 1's in the positions of the graphs in which the edge is present and 0's where it is absent. The BitCode gives information about the graphs in which the distinct edge is present.

Weight of a BitCode
The weight of a BitCode of an edge $DE_i$, denoted $WT(DE_i)$, is the count of 1's in it (i.e. the number of graphs in which the edge appears).
Since the weights of all edges in a given Node or a given Cluster are the same (see below), the term weight can also be applied to Nodes and Clusters.
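The BitCode and weight definitions can be made concrete with a small sketch. It represents each graph as a set of two-letter edge names and the BitCode as a string of '0' and '1' characters; both choices, the helper names `bitcode` and `weight`, and the five-graph database are assumptions for illustration only.

```python
from typing import List, Set

def bitcode(edge: str, graphs: List[Set[str]]) -> str:
    """k-length bit string with a 1 at each position whose graph contains the edge."""
    return "".join("1" if edge in g else "0" for g in graphs)

def weight(code: str) -> int:
    """Number of 1's in the BitCode, i.e. the number of graphs containing the edge."""
    return code.count("1")

# Hypothetical database of k = 5 graphs; edge "de" is absent from the fourth graph.
GD = [{"ab", "de"}, {"ab", "de"}, {"ab", "de"}, {"ab"}, {"ab", "de"}]

print(bitcode("ab", GD), weight(bitcode("ab", GD)))  # 11111 5
print(bitcode("de", GD), weight(bitcode("de", GD)))  # 11101 4
```

Dividing the weight by k gives the support of the edge, so a weight of 4 out of 5 corresponds to 80% support.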
An FP-Graph has the following properties.
1. Each Node in the FP-Graph is a collection of subgraphs with the same BitCode (common features). The maximum number of Nodes in an FP-Graph of k graphs is $2^k - 1$.
2. Each Edge(U, V) originates from Node U and terminates at Node V, with the decimal equivalent of the BitCode of Node V as its edge label, where Node V is the immediate superset of Node U, i.e., WT(Node U) < WT(Node V). The FP-Graph construction algorithm outlined in Section 3.1.1 shows how the Nodes are linked.
3. The Nodes with the same BitCode weights are grouped into Clusters. Each Cluster is identified by its unique weight. The maximum number of Clusters in the FP-Graph is k. To summarize, each Node contains the subgraphs with the same BitCode and each Cluster contains the Nodes with the same BitCode weight.
4. The HeaderNode is an empty Node pointing to the Nodes in the Cluster with maximum weight (highest support).
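The Node and Cluster properties listed above amount to two grouping steps, sketched below under stated assumptions. The frequency table is only a partial one: it contains the edges and BitCodes that the text quotes for the running example, and the entry for edge hi is inferred from the remark that node i communicates with node h in graph 1 only. The helper names `build_nodes` and `build_clusters` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

def build_nodes(frequency_table: Dict[str, str]) -> Dict[str, List[str]]:
    """Group distinct edges by BitCode; each group is one Node."""
    nodes = defaultdict(list)
    for edge, code in frequency_table.items():
        nodes[code].append(edge)
    return dict(nodes)

def build_clusters(nodes: Dict[str, List[str]]) -> Dict[int, List[str]]:
    """Group Node BitCodes by weight (number of 1's); each group is one Cluster."""
    clusters = defaultdict(list)
    for code in nodes:
        clusters[code.count("1")].append(code)
    return dict(clusters)

# Partial frequency table (assumption: only entries recoverable from the text).
FT = {
    "ab": "11111", "ac": "11111", "bc": "11111", "bd": "11111", "df": "11111",
    "de": "11101", "ef": "11101", "eg": "11101", "fg": "11101",
    "hi": "10000",
}
nodes = build_nodes(FT)
clusters = build_clusters(nodes)
header_targets = clusters[max(clusters)]  # Nodes the HeaderNode points to

print(clusters)        # {5: ['11111'], 4: ['11101'], 1: ['10000']}
print(header_targets)  # ['11111']
```

Keying Nodes by BitCode and Clusters by weight mirrors properties 1 and 3; the superset links of property 2 are not built here.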

DFS Walk in Frequent Pattern Graph
A DFS walk in an FP-Graph is defined as a walk (search) starting from a Node U in a Cluster with a given support σ to the HeaderNode, with no backtracking, through a sequence of Nodes $U_1, U_2, \ldots, U_k$ such that $U = U_1$ and HeaderNode $= U_k$, where all $U_i$ are Nodes in the path satisfying the condition $WT(BitCode(U_1)) < WT(BitCode(U_2)) < \ldots < WT(BitCode(U_i)) < \ldots < WT(BitCode(U_k))$. The DFS walk from each Node in the Cluster to the HeaderNode yields all the subgraphs with the given support. Both the parameters used in the proposed algorithm and the algorithms themselves are given in the appendix.

The Algorithm
To illustrate the construction of an FP-Graph, consider a network communication database GD, as shown in Figure 1, consisting of a time series of graphs obtained by measuring the state of connectivity of the network at regular time intervals. Since the edge labels are not significant in the process, they are not considered further. From the edge arrays the Frequency Table FT is constructed as shown in Figure 2.
Each row in the frequency table is a distinct edge obtained by performing a UNION operation on the edge arrays $EA(G_1), EA(G_2), \ldots, EA(G_5)$. The rows are then sorted in descending order of their BitCodes, and the FP-Graph is constructed from this frequency table. The distinct edges with the same BitCode are grouped into a Node in the FP-Graph. For instance, the distinct edges ab, ac, bc, bd, df form a Node with BitCode 11111. The solid and the dotted rectangles in Figure 2 show the Nodes and Clusters respectively. The Nodes in the various Clusters are linked to form the FP-Graph as shown in Figure 3. The graphs in which these edges are present are also listed for ease of understanding. The links of a Node to various other Nodes are established by finding its superset Nodes. Any Node in the FP-Graph can have one or more superset Nodes. This graph can now be mined for various tasks such as finding frequent subgraphs, maximum common subgraphs, maximal frequent subgraphs, the graphs containing a given query graph and its support, and so on.
The objective of the FP-GraphMiner algorithm is as follows. Given a support σ, all the frequent subgraphs with at least that support can be determined efficiently from the FP-Graph by performing DFS walks starting from each Node in the Cluster with the given support σ to the HeaderNode. The collection of all the edges of the Nodes in the DFS path constitutes a frequent subgraph. The number of DFS walks from a Cluster with the given support is the number of frequent subgraphs. If the input graphs are highly dissimilar, the resulting frequent subgraphs may not be connected. By clustering all the Nodes with the same support within the same Cluster, the time taken to perform the search is significantly reduced.
In the case of a communication network, analyzing the frequent subgraphs with various support values provides information about how efficiently the network is utilized. Conversely, the Nodes with lower support values correspond to communication paths that are used less frequently. This knowledge facilitates improvement in the performance of the overall network by utilising channels efficiently and by devising more effective routing algorithms. Thus, the proposed algorithm serves both as an efficient tool for communication network analysis and for the detection of failed nodes.
Finding all frequent subgraphs with a given support σ.
The FP-GraphMiner algorithm (see Appendix A) performs DFS walks starting from the Cluster having the specified support to the HeaderNode to obtain all frequent subgraphs. For instance, the frequent subgraphs with 60% support are shown in Figure 4. The frequent subgraphs extracted from the FP-Graph for a given support need not be induced. Preserving the induced nature of frequent subgraphs obtained by the above algorithm is application dependent. The induced frequent subgraph with 100% support is the maximum common subgraph. In a communication network, the frequent subgraphs representing communication paths for a given support need not be induced. On the other hand, if the problem is to find the sub-network resulting after node failures, then induced subgraph mining is essential.
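The DFS walks described above can be sketched as follows. This is an illustrative reading of the procedure, not the Appendix algorithm: the FP-Graph is assumed to be given as a map from BitCode to edges (`nodes`) and a map from BitCode to the BitCodes of its linked superset Nodes (`links`), and a Node with no superset links is taken to be adjacent to the HeaderNode. The three-graph FP-Graph is hypothetical and is not the paper's Figure 3.

```python
from typing import Dict, Iterator, List

def dfs_walks(start: str, links: Dict[str, List[str]]) -> Iterator[List[str]]:
    """Yield each upward path of BitCodes from start to a Node below the HeaderNode."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        supersets = links.get(path[-1], [])
        if not supersets:  # no superset left: the HeaderNode is reached next
            yield path
        for nxt in supersets:
            stack.append(path + [nxt])

def frequent_subgraphs(weight: int,
                       nodes: Dict[str, List[str]],
                       links: Dict[str, List[str]]) -> List[List[str]]:
    """Edge set of every walk that starts in the Cluster with the given weight."""
    found = []
    for code in nodes:
        if code.count("1") == weight:
            for path in dfs_walks(code, links):
                found.append(sorted(e for c in path for e in nodes[c]))
    return sorted(found)

# Hypothetical FP-Graph over k = 3 graphs.
nodes = {"111": ["ab"], "110": ["bc"], "101": ["cd"], "100": ["de"]}
links = {"100": ["110", "101"], "110": ["111"], "101": ["111"], "111": []}

for subgraph in frequent_subgraphs(1, nodes, links):
    print(subgraph)
```

Here the Node with BitCode 100 has two superset links, so two walks and hence two frequent subgraphs are produced, which is the behaviour the paper describes for a Node with several supersets.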
Finding graphs in GD containing the Query Graph Q.
Given a query graph Q, the graphs in GD containing it can be easily identified by performing a Breadth First Search (BFS) starting from the HeaderNode until all the edges of Q are obtained. The BitCodes of these Nodes are collected into an array BFS(Q). An AND operation on the BitCodes in BFS(Q) gives a BitCode, and the positions of the 1's in the resulting BitCode show the graphs containing Q. An example query graph Q is shown in Figure 5(a).
The FP-Graph could also be mined efficiently for detecting outliers. For instance, node i in graph 1 has only a single data transfer with node h (with frequency of utilization (support) = 1/5). If this is a server failure, necessary action could be taken to identify and remedy the problem.
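The final step of the query procedure (AND the collected BitCodes, then read off the positions of the 1's) can be sketched in a few lines. The BitCodes used are the ones the paper quotes for BFS(Q) in its Figure 5 example; the function name `query_graph_code` and the string representation of BitCodes are assumptions, and the BFS that collects the BitCodes is not shown.

```python
from functools import reduce
from typing import List

def query_graph_code(bitcodes: List[str]) -> str:
    """Bitwise AND of equal-length BitCode strings, returned as a bit string."""
    k = len(bitcodes[0])
    value = reduce(lambda a, b: a & b, (int(code, 2) for code in bitcodes))
    return format(value, f"0{k}b")

# BitCodes quoted for BFS(Q) in the paper's query example.
BFS_Q = ["11111", "11101", "11011", "11000", "10001"]

code = query_graph_code(BFS_Q)
containing = [i + 1 for i, bit in enumerate(code) if bit == "1"]
support = code.count("1") / len(code)

print(code)        # 10000
print(containing)  # [1]
print(support)     # 0.2
```

The result agrees with the paper's example, in which Q is present only in Graph 1 and has 20% support.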
The MARGIN and FSMA algorithms scan the graph database more than once, following incremental edge-growing methods, while finding the maximal frequent subgraphs and all the frequent subgraphs respectively. The FP-GraphMiner algorithm scans the graph database only once to construct the FP-Graph. This FP-Graph represents all frequent subgraphs in a single data structure. The frequent subgraphs with any given support can be mined simply by performing DFS walks in the FP-Graph. The maximum number of Clusters scanned during each DFS walk is the number of Clusters in the FP-Graph, which is at most k, so the cost of each walk is bounded by the number of graphs rather than by the size of the database.

Proofs of Correctness
In this section, the correctness of the proposed algorithm is shown. First, the essential claim that the maximum number of different weights of the distinct edges of k network graphs is k is proved.
Claim 1 For a graph database GD with k network graphs, the maximum number of different weights of the distinct edges $DE_i$, $1 \le i \le m$, is k.
Proof: Each distinct edge $DE_i$ has a BitCode, and the number of 1's in the BitCode is termed its weight. For k graphs, each BitCode has k bits. As every distinct edge occurs in at least one graph, the weight of a k-length BitCode can range from 1 to k, giving at most k different weights.

Claim 2
The maximum numbers of Clusters and Nodes in an FP-Graph of k graphs in a graph database GD are, in the worst case, k and $2^k - 1$ respectively.
Proof: From Claim 1, it follows that k different weights of BitCodes are possible with k graphs and hence, by the definition of Cluster formation, there is a maximum of k Clusters. Each Node in an FP-Graph has distinct edges with the same BitCode. The length of the BitCode of each edge is k. As k bits are used for representing one Node, in the worst case $2^k - 1$ distinct combinations are possible, excluding the all-zero BitCode. Hence, there can be a maximum of $2^k - 1$ Nodes. For instance, given k = 3, all possible $2^k - 1$ combinations of BitCodes of the distinct edges are {001, 010, 100, 011, 110, 101, 111}. Grouping these codes based on weights yields only 3 groups: {001, 010, 100}, {011, 110, 101}, {111}.
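The counting argument in Claim 2 can be checked by brute force for small k. The sketch below enumerates every non-zero k-bit BitCode and groups the codes by weight; the function names are hypothetical and the enumeration is only practical for small k.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List

def all_bitcodes(k: int) -> List[str]:
    """Every k-bit string except the all-zero one (an edge occurs in some graph)."""
    return ["".join(bits) for bits in product("01", repeat=k) if "1" in bits]

def group_by_weight(codes: List[str]) -> Dict[int, List[str]]:
    """Group BitCodes by their number of 1's, one group per possible Cluster."""
    groups = defaultdict(list)
    for code in codes:
        groups[code.count("1")].append(code)
    return dict(groups)

codes = all_bitcodes(3)
groups = group_by_weight(codes)

print(len(codes))  # 7, i.e. 2**3 - 1 possible Nodes
print(groups)      # {1: ['001', '010', '100'], 2: ['011', '101', '110'], 3: ['111']}
```

For k = 3 this reproduces the seven BitCodes and three weight groups given in the proof.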
Claim 3 A DFS walk starting from the Nodes in the Cluster with a given support σ to the HeaderNode results in all the frequent subgraphs with support σ.
Proof: The FP-Graph is a collection of all distinct edges of the k graphs arranged in order of their frequencies into various Clusters. Performing a DFS walk from any Node to the HeaderNode gives a frequent subgraph of Nodes with at least the given support. The DFS walks starting from each Node in the Cluster with the given support σ to the HeaderNode result in all the frequent subgraphs with support σ, $1 \le \sigma \le 100$.

Claim 4
The number of frequent subgraphs obtained by DFS walks from any Node with a given support depends on its links with its superset Nodes.
Proof: There are a number of DFS walks starting from a Node and proceeding to the HeaderNode via superset Nodes. This produces a number of frequent subgraphs. For instance, in Figure 3, the Node containing the subgraph ad has two links to superset Nodes. Hence, the DFS traversals follow two different paths from the Node containing the subgraph ad to the HeaderNode, and the two frequent subgraphs with 60% support are obtained as shown in Figure 6.

Computational Complexity Analysis
In this section we provide an analysis of the time complexity of FP-Graph construction and the FP-GraphMiner algorithm.

FP-Graph Construction
The construction of FP-Graph includes constructing a frequency table and systematically linking the Nodes in various Clusters.

Constructing Frequency Table (F T ).
All the edges in the Edge Arrays of the k graphs are scanned once to construct FT. Let the total number of edges in the k graphs be $N_E$ and the number of Distinct Edges be m. All distinct edges, along with their BitCodes, are stored in FT in descending order of BitCode. The time required for arranging the rows in FT is $m \log m$ using a hash-based implementation. The total time complexity for constructing FT is $O(N_E + m \log m)$.
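The two cost terms above can be seen in a minimal sketch of the frequency table construction, given here under stated assumptions rather than as the authors' implementation. A dictionary (hash table) is filled in one pass over all $N_E$ edges, and the m resulting rows are then sorted by BitCode in descending order. The function name, the tie-break on edge name and the three edge arrays are hypothetical.

```python
from typing import Dict, List, Tuple

def frequency_table(edge_arrays: List[List[str]]) -> List[Tuple[str, str]]:
    """One scan over the edge arrays, then a sort of the distinct edges by BitCode."""
    k = len(edge_arrays)
    bits: Dict[str, List[str]] = {}
    for i, edge_array in enumerate(edge_arrays):       # single pass: N_E steps
        for edge in edge_array:
            bits.setdefault(edge, ["0"] * k)[i] = "1"  # set bit i for this edge
    rows = [(edge, "".join(b)) for edge, b in bits.items()]
    # Descending BitCode value; ties broken by edge name: m log m comparisons.
    rows.sort(key=lambda row: (-int(row[1], 2), row[0]))
    return rows

# Hypothetical edge arrays for k = 3 graphs.
EAs = [["ab", "bc"], ["ab", "cd"], ["ab", "bc"]]
FT = frequency_table(EAs)
print(FT)  # [('ab', '111'), ('bc', '101'), ('cd', '010')]
```

Because equal BitCodes end up adjacent after the sort, grouping the rows into Nodes needs only one further pass over the m rows.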
Each group of edges with the same BitCode comprises one Node in the FP-Graph. As the edges are stored in decreasing order of their BitCodes, the number of comparisons needed to group the edges into Nodes is m. Let the number of Nodes be N ($N \le 2^k - 1$ from Claim 2). To link any Node i, $1 \le i \le N$, to its immediate superset Node(s), the BitCodes of the Nodes from Node $i-1$ to Node 1 are compared by performing $i-1$ AND operations. The Nodes within the same Cluster need not be compared. The number of comparisons needed in the worst case to link the Nodes is therefore at most $\sum_{i=1}^{N}(i-1) = N(N-1)/2$. Thus the complexity of linking the Nodes is $O(N^2)$.

FP-GraphMiner
The FP-Graph is mined to obtain all frequent subgraphs with the given support.
Finding all frequent subgraphs for a given support.
Given a specific support, the maximum number of Nodes visited to find a frequent subgraph using a DFS walk in the FP-Graph is $2^k - 1$. So, in the worst case, to find all frequent subgraphs with a given support, the number of Nodes visited is the number of Clusters × the number of links of the Node with its superset Node(s).
Finding graphs containing a given query graph Q.
Given a query graph Q, the number of Nodes visited to find the edges of Q by performing BFS is N. The time needed for the AND operation on the BitCodes of the resulting Nodes to find the graphs in which Q is present is m. Thus the complexity is $O(Nm)$.
Finding significant nodes in the network.
To calculate the utilization of the various nodes of the network, some statistical measure is needed to find the contribution of each. The consistency of each node can be measured using a statistical weighted average, which takes into account the proportional relevance of each component.
Let the maximum degree of each node in the graph database be D and $\sigma(node_i)$ be the support of the cluster containing $node_i$, $1 \le i \le D$. In the given example, the number of nodes in GD is 9. As each node can be connected to a maximum of 8 other nodes, D = 8. From the FP-Graph given in Figure 3, the weighted average (WA) of each node is computed by considering the nodes in all the clusters. The degrees of the nodes present in the subgraphs of the FP-Graph with various support values are listed along with the weighted averages in Table 1.
After finding the weighted averages, the nodes can be ranked based on their frequency of usage. The node with the highest WA is the most frequently used node. In our example, it might be found that server d has been used more than the others, and thus the failure of server d would affect the functioning of the network.

Experimental Analysis
The experiments were conducted on a 2.8 GHz Intel Pentium Dual Core machine with 504MB RAM using Microsoft Windows XP, with the algorithms coded in C. A synthetic graph database GD consisting of sparse, non-sparse and complete graphs was generated to analyse the behaviour of the FP-GraphMiner algorithm and to investigate the time complexities. The time taken for detecting all frequent subgraphs in GD with k = 100, 500 and 1000 is shown in Table 2. Synthetic data sets containing a maximum of 100, 500, and 1000 nodes were analysed for support values of 25%, 50%, 75% and 100%. The experimental study shows that the time taken to detect all frequent subgraphs is inversely proportional to the support of the frequent subgraphs. If the graphs in the database are more related, the time taken to detect the frequent subgraphs is lower. On the other hand, when the support of a subgraph is lower, the DFS path from the HeaderNode to the last edge of the subgraph is longer.
A comparative study of the FP-GraphMiner algorithm was conducted against the MARGIN [17,18], gSpan [22] and FSMA [20] algorithms relative to their performance on real time network data from a large enterprise network, using a dataset created through the Wireshark network monitoring tool. The network data uses static IP addresses, which are unique, and hence was suitable for our experiments. Each network graph was generated by taking the aggregate at an interval of ten minutes. Six data sets were collected with 1,000, 2,000, 3,000, 4,000, 5,000 and 10,000 time series graphs. In these experiments, the parameter settings were minimum support values of 2%, 5%, 10%, 50%, 70% and 100%, an average number of edges $|E_{avg}| = 50$ and an average number of nodes $|V_{avg}| = 50$. Figure 7 shows the run time comparison of FP-GraphMiner with the MARGIN, gSpan and FSMA algorithms under different support values for each data set.
From this analysis, it is clear that the FP-GraphMiner algorithm performs well in comparison to MARGIN, gSpan and FSMA. As the support increases, the FP-GraphMiner algorithm is relatively more efficient. This is because all the frequent subgraphs are obtained by simple DFS walks from the Nodes in the Cluster having the given support to the HeaderNode. On the other hand, FSMA extends the subgraph directly by adding one frequent edge at a time, which requires more computational time than the proposed algorithm.
The MARGIN algorithm recursively invokes its ExpandCut procedure on each newly found cut, which can be a time consuming process. FP-GraphMiner requires only a graph traversal without backtracking to find all the frequent subgraphs with any given support. The experimental results show that FP-GraphMiner is approximately 4 times faster than MARGIN and 1.5 times faster than FSMA. It can be observed that, as the size of the data set increases, the running time increases slowly compared to MARGIN and FSMA, because the distinct edges of all the graphs are stored only once in the FP-Graph, which reduces the access time.

Conclusions and Future Work
The FP-GraphMiner algorithm constructs an FP-Graph that stores the distinct edges in all graphs only once, thereby conserving memory space without loss of information. This graph can be efficiently mined to obtain all frequent subgraphs with a given support. If the first cluster is a frequent subgraph with 100% support, it can easily be converted into the maximum common subgraph of GD by making it induced. No other common subgraph has more communication paths or nodes than the maximum common subgraph.
The algorithm could be extended to other kinds of network to support decision making. The FP-Graph constructed could also be used for other mining tasks such as graph indexing, graph classification (for example selecting discriminating features, transforming graphs into feature representations, and learning classification models), and so on.
The BitCode concept also lends itself to identifying temporal cliques and alternatives. For example, a sequence of BitCodes for an edge might indicate that one edge was only used when another was not, or that the use of one edge was dependent on another edge being used at the same time. Timestamping the graphs might also provide useful information regarding the use of a communications network at specific points in the day. These are areas for future research.

Figure 1: A Communication Network Graph Database GD with 5 graphs

Figure 3: FP-Graph of the graphs in GD

Figure 4: All Frequent Subgraphs with support 60%

For the query graph Q shown in Figure 5(a), the algorithm performs a BFS starting from the HeaderNode to the Node containing the last edge. The edges found as a result of the BFS on the FP-Graph to find the edges of Q are shown in Figure 5(b), giving BFS(Q) = {11111, 11101, 11011, 11000, 10001}. By performing an AND operation on these BitCodes, the query graph code of Q, QGraphCode(Q), is obtained. Thus, for the given example, QGraphCode(Q) is equal to 10000. The positions of the 1's in the QGraphCode show the graphs in which Q is contained. Hence, in the example, Q is present only in Graph 1. The support of Q in GD is computed as σ(Q) = 1/5 = 20%.

Figure 5: (a) Query Graph Q (b) BFS of FP-Graph for finding Q

Figure 7: Run Time comparison of FP-GraphMiner with MARGIN, gSpan and FSMA

The Frequency Table FT is defined as a collection of the distinct edges of the k graphs in GD in decreasing order with respect to the binary encoding of the BitCodes. Each row in the frequency table contains a 2-tuple $\langle DE_i, BitCode(DE_i) \rangle$, where $DE_i$ and $BitCode(DE_i)$ represent the distinct edge and the graphs in which the edge is present respectively.
A Frequent Pattern Graph, FP-Graph = {Node, Edge}, is a special type of undirected graph constructed as a collection of Nodes and Edges, where a Node is a collection of distinct edges with a common property and an Edge is a link between two Nodes. The FP-Graph constructed from the frequency table has the properties enumerated in items 1 to 4 of the definitions.
FP-GraphMiner takes the edges of k graphs, represented as Edge Arrays, as input and constructs the Frequency Table FT with the distinct edges of all graphs stored only once. An important property of the proposed algorithm is that the graph database is scanned only once to construct the frequency table. From the frequency table, the FP-Graph is constructed. The FP-GraphMiner algorithm has two phases.

3.1.1 Phase I: FP-Graph construction
In Phase I, the distinct edges DE are obtained by performing a union operation on the k graphs in the database and are stored in the Frequency Table FT. The distinct edges are arranged in descending order of their BitCodes. By grouping the edges with the same BitCodes, it is possible to obtain the subgraphs and the details about the graphs in which they are present. As it is difficult to retrieve all the frequent subgraphs for a given support from the frequency table, the FP-Graph is constructed. The Nodes with the same BitCode weight are grouped into Clusters. Each Cluster is a collection of the subgraphs in GD with a unique support. The Nodes in each Cluster are linked to obtain the FP-Graph according to the two step algorithm below.
1. Clustering the Nodes. The Nodes with the same BitCode weights are grouped into Clusters. Thus each Cluster is a collection of subgraphs with the same support. In the worst case, the maximum number of Clusters would be the number of graphs.
2. Connecting the Nodes in different Clusters. The Clusters are arranged in a hierarchy of increasing order of the support values. Let $C_i$, $C_j$ be two successive Clusters with $\sigma(C_i) < \sigma(C_j)$. Cluster $C_j$ is termed the nearest superset of Cluster $C_i$. A Node $p_1$ in Cluster $C_i$ can be linked to the Node(s) $p_2$ in Cluster $C_j$ by undirected edge(s) if the distance measure $d(p_1, p_2) = p_1 \cap p_2 = p_1$, with $p_1 \in C_i$ and $p_2 \in C_j$, is satisfied. This means that Node $p_2$ in one Cluster is a superset of $p_1$ in another Cluster only if the distance between them is $p_1$ itself. Any Node in a Cluster can be linked to one or more superset Nodes. If $d(p_1, p_2) \neq p_1$ for all the Nodes in $C_j$, the Nodes in the next higher level Cluster are examined to find the nearest superset Node(s) of $p_1$. For instance, in the running example, if $p_2$ is the Node containing edges {ab, ac, bc, bd, df} (in Figure 3, the Cluster with 100% support) and $p_1$ contains edges {de, ef, eg, fg} (the lefthand Cluster with 80% support), then $d(p_1, p_2) = BitCode(p_1) \cap BitCode(p_2) = 11101 \cap 11111 = 11101 = BitCode(p_1)$. Thus $p_2$ is a superset of $p_1$.
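The superset test and the search for the nearest superset Cluster can be sketched as below. This is one possible reading of step 2, not the Appendix algorithm: a Node is linked to all superset Nodes in the first higher-weight Cluster that contains any, and higher Clusters are examined only if none is found. The function names are hypothetical, and the BitCodes are simply those quoted in the paper's text plus 10000; the actual FP-Graph of Figure 3 contains further Nodes.

```python
from typing import Dict, List

def is_superset(p1: str, p2: str) -> bool:
    """True when BitCode(p1) AND BitCode(p2) equals BitCode(p1), i.e. d(p1, p2) = p1."""
    a, b = int(p1, 2), int(p2, 2)
    return a != b and (a & b) == a

def link_nodes(codes: List[str]) -> Dict[str, List[str]]:
    """Link each Node to the superset Nodes in its nearest higher-weight Cluster."""
    clusters: Dict[int, List[str]] = {}
    for code in codes:
        clusters.setdefault(code.count("1"), []).append(code)
    links: Dict[str, List[str]] = {}
    for code in codes:
        links[code] = []
        for w in sorted(clusters):
            if w <= code.count("1"):
                continue  # only Clusters with higher support can hold supersets
            supersets = [p for p in clusters[w] if is_superset(code, p)]
            if supersets:  # nearest Cluster with a superset found; stop climbing
                links[code] = supersets
                break
    return links

codes = ["11111", "11101", "11011", "11000", "10001", "10000"]
links = link_nodes(codes)

print(is_superset("11101", "11111"))  # True, as in the worked example
print(links["11000"])                 # ['11101', '11011']
print(links["10000"])                 # ['11000', '10001']
```

A Node with an empty link list, such as 11111 here, sits in the top Cluster and is reached directly from the HeaderNode.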
The edges of the graphs are taken into account in constructing the FP-Graph. From the FP-GraphMiner algorithm and the experimental study alone, it is difficult to analyze the efficiency of the nodes, because different nodes of the network have different degrees (the number of nodes to which a node establishes communication links) at different support values. This means that nodes with different support values have different significance values.

Table 1: Weighted Averages of nodes in FP-Graph of Figure 6

Table 2: Analysis of FP-GraphMiner on Synthetic Datasets Consisting of Sparse, Non-Sparse, and Complete Graphs

Table 3: Algorithm Parameters
Parameter | Description
BitCode(DE) | Collection of k bits of DE
WT(BitCode(DE)) | Weight of BitCode of DE
MAXWT | Maximum Weight of BitCode
FTRow | Row in Frequency Table
Node | Collection of Distinct Edges with same BitCode
Cluster | Collection of Nodes with same BitCode Weight
HeaderNode | Empty Top Node of FP-Graph
σ | Support
W | DFS walk from a Node in a Cluster to HeaderNode