Efficient (k, α)-Maximal-Cliques Enumeration over Uncertain Graphs

A maximal clique (MC) is a complete subgraph that is not a proper subgraph of any other clique. Given an uncertain graph, the top-K MCs enumeration problem studies how to return the k MCs with the highest rank value. Existing algorithms rank MCs according to their probabilities, so they usually return MCs with high probabilities but few vertices, and fail to return large MCs that convey more useful information. Motivated by this problem, this paper studies the enumeration of top-K MCs. Our approach returns the k MCs with the most vertices among those whose probabilities are ≥ α, where each such MC is called an α-MC, and computing the k largest α-MCs is called the (k, α)-MCs problem. We propose an efficient (k, α)-MCs enumeration algorithm, Top-KMC, which works in three steps: partition, enumeration, and verification. In the partition step, we compute the set M of all MCs without considering the probability information, as if the graph were partitioned into a set of subgraphs. In the enumeration step, we compute α-MCs from each MC of M; as each such subgraph is an MC, the cost of computing common neighbors for finding α-MCs can be reduced. In the verification step, we check whether an α-MC is a subgraph of another α-MC; if not, it is a true α-MC, otherwise it is a pseudo-α-MC and should be removed. We further propose an optimized algorithm, Top-KMC+, which reduces both time and space by merging the three steps into one. The experimental results on real datasets show that both Top-KMC and Top-KMC+ can return the k largest α-MCs efficiently.


I. INTRODUCTION
Graphs have been widely used to describe the complex relationships between different entities. In practice, due to noise [1], measurement errors [2], the accuracy of predictions [3], [4] and privacy concerns [5], these relationships do not exist with certainty, for example in protein-protein interaction (PPI) networks with experimentally inferred links [6]–[8], social networks with inferred influence [9], [10], and sensor networks with uncertain connectivity links [11]. In these applications, we often model the uncertain relationships using an uncertain graph, where the existence of each edge is denoted by a probability [12].
A maximal clique (MC) in a graph G is a complete subgraph that is not a proper subgraph of any other clique [13]–[16]. MC enumeration has attracted significant attention [17]–[20], due to its numerous applications in real-world problems, such as finding overlapping communities in social networks [21]–[27], recommending products for sales [28], [29], detecting ratings [30], [31], discovering followers on social networks, and identifying protein complexes in PPI networks [6]–[8]. MC enumeration in an uncertain graph finds, from the underlying graph, all MCs such that each one has probability ≥ α, where α is a threshold specified by users [17]; each such MC is called an α-MC. However, α-MC enumeration is time-consuming since the number of MCs can be exponential in the number of vertices [17], which makes the enumeration computationally intractable. Furthermore, it is often unnecessary to enumerate all the α-MCs; some of the largest ones, i.e., the top-K α-MCs, already convey sufficient and valid information.
Most existing works focus on returning top-K MCs on deterministic graphs [32]–[34]. For uncertain graphs, however, to the best of our knowledge, only [18] and [35] have studied the problem of top-K α-MCs. The authors proposed to return the k α-MCs that have the largest probabilities among those whose number of vertices is not less than a given threshold. As they rank α-MCs according to their probabilities, they usually return α-MCs with high probabilities but few vertices, and fail to return large α-MCs that convey more useful information. For example, for the uncertain graph in Figure 1, when enumerating the top-1 α-MC according to probability while requiring the number of vertices ≥ 3, we obtain the top-1 α-MC {v7, v10, v11} with the largest probability. However, users may be more interested in the MC {v5, v6, v7, v8}, which has the largest size. Considering this problem, we propose to return the k largest MCs whose probabilities are ≥ α, which we denote as the (k, α)-MCs. For example, using our approach with α = 0.1, the top-1 result w.r.t. Figure 1 is {v5, v6, v7, v8}.
Challenge. Since an MC with more vertices may have a clique probability less than α, and an MC with a larger clique probability may have fewer vertices, we must compute all the α-MCs to find the (k, α)-MCs. For example, in Figure 1, when we set the probability threshold α = 0.7, based on the vertex IDs, the first produced α-MC is {v1, v2}, which is in fact small. For this graph, the last produced α-MC {v7, v10, v11} is the largest one with clique probability ≥ α. We propose a baseline approach which returns the (k, α)-MCs by first enumerating all the α-MCs and then sorting them by the number of vertices to get the result. Therefore, the efficiency of α-MC enumeration dominates the overall performance. When computing an α-MC starting from any vertex in the original graph, we need to expand the current α-clique by a new vertex v. Here, the basic operation is computing the common neighbors of v and the vertices in the current α-clique. Obviously, the fewer common neighbor computations involved, the higher the efficiency. For example, if we compute α-MCs on the subgraph G′ in Figure 1 with probability threshold α = 0.1, we will find three α-MCs {v1, v3}, {v1, v4} and {v3, v4}, which means we have to compute the common neighbors three times. However, if we know that G′ is an MC, then we do not need to perform any common neighbor computation. To do that, we first compute the MCs of G in Figure 1 to get the MC G′, where we only need to perform the common neighbor computation once.
Our approach. Motivated by the above observation, we propose a novel idea: compute α-MCs from MCs that are computed first. We first propose an algorithm, Top-KMC, which works in three steps for α-MC computation. The first step is the partition step, in which we compute the set M of all MCs without considering the probability information, as if the graph were divided into a set of subgraphs. The second step is the enumeration step, in which we compute α-MCs from each MC of M. As each such subgraph is an MC, i.e., a complete subgraph, the cost of common neighbor computation can be avoided. The last step is the verification step, in which we verify whether an α-MC is a subgraph of another α-MC. If not, it is a true α-MC; otherwise, it is a pseudo-α-MC and should be removed. Although Top-KMC reduces the number of common neighbor computations, it still suffers from inefficiency in time and space. First, we need to compute α-MCs over all the MCs to find all the α-MCs. Second, we need to store all the MCs in M and all the α-MCs in the result set R.
To make a further improvement, we propose an optimized algorithm, Top-KMC+, which combines the three steps of Top-KMC into a single step and only maintains a result set of size k · 2^t_max to store the largest α-MCs, where t_max is the largest truss number [36] of the graph. Specifically, when an MC is found, we immediately compute the α-MCs over this MC, then we maintain a result set of size k · 2^t_max to store the candidate α-MCs. Finally, we only verify at most k · 2^t_max cliques to find the (k, α)-MCs.
Our contributions are as follows.

Organization. The rest of this paper is organized as follows.
Section II provides some preliminaries. Section III reviews existing works. Section IV presents the baseline algorithm. Section V proposes an efficient algorithm, Top-KMC. Section VI introduces and analyzes an optimized algorithm, Top-KMC+. Section VII shows the experimental results. Section VIII concludes this paper.

II. PRELIMINARIES
Definition 2.1: [Uncertain graph] An uncertain graph is denoted as G = (V, E, β), where V represents the set of vertices of G, E the set of edges of G, and β a function that maps each edge in E to a probability value in [0, 1]. For example, Figure 1 is an uncertain graph.
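To make the probability model concrete, here is a minimal Python sketch of computing a clique's probability as the product of its edge probabilities (the quantity clq(C) used later), assuming independent edge existence and a dict keyed by frozenset vertex pairs; the toy graph and values are illustrative, not the paper's Figure 1:

```python
from itertools import combinations
from math import prod

def clique_probability(C, beta):
    """Probability that all edges among the vertices of C exist
    simultaneously, assuming independent edges: the product of
    beta over every vertex pair of C."""
    return prod(beta[frozenset(e)] for e in combinations(C, 2))

# A toy 3-vertex uncertain triangle (illustrative values):
beta = {frozenset({1, 2}): 0.9, frozenset({1, 3}): 0.8, frozenset({2, 3}): 0.5}
p = clique_probability([1, 2, 3], beta)  # 0.9 * 0.8 * 0.5 = 0.36
```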

Definition 2.2: [Clique]
Given a graph G = (V, E), a vertex set C ⊆ V is a clique if C ≠ ∅ and there is an edge between each pair of vertices in C.
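Definition 2.2 translates directly into a pairwise adjacency check; a small Python sketch, assuming the graph is given as a dict mapping each vertex to its neighbor set (a representation chosen here for illustration):

```python
from itertools import combinations

def is_clique(C, adj):
    """Definition 2.2: C is a clique iff it is non-empty and every
    pair of vertices in C is joined by an edge."""
    return len(C) > 0 and all(v in adj[u] for u, v in combinations(C, 2))

# Toy graph (not the paper's Figure 1):
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
```

For instance, `is_clique({1, 2, 3}, adj)` holds, while `is_clique({1, 2, 4}, adj)` does not, since v2 and v4 are not adjacent.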
For example, there are many cliques in Figure 1. For simplicity, we also refer to the edges connecting the vertices of C as C's edges. Obviously, a clique C of n vertices has n(n − 1)/2 edges. We use max(C) to denote the vertex in C that has the largest ID.
For example, see Figure 1.

Definition 2.4: [Clique size] Given a clique C, the size of C is the number of vertices in C, denoted as size(C).
For example, see Figure 1 with α = 0.1.

Definition 2.9: [Support [36]] Given a graph G = (V, E), the support of an edge e(u, v) ∈ E, denoted sup_e(u,v), is defined as |{△uvw | w ∈ V}|, where △uvw denotes the triangle formed by the vertices u, v and w.

Definition 2.10: [k-truss [36]] Given a graph G = (V, E), the k-truss of G (k ≥ 2), denoted T_k, is defined as the largest subgraph of G such that ∀e ∈ E_{T_k}, sup_e ≥ k − 2. The truss number of an edge e ∈ E, denoted t_e, is defined as the maximum k of the k-trusses that the edge e is in.
Observation 2.1: Based on Definition 2.10, given an edge e ∈ E_G, t_e = max{k : e ∈ T_k}.
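Since the support of e(u, v) counts the triangles through e, it equals the number of common neighbors of u and v; a brief sketch (using a toy complete graph K4 for illustration, not the paper's Figure 1):

```python
def edge_support(u, v, adj):
    """Definition 2.9: sup_e(u,v) = number of triangles containing (u, v),
    i.e. the number of common neighbors of u and v."""
    return len(adj[u] & adj[v])

# In the complete graph K4, every edge lies in exactly 2 triangles,
# so K4 satisfies the 4-truss condition sup_e >= 4 - 2 on every edge.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}}
s = edge_support(1, 2, adj)  # 2
```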

III. RELATED WORKS
We discuss related work from four aspects, according to whether all the MCs are computed and whether the given graph is deterministic, as follows.
MCs enumeration on deterministic graphs. The classic MC enumeration algorithm on deterministic graphs is the Bron-Kerbosch algorithm [37], which is based on a DFS approach. In 2006, Tomita et al. [38] proposed an improved algorithm, BKPivot, with a better pivot-selection strategy based on Bron-Kerbosch, which runs in O(3^{|V|/3}) time. In 2011, D. Eppstein et al. [39] further modified Bron-Kerbosch by choosing more carefully the order in which the vertices are processed, which runs in O(d · n · 3^{d/3}) time on a graph with n vertices and degeneracy d.
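As a reference point for the algorithms discussed above, the pivoting idea of Bron-Kerbosch can be sketched in a few lines of Python (adjacency as a dict of neighbor sets; the pivot rule shown, maximizing |N(u) ∩ P|, is one common choice, not necessarily the exact rule of [38]):

```python
def bk_pivot(R, P, X, adj, out):
    """Bron-Kerbosch maximal-clique enumeration with pivoting: R is the
    growing clique, P the candidates, X the excluded vertices. Recursing
    only on P minus the pivot's neighborhood prunes the search tree."""
    if not P and not X:
        out.append(set(R))          # R is a maximal clique
        return
    pivot = max(P | X, key=lambda u: len(adj[u] & P))
    for v in list(P - adj[pivot]):
        bk_pivot(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}
        X = X | {v}

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques = []
bk_pivot(set(), set(adj), set(), adj, cliques)  # finds {1, 2, 3} and {3, 4}
```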
MCs enumeration on uncertain graphs. In 2015, Pan et al. [17] proposed the algorithm MULE for uncertain graphs, which computes all α-MCs by processing vertices in ascending order of their IDs. In 2019, Ahmar Rashid et al. [20] proposed an approach, EMCTDS, to enumerate α-MCs. EMCTDS utilizes a novel index named h-index to maintain vertices with degree greater than h; it first finds the α-MCs in descending order of size, then tracks and deletes all the subsets of each clique. In 2019, Rong-Hua Li et al. [19] proposed two core-based pruning algorithms to reduce the graph size before computing α-MCs, thus accelerating the computation.
Top-K MCs enumeration on deterministic graphs. In 2015, Yuan et al. [32] proposed an efficient top-K MCs enumeration algorithm, EnumK, for deterministic graphs. EnumK processes vertices in DFS order and obtains the top-K MCs by maintaining a result set of size k. In 2018, Apurba et al. [33] first proposed the concept of a maximal quasi-clique, which requires that the degree of every vertex in the subgraph be greater than a given threshold. They then proposed an algorithm, TopkMaximalQC, for top-K MCs enumeration that prunes vertices with small core numbers to reduce redundant computations. In 2020, Wu et al. [34] proposed an efficient algorithm, TOPKLS, for finding top-K MCs, which maintains a result set of size k that can be dynamically adjusted to facilitate pruning small cliques.
Top-K MCs enumeration on uncertain graphs. The concept of probabilistic top-K MCs in uncertain graphs was first proposed by Zou et al. [18] in 2010. Zhu Rong et al. [35] further studied how to compute top-K α-MCs on large-scale uncertain graphs. Both [18] and [35] return the k α-MCs that have the largest probabilities and whose sizes are not less than a given threshold, because they rank α-MCs according to their probabilities. As a result, they may return α-MCs with high probabilities but few vertices, and fail to return large MCs that convey more useful information.
In practice, users may be more interested in α-MCs with the largest sizes, since they convey more useful information. Our work takes this into consideration and returns the k largest α-MCs whose probabilities are ≥ α.

IV. BASELINE ALGORITHM

A. ALGORITHM IDEA
Since an MC with more vertices may have a clique probability less than α, and an MC with a larger clique probability may have fewer vertices, it is difficult to find the (k, α)-MCs before knowing all the α-MCs. The baseline approach, i.e., Algorithm 1, returns the (k, α)-MCs by first enumerating all the α-MCs (line 4), then sorting them by their clique sizes (line 5) to get the (k, α)-MCs (line 6).
In Algorithm 1, we enumerate all the α-MCs by processing the vertices of the uncertain graph in ascending order of their IDs. During the processing, we maintain three sets C, I and X to recursively compute α-MCs: C stores the current α-clique; I stores the candidate vertices that are connected to every vertex in C and can expand the current α-clique; and X stores the vertices that are connected to every vertex in C but will definitely not be added to C, which ensures each α-MC is enumerated only once. Both I and X are dynamically updated by Functions GenerateI and GenerateX when C is expanded (lines 14 to 15). Note that adding a new vertex v to C decreases the clique probability of C by a factor equal to the product of the probabilities of the edges between v and each vertex in C. Thus, we need to compute the clique probability of the current C before adding the new vertex, to ensure the clique probability of C stays no less than α. In Function GenerateI, each tuple (u, r) stored in I satisfies that u is a common neighbor of all vertices in C, u > max(C) (line 22) and clq(C ∪ {u}) ≥ α (line 24), where r denotes the factor by which the current clq(C) is multiplied when vertex u is added to C. In Function GenerateX, each tuple (u, r) stored in X satisfies that u is a common neighbor of all vertices in C and clq(C ∪ {u}) ≥ α (lines 27 to 33). The expansion terminates when both I and X are empty (line 8).
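The C/I/X recursion described above can be sketched as follows; this is a simplified, hedged reading of Functions EUMC, GenerateI and GenerateX (the names and data layout are assumptions, e.g. I and X as dicts from a vertex u to its probability factor r), not the paper's exact pseudocode:

```python
def enum_alpha_mcs(G_adj, beta, alpha):
    """Recursively grow the current alpha-clique C (with probability clq),
    extending only by candidates in I; X blocks duplicate enumeration.
    A clique is reported when both I and X are empty."""
    results = []

    def expand(C, clq, I, X):
        if not I and not X:
            results.append(frozenset(C))
            return
        for v, r in list(I.items()):
            # Rebuild I and X w.r.t. C + {v}: common neighbors of all of
            # C + {v} whose addition keeps the clique probability >= alpha.
            I2 = {u: ru * beta[frozenset((u, v))]
                  for u, ru in I.items()
                  if u > v and u in G_adj[v]
                  and clq * r * ru * beta[frozenset((u, v))] >= alpha}
            X2 = {u: ru * beta[frozenset((u, v))]
                  for u, ru in X.items()
                  if u in G_adj[v]
                  and clq * r * ru * beta[frozenset((u, v))] >= alpha}
            expand(C | {v}, clq * r, I2, X2)
            del I[v]          # v will not be re-added from this branch
            X[v] = r

    for s in sorted(G_adj):   # process start vertices in ascending ID order
        I0 = {u: beta[frozenset((s, u))] for u in G_adj[s]
              if u > s and beta[frozenset((s, u))] >= alpha}
        X0 = {u: beta[frozenset((s, u))] for u in G_adj[s]
              if u < s and beta[frozenset((s, u))] >= alpha}
        expand({s}, 1.0, I0, X0)
    return results

G_adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}                    # toy triangle
beta = {frozenset((1, 2)): 0.9, frozenset((2, 3)): 0.8,
        frozenset((1, 3)): 0.5}
found = enum_alpha_mcs(G_adj, beta, 0.5)  # triangle prob 0.36 < 0.5,
                                          # so the alpha-MCs are the 3 edges
```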
Example 1: Consider Figure 1. Given α = 0.1 and k = 3, we enumerate α-MCs starting from the first vertex v1. The three sets C, I and X are updated dynamically; their running status is shown in Table 1. When both I and X are empty, the current C is an α-MC. When processing vertices v1, v2, v3 and v4 in ascending order of their IDs, we get four α-MCs. At the end of the whole enumeration, we find all the α-MCs, including {v7, v9} and {v7, v10, v11}. Then we sort them according to their sizes and output the first 3 α-MCs.

Algorithm 1: Baseline (k, α)-MCs Enumeration
Input: Graph G = (V, E, β), integer k and a probability threshold α
Output: (k, α)-MCs

B. ANALYSIS OF ALGORITHM
From Algorithm 1 we know that the running time of Function EUMC dominates the whole process. An execution of the recursion of Function EUMC can be viewed as a search tree [17]. Each call to EUMC is a node of this search tree; the first call is the root node. A node in this search tree is either an internal node that makes one or more recursive calls, or a leaf node that makes no further recursive calls. Specifically, the running time at each leaf node is O(1), because there are no further recursive calls, that is, I is empty, and checking the size of I takes constant time. The time taken at each internal node is O(|V|). Line 12 takes O(|V|) time as we add all vertices in C, together with u, to the new clique. Line 13 takes constant time. Lines 14 and 15 take O(|V|) time. Furthermore, the total number of calls to Function EUMC is no more than O(2^{|V|}). In conclusion, the time complexity of Algorithm 1 is O(|V| · 2^{|V|}).

(Table 1 shows the running status of C, I and X; for each tuple (u, r) in I and X, the probability r is omitted due to space limits.)

V. TOP-KMC ALGORITHM

A. ALGORITHM IDEA
From the above discussion we know that the efficiency of α-MC enumeration is limited by the frequent common neighbor computations of the vertices in the current clique C. Obviously, the fewer the common neighbor computations involved, the higher the efficiency. That is, if we first divide the given uncertain graph into some MCs, without considering the probability information, we do not need to perform any common neighbor computations when we compute α-MCs on these MCs. Such preprocessing helps to avoid redundant common neighbor computations, since the number of common neighbor computations for MC enumeration is much less than that for α-MC enumeration. Note that there may exist some overlap between different MCs, which means that one α-MC computed from an MC may be a subgraph of another α-MC computed from other MCs. We need to remove such α-MCs before returning the final results.
Based on this observation, we propose Top-KMC, i.e., Algorithm 2, to compute α-MCs in three steps. First, we compute the set M of all MCs without considering the probability information in line 1 by calling Function BKPivot [38]. Second, we compute the result set R of all α-MCs from each MC in M (line 3). Third, we identify the first k largest α-MCs (line 4) and return them (line 5).
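The second and third steps can be sketched as follows, taking the MC set M from step 1 as input (a BKPivot implementation is omitted here). Since every vertex pair inside an MC is adjacent, the sketch finds α-cliques by a plain subset scan with no common-neighbor computation, which is feasible because MC sizes are bounded by the truss number (Theorem 2). This is an illustrative reading, not the paper's exact Algorithm 2:

```python
from itertools import combinations
from math import prod

def topk_from_mcs(mcs, beta, k, alpha):
    """Steps 2 and 3 of Top-KMC: enumerate alpha-cliques inside each MC,
    then drop pseudo-alpha-MCs (subsets of another candidate) and keep
    the k largest."""
    def clq(C):
        return prod(beta[frozenset(e)] for e in combinations(C, 2))

    candidates = set()
    for g in mcs:                                  # step 2: per-MC enumeration
        for r in range(len(g), 1, -1):
            for sub in combinations(sorted(g), r):
                if clq(sub) >= alpha:
                    candidates.add(frozenset(sub))
    # Step 3: verification, then take the k largest.
    maximal = sorted((C for C in candidates
                      if not any(C < D for D in candidates)),
                     key=len, reverse=True)
    return maximal[:k]

mcs = [{1, 2, 3}, {3, 4}]                          # toy MC set (step 1 output)
beta = {frozenset({1, 2}): 0.9, frozenset({2, 3}): 0.8,
        frozenset({1, 3}): 0.5, frozenset({3, 4}): 0.95}
top = topk_from_mcs(mcs, beta, 4, 0.5)             # four size-2 alpha-MCs
```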
Next, we discuss the details of the second and third steps below.
Enumeration of α-MCs from MCs (Function EMMC). In Function EMMC, we compute α-MCs from each MC in M. As there is an edge between each pair of vertices in an MC, the cost of common neighbor computations can be avoided. For each MC g, we first check whether g's probability is not less than α (line 8). If so, g itself is an α-MC, and we directly put it into the result set R without further processing (line 9). Otherwise, we compute α-MCs from g in line 14 by calling Function EUMC, where we do not need to compute any common neighbors because g itself is an MC (lines 10 to 15). For example, for the MCs … {v7, v9} and {v7, v10, v11}, we directly put them into the result set R since their clique probabilities are not less than the probability threshold α (lines 8 to 9). However, when computing on {v1, v3, v4}, we need to call Function EUMC to get all the α-MCs.

Fast Removing Pseudo-α-MCs (Function FRPMC). There may exist some overlaps between different MCs produced by Function BKPivot, which means that one α-MC computed from an MC may be a subgraph of another α-MC computed from other MCs. Therefore, before returning the final results, we need to remove these pseudo-α-MCs, i.e., α-cliques that are subgraphs of other α-MCs. To identify whether an α-clique in R is a pseudo-α-MC, the intuitive yet expensive way is to compare it with all other α-cliques in R to check whether it is a subgraph of another α-clique. Our approach accelerates this using the following observation.
Observation 5.1: Given two cliques C1 and C2, if C1 is a subgraph of C2, then size(C1) ≤ size(C2).

By Observation 5.1, we first sort all α-cliques in R by size in descending order (line 17), then compare each α-clique with the ones before it to find the top-K results. However, when we process the n-th α-clique, we need to compare it with the n − 1 α-cliques before it, which means checking whether it is a subgraph of another clique n − 1 times. To further accelerate the process, we propose an inverted-index-based approach to identify the pseudo-α-MCs. We assign each vertex an inverted list recording which α-MCs it belongs to (line 18). This inverted list is constructed while processing the α-cliques in R. Let C be the n-th processed α-clique in R in descending order w.r.t. clique size; if C is an α-MC, then we add n to the inverted list of every vertex in C. The basic idea is to transform multiple subgraph-checking operations into one set intersection operation (line 21). We have the following result to guarantee its correctness.
Theorem 1: Assume that all the α-cliques are processed in descending order w.r.t. their sizes, and each vertex v is associated with an inverted list L[v] recording which α-MCs it belongs to. Given an α-clique C, C is an α-MC if and only if the intersection of the inverted lists of C's vertices is empty.

Proof 1: Given an α-clique C, if the set intersection of the inverted lists of C's vertices is empty, it means there does not exist a larger α-MC containing all the vertices of C. Therefore, C is not a subgraph of any other α-MC, that is, C is an α-MC.
On the other hand, if C is an α-MC, it is not a subgraph of any other α-clique in R, so before C is processed, no earlier processing order has been added to all of its vertices' inverted lists. Therefore, if C is an α-MC, the set intersection of its inverted lists is empty.
Based on Theorem 1, when processing an α-clique C, we just need to perform a set intersection on the inverted lists of C's vertices (line 21). If the result is an empty set, it means that C is not a subgraph of any α-clique processed before C, and we know immediately that C is an α-MC (lines 24 to 30); otherwise, C is a pseudo-α-MC and we directly delete it from R (lines 22 to 23).
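The inverted-list verification can be sketched compactly; this is an illustrative Python reading of Function FRPMC (the data layout, cliques as frozensets, is my own assumption):

```python
from collections import defaultdict

def remove_pseudo(alpha_cliques):
    """Process alpha-cliques in descending size order; by Theorem 1 a
    clique is a pseudo-alpha-MC iff the intersection of its vertices'
    inverted lists is non-empty. One set intersection replaces up to
    n - 1 pairwise subset checks."""
    inv = defaultdict(set)   # vertex -> ids of accepted alpha-MCs containing it
    result = []
    for i, C in enumerate(sorted(alpha_cliques, key=len, reverse=True)):
        if set.intersection(*(inv[v] for v in C)):
            continue         # contained in an earlier, larger alpha-MC: discard
        result.append(C)
        for v in C:
            inv[v].add(i)    # record membership only for true alpha-MCs
    return result

R = [frozenset({1, 2}), frozenset({1, 2, 3}),
     frozenset({2, 3}), frozenset({3, 4})]
kept = remove_pseudo(R)      # keeps {1, 2, 3} and {3, 4}
```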
Example 4: Continue Example 3. Given all the α-cliques in R, we need to identify and remove all the pseudo-α-MCs to get the final results. We first sort all the α-cliques in R according to their sizes.
Then we verify these α-cliques by dynamically updating the inverted lists of each vertex, as shown in Table 2.

Theorem 2: For each MC C in M, size(C) ≤ t_max, where t_max is the maximum truss number of the graph.

Proof 2: Given the MC C with the maximum size in M, for each e(u, v) ∈ C, there are at least size(C) − 2 common neighbors of u and v, which means that there are at least size(C) − 2 triangles containing e. Based on Definitions 2.9 and 2.10, we know that size(C) ≤ t_e, ∀e ∈ C. Thus, for all MCs in M, the upper bound on their sizes is max(t_e) = t_max.
The time complexity of Function EUMC in Algorithm 1 is O(|V| · 2^{|V|}) [17]. Since the input of Function EUMC here is an MC whose size is bounded by t_max (Theorem 2), each call takes O(t_max · 2^{t_max}) time, and there are at most 3^{|V|/3} MCs in M. For Function FRPMC, we first need to sort all α-cliques in R. We have counted the number of cliques in several real datasets and found that there are very few large α-MCs, and their sizes are concentrated in a few values, as shown in Table 3. Thus, we sort R by counting sort. Line 27 takes O(t_max · 2^{t_max} · 3^{|V|/3}) time by counting sort because there are at most t_max · 2^{t_max} · 3^{|V|/3} α-cliques to be sorted.
From Theorem 2, we know that the maximum size of each clique C is bounded by t_max. We only update the inverted lists for the vertices in α-MCs, which means the sizes of the inverted lists are bounded by k (lines 24 to 30). For each clique C in R, line 21 takes O(k · t_max) time for the set intersection.
Theorem 3: Given the result set R, after sorting all cliques in R in descending order of size, we only need to check the first k · 2^{t_max} cliques to find all the (k, α)-MCs in R.

Proof 3:
Given an MC C, the number of α-cliques contained in C is no more than 2^{t_max}, since size(C) is bounded by t_max by Theorem 2. Since we return the top k results, we only need to check the first k · 2^{t_max} cliques to find all the (k, α)-MCs in R.
Based on Theorem 3, we need to execute line 21 at most k · 2^{t_max} times (lines 25 to 27), which means that in the worst case the cost of lines 20 to 30 is O(k² · t_max · 2^{t_max}). Based on the above analysis, the whole time complexity of Algorithm 2 is O(t_max · 2^{t_max} · 3^{|V|/3}).

VI. TOP-KMC+ ALGORITHM

A. ALGORITHM IDEA
From Examples 2, 3 and 4 we know that Top-KMC suffers from large storage space. Specifically, in the first step, we need to maintain a set M to store all the MCs computed by Function BKPivot. In the second step, we need to maintain all the α-MCs computed by Function EMMC from each MC in M. However, we only need to verify at most k · 2^{t_max} α-cliques in the last step. The reason for this problem is that the three steps are performed separately.
Based on this observation, we propose Top-KMC+, i.e., Algorithm 3, which combines the three steps into one to compute α-MCs. First, we use a priority queue R to maintain the α-cliques in descending order of size (line 1). Here, we maintain R based on radix sorting, where we put the cliques with the same size into one list, because the clique sizes are concentrated in a few values, as shown in Table 3. Such processing takes constant time instead of the O(log |R|) time of heap sorting. For a given uncertain graph G = (V, E, β), we compute all MCs without considering the probability information by calling Function BKPivot. Different from Top-KMC, when an MC g is output by Function BKPivot, we immediately compute α-MCs from g by Function ENUMERATE (lines 2 to 3), so that we do not need to maintain an additional set M to store all the MCs computed by Function BKPivot, where G can be divided into at most 3^{|V|/3} MCs. From Theorem 3 we know that we only need the result set R to maintain the largest k · 2^{t_max} α-cliques. After getting all the k · 2^{t_max} α-cliques, we call Function FRPMC to find all the (k, α)-MCs from R (lines 4 to 5).
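The radix/bucket-style result set described above can be sketched as follows; a hedged illustration (the class name and layout are my own), showing constant-time insertion by clique size and replacement of a smallest clique once the k · 2^{t_max} capacity is reached:

```python
from collections import defaultdict

class BucketResultSet:
    """Size-bucketed bounded result set: cliques are grouped by size, so an
    insert costs O(1) rather than the O(log |R|) of a heap; at capacity, a
    new clique evicts one of the smallest only if it is strictly larger."""
    def __init__(self, capacity):          # capacity = k * 2**t_max
        self.capacity = capacity
        self.buckets = defaultdict(list)   # clique size -> cliques of that size
        self.count = 0

    def add(self, clique):
        n = len(clique)
        if self.count < self.capacity:
            self.buckets[n].append(clique)
            self.count += 1
        else:
            smallest = min(self.buckets)   # smallest occupied size bucket
            if n > smallest:
                self.buckets[smallest].pop()
                if not self.buckets[smallest]:
                    del self.buckets[smallest]
                self.buckets[n].append(clique)

    def sorted_cliques(self):
        return [C for s in sorted(self.buckets, reverse=True)
                  for C in self.buckets[s]]

rs = BucketResultSet(capacity=2)
for c in [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6, 7})]:
    rs.add(c)                              # third clique evicts a size-2 one
```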
Example 5: Consider Figure 1, given k = 5 and α = 0.1. First, we use a priority queue R of size k · 2^{t_max} = 5 · 2^4 = 80 to maintain the α-cliques. Compared to Algorithm 2, which maintains a result set R of size 2^{|V|} = 2048 in the worst case, Algorithm 3 saves a large amount of storage space. When we get the first MC g1 = {v1, v2, v3} from Function BKPivot, Function ENUMERATE computes α-MCs from g1. In Function ENUMERATE, since the current |R| < k · 2^{t_max} = 80, we directly compute α-MCs and add each newly generated α-MC into R until |R| = 80; once |R| = 80, a newly generated α-MC C replaces the smallest clique C′ in R if size(C) > size(C′). Then we get the result set R = {{v1, v2, v3}}. Next, similar to the above process, we compute α-MCs from the second MC.

The time cost of Function ENUMERATE is dominated by Function EMMC and the counting sort on R. The time complexity of Function EMMC is O(t_max · 2^{t_max} · 3^{|V|/3}), as analysed in the previous section. Since R is bounded by k · 2^{t_max}, the time complexity of the counting sort on R is O(k · 2^{t_max}). Note that, compared to the counting sort in Algorithm 2, whose time complexity is O(t_max · 2^{t_max} · 3^{|V|/3}), the sort in Algorithm 3 is more efficient, since O(t_max · 3^{|V|/3}) ≫ O(k). Overall, the time cost of Function ENUMERATE is O(t_max · 2^{t_max} · 3^{|V|/3}).
The difference between Function FRPMC in Algorithm 3 and in Algorithm 2 is that we do not need to execute line 17 of Function FRPMC in Algorithm 2, because the priority queue R has already been sorted in Function ENUMERATE. Thus, the time complexity of Function FRPMC in Algorithm 3 is O(k² · t_max · 2^{t_max}), as analysed in the previous section.
Based on the above analysis, the whole time complexity of Algorithm 3 is O(t_max · 2^{t_max} · 3^{|V|/3}).

VII. EXPERIMENTS
We conducted extensive experiments to evaluate the performance of our proposed algorithms. The compared algorithms include: (1) BASIC (the baseline algorithm described in Section IV); (2) Top-KMC (described in Section V); (3) Top-KMC+ (described in Section VI). All algorithms were implemented in C++ and compiled with Microsoft Visual Studio 2019. All the experiments were conducted on the operating system

A. DATASETS
We used 12 datasets to evaluate the performance of the three algorithms: anthra¹, mtbrv¹, agrocyc¹, ecoo¹, vchocyc¹, amaze², kegg³, xmark⁴, nasa⁴, citeseer⁵, go⁶, and yago⁷. Among them, anthra, mtbrv, agrocyc, ecoo, and vchocyc describe biological networks between the biochemistry and the genome; amaze and kegg describe networks of chemical reactions and neural networks in organisms; xmark and nasa describe connection networks between XML documents; citeseer describes the delivery network of protein reactions; go describes the information between gene ontology terms; yago describes the semantic information between different semantic relationships. These datasets represent relevant knowledge in different fields. Detailed statistics of these datasets are summarized in Table 4, where |V|, |E|, d and t_max represent the number of vertices, the number of edges, the average degree and the maximum truss number of the graph. Tables 5 to 8 show the comparison of running time. We first vary α by fixing k = 5, then vary α by fixing k = 50, to observe the impacts of both α and k.

B. COMPARISON OF RUNNING TIME
Comparing the three algorithms, we can see that Top-KMC and Top-KMC+ are much faster than BASIC. This is due to the smaller number of common neighbor computations in Top-KMC and Top-KMC+, as described in Sections V and VI. Specifically, Top-KMC and Top-KMC+ are more than 20 times faster than BASIC on amaze, kegg, xmark, vchocyc, mtbrv, anthra, ecoo and agrocyc, and more than 10 times faster than BASIC on nasa and go. However, Top-KMC and Top-KMC+ are only about 10 times faster than BASIC on yago and citeseer. This is because the time complexity of Top-KMC and Top-KMC+ is O(t_max · 2^{t_max} · 3^{|V|/3}), which grows with the maximum truss number t_max, unlike the O(|V| · 2^{|V|}) of Function EUMC.
The impact of k. Figure 3 shows the running time of Top-KMC+ when varying k from 10 to 500 with the default α = 0.5; we only report the figures for yago and agrocyc, because the results on the other datasets show similar trends. According to the experimental results, the running time of Top-KMC+ increases with k. This is because in Function FRPMC we update the inverted lists of all the vertices for each α-MC until we get the largest α-MCs; the larger the k, the more update operations we need to perform.

C. COMPARISON OF STORAGE SPACE
We counted the number of cliques and vertices that each algorithm needs to store during processing, to demonstrate the superiority of Top-KMC+ over BASIC and Top-KMC. Recall that BASIC needs to store all the α-MCs, Top-KMC needs to store all the MCs and all the α-MCs, while Top-KMC+ only needs to store the first k · 2^{t_max} α-cliques to find the (k, α)-MCs. In Table 9, we report the size of the result set R storing all the α-MCs, the size of the set M, the size of the k · 2^{t_max} result set of Top-KMC+, and the total number of stored vertices for all algorithms. From Table 9 we can see that the space Top-KMC+ needs is on average about 17 times less than BASIC and 33 times less than Top-KMC. Especially on the sparse graphs (such as xmark, go, vchocyc, anthra, ecoo and agrocyc), Top-KMC+ needs more than 20 times less space than BASIC and more than 40 times less than Top-KMC. These results demonstrate that Top-KMC+ needs much less space.

VIII. CONCLUSION
This paper presents two efficient algorithms for enumerating top-K α-MCs over uncertain graphs, namely Top-KMC and Top-KMC+. Our algorithms differ from existing algorithms: existing algorithms rank α-MCs according to their probabilities, while ours return the k α-MCs with the most vertices among those whose probabilities are ≥ α. Top-KMC works in three steps for α-MC computation, which reduces the number of common neighbor computations. Top-KMC+ combines the three steps into one to reduce the running time and save storage space. The experimental results demonstrate the efficiency of Top-KMC and Top-KMC+ in both time and storage space. Especially on sparse graphs (such as amaze, kegg, xmark, vchocyc, mtbrv, anthra, ecoo and agrocyc), Top-KMC and Top-KMC+ are on average more than 20 times faster than BASIC. Even on the dense graphs (such as yago and citeseer), Top-KMC and Top-KMC+ are on average about 10 times faster than BASIC. For storage, the space Top-KMC+ needs is on average about 17 times less than BASIC and 33 times less than Top-KMC.