Improved Algorithm for the Network Alignment Problem with Application to Binary Diffing

In this paper, we present a novel algorithm to address the Network Alignment problem. It is inspired by a previous message passing framework of Bayati et al. [2] and includes several modifications designed to significantly speed up the message updates as well as to enforce their convergence. Experiments show that our proposed model outperforms other state-of-the-art solvers. Finally, we propose an application of our method to the Binary Diffing problem. We show that our solution provides better assignments than reference diffing tools in almost all submitted instances, and we highlight the importance of leveraging the graphical structure of binary programs.


Introduction
The problem of finding a relevant one-to-one correspondence between the nodes of two graphs is known as the Network Alignment problem (NAP). It has a wide variety of applications such as image recognition [27,14,30], ontology alignment [2], social network analysis [28,16] or protein interaction analysis [15,10]. Multiple formal definitions have been proposed. For instance, one may only consider the node content and aim at finding the mapping with maximum overall similarity score. Such a formulation reduces to the Maximum Weight Matching problem (MWM) [4]. On the contrary, one may only focus on the graph topologies and search for the alignment with the maximum number of induced overlapping edges. In this case, the problem is known as the Maximum Common Edge Subgraph problem (MCS) [1]. In this paper, we propose a mixed formulation: given any two directed attributed graphs A and B, and two measures of similarity on both nodes and edges, find the one-to-one mapping that maximizes the sum of similarities of both matched nodes and induced edges. In other words, we are seeking the assignment that maximizes a linear combination of MWM and MCS.
Several methods have been proposed to address this problem. Among them, NetAlign [2] introduces a complete message passing framework based on max-product belief propagation. In the present paper, we propose several modifications to this algorithm in order to significantly speed up the computation and to control message convergence. We show that these modifications provide better assignments than the original model as well as other state-of-the-art solvers on most problem instances. Finally, we show that our method can be used to efficiently retrieve the differences between two binary executables.
The rest of this paper is organized as follows. Section 2 introduces the network alignment problem in more detail and reviews some existing solutions. The original model of NetAlign as well as our proposed optimizations are described in Section 3. Finally, Section 4 is dedicated to the experimental evaluation of our method.

Problem formulation
Let us consider any two directed attributed graphs A = (V_A, E_A) and B = (V_B, E_B), where V_A = {1, ..., n} are the vertices of A (resp. V_B = {1', ..., m'} for B) and E_A ⊆ V_A × V_A its edges (resp. E_B for B). Without loss of generality, we assume that neither of the graphs includes self-loops (they can be considered as node attributes if necessary).
We assume given two arbitrary non-negative measures of similarity on both nodes and edges, σ_V : V_A × V_B → R_+ and σ_E : E_A × E_B → R_+. Using these measures, we may encode all pairwise node similarity scores into a flattened vector p ∈ R_+^{|V_A|·|V_B|} such that p_{ii'} = σ_V(i, i'), as well as the similarity of all potential induced edges into a matrix Q ∈ R_+^{nm×nm} such that Q_{ii'jj'} = σ_E((i, j), (i', j')) if (i, j) ∈ E_A and (i', j') ∈ E_B, and 0 otherwise. Finally, we describe any one-to-one node mapping through a binary vector x ∈ {0, 1}^{nm} such that x_{ii'} = 1 if and only if node i is mapped to node i'. Given these definitions, the network alignment problem consists in solving the following constrained quadratic program:

    maximize    (1 − α) p^T x + α x^T Q x
    subject to  Σ_{i'∈V_B} x_{ii'} ≤ 1  ∀i ∈ V_A,   Σ_{i∈V_A} x_{ii'} ≤ 1  ∀i' ∈ V_B,   x ∈ {0, 1}^{nm}

where α ∈ [0, 1] is an arbitrary constant that determines the trade-off between node and edge similarity. This problem is a generalization of the Quadratic Assignment problem and, as such, is known to be NP-complete and even APX-hard [23]. Though exact algorithms exist, they rapidly become intractable as the number of vertices rises [7]. In practice, the computation of the NAP for graphs of more than a hundred nodes must be approximated. In the rest of this section, we review some of the existing approaches proposed to address the problem.
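To make the objective concrete, the following minimal sketch (the toy instance and function name are ours, purely for illustration) evaluates the NAP objective for a flattened binary assignment vector:

```python
import numpy as np

def nap_objective(x, p, Q, alpha):
    """NAP objective for a flattened binary assignment vector x:
    (1 - alpha) * p.x  +  alpha * x.Q.x"""
    x = np.asarray(x, dtype=float)
    return (1.0 - alpha) * p @ x + alpha * x @ Q @ x

# Toy instance: two 2-node graphs -> 4 candidate pairs,
# ordered (1,1'), (1,2'), (2,1'), (2,2').
p = np.array([1.0, 0.0, 0.0, 1.0])   # node similarities
Q = np.zeros((4, 4))
Q[0, 3] = Q[3, 0] = 1.0              # edge 1->2 overlaps edge 1'->2'

x_good = np.array([1, 0, 0, 1])      # map 1->1', 2->2'
x_bad  = np.array([0, 1, 1, 0])      # map 1->2', 2->1'
print(nap_objective(x_good, p, Q, 0.5))  # 2.0: node and edge terms both count
print(nap_objective(x_bad, p, Q, 0.5))   # 0.0: no similar nodes, no overlap
```

The quadratic term only rewards pairs of correspondences that induce a common edge, which is what couples the candidate matches together.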

Related work
Amongst the first approaches to approximate the NAP are the spectral methods, which can be distinguished into two main categories. On one hand, spectral matching approaches are based on the idea that similar graphs share a similar spectrum [25]. Thus, they aim at best aligning the (leading) eigenvectors of the two affinity matrices (or Laplacians) [14,21]. On the other hand, PageRank methods approximate the NAP through an eigenvalue problem over the matrix Q [24]. The idea consists in computing the principal eigenvector of Q, and using it as a similarity score for every possible correspondence. The resulting assignment can then be computed using conventional MWM solvers. Over the years, several improvements have been proposed to enhance the procedure [16,20,12,28].
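As an illustration of this PageRank family of methods, the sketch below runs a power iteration on Q, re-injecting the node similarities p at every step; the exact normalization and the toy instance are illustrative assumptions of ours, not any specific published solver:

```python
import numpy as np

def spectral_scores(Q, p, iters=50):
    """Power-iteration sketch: each candidate pair is scored by the
    corresponding entry of a (normalized) principal-eigenvector-like
    vector of Q, biased toward the node similarities p."""
    v = np.full(len(p), 1.0 / len(p))  # uniform start
    for _ in range(iters):
        v = Q @ v + p                  # spread scores along compatible edge pairs
        v /= v.sum()                   # keep the iterate normalized
    return v

# Toy instance: pairs (1,1'), (1,2'), (2,1'), (2,2'); the topologically
# consistent pairs (1,1') and (2,2') should dominate.
p = np.array([1.0, 0.0, 0.0, 1.0])
Q = np.zeros((4, 4))
Q[0, 3] = Q[3, 0] = 1.0
scores = spectral_scores(Q, p)
```

A conventional MWM solver applied to `scores` would then produce the final one-to-one assignment.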
Other common approaches propose to directly address the quadratic program by means of relaxations. The most common convex relaxation consists in extending the solution set to the set of doubly stochastic matrices. The relaxed problem can then be exactly solved using convex-optimization solvers, and the solution is finally projected onto the set of permutation matrices to provide an integral assignment. However, when the solution of the convex program is far from the optimal permutation matrix, the final projection may result in an incorrect mapping [18]. Other approaches make use of a concave [27] or indefinite [26] relaxation. The induced programs are generally much harder to solve but yield better results when properly initialized. Note that most methods use a combination of both relaxations [30,29].
Several other methods are based on a linearization of the NAP objective function. The idea is to reformulate the quadratic program into an equivalent linear program, and to solve it using conventional (mixed-integer) linear programming solvers. However, this reformulation usually requires the introduction of many new variables and constraints, and computing the exact solutions of the linear program may become prohibitively expensive. In most cases, relaxations must also be introduced. A successful method, based on the Adams and Johnson linearization and using a Lagrangian dual relaxation, has been proposed by Klau [15] and later improved by El-Kebir et al. [10].
Finally, a message passing framework has shown promising results [2]. It is directly derived from a previous model that provided important results on solving the MWM using max-product belief propagation [4]. In our work, we chose to apply this model to our use case.
Note that an important limitation of the NAP is the size of the matrix Q, which grows quartically with the size of the graphs. This memory requirement may become prohibitive for the relatively large graphs encountered in many real-world problems. It is mainly due to the intrinsic nature of the problem, which requires that every potential correspondence be evaluated with regard to the other candidates in order to take its topological consistency into account. In practice, most methods are designed to efficiently exploit the potential sparseness of the matrix Q. Therefore, they generally apply to sparse graphs only. Moreover, several approaches propose to also restrict the number of potential candidates [10,2] and thus the problem complexity. This pre-selection may rely on prior knowledge or on arbitrary decision rules. It mostly aims at preventing the algorithm from computing the assignment score of highly improbable correspondences. The framework we introduce in the next section makes use of this interesting feature.

Network alignment via Max-Product Belief Propagation
In this section, we first recall the original model of Bayati et al. [2] and then introduce our proposed improvements. More details about the complete framework and practical implementation of NetAlign can be found in [3].

Original model
The most common reformulation of a constrained optimization program into a probabilistic graphical model requires the introduction of a factor-graph which assigns maximum probability to the solutions of the program. For more details about factor-graph graphical models, we refer the reader to Kschischang et al. [17]. In order to design such a graphical model, we must encode both the objective function and the constraints of the NAP into an equivalent probability distribution. This encoding is done through the factorization of several functions (function nodes) over the different variables of the program (variable nodes). In this form, the mode of the distribution can be efficiently computed (or at least approximated) using the max-product algorithm. In the following, we introduce the factor-graph designed to address the NAP.

The factor-graph
We first define the set of variable nodes X = {X_{ii'} : i ∈ V_A, i' ∈ V_B}, one per candidate correspondence. These variables record the correspondences belonging to the current mapping. We then introduce the different function nodes of our graphical model. We distinguish the factors providing the energy of the objective function from the factors encoding the program constraints.
On one hand, the objective function is encoded via two sets of function nodes c_{ii'} : {0, 1} → R_+ and c_{ii'jj'} : {0, 1}^2 → R_+, such that:

    c_{ii'}(x_{ii'}) = exp((1 − α) p_{ii'} x_{ii'})    and    c_{ii'jj'}(x_{ii'}, x_{jj'}) = exp(α Q_{ii'jj'} x_{ii'} x_{jj'})

On the other hand, the hard constraints of the NAP are encoded using {0, 1} Dirac measures f_i : {0, 1}^{|∂f_i|} → {0, 1}, and similarly for g_{i'}, such that f_i(x_{∂f_i}) = 1 if Σ_{j∈V_B} x_{ij} ≤ 1 and 0 otherwise, where we denote x_{∂f_i} = {x_{ij} ∈ x, j ∈ V_B}, and similarly for x_{∂g_{i'}}. By factorizing all the function nodes, we obtain the probability distribution of our factor-graph:

    P(x) = (1/Z) Π_{ii'} c_{ii'}(x_{ii'}) Π_{ii'jj'} c_{ii'jj'}(x_{ii'}, x_{jj'}) Π_{i} f_i(x_{∂f_i}) Π_{i'} g_{i'}(x_{∂g_{i'}})    (1)

where the normalization constant Z denotes the partition function of the model. It is clear that the support of our model distribution (1) is equivalent to the set of feasible solutions of the NAP. Furthermore, the vector x with maximum probability corresponds to the optimal solution of the NAP.

The message passing framework
The main interest of the factor-graph (1) is the ability to efficiently compute an approximation of its mode using the max-product algorithm. Following the message passing framework proposed by Pearl [22], and denoting by m^{(t)} the value of the message m after t iterations, we may apply updates of the form [2]:

    µ^{(t+1)}_{c_{ii'jj'}→X_{ii'}}(x_{ii'}) ∝ max_{x_{jj'}} c_{ii'jj'}(x_{ii'}, x_{jj'}) µ^{(t)}_{X_{jj'}→c_{ii'jj'}}(x_{jj'})

together with the analogous updates for the factors c_{ii'}, f_i and g_{i'}. Since the x_{ii'} are binary valued, computing any message λ_{a→b}(x_{ii'}) for both x_{ii'} = 0 and x_{ii'} = 1 is redundant. Therefore, we may halve the computation cost by only considering its log-ratio m_{a→b} = log(λ_{a→b}(1) / λ_{a→b}(0)). Following this notation, and using (x)_+ = max(0, x), it can be shown [2] that the messages from the variable nodes to the function nodes introduced above simplify, and the message-passing framework reduces to the following updates:

    m^{(t+1)}_{X_{ii'}→f_i} = (1 − α) p_{ii'} + Σ_{jj'} (α Q_{ii'jj'} + m^{(t)}_{X_{jj'}→c_{ii'jj'}})_+ − (max_{k≠i} m^{(t)}_{X_{ki'}→g_{i'}})_+    (2)

    m^{(t+1)}_{X_{ii'}→g_{i'}} = (1 − α) p_{ii'} + Σ_{jj'} (α Q_{ii'jj'} + m^{(t)}_{X_{jj'}→c_{ii'jj'}})_+ − (max_{j'≠i'} m^{(t)}_{X_{ij'}→f_i})_+    (3)
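As a minimal illustration of this scheme, the sketch below implements the degenerate MWM case (α = 1), where the framework coincides with max-product belief propagation for matching [4]; the matrix formulation, helper names and toy weights are ours:

```python
import numpy as np

def _max_excluding_self(M):
    """For each entry M[i, j], the max of row i over all columns j' != j."""
    order = np.argsort(M, axis=1)
    rows = np.arange(M.shape[0])
    top1, top2 = M[rows, order[:, -1]], M[rows, order[:, -2]]
    out = np.repeat(top1[:, None], M.shape[1], axis=1)
    out[rows, order[:, -1]] = top2   # the max itself is excluded from its own row max
    return out

def mwm_bp(w, iters=50):
    """Max-product BP sketch for maximum weight matching (alpha = 1 case).
    A[i, j] (resp. B[i, j]) is the log-ratio message sent by variable x_{ij}
    toward the row (resp. column) matching constraint."""
    A = np.zeros_like(w)
    B = np.zeros_like(w)
    for _ in range(iters):
        # each message: local weight minus the best competing incoming message
        A, B = w - _max_excluding_self(B), (w.T - _max_excluding_self(A.T)).T
    return A.argmax(axis=1)          # row i matched to its best column

w = np.array([[3.0, 1.0], [1.0, 2.0]])
print(mwm_bp(w))                     # -> [0 1], the optimal matching
```

The competing-max minus the best alternative is exactly the (·)_+-style structure of updates (2) and (3), restricted to the node-similarity term.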

Solution assignment
After each message-passing iteration, the algorithm must compute the current best solution based on the updated messages. In their work, Bayati et al. [2] proposed several mechanisms, called "rounding strategies". Unfortunately, all of them require solving an instance of the MWM problem. Even worse, according to the authors, the best rounding strategy requires solving two MWM problems exactly. Though this can be done in reasonable time, performing this computational step after each iteration seriously slows down the algorithm.
To overcome this issue, we propose another simple assignment procedure based on the current estimated "max-marginals". In fact, following the notation introduced in Section 3.1, and referring to the computation rules of Pearl [22], we may estimate the max-marginal distribution of each variable node as the product of its local factor and all its incoming messages:

    µ_{X_{ii'}}(x_{ii'}) ∝ c_{ii'}(x_{ii'}) µ_{f_i→X_{ii'}}(x_{ii'}) µ_{g_{i'}→X_{ii'}}(x_{ii'}) Π_{jj'} µ_{c_{ii'jj'}→X_{ii'}}(x_{ii'})

After t iterations, we may deduce the current best assignment from the sign of each max-marginal log-ratio log(µ^{(t)}_{X_{ii'}}(1) / µ^{(t)}_{X_{ii'}}(0)). Note that this mechanism may result in a partial mapping. Therefore, after the last iteration, i.e. when the updates converge or reach the maximum number of iterations, we propose to enhance the resulting assignment with less confident matches by solving a MWM problem on the estimated max-marginal log-ratios of the unmatched nodes X_{ii'}.
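A sketch of this assignment procedure follows; the mutual-argmax tie-breaking rule and the greedy completion used in place of the final MWM solve are simplifying assumptions of ours:

```python
import numpy as np

def assign_from_marginals(r):
    """r[i, j]: estimated max-marginal log-ratio of variable x_{ij}.
    Keep (i, j) when its log-ratio is positive and mutually maximal
    (one simple way to avoid conflicting positive entries)."""
    mapping = {}
    for i in range(r.shape[0]):
        j = int(r[i].argmax())
        if r[i, j] > 0 and int(r[:, j].argmax()) == i:
            mapping[i] = j
    return mapping

def complete(r, mapping):
    """Extend the partial mapping with the remaining highest log-ratio
    pairs (a greedy stand-in for the final MWM step)."""
    out = dict(mapping)
    free_i = set(range(r.shape[0])) - set(out)
    free_j = set(range(r.shape[1])) - set(out.values())
    for s, i, j in sorted(((r[i, j], i, j) for i in free_i for j in free_j),
                          reverse=True):
        if i in free_i and j in free_j:
            out[i] = j
            free_i.remove(i); free_j.remove(j)
    return out

r = np.array([[ 2.0, -1.0,  0.0],
              [ 0.5,  1.5, -2.0],
              [-1.0,  0.2,  0.1]])
partial = assign_from_marginals(r)   # {0: 0, 1: 1}: confident matches only
full = complete(r, partial)          # {0: 0, 1: 1, 2: 2}
```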

Auction-based ε-complementary slackness
A well-known problem of the max-product algorithm when running on loopy graphical models is that it is not guaranteed to converge. In fact, it may fall into infinite loops and oscillate between a few states [19]. Therefore, most implementations include a mechanism that enforces convergence [6,13]. In their work, Bayati et al. [2] propose a damping factor to mitigate the updates over iterations. Once this damping is sufficiently low, the message updates become insignificant, and the algorithm converges.
In our work, we propose another mechanism based on the concept of ε-complementary slackness [5]. This relaxation was originally proposed for the Auction algorithm to address MWM instances that admit multiple optimal solutions. The idea is to prevent the saturation of the complementary slackness with a small constant margin. This scheme not only breaks ties and ensures convergence but also provably finds the optimal solution for a small enough ε [5]. Furthermore, for larger values, it generally provides near-optimal assignments in much less computation time. Though very similar in its MWM version (α = 1), our model is quite different from an Auction algorithm in the general case. In order to adapt the idea of ε-complementary slackness to our message-passing scheme, we propose to modify updates (2) and (3) by introducing a small margin ε in their competing terms m^{(t)}_{X_{ki'}→g_{i'}} and m^{(t)}_{X_{ij'}→f_i}. In our experiments, this mechanism shows to strongly favor message convergence and thus reduces the number of required iterations. More importantly, this scheme tends to improve the overall final assignment score in many cases. However, this relaxation suffers from an important drawback: the value of the introduced ε must be chosen carefully. If set too small, the mechanism cannot fully play its part and the algorithm may reach the maximum number of iterations before converging. On the contrary, if ε is too high, the algorithm tends to converge too quickly to a poor local optimum. In his work, Bertsekas [5] proposes an iterative method, called ε-scaling, to properly set up the relaxation. The idea consists in repeatedly decreasing ε after the messages converge, until it reaches a value small enough to provide an optimal solution. In our work, we suggest the opposite scheme. The model starts with a rather small ε that helps to softly break local ties. Then, as the algorithm iterates, we propose to raise the relaxation value each time the messages have not improved the current objective function for a few iterations. As ε rises, the messages are more and more likely to escape their local optimum and to fall into a better one. As soon as the current assignment improves, ε is set back to its original value, so that the new local solutions can be carefully explored.
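The proposed schedule can be sketched as follows; the values of eps0, patience and growth are illustrative assumptions, not constants from the paper:

```python
def adapt_epsilon(eps, improved, stall, eps0=0.5, patience=5, growth=2.0):
    """Adaptive relaxation schedule: raise eps after `patience` iterations
    without improvement, reset it to eps0 as soon as the objective improves.
    Returns the updated (eps, stall) state for the next iteration."""
    if improved:
        return eps0, 0            # objective improved: resume a careful search
    if stall + 1 >= patience:
        return eps * growth, 0    # no progress: raise eps to escape the optimum
    return eps, stall + 1

eps, stall = 0.5, 0
eps, stall = adapt_epsilon(eps, improved=False, stall=4)  # stalled -> eps doubled
```

Other rising schemes (additive increments, objective-dependent growth) would fit the same interface.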

Evaluation
The proposed evaluation of our method, named QBinDiff, is twofold. We first analyze its performance as a NAP solver. To do so, we submit the exact same problems to several state-of-the-art solvers and compare the computed alignment scores, without any consideration of the underlying purpose of the problem instance. Then, we evaluate the relevance of our solution for addressing the binary diffing problem. To this end, we compare the resulting mappings to ground truth assignments and evaluate each matching method with respect to accuracy metrics. In all our experiments, we ran our method with default parameters: ε = 0.5, and a maximum of 1000 iterations.

Benchmark experiments
We compare our method to four state-of-the-art NAP solvers and their associated benchmarks (see Table 1): PATH [27], NetAlign [2], Natalie 2.0 [11], and Final [28]. All these solvers have been configured with their default parameters. In order to analyze the ability of each solver to provide proper solutions both in terms of node similarity and edge overlap, we tested different values of the trade-off parameter α. Notice that some problems include a similarity score matrix with several zero entries. In some models (ours, NetAlign, Natalie), those entries are considered as infeasible matches, whereas they are legal correspondences in others. The former models thus optimize the problem on a subset of all possible one-to-one mappings.
Our results show that our approach outperforms or nearly ties the other existing methods on every problem (see Table 2). It appears to provide better results on sparse graphs, while it may compute slightly suboptimal assignments on the densest one (1EWK-1U19). It also seems to be the best suited to performing diffing at different arbitrary settings of the trade-off parameter α, even in the degenerate MCS case (α = 0), unlike most other evaluated methods.
In terms of computing time, as expected, QBinDiff takes much less time to approximate the NAP than NetAlign. Regarding the other solvers, both Natalie and Final run within comparable time, while PATH tends to be very expensive and may become prohibitive for larger problem instances. Note that it was not able to provide a solution to the Flickr-Lastm problem when α = 0.25 within 8 days and was considered timed-out. Of course, these timings depend on the quality of the implementation and should only be considered with respect to their order of magnitude. We used the implementations provided by the authors of the other methods.

Binary Diffing experiments
In a second set of experiments, we propose to evaluate the relevance of our model for addressing the Binary Diffing problem. Given two binary executables A and B, this problem consists in retrieving the correspondence between the functions of A and those of B that best describes their semantic differences. This problem can be reduced to a network alignment problem over the call graphs of A and B.
The results of the alignment should be compared to some ground truth, which we computed as follows. We first downloaded the official repository of a program, then compiled the different available versions using GCC v7.5 for the x86-64 target architecture with the -O3 optimization level. Once extracted, each binary was stripped to remove all symbols, then disassembled using IDA Pro v7.2, and finally exported into a readable file with the help of BinExport. Only the plain text functions determined during the disassembly process are considered in the problem statement. For each program, assuming that this extraction protocol provided us with n different versions, we propose to evaluate our method by diffing all the n(n − 1)/2 possible pairs of different binaries. For all these diffing instances, we finally had to extract the ground truth assignments. We proceed in two steps. We first manually determine what we think to be the function mapping that best describes the modifications between two successive program versions. This is done considering the binary symbols as well as the explicit commit descriptions. Then, we deduce all the remaining pairwise diffing assignments by extrapolating the mappings from version to version. Formally, if we encode the mapping between A_1 and A_2 into a boolean matrix M^{A_1→A_2} such that M^{A_1→A_2}_{ii'} = 1 if and only if function i in A_1 is paired with function i' in A_2, then our extrapolating scheme simply consists in computing the diffing correspondence between A_k and A_n as the boolean matrix product:

    M^{A_k→A_n} = M^{A_k→A_{k+1}} M^{A_{k+1}→A_{k+2}} ⋯ M^{A_{n−1}→A_n}

We applied this extraction protocol to three well-known open source programs, namely Zlib, Libsodium and OpenSSL, from which we collected respectively 18, 33 and 17 different binary versions, and therefore 153, 528 and 136 diffing instances. Statistics describing our evaluation dataset are given in Table 3.
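This extrapolation amounts to a boolean matrix product over the chain of successive-version mappings; a minimal sketch (the toy matrices are ours):

```python
import numpy as np

def compose(M_ab, M_bc):
    """Compose two one-to-one boolean mappings: function i of the first
    version is paired with function k of the last one iff it is paired
    with some j in the intermediate version that is itself paired with k."""
    return (M_ab.astype(int) @ M_bc.astype(int)).astype(bool)

# M12: versions A1 -> A2 (function 3 disappears); M23: A2 -> A3 (a swap)
M12 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=bool)
M23 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=bool)
M13 = compose(M12, M23)   # function 1 -> 2', function 2 -> 1', function 3 unmatched
```

Functions deleted or heavily rewritten at any intermediate version naturally drop out of the composed mapping, which keeps the extrapolated ground truth conservative.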
As a baseline, we chose to compare to the two most common diffing tools: BinDiff and Diaphora. BinDiff uses a matching algorithm close to VF2 [8], originally introduced to approximate the MCS, whereas Diaphora proposes a different greedy assignment strategy that first matches the most similar functions and then searches for potential correspondences among the remaining ones. This mapping mechanism is known to provide 1/2-approximate solutions to the MWM problem [9]. Note that, in order to compare with NAP solvers, and because binary diffing generally favors recall over precision, QBinDiff is designed to produce a complete assignment and does not include a mechanism to limit the mapping of very unlikely correspondences during computation.
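For reference, this greedy 1/2-approximate strategy can be sketched as follows (function name and toy scores are ours, not Diaphora's actual implementation):

```python
def greedy_matching(scores):
    """Greedy assignment: repeatedly take the highest-scoring pair among
    still-unmatched functions. This rule is a well-known 1/2-approximation
    of the maximum weight matching [9]."""
    match, used_i, used_j = {}, set(), set()
    for s, i, j in sorted(((s, i, j) for (i, j), s in scores.items()),
                          reverse=True):
        if i not in used_i and j not in used_j:
            match[i] = j
            used_i.add(i); used_j.add(j)
    return match

# Greedy can lose up to half the optimal weight: it picks (0, 0) for 3.0,
# while the optimal matching {(0, 1), (1, 0)} has total weight 4.0.
scores = {(0, 0): 3.0, (0, 1): 2.0, (1, 0): 2.0}
print(greedy_matching(scores))   # -> {0: 0}
```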
Our experiments show that QBinDiff generally outperforms the other matching approaches in both alignment score and recall (see Table 4). In fact, our method appears to perform clearly better when diffing programs that differ more, whereas it provides comparable solutions on similar binaries. This highlights that the local greedy matching strategies of both BinDiff and Diaphora are able to provide good solutions on simple cases but generalize poorly to more difficult problem instances. These results should be viewed as promising in the perspective of diffing much more dissimilar binaries.
We reproduced our experiments with different trade-off parameters α in order to estimate which setup should be used so that the optimal assignment meets the ground truth. Our computations suggest that a trade-off around 0.75 is a fair choice, though a slightly higher value could provide satisfying assignments as well, especially when diffing OpenSSL programs.

Limitations
Though our method improves the state-of-the-art, some difficulties remain. A first limitation of our approach concerns the design of the network alignment itself. Indeed, the determination of the trade-off parameter α is subject to arbitrary considerations and may have an important impact on the resulting alignment. Another difficulty of our approach is the determination of the relaxation parameter ε. As previously mentioned, its setting controls a trade-off between the quality of the solution and the speed at which the messages converge. While the proposed solution gives very satisfactory results, other rising schemes could be used to adapt the parameter to the current solution during computation. We leave these investigations to future work.
Finally, our formulation is limited by the memory consumption associated with the quartic matrix Q. Though the proposed model makes it possible to significantly reduce the problem size by limiting the solution set to the most probable correspondences, this relaxation inevitably induces information loss, especially for large graphs where the candidate set must be restricted more aggressively. In practice, graphs of several thousands of nodes can be handled efficiently. For larger instances, a solution could consist in first partitioning the graphs into smaller consistent subgraphs, and then performing the matching among them. However, such a partition is not trivial and might result in important alignment errors.

Conclusion
In this paper, we introduced a new algorithm to address the network alignment problem. It leverages a previous model and includes new mechanisms to enforce message convergence as well as to speed up the computation. Moreover, it proved to provide better assignments than the original version in almost all alignment instances.
Our evaluation showed that our approach outperforms state-of-the-art solvers. It also appears to be well suited to computing proper solutions for different trade-offs between node similarity and edge overlap. We finally proposed an application of our method to the binary diffing problem. Our experiments showed that our algorithm provides better assignments than other existing approaches for most diffing instances. This result suggests that the formulation of the binary diffing problem as a network alignment problem is the correct approach.
1877-0509 © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the scientific committee of KES International.

Table 1 :
Description of our benchmark dataset.

Table 2 :
Resulting objective scores of each solver on different benchmark problems. The last column records the average computing time in seconds.

Table 3 :
Description of our binary diffing dataset. The last five columns respectively record the number of different binary versions, the number of resulting diffing instances, the average number of functions and function calls, and the average ratio of conserved functions in our manually extracted ground truth.

Moreover, this trade-off certainly depends on the density of the graphs, since denser graphs mechanically include more potential edge overlaps. Finally, the proposed model is designed to compute complete assignments and therefore does not include any mechanism to optimize the precision. The latter could be done by introducing a penalty term ζ into the node similarity scores, such that p_{ii'} = σ_V(i, i') − ζ. Correspondences with negative similarity scores would then belong to the final mapping only if they induce enough topological similarity.