A Heuristic for Direct Product Graph Decomposition

In this paper we describe a heuristic for decomposing a directed graph into factors according to the direct product (also known as Kronecker, cardinal or tensor product). Given a directed, unweighted graph~$G$ with adjacency matrix Adj($G$), our heuristic searches for a pair of graphs~$G_1$ and~$G_2$ such that $G = G_1 \otimes G_2$, where $G_1 \otimes G_2$ is the direct product of~$G_1$ and~$G_2$. For undirected, connected graphs it has been shown that graph decomposition is"at least as difficult"as graph isomorphism; therefore, polynomial-time algorithms for decomposing a general directed graph into factors are unlikely to exist. Although graph factorization is a problem that has been extensively investigated, the heuristic proposed in this paper represents -- to the best of our knowledge -- the first computational approach for general directed, unweighted graphs. We have implemented our algorithm using the MATLAB environment; we report on a set of experiments that show that the proposed heuristic solves reasonably-sized instances in a few seconds on general-purpose hardware.


Introduction
Decomposition of complex structures into simpler ones is one of the driving principles of mathematics and applied sciences.Everybody is familiar with the idea of integer factorization, a topic that is actively studied due to its number-theoretic as well as practical implications, e.g., in cryptography.The concept of factorization can be applied to other mathematical objects as well, such as graphs.
Once the concept of "graph product" is defined, one may naturally ask whether a graph G can be decomposed into the product of two (or more) smaller graphs.
Graph products are an active area of research because they are involved in a number of computer science applications, such as load balancing in distributed systems [2], network analysis [11], symbolic computation [4], and quantum computing [9].The most common types of graph products that have been investigated in the literature are: Cartesian product, Direct product, Strong product and Lexicographic product.Of these, the Direct product, also known as Kronecker or Cardinal product, is widely used and will be the focus of this paper.For example, the Kronecker product of graphs has applications related to graph coloring, decomposition, embedding and matching [1,8], automata theory [5], concurrency modeling in multiprocessor systems [10] and computer networks [11].We use the symbol × to denote the direct product, e.g., G = G 1 × G 2 .
It has recently been shown in [3] that deciding whether an undirected, unweighted, non-bipartite graph G is composite according to the direct product, i.e., whether there exist nontrivial graphs G 1 , G 2 such that G = G 1 × G 2 , is at least "as difficult as" deciding whether two graphs are isomorphic (a graph is nontrivial if it has more than one node).More formally, the graph isomorphism problem is polynomial-time many-one reducible to the graph compositeness testing problem (the complement of the graph primality testing problem).A consequence of this result is that the graph isomorphism problem for undirected, non-bipartite graphs is polynomial-time Turing reducible to the primality testing problem.It is therefore unlikely that there exists a polynomial-time algorithm for graph factorization according to the direct product, unless graph isomorphism is in P , i.e., graph factorization is graph isomorphism-hard.
The problem of graph factorization has been extensively studied [6] from the theoretical point of view: mathematical properties of graph products are known, as well as factorization algorithms for a few special cases.For example, it is known that prime factorization of an undirected, connected and non-bipartite graph with n nodes and m edges can be found in time O(mn 2 ) [7].However, in [3] it is shown that prime factorization of an undirected, unconnected, non-bipartite graph is graph isomorphism-hard, suggesting that the lack of connectedness plays a major role in making direct product primality testing and factorization harder.
Despite the large volume of theoretical work, the problem of graph factorization has not received yet much attention from the experimental research community.Indeed, to the best of our knowledge, no implementation of graph factorization algorithms for general directed graphs is available.The problem is exacerbated by the fact that the existing algorithms only work on special kinds of graphs, and it is not known whether they can be generalized to arbitrary directed graphs, or whether different algorithms exist for general graphs.In this paper we begin to bridge the gap between theory and practice by proposing a heuristic for direct-product factorization of general graphs: given a directed, unweighted graph G with n nodes and two positive integers n 1 , n 2 such that n = n 1 × n 2 , our algorithm looks for two nontrivial graphs G 1 , G 2 with n 1 and n 2 nodes, respectively, such that G = G 1 × G 2 , provided that such graphs exist.Our heuristic is based on gradient-descent local search.As such, it is not guaranteed that it finds a solution, even if one exists; however, we carried out a set of computational experiments that show that our algorithm does find a valid solution quickly in many cases.
To the best of our knowledge, the heuristic described in this paper is the first computational solution to the problem of factorization of general unweighted graphs, i.e., graphs whose structure is not subject to any constraint.
A directed graph G = (V, E) is described as a finite set V of nodes V = {v 1 , . . ., v n } and a finite set of edges E ⊆ V × V , where an edge e ∈ E is an ordered pair e = (u, v), u, v ∈ V ; an edge of the form (v, v) is called self loop or simply loop.Given a graph G, V (G) and E(G) are the set of nodes and edges of G, respectively.We denote by G 1 ∪ G 2 the disjoint union of graphs G 1 and G 2 , i.e., the graph with node set Four types of graph products have been investigated in the literature: Cartesian product, Direct product, Strong product and Lexicographic product.In all cases, the product of two graphs G 1 , G 2 is a new graph G whose set of nodes is the Cartesian product of V (G 1 ) and V (G 2 ): For the Cartesian product G 1 □ G 2 , the edge set is defined as For the Lexicographic product G 1 • G 2 , the edge set is defined as In this paper we are concerned with the Direct product, also known as Kronecker or cardinal product.The direct product of two graphs G 1 , G 2 is denoted as G = G 1 × G 2 , where the edge set is defined as 1 shows an example of the direct product of two graphs G 1 , G 2 .The edge set of an unweighted graph G can be represented as an adjacency matrix Adj(G).If G has n nodes, its adjacency matrix M = Adj(G) is an n × n binary matrix, where m ij = 1 if and only if (v i , v j ) ∈ E. We denote the set of binary matrices of size n × m as B n×m , B = {0, 1}.
The algorithm described in the paper relies on the fact that the direct product G 1 × G 2 of two graphs can be expressed in terms of the Kronecker product of their adjacency matrices [6,14].Given an n × m matrix B and a p × q matrix C, the Kronecker product A = B ⊗ C is an np × mq matrix that is composed of n × m blocks of size p × q, each block being the product of elements of B and the whole matrix C: Note that if B, C are binary matrices, then A = B ⊗ C will be as well.The relation between the direct product of graphs and the Kronecker product of their adjacency matrices is expressed by the following Lemma.
where P is a suitable permutation matrix.
We recall that a permutation matrix P ∈ B n×n is a square binary matrix with exactly a single 1 on each row and column.In other words, the adjacency matrix of the direct product of G 1 , G 2 is equal to the Kronecker product of the adjacency matrices of G 1 and G 2 , up to a rearrangement (relabeling) of the nodes of the resulting graph G 1 × G 2 .
For example, for the graphs in Figure 1 we have: The adjacency matrix of graph G P Permutation matrix × The direct graph product operator ⊗ The Kronecker matrix product operator ϕ(A B, C) Metric that shows "how well" A can be written as the Kronecker product B ⊗ C (lower is better) In this case the permutation matrix is the identity matrix, since we are assuming the mapping A nontrivial graph G is a graph with more than one node (|V (G)| > 1).We say that a graph G is prime according to a given graph product ⊙ if G is nontrivial and G = G 1 ⊙ G 2 implies that either G 1 or G 2 are trivial, i.e., one of them has exactly one node.
Finally, we introduce the concept of block matrix that will be used in the following to discuss several subroutines of our heuristic.Definition 2.2 Let us consider a binary matrix A made of dimB × dimB submatrices (or blocks) C ij of size dimC × dimC each.Let µ be the average number of ones over all blocks1 , and let s ij be the number of ones in C ij ; we say that A is a block matrix if and only if for each block C ij : In other words, a block matrix is made of blocks such that each block is either entirely zero, or has a number of 1s that is strictly above the average number of 1s over all blocks.Fore ease of clarity, some examples of block matrices are provided in Figure 3f and in Figure 4c.
Table 1 summarizes the notation used in this paper.

A Heuristic for Direct Product Factorization
In this section we describe a heuristic for searching for a nontrivial decomposition G = G 1 × G 2 of a directed, unweighted graph G.The heuristic is not guaranteed to find such a decomposition, even when one exists.We assume that the size (number of nodes) of G 1 and G 2 is known; if this is not the case, then by the definition of direct product it must hold that n = n 1 × n 2 where n, n 1 , n 2 are the number of nodes of G, G 1 , G 2 respectively; therefore, both n 1 and n 2 must be nontrivial divisors of n.Since the number of divisors of n is bounded from above by n, we can brute-force all combinations of n 1 , n 2 , whose number grows polynomially w.r.t.n.Therefore, the algorithm performance would be impacted by (at most) a polynomial factor in n.
The basic idea is to take advantage of Lemma 2.1 to find a suitable permutation matrix P so that the (permuted) adjacency matrix of G is in block form and can therefore be written as the Kronecker product of two smaller matrices B and C; these matrices can then be interpreted as the adjacency matrices of two graphs G 1 , G 2 with the property that G = G 1 × G 2 .The problem is equivalent to finding a graph G ′ isomorphic to G such that Adj(G ′ ) = B ⊗ C; this is not surprising, given the relationship between graph isomorphism and compositedness testing proven in [3].
One more ingredient is needed to complete the heuristic, i.e., a way to decide whether a binary square matrix can be written as the Kronecker product of two smaller matrices.This is an instance of the more general nearest Kronecker product (NKP) problem [13,12] is minimized, according to some norm ∥ • ∥.If A is the Kronecker product of B and C, then the minimum norm would be zero (ϕ(A, B, C) = 0).The NKP problem (2) can be solved by computing a Singular Value Decomposition (SVD) of a suitably reshaped and permuted version of matrix A (see [13] for details).
To recap: to decompose a directed, unweighted graph G with adjacency matrix M = Adj(G), we need to find a permutation of M, say P ⊺ MP, and two square matrices M 1 = Adj(G 1 ) and M 2 = Adj(G 2 ) of given sizes such that: or "no solution found" P = Identity matrix of size n × n; repeat for i = 1, . . ., K do Let P i be obtained from P by exchanging a random pair of rows/columns; return "no solution found"; end With a slight abuse of notation, in the following we write ϕ(A) to denote the minimum value of ϕ(A, B, C) that can be achieved by suitably choosing B, C; in other words, ϕ(A) shows how well matrix A can be expressed as the Kronecker product of smaller matrices B, C of given sizes.ϕ() is a lower-is-better metric: if ϕ(A) = 0, the matrix A is in Kronecker form.
Note that we must impose the additional constraint that M 1 and M 2 must be binary matrices, which is not guaranteed by the algorithm for Kronecker factorization described in [13].For this reason we developed a novel algorithm for solving the NKP problem on binary matrices, described below.
Algorithm 1 shows a very high-level overview of the proposed heuristic.The algorithm uses the gradient-descent technique to find the permutation P such that the matrix P ⊺ AP can be decomposed according to the Kronecker product.
Matrix P is built incrementally, starting from the identity permutation, by exchanges of rows and columns.At each step, the algorithm tries to decrease the value of ϕ(P ⊺ AP) so that either the value becomes zero (in which case we have found a decomposition of the input graph), or no optimal permutation is found within the allotted number of steps.
To limit the search space, at each step the algorithm generates K random permutations of the current matrix P; for each permutation P i , the algorithm solves the NKP problem by identifying two binary matrices B i , C i such that is minimized.If the minimum value r j is not zero (within a user-defined tolerance tol ), the process is repeated starting from the "best" permutation matrix P j found so far.If no solution is found after some maximum number of iterations, the procedure assumes that matrix A can not be expressed as the Kronecker product of two matrices of size n 1 × n 1 and n 2 × n 2 .
Despite its appealing simplicity, a direct implementation of Algorithm 1 is not effective in solving the graph decomposition problem.First of all, gradient-descent procedures may become stuck in a local minimum, with the result that they fail to find a global optimum (in our case, the global optimum is a permutation P j for which r j = 0, provided that such a permutation exists).Another issue, already stated above, is that we need to solve a modified version of the NKP problem, in which the factors B and C must be binary matrices.

Algorithm Details
We now provide the detailed description of our graph decomposition heuristic, where both these issues will be addressed.The complete pseudocode of the heuristic is provided in the following, while a MATLAB implementation along with the source code can be downloaded from https://github.com/calderonil/kron.
As already shown in (1), the Kronecker product A = B ⊗ C of two binary matrices B and C has a block structure: if b ij = 0 then the corresponding block b ij C will contain only zeros.The algorithm consists of a main procedure -alternateLocalSearch -and by three subroutines.The main procedure, detailed in Algorithm 2, swaps rows/columns and evaluates if the swaps lead to an improvement (reduction) in the value of function ϕ() (2).We employ two different metrics var and frob to evaluate ϕ().Let s ij be the number of 1s in the block C ij , and let var denote the variance of the number of 1s of each block, i.e., σ 2 = V ar{s ij | 1 ≤ i, j ≤ dimB }.If matrix A approaches block matrix form, then the variance decreases because, intuitively, A has "almost full" and "almost empty" blocks (refer to Definition 2.2).Indeed, a matrix already in block form is made of two types of blocks: empty blocks, that contain only 0s, and full blocks that are nonzero and share a considerable amount of 1s.Conversely, the metric frob measures how much the nonempty blocks of A fit the Kronecker form, i.e., whether or not the content of the blocks is the same; this metric is based on the one used in [13] for solving the MKP problem: the rows of the same block of size dimC × dimC of A are concatenated, and become a row of a new matrix F. F is then multiplied with F ⊺ element by element.Metric frob is the squared sum of the elements of this product.
The local search procedure starts with metric var, and switches back and forth to frob when no significant improvement in the value of ϕ() is observed for a given number of iterations.The use of two different metrics is motivated by the empirical observation that it speeds up convergence towards the solution, and helps to escape local minima.When we find a swap that improves the value of function ϕ(), we apply the swap to the current permutation matrix P and proceed with the next iteration.The main loop terminates when the number of iterations exceeds a user-defined threshold, or when a decomposition is found.
Before swapping rows and columns, matrix A is processed by the kronGrouping procedure (Algorithm 3).

Algorithm 3: Kron grouping
Input: A Output: groupedA for i=1 to dimA do for j=1 to dimA do if i==j then MR(i,j)=MC(i,j)=0; else /* Simil computes the similarity between two rows/cols using the dot product.To weight zero-based and one-based similarity, both A and 1-A are considered.*/ MR(i,j) = simil(A(i,:),A(j,:)) + simil( 1-A(i,:),1-A(j,:)); MC(i,j) = simil(A(:,i),A(:,j)) + simil( 1-A(:,i This procedure tries to permute A in such a way that it becomes similar to a block matrix.The intuition is that A should be permuted in such a way that it consists of blocks that contain as many 1s as possible, and others that are all zero.kronGrouping tries to exchange rows/columns in such a way that nonempty blocks (i.e., blocks of A that contain 1s) either lose or acquire 1s.To this end, kronGrouping evaluates the similarity between rows/columns; given two vectors v = (v 1 , . . ., v n ) and w = (w i , . . ., w n ), the similarity of v and w is a value that is proportional to the number of elements for which v i = w i .More specifically, as understandable from the last for loop of the algorithm, six different permutations are tested, each one relying on a linear combination of rows similarity and columns similarity.Refer to Figure 2 for an example.This information is used to produce a permutation that groups similar rows/columns in the same submatrix.
At each iteration of the main loop, alternateLocalSearch performs several operations before starting to test each possible swap.First of all, the procedure checks whether A is a block matrix.If not, subroutine outsiders is applied to A (Algorithm 4).While kronGrouping tries to rearrange the whole matrix at a glance, outsiders follows a fine tuning perspective and derives an ordered list of swaps that seem to be more convenient.This procedure allows to detect those rows and columns that are evidently misplaced with respect to the block structure we desire to reach.
To derive the swaps list, outsiders verifies which block of A should preferably be filled (i.e. it has a number of 1s above average) and which block should preferably be emptied.This condition always occurs when A is not a block matrix, as the number of elements set to one in nonempty blocks is unbalanced.The density of each block to be filled is considered as well: the best swap is supposed to be the one that moves 1s from a block to be emptied replacing zeros in a block to be filled.However, moving elements in blocks that already have a high density would produce overfilled blocks that are unlikely to appear in a feasible factorization.When the procedure terminates the bestSwaps list is prepended to the randomized list of each legal swap (baseSwaps) that is used in the main loop as fallback.If the list of swaps is empty, there are no elements set to one in each of the blocks that are likely to be emptied.Thus, according to Definition 2.2, the matrix is now a block matrix.The basic principles of outsiders procedure are outlined in Figure 3.At the beginning, outsiders identifies which block of A should receive 1s (because it already has more 1s than average) and which block should lose 1s.Then, two matrices are produced, WR and WC.WR contains the number of 1s found in each portion of a row of length dimC -the portion of the row corresponding to a given block.If the block should receive 1s, the number of zeros is counted instead.The second matrix WC contains the same information computed column-wise.A third matrix MS is computed in order to find the best swap: the procedure loops on each row/column on a block-by-block basis and multiplies the elements of WR and WC corresponding to blocks that are in opposite conditions, i.e. one receives and the other loses 1s.The larger the values of MW, the better.This product is tuned according to the density of the block that receives 1s.However, moving A is processed by the onionSearch subroutine.
The procedure onionSearch is divided in two parts (Algorithm 5).First, the block matrix is rearranged to push the maximum feasible number of blocks to the top-left corner, then, starting from the top-left corner, it performs a submatrix factorization in a layer-by-layer fashion.This second step is designed to adopt the first nonempty block as template and to reproduce the same layout in each other nonempty block.
Following the principles depicted in Figure 4, a binary block matrix is derived from A through permutations so that the dot product with a weight matrix is maximized.To find a convenient permutation, random swaps are applied following the same principle of the main local search.When the procedure reaches the 75% of the optimum (cornerizeOptVal ), the block matrix is deemed to be sufficiently rearranged in a top-left fashion and the procedure terminates.Weights are defined to increase the maximum number of filled blocks in those layers of the onion that are processed first.We use the term "onion" to refer to the fact that the factorization works by examining matrix A layer-by-layer.When A reaches this form, we also refer to it as a cornerizedMatrix (see Algorithm 2 for reference).
We should now imagine the block matrix as being made of dimB layers, starting from the topleft corner.The first layer contains one block, the second layer contains three blocks, and so forth.The last layer is made of blocks that belong to the last row and column of the block matrix; refer to Figure 4b.The second part of onionSearch tries to find a feasible factorization for submatrices following a layer-by-layer perspective, as shown in Figure 5.The local search follows the same principles of alternateLocalSearch but uses metric frob only.Moreover, rows/columns swaps are allowed in a limited range only.Specifically, swaps are limited to those indices corresponding to rows and columns that fall in the unsorted layers.
It is important to note that the single block B 11 representing the first layer is implicitly factorized as B 11 = I 1 ⊗ B 11 .The special case where B 11 is empty does not represent an issue.In this case, indeed, the procedure treats the first layer of the onion as already sorted and proceeds to the next layer.B 11 is thus used as template during the local search performed on the successive 2 × 2 blocks submatrix, composed of four blocks.As rows and columns swaps may only range in the second -unsorted -layer, the sole feasible solution is the one that permutes each block in accordance to the one serving as template.If the local search performed on the 2 × 2 blocks submatrix succeeds, onionSearch steps to the third layer.Blocks falling in the first and in the second layer will stay untouched as they are sorted already.The procedure steps forward until the last layer is processed.If the local search on the last layer succeeds, the problem is globally solved.Otherwise, alternateLocalSearch proceeds testing swaps randomly to find a feasible solution.
Finally, to handle local minima, random permutations are applied in two more cases.First, it is possible that the list of best swaps returned by the outsiders procedure is exhausted without improvements.If this condition occurs through several iterations, a random permutation is applied to 45% of rows/columns of A. Second, if alternateLocalSearch fails to reduce the value of function ϕ() after a maximum number of iterations (i.e. both best swaps and default swaps are exhausted), a complete random permutation is applied (the local search restarts from scratch).
Note that, each time alternateLocalSearch applies a random permutation to escape from local minima, independently of its extent, the procedure kronGrouping is called again.

Computational experiments
We implemented the heuristic described in Section 3 in the MATLAB programming environment; the source code is available online at https://github.com/calderonil/kron.The program provides a graphical user interface (Figure 6) that allows the user to generate a random binary matrix A of given size dimA = dimB × dimC ; the program then looks for a permutation P such that A = P ⊺ (B ⊗ C)P where B and C are binary matrices of size dimB × dimB and dimC × dimC , respectively.A is generated starting from random P, B, C, which are then "forgotten" and must therefore be computed from scratch; this ensures that a factorization of A always exists.The user must provide the densities ρ B , ρ C (fraction of 1s) of matrices B, C; the density ρ A of A will therefore be ρ A = ρ B × ρ C .Although our application allows the user to impose additional constraints on the graph represented by the matrices (e.g., non-bipartitedness and/or non-connectedness), we did not impose any constraint for the experiments described in this section.
The combinations of parameters used in the experiments are shown in Table 2.The density (fraction of 1s) of A, B, C are denoted as ρ A , ρ B and ρ C , respectively.We consider several Table 2: Each row summarizes the performance of our heuristic on 10 different initial random matrices A; for each matrix A we executed the heuristic 10 times, each one starting from a random permutation of A. α = dimC /dimB is the ratio between the sizes of C and B. t ′ avg is the average of the minimum execution times of each different problem.The time spent on instances for which no solution has been found have been excluded from average, minimum and maximum execution times.combinations of sizes dimB , dimC according to a parameter α that denotes the ratio between the size of C and B (α = dimC /dimB ).Intuitively, high values of α denote that the size of the factors of A are different.As this assumption does not cause a loss of generality, we also set dimC ≥ dimB .As dimB also defines the number of blocks of A, limiting this parameter improves the algorithm speed, as many procedures loop through the blocks of A.
Each row of the table summarizes the result of one run: a run consists of 10 random problem instances with the chosen parameters.Since our heuristic is sensitive to the initial conditions, for each instance we consider 10 initial random permutations of the matrix A; this is equivalent to fixing B, C and choosing ten different random permutation matrices P. Therefore, each row summarizes 10 × 10 executions of our program.
The program has been executed on a desktop PC with an Intel Xeon CPU running at 3.30 GHz with 16 GB of RAM running Windows 10 (Matlab R2021a).Both a single instance mode and a batch mode are provided; the former solves a single problem, while the latter generates a set of random instances with the same parameters (dimA, dimB , dimC ).To collect the results we show in the following, the program was executed in batch mode.For a single instance, our program executes alternateLocalSearch for at most 500 iterations (see the main loop of Algorithm 2); when no feasible factorization is found after the maximum allowed number of iterations, the procedure stops and the failure count is incremented.
For each run we collect five metrics: the percentage of executions at the end of which no factorization was found (failure %); the minimum and maximum execution times in seconds (t min and t max , respectively); the average execution time across all executions that did find the optimal solution (t avg ); the average of the minimum execution time for each different problem (t ′ avg ).More precisely, t ′ avg is computed as follows: 1.For each combination of parameters, we generate 10 random instances consisting of matrices A, B, C of the appropriate size and content; 2. For each instance, we generate 10 random permutation matrices P; we get 10 variations A = P ⊺ (B ⊗ C)P that only differ by the permutation P 3. We compute the minimum time to get the solution of the 10 variations from the previous step; 4. t ′ avg is the average of the minimum times computed at the previous step.
t ′ avg is useful because there is a significant variance across the execution times of the 10 variations of the same problem (see below).Indeed, as stated above, there might be a huge variation in the time required to factorize a matrix A = P ⊺ (B ⊗ C)P depending on the permutation P. Therefore, t ′ avg is the average time to solve a problem instance if we were able to parallelize the heuristic across 10 independent execution units, so that the first one that gets the solution stops the computation.As can be observed from Table 2, the minimum time to compute a solution is very low (a few seconds) for all problems.The largest matrix A (size 300 × 300) can be factored in t min = 6.71s.It should be observed that as the density of A increases, the problem becomes more difficult for 0 20,000 40,000 60,000 80,000 our heuristic as witnessed by the increasing fraction of unsolved instances.We also observe that there is a large variability between the minimum and maximum times required to solve an instance; indeed we observe that the gap between t min and t max becomes more than an order of magnitude, especially for large matrices A that are decomposed into factors of unbalanced sizes (α = 3).This is due to the fact that the heuristic is sensitive to both the permutation P, and to the sequence of swaps that are applied during the computation (the swaps are in part generated pseudo-randomly).
To strengthen the study, we also carried out a set of tests on wider matrices, following the very same approach.In this second set of tests, we kept the density fixed.Five groups of experiments were considered, ranging from 200 × 200 matrices to 625 × 625 matrices, with respect to the size of A. For each group of experiments, the size of A was fixed, varying the ratio α.Experiment results are listed in Table 3.
We now turn our attention to the study of the dependence of the execution time on the number of edges of the graph whose adjacency matrix is A; the number of edges is simply the number of 1s in A. To this aim, we performed 90 additional experiments (9 separate problem instances with 10 random initial permutations each).We set ρ A = 0.25 and dimB = 10, and we increased the dimension of C according to the following steps: dimC = {10, 15, 20, 25, 30, 35, 40, 50, 62}.As shown in Figure 7, t avg and t ′ avg grow more or less linearly with respect to the number of edges of A until dimC reaches the value of 50 (i.e., until A has approximately 65000 edges).When this value is exceeded, the execution time increases at a faster rate.

Conclusions and Future Works
In this paper we presented a heuristic for decomposing a directed graph into factors according to the direct product: given a directed, unweighted graph G with adjacency matrix Adj(G), our heuristic searches for a pair of graphs G 1 and G 2 such that G = G 1 × G 2 , where G 1 × G 2 is the direct product of G 1 and G 2 .The heuristic proposed in this paper represents -to the best of our knowledge -the first computational approach for general directed, unweighted graphs.We provided a MATLAB implementation that we used to run a set of computational experiments to assess the effectiveness of our approach.Our implementation can factorize a graph of size 300×300 in a few seconds.In a few worst-case scenarios the time grows to a few minutes, and is due to the fact that our heuristic is sensitive to the structure of the input; although it may fail to find a solution, in our experiments we observed failures in just a few instances.The complete source code of the Matlab implementation can be downloaded from https://github.com/calderonil/kron.
We are planning to extend the heuristic along two directions: first, to handle weighted graphs instead of just unweighted ones; second, to compute the approximate Kronecker decomposition of unweighted graphs, where (a suitable permutation of) the input matrix A can be expressed as a Kronecker product of two smaller matrices B and C, plus an additional binary error term E, i.e., A = P ⊺ (B ⊗ C)P + E.

Figure 2 :
Figure 2: kronGrouping procedure.In this example we show the rows/cols selection with pivot = 1 according to the best permutation found by the algorithm.(a) and (b):The current pivot (yellow) is compared with each other row and column of the matrix.(c) For each pivot, the dimC − 1 rows and columns that are most similar to it are selected as pivot neighbours in the current permutation (best viewed in color) and the permutation is finally applied.Es evident from (c), the resulting matrix is similar to a block matrix as desired.Please note that in this figure and the following ones, black squares stand for an element set to zero, and white squares stand for an element set to one.

Figure 3 :
Figure3: outsiders procedure.The procedure detects those blocks that should gain or lose 1s.In (a) blocks that are going to acquire 1s are highlighted.In (b) we show the best swap is among rows/columns 2 and 24.The main loop will use this swap as first, resulting in an improvement with respect to the current metric.As such, the swap is effectively performed (c).The function outsiders is called again at the next iteration and it selects (15, 16) as best swap (e).The swap is then applied; the matrix is now divided in blocks (f).

Figure 4 :
Figure 4: onionSearch procedure, part 1.The binary block matrix EF (a) is permuted to maximize the dot product performed against the weight matrix W (b). When the local search reaches 75% of the optimum, the block matrix is deemed to be sufficiently rearranged in a top-left fashion and the procedure terminates (c).As evident from (b) -where onion layers' boundaries are highlighted in red-weights are set up to push the maximum number of filled blocks in those layers of the onion that are processed first.

Figure 5 :
Figure 5: onionSearch procedure, part 2.An example of the onionSearch procedure.The block matrix is factorized layer-by-layer.The first layer, composed by a single block B 11 , is implicitly factorized as B 11 = I 1 ⊗ B 11 (a).It is thus used as template during the local search performed on the 2 × 2 blocks submatrix, composed of four blocks (b).As rows and columns swaps may only range in the second layer, the sole feasible solution is the one that permutes each block in accordance to the one serving as template.The procedure steps forward through the third (c), fourth (d) and fifth layer (e).As the local search on the last layer succeeds, the problem is globally solved.The solved factors are shown in (f).

Figure 6 :
Figure 6: A sample snapshot of the MATLAB application implementing the heuristic.

1 • 10 Figure 7 :
Figure 7: Dependence of the execution time on the number of 1s of A. Given ρ A = 0.25 and dimB = 10, this chart shows the linear growth for tavg and t ′ avg depending on the size of C. Showing the matrix dimension in the x-axis could be misleading, the number of edges in A is used instead.

Table 1 :
Summary of notation.
t min t min t min t ′ Size of C Size of A failure %

Table 3 :
This table shows another set of experiments conducted on bigger matrices.Each row summarizes the performance of the heuristic on 10 different initial random matrices A; for each matrix A we executed the heuristic 10 times, each one starting from a random permutation of A. Five groups of experiments are listed, ranging from 200 × 200 matrices to 625 × 625 matrices, with respect to the size of A. For each group of experiments, the size of A was fixed, varying the ratio α.Each group includes the α = 1 case, i.e., the case where the size of B and C is the same.