Alternating Minimization Methods for Solving Multilinear Systems

Recent works on the multilinear system $\mathcal{A}\mathbf{x}^{m-1} = \mathbf{b}$, with an order-$m$, dimension-$n$ tensor $\mathcal{A}$ and a vector $\mathbf{b}$ of dimension $n$, have been motivated by applications in data mining, numerical PDEs, tensor complementarity problems, and so on. In this paper, we propose an alternating minimization method for solving such systems and present several randomized versions of this algorithm in order to improve its performance. The provided numerical experiments show that our methods are feasible for any tensor $\mathcal{A}$ and outperform some existing ones in the same setting.


Introduction
Many higher-dimensional problems in scientific computing come down to solving the following system of equations:

$$\mathcal{A}\mathbf{x}^{m-1} = \mathbf{b}, \tag{1}$$

where $\mathcal{A} = (a_{i_1 i_2 \dots i_m})$ is a tensor in $\mathbb{R}^{[m,n]}$, the set of all $m$th-order $n$-dimensional tensors over the real field, the right-hand side $\mathbf{b}$ is a vector in $\mathbb{R}^n$, and $\mathcal{A}\mathbf{x}^{m-1}$ is a vector in $\mathbb{R}^n$ whose $i$th entry is defined by

$$\left(\mathcal{A}\mathbf{x}^{m-1}\right)_i = \sum_{i_2, \dots, i_m = 1}^{n} a_{i i_2 \dots i_m} x_{i_2} \cdots x_{i_m}.$$

We call (1) a multilinear system since it is linear in $\mathbf{x}$ in each mode. As in the linear-systems case, we call (1) a nonhomogeneous multilinear system if the right-hand side is nonzero; otherwise, it is a homogeneous one.
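To make the entrywise definition concrete, the following minimal sketch evaluates $\mathcal{A}\mathbf{x}^{m-1}$ by contracting $\mathbf{x}$ into every mode of $\mathcal{A}$ but the first. The experiments reported later use MATLAB with the Tensor Toolbox; this NumPy version, with our own helper name `ax_m_minus_1`, is purely illustrative:

```python
import numpy as np

def ax_m_minus_1(A, x):
    """Evaluate A x^{m-1}: contract x into every mode of A except the
    first, so entry i equals the sum over i_2..i_m of
    a[i, i_2, ..., i_m] * x[i_2] * ... * x[i_m]."""
    T = A
    for _ in range(A.ndim - 1):
        T = T @ x              # contracts the current last axis with x
    return T

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.random((n,) * m)       # order-m, dimension-n tensor
x = rng.random(n)
b = ax_m_minus_1(A, x)         # a right-hand side consistent with x
print(b.shape)                 # (n,)
```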
Many problems in scientific computing, including data mining [1], the numerical solution of nonlinear PDEs [2][3][4][5][6], tensor complementarity problems [7], and higher-dimensional statistics [8], can be characterized via the multilinear systems mentioned above. To be precise, by using the finite difference method, the discretization of a nonlinear PDE with boundary conditions on $\partial\Omega$ can be reformulated in the form of (1) [3]; see Section 4 for details. In particular, if $\mathcal{A}$ is a Z-tensor, it has been shown in [7] that the sparsest solution of the tensor complementarity problem can be obtained by solving the following relaxed form:

$$\min \|\mathbf{x}\|_1 \quad \text{s.t.} \quad \mathcal{A}\mathbf{x}^{m-1} = \mathbf{b}, \; \mathbf{x} \geq 0.$$
Recently, several works concerning the multilinear system (1) have appeared for the case in which the coefficient tensor $\mathcal{A}$ is a structured tensor. In [3], Ding and Wei showed that this system has a unique positive solution when the coefficient tensor $\mathcal{A}$ is a nonsingular M-tensor and $\mathbf{b}$ is a positive vector, and then generalized the classical iteration methods for linear systems and the Newton method for nonlinear equations to the multilinear case. In addition, a homotopy method was presented by Han [10] for finding the unique positive solution. Very recently, for symmetric M-tensor systems, Li et al. [11] and Xie et al. [12], respectively, proposed splitting methods for (1) based on their counterparts in matrix theory and on tensor methods. Besides, an algorithm based on the fast Fourier transform was proposed by Xie et al. [13] for the tensor system (1) when $\mathcal{A}$ is a circulant tensor.
In this paper, we continue to study the multilinear system (1) in the case that the coefficient tensor $\mathcal{A}$ is a general one. In contrast to the structured case mentioned above, there is, as far as we are aware, little work on (1) with a general tensor $\mathcal{A}$. It is generally difficult to derive solvability conditions for a general multilinear system as in [3], so we instead study the least-squares problem associated with (1), that is,

$$\min_{\mathbf{x}} \frac{1}{2}\left\| \mathcal{A}\mathbf{x}^{m-1} - \mathbf{b} \right\|^2. \tag{7}$$

Throughout this paper, the symbol $\|\cdot\|$ denotes the 2-norm of a vector or matrix. We call (7) the tensor least-squares (TLS) problem corresponding to the multilinear system (1). The approaches presented in this paper are quite different from the ones mentioned above. By the definition of tensor-vector multiplication, we rewrite the TLS problem (7) as the following optimization problem (see Section 3 for details):

$$\min_{\mathbf{x}_2, \dots, \mathbf{x}_m} \frac{1}{2}\left\| \mathcal{A} \bullet_2 \mathbf{x}_2 \bullet_3 \mathbf{x}_3 \cdots \bullet_m \mathbf{x}_m - \mathbf{b} \right\|^2 \quad \text{s.t.} \quad \mathbf{x}_2 = \mathbf{x}_3 = \cdots = \mathbf{x}_m. \tag{8}$$

This is a multiblock optimization problem with consensus constraints.
Alternating minimization iteration is popular for solving large-scale optimization problems; see, for example, [14,15]. It reduces a nonlinear problem to a sequence of linear subproblems by updating only one variable while the others are fixed. We apply the alternating minimization strategy to (8), followed by a projection onto the consensus constraints. We call the resulting scheme the alternating minimization method for solving the TLS problem, denoted by AMM (i.e., Algorithm 1). Although the alternating minimization scheme and its various variants are widely exploited, their convergence analyses are commonly difficult [16]. We leave the convergence analysis of the AMM as an open problem. It should be pointed out that we proposed an alternating iterative method for (1) in [17]; that method is of ADMM type, so the method proposed here is very different from it.
Notably, a revived interest in the design and analysis of algorithms for solving large-scale optimization problems is to add randomness to the update order of the multiblock updates, which helps to reduce the iteration complexity and improve the convergence; see, for example, [18][19][20] and the references therein. As variants of the AMM, we present two randomized iterations for solving the TLS problem (7); see Algorithms 2 and 3 for details. Besides, Algorithm 4 is another modified version of the AMM. At present, we have not found a similar iterative method for solving multilinear systems.

The contributions of this paper are as follows: (1) we consider the least-squares problem of the multilinear system (1) by transforming it into an optimization problem; (2) we propose an alternating iterative method for this optimization problem, which is easy to carry out in parallel; (3) as modifications of the alternating iterative method, we apply randomization techniques to accelerate it, yielding Algorithms 2 and 3. The numerical results show that the randomized methods may have better performance under suitable initial iterative vectors.

The outline of this paper is as follows: in Section 2, we recall some notation and operations and then give a short review of the classical alternating minimization method. In Section 3, we establish some alternating minimization methods for the TLS problem (7), which are variants of the classical one. In Section 4, some numerical experiments are performed to illustrate the efficiency and feasibility of the proposed algorithms. Finally, we conclude this paper.

Notation and Operations.
We briefly describe the notation for tensors and tensor operations used in this paper. Scalars are denoted by lowercase letters, for example, $a$; vectors by boldface lowercase letters, for example, $\mathbf{a}$; matrices by boldface capital letters, for example, $\mathbf{I}$, the identity matrix of appropriate size; and tensors by calligraphic script letters, for example, $\mathcal{A}$. The set of all $N$th-order $I_1 \times I_2 \times \cdots \times I_N$-dimensional tensors over the real field is denoted by $\mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$; in particular, the set of all $m$th-order $n$-dimensional tensors over the real field is denoted by $\mathbb{R}^{[m,n]}$.
We need the following concepts related to tensors; see, for example, [21] for details.
Definition 1 (tensor-vector multiplication). Let $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ be a tensor and $\mathbf{v} \in \mathbb{R}^{I_n}$ a vector; then, the $n$-mode (vector) product of the tensor $\mathcal{T}$ with the vector $\mathbf{v}$, denoted by $\mathcal{T} \bullet_n \mathbf{v}$, is an $I_1 \times \cdots \times I_{n-1} \times I_{n+1} \times \cdots \times I_N$ tensor, elementwise,

$$\left(\mathcal{T} \bullet_n \mathbf{v}\right)_{i_1 \dots i_{n-1} i_{n+1} \dots i_N} = \sum_{i_n=1}^{I_n} t_{i_1 i_2 \dots i_N} v_{i_n}.$$

In particular, if $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ and $\mathbf{v} \in \mathbb{R}^{I_2}$, then

$$\mathcal{T} \bullet_2 \mathbf{v} = \sum_{i_2=1}^{I_2} v_{i_2}\, \mathcal{T}(:, i_2, :),$$

where $\mathcal{T}(:, i_2, :)$ stands for the $i_2$th lateral slice of the tensor. In other words, the 2-mode (vector) product of a tensor of order 3 and a vector of appropriate size is a matrix, namely, the linear combination of the lateral slices with the associated entries of the vector as coefficients.
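As an illustration (a hedged NumPy sketch with 0-based axes; the helper name is ours), the $n$-mode vector product of Definition 1 and the lateral-slice interpretation agree:

```python
import numpy as np

def mode_vec_product(T, v, n):
    """n-mode (vector) product T ._n v of Definition 1 (axis n is
    0-based): contract axis n of T with v, dropping that axis."""
    return np.tensordot(T, v, axes=([n], [0]))

rng = np.random.default_rng(1)
T = rng.random((2, 3, 4))
v = rng.random(3)

# The 2-mode product (axis 1 in 0-based indexing) of an order-3 tensor
# is the linear combination of lateral slices T[:, i2, :] weighted by v:
M1 = mode_vec_product(T, v, 1)
M2 = sum(v[i2] * T[:, i2, :] for i2 in range(3))
print(np.allclose(M1, M2))     # True
```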
By the definition of the tensor-vector multiplication, one can rewrite (1) in the form

$$\mathcal{A} \bullet_2 \mathbf{x} \bullet_3 \mathbf{x} \cdots \bullet_m \mathbf{x} = \mathbf{b}. \tag{10}$$

Definition 2 (tensor-matrix multiplication). For the tensor $\mathcal{T}$ in Definition 1 and a matrix $A \in \mathbb{R}^{J_n \times I_n}$, the $n$-mode (matrix) product, denoted by $\mathcal{T} \times_n A$, is an $I_1 \times \cdots \times I_{n-1} \times J_n \times I_{n+1} \times \cdots \times I_N$ tensor, elementwise,

$$\left(\mathcal{T} \times_n A\right)_{i_1 \dots i_{n-1} j_n i_{n+1} \dots i_N} = \sum_{i_n=1}^{I_n} t_{i_1 i_2 \dots i_N} a_{j_n i_n}.$$

Definition 3 (matricization/unfolding). For the tensor $\mathcal{T}$ in Definition 1, the mode-$n$ matricization (unfolding) of the tensor $\mathcal{T}$, denoted by $T_{(n)}$, is an $I_n \times \prod_{k \neq n} I_k$ matrix whose $(i_n, j)$-entry is defined by $T_{(n)}(i_n, j) = t_{i_1 i_2 \dots i_N}$, with

$$j = 1 + \sum_{k=1, k \neq n}^{N} (i_k - 1) \prod_{s=1, s \neq n}^{k-1} I_s.$$
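A sketch of the mode-$n$ unfolding of Definition 3 (0-based axes, helper name ours; the column index runs over the remaining modes with the earliest index varying fastest, i.e., Fortran order):

```python
import numpy as np

def unfold(T, n):
    """Mode-n matricization T_(n): axis n becomes the rows; the other
    axes, kept in increasing order, are flattened into columns with the
    earliest remaining index varying fastest, matching the column index
    j of Definition 3."""
    order = [n] + [k for k in range(T.ndim) if k != n]
    return np.transpose(T, order).reshape(T.shape[n], -1, order="F")

T = np.arange(24).reshape(2, 3, 4)
print(unfold(T, 1).shape)      # (3, 8)
```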

The Classical Alternating Minimization Method.
Consider the following multiblock minimization problem:

$$\min_{\mathbf{x}_1, \dots, \mathbf{x}_d} F(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_d),$$

where $F: \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \times \cdots \times \mathbb{R}^{n_d} \rightarrow \mathbb{R}$ is a multivariate function. This problem can be solved by the so-called block nonlinear Gauss-Seidel (B-NGS) iteration [15,22], namely,

$$\mathbf{x}_p^{(k+1)} = \arg\min_{\mathbf{x}_p \in \mathbb{R}^{n_p}} F\left(\mathbf{x}_1^{(k+1)}, \dots, \mathbf{x}_{p-1}^{(k+1)}, \mathbf{x}_p, \mathbf{x}_{p+1}^{(k)}, \dots, \mathbf{x}_d^{(k)}\right), \quad p = 1, 2, \dots, d,$$

which reduces to the ALS method if each embedded subproblem is a least-squares problem; see [16,21,23,24] for some recent developments and applications.
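Schematically (a sketch only; `argmin_block` is a placeholder for the per-block minimizer, which must be supplied for the concrete $F$ at hand), one B-NGS sweep reads:

```python
def bngs_sweep(blocks, argmin_block):
    """One block nonlinear Gauss-Seidel sweep: blocks = [x_1, ..., x_d];
    argmin_block(p, blocks) must return the minimizer over x_p of
    F(x_1, ..., x_p, ..., x_d) with the other blocks held fixed at
    their current (partially updated) values."""
    for p in range(len(blocks)):
        blocks[p] = argmin_block(p, blocks)
    return blocks
```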

Alternating Minimization Methods
This section is devoted to the derivation of the alternating minimization method (AMM), as well as some variants, for the solution of the multilinear system (1).
To facilitate the implementation of the algorithms, we equivalently convert (10) into the following form:

$$\mathcal{A} \bullet_2 \mathbf{x}_2 \bullet_3 \mathbf{x}_3 \cdots \bullet_m \mathbf{x}_m = \mathbf{b}, \quad \mathbf{x}_2 = \mathbf{x}_3 = \cdots = \mathbf{x}_m.$$

Correspondingly, the optimization problem (8) can be written as

$$\min_{\mathbf{x}_2, \dots, \mathbf{x}_m} f(\mathbf{x}_2, \dots, \mathbf{x}_m) \quad \text{s.t.} \quad \mathbf{x}_2 = \mathbf{x}_3 = \cdots = \mathbf{x}_m, \tag{15}$$

where

$$f(\mathbf{x}_2, \dots, \mathbf{x}_m) = \frac{1}{2}\left\| \mathcal{A} \bullet_2 \mathbf{x}_2 \bullet_3 \mathbf{x}_3 \cdots \bullet_m \mathbf{x}_m - \mathbf{b} \right\|^2.$$

For any $p \in \{2, 3, \dots, m\}$, updating one block $\mathbf{x}_p$ while fixing the remaining ones at the common iterate $\mathbf{x}^{(k)}$, we have

$$\mathbf{x}_p^{(k+1)} = \arg\min_{\mathbf{x}_p} \left\| A_{\neq p}^{(k)} \mathbf{x}_p - \mathbf{b} \right\|, \tag{16}$$

where

$$A_{\neq p}^{(k)} = \mathcal{A} \bullet_2 \mathbf{x}^{(k)} \cdots \bullet_{p-1} \mathbf{x}^{(k)} \bullet_{p+1} \mathbf{x}^{(k)} \cdots \bullet_m \mathbf{x}^{(k)} \in \mathbb{R}^{n \times n}. \tag{17}$$

The above procedure constitutes the alternating minimization method for solving the TLS problem (7), which is summarized in Algorithm 1. We have the following comments in order.
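A hedged NumPy sketch of one AMM iteration as reconstructed in (16) and (17) (all blocks are updated from the same iterate $\mathbf{x}^{(k)}$, hence in parallel, and then averaged; the helper names are ours):

```python
import numpy as np

def contract_except(A, x, p):
    """Form the n-by-n matrix A_{!=p} of (17): contract x into every
    mode of A except mode 0 and mode p (0-based)."""
    T = A
    for axis in range(A.ndim - 1, 0, -1):  # high to low keeps axis ids valid
        if axis != p:
            T = np.tensordot(T, x, axes=([axis], [0]))
    return T

def amm_step(A, b, x):
    """One AMM iteration: solve the m-1 linear least-squares subproblems
    (16) independently (least-norm solutions), then take the consensus
    average as in step 2 of Algorithm 1."""
    m = A.ndim
    updates = [np.linalg.lstsq(contract_except(A, x, p), b, rcond=None)[0]
               for p in range(1, m)]
    return sum(updates) / (m - 1)
```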

Remark 1
(a) This algorithm allows the large-scale TLS problem (7) to be solved via several small-size subproblems, which may be computed in an efficient and perhaps highly parallel manner. In other words, Algorithm 1 is more amenable to parallelization than the existing methods proposed in, for instance, [3,11,13].

(b) Every subproblem contained in Algorithm 1 is a linear least-squares problem, and the least-norm solution to the subproblem in (16) is

$$\mathbf{x}_p^{(k+1)} = \left(A_{\neq p}^{(k)}\right)^{\dagger} \mathbf{b},$$

where the superscript † stands for the Moore-Penrose inverse of a matrix [25]. Additionally, for large-scale linear least-squares subproblems, one can apply iterative methods (e.g., LSQR, GMRES [25]).

(c) The update in step 2 corresponds to the consensus restrictions in (15). In other words,

$$\mathbf{x}^{(k+1)} = \frac{1}{m-1} \sum_{p=2}^{m} \mathbf{x}_p^{(k+1)}.$$

From Algorithm 1, one can observe that the main computational cost lies in forming the matrices $A_{\neq p}^{(k)}$ in (17) for $p = 2, 3, \dots, m$. Hence, the computational complexity of this algorithm is of order $O(n^{m-1})$. Notably, if the tensor $\mathcal{A}$ in (1) has a CP decomposition with CP-rank $r$, the computational complexity of the above algorithm decreases to $O((m-1)^2 n r)$.
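For comment (b), the least-norm solution can be computed via the pseudoinverse or, for large subproblems, iteratively; a small sketch of the two routes (LSQR from SciPy; the matrix `M` merely stands in for $A_{\neq p}^{(k)}$):

```python
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(2)
M = rng.random((50, 50))       # stands in for A_{!=p}^{(k)}
b = rng.random(50)

x_pinv = np.linalg.pinv(M) @ b                    # Moore-Penrose route
x_lsqr = lsqr(M, b, atol=1e-12, btol=1e-12)[0]    # iterative route
print(np.linalg.norm(x_pinv - x_lsqr))            # small difference
```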

Just as mentioned in the Introduction, a revived interest in the design and analysis of algorithms for solving large-scale optimization problems is to add randomness to the updating order of the multiblock variables. This randomization technique can be utilized in Algorithm 1. In fact, by updating $\mathbf{x}_p^{(k+1)}$ for a stochastically chosen $p \in \{2, 3, \dots, m\}$, we obtain the first randomized AMM. In Algorithm 2, we randomly update $\mathbf{x}_p$ by (16); in this procedure, all the block coordinates are updated after one loop. Furthermore, if we randomly update $\mathbf{x}_p$ only for $p$ in a subset of $\{2, 3, \dots, m\}$, we obtain the second randomized AMM.
It follows that, in Algorithm 3, some coordinates may not be updated within one loop, which is the difference between the two randomized algorithms; both sampling flavors are sketched below. In the numerical experiments, we will see that the randomized algorithms presented above can indeed have better performance than the original AMM, that is, Algorithm 1.
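The two sampling flavors can be sketched as follows (the exact sampling schemes of Algorithms 2 and 3 are design choices; this snippet, with our own helper names, only illustrates the distinction):

```python
import numpy as np

def blocks_random_order(m, rng):
    """Algorithm 2 flavor: every block index p in {2, ..., m} (0-based:
    1..m-1) is visited exactly once per loop, in a random order."""
    return rng.permutation(np.arange(1, m))

def blocks_random_subset(m, rng):
    """Algorithm 3 flavor: only a random subset of the block indices is
    visited, so some blocks may be skipped within one loop."""
    ps = np.arange(1, m)
    return ps[rng.random(m - 1) < 0.5]

rng = np.random.default_rng(3)
print(blocks_random_order(6, rng), blocks_random_subset(6, rng))
```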
In addition, we note that, in Algorithm 1, $\mathbf{x}^{(k+1)}$ is the average of all the coordinates $\mathbf{x}_p^{(k+1)}$ with $p = 2, 3, \dots, m$. That is to say, the average is taken only after all the coordinates have been updated. If, instead, we take the average immediately after each coordinate is updated, until all the coordinates have been visited, we obtain another modification of Algorithm 1.
Certainly, we can also add randomness to Algorithm 4 as in Algorithm 2; one possible reading of the immediate-averaging scheme is sketched below.
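A sketch of this reading (an assumption on our part; the precise update rule of Algorithm 4 admits other readings, and `solve_block` is a placeholder for the least-norm solver of (16)):

```python
def amm_immediate_average(blocks, solve_block):
    """Algorithm 4 flavor (one reading): after each single-block update,
    immediately reset all blocks to their current average, instead of
    averaging once at the end of the sweep as in Algorithm 1."""
    d = len(blocks)                         # d = m - 1 blocks
    for p in range(d):
        blocks[p] = solve_block(p, blocks)  # least-norm solution of (16)
        avg = sum(blocks) / d               # consensus average, taken at once
        blocks = [avg.copy() for _ in blocks]
    return blocks[0]
```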
We should mention that the alternating minimization methods presented above are numerically feasible and efficient for solving the multilinear system (1) or the associated least-squares problem, and are very different from some existing methods; see, for example, [3,11,12]. However, we have not yet found a feasible approach to completing their convergence analyses, which will be studied in future work.
As is well known, the Newton method is an efficient approach for solving nonlinear equations [22]. In [3], the authors established the Newton iteration for solving the multilinear system (1) in the case that the coefficient tensor $\mathcal{A}$ is a symmetric nonsingular M-tensor. When the tensor $\mathcal{A}$ in (1) is not symmetric, we can derive a similar Newton-type iteration method (called NS-Newton for convenience) following [3].
Suppose that the tensor system (1) is solvable; we rewrite it as the nonlinear equation

$$g(\mathbf{x}) := \mathcal{A}\mathbf{x}^{m-1} - \mathbf{b} = \mathbf{0}.$$

Notice that the first-order derivative of $g$ is

$$\nabla g(\mathbf{x}) = \sum_{p=2}^{m} A_{\neq p}(\mathbf{x}), \quad A_{\neq p}(\mathbf{x}) = \mathcal{A} \bullet_2 \mathbf{x} \cdots \bullet_{p-1} \mathbf{x} \bullet_{p+1} \mathbf{x} \cdots \bullet_m \mathbf{x}.$$

Then, the iterations are as follows:

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \left[\nabla g\left(\mathbf{x}^{(k)}\right)\right]^{-1} g\left(\mathbf{x}^{(k)}\right).$$
If the tensor $\mathcal{A}$ is symmetric over the last $m-1$ modes, we obtain $\nabla g(\mathbf{x}) = (m-1) A_{\neq p}(\mathbf{x})$ for any $p \in \{2, 3, \dots, m\}$. In particular, the above algorithm reduces to the Newton iteration proposed in [3] when $\mathcal{A}$ is a symmetric nonsingular M-tensor; in this case, the matrix $\nabla g(\mathbf{x}^{(k)})$ is nonsingular as well. It should be emphasized that $\nabla g(\mathbf{x}^{(k)})$ may be singular for a general tensor; in that case, one can replace the inverse with the Moore-Penrose inverse.
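A hedged sketch of one NS-Newton step, reusing the `contract_except` helper from the AMM sketch above and using the Moore-Penrose inverse in place of the inverse, as suggested for the general case:

```python
import numpy as np

def ns_newton_step(A, b, x):
    """One NS-Newton step for g(x) = A x^{m-1} - b = 0, with Jacobian
    grad g(x) = sum_p A_{!=p}(x), which reduces to (m-1) A_{!=p}(x)
    in the symmetric case."""
    m = A.ndim
    J = sum(contract_except(A, x, p) for p in range(1, m))
    g = contract_except(A, x, 1) @ x - b    # A x^{m-1} - b
    return x - np.linalg.pinv(J) @ g
```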
Our numerical tests show that this is an efficient algorithm. In contrast to Algorithm 5, the first three algorithms proposed previously can be implemented in parallel when updating $\mathbf{x}_p^{(k+1)}$, which indicates that these methods are promising for solving large-scale multilinear problems.

Numerical Experiments
In this section, we provide some numerical experiments to illustrate the feasibility and efficiency of the algorithms proposed in the previous section. All codes were implemented in MATLAB (version 2016a) with machine precision 1.0e−16 on a personal computer with an Intel(R) Core(TM) i5-4200M@2.5GHz CPU and 4.00 GB of memory. All tensor-vector multiplications were computed using the Tensor Toolbox Version 2.6 (http://www.sandia.gov/~tgkolda/TensorToolbox/index-2.6.html).
For convenience, we use "IT" and "CPU" to denote the number of iterations and the elapsed CPU time (in seconds), respectively. Let "ERR" and "RES" represent the iteration error and the residual, respectively, defined by

$$\mathrm{ERR} = \left\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\right\|, \quad \mathrm{RES} = \left\|\mathcal{A}\left(\mathbf{x}^{(k)}\right)^{m-1} - \mathbf{b}\right\|,$$

where $\mathbf{x}^{(k)}$ stands for the $k$th iterate. For consistent multilinear systems, the stopping rule is that RES < ε := 1.0e−03 or that the number of iterations exceeds the maximum $k_{\max} = 10000$. For the inconsistent case, the termination criterion is that ERR < ε or that the number of iterations exceeds $k_{\max}$. In addition, the numbers of iteration steps, the CPU times, and the residuals reported in the tables below are the averages of 5 runs from different initial iteration vectors unless otherwise stated. In particular, the symbol "--" in the tables means that the iteration did not reach the desired solution before the termination rule was satisfied.

Example 1. Let the tensor $\mathcal{A}$ in (1) be given by

$$\mathcal{A} = s\mathcal{I} - \mathcal{B},$$

where $\mathcal{B}$ is a symmetric nonnegative tensor, $\mathcal{I}$ is the identity tensor, and $s = (1 + \alpha) \cdot \max_{1 \leq i \leq n} (\mathcal{B}\mathbf{e}^{m-1})_i$ with $\mathbf{e} = \mathrm{ones}(n, 1)$ and $\alpha = 0.01$, and let the right-hand side $\mathbf{b}$ be chosen such that $\mathcal{A}(\mathbf{x}^*)^{m-1} = \mathbf{b}$. The tensor $\mathcal{A}$ in this example is a symmetric nonsingular M-tensor, so it follows from Theorem 3.2 of [3] that the multilinear system (1) has the unique solution $\mathbf{x}^*$. We compared our algorithms (i.e., Algorithms 1-5) with the Newton method presented in [3] for different choices of the order $m$ and dimension $n$ of the tensor $\mathcal{A}$. Notably, the latter is a feasible method only for symmetric nonsingular M-tensor systems and is denoted by S-Newton for simplicity.
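A hedged sketch of how such a test tensor can be generated (the precise $\mathcal{B}$ used in the experiments is elided in our description; the symmetrized random tensor below, following the recipe in [3], is one plausible instance):

```python
import math
import numpy as np
from itertools import permutations

rng = np.random.default_rng(4)
m, n, alpha = 3, 4, 0.01

# A symmetric nonnegative tensor B (symmetrized random tensor).
B = rng.random((n,) * m)
B = sum(np.transpose(B, perm) for perm in permutations(range(m)))
B /= math.factorial(m)

# s = (1 + alpha) * max_i (B e^{m-1})_i with e = ones(n, 1).
Be = B
for _ in range(m - 1):
    Be = Be @ np.ones(n)
s = (1 + alpha) * Be.max()

# Identity tensor: ones on the super-diagonal, zeros elsewhere.
I = np.zeros((n,) * m)
for i in range(n):
    I[(i,) * m] = 1.0

A = s * I - B                  # a symmetric nonsingular M-tensor
```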

The numerical results in Table 1 show that, as modifications of the AMM method, the R-AMM-I and R-AMM-II methods generally require fewer iterations than the original one; the price paid is that both of them cost more CPU time. For the S-Newton method, the number of iteration steps is not necessarily smaller than those of the AMM, R-AMM-I, and R-AMM-II methods, but its elapsed CPU time is always less than the others'. The numerical results in Table 2 illustrate that one can improve the performance of the R-AMM-III method by adding randomized acceleration. Nevertheless, the improved version still needs more iteration steps and CPU time. Hence, in the sequel, we only discuss the performance of Algorithms 1-3.
Since the ADMM-type method proposed by Liang et al. in [17] is an alternating iterative method and the SOR method presented in [26] is an extension of the well-known SOR method for solving linear systems, we also compared these two methods with the ones proposed here; see Table 3 for details.
This table shows that the AMM and its two randomized versions, as well as the S-Newton method, outperform the SOR method in both the number of iteration steps and the elapsed CPU time, while the ADMM-type method sometimes takes fewer iteration steps for larger $m$.
To illustrate the behavior of Algorithms 1-3 and 5, we plotted the residual versus the iteration step $k$ in Figure 1 for $m = 4, 8$ and $n = 2$ with the initial iterative vector $\mathbf{x}^{(0)} = [1, 2]^T$. From these figures, one can see that the convergence of these algorithms is nonlinear.
Next, the proposed iterative methods are applied to two numerical examples arising from realistic applications.
Since the coefficient tensor $\mathcal{A}$ in (22) in this example is a nonsymmetric nonsingular M-tensor, we compared the three proposed iterative algorithms with the NS-Newton method, the SOR method, and the ADMM-type method for a randomly chosen initial iterative vector $\mathbf{x}^{(0)}$ and reported the numerical results in Table 4.
This table reflects that the numbers of iteration steps of the three AMM-type methods proposed in this article and of the NS-Newton method are identical and that their CPU times are almost the same. Relatively, the SOR method performs better than the ADMM-type method.
By applying the "optimize then discretize" approach to the differential equation (24), it can be discretized as a third-order Bellman equation [8], which takes the form of (1).

In this example, we let $\sigma(x, \lambda) = 0.2$, $\alpha(x) = 2 - x$, $\eta(x) = 0.04$, $u(x, \lambda) = 0.04\lambda$, $\beta(x) = 1 + x$, $g(x) = 1$, and $\lambda = -1$. The coefficient tensor $\mathcal{A}$ here is a nonsymmetric nonsingular M-tensor. We compared the three proposed iterative algorithms with the NS-Newton method, the SOR method, and the ADMM-type method for a randomly chosen initial iterative vector $\mathbf{x}^{(0)}$ and reported the numerical results in Table 5. From this table, one can see that the three iteration methods proposed in the present paper, as well as the NS-Newton method, have almost the same convergence; they take fewer iteration steps and less CPU time than the SOR method and the ADMM-type method. Notably, the SOR method needs more iteration steps than the ADMM-type method.

Example 4. Let the tensor $\mathcal{A}$ in (1) be given by $\mathcal{A} = \mathrm{tenrand}(v) \in \mathbb{R}^{[m,n]}$ with $v = \mathrm{ones}(1, m) * n$, and let the right-hand side $\mathbf{b}$ be chosen such that $\mathcal{A}(\mathbf{x}^*)^{m-1} = \mathbf{b}$, where $\mathrm{tenrand}(v)$ returns an $m$th-order $n$-dimensional tensor containing pseudorandom values drawn from the uniform distribution and $\mathbf{x}^*$ is the same as that defined in Example 1.

In this example, the tensor $\mathcal{A}$ is randomly generated, so the solution of the multilinear system (1) may not be unique. We compared the proposed algorithms (i.e., Algorithms 1-3) with the NS-Newton method and the ADMM-type method for randomly produced initial vectors $\mathbf{x}^{(0)} = \mathbf{z}_1^{(0)} = \mathrm{rand}(n, 1)$. The obtained results are displayed in Table 6, which indicates that all the tested methods converge before the number of iteration steps reaches $k_{\max}$. In Figure 2, we also display the convergence behavior of Algorithms 1-3 and 5 for $m = 4, 8$ with $n = 2$.

From the above numerical tests, we claim that Algorithm 1 is an efficient method for solving consistent multilinear systems (1); moreover, the first two modified versions of this algorithm are also feasible. In particular, one advantage of these algorithms is that they can be carried out in parallel, which indicates that they may be implementable for large-scale tensor systems, while Newton-type methods are not easy to run in parallel.
Additionally, inconsistent multilinear systems often arise in practice because of errors. In the next example, we consider the least-squares problem associated with the multilinear system (1), where the coefficient tensor $\mathcal{A}$ and the right-hand side $\mathbf{b}$ are chosen randomly.
Example 5. Let the tensor $\mathcal{A}$ in (1) be given by $\mathcal{A} = \mathrm{tenrandn}(v) \in \mathbb{R}^{[m,n]}$ with $v = \mathrm{ones}(1, m) * n$, and the right-hand side $\mathbf{b} = \mathrm{randn}(n, 1) \in \mathbb{R}^n$, where $\mathrm{tenrandn}(v)$ returns an $m$th-order $n$-dimensional tensor containing pseudorandom values drawn from the standard normal distribution.
Since the tensor $\mathcal{A}$ and the right-hand side $\mathbf{b}$ are random, the multilinear system may be inconsistent, so we consider the associated least-squares problem. In view of the bad performance of Algorithm 4, we only compare Algorithms 1-3 and 5 with the SOR method [26] and the ADMM-type method [17].
For randomly chosen initial vectors $\mathbf{x}^{(0)} = \mathrm{rand}(n, 1)$, we again ran 5 trials of the algorithms for $m = 3{:}3{:}18$ (i.e., $m = 3, 6, \dots, 18$) with $n = 2$; the numerical results are listed in Table 7. In the tests, the SOR method did not converge before the number of iteration steps reached $k_{\max}$, so it is not contained in Table 7. From this table, we observe that all the other tested algorithms converge. Relatively, the NS-Newton method often requires the fewest iterations for smaller $m$, while the ADMM-type method performs better than the others for larger $m$. Moreover, the ADMM-type method takes less CPU time than the former when their iteration counts are identical. On the whole, the original AMM method performs better than its two randomized versions.
Next, we present an example in which the entries of the coefficient tensor vary greatly, which leads to ill-posed multilinear systems. For $m = 5$ and $n = 2$, we ran Algorithms 2 and 3 for 1000 randomly chosen initial vectors $\mathbf{x}^{(0)} = \mathrm{rand}(n, 1)$ and displayed the numerical results in Figure 3, in which the x-axis refers to the $k$th initial vector and the y-axis represents the difference in the numbers of iteration steps (left) or in the elapsed CPU time (right) between the two algorithms. Specifically, the red symbol "+" means that the number of iteration steps or the CPU time of Algorithm 2 exceeds that of Algorithm 3, and the green symbol "*" means that they are identical; otherwise, we use the blue circle "°". In Figure 3, the red symbols account for 49.6% of the total (left) and 39.8% of the total (right), and the blue symbols "°" account for 48.1% of the cases (left) and 60.2% of the cases (right). We should mention that the number of iteration steps of Algorithm 1 always exceeded $k_{\max}$ for the chosen initial vectors.

The obtained numerical results indicate that all the algorithms introduced in this paper are feasible and efficient for the tested multilinear systems. In particular, the randomized versions often have better performance than the original one. Notably, many other trials not listed here also show that our algorithms are commonly comparable to existing ones when solving structured multilinear systems. However, there are still many problems to be studied; for instance, although the proposed algorithms are numerically effective, we could not provide their convergence proofs at present. Moreover, for very large $m$, our algorithms may be intractable due to the curse of dimensionality [28]; how to break this curse is an interesting and important direction for future work.

Conclusions
In this paper, we introduced the alternating minimization method, as well as its randomized versions, for solving the least-squares problem corresponding to the multilinear system (1), that is, Algorithms 1-4. The numerical experiments show that the proposed algorithms are feasible and efficient. However, we cannot yet provide convergence proofs for the proposed iterative algorithms; we leave this as an open problem to be considered in future work.
Data Availability
The underlying data are included in the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.