Statistical-mechanical analysis of compressed sensing for Hamiltonian estimation of Ising spin glass

Several powerful machines dedicated to solving combinatorial optimization problems through the Ising-model formulation, such as the D-Wave 2000Q, have been developed. To input a problem into these machines, the unknown parameters of the Ising model must be determined, which is a nontrivial task. It would therefore be beneficial to construct a method that estimates the parameters of the Ising model from several pairs of energy values and spin configurations. In the present paper, we propose a simple method employing $L_1$-norm minimization, which is based on the concept of compressed sensing. Moreover, we analyze the typical performance of our proposed Hamiltonian-estimation method by using the replica method. We also validate our analytical results through several numerical experiments using the alternating direction method of multipliers.


I. INTRODUCTION
Recent years have seen a rapid increase in the development of new types of computers. A representative example is the D-Wave machine (the current system is called the D-Wave 2000Q), which implements quantum annealing [1]. Quantum annealing was invented for solving optimization problems by utilizing quantum fluctuations [2][3][4]. In the ideal procedure of quantum annealing, the quantum adiabatic theorem assures that sufficiently slow control of the quantum fluctuation drives the time-evolved state into the nontrivial ground state of the instantaneous Hamiltonian. The computational time required to attain the ground state is characterized by the energy gap between the ground and excited states [5]. Beyond the scheme of the adiabatic theorem, non-adiabatic approaches have also been proposed in several ways [6][7][8][9]. Recent analysis reveals the possibility of exponential speedup by utilizing the relaxation of excited states into the ground state due to thermal effects. In addition, various attempts beyond the standard quantum fluctuations have been demonstrated to attain quantum speedup by implementing nontrivial quantum effects [10][11][12][13][14]. A machine-learning technique has also been proposed to investigate the spin configuration obtained after quantum annealing [15]. The D-Wave machine can solve combinatorial optimization problems and speedily perform Gibbs-Boltzmann sampling. It implements the Ising model on a superconducting circuit [16][17][18]. Thus, the optimization problems must be written in terms of an Ising Hamiltonian restricted to biases and two-body interactions on a peculiar sparse graph known as the chimera graph. When the Ising Hamiltonian of the optimization problem is already known, the problem can be directly implemented on the D-Wave machine.
Beyond the chimera graph of the D-Wave machine, the generic form of the Ising Hamiltonian is a fully connected model; namely, the interactions are dense. We then employ a specialized, frequently studied technique to embed such an Ising Hamiltonian on the D-Wave machine. However, in most cases, we do not necessarily know how to formulate the optimization problem in terms of an Ising Hamiltonian. In our problem setting, we do not know the detailed form of the cost function but only pairs of spin configurations and energy values of the Ising Hamiltonian. Here we assume that it takes a relatively long time to obtain the value of the cost function for a given input. For unfamiliar readers, consider a representative example: when we investigate the effect of some drugs, the effect appears only after a long time, so we cannot obtain an immediate response for a selected combination of drugs. We must then infer the cost function from only several pairs of inputs and outputs. This is a typical inverse problem of inferring an Ising Hamiltonian from input-output pairs, because the Hamiltonian can be regarded as a function that outputs an energy value from a spin configuration given as input. We further assume that the Ising Hamiltonian has sparse interactions, and thus can be rather easily embedded on the D-Wave machine. The inverse problem considered in the present study is then formulated in the setting of so-called compressed sensing (CS) in signal processing [19]. In CS, we aim at reconstructing an original signal from a small number of observations. In general, recovering the original signal requires sufficiently many observations. However, when the original signal is assumed to be sparse, the $L_1$-norm minimization problem can lead to the exact answer even from a limited number of observations [20,21]. By use of the $L_1$ norm, we can extract several significant influencers from the vast number of combinations hidden in a dataset.
The $L_1$ norm is therefore often utilized in various realms of data analysis [22][23][24][25][26][27]. In terms of Hamiltonian estimation, this means we do not necessarily need a large number of pairs of spin configurations and energy values to infer a small number of sparse interactions.
In the present study, we formulate the Hamiltonian estimation as an $L_1$-norm minimization problem and propose a method to estimate an unknown cost function utilizing $L_1$-norm minimization. In addition, we present its theoretical assessment. As our formulation takes the same form as the well-known problem in CS, it can be evaluated using the replica method [28,29], a sophisticated tool in the field of statistical mechanics [30][31][32][33][34][35] that theoretically characterizes the performance of reconstruction for various problems. To verify our analysis, several numerical experiments are performed, in which we estimate unknown coupling constants in the Ising Hamiltonian by employing the alternating direction method of multipliers (ADMM) [36]. The theoretical limits of reconstruction are then compared with the numerical results. We show that our analysis via the replica method is validated by the numerical experiments.
The remainder of the present paper is organized as follows. Sect. II formulates the Hamiltonian estimation problem and presents a method for solving it within the CS framework. Sect. III analyzes the typical performance of our proposed method through the replica method. In Sect. IV, we show the numerical results of the estimation of the coupling constants using the ADMM and compare them with the analytical results of Sect. III. In Sect. V, we summarize our results and discuss future outlooks.

II. HAMILTONIAN ESTIMATION
We first explain the problem setting of our study. From only a set of pairs of energy values and spin configurations, we estimate the coupling constants of the Hamiltonian. In several unfortunate cases, we cannot directly formulate the cost function of the optimization problem, and we assume that it is hard to obtain the value of the cost function (the energy value). For example, to find the optimal choice of drugs for curing a disease, we need to confirm the effect of each combination, and in general the results cannot be obtained immediately. Therefore, we have only a small number of pairs of the cost-function value and the choice of drugs expressed in binary variables (chosen or not). In such a case, if we can recover the cost function from only a limited number of "hints", we can find the optimal solution by direct optimization of the estimated Ising Hamiltonian.
We formulate the above problem as follows. Let us consider a general fully connected Ising system with N sites. The Ising spins σ_i ∈ {−1, +1} are located at sites i ∈ V, where V is the set of sites in the system. The coupling constants between spins are defined as J_ij for any combination of two sites i and j in V. We assume that the interactions between spins are symmetric and that no self-interaction occurs. The Hamiltonian is assumed to be expressed as

    H(\sigma) = -\sum_{i<j} J_{ij} \sigma_i \sigma_j.    (1)

In the field of information science, the Hamiltonian is called an energy function; its observed value is called the energy, especially in physics. We then obtain a set of M (< N(N − 1)/2) energy values E := (E^{(1)}, E^{(2)}, ..., E^{(M)})^T and an observation matrix of spin configurations S, constructed from the given spin configurations. Here, S is an M × N(N − 1)/2 matrix whose µ-th row collects the pair products of the µ-th configuration σ^{(µ)},

    S_{\mu,(ij)} = -\sigma_i^{(\mu)} \sigma_j^{(\mu)} \quad (i < j),

with the columns ordered as the pairs (1,2), ..., (1,N), (2,3), ..., (N−1,N). We assume that the set of N(N − 1)/2 original coupling constants is expressed as J^0 := (J^0_{12}, ..., J^0_{1N}, J^0_{23}, ..., J^0_{N−1,N})^T, so that the set of energies E is expressed as

    E = S J^0.    (2)

In general, the case in which the number of unknown variables J^0 is larger than the number of linear equations, as in Eq. (2), is an underdetermined system; simply solving Eq. (2) does not yield a unique solution. Hence, a method that can adequately estimate the unknown J only from the given M pairs of energies and spin configurations is desired. To attain this objective, we utilize the concept of CS and impose an assumption of sparsity on J^0. In particular, we assume that the probability distribution of J^0_{ij} is

    P(J^0_{ij}) = (1 - \rho)\,\delta(J^0_{ij}) + \rho\,\mathcal{N}(0, 1),

where δ(x) denotes the Dirac delta function, ρ = 2K/N(N − 1) (0 ≤ ρ ≤ 1) denotes the density of nonzero components in the vector J^0, and N(0, 1) denotes the Gaussian distribution with vanishing mean and unit variance. Note that we do not know which components of J^0 are nonzero.
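As a concrete illustration, the observation model above can be simulated in a few lines. The sizes N, M, and the density ρ below are hypothetical demo values, not taken from the paper:

```python
import numpy as np

# Sketch of the observation model: a sparse coupling vector J0 and the
# M x N(N-1)/2 observation matrix S built from random spin configurations.
# We adopt the sign convention H(sigma) = -sum_{i<j} J_ij sigma_i sigma_j,
# so that E = S @ J0 with S entries -sigma_i sigma_j.
rng = np.random.default_rng(0)
N, M, rho = 20, 60, 0.1
P = N * (N - 1) // 2                      # number of candidate couplings

# Sparse prior: each coupling is zero with probability 1 - rho,
# otherwise drawn from the standard Gaussian N(0, 1).
mask = rng.random(P) < rho
J0 = np.where(mask, rng.standard_normal(P), 0.0)

# Random spin configurations and the corresponding energies.
sigma = rng.choice([-1, 1], size=(M, N))
iu, ju = np.triu_indices(N, k=1)          # enumerate the pairs i < j
S = -(sigma[:, iu] * sigma[:, ju]).astype(float)
E = S @ J0                                # M observed energy values

print(S.shape, E.shape)                   # (60, 190) (60,)
```

Each row of S is a ±1 vector of pair products, so the energies are a linear map of the unknown couplings, which is exactly the CS setting.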
We then formulate the estimate of J^0 as the $L_1$-norm minimization with respect to J := (J_{12}, ..., J_{1N}, J_{23}, ..., J_{N−1,N})^T based on the CS concept. Any candidate estimate must be consistent with the observations,

    S J = E,    (3)

and among all such candidates we select the one with the minimum $L_1$ norm:

    \hat{J} = \arg\min_{J} \|J\|_1 \quad \text{subject to Eq. (3)},    (4)

where ||J||_1 := Σ_{i<j} |J_{ij}| denotes the $L_1$ norm of J. The solution of Eq. (4) under the constraint in Eq. (3) yields an adequate estimate of the unknown J^0 under several conditions on ρ and α = 2M/N(N − 1). The problem in Eq. (4) can be solved using various optimization algorithms, such as coordinate descent [37], the fast iterative shrinkage-thresholding algorithm [38], the augmented Lagrangian method [39,40], or the ADMM [36,41].
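The minimization in Eq. (4) is the basis-pursuit problem, so any linear-programming solver also applies. The sketch below (assuming SciPy is available; the sizes M, P, and sparsity K are hypothetical demo values) uses the standard split J = u − v with u, v ≥ 0:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(S, E):
    """Basis pursuit:  min ||J||_1  s.t.  S @ J = E,
    recast as a linear program via J = u - v with u, v >= 0."""
    M, P = S.shape
    c = np.ones(2 * P)               # objective sum(u) + sum(v) = ||J||_1
    A_eq = np.hstack([S, -S])        # S @ (u - v) = E
    res = linprog(c, A_eq=A_eq, b_eq=E, bounds=(0, None), method="highs")
    u, v = res.x[:P], res.x[P:]
    return u - v

rng = np.random.default_rng(1)
M, P, K = 40, 100, 5                 # 5 nonzero couplings out of 100
J0 = np.zeros(P)
J0[rng.choice(P, K, replace=False)] = rng.standard_normal(K)
S = rng.choice([-1.0, 1.0], size=(M, P))   # +-1 entries mimic spin products
J_hat = l1_min(S, S @ J0)
print(np.max(np.abs(J_hat - J0)))    # small: exact recovery is typical here
```

At this (ρ, α) = (0.05, 0.4), well inside the success region discussed in Sect. III, the $L_1$ solution typically coincides with J^0.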

III. REPLICA ANALYSIS
In Sect. II, we formulated the $L_1$-norm minimization problem for our setting. In this study, we focus on how the estimation performance of our method depends on ρ and α. In the present section, we investigate the typical behavior of the method proposed in Sect. II. To assess the proposed method, we employ the replica method [28,29]. The replica method has been developed in the field of statistical physics and is frequently used to analyze the free energy, which is defined through the partition function, the normalization constant of the distribution, and is difficult to calculate analytically in most cases. Based on the self-averaging property, the replica method gives an analytical representation of the configurational average of the free energy by way of averaged moments of the partition function instead of the partition function itself.
The typical behavior of the estimation of the coupling constants described in Eq. (4) can be investigated by analyzing, via the replica method, the partition function of the following distribution:

    P_\beta(J \mid E) = \frac{1}{Z_\beta(E)} \exp\!\big(-\beta \|J\|_1\big)\, \delta(E - SJ),    (5)

where Z_β(E) is the partition function of this distribution; in the limit β → ∞, the distribution concentrates on the solution of Eq. (4). The logarithm of Z_β(E) gives an important quantity, namely the free energy, defined through the replica identity as

    f := -\lim_{\beta\to\infty} \frac{1}{\beta} \big[\ln Z_\beta(E)\big]_{S,J^0}
       = -\lim_{\beta\to\infty} \frac{1}{\beta} \lim_{n\to 0} \frac{\partial}{\partial n} \ln \big[Z_\beta^n(E)\big]_{S,J^0},    (6)

where [···]_{S,J^0} denotes the configurational average with respect to S and J^0. Following the general procedure of replica analysis [30], we evaluate [Z^n_β(E)]_{S,J^0} in Eq. (6) for n ∈ N. Then, [Z^n_β(E)]_{S,J^0} is analytically continued to n ∈ R after the computation of the averaged replicated partition function has been completed in a closed form as a function of n.
By using the expression E^{(µ)} = Σ_{i<j} S^{(µ)}_{(ij)} J^0_{ij}, the replicated partition function is written as

    \big[Z_\beta^n(E)\big]_{S,J^0} = \left[\int \prod_{a=1}^{n} dJ^a \, \exp\!\Big(-\beta \sum_{a=1}^{n} \|J^a\|_1\Big) \prod_{a=1}^{n} \delta\big(S(J^a - J^0)\big)\right]_{S,J^0},    (7)

where J^a denotes the a-th replicated vector of the coupling constants, and ||J^a||_1 denotes the $L_1$ norm of J^a. Here, the weight attached to each replica corresponds to assuming the Laplace distribution P(J^a) ∝ exp(−β||J^a||_1) for all a ∈ {1, 2, ..., n}. Note that Eq. (7) is valid only for n ∈ N. To obtain an analytical expression for the factor ⟨Π_{a=1}^{n} δ(S(J^a − J^0))⟩_S in Eq. (7), we define the quantity

    u^a_\mu := \sum_{i<j} S^{(\mu)}_{(ij)} \big(J^a_{ij} - J^0_{ij}\big),    (8)

where a = 0, 1, 2, ..., n is the replica index, with the convention J^{a=0} := J^0. Here, we assume that the spins entering the observation matrix satisfy

    \big\langle \sigma^{(\mu)}_i \big\rangle_{\sigma^{(\mu)}} = 0,    (9)
    \big\langle \sigma^{(\mu)}_i \sigma^{(\mu)}_j \big\rangle_{\sigma^{(\mu)}} = c \quad (i \neq j),    (10)

where ⟨···⟩_x denotes Σ_x (···)P(x), and c (−1 ≤ c ≤ 1) is a constant. Then, we consider the case in which c = 0, i.e., ⟨σ^{(µ)}_i σ^{(µ)}_j⟩_{σ^{(µ)}} = 0 for i ≠ j. Eq. (9) and Eq. (10) are then rewritten as

    \big\langle \sigma^{(\mu)}_i \sigma^{(\mu)}_j \big\rangle_{\sigma^{(\mu)}} = \delta_{ij},    (11)
    \big\langle \sigma^{(\mu)}_i \sigma^{(\mu)}_j \sigma^{(\mu)}_k \sigma^{(\mu)}_l \big\rangle_{\sigma^{(\mu)}} = \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk} \quad (i \neq j,\; k \neq l),    (12)

respectively. The assumptions in Eqs. (9)-(12) are characteristic of models with Ising variables. From these assumptions, ⟨u^a_µ⟩_{σ^{(µ)}} = 0 is obtained. Similarly, we obtain ⟨u^a_µ u^b_µ⟩_{σ^{(µ)}} = 2(q_{00} − q_{0a} − q_{a0} + q_{ab}), where

    q_{ab} := \frac{2}{N(N-1)} \sum_{i<j} J^a_{ij} J^b_{ij} \quad (a, b = 0, 1, \dots, n)    (13)

are the order parameters with respect to the coupling constants. Herein, we assume the following replica symmetric ansatz to extend Eq. (7) to n ∈ R:

    q_{00} = \rho, \quad q_{0a} = q_{a0} = m, \quad q_{aa} = Q, \quad q_{ab} = q \ (a \neq b),    (14)

with the rescaled variable χ := β(Q − q) kept finite as β → ∞. By considering Eq. (14), the u^a_µ can be represented as (n + 1) correlated Gaussian random variables with vanishing mean and covariance

    \big\langle u^a_\mu u^b_\mu \big\rangle_{\sigma^{(\mu)}} = 2(\rho - 2m + Q)\,\delta_{ab} + 2(\rho - 2m + q)\,(1 - \delta_{ab}),    (15)

so that u^a_µ can be expressed in the new form

    u^a_\mu = \sqrt{2(\rho - 2m + q)}\, t + \sqrt{2(Q - q)}\, s_a,    (16)

where the s_a and t are Gaussian random variables with vanishing mean and unit variance. By using the representations in Eqs. (15) and (16), the configurational average in Eq. (7) factorizes over the observations and over the components of J, and the free energy is obtained in terms of the order parameters as Eq. (17), where [···]_u denotes ∫Dt ∫Π_a Ds_a (···) with Dt := dt e^{−t²/2}/√(2π) the Gaussian measure, Q̂, χ̂, and m̂ are auxiliary parameters introduced through the integral expressions of the delta functions that define the replica symmetric order parameters Q, χ, and m, respectively, and Φ(t; Q̂, χ̂, m̂) denotes the single-component free energy resulting from this factorization. See Appendix B for an overview of a more detailed derivation of Eq. (17). The extremization of Eq. (17) yields the saddle-point equations, Eqs. (19), for Q, m, χ, Q̂, m̂, and χ̂, in which we define H(x) := ∫_x^∞ Dt and G(x) := (x + 1)H(1/√x) − √(x/(2π)) exp(−1/(2x)).
The expressions in Eq. (17) and Eqs. (19) coincide with the results by Kabashima et al. [30] and Ganguli and Sompolinsky [31]. Therefore, the stability condition for successful estimation by the $L_1$-norm minimization in our problem setting is expressed as

    \alpha \geq 2(1 - \rho) H\!\big(1/\sqrt{\hat{\chi}}\big) + \rho,    (20)

which is likewise analogous to the results of Refs. [30] and [31]. We can also obtain an analytical expression for the typical value of the mean squared error (MSE), which is often used as a performance index in various fields. By using the extremum solutions of Eqs. (19), the MSE is expressed as

    \mathrm{MSE} := \frac{2}{N(N-1)} \Big[\big\langle \|J - J^0\|_2^2 \big\rangle^{\beta\to\infty}\Big]_{S,J^0} = \rho - 2m + Q,    (21)

where ||x||_2 denotes the $L_2$ norm ||x||_2 := (Σ_i x_i²)^{1/2} for x = {x_i}, and ⟨···⟩^{β→∞} denotes the average with respect to the distribution in Eq. (5) in the limit β → ∞. Figure 1 shows the result of the analysis in this section. The blue curve shows the theoretical reconstruction limit of the true coupling constants estimated through the $L_1$-norm minimization, expressed as α = 2(1 − ρ)H(1/√χ̂) + ρ. The heat map in Fig. 1 shows the expectation of the MSE described in Eq. (21). The region colored from white to black via red shows the values of the MSE in Eq. (21), indicating whether the estimation succeeds or fails. The expectation of the MSE is calculated from Q and m, which are obtained by updating Q, m, χ, Q̂, m̂, and χ̂ via Eqs. (19) until |x_{k+1} − x_k| reaches a value less than 10^{−10} for all x ∈ {Q, m, χ, Q̂, m̂, χ̂}. The values of ρ and α are varied in steps of 0.02 in the ranges 0.02 ≤ ρ ≤ 1 and 0.02 ≤ α ≤ 1. The average MSE over 100 trials for each (ρ, α) is plotted in each block of the heat map, which contains 50 × 50 = 2500 blocks. The white blocks represent the region where the estimation succeeds.
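Only two special functions are needed to evaluate the saddle-point iteration and the boundary curve numerically. A minimal sketch of H and G as defined above, using only the standard library:

```python
import math

# The special functions appearing in the saddle-point equations:
# H(x) is the Gaussian tail integral int_x^inf Dt, and
# G(x) = (x + 1) H(1/sqrt(x)) - sqrt(x / (2 pi)) exp(-1/(2x)).
def H(x: float) -> float:
    """Gaussian tail probability of a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def G(x: float) -> float:
    """Defined for x > 0, as used in the saddle-point equations."""
    return ((x + 1.0) * H(1.0 / math.sqrt(x))
            - math.sqrt(x / (2.0 * math.pi)) * math.exp(-1.0 / (2.0 * x)))

# The reconstruction-limit curve quoted above,
# alpha(rho) = 2 (1 - rho) H(1/sqrt(chi_hat)) + rho,
# can then be traced once chi_hat is obtained from Eqs. (19).
print(H(0.0))   # 0.5
```

With these in hand, the fixed-point iteration of Eqs. (19) and the boundary curve in Fig. 1 reduce to a few lines of scalar arithmetic.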
Here, we determined that the estimation succeeds when the MSE in Eq. (21) reaches a value less than 10^{−3}. In contrast, the black blocks indicate the region where the estimation fails. This typical performance can be achieved by adequately solving the $L_1$-norm minimization problem.

IV. NUMERICAL VERIFICATIONS
In this section, we numerically demonstrate the performance of our proposed method. In the experiments, we use an Ising model whose Hamiltonian is described by Eq. (1). We employ the ADMM [36] to solve the $L_1$-norm minimization in Eq. (4) with the constraint described in Eq. (3). The ADMM solves an optimization problem min_x f(x) by splitting the variables into x ∈ R^d and z ∈ R^d, where d denotes the dimension of the original set of variables, and solving min_{x,z} { f(z) + g(x) } subject to x − z = 0, where f(z) and g(x) are assumed to be convex functions. The optimization is attained by minimizing the augmented Lagrangian

    L(x, z; \lambda) = f(z) + g(x) + \lambda^T (x - z) + \frac{r}{2} \|x - z\|_2^2,

where λ := {λ_i | i = 1, 2, ..., d} denotes the set of Lagrange multipliers and r denotes a penalty constant. We can obtain the optimal x, z, and λ by alternately iterating x^{k+1} = arg min_x L(x, z^k; λ^k), z^{k+1} = arg min_z L(x^{k+1}, z; λ^k), and λ^{k+1} = λ^k + r(x^{k+1} − z^{k+1}), where k is the iteration index.
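To make the generic x/z/λ iteration concrete before specializing it to our problem, here is a minimal sketch applied to a toy lasso problem with g(x) = ||Ax − b||²/2 and f(z) = γ||z||_1; the problem and the names A, b, γ are illustrative and are not the paper's experiment:

```python
import numpy as np

def soft_threshold(v, y):
    # ST_y(v) = sign(v) * max(|v| - y, 0)
    return np.sign(v) * np.maximum(np.abs(v) - y, 0.0)

def admm_lasso(A, b, gamma, r=1.0, n_iter=500):
    """Generic ADMM with g(x) = 0.5 ||Ax - b||^2, f(z) = gamma ||z||_1,
    constraint x - z = 0, and scaled dual variable u = lambda / r."""
    d = A.shape[1]
    x = np.zeros(d); z = np.zeros(d); u = np.zeros(d)
    AtA_rI = A.T @ A + r * np.eye(d)
    Atb = A.T @ b
    for _ in range(n_iter):
        # x-update: minimize the quadratic part, (A^T A + r I) x = A^T b + r (z - u)
        x = np.linalg.solve(AtA_rI, Atb + r * (z - u))
        z = soft_threshold(x + u, gamma / r)   # z-update: prox of the L1 term
        u = u + x - z                          # dual ascent step
    return z

# Sanity check: for A = I the lasso solution is exactly ST_gamma(b).
b = np.array([1.5, -0.2, 0.8, -2.0])
z = admm_lasso(np.eye(4), b, gamma=0.5)
print(z)   # approaches [1.0, 0.0, 0.3, -1.5]
```

The x-update is a quadratic minimization, the z-update is the proximal operator of the $L_1$ term (soft thresholding), and the dual update accumulates the constraint residual, exactly the three steps listed above.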
Based on the procedure by Boyd et al. [36], the following optimization problem is considered in our experiments:

    \min_{J, K} \|K\|_1 \quad \text{subject to} \quad S J = E, \quad J - K = 0,

where K is a vector consisting of N(N − 1)/2 components, and λ is a vector of N(N − 1)/2 Lagrange multipliers for the constraint described in Eq. (3). The corresponding augmented Lagrangian in scaled form is

    L(J, K; \Lambda) = \|K\|_1 + \frac{r}{2} \|J - K + \Lambda\|_2^2,
where Λ = λ/r. The optimal J, K, and Λ are obtained by iteratively calculating

    J^{k+1} = \big(I - S^T (S S^T)^{-1} S\big)(K^k - \Lambda^k) + S^T (S S^T)^{-1} E,
    K^{k+1} = \mathrm{ST}_{1/r}\big(J^{k+1} + \Lambda^k\big),
    \Lambda^{k+1} = \Lambda^k + J^{k+1} - K^{k+1},

where I denotes the N(N − 1)/2 × N(N − 1)/2 identity matrix, and ST_y(x) denotes the soft-thresholding function [42,43] defined as

    \mathrm{ST}_y(x) := \mathrm{sign}(x) \max(|x| - y, 0).

Through the experiments, we showed that the black (failure) region converges to the inside of α ≤ 2(1 − ρ)H(1/√χ̂) + ρ as the system size increases. This behavior is natural, as the replica method assumes an infinitely large system. The empirical results thus support the validity of our analysis.

V. CONCLUSIONS AND DISCUSSIONS
In this paper, we proposed a method for estimating unknown coupling constants in an Ising model by using the CS technique. Moreover, we analytically investigated the performance of our proposed method using replica analysis. Finally, we compared the analytical results of the replica analysis with the numerical results obtained through the ADMM and showed that they mostly coincide.
The proposed method can be regarded as a general method for obtaining sparse representations of the cost functions of various optimization problems. Owing to the limitations of hardware designs, the implementation of an optimization problem is not an easy task. When we embed the Ising model in actual machines, the constraints imposed by their structures must be considered. For instance, the D-Wave machine is restricted to the chimera graph, which has a sparse set of edges. Thus, various techniques are desirable for making the structure of an optimization problem sparse without losing the important features of the original one. The construction of such a methodology can contribute to extending the practical use of these machines in several fields.
A key point of the analysis in Sect. III was the assumption on the components of the observation matrix described in Eqs. (9)-(12). We are also interested in the performance of the estimation when the observation matrix obeys other constraints. For instance, we can consider the case in which we may query an oracle for the value of the cost function by sending a spin configuration. In the present study, the spin configurations are assumed to be completely random. However, it would be more efficient to send a spin configuration that differs from the previously sent configurations; in other words, orthogonal observations should be valuable for inferring the Hamiltonian from pairs of spin configurations and cost-function values. In future studies, we will assess the theoretical performance when S is changed, and design the structure of S to attain a better estimate of the Hamiltonian, using the replica method or other statistical-mechanical analyses such as the cavity method [44,45]. In the present study, we focused on the estimation of the Hamiltonian itself. Depending on the problem setting, the resulting ground state is what matters. While estimating the Hamiltonian, we may simultaneously find the ground-state spin configuration. Apart from the procedure based on CS, we will report on the direct manipulation of optimization while estimating the Hamiltonian from only a small number of observations of the cost function in future work.