Cost reduction of the bond-swapping part in an anisotropic tensor renormalization group

The bottleneck of the anisotropic tensor renormalization group (ATRG) is its bond-swapping part, which scales with O(χ^{2d+1}), where χ is the maximum bond dimension and d is the dimensionality of the system. We propose an alternative method for the bond-swapping part that scales with O(χ^{max(d+3,7)}), though the total cost of ATRG with the method remains O(χ^{2d+1}). Moreover, the memory cost of the whole algorithm can be reduced from O(χ^{2d}) to O(χ^{max(d+1,6)}). We examine ATRG with and without the proposed method on the 4D Ising model and find that the free energy density of the proposed algorithm is consistent with that of the original ATRG while the elapsed time is significantly reduced. We also compare the proposed algorithm with the higher-order tensor renormalization group (HOTRG) and find that the free energy density of the proposed algorithm is lower than that of HOTRG for a fixed elapsed time.


Introduction
Tensor network algorithms are theoretically and numerically useful tools for quantum and classical many-body systems [1,2]. For example, to study a 1D quantum many-body system, one can use the density matrix renormalization group [3][4][5], which is based on matrix product states [6,7] and can be understood as a variational method for a wave function. For higher-dimensional cases, there are algorithms based on projected entangled pair states [8][9][10]. In addition to those algorithms, the tensor renormalization group (TRG) [11] is an approach for coarse-graining tensor networks and calculates the partition function of a 2D classical model. For studying two- or higher-dimensional classical systems, we can use the higher-order tensor renormalization group (HOTRG) [12]. There is also a series of improved coarse-graining algorithms based on various philosophies [13][14][15][16][17][18][19][20][21][22][23][24][25].
Since the tensor network algorithms avoid the sign problem, they have received attention not only from condensed matter physics but also from high energy physics [26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42]. We hope that the lattice models suffering from the sign problem in high energy physics (such as quantum chromodynamics with finite quark density, the θ vacuum of quantum chromodynamics, chiral gauge theory, and supersymmetric models) can be analyzed by the tensor network algorithms. However, for such high-dimensional systems the computational and memory costs of these algorithms grow rapidly, and it is difficult for them to provide results with sufficient precision.
A novel algorithm, the anisotropic tensor renormalization group (ATRG), has been proposed recently [25]. It is a tensor network algorithm for coarse-graining tensor networks and can be applied to an arbitrary-dimensional lattice model like HOTRG. Moreover, it has a much lower computational cost than HOTRG. The bottleneck part in ATRG is the bond-swapping part, which consists of a contraction of two tensors and a partial singular value decomposition (PSVD) of a matrix. The computational cost and the memory cost of this part are O(χ^{2d+1}) and O(χ^{2d}) respectively, where χ is the maximum bond dimension and d is the dimensionality of the system. If the costs of the bond-swapping part in ATRG are reduced, we can expect to obtain more accurate results than those from the original ATRG with the same elapsed time and the same memory resource.
In this paper, we propose a bond-swapping method, the computational cost of which can be reduced from O(χ^{2d+1}) to O(χ^{max(d+3,7)}), while the memory cost of the whole algorithm can be reduced from O(χ^{2d}) to O(χ^{max(d+1,6)}). Furthermore, we numerically examine the accuracy of the free energy density of the 4D Ising model and the elapsed time of ATRG using the proposed method to compare with the original ATRG and HOTRG.
The rest of this paper is organized as follows. In Sect. 2, we review the original ATRG algorithm for a 4D hypercubic system. In Sect. 3, the proposed bond-swapping method is explained. In Sect. 4, we present numerical results of the 4D Ising model for a comparison between ATRG with the proposed method and the original ATRG or HOTRG and discuss their performances. A summary and outlook are given in Sect. 5. In Appendix A, we explain a method for making tensors, E and F, in ATRG. In Appendix B, we study the parameters of the randomized singular value decomposition (RSVD).

Anisotropic tensor renormalization group
In this section, we explain the original ATRG [25] for a 4D hypercubic system. First of all, we rewrite the partition function of the system in terms of a tensor network as

Z = Tr ∏_i T^(init)_i,

where T^(init)_i is an initial tensor, i runs over all lattice sites, and Tr means summing up all the bond indices of the initial tensors. If the direction of the coarse graining is set in the x-axis 1, which is the vertical direction in Fig. 1, we redefine the initial tensor as T^(init)_{i; ⊥0 ⊥1 x0 x1}, where ⊥0 = (y0, z0, w0) and ⊥1 = (y1, z1, w1) are bundles of the bond indices 2. Then, we execute the coarse graining of the tensor network to reduce the degrees of freedom using ATRG [25]. The whole flow of ATRG is shown in Fig. 2, where there are seven steps, (a)-(g). First, in step (a), a tensor T 3 is decomposed with the singular value decomposition (SVD),

T_{⊥0 ⊥1 x0 x1} ≈ Σ_α U_{⊥0 x0 α} s_α V_{⊥1 x1 α},

where α is a new bond for the x-axis. Then, we make four tensors, A, B, C, and D, from U, s, and V.

1 Another axis, such as y, z, or w, can be treated in a similar way.
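As a rough illustration (not the paper's implementation), step (a) can be sketched in NumPy for a toy tensor. The shapes, the value of χ, and the particular way the singular values are absorbed into B are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
chi = 4                        # maximum bond dimension (toy value)
D = chi                        # bundled transverse dimension, kept small here
T = rng.random((D, D, chi, chi))   # T[b0, b1, x0, x1]

# Step (a): group (b0, x0) as rows and (b1, x1) as columns, then SVD,
# introducing the new x-axis bond alpha truncated to chi values.
mat = T.transpose(0, 2, 1, 3).reshape(D * chi, D * chi)
U, s, Vh = np.linalg.svd(mat, full_matrices=False)
U, s, Vh = U[:, :chi], s[:chi], Vh[:chi, :]

# One possible split (assumed): A keeps the left vectors, B absorbs s.
A = U.reshape(D, chi, chi)                                     # A[b0, x0, alpha]
B = (s[:, None] * Vh).reshape(chi, D, chi).transpose(1, 2, 0)  # B[b1, x1, alpha]

# Contracting A and B over alpha reproduces the truncated SVD of T.
approx = np.einsum('axc,byc->abxy', A, B)
```

The same SVD applied to the second copy of T, with s absorbed on the other side, gives C and D.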
2 ⊥0 and ⊥1 consist of a combination of y0, z0, w0 and that of y1, z1, w1, respectively. For example, ⊥0 = y0 + (z0 − 1)χ^(init) + (w0 − 1)(χ^(init))^2 and ⊥1 = y1 + (z1 − 1)χ^(init) + (w1 − 1)(χ^(init))^2 if all of the bond indices run from 1 to χ^(init). Although the bond dimensions of each of the tensor legs are different in some cases, we can make the bundles similarly.

3 T is the initial tensor at the first coarse graining.

Fig. 1. Tensors in the initial tensor network of a 4D system. Some red bonds have numbers; two bonds are the same bond if they have the same number. (a) We focus on two tensors in the initial tensor network. (b) If the direction of the coarse graining is set in the x-axis, we redefine the initial tensor as T^(init)_{i; ⊥0 ⊥1 x0 x1}.

In step (b), the two tensors, B and C, are contracted and a tensor M is obtained by

M_{⊥1 α ⊥2 β} = Σ_{x1} B_{⊥1 x1 α} C_{⊥2 x1 β}.

Then, in step (c), we decompose M into two tensors, X and Y, to change the combinations of the bonds from ⊥1 α of B and ⊥2 β of C to ⊥2 α of X and ⊥1 β of Y:

M_{⊥1 α ⊥2 β} ≈ Σ_{x1} Y_{⊥1 x1 β} X_{⊥2 x1 α}.

One may use a PSVD, such as the Arnoldi method [43] or the RSVD [44], for the decomposition of M. In step (d), two tensors, E and F, are made from A, X, Y, and D with the method explained in Appendix A. In step (e), we contract A, X, and E to make G, and contract D, Y, and F to make H, where ⊥0 = (y0, z0, w0) and ⊥1 = (y1, z1, w1).

Fig. 2. The flow of the original ATRG. In step (a), a tensor T is decomposed into two tensors such as A and B or C and D with the SVD. In step (b), the two tensors, B and C, are contracted by summing up the bond x1 and a tensor M is obtained. In step (c), M is decomposed by the PSVD into two tensors, X and Y, to change the combinations of bonds from ⊥1 α of B and ⊥2 β of C to ⊥2 α of X and ⊥1 β of Y. In step (d), two tensors, E and F, are made from A, X, Y, and D with the method explained in Appendix A. In step (e), we contract A, X, and E by summing up the bonds α, ⊥0, ⊥2 to make G, and contract D, Y, and F by summing up the bonds β, ⊥1, ⊥3 to make H. In step (f), G and H are contracted by summing up the bond x1 to obtain a coarse-grained tensor T. In step (g), we reorder the bonds of T for the next coarse-graining direction, the y-axis.

Then, in step (f), G and H are contracted to obtain a coarse-grained tensor T:

T_{⊥0 ⊥1 x0 x1} = Σ_x G_{⊥0 x0 x} H_{⊥1 x x1}.

Finally, in step (g), we reorder the bonds of T for the next coarse-graining direction, the y-axis, and redefine T as T_{⊥0 ⊥1 y0 y1} in the same way as the initial tensor, where ⊥0 = (x0, z0, w0) and ⊥1 = (x1, z1, w1). Alternatively, G and H can be decomposed with the SVD as shown in Fig. 3, and S{G}_{γγ}, V{G}_{x1 γ}, V{H}_{x1 δ}, and S{H}_{δδ} are contracted to obtain K. In step (d) of Fig. 3, we decompose K with the SVD, and, in step (e) of Fig. 3, we contract the resulting tensors to make A', B', C', and D'. Finally, in step (f), we reorder the bonds of A', B', C', and D' for the next coarse graining in a similar way to step (g) in Fig. 2.
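For later contrast with the proposed method, the bond-swapping part reviewed above (steps (b) and (c)) can be sketched in NumPy. The toy shapes, the √s splitting between X and Y, and the use of scipy's svds as a stand-in PSVD are illustrative assumptions:

```python
import numpy as np
from scipy.sparse.linalg import svds  # one available partial-SVD routine

rng = np.random.default_rng(1)
chi = 4
D = chi ** 3                       # transverse bundle chi^(d-1) for d = 4
B = rng.random((D, chi, chi))      # B[b1, x1, alpha]
C = rng.random((D, chi, chi))      # C[b2, x1, beta]

# Step (b): contract over x1.  M has chi^(2d) = chi^8 entries, which is
# the memory bottleneck of the original algorithm.
M = np.einsum('ixa,jxb->jaib', B, C)            # M[b2, alpha, b1, beta]

# Step (c): PSVD of M viewed as a (b2, alpha) x (b1, beta) matrix,
# truncated to chi singular values; this swaps the bond grouping.
U, s, Vh = svds(M.reshape(D * chi, D * chi), k=chi)
order = np.argsort(s)[::-1]                     # svds returns ascending order
U, s, Vh = U[:, order], s[order], Vh[order, :]
X = (U * np.sqrt(s)).reshape(D, chi, chi)            # X[b2, alpha, x1']
Y = (np.sqrt(s)[:, None] * Vh).reshape(chi, D, chi)  # Y[x1', b1, beta]
```

Contracting X and Y over the new bond x1' reproduces the rank-χ approximation of M.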
The total computational cost of the original ATRG is O(χ^{2d+1}) and the total memory cost is O(χ^{2d}). The former cost comes from step (e) in Fig. 2 and the bond-swapping part, which consists of steps (b) and (c) in Fig. 2. The latter cost arises from the bond-swapping part. Therefore, the bond-swapping part is the bottleneck of both the computational cost and the memory cost in ATRG.
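As a quick arithmetic check of the exponents quoted in this paper, one can tabulate the swapping-part cost of the original ATRG, O(χ^{2d+1}), against the proposed O(χ^{max(d+3,7)}) for a few dimensionalities:

```python
def original_swap_exponent(d: int) -> int:
    """Exponent of chi in the original bond-swapping cost, O(chi^(2d+1))."""
    return 2 * d + 1

def proposed_swap_exponent(d: int) -> int:
    """Exponent of chi in the proposed bond-swapping cost, O(chi^max(d+3,7))."""
    return max(d + 3, 7)

for d in (2, 3, 4, 5):
    print(d, original_swap_exponent(d), proposed_swap_exponent(d))
# For d = 2, 3 the proposed exponent is not smaller, matching the remark
# that there is no gain below four dimensions; for d = 4 the swapping cost
# drops from chi^9 to chi^7.
```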

Cost reduction of the bond-swapping part
In this section we propose a bond-swapping method to replace steps (b) and (c) in Fig. 2. The flow of the proposed method is shown in steps (c), (d), (e), and (f) of Fig. 4. The first step of the proposed bond-swapping method (step (c) of Fig. 4) is the SVDs of B and C,

B_{⊥1 x1 α} = Σ_μ U{B}_{⊥1 μ} s{B}_μ V{B}_{x1 α μ},
C_{⊥2 x1 β} = Σ_ν U{C}_{⊥2 ν} s{C}_ν V{C}_{x1 β ν},

and defining B_s and C_s as

(B_s)_{μ x1 α} = s{B}_μ V{B}_{x1 α μ},
(C_s)_{ν x1 β} = s{C}_ν V{C}_{x1 β ν}.

Note that there is no truncation in this step 4, and the red lines in Fig. 4 mean that the maximum bond dimensions of the indices, μ and ν, are taken to be χ^2. In step (d) of Fig. 4, we perform the contraction of B_s and C_s to make M_s:

(M_s)_{μ α ν β} = Σ_{x1} (B_s)_{μ x1 α} (C_s)_{ν x1 β}.

Then, in step (e) of Fig. 4, we change the combinations of the bonds from μα of B_s and βν of C_s to να of X_s and μβ of Y_s,

(M_s)_{μ α ν β} ≈ Σ_{x1} (Y_s)_{μ x1 β} (X_s)_{ν x1 α},

using the PSVD. Finally, in step (f) of Fig. 4, we make X and Y as

X_{⊥2 x1 α} = Σ_ν U{C}_{⊥2 ν} (X_s)_{ν x1 α},
Y_{⊥1 x1 β} = Σ_μ U{B}_{⊥1 μ} (Y_s)_{μ x1 β}.

By applying the above method to ATRG, we can reduce the computational cost of the swapping part from O(χ^{2d+1}) to O(χ^{max(d+3,7)}) and the memory cost of the whole algorithm from O(χ^{2d}) to O(χ^{max(d+1,6)}). 5 Note that there is no gain when we apply the proposed bond-swapping method to ATRG for 2- and 3D systems (d = 2, 3).
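The four steps above can be sketched in NumPy. The toy shapes, the √s splitting in the PSVD, and the use of scipy's svds in place of the RSVD are assumptions made for illustration:

```python
import numpy as np
from scipy.sparse.linalg import svds   # stand-in for the PSVD/RSVD

rng = np.random.default_rng(2)
chi = 4
D = chi ** 3                        # transverse bundle chi^(d-1) for d = 4
B = rng.random((D, chi, chi))       # B[b1, x1, alpha]
C = rng.random((D, chi, chi))       # C[b2, x1, beta]

# Step (c): exact SVDs of B and C; mu and nu run over at most chi^2 values
# because (x1, alpha) has only chi^2 combinations, so nothing is truncated.
UB, sB, VhB = np.linalg.svd(B.reshape(D, chi * chi), full_matrices=False)
UC, sC, VhC = np.linalg.svd(C.reshape(D, chi * chi), full_matrices=False)
Bs = (sB[:, None] * VhB).reshape(-1, chi, chi)   # Bs[mu, x1, alpha]
Cs = (sC[:, None] * VhC).reshape(-1, chi, chi)   # Cs[nu, x1, beta]

# Step (d): contract over x1.  Ms has only ~chi^6 entries instead of chi^8.
Ms = np.einsum('mxa,nxb->namb', Bs, Cs)          # Ms[nu, alpha, mu, beta]

# Step (e): PSVD swaps the groupings into (nu, alpha) and (mu, beta).
nmu, nnu = Bs.shape[0], Cs.shape[0]
U, s, Vh = svds(Ms.reshape(nnu * chi, nmu * chi), k=chi)
Xs = (U * np.sqrt(s)).reshape(nnu, chi, chi)            # Xs[nu, alpha, x1']
Ys = (np.sqrt(s)[:, None] * Vh).reshape(chi, nmu, chi)  # Ys[x1', mu, beta]

# Step (f): restore the transverse bundles with the isometries UB and UC.
X = np.einsum('jn,nag->jag', UC, Xs)             # X[b2, alpha, x1']
Y = np.einsum('im,gmb->igb', UB, Ys)             # Y[b1, x1', beta]
```

Because step (c) is exact, contracting UB back into B_s (and UC into C_s) recovers B and C; only step (e) truncates, as in the original algorithm.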

Numerical results
In this section, we present the results of the 4D hypercubic Ising model with the periodic boundary condition to examine the effectiveness of the proposed bond-swapping method. We use the Ising model on a lattice with 1024^4 spin sites and set the temperature to 6.68, which is close to the critical point of the Monte Carlo result [45]. We use the RSVD as the PSVD, and the numbers of RSVD iterations and oversamples are taken to be χ and 2χ, respectively. A detailed study of these parameters is given in Appendix B.
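The RSVD call with these parameter choices can be sketched with scikit-learn; the matrix below is a random stand-in for the matrix to be decomposed, not the actual M_s:

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.default_rng(3)
chi = 10
M = rng.random((chi ** 3, chi ** 3))   # stand-in matrix

# The iteration count and the oversampling parameter are set to chi and
# 2*chi, matching the choices used for the numerical results in this paper.
U, s, Vh = randomized_svd(M, n_components=chi, n_oversamples=2 * chi,
                          n_iter=chi, random_state=0)
```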
First, we show the comparison between ATRG using the proposed method and the original ATRG. The free energy densities obtained by the two algorithms are shown in Fig. 5. From this figure, we can see that the result of the proposed algorithm is consistent with that of the original ATRG. This consistency would be because the rank of M in the original ATRG is χ^3 and coincides with the rank of M_s in the proposed algorithm. Next, Fig. 6 shows the total elapsed times of the two algorithms; the time of the proposed algorithm is significantly reduced from that of the original ATRG for larger χ. From Figs. 5 and 6, we conclude that the performance of ATRG can be improved without loss of accuracy in the free energy density. Figure 7 shows the details of the elapsed times of the two algorithms at χ = 16. We can see that the speedup of ATRG is attained by the cost reduction of the bond-swapping part, and that step (e) in Fig. 2, which contains O(χ^9) contractions, turns out to be the most time-consuming part in the proposed algorithm.
Next, we compare the results of the proposed algorithm and HOTRG. The maximum bond dimensions in HOTRG are set to the integers in 3 ≤ χ ≤ 7. Figure 8 shows the relation between the free energy densities and the elapsed times of the two algorithms; the elapsed times increase monotonically with the bond dimension. We assume that the larger the maximum bond dimension χ is, the higher the accuracy of the free energy density becomes. Since the free energy densities of both algorithms decrease monotonically as the maximum bond dimensions increase in Fig. 8, we regard a lower free energy density as a more accurate one. From this viewpoint, the proposed algorithm obtains a more accurate free energy density than HOTRG in the same elapsed time as the maximum bond dimensions of the two algorithms grow.
We use a machine with 132 GB of memory and an Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60 GHz. The programs for the algorithms are written in Python 3.7.1; we use numpy.tensordot in NumPy 1.15.4 for the tensor contractions, scipy.linalg.svd in SciPy 1.1.0 for the SVD of the tensors, and sklearn.utils.extmath.randomized_svd in scikit-learn 0.20.1 for the RSVD of the tensors.

Summary
We have proposed a bond-swapping method that reduces the computational cost of the swapping part from O(χ^{2d+1}) to O(χ^{max(d+3,7)}). Moreover, the memory cost of the whole algorithm can be reduced from O(χ^{2d}) to O(χ^{max(d+1,6)}) with the method. We examine ATRG using the method on the 4D hypercubic Ising model and find that the free energy density obtained by the algorithm is consistent with that obtained by the original ATRG while the elapsed time is significantly reduced. Moreover, we compare the proposed algorithm with HOTRG in terms of the free energy density of the Ising model and the elapsed time, and find that the proposed algorithm obtains a more accurate free energy density than HOTRG in the same elapsed time for large maximum bond dimensions. As future work, step (e) in Fig. 2, which contains the O(χ^{2d+1}) contractions, is the bottleneck of the proposed algorithm; to speed up this part, we will apply parallel computing to the bottleneck contractions. Furthermore, we will analyze the physical quantities of the 4D Ising model with large maximum bond dimensions to compare with other results [45,46].
Appendix A: Making the tensors E and F

In step (a) of Fig. A.1, the tensors and their conjugate tensors are contracted by summing up the corresponding bonds. In step (b) of Fig. A.1, we contract L and W by summing up the bonds α and ᾱ. In step (c) of Fig. A.1, we perform the SVD or the eigenvalue decomposition of P and truncate it to retain the χ largest singular values, where U{P} is an isometry made from the left singular vectors corresponding to the χ largest singular values. We can also obtain Q and U{Q} from Y and D in a similar way to P and U{P}. Then, in step (d) of Fig. A.1, E and F are made by contracting these isometries with the original tensors. The same procedure can be used for the other directions, and E and F consist of the tensors of the directions other than the coarse-graining one. To complete this step for making E and F, 4(d − 1) O(χ^{d+3}) contractions, 2(d − 1) O(χ^6) contractions, and 2(d − 1) O(χ^6) SVDs are needed 6. The bottleneck of this part is therefore the O(χ^{d+3}) contraction.

Appendix B: Parameters of the RSVD

Fig. B.1. The relative errors of the free energy densities of the Ising model on a 1024^4 lattice with the original ATRG and the proposed algorithm, where χ = 10, 14, the temperatures are set to 6.65 and 6.68, and the number of RSVD iterations is taken to be χ. The oversampling parameter n is varied from 1 to 2χ.

We find that the results of the original ATRG and the proposed algorithm show similar behavior at χ = 10, and that taking n ≥ 20 is large enough at χ = 10, 14. Therefore, we use χ and 2χ for the numbers of the RSVD iterations and oversamples, respectively, for the numerical results shown in Sect. 4. We believe that this is large enough for the RSVD to converge to the full SVD results when χ ≥ 10. Finally, we remark that when degenerate singular values are truncated, the results of the RSVD may not converge to those of the full SVD even if the parameters are taken to be sufficiently large.