Homogeneous multiscale entanglement renormalization ansatz tensor networks for quantum critical systems

In this paper, we review the properties of homogeneous multiscale entanglement renormalization ansatz (MERA) to describe quantum critical systems. We discuss in more detail our results for one-dimensional (1D) systems (the Ising and Heisenberg models) and present new data for the 2D Ising model. Together with the results for the critical exponents, we provide a detailed description of the numerical algorithm and a discussion of new optimization strategies. The relation between the critical properties of the system and the tensor structure of the MERA is expressed using the formalism of quantum channels, which we review and extend.


Introduction
Simulations of quantum many-body systems, such as lattice models, ultracold atoms in optical lattices or electrons in crystal materials, have played a crucial role in modern physics. Recently, tensor network methods, such as density matrix renormalization group (DMRG) and matrix product states (MPS), underwent a very fast development, after which tools and ideas coming from quantum information theory allowed them to extend their field of application and improve their deep comprehension [1]- [7]. In particular, the introduction of algorithms for the simulation of real-time dynamics [1,5,6,8] paved the way to exciting new applications and a deeper understanding of complex phenomena in many-body quantum physics [10,11], and strongly contributed to the interest of the quantum information processing community in the field. More recently, tensorial representations that seem naturally suited to describing the physics of critical ground states [12]- [14] have been introduced. Tensor network methods have been also extended to include an efficient description of two-dimensional (2D) systems [15], fermionic systems [16]- [18] and long-time dynamics [19,20]. The relation between the tensor network structure and the critical exponents has been discussed in [21].
It is nowadays well understood [2,5] that MPS algorithms are a variational formulation of the celebrated DMRG by White [22]. In the standard approach to finite-size DMRG, a sequence of projective transformations is determined by truncating the density matrix of a given subsystem to its effective support (up to a certain precision). The operators are consequently updated and the measures are performed in the new reduced basis. Minimization of the energy is ensured by diagonalizing the whole Hamiltonian at every step and by using its ground state to get the reduced density matrix. Instead of this constructive approach, in the MPS approach one writes the sequence of projective measurements as a tensorial structure with precise contraction rules, and then optimizes over the tensors by the preferred method (gradient [2], imaginary 3 time evolution [1,14] and Monte Carlo [23,24]). Theoretically, this corresponds to interpreting the many-body system wave function as a huge N -indexes tensor (N being the number of system sites), and to expressing it in terms of a sequence of smaller tensors that are linearly concatenated via contractions that involve auxiliary 'bond' indexes. The MPS decomposition is a universal representation, in the sense that any many-body wave function can be expressed in this form. However, in MPS decompositions (with constant dimensions in their ancillary bonds), the two-point correlation functions decay exponentially at long range, making them ideal for the study of non-critical systems in 1D. In contrast, for critical systems, where the block entropy is logarithmic divergent [25]- [27], the bond dimensions should increase polynomially to ensure an accurate description of the many-body system. In order to solve this problem, different tensorial decompositions have been revisited, yielding the notion of tensor tree networks (TTNs) [28]. A TTN is a tensorial decomposition of the many-body system wave function that, differently from MPS, is characterized by a layered, hierarchical structure that effectively halves the number of sites at each layer. Thanks to this feature, when all the layers are assumed to be identical [28], a TTN appears to naturally encompass scale invariance and supports the power-law decay of correlation functions (this is a consequence of the fact that the 'distance' between two points in the TTN network is logarithmic in the real-space separation) [29].
A further improvement was the introduction of a scheme where logarithmic corrections to area law were encompassed not only on average, as in TTNs, but for any partitioning of the system, namely the so-called Vidal's multi-scale entanglement renormalization ansatz (MERA) [12]. Although introduced originally as a 'constructive' modification to a renormalization procedure, such a scheme can also be regarded as the introduction of proper unitaries in a TTN-like scheme. Such unitary tensors 'mix up' the local degrees of freedom at different length scales, helping in effectively 'disentangling' the system locally. This ultimately yields a tensorial representation of a critical system with fixed bond dimensions independently of the system size [12]- [14]. This approach also opens up very interesting perspectives for the study of 2D systems [15]. The price to pay for the improved tensorial structure of MERA is an increase in the overall computational cost, even though it still remains polynomial with the bond dimension.
The properties of homogeneous MERAs and their relation to the critical exponents were exploited in [21], [30]- [32]. The key ingredient of this analysis resides in a quantum channel description of the tensor contractions of the network. For homogeneous MERAs, the relevant quantities of the system can be characterized by the iterative application of a single completely positive trace-preserving (CPT) map, whose fixed point represents the average local density matrix at the thermodynamic limit and whose secondary eigenvalues are related to the critical exponents of the model [21]. The aim of the present paper is to revisit this formalism with particular attention to its operative implementation. We present new algorithmic strategies and review established ones. Moreover, we present the numerical results for new tensor structures as a 2D structure with the most efficient contraction costs known up to now.
The paper is organized as follows. In section 2, we recast the known results on MERA with particular emphasis on the operational point of view. In section 3, we review already established and new tensor network optimization methods. Finally, in section 4, we present some numerical results benchmarking the quantum TN channel method in 1D and 2D problems, and in particular in a new optimal 2D structure.

Structure causal cones
In order to describe the tensorial structure of the MERA, in this section we focus on 1D physical systems, postponing its generalization to the 2D case [15] in section 2.2.
Consider then a generic quantum state | of a spin-d lattice with N sites (N being a power of 2 or 3, depending on the topological MERA structure one adopts, see below). Selecting a single-site local basis {|φ ; = 1, . . . , d}, it can be expressed as with 1 ,..., N being the associated wave function (each index allowing values 1, . . . , d). The MERA consists in interpreting the latter as a type-0 N tensor and writing it in terms of a collection of smaller tensors that are properly contracted, as shown in figure 1 for a binary and, the recently introduced [31], ternary MERA. Here a standard pictorial representation has been adopted, according to which a generic type-M M tensor of elements [T ] u 1 ,...,u M 1 ,..., M is described as a node (or 'blob') with M lower (or in-coming) and M upper (or out-coming) vertexes (or 'legs'), and where contractions of tensors along one index is obtained by drawing the corresponding legs directly connected. In the MERA representation, two different classes of tensors are used: the type-2 2 tensors denoting the unitary disentanglers χ of elements [χ] u 1 u 2 1 2 , and the type- for simple 'annihilation' of the drawings. Specifically indicating withχ ,λ their corresponding Hermitian conjugates of elements [χ] u 1 u 2 where '·' represents the standard up-down contraction of tensorial indexes and where δ is the usual Kronecker delta (for a pictorial representation of equation (2) see figure 2, whereχ and λ are graphically represented by mirror-inverted copies of the corresponding χ and λ). For simplicity, we show the isometries only for the ternary MERA. We concentrate on this structure because, as it was shown in [31], it offers several advantages over the traditional binary structure (where analogous rules apply). The MERA structure is then closed at the top by the hat tensor of type-0 τ , which is required to satisfy the condition to ensure proper normalization to | . It is worth stressing that, if no constraints are imposed on the dimensions of the indexes of the χ and of the λ, then every quantum state | can be expressed as detailed above for a proper choice of such tensors 6 .
As anticipated in the introduction, in contrast with MPSs, the tensors in a MERA structure are organized in a hierarchical sequence of layers (alternating two semi-layers of χs and λs), which resembles a real-space renormalization flow and induces a causal relation among them (i.e. properties at one given point in the network can depend upon the preceding layers via the elements that are immediately on top of it). Moving along the graph from the bottom to the top, the number of indices gets decimated after every full layer by a constant factor of 2 or 3, depending on the chosen value of τ , whereas the legs weight m (dimension of the corresponding index) can change according to the chosen renormalizing truncation. This leads to a logarithmic depth of the MERA tensor network in the physical size N of the system (∼ log N levels forming the MERA), which is ultimately the key ingredient to support power-law decay of long-range correlations, as will be explained later, in section 2.4. The other evident characteristic of the structure is the self-similarity at different layers, suggesting a simple scale-invariant ansatz consisting in taking all the isomorphic tensors to be identical (and clearly a uniform leg weight Causal cone related to two adjacent pairs of sites, highlighted in bold dark green for the binary (upper) and ternary (lower) MERA. The contraction path is obtained by percolating from the physical sites with a drift along the renormalization flow (i.e. upwards). The letters L/C/R refer to the contraction rules introduced by equation (7) (binary) and by equation (5) (ternary). m = d). This latter remark will lead us to formulate the fixed-point approach to critical systems directly at the thermodynamical limit (TL), reviewed in the present paper.
Within the tensorial representation, any operator A acting on the system can be associated with tensors too (the lower and upper legs corresponding to the input and output indexes of the associated matrix). Furthermore, their expectation values on | can be computed by sandwiching such graphs between the networks representing the state and its Hermitian conjugate, and by properly contracting all legs. For MERA decompositions, such a computation largely simplifies, thanks to the constraints (2), which act as tensor-canceling rules. Indeed it turns out that the tensors χ and λ of the MERA, which cannot be reached via percolation from the subsystem k on which A is nontrivially operating 7 , annihilate with their corresponding Hermitian conjugates (see figure 3). In close analogy with relativistic kinematics, this leads us to identify a causal cone for any given (connected) subset of sites [13]. This allows one to express the expectation value of A as where A k is the tensor associated with the operator A, Q (n→0) k is the graph formed by the tensors of | that enters in the causal cone of the k subsystem, and n is the full MERA depth, i.e. the total number of levels of the MERA. Depending on τ , we can have n = log 2 (N ) − 2 for a binary MERA or n = log 3 (N ) − 1 for the ternary version (here the hat tensor is considered the zeroth level).

7
A peculiarity of the MERA is that, at each level, the causal cone has bounded width (number of involved tensors), and this grows progressively smaller until it reaches a stable size. The chosen MERA topology determines the minimal size of such stable structures. For instance, in the case of binary MERAs (τ = 2), stable causal cones are associated with three adjacent sites (four adjacent sites can be regarded as a metastable size, meaning that, depending on the position within the level, the causal cone can shrink into three sites or remain at four). On the contrary, for ternary MERAs (τ = 3), three adjacent sites are metastable, whereas the minimal causal cones are formed by two adjacent sites [13,31]. Specifically in the latter case, propagating two sites up their causal cone is obtained by plugging together the following type- 2 6 tensors M of elements where for ease of notation the typographic symbols and • stand for contraction of the corresponding indexes. Three modalities are possible, according to where the indices u 1 u 2 of one level are plugged to the upper level: left (L) with 2 3 , center (C) 3 4 , and right (R) 4 5 .
Omitting the full notation, we can then write where for j = {0, . . . , n}, the sequence of labels a (k) j ∈ {L , C, R} specifies the plugging mode determined by the position of the k subsystem within the system (the definitions of the plugging modalities of the tensor mirror those given for the Ms and can easily be derived from the graphical representation of the tensor-e.g. see figure 3). Analogous expressions hold also for binary MERAs. In this case, the main building blocks are type- 3 6 tensors of the form which yield an expression for Q (n→0) k similar to equation (6)-the main difference being the labels a (k) j that now span a binary (not ternary) set-(e.g. see [36]). In the evaluation of observables acting on two disconnected subsystems k and k , analogous simplifications can be performed, leading to the construction of their joint causal cone. For instance, suppose that k and k individually define two minimal causal cones. Due to the topology of the MERA, after a number of layers, which is logarithmic in the distance between the subsystems (i.e. n −n = log 2 |k − k | + O(1) or n −n = log 3 |k − k | + O(1) for binary and ternary MERAs, respectively), such cones will inevitably intercept. The joint causal cone of k and k will then be composed of two parts: a lower part, which percolates the two sites from the bottom layer of the graph to the interception layer and which can be constructed by two independent concatenations of M tensors defined in equation (6); and an upper part, which terminates the percolation from the interception layer to the top layer of the graph and which is represented by a (possibly unstable) causal cone P. As a result, given A and B, two operators acting on k and k , respectively, we can compute their expectation value |A ⊗ B| as where the adjacent writing of two tensors without contraction indicates the tensor product of them. As shown in section 2.4, the logarithmic dependence of n −n upon the subsystems is the key ingredient that leads to power-law decay of correlations [12,21]. Before concluding this section, a couple of remarks are mandatory. The properties described here give an insight into why MERA structures are promising candidates for an The computation of an expectation value with a 2D X -Y striped structure (right). The top square represents the averaged density matrix at a given level ρ (n) and the red square an averaged raised operator h (n−1) , equation (30). For the sake of readability, we omitted the upmost tensorsM x ,M y . The causal cone basic element along one direction M x,y (left) is formed by the contraction of two isometries λ (purple) and one disentangler χ (light blue). efficient description of critical ground states. Nevertheless, it should be stressed that MERA decompositions are not translational invariant in a trivial way. Indeed, different from other tensor structures, such as the MPS or projected entangled pair states (PEPS), setting all the tensors equal in a 1D MERA with periodic boundary conditions does not give a translational invariant state: if we translate by one site the state, the topology of the tensor graph modifies in such a way that the causal cones have to be reconstructed from scratch, with no simplification of the operations available. The breakdown of such a fundamental symmetry suggests that the MERA may give good qualitative results, although not as precise as DMRG. Nonetheless, by increasing the bond dimension m (i.e. truncation in the renormalization group flow), the MERA should accommodate better and better approximations of the translational invariance [13,14]. Thus, to describe translational invariant systems by means of a MERA, we set all the tensors of a given layer identical and study the convergence of the results with respect to the bond dimension m. From now on, we focus on this particular subset of ansatzes.
Finally, it is also worth stressing that a MERA structure differs substantially from a simpler TTN, due to the presence in its topology of loops introduced by disentanglers χ. Thus, when performing optimization over the tensors, it is not trivially possible to formulate the optimization problem (see section 3.1) as alternating least squares problems [28,33] and discard the isometric requirement toward the upper level. Instead, a direct nonlinear optimization over isometries has to be done, which is more difficult and subtle to handle [14]. Some possible methods to tackle this optimization are reviewed in section 3.1, together with new ones (see also [30]).

Two-dimensional (2D) systems
It is a well-known fact that difficulties in tackling 2D systems are related to their entanglement properties [7,25,29]. PEPS were indeed proposed as a generalization of MPS to face this issue [7,34]. On the other hand, as anticipated in the introduction, MERA tensorial structures can satisfy these requirements, thanks to the presence of disentanglers mixing the neighboring degrees of freedom [15]. Different ways to arrange isometries and unitaries while preserving a finite-width causal cone have been put forward in [15,17,35]. In figure 4, we propose here yet another one, characterized by alternating stripes along the xand y-directions made up of the  (18)). The contraction of the green upper part of the graph defines the operator σ (n) , while the lower red parts define the averaged maps ( ⊗ ).
ternary 1D structure described before. Such an ansatz might violate the geometrical symmetry of the problem (translations and 90 • rotations) even more markedly than others. However, being a direct generalization of the 1D case, its numerical implementation presents several advantages that ultimately lead to a reduction in the computational cost (see section 4).

Evaluation of observables
The process of evaluating observables, described in section 2.1, can be read as a sequential application of linear superoperators, or maps, acting either to the observable itself or to a reduced density matrix at the top level [13,21,30]. The first interpretation is directly related to a constructive renormalization procedure. Let us first label the MERA layers with an index j ∈ {0, 1, . . . , n}, with j = n indicating the bottom layer and j = 0 the upper MERA layer, respectively. All the operators of the Hilbert space at a certain level j of the decimation are further re-defined at level j − 1 by the corresponding RG isometric transformation. The process goes on until the effective observables are defined on the treatable system at the top and then evaluated. The top-bottom interpretation instead finds its roots in a circuit picture, where the original compact state is spread on successive layers by means of isometries and entangling unitaries. The two complementary pictures are commonly addressed as Heisenberg and Schrödinger ones. In this subsection, we review their basic properties, referring the reader to [36] for an exhaustive mathematical description. A practical guide on how to implement the Step-by-step (from left to right) optimal contraction rules of the superoperators C (upper) and L (lower) onto the averaged density matrix at a given level ρ (n−1) (green four-leg tensor) for the ternary MERA. The operator R is obtained by symmetry of L . Color codes are given in figure 2 for χ,χ , λ andλ; the purple (light blue) 'bananas' represent the tensors resulting from the contraction of χ andχ (λ andλ). In the last two steps, tensors multicolor code recalls the tensors that have already been contracted. Contractions scaling costs as a function of m are given by counting the number of contracted and free indices.
contractions involved in a map application can be found instead in section 3, and an explicit graphical representation in figure 6.
Formally, the Heisenberg description of the RG flow implied in the calculation of the MERA expectation values can be obtained as follows. Using then equation (5), we can write equation (6) as follows: where for j = 1, . . . , n we defined with A (n+1) k := A k being the tensor that describes the bare operator A. Accordingly, A k can now be interpreted as the tensorial representation of the updated version A ( j) of the operator A after n + 1 − j renormalization steps (again, A (n+1) is identified with A). In operator language, the mappings (10) define thus a collection of raising superoperators˜ and whose Krauss form is given by the tensors M ( j) a (k) j , as indicated by the last identity. (Here, for ease of notation, we have introduced the convention of identifying an operator with the tensor that represents it. Note that, in this formalism, the labels a (k) j , defined in equation (6), specify which lower legs of the tensor M ( j) should encode the input degree of freedom of the associated operator). The mappings˜ a are completely positive, meaning that˜ a ⊗ I take positive operators into positive operators (for every identity superoperator I acting on any ancillary system), and unital, meaning that the identity operator is a fixed point of˜ a . Using these definitions, the expectation value (4) can then be cast as in where '•' represents super-operator composition and ρ a is the (reduced) density matrix at the top level of the MERA. The latter is obtained by taking the trace of the hat tensor on the legs complementary to the plugging ones denoted by a, i.e.
We stress that, in writing equation (12), we assumed the subset k to be associated with a minimal stable causal cone of the MERA. The analysis, however, can easily be extended to cover other configurations. The Schrödinger representation of the above identity is constructed by introducing the adjoint ( j) a of the superoperators˜ ( j) a . As a consequence, equation (12) can be cast in the equivalent form: By construction, the maps ( j) a are CPT. They define a reversed RG flow, proceeding from the top of the MERA to its bottom layer, which brings the hat state ρ a (k) 0 of equation (13) into the reduced density matrix of | associated with the k subsystem on which A is acting, via a series of steps of the form (Here, again, we have identified operators with the tensors that represent them and ρ ( j) describes the evolution of ρ a (k) 0 at the jth RG step.) A quantity of interest, especially for descriptions that are explicitly breaking the translational invariance symmetry, is the average of an observable expectation value across the whole lattice. For the cases described by equation (14), and remembering that, in our notation, the index k is used to indicate the subsystem position along the many-body state, this corresponds to computing quantities of the form [36] with ρ (0) being the average of ρ (0) a over a and respectively, for ternary and binary MERAs. In deriving the above expression, one uses the fact that, by varying k, the string {a (k) j } j=0,n spans all possible 'words' with n + 1 characters chosen from the set of a. It is worth stressing that, for a generic MERA, the maps ( j) will depend explicitly upon the layer level j due to the fact that the χs and the λs are not uniform. This, of course, is no longer the case when considering homogeneous MERA networks. As will be discussed in section 4, in this case, equation (16) becomes particularly simple to treat, making it a useful tool to treat short-range, translational invariant, properties of the system. Two-point correlation functions can also be reformulated in terms of the superoperators. We choose here two possible ways of performing averages over them, the first one being more convenient for computational purposes and the second one having a clearer correspondence with the correlations at fixed distance usually defined in condensed matter physics. For ternary MERAs, both approaches start from a four-neighboring-site density matrix σ (n) at the interception layern averaged over translations, which can be obtained from the upper level by consecutively applying the proper CPT superoperators to the hat state, in the Schrödinger scheme. Below this interception layer, we choose an algorithm of level growth, by assembling the CPT maps of our knowledge (the a ones), in order to achieve some kind of longrange reduced density matrix. The first idea considers the iterative application of the ⊗ defined above in equation (16). Following the definitions for a ternary MERA, this gives the expectation value where n = n −n > 0. Note that the observable A is averaged over a subset formed by 3 n neighboring sites, called a causal shadow in [36]. Similarly, B is averaged over a shadow, but the sites it acts upon are distant 3 n from the first set. Furthermore, the correlation between the two averages is again averaged over a set of 3¯n discrete translations, each one jumping over a distance of 3 n . In a similar manner, the usual fixed distance correlator averaged over all translations can be obtained as well by applying iteratively the map This yields and n = n −n as above. It is straightforward to note that, in both cases, fixed distance and causal shadows, the distance at which we are calculating the correlator scales exponentially with n. Similar expressions hold also for binary MERAs (see, e.g., [36]).

Infinite networks and properties in the thermodynamical limit (TL)
Scale-invariant case. In a generic MERA, the tensors entering the decomposition may differ from node to node and their indexes may have arbitrary (finite) dimensions. Under these conditions, any state of the physical system S can be represented, as in figure 1, by a proper choice of the χs and the λs, provided that the bond dimension m increases enough along the RG flow. In section 2.1, however, we have already restricted our investigations to the case where tensors in each semi-layer are assumed equal. The choice was motivated as an attempt to emulate translational invariance, just as would be done in the traditional RG procedure.
Following [21], [30]- [32], [36], here we adopt a completely homogeneous ansatz where also all layers are assumed to be equal. With this choice, the MERA identifies an even narrower subset of many-body quantum states, characterized by an intrinsic scale invariance symmetry [12] that is typical of critical systems. This property makes such an ansatz appealing for the characterization (at least approximate) of ground-state properties of critical, translational invariant Hamiltonians and their numerical simulation [14], [30]- [32]. The self-similarity allows us to drop layer indices from expressions of section 2.1 and leads easily to an investigation of their TL, N → ∞. Evaluation of both local observables and correlation functions can be performed by repeatedly applying the same CPT maps. The thermodynamical properties are then related to the convergence and to the mixing (or relaxing) property of the CPT map sequence [36,37]. It is worth recalling here that a mixing channel is characterized by the property with f being the CPT map that transfers every operator into a density matrix ρ f (the unique fix point of ) times Tr[ ] (due to trace preservation). The key point is the fact that the vast majority of CPT maps acting on a given system are mixing maps (the non-mixing ones form a subset of zero-measure). As a consequence, apart from some rare pathological cases, the TL of expressions (16)-(20) is well defined 8 . Exploiting the above property, the spatially averaged expectation value of the local observable (16) then becomes with ρ (th) being the fix point of the MERA channel . In a similar way, the thermodynamic limit of two-point correlation functions under causal shadows (18) with σ (th) being the TL of the average four-(nearest)-neighbouring-sites density matrix 9 . As we can see, the chosen shadow depth n (which was kept constant when approaching the limit) simply fixes the number of times the map ⊗ must be applied. An identical argument holds for fixed-distance correlation functions, yielding the expression AB (th) The scaling behavior of a correlation function is determined by its deviation from the uncorrelated value of the observable expectations. Unfortunately, when dealing with nontranslational invariant states, one has to take into account the residual classical correlation that 14 can arise. With this, we can rescale the correlator of interest to us (say the fixed-distance one) and then approach the TL in the proper order: where is the version of σ (th) where we have taken out the quantum correlations while keeping eventual classical ones. Note, for instance, that ζ (th) = ρ (th) ⊗ ρ (th) , even though it is separable by definition 10 .
The correlation functions computed in equation (25) are of particular interest, since from their decay one can deduce whether the system is critical and, in this case, its universality class. This task is traditionally performed by computing the two-point functions, fitting them either with exponentials or with power laws, and finally applying finite-size scaling theory to recover the behavior in the thermodynamic limit. For homogeneous MERAs, thanks to equations (22) and (23), this operation largely simplifies. Indeed, expanding the argument of the superoperator in terms of its eigenvectors, equation (25) can be simplified as where the sum is taken over the eigenvalues ε of , and the g ε (·) factors are finite-order polynomials in their main argument (whose coefficients depend on A, B and σ (th) − ζ (th) ). Now, exploiting the fact that Tr[σ (th) − ζ (th) ] = 0, we must have that lim n→∞ n (σ (th) − ζ (th) ) is the null operator. This tells us that σ (th) − ζ (th) has zero component over the unique eigenoperator of with eigenvalue 1. Furthermore, recalling that the distance at which we are calculating the correlation is k = 2 · 3 n , we obtain Analogous results are obtained for causal shadow correlators as well (the resulting expression is identical to (27) once we replace the eigenvalues of with those of ⊗ ). This implies that (apart from negligible logarithmic corrections) the fixed-distance correlations of a homogeneous MERA state show a power-law-like scaling behavior in the TL, the logarithms of the eigenvalues of playing the role of critical exponents. This is an irrefutable signature of the critical character of such states.

MERA optimization
In the previous paragraphs, we reviewed some possible tensor structure ansatzes and the mathematical description that allows one to extract information from them. From this starting point, given the Hamiltonian under consideration, one should optimize the variational ansatz to minimize the energy of the system and thus obtain an efficient description of the system ground state. The aim of the present section is to provide a step-by-step guide to the implementation of the various operations to accomplish this goal. We discuss in some detail the 1D case. This procedure can be straightforwardly generalized to a 2D structure: for example, the tensor structure introduced in section 2.2.
The building blocks of the algorithm (reading information, optimizing or finding the CPT map spectra) can be formulated as a repeated application of the superoperators of section 2.3 [13,14,30]. The computational requirements for such operations set the scaling of the whole algorithm in terms of the bond dimensions m. For the ternary MERA, this scaling is m 8 [31]. The scaling of the optimization algorithm of the 2D striped proposal presented here is m 12 , giving an improvement with respect to previous proposals, where it was m 16 [15] or worse [35].
In the following, we focus on 1D systems, and, given the favorable scaling of the ternary with respect to the binary MERA, we review here some contraction rules for the former [30], the latter being only slightly different [14]. Given the causal cone structure (6), three different cases can be distinguished for each of the two 'directions' and˜ in the structure (11)- (15). It is then sufficient to implement a few basic operations to cover all the cases. In figure 6, we show how to get minimal computational scaling through possible sequences of contractions (not necessarily unique).
Diagonalizing maps. As pointed out in section 2.4, we have to compute the spectrum of the descending superoperator in order to compute the critical exponents ruling over the power-law decay of correlations. Moreover, we have to determine the fixed point ρ (th) both for measuring local observables (22) and for having access to the coefficients in the correlation decay. A generalized eigenproblem is thus to be solved in order to find the eigenvalues with the largest modulus. This task can be performed by means of a Lanczos-like algorithm. In all our numerical implementations, we made use of a Davidson method [40]. Such diagonalizers work directly with a subroutine that applies the operator on a given input, exactly what we just described in figure 6. This is a considerable advantage with respect to writing the whole representation of the CPT map, which would take order m 8 memory slots and m 10 operations! The solvers build an effective tridiagonal matrix out of a moderate number N op of operations starting on a random input. A guess of the eigenvector usually speeds up the process considerably, by accelerating the convergence. We exploit this last feature for re-imposing the fix-point constraint after tensor optimization, due to the slight variance of the map.

Cost function optimization
In this section, we review existing tools and present new ones to optimize a MERA structure with largest bond size m to find the best possible approximation of the ground state of a shortrange Hamiltonian H acting on a system with local dimension d. The optimization can be rephrased in terms of minimization of the global energy [17,30,32], or as maximizing the fidelity along an imaginary time evolution driving the system towards the ground state [14]. In both cases, one has to find the extremal points of a properly defined cost function. In the following, we focus on the former, because the latter optimization can be performed by means of very similar tools [14].
The local connectivity of the ansatz and the width of its causal cone fix the number of neighboring sites that have to be considered in the application of the MERA map. This has to be taken into account when considering models with finite range interactions. For example, in the specific case of the ternary structure (5), Hamiltonians acting on at most two neighboring sites are mapped into effective ones acting again on two adjacent sites at the upper level. In the striped 2D proposal, one has to consider 2 × 2 plaquettes. Henceforth, we limit the explanations to such minimal cases, since the generalization to more complicated interactions is straightforward (e.g. up to five adjacent sites are mapped into only three sites in a single application of the raising operator˜ ).
Expressing the Hamiltonian H as a sum of local contributions H = i h i , the minimization problem is set as follows: where is described by a MERA, h is the averaged expectation value of local Hamiltonian h i , and the last equality holds for translational invariant Hamiltonians. The average defined in equation (28) can be rewritten, by means of equations (11)-(15), as valid for every 1 < n < N , where ρ (n) and h (n) are, respectively, the (averaged) lowered density matrix and the raised Hamiltonian under the nth layer. The previous expression defines an extremal problem (in most of the cases not bilinear in the tensors) to be solved together with the constraint of isometricity [31,32]. The problem can be attacked by approximating it as a bilinear one for a pair of tensors in a given position, assuming that the others are fixed, and then reiterating the procedure [14]: where R and S are the tensors resulting from the contraction of the whole tensor structure, except the tensors to be optimized, χ,χ or λ,λ (we refer the reader to [30] for an extensive treatment of such methods). The latter observation means that, due to the presence of loops in the topology of the MERA, it is not possible to discard the non-isometrical part of the (bilinear) solution to upper levels as it is done in MPS or TTN [33]. Unfortunately, no algorithm is known for solving a bilinear problem with isometric constraint (2), so we have to resort to further approximations or to other methods, as reported in the next section [30,32]. When considering an infinite fully homogeneous MERA, the fixed-point equation for the thermodynamical density matrix ρ (th) should also be taken into account. The coupled equations are then (ρ (th) ) = ρ (th) and min with h (sd) being the Hamiltonian block raised over the low-lying layers accounting for short distances [31,32]. Writing the analytical dependence of the fixed-point condition on the tensors poses a further challenge. One possible strategy to tackle this problem is to alternate solutions of the bilinearized problem (30), with ρ (th) fixed, through the favorite technique and imposition of the constraint (31). Although not guaranteed to converge, because of local minima or pathological interplay between optimization and constraint, this path has been proven to lead to satisfactory results [15,31,32]. Another way to cope with this problem is to define an energy functional acting directly on the tensors by intermediate computation of the fixed point. Such a functional can then be minimized, for example, via direct search techniques, as shown below.
Gradient methods. A gradient method on the tensors treated as a simple collection of variables is not directly performable due to the isometric constraint (2) and the presence of loops in the tensor structure [12,24]. However, by exploiting the group nature of unitary operators, we can define the new tensor as the product of a unitary operator and the old one: where K j× j is a base for the j × j Hermitian matrices and u ∈ R 4 and v ∈ R 6 are vectors with purely real values. The initial point corresponds then to u = v = 0, and the gradient of the energy expectation value (29) can be computed without complications of non-commutativity, being ∂χ /∂ u = i K · χ and ∂λ /∂ v = i K · λ. Moreover, bilinearization (30) is not needed and we can tackle directly the quartic functional. Thanks to trace cyclicity, the energy gradient can always be expressed as with G w being a Hermitian operator that can be computed, as shown in figure 8. Any of the known gradient-based optimization strategies can be now applied to update the tensor structure ansatz, following strictly the gradient [41], or taking random increments along the components keeping only the gradient signs [23,32], or using conjugate gradient techniques with estimation of the maximal move [41]. The main disadvantage of these techniques is that they usually need a large number of iterations and a subtle tuning of the move amplitude to get a good convergence.
Linearized problem. An alternative procedure consists in linearizing the problem, i.e. mimicking that χ andχ are independent variables and solving for one while the other is kept constant at the old value, and then repeating [14,30]. The minimization problem (30) then becomes min µ (n) where Q t is an 'environment' for the unitary tensor µ (figure 7), which depends on the level n, the density matrix ρ (n) and the old values of both χ (n) and λ (n) . Such a simplified problem has an exact solution in terms of the polar decomposition of the environment, Q µ pol.dec.
≡ µ P, µ = χ, λ, P pos. def., (35) with P being the corresponding positive matrix containing moduli of the singular values. It is then sufficient to make the Hamiltonian block negative defined (by shifting down its spectrum) in order to have this absolute value maximization coinciding with energy minimization. The main advantage of this strategy compared with the gradient-based one is that, in general,  (30) for a = L (upper) and a = C (lower), defined to minimize χ and λ; the R a and S a tensors are indicated. The rightmost column represents equation (34) in the considered cases, where Q is the 'environment' tensor. The color codes are the same as in figure 6. The red tensor is the raised averaged Hamiltonian h. it is characterized by a faster convergence, even though it is not formally guaranteed to monotonically decrease the energy. In the case of fully homogeneous infinite ansatz, this procedure has to be alternated with the fix-point constraint, as a global solution of the problem of equation (31) is not available [32,36].

Direct search algorithms.
A possible alternative approach we propose here, to tackle the two requirements simultaneously (31), is the application of the direct search methods. A realvalued energy function E acting on vectors u and v with real components can be defined as follows: 1. From the real vectors, unitary (isometric) tensors are computed by considering them as components along a proper Hermitian base as for the gradient method and exponentiating (truncating) the result, From these tensors, the averaged CPT map ( u, v) is defined and its fix point ρ ( u, v) f is computed via diagonalization. 3. Finally, the output energy is computed by simply contracting and tracing as Given the previous functional, a direct search routine [41] can be applied, i.e. an optimization algorithm that does not explicitly compute gradients being based on the dynamics of a polytope in the configurational space. These algorithms, after initialization, cost a single functional evaluation for every step and energy is guaranteed never to increase. They converge in a number of steps as a multiple of the number of coordinates in the configurational space: as the degrees of freedom scale as m 4÷6 , their practical exploitation might be limited. Nonetheless, they might be exploited to initialize the tensor structure, which can be enlarged later with the technique described below.
Imaginary evolution. As already mentioned, another possible strategy that has been proven successfully in the finite case is the use of imaginary time evolution to extremize the figure of merit as the ground-state energy. We refer the reader to [14,30] for an extensive treatment of the subject.

Expanding an existing structure
Up to this point, we have considered a tensorial structure with a given bond size m, and we have shown how to deal with it and optimize the ansatz towards the minimal energy. Increasing the bond size m enlarges the degrees of freedom kept along the renormalization and the practical effect to increase the variational space of minimizations. All the methods reviewed in the previous section start from a random initial collection of entries or some initial educated guess, and proceed through a complicated energy landscape towards the goal. It is rather intuitive that, with increasing number of degrees of freedom, such a landscape would become more and more perplexed. Starting naively, the standard algorithm from a random point might strongly suppress the convergence speed or inhibit it completely in the case of glassy landscapes. A nontrivial trade-off between increased description capability and decreased optimizability has thus to be faced.
Here, we describe a simple strategy to escape from such a devil's cage (also suggested in [15]) by actually 'inflating' the converged ansatz with a certain bond dimension m into a new one with bonds m and the same expectation values. The increased variational space allows for the minimization process to go on towards a better minimum, starting from an already reasonable guess. The guess is that the old parameters will be slightly corrected, and the new ones take care of improving the description. We explain in detail this simple tensor network transformation, along its basic steps: 1. The asymptotic density matrix (stored for convenience) is re-encoded, setting the old entries as if they were corresponding to the first m states of an m -dimensional system (1 i, j, k, l m ), with zeros elsewhere: elsewhere.
2. The isometric tensors χ are enlarged as ρ and then the unitary vectors living on the new degrees of freedom are defined. The simplest choice is where the delta is intended as acting between the compacted indices, i.e. the tensors are seen as normal matrices. Otherwise, it is also possible to sort out some random, mutually orthogonal, unitary vectors on such components. and X X Y for = 0 (center) and = 0.5 (right), and the relative error ε for m = 4.

1D systems
In table 1, we report some of the numerical results for the critical exponents of the model (36) obtained with a binary MERA [12] using the approach developed in [21]. The results for the critical exponents have been shown in [32]. The results obtained for the Ising model are very promising. However, it is well known that the Ising model has a simple spectrum and properties due to the fact that it is equivalent to free fermions [39]. When the iMERA algorithm is applied to the study of a more complex model, such as the X X Y model, the errors are greater and they increase with (see table 1, center and right). One critical exponent is computed with precision of less than 50%, clearly indicating that the numerical resources used are not sufficient. This result reflects the increasing complexity of the studied model, which can also be detected by studying the decay of the populations of the fixed point ρ th . Indeed, guided by the experience of DMRG, we expect that, if the exact fixed point of equation (31) is a density matrix with fast decaying populations, we can obtain a good description of it with already very small m. In contrast, the approximated expectation values and correlators will not give a good description of the physical model at the TL. This expectation is confirmed in figure 9, where we show the ordered populations of the fixed point density matrix ρ th for the critical and non-critical Ising model, as well as for the X X Y model. As can be clearly seen in the figure and in the inset (where the same data are shown in log-log scale), the ρ th describing critical states displays an initial power-law decay and then an exponential tail. In the case of the Ising model without criticality, the density matrix is almost a perfect pure state. (The ratio between the first and the second population is of the order of 10 −5 , reflecting the fact that the ground state is ferromagnetic.) In the critical state case (B = 1), the populations display a power-law decay with a final exponential tail. A similar behavior is shown by the X X Y model at = 0 (where we obtain very good precision), with a drop in the populations after about 40 states that suggests that the exponential tail is a numerical artifact due to the finite size of the density matrix. However, the density matrix dimension is sufficient to account for the required information as the good results on the obtained critical exponents confirm. This is not the case with = 0, where the errors increase rapidly and where a cut in the populations of the order of a few per cent with respect to the biggest ones (where the exponential tail begins, again around the 40th population) introduces large errors on the critical exponents.
Finally, in the second inset, we plot the von Neumann entropy of the reduced density matrix ρ th , which again reflects the increasing complexity of the state to be described. It is zero for the ferromagnetic state, approximately one for the critical model and two for the X X Y model. This quantity can thus give an operational indication of the resources needed to obtain a good state description, and can be related to the conformal charge of the model, as shown in [31]. Ordered populations p α of the fixed points ρ th = α p α |α α| of the studied models. Inset (left): the same but in log-log scale. Inset (right): von Neumann entropy of the reduced density matrices S = α p α log 2 p α for the Ising model with B = 50 (blue full square) and B = 1 (empty square), and the X X Y model as a function of (red circles).

2D Ising model
One great potential of MERA is its use in dimensions higher than one: another case where DMRG cannot be efficiently applied. In two dimensions, the tree-like structure has to be constructed in both spatial dimensions. This construction can be done in various ways, some of which are used in [12,13,35]. We implemented a double-layer X -Y structure built by alternating disentanglers and isometries in both directions, as depicted in figure 4, based on the 1D ternary MERA introduced in [31]. This implementation allows for a favorable scaling of the algorithm of the order of O(m 12 ). As mentioned before, this tensor structure explicitly breaks the symmetry between the two spacial directions, possibly introducing some artificial numerical results.
In the same spirit as in the previous paragraph, we tested our optimization algorithm, determining the first eigenvector of the MERA transfer matrix for an Ising model in two dimensions, defined by the Hamiltonian (36) with γ = 1 and B = 2b. This problem was recently investigated by Evenbly and Vidal [15], who studied the critical exponent for the magnetization and the decay of spin-spin correlation functions. Our results are consistent with those in [15].
Monte Carlo simulations predict at B MC 3.04 a critical point with critical exponent related to the vanishing of the order parameter β MC c 0.32 [38]. We computed the critical field B c and exponent β c for different values of the dimensions m of the tensors. The errors as a function of 1/m are reported in figure 10 and display an exponential convergence to the predicted values [14]. Extrapolating the data as a function of 1/m, the critical field and exponent predicted by the quantum MERA channel are a few per cent away from the results of [38]. Right: exponential fit of the convergence to the Monte Carlo data of [38] of the computed critical field B c and critical exponent β, as a function of 1/m.

Conclusions
In this paper, we have reviewed how to build and optimize a different layered hierarchical structure as 1D and 2D binary and ternary MERAs. We have reviewed the quantum channel approach to study the TL and efficiently extract the critical exponents from such tensor structures. We have presented our results for the determination of the critical exponents in a 2D quantum Ising model and shown that, even with modest numerical resources, one can obtain good results. Finally, we have introduced some heuristic methods to control and check the convergence of the numerical simulations. The presented algorithms and methods can be applied to different tensor networks built to account for different physics: for example, the recently introduced fermionic approach to tensor methods [16]- [18].