Entanglement area law for shallow and deep quantum neural network states

A study of the artificial neural network representation of quantum many-body states is presented. The locality and entanglement properties of states for shallow and deep quantum neural networks are investigated in detail. By introducing the notion of local quasi-product states, of which the locally connected shallow feed-forward neural network states and restricted Boltzmann machine states are special cases, we show that the Rényi entanglement entropies of all these states obey the entanglement area law. We also investigate the entanglement features of deep Boltzmann machine states and show that locality constraints imposed on the neural networks make the states obey the entanglement area law. Finally, as an application, we apply the notion of Rényi entanglement entropy to understand the power of neural networks, and show that image classification problems which can be efficiently solved must obey the area law.


I. INTRODUCTION
Understanding the entanglement features of quantum systems is crucial for understanding many important physical phenomena. One of the most outstanding observations about entanglement is that the entanglement entropy is bounded by the area, rather than the volume, of a quantum system. This idea, now an important part of the holographic principle, can be applied to many different areas of physics, such as topological order [1][2][3], the fractional quantum Hall effect [4], topological insulators and topological superconductors [5,6], the anti-de Sitter space/conformal field theory (AdS/CFT) correspondence [7][8][9][10], and so on.
The holographic principle asserts that there is a duality between the boundary quantum field theory and the bulk gravitational theory. More precisely, it claims that the $(d+1)$-dimensional conformal field theories (CFT$_{d+1}$) are equivalent to the gravitational theory on the $(d+2)$-dimensional anti-de Sitter space AdS$_{d+2}$. Based on the holographic approach, Ryu and Takayanagi showed that the entanglement entropy of a subsystem $A$ in CFT$_{d+1}$ is related to the area of the static minimal surface $\gamma_A$ in AdS$_{d+2}$ whose boundary matches the boundary $\partial A$. The famous Ryu-Takayanagi formula [11] reads

$$S_A = \frac{\mathrm{Area}(\gamma_A)}{4 G_N^{(d+2)}},$$

where $G_N^{(d+2)}$ is the $(d+2)$-dimensional Newton constant. The key point here is that the entanglement entropy is bounded by the area of the quantum system and there is a duality between geometry and entanglement. For applications in quantum many-body systems, it is now a well-known result that the ground states of local gapped quantum systems obey the entanglement area law [12,13]: the entanglement entropy between a subsystem $A$ and its complement $A^c$ scales at most as the area $\mathrm{Area}(A)$ rather than the volume $\mathrm{Vol}(A)$ of the subsystem $A$. Intuitively, the entanglement area law results from the fact that the correlations between particles in a "natural" quantum system are usually local; thus the contribution to the entanglement entropy between $A$ and $A^c$, obtained by cutting the correlated pairs between $A$ and $A^c$, only depends on the pairs of particles in the vicinity of the boundary.

* Electronic address: giannjia@foxmail.com, zajia@math.ucsb.edu
† Electronic address: wuyuchun@ustc.edu.cn
‡ Electronic address: gpguo@ustc.edu.cn

Although many numerical and theoretical results support this intuitive argument, mainly in (1+1)D systems and in some (2+1)D systems, rigorously proving the entanglement area law is extremely challenging, and many sophisticated mathematical tools, like Toeplitz matrix theory [14,15], the Fisher-Hartwig theorem [16], the Lieb-Robinson bound [17], Chebyshev polynomials [18], and so on, must be used. Establishing the entanglement area law is now one of the central problems in Hamiltonian complexity theory.
On the other hand, a real quantum many-body system has an extremely large number (about $10^{23}$ or more) of degrees of freedom, which makes solving the Schrödinger equation directly a notoriously difficult task. Fortunately, physical systems often have a simplified internal structure, so that exponentially fewer parameters suffice to characterize the ground states and time evolution of the system; this makes many numerical and theoretical methods possible. The traditional mean-field approach can solve the equations for many weakly correlated systems. For strongly correlated quantum systems, many new tools have been developed in recent years. Quantum Monte Carlo sampling [19] provides a high-accuracy method for studying large systems; however, it suffers from the sign problem, which prevents it from being applied to frustrated spin systems and interacting fermion systems. Tensor network representations of quantum states and the associated algorithms, such as the density-matrix renormalization group (DMRG) and matrix product states [20], projected entangled pair states (PEPS) [21], the folding algorithm [22], entanglement renormalization [23], time-evolving block decimation (TEBD) [24], etc., play an important role in calculating 1d and 2d quantum systems and even in the construction of the AdS/CFT correspondence [25,26]. Among all of these numerical and theoretical methods for representing and approximating quantum states, the neural network, an important tool of machine learning that shows great power in approximating given functions and extracting features from large data sets, is now attracting much interest from both physicists and computer scientists.
Neural networks were recently introduced as a new representation of quantum many-body states [27], and they show great potential in solving some traditionally difficult quantum problems, for instance, solving some physical models and studying the time evolution of these systems [27,28], representing toric code states [29], graph states [30], stabilizer code states [31,32] and topologically ordered states [29,31,33,34], studying quantum tomography [35,36], and so on. Quantum neural network states are currently subject to intense research and represent a new direction for efficiently calculating ground states and unitary evolutions of many-body quantum systems. This research has stimulated an explosion of results applying machine learning methods to condensed matter physics, such as distinguishing phases [37], quantum control [38], error correction of topological codes [39], etc. The interplay between machine learning and quantum physics has given birth to a new discipline, now known as quantum machine learning.
In this work, we present a study of the entanglement properties of quantum neural network states. It has been shown that locally connected restricted Boltzmann machine states obey the entanglement area law [28]. Here we give a more comprehensive study of the entanglement properties of both shallow and deep neural network states. As an application, we apply the notion of entanglement entropy to understand the representational power of neural networks in image classification problems.
The paper is organized as follows. In Sec. II, we introduce the notion of local quasi-product states and establish the entanglement area law for these states. Since locally connected neural network states are special cases of local quasi-product states, they also obey the entanglement area law. Sec. III presents the study of deep Boltzmann machine (DBM) states; by introducing a geometry for the deep Boltzmann machine, we prove that local DBM states obey the entanglement area law. In Sec. IV, we apply the notion of Rényi entropy to understand the power of neural networks in solving image classification problems, and we show that the target function of a classification problem for local smooth images obeys the entanglement area law. Finally, we discuss in the last section some subtle issues concerning the locality and entanglement of neural network states.

The Schrödinger equation of a condensed matter system usually involves a large number of degrees of freedom, which makes it extremely difficult to solve exactly. However, the eigenstates of the Hamiltonians of these natural systems often have a simplified internal structure, which makes many approximate or even exact methods possible. Neural network states were recently introduced as ansatz states of many-body quantum systems, and they have attracted much attention because of their good performance on some problems that cannot be solved using state-of-the-art methods [27,28,30,40]. Here, to explore the area-law entanglement of neural network states, we first introduce the concept of quasi-product states. As we will show later, the locality constraint imposed on the neural network architecture results in states of quasi-product form.
Let $S = \{s_1, \cdots, s_N\}$ be a system with $N$ particles. By a local $K$-cluster cover we mean a class of local subsets of $S$, viz., $C_1, \cdots, C_M$, called local clusters, for which each $C_i$ contains at most $K$ particles in a local region and $\cup_{i=1}^{M} C_i = S$. A local $K$-cluster quasi-product state can then be defined as $\Psi(s_1, \cdots, s_N) = \Phi_1(C_1) \times \cdots \times \Phi_M(C_M)$, where each cluster term $\Phi_i(C_i)$ is a function of the degrees of freedom of the particles contained in $C_i$, and the cluster size $K := \max_i\{|C_i|\}$ does not depend on the system size $N$. Obviously, a product state is just a 1-local quasi-product state, i.e., each $\Phi_i(C_i)$ is just a function $\Phi_i(s_i)$, since each local cluster $C_i$ contains only one particle $s_i$; we will also refer to this kind of state as a local 1-cluster quasi-product state.
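The definition above can be made concrete with a minimal sketch. The chain of four spins, the 2-clusters and the cluster function below are hypothetical choices made only for illustration; they are not taken from the paper.

```python
from itertools import product
from math import prod

def quasi_product_amplitude(config, clusters, cluster_fns):
    """Psi(s_1..s_N) = prod_i Phi_i(C_i): each factor sees only its own cluster."""
    return prod(fn(*(config[i] for i in cluster))
                for cluster, fn in zip(clusters, cluster_fns))

# Hypothetical example: N = 4 spins on an open chain, covered by 2-clusters.
N = 4
clusters = [(0, 1), (1, 2), (2, 3)]
# Illustrative cluster function (an assumption, not from the paper).
cluster_fns = [lambda a, b: 2.0 if a == b else 1.0] * len(clusters)

K = max(len(c) for c in clusters)                   # cluster size, independent of N
covered = set().union(*clusters) == set(range(N))   # the clusters cover S

# Amplitudes of all 2^N configurations with s_i = +1 / -1.
amplitudes = {s: quasi_product_amplitude(s, clusters, cluster_fns)
              for s in product((+1, -1), repeat=N)}
```

Storing the state this way needs only the $M$ cluster functions, each with at most $p^K$ values for $p$-valued particles, rather than all $p^N$ amplitudes.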
It turns out that many crucial classes of quantum states can be expressed as local quasi-product states, such as cluster states, $Z_2$-toric code states, graph states, $Z_2$-stabilizer code states, and the ground states of Kitaev's $D(Z_d)$ quantum double model. They have all been explicitly constructed in local RBM form [29][30][31][32], and we will show later in this section that all local RBM states are local quasi-product states.
Many examples of local gapped systems come from local commuting Hamiltonians $H = \sum_k H_k$, for which $[H_k, H_l] = 0$ for all $k, l$ and each local term $H_k$ only acts on a local region $S_k$ of the system.¹ It is very natural to use the quasi-product state as an ansatz state to solve the eigenvalue equation $H\Psi(s_1, \cdots, s_N) = E_0 \Psi(s_1, \cdots, s_N)$. We can assign a cluster $C_k$ to each local term $H_k$, and usually we also impose the constraint $S_k \subseteq C_k$, i.e., $C_k$ contains all particles on which $H_k$ acts nontrivially. In this way, the eigenvalue equation can be simplified to a set of local equations, $H_k \Psi(s_1, \cdots, s_N) = E_0^{(k)} \Psi(s_1, \cdots, s_N)$ for all $k$, where $E_0^{(k)}$ is the ground state energy of $H_k$. Then, using other properties of the system, such as symmetry, we can instead solve these equations, which involve fewer variables, to give the solution of the original eigenvalue problem.
Here, for illustration, we choose cluster stabilizer code, toric code and graph state as examples.
Example 1. The cluster stabilizer code state, or equivalently, the ground state of the (1+1)D symmetry protected topological (SPT) phase Hamiltonian

$$H = -\sum_{k} \sigma^z_{k-1} \sigma^x_k \sigma^z_{k+1}$$

defined on a 1d lattice with periodic boundary condition, can be represented by a 3-local quasi-product state. Each term $\sigma^z_{k-1} \sigma^x_k \sigma^z_{k+1}$ is called a stabilizer. The cluster state is a $Z_2 \times Z_2$ symmetry protected topological state [41], which can be used for measurement-based quantum computation [42][43][44]. Here, we validate the efficiency of the local quasi-product state representation by explicit construction. The local clusters are chosen as the 3-local clusters corresponding to the stabilizers, i.e., $\Phi_k(C_k) = \Phi_k(s_{k-1}, s_k, s_{k+1})$. The ground state satisfies

$$\sigma^z_{k-1} \sigma^x_k \sigma^z_{k+1} \sum_{s_1, \cdots, s_N = \pm 1} \Psi(s_1, \cdots, s_N) |s_1, \cdots, s_N\rangle = \sum_{s_1, \cdots, s_N = \pm 1} \Psi(s_1, \cdots, s_N) |s_1, \cdots, s_N\rangle$$

for all $k$, which is equivalent to

$$s_{k-1} s_{k+1} \Psi(\cdots, s_{k-1}, s_k, s_{k+1}, \cdots) = \Psi(\cdots, s_{k-1}, -s_k, s_{k+1}, \cdots), \quad \forall k. \qquad (1)$$

Using the 3-local quasi-product ansatz $\Psi(s_1, \cdots, s_N) = \prod_{k=1}^{N} \Phi_k(s_{k-1}, s_k, s_{k+1})$ and cancelling the common factors from the two sides of the equality, we obtain

$$s_{k-1} s_{k+1}\, \Phi_{k-1}(s_{k-2}, s_{k-1}, s_k)\, \Phi_k(s_{k-1}, s_k, s_{k+1})\, \Phi_{k+1}(s_k, s_{k+1}, s_{k+2}) = \Phi_{k-1}(s_{k-2}, s_{k-1}, -s_k)\, \Phi_k(s_{k-1}, -s_k, s_{k+1})\, \Phi_{k+1}(-s_k, s_{k+1}, s_{k+2}), \quad \forall k.$$

These are highly nonlinear equations; they are therefore very difficult to solve directly, and the solution is not unique in general. But noticing that the model is translationally invariant, we can assume that all local cluster terms $\Phi_k(C_k)$ are of the same form. Via this simplification, we can obtain a solution, and it is easily verified that the resulting local quasi-product state satisfies Eq. (1).

Example 2. Let us now consider the toric code model ($Z_2$-Kitaev quantum double model) [45], which is the simplest model of topologically ordered states and plays an important role in quantum error correcting codes and fault-tolerant quantum computation [46]. Given an $L \times L$ square lattice with periodic boundary condition (i.e., on a 2d torus $T^2$), on each edge there is an associated spin space $\mathbb{C}^2$, thus there are $N = 2L \times L$ qubits in total. To each vertex $v$ and plaquette $p$ we assign the stabilizer operators $A_v = \prod_{j \in \partial v} \sigma^x_j$ and $B_p = \prod_{j \in \partial p} \sigma^z_j$, and the Hamiltonian is of the form

$$H = -\sum_{v} A_v - \sum_{p} B_p.$$

The ground state of the Hamiltonian is four-fold degenerate, which corresponds to the order of the first $Z_2$ homology group of the torus $T^2$, viz., $|H_1(T^2, Z_2)| = |Z_2 \oplus Z_2| = 4$. Here, by explicit construction, we show that the ground state of the toric code model can be represented as a 4-local quasi-product state. Actually, we can assign a cluster to each vertex and plaquette, thus the state is of the form

$$\Psi(s_1, \cdots, s_N) = \prod_{v} \Phi_v(C_v) \times \prod_{p} \Phi_p(C_p),$$

where $C_v$ (resp. $C_p$) only contains the spins on which $A_v$ (resp. $B_p$) acts nontrivially. As in the cluster state construction, we have the constraints $A_v |\Psi\rangle = |\Psi\rangle$ and $B_p |\Psi\rangle = |\Psi\rangle$ for all $v$ and $p$. As has been shown in the RBM form [29], a set of solutions to these constraints can be written down explicitly. The excited states can also be represented in quasi-product form in a similar way. We must stress here that this is just one of the solutions; in fact there are many other solutions, depending on the choice of the local clusters.

Example 3. Another example we will consider here is the graph state, which is an important class of multipartite entangled quantum states and is useful for quantum error correcting codes, measurement-only quantum computation and so on [47]. For a given graph $G$ with vertex set $V(G)$ and edge set $E(G)$, the graph state is defined as

$$|G\rangle = \prod_{\langle ij \rangle \in E(G)} U_{ij}\, |+\rangle^{\otimes N},$$

where $N = |V(G)|$ and $U_{ij}$ is a two-qubit controlled-Z gate. The wave function thus takes the form of a product of one sign factor per edge: labelling the computational basis by $s_i \in \{0, 1\}$, $\Psi(s_1, \cdots, s_N) = 2^{-N/2} \prod_{\langle ij \rangle \in E(G)} (-1)^{s_i s_j}$. To represent the state using the local quasi-product state, we can assign a cluster to each edge $e = i_e j_e \in E(G)$, with cluster term $\Phi_e(s_{i_e}, s_{j_e}) = (-1)^{s_{i_e} s_{j_e}}$ up to normalization, which is obviously a 2-local quasi-product state.
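As a hedged illustration of Examples 1 and 3, the sketch below evaluates the graph-state wave function as a product of 2-local edge factors and checks the stabilizer condition of Eq. (1) by brute force on a ring of six spins, using the fact that the cluster state with periodic boundary condition is the graph state of the ring graph. The function names and the convention that $s = +1$ corresponds to the bit 0 are assumptions made for this example, and the factorization shown is one valid choice, not necessarily the solution given in the paper.

```python
from itertools import product

def bit(s):
    """Map a spin value s = +1/-1 to the bit b = 0/1 (assumed convention: s = +1 <-> |0>)."""
    return (1 - s) // 2

def graph_state_amplitude(config, edges):
    """Unnormalized amplitude: one 2-local factor (-1)^(b_i b_j) per edge."""
    amp = 1
    for i, j in edges:
        amp *= -1 if bit(config[i]) and bit(config[j]) else 1
    return amp

def satisfies_stabilizers(n, edges):
    """Check s_{k-1} s_{k+1} Psi(.., s_k, ..) == Psi(.., -s_k, ..) for every k on a ring."""
    for config in product((+1, -1), repeat=n):
        psi = graph_state_amplitude(config, edges)
        for k in range(n):
            flipped = list(config)
            flipped[k] = -flipped[k]
            lhs = config[(k - 1) % n] * config[(k + 1) % n] * psi
            if lhs != graph_state_amplitude(flipped, edges):
                return False
    return True

n = 6
ring_edges = [(k, (k + 1) % n) for k in range(n)]
ring_ok = satisfies_stabilizers(n, ring_edges)          # ring graph: Eq. (1) holds
open_ok = satisfies_stabilizers(n, ring_edges[:-1])     # missing edge: Eq. (1) fails
```

Dropping the edge that closes the ring breaks the periodic stabilizers at the two end sites, which is why the second check fails.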

B. Shallow neural network states
Here we will construct two important classes of quasiproduct states via feed-forward and stochastic recurrent neural networks, which will be the main focus of this work. To this end, we first need to introduce the notion of geometry for neural networks.

The geometry of neural network states
Inspired by the geometry of tensor network states [48], here we introduce the notion of the geometry of neural network states, which turns out to be crucial for understanding entanglement features. Hereinafter, we will concentrate on neural networks with a layered structure, which are also the most studied cases. The physical degrees of freedom are placed on some fixed layer of the neural network, e.g., the visible layer of a restricted Boltzmann machine or the input layer of a feed-forward neural network; this layer will be referred to as the physical layer. Notice that the physical layer has its geometry given by the physical system. For example, if the physical degrees of freedom (like spins) are placed on the square lattice, we can require the neurons (which represent the physical degrees of freedom) to have the same square lattice geometry. After the geometry of the physical layer is fixed by the geometry of the physical system, all other layers are required to have the same geometry, duplicated from the physical layer, as shown in Fig. 1.
Recall that the geometry of tensor network states is characterized by the positions of the local tensors and their contraction pattern; for neural network states, similar results hold. We can compare the distance between neurons in different layers since these layers have the same geometry. Now we can define the notion of locality of a neural network. A neuron $h_i^{(l)}$ in the $l$-th layer is called local ε-connected if it is only connected to neurons of the $(l-1)$-th and $(l+1)$-th layers lying in the ε-neighborhood of $h_i^{(l)}$ (see Fig. 1 for illustration). If all the neurons of a neural network are local ε-connected, we say the neural network is a local ε-connected neural network. A similar construction has been used in Refs. [29,31,32,49] for exactly constructing neural network states of some physical systems. When, in a local ε-connected neural network, each neuron $h_i^{(l)}$ only connects with $K$ neurons in both the $(l-1)$-th and $(l+1)$-th layers, we call it a $K$-local neural network. For a $K$-local neural network, a corresponding quantum state can be given. Usually, there are two different ways to build quantum neural network states [40]: the first approach, which is also the approach we use in this work, is to introduce complex weights and biases into the neural network; the second approach is to represent the amplitude and phase of a wave function separately. We will prove that quantum states built from $K$-local neural networks obey the entanglement area law, since they are all quasi-product states, and the entanglement area law of quasi-product states will be established later. To make this construction clearer, let us look at two important examples.

Local restricted Boltzmann machine states
In this part, we introduce the notion of restricted Boltzmann machine states, which were introduced in Ref. [27] for calculating the ground states and unitary evolution of strongly correlated many-body systems. The restricted Boltzmann machine (RBM), invented by Smolensky [50], is an energy-based neural network model [51,52]. Since an RBM only has two layers of neurons, one visible layer and one hidden layer, it can be regarded as a shallow neural network.
We now build quantum states from local RBMs and show that they are local quasi-product states. RBMs have a layered structure, which makes the locality defined above applicable. To construct a local RBM state, we first impose the locality constraints on the visible layer, which is nothing but the physical layer: each visible neuron corresponds to a physical degree of freedom (e.g., a spin), the visible neurons are denoted as $S = \{v_1, \cdots, v_n\}$, and the geometry of the visible layer is inherited from the physical system. The hidden neurons are denoted as $\{h_1, \cdots, h_m\}$; they are placed on the hidden layer, whose geometry is duplicated from the visible layer, viz., the distance between two neurons is defined in the same way as in the visible layer (where the distance is inherited from the physical system). The weight between $h_j$ and $v_i$ is denoted as $W_{ij}$, and the biases of $v_i$ and $h_j$ are $a_i$ and $b_j$ respectively. The RBM representation of quantum states is obtained by tracing out all hidden neurons, viz.,

$$\Psi_{RBM}(v_1, \cdots, v_n) = \sum_{h_1, \cdots, h_m} \exp\Big(\sum_i a_i v_i + \sum_j b_j h_j + \sum_{\langle ij \rangle} W_{ij} v_i h_j\Big) = \prod_i e^{a_i v_i} \prod_j \Big(1 + e^{\,b_j + \sum_{i; \langle ij \rangle} W_{ij} v_i}\Big)$$

if $h_j = 0, 1$, and by the notation $\langle ij \rangle$ we mean that $h_j$ and $v_i$ are connected. The $K$-local RBM state can be defined as the one whose hidden neurons $h_j$ only connect with at most $K$ visible neurons $v_{j_k}$ in a local ε-neighborhood of $h_j$. From the construction it is easily checked that $\Psi_{RBM}(v_1, \cdots, v_n) = \prod_j \Phi_j(v_i : \langle v_i h_j \rangle)$, where the single-site factors $e^{a_i v_i}$ can be absorbed into the cluster terms. This kind of construction has been used in Refs. [29,31,32,49] for investigating physical properties of complex systems. Let us now look at a one-dimensional example. As shown in Fig. 2 (a), the quantum state built from a 3-local RBM is a local quasi-product state: each factor $\Phi_j$ depends only on the (at most) three visible neurons connected to $h_j$, thus it is a 3-local quasi-product state.
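The factorization of a local RBM state can be checked numerically with a minimal sketch. It assumes the standard RBM form with hidden values $h_j \in \{0, 1\}$, a hypothetical open chain of five visible and three hidden neurons, and arbitrary complex parameters chosen only for illustration. Tracing out the hidden neurons by brute force agrees with the product of local factors, one per hidden neuron.

```python
import cmath
from itertools import product

def rbm_amplitude_bruteforce(v, a, b, W):
    """Sum exp(sum_i a_i v_i + sum_j b_j h_j + sum_ij W_ij v_i h_j) over h_j in {0, 1}."""
    total = 0j
    for h in product((0, 1), repeat=len(b)):
        exponent = sum(a[i] * v[i] for i in range(len(v)))
        exponent += sum(b[j] * h[j] for j in range(len(h)))
        exponent += sum(w * v[i] * h[j] for (i, j), w in W.items())
        total += cmath.exp(exponent)
    return total

def rbm_amplitude_product(v, a, b, W):
    """Same amplitude as a product of local factors, one per hidden neuron."""
    amp = cmath.exp(sum(a[i] * v[i] for i in range(len(v))))
    for j in range(len(b)):
        theta = b[j] + sum(w * v[i] for (i, jj), w in W.items() if jj == j)
        amp *= 1 + cmath.exp(theta)
    return amp

# Hypothetical 3-local RBM: hidden neuron j is connected to v_j, v_{j+1}, v_{j+2}.
n, m = 5, 3
a = [0.1 * (i + 1) for i in range(n)]
b = [0.2j * (j + 1) for j in range(m)]
W = {(i, j): 0.05 * (i + 1) + 0.3j for j in range(m) for i in (j, j + 1, j + 2)}

# Largest deviation between the two evaluations over all visible configurations.
max_diff = max(abs(rbm_amplitude_bruteforce(v, a, b, W) - rbm_amplitude_product(v, a, b, W))
               for v in product((+1, -1), repeat=n))
```

The keys of `W` list the connected pairs, so the cluster of hidden neuron `j` can be read off directly as the visible indices `i` with `(i, j)` in `W`.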

Local feed-forward neural network states
Another crucial class of quasi-product states is the local feed-forward neural network states. To start with, let us briefly recall the notion of a feed-forward neural network. The neurons of a feed-forward neural network are modeled by the McCulloch-Pitts neuron model [53]: $n$ input values $x_1, x_2, \cdots, x_n$ are transmitted through $n$ corresponding weighted connections with weights $w_1, w_2, \cdots, w_n$. After the input values have reached the neuron, they are added together with weights, $\sum_{i=1}^{n} w_i x_i$, and the result is then compared with the bias $b$ of the neuron to determine whether it is activated or deactivated. The activation status is characterized by the activation function $F$. Therefore the output of the neuron is given by $y = F\big(\sum_{i=1}^{n} w_i x_i - b\big)$. There are several commonly used activation functions, such as the step function, the sigmoid function and so on. Here, to make the construction more general, we do not restrict the form of the activation function, and we allow the activation functions of different neurons in one neural network to be different. A feed-forward neural network consists of several layers of neurons in which the neurons in adjacent layers are connected with each other but there is no intra-layer connection, as shown in Fig. 2 (b).
To build quantum states from a feed-forward neural network, complex weights and biases and complex activation functions need to be introduced.² We assume the output values of the output layer are $y_1 = F_1(v_1, \cdots, v_n), \cdots, y_m = F_m(v_1, \cdots, v_n)$; the quantum state is constructed as their product, $\Psi(v_1, \cdots, v_n) = \prod_{j=1}^{m} F_j(v_1, \cdots, v_n)$, where the normalization factor is omitted. If we add locality constraints on the connections between layers, then we get the local feed-forward neural network states; see Fig. 2 (b) for an example. The locality constraints make the corresponding states quasi-product states. As in Fig. 2 (b), the value of the first output neuron only depends on the particles $v_1, v_2, v_3$, and the corresponding quantum state is of the form $\Psi(v_1, \cdots, v_9) = \Phi_1(v_1, v_2, v_3) \times \cdots \times \Phi_7(v_7, v_8, v_9)$, which is obviously a quasi-product state. It is worth mentioning that the number of layers of the network should not be too large; otherwise the size of the local clusters $|C|$ of the corresponding states will be comparable with the system size $N$, which will break the locality constraint.

² In practical applications, introducing complex weights and biases and complex activation functions may lead to some difficulties in training the neural network. To overcome this shortcoming, the amplitude and phase of the quantum state are usually represented by two separate feed-forward neural networks; see, e.g., Ref. [40]. But here, for our purpose, we choose to use the complex neural network approach.
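The growth of the cluster size with network depth can be illustrated by a small sketch. It assumes a hypothetical open one-dimensional chain in which every neuron connects to a window of consecutive neurons in the layer below, without padding; the function name and this connectivity pattern are assumptions chosen to match the nine-input, seven-cluster example above.

```python
def receptive_fields(n_inputs, window, n_layers):
    """Input indices that each output neuron depends on, for an open 1D chain.

    Each neuron in a layer connects to `window` consecutive neurons of the
    layer below (no padding), so every layer is shorter by window - 1.
    """
    fields = [{i} for i in range(n_inputs)]        # layer 0: the inputs themselves
    for _ in range(n_layers):
        fields = [set().union(*fields[j:j + window])
                  for j in range(len(fields) - window + 1)]
    return fields

one_layer = receptive_fields(9, 3, 1)    # one local cluster per output neuron
two_layers = receptive_fields(9, 3, 2)   # clusters widen when a layer is added

# Largest cluster size as a function of depth for 9 inputs and window 3.
cluster_sizes = [max(len(f) for f in receptive_fields(9, 3, d)) for d in (1, 2, 3, 4)]
```

In this toy setting each added layer widens every cluster by `window - 1` sites, so at depth 4 a single cluster already covers all nine inputs and the quasi-product structure no longer constrains the state.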

C. Entanglement area law of local quasi-product states
Entanglement entropy is a crucial theoretical tool for investigating quantum many-body systems. We now establish the entanglement area law of local quasi-product states. Since local restricted Boltzmann machine states and local feed-forward neural network states are special cases of local quasi-product states, they all obey the entanglement area law. We prove that, in arbitrary spatial dimensions, local quasi-product states obey the entanglement area law for an arbitrary connected bipartition of the system. More precisely, we have the following theorem:

Theorem 1. For an $N$-particle system $S$, suppose that $|\Psi\rangle = \sum_{s_1, \cdots, s_N} \Psi(s_1, \cdots, s_N) |s_1\rangle \otimes \cdots \otimes |s_N\rangle$ is a $K$-local quasi-product quantum state. Then the Rényi entropies of the reduced density matrix $\rho_A = \mathrm{Tr}_{A^c} |\Psi\rangle\langle\Psi|$ with respect to the bipartition $S = A \sqcup A^c$ satisfy the following area law:

$$S_\alpha(A) \le \zeta(K)\, \mathrm{Area}(A),$$

where $\mathrm{Area}(A)$ denotes the number of particles on the boundary of $A$ and $\zeta(K)$ is a scaling factor that only depends on the size $K$ of the local clusters.
Proof. We first define three kinds of local clusters: (i) the clusters which only contain particles in $A$ (as $C_{int}$ in Fig. 3), called internal clusters; (ii) the clusters which only contain particles in $A^c$ (as $C_{ext}$ in Fig. 3), called external clusters; and (iii) the clusters which contain particles both in $A$ and $A^c$ (as $C_{bd}$ in Fig. 3), called boundary clusters. We will denote the set of particles contained in boundary clusters as $B = \bigcup_i C^{(i)}_{bd}$, where $C^{(i)}_{bd}$ are all boundary clusters, and set $\partial A = A \cap B$, and similarly for $\partial A^c$. The interior of $A$, denoted as $\mathrm{Int}A$, is defined as $\mathrm{Int}A = A \setminus \partial A$; the exterior of $A$, denoted as $\mathrm{Ext}A$, is defined as $\mathrm{Ext}A = A^c \setminus \partial A^c$.

Grouping the cluster terms according to the kind of cluster, we have

$$|\Psi\rangle = \sum_{s_B} \Big[\prod_i \Phi(C^{(i)}_{bd})\Big]\, |\Phi_L(\partial A)\rangle \otimes |\Phi_R(\partial A^c)\rangle, \qquad (5)$$

where the sum runs over the configurations $s_B$ of the particles in $B$, and

$$|\Phi_L(\partial A)\rangle = \sum_{s_{\mathrm{Int}A}} \prod_i \Phi(C^{(i)}_{int})\, |A\rangle, \qquad |\Phi_R(\partial A^c)\rangle = \sum_{s_{\mathrm{Ext}A}} \prod_i \Phi(C^{(i)}_{ext})\, |A^c\rangle,$$

with $|A\rangle$ and $|A^c\rangle$ the corresponding basis states of $A$ and $A^c$. The states $|\Phi_L(\partial A)\rangle$ are labeled by the particles contained in $\partial A$ and the states $|\Phi_R(\partial A^c)\rangle$ are labeled by the particles contained in $\partial A^c$. If each local system $s_i$ can take $p$ values, there are at most $p^{|B|}$ terms in the summation of Eq. (5). We stress here that $|B|$ only depends on the area $\mathrm{Area}(A)$ and the cluster size $K$; more precisely, $|B| \le R \times \mathrm{Area}(A)$, where $R$ depends only on $K$. Therefore, after tracing out the $A^c$ part, we get $\rho_A$ with rank at most $p^{|B|}$, thus the Rényi entropy satisfies $S_\alpha(A) \le \log p^{|B|} \le R \log p \times \mathrm{Area}(A)$, i.e., it is upper bounded by $\zeta(K)\, \mathrm{Area}(A)$ with $\zeta(K) = R \log p$.

For a physical system with a fixed background spacetime, particles only interact with their neighborhoods, and the strength of the interaction usually decays to zero if the particles are sufficiently far apart. Thus for any many-body physical system, there is a pre-existing geometry which characterizes the distance between two particles. However, when we write down a many-body quantum state, this geometrical information is usually erased in some sense. For a multipartite quantum system $S$, we have a state $|\Psi_S\rangle$ which encodes the full information of the system. To understand the entanglement features of $|\Psi_S\rangle$, we first divide $S$ into two parts $A$ and $A^c$. Since the Rényi entropy $S_\alpha(A)$ of the reduced density matrix $\rho_A$ quantifies the entanglement between $A$ and $A^c$, we can define the entanglement feature of $|\Psi_S\rangle$ as the set of Rényi entropies $S_\alpha(A)$ over all subsystems $A$. The entanglement feature of the system usually contains information about the geometry of the system; namely, using the entanglement-geometry correspondence (duality), we can recover the geometric information from the entanglement features.
Since the paradigm of the neural network is to adjust the connection weights, when a weight is zero we say there is no connection between the two neurons. It is then natural that the geometry of the neural network is reflected in the connectivity of the network. In the spirit of the entanglement-geometry correspondence, the entanglement is then encoded in the connectivity of the neural network of the state. This is consistent with the intuition we get from tensor network states, for which we can easily read off entanglement properties from the geometry of the tensor network. Since locally connected neural network states are all local quasi-product states, they all obey the entanglement area law. The locality of the states is encoded in the connection pattern of the neural network, which agrees well with our intuition. This kind of construction will also be useful for understanding the entanglement-geometry correspondence [49], such as the Ryu-Takayanagi formula in a discrete form [11,25,26].
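The rank-counting argument behind Theorem 1 can be checked numerically with a minimal sketch. It assumes a hypothetical 3-local quasi-product state on an open chain of spins with values ±1 (so $p = 2$) and an arbitrary positive integer-valued cluster function `phi` chosen only for illustration. For a cut in the middle of the chain, the boundary clusters contain $|B| = 4$ particles, so the argument bounds the rank of $\rho_A$, which equals the rank of the coefficient matrix $\Psi(s_A, s_{A^c})$, by $p^{|B|} = 16$ independently of $N$.

```python
from fractions import Fraction
from itertools import product

def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def schmidt_rank(n, cut, phi):
    """Rank of Psi[(s_A), (s_Ac)] for a 3-local quasi-product state on an open chain."""
    def psi(s):
        amp = 1
        for k in range(n - 2):
            amp *= phi(s[k], s[k + 1], s[k + 2])
        return amp
    left = list(product((+1, -1), repeat=cut))
    right = list(product((+1, -1), repeat=n - cut))
    return matrix_rank([[psi(l + r) for r in right] for l in left])

# Hypothetical positive cluster function (an assumption for illustration only).
phi = lambda x, y, z: 2 + x * y + y * z + 3 * (x == z) + (x + y + z == 3)

# Rank of rho_A for a middle cut, for increasing system size N.
ranks = {n: schmidt_rank(n, n // 2, phi) for n in (6, 8, 10)}
```

The rank stays the same as the chain grows, whereas a generic state of $N$ spins cut in the middle could have rank up to $2^{N/2}$.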

III. ENTANGLEMENT FEATURES OF DEEP NEURAL NETWORK STATES
In this part, we investigate the entanglement properties of deep neural network states, taking the deep Boltzmann machine (DBM) as an example. Although much progress has been made on RBM states, DBM states are less investigated [30]. There are several crucial reasons why we need deep neural networks rather than shallow ones: (i) the representational power of shallow networks is limited: there exist states which can be efficiently represented by deep neural networks but not by shallow ones [30]; (ii) any Boltzmann machine (BM) can be reduced to a DBM, which also limits the usage of the shallow BM (with just one hidden layer, viz., the RBM) [30]; (iii) the hierarchical structure of deep neural networks is more suitable for encoding holography [49,54,55] and for procedures such as renormalization [56].

Now let us take a close look at the geometry of a DBM neural network. Since we can reduce a DBM with $M$ hidden layers to a DBM with only two hidden layers by the folding trick [30] (see Fig. 4), it is sufficient to consider the deep neural network with only two hidden layers. The procedure is the same as what has been done for shallow neural networks. The visible layer consists of physical variables (visible neurons), thus its geometry is given by the fixed background geometry (e.g., the lattice structure of the system). The geometries of the shallow and deep hidden layers are just duplicated from the geometry of the visible layer. Then we can define the distance between neurons not only in the same layer but also in different layers. For a given neuron $h$, the ε-neighborhood $B(h; ε)$ is defined as the disk region centered at $h$ with radius ε. An ε-local neural network is one in which each neuron only connects to neurons in its ε-neighborhood; if the maximum connecting number of each neuron is $K$, we call the network a local $K$-connecting (or $K$-local) DBM; see Fig. 1 for an illustration.

A. Entanglement area law
Here we show that the geometrical information, locality, of the deep neural network, also results in the area law of the entanglement entropy.
Theorem 2. For any $K$-local DBM state $|\Psi\rangle$, the Rényi entropies of the reduced density matrix $\rho_A = \mathrm{Tr}_{A^c}(|\Psi\rangle\langle\Psi|)$ satisfy the following area law:

$$S_\alpha(A) \le \zeta(K)\, \mathrm{Area}(A),$$

where $\mathrm{Area}(A)$ denotes the number of particles on the boundary of $A$ and $\zeta(K)$ is a scaling factor that only depends on the local connection number $K$.
Proof. Inspired by the work of Deng et al. [28], here we establish the area law by explicit construction. For $K$-local DBM states, we can group the connections into several sets using the hidden neurons in the first hidden layer. Note that for DBM states, the coefficients $\Psi(v_1, \cdots, v_n)$ are obtained by summing the Boltzmann weight of the network over all hidden neurons, where the sum is over the hidden neurons $h = (h_1, \cdots, h_m)$ of the first hidden layer and $g = (g_1, \cdots, g_l)$ of the second hidden layer, and all hidden neurons are assumed to take values 0, 1 here (of course, for the case of values ±1, the result also holds). Then, as is usually done for area-law tensor network states, we need to factorize the coefficients into a partial product over the hidden neurons of the first hidden layer, in which each factor is

$$\Phi_j = \Phi_j(v_{j_1}, \cdots, v_{j_K}; g_{j_1}, \cdots, g_{j_K}) = \sum_{h_j = 0, 1} \exp\Big\{h_j\Big(b_j + \sum_{i; \langle ij \rangle} W_{ij} v_i + \sum_{k; \langle kj \rangle} W_{kj} g_k\Big)\Big\},$$

where $v_{j_1}, \cdots, v_{j_K}$ are the (at most) $K$ visible neurons connected to $h_j$, and $g_{j_1}, \cdots, g_{j_K}$ are the (at most) $K$ hidden neurons in the deep hidden layer connected to $h_j$ (here we have used the assumption of $K$-locality). Now, using an important trick of Ref. [28], the visible layer neurons can be divided into six groups: $A_3$ consists of the visible neurons which connect to the $A^c$ part via a hidden neuron, $A_2$ consists of the visible neurons connecting to neurons of $A_3$ via a hidden neuron, and $A_1 = A \setminus (A_3 \cup A_2)$; similarly, we can define $A^c_3$, $A^c_2$ and $A^c_1$. Obviously, $A$ ($A^c$) is the disjoint union of $A_1$, $A_2$ and $A_3$ ($A^c_3$, $A^c_2$ and $A^c_1$). A similar division can be applied to the deep hidden layer (since this layer has the same fixed background geometry as the visible layer): we first assume that the corresponding bipartition of the layer is $B \sqcup B^c$; the deep hidden neurons are then grouped into six parts $B_1$, $B_2$, $B_3$, $B^c_1$, $B^c_2$, $B^c_3$ in the same way. We denote the set of shallow hidden neurons which connect $A_3$ and $A^c_3$ (also $B_3$ and $B^c_3$) as $C_{Bd}$, the ones which connect $A_1$ and $A_2$ (also $B_1$ and $B_2$) as $C_{Int}$, and the ones which connect $A^c_1$ and $A^c_2$ (also $B^c_1$ and $B^c_2$) as $C_{Ext}$.
We can then introduce states supported on $A$ and on $A^c$, labeled by the configurations of the neurons near the boundary, so that the state $\Psi$ decomposes into a sum of products of these states. Tracing out the $A^c$ part, we find that $\rho_A$ is a weighted sum of several one-dimensional projectors (which are not necessarily orthogonal); via Gram-Schmidt orthogonalization, it is clear that the rank of $\rho_A$ is upper bounded by a function $f(K)$ that only depends on $K$. Since the Rényi entropy of a density matrix takes its maximum value only if all eigenvalues $p_i$ are equal, i.e., $p_1 = \cdots = p_r = 1/r \ge 1/f(K)$ with $r$ the rank of the matrix, this completes the proof.
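The last step of the proof uses the fact that, at fixed rank $r$, the Rényi entropy is maximized by the flat spectrum, where it equals $\log r$. A minimal sketch of this fact is given below, assuming the standard definition $S_\alpha = \frac{1}{1-\alpha} \log \sum_i p_i^\alpha$ with the von Neumann entropy as the $\alpha \to 1$ limit; the two example spectra are hypothetical.

```python
from math import log

def renyi_entropy(probs, alpha):
    """S_alpha = log(sum_i p_i^alpha) / (1 - alpha); alpha = 1 gives the von Neumann entropy."""
    probs = [p for p in probs if p > 0]
    if abs(alpha - 1.0) < 1e-12:
        return -sum(p * log(p) for p in probs)
    return log(sum(p ** alpha for p in probs)) / (1 - alpha)

r = 8
uniform = [1 / r] * r                                    # flat spectrum of rank r
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.03, 0.01, 0.01]    # same rank, non-flat spectrum

# Distance from the rank bound log(r) for a few Renyi indices.
bound = log(r)
gaps = {alpha: (bound - renyi_entropy(uniform, alpha),
                bound - renyi_entropy(skewed, alpha))
        for alpha in (0.5, 1.0, 2.0)}
```

For each index the flat spectrum saturates the bound up to rounding, while the non-flat spectrum stays strictly below it, so a bound on the rank of $\rho_A$ bounds every Rényi entropy at once.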
Let us give an intuitive explanation of the construction. Since each visible neuron is correlated with the neurons in its ε-neighborhood, we can regard ε as the correlation length of the state; the correlation between $A$ and $A^c$ comes predominantly from the visible neurons sitting in a 2ε-strip around the boundary $\partial A$. The visible neurons deep in the region $A$ thus cannot be correlated with the visible neurons deep in the region $A^c$; only neurons near the boundary contribute to the Rényi entanglement entropy $S_\alpha(A)$, which results in the area law.

B. Entanglement volume law
As we have proved above, the locality of the neural network is reflected in the area law of its entanglement features. It is natural to ask what happens for neural networks with nonlocal connections. We can expect the fully connected DBMs to exhibit an entanglement volume law [57][58][59], and this is indeed the case. In contrast to tensor networks, whose efficiency strongly depends on the validity of the entanglement area law of the state, neural networks are still efficient in representing many-body states obeying the volume law. As has been pointed out in Refs. [29,30], shallow neural networks are capable of representing some critical-system states obeying the entanglement volume law. We can trivially add a deep hidden layer and some trivial connections to make a deep neural network exhibit the volume law. Despite the triviality of the construction, we want to stress some crucial points about the volume-law neural network: (i) there must be some nonlocal connections in the neural network architecture, which are the origin of the volume-law entanglement; (ii) the representation is efficient, i.e., the number of hidden neurons and connections increases at most polynomially in the number of visible neurons.
The volume-law DBM states have a close relationship with the maximally multipartite entangled states [60]. The philosophy behind the construction is that we can make the particles in the smaller region A fully correlated with its complement A^c, such that all information of A is encoded in A^c in some way. Then ρ_A is proportional to the identity matrix on the Hilbert space of the Vol(A) particles in region A (where Vol(A) denotes the number of particles contained in region A), which further implies that the Rényi entropies satisfy the volume law. Another important issue is that the number of layers of the neural network is closely related to the entanglement properties of the corresponding neural network states. A neural network state with more hidden layers will tend to exhibit volume-law entanglement. This can be seen most easily for feed-forward neural networks: if the number of hidden layers increases, the size of the local clusters eventually becomes comparable with the system size, and this breaks the entanglement area law.
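To make the statement about ρ_A concrete, here is a minimal sketch assuming spin-1/2 particles and an idealized state in which every particle of A forms a maximally entangled pair with a partner in A^c. It is not a DBM construction; the helper names `paired_state`, `reduced_density_matrix` and `renyi_entropy` are assumptions introduced for illustration only. In this setting ρ_A is the normalized identity and every Rényi entropy equals Vol(A) log 2.

```python
import numpy as np

def renyi_entropy(rho, alpha, tol=1e-12):
    """Renyi entropy S_alpha (natural log) of a density matrix, alpha != 1."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > tol]
    p = p / p.sum()
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def paired_state(n_A):
    """|psi> = sum_x |x>_A |x>_{A^c} / sqrt(2^n_A): n_A Bell pairs across the cut.

    Assumption: A and A^c each contain n_A spin-1/2 particles.
    """
    dim = 2 ** n_A
    return (np.eye(dim) / np.sqrt(dim)).reshape(-1)

def reduced_density_matrix(psi, n_A, n_total):
    """Trace out the last n_total - n_A spins of the pure state psi."""
    M = psi.reshape(2 ** n_A, 2 ** (n_total - n_A))
    return M @ M.conj().T

volume_law = {}
for n_A in (1, 2, 3, 4):
    rho_A = reduced_density_matrix(paired_state(n_A), n_A, 2 * n_A)
    # rho_A is the identity divided by 2^n_A, so S_alpha = n_A * log 2 for every alpha.
    volume_law[n_A] = renyi_entropy(rho_A, 2.0)
```

The entropy stored in `volume_law` grows linearly with the number of particles in A, which is the volume-law scaling described above.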

IV. UNDERSTANDING THE POWER OF NEURAL NETWORK USING RÉNYI ENTANGLEMENT ENTROPY
The success that neural networks have achieved in tasks like image classification suggests that, to understand the power of neural networks, we need to establish a new information theory of the functions of images rather than just of the images themselves. In fact, the classification problem shares great mathematical similarity with the quantum spin model [61]. The images correspond to the spin configurations, and the target function of the image classification problem corresponds to the wave function of the spin system. The functions f of the images with N = L × L pixels form a Hilbert space, and by normalization, these functions are in one-to-one correspondence with the wave functions of an L × L quantum spin model; thus the notion of the Rényi entanglement entropy S_α(A) of a connected subregion A of the image makes sense. As we will see, this entanglement entropy can be used as a measure of the difficulty of approximating a function. In general, functions obeying the entanglement volume law need O(2^N) parameters to approximate, while functions obeying the entanglement area law can be approximated using a neural network with poly(N) parameters. Here we will argue that the target functions of reasonable image classification problems obey the entanglement area law.
We know from quantum spin models that to represent a general quantum state using, e.g., a tensor network or a neural network, O(2^N) parameters are needed [57][58][59]. With the locality constraint of the Hamiltonian, however, polynomially many parameters are sufficient to represent the ground state; this is characterized by the area law of the entanglement entropy of these states [12].
Following Ref. [61], we first present the explicit definition of the image classification problem. The case we will consider here is a two-label classification problem. It is extremely difficult to solve the optimization problem directly. In the neural network approach, the function f is represented by a neural network (with fixed architecture) with parameter set Ω = {w_ij, b_i}, i.e., f(I) = f(Ω, I). The cost function then becomes a function of these parameters, C_T[f] = C_T(w_ij, b_i), and algorithms like the gradient descent method can then be applied to find its minimal value.
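The correspondence between functions of images and wave functions can be illustrated with a small numerical sketch. It assumes binary pixels, a 4 × 4 image, and a region A made of the top two rows; the function names and both example functions (a pixel-wise product function and a random ±1 labeling) are hypothetical choices, not taken from Ref. [61]. The function values on all 2^N images are normalized, reshaped into a matrix indexed by the configurations of A and A^c, and the Rényi-2 entropy is read off from the singular values.

```python
import numpy as np

L = 4
N = L * L                     # number of pixels
n_A = 2 * L                   # region A: the top two rows (first 2L pixels, row-major)

# Row k of `bits` is the binary image with index k; column 0 is the first pixel.
bits = (np.arange(2 ** N)[:, None] >> np.arange(N - 1, -1, -1)) & 1

def renyi2_of_function(values, n_A):
    """Renyi-2 entropy of region A for a function given by its values on all images."""
    psi = values / np.linalg.norm(values)     # normalization: function -> wave function
    M = psi.reshape(2 ** n_A, -1)             # rows: pixels in A, columns: pixels in A^c
    p = np.linalg.svd(M, compute_uv=False) ** 2   # eigenvalues of rho_A
    return float(-np.log(np.sum(p ** 2)))

# A function that factorizes pixel by pixel carries no entanglement across the cut.
f_product = np.prod(1.0 + 0.5 * bits, axis=1)

# A random +/-1 labeling of all images has no structure at all.
rng = np.random.default_rng(0)
f_random = rng.choice([-1.0, 1.0], size=2 ** N)

s2_product = renyi2_of_function(f_product, n_A)
s2_random = renyi2_of_function(f_random, n_A)
s2_max = n_A * np.log(2)      # largest possible value for this region
```

In this sketch `s2_product` vanishes up to rounding, while `s2_random` comes out close to `s2_max`, consistent with the idea that unstructured functions are the hard ones to approximate.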
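The neural network approach just described can be sketched as follows. The explicit form of the cost function is not reproduced here, so this example assumes a mean squared error between f(Ω, I) and a 0/1 label; the one-hidden-layer architecture, the 3 × 3 image size, the majority-of-pixels target set, the learning rate and all function names are likewise illustrative assumptions rather than the setup of Ref. [61].

```python
import numpy as np

def all_images(L):
    """All binary L x L images, flattened to rows of length L*L."""
    N = L * L
    return ((np.arange(2 ** N)[:, None] >> np.arange(N - 1, -1, -1)) & 1).astype(float)

def forward(params, X):
    """f(Omega, I): one tanh hidden layer followed by a sigmoid output."""
    W1, b1, w2, b2 = params
    H = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(H @ w2 + b2)))
    return H, out

def cost(params, X, y):
    """Assumed cost: mean squared error between f(Omega, I) and the label."""
    return float(np.mean((forward(params, X)[1] - y) ** 2))

def gradient_step(params, X, y, lr):
    """One full-batch gradient descent update of Omega = {W1, b1, w2, b2}."""
    W1, b1, w2, b2 = params
    H, out = forward(params, X)
    d_z2 = 2.0 * (out - y) * out * (1.0 - out) / len(y)   # dC / d(output pre-activation)
    d_z1 = np.outer(d_z2, w2) * (1.0 - H ** 2)            # dC / d(hidden pre-activation)
    return (W1 - lr * (X.T @ d_z1), b1 - lr * d_z1.sum(axis=0),
            w2 - lr * (H.T @ d_z2), b2 - lr * d_z2.sum())

L, hidden = 3, 8
X = all_images(L)
y = (X.sum(axis=1) >= 5).astype(float)   # assumed target set T: majority of pixels on

rng = np.random.default_rng(0)
params = (0.1 * rng.normal(size=(L * L, hidden)), np.zeros(hidden),
          0.1 * rng.normal(size=hidden), 0.0)

initial_cost = cost(params, X, y)
for _ in range(500):
    params = gradient_step(params, X, y, lr=0.5)
final_cost = cost(params, X, y)
```

The gradients are written out by hand to keep the sketch dependency-free; an automatic differentiation library would serve equally well.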
Let us see an example of an image classification problem for which an exponentially large number of parameters is needed to approximate the corresponding target function. Suppose that we have randomly generated a set of L × L-pixel images and set them as the target set T of images; then T does not have any pattern at all. It turns out that this problem can only be solved with exponentially many parameters, and the target function obeys the volume law of entanglement [57][58][59]. By contrast, problems whose target set has an intrinsic pattern can be solved using polynomially many parameters. For example, determining whether a given image contains only loops can be solved efficiently, and the target function obeys the entanglement area law (it in fact corresponds to the toric code model in quantum spin models [45]). It is natural to ask what conditions the target set of images must satisfy for the classification problem to be efficiently solvable. In Ref. [61], two important conditions are given: (i) for images I and I′, consider two connected regions A and A^c; if I and I′ are the same in region A^c, then they must be the same on the boundary of A; (ii) for two regions A_I and A_I′ of images I and I′, the number N_{I,I′} of possible images for which A^c_I and A^c_I′ are the same depends only on the B-range part of the boundary of A_I and A_I′. These two conditions characterize the smoothness of the images in some sense, and we will refer to this kind of problem as a locally smooth image classification problem. From the above discussion, we may come to the following result, where ζ is a scaling factor that depends on B but not on the size of the images (number of pixels).
This can be proved straightforwardly by calculating the density matrix corresponding to the target function f_T. By tracing out the region A^c, we obtain a density matrix ρ_T(A) with rank at most 2^{B×Area(A)}. Therefore we have the Rényi entanglement area law.

V. CONCLUSION AND DISCUSSION
In this paper, we establish the entanglement area law for shallow and deep neural network states. By introducing the notion of locality into the neural network representations of quantum states, we see that the resulting local neural network states obey the entanglement area law. It is worth mentioning that there are some subtle issues in the construction here.
The first crucial issue we want to discuss is the topology of the neural network states. Each L-layer neural network architecture corresponds to an L-partite graph G. In an L-partite graph, the vertex set V(G) is divided into L disjoint subsets, which correspond to the different layers of the neural network, and there are no edges within any of these subsets (no intra-layer connections, in neural network language). We say that two neural networks are equivalent if the corresponding graphs are isomorphic, and we denote the isomorphism class of a neural network as G. The neural networks in an isomorphism class have the same representational power, and the corresponding quantum states have exactly the same physical properties. In our construction, we fix the geometry of the physical layer of the neural network and use this fixed background geometry to introduce the notion of locality into the neural network. Since every neural network that is equivalent to this network with the fixed background geometry has the same topology, the properties of the state depend only on the topology, not the geometry. This means that each time we want to study the neural network states represented by neural networks in the equivalence class G, we can choose a representative from G without loss of generality, and for convenience, we can always choose the one with the fixed background geometry.
Another issue we want to stress is the neural network approach to the AdS/CFT correspondence, or more precisely, the entanglement-geometry correspondence in this context. To realize the Ryu-Takayanagi formula, we first construct a neural network whose physical layer corresponds to the boundary physical degrees of freedom, and the bulk geometry is given by the hidden layers and the connections between neurons. The essential idea behind this approach to holography (or entanglement-geometry correspondence) is that the entanglement features are encoded in the neural network geometry. That is, the bulk geometry is given, the neural network is tiled on the background bulk geometry, and the structure of the neural network provides the required entanglement features on the boundary, which is exactly the dual of the bulk geometry. Note that in Ref. [49], the inverse of the above problem is investigated: the authors explore how to use the given entanglement features of a state to determine the optimal holographic geometry. These topics will be left for our future studies. During the preparation of our manuscript, we noticed that a related work had been made available [55].