Compact Neural-network Quantum State representations of Jastrow and Stabilizer states

Neural-network quantum states (NQS) have become a powerful tool in many-body physics. Of the numerous possible architectures in which neural-networks can encode amplitudes of quantum states the simplicity of the Restricted Boltzmann Machine (RBM) has proven especially useful for both numerical and analytical studies. In particular devising exact NQS representations for important classes of states, like Jastrow and stabilizer states, has provided useful clues into the strengths and limitations of the RBM based NQS. However, current constructions for a system of $N$ spins generate NQS with $M \sim O(N^2)$ hidden units that are very sparsely connected. This makes them rather atypical NQS compared to those commonly generated by numerical optimisation. Here we focus on compact NQS, denoting NQS with a hidden unit density $\alpha = M/N \leq 1$ but with system-extensive hidden-visible unit connectivity. By unifying Jastrow and stabilizer states we introduce a new exact representation that requires at most $M=N-1$ hidden units, illustrating how highly expressive $\alpha \leq 1$ can be. Owing to their structural similarity to numerical NQS solutions our result provides useful insights and could pave the way for more families of quantum states to be represented exactly by compact NQS.


Introduction
The quantum many-body problem encapsulates numerous fascinating and technologically relevant phenomena even in its simplest instance of localised spins, where effective interacting lattice Hamiltonians can give rise to antiferromagnetism, frustration and topological spin liquids [1,2]. Capturing these collective many-body effects in numerical calculations is a striking challenge due to the exponential growth of their Hilbert space $\mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \cdots \otimes \mathbb{C}^2$ with system size. One strategy to overcome this 'curse of dimensionality' is the variational method, where an approximate representation of a system's ground state is found within a class of trial states defined by a tractable number of parameters [3-5]. To be successful some physical insight is often needed when devising trial states, so they can capture expected correlations, while also having a functional form that allows key observables, like energy, to be efficiently computed. The major issue is that variational calculations are inevitably biased by the trial state and can inadvertently predict the wrong physics. To avoid this, sophisticated trial states are needed whose expressibility, and thus number of parameters, can be systematically enlarged and in principle can even become exact in some extreme limit.
A powerful example of such a variational approach is tensor network states [6-9]. These work by decomposing the amplitudes of a quantum state into a network of connected multidimensional arrays of complex numbers that form the variational parameters of the trial state. Tensor network states have several highly useful properties. First, efficient deterministic numerical algorithms have been devised for their direct optimisation. Second, the number of variational parameters, governed by the size of the tensors and the geometry of their connections, can be directly related to the entanglement between subsystems of the state, which often obeys an 'area law' [10]. Consequently tensor networks possess a rather broad form of variational bias towards low-entanglement states. These properties have proven to be extremely effective for systems in one spatial dimension (1D), where matrix product states (MPS) [6,9] have allowed a vast range of physics to be accessed with unprecedented accuracy [11]. For 2D systems, projected entangled pair states [7,8], tree tensor networks [12,13] and the multiscale entanglement renormalization ansatz [14] have attempted to replicate this success. However, the scaling of the computational cost of these tensor network algorithms with the number of variational parameters, while polynomial and hence formally efficient, is nonetheless severe enough to render accurate calculations extremely demanding. It is an active field of research to devise schemes to reduce this cost [15,16].
In recent years there has been considerable interest in adapting techniques from machine learning to help tackle many-body physics problems [17-19]. A popular approach is to use generative models based on artificial neural networks (ANN), which have been especially successful in taming the curse of dimensionality for conventional 'big data' problems, allowing complex patterns and abstractions to be identified [20]. This ability has strongly motivated applying ANN to efficiently encode quantum many-body states. A direct way to accomplish this, introduced by Carleo and Troyer [21], is to view an ANN itself as a many-body ansatz wavefunction in which the couplings between neurons act as complex variational parameters that are stochastically optimised. This neural-network quantum state (NQS) approach is applicable to any number of spatial dimensions and can leverage the differing strengths of the numerous variants of ANN known in machine learning. After the original work [21] based on a restricted Boltzmann machine (RBM) architecture [22] with complex parameters, subsequent studies investigated their generalisation to deep Boltzmann machines [23-25], as well as feedforward [26-29], convolutional [30-33] and recurrent neural networks [34].
Figure 1. A schematic of the relation between Jastrow and stabilizer states. Graph states lie within the intersection of these two distinct classes of states. By applying graph theoretic tools to Jastrow states we propose a new larger class of states called vertex modified Jastrow (VMJ) states in which arbitrary single-spin gates are applied to specific spins. Although a modest generalisation of Jastrow states, we show this is sufficient to capture stabilizer states and provide a simple procedure for constructing compact NQS.
In this work we focus exclusively on NQS based on the venerable RBM architecture for several reasons. First, they have a simple shallow structure comprising only two layers of neurons, a visible layer with $N$ units and a hidden layer with $M$ units. Second, it is well known that they can capture arbitrary functions once $M$ scales exponentially with $N$ [35], meaning RBMs are an exhaustive ansatz and suggesting they could be an effective weakly biased ansatz for approximations with a tractable $M$. Third, RBMs can host states exhibiting volume-law entanglement scaling [36], indicating a quite distinct representational power compared to tensor networks, despite the rich conceptual connections between them [37-39]. Finally, since they have no intralayer couplings, RBMs allow for efficient sampling, making them extremely well suited to stochastic optimisation within variational Monte Carlo (VMC) [3-5].
As such NQS wavefunctions have been applied to wide-ranging problems including frustrated spin systems [40,41], interacting Fermi [42] and Bose systems [43,44], simulating quantum circuits [45-47], as well as describing states with topological order [48-50] and non-abelian symmetries [51]. Moreover, they have proved to be very effective at enhancing [37] more traditional pair product wave functions in a hybrid approach for both fermionic [52] and spin systems [53], and have also been successfully generalised for open quantum systems [54-57]. Complementary to numerical studies, which essentially treat NQS as a black box of amplitudes, are analytical studies directly examining their expressive power by constructing exact representations of broad and relevant classes of quantum states. Initial work in this direction focused on important specific cases like the Toric code states and a symmetry-protected cluster state [58] to show how topological order can be captured, as well as specially designed graph/stabilizer states to demonstrate analytically volume-scaling entanglement [36]. A construction to represent any graph state as an NQS then followed [23]. This was then generalised to weighted graph states [37], spin Jastrow states, which encompass all (weighted) graph states including Laughlin-like states describing chiral topological phases [37,48,49], as well as the entire class of stabilizer states [59-62] containing quantum error correcting codes like the Toric code states. Even more exotic forms of topological ordering based on hypergraph states and XS-stabilizer states have also been shown to have exact NQS representations [61]. All these studies have enriched our understanding of how features like the sign and nodal structure as well as non-local correlations can be encoded within NQS.
As illustrated in figure 1, Jastrow states [63] and stabilizer states [64] both form large distinct classes of states defined in general by $O(N^2)$ parameters and whose intersection contains graph states [65]. Currently known explicit NQS constructions encode these states in $M \sim O(N^2)$ hidden units with the sparsest possible connections [23,48,49,59-62]. Yet the expressiveness of NQS depends sensitively on the pattern of connections, so numerical optimisations often exploit the most general hidden units possessing system-extensive connectivity. The implications of these exact representations for numerical calculations are therefore unclear. One might attempt to improve the efficiency by directly mimicking their specialised sparsity structure, although this nullifies the flexibility of NQS that originally motivated them and introduces significant bias. Conversely, if these exact constructions do genuinely represent the sparsity and complexity of hidden units needed for Jastrow and stabilizer states, then they also suggest that they are rather non-typical states compared to those that commonly emerge from numerical calculations. Moreover, it would mean that if this sparsity structure was not known in advance, then these classes of states are prohibitively expensive to numerically 'learn' since the optimisation problem would begin with an unfavourable $O(N^3)$ number of parameters [49]. However, the non-uniqueness of NQS representations together with a parameter counting argument strongly suggest that Jastrow and stabilizer states should be representable exactly by an NQS with only $M \sim O(N)$ system-extensive hidden units.
In this paper we confirm this conjecture. Our result is built using vertex modified Jastrow (VMJ) states, which are a new modest generalisation of Jastrow states in which a specific subset of spins can have arbitrary single-spin gates applied to them. Crucially, the enlarged class of VMJ states encompasses stabilizer states, as shown in figure 1. By combining graph theoretic concepts with tensor network diagrammatics for NQS states [37] we show how to construct NQS representations of VMJ states with at most $M = N - 1$ hidden units. The system-extensive connectivity of this new NQS representation enhances its relevance for numerical calculations and highlights the role of strong visible-hidden unit correlations. In particular it indicates that NQS optimisations with a very low hidden unit density $\alpha = M/N < 1$ should be able to capture and improve on Jastrow states, thereby providing an efficient and systematic way to go beyond this commonly used variational class. Indeed this dramatic ability of $\alpha < 1$ NQS to exactly represent complex classes of quantum states resonates with recent work [66] revealing their ability to capture several orders of a perturbative expansion. We coin the term compact NQS for $\alpha \leq 1$ to highlight this acute compression in hidden unit complexity.
The structure of this paper is as follows. In section 2 we give an overview of the many-body problem, the variational method and NQS in their original complex RBM formulation. In section 3 we discuss the conventional $M \sim O(N^2)$ construction of NQS for Jastrow states, before modifying it to arrive at a compact NQS based on perfect hidden-visible correlations and demonstrating its emergence via numerical optimisation for the one-dimensional XXZ spin chain. In section 4 we outline diagrammatic tools that allow NQS to be recast as a tensor network and identify a number of properties, most crucially when arbitrary single-spin gates can be readily absorbed into an NQS tensor network. In section 5 we use the tensor network approach to identify a canonical form for Jastrow state NQS, introduce some graph theoretic concepts, and use these to define VMJ states and their general compact NQS representation. In section 6 we review graph and stabilizer states, prove that any stabilizer state can be described as a VMJ-NQS, and illustrate this analytically for several important special cases. Finally, in section 7 we conclude and discuss some open problems.

Neural-network quantum states
In this section we briefly introduce the quantum many-body problem and VMC, before reviewing NQS in terms of complex RBM as a powerful ansatz for this approach.

Quantum many-body problem and variational approach
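In this work we will focus on physical systems composed of $N$ spin-$\tfrac{1}{2}$ particles, each described by a vector of Pauli operators $(\hat{X}_j, \hat{Y}_j, \hat{Z}_j)$ for $j = 1, 2, \ldots, N$ and a local basis $\hat{Z}_j |v_j\rangle = v_j |v_j\rangle$ where $v_j \in \{+1, -1\}$. The $z$ basis for the full system is then $|v\rangle = |v_1\rangle \otimes |v_2\rangle \otimes \cdots \otimes |v_N\rangle$ with $v = (v_1, v_2, \ldots, v_N)$, and any many-body quantum state can be expanded in this basis as
$$|\Psi\rangle = \sum_{v} \Psi(v)\, |v\rangle \tag{1}$$
via its complex amplitudes $\Psi(v)$. It will also be convenient on some occasions to label the $z$ basis instead using a 'qubit' binary string $q = (q_1, q_2, \ldots, q_N) \in \{0, 1\}^N$ where $v = (-1)^q = 1 - 2q$.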
Our central problem is to solve the eigenvalue problem $\hat{H} |\Psi_0\rangle = E_0 |\Psi_0\rangle$ for the ground state $|\Psi_0\rangle$ of a system governed by an interacting Hamiltonian $\hat{H}$ comprised of products of the Pauli operators over two or more spins. Given that the expectation value of any observable $\hat{A}$ for a general (unnormalised) state $|\Psi\rangle$ is
$$\langle \hat{A} \rangle_{\Psi} = \frac{\langle \Psi | \hat{A} | \Psi \rangle}{\langle \Psi | \Psi \rangle}, \tag{2}$$
the variational approach reformulates this problem as the minimisation of the energy $E = \langle \hat{H} \rangle_{\Psi}$ over the exponentially many amplitudes $\Psi(v)$. Performing this task exactly is only feasible for small systems $N \sim O(10)$ [67].
The curse of dimensionality is circumvented by instead restricting the optimisation to a specialised class of states $|\Psi_p\rangle$, dependent on parameters $p$, whose number scales polynomially with $N$. The variational principle $E_0 \leq E_{p_0} = \min_p \langle \hat{H} \rangle_{\Psi_p}$ is then used to determine the 'best' approximation $|\Psi_{p_0}\rangle$ within the ansatz for $|\Psi_0\rangle$. Typically even for simple observables $\langle \hat{A} \rangle_{\Psi_p}$ cannot be evaluated exactly from variational ansatzes for many-body systems [4,5] and so Monte Carlo sampling is employed, giving the VMC approach [3]. A crucial feature of VMC is that only ratios of amplitudes $\Psi_p(v)/\Psi_p(v')$ for an ansatz between different spin states $|v\rangle$ and $|v'\rangle$ are needed for Markov chain sampling, so we can safely ignore the normalisation of quantum states in this work.
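The role of amplitude ratios can be made concrete with a small sketch. The Python snippet below is an illustration only and is not taken from the paper: the transverse-field Ising ring, the toy amplitude `psi` and all function names are hypothetical choices. It evaluates the energy as a $|\Psi(v)|^2$-weighted average of the local energy $\sum_{v'} H_{vv'} \Psi(v')/\Psi(v)$, which involves only ratios, and cross-checks it against equation (2). For brevity the weights are enumerated exactly, whereas a VMC calculation would sample them with a Markov chain.

```python
import itertools
import numpy as np

def basis(N):
    """All 2^N configurations v with entries in {+1, -1}, first spin slowest."""
    return [np.array(v, dtype=float) for v in itertools.product([1, -1], repeat=N)]

def local_energy(v, psi, J=1.0, g=0.7):
    """E_loc(v) = sum_{v'} H_{v v'} psi(v') / psi(v) for the toy ring
    H = -J sum_j Z_j Z_{j+1} - g sum_j X_j (hypothetical test Hamiltonian)."""
    N = len(v)
    e = -J * sum(v[j] * v[(j + 1) % N] for j in range(N))
    for j in range(N):
        flipped = v.copy()
        flipped[j] *= -1
        e += -g * psi(flipped) / psi(v)      # only an amplitude ratio enters
    return e

def dense_hamiltonian(N, J=1.0, g=0.7):
    """The same toy Hamiltonian as a dense matrix, used only as a cross-check."""
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    Z = np.diag([1.0, -1.0])
    def embed(ops):
        out = np.ones((1, 1))
        for j in range(N):
            out = np.kron(out, ops.get(j, np.eye(2)))
        return out
    H = np.zeros((2**N, 2**N))
    for j in range(N):
        H -= J * embed({j: Z, (j + 1) % N: Z})
        H -= g * embed({j: X})
    return H

def psi(v):
    """Unnormalised toy amplitude (illustrative)."""
    return np.exp(0.3 * v.sum() + 0.1 * v[0] * v[1])

def energy_from_ratios(amplitude, N):
    """|psi|^2-weighted average of the local energy over all configurations."""
    configs = basis(N)
    weights = np.array([abs(amplitude(v)) ** 2 for v in configs])
    weights /= weights.sum()
    return sum(p * local_energy(v, amplitude) for p, v in zip(weights, configs))

N = 4
amps = np.array([psi(v) for v in basis(N)])
E_dense = amps @ dense_hamiltonian(N) @ amps / (amps @ amps)
E_ratio = energy_from_ratios(psi, N)
E_ratio_rescaled = energy_from_ratios(lambda v: 10.0 * psi(v), N)
```

Rescaling the amplitude by a constant leaves the estimate unchanged, which is why the normalisation can be dropped.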

Restricted Boltzmann machine formulation
NQSs are a promising approach for constructing highly efficient and accurate approximate representations of the exponentially many amplitudes $\Psi(v)$. The structure of NQS is motivated by classical probabilistic models called RBMs commonly used in machine learning [21]. In these models there are two sets of units: $N$ visible units representing the system itself, and $M$ hidden units, which are additional degrees of freedom to be marginalised. Amplitudes of NQS then follow from a complex Boltzmann-like ansatz
$$\Psi_\lambda(v) = \sum_{h} \exp\left[-E_\lambda(v, h)\right] \tag{3}$$
characterised by a classical 'energy' function
$$E_\lambda(v, h) = -\sum_{j=1}^{N} a_j v_j - \sum_{i=1}^{M} b_i h_i - \sum_{i=1}^{M} \sum_{j=1}^{N} h_i w_{ij} v_j \tag{4}$$
describing the interactions between the visible units, with the physical configuration $v$, and hidden units, with configuration $h = (h_1, \ldots, h_M)$. The geometry of an RBM is shown in figure 2.
Figure 2. The bipartite graph $G_{\rm RBM}$ of interaction weights $w$ between hidden and visible units in an RBM with edges shown as solid arcs (see section 3). For completeness the biases $a$ and $b$ on each unit are depicted here as self-loop edges, but are given by dotted arcs since these are not included in a simple RBM graph.
An NQS is thus defined by the $MN + M + N$ complex parameters $\lambda = \{a, b, w\}$ in the energy function, comprising an $N$-dimensional vector $a$ of visible biases, an $M$-dimensional vector $b$ of hidden biases, and an $M \times N$ matrix $w$ of weights. Assuming the hidden units are also described by Ising-like variables $h_i \in \{+1, -1\}$, then after tracing over them we obtain the well-known amplitude expansion [21]
$$\Psi_\lambda(v) = \exp\Big(\sum_{j=1}^{N} a_j v_j\Big) \prod_{i=1}^{M} 2\cosh\Big(b_i + \sum_{j=1}^{N} w_{ij} v_j\Big), \tag{5}$$
which is particularly well-suited to variational optimisation. It will prove useful for later that we introduce the following terminology:
Definition 1 (Univalent visible unit). The number of hidden units connected to a given visible unit in an NQS is called its valency. Any visible unit connected to a single hidden unit is said to be univalent. The number of such visible units in an NQS is the univalency of the representation.
Although equation (3) is based on a complex Boltzmann-like form with $E_\lambda(v, h)$ restricted to pairwise visible-hidden interactions only, tracing out the hidden units generates an effective energy function for the visible units alone,
$$E_{\rm eff}(v) = -\log \sum_{h} e^{-E_\lambda(v, h)}, \tag{6}$$
that can contain much more complicated higher-order interactions. This expressiveness is controlled by increasing the number $M$ of hidden units, allowing for complex correlations within $\Psi_\lambda(v)$ to be encoded. The NQS ansatz is exhaustive in the sense that for $M \sim 2^N$ arbitrary states $\Psi(v)$ can be described exactly [35]. However, practical NQS representations instead have scaling $M \sim {\rm poly}(N)$ which can be sampled in a formally efficient way, including exact representations of Jastrow [37,48,49], graph [23], stabilizer [59-62], hypergraph and XS-stabilizer states [61]. Numerical calculations have demonstrated that many important interacting systems have ground states that can be well approximated by an efficient NQS with $\alpha \sim 32$ feasible with VMC [68]. However, if such calculations involve states where $M$ scales superlinearly with $N$ then they can quickly become impractical. This strongly motivates understanding NQS that have exact representations with $M \sim O(N)$, and even more strictly compact NQS defined as:
Definition 2 (Compact NQS). An exact NQS representation of a state for a system size of $N$ is said to be compact if it requires $M \leq N$ hidden units.
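As a quick consistency check of the traced-out form, the following Python sketch compares the product-of-cosh expression of Carleo and Troyer [21] with an explicit sum over all hidden configurations. It is a minimal illustration under stated assumptions: the sign convention $\Psi_\lambda(v) = \sum_h \exp(a\cdot v + b\cdot h + h\cdot w\cdot v)$, the random parameter values and the function names are choices made here, not taken from the paper.

```python
import itertools
import numpy as np

def rbm_amplitude(v, a, b, w):
    """Closed form after tracing out h: exp(a.v) * prod_i 2 cosh(b_i + sum_j w_ij v_j)."""
    return np.exp(a @ v) * np.prod(2 * np.cosh(b + w @ v))

def rbm_amplitude_bruteforce(v, a, b, w):
    """Explicit sum over all hidden configurations h in {+1, -1}^M."""
    total = 0.0 + 0.0j
    for h in itertools.product([1, -1], repeat=len(b)):
        h = np.array(h, dtype=float)
        total += np.exp(a @ v + b @ h + h @ w @ v)
    return total

rng = np.random.default_rng(0)
N, M = 4, 3                                   # M/N < 1, a compact hidden-unit density
a = 0.3 * (rng.normal(size=N) + 1j * rng.normal(size=N))
b = 0.3 * (rng.normal(size=M) + 1j * rng.normal(size=M))
w = 0.3 * (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))

configs = [np.array(v, dtype=float) for v in itertools.product([1, -1], repeat=N)]
closed = np.array([rbm_amplitude(v, a, b, w) for v in configs])
brute = np.array([rbm_amplitude_bruteforce(v, a, b, w) for v in configs])
n_parameters = a.size + b.size + w.size       # MN + M + N complex parameters
```

The closed form costs $O(MN)$ per configuration, whereas the explicit hidden-unit sum costs $O(2^M)$, which is why the traced-out expression is the one used in practice.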

Jastrow states
One of the most well-established and intuitive variational ansatzes is the class of so-called Jastrow states. Originally devised for wavefunctions in the continuum [63], Jastrow states work by imprinting pairwise correlations on top of a reference state of independent particles. Indeed, one of the earliest VMC calculations performed for a many-body quantum system utilised Jastrow states to study Lennard-Jones interacting bosons modelling liquid $^4$He [69]. For discrete systems like spins a general definition of Jastrow states involves a graph. Specifically, a graph $G$ is defined by the set $\mathcal{V}$, which is a finite-sized subset of $\mathbb{N}$ labelling vertices, and a set $E(G)$ comprising two-element subsets $\{a, b\}$ of $\mathcal{V}$ labelling edges [70]. We will only consider undirected edges so the order of vertices in $\{a, b\}$ is of no relevance. However, when summing over elements of $E(G)$ it is convenient to use the convention that $(s, t)$ denotes the source $s$ and target $t$ vertices that obey $s < t$. There is a one-to-one correspondence between a graph $G$ and its symmetric binary $|\mathcal{V}| \times |\mathcal{V}|$ adjacency matrix $\Theta$ defined as $\Theta_{ab} = 1$ if $\{a, b\} \in E(G)$ and $\Theta_{ab} = 0$ otherwise. We will draw a graph with vertices as bordered squares and edges by arcs joining pairs of them, for example as in diagram (7) and as seen earlier in figure 2, where we had two classes of vertices, visible and hidden. In the following we will consider simple graphs, so they contain no self-loops or multiple edges between the same vertices. In addition to this, and without loss of generality$^1$, we will concentrate on connected graphs where any two vertices $a$ and $b$ are always connected by at least one sequence of edges (i.e. a path) in $G$.
Given a graph $G$ over $|\mathcal{V}| = N$ visible units, Jastrow states have amplitudes that are conveniently parameterised in a complex Boltzmann-like form as
$$\Psi_\eta(v) = \exp\left[-E_\eta(v)\right] \tag{8}$$
in terms of an energy function
$$E_\eta(v) = -\sum_{j=1}^{N} c_j v_j - \sum_{(s,t) \in E(G)} v_s V_{st} v_t. \tag{9}$$
Consequently, a Jastrow state is defined by the $N + |E(G)|$ complex parameters $\eta = \{c, V\}$, comprising $N$ visible biases $c$ and $|E(G)|$ non-zero elements $V_{jk}$ for $j < k$ of a strictly upper triangular $N \times N$ matrix $V$ of pairwise interactions between visible units. In contrast to NQS the form of Jastrow states is severely limited by its two-body energy function for visible units in equation (9). As such Jastrow state amplitudes are simply the product of arbitrary two-spin states between all pairs connected by an edge in $G$ as
$$\Psi_\eta(v) = \prod_{j=1}^{N} e^{c_j v_j} \prod_{(s,t) \in E(G)} e^{v_s V_{st} v_t}. \tag{10}$$
Jastrow states are therefore most expressive for a fully connected graph $G_{\rm fc}$, as shown in figure 3. Given their similar forms of parameterisation it is natural to ask how to convert a Jastrow state into an NQS and what their hidden unit complexity is.
Figure 3. The fully connected graph $G_{\rm fc}$ of interaction weights $V$ between visible units in a generic Jastrow state. For completeness the visible bias on each unit is depicted here as a self-loop edge with a dotted arc.

Compact NQS for Jastrow states
A direct and commonly advocated [23,25,49,52] mapping of a Jastrow state into an NQS proceeds by mediating each pairwise Jastrow interaction $V_{jk}$ via an interaction with a hidden unit $h_i$. This is accomplished by solving the expansion
$$\exp(v_j V_{jk} v_k) \propto \sum_{h_i = \pm 1} \exp\left[h_i \left(w_{ij} v_j + w_{ik} v_k\right)\right] \tag{11}$$
for each Jastrow interaction term in equation (9). There are many solutions for the hidden unit weights $w_{ij}$ [23-25,71], including a symmetric solution derived from the matrix square root (equations (12)-(14)). Inserting the decomposition equation (11) into equation (8) we arrive at a complex RBM formulation (equation (15)). Since this NQS requires $M = |E(G)|$ hidden units, each with the smallest receptive field of just two visible units [49], we will call it a sparse-Jastrow NQS. For the most expressive Jastrow state $G = G_{\rm fc}$ we thus need $M = \tfrac{1}{2} N(N - 1)$ hidden units in total. There is good reason to suspect that this sparse-Jastrow NQS significantly overestimates the hidden unit complexity of Jastrow states. In particular an NQS with $M = N$ hidden units, each having system-extensive connectivity, already has $N^2$ weights $w$, which from a pure parameter count argument should be sufficient. Our first result indeed confirms this is the case:
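To illustrate how a single hidden unit can mediate one pairwise Jastrow factor, the Python sketch below checks one possible solution for real $V_{jk}$: $w_{ij} = w$ and $w_{ik} = {\rm sign}(V_{jk})\, w$ with $\cosh(2w) = e^{2|V_{jk}|}$. This particular choice and the function names are assumptions made for illustration; it is not claimed to coincide with any of the solutions listed in the references.

```python
import itertools
import numpy as np

def sparse_pair_weights(V):
    """One real solution (illustrative, valid for real V):
    w_j = w and w_k = sign(V) * w with cosh(2 w) = exp(2 |V|)."""
    w = 0.5 * np.arccosh(np.exp(2 * abs(V)))
    sign = 1.0 if V >= 0 else -1.0
    return w, sign * w

def hidden_unit_factor(vj, vk, wj, wk):
    """Sum over h = +1, -1 of exp(h (wj vj + wk vk)) = 2 cosh(wj vj + wk vk)."""
    return 2 * np.cosh(wj * vj + wk * vk)

pair_ratios = {}
for V in (0.8, -0.35):
    wj, wk = sparse_pair_weights(V)
    # Ratio of the Jastrow factor to the hidden-unit factor for all four (vj, vk).
    pair_ratios[V] = [np.exp(V * vj * vk) / hidden_unit_factor(vj, vk, wj, wk)
                      for vj, vk in itertools.product([1, -1], repeat=2)]
```

A ratio that is the same for all four spin configurations means the two sides agree up to an irrelevant normalisation constant, so one hidden unit per edge suffices in this sparse construction.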

Lemma 1 (A compact Jastrow state NQS). Any Jastrow state for N spins can be represented exactly by a compact NQS with M = N system-extensive hidden units in which each hidden unit has a perfect correlation with a unique visible unit.
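Proof. The effective energy function in equation (6) for a complex RBM can be written as
$$E_{\rm eff}(v) = -\sum_{j=1}^{N} a_j v_j - \sum_{i=1}^{M} \log\Big[\sum_{h_i = \pm 1} \exp\Big(b_i h_i + \sum_{j=1}^{N} h_i w_{ij} v_j\Big)\Big]. \tag{16}$$
We take $M = N$ hidden units and a square interaction weight matrix $w$ where the diagonal elements are $w_{ii} = S \gg 1$. Focusing on the argument of the logarithm in equation (16), we separate out the diagonal term $w_{ii}$, evaluate the sum over $h_i$ as $h_i = v_i$ and $h_i = -v_i$, and define
$$\theta_i(v) = b_i v_i + \sum_{j \neq i} v_i w_{ij} v_j,$$
so that the sum over $h_i$ equals $e^{S + \theta_i(v)} \left[1 + e^{-2S - 2\theta_i(v)}\right]$. Inserting this expansion for each hidden unit factor in equation (16) gives
$$E_{\rm eff}(v) = -NS - \sum_{j=1}^{N} (a_j + b_j) v_j - \sum_{i \neq j} v_i w_{ij} v_j - \sum_{i=1}^{N} \log\left[1 + e^{-2S - 2\theta_i(v)}\right], \tag{17}$$
where the last term generates multi-body interaction terms between visible units. However, in the limit $S \to \infty$ these terms vanish, giving a purely two-body effective energy function
$$E_{{\rm eff},\infty}(v) = -\sum_{j=1}^{N} (a_j + b_j) v_j - \sum_{i \neq j} v_i w_{ij} v_j \tag{18}$$
after dropping the irrelevant constant $NS$. Thus, $E_{{\rm eff},\infty}(v)$ reproduces exactly the Jastrow amplitudes of equation (8) once $a + b = c$ and $w = \tfrac{1}{2}(V + V^T) + \lim_{S \to \infty} S\, \mathbb{1}_{N \times N}$, with only $M = N$ hidden units. Like the sparse construction the hidden biases $b$ are not needed and both have an identical number $N(N - 1)$ of non-zero weights. However, in contrast to the sparse construction these weights have been concentrated into hidden units with the largest possible receptive field spanning the entire system. As such we will denote this compact NQS as an extensive-Jastrow NQS.
Figure 4. [...] The panel shows a zoom of the $N$ dependence for $S = 2$, which is illustrative of all values of $S$ examined. The data and scripts used to create these plots in MATLAB can be found in reference [75].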
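The construction in the proof can be tried out numerically on a small system. The Python sketch below is illustrative only: the random real Jastrow parameters, the choice $b = 0$, and the helper names are assumptions made here. It builds $w = \tfrac{1}{2}(V + V^T) + S\,\mathbb{1}$ with a finite $S$, evaluates the RBM amplitudes through the product-of-cosh form, and measures the infidelity with the target Jastrow state for a few values of $S$.

```python
import itertools
import numpy as np

def jastrow_amplitudes(c, V, configs):
    """exp(sum_j c_j v_j + sum_{j<k} v_j V_jk v_k), with V strictly upper triangular."""
    return np.array([np.exp(c @ v + v @ V @ v) for v in configs])

def extensive_jastrow_nqs(c, V, S):
    """Parameters (a, b, w) of the M = N construction with finite diagonal weight S."""
    N = len(c)
    w = 0.5 * (V + V.T) + S * np.eye(N)
    return c.copy(), np.zeros(N), w            # a + b = c, here with b = 0

def rbm_amplitudes(a, b, w, configs):
    """exp(a.v) * prod_i 2 cosh(b_i + sum_j w_ij v_j) for each configuration."""
    return np.array([np.exp(a @ v) * np.prod(2 * np.cosh(b + w @ v)) for v in configs])

def infidelity(x, y):
    """1 - |<x|y>|^2 / (<x|x> <y|y>), insensitive to normalisation."""
    return 1 - abs(np.vdot(x, y)) ** 2 / (np.vdot(x, x).real * np.vdot(y, y).real)

rng = np.random.default_rng(1)
N = 5
configs = [np.array(v, dtype=float) for v in itertools.product([1, -1], repeat=N)]
c = 0.2 * rng.normal(size=N)
V = np.triu(0.3 * rng.normal(size=(N, N)), k=1)   # fully connected Jastrow graph

target = jastrow_amplitudes(c, V, configs)
errors = {S: infidelity(target, rbm_amplitudes(*extensive_jastrow_nqs(c, V, S), configs))
          for S in (1.0, 2.0, 4.0)}
```

In this toy setting the infidelity shrinks rapidly as $S$ grows, in line with the multi-body term of equation (17) being suppressed by $e^{-2S}$.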
The crucial step in the extensive-Jastrow NQS construction was the introduction of a diverging interaction weight $w_{ii} = S$. This serves to perfectly correlate each hidden unit $h_i$ with one unique visible unit once an overall scale factor is dropped. Owing to the presence of diverging weights $w_{ii}$ one might reasonably question whether the extensive-Jastrow NQS are numerically pathological compared to the sparse-Jastrow NQS. We now investigate this with the help of a non-trivial numerical example.

Numerical example-XXZ spin-chain
Consider the XXZ model for a spin-$\tfrac{1}{2}$ chain,
$$\hat{H}_{\rm XXZ} = \sum_{j=1}^{N} \left( \hat{X}_j \hat{X}_{j+1} + \hat{Y}_j \hat{Y}_{j+1} + \Delta\, \hat{Z}_j \hat{Z}_{j+1} \right),$$
written here up to an overall positive energy scale, with an anisotropy $\Delta$ and periodic boundary conditions $N + 1 \equiv 1$. This model displays three distinct ground state phases: (i) a ferromagnetic phase ($\Delta \leq -1$), (ii) a critical phase ($-1 < \Delta \leq 1$) and (iii) a gapped AF phase ($\Delta > 1$). The following translationally invariant antiferromagnetic spin-Jastrow state derived from a chiral boson conformal field theory (CFT),
$$\Psi_{\rm CFT}(v) = \delta\Big(\sum_{j=1}^{N} v_j\Big) \prod_{j \in {\rm odd}} v_j \prod_{j < k} \left| \sin\frac{\pi (j - k)}{N} \right|^{\alpha v_j v_k},$$
has been found to be a very good approximation of XXZ ground states for $\Delta > -1$ with an overlap (after restoring normalisation) in excess of 99% for the majority of the critical region [72]. It comprises a $\delta(\sum_{j=1}^{N} v_j)$ contribution to enforce the zero-$z$-magnetization constraint, $\prod_{j \in {\rm odd}} v_j$ to imprint the Marshall sign rule, and a product over positive-definite Jastrow factors controlled by the single positive parameter $\alpha$ related to the conformal dimension. The CFT state is exact for $\Delta = -1$ ($\alpha = 0$) and $\Delta = 0$ ($\alpha = \tfrac{1}{4}$), while its numerical minimisation for $-1 \leq \Delta \leq 1$ is well approximated by $\alpha = \tfrac{1}{2\pi} \arccos(-\Delta)$, as shown in figure 4(a). For $\Delta = 1$ ($\alpha = \tfrac{1}{2}$) the CFT state is not exact and instead reduces to the well known Haldane-Shastry state [73,74].
Figure 5. [...] The data and scripts used to create these plots in MATLAB can be found in reference [75].
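The statement that the CFT state is exact at $\Delta = 0$ can be checked on a small ring by exact diagonalisation. The Python sketch below is an illustration under explicit assumptions: the Hamiltonian is taken as $\sum_j (\hat{X}_j \hat{X}_{j+1} + \hat{Y}_j \hat{Y}_{j+1} + \Delta \hat{Z}_j \hat{Z}_{j+1})$ with unit prefactor, and the Jastrow factor is taken as $|\sin(\pi(j-k)/N)|^{\alpha v_j v_k}$, which is the form assumed here for the state described in the text. The function names are hypothetical.

```python
import itertools
import numpy as np

def xxz_hamiltonian(N, delta):
    """Dense sum_j (X_j X_{j+1} + Y_j Y_{j+1} + delta Z_j Z_{j+1}) on a ring
    (assumed normalisation)."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.diag([1.0, -1.0]).astype(complex)
    H = np.zeros((2**N, 2**N), dtype=complex)
    for j in range(N):
        for P, coeff in ((X, 1.0), (Y, 1.0), (Z, delta)):
            ops = [np.eye(2, dtype=complex)] * N
            ops[j] = P
            ops[(j + 1) % N] = P
            term = np.ones((1, 1), dtype=complex)
            for o in ops:
                term = np.kron(term, o)
            H += coeff * term
    return H.real                       # the XXZ matrix is real in the z basis

def cft_jastrow_state(N, alpha):
    """Normalised amplitudes of the assumed CFT Jastrow form, first spin slowest."""
    configs = np.array(list(itertools.product([1, -1], repeat=N)))
    psi = np.zeros(len(configs))
    for idx, v in enumerate(configs):
        if v.sum() != 0:
            continue                    # delta(sum_j v_j): zero magnetisation only
        sign = np.prod(v[0::2])         # Marshall sign on sites 1, 3, ... (1-indexed)
        log_mag = sum(alpha * v[j] * v[k] * np.log(abs(np.sin(np.pi * (j - k) / N)))
                      for j in range(N) for k in range(j + 1, N))
        psi[idx] = sign * np.exp(log_mag)
    return psi / np.linalg.norm(psi)

def ground_state_overlap(N, delta, alpha):
    """Squared overlap between the exact ground state and the CFT state."""
    energies, vectors = np.linalg.eigh(xxz_hamiltonian(N, delta))
    return abs(vectors[:, 0] @ cft_jastrow_state(N, alpha)) ** 2

overlap_xx = ground_state_overlap(8, 0.0, 0.25)          # Delta = 0, alpha = 1/4
overlap_heisenberg = ground_state_overlap(8, 1.0, 0.5)   # Delta = 1, alpha = 1/2
```

With these assumptions `overlap_xx` should equal one to numerical precision, while `overlap_heisenberg` measures how close the Haldane-Shastry state is to the Heisenberg ground state on the same ring.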
The extensive-Jastrow NQS solution is rendered numerically benign by retaining a finite diagonal weight $w_{ii} = S$. Specifically, given Jastrow interactions $V$ we consider softened weights $w = \tfrac{1}{2}(V + V^T) + S\, \mathbb{1}_{N \times N}$ with $S < 10$. The $\Delta = 0$ XXZ ground state provides an ideal test case for this softened extensive-Jastrow NQS. In figure 4(b) we show the deviation of the overlap $1 - O$ between these states as a function of $S$ for a sequence of increasing system sizes $N$. Two features are evident. First, the overlap converges to unity exponentially with increasing $S$, consistent with the scaling of the multi-body terms in equation (17). Second, there is a weak decrease in $O$ with increasing $N$, but this is easily controlled by the moderate values of $S$ considered. Together this demonstrates the robust accuracy of the extensive-Jastrow NQS construction away from the formally exact $S \to \infty$ limit.
To confirm the practical utility of the softened extensive-Jastrow NQS we performed VMC optimisation on an NQS for an $N = 20$ XXZ chain. For numerical stability a parameter cap of $p_{\rm cap} = 5$ was applied. As is standard when applying VMC to the XXZ chain, the zero magnetisation constraint was enforced directly within the Monte Carlo sampling, removing the need to explicitly describe this with hidden units. Moreover, a gauge transform $\hat{G} = \prod_{j \in {\rm odd}} \exp(-i\pi \hat{Z}_j / 2)$ was applied to $\hat{H}_{\rm XXZ}$ to give
$$\hat{G} \hat{H}_{\rm XXZ} \hat{G}^\dagger = \sum_{j=1}^{N} \left( -\hat{X}_j \hat{X}_{j+1} - \hat{Y}_j \hat{Y}_{j+1} + \Delta\, \hat{Z}_j \hat{Z}_{j+1} \right),$$
again up to an overall positive energy scale, so the new Hamiltonian has exclusively non-positive off-diagonal elements in the $z$ basis, making it stoquastic. As a result its ground state is now guaranteed to have non-negative amplitudes in the $z$ basis [76], allowing us to restrict the NQS to real parameters. We optimised the NQS at $\Delta = 0$ for $1 \leq M \leq N$ using stochastic reconfiguration [5,77] with system-extensive hidden units possessing randomly initialised weights and biases far below $p_{\rm cap}$. In figure 5(a) we show $1 - O$ with the $\Delta = 0$ exact Jastrow ground state as a function of $M$. The deviation in the overlap displays a precipitous drop off of 4 orders of magnitude before plateauing at $M \gtrsim 18$, above the softening limit seen in figure 4(b) due to finite sampling fluctuations. We take $1 - O \sim 10^{-7}$ as indicative of converging on an exact representation for $M < N$.
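The effect of the gauge transform can be verified directly on a small ring. The following Python sketch is illustrative: the unit-prefactor form of the XXZ Hamiltonian, the system size $N = 4$, the value $\Delta = 0.5$ and the helper names are assumptions made here. It conjugates the Hamiltonian with $\hat{G}$ and checks that the sign of the XY terms flips, leaving only non-positive off-diagonal elements.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)

def kron_all(ops):
    """Tensor product of a list of single-site operators, site 1 leftmost."""
    out = np.ones((1, 1), dtype=complex)
    for o in ops:
        out = np.kron(out, o)
    return out

def xxz(N, delta, xy_sign=+1.0):
    """sum_j [xy_sign (X_j X_{j+1} + Y_j Y_{j+1}) + delta Z_j Z_{j+1}] on a ring."""
    H = np.zeros((2**N, 2**N), dtype=complex)
    for j in range(N):
        for P, coeff in ((X, xy_sign), (Y, xy_sign), (Z, delta)):
            ops = [I2] * N
            ops[j] = P
            ops[(j + 1) % N] = P
            H += coeff * kron_all(ops)
    return H

N, delta = 4, 0.5                                              # N must be even
gate = np.diag([np.exp(-0.5j * np.pi), np.exp(0.5j * np.pi)])  # exp(-i pi Z / 2)
G = kron_all([gate if j % 2 == 0 else I2 for j in range(N)])   # sites 1, 3, ... (1-indexed)
H_rot = G @ xxz(N, delta) @ G.conj().T
off_diagonal = H_rot - np.diag(np.diag(H_rot))
```

The check relies on the ring being bipartite, so that every bond joins one gauged site to one ungauged site.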
Interestingly the weights $w_{ij}$ found by this numerical solution, reported in figure 5(b), show that each hidden unit couples across the whole system and each interacts with a single unique visible unit with a weight that saturates $p_{\rm cap}$. These observations are consistent with the numerical optimisation 'learning' the softened extensive-Jastrow NQS structure with $S = p_{\rm cap}$, demonstrating it is indeed a practical and stable NQS solution. The extensive-Jastrow NQS also explains earlier numerical observations in reference [49] where an $M \sim O(N)$ scaling was found to describe exactly the Jastrow ground state of a 2D square lattice governed by the Laughlin state's parent spin Hamiltonian. The same optimisation scheme was performed for $\Delta = 1$ where the Jastrow CFT state is not exact. Here $1 - O$ again plateaus for $M \gtrsim 18$, but at a value orders of magnitude higher than at $\Delta = 0$, as shown in figure 5(c). For the hidden unit numbers considered the NQS has not converged to the exact ground state. However, the NQS result does outperform the CFT state (Haldane-Shastry state) with $1 - O$ nearly 4 times smaller with the same number of hidden units. Examining the RBM interactions in figure 5(d) shows that hidden units still favour optimisation into a Jastrow-like form, but with an increasingly softened interaction. This deviation is expected since NQS is a more expressive ansatz than Jastrow precisely due to the higher-order correlations introduced by the softened hidden units.
The numerical results in figure 5(a) suggest that even Jastrow states defined on a fully connected graph can be described with M < N hidden units.While only a minor improvement from the M = N extensive-Jastrow NQS introduced already we will find that understanding this behaviour opens the path for a considerable generalisation of the states that can be exactly captured by compact NQS.

Tensor network formulation
A powerful alternative formulation of NQS views them instead as a tensor network allowing for diagrammatic rewrites of their components [37].In this section we outline some key features of tensor network diagrams, introduce an NQS tensor network motivated from the RBM graph in figure 2, and discuss diagrammatic observations about NQS tensor networks crucial for our main result.

Tensor network diagrams
The exponentially many amplitudes $\Psi(v)$ can be viewed as an order-$N$ tensor $\Psi_{v_1 v_2 \ldots v_N}$. Tensor network theory [6-9] is a versatile way of handling this structureless tensor by decomposing it into many lower order tensors contracted together in a network. Here we will make repeated use of tensor network diagrams that form an important analytical tool in this approach. These represent generic tensors of any order as a shaded circle with protruding legs for each index it possesses. Given two order-3 tensors $A_{abc}$ and $B_{xyz}$, the contraction of them to form an order-4 tensor $C_{abxz} = \sum_{\alpha} A_{ab\alpha} B_{x\alpha z}$ is expressed as a graphical equation by joining the respective legs, as in diagram (21). For the most part we will only consider generic order-2 tensors, e.g. $A_{ab}$, which can be equivalently regarded as a matrix $A$. We will use other shapes or shapes with symbols inside them to represent tensors with special structure. In particular, we will use a dot • to denote graphically the so-called COPY tensor. This is an essential building block for sampleable tensor networks in which the amplitudes $\Psi(v)$ can be exactly and efficiently evaluated [37].
The COPY tensor [78,79] is the multi-index equivalent of the identity matrix. Specifically, for the case of three indices, the COPY tensor has elements
$$\delta_{ijk} = \begin{cases} 1 & i = j = k, \\ 0 & \text{otherwise}, \end{cases} \tag{22}$$
which are zero unless all its indices are equal. It is represented diagrammatically as in diagram (23) and it generalises straightforwardly to any number of indices. The name COPY tensor reflects that if any leg is contracted with a $z$ basis$^2$ state $|{\uparrow}\rangle = (1, 0)^T$ or $|{\downarrow}\rangle = (0, 1)^T$ the same state gets copied to all the other legs, thereby factorising the tensor as in diagram (24). Terminating any leg with an equal superposition $|{+}\rangle \propto |{\uparrow}\rangle + |{\downarrow}\rangle$ removes the corresponding leg, giving a COPY tensor with an order reduced by one, as in diagram (25). The order-$N$ COPY tensor alone expands as the sum of two product terms, $|{\uparrow}{\uparrow} \cdots {\uparrow}\rangle + |{\downarrow}{\downarrow} \cdots {\downarrow}\rangle$ (26), equivalent to the $z$ basis amplitudes of an $N$ spin ferromagnetic GHZ state. As we shall see shortly this basic tensor will form the skeleton of NQS.
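The basic COPY tensor properties described above are easy to reproduce with dense arrays. The Python sketch below is a minimal illustration: the helper name `copy_tensor` and the index convention (0 for up, 1 for down) are choices made here, and the equal superposition is left unnormalised so that the reduced tensor is exactly a COPY tensor.

```python
import numpy as np

def copy_tensor(order):
    """COPY tensor: 1 when all indices are equal, 0 otherwise (0 = up, 1 = down)."""
    T = np.zeros((2,) * order)
    T[(0,) * order] = 1.0
    T[(1,) * order] = 1.0
    return T

up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus = up + down                              # unnormalised equal superposition

C3 = copy_tensor(3)
copied = np.einsum('abc,a->bc', C3, up)       # one leg contracted with |up>
reduced = np.einsum('abc,a->bc', C3, plus)    # one leg terminated with |+>
ghz = copy_tensor(4).reshape(-1)              # amplitudes of a 4-spin GHZ state
```

Here `copied` factorises into a product of two up states, `reduced` is the order-2 COPY tensor (the identity matrix), and `ghz` has exactly two non-zero amplitudes.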
A key property of COPY tensors is the so-called 'fusion' rule which allows COPY tensors having one or more legs contracted together to be amalgamated into one COPY tensor, e.g. as (27). The rule also applies in reverse so a COPY tensor can be split up into an arbitrary network of connected COPY tensors with the same number of open legs.
A corollary of the fusion rule is that diagonal matrices can commute across the COPY tensor between any legs (28). Similarly, the $\hat X$ component of anti-diagonal matrices distributes across legs as (29).
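The COPY tensor rules above can be checked numerically on small cases. The following is a minimal sketch, assuming a COPY tensor stored as a Python dictionary over index tuples in {0, 1} and an unnormalised termination vector (1, 1); the helper name `copy_tensor` is hypothetical and introduced only for this illustration.

```python
from itertools import product

def copy_tensor(order):
    """COPY tensor of the given order: 1 when all indices agree, 0 otherwise."""
    return {idx: 1 if len(set(idx)) == 1 else 0
            for idx in product((0, 1), repeat=order)}

copy3 = copy_tensor(3)

# Copy property: fixing one leg to a basis state s forces the other legs to s.
for s in (0, 1):
    for b, c in product((0, 1), repeat=2):
        assert copy3[(s, b, c)] == (1 if (b == s and c == s) else 0)

# Termination: summing one leg against (1, 1) lowers the order by one.
terminated = {(b, c): sum(copy3[(a, b, c)] for a in (0, 1))
              for b, c in product((0, 1), repeat=2)}

# Fusion: two order-3 COPY tensors contracted on one leg give the order-4 COPY tensor.
fused = {(a, b, c, d): sum(copy3[(a, b, x)] * copy3[(x, c, d)] for x in (0, 1))
         for a, b, c, d in product((0, 1), repeat=4)}

# A diagonal matrix diag(d0, d1) gives the same tensor whichever leg it acts on.
diag = (2.0, -0.5)
on_leg_0 = {idx: diag[idx[0]] * val for idx, val in copy3.items()}
on_leg_2 = {idx: diag[idx[2]] * val for idx, val in copy3.items()}
```

Only the two index tuples with all entries equal survive in `copy_tensor(N)`, which is the GHZ-like structure referred to in the text.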

NQS tensor network
The simple bipartite RBM graph in figure 2 readily motivates a corresponding tensor network representation [37]. Specifically, each vertex is replaced by a COPY tensor, each edge between the $i$th hidden and $j$th visible unit is replaced by a contraction with a generic $2 \times 2$ coupling matrix $C^{(ij)}$, while vertices representing visible units have an additional open leg. Together this gives (30). For this and forthcoming diagrams the colour of a tensor only denotes a convenient visual grouping, and each order-2 tensor is otherwise distinct. We will call any tensor network sharing this structure an NQS tensor network. Dissecting the network we see that in isolation each hidden unit is similar to a GHZ state, extensive over the whole system in general, but deformed locally by its coupling matrices as (31). The variational parameters provided by each hidden unit are therefore encoded by its set of $N$ coupling matrices $\Upsilon^{(i)} = \{C^{(i1)}, C^{(i2)}, \ldots, C^{(iN)}\}$, explicitly tabulated in equation (32). As such the amplitudes of each hidden unit comprise two terms, found by summing the product of coupling matrix elements selected by $v$ along each row of equation (32). The amplitudes of the full NQS tensor network then follow as the product of each of these hidden unit correlators which, like complex RBMs, can be exactly and efficiently sampled. As discussed in reference [37] coupling matrices can be a useful intuitive tool for unravelling the correlations and structures a given hidden unit imprints on the amplitudes of the overall NQS.
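To make the amplitude evaluation concrete, here is a minimal sketch assuming the index convention $C^{(ij)}_{h_i v_j}$ of appendix A, so that a hidden unit correlator reads $\sum_{h} \prod_j C^{(ij)}_{h v_j}$ and the full amplitude is the product of these over hidden units. The function names and the data layout (one dictionary of coupling matrices per hidden unit) are assumptions made for this illustration only.

```python
from itertools import product

def hidden_unit_correlator(couplings, v):
    """Sum over the hidden value h of the product of elements C[h][v_j].

    couplings: {visible index j: 2x2 nested list C}, one entry per connected spin
    v: tuple of visible values in {0, 1} (0 for up, 1 for down)
    """
    total = 0
    for h in (0, 1):
        term = 1
        for j, C in couplings.items():
            term *= C[h][v[j]]
        total += term
    return total

def nqs_amplitude(hidden_units, v):
    """Product of the hidden unit correlators for configuration v."""
    amp = 1
    for couplings in hidden_units:
        amp *= hidden_unit_correlator(couplings, v)
    return amp

# One hidden unit with identity couplings on 3 spins reproduces GHZ amplitudes.
identity = [[1, 0], [0, 1]]
ghz_unit = [{0: identity, 1: identity, 2: identity}]
ghz_amps = {v: nqs_amplitude(ghz_unit, v) for v in product((0, 1), repeat=3)}

# One hidden unit with Hadamard-type couplings on 2 spins enforces even parity.
hadamard_type = [[1, 1], [1, -1]]
parity_unit = [{0: hadamard_type, 1: hadamard_type}]
parity_amps = {v: nqs_amplitude(parity_unit, v) for v in product((0, 1), repeat=2)}
```

The cost of one amplitude is linear in the number of coupling matrices, which is the sense in which the network is efficiently sampleable.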

Gauge freedom and RBM equivalence
The NQS tensor network contains $4MN$ complex parameters, compared to $MN + M + N$ parameters for a complex RBM, suggesting these formulations are either not equivalent or that the NQS tensor network is over-parameterised. The latter is proven in the following result:

Lemma 2 (NQS tensor network RBM equivalence). An NQS tensor network for $N$ spin-1/2 particles comprising $MN$ $2 \times 2$ coupling matrices $C^{(ij)}$ is equivalent to the complex RBM formulation with $MN$ weights $w$ and $N + M$ biases $a$, $b$.
The proof is presented in appendix A. The key step follows from equation (28), which demonstrates that NQS tensor networks possess a gauge freedom, analogous to that of MPS [6], where (anti-)diagonal matrices can be shuffled between coupling matrices connected to the same COPY tensor. Given any coupling matrix can be decomposed into Boltzmann-like form as $C^{(ij)} = e^{c_{ij}} \begin{pmatrix} e^{\tilde b_{ij}} & 0 \\ 0 & e^{-\tilde b_{ij}} \end{pmatrix} \begin{pmatrix} e^{w_{ij}} & e^{-w_{ij}} \\ e^{-w_{ij}} & e^{w_{ij}} \end{pmatrix} \begin{pmatrix} e^{\tilde a_{ij}} & 0 \\ 0 & e^{-\tilde a_{ij}} \end{pmatrix}$, complete equivalence to the complex RBM formulation in equations (3) and (4) follows from reshuffling the diagonal matrices [38]. Armed with lemma 2 we will proceed using the tensor network formulation of NQS and exploit the diagrammatic rewrites it affords in analytic constructions.

Diagonal circuit unravelling
An insightful alternative rewiring of a general NQS tensor network is made by elevating each hidden unit correlator to a diagonal operator in the $z$ basis, so that the full NQS is constructed by applying the product of these operators to the uniform superposition reference state. This is equivalent to diagrammatically unravelling each hidden unit in the NQS tensor network, using the COPY tensor fusion/splitting rule equation (27) and $|+\rangle$ state termination rule equation (25), into the form of a non-unitary preparation 'circuit', as shown here for an NQS with $M = 3$ system-extensive hidden units: (37). This circuit consists entirely of diagonal two-spin operators and one ancilla spin per hidden unit that is initialised in $|+\rangle$ and projected onto $|+\rangle$ at the end. Since all operators commute their ordering in this circuit is irrelevant, reflecting the multiplicative structure of NQS. As each hidden unit involves a distinct ancilla, adding more hidden units independently enhances the expressiveness of the ansatz.

Applying single-spin operators to NQS
Consider the following problem which is central to our main result. For an NQS initial state $|\Psi_{\rm NQS}\rangle$ what is the NQS representation of $Q |\Psi_{\rm NQS}\rangle$ where a local operator $Q$ is applied to a single spin of the physical system? In general this is a non-trivial optimisation problem requiring both parametric and structural changes to the NQS [45,46]. However, the tensor network formalism reveals two special cases where the update is simple:

Lemma 3 (Applying single spin operators to NQS). The application of a single-spin operator $Q$ to an NQS can be captured exactly by only changing the NQS parameters for two special cases: (i) an arbitrary operator $Q$ applied to a visible unit that is univalent in the initial NQS; (ii) an operator $Q$ that is (anti-)diagonal in the $z$ basis applied to any visible unit.
Proof. For case (i), where the visible unit $j$ is connected only to a single hidden unit $i$, $Q$ is easily absorbed into the coupling matrix as $C^{(ij)} \to C^{(ij)} Q$, for example as (38). For case (ii) $Q$ can be applied to any spin since, via equations (28) and (29), it can be commuted past the visible unit's COPY tensor and absorbed into the NQS representation, for example as (39), irrespective of the visible unit's valency in the NQS. Lemma 2 allows these parameter changes to be expressed in terms of complex RBM parameters, if required.
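Both cases of lemma 3 can be verified by brute force on a small network. The sketch below assumes couplings stored as `C[h][v]` and an operator stored as `Q[v_out][v_in]`; with these (assumed) array conventions the diagrammatic absorption $C^{(ij)} \to C^{(ij)} Q$ becomes a right multiplication by the transpose of `Q`. The network, the operators and all function names are hypothetical examples.

```python
from itertools import product

def nqs_amplitude(hidden_units, v):
    """Product over hidden units of sum_h prod_j C[h][v_j]."""
    amp = 1
    for unit in hidden_units:
        total = 0
        for h in (0, 1):
            term = 1
            for j, C in unit.items():
                term *= C[h][v[j]]
            total += term
        amp *= total
    return amp

def matmul2(A, B):
    return [[sum(A[r][k] * B[k][c] for k in (0, 1)) for c in (0, 1)] for r in (0, 1)]

def transpose2(A):
    return [[A[c][r] for c in (0, 1)] for r in (0, 1)]

def apply_operator(hidden_units, Q, j, v):
    """Amplitude <v| Q_j |Psi> by explicit summation over the state of spin j."""
    total = 0
    for vj in (0, 1):
        w = v[:j] + (vj,) + v[j + 1:]
        total += Q[v[j]][vj] * nqs_amplitude(hidden_units, w)
    return total

def absorb(hidden_units, Q, j, all_units):
    """Right-multiply couplings of spin j by Q^T, in all units or only the first."""
    units = [dict(u) for u in hidden_units]
    touching = [u for u in units if j in u]
    for u in (touching if all_units else touching[:1]):
        u[j] = matmul2(u[j], transpose2(Q))
    return units

# Hypothetical network: spin 0 is univalent, spins 1 and 2 are bivalent.
network = [
    {0: [[1, 2], [3, -1]], 1: [[2, 1], [1, 1]], 2: [[1, -2], [1, 3]]},
    {1: [[1, 3], [-1, 2]], 2: [[2, 1], [1, -1]]},
]
configs = list(product((0, 1), repeat=3))

def matches(Q, j, updated):
    return all(apply_operator(network, Q, j, v) == nqs_amplitude(updated, v)
               for v in configs)

# Case (i): arbitrary Q on the univalent spin 0, absorbed into its single coupling.
Q_general = [[1, 2], [3, 4]]
ok_univalent = matches(Q_general, 0, absorb(network, Q_general, 0, all_units=False))

# Case (ii): a diagonal Q goes into any one coupling of the spin ...
D = [[2, 0], [0, -3]]
ok_diagonal = matches(D, 1, absorb(network, D, 1, all_units=False))

# ... while the flip X must be distributed to every coupling of the spin.
X = [[0, 1], [1, 0]]
ok_flip = matches(X, 2, absorb(network, X, 2, all_units=True))
```

Integer couplings are used so that the comparisons are exact. A general anti-diagonal operator is the product of a diagonal matrix and $\hat X$, so it is covered by combining the last two updates.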

Generalising compact Jastrow NQS
In this section we apply the tensor network formalism to Jastrow states revealing a new type of compact NQS representation and a novel pathway for generalising it to wider classes of states.

Tensor networks for Jastrow states
Reformulating Jastrow states in terms of tensor networks provides several equivalent diagrammatic forms. Analogous to the RBM graph earlier, a Jastrow state defined over a graph $G$ leads to a sampleable tensor network constructed by replacing vertices with COPY tensors possessing an open leg and edges $i$ by contractions with $2 \times 2$ edge matrices $J^{(i)}$. This produces the correlator product state (CPS) amplitude [81] of equation (40). Since the CPS tensor network is constructed from COPY tensors it possesses gauge freedom ensuring the representation in terms of $J^{(i)}$ is completely equivalent to the Boltzmann parameterisation in equation (10) in terms of biases $c$ and interactions $V$. For the extremal case of $G_{\rm fc}$ in figure 6(a) the resulting CPS tensor network is shown in figure 6(b). The CPS tensor network can be straightforwardly unravelled into a circuit of diagonal two-spin operators, shown in figure 6(c). In contrast to the same unravelling of a general NQS there are no ancilla involved, consistent with the Jastrow energy function equation (9) having interactions $V$ only between visible units. Crucially the CPS and circuit networks can both be rewired in multiple ways to have the geometry of an NQS tensor network, a selection of which are shown in figure 6(d).
Although specialised to the graph $G_{\rm fc}$ we can nonetheless glean several useful features about these new NQS tensor networks justifying why they should be considered the canonical Jastrow NQS form. First, they are compact, possessing $M = N - 1$ hidden units with decreasing coordination $N, N - 1, \ldots, 2$. This explains why convergence to an exact numerical solution can appear with $M < N$ earlier in section 3.2. Second, they possess bare wires connecting some visible and hidden COPY tensors. From equation (19) these encode a diverging interaction enforcing a perfect correlation between those units, which in the tensor network language is equivalent to a coupling matrix $C^{(ij)} = \mathbb{1}_{2 \times 2}$. Third, one of these new NQS forms, the first in figure 6(d), has RBM weights $w$ equal to $\lim_{S \to \infty} S\, \mathbb{1}_{N-1 \times N}$ plus the matrix obtained by removing the $N$th (entirely zero) row from $V$. Since $V$ is a strictly upper-triangular matrix it also gives rise to the successively decreasing coordination of hidden units. Finally, they represent the most direct and minimal translation of $G_{\rm fc}$ into an NQS since the $\frac{1}{2} N(N - 1)$ edge matrices $J^{(i)}$ become the NQS coupling matrices.
Contrast figure 6(d) to the tensor network versions of the Jastrow NQS representations introduced earlier in section 3.1. For sparse-Jastrow NQS the tensor network possesses only order-2 hidden COPY tensors, as depicted here for an $N = 4$ spin $G_{\rm fc}$ Jastrow state (41), while the extensive-Jastrow NQS has perfect correlations present between unique pairs of visible and hidden units, as shown here (42). Both involve $N(N - 1)$ coupling matrices, but as expected can be diagrammatically rewired into the new forms shown in figure 6(d). However, crucially the new set of Jastrow NQS contains variants in which any single visible unit is univalent, as illustrated in figure 6(d). In light of lemma 3 this is an important property not present in either the sparse equation (41) or extensive equation (42) Jastrow NQS representations.
By deleting couplings from the Jastrow NQS for $G_{\rm fc}$, equivalent to removing edges in the graph, we can readily obtain a Jastrow NQS defined over any graph $G$. This NQS will potentially possess more univalent visible units, depending on the structure of the underlying graph $G$. For this reason we will now introduce some additional graph-theoretic concepts allowing us to formalise the Jastrow NQS construction for any graph $G$ and determine the freedom in its univalency.

Graph theoretic tools
We will exploit a number of basic concepts from graph theory [70] and illustrate them using an example graph $G$ from equation (7) earlier (43). First, the neighbourhood of a vertex $a \in V$, denoted by $N_a(G)$, is defined as the set of all vertices that are adjacent to vertex $a$, $N_a(G) := \{b \in V \,|\, (a, b) \in E(G)\}$. Second, a leaf vertex is a vertex with only a single edge incident on it, and we will denote the set of these in a graph as $L(G)$. Third, an independent set $I(G)$ is a set of vertices in which no pair shares an edge between them. Of the many such sets $I(G)$ for a given graph those with the largest cardinality are maximum independent sets denoted as $\alpha(G)$. Fourth, a closely related concept is that of a vertex cover. A vertex cover $C(G)$ is a set of vertices such that each edge of the graph is incident to at least one vertex in the vertex cover. Of the many sets $C(G)$ those with the minimum cardinality form minimum vertex covers, denoted $\beta(G)$. We illustrate these four concepts here: (44). A given maximum independent set has a corresponding minimum vertex cover that is its complement such that $\alpha(G) \cup \beta(G) = V$, so finding one automatically gives the other. However, identifying either is an integer linear programming problem known to be NP-hard [82] in general.
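The four concepts can be illustrated with a brute-force sketch. The five-vertex graph used here is a hypothetical example and is not the graph of equation (43); the exhaustive search over subsets is exponential in the number of vertices, in line with the NP-hardness noted above, and is only meant for small graphs.

```python
from itertools import combinations

def neighbourhood(edges, a):
    """Set of vertices adjacent to vertex a."""
    return {u if v == a else v for u, v in edges if a in (u, v)}

def leaves(vertices, edges):
    """Vertices with exactly one incident edge."""
    return {a for a in vertices if len(neighbourhood(edges, a)) == 1}

def is_independent(subset, edges):
    return all(not (u in subset and v in subset) for u, v in edges)

def is_vertex_cover(subset, edges):
    return all(u in subset or v in subset for u, v in edges)

def maximum_independent_set(vertices, edges):
    """Exhaustive search from the largest subsets downwards."""
    for size in range(len(vertices), -1, -1):
        for subset in combinations(sorted(vertices), size):
            if is_independent(set(subset), edges):
                return set(subset)
    return set()

# Hypothetical example graph.
vertices = {1, 2, 3, 4, 5}
edges = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]

alpha = maximum_independent_set(vertices, edges)
beta = vertices - alpha  # the complement of a maximum independent set
```

For this graph the leaves are vertices 1 and 5, and the complement of the maximum independent set found is a vertex cover of size two.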

Vertex modified Jastrow-NQS
Pulling together the observations from section 5.1 and tools from section 5.2 we present here a general procedure graph2nqs for constructing a Jastrow NQS tensor network directly from its graph $G$. Given we can choose any ordered vertex cover $C(G)$ in graph2nqs there is considerable freedom in our eventual Jastrow NQS. Here are some useful special cases:
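One plausible reading of graph2nqs, pieced together from the properties stated in the surrounding text rather than taken from the authors' algorithm listing, is sketched below. It assumes that each member of the ordered vertex cover receives one hidden unit joined to it by a bare wire (identity coupling), and that this hidden unit collects every still-unassigned edge of that vertex as a coupling matrix. Function names, the edge-matrix layout `J[v_a][v_b]` and the example matrices are all hypothetical.

```python
from itertools import product

IDENTITY = [[1, 0], [0, 1]]

def graph2nqs(edge_matrices, ordered_cover):
    """Sketch: build hidden units {visible j: C[h][v_j]} from an ordered vertex cover."""
    remaining = dict(edge_matrices)
    hidden_units = []
    for c in ordered_cover:
        unit = {c: IDENTITY}  # bare wire: hidden unit perfectly correlated with spin c
        for (a, b), J in list(remaining.items()):
            if a == c:
                unit[b] = J  # C[h][v_b] = J[h][v_b] with h standing in for v_a
            elif b == c:
                unit[a] = [[J[0][0], J[1][0]], [J[0][1], J[1][1]]]  # transpose of J
            else:
                continue
            del remaining[(a, b)]
        if len(unit) > 1:  # skip cover vertices whose edges were all taken already
            hidden_units.append(unit)
    if remaining:
        raise ValueError("ordered_cover does not cover every edge")
    return hidden_units

def cps_amplitude(edge_matrices, v):
    """Jastrow amplitude as a product of edge matrix elements."""
    amp = 1
    for (a, b), J in edge_matrices.items():
        amp *= J[v[a]][v[b]]
    return amp

def nqs_amplitude(hidden_units, v):
    amp = 1
    for unit in hidden_units:
        amp *= sum(
            _prod(C[h][v[j]] for j, C in unit.items()) for h in (0, 1)
        )
    return amp

def _prod(values):
    out = 1
    for x in values:
        out *= x
    return out

# Fully connected graph on N = 4 spins with arbitrary integer edge matrices.
N = 4
J_fc = {(a, b): [[a + 1, b + 2], [1, -(a + b + 1)]]
        for a in range(N) for b in range(a + 1, N)}
units = graph2nqs(J_fc, ordered_cover=[0, 1, 2])
coordination = [len(u) for u in units]
valency = [sum(1 for u in units if j in u) for j in range(N)]
agree = all(cps_amplitude(J_fc, v) == nqs_amplitude(units, v)
            for v in product((0, 1), repeat=N))
```

For the fully connected graph this sketch reproduces the features listed earlier: $N - 1$ hidden units with coordination $N, N - 1, \ldots, 2$, and a univalent first visible unit.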

Lemma 4 (Minimum hidden unit Jastrow NQS). A Jastrow state defined over $G$ can be expressed as a compact NQS tensor network comprising $M = |\beta(G)| \leq N - 1$ hidden units.
Proof. Given graph2nqs introduces perfectly correlated hidden units for each member of the vertex cover, using $\beta(G)$ provides the minimum. The extremal case of the fully connected graph $G_{\rm fc}$, where $|\beta(G_{\rm fc})| = N - 1$, is the maximum possible size for a vertex covering and so bounds the hidden unit complexity of any Jastrow state.
The set of leaf vertices $L(G)$ in $G$ will always be univalent in an NQS. However, for graph2nqs we have the further property that if the first members of the ordered vertex cover $C(G)$ form an independent set $I(G)$ then all the vertices $Q = L(G) \cup I(G)$ will be univalent in the resulting Jastrow NQS. Using these observations we arrive at:

Lemma 5 (Maximum univalency for a Jastrow NQS). Given a graph $G$ the maximum sized set of univalent visible units in its Jastrow NQS tensor network is $Q = L(G) \cup \alpha(G')$, where graph $G'$ is $G$ with leaf vertices pruned.
Proof. Apply graph2nqs with the following ordered vertex cover $C = \{\alpha(G'), \beta(G'')\}$, where $G'$ is $G$ with leaf vertices pruned and $G''$ is $G$ with the vertices in $\alpha(G')$ removed. The visible units in $Q = L(G) \cup \alpha(G')$ will thus be univalent in the resulting Jastrow NQS tensor network, and this set is the maximum possible owing to $\alpha(G')$ being extremal. Note, the use of $\beta(G'')$ is to minimise the total hidden unit count, but is not strictly necessary. Any vertex cover $C(G'')$ will suffice to complete $C(G)$.
To illustrate these results we apply them here to a Jastrow state defined by the graph in equation (43). We find a maximum sized univalent set of vertices $Q$ by applying the construction from lemma 5 as (45) and forming an ordered vertex cover $C(G) = \{2, 5, 4\}$. It turns out that $C(G) = \beta(G)$ in this case, although it is not guaranteed to be so in general, and gives a Jastrow NQS (46). Since $|L(G)| = 1$ and $|\alpha(G')| = 2$ in the example we arrive at a Jastrow NQS with $|Q| = 3$ univalent visible units. The enlarged univalency of Jastrow NQS defined over graphs $G$ motivates the following generalisation:

Definition 3 (VMJ-NQS).
A vertex modified Jastrow NQS (VMJ-NQS) is a Jastrow state defined over a graph $G$ which has arbitrary single-spin operators applied to any set of univalent vertices $Q$ present in the Jastrow NQS as $\prod_{j \in Q} Q_j |\Psi_{\rm JS}\rangle$.
By construction VMJ-NQS are compact since they possess the same $M \leq N - 1$ hidden units as the underlying Jastrow state. Despite being a seemingly modest generalisation, applying single-spin operators to $Q$ has several implications. First, since operators are applied locally to single spins they cannot increase the entanglement content of the original Jastrow state $|\Psi_{\rm JS}\rangle$, and so we can view the $Q_j$'s as fine-tuning. Second, like Jastrow states, VMJ states are equivalent to non-unitary preparation circuits with no ancilla but possess fewer perfectly correlated visible and hidden units. Third, and most importantly, the presence of single-spin operators significantly enriches the nodal structure of VMJ states compared to Jastrow states. Specifically, for Jastrow states zero amplitudes can only be introduced by the zero elements in edge matrices $J^{(i)}$. In contrast the application of a non-diagonal single-spin gate generates a superposition of Jastrow state amplitudes, conditioned on the state of the spin it was applied to, introducing the potential of vanishing amplitudes due to interference effects. We will now show that a consequence of this is that VMJ-NQS can capture a much wider class of quantum states, namely stabilizer states.

Graph states and stabilizer states
In this section we introduce graph states and stabilizer states along with an explicit procedure for constructing VMJ-NQS representations of them.

Graph states
The importance of graph states stems from their ability to possess volume scaling amounts of entanglement between subsystems [65] and from the fact that they form a resource for measurement-based quantum computation [83,84]. A graph state $|\Psi_G\rangle$ is defined over $G$ for a set of spin-1/2 particles all initialised in the state $|+\rangle$ as $|\Psi_G\rangle = \prod_{(j,k) \in E(G)} \hat C^z_{jk} |+\rangle^{\otimes N}$, in which a controlled-phase gate $\hat C^z_{jk} = |\uparrow\rangle\langle\uparrow|_j \otimes \hat{\mathbb{1}}_k + |\downarrow\rangle\langle\downarrow|_j \otimes \hat Z_k$ is applied between any pair of spins $(j, k)$ if there is a corresponding edge in $E(G)$. The amplitudes $\Psi_G(v)$ are thus non-zero and equal in magnitude for all $z$ basis states $|v\rangle$, but possess an intricate sign structure imposed by the controlled-phase gates. Graph states correspond to a special class of spin-Jastrow state with a unitary preparation circuit (once normalisation is restored), a CPS tensor network with identical Hadamard-type edge matrices and an NQS representation with $M = |\beta(G)|$ hidden units [37]. Graph states are also a special case of another class of states called stabilizer states.
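The sign structure of a graph state is easy to tabulate for a small graph. The following sketch assumes unnormalised amplitudes, qubit labels $q_j \in \{0, 1\}$ with 1 standing for $|\downarrow\rangle$, and a Hadamard-type edge matrix with entries $(-1)^{q_j q_k}$; the example graph is hypothetical. It also checks the standard graph state stabilizer condition, in which flipping spin $a$ changes the amplitude by the parity of its neighbours.

```python
from itertools import product

def graph_state_amplitude(edges, q):
    """Unnormalised amplitude: a factor -1 for every edge with both spins down."""
    sign = 1
    for j, k in edges:
        if q[j] and q[k]:
            sign = -sign
    return sign

HADAMARD_TYPE = [[1, 1], [1, -1]]

def cps_amplitude(edges, q):
    """Same amplitude written as a product of identical edge matrices."""
    amp = 1
    for j, k in edges:
        amp *= HADAMARD_TYPE[q[j]][q[k]]
    return amp

def stabilizer_holds(edges, n, a):
    """Check <q| X_a prod_{b in N_a} Z_b |Psi> = <q|Psi> for every configuration q."""
    nbrs = [k if j == a else j for j, k in edges if a in (j, k)]
    for q in product((0, 1), repeat=n):
        flipped = q[:a] + (1 - q[a],) + q[a + 1:]
        z_sign = (-1) ** sum(q[b] for b in nbrs)
        if z_sign * graph_state_amplitude(edges, flipped) != graph_state_amplitude(edges, q):
            return False
    return True

# Hypothetical example: a triangle with one pendant vertex.
n = 4
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
amps = {q: graph_state_amplitude(edges, q) for q in product((0, 1), repeat=n)}
```

All sixteen amplitudes have unit magnitude, as stated in the text, and only their signs depend on the graph.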

Stabilizer states
Stabilizer states are central to constructing quantum error-correction codes [64,80] and to describing the degenerate ground state manifolds of numerous interacting topological systems [85]. Their definition relies on the Pauli group $P_N$ for $N$ spin-1/2's, which consists of $4 \times 4^N$ $N$-fold tensor product operators $\tau\, \hat p_1 \otimes \hat p_2 \otimes \cdots \otimes \hat p_N$, where $\tau \in \{\pm 1, \pm i\}$ is an overall phase factor and each $\hat p_j \in \{\hat{\mathbb{1}}_j, \hat X_j, \hat Y_j, \hat Z_j\}$. The Clifford group $C_N$ for $N$ spins consists of all unitaries $\hat U$ whose action is to map under conjugation Pauli group elements among themselves, so $\hat U P_N \hat U^\dagger = P_N$. For a single spin the local Clifford group $C_1$, after disregarding a global phase, contains 24 unitaries including the Pauli gates as well as the Hadamard and phase gates [80], $\hat H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ and $\hat S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$, respectively. The Clifford group $C_N$ can be generated by quantum circuits comprising the gates $\hat H$ and $\hat S$ along with the controlled-phase gate $\hat C^z_{jk}$. The key idea of the stabilizer formalism is to represent a quantum state not by a vector of amplitudes but by a set of unitary operators that each 'stabilize' the state. Specifically, for stabilizer states these operators form an abelian subgroup of $P_N$ defined by $N$ generators $\hat T_a$ that commute and are independent in the sense that removing a generator defines a smaller subgroup. A stabilizer state $|\Psi_S\rangle$ is then the unique eigenstate with +1 eigenvalue of each generator, $\hat T_a |\Psi_S\rangle = |\Psi_S\rangle$. A graph state $|\Psi_G\rangle$ is described by stabilizers $\hat T_a = \hat X_a \prod_{b \in N_a(G)} \hat Z_b$ (52). In contrast to graph states, the amplitudes $\Psi_S(v)$ of stabilizer states can in general possess a nodal structure with zero amplitudes arising from parity constraints, while their non-zero amplitudes are equal magnitude with values $\pm 1, \pm i$. Consequently stabilizer states can be written in the normal form of equation (53) in terms of Boolean-valued functions $f(q)$, $g(q)$, $h(q)$ that are quadratic, linear and affine polynomials of the $z$ basis qubit labels $q = \frac{1}{2}(1 - v)$, respectively. Another powerful way to describe stabilizer states is to encode their generators into the rows of an $N \times 2N$ binary check matrix $\mathcal{G} = [X \,|\, Z]$ [86] partitioned into two $N \times N$ matrices $X$ and $Z$ with elements $x_{aj}$ and $z_{aj}$, respectively. The bits $x_{aj} z_{aj}$ determine the Pauli operator at the $j$th spin for the generator $\hat T_a$ as $00 \to \hat{\mathbb{1}}_j$, $10 \to \hat X_j$, $11 \to \hat Y_j$ and $01 \to \hat Z_j$, while elements $s_a$ of an additional $N \times 1$ binary vector $s$ specify the overall sign as $(-1)^{s_a}$. The independence of the generators is equivalent to the rows of $\mathcal{G}$ being linearly independent, and they all mutually commute if and only if $\mathcal{G} \Lambda \mathcal{G}^{\rm T} = 0$, where $\Lambda = \begin{pmatrix} 0 & \mathbb{1} \\ \mathbb{1} & 0 \end{pmatrix}$ defines a symplectic inner product for the rows. The check matrix is not unique since we are at liberty to swap rows in $\mathcal{G}$ and $s$ simultaneously, corresponding to relabelling the generators. Similarly we can swap columns simultaneously within $X$ and $Z$ corresponding to relabelling the spins. Crucially we can also add rows modulo 2, corresponding to replacing a generator $\hat T_a$ with $\hat T_a \hat T_b$ when $a \neq b$, so long as we also update its sign $s_a$. Following equation (52) the check matrix for a graph state has the form $\mathcal{G} = [\mathbb{1} \,|\, \theta]$, where $\theta$ is the adjacency matrix of the graph $G$, and signs $s = (0, 0, \ldots, 0)$. The usefulness of stabilizer states stems in large part from the Gottesman-Knill theorem [64,80] which establishes that any Clifford quantum circuit $\hat U \in C_N$ acting on a $z$ basis state $|v\rangle$ can be efficiently simulated classically. This follows since any $|v\rangle$ is a stabilizer state with $\hat T_a = (-1)^{(1 - v_a)/2} \hat Z_a$, so the state $\hat U |v\rangle$ has stabilizers $\hat T'_a = \hat U \hat T_a \hat U^\dagger \in P_N$. The new stabilizers can be computed from elementary gates $\hat H_j$, $\hat S_j$ and $\hat C^z_{jk}$ which induce simple updates on $\mathcal{G}$ and $s$ for each generator $a = 1, 2, \ldots, N$ as [86]:
• applying $\hat H_j$, set $s_a \to s_a \oplus (x_{aj} z_{aj})$ and then swap $x_{aj}$ with $z_{aj}$;
• applying $\hat S_j$, set $s_a \to s_a \oplus (x_{aj} z_{aj})$ and then set $z_{aj} \to z_{aj} \oplus x_{aj}$;
• applying $\hat C^z_{jk}$, set $s_a \to s_a \oplus x_{aj} x_{ak} (z_{aj} \oplus z_{ak})$, and then set $z_{aj} \to z_{aj} \oplus x_{ak}$, $z_{ak} \to z_{ak} \oplus x_{aj}$.
We will exploit the first two of these updates in the following.
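The update rules can be exercised on a small example. The sketch below stores $X$, $Z$ and $s$ as plain Python lists of bits and mutates them in place. The sign update used for the controlled-phase gate, $s_a \to s_a \oplus x_{aj} x_{ak} (z_{aj} \oplus z_{ak})$, was worked out for this sketch by conjugating the two-spin Pauli operators directly and should be read as an assumption rather than as a quotation of reference [86]; the function names are hypothetical.

```python
def z_basis_state(n):
    """Check matrix of the all-up state: generators Z_a with positive signs."""
    x = [[0] * n for _ in range(n)]
    z = [[1 if a == j else 0 for j in range(n)] for a in range(n)]
    s = [0] * n
    return x, z, s

def apply_h(x, z, s, j):
    """Hadamard on spin j: X <-> Z and Y -> -Y."""
    for a in range(len(s)):
        s[a] ^= x[a][j] & z[a][j]
        x[a][j], z[a][j] = z[a][j], x[a][j]

def apply_s(x, z, s, j):
    """Phase gate on spin j: X -> Y, Y -> -X, Z -> Z."""
    for a in range(len(s)):
        s[a] ^= x[a][j] & z[a][j]
        z[a][j] ^= x[a][j]

def apply_cz(x, z, s, j, k):
    """Controlled-phase on spins j, k: X_j -> X_j Z_k and X_k -> Z_j X_k."""
    for a in range(len(s)):
        s[a] ^= x[a][j] & x[a][k] & (z[a][j] ^ z[a][k])
        z[a][j] ^= x[a][k]
        z[a][k] ^= x[a][j]

# Prepare the graph state of the path 0 - 1 - 2 from the all-up state.
x, z, s = z_basis_state(3)
for j in range(3):
    apply_h(x, z, s, j)          # all-up state -> uniform superposition
for j, k in [(0, 1), (1, 2)]:
    apply_cz(x, z, s, j, k)      # one controlled-phase gate per edge

# A single generator X under two phase gates: X -> Y -> -X.
x1, z1, s1 = [[1]], [[0]], [0]
apply_s(x1, z1, s1, 0)
after_one = (x1[0][0], z1[0][0], s1[0])
apply_s(x1, z1, s1, 0)
after_two = (x1[0][0], z1[0][0], s1[0])
```

After the circuit the $X$ block is the identity and the $Z$ block is the adjacency matrix of the path, which is the graph state form of the check matrix described above.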

VMJ-NQS for stabilizer states
Significant efforts have been made to determine exact NQS representations of stabilizer states. Explicit constructions for the RBM parameterisation of stabilizer states have been devised based on iteratively reducing the normal form in equation (53) [61], and also by manipulating the check matrix into a canonical form [59,60,62]. Analogous to Jastrow states, both these earlier constructions have a formal hidden unit complexity $M \sim O(N^2)$. Here we give a new approach exploiting that any stabilizer state $|\Psi_S\rangle$ is locally Clifford equivalent to a graph state $|\Psi_G\rangle$ as [87,88] $|\Psi_S\rangle = \bigotimes_{a=1}^{N} \hat u_a |\Psi_G\rangle$ for some $\hat u_a \in C_1$ with $a = 1, 2, \ldots, N$. Given graph2nqs provides a compact NQS for $|\Psi_G\rangle$ we can obtain a VMJ-NQS for $|\Psi_S\rangle$ from this so long as all the local Clifford gates $\hat u_a$ can be trivially absorbed via lemma 3. Our main result is to show that this is indeed always possible:

Theorem 1 (Stabilizer state NQS). All stabilizer states have an exact VMJ-NQS representation, and so require no more than $M = N - 1$ hidden units.
Proof. Similar to reference [60] our proof exploits the mapping of the check matrix $\mathcal{G}$ of a stabilizer state $|\Psi_S\rangle$ into its canonical form. Specifically, by using row additions modulo 2 we perform Gaussian elimination from the top-left downwards, tracking the signs $s$ and potentially swapping spins, to put the check matrix $\mathcal{G}$ into the form $\left(\begin{array}{cc|cc} \mathbb{1} & A & B & C \\ 0 & 0 & D & E \end{array}\right)$, where $r$ is the rank of $X$, and $A$ is an $r \times (N - r)$ matrix. Next, we perform Gaussian elimination from the bottom-right upwards for the last $N - r$ rows, again tracking the signs and doing any spin swaps necessary. Since all the rows of $\mathcal{G}$ were linearly independent originally the $(N - r) \times N$ submatrix $[D\ E]$ must be full rank so we obtain the canonical form $\left(\begin{array}{cc|cc} \mathbb{1} & A & B & 0 \\ 0 & 0 & D & \mathbb{1} \end{array}\right)$. At this point we perform a sequence of local Clifford gates to render this canonical form into that of a graph state check matrix as in equation (56). First, we perform $\hat H$ gates on the last $N - r$ spins giving $\mathcal{G}^{(3)} = [X^{(3)} \,|\, Z^{(3)}] = \left(\begin{array}{cc|cc} \mathbb{1} & 0 & B & A \\ 0 & \mathbb{1} & D & 0 \end{array}\right)$, making the new $X^{(3)}$ matrix full rank and resulting in the stabilizer state now having non-zero amplitudes on all basis states $|v\rangle$. The stabilizer commutativity condition $\mathcal{G}^{(3)} \Lambda \mathcal{G}^{(3){\rm T}} = 0$ must still hold, hence $B = B^{\rm T}$ and $A = D^{\rm T}$, meaning that the new $Z^{(3)}$ matrix is symmetric. Second, $Z^{(3)}$ is not yet an adjacency matrix due to $B$ potentially having non-zero diagonal elements. These elements correspond to a $\hat Y_a$ operator in the corresponding generator $\hat T_a$ so we apply $\hat S_a$ gates on any of the first $r$ spins where this is the case. This flips $\hat Y_a \to -\hat X_a$ in $\hat T_a$ while leaving any $\hat Z_a$'s in other generators unchanged. Finally, we fix the pattern of signs $s$ accumulated during these manipulations by applying $\hat Z_a$ for any generator which has $s_a = 1$, inducing $\hat X_a \to -\hat X_a$, flipping its sign, and overall making $s = (0, 0, \ldots, 0)$. We are left with a graph state check matrix of the form $[\mathbb{1} \,|\, Z_G]$ with $Z_G = \begin{pmatrix} \tilde B & A \\ A^{\rm T} & 0 \end{pmatrix}$, where $\tilde B$ is $B$ with its diagonal elements zeroed out. The matrix $Z_G$ is now a valid adjacency matrix for a graph $G$.
Having mapped $|\Psi_S\rangle$ to $|\Psi_G\rangle$ we now find the inverse for the sequence of local Clifford gates applied, giving the equivalence in equation (57). Crucially the bottom zero corner of $Z_G$ shows that the set of $N - r$ spins which had the non-diagonal $\hat H$ gates applied to them forms an independent set $I_H(G)$. The $\hat S$ gates applied to the first $r$ spins are diagonal, as are the $\hat Z$ gates. Consequently all gates applied are compatible with lemma 3 and can be trivially absorbed into the NQS generated by graph2nqs using $I_H(G)$ at the start of its ordered vertex cover.

Analytic examples
To finish this work we now present some illustrative examples of theorem 1 applied to stabilizer states arising from quantum error correction codes and to a topological ground state. It is useful to introduce some specialised graphical representations of the Clifford gates $\hat Z$, $\hat S^\dagger$ and $\hat H$ as (63), respectively, for the resulting NQS tensor network diagrams. We add to the 6 generators defining this code space the operator $\prod_{j=1}^{7} \hat Y_j$ so the unique stabilizer state is the superposition of logical qubit states $|\Psi_{\rm steane}\rangle = |0_L\rangle - i |1_L\rangle$. Applying the procedure outlined in the proof of theorem 1 results in the following transformation of the stabilizer generators represented in this table.
The bottom row denotes the local Clifford gates to be applied to each qubit after the corresponding stabilizer state is constructed. The final form corresponds to a graph state defined with vertex Clifford operators shown in figure 7(a). As expected the non-diagonal $\hat H$ gates are applied to the independent set $I_G = \{5, 6, 7\}$ in this graph. We can then form a vertex cover $C(G) = I_G \cup \{1, 3\}$ from which graph2nqs then constructs a VMJ-NQS with $M = 5$ hidden units, as shown in figure 7(b). In figures 7(c) and (d) we show some more examples of VMJ-NQS for stabilizer states arising from the five-qubit [[5,1,3]] code [90] and the nine-qubit Shor code [91].
The Hamiltonian $\hat H_{\rm toric}$, a sum of star and plaquette terms, governs spins on a $2L \times L$ lattice located at the bonds of a 2D $L \times L$ square lattice with periodic boundary conditions. Here $+$ denotes the set of spins forming a star $s$ around a vertex, while $\square$ denotes the set of spins around the perimeter of a plaquette $p$ of the square lattice, as depicted in figure 8(a). All the terms within $\hat H_{\rm toric}$ mutually commute and form a set of $2L^2$ generators in which any ground state is a simultaneous +1 eigenstate. However, since $\prod_{s \in +} \prod_{j \in s} \hat Z_j = \hat{\mathbb{1}}$ and $\prod_{p \in \square} \prod_{j \in p} \hat X_j = \hat{\mathbb{1}}$ this set contains only $2(L^2 - 1)$ independent generators and so defines a four-dimensional degenerate ground state code subspace.
To be a +1 eigenstate of a given star term $\prod_{j \in s} \hat Z_j$ requires that an even number of the 4 spins in $s$ are $|\downarrow\rangle$. Given that neighbouring star terms overlap by a single spin a configuration state $|v\rangle$ can be a simultaneous +1 eigenstate of all star terms $+$ so long as the pattern of $|\downarrow\rangle$ spins forms closed loops around the lattice, which we denote as $|l\rangle$. An example is shown in figure 8(b). Given a closed loop state $|l\rangle$ plaquette terms flip pairs of spins in the surrounding stars and so annihilate or create loops. We can form a +1 eigenstate of all plaquette terms by equally superposing all closed loop configurations reachable from the chosen $|l\rangle$. One of the simplest is the equal superposition of all possible closed loops, $|\Psi_{\rm toric}\rangle$. This stabilizer state is uniquely specified by adding to the generator set two Wilson operators which involve spins on any non-contractible paths $x$ and $y$ that cut the lattice through vertices along the $x$- and $y$-axis, respectively, examples of which are shown in figure 8(a). An NQS representation of $|\Psi_{\rm toric}\rangle$ can be constructed directly [37]. Using the qubit labelling its non-zero amplitudes occur for configurations $|q\rangle$ where $q$ simultaneously satisfies $N/2$ binary equations $A q^{\rm T} = 0$ with the $N/2 \times N$ coefficient matrix $A$ encoding $q_{s_1} \oplus q_{s_2} \oplus q_{s_3} \oplus q_{s_4} = 0$ for all sets of star spins $s_1, s_2, s_3, s_4$. Such constraints are easily enforced by a tensor network (67) commonly called the XOR tensor since its non-zero elements reflect the truth table of the 3 input bits and 1 output bit for a cascade of two XOR gates. The XOR tensor generalises straightforwardly for $n$ legs [79]. Given equation (67) is already in the form of a hidden unit with Hadamard coupling matrices, the NQS for $|\Psi_{\rm toric}\rangle$ is found by gluing an XOR tensor to the COPY tensors of spins in each star. This yields a direct NQS with $M = N/2$ hidden units, as shown in figure 9(a) for a $6 \times 3$ lattice. This representation elegantly reflects the translational invariance of the state, but as a consequence has no univalent visible unit COPY tensors, meaning it cannot emerge as a solution from the VMJ-NQS construction.

Figure 9. (a) The direct NQS for $|\Psi_{\rm toric}\rangle$ [37] in which hidden units align with the star set $+$ for the lattice.
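The statement that a hidden unit with Hadamard coupling matrices acts as an XOR (parity) constraint can be confirmed directly. This is a minimal sketch assuming unnormalised Hadamard-type couplings with entries $(-1)^{h q}$, so that the correlator equals $1 + (-1)^{q_1 + \cdots + q_n}$; the function name is hypothetical.

```python
from itertools import product

HADAMARD_TYPE = [[1, 1], [1, -1]]

def parity_hidden_unit(q_subset):
    """Hidden unit correlator for Hadamard-type couplings on the given spins."""
    total = 0
    for h in (0, 1):
        term = 1
        for q in q_subset:
            term *= HADAMARD_TYPE[h][q]
        total += term
    return total

# A star of 4 spins: the correlator vanishes unless an even number are down.
star = {q: parity_hidden_unit(q) for q in product((0, 1), repeat=4)}
allowed = [q for q, amp in star.items() if amp != 0]
```

Half of the sixteen star configurations survive, each with the same weight, which is the even-parity condition required of every star term.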
The VMJ-NQS construction applied to $|\Psi_{\rm toric}\rangle$ has some general features. It identifies a graph state equivalence in which $\hat H$ gates are applied to an independent set $I_H(G)$ that also forms a vertex cover. Consequently, the VMJ-NQS generated consists entirely of Hadamard coupling matrices, making each hidden unit an XOR tensor applied to receptive fields typically larger than 4 spins. The construction therefore reorganises the XOR constraints defining $|\Psi_{\rm toric}\rangle$ such that each constraint has one spin that is exclusive to it and is a member of $I_H(G)$. This is equivalent to performing Gaussian elimination on the coefficient matrix $A$, and additionally reveals that only $N/2 - 1$ constraints are actually needed, since one can always be removed from the direct NQS by simply adding all the others to it. A particular VMJ-NQS solution with $M = 8$ hidden units is shown in figure 9(b) for a $6 \times 3$ lattice. In this case it shares 6 hidden units with the direct NQS, but its final 2 hidden units have a coordination of 6. Given the considerable non-uniqueness of VMJ-NQS it is an interesting open question whether their structure can ever reflect the translational invariance of $|\Psi_{\rm toric}\rangle$, perhaps reduced over a larger unit cell.

Conclusion and discussion
We have shown that Jastrow states have an exact NQS representation with $M \sim O(N)$ hidden unit complexity that is substantially more efficient than the $M \sim O(N^2)$ previously conjectured. While this construction formally required diverging weights $w_{ij}$ it was seen to be robust to softening and a numerical example for the XXZ chain demonstrated that this new form of Jastrow NQS can emerge from an unbiased VMC optimisation. Focusing on the graph structure of Jastrow states, and using the tensor network formulation, we refined this result to show that Jastrow states possess an exact NQS representation requiring no more than $M = N - 1$ hidden units, namely the largest size of a vertex cover of a graph. Moreover, these NQS representations are guaranteed to possess at least one univalent visible unit. This presented an opportunity to generalise them to VMJ-NQS by applying arbitrary single-spin gates to those units. By exploiting graph states, which are a special case of spin Jastrow states, this simple modification enhanced the expressiveness of VMJ-NQS sufficiently to capture all stabilizer states with no increase in the hidden unit complexity. Ultimately this result is a direct implication of the fact that both Jastrow and stabilizer states can be generated by preparation circuits involving two-spin diagonal gates acting only on the visible units (and hence no ancilla) followed by non-diagonal single-spin unitary gates applied at independent vertices of the circuit's graph.
There is good reason to suspect that wider classes of quantum states can be encapsulated by VMJ-NQS and hence share their compactness. For stabilizer states the construction relied on graph states, a special subset of Jastrow states, and local Clifford gates, a special subset of single spin unitaries. Yet VMJ-NQS are defined for any Jastrow state over a graph $G$ and can absorb any single-spin gates, unitary or otherwise, at independent vertices of $G$, so very little of this generality was exploited.
A good place to start is with classes of quantum states closely related to graph and stabilizer states that are known to have efficient RBM representations [61]. This includes so-called hypergraph states [93] which generalise standard graph states. Their construction again starts with all spins initialised in $|+\rangle$ but these are now coupled via hyperedges where subsets of $p$ vertices $Z = \{j_1, j_2, \ldots, j_p\}$ have controlled$^{p-1}$-phase gates $\hat C^z_{j_1 j_2 \ldots j_p}$ applied.
However, for $p > 2$ this hidden unit cannot be rearranged to expose a perfect correlation with one visible unit. As such an NQS for a hypergraph state built from these correlators does not benefit from the merging of COPY tensors used for standard graph states that allowed all edges originating from a vertex to be collected at a single hidden unit. This suggests that hypergraph states require a hidden unit for each hyperedge, so that for a fixed $p$ we have a complexity $M \sim O(N^p)$. Another related rich class of states are XS-stabilizer states [94], which generalise stabilizer states to allow for non-abelian stabilizers by drawing operators from the group generated by the single spin operators $\{\sqrt{i}, \hat X, \hat S\}$. Similar to how stabilizer states generalise graph states, XS-stabilizer states generalise the simplest $p = 3$ hypergraph states by introducing nodal structure through parity constraints, giving them a formal hidden unit complexity of $M \sim O(N^3)$. It thus remains an open question whether compact NQS are a special case for Jastrow, graph and stabilizer states, or if they can also be found for hypergraph and XS-stabilizer states.
Further to this, the gauge symmetry reduction of tensor-network NQS outlined here has direct implications for the application of NQS to systems with higher on-site dimension. In particular, it points to the need to go beyond the conventional complex RBM formulation. In recent work [95] we have considered this for the case of spin-1 systems, and work in progress is examining it for bosonic systems [96], where Jastrow states are a widely used variational ansatz. In this context it is an interesting open question whether our algorithm for constructing NQS for stabilizer states can be straightforwardly generalised to qudits [97].
Finally, our work gives useful guidance about the structure of NQS beyond the exact states considered. For example, in numerical calculations using the complex RBM parameterisation we have found that allowing moderate-valued weights enables the optimisation to locate near-perfect visible-hidden correlations that can improve the accuracy, even when the exact state is not a Jastrow state. Consequently, if the values of the weights are heavily constrained during an optimisation then such compact solutions might be missed. We have also seen new patterns of receptive fields emerge, like a decreasing coordination pattern interpolating between system-extensive and sparse connectivity, which could be exploited in numerical calculations. Moreover, our results have demonstrated that even α = 1 NQS can exactly capture highly nontrivial quantum states that exhibit volume-scaling entanglement and topological order. These observations contribute to the perpetual balance within variational approaches between using a more specialised ansatz with fewer parameters but more bias, versus systematically increasing the parameters in an ansatz to improve expressiveness at the risk of complicating the energy landscape traversed by the optimisation.

Appendix A. Boltzmann-like parameterisation of coupling matrices
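Every 2 × 2 coupling matrix $C^{(ij)}$ can be parameterised locally in Boltzmann form as

$$C^{(ij)}_{h_i v_j} = \exp\left(c_{ij} + w_{ij} h_i v_j + \tilde{b}_{ij} h_i + \tilde{a}_{ij} v_j\right), \qquad (\mathrm{A.1})$$

for $h_i, v_j \in \{+1, -1\}$ in terms of a weight $w_{ij}$, partial biases $\tilde{a}_{ij}$ and $\tilde{b}_{ij}$, and a scale factor $c_{ij}$. These complex parameters are found from the coupling matrix elements by taking logarithms of (A.1) and solving the resulting linear system. Writing $C_{\pm\pm}$ for $C^{(ij)}_{h_i v_j}$ with $h_i, v_j = \pm 1$, this gives, up to the choice of branch of the logarithm,

$$c_{ij} = \tfrac{1}{4}\ln\left(C_{++}C_{+-}C_{-+}C_{--}\right), \qquad w_{ij} = \tfrac{1}{4}\ln\frac{C_{++}C_{--}}{C_{+-}C_{-+}},$$

$$\tilde{b}_{ij} = \tfrac{1}{4}\ln\frac{C_{++}C_{+-}}{C_{-+}C_{--}}, \qquad \tilde{a}_{ij} = \tfrac{1}{4}\ln\frac{C_{++}C_{-+}}{C_{+-}C_{--}}.$$

As discussed in section 3.2, the zeros appearing in coupling matrix elements can be handled numerically by softening $C^{(ij)}_{h_i v_j} \to \max\left(C^{(ij)}_{h_i v_j}, e^{-S}\right)$, where $S \approx 5$ to $10$, to avoid divergent parameters.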
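Using this decomposition we can immediately remove an overall irrelevant constant $e^{c_{ij}}$ from each coupling matrix, reducing the total number of complex parameters of the NQS tensor network to 3MN. The decomposition

$$e^{-c_{ij}}\, C^{(ij)} = \begin{pmatrix} e^{\tilde{b}_{ij}} & 0 \\ 0 & e^{-\tilde{b}_{ij}} \end{pmatrix} \begin{pmatrix} e^{w_{ij}} & e^{-w_{ij}} \\ e^{-w_{ij}} & e^{w_{ij}} \end{pmatrix} \begin{pmatrix} e^{\tilde{a}_{ij}} & 0 \\ 0 & e^{-\tilde{a}_{ij}} \end{pmatrix},$$

is represented diagrammatically as [diagram] (A.2)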
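As a numerical check of this parameterisation, the sketch below inverts (A.1) for a single 2 × 2 coupling matrix and rebuilds the matrix from the resulting parameters. It uses only the Python standard library. The helper names `boltzmann_params` and `rebuild` are hypothetical, the index convention (entry 0 for spin +1, entry 1 for spin −1) is an assumption, and the softening rule is one possible reading of the $\max(\cdot, e^{-S})$ prescription for complex entries: any entry whose modulus falls below $e^{-S}$ is replaced by $e^{-S}$.

```python
import cmath
import math
from itertools import product

SPINS = (+1, -1)  # assumed index order for both h_i and v_j


def boltzmann_params(C, S=None):
    """Return (c, w, b, a) such that C[h][v] = exp(c + w*h*v + b*h + a*v).

    C is a 2x2 nested sequence indexed as C[h][v] with index 0 <-> +1 and
    1 <-> -1. If S is given, entries with modulus below exp(-S) are replaced
    by exp(-S) (assumed softening rule). Without softening, an exactly zero
    entry raises ValueError because its logarithm diverges.
    """
    floor = None if S is None else math.exp(-S)
    L = [[0j, 0j], [0j, 0j]]
    for r, k in product(range(2), repeat=2):
        x = complex(C[r][k])
        if floor is not None and abs(x) < floor:
            x = complex(floor)
        L[r][k] = cmath.log(x)  # principal branch
    c = (L[0][0] + L[0][1] + L[1][0] + L[1][1]) / 4
    w = (L[0][0] - L[0][1] - L[1][0] + L[1][1]) / 4
    b = (L[0][0] + L[0][1] - L[1][0] - L[1][1]) / 4
    a = (L[0][0] - L[0][1] + L[1][0] - L[1][1]) / 4
    return c, w, b, a


def rebuild(c, w, b, a):
    """Evaluate the right-hand side of (A.1) for all four spin configurations."""
    return [[cmath.exp(c + w * h * v + b * h + a * v) for v in SPINS] for h in SPINS]


if __name__ == "__main__":
    C = [[1 + 0.5j, 0.3], [-0.2j, 2.0]]
    R = rebuild(*boltzmann_params(C))
    print(max(abs(R[r][k] - C[r][k]) for r in range(2) for k in range(2)))

    # A matrix with an exact zero, softened with S = 8.
    print(boltzmann_params([[1, 0], [1, 1]], S=8))
```

For the softened example only one entry is modified, so all four parameters have magnitude $S/4$, which illustrates why moderate values of $S$ keep the weights and biases finite and modest in size.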

Figure 1. A schematic of the relation between Jastrow and stabilizer states. Graph states lie within the intersection of these two distinct classes of states. By applying graph theoretic tools to Jastrow states we propose a new larger class of states called vertex modified Jastrow (VMJ) states in which arbitrary single-spin gates are applied to specific spins. Although this is a modest generalisation of Jastrow states, we show that it is sufficient to capture stabilizer states and provide a simple procedure for constructing compact NQS.

Figure 4. (a) The overlap O with the exact XXZ ground state for N = 20 spins as a function of the anisotropy Δ for $\psi_{\rm CFT}$ using $\alpha = \frac{1}{2\pi}\arccos(-\Delta)$. The line through the points is drawn to guide the eye, while the dashed horizontal line is the 99% overlap. (b) The deviation of the overlap 1 − O between the Δ = 0 exact Jastrow ground state and its softened NQS representation as a function of S. The different symbols represent increasing system sizes N = 6, . . ., 20, and the solid line is ∝ exp(−4S) for reference. The panel shows a zoom of the N dependence for S = 2, which is illustrative of all values of S examined. The data and scripts used to create these plots in MATLAB can be found in reference [75].

Figure 5. (a) The deviation in the overlap 1 − O of the numerical NQS with the exact N = 20 XXZ ground state for Δ = 0 versus hidden unit number M. The lines drawn between points are to guide the eye. (b) The weights $w_{ij}$ between hidden units i and visible units j for the M = N = 20 NQS Δ = 0 solution. The interactions have been rearranged in order of decreasing maximum coupling strength. (c) The plot of 1 − O for Δ = 1, where the XXZ ground state is not Jastrow. The dashed line is the value of 1 − O for the Jastrow CFT state. (d) The weights $w_{ij}$ for the M = N = 20 NQS Δ = 1 solution. The data and scripts used to create these plots in MATLAB can be found in reference [75].

Figure 6. (a) A fully connected graph $G_{\rm fc}$ for N = 4 spins. (b) A CPS tensor network corresponding to $G_{\rm fc}$ in (a), with the first edge matrix $J^{(1)}$ highlighted (and in each subsequent diagram as well, to illustrate where it moves to). (c) A rewiring of (b), using equations (27) and (25), into a circuit of diagonal two-spin operators. (d) Some possible rewirings of (b) and (c) into the geometry of an NQS tensor network. The examples drawn emphasise that any visible unit can be made univalent. All diagrams generalise straightforwardly for any N.
(a) Begin the construction of the tensor network by inserting a COPY tensor with an open leg for each vertex in G, illustrated here for a simple example: [diagram]
(b) Pick any ordered vertex cover set C(G), and for each vertex in it split its corresponding COPY tensor into two, with the first COPY tensor retaining the open leg, so it continues to represent the visible unit, and the second COPY tensor representing a perfectly correlated hidden unit: [diagram]
(c) Proceed through $C(G) = \{c_1, c_2, \dots\}$ in order. For vertex $c_j$ contract a separate order-2 tensor $J^{(i)}$ between the hidden COPY tensor corresponding to $c_j$ and each visible COPY tensor corresponding to the vertices $l \in N_j(G) \setminus \bigcup_{k=1}^{j-1} c_k$, that is, those neighbouring $c_j$ but with the previous members of C(G) removed: [diagram]
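The bookkeeping in steps (a) to (c) can be sketched in a few lines. The function below is a minimal illustration, not the authors' implementation: the name `graph_to_nqs`, the edge-list input format and the dictionary output are all assumptions. It returns, for each cover vertex in order, the visible units to which that vertex's hidden unit is coupled by an order-2 tensor; the perfect correlation between each hidden unit and its own visible unit from step (b) is implicit.

```python
def graph_to_nqs(edges, cover):
    """Assign every graph edge to one hidden unit, following steps (a)-(c).

    edges: iterable of (u, v) pairs with sortable vertex labels.
    cover: ordered list of vertices forming a vertex cover of the graph.
    Returns {cover vertex: sorted list of visible units its hidden unit couples to}.
    """
    edges = list(edges)
    cover_set = set(cover)
    if any(u not in cover_set and v not in cover_set for u, v in edges):
        raise ValueError("cover is not a vertex cover of the graph")

    # Neighbourhood of every vertex.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    connections = {}
    earlier = set()  # cover vertices already processed
    for c in cover:
        # Couple the hidden unit of c to its neighbours, excluding earlier
        # cover vertices, whose hidden units already carry those edges.
        connections[c] = sorted(neighbours.get(c, set()) - earlier)
        earlier.add(c)
    return connections


if __name__ == "__main__":
    # Fully connected graph on N = 4 vertices with cover {1, 2, 3}.
    k4 = [(u, v) for u in range(1, 5) for v in range(u + 1, 5)]
    print(graph_to_nqs(k4, [1, 2, 3]))
```

For the fully connected N = 4 example this yields M = 3 = N − 1 hidden units whose coordination decreases along the cover, and every edge is assigned exactly once because it is attached to whichever of its endpoints appears first in the ordered cover.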

Figure 7. (a) The graph defining a graph state locally Clifford equivalent to $|\Psi_{\rm steane}\rangle = |0_L\rangle - i|1_L\rangle$, with the operators shown for each vertex. The independent set $I_G = \{5, 6, 7\}$ is highlighted. (b) The VMJ-NQS constructed from the graph in (a) using the vertex cover completed with the addition of the vertices {1, 3}, which are also highlighted. The Clifford gates at each vertex can be trivially absorbed via lemma 3. (c) The VMJ-NQS for the logical qubit state $|\Psi_{[[5,1,3]]}\rangle = |0_L\rangle$ of the five-qubit [[5, 1, 3]] quantum code. (d) The VMJ-NQS for the state $|\Psi_{\rm shor}\rangle = |0_L\rangle + |1_L\rangle$ constructed from the nine-qubit Shor code. Notice that in this case the concatenated GHZ structure of the code is apparent.

Figure 8. (a) A 3 × 3 square lattice with vertices × forms a 3 × 6 lattice of spins (blue circles) located at the centre of bonds. Spins s involved in one star + surrounding a vertex are shown by the red dashed lines, while the spins p involved in one plaquette are shown as a blue shaded square. Possible non-contractible paths x,y for the Wilson loop operators $\hat{W}_{x,y}$ are also depicted as horizontal and vertical solid green lines. (b) A closed loop configuration state $|l\rangle$.
Some visible units have open legs, which indicate connections wrapping around the periodic boundaries, displayed here as connections to duplicate spins shown as grey circles. (b) The VMJ-NQS representation of $|\Psi_{\rm toric}\rangle$ found through the construction procedure outlined in the proof of theorem 1. The resulting graph state has $\hat{H}$ gates applied to the independent set $I_H(G) = \{1, 2, 3, 13, 14, 15, 17, 18\}$, whose vertices alone also form a vertex cover for graph2nqs.