Flexible resources for quantum metrology

Quantum metrology offers a quadratic advantage over classical approaches to parameter estimation problems by utilizing entanglement and nonclassicality. However, the hurdle of actually implementing the necessary quantum probe states and measurements, which vary drastically for different metrological scenarios, is usually not taken into account. We show that for a wide range of tasks in metrology, 2D cluster states (a particular family of states useful for measurement-based quantum computation) can serve as flexible resources that allow one to efficiently prepare any required state for sensing, and perform appropriate (entangled) measurements using only single qubit operations. Crucially, the overhead in the number of qubits is less than quadratic, thus preserving the quantum scaling advantage. This is ensured by using a compression to a logarithmically sized space that contains all relevant information for sensing. We specifically demonstrate how our method can be used to obtain optimal scaling for phase and frequency estimation in local estimation problems, as well as for the Bayesian equivalents with Gaussian priors of varying widths. Furthermore, we show that in the paradigmatic case of local phase estimation 1D cluster states are sufficient for optimal state preparation and measurement.


I. INTRODUCTION
Quantum metrology is positioned at the forefront of modern quantum sciences, spearheading the development of future quantum technologies. By utilizing the power of quantum mechanics to gain advantages over previously known techniques in practical tasks such as parameter estimation [1][2][3][4], state discrimination [5], or hypothesis testing [6], quantum-enhanced measurement procedures have already led to breakthrough discoveries [7,8]. Moreover, nonclassical effects can be harnessed to enhance the precision of determining quantities of interest, including magnetic fields [9,10], forces [11,12], phases [13,14], or frequencies [15][16][17]. For many different applications, the quantum advantage manifests as a quadratic scaling gap in terms of the relevant resources [18][19][20][21], e.g., the number of sensing systems, with respect to the best classical approaches. However, to achieve this so-called Heisenberg scaling, different tasks require different resource states as well as different (potentially non-local) measurements, which have to be separately determined for any specific case, rendering the design of a universally applicable, optimal sensing device difficult. Moreover, this still leaves open the important (and often ignored) question of how the desired states and measurements can be implemented efficiently.
Here we report on the design of a flexible device that allows one to obtain a quantum scaling advantage for a large class of different metrological problems by using only a specific entangled state and single-qubit operations. We show that a 2D cluster state [22,23], a particular entangled state associated with a rectangular lattice that can be prepared by commuting, nearest-neighbour interactions among qubits on the lattice, allows achieving Heisenberg scaling for an important group of paradigmatic metrology problems. This includes the sensing of local observables such as magnetic fields [9,10], as well as the estimation of phases [18,19], frequencies [15][16][17], and certain interaction strengths [24]. Crucially, we show that this can be done both in the local (frequentist) approach with arbitrarily many repetitions, and in the (single-shot) Bayesian approach for arbitrary cost functions and priors (see, e.g., Ref. [25]), including flat [19][20][21] and Gaussian priors with varying width [26,27]. The key difference between these estimation problems lies in the incorporation of a priori available knowledge about the estimated parameter. In local estimation, no quantification of prior knowledge is required in principle, but it is often assumed that fluctuations around a well-known value of the parameter are being estimated in order to make use of the quantum Fisher information (QFI) as a relevant figure of merit. In Bayesian estimation, the initial information is encoded in a prior probability distribution that is updated according to Bayes' law after each individual measurement.
* nicolai.friis@univie.ac.at
The optimal probe states for these different problems vary strongly, ranging from Greenberger-Horne-Zeilinger (GHZ) states in the case of local phase estimation, to certain superpositions of states with different Hamming weights (e.g., with sine-shaped profiles for the coefficients [19]) for Bayesian phase estimation (for flat priors). Moreover, the corresponding optimal measurements are also vastly different, including simple local measurements for GHZ states, but also complicated, entangled measurements on all qubits [28,29], e.g., discrete Fourier basis measurements for Bayesian estimation with flat priors [19]. In particular, some states and measurements may be significantly more difficult to realize than others. The 2D cluster state allows one to deal with all of these problems. On the one hand, the fact that it is a universal resource for measurement-based quantum computation (MBQC) [30,31] trivially enables arbitrary state preparation and measurements on a subset of the qubits in the cluster, provided the latter is large enough. On the other hand, MBQC provides a simple, unifying framework in which state preparation and measurements can be assigned an unambiguous resource cost in terms of the overall number of qubits in the cluster¹, as illustrated in Fig. 1. To guarantee a quantum scaling advantage for metrological applications, the probe preparation and measurements must be efficiently executable. That is, any metrological scaling advantage is lost if the size of the cluster required for a given estimation strategy with an $N$-qubit probe grows as $N^2$ or faster, in which case it becomes favourable to use all qubits in the cluster as individual, classical probes instead.
arXiv:1610.09999v2 [quant-ph] 30
We show that the preparation of optimal probe states and corresponding suitable measurements for local as well as Bayesian phase and frequency estimation can indeed be carried out efficiently using 2D cluster states. For the local scenario, we explicitly construct the preparation and measurement strategy achieving optimality. For the Bayesian scenario, we present a construction that can generate all optimal probe states with a linear overhead in $N$. We then introduce a compression procedure that can be implemented on a 2D cluster with $O(N \log^2 N)$ qubits, which enables one to efficiently perform measurements even when the circuit descriptions of the corresponding unitaries are of exponential size in the number of qubits of the compressed space. These constructions allow achieving Heisenberg scaling for phase and frequency estimation scenarios using the 2D cluster in a flexible manner. Crucially, this flexibility holds the potential for yielding (nearly) optimal scaling performance for a variety of estimation problems, and hence goes beyond the capabilities of architectures dedicated to specific individual tasks [33]. To further illustrate these general results, we discuss a particular choice of probe states and measurements that can be efficiently implemented in our framework, for which Heisenberg scaling can be achieved for Gaussian priors of varying widths.
This paper is structured as follows. In Section II we first discuss the basic structure of parameter estimation problems and the general form of all optimal probe states. We then argue that 2D cluster states provide flexible resources to achieve Heisenberg scaling in phase and frequency estimation problems by using an efficient compression to the subspace of the optimal probes. In Section III we then show how Heisenberg scaling can be achieved in Bayesian phase (and frequency) estimation, before demonstrating in Section IV how the necessary probe states can be prepared in a measurement-based architecture consisting of $O(N)$ qubits. In Section V we introduce the explicit construction of the efficient compression algorithm required for the measurements. Finally, we discuss our findings and their implications in Section VI, including generalizations to the estimation of quantities other than phases and frequencies.

II.A. Parameter Estimation Problems
In typical parameter estimation procedures, one wishes to determine an unknown parameter $\theta$ that is not directly measurable. To this end, a probe state described by a density operator $\rho_o$ is prepared, which undergoes a dynamical evolution governed by $\theta$, encoding the parameter in the resulting state $\rho(\theta)$. The evolution can in principle be an arbitrary quantum channel, but we are here mainly interested in pure states $\rho_o = |\psi\rangle\langle\psi|$ and unitary channels, where
$$\rho(\theta) = U_\theta\, \rho_o\, U_\theta^\dagger$$
for a unitary $U_\theta = \exp(-i\theta H)$ generated by the Hamiltonian² $H = H^\dagger$. For example, in phase (and frequency) estimation, one considers a local Hamiltonian for $N$ qubits, i.e.,
$$H = \sum_{i=1}^{N} H_i ,$$
and $H_i$ acts nontrivially only on the $i$-th qubit. Typically, one has
$$H_i = \tfrac{1}{2} Z_i ,$$
where $Z$ is the usual Pauli operator, but other local Hamiltonians can be brought to this form by local unitaries. After the encoding, a measurement of the probe state $\rho(\theta)$ is performed, which can be represented by a positive-operator valued measure (POVM), i.e., a set $\{E_m\}$ of positive semi-definite operators $E_m \geq 0$ satisfying $\sum_m E_m = \mathbb{1}$, where $\mathbb{1}$ is the identity operator. For an introduction to POVM measurements see, e.g., [34, pp. 90] or [35]. From the measurement outcomes, labelled $m$, an estimate of the parameter in question can be obtained. The precise nature of the estimator depends on the type of estimation scenario, distinguishing, for example, between the local and Bayesian estimation mentioned previously. All these scenarios have in common that the precision of the estimation [as quantified by some figure of merit, e.g., the mean-square error (MSE)] improves with the number $N$ of probe systems. For classical strategies based on product states, this increase is at most linear in $N$, which is referred to as the standard quantum limit (or shot noise scaling). However, using approaches based on the optimal quantum mechanical probes, the improvement in this figure of merit can be quadratic in $N$, i.e., achieving (optimal) Heisenberg scaling. For reviews of parameter estimation techniques and quantum metrology we direct the reader to, e.g., Refs. [1,3,4] or the Appendix.
² We work in units where $\hbar = 1$. In addition, we adopt the usual convention of Hamiltonian estimation where the eigenvalues of $H$ (and hence $\theta$) are taken to be dimensionless. For example, for frequency estimation one then has $\theta = \omega t$, where the time $t$ is assumed to be known precisely.
In local phase (and frequency) estimation one typically considers many repetitions of the same measurement that provide an estimate, whose variance one is interested in minimizing using the available resources. In this scenario, the optimal $N$-qubit probe state is a GHZ state and the accompanying optimal measurements are local $X$ measurements. This can be determined via the QFI, the relevant figure of merit for local estimation, as we explain in more detail in Appendix A.I. In Bayesian parameter estimation (see, e.g., Refs. [4,36] or Appendix A.II), the situation is somewhat different. Here one quantifies the initial knowledge (or belief) about the parameter by a prior probability distribution that is updated after each single measurement. In this case, a figure of merit is the average variance of the updated distribution. In the Bayesian estimation scenario, the optimal probes and measurements depend on the shape of the prior and the cost function used. For instance, for phase estimation with flat priors (i.e., no prior knowledge), the optimal probe state achieving Heisenberg scaling is given by
$$|\psi\rangle = \sum_{n=0}^{N} \psi_n\, |n\rangle , \tag{5}$$
where $|n\rangle$ are eigenstates of $H$ corresponding to its $N+1$ different eigenvalues, and the coefficients $\psi_n$ have a sinusoidal profile (see, e.g., Ref. [19]), i.e.,
$$\psi_n = \sqrt{\frac{2}{N+2}}\, \sin\!\left(\frac{(n+1)\pi}{N+2}\right). \tag{6}$$
Although different from the optimal measurement, we find that for the state in Eq. (5) a projective measurement in the basis obtained via the quantum Fourier transform (QFT) of the basis $\{|n\rangle\}$ allows for Heisenberg scaling for Bayesian phase and frequency estimation with Gaussian priors of varying widths, as we discuss in Section III, as well as in Appendices A.IV and A.V. The crucial observation required to extend the applicability of this approach to arbitrary priors (and cost functions) lies in noticing that in $N$-qubit phase (and frequency) estimation scenarios of any kind, $H$ only has $N+1$ different eigenvalues. For each of these values, only one representative eigenstate needs to be selected. Moreover, within the subspaces corresponding to fixed eigenvalues one may choose those eigenstates that can be prepared most efficiently. Instead of the typical Dicke states that are symmetric with respect to the exchange of the qubits, we therefore employ eigenstates corresponding to a unary encoding of $n$, i.e.,
$$|n\rangle_{\mathrm{un}} = |1\rangle^{\otimes n} \otimes |0\rangle^{\otimes (N-n)} .$$
All optimal probe states can hence be chosen to be of the form of Eq. (5) with $|n\rangle \equiv |n\rangle_{\mathrm{un}}$ for some choice of the coefficients $\psi_n$. Most importantly, all of these probe states have support in an $(N+1)$-dimensional subspace of the $2^N$-dimensional overall Hilbert space.
Therefore, the problem of optimal state preparation and measurements for $N$ qubits can be translated to that of $\lambda := \lceil \log(N+1) \rceil$ qubits (where the logarithm is understood to be to base 2), provided that one can efficiently and coherently convert the unary encoding $|n\rangle_{\mathrm{un}}$ to a binary encoding in $\lambda$ qubits. More precisely, one can initially prepare a state of $\lambda$ qubits and convert it (efficiently) to the desired $N$-qubit state for sensing (using at least $N - \lambda$ auxiliary qubits). After the parameter has been encoded, one performs the reverse procedure before carrying out the final measurement on $\lambda$ qubits. In Section V we present a quantum circuit of size $O(N \log^2 N)$ (and its MBQC representation) achieving exactly such a unary-to-binary compression. On the logarithmically small space of these $\lambda$ qubits the probe state preparation and measurement can then be carried out even with exponential overhead in $\lambda$ while maintaining Heisenberg scaling.
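As a small numerical illustration of the probe subspace discussed above, the following Python sketch tabulates the sinusoidal coefficients, the unary bit strings, and the size $\lambda$ of the compressed register. It assumes the sine profile $\psi_n = \sqrt{2/(N+2)}\,\sin[(n+1)\pi/(N+2)]$ of Ref. [19]; the function names are hypothetical and serve only this illustration.

```python
import numpy as np

def sine_coefficients(N):
    """Amplitudes psi_n, n = 0..N, with a sinusoidal profile (assumed form, cf. Ref. [19])."""
    n = np.arange(N + 1)
    return np.sqrt(2.0 / (N + 2)) * np.sin((n + 1) * np.pi / (N + 2))

def unary_bitstring(n, N):
    """Bit string of the unary-encoded basis state: n ones followed by N - n zeros."""
    return "1" * n + "0" * (N - n)

def compressed_register_size(N):
    """lambda = ceil(log2(N + 1)) for N >= 1, computed exactly with integer arithmetic."""
    return N.bit_length()

if __name__ == "__main__":
    N = 6
    psi = sine_coefficients(N)
    for n, amp in enumerate(psi):
        print(unary_bitstring(n, N), f"{amp:.4f}")
    print("squared norm:", np.sum(psi ** 2))
    print("lambda:", compressed_register_size(N))
```

Only $N+1$ of the $2^N$ computational basis states appear in the list, which is why a register of $\lambda$ qubits suffices to label them.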

II.B. Parameter Estimation in MBQC Architectures
The premise for taking advantage of the quadratic scaling gap in resources (here, the number of qubits) between the quantum strategy described in the previous section and the best classical strategy is that the required probe states and measurements can be implemented efficiently.
Here, we will take efficiency to mean that the overhead in the number of qubits used for the execution of the quantum strategy, including preparation and measurement, must grow less strongly than $N^2$. To illustrate this requirement, consider a situation where an array of qubits is provided and one is given the task of using the array most efficiently for the estimation of a parameter. For instance, an array of spins (which may otherwise be used for quantum computation or quantum simulation) could be exposed to a magnetic field with fixed direction but unknown strength for this purpose. If one has the ability to prepare arbitrary quantum states of these (spin) qubits, then one may initialize a GHZ state for local phase estimation, or the corresponding optimal state for Bayesian phase estimation (or any other estimation problem for that matter). However, as we have seen in the previous section, states and measurements that offer advantages for different metrological problems are in general quite distinct, and the conversion from one to the other may involve arbitrarily long sequences of entangling operations. The preparation and measurement hence come at a cost that we wish to quantify.
An approach that allows for preparing arbitrary quantum states and performing any measurements on them, while naturally including a resource count for these tasks, is MBQC. In this paradigm, introduced in Ref. [30], an array of qubits is initialized in a particular (entangled) quantum state, typically a so-called cluster state [22]. A cluster state is a type of graph state, i.e., it can be represented by a graph (a set of vertices $v_i$ and edges $e_{ij}$ connecting the vertices). Each vertex represents a qubit initialized in the state $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, and controlled phase gates CZ, given by
$$\mathrm{CZ} = |0\rangle\langle 0| \otimes \mathbb{1} + |1\rangle\langle 1| \otimes Z ,$$
are applied to each pair of qubits connected by an edge. For simplicity, we will here only consider 2D cluster states where the underlying graph is a regular, rectangular lattice, but in principle, other graph states [37] could also be considered for our purposes. By applying only single-qubit gates and carrying out local measurements on a subset of all qubits in a 2D cluster, arbitrary unitary operations can be implemented on the remaining qubits [31]. Performing a unitary transformation in the circuit model of quantum computation hence translates to a sequence of measurement angles for single-qubit measurements in the cluster. For a more detailed introduction to MBQC see Refs. [38,39], or Appendix A.VI.
In other words, a number of the initial qubits can be sacrificed to obtain a probe state of fewer qubits, which is more suitable for a given metrological task at hand. Note that using the unmodified cluster state as a probe state itself does not provide a scaling advantage with respect to classical strategies, i.e., its QFI is O(N ). Similarly, additional qubits can be used to implement arbitrary measurements by performing appropriate unitaries followed by computational basis measurements. Here, one needs to ensure that only the part of the cluster used to prepare the probe state is subjected to the transformation encoding the parameter. This can be achieved, e.g., by appropriately timed Pauli-X operations on the qubits used for the measurement at the middle and at the end of the interaction period. For spins this corresponds to the general practice of refocusing of the magnetisation, i.e., a spin echo.
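The scaling statement above can be checked numerically for small $N$. The sketch below is a minimal illustration under stated assumptions: it uses the standard pure-state expression $F_Q = 4\,\mathrm{Var}(H)$ for unitary encoding, fixes the generator to $H = \tfrac{1}{2}\sum_i Z_i$ without optimizing over local bases, and uses a 1D cluster state for simplicity. All function names are hypothetical.

```python
import numpy as np

def generator_diagonal(N):
    """Eigenvalues of H = (1/2) sum_i Z_i in the computational basis: N/2 - (number of ones)."""
    return np.array([N / 2 - bin(x).count("1") for x in range(2 ** N)], dtype=float)

def qfi_pure(state, N):
    """QFI of a pure state under U = exp(-i theta H), i.e. 4 Var(H); H is diagonal here."""
    h = generator_diagonal(N)
    p = np.abs(state) ** 2
    return 4.0 * (p @ h ** 2 - (p @ h) ** 2)

def product_plus(N):
    """Product state |+>^N (classical strategy)."""
    return np.full(2 ** N, 2.0 ** (-N / 2))

def ghz(N):
    """GHZ state (|0...0> + |1...1>)/sqrt(2)."""
    v = np.zeros(2 ** N)
    v[0] = v[-1] = 1 / np.sqrt(2)
    return v

def linear_cluster(N):
    """1D cluster state: |+>^N with CZ on neighbours, sign (-1)^(number of adjacent 11 pairs)."""
    signs = np.array([(-1.0) ** bin(x & (x >> 1)).count("1") for x in range(2 ** N)])
    return signs * 2.0 ** (-N / 2)

if __name__ == "__main__":
    for N in (2, 4, 6, 8):
        print(N, qfi_pure(product_plus(N), N), qfi_pure(linear_cluster(N), N), qfi_pure(ghz(N), N))
```

For this fixed generator the cluster state has the same outcome probabilities in the computational basis as the product state, so both give a QFI of $N$, whereas the GHZ state gives $N^2$.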
Crucially, the overall number of qubits required for the preparation and measurement of this $N$-qubit probe state must grow less than quadratically with $N$ to maintain a potential metrological scaling advantage. This is possible, for instance, for local phase estimation, where the optimal measurement strategy can be carried out with $2N - 1$ qubits in a 1D cluster state as shown in Fig. 2. As we will show in the following, such efficient constructions also exist for Bayesian phase (and frequency) estimation problems. In Section IV, we demonstrate that all probe states (including the optimal ones) of the form of Eq. (5) can be efficiently prepared from a 2D cluster state using only local operations. In Section V we then present the unary-to-binary compression requiring $O(N \log^2 N)$ qubits of the cluster to reduce the problem of implementing optimal measurements to the subspace of $\lambda := \lceil \log(N+1) \rceil$ qubits. On this subspace, projective measurements in any basis can be carried out efficiently, provided that the unitary transformation relating it to a computational-basis measurement requires no more than $O(2^{\lambda})$ (nearest-neighbour) gates. This is the case, for instance, for the QFT measurement, which performs optimally for flat priors [19] and achieves Heisenberg scaling for Gaussian priors of varying widths as we will show next.
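The 1D construction for local phase estimation can be reproduced with a small state-vector simulation. The following sketch assumes that every second qubit of a $(2N-1)$-qubit linear cluster is measured in the $X$ basis (measurement angle 0) and that the Pauli-$X$ corrections follow the parity rule stated in the caption of Fig. 2. The helper names are hypothetical, and the simulation projects onto chosen outcomes rather than sampling them.

```python
import itertools
import numpy as np

def linear_cluster(num):
    """1D cluster state on `num` qubits: uniform amplitudes with sign (-1)^(adjacent 11 pairs)."""
    signs = np.array([(-1.0) ** bin(x & (x >> 1)).count("1") for x in range(2 ** num)])
    return signs / np.sqrt(2.0 ** num)

def ghz(N):
    """Target state (|0...0> + |1...1>)/sqrt(2)."""
    v = np.zeros(2 ** N)
    v[0] = v[-1] = 1 / np.sqrt(2)
    return v

def ghz_from_cluster(N, outcomes):
    """Project the N - 1 intermediate qubits onto X eigenstates (outcome s = 0 for +, 1 for -)
    and apply the parity-dependent X corrections to the remaining N probe qubits."""
    num = 2 * N - 1
    psi = linear_cluster(num).reshape((2,) * num)      # axis q corresponds to cluster qubit q
    measured = list(range(1, num, 2))                  # 0-based positions of measured qubits
    # Contract from the right so that the remaining axis labels stay valid.
    for q, s in reversed(list(zip(measured, outcomes))):
        bra = np.array([1.0, (-1.0) ** s]) / np.sqrt(2)
        psi = np.tensordot(psi, bra, axes=([q], [0]))
    psi = psi / np.linalg.norm(psi)
    parity = 0
    for n, s in enumerate(outcomes, start=1):
        parity ^= s
        if parity:                                     # s_1 + ... + s_n is odd
            psi = np.flip(psi, axis=n)                 # Pauli-X on probe qubit n (0-based)
    return psi.reshape(-1)

if __name__ == "__main__":
    N = 4
    for outcomes in itertools.product([0, 1], repeat=N - 1):
        overlap = abs(ghz(N) @ ghz_from_cluster(N, outcomes))
        print(outcomes, round(overlap, 12))
```

For every outcome string the corrected state coincides with the GHZ state up to a global sign, using $2N-1$ cluster qubits in total.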

III. QUANTUM ADVANTAGE IN BAYESIAN ESTIMATION
We now briefly discuss the Bayesian phase estimation scenario, more details on which can be found in Appendix A.II, and show that the combination of sine states and QFT measurements can achieve Heisenberg scaling. In Bayesian parameter estimation, the initial knowledge about the parameter is encoded in a prior probability distribution $p(\theta)$. When a measurement with POVM elements $\{E_m\}$ is performed on the parameter-encoded state $\rho(\theta)$, the conditional probability of obtaining the outcome labelled $m$ is
$$p(m|\theta) = \mathrm{Tr}\big(E_m\, \rho(\theta)\big) .$$
To obtain the unconditional probability for the same outcome, these values are weighted according to one's prior belief, i.e.,
$$p(m) = \int d\theta\; p(\theta)\, p(m|\theta) .$$
Fig. 2. (a) In the preparation stage (green), the resource state, a four-qubit GHZ state, is created by measurements of three qubits of a 1D seven-qubit cluster state. Given the measurement outcome $s_n$ of the qubit labelled $2n$, the qubit $(2n+1)$ is corrected locally by a Pauli-$X$ operation if $\sum_{i=1}^{n} s_i$ is odd. After the local corrections, the encoding transformation $U_\theta$ is applied, imprinting the parameter that is to be estimated. In the final measurement stage (orange), the remaining qubits in the cluster are locally measured. In (b), the preparation and sensing stages are illustrated as MBQC measurement patterns in a graphical notation (see, e.g., Ref. [38]). Measured qubits are represented by circles inscribed with the corresponding measurement angle in the $x$-$y$ plane (here $\phi = 0$), while output qubits are indicated by diamonds. The connecting lines between qubits indicate the initial application of CZ gates, and all qubits are assumed to have been initialized in the state $|+\rangle$.
The information obtained in a measurement with outcome $m$ is then used to update this belief via Bayes' law, obtaining the posterior distribution $p(\theta|m)$ given by
$$p(\theta|m) = \frac{p(m|\theta)\, p(\theta)}{p(m)} .$$
In turn, the posterior distribution provides an estimate $\hat{\theta}(m)$ for the parameter via
$$\hat{\theta}(m) = \int d\theta\; p(\theta|m)\, \theta .$$
As a figure of merit for this estimation procedure one then quantifies the width of the posterior by a suitable measure $V^{(m)}_{\mathrm{post}}$ and averages over all possible outcomes, such that
$$\bar{V}_{\mathrm{post}} = \sum_m p(m)\, V^{(m)}_{\mathrm{post}} .$$
For instance, when the parameter in question has support over all of $\mathbb{R}$ (e.g., for frequency estimation, see Appendix A.V), one may use the MSE
$$V^{(m)}_{\mathrm{post}} = \int d\theta\; p(\theta|m)\, \big(\theta - \hat{\theta}(m)\big)^2 .$$
Here, we want to focus on phase estimation, i.e., the case where the parameter has support on the interval $[-\pi, \pi] \subset \mathbb{R}$. When the prior is appropriately narrow, one may still use the MSE, which allows the use of some simple techniques (e.g., a Bayesian version of the Cramér-Rao inequality, see Appendix A.II.2 and Ref. [25]) for the comparison with classical strategies. Nonetheless, wrapped distributions and covariant measures of their width are in general more suitable for phase estimation.
Fig. 3. Example for quantum strategy. The inverse of the average phase variance $\bar{V}_{\varphi,\mathrm{post}}$ of the posterior is shown for up to $N = 100$ qubits for the measurement strategy using probe states with coefficients as in Eq. (6) and QFT measurements. Although $N$ is an integer with $N \geq 1$, the curves have been plotted for continuous values of $N$ for the purpose of illustration. The prior is chosen to be a wrapped Gaussian with $\theta_o = 0$ and curves are shown for values of $\sigma$ from $\pi/8$ (blue) to $\pi$ (green) in steps of $\pi/8$. The curves, although difficult to tell apart visually, are distinct. Additional analysis of this measurement strategy using the MSE and comparisons with classical strategies can be found in Fig. …
As an example, one can consider the wrapped Gaussian distribution of the form
$$p(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{q=-\infty}^{\infty} \exp\!\left(-\frac{(\theta - \theta_o + 2\pi q)^2}{2\sigma^2}\right),$$
where $q \in \mathbb{Z}$, and the mean angle is
$$\theta_o = \arg \langle e^{i\theta} \rangle .$$
The non-negative parameter $\sigma$ can be identified with the circular standard deviation corresponding to the width of the underlying Gaussian distribution. However, for our purposes, it is more useful to quantify the width of this wrapped distribution by the Holevo phase variance [40] $V_{\varphi}$, given by
$$V_{\varphi} = \big| \langle e^{i\theta} \rangle \big|^{-2} - 1 .$$
Likewise, we will quantify the width of the posterior by
$$V^{(m)}_{\varphi,\mathrm{post}} = \left| \int d\theta\; p(\theta|m)\, e^{i\theta} \right|^{-2} - 1 .$$
For the probe states of Eq. (5) with the sinusoidal profile of Eq. (6), and the QFT measurement represented by the basis $\{|e_k\rangle\}$, where
$$|e_k\rangle = \frac{1}{\sqrt{N+1}} \sum_{n=0}^{N} e^{2\pi i\, nk/(N+1)}\, |n\rangle_{\mathrm{un}} ,$$
we then calculate the average phase variance $\bar{V}_{\varphi,\mathrm{post}} = \sum_m p(m)\, V^{(m)}_{\varphi,\mathrm{post}}$. The results for various values of $\sigma$ and for up to 100 qubits are shown in Fig. 3. The numerical results indicate that for all widths of the priors the example quantum strategy exhibits Heisenberg scaling. In Appendix A.IV we discuss the performance of this measurement strategy in more detail and give a comparison with the performance of classical strategies, which can be shown to exhibit shot noise scaling.
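The average Holevo variance of the posterior can be evaluated numerically along the lines described above. The sketch below is not the authors' code. It assumes that, up to a global phase, the encoding acts as $U_\theta |n\rangle_{\mathrm{un}} = e^{i n\theta} |n\rangle_{\mathrm{un}}$, that the QFT basis carries the phases $e^{2\pi i nk/(N+1)}$, and that integrals over $\theta$ are approximated on a uniform grid. The function names, the grid size and the truncation `q_max` of the wrapped sum are hypothetical choices.

```python
import numpy as np

def sine_coefficients(N):
    """Sinusoidal amplitudes psi_n, n = 0..N (assumed form, cf. Ref. [19])."""
    n = np.arange(N + 1)
    return np.sqrt(2.0 / (N + 2)) * np.sin((n + 1) * np.pi / (N + 2))

def wrapped_gaussian(theta, sigma, theta0=0.0, q_max=10):
    """Wrapped Gaussian prior on the circle, truncated to |q| <= q_max."""
    q = np.arange(-q_max, q_max + 1)[:, None]
    terms = np.exp(-(theta[None, :] - theta0 + 2 * np.pi * q) ** 2 / (2 * sigma ** 2))
    return terms.sum(axis=0) / (sigma * np.sqrt(2 * np.pi))

def average_posterior_holevo_variance(N, sigma, grid=4096):
    """Return (average Holevo variance of the posterior, outcome probabilities p(m))."""
    theta = np.linspace(-np.pi, np.pi, grid, endpoint=False)
    dtheta = 2 * np.pi / grid
    prior = wrapped_gaussian(theta, sigma)
    psi = sine_coefficients(N)
    n = np.arange(N + 1)
    encoded = psi[:, None] * np.exp(1j * np.outer(n, theta))               # amplitudes of U_theta|psi>
    qft_bra = np.exp(-2j * np.pi * np.outer(n, n) / (N + 1)) / np.sqrt(N + 1)  # row m is <e_m|
    likelihood = np.abs(qft_bra @ encoded) ** 2                            # p(m|theta)
    joint = likelihood * prior[None, :]                                    # p(m|theta) p(theta)
    p_m = joint.sum(axis=1) * dtheta                                       # p(m)
    moment = (joint * np.exp(1j * theta)[None, :]).sum(axis=1) * dtheta    # p(m) <e^{i theta}>_m
    sharpness = np.abs(moment) / p_m
    v_post = float(np.sum(p_m * (sharpness ** -2 - 1.0)))
    return v_post, p_m

if __name__ == "__main__":
    sigma = 1.0
    print("prior Holevo variance:", np.exp(sigma ** 2) - 1)
    for N in (2, 5, 10, 20):
        v_post, _ = average_posterior_holevo_variance(N, sigma)
        print(N, v_post, 1 / v_post)
```

The prior value printed first uses the closed form $V_{\varphi} = e^{\sigma^2} - 1$ that holds for a wrapped Gaussian, which gives a reference against which the posterior values can be compared.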

IV. EFFICIENT PREPARATION OF PROBE STATES
In this section we present a method that allows for the efficient preparation of the probe state of Eq. (5), which immediately generalizes to any state in the subspace of optimal probes spanned by $\{|n\rangle_{\mathrm{un}}\}_{n=0,\ldots,N}$. This method relies on the simple observation that in the bit string $(u_1 u_2 u_3 \ldots u_N)$ representing the state $|n\rangle_{\mathrm{un}}$, i.e.,
$$|n\rangle_{\mathrm{un}} = |u_1 u_2 u_3 \ldots u_N\rangle ,$$
where $u_k \in \{0, 1\}$ and $n = \sum_k u_k$, the $n$ entries $u_1, u_2, \ldots, u_n = 1$ are always to the left of the entries $u_{n+1}, \ldots, u_N = 0$. In other words, the $k$-th qubit can only be in the state $|1\rangle$ if all of the $k-1$ qubits before it are also in the state $|1\rangle$. Focussing on the sine state of Eq. (6) as an example, note that the coefficients are all real and positive. Initializing all qubits in the state $|0\rangle$, the circuit preparing the sine state can hence be chosen as a cascade of $N$ (controlled) single-qubit $Y$-rotations $CR_y(\varphi_i)$, whose angles $\{\varphi_i\}_{i=1,\ldots,N}$ determine the weights $\psi_n$, see Fig. 4. This becomes apparent when inspecting the single-qubit Pauli-$Y$ rotations
$$R_y(\varphi) = e^{-i\varphi Y/2} = \cos\frac{\varphi}{2}\, \mathbb{1} - i \sin\frac{\varphi}{2}\, Y .$$
The action of the circuit in Fig. 4 then transforms the $k$-th qubit to the state $\cos\frac{\varphi_k}{2} |0\rangle + \sin\frac{\varphi_k}{2} |1\rangle$ if the $(k-1)$-th qubit is in the state $|1\rangle$. Altogether, these $N$ rotations are parametrized by angles $\varphi_n \in [0, \pi]$, such that both the sine and the cosine in the above expression are non-negative. It is straightforward to verify that the output of the circuit is the state of Eq. (5) with amplitudes
$$\psi_n = \cos\frac{\varphi_{n+1}}{2} \prod_{k=1}^{n} \sin\frac{\varphi_k}{2} \quad (0 \leq n \leq N-1), \qquad \psi_N = \prod_{k=1}^{N} \sin\frac{\varphi_k}{2} . \tag{23}$$
Note that $\psi_0$ uniquely determines $\varphi_1$ and that each of the $\psi_n$ depends only on $\{\varphi_k\}_{k=1}^{n+1}$. This allows inverting Eq. (23) and expressing the angles $\varphi_n$ recursively as
$$\varphi_n = 2 \arccos\!\left( \psi_{n-1} \Big/ \prod_{k=1}^{n-1} \sin\frac{\varphi_k}{2} \right),$$
where the empty product for $n = 1$ equals 1, which allows reconstructing the rotation angles for any real, non-negative choice of $\{\psi_n\}$.
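The relation between rotation angles and amplitudes can be checked numerically. The following sketch assumes the cascade structure described above (one uncontrolled rotation followed by rotations controlled on the preceding qubit). It computes the angles with an `arctan2` expression that uses the normalization $\sum_n \psi_n^2 = 1$; this is an equivalent rewriting chosen here for numerical stability and not necessarily the form used by the authors. Function names are hypothetical.

```python
import numpy as np

def sine_coefficients(N):
    """Sinusoidal amplitudes psi_n, n = 0..N (assumed form, cf. Ref. [19])."""
    n = np.arange(N + 1)
    return np.sqrt(2.0 / (N + 2)) * np.sin((n + 1) * np.pi / (N + 2))

def angles_from_amplitudes(psi):
    """Rotation angles phi_1..phi_N for real, non-negative, normalized amplitudes psi_0..psi_N."""
    psi = np.asarray(psi, dtype=float)
    # tail[n] = sqrt(sum_{m >= n} psi_m^2), which equals prod_{k <= n} sin(phi_k / 2)
    tail = np.sqrt(np.cumsum(psi[::-1] ** 2)[::-1])
    # cos(phi_n / 2) = psi_{n-1} / tail[n-1] and sin(phi_n / 2) = tail[n] / tail[n-1]
    return 2.0 * np.arctan2(tail[1:], psi[:-1])

def amplitudes_from_angles(phis):
    """Output amplitudes of the rotation cascade applied to |0...0>."""
    phis = np.asarray(phis, dtype=float)
    sin_prod = np.concatenate(([1.0], np.cumprod(np.sin(phis / 2))))  # prod_{k <= n} sin(phi_k / 2)
    cos_next = np.concatenate((np.cos(phis / 2), [1.0]))              # cos(phi_{n+1} / 2); none for n = N
    return sin_prod * cos_next

if __name__ == "__main__":
    psi = sine_coefficients(5)
    phis = angles_from_amplitudes(psi)
    print("angles / pi:", phis / np.pi)
    print("max reconstruction error:", np.max(np.abs(amplitudes_from_angles(phis) - psi)))
```

Because `arctan2` is evaluated with two non-negative arguments, all angles automatically lie in $[0, \pi]$.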
Having found the circuit shown in Fig. 4, the only difficulty is to arrange the required measurements such that the overall preparation procedure can be embedded efficiently in a rectangular 2D structure, which is shown in Appendix A.VI.2. We hence arrive at the MBQC measurement pattern depicted in Fig. 5, which generates the sine state of Eq. (5) with weights as in Eq. (6). It requires a rectangular 2D cluster of (at most) $3 \times (4N - 2)$ qubits to prepare an $N$-qubit probe state. Crucially, the number of qubits in the cluster increases only linearly with the size of the probe. Moreover, any other probe state in the subspace spanned by the vectors $\{|n\rangle_{\mathrm{un}}\}_{n=0,\ldots,N}$ can be prepared with the same efficiency in a similar way by replacing the Pauli-$Y$ rotations by other single-qubit unitaries.
Next, we will show in Section V how a large class of useful measurements of the encoded probe states (including the QFT measurement) can be carried out efficiently.
Fig. 5. MBQC pattern for sine state. In (a) the measurement pattern for the preparation of the sine state of Eq. (5) is shown (in part). The measurement angles $\phi_i$ ($i = 1, 2, 3$) determine the angle $\varphi_1$ of the first rotation $R_y(\varphi_1)$ in Fig. 4, while the angles $\alpha_i$ and $\beta_i$ are chosen to realize $R_y(-\varphi_2/2)$ and $R_y(\varphi_2/2)$, respectively, which combine with the CZ gate of the cluster to realize the first controlled operation in Fig. 4. The initial Hadamard gates to switch the qubits initialized in $|+\rangle$ to $|0\rangle$ are also included in this measurement pattern. (b) shows the pattern as part of an initial 2D cluster. Assuming that each qubit in the cluster is initially connected to its nearest neighbours, the qubits indicated by isolated gray disks have to be disconnected from the remaining cluster by $Z$-measurements. The qubits indicated by (blue) diamonds represent the probe state qubits, which are subsequently exposed to the transformation $U_\theta$.

V. EFFICIENT UNARY-TO-BINARY COMPRESSION
Finally, we turn to the implementation of the measurements required to achieve Heisenberg scaling. In principle, the optimal measurement for a given prior and cost function may be an arbitrarily complicated measurement in an entangled basis of N -qubit states, for example, a projective measurement in the QFT basis (see, e.g., Ref. [34,Chapter 5] or [42,43]).
Fortunately, closer inspection reveals that we do not require arbitrary measurements on $N$ but only on $\lambda := \lceil \log(N+1) \rceil$ qubits, where the logarithm is understood to be to base 2. This is the case because all encoded information about the phase is stored within the $(N+1)$-dimensional subspace spanned by the vectors $\{|n\rangle_{\mathrm{un}}\}_{n=0,\ldots,N}$. All optimal measurements can hence be restricted to this subspace. To exploit this observation, we will now present an efficient algorithm that coherently compresses the information encoded in the probe state on the $2^N$-dimensional Hilbert space of $N$ qubits to the exponentially smaller space of $\lambda$ qubits.
The principle of operation of this $N$-step compression algorithm, shown in Fig. 6 (a), is to switch from the unary encoding of the number $n$ in the state $|n\rangle_{\mathrm{un}}$ to a binary encoding of the same number via a unitary transformation and extend the result to superpositions of different states $|n\rangle_{\mathrm{un}}$ by linearity. The unary-to-binary conversion is achieved by successive binary addition of each bit in the string $(u_1, u_2, \ldots, u_N)$ to the bit string of an auxiliary register of length $\lambda$ initially representing the number 0. The corresponding qubits are initialized in the state $|0\rangle^{\otimes \lambda}$. In the $k$-th step of the procedure, the bit $u_k$ is added to the binary representation $(b^{(k-1)}_{\lambda-1}, \ldots, b^{(k-1)}_{0})$ of the number $n^{(k-1)} = \sum_{i=1}^{k-1} u_i$. The binary addition of $u_k$ to the least significant digit $b^{(k-1)}_{0}$ of $n^{(k-1)}$ is performed by a half adder circuit, see Fig. 6 (a), in which a CNOT gate maps $b^{(k-1)}_{0}$ to $b^{(k)}_{0} = b^{(k-1)}_{0} \oplus u_k$, where $\oplus$ denotes addition modulo 2. The CNOT is preceded by a Toffoli gate whose target is an additional auxiliary qubit which stores the carry bit (see, e.g., Refs. [44][45][46] for quantum arithmetic operations). This carry bit is then added to the next binary digit $b^{(k-1)}_{1}$ by another half adder. The procedure carries on until reaching the final binary digit $b^{(k-1)}_{\lambda-1}$, where the half adder can be replaced by a simple CNOT gate, since the register size was chosen such that the final carry bit is always 0.
Fig. 6. (a) In each of the $N$ steps $A_k$, one of the unary bits is added to the bits of the binary representation by way of $\lambda$ half adder circuits. Each of the latter consists of a Toffoli gate writing the carry bit on one of the $\lambda - 1$ auxiliary qubits initialized in the state $|0\rangle$, and a CNOT gate carrying out the modulo-2 addition. The final half adder does not require its own auxiliary qubit or Toffoli gate, since the last carry bit always takes the value 0. After the binary addition, the carry bits and the respective unary register are uncomputed, i.e., coherently erased. For the carry bits this is achieved by Toffoli gates, while the register carrying the value $u_k$ is switched to 0 by a generalized Toffoli (a CNOT gate with multiple controls) conditioned on the binary encoding of the number $k$ (shown in $A_1$ for $k = 1$, where ○ and • indicate conditioning on the states $|0\rangle$ and $|1\rangle$, respectively). A final parallel application of nearest-neighbour swap gates arranges the auxiliary and binary register qubits appropriately for the application of the next step $A_{k+1}$. The circuit depth and size of each $A_k$ are $O(\lambda)$. In (b), the complete measurement pattern for Bayesian phase estimation in MBQC is shown, incorporating the preparation scheme (green) of Fig. 5 into the same 2D cluster as the measurement procedure. Note that for the parameter encoding, only the preparation part (green) should be exposed to the transformation, while the remaining cluster must be shielded or dynamically decoupled (see, e.g., Ref. [41]). Each of the $A_k$ circuits from (a) can be translated to a measurement pattern $A_k^{\mathrm{MBQC}}$ on $O(\lambda^2)$ qubits of the cluster, which are connected to the $k$-th output qubit of the preparation phase (blue disks). Black lines indicate "teleportation wires" of length $O(\lambda)$, i.e., that additional qubits have to be introduced to connect the correct input qubits (blue) to the corresponding parts $A_k^{\mathrm{MBQC}}$ of the cluster. After the unary-to-binary compression, measurements (e.g., the QFT) can be carried out efficiently on the logarithmically small subspace even if their MBQC implementation requires $O(2^{\lambda})$ qubits.
Subsequently, the qubits corresponding to the carry bits and $u_k$ have to be disentangled from the qubits carrying the binary encoding. For the carry bits, this is achieved by another cascade of Toffoli gates [see Fig. 6 (a)], since the carry bit can only have the value 1 if both of the previously added bits have the value 1 as well. To coherently erase $u_k$, note that the binary string $(b^{(k)}_{\lambda-1}, \ldots, b^{(k)}_0)$ encodes the number $k$ only if $u_k = 1$. We can hence flip the corresponding qubit conditioned on the binary encoding of $k$ using a generalized Toffoli gate. Using the already existing ancillas (which have previously been returned to the state $|0\rangle$), this multi-controlled CNOT gate can be realized in a standard construction using $\lambda-1$ nearest-neighbour (NN) SWAP gates, preceding and following an array of $2(\lambda-1)$ Toffolis on three adjacent qubits along with a single CNOT [34, p. 184]. Conditioning on states $|0\rangle$ rather than $|1\rangle$ requires at most $2\lambda$ additional single-qubit $X$ gates. Having disentangled all other qubits from the $\lambda$ qubits storing the binary encoding, we perform another $\lambda$ NN SWAPs in anticipation of inputting the next unary digit $u_{k+1}$.
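The action of a single step $A_k$ on computational basis states can be sketched classically. The following Python snippet is only an illustration under stated assumptions: the unary register is assumed to hold a string of the form 1...10...0 (so that $u_k = 1$ implies that all earlier unary bits were 1), the binary register is stored little-endian, and the carry ancillas are represented by a local variable rather than being uncomputed explicitly. The function names `add_unary_bit` and `compress` are hypothetical.

```python
from math import ceil, log2


def add_unary_bit(u_k, binary):
    """Add one bit u_k to a little-endian binary register (half-adder cascade)."""
    carry = u_k
    result = []
    for bit in binary:
        result.append(bit ^ carry)  # CNOT: modulo-2 addition
        carry = bit & carry         # Toffoli: carry for the next half adder
    # With ceil(log2(N + 1)) register bits the final carry is always 0.
    return result


def compress(unary):
    """Map a unary string 1...10...0 with n ones to the binary encoding of n."""
    n_qubits = len(unary)
    lam = ceil(log2(n_qubits + 1))
    unary = list(unary)
    binary = [0] * lam
    for k in range(1, n_qubits + 1):
        binary = add_unary_bit(unary[k - 1], binary)
        value = sum(bit << i for i, bit in enumerate(binary))
        # Generalized Toffoli: flip u_k if the binary register encodes k.
        if value == k:
            unary[k - 1] ^= 1
    return unary, binary
```

For an input such as `[1, 1, 0, 0, 0]`, the unary register is returned to all zeros and the binary register holds the number of ones, which mirrors the coherent erasure of $u_k$ described above.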
Taking into account that each Toffoli or NN SWAP gate can be realized with a constant overhead in NN CNOT and single-qubit gates, we find that the circuit for $A_k$ requires at most $O(\lambda)$ NN CNOT and single-qubit gates. The entire unary-to-binary compression algorithm consists of $N$ such elements, resulting in a circuit size of $O(N\log N)$ on an input of length $O(\log N)$, which can hence be realized with at most $O(N\log^2 N)$ qubits in MBQC, see Fig. 6. On the logarithmically sized (in $N$) output, any measurement can then be performed efficiently as long as the corresponding unitary on $\lambda := \lceil\log(N+1)\rceil$ qubits requires no more than $2^\lambda$ NN gates. While this does not cover all possible unitaries (e.g., the construction discussed in Ref. [34, p. 193] requires $O(\lambda^2 2^{2\lambda})$ two-qubit and single-qubit gates), some particularly useful unitaries may be much less costly. For instance, an implementation of the QFT on a $\lambda$-qubit linear nearest-neighbour architecture presented in Ref. [43] has circuit size $O(\lambda\log\lambda)$ and depth $O(\lambda)$, meaning an overhead of only $O(\lambda^2)$ qubits (depth times input length) in a measurement-based setting.
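To get a feeling for these scalings, one can tabulate the leading-order counts for concrete values of $N$. The sketch below is a rough illustration only: all constant prefactors hidden in the $O(\cdot)$ notation are set to 1 (an assumption), the logarithm is taken to base 2, and the function name `compression_overhead` is hypothetical.

```python
from math import ceil, log2


def compression_overhead(n_probes):
    """Leading-order resource counts for the compression (constants set to 1)."""
    lam = ceil(log2(n_probes + 1))       # size of the binary register
    circuit_size = n_probes * lam        # N steps A_k, each of size O(lambda)
    mbqc_qubits = n_probes * lam ** 2    # each A_k uses O(lambda^2) cluster qubits
    return lam, circuit_size, mbqc_qubits


if __name__ == "__main__":
    for n in (7, 63, 1023):
        lam, size, qubits = compression_overhead(n)
        print(n, lam, size, qubits, n ** 2)
```

Under these assumptions the MBQC qubit count stays below $N^2$ once $N$ is moderately large, in line with the sub-quadratic overhead stated above.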

VI. DISCUSSION
In summary, we have shown that 2D architectures for MBQC provide flexible resources for quantum-enhanced metrology tasks. That is, an initial array of qubits prepared in a 2D cluster state and local operations are used to achieve Heisenberg scaling for phase and frequency estimation in both the local (frequentist) and the Bayesian approach to parameter estimation. In the Bayesian scenario, the preparation procedure presented can be applied to execute strategies with optimal states for arbitrary priors and cost functions. This flexibility allows outperforming other approaches where a fixed probe state (e.g., an array of differently sized GHZ states) is used for different tasks without adaptation to the specific problem at hand. The efficient compression algorithm further allows one to perform measurements with up to exponential circuit sizes. This includes the QFT measurement that is optimal for flat priors, provides Heisenberg scaling for Gaussian priors of varying widths, and is expected to perform similarly well also for other priors under certain regularity conditions.
In principle, our results can be generalized also to scenarios beyond phase and frequency estimation. For all local Hamiltonians that are not proportional to $Z$, appropriate local corrections can be applied on the sensing qubits before and after the encoding such that the overall transformation commutes with the controlled phase gates used to create the cluster. For instance, when $H = \tfrac{1}{2}X$, Hadamard gates before and after $U_\theta$ produce an encoding transformation that commutes with CZ and can hence be applied after the entire cluster for sensing and measurements has been prepared. Moreover, when the corresponding states and measurements giving Heisenberg scaling are known, a similar method can also be employed for nonlocal interaction Hamiltonians, provided that they are proportional to a product of Pauli operators, or linear combinations of products of only one type of Pauli operators. For example, for parameter estimation with Ising-type couplings of the form $H = \sum_{i,j} c_{ij}\, X_i \otimes X_j$, GHZ states and local measurements achieve Heisenberg scaling [24], which can hence be efficiently implemented in our scheme. Nonetheless, many interesting questions regarding the applicability to general dynamics and scaling beyond the Heisenberg limit [47][48][49][50][51] remain.
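The Hadamard correction for $H = \tfrac{1}{2}X$ can be checked numerically on a single qubit. The following Python sketch uses plain nested lists for $2\times 2$ matrices. The helper names are hypothetical, and the check only covers the single-qubit identity; commutation with CZ then follows because the resulting operator is diagonal in the computational basis.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rotation_x(theta):
    """exp(-i theta X / 2) = cos(theta/2) 1 - i sin(theta/2) X."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def rotation_z(theta):
    """exp(-i theta Z / 2), diagonal in the computational basis."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


HADAMARD = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
            [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def conjugated_encoding(theta):
    """Hadamard, then the X-generated encoding, then Hadamard again."""
    return matmul(HADAMARD, matmul(rotation_x(theta), HADAMARD))


if __name__ == "__main__":
    print(conjugated_encoding(0.37))
    print(rotation_z(0.37))
```

Up to rounding, both printed matrices agree, i.e., the corrected encoding acts like the $Z$-generated phase encoding.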
Our results are of practical significance since they suggest that a single platform, 2D cluster states, can be flexible enough for a plethora of precision-enhanced parameter estimation tasks. In addition, this platform could in principle also be part of an integrated device, where a parameter estimation strategy is used to learn about, e.g., stray fields or the particular form of noise processes. For this purpose, part of the 2D cluster state can be used for sensing, while the remaining qubits are used to perform MBQC. The gathered information from the parameter estimation can then be used to improve the performance of the computation: By learning stray fields, one can compensate for systematic errors. By learning the particular shape of a noise process, one can adapt to an optimized error correction code, thereby reducing the overhead for fault-tolerant implementations.
At the same time, this connection between computational and metrological resources provides interesting insights. The advantage in metrology is provided by the entanglement of the cluster state, i.e., the CZ gates applied to neighbouring pairs of qubits, which ensures the improved performance with respect to an array of unentangled, individual qubits. However, it is known that metrological advantages can, but need not, arise solely from entanglement [52][53][54]. For example, nonclassicality in terms of squeezing can lead to Heisenberg scaling in precision [55,56] without any entanglement when the average energy is considered as the resource. This work hence also contributes to the discussion of the required physical resources for parameter estimation [57], and the relationship between computational power and metrology [58].
Finally, open questions remain regarding the role of noise [59][60][61], especially in connection with adaptive approaches to computation and error-correction involving metrology [62][63][64][65][66][67]. Noise is known to be problematic in the limit of infinitely many qubits, where it restricts the precision to a linear scaling, i.e., $I \leq \kappa N$ for some constant $\kappa$. Nevertheless, the approach presented here holds the promise of significantly outperforming classical strategies for finite system sizes. Indeed, this follows from the observation that the constant $\kappa$ strongly depends on the strength and type of the noise [60,64] and can be arbitrarily large if the noise is weak enough. Meanwhile, the overhead needed for preparation and measurement of the optimal state does not depend on the noise, leaving room for an arbitrarily large advantage of our scheme over classical strategies for any fixed $N$. In addition, techniques that deal with errors and maintain a metrological advantage are known (see, e.g., [62][63][64]) and may be applicable here. We leave such extensions for future work, along with the explicit determination of optimal [68,69] and "pretty good" states [70] for specific metrological tasks in our framework, where recent algorithmic approaches [71]

In this appendix, we give a detailed description of the local parameter estimation scenario and show how Heisenberg scaling can be achieved using a GHZ state and local measurements.

A.I.1. The Local Estimation Scenario
We consider a typical parameter estimation scenario, where $\theta$, the quantity of interest, is encoded in a density operator $\rho(\theta)$ by a dynamical (unitary) transformation
$$\rho(\theta) = U_\theta\,\rho\,U_\theta^\dagger. \tag{A.1}$$
We then perform a measurement with POVM elements $\{E_m\}$ which yields an outcome $m$. The (conditional) probability of obtaining the measurement outcome $m$ (given that the parameter has the value $\theta$) is then
$$p(m|\theta) = \mathrm{Tr}\big(E_m\,\rho(\theta)\big). \tag{A.2}$$
To each measurement outcome $m$, an estimator $\hat\theta(m)$ assigns a corresponding estimate for the value of $\theta$. The estimator is called unbiased if it assigns the value $\theta$ on average, that is, if the expected value of the estimator satisfies
$$\langle\hat\theta(m)\rangle = \sum_m p(m|\theta)\,\hat\theta(m) = \theta. \tag{A.3}$$
This requirement ensures the accuracy of the measurement procedure, but not its precision, which is determined by the variance $V[\hat\theta(m)]$ of the estimator. We use the mean-square error (MSE) given by
$$V[\hat\theta(m)] = \sum_m p(m|\theta)\,\big(\hat\theta(m)-\theta\big)^2,$$
and $\sigma = \sqrt{V[\hat\theta(m)]}$ is the associated standard deviation. Unfortunately, it is often the case that a given estimator offers high precision only within a small range of the parameter $\theta$, but not globally, as we shall discuss for a simple example in Appendix A.I.4. Such estimators are hence useful locally, i.e., for estimating small fluctuations of the parameter around some known value. In such local estimation scenarios, accuracy is guaranteed even when unbiasedness as specified in Eq. (A.3) is required to hold only in the vicinity of this value.
To increase the precision, the procedure consisting of preparation, encoding, and measurement may be repeated a number of times, say $\nu$, providing estimates $\theta^{(i)}$ ($i = 1, \ldots, \nu$), from which the mean value
$$\bar\theta_\nu = \frac{1}{\nu}\sum_{i=1}^{\nu}\theta^{(i)}$$
and the associated MSE can be calculated. As $\nu$ increases, the mean and variance computed from the measurement data converge to the expected value $\langle\hat\theta(m)\rangle$ of the estimates and the expected value of the corresponding variance, $V[\hat\theta(m)]$, respectively. Assuming that the results of the individual runs are independent and identically distributed (i.i.d.), the variance of the distribution of mean values with $\nu$ samples is inversely proportional to $\nu$. The overall expected precision associated to the result $\bar\theta_\nu$ is hence quantified by the standard error of the mean, given by
$$\sigma_{\bar\theta_\nu} = \frac{\sigma}{\sqrt{\nu}} = \sqrt{\frac{V[\hat\theta(m)]}{\nu}}.$$
In other words, the precision increases with the number of runs, but the options for choosing a probe state, measurement, and estimator still leave room for improvement.
It is here that measurement strategies using genuine quantum features such as entanglement and nonclassicality can provide advantages with respect to classical strategies. To determine the potential gain and to allow comparisons with the best classical protocol it is useful to eliminate the choice of estimator, and consider the important Cramér-Rao bound, before discussing an example estimation scenario in Appendix A.I.4.

A.I.2. The Cramér-Rao bound
For any unbiased estimator the variance $V[\hat\theta(m)]$ can be shown (see, e.g., Refs. [72][73][74] or Appendix A.I.3) to satisfy the Cramér-Rao (CR) inequality
$$V[\hat\theta(m)] \geq \frac{1}{I\big(\rho(\theta)\big)}, \tag{A.7}$$
where $I\big(\rho(\theta)\big)$ is the Fisher information (FI) given by
$$I\big(\rho(\theta)\big) = \sum_m p(m|\theta)\left(\frac{\partial}{\partial\theta}\log p(m|\theta)\right)^{2}. \tag{A.8}$$
Here it is noteworthy that, on the one hand, the FI does not depend on the choice of the estimator (as long as it is unbiased), and one can hence determine a lower bound for the variance based solely on the initial state and the chosen measurement. On the other hand, the FI typically depends on the value of the parameter and an unbiased estimator for which the CR inequality globally becomes an equality may not exist for all values. However, estimators can be found for which the bound is tight locally, and globally in the asymptotic limit of $\nu \to \infty$, see, e.g., Ref. [75]. One may then further ask what the optimal measurement strategy is for a given probe state and parameter encoding. The maximization of the FI over all possible POVMs then yields (see, e.g., [76]) the quantum Fisher information (QFI) $\mathcal{I}\big(\rho(\theta)\big)$, given by
$$\mathcal{I}\big(\rho(\theta)\big) = 4\,\mathrm{Tr}\big(\hat S_\theta^{2}\,\rho(\theta)\big),$$
where the operator $\hat S_\theta \equiv \hat S[\rho(\theta)]$, called the symmetric logarithmic derivative (SLD), is implicitly given by the relation
$$\hat S_\theta\,\rho(\theta) + \rho(\theta)\,\hat S_\theta = \dot\rho(\theta),$$
and where the dot indicates the partial derivative with respect to $\theta$, i.e., $\dot\rho = \frac{\partial}{\partial\theta}\rho$. The corresponding quantum Cramér-Rao bound is hence simply $V[\hat\theta(m)] \geq 1/\mathcal{I}\big(\rho(\theta)\big)$. The optimal measurement for which the FI and the QFI coincide is a projective measurement in the eigenbasis of the SLD $\hat S_\theta$ [76].
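If we additionally restrict to pure probe states $|\psi\rangle$ as before, the QFI takes the simple form (see, e.g., Ref. [4])
$$\mathcal{I}\big(|\psi_\theta\rangle\big) = 4\left(\langle\dot\psi_\theta|\dot\psi_\theta\rangle - \big|\langle\dot\psi_\theta|\psi_\theta\rangle\big|^{2}\right), \tag{A.12}$$
where $|\psi_\theta\rangle = U_\theta|\psi\rangle$ is the encoded state and the dot indicates a partial derivative with respect to $\theta$. Since $U_\theta = e^{-i\theta H}$, a simple computation then reveals that the QFI for such scenarios is proportional to the variance of the Hamiltonian generating the dynamics, i.e.,
$$\mathcal{I}\big(|\psi_\theta\rangle\big) = 4\left(\langle\psi|H^{2}|\psi\rangle - \langle\psi|H|\psi\rangle^{2}\right), \tag{A.13}$$
and the SLD coincides with $\dot\rho(\theta)$. The QFI is hence maximal for pure states that maximize the variance of $H$, see, e.g., Refs. [4,77].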
Let us now consider an estimation scenario where a probe state of $N$ qubits is subject to a local transformation, i.e., where the Hamiltonian is of the form $H = \sum_{i=1,\ldots,N} H_i$ and $H_i$ acts nontrivially only on the $i$th qubit. For simplicity, we assume that each qubit undergoes the same local transformation, $H_i = H_j \equiv H_1$ $\forall\, i, j$, and that the local Hamiltonian has eigenvalues $\pm\tfrac{1}{2}$ with the corresponding eigenstates denoted by $|0\rangle$ and $|1\rangle$. We may further align our reference frame such that $H_1 = S_z = \tfrac{1}{2}Z$, where $S_i$ is the spin-$\tfrac{1}{2}$ angular momentum operator for direction $i = x, y, z$ and $X, Y, Z$ denote the usual Pauli operators. A comment on estimation scenarios for other Hamiltonians can be found in Section VI of the main text, but here we are restricting our discussion to phase estimation scenarios where $U_\theta = \bigotimes_{n=1}^{N} U^{(n)}_\theta$, such that $U^{(n)}_\theta = \exp(-i\theta S^{(n)}_z)$ acts only on the $n$th qubit. For ease of notation, we will drop the superscript $(n)$ in the following when referring to single-qubit operations and there is no risk of confusion.
If the probe state is classical, i.e., a product state of the form $|\psi_1\rangle \otimes |\psi_2\rangle \otimes \ldots \otimes |\psi_N\rangle$, then the QFI becomes maximal when the local single-qubit probe states are all chosen to be $|+\rangle = \big(|0\rangle + |1\rangle\big)/\sqrt{2}$, maximizing the variance of $H_1 = S_z$. From Eq. (A.13) it then follows immediately that the largest possible value of the QFI for a classical probe of $N$ qubits is
$$\mathcal{I}\big(|+\rangle^{\otimes N}\big) = N. \tag{A.14}$$
The corresponding SLD is easily found to be $\hat S_\theta = \big(\cos\theta\, S_y - \sin\theta\, S_x\big)^{\otimes N}$, i.e., the optimal measurement is realized by single-qubit projective measurements in the basis $U_{\theta+\pi/2}|\pm\rangle$, where $|\pm\rangle$ are the eigenstates of $S_x = \tfrac{1}{2}X$.
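It hence becomes obvious that classical measurement strategies can (at most) decrease the variance in inverse proportion to the number of qubits, $V \propto 1/N$. This scaling behaviour is referred to as the standard quantum limit. As we shall discuss next, a different scaling behaviour can be achieved for quantum probes. Let us now revisit the local phase estimation scenario for an entangled state, for instance, the $N$-qubit GHZ state, given by
$$|\psi_{\mathrm{GHZ}}\rangle = \frac{1}{\sqrt{2}}\Big(|0\rangle^{\otimes N} + |1\rangle^{\otimes N}\Big).$$
A quick calculation of the QFI of Eq. (A.12) for this state provides the result $\mathcal{I}(|\psi_{\mathrm{GHZ}}\rangle) = N^{2}$. The precision may hence increase quadratically with the number of qubits. This optimal scaling behaviour is usually called the Heisenberg limit. To see how one can practically achieve Heisenberg scaling, let us consider a simple parity measurement, that is, a projective measurement with outcomes $m = +1$ (even) and $m = -1$ (odd), and associated POVM elements
$$E_{\mathrm{even}} = \sum_{n\ \mathrm{even}} E_n, \qquad E_{\mathrm{odd}} = \sum_{n\ \mathrm{odd}} E_n.$$
Denoting the single-qubit projectors as $P_\pm = |\pm\rangle\langle\pm|$, we can write
$$E_n = \sum_i \pi_i\Big(P_-^{\otimes n} \otimes P_+^{\otimes (N-n)}\Big),$$
where the sum is over all $\binom{N}{n}$ permutations $\pi_i$. One then straightforwardly finds
$$\mathrm{Tr}\big(E_n\,\rho(\theta)\big) = \frac{1}{2^{N}}\binom{N}{n}\Big[1 + (-1)^{n}\cos(N\theta)\Big],$$
such that $p(m = \pm 1|\theta) = \tfrac{1}{2}\big[1 \pm \cos(N\theta)\big]$. Using the definition in Eq. (A.8) one can then verify that this measurement is optimal, i.e., the FI and QFI coincide, $I(|\psi_{\mathrm{GHZ}}\rangle) = \mathcal{I}(|\psi_{\mathrm{GHZ}}\rangle) = N^{2}$. We then only need to find a suitable estimator. We can construct such an estimator from the expected value of the associated observable $M$, which has the spectral decomposition $M = E_{\mathrm{even}} - E_{\mathrm{odd}} = X^{\otimes N}$, such that $\langle M\rangle = \cos(N\theta)$. Crucially, note that the required measurements are just local $X$-measurements, the results of which are multiplied to obtain the overall measurement result in each run, i.e., $m = m_1 m_2 \ldots m_N$. For $\theta \in [0, \pi/N]$ we then assign the estimator
$$\hat\theta(m = +1) = 0, \qquad \hat\theta(m = -1) = \frac{\pi}{N}.$$
Computing the mean and variance for this estimator one finds
$$\langle\hat\theta(m)\rangle = \frac{\pi}{N}\sin^{2}\!\Big(\frac{N\theta}{2}\Big), \qquad V[\hat\theta(m)] = \frac{\pi^{2}}{4N^{2}}\sin^{2}(N\theta).$$
Apart from the endpoints of the interval, the estimator is only unbiased for $\theta = \pi/(2N)$, but in this case the variance admits Heisenberg scaling and takes the value $\pi^{2}/(4N^{2})$.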
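The two scalings quoted above can be reproduced with a few lines of Python. The sketch below is an illustration under stated assumptions: it evaluates the pure-state QFI as four times the variance of $H = \sum_i Z_i/2$ using only the probability of each excitation-number sector (which suffices because $H$ is diagonal), and it evaluates the classical FI of the parity measurement from the outcome probabilities $\tfrac{1}{2}[1 \pm \cos(N\theta)]$ with a finite-difference derivative. All function names are hypothetical.

```python
import math


def qfi_diagonal(n_qubits, sector_probs):
    """4 Var(H) for H = sum_i Z_i / 2, given probabilities per number of |1>."""
    mean = sum(p * (n_qubits - 2 * k) / 2 for k, p in sector_probs.items())
    second = sum(p * ((n_qubits - 2 * k) / 2) ** 2
                 for k, p in sector_probs.items())
    return 4 * (second - mean ** 2)


def product_plus(n_qubits):
    """Sector probabilities of |+>^N: binomial distribution."""
    return {k: math.comb(n_qubits, k) / 2 ** n_qubits
            for k in range(n_qubits + 1)}


def ghz(n_qubits):
    """Sector probabilities of the GHZ state: all zeros or all ones."""
    return {0: 0.5, n_qubits: 0.5}


def parity_probability(n_qubits, theta, sign):
    """Probability of parity outcome m = sign for the encoded GHZ state."""
    return 0.5 * (1 + sign * math.cos(n_qubits * theta))


def parity_fisher_information(n_qubits, theta, step=1e-6):
    """Classical FI of the parity measurement via central differences."""
    fi = 0.0
    for sign in (+1, -1):
        derivative = (parity_probability(n_qubits, theta + step, sign)
                      - parity_probability(n_qubits, theta - step, sign)) / (2 * step)
        fi += derivative ** 2 / parity_probability(n_qubits, theta, sign)
    return fi


if __name__ == "__main__":
    n = 6
    print(qfi_diagonal(n, product_plus(n)), qfi_diagonal(n, ghz(n)))
    print(parity_fisher_information(n, 0.1))
```

For the product state the QFI grows like $N$, for the GHZ state like $N^2$, and the FI of the parity measurement matches the latter away from the points where one outcome probability vanishes.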
However, one can do better than this by averaging over the outcomes before assigning an estimate, rather than averaging the individual estimates. Practically speaking, one can view this as estimating $m(\theta) = \cos(N\theta)$ followed by a simple reparametrization using $\theta(m) = \arccos(m)/N$. This estimator is unbiased by definition, since $m(\theta) = \langle M\rangle$, and one finds the variance
$$V[M] = \langle M^{2}\rangle - \langle M\rangle^{2} = 1 - \cos^{2}(N\theta) = \sin^{2}(N\theta).$$
Propagating the error through the reparameterization then yields
$$V[\hat\theta] = \frac{V[M]}{\big|\partial\langle M\rangle/\partial\theta\big|^{2}} = \frac{\sin^{2}(N\theta)}{N^{2}\sin^{2}(N\theta)} = \frac{1}{N^{2}}.$$
One can hence get a quadratic scaling advantage for local phase estimation using an $N$-qubit GHZ state and local measurements. By extension via error propagation, Heisenberg scaling is also maintained for frequency estimation by reparameterizing $\theta = \omega t$ for any fixed interrogation time $t$. As shown in Fig. 2 in the main text, the preparation of an $N$-qubit GHZ state can be realized using a $(2N-1)$-qubit 1D cluster state, which hence constitutes a resource for local phase and frequency estimation at the Heisenberg limit.

A.II. Bayesian Parameter Estimation
In Appendix A.II.1, we first review the basic structure of Bayesian parameter estimation problems. We then discuss an inequality that serves as a Bayesian analogue of the Cramér-Rao bound in Appendix A.II.2 and present a simple proof in Appendix A.II.3, before highlighting an interesting connection between Bayesian estimation and noisy local estimation in Appendix A.II.4. Finally, we investigate the limitations of the MSE cost function for Bayesian estimation in Appendix A.II.5.

A.II.1. The Bayesian Estimation Scenario
Much like in the local estimation scenario discussed in Appendix A.I, the Bayesian scenario considers the estimation of a parameter $\theta$ that has been encoded onto a quantum state $\rho(\theta)$ by performing a measurement given by some POVM $\{E_m\}$. As before, the conditional probability to obtain the outcome $m$ given that the parameter has the value $\theta$ is
$$p(m|\theta) = \mathrm{Tr}\big(E_m\,\rho(\theta)\big). \tag{A.33}$$
However, where the local estimation scenario requires only that the parameter be close to values for which an unbiased estimator is available, the Bayesian estimation scenario captures all previously held belief about $\theta$ in a probability distribution referred to as the prior $p(\theta)$. Performing a single measurement, the probability to obtain the outcome $m$ is then simply
$$p(m) = \int\! d\theta\; p(\theta)\, p(m|\theta) = \mathrm{Tr}\big(E_m\,\Gamma\big), \tag{A.34}$$
where we have defined the quantity
$$\Gamma = \int\! d\theta\; p(\theta)\,\rho(\theta) \tag{A.35}$$
(following the notation of Ref. [36]). Given some outcome $m$, we then want to provide an estimate $\hat\theta(m)$ for the value of the parameter. To this end, note that Bayes' law lets us determine $p(\theta|m)$, the probability that the parameter had the value $\theta$ given the outcome $m$, as
$$p(\theta|m) = \frac{p(m|\theta)\,p(\theta)}{p(m)}. \tag{A.36}$$
As an estimate we then simply average the possible values of $\theta$ weighted with the corresponding probabilities $p(\theta|m)$, i.e.,
$$\hat\theta(m) = \int\! d\theta\; p(\theta|m)\,\theta = \frac{\mathrm{Tr}\big(E_m\,\eta\big)}{\mathrm{Tr}\big(E_m\,\Gamma\big)}, \tag{A.37}$$
with
$$\eta = \int\! d\theta\; p(\theta)\,\theta\,\rho(\theta). \tag{A.38}$$
The precision of this strategy is quantified by the variance of the posterior averaged over all outcomes, $V_{\mathrm{post}} = \sum_m p(m)\int\! d\theta\; p(\theta|m)\,\big(\theta - \hat\theta(m)\big)^{2}$.

Here, a comment on the choice of $V_{\mathrm{post}}$ as a figure of merit for the average increase in the knowledge is in order. For parameters (and priors) that have support on the entirety of $\mathbb{R}$, the MSE is certainly a useful choice. However, when estimating parameters with bounded support other quantifiers of the width of the posterior may be more appropriate. For instance, for phase estimation one may consider the Holevo phase variance as discussed in Section III. We will nonetheless consider the MSE in the following. This has several reasons. First, the MSE can still be useful for phase estimation when the priors are suitably narrow (see Appendix A.II.5) and it allows us to establish some simple bounds (see Appendix A.II.2) for the optimal classical estimation strategies, as we shall explain in Appendix A.III.2. Second, the MSE is of course useful for frequency estimation problems (see Appendix A.V), where the parameter range is not bounded. We hence allow the parameter to take values $\theta \in (-\infty, \infty)$ for the remainder of this work.
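As a simple example, consider a Gaussian prior of width $\sigma > 0$ centered at $\theta = \theta_o$, that is,
$$p(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\theta-\theta_o)^{2}}{2\sigma^{2}}\right). \tag{A.42}$$
For this prior, $V_{\mathrm{post}}$ splits into a contribution fixed by the prior alone, while the remaining term determines the average decrease in width of the posterior with respect to the prior.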

A.II.2. A Bayesian Cramér-Rao bound
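The average variance of the posterior can be bounded from below using the van Trees inequality (see, e.g., Refs. [74,78]),
$$V_{\mathrm{post}} \geq \frac{1}{\bar I\big(\rho(\theta)\big) + I\big(p(\theta)\big)}, \tag{A.45}$$
where
$$I\big(p(\theta)\big) = \int\! d\theta\; p(\theta)\left(\frac{\partial}{\partial\theta}\log p(\theta)\right)^{2}$$
is the classical Fisher information of the prior and
$$\bar I\big(\rho(\theta)\big) = \int\! d\theta\; p(\theta)\, I\big(\rho(\theta)\big)$$
is the averaged (over the unknown parameter $\theta$) FI associated to the state $\rho(\theta)$ and the POVM $\{E_m\}$ as specified in Eq. (A.8).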
Since the QFI $\mathcal{I}\big(\rho(\theta)\big)$ arises as a maximization of the FI $I\big(\rho(\theta)\big)$ over all POVMs, we have $\mathcal{I}\big(\rho(\theta)\big) \geq I\big(\rho(\theta)\big)$. If, as before for the local case, we consider the parameter to be encoded by a unitary transformation of the form of Eq. (A.1), the QFI is independent of $\theta$, as we have shown in Appendix A.I.2. This allows us to bound the average FI by the QFI, i.e.,
$$\bar I\big(\rho(\theta)\big) = \int\! d\theta\; p(\theta)\, I\big(\rho(\theta)\big) \leq \int\! d\theta\; p(\theta)\, \mathcal{I}\big(\rho(\theta)\big) = \mathcal{I}\big(\rho(\theta)\big),$$
and consequently we can modify the van Trees inequality to
$$V_{\mathrm{post}} \geq \frac{1}{\mathcal{I}\big(\rho(\theta)\big) + I\big(p(\theta)\big)}. \tag{A.49}$$
In contrast to the (quantum) Cramér-Rao inequality (A.7), the bounds in (A.45) and (A.49) are generally not tight, so they do not allow us to conclude that a measurement strategy exists such that $1/V_{\mathrm{post}}$ grows quadratically with $N$. And while it can indeed be shown [25] that Heisenberg scaling is asymptotically achievable for arbitrary priors in the Bayesian regime, we require an explicit description of the involved states and measurements to determine whether these can be efficiently implemented. Nonetheless, a simple consequence of the van Trees inequality pertains to the classical scaling behaviour. Recall from Eq. (A.14) that the maximal value of the QFI for product states is proportional to $N$. This implies that $V_{\mathrm{post}}$ decreases at most linearly with $N$ for classical strategies, i.e., $1/V_{\mathrm{post}} \leq N + I\big(p(\theta)\big)$, where $I\big(p(\theta)\big)$ is a constant independent of $N$. For instance, for the Gaussian prior of Eq. (A.42), which we want to focus on in the following, we have $I\big(p(\theta)\big) = 1/\sigma^{2}$.
At this point, two comments on the choice of Gaussian priors are in order. First, note that there exists an interesting connection between Bayesian estimation with Gaussian priors and local estimation subject to parallel, Gaussian noise [26]. As is outlined in Appendix A.II.4, this connection provides an alternative way of computing the variance V post via the (quantum) Fisher information of the probe state after a noisy channel. Here, we do not explicitly consider the problem of noisy metrology in more detail, but we refer the interested reader to Refs. [59,60].
The second comment concerns the fact that the probability distribution of Eq. (A.42) has support on the entire real line, whereas for phase estimation, θ only takes values in an interval of length 2π. In addition, the use of the MSE means that differences between estimates and parameter values larger than π are disproportionately penalized. Intuitively it is clear that this becomes an issue when the width of the Gaussian prior becomes comparable with (half of) the length of the interval for θ. In Appendix A.II.5 this problem is discussed in more detail.
For sufficiently narrow priors the MSE is hence still a useful cost function for the variance and (non-wrapped) Gaussians can be employed instead of the more complicated wrapped Gaussians to simplify calculations. Moreover, the use of the MSE (rather than some circular statistics equivalent or covariant cost function, cf. Ref. [4]) as a measure for the precision of the estimate allows us to remain within the framework of Ref. [36]. It also permits us to apply the Bayesian Cramér-Rao bound of Ineq. (A.49), which provides a straightforward comparison with classical strategies, as we shall discuss in Appendix A.III.2. Finally, note that these considerations arise for the phase estimation problem discussed in this section, but are no cause for concern in the frequency estimation paradigm, which is presented in Appendix A.V.

A.II.3. A Proof of the Bayesian Cramér-Rao Bound
We now want to present an explicit proof that the average variance $V_{\mathrm{post}}$ of the posterior $p(\theta|m)$ can be bounded from below by the van Trees inequality [78], which is the Bayesian equivalent of the Cramér-Rao bound, given by
$$V_{\mathrm{post}} \geq \frac{1}{\bar I\big(\rho(\theta)\big) + I\big(p(\theta)\big)}, \tag{A.50}$$
where $I\big(p(\theta)\big)$ is the classical Fisher information of the prior, given by
$$I\big(p(\theta)\big) = \int\! d\theta\; p(\theta)\left(\frac{\partial}{\partial\theta}\log p(\theta)\right)^{2},$$
and $\bar I\big(\rho(\theta)\big) = \bar I\big(\rho(\theta), \{E_m\}\big)$ is the Fisher information associated to the state $\rho(\theta)$ and the POVM $\{E_m\}$, averaged over the (unknown) parameter $\theta$. That is, it is given by
$$\bar I\big(\rho(\theta)\big) = \int\! d\theta\; p(\theta)\, I\big(\rho(\theta)\big).$$
In the frequency estimation scenario, the parameter $\theta$ is typically allowed to take on any value in $\mathbb{R}$, but the prior is assumed to have compact support, such that $p(\pm\infty) = 0$. In the phase estimation scenario, on the other hand, the parameter can take values in the interval $[a, a+2\pi]$ for some $a \in \mathbb{R}$ and w.l.o.g. one may pick $a = 0$. In this case, one may assume that the probability densities are wrapped, e.g., the prior satisfies $p(\theta) = p(\theta \bmod 2\pi)$ and $\theta$ is to be understood as $\theta \bmod 2\pi$. Alternatively, one can also treat $\theta$ as an arbitrary real number, and require that the prior be sufficiently narrow. In the latter scenario, one can still use the MSE approach for the variance, but care needs to be taken with the initial width of the prior, as discussed in Appendix A.II.5. With this in mind, we now discuss a proof of Eq. (A.50). First, note that

A.II.4. Relating Noisy Local Estimation with Bayesian Estimation for Gaussian Priors
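In this appendix we discuss an interesting connection between noisy local estimation and Bayesian estimation for Gaussian priors. We hence consider a local estimation scenario as in Section A.I.1, where "parallel" noise is present on top of the unitary encoding of Eq. (A.1). That is, the noise is generated by the same Hamiltonian as the encoding of the parameter but distributed according to some probability distribution $\tilde p(\theta)$. The state encoding the parameter is then given by
$$\tilde\rho(\theta) = U_\theta\,\tilde\rho\,U_\theta^\dagger,$$
where the noise can be understood as part of preparing the initial state
$$\tilde\rho = \int\! d\theta'\; \tilde p(\theta')\, U_{\theta'}|\psi\rangle\langle\psi|U_{\theta'}^\dagger$$
starting from some pure state $|\psi\rangle$. We further assume that the noise has a Gaussian profile centered around zero, that is, the noise distribution is
$$\tilde p(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\theta^{2}}{2\sigma^{2}}\right) = p(\theta + \theta_o),$$
where $\theta_o$ is the mean of the Gaussian prior $p(\theta)$ of Eq. (A.42). We can now see how the encoded state of this noisy local scenario corresponds to the quantity $\Gamma$ from Eq. (A.35) in the Bayesian scenario, i.e.,
$$\Gamma = \int\! d\theta\; p(\theta)\,\rho(\theta) = \int\! d\theta'\; \tilde p(\theta')\, U_{\theta_o+\theta'}|\psi\rangle\langle\psi|U_{\theta_o+\theta'}^\dagger = \tilde\rho(\theta_o),$$
where we have substituted $\theta' = \theta - \theta_o$. To establish a similar connection for $\eta$ from Eq. (A.38), we make use of the fact that the prior (and the noise distribution in the local scenario) are Gaussian, such that
$$\theta\, p(\theta) = \theta_o\, p(\theta) - \sigma^{2}\,\frac{\partial p(\theta)}{\partial\theta}.$$
With this, we find
$$\eta = \theta_o\,\Gamma + \sigma^{2}\,\dot{\tilde\rho}(\theta_o),$$
where the dot indicates a partial derivative w.r.t. the parameter. For the optimal measurement, the average variance of the posterior can then be expressed via the QFI of the noisy state as $V_{\mathrm{post}} = \sigma^{2} - \sigma^{4}\,\mathcal{I}(\tilde\rho)$. This result immediately informs us about an important property of the Bayesian scenario. Since $\mathcal{I}(\tilde\rho)$ is the QFI in a scenario with parallel noise that can be viewed as dephasing, one cannot expect Heisenberg scaling of $\mathcal{I}(\tilde\rho)$, i.e., that $\mathcal{I}(\tilde\rho)$ increases quadratically with $N$, see Refs. [59,60]. Instead, it is clear that $\mathcal{I}(\tilde\rho) \leq 1/\sigma^{2}$ since $V_{\mathrm{post}} \geq 0$. On the other hand, one expects that $\mathcal{I}(\tilde\rho)$ approaches the bound $1/\sigma^{2}$ from below as $N$ increases. As suggested in Ref. [61], it is reasonable to assume that
$$\mathcal{I}(\tilde\rho) = \frac{1}{\sigma^{2}} - \frac{K}{N^{\alpha}}$$
for some positive constant $K$ and some power $\alpha \geq 1$, such that $V_{\mathrm{post}} = K\sigma^{4}/N^{\alpha}$. Therefore, a scaling advantage of a quantum strategy with respect to a classical strategy is obtained if one finds an (efficiently preparable) state and POVM such that $\alpha > 1$.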

A.II.5. Limitations of the MSE Approach
Here, we aim to discuss the limitations of applicability of the mean square error (MSE) cost function for Bayesian phase estimation, i.e., for a scenario where the parameter $\theta$ is encoded by a unitary $U_\theta = e^{-i\theta H}$, with $H = \tfrac{1}{2}Z$ for each qubit. Since the difference between the two eigenvalues of $H$ is 1, it is immediately apparent that values of $\theta$ that differ by $2\pi$ cannot be distinguished in such a scenario. This periodicity is not accurately reflected in the use of the MSE, since estimates that differ by integer multiples of $2\pi$ are unduly penalized. In a local estimation scenario where small fluctuations around a fixed value of the parameter are being estimated, this is not an issue. Similarly, this is of no concern for Bayesian estimation when the prior is sufficiently localized, but can become an issue for larger values of $\sigma$ [where we focus on Gaussian priors as in Eq. (A.42)]. We are therefore interested in quantifying for which values of $\sigma$ the approach using the MSE cost function becomes problematic.
We will take a pragmatic point of view and consider the MSE approach as useful if this post-processing of the measurement data provides an increase in knowledge in the sense of an average decrease of the width of the posterior $p(\theta|m)$. We therefore ask what the minimal MSE of the posterior can be in principle, given a fixed Gaussian prior of width $\sigma$. When obtaining a measurement outcome $m$, the corresponding estimate may in principle only be understood modulo $2\pi$. In other words, if no prior knowledge is available, and one were to trust the estimate of the parameter unconditionally, the posterior would be a "comb" of Dirac delta functions $\delta(\theta - 2\pi k)$ for all values $k$ such that $\theta - 2\pi k$ lies within the allowed range of parameters. For an unrestricted range, $\theta \in \mathbb{R}$, we hence have infinitely many side-peaks at distances $2\pi k$ for $k \in \mathbb{Z}$. If we take into account the prior information, some of these peaks are suppressed by its shape, e.g., as $\exp\!\big(-\theta^{2}/(2\sigma^{2})\big)$ for a Gaussian prior. The optimally reachable posterior is then given by
$$p_{\mathrm{opt}}(\theta) = \frac{1}{\mathcal{N}}\sum_{k\in\mathbb{Z}} e^{-\theta^{2}/(2\sigma^{2})}\,\delta(\theta - 2\pi k),$$
where the normalization is given by
$$\mathcal{N} = \sum_{k\in\mathbb{Z}} e^{-2\pi^{2}k^{2}/\sigma^{2}}.$$
As illustrated in Fig. A.1, the MSE of this optimal posterior strongly increases from around $\sigma \approx \pi/2$, and from around $5\pi/4$ the width stays virtually constant as compared to the MSE of the prior. Of course this does not mean that the measurement process does not provide information about the parameter. Clearly, knowing the value of $\theta$ exactly modulo $2\pi$ is more useful than a uniform prior. However, the MSE simply fails to capture this distinction. We hence have to keep this limited applicability of the approach using (non-wrapped, Gaussian) priors and the MSE cost function in mind. Specifically, we restrict our analysis to Gaussian priors of widths smaller than or equal to 1.
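The qualitative behaviour described here can be reproduced numerically. The following Python sketch rests on an assumption about the form of the optimal posterior: a comb of delta peaks at $\theta = 2\pi k$, weighted by a zero-centered Gaussian prior of width $\sigma$. It computes the MSE of that comb relative to the prior variance $\sigma^2$. The function names and the cutoff `k_max` are hypothetical choices for illustration.

```python
import math


def comb_posterior_mse(sigma, k_max=50):
    """MSE of a comb of peaks at 2*pi*k weighted by a Gaussian of width sigma."""
    weights = {k: math.exp(-(2 * math.pi * k) ** 2 / (2 * sigma ** 2))
               for k in range(-k_max, k_max + 1)}
    norm = sum(weights.values())
    # The comb is symmetric around zero, so the MSE equals the second moment.
    return sum(w * (2 * math.pi * k) ** 2 for k, w in weights.items()) / norm


def mse_ratio(sigma):
    """MSE of the comb posterior relative to the variance of the prior."""
    return comb_posterior_mse(sigma) / sigma ** 2


if __name__ == "__main__":
    for sigma in (1.0, math.pi / 2, 5 * math.pi / 4):
        print(sigma, mse_ratio(sigma))
```

Under this assumption the ratio is negligible for $\sigma \leq 1$, starts to grow around $\sigma \approx \pi/2$, and is close to 1 around $\sigma \approx 5\pi/4$, where the MSE no longer registers any gain in knowledge.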

A.III. Classical Bayesian Estimation Strategies
After introducing the quantities of interest for Bayesian parameter estimation in the previous appendix, we now want to illustrate these techniques for classical Bayesian estimation. This provides the opportunity to establish a direct comparison with the results obtained for a strategy exploiting quantum features that we will present in Appendix A.IV.
We consider a strategy to be classical if no quantum correlations are used for the state preparation or measurement, which corresponds to the choice of product states for $N$ qubits along with single-qubit measurements. The Bayesian approach allows updating the estimation strategy based on the outcomes of previous measurements. Consequently, a parallel strategy of $N$ individual probes that are prepared and measured in the same way may not be optimal even among the classical measurement schemes. At the same time, the explicit evaluation of a sequential measurement strategy with intermediate updates is computationally extremely demanding. To give a fair representation of the performance of classical strategies we hence consider a bound for the sequential measurement scheme in Appendix A.III.2, and compute the average variance explicitly for the optimal parallel strategy in Appendix A.III.3. In preparation for these scenarios, we begin with the single-qubit Bayesian estimation problem in Appendix A.III.1.

A.III.1. Single-Qubit Measurements
For the scenario that we consider here, i.e., Gaussian priors as in Eq. (A.42) and unitary parameter encodings as in Eq. (A.1), the optimal single-qubit measurement strategy for Bayesian estimation is similar to that of the local scenario. That is, the probe state is chosen to be $|+\rangle$, i.e., a uniform superposition of the eigenstates of $H$. The optimal accompanying measurement is a projective measurement with POVM elements
$$E_\pm = U_{\theta_o+\pi/2}\,|\pm\rangle\langle\pm|\,U_{\theta_o+\pi/2}^\dagger,$$
which corresponds to a measurement in a direction on the equatorial plane of the Bloch sphere that is orthogonal to the direction obtained by rotating $|\pm\rangle$ by the expected value $\theta_o$ of the prior. This can be seen by noting that probe states and measurement directions can be restricted to the equatorial plane, followed by an optimization over the angle defining their relative orientation. For this combination of state and measurement, the conditional probabilities to obtain the outcomes "+" or "−" are
$$p(\pm|\theta) = \tfrac{1}{2}\big[1 \pm \sin(\theta - \theta_o)\big],$$
such that $p(\pm|\theta_o) = \tfrac{1}{2}$. We further compute
$$p(\pm) = \int\! d\theta\; p(\theta)\, p(\pm|\theta) = \tfrac{1}{2}.$$
The corresponding estimates are then easily found by inserting into Eq. (A.37), yielding
$$\hat\theta(\pm) = \theta_o \pm \sigma^{2}\, e^{-\sigma^{2}/2}, \qquad V_{\mathrm{post}} = \sigma^{2}\big(1 - \sigma^{2} e^{-\sigma^{2}}\big).$$
Since $0 < \sigma^{2} e^{-\sigma^{2}} < 1$ for finite, nonzero $\sigma$, the variance decreases on average, $V_{\mathrm{post}} < \sigma^{2}$, but it becomes apparent that the decrease in width quantified by $\Delta V := \sigma^{2} - V_{\mathrm{post}} = \sigma^{4} e^{-\sigma^{2}}$ has a maximum for $\sigma = \sqrt{2}$. This signifies that the MSE approach using Gaussian priors is not useful for priors of large width when considering phase estimation (see Appendix A.II.5 for a discussion of this issue). It is also interesting to note that the posterior distributions conditional on the outcomes $m = \pm$ are given by
$$p(\theta|\pm) = \big[1 \pm \sin(\theta - \theta_o)\big]\, p(\theta).$$
Unlike the prior, the posterior distributions illustrated in Fig. A.2 are no longer Gaussian, and they are not symmetric around their mean values $\theta = \hat\theta_{m=\pm}$.
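The single-qubit result can be cross-checked numerically. The Python sketch below assumes outcome probabilities of the form $p(\pm|\theta) = \tfrac{1}{2}[1 \pm \sin(\theta - \theta_o)]$ (with $\theta$ measured relative to $\theta_o$) and a Gaussian prior of width $\sigma$. It compares a direct numerical evaluation of the average posterior variance with the closed form $\sigma^{2}(1 - \sigma^{2} e^{-\sigma^{2}})$. The function names, grid size and cutoff are hypothetical choices.

```python
import math


def gaussian(x, sigma):
    """Zero-centered Gaussian density of width sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def posterior_variance_numeric(sigma, steps=20001, cutoff=10.0):
    """Average posterior variance via a Riemann sum over theta - theta_o."""
    h = 2 * cutoff * sigma / (steps - 1)
    v_post = 0.0
    for sign in (+1, -1):
        p_m = first = second = 0.0
        for i in range(steps):
            t = -cutoff * sigma + i * h
            w = gaussian(t, sigma) * 0.5 * (1 + sign * math.sin(t)) * h
            p_m += w
            first += w * t
            second += w * t * t
        mean = first / p_m
        v_post += p_m * (second / p_m - mean ** 2)
    return v_post


def posterior_variance_closed(sigma):
    """Closed form sigma^2 (1 - sigma^2 exp(-sigma^2))."""
    return sigma ** 2 * (1 - sigma ** 2 * math.exp(-sigma ** 2))


def width_reduction(sigma):
    """Average decrease of the variance, sigma^2 - V_post."""
    return sigma ** 2 - posterior_variance_closed(sigma)


if __name__ == "__main__":
    print(posterior_variance_numeric(1.0), posterior_variance_closed(1.0))
    grid = [0.5 + 0.001 * i for i in range(2501)]
    print(max(grid, key=width_reduction))
```

A grid search over $\sigma$ locates the maximum of the width reduction near $\sigma = \sqrt{2} \approx 1.414$, consistent with the extremum of $\sigma^{4} e^{-\sigma^{2}}$.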

A.III.2. Bound for Multi-Qubit Measurements
We are now interested in making statements about the optimal strategy for Bayesian estimation using a sequence of $N$ consecutive single-qubit probes. Unfortunately, the posterior even after one measurement is no longer Gaussian (or symmetric). Therefore, determining the optimal single-qubit measurements and updating the prior becomes problematic for large numbers of measurements. This may not be an issue in an actual measurement, where each qubit gives a single outcome based on which the next measurement is chosen. However, we are interested in the variance of the posterior averaged over all possible sequential measurement outcomes, the set of which grows exponentially. Having $2^N$ potentially different posterior distributions makes such an approach computationally infeasible.
We shall therefore refrain from obtaining the exact expression for the optimal expected variance $V_{\rm post}$ after $N$ sequential single-qubit measurements with updated directions. Instead, we construct a bound based on the Bayesian Cramér-Rao inequality (A.49). We note that the updating procedure can be entirely thought of as part of the choice of measurement direction, while the probe state $|+\rangle$ remains the same throughout. Further recall that the QFI entails an optimization over all possible measurements, including correlated measurements that can depend on previous outcomes. A lower bound for $V_{\rm post}$ in the classical case is hence obtained from the QFI for the state $|+\rangle^{\otimes N}$, which we have previously determined, resulting in Ineq. (A.83), where we have used that $I[p(\theta)] = 1/\sigma^2$. Any classical strategy, whether it consists of parallel or sequential measurements, must give an expected variance larger than this bound. This result also extends to the (asymptotic) behaviour of the Holevo phase variance $V_\phi$ of Eq. (18), since $V_\phi$ reduces to the MSE as $\sigma \to 0$ (see, e.g., [79, p. 7]). Consequently, the Holevo phase variance of any successful sequential measurement strategy will approach the behaviour of the MSE. The faster (in terms of the number of measurements) the strategy decreases the phase variance, the sooner one will enter a regime where the bound of Ineq. (A.64) applies. Moreover, the bound in Ineq. (A.83) is not tight and might significantly overestimate the performance of classical strategies, since the optimization in the QFI also includes entangled measurements. We therefore complement this bound by an investigation into the optimal parallel strategy in Section A.III.3.
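For orientation, the two ingredients of this bound can be sketched explicitly. The derivation below is a hedged reconstruction rather than a reproduction of the elided equations: it assumes that the Bayesian Cramér-Rao inequality takes the standard van Trees form, that the Gaussian prior has mean $\theta_o$ and standard deviation $\sigma$, and that each qubit evolves under $H = Z/2$, so that the QFI of $|+\rangle^{\otimes N}$ equals $N$.

```latex
\begin{align}
  I[p(\theta)]
    &= \int d\theta\, p(\theta)\,\bigl(\partial_\theta \ln p(\theta)\bigr)^2
     = \int d\theta\, p(\theta)\,\frac{(\theta-\theta_o)^2}{\sigma^4}
     = \frac{1}{\sigma^2}, \\
  V_{\rm post}
    &\geq \frac{1}{\mathcal{I}\bigl(|+\rangle^{\otimes N}\bigr) + I[p(\theta)]}
     = \frac{1}{N + 1/\sigma^2}
     = \frac{\sigma^2}{1 + N\sigma^2}.
\end{align}
```

Under these assumptions the bound scales as $1/N$ for large $N$, i.e., it exhibits the classical scaling against which a quantum strategy is to be compared.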

A.III.3. Optimal Parallel Strategy
Having obtained the previous lower bound for $V_{\rm post}$ for the optimal classical strategy, one may wonder how close a practical classical strategy may come to this bound. To address this question, we now consider the optimal classical, parallel strategy for Gaussian priors. That is, we compute $V_{\rm post}$ in the case where $N$ qubits are identically prepared and measured (i.e., without intermediate updates) with the optimal single-qubit strategy based on the prior information (see Section A.III.1). The probe state is hence $|+\rangle^{\otimes N}$, and for each qubit we perform the POVM with elements $\tilde{E}_{\pm}$ as in Eq. (A.77). Since the state is invariant under the exchange of qubits, it is irrelevant which of the $N$ qubits give results "+" and which give results "−", and there are hence only $N + 1$ qualitatively different measurement outcomes. We label these outcomes by $m = 0, 1, \ldots, N$, which we take to be the number of outcomes "−". In other words, for the given state this measurement is equivalent to the POVM with elements $E_m$ from Eq. (A.26). The conditional probability to obtain the outcome $m$, given that the parameter takes the value $\theta$, is then the binomial distribution $p(m|\theta) = \binom{N}{m}\, p(-|\theta)^m\, p(+|\theta)^{N-m}$. We then insert for $p(\pm|\theta)$ from Eq. (A.78) and obtain an expression in terms of a quantity $I_{k+k'}$, which is an integral over the prior. For the Gaussian prior we then need to further compute $\mathrm{Tr}(E_m \Gamma)$. Here we have a different integral, $J_{k+k'}$, which we can easily integrate by parts. Having thoroughly investigated the performance of classical estimation strategies in Bayesian scenarios, we will next turn to strategies involving genuine quantum features.
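As a rough numerical counterpart to the parallel strategy described above, the sketch below integrates over the Gaussian prior directly instead of using the analytic integrals $I_{k+k'}$ and $J_{k+k'}$. It assumes $p(\pm|\theta) = [1 \pm \sin(\theta - \theta_o)]/2$ for each qubit and a binomial distribution for the number $m$ of "−" outcomes; the function names and the integration grid are illustrative choices, not taken from the source.

```python
import math

def parallel_v_post(n_qubits, sigma, mean=0.0, steps=4000, width=10.0):
    """Average posterior variance for n identically prepared and measured qubits."""
    lo = mean - width * sigma
    h = 2.0 * width * sigma / steps
    norm = sigma * math.sqrt(2.0 * math.pi)
    binom = [math.comb(n_qubits, m) for m in range(n_qubits + 1)]
    prob = [0.0] * (n_qubits + 1)    # p(m)
    first = [0.0] * (n_qubits + 1)   # integral of p(theta) p(m|theta) theta
    second_moment = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        weight = h * (0.5 if i in (0, steps) else 1.0)   # trapezoidal weights
        weight *= math.exp(-0.5 * ((theta - mean) / sigma) ** 2) / norm
        second_moment += weight * theta * theta
        p_minus = 0.5 * (1.0 - math.sin(theta - mean))   # assumed single-qubit p(-|theta)
        p_plus = 1.0 - p_minus
        for m in range(n_qubits + 1):
            joint = weight * binom[m] * p_minus ** m * p_plus ** (n_qubits - m)
            prob[m] += joint
            first[m] += joint * theta
    # V_post = E[theta^2] - sum_m p(m) * (posterior mean given m)^2
    return second_moment - sum(f * f / p for f, p in zip(first, prob))

def classical_bound(n_qubits, sigma):
    """Assumed form sigma^2 / (1 + N sigma^2) of the Bayesian Cramer-Rao bound."""
    return sigma ** 2 / (1.0 + n_qubits * sigma ** 2)
```

Comparing `parallel_v_post(N, sigma)` with `classical_bound(N, sigma)` for increasing `N` gives a feeling for how far the parallel strategy stays above the bound, under the stated assumptions.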

A.IV. Quantum Advantage in Bayesian Estimation
Compared with the local estimation scenario, Bayesian estimation is made considerably more complicated by the in principle arbitrary shape of the prior. Consequently, results on optimality are scarcely available apart from very special cases such as phase estimation for flat priors [19], for which an optimal pair of probe state and measurement (albeit with respect to a different cost function for the variance) has been determined. Here, we will discuss a slightly modified version of the scheme of Ref. [19] as an example and show that it can lead to a scaling advantage also for other choices of priors (and cost functions).
The probe state in question is a superposition of $N$-qubit computational basis states, where one representative $|n\rangle_{\rm un} = |1\rangle^{\otimes n} |0\rangle^{\otimes N-n}$ is selected for each Hamming weight, i.e., from each subspace with a fixed number of qubits in the state $|1\rangle$. That is, $|n\rangle_{\rm un}$ is a unary encoding of the integer $n$. For flat priors [and using the Holevo phase variance [40] instead of the MSE of Eq. (A.40)], the optimal probe state $|\psi_{\rm sine}\rangle$ is of the form $|\psi_{\rm sine}\rangle = \sum_{n=0}^{N} \psi_n |n\rangle_{\rm un}$ [Eq. (A.97)], where the coefficients $\psi_n$ are chosen with a sinusoidal profile (see, e.g., Ref. [19]), as specified in Eq. (A.98). For the sake of illustration, we will study the performance of this particular state, which we will refer to as the sine state, for the MSE and Gaussian priors of finite width. Nonetheless, it is crucial to note that the optimal probe state for phase estimation with any prior (and variance) must be of the form of Eq. (A.97) for some choice of coefficients. This is due to the fact that $|\psi_{\rm sine}\rangle$ already contains one representative eigenvector of $U_\theta$ (and $H$) for each of its different eigenvalues. Adding any other components outside of the span of $\{|n\rangle_{\rm un}\}_{n=0,\ldots,N}$ would hence not provide any more information about the phase $\theta$. After the unitary dynamics $U_\theta$, the probe state is thus again a superposition of the vectors $|n\rangle_{\rm un}$, where each coefficient $\psi_n$ acquires a $\theta$-dependent phase. Also note that the probe state we have chosen is not symmetric with respect to the exchange of the different qubits. However, relinquishing this symmetry requirement allows us to operate in an $(N + 1)$-dimensional subspace of the total Hilbert space of dimension $2^N$, which will prove to be crucial for the efficient implementation of the estimation scheme in MBQC.
As a measurement strategy for our example, we will consider a quantum Fourier transform (QFT) in the subspace spanned by the vectors $|n\rangle_{\rm un}$, followed by computational basis measurements. This measurement can be represented by a POVM with elements $E_k = |e_k\rangle\langle e_k|$ for $k = 0, 1, 2, \ldots, N$, with the vectors $|e_k\rangle$ given in Eq. (A.100), and $E_{N+1} = \mathbb{1} - \sum_{k=0}^{N} E_k$. Practically, we can ignore the POVM element $E_{N+1}$, as the corresponding outcome never occurs for the chosen probe state (in the absence of noise). With this, we are now in a position to compute $V_{\rm post}$ from Eq. (A.41), where we again assume a Gaussian prior as in Eq. (A.42). We hence need to calculate the corresponding integrals over the prior. To rewrite these, it is useful to first determine $\mathrm{Tr}(E_k \rho(\theta))$, where $\rho(\theta) = U_\theta |\psi_{\rm sine}\rangle\langle\psi_{\rm sine}| U_\theta^\dagger$. With this, we quickly find an expression for $V_{\rm post}$ in terms of quantities $\gamma_k$.
The plots in Fig. A.4 indicate that for narrow priors (e.g., for $\sigma = 0.1, \ldots, 0.5$) the example quantum strategy exhibits a quadratic scaling gap with respect to all classical measurement schemes, meaning that the variance in the quantum strategy decreases more strongly with $N$ than classically possible. As discussed in Ref. [25], this is possible for all priors under certain regularity assumptions, but the explicit form of the optimal states and measurements is generally not known. Indeed, we cannot conclude that the strategy that we discuss here is optimal, but (at least) for narrow Gaussian priors ($\sigma \leq 0.5$) we find that it directly outperforms even the (overly optimistic) bound on classical strategies from Ineq. (A.83) already for $N = 6$ qubits. For broader priors, we cannot report a scaling advantage for this example, but this is to be expected using the MSE. However, recall that the measurement strategy we discuss here is known to be optimal in the case of flat priors for an appropriately chosen cost function [19], and our results are hence complementary in the sense that we provide numerical evidence for optimal scaling in a regime of narrow priors. Additional plots for direct comparison with the classical bounds can be found in Fig. A.5.
[Figure caption (partial): ... Ineq. (A.83), which overestimates the best sequential classical strategy. The dashed lines correspond to the optimal classical parallel strategy from Fig. A.3. As can be seen in (a) and (b), the quantum strategy using the sine states may outperform the best classical strategy for small prior widths $\sigma$, providing a scaling advantage, i.e., $1/V_{\rm post}$ increases more strongly than linearly with $N$. However, for larger $\sigma$ it performs worse, that is, it still outperforms the optimal classical parallel strategy, but only by a constant improvement, as can be seen in (c).]
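The quantum strategy discussed above can be explored with a short numerical sketch. Several ingredients are assumptions rather than quotations from the elided equations: the coefficients are taken to be the standard sinusoidal profile $\psi_n = \sqrt{2/(N+2)}\,\sin[(n+1)\pi/(N+2)]$, the dynamics is assumed to multiply $|n\rangle_{\rm un}$ by $e^{in\theta}$ up to a global phase, and the QFT basis is taken as $|e_k\rangle = (N+1)^{-1/2}\sum_n e^{2\pi i nk/(N+1)}|n\rangle_{\rm un}$ without any offset depending on the prior mean. Numbers obtained this way need not coincide with those in the figures.

```python
import cmath
import math

def sine_coefficients(n_qubits):
    """Assumed sine-state amplitudes psi_n for n = 0..N."""
    norm = math.sqrt(2.0 / (n_qubits + 2))
    return [norm * math.sin((n + 1) * math.pi / (n_qubits + 2))
            for n in range(n_qubits + 1)]

def outcome_probs(psi, theta):
    """p(k|theta) for a QFT on span{|n>_un} followed by a basis readout."""
    dim = len(psi)
    probs = []
    for k in range(dim):
        # <e_k| U_theta |psi> with the assumed phase conventions
        amp = sum(psi[n] * cmath.exp(1j * n * (theta - 2.0 * math.pi * k / dim))
                  for n in range(dim))
        probs.append(abs(amp) ** 2 / dim)
    return probs

def sine_v_post(n_qubits, sigma, mean=0.0, steps=2000, width=8.0):
    """Average posterior variance for a Gaussian prior (trapezoidal rule)."""
    psi = sine_coefficients(n_qubits)
    lo = mean - width * sigma
    h = 2.0 * width * sigma / steps
    norm = sigma * math.sqrt(2.0 * math.pi)
    prob = [0.0] * len(psi)    # p(k)
    first = [0.0] * len(psi)   # integral of p(theta) p(k|theta) theta
    second_moment = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        weight = h * (0.5 if i in (0, steps) else 1.0)
        weight *= math.exp(-0.5 * ((theta - mean) / sigma) ** 2) / norm
        second_moment += weight * theta * theta
        for k, p in enumerate(outcome_probs(psi, theta)):
            prob[k] += weight * p
            first[k] += weight * p * theta
    return second_moment - sum(f * f / p for f, p in zip(first, prob) if p > 0.0)
```

Because the sine state lives entirely in the $(N+1)$-dimensional subspace, the probabilities returned by `outcome_probs` sum to one, which reflects the statement that the outcome associated with $E_{N+1}$ never occurs in the absence of noise.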

A.V. Bayesian Frequency Estimation
In this appendix we investigate Bayesian frequency estimation, i.e., the case where the parameter to be estimated is the angular frequency $\omega$ rather than the phase $\theta$, such that $\theta = \omega t$. The key difference of frequency estimation compared to phase estimation is that in the former we have the freedom to optimize over the interrogation time $t$. We shall do this for some of the states and measurements previously considered for phase estimation, specifically, for the optimal classical parallel measurement strategy and for the quantum strategy using the sine states and QFT measurements from Eqs. (A.98) and (A.100), respectively.
More precisely, the dynamical evolution of each qubit is described by the unitary transformation $U(\omega t) = e^{-i\omega t Z/2}$, and our prior information about $\omega$ is given by a normal distribution.
[Figure caption: The inverse average variance of the posterior, optimized over the interrogation time $t$ and plotted against the qubit number $N$, is compared for the optimal classical parallel strategy (red, dashed) and the quantum strategy using the sine states and QFT measurements (blue, solid). For the plotted range one can clearly see that the quantum strategy provides a scaling advantage with respect to the best parallel classical measurements, that is, the solid blue curve increases quadratically with $N$, while the dashed red curve only increases linearly with $N$.]
In this appendix, we will briefly review the basic concepts of MBQC, but we direct the interested reader to more detailed reviews in Refs. [38,39]. In this computational paradigm, established in Refs. [30,80], a specific entangled state (e.g., a cluster state) is prepared in an array of qubits. Using the entanglement present in the system along with local measurements on a subset of the qubits, (arbitrary) unitary transformations may be implemented on the remaining qubits (if the cluster is large enough). Here, we will focus on MBQC based on 1D and 2D cluster states, i.e., graph states [37] based on regular, linear or rectangular lattices. Each vertex of the graph corresponds to a qubit initialized in the state $|+\rangle$, while edges connecting the vertices indicate that controlled phase gates $CZ$, given by $CZ = |0\rangle\langle 0| \otimes \mathbb{1} + |1\rangle\langle 1| \otimes Z$ [Eq. (A.110)], have been applied to these pairs of qubits. A simple example for a cluster state is shown in Fig. A.9.
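The construction of a cluster state from $|+\rangle$ states and CZ gates can be made concrete with a small state-vector sketch for the 1D case. It uses the fact that CZ only flips the sign of basis states in which both qubits are 1, and it checks the standard graph-state stabilizers $K_a = X_a \prod_b Z_b$ (product over the neighbours $b$ of $a$). The function names and the bit ordering are illustrative choices.

```python
def linear_cluster_state(n):
    """State vector of an n-qubit 1D cluster state: CZ on neighbours acting on |+>^n."""
    amp = 2.0 ** (-n / 2)
    state = []
    for index in range(2 ** n):
        bits = [(index >> (n - 1 - q)) & 1 for q in range(n)]
        # each CZ contributes a factor -1 when both neighbouring qubits are 1
        sign = (-1) ** sum(bits[q] * bits[q + 1] for q in range(n - 1))
        state.append(sign * amp)
    return state

def apply_stabilizer(state, n, a):
    """Apply K_a = X_a * (Z on the neighbours of a) to a real state vector."""
    out = [0.0] * len(state)
    for index, value in enumerate(state):
        bits = [(index >> (n - 1 - q)) & 1 for q in range(n)]
        sign = 1
        for b in (a - 1, a + 1):
            if 0 <= b < n and bits[b]:
                sign = -sign                      # Z on a neighbour in state |1>
        flipped = index ^ (1 << (n - 1 - a))      # X flips qubit a
        out[flipped] += sign * value
    return out
```

For every qubit `a`, `apply_stabilizer(linear_cluster_state(n), n, a)` returns the state unchanged, which is the defining property of the graph state associated with a linear lattice.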
The essence of the working principle of a measurement-based computation is captured by single-qubit gate teleportation [81]. That is, by measuring one of the qubits of an entangled pair in a suitable local basis and applying local correction operators dependent on the outcome on the other qubit, a desired quantum gate can be effectively implemented on the remaining qubit, as illustrated in Fig. A.10. By successively measuring qubits in a 1D cluster state, arbitrary single-qubit gates may be performed in such a way that only local corrections on the final qubit are required.
[FIG. A.10 (caption, partial): ... Up to the outcome-dependent local correction $HZ^s$ (and an irrelevant global phase) the output qubit hence carries the result of the computation, $R_z(\phi)|\psi\rangle$. (b) In a graphical notation (see, e.g., Ref. [38]) for the circuit in (a), measured qubits are represented by circles inscribed with the corresponding measurement angle $\phi$, while output qubits are indicated by diamonds. The connecting lines between qubits indicate the initial application of CZ gates, and the symbols for input qubits, which may be prepared in arbitrary states, are coloured in red, whereas all other qubits are assumed to have been initialized in the state $|+\rangle$.]
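A single step of gate teleportation can be simulated directly on two qubits. The sketch below is illustrative and rests on conventions that are assumptions here: $R_z(\phi) = \mathrm{diag}(e^{-i\phi/2}, e^{i\phi/2})$, and the first qubit is projected onto $(|0\rangle + (-1)^s e^{-i\phi}|1\rangle)/\sqrt{2}$ for outcome $s$. With these choices the second qubit ends up in $H Z^s R_z(\phi)|\psi\rangle$ up to a global phase; other sign conventions for the measurement angle lead to $R_z(-\phi)$ instead.

```python
import cmath
import math

def teleport_rz(psi, phi, outcome):
    """One gate-teleportation step: input (a, b) on qubit 1, |+> on qubit 2."""
    a, b = psi
    r = 1.0 / math.sqrt(2.0)
    # CZ (|psi> x |+>) = a|0>|+> + b|1>|->, stored as joint[q1][q2]
    joint = [[a * r, a * r], [b * r, -b * r]]
    sign = -1.0 if outcome else 1.0
    # bra of the assumed measurement vector (|0> + (-1)^s e^{-i phi}|1>)/sqrt(2)
    bra = [r, r * sign * cmath.exp(1j * phi)]
    out = [bra[0] * joint[0][j] + bra[1] * joint[1][j] for j in range(2)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in out))
    return [x / norm for x in out]

def expected_output(psi, phi, outcome):
    """H Z^s R_z(phi)|psi> with R_z(phi) = diag(e^{-i phi/2}, e^{+i phi/2})."""
    a, b = psi
    a, b = a * cmath.exp(-0.5j * phi), b * cmath.exp(0.5j * phi)
    if outcome:
        b = -b                                   # Z^s for s = 1
    r = 1.0 / math.sqrt(2.0)
    return [r * (a + b), r * (a - b)]            # Hadamard

def overlap(u, v):
    """Modulus of the inner product; equals 1 for states equal up to a phase."""
    return abs(sum(x.conjugate() * y for x, y in zip(u, v)))
```

Both outcomes occur with probability 1/2 for any normalized input, so the measurement result carries no information about $|\psi\rangle$ and only determines the correction $Z^s$.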
Although the measurement-based implementation of the CNOT gate [$CX$ in the notation of Eq. (A.110)] is not possible in a 1D cluster, it can be achieved in two dimensions [31], as is demonstrated by a simple example in Fig. A.11. Since the combination of arbitrary single-qubit gates with the CNOT gate is computationally universal, one may hence prepare an arbitrary quantum state (e.g., for performing parameter estimation) from a 2D cluster.
[FIG. A.11 (caption, partial): The four-qubit circuit in (a) and the corresponding measurement pattern in (b) illustrate how the measurement of two of the qubits in a 2D cluster, followed by local Pauli corrections on the two remaining qubits dependent on the measurement outcomes $s_i$ ($i = 1, 2$), can realize an effective CNOT gate in an MBQC architecture. The notation is as in Fig. A.10. Note that of the two input qubits marked red in (b) one is measured, but the other is also an output qubit.]
[FIG. A.12. Pauli-Y rotation in MBQC. (a) The circuit representation of the MBQC realization of a Pauli-Y rotation is shown. Measuring the first three qubits in bases in the x-y plane rotated with respect to the X-basis by $\phi_1 = \pi/2$, $\phi_2 = (-1)^{s_1}\varphi$, and $\phi_3 = (-1)^{s_2+1}\pi/2$, respectively, and applying the local Pauli corrections $X^{s_1+s_3} Z^{s_2} H$ dependent on the measurement outcomes $s_i = 0, 1$ ($i = 1, 2, 3$) leaves the fourth qubit in the desired state. (b) Graphical representation of the circuit in (a) following the notation of Fig. A.10.]

A.VI.2. Probe State Preparation in MBQC
In this last appendix, we present details on the conversion of the circuit for generating probe states (shown in Fig. 4 of the main text) to an MBQC measurement pattern. To do this, let us first see how a Y-rotation can be performed in MBQC, and consider the concatenation of three steps of single-qubit gate teleportation (see Fig. A.10) as shown in Fig. A.12. That is, we prepare a one-dimensional four-qubit cluster state, where the first qubit is initialized in an arbitrary state $|\psi\rangle$. The first three qubits are then measured with angles $\phi_1$, $\phi_2$, and $\phi_3$, respectively, leaving the fourth qubit in the state (up to a global phase)
$X^{s_1+s_3} Z^{s_2} H\, R_z\big((-1)^{s_2}\phi_3\big)\, R_x\big((-1)^{s_1}\phi_2\big)\, R_z(\phi_1)\, |\psi\rangle$. (A.111)
Noting that a Y-rotation about an arbitrary angle $\varphi$ can be written as $R_y(\varphi) = R_z(-\pi/2)\, R_x(\varphi)\, R_z(\pi/2)$, selecting measurement angles $\phi_1 = \pi/2$, $\phi_3 = (-1)^{s_2+1}\pi/2$, and $\phi_2 = (-1)^{s_1}\varphi$ in Fig. A.12 realizes $R_y(\varphi)$ up to appropriate local corrections on the last qubit. With this strategy, we are able to implement $R_y(\varphi_1)$. One may even commute the Hadamard correction with the Y-rotation to switch the initial state of the qubit from $|+\rangle$ to $|0\rangle$, as required in Fig. 4 of the main text. For the remaining controlled rotations, we make use of the simple identity $R_y(\varphi) Z = Z R_y(-\varphi)$, which allows us to utilize the CZ-gates naturally appearing in the cluster state to perform the operation $CR_y(\varphi)$, as shown in the circuit in Fig. A.13 (a). The spurious application of the operator $Z$ before the rotation can be disregarded, since all qubits in the circuit in Fig. 4 are assumed to be in the state $|0\rangle$ in the beginning. This initialization step can be included as for $R_y(\varphi_1)$ before.
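The identity used for the controlled rotations is easy to verify numerically. The sketch below checks $R_y(\varphi) Z = Z R_y(-\varphi)$ and illustrates one way in which a CZ gate can yield a controlled Y-rotation: sandwiching the target side of the CZ between $R_y(-\varphi/2)$ and $R_y(\varphi/2)$. This sandwich is an assumed reading of the construction, since the circuit of Fig. A.13 (a) is not reproduced here, and the convention $R_y(\varphi) = e^{-i\varphi Y/2}$ is likewise an assumption.

```python
import math

Z = [[1.0, 0.0], [0.0, -1.0]]

def ry(angle):
    """R_y(angle) = exp(-i angle Y / 2), a real 2x2 rotation matrix."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def target_operator_when_control_is_one(angle):
    """R_y(angle/2) Z R_y(-angle/2): action on the target if the control is |1>."""
    return matmul(ry(angle / 2), matmul(Z, ry(-angle / 2)))
```

If the control is $|0\rangle$, the two half-rotations cancel; if it is $|1\rangle$, the target experiences $R_y(\varphi) Z$. Since $Z|0\rangle = |0\rangle$, this acts like $R_y(\varphi)$ on a target initialized in $|0\rangle$, which matches the remark that the spurious $Z$ can be disregarded.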
Since we already know from the circuit in Fig. A.12 how to implement rotations $R_y(\varphi)$ for arbitrary angles, all that is left to do to translate the preparation circuit in Fig. 4 to MBQC is to commute the local X-corrections past the CZ-gate appearing on the left-hand side of Fig. A.13 (a), as shown in Fig. A.13 (b), such that all local corrections can be applied in the final step of the state preparation. We hence arrive at the MBQC measurement pattern generating the sine state $|\psi_{\rm sine}\rangle$, which is shown in Fig. 5 of the main text.