Use of global interactions in efficient quantum circuit constructions

In this paper we study the ways to use a global entangling operator to efficiently implement circuitry common to a selection of important quantum algorithms. In particular, we focus on the circuits composed with global Ising entangling gates and arbitrary addressable single-qubit gates. We show that under certain circumstances the use of global operations can substantially improve the entangling gate count.


Introduction
Trapped atomic ions [1] and superconducting circuits [2] are two examples of quantum information processing (QIP) approaches that have delivered small yet already universal and fully programmable machines. In superconducting circuits, qubit interactions are enabled through custom-designed electronic hardware involving Josephson junctions and microwave resonators [2]. Different interactions can be controlled individually to invoke the two-qubit gates. A global coupling, however, would not necessarily be natural to such a system, due to the difficulty of placing and connecting $O(n^2)$ individual resonators in the same area as n qubits. This said, it is possible to couple Josephson junction qubits to a single resonator mode, thereby enabling global interactions [3]. In trapped-ion QIP, on the other hand, global interactions are more naturally realized as an extension of common two-qubit gate interactions [4,5,6,7,8]. In fact, the ability to implement arbitrary selectable two-qubit interactions generally requires a higher level of control, with individually focused external fields addressing each qubit [1]. Given the ease of implementing a global interaction over these two leading QIP approaches, we consider the use of global entangling gates, particularly applied to the trapped-ion technology. We note that the results are technology-independent and therefore apply to any QIP approach, so long as proper global entangling operations are constructible.
One particular interaction available in the trapped-ion approaches [1,7,8] to quantum computing is the so-called Mølmer-Sørensen gate [9], also known as the XX coupling or Ising gate. To achieve computational universality, the Mølmer-Sørensen gate (either local addressable or global) is complemented by arbitrary single-qubit operations. These may come in different flavors, including the addressable R(θ, φ) rotations [1], of which at most two are needed to implement an arbitrary single-qubit gate [10], or the addressable RZ rotation, which together with global RX and RY rotations also gives single-qubit universality [7,8]. Depending on the specifics, the control apparatus may allow the application of an XX gate to a selectable pair of qubits [1], globally [4,5,6,7,8], or globally to a subset of qubits [11,12]. We furthermore note that the existing control apparatus described in reference [1] allows the application of the global Mølmer-Sørensen (GMS) gates [11]; however, to date, this approach has not been studied in detail. In each case above, the XX gate comes at a higher cost (expressed in terms of the duration and/or average fidelity) compared to the single-qubit gates.
In this paper, we focus on minimizing the number of times an XX gate is called, be it addressable local or global, thereby targeting the most expensive resource in quantum computations using trapped-ion QIP. Specifically, we center our efforts on finding the instances of quantum computations that admit a more efficient implementation using global entangling gates compared to what may be accomplished using local entangling gates.
Previous work demonstrated how to implement the parity function (fan-in gate in our terminology) using a constant number of two global entangling pulses [13]. We revisit the implementation of fan-in in Subsection 3.1, since it is relevant to our more advanced constructions. References [7,14] study the ways to implement quantum algorithms efficiently on a trapped ion quantum computer with the two-qubit gates enabled by the global entangling operator, concentrating on the case featuring anywhere between two to four qubits. Reference [10] focuses on quantum circuit compiling in the scenario when local addressable two-qubit gates are available. Reference [15] revisits the two-GMS gate parity measurement implementation of [13] and reduces the number of global pulses needed to just one (this construction can be inferred from Fig. 2), and shows how to measure the eigenvalue of a product of Pauli matrices using only a constant number of global entangling pulses. In contrast, here, we determine a set of important quantum circuits, focusing on the computations of arbitrary size, that can be accomplished using fewer entangling pulses in cases when global entangling control is available. The new circuits developed in our work include certain kinds of stabilizer circuits, number excitation operator, Toffoli-4 gate, Toffoli-n gate, Quantum Fourier Transformation, and Quantum Fourier Adder circuits, thereby substantively extending the set of known efficient circuitry based on the global entangling pulse. The results are directly accessible for implementation over trapped ions approaches featuring global control, and make a case for mixed local/global entangling control.
Control by local addressable operations is clearly easier to work with as far as implementing quantum computations is concerned, since most quantum algorithms are expressed in terms of local operations. Secondly, the number of arbitrarily selectable two-qubit operations, $n(n-1)/2$, for an n-qubit computation (recall that the XX coupling does not distinguish between a gate's control and its target), is higher than 1, the number of individual full-size global gates. These two observations suggest that local control is overall more nimble when it comes to implementing arbitrary quantum algorithms. However, it is not always the case that the implementations using local addressable gates are more efficient compared to those over global entangling operators. Indeed, it is known how to implement the 3-qubit Toffoli gate with only three size-3 global Mølmer-Sørensen gates [7,8], whereas the best known implementation over two-qubit local addressable control requires five entangling gates [16]. Motivated by this example, we look into what other important unitary transformations may exist that benefit from the global gates.

Global MS Gate
A local MS gate (XX) [9], acting on the i-th and j-th qubits, is defined as

$$XX_{ij}(\chi_{ij}) = \exp\left(-i\,\frac{\chi_{ij}}{2}\,\sigma_x^{(i)}\sigma_x^{(j)}\right), \tag{1}$$

where $\sigma_x^{(i)}$ denotes the Pauli-x operator acting on the i-th qubit. In comparison, a global MS (GMS) gate for an n-qubit system is defined according to the equation

$$GMS(\chi_{12}, \chi_{13}, \ldots, \chi_{1n}, \chi_{23}, \ldots, \chi_{n-1\,n}) = \exp\left(-i\sum_{i=1}^{n}\sum_{j=i+1}^{n}\frac{\chi_{ij}}{2}\,\sigma_x^{(i)}\sigma_x^{(j)}\right), \tag{2}$$

which is equivalent to the application of local XX gates to all $n(n-1)/2$ pairs of qubits for an n-qubit system. Since any two local XX gates always commute, the GMS gate is uniquely defined. For simplicity, we will first focus on the GMS gate where $\chi_{12} = \chi_{13} = \ldots = \chi_{1n} = \chi_{23} = \ldots = \chi_{n-1\,n}$ [4,5,6,8], and next consider other variants.

Figure 1: Example of the usefulness of global gates. GMS4 denotes a global MS gate defined according to (2), applied to all four qubits shown in the figure. GMS3 denotes a three-qubit global MS gate, applied to qubit numbers 1, 2, and 3. The common argument χ of the GMS gates specifies that all $\chi_{ij}$ are equal to χ. The $XX_{ij}(\chi)$ gate denotes a local XX gate, applied to qubits i and j with the angle χ, see (1).
Intuitively, the availability of the GMS gate allows for an efficient implementation of a single-qubit-to-many-qubits coupling gate. Consider, for instance, a 4-qubit system as shown in Fig. 1. Applying the GMS gate on all four qubits and then applying the GMS gate to the top three qubits with the negative sign of the rotation parameter results in a selective set of the XX gates acting between qubit number 4 and the rest, as shown in Fig. 1 on the right. This means that, together with the ability of leaving out a qubit of choice, we need only two (global) entangling operators to perform the desired transformation. Note that because qubit number 4 participates in all three XX gates shown in Fig. 1 on the right hand side, at least three time steps would be required if we restrict ourselves to the local XX couplings, even with the possibility of parallel operations acting on disjoint pairs of qubits.
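The cancellation described above can be checked numerically. The following is a minimal sketch, assuming the convention $XX_{ij}(\chi) = \exp(-i\chi\,\sigma_x^{(i)}\sigma_x^{(j)}/2)$ (chosen to match RX(θ) = exp(−iσxθ/2) used later in the text); the helper names `on_qubits`, `xx`, and `gms` are hypothetical and qubits are indexed from 0, so "qubit number 4" is index 3.

```python
import numpy as np
from functools import reduce
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def on_qubits(n, ops):
    """Tensor product over n qubits; ops maps qubit index -> 2x2 matrix."""
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def xx(n, i, j, chi):
    """Local gate exp(-i*chi/2 * X_i X_j), using (X_i X_j)^2 = I (assumed convention)."""
    xixj = on_qubits(n, {i: X, j: X})
    return np.cos(chi / 2) * np.eye(2**n) - 1j * np.sin(chi / 2) * xixj

def gms(n, qubits, chi):
    """Global MS gate: the same angle chi on every pair inside `qubits`."""
    u = np.eye(2**n, dtype=complex)
    for i, j in combinations(qubits, 2):
        u = u @ xx(n, i, j, chi)
    return u

n, chi = 4, 0.37  # arbitrary test angle
# GMS on all four qubits, then GMS on the top three with the opposite sign.
two_global = gms(n, [0, 1, 2, 3], chi) @ gms(n, [0, 1, 2], -chi)
# Only the couplings that involve the last qubit should survive.
three_local = xx(n, 0, 3, chi) @ xx(n, 1, 3, chi) @ xx(n, 2, 3, chi)
pairs_cancel = np.allclose(two_global, three_local)
```

Because all XX terms commute, the equality is exact (no global phase is involved).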
In the rest of the paper, we rely on the standard [16] single-qubit gates, including Hadamard (H in formulas and circuit diagrams), axial rotations RX, RY, and RZ (X, Y , and Z in circuit diagrams), as well as the two-qubit CNOT gate.

Efficient circuits using the GMS gate
In this section, we present a suite of quantum transformations, where GMS gates may be handily used to increase circuit efficiency. We lay out the specific implementation details by explicitly constructing corresponding quantum circuits, and compare them to those obtained using only local entangling gates to highlight the efficiency gain.

Consecutive CNOTs: Single-Control Many-Target CNOT (fan-out), and Many-Control Single-Target CNOT (fan-in)
Consider a set of CNOT gates with a shared control qubit, also known as the fan-out gate. As illustrated in Fig. 2 for the sample case of n = 4, we can use a pair of GMS gates, together with single-qubit rotations $RX(\theta) = e^{-i\sigma_x\theta/2}$ and $RY(\theta) = e^{-i\sigma_y\theta/2}$, to implement the entire set of such n−1 CNOT gates. In particular, we require a total of two GMS gates, one over n qubits with uniform angles π/2 and the other over n−1 qubits with the angle −π/2, singling out the control qubit.
An n-qubit fan-in gate (a set of CNOTs sharing a target) can be implemented as a layer of n Hadamard gates, followed by the fan-out gate, followed by the second layer of n Hadamard gates. This means that an arbitrary size fan-in gate can too be implemented using a constant number of two GMS gates. We note that these implementations were known to [13,15] (fan-in was explicitly studied, and fan-out can be easily obtained from the fan-in). Observe that to measure the outcome of the parity function on the top qubit (see Fig. 2), the second GMS gate in the construction outlined in Fig. 2 needs not be applied, as it does not affect the qubit being measured.
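The two-GMS fan-out and its Hadamard-conjugated fan-in can be sketched numerically as follows. This is not the circuit of Fig. 2: the single-qubit corrections used here (Hadamard and RZ layers) are an assumption derived from the standard identity CZ ≡ RZ(−π/2)⊗RZ(−π/2)·exp(−iπ/4 Z⊗Z), and the convention $XX(\chi) = \exp(-i\chi\sigma_x\sigma_x/2)$ is assumed. All helper names are hypothetical.

```python
import numpy as np
from functools import reduce
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def on_qubits(n, ops):
    """Tensor product over n qubits; ops maps qubit index -> 2x2 matrix."""
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def gms(n, qubits, chi):
    """Product of exp(-i*chi/2 * X_i X_j) over all pairs in `qubits` (assumed convention)."""
    u = np.eye(2**n, dtype=complex)
    for i, j in combinations(qubits, 2):
        xixj = on_qubits(n, {i: X, j: X})
        u = u @ (np.cos(chi / 2) * np.eye(2**n) - 1j * np.sin(chi / 2) * xixj)
    return u

def cnot(n, c, t):
    """Permutation matrix of a CNOT; qubit 0 is the most significant bit."""
    u = np.zeros((2**n, 2**n), dtype=complex)
    for k in range(2**n):
        bit_c = (k >> (n - 1 - c)) & 1
        u[k ^ (bit_c << (n - 1 - t)), k] = 1
    return u

def same_up_to_phase(a, b):
    k = np.argmax(np.abs(b))
    phase = a.flat[k] / b.flat[k]
    return bool(np.isclose(abs(phase), 1) and np.allclose(a, phase * b))

n, control, targets = 4, 0, [1, 2, 3]
h_all = on_qubits(n, {q: H for q in range(n)})
h_tgt = on_qubits(n, {q: H for q in targets})
# Two GMS gates: +pi/2 on all n qubits, -pi/2 on the n-1 targets only.
star = h_all @ gms(n, range(n), np.pi / 2) @ gms(n, targets, -np.pi / 2) @ h_all
rz_layer = on_qubits(n, {control: rz(-(n - 1) * np.pi / 2),
                         **{t: rz(-np.pi / 2) for t in targets}})
fanout_gms = h_tgt @ rz_layer @ star @ h_tgt

fanout_ref = reduce(lambda u, t: cnot(n, control, t) @ u, targets, np.eye(2**n))
fanin_ref = reduce(lambda u, c: cnot(n, c, control) @ u, targets, np.eye(2**n))

fanout_ok = same_up_to_phase(fanout_gms, fanout_ref)
fanin_ok = same_up_to_phase(h_all @ fanout_gms @ h_all, fanin_ref)
```

The second check confirms that conjugating the fan-out by a Hadamard layer on all n qubits turns it into the fan-in with the same number of GMS gates.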
An immediate application of this efficient implementation (n−1 local XX gates replaced by a pair of GMS gates) may be observed, for instance, in stabilizer circuit constructions. One example is the [[15,1,3]] encoding circuit [17, Figure 12]. This circuit, containing a total of 34 CNOT gates, may be implemented with 5 pairs of GMS gates, whereas it would otherwise require 34 local XX gates. Since the [[15,1,3]] encoding circuit is used to distil the |A⟩ state [17], its efficient GMS-enabled implementation may potentially be used to synthesize the logical-level T gate efficiently, constituting an important optimization for fault-tolerant quantum computing. We note, however, that GMS gates may be difficult to use fault-tolerantly [18].

Figure 2: Four-qubit case of multiple CNOT gates sharing a single control qubit and targeting the rest of the qubits. Only two GMS gates are required to implement a total of n−1 local XX gates, corresponding to n−1 CNOT gates.

Figure 3: The [[15,1,3]] encoding circuit [17], used to distil the |A⟩ state. It relies on a set of 34 CNOT gates that can be implemented using only 10 GMS gates.
GMS gates can furthermore be used to obtain an implementation of an arbitrary n-qubit stabilizer unitary using at most 12n−18 entangling pulses. To establish this, consider the 9-stage layered decomposition -C-P-C-P-H-P-C-P-C- of [19]. Observe that two of the -C- stages (each corresponds to a CNOT-based circuit) are given by upper triangular Boolean matrices. This means that each can naturally be implemented as a set of n−1 fan-out gates, each requiring two GMS gates. Of these, the smallest fan-out is the CNOT, and thus it can be implemented using a single GMS. This means that the total number of GMS gates required to implement an upper/lower triangular linear reversible transformation is 2(n−1) − 1 = 2n−3. The other two -C- stages are arbitrary linear transformations. Using LU decomposition, each can be implemented as a circuit over 2(2n − 3) = 4n − 6 GMS gates. The total GMS count required to implement an arbitrary stabilizer unitary is thus 2(2n − 3) + 2(4n − 6) = 12n − 18.
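The counting argument for a triangular -C- stage can be illustrated with a small classical simulation. This is a sketch under stated assumptions: the ordering of fan-outs (controls processed from the second qubit to the last, each targeting only earlier qubits) is one possible choice and is not taken from [19], and the function names are hypothetical.

```python
from itertools import product

def fanouts_for_upper_triangular(a):
    """Split a unit upper triangular GF(2) matrix into fan-out gates.

    Fan-out j uses qubit j as the control and targets every i < j with a[i][j] = 1.
    Processing j in increasing order guarantees qubit j still holds its input value.
    """
    n = len(a)
    gates = []
    for j in range(1, n):
        targets = [i for i in range(j) if a[i][j]]
        if targets:
            gates.append((j, targets))
    return gates

def apply_fanouts(gates, bits):
    x = list(bits)
    for control, targets in gates:
        for t in targets:
            x[t] ^= x[control]
    return x

def gms_count(gates):
    """One GMS for a single CNOT, two GMS for a larger fan-out."""
    return sum(1 if len(targets) == 1 else 2 for _, targets in gates)

def stabilizer_gms_count(n):
    """Two triangular stages plus two LU-decomposed general stages."""
    return 2 * (2 * n - 3) + 2 * (4 * n - 6)

n = 4
A = [[1 if j >= i else 0 for j in range(n)] for i in range(n)]  # densest case
gates = fanouts_for_upper_triangular(A)

implements_A = all(
    apply_fanouts(gates, x) == [sum(A[i][j] * x[j] for j in range(n)) % 2 for i in range(n)]
    for x in product([0, 1], repeat=n)
)
```

For the all-ones upper triangular matrix the count reaches the bound 2n−3; sparser matrices need fewer gates.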
The number of GMS gates required to implement an arbitrary stabilizer unitary, 12n−18, is significantly less than the $\Omega(n^2/\log n)$ two-qubit CNOT gates required to accomplish the same [20]. The comparison, however, is not fair. This is because the number of different functions computed by the CNOT gates spanning n qubits is (n − 1)n, whereas the number of the GMS gates with the fixed rotation angle of π/2 and an arbitrary set of inputs is $2^n$, which is asymptotically greater than (n − 1)n. A fairer approach is to compare the GMS count of 12n−18 to the CNOT depth of 14n−4 over the Linear Nearest Neighbor (LNN) architecture [19]. This is because the number of functions computed by depth-1 CNOT circuits over LNN is given by the formula $\frac{2^{n+1}+(-1)^n}{3}$, and this number is similar to $2^n$. The comparison reveals that our GMS-based construction still gives a slight advantage.

Figure 4: Number excitation operator, typically used in quantum chemistry simulations [21,22]. (a) shows the conventional appearance readily available in the literature (see for instance [22] or the supplementary material in [21]). (b) shows an equivalent circuit, where all constituent CNOT gates share a single target qubit.

Figure 5: GMS-based implementation of the number excitation operator. An n-qubit case requires only 2 GMS gates, in addition to 2(n − 1) RY gates, 2 RX gates, and 1 RZ gate. In the specific case of n = 5 that is illustrated here, the RX gates vanish, since RX(−2π) = RX(2π) ≡ Id, where "≡" denotes equality up to a global phase.
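The closed-form count of depth-1 CNOT layers over LNN can be cross-checked by brute force. The sketch below assumes that a depth-1 layer is any set of CNOTs on disjoint nearest-neighbor pairs, each in either orientation, with the empty layer included; the function names are hypothetical.

```python
from itertools import product

def count_depth1_lnn_layers(n):
    """Enumerate per-edge choices on a line of n qubits.

    For each of the n-1 nearest-neighbor edges: 0 = no gate, 1 or 2 = CNOT in
    one of the two orientations. Adjacent edges share a qubit, so they cannot
    both carry a gate in a depth-1 layer.
    """
    count = 0
    for choice in product(range(3), repeat=max(n - 1, 0)):
        if all(not (choice[k] and choice[k + 1]) for k in range(len(choice) - 1)):
            count += 1
    return count

def closed_form(n):
    return (2 ** (n + 1) + (-1) ** n) // 3

counts_match = all(count_depth1_lnn_layers(n) == closed_form(n) for n in range(1, 10))
```

Both counts follow the recurrence a(n) = a(n−1) + 2a(n−2), which is why the result grows like $2^n$.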

Number Excitation Operator
We next consider an instance that arises naturally in quantum chemistry simulations [21,22]. Specifically, Fig. 4(a) shows a typical 5-qubit example of the number excitation operator [21,22]. The number excitation operator relies on a V-shaped CNOT pattern, where the desired effect is obtained through applying a Z rotation to the linear combination of qubits, $x_1 \oplus x_2 \oplus \ldots \oplus x_n$ (illustrated for n = 5 in Fig. 4(a)). It is straightforward to show that an equivalent circuit to the one shown in Fig. 4(a) may be obtained by moving the targets of all of the CNOT gates in Fig. 4(a) to the last qubit, i.e., Fig. 4(a) is equivalent to Fig. 4(b). The transformed structure of the CNOTs in Fig. 4(b) may now be efficiently implemented using GMS gates, as shown in Fig. 5. In particular, the construction relies on 2 GMS gates, along with a set of correcting local single-qubit gates: 2(n − 1) RY gates, 2 RX gates, and 1 RZ gate. Compared to the 2(n − 1) CNOT gates, and thus 2(n − 1) local XX gates, that are required for the realization of the original construction using local entangling gates, the GMS-enabled construction reduces the entangling gate count from linear in n to a constant.
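The equivalence between the V-shaped CNOT ladder and the shared-target arrangement can be verified on the CNOT-and-RZ core of the circuit. The sketch below is an illustration only: it omits any single-qubit basis-change gates that the full operator of Fig. 4 may contain, and the helper names are hypothetical.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)

def on_qubits(n, ops):
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def cnot(n, c, t):
    """Permutation matrix of a CNOT; qubit 0 is the most significant bit."""
    u = np.zeros((2**n, 2**n), dtype=complex)
    for k in range(2**n):
        bit_c = (k >> (n - 1 - c)) & 1
        u[k ^ (bit_c << (n - 1 - t)), k] = 1
    return u

n, theta = 4, 0.7  # arbitrary size and angle
last = n - 1
rz_last = on_qubits(n, {last: rz(theta)})

# Ladder: CNOT(0->1), CNOT(1->2), ..., so the last qubit accumulates the parity.
ladder = np.eye(2**n, dtype=complex)
for q in range(n - 1):
    ladder = cnot(n, q, q + 1) @ ladder

# Shared target: every qubit is copied directly onto the last one.
shared = np.eye(2**n, dtype=complex)
for q in range(n - 1):
    shared = cnot(n, q, last) @ shared

v_shape = ladder.conj().T @ rz_last @ ladder
same_target = shared.conj().T @ rz_last @ shared

# Both should rotate the phase according to the parity of the input bits.
parity_phase = np.diag([
    np.exp(-1j * theta / 2 * (1 - 2 * (bin(k).count("1") % 2)))
    for k in range(2**n)
])
circuits_agree = np.allclose(v_shape, same_target) and np.allclose(v_shape, parity_phase)
```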

Toffoli-n
We next consider multiply-controlled NOT gates, also known as the multiple-control Toffoli gates. We first focus on the 3-qubit (Toffoli-3) and the 4-qubit (Toffoli-4) cases.
The efficient use of GMS gates in the case of multiply-controlled NOT (Toffoli) has previously been shown for the Toffoli-3 [7, Figure 2] and Toffoli-4 [14, Equation (9)] gates. In particular, reference [14] presents a GMS-based circuit decomposition for a triply-controlled Z gate, equivalent to the Toffoli-4 through conjugating the target by a pair of Hadamard gates. For convenience, we show the respective constructions in Fig. 6 (a) and (b). One may observe that in the case of the Toffoli-3 only 3 GMS gates are needed, compared to 5 local two-qubit gates [16,23] and, for the Toffoli-4, only 7 GMS gates are needed, compared to 11 local two-qubit gates [23]. We note that, unlike in [23], the 7-GMS Toffoli-4 construction of [14] furthermore does not require an ancillary qubit.
In pursuit of further gate count reduction, we consider employing ancillary qubits in our GMS-based construction of the n-qubit Toffoli gate. The employment of ancillary qubits to reduce the gate counts in constructing a Toffoli-n gate has in fact been extensively investigated in [24,25], but in the context of relying on the local entangling gates. Using ancillae turns out to be helpful in the case of quantum circuits employing the GMS gate, as well. In the following, we show a step-by-step construction of the GMS-based ancilla-aided Toffoli-4 gate (we report no improvements to the Toffoli-3 circuit).
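We start with a simple observation that the Toffoli-4 gate is equivalent to the CCCZ gate up to the conjugation of the target by the Hadamard gates, Toffoli-4 = (I⊗I⊗I⊗H) CCCZ (I⊗I⊗I⊗H). The CCCZ gate applies the phase $(-1)^{abcd}$ to the basis state |abcd⟩, and for Boolean a, b, c, d we have $8abcd = a+b+c+d - (a\oplus b) - (a\oplus c) - (a\oplus d) - (b\oplus c) - (b\oplus d) - (c\oplus d) + (a\oplus b\oplus c) + (a\oplus b\oplus d) + (a\oplus c\oplus d) + (b\oplus c\oplus d) - (a\oplus b\oplus c\oplus d)$, so that $(-1)^{abcd} = e^{i\frac{\pi}{8}\cdot 8abcd}$. This function can thus be implemented as a CNOT and RZ(±π/8) circuit by applying the Z rotation with the positive sign to the linear terms {a, b, c, d, a⊕b⊕c, a⊕b⊕d, a⊕c⊕d, b⊕c⊕d} and the Z rotation with the negative sign to the terms {a⊕b, a⊕c, a⊕d, b⊕c, b⊕d, c⊕d, a⊕b⊕c⊕d}, with each such linear term obtainable by the CNOT gates. Next, we show how to induce all necessary CNOT gates to allow the application of the necessary RZ gates, using only a few GMS gates.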
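The sign pattern of the 15 linear terms can be checked exhaustively. The sketch below assumes that each RZ(±π/8) contributes a relative phase of ±π/8 whenever its linear term evaluates to 1 (global phases dropped); the function name `phase_exponent` is hypothetical.

```python
import cmath
from itertools import combinations, product

def phase_exponent(bits):
    """Sum of XORs over all nonempty subsets, with sign + for odd-size subsets
    (lengths 1 and 3) and sign - for even-size subsets (lengths 2 and 4)."""
    total = 0
    for size in range(1, len(bits) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(bits, size):
            total += sign * (sum(subset) % 2)
    return total

omega16 = cmath.exp(1j * cmath.pi / 8)  # one unit of the exponent = phase pi/8

exponent_is_8abcd = all(
    phase_exponent(x) == 8 * x[0] * x[1] * x[2] * x[3]
    for x in product([0, 1], repeat=4)
)
phases_are_cccz = all(
    abs(omega16 ** phase_exponent(x) - (-1) ** (x[0] * x[1] * x[2] * x[3])) < 1e-12
    for x in product([0, 1], repeat=4)
)
```

Only the input 1111 accumulates a nonzero exponent (eight units of π/8, i.e. the phase −1), which is exactly the action of CCCZ.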
We first note that the linear functions with a single literal each, {a, b, c, d}, are the original qubits provided to us on the input side of the circuit. Therefore, all length-1 linear terms may be implemented by simply applying RZ(π/8) single-qubit rotation gates to each respective qubit. By doing so, we construct the circuit using no GMS gate and implementing the transformation $|abcd\rangle \mapsto \omega_{16}^{a+b+c+d}|abcd\rangle$, where $\omega_{16} = e^{2\pi i/16}$. We next have to find how to apply as few GMS gates as possible in a way that enables the remaining 11 Z rotations.

Figure 6: GMS-based implementation of (a) Toffoli-3 and (b) Toffoli-4 without ancillary qubits.
To apply the Z rotation to the length-4 linear term, a⊕b⊕c⊕d, we introduce an ancillary qubit in the |0⟩ state, copy all qubits into it using a set of four CNOTs sharing the target, and then uncompute those CNOTs. This allows us to apply one new Z rotation between the two layers of the CNOT gates, and the number of the GMS gates required to implement this construction is two, see Fig. 7. The circuit constructed thus far performs the transformation $|abcd\rangle \mapsto \omega_{16}^{a+b+c+d-(a\oplus b\oplus c\oplus d)}|abcd\rangle$. Observe that each of the two sets of the CNOT gates on the left hand side of the circuit equality in Fig. 7 requires two GMS gates to be implemented (both are fan-in gates, considered earlier), for a total of four GMS gates, two GMS5 and two GMS4. However, it turns out that the two GMS4 can be chosen with opposite signs and, since they commute with all other gates we are about to introduce in the middle, they cancel out. This means that only two GMS5 gates are needed in our construction.
We next need to apply the remaining 10 Z rotations to obtain the desired CCCZ gate. To do so, consider the following circuit identity on two qubits x and y:

$$CNOT_{x\to y}\; RZ_y(\theta)\; CNOT_{x\to y} \equiv (H\otimes H)\; XX(\theta)\; (H\otimes H),$$

where the left hand side, trivially, performs a phase rotation by the angle θ applied to the linear function x ⊕ y and the right hand side reports an equivalent circuit based on the XX gate, up to a global phase. We can generalize this construction to n qubits by replacing the XX gate with the GMS on the right hand side, while conjugating by a layer of Hadamards before and after. What this accomplishes is the application of phases to EXORs of all pairs of participating variables, as described by the circuit on the left hand side. Formally,

$$\prod_{i<j} CNOT_{i\to j}\; RZ_j(\theta)\; CNOT_{i\to j} \equiv H^{\otimes n}\; GMS_n(\theta)\; H^{\otimes n}.$$

We next apply the above identity over GMS to our ongoing construction of the Toffoli-4 gate. To obtain length-3 linear functions, we may insert a Hadamard-conjugated GMS5(π/8) in the middle of our current circuit (Fig. 7).
The effect this has is the introduction of the phase π/8 applied to all pairs of qubits participating in the construction. In the middle of the circuit the qubits we have are described by the linear functions {a, b, c, d, a⊕b⊕c⊕d}. Thus, the set of EXOR pairs is {a⊕b, a⊕c, a⊕d, b⊕c, b⊕d, c⊕d, a⊕b⊕c, a⊕b⊕d, a⊕c⊕d, b⊕c⊕d}. This means that the overall action performed by the circuit with 3 GMS gates can be written as

$$|abcd\rangle \mapsto \omega_{16}^{\,a+b+c+d-(a\oplus b\oplus c\oplus d)+(a\oplus b)+(a\oplus c)+(a\oplus d)+(b\oplus c)+(b\oplus d)+(c\oplus d)+(a\oplus b\oplus c)+(a\oplus b\oplus d)+(a\oplus c\oplus d)+(b\oplus c\oplus d)}\,|abcd\rangle.$$
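Observe that the signs of the length-2 terms are not the ones we wanted to have. This may, however, be corrected by applying a Hadamard-conjugated GMS4(−π/4) to the qubits {a, b, c, d}, resulting in the phase correction by $\omega_{16}^{-2[(a\oplus b)+(a\oplus c)+(a\oplus d)+(b\oplus c)+(b\oplus d)+(c\oplus d)]}$, which flips the sign of each length-2 term. The desired CCCZ gate is thus accomplished as a 4-GMS circuit shown in Fig. 8.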
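The step-by-step construction above can be simulated on five qubits (four data qubits and one ancilla). The sketch below is a hedged illustration rather than a transcription of Fig. 8: the fan-in stages are modeled directly as CNOT matrices instead of being expanded into GMS gates, the convention $XX(\chi)=\exp(-i\chi\sigma_x\sigma_x/2)$ is assumed, and all helper names are hypothetical.

```python
import numpy as np
from functools import reduce
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def on_qubits(n, ops):
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def gms(n, qubits, chi):
    """Product of exp(-i*chi/2 * X_i X_j) over all pairs in `qubits` (assumed convention)."""
    u = np.eye(2**n, dtype=complex)
    for i, j in combinations(qubits, 2):
        xixj = on_qubits(n, {i: X, j: X})
        u = u @ (np.cos(chi / 2) * np.eye(2**n) - 1j * np.sin(chi / 2) * xixj)
    return u

def cnot(n, c, t):
    u = np.zeros((2**n, 2**n), dtype=complex)
    for k in range(2**n):
        bit_c = (k >> (n - 1 - c)) & 1
        u[k ^ (bit_c << (n - 1 - t)), k] = 1
    return u

n, data, anc = 5, [0, 1, 2, 3], 4
h_all = on_qubits(n, {q: H for q in range(n)})
h_data = on_qubits(n, {q: H for q in data})

singles = on_qubits(n, {q: rz(np.pi / 8) for q in data})    # +a +b +c +d
fan_in = reduce(lambda u, q: cnot(n, q, anc) @ u, data, np.eye(2**n))
quad = on_qubits(n, {anc: rz(-np.pi / 8)})                  # -(a^b^c^d)
pairs_and_triples = h_all @ gms(n, range(n), np.pi / 8) @ h_all
fix_pairs = h_data @ gms(n, data, -np.pi / 4) @ h_data      # flips the pair signs

# Rightmost factor acts first; the fan-in is its own inverse.
U = fix_pairs @ fan_in @ pairs_and_triples @ quad @ fan_in @ singles

# Look at inputs |abcd>|0>: the ancilla is the least significant bit.
cols = [2 * k for k in range(16)]
diag = np.array([U[c, c] for c in cols])
expected = np.array([-1 if k == 15 else 1 for k in range(16)])
is_cccz = bool(np.allclose(np.abs(diag), 1) and np.allclose(diag / diag[0], expected))
```

Because U is unitary, a unit-modulus diagonal entry in each checked column also confirms that the ancilla is returned to |0⟩.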
Using a similar approach, we can obtain a 3-GMS circuit implementing the CCZ gate on qubits a, b, and c. It is different from those reported in [7,8].
In the remaining part of this subsection, we briefly outline an implementation of the n-qubit Toffoli gate using 3n−9 GMS gates and $\frac{n-2}{2}$ ancillae for even n, and 3n−6 GMS gates and $\frac{n-1}{2}$ ancillae for odd n, n ≥ 6. This beats the 6n−12 local CNOT gates result of [25], while using a comparable number of ancillae. Our construction relies on nesting efficient 3-GMS Toffoli-4 gates (shown in Fig. 11; we describe how to get to this construction in the next section), such as illustrated in Fig. 9, to obtain larger multiple-control Toffoli gates. For odd n, one pair of 3-GMS Toffoli-3 gates needs to be used (equivalently, a set of two relative-phase Toffoli-3 gates, requiring 3 local entangling operations each [25]), explaining the difference between gate counts for odd and even n.
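The nesting idea can be illustrated at the level of classical reversible logic. The sketch below shows one standard compute/use/uncompute arrangement of three Toffoli-4 gates that yields a Toffoli-6 with a single clean ancilla; it is an assumption that this matches the arrangement in Fig. 9, and the function names are hypothetical. The count functions simply restate the formulas from the text.

```python
from itertools import product

def toffoli(bits, controls, target):
    """Flip bits[target] in place if all control bits are 1."""
    if all(bits[c] for c in controls):
        bits[target] ^= 1

def toffoli6_from_toffoli4(controls, target_bit):
    """Five controls (indices 0..4), target (index 5), clean ancilla (index 6)."""
    bits = list(controls) + [target_bit, 0]
    toffoli(bits, [0, 1, 2], 6)   # compute AND of the first three controls
    toffoli(bits, [6, 3, 4], 5)   # combine with the remaining two controls
    toffoli(bits, [0, 1, 2], 6)   # uncompute the ancilla
    return bits

def gms_count(n):
    """GMS counts quoted in the text for the n-qubit Toffoli, n >= 6."""
    return 3 * n - 9 if n % 2 == 0 else 3 * n - 6

def local_cnot_count(n):
    """Local CNOT count of [25] quoted in the text."""
    return 6 * n - 12

nesting_correct = all(
    toffoli6_from_toffoli4(c, t) == list(c) + [t ^ int(all(c)), 0]
    for c in product([0, 1], repeat=5)
    for t in (0, 1)
)
```

With three GMS gates per Toffoli-4, the three nested gates account for the 3n−9 = 9 GMS gates quoted for n = 6.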

GMS with other parameters
So far, we focused on using GMS gates with all equal rotation angles $\chi_{ij}$, and an arbitrarily selectable subset of qubits those global gates apply to. Such gates may not always be possible to obtain directly in an experiment. Indeed, one possible experimental setup [7] allows for the application of global Mølmer-Sørensen gates affecting all n qubits participating in the computation. As such, an (n−1)-qubit GMS gate may not be directly available on an n-qubit system. To circumvent this and enable smaller GMS gates, we propose the following. First, start with the identity

$$XX_{kj}(\chi)\; RZ_k(\pi)\; XX_{kj}(\chi) = RZ_k(\pi).$$

Using the identity recursively, we arrive at the conclusion that the RZ(π) gate effects a spin echo on the identical XX gates to its left and right, provided that the qubit that the RZ(π) applies to also participates in the XX gates, and as a result cancels out the respective XX interactions. Based on this property, Fig. 10 illustrates how to obtain an (n−1)-qubit GMS gate out of two n-qubit GMS gates. The construction can be used iteratively to obtain global gates spanning arbitrarily selectable subsets of qubits, enabling all constructions described in Section 3 in the case when only the maximal size GMS gate is available. In fact, this inspired the construction of a more efficient Toffoli-4 implementation. Specifically, the Toffoli-4 gate may be obtained using only 3 maximal size GMS gates on a 5-qubit machine. This is because substituting GMS4(−π/4) (Fig. 10) into the GMS-enabled implementation of the Toffoli-4 gate (Fig. 8) results in a circuit over 5 GMS5 gates; however, the GMS5(π/8) used in Fig. 8 meets the newly introduced GMS5(−π/8) and they cancel out, reducing the GMS gate count to 3. This improved construction is illustrated in Fig. 11.

Figure 9: Ancilla-aided construction of the Toffoli-6 using a set of three Toffoli-4, each of which is constructible using three GMS gates.
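The spin-echo reduction from an n-qubit to an (n−1)-qubit GMS gate can be checked numerically. The sketch below assumes the convention $XX(\chi)=\exp(-i\chi\sigma_x\sigma_x/2)$ and splits the target angle χ evenly between the two full-size GMS gates; this half-angle split is inferred from the cancellation of GMS5(±π/8) described in the text, and the helper names are hypothetical.

```python
import numpy as np
from functools import reduce
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def on_qubits(n, ops):
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def gms(n, qubits, chi):
    """Product of exp(-i*chi/2 * X_i X_j) over all pairs in `qubits` (assumed convention)."""
    u = np.eye(2**n, dtype=complex)
    for i, j in combinations(qubits, 2):
        xixj = on_qubits(n, {i: X, j: X})
        u = u @ (np.cos(chi / 2) * np.eye(2**n) - 1j * np.sin(chi / 2) * xixj)
    return u

n, chi, k = 4, 0.9, 3            # drop qubit k from the global gate
others = [q for q in range(n) if q != k]
full = gms(n, range(n), chi / 2)  # maximal-size GMS at half the target angle

# RZ(pi) on qubit k flips the sign of every XX term that involves qubit k,
# so those couplings cancel between the two pulses while the rest add up.
echo = full @ on_qubits(n, {k: rz(np.pi)}) @ full @ on_qubits(n, {k: rz(-np.pi)})
reduced = gms(n, others, chi)
echo_works = np.allclose(echo, reduced)
```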
Our optimized 3-GMS Toffoli-4 construction relies on notably fewer entangling pulses compared to 11 two-qubit gates in [23] or 7 GMS gates in [14].
The signs of $\chi_{ij}$ may furthermore be determined by the experiment, such as is the case in [1], disallowing their uniform assignment. It is, however, expected [11] that future trapped-ion experiments will feature a fully controllable sign of the interaction, and this will not be an issue. Should the signs be uncontrollable, this provides an additional challenge, since the constructions in Figs. 2 and 4 rely on the ability to apply GMS gates with the inverted sign of the rotation angle. In the case when the signs cannot be controlled individually, the inverse GMS gate can, in fact, be induced by single-qubit corrections applied to the GMS gates with uncontrollable parameter signs, as follows.

Figure 11: Optimized implementation of the CCCZ gate using three GMS gates. The right hand side shows two Hadamard gates, removing which transforms the CCCZ gate into the Toffoli-4.
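First, start with the following identity

$$XX_{ij}^{\dagger}(\chi) \equiv RX_i(\pi)\; RX_j(\pi)\; XX_{ij}(\pi-\chi),$$

where $RX_k$ denotes the RX gate applied to the k-th qubit. Using this identity $n(n-1)/2$ times allows one to construct the GMS† using only one GMS gate, as follows:

$$GMS^{\dagger}(\chi) \equiv \prod_{i=1}^{n} RX_i((n-1)\pi)\; GMS(\pi-\chi).$$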
In other words, whenever GMS† is not directly available due to the inability to invert the sign of the interactions, i.e., χ ∈ [0, π], the GMS† may still be constructed with the use of a single GMS gate by taking the parameter value of (π − χ) ∈ [0, π], and performing single-qubit corrections. This enables the constructions from Figs. 2 and 4 in the scenario with uncontrollable signs of the individual interactions within GMS.
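The inversion of a GMS gate using a single positive-angle GMS gate and RX corrections can be checked numerically. The sketch below assumes the conventions $XX(\chi)=\exp(-i\chi\sigma_x\sigma_x/2)$ and RX(θ) = exp(−iσxθ/2), and compares the two sides up to a global phase; the helper names are hypothetical.

```python
import numpy as np
from functools import reduce
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def on_qubits(n, ops):
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def rx(theta):
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X

def gms(n, qubits, chi):
    """Product of exp(-i*chi/2 * X_i X_j) over all pairs in `qubits` (assumed convention)."""
    u = np.eye(2**n, dtype=complex)
    for i, j in combinations(qubits, 2):
        xixj = on_qubits(n, {i: X, j: X})
        u = u @ (np.cos(chi / 2) * np.eye(2**n) - 1j * np.sin(chi / 2) * xixj)
    return u

def same_up_to_phase(a, b):
    k = np.argmax(np.abs(b))
    phase = a.flat[k] / b.flat[k]
    return bool(np.isclose(abs(phase), 1) and np.allclose(a, phase * b))

def inverse_from_positive_angle(n, chi):
    """GMS(pi - chi) followed by RX((n-1)*pi) on every qubit."""
    corrections = on_qubits(n, {q: rx((n - 1) * np.pi) for q in range(n)})
    return corrections @ gms(n, range(n), np.pi - chi)

chi = 0.6  # arbitrary angle in [0, pi]
inverse_ok = all(
    same_up_to_phase(inverse_from_positive_angle(n, chi), gms(n, range(n), -chi))
    for n in (2, 3, 4)
)
```

Each qubit takes part in n−1 pairs, which is why the per-qubit correction angle is (n−1)π.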

Quantum Fourier Arithmetic
Previously, we considered the case where all $|\chi_{ij}|$ are constant, regardless of the choice of i and j. However, it is possible that $|\chi_{ij}|$ drops off as a function of the distance |i − j|. This may be natural given that physical interaction strengths typically scale with the distance between qubits [26].
In the case when the strength of the interaction falls off exponentially fast, as $\chi_{ij} \sim 2^{-|i-j|}$, it can be easily shown that the quantum Fourier transform (QFT) may be constructed efficiently using such global pulses. Specifically, the efficient implementation uses just 2n global pulses, as opposed to $n(n-1)/2$ local two-qubit gates, for a QFT of size n. This also enables an implementation of the quantum Fourier adder (QFA) [27] with only a linear number of global gates, rather than superlinear, making the Fourier-based arithmetic circuits more competitive than the Boolean counterparts [28]. Figs. 12 (a) and (b) show the QFT and QFA circuits. Fig. 12(c) illustrates how the GMS gates may be used to deliver the reduced gate count scaling in constructing the Fourier circuits.
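The reason an exponentially decaying coupling profile suits the QFT is that, in the textbook QFT circuit, the controlled-phase angle between qubits i and j is π/2^{|i−j|}. The sketch below builds that textbook circuit from Hadamards and controlled-phase gates and compares it with the discrete Fourier transform matrix; it does not reproduce the GMS-based construction of Fig. 12(c), and the helper names are hypothetical.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def on_qubits(n, ops):
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def cphase(n, i, j, theta):
    """Diagonal gate adding phase theta when qubits i and j are both 1."""
    d = np.ones(2**n, dtype=complex)
    for k in range(2**n):
        if (k >> (n - 1 - i)) & 1 and (k >> (n - 1 - j)) & 1:
            d[k] = np.exp(1j * theta)
    return np.diag(d)

def qft_circuit(n):
    """Textbook QFT: returns the unitary and the angle used for each qubit pair."""
    u = np.eye(2**n, dtype=complex)
    angles = {}
    for j in range(n):
        u = on_qubits(n, {j: H}) @ u
        for m in range(j + 1, n):
            theta = np.pi / 2 ** (m - j)   # depends only on the distance m - j
            angles[(j, m)] = theta
            u = cphase(n, j, m, theta) @ u
    # Final bit reversal of the qubit order.
    rev = np.zeros((2**n, 2**n))
    for k in range(2**n):
        rev[int(format(k, f"0{n}b")[::-1], 2), k] = 1
    return rev @ u, angles

n = 3
U, angles = qft_circuit(n)
N = 2**n
dft = np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
matches_dft = np.allclose(U, dft)
num_two_qubit_gates = len(angles)   # n(n-1)/2 local controlled-phase gates
```

The 2n global pulses quoted in the text fall below the n(n−1)/2 local gates once n exceeds 5.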
Unfortunately, the exponential drop off in the strength of the interaction appears to be unnatural. Instead, a decrease in the strength of the interaction as a power of the distance d, as $1/d^p$ with p ∈ [0, 3] [11, 26], seems more realistic. This leads to the question of how well the desired exponential drop off can be approximated with such physical-level global gates. Reference [29] provides an answer to this question. Specifically, the quality of such Fourier circuits (QFT, QFA) is well preserved even when we alter the fundamental form of the signal strength. For instance [29], when one replaces the exponential drop off, $\pi/2^d$, where d is the distance between qubits, with a power law hierarchy, such as $\pi/d^p$, one may choose the power p of the drop off power law so as to obtain the best possible quality of approximation. In fact, it has been calculated numerically that the power $p_{opt} = 1.4$ renders the maximum quality for the set of parameters considered in [29]. Fortunately, p = 1.4 is within the limits p ∈ [0, 3] [11,26].
Motivated by the previous discussion, we next propose an extended method of the power law approximation of the exponential drop off that is useful for quantum Fourier arithmetic circuits. Specifically, we propose using a few GMS gates to approximate a single stage of the exponentially dropping interaction strength (see Fig. 12(c)). This is in contrast to a simple replacement of the single-stage exponential drop off with a single-stage power law drop off, as was done in [29]. In particular, we numerically approximate the exponential drop off with a set of m power law drop offs, with coefficients $b_i$ and powers $p_i$, i = 1, ..., m, as in (3). Since the circuit realization of each power law requires two GMS gates, the approximation by m power laws amounts to a cost of 2m GMS gates.
Our goal is to numerically determine a set of $b_i$ and $p_i$ such that the term (3) minimizes the approximation error, in order to best match the exact exponential drop off, as seen in the quantum Fourier arithmetic circuits. A straightforward generalization of the crude, yet efficient analytical work shown in [29] reveals that the fidelity of the QFT may be approximated by the term (4), meaning we obtain the best fidelity by minimizing the value of the sum in the exponent in (4). Minimizing the exponent in (4) analytically is a non-trivial task, and we thus resort to a numerical approximation. In particular, we restricted the search to $|b_i| \leq 0.6$ and $1.5 \leq p_i \leq 4$, closely following what may be achievable in the lab. We find that for m = 2 the selection of the values $b_1 = 0.4$, $b_2 = -0.5$, $p_1 = 2.5$, $p_2 = 3.4$ results in the minimal exponent in (4), which is consistent with the peaks in fidelity observed in Fig. 9 for the sample cases of n = 10, 12, and 14 qubits. The peak fidelity $F_{peak} \approx 1$ demonstrates a high quality of the double-power approximation of the exponential drop off, making the efficient GMS-based construction an attractive alternative in experiments.

Conclusion
In this paper, we studied the efficient use of a global entangling operator in realizing quantum circuitry of practical interest. Using various versions of the global entangling operator, we demonstrated the advantage in implementing certain kinds of stabilizer circuits, number excitation operator, Toffoli-4 gate, Toffoli-n gate, Quantum Fourier Transformation, and Quantum Fourier Adder circuits. In each of the above, our constructions outperform best known circuitry in the scenario when the control is given by the two-qubit local addressable gates. Our conclusion is as follows: we believe that the control by a global entangling gate could be a helpful complement to the control by addressable two-qubit local gates.