Polynomial T-depth Quantum Solvability of Noisy Binary Linear Problem: From Quantum-Sample Preparation to Main Computation

The noisy binary linear problem (NBLP) is known to be computationally hard, and it therefore offers primitives for post-quantum cryptography. An efficient quantum NBLP algorithm that exhibits polynomial quantum-sample and time complexities has recently been proposed. However, the algorithm requires a large number of samples to be loaded in a highly entangled state, and it is unclear whether such a precondition for the quantum speedup can be met efficiently. Here, we present a complete analysis of the quantum solvability of the NBLP by considering the entire algorithm process, namely from the preparation of the quantum sample to the main computation. By assuming that the algorithm runs on "fault-tolerant" quantum circuitry, we introduce a reasonable measure of the computational time cost. The measure is defined in terms of the overall number of $T$ gate layers, referred to as the $T$-depth complexity. We show that the cost of solving the NBLP can be polynomial in the problem size, at the expense of an exponentially increasing number of logical qubits.


Introduction
Owing to their simplicity, linear problems have been studied in various applications in science and engineering [1,2]. However, if noise is added, the problems become exponentially difficult to solve. One such challenging problem, called the noisy binary linear problem (NBLP), is defined as follows: Given a set $S = \{(a, b_a)\}$ with sampled inputs $a = a_0 a_1 \cdots a_{n-1} \in \{0,1\}^n$ and outputs $b_a = a \cdot s + e_a \pmod 2 \in \{0,1\}$, the problem is to determine the 'secret' structure $s = s_0 s_1 \cdots s_{n-1} \in \{0,1\}^n$ for all samples in the presence of noise $e_a \sim B(\eta)$, where $B(\eta)$ is a Bernoulli distribution (specifically, $e_a = 0$ with probability $\frac{1}{2} + \eta$ and $e_a = 1$ with probability $\frac{1}{2} - \eta$) and $\eta \in (0, \frac{1}{2}]$. This problem is difficult to solve, and no classical algorithm is known with better than sub-exponential sample/time complexities [3]. The problem has thus served as a useful primitive in modern post-quantum cryptography [4]. Recently, Cross et al. [5] and Grilo et al. [6] have opened the possibility that quantum computation (QC) could solve a class of NBLPs by exponentially reducing the sample/time complexities. The key feature of the proposed algorithms is the use of a quantum-superposed sample, which is defined as
$$|\psi\rangle = \frac{1}{\sqrt{|R|}} \sum_{(a, b_a) \in R} |(a, b_a)\rangle, \qquad (1)$$
where $|(a, b_a)\rangle = |a\rangle|b_a\rangle$, $R \subseteq S$ is a set of arbitrarily chosen samples, and $|R|$ is the cardinality of $R$. The algorithm repeatedly loads, processes, and tests the quantum sample $|\psi\rangle$ until the solution $s$ is confirmed. A crucial condition for achieving a quantum speedup is that the number of samples $(a, b_a)$ in $|\psi\rangle$ should scale exponentially with $n$; in other words, $|R|$ should be $O(2^n)$. A conventional approach has hence been to employ a black-box operation (as in Eq. (1)), often called an oracle, for accessing the quantum sample. However, when $|R|$ is large, such an approach is not feasible because it would be costly and difficult to prepare and use a (largely-)superposed quantum sample [7].
In the worst case, such an approach could offset the quantum speedup achieved [8]. Therefore, although the fullest use of the quantum sample to efficiently solve the NBLP is possible in QC, it is not clear whether the hardness of the NBLP can be completely overcome. Accordingly, the security level of post-quantum cryptography has not been determined so far.
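To make the problem statement concrete, the following sketch draws classical NBLP samples $(a, b_a)$ for a chosen secret $s$. It is only an illustration of the definition above: the function names, the uniform sampling of the inputs, and the seeded generator are assumptions introduced here and are not part of the analysed algorithm.

```python
import random


def dot_mod2(a, s):
    """Inner product a.s over GF(2) for two equal-length bit tuples."""
    return sum(x & y for x, y in zip(a, s)) % 2


def nblp_samples(s, eta, count, rng):
    """Draw `count` pairs (a, b_a) with b_a = a.s + e_a (mod 2).

    The noise bit e_a is 0 with probability 1/2 + eta and 1 otherwise.
    Inputs a are drawn uniformly (an assumption of this sketch).
    """
    n = len(s)
    samples = []
    for _ in range(count):
        a = tuple(rng.randint(0, 1) for _ in range(n))
        e = 0 if rng.random() < 0.5 + eta else 1
        samples.append((a, (dot_mod2(a, s) + e) % 2))
    return samples


rng = random.Random(0)
secret = (1, 0, 1, 1)
noiseless = nblp_samples(secret, 0.5, 8, rng)    # eta = 1/2: e_a is always 0
noisy = nblp_samples(secret, 0.1, 2000, rng)     # labels agree ~60% of the time
agreement = sum(b == dot_mod2(a, secret) for a, b in noisy) / len(noisy)
```

With $\eta = \frac{1}{2}$ every label equals $a \cdot s$, whereas for small $\eta$ the labels are only weakly correlated with $a \cdot s$, which is what makes the classical problem hard.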
In this paper, we present a complete analysis of the quantum solvability of the NBLP. In the analysis, we consider two essential and independent processes of the algorithm. One is loading the samples ($\in S$) into a highly entangled state $|\psi\rangle$, which is denoted by $P_{|\psi\rangle}$. We design an optimal circuitry of $P_{|\psi\rangle}$ by parallelising the layers of some expensive (i.e., $T$) quantum gates at the fault-tolerant level. The other process is the application of the main algorithm kernel $P_A$, which is an optimised set of elementary gate operations. We analyse an extendable form of $P_A$, which can cover multiple problems, and apply the result to a binary setting. The studies on $P_{|\psi\rangle}$ and $P_A$ have thus far been performed independently in separate contexts. For example, a recipe for optimising a process similar to $P_{|\psi\rangle}$ has been studied for a fixed architecture [9], which is designed to localize the error propagation [10]. Likewise, the algorithms (i.e., $P_A$ in our case) have been analysed under a prior assumption of quantum-sample accessibility, hence separately and without any consideration of $P_{|\psi\rangle}$. However, $P_{|\psi\rangle}$ and $P_A$ are systematically combined to form the quantum NBLP algorithm, and they should be studied together in a single framework ‡. Thus, we analyse the number of repetitions of $P_{|\psi\rangle} + P_A$ required to determine the solution $s$ in consideration of the interconnection between $P_{|\psi\rangle}$ and $P_A$. In the analysis, the exponential reduction in the quantum-sample complexity is derived based on a crucial condition of the solution test which has been overlooked in previous works. This analysis allows us to account for the overall resource-consuming aspect, thereby facilitating a more comprehensive discussion on the quantum solvability of the NBLP.
The analysis is conducted in the context of fault-tolerant QC, and we consider the Clifford + $T$ library under the assumption that an effective quantum error-correction code is embedded. We minimise the overall number of gate layers, particularly those of $T$ or $T^\dagger$ gates; this number is called the $T$-depth complexity [12]. Because $T$ and $T^\dagger$ are much more costly to implement in a fault-tolerant manner than any Clifford gate, the $T$-depth has often been used as a measure of the computation time performance of a quantum algorithm [13,14,15]. In this context, we define a computation time performance, denoted by $C$, as follows:
$$C = S \left( T_{D,P_{|\psi\rangle}} + T_{D,P_A} \right), \qquad (2)$$
where $T_{D,P_{|\psi\rangle}}$ and $T_{D,P_A}$ are the $T$-depths of $P_{|\psi\rangle}$ and $P_A$, and $S$ denotes the number of repetitions of $P_{|\psi\rangle} + P_A$ for the completion of the algorithm. We analyse (I) the $T$-depth of $P_{|\psi\rangle}$, (II) the $T$-depth of $P_A$, and (III) the repetitions $S$, and finally evaluate $C$. We note (again) that the analyses of (I), (II), and (III) are interrelated, and the quantum solvability of the NBLP cannot be described through an individual analysis of (I), (II), or (III). By managing the issues that arise in such a comprehensive analysis (from the preparation of the quantum sample to the main computation), we prove that NBLPs are polynomially solvable in the context of the $T$-depth complexity, at the expense of an exponentially increasing number of logical qubits.

Algorithm overview.
We briefly outline the entire procedure of the quantum NBLP algorithm.
(A.1) A state $|\psi\rangle$ of a quantum sample is prepared in the form $|\psi\rangle = \frac{1}{\sqrt{2^q}} \sum_a |a\rangle|b_a\rangle$, where the summation $\sum_a$ runs over only the inputs in $S$, and $q \le n = \lceil \log_2 |S| \rceil$; $q$ can be regarded as the factor that determines the size of a quantum sample $|\psi\rangle$. Here, by the term "size of a quantum sample," we mean the number of (classical) pairs $(a, b_a)$ that are quantum-superposed in constituting $|\psi\rangle$. $\lceil x \rceil$ is the ceiling of $x$, i.e., the smallest integer greater than or equal to $x$.
‡ In this context, it was recently pointed out that, for the discussion of the quantum solvability of the noisy linear problem, not only the sample/time complexity but also the superposition size of the prepared quantum sample should be considered [11].
Figure 1. Schematic of the algorithm. The algorithm uses the superposed quantum sample defined in Eq. (1) and the kernel of the quantum Fourier transform (QFT). In the algorithm, a candidate solution is obtained and used to perform a majority-voting test (for details, see the main text, or Refs. [5,6]).
(A.2) Given a quantum sample $|\psi\rangle$, we run $P_A$. Formally, $P_A$ is the Bernstein-Vazirani (BV) kernel, given by
$$P_A = \mathrm{QFT}_d^{\otimes n+1}, \qquad (3)$$
where $\mathrm{QFT}_d$ is the $d$-dimensional quantum Fourier transform (QFT). With $d = 2$ and $\omega = e^{i 2\pi/d}$, its action on the quantum sample reads
$$P_A |\psi\rangle = \frac{1}{\sqrt{2^{q+n+1}}} \sum_a \sum_k \sum_{k^\star} \omega^{a \cdot k + b_a k^\star} |k\rangle|k^\star\rangle, \qquad (4)$$
where $k \in \{0,1\}^n$ and $k^\star \in \{0,1\}$.
(A.3) We measure the qubit state $|k^\star\rangle$. If we measure $k^\star = 0$, no information on $s$ can be retrieved from the remaining state, which is given by
$$\frac{1}{\sqrt{2^{q+n}}} \sum_a \sum_k \omega^{a \cdot k} |k\rangle, \qquad (5)$$
and a failure is returned. Otherwise (i.e., if $k^\star = 1$), we obtain the remaining state
$$\frac{1}{\sqrt{2^{q+n}}} \sum_a \sum_k \omega^{a \cdot (k + s) + e_a} |k\rangle. \qquad (6)$$
By measuring the state in Eq. (6), we obtain the candidate $k$. Here, the true solution $s$ is obtained (i.e., $k = s$) with probability $P(k = s | k^\star = 1)$; the most exact form of the probability is $P(k = s | k^\star = 1, \{e_a\})$, but we drop the dependence on $\{e_a\}$ because the errors occur completely at random.
(A.4) After repeating (A.1)-(A.3), we determine the most frequently measured $k$ as the true solution $s$, which is referred to as "majority voting." The condition of the majority voting is analysed later. Figure 1 shows a schematic of the algorithm. Additional mathematical details are provided in Appendix A.
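The steps (A.1)-(A.4) can be emulated classically for very small $n$. The sketch below is a brute-force state-vector calculation, not an efficient or fault-tolerant implementation: it computes the outcome distribution after applying Hadamard gates to all $n + 1$ qubits, discards the failure outcomes, and takes the most frequent candidate. The function names, the choice $q = n$, the single corrupted label, and the number of shots are all assumptions made for illustration.

```python
import itertools
import math
import random
from collections import Counter


def parity(u, v):
    """Inner product of two bit tuples modulo 2."""
    return sum(x & y for x, y in zip(u, v)) % 2


def output_probabilities(labels):
    """Outcome distribution after Hadamards on all n + 1 qubits.

    `labels` maps every superposed input a (a bit tuple) to its label b_a.
    Returns a dict {(k, k_star): probability}.
    """
    n = len(next(iter(labels)))
    norm = 1.0 / math.sqrt(len(labels) * 2 ** (n + 1))
    probs = {}
    for k in itertools.product((0, 1), repeat=n):
        for k_star in (0, 1):
            amp = sum((-1) ** ((parity(a, k) + b * k_star) % 2)
                      for a, b in labels.items())
            probs[(k, k_star)] = (amp * norm) ** 2
    return probs


def majority_vote(probs, shots, rng):
    """Sample `shots` runs, drop failures (k_star = 0), return the modal k."""
    outcomes = list(probs)
    draws = rng.choices(outcomes, weights=[probs[o] for o in outcomes], k=shots)
    votes = Counter(k for k, k_star in draws if k_star == 1)
    return votes.most_common(1)[0][0]


n = 3
secret = (1, 0, 1)
inputs = list(itertools.product((0, 1), repeat=n))    # q = n: all 2^n inputs
flipped = {(1, 1, 0)}                                 # one label hit by noise
labels = {a: (parity(a, secret) + (a in flipped)) % 2 for a in inputs}

probs = output_probabilities(labels)
guess = majority_vote(probs, shots=400, rng=random.Random(1))
```

In this toy instance the failure outcome $k^\star = 0$ carries total probability $\frac{1}{2}$, the pair ($k = s$, $k^\star = 1$) has probability $36/128$, and each wrong candidate has probability $4/128$, so the vote singles out the secret after a modest number of shots.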

Analysis (I): Resource counts for $P_{|\psi\rangle}$.
Let us consider a scenario where the data, denoted by $D_\gamma$, are addressed (or indexed) by the symbols $\gamma$. The addressing (or indexing) is arbitrary and the database (or table) of $D_\gamma$ is unsorted. We define the state of the entire data, say $|T\rangle$, as
$$|T\rangle = \bigotimes_{\gamma} |D_\gamma\rangle. \qquad (7)$$
Here, we note that $|T\rangle$ is not a superposition state, and each datum $|D_\gamma\rangle$ is deterministic (or equivalently, classical [16]). We also emphasise that the state $|T\rangle$ itself is not computable. Our approach for analysing $P_{|\psi\rangle}$ [or step (A.1)] is to adopt the following machinery:
$$|\gamma\rangle\,|\mathrm{null}(D)\rangle\,|T\rangle \;\to\; |\gamma\rangle\,|D_\gamma\rangle\,|T\rangle, \qquad (8)$$
where $|\gamma\rangle$ denotes the address and $|\mathrm{null}(D)\rangle$ is the null state into which the data brought from $|T\rangle$ are duplicated; hereafter, $R$ denotes the space of the address. Now, we present an outline of how the machinery in Eq. (8) can be used to prepare $|\psi\rangle$. First, by letting $|R| = 2^q$, we can express the address symbol $\gamma$ as a $q$-tuple of binary numbers: $\gamma = \gamma_0 \gamma_1 \ldots \gamma_{q-1}$, where $\gamma_j \in \{0,1\}$ and $j = 0, 1, \ldots, q-1$. Subsequently, we set $|D_{\gamma=a}\rangle = |a\rangle|b_a\rangle$ for all samples in $S$. Such a setting is possible by matching the symbol $\gamma$ to the input $a$. Then, from the address state $\frac{1}{\sqrt{2^q}} \sum_{\gamma \in R} |\gamma\rangle$, the machinery of Eq. (8) can provide the address-data entangled state as
$$\frac{1}{\sqrt{2^q}} \sum_{a \in R} |a\rangle|b_a\rangle|T\rangle, \qquad (9)$$
where the data $|b_a\rangle$ are taken from $|T\rangle$ and the summation $\sum_a$ [in Eqs. (4), (5), and (6)] can be replaced by $\sum_{a \in R}$. Lastly, we can retrieve $|\psi\rangle$ by disregarding $|T\rangle$. A naive approach for implementing the process described above is to directly load the data $|b_a\rangle$ by using Toffoli gates. However, such a data loading scheme requires a $T$-depth that increases exponentially with the address qubit size $q$, because the $2^q$ Toffoli gates must be applied sequentially (see Figure 2(a)). Therefore, our design strategy for acquiring an efficient machinery (Eq. (8)) is to use the unary (one-hot) address encoding [17], as depicted in Figure 2(b). The unary bases can be written as $\{|00 \cdots 01\rangle, |00 \cdots 10\rangle, \ldots, |01 \cdots 00\rangle, |10 \cdots 00\rangle\}$.
The unary representation does not use all of the available Hilbert space; its advantage over the binary representation is that it simplifies the circuit structure [18]. To implement this approach, we consider two subdivided processes: 1) binary-unary (de)coupling and 2) data loading. In subprocess 1), the unary addresses are correlated with the binary addresses. For example, for four addresses (i.e., $q = 2$), each of the four binary addresses is correlated with one of the unary basis states $|0001\rangle$, $|0010\rangle$, $|0100\rangle$, and $|1000\rangle$.
Figure 2. Data-loading circuits: (a) the naive scheme with sequential Toffoli gates and (b) the unary-address scheme implementing Eq. (8), in which the unary addresses are used to bring the data. Since each unary qubit is correlated with different data (as seen in the red dashed box), the Toffoli gates can be parallelised using ancillary qubits. Thus, if the binary-unary (de)coupling is not demanding, the advantage is straightforward (see the main text).
The circuit for this example is presented in Figure 2(b). Subprocess 2) duplicates the data $|b_a\rangle$ in $|T\rangle$ by using the unary addresses. Lastly, by decoupling the unary address qubits, we can obtain Eq. (9). The decoupling is equivalent to subprocess 1). Note that the unary address qubits, each of which is to be correlated with a different data qubit, can easily be parallelised. Parallelisation reduces the $T$-depth complexity of the data loading considerably (as described below). Thus, if the cost of the binary-unary coupling is low, the advantage of this approach is apparent [17].
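The role of the unary address can be illustrated with a purely classical emulation. In the sketch below, a binary address is converted to a one-hot string, and each one-hot wire is paired with exactly one table entry, which is the property that lets the corresponding Toffoli gates act in parallel in the quantum circuit. The function names, the example table, and the convention that address $\gamma$ activates wire number $\gamma$ are assumptions of this sketch; no quantum gates are simulated.

```python
def one_hot(address_bits):
    """Unary (one-hot) encoding of a q-bit binary address.

    The wire whose index equals the integer value of the address is set to 1;
    this ordering is a convention chosen for the example.
    """
    index = int("".join(str(bit) for bit in address_bits), 2)
    return [1 if j == index else 0 for j in range(2 ** len(address_bits))]


def load_datum(address_bits, table):
    """Classically emulate data loading through the unary address.

    Each product pairs one unary wire with one table entry, so the 2^q
    products act on disjoint wire/datum pairs and need no fixed order.
    """
    unary = one_hot(address_bits)
    products = [u & d for u, d in zip(unary, table)]
    return sum(products) % 2      # XOR of all products onto the output bit


table = [0, 1, 1, 0]              # hypothetical data bits D_gamma for q = 2
loaded = [load_datum((g0, g1), table) for g0 in (0, 1) for g1 in (0, 1)]
```

Because exactly one unary wire is active for each address, the XOR of the products simply returns the addressed datum, so `loaded` reproduces `table` entry by entry.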
Let us now analyse subprocesses 1) and 2).
1) Binary-unary (de)coupling.-For the analysis of subprocess 1), let us recall the circuit of the four-address example, shown in Figure 2(b). The circuit comprises Toffoli and CNOT gates. Such a circuit structure can be generalised for arbitrary $q$ address qubits, as shown in Figure 3(a), where each green box contains the gates conditioned on the $l$-th binary address qubit. The gate arrangement in the boxes can be designed generally as in Figure 3(b), where $2^l - 2$ Toffoli gates are used in the $l$-th box. This directly imposes a large $T$-depth. Thus, to minimise the depth of the circuit, we should compress the Toffoli gates [12,19]. For this, we design an optimised circuit (termed "four $T$-depth optimisation") for each green box, shown in Figure 3(c), that reduces the $T$-depth of the entire process to polynomial; specifically, to $4(q - 1)$.
2) Data loading.-In the data loading circuit, $2^q$ Toffoli gates should be used to duplicate the data $|D_\gamma\rangle$ in $|T\rangle$ into the computable space. Thus, in the naive approach, a $T$-depth of $O(2^n)$ is required. However, since in our scheme the control qubit of each Toffoli gate is assigned to a distinct unary qubit, the Toffoli gates can be implemented in parallel. This is possible because of the availability of the unary address. Such an implementation immediately leads to the parallelisation of the $T$ gates. To avoid any restriction being imposed on the overall circuit optimisation by the control-qubit sharing of the Toffoli gates, we use extra ancillary qubits (denoted by E1, E2, $\cdots$), as in Figure 4(a). Then, every Toffoli gate can be parallelised, and the $T$-depth complexity can be optimised to $O(1)$. The detailed technique is shown in Figure 4(b).
On the basis of the above analysis, our first result can be stated as follows.
Resource Estimation (RE) 1 Resource counts for implementing $P_{|\psi\rangle}$ are as follows: The $T$-depth complexity of $P_{|\psi\rangle}$, denoted by $T_{D,P_{|\psi\rangle}}$, is bounded by $O(\lceil \log_2 |S| \rceil)$. The total number of logical qubits required to implement $P_{|\psi\rangle}$ is determined to be $\omega_{\mathrm{adr}} + \omega_a + \omega_{\mathrm{extra}} + \omega_D = q + 2 \cdot 2^q + 1$, where $\omega_{\mathrm{adr}} = q$, $\omega_a = 2^q$, $\omega_{\mathrm{extra}} = 2^q$, and $\omega_D = 1$; these variables denote the numbers of logical qubits for the binary address, unary address, extra ancillary system, and data, respectively.
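The counts entering RE 1 can be tabulated with a few lines of code. The sketch below is a bookkeeping aid under stated assumptions: the coupling and the decoupling are each charged the $4(q-1)$ $T$-depth quoted for subprocess 1), the $O(1)$ data-loading depth is represented by a placeholder constant `loading_t_depth` whose value is not taken from the text, and the naive scheme is summarised only by its number of sequential Toffoli layers. The function name and dictionary keys are hypothetical.

```python
def p_psi_resources(q, loading_t_depth=1):
    """Tally the counts quoted in RE 1 for q binary address qubits.

    `loading_t_depth` stands in for the O(1) constant of the parallel data
    loading; its default value is a placeholder, not a figure from the text.
    """
    coupling = 4 * (q - 1)                 # "four T-depth optimisation"
    return {
        "qubits": q + 2 * 2 ** q + 1,      # address + unary + extra + data
        "t_depth": 2 * coupling + loading_t_depth,   # couple, load, decouple
        "naive_toffoli_layers": 2 ** q,    # sequential loading, Figure 2(a)
    }


small = p_psi_resources(2)
large = p_psi_resources(10)
```

For $q = 10$ the unary scheme has a $T$-depth of a few tens of layers against $2^{10}$ sequential Toffoli layers in the naive scheme, while the qubit count grows to $10 + 2 \cdot 2^{10} + 1$; this is the depth-for-width exchange that the later discussion relies on.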

Analysis (II): Resource counts for $P_A$.
Next, we consider the resources for $P_A$. By considering the formal definition of the BV kernel [as given in Eq. (3)], we start by investigating the $T$-depth of an arbitrary $l$-qubit QFT. Usually, the quantum circuit for an $l$-qubit QFT can be synthesised with controlled-$\hat{R}_k$ gates and $\hat{H}$, where $\hat{R}_k$ denotes the single-qubit rotation given by $\hat{R}_k = |0\rangle\langle 0| + e^{i\pi\theta_k} |1\rangle\langle 1|$. Typically, an ideal QFT circuit requires $l(l-1)/2$ controlled-$\hat{R}_k$ gates.
To realise an $l$-qubit approximate QFT (AQFT) circuit in a fault-tolerant manner, we can consider $\beta = O(\log l)$. Then, all controlled-$\hat{R}_k$ gates with $\theta_k \le 2^{-O(\log l)}$ are discarded with an error bounded by $\Delta$, and the controlled-$\hat{R}_k$ gate count is reduced from $O(l^2)$ to $O(l \log \frac{l}{\Delta})$ [21]. The remaining controlled-$\hat{R}_k$ gates are decomposed into Clifford+$T$ gates, with the decomposition involving a fault-tolerance overhead. Consequently, we can obtain an $l$-qubit AQFT circuit whose $T$-count (the number of $T$ or $T^\dagger$ gates) is $O(l \log^2 l)$. For all effective QC (specifically, when $\Delta$ is not smaller than the order of $l\,2^{-l}$), we can neglect the dependence on $\Delta$. By noting that the $T$-depth is in general upper bounded by the $T$-count, we obtain
$$T_{D,\mathrm{AQFT}_{2^l}} \le T_{C,\mathrm{AQFT}_{2^l}} = O(l \log^2 l), \qquad (11)$$
where $T_{C,\mathrm{AQFT}_{2^l}}$ denotes the $T$-count of the $l$-qubit AQFT. Note that in theory, $T_{C,\mathrm{AQFT}_{2^l}}$ can be reduced further, namely from $O(l \log^2 l)$ to $O(l \log l)$, by using a semi-classical AQFT [22]. Very recently, Nam et al. proposed a fully coherent AQFT that can have a $T$-count of $O(l \log l)$ [23]. On the basis of the above analysis, we obtain the second result, which is as follows.
Resource Estimation (RE) 2 We can implement $P_A$ in the NBLP with $T_{D,P_A}$ = N/A. The number of (logical) qubits required to execute $P_A$ is only $O(n)$.
The estimation can be validated as follows. In the NBLP (i.e., a binary problem), $P_A$ is the $(n+1)$-fold product of the Hadamard transform: $P_A = \mathrm{QFT}_{d=2}^{\otimes n+1} = \hat{H}^{\otimes n+1}$. Hence, the number of logical qubits is $n + 1$. Although the circuit may be operated with some additional ancilla qubits, the width $W_{P_A}$ (the number of logical qubits of $P_A$) scales as $O(n)$. Moreover, since controlled-$\hat{R}_k$ gates are not required, the $T$-depth complexity is zero. Hence, RE 2 holds. This result is a straightforward consequence of $P_A = \hat{H}^{\otimes n+1}$. However, the analysis of the AQFT would be useful, particularly when the BV kernel is applied to a general problem setting, such as a noisy multinary linear problem.
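Two of the statements above lend themselves to a quick numerical check. The sketch below first verifies that the $d = 2$ QFT matrix coincides with the Hadamard matrix, which is why $P_A$ needs no controlled rotations in the binary problem, and then counts the controlled rotations of the textbook $l$-qubit QFT circuit with and without truncation. The truncation rule used here (keep at most `cutoff` rotations per target qubit) is a common form of the approximate QFT and is an assumption of this sketch; it is not claimed to be the exact rule or the parameter $\beta$ of the analysis above.

```python
import cmath
import math


def qft_matrix(d):
    """d x d quantum Fourier transform matrix with omega = exp(2*pi*i/d)."""
    omega = cmath.exp(2j * math.pi / d)
    scale = 1.0 / math.sqrt(d)
    return [[scale * omega ** (j * k) for k in range(d)] for j in range(d)]


HADAMARD = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
            [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def max_abs_difference(m1, m2):
    """Largest entry-wise deviation between two equally shaped matrices."""
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


def controlled_rotation_count(l, cutoff=None):
    """Controlled rotations in the textbook l-qubit QFT circuit.

    Target qubit j receives l - 1 - j rotations; `cutoff` (an assumed
    truncation rule) keeps at most that many rotations per target qubit.
    """
    per_qubit = [l - 1 - j for j in range(l)]
    if cutoff is not None:
        per_qubit = [min(count, cutoff) for count in per_qubit]
    return sum(per_qubit)


deviation = max_abs_difference(qft_matrix(2), HADAMARD)
full_count = controlled_rotation_count(64)
truncated_count = controlled_rotation_count(64, cutoff=6)   # cutoff ~ log2(64)
```

The full count grows as $l(l-1)/2$, whereas a logarithmic cutoff leaves a count of order $l \log l$, in line with the reduction from $O(l^2)$ quoted above.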

Majority-voting conditions.
Before analysing (III), we derive the condition for majority voting [performed in (A.4)], which has not been considered in previous studies even though it influences the algorithm's performance. First, we calculate the probability $P_S = P(k = s)$ that the $k$ measured at (A.3) is equal to the true solution $s$. By substituting $k = s$ into Eq. (6), we obtain
$$P_S = P(k = s | k^\star = 1)\,P(k^\star = 1) = \frac{1}{2} \left| \frac{1}{\sqrt{2^{q+n}}} \sum_a \omega^{a \cdot (2s) + e_a} \right|^2 = 2^{q-n-1} \left| \frac{1}{2^q} \sum_a (-1)^{e_a} \right|^2, \qquad (14)$$
where $P(k^\star = 1) = \frac{1}{2}$. Here, we apply a useful concentration bound, the so-called Chernoff-Hoeffding (CH) inequality [24]: for a constant $t$,
$$P\big( |U - E(U_a)| \ge t \big) \le 2 e^{-\frac{1}{2} 2^q t^2}, \qquad (15)$$
where $U_a = (-1)^{e_a}$, $U = \frac{1}{2^q} \sum_a U_a$, and $E(U_a)$ denotes the expectation of $U_a$. If we assume that the order of $q$ is greater than $O(\log_2 n)$, the right-hand side of Eq. (15) is negligible, and $P\big( |U - E(U_a)| \ge t \big) = 0$ for large $n$. Note that we have used the following definition [D]: If a factor is as small as $O(e^{-n})$, it is negligible for large $n$ and can be set to zero. We then obtain the following expression:
$$E(U_a) - t < U < E(U_a) + t. \qquad (16)$$
Using Eqs. (14) and (16), we can obtain the lower bound of $P_S$ such that
$$P_S \ge P_{S,\mathrm{inf}} = 2^{q-n-1} (2\eta - t)^2, \qquad (17)$$
where we have used $E(U_a) = \left( \frac{1}{2} + \eta \right) - \left( \frac{1}{2} - \eta \right) = 2\eta$. We then consider the probability $P_F = P(k \ne s)$ that the measured $k$ is not equal to the solution $s$. For convenience, we represent $P(k \ne s)$ as $P(k = \tilde{s})$, where $\tilde{s} = s + \phi$ and $\phi = \phi_0 \phi_1 \cdots \phi_{n-1}$ is an arbitrary $n$-tuple of binary numbers $\phi_j \in \{0,1\}$, except for $\phi = 00 \cdots 0$. Then, from Eq. (6), $P_F$ can be calculated as
$$P_F = P(k = \tilde{s} | k^\star = 1)\,P(k^\star = 1) = 2^{q-n-1} \left| \frac{1}{2^q} \sum_a (-1)^{a \cdot \phi + e_a} \right|^2. \qquad (18)$$
Here, we recall the CH inequality in Eq. (15) and let $U_a = (-1)^{a \cdot \phi + e_a}$ and $U = \frac{1}{2^q} \sum_a U_a$. It should be noted that, in this case, $E(U_a) = 0$ because $a \cdot \phi$ and $a \cdot \phi + e_a$ are each 0 or 1 with probability $\frac{1}{2}$. Because $O(q)$ is greater than $O(\log_2 n)$ and $e^{-\frac{1}{2} 2^q t^2}$ is negligible by the definition [D], we have $P\big( |U| \ge t \big) = 0$. Hence, we can write
$$|U| < t. \qquad (19)$$
By using Eqs. (18) and (19), the upper bound for $P_F$ is obtained as follows:
$$P_F < P_{F,\mathrm{sup}} = 2^{q-n-1} t^2. \qquad (20)$$
We can finally specify the condition required for the majority voting to be valid:
$$P_{S,\mathrm{inf}} > P_{F,\mathrm{sup}}, \quad \text{that is,} \quad (2\eta - t)^2 > t^2. \qquad (21)$$
If this condition is not satisfied, the possibility of a 'false' solution being identified in (A.4) cannot be ruled out.
Suppose that $M$ candidates are collected, and let $x_k \in \{0,1\}$ ($k = 1, \ldots, M$) indicate whether the $k$-th candidate equals $s$. Let $X$ be the number of times that the true solution $k = s$ is determined among the $M$ candidates. Then, we have $X = \sum_{k=1}^{M} x_k$, where all values of $x_k$ are independent. In such a setting, we can use a statistical inequality, namely the Chernoff bound [25]: For any $\epsilon > 0$,
$$P\big( |X - \mu| \ge \epsilon \mu \big) \le 2 e^{-\frac{\epsilon^2 \mu}{2 + \epsilon}}, \qquad (22)$$
where $\mu = M\,E(\mathbb{1}_{k=s}) = M P_S$, and $\mathbb{1}_{k=s}$ is the indicator function of $k = s$. By letting $2 e^{-\frac{\epsilon^2 \mu}{2 + \epsilon}} \le \delta$ with $\delta \in (0, 1]$, we can derive the following theorem:
$$P\big( |\bar{X} - P_S| \le \tilde{\epsilon} \big) \ge 1 - \delta \quad \text{for} \quad M \ge \frac{3}{\tilde{\epsilon}^2} \ln \frac{2}{\delta}, \qquad (23)$$
where $\bar{X} = \frac{X}{M} = \frac{1}{M} \sum_{k=1}^{M} x_k$ and $\tilde{\epsilon} = \epsilon P_S$ (here, we consider a slightly weaker bound; the tight bound is given by $M \ge \frac{2 + \epsilon}{\tilde{\epsilon}^2} \ln \frac{2}{\delta}$). This theorem implies that if we use more than $M = \frac{3}{\tilde{\epsilon}^2} \ln \frac{2}{\delta}$ samples, $\bar{X}$ can be estimated within the interval $[P_S - \tilde{\epsilon}, P_S + \tilde{\epsilon}]$ with a probability of at least $1 - \delta$. This is sometimes referred to as the sampling theorem. Since the Chernoff bound gives the minimal (Bayesian) error probability when discriminating between 'a priori' and 'observations', the sampling theorem translates into the following statement: Majority voting allows the identification of the true solution $s$ with at least $M = \frac{3}{\tilde{\epsilon}^2} \ln \frac{2}{\delta}$ repetitions of (A.1)-(A.3), provided that the estimation error $\tilde{\epsilon}$ is small compared with the gap $P_{S,\mathrm{inf}} - P_{F,\mathrm{sup}}$ [Eq. (24)]. We point out that $P_{S,\mathrm{inf}} - P_{F,\mathrm{sup}}$ is greater than 0 owing to the majority-voting condition in Eq. (21). Furthermore, by noting that $S$ is the number of repetitions of (A.1)-(A.3), we achieve our third result, which is as follows.
Resource Estimation (RE) 3 Given the constants $t$, $\epsilon$, and $\delta$, the number of repetitions $S$ is given by
$$S = O\!\left( 4^{\,n-q}\, \epsilon^{-2}\, |2\eta - t|^{-4} \ln \delta^{-1} \right), \qquad (25)$$
where we have assumed that $S = 2M$ because half of the trials of (A.1)-(A.3) return a failure with $k^\star = 0$ (note that the factor 2 has no influence on the order of $S$). Two crucial conditions should be satisfied [Eq. (26)]: the former, on $t$, is acquired from the majority-voting condition in Eq. (21), and the latter, on $\epsilon$, is derived using $\tilde{\epsilon} = \epsilon P_S \ge \epsilon P_{S,\mathrm{inf}}$ and Eq. (24).
Note that $P_{|\psi\rangle}$ boots up only when $P_A$ runs with a single use of $|\psi\rangle$, and it is straightforward to determine that $S$ corresponds to the quantum-sample complexity. Accordingly, RE 3 shows that the reduction in the complexity depends on the size of the superposition, that is, $|R| = 2^q$. For example, if we use the fullest (exponential-scale) superposition of the sample with $|R| = |S| = 2^n$ (or equivalently, $q = n$), $S$ becomes $O(\epsilon^{-2} |2\eta - t|^{-4} \ln \delta^{-1})$, which is consistent with the results of Ref. [5]. The opposite extreme case can also be considered, that is, using a non-superposed sample $|\psi\rangle = |a\rangle|b_a\rangle$ with $|R| = 1$ (or equivalently, $q = 0$), which still allows quantum parallelism to be processed by the BV kernel. However, in this case, $P_S$ becomes exponentially small with $n$ [Eq. (17)] and is therefore negligible (based on the definition [D]). Hence, a majority-voting condition cannot be established, and the order of $q$ must be at least $O(\log_2 n)$. Note that if $q = O(\log_2 n)$, the polynomial quantum-sample complexity cannot be achieved, that is, $S = O(4^{n - \log n})$.
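The concentration step behind the majority-voting bounds can be checked numerically. The sketch below compares the empirical tail probability of $U = \frac{1}{2^q} \sum_a (-1)^{e_a}$ around its mean $2\eta$ with a bound of the form $2 e^{-\frac{1}{2} 2^q t^2}$; the prefactor 2 is the standard two-sided Hoeffding constant and is assumed here. The parameter values, the number of trials, and the function names are arbitrary choices made for illustration.

```python
import math
import random


def hoeffding_bound(q, t):
    """Right-hand side 2*exp(-2^q * t^2 / 2) of the concentration bound."""
    return 2.0 * math.exp(-0.5 * (2 ** q) * t * t)


def empirical_tail(q, eta, t, trials, rng):
    """Fraction of trials in which |U - 2*eta| >= t.

    U is the mean of 2^q independent signs (-1)^{e_a}, where e_a = 0
    occurs with probability 1/2 + eta.
    """
    size = 2 ** q
    hits = 0
    for _ in range(trials):
        total = sum(1 if rng.random() < 0.5 + eta else -1 for _ in range(size))
        if abs(total / size - 2 * eta) >= t:
            hits += 1
    return hits / trials


rng = random.Random(7)
q, eta, t = 8, 0.2, 0.15
tail = empirical_tail(q, eta, t, trials=2000, rng=rng)
bound = hoeffding_bound(q, t)
```

The bound shrinks doubly exponentially in $q$, so once $2^q t^2$ exceeds a small multiple of $n$ the tail falls below the $O(e^{-n})$ level that the definition [D] treats as zero.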

Discussion
From the results of RE 1, 2, and 3, we can draw the following conclusion: the cost $C$, defined in Eq. (2), can be a polynomial of the problem size $n$. The first step to achieve the polynomial-scaling $C$ is the optimisation of the machinery of $P_{|\psi\rangle}$ by using the unary (one-hot) sample input. Such a technique has been used to parallelise expensive quantum gates in various contexts [17,18]. In our case, the focus is on reducing the layers of $T$ and $T^\dagger$ gates in the context of fault-tolerant QC. The second key enabler for our result is the BV kernel in the main computation $P_A$, which leads to a considerable reduction in the quantum-sample complexity. However, note that the unary qubit encoding is useful for $P_{|\psi\rangle}$ but not at all for $P_A$. Thus, we need to transform the input from unary into binary to run $P_A$ efficiently. In summary, the polynomial $T$-depth quantum solvability of NBLPs can successfully be addressed by allowing $P_{|\psi\rangle}$ and $P_A$ to use favorable encodings. Note further that such a result can be achieved only when the two computational features, i.e., those of $P_{|\psi\rangle}$ and $P_A$, are analysed in a single framework. We believe that this approach will be a milestone towards confirming the overall quantum computational speedup from quantum-sample preparation to main computation.
Another insight owing to our comprehensive analysis of $P_{|\psi\rangle} + P_A$ is the depth-width tradeoff in the NBLP. It can be specified by Eqs. (12) and (25): roughly, (depth) $\times$ (width)$^2 \le O(4^n)$. For example, if $q = n$, the polynomial quantum-sample complexity can be obtained (as argued in Refs. [5,6,11]). However, this implies an exponential scale for the number of logical qubits (RE 1) §. By contrast, if we attempt to reduce the number of qubits to a polynomial in $n$, for example, by letting $q = O(\log n)$, an exponential reduction in the quantum-sample complexity cannot be achieved, and hence neither can the polynomial $T$-depth.
A further improvement can be achieved by developing a more efficient error-correcting code or a more efficient sample preparation scheme, which would reduce the level of noisy physical qubits.
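The depth-width tradeoff can be made tangible with a small table. The sketch below lists, for a fixed $n$, the logical-qubit count $q + 2 \cdot 2^q + 1$ from RE 1 next to a repetition count taken to scale as $4^{n-q}$. That scaling is an assumption of this sketch: it drops all constants and the dependence on $\epsilon$, $\eta$, $t$, and $\delta$, and is chosen because it reproduces the two limiting cases quoted in the text (polynomial for $q = n$, and $4^{n - \log n}$ for $q = O(\log n)$). The function name is hypothetical.

```python
def tradeoff_row(n, q):
    """Return (q, width, repetitions) for address size q and problem size n."""
    width = q + 2 * 2 ** q + 1          # logical qubits, from RE 1
    repetitions = 4 ** (n - q)          # assumed scaling of S, constants dropped
    return q, width, repetitions


n = 12
rows = [tradeoff_row(n, q) for q in range(0, n + 1, 3)]
for q, width, repetitions in rows:
    print(f"q={q:2d}  width={width:5d}  repetitions~{repetitions}")
```

Each step that doubles the superposition size $2^q$ divides the assumed repetition count by four, so the product of the repetitions and the squared superposition size stays at $4^n$ along the whole table.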
Subsequently, we measure the state $|k^\star\rangle$. If $k^\star = 1$ is measured, then by using the delta function
$$\delta_{k,s} = \frac{1}{2^n} \sum_a \prod_{j=0}^{n-1} \omega^{a_j (k_j + s_j)}, \qquad (A.4)$$
we can achieve the final state as the true solution:
$$|k\rangle = |s_0 s_1 \cdots s_{n-1}\rangle, \qquad (A.5)$$
where $\omega = e^{i \frac{2\pi}{d}} = -1$ with $d = 2$, and the probability amplitude $\frac{1}{\sqrt{2}}$ is eliminated by the measurement of $|k^\star\rangle$. For a simpler analysis, we assume $q = n$ (hence, $\sum_a = \sum_{a \in \{0,1\}^n}$). If $k^\star = 0$ is measured, we cannot retrieve any information on $s$; that is, the algorithm returns a failure.
In the presence of noise (NBLP).-Given the sample state, that is,
$$|\psi\rangle = \frac{1}{\sqrt{2^q}} \sum_a |a\rangle |a \cdot s + e_a \pmod 2\rangle, \qquad (A.6)$$
with non-zero noise $\eta \ne 0$, the $n + 1$ QFTs are applied as described above. We then attain the corresponding output state, where we use $|\langle k^\star | 1 \rangle|^2 = \frac{1}{2}$. This is equal to Eq. (12) in the main manuscript.