Proof-of-principle demonstration of compiled Shor’s algorithm using a quantum dot single-photon source

We report, for the first time, a proof-of-principle demonstration of Shor’s algorithm with photons generated by an on-demand semiconductor quantum dot single-photon source. A fully compiled version of Shor’s algorithm for factoring 15 has been accomplished with a significantly reduced resource requirement by employing a four-photon cluster state. Genuine multiparticle entanglement properties are confirmed to reveal the quantum character of the algorithm and circuit. The implementation realizes Shor’s algorithm with deterministic photonic qubits, which opens new applications for cluster states beyond one-way quantum computing. © 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
Some quantum algorithms provide a dramatic speedup for problems such as factoring [1,2], which is hard for classical computers when the numbers are large. The security of widely used cryptography, such as the Rivest-Shamir-Adleman (RSA) public-key cryptosystem, relies crucially on the difficulty of factoring a large number that is the product of two large primes [3,4]. Remarkably, Shor's algorithm running on a quantum computer [1,2] provides an efficient way of factoring, and thus directly threatens the security of RSA in the near future.
A full demonstration of Shor's algorithm requires many qubits and gates, which is beyond current quantum technologies. A proof-of-principle demonstration, in which some of the parameters are determined in advance to reduce the resource requirement, is sufficient to characterize the core processes of Shor's algorithm [5]. Demonstrations of this kind have been presented with systems ranging from liquid-state nuclear magnetic resonance [6] and photonic qubits (qutrits) [7-10] to superconducting circuits [11,12] and ion traps [13]. Among these architectures, polarization-encoded photonic qubits experience negligible decoherence and offer the fastest gates, which makes them promising candidates for quantum computing [14]. All existing implementations of Shor's algorithm with photonic qubits employ photons generated from spontaneous parametric down-conversion (SPDC) sources [15]. SPDC, however, suffers from intrinsic noise caused by multiphoton emission [16]. The source must therefore be operated at low efficiency to suppress unwanted multiphoton events, which in turn degrades the overall performance of the quantum circuit. Semiconductor quantum dot (QD) single-photon sources, by contrast, generate photons one by one [17] and are therefore well suited to this task. Recent progress has shown that photons can be deterministically generated with high extraction efficiency, single-photon purity, and photon indistinguishability simultaneously [18,19]. When a single QD is embedded in a symmetry-broken microcavity, the generated photons exhibit a high degree of polarization [20]. Here, we present a proof-of-principle demonstration of Shor's algorithm using photons from a QD.

Methods and experimental implementations
In number theory, a strategy for factoring an n-bit composite number $N = p \times q$, where p and q are odd primes with $p \neq q$, is as follows:
1. Find the base b and the order r that satisfy: (a) b is co-prime to N, and $0 < b < N$; (b) r is a positive even integer; (c) $b^{r} \equiv 1 \pmod{N}$ and $b^{r/2} \not\equiv \pm 1 \pmod{N}$.
2. Calculate the greatest common divisor (GCD) $\gcd(b^{r/2} \pm 1, N)$.
Here, the remainders of modular arithmetic (https://en.wikipedia.org/wiki/modular_arithmetic) are non-negative and less than N. The two solutions of the GCD calculation are the two nontrivial factors p and q, and in this way a composite number can be efficiently factored.
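The classical part of this strategy can be sketched in a few lines of Python. The sketch is illustrative only: the helper names `find_order` and `factor_from_order` are hypothetical, and the order is found here by brute force, which is exactly the step that Shor's algorithm replaces with a quantum routine.

```python
from math import gcd

def find_order(b, N):
    """Smallest r > 0 with b**r % N == 1 (brute force; requires gcd(b, N) == 1)."""
    if gcd(b, N) != 1:
        raise ValueError("b must be co-prime to N")
    r, value = 1, b % N
    while value != 1:
        value = (value * b) % N
        r += 1
    return r

def factor_from_order(b, N):
    """Return (gcd(b**(r/2) - 1, N), gcd(b**(r/2) + 1, N)), or None if b is unsuitable."""
    r = find_order(b, N)
    if r % 2 != 0:          # condition (b): r must be even
        return None
    half = pow(b, r // 2, N)
    if half == N - 1:       # condition (c): b**(r/2) must not be -1 (mod N)
        return None
    return gcd(half - 1, N), gcd(half + 1, N)

print(find_order(7, 15))         # order of b = 7 modulo N = 15
print(factor_from_order(7, 15))  # the two nontrivial factors of 15
```

For N = 15, a base such as b = 14 is rejected by condition (c), because $14 \equiv -1 \pmod{15}$.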
The bottleneck of this algorithm lies in the difficulty of selecting b and finding r satisfying $b^{r} \equiv 1 \pmod{N}$, or vice versa. A classical computer needs at least $\exp[O(n^{1/3}\log^{2/3} n)]$ operations to complete this task [4,21]. Fortunately, Shor's algorithm running on a quantum computer provides an efficient way to execute it with polynomial complexity. The quantum routine of Shor's algorithm needs two registers of qubits [2,5]: the argument register, which employs l qubits to store the argument x, and the function register, which employs $n = \lceil\log_2 N\rceil$ qubits to store the modular exponential function $f(x) = b^{x} \bmod N$. Both x and f(x) can be represented by the binary integer sets $x_k$ and $f_k$ satisfying $x = \sum_{k=0}^{l-1} 2^{k} x_k$ and $f(x) = \sum_{k=0}^{n-1} 2^{k} f_k$. The physical realization of Shor's algorithm requires three distinct steps:
1. Initialization. H gates are applied to the argument register so that the state $|0\rangle^{\otimes l}$ transforms to $|+\rangle^{\otimes l} = \sum_{x=0}^{2^{l}-1} |x\rangle/\sqrt{2^{l}}$, which is an equally weighted superposition. The number of qubits l in the argument register is determined by the accuracy with which we wish to estimate the order (usually $l \approx 2n$) [5]. An X gate is applied to the last qubit of the function register, which transforms its initial state to $|00\cdots01\rangle$.
2. Modular exponentiation. According to what Deutsch called "massive quantum parallelism" [22], one can calculate the modular exponential function f(x) with several controlled-$U_f$ gates.

3. Inverse quantum Fourier transform (QFT). Owing to the periodicity of f(x), an inverse QFT can then be applied to the argument register to acquire the "frequency". Here, y is represented by the binary fraction set $y_k$ satisfying $y = \sum_{k=1}^{l} y_k/2^{k}$. The probability amplitude peaks when $y \approx j/r$ for any integer j. Thus the order can be extracted with a high success rate.
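The three steps above can be checked with a small classical simulation. The following is a minimal sketch, not the experimental procedure: the function name `order_finding_distribution` is hypothetical, the state is tracked by brute force, and no compilation of the circuit is applied.

```python
import cmath
from collections import defaultdict

def order_finding_distribution(b, N, l):
    """Probability of each outcome y after the inverse QFT on the argument register."""
    dim = 2 ** l
    # Steps 1-2: the uniform superposition over x is entangled with f(x) = b**x mod N.
    # Group the argument values x by the function-register value they are paired with.
    branches = defaultdict(list)
    for x in range(dim):
        branches[pow(b, x, N)].append(x)
    # Step 3: inverse QFT on the argument register, applied branch by branch.
    # Branches with different f(x) are orthogonal, so their probabilities add.
    probs = [0.0] * dim
    for xs in branches.values():
        for y in range(dim):
            # 1/sqrt(dim) from the superposition times 1/sqrt(dim) from the QFT.
            amp = sum(cmath.exp(-2j * cmath.pi * x * y / dim) for x in xs) / dim
            probs[y] += abs(amp) ** 2
    return probs

probs = order_finding_distribution(b=7, N=15, l=3)
for y, p in enumerate(probs):
    print(f"y = {y}/8: {p:.3f}")
```

With b = 7, N = 15, and l = 3, the probability is concentrated on y = 0/8, 2/8, 4/8, and 6/8, that is, on multiples of 1/r with r = 4.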
However, even factoring the simplest number, N = 15, requires a total of 12 qubits for a proof-of-principle demonstration ($n = 4$, $l \approx 2n = 8$). Implementing Shor's algorithm in full is quite challenging for current quantum techniques. Fortunately, the compiling technique allows one to reduce the number of qubits. In the N = 15 case, the base can be chosen from b = 2, 4, 7, 8, 11, 13. All of these satisfy f(4) = 1, so the order is at most r = 4. Hence, only 2 qubits in the argument register are sufficient to exhibit the periodicity of f(x). To avoid possible errors, an additional qubit is used for the analysis of the answers. Therefore, at least 7 qubits are required for a proof-of-principle demonstration (n = 4, l = 3). Figure 1(a) shows the quantum circuit at this level of compilation, or partial compilation.
Furthermore, a full compilation can be implemented by reducing the qubit requirement further. As it is always true that $r < N$, the function register can be represented with fewer (only $n = \lceil\log_2 r\rceil$) qubits. We define a new function F(x), which acts as a mapping of f(x). It turns out that F(x) maintains the periodicity of f(x), so that the inverse QFT applied to the argument register remains unchanged [5]. The inverse QFT can be implemented in a semiclassical way that performs only single-qubit operations conditioned on measurement outcomes [23]. Thus, no two-qubit gates are needed to achieve it. Moreover, from Fig. 1(b), the $U_f^{2}$ gates are always equivalent to the identity operation. Hence, the qubit $x_2$ (or $y_3$) is not relevant to the rest, and the operations and measurements on that qubit can be performed independently. Therefore, this fully compiled version of Shor's algorithm for factoring N = 15 (or finding r = 4) requires only four-qubit entanglement (n = 2, l = 2). In Fig. 2(a), we illustrate this fully compiled version of the quantum circuit. In Fig. 2(b), we depict the details of the $U_F$ gates.
For b = 4 or 11, the state after modular exponentiation, or the intermediate state, is only a two-qubit entangled state that can be achieved with a single controlled gate. For b = 2 or 13, there are two sets of two-qubit entangled states, which can be achieved by performing the same operation as in the b = 4 or 11 case twice. These two cases have already been demonstrated in the literature [8,9], whereas here we unveil a more complicated case, b = 7 or 8. The intermediate state for this case, given by Eq. (1), is genuinely entangled among all four qubits. It is in fact equivalent to a four-qubit cluster ($C_4$) state [24], which can be achieved post-selectively with only linear optics in our photonic quantum architecture (see Appendix A).

Fig. 2. (a) Fully compiled version of the quantum circuit; compared with Fig. 1(a), it requires a reduced number of qubits and gates. The modular exponentiation is implemented by controlled-$U_F$ gates instead, while the inverse QFT is implemented in a semiclassical way. The qubit $x_2$ (or $y_3$, represented by the colored wire) is not relevant to the others, so the operations and measurements on it can be performed independently. (b) Details of the $U_F$ gates, which act as the quantum version of modular adders.

The schematic of the experimental setup is sketched in Fig. 3, and consists of four distinct steps:
1. Single-photon emission. The state-of-the-art QD is embedded in a micropillar cavity [18] with a diameter of 2 µm and placed in a cryostat cooled to 4 K. Under resonant excitation at a repetition rate of 76 MHz [25], single photons can be deterministically generated. A cross-polarization configuration, consisting of several polarization optics, is applied to extinguish the unwanted laser background. The photons used for this task have a lifetime of ∼60 ps and a counting rate of ∼6.4 MHz on the superconducting nanowire single-photon detector (SNSPD), which has a detection efficiency of ∼80%.
In previous literature, single-photon purity is experimentally measured to be 0[…]

Fig. 3. […] 5°, which act as […] gates and H gates respectively. A pair of wave plates aligned before each PBS in step (iv) enables detection along any desired basis. QD, quantum dot; PC, Pockels cell; PBS, polarizing beam-splitter; HWP, half-wave plate; QWP, quarter-wave plate; SNSPD, superconducting nanowire single-photon detector.

Results
The inverse QFT on the argument register of Eq. (1) results in a mixed state, so it is almost impossible to characterize the performance of the quantum circuit by estimating the state fidelity at that stage. The intermediate state represented by Eq. (1), however, exhibits persistent four-qubit entanglement [28], and one can therefore perform measurements on that state to characterize the quantum circuit. The measurement is performed both qualitatively and quantitatively. For the qualitative measurement, the four-fold correlations are obtained by measuring all modes along the $\{|H\rangle, |V\rangle\}$ and $\{|D\rangle, |A\rangle\}$ bases, where $|D\rangle$ and $|A\rangle$ denote diagonal (45°) and anti-diagonal (−45°) linear polarizations. Two of the four modes can also be measured along the $\{|R\rangle, |L\rangle\}$ basis instead, where $|R\rangle$ and $|L\rangle$ denote right and left circular polarizations. The results are shown in Fig. 4, where the peaks in each pattern fit well with the theoretical predictions given by Eqs. (5), (6), and (7) in Appendix A. As for the quantitative measurement, one can evaluate the fidelity of the state using stabilizer correlation measurements, since the cluster state can be fully described by its stabilizers [29]. The evaluated expectation values of the stabilizer correlation measurements are listed in Table 1, where $\sigma_0$, $\sigma_1$, $\sigma_2$, and $\sigma_3$ correspond to the Pauli matrices [21]. In our case, the detection can be accomplished with only 9 measurements instead of a full tomography configuration. By averaging the expectation values of all stabilizer correlation measurements, the fidelity is estimated to be 0.756(8), well above the classical limit of 0.5, which indicates genuinely quantum behavior in the modular exponentiation step.
At the final stage, one can implement the inverse QFT to acquire the answer. A rotation of θ (θ = 0, π/2, π/4, . . .) about the Z axis followed by an H operation and a measurement along the $\{|0\rangle, |1\rangle\}$ basis is equivalent to a measurement along the $\{(|0\rangle \pm e^{-i\theta}|1\rangle)/\sqrt{2}\}$ basis [24], which has been widely used in the characterization of the Greenberger-Horne-Zeilinger state [30]. To acquire the answer, one needs to analyze the measured data both qualitatively and quantitatively. Here, we analyze both the l = 2 and l = 3 cases. Qualitatively, one can plot the probability distributions shown in Figs. 5(a) and 5(b), for l = 2 and 3, respectively. The two patterns are hard to tell apart, and the peaks in both appear at y = 0/4, 1/4, 2/4, and 3/4, from which it is easy to estimate r = 4. Quantitatively, one can theoretically calculate the probability distributions for r = 1 to 4, which are plotted in Figs. 7 and 8 in Appendix B for the l = 2 and 3 cases, and compare our measured data with them. To characterize the comparisons, one can use the square of statistical overlap (SSO) [31], which quantifies the similarity between measured and expected probability distributions. The SSO, derived from the statistical overlap (SO) [32], is defined as $\gamma = \left(\sum_{y=0}^{7/8} \sqrt{m_y e_y}\right)^{2}$, where $m_y$ and $e_y$ denote the measured and expected probabilities of the state $|y\rangle$. From the comparison results listed in Table 2, the maxima of γ = 0.999(41) and 0.996(41) for l = 2 and 3 appear at r = 4. For l = 2, however, a high SSO of 0.956(39) also appears at the order r = 3, meaning that imperfections of the quantum circuit may result in a wrong answer. Both qualitative and quantitative analyses reveal the same answer of r = 4. Nevertheless, at least 3 qubits in the argument register are needed to extract the correct answer with a higher success rate.
After the answer has been acquired, the solutions of $\gcd(b^{r/2} \pm 1, N)$ are the two nontrivial factors of the composite number, which are calculated to be 3 and 5 for N = 15.

Discussions
We have so far presented a proof-of-principle demonstration of the compiled Shor's algorithm with photons generated from a QD single-photon source. Genuine four-photon entanglement has been observed during the experiment. The fidelity is limited by imperfections of the single-photon source. For simplicity, we assume that the final fidelity F is determined by the single-qubit gate fidelity $F_s$ and the two-qubit gate fidelity $F_d$. Preparing an m-photon entangled state needs at least $m - 1$ two-qubit gates. Thus, one can estimate the final fidelity via $F = F_s^{m} F_d^{m-1}$. From the data of the independently measured qubit in the l = 3 case, the single-qubit gate fidelity is near unity ($F_s \approx 0.997$). Therefore, the final fidelity is mainly limited by the two-qubit gate fidelity. Noise from residual laser leakage, and occasionally photons from other QDs, leads to multiphoton events, which deteriorate the single-photon purity. Impure single photons, together with other effects such as charge noise, spin noise, and phonon sidebands [33,34], decrease the indistinguishability. These imperfections contribute an unwanted four-fold correlation background and reduce the fidelity of the prepared state. From the fidelity of 0.756(8), one can estimate the two-qubit gate fidelity to be 0.914(5). Our experiment can be extended to 8 photons, for which the largest order that can be found should be r = 16. Compared with the best current SPDC sources, which have reached 12-photon entanglement [30], our QD single-photon source falls short in this respect. However, the purity and indistinguishability of this solid-state single-photon source can in principle both be improved to near unity [34]. Thus, the number of entangled photons can be greatly extended.
The QD used in the current experiment has a lifetime of ∼60 ps, which is much shorter than the timescale of any single-qubit or two-qubit gate in ion-trap or superconducting-circuit architectures [35], meaning that a higher correlation counting rate (or a shorter computation time) could be achieved by increasing the repetition rate in the QD architecture. The correlation counting rate can be estimated via $R = R_0 \eta$, where $R_0$ and $\eta$ represent the repetition rate (76 MHz in the current experiment) and the system efficiency (including preparation, operation, and detection efficiency), respectively.
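The two-qubit gate fidelity quoted above follows from inverting the simple model $F = F_s^{m} F_d^{m-1}$. A minimal sketch of that arithmetic is given below; the function name `two_qubit_gate_fidelity` is hypothetical, and the inputs are the central values stated in the text without their uncertainties.

```python
def two_qubit_gate_fidelity(final_fidelity, single_qubit_fidelity, m):
    """Solve F = F_s**m * F_d**(m - 1) for the two-qubit gate fidelity F_d."""
    return (final_fidelity / single_qubit_fidelity ** m) ** (1.0 / (m - 1))

# Central values from the text: F = 0.756, F_s = 0.997, m = 4 photons.
F_d = two_qubit_gate_fidelity(final_fidelity=0.756, single_qubit_fidelity=0.997, m=4)
print(f"F_d = {F_d:.4f}")
```

The result lies within the quoted uncertainty of 0.914(5).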
We assume that QD- and SPDC-based experiments have the same repetition rate and detection efficiency. The preparation efficiency for the QD-based experiment includes the efficiency at the incident ends (fiber output), $\eta_{QD}$, which relates to the incident photon brightness, and that of the optical switches, $\eta_{PC}$ (mainly affected by the PC). The preparation efficiency for the SPDC-based experiment includes only the efficiency at the incident ends, $\eta_{SPDC}$. The operation efficiency denotes the success rate for each configuration. Therefore, the m-fold correlation counting rates for QD- and SPDC-based experiments satisfy $R_{QD} \propto (R_0/m)\,\eta_{QD}^{m}\eta_{PC}^{m-1}/2^{m-1}$ and $R_{SPDC} \propto R_0\,\eta_{SPDC}^{m/2}/2^{m/2-1}$, respectively. For direct comparison, we calculate the ratio between the counting rates of the two sources, yielding $R_{QD}/R_{SPDC} = \left(\eta_{QD}\eta_{PC}/\sqrt{2\eta_{SPDC}}\right)^{m}/(m\,\eta_{PC})$. To show the advantage of the QD-based experiment, the condition $\eta_{QD}\eta_{PC}/\sqrt{2\eta_{SPDC}} > 1$ must be satisfied. Considering the counting rate of ∼6.4 MHz, the detection efficiency of ∼80%, and $\eta_{PC} \approx 84\%$, the value of $\eta_{QD}\eta_{PC}/\sqrt{2\eta_{SPDC}}$ is approximately 0.28 when compared with the optimal SPDC source [30]. Even the best QD single-photon source to date can only increase this value to ∼0.68 [36]. Note that, owing to the trade-off between fidelity and efficiency for SPDC sources, $\eta_{SPDC}$ is already near its optimum. In contrast, high efficiency, high single-photon purity, and high indistinguishability have been achieved simultaneously on QD single-photon sources [18,19]. By embedding the QD into an asymmetric microcavity, both indistinguishability and efficiency are expected to reach near unity [20]. The value of $\eta_{QD}\eta_{PC}/\sqrt{2\eta_{SPDC}}$ is then expected to exceed 2, which would give QD single-photon sources better scalability in quantum computing.
Furthermore, we have presented techniques that simplify complicated quantum operations such as modular exponentiation, and adapted easy-to-prepare quantum states such as the $C_4$ state to the specific quantum task. This illustrates how dramatically a quantum computation can be simplified. We have also presented strategies for evaluating the circuit and analyzing the data, which enable proper characterization of the quantum task. Although an imperfect quantum circuit, mainly caused by poor entanglement fidelity, limits scalability, it has little effect on the computation results, because the answer is acquired from the similarity between measured and expected data.
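The closed-form ratio can be checked numerically against the two proportionalities it is derived from. The sketch below does this for a few values of m; the function names are hypothetical, and the efficiencies are placeholder values chosen only to exercise the formulas, not measured quantities.

```python
import math

def rate_qd(R0, m, eta_qd, eta_pc):
    """QD m-fold rate up to a common constant: (R0/m) eta_QD^m eta_PC^(m-1) / 2^(m-1)."""
    return (R0 / m) * eta_qd ** m * eta_pc ** (m - 1) / 2 ** (m - 1)

def rate_spdc(R0, m, eta_spdc):
    """SPDC m-fold rate up to the same constant: R0 eta_SPDC^(m/2) / 2^(m/2 - 1)."""
    return R0 * eta_spdc ** (m / 2) / 2 ** (m / 2 - 1)

def ratio_closed_form(m, eta_qd, eta_pc, eta_spdc):
    """(eta_QD eta_PC / sqrt(2 eta_SPDC))^m / (m eta_PC)."""
    return (eta_qd * eta_pc / math.sqrt(2 * eta_spdc)) ** m / (m * eta_pc)

# Hypothetical efficiencies (assumptions for illustration, not values from the experiment).
R0, eta_qd, eta_pc, eta_spdc = 76e6, 0.5, 0.84, 0.1
for m in (2, 4, 8):
    direct = rate_qd(R0, m, eta_qd, eta_pc) / rate_spdc(R0, m, eta_spdc)
    closed = ratio_closed_form(m, eta_qd, eta_pc, eta_spdc)
    print(m, direct, closed)
```

Because the figure of merit enters to the m-th power, whether it is above or below 1 decides how the ratio scales as more photons are added.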

Summary
In summary, we have achieved a proof-of-principle demonstration of a small-scale quantum algorithm with photons generated from a deterministic single-photon source. We have presented every necessary stage of an r = 4 order-finding routine with only four single photons. Our approach to compilation reduces the required number of qubits from $3\lceil\log_2 N\rceil$ to $2\lceil\log_2 r\rceil$ ($r < N$), and simplifies the gates by transforming modular multipliers into modular adders, which offers a way to make complicated quantum problems feasible. Genuine persistent entanglement [28] among all photonic qubits has been maintained during the experiment, indicating the quantum character of the algorithm and the circuit. Since the answer is acquired from the maximum of a parameter that quantifies the similarity between measured and expected results, it is robust to imperfections of the quantum circuit. In addition, our experiment opens new applications for the cluster state beyond one-way quantum computing [24]. By combining the compilation technique with qubit recycling [37], one may accomplish the task with a further reduced number of qubits. In scaling up to factor larger numbers, find larger orders, or even attain a full-scale demonstration that requires auxiliary qubits to store, and finally erase, the intermediate results [5], the main challenge is the limited scalability caused by the poor fidelity of multiphoton entanglement, which is due to noise from residual laser leakage, charge and spin noise, phonon sidebands, and the post-selective entanglement generation process.

Appendix A: C 4 state preparation and characterization
The photonic states $|0\rangle$ and $|1\rangle$ are represented by $|H\rangle$ and $|V\rangle$. A polarizing beam-splitter, as shown in Fig. 6(a), has two input modes, 1 and 2, and two output modes, 3 and 4. If two input photons are initialized in $(|H\rangle_1 + |V\rangle_1)(|H\rangle_2 + |V\rangle_2)/2$, the state of the output photons is $(|H\rangle_4|H\rangle_3 + |H\rangle_4|V\rangle_4 + |V\rangle_3|H\rangle_3 + |V\rangle_3|V\rangle_4)/2$. Since we post-select events with one photon in each output mode, the state is projected into $(|H\rangle_3|H\rangle_4 + |V\rangle_3|V\rangle_4)/\sqrt{2}$ with a success rate of 1/2, and in this way one can prepare an entangled state on demand.
The schematic for photonic $C_4$ state preparation is shown in Fig. 6(b). All half-wave plates in Fig. 6 […] Then, two polarizing beam-splitters project the whole state into: […] Next, three half-wave plates in the $x_1$, $x_0$, and $F_1$ modes transform the system into: […] At last, the $x_0$ and $F_0$ modes interfere at the final polarizing beam-splitter to achieve the $C_4$ state […], and measurements of all modes along the $\{|H\rangle, |V\rangle\}$ basis result in four peaks. The peaks reveal only part of the entanglement properties, so additional measurements are still necessary. One can use the $\{|D\rangle, |A\rangle\}$ basis, which can be written as a superposition of the $\{|H\rangle, |V\rangle\}$ basis, to describe the $C_4$ state equivalently. The state $|D\rangle$ is defined as $(|H\rangle + |V\rangle)/\sqrt{2}$, while the state $|A\rangle$ is defined as $(|H\rangle - |V\rangle)/\sqrt{2}$. Then, the $C_4$ state can be written as: […] and measurements of all modes along the $\{|D\rangle, |A\rangle\}$ basis also result in four peaks. Next, one can also describe the $C_4$ state equivalently with two of the four modes in the $\{|R\rangle, |L\rangle\}$ basis instead. Like the $\{|D\rangle, |A\rangle\}$ basis, the $\{|R\rangle, |L\rangle\}$ basis can be represented as a superposition of the $\{|H\rangle, |V\rangle\}$ basis: $|R\rangle = (|H\rangle + i|V\rangle)/\sqrt{2}$ and $|L\rangle = (|H\rangle - i|V\rangle)/\sqrt{2}$. Therefore, the $C_4$ state can also be written as follows: […] and measurements along these two sets of bases both result in four peaks.
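The post-selection step at a single polarizing beam-splitter can be traced with a short bookkeeping sketch. It assumes a PBS convention inferred from the output state quoted above (H transmitted, V reflected), and the function name `pbs_postselect` is hypothetical; it is an illustration of the two-photon case only, not of the full $C_4$ preparation.

```python
from itertools import product

# Assumed PBS convention (inferred from the output state quoted in the text):
# H is transmitted (1 -> 4, 2 -> 3) and V is reflected (1 -> 3, 2 -> 4).
PBS = {(1, "H"): 4, (1, "V"): 3, (2, "H"): 3, (2, "V"): 4}

def pbs_postselect():
    """Send (|H>+|V>)(|H>+|V>)/2 through the PBS and keep one photon per output mode."""
    kept = {}
    for p1, p2 in product("HV", repeat=2):
        amp = 0.5  # each of the four input terms has amplitude 1/2
        out1, out2 = PBS[(1, p1)], PBS[(2, p2)]
        if out1 != out2:  # post-selection: the photons leave through different ports
            # Label the term as (polarization in mode 3, polarization in mode 4).
            term = (p1, p2) if out1 == 3 else (p2, p1)
            kept[term] = kept.get(term, 0.0) + amp
    success = sum(a ** 2 for a in kept.values())
    norm = success ** 0.5
    state = {term: a / norm for term, a in kept.items()}
    return success, state

success, state = pbs_postselect()
print(success)  # probability that the post-selection succeeds
print(state)    # normalized amplitudes keyed by (pol in mode 3, pol in mode 4)
```

Only the HH and VV terms survive, with equal amplitudes, which is the two-photon entangled state described above.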

Appendix B: analysis of inverse QFT
Since the order-finding routine results in the periodic function F(x), the intermediate state of the routine can be rewritten as $\sum_{j=0}^{r-1}\sum_{m} |rm + j\rangle|F(j)\rangle/\sqrt{2^{l}}$, where m and j are non-negative integers satisfying $rm + j < 2^{l}$. When the function register is traced out, the argument register is projected into a mixed state. We introduce a density matrix to represent the mixed state, so that the argument register is described by $\rho = \sum_{j=0}^{r-1} |\psi_j\rangle\langle\psi_j|$, where $|\psi_j\rangle = \sum_{m} |rm + j\rangle/\sqrt{2^{l}}$. Each component $|\psi_j\rangle$ of the density matrix exhibits periodicity with a period of r, so that $l = \lceil\log_2 r\rceil$ qubits are sufficient to construct it. For the current experimental parameter of r = 4, only 2 qubits are needed in the argument register. One can theoretically calculate the expected probability distributions of the inverse QFT applied to ρ for orders from 1 to 4, which are shown in Fig. 7.

Fig. 7. Expected probability distributions for the answers of the inverse QFT, with the order from 1 to 4, and 2 qubits in the argument register. The gray line in each plot represents half of the maximum (the same applies in Fig. 8), and values exceeding this line are identified as "peaks", which appear at y = j/r.

As seen from Fig. 7, the peaks appear at y = j/r [for r = 3, the three peaks are equivalent to y = 0/3 = 0.00 (binary), y = 1/3 ≈ 0.01 (binary), and y = 2/3 ≈ 0.11 (binary)]. In the experiment, we extract the order by comparing the measured data with the expected ones. However, imperfections of the quantum circuit may lead to a wrong answer, so it is necessary to quantify the measured results. We first perform cross comparisons between the expected data shown in Fig. 7. To quantify the comparisons, we use the squared statistical overlap (SSO) [31], which is defined as $\gamma = \left(\sum_{y=0}^{7/8} \sqrt{m_y e_y}\right)^{2}$. By substituting expected probabilities into both $m_y$ and $e_y$, one can calculate the SSOs for the cross comparisons. The calculated SSOs for l = 2 are listed in the left part of Table 3.

Table 3. Calculated square of statistical overlap (SSO) for cross comparisons of the patterns shown in Fig. 7 for l = 2 (left part of the Table) […]

In Table 3, the maximum of the SSO (γ = 1, shown in bold) appears on the diagonal, meaning that the extracted answer equals the expected one. However, for r = 3 and 4, a high SSO of γ = 0.966 (shown in italics) appears off the diagonal, which may result in errors when the quantum circuit is imperfect. An additional qubit can be added to the argument register to avoid this. The expected probability distributions for l = 3 are shown in Fig. 8, and the calculated SSOs for the cross comparisons of the calculated data for l = 3 are listed in the right part of Table 3. Here, high SSOs no longer appear off the diagonal of Table 3. Therefore, at least 3 qubits are needed to make the quantum circuit more robust to noise.

Fig. 8. Expected probability distributions for the answers of the inverse QFT, with the order from 1 to 4, and 3 qubits in the argument register. Peaks in these plots look sharper than those in Fig. 7, meaning that an additional qubit makes the quantum circuit more robust to noise.
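The cross comparison described in this appendix can be reproduced numerically. The sketch below is a plain classical calculation under the definitions of ρ and the SSO given above; the function names `expected_distribution` and `sso` are hypothetical, and the outcomes are indexed by the integer $2^{l} y$ rather than by the binary fraction y.

```python
import cmath
import math

def expected_distribution(r, l):
    """Outcome probabilities of the inverse QFT applied to rho = sum_j |psi_j><psi_j|."""
    dim = 2 ** l
    probs = [0.0] * dim
    for j in range(r):
        xs = range(j, dim, r)  # all x = r*m + j below 2**l
        for y in range(dim):
            # 1/sqrt(dim) from |psi_j> times 1/sqrt(dim) from the QFT.
            amp = sum(cmath.exp(-2j * cmath.pi * x * y / dim) for x in xs) / dim
            probs[y] += abs(amp) ** 2
    return probs

def sso(measured, expected):
    """Square of statistical overlap: (sum_y sqrt(m_y * e_y)) ** 2."""
    return sum(math.sqrt(m * e) for m, e in zip(measured, expected)) ** 2

for l in (2, 3):
    dists = {r: expected_distribution(r, l) for r in range(1, 5)}
    print(f"l = {l}")
    for r1 in dists:
        print("  ", [round(sso(dists[r1], dists[r2]), 3) for r2 in dists])
```

For l = 2, the entry comparing r = 3 with r = 4 comes out close to the off-diagonal value of 0.966 quoted above, whereas for l = 3 the same entry drops well below it, in line with the argument for using a third qubit.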

Funding
National Natural Science Foundation of China (11575174, 11674308, 11704424, 11774326, 11874346); Chinese Academy of Sciences; National Key Research and Development Program of China.