Heralded amplification of nonlocality via entanglement swapping

To observe the loophole-free violation of the Clauser–Horne–Shimony–Holt (CHSH) inequality between two distant parties, i.e. the CHSH value $S>2$, the main limitation on distance stems from the loss in the transmission channel. The entanglement swapping relay (ESR) is a simple way to amplify the signal and enables us to evade the impact of the transmission loss. Here, we experimentally test the heralded nonlocality amplifier protocol based on the ESR. We observe that the obtained probability distribution is in excellent agreement with that expected from the numerical simulation with experimental parameters which are precisely characterized in a separate measurement. Moreover, we experimentally estimate the nonlocality of the heralded state after the transmission of 10 dB loss just before final detection. The estimated CHSH value is $S=2.113>2$, which indicates that our final state possesses nonlocality even with transmission loss and various experimental imperfections. Our result clarifies an important benchmark of the ESR protocol, and paves the way towards the long-distance realization of the loophole-free CHSH-violation as well as device-independent quantum key distribution.


Introduction
Nonlocality is not only an interesting feature of quantum mechanics which can be tested by the celebrated Bell inequality [1,2], but also the key resource for quantum information protocols. Recently, it was pointed out that a system violating the Bell inequality in a detection-loophole-free manner is directly related to quantum information applications such as device-independent quantum key distribution (DIQKD) [3,4]. Assuming that the physical apparatuses are honest [5,6], DIQKD allows the two users, Alice and Bob, to guarantee the security without characterizing the internal workings of the devices. However, its practical implementation is still challenging. One of the most formidable obstacles is closing the detection loophole [7][8][9][10][11][12], which requires the receiver to detect at least 2/3 of the emitted photons [13]. That is, if a standard optical fiber at telecommunication wavelength with a loss of 0.2 dB km$^{-1}$ is used as a transmission channel, the achievable distance becomes less than 10 km even if photon detectors with unity detection efficiencies are employed.
Toward realization of the loophole-free violation of the Bell inequality over long distance, several protocols to circumvent the impact of transmission loss have been proposed, such as the linear-optics-based heralded photon amplifier (HPA) [14][15][16] and the heralding protocol with a nonlinear process [17]. When a single-photon state (such as a part of an entangled photon pair) is sent into a lossy channel, the state becomes a mixture of a single-photon state and a vacuum. These protocols can increase the fraction of the qubit (single photon) and suppress the vacuum fraction with a certain probability. For example, the HPA utilizes an ancillary entangled state formed by a vacuum and a single-photon state to amplify the single-photon fraction of the input state. By applying them to the Bell state transmission, one can recover the lost Bell state with some success
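The distance limit quoted above follows from simple loss arithmetic. The following minimal sketch assumes that the only loss is the fiber attenuation and that the 2/3 threshold applies to the channel transmittance itself; the function name is illustrative and not taken from the paper.

```python
import math

def max_fiber_length_km(min_efficiency, loss_db_per_km=0.2):
    """Longest fiber whose transmittance stays above min_efficiency.

    Assumes fiber attenuation is the only loss (ideal detectors).
    """
    loss_budget_db = -10.0 * math.log10(min_efficiency)
    return loss_budget_db / loss_db_per_km

# Detection-loophole threshold of 2/3 with 0.2 dB/km fiber.
length_km = max_fiber_length_km(2.0 / 3.0)
print(f"{length_km:.1f} km")
```

Under these assumptions the loss budget is about 1.76 dB, which corresponds to a fiber length below 10 km, consistent with the statement above.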

Heralded nonlocality amplification by entanglement swapping
In this section, we review the ESR-based heralding scheme. The ESR-based heralding scheme proposed in [25] is illustrated in figure 1(a). Entangled photon pairs are prepared at Alice's side by source A. One photon of each pair is sent to Bob via a lossy optical channel with transmittance $\eta_T$, which easily destroys the nonlocality of the state. Bob prepares another entangled photon pair by source B, and performs the ESR by the Bell state measurement (BSM) to recover the lost nonlocality of the state shared between Alice and Bob. Since the ESR succeeds only probabilistically, this is a probabilistic protocol, and we use the state only when it is heralded by the successful events of the ESR.
In practice, entangled photon pair sources A and B are based on SPDC, which generates entangled photon pairs only probabilistically, and moreover, sometimes generates multiple pairs simultaneously. The adverse influence of the probabilistic nature of SPDC sources on the heralding schemes has been addressed in some previous studies. For example, it is argued that the vacuum components of the SPDC sources remain in the heralded state $\hat\rho_{AB}$, and this leads to the vanishing of nonlocality [29]. However, this is not true, since after successful swapping, the heralded state mainly consists of the superposition of the following three events: (i) two photon pairs from source A and no photon pair from source B, (ii) two photon pairs from source B and no photon pair from source A, and (iii) one photon pair from each of sources A and B, which actually does not contain a vacuum component. In [14,17], it is pointed out that, in the above heralded state, (iii) is clearly the desirable event, but the probability that the unwanted events (i) or (ii) occur is almost the same as that of (iii). Therefore, the fidelity of the heralded state $\hat\rho_{AB}$ to the two-qubit maximally entangled states never exceeds 0.5, which was thought to be a reason that the generated state loses its nonlocality. However, as shown by Curty and Moroder [25], this $\hat\rho_{AB}$ still violates the CHSH inequality. That is, $\hat\rho_{AB}$ contains some nonlocality, although it is far from ideal Bell states.
In this paper, we demonstrate these theoretical predictions by estimating the nonlocality and density matrix of the heralded state using experimental results.

Model
We first explain the procedure to generate a raw key using the ESR-based DIQKD in figure 1(a). Alice (Bob) generates entangled photon pairs at source A (B). Under the condition that the BSM succeeds at the ESR node, Alice and Bob perform the polarization measurements based on the measurement settings $X_i\in\{X_1,X_2\}$ and $Y_j\in\{Y_1,Y_2\}$, respectively. The measurement outcomes are binary, i.e. $a_i,b_j\in\{-1,+1\}$. By repeating the measurement, they calculate
$$S=\langle a_1b_1\rangle+\langle a_1b_2\rangle+\langle a_2b_1\rangle-\langle a_2b_2\rangle. \qquad (1)$$
While $|S|$ is upper-bounded by 2 in the framework of a local realistic theory, quantum mechanics allows $|S|$ to take the maximal value of $2\sqrt{2}$, which is known as the Tsirelson bound [30]. When Alice and Bob perform DIQKD, Alice chooses another measurement basis $X_0$, and the raw key is generated by the outcomes under the measurement setting of $\{X_0,Y_1\}$. The lower bound of the asymptotic key rate $r$ is represented by [3,4]
$$r\geq r_{DW}\geq 1-h(Q)-h\left(\frac{1+\sqrt{(S/2)^2-1}}{2}\right), \qquad (2)$$
where $r_{DW}$ is the Devetak-Winter rate [31], and $Q$ is the qubit error rate, which is defined by
$$Q=P(a\neq b\,|\,X_0,Y_1). \qquad (3)$$
Here, $h(\cdot)$ is the binary entropy defined by $h(x)=-x\log_2x-(1-x)\log_2(1-x)$.
Next, we describe the theoretical model. We modify the configuration in figure 1(a) to the one illustrated in figure 1(b). The difference is that the BSM is located not on Bob's side but in the middle of the channel, and thus the channel is split into two parts with transmittances $\eta_{TA}$ and $\eta_{TB}$, respectively. We do so because the previous theoretical studies with ideal Bell states [32] and single-photon sources [29] indicate that the configuration in figure 1(b) is better. This anticipation is investigated in detail using the realistic model introduced below. Hereafter, we call the configuration in figure 1(b) the middle-heralding (MH) scheme and the other one the side-heralding (SH) scheme.
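The quantities used in the key-rate analysis can be evaluated with a few lines of code. The sketch below assumes the standard CHSH combination with the minus sign on the $(X_2,Y_2)$ correlator and the standard DIQKD bound $1-h(Q)-h\big((1+\sqrt{(S/2)^2-1})/2\big)$ of [3,4]; the function names are illustrative.

```python
import math

def binary_entropy(x):
    """h(x) = -x log2 x - (1 - x) log2 (1 - x), with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def chsh_value(e11, e12, e21, e22):
    """CHSH value from the four correlators <a_i b_j> (assumed sign convention)."""
    return e11 + e12 + e21 - e22

def devetak_winter_bound(s, q):
    """Assumed lower bound on the asymptotic key rate for CHSH value s and error rate q."""
    if abs(s) <= 2.0:
        return 0.0  # no violation, so no positive rate is guaranteed
    x = (1.0 + math.sqrt((s / 2.0) ** 2 - 1.0)) / 2.0
    return max(0.0, 1.0 - binary_entropy(q) - binary_entropy(x))

# Ideal singlet-like correlations saturate the Tsirelson bound.
e = 1.0 / math.sqrt(2.0)
s_ideal = chsh_value(e, e, e, -e)
rate_ideal = devetak_winter_bound(s_ideal, 0.0)
# A small violation, with an error-free key basis assumed for illustration.
rate_small = devetak_winter_bound(2.113, 0.0)
```

With these assumptions, a modest violation such as $S=2.113$ gives a rate well below one bit per heralded event even for $Q=0$, which illustrates why the key rate is more demanding than the Bell violation itself.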
As a realistic model with SPDC sources, we introduce a theoretical model similar to the one introduced in [27], as shown in figure 2. Each entangled photon pair source consists of a pair of two-mode squeezed vacua (TMSVs), each of which is generated by a two-mode squeezing Hamiltonian acting on a pair of polarization modes, for source A and for source B. Here, $|H\rangle_j$ and $|V\rangle_j$ denote the H- and V-polarization states of a single photon in mode $j$, respectively. At the ESR node, we perform the partial Bell-state measurement using linear optics. We adopt the projection onto $|\Psi^-\rangle$, which is realized by detecting the two-fold coincidence between $D_{5V}$ and $D_{6H}$. In the two-qubit system, the successful operation of the ESR thus corresponds to projecting the photons in modes 2 and 3 onto $|\Psi^-\rangle_{23}$.
The polarizer with angle $\theta$ works as a polarization-domain beamsplitter mixing the H and V modes, whose transmittance and reflectance are $\cos^2\theta$ and $\sin^2\theta$, respectively. Under the condition of the two-fold coincidence between $D_{5V}$ and $D_{6H}$, Alice (Bob) chooses her (his) angle from $\theta_A=\{\theta_{A0},\theta_{A1},\theta_{A2}\}$ ($\theta_B=\{\theta_{B1},\theta_{B2}\}$), and performs polarization measurements. We calculate the probability of all the combinations of the photon detection (click) and no-detection (no-click) events among $D_1$, $D_2$, $D_3$ and $D_4$ for each polarization angle, and obtain the probability distributions. For calculating $S$, Alice (Bob) determines her (his) local rule, and assigns +1 or −1 to each detection event. We introduce the following simple local assignment strategy for Alice (Bob): the event in which only $D_1$ ($D_2$) clicks is assigned −1, and all other events are assigned +1. The losses in the transmission channels are represented by $\eta_{AH}$, $\eta_{AV}$, $\eta_{BH}$ and $\eta_{BV}$. (Thus, the SH scheme can be simulated by setting $\eta_{BH}=\eta_{BV}=1$.) The local system losses including the imperfect quantum efficiencies of the detectors are modeled by inserting virtual loss materials denoted by $\eta_l$ for $l\in\{1,\ldots,8\}$. We consider that all of the detectors are threshold detectors, which only distinguish between vacuum (no-click) and non-vacuum (click). The dark-count probability $\nu$, which is the probability of a false click of the detector, is also taken into account in the model. The mode mismatch between Alice's TMSV and Bob's TMSV is modeled by inserting a virtual beamsplitter (BS) whose transmittance is $T_{mode}$ in each input port of the half beamsplitter (HBS) at the ESR node, as shown in the inset of figure 2. In other words, two virtual BSs divide the mode of each TMSV into two parts: the mode which interferes, with probability amplitude $\sqrt{T_{mode}}$, and the mode which does not, with probability amplitude $\sqrt{1-T_{mode}}$. The experimental value of $T_{mode}$ can be determined by performing the Hong-Ou-Mandel interference experiment [33][34][35].
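The threshold-detector model with loss and dark counts can be illustrated for a single mode. The sketch below assumes the covariance-matrix convention in which the vacuum is the identity matrix, a zero-displacement Gaussian state, and a no-click POVM element of the form $(1-\nu)|0\rangle\langle0|$; these conventions and the helper names are assumptions made for illustration, not a transcription of the paper's equations.

```python
import math

def thermal_covariance(mean_photon_number):
    """Covariance matrix of a single-mode thermal state (vacuum = identity)."""
    v = 2.0 * mean_photon_number + 1.0
    return [[v, 0.0], [0.0, v]]

def attenuate(gamma, eta):
    """Pure loss with transmittance eta acting on a single-mode covariance matrix."""
    return [[eta * gamma[i][j] + ((1.0 - eta) if i == j else 0.0)
             for j in range(2)] for i in range(2)]

def no_click_probability(gamma, dark_count=0.0):
    """(1 - nu) * <0|rho|0> for a zero-mean single-mode Gaussian state."""
    det = (gamma[0][0] + 1.0) * (gamma[1][1] + 1.0) - gamma[0][1] * gamma[1][0]
    vacuum_overlap = 2.0 / math.sqrt(det)
    return (1.0 - dark_count) * vacuum_overlap

def click_probability(gamma, dark_count=0.0):
    return 1.0 - no_click_probability(gamma, dark_count)

# One arm of a TMSV is a thermal state; here with mean photon number 1
# (chosen only to make the numbers easy to check), seen through 50% loss.
arm = attenuate(thermal_covariance(1.0), 0.5)
p_click = click_probability(arm, dark_count=1e-6)
```

For a thermal state with mean photon number $n$ the vacuum overlap reduces to $1/(1+n)$, which gives a quick sanity check of the convention.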

Numerical results
The CHSH value $S$ in equation (1) is numerically calculated by using the characteristic-function approach based on the covariance matrix of the quantum state and symplectic transformations [27,36,37]. See appendix A and [36] for more details of this method. Below we show the numerical results comparing the MH and the SH schemes. When the corresponding fiber length is $L$ km, we set $\eta_{AH}=\eta_{AV}=\eta_T$ and $\eta_{BH}=\eta_{BV}=1$ for the SH scheme, and $\eta_{AH}=\eta_{AV}=\eta_{BH}=\eta_{BV}=\sqrt{\eta_T}$ for the MH scheme, where $\eta_T=10^{-0.2L/10}$. In figure 3(a), we show the relation between $L$ and the CHSH value $S$ in an ideal system where all the local detection efficiencies are unity, the mode matching is perfect, and the detectors have no dark counts (i.e. $\eta_l=1$ for all $l$, $T_{mode}=1$, and $\nu=0$). At each point, we perform the optimization over the average photon numbers of the TMSVs and the measurement angles using a random search algorithm. We see that the degradation of $S$ against the transmission distance is small for both the MH and the SH schemes, since it is possible to set the optimal average photon numbers to be small (typically $\sim10^{-5}$) in the ideal case. This makes the detrimental contribution of the multiple pairs negligible. Interestingly, the maximal violation at 0 km is $S\sim2.34$, which is slightly better than what is achieved by using a single-mode SPDC-based entangled pair source (No ESR) [37,38]. On the other hand, the minimum detection efficiency to obtain $S>2$ is calculated to be 91.1%, which is larger than the 66.7% needed in the case of No ESR [37,38]. These differences come from the fact that the density operator of the heralded state is far from that of the state directly generated by SPDC, which mainly consists of the vacuum state. The relation between $L$ and the key rate $K$ is shown in figure 3(b). Here, we define $K:=P_{suc}\times r_{DW}$ in the MH and the SH schemes, and $K:=r_{DW}$ in No ESR, where $P_{suc}$ is the success probability of the linear-optical BSM.
The average photon numbers which maximize $S$ are no longer optimal for maximizing $K$, since employing small average photon numbers results in a low $P_{suc}$ at the ESR node. That is, there is a trade-off between $S$ and $P_{suc}$ for maximizing $K$. We clearly see the difference of $K$ between the MH scheme and the SH scheme. The reason is qualitatively understood as follows. In the SH scheme, since a large loss is imposed on the TMSVs from source A, the average number of photons which survive at the ESR node is smaller than that in the MH scheme, which results in the lower $P_{suc}$. In the short fiber length regime, $K$ in No ESR is about two orders of magnitude larger than that in the MH and the SH schemes. This is because $P_{suc}$ is around 0.01 while $r_{DW}$ is similar between No ESR and the MH/SH schemes. Next, we add dark-count probabilities of $\nu=10^{-6}$ and $\nu=10^{-5}$, and compare $S$ of the SH scheme and the MH scheme as shown in figure 4(a). $S$ of the SH scheme starts to deviate from that of the MH scheme for large $L$. The reason is also understood by the trade-off between $S$ and $P_{suc}$. When dark counts are considered, it is necessary to keep the average number of the photons that survive at the ESR node sufficiently larger than the dark-count probability. Thus, in the SH scheme, the optimal average photon number of source A must be larger than that in the MH scheme, which however results in a smaller $S$. The minimum detection efficiencies to obtain $S>2$ slightly increase. For example, at $L=50$ km in the MH scheme, 91.6% and 92.7% are necessary in the case of $\nu=10^{-6}$ and $\nu=10^{-5}$, respectively. Finally, we compare $K$ of the SH scheme and that of the MH scheme while considering the dark-count probabilities, as shown in figure 4(b). We see a large gap between $K$ of the MH scheme and that of the SH scheme. These results suggest that the MH scheme is always better than the SH scheme, which is consistent with previous studies about ESR schemes [29,32] and time-reversed versions of ESR schemes [39][40][41][42].
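The loss settings of the two schemes can be written down explicitly. In this minimal sketch the MH scheme is assumed to split the total transmittance $\eta_T$ evenly, so that each half of the channel has transmittance $\sqrt{\eta_T}$; the function and key names are illustrative.

```python
import math

def channel_transmittance(length_km, loss_db_per_km=0.2):
    """eta_T = 10^(-0.2 L / 10) for a fiber of L km."""
    return 10.0 ** (-loss_db_per_km * length_km / 10.0)

def scheme_transmittances(length_km, scheme):
    """Channel transmittances for the side-heralding (SH) or middle-heralding (MH) scheme."""
    eta_t = channel_transmittance(length_km)
    if scheme == "SH":
        eta_a, eta_b = eta_t, 1.0          # all loss on Alice's side
    elif scheme == "MH":
        eta_a = eta_b = math.sqrt(eta_t)   # assumed even split of the loss
    else:
        raise ValueError("scheme must be 'SH' or 'MH'")
    return {"eta_AH": eta_a, "eta_AV": eta_a, "eta_BH": eta_b, "eta_BV": eta_b}

sh = scheme_transmittances(50.0, "SH")
mh = scheme_transmittances(50.0, "MH")
```

For $L=50$ km the total transmittance is 0.1, i.e. 10 dB, so in the MH case each photon entering the ESR node experiences only about 5 dB of loss.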

Experimental setup
We perform the ESR-based Bell-test experiment using the setup illustrated in figure 5. The pump pulse (wavelength: 792 nm, pulse duration: 2 ps, repetition rate: 76 MHz) is obtained from a Ti:Sapphire laser. The pump pulse is split into two optical paths by a half waveplate (HWP) and a polarization beamsplitter (PBS), and fed to the two independent Sagnac-loop interferometers with group-velocity-matched periodically poled KTiOPO$_4$ (GVM-PPKTP) crystals [43]. The polarization of each pump pulse is properly adjusted by a HWP and a pair of quarter waveplates (QWPs). The two-qubit components of the states generated from sources A and B form the maximally entangled states $|\Psi^+\rangle_{12}$ and $|\Psi^-\rangle_{34}$, respectively. While photon 1 (4) passes through the dichroic mirror (DM) and goes to Alice's (Bob's) side, photons 2 and 3 are led to the ESR node to perform the linear-optical BSM. The transmission losses in the optical fibers are emulated by two neutral density filters (NDs) inserted in modes 2 and 3. In each optical path, we insert an interference filter (IF) whose center wavelength and bandwidth are 1584 nm and 2 nm, respectively, which is used to improve the purity of the SPDC photons. The linear-optical BSM is implemented by mixing the two input photons by means of a HBS followed by the polarization-dependent coincidence detection between $D_{5V}$ and $D_{6H}$, which projects the photon pair in modes 2 and 3 onto the singlet state $|\Psi^-\rangle_{23}$ with a success probability of 1/8. We note that if we introduce another two detectors and perform active feed forward, the maximum success probability becomes 1/2. We use superconducting single-photon detectors (SSPDs) whose quantum efficiency is around 75% each [44]. Alice and Bob set measurement angles $\{\theta_{A1},\theta_{A2}\}$ and $\{\theta_{B1},\theta_{B2}\}$, respectively, by means of the HWPs and fiber-based PBSs (FPBSs). Finally, the photons are detected by four SSPDs: $D_1$ and $D_3$ for Alice, and $D_2$ and $D_4$ for Bob.
In the experiment, the detection signal from $D_{5V}$ is used as a start signal for a time-to-digital converter, and the detection signals from the other detectors are used as the stop signals. Under the condition that the two-fold coincidence between $D_{5V}$ and $D_{6H}$ occurs, all the combinations of click and no-click events are collected without postselection. We assign −1 to the events where $D_1$ ($D_2$) clicks on Alice's (Bob's) side and +1 to all the other events, and then calculate $S$.
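The action of an ideal Bell-state projection on the two-qubit components can be checked numerically. The following minimal sketch works in the ideal two-qubit picture only (no loss, no multi-pair emission) and uses plain dictionaries for the amplitudes; all names are illustrative.

```python
import math

H, V = 0, 1
INV_SQRT2 = 1.0 / math.sqrt(2.0)

def psi(sign):
    """Amplitudes of (|HV> + sign * |VH>) / sqrt(2), keyed by (left, right) qubit."""
    return {(H, V): INV_SQRT2, (V, H): sign * INV_SQRT2}

psi_plus_12 = psi(+1)    # source A, modes 1 and 2
psi_minus_34 = psi(-1)   # source B, modes 3 and 4
psi_minus_23 = psi(-1)   # BSM projection on modes 2 and 3

# Contract <Psi-|_23 with |Psi+>_12 |Psi->_34; what remains lives on qubits 1 and 4.
unnormalized = {}
for (q1, q2), a in psi_plus_12.items():
    for (q3, q4), b in psi_minus_34.items():
        bra = psi_minus_23.get((q2, q3), 0.0)
        if bra != 0.0:
            unnormalized[(q1, q4)] = unnormalized.get((q1, q4), 0.0) + bra * a * b

projection_probability = sum(x * x for x in unnormalized.values())
norm = math.sqrt(projection_probability)
heralded_14 = {k: x / norm for k, x in unnormalized.items()}
fidelity_psi_plus = sum(psi(+1).get(k, 0.0) * x for k, x in heralded_14.items()) ** 2
```

In this idealized picture the projection onto $|\Psi^-\rangle_{23}$ occurs with probability 1/4 and leaves modes 1 and 4 in $|\Psi^+\rangle_{14}$ up to a global sign. The value of 1/8 quoted above is consistent with this if only one of the two detector combinations that signal $|\Psi^-\rangle$ is monitored, which is our reading of the setup rather than a statement from the text.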
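The assignment rule described above turns the 16 click patterns into a correlator. The sketch below assumes that the −1 outcome is given to the event in which only $D_1$ (for Alice) or only $D_2$ (for Bob) clicks, following the local rule of the theoretical model; the data structure and function names are illustrative.

```python
from itertools import product

def alice_outcome(d1, d3):
    """-1 if only D1 clicks on Alice's side, +1 for every other pattern."""
    return -1 if (d1 and not d3) else +1

def bob_outcome(d2, d4):
    """-1 if only D2 clicks on Bob's side, +1 for every other pattern."""
    return -1 if (d2 and not d4) else +1

def correlator(prob):
    """<a b> from a dict mapping click patterns (d1, d2, d3, d4) to probabilities."""
    return sum(p * alice_outcome(d1, d3) * bob_outcome(d2, d4)
               for (d1, d2, d3, d4), p in prob.items())

# All 2^4 click / no-click patterns of D1..D4 are kept, with no postselection.
patterns = list(product((0, 1), repeat=4))
uniform = {pattern: 1.0 / len(patterns) for pattern in patterns}
anticorrelated = {(1, 0, 0, 1): 0.5, (0, 1, 1, 0): 0.5}
```

Four such correlators, one per pair of measurement angles, are then combined into the CHSH value.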

Characterization of experimental setup
We measure the experimental parameters which will be used in the numerical simulation. We first characterize the HBS at the ESR node using laser light centered at 1584 nm. It is found that the HBS is lossy only for the H-polarized light from mode 3. This loss is modeled by decomposing the HBS into a lossy material ($\eta_{AH}=0.27$) and an ideal HBS in the numerical simulation. We also characterize the local detection efficiencies $\eta_l$ for $l\in\{1,\ldots,6\}$ by using the weakly pumped TMSVs [45]. The results are shown in table 1. Throughout the experiment, we set the widths of the detection windows to be 1 ns. The dark-count probability within the detection window is measured to be $\nu=10^{-6}$.
Under the above experimental conditions, we perform the numerical optimization of the average photon numbers of the TMSVs and the measurement angles such that $S$ is maximized. Note that, in the optimization, we assume $\eta_1=\eta_2=\eta_3=\eta_4=1$, since the detection efficiencies shown in table 1 are not sufficient to observe the detection-loophole-free violation of the CHSH inequality. In addition, we impose the condition that each average photon number is at least $1.5\times10^{-2}$ to finish the experiment within a reasonable time. We set the average photon numbers of the TMSVs based on the numerical results. The optimal average photon numbers and the experimentally measured ones are shown in table 2, where $\mu_k$ is the average photon number of TMSV $k$. We see that $\mu_2$ is larger than the others, since $\eta_{AH}$ is imposed in the transmission path of TMSV 2. The optimal measurement angles are $\{\theta_{A1},\theta_{A2}\}=\{0,0.58\}$ [rad] and $\{\theta_{B1},\theta_{B2}\}=\{1.47,2.01\}$ [rad]. With the above experimental parameters, the two-qubit subspace of the input quantum states and the indistinguishability between photon 3 and photon 4 are also characterized. (See appendices C and D.)

Experimental results
We adopt the MH scheme, and perform the ESR-based Bell-test experiment. Under the condition of the successful BSM, we accumulate every detection event of the heralded state without postselection. First, we remove the ND filters, and perform the Bell-test experiment on the heralded state with the optimal measurement angles. Since the detection efficiencies of our system are not in the range of closing the detection loophole, $S$ does not directly exceed the threshold value of $S=2$. In fact, when we input all the experimental parameters to the numerical simulation, the value of $S$ is expected to be $S_{th}=1.614$. Nevertheless, it is still possible to compare $S_{th}$ and the CHSH value obtained by the experiment, $S_{exp}$. From the experimentally obtained conditional probability distributions, $S_{exp}$ is calculated to be $S_{exp}=1.597\pm0.002$, which is close to $S_{th}$. We also compare the conditional detection probabilities. For example, all the conditional detection probabilities for $\{\theta_{A1},\theta_{B1}\}=\{0,1.47\}$ [rad] are shown in figure 6. Since each of Alice and Bob possesses two detectors, there are $2^4=16$ possible detection events for each measurement angle. The red bars and blue bars correspond to the conditional probabilities obtained by the experiment and the numerical simulation, respectively. We clearly see an excellent agreement between the experimental results and the numerical simulations. Moreover, the L1-distance between the experimentally obtained conditional detection probabilities $p_i$ and the theoretically obtained ones $q_i$ is calculated to be as small as $0.037\pm0.001$. The L1-distances are also calculated for the other measurement angles.

[Table residue: 14.43±0.01%, 11.57±0.07%]

Table 2. The optimal average photon numbers of the TMSVs, and the average photon numbers estimated by the experiment.

Next, we insert the ND filters, and perform the Bell-test experiment on the heralded state while changing the transmission losses. Note that we fix the average photon numbers and measurement angles throughout the experiment. The results are shown in figure 7 as three black dots. The total transmittances of the ND filters are equivalent to (i) 0 km, (ii) 24 km and (iii) 50 km of optical fiber, and the corresponding values of $S$ are (i) $S_{exp}=1.597\pm0.002$, (ii) $S_{exp}=1.579\pm0.002$ and (iii) $S_{exp}=1.591\pm0.002$. They agree well with the theoretical curve for the MH scheme (shown by an orange solid curve) obtained by using the experimental parameters characterized by a separate measurement in section 4.2. When the detection efficiencies are small, the difference between the MH scheme and the SH scheme (shown by red diamonds) is small. The blue solid curve (the MH scheme) and green diamonds (the SH scheme) are obtained by the numerical simulation with the experimental parameters but assuming that $\eta_l=1$ for $l\in\{1,2,3,4\}$. Since our model fits the experimental results, these curves are considered to represent the nonlocality of the heralded state just before detection. Interestingly, there is a large gap between the MH scheme and the SH scheme. The estimated CHSH values ($S_{\eta=1}$) are shown by the three circles in figure 7. The values are estimated to be $S_{\eta=1}=2.123$ (0 km), 2.121 (24 km) and 2.113 (50 km), respectively, which indicates that the quantum state just before detection possesses the potential to violate the CHSH inequality even with various experimental imperfections.
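The comparison between measured and simulated distributions can be reproduced with a short helper. The sketch below assumes the plain sum of absolute differences over the 16 detection events; if the intended definition carries a factor of 1/2 (total variation distance), the value is simply halved. The example distributions are placeholders for illustration, not data from the experiment.

```python
def l1_distance(p, q):
    """Sum of |p_i - q_i| over all detection events (assumed convention, no 1/2 factor)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# Placeholder two-outcome distributions, used only to exercise the function.
example_distance = l1_distance([0.5, 0.5], [0.4, 0.6])
```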

Discussion
In this section, we estimate the density matrix of the experimentally heralded state just before detection by compensating for the detection inefficiency with the help of our theoretical model. As shown in figure 2, the heralded state is distributed over the four modes $H_1$, $V_1$, $H_4$ and $V_4$. Each matrix element in the photon-number basis of these modes is calculated from the characteristic function of the heralded state and that of the corresponding operator, such as $|1100\rangle\langle0011|$. We use the characteristic function of the heralded state for 50 km, and reconstruct the unnormalized partial density matrix, as shown in figure 8. (See appendix B for the detailed calculation.) In addition to the four center peaks which correspond to $|\Psi^+\rangle\langle\Psi^+|$, we clearly see the contribution of the events (i) and (ii). By renormalizing this partial density matrix, the fidelity to $|\Psi^+\rangle\langle\Psi^+|$ is calculated to be 0.47. This indicates that the heralded state is clearly far from the two-qubit maximally entangled states, while it possesses enough nonlocality to violate the CHSH inequality. This counterintuitive result may come from the fact that the heralded state possesses a significant amount of entanglement. The density matrix shown in figure 8 implies that $\hat\rho^{herald}_{H_1V_1H_4V_4}$ is very close to the pure state given in equation (6), which is defined on the modes $i_j$ for $i\in\{H,V\}$ and $j\in\{1,4\}$. Equation (6) is a maximally entangled state in 4×4 dimensions whose entropy of entanglement is 2. This means that the amount of entanglement is enough to present some nonlocality.
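The two numbers in this argument, an entropy of entanglement of 2 and a fidelity near 0.5, can be checked on a toy state. The sketch below assumes an equal-weight superposition of the photon-number states $|1100\rangle$, $|0011\rangle$, $|1001\rangle$ and $|0110\rangle$ in the mode order $(H_1,V_1,H_4,V_4)$, with all relative phases set to +1 for illustration; the actual phases of equation (6) are not reproduced here. Because the reduced state of this particular vector is diagonal, its eigenvalues are read off the diagonal.

```python
import math

# Occupation patterns (n_H, n_V) of one party's two polarization modes.
LOCAL_BASIS = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Assumed state, keyed by (Alice pattern, Bob pattern); phases chosen as +1.
amplitudes = {
    ((1, 1), (0, 0)): 0.5,  # both photons on Alice's side
    ((0, 0), (1, 1)): 0.5,  # both photons on Bob's side
    ((1, 0), (0, 1)): 0.5,  # one photon each: H on Alice's side, V on Bob's side
    ((0, 1), (1, 0)): 0.5,  # one photon each: V on Alice's side, H on Bob's side
}

def reduced_density_matrix(amps):
    """Alice's reduced density matrix for real amplitudes."""
    return [[sum(amps.get((a, b), 0.0) * amps.get((a2, b), 0.0) for b in LOCAL_BASIS)
             for a2 in LOCAL_BASIS] for a in LOCAL_BASIS]

rho_alice = reduced_density_matrix(amplitudes)
max_off_diagonal = max(abs(rho_alice[i][j]) for i in range(4) for j in range(4) if i != j)
eigenvalues = [rho_alice[i][i] for i in range(4)]  # valid because rho_alice is diagonal here
entropy_of_entanglement = -sum(p * math.log2(p) for p in eigenvalues if p > 0.0)

# Overlap with the two-qubit Bell state (|H>_1|V>_4 + |V>_1|H>_4) / sqrt(2).
s = 1.0 / math.sqrt(2.0)
psi_plus = {((1, 0), (0, 1)): s, ((0, 1), (1, 0)): s}
fidelity_to_psi_plus = sum(c * amplitudes.get(k, 0.0) for k, c in psi_plus.items()) ** 2
```

Under these assumptions the toy state carries 2 ebits of entanglement while its fidelity to the two-qubit Bell state is exactly 0.5, in line with the measured 0.47 and with the bound of 0.5 discussed earlier.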

Conclusion
In conclusion, we experimentally demonstrate the heralded nonlocality amplifier based on the ESR. In theory, we employ the method to calculate the detection probabilities using the characteristic function, and investigate the optimal parameters and configuration which maximize $S$. In accord with the previous studies, the MH scheme (the ESR node is placed in the middle of Alice and Bob) is much more robust against the transmission loss than the SH scheme (the ESR node is placed on Bob's side) in the realistic model with SPDC sources. In experiment, we perform the ESR-based Bell test using the optimal parameters derived by the numerical simulation. While the detection efficiencies of our system are not in the range of closing the detection loophole, the experimental results are in excellent agreement with the numerical simulation with experimental parameters which are characterized in a separate measurement. This allows us to estimate the nonlocality and the density matrix of the heralded state just before detection. It is revealed that, while the density matrix of the heralded state is far from the ideal two-qubit maximally entangled state, the state possesses nonlocality ($S_{\eta=1}=2.113>2$) after the transmission loss of 10 dB, which is equivalent to a 50-km-long optical fiber at telecommunication wavelength. To directly observe $S>2$ over 50 km, it is found that a detection efficiency of at least 97.4% is necessary with our current experimental conditions. However, the threshold detection efficiency can be improved further down to 91.6% if the experimental imperfections other than the dark counts are reduced. In view of the recent progress of single-photon detection highlighted by high-efficiency single-photon detectors with quantum efficiencies >93% [46,47], it could be possible to experimentally observe the nonlocality over such a long distance.
Our result thus shows an important benchmark about the ESR protocol, and represents a major building block towards the long-distance realization of the loophole-free test of the CHSH-violation as well as DIQKD.

Appendix A. Detailed calculations based on the characteristic function
In this section, we present the detailed method to compute the conditional detection probabilities using the theoretical model in figure 2. We follow the definitions introduced in [37]. We define a density operator acting on the N-dimensional Hilbert space $\mathcal{H}^{\otimes N}$ as $\hat\rho$. The characteristic function of $\hat\rho$ is defined by
$$\chi(\xi)=\mathrm{Tr}[\hat\rho\,\hat{\mathcal{W}}(\xi)],$$
where $\hat{\mathcal{W}}(\xi)=\exp(-i\xi^T\hat{R})$ is the Weyl operator, and $\hat{R}$ and $\xi=(\xi_1,\ldots,\xi_{2N})$ are a $2N$ vector consisting of quadrature operators and a $2N$ real vector, respectively. When the characteristic function of the quantum state has a Gaussian distribution
$$\chi(\xi)=\exp\left(-\tfrac{1}{4}\xi^T\gamma\,\xi-i\,d^T\xi\right),$$
the quantum state is simply characterized by a $2N\times2N$ matrix $\gamma$ (the covariance matrix) and a $2N$-dimensional vector $d$ (the displacement vector).
In our theoretical model, each entangled photon pair source consists of two TMSV sources over polarization modes embedded in the Sagnac loop. The covariance matrices of the quantum states from source A and source B together describe the overall input quantum state, whose covariance matrix is denoted by $\gamma^{in}$. The photons in modes $H_2$, $V_2$, $H_3$ and $V_3$ are sent to the ESR node through the transmission losses. We describe the transformation of the linear loss with transmittance $t$ on a single-mode Gaussian state with covariance matrix $\gamma$ by
$$\gamma\rightarrow K^T\gamma K+\alpha,$$
where $K=\sqrt{t}\,I$ and $\alpha=(1-t)I$. Then, the linear losses $\eta_{AH}$, $\eta_{AV}$, $\eta_{BH}$ and $\eta_{BV}$ transform the input covariance matrix $\gamma^{in}$ into $\gamma^{Loss}$. Here, for simplicity, we represent a block diagonal matrix like $\begin{bmatrix}A&0\\0&A\end{bmatrix}$ by $A^{\oplus2}$.
As described in section 2, the mode matching between the photons in ($H_2$ and $H_3$) and in ($V_2$ and $V_3$) is considered by dividing each input light pulse into two mutually orthogonal modes, as shown in figure A1(a). This is modeled by inserting virtual BSs whose transmittances are $T_{mode}$ before the HBS, as shown in figure A1(b). The fractions with probability $T_{mode}$ interfere at the HBS, while the fractions with probability $1-T_{mode}$ are mixed with vacua by the HBS. In the numerical simulation, we first add the eight modes ($H(V)_{2a}$, $H(V)_{3a}$, $H(V)_{2b}$ and $H(V)_{3b}$) of vacua to $\gamma^{Loss}$.
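The linear-loss map on a covariance matrix can be illustrated on a single TMSV. The sketch below assumes the quadrature ordering $(x_1,p_1,x_2,p_2)$ and the convention in which the vacuum covariance matrix is the identity, which is consistent with $\alpha=(1-t)I$; the ordering, the squeezing parameter and the helper names are assumptions made for illustration.

```python
import math

def tmsv_covariance(r):
    """Covariance matrix of a two-mode squeezed vacuum with squeezing parameter r."""
    c, s = math.cosh(2.0 * r), math.sinh(2.0 * r)
    return [[c, 0.0, s, 0.0],
            [0.0, c, 0.0, -s],
            [s, 0.0, c, 0.0],
            [0.0, -s, 0.0, c]]

def apply_loss(gamma, mode, t):
    """gamma -> K^T gamma K + alpha, with loss of transmittance t on one mode only."""
    n = len(gamma)
    k = [1.0] * n
    alpha = [0.0] * n
    for idx in (2 * mode, 2 * mode + 1):
        k[idx] = math.sqrt(t)
        alpha[idx] = 1.0 - t
    return [[k[i] * gamma[i][j] * k[j] + (alpha[i] if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def mean_photon_number(gamma, mode):
    """Mean photon number of one mode of a zero-displacement Gaussian state."""
    return (gamma[2 * mode][2 * mode] + gamma[2 * mode + 1][2 * mode + 1] - 2.0) / 4.0

r = 0.1  # illustrative squeezing parameter, not a value from the experiment
gamma_in = tmsv_covariance(r)
gamma_loss = apply_loss(gamma_in, mode=1, t=0.1)  # 10 dB loss on the second mode
```

Before the loss both modes carry $\sinh^2 r$ photons on average; afterwards the lossy mode carries $t\sinh^2 r$ while the other is untouched, which is the behavior expected of the map.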
Second, we perform the symplectic transformations of the BSs, each of which is described by the symplectic matrix of a BS with transmittance $\cos^2\theta$ acting on the modes $i$ and $j$. Finally, we perform the symplectic transformation of the HBSs, which yields the covariance matrix $\gamma^{BSM}$. The detection probabilities are calculated from $\gamma_{j_1\ldots j_n}$, the submatrix obtained by extracting the rows and columns corresponding to modes $j_1\ldots j_n$ from $\gamma^{BSM}$. In equation (A19), we use the POVM elements of the threshold detector acting in mode $j$, in which $\nu$ is the dark-count probability. In the numerical simulation, $P_{suc}$ is given by the probability of the two-fold coincidence between $D_{5V}$ and $D_{6H}$. In the experiment, the success probability of the BSM $P_{suc}$ is equal to $P(D_{5V}\cap D_{6H})$.
Here, we define $\mathrm{Tr}_{\setminus H_1V_1H_4V_4}$ by the partial trace over all remaining modes except for $H_1$, $V_1$, $H_4$ and $V_4$. The covariance matrices of $\hat\rho_{\gamma_1}$, $\hat\rho_{\gamma_2}$ and $\hat\rho_{\gamma_3}$ are given by the Schur complements [48] of $\gamma^{BSM}$, where $\gamma^{i}_{j_1\ldots j_n}$ is the submatrix obtained by extracting the rows and columns corresponding to modes $j_1\ldots j_n$ from $\gamma^{final}_i$.
Here, we only consider up to the single-photon state for each mode.

Appendix C. Input state characterization
We characterize the input quantum states by performing two-qubit quantum state tomography [50]. Changing the measurement angles, we collect the two-fold coincidence counts between $D_1$ ($D_2$) and $D_{6H}$ for characterizing the quantum state generated from source A (B). In this experiment, we insert a QWP and a HWP in mode 6, and a QWP just before a HWP in each of mode 1 and mode 4. The two-qubit quantum states generated by sources A and B are reconstructed by performing the maximum likelihood estimation [51] using the probability distributions obtained by the experiment. The reconstructed two-qubit density operators generated from sources A ($\hat\rho_A$) and B ($\hat\rho_B$) are shown in figures C1(a) and (b). These results indicate that highly entangled states are prepared as initial states. The error bars are obtained by assuming a Poissonian distribution for the photon counts.
In order to evaluate the indistinguishability between the photons in modes 2 and 3 which interfere at the HBS, we perform the HOM experiment [33][34][35]. We detect the photons in modes 1 and 4 with V-polarization, and observe the HOM interference between the H-polarized photons in modes 2 and 3. We measure the four-fold coincidence counts among $D_1$, $D_2$, $D_{5H}$ and $D_{6H}$ while changing the relative delay by means of a motion stage. The result is shown in figure D1. We clearly see the HOM dip around the zero-delay point. The visibility is calculated to be $V_{HOM}=0.74\pm0.03$. The degradation of the visibility is mainly caused by (i) the mode matching $T_{mode}$ between photons 3 and 4, and (ii) multiple pair generation at the sources. To see the degree of the contribution of $T_{mode}$, we perform the theoretical calculation considering the experimental imperfections.
When we set $T_{mode}=1$, the visibility is estimated to be $V^{th}_{HOM}=0.91$, which indicates that the remaining degradation is caused by the mode mismatch. $V^{th}_{HOM}=0.74$ is obtained for $T_{mode}=0.9$. We adopt this value in the numerical simulations.

Appendix E. Characterization of the heralded state
We show the two-qubit density operators of the heralded states reconstructed from the experimentally obtained probability distributions in figure E1. Panels (i), (ii) and (iii) correspond to the two-qubit density operators of the heralded states when the corresponding fiber lengths are 0 km, 24 km and 50 km, respectively. … regardless of the distance. We guess that the reason why $F^{ex}_{herald}$ is lower than $F^{th}_{herald}$ is that additional spatial mode mismatch is caused by inserting the ND filters.