Sample caching Markov chain Monte Carlo approach to boson sampling simulation

Boson sampling is a promising candidate for demonstrating quantum supremacy. It requires sampling from a complicated distribution, a task believed to be intractable on classical computers. Among the various classical sampling methods, the Markov chain Monte Carlo method is an important approach to the simulation and validation of boson sampling. This method, however, suffers from severe sample loss caused by the autocorrelation of the sample sequence. To address this, we propose the sample caching Markov chain Monte Carlo method, which eliminates the correlations among the samples while preventing sample loss, allowing more efficient simulation of boson sampling. Moreover, our method can serve as a general sampling framework that benefits a wide range of sampling tasks, and is particularly suitable for applications where a large number of samples are taken.


Introduction
Demonstrating quantum supremacy is a milestone in quantum computing, showing that quantum devices can outperform the fastest classical hardware on some task [1][2][3][4]. Evaluating a demonstration of quantum supremacy should be based on the intense competition between two sides: the developing quantum devices running selected tasks, and the classical computers running benchmarks for the corresponding tasks to explore the supremacy threshold. The chosen task should be suitable for near-term implementation as well as able to be significantly accelerated by quantum computing.
Such a task is boson sampling [5]. On the one hand, its linear optical implementation merely requires identical bosons (typically photons), linear transformations, and passive detection. On the other hand, its classical simulation involves computing permanents of Gaussian complex matrices [6], which is likely to be classically intractable, even in the approximate case [7]. The classical hardness of boson sampling has attracted enormous efforts to build large-scale physical devices to beat classical computers, and remarkable achievements have been made [8][9][10][11][12][13][14][15][16][17][18]. Boson sampling is thus a promising route to showing quantum advantage.
On the classical side, various sampling approaches have been proposed for simulating boson sampling. Direct computation of the whole boson sampling distribution is restricted to small scales because of the exponential growth of the state space. This has stimulated a variety of methods for sampling from an exponentially large state space by calculating only a small number of probabilities, such as the Clifford-Clifford (C&C) sampling method [19], the rejection sampling method [20] and the Metropolised independent sampling (MIS) method [21, 22]. Among these approaches, the MIS method generates an effective sample by evaluating only a constant number of probabilities. More importantly, it can serve as a general sampling framework for far-ranging applications.
However, as with the optical implementation of boson sampling, sample loss is also a serious problem for classical samplers, as shown in figure 1(a). Current sampling methods generate massive numbers of candidate samples, each involving the evaluation of one probability, but only a small fraction of candidates are kept as effective samples, as shown in figure 1(b). The MIS method has to discard massive numbers of candidates in order to tackle the autocorrelation issue inherited from the Markov chain Monte Carlo (MCMC) method. At this point, we define the computational hardness limit of a sampling task as the complexity of computing one probability. For boson sampling, this is the complexity of computing one permanent [5]; reaching this limit requires eliminating the overhead of the sampling method. In this work, we propose a method that reaches this limit.
Our method, namely sample caching Markov chain Monte Carlo (SC-MCMC), makes contributions on two sides. (1) On the classical side, the SC-MCMC sampler provides a general sampling framework that generates one effective sample from, on average, one candidate sample. SC-MCMC deals with the autocorrelation issue of MCMC in a manner similar to MIS, but caches the discarded candidate samples for later reuse. In this way, the SC-MCMC sampler avoids sample loss and allows much more efficient sampling than an MIS sampler. In particular, the SC-MCMC method can be used for quantum-enhanced applications based on boson sampling, such as simulating molecular vibronic spectra [23], finding dense subgraphs [24] or approximate optimization [25]. Moreover, our method can also be directly applied to other quantum supremacy candidates, such as Gaussian boson sampling [26], IQP circuit sampling [27, 28] and random quantum circuit (RQC) sampling [29, 30], especially when many samples are taken for validating the device [31]. (2) On the quantum side, our results suggest that the hardness of demonstrating quantum supremacy by boson sampling increases with the development of classical simulators. Based on the simulation results, we emphasize the importance of reducing photon loss in the optical implementation of boson sampling, showing that a 5% improvement in the loss rate would spare hundreds of photons.

Boson sampling and Markov chain Monte Carlo
The task of boson sampling is to sample from the output distribution of an m-mode interferometer network with n identical photons as input. Because of the interference of photons, the probability of a certain output pattern is related to the permanent of a submatrix of the transformation matrix determined by the interferometer network. Specifically, the output patterns are post-selected within the 'collision-free' regime where no bunching occurs in any output port. The value of m is often chosen to be n² to meet the requirement of the hardness proof of boson sampling, so the number of possible output patterns, the binomial coefficient C(m, n) = m!/[n!(m − n)!], grows exponentially with n. In summary, the task of classically simulating boson sampling is to generate samples from the probability distribution over the C(m, n) output patterns.

Figure 1. (a) The quantum runtime of a boson sampler is mainly determined by the repetition rate R_q of the multi-photon source, the transmission probability for a single photon η (the sample transmission probability η_q = η^n), and the probability for a collision-free event P_CF. The classical runtime consists of the repetition rate of generating a candidate sample R_c, which is determined by the time for calculating a permanent, and the probability for keeping a sample η_c. (b) The average acceptance probability of the mainstream general sampling methods when a large number of samples are taken. The computational hardness limit represents the computational complexity of the core problem; in boson sampling, it is the computation of a permanent, which corresponds to η_c = 100%. The C&C sampler has an equivalent cost of two permanent calculations per effective sample, corresponding to an equivalent acceptance probability of 0.5.

Instead of calculating the probabilities of the whole distribution, a feasible method is the Metropolis-Hastings algorithm, a kind of MCMC method [32]. The sampling is processed by constructing a Markov chain whose state space corresponds to the set of C(m, n) possible output patterns of boson sampling, with the probabilities of all the patterns as the stationary probabilities. To extend the chain, a state s′ is chosen as the candidate state (the current state is s) following an easy-to-sample symmetric proposal distribution, and this candidate state is accepted and appended to the chain with probability

A(s → s′) = min{1, p(s′)/p(s)},   (1)

where p(s) is the boson sampling probability of state s, or rejected with the remaining probability. If the candidate is rejected, the current state is duplicated at the end of the chain. Each time a node is added to the chain, the pattern corresponding to that node is output as a candidate sample. In this way, it seems that ideally one need evaluate only one probability per sample.
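The Metropolis step described above can be sketched in a few lines. The following is a minimal illustration on a toy four-state target with a uniform (hence symmetric) proposal, not the boson sampling distribution itself, whose probabilities would each require a permanent computation:

```python
import random

def metropolis_chain(p, states, steps, seed=0):
    """Metropolis sampler with a uniform (hence symmetric) proposal.

    p(s) gives the (unnormalized) probability of a state; one candidate
    sample is produced per probability evaluation.
    """
    rng = random.Random(seed)
    s = rng.choice(states)              # arbitrary initial state
    ps = p(s)
    for _ in range(steps):
        s_new = rng.choice(states)      # uniform proposal over all states
        p_new = p(s_new)
        if rng.random() < min(1.0, p_new / ps):
            s, ps = s_new, p_new        # accept with prob. min{1, p(s')/p(s)}
        yield s                         # a rejection duplicates the current state

# toy target p(x) proportional to x + 1 over four states; state 3 has probability 4/10
samples = list(metropolis_chain(lambda x: x + 1, range(4), 50_000))
freq3 = samples.count(3) / len(samples)
```

The empirical frequency of state 3 converges toward its stationary probability of 0.4, but, as discussed next, consecutive samples of such a chain are correlated.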
Unfortunately, the generated samples may be erroneous because the generated sample sequence suffers from severe autocorrelation, as shown in figure 2(a). The first-order autocorrelation of a sequence is often estimated using the Durbin-Watson statistic [33]; in this paper, we follow a more straightforward way to estimate the autocorrelation by

r_1 = [Σ_{i=1}^{N−1} (x_i − x̄)(x_{i+1} − x̄)] / [Σ_{i=1}^{N} (x_i − x̄)²],   (2)

where x_i is the value of the ith sample and x̄ is the mean of the samples. Each x_i is assigned a value as the order of its pattern after the patterns are sorted. The value of r_1 lies in [−1, 1] and reflects the autocorrelation in the sequence: the closer |r_1| is to 1, the more strongly the samples are correlated, and the sign of r_1 indicates whether the samples are positively or negatively correlated.
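The estimator above translates directly into code; a minimal sketch (the two short sequences are illustrative only):

```python
def lag1_autocorr(xs):
    """Lag-1 autocorrelation r1 of a sequence:
    sum over i of (x_i - mean)(x_{i+1} - mean), divided by
    the sum over i of (x_i - mean)^2."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

r_pos = lag1_autocorr([1, 1, 1, 2, 2, 2, 3, 3, 3])  # slowly drifting: positive r1
r_neg = lag1_autocorr([1, 2, 1, 2, 1, 2, 1, 2])     # alternating: negative r1
```

A slowly drifting sequence gives r_1 close to +1, while a strictly alternating one gives a strongly negative r_1, matching the sign convention described above.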
To overcome the autocorrelation of the sample sequence, MIS applies a method called 'jump sampling' (or the 'thinning procedure') to obtain independent samples, as shown in figure 2(b). In jump sampling, one effective sample is kept in every K candidates, with the remaining K − 1 candidates discarded. The remaining samples are approximately independent. The value of K is determined by checking the autocorrelation of the obtained sequence. In MIS, the value of K is 100, which is claimed to be sufficient for simulating boson sampling of more than 30 photons [21]. However, this means that in MIS the ratio of keeping a candidate sample is only 1/K = 1%. Other methods such as delayed rejection [34][35][36] could be used to reduce the correlations among the samples, at the cost of calculating more probabilities per sample. If the autocorrelation issue could be tackled without abandoning any samples, the sampling process could be accelerated by 100 times.
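Jump sampling itself is a one-line slicing operation; the sketch below (with placeholder integer candidates) makes the 1/K keep ratio explicit:

```python
def jump_sampling(candidates, K):
    """Keep one sample in every K candidates; the other K - 1 are
    discarded, so the keep ratio is 1/K."""
    return candidates[::K]

kept = jump_sampling(list(range(1000)), K=100)  # keeps indices 0, 100, ..., 900
```

With K = 100, only 10 of 1000 computed candidates survive as effective samples, i.e. 99% of the permanent computations are wasted.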

The sample caching method
The main scheme of the SC-MCMC protocol is shown in figure 2(c). It contains two parts: an MCMC sampler providing candidate samples, combined with a procedure that we call 'sample caching'. In figure 2(d) we show the working process of the SC-MCMC sampler, which can be divided into three phases:
1. The cache filling phase (CFP). In this phase, the SC-MCMC sampler follows a protocol similar to the MIS sampler: it outputs one effective sample in every K candidates, where K is the jump step used in the MIS sampler. The difference is that the unselected candidates are used to fill a sample cache until the cache is full. The sample cache is completely filled after generating ⌈L + L/(K − 1)⌉ candidate samples, where L is the size of the cache. In this phase, K probabilities have to be calculated per effective sample.
2. The working phase (WP), in which the cache is full. Each time a candidate sample is generated by the MCMC process, a sample is first output uniformly at random from the cache, and then the new candidate is stored in its place. This procedure is repeated until the MCMC process has generated N candidates. In WP, the SC-MCMC approach therefore drives efficient sample generation: an effective sample is generated by calculating only one probability.
3. The cache cleaning phase (CCP). In this phase, the samples stored in the cache are output in random order without calculating any probabilities, producing a burst of sample generation. This phase can be regarded as compensation for the initial overhead of the CFP, with the result that, on average, every probability evaluation supports the generation of one effective sample.
Nevertheless, the first candidate sample can be directly output as an effective sample, since there is no earlier sample with which it can be correlated. As reflected in figure 2(d), if only a small number of samples are taken (i.e. fewer than the ⌈L + L/(K − 1)⌉ candidates needed to fill the cache), the sampling process may stop in the CFP, in which case the SC-MCMC sampler is simply executing the MIS protocol. Generally, in the final CCP, cleaning the cache provides a burst of sample generation. However, this step can be dangerous when the value of N is not large enough, because the burst output could result in sample correlation, and whether a sample can be kept is determined by how strongly it is correlated with the sample sequence. If a large number of samples are taken, SC-MCMC can be K times as fast as MIS. It is worth noting that the generation of candidate samples discussed here differs slightly from that in [21], where the proposal probability distribution is chosen as the distribution of distinguishable particles corresponding to the boson sampling instance. That choice reduces the correlations between samples, but introduces the extra computational cost of one real permanent. Here we choose the uniform distribution as the proposal distribution, so that only one permanent needs to be calculated for every candidate sample, while the autocorrelation is eliminated by the sample cache without abandoning any candidates (as we will show in the following discussion). In this way, compared with the C&C sampler, which has an approximately equivalent cost of calculating two n×n permanents per effective sample [19], the SC-MCMC sampler can modestly obtain a two-fold improvement when a large number of samples are taken.
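The three phases can be sketched as follows. This is an illustrative reimplementation, not the authors' code: integer "candidates" stand in for the output of an MCMC process, which makes it easy to check that every candidate eventually appears exactly once in the output, i.e. that no sample is lost.

```python
import random

def sc_mcmc(candidates, K, L, seed=0):
    """Reorder a stream of MCMC candidate samples through a cache of size L.

    K is the jump step of the underlying MIS-style thinning; every
    candidate eventually becomes one effective sample.
    """
    rng = random.Random(seed)
    cache, out = [], []
    for i, c in enumerate(candidates):
        if len(cache) < L:           # 1. cache filling phase (CFP)
            if i % K == 0:
                out.append(c)        # keep one effective sample per K candidates
            else:
                cache.append(c)      # cache the unselected candidates
        else:                        # 2. working phase (WP)
            j = rng.randrange(L)
            out.append(cache[j])     # output a uniformly random cached sample
            cache[j] = c             # store the new candidate in its place
    rng.shuffle(cache)               # 3. cache cleaning phase (CCP):
    out.extend(cache)                # flush the cache in random order
    return out

# every one of the 10 000 candidates becomes one effective sample
samples = sc_mcmc(range(10_000), K=100, L=400)
```

The output is a permutation of the candidate stream: the cache only reorders samples, which is what removes the short-range correlations without discarding anything.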
Similar to MIS, SC-MCMC is case independent. Besides boson sampling, SC-MCMC can be directly used as a general sampling framework for any sampling task in which the probabilities of given events can be evaluated, for example Gaussian boson sampling [26], IQP circuit sampling [27, 28] and random quantum circuit sampling [29, 30]. Meanwhile, SC-MCMC is particularly suitable for tasks where a large number of samples are taken.
Essentially, the correctness of SC-MCMC is guaranteed by the MCMC process. Several methods can be applied to validate the sampling results [37][38][39], but a more straightforward way is to compare the frequency graph with the probability distribution. We found empirically that the number of samples may have to be of the order of 100·C(m, n) to construct a frequency graph with a similarity of 99% to the probability distribution. More importantly, the sample caching process eliminates the autocorrelations within the sample sequence without losing any candidate samples. Under the asymptotic condition where the size of the cache is large enough, the distribution of the samples stored in the cache follows the target probability distribution, and the uniformly random choice makes the draws independent. In practice, the correlations among the samples can be eliminated with a sample cache of limited size. In figure 3(a) we plot the first-order autocorrelation against the size of the sample cache for different scales of boson sampling. In figure 3(b) we plot the autocorrelation of the sample sequences obtained from different approaches in simulating an instance of boson sampling with 30 photons and 900 modes, where the 'lag' indicates the order of the autocorrelation. In the sample sequence from the original MCMC approach, residual correlation between two samples can still be found even at lag 100, owing to the improper choice of the proposal probability distribution (see supplementary material for details, available online at stacks.iop.org/NJP/22/033022/mmedia). In comparison, the SC-MCMC sampler with L=1000 reduces the autocorrelation at every order to a quite low level, and a cache with L=4000 can generate approximately independent samples, even though the proposal distributions in the SC-MCMC and the MCMC samplers are the same.
To understand how the sample cache works, we first analyze the jump sampling method. For an ergodic Markov chain, the k-step transition probability satisfies

lim_{k→∞} p_ij^(k) = p(s_j)

for arbitrary i and j, where p(s_j) is the probability of state s_j [22, 40, 41]. Thus a sample is independent of another that is infinitely many steps away. In fact, after a certain number of steps (e.g. K steps), the K-step transition probability p_ij^(K) approximately equals p(s_j) for arbitrary i and j, which means that a sample is approximately independent of samples more than K steps away. The value of K depends on how fast the Markov chain converges [42]; empirically, we can choose a sufficiently large value. If the samples within the K steps are discarded, the correlations among the remaining samples are negligible.
In SC-MCMC, the output can be regarded as a reordered version of the original sequence. The probability that two adjacent output samples are at a distance of k (k ≥ 1) steps in the original sequence is

P(k) = (1/L)(1 − 1/L)^(k−1),   (3)

where L is the size of the sample cache. The probability that two adjacent output samples are still correlated (i.e. that their distance in the original sequence does not exceed K, the corresponding jump step in jump sampling) is then

P_cr = Σ_{k=1}^{K} P(k) = 1 − (1 − 1/L)^K ≈ K/L,   (4)

and the first-order autocorrelation is thus eliminated if the sample cache is sufficiently large. The autocorrelations of higher orders are eliminated in the same way (see supplementary information).
The next question is what size the cache should be in practice. The exact size of the sample cache required varies case by case, but we can choose a sufficiently large cache. The answer follows from equation (4): ensuring P_cr < ε requires L > K/ε. Setting K=200 and ε=0.05, giving L=4000, is sufficient for the case of n=30 and m=900. This is expected to remain sufficient for boson sampling at larger scales, and the size of the cache could be further enlarged if needed. Most importantly, no sample is lost regardless of the cache size, which means that our method reaches a time complexity for simulating boson sampling of O(n·2^n) per sample.
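Under equation (4), choosing the cache size reduces to simple arithmetic. A sketch, assuming the bound P_cr ≤ K/L so that L ≈ K/ε suffices:

```python
import math

def p_correlated(K, L):
    """P_cr: probability that two adjacent output samples lie within
    K steps of each other in the original chain, for a cache of size L."""
    return 1.0 - (1.0 - 1.0 / L) ** K

def cache_size(K, eps):
    """Cache size from the bound P_cr <= K/L < eps, i.e. L of order K/eps."""
    return math.ceil(K / eps)

L = cache_size(K=200, eps=0.05)  # the text's choice of K and eps gives L = 4000
p = p_correlated(200, L)         # slightly below eps, since 1-(1-1/L)^K <= K/L
```

For K = 200 and ε = 0.05 this reproduces L = 4000, with an exact residual correlation probability just under the 5% target.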

Numerical simulation
We demonstrate our method for boson sampling simulation by implementing the parallelized Glynn's algorithm to exactly compute the permanents of arbitrary square matrices [43]. The numerical simulations were run on 64 nodes of the Tianhe-2 supercomputer [44, 45] or on a local cluster, as shown in table 1. The sampling rate reached 1.01 Hz on the Tianhe-2 nodes when simulating boson sampling with 30 photons, and 32 cluster nodes afforded the simulation of 21 photons at a sampling rate of 18.09 Hz. The percentage of time spent calculating permanents exceeds 99% when n is sufficiently large. Combined with the performance estimation for the Tianhe-2 supercomputer [46], the estimated average time (in seconds) for obtaining an n-photon candidate sample is T(n) = 1.9925 × n²·2^n × 10^(−15) when a large number of samples are generated. This scaling indicates that the estimated average time for a 50-photon sample can be reduced from about 10 d to within 100 min. To compare the performance of the simulator with physical setups, we use the function defined in [21],

QA(n, η) = log(t_c / t_q),   (5)

where t_c is the estimated classical runtime for a boson sampling instance with n photons and m=n² modes, and t_q ∝ 1/(R_q·η^n) is the quantum runtime for the corresponding instance, in which R_q is the n-photon repetition rate of the photon source and η is the transmission probability for a photon (see supplementary information). QA>0 indicates that the performance of a quantum boson sampling setup surpasses that of the Tianhe-2 supercomputer.
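As a rough illustration of this scaling (the expression for T(n) is reconstructed from the estimate above, and the constant should be treated as indicative only):

```python
def t_candidate(n):
    """Estimated seconds per n-photon candidate sample, using the
    reconstructed scaling T(n) = 1.9925e-15 * n**2 * 2**n (illustrative)."""
    return 1.9925e-15 * n**2 * 2**n

# with SC-MCMC one candidate yields one effective sample on average, so a
# 50-photon effective sample costs about T(50); MIS-style thinning with
# K = 100 would cost roughly 100 times as much
t50_minutes = t_candidate(50) / 60
```

Under this scaling, a 50-photon effective sample takes on the order of 90 minutes with SC-MCMC, consistent with the "within 100 min" figure quoted in the text, while the same sample under 100-fold thinning would take days.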
In figure 4 we plot four lines corresponding to QA(n, η)=0 under different combinations of classical approaches and physical setups. Given the transmission probabilities realized in recent experiments (<0.4 [8-11, 13, 16, 18]), the increase in photon number required to reach the corresponding performance is vast. For example, when η=0.5 and R_q = 10 GHz (probably beyond the reach of near-term experiments), the increase exceeds 300 photons, while it reduces to 30 when η is improved to 0.55. Further, for a network of given shape, there exists a minimum requirement on η, even with sufficiently many photons (see supplementary information). Our results suggest that currently, enhancing the transmission probability can greatly reduce the photon number required for quantum advantage.

Figure 4. …, the leading parameter of the proposed photon source [18], obtained from a 76 MHz quantum dot source; the red lines are the results obtained from SC-MCMC. The increment of photon number is vast when η<0.5, and it reduces by more than 300 when η is improved by only 5%.

Conclusion
In summary, we have proposed a classical sampling approach that can be used to simulate boson sampling. Our method tackles the autocorrelation issue that leads to sample loss in MIS, and can be 100 times as fast as MIS when a large number of samples are taken. For cases where only a small number of samples are required, our method performs as well as MIS. Our method therefore enables more efficient simulation of boson sampling, especially in cases where boson sampling is applied to quantum-enhanced problems. Above all, our method is a general sampling framework and can support customized optimization for specific tasks to gain further efficiency.
Progress in classical simulators poses a challenge to physical implementations, but it can also provide instructive feedback on physical experiments. The comparison between classical computing and quantum boson sampling suggests that currently, reducing photon loss may bring experiments closer to quantum advantage, and is perhaps more important than merely enlarging the scale of boson sampling devices.