Faster classical Boson Sampling

Since its introduction, Boson Sampling has been the subject of intense study in the world of quantum computing. The task is to sample independently from the set of all $n \times n$ submatrices built from possibly repeated rows of a larger $m \times n$ complex matrix, according to a probability distribution related to the permanents of the submatrices. Experimental systems exploiting quantum photonic effects can in principle perform the task at great speed. In the framework of classical computing, Aaronson and Arkhipov (2011) showed that the exact Boson Sampling problem cannot be solved in polynomial time unless the polynomial hierarchy collapses to the third level. Indeed, for a number of years the fastest known exact classical algorithm ran in $O({m+n-1 \choose n} n 2^n )$ time per sample, emphasising the potential speed advantage of quantum computation. The advantage was reduced by Clifford and Clifford (2018), who gave a significantly faster classical solution taking $O(n 2^n + \operatorname{poly}(m,n))$ time and linear space, matching the complexity of computing the permanent of a single matrix when $m$ is polynomial in $n$. We continue this line of work by presenting an algorithm for Boson Sampling whose average-case running time is much lower when $m$ is proportional to $n$. In particular, when $m = n$ our algorithm runs in approximately $O(n\cdot 1.69^n)$ time on average. This result further increases the problem size needed to establish quantum computational supremacy via Boson Sampling.


Introduction
The search for so-called quantum computational supremacy has garnered a great deal of interest and investment in recent years. One of the most promising candidates for this goal was introduced at STOC '11 by Aaronson and Arkhipov, who described an experimental set-up in linear optics known as Boson Sampling.
Since its introduction, Boson Sampling has attracted a great deal of attention, with numerous experimental efforts around the world attempting implementations for various problem sizes (see e.g. Spring et al., 2013; Bentivegna et al., 2015; Broome et al., 2013; Tillmann et al., 2013; Crespi et al., 2013; Spagnolo et al., 2014; Latmiral et al., 2016). The ultimate goal is to exhibit a physical quantum experiment of such a scale that it would be hard if not impossible to simulate the output classically, and thereby to establish so-called 'quantum supremacy'. In terms of the physical Boson Sampling experiment, $n$ corresponds to the number of photons and $m$ to the number of output modes, and increasing either of these is difficult in practice. Progress has therefore been slow (see Lund et al., 2017, and the references therein), with the experimental record until recently being $n = 5$, $m = 9$ (Wang et al., 2016). However, in a breakthrough result in 2019, Wang et al. demonstrated a Boson Sampling experiment with $n = 20$ photons and $m = 60$. Their Boson Sampling experimental setup has 20 input photons of which 14 are detected, following the model of Aaronson and Brod (2016).
Phrased in purely mathematical terms, the task is to generate independent random samples from a particular probability distribution on all multisets of size $n$ with elements chosen from $[m]$, as follows. It is convenient to represent a multiset by an array $z = (z_1, \ldots, z_n)$ consisting of the elements of the multiset in non-decreasing order. We denote the set of distinct values of $z$ by $\Phi_{m,n}$, and set $\mu(z) = \prod_{j=1}^{m} s_j!$, where $s_j$ is the multiplicity of the value $j$ in $z$. The cardinality of $\Phi_{m,n}$ is known to be $\binom{m+n-1}{n}$; see Feller (1968), for example. Now let $A^{[n]} = (a_{ij})$ be the complex-valued $m \times n$ matrix consisting of the first $n$ columns of a given $m$-dimensional Haar random unitary matrix $A$. For each $z$, build an $n \times n$ matrix $A^{[n]}_z$ where the $k$-th row of $A^{[n]}_z$ is row $z_k$ of $A^{[n]}$ for $k = 1, \ldots, n$, and define a probability mass function (pmf) on $\Phi_{m,n}$ as
$$q(z|A) = \frac{|\operatorname{Per} A^{[n]}_z|^2}{\mu(z)}, \qquad (1)$$
where $\operatorname{Per} A^{[n]}_z$ is the permanent of $A^{[n]}_z$, that is $\operatorname{Per} B = \sum_{\sigma} \prod_{k=1}^{n} b_{k,\sigma(k)}$, where the summation is over all $\sigma \in \pi[n]$, the set of permutations of $[n]$. For given $A$, the Boson Sampling problem is to simulate random samples from the pmf $q(z|A)$, either with a quantum photonic device or with classical computing hardware.
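As a concrete illustration of this definition, the full pmf can be tabulated for tiny instances by brute force. The sketch below (function names are ours) computes permanents naively from the sum over permutations, so it is only feasible for very small $n$ and $m$:

```python
import itertools
import math
from functools import reduce
from operator import mul

def permanent(M):
    """Permanent from the definition: sum over all permutations (O(n! n) time)."""
    n = len(M)
    return sum(
        reduce(mul, (M[i][p[i]] for i in range(n)), 1)
        for p in itertools.permutations(range(n))
    )

def boson_sampling_pmf(A, n):
    """Tabulate q(z|A) over all multisets z of size n with elements from [m],
    using the first n columns of A (rows and columns indexed from 0)."""
    m = len(A)
    pmf = {}
    for z in itertools.combinations_with_replacement(range(m), n):
        # mu(z): product of factorials of the multiplicities in z
        mu = math.prod(math.factorial(z.count(j)) for j in set(z))
        Az = [[A[zk][j] for j in range(n)] for zk in z]
        pmf[z] = abs(permanent(Az)) ** 2 / mu
    return pmf
```

For a unitary $A$ the probabilities sum to one; with the $2 \times 2$ balanced beam-splitter matrix, the multiset containing both modes receives probability zero, recovering the Hong-Ou-Mandel effect.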
Aaronson and Arkhipov show that exact Boson Sampling is not efficiently solvable by a classical computer unless $P^{\#P} = BPP^{NP}$ and the polynomial hierarchy collapses to the third level. Although the original proof restricted the range of $m$ and $n$, a brute-force evaluation of the probabilities of each multiset, as a preliminary to random sampling, requires the calculation of $\binom{m+n-1}{n}$ permanents of $n \times n$ matrices, each one of which takes $O(n 2^n)$ time with the fastest known algorithms. Previously we gave a faster classical sampling algorithm running in $O(n 2^n + \operatorname{poly}(m, n))$ time and linear space (Clifford and Clifford, 2018). This suggested that at least 50 photons would be needed in any Boson Sampling experiment to demonstrate so-called quantum computational supremacy. A more recent fine-grained complexity-theoretic analysis of Boson Sampling suggests that in fact 90 input photons will be needed to achieve quantum computational supremacy (Dalzell et al., 2020).
In practical terms, increasing the number of input photons is not the only difficulty when performing Boson Sampling. Experiments also get increasingly difficult to perform as the number of output modes increases. Current experiments, for example, have typically set the number of modes to be a small multiple of the number of photons, and it is likely this trend will continue in the near future.
The original hardness result for exact Boson Sampling of Aaronson and Arkhipov required that $m \ge 2n$ as a crucial step in their reduction. For approximate Boson Sampling the number of output modes has to increase at least quadratically with the number of input photons for any known hardness results to apply. On the other hand, there is no existing evidence that approximate Boson Sampling is easier than exact Boson Sampling for any set of parameters. Moreover, Grier and Schaeffer (2018) showed that computing the permanent of a unitary matrix is itself #P-hard. By applying this result to the proof technique of Aaronson and Arkhipov, it follows that even with the number of output modes equal to the number of input photons, Boson Sampling cannot be performed in polynomial time classically unless $P^{\#P} = BPP^{NP}$. This raises the question of whether experimentalists should now focus on increasing the number of input photons alone, keeping the number of output modes to be a small multiple of, or even equal to, the number of input photons.
We answer this question in the negative and show that if $m = \theta n$ for constant $\theta \ge 1$ then it is indeed possible to sample from the Boson Sampling distribution significantly more quickly than was known before. We establish the expected complexity averaged over all realisations. For simplicity, we first give the result for the case $m = n$. The time complexity for the general case is given in Section 6. The additional space complexity on top of that needed to store the input is $O(m)$.
Our sampling algorithm also applies directly to the closely related problem of scattershot Boson Sampling (Bentivegna et al., 2015). From a mathematical perspective this produces no additional complications beyond the specification of a different $m \times n$ matrix. Once the matrix is specified we can apply our new sampling algorithm directly.
Related work

Terhal and DiVincenzo (2004) were the first to recognise that studying the complexity of sampling from low-depth quantum circuits could be useful for demonstrating a separation between quantum and classical computation. Since that time the goal of finding complexity separations for quantum sampling problems has received a great deal of interest. We refer the interested reader to Lund et al. (2017) and Harrow and Montanaro (2017) for surveys on the topic.
The search for faster classical algorithms for Boson Sampling has been of central interest since the problem was first explicitly formulated by Aaronson and Arkhipov.The first significant breakthrough was a Markov chain Monte Carlo (MCMC) sampling procedure developed by Neville et al. (2017).
The overall approach of MCMC is to take samples from some easy-to-compute proposal distribution and then to accept them depending on their individual likelihoods. Neville et al. were able to provide numerical evidence that, for limited problem sizes, only approximately 200 such permanent computations were needed to take one sample approximately from the Boson Sampling distribution. This MCMC approach is, however, necessarily approximate and does not give provable bounds on the quality of its approximation.
In a previous paper (Clifford and Clifford, 2018) we presented a faster and provably correct Boson Sampling algorithm running in $O(n 2^n + \operatorname{poly}(m, n))$ time per sample, costing approximately two permanent calculations of $n \times n$ matrices for all values of $n$. In recent papers the algorithm has been extended and adapted to examine the effect of non-uniform losses and binned input/output modes; see Moylett et al. (2019), Shchesnovich (2019), Brod and Oszmaniec (2020) and the references therein.

Computing the permanent of low rank matrices
The complexity of computing the permanent of an arbitrary $k \times k$ complex matrix was shown to be $O(k^2 2^k)$ (Ryser, 1963) and subsequently $O(k 2^k)$ (Nijenhuis and Wilf, 1978; Glynn, 2010). The computation time is decreased when there are several identical rows or columns, resulting in a matrix of reduced rank (Tichy, 2011; Shchesnovich, 2013, 2019). We will show how these ideas can be implemented in practice to produce a faster algorithm that suits our needs more closely.
Let $A^{[k]}$ be the first $k$ columns of a complex $m \times m$ matrix $A$. Assume the rows of $A^{[k]}$ are distinct and let $B$ be a $k \times k$ matrix with repeated rows drawn from $A^{[k]}$. Specifically, let $z$ be a multiset of size $k$ with elements from $[m]$ and let $s = (s_1, \ldots, s_m)$ be its associated array of multiplicities; then the $i$-th row of $B$ is row $z_i$ of $A^{[k]}$, for $i = 1, \ldots, k$.
According to Ryser's formula,
$$\operatorname{Per} B = (-1)^k \sum_{T \subseteq [k]} (-1)^{|T|} \prod_{j=1}^{k} \sum_{i \in T} b_{i,j}. \qquad (2)$$
The subset $T$ can be written as $\cup_{\nu=1}^{m} T_\nu$ where $T_\nu = T \cap \{i : z_i = \nu\}$. Note there are $s_\nu$ elements in $\{i : z_i = \nu\}$. It follows that there are $\binom{s_\nu}{r_\nu}$ ways of choosing a subset $T_\nu$ of a given size $r_\nu$, when $r_\nu$ is between $0$ and $s_\nu$. Furthermore $b_{i,j} = a_{\nu,j}$ whenever $i \in T_\nu$.
Ryser's formula can then be expressed as
$$\operatorname{Per} B = (-1)^k \sum_{r_1=0}^{s_1} \cdots \sum_{r_m=0}^{s_m} (-1)^{r_1 + \cdots + r_m} \prod_{\nu=1}^{m} \binom{s_\nu}{r_\nu} \prod_{j=1}^{k} \sum_{\nu=1}^{m} r_\nu a_{\nu,j}. \qquad (3)$$
A straightforward implementation of the formula requires us to iterate over the tuples $(r_1, \ldots, r_m)$.
Since each term in the summation requires $O(km)$ operations, the run time is then $O(km \prod_{\nu=1}^{m} (s_\nu + 1))$, as shown by Shchesnovich (2013). Similar complexity is achieved, but with an improved constant-factor overhead, when starting from Glynn's formula (Chin and Huh, 2018).
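The iteration over the tuples $(r_1, \ldots, r_m)$ can be written down directly. The following sketch (our own naming; no Guan-code speedup, so each term is recomputed from scratch) implements the expanded form (3) of Ryser's formula for a matrix with repeated rows:

```python
import itertools
import math
from functools import reduce
from operator import mul

def permanent_repeated(A, s):
    """Permanent of the k x k matrix B whose rows are row nu of A taken s[nu]
    times (k = sum(s)), over the first k columns of A, via the expanded
    Ryser formula.  Cost: O(k * m * prod(s_nu + 1)) without Guan codes."""
    m = len(s)
    k = sum(s)
    total = 0
    # iterate over all tuples (r_1, ..., r_m) with 0 <= r_nu <= s_nu
    for r in itertools.product(*(range(sv + 1) for sv in s)):
        binom = math.prod(math.comb(s[nu], r[nu]) for nu in range(m))
        # product over columns j of w_j(r) = sum_nu r_nu * a_{nu,j}
        prod_w = reduce(
            mul, (sum(r[nu] * A[nu][j] for nu in range(m)) for j in range(k)), 1
        )
        total += (-1) ** sum(r) * binom * prod_w
    return (-1) ** k * total
```

Checking the result against the definitional permanent on small integer matrices is a useful sanity test.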
To make this iteration as fast as possible we would ideally like to perform it in such a way that at each stage only one of the $r_\nu$ values changes, and only by $\pm 1$. This can be achieved using Guan codes (otherwise known as generalised Gray codes) (Guan, 1998), borrowing from the idea of Nijenhuis and Wilf (1978), who used a basic Gray code to speed up Ryser's algorithm.
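One compact way to produce such an ordering is a reflected mixed-radix Gray sequence. The recursive generator below is our own sketch of the idea (Guan's original formulation is iterative and loopless); it visits every tuple exactly once, with consecutive tuples differing by $\pm 1$ in a single coordinate:

```python
def guan_codes(s):
    """Yield all tuples (r_1, ..., r_m) with 0 <= r_i <= s[i] in a reflected
    mixed-radix Gray order: consecutive tuples differ by +/-1 in one position."""
    if not s:
        yield ()
        return
    forward = True
    for rest in guan_codes(s[1:]):
        # sweep the first coordinate up or down, reflecting each time the
        # remaining coordinates advance by one Gray step
        rng = range(s[0] + 1) if forward else range(s[0], -1, -1)
        for r0 in rng:
            yield (r0,) + rest
        forward = not forward
```

Each step changes a single $r_\nu$ by one, which is exactly the property needed to update the column sums $\sum_\nu r_\nu a_{\nu,j}$ in $O(k)$ time per tuple.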
Theorem 2. With $B$ and $s$ as above, $\operatorname{Per} B$ can be computed in $O(k \prod_{\nu=1}^{m} (s_\nu + 1))$ time.

Proof. There are $\prod_{\nu=1}^{m} (s_\nu + 1)$ terms in the outer summations in (3). By using Guan's algorithm we can move through the tuples $(r_1, \ldots, r_m)$ exhaustively, adding or subtracting 1 from a single element at each step. This means that the set of values $\sum_{\nu=1}^{m} r_\nu a_{\nu,j}$, $j \in [k]$, can be updated in $k$ operations. The product over $j \in [k]$ is a further $k$ operations. Updating the relevant binomial term is an $O(1)$ operation. The total operation count is then $O(k \prod_{\nu=1}^{m} (s_\nu + 1))$.

Laplace expansion
We make extensive use of the Laplace expansion for permanents (see Marcus and Minc, 1965, page 578), namely that for any $k \times k$ matrix $B = (b_{i,j})$,
$$\operatorname{Per} B = \sum_{\ell=1}^{k} b_{k,\ell} \operatorname{Per} B^{\diamond}_{k,\ell}, \qquad (4)$$
where $B^{\diamond}_{k,\ell}$ is the submatrix with row $k$ and column $\ell$ removed. Note that $B^{\diamond}_{k,\ell}$ only depends on $B^{\diamond}_{k}$, the submatrix of $B$ with the $k$-th row removed. An important consequence is that when $B$ is modified by changing its $k$-th row, the modified permanent can be calculated in $O(k)$ steps, provided the values $\{\operatorname{Per} B^{\diamond}_{k,\ell}\}$ are available. As explained in Clifford and Clifford (2018), we can take advantage of this observation to quickly compute a set of permanents of matrices that each differ from one another in a single row.
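A minimal sketch of this observation (helper names are ours): once the permanental minors along the last row are stored, the permanent of any matrix obtained by replacing that row costs only $O(k)$ to evaluate.

```python
import itertools
from functools import reduce
from operator import mul

def permanent(M):
    """Naive permanent: sum over all permutations (illustration only)."""
    n = len(M)
    return sum(reduce(mul, (M[i][p[i]] for i in range(n)), 1)
               for p in itertools.permutations(range(n)))

def last_row_minors(B):
    """Permanents of B with the last row and column l removed, for each l.
    These depend only on the first k-1 rows of B."""
    top = B[:-1]
    k = len(B)
    return [permanent([row[:l] + row[l + 1:] for row in top]) for l in range(k)]

def permanent_via_laplace(last_row, minors):
    """Laplace expansion along the last row: O(k) once the minors are known."""
    return sum(b * p for b, p in zip(last_row, minors))
```

Swapping in a different last row reuses the same minors, which is the key to the sequential sampling algorithm below.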
We now show that computation of all of the values $\{\operatorname{Per} B^{\diamond}_{k,\ell}, \ell \in [k]\}$ has the same time complexity as computing $\operatorname{Per} B$, the permanent of a single $k \times k$ matrix, when there are repeated rows in $B$ (Lemma 1). We derive this result for matrices with repeated rows using a combination of Ryser's algorithm and Guan codes.

Proof of Lemma 1. By applying Ryser's formula (3) to $B^{\diamond}_{k,\ell}$ for a given value of $\ell$ we have
$$\operatorname{Per} B^{\diamond}_{k,\ell} = (-1)^{k-1} \sum_{r_1=0}^{s^{\diamond}_1} \cdots \sum_{r_m=0}^{s^{\diamond}_m} (-1)^{r_1 + \cdots + r_m} \prod_{\nu=1}^{m} \binom{s^{\diamond}_\nu}{r_\nu} \prod_{j \in [k] \setminus \{\ell\}} w_j(r),$$
where $s^{\diamond}$ is the array of multiplicities of the repeated rows in $B^{\diamond}_{k}$ and $w_j(r) = \sum_{\nu=1}^{m} r_\nu a_{\nu,j}$. Working through values of $r$ in Guan code order, the terms $w_j(r)$, $j \in [k]$, can be evaluated in $O(k)$ combined time for every new $r$. This is because successive $r$ arrays differ by one in a single element. The product of the $w_j(r)$ terms can be computed in $O(k)$ time, giving $O(k \prod_{\nu=1}^{m} (s^{\diamond}_\nu + 1))$ time to compute $\operatorname{Per} B^{\diamond}_{k,\ell}$ for a single value of $\ell$; but of course this would have to be replicated $k$ times to cover all values of $\ell$. To compute $\{\operatorname{Per} B^{\diamond}_{k,\ell}, \ell \in [k]\}$ more efficiently we observe that each product $\prod_{j \in [k] \setminus \{\ell\}} w_j(r)$ can be expressed as $f_\ell b_\ell$, where $f_\ell = \prod_{j=1}^{\ell-1} w_j(r)$, $\ell = 2, \ldots, k$, and $b_\ell = \prod_{j=\ell+1}^{k} w_j(r)$, $\ell = 1, \ldots, k-1$, are forward and backward cumulative products, with $f_1 = b_k = 1$. We can therefore compute all of the partial products $\prod_{j \in [k] \setminus \{\ell\}} w_j(r)$ in $O(k)$ time, giving an overall total time complexity of $O(k \prod_{\nu=1}^{m} (s^{\diamond}_\nu + 1))$ for jointly computing $\{\operatorname{Per} B^{\diamond}_{k,\ell}, \ell \in [k]\}$; and since $s^{\diamond}_\nu \le s_\nu$ for every $\nu$, the time complexity is as claimed. Furthermore the computation has constant-factor overheads similar to those of computing $\operatorname{Per} B$. Other than the original matrix, the space used is dominated by the two arrays of cumulative products, both of length $O(k)$.
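The forward and backward cumulative products used in the proof can be sketched as follows (our naming). Note that, unlike dividing the full product by $w_\ell$, this approach remains correct when some of the $w_j(r)$ are zero:

```python
def products_leaving_one_out(w):
    """All products prod_{j != l} w[j], for l = 0..k-1, in O(k) time using
    forward/backward cumulative products (no division, so zeros are safe)."""
    k = len(w)
    f = [1] * (k + 1)          # f[l] = w[0] * ... * w[l-1]
    b = [1] * (k + 1)          # b[l] = w[l] * ... * w[k-1]
    for j in range(k):
        f[j + 1] = f[j] * w[j]
    for j in range(k - 1, -1, -1):
        b[j] = b[j + 1] * w[j]
    return [f[l] * b[l + 1] for l in range(k)]
```

This is the $O(k)$ step that lets all $k$ leave-one-out products be shared across the $\ell$ values for each new tuple $r$.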

Boson Sampling algorithm
Clifford and Clifford (2018) provide the following algorithm for Boson Sampling, which draws a single sample $z$ from $q(z|A)$ given positive integers $m$ and $n$ and an $m$-dimensional Haar random unitary matrix $A$. The columns of $A^{[n]}$ are first permuted uniformly at random. Then, starting from an empty index array $r$, at each stage $k = 1, \ldots, n$ a weight array $w = (w_1, \ldots, w_m)$ is formed with $w_x$ proportional to $|\operatorname{Per} A^{[k]}_{r_1, \ldots, r_{k-1}, x}|^2$, an index is drawn from the pmf proportional to $w$ ($x \leftarrow \text{SAMPLE}(w)$), and the index is appended to the array ($r \leftarrow (r, x)$). Finally $z$ is obtained by sorting $(r_1, \ldots, r_n)$ into non-decreasing order.

Calculation of the time complexity proceeds as in Clifford and Clifford (2018), with a few modifications to take account of repeated rows in evaluating permanents of the submatrices involved. At stage $k$, let $s^{(k)}$ be the array of multiplicities of the values in $(r_1, \ldots, r_k)$. The operation count in applying Lemma 1 is $O(k \prod_{\nu=1}^{m} (s^{(k)}_\nu + 1))$. Importantly, at the conclusion of stage $k$ the $k \times k$ matrix $A^{[k]}_{r_1, \ldots, r_k}$ has been found. In other words, at this intermediate stage a random submatrix has been drawn for the Boson Sampling problem with $k$ photons and $m$ output modes. Equivalently, the multiset formed by sorting $(r_1, \ldots, r_k)$ in non-decreasing order is then a random sample from the Boson Sampling distribution on $\Phi_{m,k}$.
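For intuition, a brute-force version of this sequential sampler can be written in a few lines (our sketch; the random column permutation step follows Clifford and Clifford, 2018). It recomputes every permanent from scratch rather than using the Laplace-expansion and Guan-code machinery, so it is illustrative only:

```python
import itertools
import math
import random
from functools import reduce
from operator import mul

def permanent(M):
    """Naive permanent: sum over all permutations."""
    n = len(M)
    return sum(reduce(mul, (M[i][p[i]] for i in range(n)), 1)
               for p in itertools.permutations(range(n)))

def sample_boson(A, n, rng=random):
    """Draw one multiset z from q(z|A) by chain-rule sampling (brute force)."""
    m = len(A)
    cols = list(range(n))
    rng.shuffle(cols)                  # uniform random column permutation
    r = []
    for k in range(1, n + 1):
        weights = []
        for x in range(m):
            rows = r + [x]
            Ak = [[A[i][cols[j]] for j in range(k)] for i in rows]
            weights.append(abs(permanent(Ak)) ** 2)
        r.append(rng.choices(range(m), weights=weights)[0])
    return tuple(sorted(r))
```

With the $2 \times 2$ balanced beam-splitter matrix this sampler never produces the multiset with one photon in each mode, in line with the Hong-Ou-Mandel effect.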
We now consider the average-case time complexity when the algorithm is applied with a random choice of Haar unitary matrix, A.

Marginal uniformity of the Boson Sampling distribution
Arkhipov and Kuperberg have shown that the marginal Boson Sampling pmf averaged over Haar random unitaries is uniform on the space of multisets.
Theorem 3. With $q(z|A)$ as in (1) and $A$ a random matrix drawn from the $m$-dimensional Haar random unitary distribution, the marginal Boson Sampling pmf $Q(z)$ is given by
$$Q(z) = \binom{m+n-1}{n}^{-1}, \qquad z \in \Phi_{m,n}. \qquad (6)$$
The proof is immediate from quantum theoretic considerations (Arkhipov and Kuperberg, 2012). It can also be derived from properties of the Weingarten function in random matrix theory as follows.
Suppose $A$ is an $m$-dimensional Haar random unitary matrix. The Weingarten function is defined to be
$$\operatorname{Wg}(\alpha, m) = \mathbb{E}\Big( \prod_{i=1}^{n} A_{i,i} \bar{A}_{i,\alpha(i)} \Big)$$
for $n \le m$ and $\alpha \in \pi[n]$, where $\bar{A}_{i,j}$ is the complex conjugate of $A_{i,j}$ and $\pi[n]$ is the set of permutations of $[n]$.
We make use of the following result (Collins, 2003;Collins and Śniady, 2006).
The last reduction follows from the moments of $W = |A_{1,1}|^2$. From Petz and Réffy (2004), for example, $W \sim \operatorname{Beta}(1, m-1)$, so that
$$\mathbb{E}(W^n) = \frac{n! \, (m-1)!}{(m+n-1)!} = \binom{m+n-1}{n}^{-1},$$
and the proof is complete.
Turning now to the Boson Sampling distribution on multisets of size $n$ with elements in $[m]$, recall that a multiset is represented by an array $z = (z_1, \ldots, z_n)$ consisting of elements of $[m]$ in non-decreasing order and that, as before, $\mu(z) = \prod_{j=1}^{m} s_j!$, where $s_j$ is the multiplicity of the value $j$ in $z$. From the definition (1), we expand $|\operatorname{Per} A^{[n]}_z|^2$ as a double sum over permutations and take expectations term by term using Theorem 4. For each $\beta$ the final summation in (7) is $n!$, so, applying Corollary 1 and collecting factorial terms, the expectation becomes a multiple of $\sum_{\alpha \in \pi[n]} \delta_\alpha(z, z)$. This last term counts the number of ways that the elements of the array $z$ can be permuted without changing the array, so that $\sum_{\alpha \in \pi[n]} \delta_\alpha(z, z) = \mu(z)$, and the result (6) follows.
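Theorem 3 can be checked numerically for $m = n = 2$ with stdlib tools only. The sketch below (function names are ours) draws Haar random $2 \times 2$ unitaries via Gram-Schmidt on complex Gaussian vectors (the positive-diagonal normalisation makes the result Haar distributed) and averages $q(z|A)$ for the multiset with both photons in mode 1; the average should approach $1/\binom{3}{2} = 1/3$:

```python
import math
import random
from functools import reduce
from operator import mul
from itertools import permutations

def permanent(M):
    n = len(M)
    return sum(reduce(mul, (M[i][p[i]] for i in range(n)), 1)
               for p in permutations(range(n)))

def haar_unitary_2(rng):
    """2x2 Haar unitary: Gram-Schmidt on two complex Gaussian vectors."""
    v1 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
    v2 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
    n1 = math.sqrt(sum(abs(x) ** 2 for x in v1))
    q1 = [x / n1 for x in v1]
    inner = sum(q1[i].conjugate() * v2[i] for i in range(2))
    w = [v2[i] - inner * q1[i] for i in range(2)]
    n2 = math.sqrt(sum(abs(x) ** 2 for x in w))
    q2 = [x / n2 for x in w]
    return [[q1[0], q2[0]], [q1[1], q2[1]]]   # columns q1, q2

def marginal_estimate(n_samples, rng):
    """Monte Carlo estimate of E_A q(z|A) for z = (1, 1) when m = n = 2."""
    total = 0.0
    for _ in range(n_samples):
        U = haar_unitary_2(rng)
        Az = [U[0], U[0]]                      # first row repeated; mu(z) = 2!
        total += abs(permanent(Az)) ** 2 / 2
    return total / n_samples
```

The estimate converges to the uniform value $1/3$ as the number of Monte Carlo samples grows.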

Average-case time complexity of Boson Sampling
The average-case complexity of Algorithm A is the expected value of (5) when $A$ is drawn from the Haar random unitary distribution. We start by considering the term $\prod_{\nu=1}^{m} (s_\nu + 1)$. Since the multiplicity array $s^{(n)}$ is an alternative representation of the multiset $z$, we know from Theorem 3 that $s^{(n)}$ is uniformly distributed over $\Phi^*_{m,n}$, the set of all possible multiplicity arrays.

Lemma 2. Suppose that $s^{(n)}$ is sampled uniformly from $\Phi^*_{m,n}$, the set of all possible multiplicity arrays; then
$$\mathbb{E}\Big( \prod_{\nu=1}^{m} \big(s^{(n)}_\nu + 1\big) \Big) = \binom{2m+n-1}{n} \Big/ \binom{m+n-1}{n}.$$

Proof. To see this, start with the stars and bars arrangement for a particular array $s^{(n)}$ as above. Now consider adding a new bar between each existing neighbouring pair of bars. If there are $s^{(n)}_i$ stars between a pair of bars, the new bar can be located in one of $s^{(n)}_i + 1$ places; for example, if there is one star the new bar can be before or after it. The number of arrangements of stars and bars for a given array $s^{(n)}$ is then $\prod_{\nu=1}^{m} (s^{(n)}_\nu + 1)$. The result now follows because $\sum_{s^{(n)} \in \Phi^*_{m,n}} \prod_{\nu=1}^{m} (s^{(n)}_\nu + 1) = \binom{2m+n-1}{n}$ is the total number of arrangements of the new and old stars and bars, i.e. the number of ways of placing $2m-1$ bars among $2m+n-1$ integer locations.
Corollary 3. Let $B$ be a $k \times k$ complex matrix with repeated rows having multiplicity array $s^{(k)}$, where $s^{(k)}$ is uniformly distributed on $\Phi^*_{m,k}$; then the expected running time for computing $\operatorname{Per} B$ is of order
$$k \binom{2m+k-1}{k} \Big/ \binom{m+k-1}{k}.$$

Proof. This follows directly from Lemma 2 and Theorem 2.
Proof. Note that the first term in the time complexity of Algorithm A given in (5) is equivalent to the total time complexity for evaluating the set of permanents of $A^{[k]}_{r_1, \ldots, r_k}$, $k \in [n]$. In particular, evaluation of the permanent of the final matrix $A^{[n]}_{r_1, \ldots, r_n}$ takes $O(n \prod_{\nu=1}^{m} (s^{(n)}_\nu + 1))$ time, where $s^{(n)}$ is the array of multiplicities in $r$. From Lemma 2 this has a simple reduction in the average case, since $s^{(n)}$ is uniformly distributed on $\Phi^*_{m,n}$ by Theorem 3.
At the intermediate stage $k$, the matrix $A^{[k]}_{r_1, \ldots, r_k}$ can be viewed as the final matrix in a Boson Sampling algorithm of reduced size, i.e. where $k$ columns are taken from $A$ rather than $n$. Again the average-case complexity has a simple form from Lemma 2, so the expected value of the first term in the time complexity of Algorithm A given in (5), averaging over $A$, is
$$O\Big( \sum_{k=1}^{n} k \binom{2m+k-1}{k} \Big/ \binom{m+k-1}{k} \Big).$$
Incorporating the second term in (5) gives the complexity as claimed. Finally, Stirling's formula gives the asymptotic form.
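The growth rate in this average-case bound is easy to examine numerically (the function name is ours): for $m = n$ the binomial ratio from Lemma 2 grows by a factor approaching $27/16 = 1.6875$ per additional photon, which is the source of the approximately $O(n \cdot 1.69^n)$ figure quoted in the abstract.

```python
from math import comb

def avg_case_ratio(m, n):
    """E prod(s_nu + 1) for s uniform on Phi*_{m,n} (the Lemma 2 ratio)."""
    return comb(2 * m + n - 1, n) / comb(m + n - 1, n)
```

As a sanity check, avg_case_ratio(1, n) equals n + 1, since with a single mode the only multiplicity array is (n).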

Lemma 1.
Let $B$ be a $k \times k$ complex matrix with repeated rows specified by multiplicities $s$ and let $\{B^{\diamond}_{k,\ell}\}$ be the submatrices of $B$ with row $k$ and column $\ell$ removed, $\ell \in [k]$. The collection $\{\operatorname{Per} B^{\diamond}_{k,\ell}, \ell \in [k]\}$ can be evaluated jointly with the same time complexity as that of $\operatorname{Per} B$, with $O(k)$ additional space.
Using the Laplace expansion for permanents, as described in Section 3.2, an array of length $m$ is obtained by summing $k$ terms for each $r_k \in [m]$. Taking a single sample from the pmf proportional to the array takes $O(m)$ time. This gives the time complexity bound for stage $k$ of $O(k \prod_{\nu=1}^{m} (s^{(k)}_\nu + 1)) + O(mk)$ and hence a total operation count of
$$O\Big( \sum_{k=1}^{n} k \prod_{\nu=1}^{m} \big(s^{(k)}_\nu + 1\big) \Big) + O(mn^2). \qquad (5)$$
The array $s^{(n)}$ is uniformly distributed over the set of all multiplicity arrays, $\Phi^*_{m,n}$. Recall that $|\Phi^*_{m,n}| = \binom{m+n-1}{m-1}$, or equivalently $\binom{m+n-1}{n}$, as can be shown with the usual "stars and bars" argument of Feller (1968): we place $m-1$ bars at locations in $[1, \ldots, m+n-1]$ and $n$ stars at the remaining locations, and the total number of such arrangements is $\binom{m+n-1}{m-1}$. Adding further bars at each end, i.e. at locations $0$ and $m+n$, the associated array $s^{(n)}$ is given by the counts of stars between consecutive bars.

Theorem 5.
The average-case time complexity of Boson Sampling is bounded above by a term of order
$$n \binom{2m+n-1}{n} \Big/ \binom{m+n-1}{n} + mn^2$$
as $m, n \to \infty$.