A Statistical Benchmark for BosonSampling

Computing the state of a quantum mechanical many-body system composed of indistinguishable particles distributed over a multitude of modes is one of the paradigmatic test cases of computational complexity theory: Beyond well-understood quantum statistical effects, the coherent superposition of many-particle amplitudes rapidly overburdens classical computing devices - essentially by creating extremely complicated interference patterns, which also challenge experimental resolution. With the advent of controlled many-particle interference experiments, optical set-ups that can efficiently probe many-boson wave functions - baptised BosonSamplers - have therefore been proposed as efficient quantum simulators which outperform any classical computing device, and thereby challenge the extended Church-Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, there remains one crucial problem: How to certify that a given experimental measurement record is an unambiguous result of sampling bosons rather than fermions or distinguishable particles, or of uncontrolled noise? In this contribution, we describe a statistical signature of many-body quantum interference, which can be used as an experimental (and classically computable) benchmark for BosonSampling.


Introduction
As the world waits for the first universal and fully operational quantum computer [1], quantum information scientists are eager to already show the power of quantum physics to perform computational tasks which are out of reach for a classical computer. Rather than designing devices that can perform a wide range of calculations, machines which are specialised in specific tasks have joined the scope [2,3]. Here, we focus on one type of such devices, the BosonSamplers, which may hold the key to falsifying the extended Church-Turing thesis [1,4,5]. This conjecture, rooted in the early days of computer science, states that any efficient calculation performed by a physical device can also be performed in polynomial time on a classical computer. It is now proposed that all that is necessary to falsify this foundational dogma of computer science is a set of m photonic input modes, which are connected to m output modes by a random photonic circuit [2]. This immediately indicates why BosonSampling attracts such attention, as these systems are experimentally in reach [6][7][8][9][10][11].
Since such a device is operating on bosons, indistinguishable particles, interesting physics arises when multiple particles are simultaneously injected into the system [12][13][14][15][16]. A schematic overview, indicating the essential ingredients of a sampling device such as considered here, is provided in figure 1. BosonSampling essentially consists in sampling an occupation vector y y y ,..., m for the output modes, given that the initial mode occupation was x x x ,..., m , where x i and y i are the number of photons found in the ith input and output mode, respectively. In the case with at most one photon per input mode, this probability is given by where U x y ,   is the n×n matrix, constructed from U, which describes how the n occupied input modes are connected to the selected output modes [13], and 'perm' denotes the permanent [17]. To evaluate the manybody output wave function, and thus the sampling statistics, we must deal with these permanents, which are 'hard' to compute, i.e.there is no algorithm that can do so in polynomial time [17,18]. A priori, there may be algorithms that sample from the probability distribution p x y    without explicit computation of the probabilities. This possibility was recently refuted [2] and it is argued that it is in general not possible 6 to efficiently simulate such a BosonSampling procedure on a classical computer. However, it turns out that single photons, linear optics and photon counters are all the required building blocks to build a quantum device that performs the task. Thus, the BosonSampler is a physical system that efficiently samples bosons according to the bosonic many-body wave function, even though this many-body state cannot be calculated by a classical computer. Its realisation would in this perspective invalidate the extended Church-Turing thesis in the sense that it is a concrete example of a quantum device that performs a task efficiently where a classical computer could not.
However, this strength of BosonSampling apparently also implies a profound weakness: since it is impossible to compute the many-body wave function, one cannot certify that a given experimental output unambiguously stems from sampling bosons-how can we be sure that an alleged device samples events from the correct probability distribution? Note that different scientific disciplines impose very different requirements on certification protocols: from the perspective of theoretical computer science, which is motivated by the scrupulous applications of cryptography, a proof for the correct functionality of a BosonSampler can only be accepted if the possibility of a fraudulent classical device that generates a false-positive result is provably ruled out. That is, generating a sample that passes the certification must be computationally much harder than the very act of certifying. Setting aside this cryptographic perspective and disregarding the possibility of the wilful use of fraudulent devices, we focus on unambiguous physical signatures of the functionality of the device. Bunching of bosons was proposed [19] as a decisive criterion, but dismissed in [20], where a mean-field ansatz incarnated by 'simulated bosons' (to be defined below) was shown to precisely reproduce coarse-grained bunching. Reference [20] furthermore offers a highly symmetric, analytically solvable test-case (recently implemented experimentally [21,22]), which however remains, by construction, outside the realm of a true BosonSampler. A reliable physical benchmark for the functionality of BosonSamplers therefore is unachieved to date, yet highly desirable if one wishes to fully exploit the computational power brought about by many-particle quantum interference. To fill this gap, we conceive a solution based on a statistical certifier that excludes simulated bosons-even for the original formulation [2] of the BosonSampling problem with random scattering matrices. Borrowing from the vast set of techniques offered by statistical mechanics, we formulate the problem in terms of a transport process in a scattering system (e.g. a photonic network), described by a random unitary matrix U. We show that, by Figure 1. Sketch of the system under consideration. From a set of input modes (here m = 12), four particles are injected (depicted in red). The particles traverse the system via the depicted channels. At the crossings, different paths are interconnected such that particles can travel in each direction, thus leading to single-particle and many-body interference. The yellow zone of the setup is described by a unitary matrix U, which connects the input modes to the output modes. The output signal is probabilistic in nature (hence different intensities of red), with its statistics governed by the many-body quantum state. The key object of this work, the C-dataset (see main text), is obtained by calculating the two-point correlations between different modes as depicted in green. 6 To be precise: it was shown [2] that the realisation of an efficient classical algorithm that simulates BosonSampling implies a collapse of the polynomial hierarchy [5] to the third order. As it is not our goal to enter the mathematical subtleties related to the theory of computational complexity, let us stress that this is simply the complexity theorist's way of saying that it is highly unlikely to be true. combining quantum optics with methods found in statistical physics and random matrix theory, it is indeed possible to identify signatures of genuine bosonic many-particle interference with manageable overhead.

Statistical signatures of many-particle interference
It must be stressed that the resulting many-body wave function of a scattering process, such as manifested in a BosonSampler, is determined by several factors. One can separate them into those which are of quantum statistical origin and those which are dynamical in nature. The former concern all effects related to the structure of state space (e.g., bunching for bosons, and the Pauli exclusion principle for fermions), whereas the latter involve all sorts of interference effects. One can divide these interferences into the well-known single-particle interference [23], encoded within the matrix U, which arises due to the wave-like nature of quantum transport [24][25][26], and the far more intricate many-body interference [15,16,27]. Below, we shall scrutinise the interference exhibited by various particle types-bosons, fermions, distinguishable particles and simulated bosons.
Distinguishable particles are the simplest of the considered species, as their signals are governed only by single-particle interference. Transport processes with bosons or fermions, in contrast, exhibit the entire range of statistical and interference effects. As indistinguishable particles, they obey quantum statistics (as dictated by the relevant algebra), which for two particles in two modes either leads to bunching (bosons) [12] or to anti-bunching (fermions) [28]. Similar effects have also been observed in larger setups, with more particles, and have hence been proposed as hallmarks for many-boson [19] or many-fermion [29] behaviour.
However, for larger setups one expects a much richer phenomenology due to many-body interference [15], resulting from the coherent superposition of distinct many-particle transmission amplitudes [16]. For BosonSampling, bunching or clouding behaviour [19] as such is therefore not a sufficient tool for certification, as it can easily be achieved by the so-called mean-field sampler [20] which was inspired by semiclassical models [30]. These devices are designed to replicate the bunching behaviour by approximating the many-body output state by a macroscopically populated single-particle state with random phases added. Averaging over the random phases successfully mimics boson-like bunching. However, all relative phases between the initially populated many-particle transmission amplitudes are scrambled by the averaging procedure, and, consequently, all many-body interference effects are deleted. This type of sampling [19] is easily simulated by Monte Carlo methods [20]-hence we here refer to such sampled particles as simulated bosons-and is unable to harness quantum granularity on the many-particle level. It is therefore conceptually decisive to set mean field sampling apart from actual BosonSampling. We now introduce a general method which is specifically conceived to detect quantum interference structures as induced by the coherent superposition of many-particle transmission amplitudes. This approach thus distinguishes simulated from true bosons, and identifies manyparticle interference features which are characteristic of both-particle type and the specific scattering dynamics.

Random matrix methods
The scattering matrix U that describes the photonic circuit in the BosonSampling setup is randomly sampled from the Haar measure [31][32][33]. Therefore it is only natural to treat this problem in a framework of statistics and random matrix theory (RMT) [31,[34][35][36][37]. Often the lack of grasp on the statistical distribution of the full manybody wave function is put forth as the core of the certification problem. Exhaustive statistical characterisation of the many-body state would require the full distribution of permanents over the set of unitary matrices. To date, only the first moment of this distribution is known [38] and it is not enough to provide certification, while sufficiently precise higher order moments are out of reach [2]. The reason is that, in terms of the quantum state, permanents depend on high-order correlation functions. In contrast, we will emphasise below that a gigantic amount of information on the many-body state is within reach in the form of distributions of low-order correlation functions.
In probability theory, the knowledge of all possible correlation functions implies full knowledge of the (joint) probability distribution itself. Therefore, correlation functions play a central role in many probabilistic theories, from RMT [31] to quantum statistical mechanics [39]. In practical applications of RMT, such as occur in quantum chaos, a study of the statistics related to two-point correlations is often sufficient to certify the RMT ensemble. Similarly, here, we do not (and, for sufficiently large systems, cannot) know all correlation functions of the output many-body quantum state. There is, however, a large set of correlation functions which are accessible (both theoretically and experimentally) [14,38,40]. Hence, the relevant question is whether this set offers a sufficient amount of information to distinguish the many-particle dynamics undergone by simulated or true bosons, fermions and distinguishable particles.

Statistical benchmarking
As the arguably simplest accessible correlation function we propose the mode correlator C n n n n 2 the bosonic number operator) [14], which quantifies how the number of outgoing photons in modes i and j are correlated (as indicated in figure 1). Scattering dynamics and particle type are encrypted in the many-body output quantum state out |j ñ, which enters C ij via the expectation value . . out out | | j j á ñ = á ñ. Closer inspection of the expression for n n i ĵá ñ in [14], for particles initially prepared in mode q q ,..., n where t 1, 0, 1 = -+ for fermions, distinguishable particles, and bosons, respectively (the more intricate result for simulated bosons can be found in appendix A). With respect to p x y    in (1), expression (3) can be interpreted as a sum over all two-particle processes which connect a pair of input particles to the two selected output modes.
gives the classical transmission probability, independent of the particle type. The second term on the right hand side of (3) provides the interference contribution for the two particles selected from the input state. In the definition (2) of the correlators we suppress the classical contribution to the transmission probability by subtracting n n i ĵá ñá ñ, and thus focus on the interference terms (see also equations (A.4) and (A.5)). Therefore, C ij probes all possible two-particle interference contributions to the population of modes i and j.
A single such correlator (typically) cannot probe in sufficient detail the structure of the full output wave function, but some characteristic feature of the interference pattern should become apparent once a sufficiently large set of correlators is considered by variation of i and j. The resulting dataset, which we from now on refer to as the C-dataset, is easily obtained in the experimental setup: One must consider sufficiently many choices for output modes, i j m , 1,..., such that i j < , and for each choice compute the correlation C ij between the number of particles sampled in the two modes. Now, for a single choice of input modes and hence a single n×m unitary matrix U sub , the submatrix of U that describes how the input modes are coupled to all possible output modes, we obtain a set of data on which we can do statistics. Given U sub , it is possible to calculate the C-dataset for each particle type numerically exactly [14] (see (3) and appendix A). Although we can explore this via histograms, moments and other statistical properties, it is far from straightforward to get an analytical grasp of its statistics. Ideally, we would like to predict the exact shape of the distribution, given the number of modes m and the number of incoming particles n, but this appears to be an unrealistic goal. Nevertheless, after longwinded RMT calculations, we obtained analytical predictions for the first three moments of the set of possible outcomes when varying U sub for fixed i and j (rather than fixing U sub and varying i and j). Although the distributions are mathematically not exactly equivalent, we find good agreement with numerics (for a more elaborate discussion, see appendix B).
For a generated C-dataset for the different particle types, considering one fixed U sub for 120 output modes and six particles, the left panel in figure 2 clearly shows a qualitative difference in the histograms for different particle types. In contrast, the right panel indicates that the histograms of the true bosons (where quantum statistical bunching and multi-particle interference do both contribute) and their simulated counterparts (which only exhibit bunching, but no multi-particle interference) bear a strong resemblance. Consequently, a quantitative understanding is essential to clearly distinguish the signature of bosonic many-particle interference from the quantum dynamics of other species. The second and third moment of the obtained correlator dataset can exactly provide us with such insight. To obtain these quantities involves averaging products of components of unitary matrices, for which straightforward (but tedious) combinatorics are used [36,37].

Particle type-specific features of interference patterns
To acquire the clearest distinction between the many-particle interference patterns generated by different particle types, we propose the normalised mean (NM)-the first moment divided by n m 2 , the coefficient of variation (CV)-the standard deviation divided by the mean-and the skewness (S) [41] of the C-dataset as benchmarks. For these quantities, we have obtained an analytical RMT prediction in terms of mode and particle number, but as these expressions are rather longwinded, we present them in the appendix. Since the dataset is generated for a single U sub , as explained before, we do expect slight deviations from these RMT results. In figure 3, we show the theoretical predictions (solid lines) for NM, CV and S, for a sampler in which six particles were injected, as a function of the number of modes. In order to quantify the deviations from the RMT prediction, we sampled, for various numbers of modes, 500 different U sub matrices, calculated the respective C-dataset and its moments, and indicated the resulting average normalised mean, coefficient of variation, and skewness by a point. The error bars indicate the standard deviation from this mean value and thus quantify the typical spread of possible outcomes. We show here that for these parameters, even NM is fit to effectively distinguish the interference of bosons, fermions and distinguishable particles, the curves for simulated and true bosons, however, collapse. CV, on the other hand, is a trustworthy quantity to distinguish truly bosonic interference from that of distinguishable particles and even of simulated bosons. The skewness S completes the certification that the observed many-particle interference pattern is actually generated by bosons. Similar plots where n is varied such that m n ( )  and m n 2 ( )  are shown in appendix C. á ñ -á ñá ñ for all possible mode combinations, for a system with six particles and 100 modes. In the left panel, the histogram for bosonic correlators is compared to data obtained with fermions or distinguishable particles inserted instead of bosons. For the right panel, the histograms for bosons is compared to the result for simulated bosons, see main text. All histograms are obtained from one single circuit, using the same input modes, thus implying the same U sub . = , we sampled 500 matrices U sub , for each of which the normalised mean, the coefficient of variation and skewness of the C-dataset were calculated. For each mode number, the average normalised mean, coefficient of variation and skewness are indicated by a dot. Additionally, the standard deviations of the obtained NM, CV and S results are shown by error bars around these dots.
We must emphasise that the curves of figure 3 represent the typical values for NM, CV and S, and that it might be possible to encounter a large deviation from such quantities. Moreover, one might dwell into a parameter regime where standard deviation bars overlap and hence it is unrealistic to certify the sampler with a single U sub measurement. Luckily, a simple change of input modes implies a change in U sub and hence it is feasible to generate several C-datasets from one circuit. Figure 4 shows the outcomes for various such U sub matrices as points, where the x-coordinate indicates the coefficient of variation and the y-coordinate shows the skewness. The colour coded sets of points for different particle types are all separated from each other, showing clearly that the associated interference structures can be distinguished. As is indicated in figure 4, upon averaging over all the points in each cloud, one finds values (indicated by the black circles) which are very well estimated by the RMT predictions (indicated by the red points), thus providing a strong quantitative tool for such certification. This quality of the certification is further enhanced in large systems by noticing that the cloud is expected to shrink with the effective number of scattering events inside the array, namely the typical number of crossings between optical paths in figure 1 (state of the art experiments have 20  ). In figure 3 we have indicated that bosonic, fermionic and distinguishable particles' interference patterns can be distinguished by studying the averages of the lowest-order statistical moments of the C-dataset. Furthermore, figure 4 shows that for a rather large number of modes and a large set of sampled U sub matrices, we can classify all types of many-particle scattering dynamics. The method presented, however, does not require such an abundance of modes and samples since we can perform additional statistical analyses on the obtained cluster of data points. We emphasise this in figure 5, where data points for only 20 samples of U sub matrices for m=20 are shown. We focus specifically on bosons (indicated by blue points), where for each sample the average is calculated (red dot) and the red ellipses indicate two and four standard errors of the sample mean. The RMT prediction for bosons (a blue filled circle), with a slight bias, falls within the four standard errors, whereas the RMT prediction for simulated bosons (purple square) is well outside this region. Thus, we can successfully differentiate bona fide bosonic many-particle interference from the quantum dynamics of simulated bosons, using the RMT-based techniques described here.

Experimental overhead to measure correlators
In a realistic setup, repeated measurements are required to evaluate the correlators which constitute the C-dataset with sufficient accuracy. A fundamental question to be answered is therefore whether our protocol can be performed efficiently: Does the number of required experimental runs scale polynomially in the number of modes, m, and particles, n? The answer is positive.
Measurement of quantum observables is probabilistic by its very nature, hence one initially sets forth a target accuracy ò to which to determine the correlator. The RMT results of the previous section can be used to determine the required ò, since it needs to be significantly smaller than the distance between the points in figure 4. Focusing now on one correlator C ij , the central limit theorem dictates that the number M of Figure 4. For six particles, in 120 modes, the coefficient of variation and skewness were calculated for C-datasets of 500 sampled U sub matrices, The points labeled 'bosons', 'distinguishable', 'fermions' and 'simulated bosons', each connect to one C-dataset of a sampled U sub , the points' position on the plot indicates the calculated coefficient of variation CV and skewness S. The black and white triangles mark the RMT predictions for these quantities. Finally, the black and white circles indicate the mean value of each cluster of all the points generated for each particle type. These mean values coincide with the RMT predictions, thus the black and white triangles are hidden underneath the black and white circles. measurements necessary to achieve this accuracy is given by M Rather than analysing the scaling behaviour thereof in full rigour, we provide a simple argument for polynomial scaling.
For increasing numbers of particles, the difficulty of BosonSampling is hidden in the increasing size of the matrix U x y ,   of which to calculate the permanent. However, when calculating correlation functions, the problem simplifies considerably. The highest order term, in creation and annihilation operators, which enters C Var ij ( )is given by a a a a i j j i 2 2 2 2 ( ) ( ) ( ) ( ) † † á ñ, and from [14] it is known that the calculation of this term involves the calculation of the permanents of 4×4 matrices, constructed with components of U. The dimensions of the required matrices are independent of the number of particles n and of the number of modes m. As shown in [14], the parameter n only governs the number of permanents of different 4×4 matrices that need to be summed to obtain the final variance. From the general expression in [14] one sees that this number of terms scales polynomially in n and is independent of the number of modes. Thus the variance and hence the final number of required measurements scale polynomially with the number of particles n. The above argument holds for determining the correlator of a single choice of output modes i and j, but to obtain the full C-dataset one considers all combinations of output modes. Even if we assume the most inefficient scenario where one measures one correlator at a time, this still only implies a low order polynomial scaling in m.

Discussion
As evident from our above discussion, our certification method can be applied in a broad variety of experimental setups, for a wide range of particle numbers n and mode numbers m. Of course, smaller m lead to smaller C-datasets, making it more difficult to achieve statistical significance. In our numerical studies, however, we successfully distinguish particle type-specific interference structures for 20 modes or fewer. A nice advantage of our method as compared to [42] is that we can certainly treat regimes where m n 5.1 < , we even can explore regimes in which m n ( )  . However, as n and m grow, the curves for true and simulated bosons in figure 3 will approach one another to a distance of the order n 1 ( )  (for an extended discussion, see appendix C). As the limit n 1 0  is the semi-classical limit, where mean-field theory is exact, this is as such not surprising. The similarity of the two curves essentially implies that the statistics of the C-dataset is still strongly affected by the quantum statistics of the particles and thus by bosonic bunching. In this sense, we see a lot of potential in Figure 5. Four scatter plots, each containing 20 randomly sampled U sub matrices for six particles in m=20 output modes, from which the coefficient of variation CV and the skewness S of the bosonic C-dataset were calculated. The average of the cloud is indicated (red dot) as are the two-and four standard error regions (small and large ellipsoid, respectively). The RMT prediction for bosons is shown (large blue dot) to be contained within the ellipses for each of the four samples. The RMT prediction for simulated bosons (purple square) falls well outside the four standard errors. elaborating the methods employed here, to further distill specific signatures of many-body interference, e.g. by filtering out the undesired bunching contributions to the statistics.
Let us emphasise yet another important advantage of our method, especially in the regime of large m and n: every experimental measurement outcome contributes to the C-dataset. Alternative methods which only seek to tell bosons apart from other particles, via book-keeping of bunching events [19], are far less favourable in this respect, since bunching events are rare events in the statistical sense. Therefore, not only does our method detect genuine many-particle inference as induced by the actual dynamics, much beyond mere quantum statistical effects such as bunching or anti-bunching-in the regime where BosonSampling can be considered really hard it is also expected to be more efficient, even when only employed to identify the particle species.
What concerns a possible confrontation of our theoretical analysis with real experiments, we have here focussed on the conceptual ideas behind statistical certification using correlation functions, thus describing an ideal scenario. For an actual experimental implementation, a wide range of imperfections kick in and eventually need to be accounted for: most prominent are losses [10], but to a lesser extent also decoherence, i.e. modemismatch leading to partial distinguishability and phase-fluctuations due to alignment instability [16,[43][44][45]. If losses occur before the particles enter the scattering system, this effect can be modelled by a statistical mixture of input states with different numbers of particles. Alternatively, particles lost inside the scattering system can be dealt with by increasing the number of output modes, while assuming that these additional modes are not observed [46]. In both cases, our RMT results have straightforward generalisations. Decoherence phenomena are more difficult to incorporate, since they require an expansion of our formal framework by an additional (environmental) degree of freedom [16]. This does fundamentally not affect the here proposed certification strategy, but imposes some non-trivial technical overhead. Both, losses and decoherence effects, ultimately require a careful technical treatment, which we set forth as an objective for future work. Likewise, generalisations of our RMT approach for the scenario of multiboson correlation sampling [47,48] appear feasible.
In summary, we demonstrated that, by measuring the mode correlators C nn n n i j i j i j ,ˆˆˆ= á ñ -á ñá ñfor all possible combinations of outgoing modes, one holds the key to certifying BosonSampling. The mean, the variance, and the skewness of such a C-dataset are sufficient to identify the multi-particle interference patterns resulting from transmission through a multimode random scatterer, as generated by bosons, fermions or distinguishable particles beyond reasonable doubt. By varying the chosen input channels, and thereby generating multiple such datasets one can efficiently distinguish true bosons from simulated bosons-which only exhibit bunching, yet no many-particle interference-through comparison of the first moments of their respective C-datasets. From a more general perspective, since our statistical quantifiers are constructed to distill bona fide many-particle quantum interference, beyond quantum statistical bunching or anti-bunching effects, these results improve our understanding of the fundamental differences between the quantum many-body dynamics of distinct particle types.
What is the consequence of our results for the extended Church-Turing thesis? Clearly, the capacity of any classical computer will be quickly exhausted when confronted with the task of evaluating a many-body wave function represented by a permanent, as soon as the number of bosonic constituents and modes is large enough. However, much as in the classical theory of gases or of chaotic (classical or quantum) systems, where, e.g., classical single particle trajectories are computationally unaccessible, too [49], we have shown that there are robust and selective statistical quantifiers which indeed can be handled, and which are accessible in state of the art experiments. In this sense, we suggest that a 'thermodynamic' or 'statistical' re-interpretation of the extended Church-Turing thesis will prevail.

Acknowledgments
MW would like to thank the German National Academic Foundation for financial support. MCT acknowledges financial support by the Danish Council for Independent Research. KM and AB acknowledge partial support through the COST Action MP1006 'Fundamental Problems in Quantum Physics', the EU Collaborative project QuProCS (Grant Agreement 641277), and by DFG. JDU and KR acknowledge partial support through DFG.

Appendix A. Correlators
Initially, let us present a short and slightly more technical introduction to the central objects that constitute the C-dataset, the two-mode correlators. Given some pure quantum state |jñ, these objects are defined as C nn n n ij i j i j |ˆˆ| |ˆ| |ˆ| j j j j j j =á ñ-á ñá ñ , and the main goal of studying these object is to gain insight in the structure of states |jñ, which results from the scattering of a many-particle Fock state in a system (one might think of a complicated network) which is described by a single-particle scattering matrix U (which we numerically generate following the algorithm described in [32]). As we initially start from a Fock state for which modes q q ,..., n 1 are populated by a single particle, we can describe the initial state in |j ñ in terms of creation operators a q † (for creation in the qth mode) that act on the vacuum state |Wñ, as a a ... .
Now, by traversing the system, the matrix U acts by connecting an input mode a q † to all possible output modes In the case of distinguishable particles, one can in principle treat the particles in an independent fashion, and thus a particle starting in input mode q will be found in output mode i with a probability p U q i i q , 2 | | =  . As the particles are distinguishable, these probabilities are not influenced by the presence of other particles, and via simple probability theory we now find that Finally, simulated bosons behave similarly to distinguishable particles, with the sole exception that the initial state is different and that (uniformly distributed) random phases are included over which one needs to average [20]. We essentially sample distinguishable particles, which are inserted in the form of a single-particle state that superposes all input modes with the same amplitude, but with random phases, implying a probability p U e n r   From expressions for the correlators of different particle types, we are able to construct the C-dataset by varying i and j (with i j < ) to obtain all different mode combinations. In order to make theoretical predictions (or at least up to very good approximation), we use RMT methods. Rather than varying i and j, these methods keep the two output modes under consideration fixed and formally average over all possible matrices U in the unitary group with the Haar measure imposed on it. One might understand this as an analogue to the Bohigas-Giannoni-Schmit conjecture [50] for unitary matrices. The averaging contains one fundamental identity for an N×N random unitary matrix U: denotes the average over the unitary group and V are class coefficients also known as Weingarten functions, which are determined recursively. The details of this method can be found in [34][35][36][37]. With this formula, we efficiently average long products of coefficients of unitary matrices to compute C U ij for each particle type. Combining these quantities we can find the coefficient of variation CV and the skewness S for each particle type. We find, with n particles in m modes, for bosons:  =  --+  --+  ---+  +  +  +  +   +  -+  +  -+  +  ---+  +  +  +  +   +  -+  +  -+  -+  +  +  +  +   +  -+  ---+  ---- Although these formulas do not appear remarkably elegant due the lack of any form of assumption on m and n (apart from m n > ), they are necessary to obtain sufficiently accurate results. Once these moments are defined, we can use them to find NM, CV and S by the following definitions: With these results, one can now calculate the expected coefficient of variation and the expected skewness for each of the samplers we described, with an arbitrary number of modes and particles. Observe that even the leading n-dependence is different when comparing particle types. This implies that, while the RMT results for bosons and simulated bosons converge to the same point for NM, CV, and S, they do so in a quantitatively distinct manner.
We now repeat the computation for a different type of scaling, where we set m n 2 l = . Again, when we evaluate the scaling behaviour as n  ¥, we find for bosons: The first notable observation is that, again, the bosons and simulated bosons converge to the same point for NM, CV, and S, but approach this limit in a different fashion. Moreover, it is an interesting observation that the leading order term obtained for S are independent of λ.
The physical interpretation of these results remains unclear to us at the present stage. One can understand the similarity between bosons and simulated bosons in the thermodynamic limit by realising that this is also the limit where mean-field theory is expected to be exact. One may even verify this through (A.4) and (A.8): the terms which distinguish the two expressions vanish in the limit n  ¥.
Finally, to visualise the behaviour of the RMT results with m n ( )  or m n 2 ( )  , we present additional plots in figures C1 and C2 , which compare to figure 3. With the arbitrary choice 3 l = , straightforward numerical evaluation rapidly becomes demanding, therefore we limit ourselves to just a few data points. Figure C2. Comparison of the theoretical RMT predictions (solid lines) for the normalised mean NM (top), the coefficient of variation CV (bottom left) and skewness S (bottom right) of the C-dataset to the numerical NM, CV and S values of sampled C-datasets, where the mode number is scaled up with the particle number as m n 3 2 = . For particle numbers n=6 and 11, we sampled 50 matrices U sub , for each of which the normalised mean, the coefficient of variation and skewness of the C-dataset were calculated. For each mode number, the average normalised mean, coefficient of variation and skewness together with the associated standard deviations are indicated by a dot and by error bars, respectively.