Probing the quantum–classical boundary with compression software

We adapt an algorithmic approach to the problem of local realism in a bipartite scenario. We assume that local outcomes are simulated by spatially separated universal Turing machines. The outcomes are calculated from inputs encoding information about a local measurement setting and a description of the bipartite system sent to both parties. In general, such a description can encode additional information not available in quantum theory, i.e., local hidden variables. Using the Kolmogorov complexity of local outcomes, we derive an inequality that must be obeyed by any local realistic theory. Since the Kolmogorov complexity is in general uncomputable, we show that this inequality can be expressed in terms of lossless compression of the data generated in such experiments, and that quantum mechanics violates it. Finally, we experimentally confirm our findings using pairs of polarisation-entangled photons and readily available compression software. We argue that our approach relaxes the independent and identically distributed (i.i.d.) assumption: individual bits in the outcome bit strings do not have to be i.i.d.


INTRODUCTION
The idea that physical processes can be considered as computations performed on universal machines traces back to Turing and von Neumann [1], and the growth of computational power has allowed further development of these concepts. This resulted in a completely new approach to science, in which the complexity of observed phenomena is closely related to the complexity of the computational resources needed to simulate them [2]. In addition, there are physical phenomena that simply cannot be treated with analytical tools, which further motivated a computational approach to physics [3]. Moreover, the idea of quantum computation [4] led to the discovery of a few problems that seem not to be efficiently tractable on classical computers but can be solved efficiently on a quantum computer [5,6].
Classical physics can be simulated on universal Turing machines, or other computationally equivalent models [7]. On the other hand, efficient simulation of quantum systems requires replacing deterministic universal Turing machines with quantum computers whose states are non-classically correlated. Such machines can even simulate any local quantum system efficiently [8,9]. Can we experimentally distinguish between these two descriptions of the universe using a logically self-contained computational approach?
In this paper, we show that there are processes which cannot be simulated on local classical machines at all, independently of the available classical resources. We first introduce the notion of Kolmogorov complexity, a measure of the classical complexity of a phenomenon, and later apply it to derive a bound on classical descriptions [10]. Next, we use the fact that Kolmogorov complexity can be approximated by compression algorithms [11].
We compress experimental data obtained from polarisation measurements on entangled photon pairs and show the violation of a classical bound.
Let us consider the description of a machine, whether classical or quantum, that outputs a string x made of 0's and 1's. In the case of a Turing machine U, we can always write a program Λ that generates x. The simplest such program is obviously 'PRINT x'. However, this is not optimal: in many cases the program can be much shorter than the string itself. This brings us to the concept of Kolmogorov complexity K(x), the minimal length l(Λ) over all programs Λ that reproduce a specific output x. If K(x) is of the order of the length of the output l(x), then our algorithmic description of x is inefficient, and x is called algorithmically random [12]. In general, K(x) cannot be computed [13]. To circumvent this issue, we can estimate K(x) by the length C(x) of the output of some efficient lossless compression algorithm [11].
We now extend this picture by considering two Turing machines $U_A$ (Alice) and $U_B$ (Bob), which are spatially separated. If these machines cannot communicate, they generate two output strings that are independent, although the programs fed into the machines can be correlated. Moreover, the input programs are classical bit strings, so the correlations between them must be classical.
We determine the complexity of the generated strings using the Normalized Information Distance (NID) [10]. This distance allows for a comparison of two data sets without detailed knowledge about their origin. In practice, we evaluate an approximation to the NID, the Normalized Compression Distance (NCD) [11], using lossless compression software, in our case the LZMA Utilities, based on the Lempel-Ziv-Markov chain algorithm [14].
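As a rough illustration of how a compressed length can stand in for K(x), the following Python sketch compares a highly regular string with a pseudorandom one using the standard-library `lzma` module. The helper name `compressed_length` and the string length are illustrative choices, not part of the experiment described here.

```python
import lzma
import random

def compressed_length(data: bytes) -> int:
    """Length in bytes of the LZMA-compressed data, a computable stand-in for K(x)."""
    return len(lzma.compress(data, preset=9))

n = 100_000
regular = b"01" * (n // 2)            # could be produced by a very short program
rng = random.Random(0)                # fixed seed so the sketch is reproducible
noisy = bytes(rng.getrandbits(8) for _ in range(n))  # pseudorandom bytes

c_regular = compressed_length(regular)
c_noisy = compressed_length(noisy)
print("regular:", c_regular, "bytes; pseudorandom:", c_noisy, "bytes")
```

The regular string shrinks to a tiny fraction of its length, while the pseudorandom one does not shrink at all, mirroring the distinction between compressible and algorithmically random strings.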

SIMULATION BY DETERMINISTIC UNIVERSAL TURING MACHINES
We consider a model experiment, similar to the one used for testing the Bell inequalities [15]: a source emits pairs of photons that travel to two separate polarization analyzers $M_A$ (Alice) and $M_B$ (Bob). Each analyzer has two outputs associated with bit values 0 and 1, and can be set along directions $a_0$ or $a_1$ for $M_A$, and $b_0$ or $b_1$ for $M_B$. The record of the outputs from each analyzer forms a bit string (see Fig. 1).
The output x of each individual analyzer can be described as the output of a Turing machine U, fed with the settings $a_j$ or $b_k$ and a program Λ. The program will contain the information for generating the correct output for every detection event and for every setting.
If we consider a string of finite length l(x) = N, Λ will have to describe the $4^N$ possible events. The length of the shortest Λ is equal to the Kolmogorov complexity of the generated string.
Next, we consider the simulation of the experiment with two local non-communicating machines $U_A$ and $U_B$ (see Fig. 2). We feed a program Λ to both of them and obtain two output strings, x and y, both of length N. In this case, the program has to describe the behavior of all 2N events for all possible settings $a_j$ and $b_k$, hence $16^N$ possible events.

Normalized Information Distance
The Kolmogorov complexity of two bit strings, K(x, y), is the length of the shortest program generating them simultaneously. K(x, y) can be shorter than K(x) + K(y) if x and y are correlated: the more correlated they are, the simpler it is to compute one string knowing the other. This idea was further developed by Cilibrasi and Vitányi [11], who constructed a distance measure between x and y called the Normalized Information Distance (NID),
$$\mathrm{NID}(x, y) = \frac{K(x, y) - \min\{K(x), K(y)\}}{\max\{K(x), K(y)\}} . \qquad (1)$$
The NID obeys all required properties of a metric, in particular the triangle inequality
$$\mathrm{NID}(x, y) + \mathrm{NID}(y, z) \geq \mathrm{NID}(x, z) . \qquad (2)$$
The above inequality holds up to a correction of order log(l(x)), which can be neglected for sufficiently long strings [11].
FIG. 2. Local classical machines simulating the generation of strings x and y by correlation measurements on an entangled state.

Information Inequality
We consider the bit strings $x_{a_j}$ and $y_{b_k}$ generated by Alice and Bob with fixed settings $a_j$ and $b_k$. Equation (2) then transforms into
$$\mathrm{NID}(x_{a_0}, y_{b_0}) + \mathrm{NID}(y_{b_0}, y_{b_1}) \geq \mathrm{NID}(x_{a_0}, y_{b_1}) . \qquad (3)$$
However, $\mathrm{NID}(y_{b_0}, y_{b_1})$ cannot be determined experimentally because the strings $y_{b_0}$ and $y_{b_1}$ come from measurements of incompatible observables. We therefore use the triangle inequality
$$\mathrm{NID}(x_{a_1}, y_{b_0}) + \mathrm{NID}(x_{a_1}, y_{b_1}) \geq \mathrm{NID}(y_{b_0}, y_{b_1}) \qquad (4)$$
and combine it with inequality (3) to obtain a quadrangle inequality:
$$\mathrm{NID}(x_{a_0}, y_{b_0}) + \mathrm{NID}(x_{a_1}, y_{b_0}) + \mathrm{NID}(x_{a_1}, y_{b_1}) \geq \mathrm{NID}(x_{a_0}, y_{b_1}) . \qquad (5)$$
Similar to various tests of Bell inequalities, we introduce a scalar quantity S′ that quantifies the degree of violation of Eq. (5):
$$S' = \mathrm{NID}(x_{a_0}, y_{b_1}) - \mathrm{NID}(x_{a_0}, y_{b_0}) - \mathrm{NID}(x_{a_1}, y_{b_0}) - \mathrm{NID}(x_{a_1}, y_{b_1}) \leq 0 . \qquad (6)$$
In order to experimentally test this inequality, we have to address the following problem. We can set up a source to generate entangled photon pairs in a state of our choosing, but we cannot control the nature of the measurement. For every experimental run i with the same preparation, the resulting string $x_{i,a_j}$ can be different. Consequently, the corresponding program $\Lambda_i$ is different for every experimental run.
It is reasonable to assume that for every two experimental runs i and i′ the complexity of the generated strings remains the same: $K(x_{i,a_j}) = K(x_{i',a_j})$ and $K(x_{i,a_j}, y_{i,b_k}) = K(x_{i',a_j}, y_{i',b_k})$. Without these assumptions, the same physical preparation of the experiment has different consequences, and thus the notion of preparation loses its meaning. More generally, the predictive power of science can be expressed by saying that the same preparation results in the same complexity of observed phenomena.

ESTIMATION OF KOLMOGOROV COMPLEXITY
In general, the Kolmogorov complexity cannot be evaluated, but it can be estimated. One can adopt two conceptually different approaches.

Statistical Approach
This approach takes into account the ensemble of all possible N-bit strings and asks about their average Kolmogorov complexity. It can be shown that this average equals the Shannon entropy H(X) of the ensemble [13], and thus inequality (5) becomes a type of entropic Bell inequality introduced by Braunstein and Caves [16] if local entropies are maximal, i.e., H(x) = H(y) = N. They showed that, for a maximally entangled polarization state of two photons and polarizer angles constrained by a single separation angle θ, inequality (5) is violated for an appropriate range of θ.
Calculating the entropy H(x, y) using the probability distributions predicted by quantum mechanics, it is possible to obtain the expected value of S′ as a function of θ (Fig. 4a). The maximal violation of this inequality is S′ = 0.24, with a separation of θ = 8.6°.
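The entropic prediction can be sketched numerically. The snippet below is an illustration under stated assumptions, not the calculation used for Fig. 4: it assumes a Braunstein-Caves-type geometry in which the settings $a_0$, $b_0$, $a_1$, $b_1$ are successively separated by θ (so $a_0$ and $b_1$ differ by 3θ), a maximally entangled state whose outcomes coincide with probability sin² of the relative analyzer angle, and uniform local outcomes, so that each normalized distance reduces to a binary entropy. The function names are hypothetical.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a biased coin with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def distance(angle_deg: float) -> float:
    """Entropic distance per bit for analyzers separated by angle_deg.

    With uniform marginals, H(x, y)/N - 1 equals the binary entropy of the
    probability that the two outcomes coincide (assumed to be sin^2 of the angle).
    """
    return binary_entropy(math.sin(math.radians(angle_deg)) ** 2)

def s_prime(theta_deg: float) -> float:
    # Assumed geometry: three neighbouring pairs at theta, the outer pair at 3*theta.
    return distance(3 * theta_deg) - 3 * distance(theta_deg)

thetas = [i / 100 for i in range(1, 3000)]      # 0.01 to 29.99 degrees
best_theta = max(thetas, key=s_prime)
best_value = s_prime(best_theta)
print(f"maximum S' = {best_value:.3f} near theta = {best_theta:.2f} degrees")
```

Under these assumptions the scan reproduces a maximum of about 0.24 at a separation angle between 8° and 9°, in line with the values quoted above.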

Algorithmic approach
It is possible to avoid a statistical description of our experiment following the ideas pioneered in [11]. There, it was shown that the Kolmogorov complexity can be well approximated by the application of compression algorithms. This approximation introduces a new distance called the Normalized Compression Distance (NCD),
$$\mathrm{NCD}(x, y) = \frac{C(x, y) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}} , \qquad (9)$$
where C(x) is the length of the compressed string x, and C(x, y) is the length of the compressed concatenated strings x, y. Replacing NID with NCD in Eq. (6) leads to a new inequality:
$$S = \mathrm{NCD}(x_{a_0}, y_{b_1}) - \mathrm{NCD}(x_{a_0}, y_{b_0}) - \mathrm{NCD}(x_{a_1}, y_{b_0}) - \mathrm{NCD}(x_{a_1}, y_{b_1}) \leq 0 . \qquad (10)$$
This expression can be tested experimentally because the NCD distance measure is operationally defined.
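A minimal Python sketch of the NCD and of the quantity S follows, using the standard-library `lzma` module as the compressor. It assumes the usual Cilibrasi-Vitányi form of the NCD, plain concatenation for C(x, y), and byte-string inputs; the function names and the `runs` dictionary layout are illustrative assumptions rather than the analysis code used in the experiment.

```python
import lzma
import random

def c(data: bytes) -> int:
    """Compressed length in bytes."""
    return len(lzma.compress(data, preset=9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, with C(x, y) taken as C of the concatenation."""
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)

def s_value(runs) -> float:
    """S for `runs`, a dict mapping setting indices (j, k) to the strings (x, y)
    recorded with settings (a_j, b_k)."""
    d = {jk: ncd(x, y) for jk, (x, y) in runs.items()}
    return d[(0, 1)] - d[(0, 0)] - d[(1, 0)] - d[(1, 1)]

rng = random.Random(1)
u = bytes(rng.getrandbits(8) for _ in range(10_000))
v = bytes(rng.getrandbits(8) for _ in range(10_000))

ncd_same = ncd(u, u)      # identical strings: distance close to 0
ncd_indep = ncd(u, v)     # independent strings: distance close to 1
s_indep = s_value({(0, 0): (u, v), (1, 0): (u, v), (1, 1): (u, v), (0, 1): (u, v)})
print(ncd_same, ncd_indep, s_indep)
```

For strings that are independent at every setting, all four distances are close to 1 and S is close to -2, so the bound S ≤ 0 is satisfied with a wide margin.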

CHOICE OF COMPRESSOR
Before moving to the experiment, we need to ensure the suitability of the compression software we use to evaluate the NCD. For this, we numerically simulate the outcome of an experiment, based on a distribution of results predicted by quantum physics. Among the packages we tested, we found that the LZMA Utility [14] approaches the Shannon limit [17] most closely.
The simulation also allows us to verify the angle that maximizes the violation of Eq. (10) predicted from Eq. (8). The results of the simulation are presented in Fig. 4. More details on the generation of the simulated data and the choice of the compressor are provided in the Appendix.
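A simulation of this kind can be sketched in Python as follows. This is an illustration under several assumptions that are not taken from the text: outcomes are drawn with Python's `random` module, the two outcomes of a pair coincide with probability sin² of the relative analyzer angle, bits are packed eight per byte, the joint string is formed by interleaving the bits of x and y, and the settings follow the geometry with three pairs separated by θ and one by 3θ. The resulting number depends on the compressor and on these encoding choices, so it should not be compared directly with Fig. 4.

```python
import lzma
import math
import random

def pack_bits(bits):
    """Pack a list of 0/1 integers (length a multiple of 8) into bytes."""
    return int("".join(map(str, bits)), 2).to_bytes(len(bits) // 8, "big")

def c(bits):
    """Compressed length in bytes of a packed bit list."""
    return len(lzma.compress(pack_bits(bits), preset=9))

def ncd(x, y):
    """NCD with the joint string built by interleaving the bits of x and y."""
    interleaved = [b for pair in zip(x, y) for b in pair]
    cx, cy = c(x), c(y)
    return (c(interleaved) - min(cx, cy)) / max(cx, cy)

def simulate(delta_deg, n, rng):
    """Draw n outcome pairs for analyzers separated by delta_deg (assumed model)."""
    p_same = math.sin(math.radians(delta_deg)) ** 2
    x = [rng.getrandbits(1) for _ in range(n)]
    y = [xi if rng.random() < p_same else 1 - xi for xi in x]
    return x, y

rng = random.Random(7)
n, theta = 200_000, 8.6
d_a0b0 = ncd(*simulate(theta, n, rng))
d_a1b0 = ncd(*simulate(theta, n, rng))
d_a1b1 = ncd(*simulate(theta, n, rng))
d_a0b1 = ncd(*simulate(3 * theta, n, rng))
s = d_a0b1 - d_a0b0 - d_a1b0 - d_a1b1
print(f"NCDs: {d_a0b0:.3f} {d_a1b0:.3f} {d_a1b1:.3f} {d_a0b1:.3f}  S = {s:.3f}")
```

Interleaving is used here so that correlated bits sit next to each other in the compressed stream, which matches the motivation given in the Appendix for avoiding block-size artifacts.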

EXPERIMENT
In our experiment (see Fig. 3), the output of a grating-stabilized laser diode (LD, central wavelength 405 nm) passes through a single-mode optical fiber (SMF) for spatial mode filtering, and is focused to a beam waist of 80 µm into a 2 mm thick BBO crystal. In this crystal (cut for type-II phase-matching), photon pairs are generated via spontaneous parametric down-conversion (SPDC) in a slightly non-collinear configuration. A half-wave plate (λ/2) and a pair of compensation crystals (CC) take care of the temporal and transverse walk-off [18]. Two spatial modes (labeled A and B) of down-converted light, defined by the SMFs for 810 nm, are matched to the pump mode to optimize the collection [19]. In type-II SPDC, each down-converted pair consists of an ordinarily and an extraordinarily polarized photon, corresponding to horizontal (H) and vertical (V) in our setup. A pair of polarization controllers (PC) ensures that the SMFs do not affect the polarization of the collected photons. To arrive at an approximate singlet Bell state, the phase φ between the two decay possibilities in the polarization state is adjusted to φ = π by tilting the CC.
In the polarization analyzers (Fig. 3), the photons from SPDC are projected onto arbitrary linear polarization by λ/2 plates, set to half of the analyzing angles $\theta_{A(B)}$, and a polarization beam splitter (PBS) in each analyzer. Photons are detected by avalanche photodiodes (APDs), and corresponding detection events from the same pair are identified by a coincidence unit (CU) if they arrive within ≈ ±3 ns of each other.
The quality of polarization entanglement is tested by probing the polarization correlations in a basis complementary to the intrinsic HV basis of the crystal; for Bell states $|\psi^{\pm}\rangle$, strong polarization correlations are expected, e.g., in a ±45° linear polarization basis.
With interference filters (IF) of 5 nm bandwidth (FWHM) centered at 810 nm, we observe a visibility $V_{45}$ = 99.9 ± 0.1%. The visibility in the natural H/V basis of the type-II down-conversion process also reaches $V_{HV}$ = 99.9 ± 0.1%. A separate test of a CHSH-type Bell inequality [20] leads to a value of S = 2.826 ± 0.0015. This indicates a relatively high quality of polarization entanglement; the uncertainties in the visibilities are obtained from propagated Poissonian counting statistics.

Measurement and Data Post-processing
We record two-fold coincidences of detection events between detectors at A and B. For each PBS, the transmitted output is associated with 0 and the reflected one with 1. The resulting binary strings x from A and y from B are written into two individual binary files. From these, we calculate the NCD using Eq. (9). This procedure is repeated for each of the four settings $(a_0, b_0)$, $(a_1, b_0)$, $(a_1, b_1)$, and $(a_0, b_1)$ in order to obtain the value for S.
To remove the bias due to differences in the detection efficiency of the APDs in the experiment, we also measure for each setting the associated orthogonal ones (see Appendix for details).

RESULTS
The inequality is experimentally tested by evaluating S in Eq. (10) for a range of θ; the obtained values [points (c), (d) in Fig. 4] are consistently lower than the trace (a) calculated via entropy using Eq. (7), and lower than a simulation with the same compressor (b). This is because the LZMA Utility is not working exactly at the Shannon limit, and also due to imperfect state generation and detection.
As a consequence of Eq. (8), we expect the maximal violation for θ = 8.6°. For this particular angle we collected results from a large number of photon pairs. Although we set out in this work to avoid a statistical argument in the interpretation of measurement results, we do resort to statistical techniques to assess the confidence in an experimental finding of a violation of inequality (10). To estimate the uncertainty of the experimentally obtained values for S, this large data set was subdivided into files with length greater than $10^5$ bits. The results from all the subdivided files are then averaged to obtain the final result of S(θ = 8.6°) = 0.0494 ± 0.0076, with the latter indicating a relatively small standard deviation over these different subsets.
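The subdivision and averaging step can be sketched as below. This is only an outline under assumptions: the chunk length, the helper names, and the use of the sample standard deviation are illustrative choices, and the per-chunk S values are hypothetical numbers, not measured data.

```python
import statistics

def split_into_chunks(bits, chunk_len):
    """Split a bit string into consecutive chunks of chunk_len bits,
    dropping any shorter remainder at the end."""
    n_chunks = len(bits) // chunk_len
    return [bits[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

def mean_and_std(values):
    """Mean and sample standard deviation of per-chunk results."""
    return statistics.mean(values), statistics.stdev(values)

# A placeholder record of 5 x 10^5 bits, split into chunks of 10^5 bits.
chunks = split_into_chunks("01" * 250_000, 100_000)

# Hypothetical per-chunk S values, for illustration only.
s_values = [0.05, 0.04, 0.06, 0.05, 0.05]
s_mean, s_std = mean_and_std(s_values)
print(len(chunks), round(s_mean, 4), round(s_std, 4))
```

In an actual analysis, each chunk of Alice's and Bob's records would be passed through the NCD evaluation for all four settings before averaging.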

CONCLUSION
There is a trend to look at physical systems and processes as programs run on a computer made of the constituents of our universe. We have shown that this is not possible if one uses the computational paradigm of a local deterministic Turing machine. Although this has already been extensively researched in quantum information theory, we present a complementary algorithmic approach for an explicit, experimentally testable example. This algorithmic approach is complementary to the orthodox Bell inequality approach to quantum nonlocality [15], which is statistical in nature.
Any process that can be simulated on a local universal Turing machine can be encoded as a program that is fed into it. For every such program there exists a shortest description, whose length is its Kolmogorov complexity and which in most cases can only be approximated using compression software. Moreover, such a description must obey distance properties as shown in [10,11]. By testing Eq. (10), we showed that this is unattainable in the specific case of polarization-entangled photon pairs. Therefore, there exist physical processes that cannot be simulated on local universal Turing machines.
There are two fundamentally different notions of complexity in computer science. On the one hand, computational complexity, the notion mainly studied in quantum information science, asks how many resources are needed to solve a computational problem. These studies focus on complexity classes such as P and NP [21], and their main concern is how efficiently a given input program can be computed. On the other hand, algorithmic complexity deals with the problem of finding the most efficient encoding of an input program. This problem, complementary to computational complexity, has not yet received enough attention in quantum information science, and it would require further work on a quantum version of Kolmogorov complexity [22].
We would like to stress that our analysis of the experimental data is purely and consistently algorithmic. We do not resort to statistical methods that are alien to the concept of computation. If this approach can be extended to all quantum experiments, it would allow us to bypass the commonly used statistical interpretation of quantum theory.

... bits (1, 0) or pairs of bits (00, 01, 10, and 11) of various lengths with various probability distributions. We generate these strings using the MATLAB [26] function randsample(), which uses the pseudorandom number generator mt19937ar with a long period of $2^{19937} - 1$. It is based on the Mersenne Twister [27], with ziggurat [28] as the algorithm that generates the required probability distribution. The complexity of this (deterministic) source of pseudorandom numbers should be high enough not to be captured as algorithmic.
The first part of this characterization involves establishing the minimum string length required for the compression algorithms to perform consistently. We start by generating binary strings x of varying length with equal probability of 1's and 0's, i.e., random strings. For each x, we evaluate the compression overhead Q. For a good compressor, we expect Q to be close to 0. From Fig. 5, it can be seen that for all the compressors, Q starts to converge after about $10^5$ bits, setting the minimum string length required for the compressors to work consistently. The lzw compressor fails this test, converging to a Q of 0.37 for long strings, while bzip2, gzip, and lzma give a Q below $10^{-1}$.
In the second part of this characterization, we test the compressors with strings with a known amount of correlation. We generate a random string x of length $10^7$ using the technique already described. We then generate a second string y of equal length and with probability p of being correlated to x. For p = 0 the two strings are equal, i.e., perfectly correlated. For p = 0.5 they are uncorrelated.
The two strings x and y are then combined to form the string xy: to avoid artifacts due to the limited data block size of the compression algorithms, the elements of x and y are interleaved. We then compress xy and evaluate the compression overhead Q as a function of p. The results for different compressors are shown in Fig. 6. Although there are ranges of p where bzip and gzip perform better than lzma, the latter shows a more uniform performance over the entire interval of p. It is reasonable to assume that the use of lzma should also reduce the possibility of artifacts in the estimation of the NCD for the data obtained from the experiment.
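The first characterization step can be imitated with Python's standard-library compressors. The sketch below is illustrative only: it assumes the overhead is the relative excess of the compressed size over the original size, Q = C(x)/l(x) - 1, which is an assumed definition (the defining expression is not reproduced in this text), and it uses `zlib` (DEFLATE, the algorithm behind gzip), `bz2`, and `lzma` in place of the command-line tools. No LZW implementation ships with Python, so lzw is omitted.

```python
import bz2
import lzma
import random
import zlib

def overhead(compressed_len: int, original_len: int) -> float:
    """Assumed overhead measure: relative excess of compressed over original size."""
    return compressed_len / original_len - 1.0

rng = random.Random(42)
n_bits = 10**6
# Balanced pseudorandom bits, packed eight per byte.
data = bytes(rng.getrandbits(8) for _ in range(n_bits // 8))

compressors = {
    "bzip2": lambda d: bz2.compress(d, 9),
    "deflate (zlib)": lambda d: zlib.compress(d, 9),
    "lzma": lambda d: lzma.compress(d, preset=9),
}
q = {name: overhead(len(f(data)), len(data)) for name, f in compressors.items()}
for name, value in q.items():
    print(f"{name}: Q = {value:.4f}")
```

Repeating the loop over a range of string lengths would give curves of the kind described for Fig. 5 under this assumed definition of Q.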

FIG. 3. Schematic of the experimental set-up. Polarization correlations of entangled-photon pairs are measured by the polarization analyzers $M_A$ and $M_B$, each consisting of a half-wave plate (λ/2) followed by a polarization beam splitter (PBS). All photons are detected by avalanche photodetectors $D_H$ and $D_V$, and registered in a coincidence unit (CU).

FIG. 4. Plots of S versus angle of separation θ. (a) Result obtained from Eq. (7), (b) result obtained from using the LZMA compressor on a simulated data ensemble, (c) measurement of S in the experiment shown in Fig. 3, and (d) longer measurement at the optimal angle θ = 8.6°.

FIG. 5. Comparison of the compression overhead Q obtained using four different compression algorithms on pseudorandom strings of varying lengths. The expected value for an ideal compressor is 0. From this characterization we can exclude lzw as a useful compressor for our application.

FIG. 6. Compression overhead Q for the string xy as a function of the probability of pairwise correlation p between the bits of the generating strings x and y for three different compressors: bzip, gzip, and lzma.