Using complete measurement statistics for optimal device-independent randomness evaluation

The majority of recent works investigating the link between non-locality and randomness, e.g. in the context of device-independent cryptography, do so with respect to some specific Bell inequality, usually the CHSH inequality. However, the joint probabilities characterizing the measurement outcomes of a Bell test are richer than just the degree of violation of a single Bell inequality. In this work we show how to take this extra information into account in a systematic manner in order to optimally evaluate the randomness that can be certified from non-local correlations. We further show that taking into account the complete set of outcome probabilities is equivalent to optimizing over all possible Bell inequalities, thereby allowing us to determine the optimal Bell inequality for certifying the maximal amount of randomness from a given set of non-local correlations.


Introduction
In the context of any non-signaling theory, and in particular in the context of quantum theory, outcomes of measurements on separate systems leading to a Bell violation cannot be completely pre-determined, i.e. the violation of a Bell inequality guarantees the presence of genuine randomness. This link between non-locality [1] and randomness is interesting on the fundamental level [2,3], but is also the main ingredient behind device-independent randomness generation (DIRG) [4,5,7,6,8], randomness amplification [9,10], and device-independent quantum key distribution (DIQKD) [11,13,12,14,15,17,16].
At the basis of such developments lies a quantitative relation between the amount of randomness that is necessarily produced in a Bell experiment and the degree of violation of a certain Bell inequality, such as the CHSH inequality [18,5], the chained inequality [19,11,20,9], or a Mermin-type inequality [21,4,10]. However, the set of data obtained in a Bell experiment is much richer than just the value of the violation of some Bell inequality. For example, in a CHSH experiment there are eight independent probabilities that determine the single number corresponding to the amount of CHSH violation. Moreover, in [3] it was shown that there exist two-input two-output Bell inequalities that can allow for the certification of more randomness than the CHSH inequality. Similar examples have been provided in [22]. Such results imply that taking into account extra data beyond the value of a single Bell violation can be useful, but leave open the questions of just how useful and how to do so in a systematic manner.
These questions are especially relevant now that the detection loophole has been closed (albeit re-opening the locality loophole) with entangled photons [23,24], opening the door for high-rate DIRG. Nevertheless, there is still work to be done on the theoretical level before we can realize this goal efficiently. In particular, low detection efficiencies (∼ 0.75) necessitate using states of low entanglement (for efficiencies below ≃ 0.82 the CHSH inequality cannot be violated using maximally entangled two-qubit states [25]), for which the CHSH inequality is not optimal with respect to randomness certification [3].
In this work we show how to evaluate the randomness produced in a Bell test, or, more specifically, how to obtain the device-independent guessing probability (DIGP) by systematically taking into account the complete non-local behavior, rather than just the violation of some prespecified Bell inequality. We also show that for any set of non-local correlations, there exists a Bell inequality that is optimal for certifying the maximal amount of randomness given these correlations. Regarding this, we note that while the protocols in [5,7,6,14,15,17] are general in the sense that they are not formulated with respect to some specific Bell inequality, they do not tell us the optimal Bell inequality to use given the measurement data. We then show how the optimal value of the DIGP and the associated optimal Bell inequality can be computed using the semidefinite programming (SDP) hierarchy introduced in [26]. Finally, we study three numerical examples illustrating the advantage in taking into account the complete non-local behavior, as opposed to taking into account only the violation of a specific Bell inequality.

Background: the device-independent guessing probability
We consider the following setting. Alice has access to a pair of quantum devices, or boxes, A and B, which she can prevent from communicating at will, and whose internal state may be correlated with a system in the possession of an adversary Eve (or equivalently with the environment). The joint state of the boxes and Eve's system is described by a quantum state $\rho_{ABE} \in \mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_E$. Alice introduces inputs $x$ and $y$, each chosen at random from the finite set $\{1, \ldots, n\}$, into boxes A and B and obtains outputs $a$ and $b$, respectively, each taking one of the values $\{1, \ldots, d\}$. This process is described by a pair of POVMs with elements $\{M_{a|x}\}$ and $\{M_{b|y}\}$, acting on $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively. The joint probability that the outputs $a$ and $b$ are obtained given the inputs $x$ and $y$ is $p_{AB}(ab|xy) = \mathrm{tr}\left(\rho_{AB}\, M_{a|x} \otimes M_{b|y}\right)$, where $\rho_{AB} = \mathrm{tr}_E(\rho_{ABE})$. There are a total of $d^2 n^2$ such joint probabilities, which we view as the components of a vector $p = \{p_{AB}(ab|xy)\} \in \mathbb{R}^{d^2 n^2}$. We refer to this vector as the (non-local) behavior characterizing Alice's devices.
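As an illustration of how a behavior is obtained from a quantum realization, the following minimal sketch evaluates $p_{AB}(ab|xy) = \mathrm{tr}(\rho_{AB}\, M_{a|x} \otimes M_{b|y})$ for a pure two-qubit state and rank-one projective measurements. The restriction to real amplitudes and to measurements in the X-Z plane, the particular angles, and the function names `projective_basis` and `behavior` are assumptions made for this example only.

```python
import math

def projective_basis(angle):
    """Eigenvectors of cos(angle)*sigma_z + sin(angle)*sigma_x, for outcomes 1 and 2."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [(c, s), (-s, c)]

def behavior(psi, angles_a, angles_b):
    """Return p[(a, b, x, y)] = tr(rho_AB M_{a|x} (x) M_{b|y}) for rho_AB = |psi><psi|.

    psi lists the real amplitudes of |00>, |01>, |10>, |11>.
    """
    p = {}
    for x, angle_a in enumerate(angles_a, start=1):
        for y, angle_b in enumerate(angles_b, start=1):
            for a, u in enumerate(projective_basis(angle_a), start=1):
                for b, v in enumerate(projective_basis(angle_b), start=1):
                    # Overlap of the product vector u (x) v with the state.
                    amplitude = sum(u[i] * v[j] * psi[2 * i + j]
                                    for i in range(2) for j in range(2))
                    p[(a, b, x, y)] = amplitude ** 2
    return p

# Maximally entangled state with measurement angles that maximize the CHSH value.
psi = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]
p = behavior(psi, angles_a=[0.0, math.pi / 2], angles_b=[math.pi / 4, -math.pi / 4])
print(len(p))  # number of components d^2 n^2 of the behavior vector
```

Here $d = n = 2$, so the behavior has $d^2 n^2 = 16$ components, and for every input pair the four joint probabilities sum to one.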
We refer to a specific state $\rho_{ABE}$ and sets of measurement operators $\{M_{a|x}\}$ and $\{M_{b|y}\}$, yielding the behavior $p$, as a quantum realization $Q$ of $p$. We denote by $\mathcal{Q}$ the convex set of all behaviors $p \in \mathbb{R}^{d^2 n^2}$ that admit a valid quantum realization $Q$. In the following, it will be useful to consider measurements on unnormalized quantum states $\tilde{\rho}_{AB}$ (i.e. $\mathrm{tr}(\tilde{\rho}_{AB}) \geq 0$). We denote the corresponding behaviors by $\tilde{p}$ and define their norm as $\mathrm{tr}(\tilde{p}) = \mathrm{tr}(\tilde{\rho}_{AB})$. We denote by $\tilde{\mathcal{Q}}$ the corresponding set of unnormalized quantum behaviors, which is a convex cone.
In general, different quantum realizations $Q$ are possible for a given behavior $p$. Our aim is to quantify the randomness generated by the boxes from $p$ alone, independently of the possible underlying quantum realizations $Q$ compatible with $p$. To simplify the notation, we describe in the following how to quantify the local randomness associated with box A's output $a$ when a certain input $x = x^*$ is used. The global randomness associated with both boxes' outputs $a$ and $b$ for a given pair of inputs $x = x^*$ and $y = y^*$ can be treated analogously.
To begin, let us fix a specific quantum realization $Q$ compatible with $p$. This quantum realization defines an initial state $\rho_{ABE}$ and sets of projectors $\{M_{a|x}\}$ and $\{M_{b|y}\}$. After Alice's measurement the correlations between her classical output $a$ and the quantum information held by Eve are described by the classical-quantum state $\sum_a p_A(a|x^*)\, |a\rangle\langle a| \otimes \rho_E^{a x^*}$, where $\rho_E^{a x^*}$ is the reduced state of Eve given that Alice performed measurement $x^*$ and obtained outcome $a$. The randomness of box A's output given this side information can be quantified by the guessing probability [27,3]: the average probability that Eve correctly guesses box A's output using an optimal strategy. Such an optimal strategy is described by a $d$-element POVM $\{M_{a|z}\}$ that Eve performs on her system; if she obtains the output $a$, which happens with probability $p_E(a|z, a', x^*, Q) = \mathrm{tr}\left(\rho_E^{a' x^*} M_{a|z}\right)$ when her system is in the reduced state $\rho_E^{a' x^*}$, she guesses that box A's output was $a$. Optimizing over all possible measurements, her average probability of guessing correctly is thus given by
$$G(A|E, x^*, Q) = \max_{\{M_{a|z}\}} \sum_a p_A(a|x^*)\, p_E(a|z, a, x^*, Q). \qquad (1)$$
The above expression defines the guessing probability, which is related to the quantum min-entropy by
$$H_{\min}(A|E, x^*, Q) = -\log_2 G(A|E, x^*, Q). \qquad (2)$$
Note that in the above definition we made the dependence on $Q$ explicit to stress that we are considering a given quantum realization $Q$. Since our aim is to obtain a bound on the randomness of the outputs that depends only on $p$, but not on a specific quantum realization $Q$ of $p$, we must further maximize $G(A|E, x^*, Q)$ over all $Q$ compatible with $p$:
$$G(A|E, x^*) = \max_{Q} G(A|E, x^*, Q). \qquad (3)$$
This defines the DIGP, the quantity which interests us in this work.

The device-independent guessing probability as a conic linear program
We have expressed the guessing probability as an average over Eve's probabilities conditioned on box A's outputs, but we can also express it, using Bayes' rule, as an average over Alice's probabilities conditioned on Eve's outcomes:
$$G(A|E, x^*, Q) = \max_{\{M_{a|z}\}} \sum_a p_E(a|z, Q)\, p_A(a|x^*, a, z, Q).$$
Here $p_E(a|z, Q)$ is the probability that Eve obtains the outcome $a$ and $p_A(a'|x^*, a, z, Q)$ is the probability that box A outputs $a'$ conditioned on that event. More generally, conditioning on Eve's outcomes defines a family of behaviors $p_{azQ}$ for boxes A and B, or more conveniently of unnormalized behaviors $\tilde{p}_{azQ} = p_E(a|z, Q)\, p_{azQ} \in \tilde{\mathcal{Q}}$. Note that averaging over these behaviors yields back the given behavior characterizing the boxes: $\sum_a \tilde{p}_{azQ} = p$. Every choice of $Q$ and $\{M_{a|z}\}$ defines a family of quantum behaviors satisfying this property. Conversely, it is not difficult to see that any set of behaviors $\tilde{p}_a \in \tilde{\mathcal{Q}}$ satisfying $\sum_a \tilde{p}_a = p$ can be interpreted as describing the conditional joint output probabilities of boxes A and B for some quantum realization $Q$ and POVM $\{M_{a|z}\}$ performed by Eve. In terms of the unnormalized behaviors, we can write Eq. (3) as $G(A|E, x^*) = \max_{Q, \{M_{a|z}\}} \sum_a \tilde{p}_A(a|x^*, a, z, Q)$ and thus the DIGP associated with $p$ is the solution to the following optimization problem
$$G(A|E, x^*) = \max_{\{\tilde{p}_a\}} \sum_a \tilde{p}_a(a|x^*) \quad \text{subject to} \quad \sum_a \tilde{p}_a = p, \quad \tilde{p}_a \in \tilde{\mathcal{Q}}, \quad a = 1, \ldots, d, \qquad (4)$$
where the $\tilde{p}_a$'s are the optimization variables. This is a typical instance of a conic linear program [28], i.e. the optimization of a linear objective function ($\sum_a \tilde{p}_a(a|x^*)$) subject to linear constraints ($\sum_a \tilde{p}_a = p$) and to the constraint that the optimization variables belong to a convex cone (the constraints $\tilde{p}_a \in \tilde{\mathcal{Q}}$, since $\tilde{\mathcal{Q}}$ is a closed convex cone).
The guessing probability, or equivalently the min-entropy, is an operational measure of randomness: if $\rho_{KE} = \sum_{k=1}^{d} p(k)\, |k\rangle\langle k| \otimes \rho_E^k$ is a cq-state with guessing probability $G(K|E) \leq 2^{-t}$, then a randomness extractor can be used to map $k \in \{1, \ldots, d\}$ to a $t$-bit string $k' \in \{1, \ldots, 2^t\}$ that is close to being uniformly random and uncorrelated to the adversary.
The program Eq. (4) has a simple physical interpretation. Any feasible point corresponds to a possible quantum decomposition $p = \sum_a \tilde{p}_a$ of the behavior $p$. From the point of view of an adversary, such a decomposition can be understood as a strategy where with probability $\mathrm{tr}(\tilde{p}_a)$ the adversary guesses that box A's output was $a$ and prepares the quantum behavior $p_a = \tilde{p}_a / \mathrm{tr}(\tilde{p}_a)$. The probability of correctly guessing box A's output in this strategy is $\sum_a \tilde{p}_a(a|x^*)$. The program Eq. (4) simply searches for the optimal quantum strategy that maximizes this expression.
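The adversarial interpretation can be made concrete with a simple feasible point. The sketch below, an illustration under stated assumptions rather than part of the method itself, decomposes the completely random two-input two-output behavior ($p_{AB}(ab|xy) = 1/4$) into deterministic behaviors, which are local and hence quantum, and groups them according to box A's output for $x^* = 1$. The helper names `deterministic` and `marginal_a` are hypothetical.

```python
from itertools import product

OUTPUTS, INPUTS = (1, 2), (1, 2)
KEYS = list(product(OUTPUTS, OUTPUTS, INPUTS, INPUTS))  # entries (a, b, x, y)

def deterministic(a_of_x, b_of_y):
    """Behavior in which each box outputs a fixed value for each input."""
    return {(a, b, x, y): float(a == a_of_x[x] and b == b_of_y[y])
            for a, b, x, y in KEYS}

def marginal_a(p, a, x):
    """Marginal p(a|x) of box A, evaluated with y = 1."""
    return sum(p[(a, b, x, 1)] for b in OUTPUTS)

x_star = 1
# Unnormalized behaviors tilde_p[a]: Eve prepares each of the 16 deterministic
# behaviors with probability 1/16 and guesses the output it assigns to x_star.
tilde_p = {a: dict.fromkeys(KEYS, 0.0) for a in OUTPUTS}
for a1, a2, b1, b2 in product(OUTPUTS, repeat=4):
    a_of_x, b_of_y = {1: a1, 2: a2}, {1: b1, 2: b2}
    guess = a_of_x[x_star]
    for key, value in deterministic(a_of_x, b_of_y).items():
        tilde_p[guess][key] += value / 16

# Linear constraint of the program: the unnormalized behaviors sum to p.
total = {key: sum(tilde_p[a][key] for a in OUTPUTS) for key in KEYS}
# Objective of the program: sum over a of tilde_p_a(a|x_star).
guessing_probability = sum(marginal_a(tilde_p[a], a, x_star) for a in OUTPUTS)
print(guessing_probability)
```

The decomposition reproduces the white-noise behavior and reaches the value 1 of the objective, so no randomness can be certified from such correlations; for non-local behaviors no decomposition of this deterministic kind exists.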

Dual formulation and optimal Bell expressions
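Every conic linear program admits a dual formulation (see, e.g. [28]), which in the case of Eq. (4) is readily seen to be
$$D(A|E, x^*) = \min_{f}\; f \cdot p \quad \text{subject to} \quad p'(a|x^*) \leq_{\mathcal{Q}} f \cdot p', \quad a = 1, \ldots, d. \qquad (5)$$
In the above problem the optimization variable is the vector $f \in \mathbb{R}^{d^2 n^2}$. It can be interpreted as defining a Bell expression whose expectation value is $f \cdot p = \sum_{a,b,x,y} f_{abxy}\, p_{AB}(ab|xy)$. That is, it defines a linear form in the behavior $p$. The constraint $p'(a|x^*) \leq_{\mathcal{Q}} f \cdot p'$ means that the inequality $p'(a|x^*) \leq f \cdot p'$ should hold for all $p' \in \mathcal{Q}$. Whenever $f$ satisfies this constraint, the expectation value $f \cdot p$ provides an upper-bound on the guessing probability since
$$G(A|E, x^*) = \max_{\{\tilde{p}_a\}} \sum_a \tilde{p}_a(a|x^*) \leq \max_{\{\tilde{p}_a\}} \sum_a f \cdot \tilde{p}_a = f \cdot p.$$
In particular, given a fixed Bell expression, such as the CHSH expression $c$, one can determine coefficients $\alpha$ and $\beta$ (effectively defining a new linear form $f \cdot p = \alpha\, c \cdot p + \beta$) such that the DIGP is upper-bounded by $\alpha\, c \cdot p + \beta$. Such bounds on the DIGP are the ones that are used in most works related to DIRG or DIQKD, see e.g. [5,7,6,8,29] and [14,15,17,16], respectively. The program Eq. (5) goes further since it does not assume a fixed Bell expression, but determines the linear form that yields the lowest upper-bound $D(A|E, x^*)$ on the DIGP for a given behavior $p$.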
The fact that the dual optimal solution yields an upper bound on the primal optimal solution, $D(A|E, x^*) \geq G(A|E, x^*)$, is a general result that holds for any pair of primal and dual conic linear programs. Provided that one of the two programs admits a strictly feasible solution, it further holds that there is no gap between the primal and dual optimal solutions, i.e. $G(A|E, x^*) = D(A|E, x^*)$. This is the case here since the form $f$, defined by $f_{abxy} = 1$ for all $a$, $b$, $x$, and $y$, satisfies $f \cdot p = n^2$ for every normalized behavior, and consequently $p(a|x^*) \leq 1 < n^2 = f \cdot p$, i.e. $p(a|x^*) <_{\mathcal{Q}} f \cdot p$, and so represents a strictly feasible point of the dual problem.
The programs Eqs. (4) and (5) are equivalent but have different interpretations. As we have explained above, the feasible points of the primal program correspond to explicit strategies for the adversary. Any such strategy yields a lower-bound on the DIGP. The primal program Eq. (4) searches for the optimal strategy that maximizes the guessing probability. On the other hand, any feasible point of the dual program corresponds to a Bell expression, which certifies that a certain amount of randomness is present in the given behavior $p$, and yields an upper-bound on the DIGP. The dual program Eq. (5) searches for the Bell expression which certifies the maximal amount of randomness. The duality theorem of conic linear programming tells us that the optimal solutions of both programs are identical, and thus that for every behavior $p$ there exists a Bell expression which certifies the full amount of randomness present in the correlations.

Semidefinite programming relaxations
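The above conic linear programming formulations of the DIGP are in general difficult to implement exactly. However, they can be relaxed using the SDP method introduced in [26,30]. This method introduces a hierarchy of convex sets $\tilde{\mathcal{Q}}_1 \supseteq \tilde{\mathcal{Q}}_2 \supseteq \ldots \supseteq \tilde{\mathcal{Q}}$, which approximate the quantum set $\tilde{\mathcal{Q}}$ from the outside. The hierarchy of programs
$$G_k(A|E, x^*) = \max_{\{\tilde{p}_a\}} \sum_a \tilde{p}_a(a|x^*) \quad \text{subject to} \quad \sum_a \tilde{p}_a = p, \quad \tilde{p}_a \in \tilde{\mathcal{Q}}_k, \quad a = 1, \ldots, d, \qquad (7)$$
therefore provides a sequence of relaxations to Eq. (4), which yields upper-bounds $G_1(A|E, x^*) \geq G_2(A|E, x^*) \geq \ldots \geq G(A|E, x^*)$ on the DIGP. In this approach a behavior $\tilde{p}$ belongs to $\tilde{\mathcal{Q}}_k$ if and only if there exists a positive semidefinite matrix $\Gamma_k \succeq 0$ satisfying a series of linear constraints of the form $\mathrm{tr}(G\, \Gamma_k) = h \cdot \tilde{p}$ (see [30,33] for details). Since the objective function and the first set of constraints in Eq. (7) are also linear, the problems Eq. (7) can be cast as SDP problems for which efficient algorithms are available.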
This SDP hierarchy can also be understood from the perspective of the dual problem Eq. (5). To see this, we note that the constraint $p'(a|x^*) \leq_{\mathcal{Q}} f \cdot p'$ in Eq. (5) is equivalent to $\langle\psi|F_a|\psi\rangle \geq 0$ for all possible quantum states $|\psi\rangle$ and all possible $F_a$ of the form $F_a = \sum_{a'bxy} f_{a'bxy}\, M_{a'|x} \otimes M_{b|y} - M_{a|x^*} \otimes I$, where $\{M_{a|x}\}$ and $\{M_{b|y}\}$ are valid sets of measurement operators. This in turn is equivalent to $F_a \succeq 0$ for all $F_a = \sum_{a'bxy} f_{a'bxy}\, M_{a'|x} \otimes M_{b|y} - M_{a|x^*} \otimes I$. We say that $F_a$ admits a sum of squares (SOS) decomposition of degree $2k$, and write $F_a = \mathrm{SOS}_k$, if there exists a set $\{S_a^i\}$ of polynomials of degree $k$ in the operators $\{M_{a|x} \otimes I, I \otimes M_{b|y}\}$ such that $F_a = \sum_i (S_a^i)^\dagger S_a^i$ holds for any sets of valid measurement operators $\{M_{a|x}\}$ and $\{M_{b|y}\}$. If this is the case, it clearly follows that $F_a = \sum_i (S_a^i)^\dagger S_a^i \succeq 0$. Therefore, the series of problems
$$D_k(A|E, x^*) = \min_{f}\; f \cdot p \quad \text{subject to} \quad F_a = \mathrm{SOS}_k, \quad a = 1, \ldots, d, \qquad (8)$$
represents a sequence of relaxations of the dual problem Eq. (5) yielding upper-bounds $D_1(A|E, x^*) \geq D_2(A|E, x^*) \geq \ldots \geq D(A|E, x^*)$. It is well known that an SOS constraint of the form $F_a = \mathrm{SOS}_k$ can be represented as an SDP constraint [31] and thus that the relaxations Eq. (8) are SDP problems. Such SDP relaxations turn out to be nothing but the dual formulation of the SDP relaxations Eq. (7) [30,32] (see [33] for more details on the relation between the primal and dual of the SDP hierarchy).
Even though the primal and dual SDP relaxations Eqs. (7) and (8) are equivalent, like the original programs, they have different interpretations. Feasible points of the primal programs correspond to decompositions of $p$ in terms of supra-quantum behaviors in $\tilde{\mathcal{Q}}_k$. They can be understood as characterizing the strategies available to an adversary who is able to prepare supra-quantum behaviors. Such strategies are not necessarily always available in a purely quantum setting and thus the associated values $G_k(A|E, x^*)$ represent upper-bounds on the DIGP. The dual programs on the other hand return explicit Bell expressions certifying that the DIGP cannot be higher than a certain value $G_k(A|E, x^*)$. Such bounds are valid (and optimal) for any strategy in $\tilde{\mathcal{Q}}_k$ and thus are also valid (though not necessarily optimal) for any quantum strategy in $\tilde{\mathcal{Q}}$. In other words, the SDP relaxations Eqs. (7) and (8) not only give a bound on the DIGP, but also return explicit Bell expressions that can be used in any analysis based on a quantitative relation between the amount of Bell violation and randomness, such as in [14,17,16,5,7,6,8,9,10].

Numerical examples
In this section we present three numerical examples demonstrating the advantage in taking into account the complete non-local behavior. In the first two examples, we consider a two-input two-output Bell scenario. We introduce the eight parameters $\langle A_x \rangle = \sum_{a=\pm 1} a\, p_A(a|x)$, $\langle B_y \rangle = \sum_{b=\pm 1} b\, p_B(b|y)$, $\langle A_x B_y \rangle = \sum_{a,b=\pm 1} ab\, p_{AB}(ab|xy)$, where $x, y = 1, 2$. Their knowledge is equivalent to the knowledge of the complete set of probabilities $p_{AB}(ab|xy)$.
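The equivalence between the eight parameters and the sixteen probabilities follows from the standard inversion formula $p_{AB}(ab|xy) = \frac{1}{4}\left(1 + a\langle A_x\rangle + b\langle B_y\rangle + ab\langle A_x B_y\rangle\right)$ for outputs $a, b = \pm 1$. The sketch below checks the round trip numerically; the function names and the numerical values of the correlators are arbitrary choices made for illustration.

```python
from itertools import product

OUTPUTS, INPUTS = (+1, -1), (1, 2)

def probabilities(A, B, AB):
    """Joint probabilities p[(a, b, x, y)] from marginals <A_x>, <B_y> and correlators <A_x B_y>."""
    return {(a, b, x, y): (1 + a * A[x] + b * B[y] + a * b * AB[(x, y)]) / 4
            for a, b, x, y in product(OUTPUTS, OUTPUTS, INPUTS, INPUTS)}

def correlators(p):
    """Recover the eight parameters from the joint probabilities."""
    pairs = list(product(OUTPUTS, OUTPUTS))
    A = {x: sum(a * p[(a, b, x, 1)] for a, b in pairs) for x in INPUTS}
    B = {y: sum(b * p[(a, b, 1, y)] for a, b in pairs) for y in INPUTS}
    AB = {(x, y): sum(a * b * p[(a, b, x, y)] for a, b in pairs)
          for x, y in product(INPUTS, INPUTS)}
    return A, B, AB

# Illustrative values only (not taken from the examples of this section).
A = {1: 0.1, 2: 0.0}
B = {1: 0.0, 2: -0.2}
AB = {(1, 1): 0.7, (1, 2): 0.7, (2, 1): 0.7, (2, 2): -0.7}

p = probabilities(A, B, AB)
A_back, B_back, AB_back = correlators(p)
```

The marginals are read off with the other party's input fixed to 1, which is legitimate because the formula automatically yields a non-signaling behavior.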
CHSH correlations in the presence of white noise. We first consider the randomness that can be extracted from a mixture of maximally violating CHSH correlations plus white noise, i.e. correlations of the form $v\, q + (1 - v)\, r$, where $q$ are the quantum correlations yielding the maximal CHSH violation of $2\sqrt{2}$ and $r$ denotes completely random correlations for which $p_{AB}(ab|xy) = 1/4$ for all $a$, $b$, $x$, and $y$. As a function of the "visibility" $v$ the CHSH violation is thus given by $2\sqrt{2}\, v$. Naively, one would expect that in such a simple example knowledge of the full non-local behavior is of no greater utility than knowledge of the CHSH violation alone. Surprisingly, Figure 1 shows that this is not the case, although the improvement that we get by considering the full non-local behavior is modest. We have determined numerically the corresponding optimal Bell inequalities as a function of $v$ by solving explicitly the dual programs. We find that these inequalities all have the form of Eq. (9), where the coefficients $f_{11}$ and $f_{22}$ are given in Figure 2. The case $f_{11} = f_{22} = 1$ corresponds to the CHSH inequality and only arises in the case of perfect visibility ($v = 1$). This shows that in any real experiment, in which the visibility is necessarily imperfect (i.e. $v < 1$), the optimal Bell inequality for randomness certification is not always the CHSH inequality.
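For orientation, the CHSH value of the noisy correlations can be checked in a few lines. The sketch also evaluates the well-known analytic bound $\frac{1}{2} + \frac{1}{2}\sqrt{2 - S^2/4}$ on the local guessing probability of a single output given only a CHSH value $S$. This is an illustration under stated assumptions: it concerns the local quantity, not the global randomness $G(A, B|E, 1, 1)$ plotted in Figure 1, and the function names are hypothetical.

```python
import math

def noisy_chsh_correlators(v):
    """Correlators <A_x B_y> of v*q + (1 - v)*r; white noise rescales them by v."""
    e = v / math.sqrt(2)
    return {(1, 1): e, (1, 2): e, (2, 1): e, (2, 2): -e}

def chsh(AB):
    """CHSH expression <A1 B1> + <A1 B2> + <A2 B1> - <A2 B2>."""
    return AB[(1, 1)] + AB[(1, 2)] + AB[(2, 1)] - AB[(2, 2)]

def local_guessing_bound(s):
    """Analytic upper bound on the local guessing probability given a CHSH value s."""
    if s <= 2:
        return 1.0  # no violation: nothing can be certified
    return 0.5 + 0.5 * math.sqrt(max(0.0, 2.0 - s * s / 4.0))

for v in (1.0, 0.95, 0.9, 0.8):
    s = chsh(noisy_chsh_correlators(v))
    print(v, round(s, 4), round(local_guessing_bound(s), 4))
```

The loop shows how the certified bound degrades as the visibility decreases; the full-behavior programs described above are what allow one to improve on bounds of this single-inequality type.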
Randomness from partially entangled states. In the second example, we consider the set of correlations given in Eq. (10), where $\tan\mu = \sin 2\theta$. For $v = 1$ these correlations are obtained by measuring a partially entangled state of the form $|\Psi\rangle = \cos\theta\, |00\rangle + \sin\theta\, |11\rangle$ and give rise to a maximal violation of the $I_1^\beta$ inequality of [3]. A value of $v < 1$ corresponds to a mixture of these correlations with completely white noise in the respective fractions of $v$ and $1 - v$.
Figure 3 presents bounds on the global DIGP G(A, B|E, 2, 1) corresponding to the pair of outcomes associated with the measurements A 2 and B 1 as a function of θ for v = 0.99.We see that taking into account complete sets of correlations can provide a very significant advantage, not only as compared with taking into account only the violation of a single Bell inequality, but also violations of two independent Bell inequalities.
It is interesting to see what the optimal Bell inequalities, obtained via the dual formulation of the SDP programs, look like. The significant advantage obtained in Fig. 3 by taking into account complete data suggests that the corresponding optimal Bell inequalities would be more than mere tweaks of any of the Bell inequalities that have thus far been investigated for the purposes of DIRG (essentially the $I_\alpha^\beta$ inequalities of [3]). This intuition is indeed backed up by the numerics. For example, for $\theta = 27\pi/200$ ($G(A, B|E, 2, 1) \simeq 0.609$) we obtain the Bell expression $[\ldots] + 1.36$, whose local bound is 8.36.
Randomness from entangled qutrits. As the last example, we consider the two-input, three-output Bell-CGLMP scenario [34]. Specifically, we consider correlations which violate the CGLMP inequality and which arise by performing the measurements specified in [34] on the family of states
$$|\psi(\alpha)\rangle = \alpha\, |00\rangle + \sqrt{1 - 2\alpha^2}\, |11\rangle + \alpha\, |22\rangle, \qquad (12)$$
with $0 \leq \alpha \leq 1/\sqrt{2}$. For $\alpha = 0$ the state is a product state, for $\alpha = 1/\sqrt{3}$ it is a maximally entangled two-qutrit state, while for $\alpha = 1/\sqrt{2}$ it is a maximally entangled two-qubit state. For $\alpha \simeq 0.6169$ the CGLMP inequality is maximally violated [35], while no violation is obtained for $\alpha \leq \sqrt{3/22} \simeq 0.3693$ using the set of measurements considered. Figure 4 presents bounds on the randomness $G(A|E, 1)$, which can be certified in this scenario, for $\sqrt{3/22} \leq \alpha \leq 1/\sqrt{2}$, taking into account only the CGLMP violation or the full non-local behavior. Unsurprisingly, at the point of maximal violation of the CGLMP inequality, we can certify one trit of randomness, i.e. $G(A|E, 1) = 1/3$. However, taking into account the complete behavior, a large interval of values of $\alpha$ yields $G(A|E, 1) = 1/3$, including values for which the CGLMP violation is small. These results have been obtained using the second order relaxation of the SDP hierarchy. The range of values of $\alpha$ for which $G(A|E, 1) = 1/3$ may thus turn out to be larger when going to higher order SDP relaxations or using different measurements from those specified in [34].
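To relate the guessing probabilities quoted in these examples to an amount of randomness, one can use the standard relation between guessing probability and min-entropy, $H_{\min} = -\log_2 G$. The following minimal sketch applies it to two values quoted in the text; the function name is a hypothetical helper.

```python
import math

def min_entropy_bits(guessing_probability):
    """Min-entropy in bits, H_min = -log2(G), for a guessing probability 0 < G <= 1."""
    if not 0.0 < guessing_probability <= 1.0:
        raise ValueError("guessing probability must lie in (0, 1]")
    return -math.log2(guessing_probability)

# G(A|E, 1) = 1/3: one trit, i.e. log2(3) bits, from a three-outcome measurement.
one_trit = min_entropy_bits(1 / 3)
# G(A, B|E, 2, 1) ~ 0.609, the value quoted for theta = 27*pi/200.
partially_entangled = min_entropy_bits(0.609)
print(one_trit, partially_entangled)
```

A guessing probability of 1 gives zero min-entropy, consistent with the fact that no randomness is certified when the adversary can always guess the output.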

Conclusion
We have shown how the device-independent guessing probability can be evaluated by taking into account in a systematic way the complete non-local behavior characterizing a Bell test and not only the violation of a pre-specified Bell inequality. We have also shown that for any given non-local correlations, there exists an optimal Bell inequality that can certify the maximal amount of randomness compatible with such correlations. Explicit upper-bounds on the device-independent guessing probability and their associated Bell inequalities can be computed by adapting the SDP hierarchy introduced in [26]. Low order relaxations, as is often the case with applications of the SDP hierarchy, usually already yield the optimal value of the guessing probability.
Our approach can be straightforwardly adapted to quantify randomness in purely non-signaling settings (i.e. without requiring the validity of quantum theory). The corresponding programs are simply the analogues of Eqs. (4) and (5), where the constraints $\tilde{p}_a \in \tilde{\mathcal{Q}}$ and $p'(a|x^*) \leq_{\mathcal{Q}} f \cdot p'$ are replaced by $\tilde{p}_a \in \widetilde{\mathcal{NS}}$ and $p'(a|x^*) \leq_{\mathcal{NS}} f \cdot p'$, respectively, with $\mathcal{NS}$ denoting the set of non-signaling behaviors. Since $\mathcal{NS}$ is entirely characterized by linear constraints (the no-signaling constraints [36] and the positivity of probabilities), these programs can be solved using linear programming.
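The linear constraints defining the non-signaling set are easy to state explicitly. The sketch below checks positivity, normalization and the no-signaling conditions for a two-input two-output behavior, and applies the check to a PR-box-type behavior, a standard example of a non-signaling behavior that is not quantum. The function names and the tolerance are assumptions made for illustration; the sketch only tests membership and does not solve the linear programs themselves.

```python
from itertools import product

OUTPUTS, INPUTS = (+1, -1), (1, 2)

def pr_box():
    """Outputs perfectly correlated for all inputs except (x, y) = (2, 2), where they are anticorrelated."""
    p = {}
    for a, b, x, y in product(OUTPUTS, OUTPUTS, INPUTS, INPUTS):
        sign = -1 if (x, y) == (2, 2) else +1
        p[(a, b, x, y)] = 0.5 if a * b == sign else 0.0
    return p

def is_non_signaling(p, tol=1e-12):
    """Check positivity, normalization and independence of each marginal from the other input."""
    if any(value < -tol for value in p.values()):
        return False
    for x, y in product(INPUTS, INPUTS):
        if abs(sum(p[(a, b, x, y)] for a in OUTPUTS for b in OUTPUTS) - 1.0) > tol:
            return False
    for a, x in product(OUTPUTS, INPUTS):
        marginals = [sum(p[(a, b, x, y)] for b in OUTPUTS) for y in INPUTS]
        if max(marginals) - min(marginals) > tol:
            return False
    for b, y in product(OUTPUTS, INPUTS):
        marginals = [sum(p[(a, b, x, y)] for a in OUTPUTS) for x in INPUTS]
        if max(marginals) - min(marginals) > tol:
            return False
    return True

def chsh(p):
    """CHSH expression <A1 B1> + <A1 B2> + <A2 B1> - <A2 B2>."""
    def corr(x, y):
        return sum(a * b * p[(a, b, x, y)] for a in OUTPUTS for b in OUTPUTS)
    return corr(1, 1) + corr(1, 2) + corr(2, 1) - corr(2, 2)

box = pr_box()
print(is_non_signaling(box), chsh(box))
```

The PR-box-type behavior satisfies all the linear constraints while reaching the algebraic maximum of the CHSH expression, which exceeds the quantum maximum of $2\sqrt{2}$.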
We expect that the tools that we have presented will contribute to advancing our fundamental understanding of the relation between non-locality and randomness, and its cryptographic applications. In particular, the simple examples that we have studied (especially Figures 1, 2, and 4) already yield unexpected results that motivate further investigations. Finally, it would be interesting to understand the optimal way to incorporate our method directly in protocols for DIRG and DIQKD, taking into account finite-statistics effects.
Note added. Similar results to our own have been obtained independently and in parallel by J.D. Bancal, L. Sheridan, and V. Scarani [37].

Figure 1: Global randomness $G(A, B|E, 1, 1)$ as a function of the visibility $v$ for optimally violating CHSH correlations in the presence of white noise. The dashed curve was obtained by taking into account only the CHSH value (i.e. $2\sqrt{2}\, v$), while the solid curve was obtained by taking into account the full non-local behavior. Both curves were obtained using the second order relaxation of the SDP hierarchy and are actually optimal up to the numerical precision of $10^{-6}$ used (we have verified optimality by finding explicit states and measurements saturating the bounds given by the SDP programs). Except when $v = 1$, i.e. when there is no noise, we see that there is a small advantage in taking into account the full non-local behavior.

Figure 2: Coefficients of the optimal Bell inequalities Eq. (9) as a function of $v$. The CHSH inequality corresponds to the case $f_{11} = f_{22} = 1$ and is optimal only for perfect visibility $v = 1$ (and trivially $v = 1/\sqrt{2}$).

Figure 3: $G(A, B|E, 2, 1)$ as a function of $\theta$ computed by taking into account partial or complete non-local data for $v = 0.99$. The dashed curve was obtained by constraining only the value of the $I_1^\beta$ expression, the dotted curve by constraining only the value of the CHSH expression, the dashed-dotted curve by constraining the values of both the $I_1^\beta$ and the CHSH expressions, and the solid curve by taking into account the values of all correlators in accordance with Eq. (10). These curves were obtained using the third order relaxation of the SDP hierarchy. The dashed-dotted curve is optimal up to a precision of $10^{-6}$.

Figure 4: Local DIGP $G(A|E, 1)$ as a function of the parameter $\alpha$ defined in Eq. (12). The dashed curve is obtained by taking into account only the CGLMP value, and the solid one the complete behavior. Both curves were obtained using the second order relaxation of the SDP hierarchy, and the dashed one has been verified to be optimal up to a numerical precision of $10^{-5}$.