Device-independent randomness generation from several Bell estimators

Device-independent randomness generation and quantum key distribution protocols rely on a fundamental relation between the non-locality of quantum theory and its random character. This relation is usually expressed in terms of a trade-off between the probability of guessing correctly the outcomes of measurements performed on quantum systems and the amount of violation of a given Bell inequality. However, a more accurate assessment of the randomness produced in Bell experiments can be obtained if the value of several Bell expressions is simultaneously taken into account, or if the full set of probabilities characterizing the behavior of the device is considered. We introduce protocols for device-independent randomness generation, secure against classical side information, that rely on the estimation of an arbitrary number of Bell expressions or even directly on the experimental frequencies of measurement outcomes. Asymptotically, this results in an optimal generation of randomness from experimental data (as measured by the min-entropy), without having to assume beforehand that the devices violate a specific Bell inequality.


Introduction
In recent years, researchers have uncovered a fundamental relationship between the non-locality of quantum theory and its random character. This relationship is usually formulated as follows. Consider two (or generally k) separated quantum devices accepting, respectively, classical inputs x_1 and x_2 and producing classical outputs a_1 and a_2. Let p = {p(a_1 a_2 | x_1 x_2)} denote the set of joint probabilities describing how the devices respond to given inputs, from the point of view of a user who can only interact with the devices through the input-output interface, but who has no knowledge of the inner workings of the devices. Suppose that given p, the expectation value of a certain Bell expression f, such as the Clauser-Horne-Shimony-Holt (CHSH) expression [1], equals f[p]. Then, it is in principle possible to compute a lower bound on the randomness generated by the devices, as quantified by the min-entropy, i.e., the negative logarithm of the maximal probability of correctly guessing the values of future outputs. This bound on the min-entropy holds for any observer, including those having an arbitrarily precise description of the inner workings of the devices, and depends only on information derived from the resulting input-output behavior through the quantity f[p]. In principle, this bound can be computed numerically for any given Bell expression f. For certain Bell expressions, such as the CHSH expression, it can also be determined analytically.
This relation between the non-locality of quantum theory and its randomness is at the basis of various protocols for device-independent (DI) randomness generation (RNG) [2,3] and quantum key distribution (QKD) [4,5]. The theoretical analysis of such protocols presents us with an extra challenge in that the probabilistic behavior p of the devices is not known in advance and may vary from one measurement run to the next. This implies that bounds on the randomness as a function of f[p] have to be adapted to rely instead on the value of the Bell expression f estimated from experimental data. Some DIRNG and DIQKD protocols, and their security analyses, rely on specific Bell inequalities (usually the CHSH inequality) [6-9] or certain families of Bell inequalities [10-12], while others may be adapted to arbitrary Bell inequalities [3,13-17]. However, to our knowledge all DIRNG and DIQKD protocols in the literature require that a single Bell inequality be chosen in advance and its experimental violation estimated (one exception is [18], where two fixed Bell expressions are used). The length and secrecy of the final key will then depend on the observed violation of the chosen inequality.
Nevertheless, it has been pointed out in [19,20] that the fundamental relation between the randomness and non-locality of quantum theory does not necessarily need to be expressed in terms of a specific Bell inequality. It is in principle possible, at least numerically, to bound the probability of guessing correctly the outputs of a pair of quantum devices directly from the knowledge of the joint input-output probabilities p. Indeed, the amount of violation f [p] of a given Bell inequality captures the non-local behavior of the devices only partially, and better bounds on the min-entropy can be obtained if all the information about the devices' behavior is taken into account.
This observation raises the following question: can one devise a device-independent RNG or QKD protocol that does not rely on the estimation of any a priori chosen Bell inequality, but which instead takes directly into account all the data generated by the devices?
There are various reasons for introducing protocols of this type. First, as already mentioned, the entire set of data generated by the devices can provide more information than the violation of a specific Bell inequality, and may therefore potentially allow for more efficient protocols. Second, the choice of a Bell inequality may have a deep influence on the amount of randomness that can be certified: as shown in [21], there are devices for which the amount of randomness, as computed from the CHSH inequality, is arbitrarily small, but is maximal if computed using another Bell inequality. Third, even if a set of quantum devices has been specifically designed to maximize the randomness according to a specific Bell inequality, the optimal extraction of randomness from noisy versions of such devices, say because of degradation of the devices with time, will typically rely on other Bell inequalities [19,20,22]. Finally, suppose that one is given a set of quantum devices without any specification of which Bell inequality they are expected to violate. Can one nevertheless directly use them in a protocol and obtain a non-zero random string or shared key, without testing their behavior beforehand? We show here that it is indeed possible to devise DIRNG protocols which exploit more information than the estimated violation of a single Bell inequality; in particular, DIRNG protocols which exploit the full set of frequencies obtained (i.e., the entire set of estimates of the behavior p). Specifically, we introduce a DIRNG protocol whose security holds against an adversary limited to classical side information, or equivalently, with no long-term quantum memory. (Note that such a level of security may well be sufficient for all practical purposes [14,15].)
Technically, our protocol is obtained by generalizing the security analysis introduced in [14,15] and combining it with the semidefinite programming techniques introduced in [19,20] for lower-bounding the randomness based on the full set of probabilities p (which cannot be directly applied to experimental data).
We start in Section 2 by briefly presenting the theoretical framework of our work, its main assumptions, and the notation used throughout the paper. In Sections 3 to 5 we present our main mathematical results. In Section 3 we present the main theorem of the paper and explain in detail how to put a DI bound on the randomness produced when measuring a Bell device n times in succession, given that we have a way to bound the single-round randomness as a function of the Bell expectation, and given that we can estimate the Bell expectation with some confidence. These two sub-procedures are respectively presented in Sections 4 and 5 for the general case of an arbitrary number of Bell expressions. Combining these two sub-procedures with the general approach of Section 3 immediately yields a DIRNG protocol, whose various steps are summarized in Section 6. In Section 7 we discuss in detail the main features of our protocol, and illustrate these with a numerical example. We end with some concluding remarks and open questions in Section 8.

Behaviors and Bell expressions
In the following we will refer to a Bell setup, that is to say, k separated "black" boxes (quantum devices whose inner workings are unknown), as a Bell device. Each box i can receive an input x_i upon which it produces an output a_i, with x_i and a_i taking values in some finite sets X_i and A_i, respectively, where without loss of generality we assume that the set of outputs A_i does not depend on the input x_i. We write x = (x_1, . . . , x_k) and a = (a_1, . . . , a_k) for the k-tuples of inputs and outputs, and write X = X_1 × · · · × X_k and A = A_1 × · · · × A_k for the sets of all possible k-tuples of inputs and outputs. Note that we use a roman (upright) type for the inputs and outputs of a single box and an italic type for the joint inputs and outputs of all k boxes.
The behavior of a single-round use of this Bell device can be characterized by the |A| × |X| joint probabilities p(a | x), which we can arrange into a vector p ∈ R^{|A|×|X|}. We denote by Q ⊂ R^{|A|×|X|} the set of behaviors p which admit a quantum representation, i.e., the set of behaviors such that there exist a k-partite quantum state and local measurements yielding the outcomes a with probability p(a | x) when performing the measurements x. It is well known that the set Q can be approximated from the outside by a sequence of semidefinite programs (SDPs) using the NPA hierarchy [23].
We define a Bell expression as a vector f ∈ R^{|A|×|X|} with components f(a, x). The Bell expression f defines a linear form on the set of behaviors p through

f[p] = Σ_{a,x} f(a, x) p(a | x).    (1)

We refer to f[p] as the expectation of f with respect to the behavior p.
We consider here a framework in which the information we have about a Bell device is not necessarily given by the full behavior p, but possibly only by the expectation of one or more Bell expressions. In the following, we thus assume that t Bell expressions f_α (α = 1, . . . , t) have been selected. (The certifiable randomness will depend on this initial choice of Bell expressions; we discuss this issue later.) We denote by f = (f_1, . . . , f_t) these t Bell expressions and by f[p] = (f_1[p], . . . , f_t[p]) their expectations with respect to the behavior p. As an example, in a bipartite scenario, we might only know the value of the CHSH expression, in which case t = 1 and there is a single f defined by f(a, x) = (−1)^{a_1 + a_2 + x_1 x_2}. But the framework is also applicable when f[p] corresponds to the full set p of probabilities. One simply needs to consider |A| × |X| expressions, one for each pair (a′, x′), defined by f_{a′x′}(a, x) = δ_{a a′} δ_{x x′}, so that f_{a′x′}[p] = p(a′ | x′). Of course, in a DI protocol, we are not actually given f[p]; we must instead estimate it by performing sequential measurements. We are thus led to consider a Bell device which is used n times in succession.
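As a concrete illustration of these definitions, the short Python sketch below (our own toy code, not part of the protocol) evaluates the linear form f[p] for the CHSH expression on the standard quantum behavior that saturates Tsirelson's bound, included purely for illustration:

```python
import itertools
import math

def chsh_coeff(a1, a2, x1, x2):
    # CHSH Bell expression: f(a, x) = (-1)^(a1 + a2 + x1*x2)
    return (-1) ** (a1 + a2 + x1 * x2)

def tsirelson_behavior(a1, a2, x1, x2):
    # Behavior of optimal measurements on a singlet, reaching CHSH = 2*sqrt(2)
    return (1 + chsh_coeff(a1, a2, x1, x2) / math.sqrt(2)) / 4

def bell_expectation(f, p):
    # f[p] = sum over all (a, x) of f(a, x) * p(a | x)
    return sum(f(a1, a2, x1, x2) * p(a1, a2, x1, x2)
               for a1, a2, x1, x2 in itertools.product((0, 1), repeat=4))

print(bell_expectation(chsh_coeff, tsirelson_behavior))  # 2.8284... = 2*sqrt(2)
```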
We write x = (x_1, . . . , x_n) and a = (a_1, . . . , a_n) for the corresponding sequences of inputs and outputs, and x^j = (x_1, . . . , x_j) and a^j = (a_1, . . . , a_j) for the sequences of inputs and outputs up to, and including, round j.
We write P(a | x) for the conditional probabilities of obtaining the sequence of outputs a given a certain sequence of inputs x. Note that we use an upper-case P to denote the n-round behavior of the boxes and lower-case p's for single-round behaviors. We assume that the Bell device is probed using inputs x distributed according to a probability distribution Π(x). We will consider, in particular, the case where at each round the inputs are selected according to identical and independent distributions π(x), so that Π(x) = ∏_{j=1}^n π(x_j) (though this condition can actually be slightly relaxed in the results that follow). The full (non-conditional) n-round probabilities are thus given by P(a, x) = P(a | x) Π(x). We denote by P_AX and P_{A|X} the distributions corresponding to the probabilities P(a, x) and P(a | x), respectively.
The only assumption we make about the Bell device is that at each round it is characterized by a joint entangled quantum state and a respective set of local measurement operators for each box. Each set of local measurement operators can depend on the past inputs and outputs of all k boxes (separated boxes can thus freely communicate between measurement rounds), but does not depend on future inputs (inputs are thus selected independently of the state of the device) or on the inputs of the k − 1 other boxes in the same round. Mathematically, this means that we can write P(a | x) = ∏_{j=1}^n P(a_j | x_j, a^{j−1}, x^{j−1}), and that the (single-round) behavior at round j given the past inputs and outputs x^{j−1} and a^{j−1}, defined as p_{a^{j−1}, x^{j−1}}(a_j | x_j) = P(a_j | x_j, a^{j−1}, x^{j−1}), should be a valid no-signaling quantum behavior, i.e., p_{a^{j−1}, x^{j−1}} ∈ Q.
We assume that the internal behavior of the boxes may be classically correlated with a system held by an adversary. Formally, these correlations and the adversary's knowledge can be represented through the joint probabilities P(a, x, e), where e denotes the adversary's classical side information. However, in order to keep the notation simple, we do not explicitly include e in the following. All the reasoning that follows would nevertheless hold, with only minor modifications, if the adversary's classical side information e were explicitly taken into account. This can be understood by comparing our proofs with those in [14]. Alternatively, e can be formally viewed as an initial input x_0 = e.
In the following, we sometimes adopt a terminology where the k-tuples x and a are referred to as the input and output of (a single-round use of) the Bell device (though of course each consists of the inputs and outputs, respectively, of all k boxes).

A general procedure for DIRNG against classical side-information
In this section, we show how to quantify the randomness produced by n sequential uses of the Bell device based on the Bell expressions f . We follow the approach introduced in [3,14]. This approach relies on two essential sub-procedures: a first sub-procedure to bound the randomness of single-round behaviors and a second sub-procedure to estimate a certain quantity involving the Bell expressions f . Given these two ingredients, the single-round randomness bound can, through some simple algebra, be adapted to the n-round scenario and related to the actual data obtained in the Bell experiment. We provide a macro-level description of this approach, which relies only on certain general mathematical properties that these two basic sub-procedures must satisfy, but not on any specifics as to how to implement them. We will present explicit ways to carry out these sub-procedures in the next two sections.
Intuitively, the output of the Bell device exhibits randomness for some choice of input x if there is no corresponding outcome that is certain to happen, i.e., if P(a | x) < 1 for all a ∈ A^n. Equivalently, we can express this condition by saying that the surprisals −log_2 P(a | x) are bounded away from zero: −log_2 P(a | x) > 0 for all a ∈ A^n. Our first aim will thus be to lower-bound these surprisals without making any assumptions regarding the Bell device's behavior apart from the ones stated in Section 2. We will then see how to turn this bound into a more formal statement in terms of min-entropy.
To bound the n-round randomness, we assume the existence of a function H which bounds the single-round surprisal −log_2 p(a | x) as a function of the Bell expectations f[p]. This is our first ingredient. We actually require this function to non-trivially bound the surprisals −log_2 p(a | x) corresponding to a certain subset X_r ⊆ X of all possible inputs. This is because for certain behaviors, some inputs x lead to less predictable outputs than those resulting from other inputs, and we would therefore prefer to focus on these inputs only. (As will be elaborated on later, the amount of certifiable randomness generally depends on the choice of X_r.) Formally, the function H, on which our results are based, is defined as follows.

Definition 1. A function H : R^t → R is a randomness-bounding (RB) function for the inputs X_r if it satisfies the following two conditions:

1. −log_2 p(a | x) ≥ H(f[p]) for all a ∈ A, all x ∈ X_r, and all p ∈ Q.

2. H(f[p]) is a convex function of its argument:

H(q f[p_1] + (1 − q) f[p_2]) ≤ q H(f[p_1]) + (1 − q) H(f[p_2])

for any 0 ≤ q ≤ 1 and any p_1, p_2 ∈ Q.
We will also need to compute a lower bound on H(f[p]) for all behaviors p ∈ Q such that f[p] ∈ V for some arbitrary region V ⊆ R^t. We thus extend our definition of H to sets:

H(V) = inf { H(f[p]) : p ∈ Q, f[p] ∈ V }.

With these definitions in place, we can bound the n-round surprisals.

Lemma 1. For any (a, x),

−log_2 P(a | x) ≥ n H( (1/n) Σ_{j=1}^n f[p_{a^{j−1}, x^{j−1}}] ) − ν(x) η,    (4)

where ν(x) is the number of x_j in x = (x_1, . . . , x_n) which do not belong to the set X_r, and η ≥ max_{p∈Q} H(f[p]) is a nonnegative constant compensating for those rounds.
Proof of Lemma 1. The proof follows essentially the same steps as the proof of Lemma 1 in [14]. The main differences are (a) that we express the bound eq. (4) as a function of t Bell expressions, instead of a single Bell expression, and (b) that the bound considers explicitly only the randomness from the inputs in X_r. From our assumptions regarding the Bell device, it follows that for any (a, x) we can write

−log_2 P(a | x) = −Σ_{j=1}^n log_2 p_{a^{j−1}, x^{j−1}}(a_j | x_j).

Each term in the sum such that x_j ∈ X_r can be bounded by

−log_2 p_{a^{j−1}, x^{j−1}}(a_j | x_j) ≥ H(f[p_{a^{j−1}, x^{j−1}}]),

while each of the ν(x) remaining terms is trivially bounded by −log_2 p_{a^{j−1}, x^{j−1}}(a_j | x_j) ≥ 0 ≥ H(f[p_{a^{j−1}, x^{j−1}}]) − η. Hence

−log_2 P(a | x) ≥ Σ_{j=1}^n H(f[p_{a^{j−1}, x^{j−1}}]) − ν(x) η ≥ n H( (1/n) Σ_{j=1}^n f[p_{a^{j−1}, x^{j−1}}] ) − ν(x) η,

where in the last line we have exploited the convexity of H.
Lemma 1 tells us how to bound the surprisals −log_2 P(a | x) as a function of (1/n) Σ_{j=1}^n f[p_{a^{j−1}, x^{j−1}}], which can be understood as an n-round average Bell expectation, where the average is taken conditioned on past inputs and outputs at each preceding round. This quantity, however, is not directly observable. This leads us to introduce the following definition of a confidence region, which is the second ingredient needed in our approach.

Definition 2. A (1 − ε) confidence region is a map assigning to every possible result (a, x) and every ε > 0 a region V(a, x, ε) ⊆ R^t such that the event

V_ε = { (a, x) : (1/n) Σ_{j=1}^n f[p_{a^{j−1}, x^{j−1}}] ∈ V(a, x, ε) }

has probability P(V_ε) ≥ 1 − ε.

(Note that in general V explicitly depends on a and x, although notation-wise this dependence is sometimes left implicit.) In other words, for small ε and large n, knowing the outcomes (a, x) of n rounds of measurement, one can determine V = V(a, x, ε) and assert with high confidence that (1/n) Σ_{j=1}^n f[p_{a^{j−1}, x^{j−1}}] is somewhere in V, even though its exact value cannot be deduced from (a, x) alone. The assertion is false if and only if (a, x) ∉ V_ε, which occurs with a probability smaller than ε by definition. Combining eq. (8) with this definition immediately implies the following:

Lemma 2. For all (a, x) ∈ V_ε,

−log_2 P(a | x) ≥ n H(V(a, x, ε)) − ν(x) η.

Lemma 2 tells us that the surprisal associated to the event a given x is lower-bounded by a function of (a, x), except for a subset of "bad" events {(a, x) ∉ V_ε}. One way to deal with these bad events is simply to pretend that the boxes are characterized by a slightly modified behavior P̃ that yields a new "abort" output a = ⊥ when one of the bad events is obtained (while according to P, the probability of a = ⊥ is zero). Effectively, P̃ can be thought of as a post-processed version of the physical behavior P. Though this post-processed version cannot be achieved in practice by the user of the devices (since he does not know the set of bad events), it is well-defined physically (it could for instance be implemented by an adversary having a perfect knowledge of P).
The relevant point is that since the probability of these bad events is extremely low for sufficiently small ε, the behaviors P and P̃ are, as shown below, close in variation distance, and analyzing the security using P̃ instead of P thus yields the same result up to vanishing error terms. (See [14] for a more detailed discussion.)

Lemma 3. There exists a behavior P̃_{A|X} such that P̃_AX = P̃_{A|X} × Π_X and P_AX = P_{A|X} × Π_X are ε-close in variation distance, i.e.,

d(P̃_AX, P_AX) = (1/2) Σ_{a,x} |P̃(a, x) − P(a, x)| ≤ ε,    (11)

and such that for all (a, x) with a ≠ ⊥,

−log_2 P̃(a | x) ≥ n H(V(a, x, ε)) − ν(x) η.    (12)

Proof of Lemma 3. The proof of this lemma is analogous to that of Lemma 3 in [14]. Define P̃_{A|X} as the behavior obtained from P_{A|X} by mapping every bad event (a, x) ∉ V_ε to the abort output a = ⊥, i.e., P̃(a | x) = P(a | x) if (a, x) ∈ V_ε, P̃(⊥ | x) = Σ_{a : (a,x) ∉ V_ε} P(a | x), and P̃(a | x) = 0 otherwise. Eq. (11) follows immediately, and Lemma 2 implies eq. (12).
We can now put a bound on the randomness of the Bell device as follows. Let λ denote the event that nH(V) − ν(x)η is greater than or equal to some a priori fixed threshold H_thr. Conditioned on λ occurring, we can bound the conditional min-entropy of the outputs given the inputs, H_min(A | X, λ), as follows (see [24] for a more detailed discussion of the concept of min-entropy and its relevance in our context):

2^{−H_min(A|X, λ)} = max_x max_{a ∈ Λ_x} P̃(a | x, λ) ≤ 2^{−H_thr} / P̃(λ),

where Λ_x is the set of a's such that the event λ occurs given x; the bound follows from eq. (12) and the fact that nH(V) − ν(x)η ≥ H_thr by the definition of λ. Comparing P̃(λ) to some positive ε′ directly implies the following result:

Theorem 1. Let ε and ε′ be two positive parameters, let H_thr be some threshold, and let λ be the event

nH(V(a, x, ε)) − ν(x)η ≥ H_thr.

Then the behavior P_AX is ε-close to a behavior P̃_AX such that, according to P̃_AX, either

P̃(λ) ≤ ε′, or H_min(A | X, λ) ≥ H_thr − log_2(1/ε′).

The meaning of this result is as follows. Suppose that we are able to compute an RB function according to Definition 1 and, from the results (a, x) of n rounds of measurements, a 1 − ε confidence region according to Definition 2. We may thus compute the value of nH(V) and check whether it is above the chosen threshold H_thr, i.e., whether the event λ occurred.
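Operationally, checking whether λ occurred is a simple computation on the observed inputs. The sketch below uses hypothetical placeholder names of our own: h_of_v for the computed bound H(V), eta for the constant of Lemma 1, and h_thr for the threshold:

```python
def nu(inputs, x_r):
    # nu(x): number of rounds whose input falls outside the set X_r
    return sum(1 for x in inputs if x not in x_r)

def event_lambda(inputs, h_of_v, eta, h_thr, x_r):
    # The event "lambda" of Theorem 1: n*H(V) - nu(x)*eta >= H_thr
    n = len(inputs)
    return n * h_of_v - nu(inputs, x_r) * eta >= h_thr
```

For instance, with n = 1000 rounds, 100 of which used inputs outside X_r, H(V) = 0.2 and η = 1, the test compares 1000 · 0.2 − 100 · 1 = 100 to H_thr.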
The given physical device that we used to generate the results (a, x) is characterized by an unknown behavior P. The theorem indirectly characterizes the behavior P by showing the existence of an ε-close behavior P̃, where ε can be chosen arbitrarily small. The probability difference between the two distributions is thus at most ε for any event, and P and P̃ are almost indistinguishable. The theorem states that, assuming that the event λ occurs, the behavior P̃ is one of two possible kinds.
The first possibility if the event λ is observed is that the conditional min-entropy of P̃ is higher than H_thr − log_2(1/ε′). This implies that P̃ contains extractable randomness: one can use a randomness extractor to process the raw outputs a and obtain a final string of bits, which is close to uniformly random according to P̃ and whose size is essentially H_thr − log_2(1/ε′) (the length and randomness of the output string will also depend on a security parameter ε_ext of the extractor itself) [25]. Since P is ε-close to P̃, it follows that the output string will also be essentially uniformly random according to the actual behavior P of the device (see Section III.D of [14] for details).
The second possibility is that the event λ occurred while being very unlikely: according to P̃, Pr(λ) ≤ ε′, and thus, according to P, Pr(λ) ≤ ε′ + ε, where ε′ + ε can be chosen arbitrarily small. In this case there is no guaranteed lower bound on the conditional min-entropy. We cannot, of course, avoid such a possibility. For instance, a Bell device that simply outputs predetermined bits, which have been chosen uniformly at random by an adversary, will have zero conditional min-entropy, but may still pass any statistical test we can devise with some positive probability. Nevertheless, in this case, since λ is unlikely, the impact on the security of the protocol of (mistakenly) assuming that the conditional min-entropy bound of the Theorem holds will be negligible. We refer to Section III.D of [14] for more details on how Theorem 1 translates to a secure randomness generation protocol.
Note that more generally, one can use a sequence of thresholds H_0 < H_1 < · · · rather than a single threshold. Theorem 1 then becomes a set of individual statements regarding the events λ_i that nH(V) − ν(x)η ≥ H_i. This means that the protocol admits intermediate thresholds of success leading to increasingly better min-entropy bounds, rather than being a single-threshold, all-or-nothing protocol.

Estimation
In this section we explicitly illustrate how to construct a confidence region, according to Definition 2, using a straightforward estimator for the Bell expectations f[p] and applying the Azuma-Hoeffding inequality, as proposed in [3]. Note that it is possible to use other (tighter) concentration inequalities than the Azuma-Hoeffding inequality. In particular, we do not claim our specific choice to be optimal for a finite number of rounds n.
Let (a, x) be the output-input sequence obtained in a certain realization of the n-round protocol. We define the observed frequencies, an estimation of the average behavior of the device based on the observed data, as

p̂(a | x) = #(a, x) / (n π(x)),

where #(a, x) is the number of occurrences of the output-input pair (a, x) in the n rounds. As with probabilities, we refer to the full set of observed frequencies (p̂(a | x)) as a vector p̂. We define resulting estimators for the Bell expressions by substituting p̂ for p in (1):

f̂_α = f_α[p̂] = Σ_{a,x} f_α(a, x) p̂(a | x).    (19)

To ease the notation, in the following we sometimes write f̂ instead of f[p̂]. It should be kept in mind that p̂ and f̂ are random variables, being functions of the observed event (a, x). As shown in [3], a simple application of the Azuma-Hoeffding inequality yields the following result:

Lemma 4. For any α = 1, . . . , t, let ε_α^± > 0 and let

μ_α^± = γ_α sqrt( (2/n) ln(1/ε_α^±) ),

where γ_α is specified below. Then

P[ (1/n) Σ_{j=1}^n f_α[p_{a^{j−1}, x^{j−1}}] ≥ f̂_α + μ_α^+ ] ≤ ε_α^+

and

P[ (1/n) Σ_{j=1}^n f_α[p_{a^{j−1}, x^{j−1}}] ≤ f̂_α − μ_α^− ] ≤ ε_α^−.

Lemma 4 simply states that with high probability the n-round average (1/n) Σ_{j=1}^n f_α[p_{a^{j−1}, x^{j−1}}], conditioned on the past, is no greater (no smaller) than the observed value f̂_α plus (minus) some deviation μ_α^+ (μ_α^−). This deviation tends to zero as 1/√n and directly depends on the quantity γ_α, which represents an upper bound on the maximum possible value of |f_α(a, x)/π(x) − f_α[p]|, that is to say, the maximal extent to which the random variable f_α(a, x)/π(x) can differ from its expectation f_α[p]. In other words, γ_α bounds the possible statistical fluctuations which our observations can be subject to. A specific value for γ_α is given by

γ̄_α = max( max_{a,x} f_α(a, x)/π(x), max_{p∈Q} f_α[p] ) − min( min_{a,x} f_α(a, x)/π(x), min_{p∈Q} f_α[p] ).    (21)

The terms max_{a,x} and min_{a,x} are easy to calculate, while the terms max_{p∈Q} f_α[p] and min_{p∈Q} f_α[p] can be computed through SDP using an NPA relaxation [23].
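The estimator and the deviation are simple to compute in practice. A minimal sketch (the function names are ours; azuma_deviation implements the standard one-sided Azuma-Hoeffding deviation μ = γ √(2 ln(1/ε)/n)):

```python
import math
from collections import Counter

def observed_frequencies(results, pi, n):
    # Estimator of the average behavior: \hat p(a|x) = #(a, x) / (n * pi(x)),
    # where results is the list of observed (a, x) pairs over the n rounds
    counts = Counter(results)
    return {(a, x): c / (n * pi[x]) for (a, x), c in counts.items()}

def azuma_deviation(gamma, eps, n):
    # One-sided Azuma-Hoeffding deviation for increments of range gamma:
    # mu = gamma * sqrt(2 * ln(1/eps) / n), vanishing as 1/sqrt(n)
    return gamma * math.sqrt(2 * math.log(1 / eps) / n)
```

For example, with γ = 2, ε = 10⁻² and n = 10⁴ rounds, the deviation is μ ≈ 0.061.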
We can combine the above upper and lower bounds for all α through a union bound to get the following confidence region, with f̂_α as defined in eq. (19) and γ_α as defined in eq. (21).

Let the confidence region be

V(a, x, ε) = [f̂^−, f̂^+], with f̂_α^± = f̂_α ± μ_α^±.    (26)

Then P(V_ε) ≥ 1 − ε, with ε = Σ_α (ε_α^+ + ε_α^−). In eq. (26) the inequalities f̂^− ≤ f ≤ f̂^+ (as all other vector inequalities in this paper) should be understood to hold component-wise, i.e., f̂_α^− ≤ f_α ≤ f̂_α^+ for all α. Note that when ε_α^+ = 0 (or ε_α^− = 0), we are simply not putting any bound on (1/n) Σ_{j=1}^n f_α[p_{a^{j−1}, x^{j−1}}] from above (or below). Indeed, it is not always useful to bound a Bell expression from both directions. Consider, for instance, the CHSH expression. It is well known that the amount of certifiable randomness increases with the absolute value of the CHSH violation, increasing from 2 (the maximal local value) to 2√2 (the maximal quantum value) and from −2 (the minimal local value) to −2√2 (the minimal quantum value). If we are estimating the randomness produced by our Bell device based only on the CHSH expression f_chsh, and strongly expect the CHSH expectation to be in the region [2, 2√2], then it is certainly desirable to lower-bound it as accurately as possible. However, we have no interest in knowing that it is smaller than some value (since the randomness which can be certified is only affected by the lower bound in this region). For a given ε = ε_chsh^+ + ε_chsh^−, we are therefore interested in setting ε_chsh^+ = 0, so that ε_chsh^− is as large as possible, and thus f̂_chsh^− is as close as possible to f̂_chsh. However, if we have no a priori reason to expect the CHSH expectation to lie in one region or the other, ε_chsh^± = ε/2 is the most natural choice.

Bounding single-round randomness
In Section 3 we showed how to put a bound on the randomness produced by a Bell device which is used n times in succession, given an RB function H. We now discuss how we can explicitly compute such a function.
The function H is defined through two properties, as specified in Definition 1. The first one is the condition −log_2 p(a | x) ≥ H(f[p]) for all a ∈ A, all x ∈ X_r, and all p ∈ Q. The optimal function satisfying this first condition is simply given by

H̃(f[p]) = min_{x∈X_r} min_{a∈A} min_{p′∈Q : f[p′]=f[p]} ( −log_2 p′(a | x) ).    (28)

Alternatively, we can pass the −log_2 to the left of the minimizations, which then become maximizations, and we can thus write H̃ = −log_2 G̃ with

G̃(f[p]) = max_{x∈X_r} max_{a∈A} max_{p′∈Q : f[p′]=f[p]} p′(a | x).    (29)

The functions H̃ and G̃ defined in this way have an intuitive interpretation. For a fixed behavior p and a fixed input x, H̃ = min_{a∈A} (−log_2 p(a | x)) is simply the min-entropy of the distribution {p(a | x)}_{a∈A}, while G̃ = 2^{−H̃} = max_{a∈A} p(a | x) is the associated guessing probability, i.e., the optimal probability to correctly guess the output a given that we know that it is drawn from the distribution {p(a | x)}_{a∈A}. Both these quantities represent measures of the output randomness. However, we are generally interested in bounding the output randomness not only for a single input x, but simultaneously for a subset X_r of all the inputs. In addition, we assume in the DI spirit that the full behavior p of our Bell device is generally not known, and that the device is characterized only by the Bell expectations f[p]. Taking the worst case of H̃ and G̃ over all inputs x ∈ X_r and all quantum behaviors p compatible with the Bell expectations f[p] leads to (28) and (29).
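For a fixed input x and a known output distribution {p(a|x)}_a, these single-round quantities are immediate to compute. The toy sketch below illustrates only this known-distribution case (the DI quantities additionally require the worst-case optimization over Q discussed in the text):

```python
import math

def guessing_probability(dist):
    # \tilde G for a known distribution: max_a p(a | x)
    return max(dist)

def min_entropy(dist):
    # \tilde H = -log2(max_a p(a | x)): min-entropy of the output distribution
    return -math.log2(guessing_probability(dist))

print(min_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits (uniform over 4 outputs)
print(min_entropy([0.85, 0.05, 0.05, 0.05]))  # ~0.234 bits (strongly biased)
```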
The second requirement in Definition 1 is that H should be a convex function. This property is used in Lemma 1 to bound the randomness produced from n successive measurement rounds. However, the function defined by (28) is not necessarily convex. For fixed values of a ∈ A and x ∈ X_r, let us denote by H̃_{a,x}(f[p]) the function defined by the interior minimization, i.e., the minimum over the behaviors p′ ∈ Q compatible with the expectations f[p]. This is a convex minimization program and thus the functions H̃_{a,x}(f[p]) are all convex. However, H̃ is obtained by taking the point-wise minimum H̃(f[p]) = min_{a,x} H̃_{a,x}(f[p]) of these functions, which will generally not be convex (see [20] for a specific example where this happens). Similarly, the individual functions G̃_{a,x} defined by the interior maximization in (29) are concave, but G̃ will generally not be.
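The loss of convexity under point-wise minima is easy to see on a toy numerical example: the minimum of two convex (here linear) functions can violate the convexity inequality:

```python
# Two convex (indeed linear) functions of a scalar argument v
h1 = lambda v: v
h2 = lambda v: 2 - v

# Their point-wise minimum, analogous to min over (a, x) of the H_tilde_{a,x}
h = lambda v: min(h1(v), h2(v))

# Convexity would require h(0.5*0 + 0.5*2) <= 0.5*h(0) + 0.5*h(2),
# i.e. h(1) <= 0; but h(1) = 1, so h is not convex.
print(h(0), h(1), h(2))  # 0 1 0
```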
In order to obtain a convex function, we could simply define a function H* as the minimum over arbitrary convex combinations of the functions H̃_{a,x}, i.e., as the convex hull of (28):

H*(f[p]) = min { Σ_{a,x} q_{a,x} H̃_{a,x}(f[p_{a,x}]) : q_{a,x} ≥ 0, Σ_{a,x} q_{a,x} = 1, p_{a,x} ∈ Q, Σ_{a,x} q_{a,x} f[p_{a,x}] = f[p] },    (30)

where the pairs (a, x) run over a ∈ A and x ∈ X_r. Similarly, the concave hull of (29) is

G(f[p]) = max { Σ_{a,x} q_{a,x} p_{a,x}(a | x) : q_{a,x} ≥ 0, Σ_{a,x} q_{a,x} = 1, p_{a,x} ∈ Q, Σ_{a,x} q_{a,x} f[p_{a,x}] = f[p] }.    (31)

Note that it is not true any more that H* = −log_2 G, but it is easy to see that H* ≥ −log_2 G. Though the function H* defined through (30) is the tightest function satisfying the constraint of Definition 1, it is not easy to deal with numerically because of the presence of the logarithms in the definitions of H̃_{a,x}. We will thus instead use the lower bound H = −log_2 G, which obviously satisfies the first condition of Definition 1 (since H* ≥ H) as well as the second one (since G is concave and nonnegative, H = −log_2 G is convex). The interest is that the optimization problem (31) is simpler to evaluate than (30). Note first that (31) can be re-expressed as follows by absorbing the weights q_{a,x} in the unnormalized quantum behaviors p̃_{a,x} = q_{a,x} p_{a,x}:

G(f[p]) = max { Σ_{a,x} p̃_{a,x}(a | x) : p̃_{a,x} ∈ Q̃, Σ_{a,x} f[p̃_{a,x}] = f[p], Σ_{a,x} Σ_{a′} p̃_{a,x}(a′ | x′) = 1 }.    (32)

In the above formulation, Q̃ denotes the set of unnormalized quantum behaviors, the conditions q_{a,x} ≥ 0 and p_{a,x} ∈ Q are equivalent to the single condition p̃_{a,x} ∈ Q̃, and the condition Σ_{a,x} q_{a,x} = 1 becomes the last constraint in (32) (the norm Σ_{a′} p̃_{a,x}(a′ | x′) of an unnormalized behavior is independent of the choice of x′, and it is equal to 1 for normalized behaviors). Problem (32) cannot be solved in general since the set Q̃ is hard to characterize, but it can be replaced with one of its NPA relaxations, in which case it becomes an SDP (since apart from the condition p̃_{a,x} ∈ Q̃ all constraints and the objective function are linear). This will in general only yield an upper bound on the optimal value G (and thus a lower bound on H*), but this is entirely sufficient for our purpose.
In the case where the set X r contains a single input x, the optimization problem (32) is essentially identical to the one introduced in [19,20] and corresponds to maximizing an adversary's average guessing probability over all possible quantum strategies (the difference with [19,20] is that we characterize the devices through an arbitrary number of Bell expectations f [p], rather than a single Bell expression or the full set of probabilities p(a | x)). The general form (32), however, also applies to the case where X r contains more than one input and represents one possible way to characterize the randomness of a subset of inputs (other suggestions have been made in [20]; the main reason for the present choice is that it satisfies the mathematical properties that are needed in our n-round analysis). In the following, we refer to the function G given by (32) as the guessing probability of the behavior characterized by f [p].
To apply our n-round analysis, we actually do not need to compute the value H(f[p]) = −log_2 G(f[p]) for a fixed value f[p], but instead its worst-case bound over all quantum behaviors p ∈ Q for which f[p] ∈ V. If the confidence region V is defined as an interval [f̂^−, f̂^+], as in the preceding section, this can simply be cast as an optimization problem of the same form: it suffices to replace the equality constraint Σ_{a,x} f[p̃_{a,x}] = f[p] in (32) by the pair of inequalities f̂^− ≤ Σ_{a,x} f[p̃_{a,x}] ≤ f̂^+, yielding G(V) and H(V) = −log_2 G(V). We conclude this discussion by noting that in specific cases such as that of [3], where f is a single CHSH expression, the symmetries under relabelings of inputs and outputs imply that the formulations (28), (29), (30), and (31) are equivalent, since (28) is already convex. In such cases, our RB function is the tightest function that satisfies Definition 1, by virtue of (28) being the tightest function that satisfies condition 1 of the Definition.
In the Appendix, we provide more intuition about the above problems by considering their dual formulations. We also discuss in more detail their link with [19,20].

Summary of the protocol
In the two preceding sections, we have specified a way of bounding the randomness within a (1 − ε) confidence region V = [f̂ − , f̂ + ] around the observed statistic f [p̂]. We can thus apply Theorem 1 to bound the min-entropy of the output string obtained after n uses of the device. Processing this raw string with a suitable extractor finally yields a uniformly random and private string. The resulting protocol is summarized in Figure 1.

Arguments:
• D(k, X , A): a Bell device as described in Section 2, consisting of k black boxes taking joint inputs in X and producing joint outputs in A.
• X r ⊆ X : a subset of all inputs (ideally this set should contain the inputs from which we expect to obtain the most randomness).
• π(x): the distribution of inputs at each round.
• n: the number of measurement rounds.
• l: the level of the NPA relaxation used to solve the randomness-bounding SDP.
• H thr : the threshold used to determine if the protocol succeeds or aborts.
• ε, ε′: the two security parameters involved in Theorem 1. The parameter ε should itself be decomposed into the 2t parameters ε ± α such that ε = Σ α (ε − α + ε + α ).

Protocol:
1. Operate the device n rounds in succession: at each round j, choose the k-tuple of inputs x j according to the distribution π(x j ) and obtain the k-tuple of outputs a j . After n rounds, obtain the input-output strings ( x, a).

2. From ( x, a), estimate the Bell expressions and the corresponding min-entropy bound, and determine whether the protocol succeeds, i.e., whether the bound reaches the threshold H thr : (a) if not, abort the protocol; (b) if yes, apply a (m, h, ε ext )-extractor, where m is the extractor output length, h = H thr − log 2 (1/ε′) is the min-entropy bound given by Theorem 1, and ε ext is the extractor's security parameter (see [14]).

As we noted in Section 3, one can define a similar protocol based on a sequence of thresholds H 0 < H 1 < · · · rather than a single one, introducing intermediate levels of success in the protocol. One advantage of this is that we do not need to determine in advance what threshold we expect the device to reach, at the risk of failing the protocol with high probability if we overestimated H thr . See Section III.D of [14] for details.
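The round loop and the final success check can be sketched as follows. This is a minimal Python sketch only: the `device`, the n-round min-entropy bound `H_bound`, and the extractor step are hypothetical placeholders standing in for the components specified above, not the actual protocol ingredients.

```python
import math
import random

def run_protocol(device, pi, n, H_bound, H_thr, eps_prime):
    """Sketch of the n-round protocol of Figure 1 (placeholders throughout).

    device(x) returns a k-tuple of outputs, pi maps each k-tuple of inputs
    to its probability, and H_bound(xs, outs) stands in for the n-round
    min-entropy bound estimated from the chosen Bell expressions.
    """
    inputs, weights = zip(*pi.items())
    xs, outs = [], []
    for _ in range(n):
        x = random.choices(inputs, weights=weights)[0]  # step 1: sample x ~ pi
        xs.append(x)
        outs.append(device(x))                          # query the black boxes
    if H_bound(xs, outs) < H_thr:                       # step 2(a): success test
        return None                                     # abort
    h = H_thr - math.log2(1 / eps_prime)                # extractable min-entropy
    return outs, h   # step 2(b): feed outs to an (m, h, eps_ext)-extractor
```

The abort condition and the value of h mirror the threshold comparison and the H thr − log 2 (1/ε′) bound of Theorem 1; everything device-specific is left abstract.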

Discussion
We have introduced a family of protocols, each characterized by a choice of t Bell expressions f α , a randomness-generating input set X r , and an input distribution π(x). This family contains as a special case the protocols introduced in [3,14,15], which correspond to the case where a single Bell expression f is used (t = 1) and where the randomness-bounding function covers all inputs (X r = X ). The main novelty introduced in the present work is that we can take into account information from several Bell expressions (t > 1) and can tailor the randomness analysis to a subset of all possible inputs (X r ⊂ X ).
In order to discuss these new aspects, in the following sections we illustrate our protocol on a concrete example. The scenario in this example has two parties (k = 2), two measurement settings per party (X = {0, 1} 2 ), and two possible outcomes per measurement (A = {0, 1} 2 ). We consider the device behavior (34) with visibility v = 0.99, arising from a mixture of white noise u and the extremal quantum behavior p ext that achieves maximal violation of the tilted-CHSH inequality I β 1 introduced in [21], with β = 2 cos(2θ)/√(1 + sin 2 (2θ)) for θ = π/8. The tilted-CHSH expression is defined in eq. (35). The extremal behavior can be achieved by a pair of partially entangled qubits |φ⟩ = cos θ |00⟩ + sin θ |11⟩ measured with suitable observables, with tan µ = sin 2θ. (Note the difference in notation from [21]: we relabelled the inputs 1 and 2 to 0 and 1, respectively.) The resulting correlations have the property of giving more predictable outcomes for a subset of measurement inputs. For θ = π/8, the two measurement settings that give more predictable outcomes, x = (0, 0) and x = (0, 1), have a guessing probability of about 0.775 in the ideal (v = 1) case where I β 1 is maximally violated. On the other hand, the two measurement settings with less predictable outcomes, x = (1, 0) and x = (1, 1), have guessing probabilities of about 0.496.
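For concreteness, the numerical values of the tilt parameter β and of the measurement angle µ in tan µ = sin 2θ can be checked directly; this is plain arithmetic on the formulas quoted above, with no further assumptions:

```python
import math

theta = math.pi / 8                     # partial-entanglement angle
beta = 2 * math.cos(2 * theta) / math.sqrt(1 + math.sin(2 * theta) ** 2)
mu = math.atan(math.sin(2 * theta))     # measurement angle: tan(mu) = sin(2*theta)

print(beta)  # 2/sqrt(3) ≈ 1.1547
print(mu)    # ≈ 0.6155 rad
```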
In the analysis of randomness, we will consider two choices for the randomness generating subset X r : the full input set X r = X and the more restricted choice X r = {(1, 0)}, which is one of the two settings that give less predictable measurements in p ext .
Furthermore, we will estimate three different Bell expressions, all defined in terms of the correlators (37). The weights π 1 (x 1 | x 2 ) and π 2 (x 2 | x 1 ) appearing there represent the two conditional local input distributions defined with respect to the joint input distribution π(x 1 x 2 ). 1 The expressions we will evaluate are the CHSH expression (38), the tilted-CHSH expression I β 1 (35), and the "optimal" expressions I p and I all p for the chosen device behavior (34). These last two Bell expressions are "optimal" in the following sense. As already observed in [19,20], the dual of problem (32) (see eq. (51) in the Appendix), when applied to a device characterized by its full behavior (i.e., f a,x [p] = p(a | x), so that f [p] = p), yields a Bell expression I p such that the amount of randomness certified from I p [p] with respect to the measurement setting x = (1, 0) is equal to the amount of randomness that can be certified from the entire table of probabilities p(a | x) (again, with respect to the measurement x = (1, 0)). Thus, to each device behavior p is associated a single Bell expression I p that is optimal for p from the point of view of randomness. 2 Likewise, I all p is defined with respect to all inputs x ∈ X rather than the subset {(1, 0)}.

Bounding randomness for all inputs with one Bell expression (X r = X , t = 1)
Before discussing the novelties introduced in this work, let us start by briefly reviewing the case t = 1 and X r = X , which corresponds to the protocols introduced in [3,14,15]. In this case, ν( x), the number of inputs not in X r , is always equal to zero, and according to Theorem 1, the min-entropy of the output string is roughly equal to nH(V). Furthermore, the confidence region V reduces to a confidence interval [f̂ − , f̂ + ] around the estimated Bell violation f̂. Usually, the values of f̂ that we expect to obtain in the protocol will fall in a region where H(f̂ ) is either monotonically increasing or decreasing with f̂, i.e., the interval lies within either the upward- or downward-sloped region of the convex function H(f ). For instance, if f is the CHSH expression, we may assume that the devices have been designed so that with very high probability f̂ ≥ 2. In that region, H(f̂ ) is indeed increasing with f̂ (i.e., the randomness increases for increasing values of the CHSH expression). Let us assume for definiteness that H is increasing (the same kind of reasoning can be made if H is decreasing). Since we are looking for the minimal value of H in the region V (see Lemma 2), it is then sufficient, as done in [3,14,15], to take a one-sided interval [f̂ − , ∞[, and the minimal value of H in the interval is then H(f̂ − ). Considering again our CHSH example, we are interested in a guarantee that the CHSH value is above some threshold, which determines the randomness we can certify in the worst case, but it is useless to know that it is bounded from above (see also the discussion at the end of Section 4). Taking the definition eq. (25) for f̂ − , we thus get that the min-entropy of the output string is bounded (roughly speaking 3 , and up to the − log 2 (1/ε) correction) as (41).

1 These weights play a role in the marginal correlators Â x1 and B̂ x2 when applied to the observed frequencies p̂(a 1 a 2 | x 1 x 2 ). Indeed, it can be seen from the definition of a Bell estimator f [p̂] in eq. (19) that the marginal correlators reduce to a natural definition based on locally available data resulting only from the respective party's interaction with their part of the device; this holds for Â x1 and similarly for B̂ x2 .

2 More accurately, there exist infinitely many Bell expressions that are equivalent to I p up to terms that vanish for no-signaling behaviors. In order to pick one that tolerates the small signaling fluctuations present in our behavior estimator p̂, we run the computation of I p in the 8-dimensional space of correlators, rather than in the overspecified 16-dimensional parametrization of quantum behaviors in terms of the probabilities p(a 1 a 2 | x 1 x 2 ). We then translate this expression back to a unique standard form (1) using definition (37) for the correlators. This ensures that the solution to the dual program (51) picked by our solver among many equivalent expressions does not contain terms that blow up under small signaling fluctuations. See also [26] for a finer analysis of noise tolerance in equivalent Bell expressions.

3 Equation (41) and the similar approximate bounds that follow should be understood as informal statements giving an order of magnitude for the min-entropy lower bound. Contrary to the statement of Theorem 1, this informal bound directly involves the estimator f̂, which is a random variable. As such, it might be subject to improbable but extreme fluctuations, in which case the bound does not correctly characterize the device. In comparison, the min-entropy bound of Theorem 1 is expressed in terms of a fixed threshold. Furthermore, the Theorem also accounts for the unlikely event that a device reaches this threshold only by chance.
This is precisely the result of [3,14,15], whose interpretation is quite intuitive: the min-entropy after n runs is equal to n times the min-entropy of a single run, evaluated at the observed Bell violation f̂ offset by a statistical parameter µ = γ √((2/n) ln(1/ε)). This correction accounts for the fact that even if a device has been built to produce a target Bell violation, statistical fluctuations may push the observed violation above what is expected. This statistical correction depends on the security parameter ε and decreases with the number of runs n. It also depends on the prefactor γ defined in eq. (21). This prefactor depends on the choice of Bell expression f and, importantly, on the input distribution π(x).
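To get a feel for the size of this correction, the following evaluates µ = γ √((2/n) ln(1/ε)) for a few values of n, with ε = 10 −6 and a hypothetical prefactor γ = 4 (γ is fixed by eq. (21) and the chosen Bell expression, and is not computed here):

```python
import math

def correction(n, gamma=4.0, eps=1e-6):
    """Statistical offset mu = gamma * sqrt((2/n) * ln(1/eps))."""
    return gamma * math.sqrt((2 / n) * math.log(1 / eps))

for n in (10**4, 10**6, 10**8):
    print(n, round(correction(n), 5))
# the offset shrinks as 1/sqrt(n): ~0.21 at n = 1e4, ~0.021 at n = 1e6
```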
As discussed in [3,14], the input distribution can be suitably chosen to optimize the ratio R out /R in of the randomness that is produced to the randomness that is consumed when choosing the inputs. The idea is that if at each run one selects a given input x = x * with very high probability, then the resulting distribution π(x) can be sampled from a small number of initial uniform bits R in , which should improve the ratio R out /R in . However, this will also lower R out , because observations involving the other inputs x ≠ x * will be less frequent, which reduces the statistical accuracy. Consider for instance, as in [3,14], the case where the input x * is chosen with probability π(x * ) = 1 − κn −δ for some constants κ and δ, and the other inputs are chosen with probability π(x) = κ′n −δ , where κ′ = κ/(|X | − 1) for normalization. Then the initial randomness R in required to choose the inputs according to this distribution will be of size O(n 1−δ ln n δ ) (i.e., roughly n times the Shannon entropy of the input distribution; see Theorem 2 in [14]). On the other hand, according to eq. (41), the output randomness will be of size Ω(n) as long as the statistical correction, of order γ/√n, remains bounded by a constant. Since, according to eq. (21), γ ∼ 1/(min x π(x)), we get that the statistical correction is of order γ/√n = O(n δ−1/2 ) and thus that we should take δ ≤ 1/2. We can thus hope at best for a quadratic expansion, wherein O(n 1/2 ln n 1/2 ) initial bits are consumed and Ω(n) bits are produced.
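The expansion trade-off above can be checked numerically. A sketch with hypothetical constants κ = 1, δ = 1/2 and four inputs, approximating R in as n times the Shannon entropy of the input distribution (as in Theorem 2 of [14]):

```python
import math

def input_entropy(n, kappa=1.0, delta=0.5, num_inputs=4):
    """Shannon entropy (bits per round) of the biased input distribution
    pi(x*) = 1 - kappa*n**-delta, the other inputs sharing kappa*n**-delta."""
    p_star = 1 - kappa * n ** (-delta)
    p_rest = kappa * n ** (-delta) / (num_inputs - 1)
    return (-p_star * math.log2(p_star)
            - (num_inputs - 1) * p_rest * math.log2(p_rest))

for n in (10**4, 10**6, 10**8):
    # input bits consumed for n rounds, versus Omega(n) output bits
    print(n, round(n * input_entropy(n)))
```

The consumed randomness grows roughly like n^{1/2} ln n while the output grows like n, illustrating the quadratic expansion.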
Note that the initial randomness for choosing the inputs only needs to be random with respect to the devices, but can be publicly announced to the adversary without compromising the privacy of the output string [14,27]. One can thus view the above protocols as producing private randomness from public randomness. From this perspective, the "expansion" efficiency of the protocol is less relevant since the final and initial randomness correspond to different resources that do not necessarily have to be compared on the same footing.
We generated random samples of n input-output pairs ( x, a) from the behavior p corresponding to equation (34), with the input distribution given in (42). Note that as n grows, this input distribution becomes strongly biased towards selecting x = (1, 0) most of the time.
We performed this sampling independently for different values of n between 100 and 3 × 10 18 . For each value of n, we repeated this sampling 300 times in order to show the variation of our result over several simulations.
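The sampling step can be sketched as follows. This is a toy stand-in: the `behavior` below is a uniform dummy rather than the actual behavior (34), and `pi` only mimics the bias of (42).

```python
import random
from collections import Counter

def sample_rounds(n, pi, behavior, rng=random):
    """Draw n i.i.d. rounds: an input pair x ~ pi, then outputs a ~ p(.|x)."""
    counts = Counter()
    inputs, in_w = zip(*pi.items())
    for _ in range(n):
        x = rng.choices(inputs, weights=in_w)[0]
        outcomes, out_w = zip(*behavior[x].items())
        a = rng.choices(outcomes, weights=out_w)[0]
        counts[a, x] += 1
    return counts  # empirical counts; p_hat(a|x) = counts[a, x] / n(x)

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
behavior = {x: {a: 0.25 for a in pairs} for x in pairs}   # dummy p(a|x)
pi = {(1, 0): 0.97, (0, 0): 0.01, (0, 1): 0.01, (1, 1): 0.01}
counts = sample_rounds(10_000, pi, behavior)
```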
The corresponding min-entropy rate bound (that is, (41) divided by n) for ε = 10 −6 is represented in Figure 2 as a function of the number of runs n for different Bell expressions. The curves in this plot and the ones that follow (Figures 2-5) show the values for the first simulation out of the 300, and the range of values taken over all 300 simulations is drawn as a shaded area behind each curve. In some instances, usually for high values of n, the area is invisible, which indicates negligible variation across simulation runs. All curves are obtained by solving the program (33) in its dual form (56) (see Appendix) at level 2 of the NPA hierarchy. All optimizations were performed using the Matlab modelling toolbox YALMIP [28] and the SDP solver SeDuMi [29].
As we can see, the expression I β 1 gives the worst results. The reason for this is that the inequality is suited to the extremal behavior p ext rather than the imperfect behavior we simulated (i.e., in equation (34), the case of perfect visibility v = 1 rather than v = 0.99). On the other hand, the expression I all p is tailored to our illustrative behavior (34); it thus gives asymptotically optimal results for X r = X according to [19,20]. There is, however, no reason for it to be optimal for finite, low values of n. Indeed, we observe that the CHSH expression, while not specially suited to the behavior of our device, yields a better performance for values of n lower than 10 10 . The CHSH expression appears actually to be a good randomness certificate for all values of n, as it only performs slightly worse than I all p asymptotically.

Bounding randomness for a subset of all inputs (X r ⊆ X )
Having reviewed the case t = 1 and X r = X , we proceed to consider the modifications introduced in this work. We consider first the possibility X r ⊂ X . This means that the RB function H is only required to non-trivially bound the output probability for inputs that are in the set X r . This is an important feature, because for many Bell expressions the randomness that can be certified depends on the input used. For instance, maximal violation of the tilted-CHSH inequalities may imply that the randomness is maximal for one input pair but near zero for another input pair [21]. Using a function which is simultaneously randomness-bounding for all inputs x ∈ X would then be highly sub-optimal. This aspect is particularly important for photonic implementations of DI protocols: recent photonic Bell tests rely on partially entangled states [30][31][32][33], for which the optimal extraction of randomness requires the use of a specific input. According to our analysis, in the case X r ⊂ X , the bound (41) becomes (43), where H is now a RB function for X r , which will generally yield an improvement over a RB function that is required to be valid for all of X . Our analysis, however, introduces a penalty term of the form ν( x)η, where η ≤ log 2 |A| is bounded by a constant and ν( x) is the number of inputs not in X r that have been observed. 4 To keep this penalty term as low as possible, we should choose inputs in X̄ r = X \ X r with a low probability. One possibility, compatible with our previous discussion about the introduction of a bias in the input distribution, is to take π(x) = κ′n −δ for x ∈ X̄ r , in which case the expected value of ν( x) is |X̄ r |κ′n 1−δ . This is negligible asymptotically with respect to the main term of (43), which is Ω(n), provided that δ > 0. The input distribution (42) chosen for our numerical example satisfies this requirement.
The corresponding min-entropy rate bound (that is, (43) divided by n) for ε = 10 −6 and η = log 2 |A| = 2 is represented in Figure 3 as a function of the number of runs n, for different Bell expressions and for two choices of X r . Figure 3 shows that in spite of the penalty term, which quickly vanishes as n grows, the bound on the entropy rate for X r = {(1, 0)} for the expressions I β 1 and I p (the analogue of I all p for this restricted X r ) is significantly better than the values obtained with X r = X . This clearly shows the value of using the restricted randomness-generating input set X r = {(1, 0)}. In particular, by combining this with the use of the optimal expression I p corresponding to behavior (34), one can asymptotically reach the theoretical value H(p) of the min-entropy (represented by the dashed line in Figure 3). Furthermore, whereas using the I β 1 expression with X r = X yields worse values than using the CHSH expression, taking X r = {(1, 0)} yields an asymptotic entropy rate for the I β 1 expression which is higher than using the CHSH expression. As mentioned at the beginning of this subsection, this is because I β 1 is not adapted to bound randomness independently of the input (i.e., X r = X ) [21].
Similarly to the difference between the curves corresponding to I all p and I β 1 for X r = X , the difference between the asymptotic entropy rates reached by using the expressions I p and I β 1 for X r = {(1, 0)} is caused by the imperfect visibility parameter v = 0.99 in the simulated behavior (34).

4 For certain Bell expressions, the RB function may provide a non-trivial bound on the randomness independently of the choice of X r , and in particular when X r = X . Thus the choice of X r will not impact the main term of (43) but only the penalty term, which will vanish if |X̄ r | = 0. In such situations, it is therefore preferable to take X r = X , as in [3,14].
Making the right choice of Bell expression and input subset X r depends not only on the device, but also on the value of n. Indeed, while I p is an optimal expression for certifying randomness in this specific device with respect to the input subset X r = {(1, 0)}, this is only the case asymptotically. For small n, Figure 3 suggests that the CHSH expression has a better resistance to statistical fluctuations than the other expressions we considered, regardless of X r .
Note that we did not attempt to optimize the choice of input distribution and it is possible that a different choice of π(x) would lead to better bounds in Figure 3 for the two curves with X r = {(1, 0)}.

Bounding randomness from several Bell expressions (t ≥ 1)
As we have seen, the right choice of a single Bell expression in the analysis of randomness is not straightforward, except for large values of n where I p becomes optimal. In this regime, it would seem perfectly admissible to perform tests on the device before running the actual randomness generation protocol, in order to estimate p and use this information to find an "optimal" Bell expression I p as described above, which can afterwards be used in the randomness generation protocol proper. However, there are disadvantages to this method.
Firstly, to find an expression I p̂ that performs comparably to the optimal I p for the device behavior p, we must know p to sufficiently high accuracy. In a black-box scenario where imperfections cannot be ruled out, this means that a significant number of measurements must be performed in order to evaluate the behavior to great precision. Since the Bell expression needs to be fixed in advance of the protocol, those evaluation rounds cannot be taken from the measurement rounds of the protocol and must instead be thrown away. In addition, the behavior of the devices may vary in time, unlike our i.i.d. choice (34), due for example to drifts in the experimental set-up. In that case, one would need to periodically estimate p and rederive the corresponding optimal Bell expression on some subset of the measurement data, which then needs to be thrown away. Finding an expression I p̂ also requires methods for inferring the behavior of the device from a finite sample: indeed, the estimated behavior (18) cannot be used directly to find a candidate I p̂ , as p̂ almost always violates the no-signaling conditions. There exist different approaches to this inference (see for instance [20,34-37]), so a nontrivial choice must be made.
Finally, even ignoring the problem of estimating the unknown behavior p, the associated data loss, or the drift of p over time, we saw in the previous section and in Figure 3 that the choice of a Bell expression is not straightforward when considering different values of n. For example, the asymptotically optimal expression I p as formulated in [19,20] is generally not the best for low values of n. There is thus no general method to guide the choice of a Bell expression for a given n.
In order to avoid the above problems associated with the use of a single Bell expression, 5 we now turn to the second element introduced in this work: the possibility of estimating the randomness from t > 1 Bell expressions, and in particular from the full set of observed frequencies of occurrence p̂ = {p̂(a | x)} as defined in eq. (18).
When we have more than one Bell expression, the bound (43) generalizes to (44), where the one-dimensional interval [f̂ − , ∞[ has simply been replaced with the multidimensional region [f̂ − , f̂ + ]. As before, the limits of the region depend on the security parameters ε ± α and the constants γ α , and they become tighter with the number of runs n (see equations (25) and (26)). Increasing the number of Bell expressions can have both beneficial and detrimental consequences. We can reach an understanding of this by considering the optimization problem (33) that defines H(V). This problem essentially evaluates the randomness of a certain quantum behavior p such that f̂ − ≤ f [p] ≤ f̂ + . Each vector component of this constraint defines two affine constraints, f̂ − α ≤ f α [p] ≤ f̂ + α , restricting the set of values p can take in the optimization. From a geometrical point of view, for each α = 1, . . . , t, this defines two parallel hyperplanes in the space of behaviors, delimiting a region between them which we call a slab. The full constraint f̂ − ≤ f [p] ≤ f̂ + defines a polytope in the space of behaviors which is the intersection of the t slabs.
The optimization (33) identifies the worst-case bound on randomness for quantum behaviors inside this constraint polytope. We would therefore like to restrict this region as much as possible for a given value of the confidence parameter ε. We thus see that adding Bell expressions is generally beneficial, as it cuts the constraint polytope into a smaller volume. However, as stated in Lemma 5, the confidence parameter ε in the protocol is shared between all 2t parameters ε ± α , as Σ α (ε + α + ε − α ) = ε. A consequence of this is that the more Bell expressions we have, the smaller the ε ± α are on average. Since smaller values of ε ± α give thicker slabs (see equation (25)), distributing ε evenly across all ε ± α , for instance, amounts to a dilation of the constraint polytope in the optimization (33). Nevertheless, since the width of a slab depends on ε ± α only through a factor ln(1/ε ± α ), we will typically find that this negative effect is outweighed by the benefits of adding more Bell expressions, and the randomness bound is globally improved.
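The mildness of this dilution is easy to check numerically. Taking the √ln(1/ε ± α ) dependence of the slab half-width from the statistical correction µ above (a sketch; the even split of ε is a hypothetical choice):

```python
import math

def width_factor(eps_alpha):
    """Relative slab half-width: proportional to sqrt(ln(1/eps_alpha))."""
    return math.sqrt(math.log(1 / eps_alpha))

eps, t = 1e-6, 8
single = width_factor(eps)            # one expression gets all of eps
split = width_factor(eps / (2 * t))   # eps shared evenly over 2t = 16 slabs
print(split / single)                 # ≈ 1.10: each slab only ~10% wider
```

So splitting ε = 10 −6 over 16 constraints widens each slab by only about 10%, while each added expression cuts a new slice out of the constraint polytope.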
This improvement is illustrated in Figure 4, where we reconsider the numerical example presented in Section 7.1. We now use 8 Bell expressions that are equivalent (for quantum behaviors) to the specification of the 16 probabilities p(a 1 a 2 | x 1 x 2 ) (we will explain why we use this choice of 8 Bell expressions later).
In the case X r = {(1, 0)}, we see that the randomness that we can extract with multiple Bell expressions is similar to the use of the optimal expression I p alone for about n ≥ 10 10 runs, but much better for smaller numbers of runs. In fact, the improvement is even better in practice, even in regions where the use of 8 Bell expressions gives the same rate as I p , because in plotting the rate for I p we knew the exact behavior p from which the measurement runs are sampled. In a real experiment (in particular with drifts over time), we would instead need to infer the behavior p at regular intervals from auxiliary measurements and throw away the corresponding data. In contrast, the method based on the full set of observed frequencies achieves the same or better randomness extraction without throwing away any data.
In the case X r = X , the use of 8 expressions is comparable to that of CHSH and, as with I p above, it outperforms the I all p expression (not shown in Figure 4; see Figure 2 or 3). For small values of n, the CHSH expression keeps a small advantage. This can be understood from the fact that the CHSH expression itself is part of the set of 8 expressions that we used in Figure 4. The difference between the two curves therefore results from a trade-off between a better estimation of randomness from more expressions and the negative effect of wider margins ε ± α in the confidence region.

In the remainder of this section, we discuss in more detail how to choose a good set of Bell expressions for the protocol. For this, let us start by considering our protocol when n → ∞. In this asymptotic limit, the interval [f̂ − , f̂ + ] narrows down towards the point f̂ = f [p̂(a | x)], which is just the value of the t Bell expressions f computed on the experimentally observed frequencies p̂(a | x). If the bias towards inputs in X r is appropriately chosen (as discussed previously), then the relative contribution of the penalty term vanishes as n → ∞ and the bound (44) becomes, in the asymptotic limit and up to sublinear terms, roughly nH(f̂ ). Furthermore, in the case where the device behaves in an i.i.d. way according to a behavior p, then, asymptotically, f̂ → f [p]. If one chooses enough Bell expressions to fully characterize the behavior of the devices (for instance, by using an estimator for each probability p(a | x)), f̂ thus becomes equivalent to the knowledge of p, and the above bound converges to the maximal min-entropy bound one can obtain from p given X r , as characterized in [19,20]. In this sense, and as seen in Figure 4, our protocol is asymptotically optimal. Note that there are different sets of Bell estimators that are asymptotically equivalent to the knowledge of the full set of probabilities p(a | x).
For instance, in a bipartite Bell experiment with two inputs and two outputs there are 16 probabilities p(a | x) = p(a 1 a 2 | x 1 x 2 ) with a 1 , a 2 , x 1 , x 2 ∈ {0, 1}, and thus 16 associated Bell expressions e 1 , . . . , e 16 defined by e α [p] = p(a 1 a 2 | x 1 x 2 ), with one value of α for each of the possible values of (a 1 , a 2 , x 1 , x 2 ). But since the probabilities p(a | x) satisfy normalization and no-signaling, they are uniquely specified by the 8 correlators of eq. (37), which constitute 8 Bell expressions g 1 , . . . , g 8 , where g 1 and g 2 are the first party's two marginal correlators A x1 , g 3 and g 4 are the second party's B x2 , and g 5 , . . . , g 8 are the four bipartite correlators A x1 B x2 .
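For a no-signaling behavior, the 8 correlators can be computed from the 16 probabilities as follows. This is a sketch using the standard correlator definitions; the estimators of eq. (37) additionally weight the marginals by the conditional input distributions, which is not reproduced here.

```python
def correlators(p):
    """Map p[(a1, a2, x1, x2)] to the 8 correlators g1..g8:
    <A_x1>, <B_x2> and <A_x1 B_x2>. Assumes p is no-signaling, so one
    party's marginal does not depend on the other party's input."""
    def term(sign, x1, x2):
        return sum(sign(a1, a2) * p[a1, a2, x1, x2]
                   for a1 in (0, 1) for a2 in (0, 1))
    A = lambda x1: term(lambda a1, a2: (-1) ** a1, x1, 0)
    B = lambda x2: term(lambda a1, a2: (-1) ** a2, 0, x2)
    AB = lambda x1, x2: term(lambda a1, a2: (-1) ** (a1 + a2), x1, x2)
    return ([A(0), A(1), B(0), B(1)]
            + [AB(x1, x2) for x1 in (0, 1) for x2 in (0, 1)])
```

For example, the behavior where both parties deterministically output 0 has all 8 correlators equal to 1, while the uniformly random behavior has all correlators equal to 0.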
Alternatively, the probabilities are also equivalent to the 8 expressions h 1 , . . . , h 8 , with h α = g α for α = 1, . . . , 4, while h α for α = 5, . . . , 8 are four linearly independent permutations of the CHSH expression, generalizing (38); these are given in eq. (46). As we increase the number of rounds, all these possible choices become equivalent, since the intervals [ê − , ê + ], [ĝ − , ĝ + ], [ĥ − , ĥ + ] define constraint polytopes in the space of behaviors p that asymptotically intersect the quantum set at the same unique point. However, the choice of one set of estimators over another can make a difference for finite n.
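These permutations can be written down explicitly. In the sketch below, the sign convention (−1)^(r x1 + s x2 + x1 x2) is an assumed parametrization, one standard way of labelling the four CHSH variants; evaluated on the Tsirelson-point correlators, I 0,0 chsh reaches 2√2 while I 0,1 chsh vanishes, consistent with the quantum set projecting onto a disk of radius √8 in this plane [38].

```python
import math

def chsh(ab, r=0, s=0):
    """I^{r,s}_chsh = sum over x1, x2 of (-1)**(r*x1 + s*x2 + x1*x2) <A_x1 B_x2>,
    where ab[(x1, x2)] are the four bipartite correlators.
    (Sign convention assumed for illustration.)"""
    return sum((-1) ** (r * x1 + s * x2 + x1 * x2) * ab[x1, x2]
               for x1 in (0, 1) for x2 in (0, 1))

c = 1 / math.sqrt(2)                                # Tsirelson-point correlators
ab = {(0, 0): c, (0, 1): c, (1, 0): c, (1, 1): -c}
i00, i01 = chsh(ab, 0, 0), chsh(ab, 0, 1)
print(i00, i01)            # 2*sqrt(2) and 0
print(i00**2 + i01**2)     # ≈ 8: on the boundary circle
```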
Generally speaking, when choosing which Bell expressions to use for a fixed number t, we may prefer that as many of them as possible be linearly independent. Consider t − 1 Bell expressions and their associated slabs, which define a constraint polytope. In the absence of any meaningful information concerning the behavior of the objective function of (33) within its feasible set, the choice of a t-th Bell expression should be dictated by the resulting reduction of the constraint polytope: cutting out a large volume is more likely to reduce the maximum of (33). As n grows large and the slabs grow thinner, the best way to reduce this volume is to choose a Bell expression that is linearly independent from the t − 1 previous ones, if possible. We can easily understand this in the asymptotic limit: as mentioned above, the optimization converges to H(f̂ ), and with enough linearly independent Bell expressions the constraint region shrinks to a single behavior.

In addition, we see that there is no need for Bell expressions that are purely signaling, i.e., that have a constant value for all no-signaling behaviors p. Indeed, since the feasible region of (33) is defined by the intersection of the slabs and the quantum set, constraints deriving from purely signaling expressions are trivial in this region, and therefore do not contribute to improving the randomness bound.
Combining these two conclusions also indicates that we should avoid Bell expressions that are only linearly dependent up to purely signaling terms. This implies for instance that the sets g 1 , . . . , g 8 or h 1 , . . . , h 8 should be preferred over e 1 , . . . , e 16 . This is indeed what we find, as illustrated in Figure 5. Note that the sets g 1 , . . . , g 8 and h 1 , . . . , h 8 only differ by a linear transformation, but the second set yields better results for the same (finite) number of rounds n. With respect to optimization (33), this means that the feasible set for h 1 , . . . , h 8 excluded the optimum obtained for g 1 , . . . , g 8 . This might be related to the fact that in this scenario of two parties with two inputs and two outputs, the four versions of the CHSH inequalities constitute the facets that separate local from nonlocal behaviors, and they might therefore serve as better measures of nonlocality and randomness than the correlators A x1 B x2 .
This phenomenon can be visualized in a simpler instance where two Bell expressions are used, namely I 0,0 chsh and I 0,1 chsh . In Figure 6, we represent the guessing probability G(I 0,0 chsh [p], I 0,1 chsh [p]) with respect to the single input X r = {(0, 0)}, as defined in eq. (31). The figure shows the evolution of the guessing probability, from the trivial value of 1 for values of I 0,0 chsh [p] and I 0,1 chsh [p] compatible with a local hidden variable model, down to nontrivial values of approximately 0.32 reached at extremal points. The variation of this function along the two axes is not trivial, but as we can see, the gradient mostly points along the directions of the CHSH axes, with the exception of the regions where the local and quantum boundaries meet. To minimize this variation within a confidence region of fixed square shape in this plane, it is best to rotate the region so that its sides are aligned with the gradient. Aligning the confidence region with the two CHSH axes is therefore a sensible choice in this case.
As was shown in [19,20], in an idealized setting in which the behavior of the devices is known and fixed, more randomness can in principle be certified if one takes into account the violation of several Bell inequalities, or, even better, the full set of probabilities characterizing the devices' behavior.
We have shown here that a similar reasoning applies in the context of an actual DIRNG protocol, where randomness is directly certified from experimental data. Specifically, we have combined the analysis of [19,20] with the protocol introduced in [3,14,15], which generates certified randomness against an adversary with classical side information. We have in this way obtained a family of DIRNG protocols which rely on the estimation of a choice of t ≥ 1 Bell expressions. This includes the special case where the randomness is directly certified from the knowledge of the relative frequencies of occurrence of the outputs given the inputs. Asymptotically, for a given X r , this results in an optimal generation of randomness from experimental data (as measured by the min-entropy), without having to assume beforehand that the devices violate a specific Bell inequality and without the need to infer the device behavior from preliminary measurements. Furthermore, in the non-asymptotic case, the choice of an optimal Bell expression is ambiguous even if the device behavior is perfectly characterized. Our method bypasses this problem by directly evaluating the randomness from the observed output frequencies.

Figure 6: The guessing probability G in the plane of the two CHSH expressions (46). The set of local behaviors projected onto this plane defines a square region, represented with a dotted line. The quantum set Q projects onto the circular region (I 0,0 chsh [p]) 2 + (I 0,1 chsh [p]) 2 ≤ 8 [38]. Note that while this function is centrally symmetric, it is not symmetric under reflection through either CHSH axis. The function was computed by solving the optimization program (31) at level 2 of the NPA hierarchy.
Our protocol also provides a way of treating the case where the randomness of the outcomes of the devices is much higher for some inputs than for others. This happens in particular when generating randomness from partially entangled states [21], which are used in present photonic loophole-free Bell experiments [32,33]. Our analysis essentially amounts to considering that all the randomness has been generated from the optimal set of inputs, corrected by a penalty term proportional to the number of events corresponding to non-optimal inputs. By biasing the choices of inputs towards the optimal ones, one can make this penalty term asymptotically negligible. However, for small numbers of measurement runs, we have seen that this procedure may be less efficient than an analysis based on a Bell expression that treats all inputs on the same footing, like the CHSH expression. It is possible that our way of treating non-optimal inputs could be improved, leading to more efficient protocols for small numbers of measurement runs.
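The accounting just described can be sketched numerically. In the toy function below, the function name and the simple linear form of the penalty are our own illustrative assumptions (in the protocol the actual rates come from the RB function): every run is credited the per-run rate of the optimal inputs, and a penalty is subtracted for each non-optimal run, so that biasing the input choice towards the optimal inputs makes the penalty a vanishing fraction of the total.

```python
def certified_bits(n_opt: int, n_other: int, h_opt: float, penalty: float) -> float:
    """Illustrative accounting: all (n_opt + n_other) runs are credited the
    per-run min-entropy rate h_opt of the optimal inputs, minus a penalty
    proportional to the number of non-optimal runs."""
    return (n_opt + n_other) * h_opt - n_other * penalty

# As the bias q towards the optimal inputs grows, the per-run certified
# rate approaches the optimal rate h_opt.
n = 10**6
for q in (0.5, 0.9, 0.99):   # probability of picking an optimal input
    n_opt = int(q * n)
    bits = certified_bits(n_opt, n - n_opt, h_opt=1.0, penalty=2.0)
    print(q, bits / n)
```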
Our result could be generalized in several ways. First, how can security be proven against quantum side information when several Bell expressions, or the full set of data generated in the experiment, are taken into account? This is not a priori easy to answer, since the analyses of most DIRNG protocols secure against quantum side information rely on Bell expressions with a particular structure [6,8,11] or, when they allow for arbitrary Bell expressions, do not optimally take into account the observed level of violation [17]. Second, we based the statistical analysis on the Azuma-Hoeffding inequality, but alternative deviation theorems [39] could be adapted to our setting. Our attempt at improving our bounds using a tighter concentration inequality from Hoeffding [40] (called McDiarmid's inequality in [39] after [41]) produced no visible difference in the plots. Another alternative, Bentkus' inequality [39,42], involves discrete summations with around n terms, which would grow too large in our simulations to be used as is. Finally, we note that since DIRNG is not the only task where information from several Bell estimators can be exploited, a similar approach could be developed for other DI problems, such as DI quantum key distribution.
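For reference, the Azuma-Hoeffding inequality used in the statistical analysis gives deviations of order sqrt(n) for a martingale with bounded increments. The helper below (our own sketch, not the paper's code) inverts the standard one-sided bound P(S_n − S_0 ≥ nε) ≤ exp(−nε²/(2c²)) to convert a failure probability δ into a per-run deviation ε for an empirical Bell estimator with increments bounded by c.

```python
import math

def azuma_epsilon(n: int, delta: float, c: float = 1.0) -> float:
    """Per-run deviation eps such that, by the Azuma-Hoeffding inequality,
    a martingale with increments bounded by c satisfies
        P(S_n - S_0 >= n * eps) <= exp(-n * eps**2 / (2 * c**2)) = delta.
    """
    return c * math.sqrt(2 * math.log(1 / delta) / n)

# The deviation shrinks as 1/sqrt(n): quadrupling n halves eps.
print(azuma_epsilon(10**4, 1e-6))
print(azuma_epsilon(4 * 10**4, 1e-6))
```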
Let us determine the dual of the optimization problem (32). From now on, we generically take Q̃ to denote the cone of unnormalized quantum behaviors or any of its NPA relaxations, as the analysis is identical in either case.
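As a reminder, the derivation relies on the standard duality for conic programs; in our notation, with K a closed convex cone and K* its dual cone, the generic primal-dual pair reads:

```latex
% Generic conic-programming duality.
% K is a closed convex cone and
% K^* = \{ c : \langle c, p \rangle \ge 0 \ \forall p \in K \} its dual cone.
\begin{align}
  \text{(primal)}\quad & \max_{p}\ \langle c, p\rangle
      \quad \text{s.t.}\quad A p = b,\ \ p \in K, \\
  \text{(dual)}\quad & \min_{y}\ \langle b, y\rangle
      \quad \text{s.t.}\quad A^{\mathsf T} y - c \in K^{*}.
\end{align}
```

Weak duality is immediate: for feasible p and y, ⟨b, y⟩ = ⟨Ap, y⟩ = ⟨p, Aᵀy⟩ ≥ ⟨p, c⟩, since Aᵀy − c ∈ K* and p ∈ K.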
Let us first rewrite (32) in a form similar to (47). For this, define c_ax ∈ R^{|A|×|X|} as the vector with components c_ax(a', x') = δ_{aa'} δ_{xx'}, so that ⟨c_ax, p⟩ = p(a|x). Let b = (1, f_1[p], ..., f_t[p]) ∈ R^{1+t}. Let u be any Bell expression such that u[p] = Tr[p] for all no-signaling behaviors p, for instance u(a, x) = δ_{x,x_0} for some input x_0 ∈ X. Let A_ax be matrices in R^{(1+t)×(|A|×|X|)} with components [A_ax]_{1,a'x'} = u(a', x') and [A_ax]_{1+α,a'x'} = f_α(a', x'). The dual is then readily given as

G(f[p]) = min_{y ∈ R^{1+t}} ⟨b, y⟩  subject to  A_ax^T y − c_ax ∈ Q̃*  ∀a ∈ A, ∀x ∈ X_r,   (50)

where Q̃* is the dual cone of Q̃, that is, the set Q̃* = {c ∈ R^{|A|×|X|} : ⟨c, p̃⟩ ≥ 0, ∀p̃ ∈ Q̃}. Note that this dual cone can be identified with the set of Tsirelson inequalities for normalized behaviors, that is, the set Q* = {(d_0, d) ∈ R^{1+|A|×|X|} : ⟨d, p⟩ ≤ d_0, ∀p ∈ Q}. Indeed, note that by normalization ⟨u, p⟩ = 1, and thus an inequality ⟨d, p⟩ ≤ d_0 valid for p ∈ Q can always be rewritten in the form ⟨d_0 u − d, p⟩ ≥ 0, hence in the form ⟨c, p⟩ ≥ 0 for some suitable c. But an inequality ⟨c, p⟩ ≥ 0 is clearly valid for Q if and only if it is valid for Q̃. Using the explicit form of b, A_ax, and c_ax and the above interpretation of Q̃*, we can rewrite the dual (50) as

G(f[p]) = min_{(y_0, y) ∈ R^{1+t}} y_0 + Σ_{α=1}^t y_α f_α[p]   (51)

subject to p'(a|x) ≤ y_0 + Σ_{α=1}^t y_α f_α[p'] ∀a ∈ A, ∀x ∈ X_r, ∀p' ∈ Q.

Let us now determine the dual of the noisy problem (33), which can be put in conic standard form in the same way. Its dual (56) reads

min_{y_0 ∈ R, y^± ∈ R_+^t} y_0 + Σ_{α=1}^t (y_α^+ f̃_α^+ − y_α^- f̃_α^-)   (56)

subject to p'(a|x) ≤ y_0 + Σ_{α=1}^t (y_α^+ − y_α^-) f_α[p'] ∀a ∈ A, ∀x ∈ X_r, ∀p' ∈ Q.

Although (56) is the best way to express the optimization for a numerical implementation, we may further simplify its expression by noting that, for a fixed value of y_0 and {y_α = y_α^+ − y_α^-}, the objective function is minimized when y_α^+ = y_α if y_α ≥ 0 and y_α^- = −y_α if y_α ≤ 0 (in short, y_α^± = (|y_α| ± y_α)/2). We can then write the objective function as

y_0 + Σ_{α : y_α ≥ 0} y_α f̃_α^+ + Σ_{α : y_α < 0} y_α f̃_α^-

and further as the following maximum:

max_{f ∈ [f̃^-, f̃^+]} ( y_0 + Σ_{α=1}^t y_α f_α ).

Indeed, the maximum will clearly be attained at one of the extreme points of the region [f̃^-, f̃^+], whose components are of the form f̃_α^±. If y_α ≥ 0, the maximum will be attained when f_α is equal to f̃_α^+; if y_α ≤ 0, when it is equal to f̃_α^-. All in all, (56) can thus be rewritten as the nested optimization

min_{(y_0, y) ∈ R^{1+t}} max_{f ∈ [f̃^-, f̃^+]} ( y_0 + Σ_{α=1}^t y_α f_α )   (59)

subject to p'(a|x) ≤ y_0 + Σ_{α=1}^t y_α f_α[p'] ∀a ∈ A, ∀x ∈ X_r, ∀p' ∈ Q.

In (59), we are thus solving a problem completely analogous to (51), except that the objective function now yields a bound on G(f) that holds on the entire region [f̃^-, f̃^+]. Since we minimize the objective function, we are searching for the best possible such bound. Now, since the extreme points of [f̃^-, f̃^+] are not necessarily quantum (i.e., do not necessarily belong to f(Q)), one could expect this upper bound to be strictly higher than the optimal quantum bound in [f̃^-, f̃^+], contrary to the primal formulation (33). But this is not the case, as follows from the duality property of conic programs. As we mentioned, the strong duality theorem for conic programs [46] ensures that the value of the dual (59) matches that of the primal (33), as long as the primal has a strictly feasible point. That is, a set of subnormalized probability vectors {p̃_{a,x}} should exist such that all the equality constraints of the optimization are satisfied, and which lie in the interior of their respective cones, i.e., p̃_{a,x} ∈ int(Q̃) for all a ∈ A, x ∈ X_r. We argue that when the primal is not infeasible, this is almost always the case.
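The simplification of the dual objective over the confidence region [f̃^-, f̃^+] used above is just the closed form of the maximum of a linear function over a box. The short check below (illustrative, with arbitrary random numbers) verifies that picking f̃_α^+ where y_α ≥ 0 and f̃_α^- where y_α < 0 reproduces the brute-force maximum over all vertices of the box.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
y0, y = 0.3, rng.normal(size=4)            # arbitrary dual variables (y_0, y)
f_lo = rng.normal(size=4)                  # lower confidence bounds f~-
f_hi = f_lo + rng.uniform(0.1, 1.0, 4)     # upper confidence bounds f~+

# Closed form: each coordinate picks f~+ if y_a >= 0 and f~- otherwise.
closed = y0 + np.where(y >= 0, f_hi, f_lo) @ y

# Brute force: a linear function attains its maximum over the box
# [f~-, f~+] at one of its 2^t vertices.
brute = max(y0 + np.dot(y, v) for v in itertools.product(*zip(f_lo, f_hi)))

print(closed, brute)
```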
Consider the set f[int(Q)], that is, the set of Bell value vectors that are compatible with a nonextremal quantum behavior. If the intersection of this set with the confidence region [f̃^-, f̃^+] is nonempty, we have strict feasibility of (33). Indeed, let p ∈ int(Q) be a behavior such that f[p] ∈ [f̃^-, f̃^+]. Then any decomposition of p as a sum of points in int(Q̃), for instance p̃_{a,x} = p/(|A| × |X_r|), gives a strictly feasible point for (33).
On the other hand, if the closure f[Q] has no intersection with [f̃^-, f̃^+], then the primal (33) is infeasible and the dual (59) diverges to −∞, as a rather straightforward application of the hyperplane separation theorem shows.
The last possibility is that f[int(Q)] has a point of tangency with [f̃^-, f̃^+] or, in terms of separating hyperplanes, that there exist {y_0, y_α} such that Σ_α y_α f_α > y_0 for all f ∈ f[int(Q)], and Σ_α y_α f_α ≤ y_0 for all f ∈ [f̃^-, f̃^+]. In this case, there might only exist non-strictly feasible points for the primal, while the dual does not diverge. Without strong duality, there is no guarantee that the two resulting values are the same. However, this case is irrelevant to us, as the chances that the confidence region [f̃^-, f̃^+] around our estimated frequencies is tangent to f[int(Q)] are essentially zero.
Thus, in practice, the primal is either infeasible or strictly feasible. Hence, when the dual (59) converges, we can safely conclude strong duality, and the optimum of the dual is indeed not worsened by allowing the dual to select f ∉ f(Q).