Regularising data for practical randomness generation

Non-local correlations that obey the no-signalling principle contain intrinsic randomness. In particular, for a specific Bell experiment, one can derive relations between the amount of randomness produced, as quantified by the min-entropy of the output data, and its associated violation of a Bell inequality. In practice, due to finite sampling, certifying randomness requires the development of statistical tools to lower-bound the min-entropy of the data as a function of the estimated Bell violation. The quality of such bounds relies on the choice of certificate, i.e., the Bell inequality whose violation is estimated. In this work, we propose a method for efficiently choosing such a certificate. It requires sacrificing a part of the output data in order to estimate the underlying correlations. Regularising this estimate then allows one to find a Bell inequality that is well suited for certifying practical randomness from these specific correlations. We then study the effects of various parameters on the obtained min-entropy bound and explain how to tune them in a favourable way. Lastly, we carry out several numerical simulations of a Bell experiment to show the efficiency of our method: we nearly always obtain higher min-entropy rates than when we use a pre-established Bell inequality, namely the Clauser-Horne-Shimony-Holt inequality.


I. Introduction
Being able to produce bits that are impossible to predict is crucial for a number of cryptographic tasks. In order to characterise the unpredictability of the outcomes of a given experiment, one usually models an adversary who has access to some information on the devices used in the experiment. Bounds on how well the adversary can predict the output bits, conditioned on the information this adversary was given, can then be derived. If the devices in use behave classically, and if the adversary is given total information about them, no unpredictable bits can be obtained, as classical physics is deterministic. By contrast, if the devices are quantum, their outputs can be impossible to predict, even when the adversary has access to a perfect characterisation of the devices.
In practice, perfect control of quantum devices is rarely possible. This means that, in most cases, even the users do not have access to a perfect characterisation of the devices. Fortunately, the unpredictability of a sequence of bits can be certified even when the devices producing them cannot be completely characterised, thanks to the device-independent approach to quantum information protocols [1][2][3][4][5]. In this case, the minimal requirement is two separated devices that each receive an input (a measurement choice) and produce an output (a measurement result) without communicating. This is usually called a Bell experiment. The key idea is as follows: if the input-output correlations are 'Bell non-local' [6] (hereafter abbreviated 'non-local'), the outputs cannot be deterministic, irrespective of the extent to which the devices can be, or have been, characterised. That is to say, they contain some intrinsic randomness.
* boris.bourdoncle@icfo.eu † pslin@phys.ncku.edu.tw
Quantifying the unpredictability of the bits obtained in a Bell experiment is not a trivial task, as it depends on a number of factors, including how powerful the adversary is assumed to be [7], how the devices are assumed to behave with time [8,9] or how the users process the accessible information [10,11].
In this work, we adopt the most common approach to estimating the unpredictability of a Bell experiment: a user enters a bit in each of two shielded devices, which in return give output bits, according to some conditional probability distribution. These bits can be used to compute the violation of a Bell inequality, i.e., a constraint necessarily satisfied by physical devices that function in a local deterministic manner. Given this violation, an eavesdropper designs an optimal strategy for guessing the output bits. Here, we restrict our attention to the case where the adversary only has access to classical side information [11][12][13] (for the case of an adversary with quantum side information, we refer the reader to [14][15][16]). This is the appropriate level of security, since device-independent randomness generation involves only one user in one location. The only thing the adversary may exploit in this case is the imperfection of the devices, such as noise or deterioration with time. We refer the reader to [17] for a detailed explanation. We then quantify the randomness of the sequence of output bits by its min-entropy.
The upside of this approach is its simplicity, as it depends on only one parameter: the violation of a Bell inequality. However, in a real Bell experiment, this number cannot be exactly known, as the number of runs is finite. One can only compute an estimate of the average Bell violation. To overcome this obstacle, statistical tools were developed that allow one to upper-bound the predictability of the outputs with arbitrary confidence, based only on an estimate of the Bell violation, rather than its theoretical value [4,11,12,17]. See also [13] for another type of statistical tool.
Another question naturally arises in this approach: which Bell inequality should one use to obtain good bounds? A Bell inequality violation contains only partial information about the input-output correlations. Choosing the inequality poorly can result in a serious underestimation of the unpredictability of a Bell experiment, and may not even certify any unpredictability, as every non-local correlation satisfies some Bell inequalities. Yet, if the input-output distribution is known, finding the Bell inequality that certifies as much randomness as possible turns out to be a semi-definite program [10,18]. Unfortunately, as mentioned above, the input-output distribution is not accessible in practice, due to finite statistics.
We thus propose a method to circumvent this problem. It consists in using part of the input-output statistics to estimate the corresponding underlying distribution. It is however very likely that a naive estimate based on the relative frequencies will not correspond to a distribution achievable with quantum physics. Consequently, the above-mentioned semi-definite program is not directly applicable, as it can only be solved for a distribution that belongs to the quantum set, or to some specific relaxation of this set, defined by the Navascués-Pironio-Acín (NPA) hierarchy [19,20]. We thus employ the methods developed in [21] in order to obtain a distribution that approximates the underlying distribution and lies inside one of the NPA sets. This enables us to solve the corresponding semi-definite program and obtain a Bell inequality specifically suited for the estimated distribution, and hence better tailored for the underlying distribution.
The rest of this article is organised as follows. In Section II, we recall some known results about how to lower-bound the min-entropy associated with a practical Bell experiment. In Section III, we present our results. They consist of a method to optimise the choice of Bell inequality in order to improve the bound on the min-entropy. We then study the effects of various parameters of our method on a few behaviours picked at random in order to tune them favourably. Finally, we demonstrate the efficiency of this method by presenting the results of various numerical simulations of Bell experiments. We conclude with some open questions and possible future works in Section IV.

II. Lower bound on the min-entropy
We now recall the framework commonly used to quantify the randomness generated in a theoretical Bell test, and the mathematical tools developed to lower-bound the randomness generated in a real Bell experiment. Here, by a theoretical Bell test, we mean the ideal, asymptotic situation where the underlying distribution is attained. By contrast, in a real experiment, the available data is subject to statistical fluctuations.

A. Preliminaries
We define a Bell test in the following way: a user has access to two devices A and B. The internal working of those devices is unknown: they are treated as black boxes. The only possible interaction with those black boxes is as follows: upon receiving an input $x \in \{0, 1\}$ (resp. $y \in \{0, 1\}$), A (resp. B) produces an output $a \in \{0, 1\}$ (resp. $b \in \{0, 1\}$). We associate the random variables $A$, $B$, $X$ and $Y$ to $a$, $b$, $x$ and $y$ respectively, and $P_{AB|XY}$ denotes the conditional probabilities of the outputs given the inputs, which we hereafter call a behaviour (the subscript indicating the random variables is sometimes omitted when they are clear from the context). We assume that this behaviour obeys quantum mechanics, i.e., $P_{AB|XY}(ab|xy) = \mathrm{Tr}[\rho\, M^A_{a|x} \otimes M^B_{b|y}]$, where $\rho$ is a quantum state and $\{M^A_{a|x}\}_a$ and $\{M^B_{b|y}\}_b$ are positive-operator-valued measures. This implies in particular that input $x$ (resp. $y$) has no influence on output $b$ (resp. $a$), i.e., $P_{AB|XY}$ is no-signalling [22,23]. We denote by $Q$ the set of all quantum behaviours.
When a Bell test is repeated $n$ times, we write $\mathbf{x} = (x_1, \dots, x_n)$ for the sequence of inputs of A. We define $\mathbf{y}$, $\mathbf{a}$, $\mathbf{b}$, as well as their associated random variables $\mathbf{A}$, $\mathbf{B}$, $\mathbf{X}$, $\mathbf{Y}$, in the same way. We now briefly recall some of the key concepts that we will use later on.
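Min-entropy. We quantify the randomness of the outputs produced in a Bell test via the min-entropy. The min-entropy of $(\mathbf{A}, \mathbf{B})$ given $(\mathbf{X}, \mathbf{Y})$ conditioned on some event $\lambda$, according to a distribution $P = P_{\mathbf{ABXY}}$, is:
$$H_{\min}(\mathbf{A}, \mathbf{B}|\mathbf{X}, \mathbf{Y}; \lambda)_P = -\log_2 \sum_{\mathbf{x},\mathbf{y}} P(\mathbf{x}, \mathbf{y}|\lambda) \max_{\mathbf{a},\mathbf{b}} P(\mathbf{a}, \mathbf{b}|\mathbf{x}, \mathbf{y}, \lambda). \tag{1}$$
Essentially, the min-entropy quantifies the number of almost-uniform random bits that can be obtained from a source via a randomness extractor. The event $\lambda$ is typically a function of the specific inputs that were chosen and the specific outputs that were obtained during the Bell experiment, such as a statistical estimate. For a detailed review on the relevance of this quantity, see [24].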
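As a minimal single-round illustration of this quantity, the following Python sketch evaluates minus the logarithm of the average guessing probability for a behaviour stored as an array `P[a, b, x, y]` and an input distribution `pi[x, y]`. It assumes the standard form of the conditional min-entropy and drops the conditioning event $\lambda$; the function name and the array layout are illustrative choices, not part of the original analysis.

```python
import numpy as np

def single_round_min_entropy(P, pi):
    """Return -log2( sum_xy pi[x, y] * max_ab P[a, b, x, y] )."""
    best_guess = P.max(axis=(0, 1))            # most likely output pair for each (x, y)
    return float(-np.log2(np.sum(pi * best_guess)))

pi_uniform = np.full((2, 2), 0.25)             # uniform inputs (assumption for the example)

P_uniform = np.full((2, 2, 2, 2), 0.25)        # outputs uniformly random for every input
P_deterministic = np.zeros((2, 2, 2, 2))
P_deterministic[0, 0, :, :] = 1.0              # devices always output a = b = 0

h_uniform = single_round_min_entropy(P_uniform, pi_uniform)
h_deterministic = single_round_min_entropy(P_deterministic, pi_uniform)
print(h_uniform, h_deterministic)
```

The two extreme cases bracket the range of the quantity for two binary outputs: two bits for perfectly uniform outputs, zero bits for deterministic ones.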
We now introduce all the elements that allow us to lower-bound this quantity.
Bell expression. We call a real linear functional of $P_{AB|XY}$ a Bell expression:
$$I(P_{AB|XY}) = \sum_{a,b,x,y} c_{abxy}\, P_{AB|XY}(ab|xy). \tag{2}$$
For a given Bell expression, its maximal value over all local deterministic strategies, i.e., all behaviours with $a$ (resp. $b$) being a deterministic function of $x$ (resp. $y$), gives rise to the local bound $I_L$. A behaviour is said to be local if it can be written as a convex mixture of deterministic strategies, and the corresponding set is denoted $L$. The inequality
$$I(P_{AB|XY}) \le I_L$$
is referred to as a Bell inequality [25]. However, in quantum physics, non-local behaviours are accessible. For a given $P_{AB|XY}$, it might then happen that this local bound is violated. In this case, we call Bell violation the value that the Bell expression takes, and we denote by $I^+_Q$ the maximal value allowed in quantum theory, and by $I^-_Q$ the minimal value:
$$I^+_Q = \max_{P \in Q} I(P), \qquad I^-_Q = \min_{P \in Q} I(P).$$
Observed frequencies. For a given realisation of $(\mathbf{A}, \mathbf{B}, \mathbf{X}, \mathbf{Y})$, we define the observed frequencies as:
$$\hat{P}(ab|xy) = \frac{N_{abxy}}{N_{xy}},$$
where $N_{abxy}$ (resp. $N_{xy}$) is the number of occurrences of the quadruplet $(a, b, x, y)$ (resp. the pair $(x, y)$) in the $n$-length sequence $(\mathbf{a}, \mathbf{b}, \mathbf{x}, \mathbf{y})$.
Observed Bell violation. For simplicity, we assume that the inputs $(x, y)$ are chosen independently and identically at each round with probability $P(X_i = x, Y_i = y) = \pi_{xy}$. For a given Bell expression, as defined in equation (2), and a given realisation of $(\mathbf{A}, \mathbf{B}, \mathbf{X}, \mathbf{Y})$, we define the observed average Bell violation as:
$$\hat{I} = \frac{1}{n} \sum_{a,b,x,y} c_{abxy}\, \frac{N_{abxy}}{\pi_{xy}}.$$
We point out that, even though $\hat{P}$ and $\hat{I}$ are both estimators, they do not involve the inputs in the same manner. To compute $\hat{P}$, one counts the occurrences of both the quadruplets $(a, b, x, y)$ and the input pairs $(x, y)$, whereas for $\hat{I}$, one only counts the quadruplets $(a, b, x, y)$ and uses directly the input distribution $\pi_{xy}$, instead of the frequencies of each input pair for a given realisation. Both can be computed from a realisation of a Bell experiment, as $\pi_{xy}$ is chosen by the user (see details hereafter).
We compute the observed frequencies in this way to ensure that $\hat{P}_{AB|X=x,Y=y}$ is normalised for each $(x, y)$, and can thus be identified as a probability distribution. On the other hand, we compute the observed Bell violation $\hat{I}$ directly using the input distribution, as this is crucial for the derivation of Theorem 1 (see [11,17] for details). Note that, if the behaviours of the devices at each round are independent and identically distributed (i.i.d.) according to some distribution $P_{AB|XY}$, $\hat{I}$ converges towards $I(P_{AB|XY})$ when $n$ tends to infinity. However, we do not need to make such an assumption to define this quantity.
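The difference between the two estimators can be made concrete with a short Python sketch. It assumes that the round data are given as integer arrays `a`, `b`, `x`, `y`, that the Bell coefficients are stored as `coeffs[a, b, x, y]`, and that $\hat{I}$ is evaluated round by round as the mean of $c_{a_i b_i x_i y_i}/\pi_{x_i y_i}$, which is the same sum as the count-based expression. The function names and the toy data are hypothetical.

```python
import numpy as np

def observed_frequencies(a, b, x, y):
    """P_hat[a, b, x, y] = N_abxy / N_xy for binary sequences a, b, x, y."""
    counts = np.zeros((2, 2, 2, 2))
    np.add.at(counts, (a, b, x, y), 1)         # N_abxy
    n_xy = counts.sum(axis=(0, 1))             # N_xy
    if np.any(n_xy == 0):
        raise ValueError("each input pair must occur at least once")
    return counts / n_xy                       # normalised for every (x, y)

def observed_bell_value(a, b, x, y, coeffs, pi):
    """I_hat = (1/n) * sum_i coeffs[a_i, b_i, x_i, y_i] / pi[x_i, y_i]."""
    return float(np.mean(coeffs[a, b, x, y] / pi[x, y]))

# CHSH coefficients c_abxy = (-1)^(xy + a + b), used here only as an example
ia, ib, ix, iy = np.indices((2, 2, 2, 2))
chsh_coeffs = (-1.0) ** (ix * iy + ia + ib)
pi = np.full((2, 2), 0.25)                     # uniform inputs, chosen by the user

# Toy data: one round per input pair, with outputs satisfying a XOR b = x AND y
x = np.array([0, 0, 1, 1])
y = np.array([0, 1, 0, 1])
a = np.array([0, 0, 0, 0])
b = np.array([0, 0, 0, 1])

P_hat = observed_frequencies(a, b, x, y)
I_hat = observed_bell_value(a, b, x, y, chsh_coeffs, pi)
print(I_hat)
```

With only four rounds, the toy data reach the algebraic maximum of the CHSH expression, which no quantum behaviour attains. This illustrates why a finite-sample estimate cannot be identified with the underlying Bell violation.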
Distance between distributions. We say that two distributions $P_{\mathbf{ABXY}}$ and $\tilde{P}_{\mathbf{ABXY}}$ are $\epsilon$-close if their total variation distance is upper bounded by $\epsilon$:
$$\frac{1}{2} \sum_{\mathbf{a},\mathbf{b},\mathbf{x},\mathbf{y}} \left| P_{\mathbf{ABXY}}(\mathbf{a}, \mathbf{b}, \mathbf{x}, \mathbf{y}) - \tilde{P}_{\mathbf{ABXY}}(\mathbf{a}, \mathbf{b}, \mathbf{x}, \mathbf{y}) \right| \le \epsilon.$$
Randomness-bounding function. For a given Bell expression $I$, let $I(Q) = \{I(P) \,|\, P \in Q\}$. Let $\chi$ be a subset of $\{0, 1\}^2$. We say that $H^\chi_I : I(Q) \to [0, 2]$ is a randomness-bounding function (RB function) for $\chi$ if the two following requirements are satisfied:
These requirements are needed in order to bound the min-entropy produced by a sequence of Bell tests (see [11] for a detailed explanation). The set $\chi$ specifies a subset of all possible inputs for which the RB function is valid. It should contain the inputs for which the associated conditional distributions are the most random, i.e., the inputs that yield the largest $H^\chi_I$. For instance, if one obtains a high $H^\chi_I$ from one input pair $(x^*, y^*)$, and a small $H^\chi_I$ for the others, one would have an interest in setting $\chi$ to $(x^*, y^*)$ only. Indeed, the space over which the minimisation is carried out gets bigger when one includes more input pairs in $\chi$, which results in a smaller RB function, which, in turn, gives a smaller lower bound on the min-entropy. The reason for that will become clear in the next section. However, this trade-off depends on the total number of Bell tests that are used for generating randomness, as is illustrated by the numerical simulations presented hereafter.
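For concreteness, the notion of $\epsilon$-closeness can be checked numerically as follows. This is a minimal sketch that assumes the usual convention for the total variation distance (half the sum of absolute differences) and two small hypothetical distributions given as arrays over the same outcomes.

```python
import numpy as np

def total_variation(p, q):
    """Half the l1 distance between two distributions over the same outcomes."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * float(np.abs(p - q).sum())

def is_eps_close(p, q, eps):
    return total_variation(p, q) <= eps

p = [0.25, 0.25, 0.25, 0.25]                   # hypothetical reference distribution
q = [0.26, 0.24, 0.25, 0.25]                   # slightly perturbed distribution

tv = total_variation(p, q)                     # 0.5 * (0.01 + 0.01), up to rounding
print(tv, is_eps_close(p, q, 0.02), is_eps_close(p, q, 0.001))
```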

B. The guessing probability
The main ingredient needed to lower-bound the min-entropy is the RB function. We now explain how to compute it via the guessing probability problem. This general form was introduced and extensively explained in [11]. Here, we only briefly present the reasoning that leads to this formulation. For a given Bell expression $I$ and a specific value $I^*$ of $I$, finding the lower bound $H^\chi_I(I^*)$ defined by requirements R.1 and R.2 amounts to solving a minimisation problem over all quantum behaviours $P$ such that $I(P) = I^*$. However, the optimisation problem obtained in this way is not easily solvable, due to the presence of the logarithm and to the complicated nature of the quantum set $Q$ [19,20,26].
This led the authors of [11] to consider instead the following problem. For $(\alpha, \beta) \in \{0, 1\}^2$ and $(\gamma, \delta) \in \chi \subseteq \{0, 1\}^2$, let $\{\tilde{P}_{\alpha\beta\gamma\delta}\}$ be $4 \times |\chi|$ variables, where $|\chi|$ is the cardinality of the set $\chi$, that represent unnormalised behaviours. The problem then reads:
where $\mathrm{Tr}[\tilde{P}] = \sum_{a,b} \tilde{P}(ab|xy)$ is the norm of $\tilde{P}$ (which is independent of $(x, y)$ by no-signalling) and $\tilde{Q}_k$ is the set of unnormalised behaviours that belong to the $k$-th level of the NPA hierarchy [19,20]. This problem is then a semi-definite program (SDP) and, as such, can be efficiently solved. Moreover, if we let $H^\chi_I = -\log_2 G^\chi_I$, then $H^\chi_I$ satisfies both requirements R.1 and R.2, and is thus an RB function for $\chi$ (see [11] for details). It is, however, not necessarily tight, in particular because the NPA hierarchy is merely a relaxation of $Q$.
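Solving the SDP above requires an implementation of the NPA hierarchy, which does not fit in a short example. As a much simpler stand-in, the Python sketch below evaluates the analytic bound known from the device-independent randomness literature for the CHSH expression and the output of a single party, $-\log_2 \max_a P(a|x) \ge 1 - \log_2\big(1 + \sqrt{2 - I^2/4}\big)$. It is not the two-output function $H^\chi_I$ considered in this work, and it is only meant to show what a map from a Bell value to certified bits looks like. The function name is hypothetical.

```python
import math

def chsh_single_output_bound(i_chsh):
    """Bits certified on one party's output for a CHSH value in [2, 2*sqrt(2)]."""
    if not 2.0 <= i_chsh <= 2.0 * math.sqrt(2.0) + 1e-12:
        raise ValueError("CHSH value outside [2, 2*sqrt(2)]")
    # Clip tiny negative values caused by rounding at the quantum maximum
    radicand = max(0.0, 2.0 - i_chsh ** 2 / 4.0)
    return 1.0 - math.log2(1.0 + math.sqrt(radicand))

values = [2.0, 2.4, 2.7, 2.0 * math.sqrt(2.0)]
bounds = [chsh_single_output_bound(v) for v in values]
for v, h in zip(values, bounds):
    print(f"I = {v:.4f}  ->  at least {h:.4f} bits")
```

The bound vanishes at the local value 2 and reaches one bit at the maximal quantum violation, growing monotonically in between.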
In the case where $\chi$ contains only one input pair, the guessing probability problem has a simple interpretation: it is the maximal guessing probability, over all quantum strategies, of an adversary who is bound to keep the Bell violation $I^*$ unchanged. This problem was introduced in [10,18], along with another optimisation problem that we now recall. The idea is the following: if we consider the guessing probability as a theoretical measure of the randomness of a behaviour $P_{AB|XY}$, constraining this behaviour to only a Bell violation, that is, constraining only a linear functional of $P_{AB|XY}$ to a fixed value, amounts to discarding some information about the behaviour. It might thus result in an underestimation of the intrinsic randomness contained in $P_{AB|XY}$. By contrast, the following problem takes into account the complete information about the behaviour to evaluate its randomness:
As problem (9) is more constrained than problem (8), it is clear that $G^\chi_{\rm full}(P) \le G^\chi_I(I(P))$. One could compare these two problems in the following way: $G^\chi_{\rm full}(P)$ is a measure of the randomness of a behaviour $P$, whereas $G^\chi_I(I^*)$ is a measure of the randomness that can be certified by a Bell expression $I$. Yet these two formulations are connected: the dual problem of (9) precisely returns a Bell expression $I^*$ such that $G_{I^*}(I^*(P)) = G_{\rm full}(P)$ [10,18]. When the Bell expression is well chosen, (8) and (9) are thus equivalent.
Let us stress however that these quantities can only be considered as theoretical measures of randomness for theoretical objects such as probability distributions and Bell expressions. In order to obtain practical bounds, one has to develop statistical tools.

C. Bounding the n-round min-entropy
With the concepts defined above, we are now able to formulate a probabilistic statement on the min-entropy of the outputs obtained after a sequence of $n$ Bell tests. Most of this section is a reformulation, adapted to our case, of the results first presented in [4], corrected in [12,17], and extended in [11]. Let us fix a behaviour $P_{AB|XY}$, an i.i.d. input distribution $\pi_{xy}$, and a Bell expression $I$. Then the formal statement reads:
Theorem 1. Let $\lambda_m$ be the event that the estimated Bell violation $\hat{I}$ falls between the thresholds $J_m$ and $J_{m+1}$, and let $P_{\tilde{P}}(\lambda_m)$ be the probability that this event occurs according to some distribution $\tilde{P}_{\mathbf{ABXY}}$. Let $\epsilon$ and $\epsilon'$ be two positive parameters. Then the true distribution $P_{\mathbf{ABXY}}$ is $\epsilon$-close to a distribution $\tilde{P}_{\mathbf{ABXY}}$ such that exactly one of these two statements holds:
and $\mathbb{1}_\chi(x_j)$ is the indicator function, which returns 1 if $x_j \in \chi$ and vanishes otherwise.
The proof can be found in [11,17]. Note that, unlike [11], we take into account only one Bell expression in the statement of the theorem. This leads to numerous simplifications in its formulation, due in particular to the monotonicity of $H^\chi_I$ over $[I_L, I^+_Q]$. In this sense, it is closer to the way it is stated in [17]. However, from [11], we keep a few improvements on the parameters, and the possibility to select only a subset of inputs via $\chi$. This improves the bound in some cases where the inputs have very different output probabilities: if the RB function is significantly better for a subset of inputs $\chi$, this formulation allows one to use the RB function for $\chi$ only, and corrects the bound via the penalty term $\gamma(\mathbf{x})\eta$. In that case, we have an interest in biasing the input distribution towards $\chi$, in order to reduce the effect of the term $\gamma(\mathbf{x})\eta$ and thus produce as much randomness as possible. However, the trade-off between the quality of the RB function and the number of inputs from which randomness is generated depends on the total number of runs of a given protocol.
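The event $\lambda_m$ of the theorem can be located numerically once the thresholds are fixed. The sketch below assumes equally spaced thresholds between $I_L$ and $I^+_Q$ (the choice made in the numerical section of this work), a half-open convention $J_m \le \hat{I} < J_{m+1}$, and CHSH-like bounds $I_L = 2$ and $I^+_Q = 2\sqrt{2}$ as example values. The function name and the handling of values outside the interval are assumptions.

```python
import numpy as np

def locate_threshold(i_hat, i_local, i_quantum, n_segments):
    """Return (m, J) with J[m] <= i_hat < J[m + 1]; m is None outside [I_L, I_Q^+)."""
    J = np.linspace(i_local, i_quantum, n_segments + 1)   # J[0] = I_L, ..., J[-1] = I_Q^+
    if i_hat < J[0] or i_hat >= J[-1]:
        return None, J
    m = int(np.searchsorted(J, i_hat, side="right")) - 1
    return m, J

# Example values: 1000 segments, observed value 2.5
m, J = locate_threshold(i_hat=2.5, i_local=2.0,
                        i_quantum=2.0 * np.sqrt(2.0), n_segments=1000)
print(m, J[m], J[m + 1])
```

The lower threshold $J_m$ of the segment containing $\hat{I}$ is the quantity that would then be fed to the RB function when evaluating the bound.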
The bound given in the second statement of the theorem is the figure of merit that we aim at optimising in this work. Indeed, this expression depends on the choice of the Bell expression I, and we now present a systematic approach to finding a well suited I.

III. Results
We first present our new method for lower-bounding the min-entropy of the outputs of an uncharacterised Bell experiment. We then study, on a few behaviours, how the regularisation method, the size of sacrificed data, and the input distributions impact the quality of the min-entropy bound. We conclude by giving numerical results that illustrate the efficiency of our method.

A. Optimising the Bell expression via regularisation
As previously mentioned, solving the dual problem of (9) provides the Bell expression that is optimal for certifying the randomness of the given behaviour. When given an uncharacterised pair of devices, one could thus first generate some input-output data in order to estimate the corresponding underlying behaviour. This estimate $\hat{P}$ can then be used to obtain a Bell inequality that is presumably better for witnessing the randomness generated from these devices, by computing the dual solution to the guessing probability problem. Unfortunately, as mentioned above, the guessing probability problem is only properly defined over the set of quantum behaviours $Q$, or one of its NPA relaxation sets $Q_k$, or over the set of no-signalling behaviours. On the other hand, there is no guarantee that the behaviour $\hat{P}$ built from the observed frequencies belongs to any of these sets: on the contrary, $\hat{P}$ is almost always signalling, even if the underlying behaviour is not, due to finite statistics. In this case, problem (9) will be infeasible.
We now introduce our method to circumvent this problem, using the tools developed in [21]. The authors of [21] provide a set of tools to regularise the estimated behaviour $\hat{P}$ to one of the NPA sets $Q_k$. The approach consists in minimising a norm-based metric or a statistical distance between $\hat{P}$ and $Q_k$, the desired relaxation set, and taking the unique minimiser as the regularised behaviour $P^{\rm reg}_{AB|XY}$. In this work, we employ two methods considered therein. The first one corresponds to minimising a statistical distance, namely the conditional Kullback-Leibler (KL) divergence [27,28], and is defined in the following way:
$$P_{\rm ML}(\hat{P}) = \operatorname*{arg\,min}_{P \in Q_k} D_{\rm KL}(\hat{P} \| P),$$
where
$$D_{\rm KL}(\hat{P} \| P) = \sum_{a,b,x,y} \frac{N_{xy}}{n}\, \hat{P}(a, b|x, y) \log_2 \frac{\hat{P}(a, b|x, y)}{P(a, b|x, y)},$$
and where 'ML' stands for 'maximum likelihood'. The second one corresponds to minimising the two-norm distance:
$$P_{\rm LS}(\hat{P}) = \operatorname*{arg\,min}_{P \in Q_k} \| \hat{P} - P \|_2,$$
where 'LS' stands for 'least-squares'. It is important to note that both these minimisations can be efficiently solved (see [21] for details), thus making this approach operationally relevant.
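To give a flavour of the least-squares regularisation without an SDP solver, the Python sketch below replaces the NPA set $Q_k$ by the much simpler affine subspace of normalised, no-signalling tables and projects the observed frequencies onto it in the two-norm. This is a deliberately simplified stand-in, not the method of [21]: the result is guaranteed to be no-signalling, but it may contain negative entries and need not belong to any $Q_k$. All names are hypothetical and the frequencies are simulated from a uniform behaviour.

```python
import numpy as np

def no_signalling_constraints():
    """Return (A, rhs) such that A @ p = rhs encodes normalisation and
    no-signalling for p = P.ravel(), with P indexed as P[a, b, x, y]."""
    rows, rhs = [], []
    for x in range(2):
        for y in range(2):                     # sum_ab P(ab|xy) = 1
            r = np.zeros((2, 2, 2, 2)); r[:, :, x, y] = 1.0
            rows.append(r.ravel()); rhs.append(1.0)
    for a in range(2):
        for x in range(2):                     # P(a|x, y=0) = P(a|x, y=1)
            r = np.zeros((2, 2, 2, 2)); r[a, :, x, 0] = 1.0; r[a, :, x, 1] = -1.0
            rows.append(r.ravel()); rhs.append(0.0)
    for b in range(2):
        for y in range(2):                     # P(b|x=0, y) = P(b|x=1, y)
            r = np.zeros((2, 2, 2, 2)); r[:, b, 0, y] = 1.0; r[:, b, 1, y] = -1.0
            rows.append(r.ravel()); rhs.append(0.0)
    return np.array(rows), np.array(rhs)

def project_no_signalling(P_hat):
    """Orthogonal (two-norm) projection onto the affine set {p : A @ p = rhs}."""
    A, rhs = no_signalling_constraints()
    p = P_hat.ravel()
    # The constraints are redundant, so use a pseudo-inverse with an explicit cutoff
    p_reg = p - np.linalg.pinv(A, rcond=1e-10) @ (A @ p - rhs)
    return p_reg.reshape(2, 2, 2, 2)

# Simulated frequencies: 1000 rounds per input pair from a uniform behaviour
rng = np.random.default_rng(0)
P_true = np.full((2, 2, 2, 2), 0.25)
P_hat = np.empty((2, 2, 2, 2))
for x in range(2):
    for y in range(2):
        counts = rng.multinomial(1000, P_true[:, :, x, y].ravel())
        P_hat[:, :, x, y] = counts.reshape(2, 2) / 1000

P_reg = project_no_signalling(P_hat)
A, rhs = no_signalling_constraints()
residual_before = np.abs(A @ P_hat.ravel() - rhs).max()
residual_after = np.abs(A @ P_reg.ravel() - rhs).max()
print(residual_before, residual_after)
```

Replacing the affine subspace by an NPA set turns the same two-norm minimisation into an SDP, which is the setting in which the regularisation is used in this work.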
We can now define the following regularisation-based protocol for generating randomness from uncharacterised devices:
(i) Input a number $N_{\rm est}$ of $(x, y)$ drawn from an i.i.d. uniform distribution (they can be public) and obtain the corresponding $(a, b)$ in order to estimate the behaviour.
(ii) From this set of data, construct the observed frequencies $\hat{P}$ and compute $P^{\rm reg}_{AB|XY}$, the regularisation of $\hat{P}$ (where $P^{\rm reg}_{AB|XY}$ can be either $P_{\rm ML}(\hat{P})$ or $P_{\rm LS}(\hat{P})$).
(iii) Solve the corresponding optimisation problem $G^\chi_{\rm full}(P^{\rm reg}_{AB|XY})$ for different $\chi$ and select $\chi$ accordingly (see below for further details).
(iv) Extract the optimal Bell expression $I$ from the dual.
(v) Input a number $N_{\rm raw}$ of $(x, y)$, drawn according to a distribution $P^\chi_{XY}$ (they can be public), obtain the corresponding $(a, b)$, and compute the observed Bell violation $\hat{I}$.
(vi) Apply Theorem 1 to lower-bound the min-entropy of the raw set of data $(a_i, b_i, x_i, y_i)_{i \in \{1, \dots, N_{\rm raw}\}}$.
We now make a few observations on this protocol, which is summarised in Figure 1. The set $\chi$ is chosen at step (iii), thanks to $P^{\rm reg}_{AB|XY}$. Indeed, $P^{\rm reg}_{AB|XY}$ reveals some information about the underlying behaviour. One might thus intuitively do the following: compute the values of $G^\chi_{\rm full}(P^{\rm reg}_{AB|XY})$ for the different choices of $\chi$ and, if one input pair $(x^*, y^*)$ gives a much lower guessing probability, choose $\chi = (x^*, y^*)$. However, if $N_{\rm raw}$ is not big enough, $\chi = \{0, 1\}^2$ is likely to result in a better min-entropy bound in any case, as our results show.
The optimised Bell expression $I$ obtained in step (iv) may not be unique, and the different possible representations of $I$ are only artefacts of numerical computations. However, the choice of a representative for $I$ matters, since two physically equivalent representations can lead to different statistical estimates [29], and thus to distinct lower bounds on the min-entropy. In order to avoid such effects, we use the unique representation introduced in [29], by setting the signalling part to zero (see [29] for details).
In step (v), we assume that the specific distributions $P^\chi_{XY}$ can be generated using some freely available resource. If this is the case, one might consider that the task of randomness generation is already achievable, and we might then call our primitive 'randomness expansion', rather than 'randomness generation'. However, the input randomness can be public: it needs to be random to anyone beforehand, but it can be accessed by anyone after it is produced. Conversely, the output randomness is private: its value resides in the fact that it is only accessible to the user. We can thus refer to this process as 'private random bits generation'.
In step (vi), in order to apply Theorem 1, we need to know the quantum bounds $I^+_Q$ and $I^-_Q$. We approximate these bounds with the extrema over an NPA set, so that they can be easily computed. Moreover, we only bound the min-entropy of the data generated in step (v). Indeed, it is essential that the set of data used for the estimation be different from the one for which the bound on the min-entropy is derived: the statistical analysis of the data cannot depend on the data itself. This implies that, contrary to [11], our method requires that part of the data is used only for parameter estimation, and then thrown away.
Finally, note that even though the regularisation method described in [21] is meaningful only when the underlying distribution $P_{AB|XY}$ is i.i.d., the derivation of the bound on the min-entropy does not rely on this assumption. For this reason, the probabilistic statement that we obtain via our method will still be valid, even if $P_{AB|XY}$ is not i.i.d. In this case, the Bell expression that we obtain might be inadequate, which might result in a trivial lower bound on the min-entropy (equal to zero), but it will not result in an overestimation of the min-entropy of the raw data. In this sense, the optimisation method might become irrelevant, but the security analysis will not be compromised.

B. Tuning the parameters
In order to adjust the parameters of our protocol, we simulate some pairs of devices, by generating for each one a random state $\rho$ and some random measurements $\{M^A_{a|x}\}$ and $\{M^B_{b|y}\}$. The random states are picked at random in the space of two-qubit pure states via their Schmidt decomposition, and the random measurements are generated via their associated projectors, picked at random on the Bloch sphere.
We then compute the associated behaviour:
$$P_{AB|XY}(ab|xy) = \mathrm{Tr}[\rho\, M^A_{a|x} \otimes M^B_{b|y}].$$
To ensure that the obtained behaviours are non-local, we compute their associated values $I_{\rm CHSH}(P_{AB|XY})$ of the Clauser-Horne-Shimony-Holt (CHSH) inequality [30]:
$$I_{\rm CHSH}(P_{AB|XY}) = \sum_{x,y,a,b} (-1)^{xy+a+b}\, P_{AB|XY}(ab|xy), \tag{17}$$
and discard those for which $I_{\rm CHSH}(P_{AB|XY}) \le 2$. We then construct the corresponding $N_{\rm tot}$-round behaviour using $P_{AB|XY}$ in an i.i.d. way, i.e.,
$$P_{\mathbf{AB}|\mathbf{XY}}(\mathbf{a}\mathbf{b}|\mathbf{x}\mathbf{y}) = \prod_{i=1}^{N_{\rm tot}} P_{AB|XY}(a_i b_i|x_i y_i).$$
We set $N_{\rm tot} = N_{\rm est} + N_{\rm raw} = 10^8$, in accordance with the state-of-the-art experimental demonstration of device-independent randomness generation [31]. We then conduct a detailed study of four of these random behaviours, to heuristically fix three crucial parameters of our protocol:
• the regularisation method,
• the number of rounds used for the estimation, $N_{\rm est}$,
• the input subset used to generate randomness, $\chi$.
Based on the data we obtained, we decided to set:
• the regularisation method to ML,
• $N_{\rm est} = 10^6$,
• $\chi = \{0, 1\}^2$.
The graphs that corroborate these decisions can be found in Appendix A.
Before we give the results of several simulations that illustrate the efficiency of our protocol, note that, when one sets $N_{\rm tot} = 10^8$, generating randomness from only one input pair (i.e., setting $\chi = (x^*, y^*)$) does not usually result in higher min-entropy bounds than when one sets $\chi = \{0, 1\}^2$. The same effect can be observed in the simulations carried out by the authors of [11]. It is not surprising: in order to obtain a good min-entropy rate when certifying randomness from only one input pair, one should bias the input distribution towards that pair as much as possible. However, in order to obtain a reliable estimate of the Bell violation, one should evaluate it with many occurrences of each possible input. These two requirements pull in opposite directions, and they can both be met only if $N_{\rm tot}$ is high enough. It seems that, for most behaviours, $N_{\rm tot} = 10^8$ is not sufficient. We however checked that, when $N_{\rm tot}$ is sufficiently big, our method provides better min-entropy bounds for $\chi = (x^*, y^*)$ than for $\chi = \{0, 1\}^2$. The corresponding graph can be found in Appendix B.
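The sampling of random non-local behaviours described in this subsection can be sketched as follows in Python. The sketch makes several assumptions that the text does not fix: the Schmidt angle is drawn uniformly in $[0, \pi/2]$, the local bases of the state are absorbed into the measurement directions, and these directions are drawn uniformly on the Bloch sphere by normalising Gaussian vectors. All function names are hypothetical.

```python
import numpy as np

PAULIS = np.array([[[0, 1], [1, 0]],
                   [[0, -1j], [1j, 0]],
                   [[1, 0], [0, -1]]])

def projectors(direction):
    """Projectors (outcome 0, outcome 1) of a qubit measurement along a Bloch vector."""
    v = np.asarray(direction, dtype=float)
    sigma = np.tensordot(v / np.linalg.norm(v), PAULIS, axes=1)
    return [(np.eye(2) + sigma) / 2, (np.eye(2) - sigma) / 2]

def schmidt_state(theta):
    """Density matrix of cos(theta)|00> + sin(theta)|11>."""
    psi = np.array([np.cos(theta), 0.0, 0.0, np.sin(theta)])
    return np.outer(psi, psi)

def behaviour(rho, meas_a, meas_b):
    """P[a, b, x, y] = Tr[rho (M^A_{a|x} tensor M^B_{b|y})]."""
    P = np.empty((2, 2, 2, 2))
    for a, b, x, y in np.ndindex(2, 2, 2, 2):
        P[a, b, x, y] = np.trace(rho @ np.kron(meas_a[x][a], meas_b[y][b])).real
    return P

def chsh_value(P):
    """CHSH expression sum_{x,y,a,b} (-1)^(xy + a + b) P(ab|xy)."""
    a, b, x, y = np.indices((2, 2, 2, 2))
    return float(np.sum((-1.0) ** (x * y + a + b) * P))

def sample_nonlocal_behaviour(rng, max_tries=100_000):
    """Draw random states and measurements until the CHSH value exceeds 2."""
    for _ in range(max_tries):
        rho = schmidt_state(rng.uniform(0.0, np.pi / 2))
        meas_a = [projectors(rng.normal(size=3)) for _ in range(2)]
        meas_b = [projectors(rng.normal(size=3)) for _ in range(2)]
        P = behaviour(rho, meas_a, meas_b)
        if chsh_value(P) > 2.0:
            return P
    raise RuntimeError("no CHSH-violating behaviour found")

# Sanity check on the textbook optimal strategy, then one random draw
P_opt = behaviour(schmidt_state(np.pi / 4),
                  [projectors([0, 0, 1]), projectors([1, 0, 0])],
                  [projectors([1, 0, 1]), projectors([-1, 0, 1])])
P_random = sample_nonlocal_behaviour(np.random.default_rng(1))
print(chsh_value(P_opt), chsh_value(P_random))
```

Rejection sampling keeps only the behaviours that violate this particular representative of the CHSH inequality, which mirrors the discarding step described above.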

C. Numerical results
Our figure of merit is the comparison between the min-entropy bound obtained from our protocol, denoted $H_{\min}$ in the following, and the one obtained from a direct evaluation of the CHSH inequality, $H^{\rm CHSH}_{\min}$. We generate 50 behaviours at random (in the same way as described above) and run 500 simulations for each of them. To compute the lower bound on $H_{\min}$, one should set $n = N_{\rm raw}$ in Theorem 1, whereas for $H^{\rm CHSH}_{\min}$, $n = N_{\rm tot} > N_{\rm raw}$, as no estimation is required (see footnote 1). The parameters of the bound of Theorem 1 are set as follows: we fix $\epsilon = \epsilon' = 10^{-6}$, we divide the interval $[I_L, I^+_Q]$ into $M + 1 = 1000$ segments of the same length, and we use level 2 of the hierarchy defined in [33] (i.e., local level 2 defined in [34]) for the regularisation and the guessing probability problems. We then compute the corresponding min-entropy rate by dividing these values by $N_{\rm tot}$ in both cases. We also computed $-\log_2(G^\chi_{\rm full}(P_{AB|XY}))$, which corresponds to the maximal achievable min-entropy rate. To show that it is worth sacrificing part of the data for estimation, we then compared these three quantities. The results are presented in Fig. 2.
Footnote 1: It might seem necessary to also first sacrifice a part of the data to determine which among the 8 representatives of the CHSH inequality is violated. This is however unnecessary, as any given behaviour can violate at most one representative of the CHSH inequality (see page 2 of the Supplementary Material to [32]), which can be determined by evaluating the min-entropy bound of all different representatives of the CHSH inequality.
Figure 2 (caption, partially recovered): Red circle: ratio between the maximal achievable min-entropy and the rate obtained via the direct use of the CHSH inequality.
In this figure, we plot the ratios between the min-entropy rates for $H_{\min}$ and $H^{\rm CHSH}_{\min}$ for every simulated pair of devices, as well as the ratios between the maximal achievable rate $-\log_2(G^\chi_{\rm full}(P_{AB|XY}))$ and $H^{\rm CHSH}_{\min}$. For clarity, we sorted them in ascending order of the latter. We highlighted in grey the areas between the line $y = 1$, where the amount of randomness given by our protocol is the same as when using the CHSH inequality, and the curves connecting the optimal ratios. Our protocol is advantageous whenever a point falls in this area. Indeed, it means that, despite the $N_{\rm est}$ bits that were thrown away, we obtain a higher bound on the min-entropy than if we had simply used the CHSH inequality on all the bits.
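A back-of-the-envelope estimate helps to read the line $y = 1$. Assume, for illustration only, that both bounds scale linearly with the number of rounds they are computed on, with per-round values $h$ for our protocol (on $N_{\rm raw}$ rounds) and $h_{\rm CHSH}$ for the CHSH inequality (on $N_{\rm tot}$ rounds), and ignore the finite-size correction terms of Theorem 1. The symbols $h$ and $h_{\rm CHSH}$ are introduced here for this estimate and are not part of the original analysis.

```latex
\frac{H_{\min}}{N_{\rm tot}} > \frac{H^{\rm CHSH}_{\min}}{N_{\rm tot}}
\;\Longleftrightarrow\;
h\, N_{\rm raw} > h_{\rm CHSH}\, N_{\rm tot}
\;\Longleftrightarrow\;
\frac{h}{h_{\rm CHSH}} > \frac{N_{\rm tot}}{N_{\rm tot} - N_{\rm est}}
= \frac{10^{8}}{10^{8} - 10^{6}} = \frac{100}{99} \approx 1.0101 .
```

Under these simplifying assumptions, a tailored Bell expression only needs to certify about 1% more randomness per round than the CHSH inequality to compensate for the discarded estimation data.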
We observe that our method performs well in 98% of the simulations, in the following sense: when the optimal rate is nearly achieved with the CHSH inequality (i.e., the CHSH inequality gives a bound that is above 95% of the optimal rate), so does our method; when the CHSH inequality does not achieve the optimal rate, our method performs significantly better (with rates up to 1.6 times higher) in all but one case.

IV. Conclusion and future works
We presented a simple method to optimise the lower bound derived in [11] on the min-entropy produced by a sequence of Bell tests. It consists in estimating the underlying behaviour of the black boxes, via the regularisation method given in [21]. We then tuned the parameters of this protocol via a heuristic method. We concluded that, when one regularises some data for randomness generation, one should always use the maximum likelihood method (the authors of [21] observed the same effect for another figure of merit, the negativity [33]), that one can sacrifice up to 1% of the data for estimation, and that, for the device-independent randomness generation experiments that can be performed at the moment (i.e., with $N_{\rm tot} = 10^8$), one should generally use the worst-case RB function (i.e., the one that bounds the randomness for all inputs). We then carried out numerical simulations that illustrate the efficiency of this method.
We now describe two possible lines of investigation that follow from this work. The first one would be to take into account more factors in the optimisation of the lower bounds on the min-entropy. For instance, one could generate randomness from two or three subsets of input pairs, instead of considering only one or all of them as we did here. One could also tune $P^{\chi}_{XY}$ in a more precise way, as a function of the total number of rounds $N_{\mathrm{tot}}$ and of the differences between the guessing probabilities for each input pair. Finally, the RB function is a key element in the derivation of the bound. We used here the one introduced in [11]. However, there are other ways to compute a function that satisfies both requirements R.1 and R.2 needed for an RB function, such as the one introduced in [10]. Being able to compute a tight RB function would improve the min-entropy bound.
The second one is related to the power given to the adversary. Our results hold in a trusted-provider scenario, where our protocol allows for correcting noise and deterioration in the apparatuses, and in an adversarial scenario where the adversary holds only classical side information. Adapting the protocol to the case of an adversary with quantum side information would provide a min-entropy bound valid in the most general scenario. This could be achieved using a recent result, the entropy accumulation theorem [14]. Based on that result, a bound was derived on the n-round smooth min-entropy against an adversary with quantum side information [16]. However, this bound is based on the CHSH inequality (or, more accurately, on the CHSH game). Deriving such a bound for other inequalities might be a hard task. We took a different approach here, which consists in optimising the amount of randomness that is generated by tailoring the Bell inequality to a specific case. This, in turn, led us to consider only classical side information. If one could adapt the results of [14,16] to any Bell inequality, one would be able to guarantee the security of our protocol in the most general scenario.
We present here the analysis that we conducted in order to tune the parameters of our protocol. We first generated four random distributions, in the same way as explained in the main text, and computed the min-entropy rates for varying $N_{\mathrm{est}}$, to see how many bits should be sacrificed for estimation. We set the parameters of the bound in the same way as in the main text, i.e., we fix $\epsilon = \epsilon' = 10^{-6}$, we divide the interval $[I_L, I_Q^+]$ into $M + 1 = 1000$ segments of equal length, we use the NPA local level 2 [33] for the regularisation and the guessing probability problems, we set $N_{\mathrm{tot}} = 10^8$, and we run 500 simulations for each point. The input distribution for the estimation phase is always uniform. We compute the average min-entropy rates $H_{\min}/N_{\mathrm{tot}}$ as a function of $\log_{10} N_{\mathrm{est}}$ for both regularisation methods, ML and LS, and with two possible choices for $\chi$: $\chi_{\mathrm{all}} = \{0, 1\}^2$ and $\chi_{\mathrm{one}} = (x^*, y^*)$, where $(x^*, y^*)$ is the most random input pair, i.e., the one that yields the highest RB function. In the latter case, we set the input distribution to $P_{XY}(x^*, y^*) = \pi_{x^*y^*} = 0.9$ (and uniform on the other inputs). The results are presented in Figure 3.
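As an illustration of the biased input distribution used for the single-pair case, the following minimal sketch assigns weight $\pi_{x^*y^*}$ to the chosen pair and spreads the remainder uniformly over the other input pairs. The function name and the dictionary representation are assumptions made for the example; binary inputs are assumed, as in $\chi_{\mathrm{all}} = \{0, 1\}^2$.

```python
import itertools


def biased_input_distribution(best_pair, weight, inputs=(0, 1)):
    """Input distribution with P_XY(best_pair) = weight, uniform elsewhere.

    best_pair : the pair (x*, y*) towards which the inputs are biased
    weight    : the probability pi_{x* y*} assigned to that pair
    inputs    : the possible values of each input (binary by assumption)
    """
    pairs = list(itertools.product(inputs, repeat=2))
    if best_pair not in pairs:
        raise ValueError("best_pair must be one of the possible input pairs")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be a probability")
    # The remaining probability mass is shared equally by the other pairs.
    rest = (1.0 - weight) / (len(pairs) - 1)
    return {pair: (weight if pair == best_pair else rest) for pair in pairs}


# Example: bias towards the pair (0, 1) with weight 0.9.
p_xy = biased_input_distribution((0, 1), 0.9)
```

With four input pairs and a weight of 0.9, each of the three remaining pairs receives probability $0.1/3$.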
From those graphs, we deduce that setting $N_{\mathrm{est}} = 10^6$, i.e., 1% of the total data, is optimal. Note that, to distinguish these four distributions, we give their CHSH values $I_{\mathrm{CHSH}}$. This does not mean that the CHSH inequality is the best Bell expression for certifying randomness from these behaviours: we merely give it as a way to quantify how non-local these distributions are, because the effects we observe seem to depend on it. For instance, generating randomness from only one input pair seems to give an advantage only when the CHSH value is high enough.
We then study, under the same conditions, the effect of the bias $\pi_{x^*y^*}$ in the input distribution, to see if one can observe an advantage when setting $\chi = \chi_{\mathrm{one}}$ instead of $\chi = \chi_{\mathrm{all}}$. The results can be found in Figure 4.
We observe that, for three distributions, no advantage is obtained when generating randomness from only one input pair, independently of how strongly the input distribution is biased towards that input pair. This confirms the observation based on the first graph: setting $\chi = \chi_{\mathrm{one}}$ can give an advantage only for the behaviour with the highest CHSH value. This is not surprising when one compares these results with the examples provided in [11], where the authors also observed that generating randomness from one input pair starts giving an advantage only for high enough $N_{\mathrm{tot}} > 10^8$. We thus decided not to use this possibility and to set $\chi = \chi_{\mathrm{all}}$.
We then compared the min-entropy ratios obtained from the ML and LS regularisations. In that case, there is no varying parameter, so we decided to directly run the simulations described in Section III C for both regularisations, and to compare the obtained ratios $H_{\min}/H_{\min}^{\mathrm{CHSH}}$. The results can be found in Figure 5.
The ML regularisation performs better than the LS regularisation in 98% of the cases. Moreover, while the protocol based on ML performs well for 98% of the cases, that holds for LS only in 94% of the cases. This leads us to claim that when one wants to regularise data in order to certify randomness, one should preferably minimise the KL divergence.

Figure 3: Average min-entropy rates as a function of the size of the data that is sacrificed for estimation.
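To make the difference between the two regularisation criteria more concrete, the sketch below evaluates both figures of merit for a given candidate distribution. It assumes that ML corresponds to minimising the KL divergence from the observed frequencies to the candidate (as stated above) and that LS stands for least squares, i.e., minimising the Euclidean distance. The function names are hypothetical, distributions are represented as flat lists of probabilities, and the sketch only evaluates the objectives: it does not perform the optimisation over the relaxation of the quantum set used in [21].

```python
import math


def kl_divergence(freq, cand):
    """KL divergence D(freq || cand) in bits.

    Terms with freq == 0 contribute nothing; the divergence is infinite
    if the candidate assigns zero probability to an observed outcome.
    """
    total = 0.0
    for f, p in zip(freq, cand):
        if f > 0:
            if p <= 0:
                return math.inf
            total += f * math.log2(f / p)
    return total


def squared_distance(freq, cand):
    """Squared Euclidean distance between frequencies and candidate."""
    return sum((f - p) ** 2 for f, p in zip(freq, cand))


# Example: observed frequencies versus a uniform candidate.
observed = [1.0, 0.0]
candidate = [0.5, 0.5]
kl_value = kl_divergence(observed, candidate)
ls_value = squared_distance(observed, candidate)
```

The two objectives weight deviations differently: the KL divergence penalises a candidate heavily when it assigns a small probability to a frequently observed outcome, whereas the squared distance treats all deviations of the same size alike.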

B. Generating randomness from one input pair
To check that our method can result in better min-entropy bounds for $\chi = \chi_{\mathrm{one}}$ when the total number of rounds is large enough, we carried out the same simulations as the ones presented in the main text, but with $N_{\mathrm{tot}} = 10^{12}$. In that case, our method allows us to identify which input pair $(x^*, y^*)$ yields the most favourable RB function, thanks to the ML regularised distribution. We then bias the input distribution towards that pair, setting $\pi_{x^*y^*} = 0.99$. The results are presented in Figure 6, where we plot the ratios between the min-entropy rate obtained via our protocol and the rate $H_{\min}^{\mathrm{CHSH}}$ obtained via the direct use of the CHSH inequality, as well as the ratios between $-\log_2(G^{\chi}_{\mathrm{full}}(P_{AB|XY}))$ and $H_{\min}^{\mathrm{CHSH}}$, for $\chi = \chi_{\mathrm{one}}$ and $\chi = \chi_{\mathrm{all}}$. We highlighted in grey the region between these two ratios. 98% of the simulations led to points falling in that region. In those cases, our protocol performs well in two ways: not only does it perform better than the direct use of CHSH, but it also achieves a higher ratio than the optimal one for all inputs. The advantage of our protocol is then twofold: it allows us to identify the most favourable input pair, and then to tailor the Bell inequality to that specific input pair.

Figure 4: Average min-entropy rates as a function of the input distribution. In most cases, both regularisation methods give the same value for $\chi_{\mathrm{one}}$, which is why they cannot be distinguished.

Figure 6 legend: Red circle: ratio between the maximal achievable min-entropy for $\chi_{\mathrm{one}}$ and the rate obtained via the direct use of the CHSH inequality. Red dot: ratio between the maximal achievable min-entropy for $\chi_{\mathrm{all}}$ and the rate obtained via the direct use of the CHSH inequality.