Device-independent lower bounds on the conditional von Neumann entropy

The rates of several device-independent (DI) protocols, including quantum key distribution (QKD) and randomness expansion (RE), can be computed via an optimization of the conditional von Neumann entropy over a particular class of quantum states. In this work we introduce a numerical method to compute lower bounds on such rates. We derive a sequence of optimization problems that converge to the conditional von Neumann entropy of systems defined on general separable Hilbert spaces. Using the Navascu\'es-Pironio-Ac\'in hierarchy we can then relax these problems to semidefinite programs, giving a computationally tractable method to compute lower bounds on the rates of DI protocols. Applying our method to compute the rates of DI-RE and DI-QKD protocols we find substantial improvements over all previous numerical techniques, demonstrating significantly higher rates for both DI-RE and DI-QKD. In particular, for DI-QKD we show a minimal detection efficiency threshold which is within the realm of current capabilities. Moreover, we demonstrate that our method is capable of converging rapidly by recovering all known tight analytical bounds up to several decimal places. Finally, we note that our method is compatible with the entropy accumulation theorem and can thus be used to compute rates of finite round protocols and subsequently prove their security.


Introduction
Quantum cryptography enables certain cryptographic tasks to be performed securely, with their security guaranteed by the physical laws of nature as opposed to the assumptions of computational hardness that are used in more conventional cryptography. However, standard quantum cryptography protocols, for instance BB84 [1], still require a strong level of trust in the hardware used and its implementation. Faulty hardware or malicious attacks on the implementation can compromise the security of these protocols, rendering them useless [2]. Whilst better security analyses and improved hardware checks can make it more difficult for an adversarial party to eavesdrop on the protocol, it is always possible that there are unknown side-channel attacks that remain exploitable.
Fortunately, quantum theory also offers a way to remove strong assumptions on the hardware and implementation by instead running a so-called device-independent protocol. Device-independent protocols offer the pinnacle of security guarantees. By relying on minimal assumptions, they remain secure even when the devices used within the protocol are completely untrusted. The central idea behind many device-independent protocols, including randomness expansion (RE) and quantum key distribution (QKD), is that there are certain correlations between multiple separate systems that (i) could only have been produced by entangled quantum systems and (ii) are intrinsically random. Intuitively, these nonlocal correlations then act as a certificate that guarantees that the systems have produced randomness [3]. Furthermore, nonlocal correlations have additional applications such as certifying the dimension of a system [4], reducing communication complexity [5] and, in certain cases, nonlocal correlations can even certify the exact systems used (up to some unavoidable symmetries) [6].
Beginning with the works of [7,8], significant effort has been placed into developing new device-independent protocols and subsequently proving their security. For tasks like DI-RE and DI-QKD, security is now well understood, with tools like the entropy accumulation theorem (EAT) [9,10] and quantum probability estimation (QPE) [11] enabling relatively simple proofs of security against all-powerful quantum adversaries [12,13]. In both cases a security proof can be readily established once one has developed a quantitative relationship between the nonlocal correlations observed in the protocol and the quantity of randomness generated by the devices used. In particular, one is required to understand the rate of a protocol, i.e., the number of random bits or key bits that are generated per round. The asymptotic rates of both DI-RE and DI-QKD protocols are quantified in terms of the conditional von Neumann entropy. Moreover, using the EAT, one can also bound the total randomness produced in a finite round protocol in terms of the conditional von Neumann entropy.
A central problem in device-independent cryptography then remains: how does one actually compute this conditional von Neumann entropy? More specifically, one is required to compute (or lower bound) the minimum conditional von Neumann entropy produced by the devices in a single round of the protocol, conditioned on the adversary's side information. This computation should be device-independent in the sense that there are no restrictions on what systems were used except for some constraints on the expected correlations produced by the parties' devices, e.g., a Bell-inequality violation. For the purpose of the introduction, we use the informal notation inf_ρ H[ρ] for this quantity of interest, where the infimum is understood to be over all states that are compatible with the observed correlations, and H refers to the conditional von Neumann entropy. We refer to Section 2.2 for a more formal definition. At first glance this optimization would appear very challenging: the optimization itself is non-convex and furthermore we place no restrictions on the dimensions of the quantum systems used. Nevertheless, in spite of these apparent difficulties, significant progress has still been made towards solving such an optimization problem. For example, when the devices in the protocol have only binary inputs and outputs, it is often possible to exploit Jordan's lemma [14] in order to restrict the optimization to qubit systems. Using this technique, the work of [15] was able to give an exact analytical solution to the optimization problem for the one-sided conditional von Neumann entropy when the devices were constrained by a violation of the CHSH inequality [16]. More recently, other works have also managed to obtain analytical lower bounds for devices satisfying MABK inequality violations [17], tight bounds for the Holz inequality [18] and tight bounds for the asymmetric CHSH inequality [19]. These analytical results are difficult to obtain and also rely on the reduction to
qubit systems, meaning that it is not possible to perform such an analysis when the inputs and outputs of the devices are not binary. As such, general numerical techniques were also developed that could handle more complicated protocols and general linear constraints on the expected correlations.
For example, in [20,21] it was shown that the analogous optimization for the min-entropy [22] can be straightforwardly lower bounded using the Navascués-Pironio-Acín (NPA) hierarchy [23,24]. As the min-entropy is never larger than the von Neumann entropy, the results of these computations give lower bounds on the rates. However, one major drawback of this method is that the resulting lower bounds are in general quite loose. More recently, two other proposals for general numerical approaches have been developed. In [25], the authors give an explicit method to lower bound the conditional von Neumann entropy via a noncommutative polynomial optimization problem that can then be lower bounded using the NPA hierarchy. In [26], the authors introduced new entropies that are all lower bounds on the conditional von Neumann entropy and which can, like the min-entropy, be optimized using the NPA hierarchy. Both works improve upon the min-entropy method but neither has been shown to give tight bounds on the actual rate of a protocol and in general there appears to be significant room for improvement. As such, the question remains as to whether one can give a computationally tractable method to compute tight lower bounds on the rates of protocols.

Contributions of this work
In this work, we introduce a new method to solve this problem. To achieve this, we develop a family of variational expressions indexed by a positive integer m that approximate the conditional von Neumann entropy to arbitrary accuracy as m grows. For any m, the variational expression is determined by a noncommutative polynomial P_m and for a state given by the density operator ρ takes the form

H_m[ρ] = inf_{Z_1, ..., Z_n} Tr[ρ P_m(Z_1, ..., Z_n, Z_1^*, ..., Z_n^*)]   (1)

where Z_1, ..., Z_n are bounded operators. For any choice of m, this expression is a lower bound on the von Neumann entropy, i.e.,

H_m[ρ] ≤ H[ρ]

and in particular it can be used to lower bound the rates of DI protocols. Such variational expressions are closely related to Kosaki's expression for the quantum relative entropy [27,28,29]. However, Kosaki's variational expression has a number of variables (n in (1)) that is infinite and thus it is not suitable for computations. The variational expressions we obtain can be seen as well-chosen finite approximations of Kosaki's formula. Note that we need to be careful in the choice of approximation as we need to obtain expressions which always give lower bounds on the conditional von Neumann entropy.
The point of obtaining expressions of the form (1) is that when further taking the infimum over all states ρ that are compatible with the observed correlations, it is possible to relax such an optimization problem to a semidefinite program (SDP) using the NPA hierarchy, yielding a computationally tractable lower bound. Moreover, we show that for any fixed m, the resulting noncommutative polynomial optimization problem can be made to satisfy the property that all the variables have a fixed upper bound on their operator norm. As a consequence, we obtain that the sequence of SDPs given by the NPA hierarchy converges in the limit to inf_ρ H_m[ρ], where ρ is compatible with the observed statistics. We also remark that, unlike previous numerical techniques, our analysis applies when the systems are infinite dimensional.
We then apply this to compute the rates of both DI-RE and DI-QKD protocols. Compared to the other numerical techniques we show significant improvements on the calculated rates. We give improved bounds on the DI randomness certifiable with qubit systems, which could be used to yield more efficient experiments for DI randomness [30,31]. In addition, we give new bounds on the minimal detection efficiency required to perform DI-QKD with qubit systems. This gives a promising approach to conduct DI-QKD experiments with current technologies. We also find that in practice our method can converge quickly, as we demonstrate that we can recover (up to several decimal places) the tight analytical bounds from [15], [18] and [19]. We also remark that the computations we ran in several of our examples were vastly more efficient than the two numerical approaches of [25,26]. Finally, we explain how our technique can be used directly with the entropy accumulation theorem in order to compute non-asymptotic rates of protocols and subsequently prove their security.
The article is structured as follows. In Section 2 we begin by stating the main results relevant to device-independent cryptography and their application. Note that in this section we will restrict to a special case of our more general result that is sufficient for those interested solely in applications.
Later in Section 3 we will state the general technical result along with its proof.

Results
Before we begin let us establish some notation. Let H be a separable Hilbert space; we denote the set of bounded operators from H to itself by B(H). A state on H is a positive semidefinite, trace-class operator ρ such that Tr[ρ] = 1. We denote the set of states on H by D(H). Given two positive semidefinite operators ρ, σ on H we write ρ ≪ σ if ker σ ⊆ ker ρ, where ker X := {|v⟩ ∈ H : X|v⟩ = 0}. Throughout this work R_+ will denote the set of nonnegative real numbers, N will denote the set of strictly positive integers, ln will denote the natural logarithm and log_2 will denote the logarithm base two. In this section, for simplicity of presentation, we assume that H is finite-dimensional. See Section 3 for the setting where H is infinite-dimensional.

Converging upper bounds on the relative entropy
We define the relative entropy between two positive semidefinite operators ρ and σ as

D(ρ∥σ) := Tr[ρ (log_2 ρ − log_2 σ)]

whenever ρ ≪ σ and +∞ otherwise. Note that this is equivalent to the definition

D(ρ∥σ) = Σ_{i,j} |⟨ψ_i|ϕ_j⟩|² y_i log_2(y_i / x_j),

where {|ψ_i⟩, y_i}_i are an orthonormal basis of eigenvectors with their corresponding eigenvalues for the operator ρ and similarly {|ϕ_j⟩, x_j}_j for σ. For a bipartite state ρ_AB on a Hilbert space H_A ⊗ H_B we define the conditional von Neumann entropy as

H(A|B) := −D(ρ_AB ∥ I_A ⊗ ρ_B).

The main technical result of this work is the following theorem.

Theorem 2.1. Let m ∈ N and let t_1, ..., t_m and w_1, ..., w_m be the nodes and weights of the m-point Gauss-Radau quadrature on [0, 1] with endpoint t_m = 1. Then for positive semidefinite operators ρ and σ on H with Tr[ρ] = 1 we have

D(ρ∥σ) ≤ c_m − (1/ln 2) Σ_{i=1}^m (w_i / t_i) inf_{Z ∈ B(H)} ( Tr[ρ(Z + Z^* + (1 − t_i) Z^* Z)] + t_i Tr[σ Z Z^*] ),

where c_m = −(1/ln 2) Σ_{i=1}^m w_i / t_i. Moreover, the RHS converges to D(ρ∥σ) as m → ∞.

Remark 2.2. The constants t_i and w_i appearing in Theorem 2.1 are the nodes and weights of a Gauss-Radau quadrature rule over [0, 1] with endpoint t_m = 1. Importantly, they are efficient to compute and we refer the reader to Sec. 3.2 for further details as well as [32] for a more general treatment.
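As a concrete illustration, the sketch below computes the Gauss-Radau nodes and weights on [0, 1] with fixed endpoint t_m = 1 and evaluates the resulting upper bound on D(ρ∥σ) in the simplest case of commuting (diagonal) states, where the inner infimum over Z can be solved in closed form. The function names and the Vandermonde-based weight computation are our own illustrative choices; we only rely on the properties of the quadrature rule stated above (endpoint t_m = 1, endpoint weight w_m = 1/m², convergence as m grows).

```python
# Sketch: Gauss-Radau nodes/weights on [0, 1] with endpoint t_m = 1, and a
# check of the resulting upper bound on D(rho||sigma) for diagonal states.
import numpy as np
from scipy.special import roots_jacobi

def gauss_radau(m):
    """m-point Gauss-Radau rule on [0, 1] with fixed node t_m = 1."""
    # Interior nodes: roots of the Jacobi polynomial P_{m-1}^{(1,0)},
    # mapped from [-1, 1] to [0, 1]; the final node is the endpoint 1.
    x, _ = roots_jacobi(m - 1, 1, 0)
    t = np.concatenate([(x + 1) / 2, [1.0]])
    # Weights: impose exactness for the monomials 1, t, ..., t^{m-1}
    # (int_0^1 t^k dt = 1/(k+1)); the rule is in fact exact to degree 2m-2.
    w = np.linalg.solve(np.vander(t, increasing=True).T, 1 / np.arange(1, m + 1))
    return t, w

def bound_diag(p, s, m):
    """Upper bound on D(rho||sigma) for rho = diag(p), sigma = diag(s).

    In the commuting case the infimum over Z is attained by a diagonal Z:
      inf_Z Tr[rho(Z + Z* + (1-t)Z*Z)] + t Tr[sigma ZZ*]
        = -sum_k p_k^2 / ((1-t) p_k + t s_k).
    """
    t, w = gauss_radau(m)
    inner = np.array([-np.sum(p**2 / ((1 - ti) * p + ti * s)) for ti in t])
    return -np.sum(w / (t * np.log(2)) * (1 + inner))

p, s = np.array([0.6, 0.4]), np.array([0.5, 0.5])
D = np.sum(p * np.log2(p / s))  # true relative entropy, ~0.0290
print(gauss_radau(2))           # nodes [1/3, 1], weights [3/4, 1/4]
print(bound_diag(p, s, 2) - D)  # small positive gap: the bound lies above D
print(bound_diag(p, s, 10) - D) # gap shrinks rapidly as m grows
```

For m = 2 the rule reduces to the textbook values t = (1/3, 1), w = (3/4, 1/4), which also confirms the endpoint weight w_m = 1/m² mentioned in the proof of Lemma 2.3 below.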
The above theorem provides a convergent sequence of upper bounds on the relative entropy in the form of an optimization problem. This optimization has several features that are crucial to the applications we will now see. In particular, it has an objective function that is linear in the operators ρ and σ, and the form of the optimization does not change with the dimension. In the following subsection we will show how to turn these upper bounds on the relative entropy into semidefinite programming lower bounds on the rates of DI protocols.

Application to device-independent cryptography
For the device-independent setup we consider two honest parties, Alice and Bob, and an adversary Eve. Alice and Bob each have access to a black-box device which they can give inputs to and receive outputs from. We shall denote the inputs of Alice and Bob by X and Y respectively. Similarly, we denote the outputs of Alice and Bob by A and B respectively. All inputs and outputs come from finite sets. We refer to a setup wherein Alice's device has n_A inputs and m_A outputs and Bob's device has n_B inputs and m_B outputs as an n_A n_B m_A m_B-scenario. A round consists of Alice and Bob performing the following: (1) they shield their devices so that they cannot communicate with one another; (2) Alice and Bob each provide their respective device with an input, selected using some fixed probability distribution p(x, y); (3) they each receive an output from their respective device. In a device-independent protocol such as DI-QKD or DI-RE the statistics of many rounds will be analyzed in order to determine whether or not randomness is being produced by the devices or whether a secret key can be distilled.
We assume that the devices are constrained by quantum theory and that they act in the following way. Let Q_A, Q_B and Q_E be the three separable Hilbert spaces of Alice's device, Bob's device and Eve's device respectively. At the beginning of each round a tripartite state ρ ∈ D(Q_A ⊗ Q_B ⊗ Q_E) is shared between the three systems. In response to an input, Alice and Bob's devices will measure some preselected POVMs {{M_{a|x}}_a}_x, {{N_{b|y}}_b}_y on the parts of the state that they received and return the measurement outcome. The state and measurements are unknown to Alice and Bob but may be known by Eve. Overall, the joint conditional distribution of their outputs may be described via the Born rule as

p(a, b|x, y) = Tr[ρ (M_{a|x} ⊗ N_{b|y} ⊗ I_{Q_E})].

Immediately after the round, the joint state between the classical information recorded by the honest parties and Eve's quantum system may be described by the cq-state

ρ_{ABXYQ_E} = Σ_{a,b,x,y} p(x, y) |abxy⟩⟨abxy| ⊗ ρ_{Q_E}(a, b, x, y), where ρ_{Q_E}(a, b, x, y) = Tr_{Q_AQ_B}[(M_{a|x} ⊗ N_{b|y} ⊗ I_{Q_E}) ρ].

In spot-checking protocols, secret key and randomness are only extracted from rounds with a particular input [33]. As such, we henceforth consider some distinguished inputs (x*, y*) and the post-measurement state

ρ_{ABQ_E} = Σ_{a,b} |ab⟩⟨ab| ⊗ ρ_{Q_E}(a, b, x*, y*).   (7)
The exact choices of (x*, y*) will be made explicit when we compute rates for given protocols. For a spot-checking DI-RE protocol, the asymptotic rate is characterized by H(AB|X = x*, Y = y*, Q_E) if the randomness is extracted from both Alice and Bob's outputs, and by H(A|X = x*, Q_E) if the randomness is only extracted from the outputs of Alice's device. For a spot-checking DI-QKD protocol with one-way error correction, the asymptotic rate is given in terms of the Devetak-Winter bound [34]

H(A|X = x*, Q_E) − H(A|B, X = x*, Y = y*).

To obtain the asymptotic rate of the device-independent protocols we must consider a minimization of the above quantities over all possible quantum systems that could have produced the statistics we observed on expectation. More formally, let A, B, X, Y denote the finite sets from which the random variables A, B, X, Y take their values. Then given a finite set C let C : A × B × X × Y → C be some function which will act as a statistical test on ABXY. Finally, we consider a probability distribution q(c) on C. We then say a distribution p(a, b, x, y) on A × B × X × Y is compatible with the pair (C, q) if

Pr[C(A, B, X, Y) = c] = q(c) for all c ∈ C.

In other words, when we apply our statistical test C to the random variables ABXY we obtain a new random variable whose distribution is q. For example, let C(a, b, x, y) = 1 if a ⊕ b = x · y and C(a, b, x, y) = 0 otherwise, and let q(1) = ω. Then for p(x, y) = 1/4 the pair (C, q) imposes the constraint that the distribution p(a, b, x, y) should achieve an expected score of ω in the CHSH game.
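To make the CHSH example concrete, the following sketch evaluates the Born rule for the ideal two-qubit strategy (maximally entangled state, Alice measuring Z and X, Bob measuring (Z ± X)/√2) and applies the statistical test C above with uniform inputs. All names here are illustrative, and Eve's system is omitted since it does not affect Alice and Bob's observed distribution.

```python
# Sketch: Born-rule statistics of the ideal qubit CHSH strategy and its score
# under the statistical test C(a,b,x,y) = 1 iff a XOR b = x.y (uniform inputs).
import numpy as np

def projectors(theta):
    """Projectors (outcome 0 <-> +1 eigenvector) of cos(theta) Z + sin(theta) X."""
    obs = np.cos(theta) * np.diag([1.0, -1.0]) \
        + np.sin(theta) * np.array([[0.0, 1.0], [1.0, 0.0]])
    _, vecs = np.linalg.eigh(obs)  # eigenvalues ascending: -1 then +1
    return [np.outer(vecs[:, 1], vecs[:, 1]), np.outer(vecs[:, 0], vecs[:, 0])]

phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)    # |Phi+>
rho = np.outer(phi, phi)
M = [projectors(0.0), projectors(np.pi / 2)]          # Alice: Z, X
N = [projectors(np.pi / 4), projectors(-np.pi / 4)]   # Bob: (Z+X)/sqrt2, (Z-X)/sqrt2

def p(a, b, x, y):
    """Born rule p(a,b|x,y); Eve's system is traced out / omitted."""
    return float(np.trace(rho @ np.kron(M[x][a], N[y][b])).real)

# Expected CHSH score with uniform inputs p(x,y) = 1/4
omega = sum(0.25 * p(a, b, x, y)
            for x in (0, 1) for y in (0, 1) for a in (0, 1) for b in (0, 1)
            if a ^ b == x * y)
print(omega)  # Tsirelson score (2 + sqrt(2))/4 ~ 0.8536
```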
More generally, we say that a tuple (Q_A ⊗ Q_B ⊗ Q_E, ρ, {{M_{a|x}}_a}_x, {{N_{b|y}}_b}_y) is compatible with the constraints (C, q) if the probability distribution p(a, b, x, y) = p(x, y) Tr[ρ(M_{a|x} ⊗ N_{b|y} ⊗ I)] is compatible with (C, q). We refer to such a tuple as a strategy. With each strategy we can also associate a post-measurement state via (7). Then for a fixed (C, q) the device-independent rate of randomness produced by Alice's device on input X = x*, if the devices satisfy the constraints imposed by (C, q), is given by

inf H(A|X = x*, Q_E),

where the infimum is taken over all strategies compatible with (C, q) and the conditional von Neumann entropy is evaluated on the post-measurement state induced by the strategy. The rates for the randomness produced by both devices and the device-independent Devetak-Winter rate can be defined analogously by replacing the objective function with the appropriate quantity. Note that by considering appropriate dilations we can restrict the optimization to strategies wherein the measurements are projective and the state ρ is pure. The following lemma demonstrates how to use the upper bounds on the relative entropy from Theorem 2.1 in order to lower bound this infimum by a converging sequence of optimizations that can be subsequently lower bounded using the NPA hierarchy.

Lemma 2.3. Let m ∈ N and let t_1, ..., t_m and w_1, ..., w_m be as in Theorem 2.1, so that t_m = 1 and w_m = 1/m². Then   (15)

H(A|X = x*, Q_E) ≥ c_m + Σ_{i=1}^{m−1} (w_i / (t_i ln 2)) inf_{{Z_a}_a ⊂ B(Q_E)} Σ_a Tr[ρ (M_{a|x*} ⊗ I_{Q_B} ⊗ (Z_a + Z_a^* + (1 − t_i) Z_a^* Z_a) + t_i I_{Q_AQ_B} ⊗ Z_a Z_a^*)],

where c_m = Σ_{i=1}^{m−1} w_i / (t_i ln 2). Moreover, the RHS converges to H(A|X = x*, Q_E) as m → ∞.

Proof. Let ρ_{AQ_E} = Σ_a |a⟩⟨a| ⊗ ρ_{Q_E}(a, x*) be the cq-state after Alice has performed her measurement corresponding to the input x*, i.e., ρ_{Q_E}(a, x*) = Tr_{Q_AQ_B}[(M_{a|x*} ⊗ I_{Q_B} ⊗ I_{Q_E}) ρ]. Applying Theorem 2.1 with σ = I_A ⊗ ρ_{Q_E} and using H(A|X = x*, Q_E) = −D(ρ_{AQ_E} ∥ I_A ⊗ ρ_{Q_E}) we obtain

H(A|X = x*, Q_E) ≥ Σ_{i=1}^{m} (w_i / (t_i ln 2)) (1 + inf_{Z} (Tr[ρ_{AQ_E}(Z + Z^* + (1 − t_i) Z^* Z)] + t_i Tr[(I_A ⊗ ρ_{Q_E}) Z Z^*])),

where the prefactors w_i/(t_i ln 2) come from Theorem 2.1 and λ is some real number such that ρ_{AQ_E} ≤ λ I_A ⊗ ρ_{Q_E}. Now, as system A is finite dimensional, we can write the operator Z = Σ_{ab} |a⟩⟨b| ⊗ Z_{(a,b)} for some operators Z_{(a,b)} ∈ B(Q_E). Then for the first term we have

Tr[ρ_{AQ_E}(Z + Z^*)] = Σ_a Tr[ρ_{Q_E}(a, x*)(Z_{(a,a)} + Z^*_{(a,a)})] = Σ_a Tr[ρ (M_{a|x*} ⊗ I_{Q_B} ⊗ (Z_{(a,a)} + Z^*_{(a,a)}))],

where on the first line we traced out the A system, on the second line we substituted in the definition of ρ_{Q_E}(a, x*) and on the third line we used the identity Tr[ρ_{Q_E}(a, x*) W] = Tr[ρ (M_{a|x*} ⊗ I_{Q_B} ⊗ W)] for W ∈ B(Q_E). Repeating this for the second term we find

Tr[ρ_{AQ_E} Z^* Z] = Σ_a Tr[ρ_{Q_E}(a, x*) Σ_b Z^*_{(b,a)} Z_{(b,a)}] ≥ Σ_a Tr[ρ (M_{a|x*} ⊗ I_{Q_B} ⊗ Z^*_{(a,a)} Z_{(a,a)})],

where on the second line we noted that Σ_b Z^*_{(b,a)} Z_{(b,a)} ≥ Z^*_{(a,a)} Z_{(a,a)}. Finally, we get a similar relation for the final term

Tr[(I_A ⊗ ρ_{Q_E}) Z Z^*] ≥ Σ_a Tr[ρ (I_{Q_AQ_B} ⊗ Z_{(a,a)} Z^*_{(a,a)})].

Inserting these three rewritings into the lower bound on H(A|X = x*, Q_E) and relabelling Z_{(a,a)} to Z_a we recover the objective function stated in the lemma. Note that the above rewritings and the fact that we are minimizing imply that we need only consider operators Z that are block diagonal in the sense that Z = Σ_a |a⟩⟨a| ⊗ Z_{(a,a)}. As ρ_{AQ_E} is a cq-state we have ρ_{AQ_E} ≤ I_A ⊗ ρ_{Q_E}, so we may take λ = 1, in which case the i = m term of the sum is nonnegative and can be dropped. This recovers the form of c_m stated in the lemma (noting that w_m = 1/m² and t_m = 1). As it is sufficient to take Z = Σ_a |a⟩⟨a| ⊗ Z_a, the infimum reduces to an infimum over the operators {Z_a}_a ⊂ B(Q_E), yielding the stated bound. Finally, the convergence statement follows immediately from the convergence proven in Theorem 3.11.

Remark 2.4 (Adapting to other entropies). The above lemma only describes a bound for H(A|X = x*, Q_E). However, the proof can be easily adapted to the case of the global entropy H(AB|X = x*, Y = y*, Q_E) or for non-fixed inputs, e.g., H(A|X Q_E). For example, the global entropy H(AB|X = x*, Y = y*, Q_E) can be lower bounded by replacing the inner summation in (15) with an analogous summation over the joint outcomes (a, b), with operators Z_{ab} and the products M_{a|x*} N_{b|y*} in place of the M_{a|x*} (20). Similarly, one could also adapt the proof to bound H(A|X Q_E), allowing the entropy to be averaged over the inputs X; for this one would replace the inner summation with a sum that is additionally weighted over the inputs x. The above lemma and remark provide a converging sequence of lower bounds on the conditional von Neumann entropy. In order to turn these into lower bounds on the rate of a device-independent protocol we must also include the optimizations over all states, measurements and Hilbert spaces, subject to any constraints on the devices' joint probability distribution that we wish to impose. Suppose for 1 ≤ j ≤ r, for some r ∈ N, we impose on Alice and Bob's devices a collection of constraints

Σ_{a,b,x,y} c_{abxyj} p(a, b, x, y) = v_j,

where c_{abxyj}, v_j ∈ R. Then using Lemma 2.3 we can compute a lower bound on H(A|X = x*, Q_E) for all possible devices that satisfy the above constraints by solving an optimization problem (23) in which the infimum is taken over all collections satisfying the constraints of the problem. Note that when rearranging the objective function we have used the fact that the inner summation in (15) commutes with the infimum over the Z_a operators. We can then further pull the infimum outside the outer summation by reparametrizing the variables as Z_{a,i} for each i in the outer sum. In order to compute a lower bound on (23) we employ the NPA hierarchy to relax this problem to an SDP. To do this we first drop the tensor product structure and instead include commutation relations on the relevant variables of the problem. In doing so, we end up with a noncommutative polynomial optimization problem (24), with constraints imposed for all a, b, x, y, i, which gives a lower bound on (23), where we have recalled that it is sufficient to consider pure states. Note that
in both of the above optimizations we can also include projective measurement constraints without loss of generality. Using the NPA hierarchy we can then relax this optimization to a sequence of SDPs that give us a converging sequence of lower bounds on the optimal value and in turn a lower bound on the rate of the protocol.

Remark 2.5 (Commuting operator versus tensor product strategies).
It is immediate that (24) is never larger than (23) and thus our subsequent relaxations of (24) will always give lower bounds on the rates of protocols as we defined them previously using the tensor product framework. However, due to recent work [35] it may not be the case that (23) and (24) are equal.
Let R_m be the optimal value in (24) and r_{m,k} be the optimal value of its k-th level NPA relaxation. As we have explicit bounds on the operator norms of the variables, we know that our NPA relaxations of (24) will converge to the optimal value of (24), i.e., R_m = lim_{k→∞} r_{m,k}. We believe that sup_m R_m will correspond to the infimum of the conditional von Neumann entropy for commuting operator strategies. However, proving this would require one to formally define commuting operator strategies (analogous to the tensor product strategies introduced earlier) and to check that (24) can be derived in a similar fashion to how (23) was derived. We leave this to future work.

Remark 2.6 (Faster lower bounds). There are several ways to speed up the SDP relaxations of (24). We note in the caption of each figure which speedups were used.
1. Often including the operator inequalities Z^*_{a,i} Z_{a,i} ≤ α_i² and Z_{a,i} Z^*_{a,i} ≤ α_i² does not improve the lower bound. Hence we always removed them when performing the computations.
2. The choice of monomial indexing set for the moment matrices can greatly affect the accuracy and speed of the computations. We found that the local level 1 set, i.e., monomials of the form ABZ where A, B and Z range over the operators of Alice, Bob and the auxiliary Z variables respectively, offered a good balance of accuracy and speed.

3. It is also possible to commute the outer summation and the infimum, i.e., to compute each of the m − 1 infima separately and sum the results. This can only decrease the lower bound on the entropy and hence it is sufficient for the purpose of lower bounding the rate of a protocol. The main advantage of doing so is that it drastically reduces the number of variables in the NPA hierarchy relaxations. Rather than running a single SDP with a Z_{a,i} variable for each value of a and i, we can instead run m much smaller SDPs with only a Z_{a,i} variable for each a. This significantly reduces the runtime of the SDPs and results in the runtime scaling linearly with the number of nodes in the Gauss-Radau quadrature. However, we did notice in certain cases that the lower bounds computed in this manner were not converging to tight lower bounds. In such cases we did not include this speedup.
4. The moment matrix of the NPA relaxation can without loss of generality be taken to be a real symmetric matrix.

Numerical results
We will now apply our method of computing rates to several DI-RE and DI-QKD scenarios and compare our technique with known analytical results [15,18,19] and other numerical techniques [25,26,36]. We will only concern ourselves with the asymptotic rates in this work. However, as noted earlier, our technique can be combined with the entropy accumulation theorem in a relatively straightforward manner, similar to [37], and thus one could also use it to compute finite round rates for protocols. We discuss this further in Appendix E. The semidefinite relaxations were generated using the Python package NCPOL2SDPA [38] and the resulting SDPs were solved using MOSEK [39]. As NCPOL2SDPA is no longer maintained, we used a maintained fork of the original package [40]. We also provide example Python scripts that implement some of the computations; these can be found at the GitHub repository [41].

Randomness from the CHSH game
To begin we consider the simplest possible setting of bounding H(A|X = 0, Q_E) when Alice and Bob's devices are constrained to achieve some minimal score in the CHSH game. In Figure 1 we demonstrate how our bounds improve as we increase the number of nodes m in the Gauss-Radau quadrature. We compare this with a known tight analytical bound in this setting [15]. In the figure we see that for m = 8 our numerical technique effectively recovers the known tight analytical bound.
As far as we are aware this is the first numerical technique to do so without resorting to the algebraic simplifications afforded by Jordan's lemma. Importantly, this also demonstrates that in certain settings our technique can converge very quickly in the NPA hierarchy and in the size of the Gauss-Radau quadrature. Furthermore, as we are able to run our computations at low levels of the NPA hierarchy, the computations are also relatively fast for this setting, with each SDP taking less than a second to run. We include additional plots demonstrating the convergence of our technique for other known tight analytical bounds [18,19] in Appendix D. These include multipartite scenarios useful for bounding the rates of DI conference key agreement protocols [42].
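For reference, the tight analytical bound of [15] that our technique recovers is commonly quoted in the closed form H(A|X = 0, Q_E) ≥ 1 − h(1/2 + (1/2)√((S/2)² − 1)) for a CHSH value S ∈ [2, 2√2], where h is the binary entropy. A minimal sketch of the comparison curve, treating this closed form as given (see [15] for the precise statement):

```python
# Sketch of the closed-form CHSH bound used as the comparison curve.
import numpy as np

def h(p):
    """Binary entropy in bits (clipped for numerical safety)."""
    p = min(max(p, 0.0), 1.0)
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def chsh_entropy_bound(S):
    """Tight lower bound on H(A|X=0,QE) for a CHSH value S in [2, 2*sqrt(2)]."""
    return 1 - h(0.5 + 0.5 * np.sqrt(max((S / 2) ** 2 - 1, 0.0)))

print(chsh_entropy_bound(2.0))             # 0.0: no violation, nothing certified
print(chsh_entropy_bound(2 * np.sqrt(2)))  # 1.0: maximal violation, one full bit
```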
For randomness expansion protocols it is beneficial to consider the randomness generated from both devices. Therefore, in Figure 2 we plot lower bounds on H(AB|X = 0, Y = 0, Q_E). We compare this again with the analytical bound on H(A|X = 0, Q_E) and we also compare it with the numerical technique of [26] and a known upper bound from [36]. As is clear from the figure, the global entropy can be significantly larger than the local entropy, leading to much better rates for randomness expansion protocols based on the CHSH game. In comparison with the numerical technique of [26], we see that our new method vastly outperforms it. In the plot we also compare with some numerical upper bounds from [36] which used Jordan's lemma to reduce the problem to qubits and then optimized over explicit qubit strategies. As we see from the plot, our rate curve almost coincides with this upper bound and we expect that by increasing m further the gap would be further reduced. Note that we could also have used speedup (3) from Remark 2.6 in this case, which would change the runtime from hours to minutes. However, we found that when using speedup (3) we were unable to recover the bound from [36] and so we elected not to use it.

Randomness from the full distribution
By increasing our knowledge about the conditional distribution with which Alice and Bob's devices operate, we further constrain the possible devices that could produce the statistics we observe in the protocol. In turn we can hope that this leads to larger bounds on the randomness produced by the devices. At the extreme end of this, we would have knowledge of the entire distribution characterizing the devices. In the following we assume that we have access to the complete distribution and we impose these constraints in the SDP.
To make the plots more experimentally relevant we also assume the devices are affected by inefficient detection events. That is, with some probability η ∈ [0, 1] the devices will operate normally and with probability 1 − η the devices will fail and deterministically output 0. The resulting distribution for such devices takes the form

p_η(a, b|x, y) = η² q(a, b|x, y) + η(1 − η)(q(a|x) δ_{b,0} + q(b|y) δ_{a,0}) + (1 − η)² δ_{a,0} δ_{b,0},

where q(a, b|x, y) is the conditional distribution of the devices when η = 1 and q(a|x), q(b|y) are its marginals.

[Figure 2 caption (partial): computed using speedup (2) from Remark 2.6. For m = 8 a single data point can take hours to run. We also compare with the iterated mean entropy H↑_{(4/3)}(AB|X = 0, Y = 0, Q_E) from [26] and an upper bound from [36].]
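The inefficient-detection model above is easy to implement: with probability η a device works, and otherwise it deterministically outputs 0, independently for Alice and Bob. A sketch, assuming the distribution is stored as an array indexed [a, b, x, y] (our own convention):

```python
# Sketch: detection-inefficiency model -- with probability eta a device clicks,
# otherwise it deterministically outputs 0; failures are independent.
import numpy as np

def noisy_distribution(q, eta):
    """p_eta(a,b|x,y) for independent no-click -> output-0 failures."""
    qa = q.sum(axis=1)                 # Alice's marginal q(a|x,y)
    qb = q.sum(axis=0)                 # Bob's marginal q(b|x,y)
    p = eta**2 * q                     # both devices click
    p[:, 0] += eta * (1 - eta) * qa    # Bob fails -> b = 0
    p[0, :] += eta * (1 - eta) * qb    # Alice fails -> a = 0
    p[0, 0] += (1 - eta) ** 2          # both fail -> (a, b) = (0, 0)
    return p

q = np.full((2, 2, 2, 2), 0.25)        # placeholder ideal distribution
print(noisy_distribution(q, 1.0).flat[0])           # eta = 1 recovers q: 0.25
print(noisy_distribution(q, 0.8).sum(axis=(0, 1)))  # still normalized per (x, y)
```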
In Figure 3 we compare lower bounds on the entropy H(AB|X = 0, Y = 0, Q_E) when Alice and Bob's devices have inefficient detectors. For the curve representing our technique we used a Gauss-Radau decomposition with m = 8 nodes and at each data point we selected a two-qubit distribution in order to maximize the entropy produced by the devices. We compare our technique with the iterated mean entropy H↑_{(2)}(AB|X = 0, Y = 0, Q_E) from [26] and the TSGPL method from [25], both of which also constrained the devices by their full distribution. Compared with the other methods, our technique is everywhere larger and on the whole the difference is substantial. We also note that our technique is again significantly faster than the other numerical techniques presented here.

Better bounds on DI-QKD key rates
Thus far we have only concerned ourselves with bounds on local and global entropies. However, we can use these bounds to compute the asymptotic rates of some DI-QKD protocols. The asymptotic rate of a DI-QKD protocol with one-way error correction is given by the Devetak-Winter bound [34]

H(A|X = x*, Q_E) − H(A|B, X = x*, Y = y*),

where again we are assuming a spot-checking protocol and we consider the rate as the spot-checking probability tends to 0 and the number of rounds tends to infinity. The second term in the rate can be directly estimated from the statistics in the protocol and the first term can be lower bounded using our technique.

[Figure 3 caption: lower bounds on H(AB|X = 0, Y = 0, Q_E) for quantum devices that are constrained by a full distribution. The numerical bounds for our technique were computed using speedups (1) and (3) from Remark 2.6 at relaxation level 2 + ABZ including all monomials present in the objective function. A single SDP takes less than a minute to run at this level. We also compare with the iterated mean entropy H↑_{(2)}(AB|X = 0, Y = 0, Q_E) from [26] and the numerical technique from [25], which we refer to as the TSGPL bound.]
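As a toy illustration of the Devetak-Winter rate, the helper below combines a lower bound on H(A|X = x*, Q_E) with the error-correction term, modelled here, for simplicity, as the binary entropy h(Q) of the quantum bit error rate; in the actual computations H(A|B, X = x*, Y = y*) is evaluated from the full distribution. The function names and the binary-symmetric simplification are ours:

```python
# Sketch: Devetak-Winter rate with a binary-symmetric error-correction term.
import numpy as np

def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def devetak_winter(H_AE, qber):
    """r >= H(A|X=x*,QE) - H(A|B,X=x*,Y=y*); the second term is modelled
    here as h(QBER), i.e., a binary symmetric channel between the raw keys."""
    return H_AE - h(qber)

print(devetak_winter(1.0, 0.0))   # noiseless devices: 1 key bit per round
print(devetak_winter(0.8, 0.05))  # ~0.5136
```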
We consider DI-QKD protocols in the 2322-scenario and we take (x*, y*) = (0, 2), i.e., Bob's third input acts as his key-generating input. In the same setup as the previous figure we consider devices with inefficient detectors and we constrain them by their full distribution. However, for QKD we allow Bob to record a device failure with an additional symbol ⊥ when he receives his key-generating input y* = 2. By doing this, he collects more detailed information about his device's behaviour and in turn this allows him to reduce the size of H(A|B, X = x*, Y = y*) slightly. We refer the reader to [43] for a more detailed discussion of this post-processing of no-click events.
To further boost the rates we also include preprocessing of the raw key [44]. Loosely, we allow Alice to add additional noise to the outputs of her devices. In certain circumstances, this can increase the value of H(A|X = x*, Q_E) more than it increases the value of H(A|B, X = x*, Y = y*), thereby increasing the rate overall. More specifically, after Alice and Bob have collected their raw key (the outputs of their devices when (X, Y) = (0, 2)), Alice will independently flip each of her key-bits with some fixed probability q ∈ [0, 1/2]. For a single key-generating round this preprocessing amounts to applying a bitflip channel with flip probability q to Alice's classical output register in the post-measurement state. Thus this preprocessing can be seen as equivalent to Alice transforming her measurement from {M_0, M_1} to {(1 − q)M_0 + qM_1, (1 − q)M_1 + qM_0} on key-generating rounds. It follows that we can then model this preprocessing in our numerical computations by modifying Alice's measurement operators in the objective function of (23) appropriately.

[Figure 4 caption (partial): computed using speedup (3) from Remark 2.6 at a relaxation level 2 + ABZ; a single SDP at this relaxation level takes a few seconds to run. We also compare with the iterated mean entropy H↑_{(8/7)} from [26] and with the analytic bound for a CHSH based DI-QKD protocol with a preprocessing step from [44,19].]
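The measurement-level description of the bitflip preprocessing is straightforward to check numerically; the sketch below (with an illustrative qubit Z measurement, our own choice) verifies that the transformed operators still form a valid POVM:

```python
# Sketch: Alice's bitflip preprocessing on a key-generating measurement.
# We check that {(1-q)M0 + qM1, (1-q)M1 + qM0} is again a valid POVM.
import numpy as np

M0 = np.diag([1.0, 0.0])   # projective Z measurement, outcome 0
M1 = np.eye(2) - M0        # outcome 1

def preprocess(M0, M1, q):
    """Bitflip-preprocessed measurement for flip probability q in [0, 1/2]."""
    return (1 - q) * M0 + q * M1, (1 - q) * M1 + q * M0

N0, N1 = preprocess(M0, M1, 0.1)
print(np.allclose(N0 + N1, np.eye(2)))      # completeness: True
print(np.linalg.eigvalsh(N0).min() >= 0.0)  # positivity: True
```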
In Figure 4 we plot the key rates achievable for devices with inefficient detectors. We compute our technique using a Gauss-Radau quadrature with m = 12 nodes. We compare our technique to bounds on the rate given by the iterated mean entropy H↑(8/7) from [26] (we do not include preprocessing for this curve as it did not give improvements) and an analytical bound from [44, 19] for a protocol based on the CHSH game that includes the preprocessing step. For both of the curves that incorporate preprocessing, at each data point we optimized the probability q with which Alice performs her bitflip in order to maximize the obtained rate. We see that the curve computed using our technique outperforms the other rate curves across the entire plot. In particular, this shows the advantage one can gain by changing the knowledge collected about the devices from a Bell-inequality violation to the full distribution. In the inset plot we zoom in on the region [0.8, 0.88] × [10^{-5}, 0.1], where we can inspect where the various rate curves vanish, in other words the minimal detection efficiency required to perform DI-QKD according to that curve. For the curve computed using the iterated mean entropy H↑(8/7) the minimal detection efficiency is around 0.845. The red curve based on the analytic bound vanishes around 0.826 [19]. However, using our method (purple curve) we are able to substantially reduce the threshold detection efficiency to below 0.8. This threshold is now within a regime that is experimentally achievable. This warrants a more thorough analysis, using a more realistic model and computing finite round rates, in order to ascertain the performance capabilities of a current DI-QKD experiment. We leave such an analysis to future work.

Methods
The objective of this section is to derive the variational upper bounds on the relative entropy that were used in the previous section (see Theorem 3.11 below). First, however, we need to define the quantum relative entropy and, more generally, quasi-relative entropies in the framework of von Neumann algebras and establish some of their properties (Section 3.1). We will then describe rational approximations of the logarithm function (Section 3.2), which are a crucial ingredient in the derivation.

Quasi-relative entropies
Quasi-relative entropies can be defined in general for two positive semidefinite linear functionals on a von Neumann algebra A. A linear functional ρ : A → C is said to be positive semidefinite if ρ(a*a) ≥ 0 for all a ∈ A. We refer to a positive semidefinite linear functional satisfying ρ(I) = 1 as a state. We will mostly focus on the setting where A = B(H) for some separable Hilbert space H and positive semidefinite linear functionals ρ defined by some trace-class operator ρ̃ via ρ(a) = Tr[ρ̃a] for all a ∈ B(H). To simplify notation, we will often use the same symbol ρ for both the positive semidefinite linear functional ρ and the trace-class positive semidefinite operator ρ̃.

Functional calculus of quadratic forms
We briefly review the functional calculus of sesquilinear (or quadratic) forms introduced by Pusz and Woronowicz in [45] and further applied in [46]. Let α and β be two positive semidefinite quadratic forms on a complex vector space U. A function α : U → R is called a quadratic form if it satisfies α(λu) = |λ|² α(u) for any λ ∈ C and u ∈ U, as well as the parallelogram identity α(u + v) + α(u − v) = 2α(u) + 2α(v) for all u, v ∈ U. We say that it is positive semidefinite if α(u) ≥ 0 for all u ∈ U. The theory of Pusz and Woronowicz allows one to define a new quadratic form F(α, β) for any function F : R²₊ → R that is measurable (with respect to the σ-algebra of Borel subsets of R²₊), positively homogeneous (i.e., F(λx, λy) = λF(x, y) for all λ ≥ 0) and locally bounded from below (i.e., bounded from below on all compact sets). For example, if F(x, y) = √(xy), this theory allows us to define the geometric mean of two positive semidefinite quadratic forms.
The main idea of the theory is that one can always represent a pair of positive semidefinite quadratic forms by two positive semidefinite, commuting operators. We call a tuple (H, A, B, h) a representation of (α, β) if H is a Hilbert space, A, B are positive semidefinite, commuting bounded operators on H, and h : U → H is a linear map onto a dense subset of H such that α(u) = ⟨h(u), Ah(u)⟩_H and β(u) = ⟨h(u), Bh(u)⟩_H for all u ∈ U. As shown in [45, Theorem 1.1], such a representation always exists. One then defines the quadratic form F(α, β) on U by

F(α, β)(u) = ∫_{R²₊} F(x, y) d⟨h(u), E(x, y)h(u)⟩,   (29)

where E is the joint spectral measure of the commuting operators A and B; see [47, Chapter 5] for the definition of the joint spectral measure and [46] for the justification of the existence of this integral. The main result of the theory [45, Theorem 1.2] is that the quadratic form F(α, β) defined above is independent of the choice of representation (H, A, B, h) of the pair (α, β).
Defining quasi-relative entropies using the functional calculus of quadratic forms

Following the work of [46], we define the quasi-relative entropy of two positive semidefinite linear functionals ρ, σ on a von Neumann algebra A. Given ρ and σ, we define two positive semidefinite quadratic forms on A (note that A is also a complex vector space) by

L_σ(a) = σ(aa*) and R_ρ(a) = ρ(a*a).   (30)

Using the functional calculus of Pusz-Woronowicz, one can then define F(L_σ, R_ρ) as a quadratic form on A. This leads to the following definition of the F-quasi-relative entropy.
Definition 3.1 (F-quasi-relative entropy). Let ρ and σ be two positive semidefinite linear functionals on the von Neumann algebra A. Then for any measurable, positively homogeneous and locally bounded from below function F : R²₊ → R ∪ {+∞}, the F-quasi-relative entropy between ρ and σ is

D_F(ρ∥σ) := F(L_σ, R_ρ)(I),

where (K, A, B, h) is a representation of the two positive semidefinite quadratic forms (L_σ, R_ρ) and this expression is defined by the integral in (29). We denote by ν_{ρ,σ} the positive measure dν_{ρ,σ}(x, y) = d⟨h(I), E(x, y)h(I)⟩, where E is the joint spectral measure of A and B. Note that ν_{ρ,σ} only depends on ρ and σ and not on the function F. With this notation we have

D_F(ρ∥σ) = ∫_{R²₊} F(x, y) dν_{ρ,σ}(x, y).   (33)

We remark that for the applications we consider, it is important to allow the function F to be infinite at some points. As described in [46], the functional calculus can readily be extended to this setting. In this setup, the quadratic form F(L_σ, R_ρ) can take the value +∞ at some points and thus, as expected, the corresponding F-quasi-relative entropy can be infinite.
Definition 3.2 (Quantum relative entropy, also called Umegaki divergence [48]). Let ρ and σ be two positive semidefinite linear functionals on the von Neumann algebra A. Then the relative entropy between ρ and σ, written D(ρ∥σ), is defined as the F-quasi-relative entropy with F(x, y) = y log₂(y/x). Note that when A = B(H) and σ is the trace functional, we obtain the von Neumann entropy H(ρ) = −D(ρ∥σ). The conditional von Neumann entropy can also be obtained in a similar way from the quantum relative entropy, as described in Section 2.2.

Definition 3.3 (α-quasi-relative entropy). Let ρ and σ be two positive semidefinite linear functionals on A and let α ∈ (0, 1) ∪ (1, ∞). Then the α-quasi-relative entropy, written Q_α(ρ∥σ), is defined as the F-quasi-relative entropy with F(x, y) = y^α x^{1−α}.

Suppose A = B(H) and consider B(H) equipped with the Hilbert-Schmidt inner product ⟨a, b⟩_HS = Tr[a*b], together with the operators L_σ(a) = σa and R_ρ(a) = aρ. These operators realize the quadratic forms L_σ and R_ρ defined in (30), in the sense that L_σ(a) = ⟨a, L_σ a⟩_HS and R_ρ(a) = ⟨a, R_ρ a⟩_HS. Moreover, it is clear from their actions that [L_σ, R_ρ] = 0. In particular, our representation is (B(H), L_σ, R_ρ, id) where id : B(H) → B(H) is the identity map. Thus, for an F satisfying the conditions of Definition 3.1, the F-quasi-relative entropy between ρ and σ is given by

D_F(ρ∥σ) = ⟨I, F(L_σ, R_ρ) I⟩_HS.

This is precisely the definition used in [49]. Indeed, if we introduce the relative modular operator ∆ = L_σ R_ρ^{−1} and define f(x) := F(x, 1), then for positively homogeneous F we have F(x, y) = y f(x/y) and the above is equal to

⟨ρ^{1/2}, f(∆) ρ^{1/2}⟩_HS,

where we used the fact that R_ρ^{1/2} is self-adjoint with respect to the Hilbert-Schmidt inner product, and that R_ρ^{1/2} a = a ρ^{1/2}. Note that when ρ is not invertible the operator ∆ multiplies on the right by the generalized inverse of ρ. It can be verified that in the finite-dimensional case, the quasi-relative entropy is given by (see e.g., [50, Equation 15])

D_F(ρ∥σ) = Σ_{j,k} |⟨ψ_j|ϕ_k⟩|² F(q_k, p_j),

where ρ = Σ_j p_j |ψ_j⟩⟨ψ_j| and σ = Σ_k q_k |ϕ_k⟩⟨ϕ_k| are spectral decompositions of the density matrices ρ and σ respectively. Finally, choosing F(x, y) = y log₂(y/x) we recover the usual expression for the quantum relative entropy, and for the α-quasi-relative entropies, F(x, y) = x^{1−α} y^α, we find Q_α(ρ∥σ) = Tr[ρ^α σ^{1−α}], which is the quantity within the logarithm of the Petz-Rényi divergences [51].
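The finite-dimensional spectral formula can be checked directly against the trace expression Tr[ρ^α σ^{1−α}]. A minimal numpy sketch (the random density matrices and helper name are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_density_matrix(d):
    # Generate a generic full-rank density matrix.
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

d, alpha = 3, 0.7
rho, sigma = random_density_matrix(d), random_density_matrix(d)

# Spectral-overlap formula: sum_{j,k} |<psi_j|phi_k>|^2 F(q_k, p_j) with F(x, y) = y^a x^(1-a).
p, psi = np.linalg.eigh(rho)
q, phi = np.linalg.eigh(sigma)
overlaps = np.abs(psi.conj().T @ phi) ** 2
Q_spectral = np.sum(overlaps * (p[:, None] ** alpha) * (q[None, :] ** (1 - alpha)))

# Direct trace expression Tr[rho^alpha sigma^(1-alpha)] via the same eigendecompositions.
rho_pow = (psi * p ** alpha) @ psi.conj().T
sigma_pow = (phi * q ** (1 - alpha)) @ phi.conj().T
Q_trace = np.trace(rho_pow @ sigma_pow).real
```

The two values coincide, which is exactly the finite-dimensional content of the spectral-overlap expression.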
In order to obtain our variational expression for the quantum relative entropy, we will use a variational representation of D_{F_t}, where F_t(x, y) = y(x − y)/(t(x − y) + y) for t ∈ (0, 1]. The relevance of F_t to the relative entropy is that if we let F(x, y) = y log₂(y/x), then

F(x, y) = −(1/ln 2) ∫₀¹ F_t(x, y) dt.

Proposition 3.5. Let F_t(x, y) = y(x − y)/(t(x − y) + y) for t ∈ (0, 1], and let ρ and σ be positive semidefinite linear functionals on a von Neumann algebra A. Then

D_{F_t}(ρ∥σ) = (1/t) inf_{Z ∈ A} { ρ(I) + ρ(Z + Z*) + (1 − t)ρ(Z*Z) + t σ(ZZ*) }.   (39)

Furthermore, if t < 1, and ρ and σ are two trace-class positive semidefinite operators on the separable Hilbert space H satisfying ρ ≤ λσ for some λ ∈ R₊, then the infimum in (39) is achieved at an element with norm bounded by α := max(3/(2(1 − t)), 3λ/(2t)), i.e., we can write

D_{F_t}(ρ∥σ) = (1/t) min_{Z ∈ A, ∥Z∥ ≤ α} { ρ(I) + ρ(Z + Z*) + (1 − t)ρ(Z*Z) + t σ(ZZ*) }.   (40)

Remark 3.6. Note that a constant bound on the norm of Z in (40) can only be guaranteed when t < 1. Indeed, observe that for t = 1 we have F_1(x, y) = y(x − y)/x, and the variational expression becomes

D_{F_1}(ρ∥σ) = inf_{Z ∈ A} { ρ(I) + ρ(Z + Z*) + σ(ZZ*) }.

Assuming ρ and σ are finite-dimensional operators, the infimum can be shown to be attained at Z = −σ^{−1}ρ with an optimal value equal to Tr[ρ] − Tr[ρ²σ^{−1}]. One can see that the operator norm of σ^{−1}ρ can be made arbitrarily large even if we assume that ρ ≤ λσ (e.g., it suffices to take ρ to be a pure state).
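The integral identity F(x, y) = −(1/ln 2) ∫₀¹ F_t(x, y) dt stated above can be checked numerically for scalar arguments; a small sketch using scipy:

```python
import numpy as np
from scipy.integrate import quad

def F(x, y):
    # F(x, y) = y * log2(y / x)
    return y * np.log2(y / x)

def F_t(t, x, y):
    # F_t(x, y) = y * (x - y) / (t * (x - y) + y)
    return y * (x - y) / (t * (x - y) + y)

# Check F(x, y) = -(1 / ln 2) * integral_0^1 F_t(x, y) dt at a few points.
for x, y in [(2.0, 1.0), (0.3, 1.4), (5.0, 0.2)]:
    integral, _ = quad(lambda t: F_t(t, x, y), 0.0, 1.0)
    assert abs(F(x, y) + integral / np.log(2)) < 1e-7
```

The integrand is smooth on [0, 1] whenever x, y > 0, since t(x − y) + y stays positive there.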
Proof. Proceeding in a similar way to [45, Lemma], let A, B be two commuting positive semidefinite operators on a Hilbert space H. For any y, z with y + z = x, a simple calculation gives

⟨z, Bz⟩ + t⟨y, (A − B)y⟩ = ⟨x, tF_t(A, B)x⟩ + ⟨u, (tA + (1 − t)B)u⟩,

where u = y − (tA + (1 − t)B)^{−1}Bx (with a generalized inverse if necessary). Since tA + (1 − t)B is a positive semidefinite operator and we may choose y so that u = 0, it follows that

⟨x, tF_t(A, B)x⟩ = min_{y+z=x} { ⟨z, Bz⟩ + t⟨y, (A − B)y⟩ }.

Now we consider the positive semidefinite quadratic forms L_σ, R_ρ for the positive semidefinite functionals σ and ρ as defined in (30) and we take a corresponding representation (H, A, B, h). We get

t D_{F_t}(ρ∥σ) = min_{y+z=h(I)} { ⟨z, Bz⟩ + t⟨y, (A − B)y⟩ }.

Since h : A → H is onto a dense subset of H, we can replace the minimum with an infimum restricted to y, z ∈ H such that y = h(a) and z = h(b). Then, using the facts that ⟨h(a), Ah(a)⟩ = σ(aa*) and ⟨h(a), Bh(a)⟩ = ρ(a*a), we get

t D_{F_t}(ρ∥σ) = inf_{h(a)+h(b)=h(I)} { ρ(b*b) + t σ(aa*) − t ρ(a*a) }.

Note that if we have a₁ and a₂ satisfying h(a₁) = h(a₂), then σ(a₁a₁*) = ⟨h(a₁), Ah(a₁)⟩ = ⟨h(a₂), Ah(a₂)⟩ = σ(a₂a₂*), and similarly for ρ. As such we can replace the condition in the infimum by simply a + b = I, which gives

D_{F_t}(ρ∥σ) = (1/t) inf_{a ∈ A} { ρ(I) + ρ(a + a*) + (1 − t)ρ(a*a) + t σ(aa*) },

where on the final line we substituted b = I − a and then made the substitution a → −a. This proves (39).
To prove the second part, we first derive sufficient conditions for a₀ to achieve the infimum. Let ϕ(a) denote the expression to be minimized in (39). Note that ϕ is convex since ρ and σ are positive semidefinite (for example, the restriction of ϕ to any line a + sb is a convex quadratic in s ∈ R). If a₀ achieves the infimum of ϕ it must be that (d/ds)ϕ(a₀ + sb)|_{s=0} = 0 for any b ∈ A. This equation gives

ρ(b + b*) + (1 − t)ρ(a₀*b + b*a₀) + t σ(a₀b* + ba₀*) = 0 for all b ∈ A.   (47)

Since ϕ is convex, this condition is also sufficient for optimality. Now assume that ρ and σ are trace-class positive semidefinite operators on the separable Hilbert space H, so that ρ(a) = Tr[ρa] and σ(a) = Tr[σa]. Equation (47) says that a necessary and sufficient condition for A₀ ∈ B(H) to achieve the infimum is that

Tr[b*M] + Tr[bM*] = 0 for all b ∈ B(H), where M := ρ + (1 − t)A₀ρ + tσA₀,

which is equivalent to M = 0. For convenience, we let Z = −A₀ and arrive at the following operator Sylvester equation:

tσZ + (1 − t)Zρ = ρ.   (50)

The existence of a bounded solution to this equation is guaranteed by the following lemma.
Lemma 3.7. Let t ∈ (0, 1) and let ρ and σ be two trace-class positive semidefinite operators on the separable Hilbert space H satisfying ρ ≤ λσ for some λ ∈ R₊. Then the operator equation (50) admits a solution Z ∈ B(H) with ∥Z∥ ≤ max(3/(2(1 − t)), 3λ/(2t)).

The proof of Lemma 3.7 is given in Appendix B. This completes the proof of Proposition 3.5.
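In finite dimensions the content of Proposition 3.5 can be verified directly: the optimality condition is a Sylvester equation solvable with scipy, and the resulting optimal value can be compared against the spectral-overlap formula for D_{F_t}. The following is a sketch under our reading of the variational expression, in which the minimizer A₀ satisfies tσA₀ + (1 − t)A₀ρ = −ρ:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)

def random_density_matrix(d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

d, t = 4, 0.4
rho, sigma = random_density_matrix(d), random_density_matrix(d)

# Optimality (Sylvester) equation for the minimizer: t*sigma*A0 + (1-t)*A0*rho = -rho.
A0 = solve_sylvester(t * sigma, (1 - t) * rho, -rho)

# Evaluate the variational expression at the minimizer.
expr = (np.trace(rho) + np.trace(rho @ (A0 + A0.conj().T))
        + (1 - t) * np.trace(rho @ A0.conj().T @ A0)
        + t * np.trace(sigma @ A0 @ A0.conj().T))
val = expr.real / t

# Spectral-overlap evaluation of D_{F_t} (finite-dimensional formula).
p, psi = np.linalg.eigh(rho)
q, phi = np.linalg.eigh(sigma)
ov = np.abs(psi.conj().T @ phi) ** 2          # |<psi_j|phi_k>|^2
diff = q[None, :] - p[:, None]                # x - y with x = q_k, y = p_j
Ft = p[:, None] * diff / (t * diff + p[:, None])
target = float(np.sum(ov * Ft))
```

Here `val` and `target` agree to numerical precision, confirming that the Sylvester solution indeed attains D_{F_t}(ρ∥σ).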

Rational lower bounds on the logarithm
The quantum relative entropy is defined as the F-quasi-relative entropy for F(x, y) = y log₂ y − y log₂ x. In this section we define a sequence of rational upper bounds on F that are expressed as finite sums of the functions F_t for t ∈ (0, 1]. The natural logarithm function (written ln) has the integral representation

ln(x) = ∫₀¹ f(t, x) dt, where f(t, x) = (x − 1)/(t(x − 1) + 1).   (51)

To get a rational approximation of ln, we can discretize the integral (51) with nodes t₁, ..., t_m ∈ [0, 1] and weights w₁, ..., w_m > 0 to get a function

r(x) = Σ_{i=1}^m w_i f(t_i, x).

For example, in [52] the authors used Gaussian quadrature to choose the weights and nodes of r(x). The resulting function agrees with the first 2m + 1 derivatives (i.e., derivatives 0, ..., 2m) of the logarithm function at x = 1. However, this approximation to the logarithm is unsuitable for the current work as it is not a global lower bound on ln. It turns out, though, that if we use Gauss-Radau quadrature then we do obtain rational functions that are global lower bounds on ln. Gauss-Radau quadrature [53, p. 103] is a variant of Gaussian quadrature where one of the endpoints of the integration interval is required to be a node of the quadrature formula.
Theorem 3.8 (Gauss-Radau quadrature). Let m ∈ N. Then there exist m nodes t₁ < t₂ < ... < t_m = 1 in (0, 1] and m weights w₁, ..., w_m > 0 such that

Σ_{i=1}^m w_i p(t_i) = ∫₀¹ p(t) dt for all polynomials p of degree at most 2m − 2.   (54)

We note that such nodes and weights can be expressed in terms of properties of Legendre polynomials [53, p. 103] and can be numerically computed efficiently [54]. If we use this quadrature formula for the integral representation of the logarithm (51) we get the following rational approximation of ln:

r_m(x) := Σ_{i=1}^m w_i f(t_i, x).   (55)

The next proposition shows that the rational functions r_m are lower bounds on ln(x) that monotonically converge to ln(x).
Proposition 3.9. For any positive integer m, the function r_m satisfies

r_m(x) ≤ r_{m+1}(x) ≤ ln(x) for all x > 0, and lim_{m→∞} r_m(x) = ln(x).

Proof. See Appendix A.
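Proposition 3.9 can be checked numerically. A minimal numpy sketch that computes the Gauss-Radau nodes and weights on [0, 1] with the fixed node at t = 1 via Golub's modification of the Jacobi matrix (the function names are ours):

```python
import numpy as np

def gauss_radau(m):
    """Gauss-Radau nodes/weights on [0, 1] (Lebesgue measure, fixed node t = 1)."""
    # Three-term recurrence coefficients of the shifted Legendre polynomials.
    alpha = 0.5 * np.ones(m)
    k = np.arange(1, m)
    beta = k**2 / (4.0 * (4.0 * k**2 - 1.0))       # squared off-diagonal entries
    J = np.diag(alpha) + np.diag(np.sqrt(beta), 1) + np.diag(np.sqrt(beta), -1)
    if m > 1:
        # Golub's modification: change the last diagonal entry so that
        # 1 becomes an eigenvalue (and hence a quadrature node).
        e = np.zeros(m - 1); e[-1] = beta[-1]
        delta = np.linalg.solve(J[:m-1, :m-1] - np.eye(m - 1), e)
        J[-1, -1] = 1.0 + delta[-1]
    else:
        J[0, 0] = 1.0
    nodes, vecs = np.linalg.eigh(J)
    weights = vecs[0, :] ** 2                      # total mass of the measure is 1
    return nodes, weights

def r_m(x, m):
    # Rational lower bound on ln(x) from (55).
    t, w = gauss_radau(m)
    return np.sum(w * (x - 1.0) / (t * (x - 1.0) + 1.0))
```

One can verify that the last node is 1, that the endpoint weight equals 1/m² (the fact used in the proof of Theorem 3.11), and that r_m(x) ≤ r_{m+1}(x) ≤ ln(x) on a grid of points.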
• One can obtain quantitative bounds on the approximation error ln(x) − r_m(x), showing that the convergence is uniform on compact segments of (0, ∞). This can be proved using similar tools as [52, Proposition 6]. Furthermore, one can also show relative approximation error bounds that hold for all x > 0; see [55, Proposition 2.2] and [32] for more details.
• Since ln satisfies ln(x^{−1}) = −ln(x), it is natural to ask what r_m(x^{−1}) is. The rational functions satisfy r_m(x) = −r̃_m(x^{−1}), where r̃_m(x) is the rational function obtained by applying an m-point Gauss-Radau quadrature to (51) with an endpoint at 0 (instead of 1). The resulting function r̃_m(x) is an upper bound on ln(x) and also satisfies r̃_m(x) − ln(x) = O((x − 1)^{2m}) for x → 1. Noting that f(0, x) = x − 1, we see that r̃_m(x) is an (m, m − 1) rational function, and as such it is the corresponding Padé approximant of ln at x = 1. Many properties of r̃_m(x) are discussed in detail in [56].
• One can obtain similar rational approximations for the Petz divergences. The function x → 1 − x^{−α} is operator monotone for α ∈ (0, 1) and has an integral representation of the form

1 − x^{−α} = ∫₀¹ f(t, x) dμ_α(t),   (58)

where μ_α is a suitable positive measure on [0, 1]. To obtain rational approximations of 1 − x^{−α}, we can again apply Gauss-Radau quadrature to the integral (58), this time with respect to the measure dμ_α(t) (see e.g., [57]). This yields nodes {t_{α,i}} and weights {w_{α,i}} for i = 1, ..., m, with t_{α,m} = 1, that can be used to define a rational function r_{α,m}(x). In [32] it is shown that the functions r_{α,m}(x) satisfy r_{α,m}(x) ≤ r_{α,m+1}(x) ≤ 1 − x^{−α} for all x > 0. Furthermore, if one uses the Gauss-Radau quadrature with an endpoint fixed at t = 0 (instead of t = 1), then the resulting rational functions are upper bounds on 1 − x^{−α}. We refer to [32] for further properties of these rational functions, including bounds on the approximation error as a function of m.

Variational expressions approximating the quantum relative entropy
We now state and prove the main result of this work, a sequence of converging variational upper bounds on the relative entropy.

Theorem 3.11. Let ρ, σ be two positive semidefinite linear functionals on a von Neumann algebra A, and let D denote the relative entropy (see Definition 3.2). Then for any m ∈ N and the choice of t₁, ..., t_m ∈ (0, 1] and w₁, ..., w_m > 0 as in Theorem 3.8, we have

D(ρ∥σ) ≤ −Σ_{i=1}^m (w_i/(t_i ln 2)) [ ρ(I) + inf_{Z ∈ A} { ρ(Z + Z*) + (1 − t_i)ρ(Z*Z) + t_i σ(ZZ*) } ].

Moreover, the RHS converges to D(ρ∥σ) as m → ∞.

In the special case where A = B(H) for a separable Hilbert space H and ρ and σ are defined via trace-class operators on H (also denoted by ρ and σ) satisfying ρ ≤ λσ for some λ ∈ R₊, we can give an explicit bound on the norm of the operators appearing in the optimization:

D(ρ∥σ) ≤ (λ − 1)Tr[ρ]/(m² ln 2) − Σ_{i=1}^{m−1} (w_i/(t_i ln 2)) [ Tr[ρ] + min_{∥Z∥ ≤ α_i} { Tr[ρ(Z + Z*)] + (1 − t_i)Tr[ρZ*Z] + t_i Tr[σZZ*] } ],

where α_i = max(3/(2(1 − t_i)), 3λ/(2t_i)). Moreover, the RHS again converges to D(ρ∥σ) as m → ∞.

Remark 3.12. This result can be seen as providing computationally tractable approximations of the variational expression for the quantum relative entropy due to Kosaki [27], a variant of which is also present in the work of Donald [28]; we refer to the book [29] for more information on these expressions. For example, in Donald [28, Section 4] a related variational expression is established in which the functions a(t), b(t) : (δ, ϵ) → A are piecewise constant. Importantly, however, each one of our approximations is guaranteed to be an upper bound on the relative entropy.
Proof. The relative entropy D is defined via the quasi-relative entropy D_F for the function F(x, y) = y log₂ y − y log₂ x. Using the property (33) of ν_{ρ,σ}, we have

D(ρ∥σ) = ∫ y log₂(y/x) dν_{ρ,σ}(x, y) = −(1/ln 2) ∫ y ln(x/y) dν_{ρ,σ}(x, y).

Using Proposition 3.9, we can lower bound ln(x/y) ≥ r_m(x/y) (with r_m defined in (55) and nodes and weights as in Theorem 3.8), obtaining

D(ρ∥σ) ≤ −(1/ln 2) ∫ y r_m(x/y) dν_{ρ,σ}(x, y) = Σ_{i=1}^m −(w_i/ln 2) D_{F_{t_i}}(ρ∥σ),

where on the second line we used the quasi-relative entropies D_{F_{t_i}}(ρ∥σ) with F_{t_i}(x, y) := y(x − y)/(t_i(x − y) + y), noting that y f(t_i, x/y) = F_{t_i}(x, y). Now applying Proposition 3.5 to each D_{F_{t_i}} we get the first bound of the theorem. This concludes the proof of the upper bounds.
We now turn to the proof of convergence. Let R_m(x, y) := Σ_{i=1}^m w_i F_{t_i}(x, y). We have already shown that D(ρ∥σ) ≤ Σ_i −(w_i/ln 2) D_{F_{t_i}}(ρ∥σ) = (1/ln 2) D_{−R_m}(ρ∥σ). Furthermore, we have

(1/ln 2) D_{−R_m}(ρ∥σ) − D(ρ∥σ) = (1/ln 2) ∫ y (ln(x/y) − r_m(x/y)) dν_{ρ,σ}(x, y).

For any fixed (x, y) ∈ R²₊ we know from Proposition 3.9 that y(ln(x/y) − r_m(x/y)) converges monotonically to 0 as m → ∞. Thus, by the monotone convergence theorem we get that (1/ln 2) D_{−R_m}(ρ∥σ) − D(ρ∥σ) → 0 as desired. To obtain the second statement with explicit bounds on the operators, we use the second part of Proposition 3.5. It suffices to use the variational expression in (40) for the terms i ∈ {1, 2, ..., m − 1} instead of the general form. We get

D(ρ∥σ) ≤ −(w_m/ln 2) D_{F_1}(ρ∥σ) − Σ_{i=1}^{m−1} (w_i/(t_i ln 2)) [ Tr[ρ] + min_{∥Z∥ ≤ α_i} { Tr[ρ(Z + Z*)] + (1 − t_i)Tr[ρZ*Z] + t_i Tr[σZZ*] } ].   (67)

It now only remains to find a lower bound on D_{F_1}(ρ∥σ) using the property ρ ≤ λσ. Note that

D_{F_1}(ρ∥σ) = Tr[ρ] − Tr[ρ²σ^{−1}] ≥ (1 − λ)Tr[ρ],

which together with the fact that w_m = 1/m² leads to the desired bound.
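The convergence established above can be observed numerically in finite dimensions by combining the Gauss-Radau weights with a closed-form spectral evaluation of each D_{F_{t_i}}; a self-contained sketch (the helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def gauss_radau(m):
    # Gauss-Radau nodes/weights on [0, 1] with fixed node t = 1 (Golub's method); m >= 2.
    alpha = 0.5 * np.ones(m)
    k = np.arange(1, m)
    beta = k**2 / (4.0 * (4.0 * k**2 - 1.0))
    J = np.diag(alpha) + np.diag(np.sqrt(beta), 1) + np.diag(np.sqrt(beta), -1)
    e = np.zeros(m - 1); e[-1] = beta[-1]
    J[-1, -1] = 1.0 + np.linalg.solve(J[:m-1, :m-1] - np.eye(m - 1), e)[-1]
    nodes, vecs = np.linalg.eigh(J)
    return nodes, vecs[0, :] ** 2

def random_density_matrix(d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

d = 3
rho, sigma = random_density_matrix(d), random_density_matrix(d)
p, psi = np.linalg.eigh(rho)
q, phi = np.linalg.eigh(sigma)
ov = np.abs(psi.conj().T @ phi) ** 2                     # |<psi_j|phi_k>|^2

# Exact Umegaki relative entropy via the spectral-overlap formula.
D_exact = float(np.sum(ov * p[:, None] * (np.log2(p)[:, None] - np.log2(q)[None, :])))

def D_upper(m):
    # Upper bound -(1/ln 2) * sum_i w_i * D_{F_{t_i}}, each D_{F_t} in closed form.
    t, w = gauss_radau(m)
    x, y = q[None, :, None], p[:, None, None]            # broadcast to shape (j, k, i)
    Ft = y * (x - y) / (t[None, None, :] * (x - y) + y)
    return float(-np.sum(ov[:, :, None] * w[None, None, :] * Ft) / np.log(2))
```

As m increases, D_upper(m) decreases monotonically towards D_exact while remaining an upper bound, exactly as the proof guarantees.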

Conclusion
In this work we derived a converging sequence of upper bounds on the relative entropy between two positive semidefinite linear functionals on a von Neumann algebra. We then demonstrated how to use this sequence of upper bounds to derive a sequence of lower bounds on the conditional von Neumann entropy. The resulting optimization problems could then be relaxed to a convergent hierarchy of semidefinite programs using the NPA hierarchy. Overall this gives a computationally tractable method to compute lower bounds on the rates of DI-RE and DI-QKD protocols.
We applied our method to compute lower bounds on the asymptotic rates of various device-independent protocols. We compared the rates derived with our technique to other numerical techniques [25, 26, 36] and to known tight analytical bounds [15, 18, 19]. We found substantial improvements over the previous numerical techniques throughout, and we also demonstrated that our technique can recover all known tight analytical bounds, making it the first general numerical technique to do so and showing that the method can converge quickly. Compared with the previous numerical techniques, not only does our technique deliver higher rates but it can also be much faster, since the noncommutative polynomial optimization that we derive is of low degree and can be run without additional operator inequalities without affecting the rates in a substantial way. Furthermore, our derivation of this optimization problem was done in the general setting of bounded operators on a separable Hilbert space, allowing it to compute the rates of device-independent protocols even when the systems are infinite dimensional. This is in contrast to the previous general numerical techniques of [25, 26], which assumed finite dimensional Hilbert spaces in their analyses.
Computing key rates for DI-QKD, we also found significant improvements in the minimal detection efficiency required to generate secret key using a pair of entangled qubits. In particular, we found a minimal detection efficiency lower than 0.8, which is now well within the regime of current device-independent experiments [30, 31, 58, 59, 60]. With further effort we can hope to increase the rates of these experiments further, and we leave a thorough investigation of this question to future work.
We also demonstrated that min-tradeoff functions, necessary for the entropy accumulation theorem [9, 10], can be derived directly from our numerical computations. Therefore our technique can also be used to compute the rates of actual finite round protocols and to subsequently prove their security. For example, this opens the possibility of using the global randomness bounds (see Figure 2) for the CHSH game to improve the rates of the recent DI-RE experiments [30, 31].
There remain a few open questions from this work. It would be interesting to investigate how we could make our computations more efficient. In particular, when exploring more complex scenarios the complexity of the SDPs can grow quickly (see the difference in runtimes between Fig. 1 and Fig. 2). To combat this, one could search for other variational forms that converge faster to the von Neumann entropy. Otherwise, what are the best monomial sets to include in our relaxations, and can we exploit symmetries to reduce the size of the SDPs [61]? When computing the NPA relaxations we made various simplifications to the problem (see Remark 2.6 and the captions of the various figures); in which settings can these simplifications be made without losing tightness in the rate curves? In a different direction, one could also look at applying our bounds on the relative entropy to bound other entropic quantities.
where on the second line we used the fact that the Gauss-Radau formula (54) is exact for all polynomials of degree up to 2m − 2, which means that the terms k = 0, ..., 2m − 2 in the sum are equal to 0. Thus this gives ln(x) − r_m(x) = O((x − 1)^{2m}).
2. For any fixed x > 0 the function t → f(t, x) is continuous on [0, 1] and so can be approximated arbitrarily closely over [0, 1] by polynomials. The claim then follows from the fact that the m-point quadrature rule is exact for all polynomials of degree up to 2m − 2.
3. This item can be shown using the fact mentioned in Remark 3.10 that r_m(x) = −r̃_m(x^{−1}), where r̃_m is the (m, m − 1) Padé approximant of ln at x = 1. Indeed, it was shown in [56, Equation (27)] that the functions r̃_m satisfy r̃_m(x) − r̃_{m+1}(x) = (x − 1)^{2m} multiplied by a factor that is nonnegative for all x > 0, from which the monotonicity follows.
B Proof of Lemma 3.7 concerning the Sylvester equation (50)

We use a result of [62] to show the existence of a bounded operator Z satisfying (50). Given the positive semidefinite operators ρ and σ on H, we construct positive semidefinite operators X and K on the direct sum H ⊕ H. Observe first that both X and K are positive semidefinite operators. In addition, recalling that α = max{3/(2(1 − t)), 3λ/(2t)}, we get K ≤ αX. We are now ready to apply [62, Theorem 3.1]. Using the fact that K ≤ αX, the operator monotonicity of the square root function, and the inequality (a² + ax)^{1/2} ≤ a + x/2 for a, x ≥ 0, we obtain the required hypothesis of [62, Theorem 3.1] for any s > 0. Thus we can apply [62, Theorem 3.1] and obtain the existence of a positive semidefinite operator T on H ⊕ H with operator norm bounded by α that satisfies (73). Writing T in block form as

T = [ T₁₁  Z ; Z*  T₂₂ ],

the equality (73) implies that Z solves the Sylvester equation (50), and we have ∥Z∥ ≤ ∥T∥ ≤ α.

C The NPA hierarchy
In this section we briefly detail the NPA hierarchy which allows us to relax non-commutative polynomial optimization problems to a hierarchy of semidefinite programs.For a full exposition, we refer the reader to the original work [24].
We begin by recalling the basic definitions of noncommutative polynomial optimization problems. We consider noncommutative polynomials in the variables x = (x₁, ..., x_n) and their adjoints x* = (x₁*, ..., x_n*). A word w is a monomial constructed from the variables above, i.e., a product of the variables, and we write |w| for its length. We let A_k be the set of noncommutative polynomials that are linear combinations of words w with |w| ≤ k, and similarly we let W_k be the set of words w with |w| ≤ k. We are interested in a noncommutative polynomial optimization problem of the form

c_opt = inf ⟨ψ, p(x, x*)ψ⟩, the infimum taken over all Hilbert spaces H, unit vectors ψ ∈ H and bounded operators x₁, ..., x_n on H satisfying q_j(x, x*) ⪰ 0 for j = 1, ..., m.   (75)

Let Q = {q_j : j = 1, ..., m} be the set of positive polynomials in (75). The quadratic module M_Q is the set of all polynomials of the form Σ_i f_i* f_i + Σ_j Σ_i g_{ij}* q_j g_{ij} for noncommutative polynomials f_i, g_{ij}. If C − Σ_i x_i* x_i ∈ M_Q for some C > 0 then we say the problem is Archimedean. Note that if each of the variables x_i in the problem has an explicit bound on its operator norm, i.e., x_i* x_i ⪯ C_i and x_i x_i* ⪯ C_i, then the problem becomes Archimedean if these constraints are added to the problem. A moment relaxation of level k is defined by a positive semidefinite linear functional L : A_{2k} → C (i.e., L(f*f) ≥ 0 for all f ∈ A_k), where for f ∈ A_{2k} with f = Σ_{w ∈ W_{2k}} f_w w for some f_w ∈ C we have L(f) = Σ_{w ∈ W_{2k}} f_w L(w). These positive semidefinite linear functionals are in one-to-one correspondence with so-called moment matrices M_k of level k: M_k is a positive semidefinite matrix whose rows and columns are indexed by words in W_k and whose (v, w) entry is given by

M_k(v, w) = L(v* w).

For each q ∈ Q we also define the localizing moment matrix M_{k−d_q}^q, where d_q = ⌈deg(q)/2⌉, as the matrix whose rows and columns are indexed by words in W_{k−d_q} and whose (u, v) entry corresponds to

M_{k−d_q}^q(u, v) = L(u* q v).

We may then define the k-th level relaxation of (75) as

c_k = inf { L(p) : L(I) = 1, M_k ⪰ 0, M_{k−d_q}^q ⪰ 0 for all q ∈ Q }.

If the problem is Archimedean then the authors of [24] showed that lim_{k→∞} c_k = c_opt.
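To illustrate the moment matrix definition, a minimal sketch: we build the level-1 moment matrix generated by a concrete quantum strategy for the CHSH game (Tsirelson's optimal qubit strategy) and check that it is positive semidefinite. In an actual NPA relaxation the entries L(v*w) would be SDP variables subject only to the algebraic constraints; here they are generated by an explicit state, which always yields a feasible moment matrix:

```python
import numpy as np

# Qubit observables achieving Tsirelson's bound for CHSH.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
A0, A1 = Z, X
B0 = (Z + X) / np.sqrt(2)
B1 = (Z - X) / np.sqrt(2)

# Maximally entangled state |phi+> = (|00> + |11>) / sqrt(2).
psi = np.zeros(4, dtype=complex)
psi[0] = psi[3] = 1 / np.sqrt(2)

kron = np.kron
# Words of length <= 1: {I, A0, A1, B0, B1}, represented as operators.
ops = [kron(I2, I2), kron(A0, I2), kron(A1, I2), kron(I2, B0), kron(I2, B1)]

# Level-1 moment matrix: M(v, w) = L(v* w) = <psi| v* w |psi>, a Gram matrix.
M = np.array([[psi.conj() @ v.conj().T @ w @ psi for w in ops] for v in ops])

chsh = psi.conj() @ (kron(A0, B0) + kron(A0, B1)
                     + kron(A1, B0) - kron(A1, B1)) @ psi
```

Since M(v, w) = ⟨vψ, wψ⟩ is a Gram matrix it is automatically positive semidefinite, and the strategy attains the quantum CHSH value 2√2.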

D Additional plots
In this section we provide some additional plots that demonstrate the convergence of our technique. In particular, we recover instances of other known tight analytical bounds [18, 19] in addition to the bound of [15] shown in Figure 1.
In Figure 5 we demonstrate that our technique can be used to recover the tight analytical bound of [18] on inf H(A|X = 0, Q_E) for devices constrained to violate the Holz inequality [63] for three parties.
The Holz inequality is formulated as follows: let Alice, Bob and Charlie each have devices with binary inputs that give outputs in {1, −1}. Given a projective measurement {A_{1|x}, A_{−1|x}} for Alice on input x, let A_x = A_{1|x} − A_{−1|x} be the corresponding observable. Define observables B_y and C_z for Bob and Charlie in the same way. Finally, given a pair of observables {X₀, X₁}, define X_± := (X₀ ± X₁)/2. The Holz Bell-expression for three parties is then given in terms of these observables.
[Figure 5 caption: Comparison with the tight analytical bound of [18]. Lower bounds on H(A|X = 0, QE) for quantum devices constrained to achieve some minimal violation of the Holz inequality for three parties [63]. Numerical bounds were computed at relaxation level 2 + ABC. A single SDP takes a few seconds to run at this level.]
It has a classical bound of 1 and a quantum mechanical bound of 3/2. In the figure we see that, in a similar fashion to Figure 1, we recover the tight analytical bound at a low NPA relaxation level and rapidly in the size of the Gauss-Radau quadrature.
In Figure 6 we look at recovering instances of the family of tight analytical bounds for the asymmetric CHSH inequality [19]. The asymmetric CHSH inequality generalizes the standard CHSH inequality in the following way. Let the observables for Alice and Bob be defined in the same way as above; then the asymmetric CHSH expression with weight α ∈ R is given as

S_α = α⟨A₀B₀⟩ + α⟨A₀B₁⟩ + ⟨A₁B₀⟩ − ⟨A₁B₁⟩.

For α = 1 we recover the standard CHSH expression. The maximum classical value is max{2, 2|α|} and the maximum quantum value is 2√(1 + α²). In the figure we plot two instances of this family, α = 1.1 and α = 0.9. Similarly to the Holz inequality, we find that for both values of α we recover the analytical bound. However, the case α = 0.9 was slightly more challenging. If we use speedup (3) from Remark 2.6 then, for the chosen relaxation level and Gauss-Radau quadratures, we do not recover part of the analytical bound: with the splitting of the objective function the numerics appear unable to recover the tight rate curve. The construction of the analytical rate curve for |α| < 1 in [19] is split into two parts: a convex function g(s) is first defined which gives the rate curve for larger violations s, and the actual rate curve r(s) is then defined as

r(s) = g(s) for s ≥ s*, and r(s) = g(s*) + g′(s*)(s − s*) for s < s*,

where s* is the unique point such that the tangent line of g at s* takes the value 0 at s = 2.
Basically, the construction takes the function g as the rate curve until the tangent of g intersects the point (2, 0) (moving from high scores to low scores); the remaining rate curve is then given by this tangent function. Numerical testing indicates that it is the flat part of the curve that the split objective (speedup (3) in Remark 2.6) cannot capture. Nevertheless, at the cost of more computational time, we can still run the numerics without speedup (3), and we are able to recover all of the rate curve in this way.
[Figure 6 caption: Comparison with the tight analytical bounds of [19]. Lower bounds on H(A|X = 0, QE) for quantum devices constrained to achieve some minimal violation of the asymmetric CHSH inequality for different values of the weighting parameter α. For subfigure (a) the numerical bounds were computed at relaxation level 2 + ABZ, including all monomials present in the objective function, and implemented using speedups (1) and (3) from Remark 2.6. For subfigure (b) we used speedups (1) and (2) from Remark 2.6.]

E Entropy accumulation and min-tradeoff functions
In order to compute the rates of finite round DI-RE or DI-QKD protocols, one can bound the total conditional smooth min-entropy H^ε_min(AB|XYE) accumulated by the devices during the protocol [64]. There are two tools, the entropy accumulation theorem [9, 10] and quantum probability estimation [11], that allow one to break down this total smooth min-entropy into smaller, easier to compute quantities. For instance, the entropy accumulation theorem roughly states that in an n-round protocol

H^ε_min(AB|XYE) ≥ n f(q) − O(√n),

where f is a so-called min-tradeoff function: any function that maps probability distributions over a finite set C (see the discussion around (14)) to R such that f(q) lower bounds the single-round conditional von Neumann entropy, where the infimum defining the latter is a device-independent optimization over all strategies compatible with the average statistics (C, q) observed in the protocol.
For a fixed distribution q the rate calculations as performed in the main text will give a valid value for a potential min-tradeoff function.However, as we will now show, a single rate computation for some distribution q ′ can also be used to construct a min-tradeoff function f q ′ that can then be evaluated for any distribution q.This construction is a consequence of weak duality for semidefinite programming problems [65].
Consider the following primal and dual pair of semidefinite programs:

p(b) = inf { Tr[CX] : Tr[F_i X] = b_i for i = 1, ..., m, X ⪰ 0 },
d(b) = sup { Σ_i λ_i b_i : C − Σ_i λ_i F_i ⪰ 0 },

where b ∈ R^m is some constraint vector and C, F_i are real symmetric matrices. For the purposes of this exposition the primal problem should be thought of as the NPA moment matrix relaxation, where X is the moment matrix. The constraints Tr[F_i X] = b_i can then be any general NPA constraints, as well as any statistical constraints that may have been imposed on the devices; e.g., for some statistical test (C, q) we would impose constraints of the form Σ_{(a,b,x,y): C(a,b,x,y)=c} p(a, b, x, y) = q(c). We want to show that for any fixed b, a feasible point (λ̂, Ŷ) of the dual problem, with Ŷ = C − Σ_i λ̂_i F_i ⪰ 0, defines a function g(b) = Σ_i λ̂_i b_i that is everywhere a lower bound on the optimal primal value p(b). To see this, note that for any feasible point X of the primal problem parameterized by b we have

Tr[CX] − g(b) = Tr[(Ŷ + Σ_i λ̂_i F_i) X] − Σ_i λ̂_i b_i = Tr[Ŷ X] ≥ 0,

where we used the fact that Tr[AB] ≥ 0 when A and B are both positive semidefinite. Taking the infimum over all feasible X, we find that g(b) ≤ p(b). Thus, from a single solution to the dual problem we can derive a function g which is everywhere a lower bound on the optimal primal value. Moreover, by modifying the choice of constraint vector b with respect to which we solve the dual, we can derive different lower bounding functions.
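The weak-duality argument can be illustrated on a toy SDP whose primal value is known in closed form: with a single normalization constraint Tr[X] = b, the primal value is p(b) = b · λ_min(C) and any dual-feasible scalar λ̂ gives an affine lower bound g(b) = λ̂ b for every b. This is an illustrative sketch of ours, not the NPA SDP itself:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy instance: p(b) = inf { Tr[C X] : Tr[X] = b, X >= 0 } = b * lambda_min(C).
# Dual: sup { lam * b : C - lam * I >= 0 }, i.e. lam <= lambda_min(C).
C = rng.normal(size=(4, 4)); C = (C + C.T) / 2
lam_min = np.linalg.eigvalsh(C)[0]

# Any dual-feasible lam_hat gives g(b) = lam_hat * b <= p(b) for ALL b >= 0,
# mirroring how one dual solution yields a min-tradeoff function for all q.
lam_hat = lam_min - 0.3                      # feasible: C - lam_hat * I >= 0
for b in (0.5, 1.0, 2.0):
    p_b = b * lam_min                        # exact primal optimal value
    g_b = lam_hat * b                        # affine lower bound from the dual
    assert g_b <= p_b + 1e-12
```

The same mechanism underlies the construction below: solving the dual once at some q′ yields an affine function of the constraint vector that remains valid as the statistics q vary.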
As mentioned above, in the problems we consider the constraint vector will in general consist of constraints that are fixed and constraints that correspond to the statistical test (C, q), which vary as q varies. As such we can split our constraint vector b into (b_fixed, b_vary), where b_vary is precisely the distribution q viewed as a vector. Then, by defining α = λ_fixed · b_fixed, where λ_fixed is the part of the dual solution λ corresponding to the b_fixed part of b, we can derive a function g(q) = α + λ_vary · q, where we have written q in place of b_vary. Thus, for any primal problem corresponding to a lower bound on the rate of a protocol satisfying some statistical test (C, q), e.g., inf H(A|X = 0, E), we immediately get an affine function g(q) that satisfies g(q) ≤ inf H(A|X = 0, E) (87) which is in essence a min-tradeoff function. One could then use g(q) together with the entropy accumulation theorem to get bounds on the rates of finite round protocols and prove security in exactly the same manner as in [37].
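A minimal sketch of this splitting (every numerical value below is a hypothetical placeholder for a dual solution λ = (λ_fixed, λ_vary), not a solution of any actual SDP):

```python
# Hypothetical dual solution split into fixed and statistical parts.
lam_fixed = [0.2, -0.1]      # multipliers for the fixed NPA constraints
b_fixed   = [1.0, 0.5]       # values of the fixed constraints
lam_vary  = [1.3, -0.4]      # multipliers for the statistical test (C, q')

# Constant part alpha = lam_fixed . b_fixed.
alpha = sum(l * b for l, b in zip(lam_fixed, b_fixed))

def g(q):
    """Affine lower bound g(q) = alpha + lam_vary . q, valid for every q."""
    return alpha + sum(l * qi for l, qi in zip(lam_vary, q))
```

The point of the construction is that solving the dual once, for q′, yields an affine function g that can then be evaluated at any other distribution q without re-solving the SDP.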

F Improved bounds for noisy-preprocessing
For this section we assume that all Hilbert spaces are finite dimensional. When deriving the statement (5) we removed the m-th term in the summation by lower bounding it by the trivial value Tr[ρ] − λTr[ρ]. In the application to computing device-independent rates λ = 1 and this lower bound is 0. This bounding was done in order to guarantee that all the operators in the subsequent NPA hierarchy relaxations were explicitly bounded, which can help improve numerical stability. However, in the application to DI-QKD where we consider noisy-preprocessing, we can derive a tighter bound that depends on the bitflip probability q ∈ [0, 1/2].
Applying this to the above expression we obtain an improved lower bound on the m-th term, where we used the fact that Tr[ρ_AE^2 (I ⊗ ρ_E^{-1})] ≤ 1. We can see that this bound is tight at q = 1/2, as this implies that the Petz entropy of order 2, H_2(A|E) = −log_2 Tr[ρ_AE^2 ρ_E^{-1}], must be at least 1. This is exactly what we expect, as Alice's system at q = 1/2 is just a uniformly random bit. Similarly, at q = 0 we recover our original bound without preprocessing.
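The q = 1/2 sanity check can be verified directly: for ρ_AE = (I/2) ⊗ ρ_E, i.e., a uniformly random bit uncorrelated with Eve, the Petz entropy of order 2 equals 1. The sketch below takes a diagonal ρ_E so that all operators commute and traces reduce to sums over eigenvalues:

```python
import math

# Eigenvalues of a diagonal Eve state rho_E = diag(p, 1 - p).
p = 0.3
rho_E = [p, 1.0 - p]

# rho_AE = (I/2) (x) rho_E: each of Alice's two outcomes carries rho_E / 2.
rho_AE = [0.5 * lam for _ in range(2) for lam in rho_E]
# I (x) rho_E^{-1}: inverse eigenvalues, repeated for both of Alice's outcomes.
inv_E  = [1.0 / lam for _ in range(2) for lam in rho_E]

# Tr[rho_AE^2 (I (x) rho_E^{-1})] evaluated as a sum over joint eigenvalues.
purity_term = sum(r * r * s for r, s in zip(rho_AE, inv_E))
H2 = -math.log2(purity_term)   # Petz entropy of order 2
```

Independently of the choice of p, the purity term evaluates to 1/2 and hence H_2(A|E) = 1, consistent with the claimed tightness at q = 1/2.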
Putting everything together, this implies that the constant term c_m in Lemma 2.3 can be replaced by a tighter, q-dependent constant when computing key rates for DI-QKD protocols that include the noisy-preprocessing step.

Moreover, these lower bounds converge to H(A|X = x*, Q_E) as m → ∞.

Figure 1: Recovering the local CHSH bound. Comparison of lower bounds on H(A|X = 0, Q_E) for quantum devices that are constrained to achieve some minimal CHSH score. Numerical bounds were computed using speedups (1), (3) and (4) from Remark 2.6 at relaxation level 2 + ABZ + AZZ. A single SDP took less than one second to run.

Figure 2: Global randomness from the CHSH game. Comparison of lower bounds on H(AB|X = 0, Y = 0, Q_E) for quantum devices that are constrained to achieve some minimal CHSH score. The bounds for our technique were computed using speedups (1) and (2) from Remark 2.6. For m = 8 a single data point can take hours to run. We also compare with the iterated mean entropy H^↑_{(4/3)}(AB|X = 0, Y = 0, Q_E) from [26] and an upper bound from [36].

Figure 3: Global randomness in the 2222-scenario. Comparison of lower bounds on H(AB|X = 0, Y = 0, Q_E) for quantum devices that are constrained by a full distribution. The numerical bounds for our technique were computed using speedups (1) and (3) from Remark 2.6 at relaxation level 2 + ABZ, including all monomials present in the objective function. A single SDP takes less than a minute to run at this level. We also compare with the iterated mean entropy H^↑_{(2)}(AB|X = 0, Y = 0, Q_E) from [26] and the numerical technique from [25], which we refer to as the TSGPL bound.

Figure 4: QKD 2322 rates. Comparison of lower bounds on the key rate H(A|X = 0, Q_E) − H(A|B, X = 0, Y = 2). The curves for our technique were computed using speedups (1) and (3) from Remark 2.6 at relaxation level 2 + ABZ; a single SDP at this relaxation level takes a few seconds to run. We also compare with the iterated mean entropy H^↑_{(8/7)} from [26] and with the analytic bound for a CHSH-based DI-QKD protocol with a preprocessing step from [44,19].

Remark 3.4 (Finite-dimensional case and the relative modular operator). It is instructive to consider the setting where H is a d-dimensional Hilbert space. In this case B(H) is the algebra of d×d matrices with entries in C. Then the two positive semidefinite linear functionals ρ and σ can be represented by positive semidefinite operators ρ̄ and σ̄, i.e., ρ(a) = Tr[ρ̄a] and σ(a) = Tr[σ̄a] for all a ∈ B(H). Let L_σ̄ and R_ρ̄ be the left- and right-multiplication operators by σ̄ and ρ̄, that is, L_σ̄, R_ρ̄ : B(H) → B(H) where L_σ̄(a) = σ̄a and R_ρ̄(a) = aρ̄. Note that L_σ̄ and R_ρ̄ are two commuting operators on B(H) that are self-adjoint with respect to the Hilbert-Schmidt inner product on B(H), ⟨a, b⟩_HS = Tr[a*b].
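Both claimed properties, the commutation of the left- and right-multiplication operators and their self-adjointness with respect to ⟨·,·⟩_HS, are easy to verify numerically for small real symmetric matrices (the matrices below are arbitrary illustrative choices):

```python
def matmul(A, B):
    # Product of small square real matrices given as nested lists.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def hs_inner(A, B):
    # Hilbert-Schmidt inner product <A, B>_HS = Tr[A^T B] (real case).
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))

sigma = [[2.0, 1.0], [1.0, 3.0]]   # symmetric, plays the role of sigma-bar
rho   = [[1.0, 0.5], [0.5, 2.0]]   # symmetric, plays the role of rho-bar
a = [[0.0, 1.0], [2.0, 0.0]]
b = [[1.0, 3.0], [0.0, 1.0]]

L = lambda x: matmul(sigma, x)     # left multiplication by sigma-bar
R = lambda x: matmul(x, rho)       # right multiplication by rho-bar

# L and R commute: L(R(a)) = sigma a rho = R(L(a)).
commute_gap = max(abs(L(R(a))[i][j] - R(L(a))[i][j])
                  for i in range(2) for j in range(2))

# Self-adjointness w.r.t. <.,.>_HS, using that sigma and rho are symmetric.
sa_gap_L = abs(hs_inner(L(a), b) - hs_inner(a, L(b)))
sa_gap_R = abs(hs_inner(R(a), b) - hs_inner(a, R(b)))
```

The commutation is immediate from the associativity of matrix multiplication, and self-adjointness follows from the cyclicity of the trace together with the symmetry of σ̄ and ρ̄.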

Figure 5: Recovering the tight analytical bound of [18]. Comparison of lower bounds on H(A|X = 0, Q_E) for quantum devices that are constrained to achieve some minimal violation of the Holz inequality for three parties [63]. Numerical bounds were computed at relaxation level 2 + ABC. A single SDP takes a few seconds to run at this level.

Figure 6: Recovery of the tight analytical bounds from [19]. Comparison of lower bounds on H(A|X = 0, Q_E) for quantum devices that are constrained to achieve some minimal violation of the asymmetric CHSH inequality for different values of the weighting parameter α. For subfigure (a) the numerical bounds were computed at relaxation level 2 + ABZ, including all monomials present in the objective function, and implemented using speedups (1) and (3) from Remark 2.6. For subfigure (b) we used speedups (1) and (2) from Remark 2.6.

Lemma 2.3. Let m ∈ N and let t_1, ..., t_m and w_1, ..., w_m be the nodes and weights of an m-point Gauss-Radau quadrature on [0, 1] with endpoint t_m = 1, as specified in Theorem 3.8. Let ρ_{Q_A Q_B Q_E} be the initial quantum state shared between the devices of Alice, Bob and Eve, and let {M_{a|x*}} denote the measurement operators performed by Alice's device in response to the input X = x*. Furthermore, for i = 1, ..., m − 1 let
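For small m the quadrature appearing in the lemma can be written down explicitly; e.g., the 2-point Gauss-Radau rule on [0, 1] with fixed endpoint t_2 = 1 has nodes (1/3, 1) and weights (3/4, 1/4), and, like any m-point Gauss-Radau rule, it is exact for polynomials of degree at most 2m − 2 = 2:

```python
from fractions import Fraction as F

# 2-point Gauss-Radau rule on [0, 1] with fixed endpoint t_2 = 1
# (small-case sketch; larger m would be computed numerically).
nodes   = [F(1, 3), F(1)]
weights = [F(3, 4), F(1, 4)]

# Check exactness against int_0^1 t^k dt = 1/(k + 1) for k = 0, 1, 2.
quad  = [sum(w * t**k for t, w in zip(nodes, weights)) for k in range(3)]
exact = [F(1, k + 1) for k in range(3)]
```

Using exact rational arithmetic makes the exactness check an equality of fractions rather than a floating-point comparison.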