Device-independent certification of one-shot distillable entanglement

Entanglement sources that produce many entangled states are a central component of applications exploiting quantum physics, such as quantum communication and cryptography. Realistic sources are inherently noisy, cannot run for an infinitely long time, and do not necessarily behave in an independent and identically distributed manner. An important question then arises: how can one test, or certify, that a realistic source produces high amounts of entanglement? Crucially, a meaningful and operational solution should allow us to certify the entanglement which is available for further applications after performing the test itself (in contrast to assuming the availability of an additional source which can produce more entangled states, identical to those which were tested). To answer the above question and lower bound the amount of entanglement produced by an uncharacterised source, we present a protocol that can be run by interacting classically with uncharacterised (but not entangled with one another) measurement devices used to measure the states produced by the source. A successful run of the protocol implies that the remaining quantum state has high amounts of one-shot distillable entanglement. That is, one can distill many maximally entangled states out of the single remaining state. Importantly, our protocol can tolerate noise and can thus certify entanglement produced by realistic sources. With the above properties, the protocol acts as the first 'operational device-independent entanglement certification protocol' and allows one to test and benchmark uncharacterised entanglement sources which may otherwise be incomparable.


Introduction
Entanglement is one of the most fundamental concepts of quantum physics, distinguishing it from classical physics [HHHH09]. Furthermore, it plays a crucial role in the advantages gained by considering applications of quantum physics such as quantum computation [Wil10], communication [DW02], and cryptography [BS16].
For most applications utilising entanglement, a single entangled pair of particles, e.g. a maximally entangled state $|\Phi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, is not sufficient. Instead, one must use many copies of entangled states or, to put it differently, a highly entangled state, such as the maximally entangled state $|\Phi_L\rangle = \frac{1}{\sqrt{L}}\sum_{i=1}^{L}|i\rangle|i\rangle$ of rank L. Sources that produce high amounts of entanglement are, thus, a prerequisite for gaining from the computational and cryptographic advantages that quantum physics and quantum information unveil.
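To make the rank-L maximally entangled state concrete, here is a small numpy check (an illustration we add, not part of the original text) that $|\Phi_L\rangle$ carries exactly $\log_2 L$ ebits, i.e. the entanglement entropy of its reduced state is $\log_2 L$:

```python
import numpy as np

def max_entangled(L):
    """|Phi_L> = (1/sqrt(L)) * sum_i |i>|i>, as a vector in C^L (x) C^L."""
    psi = np.zeros(L * L)
    for i in range(L):
        psi[i * L + i] = 1.0
    return psi / np.sqrt(L)

def entanglement_entropy(psi, L):
    """Von Neumann entropy of the reduced state on the first subsystem."""
    # Reshape the bipartite vector into an L x L coefficient matrix;
    # its squared singular values are the Schmidt coefficients.
    coeffs = psi.reshape(L, L)
    schmidt = np.linalg.svd(coeffs, compute_uv=False) ** 2
    schmidt = schmidt[schmidt > 1e-12]
    return float(-np.sum(schmidt * np.log2(schmidt)))

# |Phi_L> carries log2(L) ebits of entanglement
for L in (2, 4, 8):
    assert abs(entanglement_entropy(max_entangled(L), L) - np.log2(L)) < 1e-9
```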
Three interesting questions then arise. Firstly, how should one quantify the amount of entanglement produced by a source? Secondly, how can one compare, or benchmark, different types of entanglement sources? Thirdly, how can one test, or certify, that high amounts of entanglement are indeed being produced by the source? Importantly, we would like to answer these questions in an operational way. That is: (a) the suggested answer should be relevant for realistic sources and experimental settings; (b) when considering possible certification procedures, the statement should apply to the entanglement which is still present after the test, rather than the entanglement which was already used and, hence, destroyed by the certification procedure itself.
The current work is concerned with certifying entanglement in such an operational way. Realistic sources are inherently noisy: they produce at best noisy entangled states. Moreover, a source creating many entangled pairs might produce systems that are correlated with one another due to, e.g. drifting of the source with time or the presence of a memory inside the source. In other words, the overall state produced by the source (after using it many times) is not an independent and identically distributed (IID) state. We call such sources noisy non-IID sources.
Clearly, in order to make any estimation of the quality of the source one must collect some data regarding the produced entanglement by performing certain measurements on the quantum states produced by the source. Similarly to the considerations regarding the source, the measurement devices might also behave in a noisy and non-IID manner.
1.1. Device-independent entanglement certification (DIEC)
We fix as a target to answer our questions while accommodating the noisy non-IID nature of the problem in the so-called device-independent (DI) framework [Sca12]. In the DI approach one treats the quantum apparatuses as black boxes with which we can only interact classically. That is, we assume no prior knowledge regarding the internal behaviour of the source and the measurement devices. Assumptions are made, however, on the relation that certain devices can have with each other, and on the communication allowed between them. Concretely, we only interact with those devices by 'pushing buttons' and collecting the classical data output by them. For example, we may push a button on the source apparatus to produce a state and then another button on the measurement device to ask it to perform a measurement with 'input' 0 or 1. The state produced by the source is uncharacterised and we do not know which measurements are actually being performed when we use the inputs 0 and 1.
As widely known, the only way to demonstrate that the actions of physical devices cannot be explained by classical physics in a DI manner is to perform some Bell tests using the devices and observe a violation of the considered Bell inequality [Bel64,BCP+14]. A Bell violation acts as a 'witness' attesting to the quantum (in fact, non-local) nature of the devices used to violate the inequality. This can then be used to derive conclusions regarding, e.g. the structure of the underlying quantum states [SW87,PR92,CGS16] or the randomness of the measurements' outcomes [Col09,VV12].
We remark that by using the DI method we do not only treat imperfections in the quantum apparatuses which are known or can be characterised in advance. The DI approach allows one to derive conclusions without making assumptions regarding the types of imperfections. Even more drastically, the devices can be assumed to be 'malicious'4; as long as a violation of a Bell inequality is observed, the observer, or verifier, can be sure of the quantum nature of the systems without placing significant trust in the manufacturer of the devices.
Most previous works, both theoretical and experimental, that can be seen as DIEC procedures work only under the IID assumption and thus fail to be operational in the sense defined above5. This includes tests concerned with the demonstration of entanglement via an entanglement witness (i.e. answering a yes-no question) [GT09,BBS+13,Ban14], as well as more quantitative analyses of entanglement measures such as the negativity and dimension witnesses [MBL+13]. In all of these works, the focus is on relating properties of a single multipartite state to some asymptotic statistics. An application of these works in an experimental setting is then most straightforwardly obtained under the assumption that the experiment consists of a repetition of identical rounds in which the same state is produced independently each time, i.e. under the IID assumption. The same assumption is inherent to claims regarding the amount of remaining entanglement which was not consumed by the estimation procedure: such claims rely on the assumption that more states, identical to those which were used for the testing phase, can be created by the source.
Another line of recent works deals with self-testing of high dimensional entangled states [McK16,CRSV16,Col17,CN16,NV17,CS17]. The goal of such works is more ambitious than entanglement certification; they aim to quantify the distance of the state used to perform the relevant tests from some specific target state, e.g. $|\Phi_L\rangle$ (up to local isometries, which cannot be excluded in the DI setting). Of course, once such a bound is derived it also implies lower bounds on various (continuous) entanglement measures.
A crucial disadvantage of these works is that they are not noise-tolerant in any realistic sense and therefore are not adequate when dealing with noisy sources of entanglement. For example, IID noisy sources, which produce many IID copies of a noisy entangled state, e.g. the two-qubit Werner state $\sigma_x = (1-x)|\phi^+\rangle\langle\phi^+| + x\,\frac{\mathbb{1}}{4}$ for some constant (i.e. independent of the number of copies being created) noise value $x \in [0,1]$, pass the considered self-tests only with negligible probability. Thus, no conclusion regarding the amount of entanglement produced by such IID noisy sources can be drawn from these tests.

Definition 1 (Asymptotic IID distillable entanglement). The asymptotic IID distillable entanglement of a bipartite state $\sigma$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ is given by

$E^{\infty}_{D}(\sigma) = \lim_{\varepsilon \to 0} \lim_{n \to \infty} \max_{\Gamma} \left\{ \frac{\log L}{n} : F\!\left(\Gamma(\sigma^{\otimes n}), |\Phi_L\rangle\langle\Phi_L|\right) \geq 1-\varepsilon \right\}, \qquad (1)$

where Γ is an LOCC map (with respect to the bipartition of σ) and F is the fidelity.

Equation (1) describes a scenario in which one starts with n independent copies of σ and requires that, as n goes to infinity, the error of the distillation protocol goes to zero. This explains why we call it here the asymptotic IID distillable entanglement, and not simply the distillable entanglement as it is usually called in the literature. However, as we claimed above, the sources that we consider do not necessarily produce IID states $\sigma^{\otimes n}$ (and they definitely do not emit $n \to \infty$ entangled states). Hence, $E^{\infty}_{D}$ does not truly quantify the entanglement produced in our scenario.
For our purpose, a more suitable entanglement measure is the one-shot distillable entanglement [BD10]. In the one-shot scenario Alice and Bob share a single copy of a bipartite state ρ and their goal is to convert it to the maximally entangled state $|\Phi_L\rangle$, for the maximal possible value L, using only LOCC. We say that the distillation protocol is successful when the resulting state is ε-close to $|\Phi_L\rangle$ for some fixed ε. To state the definition of the one-shot distillable entanglement in a way comparable to its asymptotic IID counterpart $E^{\infty}_{D}$, we consider $\rho$ on $\mathcal{H}_A^{\otimes n} \otimes \mathcal{H}_B^{\otimes n}$ for a fixed n and some Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$ (while ρ itself does not necessarily have the IID form $\sigma^{\otimes n}$). We then identify the one-shot distillation rate as $r = \frac{\log L}{n}$.7 We can now use the following definition:

Definition 2 (One-shot distillable entanglement). The one-shot distillable entanglement of a bipartite state $\rho$ on $\mathcal{H}_A^{\otimes n} \otimes \mathcal{H}_B^{\otimes n}$ is given by

$E^{\varepsilon}_{D}(\rho) = \frac{1}{n} \max_{\Gamma} \left\{ \log L : F\!\left(\Gamma(\rho), |\Phi_L\rangle\langle\Phi_L|\right) \geq 1-\varepsilon \right\},$

where Γ is an LOCC map (with respect to the bipartition of ρ) and F is the fidelity.
$E^{\varepsilon}_{D}$ describes the number of maximally entangled states which can be extracted, using LOCC, from a single copy of an arbitrary bipartite state ρ, while allowing for some error ε. Hence, it captures the amount of entanglement available in ρ when using it as a resource in quantum information processing tasks.
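As a concrete illustration of the noisy-source discussion above, the following numpy sketch (our addition; helper names are ours) builds the two-qubit Werner state and verifies two textbook facts used implicitly in the text: its overlap with $|\phi^+\rangle$ is $1 - 3x/4$, and its CHSH value under the optimal measurements is $2\sqrt{2}(1-x)$.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=float)
X = np.array([[0, 1], [1, 0]], dtype=float)

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)          # |phi+> = (|00>+|11>)/sqrt(2)
phi_proj = np.outer(phi, phi)

def werner(x):
    """Two-qubit Werner state (1-x)|phi+><phi+| + x * I/4."""
    return (1 - x) * phi_proj + x * np.eye(4) / 4

# CHSH operator for the optimal measurements: A0 = Z, A1 = X,
# B0 = (Z+X)/sqrt(2), B1 = (Z-X)/sqrt(2).
B0 = (Z + X) / np.sqrt(2)
B1 = (Z - X) / np.sqrt(2)
chsh = np.kron(Z, B0 + B1) + np.kron(X, B0 - B1)

for x in (0.0, 0.1, 0.3):
    s = werner(x)
    beta = np.trace(s @ chsh).real
    # Werner state gives CHSH value 2*sqrt(2)*(1-x) ...
    assert abs(beta - 2 * np.sqrt(2) * (1 - x)) < 1e-9
    # ... and overlap with |phi+> equal to 1 - 3x/4
    assert abs(phi @ s @ phi - (1 - 3 * x / 4)) < 1e-9
```

Note that for any constant noise $x > 0$ the CHSH value stays bounded away from the Tsirelson bound $2\sqrt{2}$, which is why exact self-tests are failed with overwhelming probability.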
1.2.1. Structure of the paper
The following sections are arranged as follows. In section 2 we present our contribution: our definition of an operational DIEC, the considered setting, and our results (the protocol and achieved rates). One can find all the necessary preliminary information and notation in section 3. Section 4 is devoted to presenting the main steps of the proof, and section 5 includes several remaining open questions. All the technical details of the proofs can be found in the appendix.

6 Note that this issue is inherent to the distance measures used in all self-testing works (i.e. their objective), rather than some non-optimal properties of the specific tests considered in the mentioned works.
7 For a given ρ there are different ways of choosing n, $\mathcal{H}_A$ and $\mathcal{H}_B$ (the value of $E^{\varepsilon}_{D}$ is independent of these choices). Later on there will be no ambiguity regarding the value of n and hence the used definition will be the one most relevant for us.

Our contribution: operational DIEC
After setting the stage in the previous section, we are now ready to state the objective of the current work, namely DI certification of one-shot distillable entanglement, and our results. We start in section 2.1 by introducing and motivating our definition of an operational DIEC protocol. We then explain in section 2.2 the exact setting of the source and measurement devices considered in our work. In section 2.3 we present our results.
2.1. The goal
Let us start by defining explicitly what we mean by a DIEC protocol and, by this, set the goal of this work. Given an uncharacterised source of entanglement producing n bipartite systems, globally described by the state $\phi$ on $\mathcal{H}_A^{\otimes n} \otimes \mathcal{H}_B^{\otimes n}$ for some (unknown) Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$, and (at least) two measurement devices not entangled with one another but otherwise uncharacterised, our goal is to find a DIEC protocol, employing only LOCC, that certifies that φ is highly entangled in a meaningful operational way. The certification protocol is going to act on φ, using the measurement devices, and we would like to claim (roughly) that, if the protocol does not abort, then the final state has a high amount of one-shot distillable entanglement.
More precisely, we define a DIEC as follows. There are several important remarks to make regarding the above definition.
1. One possible example of an honest source is a source producing n independent copies of the Werner state $\sigma_x = (1-x)|\phi^+\rangle\langle\phi^+| + x\,\frac{\mathbb{1}}{4}$ for some maximal value of x > 0. We can then define $\mathcal{S}_{\mathrm{honest}} = \{\sigma_x^{\otimes n}\}$, and $\mathcal{D}_{\mathrm{honest}}$ can be defined to include, e.g., the measurement devices that apply the optimal measurements performed in the CHSH game.
2. The noise-tolerance property, also termed completeness, states that the protocol should not abort, with high probability, for any $\phi \in \mathcal{S}_{\mathrm{honest}}$ and devices in $\mathcal{D}_{\mathrm{honest}}$, even though some noise x > 0 is present. This implies (in combination with the soundness property) that P is able to certify the entanglement produced by the honest source. Of course, the honest sets $\mathcal{S}_{\mathrm{honest}}$ and $\mathcal{D}_{\mathrm{honest}}$ can be chosen in any way one wishes, depending on the experimental setting one has in mind. For instance, they could include states with a different noise model than the Werner state. In most cases the manufacturer of the entanglement source (the experimentalist) has some 'guess' for a realistic description of the source and measurement devices. In most applications, these define the sets that should be chosen as the honest sets.
3. To be able to quantify the entanglement produced by the source itself, rather than the entanglement that could be present inside the measurement devices, we need to assume that the measurement devices were not entangled before the start of the protocol. This situation is somewhat similar to the requirement for measurement independence in a Bell test: if the measurement settings used in a Bell test are chosen by a device correlated with the source, then quantum statistics can be reproduced by local models [Hal11,BG11]. In practice, measurement independence is guaranteed by assuming that the devices used to produce the measurement settings behave independently from the rest of the setup. Similarly, we could assume that the measurement device of each party is independent from all other devices involved in the protocol. However, this hypothesis is stronger than necessary. We thus make the lighter assumption that the measurement devices share no entanglement with each other at the beginning of the protocol.
4. Only LOCC protocols should be considered as DIEC protocols since we would like to certify the distillable entanglement produced by the source and not the entanglement that may be produced by a protocol employing operations that cannot be explained via LOCC 10 .
5. In a way, the definition of a DIEC can be seen as an extension of the so called SWAP technique [BNS+15], used in self-testing works, to the non-IID setting. Roughly speaking, by constructing the SWAP operators one can claim that a state close to, e.g. the maximally entangled state, up to local isometries can be 'extracted' out of the uncharacterised devices. The construction of a DIEC protocol implies that one can extract a state close to many maximally entangled states using LOCC.
6. The above definition allows us to use DIEC protocols as a way to compare different entanglement sources which are otherwise incomparable. For example, one can consider two sources developed by different experimental groups. One source produces, say, many identical copies of the Werner state while the other creates many identical copies of perfect partially entangled states. Each group is free to choose a DIEC protocol, as in definition 3, that will result in the highest lower bound on the distillable entanglement produced by its source. If one group wishes to claim that its source is 'better', then the certified distillation rate of their source (which can be verified by any user) must be higher than that of the competing source.
The protocols allow us to benchmark one source against the other in a meaningful way by allowing any user to verify a claimed lower bound on the produced entanglement available for further applications.

The setting: source and measurement devices
Here we describe the theoretical setting considered in our work, to which our results apply, i.e. under which we will prove the soundness of our DIEC protocol. Different variants of this setting can be chosen depending on one's interest; we choose the presented one as we believe it is both realistic and simple to discuss. This setting is compatible with the standard assumptions used in the DI setting (e.g. when testing the CHSH Bell inequality in DI quantum key distribution or randomness generation protocols). In particular, we employ the following standard assumptions: the measurement devices are separated in space and the verifier can restrict their communication in different stages of the protocol; the verifier holds a trusted random number generator and is able to make the basic classical calculations required to run the protocol. In addition, we assume that quantum physics is correct.
2.2.1. The theoretical setting: source, measurement devices, and quantum registers
In order to be able to talk about the entanglement available after running the DIEC protocol we must have a well-defined state at hand, whose entanglement we are quantifying. For this, we consider a theoretical setting which fulfils the following two conditions. First, the production of any entangled states (if such are being created) can be attributed only to the source. In other words, we assume that the measurement devices neither produce nor hold additional entanglement. This implies a distinction between the source and the measurement devices.
Second, the state produced by the source, or the 'post-protocol state', can be kept in some registers, i.e. quantum memory. This is necessary at the theoretical level since we wish to discuss the remaining entanglement in a meaningful way. The registers are trusted, in the sense that they cannot be manipulated by the devices after they are accessed during the protocol. Indeed, if the devices are allowed to, e.g. measure the registers in which the final state is being kept then, clearly, one cannot say anything regarding the entanglement left in the system11.

10 Another option may be to consider protocols that employ only separability-preserving operations; see [Rai97,CDKL01,BD11a].
11 This also fits the distinction we made above between the source and the measurement devices. If the entanglement is produced by the measurement devices and is kept 'inside of them', then they can also destroy whatever is left at the end of the protocol.
Note that by considering the one-shot distillable entanglement we are already hinting that one should be able to apply an entanglement distillation protocol on the certified state (at least in theory). To apply such a protocol, the state must be available somewhere so it can be manipulated. Defining some quantum registers in which the state is being kept is therefore necessary in our context. The quantum registers are merely a theoretical tool, they are not needed in an experimental implementation of our DIEC protocol.
The remainder of the current section is devoted to explaining precisely the setting, fulfilling the above conditions, that we consider in the current work. The setting is illustrated in figure 1.
The entity wanting to certify the entanglement produced by the source is called the verifier. To run the DIEC protocol the verifier holds two separated (space-like or otherwise shielded) measurement devices. As we consider two measurement devices we can also treat the verifier as two parties, Alice and Bob (both the 'verifier' and 'Alice and Bob' are used below, depending on the context).
We consider an uncharacterised source emitting a sequence of entangled quantum systems. (For example, one can imagine a source that emits pairs of entangled photons one after the other.) We denote by $n \in \mathbb{N}_+$ the number of systems produced by the source, i.e. the number of times the source is used during the DIEC protocol. For every $i \in [n]$, the source produces some unknown state $\phi_i$ that is kept in the memory. The verifier then chooses whether to measure the state using his measurement devices or not. If he measures the state, then the classical inputs given to the devices and the outputs produced by them are kept in a classical memory. If $\phi_i$ is not measured, then it is kept as is in a quantum memory. The next state produced by the source, $\phi_{i+1}$, can depend on all of the previous classical information, i.e. which of the previous states were measured and what the inputs and outputs in those rounds were.
Note that while $\phi_{i+1}$ can depend on whether a test was made in a previous round $j < i+1$, the source and the measurement devices have no further access to the quantum states $\phi_j$ which were not measured. That is, we assume that each $\phi_j$ which was not measured is not affected by the devices after the jth round. Again, this is necessary, as otherwise the remaining entanglement could be destroyed by the devices. We emphasise that this does not imply that the systems of the different rounds cannot be entangled with one another.
We use the above model in our protocol and its analysis. However, we remark that the protocol and analysis can be adapted to capture other models with a sequential structure.

Our DIEC protocol and the achieved rates
Our DIEC protocol is presented as protocol 1. It is based on the CHSH inequality (see section 3.2 for the necessary basic information). Similar protocols can also be considered for other Bell inequalities.

Protocol 1. DIEC protocol (based on the CHSH inequality)
Arguments:
D: untrusted measurement device of two components, with input and output sets $\{0,1\}$.
$n \in \mathbb{N}_+$: number of rounds.
γ: the probability of a test.
$\omega_{\mathrm{exp}}$: expected winning probability in the CHSH game.
$\delta_{\mathrm{est}} \in (0,1)$: width of the statistical confidence interval for the estimation test.
1: For every round $i \in [n]$ do Steps 2-10:
2: Let $\phi_i$ denote the bipartite state produced by the source in this round.
As mentioned in the previous section, we consider a source which produces a sequence of bipartite states, denoted by $\phi_i$ for $i \in [n]$. The verifier chooses whether to measure $\phi_i$ or keep it as is in the memory. The register $T_i$ describes whether a test was performed or not. If a test is performed, the registers $X_i Y_i A_i B_i$ hold the classical inputs and outputs. The register $W_i$ is set to 1 when the CHSH game is won in the ith round and to 0 otherwise. When a test is not performed, the state $\phi_i$ is kept in the quantum registers $\hat{A}_i \hat{B}_i$. We allow the source to 'know' all the classical information of the previous rounds, i.e. $(TABXYW)_{1,\dots,i-1}$, and hence $\phi_i$ can also depend on this information.
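The full steps of protocol 1 are not reproduced above; the following toy simulation is our own sketch of the round structure just described (test with probability γ, otherwise keep the state in memory), not the paper's actual protocol. In particular, the abort rule, comparing the observed winning frequency against $\omega_{\mathrm{exp}} - \delta_{\mathrm{est}}$, is our assumption modelled on the listed arguments, and `win_prob` stands in for the devices' unknown behaviour.

```python
import random

def run_diec_protocol(n, gamma, omega_exp, delta_est, win_prob):
    """Toy simulation of the DIEC round structure: in each round the verifier
    tests with probability gamma; otherwise the state is kept unmeasured.
    win_prob models the (unknown) probability that the devices win a test."""
    tests = wins = 0
    kept_rounds = []                        # indices of unmeasured states
    for i in range(n):
        if random.random() < gamma:         # T_i = 1: CHSH test round
            tests += 1
            if random.random() < win_prob:  # W_i = 1 when the game is won
                wins += 1
        else:                               # T_i = 0: keep phi_i in memory
            kept_rounds.append(i)
    # Abort if the observed winning frequency is below the threshold
    observed = wins / tests if tests else 0.0
    abort = observed < omega_exp - delta_est
    return abort, kept_rounds

random.seed(0)
abort, kept = run_diec_protocol(n=10_000, gamma=0.1, omega_exp=0.8,
                                delta_est=0.02, win_prob=0.85)
assert not abort           # an honest, near-optimal device should pass
assert len(kept) > 8_000   # most states remain unmeasured in the registers
```

The point of the sketch is only the bookkeeping: the entanglement claim of the paper concerns the unmeasured registers indexed by `kept_rounds`.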
We denote the state at the end of protocol 1, before Step 11, by $\rho_{\hat{A}\hat{B}ABXYTW}$, and by $\hat{\Omega}$ the event of not aborting the protocol. We denote the state after the end of the protocol, conditioned on not aborting, by $\rho_{\hat{A}\hat{B}ABXYTW|\hat{\Omega}}$. We define the bipartition of ρ and $\rho_{|\hat{\Omega}}$ as the one separating Alice's registers from Bob's. It is easy to see that for this bipartition protocol 1 employs only LOCC for any measurement devices in our setting (i.e. initially non-entangled devices that do not communicate during the time of the measurement).
Our main result states that protocol 1 is indeed an operational DIEC protocol. That is, it fulfils the requirements of definition 3.
As the honest source and devices we choose to consider a source that produces identical and independent copies of a state $\phi_i = \sigma$ and measurement devices that apply the same measurements in each round when they are used. The state σ and the measurements are such that the winning probability achieved in the CHSH game is at least $\omega_{\mathrm{exp}}$. For example, one can choose $\mathcal{D}_{\mathrm{honest}}$ to include the measurement devices that apply the optimal measurements performed in the CHSH game and the set $\mathcal{S}_{\mathrm{honest}}$ to include all states $\phi_{\mathrm{honest}} = \sigma^{\otimes n}$ for σ any noisy maximally entangled state that will result in winning probability $\omega_{\mathrm{exp}}$.
The following theorem states our main result.

Theorem 4. Protocol 1 is an operational DIEC protocol. In particular:
1. Noise tolerance (completeness): protocol 1 does not abort, with high probability, when applied to the honest source and measurement devices described above.
2. Entanglement certification (soundness): for any source and measurement devices in the considered setting, either protocol 1 aborts with probability greater than $1-\varepsilon_{\mathrm{snd}}$ when applied on φ, or the one-shot distillable entanglement of the remaining state is lower bounded by the certified rate r, where $\eta_{\mathrm{opt}}$ is defined in equation (14).
The exact bound on the certified distillation rate r is not very informative, so we postpone discussing its explicit form to later. Instead, we plot the distillation rate in figure 2 for some choices of parameters. Clearly, the optimal distillation rate is upper bounded by 1 and hence $\log L \in \Omega(n)$, as achieved in our work, is optimal in terms of the asymptotic dependency on n.
To discuss the tightness of the derived rates more concretely we first need to say a few words about our proof technique. To prove theorem 4 we use two independent results. The first is a lower bound on the one-shot distillable entanglement in terms of the negative conditional smooth max-entropy, $-H^{\varepsilon}_{\mathrm{max}}$. This allows us to reduce the task of proving theorem 4 to that of (upper) bounding $H^{\varepsilon}_{\mathrm{max}}$. The main tool used to derive a bound on the smooth max-entropy is the entropy accumulation theorem (EAT) [DFR16].
Our bound on $H^{\varepsilon}_{\mathrm{max}}$ is tight to first order in n and hence cannot be significantly improved. In particular, this means that the regime of Bell violations $\omega_{\mathrm{exp}} \lesssim 0.775$, in which our distillation rate is zero although the verifier observes a violation, cannot be improved by better bounding $H^{\varepsilon}_{\mathrm{max}}$. Qualitatively, it is not surprising that such a regime exists. Indeed, it was already shown that, in general, Bell non-locality is fundamentally different from distillable entanglement (this stands in contrast to the so-called Peres conjecture). That is, there are bound entangled states (i.e. entangled states which cannot be distilled) that can be used to violate some Bell inequalities [VB14]. For the CHSH inequality, however, this is not the case [Mas06]: bound entangled states cannot be used to violate the CHSH inequality. Hence, asymptotically, one should be able to certify distillable entanglement for any violation $\omega > 0.75$. This implies that, although our bound on $H^{\varepsilon}_{\mathrm{max}}$ is tight, the relation between the smooth max-entropy and the one-shot distillable entanglement is not; it only gives a lower bound on the amount of distillable entanglement, but it should be possible to distill more (see section 5 for more details). The conditional von Neumann entropy is negative only when evaluated on entangled states. Thus, an upper bound on it can be seen, by itself, as a quantitative certificate of entanglement (though not in our operational sense). Such a bound might be of independent interest in other contexts.

von Neumann entropy as a function of the CHSH violation
Inspired by [PAB+09], in which a 'dual quantity'12 was bounded, we derive a bound for Bell diagonal states. We prove the following:

(Figure 2 caption: all other values were chosen so that the distillable entanglement rate is maximised; the dashed curve describes the distillable entanglement rate which can be certified in the IID asymptotic case.)

12 The quantity bounded in [PAB+09] is the entropy conditioned on the adversary, where A is the measurement outcome on Alice's side (when measuring $\hat{A}_i$ in our notation) and E describes a system used to purify $\sigma_{\hat{A}_i\hat{B}_i}$. While the results are closely related, lemma 5 does not follow from [PAB+09].
Lemma 5. For any Bell diagonal state $\sigma_{\hat{A}_i\hat{B}_i}$ that can be used to violate the CHSH inequality with winning probability $\omega \in (3/4, (2+\sqrt{2})/4]$, the conditional von Neumann entropy $H(\hat{A}_i|\hat{B}_i)_\sigma$ satisfies the bound given in equation (4), where h is the binary entropy function.
The bound given in equation (4) is plotted in figure 3. The bound is tight, i.e. there exist states that saturate it. As mentioned above, the 'interesting' regime of the bound is that in which the conditional entropy is negative, i.e. $\omega \gtrsim 0.775$ ($\beta \gtrsim 2.2$). A similar bound, but for the conditional max-entropy, was derived in [AFRV98]; as the max-entropy upper bounds the von Neumann entropy, their bound can also be used to upper bound the von Neumann entropy. However, using our bound directly leads to better quantitative results. In particular, the bound derived in [AFRV98] only leads to a negative upper bound for $\beta \gtrsim 2.5$.
The observation that there exists a regime in which the conditional entropy is positive although the CHSH inequality is violated is not new. Indeed, it was already known that some states, e.g. the Werner state, can be used to violate the CHSH inequality while presenting positive conditional entropy [FBB17].
The fact that the bound(4) on the conditional entropy is negative as soon as the winning probability is larger than w ⪆ 0.775 allows our scheme to certify entanglement for honest implementations based only on their CHSH winning probability. In particular, the honest implementation might be different than the Werner state we considered above as an example. At the same time, since the minimum winning probability required to obtain a useful bound on the conditional entropy is larger than 0.75, a significant Bell violation is required. This has implications on the critical detection efficiency of the scheme: whereas a violation of the CHSH Bell inequality is possible as soon as the detection efficiency is larger than 66.7% [Ebe93], the minimal detection efficiency required to reach a CHSH value of 2.2 is85.3%.

Preliminaries
3.1. General notation
The set $\{1, 2, \dots, n\}$ is denoted by [n]. All logarithms are in base 2. Random variables (RVs) are denoted by capital letters, while specific values are denoted by small letters. Sets are denoted with calligraphic fonts. For example, we use $X_i$ to denote a RV over $\mathcal{X}$ and $x_i$ to denote a certain value $x_i \in \mathcal{X}$. Most RVs will refer to a specific $i \in [n]$, denoted in their subscript, as in $X_i$ above. RVs describing a range between i and j are denoted by $X_{i,\dots,j} = X_i, \dots, X_j$. When no subscript appears, the range is from 1 to n, i.e. $X = X_1, \dots, X_n$.
The fidelity of two quantum states is given by $F(\rho, \sigma) = \Vert \sqrt{\rho}\sqrt{\sigma} \Vert_1$.

The CHSH inequality and game
In a bipartite scenario where each party, Alice and Bob, can perform one of two possible measurements, indexed by $x, y \in \{0,1\}$, and obtain one of two possible outcomes, $a, b \in \{0,1\}$, a Bell inequality is a linear constraint on the conditional probability distributions $p(ab|xy)$ which is satisfied by all local hidden variable models [BCP+14]. One Bell inequality of special interest is the so-called Clauser-Horne-Shimony-Holt (CHSH) inequality [CHSH69], which takes the form

$\beta = \langle A_0 B_0 \rangle + \langle A_0 B_1 \rangle + \langle A_1 B_0 \rangle - \langle A_1 B_1 \rangle \leq 2.$

This inequality admits a maximum quantum violation of $\beta = 2\sqrt{2}$. To achieve this maximal violation Alice and Bob can share the maximally entangled state $|\Phi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ and measure it with the following measurements: Alice's measurements $X_i = 0$ and $X_i = 1$ correspond to the Pauli operators $\sigma_z$ and $\sigma_x$ respectively, and Bob's measurements $Y_i = 0$ and $Y_i = 1$ correspond to $\frac{\sigma_z + \sigma_x}{\sqrt{2}}$ and $\frac{\sigma_z - \sigma_x}{\sqrt{2}}$ respectively. We can therefore restrict our attention to $\beta \in [2, 2\sqrt{2}]$. The CHSH inequality can be equivalently expressed in terms of a game. The game is defined via the winning condition $a \oplus b = x \cdot y$. The optimal quantum strategy achieves a winning probability of $\frac{2+\sqrt{2}}{4} \approx 0.85$, while the optimal classical strategy achieves a winning probability of 0.75. The relation between the winning probability ω in the game and the violation β of the CHSH inequality is given by $\omega = \frac{1}{2} + \frac{\beta}{8}$.
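The optimal strategy described above can be verified numerically. This sketch (our addition) computes the CHSH winning probability for the stated state and measurements, using $\Pr[a \oplus b = x \cdot y \,|\, x, y] = \frac{1}{2}\left(1 + (-1)^{x y} \langle A_x \otimes B_y \rangle\right)$ for ±1-valued observables:

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=float)
X = np.array([[0, 1], [1, 0]], dtype=float)
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)         # |Phi> = (|00>+|11>)/sqrt(2)

A = [Z, X]                                        # Alice's observables, x = 0, 1
B = [(Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)]  # Bob's observables,   y = 0, 1

# omega = (1/4) * sum_{x,y} Pr[a XOR b = x*y | x, y]
omega = 0.0
for x in range(2):
    for y in range(2):
        corr = phi @ np.kron(A[x], B[y]) @ phi    # <A_x (x) B_y>
        omega += (1 + (-1) ** (x * y) * corr) / 2 / 4

# Optimal quantum winning probability: (2 + sqrt(2))/4 = cos^2(pi/8)
assert abs(omega - (2 + np.sqrt(2)) / 4) < 1e-12
```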

Max-entropy
The explicit definition of the quantum max-entropy will not be of use in the current work; yet, we state it here for completeness. The conditional max-entropy of a bipartite quantum state $\rho_{AB}$ is given by $H_{\mathrm{max}}(A|B)_\rho = \max_{\sigma_B} 2 \log F(\rho_{AB}, \mathbb{1}_A \otimes \sigma_B)$, where the maximum is taken over all states $\sigma_B$, and the conditional smooth max-entropy $H^{\varepsilon}_{\mathrm{max}}(A|B)_\rho$ is obtained by minimising $H_{\mathrm{max}}(A|B)_{\tilde{\rho}}$ over all states $\tilde{\rho}$ which are ε-close to ρ. $I(A:C|B) = H(A|B) - H(A|CB)$ is the conditional mutual information. Note that $I(A:C|B) = 0$ if and only if, given B, A and C are independent.

The EAT
The EAT ([DFR16], theorem 4.4) gives us a way of bounding the amount of smooth min- or max-entropy accumulated during a sequential process fulfilling certain conditions. In contrast to previous works, where a bound on the smooth min-entropy was derived using the EAT, the current work uses the EAT to bound the smooth max-entropy. We state here the necessary details in the context of our work.
To apply the EAT one needs to define 'EAT channels' which describe the sequential process under consideration (in our case, for example, the channels are defined via the actions of our DIEC protocol). EAT channels are defined as follows.
Definition 6 (EAT channels). EAT channels 𝒩_i : R_{i−1} → R_i O_i S_i W_i, for i ∈ [n], are CPTP maps in which the O_i are the quantum registers of interest, the S_i are side-information registers, and the W_i are classical values over a common finite alphabet which can be measured from the marginal on O_i S_i without modifying the state. We will use below the following notation. Given a value w ∈ 𝒲ⁿ, where 𝒲 is a finite alphabet, we denote by freq_w the probability distribution over 𝒲 defined by freq_w(w̃) = |{i | w_i = w̃}|/n. For a probability distribution τ over 𝒲 we use τ(w̃) to denote the probability that τ assigns to w̃.
Definition 7 (Max-tradeoff functions). Let 𝒩₁, …, 𝒩_n be a family of EAT channels. Let 𝒲 denote the common alphabet of W₁, …, W_n. A concave¹³ function f_max from the set of probability distributions p over 𝒲 to the real numbers is called a max-tradeoff function for {𝒩_i} if for all i ∈ [n], f_max(p) ≥ sup H(O_i|S_i R_{i−1}), where the supremum is taken over all input states of 𝒩_i for which the marginal on W_i of the output state is the probability distribution p.
The statement of the EAT, relevant for the smooth max-entropy, is given below. ...
We used above a slightly and trivially modified statement of the EAT compared to that of [DFR16]. The definition of the max-tradeoff function used in [DFR16] considers channels with an additional register R′ isomorphic to R_{i−1}. For the calculation of the supremum one can always assume that the system on R′ is in product with the rest of the system and hence drop it here (see remark 4.2 in [DFR16]), as in theorem 8, for which the conditional smooth max-entropy is maximal. In our context, an alternative way of thinking about this is to note that our goal is to bound H^ε_max(O|SW)_{τ_|Ω}. As it clearly depends only on the registers O and S, E does not take part in the proof and we can, w.l.o.g., consider an initial state of the form τ ⊗ τ_E. The final state then also has a tensor product form.

Main parts of the proof
In this section we present the main steps and ideas used in the proof of theorem 4. The full details are given in the appendix.

Modified protocol
As explained above, our goal is to show that there exists an entanglement distillation protocol that can distill the entanglement present in ρ_|Ω. Instead of considering an entanglement distillation protocol that acts on the state ρ_|Ω directly, we consider a slightly modified scenario. The modified scenario will result in a state τ_|Ω, closely related to ρ_|Ω, from which at least the same amount of entanglement can be distilled. Concretely, we modify the real DIEC protocol, protocol 1, to define the modified protocol stated as protocol 2. The only difference between this protocol and protocol 1 is in Steps 11 and 12, which we explain below.
¹³ Let Ŵ be the set of frequencies defined via ŵ ∈ Ŵ if and only if ŵ = freq_w for some w ∈ Ω. We can consider concave functions, in contrast to affine ones [DFR16], since the event Ω defined in the current work results in a convex set Ŵ.
We remark that protocol 2 is being used only as a step in the proof of our theorem. We do not claim at any point that this modified protocol can be implemented by the verifier given the uncharacterised devices (in fact, it cannot). It will become clear later on why the modified protocol is, nevertheless, needed in our proof.

Protocol 2. Modified DIEC (based on the CHSH inequality)
Arguments:
D — untrusted measurement device of two components with input and output sets {0, 1}.
n ∈ ℕ⁺ — number of rounds.
ω_exp — expected winning probability in the CHSH game; see equation (7).

First modification: reduction to qubits
When a test is not being performed by the verifier, the state produced by the source is kept in the memory.
In protocol 1 the state is kept as is. In contrast, in protocol 2 we first project the state such that the resulting state is a two-qubit state. The projection, described by two local projections, can obviously only decrease the amount of entanglement of the kept state. As one can guess, we define the specific projection via the measurements performed by the measurement devices¹⁴. Formally, due to Jordan's lemma, each party's two binary observables can be simultaneously block-diagonalised into two-dimensional blocks; the projections of equation (5) project onto one such block on each side, indexed by c_i for Alice and d_i for Bob, where we added a classical register for the measurement outcomes. At this stage, the parties exchange their indices c_i and d_i with each other. It is easy to see that the considered projection step can be described via LOCC. Note the following.
14 These are unknown to the verifier but, as mentioned above, protocol 2 does not need to be implemented in practice by the verifier. The only thing that matters is that such projectors exist.
1. Due to the projection, the registers Â_i B̂_i are now qubit registers.
2. Since the projectors given in equation (5) are defined via the measurements used by the device when T i =1, they could also be applied when T i =1 without changing the resulting state. This is made formal in lemma 9 below (for the proof see appendix A).
Lemma 9. Consider a scenario in which the projection to the two-qubit space is applied on the state φ_i directly after it is produced by the source in the ith round (i.e. before choosing the value of T_i). Denote the resulting state in the end of the ith round in such a case by ρ̄_i. Then ρ̄_i = ρ̂_i, where ρ̂_i is as defined in equation (6).
It is clear that for the state ρ̄_i the measurements in the case of T_i = 1 act on the subspace of the same Hilbert space which is otherwise kept in Â_i B̂_i when T_i = 0, since the projection is made before choosing the value of T_i. Lemma 9 implies that this is also the case when the projection is applied only when T_i = 0, as in our modification. Without this property one could not argue that the test rounds, i.e. those with T_i = 1, are representative of the rounds with T_i = 0.

Second modification: reduction to Bell diagonal states
After the projection, we apply another step that can be seen as a symmetrisation step, also called twirling in the literature. In each round for which T_i = 0, a one-qubit unitary U is chosen uniformly at random from the set {𝟙, σ_x, σ_y, σ_z} and applied on both systems Â_i and B̂_i; that is, U ⊗ U is applied to the kept state. Averaging over the choice of U turns any two-qubit state into a Bell diagonal state, and in our notation we get the following corollary.
Corollary 10. After the twirling step, the kept state is a mixture of Bell diagonal states, i.e. states of the form λ_{Φ⁺}|Φ⁺⟩⟨Φ⁺| + λ_{Φ⁻}|Φ⁻⟩⟨Φ⁻| + λ_{Ψ⁺}|Ψ⁺⟩⟨Ψ⁺| + λ_{Ψ⁻}|Ψ⁻⟩⟨Ψ⁻| for some eigenvalues λ.
As in the projection step, the effect of applying a random unitary is restricted to the case T i =0 and can be done with only LOCC. The importance of this step lies in the following.
1. After the twirling step, the resulting two qubit state is a convex combination of Bell diagonal states (Corollary 10). This simplifies the analysis in the rest of the proof.
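The effect of the twirling step can be verified directly. The sketch below is a self-contained illustration (not the protocol's implementation): it applies the Pauli twirl U ⊗ U to a random two-qubit state and checks that the result is diagonal in the Bell basis:

```python
import numpy as np

# Pauli twirl: rho -> (1/4) sum_U (U (x) U) rho (U (x) U)^dag, U in {I, sx, sy, sz}.
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def twirl(rho):
    return sum(np.kron(U, U) @ rho @ np.kron(U, U).conj().T
               for U in (I2, sx, sy, sz)) / 4

# Columns are the Bell states |Phi+>, |Phi->, |Psi+>, |Psi->.
bell = np.array([[1, 0, 0, 1], [1, 0, 0, -1],
                 [0, 1, 1, 0], [0, 1, -1, 0]], dtype=complex).T / np.sqrt(2)

# A random two-qubit density matrix.
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = M @ M.conj().T
rho /= np.trace(rho)

rho_bell = bell.conj().T @ twirl(rho) @ bell  # twirled state in the Bell basis
off_diag = rho_bell - np.diag(np.diag(rho_bell))
print(np.max(np.abs(off_diag)))  # ~0: the twirled state is Bell diagonal
```

The off-diagonal Bell-basis entries vanish because every Bell state is a ±1 eigenstate of each P ⊗ P, and the four resulting sign patterns are mutually orthogonal.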

Properties of the modified protocol
Protocol 2 can be described mathematically by an application of a sequence of maps, one after the other. The maps describe both the actions of the verifier defined by the protocol as well as the actions of the measurement devices. We denote these maps by 𝒩_i : R_{i−1} → R_i Â_i B̂_i A_i B_i C_i D_i X_i Y_i T_i W_i. The register R_{i−1} holds the state of the uncharacterised devices in the beginning of the ith round. That is, R_{i−1} includes the state φ_i produced by the source as well as any other information kept by the measurement devices. R_i describes a register which can be used as internal memory for the devices.
The final state in the end of protocol 2 is denoted by τ = τ_{ÂB̂ABCDXYTW} and the state conditioned on not aborting by τ_|Ω. We now present three statements regarding the modified protocol. All the proofs are rather simple and are given in appendix A.
To start, it follows from the definition of protocol 2 that the observed statistics of ρ (the state in the end of protocol 1) and τ (the state in the end of protocol 2) are the same. This, in particular, implies that the probabilities of aborting protocols 1 and 2 are identical.
Lemma 11. The observed statistics and, hence, the probabilities of aborting protocols 1 and 2 are the same. That is, ρ_ABXYTW = τ_ABXYTW.
Second, as mentioned in the previous section, the final state of protocol 2, τ, has the property that for any i ∈ [n] the registers Â_i B̂_i are decoupled from the other registers. This can be shown using the structure of the state given in equation (8). The formal statement needed in the next sections is given in the next lemma.
Lemma 12. Let τ denote the state after all rounds of protocol 2 (before conditioning on Ω). Then, for every i ∈ [n], the marginal on Â_i B̂_i is in tensor product with the remaining registers.
The last thing which will be of use later is that ρ_|Ω is at least as entangled as τ_|Ω. That is, if one can distill |Φ_L⟩ from τ_|Ω then one can also distill |Φ_L⟩, for the same L, from ρ_|Ω. This follows from the fact that both modifications described above can be implemented using LOCC, and hence they cannot increase the entanglement. The formal statement is given in the last lemma of this section.
Lemma 13. Let Γ be an LOCC protocol which can be used to distill |Φ_L⟩ from τ_|Ω with error probability ε. Then, there exists another LOCC protocol Δ which can be used to distill |Φ_L⟩ from ρ_|Ω with the same error probability.
Lemma 13 implies that instead of proving that there exists an entanglement distillation protocol starting from ρ_|Ω, one can prove that there exists an entanglement distillation protocol starting from τ_|Ω. The advantage of doing this is that τ_|Ω has some nice properties, as was shown above. These will be of use in our proof given in the next sections. We emphasise again that protocol 2 acts just as a step in the proof; it is not a real protocol which we expect the verifier to implement in order to verify that the state produced by the source is highly entangled.

Single-round bound on the von Neumann entropy
The goal of the current section is to upper bound the conditional von Neumann entropy H(Â_i|B̂_i) of the states produced by the source and kept in the memory in protocol 2. As explained in section 2.3.2, a negative value of the conditional von Neumann entropy can be used to quantify the amount of entanglement present in the considered state. While there are many ways to quantify entanglement, the conditional von Neumann entropy is the one relevant for our proof technique (as will become clear below).
The first lemma of this section was already presented as lemma 5 in section 2.3.2. To simplify the equations in the lemmas below we use λ = (λ_{Φ⁺}, λ_{Φ⁻}, λ_{Ψ⁺}, λ_{Ψ⁻}) for the eigenvalues of a Bell diagonal state and h for the binary entropy function.
The function H(β) of equation (11) is plotted in figure 3 as a function of the CHSH violation β and the winning probability ω.
The proof is inspired by the work of [PAB+09], though we derive a bound on a different quantity; neither result follows from the other. We give the proof sketch here; the full details are given in appendix B. The constraints of the optimisation problem imply that we can write H(λ) as a function of only two variables, λ_{Ψ⁻} and λ_{Φ⁻}, for any value of β. As an example, H(λ) is presented in figure 4 for β = 2.5. One can solve this optimisation problem numerically; the solution is given in equation (13).

Proof sketch. First note that H(Â_i|B̂_i)_σ = H(λ) − 1 for any Bell diagonal state σ with eigenvalues λ.¹⁵ Now that we have a solution in hand, we can verify that it is indeed the correct locally optimal solution. This can be done by taking the derivatives in the relevant directions and verifying that the point given in equation (13) is indeed a maximum. □
¹⁵ This observation was already made in [BBPS96]. In the context of [BBPS96], the eigenvalues λ are known. What we do here can be seen as an extension to the case where only the Bell violation is known and not the eigenvalues.
Next, we extend the claim of lemma 14 also to convex combinations of Bell diagonal states. This can be done easily using the definition of the conditional entropy. The proof is given in appendix B.
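The observation that H(Â|B̂) = H(λ) − 1 for Bell diagonal states can be checked numerically. The sketch below uses illustrative eigenvalues (not the optimal point of equation (13)):

```python
import numpy as np

def shannon(p):
    """Shannon entropy in bits, ignoring numerically zero eigenvalues."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-12]
    return -np.sum(p * np.log2(p))

# Columns are the Bell states |Phi+>, |Phi->, |Psi+>, |Psi->.
bell = np.array([[1, 0, 0, 1], [1, 0, 0, -1],
                 [0, 1, 1, 0], [0, 1, -1, 0]], dtype=complex).T / np.sqrt(2)

def bell_diagonal(lam):
    return bell @ np.diag(lam) @ bell.conj().T

def cond_entropy(rho):
    """H(A|B) = H(AB) - H(B) for a two-qubit state rho."""
    rho_B = rho.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)  # partial trace over A
    return shannon(np.linalg.eigvalsh(rho)) - shannon(np.linalg.eigvalsh(rho_B))

lam = [0.9, 0.05, 0.03, 0.02]  # a noisy |Phi+>
cond = cond_entropy(bell_diagonal(lam))
print(cond, shannon(lam) - 1)  # equal: H(A|B) = H(lambda) - 1, here negative
```

The marginal of any Bell diagonal state is maximally mixed, so H(B) = 1 and the conditional entropy is negative exactly when H(λ) < 1.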

Lemma 15. Let H(β) be as in equation (11). Then, for any state σ_{Â_i B̂_i C_i D_i} as in equation (8), H(Â_i|B̂_i C_i D_i)_σ ≤ H(β).
Two remarks are in order.
1. The derived upper bound on H(Â_i|B̂_i C_i D_i) is tight; it is saturated by the Bell diagonal state defined via the eigenvalues given in equation (13).
2. As can be seen from figure 3, there is a regime of parameters (β ≲ 2.2) in which the conditional entropy is positive even though the state violates the CHSH inequality and, hence, is entangled. Indeed, it is known that some states, e.g. the Werner states, can be used to violate the CHSH inequality while having positive conditional entropy [FBB17]. This implies that the conditional entropy is not the optimal quantity to use when certifying entanglement in a DI manner. Yet, it is the relevant quantity when bounding the operationally distillable entanglement (using the known techniques), as we do below.
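The Werner-state example mentioned in the remark can be made concrete. In the sketch below the visibility p = 0.72 is an illustrative choice; the resulting Bell diagonal state violates CHSH (β = 2√2·p > 2) yet has positive conditional entropy:

```python
import numpy as np

def shannon(p):
    """Shannon entropy in bits, ignoring numerically zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-12]
    return -np.sum(p * np.log2(p))

# Werner state: rho = p |Phi+><Phi+| + (1 - p) I/4, a Bell diagonal state
# with eigenvalues lam and maximal CHSH value beta = 2*sqrt(2)*p.
p = 0.72
lam = [p + (1 - p) / 4] + 3 * [(1 - p) / 4]
beta = 2 * np.sqrt(2) * p
cond_ent = shannon(lam) - 1  # H(A|B) = H(lambda) - 1 for Bell diagonal states

print(beta)      # ~2.04 > 2: the CHSH inequality is violated
print(cond_ent)  # > 0: yet the conditional entropy is positive
```
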
For the coming steps of our proof, we need an upper bound on the conditional entropy of the outputs of the maps 𝒩_i defining the rounds of protocol 2, as in equation (10), for all input states in the set Σ of possible states for which protocol 2 does not abort. On the conceptual level, the protocol does not abort when the observed frequencies defined by the registers W imply that the average Bell violation is sufficiently high¹⁶. Thus, we shall now consider the set Σ = {σ : ω(σ) ≥ ω_th}, where ω(σ) is the winning probability in the CHSH game of the state σ and ω_th is some threshold winning probability.
The following lemma can be proven using the above lemmas together with the definition of the conditional entropy and the relation between β and ω; see appendix B for the proof.
Lemma 16. For any σ ∈ Σ, H(Â_i|B̂_i C_i D_i)_σ ≤ g(ω_th), where g(ω) = H(8ω − 4) is obtained from H(β) via the relation ω = 1/2 + β/8.
Our next step is to bound the total amount of smooth max-entropy accumulated over all rounds of the protocol. We do this using the EAT [DFR16]. The EAT gives a way of bounding conditional smooth min- and max-entropies in sequential processes in which certain systems of interest are being produced one after the other in overall n steps (not necessarily in an IID way). It, roughly, states that the total amount of smooth entropy accumulated during the entire process is n times the von Neumann entropy produced in a single step of the process. In our context, this translates to saying that the total amount of smooth max-entropy can be related to the von Neumann entropy considered in section 4.2. All the definitions and statements of the EAT which are necessary for our work are presented in section 3.4.

Prerequisites of the EAT
Before applying the EAT, we show that the prerequisites of the EAT hold.
Specifically, the channels 𝒩_i : R_{i−1} → R_i Â_i B̂_i A_i B_i C_i D_i X_i Y_i T_i W_i defined by protocol 2 will act as our 'EAT channels'. We need to show that these are indeed EAT channels, i.e. that they fulfil definition 6 (see section 3.4). To verify that this is the case note the following.
1. The registers Â_i B̂_i are qubit registers and hence finite dimensional (due to the projection step of protocol 2).
2. The classical value W i is a function of A B X Y i i i i . Hence, it can be measured from the output of the EAT channels (for any input state) without modifying the state.
3. The necessary Markov-chain conditions hold, as stated in the next lemma.
Lemma 18. For all i ∈ [n] and any initial state, the Markov-chain conditions required by the EAT hold for the channels defined by protocol 2.
The crucial ingredient in the proof is lemma 12, which asserts that the twirling step (Step 12 of protocol 2) decouples the states kept in each round from the other registers; see appendix C for the details. It now becomes clear why the twirling step is necessary: without it the required Markov-chain conditions do not hold. For example, one can imagine a source that creates two bipartite states φ₁ and φ₂ entangled with one another. In such a case the above Markov-chain conditions do not hold. Step 12 therefore enforces the necessary conditions while not destroying the entanglement between Â_i and B̂_i.

Max-tradeoff function
To apply the EAT we need to define a concave max-tradeoff function, as defined in definition 7. We construct one by following similar steps used to define min-tradeoff functions in [AFRV16].
Let p be a probability distribution over {0, 1} resulting from the observed data w, i.e. p(w̃) = |{i | w_i = w̃}|/n for w̃ ∈ {0, 1}, and define f(p) = g(p(1)), where g is as in lemma 16 and h is the binary entropy function. Below we focus on probability distributions for which p(1) ≥ ω_th. Furthermore, as f is a concave function, we also have that the resulting f_max is indeed a max-tradeoff function.

Applying the EAT
We are finally ready to apply the EAT, stated as theorem 8, to derive an upper bound on the smooth max-entropy.
Lemma 19. For any source and measurement device in the considered setting, let τ be the state generated using protocol 2, Ω the event that protocol 2 does not abort, and τ_|Ω the state conditioned on Ω. Then the smooth max-entropy of the kept systems, conditioned on the side information, is upper bounded by n · h_opt up to correction terms of order √n, where h_opt is defined in equation (14).
The theorem follows from the combination of lemmas 20 and 21 given below.

Noise-tolerance (completeness)
As the honest source and devices we choose to consider a source that produces independent and identical copies of a state, φ_i = σ, and measurement devices that apply the same measurements in each round when they are used. The state σ and the measurements are such that the winning probability achieved in the CHSH game is at least ω_exp. For example, one can choose the honest devices to be the measurement devices that apply the optimal measurements of the CHSH game, and the honest states to be all states σ^⊗n for σ any noisy maximally entangled state that results in winning probability ω_exp. The following lemma bounds the probability of protocol 1 aborting when using an honest device as above.
Lemma 20. The probability that protocol 1 aborts for the honest implementation discussed above is exponentially small in the number of rounds n.
Proof. The protocol aborts in Step 11 when the sum of the W_i obtained during the test rounds is not sufficiently high (this happens when the estimated Bell violation is too low or when not enough test rounds were chosen). In the honest implementation the relevant products are independent and identically distributed random variables. Therefore, we can use Hoeffding's inequality to bound the probability that their empirical average deviates from its expectation by more than the tolerated amount. □
Lemma 21. For any source and measurement devices in the considered setting, either protocol 1 aborts with probability greater than 1 − ε_snd, or the one-shot distillable entanglement of ρ_|Ω is lower bounded in terms of n · h_opt, where h_opt is defined in equation (14).
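The Hoeffding step in the completeness proof can be sketched as follows; the deviation parameter delta and the values of n are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def abort_bound(n, delta):
    """Hoeffding bound exp(-2 n delta^2) on the probability that the empirical
    winning frequency of n IID test rounds falls short of its mean by delta."""
    return np.exp(-2 * n * delta ** 2)

# An honest IID source aborts with probability exponentially small in n.
for n in (10**4, 10**5, 10**6):
    print(n, abort_bound(n, 0.01))
```

Larger n or a wider tolerance delta make an honest abort exponentially unlikely, which is the content of lemma 20.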
Proof. Given a source and measurement devices in the considered setting, we first consider the hypothetical scenario in which protocol 2 is run using the given devices. Putting lemmas 17 and 19 together we learn that either protocol 2 aborts with probability greater than 1 − ε_snd, or there exists an entanglement distillation protocol Γ that distills |Φ_L⟩ from τ_|Ω; lemma 13 then implies that the same amount of entanglement can be distilled from ρ_|Ω. Finally, lemma 11 tells us that if protocol 2 aborts with probability greater than 1 − ε_snd then protocol 1 aborts with probability greater than 1 − ε_snd as well. Combining the above observations, the lemma follows. □
The resulting distillable entanglement rates, (log L)/n, are plotted in figure 2 as a function of the expected winning probability in the CHSH game ω_exp for different values of n. As seen from the figure, as the number of rounds n increases, our rate approaches the optimal rate for our proof technique, given by the IID asymptotic rate; see section 5 for further details.
We remark that one can also derive improved rates for a finite number of rounds n by considering a slightly modified version of our DIEC protocol, similarly to what was done in [AFRV16], appendix B. As the modification of the protocol, the analysis, and the resulting rates follow directly by combining the analysis done here with that of [AFRV16], appendix B, we do not present the details here.

Open questions

Tightness of our result
As mentioned in the previous sections, if one chooses to take the path of bounding the one-shot distillable entanglement using the smooth max-entropy, as done in the current work, then our quantitative results are tight to first order in n. However, this may not be the only way to go. Considering other proof techniques is crucial in order to achieve a result in which the certified distillable entanglement is positive whenever a violation of the CHSH inequality is detected.
In general, it is known that there are Bell inequalities which can be violated by bound entangled states, i.e. entangled states which cannot be distilled [VB14]; thus, for some Bell inequalities a zero-rate regime, similar to the one observed here, is of a fundamental nature. However, for the CHSH inequality this is not the case [Mas06]: bound entangled states cannot violate the CHSH inequality. Hence, asymptotically, one should be able to certify distillable entanglement for any violation.
One way of assessing how far our results are from the optimal results, achievable using any proof technique, is to find upper bounds on the asymptotic distillable entanglement of the states achieving the highest conditional entropy given their Bell violation (see section 4.2). One possible starting point is to consider the sets of states described in [LDS17].

Possible extensions of our result
There are many possible ways of extending our work.
1. Our protocol and proof technique can also be modified to work with other Bell inequalities instead of the CHSH inequality. This can potentially increase the rates when considering different types of honest sources of entanglement. For example, if one is interested in a source that emits partially entangled states then it probably makes more sense to consider the tilted CHSH inequalities [AMP12] rather than the CHSH inequality. The only part of the proof which requires a modification is the upper bound on the von Neumann entropy for a single round given in section 4.2. We remark that one can achieve such a bound for any Bell inequality for which a robust self-testing result is known, e.g. [BP15], combined with the continuity of the von Neumann entropy [Win16]. However, it is likely that such an approach will lead to relatively weak quantitative results. Thus, considering the von Neumann entropy directly for other Bell inequalities is a more promising direction.
2. An important direction to consider is the extension of our work to entanglement shared between more than two parties. This can then be used, for example, to consider scenarios and results as those derived in [Ban14,MPB+16] and extend them beyond the IID setting.
3. Another possible extension of the analysis done here is to consider DIEC protocols that employ the more general (but less fundamental) separability preserving operations [Rai97, CDKL01] rather than LOCC. To do so one should first consider one-shot distillation protocols which use separability preserving operations [BD11a].
5. In a different direction, it can also be of interest to consider other settings than the one considered in the current work (as described in section 2.2). As different experiments may require different sets of assumptions, formulating other interesting scenarios and modifying the proof accordingly can be relevant.
6. Similarly, one may consider device-dependent and semi-DI versions of our work. For example, it is possible to study a one-sided DI scenario in which one of the measurement devices is completely characterised. The only part of our proof that needs to be modified in such a case is that given in section 4.2, while replacing Bell inequalities with steering inequalities [CS16]. The rest of the proof follows as is. The additional assumptions can potentially result in certification rates higher than the ones presented in the current work.
Lemma 9. Consider a scenario in which the projection to the two-qubit space is applied on the state φ_i directly after it is produced by the source in the ith round (i.e. before choosing the value of T_i). Denote the resulting state in the end of the ith round in such a case by ρ̄_i. Then ρ̄_i = ρ̂_i, where ρ̂_i is as defined in equation (6).
Proof. Firstly, for the rounds in which T_i = 0 there is clearly no difference between ρ̄_{i|T_i=0} and ρ̂_{i|T_i=0}. We show that the same holds also when T_i = 1, i.e. that ρ̄_{i|T_i=1} = ρ̂_{i|T_i=1}. To see that this is indeed the case, note that the successive application of the projections given in equation (5) and of the measurement, as applied to create ρ̄_i, has the exact same effect as applying the measurement alone: Alice's measurement operators act within the blocks selected by the projections, and similarly for Bob's. Therefore, after tracing out the block registers C_i and D_i, both final states are identical. As the probability of choosing T_i = 0 is, obviously, independent of when the projection is made, the combination of the above statements implies the lemma. □
Lemma 11. The observed statistics and, hence, the probabilities of aborting protocols 1 and 2 are the same. That is, ρ_ABXYTW = τ_ABXYTW.
Proof. The only difference between protocols 1 and 2 is in the rounds in which T_i = 0. The observed statistics over ABXYT and, hence, also W, depend however only on the rounds in which T_i = 1. Thus, ρ_ABXYTW = τ_ABXYTW. The event Ω is defined according to the registers W and therefore the lemma follows. □
Proof (of lemma 12). We prove below that for all c_i and d_i the state on Â_i B̂_i is in tensor product with the remaining registers. Note that K does not include the information encoded in F_i, by the definition of K. Thus, including the register F_i we must have the same tensor product structure. Moreover, we can freely trace F_i out while preserving the tensor product structure. □
Lemma 13. Let Γ be an LOCC protocol which can be used to distill |Φ_L⟩ from τ_|Ω with error probability ε. Then, there exists another LOCC protocol Δ which can be used to distill |Φ_L⟩ from ρ_|Ω with the same error probability.
Proof. The only difference between protocols 1 and 2 is the addition of Steps 11 and 12.
These steps are such that their effect is restricted to the round in which they are applied. That is, whether we apply them or not in round i does not affect any other round j ≠ i. In other words, they commute with the rest of the operations made in the protocol. We can therefore postpone them (for all rounds i with T_i = 0) to the end of the protocol.
Furthermore, according to lemma 11 the additional steps do not change the observed statistics and the probability of the event Ω. Thus, we can also postpone them to after the projection on Ω.
Denoting the combination of all the projections and rotations for all relevant rounds by the map Λ, the above means that the relation in equation (20) always holds. As Λ can be implemented using LOCC, the composition Δ = Γ ∘ Λ is an LOCC protocol which can be used to distill |Φ_L⟩ from ρ_|Ω with the same error probability as Γ. □
To simplify the constraint we observe that both the objective function H(λ) and the above constraint are invariant under the exchange of the eigenvalues with one another. Thus, we can assume without loss of generality that the optimal solution is restricted by, say, the first term in equation (24). To see that this is indeed the case, one can assume by contradiction that the second term in equation (24), and not the first one, is the one restricting the optimal solution. Then, by exchanging the values of λ_{Ψ⁺} with λ_{Ψ⁻} we get a different state, which is restricted by the first term instead of the second one, but attains the same value H(λ). Hence, the solution defined by this exchange must be an optimal solution as well and we can work with it instead.
We can therefore restrict our attention to the optimisation problem given as equation (25). Its constraints imply that H(λ) can be written as a function of only two variables, λ_{Φ⁻} and λ_{Ψ⁻}, for any value of β. Let us identify the region of interest in this plane. First, we have the linear conditions on the eigenvalues; furthermore, we can restrict our attention to solutions that satisfy the constraint with equality. The white point in figure 4 denotes this solution. Now that we have a solution in hand, we can verify that it is indeed a correct solution (i.e. there are no numerical errors) which is locally optimal. To do so we need to check that the local gradient vanishes at our suggested solution, verifying that it is indeed the maximum.
We can therefore restrict our attention to the optimisation problem  The constraints stated in the optimisation problem given as equation (25) imply the constraint: can be written as a function of only two variables, l F -and l Y -, for any value ofβ. Let us identify the region of interest in this plane. First, we have the linear conditions  l l , we can restrict our attention to solutions for which The white point in figure 4 denotes this solution. Now that we have a solution in hand, we can verify that it is indeed a correct solution (i.e. there are no numerical errors) which is locally optimal. To do so we need to check that the local gradient vanishes at our suggested solution in order to verify that it is indeed the maxima.