The Binary-Outcome Detection Loophole

The detection loophole problem arises when quantum devices fail to provide an output for some experimental runs. These failures allow for a local hidden-variable description of the resulting statistics, even if the correlations obtained from the successful runs appear non-local. This is of particular significance in device-independent quantum cryptography, where verifiable non-locality is a necessary requirement for the security proof. Every scenario, characterised by the number of devices along with the number of inputs and outputs available to them, has a detection threshold: if the efficiency of the devices falls below it, no verification is possible. In this work I present an intuitive local hidden-variable construction for all no-signalling distributions with two parties and binary outcomes. This provides a lower bound on the detection threshold for quantum measurements in the same scenario that is tighter than previously known bounds. When both parties have the same number of inputs, this construction is shown to be optimal for small input numbers. I finish with some conjectures.


INTRODUCTION
Due to the scales on which it operates, quantum technology faces the challenge of single photons or electrons being lost to the environment. This can result in devices failing to give any output. Ignoring these failures leads to the 'detection loophole' [1][2][3] security flaw, whereby a preprogrammed 'hidden-variable' device can falsely appear to exhibit non-local behaviour. Non-locality is necessary for the security proofs of device-independent quantum cryptography [4][5][6][7][8][9][10][11]; understanding and preventing the detection loophole is therefore a highly relevant problem. One important question is how low the rate of successful detection events (the efficiency) can fall before all observed correlations become describable by a local realistic model. Knowing this threshold allows one to set minimum requirements for commercial devices and to benchmark current technology. However, obtaining this bound for quantum states is generally difficult due to the infinite set of extremal quantum correlations, and only a few optimal constructions are known [12,13]. In this article we present an intuitive local hidden-variable (LHV) construction for two parties, arbitrary inputs, and binary outputs, which reproduces any no-signalling distribution obtained from the successful runs, up to a detection efficiency dependent on the number of inputs. This provides a lower bound on the threshold for quantum measurements in the same scenario. When both parties have the same number of inputs, this construction achieves the numerically known thresholds (for general no-signalling distributions), leading us to conjecture it is optimal for this symmetric case. We furthermore show that in cases with an asymmetric number of measurements, increasing the number of Bob's measurements m_B above 2^⌈log_2 m_A⌉ provides no additional power in verifying non-local correlations.
Bell's seminal theorem [14] and its subsequent generalisations [15][16][17] give fundamental constraints on the correlations exhibited by any local realistic model, constraints that quantum theory can violate. These violations have been confirmed experimentally [18,19]. Due to technological limitations, however, these demonstrations relied on a 'fair-sampling' assumption: that the device failures were non-malicious and the successful detections were representative of the underlying system. In cryptographic protocols we cannot make that assumption, as it would allow an adversary (Eve) to pre-program the device to fail. It was not until much later that loophole-free violations, with no fair-sampling assumptions, were experimentally demonstrated [20][21][22]. The difficulty involved in closing this loophole highlights the importance of obtaining the best theoretical thresholds possible, so that minimal technological developments are required to perform secure protocols.

PRELIMINARIES
In this paper we work in the device-independence framework. We assume that two parties (Alice and Bob) have been distributed a joint system, on which they can make measurement choices, also referred to as inputs (labelled x for Alice and y for Bob), and receive outcomes (labelled a for Alice and b for Bob). We characterise the joint system only by the conditional probability distribution p(ab|xy), making no assumptions about the underlying state or measurements made. This is known as a black-box description. However, we do assume that Alice and Bob can isolate their systems, also referred to here as devices, from communicating with each other. This imposes the no-signalling conditions

Σ_a p(ab|xy) = Σ_a p(ab|xy′) ∀ b, x, y, y′,
Σ_b p(ab|xy) = Σ_b p(ab|x′y) ∀ a, x, x′, y.

When the numbers of inputs and outputs are finite, so that x ∈ {0, . . . , m_A − 1}, y ∈ {0, . . . , m_B − 1}, a ∈ {0, . . . , n_A − 1}, b ∈ {0, . . . , n_B − 1}, we may express any no-signalling probability distribution via the vector p = (p(00|00), . . . , p(n_A−1, n_B−1 | m_A−1, m_B−1)). The set of such vectors forms a convex set with finitely many extremal points, known as the no-signalling polytope, NS. This restriction is known as the (m_A, m_B, n_A, n_B)-scenario.
Within this set is a strict subset [23] of quantumly realisable distributions, Q. Unlike the full no-signalling space, Q has an infinite number of extremal points, making it more difficult to deal with computationally. Strictly contained within Q is the set of local distributions, L. Any distribution p(ab|xy) within L has a local hidden-variable model of the form p(ab|xy) = ∫_Λ dλ ρ(λ) p(a|x, λ) p(b|y, λ). These distributions may always be expressed as convex combinations of deterministic distributions p(ab|xy) = δ_{a,a_x} δ_{b,b_y}, which are finite in number. Geometrically, this means the structure of L is also a polytope.
L may be equivalently described by a set of Bell inequalities: linear inequalities of the form Σ_{a,b,x,y} s^{xy}_{ab} p(ab|xy) ≤ k, where p(ab|xy) is our input-conditional joint distribution [24]. There is a finite set of facet Bell inequalities; if all facets are satisfied by p(ab|xy), it must have a local hidden-variable model, i.e. it belongs to L. Thus violation of a Bell inequality is used to prove the impossibility of a local hidden-variable model. We will also often denote a Bell inequality by a vector s = (s^{00}_{00}, . . . , s^{m_A−1,m_B−1}_{n_A−1,n_B−1}), though one must also state the sign and magnitude of the bound.
The typical detection loophole model, and the one considered in this article, is one in which the devices fail to detect with equal probability, independently of each other [25]. Whilst not completely general, this is how we would expect the devices to behave if the failures were 'honest'; if we see autocorrelations, or correlations between the joint failures, this is a clear signal of adversarial manipulation. The model considered here adds an extra output F to both parties, altering the original distribution p(ab|xy) in the following way:

p_η(ab|xy) = η² p(ab|xy),
p_η(aF|xy) = η(1 − η) p(a|x),
p_η(Fb|xy) = η(1 − η) p(b|y),
p_η(FF|xy) = (1 − η)².

One can see this as a linear map D_η : p → p_η, from the set of no-signalling distributions in the (m_A, m_B, n_A, n_B)-scenario to those in the (m_A, m_B, n_A + 1, n_B + 1)-scenario. The quantity we are interested in is the (quantum) critical detection efficiency, η_c := inf{η | ∃ p ∈ Q, p_η ∉ L}, where Q and L are considered in the (m_A, m_B, n_A, n_B)-scenario and (m_A, m_B, n_A + 1, n_B + 1)-scenario respectively.
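The map D_η is straightforward to implement directly. The following sketch is our own illustration; the array layout p[a, b, x, y] and the convention that the last outcome index plays the role of F are assumptions, not taken from the original.

```python
import numpy as np

def apply_inefficiency(p, eta):
    """Map a no-signalling distribution p[a, b, x, y] in the
    (m_A, m_B, n_A, n_B)-scenario to p_eta in the
    (m_A, m_B, n_A+1, n_B+1)-scenario; the last outcome index of each
    party plays the role of the failure flag F."""
    nA, nB, mA, mB = p.shape
    # Marginals p(a|x) and p(b|y); by no-signalling the choice of the
    # other party's input is irrelevant, so we fix it to 0.
    pa = p.sum(axis=1)[:, :, 0]   # shape (n_A, m_A)
    pb = p.sum(axis=0)[:, 0, :]   # shape (n_B, m_B)
    p_eta = np.zeros((nA + 1, nB + 1, mA, mB))
    p_eta[:nA, :nB] = eta**2 * p                        # both detectors click
    p_eta[:nA, nB] = eta * (1 - eta) * pa[:, :, None]   # Bob's device fails
    p_eta[nA, :nB] = eta * (1 - eta) * pb[:, None, :]   # Alice's device fails
    p_eta[nA, nB] = (1 - eta)**2                        # joint failure
    return p_eta

# Sanity check on the uniform distribution: every column of p_eta still
# sums to one, since eta^2 + 2*eta*(1-eta) + (1-eta)^2 = 1.
p_eta = apply_inefficiency(np.full((2, 2, 2, 2), 0.25), 0.8)
assert np.allclose(p_eta.sum(axis=(0, 1)), 1.0)
```

Linearity of D_η is immediate from the code: every entry of p_eta is a fixed linear combination of entries of p.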
To check the membership criterion p_η ∈ L, we can calculate the local weight. This is defined for an arbitrary distribution q as

w(q) := max{ w | q = w q_L + (1 − w) q′ },

where q_L is a local distribution and q′ is a general no-signalling distribution. This linear program (see the appendix for details) gives w = 1 iff q is local.
For a given (m_A, m_B, n_A, n_B)-scenario, we can use the local weight to lower bound the critical detection threshold η_c in the following way. For every extremal no-signalling distribution p^NS_j, we can calculate the local weight of successive distributions p^NS_{j,η}, allowing us (e.g. by the binary chop algorithm) to determine the detection threshold of that particular distribution, η_j. By doing this for all extremal points, we find that at η* = min_j η_j, the entire (m_A, m_B, n_A, n_B) no-signalling space is mapped into the (m_A, m_B, n_A + 1, n_B + 1) local polytope. Thus, η* is necessarily a lower bound of η_c. We will refer to η* as the no-signalling threshold. This bounding technique was performed in [26] on m_A, m_B ≤ 6 and n_A = n_B = 2, until the exponential growth in the number of extremal NS points became too large for numerical calculations.

TABLE I. Cases for which the no-signalling threshold has been numerically calculated; these provide a lower bound on the corresponding critical detection efficiency for the quantum set. The * indicates numerical evaluation was not attained [26].

m_A \ m_B |  2    3    4    5    6
    2     | 2/3  2/3  2/3  2/3  2/3
    3     |      4/7  5/9  5/9  5/9
    4     |           1/2  1/2  1/2
    5     |                4/9   *
Reproducing the table of thresholds from [26] in table I, two patterns are immediately apparent: for m_A = m_B = m the bound appears to match 4/(m + 4), and, if one fixes m_A, the bound decreases with each additional input of Bob's until m_B = 2^⌈log_2 m_A⌉. In this article we prove that the threshold for all m_A = m_B = m is indeed bounded below by 4/(m + 4), and that it remains constant for all m_B ≥ 2^⌈log_2 m_A⌉. Instead of doing this via numerical results, we construct an explicit local hidden-variable model for all p_η up to this threshold value.
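The first pattern is easy to check against the diagonal of table I; a small sanity check, with the values transcribed from the table:

```python
from fractions import Fraction

# Diagonal (m_A = m_B = m) no-signalling thresholds from Table I.
diagonal = {2: Fraction(2, 3), 3: Fraction(4, 7),
            4: Fraction(1, 2), 5: Fraction(4, 9)}

for m, eta_star in diagonal.items():
    # Each numerically obtained threshold matches 4/(m + 4) exactly.
    assert eta_star == Fraction(4, m + 4)
```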

PRE-EXISTING LOCAL HIDDEN VARIABLE CONSTRUCTIONS
In order to understand our explicit construction, it is first useful to compare it to a local hidden-variable construction for the detection loophole introduced in [25]. Valid for any number of outputs, the construction is simple yet elegant. To emphasise the idea that Alice and Bob's devices are working against them, we introduce Alexa and Boris as the names of their devices, whose goal is to falsify an arbitrary non-local distribution. Beforehand they may agree on a strategy (using the local hidden variable λ) but cannot communicate once they have received their input choices. Between themselves, Alexa and Boris first randomly choose a leader, with bias α ∈ [0, 1] towards Alexa; let us suppose for this run Alexa is chosen. They then generate uniformly a prediction for Alexa's input, say k ∈ {0, . . . , m_A − 1}. Finally, they agree on an output a ∈ {0, . . . , n_A − 1} for Alexa according to her desired marginal probability p(a|k). When separated, once Alexa receives her input, if they have guessed correctly she returns outcome a. If the input received from Alice does not match their prediction, then Alexa outputs a failed detection F; clearly this occurs with probability (m_A − 1)/m_A. Meanwhile, Boris receives his input y and returns b ∈ {0, . . . , n_B − 1} according to p(ab|ky)/p(a|k) regardless. Notice that they never jointly output a failure, so in order to fully reproduce inefficient statistics they must, with some probability β, agree to both output F regardless of input. This strategy gives rise to the statistics of equation (5). One can equate equations (3) and (5) to find that this local hidden-variable (LHV) strategy can reproduce statistics up to η ≤ (m_A + m_B − 2)/(m_A m_B − 1). By comparison to the results in table I, one can easily check that for e.g. m_A = m_B = 3 this is not optimal.
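The single-prediction bound is easy to tabulate; the snippet below (our own illustration) evaluates it and confirms the non-optimality claim for m_A = m_B = 3:

```python
from fractions import Fraction

def single_prediction_bound(mA, mB):
    # Efficiency up to which the single-input-guessing strategy of [25]
    # can reproduce arbitrary no-signalling statistics.
    return Fraction(mA + mB - 2, mA * mB - 1)

assert single_prediction_bound(2, 2) == Fraction(2, 3)  # matches Table I for m = 2
# For m_A = m_B = 3 the strategy reaches only 1/2, short of the 4/7 in Table I:
assert single_prediction_bound(3, 3) == Fraction(1, 2)
assert single_prediction_bound(3, 3) < Fraction(4, 7)
```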
If g or h is non-zero, then the distribution is a lower input-number extremal point with local deterministic inputs appended to it. As such, its detection threshold cannot be lower than that of the (m_A − h, m_B − g) case, and it is not generally optimal for m_A, m_B inputs.

A NEW LOCAL HIDDEN VARIABLE CONSTRUCTION
The Model
We will look to improve this strategy on extremal binary-output NS points, thereby bounding the threshold for the entire space. To do this, we need a better understanding of the extremal points themselves. Fortunately, for binary outputs a complete characterisation has been provided in [27]; one can see their general form in figure 1. They may also be expressed in a simple algebraic form in which Q_i(x) are polynomials in the binary digits of x (which we label x_2) and R_i(y) are monomials in the binary digits of y (labelled y_2); similarly, S_j(y) are polynomials of y_2 and T_j(x) monomials. Here n_x = ⌈log_2 m_A⌉ is the length of x_2, and similarly for n_y. The most famous example is the (generalised) PR box [23], which has the form p(ab|xy) = 1/2 if a ⊕ b = x_2 · y_2 mod 2, and 0 otherwise.
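The generalised PR box can be written down explicitly. The sketch below is our own illustration (the little-endian digit convention is our choice); it builds the box for arbitrary input numbers and verifies the no-signalling conditions:

```python
import numpy as np
from math import ceil, log2

def binary_digits(v, n):
    # Little-endian binary digits of v, padded to length n.
    return [(v >> i) & 1 for i in range(n)]

def generalised_pr_box(mA, mB):
    # p(ab|xy) = 1/2 if a XOR b = x_2 . y_2 (mod 2), else 0,
    # where x_2, y_2 are the binary digit strings of the inputs.
    n = max(ceil(log2(mA)), ceil(log2(mB)))
    p = np.zeros((2, 2, mA, mB))
    for x in range(mA):
        for y in range(mB):
            g = sum(xi * yi for xi, yi in
                    zip(binary_digits(x, n), binary_digits(y, n))) % 2
            for a in range(2):
                p[a, a ^ g, x, y] = 0.5
    return p

p = generalised_pr_box(4, 4)
# No-signalling: Alice's marginal p(a|x) is independent of y, and
# Bob's marginal p(b|y) is independent of x.
assert np.allclose(p.sum(axis=1).std(axis=-1), 0)
assert np.allclose(p.sum(axis=0).std(axis=1), 0)
```

For mA = mB = 2 this reduces to the standard PR box with a ⊕ b = xy.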
For all the numerically evaluated cases presented in table I, the generalised PR box achieves the no-signalling threshold η*. In particular, given any extremal NS point, the conditional output distributions for two input pairs either match exactly or anti-match exactly. This allows the following strategy. Alexa and Boris with probability α randomly choose a leader; suppose it is Alexa. They generate uniformly a prediction for Alexa's input, say k_1 ∈ {0, . . . , m_A − 1}, and then, from the remaining m_A − 1 choices, a second prediction k_2 ∈ {0, . . . , m_A − 1}\{k_1}. They also decide, with probability 1/2, whether they will use a matching or unmatching strategy. Finally, they decide uniformly on a value a_L ∈ {0, 1} for Alexa's output. Once Alexa receives her input, if it matches k_1 she returns outcome a_L. If she receives k_2, she returns a_L under the matching strategy and a_L ⊕ 1 under the unmatching strategy. If her input matches neither k_1 nor k_2, she outputs a failed detection F. When Boris receives his input z, if G(k_1, z) = G(k_2, z) and they chose the matching strategy, or G(k_1, z) ≠ G(k_2, z) and they chose the unmatching strategy, he outputs a_L ⊕ G(k_1, z), and F otherwise. They still, with some probability β, agree to both output F regardless of input, giving the statistics of equation (8). (Figure 2 illustrates, for an example λ, the input pairs for which Alexa and Boris can reproduce the output correlations successfully and those for which they must abort; the detection threshold at which all correlations can be reproduced is found by averaging over all possible λ outcomes.) The advantage of such a strategy becomes apparent in the final term: to achieve the joint failure rate (1 − η)², they can devote fewer runs to deterministically outputting FF, since, unlike the single-input guessing strategy, their guessing strategy will also output a joint failure some of the time. Equating equations (3) and (8) then yields the threshold of this construction.

Asymptotic Power of the Model
We now prove that the no-signalling detection threshold cannot be improved by asymmetrically increasing one party's number of possible measurements beyond the limit m_B = 2^⌈log_2 m_A⌉. One may express any extremal point as having p(ab|xy) = 1/2 when a ⊕ b = G(x, y), where G(x, y) = Σ_i Q_i(x)R_i(y) + Σ_j S_j(y)T_j(x) mod 2.

Comparison to Numerically Known No-Signalling Thresholds
Although the bound derived in the previous section holds for all pairs (m_A, m_B), we see from the numerical evidence in table I that it is not generally tight. In the case m_A = 3, m_B = 4, we know the no-signalling detection threshold to be η* = 5/9; however, the hidden-variable strategy we have proposed only simulates arbitrary distributions up to η = 1/2. To reproduce correlations up to η*, one can mix our strategy with the pre-existing one [25] presented earlier in this paper. By choosing the pre-existing strategy, which guesses a single input, 20% of the time, our strategy, predicting two inputs, 80% of the time, and Alexa solely as the leader for both strategies, one can achieve η ≤ 5/9. This mixing of strategies does not extend to higher-dimensional asymmetric scenarios though; for m_A = 5, m_B = 6 no combination of the two strategies beats the bound given by equation (8).
As the number of input choices increases, one could propose a more general variation, in which the leader (say Alexa) makes n input predictions k_1, . . . , k_n ∈ {0, . . . , m_A − 1}, n ≤ m_A. With this strategy, they must predict beforehand whether G(k_i, z) will coincide with G(k_1, z), for each i = 2, . . . , n. This is analogous to the 'matching/unmatching' choice seen earlier. The probability of guessing all of these correctly scales as 1/2^{n−1}, whereas the benefit of predicting additional inputs only scales as n/m_A. The probability of a correct output therefore scales as (n/m_A)(1/2^{n−1}), which takes its maximal value at n = 1, 2 only. When trying to incorporate this strategy to simulate m_A = 5, m_B = 6 distributions, our optimisation never chose strategies with n > 2. This suggests the asymmetric case requires a more nuanced joint strategy. However, we stress that when m_A = m_B, the bound predicted by this model matches all numerically obtained bounds.
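The trade-off described above takes only a few lines to verify (our own illustration):

```python
# Probability of a correct output scales as (n / m_A) * (1/2)**(n - 1).
# The 1/m_A factor is common to all n, so it suffices to maximise n * 2**(1 - n).
scores = {n: n * 0.5**(n - 1) for n in range(1, 10)}
best = max(scores.values())

# The maximum is attained at n = 1 and n = 2 only (both give score 1).
assert [n for n, s in scores.items() if s == best] == [1, 2]
```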
In order to prove our conjecture that η* = 4/(m + 4) for m_A = m_B = m, one would need to provide an extremal NS distribution p^NS and a corresponding Bell inequality s^{xy}_{a′b′} violated by p^NS_η for all η > 4/(m + 4). Here we use a′, b′ to remind the reader explicitly that a′ ranges over both the original values of a and F; that is, it is a 3-outcome inequality. From numerical results, the generalised PR box is the best candidate for the extremal point, but we found no obvious generalisation of the witnessing Bell inequalities, which are provided for the evaluated cases in the appendix.

Comparison to Quantumly Realisable Thresholds
As stated above, proving the no-signalling threshold η* for a given scenario requires a Bell inequality violation Σ_{a′,b′,x,y} s^{xy}_{a′b′} p^NS_η(a′b′|xy) > k for all η > η*. Therefore s^{xy}_{a′b′} is the 'optimal' Bell inequality, in that it detects non-locality for all efficiencies above the no-signalling threshold. A natural question is whether the same Bell inequality is optimal with respect to quantum correlations, i.e. whether Σ_{a′,b′,x,y} s^{xy}_{a′b′} p^Q_η(a′b′|xy) > k for all η > η_c. For quantum correlations, a critical efficiency of η_c = 2/3 is achievable via qubits using the m_A = m_B = 2 CHSH inequality [12], whilst testing ququarts with an m_A = m_B = 4 inequality allows a critical efficiency of (√5 − 1)/2 ≈ 0.618. The respective Bell inequalities verifying non-locality for efficiencies above the critical efficiency achieve, when applied to the generalised PR box, the no-signalling detection threshold η* for their respective scenarios. These inequalities are somewhat special in that they are 'lifted' inequalities: they are of the form s^{xy}_{Fb} = s^{xy}_{a_x b}, s^{xy}_{aF} = s^{xy}_{a b_y}, with a_x, b_y ∈ {0, 1} ∀ x, y, i.e. facet 2-outcome inequalities where F is treated identically to one of the valid outputs. In contrast, the optimal Bell inequality for m_A = m_B = 3 requires a genuinely new 3-output inequality, something noted in [28]. In order to test whether our optimal Bell inequalities could lead to new quantum constructions, we employed the NPA hierarchy [29]. This defines successively tighter outer approximations to Q, which we label Q_1 ⊃ Q_2 ⊃ . . . ⊃ Q. For a fixed η, we can then employ semidefinite programming to look for correlations p ∈ Q_i such that s · p_η > k, which would imply p_η ∉ L. It is then clear that if, for a given i and η, no such p is found, then {p ∈ Q | s · p_η > k} must also be empty.
For the scenarios m_A = m_B = 3 and m_A = 3, m_B = 4, we know the quantum critical efficiency is not higher than 2/3, since we may always embed the CHSH/qubit construction into these scenarios. Therefore, an improvement in the quantum critical efficiency would require that {p ∈ Q | s · p_{2/3} > k} is non-empty. However, in both scenarios, choosing s as the optimal Bell inequality for non-locality, we find that this set is empty at level Q_2 of the hierarchy; thus these inequalities do not help us to improve the quantum critical efficiency η_c.

CONCLUSIONS AND DISCUSSION
In this paper, we have exploited the structure of the bipartite binary-output no-signalling polytope in order to provide a lower bound on the detection loophole critical efficiency for an arbitrary number of inputs. We have done this by constructing an explicit local hidden-variable model valid for all extremal points. Numerical evidence suggests that when Alice and Bob share an equal number of inputs, this construction is optimal. An open question is whether one can find a family of Bell inequalities verifying this.
One possible extension of this work would be to improve the strategy for asymmetric measurement capabilities, since we know our model does not provide a tight bound for m_A = 5, m_B = 6. A further generalisation would be to test whether this approach extends to a larger number of outputs. Unfortunately, the vertices of higher-output no-signalling polytopes are not generally known, so we cannot say much about their structure. Considering the results here, one would expect the successful simulation efficiency of a construction which predicts n inputs in a k-output scenario to scale as (n/m_A)(1/k^{n−1}), which for k > 2 achieves its maximal integer value only at n = 1. This suggests that for higher output-number scenarios the construction of [25], defining equation (5), may be optimal.

Local Weight Linear Program
In order to calculate the local weight of an arbitrary distribution q, we solve the following linear program:

maximise Σ_i α_i subject to α_i ≥ 0 and q − Σ_i α_i q^L_i ≥ 0 (entrywise),

where q^L_i are the extremal points of the polytope L. By rearranging the inequality, we see that the leftover distribution q′ := q − Σ_i α_i q^L_i has all non-negative entries, and satisfies the no-signalling constraints since q and the q^L_i do. Therefore it is a valid (sub-normalised) distribution. This linear program therefore optimises the total weight of the local extremal points over all decompositions of q.
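For the smallest (2, 2, 2, 2) scenario this program is compact enough to write out in full. The sketch below is our own illustration, using scipy's linprog; it recovers the two limiting cases: weight 1 for a local distribution and weight 0 for the extremal, non-local PR box.

```python
import numpy as np
from scipy.optimize import linprog

def deterministic_points():
    # The 16 deterministic local distributions of the (2,2,2,2)-scenario,
    # returned as the columns of a 16x16 matrix; entries are p[a, b, x, y]
    # flattened. Alice outputs (a0, a1)[x], Bob outputs (b0, b1)[y].
    cols = []
    for a0 in (0, 1):
        for a1 in (0, 1):
            for b0 in (0, 1):
                for b1 in (0, 1):
                    p = np.zeros((2, 2, 2, 2))
                    for x in (0, 1):
                        for y in (0, 1):
                            p[(a0, a1)[x], (b0, b1)[y], x, y] = 1.0
                    cols.append(p.ravel())
    return np.array(cols).T

def local_weight(q):
    # maximise sum_i alpha_i  s.t.  sum_i alpha_i q_i^L <= q entrywise,
    # alpha_i >= 0.  The leftover q - sum_i alpha_i q_i^L is then a valid
    # sub-normalised no-signalling distribution, as argued above.
    P = deterministic_points()
    res = linprog(-np.ones(P.shape[1]), A_ub=P, b_ub=q, bounds=(0, None))
    return -res.fun

# PR box: p(ab|xy) = 1/2 iff a XOR b = x AND y.
pr = np.zeros((2, 2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        for x in (0, 1):
            for y in (0, 1):
                if (a ^ b) == (x & y):
                    pr[a, b, x, y] = 0.5

assert local_weight(pr.ravel()) < 1e-7                    # extremal non-local point
assert abs(local_weight(np.full(16, 0.25)) - 1) < 1e-7    # uniform noise is local
```

The normalisation Σ_i α_i ≤ 1 need not be imposed explicitly: summing the entrywise constraint over all entries already enforces it, since q and each q^L_i sum to the number of input pairs.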
It is also worth mentioning that every linear program has a dual with the same optimal value [30]. The dual of the above program gives us a vector b, and we see immediately that if Σ_i α_i < 1 at the optimum, b defines a Bell inequality violated by q.

Bell Inequalities which verify the Threshold
In this appendix, the optimal Bell inequalities are provided which achieve the detection loophole threshold for the generalised PR box. They are presented in matrix format, where the solid lines delineate different inputs. All presented inequalities have local bound ≥ 1. Note that there are n_A + 1 (n_B + 1) outputs to account for the additional output F.

Optimal Inequality for 2-Inputs
As mentioned in the main body of the paper, this inequality is a 'lifting' of the CHSH inequality. For all measurements failure to output is treated identically to 0. Since other liftings of the same CHSH inequality achieve the optimal value, we can see generally there is not a single unique inequality that witnesses the threshold.
Optimal Inequality for 5-Inputs
The previous inequalities were all calculated using exact arithmetic. Unfortunately this takes much longer than floating-point methods, particularly as the dimension increases. We are therefore only able to provide a Bell inequality here which is accurate to 6 s.f. and, moreover, not a facet inequality. However, it still verifies the detection loophole threshold, and is included for completeness.