A time-dependent Tsirelson's bound from limits on the rate of information gain in quantum systems

We consider the problem of distinguishing between a set of arbitrary quantum states in a setting in which the time available to perform the measurement is limited. We provide simple upper bounds on how well we can perform state discrimination in a given time as a function of either the average energy or the range of energies available during the measurement. We exhibit a specific strategy that nearly attains this bound. Finally, we consider several applications of our result. First, we obtain a time-dependent Tsirelson's bound that limits the extent of the Bell inequality violation that can in principle be demonstrated in a given time $t$. Second, we obtain a Margolus-Levitin type bound when considering the special case of distinguishing orthogonal pure states.


I. INTRODUCTION
Entropic measures tell us how much information a quantum register E contains about some classical register X in principle. But just how quickly does this information become available to us? In this little note, we derive bounds on the amount of information available after a given time t. As expected, our bounds depend on the resources we have available in the form of the available energy.
Throughout this paper, we will choose to measure information in terms of the min-entropy, which is the relevant quantity when we consider single-shot experiments and quantum cryptography. As we will explain in detail below, this measure is directly related [1] to the probability of success in state discrimination [2-6]. As a result, we focus on bounding the probability of success in distinguishing states $\{\rho_x\}_{x \in \mathcal{X}}$ where we are given $\rho_x$ with probability $p_x$. Let $P_{\rm guess}(X|E)_{H,t}$ denote this success probability after time $t$ when using a particular Hamiltonian $H$ in the measurement process. After providing a more careful discussion of the measurement process, we show the following results.

A. Results
A bound for two states: We first consider the case of only two input states $\rho_0$, $\rho_1$, for which it is easy to compute the optimal success probability if we have unlimited time (or resources) available [2]. We first provide a general bound in terms of the spectrum of the Hamiltonian (Corollary B.2). For the special case of two equiprobable states ($p_0 = p_1 = 1/2$), this bound simply reads where $D(\rho_0, \rho_1)$ is the trace distance between the two states, and $\gamma$ is a small constant. This bound is directly related to our ability to distinguish two input states given an unlimited amount of time, where the best measurement gives us [2]
$$P_{\rm guess}(X|E) = \frac{1}{2} + \frac{D(\rho_0, \rho_1)}{2} . \qquad (2)$$
We proceed to show that our bound is tight up to a constant factor (Theorem D.1) by providing an explicit measurement strategy. Finally, we prove a bound in terms of the average energies of the input states (Theorem B.3). However, this bound does not compare as easily to the case of unlimited time.
A bound for many input states: When considering the case of an arbitrary number of input states $\rho_0, \ldots, \rho_{N-1}$ it is difficult to compute the maximum success probability even in the case of unlimited time. In particular, no general closed form expression is known; only for the case of single qubit encodings does there exist a way to construct the optimal measurements geometrically [7]. In general, we can only approximate the optimal measurements numerically [5, 8-14], or resort to bounds on the success probability [4, 15-20]. As such, it becomes harder to relate $P_{\rm guess}(X|E)_{H,t}$ to the case of unlimited time. We hence provide a general bound in terms of the average energies alone. In particular, we show (Theorem C.1) that where $x_{\max}$ is the smallest $x \in \{0, \ldots, N-1\}$ such that $p_{x_{\max}} \geq p_x$ for all $x$.
Applications: Finally, we discuss two applications of our bound. The first is to the study of Bell inequalities [21]. Typically, we care about determining the maximum quantum violation of such inequalities. In contrast, we ask what is the maximum violation that can be achieved in a fixed amount of time. When considering such inequalities as games between two players Alice and Bob (see Section IV B 1), the "amount" of quantum violation is determined by the probability $p_{\rm win}$ that the players win the game, maximized over all states and measurements. For the CHSH inequality [22], we have that classically
$$p_{\rm win} \leq \frac{3}{4} \qquad (4)$$
for any strategy of Alice and Bob. However, in quantum mechanics there exists a strategy that achieves $p_{\rm win} = 1/2 + 1/(2\sqrt{2})$, which is optimal [23]. Here, we show (Corollary F.2) that if we demand answers from Alice and Bob after time $t$ where $H$ is Bob's Hamiltonian involved in the measurement process, and $\gamma$ is a small constant. We will also see that to achieve Tsirelson's bound, Alice and Bob need time at least Our bounds tell us that there does indeed exist a fundamental time that is needed to establish non-local correlations of a certain strength. We will discuss these bounds in detail in Section IV B.
As a second application, we use our bound to obtain a form of the Margolus-Levitin theorem [24], which provides us with a lower bound on how much time it takes to transform a pure state into an orthogonal state. Since the Margolus-Levitin theorem provides a bound on the speed of evolution, it clearly provides a bound on the minimum amount of time that is required to obtain the optimal (time unlimited) success probability for state discrimination. Yet, note that we are interested in bounding $P_{\rm guess}(X|E)_{H,t}$ even for shorter periods of time. We will discuss the relation of our work and the Margolus-Levitin theorem in detail in Section IV A.
arXiv:1105.2268v1 [quant-ph] 11 May 2011

B. Related work
Next to the Margolus-Levitin theorem [24], our work is related to several bounds [25,26] on how fast information can be transmitted in principle given energy constraints (see [27] for a survey of results). These bounds generally consider the von Neumann entropy as a measure of information and are concerned with determining the capacity for sending information as a function of energy. That is, they consider how fast we could convey information in the best possible way. In contrast, we consider the case of arbitrary encodings $\rho_x$, which may not be optimal to transmit classical information. In fact, even in the case of unlimited time the probability that we can reconstruct $x$ from $\rho_x$ could be very small. Our setting also differs in the sense that we focus solely on extracting classical information into a classical register in a sense that we will make precise below.
Our work is also related to several previous papers [28-30] that study the rate of change in entropies of a system that is in contact with an environment. Again, our work is of a somewhat different flavor since we are interested in extracting classical information, and our bounds furthermore involve average energies, rather than the largest energy $\|H\|_\infty$ of the (interaction) Hamiltonian $H$ alone.

A. Quantifying information
Let us now consider more formally what we mean by gaining classical information encoded in a quantum system. Imagine that there is some finite set $\mathcal{X}$ of possible classical symbols to be encoded. For any symbol $x \in \mathcal{X}$, we use $\rho_x \in \mathcal{B}(\mathcal{H}_{\rm enc})$ to denote its encoding into a quantum state on the system $\mathcal{H}_{\rm enc}$. We also refer to $\mathcal{H}_{\rm enc}$ as the encoding space. Our a priori ignorance about the classical information $x$ is captured by the probability distribution $p_x$ according to which the encoding space is prepared in the state $\rho_x$.
Throughout, we quantify how much information we have about $x$ given access to the encoding space $\mathcal{H}_{\rm enc}$ in terms of the min-entropy [1]
$$H_\infty(X|E) := -\log P_{\rm guess}(X|E) ,$$
where
$$P_{\rm guess}(X|E) := \max_{\{M_x\}} \sum_x p_x \, {\rm tr}(M_x \rho_x)$$
is the probability that we guess $x$, maximized over all possible measurements $\{M_x\}$ on the encoding space. Finding the optimal measurement is known as state discrimination and can be done using semidefinite programming [5,8].
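As a small illustration of these definitions (a sketch, not part of the original derivation), consider two states with priors $p_0$, $p_1$. In that case the optimal guessing probability with unlimited time has the closed Helstrom form $1/2 + \frac{1}{2}\|p_1\rho_1 - p_0\rho_0\|_1$ [2], which reduces to $1/2 + D(\rho_0,\rho_1)/2$ for equal priors. The Python sketch below evaluates it for single-qubit states. The function names are illustrative, and the logarithm is assumed to be base 2.

```python
import math

def trace_norm_2x2(m):
    """Trace norm (sum of |eigenvalues|) of a 2x2 Hermitian matrix."""
    tr = (m[0][0] + m[1][1]).real
    det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    # Eigenvalues of a Hermitian 2x2 matrix: (tr +/- sqrt(tr^2 - 4 det)) / 2
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return abs((tr + disc) / 2.0) + abs((tr - disc) / 2.0)

def guess_probability(p0, rho0, p1, rho1):
    """Optimal two-state guessing probability given unlimited time."""
    diff = [[p1 * rho1[i][j] - p0 * rho0[i][j] for j in range(2)]
            for i in range(2)]
    return 0.5 + 0.5 * trace_norm_2x2(diff)

def min_entropy(p_guess):
    """Min-entropy in bits (assumption: log base 2)."""
    return -math.log2(p_guess)

# Example: |0><0| versus |+><+| with equal priors.
rho_zero = [[1.0, 0.0], [0.0, 0.0]]
rho_plus = [[0.5, 0.5], [0.5, 0.5]]
p = guess_probability(0.5, rho_zero, 0.5, rho_plus)
print(p, min_entropy(p))
```

For this pair the trace distance is $1/\sqrt{2}$, so the guessing probability equals $1/2 + 1/(2\sqrt{2})$.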
The min-entropy accurately measures information in a cryptographic setting [31], and for single shot experiments. This is in contrast to the von Neumann entropy, which is concerned with the asymptotic case of a large number of identical experiments. The min-entropy and the von Neumann entropy can be arbitrarily different, as is easily seen by considering the example where the encoding is trivial, that is, $\rho_x = \rho_{x'}$ for all $x$ and $x'$. The strategy that maximizes the guessing probability $P_{\rm guess}(X|E)$ is then simply given by outputting the most likely symbol, i.e., $H_\infty(X|E) = -\log \max_x p_x$ [38], and the conditional von Neumann entropy obeys $H(X|E) = H(X) = -\sum_x p_x \log p_x$. Consider now $\Sigma = \{0,1\}^n$ to be the set of bitstrings of length $n$ and suppose the all '0' string occurs with probability $p_{0^n} = 1/2$, and with probability 1/2 any of the remaining strings occurs with equal probability. Clearly, we have $H_\infty(X|E) = 1$, whereas $H(X) \approx n/2$. That is, the von Neumann entropy can be very large, even if there is one symbol that occurs with extremely high probability. We will remark on the rate of information extraction from a quantum system in terms of the von Neumann entropy later on, but focus on the single shot case given by the min-entropy, or equivalently the probability of error in state discrimination.
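The gap between the two entropies in the bitstring example can be checked numerically. The following Python sketch (illustrative names, logarithms assumed to be base 2) builds the distribution with weight 1/2 on the all-zero string and the remaining weight spread uniformly, and compares the min-entropy with the Shannon entropy $H(X)$.

```python
import math

def min_entropy(probs):
    """H_min(X) = -log2 max_x p_x."""
    return -math.log2(max(probs))

def shannon_entropy(probs):
    """H(X) = -sum_x p_x log2 p_x."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def example_distribution(n):
    """All-zero string with probability 1/2, the other 2^n - 1 strings uniform."""
    rest = 2 ** n - 1
    return [0.5] + [0.5 / rest] * rest

for n in (4, 8, 12):
    probs = example_distribution(n)
    print(n, min_entropy(probs), round(shannon_entropy(probs), 3))
```

The min-entropy stays at one bit for every $n$, while the Shannon entropy equals $1 + \frac{1}{2}\log_2(2^n - 1)$ and therefore grows like $n/2$.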

B. Producing a classical output
To determine how quickly we can acquire classical information, we first need to specify what it means to output classical information from a measurement. Here, we model this process with the help of an additional 'classical' ancilla system $\mathcal{H}_{\rm anc}$ that contains the output. A classical system is associated with a fixed basis, which without loss of generality we take to be the computational basis. Preparation and measurement of a classical system can only be done in this basis, which intuitively corresponds to the idea of storing classical information: The ancilla can be prepared in any state of the fixed basis, and is subsequently measured in this basis after time $t$. The information contained in this register captures the notion of a classical probability distribution over the basis elements.
We model the process of state discrimination as follows. The problem is to discriminate between $N$ states $\rho_x$ on the encoding space $\mathcal{H}_{\rm enc}$, where $N$ is the number of possible classical symbols. At the beginning of the experiment the ancilla system is initialized to the symbol occurring with the largest probability, $|x_{\max}\rangle$, where $p_{x_{\max}} \geq p_x$ for all $x$. This initial condition captures the distinguisher's a priori knowledge: recall that without access to the quantum register, $H_\infty(X|E) = -\log p_{x_{\max}}$. If there are multiple classical symbols with the same value $p_{x_{\max}}$, we take the smallest one in lexicographic order. We will discuss the choice of initial state in detail below. The ancilla system has total dimension $d_{\mathcal{H}_{\rm anc}} = N$ and the other directions correspond to the classical symbols $x$. The experimenter implements a unitary $U$ on $\mathcal{H}_{\rm enc} \otimes \mathcal{H}_{\rm anc}$ during a specified time $t$. At this point the ancilla system is passed to a referee who will decide whether information has been gathered successfully by measuring $\mathcal{H}_{\rm anc}$ in the computational basis, using measurement operators $P_x = |x\rangle\langle x|$, where the subscript $x$ denotes the corresponding classical output. Hence the success probability of correctly identifying the state $\rho_x$ using this procedure when the ancilla was initially in the state $|x_{\max}\rangle$ is given by
$${\rm tr}\left[ (\mathbb{I} \otimes P_x) \, U \left( \rho_x \otimes |x_{\max}\rangle\langle x_{\max}| \right) U^\dagger \right] . \qquad (12)$$
See Figure 1 for a schematic depiction of this process. Note that the ancilla is measured by the referee at no time cost. This is a natural assumption in our setting where we imagine that the final information is extracted by a referee who is not limited by any energy constraints. Such a referee naturally arises in, for example, the setting of Bell inequalities which we consider later. We will from now on assume that measurements producing classical outcomes are always performed this way.
FIG. 1: Our protocol for distinguishing quantum states in finite time. First, the encoding register is placed into an encoding $\rho_x$ of the classical symbol $x$ chosen with probability $p_x$. The ancilla is initialized in the state $|x_{\max}\rangle$. Second, we can perform a unitary interaction $U = \exp(-iHt/\hbar)$ for time $t$ between the encoding and the ancilla register. Finally, the ancilla register is measured by the referee in the computational basis to determine a guess $\hat{x}$ for $x$. If $\hat{x} = x$, then we successfully recovered the classical information. In the setting of Bell inequalities considered later on, the ancilla register is simply the message returned to the referee.
To bound how much min-entropy we have after time $t$, our goal is to place bounds on the success probability in terms of the unitary $U = \exp(-iHt/\hbar)$, that is, in terms of the interaction Hamiltonian and the time $t$. Throughout, we will assume that $H \geq 0$ and that the lowest energy level is in fact $E_0 = 0$. Any other Hamiltonian differs from such an $H$ by a term proportional to the identity, which does not contribute to the speed of information gain. We explicitly chose not to use the common convention $\hbar = 1$ to make it easier to draw comparisons to the Margolus-Levitin theorem [24] later on. Before turning to our actual bounds, let us first introduce some additional notation which we will refer to throughout the paper. We will use $\tilde{\rho}_x := \rho_x \otimes |x_{\max}\rangle\langle x_{\max}|$ to denote the combined state consisting of the input state $\rho_x$ on the encoding space, and the initial state of the ancilla $|x_{\max}\rangle\langle x_{\max}|$. We also write Furthermore, it will be convenient to rewrite the success probability (12) in terms of measurement operators as ${\rm tr}(M_x \tilde{\rho}_x)$, where and The average success probability for a particular Hamiltonian $H$ and time $t$ can now be written as

III. TIME VS. INFORMATION GAIN
We are now ready to derive our bounds. For simplicity, we will outline how this can be done for the case of two equiprobable states, and merely state our general result. Precise statements as well as a detailed derivation can be found in the appendix.
A. An upper bound to Pguess(X|E)
We now first derive an upper bound to the guessing probability. For the case of two equiprobable states (i.e., $N = 2$ and $p_x = 1/2$ for all $x \in \mathcal{X}$), such bounds are easy to obtain when we allow unlimited time (or energy). In particular, it is well known that in this case the success probability is given by [2]
$$P_{\rm guess}(X|E) = \frac{1}{2} + \frac{D(\rho_0, \rho_1)}{2} ,$$
where $D(\rho_0, \rho_1) = \frac{1}{2}\|\rho_0 - \rho_1\|_1$ is the trace distance of the two states. Let us now consider what happens in our time limited scenario for a particular interaction Hamiltonian $H$. First of all, recall that for two equiprobable states, the ancilla is initialized to the smallest value, $x_{\max} = 0$. For two states, the success probability $P_{\rm succ}$ averaged over the choice of input state, using the measurement given by operators $M_1$ and $M_0 = \mathbb{I} - M_1$ from (17), can now be expressed as where the fourth equality follows immediately from the fact that $P_1 |x_{\max}\rangle\langle x_{\max}| = |x_{\max}\rangle\langle x_{\max}| P_1 = 0$. Let us now upper bound the term involving $W_1$. Again using that Define $\tilde{A} := \tilde{\rho}_1 - \tilde{\rho}_0$, and consider its diagonalization Using the fact that $R \cdot R^\dagger$ is a positive map [32] and $0 \leq \mathbb{I} \otimes P_x \leq \mathbb{I}$, we can now bound the term Substituting back into our original bound (27) gives us (31). This is the basic inequality that we can use, along with some restriction on the allowed energies $E_n$, to bound the success probability for state discrimination in time $t$. In the rest of the paper we will apply this in two main settings: bounded maximum energy and bounded average energy.

A bound in terms of the maximum energy
From (31), we can immediately obtain a bound on the success probability for state discrimination in terms of the maximum energy $\|H\|_\infty$ of the coupling Hamiltonian $H$. ($\|H\|_\infty$ is just the largest eigenvalue of $H$.) This bound is attractive since it is simple to derive and has the appealing feature that it involves the trace distance between the two states, and is thus directly related to the probability that we distinguish the two states given an unlimited amount of time. However, there are many systems of physical interest where the maximum energy of system states is effectively unbounded. Even though we may without loss of generality assume that the spectrum is bounded for a particular set of input states (see appendix), this bound is nevertheless quite unsatisfying in these situations since it can be very weak. For this reason, we use the fundamental inequality (31) in the next section to derive a bound on the success probability that depends only on the average energy.
Note that since ${\rm tr}(\tilde{A}_+) = D(\tilde{\rho}_0, \tilde{\rho}_1) = D(\rho_0, \rho_1)$ we immediately obtain that the success probability obeys where $C_{\max} = {\rm argmax}_{E_n} (1 - \cos(tE_n/\hbar))$. If $tE_n/\hbar \leq 1$ for all $n$, then this upper bound simply reads which will be useful for comparison below. For larger values of $tE_n/\hbar$ it is easy to see that where

A bound in terms of the average energy
A sometimes more satisfying bound can be obtained in terms of the average energy. Note that we can upper bound (31) as and hence we may use the fact that to obtain Now, the asymmetry between the labels 0 and 1 is inessential. The bound remains true if we swap the two state labels, as may be seen by repeating the above derivation with the roles of the two labels exchanged. Averaging these two bounds we find the following symmetric bound This bound should be compared with the bound (33) in terms of the maximum energy in which the trace distance appears. The quantity on the right hand side of (40) is loosely an energy-weighted trace distance. Whereas this bound is certainly stronger for a particular choice of $H$, it no longer bears an obvious quantitative relation to the Helstrom bound in terms of the trace distance. In deriving (40) we have made use of the knowledge of the optimal measurements for distinguishing a pair of states. This is no longer possible in more complicated cases, even where unlimited time is allowed [6]. We can weaken the bound somewhat, using the fact that $\rho, H \geq 0$ to obtain a bound explicitly in terms of the average energy as follows So we see that the average energy of the joint system and ancilla places a bound on the success probability of state discrimination, as claimed. This bound may be generalized easily to the case of more than two classical symbols and an arbitrary distribution $\{p_x\}_x$. We show in the appendix that Then the probability of distinguishing $\rho_0, \ldots, \rho_{N-1}$ given with probabilities $p_0, \ldots, p_{N-1}$ using the Hamiltonian $H$ obeys Note that the term $\sum_x p_x \, {\rm tr}(H \tilde{\rho}_x)$ is the energy of the encoding and ancilla register averaged over the choice of input symbols.

B. A lower bound on Pguess(X|E)
We now exhibit a specific measurement strategy for two equiprobable states, which attains our upper bound up to a constant factor. We again focus on the case of two possible input states, as for the general setting there is no analytic procedure of obtaining the optimal measurements even in the setting of unlimited time. Our construction for two states will make explicit use of this optimal measurement.
Let $A = \rho_1 - \rho_0$. It is well known [2] that the optimal distinguishing measurement in the time unlimited case without the use of an ancilla is given by where $\Pi_{A_+}$ and $\Pi_{A_-}$ are projectors on the positive and negative eigenspace of $A$ respectively. To construct our Hamiltonian $H$, let us diagonalize $A = \sum_j \lambda_j |u_j\rangle\langle u_j|$, and define $A_+ := \sum_{j, \lambda_j \geq 0} \lambda_j |u_j\rangle\langle u_j|$ and $A_- := A_+ - A$.
Consider the operator Clearly, $\hat{H}$ is Hermitian and unitary, and hence has eigenvalues $\pm 1$. In fact, $\hat{H}$ is the unitary we would use to achieve the optimum distinguishing probability if we were unconcerned with time. We now define a Hamiltonian $H$ For comparison with our upper bound of (33), $H$ obeys the condition $H \geq 0$ and has largest eigenvalue equal to $E_{\max} = \|H\|_\infty$. A simple calculation provided in the appendix shows that for our choice of $H$ we have which gives a lower bound to $P_{\rm succ}(X|E)$ maximized over all possible $H$ in time $t$. This bound matches the upper bound of (33) up to a factor of 1/4. Note that $\hat{H}$ effectively implements a variant of the controlled-NOT (c-NOT) operation on the encoding space and the ancilla. For more than two input states, one could construct a similar $\hat{H}$ implementing a controlled addition mod $N$ on the ancilla, as long as the optimum distinguishing measurement in the case of unlimited time is a projective measurement on the encoding space. This would give a similar relation between time and the original probability of distinguishing the given states. However, it is known that there do exist choices of encodings $\rho_x$ such that the optimum measurement is not projective, and hence we omit this restricted form of generalization.
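To make the controlled-NOT picture concrete, the following Python sketch simulates the simplest instance: two orthogonal encodings $|0\rangle$, $|1\rangle$, for which $\hat{H}$ is an ordinary CNOT with the encoding qubit as control and the ancilla as target. The Hamiltonian used here, $H = \frac{E_{\max}}{2}(\mathbb{I} - \hat{H})$, is an assumption chosen to satisfy the properties stated above ($H \geq 0$, largest eigenvalue $E_{\max}$); it is not copied from the text. Units with $\hbar = 1$ and all function names are likewise illustrative.

```python
import cmath
import math

HBAR = 1.0  # illustrative units (assumption): hbar = 1

def evolve(state, e_max, t):
    """Apply U = exp(-iHt/hbar) for the assumed H = (E_max/2)(I - CNOT).

    Because CNOT squares to the identity, U = c_id * I + c_cnot * CNOT.
    Basis ordering: index = 2 * encoding_bit + ancilla_bit.
    """
    theta = e_max * t / HBAR
    c_id = (1 + cmath.exp(-1j * theta)) / 2
    c_cnot = (1 - cmath.exp(-1j * theta)) / 2
    # CNOT (control: encoding, target: ancilla) swaps |10> and |11>.
    flipped = [state[0], state[1], state[3], state[2]]
    return [c_id * s + c_cnot * f for s, f in zip(state, flipped)]

def success_probability(e_max, t):
    """Average probability that the ancilla reads x for equiprobable x."""
    total = 0.0
    for x in (0, 1):
        state = [0j] * 4
        state[2 * x] = 1.0 + 0j            # encoding |x>, ancilla |x_max> = |0>
        out = evolve(state, e_max, t)
        total += 0.5 * abs(out[2 * x + x]) ** 2
    return total

for t in (0.0, 0.5, 1.0, math.pi):
    print(t, success_probability(1.0, t))
```

Under these assumptions the success probability is $1/2 + (1 - \cos(E_{\max} t/\hbar))/4$. It starts at the a priori value 1/2 and reaches 1 at $t = \pi\hbar/E_{\max}$.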

IV. APPLICATIONS
Let us now consider several applications of our simple bound.

A. Minimum distinguishing time and the Margolus-Levitin theorem
The first application we are interested in is a return to our initial question: Just how quickly can we acquire information? That is, what is the minimum time needed to extract classical information encoded in a quantum system? Note that with the Hamiltonian $H$ in the lower bound for two equiprobable states, there does indeed exist a way to optimally distinguish the two states in time $t = \pi\hbar/\|H\|_\infty$. However, since there is a small gap to our upper bound, it remains an open question whether it is possible to achieve the same in an even shorter amount of time.

Minimum time
Yet, note that our upper bounds on $P_{\rm succ}(X|E)_{H,t}$ can also be understood as lower bounds on the time required to optimally distinguish the given states, retrieving the maximum amount of information from the encoding. Let us first consider our most general bound for large $\mathcal{X}$. We have that if we can distinguish optimally in time $t_{\rm distinguish}$, our upper bound must be at least as large as the optimum $P_{\rm guess}(X|E)$. That is, and hence . (48)

Margolus-Levitin theorem
Let us now consider the special case where two equiprobable input encodings are perfectly distinguishable. That is, $\rho_0 = |0\rangle\langle 0|$ and $\rho_1 = |1\rangle\langle 1|$. Our task is now quite simple: We merely wish to turn the state $|1\rangle|0\rangle$ of the encoding and ancilla system to the state $|1\rangle|1\rangle$, that is, we wish to transform one vector into its orthogonal. Note that given unlimited time (or energy) we can succeed perfectly at this task and hence $P_{\rm guess}(X|E) = 1$. From (48) we thus have Our bound can hence also be understood as putting a limit on the time that it takes to turn a state vector to its orthogonal (on the ancilla), given some additional resource (the encoding register). A bound on the minimum time that it takes to turn a vector into its orthogonal is indeed known as the Margolus-Levitin theorem [24]. In particular, their bound applied to our situation involving both the encoding and the ancilla register gives
$$t_{\rm ML} \geq \frac{\pi\hbar}{2\, {\rm tr}(H \tilde{\rho}_1)} .$$
Such a bound had previously only been derived from the time-energy uncertainty principle where instead of the average energy, we have the energy spread, i.e., the difference between the largest and smallest eigenvalue of the Hamiltonian (see [33] for a review of the history). The Margolus-Levitin theorem has been used to place bounds on the fundamental speed of computation [33], and was even slightly improved for some special cases [34]. Note however that for the Hamiltonian constructed in (45) we have ${\rm tr}(H \tilde{\rho}_1) = E_{\max}/2$ and hence the bound provided by Margolus-Levitin is in fact tight, as we know that (45) lets us achieve the optimum success probability in time $t = \pi\hbar/E_{\max}$. This shows that it is our upper, rather than our lower bound that can be improved.
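For a sense of scale, the Margolus-Levitin time $\pi\hbar/(2\langle E\rangle)$ can be evaluated numerically. The sketch below is only an arithmetic illustration: the choice $E_{\max} = 1$ eV is a hypothetical value not taken from the text, and the mean energy is set to $E_{\max}/2$ as for the Hamiltonian discussed above.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant in J s (CODATA 2018)
EV = 1.602176634e-19    # one electronvolt in joules

def margolus_levitin_time(mean_energy_joule):
    """Minimum time to reach an orthogonal state: pi * hbar / (2 <E>)."""
    return math.pi * HBAR / (2.0 * mean_energy_joule)

e_max = 1.0 * EV                             # hypothetical energy scale
t_ml = margolus_levitin_time(e_max / 2.0)    # mean energy E_max / 2
t_strategy = math.pi * HBAR / e_max          # time used by the explicit strategy
print(t_ml, t_strategy)
```

Both expressions coincide, which is the tightness statement above. For an energy scale of 1 eV the time is roughly two femtoseconds.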
Since we have $\gamma = 3/\pi$ or $\gamma = 5/\pi$ depending on the parameters, our bound is slightly worse than the Margolus-Levitin bound, which stems from our somewhat crude bound on $(1 - \cos(tE_n/\hbar))$. Note, however, that our bound considers a more specialized situation, namely turning the ancilla to its orthogonal given the encoding, but in turn applies to any kind of input states.
That we obtain a Margolus-Levitin type theorem as a side effect of our analysis is not very surprising: Clearly, the speed of dynamical evolution places a bound on how quickly we can transfer information from one system into the other. In turn, however, note that a bound on how quickly information can be transferred does translate into bounds on the speed of evolution as well, and one can think of the speed of dynamical evolution when applied to a computation [33] as being limited by how quickly one can transfer the information required for the subsequent stage of computation.

B. Time-dependent Tsirelson-bound
As another example of how our bound can be used, we will derive a time-dependent Tsirelson's bound [23] for the Bell inequality [21] known as the CHSH inequality [22].

CHSH as a game
We briefly describe the CHSH inequality in its more modern form as a game involving two distant players, Alice and Bob. A detailed account of this formulation and how it allows us to recover the original form of the CHSH inequality can for example be found in [35]. In the CHSH game, we imagine that we pose a question $y \in \{0,1\}$ to Alice and a question $z \in \{0,1\}$ to Bob, chosen uniformly at random, i.e., $p(y) = p(z) = 1/2$. These questions can be identified with the choice of measurement setting in the usual formulation. Alice and Bob now return answers $a \in \{0,1\}$ and $b \in \{0,1\}$ respectively, where we say that Alice and Bob win the game if and only if
$$a + b \bmod 2 = y \cdot z . \qquad (51)$$
Alice and Bob may agree on any strategy beforehand, but they can no longer communicate once the game starts. In the quantum setting, this strategy corresponds to a choice of shared state and measurements, and in an experiment the no-signaling assumption is employed to enforce their inability to communicate. Clearly, one may write the probability that Alice and Bob win for a particular strategy as
$$p_{\rm win} = \frac{1}{4} \sum_{y,z} \; \sum_{a,b \,:\, a + b \bmod 2 = y \cdot z} \Pr[a, b | y, z] ,$$
where $\Pr[a, b|y, z]$ denotes the probability that Alice and Bob return answers $a$ and $b$ given questions $y$ and $z$. For any classical strategy, $p_{\rm win} \leq 3/4$, but quantumly there exists a strategy that achieves $p_{\rm win} = 1/2 + 1/(2\sqrt{2}) \approx 0.853$. This is in fact optimal, since Tsirelson has shown [23,36] that for any quantum strategy
$$p_{\rm win} \leq \frac{1}{2} + \frac{1}{2\sqrt{2}} . \qquad (53)$$
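The two values quoted for the CHSH game can be reproduced with a short calculation. The Python sketch below assumes the standard winning rule $a \oplus b = y \cdot z$. It enumerates all deterministic classical strategies, and then evaluates one well-known quantum strategy on a maximally entangled pair. That strategy uses real measurement directions at angles chosen here for illustration, for which the probability of equal outcomes is $\cos^2$ of the angle difference.

```python
import itertools
import math

def wins(a, b, y, z):
    """Assumed CHSH winning rule: a XOR b equals y AND z."""
    return (a ^ b) == (y & z)

def classical_value():
    """Best winning probability over deterministic strategies a = fa[y], b = fb[z]."""
    best = 0.0
    for fa in itertools.product((0, 1), repeat=2):
        for fb in itertools.product((0, 1), repeat=2):
            p = sum(wins(fa[y], fb[z], y, z)
                    for y in (0, 1) for z in (0, 1)) / 4.0
            best = max(best, p)
    return best

def quantum_value():
    """Winning probability of a standard strategy on a maximally entangled pair."""
    alice = (0.0, math.pi / 4)          # Alice's measurement angle for y = 0, 1
    bob = (math.pi / 8, -math.pi / 8)   # Bob's measurement angle for z = 0, 1
    total = 0.0
    for y in (0, 1):
        for z in (0, 1):
            p_equal = math.cos(alice[y] - bob[z]) ** 2   # Pr[a = b | y, z]
            total += (1.0 - p_equal) if (y & z) else p_equal
    return total / 4.0

print(classical_value(), quantum_value())
```

Randomized classical strategies are convex mixtures of the sixteen deterministic ones, so enumerating the latter suffices for the classical maximum of 3/4.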

Strategies and state discrimination
For our purposes, it will be convenient to employ a simple observation about what Bob has to do in order to produce the right answer in the game, which was described in more detail in [35]. Let $\rho_{y,a}$ denote the state of Bob's system conditioned on the fact that Alice received question $y$ and has given answer $a$. Note that Bob's system will be placed in this state with probability $p(y, a) = p(a|y)/2$. For $z = 0$, the rules of the game (51) state that Alice and Bob win if and only if Bob returns the same answer as Alice, that is, $b = a$. In other words, Bob would like to determine which of the following two states he is given where $q^{z=0,0}_{y} = p(0|y)/(p(0|0) + p(0|1))$, and the probability of $\sigma^{z=0}_x$ is given by $p^{z=0}_x = (p(x|0) + p(x|1))/2$. That is, Bob would simply try to extract classical information stored in quantum states, which is exactly the setting that our bound applies to. Producing a classical outcome on the ancilla system is very natural in this setting as we can imagine that when giving his answer Bob simply returns his ancilla to a referee who decides whether Alice and Bob win [39]. Similarly, if $z = 1$ Bob would like to determine which of the following two states he is given where $q^{z=1,0}_{y=0} = p(0|0)/(p(0|0) + p(1|1))$, $q^{z=1,0}_{y=1} = p(1|1)/(p(0|0) + p(1|1))$, $q^{z=1,1}_{y=0} = p(1|0)/(p(1|0) + p(0|1))$, $q^{z=1,1}_{y=1} = p(0|1)/(p(1|0) + p(0|1))$, the probability of $\sigma^{z=1}_0$ is $p^{z=1}_0 = (p(0|0) + p(1|1))/2$, and the probability of $\sigma^{z=1}_1$ is $p^{z=1}_1 = (p(1|0) + p(0|1))/2$. The probability that Alice and Bob win the game for a particular strategy can now be expressed as where we write $P_{\rm guess}(X_z|E_z)$ for Bob's success probability in solving the state discrimination problems described above for $z \in \{0,1\}$. From this perspective, Tsirelson's bound provides us with an upper bound on how well we can solve these two problems on average.

A time limited game
In the usual setting of this game, Alice and Bob are essentially given an unlimited amount of time and energy to produce their answers. But how well can they do given only a limited amount of energy and time? Here, we consider a time-limited version of the CHSH game, in which Alice and Bob are given a fixed time $t$ to produce their answers. If no answers are given at time $t$, we automatically rule that Alice and Bob lose. Our goal will be to derive a time-dependent version of (53). For simplicity, we will assume that Alice has an essentially unlimited amount of energy at her disposal and only Bob will be restricted in some fashion. Given the perspective that Bob has to solve a state discrimination problem to produce the right answer as explained above, it is clear that we can use our general bound to address this setting. The use of an ancilla register is very natural, as we can view it as the message system holding Bob's answer that is returned to the referee.
In the usual scenario, Alice and Bob can choose which state to share at the start of the game as part of their strategy. Note, however, that we cannot allow arbitrary starting states to begin with, as we want to put a limit on the energy that Bob has at his disposal. For simplicity, however, we will make the sole assumption that Bob's Hamiltonian has bounded maximum energy $\|H\|_\infty$. In the appendix, we will derive a general time-dependent Tsirelson bound from this assumption, where we will need our generalization of the time bound for two input states to the case of non-uniform input distributions.
Here, we will focus on the essential idea that underlies this bound which already becomes apparent if we consider a slightly simpler scenario in which Alice's marginal distributions are uniform (p(a|y) = 1/2 for all y). This scenario is well motivated if we imagine that there is a source supplying Alice and Bob with the maximally entangled state which lies outside of their control, and their strategy is restricted to their choice of two-outcome observables. In this case, Alice's outcome distribution will either be deterministic or uniform. In the deterministic case, Alice essentially plays a classical strategy. To obtain a quantum advantage in the case of unlimited time, Alice's outcome distributions will be uniform, and we will hence focus on this case.
To obtain a time-dependent Tsirelson bound, we now employ our simple bound involving the original trace distance of the two states that we wish to discriminate (34). We have by Tsirelson's bound that and hence by (22) and the fact that $p(a|y) = 1/2$ otherwise there would exist a better strategy for Alice and Bob at long times. So we have from (34) that In particular, this means that if we allow only a limited amount of energy by Bob (e.g., by demanding that $\|H\|_\infty = 1$), then Bob needs time at least to achieve the optimum quantum violation of CHSH. Note that to achieve the optimum quantum violation, Alice's marginals will in fact be uniform, and hence this is indeed the minimum time required.
Clearly, for small time frames, it would be better for Alice and Bob to play a classical strategy in which Bob can just return the ancilla $|0\rangle$ "as is" to the referee. The tradeoff between the classical and quantum strategies in our setting can be captured when considering arbitrary distributions, which we will address in the appendix. In particular, we will show the following.
Corollary IV.1. Let Bob's Hamiltonian be scaled such that $H \geq 0$. Then the maximum success probability of winning the CHSH game for Alice and Bob in time $t$ obeys where
$$\gamma := \begin{cases} 5/\pi & \text{if } 1 < tE_n/\hbar < 4 , \\ 3/\pi & \text{otherwise} . \end{cases}$$
We could also derive a more general bound in terms of Bob's average energy using (40). However, such a bound does not compare easily to the original Tsirelson's bound.
Of course, the minimum time (68) is extremely small, and irrelevant for any practical tests of CHSH. Indeed, it is not our intention to question the validity of present CHSH experiments or suggest any loopholes caused by an insufficient distance for Alice and Bob compared to the time it takes them to achieve Tsirelson's bound. Instead, we provided the present analysis as an illustrative example of how our bound applies.
We would like to point out that (69) tells us that the strength of non-local correlation is indeed a function of time. Furthermore, (68) tells us that there exists a fundamental time required to establish maximally strong quantum correlations. Finally, we note that one can also interpret (69) in another way: suppose that we fix a time t and observe that Alice and Bob win the game with probability at least q. We can now rewrite (69) to obtain a lower bound on $\|H\|_\infty$. That is, we can conclude that Bob had a certain energy at his disposal, and the strength of non-local correlations in this setting provides us with a form of "energy witness" for Bob. This also holds for the most general case discussed in the appendix.

A. Choice of initial state
We obtained a series of simple bounds on how well we can recover classical information stored in a quantum system within a certain timeframe. Let us now first consider what role the choice of initial state of the ancilla played in our bounds. During our discussions we assumed that the ancilla started out in the classical state corresponding to the most likely symbol $x_{\max}$. This reflects the fact that the distinguisher has full knowledge not only of the states $\rho_x$ themselves, but also of the distribution $p_x$. In particular, this means that without touching the quantum register, he can always achieve a success probability of $p_{x_{\max}}$ by outputting $x_{\max}$. Clearly, we could have chosen any other classical symbol as our starting point, and our bounds can easily be adapted accordingly. This holds even for an arbitrary pure state of the ancilla. Yet, such a choice does not reflect the distinguisher's a priori knowledge.
Another option would be to let the ancilla start out in a special blank state, which intuitively corresponds to an outcome of "don't know" and is orthogonal to all other outputs. It is straightforward to apply our methods to obtain a similar bound for this case. Yet, note that using a blank ancilla state is conceptually rather different, since it means that we essentially neglect the a priori knowledge that a distinguisher has available.

B. Input size
Our bound is especially useful if we are merely concerned with the probability of success that can be achieved within a certain time t in principle, using any physically allowed operation H. This is indeed interesting when we consider the problem of Bell inequalities, where we wanted to obtain a bound on how well Alice and Bob can violate CHSH within a given time frame when they can choose any Hamiltonian they like, subject to energy constraints alone. In particular, we would like to emphasize that the time required to acquire classical information in our setting is not limited by the size of the alphabet X, but merely by the choice of encodings. In practice, however, there are much more stringent constraints on how quickly information can be transferred that depend on the geometry of the ancilla, leading to additional constraints on the interaction Hamiltonian H. For example, it could be that H can consist only of two-qubit interactions, and that interactions between the encoding system and the ancilla are limited to their boundary. In this case, the size of the alphabet X clearly does matter, and stronger bounds should therefore depend strongly on the exact form of H. We note that some bounds on time scales for particular Hamiltonians H do follow from the decoherence and thermodynamics literature [28,29] for pure state encodings, yet since such bounds typically involve $\|H_{\rm int}\|_\infty$, where $H_{\rm int}$ is the interacting part of H, they offer little advantage in our setting. How such bounds are related to ours is most easily seen by considering the conditional von Neumann entropy H(X|E). Note that if all $\rho_x$ are pure, the overall cqq-state $\rho_{XEA}$ [40] is pure as well. Hence, H(X|E) = H(XE) − H(E) = H(A) − H(E).
To determine how H(X|E) can change with time, we would thus like to determine how the entropy of the reduced systems A and E evolves with time. This has been studied for the von Neumann entropy in the decoherence literature, where an upper bound for the rate of change in entropy was obtained in terms of $\|H_{\rm int}\|_\infty$ [28]. Similar considerations can be made for other entropies [37]. It is an interesting open question to obtain good bounds on such quantities for arbitrary H that take more of their structure into account.

C. Open questions
Clearly, this is not the only interesting open question. Closely related is the question of how much time is required to demonstrate non-local correlations if Alice and Bob are yet more restricted. Again, this could take the form of physical constraints on the ancilla, or be considered in the framework of circuit complexity, where one cares about the number of two-qubit interactions, i.e., gates, that they have to apply. The example of CHSH is too small for such constraints to make a difference, but they do play an important role when considering more complicated inequalities.
Furthermore, it would be nice to see if the slight gap between our bound and the Margolus-Levitin theorem can be closed completely using a more stringent analysis for the case of orthogonal encodings ρ x . In particular, this means that one would rederive the exact form of the Margolus-Levitin theorem from the rate of information transfer alone.
N states $\rho_0, \ldots, \rho_{N-1}$. The first lemma will be used to bound the success probabilities using measurement operators $M_x = I \otimes P_x + W_x$, where the label $x \in \{0, \ldots, N-1\}$ corresponds to one of the N states we wish to identify.
Lemma A.1. For any Hermitian operator $A \in \mathcal{B}(\mathcal{H}_{\rm in})$ with diagonalization $A = \sum_j \lambda_j |u_j\rangle\langle u_j|$, and any $x \in \{0, \ldots, N-1\}$, the operator $\tilde{A} := A \otimes |x_{\max}\rangle\langle x_{\max}|$ satisfies
Proof. Using the definition of $W_x$ from (18) we evaluate the terms involving $W_x^1$ and $W_x^2$ separately. Let us now first bound the term involving $W_x^1$. For $x \neq x_{\max}$ we have that where we used the linearity and cyclicity of the trace, as well as the fact that $P_x |x_{\max}\rangle\langle x_{\max}| = |x_{\max}\rangle\langle x_{\max}| P_x = 0$ for all $x \neq x_{\max}$. Let $A_- := \sum_{j,\lambda_j<0} |\lambda_j|\, |u_j\rangle\langle u_j|$, and define $\tilde{A}_+ := A_+ \otimes |x_{\max}\rangle\langle x_{\max}|$ and $\tilde{A}_- := A_- \otimes |x_{\max}\rangle\langle x_{\max}|$. Note that $\tilde{A} = \tilde{A}_+ - \tilde{A}_-$. For $x = x_{\max}$ we can now use the fact that where the fourth equality follows from Euler's formula, and the first inequality from the fact that $\cos(tE_n/\hbar) - 1 \leq 0$ and $\tilde{A}_+, \tilde{A}_- \geq 0$.
It remains to bound the term involving $W_x^2$. First of all, since $\Lambda(X) = RXR^\dagger$ is a positive map [32], and $\tilde{A}_+, \tilde{A}_- \geq 0$, we have that Note that for any $X, Z \geq 0$, we have $\mathrm{tr}(XZ) \geq 0$, and hence $\mathrm{tr}\left((I \otimes P_x) R \tilde{A}_- R^\dagger\right) \geq 0$. Second, note that $RR^\dagger = R^\dagger R$ and we have where the second equality follows by applying Euler's formula. We thus have where the first inequality follows from (A13), the fact that $0 \leq I \otimes P_x \leq I$ and the cyclicity of the trace, and the last equality from (A15). Putting everything together, $\mathrm{tr}(W_x \tilde{A}) = \mathrm{tr}(W_x^1 \tilde{A}) + \mathrm{tr}(W_x^2 \tilde{A})$, we obtain the claimed result.
We will also make repeated use of the following bound. Note that whereas the bound applies to a very large range of values $E_n \geq 0$, we will later be particularly interested in the case of $tE_n/\hbar < 1$. Indeed, the bound below is a great overestimate if $tE_n/\hbar > 2\pi$, as $2(1 - \cos(k)) \leq \gamma\left(k - 2\pi\lfloor k/(2\pi) \rfloor\right)$.
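As a quick sanity check of the inequality that is invoked later in the text, $2(1 - \cos k) \leq \gamma k$ for $k = tE_n/\hbar \geq 0$, the following sketch evaluates both sides on a grid. It assumes the piecewise constant $\gamma$ quoted in Corollary IV.1; the function names `gamma` and `worst_gap`, the grid range and the grid resolution are illustrative choices, and a grid check is of course not a proof.

```python
import math


def gamma(k):
    """Piecewise constant from Corollary IV.1, with k standing for t*E_n/hbar."""
    return 5.0 / math.pi if 1.0 < k < 4.0 else 3.0 / math.pi


def worst_gap(k_max=50.0, steps=200_000):
    """Largest value of 2(1 - cos k) - gamma(k)*k on a uniform grid over [0, k_max].

    A non-positive return value means the inequality held at every grid point.
    """
    worst = -math.inf
    for i in range(steps + 1):
        k = k_max * i / steps
        gap = 2.0 * (1.0 - math.cos(k)) - gamma(k) * k
        worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    print("largest gap on the grid:", worst_gap())
```

On this grid the two sides only meet at $k = 0$. The bound is fairly tight for $k$ close to 1 and becomes loose once $k$ exceeds a few multiples of $\pi$, in line with the remark above about large $tE_n/\hbar$.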

A bound in terms of the trace distance
First of all, note that even for a general distribution $\{p_x\}_x$ the problem of distinguishing two states is easy to analyze [2]. In particular, we have that in the time-unlimited case for measurement operators acting directly on the encoding space where $\Delta(p_1\rho_1, p_0\rho_0)$ is given by where $A := p_1\rho_1 - p_0\rho_0$ with diagonalization $A = \sum_j \lambda_j |u_j\rangle\langle u_j|$ and $A_+ = \sum_{j,\lambda_j \geq 0} \lambda_j |u_j\rangle\langle u_j|$. (Note that $\Delta$ is not symmetric here and hence formally does not form a distance measure.) Similarly, we have
$\Delta(p_0\rho_0, p_1\rho_1) = \mathrm{tr}(A_-)$. (B6)
Note that for the time-unlimited case, we could have equivalently expressed the success probability as It will also be useful to note that for $\tilde{\rho}_x = \rho_x \otimes |x_{\max}\rangle\langle x_{\max}|$, Before stating our bound, let us introduce some additional notation. For two states, define We now first relate the problem of discriminating the two states in time t to the original success probability.
Lemma B.1. The probability of distinguishing $\rho_0$ and $\rho_1$ given with probabilities $p_0$ and $p_1$ using the Hamiltonian $H = \sum_n E_n |E_n\rangle\langle E_n| \geq 0$ is bounded by where $C_{\max} = \mathrm{argmax}_{E_n} (1 - \cos(tE_n/\hbar))$.
Our claim now follows by plugging this bound into (B11).
With the help of Lemma A.2 one may now also use the fact that $E_n \leq \|H\|_\infty$ for all $E_n$ to obtain a very simple bound in terms of the spectrum of the Hamiltonian.
Corollary B.2. The probability of distinguishing $\rho_0$ and $\rho_1$ given with probabilities $p_0$ and $p_1$ using the Hamiltonian $H = \sum_n E_n |E_n\rangle\langle E_n| \geq 0$ is bounded by

A bound in terms of the average energy
Inspecting the proof above with Lemma A.2 in mind, it is indeed easy to see that we can also obtain a bound in terms of average energies. We first derive a somewhat stronger bound for two equiprobable states that actually depends on the "average energy" of a function of both states.
Theorem B.3. The probability of distinguishing $\rho_0$ and $\rho_1$ given with probabilities $p_0$ and $p_1$ using the Hamiltonian $H = \sum_n E_n |E_n\rangle\langle E_n| \geq 0$ in time t is bounded by
Our claim now follows by noting that

Appendix C: A bound for many input states
Finally, we derive a bound for the most general case of distinguishing states $\rho_0, \ldots, \rho_{N-1}$ where we are given $\rho_x$ with probability $p_x$. Proof. Note that the success probability for a particular interaction H is now given by where
$\mathrm{tr}(M_x \tilde{\rho}_x) = \mathrm{tr}\left((I \otimes P_x)\tilde{\rho}_x\right) + \mathrm{tr}(W_x \tilde{\rho}_x)$.
Let us now first consider the case of $x = x_{\max}$. We have that $\mathrm{tr}\left((I \otimes P_x)\tilde{\rho}_x\right) = \mathrm{tr}(\tilde{\rho}_x) = 1$.
We now turn to the case of $x \neq x_{\max}$. Since $P_x |x_{\max}\rangle\langle x_{\max}| = |x_{\max}\rangle\langle x_{\max}| P_x = 0$ and $\tilde{\rho}_x = \rho_x \otimes |x_{\max}\rangle\langle x_{\max}|$, we have $(I \otimes P_x)\tilde{\rho}_x = 0$ for all $x \neq x_{\max}$. Again, by applying Lemma A.1 with $A = \rho_x$ we obtain from $\rho_x \geq 0$ that
$\mathrm{tr}(W_x \tilde{\rho}_x) \leq 2 \sum_n \left(1 - \cos(tE_n/\hbar)\right) \langle E_n|\rho_x|E_n\rangle$. (C7)
Our claim now follows by using Lemma A.2 to obtain $2(1 - \cos(tE_n/\hbar)) \leq \gamma tE_n/\hbar$ for $E_n \geq 0$.

Appendix D: Attaining the bound
We now exhibit a Hamiltonian that achieves our upper bound for the success probability of distinguishing two states given with equal a priori probability.

(D1)
In particular, we can distinguish the two states perfectly in time t = π/E max .
Proof. Let $A = \rho_1 - \rho_0$. We can diagonalize $A = \sum_j \lambda_j |u_j\rangle\langle u_j|$, and define $A_+ := \sum_{j,\lambda_j \geq 0} \lambda_j |u_j\rangle\langle u_j|$ and $A_- := A - A_+$. Consider the operator where $\Pi_{A_+}$ and $\Pi_{A_-}$ are projectors on the support of $A_+$ and $A_-$ respectively. Clearly, $\hat{H}$ is Hermitian and unitary, and hence has eigenvalues ±1. We now define the Hamiltonian $H := E_{\max}(\hat{H} + I)/2$.
The claim follows by an application of the double angle formula.
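The construction can be illustrated numerically for two qubit states. The operator $\hat{H}$ displayed in the proof is not reproduced in this text, so the sketch below assumes one particular Hermitian and unitary choice, $\hat{H} = \Pi_{A_+} \otimes X + \Pi_{A_-} \otimes I$ with $X$ the Pauli flip on a qubit ancilla; this is an assumption that has the stated properties and is not necessarily the operator used in the proof. Further assumptions: $\hbar = 1$ by default, the ancilla starts in $|0\rangle$, the two states are distinct qubit states (so $A$ has eigenvalues $\pm\lambda$ with $\lambda > 0$), and all function names are hypothetical. Since $\hat{H}^2 = I$, the evolution is $e^{-iHt/\hbar} = e^{-i\theta}(\cos\theta\, I - i \sin\theta\, \hat{H})$ with $\theta = E_{\max}t/(2\hbar)$.

```python
import cmath
import math

I2 = [[1.0, 0.0], [0.0, 1.0]]
X = [[0.0, 1.0], [1.0, 0.0]]   # Pauli X: flips the ancilla (assumed coupling)
P0 = [[1.0, 0.0], [0.0, 0.0]]  # |0><0|
P1 = [[0.0, 0.0], [0.0, 1.0]]  # |1><1|


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lincomb(ca, a, cb, b):
    """Entrywise ca*a + cb*b."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def build_h_hat(rho0, rho1):
    """Assumed H_hat = Pi_+ (x) X + Pi_- (x) I for A = rho1 - rho0 on a qubit."""
    a = lincomb(1.0, rho1, -1.0, rho0)
    # A is traceless and Hermitian, so its eigenvalues are +lam and -lam.
    lam = math.sqrt(abs(a[0][0]) ** 2 + abs(a[0][1]) ** 2)
    pi_plus = lincomb(0.5, I2, 0.5 / lam, a)
    pi_minus = lincomb(0.5, I2, -0.5 / lam, a)
    return lincomb(1.0, kron(pi_plus, X), 1.0, kron(pi_minus, I2))


def success_probability(rho0, rho1, e_max, t, hbar=1.0):
    """Equal priors; evolve under H = e_max*(H_hat + I)/2, then read the ancilla."""
    h_hat = build_h_hat(rho0, rho1)
    theta = e_max * t / (2.0 * hbar)
    phase = cmath.exp(-1j * theta)
    # exp(-i H t / hbar) in closed form, valid because H_hat squares to the identity.
    u = lincomb(phase * math.cos(theta), kron(I2, I2),
                -1j * phase * math.sin(theta), h_hat)

    def outcome_prob(rho, proj):
        evolved = matmul(matmul(u, kron(rho, P0)), dagger(u))
        measured = matmul(kron(I2, proj), evolved)
        return sum(measured[i][i] for i in range(4)).real

    return 0.5 * outcome_prob(rho1, P1) + 0.5 * outcome_prob(rho0, P0)
```

Under these assumptions the success probability works out to $\frac{1}{2} + \frac{\lambda}{4}\left(1 - \cos(E_{\max}t/\hbar)\right)$, where $\lambda$ is the trace distance between the two states. It starts at $1/2$ and reaches the time-unlimited optimum $\frac{1}{2} + \frac{\lambda}{2}$ at $t = \pi\hbar/E_{\max}$, which equals 1 for orthogonal states.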
Let us now consider a more general version of our time-dependent Tsirelson's bound in which we drop the assumption that the source emits a particular state, and that Alice makes a two-outcome projective measurement. The only assumption we will make now is that Bob's Hamiltonian is bounded, $\|H\|_\infty = E_{\max}$.
For our proof, we will need the more general version of the two-state discrimination problem in which the two states are not necessarily given with equal probabilities (see Corollary B.2). Again, let us first briefly consider the time-unlimited case, where $M_0$ and $M_1$ are just measurements on a single system. Recall that we could express the success probability of distinguishing $\rho_0$ and $\rho_1$ given with probabilities $p_0$ and $p_1$ respectively as $P_{\rm guess}(X|E) = p_0 + \Delta(p_1\rho_1, p_0\rho_0)$.
At first glance, this expression appears a bit asymmetric; after all, what should be so special about $p_0$? Note, however, that by replacing $M_1 = I - M_0$ in (B1) we could also have expressed the success probability as $P_{\rm guess}(X|E) = p_1 + \Delta(p_0\rho_0, p_1\rho_1)$.
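The equality of the two expressions for the time-unlimited success probability can be checked numerically for a pair of qubit states. The sketch below assumes the convention $A = p_1\rho_1 - p_0\rho_0 = A_+ - A_-$ with $A_\pm \geq 0$, so that $\Delta(p_1\rho_1, p_0\rho_0) = \mathrm{tr}(A_+)$ and $\Delta(p_0\rho_0, p_1\rho_1) = \mathrm{tr}(A_-)$, and it uses the closed-form eigenvalues of a 2×2 Hermitian matrix. The function names are illustrative only.

```python
import math


def signed_parts(p0, rho0, rho1):
    """Return (tr A_+, tr A_-) for A = p1*rho1 - p0*rho0 on a qubit, with p1 = 1 - p0."""
    p1 = 1.0 - p0
    a = complex(p1 * rho1[0][0] - p0 * rho0[0][0]).real  # diagonal entries are real
    d = complex(p1 * rho1[1][1] - p0 * rho0[1][1]).real
    b = p1 * rho1[0][1] - p0 * rho0[0][1]                # off-diagonal, may be complex
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)
    eigenvalues = (mean + radius, mean - radius)
    tr_plus = sum(max(e, 0.0) for e in eigenvalues)
    tr_minus = sum(max(-e, 0.0) for e in eigenvalues)
    return tr_plus, tr_minus


def guessing_probabilities(p0, rho0, rho1):
    """The two one-sided expressions for P_guess and their average."""
    tr_plus, tr_minus = signed_parts(p0, rho0, rho1)
    via_p0 = p0 + tr_plus
    via_p1 = (1.0 - p0) + tr_minus
    averaged = 0.5 * (1.0 + tr_plus + tr_minus)
    return via_p0, via_p1, averaged


if __name__ == "__main__":
    rho_zero = [[1.0, 0.0], [0.0, 0.0]]
    rho_plus = [[0.5, 0.5], [0.5, 0.5]]
    print(guessing_probabilities(0.7, rho_zero, rho_plus))
```

All three values agree because $\mathrm{tr}(A) = p_1 - p_0 = \mathrm{tr}(A_+) - \mathrm{tr}(A_-)$. When $A \leq 0$, for instance for two identical states with $p_0 > 1/2$, the first expression reduces to $p_0$, which is the probability of guessing the more likely symbol without measuring.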
In particular, it will be convenient to note that we could have also written the success probability as the average of these two terms,
$P_{\rm guess}(X|E) = \frac{1}{2}\left[p_0 + \Delta(p_1\rho_1, p_0\rho_0) + p_1 + \Delta(p_0\rho_0, p_1\rho_1)\right]$.
Let us now return to the time-limited case, involving an interaction of the encoding and ancilla system, followed by a measurement on the ancilla. Recall that we have from Corollary B.2 that
$P_{\rm guess}(X|E)_{H,t} \leq p_{x_{\max}} + t\gamma\|H\|_\infty\, \Delta(p_{x_{\min}}\rho_{x_{\min}}, p_{x_{\max}}\rho_{x_{\max}})$. (F3)
Note that in the time-limited case we cannot simply average: the proof of Corollary B.2 yields a different bound had we placed $p_{x_{\min}}$ in front (a small calculation shows that it will again single out $p_{x_{\max}}$). We are now ready to show our general bound, where we will use the notation developed in Section IV B 2.