Simulating extremal temporal correlations

The correlations arising from sequential measurements on a single quantum system form a polytope. This is defined by the arrow-of-time (AoT) constraints, meaning that future choices of measurement settings cannot influence past outcomes. We discuss the resources needed to simulate the extreme points of the AoT polytope, where resources are quantified in terms of the minimal dimension, or "internal memory", of the physical system. First, we analyze the equivalence classes of the extreme points under symmetries. Second, we characterize the minimal dimension necessary to obtain a given extreme point of the AoT polytope, including a lower scaling bound in the asymptotic limit of long sequences. Finally, we present a general method to derive dimension-sensitive temporal inequalities for longer sequences, based on inequalities for shorter ones, and investigate their robustness to imperfections.


I. INTRODUCTION
The study of spatial correlations, from Bell nonlocality [1,2] to entanglement theory [3,4], has had, on the one hand, a profound impact on the foundations of quantum mechanics. On the other hand, it has stimulated many applications in quantum information processing, such as quantum key distribution [5], randomness certification [6] and expansion [7], to mention a few. Moreover, correlations stronger than quantum ones, but still obeying the no-signaling constraints [8], have been extensively investigated both from a fundamental perspective and in relation to applications in quantum information processing.
Similarly, temporal correlations have been studied from the perspective of the difference between classical and quantum systems, mostly in the framework of Leggett-Garg inequalities [9,10] and noncontextuality inequalities [11,12] tested via sequential measurements [13,14]. More recently, a notion of non-classical temporal correlations has also been formulated from a different perspective that does not require assumptions on the noninvasivity or compatibility of the measurements [15,16]. A few quantum information processing tasks have been formulated directly in this framework, such as dimension witnesses [17-20] and purity certification [21]. Many other tasks, despite not being directly formulated in the language of temporal correlations, are closely related, since they naturally involve sequential operations. This is, for example, the case for prepare-and-measure scenarios [22,23], random access codes [24-26], classical simulations of quantum contextuality [27,28], quantum simulation of classical stochastic processes [29], memory asymmetry between prediction and retrodiction [30], and optimal ticking clocks [31].
The temporal counterpart to the no-signaling constraints [8] are the arrow-of-time (AoT) constraints [32], stating the impossibility of signaling from the future to the past. These conditions define the AoT polytope [32]. It has been shown that the extreme points of this polytope are given by the deterministic assignments [19,33,34], where each output in a sequence is obtained as a deterministic function of the previous inputs and outputs. This implies that any such point is realizable by sequential measurements on a physical system, even for a classical theory, if the internal memory of the system is large enough to store the information about previous inputs and outputs. This is in stark contrast to the spatial case, where different correlations correspond to different theories, often irrespectively of the system dimension. A difference in correlations between classical, quantum, and generalized probability theories is recovered if the system dimension is constrained [16]. This dimension dependence can be exploited to construct temporal inequalities which can certify a lower bound on the dimension of the system, i.e. they are dimension witnesses [16, 19, 20, 22-24, 35, 36].
In this paper, we investigate the minimal resources necessary to simulate a given extreme point of the AoT polytope. Here, the resource is quantified by the dimension (or memory) needed to reproduce the outcomes of a measurement sequence. First, we study the symmetries of the AoT polytope with respect to classical post-processing. Then, we determine the minimal dimension required for the realization of a given extreme point. As in Refs. [16,19,20], no assumption on the concrete realization or quantum description of a measurement is made; it is only assumed that the same measurement may be carried out at different times. We then continue by providing a simple method to combine dimension-sensitive temporal inequalities for shorter sequences to obtain inequalities valid for longer sequences. Finally, we discuss the robustness of temporal inequalities if the measurements are not perfect, i.e., if they vary over time. We note that our results on the simulation of the extremal correlations do not imply that all convex combinations can be simulated with the same dimension, when initial randomness is not included in the resources (see the discussion in Ref. [16]). In fact, in this scenario the set of temporal correlations has a complicated structure, if the dimension of the system is fixed [37].
FIG. 1. Finite-state machine: A single box is provided an input sequence $x_1, x_2, x_3$ and generates an output $a_1, a_2, a_3$ at different instants of time. No external clock/memory is accessible to the box and hence its behavior is solely governed by its internal state. Mathematically, this corresponds to having transformation rules for the internal state of the machine that are time-independent.
The paper is organized as follows. In Sect. II, we introduce our notation and the considered scenario in detail and we review some important previous results. In Sect. III, we further explore the properties of the AoT polytope, in particular the symmetries of the AoT polytope under classical post-processing, i.e., possible relabeling of the inputs and outputs, as such transformations do not affect the quantum realization. In Sect. IV, we investigate how much memory is required to realize a given extreme point within quantum theory, or, stated differently, what is the minimal dimension that is necessary to obtain this correlation. In Sect. V, we provide a lower bound on the dimension needed to obtain an arbitrary extreme point and an estimate of the behavior in the asymptotic limit of arbitrarily long measurement sequences, for any number of inputs and outputs. Then, in Sect. VI we show a general method to produce temporal inequalities for longer sequences by combining inequalities for shorter ones. Finally, in Sect. VII, we investigate how robust our statements on temporal correlations are in the case in which the assumption of repeated measurements is only approximately satisfied.

II. NOTATION AND PRELIMINARIES
We consider the scenario of sequential measurements depicted in Fig. 1. A box receives a sequence of inputs, or measurement settings, $x_1, x_2, \dots, x_L$ and produces a sequence of outputs, or measurement outcomes, $a_1, a_2, \dots, a_L$. The machine works by probabilistically transforming its internal state, e.g., a quantum state $\rho$, according to the measurement input and outcome, and by generating a measurement outcome according to the input and the previous state. We are then interested in the correlations $p(a_1 a_2 \dots a_L|x_1 x_2 \dots x_L)$.
More concretely, an operation on a quantum system, associated with an input $x$, is represented by a quantum instrument, namely a collection of completely positive maps $\{\mathcal{I}_{a|x}\}_a$ that sum up to a unital map, i.e., $\sum_a \mathcal{I}_{a|x}(\mathbb{1}) = \mathbb{1}$, where $\mathbb{1}$ denotes the identity operator, corresponding to the rule of preservation of probability in the Heisenberg picture; see, e.g., [38] for a textbook introduction. Each instrument $\{\mathcal{I}_{a|x}\}_a$ defines a generalized measurement, i.e., a positive operator valued measure (POVM), through the formula $E_{a|x} := \mathcal{I}_{a|x}(\mathbb{1})$. Correlations for a sequence of inputs $x_1, x_2$ and outputs $a_1, a_2$ are given by the formula
$$p(a_1 a_2|x_1 x_2) = \mathrm{tr}[\rho\, \mathcal{I}_{a_1|x_1} \circ \mathcal{I}_{a_2|x_2}(\mathbb{1})], \tag{1}$$
where $\circ$ denotes the composition of maps, and analogous expressions hold for longer sequences. We assume that the evolution of our box is time-independent, except for the external classical inputs that are provided at each time step. Practically, this assumption means two things. First, the different correlations are generated by the transitions of the internal state of the machine; in quantum mechanical terms, this implies that for a given input $x$ the machine applies the quantum instrument $\{\mathcal{I}_{a|x}\}_a$ independently of which time step $t$ we are in. This is already implicit in Eq. (1), since we used only the symbol $\mathcal{I}_{a|x}$ to denote the instruments, without any reference to the time step; e.g., if we want to calculate the probability $p(00|00)$, we apply the same mapping twice, i.e., $p(00|00) = \mathrm{tr}[\rho\, \mathcal{I}_{0|0} \circ \mathcal{I}_{0|0}(\mathbb{1})]$. Second, we assume that the inputs are provided at equally spaced time intervals and the free evolution of the system is always implemented by the same quantum channel, e.g., one can think about an evolution governed by a time-independent Hamiltonian. Hence, the time evolution can be absorbed without loss of generality into the definition of the quantum instruments. Boxes satisfying these assumptions are called finite-state machines by generalizing a well-known classical notion [39] (see also Ref. [16] for more details on the quantum and generalized probability theory case). We equivalently say that the measurement operations are time-independent.
In this scenario with time-ordered measurements, any theory that respects causality must satisfy the so-called arrow of time (AoT) conditions [32], namely, the future choice of inputs cannot modify the probabilities of past outcomes. For the simple case of a sequence of two measurements, the correlation $p(a_1 a_2|x_1 x_2)$ must satisfy
$$\sum_{a_2} p(a_1 a_2|x_1 x_2) = \sum_{a_2} p(a_1 a_2|x_1 x_2') \quad \text{for all } a_1, x_1, x_2, x_2'. \tag{2}$$
This condition is analogous to the no-signaling conditions for spatial correlations [8], but it constrains only one direction, i.e., signaling from the future to the past. These linear constraints, together with positivity, $p(a_1 a_2|x_1 x_2) \ge 0$, and normalization, $\sum_{a_1 a_2} p(a_1 a_2|x_1 x_2) = 1$, define a polytope called the AoT polytope [32], denoted in the general case as $P^{O,S}_L$, where $O$ denotes the number of outputs, $S$ the number of inputs (or measurement settings) and $L$ the length of the measurement sequence.
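As an illustration of the AoT conditions for two time steps, the following Python sketch checks positivity, normalization, and the absence of signaling from the second setting to the first outcome. The function name `satisfies_aot` and the dictionary encoding `p[(a1, a2, x1, x2)]` are assumptions made for this example only; settings and outcomes are labelled from 0.

```python
from itertools import product

def satisfies_aot(p, n_out, n_set, tol=1e-12):
    """Check positivity, normalization and the two-step AoT condition.

    p maps (a1, a2, x1, x2) to a probability; missing keys count as 0.
    """
    outs, sets = range(n_out), range(n_set)
    # Positivity and normalization for every pair of settings.
    for x1, x2 in product(sets, sets):
        probs = [p.get((a1, a2, x1, x2), 0.0) for a1, a2 in product(outs, outs)]
        if min(probs) < -tol or abs(sum(probs) - 1.0) > tol:
            return False
    # AoT: the marginal of the first outcome must not depend on x2.
    for a1, x1 in product(outs, sets):
        marginals = [sum(p.get((a1, a2, x1, x2), 0.0) for a2 in outs) for x2 in sets]
        if max(marginals) - min(marginals) > tol:
            return False
    return True

# Deterministic point with p(00|00) = p(00|11) = p(01|01) = p(01|10) = 1.
good = {(0, 0, 0, 0): 1.0, (0, 0, 1, 1): 1.0, (0, 1, 0, 1): 1.0, (0, 1, 1, 0): 1.0}
# Counterexample: the first outcome copies the *second* setting.
bad = {(x2, 0, x1, x2): 1.0 for x1 in range(2) for x2 in range(2)}

print(satisfies_aot(good, 2, 2), satisfies_aot(bad, 2, 2))
```

The second distribution is normalized and positive but violates the AoT condition, since its first outcome depends on a setting chosen later.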
Such constraints are satisfied by classical and quantum mechanics, and it has been proven that all extreme points are given by deterministic assignments, i.e., correlations which have the property that for any input one obtains a deterministic output [19,33,34]. Intuitively, this comes from the fact that the AoT constraints allow us to decompose the probability distribution as
$$p(a_1 a_2 a_3|x_1 x_2 x_3) = p(a_1|x_1)\, p(a_2|a_1, x_1, x_2)\, p(a_3|a_1, a_2, x_1, x_2, x_3); \tag{3}$$
the extreme points, then, are given by the products of deterministic functions, generating $a_1$, $a_2$ and $a_3$, respectively, from the previous inputs and outputs. It has been shown in [19] that the AoT polytope $P^{O,S}_L$ has $O^{S(S^L-1)/(S-1)}$ extreme points. The extreme points can, then, be reached if the machine has enough "internal memory", namely, a large enough set of perfectly distinguishable internal states [19], to remember previous inputs and outputs and generate deterministically the corresponding outputs.
For a sequence of length $L$, an extreme point of the AoT polytope can be represented as a tree graph with $\sum_{k=0}^{L-1} S^k = \frac{S^L-1}{S-1}$ nodes, as depicted in Fig. 2. In fact, at each time step we have a choice of $S$ possible settings that determine the possible future (evolution) of the machine. To each node of the tree is associated a state of the machine and the set of outcomes that are deterministically generated. More precisely, to each node of the tree we associate a tuple $\Gamma_{l,k} = (z_1, z_2, \dots, z_S)$, where $z_i \in \{0, \dots, O-1\}$ denotes the outcome of the measurement $M_i$, $l \in \{1, \dots, L\}$ denotes the time-step, and $k \in \{1, \dots, S^{l-1}\}$ denotes in which node of the time-step $l$ we are. A (sub)tree $T^r_{l,k}$, called the $r$-length future of $(l, k)$, is a collection of tuples connected to a root node $(l, k)$, representing the current and future deterministic outcomes. It is defined as
$$T^r_{l,k} := \{\Gamma_{l',k'} \mid l \le l' \le l + r,\ (k-1)S^{l'-l} < k' \le k S^{l'-l}\}; \tag{4}$$
we denote $T^{L-l}_{l,k}$ simply as $T_{l,k}$ and call it the future of $(l, k)$. See Fig. 2 for more details.
To each node $(l, k)$ corresponds an internal state $s_{l,k}$ of the machine that generates deterministically the tuple of outcomes $\Gamma_{l,k}$. Since the procedure is deterministic, to the same state must correspond the same sequence of future outcomes, namely,
$$s_{l,k} = s_{l',k'} \;\Rightarrow\; T^r_{l,k} = T^r_{l',k'} \quad \text{for } r = \min\{L - l, L - l'\}. \tag{5}$$
If two tuples $\Gamma_{l,k}$ and $\Gamma_{l',k'}$ satisfy $T^r_{l,k} = T^r_{l',k'}$ for $r = \min\{L - l, L - l'\}$, we say that they have equivalent futures, see Fig. 3. For a given deterministic sequence, we call $T_{1,1}$, i.e., the entire tree, the history of the sequence.
This observation provides a way of counting the minimal number of states necessary for reproducing an extreme point of the AoT polytope, since inequivalent futures must correspond to different states. In particular, in order to check whether $T^r_{l,k} = T^r_{l',k'}$, one can simply check for shorter sequences, i.e., whether $T^s_{l,k} = T^s_{l',k'}$ for $s = 0, \dots, r$.
FIG. 2. In this example, for both measurements the outcome "0" is obtained in the first time step. After measuring $M_1$ in the first time step, one obtains in the second time step for both measurements deterministically the outcome "1", and after performing $M_2$ in the first time step one observes for measurement $M_1$ ($M_2$) in the second time step the outcome "0" ("1"), respectively. The 1-length future of $(1, 1)$ and the history of the sequence are indicated.
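The tree representation can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: a node is indexed by the tuple of past settings (labelled from 0) rather than by $(l, k)$, and the names `n_nodes`, `extreme_points` and `correlation` are hypothetical. The example point is the one described in the figure caption (outcome "0" first, then "1" for both measurements after $M_1$, and "0"/"1" after $M_2$).

```python
from itertools import product

def n_nodes(n_set, length):
    """Number of nodes of the tree: sum_{k=0}^{L-1} S^k."""
    return sum(n_set ** k for k in range(length))

def extreme_points(n_out, n_set, length):
    """Yield every deterministic assignment as {input history: tuple of outcomes}."""
    histories = [h for l in range(length) for h in product(range(n_set), repeat=l)]
    tuples = list(product(range(n_out), repeat=n_set))
    for assignment in product(tuples, repeat=len(histories)):
        yield dict(zip(histories, assignment))

def correlation(point, n_set, length):
    """Map each input sequence to the output sequence obtained with probability one."""
    table = {}
    for inputs in product(range(n_set), repeat=length):
        table[inputs] = tuple(point[inputs[:t]][x] for t, x in enumerate(inputs))
    return table

# Tree of the example: root (0, 0); after M_1: (1, 1); after M_2: (0, 1).
example = {(): (0, 0), (0,): (1, 1), (1,): (0, 1)}
print(n_nodes(2, 2), sum(1 for _ in extreme_points(2, 2, 2)))
print(correlation(example, 2, 2))
```

For $O = S = L = 2$ the enumeration visits one assignment of a tuple to each of the three nodes, and the example tree yields $p(01|00) = p(01|01) = p(00|10) = p(01|11) = 1$.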

III. SYMMETRIES OF THE AOT POLYTOPE
It has been shown that, if no assumption on the dimension of the quantum system is made, any correlation in the polytope can be realized. The ability of realizing a correlation is independent of the chosen labeling of the outcomes and/or measurement settings as long as one performs exactly the same relabeling at every time step. This is due to the fact that any such relabeling can be implemented classically even after the measurement sequences have been performed, i.e. such relabelings correspond to some classical post-processing. Note that the condition that the same relabeling is applied to all time steps is necessary to be consistent with our assumption of time-independent measurements. In the following, we characterize the number of equivalence classes of extreme points under these symmetries for small numbers of settings and outcomes.
In particular, we define an outcome relabeling equivalence (ORE) class as an equivalence class of extreme points with respect to the relation of being the same up to a relabeling of the outcomes. Since relabeling is a classical post-processing, if one extreme point is obtainable by measurements on a physical system, the same is true for all elements in the class. Of course, the measurement settings can also be subject to relabeling, and all extreme points that are equivalent up to relabeling of the measurement settings can be realized within the same physical implementation. We then define the relabeling equivalence (RE) classes of extreme points as the sets of extreme points that are equal up to relabeling of outcomes and measurement settings.

A. General considerations
In the temporal scenario the only relevant symmetries are given by the relabeling of inputs and outputs of a given sequence. The corresponding symmetry groups are given by the symmetric groups $S_O$ and $S_S$, where, as defined before, $O$ and $S$ are the number of outputs and inputs, respectively. The total group of symmetry is given by the direct product $G := S_O \times S_S$. To each element $g \in G$, we associate a transformation $S_g$ on the extreme points of the polytope $P^{O,S}_L$. In this way, $G$ acts on the set of extreme points. The action of a group on a set naturally induces an equivalence relation in terms of orbits, given by $G \cdot v := \{S_g v \mid g \in G\}$. In this case, belonging to the same orbit means that the extreme point $v$ can be obtained from the extreme point $w$ via a relabeling of inputs and outputs, and vice versa. The number of equivalence classes is then given by the number of different orbits. Hence, if one can evaluate the number of orbits, one can deduce the number of RE classes for a given scenario. Below we show how to do so by identifying the elements that are invariant under a symmetry, their orbits and the cardinality of the orbits, which allows us to deduce the number of orbits, for the case of two outcomes and two and three settings. The same procedure can be applied to an arbitrary number of outcomes and settings without extra conceptual difficulties; however, as the symmetric group grows (recall that there are $n!$ permutations of $n$ elements), the whole procedure becomes much longer and more tedious.

B. Relabeling of outcomes and measurement settings
For the case $O = 2$ one can define for each ORE class as a representative an extreme point having the property that for any measurement the outcome in the first time step is "0". This allows us to count the number of ORE classes as given in the following lemma.
Lemma 1. The number of ORE classes of extreme points of $P^{2,S}_L$ is given by $(2^S)^{\frac{S^L-1}{S-1}-1}$.
Proof. We consider one representative of each ORE class and show that the cardinality of the ORE class is $2^S$. The number of equivalence classes is the total number of extreme points divided by the cardinality of such classes. It is straightforward to see that one particular choice of a representative is given by demanding the outcome of each measurement setting at the first time step to be "0". Note that any nontrivial relabeling of the outcomes must alter the outcome at the first time step. Note further that all subsequent outcomes specify the ORE class and any sequence of these outcomes is possible. For each measurement setting there are 2 possible outcomes in the first time step, and therefore there are $2^S$ possible relabelings of the outcomes at the first time step. This implies that the cardinality of an ORE class is given by $2^S$. Using that the number of extreme points of $P^{2,S}_L$ is given by $(2^S)^{\frac{S^L-1}{S-1}}$ (see [19]), the number of ORE classes is then $(2^S)^{\frac{S^L-1}{S-1}}/2^S = (2^S)^{\frac{S^L-1}{S-1}-1}$.
For the relabeling of the measurement settings, we first present a counting argument for $O = S = 2$. Then, we extend our investigation to the case of $O = 2$ and $S = 3$.
Lemma 2. The number of RE classes of extreme points of $P^{2,2}_L$ is given by $\frac{1}{2}\left(4^{2^L-2} + 4^{2^{L-1}-1}\right)$.
Equivalent tuples for an extreme point invariant under settings relabeling, in the $O = S = 2$ scenario. The operator $X$ permutes the elements of the tuple, e.g., $X(z_1, z_2) = (z_2, z_1)$. The tuple $\Gamma_{2,2}$ corresponds to the outcomes for $M_1$ and $M_2$ in the second step, after $M_2$ has been measured. For the extreme point to be invariant under relabeling of outcomes, it must be that $\Gamma_{2,2} = X\Gamma_{2,1}$, i.e., the outcomes are, up to relabeling, those that would have been obtained had we measured $M_1$ in the first step.
Proof. As we already know from Lemma 1 the number of equivalence classes under the relabeling of outcomes, it is sufficient to study the action of the permutation of the inputs on the representatives of the ORE classes. Since $S = 2$, we have the group $S_2 = \{e, S_{12}\}$, where $e$ is the identity element and $S_{12}$ exchanges the first and second input. In particular, $S_{12}^2 = e$. For a given length $L$, the number of equivalence classes $N^{(L)}$ is given by
$$N^{(L)} = N^{(L)}_I + N^{(L)}_N,$$
where $N^{(L)}_I$ and $N^{(L)}_N$ denote the number of orbits of vectors that are invariant and non-invariant under $S_{12}$, respectively. For an invariant vector, the tuples on the branch following $M_2$ in the first step are fixed by those on the branch following $M_1$, so that at step $m$ only half of the $2^{m-1}$ tuples, each taking one of four values, can be chosen freely. The number of possibilities to extend a vector at the step $m$, given that it is symmetric at the step $m - 1$, is thus $4^{2^{m-2}}$. We can then compute the number of invariant vectors up to length $L$ as
$$N^{(L)}_I = \prod_{m=2}^{L} 4^{2^{m-2}} = 4^{2^{L-1}-1}.$$
Moreover, we can compute the number of equivalence classes of non-invariant vectors as
$$N^{(L)}_N = \frac{1}{2}\left(4^{2^L-2} - N^{(L)}_I\right),$$
where $(4^{2^L-2} - N^{(L)}_I)$ is the number of non-invariant vectors, and the factor $1/2$ comes from the fact that each orbit contains two elements. Finally, we can write
$$N^{(L)} = N^{(L)}_I + N^{(L)}_N = \frac{1}{2}\left(4^{2^L-2} + 4^{2^{L-1}-1}\right).$$
For the simplest scenario, $O = S = L = 2$, this implies that there are ten RE classes. As already discussed in [19], six of these classes can be obtained with a qubit, whereas for four of these classes a qutrit is required. In Table I, we provide a representative for each of these classes and indicate whether a qubit or a qutrit is necessary in order to realize a member of this class. In the following section, we then present a general theorem which allows us to deduce from a given extreme point (with arbitrary $O$, $S$ and $L$) the dimension that is necessary and sufficient to realize it. After having gained an understanding of the case of two inputs, we generalize our approach for counting the RE classes to the case of three inputs.
TABLE I. This table shows a representative for each of the 10 RE classes for $O = S = L = 2$ and the minimal dimension which allows one to reach a member of the class (see also Theorem 5). Note that for $O = S = L = 2$ the RE classes and their minimal dimension have already been identified in [33], and a corresponding table can also be found there (with a different choice of representatives).
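The counts for $O = S = L = 2$ can be cross-checked by brute force. The sketch below is an illustration, not the counting argument of the proof: it enumerates all deterministic assignments, applies every relabeling, and counts orbits through a canonical representative. It assumes, following the proof of Lemma 1, that the outcomes of each setting may be flipped independently (the same flip at every time step); the names `relabel` and `canonical` are hypothetical.

```python
from itertools import product

S, L = 2, 2  # two settings, two time steps, binary outcomes
histories = [h for l in range(L) for h in product(range(S), repeat=l)]
tuples = list(product(range(2), repeat=S))
points = [dict(zip(histories, a)) for a in product(tuples, repeat=len(histories))]

def relabel(point, flips, perm):
    """Apply a setting permutation `perm` and per-setting outcome flips `flips`."""
    new = {}
    for h in histories:
        src = tuple(perm[x] for x in h)  # permuted input history
        new[h] = tuple(point[src][perm[x]] ^ flips[x] for x in range(S))
    return new

def canonical(point, group):
    """Smallest image of `point` under the group, used as an orbit label."""
    return min(tuple(sorted(relabel(point, f, p).items())) for f, p in group)

all_flips = list(product((0, 1), repeat=S))
flip_group = [(f, (0, 1)) for f in all_flips]                      # outcomes only
full_group = [(f, p) for f in all_flips for p in ((0, 1), (1, 0))]  # outcomes and settings

n_ore = len({canonical(pt, flip_group) for pt in points})
n_re = len({canonical(pt, full_group) for pt in points})
print(n_ore, n_re)
```

The two orbit counts can be compared with $4^{2^L-2}$ and with the expression of Lemma 2 evaluated at $L = 2$. The enumeration grows doubly exponentially in $L$, so this check is practical only for very short sequences.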
Lemma 3. The number of RE classes of extreme points of $P^{2,3}_L$ is given by 2 […]
In order to prove the Lemma, we start again from the equivalence classes of outcome relabeling and impose only the conditions for the relabeling of the inputs. In this case, we have the permutation group of three elements, $S_3$, representing the permutation of the inputs. The group $S_3$ consists of the following elements:
$$e, \quad S_{12}, \quad S_{23}, \quad S_{13} = S_{23}S_{12}S_{23} = S_{12}S_{23}S_{12}, \quad \sigma_{123} = S_{12}S_{23} = S_{23}S_{13}, \quad \sigma_{132} = S_{13}S_{23} = S_{23}S_{12}. \tag{12}$$
We therefore write the total number of equivalence classes as […], where $N^{(L)}_I$ and $N^{(L)}_N$ are defined as above, as the orbits of vectors that are invariant or non-invariant under any symmetry, respectively, and N

IV. MINIMAL DIMENSION FOR GIVEN EXTREME POINTS
It is a well-known result in quantum state discrimination that two states have orthogonal ranges, corresponding to a trace distance of 1, if and only if they can be perfectly discriminated, i.e., with probability 1, by a single measurement (cf., e.g., Ref. [40], Ch. 9). More precisely, this fact can be stated as follows.
Observation 4. Let $E$ be an effect of a POVM and $\rho_1 = \sum_{i \in I} p_i |\Psi_i\rangle\langle\Psi_i|$ with $p_i > 0$ ($\rho_2 = \sum_{k \in K} q_k |\Phi_k\rangle\langle\Phi_k|$ with $q_k > 0$) the spectral decomposition of a density matrix $\rho_1$ ($\rho_2$), respectively. Then $\mathrm{tr}\{\rho_1 E\} = 1$ and $\mathrm{tr}\{\rho_2 E\} = 0$ only if $\langle\Psi_i|\Phi_k\rangle = 0$ for all $i \in I$ and $k \in K$.
It is important to notice that via a single POVM element $E$ one can represent not only a single measurement, but also a sequence, e.g., $E_{abc|xyz} := \mathcal{I}_{a|x} \circ \mathcal{I}_{b|y} \circ \mathcal{I}_{c|z}(\mathbb{1})$, where the maps $\{\mathcal{I}_{a|x}\}_a$ represent the quantum instrument in the Heisenberg picture. This implies that not only states that produce a different outcome with probability one are orthogonal, but also states that produce a different sequence of outcomes with probability one are orthogonal.
Using this, we are able to determine the minimum dimension that is required for a quantum system to obtain a given extreme point of $P^{O,S}_2$.
Theorem 5. Given an extreme point $p$ of $P^{O,S}_2$, the minimal dimension $d$ needed to obtain it is given by the number of inequivalent tuples in the history of $p$, i.e., $T_{1,1} = \{\Gamma_{1,1}, \Gamma_{2,1}, \dots, \Gamma_{2,S}\}$. In particular, a system with dimension $d = S + 1$ can always reach all extreme points of $P^{O,S}_2$, independently of the number of outcomes, as this is the maximal number of tuples in $T_{1,1}$.
Proof. According to Eq. (5), to different futures correspond different states. In this particular case, namely, $L = 2$, we need to compare different $T^1_{l,k}$, i.e., single tuples. By Obs. 4, such states must have orthogonal ranges. These two conditions already provide the minimal number of orthogonal states necessary to reach a given extreme point of the AoT polytope. Intuitively, orthogonality is the only relevant property for obtaining different futures, hence a minimal realization requires only pure states. This is confirmed by the explicit construction below, which uses only pure states.
Given the tuples $\Gamma_{1,1}, \Gamma_{2,1}, \dots, \Gamma_{2,S}$, there may be repetitions, which in this simple case of $L = 2$ corresponds to having equivalent futures. We can rewrite them as $d$ tuples $\{\Gamma_1, \dots, \Gamma_d\} = \{\Gamma_{1,1}, \Gamma_{2,1}, \dots, \Gamma_{2,S}\}$ with inequivalent futures, i.e., $\Gamma_i \neq \Gamma_j$ for $i \neq j$. We associate to each of them a vector $|k\rangle$ from the orthonormal basis $\{|k\rangle\}_{k=1}^{d}$. Without loss of generality, we can assume that $\Gamma_{1,1} = \Gamma_1 = (0, \dots, 0)$, i.e., we fix all the measurement outcomes at the first step to be zero. This simply means that we relabel the outcomes of all measurements such that 0 is obtained for all of them on the initial state. Then, we fix the initial state as $\rho_{\rm in} = |1\rangle\langle 1|$. The measurements are constructed as follows:
$$E_{a|x} := \sum_{j \in J_{a|x}} |j\rangle\langle j|, \qquad J_{a|x} := \{j \mid [\Gamma_j]_x = a\}, \tag{14}$$
for $a = 0, \dots, O - 1$, $x = 1, \dots, S$. Clearly, $E_{a|x} \ge 0$ and $\sum_a E_{a|x} = \mathbb{1}$, for all $x$, so they are valid POVMs. The corresponding Kraus operators $\{K^j_{0|x}\}_{j \in J_{0|x}}$ providing the post-measurement state, i.e., in the Schrödinger picture $\rho \mapsto \sum_j K^j_{0|x}\, \rho\, K^{j\dagger}_{0|x}$, are of the form $K^1_{0|x} = |s\rangle\langle 1|$ if $\Gamma_s = \Gamma_{2,x}$ or a tuple with equivalent future, and $K^j_{0|x} = |j\rangle\langle j|$ for $j \in J_{0|x}$ and $j \neq 1$. By construction, there are at most $S + 1$ tuples, hence this number provides an upper bound on the minimal dimension necessary to reach any extreme point of $P^{O,S}_2$.
The same argument can be generalized to sequences of arbitrary length.
Theorem 6. The minimal dimension $d$ required to reach an extreme point $p$ of $P^{O,S}_L$ is given by the number of inequivalent futures $T^r_{l,k}$ in the history $T_{1,1}$.
Proof. The proof generalizes straightforwardly from the case $L = 2$ above. Again, different futures must correspond to different states, such states must be orthogonal, and they can be chosen to be pure, providing a minimal-dimension representation.
The explicit construction of the model can then be extended from the previous one. Let us assign the state $|1\rangle$ as initial state, i.e., to $T_{1,1}$. Then compare $T_{1,1}$ with $T_{2,k}$, for $k = 1, \dots, S$: if they are equivalent, assign the same state $|1\rangle$ to $T_{2,k}$; otherwise, assign a new orthogonal state $|2\rangle, |3\rangle, \dots$ to the future $T_{2,k}$. Repeat the same operation for $T_{3,k}$, $k = 1, \dots, S^2$, assigning a new state for any future inequivalent to $T_{1,1}$ or any $T_{2,k}$. Repeat again until the end of the tree, i.e., $T_{L,k} = \Gamma_{L,k}$, $k = 1, \dots, S^{L-1}$. To each node $\Gamma_{l,k}$ of the tree $T_{1,1}$ is then assigned a pure state $|l, k\rangle \in \{|j\rangle\}_{j=1}^{d}$, possibly with repetitions. As in Th. 5, POVM elements are constructed as projectors providing the correct outcomes for each state $|j\rangle$, as in Eq. (14). Similarly, the Kraus operators $\{K^j_{a|x}\}_j$, associated with the POVM element $E_{a|x} = \sum_{j \in J_{a|x}} |j\rangle\langle j|$, consist of measure-and-prepare operations $K^j_{a|x} = |i\rangle\langle j|$, when the state $|j\rangle$ emits the output $a$ for the measurement $x$ and transitions to the state $|i\rangle$, all with probability one. For the last time-step, i.e., from $L - 1$ to $L$, one can use diagonal Kraus operators analogously to the construction in Th. 5.
Note that the above protocol does not involve any coherences, as all states and effects are diagonal in the same basis and the state-update rule also involves transitions within the same basis, hence, it can be realized with a classical system.
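Since the protocol is classical, the state assignment can be sketched as a deterministic finite-state machine in Python. The code below is an illustrative reading of the level-by-level assignment described above, under stated assumptions: nodes are indexed by the tuple of past settings (labelled from 0), states are numbered from 0, and the names `future`, `build_machine`, `run` and `expected` are hypothetical. For $L = 2$ the number of states coincides with the number of distinct tuples of Theorem 5; whether this greedy order always attains the minimum for longer sequences is not checked here.

```python
from itertools import product

def future(point, history, n_set, r):
    """r-length future of the node reached by `history`, as a nested tuple."""
    gamma = point[history]
    if r == 0:
        return (gamma,)
    return (gamma,) + tuple(future(point, history + (x,), n_set, r - 1)
                            for x in range(n_set))

def build_machine(point, n_set, length):
    """Assign states level by level; return (outputs, step) tables."""
    reps = []      # reps[s] = (history, depth) of the node that first defined state s
    state_of = {}  # history -> state index
    for l in range(length):
        depth = length - 1 - l  # number of remaining time steps after this node
        for h in product(range(n_set), repeat=l):
            fut = future(point, h, n_set, depth)
            for s, (h0, _) in enumerate(reps):
                # Compare with the earlier future truncated to the same depth.
                if fut == future(point, h0, n_set, depth):
                    state_of[h] = s
                    break
            else:
                state_of[h] = len(reps)
                reps.append((h, depth))
    outputs = [point[h0] for h0, _ in reps]
    # States first defined at the last time step never need a transition: self-loop.
    step = [[state_of[h0 + (x,)] if d0 > 0 else s for x in range(n_set)]
            for s, (h0, d0) in enumerate(reps)]
    return outputs, step

def run(outputs, step, inputs):
    """Outputs emitted by the machine, started in state 0, on `inputs`."""
    state, result = 0, []
    for x in inputs:
        result.append(outputs[state][x])
        state = step[state][x]
    return tuple(result)

def expected(point, inputs):
    """Outputs prescribed by the extreme point on `inputs`."""
    return tuple(point[tuple(inputs[:t])][x] for t, x in enumerate(inputs))

# Extreme point with p(00|00) = p(00|11) = p(01|01) = p(01|10) = 1.
e1 = {(): (0, 0), (0,): (0, 1), (1,): (1, 0)}
outputs, step = build_machine(e1, 2, 2)
print(len(outputs), [run(outputs, step, i) for i in product(range(2), repeat=2)])
```

The machine depends only on its current state and the current input, in line with the time-independence assumption; the number of states plays the role of the dimension $d$.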

V. LOWER BOUND ON THE DIMENSION WHICH IS NECESSARY TO REALIZE ANY EXTREME POINT
In the following, we provide a construction of extreme points for any L from which one can determine a lower bound on the minimal dimension required for its realization. This lower bound is then automatically also a lower bound on the dimension necessary to realize any extreme point. The main result is that the minimal dimension scales, at least, exponentially in L. Let us consider the polytope P O,S L . The main idea of the proof can be briefly explained as follows. Consider a sequence of length L and take a time-step j < L. If the number of remaining time-steps L − j is big enough, for the tuples {Γ j,s } s we can choose their futures {T j,s } s to be all different, hence, each Γ j,s will be associated to an orthogonal state and the number of such tuples will provide a lower bound on the minimal dimension necessary for their realization. Our argument consists in estimating the maximum j such that this is possible.
For the history $T_{1,1}$ at time-step $j$, there exist $S^{j-1}$ different subtrees $T_{j,k}$. If $j$ is properly chosen, namely, $L - j$ is large enough such that we can construct different futures $T_{j,k}$ for $k = 1, \dots, S^{j-1}$, then the realization of such an extreme point requires at least $d = S^{j-1}$. First notice that the number of possible futures of length $x$ is given by […]. Hence, $j$ must be selected in such a way that the remaining sequence allows us to assign different futures (which are of length $L - j$) to each node, namely as the largest integer such that […]. We can further simplify the expression using the identity $O = S^{\log_S O}$ […]. This equation can be solved in terms of the principal branch of the Lambert function $W$, namely, the function implicitly defined as the solution to the equation $x e^x = k$, i.e., $x e^x = k \Leftrightarrow x = W(k)$. In this case, let us see how to solve it for the equation $x = a S^{-x} + b$. We have $y e^y = a S^{-b} \ln S$, for $y := (x - b)\ln S$. By substituting $x = j - 2$, $a = \frac{S^L}{S-1}\log_S O$, $b = -\frac{S}{S-1}\log_S O$, one obtains the condition […], which gives the maximal $j$ as […], where $\lfloor x \rfloor$ denotes the floor of $x$, i.e., the largest integer not exceeding $x$.
To compute the asymptotic scaling, one can write […]. Using that $\ln x - \ln\ln x + \frac{\ln\ln x}{2\ln x} \le W(x)$ for $x \ge e$ [41], we can obtain a lower bound on the minimal dimension as follows. For $m \in \mathbb{R}$ such that […], the minimal dimension satisfies […], where the "−2" term takes into account the fact that $m$ may not be an integer. For large $L$ such a lower bound scales as […] for appropriately chosen constants $A, B, \alpha, \beta, \gamma, \delta$. This proves that the minimal dimension required to reach any extreme point scales at least exponentially (up to logarithmic corrections). In Appendix C, we present a different construction of an extreme point which can provide an improved lower bound on the minimal dimension, but no closed formula for the scaling.
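The lower bound on the Lambert function quoted from [41] can be checked numerically. The following Python sketch is only a sanity check under stated assumptions: `lambert_w` is a hypothetical helper that evaluates the principal branch for $k \ge 0$ by Newton iteration on $w e^w = k$, and `w_lower_bound` evaluates $\ln x - \ln\ln x + \ln\ln x/(2\ln x)$.

```python
import math

def lambert_w(k, tol=1e-14):
    """Principal branch W(k) for k >= 0, via Newton iteration on w * exp(w) = k."""
    w = math.log1p(k)  # starting point above the root, since (1 + k) ln(1 + k) >= k
    for _ in range(100):
        ew = math.exp(w)
        delta = (w * ew - k) / (ew * (w + 1.0))
        w -= delta
        if abs(delta) <= tol * (1.0 + abs(w)):
            break
    return w

def w_lower_bound(x):
    """ln(x) - ln(ln(x)) + ln(ln(x)) / (2 ln(x)), intended for x >= e."""
    lx = math.log(x)
    llx = math.log(lx)
    return lx - llx + llx / (2.0 * lx)

for x in (math.e, 10.0, 1e3, 1e6):
    print(x, w_lower_bound(x), lambert_w(x))
```

At $x = e$ both sides equal 1, and for larger arguments the gap between the bound and $W(x)$ stays small, which is why the bound suffices for the asymptotic estimate.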

VI. COMBINING TEMPORAL INEQUALITIES
In the following, we present a method for deriving new inequalities for temporal correlations for sequences of length $nL$ with $n \in \mathbb{N}^+$, based on the knowledge of inequalities for the shorter length $L$. It is instructive to first describe the method for a simple example, based on the inequalities for the case $L = O = S = 2$ derived in Ref. [19]. The original inequalities were derived by computing the qubit bound for expressions of the form $\sum_{x,y=0,1} p(a_x b_{xy}|x, y)$, for a specific choice of the outputs $\{a_x, b_{xy}\}_{x,y}$ where the algebraic bound 4 is achieved by an extreme point of the AoT polytope $P^{2,2}_2$, i.e., $p(a_x b_{xy}|xy) = 1$ for all $x, y$. Up to symmetries, four expressions were derived, namely […], each one corresponding to one of the extreme points of the AoT polytope $P^{2,2}_2$ that cannot be reached by a qubit strategy, namely,
$e_1$: $p(00|00) = p(00|11) = p(01|01) = p(01|10) = 1$, and 0 otherwise;
$e_2$: $p(01|00) = p(01|11) = p(00|01) = p(00|10) = 1$, and 0 otherwise;
$e_3$: $p(01|00) = p(00|11) = p(01|01) = p(01|10) = 1$, and 0 otherwise;
$e_4$: $p(01|00) = p(01|11) = p(01|01) = p(00|10) = 1$, and 0 otherwise. (25)
In general, to each extreme point $e_i$, with components labelled by $\vec a = (a_1, \dots, a_L)$, $\vec x = (x_1, \dots, x_L)$, i.e., $[e_i]_{\vec a, \vec x} = p(\vec a|\vec x)$, we can associate a temporal inequality
$$B_i := \sum_{\vec a, \vec x} c_{\vec a, \vec x}\, p(\vec a|\vec x) \le C_i,$$
where $c_{\vec a, \vec x} := [e_i]_{\vec a, \vec x}$, $c_{\vec a, \vec x} \in \{0, 1\}$ since $e_i$ is a deterministic strategy, and $C_i$ is the bound for a given dimension $d$ of the quantum system, corresponding to the algebraic bound $\sum_{\vec a, \vec x} c_{\vec a, \vec x}$ if the extreme point $e_i$ can be reached in dimension $d$.
We denote the corresponding extreme point of P 2,2 4 as e k = (e k 1 , e k 2 ) with k = (k 1 , k 2 ) and construct the associated inequality The proof of the bound is straightforward where we used the AoT condition to break the probability and C k 2 as an upper bound to the expression , and finally the bound C k 1 . It is obvious that the above result depends only on the way of choosing a deterministic strategy, i.e., an extreme point of the AoT polytope, as a product strategy as in Eq. (27), the way of constructing the corresponding expression B k , and the knowledge of the bounds {C k i } i for the single expressions {B k i } i for k = (k 1 , . . . , k n ). We can then generalize the result as follows.
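Theorem 7. Given a collection of $n$ temporal inequalities
$$B_i := \sum_{\vec a, \vec x} c^{(i)}_{\vec a, \vec x}\, p(\vec a|\vec x) \le C_i, \quad i = 1, \ldots, n,$$
involving $O$ outcomes, $S$ settings, and length $L$, and valid for quantum systems of dimension $d$, then the following inequality for sequences of length $nL$,
$$B := \sum_{\vec a_1, \ldots, \vec a_n, \vec x_1, \ldots, \vec x_n} c^{(1)}_{\vec a_1, \vec x_1} \cdots c^{(n)}_{\vec a_n, \vec x_n}\, p(\vec a_1 \ldots \vec a_n|\vec x_1 \ldots \vec x_n) \le \prod_{i=1}^{n} C_i,$$
also holds for quantum systems of the same dimension.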
The proof of this theorem is analogous to the case of four time steps presented above. As an explicit example of this construction, consider the extreme point $e_{\vec k} = (e_i, \ldots, e_i)$ of length $2n$ with $i \in \{1, 2, 3, 4\}$ and $n \in \mathbb{N}^+$, and the corresponding inequality $B_{\vec k}$. Then, according to Theorem 7, it holds for qubits that $B_{\vec k} \le (C_i)^n$. Using Theorem 6, it can be easily seen that with a three-dimensional system one can reach the algebraic maximum $B_{\vec k} = 4^n$. It follows that the ratio between the qubit bound and the qutrit value decreases exponentially with the length of the sequence, i.e., as $(C_i/4)^n$.
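The product construction can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the method of the paper: the strategy is restricted to deterministic outputs that treat each block of two time steps independently, `product_inequality_value` and `qubit_to_algebraic_ratio` are hypothetical helper names, and the qubit bound passed to the second function is a placeholder argument, since the actual values $C_i$ are those of Ref. [19] and are not reproduced here.

```python
from itertools import product

# Extreme point e_1 of the two-step scenario, as a map (x, y) -> (a, b).
E1 = {(0, 0): (0, 0), (1, 1): (0, 0), (0, 1): (0, 1), (1, 0): (0, 1)}


def product_inequality_value(targets, strategy):
    """Evaluate B_k for a block-wise deterministic strategy.

    targets:  list of n maps (x, y) -> (a, b) defining the coefficients c^(k_j)
    strategy: list of n maps (x, y) -> (a, b) giving the outcomes actually produced
    The product of 0/1 coefficients equals 1 only if every block matches its target.
    """
    n = len(targets)
    total = 0
    for settings in product((0, 1), repeat=2 * n):
        blocks = [settings[2 * j: 2 * j + 2] for j in range(n)]
        total += int(all(strategy[j][xy] == targets[j][xy]
                         for j, xy in enumerate(blocks)))
    return total


def qubit_to_algebraic_ratio(qubit_bound, n, algebraic_bound=4):
    """Ratio (C_i / 4)^n between the product bound and the algebraic maximum."""
    return (qubit_bound / algebraic_bound) ** n


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, product_inequality_value([E1] * n, [E1] * n))
```

For $n$ copies of the same extreme point the printed values equal $4^n$, the algebraic maximum of the product expression, while any single-block bound strictly below 4 yields a ratio that shrinks geometrically in $n$.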

VII. IMPERFECT IMPLEMENTATION OF TIME-INDEPENDENT MEASUREMENTS
The results obtained so far assume that the measurements are time-independent, i.e., the same input always indicates that the same measurement is implemented. Here, we discuss how a deviation from this assumption influences our results. Before proceeding further, it is helpful to clarify what we mean by imperfect implementation. What does it mean to "perform the same measurement twice"? Consider the basic example of an apparatus that measures the spin of a particle either along the X direction or the Z direction, with probability 1/2 each. Clearly, in each round of the experiment in which a sequence of two measurements is performed, there is a 50% chance that two different measurements are performed, i.e., X, Z or Z, X. However, according to our definition of time-independent quantum instruments, this situation is still allowed, since such an uncertainty is already contained in the definition of a quantum instrument. The notion of imperfect implementation, hence, does not deal with random fluctuations in the measurement apparatus, but rather with some time-dependent drift in the parameters describing the measurement apparatus, e.g., a drift in the magnetic field orientation in the spin example. Notice, however, that Markovian time evolutions can still be absorbed in the definition of quantum instruments with a proper choice of measurement times.
In the following, we quantify the effect of such imperfect implementations of quantum instruments on the observed correlations. Such deviations can be quantified in terms of the diamond norm [42]. In the following, it is more convenient to use the Schrödinger picture for the representation of quantum instruments. This corresponds to taking the dual $I^*$ of the instruments appearing in Eq. (1), acting now on states rather than observables. To avoid heavy notation, however, we drop the superscript $*$ in the remaining part of this section.
If $I_{a|x}$ is the desired CP map for input $x$ and outcome $a$ and $\tilde I_{a|x}$ is the one that is instead implemented in the experiment, we assume that $\|I_{a|x} - \tilde I_{a|x}\|_\diamond \le \varepsilon$ for all $x$ and $a$, where the diamond norm of a CP map $I$ is defined as $\|I\|_\diamond := \max_{\rho_{AB}} \|(I_A \otimes \mathrm{id}_B)(\rho_{AB})\|_{\mathrm{tr}}$. Note that from the definition of the diamond norm it straightforwardly follows that $\mathrm{tr}[\tilde I_{a|x}(\rho) - I_{a|x}(\rho)] \le \|I_{a|x} - \tilde I_{a|x}\|_\diamond$ for all density matrices $\rho$. As we will see, this allows us to derive bounds on the influence of such a deviation on quantities that are linear in $p(ab \ldots|xy \ldots)$. In order to illustrate the basic idea, we consider here first two time steps and then three time steps; however, it is straightforward to generalize the bound to an arbitrary number of time steps. In particular, we obtain that [...] and [...] Note that we used here multiple times that tr [...]
As a final remark, it is interesting to notice the following. The above argument assumes certain quantum properties of the operations involved and, hence, at least some partial characterization of the experimental devices. However, we simply noticed that instruments that are "close" in the quantum mechanical sense (and arguably the diamond norm is the natural distance among them) give rise to probability distributions that are again "close", with an error that scales linearly in the measurement length. Assumptions on such a distance, even if based on quantum mechanics, do not necessarily require a full characterization of the experimental devices. It would be interesting to estimate the diamond norm in a device-independent way; our result would then help in the design of improved experimental tests of temporal quantum correlations that rely on minimal assumptions and do not require a complete characterization of the measurement devices.
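The linear scaling of the error can be made explicit for two time steps. The following is a hedged sketch rather than the authors' own derivation. It assumes that every implemented map deviates from the desired one by at most $\varepsilon$ in diamond norm (the symbol $\varepsilon$ is introduced here for that bound), that all maps are CP and trace non-increasing, and that $p(ab|xy) = \mathrm{tr}[I_{b|y} I_{a|x}(\rho)]$, with $\tilde p$ denoting the distribution obtained from the implemented maps.

```latex
\begin{align*}
\tilde p(ab|xy) - p(ab|xy)
 &= \mathrm{tr}\big[\tilde I_{b|y}\,\tilde I_{a|x}(\rho)\big]
  - \mathrm{tr}\big[I_{b|y}\,I_{a|x}(\rho)\big] \\
 &= \mathrm{tr}\big[(\tilde I_{b|y} - I_{b|y})\,\tilde I_{a|x}(\rho)\big]
  + \mathrm{tr}\big[I_{b|y}\,(\tilde I_{a|x} - I_{a|x})(\rho)\big] \\
 &\le \varepsilon\,\mathrm{tr}\big[\tilde I_{a|x}(\rho)\big]
  + \big\|(\tilde I_{a|x} - I_{a|x})(\rho)\big\|_{\mathrm{tr}} \\
 &\le 2\varepsilon .
\end{align*}
```

The first term uses that $\tilde I_{a|x}(\rho)$ is a subnormalized state with $\mathrm{tr}[\tilde I_{a|x}(\rho)] \le 1$. The second uses that a CP trace non-increasing map does not increase the trace norm of a Hermitian operator. Exchanging the roles of the two instruments bounds the absolute value, and repeating the telescoping step for three time steps gives $3\varepsilon$ under the same assumptions.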

VIII. CONCLUSION AND OUTLOOK
In this work, we studied the resources required to realize AoT correlations within quantum mechanics. We first identified which extreme points of the AoT polytope can be obtained by using the same protocol followed by some classical post-processing of the input and output for the case of a small number of two-outcome measurements. Then we provided, for an arbitrary given extreme point, the dimension that is necessary and sufficient to realize it. In particular, we showed that this is given by the number of inequivalent futures in the history associated with a point, and we gave an explicit protocol that allows one to obtain it. We observed that this protocol does not involve any coherences and hence can also be implemented with a classical system. Moreover, we derived a lower bound on the minimal dimension that is necessary to reach an arbitrary extreme point for a given number of settings $S$, outcomes $O$ and time steps $L$, and we showed that in the asymptotic limit of long sequences this scales as $e^{\alpha L}/L$ (with $\alpha$ being some constant that depends on $O$ and $S$).
In a previous work [19], extreme points of the AoT polytope were used to construct dimension witnesses for sequences of short length. Here, we provided a general method to use these witnesses as building blocks for the construction of dimension witnesses for sequences of arbitrary length. Although the bound on the resulting temporal inequality is not necessarily tight, one finds inequalities that show an exponential scaling with respect to the length of the sequence. Finally, we made quantitative statements on how the bounds on linear temporal inequalities are affected if the assumption that at any time step one is able to implement the same measurement is violated. We showed that small deviations from these assumptions still allow us to deduce lower bounds on the dimension. The bounds obtained so far allow one to certify a dimension of at least three.
There are several possible directions for future research. First, one can consider a general point in the correlation polytope and consider the resources needed for a simulation. This problem is challenging for two reasons: First, the quantum realizations for arbitrary points are difficult to find, and may require a larger dimension than the extremal points [37]. Second, for a general point in the polytope a deterministic protocol is not suitable, so more general concepts, such as hidden Markov models [43] or, more specifically, ε-transducers [44,45] may be useful.
A second interesting problem comes from the observation that our simulation protocols were purely classical, in the sense that they can be implemented using quantum states diagonal in the computational basis. It would be interesting to develop a general theory of temporal correlations for which the quantum mechanical simulation requires fewer resources than the classical one, due to effects like coherence [46]. This may open a further way to test quantum devices using temporal correlations.

Appendix A: Proof of Lemma 3
In the following we prove Lemma 3, i.e. we show that the number of RE classes of extreme points of the polytope $P^{2,3}_L$ is given by [...]
Proof. As mentioned in the main text, we consider the ORE classes and then impose the conditions for the relabeling of the inputs. Hence, the relevant symmetry group has the following elements: $e$, $S_{12}$, $S_{23}$, $S_{13} = S_{23} S_{12} S_{23} = S_{12} S_{23} S_{12}$, [...] The total number of equivalence classes can be written as [...] where N [...] At each step m, then, we need to choose 2 Q m possible values, i.e., two values for each extra measurement setting added, and we do not count the step $m = 1$, since this is fixed by the outcome relabeling symmetry. We then have [...] Again, we do not count the step $m = 1$ since it is fixed by the outcome relabeling symmetry. We can then compute [...] To compute N [...] It is important to notice, however, that if $v$ is invariant, i.e., $\sigma_{123} v = v$, then also $S_{ij} v$ is invariant, i.e., $\sigma_{123} S_{ij} v = S_{ij} v$. For instance, for $ij = 12$, we have $\sigma_{123} S_{12} v = S_{12} S_{23} S_{12} v = S_{12} \sigma_{132} v = S_{12} v$. Analogous arguments apply to the cases $ij = 13, 23$. This implies that each orbit contains two invariant vectors. Hence, we have that [...] (A10) Finally, we need to compute the number of orbits for vectors that are not invariant under any permutation. These can be obtained by removing all invariant ones from the total and dividing by six, i.e., the number of vectors in each orbit, namely [...] Finally, we have [...] which proves the lemma.

Appendix B: Proof of Observation 4
For completeness we prove here Observation 4, which is a well-known result in quantum state discrimination (cf., e.g., Ref. [40], Ch. 9) and which is used in the main text to identify the minimal dimension of a quantum system that is required to reach a given extreme point of $P^{O,S}_L$. We first show the following lemma, which straightforwardly extends to Observation 4.
Lemma 8. Let $E$ be an effect of a POVM, i.e. $E \ge 0$ and $E \le \mathbb{1}$, and let $\rho = \sum_{i \in I} p_i |\Psi_i\rangle\langle\Psi_i|$ with $p_i > 0$ be the spectral decomposition of a density matrix $\rho$. Then $\mathrm{tr}\{\rho E\} = 1$ iff $E = \sum_{i \in I} |\Psi_i\rangle\langle\Psi_i| + \sum_{k,k' \in K} c_{k,k'} |\Psi_k\rangle\langle\Psi_{k'}|$ with $I \cap K = \{\}$ and $\{|\Psi_i\rangle\}_{i \in I} \cup \{|\Psi_k\rangle\}_{k \in K}$ being an ONB. The matrix $E_K = \sum_{k,k' \in K} c_{k,k'} |\Psi_k\rangle\langle\Psi_{k'}|$ is positive semidefinite and $E_K \le \mathbb{1}$.
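Proof. If: Inserting $E = \sum_{i \in I} |\Psi_i\rangle\langle\Psi_i| + \sum_{k,k' \in K} c_{k,k'} |\Psi_k\rangle\langle\Psi_{k'}|$ in $\mathrm{tr}\{\rho E\}$ and using that $\{|\Psi_i\rangle\}_{i \in I} \cup \{|\Psi_k\rangle\}_{k \in K}$ is an ONB as well as that $\sum_{i \in I} p_i = 1$ readily proves the statement. Only if: Writing $E$ in the basis $\{|\Psi_i\rangle\}_{i \in I} \cup \{|\Psi_k\rangle\}_{k \in K}$, i.e. $E = \sum_{l,l' \in I \cup K} c_{l,l'} |\Psi_l\rangle\langle\Psi_{l'}|$, and inserting in $\mathrm{tr}\{\rho E\} = 1$, one obtains that $\sum_{i \in I} c_{ii} p_i = 1$. As $0 \le E \le \mathbb{1}$ it holds that $0 \le c_{ii} \le 1$. Moreover, using that $\sum_{i \in I} p_i = 1$ and $p_i > 0$, it follows that $c_{ii} = 1$ $\forall i \in I$. It can be easily seen that this condition and $E \le \mathbb{1}$ can only be simultaneously fulfilled if $c_{ik} = 0$ for $i \in I$ and $k \neq i$. More precisely, due to $E \le \mathbb{1}$ it has to hold that $\langle\Psi_i| E^\dagger E |\Psi_i\rangle = \sum_{k \in K \cup I} |c_{ki}|^2 \le 1$. As $c_{ii} = 1$ for $i \in I$, we have that $c_{ki} = c^*_{ik} = 0$ for $k \neq i$. Hence, $E$ is of the form $E = \sum_{i \in I} |\Psi_i\rangle\langle\Psi_i| + \sum_{k,k' \in K} c_{k,k'} |\Psi_k\rangle\langle\Psi_{k'}|$. Note that it follows immediately from $E \ge 0$ and $E \le \mathbb{1}$ that $E_K = \sum_{k,k' \in K} c_{k,k'} |\Psi_k\rangle\langle\Psi_{k'}| \ge 0$ and $E_K \le \mathbb{1}$.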
It follows that states giving, with probability one, different outcomes for the same sequence of measurements have ranges corresponding to orthogonal subspaces.
Proof. Using Lemma 8, it follows from $\mathrm{tr}\{\rho_1 E\} = 1$ that $E = \sum_{i \in I} |\Psi_i\rangle\langle\Psi_i| + \sum_{k,k' \in K} c_{k,k'} |\Psi_k\rangle\langle\Psi_{k'}|$ with $I \cap K = \{\}$ and $\{|\Psi_i\rangle\}_{i \in I} \cup \{|\Psi_k\rangle\}_{k \in K}$ being an ONB. Denoting $\sum_{k,k' \in K} c_{k,k'} |\Psi_k\rangle\langle\Psi_{k'}|$ by $E_K$, we have that $\mathrm{tr}\{\rho_2 E\} = \sum_{i \in I} \langle\Psi_i| \rho_2 |\Psi_i\rangle + \mathrm{tr}\{E_K \rho_2\} = 0$. As $\rho_2 \ge 0$ and $E_K \ge 0$, we have that $\langle\Psi_i| \rho_2 |\Psi_i\rangle = 0$ $\forall i \in I$, and it can be easily seen that this implies $\langle\Psi_i|\Phi_l\rangle = 0$ for all $i \in I$ and $l \in L$.
Fig. 7: [...] Here the subtrees $e_i$ have the following properties: a) In the first time step all measurements yield outcome "0". b) They are all chosen to be different. c) At least one tuple assigned to the second time step does not correspond to $(0, 0, \ldots, 0)$.

Appendix C: A potentially improved lower bound on the dimension needed to realize any extreme point
In the main text we discussed a way to construct an extreme point that yields a lower bound on the dimension necessary to realize any extreme point. For any $O$ and $S$ this construction allowed us to provide a closed formula for the scaling of the bound with respect to the length of the sequence. Here we discuss a different construction which gives a potentially better lower bound. In order to do so we consider the following extreme point. All tuples that are assigned to a time step $j < k$ correspond to $(0, 0, \ldots, 0)$, i.e. for the first $k - 1$ time steps one obtains outcome "0" for all settings. The subtrees $T_{k,m}$ emerging in time step $t_k$ have the property that in the root node all settings yield outcome "0", while in the second time step at least one of the tuples is not of the form $(0, 0, \ldots, 0)$. Moreover, all of these subtrees are chosen to be different, see Fig. 7.
Note that therefore all possible futures assigned to a time step $i \le k$ are not equivalent to each other. As discussed in the proof of Theorem 6, the number of inequivalent futures corresponds to the necessary dimension. Hence, one straightforwardly obtains a lower bound on the dimension given by $\sum_{i=1}^{k} S^{i-1}$. In order to obtain the best possible bound of this form, it remains to identify the largest $k$ for which such a construction is possible. Recall that $S^{k-1}$ is the number of futures that can be assigned to time step $t_k$, $L - k$ is the length of these futures, and $(O^S)^{\frac{S^{L-k+1} - S}{S-1}}$ is the number of different futures of length $L - k$ for which the starting node is given by $(0, 0, \ldots, 0)$. The latter can be shown analogously to the proof of Lemma 1, which can be straightforwardly extended to an arbitrary number of outcomes. Note, however, that for an arbitrary $O$ the condition that the first tuple corresponds to $(0, 0, \ldots, 0)$ does not uniquely identify one element of an ORE class. The number of futures of length $L - k$ for which all tuples in the second time step are equal to $(0, 0, \ldots, 0)$ is given by [...] This is due to the fact that the number of different possibilities to assign tuples in the second time step is $(O^S)^S$. With this it follows that one has to identify the largest natural number $k$ such that $k \le j$ and