Deriving robust noncontextuality inequalities from algebraic proofs of the Kochen-Specker theorem: the Peres-Mermin square

When a measurement is compatible with each of two other measurements that are incompatible with one another, these define distinct contexts for the given measurement. The Kochen-Specker theorem rules out models of quantum theory that satisfy a particular assumption of context-independence: that sharp measurements are assigned outcomes both deterministically and independently of their context. This notion of noncontextuality is not suited to a direct experimental test because realistic measurements always have some degree of unsharpness due to noise. However, a generalized notion of noncontextuality has been proposed that is applicable to any experimental procedure, including unsharp measurements and also preparations, and for which a quantum no-go result still holds. According to this notion, the model need only specify a probability distribution over the outcomes of a measurement in a context-independent way, rather than specifying a particular outcome. It also implies novel constraints of context-independence for the representation of preparations. In this article, we describe a general technique for translating proofs of the Kochen-Specker theorem into inequality constraints on realistic experimental statistics, the violation of which witnesses the impossibility of a noncontextual model. We focus on algebraic state-independent proofs, using the Peres-Mermin square as our illustrative example. Our technique yields the necessary and sufficient conditions for a particular set of correlations (between the preparations and the measurements) to admit a noncontextual model. The inequalities thus derived are demonstrably robust to noise. We specify how experimental data must be processed in order to achieve a test of these inequalities. We also provide a criticism of prior proposals for experimental tests of noncontextuality based on the Peres-Mermin square.


Introduction
Ontological models of quantum theory are an attempt to explain the statistical predictions of quantum theory. They take every system to be associated with a space of possible physical states, termed ontic states, every quantum state to be represented by a statistical distribution over these ontic states, and every measurement to be represented by a conditional probability distribution for the outcome given the ontic state [1]. Hidden variable models are examples of ontological models, but so is the physicistʼs orthodox conception of quantum theory, wherein the ontic states are simply the pure quantum states, not supplemented by any additional variables 4 .
The principle of noncontextuality is an assumption about ontological models that seeks to capture a notion of classicality. It started its life as an assumption about outcome-deterministic ontological models of quantum theory, that is, ontological models wherein the outcome of every measurement was fixed deterministically by the ontic state (in contrast to the orthodox conception). This assumption was famously demonstrated to be in contradiction with the predictions of quantum theory by Kochen and Specker [2] and Bell [3]. The Kochen-Specker theorem is one of the strongest constraints on the interpretation of quantum theory. Furthermore, failing to admit of a noncontextual model appears to be a resource. For instance, in the context of the state injection model for quantum computation [4,5], the failure of noncontextuality has been shown in some cases to be necessary for achieving universal quantum computation [6,7].
In [8], a generalized notion of noncontextuality was proposed. For measurements, it constitutes a relaxation of what was assumed by Kochen-Specker and by Bell. Specifically, it allows the assignment of measurement outcomes by ontic states to be indeterministic. In this way, it redefined the notion of noncontextuality for measurements in a way that excised the notion of determinism. This is desirable from a foundational perspective as it allows one to separate the issue of noncontextuality from that of determinism (recall that Bellʼs notion of local causality does not presume that the outcomes of measurements are fixed deterministically). Additionally, it can be shown that the assumption of outcome determinism is unwarranted for any unsharp measurement (i.e., a measurement for which one cannot find a basis of preparations relative to which it is perfectly predictable), and every measurement appearing in a real experiment is of this sort [9]. As such, this generalization is important if one hopes to turn the proven theoretical advantages for computation into practical advantages, because in practice, sharpness is an idealization that is never strictly satisfied.
Although the revised notion of noncontextuality yields a weaker constraint on the representation of measurements in the ontological model than did the traditional notion 5 , it naturally applies not only to measurements but to preparations as well, and thereby implies novel constraints on how quantum states can be represented by distributions over ontic states in the model 6 . It was argued in [8] that whatever motivations can be given for assuming noncontextuality for one type of procedure, such as a measurement, this same motivation can be given for assuming it of any other type of procedure, such as a preparation. Consequently, the only natural assumption to consider in this approach is that the revised notion of noncontextuality applies to all procedures. This assumption is termed universal noncontextuality or simply noncontextuality. We will henceforth refer to the traditional notion of noncontextuality as KS-noncontextuality (for Kochen-Specker) to avoid any confusion.
In [8], it was shown that quantum theory does not admit of a universally noncontextual ontological model. It was also demonstrated that if one replaces the assumption of KS-noncontextuality for measurements with the assumption of universal noncontextuality for all procedures, the relaxation of the constraints on the representation of measurements is compensated by the strengthening of the constraints on the representation of preparations in such a way that any proof that quantum theory fails to admit of a KS-noncontextual model can be translated into a proof that it fails to admit of a universally noncontextual model.
Much of the research on noncontextuality to date has centred on the question of whether quantum theory admits of a noncontextual model. A more general question, which has been the impetus for much recent work, is whether one can devise a direct experimental test of the assumption of a noncontextual ontological model, one that is independent of the validity of quantum theory. Just as a Bell inequality is a constraint on experimental statistics that follows directly from the assumption of a locally causal ontological model, without any reference to the quantum formalism, what one wants of a test of noncontextuality is a constraint on experimental statistics that follows directly from the assumption of a noncontextual ontological model, without any reference to the quantum formalism. Such constraints will here be termed noncontextuality inequalities. If experimental statistics are found to violate these inequalities, then one can conclude that not just quantum theory but any operational theory that can do justice to the experimental statistics (and therefore nature itself) must fail to admit of such a model, thereby constraining the form of all future physical theories.
The generalized notion of noncontextuality proposed in [8] was defined in such a way as to be applicable to any operational theory, not just quantum theory, such that if an experiment yields data supporting an operational theory distinct from quantum theory, the question of whether it admits of a noncontextual model is still meaningful. The definition asserts that an ontological model of an operational theory is noncontextual if two experimental procedures that are statistically indistinguishable at the operational level are statistically indistinguishable at the ontological level. The key point is that the notion of statistical indistinguishability at the operational level can be assessed in any operational theory 7 .
It has been shown that violations of noncontextuality inequalities defined in terms of this notion can imply advantages for information processing which are independent of the validity of quantum theory. For example, they imply an advantage for the cryptographic task of parity-oblivious random access codes [10][11][12]. Such inequalities also hold promise for making the results on quantum computational advantages discussed above robust to noise and for expressing the origin of the advantage in a manner that is independent of the validity of quantum theory.
Several recent works have considered the question of how to derive noncontextuality inequalities and how to subject them to experimental test [13][14][15]. The present work is concerned with a special case of this problem, namely, how to derive noncontextuality inequalities starting from any given proof of the Kochen-Specker theorem, that is, from a proof of the failure of KS-noncontextuality in quantum theory. As noted above, [8] showed how, in general, to convert a proof of the failure of KS-noncontextuality in quantum theory into a proof of the failure of universal noncontextuality in quantum theory, so the outstanding problem is how to convert a proof of the failure of universal noncontextuality in quantum theory into an operational noncontextuality inequality.
Note that any test of noncontextuality that is devised from a particular no-go theorem requires an experimentalist to target a particular set of preparations and a particular set of measurements, each with specified relations holding among their members (we will say more about the nature of these relations in due course). A more general version of the problem, however, is to figure out how to infer from any experimental data (that is, from an experiment that was not designed to target particular preparations or measurements or any particular relations among them) whether or not it admits of a noncontextual model. Because a test of noncontextuality is a test of classicality, having the capability to test the assumption of noncontextuality on any experimental data is clearly of greater utility than merely knowing how to implement a dedicated experiment for testing the hypothesis of noncontextuality. Pusey [15] identified the conditions that are both necessary and sufficient for the existence of a noncontextual model for experimental data derived from the simplest experimental scenario in which such conditions are expected to be nontrivial. Unfortunately, this simplest scenario does not arise within operational quantum theory 8 . Extending Puseyʼs analysis to more general scenarios is an important open problem.
Nonetheless, there are also advantages to building sets of noncontextuality inequalities from specific proofs of the Kochen-Specker theorem, because such proofs have nontrivial structural properties. Different proofs (and there is now a great diversity of these) capture what is surprising about the failure of noncontextuality in different ways, and these intuitions are likely to be helpful in identifying the applications thereof.
We here focus on deriving noncontextuality inequalities from state-independent proofs of the Kochen-Specker theorem.
Reference [14] has already demonstrated how one can derive one such inequality from any state-independent geometric proof of the Kochen-Specker theorem, that is, any proof expressed in terms of an uncolourable set of rays. Here, we extend this work in two important ways: (1) we provide a technique for finding all of the noncontextuality inequalities that apply to a certain set of correlations starting from any state-independent proof of the Kochen-Specker theorem, and (2) we show how to do so for proofs that are expressed algebraically rather than geometrically. We expand on each of these points presently, in reverse order.
The distinction between geometric and algebraic proofs of the failure of KS-noncontextuality in quantum theory is not fundamental because one can convert any algebraic proof into a geometric form and vice-versa. Nonetheless, each proof style has its advantages. The first known proofs were geometric uncolorability proofs. Algebraic proofs arose later, but in many respects they have a logic that is easier to grasp. Indeed, the paradigm example of a proof of the Kochen-Specker theorem is now arguably the algebraic version of the Peres-Mermin square proof [16,17], which will be the example we focus on here.
Furthermore, the algebraic structure suggests generalizations of these proofs that might not be obvious from the geometric perspective [18,19]. Although one could derive a noncontextuality inequality for the Peres-Mermin square by first expressing the latter as a geometric proof (as in [16]) and then applying the technique described in [14], it is more useful to have a technique for deriving noncontextuality inequalities that is native to the algebraic approach. We here provide such a technique.
In order to turn a proof of the failure of universal noncontextuality in quantum theory into a noncontextuality inequality, one must operationalize the description of the experiment provided in the no-go theorem, purging it of any reference to the quantum formalism, and one must robustify the constraints on experimental data that are derived from noncontextuality, which means that these constraints must provide quantitative bounds that can be violated in principle even if the experimental operations are noisy. This progression was achieved in [14], but the resulting inequality provided an upper bound on just a single operational quantity (an average, over certain preparation-measurement pairs, of the degree of correlation between them). The technique described in the present article goes much further towards providing a means of deriving all of the noncontextuality inequalities that hold for a given set of preparations and measurements. Although we focus on a subset of the correlations between preparations and measurements that arise in the construction, for this restricted set of experimental data, satisfaction of the inequalities that we derive is both necessary and sufficient for the existence of a noncontextual model.
Footnote 8: Recall that a set of measurements is said to be tomographically complete for a system if the statistics for any measurement on the system can be computed from the statistics of the measurements in this set. Puseyʼs simplest scenario is one wherein a tomographically complete set of measurements consists of just two binary-outcome measurements. This scenario does not arise in operational quantum theory, because the simplest quantum system, a qubit, requires three binary-outcome measurements for tomographic completeness.
Finally, we note a difference in the way experiments are described in this article relative to previous treatments of inequalities for universal noncontextuality [10,13,14]. We here use the notion of a source, that is, a process which samples a classical variable from a distribution, chooses which preparation procedure to implement on the system based on the value sampled and outputs both the system and the variable. This choice ensures that our derived noncontextuality inequalities are easier to compare with Bell inequalities.
The remainder of the paper is structured as follows.
In section 2, we provide an overview of operational theories (2.1) and ontological models (2.2). In particular, we discuss the concepts of operational equivalence and of compatibility (applied to measurements and sources) and illustrate the concepts with quantum examples. We provide formal definitions of measurement noncontextuality and preparation noncontextuality, in particular, a characterization of these assumptions in terms of expectation values for the outcomes of measurements and sources given the ontic state.
In section 3, we review the well-known proof of the failure of KS-noncontextuality in quantum theory based on the Peres-Mermin square (3.1), and we show how to translate this no-go theorem into one that demonstrates the failure of universal noncontextuality in quantum theory (3.2).
Section 4 is the heart of the article, describing our technique for turning quantum no-go theorems into operational noncontextuality inequalities. In the first section (4.1), we operationalize the description of the quantum measurements and sources that appear in the Peres-Mermin-inspired proof of the failure of universal noncontextuality, thereby obtaining a notion of a Peres-Mermin experimental scenario that is purged of any reference to quantum theory. This provides a template for how to achieve this operationalization for any such construction. The following five sections (4.2-4.6) describe how to derive noncontextuality inequalities from such an operational construction, using Peres-Mermin as the illustrative example. We also show how the ideal quantum realization of the measurements and sources in the Peres-Mermin scenario violates these inequalities (4.6.1), and we demonstrate the robustness of these inequalities to noise (4.6.2), by showing how they can be violated by partially depolarized versions of the ideal quantum realizations of the measurements and sources.
In section 5, we clarify what must be done experimentally in order to test the noncontextuality inequalities we have derived, and in section 6 we provide our concluding remarks.
Appendix A discusses the problem of computationally converting between the vertex and halfspace representations of a polytope, appendix B discusses the symmetries of our noncontextuality inequalities under deterministic processings of the experimental procedures, and appendix C demonstrates that a certain class of inequalities on experimental statistics are trivial. Finally, appendix D reviews a previous proposal for how to implement an experimental test of noncontextuality based on the Peres-Mermin square, and argues against its adequacy.

Preliminaries
2.1. Operational concepts
2.1.1. Operational theories
The primitive elements of an operational theory are preparations and measurements, each specified as lists of instructions to be performed in the laboratory.
A source is a device that implements one of a set of preparation procedures on a system, sampled from some probability distribution, and has a classical outcome that heralds which preparation has in fact been implemented. (The use of the term 'source' to refer to such a device is conventional in both classical and quantum Shannon theory, where it is the standard way of modelling the input to a communication channel [20].) We will denote a source by S and the variable describing its classical outcome by s .
A measurement, denoted M , accepts as input a system and returns a classical outcome, denoted by the variable m .
An operational theory provides an algorithm for computing the probability distribution for the outcome of any measurement acting on any preparation, and consequently it allows the computation of the joint probability distribution over the outcome of any measurement $M$ and the outcome of any source $S$, $\mathrm{pr}\{m, s \mid M, S\}$. We refer to this as simply the joint distribution on the measurement-source pair $(M, S)$.

Operational equivalence
Consider two measurement procedures, $M_1$ and $M_2$, whose outcomes are random variables, denoted $m_1$ and $m_2$ respectively. $M_1$ and $M_2$ are said to be operationally equivalent if they define the same joint distribution for all sources.

Note that the notion of compatibility for quantum measurements that we have articulated above [22] concerns only their retrodictive aspect and makes no reference to how the quantum state of a system evolves as a result of the measurement. In other words, it is sufficient to know which POVM is associated to the measurement, while the instrument that is associated to it, i.e., the set of update maps for each outcome, is irrelevant (indeed, it is not even required that there be an update map: the quantum system could be destroyed in the measurement process). This contrasts with the notion of compatibility that is the focus of many other works seeking to devise experimental tests of noncontextuality [23], where two measurements on a system are deemed compatible if implementing them in one temporal order gives the same statistics as implementing them in the opposite temporal order. The joint-simulatability notion of compatibility articulated above pertains not just to sharp measurements (represented by projector-valued measures) but to all unsharp measurements as well (represented by POVMs). In particular, it allows nontrivial compatibility relations among unsharp measurements that are associated to POVMs wherein the different elements of the POVM do not commute with one another, whereas such POVMs need not even be compatible with themselves according to the temporal-reordering notion of compatibility. The wide scope of applicability of the joint-simulatability notion of compatibility makes it particularly well-equipped to contend with experimental noise and imperfections in tests of noncontextuality.
A given POVM defines an operational equivalence class of measurements insofar as it can be implemented in many different ways. This is because any given POVM is generally compatible with many other POVMs which are not compatible with one another, and for each such compatible set of POVMs, there is a different experimental procedure, and hence a different concrete realization of the given POVM. The compatible set of which the POVM is considered a part is therefore an example of a measurement context. Note that if one particularizes the definition of compatibility to projector-valued measures, then the condition for compatibility becomes commutativity of the associated observables, and we recover the standard notion of a measurement context of an observable as the commuting set of observables of which it is considered a part. The important point, however, is that in addition to recovering the standard notion of measurement context for sharp (i.e., projective) measurements, one has a notion of measurement context also for unsharp measurements.
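For the sharp case described above, where compatibility reduces to commutativity, a small numerical check can make the notion of a measurement context concrete. The following is a minimal sketch, not part of the original construction: the helper names `kron`, `matmul` and `commute` are hypothetical, and the three two-qubit observables are chosen here purely for illustration. It shows one observable commuting with each of two others that do not commute with each other, so that the first observable has two distinct contexts.

```python
# Standard Pauli matrices as nested lists (the identity is included).
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commute(a, b):
    """True if the two matrices commute (entries here are exact small integers)."""
    return matmul(a, b) == matmul(b, a)


# Illustrative choice of observables (an assumption of this sketch).
XI = kron(X, I2)  # X on the first qubit
IX = kron(I2, X)  # X on the second qubit
IY = kron(I2, Y)  # Y on the second qubit

print(commute(XI, IX))  # XI can be measured jointly with IX
print(commute(XI, IY))  # XI can be measured jointly with IY
print(commute(IX, IY))  # but IX and IY are not jointly measurable
```

Here `XI` belongs to two different commuting sets, which is exactly the situation in which a noncontextuality assumption has nontrivial content.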
In a similar fashion, a given ensemble of quantum states defines an operational equivalence class of sources because it too can be implemented in many different ways, depending on which compatible set of ensembles it is considered to be a member of. The compatible set of ensembles of which it is a member constitutes the source context. It will later be useful to distinguish between sharp and unsharp quantum sources, where sharp sources are those consisting entirely of states that are normalized projectors. Clearly, the notion of compatibility of sources that we have introduced applies equally well to sharp and unsharp quantum sources, just as the notion for measurements applies to the sharp and unsharp cases alike.
Footnote 9: The specialization of our notion of compatibility to the quantum case allows us to clarify our motivation for restricting the scope of the notion to pairs of sources satisfying equation (9): if we did not restrict the notion in this manner, then two sources could be incompatible simply by virtue of averaging to different states, while the component states in the two sources were all diagonal in the same basis. For the purpose of evaluating the possibility of a noncontextual model, one prefers to have a notion of compatibility wherein sources being incompatible guarantees that they are not jointly diagonalizable.

2.2. Ontological concepts
2.2.1. Ontological models
As proposed in [8], generalized noncontextuality is a constraint on an ontological model of an operational theory. An ontological model is an attempt to reproduce the predictions of the operational theory by imagining that the correlations between the outcome of the source and that of the measurement are explained by the physical system that acts as a causal mediary between them. The complete set of physical attributes of the system at any given point in time is termed the ontic state of the system at that time. We shall denote this by $\lambda$, and the space of all possible ontic states of the system will be denoted by $\Lambda$.
Consider the most general way of representing a measurement procedure $M$ in an ontological model. The output $m$ might not be completely determined by the ontic state $\lambda$ of the system. Instead, specifying $M$ might only specify the conditional probability of obtaining output $m$ if the system is in the ontic state $\lambda$. This could arise because of objective indeterminism or because the outcome of the measurement depends not only on the input system but also on degrees of freedom of the measurement apparatus. We denote this conditional probability by $\xi(m \mid M, \lambda)$ and refer to it as the response function associated to $M$. Similarly, the most general way of representing a preparation procedure in an ontological model is to allow that the preparation does not uniquely fix the ontic state of the system, but rather that the ontic state might only be sampled probabilistically from a distribution that is specified by the preparation. This implies that the most general way of representing a source $S$ in an ontological model is as a joint distribution over its outcome $s$ and the ontic state $\lambda$ that it outputs, $\mu(s, \lambda \mid S)$. The purpose of an ontological model for an operational theory is to reproduce the statistics of that theory. This occurs if the ontological model is such that for all sources $S$ and measurements $M$,
$$\mathrm{pr}\{m, s \mid M, S\} = \sum_{\lambda \in \Lambda} \xi(m \mid M, \lambda)\, \mu(s, \lambda \mid S). \qquad (12)$$
It is useful to express the connection between the ontological model and the operational theory in terms of expectation values as well.
The expectation value of outcome $m$ of measurement $M$ for the ontic state $\lambda$ is
$$\langle m \rangle_{M,\lambda} \equiv \sum_m m\, \xi(m \mid M, \lambda). \qquad (13)$$
The expectation value of outcome $s$ of source $S$ for the ontic state $\lambda$ is a retrodictive expectation value and so is a bit more subtle to express. Consider a source $S$ with classical outcome $s$ that is associated with the joint distribution $\mu(s, \lambda \mid S)$. What probability ought one to assign to the outcome variable $s$ having taken a particular value if one knows that the ontic state emitted by the source was $\lambda$? The answer is given by a simple Bayesian inversion:
$$\mu(s \mid S, \lambda) = \frac{\mu(s, \lambda \mid S)}{\mu(\lambda \mid S)}, \qquad (14)$$
where $\mu(\lambda \mid S) = \sum_s \mu(s, \lambda \mid S)$ is the marginal distribution over ontic states. We can then use this conditional probability to define an expectation value for an outcome $s$ of a source $S$ given knowledge of the ontic state $\lambda$, as:
$$\langle s \rangle_{S,\lambda} \equiv \sum_s s\, \mu(s \mid S, \lambda). \qquad (16)$$
The correlation between the outcomes of $M$ and $S$ can then be written as
$$\langle m s \rangle_{M,S} \equiv \sum_{m,s} m\, s\, \mathrm{pr}\{m, s \mid M, S\} = \sum_{\lambda \in \Lambda} \langle m \rangle_{M,\lambda}\, \langle s \rangle_{S,\lambda}\, \mu(\lambda \mid S),$$
where we have used equations (12)-(14) and (16). We will use the latter expression when deriving our quantum no-go theorems and noncontextuality inequalities.
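The relationship between the operational statistics and the ontological expectation values can be checked numerically on a toy example. The sketch below is illustrative only: the two-element ontic state space and the particular numbers in `mu` and `xi` are hypothetical choices made here, not taken from the article. It computes the joint distribution from a response function and a source distribution, performs the Bayesian inversion for the source, and confirms that the operational correlation equals the average over ontic states of the product of the two expectation values.

```python
from fractions import Fraction as F

# Hypothetical toy model: two ontic states, outcomes valued in {-1, +1}.
ontic_states = [0, 1]
outcomes = [-1, +1]

# mu[(s, lam)]: joint distribution over source outcome s and ontic state lam.
mu = {(-1, 0): F(3, 8), (-1, 1): F(1, 8), (+1, 0): F(1, 8), (+1, 1): F(3, 8)}
# xi[(m, lam)]: response function, normalized over m for each lam.
xi = {(-1, 0): F(9, 10), (+1, 0): F(1, 10), (-1, 1): F(1, 5), (+1, 1): F(4, 5)}


def joint(m, s):
    """Operational joint distribution: sum over lam of xi times mu."""
    return sum(xi[(m, lam)] * mu[(s, lam)] for lam in ontic_states)


def mu_marginal(lam):
    """Marginal probability that the source emits ontic state lam."""
    return sum(mu[(s, lam)] for s in outcomes)


def mu_posterior(s, lam):
    """Bayesian inversion: probability of source outcome s given lam."""
    return mu[(s, lam)] / mu_marginal(lam)


def exp_m(lam):
    """Predictive expectation value of the measurement outcome given lam."""
    return sum(m * xi[(m, lam)] for m in outcomes)


def exp_s(lam):
    """Retrodictive expectation value of the source outcome given lam."""
    return sum(s * mu_posterior(s, lam) for s in outcomes)


corr_operational = sum(m * s * joint(m, s) for m in outcomes for s in outcomes)
corr_ontological = sum(exp_m(lam) * exp_s(lam) * mu_marginal(lam)
                       for lam in ontic_states)

print(corr_operational, corr_ontological)  # both equal 7/20
```

Exact rational arithmetic is used so that the two expressions can be compared with `==` rather than with a floating-point tolerance.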

Measurement noncontextuality
The assumption of measurement noncontextuality stipulates that if two measurements $M_1$ and $M_2$ are operationally equivalent, then the response functions associated to these measurements in the ontological model are equal. Equivalently, we can express measurement noncontextuality as the assumption that the response function associated to a measurement depends only on its operational equivalence class and not on the measurement context (which explains the appropriateness of the term 'noncontextual'). Denoting the operational equivalence class of $M_1$ and $M_2$ by $\mathcal{M}$ (with outcome denoted by $m$), and denoting the response functions for $M_1$, $M_2$ and $\mathcal{M}$ by $\xi(m_1 \mid M_1, \lambda)$, $\xi(m_2 \mid M_2, \lambda)$ and $\xi(m \mid \mathcal{M}, \lambda)$ respectively, the assumption of measurement noncontextuality can be formalized as follows: if $M_1$ and $M_2$ are operationally equivalent, then
$$\forall \lambda: \quad \xi(m_1 \mid M_1, \lambda) = \xi(m_2 \mid M_2, \lambda) = \xi(m \mid \mathcal{M}, \lambda).$$

The fact that the notion of measurement noncontextuality does not include the assumption of outcome determinism translates, in the case of quantum observables, to the fact that these expectation values are not assumed, a priori, to lie in the eigenspectrum of $O$.
The generalized notion of noncontextuality allows one to extend this analysis to the case of a quantum measurement that is associated to a POVM that is nonprojective. If it can be measured jointly with either one of two other POVMs which are not compatible with one another, then the assumption of measurement noncontextuality implies that the expectation value of its outcome given the ontic state λ should be independent of which of the two other POVMs it is measured jointly with.

Preparation noncontextuality
The assumption of preparation noncontextuality has previously been expressed in terms of individual preparations. However, in this article we will be describing experiments in terms of sources, and so we will here express it in the language of sources.
The assumption of preparation noncontextuality stipulates that if two sources $S_1$ and $S_2$ are operationally equivalent, then the joint distributions over ontic states and outcomes that represent these sources in the ontological model are equal. Equivalently, the joint distribution over ontic states and outcomes representing a source depends only on the operational equivalence class of that source. Denoting the operational equivalence class of $S_1$ and $S_2$ by $\mathcal{S}$ (with outcome denoted by $s$), and denoting the joint distributions over ontic states and outcomes for $S_1$, $S_2$ and $\mathcal{S}$ by $\mu(s_1, \lambda \mid S_1)$, $\mu(s_2, \lambda \mid S_2)$ and $\mu(s, \lambda \mid \mathcal{S})$ respectively, the assumption of preparation noncontextuality can be formalized as follows: if $S_1$ and $S_2$ are operationally equivalent, then
$$\forall \lambda: \quad \mu(s_1, \lambda \mid S_1) = \mu(s_2, \lambda \mid S_2) = \mu(s, \lambda \mid \mathcal{S}).$$
Note that an individual preparation can be understood as a special kind of source, one wherein the outcome is trivial (i.e. taking a value in a singleton set), and for such sources the definition of noncontextuality provided above reduces to the standard one for preparations articulated in [8].
We can also express this assumption in terms of expectation values, as we did for the case of measurements, but with a critical difference, as noted above equation (16): for sources, the relevant expectation values concern retrodictions rather than predictions.
If two sources, $S_1$ and $S_2$, are operationally equivalent then not only does preparation noncontextuality imply that the distributions $\mu(s_1, \lambda \mid S_1)$ and $\mu(s_2, \lambda \mid S_2)$ are equal (and hence the marginals $\mu(\lambda \mid S_1)$ and $\mu(\lambda \mid S_2)$ are equal as well), it also implies, via equation (14), that the conditional distributions $\mu(s_1 \mid S_1, \lambda)$ and $\mu(s_2 \mid S_2, \lambda)$ are equal as well. That is, applying equations (14)-(21), we find that we can express the assumption of preparation noncontextuality as follows: if $S_1$ and $S_2$ are operationally equivalent, then
$$\forall \lambda: \quad \langle s_1 \rangle_{S_1,\lambda} = \langle s_2 \rangle_{S_2,\lambda} = \langle s \rangle_{\mathcal{S},\lambda}.$$
It is apparent, therefore, that the assumption of noncontextuality for sources is a kind of retrodictive analogue of the assumption of noncontextuality for measurements. An ontic state $\lambda$ will be said to make a deterministic assignment to an equivalence class of sources $\mathcal{S}$ if $\mu(s \mid \mathcal{S}, \lambda) \in \{0, 1\}$ for every value of $s$.

We again pause to illustrate these notions by specializing to the quantum case. Consider a quantum source associated to the ensemble $\mathcal{E} = \{p_s \rho_s\}$. Suppose that $\mathcal{E}$ is compatible with an ensemble $\mathcal{E}_1$ and that it is also compatible with an ensemble $\mathcal{E}_2$, but that $\mathcal{E}_1$ and $\mathcal{E}_2$ are not compatible with one another. In this case, there are two operationally equivalent ways of implementing the source associated to $\mathcal{E}$, namely, by implementing it jointly with $\mathcal{E}_1$ and by implementing it jointly with $\mathcal{E}_2$. If $\langle s \rangle_{(\mathcal{E},\mathcal{E}_1),\lambda}$ denotes the expectation value for the outcome of the source $\mathcal{E}$ when it is implemented jointly with $\mathcal{E}_1$ and $\langle s \rangle_{(\mathcal{E},\mathcal{E}_2),\lambda}$ denotes the expectation value for the outcome of the source $\mathcal{E}$ when it is implemented jointly with $\mathcal{E}_2$, then preparation noncontextuality implies that
$$\forall \lambda: \quad \langle s \rangle_{(\mathcal{E},\mathcal{E}_1),\lambda} = \langle s \rangle_{(\mathcal{E},\mathcal{E}_2),\lambda} = \langle s \rangle_{\mathcal{E},\lambda}.$$

Universal noncontextuality
An operational theory will be said to admit of a universally noncontextual ontological model if it admits of an ontological model that is noncontextual for all experimental procedures, and therefore for both the preparations and the measurements [8].

Quantum no-go theorems based on the Peres-Mermin square
KS-noncontextuality can be understood as the conjunction of the assumption of measurement noncontextuality defined above and the assumption that the ontic state assigns outcomes to projective measurements deterministically. It is well known that quantum theory does not admit of a KS-noncontextual model. The Peres-Mermin proof [16,17] is particularly intuitive and has therefore become a paradigm example. Reference [8] provided several reasons for adopting the notion of universal noncontextuality described in the previous section rather than KS-noncontextuality. First of all, the reference to projective measurements in the definition of KS-noncontextuality makes it clear that the notion of KS-noncontextuality can only be applied to quantum theory and leaves open the question of how to apply it to other operational theories or to experimental data. Furthermore, it has been argued in [8,9] that the notion of universal noncontextuality stands to the notion of KS-noncontextuality as the notion of local causality stands to that of local determinism (defined by Bell in [24]). The problem with both KS-noncontextuality and local determinism is that in the face of a contradiction, one can always salvage the spirit of noncontextuality or locality by simply abandoning determinism. Such an option is not available if one derives a contradiction from universal noncontextuality or local causality.
It was shown in section VIII of [8] that one can turn any no-go theorem for a KS-noncontextual model of quantum theory into a no-go theorem for a universally noncontextual model of quantum theory (see footnote 10). In this section, we carry out this translation for the proof based on the Peres-Mermin square. This will serve to clarify the contrast between KS-noncontextuality and universal noncontextuality. However, the main purpose of this section is to provide the reader with some intuition about how the contradiction arises, so that she may better follow our technique for deriving noncontextuality inequalities from the Peres-Mermin construction.

No-go theorem for KS-noncontextuality based on the Peres-Mermin square
The Peres-Mermin magic square construction [16,17] consists of nine observables, each defined on two qubits and each expressible as a product of Pauli operators. Denoting the four Pauli operators by [...], the nine observables relevant to the construction are as follows: [...]. These are organized into a 3×3 grid (the 'square') to visually represent their commutativity properties: the three observables on any row or column of the square commute and are therefore jointly measurable. KS-noncontextuality implies that every observable in the square is assigned a value deterministically by the ontic state λ, independently of whether that observable is measured together with the other observables in its row or together with the other observables in its column. We denote the deterministic value assigned to observable $O$ by the ontic state λ by $\lfloor O\rfloor_\lambda$. Because an observable is only ever found to take values from the eigenspectrum of the associated operator, the deterministic assignments to $O$ by λ can only take values in this set,

$$\lfloor O\rfloor_\lambda \in \mathrm{spec}(O). \qquad (27)$$

Finally, for any set of observables that can be jointly measured, the functional relations that hold among the observables in the set must also hold among the values assigned to them by the ontic state. This follows from the fact that if a given functional relation failed to hold for the values assigned by the ontic state, then the ontological model would predict that it failed to hold for the values obtained in a joint measurement. In the Peres-Mermin square, one can show that the product of the observables along each of the rows and along each of the first two columns is $\mathbb{1}\otimes\mathbb{1}$, while the product of the observables on the last column is $-\mathbb{1}\otimes\mathbb{1}$; these are the functional relations among the observables. Together with the fact, inferred from equation (27), that $\lfloor\mathbb{1}\otimes\mathbb{1}\rfloor_\lambda = 1$, we conclude that the functional relations holding among the deterministic assignments to the nine observables in the Peres-Mermin square are as follows: the three values assigned along each row, and along each of the first two columns, multiply to $+1$, while the three values assigned along the last column multiply to $-1$. These six constraints are equations (30a)-(30f). However, given that each assigned value is $\pm 1$, the set of equations (30a)-(30f) has no solution.
To see this, it suffices to note that the product of the left-hand sides of the six equations is +1 (because every term appears squared in this product), while the product of the right-hand sides of the six equations is −1. We have thereby arrived at a contradiction.

Footnote 10: The proof of this result relies on preparation noncontextuality, together with two facts about quantum theory: (i) for every observable, there is a basis of quantum states each element of which makes its outcome perfectly predictable and (ii) the uniform mixture of any basis of quantum states is operationally equivalent to the uniform mixture of any other such basis.
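The parity argument can be checked by brute force. The following is a minimal sketch, not the authors' code. It assumes one standard arrangement of the Peres-Mermin square (the `SQUARE` array below), which may differ from the arrangement used in this article but has the row and column products described in the text. All function names are illustrative.

```python
from itertools import product

# Single-qubit Pauli operators as nested lists.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]

def matmul(a, b):
    """Ordinary matrix product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sign_of_product(ops):
    """Return +1 (or -1) if the ordered product of ops is +identity (or -identity)."""
    prod = ops[0]
    for op in ops[1:]:
        prod = matmul(prod, op)
    n = len(prod)
    for sign in (1, -1):
        if all(prod[i][j] == (sign if i == j else 0)
               for i in range(n) for j in range(n)):
            return sign
    raise ValueError("product is not proportional to the identity")

# ASSUMPTION: a standard textbook arrangement, not necessarily the article's.
SQUARE = [
    [kron(X, I2), kron(I2, X), kron(X, X)],
    [kron(I2, Z), kron(Z, I2), kron(Z, Z)],
    [kron(X, Z), kron(Z, X), kron(Y, Y)],
]

row_signs = [sign_of_product(SQUARE[i]) for i in range(3)]
col_signs = [sign_of_product([SQUARE[i][j] for i in range(3)]) for j in range(3)]

# Search all 2^9 assignments of +/-1 values for one obeying all six constraints.
n_solutions = 0
for values in product((1, -1), repeat=9):
    v = [values[0:3], values[3:6], values[6:9]]
    rows_ok = all(v[i][0] * v[i][1] * v[i][2] == row_signs[i] for i in range(3))
    cols_ok = all(v[0][j] * v[1][j] * v[2][j] == col_signs[j] for j in range(3))
    if rows_ok and cols_ok:
        n_solutions += 1
```

For this arrangement the rows and the first two columns multiply to the identity, the third column multiplies to minus the identity, and the exhaustive search finds no deterministic assignment.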

No-go theorem for universal noncontextuality based on the Peres-Mermin square
As noted in section 2.2.2, unlike KS-noncontextuality, measurement noncontextuality allows for measurement outcomes to be assigned indeterministically by the ontic state. Obtaining the contradiction in the previous section relied critically on the assumption of deterministic assignments. Indeed, once one allows indeterministic assignments, one finds that there are, in fact, many noncontextual assignments to the observables in the Peres-Mermin square. For instance, every quantum state defines such an assignment through the Born rule, and there are other valid assignments as well which do not arise from the Born rule (but could arise in some putative post-quantum theory). At first glance, therefore, it may seem that by replacing the notion of KS-noncontextuality with the generalized notion of noncontextuality proposed in [8], one has lost the possibility of deriving a contradiction. However, although the generalized notion of noncontextuality does indeed weaken the constraints on how one represents measurements in a noncontextual ontological model, it also introduces a novel constraint on how one represents preparations. By availing oneself of the assumption of preparation noncontextuality, one can again derive a no-go theorem for a noncontextual model of quantum theory.

Each of the nine observables represents an operational equivalence class of binary-outcome measurements. We denote the observable in position $(i,j)$ of the Peres-Mermin square by $O_{ij}$. The projector-valued measure associated to the observable $O_{ij}$ consists of the pair of orthogonal rank-2 projectors

$$\Pi^{\pm}_{ij} \equiv \tfrac12\left(\mathbb{1}\otimes\mathbb{1} \pm O_{ij}\right), \qquad (32)$$

which correspond respectively to the +1 and −1 eigenspaces of $O_{ij}$. We also define an equivalence class of binary-outcome quantum sources for each of the observables as follows. For each observable $O_{ij}$, we consider the quantum source associated to the 2-element ensemble $\mathcal{S}_{ij} \equiv \{\tfrac12\rho^{+}_{ij}, \tfrac12\rho^{-}_{ij}\}$, where $\rho^{\pm}_{ij} \equiv \tfrac12\Pi^{\pm}_{ij}$ are normalized density operators.
Note that each of these quantum sources defines the same average state, namely, the completely mixed state $\tfrac14\mathbb{1}\otimes\mathbb{1}$. If we arrange these nine quantum sources into a square, then they are compatible along the rows and the columns. For example, in the first row of the square, the three sources are seen to be compatible because they can all be obtained by post-processing the outcome of a single 4-outcome source, namely, the one associated to the uniform ensemble of joint eigenstates of the set of commuting observables associated to that row. The other rows and columns are analogous. We will speak of the source version of the Peres-Mermin square to refer to the compatibility relations among the sources.
Given these definitions, it is clear that when the measurement associated to the observable $O_{ij}$ is implemented on the source associated to the ensemble $\mathcal{S}_{ij}$ (i.e., the source at the same location in the square), the outcome $m$ of the measurement is perfectly correlated with the outcome $s$ of the source and the marginal distribution over either outcome is uniform:

$$p(m,s|O_{ij},\mathcal{S}_{ij}) = \tfrac12\,\delta_{m,s}.$$

This can be expressed equivalently as

$$\forall i,j:\quad \langle m s\rangle_{O_{ij},\mathcal{S}_{ij}} = 1. \qquad (35)$$

Now consider what this implies for any putative noncontextual ontological model of the experiment. Denoting the expectation value for the outcome $m$ of the measurement of observable $O_{ij}$ given ontic state λ by $\langle m\rangle_{O_{ij},\lambda}$, and the expectation value for the outcome $s$ of the source associated to the ensemble $\mathcal{S}_{ij}$ given ontic state λ by $\langle s\rangle_{\mathcal{S}_{ij},\lambda}$ (note that the latter is a retrodictive expectation), then given equation (17), we have

$$\langle m s\rangle_{O_{ij},\mathcal{S}_{ij}} = \sum_\lambda \langle m\rangle_{O_{ij},\lambda}\,\langle s\rangle_{\mathcal{S}_{ij},\lambda}\,\mu(\lambda|\mathcal{S}_{ij}). \qquad (36)$$

An assumption of measurement noncontextuality has been made at this stage because we have assumed that the expectation value $\langle m\rangle_{O_{ij},\lambda}$ depends only on $O_{ij}$ and not on what other observables were measured together with it, that is, we have assumed that this expectation value is independent of whether we measure $O_{ij}$ with the other observables in the same row of the Peres-Mermin square or with the other observables in the same column. Similarly, an assumption of preparation noncontextuality has been made at this stage because we have assumed that $\langle s\rangle_{\mathcal{S}_{ij},\lambda}$ depends only on $\mathcal{S}_{ij}$ and not on which set of compatible ensembles is implemented jointly with it, those on the same row of the source version of the Peres-Mermin square or those on the same column.
Finally, the assumption of preparation noncontextuality has an additional consequence that equation (36) does not yet fully incorporate. For each of the nine binary-outcome quantum sources, if one marginalizes over its outcome, one obtains the source that simply prepares $\tfrac14\mathbb{1}\otimes\mathbb{1}$, the average state associated to the ensemble. Recall that no quantum measurement can distinguish the different ensembles by which the completely mixed state might have been prepared. Therefore, for any given pair of quantum sources, $\mathcal{S}_{ij}$ and $\mathcal{S}_{i'j'}$, the pair of quantum sources one obtains by marginalizing over their outcomes are operationally equivalent. If $\mu(s,\lambda|\mathcal{S}_{ij})$ is the representation in the ontological model of the quantum source $\mathcal{S}_{ij}$, then the quantum source that one obtains by marginalizing over its outcome $s$ is represented in the ontological model by $\mu(\lambda|\mathcal{S}_{ij}) = \sum_s \mu(s,\lambda|\mathcal{S}_{ij})$. Applying the assumption of preparation noncontextuality, equation (21), to the operational equivalence of the outcome-marginalized versions of $\mathcal{S}_{ij}$ and $\mathcal{S}_{i'j'}$, we find

$$\mu(\lambda|\mathcal{S}_{ij}) = \mu(\lambda|\mathcal{S}_{i'j'}) \equiv \nu(\lambda).$$

Substituting this into equation (36), we finally obtain

$$\langle m s\rangle_{O_{ij},\mathcal{S}_{ij}} = \sum_\lambda \langle m\rangle_{O_{ij},\lambda}\,\langle s\rangle_{\mathcal{S}_{ij},\lambda}\,\nu(\lambda).$$

Now we are in a position to derive a contradiction. The only way to reproduce the perfect correlations of equation (35) is if for all λ in the support of $\nu(\lambda)$ and for all $i,j$,

$$\langle m\rangle_{O_{ij},\lambda} = \langle s\rangle_{\mathcal{S}_{ij},\lambda} \in \{-1,+1\}.$$

In other words, every ontic state in the support of $\nu(\lambda)$ must assign perfectly correlated outcomes to the source and measurement when these are associated to the same observable, and the only way to achieve this is if it assigns these outcomes deterministically. However, any deterministic assignment to all of the measurements in the Peres-Mermin square must satisfy the functional relationships that hold among the outcomes of the compatible subsets of those measurements, that is, it must satisfy equations (30a)-(30f). But following the standard argument (reviewed in the previous section), there are no such deterministic assignments, so we have arrived at our contradiction (see footnote 11).
It follows that if one entertains the hypothesis that a given experiment is, in fact, described by a noncontextual ontological model, then one expects that for some subset of the nine source-measurement pairs, the correlations will be imperfect. The noncontextuality inequalities that we derive for the Peres-Mermin scenario will capture the precise tradeoffs among the strengths of these nine correlations.
First, however, we must operationalize our description of the experiment, which is to say that we must purge it of any reference to the quantum formalism.

From the quantum no-go theorem to noncontextuality inequalities

A purely operational description of the Peres-Mermin square
In section 2.1, we defined operational equivalence relations and compatibility relations among experimental procedures in a manner that made reference only to experimental statistics, without appeal to the quantum formalism. Here, we use these notions to express the relations that must hold among a set of measurements and sources in an operational version of the quantum Peres-Mermin construction. Any experiment satisfying all of these relations will be termed an operational Peres-Mermin scenario.
We start with the measurements. There are 9 distinct equivalence classes of binary-outcome measurements, which we label by $\{\mathcal{M}_{ij}\}$ with $i,j\in\{1,2,3\}$. Laying these out in a 3×3 square, where the measurement $\mathcal{M}_{ij}$ appears at the $i$th row and $j$th column, each triple of measurements making up a row or a column of the square constitutes a compatible set of measurements. This is depicted in the compatibility hypergraph of figure 1.
By the definition of compatibility for measurements, equation (7), this implies that for every row and column there exists a measurement that simulates all the measurements on that row or column. We denote the measurement that simulates the triple of measurements in row 1 by $M_{R_1}$, the one that simulates the triple in column 1 by $M_{C_1}$, and so forth. We denote their outcomes by $m_{R_1}$, $m_{C_1}$, and so forth.
We now turn to the nature of the particular relation that holds between the measurements on a given row or column and the measurement that simulates them. Consider the measurements in the first row. The outcome $m_{R_1}$ of the simulating measurement is presumed to be 4-valued, such that it can be presented as an ordered pair of binary outcomes, which we denote by $m_{R_1,1}$ and $m_{R_1,2}$. In terms of this notation, the three measurements in the first row are presumed to be obtained from the simulating measurement by the following identification of outcomes,

$$m_{11} = m_{R_1,1},\qquad m_{12} = m_{R_1,2},\qquad m_{13} = m_{R_1,1}\,m_{R_1,2}, \qquad (41)$$

which in terms of the conditional probabilities in equation (7) corresponds to the following post-processings of the simulating measurement:

$$p(m_{11}|m_{R_1}) = \delta_{m_{11},\,m_{R_1,1}}, \qquad (42a)$$
$$p(m_{12}|m_{R_1}) = \delta_{m_{12},\,m_{R_1,2}}, \qquad (42b)$$
$$p(m_{13}|m_{R_1}) = \delta_{m_{13},\,m_{R_1,1}m_{R_1,2}}. \qquad (42c)$$

Analogous compatibility relations hold for the second and third rows and for the first and second columns. The relations are slightly different for the third column:

$$m_{13} = m_{C_3,1},\qquad m_{23} = m_{C_3,2},\qquad m_{33} = -\,m_{C_3,1}\,m_{C_3,2},$$

or in terms of the conditional probabilities,

$$p(m_{13}|m_{C_3}) = \delta_{m_{13},\,m_{C_3,1}},\qquad p(m_{23}|m_{C_3}) = \delta_{m_{23},\,m_{C_3,2}},\qquad p(m_{33}|m_{C_3}) = \delta_{m_{33},\,-m_{C_3,1}m_{C_3,2}}.$$

A similar story holds for the sources. There are 9 distinct equivalence classes of binary-outcome sources, which we label by $\{\mathcal{S}_{ij}\}$, with compatibility relations described by the hypergraph of figure 2.

Footnote 11: Note that the contradiction could have been obtained equally well by considering the impossibility of finding deterministic assignments to all of the sources while respecting the functional relations that hold among compatible subsets of these.
By the definition of compatibility for sources, equation (10), this implies that for every row and column there exists a source that simulates all the sources on that row or column. We denote the source that simulates the triple of sources in row 1 by $S_{R_1}$ and its outcome by $s_{R_1}$, the one that simulates the triple in column 1 by $S_{C_1}$ and its outcome by $s_{C_1}$, and so forth. Each such outcome is presumed to be 4-valued, such that it can be presented as an ordered pair of binary variables, so that $s_{R_1} = (s_{R_1,1}, s_{R_1,2})$. The conditional probabilities which, by equation (10), define the precise nature of the compatibility relations are exactly the same as for the measurements. For the first row, they are

$$p(s_{11}|s_{R_1}) = \delta_{s_{11},\,s_{R_1,1}}, \qquad (45a)$$
$$p(s_{12}|s_{R_1}) = \delta_{s_{12},\,s_{R_1,2}}, \qquad (45b)$$
$$p(s_{13}|s_{R_1}) = \delta_{s_{13},\,s_{R_1,1}s_{R_1,2}}, \qquad (45c)$$

with analogous relations holding for the other rows and the first and second columns, while for the third column, they are

$$p(s_{13}|s_{C_3}) = \delta_{s_{13},\,s_{C_3,1}},\qquad p(s_{23}|s_{C_3}) = \delta_{s_{23},\,s_{C_3,2}},\qquad p(s_{33}|s_{C_3}) = \delta_{s_{33},\,-s_{C_3,1}s_{C_3,2}}. \qquad (46)$$
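The post-processing relations can be illustrated with a short sketch. It assumes the convention that the three binary outcomes on a line are the first bit, the second bit, and `eta` times their product, with `eta = +1` for every row and for the first two columns and `eta = -1` for the third column. The function names are hypothetical, and the same code applies to measurements and to sources.

```python
from fractions import Fraction
from itertools import product

def line_expectations(joint, eta=1):
    """Coarse-grain a 4-outcome distribution into three binary expectation values.

    joint maps an ordered pair (b1, b2) of +/-1 outcomes to its probability.
    The three simulated outcomes are b1, b2 and eta * b1 * b2.
    """
    e1 = sum(p * b1 for (b1, b2), p in joint.items())
    e2 = sum(p * b2 for (b1, b2), p in joint.items())
    e3 = sum(p * eta * b1 * b2 for (b1, b2), p in joint.items())
    return e1, e2, e3

def joint_from_expectations(e1, e2, e3, eta=1):
    """Invert line_expectations: the unique candidate 4-outcome distribution."""
    return {(b1, b2): (1 + b1 * e1 + b2 * e2 + eta * b1 * b2 * e3) / 4
            for b1, b2 in product((1, -1), repeat=2)}

def is_simulable(e1, e2, e3, eta=1):
    """True if the triple of expectation values arises from some 4-outcome distribution."""
    return all(p >= 0 for p in joint_from_expectations(e1, e2, e3, eta).values())

# Example: the first bit is always +1 and the second bit is uniformly random.
half = Fraction(1, 2)
example = {(1, 1): half, (1, -1): half, (-1, 1): Fraction(0), (-1, -1): Fraction(0)}
example_expectations = line_expectations(example)
```

In this example the first outcome is deterministic while the other two are uniformly random, so a line can mix deterministic and indeterministic outcomes. The triple (+1, +1, −1) passes `is_simulable` only for `eta = -1`, which reflects the sign difference on the third column.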

Noiseless quantum realization
It is straightforward to verify that the quantum measurements and quantum sources appearing in the no-go theorem described in section 3.2 instantiate all of the compatibility relations that were described in the previous section.
We begin with the quantum measurements. Take, for example, the three observables in the first row of the Peres-Mermin square. These are associated to the projector-valued measures $\{\Pi^{m_{11}}_{11}\}$, $\{\Pi^{m_{12}}_{12}\}$ and $\{\Pi^{m_{13}}_{13}\}$, with $m_{11}, m_{12}, m_{13} \in \{-1,+1\}$, each corresponding to the projectors onto the pair of eigenspaces of the corresponding observable, as in equation (32). The measurement that simulates all of these is, of course, the one associated to the joint eigenspaces of the three commuting observables, which as a projector-valued measure is $\{\Pi^{m_{R_1}}_{R_1}\}$, with $\Pi^{m_{R_1}}_{R_1} \equiv \Pi^{m_{R_1,1}}_{11}\,\Pi^{m_{R_1,2}}_{12}$. The simulation of each of the three measurements in the row is achieved by implementing this PVM and then post-processing its outcome using the three conditional probability distributions specified in equations (42a)-(42c), that is,

$$\Pi^{m_{1j}}_{1j} = \sum_{m_{R_1}} p(m_{1j}|m_{R_1})\,\Pi^{m_{R_1}}_{R_1},\qquad j = 1,2,3. \qquad (48a)\text{-}(48c)$$

In a similar fashion, one can verify that the other rows and columns of the Peres-Mermin square of quantum measurements have the compatibility relations described in the previous section.
The nine quantum sources appearing in the no-go theorem of section 3.2 also have the compatibility relations described in the previous section. Consider the first row of the source version of the Peres-Mermin square as an example. The three sources on this row are associated to the ensembles $\mathcal{S}_{11}$, $\mathcal{S}_{12}$ and $\mathcal{S}_{13}$. These are simulated by the 4-outcome source associated to the uniform ensemble of joint eigenstates of the three commuting observables in that row, and the conditional probabilities appearing in the simulation are precisely those given in equations (45a)-(45c). This fact follows from equations (48a)-(48c). The compatibility relations for the other rows and columns are verified similarly.

Noisy quantum realization
In deriving noncontextuality inequalities, it is critical that one not base these on assumptions that are only valid when the measurements or the sources are noiseless because this ideal is never achieved in real experiments. The compatibility relations outlined in section 4.1 satisfy this desideratum. In the quantum case, for instance, they can be satisfied even if the measurements and sources are not sharp (i.e. not associated to an orthogonal set of projectors). A specific example helps to clarify the point. Suppose the nine sharp measurements appearing in the Peres-Mermin square are replaced by noisy versions thereof, that is, by the nine unsharp measurements that are the images of the sharp measurements under a partially depolarizing channel $\mathcal{D}$. In this case, the projector-valued measures are replaced by POVMs that are not projective. For instance, the three measurements in the first row of the Peres-Mermin square are associated to the binary-outcome POVMs $\{\mathcal{D}(\Pi^{m_{11}}_{11})\}$, $\{\mathcal{D}(\Pi^{m_{12}}_{12})\}$ and $\{\mathcal{D}(\Pi^{m_{13}}_{13})\}$. The three binary-outcome POVMs are simulated by the 4-outcome POVM $\{\mathcal{D}(\Pi^{m_{R_1}}_{R_1})\}$ using the conditional probabilities in equations (42a)-(42c); to see this, it suffices to apply $\mathcal{D}$ to equations (48a)-(48c) and recall that it is a linear map.
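Similarly, suppose that the nine sources appearing in the source version of the Peres-Mermin square are replaced by partially depolarized versions thereof. (For simplicity, we will assume that the strength of the noise on the sources is equal to that on the measurements.) In this case, the ensembles associated to the three sources in the first row are the images under $\mathcal{D}$ of the noiseless ensembles $\mathcal{S}_{11}$, $\mathcal{S}_{12}$ and $\mathcal{S}_{13}$, and they remain compatible with the conditional probabilities of equations (45a)-(45c), which again follows from the linearity of $\mathcal{D}$ applied to equations (48a)-(48c).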
Recall that a partial depolarization map $\mathcal{D}$ can be written as a convex mixture of the identity channel, $\mathcal{I}$, and the channel that traces over the system and reprepares the completely mixed state. An element of the 1-parameter family of such maps is

$$\mathcal{D}_r(\cdot) \equiv r\,\mathcal{I}(\cdot) + (1-r)\,\tfrac14\,\mathbb{1}\otimes\mathbb{1}\;\mathrm{Tr}(\cdot).$$

The strength of depolarization is specified by the probability $r$ of realizing the identity map (with lower values of $r$ corresponding to stronger noise).
It follows that the degree of correlation that can be observed between sources and measurements is a function of $r$. For $r<1$, one no longer achieves the perfect correlations of the noiseless quantum realization; instead, with the same noise strength $r$ on the source and on the measurement,

$$p(m_{ij}, s_{ij}) = \tfrac14\left(1 + r^2\, m_{ij}\, s_{ij}\right).$$

This can be expressed equivalently as

$$\forall i,j:\quad \langle m_{ij}\, s_{ij}\rangle = r^2. \qquad (51)$$

Because the no-go theorem of section 3.2 relied on having perfect correlations, it is not applicable to the noisy quantum realization of the operational Peres-Mermin scenario. Nonetheless, one expects that for values of $r$ sufficiently close to 1, a noncontextual model should still be ruled out. The noncontextuality inequalities that we derive confirm this expectation. They are robust to noise in the sense that they can be violated by values of $r$ strictly less than 1. The lower bound on $r$ that they imply is determined in section 4.6.2. This bound specifies how much noise one can tolerate in the noisy quantum realization of the Peres-Mermin scenario and still rule out a noncontextual model of the experiment.
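The dependence of the correlations on $r$ can be checked numerically. The sketch below is illustrative only. It assumes one standard arrangement of the Peres-Mermin observables, sources of the form $\{\tfrac12\rho^{\pm}\}$ with $\rho^{\pm}$ proportional to the eigenprojectors of the observable, and the same depolarization strength `r` applied to the source states and to the measurement effects. Helper names are hypothetical.

```python
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def lincomb(c1, a, c2, b):
    """Entry-wise c1 * a + c2 * b."""
    n = len(a)
    return [[c1 * a[i][j] + c2 * b[i][j] for j in range(n)] for i in range(n)]

IDENT4 = kron(I2, I2)

def depolarize(a, r):
    """D_r(A) = r A + (1 - r) Tr(A) identity / 4 on two qubits."""
    return lincomb(r, a, (1 - r) * trace(a) / 4, IDENT4)

def correlation(obs, r):
    """Expectation of m*s for the noisy measurement and noisy source built from obs."""
    total = 0.0
    for m in (1, -1):
        # Noisy effect: depolarized projector onto the eigenvalue-m eigenspace.
        effect = depolarize(lincomb(0.5, IDENT4, 0.5 * m, obs), r)
        for s in (1, -1):
            # Noisy state: depolarized normalized projector, prepared with probability 1/2.
            state = depolarize(lincomb(0.25, IDENT4, 0.25 * s, obs), r)
            p_joint = 0.5 * trace(matmul(effect, state)).real
            total += m * s * p_joint
    return total

# ASSUMPTION: a standard arrangement of the square, not necessarily the article's.
OBSERVABLES = [kron(X, I2), kron(I2, X), kron(X, X),
               kron(I2, Z), kron(Z, I2), kron(Z, Z),
               kron(X, Z), kron(Z, X), kron(Y, Y)]

noisy = [correlation(obs, 0.8) for obs in OBSERVABLES]
noiseless = [correlation(obs, 1.0) for obs in OBSERVABLES]
```

Under these assumptions each of the nine correlations equals `r**2` up to floating-point rounding, with one factor of `r` contributed by the source and one by the measurement.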

Expressing operational correlations in terms of noncontextual ontic assignments
Consider an experiment that can realize the nine equivalence classes of measurements and the nine equivalence classes of sources having the compatibility structures of figures 1 and 2 respectively and having the compatibility relations specified in the text, such as equations (42a)-(42c) and (45a)-(45c). There are 81 possible pairings of a source with a measurement. For a given such pairing, say $\mathcal{M}_{ij}$ with $\mathcal{S}_{i'j'}$, the experiment yields a joint probability distribution over the outcomes $m_{ij}$ and $s_{i'j'}$. Of this distribution we retain only the correlation $\langle m_{ij}\,s_{i'j'}\rangle$, not the marginal expectations. Furthermore, we consider only 9 of the 81 possible pairings of a source with a measurement, namely those wherein the source and the measurement are associated with a common label in their respective compatibility hypergraphs. That is, we hereafter consider only the correlations $\langle m_{ij}\,s_{ij}\rangle$. We will derive the necessary and sufficient conditions, with respect to these nine correlations, for an experiment to admit a noncontextual model.
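For the equivalence class of measurements $\mathcal{M}_{ij}$, there are two associated measurement procedures, which we denote by $M^R_{ij}$ and $M^C_{ij}$, with outcomes denoted by $m^R_{ij}$ and $m^C_{ij}$, and which correspond to whether $\mathcal{M}_{ij}$ is implemented jointly with the other measurements in its row or with the other measurements in its column. Similarly, the equivalence class of sources $\mathcal{S}_{ij}$ is associated with two sources, $S^R_{ij}$ and $S^C_{ij}$, with outcomes denoted by $s^R_{ij}$ and $s^C_{ij}$. The operational equivalences imply that

$$\langle m^R_{ij} s^R_{ij}\rangle = \langle m^C_{ij} s^R_{ij}\rangle = \langle m^R_{ij} s^C_{ij}\rangle = \langle m^C_{ij} s^C_{ij}\rangle.$$

Under the assumption of measurement noncontextuality, every measurement in the equivalence class $\mathcal{M}_{ij}$ is assigned the same expectation value by the ontic state, such that because $M^R_{ij} \simeq M^C_{ij}$,

$$\langle m^R_{ij}\rangle_\lambda = \langle m^C_{ij}\rangle_\lambda \equiv \langle m_{ij}\rangle_\lambda. \qquad (52)$$

In other words, measurement noncontextuality warrants the assumption that the expectation value for the outcome of $\mathcal{M}_{ij}$ does not depend on the measurement context, that is, whether it is implemented with the measurements in the same row or in the same column of figure 1. Similarly, under the assumption of preparation noncontextuality, every source in the equivalence class $\mathcal{S}_{ij}$ is assigned the same retrodictive expectation value by the ontic state, such that because $S^R_{ij} \simeq S^C_{ij}$,

$$\langle s^R_{ij}\rangle_\lambda = \langle s^C_{ij}\rangle_\lambda \equiv \langle s_{ij}\rangle_\lambda. \qquad (53)$$

In other words, the expectation value for the outcome of $\mathcal{S}_{ij}$ does not depend on the source context, that is, whether it is implemented with the sources in the same row or in the same column of figure 2.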
We conclude that

$$\langle m_{ij}\,s_{ij}\rangle = \sum_\lambda \langle m_{ij}\rangle_\lambda\,\langle s_{ij}\rangle_\lambda\,\mu(\lambda|\mathcal{S}_{ij}).$$

The operational equivalence relations among the sources and the assumption of preparation noncontextuality together imply one further simplification of this expression, namely, that $\mu(\lambda|\mathcal{S}_{ij})$ is independent of $(i,j)$,

$$\forall i,j,i',j':\quad \mu(\lambda|\mathcal{S}_{ij}) = \mu(\lambda|\mathcal{S}_{i'j'}) \equiv \nu(\lambda). \qquad (55)$$

To see why this is the case, consider the triple of sources in the first row of figure 2, $\mathcal{S}_{11}$, $\mathcal{S}_{12}$ and $\mathcal{S}_{13}$. By assumption, these are each simulatable by a single source, namely, $S_{R_1}$, by post-processing its outcome in the manner specified by the compatibility relations, equations (45a)-(45c). Marginalizing over the outcome of $\mathcal{S}_{11}$, $\mathcal{S}_{12}$ or $\mathcal{S}_{13}$ is simply a further post-processing of $S_{R_1}$ and consequently the outcome-marginalized versions of these three sources are each operationally equivalent to the outcome-marginalized version of $S_{R_1}$ and therefore operationally equivalent to one another. The assumption of preparation noncontextuality then implies that the distributions over ontic states associated to these, namely $\mu(\lambda|\mathcal{S}_{11})$, $\mu(\lambda|\mathcal{S}_{12})$ and $\mu(\lambda|\mathcal{S}_{13})$, are equal. The same argument applies to every other row and column, and since any two sources are linked through a shared row or column, all nine distributions coincide. The correlations therefore take the form

$$\langle m_{ij}\,s_{ij}\rangle = \sum_\lambda \langle m_{ij}\rangle_\lambda\,\langle s_{ij}\rangle_\lambda\,\nu(\lambda). \qquad (57)$$

We pause here to note that this expression for the correlation between the measurement outcome and the source outcome in a noncontextual model has the same form as the expression for the correlation between the measurement outcomes at the two wings of a Bell experiment in a locally causal model of the latter. This provides a particularly intuitive demonstration of the isomorphism between the assumption of local causality and the assumption of preparation noncontextuality for the outcome-marginalized sources articulated in equation (55). Note, however, that the assumptions of noncontextuality articulated in equations (52), (53) cannot be inferred from an assumption of local causality in the corresponding Bell scenario, so that the noncontextuality inequalities that we derive here are not isomorphic to Bell inequalities (see footnote 12).
The compatibility relations among the measurements imply constraints on the $\langle m_{ij}\rangle_\lambda$. We will refer to any 9-tuple of expectation values, $(\langle m_{11}\rangle_\lambda, \langle m_{12}\rangle_\lambda, \dots, \langle m_{33}\rangle_\lambda)$, satisfying these constraints as a noncontextual ontic assignment to the measurements. We will see that the set of all such 9-tuples defines a polytope in a 9-dimensional space, which we term the noncontextual measurement-assignment polytope. Similarly, the compatibility relations among the sources imply constraints on the $\langle s_{ij}\rangle_\lambda$. We will refer to any 9-tuple of expectation values, $(\langle s_{11}\rangle_\lambda, \langle s_{12}\rangle_\lambda, \dots, \langle s_{33}\rangle_\lambda)$, satisfying these constraints as a (retrodictive) noncontextual ontic assignment to the sources. These also form a polytope, which we term the noncontextual source-assignment polytope. The vertices of a polytope, i.e. the extremal noncontextual ontic assignments, can be deduced from that polytope's defining constraints using standard convex hull algorithms [25][26][27].
Every ontic state λ specifies some noncontextual assignment to measurements, but not every noncontextual assignment corresponds to a vertex of the noncontextual measurement-assignment polytope. Nevertheless, those non-vertex noncontextual assignments to measurements can be simulated by a distribution over ontic states that do correspond to vertices: suppose κ is a variable that runs over the vertices of the noncontextual measurement-assignment polytope. Then, for any ontic state λ, there exists a distribution $p(\kappa|\lambda)$ such that

$$\langle m_{ij}\rangle_\lambda = \sum_\kappa p(\kappa|\lambda)\,\langle m_{ij}\rangle_\kappa. \qquad (58)$$

Similarly, with $\kappa'$ running over the vertices of the noncontextual source-assignment polytope, there exists a distribution $p(\kappa'|\lambda)$ such that

$$\langle s_{ij}\rangle_\lambda = \sum_{\kappa'} p(\kappa'|\lambda)\,\langle s_{ij}\rangle_{\kappa'}. \qquad (59)$$

It is useful to introduce a simplified notation for the nine operational correlations in which we are interested, namely,

$$\omega_{ij} \equiv \langle m_{ij}\,s_{ij}\rangle.$$

The set of 9-dimensional vectors $(\omega_{11},\dots,\omega_{33})$ that can arise in an operational theory that admits of a noncontextual ontological model will be termed the noncontextual correlation polytope. Recalling equation (57), and representing the $\omega_{ij}$ as a 3×3 array, it is defined in terms of the noncontextual measurement-assignment polytope and the noncontextual source-assignment polytope as follows:

$$[\omega_{ij}] = \sum_\lambda \nu(\lambda)\,[\langle m_{ij}\rangle_\lambda] \bullet [\langle s_{ij}\rangle_\lambda],$$

where $[a_{ij}]$ denotes the 3×3 array with entries $a_{ij}$, $\nu(\lambda)$ is the common distribution over ontic states of equation (55), and • denotes the entry-wise product of the arrays (also known as the Hadamard or Schur product). Substituting equations (58) and (59), we obtain

$$[\omega_{ij}] = \sum_{\kappa,\kappa'} p(\kappa,\kappa')\,[\langle m_{ij}\rangle_\kappa] \bullet [\langle s_{ij}\rangle_{\kappa'}], \qquad p(\kappa,\kappa') \equiv \sum_\lambda \nu(\lambda)\,p(\kappa|\lambda)\,p(\kappa'|\lambda). \qquad (62)$$

Therefore, the noncontextual correlation polytope is the convex hull of the correlations one obtains for all possible pairings $(\kappa,\kappa')$ of a vertex κ from the noncontextual measurement-assignment polytope and a vertex $\kappa'$ from the noncontextual source-assignment polytope, that is, the convex hull of the 9-tuples $(\langle m_{11}\rangle_\kappa\langle s_{11}\rangle_{\kappa'}, \dots, \langle m_{33}\rangle_\kappa\langle s_{33}\rangle_{\kappa'})$, as one varies over $(\kappa,\kappa')$. Not every pairing $(\kappa,\kappa')$ corresponds to a unique vertex of the noncontextual correlation polytope: the fact that we consider correlations for only 9 source-measurement pairings and not the full set of 81 such pairings, and the fact that we do not consider any of the marginal expectations, implies that (i) more than one choice of $(\kappa,\kappa')$ can yield the same 9-tuple of correlations, and (ii) one choice of $(\kappa,\kappa')$ can yield a 9-tuple of correlations that lies in the convex hull of the 9-tuples associated to several other choices of $(\kappa,\kappa')$. It is therefore convenient to re-express equation (62) as

$$[\omega_{ij}] = \sum_\gamma p(\gamma)\,[\langle m_{ij}\rangle_\gamma \langle s_{ij}\rangle_\gamma],$$

where instead of ranging over all pairings $(\kappa,\kappa')$ we restrict γ to range over the vertices of the noncontextual correlation polytope without loss of generality.

Footnote 12: In particular, the noncontextuality inequalities we derive here are not isomorphic to the Bell inequality derived in [38] (and experimentally tested in [39]) even though the latter is inspired by a consideration of the Peres-Mermin construction (one such construction on each wing of the Bell experiment). This is because the inequality of [38] is based on the assumption of local causality alone. The analogue, for our prepare-and-measure scenario, of this inequality would be a constraint that follows from the assumption of preparation noncontextuality for the outcome-marginalized sources alone, equation (55).
We ultimately seek to derive noncontextuality inequalities, that is, the nontrivial facet inequalities of the noncontextual correlation polytope. We begin by characterizing the noncontextual measurement-assignment polytope and the noncontextual source-assignment polytope. We will see that the nature of the compatibility relations among the measurements/sources determines their respective facet inequalities. From these, we infer the two sets of vertices (measurements and sources) using standard convex hull algorithms [25][26][27][28]. Subsequently, by considering every possible pairing between those two sets of vertices, we determine the set of vertices of the noncontextual correlation polytope. Finally, using standard convex hull algorithms again, we obtain all of the facet inequalities of the noncontextual correlation polytope. The nontrivial facet inequalities define the set of noncontextuality inequalities for our problem. In the following sections, we proceed through these various steps explicitly.

Facets of the noncontextual measurement-assignment and noncontextual source-assignment polytopes
We will begin with the measurements. The compatibility relations holding among the measurements in a given row or column must also hold for the response functions representing these in the ontological model. (This is a constraint on any ontological model, rather than one arising from the assumption of measurement noncontextuality. See section 5 for further discussion.) Consider the response functions associated to the three equivalence classes of measurements in the first row of figure 1. We denote the set of response functions associated to each of these by [...]. From equations (13) and (41), we infer that [...].

Vertices of the noncontextual measurement-assignment and noncontextual source-assignment polytopes
In this section, we describe the conversion from the facet representation of the noncontextual measurement-assignment polytope, defined by the facet inequalities of equations (71a) and (71b), to its vertex representation. We use standard numerical algorithms to do so [25][26][27][28], the details of which are provided in appendix A. In addition to providing a description of this set of vertices, it is our aim here to provide some intuitions about their form.
To begin with, note that all of the points within the noncontextual measurement-assignment polytope are indeterministic assignments (in the sense of violating equation (20)) for one or more of the measurements. To see that there are no noncontextual ontic assignments that are deterministic for all of the measurements, it suffices to note that for deterministic assignments, the constraints (71a) and (71b) simplify to

$$\lfloor m_{i1}\rfloor_\lambda \lfloor m_{i2}\rfloor_\lambda \lfloor m_{i3}\rfloor_\lambda = +1 \quad (i=1,2,3), \qquad \lfloor m_{1j}\rfloor_\lambda \lfloor m_{2j}\rfloor_\lambda \lfloor m_{3j}\rfloor_\lambda = \begin{cases} +1, & j=1,2,\\ -1, & j=3,\end{cases}$$

where $\lfloor m_{ij}\rfloor_\lambda$ denotes a deterministic assignment by ontic state λ, and that these are equivalent to the constraints specified in equations (30a)-(30f), which, as noted in section 3.1, admit no solution.
To get a feeling for how indeterministic noncontextual ontic assignments to the measurements escape contradiction, it is useful to see a concrete example (one that is a vertex of the noncontextual measurement-assignment polytope). We denote it by $\kappa_1$. We begin by describing it in terms of probabilistic assignments to the 4-outcome measurements associated to each row and column, rather than in terms of the expectation values for each of the nine equivalence classes of binary-outcome measurements, because the correlations that hold between the different binary-outcome measurements are more transparent in this form: [...]. It is easy to verify that the two ways of defining the value of the response function for $\mathcal{M}_{ij}$ at $\kappa_1$ (via simulation by $M_{R_i}$ or via simulation by $M_{C_j}$) yield the same result, so that this is indeed a noncontextual assignment satisfying the compatibility relations. In terms of expectation values, this assignment corresponds to the array of equation (78): [...]. Note that it makes four of the nine measurements outcome-indeterministic. A second example of a vertex is given in equation (79): [...], where six of the nine measurements are outcome-indeterministic.

By considering the set of all deterministic processings of the measurements that preserve the compatibility relations holding among these, defined in appendix B, one can determine the symmetries of the noncontextual measurement-assignment polytope. Specifically, each such deterministic processing induces a bijective mapping of the set of vertices to itself. The full symmetry group is specified in appendix B. It is straightforward to verify that it can be generated by three deterministic processings, the first of which is a modified transpose (equation (80)): [...]. Note that the number of measurements that are assigned outcomes deterministically is preserved by these symmetry operations. Consequently, our two examples above, equations (78) and (79), are in different symmetry classes. In fact, we find that there are only these two symmetry classes.
The symmetry class wherein six of the nine measurements are indeterministic contains 48 vertices. As 3×3 matrices, they correspond to those with elements in $\{-1, 0, +1\}$ having the property that every row and every column contains precisely one non-zero element. The symmetry class wherein four of the nine measurements are indeterministic contains 72 vertices, and corresponds to those 3×3 matrices with elements in $\{-1, 0, +1\}$ having a single row of nonzero elements and a single column of nonzero elements such that the overall parity of the row is +1, and the overall parity of the column is η, where $\eta = -1$ if it is the third column and $\eta = +1$ otherwise.
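The two classes of vertices can be enumerated and checked directly. The sketch below is not the algorithm of appendix A. It assumes that the facet inequalities express the nonnegativity of the four outcome probabilities of each simulating measurement, which for a line with parity `eta` read `1 + s1*x_a + s2*x_b + eta*s1*s2*x_c >= 0` for all signs `s1`, `s2`. This is our reading of the constraints and may differ in form from equations (71a) and (71b). A point is accepted as a vertex if it is feasible and its tight constraints have rank 9.

```python
from fractions import Fraction
from itertools import permutations, product

ETA_COL = (1, 1, -1)  # column parities; every row has parity +1

def facet_normals():
    """Normals n of the 24 assumed constraints 1 + n . x >= 0 (x flattened row by row)."""
    lines = [([3 * i + j for j in range(3)], 1) for i in range(3)]
    lines += [([3 * i + j for i in range(3)], ETA_COL[j]) for j in range(3)]
    normals = []
    for (a, b, c), eta in lines:
        for s1, s2 in product((1, -1), repeat=2):
            n = [0] * 9
            n[a], n[b], n[c] = s1, s2, eta * s1 * s2
            normals.append(n)
    return normals

NORMALS = facet_normals()

def rank(rows):
    """Rank of an integer matrix with 9 columns, by exact Gaussian elimination."""
    mat = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(9):
        pivot = next((r for r in range(rk, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue
        mat[rk], mat[pivot] = mat[pivot], mat[rk]
        for r in range(len(mat)):
            if r != rk and mat[r][col] != 0:
                f = mat[r][col] / mat[rk][col]
                mat[r] = [x - f * y for x, y in zip(mat[r], mat[rk])]
        rk += 1
    return rk

def is_feasible(x):
    return all(1 + sum(n[k] * x[k] for k in range(9)) >= 0 for n in NORMALS)

def is_vertex(x):
    tight = [n for n in NORMALS if 1 + sum(n[k] * x[k] for k in range(9)) == 0]
    return is_feasible(x) and rank(tight) == 9

def one_per_line_class():
    """Arrays with exactly one nonzero (+/-1) entry in every row and column."""
    out = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            x = [0] * 9
            for i in range(3):
                x[3 * i + perm[i]] = signs[i]
            out.append(tuple(x))
    return out

def cross_class():
    """Arrays with one full row (parity +1) and one full column (parity eta)."""
    out = []
    for i0, j0 in product(range(3), repeat=2):
        cells = sorted({3 * i0 + j for j in range(3)} | {3 * i + j0 for i in range(3)})
        for vals in product((1, -1), repeat=5):
            x = [0] * 9
            for cell, v in zip(cells, vals):
                x[cell] = v
            row_parity = x[3 * i0] * x[3 * i0 + 1] * x[3 * i0 + 2]
            col_parity = x[j0] * x[3 + j0] * x[6 + j0]
            if row_parity == 1 and col_parity == ETA_COL[j0]:
                out.append(tuple(x))
    return out

class_a, class_b = one_per_line_class(), cross_class()
all_are_vertices = all(is_vertex(x) for x in class_a + class_b)
any_deterministic = any(is_feasible(x) for x in product((1, -1), repeat=9))
```

The check confirms that the 48 + 72 arrays described in the text are vertices under the assumed constraints and that no fully deterministic array is feasible. It does not show that these are the only vertices, which requires a full vertex enumeration such as the one in appendix A.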

Vertices of the noncontextual correlation polytope
To determine the vertices of the noncontextual correlation polytope from the vertices of the noncontextual measurement-assignment polytope and those of the noncontextual source-assignment polytope, we preserve only those pairings which lead to extremal 9-tuples, as noted above equation (64). Specifically, for each of the $120^2 = 14\,400$ pairings $(\kappa, \kappa')$, one computes the 9-tuple $(\langle m_{11}\rangle\langle s_{11}\rangle, \ldots, \langle m_{33}\rangle\langle s_{33}\rangle)$. By eliminating duplicate and non-extremal points from this set, we obtain the vertices of the noncontextual correlation polytope.
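The pairing step can be sketched as follows. This is an illustration only: the vertices of the source-assignment polytope are not listed in this section, so the sketch ASSUMES, purely as a stand-in, that they have the same form as the 120 measurement-assignment vertices described above. It computes the entry-wise products and removes duplicates; it does not test extremality, which requires a convex hull or linear programming step.

```python
from itertools import permutations, product

def measurement_vertices():
    """The 120 vertices described in the text, flattened to 9-tuples."""
    verts = []
    # Class with one nonzero entry per row and column (48 vertices).
    for perm in permutations(range(3)):
        for signs in product((-1, 1), repeat=3):
            v = [0] * 9
            for i in range(3):
                v[3 * i + perm[i]] = signs[i]
            verts.append(tuple(v))
    # Class with one full row r and one full column c (72 vertices).
    for r, c in product(range(3), repeat=2):
        eta = -1 if c == 2 else 1
        for row in product((-1, 1), repeat=3):
            if row[0] * row[1] * row[2] != 1:
                continue
            for col in product((-1, 1), repeat=3):
                if col[r] != row[c] or col[0] * col[1] * col[2] != eta:
                    continue
                v = [0] * 9
                for j in range(3):
                    v[3 * r + j] = row[j]
                for i in range(3):
                    v[3 * i + c] = col[i]
                verts.append(tuple(v))
    return verts

def entrywise_product(meas, src):
    """The 9-tuple (<m_11><s_11>, ..., <m_33><s_33>) for one pairing."""
    return tuple(m * s for m, s in zip(meas, src))

def candidate_correlation_points(meas_vertices, src_vertices):
    """Distinct entry-wise products; extremality is NOT checked here."""
    return {entrywise_product(m, s) for m in meas_vertices for s in src_vertices}

meas = measurement_vertices()
src = meas  # ASSUMPTION: stand-in for the source-assignment vertices
points = candidate_correlation_points(meas, src)
print(len(meas) * len(src), len(points))
```

The first printed number is the 14 400 pairings; the second is smaller because many pairings coincide, which is the duplication that the text says must be eliminated.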
A concrete example of a vertex of the noncontextual correlation polytope is obtained by pairing the vertex $\kappa_1$ of the noncontextual measurement-assignment polytope, described in equation (78), with a corresponding vertex $\kappa'_1$ of the noncontextual source-assignment polytope, and taking the entry-wise product of their expectation values. Note that this vertex can also be constructed by pairing $\kappa_3$ with a corresponding $\kappa'_3$, where $\kappa_3$ is defined as $\kappa_1$ per equation (78) but with the first two rows permuted, i.e. such that the $-1$ appears in the second row instead of the first. The distinct pairings $(\kappa_1, \kappa'_1)$ and $(\kappa_3, \kappa'_3)$ therefore yield duplicate noncontextual correlation points under entry-wise product.
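By considering the deterministic processings of the measurements and sources which preserve the operational Peres-Mermin scenario (that is, the processings which preserve the compatibility relations among the measurements, the compatibility relations among the sources, and the manner in which the sources and the measurements are paired), we can infer the symmetries of the noncontextual correlation polytope, as discussed in appendix B. Specifically, every such symmetry bijectively maps the set of vertices of this polytope to itself. The full symmetry group is described in appendix B, and it is straightforward to verify that it can be generated by the following three processings (which we also describe as operations on the $3 \times 3$ matrix):
$$\begin{aligned}
\text{Columns } 1 \leftrightarrow 2:\quad & \langle m_{11}\rangle\langle s_{11}\rangle \leftrightarrow \langle m_{12}\rangle\langle s_{12}\rangle,\quad \langle m_{21}\rangle\langle s_{21}\rangle \leftrightarrow \langle m_{22}\rangle\langle s_{22}\rangle,\quad \langle m_{31}\rangle\langle s_{31}\rangle \leftrightarrow \langle m_{32}\rangle\langle s_{32}\rangle \\
\text{Rows } 2 \leftrightarrow 3:\quad & \langle m_{21}\rangle\langle s_{21}\rangle \leftrightarrow \langle m_{31}\rangle\langle s_{31}\rangle,\quad \langle m_{22}\rangle\langle s_{22}\rangle \leftrightarrow \langle m_{32}\rangle\langle s_{32}\rangle,\quad \langle m_{23}\rangle\langle s_{23}\rangle \leftrightarrow \langle m_{33}\rangle\langle s_{33}\rangle \\
\text{Modified transpose}:\quad & \langle m_{11}\rangle\langle s_{11}\rangle \leftrightarrow -\langle m_{11}\rangle\langle s_{11}\rangle,\quad \langle m_{12}\rangle\langle s_{12}\rangle \leftrightarrow \langle m_{21}\rangle\langle s_{21}\rangle,\quad \langle m_{13}\rangle\langle s_{13}\rangle \leftrightarrow -\langle m_{31}\rangle\langle s_{31}\rangle, \\
& \langle m_{23}\rangle\langle s_{23}\rangle \leftrightarrow \langle m_{32}\rangle\langle s_{32}\rangle,\quad \langle m_{33}\rangle\langle s_{33}\rangle \leftrightarrow -\langle m_{33}\rangle\langle s_{33}\rangle
\end{aligned}$$
Note that the modified transpose operation appearing in equation (86) is distinct from the modified transpose appearing in equation (80).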

Facets of the noncontextual correlation polytope: noncontextuality inequalities
To determine the facet inequalities for the noncontextual correlation polytope from its vertices, we proceed (again) by solving the convex hull problem [25][26][27][28], discussed more fully in appendix A.
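Facet inequalities of the noncontextual correlation polytope have the form
$$\sum_{i,j=1}^{3} \alpha_{ij}\,\omega_{ij} \le \beta,$$
where the $\{\alpha_{ij}\}_{ij}$ and $\beta$ are integers. Arranging the $\alpha_{ij}$ and the $\omega_{ij}$ into $3\times 3$ matrices and denoting the entry-wise matrix product by $\bullet$ and the sum of the elements of a matrix $A$ by $\mathrm{su}(A)$, we can express this as
$$\mathrm{su}\left( \begin{pmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} \end{pmatrix} \bullet \begin{pmatrix} \omega_{11} & \omega_{12} & \omega_{13} \\ \omega_{21} & \omega_{22} & \omega_{23} \\ \omega_{31} & \omega_{32} & \omega_{33} \end{pmatrix} \right) \le \beta. \qquad (85)$$
We will refer to the matrix of $\alpha_{ij}$'s as the coefficient matrix for the inequality. We find that there are 184 inequalities, all of which are expressed using coefficient matrices where $\forall i,j: \alpha_{ij} \in \{-1, 0, +1\}$. Note that the deterministic processings of the experiment that bijectively map the set of vertices of the noncontextual correlation polytope to itself also bijectively map the set of facet inequalities of the noncontextual correlation polytope to itself, and vice versa. Consequently, the symmetry group of the set of facet inequalities is the same as the symmetry group of the set of vertices. The value of $\beta$ in a facet inequality is invariant under the symmetry group, so that only the matrix of $\alpha$ coefficients transforms nontrivially. Given that the facet inequalities can be expressed in the form of equation (85), any map on the $\omega$ matrix can be transferred onto the matrix of $\alpha$ coefficients. Consequently, the action of the symmetry group on the coefficient matrix is precisely parallel to that described in equation (83), namely, the group generated by
$$\begin{aligned}
\text{Columns } 1 \leftrightarrow 2:\quad & \alpha_{11} \leftrightarrow \alpha_{12},\quad \alpha_{21} \leftrightarrow \alpha_{22},\quad \alpha_{31} \leftrightarrow \alpha_{32} \\
\text{Rows } 2 \leftrightarrow 3:\quad & \alpha_{21} \leftrightarrow \alpha_{31},\quad \alpha_{22} \leftrightarrow \alpha_{32},\quad \alpha_{23} \leftrightarrow \alpha_{33} \\
\text{Modified transpose}:\quad & \alpha_{11} \leftrightarrow -\alpha_{11},\quad \alpha_{12} \leftrightarrow \alpha_{21},\quad \alpha_{13} \leftrightarrow -\alpha_{31},\quad \alpha_{23} \leftrightarrow \alpha_{32},\quad \alpha_{33} \leftrightarrow -\alpha_{33}
\end{aligned} \qquad (86)$$
Thus, if the $\alpha$ coefficient matrix of one inequality is related to that of another inequality by one of the symmetry operations we have identified, then these two inequalities are in the same symmetry class.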
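The closure of a coefficient matrix under the three generators can be sketched as a simple orbit search. Two things here are assumptions rather than statements from the article: the modified transpose is taken to be a transpose combined with a sign flip on the four corner entries, and the three starting matrices are hypothetical representatives chosen only to match the verbal descriptions of the three symmetry classes given later in this section.

```python
def swap_columns_12(a):
    return tuple((row[1], row[0], row[2]) for row in a)

def swap_rows_23(a):
    return (a[0], a[2], a[1])

CORNERS = {(0, 0), (0, 2), (2, 0), (2, 2)}

def modified_transpose(a):
    # ASSUMPTION: transpose plus a sign flip on the four corner entries.
    return tuple(
        tuple(-a[j][i] if (i, j) in CORNERS else a[j][i] for j in range(3))
        for i in range(3)
    )

GENERATORS = (swap_columns_12, swap_rows_23, modified_transpose)

def orbit(start):
    """All coefficient matrices reachable from `start` via the generators."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for g in GENERATORS:
            image = g(current)
            if image not in seen:
                seen.add(image)
                frontier.append(image)
    return seen

def su_product(alpha, omega):
    """su(alpha . omega): sum of the entry-wise product of two 3x3 matrices."""
    return sum(alpha[i][j] * omega[i][j] for i in range(3) for j in range(3))

# Hypothetical representatives (not necessarily those printed in the article).
trivial = ((-1, -1, -1), (0, 0, 0), (0, 0, 0))   # one row, parity -1
class_one = ((1, 0, 1), (1, 0, -1), (0, -1, 0))  # special element alpha_32 = -1
class_two = ((1, 1, 1), (1, 1, 1), (1, 1, 1))    # all rows and columns parity +1

sizes = [len(orbit(rep)) for rep in (trivial, class_one, class_two)]
print(sizes, sum(sizes))  # [24, 144, 16] 184
```

Under these assumptions the three orbits have sizes 24, 144 and 16, which sum to the 184 facet inequalities reported in the text.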
An efficient description of all of the facet inequalities is achieved by describing representatives of each of the symmetry classes, and closing under the action of the symmetries. We find that there are just three symmetry classes of facet inequalities, conveniently distinguished by their values of β.
The first symmetry class is trivial in the sense that the facet inequalities therein hold for all correlations that are logically possible in the operational Peres-Mermin scenario, and consequently they are not sensitive to whether or not the correlations admit of a noncontextual model.
Trivial Class: a representative of this class is given in equation (87). Closing under the symmetries, one finds that there are 24 such inequalities, corresponding to coefficient matrices where only one row or column has all nonzero elements, and such that the overall parity of these is $-1$.
We justify the claim that these inequalities are trivial in appendix C. The term noncontextuality inequality is reserved for those facet inequalities of the noncontextual correlation polytope that are nontrivial. There are two symmetry classes of these.
Nontrivial Class I: a representative of this class is given in equation (88). Closing under the symmetries, one finds that there are 144 such inequalities, all of which can be constructed as follows: choose a special position in the matrix of coefficients, say element $\alpha_{ij}$, and make it $+1$ or $-1$. Let all other elements in the same row or column of the coefficient matrix be zero. Finally, choose any assignment of $\pm 1$ for the remaining four elements such that the overall parity of the five nonzero elements is $+1$. The example inequality of equation (88) is one of the eight inequalities that follow by starting with $\alpha_{32} = -1$ as the special element.
An example of an inequality from this class that is maximally violated by the noiseless quantum realization of the operational Peres-Mermin scenario (described in section 4.1.1) is given in equation (89).
Nontrivial Class II: a representative of this class is given in equation (90). Closing under the symmetries, one finds that there are 16 inequalities in this class, all of which have only nonzero elements in the coefficient matrix. The 16 inequalities are precisely those whose coefficient matrices have $+1$ overall parity for every row and every column. An example of an inequality from this class that is maximally violated by the noiseless quantum realization of the operational Peres-Mermin scenario (described in section 4.1.1) is
$$\mathrm{su}\left( \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \bullet \begin{pmatrix} \omega_{11} & \omega_{12} & \omega_{13} \\ \omega_{21} & \omega_{22} & \omega_{23} \\ \omega_{31} & \omega_{32} & \omega_{33} \end{pmatrix} \right) \le 5, \qquad (91)$$
as we will demonstrate in the next section.

Quantum violation of the inequalities
We have already seen in section 3.2 that one can identify sharp quantum sources and sharp quantum measurements that satisfy the Peres-Mermin compatibility structure and whose statistics are inconsistent with a universally noncontextual model. These quantum sources and measurements must therefore violate our inequalities. Indeed, for these quantum sources and measurements, it follows from equation (35) that $\forall i,j: \omega_{ij} = 1$, that is, every one of the nine source-measurement pairs exhibits perfect correlation. This implies that the left-hand side of the noncontextuality inequality of equation (89) evaluates to 5 for this quantum realization, thereby exceeding the noncontextual bound of 3. It also implies that the left-hand side of the noncontextuality inequality of equation (91) evaluates to 9, which exceeds the noncontextual bound of 5.

Robustness of the inequalities to noise
We now demonstrate explicitly how the noncontextuality inequalities we have derived are robust to noise by showing that the noisy quantum realization of the operational Peres-Mermin scenario, described in section 4.1.2, can still lead to a violation. Recall that this consisted of quantum sources and quantum measurements that were the image under a partial depolarization map of those appearing in the no-go theorem of section 3.2.
For these noisy sources and measurements, the value of the correlation for each of the nine source-measurement pairs was computed, as a function of the weight $r$ of the identity map in the partial depolarization, in equation (51). Translating into the $\omega_{ij}$ notation of equation (60), the result is
$$\forall i,j: \quad \omega_{ij} = r^2. \qquad (92)$$
Substituting this expression into the noncontextuality inequality of equation (89), we obtain
$$5 r^2 \le 3.$$
Consequently, as long as the level of noise is such that $r > \sqrt{3/5} \simeq 0.77460$, one has a violation of the noncontextuality inequality. (Lower values of $r$ correspond to stronger noise, so this lower bound on $r$ is an upper bound on the noise.) Similarly, from the noncontextuality inequality of equation (91), we obtain
$$9 r^2 \le 5,$$
implying that we require $r > \sqrt{5/9} \simeq 0.74536$ to see a violation.
We have shown that our noncontextuality inequalities still admit a violation even in the presence of significant depolarizing noise relative to the ideal quantum realization. As such, the fact that any attempt at an experimental implementation of the ideal quantum sources and measurements inevitably only realizes noisy versions of these is not an obstacle to demonstrating an experimental failure of noncontextuality.
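The two noise thresholds can be reproduced numerically. The sketch below assumes, as the text indicates, that every correlation equals $r^2$ and that the two example inequalities have five and nine nonzero coefficients respectively, all equal to $+1$; the helper names are illustrative.

```python
import math

def lhs_under_depolarization(num_terms, r):
    """Left-hand side when every omega_ij equals r**2 and all nonzero
    coefficients are +1 (assumed for the two example inequalities)."""
    return num_terms * r ** 2

def violation_threshold(num_terms, beta):
    """Value of r above which num_terms * r**2 exceeds the bound beta."""
    return math.sqrt(beta / num_terms)

r_class_one = violation_threshold(5, 3)  # five terms, bound 3
r_class_two = violation_threshold(9, 5)  # nine terms, bound 5
print(f"{r_class_one:.5f} {r_class_two:.5f}")  # 0.77460 0.74536
```

The Class II inequality tolerates slightly more depolarization than the Class I inequality, since its threshold on $r$ is lower.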

How to implement an experimental test of these inequalities
The assumption of noncontextuality only has nontrivial consequences if one has experimentally verified that certain operational equivalence relations hold among the measurements and among the sources. This creates a problem for experimental tests of noncontextuality because the definition of operational equivalence for sources is in terms of equivalence of statistics for all measurements, and for measurements it is in terms of equivalence of statistics for all sources, and, strictly speaking, one can never experimentally implement all possible procedures of either type. The problem may be summarized as the physical impossibility of verifying any criterion that involves a universal quantifier. We here explain our view on what is the appropriate attitude to take towards this problem.
A tomographically-complete set of measurements is defined as a set of measurements whose statistics are sufficient to infer the statistics of any other measurement. A tomographically-complete set of sources is defined as a set of sources whose statistics are sufficient to infer the statistics of any other source. It follows that to judge operational equivalence relations among sources, it is sufficient to consider their statistics on a tomographically-complete set of measurements rather than on all measurements, and to judge operational equivalence relations among measurements, it is sufficient to consider their statistics on a tomographically-complete set of sources rather than on all sources. The problem therefore reduces to identifying tomographically-complete sets for each.
It is well known that in quantum theory, the set of observables obtained by taking all products of Pauli operators corresponds to a tomographically complete set of measurements for a pair of qubits. Therefore, quantum theory dictates that one must supplement the nine products of Pauli operators appearing in the Peres-Mermin square with all the other nontrivial products of Pauli operators, that is, with the remaining six of the fifteen, in order to obtain a tomographically complete set. Consequently, by the lights of quantum theory, in order to test operational equivalence relations among the sources, it is necessary to do more measurements than appear in the Peres-Mermin square construction.
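The quantum-theoretic claim can be illustrated with a short check that the sixteen two-qubit Pauli products (the identity plus the fifteen nontrivial ones) are mutually orthogonal under the Hilbert-Schmidt inner product, and hence form a basis for the 16-dimensional space of $4\times 4$ operators. This is a sketch in plain Python; the helper names are illustrative.

```python
from itertools import product

I2 = ((1, 0), (0, 1))
X = ((0, 1), (1, 0))
Y = ((0, -1j), (1j, 0))
Z = ((1, 0), (0, -1))
PAULIS = {"I": I2, "X": X, "Y": Y, "Z": Z}

def kron(a, b):
    """Kronecker product of two 2x2 matrices, returned as a 4x4 tuple."""
    return tuple(
        tuple(a[i][j] * b[k][l] for j in range(2) for l in range(2))
        for i in range(2) for k in range(2)
    )

def hs_inner(a, b):
    """Hilbert-Schmidt inner product Tr(a^dagger b) for 4x4 matrices."""
    return sum(
        complex(a[i][j]).conjugate() * b[i][j]
        for i in range(4) for j in range(4)
    )

labels = ["".join(p) for p in product("IXYZ", repeat=2)]
operators = {lab: kron(PAULIS[lab[0]], PAULIS[lab[1]]) for lab in labels}
nontrivial = [lab for lab in labels if lab != "II"]

def is_orthogonal_family(names):
    """True if the named operators are pairwise orthogonal with norm^2 = 4."""
    return all(
        abs(hs_inner(operators[a], operators[b]) - (4 if a == b else 0)) < 1e-12
        for a in names for b in names
    )

print(len(nontrivial), is_orthogonal_family(labels))  # 15 True
```

Because the sixteen operators are linearly independent, any nine of them together with the identity span only ten of the sixteen dimensions, which is why the Peres-Mermin observables alone cannot be tomographically complete according to quantum theory.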
Furthermore, if one seeks to implement a direct experimental test of noncontextuality, then one does not want to presume the correctness of quantum theory. As such, it is inappropriate to presume that the fifteen binary-outcome measurements that one expects to be tomographically complete by the lights of quantum theory are in fact tomographically complete. Instead, one should accumulate experimental evidence for this hypothesis by implementing the greatest diversity of measurements on the system that one can and by verifying that the statistics for each such measurement can be inferred from the statistics of the hypothetical tomographically complete set.
Similar comments apply for the problem of identifying a tomographically complete set of sources for the purpose of evaluating operational equivalence relations among the measurements.
We refer the reader to [29] for more details on how to acquire experimental evidence for a given set of measurements (preparations) being tomographically complete.
Even given good evidence of tomographic completeness, one faces another problem with experimentally verifying operational equivalences. If one aims to implement a particular set of procedures (termed the target procedures), the unavoidable imprecision of experimental implementations implies that one inevitably fails to do so precisely, and one's experiment instead realizes a set of procedures that deviate from the target procedures, termed the primary procedures. Given the failure of the primary procedures to coincide with the target procedures, operational equivalence relations that hold among the target procedures need not hold among the primary procedures. The primary procedures therefore do not generally realize the operational equivalence relations that are the starting point for derivations of noncontextuality inequalities. This has been termed the problem of 'no strict operational equivalences' in [13].
Before describing how to resolve it, we specify how it arises in the context of the Peres-Mermin construction.
There are eighteen target measurement procedures, corresponding to six compatible triples of binary-outcome measurements, one triple for each row and column of the square. Recall that we denoted the 4-outcome measurement that simulates all of the binary-outcome measurements in the first row by $M_{R_1}$ and its outcome by the pair of binary-outcome variables $(m_{R_1,1}, m_{R_1,2})$. Recall also that we defined $M^{R}_{11}$ to be the binary-outcome measurement procedure one obtains by implementing $M_{R_1}$ and outputting the single binary variable $m^{R}_{11} = m_{R_1,1}$ (i.e., by marginalizing over $m_{R_1,2}$). Similarly, we defined $M^{C}_{11}$ to be the binary-outcome measurement procedure that one obtains by implementing $M_{C_1}$ and outputting the single binary variable $m^{C}_{11} = m_{C_1,1}$ (i.e., by marginalizing over $m_{C_1,2}$). $M^{R}_{11}$ and $M^{C}_{11}$ are two of the target procedures. Although they are distinct in the sense of involving different physical operations in the laboratory, they are required to be operationally equivalent in order for the assumption of noncontextuality to have any nontrivial consequences. Similar comments hold for the pair of procedures associated to each of the other points of the Peres-Mermin square. Consequently, if instead of considering the hypergraph where the nodes are operational equivalence classes of measurement procedures and the hyperedges are compatibility relations as we did in figure 1, we consider the hypergraph where the nodes are individual measurements, and there are two types of hyperedges, one denoting compatibility relations and the other denoting operational equivalence relations, then we can represent the set of target measurement procedures by figure 3.
Note that all of the relationships of compatibility that hold among the measurements are guaranteed by the manner in which those measurements are implemented. Specifically, for every set of compatible measurements in the experiment, these are implemented by various coarse-grainings of a single measurement. As such, the compatibility relations are ensured by construction, and no further evidence must be accumulated to confirm their presence. Only the operational equivalence relations among these different coarse-grained measurements must be tested explicitly.
This distinguishes our approach from other approaches to experimental tests of noncontextuality [23] wherein the compatibility relations must be tested explicitly. In the latter approaches, the experiment seeks to implement a single measurement procedure $M$ before or after another measurement $M'$ which is drawn from a set of possibilities that are compatible with $M$ but incompatible with one another. It is then critical to demonstrate that the procedures $M$ and $M'$ that are actually realized in the experiment are indeed compatible.
Similar comments apply to sources. In a hypergraph where the nodes are individual sources, and there are two types of hyperedges, one denoting compatibility relations and the other denoting operational equivalence relations, we represent the set of target sources by figure 4. If every set of compatible sources is implemented by a coarse-graining of a single source, then the compatibility relations among the sources are achieved by construction and only the operational equivalence relations need to be tested.
We are now in a position to describe the problem of no strict operational equivalences in the case of Peres-Mermin. Given the unavoidable deviation of the actually-realized procedures (the primary procedures) from the target procedures, one expects that for each $ij$, the actually-implemented versions of $M^{R}_{ij}$ and of $M^{C}_{ij}$ will not be strictly operationally equivalent, nor will the actually-implemented versions of $S^{R}_{ij}$ and of $S^{C}_{ij}$. The resolution to this problem was provided in [13] and proceeds as follows. When one's experiment realizes a finite set of primary procedures, it simultaneously provides a characterization of an infinite set of procedures, namely, those that can be obtained by classical post-processing of the set of primary procedures, for instance, any procedure in the convex hull of these. One can then choose a set of secondary procedures from among this infinite set under the constraint of exactly satisfying the desired operational equivalence relations.
Specifically, we select secondary versions of the 4-outcome measurements and 4-outcome sources, under the constraint that the binary-outcome measurements and sources that they define precisely satisfy the operational equivalence relations depicted in the hypergraphs of figures 3 and 4 respectively. Once this is done, one uses the secondary versions of the nine binary-outcome measurements and nine binary-outcome sources to compute the correlations $\{\omega_{ij}\}_{ij}$. Because the operational equivalences hold for these secondary measurements and sources, the assumption of noncontextuality implies a constraint on their ontological representation, and consequently the operational correlations exhibited by these secondary measurements and sources are expected to satisfy the noncontextuality inequalities if the experiment admits of a noncontextual model. A violation of these inequalities by the secondary procedures, therefore, witnesses a failure of noncontextuality.
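The idea of secondary procedures can be illustrated with a toy example. Everything in the sketch below is hypothetical: each procedure is summarized by a vector of probabilities (the probability of outcome $+1$ on each of two fiducial sources), the numbers are invented, and the mixing weights are chosen by hand. In a real analysis the weights would be found by an optimization over the convex hull of many primary procedures, which is not attempted here.

```python
from fractions import Fraction as F

def mix(weights, procedures):
    """Statistics of the convex mixture sum_k weights[k] * procedures[k]."""
    if any(w < 0 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must form a probability distribution")
    n = len(procedures[0])
    return tuple(sum(w * p[k] for w, p in zip(weights, procedures)) for k in range(n))

# HYPOTHETICAL primary statistics (not data from any experiment).
primaries = [
    (F(9, 10), F(1, 10)),  # primary version of M^R_ij
    (F(4, 5), F(1, 5)),    # primary version of M^C_ij
    (F(3, 5), F(2, 5)),    # an additional, noisier primary procedure
]

# The two primaries are close but not strictly operationally equivalent.
primaries_equivalent = primaries[0] == primaries[1]

# Hand-picked post-processings that restore exact equivalence.
secondary_R = mix((F(2, 3), F(0), F(1, 3)), primaries)
secondary_C = mix((F(0), F(1), F(0)), primaries)

print(primaries_equivalent, secondary_R == secondary_C)  # False True
```

Exact rational arithmetic is used so that the equality of the secondary statistics is strict rather than approximate, mirroring the requirement that the secondary procedures satisfy the operational equivalences exactly.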

Conclusions
In this paper, we have derived a set of noncontextuality inequalities based on the Peres-Mermin square proof of the impossibility of KS-noncontextuality in quantum theory. These inequalities are robust to noise and consequently they can be tested directly by experiment. If they are found to be violated by experiment, then not only quantum theory, but any physical theory that can do justice to the experimental data must fail to admit of a noncontextual model. The procedure we have outlined for deriving such inequalities is quite general and can be applied to other state-independent proofs of the Kochen-Specker theorem, particularly those that are expressed algebraically.
In appendix D, we contrast our approach with previous attempts at deriving operationally-meaningful noncontextuality inequalities based on the Peres-Mermin square and we criticize the latter.
The technique we have described here can be applied to deriving inequalities for correlations in the full set of 81 source-measurement pairings, or even to deriving inequalities on these together with the nine marginal expectation values for the measurements alone and the nine marginal expectation values for the sources alone. What prevented us from doing so here was the computational infeasibility of solving the associated convex hull problems, using the best algorithms for this problem that we could identify and given the computational resources we devoted to the task. It is conceivable, however, that by leveraging our knowledge of the symmetries of the polytopes involved in the problem, one might render it computationally feasible using the same algorithms and computational resources. We are also hopeful that the graph-theoretic techniques for analysing contextuality scenarios, described in [30][31][32], might suggest better algorithms for deriving these inequalities.
It is worth noting that the technique for deriving noncontextuality inequalities we have introduced here, insofar as it reduces to a convex hull problem, is an instance of the problem of quantifier elimination. Recent work in quantum foundations has seen increasing use of quantifier elimination algorithms. Fourier-Motzkin elimination, which is appropriate for problems wherein the dependence on the variables to be eliminated is linear, has been used to derive Bell inequalities [33], and also recently, to derive Bell-like inequalities for novel causal scenarios [34][35][36]. In [36], the problem is reduced to what is known as the classical marginals problem (that of determining whether a given set of distributions on various subsets of a set of variables can arise as the marginals of a single joint distribution over all of the variables), and this problem can be solved by performing quantifier elimination on the probabilities in the joint distributions using convex hull algorithms. Nonlinear quantifier elimination using computational algebraic geometry has also found application in deriving Bell-like inequalities in simple scenarios [37]. We anticipate that these more general techniques for quantifier elimination will ultimately also find applications to the derivation of noncontextuality inequalities.
The set of all permutations of the Peres-Mermin square that map the third column to itself is specified in terms of a row permutation $\pi_R$ and a column permutation $\pi_C$. Here, $\pi_R$ runs over all permutations of the rows, and $\pi_C$ is either identity or the swap of the first two columns.
Of those permutations that do not take the third column to itself, it is useful to distinguish those that take it to another column, denoted $P_B$, and those that take it to a row, denoted $P_C$.
Recall that our noncontextuality inequalities will ultimately only refer to the following nine products of a measurement outcome variable with its corresponding source outcome variable: $m_{ij}s_{ij}$ for $i,j \in \{1,2,3\}$. However, it is straightforward to verify that these three sets are equivalent.
In this section, a prior proposal for an experimental test of noncontextuality based on the Peres-Mermin proof of the Kochen-Specker theorem, that of Cabello [42], will be reviewed and criticized. Our criticisms here parallel those provided in appendix C of [14] for a similar proposal that was based on a different proof of the Kochen-Specker theorem.
We describe the proposal of [42] using the notation introduced in this article. There are six operational quantities appearing in the inequality derived therein, corresponding to the expectation value, relative to an arbitrary preparation $P$, of the product of the outcomes of each of the six triples of compatible measurements in the Peres-Mermin square. These are combined into the quantity
$$R(P) \equiv \langle m_{11}m_{12}m_{13}\rangle_P + \langle m_{21}m_{22}m_{23}\rangle_P + \langle m_{31}m_{32}m_{33}\rangle_P + \langle m_{11}m_{21}m_{31}\rangle_P + \langle m_{12}m_{22}m_{32}\rangle_P - \langle m_{13}m_{23}m_{33}\rangle_P, \qquad (D.1)$$
and the inequality proposed in [42] is $R(P) \le 4$.
Furthermore, the compatibility relations imply that for any preparation $P$, whatever triple of compatible measurements one implements, the outcomes always satisfy equation (B.5), and therefore the expectation values satisfy these relations as well:
$$\langle m_{11}m_{12}m_{13}\rangle_P = \langle m_{21}m_{22}m_{23}\rangle_P = \langle m_{31}m_{32}m_{33}\rangle_P = \langle m_{11}m_{21}m_{31}\rangle_P = \langle m_{12}m_{22}m_{32}\rangle_P = 1, \qquad \langle m_{13}m_{23}m_{33}\rangle_P = -1. \qquad (D.6)$$
In other words, just as the observables $\mathbb{1}\otimes\mathbb{1}$ and $-\mathbb{1}\otimes\mathbb{1}$ in quantum theory are trivial insofar as they take the same value for all states, the triple-products of outcomes $m_{11}m_{12}m_{13}, \ldots, m_{13}m_{23}m_{33}$ are trivial operational quantities insofar as they take the same value for all preparations. Substituting the identities in equation (D.6) into (D.1), one obtains
$$R(P) = 6. \qquad (D.7)$$
Therefore, for any set of measurements satisfying the compatibility relations of the operational Peres-Mermin scenario, one will necessarily find this equality to hold. In particular, we expect equation (D.7) to hold no matter how noisy the measurements are. To see this, it suffices to note that the equalities of equation (B.5) hold for any set of noisy measurements satisfying the compatibility relations, and this implies that equation (D.7) holds for such measurements as well. For instance, equations (B.5) and (D.7) hold for the noisy quantum realization of the operational Peres-Mermin scenario, described in section 4.1.2, for any amount of depolarization noise. Consequently, if measuring $R(P)$ to have a value greater than 4 could constitute evidence for the failure of noncontextuality, then this evidence could be obtained even in the presence of arbitrarily large amounts of noise.
This is an indictment of the proposal of [42] because a minimal constraint on any reasonable notion of noncontextuality (first articulated in [14]) is that it should not be possible to demonstrate its failure in a completely incoherent experiment.
To summarize then, our criticism is as follows. The inequality $R(P) \le 4$ should not be expected to hold for any experiment satisfying the compatibility structure of the operational Peres-Mermin scenario, while the equality $R(P) = 6$ (and hence the violation of $R(P) \le 4$) is expected to hold trivially in all such experiments. This is the case regardless of whether the experiment admits of a noncontextual ontological model. As such, the operational quantity $R(P)$ contains no information about whether or not a noncontextual model can describe the experiment.
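The triviality of $R(P)$ can be illustrated with a small simulation. The sketch assumes that each row or column is implemented by a single 4-outcome measurement with outcome $(a, b) \in \{\pm 1\}^2$, and that the third binary outcome in the triple is the signed product of the first two, with sign $-1$ for the third column and $+1$ otherwise. The outcome distribution is drawn at random to stand in for an arbitrary preparation and arbitrary noise; all names are illustrative.

```python
import random

# context name: (parity of the triple product, sign with which it enters R)
CONTEXTS = {
    "R1": (1, 1), "R2": (1, 1), "R3": (1, 1),
    "C1": (1, 1), "C2": (1, 1), "C3": (-1, -1),
}
OUTCOMES = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def random_distribution(rng):
    """An arbitrary probability distribution over the four outcomes (a, b)."""
    weights = [rng.random() for _ in range(4)]
    total = sum(weights)
    return [w / total for w in weights]

def triple_product_expectation(dist, parity):
    """Expectation of a * b * c, where c = parity * a * b by construction."""
    return sum(p * a * b * (parity * a * b) for p, (a, b) in zip(dist, OUTCOMES))

def cabello_R(rng):
    """R(P) for randomly chosen statistics in each of the six contexts."""
    return sum(
        sign * triple_product_expectation(random_distribution(rng), parity)
        for parity, sign in CONTEXTS.values()
    )

rng = random.Random(0)
values = [cabello_R(rng) for _ in range(5)]
print([round(v, 9) for v in values])
```

Every sampled value equals 6 up to floating-point rounding, however the outcome distributions are chosen, which is the sense in which $R(P)$ carries no information about noncontextuality.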
The most significant point of contrast between the proposal of this article and that of [42] is that we assume the notion of universal noncontextuality proposed in [8], rather than the notion of KS-noncontextuality. Because universal noncontextuality, unlike KS-noncontextuality, does not assume outcome determinism, we are led to consider indeterministic noncontextual assignments to the measurements. As we saw above, the fact that there are strictly no deterministic noncontextual assignments respecting the compatibility relations of the operational Peres-Mermin scenario is what makes it futile to attempt to derive constraints on operational statistics from the assumption of such assignments. The assumption is logically ruled out, so an operational test is neither necessary nor conceivable. On the other hand, there are many indeterministic noncontextual assignments that respect the compatibility relations, such as the example provided in equation (78). These, therefore, do impose nontrivial constraints on operational statistics, constraints that are encoded in the noncontextuality inequalities that we have derived.