Twenty years of quantum contextuality at USTC

Quantum contextuality is one of the most perplexing and peculiar features of quantum mechanics. Concisely, it refers to the observation that the result of a single measurement in quantum mechanics depends on the set of joint measurements actually performed. The study of contextuality has a long history at the University of Science and Technology of China (USTC). Here we review the theoretical and experimental advances in this direction achieved at USTC over the last 20 years. We start by introducing the renowned simplest proof of state-independent contextuality. We then present several experimental tests of quantum versus noncontextual theories with photons. Finally, we discuss investigations of the role of contextuality in quantum information science and its application in quantum computation.


I. INTRODUCTION
In less than one hundred years of history, quantum mechanics has profoundly changed human society. The marriage of quantum mechanics and information theory has given birth to the interdisciplinary research field of quantum information science. Thanks to the ongoing "second quantum revolution", which has further improved our ability to manipulate single quantum entities, quantum technology now plays a role in many areas of contemporary science. In particular, quantum computation holds the promise of an enormous increase in computational power [1], and state-of-the-art quantum computers have demonstrated a decisive speed-up over classical computers on specific tasks [2,3,4]. An intriguing observation about quantum computation is that while some behaviors of quantum circuits are particularly hard for supercomputers to reproduce [5,6], others are classically simulable [7,8] and thus do not provide a quantum speed-up. This separation can be traced back to a counter-intuitive phenomenon in the foundations of quantum theory: contextuality (throughout this paper, we omit the qualifier "quantum" before "contextuality" for brevity). Contextuality has become a central concept in modern quantum information science: not only does it engender many quantum paradoxes [9,10,11,12,13,14,15], it also serves as a resource for many quantum information processing tasks. Notably, research in recent years has unraveled the connection between contextuality and universal quantum computation [16,17,18]. The study of contextuality therefore advances both our comprehension of the quantum foundations and the development of future quantum information technology.

* jsxu@ustc.edu.cn † cfli@ustc.edu.cn
Compared with its broad applications and clear significance, the concept of contextuality itself is rather abstract and requires a heavy mathematical background. Historically, the discovery of contextuality was inspired by the debate over the completeness of quantum theory [19] and by the difficulty of reformulating it as a hidden-variable theory.

arXiv:2205.15538v2 [quant-ph] 12 Jun 2022
Kochen and Specker [20] first established that this difficulty arises because a hidden-variable description of quantum measurements must be context-sensitive, so quantum theory cannot be reconciled with noncontextual hidden-variable models. This result is now known to the quantum community as the Kochen-Specker theorem. Among the many theoretical topics in the study of contextuality, a prominent question is how to simplify the proof of the Kochen-Specker theorem so that it uses fewer measurements, becomes more robust against noise, and does not rely on a particular quantum state to manifest contextuality. This theoretical investigation has been fruitful: Yu and Oh [21] proposed an elegant state-independent proof of contextuality that uses the provably minimal number of measurements [22].
The complementary approach to the study of contextuality is to design and implement experiments that directly test the conflict between quantum and classical theories. For this purpose, however, it is not enough to measure the overall probability distribution of a quantum system in several measurement bases; consecutive measurements are almost always necessary to track the evolution of a single quantum system, which places high demands on contextuality experiments. The work by Huang et al. [23] is among the earliest experimental tests of contextuality. Since then, researchers have striven to carry out experiments on various forms of contextuality and to investigate its implications in the broader area of quantum information science. Many of these works chose the linear optics platform [24] as the experimental system because of its capability for high-precision quantum state preparation, transformation and measurement, its long coherence time, and its rich intrinsic degrees of freedom that facilitate complicated quantum operations, all indispensable resources for the experimental study of contextuality.
Over the past twenty years, USTC has witnessed the rapid development of the study of contextuality. The central role of contextuality in quantum information science and the many theoretical and experimental results achieved here are the dual motivations of this review. Because contextuality is a broad topic with a myriad of results and perspectives, we cannot present all the exciting results here. We refer interested readers to [25] for a more comprehensive review of contextuality, and to [26] for advances in contextuality tests. The remainder of this paper is arranged as follows.
In Sec. II, we review the elegant proof of contextuality by Yu and Oh [21], together with the recently developed graph-theoretical approach [27], a systematic method for discovering new proofs of contextuality, and give an example.
In Sec. III, we present several representative experimental tests of contextuality by USTC groups on the linear optics platform, demonstrating the suitability of the photonic architecture as a testbed for quantum foundations. In Sec. IV, we discuss the role of contextuality in quantum foundations, quantum information science and quantum computation based on further experimental works. Finally, in Sec. V, we briefly summarize the results and outline potential developments in this vibrant research field.

II. THEORY: THE KOCHEN-SPECKER THEOREM AND ITS PROOFS
We begin by introducing the Kochen-Specker theorem. It establishes the impossibility of describing quantum measurements in a Hilbert space of dimension $d \geq 3$ with a noncontextual hidden-variable model. Here, the term "noncontextual" indicates that the model's prediction for a measurement outcome depends only on the physical state (possibly supplemented by a hidden variable that can be absorbed into the state) and on the projector defining the projective measurement itself, rather than on the entire set of projectors that forms an orthogonal measurement, which we call the "context". We note that only contextuality with rank-1 projectors is considered here; the recently developed generalized contextuality based on positive operator-valued measures [28] is not discussed in this review.
It is helpful to express the Kochen-Specker theorem in the terminology of quantum measurements. An orthogonal measurement is composed of a set of orthogonal projectors $\hat\Pi = \{\hat\Pi_1, \hat\Pi_2, \ldots, \hat\Pi_d\}$, with $\hat\Pi_i\hat\Pi_j = \hat\Pi_i\delta_{ij}$ and $\sum_{k=1}^{d}\hat\Pi_k = I_d$, where $d$ is the dimension of the Hilbert space spanned by these projectors. When an orthogonal measurement is performed on a quantum state $\rho$, the state evolves into the eigenstate of a randomly selected projector $\hat\Pi_k \in \hat\Pi$ according to Lüders' rule:

$$\rho \mapsto \frac{\hat\Pi_k \rho \hat\Pi_k}{\mathrm{tr}(\rho\hat\Pi_k)}, \tag{1}$$

with a probability given by the Born rule:

$$\Pr(\hat\Pi_k) = \mathrm{tr}(\rho\hat\Pi_k). \tag{2}$$

The randomness of measurement is thus an intrinsic feature of quantum theory: the outcome can be indeterministic even for a pure state $\rho = |\psi\rangle\langle\psi|$, unless $|\psi\rangle$ lies in the range of a single projector $\hat\Pi_k$.
On the other hand, in a noncontextual hidden-variable theory the randomness of a quantum measurement can be attributed to ignorance of the ontic state $\lambda$, which we call the hidden variable. Within this framework, the density matrix is a function of the ontic state, $\rho = \rho(\lambda)$, and the outcome of a projective measurement $\hat\Pi_k$ is instead specified by a binary response function:

$$v_\lambda(\hat\Pi_k) \in \{0, 1\}. \tag{3}$$

To recover the orthogonality between quantum measurements, the supports of the response functions for orthogonal projectors must be disjoint:

$$v_\lambda(\hat\Pi_i)\, v_\lambda(\hat\Pi_j) = 0 \quad \text{for all } \lambda \text{ whenever } \hat\Pi_i\hat\Pi_j = 0. \tag{4}$$

Also, the completeness of quantum measurement requires that a response function consistent with quantum predictions satisfy:

$$\sum_{k=1}^{d} v_\lambda(\hat\Pi_k) = 1. \tag{5}$$

It is clear from the definition that the response function does not depend on the entire measurement context, and the hidden-variable description can recover the marginal distribution of every projective measurement. How does the difference between the quantum and hidden-variable theories manifest as an experimentally testable effect, rather than remaining at a metaphysical level? Many discussions have been devoted to this question.

Table I. [Table entries not recoverable from the source.] The 18 rays in a four-dimensional Hilbert space used by Cabello et al. [30], arranged in nine columns of four. The normalization of the rays is omitted, and $\bar{1}$ denotes $-1$. For example, if the basis is chosen as $\{|i_1\rangle, |i_2\rangle, |i_3\rangle, |i_4\rangle\}$, then the ray $(0, 0, 1, \bar{1})$ corresponds to the state vector $(|i_3\rangle - |i_4\rangle)/\sqrt{2}$. These rays, as directions of projective measurements, reveal the conflict between quantum and noncontextual theories: if the results of the measurements are predetermined before the measurements take place, the outcomes "0" and "1" of a single run of the experiment can be assigned to each ray, and the assignment must be consistent throughout the table [29]. Since every ray appears twice in the table, an even number of table entries must be assigned "1"; however, every column of the table forms an orthonormal basis, so the number of entries assigned "1" must be 9, a contradiction.
By definition, the response function is context-insensitive, and its value for a given projector must be the same for different choices of orthogonal measurements. By Fine's theorem [29], the response function can then be extended to noncommuting projectors, $[\hat\Pi_i, \hat\Pi_j] \neq 0$. Therefore, in a noncontextual hidden-variable model the response function is defined globally: every projective measurement, commuting or not, can be assigned a definite outcome prior to the experiment. The difference between the quantum and hidden-variable theories can then be revealed by showing that such a global response function cannot always preserve the completeness and orthogonality of quantum measurements.
We quote the elegant construction by Cabello et al. [30] to illustrate the impossibility of such a definite value assignment in a real experiment. Consider the rays $|r\rangle$ in Table I, which define the set of projectors $\hat\Pi_r = |r\rangle\langle r|$. We make the following observations:
1. The four rays in each of the nine columns span an orthonormal basis.
2. Each ray appears twice in the entire table.
The first observation indicates that, in every run of the experiment, exactly one of the four response functions corresponding to the four rays in the same column returns 1. Therefore, across the entire Table I, nine entries will be assigned 1. However, the second observation indicates that, if the response function is consistent and context-insensitive, the number of entries assigned 1 must be even, since each ray contributes twice. The contradiction between the two observations shows that no global response function can be defined for the measurement outcomes of the projectors in Table I, and demonstrates the incompatibility between the quantum and hidden-variable descriptions of measurements.
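The parity argument depends only on the incidence pattern of Table I (nine bases, each ray shared by exactly two of them), not on the ray coordinates. Because the table entries are not reproduced here, the minimal Python sketch below uses a hypothetical cyclic incidence pattern with the same counts (it is not the ray set of [30]) and checks by brute force that no 0/1 assignment places exactly one "1" in every column.

```python
from itertools import product


def cyclic_columns(n_columns):
    """Hypothetical incidence pattern: column c holds rays 2c..2c+3 (mod 2n),
    so every ray is shared by exactly two neighbouring columns."""
    n_rays = 2 * n_columns
    columns = [tuple((2 * c + k) % n_rays for k in range(4)) for c in range(n_columns)]
    return columns, n_rays


def find_assignment(columns, n_rays):
    """Return a 0/1 assignment with exactly one '1' in every column, or None."""
    for bits in product((0, 1), repeat=n_rays):
        if all(sum(bits[r] for r in col) == 1 for col in columns):
            return bits
    return None


cols9, rays9 = cyclic_columns(9)  # same counts as Table I: 9 columns, 18 rays
cols8, rays8 = cyclic_columns(8)  # control case: an even number of columns

print(find_assignment(cols9, rays9))              # None: the parity obstruction
print(find_assignment(cols8, rays8) is not None)  # True: an even count is satisfiable
```

With eight columns an assignment exists, which illustrates that the obstruction comes from combining an odd number of bases with every ray appearing twice.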

A. Towards the ultimate simplicity
The simplification of the Kochen-Specker theorem's proofs mirrors the development of the study of contextuality. When Kochen and Specker [20] found the first proof of contextuality, it used 117 rays, which made it extremely hard to comprehend at that time. After almost thirty years of searching, the number was finally reduced to 18 [30], and it can be shown that the number of rays cannot be reduced further if the proof is based on the impossibility of defining the response function [31]. Note, however, that a Kochen-Specker set demonstrating contextuality in a three-dimensional Hilbert space requires no fewer than 22 rays [32], and the well-known construction by Peres [33] uses 33 rays.

Fig. 1 The Yu-Oh 13-ray appearing in the state-independent proof of contextuality by Yu and Oh [21]. Left: the geometric representation of the rays in a unit cube. The rays are defined as: [...]. Right: the orthogonality relationship among the set of rays. Each vertex represents a ray; when two vertices are linked by an edge, the corresponding rays are orthogonal. Figure taken from Reference [21].
Is it possible to derive even simpler proofs of contextuality? In 2012, Yu and Oh [21] published a construction of state-independent contextuality that uses only 13 rays in a three-dimensional quantum system. Subsequently, the construction was shown to be optimal [22], in the sense that the number of rays in a state-independent proof of contextuality cannot be reduced further. The set of rays, now renowned as the Yu-Oh 13-ray, is chosen from the vertices, face centers and edge centers of a cube.
The definition of the rays is shown in Fig. 1, together with a graph depicting the orthogonality relationships between them: the vertices stand for the rays, and an edge signifies that the rays corresponding to the connected vertices are orthogonal. In the graph representation, the restrictions on the response functions, Eq. (4) and Eq. (5), can instead be interpreted as a coloring rule on the vertices: (1) at most one vertex of a pair of connected vertices is colored (assigned "1"); and (2) exactly one vertex in each $d$-clique is colored. If the graph can be colored according to these rules, then a global response function exists. For the Yu-Oh 13-ray, one can easily check that such a coloring exists, so the methodology of previous proofs of contextuality, namely demonstrating the impossibility of defining a response function, does not apply here.
However, herein lies the essence of Yu and Oh's result: a proof of contextuality can be derived even when a global response function exists. This new framework for finding state-independent contextuality goes beyond the previous paradigm of searching for a Kochen-Specker set of rays; here, we paraphrase their reasoning to show how this is accomplished. Firstly, observe that under the coloring rule described in the last paragraph, at most one of the four vertices $h_k$, $k \in \{0, 1, 2, 3\}$, can be colored. This can be shown by the following two arguments by contradiction:
1. Suppose $h_1$ and $h_2$ are both colored; then $y_1^\pm$ and $y_2^\pm$ must be uncolored. By completeness, $z_1$ and $z_2$ must both be colored, but they are connected by an edge and thus cannot both be colored, a contradiction.
2. Suppose $h_0$ and $h_1$ are both colored; then $y_2^\pm$ and $y_3^\pm$ must be uncolored. By completeness, $z_2$ and $z_3$ must both be colored, but they are also connected by an edge, so, again, we reach a contradiction.
By the $C_3$ symmetry of the graph, these arguments apply to any pair of vertices $h_k$. Consequently, the response functions of the four projective measurements must have pairwise disjoint supports, and the total probability of finding a quantum state on the four projectors cannot exceed 1. Secondly, consider the projectors corresponding to the normalized rays $h_k$ in Fig. 1; their sum yields

$$\sum_{k=0}^{3} |h_k\rangle\langle h_k| = \frac{4}{3} I_3,$$

that is, the total probability of finding an arbitrary quantum state on the four projectors is $4/3 > 1$, in stark contrast to the prediction of the noncontextual hidden-variable theory. This completes the proof of contextuality with the Yu-Oh 13-ray. Besides establishing the paradigm of state-independent contextuality beyond Kochen-Specker sets, the Yu-Oh 13-ray also has other merits.
Firstly, as already mentioned above, the proof uses only 13 rays, the minimal number of rays required for a state-independent proof of contextuality [22]. Secondly, the proof works for a three-dimensional indivisible system, the smallest system exhibiting contextuality, and thus has strong universality. Thirdly, the orthogonality relations between the 13 rays imply a state-independent noncontextuality inequality (discussed in Sec. III) even without assuming Eq. (5), the requirement of completeness. The merit is that this noncontextuality inequality is theory-independent and does not rely on any assumptions of quantum mechanics, similar in spirit to the inequalities in Reference [34]. Fourthly, the derived inequality involves only marginal probabilities and two-point correlations between the projectors in Fig. 1, which greatly facilitates its experimental test; in comparison, previously studied state-independent noncontextuality inequalities [34] require measurements of at least three-point correlations. Finally, the methodology of analyzing the response function with the graph representation would soon show its power and be developed into a more general framework, namely the graph-theoretical approach to contextuality, which we discuss next.
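The two steps of the Yu-Oh argument can be checked by brute force. The Python sketch below assumes the coordinates usually quoted for the Yu-Oh 13-ray ($z_k$ along the coordinate axes, $y_k^\pm$ along face diagonals, $h_k$ along body diagonals), since the explicit definitions of Fig. 1 are not reproduced here; with these coordinates the orthogonality graph has exactly the structure used in the two arguments above. It enumerates all $2^{13}$ colorings, keeps those obeying rules (1) and (2), and confirms that valid colorings exist but never color more than one $h_k$, and that the four $h_k$ projectors sum to $\frac{4}{3} I_3$.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed Yu-Oh coordinates (unnormalized integer vectors); they reproduce the
# orthogonality relations used in the two contradiction arguments.
RAYS = {
    "z1": (1, 0, 0), "z2": (0, 1, 0), "z3": (0, 0, 1),
    "y1+": (0, 1, 1), "y1-": (0, 1, -1),
    "y2+": (1, 0, 1), "y2-": (1, 0, -1),
    "y3+": (1, 1, 0), "y3-": (1, -1, 0),
    "h0": (1, 1, 1), "h1": (-1, 1, 1), "h2": (1, -1, 1), "h3": (1, 1, -1),
}
NAMES = list(RAYS)
H = ("h0", "h1", "h2", "h3")


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Edges of the orthogonality graph (exact integer test) and its triangles,
# i.e. the 3-cliques that form orthonormal bases in dimension d = 3.
EDGES = {(a, b) for a, b in combinations(NAMES, 2) if dot(RAYS[a], RAYS[b]) == 0}
TRIANGLES = [t for t in combinations(NAMES, 3)
             if all(p in EDGES for p in combinations(t, 2))]


def valid_colorings():
    """Yield every 0/1 coloring that obeys coloring rules (1) and (2)."""
    for bits in product((0, 1), repeat=len(NAMES)):
        c = dict(zip(NAMES, bits))
        if any(c[a] and c[b] for a, b in EDGES):
            continue  # rule (1): orthogonal rays cannot both be colored
        if all(sum(c[v] for v in t) == 1 for t in TRIANGLES):
            yield c   # rule (2): exactly one colored ray per basis


def projector_sum(names):
    """Exact 3x3 matrix sum of |r><r| / <r|r> over the named rays."""
    m = [[Fraction(0)] * 3 for _ in range(3)]
    for n in names:
        r = RAYS[n]
        for i in range(3):
            for j in range(3):
                m[i][j] += Fraction(r[i] * r[j], dot(r, r))
    return m


colorings = list(valid_colorings())
max_h = max(sum(c[h] for h in H) for c in colorings)
H_SUM = projector_sum(H)
FOUR_THIRDS_I = [[Fraction(4, 3) if i == j else 0 for j in range(3)] for i in range(3)]

print(len(EDGES), len(TRIANGLES), len(colorings), max_h)  # 24 4 24 1
print(H_SUM == FOUR_THIRDS_I)                              # True
```

The first line shows that the graph admits 24 valid colorings, so a global response function exists, yet none colors more than one $h_k$; the second shows the quantum sum $4/3$ that no such coloring can reproduce.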

B. The graph-theoretical approach to contextuality
The graph-theoretical approach to contextuality, developed by Cabello et al. [27], provides a generic method to construct noncontextuality inequalities from the orthogonality relations between the rays corresponding to projective measurements. More specifically, it describes how to compute the strongest correlations allowed by noncontextual hidden-variable theories and by quantum theory for a given set of projective measurements.
We start with the formal definition of the graph of exclusivity, a central concept in this approach; the orthogonality graph in Fig. 1 is already a graph of exclusivity.
Given a set of abstract measurements $\hat\Pi_k$, $k \in \{1, \ldots, n\}$, the corresponding graph of exclusivity is an undirected graph $G = G(V, E)$ whose vertex set $V(G)$ is in one-to-one correspondence with the set of measurements, $|V(G)| = n$, and whose edges connect vertices corresponding to mutually exclusive measurements: $(i, j) \in E(G)$ if and only if $\hat\Pi_i\hat\Pi_j = 0$. For quantum measurements, the abstract measurement operators are just rank-1 projectors, $\hat\Pi_k = |\phi_k\rangle\langle\phi_k|$, so exclusivity means $\langle\phi_i|\phi_j\rangle = 0$. For noncontextual hidden-variable theories, although the measurement operators cannot be written explicitly, the response functions corresponding to mutually exclusive measurements satisfy a very simple relationship:

$$v_\lambda(\hat\Pi_i)\, v_\lambda(\hat\Pi_j) = 0 \quad \forall \lambda,\ \forall (i, j) \in E(G);$$

otherwise, some $\lambda$ would make both measurements return 1, contradicting the requirement of orthogonality. The graph-theoretical approach to contextuality links the maximal quantum and noncontextual correlations to constants of the graph of exclusivity. For noncontextual correlations, it reads

$$\sum_{k=1}^{n} \Pr(\hat\Pi_k) \leq \alpha(G),$$

with $\alpha(\cdot)$ the independence number of a graph, defined as the cardinality of its largest set of mutually nonadjacent vertices. For quantum correlations, it reads

$$\sum_{k=1}^{n} \Pr(\hat\Pi_k) \leq \vartheta(G).$$

Here, $\vartheta(\cdot)$ is the Lovász number, defined as

$$\vartheta(G) = \max \sum_{k=1}^{n} |\langle\psi|\phi_k\rangle|^2,$$

where the maximum is taken over unit vectors $|\psi\rangle$ and $|\phi_k\rangle$ satisfying $\langle\phi_i|\phi_j\rangle = 0$ for all $(i, j) \in E(G)$; note that the rays $|\phi_k\rangle$ can be chosen arbitrarily so long as the exclusivity relations are satisfied. According to Lovász's sandwich theorem [35], the Lovász number is not less than the independence number of the same graph:

$$\alpha(G) \leq \vartheta(G).$$

Therefore, the sum of probabilities allowed by a set of measurements with known orthogonality relations can be efficiently bounded with the graph-theoretic constants of the measurements' graph of exclusivity.
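The two graph constants can be made concrete for the pentagon (the five-vertex cycle, also drawn as a pentagram) mentioned later in this section. The sketch below finds $\alpha = 2$ by exhaustive search and evaluates the Lovász objective for the "umbrella" orthogonal representation commonly used for this graph, which attains $\vartheta = \sqrt{5}$. It evaluates one particular representation rather than solving the semidefinite program, so for a general graph it would only give a lower bound on $\vartheta$.

```python
import math
from itertools import combinations


def independence_number(n, edges):
    """alpha(G) by exhaustive search over vertex subsets (small graphs only)."""
    adjacent = set(edges) | {(j, i) for i, j in edges}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all((a, b) not in adjacent for a, b in combinations(subset, 2)):
                return size
    return 0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Pentagon exclusivity graph: measurement k is exclusive with k - 1 and k + 1.
C5 = [(k, (k + 1) % 5) for k in range(5)]

# Umbrella representation in R^3 with handle psi = (1, 0, 0); cos^2(theta) is
# chosen so that neighbouring rays are orthogonal.
c = math.cos(math.pi / 5)
cos_t = math.sqrt(c / (1 + c))
sin_t = math.sqrt(1 / (1 + c))
phis = [(cos_t, sin_t * math.cos(4 * math.pi * k / 5), sin_t * math.sin(4 * math.pi * k / 5))
        for k in range(5)]
psi = (1.0, 0.0, 0.0)

assert all(abs(dot(phis[i], phis[j])) < 1e-12 for i, j in C5)  # exclusivity respected
alpha = independence_number(5, C5)
quantum_sum = sum(dot(psi, p) ** 2 for p in phis)  # equals theta(C5) = sqrt(5)
print(alpha, round(quantum_sum, 6), round(quantum_sum / alpha, 6))  # 2 2.236068 1.118034
```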
In addition, to test contextuality with realistic quantum measurements and account for imperfect orthogonality, the following noncontextuality inequality [36] has been shown to be tight:

$$\sum_{k \in V(G)} \Pr(1|k) - \sum_{(i,j) \in E(G)} \Pr(1,1|i,j) \leq \alpha(G). \tag{13}$$

Here, $\Pr(1|k) = \Pr(\hat\Pi_k)$ denotes the probability of the measurement $\hat\Pi_k$ returning 1, and $\Pr(1,1|i,j)$ denotes the probability that the measurements of a pair of ideally exclusive projectors $\hat\Pi_i$ and $\hat\Pi_j$ both return 1. Several experimental works [37,38,39,40] have followed this approach. For example, Xiao et al. [38] used the graph of exclusivity of the Yu-Oh 13-ray to derive and test an optimal state-independent noncontextuality inequality; Liu et al. [39] adopted an exclusivity graph originating from a three-setting Bell inequality to demonstrate contextual correlations stronger than Bell nonlocality. Because the graph-theoretical approach requires nothing beyond the exclusivity relations among the measurements, it can be used to discover previously unknown noncontextuality inequalities by devising exclusivity graphs with a large quantum-classical ratio $\vartheta/\alpha$; the quantum states and measurements that maximally violate the inequalities can in turn be found efficiently with semidefinite programming.
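Why does inequality (13) hold for noncontextual models even when exclusivity is imperfect? For a deterministic assignment whose set of outcomes equal to 1 is $S$, the left-hand side equals $|S|$ minus the number of edges inside $S$; deleting one endpoint of each such edge leaves an independent set, so the value never exceeds $\alpha(G)$, and mixtures of deterministic assignments obey the same bound. The brute-force sketch below checks this on the pentagon and on every graph with four vertices; it is an illustration, not the proof given in [36].

```python
from itertools import combinations, product


def lhs_eq13(assignment, edges):
    """Left-hand side of Eq. (13) for a deterministic 0/1 outcome assignment."""
    singles = sum(assignment)
    pairs = sum(assignment[i] * assignment[j] for i, j in edges)
    return singles - pairs


def max_noncontextual(n, edges):
    """Maximum of the left-hand side over all 2^n deterministic assignments,
    including those that violate exclusivity (both ends of an edge set to 1)."""
    return max(lhs_eq13(bits, edges) for bits in product((0, 1), repeat=n))


def independence_number(n, edges):
    """alpha(G): most 1s in an assignment with no edge fully set to 1."""
    return max(sum(bits) for bits in product((0, 1), repeat=n)
               if all(not (bits[i] and bits[j]) for i, j in edges))


C5 = [(k, (k + 1) % 5) for k in range(5)]
print(max_noncontextual(5, C5), independence_number(5, C5))  # 2 2

# Compare both quantities on every one of the 64 graphs with 4 vertices.
pairs = list(combinations(range(4), 2))
mismatches = 0
for mask in range(1 << len(pairs)):
    edges = [p for k, p in enumerate(pairs) if (mask >> k) & 1]
    if max_noncontextual(4, edges) != independence_number(4, edges):
        mismatches += 1
print(mismatches)  # 0
```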
Here, we explain this method in more detail by reviewing the work by Xiao et al. [40]. The authors considered the Platonic graphs, i.e., the skeletons of the five Platonic solids, as graphs of exclusivity. Through an exhaustive search, they found that the skeletons of the dodecahedron and the icosahedron (see Fig. 2(a)) induce meaningful noncontextuality inequalities, i.e., $\vartheta/\alpha > 1$. In particular, the icosahedron graph $G_I$ in Fig. 2(b) has independence number $\alpha(G_I) = 3$ and Lovász number $\vartheta(G_I) = 3(\sqrt{5} - 1)$, resulting in a pronounced quantum-classical ratio $\vartheta/\alpha = \sqrt{5} - 1 \approx 1.236$, larger than that of the pentagram ($\vartheta/\alpha = \sqrt{5}/2 \approx 1.118$), which was previously widely exploited to test contextuality [42,43,44,45]. Using Eq. (13), the noncontextuality inequality can be expressed explicitly as

$$I = \sum_{k \in V(G_I)} \Pr(1|k) - \sum_{(i,j) \in E(G_I)} \Pr(1,1|i,j) \leq 3,$$

and it can be violated up to $I_Q = 3(\sqrt{5} - 1) \approx 3.708$ with quantum measurements. Using semidefinite programming, it was found that the state and measurement rays saturating the quantum maximum can be embedded in a four-dimensional Hilbert space. The large quantum-classical separation and the relatively low requirement on the system dimension make the icosahedron inequality an excellent candidate for experimental tests.
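The independence number $\alpha(G_I) = 3$ can be confirmed by brute force, whereas $\vartheta(G_I)$ requires semidefinite programming, so the sketch below simply takes the value $3(\sqrt{5} - 1)$ quoted above. The icosahedron coordinates (cyclic permutations of $(0, \pm 1, \pm\varphi)$, with $\varphi$ the golden ratio) are a standard geometric choice assumed here only to generate the graph; they are not the measurement rays of [40].

```python
import math
from itertools import combinations, product

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

# The 12 icosahedron vertices: cyclic permutations of (0, +-1, +-PHI).
VERTS = []
for s1, s2 in product((1, -1), repeat=2):
    VERTS += [(0, s1, s2 * PHI), (s1, s2 * PHI, 0), (s2 * PHI, 0, s1)]


def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


# With these coordinates the edge length is 2, so neighbours sit at squared distance 4.
EDGES = [(i, j) for i, j in combinations(range(12), 2)
         if abs(dist2(VERTS[i], VERTS[j]) - 4) < 1e-9]
DEGREES = [sum(v in e for e in EDGES) for v in range(12)]


def independence_number(n, edges):
    """alpha(G) by brute force over all 2^n vertex subsets."""
    return max(sum(bits) for bits in product((0, 1), repeat=n)
               if all(not (bits[i] and bits[j]) for i, j in edges))


alpha = independence_number(12, EDGES)
theta = 3 * (math.sqrt(5) - 1)  # Lovász number quoted in the text, not computed here
ratio = theta / alpha
print(len(EDGES), set(DEGREES), alpha, round(ratio, 3))  # 30 {5} 3 1.236
```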
The noncontextuality inequality associated with the icosahedron graph $G_I$ has another merit: it is pseudo state-independent. Although a set of no more than 12 rays cannot constitute a state-independent proof of contextuality, the icosahedron inequality can be violated by every quantum state except the maximally mixed state, provided that the projective measurements are chosen appropriately according to the input state. This pseudo state-independence stems from the spectrum of the sum of the projectors chosen to saturate the quantum bound: its eigenvalues are $\{3(\sqrt{5} - 1), 5 - \sqrt{5}\}$, the first being the Lovász number $\vartheta(G_I)$ and the second threefold degenerate. For the maximally mixed state $I_4/4$, the inequality evaluates to $I = [3(\sqrt{5} - 1) + 3(5 - \sqrt{5})]/4 = 3$; for any other state, we can choose the projectors so that the nondegenerate eigenvector of their sum is aligned with the dominant eigenvector of the quantum state. By doing so, the inequality is violated by every quantum state other than the maximally mixed state. Furthermore, if we quantify the mixedness of the state by the linear entropy, defined as $\epsilon = 4(1 - \mathrm{tr}\,\rho^2)/3$ for ququart states, it can be shown that the quantum value of the icosahedron inequality is upper bounded by [...]. Therefore, the icosahedron inequality, as a noncontextuality inequality, can also serve as a proxy for estimating the purity of a quantum state.
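The arithmetic behind pseudo state-independence can be checked directly. Writing the sum $S$ of the saturating projectors as $S = \lambda_2 I_4 + (\lambda_1 - \lambda_2)|e\rangle\langle e|$, where $\lambda_1$ and $\lambda_2$ are the two eigenvalues quoted above, the value for a state whose largest eigenvalue $p_{\max}$ belongs to an eigenvector aligned with $|e\rangle$ is $\lambda_2 + (\lambda_1 - \lambda_2) p_{\max}$. This exceeds $\alpha = 3$ exactly when $p_{\max} > 1/4$, i.e., for every state except $I_4/4$. The sketch uses only the two eigenvalues; the actual projectors of [40] are not reproduced.

```python
import math

LAM1 = 3 * (math.sqrt(5) - 1)  # nondegenerate eigenvalue = theta(G_I)
LAM2 = 5 - math.sqrt(5)        # threefold-degenerate eigenvalue


def aligned_value(p_max):
    """tr(rho S) when the nondegenerate eigenvector |e> of S is aligned with the
    eigenvector of rho whose eigenvalue is p_max.

    Uses S = LAM2 * I + (LAM1 - LAM2) |e><e| and tr(rho) = 1."""
    return LAM2 + (LAM1 - LAM2) * p_max


trace_S = LAM1 + 3 * LAM2                # equals 12, one unit per rank-1 projector
threshold = (3 - LAM2) / (LAM1 - LAM2)   # p_max above which I exceeds alpha = 3
print(round(trace_S, 9), round(aligned_value(0.25), 9), round(threshold, 9))  # 12.0 3.0 0.25
print(round(aligned_value(1.0), 3))      # 3.708, the quantum maximum for a pure state
```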

III. EXPERIMENT: PHOTONIC TESTS OF CONTEXTUALITY
In this section, we review recent progress in contextuality tests on the photonic platform. As we elucidate below, experimental tests of contextuality have also developed along two complementary approaches. The first category of experiments simplifies the requirements of contextuality tests by introducing and justifying additional assumptions. At the price of reduced stringency, this approach facilitates tests of a vast family of noncontextuality inequalities. In contrast, the second category of experiments aims to strictly follow the requirements of the theoretical models and closes the experimental loopholes for some celebrated forms of contextuality.
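To further discuss the two paradigms, it is best to start from the seminal experimental work [23], which, despite being qualitative, captured the essence of contextuality. Through this example, even readers not initially familiar with contextuality can quickly grasp the basic concepts of contextuality experiments. The work was based on two simple observations about the maximally entangled two-qubit state $|\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$ in quantum theory [46]. Firstly, as one of the Bell states, it is a common eigenstate of the Pauli-product operators $\sigma^1_x\otimes\sigma^2_x$ and $\sigma^1_z\otimes\sigma^2_z$, both with eigenvalue $+1$. Therefore, the following assertions hold:

$$\langle\sigma^1_x\otimes\sigma^2_x\rangle_{\Phi^+} = 1, \qquad \langle\sigma^1_z\otimes\sigma^2_z\rangle_{\Phi^+} = 1. \tag{15}$$

Here, we have used the shorthand notation $\langle\cdot\rangle_\psi := \langle\psi|\cdot|\psi\rangle$ to denote the expectation value of an operator in a specific quantum state. Secondly, since $(\sigma^1_x\otimes\sigma^2_z)(\sigma^1_z\otimes\sigma^2_x) = \sigma^1_y\otimes\sigma^2_y$ and $|\Phi^+\rangle$ is an eigenstate of $\sigma^1_y\otimes\sigma^2_y$ with eigenvalue $-1$,

$$\langle(\sigma^1_x\otimes\sigma^2_z)(\sigma^1_z\otimes\sigma^2_x)\rangle_{\Phi^+} = -1. \tag{16}$$

Because the two commuting observables $\sigma^1_x\otimes\sigma^2_z$ and $\sigma^1_z\otimes\sigma^2_x$ can be measured jointly, this equivalently means that their measurement results must differ, one being $+1$ and the other $-1$. However, the quantum predictions Eq. (15) and Eq. (16) already exclude a noncontextual hidden-variable description. Indeed, the response functions for the observables in Eq. (15) must satisfy $v(\sigma^1_x\otimes\sigma^2_x) = 1$ and $v(\sigma^1_z\otimes\sigma^2_z) = 1$ (here we omit the dependence on the ontic state $\lambda = \Phi^+$). Since each observable is defined over two qubits, it is also physically plausible to split the bipartite response functions into products of the response functions of the single-qubit observables:

$$v(\sigma^1_a\otimes\sigma^2_b) = v(\sigma^1_a)\,v(\sigma^2_b), \qquad a, b \in \{x, z\}.$$

Splitting and recombining the factors, we arrive at $v(\sigma^1_x\otimes\sigma^2_z)\,v(\sigma^1_z\otimes\sigma^2_x) = v(\sigma^1_x\otimes\sigma^2_x)\,v(\sigma^1_z\otimes\sigma^2_z) = 1$, i.e., $v(\sigma^1_x\otimes\sigma^2_z)$ and $v(\sigma^1_z\otimes\sigma^2_x)$ must be the same, simultaneously $+1$ or $-1$. Therefore, whenever the constraints of Eq. (15) hold, a noncontextual hidden-variable theory predicts the opposite of Eq. (16).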
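A short numerical check of Eqs. (15) and (16) and of the noncontextual counting argument, in plain Python with hand-written Kronecker products so that no external library is assumed. The ordering convention (qubit 1 as the left tensor factor, basis $|00\rangle, |01\rangle, |10\rangle, |11\rangle$) is our choice for the sketch.

```python
from itertools import product

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker product of two 2x2 matrices (qubit 1 is the left factor)."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)] for i in range(n)]


def expect(op, psi):
    """<psi|op|psi> for a real state vector psi."""
    n = len(psi)
    return sum(psi[i] * op[i][j] * psi[j] for i in range(n) for j in range(n))


s = 2 ** -0.5
phi_plus = [s, 0.0, 0.0, s]  # (|00> + |11>)/sqrt(2)

XX, ZZ, XZ, ZX = kron(X, X), kron(Z, Z), kron(X, Z), kron(Z, X)
quantum = (expect(XX, phi_plus), expect(ZZ, phi_plus), expect(matmul(XZ, ZX), phi_plus))
print([round(v, 12) for v in quantum])  # [1.0, 1.0, -1.0]: Eqs. (15) and (16)

# Noncontextual model: fixed +/-1 values for the four single-qubit observables,
# with each product observable assigned the product of its factors' values.
nc_products = {(x1 * z2) * (z1 * x2)
               for x1, z1, x2, z2 in product((1, -1), repeat=4)
               if x1 * x2 == 1 and z1 * z2 == 1}  # constraints of Eq. (15)
print(nc_products)  # {1}: opposite to Eq. (16)
```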
In a pioneering work, Huang et al. [23] reported a direct experimental test of the above arguments. The experimental setup is shown in Fig. 3. The two-qubit state was encoded in the polarization and path states of a single photon: at a polarizing beam splitter (PBS0), photons with vertical polarization $|V\rangle$ were reflected towards PBS2; we denote this path state as $|R\rangle$. Photons with horizontal polarization $|H\rangle$, on the other hand, were transmitted towards PBS1; we label this path state as $|L\rangle$. The observables were defined as $\sigma^1_z = |L\rangle\langle L| - |R\rangle\langle R|$ and $\sigma^2_z = |H\rangle\langle H| - |V\rangle\langle V|$, with the $\sigma_x$ operators defined accordingly. By adjusting the photon's initial polarization state with the half-wave plate HWP0, the maximally entangled path-polarization state $(|HL\rangle + |VR\rangle)/\sqrt{2}$ was created. For the determination of $\sigma^1_z\otimes\sigma^2_z$ and $\sigma^1_x\otimes\sigma^2_x$, the half-wave plates HWP1 and HWP2 were set to 0°. Owing to the high extinction ratio of the polarizing beam splitters, $\langle\sigma^1_z\otimes\sigma^2_z\rangle = +1$ was reasonably assumed. The measurement of $\sigma^1_x$ was then implemented with a Mach-Zehnder interferometer between PBS0 and a balanced beam splitter BS1; photons going towards PBS3 (PBS4) had $\sigma^1_x = +1$ ($-1$). Finally, at each output port of the interferometer, a half-wave plate set at 22.5° together with a polarizing beam splitter realized the measurement of $\sigma^2_x$. For the determination of $\sigma^1_z\otimes\sigma^2_x$ and $\sigma^1_x\otimes\sigma^2_z$, HWP1 (HWP2) was rotated to 22.5° (−67.5°) to apply a Hadamard operation on the polarization state, guaranteeing $\langle\sigma^1_z\otimes\sigma^2_x\rangle = +1$. PBS1 and PBS2 then implemented the measurement of $\sigma^2_z$. At the balanced beam splitters, the path information of the photon, $\sigma^1_x$, was transferred to the polarization degree of freedom and read out again with half-wave plates set at 22.5° and polarizing beam splitters.
Using the setup described above, the terms in Eq. (15) and Eq. (16) can be extracted according to a simple rule: in the ideal experimental setting, quantum theory predicts that all photons arrive at the detectors labeled with odd numbers, while a noncontextual hidden-variable theory predicts that all photons arrive at the detectors labeled with even numbers. Therefore, the statistics of detector clicks directly test the contextuality of quantum theory. Experimentally, 81% of the photons were found at the odd-numbered ports, providing clear evidence for the contextual nature of quantum mechanics. From a modern perspective, we also notice that the experiment suffered from several loopholes: the deduction of the first group of correlations required knowledge from quantum theory; the measurements of the same observable in different contexts used different apparatuses [47]; and, most importantly, no testable noncontextuality inequality was available to check the result and statistically refute noncontextual models. These drawbacks were addressed in later works, which are introduced in the following subsections.

A. Violation of noncontextuality inequality
Having presented the above minimal example, let us now turn to another work, completed 10 years later by some of the same authors, which can be considered a "standard" contextuality experiment. Here, the authors demonstrated a stringent violation of a noncontextuality inequality. As such, the standard analysis of experimental errors applied, and the evidence of contextuality became quantitative. Huang et al. [48] followed the proposal in Reference [49] to test a state-independent noncontextuality inequality based on the Yu-Oh 13-ray. The noncontextuality inequality reads

[...]   (17)

where the observables are defined as $A_k = I_3 - 2|a_k\rangle\langle a_k|$, with $\{|a_k\rangle\} = \{|h_k\rangle, |y_k^\pm\rangle, |z_k\rangle\}$ being the ordered set of the Yu-Oh 13-ray. The coefficients $\Gamma_{i,j}$ are the elements of the adjacency matrix of the Yu-Oh 13-ray's graph of exclusivity $G_{YO}$ shown in Fig. 1, i.e., $\Gamma_{i,j} = 1$ if the rays $|a_i\rangle$ and $|a_j\rangle$ are orthogonal and $\Gamma_{i,j} = 0$ otherwise. Using quantum mechanics, one can calculate that $\langle YO\rangle_\psi = 29/3 > 9$ for any quantum state $|\psi\rangle$, so the noncontextuality inequality (17) is violated state-independently by every quantum state. The noncontextuality inequality (17) involves only marginal probabilities and two-point correlations, so an experimental test needs to extract these probabilities and correlations. However, when a photon is "measured" in the common sense, the single-photon detection process destroys the photon and prevents the registration of two-point correlations. To address this issue, Huang et al. [48] registered the outcome of the first observable in the spatial modes of single photons, so that the second measurement could be implemented with conventional photon counting, making the measurement of two-point correlations possible. The experimental setup, shown in Fig. 4, is based on the beam displacer architecture [50]. A beam displacer is a birefringent calcite crystal with a suitably cut optical axis.
When passing through the beam displacer, the vertical and horizontal polarizations of photons are separated by a fixed distance, causing the path and polarization states of the photons to become entangled.
The setup for measuring $\langle A_iA_j\rangle$ comprised four main stages. Firstly, a beam displacer and two half-wave plates were employed to prepare an arbitrary qutrit state. The state was encoded in the hybrid polarization-path degrees of freedom of the photons, with the three computational basis states defined as $|0\rangle \leftrightarrow |H\rangle|L\rangle$, $|1\rangle \leftrightarrow |V\rangle|L\rangle$, and $|2\rangle \leftrightarrow |V\rangle|R\rangle$, where $|H\rangle$ and $|V\rangle$ denote the horizontal and vertical polarizations of the photon, and $|L\rangle$ and $|R\rangle$ the upper and lower paths, respectively. Secondly, a group of half-wave plates followed by a beam displacer implemented a basis rotation that mapped the $-1$ eigenstate of $A_i$ onto the lower path with vertical polarization. An additional beam displacer and two reflective mirrors then detached this mode from the main setup into an auxiliary setup; from then on, the evolutions in the main and auxiliary setups were made identical. Another group of half-wave plates followed by a beam displacer then reverted the basis rotation and restored the computational basis. Thirdly, a second basis rotation mapped the $-1$ eigenstate of $A_j$ onto a computational basis state. Finally, polarizing beam splitters directed the three computational basis states to photodetectors, and coincidence counting was used to record the event probabilities, from which the expectation values can be recovered as

$$\langle A_iA_j\rangle = P(+,+) + P(-,-) - P(+,-) - P(-,+),$$

where $P(a,b)$ is the probability that $A_i$ yields $a$ and $A_j$ yields $b$. In this way, all the statistics necessary for testing state-independent contextuality can be obtained. The experimental results were close to the predictions of quantum mechanics and clearly demonstrated contextuality: even the maximally mixed state violated the noncontextuality inequality (17) by over 44 standard deviations.
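Two pieces of this analysis can be sketched in Python. First, state independence: assuming the coordinates usually quoted for the Yu-Oh 13-ray (Fig. 1's explicit definitions are not reproduced here), exact arithmetic gives $\sum_k A_k = \frac{13}{3} I_3$ and $\sum_{i,j}\Gamma_{i,j} A_iA_j = -16\, I_3$, so any linear combination of these two terms, whatever coefficients inequality (17) uses, has the same expectation value for every state. Second, a helper that estimates $\langle A_iA_j\rangle$ from outcome-pair counts with the formula above; the counts are hypothetical, and the standard error assumes independent trials.

```python
import math
from fractions import Fraction
from itertools import combinations

# Assumed Yu-Oh coordinates (unnormalized integer vectors).
RAYS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
        (0, 1, 1), (0, 1, -1), (1, 0, 1), (1, 0, -1), (1, 1, 0), (1, -1, 0),
        (1, 1, 1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def madd(a, b):
    return [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(3)) for j in range(3)] for i in range(3)]


def scalar(c):
    """c times the 3x3 identity, with exact Fraction entries."""
    return [[Fraction(c) if i == j else Fraction(0) for j in range(3)] for i in range(3)]


def observable(r):
    """A = I - 2|a><a| for the normalized ray a, in exact arithmetic."""
    n = dot(r, r)
    return [[Fraction(int(i == j)) - Fraction(2 * r[i] * r[j], n) for j in range(3)]
            for i in range(3)]


A = [observable(r) for r in RAYS]
EDGES = [(i, j) for i, j in combinations(range(13), 2) if dot(RAYS[i], RAYS[j]) == 0]

sum_A = scalar(0)
for a in A:
    sum_A = madd(sum_A, a)
sum_AA = scalar(0)  # sum_{i,j} Gamma_ij A_i A_j, i.e. over ordered orthogonal pairs
for i, j in EDGES:
    sum_AA = madd(sum_AA, madd(matmul(A[i], A[j]), matmul(A[j], A[i])))

print(sum_A == scalar(Fraction(13, 3)), sum_AA == scalar(-16))  # True True


def correlator(counts):
    """Estimate <A_i A_j> and its standard error from outcome-pair counts.

    counts maps (a, b), with a, b in {+1, -1}, to the number of recorded events."""
    n = sum(counts.values())
    e = sum(a * b * c for (a, b), c in counts.items()) / n
    return e, math.sqrt((1 - e * e) / n)


# Hypothetical counts, for illustration only.
est, err = correlator({(1, 1): 700, (-1, -1): 100, (1, -1): 100, (-1, 1): 100})
print(round(est, 3), round(err, 4))  # 0.6 0.0253
```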

B. Contextuality as prepare-and-measure experiments
We see from the above example that photonic tests of contextuality rely heavily on interferometry. Arguably, the most significant obstacle for a contextuality test lies in the requirement of implementing sequential measurements. To test a noncontextuality inequality with n-point correlations, the final stage of the interferometer needs to be repeated $2^{n-1}$ times. For example, to test the famous Peres-Mermin square argument of contextuality [51], an experiment needs to record three-point correlations, so the final stage of the interferometer has to be repeated $2^{3-1} = 4$ times [52]. This exponential growth in interferometric complexity severely limits the realization of contextuality tests of even slightly more complicated forms containing multi-point correlations.
The above problem can be partially remedied by the graph-theoretical approach to contextuality, which shows that noncontextuality inequalities can be constructed using at most two-point correlations (see Fig. 5).
However, as the dimensionality of the system increases, even an architecture of two interferometric stages in tandem becomes undesirably cumbersome and introduces significant experimental imperfections. It is therefore worth devising a protocol that tests contextuality using only marginal probabilities, without even two-point correlations. A one-stage interferometer can only measure marginal probabilities, and marginals in quantum theory are governed by the Born rule, which is noncontextual. This objective can therefore only be realized with some additional assumptions, presumably taken from quantum mechanics itself.
Cabello [36] has proposed a method to test any form of graph-theoretical noncontextuality inequality with only marginal probabilities, with the assistance of the Lüders' rule of quantum measurement. With this method, the sequential measurements in a contextuality experiment are replaced by a destructive measurement and a subsequent repreparation of a suitable quantum state. The reprepared state is determined from the outcome of the destructive measurement according to the Lüders' rule. More specifically, the two-point correlation term in Eq. (13) is replaced by the product of two marginal probabilities, $P(1,1\mid i,j) = P(1\mid i)\,P_{|i}(1\mid j)$. The subscript $|i$ indicates that the corresponding probability should be measured on the nondegenerate eigenstate of the first projector, in conformity with the Lüders' rule. Experimentally, the measurement-repreparation procedure in the dashed box of Fig. 5 can be realized in two ways. It can use active feed-forward via electro-optic modulation, or it can be split into two separate runs, so that the first and second probability terms are measured individually, even with the same setup.
With the repreparation procedure, the complicated contextuality experiments can be reduced to the rather basic prepare-and-measure experiments.
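For a rank-1 first projector $P_i = |a_i\rangle\langle a_i|$, the replacement above is exact within quantum mechanics:

$$\langle\psi|P_iP_jP_i|\psi\rangle = |\langle a_i|\psi\rangle|^2\,\langle a_i|P_j|a_i\rangle.$$

The sketch below checks this identity numerically. In the same spirit, it also shows that ignoring the outcome of the first measurement leaves the marginal of the second one unchanged only when the two measurements are compatible. The function names and the qutrit example are our own illustrative choices.

```python
import numpy as np

def lueders_sequential_prob(psi, p_first, p_second):
    """P(1,1|i,j): measure p_first, then p_second, on |psi> with the Lueders update."""
    post = p_first @ psi  # unnormalized post-measurement state for outcome 1
    return float(np.real(np.vdot(post, p_second @ post)))

def prepare_and_measure_prob(psi, a_first, p_second):
    """P(1|i) * P_{|i}(1|j): destructive measurement of |a><a|, then a fresh
    preparation of |a> (the Lueders post-measurement state) measured with p_second."""
    p1 = abs(np.vdot(a_first, psi)) ** 2
    p2 = float(np.real(np.vdot(a_first, p_second @ a_first)))
    return p1 * p2

def marginal_after_first(psi, p_first, p_second):
    """P(1|j) when the first measurement is performed but its outcome is ignored."""
    rho = np.outer(psi, psi.conj())
    q = np.eye(len(psi)) - p_first
    rho_after = p_first @ rho @ p_first + q @ rho @ q
    return float(np.real(np.trace(p_second @ rho_after)))

# Qutrit example with real rays (illustrative choice).
e0, e1 = np.eye(3)[0], np.eye(3)[1]
c = (e0 + e1) / np.sqrt(2)          # not orthogonal to e0, hence incompatible with it
P0, P1, Pc = (np.outer(v, v) for v in (e0, e1, c))
psi = np.ones(3) / np.sqrt(3)

print(lueders_sequential_prob(psi, P0, Pc), prepare_and_measure_prob(psi, e0, Pc))
print(lueders_sequential_prob(psi, P0, P1))   # exclusive outcomes give zero
print(marginal_after_first(psi, P0, P1), marginal_after_first(psi, P0, Pc))
```

For the compatible (here orthogonal) pair, `marginal_after_first(psi, P0, P1)` equals the direct marginal 1/3, so a signaling-type comparison gives zero. For the incompatible pair `P0`, `Pc`, it gives 1/3 instead of the direct value 2/3.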
[Fig. 5 caption] By adopting the graph-theoretical approach to contextuality, the required number of sequential measurements can be reduced to one. Bottom: by assuming the Lüders' rule, the sequential measurement can be substituted by a destructive measurement and a repreparation procedure, thus completely lifting the requirement of sequential measurements from contextuality experiments at the price of some conceptual disadvantages. Figure taken from Reference [36].

Here, we exhibit another contextuality experiment, by Xiao et al. [38], based on the Yu-Oh 13 rays. The inequalities tested in this experiment and in Reference [48] were similar.
However, the experimental setup shown in Fig. 6(a)-(c) was noticeably simpler than that of the other experiment. It contained only two stages, corresponding to the preparation and measurement procedures.
The two-point correlations required in Eq. (13) were calculated with Eq. (19), using two different preparation and measurement procedures in the same setup; the experimental results are given in Fig. 6(d). On a more technical level, this experiment differed from Reference [48] in two further respects. First, the photonic qutrit was encoded entirely in the path degree of freedom, using one more beam displacer. Second, a genuine single-photon source, based on exciting an intrinsic defect in a silicon carbide sample [53], was employed in place of the heralded single-photon source. This eliminated the multi-photon events of the parametric process, which otherwise necessitate additional compensation [43,54,55].
The photonic contextuality experiments reviewed above all rely on the path degree of freedom. Next we show that this degree of freedom is not indispensable for prepare-and-measure-based contextuality tests. Liu et al. [39] realized an experiment to compare the strengths of nonlocal and contextual quantum correlations, in which the contextuality test was based on orbital angular momentum interferometry. The orbital angular momentum of photons spans an infinite-dimensional Hilbert space [56]. It can be manipulated on demand with a phase-only spatial light modulator [57] and detected with the help of single-mode fibers [58]. Prepare-and-measure setups based on orbital angular momentum have decent scalability [59,60,61]. Here, the authors used this degree of freedom to encode a ququart. They compared its violation of a graph-theoretical contextuality inequality with the violation of a Bell inequality by a two-qubit system; the two inequalities share the same graph of exclusivity. A gap of ∆ ≈ 0.3 was observed between the violations of Eq. (13) by the ququart system and by the two-qubit system, confirming quantum contextual correlations beyond nonlocality. Compared with beam displacer arrays, platforms based on structured light could avoid the scaling overhead of manipulating high-dimensional systems. To this end, their accuracy of operation and detection must be further improved, and techniques such as weak-measurement-based wavefront sensing [62,63] may find applications here.
Two potential loopholes come with simplifying contextuality experiments into prepare-and-measure experiments.

Firstly, the Lüders' rule of quantum mechanics is assumed in order to obviate the sequential measurements. Care must therefore be taken to justify this additional assumption, and the experimenter is obliged to demonstrate that the measurement is ideal and follows the predictions of quantum mechanics.

Secondly, the marginal probabilities in a contextuality experiment must themselves be noncontextual. With the procedure described above, the reprepared state may deviate from the eigenstates of the first measurement, so the experimenter is required to explicitly test the compatibility of the two measurements. This can be accomplished by showing that the marginal probability of the second measurement is not affected by the choice of the first measurement. More precisely, signaling factors $\varepsilon_{ij}$ can be defined over the edges of the graph of exclusivity G. They quantify how much the marginal probability of the second measurement depends on the first measurement. In their definition, an underline on the first measurement indicates that its outcome is irrelevant, although the measurement itself (and its associated repreparation procedure) must nevertheless be implemented. An experiment with reliable compatibility relationships should then show vanishing signaling factors overall: $\varepsilon_{ij} \approx 0$ for all $(i, j) \in E(G)$. In Reference [39], the authors reported an average signaling factor of $|\varepsilon| = (0.22 \pm 1.44) \times 10^{-2}$; the details are given in Fig. 6(e). This level of signaling reflects nearly ideal compatibility between successive measurements and thus justifies the assumptions of the simplified contextuality experiment.

Different photonic degrees of freedom, such as polarization, path, and orbital angular momentum, have different advantages. Combining different degrees of freedom in the same experiment allows more complex quantum systems to be encoded, in which more exotic features can be observed. The idea of combining multiple degrees of freedom has become a central theme in the development of photonic quantum information processing and has broad applications.
This advantage comes from the ability to encode more quantum information in a single photon [64,65] and to entangle different photonic degrees of freedom [66,67].
Within the topic of contextuality, measurements corresponding to the rays in a Kochen-Specker set can give rise to "all-versus-nothing" contextuality paradoxes [68], which we discuss next. In different theoretical frameworks these are also known as quantum pseudo-telepathy [69], strong contextuality [70], and perfect Hardy-type paradoxes [71].
Interestingly, such paradoxes were first identified in the scenario of multi-qubit Bell nonlocality and formulated in the language of Pauli observables rather than projective measurements. Experiments involving multiple degrees of freedom can manipulate more qubits with the same number of photons, and are thus more suitable for observing such paradoxes.
Here, we present a concrete example [72] with the four-qubit hyperentangled state $|\xi\rangle = |\Psi^-\rangle^{(12)} \otimes |\Psi^-\rangle^{(34)}$. Here $|\Psi^-\rangle = (|01\rangle - |10\rangle)/\sqrt{2}$ is the singlet state, and the superscripts label the four qubits.

Because $|\Psi^-\rangle$ is a common eigenstate of the Kronecker products of two identical Pauli matrices, the state satisfies the eigenvalue relations of Eq. (21). Next, reminiscent of the entanglement swapping procedure [73], one obtains the relations of Eq. (22). Note that each operator product in the parentheses should be considered a single physical observable. Furthermore, the hyperentangled state is also an eigenstate of the operator in Eq. (23).

Now, all it takes to establish all-versus-nothing contextuality is to show that a global response function cannot be defined for all these operators. Replacing the observables in Eq. (21) through Eq. (23) by their corresponding response functions gives a set of constraints. However, evaluating the product of all these response functions yields the sign opposite to the quantum prediction. Therefore, once the measurement results of the observables in Eq. (21) and Eq. (22) are fixed, quantum theory and noncontextual hidden-variable theories give opposite predictions for the outcome of the final observable in Eq. (23).
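The eigenvalue relations invoked above are easy to verify numerically. The sketch below checks two things:

- The singlet is a −1 eigenstate of $\sigma_k \otimes \sigma_k$.
- $|\xi\rangle$ is a +1 eigenstate of a few products of observables grouped by photon, with qubits (1,3) on one photon and (2,4) on the other.

These particular products are illustrative choices. They are not claimed to be the operators of Eqs. (21) through (23), which are not reproduced here.

```python
from functools import reduce
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron(*ops):
    """Kronecker product of several operators, in qubit order (1, 2, 3, 4)."""
    return reduce(np.kron, ops)

def is_eigen(op, state, value):
    return np.allclose(op @ state, value * state)

singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)  # (|01> - |10>)/sqrt(2)
xi = np.kron(singlet, singlet)                                  # |Psi->^(12) (x) |Psi->^(34)

# sigma_k (x) sigma_k has eigenvalue -1 on the singlet for k = x, y, z.
singlet_checks = [is_eigen(np.kron(P, P), singlet, -1) for P in (X, Y, Z)]

# Products of observables local to one photon (qubits 1, 3) and the other (qubits 2, 4).
# For example, (Z1 X3)(Z2 X4) = (Z1 Z2)(X3 X4) has eigenvalue (-1)(-1) = +1 on xi.
pair_products = {
    "(Z1 X3)(Z2 X4)": kron(Z, Z, X, X),
    "(X1 Z3)(X2 Z4)": kron(X, X, Z, Z),
    "(Y1 Y3)(Y2 Y4)": kron(Y, Y, Y, Y),
}
pair_checks = {name: is_eigen(op, xi, +1) for name, op in pair_products.items()}

print(singlet_checks)   # [True, True, True]
print(pair_checks)
```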
The main theoretical contribution of the above construction [72] lies in the fact that only two observers are needed to demonstrate the paradox: pairing the qubits (1,3) and (2,4) together makes all observables local. Still, the biggest remaining technical challenge for observing such a paradox is the requirement of four-qubit hyperentanglement. It was originally suggested that two pairs of photons carrying polarization singlet states, generated by spontaneous parametric down-conversion [74], should be distributed between the two observers. However, this would require a Bell-state measurement by one observer to herald the detection of the other observer, and multi-photon events would introduce systematic errors even in the limit of vanishing pump power. Fortunately, Chen et al. [75] soon pointed out that a single pair of photons already suffices to encode two singlet states. The trick is to use the path degree of freedom to encode an additional singlet state. This is done by creating two possible output paths for the parametric photons via two identical down-conversion processes.
We follow the experimental work by Yang et al. [76] to expound the idea of dual path-polarization encoding for demonstrating all-versus-nothing contextuality. The experimental setup is shown in Fig. 8. A β-barium borate nonlinear crystal was pumped by an ultraviolet beam, so that spontaneous parametric down-conversion could generate a pair of infrared photons with the entangled polarization state $|\Psi^-\rangle_{\text{pol.}} = (|HV\rangle - |VH\rangle)/\sqrt{2}$.
The pump beam was then reflected by a mirror to pass through the nonlinear crystal again, enabling a second down-conversion process. The generated photon pairs were distributed between the two observers. Because the pair could originate from either pass, the path states of the two photons became entangled as well. In this manner, the four-qubit hyperentangled state was encoded on two photons. The polarization and path states of the two photons were taken as qubits (1,2) and (3,4), respectively, and the map between the computational basis and the physical states reads $|H\rangle_{\text{pol.}} \leftrightarrow |0\rangle \leftrightarrow |L\rangle_{\text{path}}$, $|V\rangle_{\text{pol.}} \leftrightarrow |1\rangle \leftrightarrow |R\rangle_{\text{path}}$.
Once the map between the optical qubits and the mathematical model had been established, observing all-versus-nothing contextuality boiled down to certifying the constraints in Eqs. (21) and (22) and testing the product of the expectations in Eq. (23). The authors of Reference [76] realized these measurements with path interferometers and polarization analysis systems, illustrated in Fig. 8. Concretely:

- "Apparatus a" measures the path observable $\sigma_z^{\text{path}}$ together with an arbitrary polarization observable, here chosen from $\{\sigma_x^{\text{pol.}}, \sigma_z^{\text{pol.}}\}$.
- "Apparatus b" measures the path observable $\sigma_x^{\text{path}}$, also together with an arbitrary polarization observable.
- "Apparatus c" performs a joint path-polarization measurement, in which the polarizing beam splitter couples the two degrees of freedom.

Without the half-wave plates before the polarizing beam splitter, a photodetection in "Apparatus c" indicates that the photon was in one of the Bell states. These are the common eigenstates of $\sigma_x^{\text{pol.}} \otimes \sigma_x^{\text{path}}$ and $\sigma_z^{\text{pol.}} \otimes \sigma_z^{\text{path}}$ [77]. By adding the half-wave plates before the polarizing beam splitter, the observables $\sigma_z^{\text{pol.}} \otimes \sigma_x^{\text{path}}$ and $\sigma_x^{\text{pol.}} \otimes \sigma_z^{\text{path}}$ can also be measured simultaneously.

In this way, all the probabilities constituting the observables in Eq. (21) through Eq. (23) can be registered. The experimental results for testing Eq. (23) are given in Fig. 8, together with the predictions of noncontextual hidden-variable theory and quantum theory. The results agreed with quantum theory and sharply contradicted the axiom of noncontextuality, without any inequality being needed to manifest the contradiction.
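A few lines of linear algebra show why a single detection in "Apparatus c" yields two observables at once. The sketch treats the polarization and path of one photon as two qubits and does two things:

- It verifies that each Bell state carries a distinct pair of eigenvalues of $\sigma_x\otimes\sigma_x$ and $\sigma_z\otimes\sigma_z$.
- It models the added half-wave plates as a Hadamard on the polarization qubit. This is an idealization we assume here. It maps $\sigma_z^{\text{pol.}}\otimes\sigma_x^{\text{path}}$ and $\sigma_x^{\text{pol.}}\otimes\sigma_z^{\text{path}}$ onto the same Bell-basis readout.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # idealized half-wave plate

# Tensor order: (polarization, path), with |0> = H or L and |1> = V or R.
s = 1 / np.sqrt(2)
bell = {
    "Phi+": np.array([s, 0, 0, s]), "Phi-": np.array([s, 0, 0, -s]),
    "Psi+": np.array([0, s, s, 0]), "Psi-": np.array([0, s, -s, 0]),
}

def eigenvalue(op, state):
    """Return the (rounded, real) eigenvalue if `state` is an eigenvector of `op`, else None."""
    lam = np.vdot(state, op @ state)
    return round(lam.real) if np.allclose(op @ state, lam * state) else None

XX, ZZ = np.kron(X, X), np.kron(Z, Z)
ZX, XZ = np.kron(Z, X), np.kron(X, Z)
rotate = np.kron(H, I2)   # wave plate acting on the polarization qubit only

for name, b in bell.items():
    rotated = rotate @ b
    print(name, eigenvalue(XX, b), eigenvalue(ZZ, b),
          eigenvalue(ZX, rotated), eigenvalue(XZ, rotated))
```

Since the four Bell states carry four distinct eigenvalue pairs, the detector that clicks reveals both eigenvalues. After the polarization rotation, the same readout returns the eigenvalues of $\sigma_z\otimes\sigma_x$ and $\sigma_x\otimes\sigma_z$.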

D. Loophole-free tests of contextuality
Having introduced the contextuality tests above, we are now in a position to consider to what extent they serve to prove that Nature is contextual. If a contextuality test has significant loopholes, the observed statistics refuting noncontextuality may actually be due to a loophole, which undermines the argument for contextuality. Developing a loophole-free test of contextuality therefore provides deeper insight into this question.

In photonic tests of contextuality, the imperfect single-photon detection efficiency means that some photons passing through the setup are not registered. In the most adverse scenario, all these unregistered events would have decreased the violation of the noncontextuality inequality. The observed phenomena could then instead result from biased detection combined with an underlying physical law that is noncontextual [78]. This constitutes the so-called detection loophole. To close it, either carriers of quantum information other than photons have to be chosen [79], or high-efficiency superconducting nanowire single-photon detectors must be employed [80,81]. If the loophole is left open, the experimenter is obliged to accept the "fair sampling" assumption, namely that the detector does not postselect the photons in a way that alters the statistics that should be observed.
The other, less contrived loophole in contextuality experiments originates from imperfect compatibility between ideally orthogonal measurements. The theory of contextuality hinges on the definition of measurement compatibility. It has been argued that, without perfect compatibility and infinite measurement precision, all experimental evidence supporting contextuality would be nullified [82,83]. This loophole can be addressed in two ways. Either a generalized definition of noncontextuality that takes the imperfection of compatibility into account can be adopted [84,85,86], or noncontextuality inequalities can be derived without using any sequential measurements on a single quantum system [87]. For the latter method, measurements on pairs of distant quantum systems facilitate the derivation of compatibility-loophole-free noncontextuality inequalities. Measurements in spacelike-separated regions are perfectly compatible, and the absence of disturbance between them is guaranteed by Einstein's special relativity.
Hu et al. [88] realized an optical test of such compatibility-loophole-free contextuality with a pair of entangled qutrits.
The noncontextuality inequality tested in this experiment, inequality (26), is based solely on conditional probabilities of measurements on the two distant systems [87]. The inequality is violated by two maximally entangled qutrits: choosing the quantum state $(|00\rangle - |11\rangle + |22\rangle)/\sqrt{3}$ gives $B = 1/9 > 0$. Interestingly, the measurement setting of one of the observers in inequality (26) is fixed. Inequality (26) therefore cannot be considered a test of nonlocality, although its form is reminiscent of the probabilistic forms of Bell inequalities. Instead, it must be interpreted as a test of contextuality with distant quantum systems.
Experimentally, the most challenging task in observing the violation of Eq. (26) is the preparation of high-quality qutrit entanglement. Here, the authors realized a spontaneous parametric down-conversion array to address this problem.
Stemming from this work, the spontaneous parametric down-conversion array architecture has become the recent paradigm of high-dimensional entanglement generation [89,90]. The experimental setup is depicted in Fig. 9. First, a beam displacer array evenly distributed the pump beam into three paths. Then, a β-barium borate crystal was pumped simultaneously by the three beams, so that a pair of parametric photons could be emitted from any of the three incident points and distributed between the two observers.
The propagation angles of the three possible paths of the emitted photons were identical. Exploiting this parallelism, both receivers again implemented measurements of the photonic path with beam displacer arrays. Because of the different wavelengths of the pump and parametric photons, the lengths of the beam displacers for preparation and measurement differed slightly. The authors reported an experimental value of B = 0.095 ± 0.003, violating inequality (26) by 31 standard deviations and providing a strong compatibility-loophole-free test of contextuality.
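As a quick consistency check of the quoted significance, take the noncontextual bound of inequality (26) to be B ≤ 0. This bound is implied by the statement that B = 1/9 > 0 constitutes a violation.

```latex
\frac{B_{\mathrm{exp}} - 0}{\Delta B} = \frac{0.095}{0.003} \approx 31.7,
\qquad
\frac{B_{\mathrm{exp}}}{B_{\mathrm{QM}}} = \frac{0.095}{1/9} \approx 0.86 .
```

The reported value therefore lies about 31 standard deviations above the bound and reaches roughly 86% of the ideal quantum value.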

IV. ADVANCES AND APPLICATIONS
In this section, we switch our focus to the applications of contextuality. As briefly mentioned in the introductory paragraphs, contextuality has found broad applications in quantum information science, including quantum cryptography [91,92], quantum communication [93,94], randomness expansion [95,96], self-testing [97], and dimension witnessing [98,99]. Here, we discuss only two advances in detail. The first is how contextuality is related to universal quantum computation. The second is how contextuality activates nonlocality, so that these two resources, for quantum computation and for quantum communication respectively, can be interconverted.

A. Towards universal quantum computation
Many approaches to quantum computation have been proposed in pursuit of computing power beyond that of classical supercomputers.
However, the computing power of current noisy intermediate-scale quantum [100] circuits is severely limited by omnipresent noise, which degrades the quality of computation. If the computational accuracy falls below a certain level, the quantum advantage over classical computers vanishes. Fortunately, the situation can be radically overturned if the noise of the quantum circuit is suppressed below a critical level [101]. In this case, a properly designed error-correcting code suffices to asymptotically suppress any residual noise.
The noise in a quantum circuit may occur either in the state preparation stage or during the transformations induced by quantum gates. If the transformations are made noiseless, errors from state preparation will not propagate, and the quantum computation becomes accurate. Although it is not practical to make all transformations noiseless, it is possible for certain subsets of them. For example, braiding non-abelian anyons effectively implements noiseless Clifford gates on the encoded quantum information [102]. With these ideal Clifford gates, only one ideal non-Clifford gate is required to achieve universal, fault-tolerant quantum computation [7]. As a means of obtaining such an ideal non-Clifford gate, Bravyi and Kitaev [103] proposed a subroutine now known as magic state distillation. The subroutine is based on the observation that a non-Clifford gate can be emulated by a controlled Clifford gate, an ancillary quantum system and postselection. The ancilla starts in a "magic state", which lies away from the eigenstates of all Clifford operators. Magic state distillation allows asymptotically ideal magic states to be prepared from noisy magic states and Clifford quantum gates. However, it also shows a threshold behavior: the subroutine increases the fidelity of the noisy magic states only when they already have sufficient fidelity with the ideal state. A question naturally arises: what intrinsic property of a quantum system decides its usefulness for magic state distillation?
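The emulation of a non-Clifford gate by a magic state, a Clifford circuit, and a measurement can be made concrete in the simplest qubit setting. The sketch below uses the standard T-gate injection circuit. This qubit circuit is our choice of illustration; the qutrit construction relevant to inequality (32) is analogous but not shown. Postselecting the ancilla on outcome 0 yields $T|\psi\rangle$ directly, while outcome 1 can be corrected with a Clifford S gate.

```python
import numpy as np

w8 = np.exp(1j * np.pi / 4)
T = np.diag([1, w8])                           # non-Clifford T gate
S = np.diag([1, 1j])                           # Clifford phase gate (correction)
magic = np.array([1, w8]) / np.sqrt(2)         # |A> = T|+>, a qubit magic state
CNOT = np.array([[1, 0, 0, 0],                 # control: data qubit (first factor)
                 [0, 1, 0, 0],                 # target: ancilla (second factor)
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def inject_t(psi):
    """Apply T to |psi> using one magic state, a CNOT, and a Z measurement of the ancilla.

    Returns {outcome: (probability, corrected data state)}. Outcome 0 already gives
    T|psi>; outcome 1 gives T^dagger|psi>, which S maps to T|psi> up to a global phase.
    """
    joint = (CNOT @ np.kron(psi, magic)).reshape(2, 2)   # indices: [data, ancilla]
    branches = {}
    for outcome in (0, 1):
        data = joint[:, outcome]
        prob = float(np.vdot(data, data).real)
        data = data / np.sqrt(prob)
        if outcome == 1:
            data = S @ data
        branches[outcome] = (prob, data)
    return branches

def fidelity(a, b):
    return abs(np.vdot(a, b)) ** 2

psi = np.array([0.6, 0.8j])
for outcome, (prob, out) in inject_t(psi).items():
    print(outcome, round(prob, 3), round(fidelity(T @ psi, out), 6))
```

In this sketch both outcomes occur with probability 1/2. Keeping only outcome 0 corresponds to the postselection mentioned above, at the cost of discarding half of the runs.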
Howard et al. [16] showed that the decisive property of a quantum state for quantum computation is precisely contextuality. They proved that all quantum states useful for magic state distillation violate a noncontextuality inequality constructed with Clifford operators. For a single quantum system, violation of such a magic noncontextuality inequality is equivalent to negativity of the discrete Wigner function [104]. Here, we explicitly give the form of the magic noncontextuality inequality for a qutrit system as an example.

We start from the definition of the Weyl-Heisenberg displacement operators $D_{x,z}$, built from the three-dimensional shift and clock matrices τ and σ. These are analogous to the Pauli matrices $\sigma_x$ and $\sigma_z$ in the two-dimensional case. The displacement operators have the spectrum $\{1, \omega, \omega^2\}$, with $\omega = e^{2\pi i/3}$. We then denote the list of displacement operators $D = \{D_{0,1}, D_{1,0}, D_{1,1}, D_{1,2}\}$, whose eigenstates form a complete set of mutually unbiased bases. From these we construct a set of magic contextuality witnessing operators, defined as follows:

- $\Pi_j^{r_j}$ is the projector onto the eigenstate with eigenvalue $\omega^{r_j}$ of the j-th element of D.
- The vector $\mathbf r$ is defined as $\mathbf r = x\mathbf a + z\mathbf b$, with $\mathbf a = (1, 0, 1, 2)$, $\mathbf b = -(0, 1, 1, 1)$ and $x, z \in \{0, 1, 2\}$.

Using this notation, the magic noncontextuality inequality can be stated as inequality (32). Magic states can violate it up to the inverse of the golden ratio, $(\sqrt{5}-1)/2$.
The violation of inequality (32) quantifies the efficacy of a quantum state for implementing ancilla-based non-Clifford gates:

- A perfect magic state maximally violates inequality (32), and with it a noiseless non-Clifford gate can be executed without any distillation.
- If the inequality is violated non-maximally, some rounds of magic state distillation are needed to suppress the noise of the non-Clifford gates below threshold [103].
- If the inequality is not violated, the computation becomes classically simulable, and the quantum advantage vanishes.
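Two algebraic facts used above can be checked directly: the four displacement operators have spectrum {1, ω, ω²}, and their eigenbases are mutually unbiased. The sketch below makes three choices:

- It takes τ as the shift and σ as the clock matrix.
- It drops the phase prefactor of $D_{x,z}$, whose convention is not reproduced in this section. A prefactor that is a power of ω changes neither the eigenbases nor the set of eigenvalues.
- It does not implement the witnessing operators or inequality (32) themselves.

```python
import numpy as np

w = np.exp(2j * np.pi / 3)
tau = np.roll(np.eye(3), 1, axis=0)       # shift: |k> -> |k+1 mod 3>
sigma = np.diag([1, w, w**2])             # clock: |k> -> w^k |k>

def displacement(x, z):
    """D_{x,z} up to a phase (phase conventions vary and are dropped here)."""
    return np.linalg.matrix_power(tau, x) @ np.linalg.matrix_power(sigma, z)

D = [displacement(0, 1), displacement(1, 0), displacement(1, 1), displacement(1, 2)]

def has_cube_root_spectrum(op):
    """True if the eigenvalues of `op` are exactly 1, w and w^2 (numerically)."""
    vals = np.linalg.eigvals(op)
    return all(min(abs(vals - t)) < 1e-9 for t in (1, w, w**2))

# Eigenbases: columns of the eigenvector matrices (numpy normalizes them).
bases = [np.linalg.eig(op)[1] for op in D]

def unbiased(b1, b2):
    """|<u|v>|^2 = 1/3 for every u in b1 and v in b2."""
    return np.allclose(np.abs(b1.conj().T @ b2) ** 2, 1 / 3)

spectra_ok = all(has_cube_root_spectrum(op) for op in D)
mub_ok = all(unbiased(bases[i], bases[j]) for i in range(4) for j in range(i + 1, 4))
print(spectra_ok, mub_ok)
```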

Contextuality plays a twofold role in quantum computation based on non-abelian anyons. Firstly, the resource of magic, measured by the violation of inequality (32), is invariant under Clifford gates [104]. The usefulness of a specific quantum state for magic state distillation is therefore unaffected by braiding operations.
Consequently, high-fidelity non-Clifford gates obtained via magic state distillation can be executed at any point of a compiled quantum circuit. Secondly, any local noise emerging during a topological computation is exponentially suppressed by the excitation gap [105], so the resource of magic should also be protected by the system topology. Hence, access to arbitrary braiding operations and an unlimited supply of imperfect magic states already enables fault-tolerant universal quantum computation. Nonetheless, the realization of anyons in physical systems is still in its infancy [106,107] and faces many technical challenges. Furthermore, observing their non-abelian statistics remains highly challenging.
Taking an alternative approach, Liu et al. [108] studied the non-abelian statistics of anyons and their application in quantum computation with a dedicated photonic quantum simulator [109,110]. The authors studied a one-dimensional chain of $\mathbb{Z}_3$ parafermions (a type of non-abelian anyon). They mapped the parafermionic chain to the state space of spin-1 bosons through the Fradkin-Kadanoff transformation [111].
With appropriately chosen interaction parameters, a pair of parafermion edge zero modes emerges at the ends of the chain. A topologically protected qutrit, immune to any local noise, can be encoded on these modes.
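The Fradkin-Kadanoff transformation can be sketched numerically. We assume one common convention, in which each parafermion is a clock operator dressed by a string of shift operators on the sites to its left. The phase prefactors and site labeling used in [108] may differ. The check confirms the defining $\mathbb{Z}_3$ parafermion relations, $\gamma^3 = 1$ and $\gamma_a\gamma_b = \omega\gamma_b\gamma_a$ for $a<b$, on a three-site chain.

```python
from functools import reduce
import numpy as np

w = np.exp(2j * np.pi / 3)
tau = np.roll(np.eye(3), 1, axis=0)    # shift
sigma = np.diag([1, w, w**2])          # clock
eye3 = np.eye(3)

def site_op(op, site, n_sites):
    """Embed a single-qutrit operator at `site` of an n-site chain."""
    return reduce(np.kron, [op if k == site else eye3 for k in range(n_sites)])

def parafermions(n_sites):
    """One Fradkin-Kadanoff-type convention (phase factors vary in the literature):
    gamma_{2j-1} = (prod_{k<j} tau_k) sigma_j,  gamma_{2j} = (prod_{k<j} tau_k) sigma_j tau_j.
    """
    string = np.eye(3 ** n_sites, dtype=complex)
    gammas = []
    for j in range(n_sites):
        s_j, t_j = site_op(sigma, j, n_sites), site_op(tau, j, n_sites)
        gammas.append(string @ s_j)
        gammas.append(string @ s_j @ t_j)
        string = string @ t_j          # extend the nonlocal string past site j
    return gammas

g = parafermions(3)                    # 6 parafermion operators on 3 qutrits
ident = np.eye(27)
cubes_ok = all(np.allclose(np.linalg.matrix_power(x, 3), ident) for x in g)
# Parafermionic exchange: gamma_a gamma_b = w gamma_b gamma_a for a < b.
exchange_ok = all(np.allclose(g[a] @ g[b], w * g[b] @ g[a])
                  for a in range(len(g)) for b in range(a + 1, len(g)))
print(cubes_ok, exchange_ok)
```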
To elucidate the roles of contextuality in topological quantum computation, Reference [108] directly tested the dynamics of magic contextuality under braiding evolution and local noise. Tuning the interaction Hamiltonian of the parafermionic chain drives the parafermion edge zero modes through the chain, which induces the braiding evolution and topologically protected gate operations.
Experimentally, the system Hamiltonian H was modulated with beam displacer arrays and polarizing beam splitters. These dissipative elements implemented a discrete imaginary-time evolution $e^{-Ht}$, $t \to +\infty$, which projects the encoded wavefunction onto the ground state of H. The geometric phase inducing the braiding evolution was preserved despite the discreteness of the evolution [112]. This correspondence was originally exploited in Reference [113] to optically simulate the geometric phases induced by braiding Majorana zero modes. Local noise, on the other hand, was introduced in two steps. The optical wavefunction was first repopulated according to the form of the anyonic noise, again translated via the Fradkin-Kadanoff transformation. The modes corresponding to excited states were then dissipated. We note that this ground-state generation method effectively implements a non-Hermitian Hamiltonian $\tilde{H} = -iH$. It can therefore also find applications in the investigation of non-Hermitian physics, e.g., in (anti-)parity-time-symmetric systems [114,115,116,117]. Moreover, it remains effective even for an unknown H given as a controlled oracle with t → ∞; in this setting, the process is otherwise known as algorithmic cooling [118].
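The projection by imaginary-time evolution can be illustrated on a toy Hamiltonian. The sketch assumes a Hermitian H with a nondegenerate ground state and an initial state with nonzero ground-state overlap. It computes $e^{-Ht}$ by diagonalization rather than by modeling the optical elements, and the 3×3 matrix is an arbitrary example. With $\tilde H = -iH$ one has $e^{-i\tilde H t} = e^{-Ht}$, which is the sense in which the scheme implements a non-Hermitian Hamiltonian.

```python
import numpy as np

def imaginary_time_project(H, psi0, t):
    """Return e^{-Ht}|psi0> / ||e^{-Ht}|psi0>|| for Hermitian H.

    Energies are shifted by the ground energy before exponentiating, which only
    changes the (discarded) normalization and avoids overflow for large t.
    """
    energies, vecs = np.linalg.eigh(H)          # ascending energies
    weights = np.exp(-(energies - energies[0]) * t)
    psi = vecs @ (weights * (vecs.conj().T @ psi0))
    return psi / np.linalg.norm(psi)

# Illustrative 3-level Hamiltonian with a nondegenerate ground state (our choice).
H = np.array([[0.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 2.0]])
psi0 = np.array([1.0, 0.0, 0.0])
ground = np.linalg.eigh(H)[1][:, 0]
for t in (0.5, 2.0, 20.0):
    psi_t = imaginary_time_project(H, psi0, t)
    print(t, abs(np.vdot(ground, psi_t)) ** 2)   # ground-state fidelity grows toward 1
```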
To demonstrate the invariance of the resource of magic under Clifford operations, the authors used the discrete imaginary-time evolution gates to implement an analogous braiding of photonic modes. This generated a phase gate, so that one of the computational basis states of the qutrit acquired an additional 2π/3 geometric phase. The effect of the analogous gate operation is best seen in Fig. 10(a), where the sample states orbited the Bloch sphere by roughly 120° after braiding. Quantum process tomography [119,120] of the braiding evolution, shown in Fig. 10(b), also confirmed this effect, with a process fidelity of 93.4% relative to the theoretical process. Next, the left-hand-side values of the magic noncontextuality inequality were directly measured for the nine sample states before and after braiding. The effect of braiding on the contextuality observations is most intuitively seen in Fig. 10(c), where some of the observations' expectations were permuted. The final measure of magic contextuality is defined as the maximum over the contextuality observations, so it is not affected by permutations among them. Accordingly, apart from some experimental imperfections, the resource of magic (see Fig. 10(d)) was almost unchanged by the braiding process.
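The qutrit phase gate alone illustrates why braiding leaves the magic resource unchanged. Assume, for illustration, that the extra 2π/3 phase is acquired by |2⟩. The sketch shows that this gate maps the displacement operators of D (phases dropped) onto one another. It therefore permutes their eigenprojectors, and with them the projector expectation values of any state. This is a simplified check at the level of projectors, not a reimplementation of the magic contextuality measure of [108].

```python
import numpy as np

w = np.exp(2j * np.pi / 3)
tau = np.roll(np.eye(3), 1, axis=0)        # shift (X-like)
sigma = np.diag([1, w, w**2])              # clock (Z-like)
phase = np.diag([1, 1, w])                 # extra 2*pi/3 phase on |2> (assumed basis state)

def conj(U, A):
    return U @ A @ U.conj().T

D = {"Z": sigma, "X": tau, "XZ": tau @ sigma, "XZ^2": tau @ sigma @ sigma}
# The phase gate maps each displacement operator to another one in the list:
# Z -> Z, X -> XZ, XZ -> XZ^2, XZ^2 -> X, i.e. it permutes the list (a Clifford action).
image = {name: next(m for m, B in D.items() if np.allclose(conj(phase, A), B))
         for name, A in D.items()}
print(image)

# Consequence: the 12 eigenprojector expectation values of any state are only permuted.
projectors = [np.outer(v, v.conj()) for A in D.values() for v in np.linalg.eig(A)[1].T]
rng = np.random.default_rng(7)
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)
before = sorted(np.vdot(psi, P @ psi).real for P in projectors)
after_psi = phase @ psi
after = sorted(np.vdot(after_psi, P @ after_psi).real for P in projectors)
print(np.allclose(before, after))
```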
Regarding the resilience of magic contextuality against local noise, the authors also used quantum process tomography to characterize the effect of a local hopping noise in the parafermion system. After the noise-induced incoherent error was applied with a given probability, the system was projected back into the ground-state subspace. The resulting process matrix, shown in Fig. 10(e), was almost an identity matrix. Next, the magic contextuality value of a quantum state close to a magic state was measured before and after the disturbance-dissipation process. Apart from some probability amplitude damping, the quantum state was barely affected by the local noise, and neither was its degree of contextuality. This held even in the limit of large error probability: as can be seen in Fig. 10(f), a value of M = 0.580 ± 0.013 was observed even at an error probability of p = 99%. For comparison, the authors also investigated the effect of noise on a trivial qutrit without topological protection. In this scenario, the dissipation process was not implemented, and the noise quickly destroyed the resource of magic contextuality.

B. Activation of nonlocality from contextuality
Contextuality and nonlocality in quantum mechanics are deeply connected in three ways. First, both are indispensable resources for quantum information tasks. Second, nonlocality can be seen as a special form of contextuality [121], and every noncontextuality inequality can be converted into a Bell inequality [122]. Third, the abilities of a single quantum system to manifest the most elementary forms of contextuality [42] and nonlocality [123] obey a trade-off relationship [124,125].
The final observation raises a fundamental question: can single-particle contextuality and two-party nonlocality coexist? The answer is trivially positive if we choose to observe state-independent contextuality on one of the two particles constituting a high-dimensional entangled state. However, the question becomes highly intriguing if the observed contextuality turns a quantum correlation that initially cannot violate a Bell inequality into a nonlocal one. To this end, experimental proposals have been put forward based on two pairs of hyperentangled qubits [126] and on a pair of maximally entangled qutrits [49].
Liu et al. [127] and Hu et al. [128] realized the proposals of References [49,126] with a path-polarization hyperentangled state and a high-dimensional entangled photonic state. The two experiments shared a similar conceptual basis; here we choose Reference [127] to exemplify the underlying idea. The objective was to test a local hidden-variable inequality, inequality (33), whose violation signifies nonlocality. It involves two quantities: χ measures the degree of contextuality, and S the strength of bipartite correlations. Explicitly, χ is a sum of expectation values of single-particle sequential measurements. S, on the other hand, is a sum of bipartite correlations. In S, a prime superscript indicates that the operator acts on a different quantum system, and the subscripts specify the measurement contexts of the un-primed observables. The overall experimental schematic is shown in Fig. 11(a). Crucially, a meaningful Bell inequality cannot be composed from the quantities in S alone. All the correlation terms have a plus sign, and the local bound 12 already saturates the algebraic maximum.
The situation changes only when the effect of contextuality is taken into consideration. Eq. (33) is then a genuine Bell inequality, which can be violated up to 18, again its algebraic maximum, using a pair of singlet states $|\xi\rangle = |\Psi^-\rangle_{\text{pol.}} \otimes |\Psi^-\rangle_{\text{path}}$. To test inequality (33) experimentally, a photonic hyperentanglement source was employed in the first place, based on a technique similar to that already discussed in Sec. III. To achieve the maximal quantum value, the observables should be chosen as those appearing in the Peres-Mermin square [51], with the primed observables chosen identically to the un-primed ones. An experimenter needs to implement sequential measurements to extract the six correlations appearing in χ. The measurement apparatuses are shown in Fig. 11(b). In short, each apparatus moved the +1 and −1 eigenstates of the observable of interest to the upper and lower paths, respectively.
By cascading these apparatuses, the expectation values of the correlations appearing in χ can finally be determined. In contextuality experiments, this cascading technique was first demonstrated by Amselem et al. [52] and was subsequently adopted in several works [129,130], and in a projective manner in References [55,131]. Here, the authors brought this technique into a beam-displacer array architecture, in which the relative phases between different optical paths are essentially free of drift; thanks to this self-stabilized interferometer, a very high value of the contextuality witness, χ = 5.817 ± 0.011, was reported. The nonlocal correlations, S = 11.430 ± 0.016, were determined by joint path-polarization measurements on the other two qubits at the remote site and by comparing the outcomes of the corresponding observables. Combining these two results gave ω = 17.247 ± 0.019, rejecting local hidden-variable models with high confidence. These experimental results are depicted in Fig. 11(c). Besides clarifying how nonlocality can arise from local contextuality, this work also advanced tests of Peres-Mermin-square-type contextuality, in that the observed phenomena are subject neither to an equivalent classical description [132] nor to a reduction in contextuality visibility caused by imperfect photon-number-resolving detection [133].
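The reported numbers are consistent with ω being the sum χ + S (5.817 + 11.430 = 17.247), with the uncertainties of χ and S added in quadrature. The short sketch below reproduces that bookkeeping; it assumes ω = χ + S and statistically independent errors, which is our reading of the reported values rather than a procedure stated in Reference [127], and the function name `combine` is hypothetical.

```python
import math


def combine(chi, chi_err, s, s_err):
    """Return omega = chi + S and its standard error (independent errors assumed)."""
    omega = chi + s
    omega_err = math.hypot(chi_err, s_err)  # add the two uncertainties in quadrature
    return omega, omega_err


# Values reported for the Peres-Mermin-based test discussed above.
omega, omega_err = combine(5.817, 0.011, 11.430, 0.016)
print(f"omega = {omega:.3f} +/- {omega_err:.3f}")  # omega = 17.247 +/- 0.019
```

If χ and S shared correlated systematic errors, the quadrature rule would underestimate the uncertainty, and a covariance term would have to be included.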

V. DISCUSSION AND OUTLOOK
We have reviewed some theoretical studies and experimental tests of contextuality, as well as some recent results demonstrating its applications, carried out at USTC over the last 20 years. We hope such a review is timely and relevant, since it has become clear only recently that contextuality is an indispensable resource for quantum computation [16] and that the effects of contextuality are experimentally testable [34]. Moreover, although the records for the simplest proofs of contextuality have been settled [21,22,30,31], many theoretical questions about contextuality remain open. Here we point out two such questions. First, examples of contextual correlations that scale linearly with the dimension of the quantum system are scarce [134,135]. This contrasts both with the theoretical limit [136] and with the situation for nonlocality, where Bell-inequality violations that scale exponentially with the number of qubits have long been recognized [137,138,139]. Although these Bell inequalities can themselves be trivially interpreted as noncontextuality inequalities, it is more intriguing to search for other noncontextuality inequalities that may exhibit an even larger quantum-classical separation. Second, within the framework of all-versus-nothing contextuality, currently known examples use at least four argument clauses to demonstrate state-dependent contextuality and five to demonstrate state-independent contextuality; it is worth exploring whether these numbers can be further reduced. The discovery of such stronger forms of contextuality may have implications for further accelerating quantum computation.
On the experimental side, future directions are equally diverse. From a fundamental perspective, contextuality as a general phenomenon may go beyond the framework of quantum mechanics and hidden-variable models. For example, References [140] and [141] have shown that a faithful classical causal model satisfying non-disturbance between conditional probabilities always yields noncontextual correlations. Experiments in this direction may serve as a proxy for testing the compatibility of quantum mechanics with various general probabilistic theories. From the viewpoint of quantum information science, contextuality enables novel applications such as self-testing of a single quantum system [97].
Investigating such properties [142] improves our ability to certify quantum apparatuses under minimal assumptions. Regarding quantum computation, simulating its subroutines in different physical systems [143,144,145] may supply additional insights, and realizing contextuality-empowered algorithms that work on noisy intermediate-scale quantum devices, as reported in Reference [146], is also highly relevant. Finally, novel experimental systems such as solid-state color centers [147,148,149] may find a role in contextuality experiments. With intrinsic quantum memories, support for exotic operations, and the possibility of hosting macroscopic quantum entanglement, these systems may serve to investigate different forms of contextuality [28,150,151].
Because contextuality is too broad a topic even for a dedicated book, and a short review cannot cover all the relevant results achieved at USTC, we have had to select among different works while striving to keep the scope of this review broad. The confluence of many exciting advances reflects both the significant role of contextuality in the foundations of quantum mechanics and the university's deep accumulation of research strength. We believe the study of contextuality will boost the development of quantum technology and ultimately benefit society, and we hope that over the next twenty years USTC will continue to spearhead the study of contextuality and quantum information science.