From Information Geometry to Quantum Theory

In this paper, we show how information geometry, the natural geometry of discrete probability distributions, can be used to derive the quantum formalism. The derivation rests upon three elementary features of quantum phenomena, namely complementarity, measurement simulability, and global gauge invariance. When these features are appropriately formalized within an information geometric framework, and combined with a novel information-theoretic principle, the central features of the finite-dimensional quantum formalism can be reconstructed.

The unparalleled empirical success of quantum theory strongly suggests that it accurately captures fundamental aspects of the workings of the physical world. The clear articulation of these aspects is of inestimable value not only for the deeper understanding of quantum theory in itself [1], but for its modification (for example, to allow non-unitary continuous transformations [2-4]) and its further development, particularly for the development of a theory of quantum gravity (see [5], for example). However, such articulation has traditionally been hampered by the fact that the quantum formalism, in which these aspects are presumably encoded, consists of postulates expressed in an abstract mathematical language to which our physical intuition cannot directly relate. Over the last two decades, there has been growing interest in elucidating these aspects by expressing, in a less abstract mathematical language, what quantum theory might be telling us about how nature works, and trying to derive, or reconstruct, quantum theory on this basis [1, 6-10].
Much of the recent effort in reconstructing the quantum formalism is motivated by the hypothesis that the concept of information might be the key, hitherto missing, ingredient that may enable a reconstruction, and several attempts have been made to systematically explore the reconstruction of the quantum formalism from an informational starting point (for example [7, 11-18]). Although these approaches have yielded significant insights, they are either incomplete (for example, [11, 12, 14]) or employ abstract assumptions that presuppose the complex number field (for example, [16-18]). Such assumptions significantly limit the degree to which the physical content of the quantum formalism can be elucidated, since one of the most mysterious mathematical features of the quantum formalism is being assumed at the outset. In this paper, we show that the principal mathematical features of quantum theory can be reconstructed using the concept of information without employing such assumptions.
Our approach develops intimate connections, known to exist for some time, between structures that arise naturally in classical probability theory on the one hand, and the quantum formalism for pure states on the other [19-22]. For example, Wootters [19] has shown in the framework of classical probability theory that one can quantify the degree to which two discrete probability distributions, $\mathbf{p} = (p_1, \dots, p_N)$ and $\mathbf{p}' = (p'_1, \dots, p'_N)$, can be distinguished given the same number of samples from each by means of the statistical distance, $d_S(\mathbf{p}, \mathbf{p}') = \cos^{-1}\big(\sum_i \sqrt{p_i}\sqrt{p'_i}\big)$, between them. If one considers the statistical distance, $d_S(\mathbf{p}, \mathbf{p}')$, between the probability distributions $\mathbf{p}$ and $\mathbf{p}'$ which characterize the results of projective measurement $A$ when performed upon two $N$-dimensional pure states $\mathbf{u}$ and $\mathbf{v}$, respectively, and if one chooses $A$ such that $d_S$ is maximized, Wootters shows that $d_S$ is equal to the Hilbert space distance, $d_H(\mathbf{u}, \mathbf{v}) = \cos^{-1}|\mathbf{u}^\dagger \mathbf{v}|$, between $\mathbf{u}$ and $\mathbf{v}$ [19]. The existence of such a connection is remarkable, and suggests that the usual formalism of quantum theory might owe at least some of its structure to the notion of distinguishability that arises naturally in a purely classical probabilistic setting.
Following Wootters, we adopt an operational approach, and so take the probabilistic nature of measurements as a given. Accordingly, the framework of classical probability theory is taken as a starting point. We equip this framework with a metric, $ds^2 = \frac{1}{4}\sum_i dp_i^2/p_i$, the information metric (or Fisher-Rao metric), which is the infinitesimal form of the statistical distance, rather than with the statistical distance itself, as this suffices for the purposes of the reconstruction. This metric determines the distance between infinitesimally close probability distributions $\mathbf{p} = (p_1, \dots, p_N)$ and $\mathbf{p}' = (p'_1, \dots, p'_N)$.
* Electronic address: pgoyal@perimeterinstitute.ca
As we shall describe below, the information metric can be understood as a natural consequence of the introduction of the concept of information into the probabilistic framework. Accordingly, we shall refer to this framework as the information geometric framework [23].
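The relation between the statistical distance and the information metric can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names and the sample distribution are illustrative assumptions. It compares $d_S(\mathbf{p}, \mathbf{p} + d\mathbf{p})$ with the line element $ds$ for a small displacement.

```python
import math

def statistical_distance(p, q):
    """Wootters' statistical distance: arccos of sum_i sqrt(p_i) * sqrt(q_i)."""
    overlap = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.acos(min(1.0, overlap))  # clamp in case rounding pushes the sum above 1

def information_line_element_sq(p, dp):
    """Squared line element of the information metric: (1/4) * sum_i dp_i^2 / p_i."""
    return 0.25 * sum(d * d / a for a, d in zip(p, dp))

# Illustrative values (assumed for this sketch only).
p = [0.5, 0.3, 0.2]
dp = [1e-3, -4e-4, -6e-4]            # sums to zero, so p + dp remains normalized
q = [a + d for a, d in zip(p, dp)]

d_s = statistical_distance(p, q)
ds = math.sqrt(information_line_element_sq(p, dp))
print(d_s, ds)                        # the two agree to leading order in dp
```

For displacements that are not small the two quantities differ, which is consistent with the metric being only the infinitesimal form of the statistical distance.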
Within this framework, we formalize three elementary features of quantum phenomena, namely complementarity, global gauge invariance, and measurement simulability, detailed below. These features can be understood as assertions about the physical world quite apart from the setting of the quantum formalism within which they are usually encountered [24], and are sufficiently simple to be taken as primitives in the building up of quantum theory. To these features, we add an information-theoretic principle, the principle of metric invariance. From these ingredients, we reconstruct the principal features of the finite-dimensional quantum formalism, namely that pure states are represented by complex vectors, physical transformations are represented by unitary or antiunitary transformations, and the outcome probabilities (and the corresponding output states) of measurements are given by the Born rule. The present paper provides a streamlined derivation of the key parts of the finite-dimensional quantum formalism, focussing on the essential ideas. The reader is referred to Refs. [24, 25] for a more detailed discussion of the underlying ideas and methodology, as well as a derivation of the remainder of the finite-dimensional quantum formalism.
arXiv:0805.2770v4 [quant-ph] 14 Feb 2010
We begin by giving a simple argument which shows how the information metric arises in a classical probabilistic setting from the concept of information. Suppose that Alice has two coins, A and B, characterized by the probability distributions $\mathbf{p} = (p_1, p_2)$ and $\mathbf{p}' = (p'_1, p'_2)$, respectively. Suppose that she chooses coin A, tosses it $n$ times, and then sends the data to Bob, without disclosing to him which coin she chose. If Bob knows $\mathbf{p}$ and $\mathbf{p}'$, how much information does the data provide him about which coin was tossed? Intuitively, the more information the data provides, the more sharply the distributions are distinguished.
Using Bayes' theorem and Stirling's approximation for the case where $n$ is large, on the assumption that coins A and B are a priori equally likely to be chosen, one finds that
$$\frac{P_A}{P_B} = \exp\Big(n \sum_i p_i \ln \frac{p_i}{p'_i}\Big), \tag{1}$$
where $P_A$ is the probability that the tossed coin is A given the data, and likewise for $P_B$ [29]. When the probability distributions are close, so that $\mathbf{p}' = \mathbf{p} + d\mathbf{p}$, the argument of the exponent can be expanded in the $dp_i$ to give
$$\frac{P_A}{P_B} = \exp\big(2n\, ds^2\big), \tag{2}$$
where $ds^2 = \frac{1}{4}\sum_i dp_i^2/p_i$ is the information metric. Now, the information gained by Bob, $\Delta I$, is the reduction in his uncertainty, and is therefore defined as
$$\Delta I = U\big(\tfrac{1}{2}, \tfrac{1}{2}\big) - U(P_A, P_B), \tag{3}$$
with $U$ being an entropy (uncertainty) function such as the Shannon entropy. But, since $P_A + P_B = 1$ and $P_A/P_B$ is determined by $ds$, once $U$ is selected, $\Delta I$ is determined by $ds$. For example, if $U$ is chosen to be the Shannon entropy $U(\pi_1, \pi_2) = -\sum_i \pi_i \ln \pi_i$, one finds that
$$\Delta I = \ln 2 + P_A \ln P_A + P_B \ln P_B, \qquad P_A = \big[1 + \exp(-2n\, ds^2)\big]^{-1}, \quad P_B = 1 - P_A. \tag{4}$$
This result immediately generalizes to the case where $\mathbf{p}$ and $\mathbf{p}'$ are $M$-dimensional probability distributions ($M \geq 2$). Hence, from an informational viewpoint, it is natural to endow the space of discrete probability distributions with the information metric. Parenthetically, we remark that Wootters' statistical distance, $d_S(\mathbf{p}, \mathbf{p}') = \cos^{-1}\big(\sum_i \sqrt{p_i}\sqrt{p'_i}\big)$, between the probability distributions $\mathbf{p}$ and $\mathbf{p}'$ is the minimum distance between $\mathbf{p}$ and $\mathbf{p}'$ with respect to the information metric [30]. We do not, however, make use of this result in what follows.
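The coin argument can be illustrated numerically. The sketch below rests on a stated assumption: the data are taken to be typical, with outcome $i$ occurring $n p_i$ times, so that Bayes' theorem with equal priors gives $\ln(P_A/P_B) = n \sum_i p_i \ln(p_i/p'_i)$. It compares this log-odds with $2n\,ds^2$ for nearby distributions and evaluates the Shannon information gain. All names and numerical values are illustrative.

```python
import math

def log_posterior_odds(p, p_prime, n):
    """ln(P_A / P_B) for typical data (n_i = n * p_i) and equal prior probabilities."""
    return n * sum(a * math.log(a / b) for a, b in zip(p, p_prime))

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in probs if x > 0.0)

def information_gain(log_odds):
    """Delta I = U(1/2, 1/2) - U(P_A, P_B), with U the Shannon entropy."""
    p_a = 1.0 / (1.0 + math.exp(-log_odds))
    return math.log(2.0) - shannon_entropy([p_a, 1.0 - p_a])

# Illustrative values (assumed for this sketch only).
p = [0.6, 0.4]
dp = [0.01, -0.01]
p_prime = [a + d for a, d in zip(p, dp)]
n = 1000

ds_sq = 0.25 * sum(d * d / a for a, d in zip(p, dp))
exact = log_posterior_odds(p, p_prime, n)   # full log-odds
approx = 2.0 * n * ds_sq                    # leading-order expansion in dp
gain = information_gain(exact)
print(exact, approx, gain)
```

The two log-odds values agree up to corrections of higher order in the $dp_i$, and the gain lies between zero and $\ln 2$, as expected for a two-hypothesis problem.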
Measurement is idealized as a process that (i) when performed upon some physical system, yields one of $N$ possible outcomes, with probabilities $p_1, \dots, p_N$ that are determined by the state of the system immediately prior to the measurement, and (ii) is reproducible, so that, upon immediate repetition of the measurement, the same outcome is obtained with certainty.

1. Formalizing Complementarity.
We take the first feature, complementarity, to consist of the general idea that, when a measurement is performed upon a system in some state, the measurement outcome only yields information about half of the experimentally-accessible degrees of freedom of the state. In the above classical probabilistic model of measurement, we can express this idea in a very simple way as follows:
Postulate 1. Complementarity. When measurement $A$ is performed, one of $2N$ possible events occurs, but the events are not individually observed. Outcome $i$ is observed ($i = 1, \dots, N$) whenever either event $2i - 1$ or event $2i$ is realized. The events $1, \dots, 2N$ are assumed to occur with probabilities $P_1, \dots, P_{2N}$, respectively, so that
$$p_i = P_{2i-1} + P_{2i}, \tag{5}$$
where $p_i$ is the probability of outcome $i$.
The $P_q$ ($q = 1, \dots, 2N$) can be summarized by the probability $2N$-tuple $\mathbf{P} = (P_1, \dots, P_{2N})$. As a result, of the $2N - 1$ degrees of freedom of $\mathbf{P}$, the measurement outcome only yields information about the $p_i$, which constitute $N - 1$ degrees of freedom. We shall shortly impose an additional constraint (global gauge invariance) which implies that only $2(N - 1)$ of the $2N - 1$ degrees of freedom of $\mathbf{P}$ are physically relevant. Hence, the measurement yields information about exactly one half of the experimentally-accessible degrees of freedom in $\mathbf{P}$.
Intuitively, performing the measurement brings about the realization of one of 2N possible events but the observed outcomes coarse-grain over these events: when event 2i − 1 or 2i occurs, the measurement is (for some reason to be investigated) unable to resolve the individual events, so that only outcome i is registered. This is a novel hypothesis, which, at this point in the derivation, is recommended by its simplicity, and remains to be judged by its explanatory power (namely its capacity to support a derivation of the quantum formalism) [31].
2. Imposing the Information Metric.
Next, we endow the space of probability distributions $\mathbf{P}$ with the information metric, $ds^2 = \frac{1}{4}\sum_q dP_q^2/P_q$, where $q = 1, \dots, 2N$. It is convenient to define $Q_q = \sqrt{P_q}$, where $Q_q \in [0, 1]$, since the metric over the $Q_q$ is then simply the Euclidean metric, $ds^2 = dQ_1^2 + \dots + dQ_{2N}^2$, so that $\mathbf{Q} = (Q_1, Q_2, \dots, Q_{2N})^T$ is a unit vector that lies on the positive orthant of the unit hypersphere $S^{2N-1}$ in a $2N$-dimensional Euclidean space.
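The two steps so far, coarse-graining of events into outcomes and the square-root change of variables, can be sketched as follows. This is an illustrative check rather than part of the derivation; the function names and the example distribution are assumptions. It sums each pair of event probabilities to obtain the outcome probabilities, verifies that $\mathbf{Q}$ is a unit vector, and compares the information line element with the Euclidean one for a small displacement.

```python
import math

def outcome_probabilities(P):
    """Coarse-grain 2N event probabilities into N outcome probabilities.

    Events 2i-1 and 2i (1-based) sit at indices 2i and 2i+1 with 0-based i.
    """
    return [P[2 * i] + P[2 * i + 1] for i in range(len(P) // 2)]

def sqrt_embedding(P):
    """Map event probabilities P_q to coordinates Q_q = sqrt(P_q)."""
    return [math.sqrt(x) for x in P]

# Illustrative distribution over 2N = 6 events (assumed for this sketch only).
P = [0.10, 0.20, 0.05, 0.25, 0.30, 0.10]
dP = [1e-4, -1e-4, 2e-4, -2e-4, 1e-4, -1e-4]   # sums to zero

p = outcome_probabilities(P)
Q = sqrt_embedding(P)
norm_sq = sum(x * x for x in Q)

# Information line element versus Euclidean line element in the Q coordinates.
ds_sq_info = 0.25 * sum(d * d / a for a, d in zip(P, dP))
Q_shifted = sqrt_embedding([a + d for a, d in zip(P, dP)])
ds_sq_euclid = sum((b - a) ** 2 for a, b in zip(Q, Q_shifted))
print(p, norm_sq, ds_sq_info, ds_sq_euclid)
```

The two line elements agree to leading order in the displacement, which is the sense in which the metric over the $Q_q$ is Euclidean.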

3. Representing Physical Transformations.
We now consider transformations of state space which represent physical transformations of the system. We postulate that transformations of the state space, assumed one-to-one, preserve the metric over state space; that is, the information distance, $d(\mathbf{Q}, \mathbf{Q}')$, between any pair of infinitesimally close states, $\mathbf{Q}, \mathbf{Q}'$, where $d(\cdot)$ denotes distance with respect to the metric over state space, is preserved. The essential idea here is that the discriminability of any pair of nearby states is a quantity that is intrinsic to this pair of states, and therefore should remain invariant under reversible and deterministic transformations of the system [32]. Now, if one takes the $\mathbf{Q}$ themselves, with $Q_q \in [0, 1]$, as the state space of the system, one immediately finds that continuous one-to-one transformations of the state space that preserve the information metric are not possible. A simple way to allow the existence of such transformations is to take the entire unit hypersphere, $S^{2N-1}$, as the state space of the system. That is, we take the state of the system as being given by a unit vector $\mathbf{Q} = (Q_1, Q_2, \dots, Q_{2N})^T$, with $Q_q \in [-1, 1]$, where the probabilities $P_q$ are given by $P_q = Q_q^2$. From the information metric over the $\mathbf{P}$, it follows from the relation $P_q = Q_q^2$ that the metric over the $\mathbf{Q}$ is Euclidean,
$$ds^2 = dQ_1^2 + dQ_2^2 + \dots + dQ_{2N}^2. \tag{6}$$
We can summarize the above requirements as follows:
Postulate 2. Metric Invariance. The state of the system is given by the unit vector $\mathbf{Q} = (Q_1, Q_2, \dots, Q_{2N})^T$, with $Q_q \in [-1, 1]$, where the probabilities $P_q$ are given by $P_q = Q_q^2$. The metric over the $\mathbf{Q}$ is Euclidean, $ds^2 = dQ_1^2 + dQ_2^2 + \dots + dQ_{2N}^2$, which any transformation, $\mathcal{M}$, of state space must preserve.
It follows from this postulate that $\mathbf{Q}$ lies on the unit hypersphere, $S^{2N-1}$, in a $2N$-dimensional real Euclidean space. From the requirement of metric preservation, it follows that $\mathcal{M}$ is an orthogonal transformation of $S^{2N-1}$, so that every transformation can be expressed as $\mathbf{Q}' = M\mathbf{Q}$, where $M$ is a $2N$-dimensional real orthogonal matrix.
The above extension of the state space from the positive orthant of $S^{2N-1}$ to the entire hypersphere is an assumption which, although formally rather natural, presently awaits a clear physical basis.
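A small numerical illustration shows why the positive orthant is too small a state space. The sketch below is only illustrative: a single plane rotation stands in for a general orthogonal matrix $M$, and the function names and numerical values are assumptions. It applies the rotation to two nearby states, then checks that the Euclidean distance between them and the normalization of the probabilities $P_q = Q_q^2$ are preserved, while one coordinate of the transformed state becomes negative.

```python
import math

def plane_rotation(dim, i, j, angle):
    """Orthogonal matrix rotating coordinates i and j by the given angle."""
    M = [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    M[i][i] = math.cos(angle)
    M[j][j] = math.cos(angle)
    M[i][j] = -math.sin(angle)
    M[j][i] = math.sin(angle)
    return M

def apply(M, Q):
    """Matrix-vector product M Q."""
    return [sum(m * q for m, q in zip(row, Q)) for row in M]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two nearby states on the positive orthant (illustrative values, 2N = 4).
Q1 = [math.sqrt(x) for x in [0.10, 0.20, 0.30, 0.40]]
Q2 = [math.sqrt(x) for x in [0.11, 0.19, 0.31, 0.39]]

M = plane_rotation(4, 0, 3, 1.2)
Q1_new = apply(M, Q1)
Q2_new = apply(M, Q2)
P1_new = [q * q for q in Q1_new]      # probabilities after the transformation

print(euclidean_distance(Q1, Q2), euclidean_distance(Q1_new, Q2_new))
print(sum(P1_new), min(Q1_new))       # normalization kept; a coordinate is negative
```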

B. Global Gauge Invariance.
The second feature, global gauge invariance, consists of the idea that one can find a representation of the state of a system such that, if one displaces a subset of the degrees of freedom of the state by the same amount, any physical predictions based on the state are left invariant. To formalize this feature, we begin by making a change of variables by expressing the state, $\mathbf{Q}$, in terms of the probabilities $p_1, p_2, \dots, p_N$, and $N$ additional real degrees of freedom, $\theta_1, \theta_2, \dots, \theta_N$, so that, without loss of generality,
$$Q_{2i-1} = \sqrt{p_i}\cos\theta_i, \qquad Q_{2i} = \sqrt{p_i}\sin\theta_i. \tag{7}$$
Only the $\theta_i$ can be subject to displacement, since a displacement involving any of the $p_i$ would be experimentally detectable. Accordingly, we formalize the idea of global gauge invariance by requiring that $\theta_i = \theta(\chi_i)$, where $\theta(\cdot)$ is an unknown, non-constant, differentiable function to be determined, and that the transformation $\chi_i \to \chi_i + \chi_0$ for $i = 1, \dots, N$ brings about no predictive changes for any $\chi_0 \in \mathbb{R}$. From this global gauge condition, we immediately draw the following postulate:
Postulate 3. Gauge Invariance. The map $\mathcal{M}$ is such that, for any state $\mathbf{Q} \in S^{2N-1}$, the probabilities, $p'_1, p'_2, \dots, p'_N$, of the outcomes of measurement $A$ performed upon a system in state $\mathbf{Q}' = \mathcal{M}(\mathbf{Q})$ are unaffected if, in any representation, $(p_i; \chi_i)$, of the state $\mathbf{Q}$, an arbitrary real constant, $\chi_0$, is added to each of the $\chi_i$.
Additionally, we draw the requirement that the measure, $\mu(p_i; \chi_i)$, over $p_1, \dots, p_N, \chi_1, \dots, \chi_N$ induced by the metric over $S^{2N-1}$ is consistent with the global gauge condition. This requirement is necessary in order that probabilistic inference using the measure as a prior over state space is consistent with our physical knowledge of the system. This requirement yields the following postulate:
Postulate 4. Measure Invariance. The measure $\mu(p_i; \chi_i)$ induced by the metric over state space satisfies the condition $\mu(p_1, \dots, p_N; \chi_1, \dots, \chi_N) = \mu(p_1, \dots, p_N; \chi_1 + \chi_0, \dots, \chi_N + \chi_0)$ for any $\chi_0$.
From Eqs. (5), (6), and (7),
$$ds^2 = \sum_i \Big[\frac{dp_i^2}{4p_i} + p_i\, \theta'(\chi_i)^2\, d\chi_i^2\Big]. \tag{8}$$
The measure, $\mu(p_i; \chi_i)$, over $(p_1, \dots, p_N; \chi_1, \dots, \chi_N)$ induced by this metric is proportional to the square-root of the determinant of the metric, and marginalizes to give
$$\mu_i(\chi_i) = c\, |\theta'(\chi_i)| \tag{9}$$
as the measure over $\chi_i$, where $c$ is a constant. Now, from the Measure Invariance postulate, it follows by marginalization that the measure $\mu_i(\chi_i)$ satisfies the relation $\mu_i(\chi_i + \chi_0) = \mu_i(\chi_i)$ for all $\chi_0$, and is therefore independent of $\chi_i$. Hence, from Eq. (9), $\theta(\chi) = a\chi + b$, where $a, b$ are constants, and where $a \neq 0$ since, by assumption, the function $\theta(\cdot)$ is not constant. We can therefore write
$$Q_{2i-1} = \sqrt{p_i}\cos(a\chi_i + b), \qquad Q_{2i} = \sqrt{p_i}\sin(a\chi_i + b). \tag{10}$$
2. Implementing Gauge Invariance, and the emergence of Complex Vector Space.
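From Eq. (10), the Gauge Invariance postulate, and the relation $\theta_i = a\chi_i + b$ given above, one can show that $M$ is restricted to one of two types: $M$ has the general form
$$M = \begin{pmatrix} T^{(11)} & \cdots & T^{(1N)} \\ \vdots & \ddots & \vdots \\ T^{(N1)} & \cdots & T^{(NN)} \end{pmatrix}, \tag{11}$$
where $T^{(ij)}$ has the form
$$T^{(ij)} = \alpha_{ij} \begin{pmatrix} \cos\varphi_{ij} & -(-1)^{\beta}\sin\varphi_{ij} \\ \sin\varphi_{ij} & (-1)^{\beta}\cos\varphi_{ij} \end{pmatrix},$$
and where either $\beta = 0$ (type 1), in which case $T^{(ij)}$ is a scale-rotation matrix, or $\beta = 1$ (type 2), in which case $T^{(ij)}$ is a scale-rotation-reflection matrix, with scale factor $\alpha_{ij}$ and rotation angle $\varphi_{ij}$ in either case [33]. Now, the state $\mathbf{Q}$ can be faithfully represented by the complex unit vector
$$\mathbf{v} = (Q_1 + iQ_2, \dots, Q_{2N-1} + iQ_{2N})^T = \big(\sqrt{p_1}\, e^{i\theta_1}, \dots, \sqrt{p_N}\, e^{i\theta_N}\big)^T, \tag{12}$$
and, remarkably, one can then show that the transformations $M$ of type 1 correspond one-to-one with the unitary transformations of $\mathbf{v}$, and that the transformations $M$ of type 2 correspond one-to-one with the antiunitary transformations of $\mathbf{v}$. In particular, on the assumption that a parameterized transformation that represents a continuous physical transformation must reduce to the identity for some value of the parameters, it follows that a continuous transformation must be represented by unitary transformations.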
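The correspondence between the real and complex pictures can be made concrete with a small sketch. It assumes the convention that the real and imaginary parts of the $i$-th complex component are the coordinates $Q_{2i-1}$ and $Q_{2i}$, and that each complex matrix entry $\alpha e^{i\varphi}$ becomes a $2 \times 2$ block that is a scaled rotation (type 1) or a scaled rotation composed with a reflection (type 2). These conventions, the function names, and the example matrix are assumptions made for illustration. The code builds both real matrices from one unitary matrix and checks that they are orthogonal and reproduce the unitary and antiunitary actions.

```python
import cmath
import math

def real_blocks(U, beta=0):
    """2N x 2N real matrix whose (i, j) block encodes the complex entry U[i][j].

    beta = 0 gives scale-rotation blocks; beta = 1 adds a reflection.
    """
    n = len(U)
    sign = -1.0 if beta == 1 else 1.0
    M = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            a, b = U[i][j].real, U[i][j].imag   # alpha*cos(phi), alpha*sin(phi)
            M[2 * i][2 * j] = a
            M[2 * i][2 * j + 1] = -sign * b
            M[2 * i + 1][2 * j] = b
            M[2 * i + 1][2 * j + 1] = sign * a
    return M

def to_real(v):
    """Interleave real and imaginary parts: (Re v_1, Im v_1, ..., Re v_N, Im v_N)."""
    Q = []
    for z in v:
        Q.extend([z.real, z.imag])
    return Q

def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def is_orthogonal(M, tol=1e-12):
    """Check that the columns of M are orthonormal."""
    n = len(M)
    for r in range(n):
        for c in range(n):
            dot = sum(M[k][r] * M[k][c] for k in range(n))
            if abs(dot - (1.0 if r == c else 0.0)) > tol:
                return False
    return True

s = 1.0 / math.sqrt(2.0)
U = [[s + 0j, s * 1j], [s * 1j, s + 0j]]            # an example unitary matrix
v = [cmath.rect(math.sqrt(0.3), 0.4), cmath.rect(math.sqrt(0.7), -1.1)]

M1 = real_blocks(U, beta=0)                         # type 1
M2 = real_blocks(U, beta=1)                         # type 2
unitary_image = to_real(mat_vec(U, v))
antiunitary_image = to_real(mat_vec(U, [z.conjugate() for z in v]))
type1_image = mat_vec(M1, to_real(v))
type2_image = mat_vec(M2, to_real(v))
print(is_orthogonal(M1), is_orthogonal(M2))
```

Comparing `type1_image` with `unitary_image`, and `type2_image` with `antiunitary_image`, shows the two descriptions agree component by component under the stated convention.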
C. Representation of Measurements.
The third feature, measurement simulability, can be stated as follows:
Postulate 5. Measurement Simulability. Any reproducible measurement, $A'$, describable in the formalism can, insofar as its outcome probabilities and associated output states are concerned, be simulated by an arrangement consisting of measurement $A$ flanked by suitable interactions with the system.
Given the results derived above, this postulate immediately implies that $A'$ can be simulated by the arrangement shown in Fig. 1, where $U$ and $V$ are unitary transformations representing the interactions with the system.
The reproducibility of measurement $A$ implies that the state of a system immediately after $A$ has yielded outcome $i$ is given by $\mathbf{v}_i = (0, \dots, e^{i\phi_i}, \dots, 0)^T$, where $\phi_i$ is undetermined. Hence, the input state $\mathbf{v}'_i = U^{-1}\mathbf{v}_i$ will yield outcome $i$. In order that the arrangement behave like a reproducible measurement, the output state must be $\mathbf{v}'_i$ up to an overall phase, so that it suffices to choose $V\mathbf{v}_i = \mathbf{v}'_i$ for $i = 1, \dots, N$, which implies that $V = U^{-1}$. Since the $\mathbf{v}_i$ form an orthonormal basis, it follows from $\mathbf{v}'_i = U^{-1}\mathbf{v}_i$ that the $\mathbf{v}'_i$ also form an orthonormal basis. Therefore, any state $\mathbf{v}$ can be expanded as $\mathbf{v} = \sum_i c_i \mathbf{v}'_i$. With the input state $\mathbf{v}$, the state measured by measurement $A$ in the arrangement is $U\mathbf{v} = \sum_i c_i \mathbf{v}_i$. From Eq. (12), the probabilities, $p_1, \dots, p_N$, of the outcomes of measurement $A$ performed on a state with components $(v_1, \dots, v_N)$ are given by $p_i = |v_i|^2$. Therefore, in this case, measurement $A'$ yields outcome $i$, together with output state $\mathbf{v}'_i$, with probability $|c_i|^2 = |\mathbf{v}'^{\dagger}_i \mathbf{v}|^2$, which is the Born rule.
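The simulation argument can be traced numerically. The following is a minimal sketch under stated assumptions: the outcome states of measurement $A$ are taken to be the standard basis vectors with the undetermined phases set to zero, and the unitary matrix and input state are arbitrary examples. It computes the outcome probabilities of the arrangement from $U\mathbf{v}$ and compares them with the squared overlaps between $\mathbf{v}$ and the states $U^{-1}\mathbf{v}_i$.

```python
import cmath
import math

def dagger(U):
    """Conjugate transpose, which is the inverse of a unitary matrix."""
    n = len(U)
    return [[U[c][r].conjugate() for c in range(n)] for r in range(n)]

def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def inner(a, b):
    """Inner product a^dagger b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

s = 1.0 / math.sqrt(2.0)
U = [[s + 0j, s * 1j], [s * 1j, s + 0j]]           # example interaction before A
v = [cmath.rect(math.sqrt(0.3), 0.4), cmath.rect(math.sqrt(0.7), -1.1)]

# Outcome states of A (phases assumed zero) and the corresponding input states.
basis = [[1 + 0j, 0j], [0j, 1 + 0j]]
U_inv = dagger(U)
v_prime = [mat_vec(U_inv, e) for e in basis]

measured = mat_vec(U, v)                            # state presented to measurement A
probs_arrangement = [abs(z) ** 2 for z in measured]
probs_born = [abs(inner(vp, v)) ** 2 for vp in v_prime]
print(probs_arrangement, probs_born)
```

Choosing nonzero phases for the outcome states would multiply each overlap by a phase factor and leave the probabilities unchanged.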

III. DISCUSSION
The physical irrelevance of the overall phase of a pure state is usually regarded as being a minor mathematical feature of the quantum formalism of little physical importance. From this standpoint, one of the most surprising findings in the derivation is that the global gauge condition (which expresses in a more general way the physical irrelevance of the overall phase) is sufficiently strong as to transform a $2N$-dimensional real formalism (where states are real unit vectors, and the transformations are the orthogonal transformations) into the familiar $N$-dimensional complex vector formalism of quantum theory (where states are complex unit vectors, and the transformations are the unitary and antiunitary transformations). In particular, the fact that the set of possible transformations one obtains is precisely the set of all unitary and antiunitary transformations (and neither more nor less) is not something that could, a priori, have been reasonably anticipated.
The derivation provides a number of other important insights into the structure of the quantum formalism. From the perspective of the derivation, it is clear that the use of complex numbers in the quantum formalism is directly tied to the set of possible transformations of state space. For example, if the set of all orthogonal transformations were allowed, then the complex form of the formalism, whilst still possible to write down, would involve non-linear continuous transformations and would therefore not appear mathematically natural. The derivation also suggests that information geometry is directly or indirectly responsible for many of its key mathematical features (such as the importance of square-roots of probability, and the sinusoidal functions that appear in a quantum state), thereby providing significant new support for the hypothesis that information plays a fundamental role in determining the structure of quantum theory.
Finally, the derivation illuminates a previous partial reconstruction of quantum theory due to Stueckelberg [26]. Stueckelberg makes an assumption similar to the Complementarity postulate to arrive at the idea that the state of a system is given by a $2N$-dimensional probability distribution which can be written as a unit vector in a $2N$-dimensional 'square-root of probability space', as we have done. He then asserts that the allowable transformations of the state space are orthogonal transformations, and shows that, if the transformations are restricted by a superselection rule, then the set of restricted transformations is equivalent to the set of unitary transformations acting on a suitably-defined $N$-dimensional complex state space. The present derivation shows that Stueckelberg's assertion that the allowable transformations are orthogonal transformations can be naturally accounted for in terms of the information metric over the probability simplex via the Metric Invariance postulate. The derivation also shows that Stueckelberg's superselection rule can be replaced by the Global Gauge Invariance postulate.