Logical inference approach to relativistic quantum mechanics: derivation of the Klein-Gordon equation

The logical inference approach to quantum theory, proposed earlier [Ann. Phys. 347 (2014) 45-73], is considered in a relativistic setting. It is shown that the Klein-Gordon equation for a massive, charged, and spinless particle derives from the combination of the requirements that the space-time data collected by probing the particle is obtained from the most robust experiment and that on average, the classical relativistic equation of motion of a particle holds.


Introduction
The inception of quantum theory involved taking several leaps. This is illustrated by, e.g., Schrödinger's paper [1] in which he proposes his celebrated wave equation. In this article [1], Hamilton's principal function S is postulated to take the form S = k ln ψ, with k a constant, and is then substituted into the Hamilton-Jacobi equation (HJE). Upon variation of the resulting quadratic functional with respect to ψ (which Schrödinger later justifies using Huygens' principle [2]), an equation linear in ψ, now known as the Schrödinger equation, is obtained. The derivation of the Klein-Gordon equation [3][4][5][6][7] is essentially identical to that of the Schrödinger equation: an action Ansatz is substituted into the relativistic Hamilton-Jacobi equation, and after variation of the resulting quadratic functional with respect to ψ, the relativistic analogue of the Schrödinger equation is obtained [3][4][5][6][7].
Because of the ad hoc assumptions involved in obtaining these equations, standard quantum mechanics textbooks usually present the formalism of quantum theory as a set of postulates (see e.g. Refs. [8][9][10][11]) and considerable activity focuses on eliminating some of these postulates [12][13][14][15][16][17][18][19]. Instead of starting from a set of postulates, the current work presents an alternative derivation of the relativistic wave equation based on the principles of logical inference (LI) [20][21][22][23]. Specifically, we demonstrate how the Klein-Gordon equation for a massive, charged and spinless particle follows from LI based on the analysis of data recorded by a detector, thereby extending earlier work [24][25][26][27] to the relativistic domain.
The key concept in LI is the plausibility [23], a mental construct that quantifies, e.g., the chance that a detection event occurs. In general, the degree of plausibility is expressed by a real number in the range from 0 to 1 [23]. The algebra of LI facilitates plausible reasoning in the presence of uncertainty in a mathematically well-defined manner [20][21][22][23]. In real experiments, there is not only uncertainty about the individual detection events but obviously also uncertainty in the conditions under which the experiments are carried out. Inevitably, the conditions of the experiment will vary whenever the experiment is repeated. But if the experimental data are to be reproducible, the experiment must be robust (to be quantified later) with respect to small changes in the conditions under which the experiment is being performed. Earlier work has shown that the equations of non-relativistic quantum theory can be obtained by analyzing such robust experiments [24][25][26][27], most notably the Schrödinger [25] and Pauli [26] equations. Importantly, the requirement that the experiment be robust implies that the plausibility must be viewed as an objective assignment (i.e. a conditional probability) rather than a subjective one [25]. The present work extends this approach to the relativistic domain: it shows how the Klein-Gordon equation [3,7] for a massive, charged, and spinless relativistic particle emerges from an analysis similar to the one employed in Refs. [25,26].

Particle detection experiment
Consider an experiment in which a particle source and detectors are located at fixed positions relative to the laboratory reference frame. The source emits a particle that interacts with one of the detectors and triggers a detection event that yields data in the form of three spatial coordinates r = (x, y, z) of the detector and the clock time t at which the event occurred. The experiment is considered to be ideal in the sense that every emitted particle triggers one and only one detector.
The experiment is repeated N times, meaning that we let N particles pass through the detector. Each time a particle is created, the (laboratory) clock time is reset. We label the particles and the corresponding data by the index $n = 1, \ldots, N$ and denote the spatial and temporal resolution by $\Delta_s$ and $\Delta_t$, respectively. As particle n passes through the detector, the latter produces a time stamp $t_n$ and a vector of spatial coordinates $\mathbf{r}_n = (x_n, y_n, z_n)$, which, because of the limited resolution, correspond to the time bin $j_n = \lceil t_n/\Delta_t \rceil$ and the space bin $\mathbf{k}_n = \lceil \mathbf{r}_n/\Delta_s \rceil$, where, element-wise, the ceiling function $\lceil x \rceil$ returns the smallest integer not smaller than x. In practice, the number of time bins and space bins is necessarily finite. Therefore, we must have $0 \le j_n \le J$ and $(0, 0, 0) \le \mathbf{k}_n \le \mathbf{K} = (K_x, K_y, K_z)$, where J, $K_x$, $K_y$, and $K_z$ are (large) integers.
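The binning rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a single scalar resolution $\Delta_s$ shared by all three spatial axes; the function names `bin_event` and `in_range` are hypothetical and not part of the original work.

```python
import math


def bin_event(t, r, dt, ds):
    """Map a detection event (t, r) to its time bin j and space bin k.

    Hypothetical helper: j = ceil(t / dt), and k = ceil(r / ds) applied
    element-wise to the three spatial coordinates.
    """
    j = math.ceil(t / dt)
    k = tuple(math.ceil(coord / ds) for coord in r)
    return j, k


def in_range(j, k, J, K):
    """Check 0 <= j <= J and (0, 0, 0) <= k <= K element-wise."""
    return 0 <= j <= J and all(0 <= ki <= Ki for ki, Ki in zip(k, K))


j, k = bin_event(0.25, (0.1, 0.5, 1.0), dt=0.1, ds=0.5)
print(j, k)  # 3 (1, 1, 2)
print(in_range(j, k, J=100, K=(10, 10, 10)))  # True
```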
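The data collected after N repetitions of the experiment are given by the set of quadruples
$$\Upsilon = \{(j_n, \mathbf{k}_n) \mid 0 \le j_n \le J;\ 0 \le \mathbf{k}_n \le \mathbf{K};\ n = 1, \ldots, N\}, \tag{1}$$
or, denoting the total number of clicks in bin $\mathbf{j} = (j, \mathbf{k})$ by $c_{\mathbf{j}}$, by the equivalent data set
$$D = \Big\{c_{\mathbf{j}} \;\Big|\; 0 \le j \le J;\ 0 \le \mathbf{k} \le \mathbf{K};\ N = \sum_{\mathbf{j}} c_{\mathbf{j}}\Big\}. \tag{2}$$
Note that at this stage, we have not yet assumed that there is a relation between the space-time coordinates of the particle and the data set D.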
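Passing from the event list Υ to the counts $c_{\mathbf{j}}$ only discards the particle labels n. A minimal sketch, assuming the events are stored as the (time bin, space bin) tuples produced by the binning step; the sample events below are made up for illustration.

```python
from collections import Counter

# Made-up event list Upsilon: one (time bin, space bin) tuple per detected particle.
events = [(3, (1, 1, 2)), (3, (1, 1, 2)), (4, (0, 2, 1))]

# Data set D: number of clicks c_j per bin j = (j, k); the particle labels n are dropped.
counts = Counter(events)

N = sum(counts.values())  # total number of detected particles
print(counts[(3, (1, 1, 2))], N)  # 2 3
```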

Inference probability and Fisher information
Having specified the measurement scenario, the next step in the LI approach is to encode the relation between the space-time coordinates of the particle and the nth detection event $\mathbf{j}_n = (j_n, \mathbf{k}_n)$ through the inference-probability (i-prob) $P(\mathbf{j}_n|\theta, Z)$, where θ and Z specify the conditions under which the experiment is being performed [25]. At this stage, the i-prob is necessarily a subjective number between zero and one that expresses the uncertainty with which the nth particle produces the data $\mathbf{j}_n$. The particle is assumed to be characterized by its own (unknown) clock time θ, measured in a reference frame attached to the particle. The proposition Z represents all other experimental conditions (e.g. applied electromagnetic potentials), which are considered fixed for the duration of the experiment but are deemed irrelevant for the problem at hand. It is common practice to assume, as a first step, that events are independent, meaning that even with knowledge of all earlier and future events, it is impossible to say with certainty what a given event will be. Following this practice, we assume that the N detection events are independent. Then, according to the algebra of LI [20][21][22][23], it follows immediately that the i-prob $P(\Upsilon|\theta, N, Z)$ to observe the data set Υ factorizes as
$$P(\Upsilon|\theta,N,Z) = \prod_{n=1}^{N} P(\mathbf{j}_n|\theta,Z), \tag{3}$$
or, equivalently,
$$P(D|\theta,N,Z) = N! \prod_{\mathbf{j}} \frac{P(\mathbf{j}|\theta,Z)^{c_{\mathbf{j}}}}{c_{\mathbf{j}}!}. \tag{4}$$
The salient feature of the experiment considered here is that there is uncertainty both about the individual detection events and in the mapping from θ to the spatial coordinates and the time of the detection events. However, if the experimental data are to increase our capability to uncover relations among the observed events at different space-time points, the experiment must be robust [25]. In the case at hand, this means that small changes in the unknown clock time θ must not lead to erratic changes in the observed data D, even though there is no reproducibility at the level of individual events.
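The two ways of writing the i-prob of the data can be compared numerically: for independent events, ln ∏_n P(j_n|θ,Z) and ln[N! ∏_j P(j|θ,Z)^{c_j}/c_j!] differ only by ln(N!/∏_j c_j!), the logarithm of the number of orderings that lead to the same counts. A minimal sketch, assuming a hypothetical two-bin i-prob with made-up values; the function names are illustrative.

```python
import math
from collections import Counter


def log_prob_sequence(events, p):
    """ln of prod_n P(j_n|theta, Z): probability of the ordered list of independent events."""
    return sum(math.log(p[e]) for e in events)


def log_prob_counts(events, p):
    """ln of N! prod_j P(j|theta, Z)^c_j / c_j!: probability of the counts c_j (multinomial)."""
    c = Counter(events)
    n = len(events)
    return math.lgamma(n + 1) + sum(
        cj * math.log(p[j]) - math.lgamma(cj + 1) for j, cj in c.items()
    )


p = {"a": 0.6, "b": 0.4}  # made-up i-prob for two bins
events = ["a", "a", "b"]
diff = log_prob_counts(events, p) - log_prob_sequence(events, p)
print(round(math.exp(diff)))  # 3 orderings of (a, a, b)
```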
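It is convenient to express the requirement of robustness as a hypothesis test [25]. The evidence [22,23] Ev for the hypothesis that θ + ϵ produces the data D, relative to the hypothesis that θ produces the same data, is given by [22,23,25]
$$Ev = \ln \frac{P(D|\theta+\epsilon,N,Z)}{P(D|\theta,N,Z)}. \tag{5}$$
The notion of a robust experiment then translates to the statement that for all θ and arbitrary but small ϵ, the evidence |Ev| should be as small as possible. In searching for the solution of this global optimization problem, we exclude the trivial, non-informative experiment for which $P(D|\theta,N,Z)$ does not depend on θ [25]. Making use of Eq. (4) and expanding Eq. (5) to second order in ϵ yields
$$Ev = \epsilon \sum_{\mathbf{j}} c_{\mathbf{j}} \frac{P'(\mathbf{j}|\theta,Z)}{P(\mathbf{j}|\theta,Z)} - \frac{\epsilon^2}{2}\sum_{\mathbf{j}} c_{\mathbf{j}} \left[\frac{P'(\mathbf{j}|\theta,Z)}{P(\mathbf{j}|\theta,Z)}\right]^2 + \frac{\epsilon^2}{2}\sum_{\mathbf{j}} c_{\mathbf{j}} \frac{P''(\mathbf{j}|\theta,Z)}{P(\mathbf{j}|\theta,Z)} + \mathcal{O}(\epsilon^3), \tag{6}$$
where the primes indicate partial derivatives with respect to θ.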
Our goal is now to minimize |Ev| for all θ simultaneously. First note that since $\sum_{\mathbf{j}} P(\mathbf{j}|\theta,Z) = 1$, all partial derivatives of $\sum_{\mathbf{j}} P(\mathbf{j}|\theta,Z)$ with respect to θ are zero. Therefore, the first and the third term in Eq. (6) vanish if we make the assignment $c_{\mathbf{j}} = N P(\mathbf{j}|\theta,Z)$. This is an important result: the criterion of robustness not only enforces the intuitively obvious assignment $P(\mathbf{j}|\theta,Z) = c_{\mathbf{j}}/N$ but, by doing so, it changes the subjective nature of $P(\mathbf{j}|\theta,Z)$ into an objective, physically measurable quantity (the relative frequency of outcomes). Thus, it is at this point that the possibility to view the i-prob as a subjective assignment is eliminated [25].
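With this assignment, the expression for the evidence becomes
$$Ev = -\frac{N\epsilon^2}{2}\sum_{\mathbf{j}} \frac{1}{P(\mathbf{j}|\theta,Z)}\left[\frac{\partial P(\mathbf{j}|\theta,Z)}{\partial\theta}\right]^2 + \mathcal{O}(\epsilon^3), \tag{7}$$
and as ϵ is arbitrary, we can find the solution of the optimization problem by minimizing the Fisher information
$$I_F = \sum_{\mathbf{j}} \frac{1}{P(\mathbf{j}|\theta,Z)}\left[\frac{\partial P(\mathbf{j}|\theta,Z)}{\partial\theta}\right]^2 \tag{8}$$
for all θ simultaneously.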
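A quick numerical check of this step, assuming a hypothetical one-parameter family P(j|θ) (a Gaussian profile on integer bins, normalized over the bins) that is not taken from the original work: with the robust assignment $c_{\mathbf{j}} = NP(\mathbf{j}|\theta,Z)$, the exact evidence agrees with $-(N\epsilon^2/2) I_F$ up to corrections of higher order in ϵ. The names `probs`, `evidence`, and `fisher` are illustrative.

```python
import math


def probs(theta, bins, s=2.0):
    """Hypothetical toy i-prob P(j|theta): Gaussian profile centred at theta, normalized over the bins."""
    w = [math.exp(-((j - theta) ** 2) / (2.0 * s * s)) for j in bins]
    z = sum(w)
    return [wi / z for wi in w]


def evidence(theta, eps, n, bins):
    """Ev = ln P(D|theta+eps) - ln P(D|theta) with the robust assignment c_j = N P(j|theta).

    The ln N! and ln c_j! terms of the multinomial cancel, leaving
    sum_j c_j ln[P(j|theta+eps) / P(j|theta)] = -N * KL(P_theta || P_theta+eps) <= 0.
    """
    p0, p1 = probs(theta, bins), probs(theta + eps, bins)
    return sum(n * a * (math.log(b) - math.log(a)) for a, b in zip(p0, p1))


def fisher(theta, bins, h=1e-5):
    """I_F = sum_j (dP_j/dtheta)^2 / P_j, with the derivative taken by central differences."""
    p = probs(theta, bins)
    pp, pm = probs(theta + h, bins), probs(theta - h, bins)
    return sum(((a - b) / (2.0 * h)) ** 2 / q for a, b, q in zip(pp, pm, p))


bins = range(-30, 31)
theta, eps, n = 0.3, 1e-3, 1000
ev = evidence(theta, eps, n, bins)
approx = -0.5 * n * eps**2 * fisher(theta, bins)
print(ev, approx)  # agree up to relative corrections of order eps
```

Because the evidence equals −N times a Kullback-Leibler divergence, it is never positive, which is why minimizing |Ev| amounts to minimizing $I_F$.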
The basic equations of (relativistic) quantum theory are formulated in terms of continuous space and time. Therefore, to derive such equations from an LI approach, it is necessary to take the continuum limit of Eq. (8). This is readily accomplished in the standard manner by letting the temporal resolution $\Delta_t$ and the spatial resolution $\Delta_s$ approach zero while keeping the extent of the four-dimensional space-time volume fixed. Taking the continuum limit and ignoring irrelevant prefactors, Eq. (8) becomes
$$I_F = \int dx\, \frac{1}{P(x|\theta,Z)}\left[\frac{\partial P(x|\theta,Z)}{\partial\theta}\right]^2, \tag{9}$$
where $x = [x^0, x^1, x^2, x^3] = [ct, x, y, z]$ denotes the four-vector of a location in space-time and c is the speed of light in vacuum. Strictly speaking, Eq. (9) makes a slight abuse of notation: in the continuum limit, $P(x|\theta,Z)$ is a probability density, whilst $P(x|\theta,Z)\,dx$ is the corresponding (dimensionless) i-prob. Henceforth, it is assumed that this change of notation is implicitly understood.
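The continuum limit can be illustrated with a hypothetical Gaussian density of width σ centred at θ (an assumption made only for this sketch): binning it with resolution Δ gives a discrete Fisher information that approaches the continuum value 1/σ² as Δ → 0, and never exceeds it, since coarser bins cannot carry more information about θ. The function `binned_fisher` is illustrative.

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def Phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def binned_fisher(theta, sigma, delta, lo=-8.0, hi=8.0):
    """Discrete Fisher information sum_j (dP_j/dtheta)^2 / P_j for bins of width delta.

    P_j is the probability that a Gaussian(theta, sigma) variable falls in bin j;
    its theta-derivative is computed analytically.
    """
    total = 0.0
    n_bins = int(round((hi - lo) / delta))
    for i in range(n_bins):
        ua = (lo + i * delta - theta) / sigma
        ub = (lo + (i + 1) * delta - theta) / sigma
        p = Phi(ub) - Phi(ua)
        dp = (phi(ua) - phi(ub)) / sigma  # dP_j/dtheta
        if p > 0.0:
            total += dp * dp / p
    return total


for delta in (1.0, 0.5, 0.1, 0.01):
    print(delta, binned_fisher(0.0, 1.0, delta))  # tends to 1/sigma^2 = 1
```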

Special relativity
The above discussion focused on the relation between a robust experiment and the observed data but does not refer to any physical theory yet. The knowledge or expectation about the physical behavior enters the LI approach by imposing constraints on the minimization of Eq. (9). Generally speaking, in the absence of uncertainty, we may expect to observe data that complies with the classical mechanical description. Thus, in the case at hand, we require that in the absence of uncertainty, the LI approach yields the results of the special theory of relativity (STR).
Proper time, that is, the time measured by a clock at rest, is a central notion in the STR. In the measurement scenario described above, the i-prob $P(x|\theta,Z)\,dx$ was already assumed to depend on the proper time θ of the particle. In the spirit of the STR, we assume that
$$P(x|\theta,Z) = P(\tau|\theta,Z), \tag{10}$$
where
$$\tau = \frac{1}{c}\sqrt{x_\alpha x^\alpha} \tag{11}$$
is the proper time of the detection event. In words, we assume that the i-prob to observe a space-time event depends on Lorentz scalars only. Note that because the clock is reset with each repetition of the experiment, the proper times that enter our description are proper time intervals. In addition, we assume that space-time is homogeneous, meaning that
$$P(\tau|\theta,Z) = P(\tau-\theta|Z), \quad\text{so that}\quad \frac{\partial P(x|\theta,Z)}{\partial\theta} = -\frac{\partial P(x|\theta,Z)}{\partial\tau}, \tag{12}$$
and using this identity, Eq. (9) becomes
$$I_F = \int dx\, \frac{1}{P(x|\theta,Z)}\left[\frac{\partial P(x|\theta,Z)}{\partial\tau}\right]^2. \tag{13}$$
Recall that the objective of the LI approach is to find the $P(x|\theta,Z)$ that minimizes $I_F$ for all θ simultaneously, subject to constraints that we discuss next.
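Because the proper time of a detection event, τ = √(x_α x^α)/c, is a Lorentz scalar, an i-prob that depends on x only through τ takes the same value in every inertial frame. A minimal sketch of this invariance, assuming units with c = 1 and a boost along the x axis (both illustrative choices; the helper names are hypothetical):

```python
import math

C = 1.0  # speed of light; units with c = 1 are assumed for illustration


def proper_time(x0, x1, x2, x3):
    """tau = sqrt(x_alpha x^alpha) / c for eta = diag(1, -1, -1, -1); x0 = c t."""
    s2 = x0 * x0 - x1 * x1 - x2 * x2 - x3 * x3
    if s2 < 0.0:
        raise ValueError("spacelike event: proper time is not real")
    return math.sqrt(s2) / C


def boost_x(x0, x1, x2, x3, beta):
    """Lorentz boost along the x axis with velocity beta * c."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (x0 - beta * x1), gamma * (x1 - beta * x0), x2, x3


event = (5.0, 3.0, 0.0, 0.0)  # a timelike event: (ct, x, y, z)
print(proper_time(*event))                 # 4.0
print(proper_time(*boost_x(*event, 0.6)))  # 4.0 up to rounding
```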

Motion of the particle
In the absence of uncertainty, successive observed detection events map one-to-one onto the relativistic motion of the particle, described by the laws of the STR. In the LI approach, this limiting case enters through a "correspondence principle" in terms of the HJE [25,26]. This is not a surprise: as mentioned in the introduction, the HJE is one of the key ingredients in the derivation of the Schrödinger [1] and the Klein-Gordon equation [3][4][5][6][7][28], and it plays a similar role in the LI derivation of these equations. In the present paper, we do not postulate an HJE but, in analogy with the derivation of the non-relativistic HJE [29,26], we follow an alternative path and derive the relativistic HJE for a massive and charged particle from a field description of the four-velocity dx/dτ.
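We start by assuming that there exists a (four-)vector field U(x) such that
$$\frac{dx^\mu}{d\tau} = U^\mu(x). \tag{14}$$
Here and in the following, we use the standard co-/contravariant notation and the summation convention, and denote the Minkowski metric by η = diag(1, −1, −1, −1). Taking the derivative of Eq. (14) with respect to τ yields
$$\frac{d^2x^\mu}{d\tau^2} = \frac{dx^\nu}{d\tau}\,\partial_\nu U^\mu = U^\nu\,\partial_\nu U^\mu, \tag{15}$$
where $\partial_\mu$ is shorthand for $\partial/\partial x^\mu$. As the norm of the four-velocity is c, $U^2 \equiv U_\alpha U^\alpha = c^2$ is a constant, and hence any derivative thereof is zero. Therefore, we have
$$U_\nu\,\partial^\mu U^\nu = \tfrac{1}{2}\,\partial^\mu\!\left(U_\nu U^\nu\right) = 0. \tag{16}$$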

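Substitution of Eq. (14) into Eq. (16) yields
$$\frac{dx_\nu}{d\tau}\,\partial^\mu U^\nu = 0. \tag{17}$$
From Eqs. (15) and (17) it then follows that
$$\frac{d^2x^\mu}{d\tau^2} = \left(\partial^\nu U^\mu - \partial^\mu U^\nu\right)\frac{dx_\nu}{d\tau} \equiv F^{\mu\nu}\,\frac{dx_\nu}{d\tau}. \tag{18}$$
Eq. (18) has the same form as the Lorentz force equation of a particle moving in an electrodynamic field [30] if we identify $F^{\mu\nu}$ with the field-strength tensor of electrodynamics. For this identification to make sense, it is necessary to assume that the particles in the particle detection experiment are massive and charged.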
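Two facts used in this derivation can be checked numerically: the four-velocity γ(c, v) has Minkowski norm c², and an acceleration of the form $F^{\mu\nu}U_\nu$ with antisymmetric $F^{\mu\nu}$ is orthogonal to U, so it preserves that norm. A sketch assuming c = 1 and a randomly generated antisymmetric tensor standing in for the field strength (both illustrative; the helper names are hypothetical):

```python
import math
import random

C = 1.0                        # speed of light (units with c = 1 assumed)
ETA = (1.0, -1.0, -1.0, -1.0)  # diagonal of the Minkowski metric


def four_velocity(vx, vy, vz):
    """U^mu = gamma * (c, vx, vy, vz) for a 3-velocity with |v| < c."""
    gamma = 1.0 / math.sqrt(1.0 - (vx * vx + vy * vy + vz * vz) / C**2)
    return [gamma * C, gamma * vx, gamma * vy, gamma * vz]


def lower(u):
    """Lower the index: U_mu = eta_{mu nu} U^nu."""
    return [e * ui for e, ui in zip(ETA, u)]


def dot(u, w):
    """Minkowski product u_mu w^mu."""
    return sum(a * b for a, b in zip(lower(u), w))


def random_antisymmetric(rng):
    """A random tensor with F^{mu nu} = -F^{nu mu}."""
    f = [[0.0] * 4 for _ in range(4)]
    for m in range(4):
        for n in range(m + 1, 4):
            f[m][n] = rng.uniform(-1.0, 1.0)
            f[n][m] = -f[m][n]
    return f


def acceleration(f, u):
    """d^2 x^mu / d tau^2 = F^{mu nu} U_nu."""
    ul = lower(u)
    return [sum(f[m][n] * ul[n] for n in range(4)) for m in range(4)]


u = four_velocity(0.3, -0.2, 0.5)
a = acceleration(random_antisymmetric(random.Random(0)), u)
print(dot(u, u))  # c^2 = 1 up to rounding
print(dot(u, a))  # 0 up to rounding: the norm of U is conserved
```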
If S = S(x) represents a scalar field, the transformation $A_\mu = U_\mu + \partial_\mu S$ yields
$$(\partial S - A)^2 = c^2, \tag{19}$$
where we introduced the shorthand notation $(\partial S - A)^2 = (\partial_\alpha S - A_\alpha)(\partial^\alpha S - A^\alpha)$. As it is the four-velocity dx/dτ that corresponds to a physically relevant quantity, imposing gauge invariance requires introducing a non-vanishing canonical momentum $p_\mu = \partial_\mu S$ in order to keep the norm of the four-velocity fixed to c. Eq. (19) is the relativistic HJE in disguise. Indeed, making use of $(\partial S - A)^2 = (\partial_0 S - A_0)^2 - \sum_{i=1}^{3}(\partial_i S - A_i)^2$, introducing the symbols m and q for the mass and charge of the particle, respectively, and changing symbols in Eq. (19) according to ∂S → (∂S/∂ct, ∂S/∂x, ∂S/∂y, ∂S/∂z)/m and A → q(Φ, A_x, A_y, A_z)/m (where Φ and (A_x, A_y, A_z) are the usual scalar and vector potential, respectively [30]), we find
$$\left(\frac{\partial S}{\partial ct} - q\Phi\right)^2 - \left(\frac{\partial S}{\partial x} - qA_x\right)^2 - \left(\frac{\partial S}{\partial y} - qA_y\right)^2 - \left(\frac{\partial S}{\partial z} - qA_z\right)^2 = m^2c^2, \tag{20}$$
which is the relativistic HJE for a charged, massive particle in an electromagnetic field [3,7].
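As a sanity check of the relativistic HJE written in components, (∂S/∂ct − qΦ)² − Σ_i(∂S/∂x_i − qA_i)² = m²c², the following sketch evaluates its residual for a free particle (Φ = A = 0), assuming the hypothetical action S = −Et + px with E = γmc² and p = γmv, and then applies a gauge shift S → S + χ, qA → qA + ∂χ with a constant, made-up gradient of χ. The helper `hje_residual` is illustrative; as in the text, Φ is treated as the time component of the potential, so SI factors of c are not tracked.

```python
import math


def hje_residual(dS_dct, grad_S, q_phi=0.0, q_A=(0.0, 0.0, 0.0), m=1.0, c=1.0):
    """(dS/dct - q Phi)^2 - sum_i (dS/dx_i - q A_i)^2 - m^2 c^2; zero when the HJE holds."""
    spatial = sum((g - a) ** 2 for g, a in zip(grad_S, q_A))
    return (dS_dct - q_phi) ** 2 - spatial - (m * c) ** 2


# Free particle moving along x; hypothetical action S = -E t + p x (Phi = A = 0).
m, c, v = 1.0, 1.0, 0.6
gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
E, p = gamma * m * c**2, gamma * m * v
free = hje_residual(-E / c, (p, 0.0, 0.0), m=m, c=c)

# Gauge shift with a constant gradient of chi: the same constants are added to dS and to q A.
d_chi = (0.3, -0.1, 0.2, 0.0)  # (d chi / d ct, d chi / dx, d chi / dy, d chi / dz)
shifted = hje_residual(
    -E / c + d_chi[0],
    (p + d_chi[1], d_chi[2], d_chi[3]),
    q_phi=d_chi[0],
    q_A=d_chi[1:],
    m=m,
    c=c,
)
print(free, shifted)  # both 0 up to rounding
```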

Derivation of the Klein-Gordon equation
As a first step, it is expedient to write Eq. (13) in an alternative form by noting that $c\tau = \sqrt{x_\alpha x^\alpha}$, so that, since P depends on x only through τ,
$$\partial_\mu \tau = \frac{x_\mu}{c^2\tau} \quad\text{and}\quad (\partial_\mu P)(\partial^\mu P) = \frac{1}{c^2}\left(\frac{\partial P}{\partial\tau}\right)^2, \tag{21}$$
and hence, dropping the irrelevant constant factor $c^2$, Eq. (13) can be written as
$$I_F = \int dx\, \frac{1}{P(x|\theta,Z)}\,\partial_\alpha P(x|\theta,Z)\,\partial^\alpha P(x|\theta,Z). \tag{22}$$
The general guiding principle of the LI approach is that the experiment that yields the most robust data is described by the probability density $P(x|\theta,Z)$ that minimizes $I_F$ for all θ simultaneously, subject to additional constraints that are deemed relevant to the experiment at hand [25]. In the present case, we require that the description be compatible with the special theory of relativity. For a massive, charged particle and in the absence of uncertainty, the latter requirement implies that the classical, relativistic HJE (19) should hold. We can inject this requirement into the LI approach by considering the functional
$$F = I_F + \lambda \int dx\, P(x|\theta,Z)\left[(\partial S - A)^2 - c^2\right], \tag{23}$$
where λ is a weighting factor that reflects the importance of the uncertainty and robustness relative to the contribution of the classical dynamics. It is straightforward to show that the expression Eq. (23) is invariant under Lorentz transformations. Extremization of Eq. (23) can be carried out by standard variational calculus and yields a set of nonlinear partial differential equations for $P(x|\theta,Z)$ and S(x). It is not difficult to show that at an extremum, (i) the value of F does not depend on the value of the unknown proper time θ and (ii) the value of F is zero, independent of λ. The latter result implies that the extrema describe situations in which the uncertainty about the detection events is perfectly balanced by the certainty that the classical HJE describes the motion of the observed detection events.
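The step from ∂P/∂τ to the covariant gradient relies on P depending on x only through τ, which gives $(\partial_\mu P)(\partial^\mu P) = (\partial P/\partial\tau)^2/c^2$. The sketch below checks this by central differences, assuming an arbitrary smooth profile f(τ) (a made-up Gaussian bump), an arbitrary timelike event, and c = 2 in illustrative units.

```python
import math

C = 2.0  # speed of light in arbitrary illustrative units
ETA = (1.0, -1.0, -1.0, -1.0)  # diagonal of the Minkowski metric


def tau(x):
    """Proper time tau = sqrt(x_alpha x^alpha) / c for a timelike event x = (ct, x, y, z)."""
    return math.sqrt(x[0] ** 2 - x[1] ** 2 - x[2] ** 2 - x[3] ** 2) / C


def f(t):
    """Made-up profile: P as a function of tau only."""
    return math.exp(-((t - 1.0) ** 2))


def df_dtau(t, h=1e-6):
    """dP/dtau by central differences."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


def minkowski_grad_sq(x, h=1e-6):
    """(d_mu P)(d^mu P) = sum_mu eta^{mu mu} (dP/dx^mu)^2, by central differences."""
    total = 0.0
    for mu, sign in enumerate(ETA):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        d = (f(tau(xp)) - f(tau(xm))) / (2.0 * h)
        total += sign * d * d
    return total


x = (5.0, 1.0, 2.0, 0.5)  # timelike: 25 > 1 + 4 + 0.25
lhs = minkowski_grad_sq(x)
rhs = df_dtau(tau(x)) ** 2 / C**2
print(lhs, rhs)  # equal up to finite-difference error
```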
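It is now expedient to write Eq. (23) more explicitly as
$$F = \int dx \left\{ \frac{1}{P(x|\theta,Z)}\,\partial_\alpha P(x|\theta,Z)\,\partial^\alpha P(x|\theta,Z) + \lambda P(x|\theta,Z)\left[(\partial_\alpha S - A_\alpha)(\partial^\alpha S - A^\alpha) - c^2\right]\right\}. \tag{24}$$
We do not know of any direct method to solve the nonlinear set of equations that results from searching for the extrema of Eq. (24), but, by analogy with the non-relativistic case, we may consider a quadratic functional of a complex-valued field ϕ(x) and use the polar representation of this field to construct the corresponding functional in terms of this representation [25,26,31].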
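To this end, consider the quadratic functional
$$Q = 4\int dx \left\{ \left[\left(\partial_\alpha - i\tfrac{\sqrt{\lambda}}{2}A_\alpha\right)\phi(x)\right]^* \left[\left(\partial^\alpha - i\tfrac{\sqrt{\lambda}}{2}A^\alpha\right)\phi(x)\right] - \frac{\lambda c^2}{4}\,\phi^*(x)\phi(x)\right\}. \tag{25}$$
Substituting the polar representation
$$\phi(x) = \sqrt{P(x|\theta,Z)}\; e^{i\sqrt{\lambda}\,S(x)/2} \tag{26}$$
into Eq. (25) yields Q = F. Equations for the extrema of the functional Q can be found by variation with respect to $\phi^*(x)$, yielding the linear partial differential equation
$$\left(\partial_\alpha - i\tfrac{\sqrt{\lambda}}{2}A_\alpha\right)\left(\partial^\alpha - i\tfrac{\sqrt{\lambda}}{2}A^\alpha\right)\phi(x) + \frac{\lambda c^2}{4}\,\phi(x) = 0, \tag{27}$$
which has the same mathematical structure as the KG equation [32]. This can be made more explicit by changing symbols according to A → q(Φ, A_x, A_y, A_z)/m and $\lambda = 4m^2/\hbar^2$, yielding
$$\left(\partial_\alpha - \frac{iq}{\hbar}A_\alpha\right)\left(\partial^\alpha - \frac{iq}{\hbar}A^\alpha\right)\phi(x) + \frac{m^2c^2}{\hbar^2}\,\phi(x) = 0. \tag{28}$$
Obviously, the weighting factor $\lambda = 4m^2/\hbar^2$ cannot be determined on the basis of logic alone but has to follow from a comparison of the outcome of calculations based on Eq. (27) with experimental data. It is of interest to inquire to what extent Eq. (28) allows us to infer properties of the massive charged particles from the observed data. The speed of light in vacuum c certainly does not depend on the properties of the massive charged particle. Then, from Eq. (28), it is immediately clear that its solutions are invariant under the transformation ħ → ħξ, q → qξ, and m → mξ. Hence, from the observed data we may be able to determine two but not three of the constants that appear in Eq. (28). For instance, by a suitable redefinition of the units of mass and charge, ħ can be eliminated from Eq. (28) [33].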
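The identity Q = F follows from a short calculation, written out here under the assumption that the polar form is $\phi = \sqrt{P}\,e^{i\kappa S}$ with $\kappa = \sqrt{\lambda}/2$ and that the covariant derivative is $D_\alpha = \partial_\alpha - i\kappa A_\alpha$ (the choices that make the two functionals coincide). The cross terms cancel because $\partial_\alpha\sqrt{P}$ and $\partial_\alpha S - A_\alpha$ are real:

```latex
\begin{aligned}
D_\alpha\phi &= e^{i\kappa S}\left[\partial_\alpha\sqrt{P} + i\kappa\sqrt{P}\,(\partial_\alpha S - A_\alpha)\right],\\
(D_\alpha\phi)^*(D^\alpha\phi) &= (\partial_\alpha\sqrt{P})(\partial^\alpha\sqrt{P})
  + \kappa^2 P\,(\partial_\alpha S - A_\alpha)(\partial^\alpha S - A^\alpha)\\
&= \frac{1}{4P}\,(\partial_\alpha P)(\partial^\alpha P) + \frac{\lambda}{4}\,P\,(\partial S - A)^2,\\
4\left[(D_\alpha\phi)^*(D^\alpha\phi) - \frac{\lambda c^2}{4}\,\phi^*\phi\right]
&= \frac{1}{P}\,(\partial_\alpha P)(\partial^\alpha P) + \lambda P\left[(\partial S - A)^2 - c^2\right].
\end{aligned}
```

Integrating the last line over x reproduces the functional F term by term.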
In practice, instead of solving the set of nonlinear equations in terms of $P(x|\theta,Z)$ and S(x) that result from minimizing Eq. (24), it is much easier to first solve Eq. (27) and then use Eq. (26) to find $P(x|\theta,Z) = \phi^*(x)\phi(x)$. It is important to recognize that the LI approach gives us the probability for observing a space-time event x but does not yield an estimate of the proper time θ of the particle. The latter was and remains unknown. The LI approach suggests that the wave function ϕ(x) is only a mathematical vehicle, albeit an extraordinarily useful one, to transform a set of nonlinear partial differential equations into a linear set of partial differential equations. The interplay of the two real quantities S(x) and $P(x|\theta,Z)$, which account for, respectively, the classical relativistic physics and the uncertainty in the collected data, can be disentangled through the use of a single complex wave function. But, as a mathematical tool, the wave function does not need an interpretation: it is $P(x|\theta,Z)$ that is directly linked to the observed events.
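For the free case (A = 0) this procedure is easy to try. The sketch below assumes a plane wave ϕ = exp[−i(Et − px)/ħ] in one spatial dimension, with E fixed by the dispersion relation E² = p²c² + m²c⁴ and illustrative units ħ = m = c = 1, and checks by finite differences that $\partial_\alpha\partial^\alpha\phi + (mc/\hbar)^2\phi$ vanishes. The resulting $P = \phi^*\phi$ is uniform: a single plane wave assigns equal detection probability density to every space-time point.

```python
import cmath

HBAR, M, C = 1.0, 1.0, 1.0  # illustrative units


def plane_wave(x0, x, p):
    """phi(x0, x) = exp[-i (E t - p x) / hbar] with x0 = c t and E^2 = p^2 c^2 + m^2 c^4."""
    energy = (p * p * C * C + M * M * C**4) ** 0.5
    return cmath.exp(-1j * (energy * x0 / C - p * x) / HBAR)


def kg_residual(x0, x, p, h=1e-3):
    """d_0^2 phi - d_x^2 phi + (m c / hbar)^2 phi, second derivatives by central differences."""

    def f(a, b):
        return plane_wave(a, b, p)

    d00 = (f(x0 + h, x) - 2.0 * f(x0, x) + f(x0 - h, x)) / h**2
    dxx = (f(x0, x + h) - 2.0 * f(x0, x) + f(x0, x - h)) / h**2
    return d00 - dxx + (M * C / HBAR) ** 2 * f(x0, x)


print(abs(kg_residual(0.3, -0.7, 1.2)))      # small: O(h^2) discretization error
print(abs(plane_wave(0.3, -0.7, 1.2)) ** 2)  # P = phi* phi = 1 everywhere
```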

Discussion
We have shown how the Klein-Gordon equation for massive, charged particles derives from logical inference applied to experiments for which the observed events are independent and for which the frequency distribution of these events is robust with respect to small changes of the conditions under which experiments are carried out. The present derivation is a logical generalization of earlier work [24][25][26][27] to the relativistic domain, the fundamental difference being that the measured time is subject to uncertainty.
Obviously, the transition from non-relativistic to relativistic quantum theory is expected to bring in some radically new features. Landau and Peierls [34] pointed out that in relativistic quantum theory the particle position cannot be measured with an accuracy higher than its Compton wavelength. Measuring the position of an electron with an accuracy higher than its Compton wavelength requires an energy that exceeds the threshold for the creation of electron-positron pairs [32], rendering meaningless the question of which of the electrons is the original one. Therefore, there is a common belief that relativistic quantum theory cannot be a theory of individual particles but must be a field theory for a non-constant number of particles [35,36]. The requirement of a field theory description is also linked to the fact that the charge density of the Klein-Gordon equation is not positive definite, as mentioned by Dirac [37] and also stressed by Feshbach and Villars [32]. This is due to the second-order time derivative in the Klein-Gordon equation and indicates that the wave function in fact describes two degrees of freedom instead of one [32].
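The sign problem mentioned above shows up already for free plane waves. Using the standard textbook form of the Klein-Gordon density, ρ = (iħ/2mc²)(ϕ*∂_tϕ − ϕ∂_tϕ*) (a normalization convention assumed here, not taken from the present derivation), a plane wave exp(−iEt/ħ) gives ρ = E/mc², which is negative for the negative-energy branch E = −√(p²c² + m²c⁴). A sketch in illustrative units ħ = m = c = 1, with hypothetical helper names:

```python
import cmath

HBAR, M, C = 1.0, 1.0, 1.0  # illustrative units


def kg_density(phi, dphi_dt):
    """rho = (i hbar / (2 m c^2)) (phi* d_t phi - phi d_t phi*), a real number."""
    value = 1j * HBAR / (2.0 * M * C**2) * (
        phi.conjugate() * dphi_dt - phi * dphi_dt.conjugate()
    )
    return value.real


def plane_wave(energy, t, x, p):
    """phi = exp[-i (E t - p x) / hbar] and its time derivative."""
    phi = cmath.exp(-1j * (energy * t - p * x) / HBAR)
    return phi, -1j * energy / HBAR * phi


p = 0.5
e_pos = (p * p * C * C + M * M * C**4) ** 0.5
for energy in (e_pos, -e_pos):  # positive- and negative-energy branches
    print(energy, kg_density(*plane_wave(energy, 0.7, 0.2, p)))  # rho = E / (m c^2)
```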
It is worth noting that there is no mention of the direction of time in the logical inference approach. Indeed, when we derived Eq. (13), we allowed both θ < τ and θ > τ. This looks unusual from the point of view of single-particle quantum mechanics (see, however, a discussion of time (a)symmetry in Refs. [38,39]). Within relativistic quantum mechanics it seems more natural. As was suggested by Wheeler, one might interpret anti-particles as particles with the sign of the proper time reversed, i.e. as if the particles were moving backward in time [40], and, at least for non-interacting particles, this interpretation seems to be possible. In the measurement scenario analyzed here, there is no way to discern the absorption of particles from the emission of anti-particles. Both types of events contribute equally well to the detection counts. This relates to the measurement scenario, in which we make no distinction between detection events for which τ > θ and detection events for which τ < θ; causality is not a prerequisite in the derivation presented here. Naturally, one might ask how to extend this approach to particles with non-zero spin (e.g., the Dirac equation). We leave this challenging program for future research.