1 Introduction

Tomography builds up higher-dimensional objects from lower-dimensional projections. Quantum tomography [1] is a strategy to reconstruct all that can be observed about a quantum physical system. After becoming a focal point of quantum computing, quantum tomography has recently been applied in a variety of domains [2,3,4,5,6,7,8,9].

The method of quantum tomography uses a known “probe” to explore an unknown system. Data is related directly to matrix elements, with minimal model dependence and optimal efficiency.

Collider physics is conventionally set up in a framework of unobservable and model-dependent scattering amplitudes. In quantum tomography these unobservable features are skipped to deal directly with observables. The unknown system is parameterized by a certain density matrix \(\rho (X)\), which is model-independent. The probe is described by a known density matrix \(\rho (\mathrm{probe}).\) The matrices are represented by numbers generated and fit to experimental data, not abstract operators. Quantum mechanics predicts an experiment will measure \(\mathrm{tr}(\rho (\mathrm{probe}) \cdot \rho (X))\), where tr is the trace. In many cases \(\rho (\mathrm{probe})\) is extremely simple: A \(3\times 3\) matrix, say. What will be observed is strictly limited by the dimension and symmetries of the probe. The powerful efficiency of quantum tomography comes from exploiting the probe’s simplicity in the first steps. The description never involves more variables than will actually be measured.

We illustrate the advantages of quantum tomography with inclusive lepton-pair production. It is a relatively mature subject chosen for its pedagogical convenience. Despite the maturity of the subject, we discover new things. For example, the puzzling plethora of plethora of ad hoc invariant quantities is completely cleared up. We also find new ways to assist experimental data analysis. Positivity is a central issue overlooked in the literature, which we show how to control. Moreover, the tomography procedure carries over straightforwardly to many final states, including the inclusive production of charmonium, bottomonium, dijets, including boosted tops, HH, \(W^+ W^-\), ZZ [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]. Our practical guide to analyzing experimental data uses density matrices at each step and circumvents the more elaborate traditional theoretical formalism. We concentrate on making tools available to experimentalists. We give a step-by-step guide where density matrices stand as definite arrays of numbers, bypassing unnecessary formalism.

2 The quantum tomography procedure applied to inclusive lepton-pair production

The tomography procedure reconstructs all that can be observed about a quantum physical system. For inclusive lepton-pair production, what can be observed is the invariant mass distribution, the lepton-pair angular distribution \(\mathrm{d}N/\mathrm{d}\Omega \) and the polarization of the unknown intermediate state of the system, contained in \(\rho (X)\).Footnote 1 In this section we reconstruct \(\mathrm{d}N/\mathrm{d}\Omega \) and \(\rho (X)\) from first principles using tomography. Structure functions and model-dependent assumptions as regards the intermediate state, common to the traditional formalism [32,33,34,35,36], do not appear.

The expert reader, who is accustomed to seeing some of these formulas derived, might note that the method of derivation is particularly simple. The particular steps we do not follow are to be noted. That also explains why some of the relations we find seem to have been overlooked in the past.

2.1 Kinematics

Consider inclusive production of a lepton pair with 4-momenta k, \(k'\) from the collision of two hadrons with 4-momenta \(P_{A}\), \(P_{B}\):

$$\begin{aligned} P_{A}P_{B} \rightarrow \ell ^{+}(k)\ell ^{-}(k')+{\mathcal {X}}, \end{aligned}$$

where \({\mathcal {X}}\) and the final-state lepton spins are unobserved and thus summed over. In the high energy limit \(k^{2} = k^{'2} =0\).

Let the total pair momentum \(Q=k+k'\). The azimuthal distribution of total pair momenta in the lab frame is isotropic. Lepton-pair angular distributions are described in the pair rest frame defined event-by-event. In this frame the pair momenta are back-to-back and equal in magnitude. The frame orientation depends on the beam momenta and the pair total momentum.

Defining momentumFootnote 2 observables via a Lorentz-covariant frame convention allows calculations to be done in any frame. In its rest frame the total pair momentum \(Q^{\mu } = (\sqrt{Q^{2}}, \, {\vec {Q}}=0)\). A set of xyz spatial axes in this frame will be defined by three 4-vectors \(X^{\mu }, \, Y^{\mu }, \, Z^{\mu }\), satisfying

$$\begin{aligned} Q\cdot X=Q\cdot Y =Q\cdot Z=0. \end{aligned}$$
(1)

The frame vectors being orthogonal implies

$$\begin{aligned} X\cdot Y=Y \cdot Z =X\cdot Z=0 . \end{aligned}$$
(2)

Taking \(P_{A}=(1, 0, 0, 1)\), \(P_{B}=(1, 0, 0, -1)\) (light-cone ± vectors), a frame satisfying the relations of Eqs. (1) and (2) is given byFootnote 3

$$\begin{aligned} \tilde{Z}^{\mu }&= P_{A}^{\mu } Q\cdot P_{B} - P_{B} ^{\mu } Q\cdot P_{A}; \\ \tilde{X}^{\mu }&= Q^{\mu }- P_{A}^{\mu } {Q^{2}\over 2 Q\cdot P_{A}} - P_{B} ^{\mu } {Q^{2}\over 2 Q\cdot P_{B} }; \\ \tilde{Y}^{\mu }&= \epsilon ^{\mu \nu \alpha \beta }P_{ A\nu }P_{B \alpha }Q_{\beta }. \end{aligned}$$

These frame vectors define the Collins–Soper (CS) frame.Footnote 4 The normalized frame vectors are

$$\begin{aligned} \left( X^{\mu }, \, Y^{\mu }, \, Z^{\mu }\right) = \left( {\tilde{X}^{\mu } \over \sqrt{-\tilde{X}\cdot \tilde{X}} }, \, {\tilde{Y}^{\mu } \over \sqrt{-\tilde{Y}\cdot \tilde{Y}} }, \, {\tilde{Z}^{\mu } \over \sqrt{-\tilde{Z}\cdot \tilde{Z}} }\right) . \end{aligned}$$

To analyze data for each event labeled J:

$$\begin{aligned} \text {Compute} \quad Q_{J}&=k_{J}+k_{J}';\, \ell _{J} =k_{J}-k_{J}'; \,(X_{J}^{\mu }, \, Y_{J}^{\mu }, \, Z_{J}^{\mu }) ; \nonumber \\ {\vec {\ell }}_{XYZ, J}&= ( X_{J}\cdot \ell _{J}, \, Y_{J}\cdot \ell _{J} , \, Z_{J}\cdot \ell _{J}); \nonumber \\ {\hat{\ell }}_{J}&=\vec {\ell }_{XYZ, \, J} /\sqrt{\vec {\ell }_{XYZ, \, J}\cdot \vec {\ell }_{XYZ, \, J}}. \end{aligned}$$
(3)

In fact, \({\hat{\ell }}_J = (\sin \theta \cos \phi , \,\sin \theta \sin \phi , \,\cos \theta )_J,\) where \(\theta \), \(\phi \) are the polar and azimuthal angles of one (e.g. plus-charge) lepton in the rest frame of Q. The meaning of a “Lorentz-invariant \({\cos }\) \(\theta \)” is a scalar \(Z_{\mu }(k-k')^{\mu }\) which becomes \({\hat{z}} \cdot \hat{(k - k')}\) in the rest frame of Q.

2.2 The angular distribution, in terms of the probe and target density matrices

The standard amplitude for inclusive production of a fermion–antifermion pair of spin s, \(s'\) has a string of gamma-matrices contracted with final-state spinors \(v_{a}(k's')\), \(\bar{u}_{b}(k, s)\). When the amplitude is squared, these factors appear bi-linearly, as in

Summing over unobserved s and dropping \(m \delta _{aa'}\), a form of density matrix appears:

$$\begin{aligned} \sum _{s} \, u_{a}(ks)\bar{u}_{a'}(ks) \rightarrow k_{\mu }\gamma _{aa'}^{\mu }. \end{aligned}$$

The Feynman rules for the density matrix of two relativistic final-state fermions (or antifermions, or any combination) is a factor given by

(4)

This fundamental equality is not present in pure state quantum systems. There is no spinor corresponding to a fermion averaged over initial spins, nor to a fermion summed over final spins.

As shown in the appendix, the rest of the cross section appears in the target density matrix \(\rho (X)\), which must have four indices to contract with the probe indices:

$$\begin{aligned}&\mathrm{d}\sigma \sim \sum _{aa'bb'} \, \rho _{aa', \, bb'}(k, \, k')\rho _{aa', \, bb'}(X) d\mathrm{LIPS} \nonumber \\&\quad = \mathrm{tr} \left( \rho (k, \, k')\rho (X) \right) d\mathrm{LIPS}, \end{aligned}$$
(5)

where \(d\mathrm{LIPS}\) is the Lorentz-invariant phase space.

Note that \( u_{a}(ks)\bar{u}_{a'}(ks)\) is not positive definite since the Dirac adjoint \( \bar{u}_{a'}(ks)= (u^{\dagger }(ks) \gamma _{0})_{a}\) has a factor of \( \gamma _{0}\), introduced by convention. Removing it, \(\sum _{s} \, u_{a}(ks)u_{a'}^{\dagger }(ks)\) becomes positive by inspection. (Any matrix of the form \(M \cdot M^{\dagger }\) has positive eigenvalues.) \(\rho (k, \, k')\) as written is not normalized, because the Feynman rules shuffle spinor normalizations into overall factors. To make the arrow in Eq. (4) into an equality, multiply on the right by \(\gamma _{0}\) twice, and standardize the normalizations. The same steps applied to \(\rho _{aa', \, bb'}(X)\) cancels the \(\gamma _{0}\) factors. The result is that the probability to find two fermions has the fundamental quantum mechanical formFootnote 5 \(P(k, \,k')=\mathrm{tr}(\rho (k, \, k')\rho (X))\).

The left side of Eq. (5) is \(\mathrm{d}\sigma (k, \, k')\), the same as the joint probability \(P(Q, \, \ell \big | \, \mathrm{init})\) where init are the initial-state variables. The phase space for two leptons converts as

$$\begin{aligned} k_{0}k_{0}' {\mathrm{d}\sigma \over \mathrm{d}^{3}k \mathrm{d}^{3}k'} = {\mathrm{d} \sigma \over \mathrm{d}^{4}Q \mathrm{d}\Omega } . \end{aligned}$$

We can write

$$\begin{aligned} P(Q, \, \ell \big | \, \mathrm{init})=P(\ell \big |Q, \, \mathrm{init})P(Q \big |\mathrm{init}). \end{aligned}$$

Here \(P(Q \big |\mathrm{init}) = \mathrm{d}\sigma / \mathrm{d}^{4}Q\), and \(P(\ell \big |Q, \, \mathrm{init})=\mathrm{d}N/\mathrm{d}\Omega \) is the conditional probability to find \(\ell \) given Q and the initial state. This factorization is general and unrelated to one-boson exchange, parton model, or other considerations. Since \(P(\ell \big |Q, \, \mathrm{init})\) is a probability, quantum mechanics predicts it is a trace:

$$\begin{aligned}&{\mathrm{d}N\over \mathrm{d} \Omega } = {1\over \sigma }{\mathrm{d}\sigma \over \mathrm{d}\Omega } = P(\ell \big |Q, \, \mathrm{init})={3\over 4 \pi }\mathrm{tr}(\rho (\ell ) \rho (X)), \end{aligned}$$
(6)

where tr indicates the trace, \(\mathrm{d} \Omega = \mathrm{d} \cos \theta \cdot \mathrm{d} \phi ,\) and \(\rho (\ell ),\) the probe, is a \(3\times 3\) matrix to be defined momentarily which depends only on the directions \({\hat{\ell }}_J\). The target hadronic system is represented by \(\rho (X)\). Since the probe \(\rho (\ell )\) is a \(3\times 3\) matrix, then \(\rho (X)\) is a \(3\times 3\) matrix of numbers.Footnote 6

The description has just been reduced from \(\rho _{aa', \, bb'}(k, \, k')\), a Dirac tensor with \(4^{4}\) possible matrix elements, to a \(3\times 3\) Hermitian matrix with 8 independent elements, since \(\mathrm{tr}(\rho (\ell ))=1\) is one condition. Equation 6 is the most general angular distribution that can be observed. It is valid for like sign and unlike sign pairs, and assumes no model for how the pairs are produced. The Dirac form (and Dirac traces) is over-complicated, because describing every possible exclusive reaction for every possible in and out state is over-achieved in the formalism.

2.3 The probe matrix

The probe matrix \(\rho (\ell )\) is given by

$$\begin{aligned} \rho _{ij}(\ell ) = {1+a\over 3}\delta _{ij} -a {\hat{\ell }}_{i}{\hat{\ell }}_{j} -\imath b \epsilon _{ijk}{\hat{\ell }}_{k}, \end{aligned}$$
(7)

which is derived in the appendix. The Standard Model predicts only two parameters, a and b. If on-shell lepton helicity is conserved (as in lowest-order production by a minimally coupled vector boson) then \(a =1/2\) and \(b = c_{A}c_{V}.\) The latter is not a prediction but a definition. If the production is parity-symmetric then \(c_{A}=0\). The only non-trivial prediction of the Standard Model is the value of \(c_{A}c_{V}\). Lowest-order production by Z bosons predicts \(b=\sin ^{2}\theta _{W}\sim 0.22\).

More generally, the probe matrix itself represents a reduced system that is unknown a priori. It should be determined experimentally. Consider the angular distribution of \(e^{+}e^{-} \rightarrow \mu ^{+}\mu ^{-}\). Let \(\rho (e; \, {\hat{z}})\) describe electrons with parameters \(a_{e}, \, b_{e}\) colliding along the z axis. Let \(\rho (\mu ; \, {\hat{\ell }})\) describe muons with parameters \(a_{\mu }, \, b_{\mu }\) emerging along direction \({\hat{\ell }}\). A short calculation using Eq. (7) twice givesFootnote 7

$$\begin{aligned}&{3\over 4 \pi }\mathrm{tr} \left( \rho (e; \, {\hat{z}}) \rho (\mu ; \, {\hat{\ell }}) \right) \nonumber \\&\quad = {3\over 4 \pi } \left( {1\over 3} + 2 b_{e}b_{\mu } {\hat{\ell }} \cdot {\hat{z}}+ a_{e}a_{\mu }(({\hat{z}}\cdot {\hat{\ell }})^{2} -1/3) \right) , \nonumber \\&\quad = {3\over 4 \pi }\left( {1\over 3} + 2 b_{e}b_{\mu }\cos \theta + a_{e}a_{\mu } (\cos ^{2}\theta -1/3) \right) . \end{aligned}$$
(8)

Fitting experimental data will give \( a_{e}a_{\mu }\) and \(b_{e}b_{\mu }\). If lepton universality is assumed the probe \(\rho (\mu ; {\hat{\ell }})\) has measured the probe \(\rho (e; {\hat{z}})\).

2.4 How tomography works: \(\mathrm{d}N/\mathrm{d}\Omega \) as a function of \(\rho (\ell )\), \(\rho (X)\)

Let \({\hat{G}}_{\ell }\) be a set of probe operators, with expectation values \(\left\langle G_{\ell }\right\rangle =\mathrm{tr}(G_{\ell }\rho (X))\). The trace defines the Hilbert–Schmidt inner product of operators. The condition for operators (matrices) to be orthonormal is

$$\begin{aligned} \mathrm{tr}(G_{\ell }G_{k}) = \delta _{\ell k} \quad \text {orthonormal matrices}. \end{aligned}$$
(9)

There are \(N^{2}-1\) orthonormal \(N\times N\) Hermitian operators, not including the identity. When a complete set of probe operators has been measured, the density matrix is tomographically reconstructed from observables as

$$\begin{aligned} \rho (X) = \sum _{\ell } \, G_{\ell }\mathrm{tr}(G_{\ell }\rho ) = \sum _{\ell } \, G_{\ell } \left\langle G_{\ell }\right\rangle . \end{aligned}$$

For a pure state density matrix, there exists a basis \(\{ G_{\ell } \}\) such that only one term appears in the sum over \(\ell \). Then \(\rho _{\mathrm{pure}} =|\psi \rangle \langle \psi |,\) and \(|\psi \rangle \) is reconstructed as the eigenvector of \(\rho _{\mathrm{pure}}\).

Each orthogonal probe operator measures the corresponding component of the unknown system, and is classified by its transformation properties. For angular distributions the transformations of interest are rotations. \(\rho (\ell )\) contains tensors transforming like spin-0, spin-1 and spin-2. Each tensor of a given type is orthogonal to the others.

Organizing transformation properties simplifies things significantly. Recall the general form of \(\rho (\ell )\), from Eq. (7). The most general form for \(\rho (X)\) that is observable will have the same general expansion, with new parameters:

$$\begin{aligned} \text {Probe:} \qquad \rho _{ij}(\ell )&= {1\over 3}\delta _{ij} +b {\hat{\ell }} \cdot {\vec {J}}_{ij} +a U_{ij}({\hat{\ell }}); \quad \nonumber \\ \text {where} \quad U_{ij}({\hat{\ell }})&= {\delta _{ij} \over 3} -{\hat{\ell }}_{i}{\hat{\ell }}_{j} =U_{ji}(\ell ); \, tr(U(\ell )) =0; \end{aligned}$$
(10)
$$\begin{aligned} \text {System:} \quad \rho _{ij}(X)&= {1\over 3}\delta _{ij} +{1\over 2}{\vec {S}} \cdot {\vec {J}}_{ij} +U_{ij}(X) ; \quad \nonumber \\ \text {where} \quad U(X)&=U^{T}(X); \quad tr(U(X))=0. \end{aligned}$$
(11)

These formulas reiterate Eq. (7) while identifying \((J_{k})_{ij} = -\imath \epsilon _{ijk}\) as the generator of the rotation group in the \(3\times 3\) representation.Footnote 8 Upon taking the trace as an inner product, orthogonality selects each term in \(\rho (X)\) that matches its counterpart in \(\rho (\ell )\). For example \({\vec {J}}\) is orthogonal to all the other terms except the same component of \({\vec {J}}\):

$$\begin{aligned}&{1\over 2} \mathrm{tr}(J_{i}J_{j}) =\delta _{ij} ;\quad \text {hence}\quad {1\over 2} \mathrm{tr}( {\hat{\ell }} \cdot {\vec {J}} \, {\vec {S}}\cdot {\vec {J}} ) = {\hat{\ell }} \cdot {\vec {S}}. \end{aligned}$$

Orthogonality makes it trivial to predict which density matrix terms can be measured by probe matrix terms. We call the matching of terms “the mirror trick”.

We now make several relevant comments about Eqs. (10) and (11):

  • All density matrices can be written as \(1_{N\times N}/N\) to take care of the normalization, plus a traceless Hermitian part. The unit matrix is the spin-0 part and invariant under rotations. The only contribution of the 1 terms is \(\mathrm{tr}( 1\times 1)/N^{2} =1/N\).

  • The textbook density matrix spin vector \({\vec {S}}\) consists of those parameters coupled to the angular momentum operator. This is also called the spin-1 contribution. The quantum mechanical average angular momentum of the system is

    $$\begin{aligned} \left\langle {\vec {J}}\right\rangle =\mathrm{tr}(\rho _{X}{\vec {J}}) ={\vec {S}}. \end{aligned}$$

    When the coordinates are rotated, the \({\vec {J}}\) matrices transform exactly so that \({\vec {S}}\) rotates like a vector under proper rotations, and a pseudovector under a change of parity.

  • The last term of Eq. (10), the spin-2 part, is real, symmetric and traceless. By the mirror trick it can only communicate with a corresponding spin-2 term in \(\rho (X)\) denoted \(U_{ij}(X)\), which is real, symmetric and traceless. It can be considered a measure of angular momentum fluctuations:

    $$\begin{aligned} \left\langle {1\over 2} \left( J_{i}J_{j}+ J_{j}J_{i} \right) - {1\over 3}{\vec {J}}^{2}\delta _{ij}\right\rangle = U_{ij}(X). \end{aligned}$$

    A common mistake assumes the quadrupole U should be zero in a pure “spin state”. Actually a pure state with \(|{\vec {S}}|=1\) has a density matrix

    $$\begin{aligned} \rho _{\mathrm{pure}, \, ij}({\vec {S}}) ={1\over 2} (\delta _{ij}- {\hat{S}}_{i}{\hat{S}}_{j})-{ \imath \over 2}\epsilon _{ijk}{\hat{S}}_{k}. \end{aligned}$$
    (12)

    For example, when \({\vec {S}} ={\hat{z}}\) the density matrix has one circular polarization eigenstate with eigenvalue unity, and two zero eigenvalues. Pure states exist with \({\vec {S}}=0\): They have real eigenvectors corresponding to linear polarization. From the spectral resolution \(\rho (X) =\sum _{\alpha } \, \lambda _{\alpha } |e_{a}\rangle \langle e_{\alpha }|\), there is no observable distinction between a density matrix and the occurrence of pure states \(|e_{a}\rangle \) with probabilities \(\lambda _{\alpha }\), which are the density matrix eigenvalues.

  • As it stands the \(U_{ij}\) matrices in Eqs. (10) and (11) have not been expanded in a complete set of symmetric, orthonormal \(3 \times 3\) matrices. Regardless \(\rho (X)\) can be fit to data whether or not an expansion is done. The purpose of such work is to complete the classification process to assist with interpreting data. We sketch the steps here. Details are provided in an appendix. Let \(E_{M}\) be a basis of traceless orthonormal matrices where \(U(\ell ) = \sum _{M} \, \mathrm{tr}(U(\ell )E_{M}) E_{M}\). This is the tomographic expansion of the probe. Choose \(E_{M}\) so the outputs are normalized real-valued spin-2 spherical harmonics \(Y_{M}(\theta , \, \phi )\). The expansion of the unknown system will be \(U(X) = \sum _{M} \, \mathrm{tr}(\rho (X)E_{M}) E_{M} =\sum _{M} \, \rho _{M}(X)E_{M}\). By orthogonality the spin-2 contribution to the angular distribution will be

    $$\begin{aligned} {\mathrm{d} N\over \mathrm{d} \Omega } \sim \mathrm{tr}(\rho (\ell ) \rho (X))_{spin-2} \sim \sum _{M} \, \rho _{M}(X) Y_{M}(\theta , \, \phi ). \end{aligned}$$

    Writing out the terms gives

    $$\begin{aligned} {\mathrm{d}N \over \mathrm{d} \Omega }=&{1\over 4 \pi }+\frac{3}{4 \pi }S_{x}\sin \theta \cos \phi \nonumber \\&+\frac{3}{4 \pi }S_{y}\sin \theta \sin \phi +\frac{3}{4 \pi }S_{z}\cos \theta \nonumber \\&+c\rho _{0} \left( {1\over \sqrt{3}} - \sqrt{3} \cos ^2\theta \right) - c \rho _{1} \sin (2 \theta ) \cos \phi \nonumber \\&+ c\rho _{2} \sin ^{2}\theta \cos (2 \phi ) \nonumber \\&+c\rho _{3} \sin ^{2}\theta \sin (2 \phi ) - c \rho _{4} \sin (2 \theta ) \sin \phi . \end{aligned}$$
    (13)

    The label X has been dropped in \(\rho _{M}\) and \(c= 3 /(8\sqrt{2} \pi ) \). Since \(E_{M}\) transform like \(Y_{M}\), the coefficients \(\rho _{M}\) transform under rotations like spin-2. That means \(\rho _{M} \rightarrow R^{(2)}_{MM \prime }\rho _{M \prime }(X)\), where \( R^{(2)}_{MM'}\) is a matrix available from textbooks [37]. The traditional \(A_{k}\), \(\lambda _{k}\) conventions do not use orthogonal functions. Transformations from the traditional conventions to the \(\rho _M\) convention are given in an appendix.

Note the transformation properties listed are exact. The systematic and statistical errors of a measurement appear in fitting \(\rho (X)\).

2.5 Fitting \(\rho (X)\), \(\mathrm{d}N/\mathrm{d}\Omega \)

Quantum mechanics requires \(\rho (X)\) must be positive, which means it has positive eigenvalues. Positivity produces subtle nonlinear constraints, similar to unitarity. In the \(3\times 3\) case the relations are generally cubic polynomials. Positivity is not the same concept as yielding a positive cross section, and generally is a more restrictive set of relations.Footnote 9 If density matrices are not used it is quite straightforward to fit data yielding a positive cross section while violating positivity.

Fortunately positivity can be implemented by the Cholesky decomposition of \(\rho _X\) [38], which is discussed in the appendix. For the \(3\times 3\) case it is

$$\begin{aligned} \rho (X)(m)&=M(m)\cdot M^{\dagger }(m); \nonumber \\ M(m)&= {1 \over \sqrt{ \sum _{k} m^{2}_k}} \left( \begin{array}{ccc} m_1 &{} m_4+i m_5 &{} m_6+i m_7 \\ 0 &{} m_2 &{} m_8+i m_9 \\ 0 &{} 0 &{} m_3 \\ \end{array} \right) , \end{aligned}$$
(14)

where the parameters \(-1 \le m_{\alpha } \le 1\).

Event-by-event \(\rho (\ell )\) is an array of numbers, and \(\rho (X)(m)\) is an array of parameters. The results are combined to make the Jth instance of \(tr(\rho _{J}(\ell )\rho (X)(m))\), where \(\rho (X)(m)\) has been parameterized in Eq. (14). Fit the \(m_{\alpha }\) parameters to the data set. For example, the log likelihood \({\mathcal {L}}\) of the set \(J=1 \cdots J_{\mathrm{max}}\) is

$$\begin{aligned} {\mathcal {L}}(m) =&\sum _{J}^{J_{\mathrm{max}}} \, \mathrm{log}\left( \mathrm{tr}\left( \rho ^{(J)}(\ell ) \cdot \rho (X)(m) \right) \right) \nonumber \\&+ J_{\mathrm{max}}log(3/4\pi ) . \end{aligned}$$
(15)

Sample code available onlineFootnote 10 carries out these steps, returning the parameters \(m_{\alpha }\). The details of cuts and acceptance appear in fitting the numbers \(m_{\alpha }\) using numbers for the lepton matrix \(\rho (\mathrm{lep})\) (not angles, nor trigonometric functions.) In one example with simulated Z-boson data we found

$$\begin{aligned}&\rho _{\mathrm{fit}}(X) = \left( \begin{array}{ccc} 0.5574 &{} 0.01399-0.07144 i &{} -0.004026+0.013487 i \\ 0.01399+0.07144 i &{} 0.4422 &{} 0.003138-0.002670 i \\ -0.004026-0.013487 i &{} 0.003138+0.002670 i &{} 0.0004268 \\ \end{array} \right) . \end{aligned}$$

Using the Standard Model parameters for \(\rho (\ell )\), Eqs. (3) and (7), the trace yields

$$\begin{aligned}&{\mathrm{d} N \over \mathrm{d} \Omega _{\mathrm{fit}}}\sim \mathrm{tr}(\rho (\ell )\rho (X))=0.5000+ 0.0007739 \sin (\phi ) \sin (\theta )\\&\quad +0.3090 \cos ^2(\phi ) \cos ^2(\theta ) +0.1904 \sin ^2(\phi ) \cos ^2(\theta )+\cdots \end{aligned}$$

where \(\cdots \) indicates several terms there is no need to write out. Integrated over \(\phi \), this expression becomes

$$\begin{aligned} {\mathrm{d}N_{\mathrm{fit}} \over \mathrm{d} \cos \theta } ={3\over 4 \pi }\left( 1.57 + 0.137 \cos \theta + 1.56 \cos ^{2} \theta \right) . \end{aligned}$$

A \(1+\cos ^{2}\theta \) distribution is the leading-order Drell–Yan prediction for virtual spin-1 boson annihilation, while \(0.137 \, \cos \theta \) represents a charge asymmetry.

It is trivial to go from \(\mathrm{tr}(\rho (\ell )\rho (X))\) to a conventional parameterization of an angular distribution by taking inner products of orthogonal functions. It is also easy to expand \(\rho (X)\) in a basis of orthonormal matrices with the same results. Note these steps are exact, and very different from fitting data to trigonometric functions in some convention, which tends to yield multiple solutions, along with violations of positivity, which can introduce pathological convention dependence. Perhaps struggles with convention dependence of quarkonium data [40, 41] are related to this. It would be interesting to investigate.

2.6 Summary of quantum tomography procedure

To analyze data for each event labeled J:

  • $$\begin{aligned} \text {Compute} \quad Q_{J}= & {} k_{J}+k_{J}';\,\,\ell _{J} =k_{J}-k_{J}';\,\, \; (X_{J}^{\mu }, \, Y_{J}^{\mu }, \, Z_{J}^{\mu }) ; \\ {\vec {\ell }}_{XYZ, J}= & {} ( X_{J}\cdot \ell _{J}, \, Y_{J}\cdot \ell _{J} , \, Z_{J}\cdot \ell _{J});\\ {\hat{\ell }}_{J}= & {} \vec {\ell }_{XYZ, \, J} /\sqrt{\vec {\ell }_{XYZ, \, J}\cdot \vec {\ell }_{XYZ, \, J}}. \end{aligned}$$
  • Make the lepton density matrix. For Z bosons in the Standard Model it is

    $$\begin{aligned} \rho _{ij}(\ell ) = {1 \over 2}(\delta _{ij} - {\hat{\ell }}_{i} {\hat{\ell }}_{j}) - 0.22\, \imath \epsilon _{ijk} {\hat{\ell }}_{k}. \end{aligned}$$
    (16)
  • For Beyond Standard Model (BSM) phenomenology, one can readily assign values for a and b according to model specifics.Footnote 11 See footnote 11 for details. The results are combined to make the Jth instance of \(\mathrm{tr}(\rho _{J}(\ell )\rho (X)(m))\), where \(\rho (X)(m)\) has been parameterized in Eq. (14). Fit the \(m_{\alpha }\) parameters to the data set. For example, the log likelihood \({\mathcal {L}}\) of the set \(J=1 \cdots J_{\mathrm{max}}\) is

    $$\begin{aligned} {\mathcal {L}}(m) =&\sum _{J}^{J_{\mathrm{max}}} \, \mathrm{log}\left( \mathrm{tr}\left( \rho ^{(J)}(\ell ) \cdot \rho (X)(m) \right) \right) \nonumber \\&+ J_{\mathrm{max}}\mathrm{log}(3/4\pi ) . \end{aligned}$$
    (17)

    Sample code available online (see footnote 10) carries out these steps, returning the parameters \(m_{\alpha }\).

2.7 Comments

  1. 1.

    The possible symmetries of \(\rho (\ell )\) enter here. Suppose \(c_{A}=0\). Then \(\rho (\ell )\) is even under parity, real and symmetric. The imaginary antisymmetric elements of \(\rho (X)\) are orthogonal, and contribute nothing to the angular distribution. When known in advance, the redundant parameters of \(\rho (X)\) can be set to zero while making the fit. (That does not mean unmeasured parameters can be forgotten when dealing with positivity.) In general a fitting routine will either report a degeneracy for redundant parameters, or converge to values generated by round-off errors. Degeneracy will always be detected in the Hessian matrix computed to evaluate uncertainties.

  2. 2.

    The normalization condition \( \sum _{k} \, m^{2} (k)=1\) can be postponed by removing \(1/\sqrt{m_{\alpha }^{2}}\) from Eq. (14), and subtracting \(J_{\mathrm{max}}\mathrm{log}( \sum _{k}^{J_{\mathrm{max}}} \, m^{2} (k))\) from the log likelihood (Eq. (15)). When that is done the fitted density matrix will not be automatically normalized, due to the symmetry \(\rho (X) \rightarrow \lambda \rho (X)\) of the modified likelihood. The density matrix becomes normalized by dividing by its trace. Incorporating such tricks improved the speed of the code available online (see footnote 10) by a factor of about 100.

  3. 3.

    Algorithms are said to compute a “unique” Cholesky decomposition, which would seem to predict \(m_{\alpha }\) given \(\rho (X)\). The algorithms choose certain signs of \(m_{\alpha }\) by a convention making the diagonals of M positive. However, that is not quite enough to ensure a numerical fit finds a unique solution.

    Table 1 Terms in the angular distribution with their properties under discrete transformations \(C_{\ell }\), P, and T. Here \(\ell \) stands for \({\hat{\ell }}\), \( X\ell \) stands for \({\hat{X}} \cdot {\hat{\ell }}=-X_{\mu } \ell ^{\mu }\), and so on with scalar normalization factors removed. T-odd scattering observables from the imaginary parts of amplitudes generally exist without violating fundamental T symmetry. See the text for more explanation

    The fundamental issue is that \(MM^{\dagger }=\rho (X)\) is solved by \(M= \sqrt{\rho (X)}\), and the square root is not unique. There are \(2^{N}\) arbitrary sign choices possible among N eigenvalues of \(\sqrt{\rho (X)}\). Forcing the diagonals of M to be positive reduces the possibilities greatly, and an algorithm exists to force a unique, canonical form of \(m_{\alpha }\) in a data fitting routine. We did not make use of such a routine, since fitting \(\rho (X)\) is the objective. Depending upon the data fitting method, increasing the number of ways for \(M(m_{\alpha })\) to make a fit sometimes makes convergence faster.

  4. 4.

    Let \(\left\langle \right\rangle _{\mathrm{exp}}\) stand for the expectation value of a quantity in the experimental distribution of events. By symmetry \( \left\langle {\hat{\ell }}\right\rangle _{\mathrm{exp}}\) and \(\left\langle {\hat{\ell }}_{i} {\hat{\ell }}_{j} \right\rangle _{\mathrm{exp}}\) are vector and tensor estimators, respectively, which must depend on the vector and tensor parameters \({\vec {S}}\), \(U_{ij}(X)\) in the underlying density matrix. A calculation finds

    $$\begin{aligned} \left\langle {\hat{\ell }}\right\rangle _{\mathrm{exp}}&= {1 \over J_{\mathrm{max}}}\sum _{J} {\hat{\ell }}_{J} = - {1 \over 4}{\vec {S}}; \\ \left\langle {\hat{\ell }}_{i} {\hat{\ell }}_{j} \right\rangle _{\mathrm{exp}}&={1 \over J_{\mathrm{max}}}\sum _{J} {\hat{\ell }}_{Ji}{\hat{\ell }}_{Ji} = {1\over 3}\delta _{ij} - {1\over 5} \mathrm{Re}[U_{ij}]. \end{aligned}$$

An estimate of \(\rho (X)\) not needing a parameter search then exists directly from data. However, positivity of \(\rho (X)\) is more demanding, and it is not automatically maintained by such estimates.

3 Results

3.1 Analysis bonuses of the quantum tomography procedure

3.1.1 Convex optimization

The issue of multiple solutions for \(\rho (X)\) is different. Multiple minima of \(\chi ^{2}\) statistics affects fits to cross sections parameterized by trigonometric functions. However, quantum tomography using maximum likelihood happens to be a problem of convex optimization. In brief, when \(\rho \) is positive then \(\left\langle e|\rho |e\right\rangle \) is a positive convex function of \(|e\rangle \). Then \(\mathrm{tr}(\rho (\ell ) \rho (X))\) is convex, being equivalent to a positively weighted sum of such terms. The logarithm is a concave function, leading to a convex optimization problem. That means that when \(\rho (X)\) is a local maximum of likelihood it is the global maximum. Exceptions can only come from degeneracies due to symmetry or an inadequate number of data points [42]. Convex optimization is important because without such a property the evaluation of high-dimensional fits by trial and error can be exponentially difficult.

3.1.2 Discrete transformation properties

Table 1 lists discrete transformation properties of all terms under parity P, time reversal T, and lepton charge conjugation \(C_{\ell }\). If leptons have different flavors (as in like or unlike sign \(e \mu \)) the \(C_{\ell }\) operation swaps the particle defining \({\hat{\ell }}\).

When coordinates XYZ are defined the direction of \({\hat{Y}} ={\hat{Z}}\times {\hat{X}}\) is even under time reversal and parity, which is exactly the opposite of X and Z. Then \({\vec {S}} \cdot {\hat{Y}}\) is T-odd, contributing the \(\sin \theta \sin \phi \) term.Footnote 12 The XY and ZY matrix elements of \(\rho (X)\) are also odd under T, contributing the terms shown. T-odd terms come from imaginary parts of amplitudes, which are generated by loop corrections in perturbative QCD. An excellent example of T-odd effects in QCD is given in Ref. [44] by Hagiwara et al.

Notice that every term in the lepton density matrix (Eq. (7)) is automatically symmetric under \(C_{\ell }P\). This is a kinematic fact of the lepton-pair probe which does not originate in the Standard Model. As a result the \(C_{\ell }P\) transformations of the angular distribution depend on the coupling to the unknown system. If overall CP symmetry exists the target density matrix will have CP odd terms where \(C_{\ell }P\) odd terms are found. In the Standard Model these \(\cos \theta \) and \(\sin \theta \cos \phi \) terms correspond to charge asymmetries of leptons correlated with charge asymmetries of the system, namely the beam quark and antiquark distributions.

While weak CP violation is a mainstream topic, P and CP symmetry of the strong interactions at high energies has not been tested [45]. The gauge sector of QCD is kinematically CP symmetric, because the non-Abelian \(\mathrm{tr}({\vec {E}}\cdot {\vec {B}})\) term is a pure divergence.Footnote 13 Higher-order terms in a gauge-covariant derivative expansion are expected to exist, and can violate CP symmetry [45].

However, measuring violation of CP or fundamental T symmetry in scattering experiments is invariably frustrated by the experimental impossibility of preparing time-reversed counterparts. Some ingenuity is needed to devise a signal. It appears that any signal will involve four independent 4-momenta \(p_{J}\) and a quantity of the form \(\Omega _{4}= \epsilon _{\alpha \beta \lambda \sigma } p_{1}^{\alpha }p_{2}^{\beta }p_{3}^{\lambda }p_{4}^{\sigma }\). For example a term going like \(\ell \cdot Y \sim \epsilon _{\alpha \beta \lambda \sigma }\ell _{\alpha }Q_{\beta }P_{A \lambda }P_{B \sigma }\) might possibly originate in fundamental T symmetry violation and might be mistaken for perturbative loop effects. A more creative road to finding CP violation involves two pairs with sum and difference vectors \(Q, \, \ell ; \, Q', \, \ell '\), and the scalar \(\epsilon _{\alpha \beta \lambda \sigma }\ell _{\alpha }Q_{\beta }Q_{\lambda }' \ell '_{ \sigma }\), which is even under C and odd under P. The pairs need not be leptons (although “double Drell Yan” has long been discussed) but might be (say) \(\mu ^{+}\mu ^{-} \pi ^{+}\pi ^{-}\). It would be interesting to explore further what a tomographic approach to such observables might uncover.

3.1.3 Density matrix invariants

We mentioned that scattering planes, trig functions, boosts and rotations could be avoided, and the examples show how. Once a frame convention is defined the lepton “coordinates” \(( X_{J}\cdot \ell _{J}, \, Y_{J}\cdot \ell _{J} , \, Z_{J}\cdot \ell _{J})\) are actually Lorentz scalars. However, they depend on the convention for XYZ, which is arbitrary. At least four different conventions compete for attention. Moreover, once a frame is chosen, at least two naming schemes (the “\(A_{k}\)” and “\(\lambda _{k}\)” schemes) exist to describe the angular distribution in terms of trigonometric polynomials..

Well-constructed invariants can reduce the confusion associated with convention-dependent quantities [43, 46,47,48]. Since \({\vec {S}}\) transforms like a vector its magnitude-squared \({\vec {S}}^{2}\) is rotationally invariant. The spin-1 part of \(\rho (X)\) does not mix with the real symmetric part under rotations. Since it is traceless, the real symmetric (spin-2) part has two independent eigenvalues, which are rotationally invariant.Footnote 14 Finally the dot-products of three eigenvectors \({\hat{e}}_{J}\) of the spin-2 part with \({\vec {S}}\) are rotationally invariant. Then \(({\hat{e}}_{j}\cdot {\vec {S}})^{2}\) are three invariants not depending on the sign of eigenvectors. That suggests six possible invariants, but \(\sum _{j} \, ({\hat{e}}_{j}\cdot {\vec {S}})^{2}={\vec {S}}^{2}\) makes the \({\vec {S}}\) invariants dependent, leaving five independent rotational invariants. That is consistent with counting eight real parameters in a \(3\times 3\) Hermitian matrix, subject to three free parameters of the rotation group, leaving \(8-3=5\) rotational invariants. The same counting for unitary transformations would leave only the two independent eigenvalues of the matrix.

Any function of invariants is invariant. The combinations below have useful physical interpretations:

  • The degree of polarization d is a standard measure of the deviation from the unpolarized case. It comes from the sum of the squares of the eigenvalues of \(\rho \) minus 1/3, normalized to the maximum possible:

    $$\begin{aligned}&d = \sqrt{ (3 \mathrm{tr}(\rho _X^{2})-1)/2} ,\\&\qquad \text {where} \quad 0 \le d \le 1. \end{aligned}$$

    When \(d=0\) the system is unpolarized, and when \(d=1\) the system is a pure state.

  • The entanglement entropy \({\mathcal {S}}\) is the quantum mechanical measure of order. The formula is

    $$\begin{aligned}&{\mathcal {S}}=-\mathrm{tr}(\rho _X \, \mathrm{log}(\rho _X)). \end{aligned}$$

    In terms of eigenvalues \(\rho _{\alpha }\), \({\mathcal {S}} = -\sum _{\alpha } \, \rho _{\alpha } \mathrm{log}(\rho _{\alpha })\). When \(\rho \rightarrow 1_{N\times N}/N\) the system is unpolarized, and \({\mathcal {S}}=\mathrm{log}(N)\). That is the maximum possible entropy, and minimum possible information. When \({\mathcal {S}}=0\) the entropy is the minimum possible, providing the maximum possible information, and the system is a pure state.

    It is instructive to interpret \(e^{{\mathcal {S}}}\) as the “effective dimension” of the system. For example the eigenvalues \((1/2 +b, \, 1/2-b, \, 0)\) occur in the density matrix of on-shell fermion annihilation with helicity conservation. One zero-eigenvalue describes an elliptical disk-shaped object. The entropy ranges from \({\mathcal {S}}=0\), (\(e^{{\mathcal {S}}}=1\) for \(b=1/2\), a one-dimensional stick shape) to \({\mathcal {S}}=\mathrm{log}(2)\), (\(e^{{\mathcal {S}}}=2\) for a disk-shaped object with maximum symmetry.) As expected, an unpolarized three-dimensional system has three equal eigenvalues, is shaped like a sphere, and \(e^{{\mathcal {S}}} \rightarrow 3.\)

    Figure 1 shows the entropy of the lepton density matrix \(\rho (\ell )\) (Eq. (7)) in the plane of parameters \((a, \, b)\). The matrix eigenvalues areFootnote 15 \((1/3 - 2 a/3, \, 1/3 + a/3 - b, \, 1/3 + a/3 + b)\). The triangular boundaries are the positivity bounds on these parameters, outside of which the entropy has an imaginary part. The corners of the triangle are pure states. The left corner represents a purely longitudinal polarization, \(\rho _{L}=|L\rangle \langle L|\) where \(|L >=(0, \, 0, \, 1)\) in a coordinate system where \({\hat{\ell }}={\hat{z}}\). The two right corners are purely circular polarizations, \(\rho _{\pm }=|\epsilon _{\pm }\rangle \langle \epsilon _{\pm }|\), where in the same coordinates \( \epsilon _{\pm } =(1, \, \pm \imath , \, 0)/\sqrt{2}.\) The interior lines \(a=\pm b, \, b=0\) represent maximal symmetry matrices having two equal eigenvalues. The figure also indicates the constraints of a positive distribution for the example of Eq. (8) assuming lepton universality. The values of a and b are actually unrestricted in all directions, so long as they lie within the bounding curves.

    The Standard Model leptons from lowest-order s-channel Z production have \(a=1/2, \, b=\sin ^{2}\theta _{W}\), which is shown in Fig. 1 as a dot. The edge \(a=1/2\) corresponds to on-shell helicity conservation, with eigenvalues \(0, \, 1/2 - \sin ^{2}\theta _{W}, \, 1/2 + \sin ^{2}\theta _{W}\). The \(a, \, b\) parameters of leptons from a different production process, or subject to radiative corrections, must still lie inside the triangle. Maximal symmetry with eigenvalues (1/2, 1/2, 0) occurs where the line of \(b=\sin ^{2}\theta _{W}\) just touches the \(b=-a\) line, which happens at \(\sin ^{2}\theta _{W}=1/4\). That is not far from the Standard Model value, which is very interesting. Since no established theory predicts \(\sin ^{2}\theta _{W}\) one cannot rule out a deeper connection.

    It is tempting but incorrect to assume the bounds discussed would apply to the same terms of a more general density matrix. For example, add \(-c {\hat{n}}_{i} {\hat{n}}_{j}\) to the expression in Eq. (7), where \({\hat{n}} \cdot {\hat{\ell }}=0\) and update the normalization condition. The resulting positivity region of \(a, \, b, \, c\) is shown in Fig. 2, which also shows the plane \(c=0\) equivalent to Fig. 1. At the extrema \(c=\pm 1\) the region of consistent \((a, \, b)\) parameters shrinks to single points.

    The matrix for \(\rho (X)\) computed earlier is an example where all terms in any standard convention happen to occur. By inspection this system (mostly quark–antiquark annihilation) is superficially much like the lepton one. The entropy of is 0.68 and \(e^{{\mathcal {S}}}=1.96\), and one eigenvalue is close to zero. Of course there is much more information in the other parameters, the orientation of eigenvectors, and in \({\vec {S}}\) and its magnitude.

Fig. 1
figure 1

Contours of constant entropy \({\mathcal {S}}\) of the lepton density matrix \(\rho (\ell )\) (Eq. (7)) in the plane of parameters \((a, \, b)\). Contours are separated by 1/10 unit with \({\mathcal {S}}=0\) at the central intersection. The horizontal dashed line shows the lowest-order Standard Model prediction \( b=\sin ^{2}\theta _{W}\). Annihilation with on-shell helicity conservation is indicated by the vertical dashed line \(a=1/2\). The left corner of the triangle is a pure state with longitudinal polarization, while the two right corners are pure states of circular polarization. The interior lines represent matrices with maximal symmetry, where two eigenvalues are equal. They cross at the unpolarized limit. The curved gray region represents the much less restrictive constraints of a positive distribution using Eq. (8) and lepton universality

Fig. 2
figure 2

Boundary of the positivity region of a density matrix depending on three parameters \(a, \, b, \, c\) described in the text. The two-dimensional region cut by the plane \(c=0\) corresponds to Fig. 1

4 Discussion

The quantum tomography procedure offers at least seven significant advantages over standard methods of analyzing the angular correlations of inclusive reactions:

  • Simplicity and Efficiency Tomography exploits a structured order of analysis. By construction, unobservable elements never appear.

  • Covariance Physical quantities are expressed covariantly every step of the way. That is not always the case with quantities like angular distributions.

  • Complete polarization information The unknown density matrix \(\rho (X)\) contains all possible information, ready for classification under symmetry groups.

  • Model-independence No theoretical planning, nor processing, nor assumptions are made about the unknown state. The process of defining general structure functions has been completely bypassed. It is not even necessary to assume anything about the spin of s- or t-channel intermediates. The observable target structures is always a mirror of the probe structure. The “mirror trick” is universal as described in Sect. 2.4.

  • Manifest positivity A pattern of misconceptions in the literature misidentifies positivity as being equivalent to positive cross sections. It is not difficult to fit data to an angular distribution and violate positivity. In fact, an angular distribution expressed in terms of expansion coefficients actually lacks the quantum mechanical information to enforce positivity.

  • Convex optimization The positive character of the density matrix leads to convex optimization procedures to fit experimental data. This provides a powerful analysis tool that ensures convergence.

  • Frame independence Once the unknown density matrix has been reconstructed, rotationally invariant quantities can be made by straightforward methods. This is illustrated in Sect. 3.1.3, which includes a discussion of the entanglement entropy.

Quantum tomography has already yielded significant results. Our tomographic analysis [43] of a recent ATLAS study of Drell–Yan lepton pairs with invariant mass near the Z pole [11] discovered surprising features in the density matrix eigenvalues and entanglement entropy. By way of advertising, we have also gained insight into the mysterious Lam–Tung relation [51, 52], including why it holds at NLO but fails at NNLO. These topics will be presented in separate papers.