Uncertainty Relations for Angular Momentum

In this work we study various notions of uncertainty for angular momentum in the spin-s representation of SU(2). We characterize the "uncertainty regions" given by all vectors whose components are the variances of the three angular momentum components. A basic feature of this set is a lower bound for the sum of the three variances. We give a method for obtaining optimal lower bounds for uncertainty regions for general operator triples, and evaluate these for small s. Further lower bounds are derived by generalizing the technique by which Robertson obtained his state-dependent lower bound. These are optimal for large s, since they are saturated by states taken from the Holstein-Primakoff approximation. We show that, for all s, all variances are consistent with the so-called vector model, i.e., they can also be realized by a classical probability measure on a sphere of radius √(s(s+1)). Entropic uncertainty relations can be discussed similarly, but are minimized by different states than those minimizing the variances for small s. For large s the Maassen-Uffink bound becomes sharp and we explicitly describe the extremalizing states. Measurement uncertainty, as recently discussed by Busch, Lahti and Werner for position and momentum, is introduced and a generalized observable (POVM) which minimizes the worst case measurement uncertainty of all angular momentum components is explicitly determined, along with the minimal uncertainty. The output vectors for the optimal measurement all have the same length r(s), where r(s)/s goes to 1 as s tends to infinity.


Introduction
The textbook literature on quantum mechanics seems to agree that the uncertainty relations for angular momentum, and indeed for any pair of quantum observables A, B, should be given by Robertson's [25] inequality

Δ²_ρ(A) Δ²_ρ(B) ≥ (1/4) |tr(ρ [A, B])|²,   (1)

valid for any density operator ρ, with Δ²_ρ(A) denoting the variance of the outcomes of a measurement of A on the state ρ. Perhaps the main reason for the ubiquity of this relation in textbooks is that it is such a convenient intermediate step to the proof of uncertainty relations for position and momentum. In that case the right-hand side is ℏ²/4, independently of the state ρ. For any pair A, B other than a canonical pair, however, the relation (1) makes a much weaker statement, requiring some prior information about the state. This begs the question: When and with what bounds is it true that the variances Δ²_ρ(A) and Δ²_ρ(B) cannot both be small in the same state? Robertson's relation supports no such conclusion, but on the other hand such a statement does hold in many situations. In fact, in a finite dimensional context it is true whenever A and B do not have a common eigenvector. In this paper we will provide optimal bounds for angular momentum components, establishing the methods for deriving optimal bounds in the general case along the way.
The second reason that (1) is unsatisfactory is that it addresses only the preparation side of uncertainty, in the sense loosely described in the question above. However, there is always also a measurement aspect to uncertainty, for which Heisenberg's γ-ray microscope [8] is a paradigm. The error-disturbance tradeoff would be stated as: the errors of a joint approximate measurement of A and B cannot both be small. Again, generic observables, and angular momenta in particular, satisfy non-trivial relations of this kind. Errors Δ(A) = Δ(B) = 0 can occur only if A and B commute, i.e., under an even more stringent condition than for preparation uncertainty. In this paper we will provide some sharp measurement uncertainty relations for angular momentum, establishing along the way some methods which may be of interest in more general cases.
There is a third reason that one should not be satisfied with (1) with A = L₁, B = L₂: it involves only two of the three components of angular momentum. But there is no reason that tradeoff relations as described above should not be stated for more than two observables. For angular momentum this seems especially natural. Moreover, it seems natural to state relations for all components simultaneously, i.e., not only for the three components along the axes of an arbitrarily chosen Cartesian reference frame, but for the angular momenta along arbitrary rotation axes, restoring the rotational symmetry of the problem.
Indeed the idea that uncertainty should involve just pairs of observables can be traced to Bohr's habit of expressing complementarity as a relation between 'opposite' aspects, like 'in vitro' and 'in vivo' biology. This dualistic preference had more to do with his philosophy than with the actual structure of quantum mechanics. Other founding fathers of quantum mechanics did not share this preference. As Wigner said in an interview [34] in 1963: I always felt also that this duality is not solved and in this I may have been under Johnny's (John von Neumann's) influence, who said, 'Well, there are many things which do not commute and you can easily find three operators which do not commute.' I also was very much under the influence of the spin where you have three variables which are entirely, so to speak, symmetric within themselves and clearly show that it isn't two that are complementary; and I still don't feel that this duality is a terribly significant and striking property of the concepts.
In this spirit, an uncertainty relation for triples of canonical operators was recently proposed and proved [16], and further generalizations are clearly possible. However, we will stick to angular momentum in this paper, and particularly seek to establish relations which do not break rotation invariance.
Our paper responds to an increasing interest in quantitative uncertainty relations. This interest is connected to an increasing number of experiments reaching the uncertainty dominated regime, so that rather than qualitative or order-of-magnitude statements one is more interested in the precise location of the quantum limits. Measurement uncertainty was made rigorous in [2,5,26]. There is also a controversial [3,4] state-dependent version [22]. That adequate uncertainty relations are sometimes better stated in terms of the sum of variances rather than their product has been noted repeatedly [10,14,20]. There has been some renewed interest also in the uncertainty between angular momentum and angular position [7], angular momentum of certain states [24] and other non-standard complementary pairs [18].

Setting and notation
In physics angular momentum appears as orbital or as spin angular momentum. Our theory applies to both, but it must be noted that the bounds obtained depend on the quantum number s for L². For example, there are states with vanishing orbital angular momentum uncertainties (precisely the rotation invariant ones, i.e., those with s = 0), but none for a s = 1/2 degree of freedom. Therefore, one first has to decompose the given space into irreducible angular momentum components (integer or half-integer), and then use the results for the appropriate s. Hence we will consider throughout a system of spin s, with s = 1/2, 1, 3/2, …, in its (2s + 1)-dimensional Hilbert space ℋ.
Rotation matrices, whether they are considered as elements of SO(3) or of SU(2), will typically be denoted by R, the corresponding matrix in the spin-s representation by U_R, and normalized Haar measure on SO(3) or SU(2) by dR. We will always set ℏ = 1. Observables are in general allowed to be normalized positive operator valued measures, with a typical letter F. For a self-adjoint operator A the spectral measure is an observable in this sense, denoted by E^A. For a component of angular momentum, i.e., A = e·L, we write E^e for short. For the unit vectors e_k along the coordinate axes we further abbreviate this to E^k.
For the variance of the probability distribution obtained by measuring F on a state (density operator) ρ we write

Δ²_ρ(F) = min_ξ ∫ (x − ξ)² tr(ρ F(dx)).

This minimum is taken over a quadratic expression in ξ, and it is attained when ξ is the mean value of the distribution. The most familiar case is that of the spectral measure of an operator A, in which case we abbreviate the variance by Δ²_ρ(A). Then the second moment ∫ x² tr(ρ E^A(dx)) = tr(ρ A²) can also be expressed in terms of A, and we get

Δ²_ρ(A) = tr(ρ A²) − (tr(ρ A))².

We say that a unit vector |φ⟩ ∈ ℋ is a maximal weight vector if, for some direction e,

e·L |φ⟩ = s |φ⟩.

This is the same as saying that, for some rotation R ∈ SU(2), |φ⟩ = U_R |s⟩ up to a phase. For such a vector we call ρ = |φ⟩⟨φ| a spin coherent state. These states are candidates for states of minimal uncertainty.
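These definitions are easy to check numerically. The sketch below (NumPy; the helper functions are our own illustration of the definitions) verifies that for the maximal weight vector |s⟩ of a spin-3/2 system one has Δ²(L₃) = 0 and Δ²(L₁) = Δ²(L₂) = s/2:

```python
import numpy as np

def spin_ops(s):
    """Matrices L1, L2, L3 for spin s in the basis |s>, |s-1>, ..., |-s>."""
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    c = np.sqrt(s * (s + 1) - m[1:] * (m[1:] + 1))
    Lp = np.zeros((d, d)); Lp[np.arange(d - 1), np.arange(1, d)] = c
    return (Lp + Lp.T) / 2, (Lp - Lp.T) / 2j, np.diag(m)

def variance(A, psi):
    mean = np.real(psi.conj() @ A @ psi)
    return np.real(psi.conj() @ (A @ A) @ psi) - mean ** 2

s = 1.5
L = spin_ops(s)
phi = np.zeros(int(round(2 * s)) + 1, dtype=complex)
phi[0] = 1.0                            # |s>: maximal weight vector for e = e3
v = [variance(Lk, phi) for Lk in L]     # variance triple of the coherent state
```

The triple (s/2, s/2, 0) obtained here is the reference point for the uncertainty bounds below.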

Summary of main results
We now describe the structure of our paper and the main results. Section 2: Preparation uncertainty. The basic object of study is the variance Δ²_ρ(e·L) of the angular momentum in direction e as a function of the unit vector e, especially properties which hold for an arbitrary state ρ. After clarifying some general features and explicitly solving the two cases s ≤ 1 (section 2.2), we look at the traditional setting of just two components L₁, L₂. The set of uncertainty pairs (Δ²_ρ(L₁), Δ²_ρ(L₂)) is studied, and the fact that not both variances can be small is found to be well expressed by a lower bound not on the product but on the sum of the variances,

Δ²_ρ(L₁) + Δ²_ρ(L₂) ≥ c₂(s).

We compute numerically (and exactly up to s = 3/2) the best constants c₂(s), and find that they asymptotically behave like c₂(s) ∝ s^{2/3}. For three components the uncertainty region is also studied in some detail. A prominent feature is again a linear bound [10],

Δ²_ρ(L₁) + Δ²_ρ(L₂) + Δ²_ρ(L₃) ≥ s,   (6)

which is very easy to prove (see section 2.1, (17)).
Turning to features of the whole function e ↦ Δ²_ρ(e·L), we show in section 2.5 that for any ρ there is at least one direction e such that Δ²_ρ(e·L) ≥ s/2. This bound is optimal, since it is saturated by spin coherent states. We generalize from the maximum (seen as the L^∞-norm of this function) to all L^p-norms (proposition 1). For large s, equation (6) suggests the scaling 1/s (section 2.7). Indeed, the scaled triples (Δ²_ρ(L₁)/s, Δ²_ρ(L₂)/s, Δ²_ρ(L₃)/s) converge to a limit set. The lower bound on the limit set is obtained by a generalization of Robertson's method for proving (1) (section 2.6), which for finite s reads

Δ²_ρ(L₁) + Δ²_ρ(L₂) + Δ²_ρ(L₃) + 4 Δ²_ρ(L₂) Δ²_ρ(L₃) ≥ s(s+1),   (39)

where the components are ordered so that Δ²_ρ(L₁) ≤ Δ²_ρ(L₂) ≤ Δ²_ρ(L₃). The upper bound in theorem 4 is provided by a family of states suggested by the Holstein-Primakoff approximation.
Section 3.1: Vector model and moment problems. We revisit the so-called vector model of angular momentum, a classical model which is still found in some textbooks. We show that it can correctly portray the moments up to second order (i.e., means and variances) of the angular momentum observables, but fails on higher moments and, of course, on correlations.
Section 3.2: Entropic uncertainty relations. We discuss entropic uncertainty relations only very briefly. We point out that the criteria 'variance' and 'entropy' may disagree on which of two distributions is 'more sharply concentrated'. This effect is illustrated by the uncertainty diagrams for s = 1. We show also that the general Maassen-Uffink bound [19], while suboptimal for s = 1, becomes sharp for s → ∞, and determine a family of states saturating it.
Section 4: Measurement uncertainty. We consider two measures for the deviation of an approximate observable from an ideal reference, called metric error and calibration error. We then discuss uncertainty relations for the joint measurement of all angular momentum components. The output of such an observable is an angular momentum three-vector η, from which one can obtain a measurement of the e-component (for any unit vector e) simply by taking e·η as the output. Such a marginal observable can in turn be compared with the quantum observable e·L. The uncertainty relation in this case gives a lower bound on the error in the worst case with respect to e. Our main result (theorem 12) is a determination of the optimal bound, and an observable saturating it. It turns out that the optimal observable is covariant with respect to rotations, and this implies that it simultaneously minimizes the maximal metric error and the maximal calibration error. All the output vectors have the same length r_min(s), which depends in a non-trivial way on s but is close to s for large s.

Preparation uncertainty
In this section we consider preparation uncertainty, i.e., a property of a given state ρ. The central object is the variance function

v(e) = Δ²_ρ(e·L),   (10)

defined for unit vectors e on the unit sphere. For the purposes of this section, this function summarizes all the uncertainty properties of the state ρ, and all results in this section are statements about properties of this function, which are valid for all ρ.
To visualize the function v, we can use a three-dimensional radial plot, i.e., the surface containing all vectors v(e) e as e runs over all unit vectors. A typical radial plot is shown in figure 1. Often we are also interested in the components with respect to some Cartesian reference frame. In this case the best visualization is an uncertainty diagram, which represents the possible pairs/triples etc of variances in the same state. In our case this will be the set of pairs (v(e₁), v(e₂)) or triples (v(e₁), v(e₂), v(e₃)). The diagrams for s = 1 are shown in figure 4. In these diagrams it can be seen that the uncertainty region is not convex in general. Since we are only interested in lower bounds, we therefore always take the monotone closure of the uncertainty region, i.e., we also include with every point the whole quadrant/octant of points in which one or more of the coordinates increase. This is described in more detail in section 2.3.
It turns out that after a rotation to suitable principal axes (which has already been carried out in figure 1), the function v depends only on three real parameters μ₁, μ₂, μ₃. To see this we introduce the 3 × 3 covariance matrix Λ = Λ(ρ) by

Λ_jk = (1/2) tr(ρ (L_jL_k + L_kL_j)) − tr(ρL_j) tr(ρL_k),

so that v(e) = e·Λe. Since the L_k transform as a vector operator (i.e., with respect to the spin-1 representation of SU(2)), by choice of an appropriate coordinate basis in ℝ³ we can diagonalize Λ, i.e., we can choose Λ_jk = μ_j δ_jk. The variances along the axes of any other frame, given by a rotation matrix R, are then v_j = Σ_k (R_jk)² μ_k. Now the squared rotation matrix is doubly stochastic, so by Birkhoff's theorem it is a convex combination of permutation matrices. We therefore find the variance triple in any basis in the convex hull of the six points arising from the triple of μ_k by permutation. These six points lie in a plane orthogonal to the vector (1, 1, 1), so they form a hexagon (see figure 2), which degenerates into a triangle if two of the μ_k are equal. One can easily check that the full hexagon is attained by squared rotation matrices.
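The reduction to the three parameters μ_k can be checked directly. The sketch below (NumPy; our own illustration, with the covariance matrix written in symmetrized form) builds Λ for a random spin-3/2 state, verifies Δ²_ρ(e·L) = e·Λe for a random direction, and confirms that the squared diagonalizing rotation is doubly stochastic, so the diagonal variances are a convex-like mixture of the μ_k:

```python
import numpy as np

def spin_ops(s):
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    c = np.sqrt(s * (s + 1) - m[1:] * (m[1:] + 1))
    Lp = np.zeros((d, d)); Lp[np.arange(d - 1), np.arange(1, d)] = c
    return [(Lp + Lp.T) / 2, (Lp - Lp.T) / 2j, np.diag(m)]

rng = np.random.default_rng(1)
L = spin_ops(1.5)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
ex = lambda A: np.real(psi.conj() @ A @ psi)

lam = np.array([ex(Lk) for Lk in L])
Cov = np.array([[ex((L[j] @ L[k] + L[k] @ L[j]) / 2) - lam[j] * lam[k]
                 for k in range(3)] for j in range(3)])

e = rng.normal(size=3); e /= np.linalg.norm(e)
Le = sum(e[k] * L[k] for k in range(3))
direct = ex(Le @ Le) - ex(Le) ** 2          # variance of e.L computed directly
quadratic = e @ Cov @ e                     # ... equals the quadratic form e.Cov.e

mu, O = np.linalg.eigh(Cov)                 # principal variances mu_k
B = O ** 2                                  # squared rotation: doubly stochastic
row_sums, col_sums = B.sum(axis=1), B.sum(axis=0)
```

The diagonal of Cov equals B applied to the vector μ, exactly as in the hexagon construction.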

Basic bounds
In the same simple way we can get an inequality for the variances along the three coordinate directions of a Cartesian coordinate system:

Δ²_ρ(L₁) + Δ²_ρ(L₂) + Δ²_ρ(L₃) = s(s+1) − |tr(ρL)|² ≥ s(s+1) − s² = s.   (17)

In both cases equality holds precisely for |tr(ρL)| = s, i.e., if ρ is an eigenstate of one of the operators e·L for the maximum eigenvalue m = s.
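The identity behind (17) can be confirmed numerically. The sketch below (NumPy; our own check) verifies Σ_k Δ²_ρ(L_k) = s(s+1) − |⟨L⟩|² ≥ s for random pure states of spin 2, with equality for the coherent state |s⟩:

```python
import numpy as np

def spin_ops(s):
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    c = np.sqrt(s * (s + 1) - m[1:] * (m[1:] + 1))
    Lp = np.zeros((d, d)); Lp[np.arange(d - 1), np.arange(1, d)] = c
    return [(Lp + Lp.T) / 2, (Lp - Lp.T) / 2j, np.diag(m)]

s = 2.0
L = spin_ops(s)
rng = np.random.default_rng(2)

def moments(psi):
    ex = lambda A: np.real(psi.conj() @ A @ psi)
    lam = np.array([ex(Lk) for Lk in L])
    vsum = sum(ex(Lk @ Lk) for Lk in L) - lam @ lam   # sum of the 3 variances
    return vsum, lam

identity_ok, bound_ok = True, True
for _ in range(300):
    psi = rng.normal(size=5) + 1j * rng.normal(size=5)
    psi /= np.linalg.norm(psi)
    vsum, lam = moments(psi)
    identity_ok &= abs(vsum - (s * (s + 1) - lam @ lam)) < 1e-10  # Casimir identity
    bound_ok &= vsum >= s - 1e-10            # |<L>| <= s gives the linear bound

coh = np.zeros(5, dtype=complex); coh[0] = 1.0    # |m = s>: the equality case
vsum_coh, _ = moments(coh)
```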

Special features for s = 1/2 and s = 1

For s = 1/2 it happens that L_j and L_k (i.e., up to a factor, the Pauli matrices) anticommute for j ≠ k, so that Δ²_ρ(L_k) = 1/4 − (tr(ρL_k))², and the uncertainty region is described by a triangle. The case s = 1 is still special because the 3 + 6 operators L_k and (L_jL_k + L_kL_j)/2 form a basis of the operators on ℂ³. Therefore, ρ can be reconstructed from λ and Λ(ρ) and, in particular, the set of pure ρ can be characterized in terms of conditions on the eigenvalues μ_k and λ_k. In order to analyze these conditions, one uses the explicit spin-1 representation of the group SU(2). In the three-component diagram the resulting curve, parameterized by τ, is a parabola lying in a diagonal plane. This parabola, and the two copies arising by coordinate permutation, are shown in figure 3, as well as the body of uncertainty triples of all pure states, which arises by adding to each point on the parabola the hexagon formed by its permutation orbit. A paper cutout model of this solid is provided as a supplement.

General minimization method
Consider now, a little more generally, any collection of Hermitian operators A₁, …, A_n, and ask which region Ω ⊂ ℝⁿ is filled by the tuples (Δ²_ρ(A₁), …, Δ²_ρ(A_n)) when ρ runs over the whole state space. We call Ω the uncertainty region of the operator tuple. Typically, this is not a convex set, because Δ²_ρ(A) contains a term quadratic in ρ, which consequently does not respect convex mixtures. Ω will be simply connected (as a continuous image of the state space), but beyond that there are few general facts. It can happen that starting from a point in the uncertainty region we can leave the region by increasing one of the coordinates, i.e., the region encodes upper bounds on variances as well as lower bounds. This is clearly not relevant to the theme of uncertainty relations, where we ask for universal lower bounds only. We can therefore consider the monotone closure of the uncertainty region, by including all points with larger uncertainties, i.e.
Ω⁺ = {x ∈ ℝⁿ | ∃y ∈ Ω: x_j ≥ y_j for all j}.   (20)

This is still not necessarily a convex set. We will denote the convex hull of Ω⁺ by Ω̂ and call it the lower convex hull of Ω (see figure 4). It is this set which has an efficient characterization. Indeed, as a closed convex set it is the intersection of all half spaces containing it, and the monotonicity condition restricts these half spaces to those whose normal vector w has all components non-negative. In other words,

Ω̂ = {x ∈ ℝⁿ | w·x ≥ m(w) for all w with w_j ≥ 0},  where  m(w) = inf_ρ Σ_j w_j Δ²_ρ(A_j).

For fixed mean values a_j this infimum is the ground state problem for H(w, a) = Σ_j w_j (A_j − a_j)², and for fixed ρ the optimal a_j are the means a_j = tr(ρA_j). An efficient algorithm is therefore obtained by alternating between these two steps. The upper estimates on m(w) obtained in this way are non-increasing and in practice converge quite well, independently of the starting value. However, we do not have a theorem to this effect. An analytic consequence of this algorithm (independent of convergence) is that we can restrict the infimum to pure states, since these suffice to attain the ground state energies.
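A minimal sketch of this alternating minimization (NumPy; the function names, stopping rule and starting values are our own choices): for weights w it alternates between solving the ground state problem for H = Σ_j w_j (A_j − a_j)² and updating a_j = ⟨A_j⟩. For the triple (L₁, L₂, L₃) of spin 1 and w = (1, 1, 1) it reproduces the sharp bound m(w) = s = 1 from (17):

```python
import numpy as np

def spin_ops(s):
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    c = np.sqrt(s * (s + 1) - m[1:] * (m[1:] + 1))
    Lp = np.zeros((d, d)); Lp[np.arange(d - 1), np.arange(1, d)] = c
    return [(Lp + Lp.T) / 2, (Lp - Lp.T) / 2j, np.diag(m)]

def m_of_w(ops, w, a0, iters=50):
    """See-saw upper estimate of m(w) = inf_rho sum_j w_j Var_rho(A_j)."""
    a = np.array(a0, dtype=float)
    I = np.eye(ops[0].shape[0])
    for _ in range(iters):
        H = sum(wj * (A - aj * I) @ (A - aj * I)
                for wj, A, aj in zip(w, ops, a))
        vals, vecs = np.linalg.eigh(H)
        psi = vecs[:, 0]                                # ground state, means fixed
        a = np.array([np.real(psi.conj() @ A @ psi) for A in ops])  # new means
    val = sum(wj * (np.real(psi.conj() @ (A @ A) @ psi) - aj ** 2)
              for wj, A, aj in zip(w, ops, a))          # sum of weighted variances
    return val, psi

L = spin_ops(1.0)
m_est, psi = m_of_w(L, (1.0, 1.0, 1.0), a0=(0.1, 0.0, 0.3))
```

Each iteration can only lower the estimate, which is why the algorithm gives valid tangent planes even without a convergence proof.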
The algorithm is then run for a suitable set of weight tuples w = (w₁, …, w_n), so that each run yields a tangent plane to Ω̂, but also the minimizing state ρ and with it the tuple of variances in Ω. We illustrate the results in figure 4 for the case of spin 1, and the operator tuples (L₁, L₂) and (L₁, L₂, L₃), respectively. For low spin these diagrams can be determined analytically (see the next subsection). The most prominent feature of the two-component diagram is the symmetric linear bound, which depends on s and is determined in section 2.4.

[Figure 4 caption: The monotone closure of the uncertainty region of spin 1. Since this turns out to be convex, it is equal also to the lower convex hull (see text). Left panel: two orthogonal spin components; the light gray area belongs to the monotone closure, but these points cannot be realized as uncertainty pairs; the parabolas outline the shape (compare also figure 3); the orange lines correspond to coherent states. Right panel: the analogue for three spin components; projecting this body onto one coordinate plane gives the shape shown in the left panel.]

The linear two-component bound
The constant c₂(s) is obtained as a double infimum, where the first infimum runs over all pure states (for fixed a, a ground state problem) and a over the reals (for fixed ψ, the expectation value of L₃). One notes that in this operator only matrix elements with even m − m′ are non-zero, so the problem can be further reduced. For up to s = 3/2 it effectively leads to two-dimensional ground state problems. In this way (resp. by using the results of section 2.2) we get the exact constants. Note that the bound c₂(1) was already obtained in [10]. It is readily seen numerically that c₂(s) increases with s, but sub-linearly. This means that if we scale the diagram of Ω̂ (see figure 4, right) by a factor 1/s so that the bottom triangle described by (17) stays fixed, the two-component inequality excludes an asymptotically small prism around the axes. Figure 5 shows the asymptotic behavior of c₂ in a log-log plot, which suggests that

c₂(s) ≈ 0.569524 s^{2/3}  for large s.   (28)

Power mean and maximal uncertainty
A natural way to characterize states with small variance is to look for the maximum of the variance function v(e) defined in (10). An uncertainty relation would then put a lower bound c(s) on this maximum. In other words we would like to prove the following statement: for every state ρ there is some direction e such that Δ²_ρ(e·L) is larger than c(s). By considering coherent states we can immediately see that c(s) ≤ s/2. The following proposition shows that coherent states in fact have minimal maximal variance, so that we even have equality, c(s) = s/2.
Such a result can be seen as one end of a one-parameter family of criteria, of which (16) is the other end: we can judge the 'size' of the function v by its L^p-norm, of which the maximum is the special case p = ∞ and the mean the case p = 1. We therefore formulate a proposition to cover all these cases.
Proposition 1. For each p ∈ [1, ∞] there is a constant c(p, s) such that, for every density operator ρ in the spin-s representation,

‖v‖_p ≥ c(p, s),

with equality whenever ρ is a spin coherent state. For p < ∞ these are the only states with equality. For p = ∞ equality holds also for mixtures ρ = p|+s⟩⟨+s| + (1 − p)|−s⟩⟨−s|.

Proof. Let λ, with λ_j = tr(ρL_j), be the vector of expectation values, and consider the set of density operators ρ_β arising from ρ by the rotation R_β around the vector λ by the angle β. Each ρ_β has the same norm ‖v‖_p as ρ, and the average ρ̄ over β satisfies v_ρ̄ = average of the v_{ρ_β}, where we used, crucially, that all ρ_β and ρ̄ have the same expectations λ_j. By the triangle inequality for the p-norm, ‖v_ρ̄‖_p ≤ ‖v_ρ‖_p. Hence we can restrict the search for the ρ with minimal ‖v‖_p to those which are rotation invariant around some axis, say the three-axis.
Such a state can be jointly diagonalized with L₃, and is hence of the form ρ = Σ_m p_m |m⟩⟨m|. The last equation shows that the function v becomes pointwise smaller (and hence smaller in p-norm) if we decrease the diagonal entries Λ_ii. That is, we have to go to the minimum of both Λ₁₁ and Λ₃₃. The minimum in (32) is attained precisely when p_m ≠ 0 only for m = ±s. Then minimality in (32) forces ρ to be a spin coherent state. For p = ∞ the norm only sees the maximum, so the pointwise minimum need not be chosen, and we may allow 0 ≤ Λ₃₃ ≤ Λ₁₁ without changing the maximum. The latter inequality translates to the condition given in proposition 1.
The concrete constants follow easily by integrating the pth power of (34) with Λ₁₁ = s/2 and Λ₃₃ = 0 with respect to the normalized surface measure on the sphere, i.e., by evaluating the sphere average of [(s/2)(1 − e₃²)]^p.
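The resulting integral is elementary. The sketch below (NumPy with Gauss-Legendre quadrature; our own check) evaluates the sphere average of v(e) = (s/2)(1 − e₃²) for p = 1 and recovers the mean value s/3, consistent with the linear bound (17), while the maximum gives c(∞, s) = s/2:

```python
import numpy as np

s, p = 1.5, 1
# sphere average reduces to an integral over u = e3 in [-1, 1]
u, wts = np.polynomial.legendre.leggauss(50)        # exact for polynomials
v = (s / 2) * (1 - u ** 2)                          # v(e) for a coherent state
c_p = (0.5 * np.sum(wts * v ** p)) ** (1.0 / p)     # normalized L^p average

grid = np.linspace(-1, 1, 1001)                     # grid contains u = 0
c_inf = np.max((s / 2) * (1 - grid ** 2))           # supremum norm = s/2
```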

Robertson's technique: a generalization
We have criticized the Robertson inequality (1) for not giving a state-independent bound. However, with only little effort it can be used to derive such a bound. Indeed, abbreviating v_i = Δ²_ρ(L_i) and λ_i = tr(ρL_i), Robertson's inequality gives v₁v₂ ≥ λ₃²/4 and its cyclic permutations. Clearly, this no longer allows v₁ = v₂ = v₃ = 0: that would force all λ_i = 0, contradicting Σ_i v_i + Σ_i λ_i² = s(s+1). The set of variance triples satisfying these conditions is shown in figure 6. Comparison with figure 4 readily shows that this bound is not optimal. However, we can generalize Robertson's technique from two to three components rather than extend his two-component result in this trivial way. The basis of the technique is the observation that for any finite collection of operators X_j (not necessarily Hermitian or normal) the matrix m_jk = tr(ρ X_j* X_k) is positive semidefinite, which is the same as saying that for any complex linear combination X = Σ_j a_j X_j the expectation of X*X must be positive. In order to get Robertson's inequality for L₁, L₂ this idea is applied to the three operators 𝟙, L₁, L₂.
In fact, this leads to Schrödinger's improvement of the inequality [27], which also contains the square of the covariance matrix element Λ₁₂(ρ) on the right-hand side.
We will apply the method to the four operators 𝟙, L₁, L₂, L₃. In order to simplify the expressions, however, we will not look for inequalities involving the variances and the off-diagonal elements of Λ(ρ), but for inequalities involving the eigenvalues μ_j. As discussed at the beginning of this section, these contain all the information needed. In other words, we will take the matrix Λ(ρ) to be the diagonal matrix with entries μ_j. The condition on the triples (μ₁, μ₂, μ₃) we have to evaluate is the existence of expectation values λ_i satisfying both positivity relations.
Since only the squares enter, let us set x_i = λ_i². The positivity conditions then describe a tetrahedron with axis intercepts x₁ = 4μ₂μ₃ and cyclic, while the trace constraint Σ_i x_i = s(s+1) − Σ_i μ_i describes a triangle. Note that Robertson's inequality is automatically satisfied on this tetrahedron. Obviously the tetrahedron and the triangle intersect if and only if one of the axis intercepts of the tetrahedron reaches or lies above the plane of the triangle. Since we can take the eigenvalues ordered, μ₁ ≤ μ₂ ≤ μ₃, the relevant intercept is the largest one. This is a bound on the eigenvalues of the Λ-matrix. By Birkhoff's theorem, the variances arising from such Λ also include all convex combinations of permutations of the μ_i (see the beginning of section 2). In order to characterize the set of variance triples generated in this way we need the following lemma. In its formulation the variables σ ∈ S₃ run over the permutation group on three elements, and are applied to the components of a three-vector (see also figure 2). Lemma 2. With the notation from above, the sets K₁ and K₂ are equal. Proof. For the equality of K₁ and K₂ it is sufficient to show that H₁(γ) and H₂(γ) coincide for every γ. The ordering restriction, with the three-fold symmetry of the problem, tells us that H₁(γ) and H₂(γ) are subsets of the triangle whose corners lie on the axes at a distance γ from the origin. In this triangle the ordering of the v_i and μ_i reduces H₁ and H₂ to the dashed subset marked in figure 6.
Now the first and last condition in the definition of H₂ can be combined into a quadratic condition. Because v₁ ≥ 0 we have to choose the positive sign in its solution, which means that H₂(γ) is the intersection of the triangle with three half-spaces whose boundaries v_i = c(γ) are marked as orange lines in figure 6.
Therefore we get the following statement: let v₁ ≤ v₂ ≤ v₃ be the ordered variances of the angular momentum components; then

v₁ + v₂ + v₃ + 4 v₂ v₃ ≥ s(s+1).   (47)

As one can see in figure 6, the boundaries of the corresponding uncertainty region on the coordinate planes are given by permutations of the hyperbolic curve 4 v₂ v₃ = s(s+1) − v₂ − v₃.
This uncertainty region is monotonely closed and given by the convex hull of the above hyperbolic curves. This is shown in figure 7.
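The generalized Robertson bound v₁ + v₂ + v₃ + 4v₂v₃ ≥ s(s+1) can be spot-checked in the principal axes of Λ. The sketch below (NumPy; our own illustration, using the symmetrized covariance matrix) draws random pure spin-3/2 states, sorts the eigenvalues μ₁ ≤ μ₂ ≤ μ₃ of Λ, and checks the inequality; coherent states saturate it:

```python
import numpy as np

def spin_ops(s):
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    c = np.sqrt(s * (s + 1) - m[1:] * (m[1:] + 1))
    Lp = np.zeros((d, d)); Lp[np.arange(d - 1), np.arange(1, d)] = c
    return [(Lp + Lp.T) / 2, (Lp - Lp.T) / 2j, np.diag(m)]

s = 1.5
L = spin_ops(s)
rng = np.random.default_rng(3)

def mu_sorted(psi):
    ex = lambda A: np.real(psi.conj() @ A @ psi)
    lam = np.array([ex(Lk) for Lk in L])
    Cov = np.array([[ex((L[j] @ L[k] + L[k] @ L[j]) / 2) - lam[j] * lam[k]
                     for k in range(3)] for j in range(3)])
    return np.sort(np.linalg.eigvalsh(Cov))   # principal variances, ordered

holds = True
for _ in range(400):
    psi = rng.normal(size=4) + 1j * rng.normal(size=4)
    psi /= np.linalg.norm(psi)
    v1, v2, v3 = mu_sorted(psi)
    holds &= v1 + v2 + v3 + 4 * v2 * v3 >= s * (s + 1) - 1e-9

coh = np.zeros(4, dtype=complex); coh[0] = 1.0    # coherent state (0, s/2, s/2)
v1, v2, v3 = mu_sorted(coh)
slack = v1 + v2 + v3 + 4 * v2 * v3 - s * (s + 1)  # vanishes: bound is saturated
```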

Asymptotic case
Now we take a look at the behavior of the asymptotic uncertainty region for s → ∞. We already know from (6) that the natural scale for the variances is s, and as s goes to infinity the set of possible scaled variance triples (v₁/s, v₂/s, v₃/s) shrinks to a limit region. Hence the inequality (47) gets stronger for increasing s. In this section we will show that this bound is attained by states which will be constructed in the following way. Using the technique described in section 2.3, we look for the states ψ which minimize the expectation of the operator H = Σ_i w_i (L_i − a_i)² for a normal vector w. We do this using the Holstein-Primakoff transformation [11]. Here a and a* are the annihilation and creation operators, so we have a representation of the angular momentum algebra in the oscillator basis. For large s and appropriate states, this transformation can be reduced to

L₃ = s − a*a,  L₁ ≈ √s X,  L₂ ≈ √s P,

with the quadratures X = (a + a*)/√2 and P = (a − a*)/(i√2). Notice that in the Holstein-Primakoff basis the spin coherent state |s⟩, in the standard angular momentum L₃ eigenbasis, is transformed to the ground state |0⟩. Now we rewrite H using the above transformation; here ξ, η and ζ denote the transformed expectation values. From section 2.1 we know that |s⟩ has minimal uncertainty for w = (1, 1, 1) and arbitrary s. Based on this observation we make the assumption that we are close to the L₃ spin coherent state. We thus have ⟨a*a⟩ ≪ s and λ₃ ≈ s, hence ζ is linear in s. Furthermore we can order the weights such that w₁ ≤ w₂ ≤ w₃ to minimize the expectation value. Now we take the limit: as s becomes large, the operator converges to a harmonic oscillator. Here we use that the ground state energy of the harmonic oscillator is translation-invariant in phase space, so that we can choose ξ and η to be zero. The state which minimizes the expectation of this operator is simply the harmonic oscillator ground state for the appropriate mass and frequency ω² = 4w₁w₂. In the following these will be combined in the squeezing parameter α.
For the comparison of this result with numerical calculations using the algorithm described above, we must express these ground states in a common basis |n⟩_HP, i.e., decompose ψ(α) in the basis of a harmonic oscillator with α = 1. The transformation coefficients vanish for odd n and can be computed in closed form for even n. The corresponding probability distribution p_n is therefore supported on the even integers, and setting n = 2k gives an explicit expression. The above approximation does not necessarily yield the optimal states, and it is not rigorously justified so far. As a first step, we compare the distribution p_n with numerically determined ones for finite s. These tend to converge, as shown in figure 8.
Theorem 4. The lower bound of the asymptotic uncertainty region on a scale of 1/s is fully described by the generalized Robertson inequality (48) and is saturated by the states ψ(α).

Proof. First we will show that the approximation (51) is justified for ψ(α) and evaluate the corresponding asymptotic variances. While the generalized Robertson inequality gets stronger for increasing s, every extremal point of the corresponding boundary is attained by some ψ(α), which will prove the above statement. Moreover, by truncating the sequence ψ_n(α) at n = 2s + 1 and renormalizing, we get a sequence of spin-s states well approximating ψ(α) as s goes to infinity. With this in mind, we will prove the above statement in two steps.

This may be a good place to comment on the so-called vector model of angular momentum, as it was suggested by old quantum theory. It still seems to be quite popular in teaching, although theoreticians tend to deride it as ridiculously classical and obviously inconsistent. Indeed, its two-particle version gives manifestly false predictions even for spin-1/2, as witnessed by Bell's (CHSH) inequality. Since any local classical model fails this test, not much can be learned about angular momentum from this observation. Therefore we consider here only the one-particle version, and try to sort out how far it can be trusted. The basic rationale of the vector model is shown in figure 9: angular momentum is thought of as a classical random variable taking values on a sphere of radius r_s = √(s(s+1)). For an eigenstate |m⟩ the corresponding classical distribution is supposed to be concentrated at latitude x₃ = m, and uniform with respect to rotations around the three-axis. The expectation value of this distribution is (0, 0, m). Moreover, its matrix of second moments is also diagonal, since the coordinate axes are clearly the inertial axes of a mass uniformly distributed on a circle of fixed latitude.
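This agreement at second order can be verified directly: for an eigenstate |m⟩ the quantum values are ⟨L₃⟩ = m, ⟨L₃²⟩ = m² and ⟨L₁²⟩ = ⟨L₂²⟩ = (s(s+1) − m²)/2, and the classical circle at height m on the sphere of radius √(s(s+1)) gives exactly the same numbers. A sketch (NumPy; our own check):

```python
import numpy as np

def spin_ops(s):
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    c = np.sqrt(s * (s + 1) - m[1:] * (m[1:] + 1))
    Lp = np.zeros((d, d)); Lp[np.arange(d - 1), np.arange(1, d)] = c
    return (Lp + Lp.T) / 2, (Lp - Lp.T) / 2j, np.diag(m)

s, m = 2.0, 1.0
L1, L2, L3 = spin_ops(s)
d = int(round(2 * s)) + 1
psi = np.zeros(d, dtype=complex)
psi[int(round(s - m))] = 1.0                # the eigenstate |m>
ex = lambda A: np.real(psi.conj() @ A @ psi)

# classical circle: x(t) = (rho cos t, rho sin t, m), rho^2 = s(s+1) - m^2
rho2 = s * (s + 1) - m ** 2
classical = {"x3": m, "x3^2": m ** 2, "x1^2": rho2 / 2}   # E[cos^2 t] = 1/2
quantum = {"x3": ex(L3), "x3^2": ex(L3 @ L3), "x1^2": ex(L1 @ L1)}
```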
One readily checks that all second moments are the same as for the corresponding quantum state. This can be generalized: let K denote the set of pairs (m, M) of first moments m and second moment matrices M realizable by probability measures on the sphere. This is a compact convex set, which we can think of as embedded into (3 + 6)-dimensional real space, because the real symmetric matrix M is specified by six parameters, and we have an additional linear constraint tr M = r². By the separation theorems for compact convex sets, the set K is therefore completely characterized by a collection of affine inequalities

f(m, M) = tr(AM) − b·m − γ ≥ 0,

with the dot indicating scalar product, A a real symmetric matrix, b ∈ ℝ³ and γ ∈ ℝ. The functionals for which these inequalities have to be satisfied are precisely those for which the inequality holds for all pure probability measures, i.e., for all point measures on the sphere. In this case we slightly abuse notation and write f(m, M) = f(v) for the point v on the sphere. Not all inequalities are needed to characterize K, but only the extremal ones, which furnish a minimal subset from which all the others follow as linear combinations with positive scalar factors. In particular, we can assume that f is not strictly positive, so it has a zero f(u) = 0, which then also has to be a minimum. The extremality condition gives 2Au − b = 2λu, where λ ∈ ℝ is a Lagrange multiplier. This determines b, and from f(u) = 0 we get γ, so that we can rewrite f in the form (67), with A modified by a multiple of the identity, and λ = 0.

It remains to determine all real symmetric matrices A for which f is non-negative for all multiples of vectors of the form v − u. But this set of vectors is dense. Since the second term is anyhow positive, the positive semidefiniteness of V is sufficient for all these inequalities. This shows the sufficiency of the conditions stated in the lemma. Let us make some remarks, which all fit into a fruitful analogy with the phase space case, i.e., the case of two canonical operators P, Q, and moment problems posed in the respective contexts.
(1) The phase space analogue of proposition 5 is the statement that for any quantum state the first and second moments can also be realized by a classical probability distribution on phase space. Of course, not all classically allowed first and second moments can arise in this way: this is just the theme of preparation uncertainty relations.
(2) The classical probability measure μ is not uniquely determined by ρ. For example, the density operator ρ = (2s+1)⁻¹𝟙 can be represented either by the uniform distribution on the sphere, or by an equal-weight mixture of the distributions with constant latitude m (in any direction). In the phase space case it is well known that with the given, quantum-realizable moments one can always find a Gaussian state, which is defined as the distribution with maximal entropy given those moments. The same idea also works for angular momentum, and it gives probability densities which are the exponential of a quadratic form in the variables. In contrast to the phase space case, when approaching eigenstates (any direction, any m) this entropy goes to −∞, since for eigenstates only the singular measures depicted in figure 9 can be used.
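The non-uniqueness described in this remark is easy to verify at the level of second moments. A minimal numerical check (our own sketch; the choice s = 3/2 is arbitrary) confirms that both classical representations reproduce the quantum second moment of the maximally mixed state:

```python
import numpy as np

s = 3 / 2
d = int(round(2 * s + 1))
m = s - np.arange(d)                    # eigenvalues of L3
Lz = np.diag(m)

# quantum: tr(L3^2 / (2s+1)) for the maximally mixed state
quantum = np.trace(Lz @ Lz) / d                       # equals s(s+1)/3

# representation 1: uniform measure on the sphere of radius sqrt(s(s+1))
uniform = s * (s + 1) / 3                             # <x_3^2> = r^2 / 3

# representation 2: equal-weight mixture of the fixed-latitude measures
mixture = np.mean(m ** 2)                             # <x_3^2>

assert np.isclose(quantum, uniform) and np.isclose(quantum, mixture)
```

By rotational symmetry the same agreement holds for the other two components.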
(3) Proposition 5 is certainly false if we include higher than second moments. For example, consider a pure qubit state with m = +1/2.
Without loss of generality we can choose the measure μ invariant under rotations around the three-axis. Since μ must be concentrated on the latitude m = 1/2, this uniquely fixes the measure μ, and hence the moments to all orders. Now consider a direction e at an angle strictly between 0 and π/2 to e₃. Then, since 4(e·L)³ = e·L for spin-1/2, the quantum expectation of (e·L)³ is (e·e₃)/8, but the classical expectation of (e·x)³ is larger, reflecting the nonlinearity of the cube function.
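This third-moment discrepancy can be verified directly. The following sketch (the angle value θ = 1 is an arbitrary choice of ours) compares the quantum and vector-model expectations for spin-1/2:

```python
import numpy as np

# spin-1/2: L_i = sigma_i / 2, and the m = +1/2 eigenstate of L3
sx = np.array([[0.0, 1.0], [1.0, 0.0]]) / 2
sz = np.array([[1.0, 0.0], [0.0, -1.0]]) / 2
up = np.array([1.0, 0.0])

theta = 1.0                                  # invented angle between e and e3
e_L = np.cos(theta) * sz + np.sin(theta) * sx
quantum = up @ np.linalg.matrix_power(e_L, 3) @ up
assert np.isclose(quantum, np.cos(theta) / 8)   # since (e.L)^3 = (e.L)/4

# vector model: x3 = 1/2 fixed, (x1, x2) uniform on the circle of radius
# sqrt(s(s+1) - 1/4) = sqrt(1/2), on the sphere of radius sqrt(3)/2
phi = np.linspace(0.0, 2 * np.pi, 20001)[:-1]
x = np.cos(theta) * 0.5 + np.sin(theta) * np.sqrt(0.5) * np.cos(phi)
classical = np.mean(x ** 3)

assert classical > quantum + 1e-6     # the classical third moment is larger
```

The classical value exceeds the quantum one by (e·e₃)sin²θ/4, so the vector model indeed fails at third order.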
(4) The quantum analogue of the classical Hamburger moment problem would be to reconstruct a quantum state from the set of moments, i.e., the expectations of the monomials in the basic operators (P, Q, or L₁, L₂, L₃). Commutation relations impose some constraints on these moments, so that in the end only monomials like L₁^{n₁}L₂^{n₂}L₃^{n₃} need to be considered. Of course, the expectation values of such operators will in general be complex numbers. Can we do the reconstruction for arbitrary states in the spin-s representation? Indeed we can, and it is actually much easier than in the phase space case, since only finitely many moments suffice. The basic observation is that the moments fix all expectations on the von Neumann algebra 𝒜 generated by the Lᵢ. Because the representation is irreducible, the commutant of this algebra consists of the multiples of the identity. Hence 𝒜 must be the full matrix algebra, and the state is uniquely determined. That finitely many moments suffice is clear because dim ℋ < ∞.

(5) Noncommutative moment problems are plagued by 'operator ordering' issues. But in some sense we have already adopted a standard 'symmetrized' solution for operator ordering, namely to form moments only of the operators e·L for all fixed e. This is analogous, in the phase space case, to considering the moments of linear combinations of P and Q. Now, famously, the full distributions of all such combinations are correctly rendered by the Wigner distribution function, which is itself hardly ever positive [15]. The analogy to the angular momentum case is immediate. So what do we get if we accept 'quasi-probability distributions'? Can every state be represented like that? This is answered by the following proposition.

Proof. We can compute the Fourier transform of ρ̃ directly from (69), by multiplying with (ik)ⁿ/n! and summing over n.
This turns the left-hand side into the Fourier integral over ρ̃, allowing the sum to be evaluated also on the right-hand side. Strictly speaking, this computation should be regularized by multiplying with an arbitrary test function before summation, but this leads to the same explicit representation of the Fourier transform of ρ̃ as a bounded L^∞-function. This shows that the desired tempered distribution ρ̃ is essentially unique, and can be defined for every ρ. It is formally real, because replacing k by −k yields the complex conjugate. The main use of Wigner functions on phase space is the visualization of quantum states. Unfortunately, the much more singular nature of ρ̃ for angular momentum will prevent these Wigner functions from becoming similarly popular. This irregularity can be tamed by replacing the exponentials on the right-hand side of (70) according to (73). The corresponding distribution is then a sum of point measures sitting on a finite cubical grid [6]. This may actually be useful in quantum information, where it relates to a discrete phase space structure over the cyclic group of d = 2s + 1 elements. However, for angular momentum proper we find this breaking of rotational symmetry abhorrent.
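The moment reconstruction described in remark (4) can be illustrated numerically: for spin s the ordered monomials L₁^{n₁}L₂^{n₂}L₃^{n₃} with nᵢ ≤ 2s span the full matrix algebra, so a density matrix can be recovered from its moments by solving a linear system. A sketch under these assumptions (the helper `spin_ops` and the choice s = 1 are ours):

```python
import numpy as np
from itertools import product

def spin_ops(s):
    """Spin-s matrices in the L3 eigenbasis (hypothetical helper)."""
    d = int(round(2 * s + 1))
    m = s - np.arange(d)
    Lz = np.diag(m).astype(complex)
    Lp = np.zeros((d, d), dtype=complex)
    for i in range(1, d):
        Lp[i - 1, i] = np.sqrt(s * (s + 1) - m[i] * (m[i] + 1))
    return (Lp + Lp.conj().T) / 2, (Lp - Lp.conj().T) / 2j, Lz

s = 1
d = int(round(2 * s + 1))
Lx, Ly, Lz = spin_ops(s)

# ordered monomials L1^a L2^b L3^c; powers up to 2s suffice to span
monomials = [np.linalg.matrix_power(Lx, a) @ np.linalg.matrix_power(Ly, b)
             @ np.linalg.matrix_power(Lz, c)
             for a, b, c in product(range(d), repeat=3)]

rng = np.random.default_rng(0)
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T
rho /= np.trace(rho).real                    # a random density matrix

moments = np.array([np.trace(rho @ M) for M in monomials])

# tr(rho M) is linear in rho: solve the consistent linear system for rho
B = np.array([M.T.flatten() for M in monomials])
sol, *_ = np.linalg.lstsq(B, moments, rcond=None)
rho_rec = sol.reshape(d, d)
assert np.allclose(rho_rec, rho)             # the moments determine the state
```

The system is overdetermined (27 moments, 9 unknowns) but consistent, and the spanning property makes the solution unique.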

Entropic uncertainty
In this section we take a look at entropic uncertainty relations. Given a measurement of a Hermitian operator A = Σᵢ aᵢPᵢ with eigenprojections Pᵢ, the probability of obtaining the ith measurement outcome will be denoted by pᵢ(ρ) = tr(ρPᵢ), and the associated probability distribution by p(ρ) = (p₁(ρ), p₂(ρ), …), whose Shannon entropy H(A, ρ) serves as an uncertainty measure. Note that we normalize the Shannon entropy by its maximal value log d, so that all occurring entropies are bounded by 1. In contrast to the variance, the entropy of a probability distribution does not change under permuting or rescaling the measurement outcomes, and so depends only on the choice of the Pᵢ (up to permutations) and not on the eigenvalues aᵢ. This implies that an entropic uncertainty relation, which constrains the output entropies of two (for simplicity non-degenerate) observables A, B, depends only on the unitary operator U connecting the respective eigenbases. A well-known bound in this setting is the general Maassen-Uffink bound [19]

H(A, ρ) + H(B, ρ) ≥ −(2/log d) log maxⱼₖ |Uⱼₖ|.

For angular momentum components the relevant U is a rotation in the spin-s representation; for arbitrary angles these representations are called Wigner-D matrices [1] and will also be used in section 4.5. It turns out that the Maassen-Uffink bound is in general not optimal, but describes precisely the uncertainty region for s → ∞. For spin s = 1, the uncertainty region can still be reliably investigated by parameterizing the set of pure states in the L₃ eigenbasis. Numerics suggests that real-valued states and their permutations characterize the lower bound of the uncertainty region. The resulting uncertainty regions for two and three components are shown in figure 10, which should be compared directly to figures 4(left) and 3(right).
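For s = 1 the normalized Maassen-Uffink bound is easy to evaluate and to test on random pure states. A minimal numerical sketch (our own; any two components are related by a rotation, so we use the pair L₁, L₃ for convenience, with the L₁ eigenbasis obtained by diagonalization):

```python
import numpy as np

def spin_ops(s):
    """Spin-s matrices in the L3 eigenbasis (hypothetical helper)."""
    d = int(round(2 * s + 1))
    m = s - np.arange(d)
    Lz = np.diag(m).astype(complex)
    Lp = np.zeros((d, d), dtype=complex)
    for i in range(1, d):
        Lp[i - 1, i] = np.sqrt(s * (s + 1) - m[i] * (m[i] + 1))
    return (Lp + Lp.conj().T) / 2, (Lp - Lp.conj().T) / 2j, Lz

def entropy(p, d):
    """Shannon entropy normalized by its maximal value log d."""
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)) / np.log(d))

d = 3                                    # spin s = 1
Lx, _, Lz = spin_ops(1)
_, U = np.linalg.eigh(Lx)                # columns: L1 eigenbasis in the L3 basis

c = np.max(np.abs(U))                    # largest overlap between the two bases
mu_bound = -2 * np.log(c) / np.log(d)    # normalized Maassen-Uffink bound

rng = np.random.default_rng(1)
for _ in range(500):
    psi = rng.normal(size=d) + 1j * rng.normal(size=d)
    psi /= np.linalg.norm(psi)
    H3 = entropy(np.abs(psi) ** 2, d)                  # outcome entropy of L3
    H1 = entropy(np.abs(U.conj().T @ psi) ** 2, d)     # outcome entropy of L1
    assert H1 + H3 >= mu_bound - 1e-9
```

For this pair the largest overlap is 1/√2, so the normalized bound is log 2 / log 3 ≈ 0.63; the random scan confirms that no pure state violates it.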
The marked lines in figure 10 correspond to states of the form (cos t/√2, sin t, cos t/√2). For larger s this inversion no longer holds; figure 11 shows this effect. Because we can exchange the roles of L₁ and L₂ by a unitary rotation, the uncertainty diagrams are symmetric with respect to the diagonal. Therefore the optimal linear bound must be of the form (75), with a suitable c. The entropy sums for the eigenstates |m⟩ with minimal and maximal |m| are shown in figure 11. For all half-integer s, and for integer s > 7, the coherent state |s⟩ produces not only the lowest variance but also the lowest entropy. Figure 11 also shows the Maassen-Uffink bound, which has been computed by Sánchez-Ruiz [29]; it is attained for the overlap of two spin coherent states. However, for large s the bound is again optimal, as the following result shows.

Proposition 8. In the limit s → ∞ the optimal lower bound on the entropic uncertainty region of L₁ and L₂ is given by the Maassen-Uffink inequality, which converges to the right-hand side of (78).
In order to show that this bound describes the asymptotic uncertainty region, we have to exhibit sequences of states saturating it at every point of the boundary curve. We first show that the endpoint (0, 1/2) is asymptotically attained by the L₁-eigenstates |s⟩₁: the output entropy of |s⟩₁ in the L₁ basis is always zero, whereas the output entropy in the L₂ basis can be evaluated explicitly.

Measurement uncertainty

Introduction
As mentioned in the introduction, a measurement uncertainty relation is a quantitative bound on the accuracy with which two observables can be measured approximately on the same device. Already in Kennard's 1927 paper [17] it is clearly stated that in quantum mechanics the notion of a 'true value' loses its meaning, so that we should not think of 'measurement error' as the deviation of the observed value from a true value. What we can always do, however, is to compare the performance of two measuring devices, one of which is a (perhaps hypothetical) 'ideal' measurement and the other an approximate one. The only requirement is that these two measurements give outputs which lie in the same space X and whose distance is somehow defined. A good approximate measurement is then one which gives, on every input state, almost the same output distribution as the ideal one. This operational focus on the output distributions is also in keeping with the way one would detect a disturbance of the system. Consider how we discover that trying to detect through which of two slits the particles pass disturbs them: the interference pattern, i.e., the output distribution of the interferometer, is changed and the fringes are washed out. Two related ways to build up a quantitative comparison of distributions, and thereby a quantitative approximation measure between observables, were introduced in the papers [2, 5] and applied to the standard situation of a position and a momentum operator. These two notions, called calibration error and metric error, will be described in the following subsections. Either way we get a natural figure of merit for an observable F jointly measuring two or more components of angular momentum. In fact, we will only treat the case where F jointly measures all components. By this we simply mean an observable whose output is not a single number but a vector η ∈ ℝ³. From this one derives a 'marginal measurement' F_e of the e-component by post-processing, i.e., by taking the e-component e·η of the output vector as the output of F_e. These marginals can then be compared with the standard projection valued measurement of the angular momentum component e·L.
When D(G, E) is the quantity chosen to characterize the error of an observable G with respect to the ideal reference E, we get in our special case the worst case over all directions, (85), as the desired figure of merit. This is the quantity which we will minimize. But first we have to be more explicit about the two choices for the error quantity D(G, E). This will be done in the next two subsections.

Calibration error
The simpler notion assumes that the 'ideal' observable is projection valued, so that we can produce states which have a very narrow distribution around one of its eigenvalues (or points in the continuous spectrum). In other words, we have some states available which come close to having a 'true value', in the sense that the ideal distribution is sharp around a known value. A good approximate measurement should then have an output distribution which is also well peaked around this value. Thus we only have to compare probability distributions to δ-function like distributions, i.e., point measures δ_x with x ∈ X. This is straightforward, and we set, for any probability measure μ,

D_α(μ, δ_x)^α = ∫ D(x, y)^α μ(dy), (86)

where D under the integral is the given metric on X. This could be called the power-α deviation of μ from the point x. We are mostly interested in quadratic deviations, i.e., α = 2. However, in this section we keep α general, which causes no extra difficulty, but makes clear which occurrences of the number '2' arise directly from the role of the averaging power α in (86) and similar equations.
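On a finite outcome set the integral in (86) reduces to a finite sum. A minimal sketch (the example distribution is invented for illustration):

```python
import numpy as np

def deviation(support, weights, x, alpha=2):
    """Power-alpha deviation of a discrete distribution mu from the point x:
    D_alpha(mu, delta_x) = (sum_y mu(y) |x - y|^alpha)^(1/alpha)."""
    weights = np.asarray(weights, dtype=float)
    dist = np.abs(np.asarray(support, dtype=float) - x)
    return float((weights @ dist ** alpha) ** (1 / alpha))

# invented outcome distribution of a sloppy spin-1 measurement, calibrated
# on the eigenvalue x = 1
support = [-1, 0, 1]
weights = [0.05, 0.15, 0.80]
err = deviation(support, weights, x=1, alpha=2)
assert np.isclose(err, np.sqrt(0.05 * 4 + 0.15 * 1))   # sqrt(0.35)
```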
We apply this now to F_ρ, the output distribution obtained by measuring the observable F on the input state ρ, and to its ideal counterpart E_ρ. The ε-deviation, or ε-calibration error, of the observable F with respect to the ideal observable E is obtained by taking the supremum of this deviation over all x ∈ X and all 'calibration states' ρ which are sharply concentrated on x up to quality ε. Note that as ε → 0 this expression decreases, because the supremum is taken over smaller and smaller sets. Therefore the limit exists, and we define the calibration error of F with respect to E as this limit. For observables E with discrete spectrum (like angular momentum components) we can also take ε = 0 in (87), and directly get the calibration error.

Metric error
A possible issue with the calibration error is that it describes the performance of F only on a very special subclass of states. On the one hand this makes it easier to determine experimentally; on the other hand we get no guarantee about the performance of the device on general inputs. Classically this problem does not arise, because broad distributions can be represented as mixtures of sharply peaked ones, and this allows us to estimate also the similarity of output distributions for general inputs. The form of this estimate gives a good hint towards how to define the distance of probability distributions both of which are diffuse. Indeed, suppose ρ is an input state whose ideal output distribution is, to within ε, a mixture μ of sharply concentrated distributions. If one integrates out the x variable one gets the output distribution for ρ, because F_ρ is linear in ρ, and if one integrates out y one gets μ, because each F^x_ρ is normalized. To within ε this is the output distribution E_ρ, and with known calibration error we get the bound (89). This suggests the following definitions. For two probability distributions μ and ν on X we define a coupling to be a measure γ on X × X whose first marginal is μ and whose second marginal is ν. The set of couplings will be denoted by Γ(μ, ν); it is always non-empty, because it contains the product measure. We then define the Wasserstein α-distance of μ and ν as

D_α(μ, ν)^α = inf_{γ ∈ Γ(μ, ν)} ∫ D(x, y)^α γ(dx, dy). (90)

This is also called a transport distance, because of the following interpretation, first seen by Gaspard Monge in the 18th century, who considered the building of fortifications. We regard μ and ν as two distributions of earth, and consider the task of a builder who wants to transform distribution μ into distribution ν. The workers are paid by the bucket, times the power α of the distance travelled with each bucket (giving a bonus pay on long distances). The builder's plan is precisely the coupling γ, saying how many units are to be taken from x to y, and the integral is the total cost.
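For distributions on a finite subset of the real line the optimal coupling can be made explicit: with the convex cost |x − y|^α, the monotone ('north-west corner') plan is an optimal transport plan. A sketch of the resulting distance computation (ours, not from the text):

```python
def wasserstein_1d(mu, nu, points, alpha=2, tol=1e-12):
    """Wasserstein alpha-distance of two distributions on a common, sorted,
    finite outcome set. For a convex cost |x - y|^alpha on the line, the
    monotone ('north-west corner') coupling is an optimal transport plan."""
    mu, nu = list(mu), list(nu)
    i = j = 0
    cost = 0.0
    while i < len(points) and j < len(points):
        mass = min(mu[i], nu[j])                  # move as much mass as possible
        cost += mass * abs(points[i] - points[j]) ** alpha
        mu[i] -= mass
        nu[j] -= mass
        if mu[i] <= tol:
            i += 1
        if nu[j] <= tol:
            j += 1
    return cost ** (1 / alpha)

points = [-1.0, 0.0, 1.0]
# for point measures the product is the only coupling: distance |x - y|
assert wasserstein_1d([1, 0, 0], [0, 0, 1], points) == 2.0
# moving two half-buckets one step each costs 2 * (0.5 * 1), so W_2 = 1
assert abs(wasserstein_1d([0.5, 0.5, 0], [0, 0.5, 0.5], points) - 1.0) < 1e-9
```

The monotone plan is exactly the builder's cheapest strategy in the earth-moving picture: never let two transport routes cross.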
The infimum is just the price achieved by an optimal transport plan. The theory of such metrics is well developed, and we recommend the book of Villani [31] on the subject, but in the present context we only need some simple observations. With this metric between probability distributions we define the distance of two observables as the worst case distance of their output distributions, D(F, E) = sup_ρ D_α(F_ρ, E_ρ). For the connection between this metric error and the calibration error introduced above, note first that when ν is the point measure δ_x and μ is arbitrary, the product is the only coupling, and the two definitions of D_α(μ, δ_x) from equations (86) and (90) coincide. Therefore the calibration error is bounded above by the metric error, D_c(F, E) ≤ D(F, E). There is a second, 'quasi-classical' setting in which calibration and metric error coincide, and this will actually be used below. This is the case when F and E differ only by classical noise generated in the measuring apparatus. More formally, this is described by a transition probability kernel P(x, dy), which is, for every x, the probability measure in y describing the output of F, given that E has produced the value x. We can think of this as classical probabilistic post-processing, or noise. It is, of course, not necessary that F actually operates in two steps, but only that it could be simulated in this way, i.e., that F_ρ(dy) = ∫ P(x, dy) E_ρ(dx). The following lemma shows that the two errors then coincide, and gives a formula for both in terms of the size of the noise kernel P. In the lemma, the E-essential supremum of a measurable function f with respect to a measure E (denoted E-ess sup_{x∈X} f(x)) is the supremum of all λ such that the upper level set {x | f(x) > λ} has non-zero E-measure. In our application E is the spectral measure of a component e·L, so it is concentrated on the finite set {−s, …, s}. The essential supremum is then simply the maximum of f over this set.
Lemma 9. Let E be a projection valued observable on a separable metric space (X, D), and let F be an observable arising from E by post-processing with a transition probability kernel P. Then, for all α,

D_c(F, E)^α = D(F, E)^α = E-ess sup_x ∫ P(x, dy) D(x, y)^α.
Proof. Let I, II, III be the three terms in this equation. Then I ≤ II is given by (94). To show II ≤ III, note that for any state ρ we get a coupling γ between F_ρ and E_ρ by γ(dx, dy) = E_ρ(dx) P(x, dy). We introduce the function f(x) = ∫ P(x, dy) D(x, y)^α and split the integral with respect to E_ρ into an integral over X_> = {x | f(x) > t} and an integral over its complement, where t ≥ E-ess sup_x f(x). Then, by the definition of the essential supremum, E_ρ vanishes on X_>, and on the complement the integrand is bounded by t. For III ≤ I, note that since E is projection valued, we can find a calibration state ρ such that the probability measure E_ρ(dx) is concentrated on a set where f is close to its essential supremum. The supremum over all calibrating states can only increase the left-hand side, and on the right we use that the only condition on t was t ≥ E-ess sup_x f(x). □

In [5] a special case of this lemma was used to show D_c = D for the position and momentum marginals of a covariant phase space measurement. In that case the noise kernel P is even translation invariant, i.e., the output of the marginal observable can be simulated by just adding some state-independent noise to the output of the ideal position or momentum observable. Such translation invariance makes no sense in the case of angular momentum, since the range m ∈ {−s, …, s} of the outputs of the ideal observable is bounded. This is why the above generalization was needed, in which the noise may depend on the ideal output value. The reason for the existence of a post-processing kernel, however, will be the same as in the phase space case: the covariance of the joint measurement. Roughly speaking, this makes the marginal corresponding to e·L invariant under rotations around the e-axis, which in an irreducible representation means that it must be a function of e·L.
It is therefore crucial to argue that the optimal joint measurement is covariant, which will be done in the next section.
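The formula of lemma 9 is straightforward to evaluate in a discrete toy example. Below, an observable F is simulated by post-processing the ideal L₃ outcomes of a spin-1 system with a hypothetical noise kernel P (numbers invented for illustration):

```python
import numpy as np

outcomes = np.array([-1.0, 0.0, 1.0])      # spectrum of L3 for spin 1
# hypothetical post-processing kernel: P[i, j] is the probability that F
# reports outcome j when the ideal observable E produced outcome i
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
assert np.allclose(P.sum(axis=1), 1)       # rows are probability measures

alpha = 2
cost = np.abs(np.subtract.outer(outcomes, outcomes)) ** alpha   # D(x, y)^alpha
f = (P * cost).sum(axis=1)                 # f(x) = sum_y P(x, y) D(x, y)^alpha
# lemma 9: calibration and metric error both equal (ess sup f)^(1/alpha);
# here the essential supremum is just the maximum over the three outcomes
error = f.max() ** (1 / alpha)
assert np.isclose(error, np.sqrt(0.2))     # worst case is the middle outcome
```

Note that the noise here depends on the ideal outcome, exactly the generality the lemma was designed for.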

Covariant observables
Consider a general observable F with outcome space X, and suppose some group G acts on X, the action written (g, x) ↦ gx as usual. Suppose that the group also acts as a symmetry group of the quantum system. That is, there is a representation g ↦ U_g of G by operators U_g which are unitary or antiunitary and satisfy the group law (possibly up to a phase factor). The observable F is then called covariant if U_g F(S) U_g* = F(gS) for all g ∈ G and every measurable set S. In other words, shifting the input state by U_g results in the entire output distribution being shifted by g. For our purposes it will be convenient to express this in terms of an action F ↦ T_g F of G on the set of observables: the covariant observables are precisely those for which T_g F = F for all g ∈ G. For angular momentum the group will be the rotation group with its action on three-vectors (X = ℝ³). The representation U is then determined up to a factor ±1. Alternatively, we can take G to be the covering group SU(2); since the covariant observables are exactly the same, this choice is completely equivalent.
Covariance is certainly a reasonable condition to impose on a 'good' observable, so it would make sense to study uncertainty relations just for these. However, there is no need for such an ad hoc restriction, because the minimum of uncertainty over all observables is anyway attained on a covariant one. The basic reason for this is that our figure of merit (85) does not single out a direction in space, so that it is invariant under the action T_g. We therefore only have to show that there is no symmetry breaking, i.e., that the symmetric variational problem has a symmetric solution. This will be done in the following lemma. Each of the infima restricted to covariant observables is trivially at least the corresponding infimum over all observables, because the covariant observables are a subset; the converse follows by averaging over the group. Hence the two infima are the same, and the same argument also applies to Δ_max, proving the second claim. □

We could have included also a statement that the infima in this lemma are all attained. The argument for that is the compactness of the set of observables (in a suitable topology) and the lower semicontinuity of D_max and Δ_max, which follows, like convexity, from the representation of these functionals as the pointwise supremum of continuous functionals. However, since we will later anyhow explicitly exhibit minimizers, we skip the abstract arguments. We also remark that one of the main difficulties of the position/momentum case [5] does not arise here: in contrast to the group of phase space translations, the rotation group is compact, so the average is an integral and not an 'invariant mean', which has the potential of producing singular measures with some support on infinitely far away points.
The main importance of this lemma is to make the variational problem much more tractable. For covariant observables we have a fairly explicit parameterization, which allows us to compute the minimizers. In contrast, for the seemingly easier case of a joint measurement of just two components, covariance gives only a very weak constraint, and we were not able to complete the minimization.
To develop the form of covariant observables, let us first consider the case when the output vectors have a fixed length r. A plausible value would be r = s, but we will leave this open. In this case X reduces to a sphere of radius r, which is a homogeneous space for the rotation group, so we can apply the covariant parameterization of [28, 32]. The resulting bounds hold for all observables F, whether covariant or not. We will now compute Δ_min(s), and show that both minima are attained for a unique covariant observable.

Minimal uncertainty
While the above holds for arbitrary exponent α, we now restrict to the standard variance case, i.e., α = 2. So far we have derived that the optimal observable F is covariant, leading to the parametrization (110). In particular, F_e arises from e·L by a transition probability kernel, so that metric and calibration error coincide. In the sequel we will therefore only consider the calibration error, which is easier to evaluate; by covariance the calibration error D(F_e, e·L) is the same for every direction e. Before calculating the optimal case, we introduce a lemma which provides a more manageable expression I_s(r, n, m) for the integral over θ, so that (112) reads as (113). In the proof we use a three-term recurrence relation for the Jacobi polynomials. This relation does not hold for s = 1/2, so that this case has to be treated separately, via the integrals I_{1/2}(r, n, m). We will use this lemma to simplify the minimization over m. Moreover, the integral over r and the sum over n can be seen as taking a convex combination of the two-dimensional vectors (A_s(r, n), B_s(r, n)). Hence optimizing F can be analyzed geometrically in terms of the set of such pairs (see figure 13). This is done in the next theorem, whose results are visualized in figure 12. Except for s = 1, the maximum over m in (113) is trivial for the optimal observable, i.e., the calibration error is the same for all calibration inputs m.
Proof. We consider first the case s ≠ 1. We reformulate the problem using that D(F₃, L₃)² is a convex combination of the functions A_s(r, n) and B_s(r, n). Here we must find the best n, as well as the probability distribution F_n(dr), for the worst m. We denote the convex set of all possible combinations by Ω. The problem is now to minimize the functional K(a, b) over Ω. Since, for general n and s, Ω is hard to describe, we choose the following strategy, which is illustrated in the left panel of figure 13. We will show that, for s > 1, K takes its minimum on the horizontal axis, at the point v which also lies on the boundary of Ω.

For s = 1, this last step of the proof fails. Indeed, the level set K_v, determined by taking that point v on the horizontal axis which is also on the boundary of Ω, does intersect Ω, as can be seen in figure 13. We therefore have to take a level set of K for a slightly smaller value. Since the tangents of the level sets are all the same for b > 0, we can readily find the level set which is tangent to Ω. This gives the optimal radius at which the optimal probability measure F must be concentrated. In all cases, the optimal value Δ_min(s) is computed by substituting the obtained optimal r_min(s) and n = s in (113). □

Conclusions and outlook
Uncertainty relations can be formulated for any collection of observables. In this paper we provided some methods which work in a general setting, but chiefly looked at angular momentum as one of the paradigmatic cases of non-commutativity in quantum mechanics. The basic mathematical methods are well developed for the case of preparation uncertainty, so that even in a general case the optimal tradeoff curves can be generated efficiently. We resorted to numerics quite often, since it turns out that the salient optimization problems can rarely be solved analytically for general s. One of the features one might hope to settle analytically in the future is the asymptotic estimate c₂(s) ∝ s^{2/3}, which comes out with a precision that suggests an exact result.
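As an example of the kind of numerics involved, the basic lower bound Σᵢ ΔLᵢ² ≥ s for the sum of the three variances (which follows from Σᵢ⟨Lᵢ²⟩ = s(s+1) and |⟨L⟩| ≤ s, with equality for spin coherent states) can be probed by a random scan over pure states. The helper `spin_ops` and the choice s = 1 are ours:

```python
import numpy as np

def spin_ops(s):
    """Spin-s matrices in the L3 eigenbasis (hypothetical helper)."""
    d = int(round(2 * s + 1))
    m = s - np.arange(d)
    Lz = np.diag(m).astype(complex)
    Lp = np.zeros((d, d), dtype=complex)
    for i in range(1, d):
        Lp[i - 1, i] = np.sqrt(s * (s + 1) - m[i] * (m[i] + 1))
    return (Lp + Lp.conj().T) / 2, (Lp - Lp.conj().T) / 2j, Lz

def variance_sum(psi, ops):
    return sum(np.real(psi.conj() @ (L @ L) @ psi)
               - np.real(psi.conj() @ L @ psi) ** 2 for L in ops)

s = 1
ops = spin_ops(s)
d = int(round(2 * s + 1))

coherent = np.zeros(d, dtype=complex)
coherent[0] = 1                          # the m = s eigenstate |s>
best = variance_sum(coherent, ops)       # saturates the bound: sum = s

rng = np.random.default_rng(2)
for _ in range(2000):
    psi = rng.normal(size=d) + 1j * rng.normal(size=d)
    psi /= np.linalg.norm(psi)
    v = variance_sum(psi, ops)
    assert v >= s - 1e-9                 # sum of the three variances >= s
    best = min(best, v)
assert np.isclose(best, s)               # attained by the coherent state
```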
Much is left to be done for entropic uncertainty. Here we gave only some basic comparisons to the variance case. It would be interesting to see whether the entropic relations can be refined to the point that they can be used to derive sharp variance inequalities as Hirschman did in the phase space case [9].
For measurement uncertainty the general situation is not so favourable, perhaps due to the much more recent introduction of the subject. At this point we know of no efficient way to derive sharp bounds for generic pairs of observables. Nevertheless, we were able to treat the case of a joint measurement of all components in arbitrary directions, because in this case rotational symmetry is not broken and leads to considerable simplification. One of these simplifications is the observation that the two basic error criteria, namely the metric error and the calibration error, lead to the same results. This was already familiar from the phase space case. However, a further simplification one might have expected from this analogy definitely does not hold: there seems to be no quantitative link between preparation and measurement uncertainty for angular momentum. Further research will show whether useful general connections between the two faces of the uncertainty coin can be established.
The limit s → ∞ can be understood as a mean field limit [23], in which the spin-s representation is considered as 2s copies of a spin-1/2 system in a symmetric state. We can also see it as a classical limit ℏ → 0 [33], in the sense that the angular momentum in physical units, i.e., ℏs, is kept fixed, so that the dimensionless half-integer representation parameter s has to diverge. This offers a way to treat not just the uncertainty aspects of this limit, but also the limit of the whole theory of angular momentum.