Uncertainty, joint uncertainty, and the quantum uncertainty principle

Historically, the element of uncertainty in quantum mechanics has been expressed through mathematical identities called uncertainty relations, a great many of which continue to be discovered. These relations use diverse measures to quantify uncertainty (and joint uncertainty). In this paper we use operational information-theoretic principles to identify the common essence of all such measures, thereby defining measure-independent notions of uncertainty and joint uncertainty. We find that most existing entropic uncertainty relations use measures of joint uncertainty that yield themselves to a small class of operational interpretations. Our notion relaxes this restriction, revealing previously unexplored joint uncertainty measures. To illustrate the utility of our formalism, we derive an uncertainty relation based on one such new measure. We also use our formalism to gain insight into the conditions under which measure-independent uncertainty relations can be found.


* vnarasim@ucalgary.ca † alireza.poostindouz@ucalgary.ca ‡ gour@ucalgary.ca

INTRODUCTION
Revealing one of the most striking features of quantum mechanics, Heisenberg [1] showed that the outcomes of certain pairs of measurements on a quantum system can never be predicted simultaneously with certainty, regardless of how the system is prepared. Heisenberg's original statement of what he called the "indeterminacy" principle concerned potential measurements of the position and the momentum of a quantum particle. Many later works [2][3][4][5][6] lent quantitative rigor to Heisenberg's original idea and generalized it, both in the number and type of measurements involved and in the measures used to quantify joint uncertainty. At the same time, Heisenberg himself set off another chain of research on a related concept: measurement-induced disturbance and so-called noise-disturbance relations [7][8][9][10].
In all of these areas, the primary ingredient is the concept of the uncertainty of a variable, as well as that of the joint uncertainty of several variables. The aim of this paper is to clarify these concepts from an information-theoretic perspective. In the literature, the uncertainty of a variable has almost always been discussed in terms of measures that quantify "the amount of uncertainty", e.g. the Shannon entropy and its extended family of Rényi entropies, geometric norm-based measures such as the quadratic variance, etc. In most cases, there is a clear operational meaning for such measures, rendering them well-suited to the particular application wherein they are used. Similarly, measures of the joint uncertainty of more than one variable have been constructed either by considering operational tasks that involve all the variables, or by combining single-variable uncertainty measures mathematically. In the present work we extract the common thread beneath the operational descriptions of all such (single or joint) uncertainty measures, resulting in some basic operational axioms that are independent of the measure used to quantify uncertainty, and that define the essence of our concept of uncertainty. These axioms are motivated by information-theoretic principles that are intended to be as objective as possible. Considering the challenges inherent in such a requirement, we restrict the generality of our treatment in the following ways. Firstly, we restrict to notions of uncertainty applied to classical random variables. In particular, this class of variables includes the classical outcomes of quantum-mechanical measurements. Secondly, we avoid measures of uncertainty that explicitly involve the values of a variable, and instead consider only such measures that depend on the variable's probability distribution.
This necessitates a restriction to discrete variables; in fact, we consider only finite-dimensional variables. We make some tentative suggestions for the treatment of discrete and continuous infinite-dimensional cases, but leave the actual extension for future work. Finally, in comparing the uncertainties of different variables (which a measure of uncertainty should naturally be expected to enable), we will require the compared variables to represent the same type of physical quantity. For example, a comparison between the uncertainties in two different length variables will be possible within our formalism, but not one between a length uncertainty and a mass uncertainty.
The crux of this paper is the following set of axioms: (1) One's knowledge about a variable cannot increase under any processing without addition of new information about the variable; (2) The uncertainty in a variable representing a physical observable is invariant under the symmetries of the observable; and (3) The joint uncertainty of several variables is a valid concept even without an underlying operational description that combines those variables. The first two axioms are inspired by earlier approaches [24][25][26][27] to measure-independent notions of uncertainty, wherein the connection between uncertainty and a mathematical concept called majorization [43] was utilized. Majorization is a hierarchy among probability distributions, induced by the action of a class of transformations called doubly stochastic maps. In this paper, by finding a mathematical characterization of mechanisms that can increase a variable's uncertainty, we gain an operational understanding of why, and to what extent, majorization plays a role in characterizing uncertainty.
First, we find that for variables with unrestricted symmetries, uncertainty-increasing mechanisms are associated with the set of all doubly stochastic maps, leading to the emergence of majorization as the relation determining uncertainty. A function that quantifies uncertainty must then possess the property of never decreasing under any doubly stochastic maps. On the other hand, with restricted symmetries, only certain sub-classes of doubly stochastic matrices feature. The resulting hierarchy is then different from majorization, and a measure of uncertainty is required to be non-decreasing only under the restricted classes of doubly stochastic maps. This opens up more options for functions that can serve as uncertainty measures for variables with restricted symmetries.
Another element of novelty in our work lies in the third of our axioms, concerning joint uncertainty. In the context of physics, we can rephrase this axiom in terms of experiments: Suppose that we are interested in quantifying the joint uncertainty of several experiments, e.g. in connection with the quantum uncertainty principle, where the several experiments are different quantum measurements. One approach would be to construct new experiments that combine the original experiments in some way. For example, consider the following combined experiments constructed from a given set of experiments: (a) all the original experiments are performed independently; (b) all the apparatuses are set up, but only one of the experiments is chosen at random and performed.
The uncertainty in the outcome of such a combined experiment would quantify the joint uncertainty of the constituent experiments. But we see that there are different ways to combine experiments, which all capture different aspects of the joint uncertainty. In this paper we argue that the richness of joint uncertainty is not captured even by considering all such combined experiments. The most general notion of joint uncertainty is devoid of the particulars of such combinations, and allows all the component experiments to be, in principle, counterfactual. To illustrate this, we consider an extensively studied type of quantum uncertainty relations: the so-called preparational uncertainty relations. For ease of explanation, let us consider a two-measurement preparational uncertainty relation, which has the generic form
$$J(p(\rho), q(\rho)) \ge c \quad \forall \rho, \tag{1}$$
where $J$ is a measure of the joint uncertainty of two variables, and $p(\rho)$ and $q(\rho)$ are the expected outcome probability distributions of a pair of measurements performed on a quantum state represented by the density operator $\rho$ (our arguments can be extended to more than two measurements). We show that most existing preparational uncertainty relations can be subjected to one of the specific operational interpretations (a) and (b) mentioned above. To show that these two interpretations are unnecessarily restrictive, we construct joint uncertainty measures that cannot be interpreted either way. We go on to derive an uncertainty relation based on one such measure, which is a relation nontrivially different from all the ones discovered in the past. The main purpose of deriving this new relation is to demonstrate the possibilities opened up by our joint uncertainty axiom. Another contribution of this paper is a deeper understanding of so-called universal uncertainty relations found in [24][25][26][27]: pairs of vectors $(u, v)$ such that $J(u, v)$ provides a nontrivial bound [like the $c$ of Eq. (1)] for a whole class of measures, $J \in \mathcal{J}$. We find that no universal relations exist if $\mathcal{J}$ includes all possible measures; however, restricting to specific operational frameworks [using the (a) and (b) types of combined experiments discussed in the previous paragraph] is what makes the nontrivial universal relations found in [24][25][26][27] possible.
Even though we focus on preparational uncertainty relations in quantum mechanics, in principle our notions can be applied to any situation where probability-based uncertainty measures of classical variables are relevant. We summarize the possible applications in the conclusion, along with open problems.

I. WHAT IS UNCERTAINTY?
We will now develop a notion of uncertainty that can be applied to finite-dimensional classical variables. In particular, we seek a general method of comparing the uncertainties of two variables, in such a way that the comparison gives the same verdict independent of the function used to measure uncertainty. For our purpose, it will be sufficient to be able to compare physically similar variables, that is, variables representing the same underlying physical quantity; for example, comparing the uncertainty of a length with that of another length. We will not concern ourselves with how a comparison can be made between dissimilar variables.
Consider an experiment where Alice is about to roll a (possibly biased) die, whose faces she calls "1", "2", ..., "6". The eventual outcome of the roll will be a value $x \in \{1, 2, \ldots, 6\}$, but since we don't know $x$ a priori, we represent it as a random variable $X \equiv \{(x, p_x)\}$. What is Alice's minimum uncertainty about $X$ prior to the experiment? We could answer this question in different ways, some of which might appeal to the particular labels that Alice uses to call her outcome. For example, the difference between the largest and smallest possible outcomes that have a non-zero probability is an uncertainty indicator, and it depends on the choice of labels. In principle, Alice could relabel her die's faces to, say, "a", "b", etc., without changing the essential physical nature of the experiment. We will require our notion of uncertainty to make no distinction between two physically identical experiments that differ only in the outcome labels. In other words, we will consider uncertainty to be a property of just the distribution $p_X$, measured possibly by some real-valued function $U(p_X)$. In fact, an even stronger restriction follows. Let $Y$ be a random variable obtained by merely relabeling the different values of $X$. The probability distribution $p_Y$ of $Y$ must then necessarily contain the same values as $p_X$, possibly differing only in their order. Therefore, the effect of any relabeling on $p_X$ is as though the original labels were just permuted amongst themselves: $p_X \to M^{(\pi)} p_X$, where $M^{(\pi)}$ is the permutation matrix of a permutation $\pi$. In this sense, permutations, although not the only possible way to relabel outcomes, still capture the effect of arbitrary relabelings, as far as our notion of uncertainty is concerned.
If, instead of a die-roll outcome, X were a physical property, e.g. the energy of a quantum harmonic oscillator, arbitrary permutations could result in loss of the variable's physical meaning. To avoid this, we would have to restrict the permutations, e.g. to only shifts in the energy. In general, the restricted class of reorderings is the group G of symmetries of the observable underlying X, with each symmetry g corresponding to a change in one's reference frame. For finite-dimensional observables, G is a subgroup of the group of all permutations.
Our first requirement from a measure $U$ of uncertainty is that it be invariant under the symmetry group $G$ of the underlying observable. This immediately leads to the following: Two variables $X$ and $Y$, both representing the same observable, are equally uncertain if their distribution vectors are related by some $g \in G$. If $X$ is the outcome of a certain experiment, the random variables $Y$ that are equally uncertain to $X$ include relabeled (under $G$) versions of $X$ (which are perfectly correlated with $X$); the outcomes of other runs of the same experiment with the same apparatus (which may be correlated with $X$ if the apparatus has a memory); outcomes of the same experiment performed on independent but identical apparatuses (uncorrelated with $X$); and in general any $Y$ representing the same observable, with $p_Y = M^{(g)} p_X$ for some $g \in G$.
Thus far, we have found a way to tell when the uncertainties of two variables are equal. Now we will develop a method of determining when and how the uncertainty of one variable can be said to be more, or less, than that of another. To this end, we will first identify certainty-nonincreasing transformations: processes that take any given variable $X$ to an equally or more uncertain one, $\tilde{X}$, by virtue of a "randomizing" or "forgetting" mechanism. Thereafter, we will use the following rule to compare the uncertainties of two variables $X$ and $Y$ (arbitrary but with the same underlying physical observable): $Y$ is at least as uncertain as $X$ if some certainty-nonincreasing transformation of $X$ results in a variable $\tilde{X}$ that has the same probability distribution as $Y$ up to the symmetries of the underlying observable.
In order to identify the certainty-nonincreasing transformations, we will now construct a couple of extended versions of the "Alice rolls a die" thought experiment. First, consider the modified experiment depicted in Fig. 1 [44]: After rolling her die, Alice sends the outcome $x$ to her collaborator Bob (who doesn't even know the bias distribution of Alice's die) via some classical channel [45] given by the column-stochastic matrix $T \equiv (T_{y|x})$. Here let's pause to reflect upon the uncertainty in the output $Y$ of the channel. The channel could transmit $x$ perfectly, or with some added noise. In these cases the output $Y$ is equally or more uncertain than $X$. On the other hand, the channel could also completely ignore $x$ and output some constant value, in which case the uncertainty of $Y$ could be less than that of $X$. In fact, the processing might result in information in a fundamentally different form from $X$. For example, Alice could just send the parity of her die outcome to Bob, in which case $Y$ doesn't even represent the same underlying observable as $X$. Therefore, we cannot make a general statement about non-increase of certainty under an arbitrary channel.
However, instead of the uncertainty of $Y$ itself, we can consider the following question: How much information does $Y$ contain about $x$? Since $Y$ results from processing $X$ with the possible addition of noise or irrelevant information, it cannot tell us more about $x$ than $X$ does. In order to lend mathematical rigor to this statement, we must extract from $Y$ some variable that has the same physical meaning as $X$, so that we can treat them both on an equal footing. Now let's return to Alice and Bob's experiment: Bob, who knows $T$ but not $p_X$, now tries to recover $x$ from the channel output $y$, which is a priori distributed according to $q_Y = T p_X$. Since this game is being designed to analyze uncertainty about $x$, Bob's aim in his recovery task is not to maximize his chances of guessing $x$ correctly, but rather to faithfully account for the uncertainty that $Y$ contains about $x$. Suppose he sees an instance $Y = y$. This could have resulted from a particular $X = x$ with conditional probability $T_{y|x}$. Without knowing the prior $p_X$, Bob's rational guess for the likelihood that $X = x$ (among all the possible $x'$) is given by
$$R_{x|y} = \frac{T_{y|x}}{\sum_{x'} T_{y|x'}}. \tag{2}$$
The resulting distribution of Bob's recovered variable (call it $\tilde{X}$) is given by the composite action of $T$ and $R$ on $p_X$:
$$p_{\tilde{X}} = R\,T\,p_X. \tag{3}$$
Since $Y$ could contain irrelevant information, its uncertainty cannot be interpreted as "uncertainty about $x$". On the other hand, $X$ directly represents $x$, while $\tilde{X}$ results from extracting out of $Y$ precisely all the information it contains about $x$. Therefore, these two variables both represent the same physical observable as $x$, and their uncertainties directly quantify uncertainty about $x$. This equal physical footing also ensures that their uncertainties can be compared under our rules. This comparison tells us that the uncertainty of $\tilde{X}$ cannot be less than that of $X$. The cumulative transformation that takes $X$ to $\tilde{X}$ is therefore a certainty-nonincreasing transformation.
It can be verified easily that for any column-stochastic $T$, with the corresponding $R$ [46] constructed as in (2), the matrix $D_{\mathrm{rec}} = RT$ is doubly stochastic. The (necessarily degenerative) evolution of the information about some entity (like $x$), when the representation of this information is subjected to any classical processing (represented by the action of the channel), is always via such matrices, whose collection we call $\mathcal{D}_{\mathrm{rec}}$.
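The double stochasticity of $RT$ can be checked numerically. The following is a minimal sketch, assuming the recovery map has the uniform-prior Bayesian form $R_{x|y} = T_{y|x} / \sum_{x'} T_{y|x'}$; the helper names (`random_channel`, `recovery`, `is_doubly_stochastic`) and the channel dimensions are illustrative choices, not part of the formalism.

```python
import random

def random_channel(n_in, n_out, rng):
    """Return a column-stochastic matrix T with T[y][x] = T_{y|x}."""
    T = [[rng.random() for _ in range(n_in)] for _ in range(n_out)]
    for x in range(n_in):
        col_sum = sum(T[y][x] for y in range(n_out))
        for y in range(n_out):
            T[y][x] /= col_sum
    return T

def recovery(T):
    """Assumed recovery map: R[x][y] = T[y][x] / sum_{x'} T[y][x']."""
    n_out, n_in = len(T), len(T[0])
    R = [[0.0] * n_out for _ in range(n_in)]
    for y in range(n_out):
        row_sum = sum(T[y])
        for x in range(n_in):
            # An output y that never occurs carries no information; leave zeros.
            R[x][y] = T[y][x] / row_sum if row_sum > 0 else 0.0
    return R

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_doubly_stochastic(D, tol=1e-9):
    """Check nonnegativity and unit row and column sums of a square matrix."""
    n = len(D)
    nonneg = all(v >= -tol for row in D for v in row)
    rows = all(abs(sum(D[i]) - 1.0) < tol for i in range(n))
    cols = all(abs(sum(D[i][j] for i in range(n)) - 1.0) < tol for j in range(n))
    return nonneg and rows and cols

rng = random.Random(0)
T = random_channel(n_in=6, n_out=4, rng=rng)   # die outcome sent through a 4-symbol channel
D_rec = matmul(recovery(T), T)                 # 6 x 6 matrix acting on p_X
print(is_doubly_stochastic(D_rec))
```

The column sums of $RT$ equal one because $T$ is column-stochastic, and the row sums equal one because the normalization in $R$ cancels the row sums of $T$; the numerical check merely illustrates this for one random channel.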
We saw that, after the action of a generic channel, the uncertainty of the final variable $Y$ doesn't have a consistent hierarchical relationship with that of the initial variable $X$. In order to draw a consistent rule of certainty non-increase, we had to consider a recovery transformation from $Y$ to $\tilde{X}$. But there are certain special transformations that always result in certainty non-increase, even without the addition of a recovery transformation. In fact, we already saw an example: symmetry transformations of the underlying physical observable. In the die-roll example, symmetry transformations include nonidentity permutations, which can easily be shown to be outside of the die's $\mathcal{D}_{\mathrm{rec}}$ class, yet result in final variables $Y$ with the same physical meaning and (consistently) no less uncertain than $X$. We will find a family of such certainty-nonincreasing transformations by considering another thought experiment, depicted in Fig. 2: Before rolling her die, Alice will toss a coin; she will then relabel the die's faces with a permutation that is determined by the outcome of this coin toss, and then roll the die. The random choice of relabeling makes the outcome $Y$ of this modified experiment more uncertain than $X$. In general, if a variable $X$ is transformed by applying a $g \in G$ chosen at random under a distribution $t \equiv (t_g)$, the resulting variable $Y$ is distributed as $q_Y = D_{\mathrm{sym}}\, p_X$, where $D_{\mathrm{sym}} = \sum_{g \in G} t_g M^{(g)}$. Since each $M^{(g)}$ is a permutation, every possible $D_{\mathrm{sym}}$ is doubly stochastic. We denote by $\mathcal{D}_{\mathrm{sym}}$ the set of all such $D_{\mathrm{sym}}$ matrices. If the observable's symmetry group $G$ includes all permutations, then by Birkhoff's theorem [47,48] $\mathcal{D}_{\mathrm{sym}}$ is the set of all doubly stochastic matrices, but a restricted $G$ results in a corresponding shrinkage of $\mathcal{D}_{\mathrm{sym}}$.
The characterization of the classes $\mathcal{D}_{\mathrm{rec}}$ and $\mathcal{D}_{\mathrm{sym}}$ is an interesting problem that we leave for future work. While the latter class depends on the symmetry group of the observable, the former depends only on the dimensionality. For a variable with complete permutation symmetry, as noted above, $\mathcal{D}_{\mathrm{sym}}$ contains all doubly stochastic matrices, in particular all of $\mathcal{D}_{\mathrm{rec}}$. But under restricted symmetries, each class can contain members not belonging to the other. For instance, take a 3-dimensional variable whose symmetry group is the (order-3) group of cyclic permutations of the components. The two nontrivial permutations are transformations contained (by design) in $\mathcal{D}_{\mathrm{sym}}$, but not in $\mathcal{D}_{\mathrm{rec}}$. On the other hand, the matrix
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0.5 \\ 0 & 0.5 & 0.5 \end{pmatrix}$$
is in $\mathcal{D}_{\mathrm{rec}}$, but not in $\mathcal{D}_{\mathrm{sym}}$. Therefore, the structure of the union of these classes cannot be reduced to either one of the classes. This example can be generalized naturally to higher dimensions.
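The three-dimensional example can be made concrete with a short sketch. It assumes the recovery map $R_{x|y} = T_{y|x} / \sum_{x'} T_{y|x'}$ and relies on two observations of ours that are not stated in the text: every mixture of cyclic shifts is a circulant matrix, and every product $RT$ of this form is symmetric, since $(RT)_{xx'} = \sum_y T_{y|x} T_{y|x'} / \sum_{x''} T_{y|x''}$. The particular channel `T` (report the first outcome faithfully, merge the other two) is a hypothetical choice that reproduces the matrix above.

```python
def recovered_map(T):
    """D = R T with the assumed recovery R[x][y] = T[y][x] / sum_{x'} T[y][x']."""
    n_out, n_in = len(T), len(T[0])
    D = [[0.0] * n_in for _ in range(n_in)]
    for y in range(n_out):
        row_sum = sum(T[y])
        if row_sum == 0:
            continue  # output y never occurs
        for x in range(n_in):
            for x2 in range(n_in):
                D[x][x2] += T[y][x] * T[y][x2] / row_sum
    return D

def is_circulant(D, tol=1e-12):
    """Mixtures of cyclic shifts have entries depending only on (j - i) mod n."""
    n = len(D)
    return all(abs(D[i][j] - D[0][(j - i) % n]) < tol
               for i in range(n) for j in range(n))

def is_symmetric(D, tol=1e-12):
    n = len(D)
    return all(abs(D[i][j] - D[j][i]) < tol for i in range(n) for j in range(n))

# Hypothetical channel: outcome 1 is transmitted faithfully, outcomes 2 and 3 are merged.
T = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
D = recovered_map(T)

# One of the two nontrivial cyclic shifts of three components.
shift = [[0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]

print(D)                                        # the matrix quoted in the text
print(is_circulant(D), is_symmetric(D))         # in D_rec, fails the D_sym test
print(is_circulant(shift), is_symmetric(shift)) # in D_sym, fails the D_rec test
```

Under these assumptions, failing the circulant test rules the matrix out of the cyclic "sym" class, and failing the symmetry test rules the shift out of the "rec" class.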
Due to our restriction to uncertainty comparison between physically similar variables, the "sym" and "rec" classes of doubly stochastic matrices together suffice as mechanisms of uncertainty increase. In principle, any function $U(p_X)$ meant to measure the uncertainty of $X$ is required to be non-decreasing under both these matrix classes. But the "sym" class is more important than the "rec": the former is based on the natural symmetries of an observable, and therefore the constraints that it induces on uncertainty measures are inviolable. On the other hand, "rec", even though it is an essential ingredient in the strictest information-theoretic definition of uncertainty, could be ignored in natural situations where information processing is not involved. Functions that respect the "sym" constraints, but violate the "rec" ones, nevertheless turn out to be useful indicators of uncertainty. Based on these considerations, we define:
Definition 1. A measure of uncertainty of a variable $X$ is a function $U$ of the distribution $p \equiv p_X$ of the variable, satisfying
$$U(Dp) \ge U(p) \quad \forall D \in \mathcal{D}_{\mathrm{sym}}; \tag{4}$$
$$U(Dp) \ge U(p) \quad \forall D \in \mathcal{D}_{\mathrm{rec}}. \tag{5}$$
Here the class "sym" is determined by the symmetries of the variable's underlying physical observable. A function that satisfies (4), but not (5), will be considered a weak measure of uncertainty.
If the symmetry group $G$ of a finite-dimensional $X$ contains all permutations, then functions that satisfy (4) are called Schur-concave functions [43]. Examples of such functions are the entropies of Shannon, Rényi, and Tsallis. Now, Hardy et al. [49] proved that the existence of a doubly stochastic $D$ such that $q = Dp$ is equivalent to the binary relation $p \succ q$, read "$p$ majorizes $q$" [43], which for a general $d$-dimensional vector space is defined as follows. Define $p^{\downarrow}$ and $q^{\downarrow}$ as the same vectors with their components arranged in nonincreasing order. Then $p \succ q$ if, and only if,
$$\sum_{i=1}^{k} p^{\downarrow}_i \ge \sum_{i=1}^{k} q^{\downarrow}_i \quad (k = 1, \ldots, d-1), \qquad \sum_{i=1}^{d} p_i = \sum_{i=1}^{d} q_i. \tag{6}$$
The "completely certain" and "completely uncertain" distributions $e \equiv (1; 0; \ldots; 0)$ and $u \equiv (1/d; 1/d; \ldots; 1/d)$ satisfy $e \succ p \succ u$ for all $p$.
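The majorization test and its link to Schur-concave measures can be sketched as follows. The function names, the example distributions and the block-structured doubly stochastic matrix `D` are illustrative assumptions; the partial-sum criterion is the standard definition of majorization.

```python
import math

def majorizes(p, q, tol=1e-12):
    """Return True if p majorizes q (partial sums of sorted components)."""
    if len(p) != len(q) or abs(sum(p) - sum(q)) > tol:
        return False
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    cum_p = cum_q = 0.0
    for a, b in zip(ps, qs):
        cum_p += a
        cum_q += b
        if cum_p < cum_q - tol:
            return False
    return True

def shannon(p):
    """Shannon entropy in bits, a Schur-concave function."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def apply(D, p):
    """Matrix-vector product D p."""
    return [sum(D[i][j] * p[j] for j in range(len(p))) for i in range(len(D))]

d = 4
e = [1.0] + [0.0] * (d - 1)   # completely certain
u = [1.0 / d] * d             # completely uncertain
p = [0.5, 0.3, 0.15, 0.05]

# A hypothetical doubly stochastic matrix (two independent 2 x 2 mixing blocks).
D = [[0.7, 0.3, 0.0, 0.0],
     [0.3, 0.7, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.0, 0.0, 0.5, 0.5]]
q = apply(D, p)

print(majorizes(e, p), majorizes(p, u))    # e majorizes p, p majorizes u
print(majorizes(p, q), majorizes(q, p))    # q = Dp is majorized by p, not conversely
print(shannon(p) <= shannon(q))            # Schur-concavity: entropy does not decrease

# Majorization is only a partial order: these two vectors are incomparable.
a, b = [0.5, 0.25, 0.25], [0.4, 0.4, 0.2]
print(majorizes(a, b), majorizes(b, a))
```

The last pair shows why a single measure cannot capture the full hierarchy: different Schur-concave functions may rank incomparable distributions differently.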
If a variable has restricted symmetries, then the uncertainty hierarchy of its distributions becomes different from the majorization hierarchy. All Schur-concave functions still remain valid uncertainty measures. But in addition, by virtue of the reduction in the class $\mathcal{D}_{\mathrm{sym}}$, some non-Schur-concave functions could also qualify to be weak measures of uncertainty [i.e., may violate condition (5)]. For example, for a finite-dimensional variable $X$ whose symmetries are cyclic permutations, it can be easily shown that the variance of $X$ is only a weak uncertainty measure. Generalizing the classes $\mathcal{D}_{\mathrm{sym}}$ and $\mathcal{D}_{\mathrm{rec}}$ for discrete-infinite and continuous variables may not be straightforward, and we leave it for future work. We expect it to be possible to achieve such a generalization by considering parametrized families of symmetries (e.g. Lorentz transformations) and convex combinations (integrals) over different parameter assignments.

II. JOINT UNCERTAINTY
The uncertainty of the outcomes of individual experiments cannot provide a complete description of the quantum uncertainty principle, since most uncertainty relations are lower bounds on measures of the joint uncertainty of the outcomes of at least two measurements. For clarity of discussion, here we will restrict to pairs of experiments, each with a finite number of possible outcomes; extension to more experiments is straightforward. To motivate our definition of joint uncertainty, consider the following hypothetical scenarios involving the joint uncertainty of a coin-toss outcome, $X$, and a die-roll outcome, $Y$:
Example 1: Perform the combined experiment comprising an independent and simultaneous performance of both the original experiments [Fig. 3 (a)]. The outcome is $Z \equiv (X, Y)$, which has $|X||Y| = 12$ possible values, distributed as $p_Z = p_X \otimes p_Y$. Therefore, $U(p_X \otimes p_Y)$, for $U$ any single-variable uncertainty measure (in the sense of Def. 1), serves as a joint uncertainty measure of $X$ and $Y$. Most measures considered in the literature on the quantum uncertainty principle, e.g. the sum of Shannon entropies of the individual outcome distributions, can be interpreted through such a combined experiment.
Example 2: This time we first toss a second coin to make a choice between the actions "toss the coin" (resulting in outcome $X$) and "roll the die" (leading to $Y$), and then perform only the chosen action [Fig. 3 (b)]. The outcome $Z$ of this experiment has $|X| + |Y| = 8$ possible values, whose uncertainty (modulo the uncertainty in the choice of action) is also a manifestation of the joint uncertainty of $(X, Y)$. In this case, if the choice coin is unbiased, $p_Z = \tfrac{1}{2} p_X \oplus \tfrac{1}{2} p_Y$ and therefore we get measures of the form $U\!\left(\tfrac{1}{2} p_X \oplus \tfrac{1}{2} p_Y\right)$. The measures of joint uncertainty proposed in [37] can be interpreted through such a combined experiment.
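The two combined experiments can be illustrated with the Shannon entropy as the single-variable measure $U$. This is a minimal sketch; the coin and die biases are hypothetical numbers, and the helper names are ours. The two identities checked at the end are standard properties of the Shannon entropy (additivity under tensor products, and the grouping rule for a fair mixture of distributions on disjoint supports).

```python
import math

def shannon(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def tensor(p, q):
    """Distribution of the pair (X, Y) when both experiments run independently."""
    return [a * b for a in p for b in q]

def direct_sum_mix(p, q, w=0.5):
    """Distribution of Z when experiment X is chosen with probability w, else Y."""
    return [w * a for a in p] + [(1.0 - w) * b for b in q]

p_X = [0.6, 0.4]                            # hypothetical biased coin
p_Y = [0.25, 0.25, 0.2, 0.1, 0.1, 0.1]      # hypothetical biased die

p_both = tensor(p_X, p_Y)                   # Example 1: 12 outcomes
p_either = direct_sum_mix(p_X, p_Y)         # Example 2: 8 outcomes

print(len(p_both), len(p_either))
# Example 1 with U = Shannon entropy reduces to the familiar sum H(X) + H(Y).
print(shannon(p_both), shannon(p_X) + shannon(p_Y))
# Example 2: one bit for the fair choice plus the average of the two entropies.
print(shannon(p_either), 1.0 + 0.5 * (shannon(p_X) + shannon(p_Y)))
```

The extra bit in the second case is the "uncertainty in the choice of action" that the text sets aside.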
As these scenarios illustrate, there could be different ways in which experiments could be combined into one super-experiment, the uncertainty of whose outcomes then reflects an aspect of the joint uncertainty of $(X, Y)$. But the essence of joint uncertainty is not quite captured by any one of these joint experiments. In fact, some joint uncertainty measures, such as the functions $H_\alpha(p_X) + H_\beta(p_Y)$ (where $H_\alpha$ and $H_\beta$ are Rényi entropies) [12], and even Heisenberg's $\Delta x \Delta p$, cannot be interpreted as the uncertainty of any single combined experiment. The quantum uncertainty principle applies also to cases with several potential measurements, each a potential (actual or counterfactual) experiment in its own right.
These considerations indicate that the notion of joint uncertainty is not bound to the concept of combined experiments. What, then, are the essential properties of a measure of joint uncertainty? Firstly, the pairs $(X, Y)$ that have the least joint uncertainty are ones where both distributions are completely certain. The most jointly uncertain pairs, on the other hand, are the ones where both variables are completely uncertain. Furthermore, all the measures of the joint uncertainty of $(X, Y)$ are real-valued functions of the distributions $p \equiv p_X$ and $q \equiv q_Y$, and must reduce to the measures of single-variable uncertainty (as in Def. 1) if one of the vectors $p$ and $q$ is kept fixed. This brings us to the following definition:
Definition 2. A measure of joint uncertainty of two variables $X$ and $Y$ is a real-valued function $J$ of $(p, q) \equiv (p_X, q_Y)$, such that
$$J(D_1 p, D_2 q) \ge J(p, q) \tag{7}$$
for all doubly stochastic matrices $D_1$, $D_2$ in the respective "sym" and "rec" classes of both variables. As in the single-variable case, we will call functions satisfying (7) for the "sym" class, but not for the "rec" class, weak measures of joint uncertainty.
It can be verified that this definition applies to entropic joint uncertainty measures of the form $f(p) + g(q)$, where $f$ and $g$ are single-variable uncertainty measures. The vast majority of the literature on entropic uncertainty relations uses such measures. Note that if the symmetry groups of both variables are the respective full permutation groups, then $D_1$ and $D_2$ can be any two doubly stochastic matrices of appropriate dimensions. In this case, the relation in (7) states that $J$ is monotonic under the direct product relation "$\succ\!\succ$" defined by: $(p_1, q_1) \succ\!\succ (p_2, q_2) \Leftrightarrow (p_1 \succ p_2 \text{ and } q_1 \succ q_2)$.
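The monotonicity condition of Definition 2 can be probed numerically for a measure of the form $f(p) + g(q)$. The sketch below assumes full permutation symmetry for both variables, takes $f$ to be the Shannon entropy and $g$ the Rényi entropy of order 2, and generates doubly stochastic matrices as random mixtures of permutations; all names, dimensions and the number of trials are illustrative choices. A finite random search can only fail to find violations, it does not prove the property.

```python
import math
import random

def shannon(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def renyi2(p):
    """Renyi entropy of order 2 (collision entropy) in bits."""
    return -math.log2(sum(x * x for x in p))

def J(p, q):
    """Example joint measure of the form f(p) + g(q)."""
    return shannon(p) + renyi2(q)

def random_distribution(n, rng):
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def random_doubly_stochastic(n, rng, terms=5):
    """Random convex mixture of permutation matrices (doubly stochastic by construction)."""
    weights = random_distribution(terms, rng)
    D = [[0.0] * n for _ in range(n)]
    for w in weights:
        perm = rng.sample(range(n), n)   # a uniformly random permutation
        for j in range(n):
            D[perm[j]][j] += w
    return D

def apply(D, p):
    """Matrix-vector product D p."""
    return [sum(D[i][j] * p[j] for j in range(len(p))) for i in range(len(D))]

rng = random.Random(1)
violations = 0
for _ in range(200):
    p, q = random_distribution(2, rng), random_distribution(6, rng)
    D1, D2 = random_doubly_stochastic(2, rng), random_doubly_stochastic(6, rng)
    if J(apply(D1, p), apply(D2, q)) < J(p, q) - 1e-9:
        violations += 1

print(violations)
```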

III. THE QUANTUM-MECHANICAL UNCERTAINTY PRINCIPLE
The "uncertainty principle" of quantum mechanics is actually a collection of identities known as uncertainty relations (UR's), all concerning the uncertainties of individual quantum-mechanical measurements, as well as joint uncertainties of sets of two or more (actual or counterfactual) measurements. Broadly, there are three different operational contexts of UR's: different measurements applied on the same quantum state (either counterfactually or by preparing many copies of the same state); simultaneous (approximate) execution of several measurements; and sequential execution of several measurements. The notions that we developed in the last two sections can be applied in all of these contexts, since they all include instances of finite-dimensional classical variables. But here we will focus on the first type of situation, where different measurements are considered on identical preparations. Furthermore, we restrict to UR's that involve only the probability distributions of measurement outcomes, and not the "values" assigned to the outcomes.
Since these UR's involve only the probabilities of outcomes, a positive-operator-valued measure (POVM) description of measurements is adequate in the formalism. Consider the case of two POVM's $A \equiv \{\Pi_a\}_a$ and $B \equiv \{\Gamma_b\}_b$. For a quantum state $\rho$, measurement $A$ leads to outcome probability distribution $p(\rho)$ where $p_a(\rho) = \mathrm{Tr}[\Pi_a \rho]$, and $B$ to $q(\rho)$ with $q_b(\rho) = \mathrm{Tr}[\Gamma_b \rho]$. For a so-called incompatible pair of POVM's $(A, B)$, there is no $\rho$ that results in both $p(\rho)$ and $q(\rho)$ completely certain, leading to the existence of a "minimal joint uncertainty". Many UR's are statements to this effect:
$$J(p(\rho), q(\rho)) \ge c \quad \forall \rho, \tag{8}$$
where $J$ is a measure of joint uncertainty, and $0 < c \le C_J(A, B) := \min_\rho J(p(\rho), q(\rho))$. In some relations (e.g. Robertson's), $c$ is not a constant but rather a nonnegative function of $\rho$. The disadvantage of such a lower bound is that it can be zero in some cases even if $A$ and $B$ are incompatible. For this reason, state-independent $c$'s are favored in most of the recent literature. In general, our analysis of uncertainty and joint uncertainty enables us to unify the understanding of all UR's of the form $J(p_1(\rho), p_2(\rho), \ldots, p_n(\rho)) \ge c$, where $J$ is a (strong or weak) joint uncertainty measure (under a generalized version of Def. 2) of the $n$ probability distributions $(p_1, \ldots, p_n)$ that result from measurements $(A_1, \ldots, A_n)$ (counterfactually) applied to the same state $\rho$. A vast number of UR's reported in the literature, including most entropic UR's, take this form. In fact, most of the entropic UR's found so far fall under a much stronger restriction. As we mentioned in the previous section, they can all be constructed upon specific notions of joint uncertainty based on the "combined experiment" scenarios where either all the measurements are performed on independent, identically prepared quantum systems [as in Fig. 3 (a)], or a random choice is made to decide which of the several measurements to perform [as in Fig. 3 (b)]. All entropic relations based on joint uncertainty measures of the form $f(p) + f(q)$, where $f$ is an entropy function that is additive under tensor products, fall under this category. Going beyond these operational notions and using our general definition of joint uncertainty enables us to construct new UR's, with the following "recipe":
1. Find a measure of the joint uncertainty (under a restricted class of symmetries, if applicable) of the desired number $n$ of distributions, based on Def. 2;
2. For the given $n$ measurements, find a lower bound on the $n$-joint uncertainty of the outcome distributions of the measurements applied to quantum states, like the $c$ in (8).
This bound leads to an assertion of the form (8), i.e. an uncertainty relation.
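The two-step recipe can be sketched numerically for a qubit. In this illustration the joint measure is the sum of Shannon entropies, the two measurements are the rank-1 projective measurements in the computational and Hadamard bases, and the minimum over states is approximated by a grid search over pure states (which suffices for a concave measure such as this one, since the outcome distributions are linear in $\rho$). The choice of measure, bases, grid resolution and function names are all assumptions made for the example; for this particular pair the well-known Maassen-Uffink bound gives a minimum of one bit.

```python
import cmath
import math

def shannon(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def outcome_distribution(basis, psi):
    """Born rule for a rank-1 projective measurement: p_a = |<a|psi>|^2."""
    return [abs(sum(complex(a).conjugate() * c for a, c in zip(vec, psi))) ** 2
            for vec in basis]

def pure_qubit_states(n_theta=181, n_phi=72):
    """Grid of pure states cos(theta/2)|0> + exp(i phi) sin(theta/2)|1>."""
    for i in range(n_theta):
        theta = math.pi * i / (n_theta - 1)
        for k in range(n_phi):
            phi = 2.0 * math.pi * k / n_phi
            yield [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]

def min_joint_uncertainty(J, A, B):
    """Step 2 of the recipe: approximate min over states of J(p(rho), q(rho))."""
    return min(J(outcome_distribution(A, psi), outcome_distribution(B, psi))
               for psi in pure_qubit_states())

# Step 1: pick a joint uncertainty measure (here f(p) + f(q) with f = Shannon entropy).
def J(p, q):
    return shannon(p) + shannon(q)

s = 1.0 / math.sqrt(2.0)
Z_basis = [[1.0, 0.0], [0.0, 1.0]]
X_basis = [[s, s], [s, -s]]

c = min_joint_uncertainty(J, Z_basis, X_basis)
print(c)   # numerical estimate of C_J(A, B) for this pair
```

Any number not exceeding this estimate can serve as the constant $c$ in a relation of the form (8); a grid search gives an estimate rather than a proof, so an analytic bound is still needed for a rigorous statement.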
As an illustration, we derive an uncertainty relation for two rank-1 projective measurements on pure states of a 2-level system, using the following joint uncertainty measure constructed using Def. 2: $J_2(p, q) = 1 - p^{\downarrow} \cdot q^{\downarrow}$. Here $(\cdot)$ denotes the usual dot product. Note that this measure of joint uncertainty is faithful in the sense that it is zero if and only if both vectors $p$ and $q$ are completely certain. We find that, for $p$ and $q$ the outcome distributions of the projective measurements with respect to two arbitrary orthonormal bases $\{|x_1\rangle, |x_2\rangle\}$ and $\{|y_1\rangle, |y_2\rangle\}$, $J_2(p, q)$ obeys a state-independent lower bound [Eq. (9)] expressed in terms of $\eta := \max_{i,j} |\langle x_i | y_j \rangle|$. We provide the proof in Appendix A. Since the measure $J_2$ cannot be interpreted based on the two combined-experiment scenarios under which most existing entropic UR's fall, or indeed based on any single-experiment scenario, the above UR is nontrivially different from all previous ones. More generally, for $d$-dimensional $p$ and $q$, any joint uncertainty measure constructed as a Schur-concave function of the vector [...] is a valid joint uncertainty measure of $(p, q)$. So is any Schur-concave function of vectors of dimension $k < d$ constructed with the components [...]. These are just a handful of examples that we contrived for illustration, suggesting that a rich variety of UR's could be obtained by allowing joint uncertainty measures that don't lend themselves to interpretation as the outcome uncertainty of any single experiment.
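The behaviour of $J_2$ for a pair of qubit bases can be explored numerically. The sketch below takes the computational basis and a real basis tilted by a hypothetical angle `alpha`, minimizes $J_2$ over a grid of pure states, and compares the result with the expression $(1 - \eta^2)/2$. That closed form comes from a short Bloch-sphere calculation of ours and is an assumption of this example; it is not a quotation of the bound derived in the appendix, which may be stated differently.

```python
import cmath
import math

def outcome_distribution(basis, psi):
    """Born rule for a rank-1 projective measurement: p_a = |<a|psi>|^2."""
    return [abs(sum(complex(a).conjugate() * c for a, c in zip(vec, psi))) ** 2
            for vec in basis]

def J2(p, q):
    """J_2(p, q) = 1 - (p sorted) . (q sorted), both in nonincreasing order."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    return 1.0 - sum(a * b for a, b in zip(ps, qs))

def pure_qubit_states(n_theta=361, n_phi=36):
    """Grid of pure states cos(theta/2)|0> + exp(i phi) sin(theta/2)|1>."""
    for i in range(n_theta):
        theta = math.pi * i / (n_theta - 1)
        for k in range(n_phi):
            phi = 2.0 * math.pi * k / n_phi
            yield [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]

alpha = math.pi / 8   # hypothetical tilt between the two bases
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[math.cos(alpha), math.sin(alpha)], [-math.sin(alpha), math.cos(alpha)]]

# eta = largest overlap between a vector of A and a vector of B.
eta = max(abs(sum(complex(a).conjugate() * b for a, b in zip(x, y)))
          for x in A for y in B)

numeric_min = min(J2(outcome_distribution(A, psi), outcome_distribution(B, psi))
                  for psi in pure_qubit_states())
closed_form = (1.0 - eta ** 2) / 2.0   # our own assumed expression, see lead-in

print(eta, numeric_min, closed_form)
```

For identical bases ($\eta = 1$) both quantities vanish, consistent with the faithfulness property noted in the text, while for mutually unbiased bases ($\eta^2 = 1/2$) the assumed expression evaluates to $1/4$.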

IV. UNIVERSAL UNCERTAINTY RELATIONS
We could construct various uncertainty relations using the aforementioned recipe, with the given pair $(A, B)$ and different measures $J$. Every relation is stated in terms of a lower bound like the $c$ of (8), which in turn depends on $J$. In general, for a given $J$ it might be hard to compute such a bound. But suppose there were a fixed pair $(u, v)$ of distribution vectors, such that
$$J(p(\rho), q(\rho)) \ge J(u, v) \quad \forall \rho,\ \forall J. \qquad (11)$$
If there were such a pair, then for any given $J_0$ we would merely have to compute $J_0(u, v)$, immediately yielding a bound. In this sense, finding such a pair would amount to finding a plethora of uncertainty relations; therefore, such a pair can be said to constitute a universal uncertainty relation for the pair $(A, B)$ [24,25].
As it turns out, a nontrivial pair satisfying (11) never exists for any given $(A, B)$, because the clause "$\forall J$" in (11) includes all single-uncertainty measures of $p$ and $q$ alone, leading necessarily to the trivial choice $(u_0, v_0)$, where $u_0 \succ p(\rho)$ and $v_0 \succ q(\rho)$ for all $\rho$. Such a $(u_0, v_0)$ would be unhelpful in that it wouldn't impose joint restrictions on $(p, q)$. In order to avoid this triviality, we can relax the condition "$\forall J$", and instead require the inequality in (11) to only hold for some restricted class of $J$'s.
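The triviality can be made concrete with a small numerical check. The sketch below assumes that the relation between $u_0$ and $p(\rho)$ is ordinary majorization, and takes the deterministic distribution as the trivial choice; the helper names `majorizes` and `shannon`, the dimension 3, and the random sampling are illustrative assumptions. It shows that the deterministic vector majorizes every sampled distribution, and that the resulting bound for a sum of Shannon entropies is zero, i.e. it constrains nothing.

```python
import numpy as np

def majorizes(u, p, tol=1e-12):
    """True if u majorizes p: partial sums of sorted u dominate those of sorted p."""
    u_sorted = np.sort(np.asarray(u, dtype=float))[::-1]
    p_sorted = np.sort(np.asarray(p, dtype=float))[::-1]
    return bool(np.all(np.cumsum(u_sorted) >= np.cumsum(p_sorted) - tol))

def shannon(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

# Deterministic distribution in dimension 3 (the assumed trivial choice).
e1 = np.array([1.0, 0.0, 0.0])

# Random probability vectors standing in for outcome distributions p(rho).
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(3), size=1000)

trivial_pair_always_valid = all(majorizes(e1, p) for p in samples)

# The bound the trivial pair gives for J(p, q) = H(p) + H(q): zero.
trivial_bound = shannon(e1) + shannon(e1)
```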
Here we consider again the two restricted combined-experiment scenarios that we discussed in Section II. In the first scenario, both $A$ and $B$ are carried out independently of each other on copies of the same state $\rho$. The restricted class of joint uncertainty measures then consists of functions of the form $J(p(\rho), q(\rho)) = U(p(\rho) \otimes q(\rho))$. In Refs. [24,25], it was shown that for any given $A$, $B$ there exists a distribution vector $u \equiv \omega(A, B)$ such that the pair $(u, e_2)$ forms a universal uncertainty relation under this restricted class of joint uncertainty measures.
Similarly, following Example 2 of Section II, we can consider a combination wherein we first pick, at random, only one of the two measurements $A$ and $B$, and then perform that one. The joint uncertainty measures considered here are of the form $U(p(\rho) \oplus q(\rho))$. A nontrivial $(u, v)$ for this restricted class can be found using the methods in [26,27].
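The two restricted classes can be sketched in a few lines. In the code below, `U` is any single-distribution uncertainty measure (the Shannon entropy is used purely as an example), the tensor product is realized with `np.kron`, and the direct sum is realized by concatenating the two vectors weighted by the probability of picking each measurement. The equal weights of 1/2 and all function names are assumptions for illustration; the text above does not fix the normalization of the direct sum.

```python
import numpy as np

def shannon(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

def joint_tensor(U, p, q):
    """First scenario: both measurements performed independently, J = U(p tensor q)."""
    return U(np.kron(np.asarray(p, dtype=float), np.asarray(q, dtype=float)))

def joint_direct_sum(U, p, q, weight=0.5):
    """Second scenario: one measurement picked at random, J = U(p direct-sum q).

    `weight` is the (assumed) probability of choosing the first measurement.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return U(np.concatenate([weight * p, (1.0 - weight) * q]))

# Example distributions: one maximally uncertain, one completely certain.
p = [0.5, 0.5]
q = [1.0, 0.0]

tensor_value = joint_tensor(shannon, p, q)          # equals H(p) + H(q)
direct_sum_value = joint_direct_sum(shannon, p, q)  # equals 1 + (H(p) + H(q)) / 2
```

With the Shannon entropy, the tensor-product form reduces to the familiar sum $H(p) + H(q)$, while the equal-weight direct-sum form adds one bit for the random choice of measurement to the average of the two entropies.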
It might be possible to unify the spirit of the above two classes of universal relations into a larger class, by including all measures of joint uncertainty that are symmetric in the two (or more) distributions: $J(p, q) = J(q, p)$. This requirement avoids the case of trivial relations resulting from the requirement $(u, v) \succ (p, q)$, but we leave it open whether a nontrivial $(u, v)$ can be found. Another way of unifying several classes of universal relations, each with its respective $(u_i, v_i)$, is by bounding any measure $J$ as follows: An interesting open problem is whether there exists a finite integer $m$ such that minimizing over all $j \le m$ provides a nontrivial bound for all nontrivial joint uncertainty measures.
Universal uncertainty relations are a powerful tool inasmuch as they generate a variety of uncertainty relations, but the bounds they yield may not be tight. Besides, there are joint uncertainty measures that may not lend themselves to inclusion in a class that admits a nontrivial universal relation, but nevertheless do provide a nontrivial uncertainty relation. An example is the measure $J(p, q) = 1 - p^{\downarrow} \cdot q^{\downarrow}$, for which we found a UR in the previous section.

V. CONCLUSION
In this paper, we identified the most basic, measure-independent elements of the concept of uncertainty as applicable to finite-dimensional classical variables. We based our analysis on an information-theoretic study of two mechanisms of uncertainty increase: randomly chosen symmetry transformations, and classical processing via channels (followed by recovery). Corresponding to these, we identified two classes of doubly stochastic matrices, $D_{\mathrm{sym}}$ and $D_{\mathrm{rec}}$. Uncertainty measures in the strictest sense must be monotonically non-decreasing under both these classes.
We then took a similar information-theoretic approach to the concept of joint uncertainty of several variables, resulting in the principle that the most basic features of joint uncertainty measures must not depend on specific operational combinations of the variables. We then considered quantum uncertainty relations (UR's) of the preparational uncertainty type, where past works have always considered specific operational combinations. Applying our new notion of joint uncertainty not only resulted in a unified understanding of a large class of UR's, but also opened up the possibility of deriving a new class of preparational UR's, namely identities that are mathematically valid for any preparation, but cannot be interpreted based on any single experimental scenario. To illustrate, we constructed a class of joint uncertainty measures with this property, and derived a new UR using one of these measures as an example. Finally, we found that so-called universal uncertainty relations cannot be found over all possible measures of joint uncertainty. We connected universal relations found in past works [24][25][26][27] with specific operational interpretations of joint uncertainty.
In cryptographic tasks we must consider the uncertainty of systems that could be correlated with quantum memories in adversarial control; our recent work [50] is a step towards developing a measure-independent notion of such conditional uncertainty. More generally, a formalism for treating the uncertainty of quantum information correlated with quantum memories is not yet developed. A more complete characterization of uncertainty on infinite-dimensional systems is another challenging future project. This could impact applications of squeezed states, which are ubiquitous in quantum information processing with continuous variables. Yet another open problem is to improve our understanding of universal uncertainty relations; in particular, to answer the open questions posed at the end of Section IV regarding stronger classes of universal relations. Finally, there is much to be understood about the classes $D_{\mathrm{rec}}$ and $D_{\mathrm{sym}}$ of doubly stochastic matrices.

VI. ACKNOWLEDGMENTS
GG is grateful for many interesting discussions with Michał Horodecki, Amir Kalev, Iman Marvian, and Rob Spekkens. In particular, we are grateful to Iman Marvian and Rob Spekkens for pointing out to us the role of symmetry in quantum uncertainty relations. The authors acknowledge helpful discussions with Mark Girard, Marco Piani, and Borzu Toloui. We thank Steven Nich for help with the calculations, Patrick Coles and Marco Tomamichel for bringing relevant literature to our notice, and anonymous reviewers for helpful comments and suggestions. This research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).