Matrix-valued Bessel processes

This paper introduces a matrix analog of the Bessel processes, taking values in the closed set $E$ of real square matrices with nonnegative determinant. They are related to the well-known Wishart processes in a simple way: the latter are obtained from the former via the map $x\mapsto x^\top x$. The main focus is on existence and uniqueness via the theory of Dirichlet forms. This leads us to develop new results of potential theoretic nature concerning the space of real square matrices. Specifically, the function $w(x)=|\det x|^\alpha$ is a weight function in the Muckenhoupt $A_p$ class for $-1<\alpha\le 0$ ($p=1$) and $-1<\alpha<p-1$ ($p>1$). The set of matrices of co-rank at least two has zero capacity with respect to the measure $m(dx)=|\det x|^\alpha dx$ if $\alpha>-1$, and if $\alpha\ge 1$ this even holds for the set of all singular matrices. As a consequence we obtain density results for Sobolev spaces over (the interior of) $E$ with Neumann boundary conditions. The highly non-convex, non-Lipschitz structure of the state space is dealt with using a combination of geometric and algebraic methods.


Introduction and preliminaries
The Wishart processes, taking values in the cone S d + of positive semidefinite d × d matrices, constitute a class of matrix-valued Markov processes generalizing the squared Bessel (BESQ) processes. They were first introduced by Bru [5,6], and have subsequently been studied further and extended in various directions by a number of authors, for example [11,18,10,8]. They have also found use in applied contexts, for instance in finance [16,9].
The existence of a well-behaved matrix analog of the BESQ processes raises the question of whether the same is true for the Bessel (BES) processes. Since the Wishart process is $S^d_+$-valued, a natural candidate is its positive semidefinite square root. This was considered in [17], where the resulting Markov process is described via the dynamics of its eigenvectors and eigenvalues. However, as was pointed out already by Bru [6], it appears difficult to obtain the dynamics of the process itself, or to succinctly describe its generator. The aim of the present paper is to show that a more well-behaved class of processes is obtained by passing to the larger state space
$$E=\{x\in M^d : \det x\ge 0\},$$
where $M^d$ is the Euclidean space of all $d\times d$ real matrices, endowed with the usual inner product $x\bullet y=\operatorname{Tr}(x^\top y)$ and norm $\|x\|=\sqrt{x\bullet x}$. The matrix-valued Bessel process with parameter $\delta>0$ and matrix dimension $d$, abbreviated BESM($\delta$, $d$), will be an $E$-valued Markov process whose generator is given by
$$Lf(x)=\tfrac12\Delta f(x)+\tfrac{\delta-1}{2}\,x^{-\top}\bullet\nabla f(x),\qquad(1)$$
where $\nabla f$ is the gradient of $f$, and $\Delta=\nabla\bullet\nabla=\sum_{i,j}\partial^2_{x_{ij}x_{ij}}$ is the Laplacian. To make $x^{-1}$ globally defined, we set it to zero for singular $x$ (this choice is arbitrary and inconsequential). Notice that for $d=1$, $L$ is the generator of the BES($\delta$) process.
Existence of the BESM($\delta$, $d$) process is proved via the theory of Dirichlet forms, which handles the singular drift term of $L$ well. The crucial fact is that $L$ is a symmetric operator with respect to the measure $m(dx)=|\det x|^{\delta-1}dx$, which is a consequence of an integration by parts formula (Theorem 2). The Dirichlet form is then given by the simple expression
$$\mathcal E(f,g)=\frac12\int_E \nabla f\bullet\nabla g\,dm.$$
Uniqueness is a much more delicate issue. Relying on density results for certain Sobolev spaces with Neumann boundary condition (Theorem 4), we establish Markov uniqueness in the sense of Eberle [13]. Obtaining these density results is a nontrivial matter. In particular, we are led to prove several results, interesting in their own right, about the measure $m$ and its interaction with the state space. Specifically, we show that the matrices of co-rank at least two form a set of zero capacity with respect to $m$, and that if $\delta\ge 2$, the set of all singular matrices has zero capacity (Theorem 5). Moreover, we prove that $|\det x|^\alpha$ is locally Lebesgue integrable on $M^d$ precisely when $\alpha>-1$ (Theorem 1), and that it is a weight function in the Muckenhoupt $A_p$ class when $\alpha\in(-1,0]$ and $p=1$, and when $\alpha\in(-1,p-1)$ and $p>1$ (Theorem 6). This exactly parallels the well-known situation for the weight function $|t|^\alpha$ on $\mathbb R$.
The proofs of these results require some effort. The difficulties mainly arise due to the highly non-convex, non-Lipschitz structure of the state space E. In fact, the interior E o does not even lie on one side of its boundary ∂E, as can be seen by considering the lines through the origin, {tx : t ∈ R}, x ∈ M d : If d is even, each line lies either entirely inside E, or entirely outside E o . These issues are resolved via a combination of geometric methods (relying on the stratification of E into smooth manifolds M k consisting of rank k matrices) and algebraic methods (mainly the QR-decomposition and estimates of the determinant function near M k .) One would expect similar techniques to be useful for the analysis of Markov processes on more general stratified spaces; some of the groundwork for this is laid in [12].
Let us say something about why the BESM processes are natural analogs of the BES processes, other than the resemblance of their generators. The main reason is that the process $X^\top X$, where $X$ is BESM($\delta$, $d$), is a Wishart process with parameter $\alpha=d-1+\delta$, denoted WIS($\alpha$, $d$), see Theorem 3. As a consequence, $\|X\|$ is BES($d\alpha$), and $\det X$ is a time-changed BES($\delta$) process. Moreover, just as in the scalar case, $X$ is the weak solution of a stochastic differential equation for $\delta>1$, while it is not even a semimartingale for $0<\delta<1$. A general discussion of BES and BESQ processes is available in [30, Chapter XI]. For a specialized treatment of the case $0<\delta<1$, see [4,2].
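As a quick consistency check on the parameter α = d − 1 + δ, one can apply the generator to f(x) = ‖x‖². The sketch below assumes that L has the form Lf = ½∆f + ((δ − 1)/2) x^{−⊤} • ∇f (the form suggested by the case d = 1 and by the identity L = −½∇*∇ of Section 2), and evaluates it by central finite differences. The function names are illustrative only and are not taken from the paper. The resulting value d(d − 1 + δ) = dα is the drift one expects for a BESQ(dα) process.

```python
import numpy as np

def frob_sq(x):
    """f(x) = ||x||^2, the squared Frobenius norm."""
    return float(np.sum(x * x))

def besm_generator(f, x, delta, h=1e-3):
    """Evaluate (1/2) Laplacian(f) + ((delta-1)/2) x^{-T} . grad f at x.

    The form of the generator is an assumption (see the lead-in);
    derivatives are approximated by central finite differences.
    """
    d = x.shape[0]
    grad = np.zeros((d, d))
    lap = 0.0
    fx = f(x)
    for i in range(d):
        for j in range(d):
            e = np.zeros((d, d))
            e[i, j] = h
            fp, fm = f(x + e), f(x - e)
            grad[i, j] = (fp - fm) / (2 * h)      # first derivative in x_ij
            lap += (fp - 2 * fx + fm) / h**2      # pure second derivative in x_ij
    inv_t = np.linalg.inv(x).T                    # x^{-T}, requires det x != 0
    drift = 0.5 * (delta - 1.0) * np.sum(inv_t * grad)
    return 0.5 * lap + drift

d, delta = 3, 2.5
# A fixed nonsingular test point close to the identity.
x = np.eye(d) + 0.1 * np.arange(d * d).reshape(d, d) / (d * d)
value = besm_generator(frob_sq, x, delta)
expected = d * (d - 1 + delta)                    # d * alpha
```

Since f is quadratic, the finite differences are exact up to rounding, so `value` and `expected` should agree closely at any nonsingular point.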
A second motivation, which was the original "clue" that led us to consider the generator $L$, is as follows. Let $X$ be an $M^d$-valued Brownian motion starting from $I$ (the identity matrix), and let $\tau_0$ be the first time $\det X_t$ hits zero. Then $\det X_{t\wedge\tau_0}$ is a martingale, and we may use it to change the probability measure. An application of Girsanov's theorem, using the identity $\nabla\ln\det(x)=x^{-\top}$, shows that under the new measure, the process $X_t-\int_0^t X_s^{-\top}\,ds$ is $M^d$-valued Brownian motion. Thus $X$ becomes a Markov process whose generator is $L$ with $\delta=3$, and its determinant is positive by construction. This is fully analogous to the well-known construction via Doob's h-transform of the BES(3) process as Brownian motion "conditioned to stay positive".
In addition to the notation already introduced above, the following conventions will be in force throughout the paper.
• As usual, the symbols C(U ); C c (U ); C k (U ) denote the spaces of continuous; compactly supported; k-times continuously differentiable functions on a subset U ⊂ M d equipped with the relative topology. Writing C(U ; V ), etc., means that the functions take values in the topological space V . Note that U may be closed in M d , e.g. if U = E. In this case, the compact sets need not be bounded away from ∂U . [20], and we have ∂E We refer to [25] for background on differential geometry.
where µ is a non-normalized Haar measure on O(d), and $dR=\prod_{i\le j}dR_{ij}$.
• For $x\in M^d$, we let adj $x$ denote the adjugate matrix of $x$ (i.e., the transpose of the matrix of cofactors). It satisfies the identities $x\,\operatorname{adj}x=(\det x)I$ and $\nabla\det(x)=\operatorname{adj}x^\top$, and we have $\operatorname{adj}x=0$ if and only if $\operatorname{rank}x\le d-2$.
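The adjugate identities are easy to check numerically. The following sketch (helper names are hypothetical, not from the paper) computes adj x from cofactors, compares the entrywise finite-difference gradient of det with the cofactor matrix, that is, with the transpose of adj x as used later in the paper (∇ det(x) = adj x^⊤ with respect to the inner product Tr(x^⊤y)), and confirms that adj x vanishes for a matrix of rank d − 2.

```python
import numpy as np

def adjugate(x):
    """Transpose of the cofactor matrix, computed entrywise from minors (d >= 2)."""
    d = x.shape[0]
    cof = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            minor = np.delete(np.delete(x, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

def grad_det_fd(x, h=1e-6):
    """Central finite-difference gradient of det with respect to each entry."""
    d = x.shape[0]
    g = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e = np.zeros((d, d))
            e[i, j] = h
            g[i, j] = (np.linalg.det(x + e) - np.linalg.det(x - e)) / (2 * h)
    return g

x = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])                      # nonsingular, non-symmetric
rank_one = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, -1.0])  # rank 1 = d - 2 for d = 3
```

Because det is affine in each single entry, the central difference is exact up to rounding, which makes the comparison with the cofactor matrix a sharp test of the transpose convention.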
The rest of this paper is organized as follows. The BESM process and semigroup are defined, and proved to exist, in Section 2. Some fundamental properties, including the relation to the Wishart process, are discussed in Section 3. Markov uniqueness is proved in Section 4. The crucial Theorems 5 and 6 are proved in Sections 5 and 6, respectively. The integration by parts formula (Theorem 2) is proved in Appendix A, while Appendices B and C contain, respectively, some auxiliary results on Sobolev spaces and differential geometry.

Definition and existence
The definition of the BESM process is based on the differential operator L in (1) acting on functions in D, where As we will see momentarily, (L, D) is a symmetric operator on L 2 (E, m), where the measure m is given by m(dx) = | det x| δ−1 dx.
(Occasionally $m$ will be viewed as a measure on $M^d$, or on subsets other than $E$.) The inner product on $L^2(E,m)$ is denoted by $\langle\cdot,\cdot\rangle$. Specifically, we write
$$\langle f,g\rangle=\int_E fg\,dm,\qquad \langle F,G\rangle=\int_E F\bullet G\,dm,$$
where $f,g\in L^2(E,m)$ and $F,G\in L^2(E,m;M^d)$. The overlapping notation should not cause any confusion.
The following result shows that m is a Radon measure on M d , and hence on E, when δ > 0. It implies in particular that we have D ⊂ L 2 (E, m).
Theorem 1. Let α ∈ R and define w(x) = | det x| α . The function w is locally integrable on M d if and only if α > −1.
Proof. Let A ⊂ M d be relatively compact. Since ∂E is a nullset, we may assume that A ∩ ∂E = ∅. Then there is a cube K ⊂ T (d), say $K=\prod_{i\le j}I_{ij}$, of bounded open intervals I ij such that I ii ⊂ (0, ∞) and I ij ⊂ R (i < j), and such that A ⊂ O(d) · K. Hence, the change-of-variable formula in Lemma 1 yields where µ is a non-normalized Haar measure on O(d). The right side is finite, provided α > −1. If on the other hand α ≤ −1, take I ij = (0, 1) for all i ≤ j, and set Then A is relatively compact, but $\int_A w(x)\,dx=\infty$.
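The dichotomy at α = −1 can be made concrete. The sketch below assumes the standard Jacobian of the QR change of variables, dx = ∏_i R_ii^{d−i} dµ(Q) dR (the precise statement of Lemma 1 is not reproduced here, so this form is an assumption). Under it, the integral of |det x|^α over O(d) · K with all I_ij = (ε, 1) reduces, up to a constant factor from the Haar measure and the off-diagonal intervals, to a product of one-dimensional integrals of r^{α+d−i}. The function names are illustrative.

```python
import math

def truncated_power_integral(exponent, eps):
    """Integral of r**exponent over (eps, 1), in closed form."""
    if exponent == -1.0:
        return -math.log(eps)
    return (1.0 - eps ** (exponent + 1.0)) / (exponent + 1.0)

def truncated_weight_integral(alpha, d, eps):
    """Product over i = 1..d of the integral of r**(alpha + d - i) over (eps, 1).

    Assumes the QR Jacobian prod_i R_ii**(d - i); constant factors are omitted.
    """
    total = 1.0
    for i in range(1, d + 1):
        total *= truncated_power_integral(alpha + d - i, eps)
    return total

# alpha > -1: the truncated integrals converge as eps -> 0.
finite_case = truncated_weight_integral(-0.5, 3, 1e-12)
# alpha <= -1: the factor with i = d diverges as eps -> 0.
borderline_case = truncated_weight_integral(-1.0, 3, 1e-12)
divergent_case = truncated_weight_integral(-1.5, 3, 1e-12)
```

Only the factor with i = d is sensitive to α near −1, which is why the threshold does not depend on the matrix dimension d.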
Consider the differential operator $\nabla^*$ given by
$$\nabla^*G(x)=-\nabla\bullet G(x)-(\delta-1)\,x^{-\top}\bullet G(x).$$
This notation is justified by the following integration by parts formula, which shows that $\nabla^*$ acts as an adjoint of $\nabla$. Together with the observation that $L=-\frac12\nabla^*\nabla$, this will imply that $L$ is indeed a symmetric operator on $L^2(E,m)$.
Theorem 2 (Integration by parts formula). Suppose $\delta>0$, and consider $f\in C^1_c(E)$ and $G\in C^1(E;M^d)$. If $\delta\le 1$, assume that $G(x)$ is tangent to $\partial E$ at $x$ for all $x\in\partial E$. If $\delta<1$, assume in addition that $G(x)\bullet x^{-\top}$ is locally bounded. Then
$$\langle\nabla f,G\rangle=\langle f,\nabla^*G\rangle.$$
Proof. See Appendix A.
Remark 1. It is not hard to show that local boundedness of G(x) • x −⊤ implies that G(x) is tangent to ∂E at x for all x ∈ ∂E. Hence for g ∈ D, G = ∇g will always satisfy the assumptions of Theorem 2.
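The fact that $L$ is symmetric is apparent from the equalities
$$\langle f,Lg\rangle=-\tfrac12\langle f,\nabla^*\nabla g\rangle=-\tfrac12\langle\nabla f,\nabla g\rangle,\qquad(3)$$
valid for any $f\in C^1_c(E)$ and any $g\in D$.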
If δ > 1 we may take any g ∈ C 2 c (E). The BESM process is now defined as follows.
An E-valued Markov process X is said to be m-symmetric if its transition function p t (x, dy) is m-symmetric. In this case the operators E f (y)p t (x, dy), where f is bounded and in L 2 (E, m), can be extended to all of L 2 (E, m), see [15, page 30]. This extension is called the L 2 (E, m) semigroup of X.
While uniqueness of the BESM semigroup and process is a delicate matter, existence is straightforward via the theory of Dirichlet forms. In view of (3) it is natural to consider the symmetric bilinear form This form is closable on L 2 (E, m), as can be deduced from Theorem 2 as follows. Pick a sequence ( It follows that F = 0 m-a.e., establishing closability. We define (E, D(E)) = closure of (E, C 1 c (E)).
An application of [15, Theorem 3.1.2] then shows (after routine verification of the conditions of that theorem) that (E, D(E)) is a regular, strongly local Dirichlet form. Furthermore, (3) implies that the generator associated with E coincides with L when acting on functions in D (or in C 2 c (E) when δ > 1.) With some abuse of notation, we therefore let (L, D(L)) = generator of (E, D(E)), noting that the domain D(L) contains D, and even contains C 2 c (E) if δ > 1. In particular, the semigroup (T t : t > 0) on L 2 (E, m) associated with E and L is a BESM(δ, d) semigroup.
A corresponding BESM(δ, d) process is then obtained as the m-symmetric Hunt process X on E associated with the Dirichlet form (E, D(E)), see [15, Theorems 7.2.1 and 7.2.2]. The strongly local property of E implies that this process has continuous paths. However, it is not guaranteed a priori that X is conservative; we now prove that it is, thereby obtaining existence of the BESM(δ, d) process. In the following, let P x be the law of X starting from x ∈ E. Proposition 1. The Dirichlet form E is conservative, i.e. the semigroup (T t : t > 0) satisfies T t 1 = 1 for all t > 0. Consequently, X can be chosen so that Proof. By [15,Theorem 1.6.6], E is conservative if there is a sequence (f n ) ⊂ D(E) such that 0 ≤ f n ≤ 1 and lim n f n = 1 m-a.e., and such that lim n E(f n , g) = 0 holds for any g ∈ D(E) ∩ L 1 (E, m).

Differentiating twice yields
Since ∇f n and ∆f n both vanish outside the set E n = {x ∈ E : n ≤ ‖x‖ < n + 1}, we obtain f n ∈ D ⊂ D(L) as well as where Hölder's inequality was applied. For n ≥ 1, the supremum is bounded by a constant c > 0 that is independent of n. Hence We deduce that lim n |E(f n , g)| = 0, showing that E is conservative. The statement about X now follows from [15, Exercise 4.5.1].

Some properties and the relation to Wishart processes
Throughout this section $X$ denotes a BESM($\delta$, $d$) process, and $P_x$ denotes its law when started from $x\in E$. Our goal is to study some of its basic properties, in particular the relation to Wishart processes. Much of the analysis relies on the following standard result, which states that $X$ solves the martingale problem for $L$.
Lemma 2. Pick any $f\in D(L)$ such that $Lf$ is locally $m$-integrable on $E$. For q.e. $x\in E$ we have $\int_0^t|Lf(X_s)|\,ds<\infty$ for all $t\ge 0$, $P_x$-a.s., and the process
$$f(X_t)-f(X_0)-\int_0^t Lf(X_s)\,ds\qquad(4)$$
is a square integrable martingale under $P_x$.
Remark 2. The exceptional set for which the conclusion of the lemma fails depends on the function f in general. We conjecture that X is in fact strongly Feller. In this case the quantifier "for q.e. x ∈ E" can be replaced with "for every x ∈ E" in the above lemma, as well as in all subsequent results.
(ii) the process W defined via is M d -valued Brownian motion under P x .
Proof. We saw in Section 2 that C 2 c (E) ⊂ D(L) if δ > 1, and Theorem 1 implies that 1/ det(x) is locally m-integrable in this case. Let C be a countable subset of C 2 c (E). Lemma 2 then implies that there is an exceptional set N ⊂ E such that for all x ∈ E \ N we have (i), and (4) defines a P x martingale for all f ∈ C. Choosing C suitably, standard arguments (see for instance [31,Theorem V.20.1]) show that X solves the stochastic differential equation associated with L; that is, (ii) holds.
Corollary 1. For δ > 1, X is a semimartingale under P x for q.e. x ∈ E.
We now describe the properties of the transformed processes X ⊤ X, ‖X‖, det X. The main observation is the following. Define the map Φ : E → S d + , Φ(x) = x ⊤ x, and consider the operator This is the generator of the WIS(α, d) process, see [6]. The Wishart process exists and is nondegenerate (in the sense of not being absorbed when it hits the boundary ∂S d + ) precisely when α > d − 1.
Now, for any g ∈ C ∞ c (S d + ) and any x ∈ E, one readily verifies the identities Consequently we have where the latter function lies in C ∞ c (E) and in particular is locally m-integrable. An application of Lemma 2 then shows that Φ(X) = X ⊤ X solves the martingale problem for and hence is a Wishart process. This proves part (i) of the following theorem.
(ii) Let dz denote Lebesgue measure on S d , and define a measure on S d + by Then m is a Radon measure if and only if α > d − 1.
This form is closable in L 2 (S d + , m). Its closure is a regular, strongly local, conservative Dirichlet form, whose generator coincides with L WIS on C ∞ c (S d + ). In particular, ) is a symmetric operator on L 2 (S d + , m). Proof. Part (i) was proved above. For part (ii), it follows from [29, Theorem 2.1.14] that m = c Φ * m for some constant c > 0, where Φ * m is the pushforward of m under Φ. Moreover, due to the bounds The result now follows from Theorem 1. It remains to prove part (iii), and we start by expressing where we used the definition of E, the first identity in (7), the change of variable theorem, and finally the expression for m. This together with the equality lets us deduce closability, regularity and strong locality from the corresponding properties of E. Conservativeness is proved as in Proposition 1 by observing that the functions f n appearing there are of the form f n = g n • Φ.
To complete the proof we must relate E WIS to the operator L WIS . Using (3) and (8) Combining this with the previous expression, we arrive at which implies that L WIS coincides with the generator of E WIS on C ∞ c (S d + ).

Remark 3.
It is interesting to note that the restriction δ > 0 corresponds exactly to the (well-known) condition (6) for the existence of a non-degenerate Wishart process, see [6,Theorem 2]. Theorem 3 connects this to the Radon property of the symmetrizing measure m.
Remark 4. The transition density q(t, u, z) of the WIS(α, d) process is given in [11]. In terms of the measure m it becomes where Γ d is the multivariate Gamma function, and 0 F 1 is a hypergeometric function with matrix argument; see [11] for the precise definitions. Note that the density with respect to m is symmetric, as it should be.
Corollary 2. The following statements hold for q.e. x ∈ E. Let X be a BESM(δ, d) process as above.
where C(u) = inf{t ≥ 0 : A t > u} is the right-continuous inverse of A. Then A is strictly increasing, and ξ is a BES(δ) process stopped at A ∞ .
Proof. Part (i) is immediate from the well-known fact that the trace of a WIS(α, d) process is a BESQ(dα) process, see e.g. [6]. We now prove part (ii). The strict increase of A follows from the fact that X spends zero time at ∂E (since m(∂E) = 0), so that {t : adj X t = 0} is a nullset. Next, set Z = X ⊤ X and note (see [6,Section 4]) that det Z satisfies Tr(adj Z s )ds for some standard Brownian motion β. Hence after a time change (see [30, Proposition V.1.4]) and using that where we defined which is Brownian motion stopped at A ∞ . It follows that det Z C(·) satisfies the stochastic differential equation for the BESQ(δ) process, stopped at A ∞ . Since det X t = √ det Z t the result follows.
Corollary 3. For 0 < δ < 1, X fails to be a semimartingale under P x for q.e. x ∈ E.
Proof. If X were a semimartingale, then so would det X, as well as the process ξ in Corollary 2. However, the BES(δ) process, 0 < δ < 1, fails to be a semimartingale on any interval larger than [0, τ ), where τ is the first time it hits zero. It thus suffices to show that det X t = 0 for some finite t. Indeed, setting u = A t then yields ξ u = det X t = 0 and u < A ∞ , due to the strict increase of A. Thus ξ hits zero before it is stopped, and fails to be a semimartingale. This contradiction shows that X could not have been a semimartingale. The fact that det X t = 0 for some finite t follows from the corresponding well-known fact for the Wishart process.
Remark 5. In view of Corollaries 1 and 3 one wonders whether X is a semimartingale for δ = 1. Just as in the scalar case, it turns out that it is; in fact, X is reflected Brownian motion. We do not discuss this further here.
We close this section with a pathwise construction of the BESM process as the strong solution to the stochastic differential equation (5). This construction only works for δ ≥ 2 and if the process starts from the interior of E. Whether strong solutions exist for all δ > 1 is an open question also in the case of Wishart processes.
Proposition 3. Suppose δ ≥ 2, and let W be standard M d -valued Brownian motion defined on some probability space. The stochastic differential equation has a unique E o -valued strong solution for every x ∈ E o .
Proof. The proof uses McKean's argument; see [26, Section 4.1] for a thorough treatment in a related setting. We only sketch the proof here. Since To see this, set Z = X ⊤ X and note that we have and after verifying that W is again M d -valued Brownian motion on [0, ζ), it follows that Z is a WIS(d − 1 + δ, d) process on [0, ζ) (this calculation is of course closely related to the one leading up to (8)). Since δ ≥ 2, well-known properties of the Wishart process (cf. [6], or Corollary 2 (ii) above) imply that det X t = √ det Z t stays strictly positive, and that ‖X t ‖ 2 = Tr Z t is nonexplosive. Hence ζ = ∞ as claimed, and the result follows.
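For intuition about the dynamics in Proposition 3, the following is a minimal Euler–Maruyama sketch. It assumes that the stochastic differential equation has the form dX_t = dW_t + ((δ − 1)/2) X_t^{−⊤} dt, which is what the generator L suggests; the displayed equation itself is not reproduced here, so this form is an assumption. The scheme is only an illustration and not the construction used in the proof: a discretized path is not guaranteed to stay in E^o, and the function name and parameters are hypothetical.

```python
import numpy as np

def simulate_besm_euler(x0, delta, dt, n_steps, seed=0):
    """Euler-Maruyama path for dX = dW + ((delta - 1)/2) X^{-T} dt (assumed form).

    Returns an array of shape (n_steps + 1, d, d). No positivity of det is enforced.
    """
    rng = np.random.default_rng(seed)
    d = x0.shape[0]
    path = np.empty((n_steps + 1, d, d))
    path[0] = x0
    for n in range(n_steps):
        x = path[n]
        drift = 0.5 * (delta - 1.0) * np.linalg.inv(x).T   # singular drift term
        noise = np.sqrt(dt) * rng.standard_normal((d, d))  # Brownian increment
        path[n + 1] = x + drift * dt + noise
    return path

# Start from the identity, in the interior of E, with delta >= 2.
path = simulate_besm_euler(np.eye(2), delta=3.0, dt=1e-3, n_steps=200)
dets = np.linalg.det(path)   # determinant along the discretized path
```

Inspecting `dets` gives a rough picture of how the drift pushes the path away from the singular matrices; smaller step sizes are needed when the determinant gets close to zero.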

Uniqueness
The goal of this section is to establish uniqueness of the BESM semigroup. Specifically, we will prove that (L, D) is Markov unique. This means that there is at most one (and hence exactly one) symmetric sub-Markovian strongly continuous contraction semigroup on L 2 (E, m) whose generator extends (L, D), see [13, Definition 1.1.2]. Since the Hunt process corresponding to such a semigroup is unique up to equivalence 1 , this form of uniqueness will hold for any realization of the BESM process as a Hunt process. In particular, uniqueness in law is guaranteed.
1 Two symmetric Hunt processes are called equivalent if their transition functions coincide outside a properly exceptional set, see Section 4.1 in [15].
Note that m(∂E) = 0. Therefore L 2 (E, m) and L 2 (E o , m) can be identified, implying that it is enough to prove Markov uniqueness of (L, D) as an operator on the latter space. To do this we will apply a general result by Eberle [13,Corollary 3.2] that relies on studying the relationship between various weighted Sobolev spaces, which we now introduce. To simplify notation we henceforth write Observe that 1/(det x) δ−1 is locally integrable on Ω. Hence by [24,Theorem 1.5], L 2 (Ω, m) is continuously imbedded in L 1 loc (Ω). In particular, every f ∈ L 2 (Ω, m) has a gradient Df in the sense of distributions, and one can define the weak Sobolev space Equipped with the norm [24,Theorem 1.11]. Appendix B reviews some basic properties of W 1,2 (Ω, m) that will be needed in the sequel. We also consider the following strong Sobolev spaces (here the word completion is always meant with respect to the norm · W 1,2 (Ω,m) ): These are all Hilbert spaces by construction, and we automatically have The main result of this section shows that for any δ > 0, the last two inclusions are equalities; and if δ ≥ 2, all three inclusions are equalities. This will lead to Markov uniqueness of the BESM semigroup.
Theorem 4. The following statements hold.
Before giving the proof of Theorem 4, we note that Markov uniqueness now follows directly from the basic criterion for Markov uniqueness given in [13,Corollary 3.2], which only relies on the equality W 1,2 (Ω, m) = H 1,2 Neu (Ω, m).
, and we deduce the following corollary to Theorem 4: In particular, [23, Theorem 2.5] then implies that the last inclusion in (10) is in fact an equality.
The proof of Theorem 4 relies crucially on Theorems 5 and 6. It also uses what we refer to as tube segments, discussed in Appendix C, as well as some basic properties of the space W 1,2 (Ω, m), reviewed in Appendix B. Most of the difficulties arise for 0 < δ < 2. In fact, the case δ ≥ 2 only requires the following lemma, which uses Theorem 5 but not Theorem 6.
Since the elements f ∈ W 1,2 (Ω, m) that are bounded with bounded support are dense (see Lemma 11 (iv)), it suffices to approximate such f by elements h ∈ W Γ . By scaling we may assume |f | ≤ 1. Let ε > 0 be arbitrary, and let C > 0 be the constant given by Lemma 12. By Theorem 5 there is a neighborhood U of Γ and an element g ∈ W 1,2 (M d , m) such that g ≥ 1 on U and g W 1,2 (Ω,m) ≤ ε/C. By truncating (using Lemma 11 (ii)) we may assume |g| ≤ 1. Define h = (1 − g)f . Lemma 12 then yields This proves the lemma.
For 0 < δ < 2 the boundary no longer has zero capacity, which makes this case more delicate. The following lemma is crucial. It uses a powerful extension theorem due to Chua [7] for weighted Sobolev spaces with Muckenhoupt weights. In particular, therefore, we will rely on the Muckenhoupt A 2 property of | det x| δ−1 for 0 < δ < 2, which is asserted by Theorem 6. Chua's theorem requires the domain to be a so-called (ε, δ)-domain. Unfortunately, it does not appear straightforward to show that Ω itself is of this type. Instead we employ a partition of unity argument with an open cover consisting of tube segments (see Appendix C). The intersection of Ω with a tube segment around some x ∈ M d−1 is an (ε, δ)-domain (indeed, a Lipschitz domain), and Chua's theorem becomes applicable.
Proof of Theorem 4 (i). In view of (10) and the definition of H 1,2 Neu (Ω, m), we need to prove that D is dense in W 1,2 (Ω, m). By Lemma 4 it suffices to approximate compactly supported elements f ∈ W 1,2 (M d , m) such that f = 0 on a neighborhood of Γ. By mollification (see [23,Lemma 1.5]) we can assume f ∈ C ∞ c (E \ Γ). An approximating function h ∈ D can be constructed explicitly, relying on the fact that f = 0 on a neighborhood of Γ. We now give the details.
For ε > 0, let φ ε ∈ C 2 (R + ) satisfy the following properties: and φε(t) t are bounded in t (where the bound may depend on ε), Such φ ε exists: first set φ 1 (t) = tψ(t) where ψ is some smooth cutoff function, and then φ ε (t) = εφ 1 (t/ε). Now, let K denote the support of f and define Note that ∇ det(x) = adj x ⊤ ≠ 0 for all x ∈ K ⊂ M d \ Γ, so that g is well-defined and in C 2 c (E). Moreover, we have G • ∇ det = 0. Consider the function We claim that h ε ∈ D and h ε → f in W 1,2 (Ω, m) as ε ↓ 0. To prove this we first obtain, via a calculation using the chain and product rules, the following two expressions for ∇h ε : Equations (11) and (12) and properties (i) and (ii) of φ ε yield the pointwise inequalities Together with the fact that m({x ∈ K : 0 ≤ det ≤ ε}) tends to zero as ε ↓ 0, this yields h ε ∈ D and h ε → f in W 1,2 (Ω, m). It remains to check h ε ∈ D. Clearly h ε ∈ C 2 c (Ω). Moreover, (13) together with the orthogonality G • ∇ det = 0, as well as the fact that Property (iii) of φ ε implies that the right side is bounded, as required. This completes the proof.
Remark 7. Using results in [12], the space D can be shown to be dense in C 2 c (E) with respect to the norm ‖·‖ W 1,2 (Ω,m) . An alternative approach to proving Theorem 4 (i) would therefore be to show directly that C 2 c (E) is dense in W 1,2 (Ω, m), for example by showing that Ω is an (ε, δ)-domain and then applying Chua's extension theorem. Proving the (ε, δ) property does not appear to be straightforward; one obstruction is that Ω does not lie on one side of its boundary, as discussed in the Introduction.

Low-rank matrices have zero capacity
This section is devoted to proving that the sets M k consisting of rank k matrices have zero capacity for all sufficiently small k. This is a key ingredient in the proof of Theorem 4, and also interesting in its own right. We use the following notion of capacity. For any subset where the infimum is taken over all f ∈ The core of the proof of Theorem 5 is an application of the following lemma, which bounds the growth of the determinant function near a point x ∈ M k .

Proof. By [3, Corollary 5],
where σ(x) = (σ 1 (x), . . . , σ d (x)) is the vector of singular values of x, and p i is the ith elementary symmetric polynomial in d variables. Now, p d−i (σ 1 (x), . . . , σ d (x)) consists of a sum of terms, each of which is the product of d − i distinct elements of σ(x). However, since rank x = k, only k of those elements are nonzero. Therefore the product must contain at least one zero factor whenever d − i > k, implying that p d−i (σ 1 (x), . . . , σ d (x)) = 0 for these i. Since in addition ‖v‖ ≤ 1 and det x = 0, we get The local Lipschitz property follows from the smoothness of p d−i and the fact that the singular value map is Lipschitz continuous, see [21,Theorem 7.4.51].
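The two ingredients of this proof can be illustrated numerically. The sketch below (function names are hypothetical) first checks that the elementary symmetric polynomials of the singular values of a rank k matrix vanish beyond order k, and then evaluates the ratio |det(x + tv)| / t^{d−k} for small t. That the lemma bounds the determinant by a multiple of t^{d−k} is inferred from the argument above and from its use in the proof of Theorem 5; the exact statement is not reproduced here, so treat it as an assumption.

```python
import numpy as np

def elementary_symmetric(values):
    """Return [e_0, e_1, ..., e_d] of the given numbers.

    Uses the expansion prod_i (z + s_i) = sum_j e_j z^(d - j).
    """
    return np.poly(-np.asarray(values, dtype=float))

def det_growth_ratio(x, v, t, k):
    """|det(x + t v)| / t**(d - k) for a rank-k matrix x and direction v."""
    d = x.shape[0]
    return abs(np.linalg.det(x + t * v)) / t ** (d - k)

d, k = 4, 2
x = np.diag([2.0, 1.0, 0.0, 0.0])      # rank k = 2, singular values (2, 1, 0, 0)
v = np.eye(d) / 2.0                     # Frobenius norm equal to 1
sym = elementary_symmetric(np.linalg.svd(x, compute_uv=False))
ratios = [det_growth_ratio(x, v, t, k) for t in (1e-1, 1e-2, 1e-3)]
```

In this example det(x + tv) = (2 + t/2)(1 + t/2)(t/2)², so the ratio tends to 1/2 as t ↓ 0 and stays bounded on (0, 1], consistent with growth of order t^{d−k}.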
In proving Theorem 5, the case δ = 2, k = d−1, turns out to require separate treatment using the following lemma.
We are now ready to prove Theorem 5. The proof uses the tube segments discussed in Appendix C.
Proof of Theorem 5. Suppose for any fixed x ∈ M k we can find a bounded neighborhood U of x in M d and bounded functions g ε ∈ W 1,2 (U, m) such that each g ε equals one on a neighborhood of M k ∩ U , and lim ε↓0 ‖g ε ‖ W 1,2 (U,m) = 0 holds. We then take an open set V ⊂ M d whose closure is contained in U , and a smooth cutoff function φ ∈ C ∞ c (M d ) with φ = 1 on V and φ = 0 on M d \ U . The function f ε = φg ε then lies in W 1,2 (M d , m), is equal to one on a neighborhood of M k ∩ V , and satisfies lim ε↓0 ‖f ε ‖ W 1,2 (M d ,m) = 0 by Lemma 13. It follows that Cap(M k ∩ V ) = 0. Since M k can be covered by countably many such sets M k ∩ V , we deduce Cap(M k ) = 0, as desired.
We thus focus on finding functions g ε as above. To this end, set M = M k , n 1 = d 2 − (d − k) 2 , n 2 = (d − k) 2 , pick x ∈ M , and let U be a tube segment of width ε ∈ (0, 1) around x, see Definition 3 and Proposition 4 in Appendix C. Let Φ : A × B ε → U be the corresponding diffeomorphism, where A ⊂ R n 1 is a centered open ball, and B ε ⊂ R n 2 is the centered open ball of radius ε. Let π : R n 1 × R n 2 → {0} × R n 2 be the projection onto the last n 2 coordinates. Let φ ∈ C ∞ (R + ) be a cutoff function valued in [0, 1], equal to one on [0, 1/2], equal to zero on [1, ∞), and with |φ ′ (t)| ≤ 3 for all t ∈ R + . For each 0 < ε < ε, define a map We then have g ε ∈ C ∞ (U ) and g ε = 1 on Φ(A × B ε/2 ), a neighborhood of M ∩ U . It remains to prove g ε ∈ W 1,2 (U, m) and lim ε↓0 g ε W 1,2 (U,m) = 0. A computation based on the chain rule gives the gradient of g ε , where ∇(Φ −1 ) denotes the transpose of the Jacobian matrix of Φ −1 , and similarly for ∇π.
where C = sup x∈U ∇(Φ −1 )(x) op is finite by property (iv) of Definition 3, and where we used that the projection π is 1-Lipschitz. Write We then have g ε = 0 on U \ U ε , which yields Thus, it remains to show that lim ε↓0 ε −2 m(U ε ) = 0 holds. A change of variables yields where J = det ∇Φ is the Jacobian determinant. Since Φ has bounded derivative, there is a constant κ such that Φ is κ-Lipschitz and J ≤ κ holds. Together with Lemma 5 (and the fact that Φ(y, 0) ∈ M ), we get where c k (·) is as in Lemma 5. The integral over A is finite due to the boundedness of Φ −1 on A × {0} and the Lipschitz continuity of c k on U , so we get for some constant C > 0 that does not depend on ε. Since the integral is over n 2dimensional space, the right side is finite provided Since, as we just saw, (17) holds, this quantity tends to zero as ε tends to zero. This finishes the proof of the case δ > 0, k ∈ {0, . . . , d − 2}.
If δ > 2, then (17) holds also for k = d − 1, which takes care of this case as well. The only case that remains to consider is δ = 2, k = d − 1. This is done by a slight modification of the above argument. First, g ε is now given by where φ ε is the function from Lemma 6. (Note that n 2 = (d − k) 2 = 1, so that π • Φ −1 (x) is a real number; hence the absolute value bars.) Since φ ε is Lipschitz it is almost everywhere differentiable by Rademacher's theorem. Hence ∇g ε is well-defined up to a nullset. Next, instead of (15) we need a more precise estimate. Specifically, we have the inequality where as before C = sup x∈U ∇(Φ −1 )(x) op is finite. In particular this gives g ε ∈ W 1,2 (U, m). By the same calculations as those leading up to (16) we then obtain, using Lemma 5, By the property (14) of φ ε given in Lemma 6, the right side tends to zero as ε ↓ 0. This concludes the proof.

The Muckenhoupt A p property
Weight functions satisfying the so-called Muckenhoupt A p condition play an important role in potential theory, where they arise as precisely those weight functions for which the Hardy-Littlewood maximal operator is bounded on the corresponding weighted L p space, 1 < p < ∞, see [28]. This and related results have far-reaching consequences, some of which are discussed in [32,33,23]. In this section we prove that the weight function w(x) = | det x| α lies in the Muckenhoupt A p class for certain combinations of p and α. Our result generalizes the case d = 1, for which the result is known, in a striking way. We let $|A|=\int_A dx$ denote the Lebesgue measure of a measurable subset A ⊂ M d .
(i) If $-1<\alpha\le 0$, then $w$ lies in the Muckenhoupt $A_1$ class. That is, there is a constant $C>0$ depending only on $d$ and $\alpha$, such that
$$\frac{1}{|B|}\int_B w(x)\,dx\le C\operatorname*{ess\,inf}_{x\in B}w(x)\qquad(18)$$
for every ball $B\subset M^d$.
(ii) If $-1<\alpha<p-1$, $p>1$, then $w$ lies in the Muckenhoupt $A_p$ class. That is, there is a constant $C>0$ depending only on $d$, $\alpha$ and $p$, such that
$$\Big(\frac{1}{|B|}\int_B w(x)\,dx\Big)\Big(\frac{1}{|B|}\int_B w(x)^{-1/(p-1)}\,dx\Big)^{p-1}\le C$$
for every ball $B\subset M^d$.
Once part (i) has been proved, part (ii) follows directly from [32, Proposition IX.4.3]. It thus suffices to prove part (i), which will occupy the rest of this section. We first introduce some notation. Let $D^d_+$ denote the set of diagonal matrices with nonnegative and ordered diagonal elements,
$$D^d_+=\{\operatorname{diag}(\sigma_1,\dots,\sigma_d):\sigma_1\ge\sigma_2\ge\dots\ge\sigma_d\ge 0\}.$$
The open ball centered at $x\in M^d$ with radius $r>0$ is denoted by $B(x,r)$. Its intersection with the nonsingular matrices is denoted by $B^*(x,r)$. That is,
$$B^*(x,r)=\{y\in B(x,r):\det y\ne 0\}.$$
The proof of the Muckenhoupt property is somewhat involved (but nonetheless mostly elementary), due to the relatively complicated geometric structure of the set $\partial E$, which is where the weight function becomes singular. The main idea is to change variables using the QR-decomposition and integrate over the product space $O(d)\times T(d)$ instead of $M^d$. Unfortunately, balls in $M^d$ do not always map to balls (or comparable shapes) in $O(d)\times T(d)$, and this is where the main complications arise. The resolution to this issue resides in Lemma 8 below, which relies on a detailed analysis of the mapping taking $x$ to its QR-decomposition.
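For orientation, the known scalar case d = 1 of part (i) can be verified by hand. The following worked computation is a sketch for the weight $w(t)=|t|^\alpha$ with $-1<\alpha\le 0$; the symbols $I$, $a$, $b$, $s$ are notation introduced here and are not taken from the paper.

```latex
% Case 1: I = (a,b) with 0 \le a < b. Since \alpha \le 0, \inf_I w = b^\alpha, and with s = a/b \in [0,1):
\frac{1}{|I|}\int_I t^{\alpha}\,dt
  = \frac{b^{\alpha+1}-a^{\alpha+1}}{(\alpha+1)(b-a)}
  = b^{\alpha}\,\frac{1-s^{\alpha+1}}{(\alpha+1)(1-s)}
  \le \frac{1}{\alpha+1}\,\inf_{t\in I} w(t),
% because 0 < \alpha+1 \le 1 implies s^{\alpha+1} \ge s for s \in [0,1].

% Case 2: I = (-a,b) with 0 < a \le b. Again \inf_I w = b^\alpha, and
\frac{1}{|I|}\int_I |t|^{\alpha}\,dt
  = \frac{a^{\alpha+1}+b^{\alpha+1}}{(\alpha+1)(a+b)}
  \le \frac{2\,b^{\alpha+1}}{(\alpha+1)\,b}
  = \frac{2}{\alpha+1}\,\inf_{t\in I} w(t).
```

The remaining intervals follow by the symmetry $t\mapsto -t$, so the $A_1$ condition holds for $d=1$ with $C=2/(\alpha+1)$. The constant degenerates as $\alpha\downarrow -1$, in line with the loss of local integrability in Theorem 1.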
We start with a lemma that establishes an inequality similar to (18), where the balls $B$ are replaced by sets of the form $U\cdot K=\{QR:Q\in U,\ R\in K\}$, with $U\subset O(d)$ measurable and $K\subset T(d)$ a cube.
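Lemma 7. Let $-1<\alpha\le 0$. Then there is a constant $C_1>0$, depending only on $d$ and $\alpha$, such that the inequality
$$\frac{1}{|U\cdot K|}\int_{U\cdot K}w(x)\,dx\le C_1\inf_{x\in U\cdot K}w(x)$$
holds for any measurable subset $U\subset O(d)$ and any cube $K\subset T(d)$.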
Proof. Pick a cube $K=\{R\in T(d):R_{ij}\in I_{ij},\ i\le j\}$, where the $I_{ij}$ are bounded intervals, and let $U\subset O(d)$ be measurable. By Lemma 1 we have
Hence the result follows from the following Claim: Let $\alpha\in(-1,0]$ and $\beta\ge 0$. Then there is a constant $C_{\alpha,\beta}$ such that for every bounded interval $I\subset(0,\infty)$, we have
To prove the Claim it suffices to consider $I=(a,b)$ with $0\le a<b$. We obtain:
as required.
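The one-dimensional Claim can be explored numerically. The sketch below assumes one natural form of such an estimate, namely $\int_a^b t^{\alpha+\beta}\,dt\le C_{\alpha,\beta}\,b^\alpha\int_a^b t^\beta\,dt$ with $C_{\alpha,\beta}=(\beta+1)/(\alpha+\beta+1)$. Both this form and this constant are assumptions made for illustration and need not coincide with the inequality in the Claim; the helper names are hypothetical.

```python
def power_integral(a, b, exponent):
    """Closed form of the integral of t**exponent over (a, b), exponent > -1."""
    p = exponent + 1.0
    return (b ** p - a ** p) / p


def claim_ratio(a, b, alpha, beta):
    """Left side divided by b**alpha times the beta-weighted length of (a, b)."""
    lhs = power_integral(a, b, alpha + beta)
    rhs = b ** alpha * power_integral(a, b, beta)   # b**alpha = inf of t**alpha
    return lhs / rhs


def assumed_constant(alpha, beta):
    """Candidate constant (beta + 1) / (alpha + beta + 1); an assumption."""
    return (beta + 1.0) / (alpha + beta + 1.0)


if __name__ == "__main__":
    for a, b in [(0.0, 1.0), (0.5, 2.0), (3.0, 3.1)]:
        print((a, b), claim_ratio(a, b, -0.5, 2.0), assumed_constant(-0.5, 2.0))
```

The ratio equals the assumed constant when $a=0$ and approaches $1$ for short intervals away from the origin, which suggests that the constant cannot be improved within this form.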
Consider now balls $B(\Sigma,r)$, where the diagonal elements of $\Sigma\in D^d_+$ are either "large" (comparable to the radius $r$) or zero. The following result reduces the proof that (18) holds for balls of this form to an application of Lemma 7. In the statement of condition (19) below, we use the convention that $\sigma_0=\infty$ and that $i$ runs over $\{0,\dots,d\}$.
Lemma 8. Suppose $\Sigma\in D^d_+$ and $r>0$ satisfy the following property, where $\sigma\in\mathbb{R}^d$ is the vector of diagonal elements of $\Sigma$:
There is an index $n\in\{0,1,\dots,d\}$ such that $\sigma_i>18dr$ for all $i\le n$, and $\sigma_i=0$ for all $i>n$. (19)
Then there is a measurable subset $U\subset O(d)$ and a cube $K\subset T(d)$ such that
$$B^*(\Sigma,r)\subset U\cdot K\subset B^*(\Sigma,C_2r),$$
where $C_2$ is a positive constant that only depends on $d$.
Proof. The problem of finding the advertised constant $C_2$ can be reduced to proving the following Claim, where $e_1,\dots,e_d$ denote the canonical unit vectors in $\mathbb{R}^d$: There is a constant $C_3$, depending only on $d$, such that the following holds: For any $x\in B^*(\Sigma,r)$, let $x=QR$ be its QR-decomposition, and let $q_1,\dots,q_d$ be the columns of $Q$. Then the inequalities $\|R-\Sigma\|<C_3r$ and $|q_i-e_i|<C_3r\sigma_i^{-1}$ hold for all $i\in\{1,\dots,n\}$, where $n$ is the index from condition (19).
Let us show how the statement of the lemma follows from this claim. Define $K$ to be the cube in $T(d)$ centered at $\Sigma$ with side $2C_3r$, i.e.
$$K=\{R\in T(d):|R_{ij}-\Sigma_{ij}|<C_3r,\ i\le j\},$$
and let $U\subset O(d)$ be given by
$$U=\{Q\in O(d):|q_i-e_i|<C_3r\sigma_i^{-1},\ i=1,\dots,n\},$$
where $q_1,\dots,q_d$ are the columns of $Q$. The Claim then directly implies $B^*(\Sigma,r)\subset U\cdot K$. We thus need to show that it also implies $U\cdot K\subset B^*(\Sigma,C_2r)$ for some constant $C_2>0$ that only depends on $d$. To this end, observe that for any $x=QR\in U\cdot K$ we have, by the triangle inequality, the rotation invariance of $\|\cdot\|$, and the definition of $K$,
Furthermore, since $\sigma_i=0$ for $i>n$, we have $\|(Q-I)\Sigma\|^2=\sigma_1^2|q_1-e_1|^2+\dots+\sigma_n^2|q_n-e_n|^2$. We then deduce from the definition of $U$ that $\|(Q-I)\Sigma\|<\sqrt{n}\,C_3r$, and consequently
We are thus left with proving the Claim. Since it is vacuously true for $n=0$, we can assume $n\ge 1$. The proof relies on a rather careful analysis of the Gram-Schmidt orthogonalization procedure for obtaining the QR-decomposition of a generic matrix $x\in B^*(\Sigma,r)$, so we briefly recall this procedure. To improve readability, we temporarily (for this proof only) adopt the notation $\langle y,z\rangle=y^\top z$ for $y,z\in\mathbb{R}^d$. Fix $x\in B^*(\Sigma,r)$ and let $x_1,\dots,x_d$ be the columns of $x$. To obtain the QR-decomposition of $x$, one defines
$$q_1=\frac{x_1}{|x_1|}$$
and, if $q_1,\dots,q_{j-1}$ have been defined,
$$q_j=\frac{x_j-\sum_{i<j}\langle q_i,x_j\rangle q_i}{\bigl|x_j-\sum_{i<j}\langle q_i,x_j\rangle q_i\bigr|}.$$
The vectors $q_1,\dots,q_d$ obtained in this way are the columns of $Q$, and $R$ is given by
$$R_{ij}=\langle q_i,x_j\rangle\ \text{for } i\le j,\qquad R_{ij}=0\ \text{for } i>j.$$
We now proceed with the proof of the Claim. Recall that $e_1,\dots,e_d$ are the canonical unit vectors in $\mathbb{R}^d$. Since $x\in B^*(\Sigma,r)$, we have $x_j=\sigma_je_j+h_j$ for $j=1,\dots,d$, where the vectors $h_j\in\mathbb{R}^d$ satisfy $|h_j|<r$. Also let $a=18d$ denote the constant appearing in condition (19).
Together with (22) this yields
where in the last step we used that $\sigma_k\le\sigma_j$ (since $k>j$) and $\sigma_j>ar$ (since $j\le n$). Since $a=18d$, the right side is at most $3r$, as one readily verifies. We deduce that (21) holds with $j$ replaced by $j+1$, and since it is vacuously true for $j=1$ it follows by induction that it holds for all $j\in\{1,\dots,n+1\}$.
We now use this result to bound $|R_{jj}-\sigma_j|$ for $j\in\{1,\dots,n\}$. To this end, write
using that $\langle q_i,x_j\rangle^2=|R_{ij}|^2<9r^2$ due to (21). Moreover, we have
and by the reverse triangle inequality,
Assembling the pieces and using the bound (22) gives
Dividing the numerator and denominator by $\sigma_j$ and using that $\sigma_j>ar$, we finally arrive at
The only elements of $R$ that remain to analyze are $R_{ij}$ for $i\le j$ and $j>n$. But $x_j=h_j$ for these $j$, so $|R_{ij}|=|\langle q_i,x_j\rangle|\le|h_j|<r$. We are now able to estimate $\|R-\Sigma\|$ as follows:
A bound solely in terms of $d$ is then easily obtained. For instance, we may take
Let us now focus on bounding $|q_j-e_j|$, $j\in\{1,\dots,n\}$. The calculations are similar to the ones used to bound $|R_{jj}-\sigma_j|$ above, but slightly simpler. We have
using (23) in the last step. Using again (22) together with $\sigma_j>ar$,
The Claim, and hence the lemma, is now proved, if for $C_3$ we take the maximum of $11d$ and the constant in (24).
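The Gram-Schmidt procedure analyzed in this proof can be sketched in a few lines of Python. The example is only an illustration under stated assumptions: the matrix $\Sigma=\operatorname{diag}(1,1,0)$, the radius $r=0.01$, and the perturbation `H` are hypothetical choices satisfying condition (19) with $d=3$ and $n=2$, and the helper names are not from the paper. The script prints the quantities $\|R-\Sigma\|/r$ and $\sigma_i|q_i-e_i|/r$ that the Claim bounds by $C_3$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt_qr(x):
    """QR-decomposition of a nonsingular square matrix x (list of rows).

    Returns (Q, R) with Q orthogonal, R upper triangular with positive
    diagonal, and x = Q R. Classical Gram-Schmidt, adequate for a small demo.
    """
    d = len(x)
    cols = [[x[i][j] for i in range(d)] for j in range(d)]   # x_1, ..., x_d
    q = []                                                   # q_1, ..., q_d
    R = [[0.0] * d for _ in range(d)]
    for j in range(d):
        v = cols[j][:]
        for i in range(j):
            R[i][j] = dot(q[i], cols[j])                     # R_ij = <q_i, x_j>
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q[i])]
        R[j][j] = math.sqrt(dot(v, v))                       # > 0 if det x != 0
        q.append([vk / R[j][j] for vk in v])
    Q = [[q[j][i] for j in range(d)] for i in range(d)]      # columns are q_j
    return Q, R


def frobenius(a, b):
    """Frobenius norm of a - b for two matrices given as lists of rows."""
    return math.sqrt(sum((p - s) ** 2 for ra, rb in zip(a, b) for p, s in zip(ra, rb)))


# Hypothetical data: sigma_i = 1 > 18 * d * r = 0.54 for i <= n, sigma_3 = 0.
d, r, n = 3, 0.01, 2
Sigma = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
H = [[1.0, -1.0, 0.5], [0.5, 1.0, -1.0], [-1.0, 0.5, 1.0]]
x = [[Sigma[i][j] + 0.003 * H[i][j] for j in range(d)] for i in range(d)]
assert frobenius(x, Sigma) < r                               # x lies in B(Sigma, r)

Q, R = gram_schmidt_qr(x)
print("||R - Sigma|| / r =", frobenius(R, Sigma) / r)
for i in range(n):
    col = [Q[k][i] for k in range(d)]
    e = [1.0 if k == i else 0.0 for k in range(d)]
    dist = math.sqrt(sum((c - ek) ** 2 for c, ek in zip(col, e)))
    print(f"sigma_{i + 1} |q_{i + 1} - e_{i + 1}| / r =", Sigma[i][i] * dist / r)
```

Only the first $n$ columns of $Q$ are controlled: for $i>n$ the column $x_i$ is a pure perturbation, so $q_i$ may point in any direction orthogonal to $q_1,\dots,q_{i-1}$.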
Next, Lemma 10 below implies that the proof of (18) for any ball whose center lies in $D^d_+$ reduces to an application of Lemma 8. It uses the following simple observation.
Lemma 9. For any real number $a$ and any integer $k\ge 0$, we have $1+a+a(1+a)+\dots+a(1+a)^k=(1+a)^{k+1}$.
Proof. The result clearly holds for $k=0$. If it holds for $k-1$, we get $1+a+a(1+a)+\dots+a(1+a)^k=(1+a)\bigl(1+a+\dots+a(1+a)^{k-1}\bigr)=(1+a)^{k+1}$, showing that it holds for $k$ as well.
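The identity established in this short induction is easy to confirm in exact arithmetic. The following sketch uses Python's `fractions.Fraction` to avoid rounding; the function names `lhs` and `rhs` are hypothetical.

```python
from fractions import Fraction


def lhs(a, k):
    """1 + a + a(1 + a) + ... + a(1 + a)**k, computed term by term."""
    return 1 + sum(a * (1 + a) ** i for i in range(k + 1))


def rhs(a, k):
    """Closed form (1 + a)**(k + 1)."""
    return (1 + a) ** (k + 1)


if __name__ == "__main__":
    for a in (Fraction(1, 3), Fraction(54), Fraction(-5, 2)):
        for k in range(6):
            print(a, k, lhs(a, k) == rhs(a, k))
```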

A Proof of the integration by parts formula
In this section we give a proof of the integration by parts formula, Theorem 2, which we now restate for the reader's convenience:
Throughout the proof, let $K$ be the (compact) support of $f$. For $\varepsilon\ge 0$, define
$$U_\varepsilon=\{x\in M^d:\det x>\varepsilon\}.$$
For $\varepsilon>0$, $U_\varepsilon$ has smooth boundary with outward unit normal $\nu(x)$ at $x\in\partial U_\varepsilon$. For any smooth function $h:U_0\to\mathbb{R}$ such that the integrals are well-defined, the standard integration by parts formula yields, for each $\varepsilon>0$,
$$\int_{U_\varepsilon}\nabla f(x)\bullet G(x)\,h(x)\,dx=\int_{\partial U_\varepsilon}f(x)\,h(x)\,G(x)\bullet\nu(x)\,\sigma_\varepsilon(dx)-\int_{U_\varepsilon}f(x)\,\nabla\bullet(hG)(x)\,dx, \tag{25}$$
where $\sigma_\varepsilon$ denotes the surface area measure on $\partial U_\varepsilon$.
Case 1: $\delta>1$. Take $h(x)=\det(x)^{\delta-1}$. As $\varepsilon\downarrow 0$, the left side of (25) tends to $\int_E\nabla f(x)\bullet G(x)\,m(dx)$. The absolute value of the boundary term is then dominated by
using also that $h(x)=\varepsilon^{\delta-1}$ for $x\in\partial U_\varepsilon$. It is easy to see that $\sigma_\varepsilon(K)$ remains bounded as $\varepsilon\downarrow 0$, so we conclude that the boundary term vanishes in the limit. Consider now the second term on the right side of (25). The product rule yields
$$\nabla\bullet(hG)=h\,\nabla\bullet G+\nabla h\bullet G,$$
and since $\det(x)^{\delta-2}dx$ is a Radon measure due to Theorem 1 and the fact that $\delta>1$, we may again use dominated convergence to get
(Here we used the equality $\nabla\det(x)\,\det(x)^{\delta-2}dx=x^{-\top}m(dx)$.) Assembling the pieces gives the desired formula (2).
Case 2: $\delta=1$. We again take $h(x)=\det(x)^{\delta-1}\equiv 1$. Except for the boundary term, everything works as in the case $\delta>1$, if we just note that $\nabla h=0$. Letting $C$ be a bound on $|f(x)|$ over $K$, the boundary term is bounded above by
Using that $G(x)$ is tangent to $\partial E$ at every $x\in\partial E$ it is not hard to show that the supremum tends to zero. Hence (2) is established.
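The gradient identity used in Case 1 rests on Jacobi's formula $\nabla\det(x)=\det(x)\,x^{-\top}$, combined with $m(dx)=\det(x)^{\delta-1}dx$. A minimal numerical check for $2\times 2$ matrices is sketched below; the test matrix, the value of $\delta$, and all function names are hypothetical choices made for this illustration.

```python
def det2(x):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return x[0][0] * x[1][1] - x[0][1] * x[1][0]


def inv_transpose2(x):
    """Inverse transpose of a nonsingular 2 x 2 matrix."""
    dt = det2(x)
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det; transpose it.
    return [[x[1][1] / dt, -x[1][0] / dt],
            [-x[0][1] / dt, x[0][0] / dt]]


def numeric_grad_det(x, step=1e-6):
    """Central finite-difference gradient of det with respect to each entry."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            up = [row[:] for row in x]
            down = [row[:] for row in x]
            up[i][j] += step
            down[i][j] -= step
            grad[i][j] = (det2(up) - det2(down)) / (2.0 * step)
    return grad


def weighted_identity_gap(x, delta):
    """Max entrywise gap between grad det * det**(delta-2) and x^{-T} * det**(delta-1)."""
    g, it, dt = numeric_grad_det(x), inv_transpose2(x), det2(x)
    return max(abs(g[i][j] * dt ** (delta - 2.0) - it[i][j] * dt ** (delta - 1.0))
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    print(weighted_identity_gap([[2.0, 1.0], [0.5, 3.0]], delta=2.5))
```

Since the determinant is affine in each individual entry, the central difference is exact up to rounding, so the printed gap should be at the level of floating-point error.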
Case 3: $\delta<1$. Things are now a bit more complicated due to the fact that $\det(x)^{\delta-1}$ blows up at $\partial E$. To get around this, for each $n$ let $\tau_n$ be a smooth, nondecreasing function satisfying the following properties:
$\tau_n(t)\le t\wedge n$, $\tau_n(t)=t$ for $t\le n-1$, $\tau_n(t)=n$ for $t\ge n+1$,
$0\le\tau_n'\le 1$, $\tau_n(t)\uparrow t$ as $n\to\infty$.
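Functions with the listed properties are easy to write down. The sketch below gives one explicit candidate, obtained by integrating one minus a cubic smoothstep over $[n-1,n+1]$. It is an assumption made for illustration: the resulting function is only $C^2$ rather than smooth, so it is a stand-in for $\tau_n$ and not the function used in the proof. A genuinely smooth version would replace the cubic step by a $C^\infty$ transition. The names `tau` and `tau_prime` are hypothetical.

```python
def tau(n, t):
    """A C^2 stand-in for tau_n: identity up to n - 1, constant n from n + 1 on."""
    s = t - (n - 1)                 # position relative to the left knot n - 1
    if s <= 0.0:
        return t                    # tau_n(t) = t for t <= n - 1
    if s >= 2.0:
        return float(n)             # tau_n(t) = n for t >= n + 1
    u = s / 2.0                     # rescale [n - 1, n + 1] to [0, 1]
    # Antiderivative of 1 - (3u^2 - 2u^3), so the slope falls from 1 to 0.
    return (n - 1) + 2.0 * (u - u ** 3 + 0.5 * u ** 4)


def tau_prime(n, t):
    """Derivative of tau(n, .) with respect to t; always between 0 and 1."""
    s = t - (n - 1)
    if s <= 0.0:
        return 1.0
    if s >= 2.0:
        return 0.0
    u = s / 2.0
    return 1.0 - (3.0 * u ** 2 - 2.0 * u ** 3)


if __name__ == "__main__":
    for t in (0.5, 2.0, 3.5, 10.0):
        print(t, [round(tau(n, t), 4) for n in range(1, 6)])
```

For fixed $t$ the printed rows are nondecreasing in $n$ and stabilize at $t$ once $n\ge t+1$, which mirrors the property $\tau_n(t)\uparrow t$.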
In (25) we now take $h=h_n$, where $h_n=\tau_n\circ w$ and $w(x)=\det(x)^{\delta-1}$. We first hold $n$ fixed and let $\varepsilon\downarrow 0$. The left side of (25) converges to $\int_E\nabla f(x)\bullet G(x)\,h_n(x)\,dx$ by dominated convergence. The boundary term on the right side will vanish by the same argument as in the case $\delta=1$. The integrand in the second term on the right side is in fact bounded, since by the properties of $\tau_n$,
The final step is to send $n$ to infinity. The left side converges to $\int_E\nabla f(x)\bullet G(x)\,w(x)\,dx$. The right side is finite since $G\bullet x^{-\top}$ is bounded on $K$ by hypothesis, so dominated conver-
It is clear that for two open subsets $U$, $V$ satisfying $U\subset V$ and an element $f\in W^{1,2}(V,m)$, we have $f|_U\in W^{1,2}(U,m)$. To alleviate notation we simply write $f\in W^{1,2}(U,m)$. If the open set $U\subset M^d$ has compact closure in $\Omega$, we have $C^{-1}\le(\det x)^{\delta-1}\le C$ for some constant $C>1$ and all $x\in U$. Hence
$$\|\cdot\|_{W^{1,2}(U,m)}\ \text{and}\ \|\cdot\|_{W^{1,2}(U,dx)}\ \text{are equivalent,} \tag{26}$$
and the unweighted space $W^{1,2}(U,dx)$ coincides with $W^{1,2}(U,m)$. This has several useful consequences.
Proof. (i) Let $K$ be the support of $f$ and pick an open set $U\Subset\Omega$ with $K\subset U$. Then $\psi_\varepsilon*f\to f$ in $W^{1,2}(U,dx)$ by standard results in the unweighted case, see [1, Lemma 3.16].
The result now follows with $C=2+2\kappa^2$.

Lemma 13. Consider open subsets $U\subset V$ and a function $\varphi\in C_c^\infty(V)$ with $\varphi=0$ on $V\setminus U$. Then there is a constant $C>0$ such that holds for all $g\in W^{1,2}(U,m)$.

C Tube segments
The proofs of some of our results require the notion of a tube segment, which we now introduce. For a background on the relevant notions from differential geometry the reader is referred to [25]. Let $M$ be a smooth $n_1$-dimensional embedded submanifold of $\mathbb{R}^n$ ($n_1<n$) and set $n_2=n-n_1$. For the applications in the present paper, $\mathbb{R}^n$ is identified with
Shrinking $V$ if necessary, we may assume that $V$ has compact closure in $M$, and that there exists a diffeomorphism $\psi:V\to A$. Set $\varepsilon=\inf_{x\in V}\rho(x)$, which is strictly positive since $V$ has compact closure in $M$, and define
This is a subset of $T$, so the addition map $\chi$ takes $T_{V,\varepsilon}$ diffeomorphically to a neighborhood $U$ of $x$ in $\mathbb{R}^n$ with $M\cap U=V$. Summarizing, we have the diffeomorphisms
We thus define $\Phi=\chi\circ\phi^{-1}\circ(\psi,\mathrm{Id})^{-1}$ and check that this map, which is clearly a diffeomorphism, satisfies properties (i)-(iii). For (i) we have
Hence $\Phi(A\times\{0\})=V=M\cap U$ holds, as required. Property (ii) is immediate since $M\cap U=V$ has compact closure in $M$ by construction. For (iii), note that the inclusion $U\subset\chi(T)$ holds, and that the latter set does not intersect $\overline{M}\setminus M$. Finally, if (iv) fails we replace $\varepsilon$ and $A$ by $\varepsilon'<\varepsilon$ and a centered ball $A'\subset A$, and consider the restriction $\Phi'=\Phi|_{A'\times B_{\varepsilon'}}$.