Stability estimate for the broken non-abelian X-ray transform in Minkowski space

We study the broken non-abelian X-ray transform in Minkowski space. This transform acts on the space of Hermitian connections on a causal diamond and is known to be injective up to an infinite-dimensional gauge. We prove a stability estimate that takes the gauge into account, leading to a new proof of the transform's injectivity. Our proof leads us to consider a special type of connection that we call light-sink connections. We then show that a light-sink connection can be consistently recovered from noisy measurements of its X-ray transform data through Bayesian inversion.


Introduction and main results
We start by defining the broken non-abelian X-ray transform and providing the motivation for its study. We then state our main results. Sections 2 and 3 contain the proofs of those results.
Recall that a line segment $\gamma(s) = x + sv$ in $\mathbb{R}^{1+3}$, with $v = (v^0, v^1, v^2, v^3)$, is lightlike if $(v^0)^2 = (v^1)^2 + (v^2)^2 + (v^3)^2$, and that it is future-pointing if $v^0 > 0$ and past-pointing if $v^0 < 0$. We say that $\gamma$ is parametrised by arc length if $|v|_{\mathbb{R}^4} = 1$. The set of points $y \in \mathbb{R}^{1+3}$ such that there is a future-pointing (past-pointing) lightlike geodesic from $x$ to $y$ is called the future (past) light cone at $x$. Hence, $(x, y) \in L$ if and only if $y$ is in the future light cone of $x$, or equivalently, $x$ is in the past light cone of $y$.
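The following minimal Python sketch illustrates these conventions. It assumes NumPy, the signature $(-, +, +, +)$ and coordinates ordered as $(t, x^1, x^2, x^3)$. The helpers `in_L` and `unit_direction` are hypothetical names introduced only for illustration. `in_L(x, y)` tests, up to a floating-point tolerance, whether $y$ lies on the future light cone of $x$.

```python
import numpy as np

def minkowski_sq(v):
    """Minkowski quadratic form -(v^0)^2 + (v^1)^2 + (v^2)^2 + (v^3)^2."""
    v = np.asarray(v, dtype=float)
    return -v[0] ** 2 + v[1:] @ v[1:]

def in_L(x, y, tol=1e-12):
    """True if y lies on the future light cone of x (v = y - x lightlike with v^0 > 0)."""
    v = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)
    return bool(abs(minkowski_sq(v)) <= tol and v[0] > 0)

def unit_direction(x, y):
    """Direction of the segment from x to y, normalised so that |v|_{R^4} = 1."""
    v = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)
    return v / np.linalg.norm(v)

print(in_L([0, 0, 0, 0], [1, 1, 0, 0]))    # True: future-pointing lightlike
print(in_L([1, 1, 0, 0], [0, 0, 0, 0]))    # False: past-pointing
print(in_L([0, 0, 0, 0], [1, 0.5, 0, 0]))  # False: timelike
print(unit_direction([0, 0, 0, 0], [1, 1, 0, 0]))
```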
We will work with Hermitian connections on the trivial bundle $D \times \mathbb{C}^n$. Such a connection $A$ is a $\mathfrak{u}(n)$-valued one-form on $D$, and we can write it as $A = A_0\,dt + A_1\,dx^1 + A_2\,dx^2 + A_3\,dx^3$ for some matrix fields $A_i \in C^\infty(D, \mathfrak{u}(n))$. We denote the set of Hermitian connections on $D$ by $\mathcal{U}$. A connection induces a covariant derivative on functions $f : D \to \mathbb{C}^n$ given by $d_A f = df + Af$. Given a smooth curve $\gamma : [0, T] \to D$, the parallel transport isomorphism $P^A_\gamma : \mathbb{C}^n \to \mathbb{C}^n$ is given by the solution at time $T$ of the matrix ODE
$$\dot U(t) + A_{\gamma(t)}(\dot\gamma(t))\,U(t) = 0, \qquad U(0) = \mathrm{Id}.$$
Hence, the parallel transport of a vector $v \in \mathbb{C}^n$ along $\gamma$ is $P^A_\gamma v := U(T)v$. One can check that $P^A_\gamma$ does not depend on the parametrisation of $\gamma$ and that it takes values in $U(n)$ since $A$ is Hermitian. Given $x, y \in D$, we denote by $P^A_{y\leftarrow x}$ the parallel transport from $x$ to $y$ along the straight line between the two points. The notation is chosen so as to behave nicely with compositions.
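The parallel transport ODE can be approximated numerically. The sketch below is an illustration, not the method of this paper. It assumes NumPy and encodes a connection as a hypothetical function `A(p)` that returns the components $A_0, \dots, A_3$ at $p$ as an array of shape `(4, n, n)`. It then takes exponential midpoint steps. Each step multiplies by the exponential of a skew-Hermitian matrix, so the computed transport stays in $U(n)$ up to rounding. This mirrors the remark that $P^A_\gamma$ takes values in $U(n)$.

```python
import numpy as np

def expm_skew(X):
    """exp(X) for a skew-Hermitian matrix X, via the eigendecomposition of the Hermitian iX."""
    w, V = np.linalg.eigh(1j * X)                  # iX = V diag(w) V^*
    return (V * np.exp(-1j * w)) @ V.conj().T      # exp(X) = V diag(e^{-iw}) V^*

def transport(A, x, y, steps=400):
    """Approximate P^A_{y<-x}: solve U' + A_{gamma(t)}(gamma'(t)) U = 0, U(0) = Id,
    along gamma(t) = x + t (y - x), t in [0, 1], with exponential midpoint steps."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    v, h = y - x, 1.0 / steps
    U = np.eye(A(x).shape[-1], dtype=complex)
    for k in range(steps):
        p = x + (k + 0.5) * h * v                  # midpoint of the k-th sub-interval
        Av = np.tensordot(v, A(p), axes=1)         # A_p(v) = sum_i v^i A_i(p), skew-Hermitian
        U = expm_skew(-h * Av) @ U                 # unitary factor, so U stays in U(n)
    return U

# Hypothetical u(2)-valued connection; each A_i(p) is skew-Hermitian.
K0 = np.array([[1j, 1.0], [-1.0, -0.5j]])
K1 = np.array([[0.0, 1j], [1j, 0.0]])

def A(p):
    return np.array([K0 * p[1], K1, np.zeros((2, 2)), K0 * p[0]])

P = transport(A, [0, 0, 0, 0], [1, 1, 0, 0])
print(np.allclose(P.conj().T @ P, np.eye(2)))    # True: P lies in U(2)
```

Transporting back from $y$ to $x$ with the same number of steps reuses the same midpoints in reverse order. The composition $P^A_{x\leftarrow y}P^A_{y\leftarrow x}$ is therefore the identity up to rounding, in line with the composition notation.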
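We can now define the broken non-abelian X-ray transform, following [CLOP21a] and [CLOP21b]. Consider the set
$$S^+(\Omega) := \{(x, y, z) \in D^3 : (x, y), (y, z) \in L,\ x < y < z,\ x, z \in \Omega,\ y \in D \setminus \Omega\}.$$
This set consists of light rays starting from $x \in \Omega$ that exit $\Omega$ and break at $y \notin \Omega$ before returning to $\Omega$ at $z$. We denote by $X$ and $Z$ the sets of values that $x$ and $z$ can take in $\Omega$, respectively. It is important to note that neither $X$ nor $Z$ covers $\Omega$, but that $\Omega = X \cup Z$. Given a Hermitian connection $A$ as above, its broken non-abelian X-ray transform is
$$S^A_{z\leftarrow y\leftarrow x} := P^A_{z\leftarrow y}P^A_{y\leftarrow x}, \qquad (x, y, z) \in S^+(\Omega).$$
For $\phi \in C^\infty(D, U(n))$, write $A^\phi := \phi^{-1}d\phi + \phi^{-1}A\phi$. Then $P^{A^\phi}_{y\leftarrow x} = \phi(y)^{-1}P^A_{y\leftarrow x}\phi(x)$, and so $S^{A^\phi}_{z\leftarrow y\leftarrow x} = \phi(z)^{-1}S^A_{z\leftarrow y\leftarrow x}\phi(x)$. Therefore, the scattering data of $A^\phi$ coincides with that of $A$ whenever $\phi$ is in the gauge group $\mathcal{G} := \{\phi \in C^\infty(D, U(n)) : \phi|_\Omega = \mathrm{Id}\}$.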
This natural obstruction to recovering $A$ from $S^A$ turns out to be the only one. Indeed, it is shown in [CLOP21a, Theorem 5] that Hermitian connections $A$ and $B$ share the same scattering data if and only if they are in the same gauge orbit, that is, there exists $\phi \in \mathcal{G}$ such that $B = A^\phi$.
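The gauge invariance can be observed numerically. The sketch below assumes NumPy. The connection, the points and the gauge $\phi$ are all hypothetical choices. It computes $S^A_{z\leftarrow y\leftarrow x} = P^A_{z\leftarrow y}P^A_{y\leftarrow x}$ and compares it with the transform of $A^\phi = \phi^{-1}d\phi + \phi^{-1}A\phi$. This convention for $A^\phi$ is consistent with the relation $\phi B = A\phi + d\phi$ used in the injectivity proof below. The gauge is $\phi = \exp(fK)$, which equals the identity at the endpoints $x$ and $z$, mimicking the condition $\phi|_\Omega = \mathrm{Id}$. The two broken transforms agree up to discretisation error. The single transports, by contrast, differ by the factor $\phi(y)^{-1}$.

```python
import numpy as np

def expm_skew(X):
    """exp(X) for a skew-Hermitian matrix X, via the eigendecomposition of the Hermitian iX."""
    w, V = np.linalg.eigh(1j * X)
    return (V * np.exp(-1j * w)) @ V.conj().T

def transport(A, x, y, steps=1000):
    """Exponential-midpoint approximation of P^A_{y<-x} along the straight line from x to y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    v, h = y - x, 1.0 / steps
    U = np.eye(A(x).shape[-1], dtype=complex)
    for k in range(steps):
        U = expm_skew(-h * np.tensordot(v, A(x + (k + 0.5) * h * v), axes=1)) @ U
    return U

def broken_transform(A, x, y, z):
    """S^A_{z<-y<-x} = P^A_{z<-y} P^A_{y<-x}."""
    return transport(A, y, z) @ transport(A, x, y)

# Hypothetical u(2)-valued connection.
K0 = np.array([[1j, 1.0], [-1.0, -0.5j]])
K1 = np.array([[0.0, 1j], [1j, 0.0]])

def A(p):
    return np.array([K0 * p[1], K1, np.zeros((2, 2)), K0 * p[0]])

# Hypothetical gauge phi = exp(f K) with f = t(2 - t), so phi = Id at t = 0 and at t = 2.
K = np.array([[0.5j, 2.0], [-2.0, 0.0]])

def f(p):
    return p[0] * (2.0 - p[0])

def grad_f(p):
    return np.array([2.0 - 2.0 * p[0], 0.0, 0.0, 0.0])

def phi(p):
    return expm_skew(f(p) * K)

def A_phi(p):
    """A^phi = phi^{-1} d(phi) + phi^{-1} A phi, where phi^{-1} d(phi) = K df since K commutes with phi."""
    g = phi(p)
    return np.array([K * df_i + g.conj().T @ A_i @ g for df_i, A_i in zip(grad_f(p), A(p))])

x, y, z = np.array([0.0, 0, 0, 0]), np.array([1.0, 1, 0, 0]), np.array([2.0, 0, 0, 0])
gap = np.abs(broken_transform(A_phi, x, y, z) - broken_transform(A, x, y, z)).max()
print(gap)  # small: only discretisation error remains
```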
Our goal is to find a stability estimate relating the scattering data of two connections $A$ and $B$ to some measure of distance between them in a gauge-invariant way. In other words, we want to show that $A$ and $B$ must be close (up to gauge) whenever $S^A$ and $S^B$ are close.
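1.2. The non-abelian X-ray transform and broken Radon transform. The usual non-abelian X-ray transform assigns to a matrix field $A \in C^\infty(\mathbb{R}^d \times \mathbb{S}^{d-1}, \mathbb{C}^{n\times n})$ the scattering data map $(x, \theta) \mapsto \lim_{s\to+\infty}\psi^+(x + s\theta, \theta)$, where $\psi^+$ is the unique solution of the transport equation $\theta\cdot\nabla_x\psi + A(x, \theta)\psi = 0$ such that $\lim_{s\to-\infty}\psi^+(x + s\theta, \theta) = \mathrm{Id}$.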
Provided that $A$ decays sufficiently fast as $|x| \to \infty$, the transform is well defined, and one can ask whether it is possible to recover $A$ from the scattering data. The non-abelian X-ray transform has been studied extensively over the last 20 years and has applications in many different types of tomography, such as single-photon emission computed tomography and neutron polarisation tomography. See [Nov19] for a recent survey on the non-abelian X-ray transform and its applications.
The non-abelian X-ray transform has also been studied on simple surfaces [PS20,MNP21] and compact manifolds with strictly convex boundary [Boh21] where the transport equation is now solved along unit-speed geodesics with endpoints on the boundary of the manifold. For more details and background on the two-dimensional problem, see [PSU21].
When n = 1, the broken non-abelian X-ray transform is also called the broken-ray Radon transform. In [FMS11], they consider the broken-ray Radon transform with rays breaking at a fixed angle within a slab and provide an inversion formula. The broken-ray Radon transform has applications in optical tomography, see [AS09] for a survey. The V-line Radon transform [Amb12,ALJ19] is another example of an inverse problem making use of broken rays and has applications in imaging.
1.3. Physical motivation. The broken non-abelian X-ray transform was introduced in [CLOP21a], where the authors began to analyse inverse problems for the Yang-Mills-Higgs equations. They show that one can recover a Hermitian connection $A$ from the source-to-solution map $L_A$, which takes a source $f \in C^4_c(\Omega, \mathbb{C}^n)$ to the corresponding solution of (1), observed in $\Omega$. Here $\Box_A$ is the connection wave operator; when $A = 0$, it reduces to the usual wave operator $\Box = \partial_t^2 - \Delta$. The map $L_A$ is well defined as long as $f$ is sufficiently small. They show that the maps $L_A$ and $L_B$ agree if and only if $A$ and $B$ are gauge equivalent. To do so, they first show that $L_A$ determines the broken non-abelian X-ray transform $S^A_{z\leftarrow y\leftarrow x}$ for all $(x, y, z) \in S^+(\Omega)$. Injectivity up to gauge of $L_A$ then follows from that of the broken X-ray transform.
To determine $S^A_{z\leftarrow y\leftarrow x}$ from the source-to-solution map $L_A$, they construct a source of the form $f = \sum_{j=1}^3 \epsilon_j f_j$, where each $f_j$ is a conormal distribution supported near $x \in \Omega$. Let $\phi$ be the solution of (1) corresponding to such an $f$. The functions $\partial_{\epsilon_j}\phi|_{\epsilon_j=0}$ satisfy a wave equation and, when the sources are chosen carefully, can produce an artificial source at $y$ which emits a singular wavefront that reaches $z$. This interaction is encoded in the operator $f \mapsto \partial_{\epsilon_1}\partial_{\epsilon_2}\partial_{\epsilon_3}\phi|_{\epsilon=0}$, whose principal symbol determines $S^A_{z\leftarrow y\leftarrow x}$. The creation of an artificial source is only possible thanks to the nonlinearity in (1), and it shows how one can exploit nonlinearities advantageously, similar to what is shown in [KLU18].
1.4. Statistical motivation. The second motivation for considering the broken non-abelian X-ray transform is to use it as an example for dealing with injectivity issues that arise in the study of Bayesian inverse problems. We give a short summary of the Bayesian approach to solving inverse problems, as introduced in [Stu10].
For some mapping G : Θ → Y between Banach spaces, and y ∈ Y , we wish to find θ ∈ Θ such that y = G(θ).
Let us take $Y = L^2_\lambda(\mathcal{X}, V)$, the space of square-integrable functions on a probability space $(\mathcal{X}, \lambda)$ with values in a finite-dimensional normed space $V$. Rather than working with the whole infinite-dimensional $L^2$ space, we discretise it by considering the following regression model, which mimics the setting of an experiment. Let $(X_i)_{i=1}^N$ be i.i.d. random variables on $\mathcal{X}$ with distribution $\lambda$. These random variables correspond to experimental measurements of $G_\theta = G(\theta)$ with input $X_i$. Such measurements come with experimental noise, which we model through the random variables
$$V_i = G_\theta(X_i) + E_i, \qquad i = 1, \dots, N,$$
where the $E_i$ are i.i.d. standard Gaussian variables on $V$, independent of the $X_i$.
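The regression model can be simulated in a few lines. This Python sketch assumes NumPy and makes the following illustrative assumptions: $\lambda$ is uniform on $\mathcal{X} = [0, 1]$, $V = \mathbb{R}^2$, and the noise is additive, $V_i = G_\theta(X_i) + E_i$. The toy map `G_theta` is hypothetical and stands in for the scattering data of a connection.

```python
import numpy as np

def sample_data(G_theta, N, dim_V, rng):
    """Draw (X_i, V_i), i = 1..N, with V_i = G_theta(X_i) + E_i.
    Assumptions: lambda is uniform on [0, 1] and V = R^dim_V with standard Gaussian E_i."""
    X = rng.uniform(0.0, 1.0, size=N)              # i.i.d. inputs X_i ~ lambda
    E = rng.standard_normal(size=(N, dim_V))       # i.i.d. noise, independent of the X_i
    V = np.array([G_theta(x) for x in X]) + E
    return X, V

def G_theta(x, theta=3.0):
    """Toy forward output with values in V = R^2 (hypothetical, for illustration only)."""
    return np.array([np.cos(theta * x), np.sin(theta * x)])

rng = np.random.default_rng(0)
X, V = sample_data(G_theta, N=1000, dim_V=2, rng=rng)
residuals = V - np.array([G_theta(x) for x in X])
print(X.shape, V.shape)  # (1000,) (1000, 2)
```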
In our setting, the set $\Theta$ could be the set of Hermitian connections $A$ on $D$, $Y$ the set of matrix fields on $S^+(\Omega)$, and $\theta \mapsto G_\theta$ the mapping that sends a connection $A$ to its scattering data $S^A$. Each $X_i$ then amounts to a random choice of path $z \leftarrow y \leftarrow x$ in $S^+(\Omega)$, and $V_i$ to a noisy version of $S^A_{z\leftarrow y\leftarrow x}$.
Let $D_N = ((X_1, V_1), \dots, (X_N, V_N))$ be the full data vector and let $P^N_\theta$ be its law. By making a choice of prior $\Pi$ on the parameter space $\Theta$, Bayes' rule yields a posterior distribution $\Pi_N = \Pi(\cdot \mid D_N)$ on $\Theta$ given the data $D_N$. For a Borel set $O \subset \Theta$, it is given by
$$\Pi(O \mid D_N) = \frac{\int_O e^{\ell_N(\theta)}\,d\Pi(\theta)}{\int_\Theta e^{\ell_N(\theta)}\,d\Pi(\theta)},$$
where the log-likelihood is, up to additive constants,
$$\ell_N(\theta) = -\frac{1}{2}\sum_{i=1}^N |V_i - G_\theta(X_i)|_V^2.$$
One can study how the posterior distribution $\Pi_N$ behaves when $N$ gets large. If we suppose there exists a unique underlying parameter $\theta_0 \in \Theta$ from which the observations are made, we would want the posterior distribution to concentrate around $\theta_0$ (see [GN16, Chapter 7.3] or [GvdV17]), that is, we would want that
$$\Pi_N\big(\theta : \|\theta - \theta_0\| > \delta_N\big) \to 0 \quad \text{in } P^N_{\theta_0}\text{-probability} \tag{2}$$
as $N \to \infty$, for some sequence $\delta_N \to 0$ that dictates the rate of convergence. Following substantial developments in the field, one should then get a good estimator $\hat\theta$ for $\theta_0$ by computing the expectation of the posterior distribution $\Pi_N$ through MCMC sampling. For a given inverse problem, can we get estimates such as (2), and can we guarantee that the posterior mean indeed converges to $\theta_0$, legitimising Bayesian inversion? This question has been studied for a range of different inverse problems and is an active area of research; see [MNP21] as well as [AN19, Boh21, GN20] for some examples.
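The following sketch illustrates posterior-mean estimation by MCMC. It assumes NumPy and evaluates the log-likelihood $\ell_N(\theta) = -\frac12\sum_i |V_i - G_\theta(X_i)|^2$ on a hypothetical scalar toy problem with a standard Gaussian prior. It then runs a random-walk Metropolis chain. Random-walk Metropolis is only one possible MCMC method. For infinite-dimensional parameters, function-space samplers such as preconditioned Crank-Nicolson are more commonly used.

```python
import numpy as np

def log_likelihood(G, theta, X, V):
    """ell_N(theta) = -1/2 sum_i |V_i - G(theta, X_i)|^2, up to an additive constant."""
    residuals = V - G(theta, X)
    return -0.5 * np.sum(np.abs(residuals) ** 2)

def rw_metropolis(log_post, theta_init, n_iter, step, rng):
    """Random-walk Metropolis chain for a scalar parameter; returns all iterates."""
    theta, lp = theta_init, log_post(theta_init)
    chain = np.empty(n_iter)
    for k in range(n_iter):
        proposal = theta + step * rng.standard_normal()
        lp_proposal = log_post(proposal)
        if np.log(rng.uniform()) < lp_proposal - lp:   # accept with probability min(1, ratio)
            theta, lp = proposal, lp_proposal
        chain[k] = theta
    return chain

# Toy scalar problem (hypothetical): G(theta, x) = theta * x, X_i ~ U[0, 1], N(0, 1) prior.
def G(theta, X):
    return theta * X

rng = np.random.default_rng(1)
theta_true, N = 0.7, 2000
X = rng.uniform(size=N)
V = G(theta_true, X) + rng.standard_normal(N)

def log_post(theta):
    return log_likelihood(G, theta, X, V) - 0.5 * theta ** 2

chain = rw_metropolis(log_post, 0.0, n_iter=6000, step=0.05, rng=rng)
posterior_mean = chain[1000:].mean()                   # discard burn-in
print(posterior_mean)
```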
However, in the case of the broken non-abelian X-ray transform, the map $G : A \mapsto S^A$ is not injective, and so the true underlying parameter is not uniquely identifiable. Indeed, all connections in the same $\mathcal{G}$-orbit yield the same scattering data. Moreover, these orbits are all infinite-dimensional. Can we still find a way to get a meaningful candidate for $A$ from samples of $S^A$ through the framework of Bayesian inverse problems?
The first approach one could use to deal with injectivity issues is as follows. Let us assume, as is our case, that a group $\mathcal{G}$ acts on $\Theta$ and that $G$ is injective up to the action of $\mathcal{G}$. This means that for every $g \in \mathcal{G}$ and $\theta \in \Theta$ we have $G(\theta^g) = G(\theta)$, and that $G(\theta_1) = G(\theta_2)$ if and only if there is $g \in \mathcal{G}$ with $\theta_1 = \theta_2^g$. Then $G$ naturally induces an injective map on the quotient space,
$$\tilde G : \Theta/\mathcal{G} \to Y, \qquad \tilde G([\theta]) := G(\theta).$$
One could try to prove statistical guarantees for this map. However, in settings such as the present one, where $\mathcal{G}$ is non-linear, the quotient space $\Theta/\mathcal{G}$ is intractable, as it is unclear how one would parametrise the equivalence classes. What one needs is a choice of representative for each class in the quotient, that is, a continuous map $s : \Theta/\mathcal{G} \to \Theta$ with $\pi \circ s = \mathrm{id}_{\Theta/\mathcal{G}}$, where $\pi : \Theta \to \Theta/\mathcal{G}$ is the quotient map; in particular, $G \circ s = \tilde G$.
The existence of $s$ is nontrivial, and it is often the case that such a lift simply does not exist; see [Sin78] for examples where topological obstructions prevent its existence. Even if $s$ exists, it might only be theoretical and not correspond to an explicit choice (it may be neither constructive nor numerically computable). Hence, we need a new approach adapted to the problem at hand. What we will end up doing is finding another group $\mathcal{H}$ of which $\mathcal{G}$ is a proper subgroup and for which we can find an explicit section $s_{\mathcal{H}} : \Theta/\mathcal{H} \to \Theta$. Although the forward map will not be invariant under the action of $\mathcal{H}$, our stability estimates will be. Those same estimates will guarantee that the forward map is injective when restricted to the image of $s_{\mathcal{H}}$. We will then show that we can use Bayesian inversion to solve this restricted problem. Finally, through some choice of extension operator, we will show that, from the solution to the restricted problem, we can recover an element that is $\mathcal{G}$-equivalent to the true solution $\theta_0$. See the discussion after Proposition 1.9 for more details.
1.5. Definitions and notation. Before presenting the main results, we use this section to gather some notation and additional definitions that will be used throughout.
Unlike in [CLOP21a], we will not consider all paths in $S^+(\Omega)$. We will mostly consider two types of paths, which we refer to as past-determined and future-determined paths. A past-determined path is a path of the form $z \leftarrow y \leftarrow x_y$ for $(x_y, y, z) \in S^+(\Omega)$, where $x_y$ is the unique point such that $(x_y, y) \in L$ and $x_y \in O$, that is, $x_y = (t, 0, 0, 0)$ for some $t \in [-1, 1]$. Similarly, a future-determined path is a path of the form $z_y \leftarrow y \leftarrow x$ for $(x, y, z_y) \in S^+(\Omega)$, where now $z_y$ is the unique point on $O$ such that $(y, z_y) \in L$. We denote the corresponding scattering data by $S^A_{z\leftarrow y\leftarrow x_y}$ and $S^A_{z_y\leftarrow y\leftarrow x}$.
Hence, the wiggle room in $\Omega$ will only be used to move $x$ in $X$ or $z$ in $Z$, but not both.
Remark 1.2. It is not sufficient to only consider paths that are both past-determined and future-determined, that is, paths of the form $z_y \leftarrow y \leftarrow x_y$. Indeed, in polar coordinates $(t, r, \vartheta, \varphi)$, the tangent vector along the path $z_y \leftarrow y$ is $\frac{1}{\sqrt2}(\partial_t - \partial_r)$, while the tangent vector along the path $y \leftarrow x_y$ is $\frac{1}{\sqrt2}(\partial_t + \partial_r)$. Hence, the angular components of the connection play no role in the forward problem for such paths.
Any future-determined path can be identified by its break point $y \in D \setminus \Omega$ and its first endpoint $x$, which lies in the intersection of $X$ and the past light cone of $y$. We can represent the admissible future-determined paths as a $B^3$-bundle $\pi : F^X \to D \setminus \Omega$, where $B^3$ stands for the unit ball in $\mathbb{R}^3$. For every $y \in D \setminus \Omega$, the fibre is given by
$$F^X_y := \{x \in X : (x, y) \in L\}.$$
Similarly, the set of admissible past-determined paths can be represented through the $B^3$-bundle $\pi : F^Z \to D \setminus \Omega$ with fibre
$$F^Z_y := \{z \in Z : (y, z) \in L\}.$$
We write $F^X_\varepsilon$ or $F^Z_\varepsilon$ whenever we want to emphasise the dependence of the bundles on $\varepsilon$ through $\Omega_\varepsilon$.
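For two points $x$ and $y$ in $D$, let $\gamma_{y\leftarrow x} : [0, T] \to D$ be the straight line from $x$ to $y$ parametrised by its (Euclidean) arc length. We denote $v_{y\leftarrow x} := \dot\gamma_{y\leftarrow x}(T)$, that is, $v_{y\leftarrow x}$ is the unit vector pointing from $x$ to $y$, but based at $y$. For a function $\Phi : D^2 \setminus \Gamma \to \mathbb{C}^n$, where $\Gamma$ is the diagonal of $D^2$, we define the differential operators
$$\partial_{y\leftarrow x}\Phi(x, y) := \frac{d}{dt}\Big|_{t=0}\Phi(x, y + t v_{y\leftarrow x}), \qquad \partial_{x\leftarrow y}\Phi(x, y) := \frac{d}{dt}\Big|_{t=0}\Phi(x + t v_{x\leftarrow y}, y). \tag{3}$$
Note that if $(x, y) \in F^X$ and the domain of $\Phi$ is $F^X$, the operators $\partial_{y\leftarrow x}$ and $\partial_{x\leftarrow y}$ are both well defined, since $x \in F^X_{y + t v_{y\leftarrow x}}$ and $x + t v_{x\leftarrow y} \in F^X_y$ whenever $x \in F^X_y$ and $t$ is sufficiently small. One can see $\partial_{y\leftarrow x}$ and $\partial_{x\leftarrow y}$ as horizontal and vertical vector fields on $F^X$, respectively. Similarly, $\partial_{y\leftarrow z}$ and $\partial_{z\leftarrow y}$ are well-defined operators if $(y, z) \in F^Z$ and the domain of $\Phi$ is $F^Z$.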
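The operators $\partial_{y\leftarrow x}$ and $\partial_{x\leftarrow y}$ are directional derivatives along the ray through $x$ and $y$, so they can be approximated by central finite differences. In this sketch (assuming NumPy; all function names are hypothetical), both operators act on the Euclidean length $|y - x|$ as 1, since each one lengthens the segment at unit rate. The constraint that $x$ stays in the fibre $F^X_y$ is not modelled.

```python
import numpy as np

def _unit(v):
    return v / np.linalg.norm(v)

def d_y_from_x(Phi, x, y, h=1e-6):
    """Central difference for d/dt Phi(x, y + t v_{y<-x}) at t = 0."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    v = _unit(y - x)                                    # v_{y<-x}: from x to y, based at y
    return (Phi(x, y + h * v) - Phi(x, y - h * v)) / (2 * h)

def d_x_from_y(Phi, x, y, h=1e-6):
    """Central difference for d/dt Phi(x + t v_{x<-y}, y) at t = 0."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    v = _unit(x - y)                                    # v_{x<-y}: from y to x, based at x
    return (Phi(x + h * v, y) - Phi(x - h * v, y)) / (2 * h)

def length(x, y):
    return np.linalg.norm(y - x)

x, y = np.array([-1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])
print(round(d_y_from_x(length, x, y), 6), round(d_x_from_y(length, x, y), 6))  # 1.0 1.0
```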
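We define the $L^2$-norm of a function $\Phi : F^X \to \mathbb{C}^n$ by
$$\|\Phi\|^2_{L^2(F^X)} := \int_{D\setminus\Omega}\int_{F^X_y} |\Phi(x, y)|^2\,dx\,dy,$$
where $dx$ is the natural measure on $F^X_y$ induced by Euclidean space. Note that the squared norm scales down like $\varepsilon^3$ as $\varepsilon$ goes to 0.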
Given a linear map $T$ from $\mathbb{R}^m$ to $\mathbb{C}^n$, we denote its operator norm by
$$\|T\| := \sup_{|v| = 1}|Tv|,$$
where $|\cdot|$ denotes the usual norm on $\mathbb{R}^m$ or $\mathbb{C}^n$. This induces a pointwise norm on $\mathbb{C}^{n\times n}$-valued one-forms $\omega$ on $D$ at any given point $y \in D$ by seeing $\omega_y$ as a mapping from $\mathbb{R}^4$ to $\mathbb{C}^{n^2}$. Note that we then have $|\omega_y(v)|^2 = \operatorname{Tr}(\omega_y(v)\,\omega_y(v)^*)$, and so $\|\omega_y\|$ is invariant under the action of $U(n)$. This also induces an $L^2$-norm on the space of one-forms by
$$\|\omega\|^2_{L^2(D)} := \int_D \|\omega_y\|^2\,dy.$$
Figure: Setting for the broken non-abelian X-ray transform in $\mathbb{R}^{1+2}$ for a future-determined path. The point $y$ lies inside the causal diamond (in blue), but outside the set $\Omega$ (in green). The point $x$ can take values in the fibre $F^X_y$, which is given by the intersection of $\Omega$ and the past light cone at $y$ (in red). The point $z_y$ is always taken on the origin's world line and is uniquely determined by $y$. The vector $v_{y\leftarrow x}$ is based at $y$ and points in the direction coming from $x$.
1.6. Main results. We state our results only for future-determined paths, but equivalent statements hold for past-determined paths by seeing $[S^A_{z\leftarrow y\leftarrow x_y}]^{-1}$ as a future-determined path. We will first show the stability estimates below for the values of a connection inside and outside $\Omega$.
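Theorem 1.4. Let $A$ and $B$ be Hermitian connections. There exist a smooth function $p \in C^\infty(D, \mathbb{C}^{n\times n})$ vanishing on $O$ and a constant $C > 0$ such that for all $0 < \varepsilon < \varepsilon_0$,
By combining both theorems, we obtain a new proof of the injectivity (up to the gauge group $\mathcal{G}$) of the broken non-abelian X-ray transform.
Corollary 1.5. Let $A$ and $B$ be Hermitian connections. Then $S^A_{z\leftarrow y\leftarrow x} = S^B_{z\leftarrow y\leftarrow x}$ for all past-determined and future-determined paths if and only if $B = A^\phi$ for some $\phi \in \mathcal{G}$.
Proof. Since the scattering data of $A$ and $B$ agree for all future-determined paths, Theorem 1.3 implies that $A$ and $B$ must agree on $X$. By the same estimate for past-determined paths, the connections must also agree on $Z$, and hence they agree on $\Omega$. Theorem 1.4 then yields $p \in C^\infty(D, \mathbb{C}^{n\times n})$ such that
$$A - B - d_{E(A,B)}p = 0, \tag{5}$$
where $d_{E(A,B)}p = dp + Ap - pB$. Set $\phi := \mathrm{Id} - p$. As the proof of Theorem 2.2 will reveal, $\phi$ takes values in $U(n)$, since in fact $\phi(y) = P^A_{y\leftarrow z_y}P^B_{z_y\leftarrow y}$. We can rewrite (5) as $\phi B = A\phi + d\phi$, and so $B = A^\phi$. It follows that $A$ and $B$ are gauge equivalent, since they agree on $\Omega$ and so $\phi|_\Omega = \mathrm{Id}$. The converse implication is the statement of Proposition 1.1.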
This can be seen as a partial data result improving on [CLOP21a, Theorem 5], as we only consider past-determined and future-determined paths. It also suggests that restricting to such paths might lead to a more efficient problem to study.
Both stability estimates are invariant under the action of $\mathcal{G}$, but they are also invariant under the action of a bigger group $\mathcal{H} \supset \mathcal{G}$. In fact, we can rewrite the left-hand side of (4) in a way that highlights this.
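Theorem 1.6. Let $p$ be as in Theorem 1.4. Then for all $y \in D \setminus \Omega$,
$$(A - B - d_{E(A,B)}p)_y = P^A_{y\leftarrow z_y}\big(A^{P^A_{y\leftarrow z_y}} - B^{P^B_{y\leftarrow z_y}}\big)_y P^B_{z_y\leftarrow y},$$
and in particular $\|(A - B - d_{E(A,B)}p)_y\| = \|(A^{P^A_{y\leftarrow z_y}} - B^{P^B_{y\leftarrow z_y}})_y\|$.
Hence, this defines a distance between the connections $A$ and $B$ that is invariant under the action of $\mathcal{H}$, and so gauge independent, as $\mathcal{G} \subset \mathcal{H}$. With a little work, we can combine this expression with Theorem 1.4 to get the following $H^1$ estimate.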
Corollary 1.7. Let $A$ and $B$ be Hermitian connections. There exists a constant $C > 0$ such that and $F_A$ is the curvature two-form of $A$.
Theorem 1.6 also suggests that we should naturally try to fix the gauge by considering connections such that $A^{P^A_{y\leftarrow z_y}} = A$. We call them light-sink connections. They form a linear space, and we can characterise them; see Proposition 3.2.
Proposition 1.8. Every connection $A$ is $\mathcal{H}$-equivalent to a unique light-sink connection, so that the map $\rho : \mathcal{U}/\mathcal{H} \to \mathcal{U}$, $[A] \mapsto A^{P^A_{y\leftarrow z_y}}$, is well defined.
The map $\rho$ is almost a fixing of the gauge. Unlike the action of $\mathcal{G}$, the action of $\mathcal{H}$ on $\mathcal{U}$ does not preserve the scattering data. Therefore, the map $\rho$ does not define a lift as defined in Section 1.4. Nonetheless, if a light-sink connection $A$ is $\mathcal{H}$-equivalent to another connection $B$, we can use their scattering data and the map $\rho$ to make them gauge equivalent.
Proposition 1.9. Let $A$ be a light-sink connection and let $B$ be a Hermitian connection such that $\rho([B]) = A$. From the past-determined and future-determined scattering data of $A$ and $B$, we can find a map $\Phi \in \mathcal{H}$ such that $A^\Phi$ and $B$ are gauge equivalent (with respect to $\mathcal{G}$).
The map $\Phi$ depends on the choice of an extension operator $E$; we have thus reduced the choice of a gauge to the choice of such an operator. Note that such an operator can be constructed by first extending with values in $GL(n, \mathbb{C})$ and then projecting onto $U(n)$ through a strong deformation retract. This is a continuous map $F : [0, 1] \times GL(n, \mathbb{C}) \to GL(n, \mathbb{C})$ such that $F(0, x) = x$ and $F(1, x) \in U(n)$ for all $x \in GL(n, \mathbb{C})$, and $F(t, \cdot)|_{U(n)} = \mathrm{Id}$ for all $t \in [0, 1]$.
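The text does not fix a particular retract. One standard choice, given here only as an illustration, comes from the polar decomposition: $F(t, M) = M(M^*M)^{-t/2}$. It satisfies $F(0, M) = M$, and $F(1, M)$ is the unitary polar factor of $M$. Moreover, $F(t, U) = U$ for unitary $U$, because $U^*U = \mathrm{Id}$. Gram-Schmidt orthonormalisation gives another retract. The sketch below assumes NumPy and uses a hypothetical invertible test matrix.

```python
import numpy as np

def polar_retract(t, M):
    """F(t, M) = M (M^* M)^{-t/2} for invertible M and t in [0, 1]."""
    w, V = np.linalg.eigh(M.conj().T @ M)              # M^* M = V diag(w) V^*, with w > 0
    return M @ (V * w ** (-t / 2.0)) @ V.conj().T     # M V diag(w^{-t/2}) V^*

# Hypothetical invertible matrix (its determinant is 1.5 + 5i).
M = np.array([[2.0, 1j, 0.0], [0.5, 1.0, -1.0], [1.0, 0.0, 3j]])
U = polar_retract(1.0, M)
print(np.allclose(U.conj().T @ U, np.eye(3)))         # True: F(1, M) lies in U(3)
print(np.allclose(polar_retract(0.0, M), M))          # True: F(0, M) = M
print(np.allclose(polar_retract(0.5, U), U))          # True: unitary matrices stay fixed
```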
In practice, say that we observe the scattering data $S^B$ on past-determined and future-determined paths for some connection $B$, and that we have complete knowledge of the forward map $A \mapsto S^A$. We wish to find the gauge equivalence class of $B$ from $S^B$, which amounts to finding a connection $A$ such that $A = B^\phi$ for some $\phi \in \mathcal{G}$. Our results give the following strategy to do so.
(1) By taking $y$ on the boundary of $\Omega$, use Theorem 1.3 to determine $B$ inside $\Omega$ from the scattering data of $B$ along past-determined and future-determined paths.
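(2) Minimise the mapping $A \mapsto \|S^A_{z_y\leftarrow y\leftarrow x} - S^B_{z_y\leftarrow y\leftarrow x}P^B_{x\leftarrow z_x}\|$ over all light-sink connections $A$. Note that we can compute $P^B_{x\leftarrow z_x}$ from the first step, since we know $B$ inside $\Omega$.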
(3) By Corollary 1.7 and the definition of $\rho$, the unique minimiser of this problem is $A = \rho([B])$.
(4) Use Proposition 1.9 to get a connection $A^\Phi$ that is gauge equivalent to $B$.
Note that in step (2), $S^B_{z_y\leftarrow y\leftarrow x}P^B_{x\leftarrow z_x}$ is precisely the scattering data of $\rho([B])$, which explains why Corollary 1.7 implies that $A = \rho([B])$ is the unique minimiser of the problem.
One can implement this algorithm with the use of Bayesian inversion.
Step (2) is equivalent to recovering a light-sink connection from its scattering data, and we will show in Section 4 that this can be done consistently through Bayesian inversion; see Theorem 4.2. Using similar arguments, one could also provide guarantees for recovering $B$ on $\Omega$ in step (1) using Bayesian inversion. As steps (3) and (4) only involve direct computations, the above algorithm fits within the framework of Bayesian inverse problems. Therefore, by following these steps, one should be able to compute a connection that is close to being gauge equivalent to $B$ from noisy measurements of its scattering data.
Acknowledgements. I would like to thank Gabriel Paternain for suggesting this project and for his guidance. I would also like to thank Richard Nickl, Lauri Oksanen and Jan Bohr for their helpful comments. This research was supported by the Cambridge Trust, NSERC's PGS D scholarship and the CCIMI.

Stability estimate
The goal of this section is to prove the following two pointwise estimates from which Theorems 1.3 and 1.4 will follow.
Theorem 2.1. Let $A$ and $B$ be Hermitian connections on $D$. Then there is a constant $C > 0$ such that for all $x \in X$,
Theorem 2.2. Let $A$ and $B$ be Hermitian connections on $D$. There exist a smooth function $p \in C^\infty(D, \mathbb{C}^{n\times n})$ vanishing on $O$ and a constant $C > 0$ such that for all $0 < \varepsilon < \varepsilon_0$ and $y \in D \setminus \Omega_\varepsilon$, it holds that
To do so, we introduce the attenuated X-ray transform, as well as a pseudolinearisation identity. We also show how to reformulate the theorems in the form of an $H^1$ estimate.
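2.1. The attenuated X-ray transform. Let $\gamma : [0, T] \to D$ be a smooth curve and let $\omega \in \Omega^1(D, \mathbb{C}^n)$, that is, $\omega$ is a one-form on $D$ with values in $\mathbb{C}^n$. (We will actually use $\mathbb{C}^{n\times n}$ in the proof of Theorem 2.2, but everything is defined analogously through the isomorphism with $\mathbb{C}^{n^2}$.) Fix a Hermitian connection $A$ on $D$ as above. The attenuated X-ray transform of $\omega$ along $\gamma$ with respect to $A$ is given by
$$I^A_\gamma(\omega) := \int_0^T P^A_{\gamma(0)\leftarrow\gamma(t)}\,\omega_{\gamma(t)}(\dot\gamma(t))\,dt. \tag{8}$$
As with the parallel transport, we can express $I^A_\gamma(\omega)$ through the solution of an ODE.
Lemma 2.3. Let $u$ be the unique solution along $\gamma$ of
$$\dot u(t) + A_{\gamma(t)}(\dot\gamma(t))\,u(t) = \omega_{\gamma(t)}(\dot\gamma(t)), \qquad u(0) = 0.$$
Then $u(T) = P^A_{\gamma(T)\leftarrow\gamma(0)}\,I^A_\gamma(\omega)$.
Proof. Let $U$ be the solution of the parallel transport ODE along $\gamma$. A quick computation shows that $(U^{-1})^{\cdot} = U^{-1}A$. Therefore, along $\gamma$ we have
$$\frac{d}{dt}\big(U^{-1}u\big) = U^{-1}(\dot u + Au) = U^{-1}\omega(\dot\gamma).$$
Integrating both sides from 0 to $T$ yields $U(T)^{-1}u(T) = \int_0^T U(t)^{-1}\omega_{\gamma(t)}(\dot\gamma(t))\,dt$. By definition of the parallel transport, $U(t) = P^A_{\gamma(t)\leftarrow\gamma(0)}$. Isolating $u(T)$ in the previous equation and replacing $U$ by the parallel transport yields the result.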
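Lemma 2.3 can be checked numerically. The sketch below assumes NumPy; the connection `A` and the $\mathbb{C}^2$-valued one-form `omega` are hypothetical. It first approximates $I^A_{y\leftarrow x}(\omega) = \int_0^1 P^A_{x\leftarrow\gamma(t)}\,\omega_{\gamma(t)}(\dot\gamma(t))\,dt$ along $\gamma(t) = x + t(y - x)$ with the midpoint rule, using exponential midpoint steps for the transport. It then integrates the ODE of the lemma separately with classical Runge-Kutta. The two sides of $u(T) = P^A_{y\leftarrow x}I^A_{y\leftarrow x}(\omega)$ should agree up to discretisation error.

```python
import numpy as np

def expm_skew(X):
    """exp(X) for a skew-Hermitian matrix X, via the eigendecomposition of the Hermitian iX."""
    w, V = np.linalg.eigh(1j * X)
    return (V * np.exp(-1j * w)) @ V.conj().T

def attenuated_xray(A, omega, x, y, steps=1000):
    """Midpoint-rule approximation of I^A_{y<-x}(omega); also returns P^A_{y<-x}."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    v, h = y - x, 1.0 / steps
    n = A(x).shape[-1]
    U, integral = np.eye(n, dtype=complex), np.zeros(n, dtype=complex)
    for k in range(steps):
        p = x + (k + 0.5) * h * v
        Av = np.tensordot(v, A(p), axes=1)                # A_p(gamma')
        U_mid = expm_skew(-0.5 * h * Av) @ U              # P^A_{gamma(t_mid) <- x}
        integral += h * (U_mid.conj().T @ (v @ omega(p))) # P^A_{x <- gamma} = (P^A_{gamma <- x})^*
        U = expm_skew(-h * Av) @ U
    return integral, U

def solve_u(A, omega, x, y, steps=1000):
    """Classical RK4 for u' = omega_gamma(gamma') - A_gamma(gamma') u, u(0) = 0."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    v, h = y - x, 1.0 / steps

    def rhs(t, u):
        p = x + t * v
        return v @ omega(p) - np.tensordot(v, A(p), axes=1) @ u

    u = np.zeros(A(x).shape[-1], dtype=complex)
    for k in range(steps):
        t = k * h
        k1 = rhs(t, u)
        k2 = rhs(t + h / 2, u + h / 2 * k1)
        k3 = rhs(t + h / 2, u + h / 2 * k2)
        k4 = rhs(t + h, u + h * k3)
        u = u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return u

K0 = np.array([[1j, 1.0], [-1.0, -0.5j]])
K1 = np.array([[0.0, 1j], [1j, 0.0]])

def A(p):
    """Hypothetical u(2)-valued connection, shape (4, 2, 2)."""
    return np.array([K0 * p[1], K1, np.zeros((2, 2)), K0 * p[0]])

def omega(p):
    """Hypothetical C^2-valued one-form: row i is its value on the i-th coordinate vector."""
    return np.array([[p[0], 1j], [1.0, p[1]], [0.0, 0.0], [p[2], 1.0]])

x, y = [0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]
I_yx, P = attenuated_xray(A, omega, x, y)
u_T = solve_u(A, omega, x, y)
print(np.abs(u_T - P @ I_yx).max())   # small: discretisation error only
```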
If $A$ vanishes identically, the attenuated X-ray transform is simply the integral of the one-form $\omega$ along $\gamma$. So if $\omega$ is a potential form ($\omega = df$ for some $f \in C^\infty(D, \mathbb{C}^n)$), then, by the fundamental theorem of calculus, the attenuated X-ray transform of $\omega$ is the difference between the values of $f$ at the two endpoints of $\gamma$. This is no longer exactly true when $A$ does not vanish, as we have to account for the parallel transport in the definition of $I^A_\gamma(\omega)$. Instead of potential forms with respect to $d$, we have to consider potential forms with respect to $d_A = d + A$ to get an analogue of the fundamental theorem of calculus.
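Lemma 2.4. Let $f : D \to \mathbb{C}^n$ be a smooth function on $D$. Then
$$I^A_{y\leftarrow x}(d_A f) = P^A_{x\leftarrow y}f(y) - f(x). \tag{9}$$
Recall the definition of $\partial_{y\leftarrow x}$ in (3). We can apply $\partial_{y\leftarrow x}$ to the attenuated X-ray transform to evaluate a one-form on the tangent space at $y$.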
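Lemma 2.5. Let $\omega$ be a one-form on $D$. For $x \neq y$,
$$\partial_{y\leftarrow x}I^A_{y\leftarrow x}(\omega) = P^A_{x\leftarrow y}\,\omega_y(v_{y\leftarrow x}).$$
Proof. Let $\gamma : [0, T] \to D$ be the line segment from $x$ to $y$ parametrised by arc length. By extending $\gamma$, we see that $\gamma(s) + t v_{y\leftarrow x} = \gamma(s + t)$. Hence, we get
$$\partial_{y\leftarrow x}I^A_{y\leftarrow x}(\omega) = \frac{d}{dt}\Big|_{t=0}\int_0^{T+t} P^A_{x\leftarrow\gamma(s)}\,\omega_{\gamma(s)}(\dot\gamma(s))\,ds = P^A_{x\leftarrow y}\,\omega_y(\dot\gamma(T)),$$
since $\gamma(T) = y$. The result follows since $v_{y\leftarrow x} = \dot\gamma(T)$.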

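2.2. The broken attenuated X-ray transform. We will actually be interested in a broken version of the attenuated X-ray transform. One could naively define the broken attenuated X-ray transform $I^A_{z\leftarrow y\leftarrow x}(\omega)$ as $I^A_{y\leftarrow x}(\omega) + I^A_{z\leftarrow y}(\omega)$. However, this is not compatible with Lemma 2.4, as we would want
$$I^A_{z\leftarrow y\leftarrow x}(d_A f) = [S^A_{z\leftarrow y\leftarrow x}]^{-1}f(z) - f(x)$$
to hold in general. It also does not coincide with the usual attenuated X-ray transform $I^A_{z\leftarrow x}(\omega)$ if $x$, $y$ and $z$ lie on the same line in that order. Instead, we define the broken attenuated X-ray transform as
$$I^A_{z\leftarrow y\leftarrow x}(\omega) := I^A_{y\leftarrow x}(\omega) + P^A_{x\leftarrow y}\,I^A_{z\leftarrow y}(\omega). \tag{10}$$
One can check that (9), in the broken form above, holds under this definition, and that $I^A_{z\leftarrow y\leftarrow x}(\omega) = I^A_{z\leftarrow x}(\omega)$ whenever the curve $z \leftarrow y \leftarrow x$ is smooth.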
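Both properties of definition (10) can be tested numerically. The sketch below assumes NumPy; the connection `A`, the function `f` and the points $x$, $y$, $z$ are hypothetical choices. It first checks the broken identity $I^A_{z\leftarrow y\leftarrow x}(d_A f) = [S^A_{z\leftarrow y\leftarrow x}]^{-1}f(z) - f(x)$. In the accompanying check, it also verifies that (10) reduces to $I^A_{z\leftarrow x}$ when $x$, $y$, $z$ are collinear in that order.

```python
import numpy as np

def expm_skew(X):
    """exp(X) for a skew-Hermitian matrix X, via the eigendecomposition of the Hermitian iX."""
    w, V = np.linalg.eigh(1j * X)
    return (V * np.exp(-1j * w)) @ V.conj().T

def attenuated_xray(A, omega, x, y, steps=1000):
    """Midpoint-rule approximation of I^A_{y<-x}(omega); also returns P^A_{y<-x}."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    v, h = y - x, 1.0 / steps
    n = A(x).shape[-1]
    U, integral = np.eye(n, dtype=complex), np.zeros(n, dtype=complex)
    for k in range(steps):
        p = x + (k + 0.5) * h * v
        Av = np.tensordot(v, A(p), axes=1)
        U_mid = expm_skew(-0.5 * h * Av) @ U              # P^A_{gamma(t_mid) <- x}
        integral += h * (U_mid.conj().T @ (v @ omega(p)))
        U = expm_skew(-h * Av) @ U
    return integral, U

def broken_xray(A, omega, x, y, z):
    """(10): I^A_{z<-y<-x} = I^A_{y<-x} + P^A_{x<-y} I^A_{z<-y}; also returns S^A_{z<-y<-x}."""
    I_yx, P_yx = attenuated_xray(A, omega, x, y)
    I_zy, P_zy = attenuated_xray(A, omega, y, z)
    return I_yx + P_yx.conj().T @ I_zy, P_zy @ P_yx

K0 = np.array([[1j, 1.0], [-1.0, -0.5j]])
K1 = np.array([[0.0, 1j], [1j, 0.0]])

def A(p):
    """Hypothetical u(2)-valued connection."""
    return np.array([K0 * p[1], K1, np.zeros((2, 2)), K0 * p[0]])

def f(p):
    """Hypothetical function D -> C^2."""
    return np.array([p[0] ** 2 + 1j * p[1], p[1] * p[3] - p[0]])

def dA_f(p):
    """The one-form d_A f = df + A f; row i holds its value on the i-th coordinate vector."""
    df = np.array([[2 * p[0], -1.0], [1j, p[3]], [0.0, 0.0], [0.0, p[1]]])
    return df + A(p) @ f(p)

x, y, z = np.zeros(4), np.array([1.0, 1.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0, 0.0])
I_broken, S = broken_xray(A, dA_f, x, y, z)
expected = S.conj().T @ f(z) - f(x)                     # [S^A]^{-1} f(z) - f(x)
print(np.abs(I_broken - expected).max())                # small: discretisation error only
```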

2.3. Pseudolinearisation identity.
The key tool in the proofs of Theorems 2.1 and 2.2 is the following pseudolinearisation identity. It relates parallel transports along a curve with respect to two different connections with an attenuated X-ray of their difference. See [PSU21, Chapter 13.2] for more details on the pseudolinearisation identity. We shall adapt their proof to our setting.
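Lemma 2.6. For any smooth curve $\gamma : [0, T] \to D$ and connections $A$ and $B$,
$$(P^A_\gamma)^{-1}P^B_\gamma - \mathrm{Id} = I^{E(A,B)}_\gamma(A - B), \tag{11}$$
where $E(A, B) \in \mathrm{End}(\mathbb{C}^{n\times n})$ is given by $E(A, B)Q = AQ - QB$ for $Q \in \mathbb{C}^{n\times n}$.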
The right-hand side of (11) is the attenuated X-ray transform of $A - B$ with respect to $E(A, B)$. This is slightly different from how we introduced the attenuated X-ray transform earlier. However, we can see $A - B$ as a one-form taking values in $\mathbb{C}^{n^2} \cong \mathbb{C}^{n\times n}$ and $E(A, B)$ as a connection on the trivial bundle $D \times \mathbb{C}^{n^2}$. Before proving Lemma 2.6, we state another useful lemma.
Lemma 2.7. Let $\gamma : [0, T] \to D$ be a smooth curve and let $A$ and $B$ be connections on $D$. For any $Q \in \mathbb{C}^{n\times n}$,
$$P^{E(A,B)}_\gamma Q = P^A_\gamma\,Q\,(P^B_\gamma)^{-1}.$$

Rearranging the last equation yields (11).
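Importantly, the pseudolinearisation identity is also valid in the broken case, where the parallel transports are replaced by the scattering data.
Lemma 2.8. For all $(x, y, z) \in S^+(\Omega)$,
$$[S^A_{z\leftarrow y\leftarrow x}]^{-1}S^B_{z\leftarrow y\leftarrow x} - \mathrm{Id} = I^{E(A,B)}_{z\leftarrow y\leftarrow x}(A - B).$$
Proof. We can use Lemma 2.6 on both attenuated X-ray transforms and Lemma 2.7 on the parallel transport to get
$$I^{E(A,B)}_{z\leftarrow y\leftarrow x}(A - B) = \big(P^A_{x\leftarrow y}P^B_{y\leftarrow x} - \mathrm{Id}\big) + P^A_{x\leftarrow y}\big(P^A_{y\leftarrow z}P^B_{z\leftarrow y} - \mathrm{Id}\big)P^B_{y\leftarrow x} = P^A_{x\leftarrow y}P^A_{y\leftarrow z}P^B_{z\leftarrow y}P^B_{y\leftarrow x} - \mathrm{Id},$$
which is the claim.
The pseudolinearisation identity and Lemma 2.5 are enough to prove Theorem 2.1.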
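The pseudolinearisation identity can also be checked numerically by treating $E(A, B)$ as a genuine $\mathfrak{u}(n^2)$-valued connection. With row-major vectorisation, $Q \mapsto AQ - QB$ is the matrix $A \otimes \mathrm{Id} - \mathrm{Id} \otimes B^{T}$ (Kronecker products), which is skew-Hermitian when $A$ and $B$ are. The sketch below assumes NumPy; $A$ and $B$ are hypothetical connections. It compares $(P^A_\gamma)^{-1}P^B_\gamma - \mathrm{Id}$ with the attenuated X-ray transform of $A - B$ with respect to $E(A, B)$. In the accompanying check, it also verifies $P^{E(A,B)}_\gamma Q = P^A_\gamma Q (P^B_\gamma)^{-1}$.

```python
import numpy as np

def expm_skew(X):
    """exp(X) for a skew-Hermitian matrix X, via the eigendecomposition of the Hermitian iX."""
    w, V = np.linalg.eigh(1j * X)
    return (V * np.exp(-1j * w)) @ V.conj().T

def attenuated_xray(A, omega, x, y, steps=1000):
    """Midpoint-rule approximation of I^A_{y<-x}(omega); also returns P^A_{y<-x}."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    v, h = y - x, 1.0 / steps
    n = A(x).shape[-1]
    U, integral = np.eye(n, dtype=complex), np.zeros(n, dtype=complex)
    for k in range(steps):
        p = x + (k + 0.5) * h * v
        Av = np.tensordot(v, A(p), axes=1)
        U_mid = expm_skew(-0.5 * h * Av) @ U
        integral += h * (U_mid.conj().T @ (v @ omega(p)))
        U = expm_skew(-h * Av) @ U
    return integral, U

def E_connection(A, B):
    """E(A, B)Q = AQ - QB on C^{n x n} ~ C^{n^2} (row-major vec): kron(A_i, Id) - kron(Id, B_i^T)."""
    def E(p):
        Ap, Bp = A(p), B(p)
        Id = np.eye(Ap.shape[-1])
        return np.array([np.kron(Ap[i], Id) - np.kron(Id, Bp[i].T) for i in range(4)])
    return E

K0 = np.array([[1j, 1.0], [-1.0, -0.5j]])
K1 = np.array([[0.0, 1j], [1j, 0.0]])

def A(p):  # hypothetical u(2)-valued connection
    return np.array([K0 * p[1], K1, np.zeros((2, 2)), K0 * p[0]])

def B(p):  # another hypothetical u(2)-valued connection
    return np.array([K1 * p[0], K0, K1 * p[1], np.zeros((2, 2))])

def zero_form(p):
    return np.zeros((4, 2))

def diff_form(p):
    """A - B viewed as a C^{n^2}-valued one-form (row-major vec of each component)."""
    return (A(p) - B(p)).reshape(4, -1)

x, y = np.zeros(4), np.array([1.0, 1.0, 0.0, 0.0])
_, PA = attenuated_xray(A, zero_form, x, y)                 # P^A_{y<-x}
_, PB = attenuated_xray(B, zero_form, x, y)                 # P^B_{y<-x}
lhs = (PA.conj().T @ PB - np.eye(2)).reshape(-1)            # (P^A)^{-1} P^B - Id, vectorised
rhs, PE = attenuated_xray(E_connection(A, B), diff_form, x, y)
print(np.abs(lhs - rhs).max())                              # small: discretisation error only
```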
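Proof of Theorem 2.1. By interchanging the roles of $x$ and $z$, we see that the pseudolinearisation identity can also be written as
$$S^A_{z\leftarrow y\leftarrow x}[S^B_{z\leftarrow y\leftarrow x}]^{-1} - \mathrm{Id} = I^{E(A,B)}_{x\leftarrow y\leftarrow z}(A - B) = I^{E(A,B)}_{y\leftarrow z}(A - B) + P^{E(A,B)}_{z\leftarrow y}I^{E(A,B)}_{x\leftarrow y}(A - B).$$
The operator $\partial_{x\leftarrow y}$ is essentially a derivative with respect to $x$, and so the first term in the definition of the broken X-ray transform vanishes. Moreover, $P^{E(A,B)}_{z\leftarrow y}$ is unaffected. It follows from Lemmas 2.5 and 2.7 that
$$\partial_{x\leftarrow y}\big(S^A_{z\leftarrow y\leftarrow x}[S^B_{z\leftarrow y\leftarrow x}]^{-1}\big) = S^A_{z\leftarrow y\leftarrow x}\,(A - B)_x(v_{x\leftarrow y})\,[S^B_{z\leftarrow y\leftarrow x}]^{-1}.$$
Taking norms, the scattering data drop out since they belong to $U(n)$, and we get
$$|(A - B)_x(v_{x\leftarrow y})| = \big|\partial_{x\leftarrow y}\big(S^A_{z\leftarrow y\leftarrow x}[S^B_{z\leftarrow y\leftarrow x}]^{-1}\big)\big|. \tag{13}$$
The choice of $z$ on the right-hand side is irrelevant, and we take $z = z_y$. Vectors of the form $v_{x\leftarrow y}$ span the tangent space at $x$ without degenerating when $\varepsilon$ goes to 0. We can therefore find a constant $C > 0$, independent of $\varepsilon$, such that $\|(A - B)_x\|$ is bounded by $C$ times the supremum of (13) over the admissible $y$, and the theorem follows.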
To prove Theorem 1.3, it only remains to integrate over X to get a global estimate.
Proof of Theorem 1.3. By equivalence of norms, we can find C > 0 independent of both ε and x ∈ X such that After changing the integrand through equation (13) with z = z y , integrating over x ∈ X and using Cauchy-Schwarz yields the desired estimate.
The proof of Theorem 2.1 crucially relies on the fact that $x$ is always an endpoint of the path and never the break point, since then the operator $\partial_{x\leftarrow y}$ only hits $I^{E(A,B)}_{x\leftarrow y}$ in the expression for the broken attenuated X-ray transform. This allows us to evaluate $A - B$ inside $\Omega$, but such an approach does not immediately work for evaluating $A - B$ outside $\Omega$. This is where we need to take the gauge into account.

2.4. Dealing with the gauge through a potential form. In order to use techniques similar to those in the proof of Theorem 2.1 to estimate the connection outside $\Omega$, we aim to make the second term in (10) vanish. To do so, we will modify the argument of the attenuated X-ray transform by a potential form. For a connection $A$ and a one-form $\omega$, we define the function
$$p(y) := p^A_\omega(y) = P^A_{y\leftarrow z_y}\,I^A_{y\leftarrow z_y}(\omega). \tag{14}$$
This function will serve as an approximate potential for $\omega$. We chose $p$ in this way so that $I^A_{z_y\leftarrow y}(\omega - d_A p)$ vanishes for all $y \in D \setminus \Omega$, as the next lemma shows.
Lemma 2.9. Let $\gamma : [0, T] \to D$ be the unit-speed lightlike geodesic from $y$ to $z_y$. Then, with $p$ defined as above, we have
$$(\omega - d_A p)_{\gamma(t)}(\dot\gamma(t)) = 0$$
for all $t \in (0, T)$. In particular, $\omega(v_{y\leftarrow z_y}) = d_A p(v_{y\leftarrow z_y})$.
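We can deduce from Lemma 2.9 and (8) that $I^A_{z_y\leftarrow y}(\omega - d_A p) = 0$ for all $y$. So, on the one hand, by (10),
$$I^A_{z_y\leftarrow y\leftarrow x}(\omega - d_A p) = I^A_{y\leftarrow x}(\omega - d_A p).$$
On the other hand, by (9), we have $I^A_{z_y\leftarrow y\leftarrow x}(d_A p) = [S^A_{z_y\leftarrow y\leftarrow x}]^{-1}p(z_y) - p(x) = -p(x)$, since $p(z_y) = 0$. Combining both expressions, we get
$$I^A_{y\leftarrow x}(\omega - d_A p) = I^A_{z_y\leftarrow y\leftarrow x}(\omega) + p(x).$$
By applying $\partial_{y\leftarrow x}$ to both sides of the last expression, Lemma 2.5 yields
$$P^A_{x\leftarrow y}(\omega - d_A p)_y(v_{y\leftarrow x}) = \partial_{y\leftarrow x}I^A_{z_y\leftarrow y\leftarrow x}(\omega), \tag{16}$$
since $\partial_{y\leftarrow x}$ is essentially a derivative in $y$, and so $\partial_{y\leftarrow x}p(x) = 0$. To prove Theorem 2.2, we will replace $A$ by $E(A, B)$ and $\omega$ by $A - B$ in (16) in order to use the pseudolinearisation identity.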
2.5. Evaluating from the tangent space at $y$. As shown in [CLOP21a, Lemma 1], the vectors $v_{y\leftarrow x}$ for $x \in (F^X_\varepsilon)_y$ span the tangent space at $y$, but this spanning family degenerates when $\varepsilon$ goes to 0. We therefore need estimates quantifying how well we can recover $\omega - d_A p$ at $y \in D \setminus \Omega$ by moving $x$ around in the intersection of $\Omega$ and the past light cone of $y$.
Lemma 2.10. Let $0 < \varepsilon < \varepsilon_0$ and let $y \in D \setminus \Omega_\varepsilon$. Then
The key to proving Lemma 2.10 is the following elementary linear algebra lemma, whose proof is straightforward.
Lemma 2.11. Let $b_1, \dots, b_m$ be a basis of $\mathbb{R}^m$ with $|b_i|_{\mathbb{R}^m} = 1$, and let $T : \mathbb{R}^m \to \mathbb{C}^n$ be a linear map. Then where $B$ is the matrix whose columns are the $b_i$ and $\|B^{-1}\|$ is the operator norm of its inverse.
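The exact statement of Lemma 2.11 is not reproduced here. One elementary bound of this type, derived here purely as an illustration, is $\|T\| \le \sqrt{m}\,\|B^{-1}\|\max_i |Tb_i|$; its constant need not match the paper's. It follows from $T = (TB)B^{-1}$ and $\|TB\| \le \|TB\|_F \le \sqrt{m}\max_i|Tb_i|$, since the columns of $TB$ are the $Tb_i$. The sketch assumes NumPy and checks this bound on random instances; `norms_for_bound` is a hypothetical helper.

```python
import numpy as np

def norms_for_bound(T, B):
    """Return ||T|| and sqrt(m) ||B^{-1}|| max_i |T b_i| for a matrix B with unit columns b_i."""
    m = B.shape[1]
    op_norm = np.linalg.norm(T, 2)                      # largest singular value of T
    images = np.linalg.norm(T @ B, axis=0)              # |T b_i| for each column b_i
    bound = np.sqrt(m) * np.linalg.norm(np.linalg.inv(B), 2) * images.max()
    return op_norm, bound

rng = np.random.default_rng(3)
results = []
for _ in range(5):
    T = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))   # T : R^4 -> C^2
    B = rng.standard_normal((4, 4))
    B /= np.linalg.norm(B, axis=0)                      # normalise each column b_i
    results.append(norms_for_bound(T, B))
print(results)
```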
Proof of Lemma 2.10. As stated earlier, [CLOP21a, Lemma 1] guarantees that the vectors $v_{y\leftarrow x}$ span $T_y\mathbb{R}^{1+3}$. Hence, we wish to apply Lemma 2.11 by evaluating on $T_y\mathbb{R}^{1+3}$ using different light rays $\gamma_x$ from $x$ to $y$, for different $x \in \Omega_\varepsilon$ with $(x, y) \in L$, that is, for $x$ in the fibre of $F^X_\varepsilon$ at $y$. We first claim that it suffices to treat the case $y = (0, 1, 0, 0)$. Through a rotation in space and a translation in time, we can identify the sets $\{v_{y\leftarrow x}\}_{x\in\Omega_\varepsilon}$ and $\{v_{y'\leftarrow x}\}_{x\in\Omega_\varepsilon}$ whenever $y$ and $y'$ share the same spatial norm. By symmetry, this does not affect the norm estimates. Therefore, without loss of generality, we can choose $y = y_r = (0, r, 0, 0)$. Moreover, whenever $r_1 < r_2$, we have $\{v_{y_{r_2}\leftarrow x}\}_{x\in\Omega_\varepsilon} \subset \{v_{y_{r_1}\leftarrow x}\}_{x\in\Omega_\varepsilon}$. Any stability estimate for $y_{r_2}$ is then also valid for $y_{r_1}$, since we are taking the supremum over a larger set. Hence, it suffices to treat the case $r = 1$, as claimed.
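To apply Lemma 2.11, we need a basis of $T_y\mathbb{R}^{1+3}$. With $y = (0, 1, 0, 0)$, consider
$$b_1 = \tfrac{1}{\sqrt2}(1, 1, 0, 0), \quad b_2 = \tfrac{1}{\sqrt2}\Big(1, \tfrac{1}{\sqrt{1+\varepsilon^2}}, \tfrac{\varepsilon}{\sqrt{1+\varepsilon^2}}, 0\Big), \quad b_3 = \tfrac{1}{\sqrt2}\Big(1, \tfrac{1}{\sqrt{1+\varepsilon^2}}, 0, \tfrac{\varepsilon}{\sqrt{1+\varepsilon^2}}\Big), \quad b_4 = \tfrac{1}{\sqrt2}(-1, 1, 0, 0).$$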
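The degeneration of this basis as $\varepsilon \to 0$ is easy to observe numerically. The sketch below assumes NumPy and takes $y = (0, 1, 0, 0)$. It builds $B$ from $b_i = v_{y\leftarrow x_i}$ for $i = 1, 2, 3$, with $x_1 = (-1, 0, 0, 0)$, $x_2 = (-\sqrt{1+\varepsilon^2}, 0, -\varepsilon, 0)$, $x_3 = (-\sqrt{1+\varepsilon^2}, 0, 0, -\varepsilon)$, and $b_4 = v_{y\leftarrow z_y}$ with $z_y = (1, 0, 0, 0)$. It then prints $\|B^{-1}\|$ and $\|B^{-1}\|_F$; it does not reproduce the closed-form expression obtained with Mathematica. Both norms grow roughly like $1/\varepsilon$, because $b_2 - b_1$ and $b_3 - b_1$ are of order $\varepsilon$.

```python
import numpy as np

def basis_matrix(eps):
    """Columns b_1, b_2, b_3 = v_{y<-x_i} and b_4 = v_{y<-z_y} at y = (0, 1, 0, 0)."""
    y = np.array([0.0, 1.0, 0.0, 0.0])
    s = np.sqrt(1.0 + eps ** 2)
    sources = [np.array([-1.0, 0.0, 0.0, 0.0]),     # x_1
               np.array([-s, 0.0, -eps, 0.0]),      # x_2
               np.array([-s, 0.0, 0.0, -eps]),      # x_3
               np.array([1.0, 0.0, 0.0, 0.0])]      # z_y, the point of O reached from y
    # v_{y<-q} is the unit vector pointing from q to y, based at y.
    return np.column_stack([(y - q) / np.linalg.norm(y - q) for q in sources])

for eps in [0.1, 0.01, 0.001]:
    B_inv = np.linalg.inv(basis_matrix(eps))
    print(eps, np.linalg.norm(B_inv, 2), np.linalg.norm(B_inv, "fro"))
```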
It is clear that the $b_i$ are linearly independent and hence form a basis of the tangent space at $y$. The vector $b_4$ is $v_{y\leftarrow z_y}$, while the other vectors $b_i$ correspond to $v_{y\leftarrow x_i}$ with $x_1 = (-1, 0, 0, 0)$, $x_2 = (-\sqrt{1+\varepsilon^2}, 0, -\varepsilon, 0)$ and $x_3 = (-\sqrt{1+\varepsilon^2}, 0, 0, -\varepsilon)$. Notice that $(x_i, y) \in L$ and that $x_i \in \Omega_\varepsilon$ for $i = 1, 2, 3$. A quick computation with Mathematica yields an explicit expression for $\|B^{-1}\|_F$, which blows up as $\varepsilon \to 0$. The operator norm is bounded by the Frobenius norm, $\|B^{-1}\| \le \|B^{-1}\|_F$, and so Lemma 2.11 applies with this explicit bound. In the resulting estimate, the term involving $b_4$ drops out, since $(\omega - d_A p)(b_4) = 0$ by Lemma 2.9. The values at $b_1, b_2, b_3$ are controlled by the supremum over $\{v_{y\leftarrow x}\}_{x\in\Omega_\varepsilon}$, since $\{b_1, b_2, b_3\}$ lies in the closure of that set.
2.6. Proof of Theorems 1.4 and 2.2. We finally have everything we need to prove Theorem 2.2. The main idea is to use the pseudolinearisation identity to relate the scattering data to an attenuated X-ray transform of $A - B$, and then to use the operator $\partial_{y\leftarrow x}$ to evaluate $A - B - d_{E(A,B)}p$ on $T_y\mathbb{R}^{1+3}$. Theorem 1.4 then follows immediately by integrating over $D \setminus \Omega$.
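Proof of Theorem 2.2. By Lemma 2.8, we have $I^{E(A,B)}_{z_y\leftarrow y\leftarrow x}(A - B) = [S^A_{z_y\leftarrow y\leftarrow x}]^{-1}S^B_{z_y\leftarrow y\leftarrow x} - \mathrm{Id}$. Since $\partial_{y\leftarrow x}\mathrm{Id} = \partial_{y\leftarrow x}p(x) = 0$, it now follows from Lemma 2.5, through (16) with $A$ replaced by $E(A, B)$ and $\omega$ by $A - B$, that
$$P^{E(A,B)}_{x\leftarrow y}(A - B - d_{E(A,B)}p)_y(v_{y\leftarrow x}) = \partial_{y\leftarrow x}\big([S^A_{z_y\leftarrow y\leftarrow x}]^{-1}S^B_{z_y\leftarrow y\leftarrow x}\big). \tag{17}$$
The parallel transports are in $U(n)$, and so, by Lemma 2.7,
$$|(A - B - d_{E(A,B)}p)_y(v_{y\leftarrow x})| = \big|\partial_{y\leftarrow x}\big([S^A_{z_y\leftarrow y\leftarrow x}]^{-1}S^B_{z_y\leftarrow y\leftarrow x}\big)\big|.$$
We can finally apply Lemma 2.10 to bound $\|(A - B - d_{E(A,B)}p)_y\|$ by the supremum of the right-hand side over $x \in (F^X_\varepsilon)_y$, which gives (7). To pass to Theorem 1.4, we integrate over $y \in D \setminus \Omega$. Combining the behaviour of the norms involved as $\varepsilon$ goes to 0 with the fact that $\mathrm{vol}((F^X_\varepsilon)_y)$ is proportional to $\varepsilon^3$, we can find a constant $C > 0$ independent of $\varepsilon$ such that the estimate of Theorem 1.4 holds.
Remark 2.12. Note that even though there is a supremum on the right-hand side of (7), one does not need to know the scattering data for all $x \in \Omega_\varepsilon$ in the past light cone of $y$ to get an estimate. Indeed, the important equation is (17), as it reveals the linear structure behind the estimate. In practice, one only needs to evaluate $A - B - d_{E(A,B)}p$ at three vectors $v_{y\leftarrow x}$ which, together with $v_{y\leftarrow z_y}$, form a basis, since we already know that it vanishes when evaluated at $v_{y\leftarrow z_y}$. Lemma 2.11 then yields an estimate for those vectors.
2.7. $H^1$ estimate. It remains to show Corollary 1.7, which relates $S^A$ and $S^B$ in a linear fashion rather than through the group multiplication in $U(n)$. To do so, we follow the argument in [MNP21, Corollary 2.3].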
Lemma 2.13. There is a constant $C > 0$ such that
Proof. To simplify notation, we omit the paths in what follows and write $S^A$ for $S^A_{z_y\leftarrow y\leftarrow x}$. We can expand The third equality follows from the second by using that $S^A \in U(n)$, as well as adding and subtracting $\partial_{y\leftarrow x}S^A$. Taking the supremum over the fibres $F^X_y$, squaring, integrating and using that $(a + b)^2 \le 2(a^2 + b^2)$ yields It remains to estimate $\|\partial S^A\|_{L^\infty(F^X)}$. We have not shown it yet, but the proof of Theorem 1.6 reveals that and so $\|\partial S^A\|_{L^\infty(F^X)} \le \|A^{P^A_{y\leftarrow z_y}}\|_{L^\infty(D\setminus\Omega)}$. The estimate follows by taking square roots and using that
The last estimate is again invariant under $\mathcal{G}$ and involves the $L^\infty$-norm of the light-sink connection $A^{P^A_{y\leftarrow z_y}}$. We can get an estimate on that norm involving the curvature of $A$ and the value of $A$ along $O$.
Lemma 2.14. There is a constant $C$ such that

Corollary 1.7 then follows directly from Theorem 1.4, Theorem 1.6, Lemma 2.13 and Lemma 2.14. However, we need another lemma before proving Lemma 2.14.
where $\partial_s\gamma_s(t) = \frac{d}{ds}\gamma_s(t)\big|_{s=0}$ and $P^A_{\gamma[0,t]}$ is the parallel transport along the segment of $\gamma$ restricted to the interval $[0, t]$.
The vector fields $\dot\gamma(r)$ and $\partial_s\gamma_s(r)$ commute, being coordinate vector fields of the variation, so the term with the commutator vanishes. We can isolate $-\partial_s A(\gamma_s(r))$ in that expression to get

We can integrate the last term by parts, using that $A(\gamma(r))P^A_{\gamma[0,r]} = -\partial_r P^A_{\gamma[0,r]}$. This yields

The boundary term corresponds to the first two terms in (18), and we can expand the integrand of the second term to get

These terms cancel with the second and third terms in (19), which simplifies $v(0, T)$ to (18).
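Parallel transport, as defined in the introduction, solves $U'(t) + A(\gamma(t))U(t) = 0$. The identity $A(\gamma(r))P^A_{\gamma[0,r]} = -\partial_r P^A_{\gamma[0,r]}$ used in the integration by parts above is exactly this ODE. The Python sketch below integrates it with the classical Runge–Kutta scheme for a hypothetical skew-Hermitian matrix field along a unit-speed segment. It then checks numerically that the result is unitary and that transports compose along a subdivided segment.

```python
import numpy as np

def transport(A_of_t, T, steps=2000):
    """Parallel transport: solve U'(t) + A(t) U(t) = 0, U(0) = Id, on [0, T] with RK4.

    A_of_t(t) should return the matrix A_{gamma(t)}(gamma'(t)) along the curve.
    Returns U(T).
    """
    n = A_of_t(0.0).shape[0]
    U = np.eye(n, dtype=complex)
    h = T / steps
    for m in range(steps):
        t = m * h
        k1 = -A_of_t(t) @ U
        k2 = -A_of_t(t + h / 2) @ (U + h / 2 * k1)
        k3 = -A_of_t(t + h / 2) @ (U + h / 2 * k2)
        k4 = -A_of_t(t + h) @ (U + h * k3)
        U = U + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return U

rng = np.random.default_rng(3)

def skew_hermitian(n):
    """Random element of u(n)."""
    m = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (m - m.conj().T) / 2

A0, A1 = skew_hermitian(2), skew_hermitian(2)
A = lambda t: A0 + np.sin(t) * A1     # hypothetical u(2)-valued data along the segment

P_full = transport(A, 2.0, steps=2000)                        # transport over [0, 2]
P_first = transport(A, 1.0, steps=1000)                       # over [0, 1]
P_second = transport(lambda t: A(t + 1.0), 1.0, steps=1000)   # over [1, 2]
I = np.eye(2)
print(np.allclose(P_full, P_second @ P_first))
```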
Proof of Lemma 2.14. For $y \in D$ and unit $v \in T_yD$, we have

We can compute $dP^A_{y\leftarrow z_y}(v)$ by using Lemma 2.15 with the variation $\gamma_s$ given by the lightlike geodesic from $z_{y+sv}$ to $y + sv$. This yields

for some $-1 \leq c \leq 1$, and hence

The result follows since $z_y \in O$ for all $y$ and

2.8. Forward estimates. We finish the section by collecting forward estimates that will be useful in Section 4.
Lemma 2.16. Let $A$ and $B$ be Hermitian connections. Then

where $SD = \{(x, v) \in TD : |v|_e = 1\}$ is the (Euclidean) sphere bundle on $D$.
Proof. Since $S^A_{z\leftarrow y\leftarrow x}$ and the parallel transports lie in $U(n)$, Lemma 2.8 yields the pointwise estimate

where $\gamma_{y\leftarrow x} : [0, T_1] \to D$ and $\gamma_{z\leftarrow y} : [0, T_2] \to D$ are the unit-speed geodesics from $x$ to $y$ and from $y$ to $z$, respectively, and $|x - y|_e$ is the Euclidean distance between $x$ and $y$. In particular, by taking the supremum over all $(x, y, z) \in S^+(\,)$ and using that the distance between $x$ and $y$ is at most $\sqrt{2}$, we get an estimate involving a supremum over $x, z \in \,$, $y \in D$ with $(x, y), (y, z) \in L$, since the second supremum is taken over a larger set. Note that we did not restrict $y$ to lie outside $\,$ in the first supremum, since the curves $\gamma_{y\leftarrow x}$ and $\gamma_{z\leftarrow y}$ cross $\,$.
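The pointwise estimate in Lemma 2.16 rests on the fact that parallel transports depend on the connection in a Lipschitz way. Consider the simplest case of constant matrices $A(v), B(v) \in \mathfrak{u}(n)$ along a segment of length $T$. Duhamel's formula $e^{-TA} - e^{-TB} = \int_0^T e^{-(T-s)A}(B - A)e^{-sB}\,ds$ then gives $\|P^A - P^B\| \leq T\|A(v) - B(v)\|$, because the exponentials are unitary. The Python sketch checks this bound numerically for random hypothetical matrices. It is an illustration, not the lemma's proof.

```python
import numpy as np

def expm_skew_hermitian(X):
    """exp(X) for skew-Hermitian X, via the eigendecomposition of the Hermitian matrix -iX."""
    lam, V = np.linalg.eigh(-1j * X)
    return (V * np.exp(1j * lam)) @ V.conj().T

rng = np.random.default_rng(4)

def skew_hermitian(n):
    """Random element of u(n)."""
    m = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (m - m.conj().T) / 2

T = 1.3                                            # length of the segment
A_v, B_v = skew_hermitian(3), skew_hermitian(3)    # hypothetical constant values A(v), B(v)
P_A = expm_skew_hermitian(-T * A_v)                # solves U' + A(v) U = 0 up to time T
P_B = expm_skew_hermitian(-T * B_v)

gap = np.linalg.norm(P_A - P_B, 2)
bound = T * np.linalg.norm(A_v - B_v, 2)
print(gap <= bound)
```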
Lemma 2.17. Let $A$ be a Hermitian connection and $\omega$ a one-form on $D$. There is a constant $C > 0$ such that

be the usual ray transform on $D \times SD$ given by

for $\beta \in C^\infty(D \times SD)$. It is well known that $I$ is continuous from $H^k(D \times SD)$ to $H^k(D \times D)$ for all $k \geq 0$ [Sha94, Theorem 4.2.1]. Consider the natural projections in the following diagram.
These projections induce pullbacks on functions.
We can view $P^A_{x\leftarrow\gamma(t)}$ as a $U(n)$-valued function on $D \times D$ and $\omega$ as a $\mathbb{C}^{n\times n}$-valued function on $SD$. Therefore, we can rewrite $I^A_{y\leftarrow x}(\omega)$ as $I^A_{y\leftarrow x}(\omega) = I\big((\pi_2^*P^A)(\pi_1^*\omega)\big)(x, y)$. The result follows by continuity of $I$ and of the pullbacks.
Lemma 2.18. Let $A$ and $B$ be Hermitian connections. There is a constant $C > 0$ such that

Proof. By the pseudolinearisation identity and the definition of the broken attenuated X-ray transform, we have

The first term can be bounded by the same $L^2$-norm on $D \times D$, and is hence bounded by a multiple of $\|A - B\|_{L^2(SD)}$ by Lemma 2.17 with $k = 0$. For the second term, we have

where $T$ is the length of the segment $\gamma_{z_y\leftarrow y}$. We can rewrite the integral in the previous display as
$$\int_{y\in D}\int_{v\in S_yD} |(A - B)_y(v)|^2\, g(y, v)\, dv\, dy$$
for some bounded nonnegative function $g : SD \to \mathbb{R}$. To see this, pick a point $y^* \in D$ and a direction $v \in S_{y^*}D$. Then, given $y \in D \setminus \,$, the segment $\gamma_{z_y\leftarrow y}$ passes through $y^*$ with velocity $v$ at some (necessarily unique) time $t$ precisely when $y^*$, $y$ and $z_y$ lie on the line through $y^*$ spanned by $v$, with $y^*$ between $y$ and $z_y$. Since $y$ and $y^*$ must lie on a bounded segment of that line and $D$ is bounded, it follows that $g$ is also bounded. The result readily follows by bounding $g$.
Lemma 2.19. Let $A$ be a Hermitian connection. For every $k \geq 0$, there is a constant $c_k$ such that

We follow the inductive approach laid out in [Boh21]. Let $\Gamma$ be the diagonal of $D \times D$, that is, $\Gamma = \{(x, x) : x \in D\}$. Let $\partial = \partial_{y\leftarrow x}$ be the vector field on $(D \times D) \setminus \Gamma$ defined via (3). Then $P^A_{y\leftarrow x}$ can be characterised as the unique smooth function $U^A$ on $D \times D$ such that

and $U^A|_\Gamma = \mathrm{Id}$.
Note that if $G : D \times D \to \mathbb{C}^{n\times n}$ solves

for some $F : D \times D \to \mathbb{C}^{n\times n}$ with $G|_\Gamma = G_0$, then it follows from Lemma 2.3 that

where $\gamma : [0, T] \to D$ is the line segment from $x$ to $y$ parametrised by arc length. Hence, it holds that

By continuity, since $P^A_{y\leftarrow x}$ is smooth, its $C^k(D \times D)$-norm agrees with its $C^k((D \times D) \setminus \Gamma)$-norm. Let $\{\partial, L_1, \ldots, L_7\}$ be a global commuting frame on $(D \times D) \setminus \Gamma$ and write $L^\alpha = L_1^{\alpha_1} \cdots L_7^{\alpha_7}$ for multi-indices $\alpha \in \mathbb{Z}_{\geq 0}^7$. Such a frame exists since we can choose global coordinates on $(D \times D) \setminus \Gamma$ by first prescribing the usual coordinates for $x$ in $\mathbb{R}^4$ and then choosing polar coordinates based at $x$ to describe $y$. The vector field $\partial$ then corresponds to the coordinate vector field associated with the radial coordinate of $y$ with respect to $x$. We claim that

for all $k \geq 0$. For $k = 0$, this holds trivially as $P^A_{y\leftarrow x}$ takes values in $U(n)$. Suppose now that (21) holds for some $k - 1 \geq 0$. Take $j$ and $\alpha$ such that $j + |\alpha| = k$. Then, if we let

Any vector field $V$ on $D \times D$ can be decomposed as $V = V_1 + V_2$, where $V_1$ acts on the first coordinate and $V_2$ on the second, with corresponding vectors $v_1$ and $v_2$ in $TD$ based at $x$ and $y$, respectively. Lemma 2.15 gives

is a differential operator of order $k - 1$ whose coefficients are derivatives of $A$ and can be bounded by $\|A\|_{C^k(SD)}$. Therefore, by the induction hypothesis, we have

Absorbing the term for $G|_\Gamma$ in (22) and taking the supremum over all $j$ and $\alpha$ such that $j + |\alpha| = k$, we see that (21) holds. Hence,

Gauge invariance
3.1. Gauge invariance. We now study the quantity $A - B - d_{E(A,B)}p$ in (7) in order to prove Theorem 1.6. Recall the definition of $p^A_\omega$ in (14). The $p$ used in Theorem 2.2 is actually p

Let us now prove (ii). Let $u$ be as in the proof of (i) above and consider $\varphi^{-1}u$. We first claim that $\varphi^{-1}u$ solves

Now notice that
To make notation less cumbersome, we write $p$ for $p_{(A,B)}$ and $q$ for $p_{(A^\varphi,B)}$. We can now apply Lemma 2.3 again to get

that is, $q = \varphi^{-1}p - \varphi^{-1} + \mathrm{Id}$. We can now compute

and therefore

as claimed. Finally, to prove (iii), it suffices to combine (i) and (ii) to get

where we used that $\psi$ takes values in $U(n)$ and therefore $\psi^{-1} = \psi^*$.
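The gauge computations in this section rest on the transformation law $P^{A^\varphi}_{y\leftarrow z_y} = \varphi^{-1}(y)P^A_{y\leftarrow z_y}\varphi(z_y)$ from Proposition 1.1. The Python sketch below checks this law numerically along a single curve. It assumes the convention $A^\varphi = \varphi^{-1}d\varphi + \varphi^{-1}A\varphi$, which is the one consistent with that law for transports solving $U' + AU = 0$. The connection and the gauge $\varphi(t) = e^{tH}$ are hypothetical. Under the same convention, choosing $\varphi$ to be the transport itself gives $\varphi' = -A\varphi$, so $A^\varphi$ vanishes along the curve. This is the mechanism behind the choice $\varphi = P^A_{y\leftarrow z_y}$ in the proof of Theorem 1.6.

```python
import numpy as np

def expm_skew_hermitian(X):
    """exp(X) for skew-Hermitian X via the eigendecomposition of -iX."""
    lam, V = np.linalg.eigh(-1j * X)
    return (V * np.exp(1j * lam)) @ V.conj().T

def transport(A_of_t, T, steps=2000):
    """RK4 solution of U' + A(t) U = 0, U(0) = Id, evaluated at t = T."""
    n = A_of_t(0.0).shape[0]
    U = np.eye(n, dtype=complex)
    h = T / steps
    for m in range(steps):
        t = m * h
        k1 = -A_of_t(t) @ U
        k2 = -A_of_t(t + h / 2) @ (U + h / 2 * k1)
        k3 = -A_of_t(t + h / 2) @ (U + h / 2 * k2)
        k4 = -A_of_t(t + h) @ (U + h * k3)
        U = U + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return U

rng = np.random.default_rng(5)

def skew_hermitian(n):
    """Random element of u(n)."""
    m = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (m - m.conj().T) / 2

A0, A1, H = skew_hermitian(2), skew_hermitian(2), skew_hermitian(2)
A = lambda t: A0 + t * A1                        # hypothetical connection along the curve
phi = lambda t: expm_skew_hermitian(t * H)       # hypothetical U(2)-valued gauge, phi(0) = Id

# Assumed convention A^phi = phi^{-1} phi' + phi^{-1} A phi; here phi^{-1} phi' = H
# and phi^{-1} = phi^* because phi is unitary.
A_phi = lambda t: H + phi(t).conj().T @ A(t) @ phi(t)

T = 1.0
lhs = transport(A_phi, T)
rhs = phi(T).conj().T @ transport(A, T) @ phi(0.0)
print(np.allclose(lhs, rhs, atol=1e-8))
```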
3.2. Proof of Theorem 1.6. We are now ready to prove Theorem 1.6. The proof mostly amounts to applying Lemma 3.1 with the right choice of matrix fields in $\mathcal{H}$.
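Proof of Theorem 1.6. Recall that the gauge takes values in $U(n)$, and therefore, if $\varphi, \psi \in \mathcal{G}$, Lemma 3.1 yields

This shows that the estimate in Theorem 2.2 is gauge invariant. Therefore, we can choose a convenient gauge in which to compute $\Delta(A, B)$. We take $\varphi = P^A_{y\leftarrow z_y}$ and $\psi = P^B_{y\leftarrow z_y}$. Using that d E

We can expand this last expression with the definition of $d_{E(A,B)}$ and Lemma 2.7. This yields
$$\Delta(A, B) = d\big(P^A_{y\leftarrow z_y}P^B_{z_y\leftarrow y}\big) + A\,P^A_{y\leftarrow z_y}P^B_{z_y\leftarrow y} - P^A_{y\leftarrow z_y}P^B_{z_y\leftarrow y}\,B. \tag{23}$$
We have $P^{A^\varphi}_{y\leftarrow z_y} = \varphi^{-1}(y)P^A_{y\leftarrow z_y}\varphi(z_y)$ from Proposition 1.1, and since $\varphi|_O = \mathrm{Id}$, $\varphi(z_y) = \mathrm{Id}$. We chose $\varphi = P^A_{y\leftarrow z_y}$, and so $P^{A^\varphi}_{y\leftarrow z_y} = \mathrm{Id}$. Similarly, we have $P^{B^\psi}_{z_y\leftarrow y} = \mathrm{Id}$. Therefore, substituting $A^\varphi$ for $A$ and $B^\psi$ for $B$ in (23) gives

for all $y \in D$.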
Here, $r$ is the outward radial component in space, that is, $r^2 = x_1^2 + x_2^2 + x_3^2$. Changing to polar coordinates in the space variables, we can therefore write any light-sink connection as

for some matrix fields $A_0$, $A_\vartheta$, $A_\phi$ with values in $\mathfrak{u}(n)$.
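The change to polar coordinates in the space variables acts pointwise and linearly on the components of the one-form. The Python sketch below assumes the convention $x_1 = r\sin\vartheta\cos\phi$, $x_2 = r\sin\vartheta\sin\phi$, $x_3 = r\cos\vartheta$, since the section does not fix one. It converts $\mathfrak{u}(n)$-valued components $(A_1, A_2, A_3)$ at a point into coordinate components $(A_r, A_\vartheta, A_\phi)$ and back; the $dt$ component is unaffected. It does not reproduce the explicit light-sink form. Because the Jacobian is real, the new components remain skew-Hermitian.

```python
import numpy as np

def spherical_jacobian(r, theta, phi):
    """J[i, a] = d x_i / d q_a for q = (r, theta, phi), assuming
    x1 = r sin(theta) cos(phi), x2 = r sin(theta) sin(phi), x3 = r cos(theta)."""
    st, ct, sp, cp = np.sin(theta), np.cos(theta), np.sin(phi), np.cos(phi)
    return np.array([
        [st * cp, r * ct * cp, -r * st * sp],
        [st * sp, r * ct * sp,  r * st * cp],
        [ct,      -r * st,      0.0],
    ])

def to_spherical(A_cart, r, theta, phi):
    """Map spatial components (A_1, A_2, A_3), shape (3, n, n), to (A_r, A_theta, A_phi):
    A_q = sum_i A_i dx_i/dq."""
    J = spherical_jacobian(r, theta, phi)
    return np.einsum("ia,ijk->ajk", J, A_cart)

def to_cartesian(A_sph, r, theta, phi):
    """Inverse change of coordinates (valid away from r = 0 and sin(theta) = 0)."""
    J_inv = np.linalg.inv(spherical_jacobian(r, theta, phi))
    return np.einsum("ai,ajk->ijk", J_inv, A_sph)

rng = np.random.default_rng(7)
m = rng.standard_normal((3, 2, 2)) + 1j * rng.standard_normal((3, 2, 2))
A_cart = (m - np.conj(np.transpose(m, (0, 2, 1)))) / 2   # three u(2)-valued components
A_sph = to_spherical(A_cart, 0.7, 1.1, 0.4)
A_back = to_cartesian(A_sph, 0.7, 1.1, 0.4)
```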

Statistical application
We show that, when restricting ourselves to light-sink connections, one can use Bayesian inversion to consistently recover a connection from its scattering data. To do so, we follow the approach first laid out in [MNP21]. Specifically, we will use Theorem 5.1 in [BN21], as it only requires checking a short list of conditions.
We will only consider light-sink connections, since the problem then becomes injective. It would then be natural to consider only future-determined paths for the scattering data, but in doing so, the endpoints of our paths would never lie in \ ( X ∪ O). Therefore, we also need to consider past-determined paths.
To simplify the statistics slightly, we will only consider connections with values in $\mathfrak{so}(n)$ instead of $\mathfrak{u}(n)$. We do not lose any generality in doing so, but we no longer have to deal with complex noise.

4.1. Setting. We consider the following experimental setup, similar to the one described in Section 1.4. Let $\lambda$ be the uniform distribution on $S^+(\,)$ induced by the Lebesgue measure on $D$, and consider i.i.d. random variables $(X_i, Y_i, Z_i) \sim \lambda$, $i = 1, \ldots, N$, corresponding to random draws from $S^+(\,)$. For a light-sink Hermitian connection $A \in \Omega^1(D, \mathfrak{so}(n))$, we denote its future-determined scattering data $S^A_{z_y\leftarrow y\leftarrow x}$ by $S^A_+(x, y)$ and its past-determined scattering data $S^A_{z\leftarrow y\leftarrow x_y}$ by $S^A_-(y, z)$. Suppose that we observe noisy versions of the scattering data corresponding to both types of paths according to our random draws, that is, we observe

The matrices $E^\pm_i$ correspond to independent Gaussian noise in the sense that $E^\pm_i = (\varepsilon^\pm_{i,j,k})_{1\leq j,k\leq n}$, where all the $\varepsilon^\pm_{i,j,k}$ are i.i.d. $N(0, 1)$ and independent of the other random variables. We denote by $P^N_A$ the joint law of the random variables $(S_i, (X_i, Y_i, Z_i))_{i=1}^N$. In order to estimate the connection $A$ from $D_N$, we need to choose a prior $\Pi$ on the space of $\mathfrak{so}(n)$-valued light-sink connections. Any such connection can be represented by three skew-symmetric matrix fields since

and therefore by $d_n := 3\dim\mathfrak{so}(n) = 3n(n-1)/2$ continuous functions on $D$. Following [BN21], we choose the prior $\Pi$ by prescribing an orthonormal basis of $L^2(D, \mathbb{R})$ as well as a sequence of positive scalars. For conciseness, we choose as basis the normalised eigenfunctions $(e_j)_{j\in\mathbb{N}}$ of the Laplacian with Neumann boundary conditions, and choose their eigenvalues $(\lambda_j)_{j\in\mathbb{N}}$ as scalars. It follows from classical $L^\infty$ estimates for eigenfunctions [Hör68] and Weyl's law [Hör09] that we can choose $\tau = 3/4$ and $d = 4$ in Condition 3.1 of [BN21]. This choice gives rise to Sobolev-type spaces

which play the role of our parameter space $\Theta$.
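The observation model of Section 4.1 adds an $n \times n$ matrix of i.i.d. $N(0, 1)$ entries to each value of the scattering data. The Python sketch below simulates that model with a hypothetical forward map: a one-parameter family of rotations in $SO(2)$ stands in for $S^A_\pm$. Uniform draws of an angle stand in for $(X_i, Y_i, Z_i) \sim \lambda$, since the geometry of $S^+(\,)$ is not reproduced here. It also computes $d_n = 3n(n-1)/2$.

```python
import numpy as np

def d_n(n):
    """Number of real functions describing an so(n)-valued light-sink connection."""
    return 3 * n * (n - 1) // 2

def observe(forward, draws, rng):
    """Return S_i = forward(draw_i) + E_i, where E_i has i.i.d. N(0, 1) entries."""
    clean = np.array([forward(p) for p in draws])
    return clean + rng.standard_normal(clean.shape)

def toy_forward(theta):
    """Hypothetical stand-in for the scattering data: a rotation in SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(6)
N = 500
draws = rng.uniform(0.0, 2.0 * np.pi, size=N)   # placeholder for (X_i, Y_i, Z_i) ~ lambda
S = observe(toy_forward, draws, rng)
noise = S - np.array([toy_forward(t) for t in draws])
```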
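The eigenfunctions $(e_j)$ naturally induce the basis $\{e_{j,i} : 1 \leq i \leq d_n,\ j \in \mathbb{N}\}$ of $H^s(D, \mathfrak{so}(n)^3)$, where $e_{j,i} = (\delta_{i,1}e_j, \ldots, \delta_{i,d_n}e_j)$ and
$$\delta_{i,j} = \begin{cases} 1 & i = j, \\ 0 & i \neq j. \end{cases}$$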
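For $D$ an integer multiple of $d_n$, let $E_D$ be the span of the first $D$ vectors of the basis, that is,

For $\alpha > 0$, we take as prior on $E_D$
$$A = \sum_{i \leq d_n}\sum_{j \leq D/d_n} \lambda_j^{-\alpha/2}\, g_{j,i}\, e_{j,i}, \qquad g_{j,i} \overset{\text{i.i.d.}}{\sim} N(0, 1). \tag{26}$$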
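Sampling from the truncated prior (26) amounts to drawing one independent Gaussian coefficient per basis vector $e_{j,i}$, with variance $\lambda_j^{-\alpha}$. In the Python sketch below, the eigenvalues are hypothetical: $\lambda_j = j^{1/2}$ mimics Weyl-type growth $\lambda_j \sim j^{2/d}$ with $d = 4$, indexed from $j = 1$ so that all scalars are positive. The Neumann eigenvalues of the actual domain are not computed. The second draw checks the variance scaling empirically.

```python
import numpy as np

def sample_prior_coefficients(eigvals, d_n, alpha, rng):
    """Draw c[j, i] = eigvals[j] ** (-alpha / 2) * g[j, i] with g[j, i] i.i.d. N(0, 1).

    eigvals: the D / d_n positive scalars lambda_j used in (26).
    Row j, column i is the coefficient of the basis vector e_{j, i}.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    g = rng.standard_normal((eigvals.size, d_n))
    return eigvals[:, None] ** (-alpha / 2) * g

rng = np.random.default_rng(8)
alpha = 5.0
eigvals = np.arange(1, 11) ** 0.5          # hypothetical eigenvalues, Weyl-type growth
coeffs = sample_prior_coefficients(eigvals, d_n=9, alpha=alpha, rng=rng)   # D = 90

# Empirical check of the variance lambda_j^(-alpha), using many independent columns.
wide = sample_prior_coefficients([1.0, 4.0], d_n=20000, alpha=alpha, rng=rng)
var = wide.var(axis=1)
```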
Theorem 4.2 can also be proved for $D \to \infty$, giving rise to a commonly used Matérn prior of order $\alpha$ for the Laplacian; see, e.g., [GvdV17, Chapter 11]. For simplicity, we take a truncated prior, as it reflects what happens in practice. We denote the law of $A$ by $\Pi$ and its density by $\pi$. Through Bayes' rule, the choice of prior gives rise to the posterior distribution

, $O \subseteq E_D$ Borel, with log-likelihood given by

$H^\alpha$ up to some additive constant, where $\delta_N = N^{-\alpha/(2\alpha+4)}$. See [BN21] for more details.
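Because the noise entries are i.i.d. $N(0, 1)$, the log-likelihood of the observations is, up to an additive constant, $-\tfrac12\sum_i \|S_i - S^A(X_i, Y_i, Z_i)\|_F^2$. The prior (26) makes each coefficient $c_{j,i}$ Gaussian with variance $\lambda_j^{-\alpha}$. The Python sketch below evaluates the unnormalised log-posterior given by Bayes' rule for the prior exactly as written in (26). It is an assumption that this matches the paper's setup: the posterior used in [BN21] may involve an $N$-dependent rescaling of the prior (the text mentions $\delta_N$), which this sketch omits. The predicted data `S_pred` stands for the forward map evaluated at the draws and is not computed here.

```python
import numpy as np

def log_likelihood(S_obs, S_pred):
    """Gaussian log-likelihood, up to an additive constant, for S_i = S_pred_i + E_i
    with E_i having i.i.d. N(0, 1) entries; arrays of shape (N, n, n)."""
    return -0.5 * np.sum((np.asarray(S_obs) - np.asarray(S_pred)) ** 2)

def log_prior(coeffs, eigvals, alpha):
    """Log-density of the truncated Gaussian prior (26), up to an additive constant:
    c[j, i] ~ N(0, lambda_j^(-alpha)), so log pi = -1/2 sum_j,i lambda_j^alpha c[j, i]^2."""
    eigvals = np.asarray(eigvals, dtype=float)
    return -0.5 * np.sum(eigvals[:, None] ** alpha * np.asarray(coeffs) ** 2)

def log_posterior_unnormalised(S_obs, S_pred, coeffs, eigvals, alpha):
    """Bayes' rule: log posterior = log-likelihood + log prior + constant."""
    return log_likelihood(S_obs, S_pred) + log_prior(coeffs, eigvals, alpha)
```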

4.2. Statistical guarantees for light-sink connections. To apply [BN21, Theorem 5.1], it remains to show that the map $S : A \mapsto (S^A_+, S^A_-)$ satisfies their Condition 3.2, which consists of three parts. The first part, uniform boundedness, is immediately satisfied since $S^A$ takes values in $U(n)$. The second part consists of global Lipschitz estimates for the forward map, which follow from Lemma 2.16 in the $L^\infty$ case and from Lemma 2.18 in the $L^2$ case. Hence, it only remains to show that the last part of their condition holds, which they call the inverse continuity modulus. That is the content of the next lemma.

Proof. First, note that we can find $C_\varepsilon > 0$ depending on $\varepsilon$ such that $\|S(A) - S(B)\|^2_{L^2(S^+(\,))} \geq C_\varepsilon \|S^A_+ - S^B_+$

Assume that $\|A\|_{H^\alpha(SD)} + \|B\|_{H^\alpha(SD)} \leq M$ for some sufficiently large $\alpha > k$. Then we can bound the $C^k$-norms of $A$ and $B$ via Sobolev embedding inequalities. Note that $\alpha = k + 3$ suffices. This in turn allows us to bound the $C^k$-norm of $E(A, B)$, and so

It follows from Theorem 1.4 for light-sink connections ($p = 0$ by Theorem 1.6) that $\|A - B\|_{L^2(D\setminus\,)} \lesssim_{\varepsilon,k,M} \|S^A_+ - S^B_+\|^{1-1/k}_{L^2(F^X)}$. Similar inequalities also hold on X and Z by Theorem 1.3. Hence, we can choose $\gamma = \frac{\alpha-4}{\alpha-3}$ for $\alpha \geq 5$ by taking $\alpha = k + 3$. We can finally apply Theorem 5.1 in [BN21] to get the following estimate on the concentration of the posterior distribution around the true parameter $A$, obtained through noisy samples of $S^A$ as the number of samples goes to infinity.
Theorem 4.2. Let the posterior distribution $\Pi(\cdot \mid (S_i, (X_i, Y_i, Z_i))_{i=1}^N)$ arise from the prior (26) with $\alpha \geq 5$ and data $(S_i, (X_i, Y_i, Z_i))_{i=1}^N \sim P^N_A$ as in (25). Suppose that $A \in H^\alpha$ and $D \simeq N^{2/(\alpha+2)}$. Let $\gamma = \frac{\alpha-4}{\alpha-3}$. Then there is $M > 0$ such that

In short, the posterior distribution concentrates around $A$ in $P^N_A$-probability at a rate that depends on the smoothness of the prior and of $A$. The smoother $A$ is, the smoother we can choose the prior, and the faster the posterior distribution concentrates. Moreover, by the same arguments used at the end of [MNP21] to complete the proof of their Theorem 3.2, one can expect the rate of Theorem 4.2 to carry over to the posterior mean, that is,

as $N \to \infty$. Note that Theorem 4.2 gives convergence to $A$ itself, and not only to its projection onto $E_D$ as in the statement of Theorem 5.1 in [BN21]. This is because the estimate in Lemma 4.1 holds for all $A, B$ in the whole parameter space $H^\alpha(D, \mathfrak{so}(n)^3)$, and not just in $E_D$. Indeed, Remark 5.2 in [BN21] guarantees that we can then replace the projection by $A$.
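The exponent $\gamma$ comes from the stability estimate $\|A - B\|_{L^2} \lesssim \|S^A_+ - S^B_+\|^{1-1/k}$ with $k = \alpha - 3$, so $\gamma = 1 - \frac{1}{\alpha-3} = \frac{\alpha-4}{\alpha-3}$. The requirement $\alpha \geq 5$ is then the same as $k \geq 2$, that is, $\gamma \geq \tfrac12$. Here $\alpha = k + 3$ is enough for $H^\alpha \hookrightarrow C^k$ on the four-dimensional domain $D$, assuming the norms are those of the coefficient fields on $D$. If the forward map is estimated to within $\delta_N$, such a modulus turns this into a bound of order $\delta_N^\gamma$ for the connection. The exact display of Theorem 4.2 is not reproduced here. The Python sketch computes $\delta_N$, $\gamma$, a truncation level $D$ of order $N^{2/(\alpha+2)}$ rounded to a multiple of $d_n$ (the rounding rule is a hypothetical choice), and $\delta_N^\gamma$.

```python
import math

def theorem_4_2_quantities(N, alpha, n):
    """Return (delta_N, gamma, D, delta_N ** gamma) for sample size N and prior smoothness alpha."""
    if alpha < 5:
        raise ValueError("Theorem 4.2 requires alpha >= 5")
    delta_N = N ** (-alpha / (2 * alpha + 4))
    gamma = (alpha - 4) / (alpha - 3)
    dn = 3 * n * (n - 1) // 2
    # D is a multiple of d_n of order N^(2 / (alpha + 2)); the rounding rule is hypothetical.
    D = dn * max(1, round(N ** (2 / (alpha + 2)) / dn))
    return delta_N, gamma, D, delta_N ** gamma

delta_N, gamma, D, rate = theorem_4_2_quantities(10**6, 6, 3)
print(delta_N, gamma, D, rate)
```

For example, with $N = 10^6$, $\alpha = 6$ and $n = 3$, one gets $\delta_N = 10^{-2.25}$, $\gamma = 2/3$, $D = 36$ and $\delta_N^\gamma = 10^{-1.5}$.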