Signature inversion for monotone paths

The aim of this article is to provide a simple sampling procedure to reconstruct any monotone path from its signature. For every $N$, we sample a lattice path of $N$ steps with weights given by the coefficient of the corresponding word in the signature. We show that these weights on lattice paths satisfy a large deviations principle. In particular, this implies that the probability of picking a "wrong" path is exponentially small in $N$. The argument relies on a probabilistic interpretation of the signature for monotone paths.


The signature of a path
A path $\gamma$ is a continuous map from a fixed interval $J$ into a normed space $(V, \|\cdot\|_V)$. The length of $\gamma$ is defined by
\[ \|\gamma\| := \sup_{D(J)} \sum_{j} \|\gamma(u_{j+1}) - \gamma(u_j)\|_V, \]
where the supremum is taken over all dissections $D(J) = \{u_j\}$ of the interval $J$. We say that $\gamma$ has bounded variation if $\|\gamma\| < +\infty$. In what follows, we consider $V = \mathbb{R}^d$. Note that although the choice of the norm on $\mathbb{R}^d$ affects the actual length of $\gamma$, it does not affect whether $\gamma$ has bounded variation.
Let $\{e_1, \dots, e_d\}$ denote the standard basis of $\mathbb{R}^d$. For every integer $n \ge 0$, a word of length $n$ is an ordered sequence of $n$ letters from the set $\{e_1, \dots, e_d\}$ (with repetition allowed). We use $|w|$ to denote the length of the word $w$, that is, the number of letters it contains. For two words $w_1 = e_{i_1} \cdots e_{i_n}$ and $w_2 = e_{j_1} \cdots e_{j_m}$, their concatenation $w_1 * w_2$ is the word of length $n + m$ defined by $w_1 * w_2 = e_{i_1} \cdots e_{i_n} e_{j_1} \cdots e_{j_m}$.
We use $\emptyset$ to denote the empty word, which is the unique word of length $0$. The signature of a path of bounded variation is defined as follows.
Definition 1.1. Let $\gamma : [0, 1] \to \mathbb{R}^d$ be a continuous path of bounded variation. For every integer $n \ge 0$ and every word $w = e_{i_1} \cdots e_{i_n}$, let
\[ C_\gamma(w) = \int_{0 < u_1 < \cdots < u_n < 1} \dot\gamma^{i_1}(u_1) \cdots \dot\gamma^{i_n}(u_n) \, du_1 \cdots du_n, \]
where $\gamma^i$ is the $i$-th component of $\gamma$. The signature of $\gamma$ is the formal power series
\[ \sum_{n=0}^{\infty} \sum_{|w| = n} C_\gamma(w) \, w, \]
where the second sum is taken over all words of length $n$, and we have set $C_\gamma(\emptyset) = 1$ by convention.
We call the collection $\{C_\gamma(w) : |w| = n\}$ the $n$-th level coefficients in the signature. The signature is a definite integral over the fixed interval on which $\gamma$ is defined. Changing the parametrisation or the size of the interval does not change the signature of $\gamma$. The signature contains important information about the path. For example, the collection $\{C_\gamma(w) : |w| = 1\}$ reproduces the increment of the path, and the second level coefficients $\{C_\gamma(w) : |w| = 2\}$ represent the (signed) areas enclosed by the projections of the path onto the $e_i$-$e_j$ planes.
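To make the level coefficients concrete, the following minimal sketch computes $C_\gamma(w)$ for a piecewise linear path. It assumes the standard facts that a straight segment with increment $v$ has coefficient $v_{i_1} \cdots v_{i_n} / n!$ and that coefficients of concatenated paths combine through Chen's identity. The function names and the 0-based letter indices are illustrative choices, not notation from the article.

```python
from math import factorial, isclose

def segment_coeff(v, w):
    """Coefficient of the word w (tuple of 0-based letter indices)
    for a single straight segment with increment vector v."""
    c = 1.0
    for i in w:
        c *= v[i]
    return c / factorial(len(w))

def signature_coeff(increments, w):
    """C_gamma(w) for the piecewise linear path whose consecutive
    segments have the given increments, via Chen's identity."""
    n = len(w)
    # coeffs[m] holds the coefficient of the prefix w[:m] for the path so far.
    coeffs = [1.0] + [0.0] * n
    for v in increments:
        new = [0.0] * (n + 1)
        for m in range(n + 1):
            # Split w[:m] into w[:s] (earlier path) and w[s:m] (new segment).
            new[m] = sum(coeffs[s] * segment_coeff(v, w[s:m])
                         for s in range(m + 1))
        coeffs = new
    return coeffs[n]

# Hypothetical example: one unit step along e_1 followed by one unit step along e_2.
incs = [(1.0, 0.0), (0.0, 1.0)]
level1 = [signature_coeff(incs, (i,)) for i in range(2)]   # increment of the path
area = 0.5 * (signature_coeff(incs, (0, 1)) - signature_coeff(incs, (1, 0)))
print(level1, area)
```

The first level returns the total increment of the path, and the antisymmetric part of the second level gives the signed area between the path and its chord, in line with the description above.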
It was proved by Hambly and Lyons ([HL10]) that bounded variation paths are uniquely determined by their signatures up to tree-like pieces. In [LX14], Lyons and one of the authors developed a procedure based on the use of symmetrisation that enables one to reconstruct every $C^1$ path (when at natural parametrisation) from its signature. The purpose of this article is to give a significant simplification of the reconstruction procedure in the case when $\gamma$ is monotone.

Monotone paths and the main result
From now on, we fix a monotone path $\gamma : [0, 1] \to \mathbb{R}^d$. We can assume without loss of generality that $\gamma$ is monotonically increasing, so that
\[ \dot\gamma^i(t) \ge 0 \quad \text{for all } i = 1, \dots, d \text{ and } t \in [0,1]. \]
We also equip $\mathbb{R}^d$ with the $\ell^1$ norm, so the length $L$ of a monotone path is simply the sum of all its first level coefficients in the signature. Thus, we can assume without loss of generality that $L = 1$; otherwise one can recover $L$ first and rescale the path by $L^{-1}$ so that the new path has length $1$. Finally, since the signature is invariant under re-parametrisation, we also assume $\gamma$ is at natural parametrisation so that
\[ \sum_{i=1}^{d} \dot\gamma^i(t) = 1 \quad \text{for all } t \in [0,1]. \tag{1.1} \]
Since $\gamma$ is monotonically increasing, we have $C_\gamma(w) \ge 0$ for every word $w$, and for every integer $N$, we have
\[ \sum_{|w| = N} C_\gamma(w) = \frac{L^N}{N!} = \frac{1}{N!}, \]
since we have assumed $L = 1$. This suggests that for every $N$, the quantities $\{N!\, C_\gamma(w) : |w| = N\}$ constitute a probability measure on the words of length $N$, giving each word $w$ with $|w| = N$ the "probability" $N!\, C_\gamma(w)$.

Now, to every word $w$ of length $N$, we associate a lattice path $X_N$ with step size $\frac{1}{N}$ such that $X_N$ is a monotone lattice path parametrised at unit speed which moves in exactly the directions prescribed by $w$. More precisely, if $w = e_{i_1} \cdots e_{i_N}$, then
\[ X_N = \tfrac{1}{N} \bigl( e_{i_1} * \cdots * e_{i_N} \bigr) \]
at natural parametrisation, where "$*$" denotes the concatenation of two paths, and we abuse notation by also writing $e_{i_k}$ for the one-step lattice path moving in the $e_{i_k}$ direction. For every $N \ge 0$, we assign the $N$-step paths $\{X_N : |w| = N\}$ the "probabilities" $N!\, C_\gamma(w)$. This gives us a sequence of laws on the space of lattice paths. The main result of our article is the following.
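As a sanity check on the normalisation by $N!$, it may help to work through the simplest case. The straight-line path below is an illustrative assumption and is not an example taken from the article.

```latex
% Worked example (illustrative assumption: a straight-line path).
% Let p = (p_1, ..., p_d) with p_i >= 0 and p_1 + ... + p_d = 1,
% and let gamma(t) = t p, so that gamma has l^1 length 1 at unit speed.
\[
  C_\gamma(e_{i_1} \cdots e_{i_N})
  = p_{i_1} \cdots p_{i_N} \int_{0 < u_1 < \cdots < u_N < 1} du_1 \cdots du_N
  = \frac{p_{i_1} \cdots p_{i_N}}{N!},
\]
\[
  \sum_{|w| = N} N!\, C_\gamma(w) = (p_1 + \cdots + p_d)^N = 1 .
\]
```

In this case the weights $N!\, C_\gamma(w) = p_{i_1} \cdots p_{i_N}$ are those of $N$ independent letters drawn from $p$, so the associated lattice path is a rescaled random walk, and the law of large numbers already shows that it concentrates around the straight line.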
Theorem 1.2. The laws on $\{X_N\}$ above satisfy a large deviations principle on the space of continuous functions from $[0, 1]$ to $\mathbb{R}^d$.
The above theorem implies that one can reconstruct any monotone path from its signature by sampling directly from the lattice paths with weights given by the corresponding terms in the signature. More precisely, for fixed large $N$, one "samples" a lattice path $X_N$ according to the "probabilities" $\{N!\, C_\gamma(w) : |w| = N\}$. The large deviations principle for these laws in Theorem 1.2 then ensures that the chance of picking a wrong lattice path is exponentially small in $N$.
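The sampling step described above can be sketched as follows. The sketch assumes the level-$N$ coefficients are available as a dictionary from words to $C_\gamma(w)$. The helper `straight_line_coeffs` is a hypothetical stand-in that produces such a dictionary for a straight-line path, where the coefficients have a closed form. Brute-force enumeration of all $d^N$ words is only feasible for small $N$.

```python
import random
from itertools import product
from math import factorial, prod

def straight_line_coeffs(p, N):
    """Level-N coefficients C(w) = p_{i_1} ... p_{i_N} / N! of gamma(t) = t * p
    (hypothetical test input; p must be non-negative and sum to 1)."""
    return {w: prod(p[i] for i in w) / factorial(N)
            for w in product(range(len(p)), repeat=N)}

def sample_lattice_path(coeffs, N, d, rng):
    """Draw a word w with probability N! * C(w) and return it together with
    the vertices of the lattice path with step size 1/N that follows w."""
    words = list(coeffs)
    weights = [factorial(N) * coeffs[w] for w in words]
    w = rng.choices(words, weights=weights)[0]
    pos = [0.0] * d
    path = [tuple(pos)]
    for i in w:
        pos[i] += 1.0 / N          # one step of size 1/N in direction e_i
        path.append(tuple(pos))
    return w, path

coeffs = straight_line_coeffs((0.3, 0.7), 4)
word, path = sample_lattice_path(coeffs, 4, 2, random.Random(0))
print(word, path[-1])
```

Only the coefficients enter the sampler, so any other source of level-$N$ signature data can be substituted for the hypothetical helper.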
The proof of this theorem relies on a probabilistic interpretation of the signature of monotone paths. Once this observation is made, the rest follows directly from standard large deviations techniques. Unfortunately, the rate function of the LDP for $\{X_N\}$ does not have a closed form. However, an observation in [DRL04] suggests that we can add another random process $T_N$ (to be defined below) to $\{X_N\}$, so that the pair $(X_N, T_N)$ satisfies an LDP with a rate function of closed form. This will be made precise in Section 2.2 below.

Acknowledgements
WX thanks Terry Lyons for very helpful discussions, especially for telling him about the probabilistic interpretation of signatures for monotone paths. ND is an Associate Member of the Oxford-Man Institute of Quantitative Finance, which partially supported this work with a visitorship during July 2015. WX is supported by EPSRC through the research fellowship EP/N021568/1.

Sampling path large deviations

Probabilistic interpretation
We first give the probabilistic interpretation of the signature of a monotone path. Let $\gamma : [0, 1] \to \mathbb{R}^d$ be a monotone path parametrised at unit speed in the sense of (1.1), so that it has length $1$. We can associate to it a probability measure on random lattice paths in the following way. Consider $d$ independent Poisson processes run simultaneously on the time interval $[0, 1]$, generating letters $e_1, \dots, e_d$ respectively, where the $i$-th process has intensity $\dot\gamma^i(t)$ at time $t$. Let $W(t)$ be the word of ordered letters that have arrived up to time $t$. For example, if the letters $e_3, e_2, e_2, e_1, e_3$ arrive at times $0 < u_1 < \cdots < u_5 < t$, then
\[ W(t) = e_3 e_2 e_2 e_1 e_3. \]
One can make $W(\cdot)$ into a lattice path in the following way. Suppose the arrival times are $\tau_j$ for $j = 1, 2, \dots$, with the convention that $\tau_0 = 0$. Then $W$ can be defined as a lattice path by setting $W(\tau_0) = 0$ and
\[ W(\tau_j) = W(\tau_{j-1}) + e_{i_j}, \qquad j \ge 1, \]
where $e_{i_j}$ is the letter arriving at time $\tau_j$. We have thus associated to $\gamma$ a random lattice path $W$.

We will be interested in the laws of $W$ conditional on the total number of arrivals up to time $1$. Thus, we let $N(t)$ be the process counting the total number of arrivals up to time $t$. Since $\gamma$ is parametrised at unit speed, $N(t)$ is a homogeneous Poisson process on $[0, 1]$ with intensity $1$. Now, we condition on the event $N(1) = N$, that is, there are exactly $N$ arrivals up to time $1$ (when the path runs out). Let $P_N(\cdot) = P(\,\cdot \mid N(1) = N)$ denote the conditional probability. Thus, for every word $w$ with $|w| = N$, abusing notation by also writing $W$ for the word generated by the processes up to time $1$, we have precisely the relation
\[ P_N(W = w) = N!\, C_\gamma(w). \]
This is the probabilistic interpretation of the signature for monotone paths.
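The relation between the conditioned Poisson word and the signature can be checked by simulation. The sketch below assumes that the $i$-th Poisson process has intensity $\dot\gamma^i(t)$, and uses the standard fact that, given $N(1) = N$, the arrival times of a homogeneous Poisson process are distributed as $N$ sorted independent uniform variables. The path $\gamma(t) = (t^2/2,\ t - t^2/2)$ is a hypothetical example chosen so that (1.1) holds.

```python
import random
from collections import Counter

def gamma_dot(u):
    """Derivative of the hypothetical path gamma(t) = (t^2/2, t - t^2/2);
    its components are non-negative and sum to 1 on [0, 1]."""
    return (u, 1.0 - u)

def sample_word(N, rng):
    """Sample the word W under P_N: N sorted uniform arrival times, and at
    an arrival time u the letter e_i is chosen with probability gamma_dot(u)[i]."""
    times = sorted(rng.random() for _ in range(N))
    word = []
    for u in times:
        probs = gamma_dot(u)
        word.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tuple(word)

# For N = 2, integrating by hand gives N! * C(w) equal to
# 1/4 (e1 e1), 1/12 (e1 e2), 5/12 (e2 e1), 1/4 (e2 e2); letters are 0-based below.
rng = random.Random(0)
trials = 20000
counts = Counter(sample_word(2, rng) for _ in range(trials))
for w in sorted(counts):
    print(w, counts[w] / trials)
```

The empirical frequencies should be close to the hand-computed values of $N!\, C_\gamma(w)$ listed in the comment, up to Monte Carlo error.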

Lattice path sampling and large deviations
From now on, we always condition on $N(1) = N$, and we will prove large deviations for the sequence of conditional laws $\mathcal{L}(\,\cdot \mid N(1) = N)$. Let $W(t)$ be the random lattice path generated by the conditional Poisson process, and let $W_N(t) = \frac{1}{N} W(t)$. Thus, under $P_N$, every realisation of $W_N$ is a lattice path with step size $\frac{1}{N}$ and total length $1$ ($N$ steps).
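Our aim is to show the large deviations principle for the processes $W_N$ and their randomly time-changed versions. For this, we first introduce the function spaces in which the processes live. Let $C$ denote the set of continuous functions from $[0, 1]$ to $\mathbb{R}$ with the uniform topology, and let $C^d$ be $d$ copies of $C$. Also, we let
\[ A = \Bigl\{ \psi \in C : \psi(t) = \int_0^t \dot\psi(s)\, ds \text{ with } \dot\psi \ge 0 \Bigr\}, \qquad A^d_a = \Bigl\{ \psi = (\psi^1, \dots, \psi^d) \in A^d : \sum_{i=1}^{d} \psi^i(1) = a \Bigr\}. \]
Note that $\psi \in A$ implies $\psi$ is absolutely continuous with $\dot\psi \ge 0$. For the set $A^d_a$, we will mainly use it for $a = 1$. Also, we let the function $I : [0, \infty) \times (0, \infty) \to \mathbb{R}$ be given by $I(x, y) = x \log(x / y)$, with the convention $0 \log 0 = 0$.

Our first aim is to show that the conditional laws obey a large deviations principle for processes. In order to derive the rate function, we need the following lemma for its finite dimensional approximations.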
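Lemma 2.1. Let $k \ge 1$. For every $\{0 = u_0 < u_1 < \cdots < u_k = 1\}$, the conditional laws
\[ \mathcal{L}\bigl( (W_N(u_1), \dots, W_N(u_k)) \mid N(1) = N \bigr) \]
satisfy the large deviations principle with scale $N$ and good rate function
\[ I_k(z) = \sum_{j=1}^{k} \sum_{i=1}^{d} (z^i_j - z^i_{j-1}) \log \frac{z^i_j - z^i_{j-1}}{\gamma^i(u_j) - \gamma^i(u_{j-1})} \]
if $z^i_j \ge z^i_{j-1}$ for all $i, j$ and $\sum_{i=1}^{d} z^i_k = 1$, where we have used the convention that $z_0 = 0$. Otherwise $I_k(z) = \infty$.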
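Proof. Fix $k \ge 1$ and $0 = u_0 < \cdots < u_k = 1$, and let $\Lambda_N$ be the log moment generating function of the multi-vector $\bigl( W_N(u_1) - W_N(u_0), \dots, W_N(u_k) - W_N(u_{k-1}) \bigr)$, conditioned on $N(1) = N$. Here, each component is a $d$-dimensional vector, and this should be understood as a random vector in $(\mathbb{Z}/N)^{dk}$. Then, $N$ times this random vector is, conditionally on $N(1) = N$, precisely multinomial with $N$ trials and probabilities
\[ p^i_j = \gamma^i(u_j) - \gamma^i(u_{j-1}), \qquad i = 1, \dots, d, \quad j = 1, \dots, k. \]
Here, $p^i_j$ denotes the probability that the outcome of a trial contributes to "$W^i_N(u_j) - W^i_N(u_{j-1})$". Also, the $p^i_j$'s are already normalised as a probability since we have assumed $L = 1$. Hence, for $\theta = \{\theta^i_j\} \in \mathbb{R}^{dk}$, we have
\[ \Lambda_N(N\theta) = \log E_N \exp\Bigl( N \sum_{i,j} \theta^i_j \bigl( W^i_N(u_j) - W^i_N(u_{j-1}) \bigr) \Bigr) = N \log \sum_{i,j} p^i_j e^{\theta^i_j}, \]
where the second equality follows from the moment generating function of the multinomial distribution, and the sum is taken over the set $\{(i, j) : 1 \le i \le d,\ 1 \le j \le k\}$. By the Gärtner-Ellis theorem, the increments satisfy the large deviations principle with scale $N$ and rate function given by the Legendre transform of $\theta \mapsto \log \sum_{i,j} p^i_j e^{\theta^i_j}$, namely $x \mapsto \sum_{i,j} x^i_j \log (x^i_j / p^i_j)$ on the probability simplex and $\infty$ elsewhere. Substituting $x^i_j = z^i_j - z^i_{j-1}$, this is precisely the stated form.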
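The multinomial structure in the proof can be made concrete numerically. The sketch below computes the block probabilities $p^i_j$ for a hypothetical monotone path and evaluates the relative-entropy rate function, which is the standard large deviations rate for multinomial frequencies. It is written under the assumption that this is the form intended in Lemma 2.1, and the function names are illustrative.

```python
from math import log, inf

def gamma(t):
    """Hypothetical monotone path of l^1 length 1 at unit speed."""
    return (t * t / 2.0, t - t * t / 2.0)

def block_probs(grid):
    """p[j][i] = gamma^i(u_{j+1}) - gamma^i(u_j): the chance that a single
    arrival is the letter e_i and falls in the j-th time block."""
    return [[b - a for a, b in zip(gamma(s), gamma(t))]
            for s, t in zip(grid, grid[1:])]

def rate(z, p, tol=1e-9):
    """Relative entropy of the increments of z (with z_0 = 0) with respect
    to p; infinite when z is not monotone or does not end on the simplex."""
    d = len(p[0])
    prev = (0.0,) * d
    total, value = 0.0, 0.0
    for zj, pj in zip(z, p):
        for i in range(d):
            x = zj[i] - prev[i]
            if x < -tol:
                return inf                 # a decreasing coordinate
            if x > tol:
                if pj[i] <= 0.0:
                    return inf             # mass where gamma has none
                value += x * log(x / pj[i])
            total += x
        prev = zj
    return value if abs(total - 1.0) <= tol else inf

grid = [0.0, 0.5, 1.0]
p = block_probs(grid)
z_true = [gamma(0.5), gamma(1.0)]          # the path itself
z_diag = [(0.25, 0.25), (0.5, 0.5)]        # a competing straight line
print(rate(z_true, p), rate(z_diag, p))
```

The rate vanishes at the true values $\gamma(u_j)$ and is strictly positive at the competing points, which is the mechanism behind the exponentially small probability of sampling a wrong path.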
We are now ready to give the large deviations principle for the conditional laws on the rescaled paths W N .
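Proof. By Lemma 2.1, the finite dimensional distributions of the increments of the processes $W_N$ satisfy the large deviations principle. Thus, by [DZ95, Theorem 1], the laws of the processes $W_N(\cdot)$ also satisfy the large deviations principle with good rate function
\[ I_W(\psi) = \int_0^1 \sum_{i=1}^{d} \dot\psi^i(t) \log \frac{\dot\psi^i(t)}{\dot\gamma^i(t)} \, dt \quad \text{for } \psi \in A^d_1, \]
and $\infty$ otherwise.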
Note that the above large deviations principle is for the processes $\{W_N(\cdot)\}$, where the time parametrisation is random and cannot be observed from the signature. We thus need to parametrise the paths $W_N$ at unit speed. For this reason, we introduce the random time change below.
We still condition on $N(1) = N$. For $j = 1, \dots, N$, let $\tau_j \in [0, 1]$ denote the arrival time of the $j$-th letter in the process $W$, so we have
\[ 0 = \tau_0 < \tau_1 < \cdots < \tau_N \le 1 \]
almost surely. Let the random map $T_N : [0, 1] \to [0, \tau_N]$ be such that
\[ T_N(j/N) = \tau_j, \qquad j = 0, 1, \dots, N, \]
with linear interpolation in between; that is, $T_N(j/N)$ is the arrival time of the $j$-th letter. Thus, $T_N$ is almost surely a strictly increasing map with inverse $Q_N := T_N^{-1}$. Thus, for every realisation such that $N(1) = N$, the random path $W_N \circ T_N$ is parametrised at unit speed. The intuition is that the map takes the $(Nq)$-th arrival of the $N$ letters to the position "$q$" of the path $W_N$. We let
\[ X_N := W_N \circ T_N, \]
which is nothing but the lattice path $W_N$ re-parametrised at unit speed. To investigate the LDP for $X_N$, note that $T_N = Q_N^{-1}$, and the operations of inversion $Q \mapsto Q^{-1}$ and composition $(W, T) \mapsto W \circ T$ are both continuous. Thus, by the contraction principle, it suffices to prove the LDP for $(W_N, Q_N)$. We give it in the following lemma.
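Lemma 2.3. The laws of $(W_N, Q_N) \in C^d \times C$ conditioned on $N(1) = N$ satisfy the large deviations principle with scale $N$ and good rate function
\[ I_{(W,Q)}(\psi, \varphi) = \begin{cases} I_W(\psi), & \text{if } \sum_i \psi^i(t) = \varphi(t) \text{ for every } t \in [0, 1], \\ +\infty, & \text{otherwise}, \end{cases} \]
so the rate function for the pair $(W, Q)$ is the same as $I_W$ except that one further requires $\sum_i \psi^i = \varphi$. Note that together with $\psi \in A^d_1$, this constraint forces $\varphi(1) = 1$ in order for $I_{(W,Q)}$ to be finite.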
Corollary 2.4. The laws $\mathcal{L}(X_N \mid N(1) = N)$ on $C^d$ satisfy a large deviations principle.
The rate function for X can be expressed in terms of I (W,Q) using the Contraction Principle [DZ98] but it does not have a closed form. A nice observation from [DRL04] suggests that we can add the component T N so that the pair (X N , T N ) satisfies the large deviations principle with a closed form rate function. This is the content of the following theorem.
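Theorem 2.5. The laws of $(X_N, T_N)$ conditioned on $N(1) = N$ satisfy the large deviations principle with scale $N$ and good rate function
\[ I_{X,T}(\zeta, \xi) = I_{(W,Q)}(\zeta \circ \xi^{-1}, \xi^{-1}) = \int_0^1 \sum_{i=1}^{d} \dot\zeta^i(q) \log \frac{\dot\zeta^i(q)}{\dot\xi(q)\, \dot\gamma^i(\xi(q))} \, dq \tag{2.3} \]
for $\zeta^i, \xi \in A$ such that $\sum_i \dot\zeta^i \equiv 1$, and $I_{X,T} = \infty$ otherwise.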
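Proof. Since $T_N = Q_N^{-1}$, it follows directly from the large deviations for inverse processes (see e.g. [DRL04, Theorem 4]) and Lemma 2.3 that
\[ I_{X,T}(\zeta, \xi) = I_{(W,Q)}(\zeta \circ \xi^{-1}, \xi^{-1}). \]
To derive the specific form of the rate function (the second equality in (2.3)), we write $\psi = \zeta \circ \xi^{-1}$ and note that for each $i$, we have
\[ \dot\psi^i(t) = \frac{\dot\zeta^i(\xi^{-1}(t))}{\dot\xi(\xi^{-1}(t))}, \]
so a change of variable $q = \xi^{-1}(t)$ gives
\[ \int_0^1 \dot\psi^i(t) \log \frac{\dot\psi^i(t)}{\dot\gamma^i(t)} \, dt = \int_0^1 \dot\zeta^i(q) \log \frac{\dot\zeta^i(q)}{\dot\xi(q)\, \dot\gamma^i(\xi(q))} \, dq. \]
The constraint that $\xi \in A$ is obvious. For the constraint on $\zeta$, we notice that by Lemma 2.3, we need
\[ \sum_i \zeta^i(\xi^{-1}(t)) = \xi^{-1}(t) \quad \text{for all } t. \]
This is equivalent to $\sum_i \zeta^i(q) = q$ for all $q$, or $\sum_i \dot\zeta^i \equiv 1$.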

Connections with the symmetrisation procedure
In [LX14], the authors used a procedure of symmetrisation to produce a deterministic sequence of piecewise linear approximations from the signature of a $C^1$ path to the true path. The construction of that sequence requires rather complicated operations beyond symmetrisation between terms in the signatures. The aim of this subsection is to show that, in the case of monotone paths, these piecewise linear paths can also be "sampled" in a rather straightforward way after symmetrisation. This is actually a simple consequence of the large deviations principle for lattice path sampling. We first briefly recall the symmetrisation procedure on signatures used in [LX14], and will mainly follow the notation there. For every integer $N \ge 0$ and $k \ge 0$, let $\mathcal{P}_{N,k}$ denote the set of $k$-partitions of $N$; that is,
\[ \mathcal{P}_{N,k} = \Bigl\{ n = (n_1, \dots, n_k) : n_j \ge 0,\ \sum_{j=1}^{k} n_j = N \Bigr\}. \]
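For $n \in \mathcal{P}_{N,k}$, let
\[ L^n_k = \Bigl\{ \ell = (\ell^i_j)_{1 \le i \le d,\, 1 \le j \le k} : \ell^i_j \ge 0 \text{ integers},\ \sum_{i=1}^{d} \ell^i_j = n_j \text{ for all } j \Bigr\}. \]
Now, for $n \in \mathcal{P}_{N,k}$ and $\ell \in L^n_k$, define the set of words $\mathcal{W}^n_k(\ell)$ by
\[ \mathcal{W}^n_k(\ell) = \bigl\{ w = w_1 * \cdots * w_k : |w_j|_{e_i} = \ell^i_j,\ \forall i = 1, \dots, d,\ j = 1, \dots, k \bigr\}, \]
and define the symmetrised signatures by
\[ S^n_k(\ell) = \sum_{w \in \mathcal{W}^n_k(\ell)} C_\gamma(w). \]
In other words, $S^n_k(\ell)$ is the sum of the coefficients of all words $w$ such that $w = w_1 * \cdots * w_k$, and the number of letters $e_i$ in $w_j$ is $\ell^i_j$. Note that in [LX14], the symmetrisation procedure is taken with $n_j \equiv n$ for all $j$, and $N = kn$, so the set-up above is a slight generalisation of that in [LX14]. Recall the random word $W$ generated by the Poisson process associated to the path $\gamma$; we have
\[ P_N\bigl( W \in \mathcal{W}^n_k(\ell) \bigr) = N!\, S^n_k(\ell). \]
Thus, each $\mathcal{W}^n_k(\ell)$ corresponds to a piecewise linear path, which is random under $P_N$ and which we call $Y^n_k$. We have the following theorem.

Proof. It suffices to show that $Y^n_k$ and $X_N = W_N \circ T_N$ are exponentially equivalent. In fact, for every realisation of the lattice path $X_N$, $Y^n_k$ is its polygonal approximation such that its $j$-th piece connects the points $X_N(\bar n_{j-1}/N)$ and $X_N(\bar n_j/N)$, where $\bar n_j = n_1 + \cdots + n_j$ and $\bar n_0 = 0$. Thus, the difference between the $j$-th piece of $Y^n_k$ and the corresponding part of $X_N$ is at most $n_j/N$ in the $\ell^1$ norm, since both run monotonically between the same two endpoints. Thus, for every $\delta > 0$, we have
\[ P_N\bigl( \|Y^n_k - X_N\|_\infty > \delta \bigr) = 0 \]
for all sufficiently large $N$. This proves the exponential equivalence of $(Y^n_k, T_N)$ and $(X_N, T_N)$, and hence the LDP follows.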
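The symmetrised weights can be computed by brute force for small examples. The sketch below assumes that the symmetrised signature is the plain sum of coefficients over the word set, as in the verbal description above, and that the input path is piecewise linear, so that Chen's identity applies. All function names are illustrative, and block letter counts are given as tuples with 0-based letters.

```python
from itertools import permutations, product
from math import factorial

def coeff(increments, w):
    """C_gamma(w) for a piecewise linear path via Chen's identity."""
    def seg(v, u):
        c = 1.0
        for i in u:
            c *= v[i]
        return c / factorial(len(u))
    n = len(w)
    coeffs = [1.0] + [0.0] * n
    for v in increments:
        coeffs = [sum(coeffs[s] * seg(v, w[s:m]) for s in range(m + 1))
                  for m in range(n + 1)]
    return coeffs[n]

def block_words(counts):
    """All distinct words containing counts[i] copies of the letter i."""
    letters = [i for i, c in enumerate(counts) for _ in range(c)]
    return set(permutations(letters))

def symmetrised(increments, ell):
    """Sum of C_gamma(w) over words w = w_1 * ... * w_k whose j-th block
    contains ell[j][i] copies of the letter i."""
    total = 0.0
    for blocks in product(*(block_words(c) for c in ell)):
        w = tuple(x for b in blocks for x in b)
        total += coeff(increments, w)
    return total

# Hypothetical path of l^1 length 1: half a unit along e_1, then half along e_2.
incs = [(0.5, 0.0), (0.0, 0.5)]
N = 2
for ell in product([(1, 0), (0, 1)], repeat=2):    # k = 2, n = (1, 1)
    print(ell, factorial(N) * symmetrised(incs, list(ell)))
```

For this path the configuration "first block $e_1$, second block $e_2$" receives the largest weight, matching the shape of the path, and the weights over all configurations sum to one.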
In this section, we will provide a numerical example for the sampling scheme introduced above. For the convenience of simulation and display, we use piecewise linear approximations as in Section 2.3, and we use n j ≡ n for all j, so the total number of arrivals is N = kn. We study the following example.
We compute the truncated signature of the piecewise linear approximation of X with time mesh 0.01, and use it as an approximation to the truncated signature of X. Then we apply the inversion algorithm outlined in the above section, and we obtain the following results.
k = 2, 3 ≤ n ≤ 8

We now test the two-piece approximation to γ in Example 3.1. We fix k = 2 and vary n from 3 to 8 to examine the accuracy. The probability matrices $M_{k,n}$ of the two-piece approximations for different n are shown in the tables below.
The entries in the j-th row (j = 1, 2) represent the "probabilities" that are assigned to the various directions of the j-th linear piece. More precisely, the (j, m) entry is the "probability" that the direction of the j-th piece has the $e_1 : e_2$ ratio m : (n − m). The red colour indicates which direction has been assigned the biggest weight in sampling. The weight of the two-piece linear path is simply the product of the weights of the two pieces, and these two pieces have the same $\ell^1$ length.
The two-piece linear paths with the biggest weight for n = 3, . . . , 8 are plotted in Figure 1. One can see clearly that the MLS estimator of the path gets closer to the true path $\gamma(t) = (t, t^2)$ as n increases.

Table 6: k = 2, n = 8 (surviving entries: 1.16E-01, 3.87E-02, 7.64E-03, 6.81E-04)

Here, we fix n = 4, and take k = 3 or 4. The weights for each approximation are listed in Tables 7 and 8 below, and the MLS estimators of the path are plotted in Figure 2. Note that it also includes the case of (k, n) = (2, 4) from the previous subsection.