1 Introduction

Linear optical devices under quantum light exhibit a rich behaviour and have a variety of applications in experiments on the foundations of quantum optics and quantum information [1,2,3]. While they can be built with relatively simple optical elements like beam splitters and phase shifters [4,5,6,7,8], their behaviour for photon number states cannot be accurately reproduced by any classical system. One clear example is the boson sampling problem, for which quantum systems can give efficient solutions which cannot be produced by any classical method [9].

Passive lossless linear optical systems where the number of photons is preserved are particularly attractive: they are simple, well understood, and they can be translated to standard experimental setups.

There are many results on the synthesis of linear systems from their classical description [4, 7, 8] and on the evolution of multiple photons in those devices [10,11,12,13,14,15,16,17,18]. However, there are relatively few methods for the design of tailored quantum evolutions for multiple photons. We have previously presented an inverse method which determines whether a desired quantum evolution on n photons can be achieved with a linear optical system, giving the corresponding system when it is possible [19].

In this paper, we complete this design method with a procedure that gives the best possible approximation to any quantum unitary that cannot be achieved using only photon preserving linear optical systems. The approximation starts from an initial guess and returns a matrix in its neighbourhood which minimizes the matrix distance to the desired unitary but can be achieved using linear optics alone. The result is based on Toponogov’s comparison theorem [20] from differential geometry.

Section 2 introduces the mathematical description of linear optical systems and previously known results we will use. Section 3 describes the structure of the image algebra and its complement and states two theorems that will become useful later. Section 4 gives the basic concepts from differential geometry used in the proof and the notation for the rest of the paper. Section 5 defines the bi-invariant Riemannian metric in which the results are given. Section 6 shows how to apply Toponogov’s theorem to reduce the problem of approximating a unitary to finding a geodesic in the correct manifold. Section 7 discusses some tricks related to the generation of random unitaries which are needed to explore the image group and to be able to compute a valid matrix logarithm for any desired U. Section 8 describes the iterative method that produces the desired approximation. Section 9 gives an application example. Finally, Sect. 10 gives a general overview of the method and comments on some practical problems and possible improvements for the approximation algorithm.

2 Mathematical description of linear optics interferometers

Throughout this paper we restrict ourselves to lossless linear optical systems where the total number of photons is preserved. Classically, the evolution of the electric field in m orthogonal modes going through such a linear optical system is perfectly described by a unitary \(m\times m\) matrix, S, called the scattering matrix of the system [21]. Linear optical systems including loss and amplification can be described using quasiunitary matrices [22].

The evolution of n photons distributed among these m possible modes is given by an \(M\times M\) unitary evolution matrix U acting on the \(M=\binom{m+n-1}{n}\) states of the resulting Hilbert space.
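For reference, the dimension M can be computed directly; a minimal sketch (the function name is our own choice):

```python
from math import comb

def hilbert_dim(m: int, n: int) -> int:
    """Dimension M of the Hilbert space of n photons in m modes."""
    return comb(m + n - 1, n)

print(hilbert_dim(2, 2))  # → 3, the M = 3 space used in the QFT example of Sect. 9
```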

The photonic homomorphism \(\varphi : U(m) \rightarrow U(M)\) gives the evolution matrix U which corresponds to a scattering matrix S describing the photon preserving linear optical system. \(U=\varphi (S)\) can be computed by several equivalent methods [10,11,12].
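Concretely, each matrix element of \(\varphi (S)\) is a permanent of a submatrix of S whose rows and columns are repeated according to the output and input photon occupations, normalized by the occupation factorials [10,11,12]. A hedged sketch (our own basis ordering and helper names; the naive permanent is adequate only for small systems, and the row/column convention may differ between references):

```python
import numpy as np
from math import factorial
from itertools import combinations_with_replacement

def fock_basis(m, n):
    # all occupation vectors (n_1, ..., n_m) with sum n, in a fixed order
    states = []
    for modes in combinations_with_replacement(range(m), n):
        occ = [0] * m
        for k in modes:
            occ[k] += 1
        states.append(tuple(occ))
    return states

def permanent(A):
    # naive Laplace-style expansion; fine for the small matrices used here
    if A.shape[0] == 0:
        return 1.0
    return sum(A[0, j] * permanent(np.delete(np.delete(A, 0, 0), j, 1))
               for j in range(A.shape[1]))

def photonic_homomorphism(S, n):
    m = S.shape[0]
    basis = fock_basis(m, n)
    M = len(basis)
    U = np.zeros((M, M), dtype=complex)
    for a, t in enumerate(basis):        # output occupation t
        rows = [i for i in range(m) for _ in range(t[i])]
        for b, s in enumerate(basis):    # input occupation s
            cols = [i for i in range(m) for _ in range(s[i])]
            norm = np.sqrt(np.prod([factorial(k) for k in t]) *
                           np.prod([factorial(k) for k in s]))
            U[a, b] = permanent(S[np.ix_(rows, cols)]) / norm
    return U

# 50/50 beam splitter: two photons never exit in different modes (HOM effect)
S = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
U = photonic_homomorphism(S, 2)      # basis order here: |20>, |11>, |02>
print(abs(U[1, 1]))  # → 0.0 (the |11> -> |11> amplitude vanishes)
```

For \(n=1\) the map reduces to \(\varphi (S)=S\), which is a convenient consistency check.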

Any unitary matrix can be written as an exponential \(U=e^{iH}\) for a Hermitian matrix H. In linear optical devices, we will call this matrix the effective Hamiltonian \(H_U\) of the linear system. Similarly, \(S=e^{iH_S}\).

The image of the photonic homomorphism, \(\mathrm {im}(\varphi )\), is a subgroup of U(M) which contains all the quantum evolutions that are allowed for n photons in a linear optical system with m modes. The image subgroup is a representation of U(m) in U(M) and maps each possible classical scattering matrix S describing a linear system into the quantum evolution \(U=\varphi (S)\) it induces for n photons.

The evolution in the corresponding unitary algebras from \(iH_S\) to \(iH_U\) is given by the differential of \(\varphi \), \(d \varphi :\mathfrak {u}(m)\rightarrow \mathfrak {u}(M)\), for which there are also explicit expressions [13,14,15,16,17,18].

From the point of view of system design, a natural question is whether any given quantum evolution \(U \in U(M)\) can be realized using only linear optics. From a simple dimensional argument, it is clear that, except when \(m=1\) or \(n=1\), there must be some impossible operations [23].

In a previous work, we have given an explicit inverse method to find the S corresponding to any \(U\in \mathrm {im}(\varphi )\) which can be implemented using linear optics [19].

Here, we address the problem of approximating \(U\not \in \mathrm {im}(\varphi )\). We give a method to find the linear optics system with an evolution matrix \(\widetilde{U}\in \mathrm {im}(\varphi )\) which minimizes the distance to U locally.

3 The image algebra and its orthogonal complement

If we study the induced map \(d \varphi :\mathfrak {u}(m)\rightarrow \mathfrak {u}(M)\), we can decompose the Lie algebra \(\mathfrak {u}(M)\) orthogonally so that

$$\begin{aligned} \mathfrak {u}(M)=\mathrm{im}\, d\varphi \oplus (\mathrm{im}\, d\varphi )^\perp , \end{aligned}$$
(3.1)

where \((\mathrm{im}\, d\varphi )^\perp \) is the orthogonal complement of \(\mathrm{im}\, d\varphi \) with respect to the metric

$$\begin{aligned} \langle u,v\rangle =\frac{1}{2}\mathrm{tr}(u^\dag v+v^\dag u). \end{aligned}$$
(3.2)

For this metric, we can prove a couple of useful facts.

Theorem 1

For \(U\in U(M)\) such that \(U\not \in \mathrm{im}\,\varphi \), let \(v\in \mathfrak {u}(M)\) be the principal logarithm of U. Let

$$\begin{aligned} v=v_T+v_N \end{aligned}$$
(3.3)

be the orthogonal decomposition of v, with a tangent component \(v_T \in \mathrm{im}\, d\varphi \) and a normal component \(v_N \in (\mathrm{im}\, d\varphi )^\perp \). Then,

  1. \(U_a=\exp (v_T)\in \mathrm{im}\, \varphi \).

  2. \(\Vert U -U_a\Vert \le \Vert v_N\Vert \).

Therefore, for any normalized \(\vert \psi \rangle \) with \(\langle \psi \vert \psi \rangle =1\), we have

$$\begin{aligned} 1\ge \vert \langle U \psi \vert U_a\psi \rangle \vert \ge 1-\frac{\Vert v_N\Vert ^2}{2}. \end{aligned}$$
(3.4)

The proof proceeds by introducing a bi-invariant metric and reducing the problem to one in plane geometry thanks to Toponogov’s comparison theorem [20]. Building on this theorem, we then give an iterative method that finds a locally optimal approximation and show that it converges.
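The fidelity bound (3.4) follows from the chain \(\vert \langle U \psi \vert U_a\psi \rangle \vert \ge 1-\Vert U-U_a\Vert ^2/2 \ge 1-\Vert v_N\Vert ^2/2\), whose first link holds for any pair of unitaries and any normalized state. A quick numerical sanity check of that first link (random unitaries obtained by QR; all names are our own):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_unitary(M):
    # QR of a complex Gaussian matrix yields a unitary Q
    Q, _ = np.linalg.qr(rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M)))
    return Q

for _ in range(200):
    M = 3
    U, Ua = random_unitary(M), random_unitary(M)
    psi = rng.normal(size=M) + 1j * rng.normal(size=M)
    psi /= np.linalg.norm(psi)
    overlap = abs(np.vdot(U @ psi, Ua @ psi))
    # ||(U - Ua) psi|| <= ||U - Ua||_F, so this is the weaker link of the chain
    assert 1 + 1e-12 >= overlap >= 1 - np.linalg.norm(U - Ua) ** 2 / 2 - 1e-12
```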

4 Prerequisites and notation

For the very basic notions in differential geometry, such as manifold, curve, tangent space, the reader is referred to the books of Do Carmo [24] or Sakai [25].

A Riemannian metric on a differentiable manifold M is a correspondence which associates to each point p of M an inner product \(\langle ~, ~\rangle _p\) on the tangent space \(T_pM\), varying differentiably in the sense that, for any pair of vector fields X and Y which are differentiable in a neighbourhood V of M, the function \(\langle X,Y\rangle \) is differentiable on V. A Riemannian metric induces a distance on M: given two points \(p,q \in M\), the distance d(p, q) between them is defined to be the infimum of the lengths of all piecewise differentiable curves joining p and q.

Two fundamental concepts of Riemannian geometry are those of geodesic and curvature. Roughly speaking, a geodesic is a curve minimizing the distance between two nearby points. More precisely, let I be a closed interval in \(\mathbb {R}\); a parametrized curve \(\gamma : I \rightarrow M\) is called a geodesic at \(t_0\) if the covariant derivative \(\frac{D}{dt}\left( \frac{d\gamma }{dt}\right) \) vanishes at the point \(t_0\) (see, e.g., [24], Definition 2.1); if \(\gamma \) is a geodesic at t for all \(t \in I\), then \(\gamma \) is called a geodesic. If \([a,b] \subseteq I\) and \(\gamma :I \rightarrow M\) is a geodesic, the restriction of \(\gamma \) to [a, b] is called a geodesic segment joining \(\gamma (a)\) to \(\gamma (b)\). By abuse of language, the image \(\gamma (I)\) of a geodesic \(\gamma \) is often itself referred to as a geodesic.

A minimal geodesic between p and q is the shortest one joining p and q. It is easily seen that if there exists a minimal geodesic \(\gamma \) joining p to q, then d(p, q) equals the length \(\ell (\gamma )\) of \(\gamma \). The existence of such a minimal geodesic is guaranteed under the hypothesis of completeness: a Riemannian manifold M is said to be (geodesically) complete if for every \(p \in M\) the exponential map \(\mathrm {exp}_p\) is defined on the whole tangent space \(T_p M\), i.e. if every geodesic \(\gamma (t)\) starting from p is defined for all \(t \in \mathbb {R}\). The precise statement is:

Theorem 2

(Hopf-Rinow) Let M be a Riemannian manifold and let \(p \in M\). Then M is geodesically complete if and only if it is complete as a metric space. Moreover, this implies that for any \(q \in M\) there exists a minimizing geodesic \(\gamma \) joining p to q.

An important consequence of Theorem 2 is the following, see [25], Corollary 1.4 and Problem 1 of Chapter III:

Corollary 1

A \(C^{\infty }\) manifold M is compact if and only if any Riemannian metric on M is complete.

On the other hand, the concept of curvature we will refer to is that of sectional curvature. According to Milnor [26] (p. 295), the sectional curvature of the tangential 2-plane spanned by some orthogonal unit vectors u and v can be described geometrically as the Gaussian curvature, at the point, of the surface swept out by all geodesics having a linear combination of u and v as tangent vector.

We are interested in Riemannian manifolds with additional algebraic structure: Lie groups. A Lie group is a group G with a differentiable structure such that the mapping \(G\times G \rightarrow G\) given by \((x,y)\mapsto xy^{-1}, x,y \in G\), is differentiable. It follows that translations from the left \(L_x\) resp. translations from the right \(R_x\) given by \(L_x: G \rightarrow G, L_x(y)=xy\) resp. \(R_x: G \rightarrow G, R_x(y)=yx\) are diffeomorphisms.

A Riemannian metric on G is said to be left invariant resp. right invariant if for all \(p, g \in G\) and for all \(u,v \in T_pG\) it holds that

$$\begin{aligned} \langle u,v\rangle _p = \langle d(L_g)(u) , d(L_g)(v) \rangle _{L_g(p)}\ \ ~\text{ resp. }~\ \ \langle u,v\rangle _p = \langle d(R_g)(u) , d(R_g)(v) \rangle _{R_g(p)}. \end{aligned}$$

A Riemannian metric is called bi-invariant if it is both left and right invariant. Any compact Lie group can be endowed with a bi-invariant metric [24, Exercise 7].

We also consider the Lie algebra \(\mathcal {G}\) of G, which consists of the vectors in \(T_e G\) with e the neutral element of G and with a well-known additional structure provided by a commutator (or Lie bracket) in the usual way.

We will focus on the Lie group U(M) as a differentiable manifold, for a positive integer M; its Lie algebra will be denoted by \(\mathfrak {u}(M)\). Identity matrices of any size will be denoted by Id.

5 A bi-invariant Riemannian metric

In this section, we endow U(M) with a Riemannian structure which will be useful later on.

For \(u,v \in \mathfrak {u}(M)\), we define an inner product

$$\begin{aligned} \langle u,v \rangle :=\frac{1}{2}\mathrm {tr}(u^{\dag } v + v^{\dag }u). \end{aligned}$$
(5.1)

This definition does indeed correspond to a positive definite symmetric bilinear form: the bilinearity is an easy exercise, and moreover:

  1. Since \(u,v \in \mathfrak {u}(M)\), we have \(u^{\dag } = -u\), \(v^{\dag }=-v\), and therefore

     $$\begin{aligned} \langle u,v \rangle =\frac{1}{2}\mathrm {tr}(-u v - v u) = -\mathrm {tr}(uv). \end{aligned}$$

  2. The symmetry of (5.1) is clear, since \(\langle u,v \rangle = -\mathrm {tr}(uv) = -\mathrm {tr}(vu) = \langle v,u \rangle \).

  3. The positive definiteness follows from the fact that

     $$\begin{aligned} \langle u,u \rangle = \mathrm {tr}(u^{\dag }u)= ||u||^2\ge 0, \end{aligned}$$

     where \(||u||=\sqrt{\sum _{i,j}|u_{ij}|^2}\) is the Frobenius norm of u, which vanishes only for \(u=0\) and has all the required norm properties [27].
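These three properties are easy to verify numerically; a small sketch (helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_skew_hermitian(M):
    A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
    return (A - A.conj().T) / 2          # u^dag = -u, i.e. u in u(M)

def inner(u, v):
    # the inner product (5.1)
    return 0.5 * np.trace(u.conj().T @ v + v.conj().T @ u)

u, v = random_skew_hermitian(4), random_skew_hermitian(4)
assert np.isclose(inner(u, v), -np.trace(u @ v))                # item 1
assert np.isclose(inner(u, v), inner(v, u))                     # symmetry
assert np.isclose(inner(u, u).real, np.linalg.norm(u, 'fro') ** 2)  # item 3
assert abs(inner(u, v).imag) < 1e-12    # the form is real on u(M)
```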

The metric defined above is Riemannian and bi-invariant. Bi-invariant metrics are useful for us because of the following.

Theorem 3

(Milnor [26]) Every compact Lie group admits a bi-invariant metric, which has nonnegative sectional curvature.

In the case of a bi-invariant metric, the sectional curvature admits an easier formula, see [26, p. 323, Eqn. (7.3)]:

$$\begin{aligned} \kappa (u,v)=\frac{1}{4}\langle [u,v],[u,v] \rangle . \end{aligned}$$

Furthermore, in a Lie group admitting a bi-invariant metric, geodesic curves have an easy description: they are given by the exponential map. More precisely, for \(p \in U(M)\), the geodesic curve \(\gamma : I \rightarrow U(M)\) such that \(\gamma (0)=p\) and \(\dot{\gamma }(0)=u\) is of the form

$$\begin{aligned} \gamma (t)=\exp {(up^{-1}t)}\cdot p \end{aligned}$$
(5.2)

In fact, for \(p,q\in U(M)\), there exists a geodesic \(\gamma \) joining p and q with \(\gamma (0)=p\) such that

$$\begin{aligned} \ell (\gamma ([0,t]))=&\int _{0}^{t} \sqrt{\langle \dot{\gamma }(s),\dot{\gamma }(s) \rangle }\, ds \nonumber \\ =&\int _0^{t} \sqrt{\langle u,u \rangle }\, ds = \int _0^{t} \left\| u\right\| \, ds \nonumber \\ =&\left\| u\right\| t. \end{aligned}$$
(5.3)

A geodesic segment \(\gamma : [0,1] \rightarrow U(M)\) is called minimal if it realizes the distance for any \(t\in [0,1]\), i.e.,

$$\begin{aligned} d(\gamma (0),\gamma (t)) = \ell (\gamma ([0,t])). \end{aligned}$$
(5.4)

By equation (5.2), a geodesic segment \(\gamma : [0,1] \rightarrow U(M)\) joining \(\gamma (0)=p\) and \(\gamma (1)=q\) can always be obtained as

$$\begin{aligned} \gamma (t)=\exp (vt)p,\quad \text { with } v \text{ such } \text{ that } \exp (v)=q p^{-1}. \end{aligned}$$

The concept of a minimal geodesic is related to the concept of principal logarithm in the following way:

Lemma 1

Let \(p,q\in U(M)\), let w be a principal logarithm of \(qp^{-1}\). Then, the geodesic segment

$$\begin{aligned} \gamma :[0,1]\rightarrow U(M),\quad t\mapsto \gamma (t)=\exp (wt)p \end{aligned}$$

is a minimal geodesic segment joining p and q.

Recall that iK is called a principal logarithm of a unitary matrix \(V\in U(M)\) if

$$\begin{aligned} K^\dag =K,\quad \exp (iK)=V,\quad \text { and the eigenvalues of { K} are in } (-\pi ,\pi ]. \end{aligned}$$

There are efficient algorithms that can compute the principal logarithm of a unitary matrix [28]. We choose this definition of principal logarithm over the one for the open interval \((-\pi ,\pi )\) so that there is always a principal matrix logarithm, even for matrices with eigenvalue \(-1\). Some properties, like infinite differentiability, are lost with this definition, but they are not used in our result.
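One concrete way to compute it (an illustrative sketch, not necessarily the algorithm of [28]): take the complex Schur form, which for a normal matrix is diagonal, and note that `np.angle` returns phases precisely in \((-\pi ,\pi ]\), matching the branch chosen above:

```python
import numpy as np
from scipy.linalg import schur, expm

def principal_log_unitary(U):
    """Principal logarithm of a unitary: a skew-Hermitian w with exp(w) = U
    and the eigenvalues of -i*w in (-pi, pi]."""
    # the complex Schur form of a normal matrix is diagonal: U = Z T Z^dag
    T, Z = schur(U, output='complex')
    theta = np.angle(np.diag(T))          # np.angle lands in (-pi, pi]
    return Z @ np.diag(1j * theta) @ Z.conj().T

# works even with an eigenvalue exactly at -1
U = np.diag([-1.0 + 0j, np.exp(0.4j), np.exp(-2.0j)])
w = principal_log_unitary(U)
print(np.allclose(expm(w), U))  # → True
```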

Proof of Lemma 1

The length of a geodesic segment \(\gamma (t)=\exp (vt)p\) with \(\exp (v)=q p^{-1}\) is (by equation (5.3))

$$\begin{aligned} \ell (\gamma ([0,1]))=\Vert vp\Vert =\Vert v\Vert . \end{aligned}$$

The distance d(p, q) between p and q is the shortest length of the curves joining p and q. In the case of a complete metric this shortest length is attained by a geodesic segment joining p and q. Hence, the geodesic segment \(\gamma (t)=\exp (vt)p\) is minimal if and only if

$$\begin{aligned} \Vert v\Vert =\min \{\Vert w\Vert \, :\, \exp (w)= q p^{-1}\}. \end{aligned}$$

But the equation \(\exp (w)= q p^{-1}\) has the following family of solutions

$$\begin{aligned} w=U [\log (\lambda _i )\delta _{ij}] U^\dag \end{aligned}$$

with \(qp^{-1}=U \varLambda U^\dag \), U being a unitary matrix, \([\varLambda ]_{ij}=\lambda _i \delta _{ij}\) being the diagonal matrix of eigenvalues of \(qp^{-1}\), and \(\log (\lambda _i)\) being any logarithm of \(\lambda _i\). Observe that

$$\begin{aligned} \Vert w\Vert =\sqrt{\sum _{i=1}^M\vert \log (\lambda _i)\vert ^2}. \end{aligned}$$

This expression only depends on the list of eigenvalues of \(qp^{-1}\) and their logarithms. Since \(qp^{-1}\) is unitary

$$\begin{aligned} \log (\lambda _i)=i(k_i+2\pi l_i),\quad \text { with } k_i\in (-\pi ,\pi ], \text { and } l_i\in \mathbb {Z}. \end{aligned}$$

Then

$$\begin{aligned} \{\Vert w\Vert \, :\, \exp (w)= q p^{-1}\}=\left\{ \sqrt{\sum _{i=1}^M\vert (k_i+2\pi l_i)\vert ^2}\, :\, {l_i}\in \mathbb {Z}\right\} , \end{aligned}$$

with a minimum when \(l_i=0\) for \(i=1,\ldots , M\), which corresponds to a principal logarithm. \(\square \)

6 An application of Toponogov’s comparison theorem

Riemannian manifolds whose curvature is bounded below may be investigated by applying Toponogov’s comparison theorem. We first need to define triangles on the Riemannian manifold.

Definition 1

A geodesic triangle \(T=\varDelta (p_1p_2p_3)\) of a Riemannian manifold M is a set consisting of three segments of minimal geodesics, which are called the sides of T, say

$$\begin{aligned} \gamma _1:[0,1] \rightarrow M, \ \gamma _2: [0,1] \rightarrow M, \ \ \text{ and } \ \ \gamma _3: [0,1] \rightarrow M, \end{aligned}$$

such that \(\gamma _i(1) = \gamma _{i+1}(0)\) for \(i=1,2\), and \(\gamma _3(1)=\gamma _1(0)\). The endpoints \(p_1\), \(p_2\) and \(p_3\) are called the vertices of the triangle. The angle between the tangent vectors to \(\gamma _{i-1}\) and \(\gamma _{i+1}^{-1}\) at \(p_i\) is called the angle of T at \(p_i\) and denoted by \(\alpha _i=\angle (p_{i-1}p_ip_{i+1})\) or \(\angle p_i\). The perimeter \(\ell \) is defined as the sum \(\ell (\gamma _1)+\ell (\gamma _2)+\ell (\gamma _3)\); if we consider, in addition, that the two sides \(\gamma _2, \gamma _3\) are minimal geodesics, and the side \(\gamma _1\) is a geodesic segment, not necessarily minimal, with \(\ell (\gamma _1)\le \ell (\gamma _2)+\ell (\gamma _3)=d(p_1,p_3)+d(p_1,p_2)\), then the set is said to be a generalized geodesic triangle (Fig. 1).

Fig. 1

Geodesic triangle \(\varDelta (p_1p_2p_3)\)

Let us set the part of Toponogov’s comparison theorem we are interested in, cf. [25, Theorem 4.2 in Chapter IV]:

Theorem 4

(Toponogov) Let M be a complete Riemannian manifold whose sectional curvatures satisfy \(\kappa \ge \delta \) everywhere for some constant \(\delta \). Denote by \(M_{\delta }^2\) the 2-dimensional complete simply connected Riemannian manifold of constant curvature \(\delta \). Consider a generalized geodesic triangle \(\varDelta (p_1p_2p_3)\) such that \(\gamma _2, \gamma _3\) are minimal and \(\ell (\gamma _1) \le \pi /\sqrt{\delta }\). Then the perimeter satisfies \(\ell \le 2\pi /\sqrt{\delta }\) and there exists a geodesic triangle \(\varDelta (\tilde{p}_1\tilde{p}_2\tilde{p}_3)\) in \(M_{\delta }^2\) with the same side lengths, \(\ell (\tilde{\gamma }_i)=\ell (\gamma _i)\) for \(i=1,2,3\), satisfying \(\alpha _2 \ge \tilde{\alpha }_2\) and \(\alpha _3 \ge \tilde{\alpha }_3\).

Remark

In the theorem above, \(\pi /\sqrt{\delta }\) is to be read as \(+\infty \) when \(\delta \le 0\).

Theorem 4 allows us to compare triangles of U(M) with triangles in \(\mathbb {R}^2\) by setting \(\delta =0\), since in this case \(M_{0}^2=\mathbb {R}^2\). In our situation it is \(p_1=U\), \(p_3=Id\) and \(p_2=U_a=\exp (v_T)\) is our approximation matrix in \(\mathrm {im}(\varphi )\) (recall that \(v_T\) is the tangential component of the principal logarithm v of U). Set \( \gamma _1(t)=U_a\exp (-v_Tt), \) \( \gamma _2(t)=\exp (vt) \) and \(\gamma _3(t)\) a minimal geodesic segment joining U with \(U_a\). Then \(\gamma _2\) and \(\gamma _3\) are minimal geodesic segments and \(\gamma _1\) is a geodesic segment (not necessarily minimal). Set \(\ell _1=\ell (\gamma _1([0,1]))=\left\| v_T\right\| \), \(\ell _2=\ell (\gamma _2([0,1]))=d(U,Id)\) and \(\ell _3=\ell (\gamma _3([0,1]))=d(U,U_a)\). Observe that

$$\begin{aligned} \ell _1=\left\| v_T\right\| \le \left\| v\right\| +d(U,U_a)=\ell _2+\ell _3. \end{aligned}$$

Hence, by Theorem 4, there exists a geodesic triangle in \(\mathbb {R}^2\) with sides \({\ell }_\mathbf{1}, {\ell }_\mathbf{2}\) and \({\ell }_\mathbf{3}\) of lengths \(\ell _1, \ell _2\) resp. \(\ell _3\), such that \(\alpha _2 \ge \tilde{\alpha }_2\) and \(\alpha _3 \ge \tilde{\alpha }_3\). We want to estimate the distance \(\ell _3\). First of all, notice that \({\ell }_\mathbf{3}={\ell }_\mathbf{2}- {\ell }_\mathbf{1}\); hence, the law of cosines implies that

$$\begin{aligned} \left\| {\ell }_\mathbf{3}\right\| ^2=\left\| {\ell }_\mathbf{2}- {\ell }_\mathbf{1}\right\| ^2 = \left\| {\ell }_\mathbf{1}\right\| ^2+\left\| {\ell }_\mathbf{2}\right\| ^2-2\left\| {\ell }_\mathbf{1}\right\| \left\| {\ell }_\mathbf{2}\right\| \cos {\angle ({\ell }_\mathbf{1},{\ell }_\mathbf{2})} \end{aligned}$$

and

$$\begin{aligned} \ell _3^2=\ell _1^2+\ell _2^2-2\ell _1\ell _2\cos {\tilde{\alpha }_3}. \end{aligned}$$
(6.1)

Since \(\cos {{\alpha }_3}=\frac{\langle v,v_T\rangle }{\left\| v\right\| \left\| v_T\right\| }=\frac{\left\| v_T\right\| }{\left\| v\right\| }\ge 0\), it follows that \(\tilde{\alpha }_3 \le \alpha _3\le \frac{\pi }{2}\) which implies \(-\cos {\tilde{\alpha }_3} \le -\cos {\alpha _3}\) and, so,

$$\begin{aligned} \begin{aligned} \ell _3^2\le&\ell _1^2+\ell _2^2-2\ell _1\ell _2\cos {\alpha _3}=\left\| v_T\right\| ^2+\left\| v\right\| ^2-2\left\| v_T\right\| \left\| v\right\| \cos {\alpha _3}\\ =&\left\| v\right\| ^2+\left\| v_T\right\| ^2-2\langle v, v_T \rangle =\left\| v-v_T\right\| ^2=\left\| v_N\right\| ^2. \end{aligned} \end{aligned}$$
(6.2)

Figure 2 shows a graphical representation of our scenario.

Fig. 2

Submanifold \(\mathrm {im}(\varphi )\) in U(M), matrix U and approximation \(U_a\)

Now we want to see the bi-invariant metric in \(\mathfrak {u}(M)\) defined in Sect. 5 as a metric in U(M). First observe that the Riemannian manifold \((U(M),\langle , \rangle )\) is a Riemannian submanifold of the manifold \(\mathcal {M}_M(\mathbb {C})\) of the complex \(M\times M\) matrices endowed with the Euclidean inner product. If we write \(d_{U(M)}\) for the distance we use on U(M), and \(d_{\mathcal {M}_M}\) for the one on \(\mathcal {M}_M(\mathbb {C})\), then

$$\begin{aligned} \ell _3\ge d_{U(M)} (U_a,U) \ge d_{\mathcal {M}_M} (U_a,U). \end{aligned}$$

The second inequality holds since the distance between two points is the infimum of the lengths of the curves joining them, and every curve in U(M) is in particular a curve in \(\mathcal {M}_M(\mathbb {C})\). Now it is easy to find a minimal geodesic of \(\mathcal {M}_M(\mathbb {C})\) joining \(U_a\) and U, namely the segment \(\gamma _4(t)=U_a+(U-U_a)t\) for \(t\in [0,1]\). Since \(\dot{\gamma _4}(t)=U-U_a\), we find that \( d_{\mathcal {M}_M} (U_a,U)=\left\| \dot{\gamma _4}(t)\right\| \cdot 1=\left\| U-U_a\right\| \) and, therefore,

$$\begin{aligned} \ell _3 \ge \left\| U-U_a\right\| . \end{aligned}$$

This inequality together with (6.2) yields

$$\begin{aligned} \left\| U-U_a\right\| ^2 \le \ell _3^2 \le \left\| v_N\right\| ^2, \end{aligned}$$

and we obtain, finally,

$$\begin{aligned} \left\| U-U_a\right\| \le \left\| v_N\right\| . \end{aligned}$$
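The inequality \(\left\| U-U_a\right\| \le \left\| v_N\right\| \) can be probed numerically: the derivation above only used the orthogonality of the decomposition \(v=v_T+v_N\), so any subspace of \(\mathfrak {u}(M)\) can serve as a test bed. A hedged sketch using the diagonal skew-Hermitian matrices of \(\mathfrak {u}(3)\) as a stand-in for \(\mathrm{im}\, d\varphi \) (the true image algebra is constructed in Sect. 7):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(3)
M = 3

# stand-in subspace: diagonal part of u(M); its Frobenius-orthogonal
# complement is the off-diagonal part, so v = v_T + v_N is orthogonal
def project_tangent(v):
    return np.diag(np.diag(v))

for _ in range(50):
    Q, _ = np.linalg.qr(rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M)))
    v = logm(Q)                    # principal logarithm (skew-Hermitian)
    vT = project_tangent(v)
    vN = v - vT
    Ua = expm(vT)
    assert np.linalg.norm(Q - Ua) <= np.linalg.norm(vN) + 1e-9
```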

7 Random unitary matrices and a basis for the image

Before looking into the iterative process that gives the locally optimal approximation to any desired \(U\not \in \mathrm{im}\varphi \), we need to consider a few useful tricks.

While in the previous section we have considered the identity matrix as our starting point in the image group, the results hold for any \(U_0 \in \mathrm{im}\varphi \). The identity has some advantages: it is always in the image, for any values of n and m, and, in terms of computation, it is trivial to generate the \(M\times M\) identity matrix.

However, in general, the landscape of the image group is unknown and numerical experiments show there are multiple local minima. In a 2D space, we can picture the image as an irregular profile with mountains and valleys. For any given starting point, the iterative process ends in a local minimum, but, as we do not know how many minima exist, we need a series of random starting points to sample multiple approximations, each the closest to the intended U in its local neighbourhood.

In order to generate a random unitary \(U_r\in \mathrm{im}\varphi \), we choose a random \(S_r \in U(m)\) from the uniform (Haar) distribution on U(m) and compute \(U_r=\varphi (S_r)\). The random unitaries in U(m) can be generated from samples of a normal distribution [29, 30], and the photonic homomorphism is known [11, 12].
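The recipe of [29, 30] can be sketched as follows: QR-factorize a complex Gaussian (Ginibre) matrix, then correct the phases of the columns so that the output is exactly Haar distributed rather than merely unitary:

```python
import numpy as np

def haar_random_unitary(m, rng):
    """Sample S in U(m) from the Haar measure via Ginibre + QR [29, 30]."""
    Z = (rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    # multiply each column by the phase of the matching diagonal entry of R
    return Q * (np.diag(R) / np.abs(np.diag(R)))

rng = np.random.default_rng(4)
S = haar_random_unitary(3, rng)
print(np.allclose(S @ S.conj().T, np.eye(3)))  # → True
```

The evolution \(U_r=\varphi (S_r)\) is then obtained with any of the equivalent expressions for the photonic homomorphism cited in Sect. 2.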

Finally, we need a way to project the logarithm of any \(U\in U(M)\) to the image algebra \(\mathrm{im}d\varphi \). The method is similar to the generation of random unitaries. We start by working on the algebra corresponding to the scattering matrices, \(\mathfrak {u}(m)\), where we can write down a known basis and, using the differential map \(d\varphi \), explicitly given in [18], we can obtain a basis for \(\mathrm{im}d\varphi \), which is a subalgebra of \(\mathfrak {u}(M)\). This basis is not necessarily orthonormal, but it can be orthogonalized and normalized. The details of the whole procedure can be found in [19]. This basis, together with the inner product of Eq. (5.1), is enough to obtain the desired projection on the image algebra.
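The orthogonalization and projection steps can be sketched with a matrix Gram-Schmidt under the inner product (5.1); the toy basis below is an illustrative stand-in for the image-algebra basis obtained through \(d\varphi \) in [18, 19]. Since \(\mathfrak {u}(M)\) is a real vector space, the Gram-Schmidt coefficients must be real, which (5.1) guarantees for skew-Hermitian arguments:

```python
import numpy as np

def inner(u, v):
    # the inner product (5.1); real-valued on skew-Hermitian matrices
    return 0.5 * np.real(np.trace(u.conj().T @ v + v.conj().T @ u))

def orthonormalize(basis, tol=1e-12):
    """Gram-Schmidt on a list of matrices under the inner product (5.1)."""
    ortho = []
    for b in basis:
        for e in ortho:
            b = b - inner(e, b) * e
        norm = np.sqrt(inner(b, b))
        if norm > tol:               # drop linearly dependent elements
            ortho.append(b / norm)
    return ortho

def project(v, ortho_basis):
    return sum(inner(e, v) * e for e in ortho_basis)

# toy check on the real span of two skew-Hermitian matrices in u(2)
b1 = np.array([[1j, 0], [0, -1j]])
b2 = np.array([[0, 1], [-1, 0]], dtype=complex)
E = orthonormalize([b1, b2, b1 + b2])        # redundant third element is dropped
v = np.array([[2j, 1 + 1j], [-1 + 1j, 0]])   # a skew-Hermitian test vector
vT = project(v, E)
assert len(E) == 2
assert np.isclose(inner(v - vT, E[0]), 0) and np.isclose(inner(v - vT, E[1]), 0)
```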

8 An iterative process for the approximation

In Sect. 6 we have constructed an approximation matrix \(U_1:=U_a \in \mathrm {im}(\varphi )\), starting from \(U=\exp {(v)}\) as

$$\begin{aligned} U_1=\exp {((\log {U})_T)}, \ \ \ \ \text{ with } \ \ \ \ \left\| U-U_1\right\| \le \left\| (\log {U})_N\right\| . \end{aligned}$$

Now, we can repeat this by taking a new approximation \(U_2\in \mathrm {im}(\varphi )\) by considering a geodesic triangle of vertices Id, \(U_1^{-1}U\) and \(U_2\), where \(U_1^{-1}U=\exp {(-v_T)}\exp {v}\) and

$$\begin{aligned} U_2=U_1\exp {((\log {U_1^{-1}U})_T)} \ \ \text{ with } \ \ \left\| U-U_2\right\| \le \left\| (\log {U^{-1}_1U})_N\right\| . \end{aligned}$$

Iteratively,

$$\begin{aligned} \left\{ \begin{aligned} U_0=&\mathrm{Id}\\ U_n=&U_{n-1}\exp ((\log (U_{n-1}^{-1}U))_T) \end{aligned}\right. \end{aligned}$$

with

$$\begin{aligned} \left\| U-U_n\right\| \le \left\| (\log (U_{n-1}^{-1}U))_N\right\| . \end{aligned}$$
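The recursion can be sketched in a few lines. The projector below (diagonal part of \(\mathfrak {u}(3)\)) is again a stand-in for the true projection onto \(\mathrm{im}\, d\varphi \), and the recorded values are the geodesic distances \(d_i=\Vert v^i\Vert \), which the next subsection proves to be non-increasing:

```python
import numpy as np
from scipy.linalg import expm, logm

def approximate(U, project_T, steps=20):
    """Iterate U_0 = Id, U_k = U_{k-1} exp((log(U_{k-1}^{-1} U))_T)."""
    Uk = np.eye(U.shape[0], dtype=complex)
    dists = []
    for _ in range(steps):
        v = logm(Uk.conj().T @ U)          # principal log of U_{k-1}^{-1} U
        dists.append(np.linalg.norm(v))    # geodesic distance d(U_{k-1}, U)
        Uk = Uk @ expm(project_T(v))
    return Uk, dists

# stand-in projector onto a subalgebra (illustration only)
diag_T = lambda v: np.diag(np.diag(v))

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
_, dists = approximate(Q, diag_T)
assert all(b <= a + 1e-9 for a, b in zip(dists, dists[1:]))  # monotone decrease
```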

8.1 Convergence

The method described above converges. Consider the sequence \(\{U_i\}\) with \(i\ge 0\) and \(U_0=\mathrm{Id}\), and write \(v^i\) for the principal logarithm of \(U_i^{-1}U\), with tangent and normal components \(v^i_T\) and \(v^i_N\). For the first step we know that \(d(U_1,U)=\Vert v^1\Vert \), \(d(U_2,U)\le \Vert v^1_N\Vert \), and \(d(U_1,U_2)\le \Vert v^1_T\Vert \). In general, we have:

  1. \(d(U_i,U)=\Vert v^i\Vert \),

  2. \(d(U_{i+1},U)\le \Vert v^i_N\Vert \),

  3. \(d(U_i,U_{i+1})\le \Vert v^i_T\Vert \).

Proposition 1

We have that

$$\begin{aligned} d(U_{i+1},U) \le d(U_i, U). \end{aligned}$$
(8.1)

Furthermore, the equality holds if and only if \(U_i=U_{i+1}\).

Proof

By a successive application of inequalities (2) and (1) above we obtain

$$\begin{aligned} d(U_{i+1},U) \overset{(2)}{\le } \Vert v^i_N\Vert \le \Vert v^i \Vert \overset{(1)}{=} d(U_i, U). \end{aligned}$$

Now, if equality (8.1) holds, then \(\Vert v^i_N\Vert = \Vert v^i \Vert \) and so \(\Vert v^i_T\Vert =0 \); inequality (3) above allows us to conclude that \(U_i=U_{i+1}\). The converse is trivial. \(\square \)

Let us define

$$\begin{aligned} d:\mathbb {N} \rightarrow \mathbb {R}, \ \ i \mapsto d_i:=d(U_i,U). \end{aligned}$$

Proposition 2

The sequence \(\{d_i\}\) is convergent.

Proof

Assume that \(U_i \ne U_{i+1}\) for every i (otherwise, there would exist \(n \in \mathbb {N}\) such that \(U_n=U_{n+1}\), and therefore \(d_n=d_{n+1}=\cdots \) and the sequence converges to \(d_n\)). By Proposition 1 the sequence \(\{d_i\}\) is decreasing; in particular \(d_i <d_1\) for all i. This together with the fact that \(d_i \ge 0\) for all i (i.e. the sequence \(\{d_i\}\) is bounded) implies the convergence. \(\square \)

In fact, the approximation given by the described method is the best possible one in the following sense. Since \(\{d_i\}\) is a convergent sequence, it is a Cauchy sequence. This means that for every \(\epsilon >0\) there exists \(n_{\epsilon } \in \mathbb {N}\) with

$$\begin{aligned} 0<d_i-d_{i+1} < \epsilon , \ \ \text{ for } \text{ all } \ i > n_{\epsilon }. \end{aligned}$$

Since, by (1) and (2) we have that \(d_i=\Vert v^i\Vert \) and \(d_{i+1} \le \Vert v^i_N\Vert \), it is easily deduced that

$$\begin{aligned} 0\le \Vert v^i\Vert - \Vert v^i_N\Vert \le d_i - d_{i+1} < \epsilon . \end{aligned}$$
(8.2)

We first observe the following inequality:

Lemma 2

$$\begin{aligned} \Vert v^i -v^i_N\Vert ^2 \le (\Vert v^i\Vert -\Vert v^i_N\Vert )\cdot 2 \cdot d_1. \end{aligned}$$

Proof

$$\begin{aligned} \Vert v^i -v^i_N\Vert ^2 =&\Vert v^i\Vert ^2 + \Vert v^i_N\Vert ^2-2\langle v^i,v^i_N \rangle \\ =&\Vert v^i\Vert ^2 + \Vert v^i_N\Vert ^2-2\langle v^i_T+v^i_N,v^i_N \rangle \\ =&\Vert v^i\Vert ^2 + \Vert v^i_N\Vert ^2-2\Vert v^i_N \Vert ^2 = \Vert v^i \Vert ^2 - \Vert v^i_N \Vert ^2 \\ =&(\Vert v^i \Vert - \Vert v^i_N \Vert )(\Vert v^i \Vert + \Vert v^i_N \Vert ). \end{aligned}$$

Since \(\Vert v^i_N \Vert \le \Vert v^i \Vert \), it holds that

$$\begin{aligned} (\Vert v^i \Vert - \Vert v^i_N \Vert )(\Vert v^i \Vert + \Vert v^i_N \Vert ) \le&(\Vert v^i \Vert - \Vert v^i_N \Vert )2 \Vert v^i \Vert \\ \overset{(1)}{=}&(\Vert v^i \Vert - \Vert v^i_N \Vert ) 2d_i. \end{aligned}$$

Proposition 1 implies that \(d_i \le d_1\); hence, we get

$$\begin{aligned} \Vert v^i -v^i_N\Vert ^2 \le (\Vert v^i \Vert - \Vert v^i_N \Vert ) 2d_1. \end{aligned}$$

\(\square \)

Now, inequality (8.2) together with Lemma 2 implies that

$$\begin{aligned} 0 \le \Vert v^i - v^i_N\Vert ^2 \le 2\epsilon d_1. \end{aligned}$$

On the other hand, \(\Vert v^i-v^i_N \Vert = \Vert v^i_T \Vert \) and, together with inequality (3), this yields

$$\begin{aligned} d(U_i,U_{i+1}) \le \Vert v^i - v^i_N \Vert \le \sqrt{2 \epsilon d_1}. \end{aligned}$$

Therefore the sequence \(\{U_i\}\) is itself a Cauchy sequence. This allows us to apply Theorem 2 (Hopf-Rinow) in the complete metric space \((U(M),d)\), and so the sequence \(\{U_i\}\) converges to a matrix \(\tilde{U} \in U(M)\), with \(\Vert v^{\infty }_T \Vert =0\). Hence the geodesic joining \(\tilde{U}\) with U is normal, i.e. its velocity lies in \((\mathrm{im}\, d\varphi )^\perp \).

9 Application example

We can see an example of the procedure with the quantum Fourier transform

$$\begin{aligned} Q\!F\!T\left| x\right\rangle =\frac{1}{\sqrt{M}}\sum _{y=0}^{M-1}e^{\frac{i2\pi xy}{M}}\left| y\right\rangle . \end{aligned}$$
(9.1)

We choose the QFT not only because it is a useful transformation in quantum information and quantum optics, but also because it can be considered one of the most difficult transformations to realize with linear optics.

We start from the QFT matrix for \(M=3\) (\(n=m=2\))

$$\begin{aligned} \begin{aligned} U=&\frac{1}{\sqrt{3}}\left( \begin{array}{ccc} 1 &{} 1 &{} 1 \\ 1 &{} e^{-i\frac{2\pi }{3}} &{} e^{-i\frac{4\pi }{3}}\\ 1 &{} e^{-i\frac{4\pi }{3}} &{} e^{-i\frac{8\pi }{3}} \\ \end{array} \right) \\ \approx&\left( \begin{matrix}0.57735 &{} 0.57735 &{} 0.57735\\ 0.57735 &{} -0.28868 - 0.5 i &{} -0.28868 + 0.5 i\\ 0.57735 &{} -0.28868 + 0.5 i &{} -0.28868 - 0.5 i\end{matrix}\right) . \end{aligned} \end{aligned}$$
(9.2)

This evolution is impossible to achieve with lossless linear optics alone, as can be checked with the method in [19].

For the implementation of U, we consider the states in the ordered basis \(\{\left| 20\right\rangle ,\left| 02\right\rangle ,\left| 11\right\rangle \}\). All the results are given with 5 significant digits for matrix entries and 10 significant digits for distances, rounding the imaginary or real parts to zero when they are much smaller than the surrounding terms.

The starting point is the identity matrix, for which \(\left\| U-\mathrm{Id}\right\| =2.449489743\). In the first step of our procedure, we take the projection of the principal logarithm of U into the image algebra:
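This starting distance can be reproduced numerically. A minimal sketch (our own, assuming the distances in this section are Frobenius norms, which matches the quoted values):

```python
import numpy as np

# QFT matrix for M = 3 with the sign convention of Eq. (9.2)
w = np.exp(-2j * np.pi / 3)
U = np.array([[1, 1, 1],
              [1, w, w**2],
              [1, w**2, w**4]]) / np.sqrt(3)

# Distance from the identity in the Frobenius norm
d0 = np.linalg.norm(U - np.eye(3))
print(round(d0, 9))  # 2.449489743 = sqrt(6)
```

The value \(2.449489743\) is exactly \(\sqrt{6}\), since each off-diagonal entry of \(U\) contributes \(1/3\) to the squared norm.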

$$\begin{aligned} \log (U)=\left( \begin{matrix}- 0.6639 i &{} 0.9069 i &{} 0.9069 i\\ 0.9069 i &{} - 2.0242 i &{} - 0.45345 i\\ 0.9069 i &{} - 0.45345 i &{} - 2.0242 i\end{matrix}\right) \end{aligned}$$

and

$$\begin{aligned} \log (U)_T= \left( \begin{matrix}- 0.89062 i &{} 0 &{} 0.22672 i\\ 0 &{} - 2.251 i &{} 0.22672 i\\ 0.22672 i &{} 0.22672 i &{} - 1.5708 i\end{matrix}\right) \end{aligned}$$

so that

$$\begin{aligned} U_1=\left( \begin{matrix}0.61786 - 0.75486 i &{} 0.024514 i &{} 0.20595 + 0.073541 i\\ 0.024514 i &{} -0.61786 - 0.75486 i &{} 0.20595 - 0.073541 i\\ 0.20595 + 0.073541 i &{} 0.20595 - 0.073541 i &{} - 0.95097 i\end{matrix}\right) \end{aligned}$$

with \(\left\| U-U_1\right\| =1.770101749\).
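The projection step can be checked explicitly. In the ordered basis \(\{\left|20\right\rangle,\left|02\right\rangle,\left|11\right\rangle\}\), the image algebra for \(n=m=2\) is spanned by the two-photon images of the \(u(2)\) generators. The sketch below (our own construction, taking the printed value of \(\log(U)\) as input) projects onto that real span with respect to the inner product \(\langle A,B\rangle = \mathrm{Re}\,\mathrm{tr}(A^\dagger B)\) and recovers \(\log(U)_T\) and \(U_1\):

```python
import numpy as np

s2 = np.sqrt(2.0)
# Two-photon images of the u(2) generators in the basis (|20>, |02>, |11>)
Na = np.diag([2.0, 0.0, 1.0])                                          # a†a
Nb = np.diag([0.0, 2.0, 1.0])                                          # b†b
X = np.zeros((3, 3)); X[0, 2] = X[2, 0] = X[1, 2] = X[2, 1] = s2       # a†b + b†a
Y = np.zeros((3, 3)); Y[0, 2] = Y[2, 1] = s2; Y[2, 0] = Y[1, 2] = -s2  # a†b - b†a
basis = [1j * Na, 1j * Nb, 1j * X, Y]       # skew-Hermitian generators

def project(A, basis):
    """Orthogonal projection of A onto the real span of `basis`
    w.r.t. <A, B> = Re tr(A† B), via a real least-squares fit."""
    B = np.array([np.concatenate([b.real.ravel(), b.imag.ravel()])
                  for b in basis]).T
    v = np.concatenate([A.real.ravel(), A.imag.ravel()])
    c, *_ = np.linalg.lstsq(B, v, rcond=None)
    return sum(ci * bi for ci, bi in zip(c, basis))

# log(U) as printed in the text (rounded entries)
logU = 1j * np.array([[-0.66390, 0.90690, 0.90690],
                      [0.90690, -2.02420, -0.45345],
                      [0.90690, -0.45345, -2.02420]])
logU_T = project(logU, basis)

# Exponentiate the skew-Hermitian projection via an eigendecomposition
w_, V = np.linalg.eigh(-1j * logU_T)
U1 = V @ np.diag(np.exp(1j * w_)) @ V.conj().T

w3 = np.exp(-2j * np.pi / 3)
U = np.array([[1, 1, 1], [1, w3, w3**2], [1, w3**2, w3**4]]) / np.sqrt(3)
print(round(np.linalg.norm(U - U1), 4))  # close to the 1.770101749 quoted above
```

Note that the projected entry at position (1, 2) vanishes automatically, since no generator of the image algebra connects \(\left|20\right\rangle\) and \(\left|02\right\rangle\).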

After 10 steps, we have:

$$\begin{aligned} U_{10}=\left( \begin{matrix}0.86432 - 0.50294 i &{} 2.889 \cdot 10^{-6} i &{} 0.0020837 + 0.0011983 i\\ 2.889 \cdot 10^{-6} i &{} -0.86432 - 0.50294 i &{} 0.0020837 - 0.0011983 i\\ 0.0020837 + 0.0011983 i &{} 0.0020837 - 0.0011983 i &{} - 0.99999 i\end{matrix}\right) \end{aligned}$$

with \(\left\| U-U_{10}\right\| =1.732054756\). Further iterations only produce marginal improvements in the distance to the target matrix beyond the fifth decimal place.

We stop after 10 additional steps and take as our approximation

$$\begin{aligned} U_{20}=\left( \begin{matrix}0.86601 - 0.50003 i &{} 0 &{} 0\\ 0 &{} -0.86601 - 0.50003 i &{} 0\\ 0 &{} 0 &{} - 1.0 i\end{matrix}\right) \end{aligned}$$

with \(\left\| U-U_{20}\right\| =1.732050808\).
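The limit point has a simple closed form: \(U_{20}\) is numerically close to \(\mathrm{diag}(e^{-i\pi /6}, e^{-i5\pi /6}, e^{-i\pi /2})\), whose Frobenius distance to U is exactly \(\sqrt{3}\). A quick check (our own):

```python
import numpy as np

w3 = np.exp(-2j * np.pi / 3)
U = np.array([[1, 1, 1], [1, w3, w3**2], [1, w3**2, w3**4]]) / np.sqrt(3)

# Closed-form candidate for the limit of the iteration (cf. U_20 above)
D = np.diag(np.exp(-1j * np.array([np.pi / 6, 5 * np.pi / 6, np.pi / 2])))

print(round(np.linalg.norm(U - D), 9))  # 1.732050808 = sqrt(3)
```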

9.1 Random initial matrices

In fact, the results hold for an arbitrary starting point \(U_0\in \mathrm{im}\varphi \). The procedure then becomes computationally more involved, as we need an explicit evaluation of \(U_r=\varphi (S_r)\) for random matrices \(S_r\), with a complexity which grows combinatorially in n and m. However, as this second example shows, we can explore different local optima.
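For \(n=m=2\), the evaluation \(U_r=\varphi (S_r)\) can be written down explicitly: in the ordered basis \(\{\left|20\right\rangle,\left|02\right\rangle,\left|11\right\rangle\}\), \(\varphi (S)\) is the symmetric square of S. A sketch (our own, with a standard QR-based Haar sampler) that draws random scattering matrices and maps them into the image group:

```python
import numpy as np

def phi(S):
    """Two-photon image of a 2x2 scattering matrix S in the
    ordered basis (|20>, |02>, |11>): the symmetric square of S."""
    a, b = S[0, 0], S[0, 1]
    c, d = S[1, 0], S[1, 1]
    s2 = np.sqrt(2)
    return np.array([[a * a, b * b, s2 * a * b],
                     [c * c, d * d, s2 * c * d],
                     [s2 * a * c, s2 * b * d, a * d + b * c]])

def haar_unitary(n, rng):
    """Haar-random n x n unitary via QR with phase correction."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

rng = np.random.default_rng(0)
S = haar_unitary(2, rng)
U0 = phi(S)  # a random starting point in the image group
# phi is a group homomorphism: phi(S1 @ S2) == phi(S1) @ phi(S2)
```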

For the QFT matrix in Eq. (9.2), if we start at random points, the approximation gravitates towards three solutions. Two of them are at the same distance from the QFT matrix as the approximation we found starting from the identity matrix, 1.7320, with

$$\begin{aligned} U_a^1=\left( \begin{matrix}0.86602 - 0.5 i &{} 0 &{} 0\\ 0 &{} -0.86602 - 0.5 i &{} 0\\ 0 &{} 0 &{} - 1.0 i\end{matrix}\right) \end{aligned}$$

and

$$\begin{aligned} U_a^2=\left( \begin{matrix}0 &{} 0.86602 + 0.5 i &{} 0\\ 0.86602 + 0.5 i &{} 0 &{} 0\\ 0 &{} 0 &{} -0.86602 - 0.5 i\end{matrix}\right) . \end{aligned}$$

There is a third solution, at distance 0.85675 from the QFT, given by the matrix:

$$\begin{aligned} U_a^3=\left( \begin{matrix}0.43301 + 0.25 i &{} 0.43301 - 0.25 i &{} 0.70711\\ 0.43301 - 0.25 i &{} - 0.5 i &{} -0.35355 + 0.61237 i\\ 0.70711 &{} -0.35355 + 0.61237 i &{} 0\end{matrix}\right) . \end{aligned}$$

These solutions seem to be found with roughly equal probability: in a run of 1000 experiments we found \(U_a^1\) 311 times, \(U_a^2\) 374 times and \(U_a^3\) 315 times. Further experiments showed a similar behaviour.

The best approximation to the \(3\times 3\) QFT matrix, \(U_a^3\), comes from the scattering matrix:

$$\begin{aligned} S_a^3=\left( \begin{matrix}0.68301 + 0.18301 i &{} 0.68301 - 0.18301 i\\ 0.68301 - 0.18301 i &{} -0.5 + 0.5 i\end{matrix}\right) . \end{aligned}$$
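That \(S_a^3\) indeed maps to \(U_a^3\) can be verified with the explicit two-photon map for \(m=2\) (our own implementation of \(\varphi \) as the symmetric square of S in the ordered basis \(\{\left|20\right\rangle,\left|02\right\rangle,\left|11\right\rangle\}\)):

```python
import numpy as np

def phi(S):
    """Two-photon image of a 2x2 scattering matrix S in the
    ordered basis (|20>, |02>, |11>)."""
    a, b, c, d = S[0, 0], S[0, 1], S[1, 0], S[1, 1]
    s2 = np.sqrt(2)
    return np.array([[a * a, b * b, s2 * a * b],
                     [c * c, d * d, s2 * c * d],
                     [s2 * a * c, s2 * b * d, a * d + b * c]])

Sa3 = np.array([[0.68301 + 0.18301j, 0.68301 - 0.18301j],
                [0.68301 - 0.18301j, -0.5 + 0.5j]])
Ua3 = phi(Sa3)  # reproduces the matrix U_a^3 above

w3 = np.exp(-2j * np.pi / 3)
U = np.array([[1, 1, 1], [1, w3, w3**2], [1, w3**2, w3**4]]) / np.sqrt(3)
print(round(np.linalg.norm(U - Ua3), 5))  # close to the 0.85675 quoted above
```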

In general, depending on the starting point, there are different local optima. Sometimes two different approximation matrices in the image group give the same distance to the target unitary, as happens in this example for \(U_a^1\) and \(U_a^2\).

9.2 Translation to an optical setup

The best approximation to the QFT comes from the scattering matrix \(S_a^3\), which corresponds, up to a \(\frac{5\pi }{12}\) global phase, to a balanced two-input beam splitter with a scattering matrix

$$\begin{aligned} S_{BS}=\frac{1}{\sqrt{2}}\left( \begin{matrix} 1 &{}{} i \\ i &{}{} 1\end{matrix}\right) \end{aligned}$$

preceded by a \(\frac{-\pi }{3}\) phase shifter in the first port and followed by a \(\frac{\pi }{3}\) phase shifter in the second port. The phase shifters have scattering matrices

$$\begin{aligned} S_{-\frac{\pi }{3}}=\left( \begin{matrix} e^{-i\frac{\pi }{3}} &{} 0 \\ 0 &{} 1\end{matrix}\right) \text { and } S_{\frac{\pi }{3}}=\left( \begin{matrix} 1 &{} 0 \\ 0 &{} e^{i\frac{\pi }{3}}\end{matrix}\right) , \end{aligned}$$

respectively.

The experimental configuration corresponding to this approximation to the QFT is shown in Fig. 3.

Fig. 3

Linear optics interferometer approximating the QFT for a system with two inputs and two photons. The system uses two phase shifters (with shifts \(-\frac{\pi }{3}\) and \(\frac{\pi }{3}\) in the upper and lower mode, respectively) and one balanced beam splitter (centre). This configuration gives one possible approximation, \(U_a^3\), which has matrix distance 0.85675 to the QFT

10 Summary, recommendations and future improvements

We have given an iterative method that finds a linear optical setup that approximates any arbitrary quantum evolution for n photons in m modes. The approximation is optimal in the local neighbourhood of the initial guess. Once we have the closest unitary that can be implemented, \(\widetilde{U}\), we can use a previous method [19] to obtain a scattering matrix \(\widetilde{S}\) that gives the approximated evolution, and there are multiple algorithms that give the physical setup corresponding to \(\widetilde{S}\) using only beam splitters and phase shifters [4, 7].

This gives a design method which starts from the desired evolution, in contrast with the usual analysis methods which, given a classical description of the linear optical system, find the evolution it induces on quantum states.

The proposed algorithm is based on results from differential geometry, in particular Toponogov's theorem, which show that the method converges to a local optimum.

There are a few practical details worth mentioning. First, we find numerically that if we apply the method to an evolution which can be obtained exactly from a linear optical system, \(U\in \mathrm{im}\varphi \), it sometimes converges to a matrix close to the actual solution but, depending on the local landscape, it may instead fall into one of several different matrices, all at a similarly large distance. In practical applications, the recommendation is to first check whether an exact implementation exists (with our previous algorithm [19]) and, only if there is none, to look for an approximation.

There are also some open problems. The presented results are only valid for lossless linear systems. Linear optical systems including losses and squeezing can be described using quasiunitary matrices [13, 22], which can also be studied using group theory. However, some technical complications appear and the theory of these linear systems is not as developed as in the lossless case. We leave for future research the adaptation of the presented design methods to these scenarios.

Additionally, from the structure of the involved groups and algebras, it is not clear how many local optima exist for any given transformation \(\varphi \). We have proposed a randomized way of exploring the space of potential approximation matrices and, for the limited dimensions that can be numerically explored, it seems to work well, but there is no guarantee the method finds a global minimum. Any further knowledge of the structure of the image group would help in the search for a global minimum or, at least, in finding a probabilistic bound on the optimal approximation. Furthermore, a lower bound on \(\left\| U-U_a\right\| \), as opposed to the upper bound our method provides, would help to determine the global optimum.

Even in its present form, the proposed algorithm, when combined with previous results, can assist in the design of quantum optical operations and has applications to quantum information and quantum optics experiment design.