Fast reconstruction of programmable integrated interferometers

Programmable linear optical interferometers are important for classical and quantum information technologies, as well as for building hardware-accelerated artificial neural networks. Recent results showed the possibility of constructing optical interferometers that could implement arbitrary transformations of input fields even in the case of high manufacturing errors. The building of detailed models of such devices drastically increases the efficiency of their practical use. The integral design of interferometers complicates its reconstruction since the internal elements are hard to address. This problem can be approached by using optimization algorithms [Opt. Express 29, 38429 (2021)]. In this paper, we present a novel efficient algorithm based on linear algebra only, which does not use computationally expensive optimization procedures. We show that this approach makes it possible to perform fast and accurate characterization of high-dimensional programmable integrated interferometers. Moreover, the method provides access to the physical characteristics of individual interferometer layers.


Introduction
The use of classical and quantum properties of light opens up new possibilities for information processing methods [1]. Programmable optical circuits play an important role in classical and quantum optical communications [2][3][4][5], in quantum computing [6][7][8], and in machine learning problems [9][10][11]. The key element of such devices is a programmable linear optical interferometer. When control parameters are fixed, it performs a linear transformation of input fields: where is × complex transfer matrix of the interferometer. By changing the values of the control parameters one can change . This dependence is determined by the interferometer internal structure, i.e. by the position of beam splitters (both two-and multichannel) and phase shifters [12][13][14][15]. In the absence of losses, such structures are capable of implementing an arbitrary unitary transformation of the fields. Moreover, a proper choice of the architecture can achieve this even in the presence of high manufacturing errors [14][15][16].
One can tune the interferometer to some target unitary transformation in two ways. The first way is empirical. For each , one optimizes the control parameters in order to bring the experimental results of optical fields measurements closer to the target ones [17]. This does not require building an exact model of interferometer internal structure, but is impractical, since it is necessary to measure all matrix elements at each step. For each new target matrix , the procedure must be repeated. Another way is to first build a complete mathematical model of the interferometer based on the results of some set of measurements [18]. Then, the search for the desired set of control parameters is performed without any additional measurements. This procedure faces some difficulties. Since large-scale linear optical devices can be implemented in the integral design only, it is almost impossible to split the large-scale reconstruction problem into several low-scale ones. It needs additional optical outputs or modular optical circuits [19] that causes additional losses and instability. Trying to characterize an integrated device as a whole, one has to face the fact that the results of each measurement are determined by a large number of internal device parameters. As a result, the efficiency of the optimization procedure rapidly decreases with increasing dimension .
In this paper, we present a method for constructing a model of a programmable integrated interferometer based on algebraic operations only. This approach needs limited number of operations and is much more computationally efficient than the one using numerical optimization. The method is based on the layered interferometer architecture [15], which is described in Section 2. Further in Section 3, we describe the procedure for reconstructing interferometer mixing layers. As in [18], it requires the measurement of all matrix elements of the complex transfer matrix (up to a global phase) for a given configuration of phase layers. Common methods for measuring the transfer matrices of optical interferometers using laser [20], thermal [21], and two-photon [22,23] fields make it possible to measure matrix elements up to input and output phases only. A non-ambiguous measurement of the matrix can be performed using homodyne detection of all the input and output coherent fields in (1) and requires ( 2 ) measurements [24]. In Section 4, we show that the proposed algebraic approach allows one to predict the transformation matrix for an arbitrary configuration of phase layers with an accuracy not lower than that for the method based on numerical optimization. The mixing layers are determined ambiguously, however, in the case of homogeneous losses, our method makes it possible to uniquely determine the normalized transmission coefficients of each individual layer. In addition, we show that the method is highly robust to phase layers control errors.

Interferometer model
Below we consider the universal layered interferometer architecture consisting of alternating mixing and phase layers [15,18] (Fig. 1). The corresponding transfer matrix has the form Here and Φ are transfer matrices of the mixing and phase layers, respectively. Each mixing layer is an -channel transformation that sufficiently distributes the signal of each input channel to all the output channels. All matrices can be different. Ideally, they are unitary, but this property is lost in the presence of linear losses. For the phase layer, each channel is subjected to a controlled phase shift through a thermo-or electro-optical effect. The corresponding transfer matrix has a diagonal form: Expression (2) can also be used to describe any other interferometer architecture (for example, the Reck [12] or Clements [13] architecture). In this case, can be the transfer matrices of beam splitters acting on two channels only, and the phase layers do not control all channels, but only a few ones.
With the number of mixing layers = , the layered architecture makes it possible to obtain an arbitrary transformation due to the proper choice of phases ( ) . We mostly consider this particular case.
The problem of constructing an interferometer model is reduced to determining all transfer matrices and Φ . The dependence of ( ) on the control parameters can be pre-calibrated even for unknown . For this, a simple sin dependence of the interferometer output intensity on the value of any single controlled phase is used (all other phases are disabled) [2]. Below we assume that all phase layers are pre-calibrated (later we will show that our reconstruction method is robust to phase control errors). Reconstruction of the matrices is complicated by the fact that there is no way to "turn off" all mixing layers, except for one, and perform its measurement. However, in the next section, we show that it is possible to obtain a matrix equation that depends on a single mixing layer only by an appropriate choice of phase layers configurations. This equation is reduced to an eigendecomposition problem, which is solved algebraically.

Mixing layers reconstruction
In this section, we assume that we have full control over the phase layers. Later in Section 4, we show that the proposed method is robust to control errors. We also assume that none of the mixing layers has absolute losses in any channel, so all the matrices 1 , . . . are invertible. We also do not restrict them to be unitary.
We describe the reconstructing algorithm in several steps. First, we consider the ideal case when, for any configuration of the interferometer, we know exactly its full transfer matrix . Next, we consequentially take into account the impossibility of experimental determination of the global phase of , the presence of errors in measurements and phase layers control, and the special case of homogeneous losses inside the interferometer. Finally, we present the pseudo-code for the complete algorithm. Its software implementation is available online [25].

Ideal case
Consider two configurations of the interferometer phase layers. The first configuration has all the phase layers disabled. The corresponding transfer matrix is 0 = . . . 2 1 . Next, the phases { , = 1, . . . , } are set in the second phase layer: 1 = . .
Since the matrix Φ is known and diagonal, this equation is solved with respect to 1 by the diagonalization of 1 . By induction, one can iteratively obtain all mixing matrices up to −1 . In particular, we enable only the ( + 1)-th phase layer to obtain the -th mixing layer: Using and the results for all previous − 1 matrices we compute the following matrix: The following equation determines its eigendecomposition: Since eigenvectors diagonalize , we can use it to construct −1 . To do this, one must sort the solutions of (6) in such a way that matches eigenvalues with the diagonal of Φ. Let us define the sorting indices = { : = exp( )}. Then the inverse transfer matrix has the form This procedure goes successively for = 1, . . . , − 1. Finally, the transfer matrix of the last mixing layer is = 0 The procedure does not involve the first and last phase layers.
Note that for the correct work of the described method it is necessary that all from (6) (so the phases ) must be different. Eigenvectors with equal eigenvalues cannot be uniquely defined, this leads to errors in constructing the columns of −1 .

Ambiguity
The described algorithm determines all mixing layers of the interferometer up to following transformations: where Λ are arbitrary invertible diagonal matrices. Such ambiguity is caused by the fact that the equation (6) is invariant under the multiplication of the eigenvector by an arbitrary number. This, however, is not the limitation of the proposed method, but the consequence of the impossibility to address each individual mixing layer separately in the experiment. Indeed, let us write the transfer matrix of the interferometer with arbitrary control phases taking (8) into account: which is equivalent to the initial transfer matrix. We have used the commutativity of diagonal matrices Λ and Φ in the second line of the equation (9). Thus, transformations (8) cannot be observed in the experiment, and they do not affect the ability of the model to predict the transfer matrix for any configuration of phase layers. Physically this means that we cannot distinguish the losses and constant phase shifts at the output of one mixing layer from the losses and phase shifts at the input of the next mixing layer (Fig. 2). Fig. 2. Illustration of the impossibility to distinguish the losses and phase shifts at the output of one mixing layer from the losses and phase shifts at the input of the next mixing layer. Boxes with symbols and denote phase shifts and linear losses, respectively.

Effect of the global phase
The approach described above gives an exact solution, when the matrices are exactly known. In practice, the elements of the transfer matrices are measured experimentally and, therefore, differ from the original ones. These differences are primarily related to the undetectable global phase. In this regard, in equation (6), we obtain eigenvalues = exp , where the set of phases { } differs from the original set { } by some constant value . It makes the matching between { } and { } ambiguous and could violate the sorting of eigenvectors. For example, a set of phases from Fig. 3a is invariant under 2 / rotation and can't be used to determine in a unique way. This problem is solved by an appropriate choice of controllable phases such that { + (mod 2 )} ≠ { } for any non-trivial . Consider the example presented in Fig. 3b: One can observe that this set is invariant under the full 2 rotations only. Further, it is convenient to move from absolute phases to relative phases between adjacent channels of the phase layer (channel is considered adjacent to channel 1): It is easy to see that the relative phase for the first channel is twice the others for the choice (11): We can then calculate the sorting indices of eigenvalues from (6) by relative phases that are insensitive to the global phase : Systematic and statistical errors also distort the spectrum of the matrix , leading to more general eigenvalues: = exp , where ≈ 1, ≈ + . If the errors are not too large then we can again use the above approach, but instead of an exact equality we look for an approximate one: Note that in practice this minimization is not necessary. The set (11) is sorted in ascending order, so one could also sort all and then find the maximum value of Δ . In view of (13), this value corresponds to the index 1 . The remaining elements will follow it on the unit circle in the anti-clockwise direction (Fig. 3c).

Noise handling
The described algorithm works well on small dimensions. However, as increases, the density of { } values on the unit circle increases. As a result, the eigenvalue sorting rule becomes more noise sensitive (Fig. 3d). One can reduce this effect by activating only a few phases at a time and calculating only a few columns of the matrix −1 . Denote by = { } the set of indices of these columns and select the following set of phases ( Fig. 4a): where < is the size of . The set (16) contains + 1 unique values. After measuring the interferometer transfer matrix and obtaining the matrix from (5), its eigenvalues are sorted according to rule (15). It turns out that the values for 1 ≤ ≤ − are close to each other, and Δ 1 is maximal (Fig. 4b). The remaining values correspond to the desired eigenvectors (Fig. 4c). The smaller the number of simultaneously estimated columns of −1 , the less densely the corresponding phase values can be located. As a result it is possible to resolve the individual eigenvalues of the matrix from (6) quite well even for large values of dimension . In the limit = 1, we reconstruct the columns of the matrix −1 one by one. This, however, complicates the experiment: it is required to measure ∝ / transfer matrices for each mixing layer. The total number of transfer matrices used is = ( − 1) / + 1.
Note that in the above algorithm, the phases of eigenvalues were used for the correct eigenvectors sorting only. At the same time, the absolute values of contain information about the true values of the controlled phases at the phase layer. These values can serve for more accurate calibration of the phase layer, taking its cross-talk into account.

The case of homogeneous losses
The described procedure does not impose any constraints on resulting matrices . In certain cases, it may be known a priory that the interferometer losses are homogeneous. Then each belongs to the set of unitary matrices up to a constant factor 0 describing the homogeneous losses. Thus, one can enhance the accuracy by projecting each to this set. For some non-unitary matrix , the projection is given by Here matrices , and form the singular value decomposition of . The matrices and are unitary, while is non-negative diagonal matrix and equals to the identity matrix for unitary . Within the unitary model, the transformations Λ from (8) are limited by arbitrary phase shifts only (there are no linear loss blocks in Fig. 2). This ambiguity does not affect the normalized transmission coefficients of mixing layers Thus, this coefficients are defined unambiguously. Below we consider the case 0 = 1 since homogeneous losses could be measured separately and compensated by collecting more data.

Algorithm pseudo-code
Algorithms 1 summarizes the described procedure of mixing layers reconstructions. It takes the following parameters as the input: -the number of input and output channels of the interferometer; -the number of mixing layers; -the maximal number of −1 columns, estimated at once. The output of the algorithm are transfer matrices ( = 1, . . . , ) of all mixing layers.
The algorithm software implementation is available online [25].

Numerical simulation
In this section, we analyze the efficiency of the proposed algorithm in the presence of various kinds of errors. In addition, we compare it with the optimization algorithm from [18].

Error models
We define random mixing layers in two steps. In the first step, we generate a random unitary matrix where proj[·] is the unitary projection operation (17), and characterizes the degree of distinction of from the -dimensional Hadamard transform, proportional to the discrete Fourier transform matrix (the reconstruction method is not restricted to this particular type of mixing layers and could by applied to any invertable matrix): In other words, we insert random linear losses of no more than into the "middle" part of each random unitary transformation. Random matrices of the form (21) are generated independently for each mixing layer to create a true model of the programmable interferometer. We denote these matrices as true .
We also simulate the errors of phase layers control. Let { } be the desired set of phases in the -th phase layer. The relative control error of each phase is given by : We simulate the cross-talk between the phases of a the single phase layer as following: where characterizes the cross-talk strength. The errors of the type (22) are simulated independently for each phase layer each time the control phases of the interferometer are changed.
An important condition is the absence of cross-talk between different phase layers. This condition is usually satisfied in the experiment, since the corresponding control elements (electrodes or heaters) are usually quite far apart on the optical chip.
We simulate the statistical noise of the transfer matrix measurements by a random Gaussian error for each matrix element: where is the noise level, and is the measurement protocol size (the number of transfer matrices to measure). The value includes all the statistical errors in the transfer matrix measurement procedure (e. g. described in [24]) and depends on the experiment details. The correction √ is based on the standard assumption that statistical errors are inversely proportional to the square root of the acquisition time. It allows to compare different methods under the conditions of a fixed data acquisition time. For example, let Method A requires to measure 50 transfer matrices and Method B requires 100. Then, if the experiment duration is limited, one needs to reduce the measurement time twice for each matrix for Method B (we neglect the phase layers configuration switching time). This relatively increases the variance of matrix elements estimates by a factor of 2 for Method A. The linear error increases √ 2 times. The error (24) is generated independently each time, when the algorithms need to obtain a new transfer matrix.

Benchmarks
After the numerical experimental data generation we reconstruct the interferometer model. Then we estimate the ability of the model to predict the full transfer matrix given arbitrary phase layer configuration (each phase of each phase layer is chosen randomly from 0 to 2 ) by calculating the fidelity between the predicted pred and true true transfer matrices of the entire interferometer (the normalized squared dot product of the matrices): The quantity Δ = 1 − , in turn, plays the role of the distance between the matrices, which is insensitive to the value of the undetectable global phase (similar to the distance between two quantum processes [26]). We consider the 95-th percentile Δ 95 in a test set of 1000 random phase layer configurations [27]. Note that the value of Δ 95 is affected not only by the quality of the mixing layers reconstruction, but also by the presence of phase layer control errors in true , which are assumed to be zero in pred . In Section 3.5, we pointed out that the method makes it possible to uniquely determine the normalized transmission coefficients (18) of mixing layers. Therefore, we also consider the measure It characterizes the maximal error of transmission coefficients estimation over the all mixing layers.
Here true and rec are respectively the true and reconstructed transmission matrices of the -th mixing layer. Both Δ 95 and Δ max are insensitive to homogeneous losses. Considering Δ max in addition to Δ 95 is useful for two reasons. First, Δ 95 is an integral estimate that includes both errors in determining mixing layers and errors in controlling phase layers. Since the latter remains unknown, Δ 95 may be quite high even if each mixing layer is determined accurately. Secondly, the determination of the transmission coefficient is of independent interest for debugging the manufacturing technology of integrated interferometers. The use of Δ max , however, makes sense for the unitary model only (see Section 3.5). In the presence of heterogeneous losses, the matrices rec are determined ambiguously and it makes no sense to compare them directly with true .

Comparison with optimization method
Let us consider the case when the number of interferometer channels and the number of mixing layers are 5 ( = = 5), and there are no linear losses ( = 0) and phase control errors ( = = 0). We generate a random programmable interferometer according to Section 4.1 and perform the reconstruction of all its mixing layers according to Section 3. As in [18], we use the model of unitary mixing layers. The initial data used in the algorithm is also fed to the input of the algorithm from [18] with an open source code. We use the version of this algorithm that uses the approximate prior knowledge of mixing layers transfer matrices. Following the example from the library files, the following algorithm parameters are chosen: mini-batch size is five, the prior knowledge coefficient = 0.24. The number of training epochs varies from 20 to 1500 depending on the dimension and the measurement protocol (the value is high enough to reliably rich the accuracy saturation). In total, we considered 3 different measurement protocols: 1. First, all phases are set to zero. Next, 20 configurations are considered. In each one a single non-zero phase is set equal to 2 /3. Each of these 20 configurations allows one to define = 1 column of the inverse transfer matrix of one mixing layer (see Section 3.4). The total number of configurations is = 21.
2. First, all phases are set to zero. Next, 4 configurations are considered. In each one the phases (11) are set on one of the phase layers. Each of these 4 configurations allows one to determine all = 5 columns of the inverse transfer matrix of one mixing layer (see Section 3.4). The total number of configurations is = 5.

A set of
= 300 random interferometer configurations is considered (only for the optimization algorithm from [18]). Fig. 5 shows the simulation results. In the absence of phase layers control errors, the dependencies of Δ max (Fig. 5a) and Δ 95 (Fig. 5b) on the measurement error turned out to be close, regardless of the measurement protocol and the reconstruction method used. The closeness is so high that plotting them in the same axes on a log scale make the distinction barely seen. Approximately, we have observed Δ max ∝ and Δ 95 ∝ 2 . From Fig. 5c it can be seen that the presence of phase layer control errors does not affect our method's ability to estimate the mixing layer transmission coefficients [ ] : it is limited by the interferometer measurements accuracy only. This is because the algorithm uses the phases values only to do the correct sorting. However, starting from a certain critical value of the error , the accuracy of determining [ ] sharply decreases, since the sorting is violated. The value of can be increased by decreasing the number of unique phases at each iteration (it is defined by the parameter of the algorithm). The optimization algorithm from [18] relies on the particular phase values, so the control errors impair the estimation of [ ] . Despite this, the algorithm manages to train the entire interferometer as a whole with an error Δ 95 not higher than that provided by our algorithm (Fig. 5d). Approximately, we have observed Δ 95 ∝ 2 . Despite the closeness of the results for optimization algorithm and our algorithm, the latter performs much faster. To show this, consider two extreme versions of our algorithm ( = and = 1). For the optimization algorithm from [18], we use a set of 300 training phase layer configurations. Starting from a certain value, changing this number does not affect the computation speed significantly. The number of epochs is chosen to be minimal in order to guarantee a correct result. On the range of considered dimensions , the presented reconstruction algorithm worked more than two orders of magnitude faster (Fig. 6). We also considered the optimization algorithm in the "black box" model (the absence of any prior knowledge about the mixing layers). In this case, as expected, there was a rapid growth in computational complexity with an increase in from two to six (as in [18], we also failed to obtain the convergence of the optimization algorithm in the "black box" model for > 6). Our algorithm, also based on the "black box" model, showed a much slower increase in computation time. The independent trials of numerical experiments were performed in parallel on 52 CPU cluster; Intel(R) Xeon(R) CPU E5450 @ 3.00GHz.

Algorithm scalability
Above we have mentioned that the total number of required interferometer configurations for our method is = ( − 1) / + 1, where is the number of input and output channels, is the number of mixing layers, and ∈ [1; ] is the number of simultaneously estimated columns of the inverse matrix of a single mixing layer. The time required to measure the transfer matrix for each configuration is estimated as ( 2 ) [24]. Therefore, for = the total measurement time scales as ( 4 / ).
The reconstruction algorithm requires ( ) algebraic operations including matrix multiplication, taking the inverse matrix, and computing matrix eigendecomposition. The complexity of these procedures does not exceed ( 3 ) [28]. For = the total complexity of the entire procedure can be estimated as ( 5 / ).  From these estimates, one can see the number to be an important parameter that affects the algorithm speed. The value = is optimal from the point of view of measurement and computation speed. However, in this case, with increasing , the sensitivity to control errors of phase layers increases: if the error is higher than the critical value , the mixing layers estimates accuracy sharply decreases. At the same time, for a fixed , the value of changes slightly with increasing (Fig. 7a). Note also that for < transmission coefficients matrices are determined with approximately the same accuracy for all mixing layers (Fig. 7b). That is, despite the iterative nature of the algorithm (see eq. (5)), there is no effect of error accumulation.
The prediction error Δ 95 grows polynomially with increasing (Fig. 7c). This is primarily because at a fixed data acquisition time, the size of the measurement protocol increases. As a result, the measurement accuracy of each transfer matrix decreases (see eq. (24)), impairing the mixing layers estimation accuracy. The figure shows that one can slightly improve the accuracy by performing the projection of the mixing layers to the set of unitary matrices (Section 3.5). This, however, does not affect the rate of growth (about Δ 95 ∝ 4 ). When the heterogeneous linear losses are present, the rate of growth becomes higher (about Δ 95 ∝ 5.8 for = 0.1). This can be explained by the fact that the condition number of the transfer matrices of the mixing layers increases when heterogeneous losses occur. As a result, the matrix inversion procedure becomes more sensitive to statistical fluctuations.

Conclusion
We have shown that the estimation of mixing layers in the layered architecture of the programmable integrated optical interferometer is possible without any use of numerical optimization. The proposed algorithm has a single parameter -the number of simultaneously estimated columns of a mixing layer inverse transfer matrix. We have performed the numerical simulation and compared the results with the results obtained using the optimization algorithm. It has been found that under the equal conditions both methods provide the same accuracy, regardless of the measurement protocol. At the same time, the exclusively algebraic approach in our algorithm has shown a significantly lower computation time.
Moreover, our algorithm uses the control phases values only to correctly sort the columns of the adjacent mixing layer. This makes the method almost insensitive to phase control errors up to some critical error value. This critical value can be increased by choosing a smaller value of the parameter . Small , however, increases the number of operations in the reconstruction algorithm, so in practice it is worth choosing an optimal based on the experimental accuracy of phases control. We believe that extending the algorithm to consider the absolute values of the phases can help in determining the phase control errors. This is the subject of further research.
Finally, note that any interferometer architecture (for example, the Reck [12] or Clements [13] architecture) can be brought to the layered architecture considered here by adding controllable phase shifters in some channels. However, the possibility to directly apply our method to an arbitrary architecture remains unclear and is also the subject of further research.