Adaptive stochastic parallel gradient descent approach for efficient fiber coupling

In high-speed free-space optical communication systems, the received laser beam must be coupled into a single-mode fiber at the input of the receiver module. However, propagation through atmospheric turbulence degrades the spatial coherence of the laser beam and poses challenges for fiber coupling. In this paper, we propose a novel method, called adaptive stochastic parallel gradient descent (ASPGD), to achieve efficient fiber coupling. Specifically, we formulate fiber coupling as a model-free optimization problem and solve it in parallel using ASPGD. To avoid converging to local extremum points and to accelerate convergence, we integrate momentum and adaptive gain coefficient estimation into the original stochastic parallel gradient descent (SPGD) method. Simulation and experimental results demonstrate that, compared with the original SPGD method, the proposed method reduces the number of iterations by 50% while maintaining stability. © 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
Free space optical communication (FSOC), a high-speed alternative communication technology between satellites, has attracted increasing attention from researchers [1][2][3][4][5]. In an FSOC system, the long distance between satellites and the slight vibration of the transmitter result in severe beam jitter and degrade the spatial coherence of the laser beam, which dramatically reduces the quality of the link [6]. Ideally, the received laser beam must be coupled into a single-mode fiber (SMF) at the input of the receiver module. If the beam fluctuates owing to external turbulence, tip/tilt aberration is introduced into the wavefront, which then no longer matches the mode field of the SMF. Consequently, the fraction of the beam power coupled into the SMF, i.e., the coupling efficiency (CE), decreases [7][8][9]. Generally, adaptive optics is an effective method to compensate for wavefront aberration. The fast steering mirror (FSM) is the primary control unit for steering the beam from the laser to improve CE in the fiber coupling system [10][11][12][13].
Due to the complexity of the system, it is challenging to formulate the fiber coupling system explicitly. As a result, researchers usually treat it as a black-box system and formulate fiber coupling as a model-free optimization problem. Various approaches have been proposed to perform fiber coupling, for instance, stochastic gradient descent (SGD) [14], hill climbing [15], and random search methods [15]. However, these methods all optimize the control variables sequentially, which dramatically limits their efficiency in fiber coupling.
To accelerate the optimization process, the stochastic parallel gradient descent (SPGD) method is adopted to achieve fiber coupling in parallel. SPGD was first applied to adaptive optics problems by Vorontsov et al. in 1997 [14]. Since then, many applications of the SPGD method have been presented [16][17][18][19][20][21][22][23]. However, the SPGD method may converge to local extremum points, and its convergence can be extremely slow [22], which limits its use in real-world applications, especially in complex systems. In recent years, a few attempts have been made to speed up convergence and/or avoid converging to local extremum points. For example, in 2012, Chen et al. improved the SPGD method for satellite-to-ground laser communication links [18]. In 2013, Geng et al. proposed using a divergence cost function as the merit function for the SPGD method [19]. In 2015, Wu et al. proposed the multi-perturbation SPGD method, which converges faster than the original SPGD method [20]. In 2017, Yang et al. improved the SPGD method to avoid local extremum points in incoherent beam combination [21]. In 2018, Huang et al. deployed the precisely-delayed SPGD method for adaptive SMF coupling in free space optical communication [22]. Although these methods have achieved promising results, most of them were proposed for specific optical problems and cannot be directly applied to achieve efficient fiber coupling.
In this paper, we propose a novel method, called adaptive stochastic parallel gradient descent (ASPGD), to achieve efficient fiber coupling. Specifically, inspired by the Adam optimizer [24,25], which is widely used to optimize the connection weights of deep neural networks, we integrate momentum and adaptive gain coefficient estimation into the original SPGD method. The novelty and main contributions of this work are two-fold: 1) an improved SPGD method is proposed to solve the model-free optimization problem in parallel. It is capable of escaping local extremum points and accelerating convergence. At the same time, it adaptively sets the gain coefficient of each control variable, which makes ASPGD more robust to the learning rate; and 2) we apply the proposed ASPGD method to achieve efficient fiber coupling in a real-world system, which can further advance FSOC research. Extensive simulations and experiments have been conducted. The results demonstrate that, compared with the original SPGD method, the proposed method reduces the number of iterations by 50% while maintaining stability, which verifies its effectiveness and efficiency.

Problem formulation
In FSOC between satellites, the vibration of the satellite platform on which the FSO terminals are mounted induces wavefront tip/tilt aberration in the beam, degrading the coupling efficiency (CE) of the beam into the single-mode fiber. Fortunately, optical fiber coupling has proven to be an important technique for adaptive optics tasks [14], and it can effectively improve the fiber CE of the system. As shown in Fig. 1, the laser beam is reflected by the mirror, perturbed by the disturbing fast steering mirror (FSM), corrected by the coupling FSM, and then enters the power meter. The disturbing FSM is used to simulate atmospheric turbulence and the vibration of the satellite platform. The coupling FSM is the primary control unit for steering the beam from the laser into the SMF to improve CE in the fiber coupling system, and the power meter serves as the sensor that measures the energy coupled into the optical fiber. The goal of fiber coupling is to control the FSM so as to reach the maximal coupled energy by adjusting the control variables. To formulate this fiber coupling system, we take the power meter measurement as the objective function $J$, which depends on the FSM voltage parameters $u_1$ and $u_2$ as $J = g(u_1, u_2)$. Even though the function $g$ is not explicitly defined, its value can be obtained by reading the power meter, and it is assumed to be differentiable with respect to the FSM voltage parameters $u_1$ and $u_2$ [23]. Thus, fiber coupling can be achieved by searching for the FSM voltage parameters that maximize $J$.

ASPGD
The original SPGD method is widely used in adaptive optics (AO) to correct the spot jitter error caused by atmospheric turbulence and mechanical jitter at the receiving equipment, so as to maximize the power meter reading. It can also be used to solve the fiber coupling problem formulated above. In the original SPGD method, the gradient of the objective function is estimated by applying small random perturbations $\Delta u_k$ to all control variables $u_k$ ($k = 1, \ldots, N$) simultaneously. Following [26], we define the change in the objective function as

$$\delta J = J(u_1 + \Delta u_1, \ldots, u_N + \Delta u_N) - J(u_1, \ldots, u_N). \tag{1}$$

By using the Taylor expansion, we can rewrite Eq. (1) as follows:

$$\delta J = \sum_{j=1}^{N} \frac{\partial J}{\partial u_j} \Delta u_j + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{\partial^2 J}{\partial u_i \partial u_j} \Delta u_i \Delta u_j + \cdots, \tag{2}$$

which, after multiplying both sides by $\Delta u_k$, yields the following approximation if we ignore the higher-order terms:

$$\delta J \, \Delta u_k \approx \frac{\partial J}{\partial u_k} (\Delta u_k)^2 + \sum_{j \neq k} \frac{\partial J}{\partial u_j} \Delta u_j \Delta u_k. \tag{3}$$

Reference [26] points out that the last term of Eq. (3) has an expectation value of zero for random, independently distributed perturbations, whereas the term $(\Delta u_k)^2$ always equals $\Delta u^2$, where $\Delta u$ is the perturbation amplitude. Thus, we can approximate the gradient by perturbing all variables simultaneously as

$$g_k = \frac{\delta J \, \Delta u_k}{\Delta u^2} \approx \frac{\partial J}{\partial u_k}. \tag{4}$$

We note that the original SPGD method can learn very slowly when the objective function surface contains a long and narrow valley. In such a situation, the direction of the gradient is almost perpendicular to the long axis of the valley, so the optimizer oscillates back and forth along the short axis and moves very slowly along the long axis. Inspired by [27] and [24], we first introduce a momentum term into the SPGD method to accelerate its convergence. Mathematically, we compute the first momentum at the current time step as

$$m_k^t = \beta_1 m_k^{t-1} + (1 - \beta_1) g_k^t, \tag{5}$$

where $g_k^t$ is the gradient estimate of Eq. (4) at time step $t$, $m_k^{t-1}$ stands for the momentum of the last time step, and $\beta_1$ is a scalar hyper-parameter controlling the decay rate of the past momentum. The momentum depends on both the current gradient and the previous gradients, which helps average out the oscillation along the short axis while accumulating contributions along the long axis [27]. Furthermore, the original SPGD method adopts a single gain rate for all the optimized parameters, and it can be difficult to find a suitable gain rate in real-world fiber coupling systems.
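The simultaneous-perturbation gradient estimate described above can be illustrated with a minimal sketch. It assumes random-sign perturbations of fixed amplitude `delta`; `toy_objective` is a hypothetical smooth stand-in for the power meter reading, not the real system. Averaged over many draws, the estimate should approach the analytic gradient because the cross terms average out.

```python
import random

def spgd_gradient(J, u, delta, rng):
    """One-sided SPGD gradient estimate: perturb all variables at once."""
    # Random sign, fixed amplitude, applied to every control variable together.
    du = [delta if rng.random() < 0.5 else -delta for _ in u]
    # Change in the objective caused by the simultaneous perturbation.
    dJ = J([ui + di for ui, di in zip(u, du)]) - J(u)
    # Every component reuses the same scalar dJ, so one estimate costs
    # two evaluations of J regardless of the number of variables.
    return [dJ * di / delta ** 2 for di in du]

def toy_objective(u):
    # Hypothetical objective with its maximum at (0.3, -0.2).
    return 1.0 - (u[0] - 0.3) ** 2 - 2.0 * (u[1] + 0.2) ** 2

rng = random.Random(0)
u = [0.0, 0.0]
n = 4000
avg = [0.0, 0.0]
for _ in range(n):
    g = spgd_gradient(toy_objective, u, 0.01, rng)
    avg = [a + gi / n for a, gi in zip(avg, g)]

# The analytic gradient at (0, 0) is (0.6, -0.8); avg should be close to it.
print(avg)
```

A single estimate is noisy because of the cross terms between variables; only the average is accurate, which is one motivation for smoothing the estimates over time.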
Following [25], we adjust the gain rate for each parameter in SPGD by introducing a second momentum term:

$$v_k^t = \beta_2 v_k^{t-1} + (1 - \beta_2) (g_k^t)^2, \tag{6}$$

where $g_k^t$ is the SPGD gradient estimate for the $k$-th control variable at time step $t$, $v_k^{t-1}$ stands for the second momentum of the past time step, and $\beta_2$ is a scalar hyper-parameter controlling the decay rate of the second momentum. This term accumulates the weighted squares of the past gradients and thus indicates the uncentered variance of the gradients. In the learning process, we adapt the learning step by dividing by the square root of the second momentum term. Consequently, we update the parameters as follows:

$$u_k^{t+1} = u_k^t + \alpha \frac{m_k^t}{\sqrt{v_k^t} + \varepsilon}, \tag{7}$$

where $\alpha$ is the gain coefficient, the plus sign corresponds to maximizing $J$, and $\varepsilon$ is a small number that avoids numerical problems, typically set to $10^{-8}$. The updating rule in Eq. (7) makes the momentum biased towards its initial value at $t = 0$, especially when $\beta_1$ and $\beta_2$ are close to 1. To address this issue, [25] proposed a correction strategy that computes bias-corrected estimates of the momentum values as:

$$\hat{m}_k^t = \frac{m_k^t}{1 - \beta_1^t}, \qquad \hat{v}_k^t = \frac{v_k^t}{1 - \beta_2^t}. \tag{8}$$

The derivation of Eq. (8) can be found in [25]. Let us initialize the momentum value as zero. Then, the first momentum at time step $t$ can be written as:

$$m_k^t = (1 - \beta_1) \sum_{i=1}^{t} \beta_1^{t-i} g_k^i. \tag{9}$$

Taking expectations of both sides of Eq. (9), we have

$$E(m_k^t) = E\left[(1 - \beta_1) \sum_{i=1}^{t} \beta_1^{t-i} g_k^i\right], \tag{10}$$

which gives

$$E(m_k^t) = E(g_k^t)(1 - \beta_1^t) + \eta_1, \tag{11}$$

where $\eta_1 = 0$ if the true first momentum is stationary; otherwise $\eta_1$ can be kept small [25]. A similar result can be obtained for $\eta_2$. To correct the discrepancy between $E(m_k^t)$, $E(v_k^t)$ and $E(g_k^t)$, $E((g_k^t)^2)$, the estimates are divided by $(1 - \beta_1^t)$ and $(1 - \beta_2^t)$, respectively, as in Eq. (8), and the bias-corrected values replace $m_k^t$ and $v_k^t$ in the update of Eq. (7).

Algorithm 1 (ASPGD), closing lines:
15: end parfor
16: end for
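The effect of the bias correction can be checked numerically. The sketch below is an illustration only: it feeds a constant, hypothetical gradient value of 0.5 into the two momentum recursions with zero initialization, using $\beta_1 = 0.2$ and $\beta_2 = 0.999$. The raw second momentum starts far below the true value of 0.25, while the corrected estimates recover 0.5 and 0.25 from the first step.

```python
def moments(grads, beta1=0.2, beta2=0.999):
    """Raw and bias-corrected first/second momentum after each step."""
    m = v = 0.0  # zero initialization, as assumed in the derivation
    out = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first momentum
        v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second momentum
        out.append((m, v, m_hat, v_hat))
    return out

# Hypothetical constant gradient of 0.5 for five steps.
history = moments([0.5] * 5)
for t, (m, v, m_hat, v_hat) in enumerate(history, start=1):
    print(t, m, v, m_hat, v_hat)
```

With $\beta_2$ this close to 1, the uncorrected $v$ would stay small for many iterations and inflate the early steps, which is why the correction matters most for the second momentum.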
The details of the learning procedure of ASPGD are summarized in Algorithm 1. Note that the code block in Lines 9-14 is executed in parallel for the different values of k. The maximal number of learning iterations is taken as the termination condition in this work and is typically set to 100.
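As a rough illustration of the procedure, the following is a minimal Python sketch of one possible ASPGD loop, not the authors' implementation. It assumes random-sign perturbations, an Adam-style bias-corrected update for maximization, and a sequential loop standing in for the parallel one. The function name `aspgd_maximize`, the objective `toy_power`, and the gain `alpha=0.02` are hypothetical choices made for this example.

```python
import math
import random

def aspgd_maximize(J, u0, delta=0.01, alpha=0.02, beta1=0.2, beta2=0.999,
                   eps=1e-8, iters=100, seed=0):
    """Sketch of ASPGD: SPGD gradient estimate + momentum + adaptive gain."""
    rng = random.Random(seed)
    u = list(u0)
    m = [0.0] * len(u)  # first momentum per control variable
    v = [0.0] * len(u)  # second momentum per control variable
    history = [J(u)]
    for t in range(1, iters + 1):
        # Perturb all variables simultaneously and measure the change in J.
        du = [delta if rng.random() < 0.5 else -delta for _ in u]
        dJ = J([ui + di for ui, di in zip(u, du)]) - J(u)
        # Each k is independent of the others, so this loop could run in
        # parallel; here it runs sequentially for simplicity.
        for k in range(len(u)):
            g = dJ * du[k] / delta ** 2
            m[k] = beta1 * m[k] + (1.0 - beta1) * g
            v[k] = beta2 * v[k] + (1.0 - beta2) * g * g
            m_hat = m[k] / (1.0 - beta1 ** t)
            v_hat = v[k] / (1.0 - beta2 ** t)
            u[k] += alpha * m_hat / (math.sqrt(v_hat) + eps)  # ascent step
        history.append(J(u))
    return u, history

def toy_power(u):
    # Hypothetical stand-in for the power meter reading; maximum 1.0.
    return 1.0 - (u[0] - 0.3) ** 2 - 2.0 * (u[1] + 0.2) ** 2

u_opt, history = aspgd_maximize(toy_power, [0.0, 0.0])
print(history[0], history[-1], u_opt)
```

Because the step is normalized by the square root of the second momentum, its size is set mainly by `alpha` rather than by the raw gradient scale, which is the property that makes the gain easier to tune.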
To intuitively illustrate the effectiveness of the ASPGD method, we apply it to minimize the objective function $J$ shown in Fig. 2, which has many local minimum points and a global minimum value of 0. For comparison, we also use SPGD to optimize the objective function $J$, and three sets of parameters are evaluated for each method. The values of the objective function obtained by the two methods during the optimization process are shown in Fig. 3, where the left column shows the results of SPGD, the right column shows the results of ASPGD, and the rows correspond to different parameter settings. From the simulation results, we can see that the SPGD method converges quickly when the parameters are chosen appropriately, but it falls into a local minimum (Fig. 3(a)). When the parameter is changed from $\Delta u = 0.01$ to $\Delta u = 0.003$ or $\Delta u = 0.001$, its convergence slows down (Fig. 3(c) and Fig. 3(e)). In contrast, the ASPGD method converges within 100 iterations and reaches the global minimum under all three parameter settings (Fig. 3(b), Fig. 3(d) and Fig. 3(f)). The comparison demonstrates that ASPGD accelerates convergence and improves the capability to reach the global minimum. It also shows that the proposed method is robust to the hyper-parameter $\Delta u$.

SMF coupling efficiency
The scheme of SMF coupling is shown in Fig. 4. A beam propagates through an aperture of diameter $d$ located at plane A and is focused by an optical lens with focal length $f$. The tip of the stationary SMF is mounted at the focal plane, denoted plane B. The SMF mode field at plane B can be approximated as a Gaussian beam with an error of 1%. Here, $\lambda$ is the wavelength of the laser beam and $\omega_0$ is the radius of the SMF mode field. For convenience, we calculate the coupling efficiency $\eta$ in plane A, which is defined following [28] in terms of aperture integrals such as $a_r = \iint (\cdots) \sin[\varphi(r, \theta)]\,dr\,d\theta$, where $\varphi(r, \theta)$ is the wavefront phase and $\omega_a = \lambda f / (\pi \omega_0)$.

In adaptive optical systems, Zernike polynomials are generally adopted to decompose a distorted wavefront phase into a weighted sum of orthogonal polynomials, each of which represents a type of aberration. The wavefront phase $\varphi(r, \theta)$ can be expanded as [28]:

$$\varphi(r, \theta) = \sum_{i \geq 0} a_i Z_i(r, \theta),$$

where $Z_i(r, \theta)$ denotes the $i$-th Zernike polynomial and $a_i$ is the corresponding coefficient. Among the Zernike polynomials, the 0th term with coefficient $a_0$ represents piston, which is insignificant to SMF coupling, while $Z_1(r, \theta)$ and $Z_2(r, \theta)$ represent the tilt aberrations along the x and y directions, respectively. Tip/tilt error accounts for 87% of the total wavefront aberration caused by atmospheric turbulence [28]. In addition, the tracking system is based on an optical communication link in space, where the atmosphere is thin. Thus, in this work, we ignore the high-order aberrations and compensate only for the tip/tilt error caused by vibration and atmospheric turbulence.
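To make the dependence of the coupling efficiency on tip/tilt concrete, the sketch below evaluates a generic normalized overlap integral in plane A. This is an assumption-laden illustration, not necessarily the exact expression of [28]: it assumes a uniform unit-amplitude beam over the aperture, a Gaussian mode of radius $\lambda f / (\pi \omega_0)$, and Noll-normalized tilt terms $2\rho\cos\theta$ and $2\rho\sin\theta$ with $\rho$ the radius normalized to the aperture. The function name and grid sizes are hypothetical; the default optical parameters are the simulation values $\lambda$ = 1550 nm, $f$ = 0.71 m, $\omega_0$ = 5.2 µm, and $d$ = 0.15 m.

```python
import cmath
import math

def coupling_efficiency(a1, a2, wavelength=1550e-9, focal=0.71, w0=5.2e-6,
                        aperture_d=0.15, n_r=60, n_theta=72):
    """Overlap-integral estimate of SMF coupling efficiency with tip/tilt."""
    radius = aperture_d / 2.0
    w_a = wavelength * focal / (math.pi * w0)  # mode radius in plane A
    dr = radius / n_r
    dtheta = 2.0 * math.pi / n_theta
    overlap = 0j
    # Midpoint rule on a polar grid covering the circular aperture.
    for i in range(n_r):
        r = (i + 0.5) * dr
        rho = r / radius
        mode = math.exp(-(r / w_a) ** 2)  # Gaussian mode amplitude
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            # Tip/tilt phase in radians (assumed Noll normalization).
            phase = 2.0 * rho * (a1 * math.cos(theta) + a2 * math.sin(theta))
            overlap += cmath.exp(1j * phase) * mode * r * dr * dtheta
    beam_power = math.pi * radius ** 2     # unit-amplitude beam over aperture
    mode_power = math.pi * w_a ** 2 / 2.0  # Gaussian mode over the whole plane
    return abs(overlap) ** 2 / (beam_power * mode_power)

ce_flat = coupling_efficiency(0.0, 0.0)    # no tilt
ce_tilted = coupling_efficiency(2.0, 2.0)  # tilt coefficients of 2 rad each
print(ce_flat, ce_tilted)
```

Under these assumptions the tilt-free value follows the closed form $2[(1 - e^{-\beta^2})/\beta]^2$ with $\beta = d / (2\omega_a)$, and a tilt of a few radians reduces the efficiency by orders of magnitude, which is why tip/tilt is the dominant term to compensate.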

Simulation analysis
To imitate slight atmospheric turbulence and the inherent aberrations of the lens, a distorted wavefront is constructed from Zernike polynomials with 10 terms. The initial coefficients $a_1$ to $a_{10}$ are 2, 2, 0.34, 0.2, 0.15, 0.12, 0.13, 0.16, 0.08 and 0.09, respectively. In the simulation, $\lambda$ is set to 1550 nm, $f$ to 0.71 m, $\omega_0$ to 5.2 µm and $d$ to 0.15 m. Since the control voltages of the FSM have an approximately linear relationship with the coefficients $a_1$ and $a_2$, we regulate $a_1$ and $a_2$ to equivalently simulate the tip/tilt control of the FSM. The normalized CE, rather than the absolute CE, is used as the index so that the behavior of the method can be observed more intuitively, and the x-axis is the number of FSM movements because the control frequency is fixed. The wavefront before the compensation is shown in Fig. 5(a) and the wavefront after the compensation is shown in Fig. 5(b). The normalized CE obtained with the compensation is 67.8%, which is much larger than the value of $3 \times 10^{-4}$ before the compensation. PV denotes the peak-to-valley value, and RMS is the root mean square. Clearly, most of the distortion has been well compensated.

In the simulation, we use Eq. (13) as the optimization goal of SPGD and ASPGD, and control the FSM by optimizing the Zernike coefficients $a_1$ and $a_2$. To account for the randomness of the methods, we execute each method 200 times. First, we run experiments on the two parameters $\beta_1$ and $\beta_2$ introduced by ASPGD to find their optimal values. As shown in Fig. 6(a) and Fig. 6(b), the number of iterations needed for convergence is smallest at $\beta_1 = 0.2$ and $\beta_2 = 0.999$. We therefore use $\beta_1 = 0.2$ and $\beta_2 = 0.999$ for ASPGD and compare it with SPGD in the simulation. Figure 7(a) and Fig. 7(b) show the optimization curves of SPGD and ASPGD under their optimal parameters, respectively.

As shown in Fig. 7, the SPGD method converges after at least 20 iterations and, in the worst case, after up to 65 iterations, with an average of around 52 iterations. ASPGD converges to a fixed point after at least 11 and at most 27 iterations. The average number of iterations for the convergence of ASPGD is about 22, which is less than half that of the SPGD method. In addition, the results of SPGD depend only on the random disturbance at each iteration and the current gradient, so they fluctuate greatly. In contrast, the ASPGD method considers not only the current gradient but also the historical gradients during the iteration process, which effectively reduces the impact of randomness. Overall, ASPGD converges faster than SPGD and is more robust to the randomness of the disturbance.
To further compare the robustness of the two methods to the hyper-parameter $\Delta u$, we evaluate them under the same setting as in the previous simulation, except that $\Delta u$ is changed from 0.01 to 0.015 and 0.005. The results are shown in Fig. 8, from which we can see that the SPGD method is extremely sensitive to $\Delta u$. When $\Delta u$ becomes 0.015, SPGD almost diverges (Fig. 8(a)), while when $\Delta u$ is reduced to 0.005, the convergence speed of SPGD drops by a factor of two (Fig. 8(c)). In contrast, the ASPGD method still works well under $\Delta u \in \{0.015, 0.005\}$. To explore the limitation of the ASPGD method, we vary $\Delta u$ from 0.00005 to 0.5. The number of iterations to convergence and the normalized CE obtained by ASPGD are shown in Fig. 9, from which we can see that ASPGD converges to the optimal CE under $\Delta u \in \{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 0.1\}$ within 120 iterations, and it obtains a normalized CE of 80% under $\Delta u = 1$. The results show that ASPGD works well over a large range of $\Delta u$ values, which makes it easy to apply in real-world applications.

Experiment
To further investigate the performance of ASPGD for fiber coupling, and verify the performance in real-world application systems, we compare the SPGD method with our ASPGD on a fiber coupling platform. It consists of a laser, an SMF, an FSM, and an optical power meter. The scheme and the experimental setup are shown in Fig. 1 and Fig. 10, respectively.
As shown in Fig. 1, the power meter is designed to receive the light beam from the laser. The wavelength of the laser beam is 1550 nm, the conversion coefficient between the optical power meter's output (voltage) and input (optical power) is measured to be 39.475 V/mW, the diameter of the fiber core is 9 µm, and the sampling frequency of the controller is 500 Hz. The beam is reflected by the FSM and enters the optical power meter. The FSM is commanded to make small movements, and the resulting variation of the optical power is used to calculate the gradient [4]. For a fair comparison, we evaluate the performance of SPGD and ASPGD under the same initial conditions. We set the same initial point for both methods and report the results obtained with their respective optimal parameter values. The optimal setting for SPGD is $\Delta u = 1$, $\alpha = 7000$, and the optimal setting for ASPGD is $\Delta u = 1$, $\alpha = 50$, $\beta_1 = 0.2$, $\beta_2 = 0.999$, $\varepsilon = 10^{-8}$. The experimental results are shown in Fig. 11, from which we find that the curve of the SPGD method rises slowly at the beginning and reaches the maximum after about 130 iterations. In contrast, the ASPGD method can dynamically adjust the gain according to the gradient value at the beginning to achieve rapid convergence. It reaches the maximum after about 50 iterations, which is much faster than SPGD.

Fig. 10. A real-world fiber coupling system. It is constructed based on the fiber coupling scheme shown in Fig. 1.

Fig. 11. Comparison of SPGD and ASPGD on the real-world fiber coupling system. The curves show the results of SPGD and ASPGD, respectively.

Conclusion
In this paper, an improved SPGD method (ASPGD) is proposed to achieve efficient fiber coupling. By integrating momentum and adaptive gain coefficient estimation into the original SPGD, the proposed method is able to avoid converging to local extremum points and to accelerate convergence. The simulation results show that ASPGD improves stability and accelerates convergence. Specifically, compared with SPGD, the number of iterations required by ASPGD is reduced by 50%. At the same time, the method is robust to parameter uncertainties and converges over a wide range of parameters ($\Delta u$ from 0.00005 to 0.5). Finally, the effectiveness of the method is also evaluated on a real-world fiber coupling system. The experimental results show that ASPGD converges much faster than the original SPGD method there as well.
Since ASPGD is a general optimization method, in future work we will investigate how to apply it to more complex optical problems.

Funding
National Natural Science Foundation of China (NO.61905253).