Pseudo-spectral optimal control of stochastic processes using Fokker Planck equation

Abstract Motivated by the successful implementation of Pseudo-spectral (PS) methods in optimal control problems (OCP), a new technique is introduced to control the probability density function (PDF) of the state of the 1-D system described by a stochastic differential equation (SDE). In this paper, the Fokker Planck equation (FPE) is used to model the time evolution of the PDF of the stochastic process. Using FPE instead of SDE, changes the problem of stochastic optimal control to a deterministic one. FPE is a parabolic PDE. Solving an OCP with PDE constraint is computationally a difficult task. We use two strategies to efficiently solve this OCP problem: firstly, we use PS methods in order to transform the OCP to a non-linear programming (NLP) with fewer discretization points but higher order of accuracy, and secondly, we utilize Genetic algorithm (GA) to solve this large-scale NLP in a more efficient approach than gradient-based optimization methods. The simulation results based on Monte-Carlo simulations prove the performance of the proposed method.


Introduction
Stochastic optimal control is one of the main subfields of control theory. It is introduced due to considering more realistic models where many systems in different branches of science are subjected to randomness. It is the subject of study in traffic control (Zhong, Sumalee, Pan, & Lam, 2014), airspace industry (Okamoto & Tsuchiya, 2015), cancer chemotherapy (Coldman & Ali Namadchian ABOUT THE AUTHOR Ali Namadchian received the MSc degree in control theory and control engineering from the Islamic Azad University of Mashhad, Mashhad, Iran, in 2013. He is currently pursuing the PhD degree with the Department of Electrical and Control Engineering, Tafresh University, Tafresh, Iran. His current research interests include stochastic control, nonlinear control systems and optimal control.

PUBLIC INTEREST STATEMENT
Fokker Planck equation describes the time evolution of the trajectories of the stochastic differential equation. The FPE optimal control is an alternative approach to stochastic optimal control of the SDEs. Unlike, optimal control of the SDEs, FPE optimal control is a deterministic approach to solve a stochastic process. In this paper, the FPE optimal control is considered and instead of controlling the trajectories of the stochastic system, the probability density function of this trajectories have been controlled. The paper proposed a pseudo-spectral discretization strategy in combination with genetic algorithm to reduce the computational complexity of the FPE optimal control. Murray, 2000), cyber security systems in computer science (Shang, 2013(Shang, , 2012, etc. When the system randomness is bounded and the bounds are known, the problem of finding a suitable control action can be dealt with robust control approaches (Doyle, Francis, & Tannenbaum, 2013;Wu & Lee, 2003;Zhou & Doyle, 1998). While the bounds of uncertainty are unknown and the probability distribution of the randomness is available, the stochastic framework should be carried out in optimal control (Åström, 2012;Herzallah, 2018;Sell, Weinberger, & Fleming, 1988;Touzi, 2012). There is a vast amount of literature on stochastic optimal control of linear systems with additive noise where the certainty equivalence property leads to the separation principle in stochastic control (Bar-Shalom & Tse, 1974;Mohammadkhani, Bayat, & Jalali, 2017). Under certain conditions, separation principle asserts that finding a control action for a stochastic system can be restated as designing a stable observer and a stable controller. In the other words, a stochastic problem can be recast as two deterministic problems. Linear quadratic Gaussian control (LQG) is one of the most prominent stochastic control methods. It is simply the combination of Kalman filter as an observer and a linear quadratic controller.
In spite of much research on the subject of separation principle for nonlinear systems (Atassi & Khalil, 1999;Khalil, 2008), no general separation principle exists for nonlinear systems. Rich theoretical foundations in the literature regarding the deterministic optimal control encourage control engineers to convert stochastic optimal control into a corresponding deterministic one. One approach to remodel a stochastic process into a deterministic counterpart is using Fokker Planck equation (FPE). FPE is a parabolic PDE that models the time evolution of probability density function (PDF) of a stochastic process. PDF of the states of a stochastic process fully characterize the statistical behaviour of the states. Under this scheme, the control function is sought such that the PDF of the states reaches to a desired PDF in time.
Several studies have been published on stochastic optimal control using FPE in the literature. In Annunziato and Borzì (2010), the FPE framework for optimal control of the 1-D stochastic process was presented. Finite difference method is utilized for discretization of the OCP. In Annunziato and Borzì (2013), the authors argued the problem of stochastic optimal control of multidimensional systems and an efficient computational method based on Chang-Cooper scheme was proposed. In Fleig, Grüne, Pesch, and Altmüller (2014), some theoretical results regarding the existence and uniqueness of the solution of FPE were proved. Also, the existence and uniqueness of the solution of stochastic optimal control using FPE have been investigated; the numerical solution was based on finite difference method. In Borzì (2010, 2013) and Fleig et al. (2014), the control function is only the function of time, but in Fleig and Guglielmi, it is assumed to be state dependent as well as time dependent. In Buehler, Paulson, and Mesbah (2016), a stability constraint was integrated into the FP optimal control in order to guarantee the system stabilization in probability (Battilotti & De Santis, 2003). All the aforementioned papers implement receding horizon control strategy known as model predictive control (MPC) (Camacho & Bordons, 2012;Wang, 2009). To the best of our knowledge, the focus of the literature was not on the numerical discretization of Fokker Planck optimal control.
The main contribution of this work is on the discretization of the FPE optimal control problem (OCP) using Pseudo-spectral (PS method). PS optimal control is a fast computational method for solving general OCPs. It uses the foundations of PS theory to solve an OCP. The main reason that PS method is suitably fit to the OCPs is: (1) Efficient approximation method for integration in the cost function, solving ODE that governs the dynamic of the system and discretization of the state-space constraint Gong et al., 2007;Gong, Kang, & Ross, 2006). (2) There is sufficient theoretical support for convergence and consistency of PS method for optimal control based on covector mapping principle (Gong, Ross, Kang, & Fahroo, 2008;Ross, 2005), also PS method is known to be one of the top three numerical solutions of PDEs (Trefethen, 2000). In this paper, instead of approximating ODE that describes the time evolution of the state of the system in a formal PS optimal control framework, we approximate the Fokker Planck PDE which is the dynamical constraint of the stochastic system. It would be worth mentioning that not only many theoretical OCP have been solved by PS optimal control, it is also widely applied in practice. It has been successfully implemented in different NASA flight systems (Ross & Karpenko, 2012) and industrial and military applications (Gong et al., 2007). In this research, we utilize the Legendre PS method (Elnagar, Kazemi, & Razzaghi, 1995) based on Legendre-Gauss-Lobatto (LGL) points which is proven to be the most appropriate PS method for solving the finite time OCP with boundary conditions . Discretization of FP optimal control yields a large-scale non-linear programming (NLP) which is a hard computational problem. A new intuitive efficient technique based on Genetic algorithm (GA) is adapted to reduce the computational complexity of the NLP.
The paper is structured as follows: Section 2 gives a brief review of FPE optimal control with necessary assumptions and the main OCP is determined. PS method fundamentals are described in details in Section 3. In Section 4, the Methodology of FPE discretization and the proposed method is presented base on GA. Section 5 gives the numerical results of implementing the proposed method on three benchmark stochastic processes, and finally in Section 6, conclusions are provided and future works are described.

Fokker planck optimal control
A typical stochastic differential equation (SDE) is as follows: Where, X t 2 Ω & R n solves the stochastic differential Equation (1) provided: X 0 is the initial condition, W t 2 R r is the Weiner process, r is the dimension of Weiner process, bðX t ; tÞ : R n Â ½t 0 ; T ! R n is a vector valued drift function and σðX t ; tÞ : R n Â ½t 0 ; T ! R nÂr is a vector valued diffusion function. u t 2 R m is the control function, which is considered to be the function of time. The first Integral in (3) is Lebesgue integral and the second one is Ito Integral. Here we consider 1-D SDE where n ¼ m ¼ r ¼ 1.
Remark 1 Equation (4) is a parabolic type PDE with initial and boundary condition. According to the probability axioms, the solution of PDE (4) should satisfy: The existence and uniqueness of the Fokker Planck initial boundary value problem are addressed in (Fleig et al., 2014).
Definition 2.2 (The FPE optimal control problem): The Fokker Planck equation optimal control problem is a dynamic optimization problem with PDE constraint, it could be formulated as: Subject to: where ΔðtÞ is a measure of similarity between pðx; tÞ, PDF of state, and P ref ðx; tÞ, the desired PDF. α; β 2 R>0 are the performance trade-off constants. u a ; u b 2 R are upper and lower bounds on the admissible set of control U ad . (5a) shows that the Problem. 1 is a dynamic optimization problem that should satisfy the FPE (4) as its constraint.
Remark 2: The FPE equation is a second-order PDE, and for solutions we need boundary condition. Condition (5c) is called absorbing boundary condition (Mohammadi, 2015). For absorbing boundary condition, the boundaries are chosen such that the probability on the boundaries is zero. The condition (5d) is control constraint condition. Due to physical restrictions in practical systems, the control signals cannot be arbitrary large, so imposing control constraint is necessary for most of OCP.
The existence of the solution of such OCP with PDE constraints is discussed in (Hinze, Pinnau, Ulbrich, & Ulbrich, 2008).

PS fundamentals
PS approximation was initially developed to solve PDEs. In this method, continuous functions are approximated in a finite set of nodes called collocation nodes. These nodes are selected in a way that guarantees the high accuracy of approximation. The main exclusivity of PS method is in the smart selection of collocation nodes. Assume that a real value function f ðtÞ is going to be approximated in ½À1; 1 with some interpolants. The simplest but worst choices of nodes are equal distance nodes. But if a special set of nodes such as roots of derivative of Legendre polynomial known as LGL is chosen, the higher accuracy with fewer nodes can be achieved. The following inequality shows the impressive convergence speed of PS method.
Where I N f ðtÞ is the polynomial interpolation of f ðtÞ. C is a constant. N is the number of collocation points and m is the order of differentiability of f ðtÞ. If f ðtÞ 2 C 1 ðm ¼ 1Þ the polynomial converges at a spectral rate.
There are two types of numerical methods to solve an OCP, direct and indirect methods. Indirect methods use the calculus of variation such as Pontryagin minimum principle. These methods are usually hard to solve numerically. Direct methods are known as discretize-thenoptimize methods. They are not based on the calculus of variation. In direct methods, the state and control function are approximated by function approximation such as polynomial approximation.
When an OCP is solved by direct methods, it is highly desirable to use few number of discretization point. A typical OCP consists of three mathematical equations: (1) Integration in the cost function; (2) Differential equation governing the system state evolution; and (3) Algebraic equations appear as the constraints. PS method is suitable for approximation of all three objects. The OCP studied in this paper is different from a typical PS OCP because instead of an ODE constraint, the PDE constraint is utilized which is the FPE.
There are a lot of theoretical proofs regarding the convergence, consistency and optimality of the solution of PS optimal control. It should be noted that these results are true for systems that are described by the ODE. In this paper, no theoretical proof has been proposed for FP PS optimal control but numerical results prove the validity of the method.
In this paper, we chose the Gauss PS method with Lagrange interpolant. Since the OCP is finite time, the LGL nodes are used for discretization. Here the necessary foundations of PS method for optimal control are discussed . We show that in PS framework, the differentiations in dynamic constraint of OCP, as well as integration in the cost function can be converted to algebraic equations.

Domain transformation
For Gauss-PS method when solving a finite-horizon OCP, the computational framework is ½À1; 1, so variables such as time and states, should be scaled as follows:

Interpolation
Given τ ¼ ½τ 0 ; τ 1 ; :::; τ Nt and X ¼ ½X 0 ; X 1 ; :::; X Nx , set of distinct points in ½À1; 1, a standard Lagrange interpolant for control function uðtÞ and xðtÞ can be written as: Where u Nt ðtÞ and x N X ðtÞ are a unique polynomial of degree N t and N X , respectively. L N i ðτÞ is the N-th order Lagrange polynomial that satisfies the Kronecker property L N i ðτ j Þ ¼ δ ij . This implies that uðτ j Þ ¼ u Nt ðτ j Þ and xðX j Þ ¼ x N X ðX j Þ. Also, the interpolation for PDF of state pðx; tÞ can be written as second Lagrange form interpolant as follows: 3.3. Differentiation
The first derivative of f ðxÞ using interpolation can be written as where _ L N i ðτÞ can be expressed as By these two equations, the derivative of f ðxÞ at the LGL nodes τ k 2 ½À1; 1 can be approximated as where in the matrix form it can be rewritten at all LGL node as where D Nþ1 is a ðN þ 1Þ Â ðN þ 1Þ matrix defined by: It is obvious that the approximation of _ f ðxÞ only depends on f ðxÞ . For unscaled nodes i.e. τ k 2 ½a; b the derivative of f ðxÞ at the LGL nodes is as follows: where n is the order of derivation.
For the approximation of partial differential of gðx; yÞwith respect to x we have: At the exact LGL nodes, the derivative can be written as where I Mþ1 is an ðM þ 1Þ Â ðM þ 1Þ identity matrix. The higher order derivative at the LGL nodes can be written as d n dx n gðx; yÞ ¼ ð The approximation of partial differential of gðx; yÞ with respect to y at the LGL nodes can be deduced as And for higher derivative order we have d n dy n gðx; yÞ ¼ Similar operations on tensor have been also used in other works, e.g. (Shang, 2014).

Integration
In PS method, the integration is approximated by Gaussian quadrature rule.
where in Gauss PS method x i ; i ¼ 1; :::; N are LGL nodes and w i ; i ¼ 1; :::; N are predetermined quadrature weights correspond to LGL nodes. For arbitrary domain of integration over ½a; b, the Gauss quadrature rule can be rewritten as

PS discretization of FP optimal control
Using fundamentals that have been discussed in Section 3, all mathematical objects that form an OCP, i.e. Integration, PDE and constraints can be discretized using PS method. By defining the meshes in each direction, π Nt ¼ fτ 0 ; τ 1 ; :::; τ Nt g and π N X ¼ fX 0 ; X 1 ; :::; X N X g, we have a grid of meshes, π NtÂN X , which the values of pðx; tÞ and uðtÞ should be defined in. Problem. 1 convert to Problem. 2 that can be solved by NLP tool.
The summation in Problem. 2 is the discretized counterpart of integral in Problem. 1. In this paper, for similarity of pðx; tÞ to P ref ðx; tÞ the correlation coefficient criteria is adopted as follows: (27a) is the discretized model of FP equation which is a nonlinear algebraic equation.
Problem. 2 is a large-scale nonlinear programming. The number of the unknowns is N ¼ ðN t þ 1ÞðN X þ 1Þ þ ðN t þ 1Þ. In a typical problem, the number of meshes should be chosen as N t > 10 and N X > 90 which makes N t > 10 3 . This large-scale optimization is computationally cumbersome to solve. In order to solve this problem, we intuitively reformulate Problem. 2 into a new algorithm and reduce the number of unknowns to just N ¼ N t þ 1 and uses genetic algorithm to find the optimal solution.

Genetic algorithm for PS-FP optimal control
In this section, we will not discuss the complete procedure of GA algorithm. It is assumed that the reader has basic knowledge about GA. For further information on GA see Whitley (1994).
The key steps of the GA-PS-FP optimal control algorithm are as follows: (1) Determine the Maximum iteration, population, percentage of mutation, crossover and other initial parameters of GA.
Problem. 2 can be reformulated into Problem. 3 as follows: The main advantage of this reformulation of optimization is reducing the number of unknowns to N ¼ N t þ 1. Also no NLP should be implemented to find the solution, instead for fitness evaluation a matrix inversion of size ðN t þ 1ÞðN X þ 1Þ must be solved.

Simulation results
In this section, two stochastic processes are simulated. The proposed algorithm is implemented to find the best control action. Finding the matrix Ψ, which might be the most confusing part of the algorithm, is discussed in detail in each example. The interpolated control function is implemented to the SDEs and the Monte-Carlo runs extract the statistical characteristics of states by directly simulation of SDEs using Euler-Maruyama method (Higham, 2001). The Monte-Carlo simulation of SDEs, as well as distance between desired PDF and the PDF of the states, prove the validity of the proposed method.

Ornstein-Uhlenbleck (OU) process
OU Process models the velocity of a massive Brownian particle under influence of friction. It is defined by where X is the velocity of the particle and U is the momentum due to an external force which is the control action. The simulation parameters are determined in Table 1.

Parameters
OU process The typical nonlinear system control function is calculated by (10). Due to the linearity of the OU process, when the initial distribution is Gaussian, the PDF of the states are always Gaussian. So in order to test the optimal control function, it is implemented on OU and by Monte-Carlo simulations of the SDE in each time step, mean and variance at each time step is calculated. Euler-Maruyama Method is used to solve SDE in each Monte-Carlo run. Figure 2 compares the mean and variance of the states, obtained by MC simulations and FP equation.
LGL points in Figure 2 represent the discretization points in time. Lagrange interpolation of this points is also demonstrated. It can be seen that the mean and variance obtained by FPE is consistent with MC simulation results. Figure 3 demonstrates the Barycentric Lagrange interpolation (Berrut & Trefethen, 2004) of pðx; tÞ based on LGL nodes, it has been calculated at 120 Â 120 data points.

Typical nonlinear system with two stable equilibrium point
System (32) was a linear system, here we apply our method to a nonlinear system described as follows: This system has 3 equilibrium points, one saddle point on X t ¼ 0 and two stable points on As it is shown in Figure 7, the PDF of the states of autonomous system (35) with U ¼ 0, evolves in a bimodal Gaussian shape due to the existence of two stable Equilibrium points. The simulation parameter is summarized in Table 1.  The corresponding FPE of (35) is @ t pðx; tÞ þ ðr À 3X t 2 Þpðx; tÞ þ ðrX t À X t 3 þ UðtÞÞ@ x pðx; tÞ À 1 2 σ 2 ½2pðx; tÞ þ 4X t @ x pðx; tÞ þ X 2 t @ xx pðx; tÞ ¼ 0 Appendix B shows how to transform (36) to tensor product form. Figures 4 and 5 show that for nonlinear system, the proposed method succeeded to find optimal control functions and the FPE solution is consistent with MC simulations. Figure 6 demonstrates the transient PDF of the typical nonlinear system.

Conclusion
In this paper, a PS optimal control framework for the stochastic processes is presented. The stochastic optimal control is transformed to OCP by implementing FPE, then discretized in few collocation points by Gauss Legendre spectral discretization Method. In spite of utilizing spectral discretization, the deterministic OCP becomes a large-scaled optimization due to the discretization  of FPE which is a PDE. A new technique is represented to solved this large-scaled problem in an efficient way using GA. Two benchmark stochastic processes were simulated and the results prove the validity of the proposed method. The advantages of the proposed method are high accuracy and few computational complexity in comparison with other OCP numerical method for stochastic processes. It is also implementable to nonlinear stochastic processes. The next steps in future researches are as follows: • Proposing a theoretical framework to prove the optimality and uniqueness of the solution of this method.
• Utilizing other PDE numerical techniques such as separation of dimensional domain (Lynch, Rice, & Thomas, 1964) to solve more than 1-D stochastic process in a reasonable speed.  . Typical nonlinear system with two stable equilibrium points with initial condition pðx; tÞ ¼ Nð0; 1Þ and r ¼ 1; .