DiffTune: Auto-Tuning through Auto-Differentiation

The performance of robots in high-level tasks depends on the quality of their lower-level controllers, which require fine-tuning. However, intrinsically nonlinear dynamics and controllers make tuning a challenging task when it is done by hand. In this paper, we present DiffTune, a novel, gradient-based automatic tuning framework. We formulate controller tuning as a parameter optimization problem. Our method unrolls the dynamical system and controller as a computational graph and updates the controller parameters through gradient-based optimization. The gradient is obtained using sensitivity propagation, which is the only method for gradient computation when tuning for a physical system instead of its simulated counterpart. Furthermore, we use $\mathcal{L}_1$ adaptive control to compensate for the uncertainties that unavoidably exist in a physical system, such that the gradient is not biased by the unmodeled uncertainties. We validate DiffTune on a Dubins car and a quadrotor in challenging simulation environments. In comparison with state-of-the-art auto-tuning methods, DiffTune achieves the best performance in a more efficient manner owing to its effective usage of the first-order information of the system. Experiments on tuning a nonlinear controller for a quadrotor show promising results, where DiffTune achieves a 3.5x tracking error reduction on an aggressive trajectory in only 10 trials over a 12-dimensional controller parameter space.


I. INTRODUCTION
Robots' execution of complicated tasks builds upon low-level controllers that deliver precise and responsive motions. Controller design requires qualitative analysis to ensure stability and then parameter tuning to deliver the designed performance on real systems. Controller tuning is normally done by hand, either by trial and error or by proven methods for specific controllers (e.g., the Ziegler-Nichols method for proportional-integral-derivative (PID) controller tuning [1]). However, hand-tuning often requires experienced personnel and can be inefficient, especially for systems with long loop times or a large parameter space.
To improve efficiency and performance, automatic tuning (or auto-tune) methods have been investigated. Such methods integrate system knowledge, expert experience, and software tools to determine the best set of controller parameters, especially for the widely used PID controllers [2]-[4]. Commercial auto-tune products have been available since the 1980s [3], [5]. A desirable auto-tune scheme should have the following three qualities: i) stability of the target system; ii) compatibility with real systems' data; and iii) efficiency for online deployment, possibly in real time. However, designing an auto-tune scheme for general controllers that simultaneously possesses all three qualities remains a challenge.
Existing auto-tune methods can be categorized into model-based [6], [7] and model-free [8]-[13]. Both approaches iteratively select the next set of parameters for evaluation that is likely to improve the performance over the previous trials. Model-based auto-tune methods leverage knowledge of the system model to improve performance, often using the gradient of the performance criterion (e.g., tracking error) and applying gradient descent so that the performance can improve based on the local gradient information [6], [7]. Stability can be ensured through explicitly leveraging knowledge about the system dynamics. However, model-based auto-tune might not work in a real environment, where the knowledge about the dynamical model might be imperfect. This issue is especially severe when controller parameters are tuned in simulation and then deployed to a real system.

Fig. 1: Illustration of an unrolled dynamical system as a computational graph. The summed loss is the root node, whereas the parameter θ is the leaf node.
Model-free auto-tune methods approximate the gradient or build a surrogate model to improve the performance. Representative approaches include Markov chain Monte Carlo [12], Gaussian processes (GPs) [8]-[11], deep neural networks (DNNs) [13], etc. Such approaches often make no assumptions on the model and, owing to their data-driven nature, have an advantage in real-data compatibility. However, some model-free approaches, such as GPs [14], are inefficient when tuning in high-dimensional parameter spaces. Besides, it is hard to establish stability guarantees with data-driven methods, for which empirical validation is often the only recourse.
To overcome these challenges, we present DiffTune: an auto-tune method based on auto-differentiation. Our method is inspired by the "end-to-end" idea from the learning community. Specifically, in the proposed scheme, the gradient of the loss function (evaluating the performance of the controller) with respect to the controller parameters can be directly obtained and then applied in gradient descent to improve the performance. DiffTune is generally applicable for tuning all the controller parameters as long as the system dynamics and controller are differentiable (we define "differentiable" in Section III), which is the case for most systems. For example, algebraically computed controllers, e.g., those with a gain-times-error structure (PID [7]), are differentiable. Moreover, following the seminal work [15] that differentiates the argmin operator using the Implicit Function Theorem, one can see that controllers relying on the solution of an optimization problem to generate control actions (e.g., model predictive control (MPC) [16], [17], optimal control [18], [19], control barrier functions [20]-[23], and the linear-quadratic regulator (LQR) [24]) are also differentiable.
We build DiffTune by treating the unrolled dynamical system as a computational graph and then applying auto-differentiation to compute the gradient. Unlike the commonly used backward-propagation scheme, we present an equivalent way of computation, called sensitivity propagation, which propagates the gradient in the forward direction in parallel to the dynamics' propagation. Sensitivity propagation is based on the sensitivity equation [25] for nonlinear dynamical systems. The unique aspect of sensitivity propagation lies in its capability to incorporate real systems' data, in which case the computational graph is broken because the new system states are obtained through sensor measurements or state estimation instead of being evaluated from the dynamics. Such a broken computational graph forbids the use of reverse-mode auto-differentiation, making sensitivity propagation the preferred approach.
DiffTune enjoys the three aforementioned qualities simultaneously: stability is inherited from controllers with stability guarantees by design; real-data compatibility is enabled by the sensitivity propagation; and efficiency is provided since the sensitivity propagation runs forward in time and in parallel to the system's evolution.
Our contributions are summarized as follows: i) We propose an auto-tuning method for the controller parameters of general differentiable dynamical systems and controllers, obtained by unrolling the system into a computational graph and applying reverse-mode auto-differentiation; ii) We develop sensitivity propagation (analytically equivalent to reverse-mode auto-differentiation), which can incorporate data from real systems for auto-tuning, enabling the method to be implemented online.
The remainder of the paper is organized as follows: Sections II and III review the related work and background of this paper, respectively. Section IV describes our auto-tuning method with a comparison between reverse-mode auto-differentiation and sensitivity propagation. We also discuss uncertainty handling when using real system data. Section V presents simulation results on a Dubins car and a quadrotor. Finally, Section VI concludes the paper.

II. RELATED WORK
Our approach closely relates to classical work on automatic parameter tuning and recent learning-based controllers. In this section, we briefly review previous work in these two major directions.

Model-based auto-tune leverages model knowledge to infer the parameter choice for performance improvement. In [6], an auto-tune method is proposed for LQR: the gradient of a loss function with respect to the parameterized quadratic matrix coefficients is approximated using Simultaneous Perturbation Stochastic Approximation [26]. In [7], the gradient of a quadratic loss over control actions and system outputs with respect to the PID gains is computed using auto-differentiation tools.

Model-free auto-tune relies on a zeroth-order approximate gradient or a surrogate performance model to decide the new candidate parameters. In [27], the authors use extremum seeking to sinusoidally perturb the PID gains and then estimate the gradient. Gradient-free methods, e.g., Metropolis-Hastings sampling [12], have also been used for tuning. In terms of surrogate models, machine learning tools have frequently been used for their advantages in incorporating data. In [28], end-to-end, data-driven hyperparameter tuning is applied to an MPC using a surrogate dynamical model. Besides, a GP is often used as a non-parametric model that approximates an unknown function from input-output pairs with probabilistic confidence measures. This property makes the GP a suitable surrogate model that approximates the performance function with respect to the tuned parameters. In [8], a GP is applied to approximate the unknown cost function using noisy evaluations and then induce the probability distribution of the parameters that minimize the loss. In [9], the authors use a GP to approximate the cost map over the controller parameters while constructing safe sets of parameters to ensure safe exploration.
Similar ideas have been applied to gait optimization for bipedal walking, where a GP is used to approximate the cost map of parameterized gaits [10], [11]. Besides GPs, DNNs [13] have also been used for model-free tuning.

Learning for control is a recently trending research direction that strives to combine the advantages of model-driven control and data-driven learning for the safe operation of a robotic system. Exemplary approaches include, but are not limited to, the following: reinforcement learning [29], [30], whose goal is to find an optimal policy while gathering data and knowledge of the system dynamics from interactions with the system; imitation learning [31], which aims to mimic the actions taken by a superior controller while making decisions using less information about the system than the superior controller; and iterative learning control [32], [33], which constructs the present control action by exploiting every possibility to incorporate past control and system information, typically for systems working in a repetitive mode. A recent survey [34] provides a thorough review of the safety aspects of learning for control in robotics.

III. BACKGROUND
Consider a discrete-time dynamical system
x_{k+1} = f(x_k, u_k), (1)
where x_k ∈ R^n and u_k ∈ R^m are the state and control, respectively, and the initial state x_0 is known. The control is generated by a feedback controller that tracks a desired state x̂_k ∈ R^n such that
u_k = h(x_k, x̂_k, θ), (2)
where θ ∈ R^p denotes the parameters of the controller; e.g., θ ∈ R^2 may represent the P- and D-gains in a PD controller. We assume that the state x_k can be measured directly or, if not, that an appropriate state estimator is used. Furthermore, we assume the dynamics (1) and controller (2) are differentiable, i.e., the Jacobians ∇_x f, ∇_u f, ∇_x h, and ∇_θ h exist, which is the case for a wide range of systems. The tuning task adjusts θ to minimize an evaluation criterion, denoted by L(·), which is a function of the desired states, actual states, and control actions over a time interval of length N. An illustrative example is the quadratic loss of the state's deviation from the desired state plus a control-effort penalty,
L(x_{1:N}, u_{0:N-1}, x̂_{1:N}) = Σ_{k=1}^{N} ‖x_k − x̂_k‖² + λ Σ_{k=0}^{N-1} ‖u_k‖²,
with λ > 0 being the penalty coefficient. We will use the short-hand notation L(θ) for conciseness in the rest of the paper.
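To make the notation concrete, the following is a minimal toy instantiation of the dynamics (1), controller (2), and the quadratic loss, assuming a scalar single-integrator plant and a proportional controller; the plant, gains, and function names here are illustrative only and are not from the paper:

```python
# Toy instantiation of (1)-(2) and the quadratic loss (illustrative only).
def f(x, u, dt=0.1):            # dynamics (1): x_{k+1} = f(x_k, u_k)
    return x + dt * u

def h(x, x_des, theta):         # controller (2): u_k = h(x_k, xhat_k, theta)
    return theta * (x_des - x)

def rollout_loss(theta, x0, x_des, N, lam=0.01):
    """Quadratic loss: sum of squared tracking errors plus control penalty."""
    x, L = x0, 0.0
    for k in range(N):
        u = h(x, x_des[k], theta)
        x = f(x, u)
        L += (x - x_des[k]) ** 2 + lam * u ** 2
    return L
```

For instance, `rollout_loss(1.0, 0.0, [1.0] * 20, 20)` evaluates a proportional gain of 1 on a constant setpoint; larger reasonable gains reduce the loss.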

IV. METHOD
DiffTune gradually improves the system performance by tuning the controller parameters using gradient descent. We unroll the dynamical system (1) and controller (2) into a computational graph. Figure 1 illustrates the unrolled system, which stacks the iterative procedure of state update via the "dynamics" and control-action generation via the "controller." We then use gradient-based methods to update the parameters θ. Specifically, since the controller is stable for θ within the feasible set Θ, we use projected gradient descent [35] to update θ (and ensure stability):
θ ← P_Θ(θ − α ∇_θ L), (3)
where P_Θ is the projection operator that projects its operand onto the set Θ and α is the step size. What remains is to compute the gradient ∇_θ L, for which we provide two methods: backward propagation and sensitivity propagation.

Our method is inspired by the backward propagation used in training a neural network (NN): once the structure of the NN and the loss function are defined, the parameters of the NN are updated via gradient descent. Denote the NN parameters by φ and the loss function by l(φ). Backward propagation, also known as reverse-mode auto-differentiation on a computational graph, is nowadays used to obtain the gradient ∇_φ l. Likewise, the computational graph in Fig. 1 has the controller parameters θ as the leaf node and the loss L(θ) as the root node, whereas all the intermediate states and control actions are non-leaf nodes. The computational graph has to be propagated forward first, with the value of each non-leaf node stored in memory. The backward propagation then uses the chain rule and the stored non-leaf nodes to trace the graph from root to leaves and compute the desired gradient ∇_θ L.
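A minimal sketch of such a projected gradient step, assuming for illustration a box-shaped feasible set Θ with per-parameter bounds (the paper does not prescribe a specific Θ):

```python
# Projected gradient step theta <- P_Theta(theta - alpha * grad), with
# Theta taken as a coordinate-wise box [lo, hi] (an illustrative choice).
def project(theta, lo, hi):
    """Coordinate-wise projection onto the box [lo, hi]."""
    return [min(max(t, a), b) for t, a, b in zip(theta, lo, hi)]

def pgd_step(theta, grad, alpha, lo, hi):
    """One projected-gradient-descent update of the parameter vector."""
    return project([t - alpha * g for t, g in zip(theta, grad)], lo, hi)
```

A large gradient step that would leave the feasible set is clipped back onto its boundary, which is how the update preserves the stability-guaranteeing parameter range.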
Backward propagation can be conveniently implemented using off-the-shelf tools like PyTorch [36] or TensorFlow [37]: one programs the forward pass on the computational graph using the dynamics and controller and sets the parameters with respect to which the loss function will be differentiated.
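To illustrate what such autograd tools compute, the following is a hand-rolled backward pass for a toy scalar system x_{k+1} = x_k + dt·u_k with u_k = θ(x̂_k − x_k) and the quadratic loss; this is a stand-in for PyTorch/TensorFlow autograd, not the paper's implementation:

```python
# Hand-rolled reverse-mode pass over the unrolled graph of a toy scalar
# system (illustrative stand-in for what autograd frameworks do).
def grad_reverse(theta, x0, xhat, N, dt=0.1, lam=0.01):
    # forward pass: store every intermediate state and control, as autograd does
    xs, us = [x0], []
    for k in range(N):
        u = theta * (xhat[k] - xs[-1])
        us.append(u)
        xs.append(xs[-1] + dt * u)
    # backward pass: chain rule from the loss (root) back to theta (leaf)
    g, xbar = 0.0, 0.0                           # xbar accumulates dL/dx_{k+1}
    for k in reversed(range(N)):
        xbar += 2.0 * (xs[k + 1] - xhat[k])      # loss term at x_{k+1}
        ubar = dt * xbar + 2.0 * lam * us[k]     # dL/du_k
        g += ubar * (xhat[k] - xs[k])            # direct du_k/dtheta term
        xbar -= theta * ubar                     # back through u_k's x-dependence
    return g
```

Note that the whole state/control trajectory must be stored before the backward sweep can run, which is exactly the structure the next paragraph takes issue with.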
However, backward propagation cannot incorporate data from real systems, because all of its computation relies on a computational graph. Specifically, the dynamics (1) have to be evaluated each time to obtain a new state, which is not the case in real systems: the states are obtained through sensor measurements or state estimation instead of by evaluating the dynamics. Thus, backward propagation can only be applied to controller tuning in simulations, forbidding the usage of real systems' data. We introduce sensitivity propagation next to address the issue of real-data compatibility.

A. Sensitivity propagation
Sensitivity propagation is an alternative method to compute ∇_θ L, whose output is analytically equivalent to that of backward propagation on a computational graph. To see how sensitivity propagation works, we first break down the derivative ∇_θ L by
∇_θ L = Σ_{k=1}^{N} (∂L/∂x_k)(∂x_k/∂θ) + Σ_{k=0}^{N-1} (∂L/∂u_k)(∂u_k/∂θ). (4)
Since ∂L/∂x_k and ∂L/∂u_k can be evaluated once L is chosen and x_k and u_k are known, what remains is to formulate ∂x_k/∂θ and ∂u_k/∂θ. Given that the system states x_k are iteratively defined using the dynamics (1), we can derive an iterative formula for ∂x_k/∂θ and ∂u_k/∂θ based on the sensitivity equation of the system [25, Chapter 3.3]:
∂x_{k+1}/∂θ = (∇_x f + ∇_u f ∇_x h) ∂x_k/∂θ + ∇_u f ∇_θ h,  ∂u_k/∂θ = ∇_x h ∂x_k/∂θ + ∇_θ h, (5)
with ∂x_0/∂θ = 0 since the initial state does not depend on θ. The decomposition (4) and sensitivity propagation (5) provide a new perspective on computing the desired gradient ∇_θ L.
On the one hand, the desired gradient ∇_θ L is essentially the weighted sum of the sensitivities ∂x_k/∂θ and ∂u_k/∂θ, with the weights given by ∂L/∂x_k and ∂L/∂u_k, respectively, in a window of length N. Consider the earlier example where L = Σ_{k=1}^{N} ‖x_k − x̂_k‖² + λ Σ_{k=0}^{N-1} ‖u_k‖². In this case, the weights are contained in the error vector between the actual and desired state, ∂L/∂x_k = 2(x_k − x̂_k)ᵀ, and the scaled control action, ∂L/∂u_k = 2λu_kᵀ, at each time step. On the other hand, equations (4)-(5) permit computing the desired gradient ∇_θ L using only one forward propagation: no backward propagation is required. The sensitivity is propagated along with the dynamics (1) in the forward direction until reaching the end of the window, with the intermediate states x_k, controls u_k, and sensitivities ∂x_k/∂θ and ∂u_k/∂θ stored in memory. The desired gradient ∇_θ L is then evaluated using the decomposition (4) with the stored intermediate values. One will program the dynamics and controller together with the sensitivity equations, possibly treating the sensitivity as an augmented state variable itself. In other words, sensitivity propagation requires all the computations, from state update and control-action generation to gradient computation, to be custom-coded.
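For the same toy scalar system used earlier (a single integrator x_{k+1} = x_k + dt·u_k with proportional controller u_k = θ(x̂_k − x_k), an illustrative choice rather than the paper's system), the decomposition (4) and sensitivity recursion (5) reduce to a single forward loop:

```python
# Forward sensitivity propagation per (4)-(5) on a toy scalar system.
def grad_forward(theta, x0, xhat, N, dt=0.1, lam=0.01):
    x, s, g = x0, 0.0, 0.0               # s = dx_k/dtheta; dx_0/dtheta = 0
    for k in range(N):
        u = theta * (xhat[k] - x)
        su = (xhat[k] - x) - theta * s   # du_k/dtheta = (dh/dx) s + dh/dtheta
        x = x + dt * u                   # state update (on hardware: measured)
        s = s + dt * su                  # sensitivity recursion (5)
        g += 2.0 * (x - xhat[k]) * s + 2.0 * lam * u * su   # accumulate (4)
    return g
```

No trajectory storage or backward sweep is needed: the sensitivity s rides along with the state, and the gradient accumulates as the window unfolds.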
Remark 1: The sensitivity propagation and backward propagation are equivalent to forward- and reverse-mode auto-differentiation, respectively, on a computational graph. Thus, both provide analytical gradients (instead of numerical approximations).
A unique aspect of sensitivity propagation is its real-data compatibility. Using real data for tuning is vital because the ultimate goal is to improve the performance of the real system rather than that of the simulated system. No matter how precise the model is in simulation, the real system will have discrepancies from the model, leading to sub-optimal performance on the real system if the parameters come from simulation-based tuning. This phenomenon is part of the sim-to-real gap, which leads to degraded performance on real systems compared with their simulated counterparts. Sensitivity propagation, unlike backward propagation, can still be applied to compute the desired gradient while using data collected from real systems, as explained in detail in the next subsection.

B. Tuning with data from real systems
The core of DiffTune is to obtain ∇ θ L (from real systems' data, using the decomposition (4) and sensitivity propagation (5)) and then apply projected gradient descent. We summarize the DiffTune algorithm in Alg. 1.
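As a compact sketch of this loop on the toy scalar plant with a single proportional gain (illustrative only; Alg. 1 in the paper covers the general vector-valued case), one possible realization is:

```python
# Sketch of the DiffTune loop: roll out, propagate sensitivities, take a
# projected gradient step. Toy scalar plant and box feasible set assumed.
def difftune(theta, x0, xhat, N, iters=100, alpha=0.05, dt=0.1,
             theta_min=0.0, theta_max=15.0):
    for _ in range(iters):
        x, s, g = x0, 0.0, 0.0
        for k in range(N):
            u = theta * (xhat[k] - x)        # controller (2)
            su = (xhat[k] - x) - theta * s   # du_k/dtheta from (5)
            x = x + dt * u                   # dynamics (1); on hardware this
                                             # would be the measured state
            s = s + dt * su                  # sensitivity propagation (5)
            g += 2.0 * (x - xhat[k]) * s     # accumulate the gradient (4)
        # projected gradient step (box feasible set keeps the gain stable)
        theta = min(max(theta - alpha * g, theta_min), theta_max)
    return theta
```

Starting from a sluggish gain, the loop drives the parameter toward a value with a smaller tracking loss while the projection keeps it inside the stable range.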
However, model uncertainties and noise have to be carefully handled when using data from real systems. Controller design usually uses a nominal model of the system, which is uncertainty- and noise-free. However, both uncertainties and noise exist in a real system. If not dealt with, they will contaminate the sensitivity propagation, leading to biased sensitivities and thus an incorrect gradient ∇_θ L, which results in inefficient parameter updates. Since noise can be efficiently addressed by filtering or state estimation, our focus is on handling uncertainties.
Existing methods can be applied to mitigate this issue. For example, $\mathcal{L}_1$ adaptive control ($\mathcal{L}_1$AC) is a robust adaptive control architecture that has the advantage of decoupling estimation from control, thereby allowing for arbitrarily fast adaptation subject only to hardware limitations [38]. It can be applied to make a real system with uncertainties behave like a nominal system by compensating for the uncertainties. To see how $\mathcal{L}_1$AC works, consider a control-affine system
ẋ = f(x, t) + B(x, t)(u + u_ad + σ(t)), (6)
where x, u, u_ad, and σ stand for the state, baseline control (of the to-be-tuned controller), $\mathcal{L}_1$ control, and uncertainty, respectively. Here, σ is a matched uncertainty; we defer the discussion of unmatched uncertainties to future work due to space limitations. The $\mathcal{L}_1$ control u_ad aims to cancel out the uncertainty σ. Intuitively, $\mathcal{L}_1$AC compensates for the uncertainties that are lumped from the dynamics as additive terms such that the dynamics of the real system equal the nominal dynamics plus the additive terms (see [38]-[41] for details of how $\mathcal{L}_1$AC is implemented). It can be shown that u_ad + σ is bounded [42], [43], which renders the uncertain system (6) similar to the nominal system ẋ = f(x, t) + B(x, t)u. Therefore, the sensitivity propagation remains unchanged while $\mathcal{L}_1$AC handles the uncertainties. We will illustrate how $\mathcal{L}_1$AC facilitates the tuning in Section V.
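The compensation idea can be illustrated on a toy scalar plant with a constant matched disturbance. The sketch below uses a state predictor, a crude piecewise-constant-style estimate, and a first-order low-pass filter; it is a drastic simplification of the actual architecture in [38], with all gains and the plant chosen for illustration only:

```python
# Toy illustration of the uncertainty-compensation idea: a state predictor
# estimates the lumped disturbance, and a low-pass-filtered cancellation
# term is added to the control. Not the full L1AC architecture of [38].
def run(sigma, use_l1, N=200, dt=0.01):
    """Regulate x to 1 under a constant matched disturbance sigma;
    returns the final tracking error with/without compensation."""
    x, x_pred, u_ad, sig_hat = 0.0, 0.0, 0.0, 0.0
    k_p, bandwidth = 4.0, 20.0                   # baseline gain, filter bandwidth
    for _ in range(N):
        u = k_p * (1.0 - x)                      # baseline P controller
        x = x + dt * (u + u_ad + sigma)          # true plant with uncertainty
        x_pred = x_pred + dt * (u + u_ad + sig_hat)  # nominal-model predictor
        sig_hat += (x - x_pred) / dt             # crude piecewise-constant update
        if use_l1:
            u_ad += dt * bandwidth * (-sig_hat - u_ad)  # low-pass filter on -sig_hat
    return abs(x - 1.0)
```

With compensation on, u_ad converges to −σ and the closed loop behaves like the nominal plant; with it off, the disturbance leaves a steady-state offset of roughly σ/k_p that would bias any gradient computed from the nominal model.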

V. SIMULATION RESULTS
In this section, we implement DiffTune for a Dubins car and a quadrotor in simulations, where the controller in each case is differentiable. For all simulations, we use the sensitivity propagation to compute ∇_θ L, the forward-Euler method to discretize the dynamics for sensitivity propagation, and ode45 to obtain the system states by integrating the continuous-time dynamics (mimicking the continuous-time physical process of a real system).
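To contrast the two integration schemes mentioned above, consider the scalar ODE ẋ = −x; a classical Runge-Kutta integrator stands in for ode45 here (both the ODE and the step sizes are illustrative only):

```python
# Forward Euler (used for sensitivity propagation) vs. a 4th-order
# Runge-Kutta step (a stand-in for ode45) on xdot = -x.
import math

def euler(x, dt, steps):
    for _ in range(steps):
        x = x + dt * (-x)
    return x

def rk4(x, dt, steps):
    for _ in range(steps):
        k1 = -x
        k2 = -(x + 0.5 * dt * k1)
        k3 = -(x + 0.5 * dt * k2)
        k4 = -(x + dt * k3)
        x = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x
```

The higher-order integrator tracks the true exponential far more accurately at the same step size, which is why the plant is integrated with ode45 while the coarser Euler discretization suffices for the sensitivity recursion.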

A. Dubins car
Formulation: Consider the following nonlinear model:
ẋ = v cos(ψ), ẏ = v sin(ψ), ψ̇ = ω, v̇ = F/m, ω̇ = M/J, (7)
where the state contains five scalar variables, (x, y, ψ, v, ω), which stand for the horizontal position, vertical position, yaw angle, linear speed in the forward direction, and angular speed, respectively. The control actions in this model are the force F ∈ R in the forward direction of the vehicle and the moment M ∈ R. The vehicle's mass and moment of inertia are known and denoted by m and J, respectively. The feedback tracking controller with learnable parameters θ = (k_p, k_v, k_ψ, k_ω) is given by
F = m (k_p e_p + k_v e_v + $\dot{\hat{v}}$)ᵀ q,  M = J (k_ψ e_ψ + k_ω e_ω), (8)
where ·̂ indicates the desired value, and the error terms are defined by e_p = p̂ − p, e_v = v̂ − v, e_ψ = ψ̂ − ψ, and e_ω = ω̂ − ω, with p and v being the 2-dimensional position and velocity vectors, respectively, q = [cos(ψ) sin(ψ)]ᵀ being the heading of the vehicle, and v̂ = [v̂ cos(ψ̂) v̂ sin(ψ̂)]ᵀ and $\dot{\hat{v}}$ being the desired linear velocity and acceleration, respectively. The control law (8) is a PD controller with proportional gains (k_p, k_ψ) and derivative gains (k_v, k_ω). If θ > 0 elementwise, this controller is exponentially stable in the tracking errors (e_p, e_v, e_ψ, e_ω).

Simulation setup: The loss function is the squared norm of the position tracking error, summed over a horizon of 10 s. We choose 0.1 as the step size in the gradient descent algorithm, and the termination condition is that the relative reduction in the total loss between two consecutive steps is smaller than 1e-4 of the current loss value.

Generalizability: We first illustrate the generalizability of DiffTune. We select nine trajectories as the training set, where these trajectories have a maximum linear speed of 1 m/s and a maximum angular speed of 1 rad/s to represent trajectories in one operating region. The four control parameters are all initialized at 2. The tuning proceeds by batch gradient descent on the training set. The controller parameters converge to (k_p, k_v, k_ψ, k_ω) = (18.83, 6.69, 14.97, 2.66).
We then test the tuned parameters on four testing trajectories (unseen in the training set) with lemon, twist, peanut, and spiral shapes, as shown in Fig. 2. The tuned parameters lead to better tracking performance than the untuned ones. The losses on the testing set are compared with those of the untuned parameters in Table I. It can be observed that the tuned parameters generalize well and are robust to the previously unseen trajectories.

Handling uncertainties: In the second simulation, we implement $\mathcal{L}_1$AC to compensate for the uncertainties during tuning. For the $\mathcal{L}_1$AC, we use the piecewise-constant adaptation law and a 1st-order low-pass filter with 20 rad/s bandwidth. In this simulation, we inject an additive force 0.1a_1 sin(t) and moment 0.1a_2 cos(t) into the control channels of the dynamics (7) as uncertainties from the environment. To understand how the tuned parameters change with the amplitude of the uncertainties, we set (a_1, a_2) on a 10 × 10 grid such that a_1 and a_2 take integer values from 1 to 10. The four control parameters are all initialized at 10. We tune the controller parameters with $\mathcal{L}_1$ both on and off, where the sensitivity propagation in both cases is based on the nominal model (7). Different from the training-test scenario in the previous simulation, we tune the parameters on only one trajectory (the focus is on reducing the impact of the uncertainties that are not considered in the nominal dynamics). The step size and termination criterion remain the same as before. To clearly understand the individual roles of DiffTune and $\mathcal{L}_1$AC, we conduct an ablation study. The losses are shown in Fig. 3. It can be observed that both DiffTune and $\mathcal{L}_1$AC improve the performance: $\mathcal{L}_1$AC does so by compensating for the uncertainties, whereas DiffTune does so by driving the parameters to achieve a smaller tracking error.
Although the two heatmaps with $\mathcal{L}_1$ on show indistinguishable colors within each heatmap, the actual loss values have minor fluctuations. Figure 4 shows the tuned parameters when $\mathcal{L}_1$ is off and on. The influence of uncertainties on tuning is clear: when $\mathcal{L}_1$ is off, the controller itself must counteract the uncertainties by significantly raising the gains, resulting in uncertainty-dependent values. For each instance of the uncertainty, the converged gains achieve satisfactory performance only for that specific case, leading to undesirable performance if the uncertainties deviate from those present during tuning. Furthermore, the large gains sacrifice the system's robustness by amplifying noise into the control channel. In contrast, when $\mathcal{L}_1$ compensates for the uncertainties, the tuning is effectively done on the "nominal dynamics," yielding consistent parameter values that are less affected by the uncertainties.
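To summarize this subsection's model in code, the following sketches the Dubins-car dynamics (7) and a PD tracking law in the spirit of the controller described above; the feedforward terms are omitted, and the masses, gains, and exact controller form are illustrative assumptions rather than the paper's implementation:

```python
import math

# Forward-Euler step of the Dubins-car dynamics (7); m, J, dt illustrative.
def dubins_step(state, F, M, m=1.0, J=1.0, dt=0.01):
    x, y, psi, v, w = state
    return (x + dt * v * math.cos(psi),
            y + dt * v * math.sin(psi),
            psi + dt * w,
            v + dt * F / m,
            w + dt * M / J)

# PD tracking law in the spirit of (8); feedforward acceleration omitted.
# theta = (kp, kv, kpsi, kw); des = (px, py, vx, vy, psi, w), all desired.
def pd_control(state, des, theta, m=1.0, J=1.0):
    x, y, psi, v, w = state
    kp, kv, kpsi, kw = theta
    qx, qy = math.cos(psi), math.sin(psi)        # heading vector q
    ex, ey = des[0] - x, des[1] - y              # position error e_p
    evx, evy = des[2] - v * qx, des[3] - v * qy  # velocity error e_v
    F = m * ((kp * ex + kv * evx) * qx + (kp * ey + kv * evy) * qy)
    M = J * (kpsi * (des[4] - psi) + kw * (des[5] - w))
    return F, M
```

Closing the loop with positive gains regulates the vehicle to a desired pose, consistent with the exponential-stability claim for θ > 0.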

B. Quadrotor
Formulation: Consider the following model on SE(3):
ṗ = v,  m v̇ = m g e_3 − f R e_3, (9a)
Ṙ = R Ω_×,  J Ω̇ = M − Ω × J Ω, (9b)
where p ∈ R³ and v ∈ R³ are the position and velocity of the quadrotor, respectively, R ∈ SO(3) is the rotation matrix describing the quadrotor's attitude, Ω ∈ R³ is the angular velocity, g is the gravitational acceleration, e_3 = [0 0 1]ᵀ, m is the vehicle mass, J ∈ R³ˣ³ is the moment of inertia (MoI) matrix, f is the collective thrust, and M ∈ R³ is the moment applied to the vehicle. The wedge operator (·)_× : R³ → so(3) denotes the mapping to the space of skew-symmetric matrices. The control actions f and M are computed using the geometric controller [44], with the simplification that the desired angular rate Ω̂ = 0. The geometric controller has a 12-dimensional parameter space, which splits into four groups of parameters: k_p, k_v, k_R, and k_Ω (applied to the tracking errors in position, linear velocity, attitude, and angular velocity, respectively). Each group is a 3-dimensional vector (associated with the x-, y-, and z-components of the corresponding tracking error). The initial parameters for tuning are set as k_p = 16I, k_v = 5.6I, k_R = 8.81I, and k_Ω = 2.54I, where I = [1, 1, 1]ᵀ. We add zero-mean Gaussian noise to the position, linear velocity, and angular velocity (with standard deviations 0.1 m, 0.1 m/s, and 1e-3 rad/s, respectively).

Simulation setup: We use the following desired trajectory for tuning: p(t) = [2(1 − cos(t)), 2(cos(t) − 1), 0]ᵀ with a constant desired yaw angle of 0 (the quadrotor's differentially flat dynamics enable the description of the desired state by position and yaw and their derivatives [45]). The loss function is the squared norm of the position tracking error, summed over a horizon of 10 s. We choose 1e-3 as the step size in the gradient descent algorithm. The termination condition is either the relative loss reduction being less than 1e-3 of the current loss value or the current loss increasing by more than 10% of the previous loss.
We set the 10% threshold to tolerate loss oscillations caused by the noisy data and to avoid early termination.
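To make the attitude-kinematics notation concrete, the wedge operator (·)_× defined above can be written as a small helper (an illustrative utility, not the paper's code):

```python
# Wedge (hat) map from R^3 to so(3): hat(a) @ b equals the cross product a x b.
def hat(v):
    x, y, z = v
    return [[0.0,  -z,   y],
            [  z, 0.0,  -x],
            [ -y,   x, 0.0]]
```

The resulting matrix is skew-symmetric, and multiplying it with a vector reproduces the cross product, which is exactly how it appears in the rotational dynamics.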
Handling uncertainties: In this simulation, we consider the uncertainty caused by an imprecisely known value J of the MoI. We set the vehicle's true MoI to βJ for β from 0.5 to 4 and use J in the controller design as our best knowledge of the system. The scaled MoI can be treated as an unknown control input gain (see (9b)), leading to a decreased (β > 1) or increased (β < 1) moment in reality compared with the moment commanded by the geometric controller. However, the uncertainty caused by the perturbed MoI can be well handled by $\mathcal{L}_1$AC, which is adopted in the simulation (formulation detailed in [39]). We conduct an ablation study to understand the roles of DiffTune and $\mathcal{L}_1$AC by comparing the root-mean-square error (RMSE) of position tracking, as shown in Table II. It can be seen that tuning and $\mathcal{L}_1$AC each individually reduce the tracking RMSE. The best performance is achieved when tuning and $\mathcal{L}_1$AC are applied jointly, except for β = 0.5 and 1, where the noise causes almost indistinguishable, slightly inferior performance. We show the tuned controller parameters in Fig. 5. The parameters with $\mathcal{L}_1$ in the loop show better consistency across the perturbations of the MoI than those without $\mathcal{L}_1$. This observation is consistent with that in the Dubins car simulation (Fig. 4).

VI. CONCLUSION
In this paper, we propose DiffTune: an auto-tune method for systems with differentiable dynamics and controllers. Given a performance metric, DiffTune gradually improves the performance using gradient descent, where the gradient is computed using sensitivity propagation, which is compatible with real data. We also discuss how to use $\mathcal{L}_1$AC to mitigate the discrepancy between the nominal model and the real system when the latter suffers from uncertainties. Simulation results on the Dubins car and the quadrotor show that DiffTune improves the system's performance while $\mathcal{L}_1$AC compensates for the uncertainties. The generalizability of the tuned parameters is also illustrated using training and test sets in the Dubins car simulation.