Data-driven Reference Trajectory Optimization for Precision Motion Systems

We propose a data-driven optimization-based pre-compensation method to improve the contour tracking performance of precision motion stages by modifying the reference trajectory and without modifying any built-in low-level controllers. The position of the precision motion stage is predicted with data-driven models, a linear low-fidelity model is used to optimize traversal time, by changing the path velocity and acceleration profiles then a non-linear high-fidelity model is used to refine the previously found time-optimal solution. We experimentally demonstrate that the proposed method is capable of simultaneously improving the productivity and accuracy of a high precision motion stage. Given the data-based nature of the models, the proposed method can easily be adapted to a wide family of precision motion systems.


I. INTRODUCTION
A precision motion stage (PMS) is a robotic setup designed to precisely position an end-effector, typically in the microto nanometer range. PMS are used in lithography [1], microassembly [2], precision metrology [3], and precision engineering [4], [5], and many other applications [6], [7]. They are often used as components in machine tools for the production of parts, such as metal laser cutting or grinding processes where the goal is to trace a desired target geometry as quickly and accurately as possible.
The accuracy of PMS systems is critical as the parts they produce must satisfy tight engineering tolerances. At the same time, economic considerations dictate rapid production of parts to boost productivity. Unfortunately, the inertia of PMS systems dictates a trade-off between productivity and accuracy and, in practice, the control strategy is often the determining factor in this trade-off. Typical industrial precision systems achieve high-precision using well-tuned "classical" feedback controllers, such as Proportional-Integral-Derivative (PID), to track a given target trajectory. These low-level controllers are designed by the manufacturer and users are typically not able to modify/tune the control algorithm.
In this paper, we propose an optimization-based reference (input trajectory) design methodology for simultaneously improving the precision and productivity of PMS by leveraging the widespread availability of system data. Our methodology is guided by the following design principles: The authors are with the ETH Zürich Automatic Control Laboratory, Physikstrasse 3, 8092 Zürich, Switzerland. E-mail: {sbalula, dliaomc, ralisa, jlygeros}@ethz.ch. S. Balula and A. Rupenyan are also with Inspire AG. This project has been funded by the Swiss Innovation Agency (Innosuisse, Grant Number 46716) and the Swiss National Science Foundation through NCCR Automation, a National Centre of Competence in Research (Grant Number 180545). D1 does not require modifications to the existing hardware and low-level controllers, making it suitable for application in production machines; D2 does not require a physics-based model of the machine, relying exclusively on experimental data; D3 can be applied to individual machines with diverse technical characteristics; D4 can produce good results for previously unseen parts at the first attempt.
These design principles are motivated by the requirements of "Industry 4.0" where machines are no longer standalone setups and are instead connected in a network, making data more easily available [8]. This data can be exploited to improve performance; for example, it can be leveraged to create data-driven models, avoiding the labour intensive process of developing and maintaining first principles based ones [9]. Future manufacturing is also expected to be more distributed and personalized (e.g., "Manufacturing as a service") [10], [11], leading to smaller volumes for any given part, possibly produced across a pool of non-homogeneous machines. This makes it more important to exploit data to ensure that machines can execute designs with minimal calibration.
3) Pre-compensation: Pre-compensation refers to a class of methods that aim to modify the input trajectory offline to reduce the contouring error for arbitrary parts [23]. A common approach is to drive a model of the contouring error to zero by manipulating the input trajectory using a classical controller (e.g., a PID) [24]. These methods cannot include look-ahead or constraints, motivating the use of offline MPCC algorithms based on physics-based nonlinear [25], identified linear models [26] or combined with a reference governor [27]. However, these generate suboptimal input trajectories since the optimization uses a receding horizon, instead of optimizing the trajectory as a whole.
In [28] the contour error is predicted with an Artificial Neural Network (ANN), and the pre-compensation computed with reinforcement learning. This approach does not allow constraints to be included and uses only zeroth-order information about the ANN model, despite the ready availability of ANN derivatives.

B. Contributions
In this paper, we propose a novel pre-compensation method based on data-driven modelling and trajectory optimization to improve simultaneously the productivity and performance of PMS. Our contributions are three-fold: (i) A data-driven framework for PMS modelling and identification for control, including design of experiments for collecting training data, and machine learning architecture in a feed-forward form. (ii) A novel contour control pre-compensation method featuring time-optimal trajectory design and motion constraints. (iii) Experimentally validated improvement over the existing controller across the precision/productivity trade-off curve. Our approach does not require any modification of the design or control software of the system (D1) and is fully datadriven (D2). Moreover, it enables adaptation to changing conditions or system degradation through retraining the underlying data-driven model (D3). Finally, the approach is independent of the target geometry and significantly shifts the precision/productivity trade-off curve 1 (D4). We build upon our previous work [29] based on linear system identification.
Here we improve and extend this approach by incorporating a nonlinear ANN-based model of system performance, which offers two orders of magnitude higher precision, a refined optimization approach, and experimental validation of the simulation results.

C. Organization
In section II we precisely define the problem tackled, while in section III we describe the modelling structure, the model fitting strategy, and the accuracy achieved. In section IV we detail the input trajectory design strategy proposed, specifying the optimization problems. In section V we show experimental validation for the method with a set of test cases. x axis y axis Fig. 1: An example target geometry, featuring a segment of an Archimedean spiral, described in polar coordinates by r = b · θ with b = 50 2π mm rad −1 and θ ∈ [0, 4π]. The two plots show alternative representations of the same shape. In the left panel it is shown in a (x(s), y(s)) plot, parametric in the path progress s, while in the right panel each axis is a function of s.

A. Target geometry tracking problem
Consider a tooltip, e.g., the laser in a laser cutting machine, which traverses a two-dimensional (2D) workspace W ⊂ R 2 . The position of the tooltip as a function of time is denoted by The ultimate goal in precision motion planning is to have the tooltip trace a spatial target geometry where S is the path length. The target geometry Ξ is assumed to be a continuous function parameterized by the path progress variable s. An example is shown in Figure 1. Typical design specifications require the target geometry to be traced with a precision of tens of micrometers; for example, in industrial laser cutting systems a typical precision specification is 20 µm Most industrial precision motion systems come with a builtin low-level controller designed by the manufacturer to track an input trajectory ρ : [0, T ] → W, where T > 0 is the time necessary for the tooltip to complete tracing the target geometry. When ρ is given as an input to the lowlevel controller, its actions and the dynamics of the machine give rise to an output trajectory γ : [0, T ] → W; note that for simplicity we assume that both the input and the output trajectory lie in the workspace W. We denote the mapping between the commanded input trajectory ρ and the resulting output trajectory γ by where the function F : W [0,T ] → W [0,T ] encapsulates the closed-loop dynamics of the machine, and v is a noise term, with zero-mean and a typical standard deviation of 2 µm, which quantifies the repeatibility of the system. Ideally, the low-level controller is well designed so that γ ≈ ρ and F is approximately the identity function. Fig. 2: A low-level controller tracks the input trajectory ρ, producing the output trajectory γ. The deviation d(s, γ) measures the distance from γ to the target geometry Ξ as a function of the path progress s. We desire that the deviation is smaller than the tolerance µ(s) at all points.
We are interested in the following inverse problem: Given the target geometry Ξ, how do we generate an input trajectory ρ for the machine, so that the resulting output γ = F (ρ) matches Ξ as closely as possible. In addition to the usual difficulties associated with inverse problems, for precision motion systems the function F is often poorly known, partly due to lack of information about the low-level controller.
In standard practice, input trajectories are generated by traversing the path at a constant speed v > 0, i.e. setting ρ(t) = Ξ(vt). However, due to sensor noise, actuator limits, the inertia of the machine, and other factors we observe experimentally a significant error between γ and ρ, especially when v is high. This induces a natural trade-off between speed and precision that is characteristic of precision motion systems. Our aim is to improve this trade-off by learning the function F through preliminary experiments, then inverting it using optimization.
Our goal is then to solve the optimization problem where T (γ) is the duration of the output trajectory, λ > 0 governs the trade-off between speed and accuracy, and µ(s) : [0, S] → R is a tolerance bound around the target geometry; for the typical industrial laser cutting systems mentioned above, one would use µ(s) = 20 µm, ∀s ∈ [0, S]. The different quantities are illustrated in Figure 2. The optimization problem (5) is stated in a trajectory space and is not computationally tractable as written. In the following sections we present and experimentally validate a practical methodology for approximately solving (5) using only data gathered from the process. Because the methodology is data-driven and noninvasive (in the sense that it does not require any knowledge of the underlying controllers but only the ability to run experiments on the machine) it is applicable to a wide range of practical industrial applications.

B. Experimental Setup
While our proposed methodology is applicable for general PMS, we use a specific experimental setup as a running example throughout the paper and to experimentally validate our approach. The experimental setup is shown in Figure 3a, and consists of an ETEL DXL-LM325 two stage motion machine. The machine is instrumented with quadrature encoders that measure the position of both axes and is actuated with ETEL LMS15-100 linear motors.
The system is equipped with a feedback controller that adjusts the motor voltages to track a supplied input trajectory ρ. The controller is implemented on a Speedgoat rapid prototyping unit using MATLAB/Simulink software. For the purposes of this paper we consider this loop as a black-box and consider only the desired input trajectory ρ as an input. The machine enforces limits on the acceleration, velocity, and position of the input trajectory to prevent accidental damage. The main characteristics of the machine are collected in Table I.

C. Test cases
We selected two test cases, a series of letters and an airfoil, as target geometries to validate our proposed methodology. These geometries contain a rich set of features that are representative of industrial applications. To avoid "cherry picking", and to demonstrate generalization to new geometries, neither test case was used to generate training data or during the development of the method.
• The letters test case, as can be seen in Figure 3b, consists of four letters ("u","r","c", and "h") from the ETH Zürich logo. These letters were selected due to the rich set of relevant and challenging features they contain, including sharp corners, straight segments, and curves with various curvatures. The limiting contour of each letter is used as the target geometry. • The airfoil test case, shown in Figure 3c, is based on a high-lift Eppler e344 airfoil [30]. Geometries of this kind are typically laser cut out of e.g., balsa wood, and are used in the primary structural sub-assembly of an aircraft. The main challenging features of the airfoil are a cut-out at the leading edge (L) used to attach a structural spar, and the trailing edge (T) whose geometry has a strong effect on the aerodynamics of the aircraft.

D. Architecture overview
We approach the trajectory design problem in several stages. First, we design experiments to generate informative inputoutput data for the unknown mapping F and use the resulting data to identify two models for the system, a low-fidelity linear one and a high-fidelity one based on an ANN. Throughout, we incorporate machine learning best practices, such as independent training and validation datasets, normalization, and diverse training data to avoid over fitting and to ensure that the resulting models are able to generalize to trajectories outside the training dataset, enabling optimization of new geometries.
Using these models, we propose a two-stage optimizationbased approach for offline optimization of references as schematized in Figure 4. In the first stage, we compute an input trajectory by solving a contouring control problem using the linear model of the system. This first stage solution yields a fast trajectory that respects the problem constraints, by reducing the speed in intricate features of the path, such as sharp corners, and accelerating through long smooth segments. The output of the first-stage is used as the initial condition for the second stage, which employs the high-fidelity neural network model to correct errors from the first stage. The resulting input trajectory is then given to the low-level controller as an input trajectory.

III. DATA DRIVEN MODELLING
As a first approximation, we model the system using a continuous-time linear model which captures the dominant dynamics. The structure of this model allows for efficient computations of a time-optimal solution in the first stage, but lacks the high-precision required by the application. This shortcoming is alleviated by a second high-fidelity neural network model, used to refine the initial solution provided by the lower-fidelity linear model.

A. Low-fidelity linear model
The closed-loop system maps trajectories to trajectories and is of the form γ = F (ρ). We approximate this mapping using   (14) a discrete time ANN-based nonlinear model at a sampling frequency of 400Hz is used to find the best input trajectory r to trackΞ, using the the first stage solution as initial guess. The a max parameter, found in the constraints (12j) and (14f) is singled out since the maximum acceleration of the output trajectory is found to be decisive for how much time it takes to track the target geometry. The optimized input trajectory parameterized in discrete time r is given to the experimental apparatus, and the output trajectory φ recorded by the axis quadrature encoders at a sample rate of 10 kHz. (6b) The model (6) uses a non-physical hidden state z ∈ R 4 ; the matrices A, B, C, and D are identified from experimental data using the MATLAB function n4sid from the system identification toolbox. The dimension of the state was chosen empirically, noting that increasing the model order further has diminishing returns. In particular dimensions higher than 4 have practically the same prediction error, but with added complexity. The linear model captures the dominant system dynamics (model validation results are presented in Section III-D), allowing us to formulate and solve a tractable optimization problem that effectively trades off the total trajectory time and its accuracy, as described in Section IV-A. However, it fails to capture some of the nonlinearities inherent in the system and is not sufficiently accurate to achieve the precision required by the application. On the other hand, the computational burden of using only the more complex nonlinear model (described in Section III-B) without the linear model would limit the method to unrealistic small problem instances.

B. High-fidelity Nonlinear Model
To obtain a higher precision model, we use an ANN to capture the nonlinearities of the system dynamics. ANN are much more expressive than linear models, but require more informative training data to avoid over fitting; the latter is not a drawback in our application, where a large amount of data can be collected at low cost.
We also investigated other alternatives, such as Gaussian Processes, but found that the ANN architecture below offered a more favourable trade-off between accuracy and computation effort. They are well suited for our application, where a large amount of data can be collected at low cost.
The 2D-positioning stage is modelled using two independent causal ANN, one for each axis. Unlike (III-A) the nonlinear model operates in discrete time, at a sample frequency of 400 Hz and it has the following form Where φ is the output, f nlm is the nonlinear function defined by the ANN, r is the input trajectory given to the closed loop system, and h is the length of the input history considered for the prediction. Note that we have introduced the variables r in place of ρ, and φ in place of γ to make explicit that ρ and γ are parameterized in continuous time, while r and φ are defined in discrete time. The two ANN are structurally identical, but are trained separately, leading to different weights. Each takes as input the most recent 500ms of the input trajectory for the corresponding axis, subsampled at 400Hz, leading to 200 inputs. Each network has a single output, namely the predicted position at the subsequent time step. The networks have 8 fully connected hidden layers of LeakyReLU [31] activation functions with a slope of 0.01, and a triangular structure as illustrated in Figure 5.  ANN for each axis we are implicitly ignoring coupling effects between the two axes; this choice is supported by experimental data that suggests coupling effects are negligible, as will be demonstrated in Section III-D. Finally, note that the nonlinear and linear models are independent, in particular the predictions of the linear model are not used in the nonlinear model evaluation.

C. Training data generation
Choosing input trajectories that cover a wide range of operational regimes is essential for generating informative data to train models that can generalize to previously unseen trajectories. Classical system identification methods make use of a toolbox of excitation signals such as Pseudo Random Binary Signal, sinusoidal, or chirp signals. These inputs have the property of being persistently exciting for linear systems. Unfortunately none of these proved adequate for the identification of the nonlinear system in this paper.
1) Linear Model: For the identification of the linear model we opted for randomly generated trajectories, yielding 6 × 10 6 points in total. By randomly changing the velocity, acceleration, and jerk of the input trajectory we made sure that the trajectories represent features seen in real parts. This included, for example, curves of different radii, traversed at different speeds and appearing in different locations within the workspace, as well as sharp transitions in acceleration and jerk. The trajectories were not designed for a specific part, but for a particular set of constraints on velocity and acceleration.
2) Nonlinear Model: Having sufficient high-quality data is necessary to ensure that the ANN is accurate and generalizes well. We started by training a first generation ANN with the random trajectories used to fit the linear model, and a set of varied regular geometries including circles, polygons, and spirals. Each shape was traversed at varied speeds, constant for each trial. To enrich the data set further, the trained first generation ANN was used in the optimization methodology of Section IV applied to the same regular geometries. This resulted in optimized input trajectories and, after applying these to the system, additional experimental data. This data was used to train the ANN further obtaining a second generation ANN. This training-optimisation-retraining cycle can be executed recursively until the model prediction accuracy is acceptable. These data sets lead to a targeted reduction in over fitting in areas that are favoured by the optimization. Overall, our strategy is to include sufficient random (exploratory) and structured (exploitative) data to ensure the ANN is both accurate and generalizes well in the areas of interest. Our experiments suggest that there is no added benefit in refitting the linear model, since its small number of parameters is already identified well with the initial dataset.
A representative training data sample is shown in Figure 6, and the flow of data is summarized in Figure 7. In section III-D we present the prediction quality of a third generation ANN to generate the results of Section V. The data was divided 80%/20% for training and testing purposes. This results in a ratio of ≈ 34 data points per parameter of the ANN. About 60% of the data comes from the randomly generated trajectories, and the remaining from regular shapes both before and after optimization. The experimental data is divided in segments of 500 ms, subsampled to achieve an effective sample rate of 400Hz, and scaled. For the loss function we use the mean square of the prediction error. We use batches of 16k segments which take maximum advantage of the GPU computation power. Adam [32] was used for the optimizer, with decaying learning rate. Modelling and training were done in PyTorch [33] with NVIDIA CUDA support.

D. Model Validation
The quality of the model predictions is evaluated over different datasets and acceleration limits. The results are Fig. 7: Flow of data used to train the linear and nonlinear models. First experimental data is collected for randomly generated trajectories and regular shapes. This dataset is used to fit and train the linear and a first generation nonlinear models. The models are used to compute optimized input trajectories for the set of regular shapes using the method of Section IV the additional data is used to extend the training the nonlinear model. This is done because the optimizer has a tendency to exploit regions of the nonlinear model that are not well fitted to reality.  summarized in Table II and Figure 8. The linear model training data set includes trajectories with a range of accelerations up to 5 m s −2 . However, due to its simple structure, the linear model cannot overfit the data, hence cannot accurately predict all points in its training data set. We observed that it performs better when the acceleration is restricted to a lower value, even for trajectories not seen during the fit, such as the letters test case.
Regarding the nonlinear model, the prediction accuracy is best for data with the same characteristics as the one used for the training, as can be seen in Figure 8a where the error is evaluated for the validation dataset. The empirical error distribution has zero mean. When presented with data coming from shapes which have not been used for the training of the model (Figures 8b and 8c), the prediction accuracy degrades. However, the accuracy is still adequate for the application, as the standard deviations remain below 20 µm for each axis for accelerations below 3 m s −2 as shown in Figure 8c. Figure 9 shows how the standard deviation of the prediction evolves with the maximum acceleration a max that was set as a constraint in the optimisation of Section IV. For acceleration values where there is sufficient training data, the model prediction error increases gradually with the acceleration. If we try to use the model outside of the acceleration values it was trained for, the predictions rapidly deteriorate.

IV. TRAJECTORY OPTIMIZATION
With the the data-driven models in hand, we now discuss the two-stage optimisation architecture used to compute input trajectories. The decision to divide the optimization problem in two stages is motivated by the need to include the duration T of the trajectory in a computationally tractable manner. This is accomplished by using a variable time-discretization in the first stage, enabling inclusion of T as a decision variable. The second stage then fixes the duration T and refines the accuracy of the first-stage solution using the high-fidelity neural network model. Our experiments suggest that treating the time discretization directly with the nonlinear model is intractable. A key difficulty is that real machines typically generate noisy output measurements in a sampled, time series format. These cannot be readily used to train the high-fidelity continuous time model needed for a computationally tractable variable-time discretization, due the challenges associated with numerical differentiation of noisy data. Trying to solve the inverse problem in one shot using the resulting model leads to low quality solutions; often the solver fails to even return a feasible solution in a reasonable amount of time. On the other hand, skipping the second stage and inverting using only the linear model is computationally tractable, but typically leads to low quality solutions; this observation is supported by the model prediction errors shown in Table II. Training a fixed sampling time nonlinear model and seeding with the solution of the first stage optimization substantially reduces the overall computation time and improves the quality of the solutions.
A key advantage of our methodology is that, unlike iterative learning control, once the models are trained and we are given a target geometry, the input trajectory is computed offline, without the need for additional experiments. This is especially important for small batch manufacturing where it is crucial to minimize failed parts. However we can still use the information collected from experimental runs to improve the model, as shown schematically in Figure 7.

A. First-stage Optimization Formulation
The purpose of the first stage is to fix the duration of the trajectory, T , by trading off speed and precision. We employ a contouring control approach with a fixed spatial discretization. This involves sampling the target geometry at N equally spaced points in space {Ξ k . = Ξ(s k ) : k ∈ {0, 1, . . . , N − 1}}, where We use the identified linear model (6) to approximate the process dynamics. As our approach involves a fixed spatial discretization ∆s, the time-discretization must be variable. Starting with t 0 = 0, we denote by t k as the time at which we would like the trajectory to reach the point Ξ k and consider the non-uniform time steps ∆τ k = t k+1 − t k , k = {0, . . . , N − 1} as decision variables; note that the time T = t N = N −1 k=0 ∆τ k at which the trajectory is completed is implicitly also a decision variable. We then obtain a discrete time model by applying the 4th order Runge-Kutta (RK4) [34] time-marching method to (6) z Where f lm is the RK4 function, and ∆τ will be treated as decision variables.
To evaluate the contouring error d k we start with the distance ν k = γ k − Ξ k and introduce a local coordinate frame as shown in Figure 10. The linear transformation where α k is the rotation angle of the local frame, encodes a translation and rotation between the local coordinate frame k and the global coordinate frame. With the output can be transformed from local coordinates ν k to global coordinates γ k . Since time discretization is variable, we can always ensure that the x l component of ν k is zero by adding a constraint to the optimization problem. The contouring error is then simply the y l component of ν k . The tolerance bound condition in (5) can then also be computed for the corresponding discretization point µ k = µ(s k ).
The aim of the optimization problem is to minimize the total time needed to complete the target geometry subject to the tolerance bounds, velocity and acceleration limits. To this basic formulation, we introduce an additional term for the input to promote smoothness; this is used to suppress high frequency content on the input trajectory, which is filtered out Fig. 10: For every point Ξ k of the target geometry Ξ(s) a local coordinate frame is placed. x l is tangent to the target geometry at Ξ k and pointing in the direction of increasing s whereas y l is orthogonal to x l pointing to the port side. α k is the angle between the global x axis and x l . The maximum allowed deviation µ k is computed from µ(s). ∆s is the path distance between points Ξ k and Ξ k+1 . by the linear model. This results in the following optimization problem min ∆τ, ρ, γ, z, ν where N is the number of discretization points, selected based on the total path length and tolerance bound, are discrete derivative operators, ν = {ν k } N k=1 ⊆ R 2 is the output trajecotry in local coordinates, γ = {γ k } N k=1 ⊆ R 2 is the output trajectory in global coordinates, ρ = {ρ k } N −1 k=1 ⊆ R 2 is the input trajectory, f lm is the RK4 approximation of the linear model (9), with identified matrices C and D, T k is the transformation matrix between global and local coordinates in (10), µ k is the discretized maximum allowed deviation from the target geometry, W are the limits of the working space of the device, v max is the maximum allowable speed, and a max is the acceleration limit.
The cost (12a) contains two terms. The first penalizes the total time to perform the trajectory while the second penalizes the acceleration of the input, promoting fast trajectories with a smooth input, (12b) and (12c) imposes the RK4 discretization of the linear dynamics (9), (12d) requires the variable time step to be non-negative, (12e) imposes that the output stays within the working space of the device, (12f) the transformation between local and global coordinates in (11), (12g) requires that the x l k component of ν k is zero, in which case the y l k component is the deviation, (12h) bounds the y l k component of ν k to be less than the maximum allowed deviation, (12i) and (12j) impose component-wise limits on velocity and acceleration of the output.

B. Second-stage Optimization Formulation
Experiments applying the trajectory ρ generated by (12) directly to the machine suggest that this generally results in large deviations from the target geometry, which we attribute to the low fidelity of the linear model; the data is omitted in the interest of space. To improve performance, we refine the trajectory generated by the first stage using the high-fidelity ANN model (7). In principle, it is possible to skip the first stage and directly embed the ANN model in an optimization problem similar to (12). However numerical experiments suggest that this often leads to poorer performance even compared to the linear model, as the solver tends to converge to poor local minima, or fails to return a feasible solution. Here we take advantage of the output of (12) to fix the time discretization and further improve precision through a second optimization problem based on the nonlinear model. In practice this leads to a more reliable and scalable overall method.
Given the structure of the nonlinear model, which is trained using time series data, we sample the target geometry at M equally spaced points in timē where ∆t is the fixed sample rate of the time series data used to train the nonlinear model. Given this fixed discretization, the target geometry of the second-stage optimization problem is simply to minimize the deviation from the target trajectory while satisfying the tolerance, speed, and acceleration bounds. This leads to the optimization problem where M is the number of discretization points, φ = is the sampled target geometry, f nlm is the nonlinear ANN model (7), µ i is the tolerance bound evaluated for each discretization point, W are the limits of the working space of the device, v max is the maximum allowable speed, and a max is the acceleration limit.
The cost (14a) penalizes deviations of the output from the target geometry, in this case, and contrary to the first stage, we do not include a smoothing term. Trajectories that correct the error, even with aggressive input trajectories (as long as they comply with the constraints) are acceptable. The constraint (14b) imposes the nonlinear system dynamics modelled by the ANN, (14c) constrains the output to be within the tolerance bound for each discretization point, While (14d), (14e) and (14f) ensure that the output, its velocity and acceleration stay within working space and operational limits of the device.
In practice, to ensure feasibility, the tolerance constraint is replaced with an exact L 1 penalty which is implemented using slack variables [35]. Note that the a max limits the maximum acceleration of the output trajectory and it is a decisive parameter determining the total trajectory time. It is the main tuning parameter of the method along the accuracy/trajectory time trade-off curve.

C. Implementation Details
The optimization problems defined in (12) and (14) are both nonlinear problems, which were solved using IPOPT [36], with the JuMP [37], and Casadi [38] interfaces in Julia and Python respectively. The first stage optimization problem (12) is initialized by taking as ρ the target geometry traversed at a constant speed. The solution of (12) is then used to initialize the second-stage (14).
The computer used throughout this paper runs Arch Linux, with Linux kernel version 5.15, and it is equipped with a GeForce RTX 2080 Ti GPU with 12 GB of dedicated memory, an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz and 64 GB of RAM.
For the problem sizes solved in this paper, the firststage optimization takes approximately 1 minute to complete (N ≈ 10 3 ), and the second-stage takes approximately between 1 to 4 hours depending on the trajectory time (M ∈ {200, . . . , 1000}). Lower a max results in slower trajectories which take more time to compute due to the increased number of function evaluations required. Training of the ANN model takes approximately 24 h per axis.

V. EXPERIMENTAL RESULTS
In this section, we apply our proposed data-driven optimization methodology from Section IV to the experimental apparatus in II-B, using the test cases in II-C. Our experiments demonstrate that the methodology is capable of improving system performance relative to the baseline -the trajectory obtained with the low-level controller following a not optimized input trajectory -by tracing desired geometries faster and more precisely.

A. Individual trajectories
We first focus on the letter "r" as a case study and consider a scenario with an acceleration limit of 1 m s −2 ; the individual  Fig. 11: Letter "r" of the ETH Zürich Logo. Optimized with a max = 1 m s −2 , including a detail view of station B. Note how the optimized input trajectory significantly deviates from the target geometry near the corner in B to compensate for the dynamics of the machine. This leads to significantly smaller error between the output and the target geometry compared to using the state-of-practice (non-optimized) input trajectory. results for the other letters in the test case are comparable and will be discussed collectively in Section V-B. Applying our method results in an optimized input trajectory with a total time of T = 0.807 s. The experimental result of this new input trajectory is compared with a baseline performed in the same total time. For this the original, non-optimized shape (in this case the letter "r") is sampled at a constant progress speed where S is the total path length, which for a closed shape corresponds to the perimeter. Figure 11 compares the output and input trajectories for the optimized and baseline cases in the x − y plane while Figure 12 displays the same data as a function of progress. Analyzing Figures 11 and 12, we see that the optimized output effectively stays within the selected µ = 20 µm while the baseline output deviates by more than 100µm. The optimized output trajectory speeds up in areas with low curvature and slows down near corners or other intricate features. While the baseline input trajectory path speed is constant, the speed of the projection of the output onto the target geometry shows peaks around the corners. This is an artifact of the projection,  Figure 11. Top panel shows the deviation from the target geometry for the optimized and non-optimized output, as well as the optimized input trajectory. The non-optimized input trajectory by construction does not deviate from the target geometry. Bottom panel shows the velocity of the optimized and non-optimized outputs projected onto the target geometry.
given the way the non-optimized output cuts through corners, as shown in the detail of Figure 11. Figure 12 also illustrates that the optimized input trajectory deviates aggressively from the target geometry near areas where the baseline output performs poorly. This can be interpreted as an attempt to "cancel out" the error; The detail in Figure 11 illustrates this behaviour in the x − y plane. Figures 13-15 display the same information for the airfoil test geometry with a maximum acceleration of 2 m s −2 . The baseline and optimized trajectories complete the part in 1.081 s. Similarly to the letter "r" example, the optimized trajectory slows down near intricate features, in this case the leading and trailing edges of the airfoil, and accelerates between them. Figures 14 and 15 show that the optimized input again seeks to "compensate" for deviations in the baseline trajectory.
To summarize, we observe that the optimizer exploits two main mechanisms to improve performance of the machine relative to the baseline system with the same trajectory time: 1) It reduces the speed of the input trajectory near intricate geometric features and increases it in areas with low curvature. 2) It "compensates" for errors in the baseline trajectory by moving the optimized input trajectory in an opposing direction.  The optimized trajectory is able to track the desired geometry more precisely than the baseline, though still not within the 20 µm tolerance band. The optimized input trajectory is not constrained by the tolerance band.
In effect, the optimizer exploits information encapsulated in the data-driven models to push the limits of system performance and adapt the input trajectory to the machine capabilities, for the selected maximum acceleration value.

B. Precision-accuracy Trade-off
In precision motion systems there is a fundamental trade-off between speed and accuracy caused primarily by the inertia of the machine. In this section, we demonstrate that our proposed methodology is able to improve system performance by shifting this trade-off.
In our formulation (12)- (14), the trade-off between speed and accuracy is controlled by the parameter a max which limits the maximum acceleration of the input trajectory. We  The letters L and T refer to the leading and trailing edges labelled in Figure 13. The interpretation of the panels in analogous to Figure 12.
input trajectories are subsequently run with the same total time as the one found for the optimized input trajectories.
We use the normalized L 1 and L 2 (rms) norms as well as the L ∞ norm to quantify the deviation between the output φ and target geometry Ξ, both in simulation and experimentally The results are shown in Figures 16 and 17. In all cases the optimized experimental output results in significantly more precise trajectories for a given part completion time. Moreover, the simulated deviation is lower than the experimental one due to model mismatch. The non-optimized experimental output shows lower deviation than the simulation predicts in most cases. Model predictions are distributed approximately symmetrically around the experimental output as seen in Figure  8. However the error is measured between a) the prediction and the target geometry for the simulated case and b) the experimental output and target geometry for the experimental case. For a) the prediction error can be seen as the sum of the expected error, in this case measured experimentally and the model error, while in b) is only the experimental error. For acceleration values higher than 5 m s −2 , as seen in the left side  Fig. 16: Precision-speed trade-off curve for the concatenation of the letters "u", "r", "c", and "h" from the ETH Zürich logo test case. The top panel shows the normalised L 2 deviation of the optimized and non-optimized outputs from the target geometry. The bottom panel shows the total time to perform the trajectory and the a max used in optimization. To determine the performance at a given acceleration value, one can use the lower panel to determine the time needed to complete the letters for a given acceleration, then retrieve the deviation for the corresponding time from the top plot.
of Figure 17, the model prediction exhibits high deviations. Since the model used for simulation is the same as the one used for optimization, in this region the designed trajectory has over exploited the model, resulting in an overly optimistic simulated deviation.
As seen in Figure 17, precision begins to degrade rapidly for high accelerations starting at around 5 m s −2 . This occurs because the training data for the high-fidelity model does not contain sufficient data at those acceleration levels, leading to poor model accuracy, as shown in Figure 9. In contrast, the predicted (simulated) precision grows more slowly as it does not account for model mismatch. Figures 16 and 17 also show that for the same deviation values the optimized trajectories take less time to complete. For example, in the letters test case the non-optimized version takes 8.1 s and achieves a deviation of 26 µm, while the optimized version takes only 2.2 s for an identical deviation. In this case the optimized trajectory reduces the time needed to trace the shape in 73%. For the airfoil test case a similar analysis yields a reduction of 57% for a deviation of 47 µm.
In Table III we show the results for 5 different test cases, each at two different a max scenarios. For all cases our method improves system performance in L 1 , L 2 and L ∞ norms.

VI. CONCLUSION
In this paper we proposed a method that improves the precision vs. productivity trade-off of a PMS. We use models built exclusively from experimental data, and only modify the input trajectory provided to the closed loop control system. Experimental data obtained for shapes outside of the training dataset corroborates simulation results and shows that the method can significantly improve system performance, reliably shifting the precision vs. productivity trade-off curve across a wide range of operating conditions. This is accomplished by exploiting variable speeds (possible since the full trajectory is optimized in one go), and error compensation while respecting system operational constraints.
Future work will focus on reducing the computational load of the offline input trajectory optimization and further increasing the ANN complexity to improve the prediction accuracy.
In another direction, the ANN can be combined with an ILC loop. In an ILC setting the number of trials can be reduced by using the ANN gradient information, either independently or with an initial solution provided by the method we proposed here, taking advantage of the strengths of both methods. Samuel Balula received his B.Sc. and M.Sc. in Engineering Physics from Instituto Superior Técnico, University of Lisbon, Portugal in 2014 and 2016 respectively. He is currently a PhD student in the Automatic Control Laboratory at ETH Zürich, Switzerland. His research interest include optimal non-linear control, trajectory planning, data-driven control and optimal experimental design with applications in robotics and manufacturing.