Learning and Extrapolation of Robotic Skills using Task-Parameterized Equation Learner Networks

Imitation learning approaches achieve good generalization within the range of the training data, but tend to generate unpredictable motions when queried outside this range. We present a novel approach to imitation learning with enhanced extrapolation capabilities that exploits the so-called Equation Learner Network (EQLN). Unlike conventional approaches, EQLNs use supervised learning to fit a set of analytical expressions that allows them to extrapolate beyond the range of the training data. We augment the task demonstrations with a set of task-dependent parameters representing spatial properties of each motion and use them to train the EQLN. At run time, these features are used to query the Task-Parameterized Equation Learner Network (TP-EQLN) and generate the corresponding robot trajectory. The set of features encodes kinematic constraints of the task, such as a desired height or a final point to reach. We validate our approach on manipulation tasks where it is important to preserve the shape of the motion in the extrapolation domain, and compare it with existing state-of-the-art approaches, both in simulation and in real setups. The experimental results show that TP-EQLN respects the trajectory constraints encoded in the feature parameters, even in the extrapolation domain, while preserving the overall shape of the trajectory provided in the demonstrations.


Introduction
Imitation Learning (IL), also known as robot programming by demonstration, is a powerful approach that allows a robot to learn new skills from human guidance [1]. The main advantage of this approach is that the user can instruct the robot without any programming knowledge. New tasks are learned using only a set of demonstrations that the user gives as input to the learning algorithm. Task demonstrations can be provided, for instance, using kinesthetic teaching [2,3], where the user physically guides the robot through the task execution.
Task demonstrations represent successful executions of a given task in a certain range of operative conditions. However, everyday tasks often demand good generalization capabilities in order to successfully execute the same task in different conditions. As an example, consider the pouring task in Fig. 1. In this case, the user can reasonably demonstrate how to pour into a few cups of different size and, from this data, the robot should infer how to pour into unobserved cups of different sizes. Task generalization can be achieved by enriching the demonstrated motion trajectories with further task-dependent features that characterize the task execution [4,5,6]. Task-dependent features, also called task parameters, are given as an additional input to regress a feasible motion for the given execution context (e.g., a cup of a different size). Task-parameterized approaches for IL, like other data-driven approaches, produce effective motions inside the range of the demonstrations. However, robotic tasks often demand good generalization capabilities outside the range of training data.
In this paper, we present a novel approach to generating task-parameterized motions that uses a special type of neural network called Equation Learner Network (EQLN) [7,8]. EQLNs combine a set of different basic functions to fit the training data with analytical expressions. This is a key difference with respect to standard learning approaches that use one type of activation function, like Gaussian, sigmoid, or ReLU. As a result, the expressions fit by the EQLN have better generalization capabilities and extrapolate well beyond the range of the training data. Following the idea of task parameters, we enrich the demonstrations with extra features describing spatial properties of the motion, like the size of a box or a certain height to reach. This leads to the TP-EQLN approach proposed in this work.

Figure 1: Pouring task. In this example, each parameter value γ corresponds to a different size of the cup, and ξ(t) corresponds to the parameterized trajectory at time t that the robot has to perform in order to pour the water.
TP-EQLNs cope with two main problems related to task-parameterized approaches: 1) They achieve good extrapolation performance over the space of task parameters while 2) preserving the overall shape of the demonstrations that defines the task by itself. In several robotic tasks, it is crucial to consider these mentioned problems in order to ensure a successful execution. We compare the performance of TP-EQLNs with existing approaches and further demonstrate the effectiveness of our approach with physics-based simulations and in a real setup on a set of manipulation tasks.
Section 2 reviews related work. Section 3 presents the proposed approach for motion extrapolation. In Sec. 4, we present an experimental evaluation and a comparison with existing approaches. Section 5 states the conclusions and proposes further extensions.

Related Work
The IL formalism is built around the concept of Movement Primitive (MP) intended as a compact and flexible representation of the observed motion which allows generalization to different scenarios and reduces memory requirements [9]. A variety of approaches exist to represent MPs and each of them has distinctive features and limitations. We provide a review of existing approaches that specifically focus on task generalization.
Stable Dynamical Systems (DSs) are capable of planning converging motions from any point of the state space and have been effectively used as MPs [10,11,12,13]. Among the DS-based representations, Dynamic Movement Primitives (DMPs) [14,15] are certainly the most popular, and several authors have investigated their generalization capability. A DMP encodes a demonstrated trajectory into a spring-damper dynamics with a nonlinear forcing term. The forcing term acts as a disturbance on the linear dynamics and allows the DMP to generate arbitrarily complex trajectories. A set of learnable shape parameters is used to parameterize the forcing term. The shape parameters determine the overall shape of the DMP trajectories and play an important role in the generalization to new situations. Ude et al. [4] use a Gaussian Process (GP) [16] to learn and retrieve the mapping between a query point (typically a different goal) and the DMP shape parameters. Instead of learning a mapping that outputs a new set of DMP parameters, the approach in [17] embeds the task parameters directly into the forcing term. Pervez et al. [6] follow a similar idea, but use a Gaussian Mixture Model (GMM) [18] to represent the forcing term. Their experiments show better generalization performance compared to the approaches in [4,17]. Alternative approaches provide a probabilistic characterization of the learned MP [5,19,20,21,22]. Probabilistic MPs [19] perform generalization by statistical conditioning on the query task parameters, typically a new goal or a set of via-points. Via-point MPs [22] and Kernelized MPs [20,21] have better extrapolation capabilities than Probabilistic MPs, but also in these approaches the task parameters are new goals to reach or a set of via-points to traverse. The Task-Parameterized GMM (TP-GMM) [5] considers as task parameters the homogeneous transformations between arbitrary reference frames.
By observing the human demonstrations from each of these frames the robot is able to learn the spatial relationship between start, goal, and via-points in the trajectory.
Going beyond the kinematic level, Abu-Dakka et al. [23] use DMPs to regulate the contact force during an interaction task, while Yang et al. [24] exploit DMPs to generalize a set of learned, variable-impedance robotic skills. Teaching variable impedance and force skills is also a challenge, due to the couplings between robot, environment, and human teacher [25]. Rozo et al. [26] exploit teleoperation and a simplified model of the environment to extract force profiles and build training data. Kronander and Billard [27] use an artificial skin to wiggle the robot and reduce its stiffness. Electromyography (EMG) is another widely-used human-robot teaching interface, with applications in prosthetic hand interaction control [28,29], co-manipulation [30], and impedance learning [31].

Conditional Neural Processes [32] are a class of deep Neural Networks (NNs) that combine the function approximation power of NNs with the data efficiency of Bayesian approaches like GPs. Inspired by conditional neural processes, Seker et al. [33] developed an imitation learning framework called Conditional Neural Movement Primitive (CNMP). CNMP generates motion trajectories by sampling observations from the training data and predicting a conditional distribution over target points. The training data may include robot positions, forces, and any task parameters. CNMP is promising but has limited extrapolation capabilities. A possibility to improve the extrapolation performance is to combine imitation and reinforcement learning [34]. However, we seek a solution within the imitation learning framework.
Standard regression techniques fit the training data by combining a single type of activation function through learnable weights. For example, GMMs use Gaussians, while NNs use sigmoid or ReLU functions. We claim that the poor extrapolation performance of standard regression approaches arises from this design choice. Equation Learner Networks (EQLNs) [7,8] are designed to combine a set of activation functions in order to fit the training data with a richer analytical representation that possibly matches the underlying function we are trying to approximate from observations. In other words, an EQLN attempts to find equations from observations. Symbolic regression approaches attempt to find such equations by searching a certain function space by means of evolutionary algorithms [35]. This is a key difference between symbolic regression and EQLN, which uses standard backpropagation techniques. In this work, we extend the EQLN and present a novel IL approach called Task-Parameterized Equation Learner Network (TP-EQLN).

Problem definition
In task-parameterized MPs, the task to be learned can be defined as a set of tuples {ξ_i(t), γ_i}, with i = 1, . . . , M and t = 0, . . . , T_i, where M is the number of demonstrated trajectories ξ_i(t) ∈ R^D and T_i is the time duration of each trajectory. For simplicity, we assume that the demonstrations have the same length, i.e., that each T_i = T. For each demonstrated trajectory ξ_i(t), there exists a set of n feature parameters γ_i ∈ R^n associated with it. Task parameters encode information about the spatial behavior and the task's shape, e.g., a certain height to reach. As an example, in the pouring task shown in Fig. 1, the cup's size (task parameter) determines the arc length, its initial and final points, as well as the orientation trajectory that the end-effector of the robot has to perform in order to pour into the cup successfully. Therefore, task-parameterized MPs try to learn a mapping between the parameter space and the trajectory space.
Let us define ζ as the whole parameter space for a given task Φ. In imitation learning, the user can only demonstrate a subset of the whole parameter space (Fig. 1). When a query parameter γ is sampled from this demonstrated subset of ζ, we interpolate to regress the corresponding trajectory ξ(t). Traditional regression approaches perform well in interpolation. However, in real situations, we are interested in querying outside the demonstration area; in this case, we need to extrapolate to regress the corresponding trajectory. This problem is hard to solve due to the lack of training data in the extrapolation domain. The goal of this paper is to learn a mapping between task parameters and motion trajectories that achieves good inter- and extrapolation performance. As mentioned in Sec. 1, we build our TP-EQLN on the EQLN approach originally proposed in [7].

TP-EQLN architecture
An EQLN is a multi-layered feed-forward network with L layers; l = {1, ..., L − 1} are hidden layers, and the last layer L is the output layer. The input z^(l) ∈ R^u to the l-th layer is defined as

z^(l) = W^(l) y^(l−1) + b^(l),

where y^(l−1) ∈ R^v is the output of the preceding layer l − 1, W^(l) ∈ R^(u×v) is a matrix of weights, and b^(l) ∈ R^u is a bias vector. y^(0) and y^(L) are defined as the network input and output, respectively. W and b are the open parameters to learn. A traditional NN uses a single kind of activation function f_a, while the EQLN uses the set of activation functions defined in Eq. (2). For a multidimensional input, these activation functions are applied element-wise. Note that the original formulation [7] uses only a subset of Eq. (2), namely f_0 through f_4. The original layers are redefined by including new activation functions that contribute to better performance, as experimentally shown in the ablation study in Section 3.6. For the output y^(L), the same configuration is kept as defined in [7]: a fully-connected layer with linear activation functions. In contrast, the first layer y^(0) is modified to include time and the feature parameter(s), i.e., the time vector t = 0, . . . , 1 is concatenated to the task parameter γ to form the input I = [γ, t] of the TP-EQLN, where I ∈ R^(n+1). Therefore, the output of the network y^(L) represents the estimated trajectory ξ̂(t) at each time step t for a queried feature parameter I. The TP-EQLN architecture used in this work is shown in Fig. 2.
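The layer computation above can be sketched as follows. Since the exact activation set of Eq. (2) is not reproduced here, this sketch assumes a representative mix (identity, sine, cosine, sigmoid, and a square unit as a simple product-type element) applied to disjoint slices of z; the even split into five groups is an illustrative choice, not the paper's architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def eqln_layer(y_prev, W, b):
    """One EQLN hidden layer: affine map z = W y + b, followed by a mix
    of activation functions applied to disjoint slices of z."""
    z = W @ y_prev + b                      # z^(l) = W^(l) y^(l-1) + b^(l)
    u = np.array_split(z, 5)                # illustrative 5-way split
    return np.concatenate([
        u[0],                               # identity
        np.sin(u[1]),                       # sine
        np.cos(u[2]),                       # cosine
        sigmoid(u[3]),                      # sigmoid
        u[4] * u[4],                        # square unit (stand-in product element)
    ])

def tp_eqln_forward(gamma, t, hidden, W_out, b_out):
    """Forward pass of a TP-EQLN-style network: input I = [gamma, t],
    a stack of EQLN hidden layers, and a linear fully-connected output."""
    y = np.concatenate([np.atleast_1d(gamma), [t]])
    for W, b in hidden:
        y = eqln_layer(y, W, b)
    return W_out @ y + b_out
```

With one hidden layer of width 10 and a 3D output, `tp_eqln_forward(0.5, 0.1, hidden, W_out, b_out)` returns one Cartesian point ξ̂(t) for the queried feature parameter and time step.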

Cost Function
For training the TP-EQLN, we use a cost function composed of three terms: a prediction-error term, a conventional regularizer that minimizes the weights in order to achieve good generalization (with regularization constant λ_W,b ≥ 0), and a Penalty Epoch Term that, every k epochs, penalizes the predictions ξ̂_i(t) that are beyond a predefined threshold δ_ξ (with regularization constant λ_P > 0). The introduction of a priori information in the penalty term helps to restrict the convergence space of the model by penalizing predictions ξ̂(t) that are beyond the boundaries defined by [δ_ξmin, δ_ξmax]. This leads to better predictions in the extrapolation domain. For robotic manipulation tasks, the restricted space [δ_ξmin, δ_ξmax] is given by the workspace of the robot. To compute the penalty term, we randomly sample N inputs I without labels ξ(t): the values for t are sampled from the predefined range [0, T], whereas the values for γ are sampled considering only the extrapolation domain of the parameter space ζ.
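A minimal sketch of this cost structure, assuming a simple quadratic penalty on the amount by which out-of-bound predictions leave [δ_ξmin, δ_ξmax]; the paper's exact penalty form is not reproduced here, so this is an illustrative reading, not the authors' implementation.

```python
import numpy as np

def tp_eqln_cost(pred, target, weights, lam_wb, lam_p,
                 pred_unlabeled, delta_min, delta_max, apply_penalty):
    """Cost = prediction MSE
            + L2 regularizer over all weight matrices (constant lam_wb)
            + (every k epochs) penalty on out-of-bound predictions made
              on unlabeled inputs sampled from the extrapolation domain."""
    mse = np.mean((pred - target) ** 2)
    reg = lam_wb * sum(np.sum(w ** 2) for w in weights)
    penalty = 0.0
    if apply_penalty:
        # penalize only the excursion outside [delta_min, delta_max]
        below = np.clip(delta_min - pred_unlabeled, 0.0, None)
        above = np.clip(pred_unlabeled - delta_max, 0.0, None)
        penalty = lam_p * np.sum(below ** 2 + above ** 2)
    return mse + reg + penalty
```

In a training loop, `apply_penalty` would be set to `epoch % k == 0` with k = 50, and `pred_unlabeled` would hold the network's outputs for the N randomly sampled unlabeled inputs I.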
Regarding the rest of the parameters in the cost function, the boundaries of the parameter space ζ depend on each task.
We use λ_W,b = 4e−3, λ_P = 1e−6, and k = 50, while [δ_ξmin, δ_ξmax] is platform/hardware dependent and is defined by the workspace of the robot. The parameters θ of the network are updated by stochastic gradient descent with mini-batches and the Adam optimizer. We use a learning rate of α = 4e−3 for all the experiments.

Training Phases
We define a total of E epochs for training the model and split them into 3 phases. Phase 1: No regularization. This phase takes place within the epochs 0 ≤ E_i ≤ E/4, and the cost function to be minimized is L without regularization, i.e., λ_W,b = 0. The reason for considering λ_W,b = 0 only for a few epochs at the beginning of the training is to allow the weights to freely converge towards a first approximation.
Phase 2: Lasso-like. This phase takes place within the epochs E/4 < E_i ≤ 3E/4, and the cost function includes the regularization term, with λ_W,b = 4e−3. The goal of this phase is to enhance generalization and prevent overfitting. Phase 3: Weight pruning. This phase performs magnitude-based weight pruning and gradually zeroes out model weights: at each update, weights whose magnitude falls below the threshold δ_W are set to zero. It takes place within the range 3E/4 < E_i ≤ E and leads to shorter analytic expressions that still properly describe the training data. We use δ_W = 0.01. The epoch ranges used in each training phase were chosen based on the values proposed in [7]. We slightly modified the last range to start at 3E/4 with the aim of increasing sparsity.
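The phase schedule and the pruning step can be sketched as follows, assuming the pruning rule simply zeroes out weights whose magnitude falls below δ_W (a common form of magnitude-based pruning; the paper's exact update rule is not reproduced here).

```python
import numpy as np

def regularization_constant(epoch, total_epochs, lam_wb=4e-3):
    """Phase schedule for the L2 constant: zero during the first quarter
    of training (Phase 1), lam_wb afterwards (Phases 2 and 3)."""
    return 0.0 if epoch <= total_epochs // 4 else lam_wb

def in_pruning_phase(epoch, total_epochs):
    """Phase 3 covers the last quarter of training."""
    return epoch > 3 * total_epochs // 4

def prune_weights(W, delta_w=0.01):
    """Magnitude-based pruning: zero out entries below delta_w in
    absolute value, yielding sparser (shorter) analytic expressions."""
    W = W.copy()
    W[np.abs(W) < delta_w] = 0.0
    return W
```

Inside the training loop, `prune_weights` would be applied to every weight matrix whenever `in_pruning_phase(epoch, E)` holds.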
A summary of the overall training process is presented in Algorithm 1.

Illustrative Example

The aim is to find, via TP-EQLN, the set of equations that best explains a training data set whose samples are generated from a ground-truth function f : R^2 → R (Eq. (6)), where t is the time variable and γ the feature parameter. The samples are generated within the range 0 ≤ γ ≤ 3 with 20 equally-spaced subintervals. For each value of γ, we generate 200 samples from f(t, γ) with t ∼ U(0, 10). This gives a total of 4000 samples. To test the extrapolation performance, we train the model considering only the 10 values of γ within the range 0.789 ≤ γ ≤ 2.210. The remaining samples are used to validate the prediction of TP-EQLN in the extrapolation domain. The training and validation data sets are presented in Fig. 3 (left). We use one hidden layer with a fully-connected output layer, and a batch size of 150. Comparing the equation obtained from TP-EQLN after training with the ground-truth equation (6), we observe that both are composed of the same trigonometric functions, and f_5(z) does not play any role in either equation. The main differences in the constants are in the sin and cos functions, which could be simplified using the trigonometric identities sin(θ) = cos(θ − π/2) and cos(θ) = sin(θ + π/2) in order to obtain an expression more similar to the reference equation. Small inaccuracies in the estimated constant factors give rise to mild deviations in the function values that increase as the arguments t and γ move away from the trained ranges. However, for the purposes of this work, the resulting error is acceptable since the exploration area of interest is limited to the robot's workspace, which remains constant. The evaluation of the equation estimated by TP-EQLN on the training and validation data sets is presented in Fig. 3 (right), indicated by the red and blue data points respectively. The light-green surface represents the ground-truth equation (6). Here, we observe that the evaluation of both data sets remains close to the surface of the ground-truth equation.
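The sampling procedure of this example can be sketched as follows. Since Eq. (6) is not reproduced in the text, `f` below is a hypothetical stand-in trigonometric function; only the sampling scheme (20 γ values in [0, 3], 200 uniform t samples each, 10 γ values held in for training) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(t, gamma):
    # Hypothetical stand-in for the ground-truth function of Eq. (6);
    # the real expression combines sin and cos terms of t and gamma.
    return np.sin(t) + gamma * np.cos(t)

# 20 equally spaced values of gamma in [0, 3], 200 random t samples each
gammas = np.linspace(0.0, 3.0, 20)
data = [(g, t, f(t, g)) for g in gammas for t in rng.uniform(0.0, 10.0, 200)]

# training set: the 10 gamma values spanning approx. [0.789, 2.210];
# the remaining gamma values are held out to probe extrapolation
train_gammas = set(gammas[5:15])
train = [s for s in data if s[0] in train_gammas]
valid = [s for s in data if s[0] not in train_gammas]
```

This yields 4000 samples in total, split evenly into 2000 training samples (interpolation region) and 2000 validation samples (extrapolation region).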

Ablation study
To justify the set of activation functions in Eq. (2) proposed for TP-EQLN, we carried out an ablation study where the performance is analyzed when key activation functions such as [f_2, f_3, f_4, f_5] are removed. The study is carried out using the data set from task Φ_1; the details of this task are presented in Sec. 4.1. Figure 4 presents the results for each removed function. Each marker represents the MSE between the demonstrated trajectories and the predictions of each method: circular markers refer to the training data set, whereas triangular markers refer to the validation data set; green markers represent the MSE of the demonstrations from the training data set and blue markers the MSE for the extrapolation data set.
The lowest MSE in this study, for both the training and the extrapolation data sets, is highlighted by the light-red boxes and corresponds to using the entire set of activation functions. Using the whole set of activation functions proposed in Section 3.2 thus gives the best results.
We also compared TP-EQLN against the standard EQLN to show the advantage of adding a feature parameter to the input. As in the ablation study, we used the obstacle avoidance task presented in Sec. 4.1, with the same architecture for both TP-EQLN and EQLN but without the feature parameter in EQLN. The results are shown in Fig. 5. Figure 5 (left) shows the trajectories used for training (green) and validation (blue). The orange trajectory is the output of the EQLN. Since it takes only time as input, the EQLN is not able to generalize over different obstacle heights and all trajectories collapse into one. The MSE values for the training and extrapolation data sets of each approach are shown in Fig. 5 (right), which validates the improvement of TP-EQLN over EQLN. In conclusion, it is essential to include a feature parameter in the EQLN to encode and generalize over the physical and/or spatial properties of the task.

Experimental Results
We perform six sets of experiments in different scenarios. The first four were carried out in simulation and the last two on a real Franka Emika Panda robot. In each experiment, we evaluate the inter- and extrapolation performance of the proposed TP-EQLN. Our approach is compared against two prominent approaches, namely CNMP and TP-GMM. The estimated trajectory in all the experiments is defined in Cartesian space. First, a brief introduction to the six experiments is given; afterwards, each experiment is discussed in detail in its respective subsection.
The first three experiments resemble a pick and place task where the robot picks an object from point A and places it at point B. Different task parameters are used for each experiment. In the first experiment Φ_1, we varied only the obstacle height. In the second experiment Φ_2, we varied the height of the obstacle and the endpoint B. In the third experiment Φ_3, we varied the height and position of the obstacle and the endpoint B. Demonstrations for Φ_1 to Φ_3 are synthetic data sampled from the parametric equation (8), where the initial position is [x_0, y_0, z_0] = [0.57, −0.41, 0.0] m, θ = 100 deg is the rotation between the trajectory and the global frame, and t and z_d(t) are the vectors that parameterize the motion of each task; t varies linearly (200 samples) in the range reported in Table 1. The parameter γ_1 varies the height of the Gaussian. The ranges of γ_1, μ, and σ are listed in Table 1.
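A sketch of how such demonstrations could be generated. Eq. (8) is not reproduced in the text, so the straight-segment-plus-Gaussian-bump form below is an assumption consistent with the described parameters ([x_0, y_0, z_0], θ, γ_1, μ, σ), not the paper's exact equation.

```python
import numpy as np

def demo_trajectory(gamma1, mu, sigma, n=200,
                    p0=(0.57, -0.41, 0.0), theta_deg=100.0):
    """Illustrative stand-in for Eq. (8): a straight segment in the
    xy-plane rotated by theta about the global z-axis, with a Gaussian
    bump of height gamma1 (centered at mu, width sigma) along z."""
    t = np.linspace(0.0, 1.0, n)                      # 200 linear samples
    z = gamma1 * np.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2))
    th = np.deg2rad(theta_deg)
    x = p0[0] + t * np.cos(th)
    y = p0[1] + t * np.sin(th)
    return np.column_stack([x, y, p0[2] + z])
```

Varying `gamma1` reproduces the Φ_1 setting (obstacle height only), while shifting `mu` moves the bump along the path, mimicking the varying obstacle position of Φ_3.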
The fourth experiment Φ_4 consists of opening boxes of different sizes by holding their cover and executing an arc trajectory until the box is entirely open. The fifth experiment Φ_5 is carried out in a real setup with a Panda robot and consists of picking and placing an object into different containers while avoiding some fixed obstacles in the environment. The last experiment Φ_6 is also carried out in a real setup and consists of pouring water into cups of varying size.
The TP-EQLN hyperparameters used in each experiment are shown in Table 2. The values were selected empirically using the following rationale. Selecting a small number of hidden layers leads to better generalization in extrapolation but loses detail in fitting the trajectory. On the other hand, selecting a large number of hidden layers fits the training data better but leads to highly nonlinear curves in the extrapolation domain, which in some cases impedes generalization there. As a general rule, one can select a high number of hidden layers if the trajectories in the extrapolation domain are expected to be highly nonlinear; otherwise, a low number of hidden layers is preferable. We found that 1 to 3 hidden layers are enough to capture the trajectories of the performed experiments. The batch sizes are normally small for simple trajectories that do not demand large amounts of training data. The number of epochs is directly related to the size of the data set and the number of feature parameters. We found that after 20,000 epochs the prediction accuracy does not change significantly. A small number of epochs can be enough for simple trajectories that present low nonlinear variations between the feature parameters, as in the case of Φ_4.
For TP-GMM we used 30 Gaussian components. This number was chosen empirically by varying the number of components between 3 and 40 and checking the reproduction accuracy; beyond 30 components the estimated trajectory showed some overfitting. Moreover, we use 3 reference frames (beginning, middle, and end of the trajectory) for Φ_1 to Φ_3, and 2 reference frames (beginning and end of the trajectory) for Φ_5 and Φ_6. For CNMP we used 4 layers with 128 units per layer for both the encoder and decoder parts of the network, 3×10^6 training steps, and the Adam optimizer with a learning rate of 10^−4. For TP-EQLN and CNMP, the same input data format I was used for training in all the experiments.
For tasks Φ_1 to Φ_5, the orientation of the end-effector was not considered since it was not relevant for the tasks. Therefore, the estimated trajectory for these tasks is ξ̂(t) ∈ R^3 (end-effector position). The orientation (4D unit quaternion) is considered in task Φ_6, resulting in an estimated trajectory ξ̂(t) ∈ R^7. The feature parameters and the training and extrapolation data set ranges of the experiments are summarized in Table 1.

Φ 1 -Variable height
This task consists of avoiding an obstacle of varying height h, with only one feature parameter γ_1 = h (Fig. 6, top-left). The range of γ_1 and the training and extrapolation data sets are reported in Table 1. The position of the obstacle is fixed at [0.5, 0.07, 0.0] m for all the demonstrations.
We show three different trajectories for each approach in Fig. 6 (top-right): two for γ_1 = {0.08, 0.4} (extrapolation set) and one for γ_1 = 0.23 (training set). The results show that TP-EQLN is able to correctly predict the trajectory for unobserved obstacle heights while keeping the desired shape and reaching the desired highest point of the curve. On the other hand, CNMP accurately reproduces the trajectories from the training data set but generates larger distortions in the extrapolation domain, especially at the highest point of the trajectory. It is worth mentioning that the two frames used by TP-GMM at the beginning/end of the motion are constant for each value of the feature parameter, while the frame in the middle varies depending on the height of the obstacle; otherwise, the estimated trajectory from TP-GMM would be constant along the z-axis. In other words, for TP-GMM we assume that the highest point to reach is always known, even in the extrapolation domain. In practice, this point is unknown and has to be estimated, since it is directly correlated with the height of the obstacle. For the TP-EQLN and CNMP methods, this extra information is not provided. This assumption reveals a first clear advantage of our method over TP-GMM: it can be more practical to specify a set of high-level feature parameters than a reference frame.

Φ 2 -Variable height and goal
Similarly to the previous task, we vary the height of the obstacle, encoded in γ_1. Additionally, we introduce a new task parameter γ_2 to encode the final position (goal). The feature parameter vector is formed as γ = [γ_1, γ_2]. The demonstrations were sampled from (8), with the parameters defined in Table 1. An overview of this task is depicted in Fig. 7 (top-left), while three trajectories retrieved with each approach are shown in Fig. 7 (top-right). For these cases, the results show that our approach presents less distortion in the shape of the trajectory than the other methods (Fig. 7, bottom).

Φ 3 -Variable height, obstacle position, and goal
In this task we change the obstacle height, its position, and the final point of the trajectory, which are embedded in three feature parameters γ = [γ_1, γ_2, γ_3]. Demonstrations are generated from (8), with the parameters defined in Table 1. The position of the obstacle is also varied linearly by shifting the Gaussian function with μ, where μ depends on γ_3. The results of this procedure are shown in Fig. 8 (top-left). For TP-GMM, we have again provided 3 reference frames. The frame at the beginning remains constant, while the others vary depending on the desired obstacle position and the goal of the trajectory. As done in Φ_1 and Φ_2, the position of the frames is provided also for parameter values in the extrapolation domain. Figure 8 (bottom) shows the MSE for γ. The MSE for γ_1 represents the error of the highest point; the MSE for γ_2 represents the error of the obstacle's position, measured with respect to the position of the highest point of the trajectory in the xz plane; and the MSE for γ_3 represents the error of the final goal position. These results show that our approach reaches closer to the highest point of the trajectory than the other methods. Regarding the MSE for γ_2, our method is slightly outperformed by TP-GMM; the reason may be the extra information provided by the middle frame. Regarding the MSE for γ_3, TP-EQLN outperforms the other methods. This means that our method is better able to place the desired highest point of the trajectory at the right location to prevent collisions.
Finally, the average MSE over all the trajectories (for both the training and extrapolation sets) shows that our method accurately predicts the spatial features linked to the feature parameters while preserving the shape of the trajectory, even for feature parameter values from the extrapolation domain.

Φ 4 -Opening a box
This task consists of opening boxes of different sizes by holding their cover and executing an arc trajectory until the box is entirely open (Fig. 9, top). The box's size is given as a feature parameter γ, which affects both the trajectory's arc and its endpoint. The trajectories are expressed with respect to the center of the handle and are generated from a parametric equation whose parameter values are defined in Table 1. We provide two frames for TP-GMM, one at the beginning and one at the end of the trajectory. The frame at the beginning remains constant, whereas the frame at the end varies depending on the feature parameter. The frame value is considered as extra information since, in practice, the endpoint of the trajectory is unknown in the extrapolation domain. Fig. 9 (bottom-left) shows the predicted trajectories for γ = {0.06, 0.6} (extrapolation set) and for γ = 0.36 (training set). In the presented cases, CNMP shows the largest deviations from the desired trajectories in extrapolation, whereas TP-EQLN and TP-GMM accurately predict the shape of the trajectory. This can be seen in Fig. 9 (bottom-right), where the MSE for both the training and extrapolation sets are shown.
Φ 5 -Pick and place with obstacle avoidance

This experiment evaluates the prediction capabilities of our approach in a real pick and place task. The experiment consists of releasing an object in a container while avoiding some fixed obstacles in the environment (see Fig. 10, top-left). We used 8 containers for this experiment, denoted as C_i with i ∈ {1, . . . , 8}; however, we demonstrated the pick and place task only for the containers i ∈ {1, 2, 3, 4}. The initial position of the object to be transported is fixed, whereas the release point is defined right above each container's position. The demonstrations were collected using a Panda robot, where the user kinesthetically demonstrated the trajectories (Fig. 10, top-right). Each trajectory was stored with a resolution of 700 data points and smoothed using cubic splines from the SciPy library. Preserving the trajectories' shape in this task is relevant to guarantee the avoidance of the obstacles in the setup.
The feature parameter is defined by the initial and the release points, i.e., γ_i ∈ R^6. Although the initial point remains constant, we found that TP-EQLN performs better when including this initial point in the feature parameter. The feature parameter space and the training and extrapolation data sets considered for this task can be found in Table 1.
For TP-GMM, we have defined two frames, at the beginning and at the end of each trajectory. Fig. 10 (bottom-left) shows the trajectories obtained from each method in the training (containers C1, C2) and extrapolation (containers C5, C6) data sets, and Fig. 10 (bottom-center) shows those for the training (containers C3, C4) and extrapolation (containers C7, C8) data sets. Fig. 10 (bottom-right) reports the MSE for each approach, where shaded (non-shaded) areas indicate the extrapolation (training) domain.
The three methods produce correct trajectories that successfully avoid the obstacles in both data sets. For the release point, the three methods present a gap with respect to the goal, mainly along the z axis. Fig. 10 (bottom-right) shows the MSE of the release point (goal) for each data set. The results show that TP-EQLN produces the lowest gap in both the training and extrapolation sets. This means that TP-EQLN releases the object in the container successfully, whereas TP-GMM and CNMP release the object outside of the container.
The shape of the trajectory is also important for executing the task successfully. In Fig. 10 (bottom-right), we show the average MSE for each data set, which indicates how well the shape of the trajectory is preserved. The results show that TP-EQLN significantly reduces the total error in reproducing the demonstrations. Fig. 10 (middle-right) shows some snapshots of the trajectory executed for container C8 using TP-EQLN.
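The two error measures discussed above, the shape error over the whole path and the gap at the release point, can be sketched as follows (a minimal sketch; the paper's exact averaging over demonstrations may differ):

```python
import numpy as np

def shape_mse(pred, ref):
    """Mean squared error between a predicted and a demonstrated
    trajectory of equal length: averages the squared point-wise
    Cartesian error, penalizing deviations of the path's shape."""
    return float(np.mean(np.sum((pred - ref) ** 2, axis=1)))

def release_gap(pred, goal):
    """Euclidean gap between the trajectory's final point and the
    desired release point above the container."""
    return float(np.linalg.norm(pred[-1] - goal))

# Example: a uniform 1 cm offset on every axis of a 700-point path.
ref = np.zeros((700, 3))
pred = ref + 0.01
err = shape_mse(pred, ref)          # 3 * 0.01**2 = 3e-4
gap = release_gap(pred, ref[-1])    # 0.01 * sqrt(3)
```

A low `release_gap` is what lets the object land inside the container, while a low `shape_mse` is what keeps the path clear of the obstacles.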

4.6. Φ6 - Pouring into Different Cups
This experiment demonstrates the generalization capabilities of TP-EQLN in a pouring task (see Fig. 11, top-left), where both the position and orientation of the end-effector have to be controlled. In particular, the position trajectory is mainly affected by the size of the cup and therefore varies in the xy plane. The orientation plays an important role, as it gives the correct inclination of the end-effector during the pouring, but it does not vary significantly with the size of the cup. The correct coupling between the position and orientation trajectories is also important for the successful execution of the task. The estimated trajectory is ξ ∈ R7, where [x, y, z] defines the position of the end-effector in Cartesian space and [qx, qy, qz, qw] the orientation as a unit quaternion. The initial position of the cup is constant for all the demonstrations. The feature parameter γ ∈ R is linked to the cup's size; its ranges are listed in Table 1.
The demonstrations were collected with a Panda robot, where the user kinesthetically demonstrated the trajectories as shown in the snapshots of Fig. 11 (top-right). Each trajectory was stored with a resolution of 700 data points. We estimate position and orientation using separate TP-EQLNs (or CNMPs) that share the feature parameter as input. Both estimated trajectories are aligned in time to form the estimated trajectory ξ ∈ R7.
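Combining the two network outputs into one pose trajectory can be sketched as follows; the renormalization step is our assumption, reflecting the standard practice of projecting raw network outputs back onto unit quaternions:

```python
import numpy as np

def combine_pose(pos_traj, quat_traj):
    """Stack the separately predicted position (N x 3) and orientation
    (N x 4) trajectories, aligned in time, into one N x 7 pose
    trajectory; each row's quaternion is renormalized to unit length
    (a sketch of the setup described in the text)."""
    quat = quat_traj / np.linalg.norm(quat_traj, axis=1, keepdims=True)
    return np.hstack([pos_traj, quat])  # [x, y, z, qx, qy, qz, qw]

# Example: zero positions and an unnormalized identity-like orientation.
pos = np.zeros((700, 3))
quat = np.tile([0.0, 0.0, 0.0, 2.0], (700, 1))
pose = combine_pose(pos, quat)
```

Sharing the feature parameter between the two networks is what keeps the position and orientation predictions coupled to the same cup size.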
As in the other tasks, for comparison purposes we provide TP-GMM with information about the unknown (final) point by specifying the value of the end frame also in extrapolation.
In Fig. 11 (bottom-left), we show the predicted position and orientation trajectories for three different cup sizes, i.e., γ = 0.16 (training set) and γ ∈ {0.075, 0.245} (extrapolation set). For this task, there are three relevant points along the trajectory (see the markers in Fig. 11, bottom-left), which serve as references to compare TP-EQLN against TP-GMM and CNMP. The first point is the initial pose of the robot; all the approaches manage to keep the initial pose for different feature parameters. The second point represents the approach position between the tip of the watering can and the cup. The last point defines the pose where the robot ends the pouring process; it is the most important one, since a deviation at this point may cause the liquid to spill outside the cup.
In Fig. 11 (bottom-right), we show the MSE value for each key point in both the training and extrapolation domains. Regarding the initial point, our method presents the lowest MSE. A large MSE at the initial point of the trajectory would generate large control actions and a non-smooth start of the robot motion.
Regarding the approach and final points, our method is slightly outperformed by TP-GMM in the training domain; however, in the extrapolation domain, our method produces better results. In Fig. 11 (bottom-right), we show the average MSE per data set. In the training domain, TP-EQLN and CNMP perform similarly and are slightly outperformed by TP-GMM. In extrapolation, TP-EQLN outperforms the other approaches.

Conclusions and Future Work
We presented the Task-Parameterized Equation Learner Network (TP-EQLN), a novel approach to imitation learning with enhanced extrapolation capabilities for tasks where maintaining the shape of the trajectory is important. The key idea of TP-EQLN is to combine different types of activation functions in order to find an analytical expression that fits the training data. This leads to a better approximation of the desired motion in both the interpolation and extrapolation domains. We enrich our network with task-dependent parameters and use them alongside a time vector to query the desired path. We showed that it is possible to encode physical or spatial properties of the task in this feature parameter and to generalize the task over the feature parameter space.
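The idea of mixing activation functions can be made concrete with a single Equation Learner layer; this is a sketch in the spirit of the original EQL design (linear map followed by identity, sine, cosine, and multiplication units), with the particular unit counts and grouping being our own illustrative choices:

```python
import numpy as np

def eqln_layer(x, W, b, n_id, n_sin, n_cos, n_mult):
    """One Equation Learner layer (sketch): a linear map z = W x + b
    followed by mixed activations -- identity, sine, and cosine units,
    plus multiplication units that consume the remaining pre-activations
    in pairs. Stacking such layers and fitting W, b by supervised
    learning yields an analytical expression in x, which is what lets
    the network extrapolate beyond the training range."""
    z = W @ x + b                     # len(z) = n_id + n_sin + n_cos + 2 * n_mult
    i1, i2, i3 = n_id, n_id + n_sin, n_id + n_sin + n_cos
    prod = z[i3:].reshape(n_mult, 2)  # pair up pre-activations
    return np.concatenate([z[:i1], np.sin(z[i1:i2]),
                           np.cos(z[i2:i3]), prod[:, 0] * prod[:, 1]])

# One input, five pre-activations: identity, sin, cos, and one product unit.
out = eqln_layer(np.array([0.5]), np.ones((5, 1)), np.zeros(5), 1, 1, 1, 1)
```

With all weights equal to one, the layer computes [0.5, sin 0.5, cos 0.5, 0.25]: each output is a closed-form function of the input, unlike a conventional network with a single saturating nonlinearity.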
Our TP-EQLN was empirically evaluated and compared against two prominent existing approaches in a set of tasks of increasing complexity. The obtained results show that, in most cases, TP-EQLN outperforms state-of-the-art approaches, especially in extrapolation, and, most importantly, preserves the shape of the trajectory. Whereas CNMP also uses a feature parameter to encode some properties of the task, it cannot generalize beyond the distribution of the training data since it is based on probability distributions. On the other hand, TP-GMM requires the explicit definition of frames, provided by the user, to adapt the trajectory to changes in the environment. TP-EQLN encodes those changes using high-level feature parameters, which makes it possible to generalize the task beyond the data distribution of the feature parameter used for training. The definition of high-level feature parameters also allows us to define and generalize tasks more intuitively.
Our future research will follow different directions. We will explore the possibility of combining TP-EQLN with control techniques to ensure convergence to a given target while keeping the overall shape of the motion. We also plan to endow TP-EQLN with stochastic information to characterize the variability and the uncertainty in the demonstrations. Finally, we will investigate the use of more abstract concepts as task parameters, such as the property of containing liquids, in combination with a task ontology, to generalize learned skills across different objects with similar properties.