Active Sensing for Data Quality Improvement in Model Learning


Abstract-In machine learning for robotics, training data quality assumes a crucial role. Many methods use exploration algorithms to select the most informative data points for the model, often ignoring the impact of measurement noise on data. This letter introduces a method to enhance dataset quality for model learning, optimizing a combination of exploration and active sensing metrics. We introduce a novel Exploration Gramian metric based on a Gaussian Process predicted covariance matrix, optimized to explore the state space regions where the knowledge about the unknown model is maximum. These are integrated with an active sensing metric (Constructibility Gramian) to mitigate measurement noise effects. The effectiveness of this approach is demonstrated through simulations on a unicycle and a quadruped robot, confirming that combining active sensing and exploration significantly enhances performance in model learning.
Index Terms-Optimization, robotics, information theory and control.

I. INTRODUCTION
IN THE field of machine learning (ML), many researchers focus on developing powerful models for processing complex datasets to enhance predictions. However, the importance of the quality and informativeness of these datasets, particularly in real-world applications, often receives less attention, despite its significant impact on ML performance [1], [2]. The quality of datasets in terms of informativeness plays a crucial role in ML applications in robotics and automation, especially when they are used for learning unknown models. Because of the time constraints of a real-time implementation, only a few data points can be processed to provide a timely prediction. Therefore, to best predict the model during the learning process, it is of paramount importance to identify the few data points that contain the largest amount of information [3]. To this purpose, strategies for constructing training sets that minimize the epistemic error (systematic error due to limited model knowledge) are becoming popular. These include the use of information-theoretic optimal experimental design for selecting the optimal training dataset [4], and exploration algorithms that maximize the information gain, quantified by the Fisher Information Matrix (FIM) [5], [6].
In learning unknown dynamics, a significant challenge is extending the concept of information gain to a trajectory optimization problem. To tackle this issue, in [7] a static active learning strategy is adapted to dynamic settings, aligning the sampling strategy with trajectory-dependent states.
Another factor affecting data quality is measurement error, primarily stemming from noisy sensor readings. While filters can passively increase the robustness of model learning by limiting the effects of noise, actively optimizing measurement acquisition through active sensing/perception approaches further mitigates these effects. Although widely adopted in robotics to enhance estimation [8], [9], active sensing has been minimally explored for improving filter estimates in training-set construction. In particular, in this letter, we use the Constructibility Gramian (CG) as the active sensing metric [10]. Our proposed method integrates active sensing with exploration to identify informative trajectories for sample collection. We introduce the Exploration Gramian (EG), a new metric that uses the state transition matrix to capture how the system evolves along a trajectory, thus improving sample collection in trajectory optimization problems. The EG is obtained from the predicted covariance matrix of a Gaussian Process, which we use as the learning method in our study. To the best of our knowledge, this represents the first implementation of the EG in an exploration algorithm. Here, exploration and active sensing metrics are combined, providing new cost functions maximized in a Model Predictive Control (MPC) framework.
The effectiveness of our methodology is demonstrated through a comparative analysis with a standard exploration algorithm based on the GP covariance matrix. Differently from the EG, this metric does not contain the state dynamics and hence cannot properly steer the system toward unexplored states. Finally, we test our EG-based methodology on simulated unicycle and quadrupedal robots, proving significant improvements in model learning.

II. PRELIMINARIES
The primary objective of this letter is to maximize data-derived information to reduce both epistemic and measurement errors. Consequently, we employ two metrics: the active sensing measure (CG) enhances the observer's estimation performance and minimizes the measurement error, while the active exploration measure (EG) boosts the efficiency of a Gaussian Process used to learn the unknown dynamics, thus diminishing the epistemic error.
Let us consider a generic nonlinear system

\dot{q}(t) = f_n(q(t), u(t)) + f_u(q(t), u(t)), \quad q(t_0) = q_0, (1)
z(t) = h(q(t)) + \nu(t), (2)

where q(t) ∈ R^n is the state of the system, u(t) ∈ R^m is the vector of control inputs, and z(t) ∈ R^p is the vector of sensor outputs (i.e., the measurements available through onboard sensors at time t); f_n(·) is the known nominal system dynamics, while f_u(·) is the unknown one. Finally, ν(t) ∼ N(0, R(t)) is a white, normally distributed Gaussian noise with zero mean and covariance matrix R(t). We assume that the nonlinear system is affected by negligible process noise. Let us consider the linear time-varying (LTV) system obtained by linearizing (1)-(2) around a given trajectory, with q(t_0) = q_0:

\dot{q}(t) = A(t) q(t) + B(t) u(t) + F_u(q(t), u(t)), (3)
z(t) = C(t) q(t) + \nu(t), (4)

where A(t) = ∂f_n(q(t), u(t))/∂q(t), B(t) = ∂f_n(q(t), u(t))/∂u(t), and C(t) = ∂h(q(t))/∂q(t). Moreover, for the LTV system, the unknown dynamics is F_u(q(t), u(t)) = f_u(q(t), u(t)) + O(q²), consisting of the linearized unknown dynamics f_u(q(t), u(t)) and of the nonlinear higher-order terms O(q²) of the Taylor series, which are neglected during the linearization process. The chosen formulation is general enough to cover a broad class of practical cases, ranging from simplified linear models of complex systems with highly nonlinear dynamics to nonlinear models with unmodeled dynamics, disturbances, and neglected higher-order terms.
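The Jacobians A(t), B(t), and C(t) above can be obtained analytically or numerically. As a hypothetical illustration (our own sketch, using a toy unicycle model that is an assumption, not the letter's system), central finite differences give the LTV matrices around a trajectory point:

```python
# Hypothetical sketch: obtaining A(t) = df_n/dq and B(t) = df_n/du of the LTV
# model (3) by central finite differences around a trajectory point.
import numpy as np

def f_n(q, u):
    """Nominal unicycle dynamics (toy assumption): q = (px, py, theta), u = (v, omega)."""
    px, py, theta = q
    v, omega = u
    return np.array([v * np.cos(theta), v * np.sin(theta), omega])

def linearize(f, q, u, eps=1e-6):
    """A = df/dq and B = df/du via central differences."""
    n, m = len(q), len(u)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for j in range(n):
        dq = np.zeros(n); dq[j] = eps
        A[:, j] = (f(q + dq, u) - f(q - dq, u)) / (2 * eps)
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (f(q, u + du) - f(q, u - du)) / (2 * eps)
    return A, B

# At theta = 0 and v = 1: d(py_dot)/d(theta) = v cos(theta) = 1.
A, B = linearize(f_n, np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0]))
```

The same routine evaluated along the planned trajectory yields the time-varying matrices entering (3)-(4).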

A. Active Sensing Measures
In real scenarios, the real state q(t) of a system is typically unknown, with only an estimate \hat{q}(t) available, provided by an observer. The observer's performance, in terms of uncertainty and estimation error, heavily relies on the information coming from noisy sensor readings. To address this, active sensing/perception control aims to maximize the sensory information for optimal state estimation. A crucial step is the selection of an appropriate measure of information to optimize [11], [12], [13]. The measure chosen in this letter is the CG (see Section I), which is defined as

G_c(t_0, t_f) = \int_{t_0}^{t_f} \Phi^\top(\tau, t_f)\, C^\top(\tau) W_c(\tau) C(\tau)\, \Phi(\tau, t_f)\, d\tau,

where t_f > t_0 is the final integration time and W_c(τ) ∈ R^{p×p} is a symmetric positive-definite weight matrix. Φ(t, t_0) is the state transition matrix of the LTV system in (3)-(4) (with F_u(q(t), u(t)) = 0) and the solution of the matrix differential equation

\dot{\Phi}(t, t_0) = A(t)\, \Phi(t, t_0), \quad \Phi(t_0, t_0) = I. (5)

In [10], the authors showed that the CG is also strictly related to the inverse of the estimation-error covariance matrix P provided by an Extended Kalman Filter (EKF) built on (3)-(4) (with F_u(q(t), u(t)) = 0 and negligible process noise). Indeed, the equivalence P^{-1}(t) = G_c(-∞, t) holds, where G_c(-∞, t) represents the amount of information collected in the time window (-∞, t]. Thus, maximizing the CG is equivalent to minimizing the state estimation uncertainty.
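As a hypothetical numerical sketch (our own toy example, not the authors' implementation), the CG of an LTV pair (A(t), C(t)) can be approximated by integrating (5) with forward Euler and re-anchoring the transition matrix at the final time:

```python
# Toy sketch: approximate the Constructibility Gramian of an LTV pair.
# The double-integrator example and step size are illustrative assumptions.
import numpy as np

def constructibility_gramian(A_of_t, C_of_t, W, t0, tf, dt=1e-3):
    n = A_of_t(t0).shape[0]
    # Forward pass: Phi(t, t0) along the horizon via dPhi/dt = A(t) Phi.
    Phis, Phi, t = [], np.eye(n), t0
    while t < tf:
        Phis.append((t, Phi.copy()))
        Phi = Phi + A_of_t(t) @ Phi * dt
        t += dt
    Phi_t0_tf = np.linalg.inv(Phi)          # Phi(t0, tf) = Phi(tf, t0)^{-1}
    # Accumulate the Gramian with Phi(t, tf) = Phi(t, t0) Phi(t0, tf).
    G = np.zeros((n, n))
    for t, P in Phis:
        Pf = P @ Phi_t0_tf
        C = C_of_t(t)
        G = G + Pf.T @ C.T @ W @ C @ Pf * dt
    return G

# Toy LTI example: double integrator with a position measurement.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
C = np.array([[1.0, 0.0]])
G = constructibility_gramian(lambda t: A, lambda t: C, np.eye(1), 0.0, 1.0)
```

Because the pair is observable, the resulting Gramian is symmetric positive definite, and its trace can be used directly in the cost functions of Section III.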

B. Exploration Measures
A Gaussian Process regressor is exploited to reconstruct the unknown dynamics F_u. GPs can be seen as an infinite collection of random Gaussian variables, any finite number of which has a joint Gaussian distribution [14]. Therefore, F_u is learned from N samples x_i := (\hat{q}_i, u_i), i = 1, ..., N, consisting of the estimated states and the inputs, which represent the input data of the GP. The corresponding outputs

y_i = \dot{\hat{q}}_i - f_n(\hat{q}_i, u_i) + w, \quad i = 1, \ldots, N, (6)

represent the deviation of the estimated nominal system dynamics from \dot{\hat{q}}_i, which is an estimate of \dot{q}_i. Moreover, w ∼ N(0, Λ) is a white, normally distributed Gaussian noise with zero mean and covariance matrix Λ that affects each element of y ∈ R^n. The computation of \dot{\hat{q}}(t) in (6) requires an estimator. An observer built on (1)-(2) only allows retrieving \hat{q}(t), but not its time derivative. Hence, we introduce an augmented state q_{aug}(t) = (q(t), \dot{q}(t), u(t)), with u_{aug}(t) = \dot{u}(t) the new input. Starting from (1) with f_u(q(t), u(t)) = 0, the dynamics of q_{aug}(t) is

\dot{q}_{aug}(t) = \left( \dot{q}(t),\; \tfrac{\partial f_n}{\partial q}\,\dot{q}(t) + \tfrac{\partial f_n}{\partial u}\,u_{aug}(t),\; u_{aug}(t) \right). (7)

The training dataset of the GP is hence given by D := {(x_i, y_i)}_{i=1}^{N}. The accuracy of \hat{q} and \dot{\hat{q}} directly influences the training-set quality and thus the effectiveness of the GP in learning the unknown model. Consequently, the introduction of an active sensing control strategy is crucial for enhancing the training-set quality, underlining the importance of the active perception-exploration approaches tackled in this letter.
Since F_u is approximated by a GP, providing \hat{F}_u, in each output dimension a = 1, ..., n it is fully characterized by its mean μ_a(·) and variance Q_a(·), which at an unobserved point x_* are

\mu_a(x_*) = K^a_{x_* X} (K^a_{XX} + \lambda^2 I)^{-1} y^a,
Q_a(x_*) = K^a_{x_* x_*} - K^a_{x_* X} (K^a_{XX} + \lambda^2 I)^{-1} K^a_{X x_*},
where K^a_{X x_*}, K^a_{XX}, K^a_{x_* x_*}, and K^a_{x_* X} depend on the kernel k_a(·, ·) (see [14]). The prior mean is assumed to be zero. Each dimension of the output is learned independently, resulting in a multivariate GP approximation of the unknown dynamics F_u, obtained by stacking the individual output dimensions: μ(x) = [μ_1(x), ..., μ_n(x)]^⊤ and Q(x) = diag(Q_1(x), ..., Q_n(x)) are the predicted mean vector and covariance matrix provided by the GP about the unknown dynamics at a generic input x, conditioned on the training set D.
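As a concrete illustration, the per-dimension posterior above can be sketched in a few lines of NumPy; the squared-exponential kernel, its hyperparameters, and the 1-D toy data are our own assumptions, not the letter's setup.

```python
# Minimal sketch of a per-dimension GP posterior: mean mu_a and variance Q_a
# at a test point, with an RBF kernel and noise variance lambda^2 (assumed).
import numpy as np

def rbf(X1, X2, ell=1.0, sf=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def gp_posterior(X, y, Xs, lam=0.1):
    K = rbf(X, X) + lam**2 * np.eye(len(X))            # K_XX + lambda^2 I
    Ks = rbf(Xs, X)                                    # K_x*X
    mu = Ks @ np.linalg.solve(K, y)                    # predicted mean
    Q = rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)    # predicted covariance
    return mu, Q

X = np.array([[0.0], [1.0], [2.0]])
y = np.sin(X).ravel()
mu, Q = gp_posterior(X, y, np.array([[1.0]]))            # at a training input
mu_far, Q_far = gp_posterior(X, y, np.array([[10.0]]))   # far from the data
# Near the data the variance collapses toward the noise level; far away it
# returns to the prior variance sf^2 = 1, flagging an unexplored region.
```

This "variance grows away from the data" behavior is exactly what the exploration metrics below exploit.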
To enhance the reconstruction of the unknown dynamics \hat{F}_u(x), the GP training set is extended with novel samples x that are expected to increase the knowledge about F_u(x). In the context of online learning, where the training set is continuously updated, the informativeness of new sampling points is assessed using the mutual information between these points and their respective observations [15]. Therefore, let us consider new samples X_new = [x_1, ..., x_κ] and the a-th component of the outputs y^a_new = [y^a_1, ..., y^a_κ], with a = 1, ..., n. Since the observations are affected by an additive Gaussian white noise N(0, Λ) with Λ = λ² I, the Information Gain (IG) for each component of the output can be expressed as (see [16, Lemma 5.3] for details)

IG(y^a_{new}; \hat{F}^a_{u,new}) = \frac{1}{2} \sum_{\alpha=1}^{\kappa} \log\left(1 + \lambda^{-2}\, Q_a(x_\alpha)\right),

where \hat{F}^a_{u,new} = [\hat{F}^a_u(x)]_{x ∈ X_new}, and α = 1, ..., κ is an index used to iterate over the new samples of X_new. One can conclude that, for the future points to sample, the IG depends on the predictive variances Q(x) of the GP [16]. Moreover, in parameter identification problems, Q can be an alternative to the FIM, as the two are inversely proportional [17].
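A minimal sketch of this expression, for one output dimension, is given below; in a strict sequential setting the predictive variances would be updated after each new sample, while here they are simply given as toy numbers of our own choosing.

```python
# Sketch of the information-gain expression from [16, Lemma 5.3]:
# IG = 0.5 * sum_alpha log(1 + Q_a(x_alpha) / lambda^2).
import numpy as np

def information_gain(Q_values, lam):
    """Q_values: predictive variances at the kappa candidate samples."""
    return 0.5 * np.sum(np.log(1.0 + np.asarray(Q_values) / lam**2))

ig_low  = information_gain([0.01, 0.02], lam=0.1)   # near-known region
ig_high = information_gain([0.50, 0.80], lam=0.1)   # unexplored region
# Larger predictive variances yield a larger expected information gain.
```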
Remark 1: The eigenvalues of Q measure the information gain associated with new samples taken along the directions denoted by the corresponding eigenvectors. Samples that maximize Q also maximize the IG, steering the system toward the unexplored state-input space.
Unfortunately, Q(x) does not consider the information about the state evolution along the planned trajectory, which cannot be neglected since we are learning the unknown part of a dynamic system. Hence, let us start from the explicit solution of (3), i.e.,

q(t) = \Phi(t, t_0)\, q_0 + \int_{t_0}^{t} \Phi(t, \tau)\left[B(\tau) u(\tau) + F_u(q(\tau), u(\tau))\right] d\tau. (8)

Following [18], the first and second moments of (8) are

\mu_q(t) = \Phi(t, t_0)\, q_0 + \int_{t_0}^{t} \Phi(t, \tau)\left[B(\tau) u(\tau) + \mu(x(\tau))\right] d\tau,
Q_q(t) = \int_{t_0}^{t} \Phi(t, \tau)\, Q(x(\tau))\, \Phi^\top(t, \tau)\, d\tau.

Unlike Q(t), Q_q(t) describes the evolution of the uncertainty on F_u(q(t), u(t)) over the time window [t_0, t], taking into account the information about f_n(q(t), u(t)) around the nominal trajectory. The maximization (of some norm) of Q_q(t) is expected to steer the system toward unexplored state-space regions, where the expected IG about F_u(q(t), u(t)) is maximized. For this reason, hereafter we rename Q_q(t) as G_exp(t_0, t) and refer to it as the Exploration Gramian.
We conclude by showing an important link between the EG and the solution X(t) of the Continuous Riccati Differential Equation (CRDE) associated with (3) in the absence of output equations (i.e., of sensor readings). In this case, the CRDE becomes a Differential Lyapunov Equation (DLE), i.e.,

\dot{X}(t) = A(t) X(t) + X(t) A^\top(t) + Q(x(t)), (9)

with initial condition X(t_0) = X_0. The solution of (9) is (see [19])

X(t) = \Phi(t, t_0)\, X_0\, \Phi^\top(t, t_0) + \int_{t_0}^{t} \Phi(t, \tau)\, Q(x(\tau))\, \Phi^\top(t, \tau)\, d\tau. (10)

Analogously to what was done in [10] for the CG, equation (10) can also be expressed as a function of the sole EG. Let G_exp(−∞, t) represent the EG computed over the interval (−∞, t]:

G_{exp}(-\infty, t) = \int_{-\infty}^{t} \Phi(t, \tau)\, Q(x(\tau))\, \Phi^\top(t, \tau)\, d\tau. (11)

By comparing (10) with (11), it follows that X(t) = G_exp(−∞, t). Differently from Q(t), the EG takes into account the integral informativeness of both the state-input samples collected along the path and the state evolution of the system, encoded in the state transition matrix Φ(t, τ). However, the state transition matrix is often not available in explicit form, especially for nonlinear or LTV systems, and needs to be computed by solving (5). Therefore, a simplified version of the EG, which is easier to compute, is

G^{DF}_{exp}(t_0, t) = \int_{t_0}^{t} Q(x(\tau))\, d\tau. (12)

In the following, we refer to G^{DF}_exp as the Dynamic-Free Exploration Gramian (DF-EG). Of course, the DF-EG somehow shares the same disadvantage as the Empirical Observability Gramian (EOG) [20], related to neglecting the system state transition matrix. The EOG cannot approximate the local observability Gramian for the states that do not appear in the sensor model, and hence its optimization cannot steer the system along those directions. For the same reason, the DF-EG cannot model the IG for the states that do not appear in F_u(q(t), u(t)), and hence its optimization cannot steer the system along those directions. It is important to note that the DF-EG is not a new metric for active exploration, as maximizing Q has previously been employed in other works, for instance in [21], where the GP covariance is maximized within an active tactile object exploration algorithm.
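To make the difference concrete, here is a small self-contained comparison (all numbers are our own assumptions) between the EG, which propagates the GP covariance through the state transition matrix, and its dynamic-free version, which simply integrates Q over time:

```python
# Toy comparison between the EG and the DF-EG for an LTI system whose
# transition matrix is available in closed form (A is nilpotent: A @ A = 0).
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # toy LTI dynamics
Q = np.diag([0.1, 0.4])                   # stand-in GP predictive covariance

def Phi(s):
    """State transition matrix exp(A s), closed form for nilpotent A."""
    return np.eye(2) + A * s

def exploration_gramian(t, dt=1e-3):
    """Left-Riemann approximation of int_0^t Phi(t-tau) Q Phi(t-tau)^T dtau."""
    G = np.zeros((2, 2))
    tau = 0.0
    while tau < t:
        P = Phi(t - tau)
        G += P @ Q @ P.T * dt
        tau += dt
    return G

G_eg = exploration_gramian(t=1.0)
G_dfeg = Q * 1.0                          # DF-EG over [0, 1]: plain integral of Q
# The EG couples the uncertainty of the second state into the first through
# the dynamics, so its (0, 0) entry exceeds the DF-EG's.
```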

III. ACTIVE PERCEPTION-EXPLORATION COUPLING
This section first introduces the objective function chosen as a baseline in our simulations. Subsequently, effective new cost functions that integrate active sensing and exploration measures are defined, and an optimal control problem that maximizes these functions is stated.

A. Perception-Exploration Performance Metrics
Let t ∈ (−∞, t_f]. The first metric that we introduce in this letter is the trace of the DF-EG, i.e.,

J_1(q(t), u(t)) = \mathrm{tr}\left(G^{DF}_{exp}(-\infty, t_f)\right),

chosen as the state-of-the-art baseline to prove the effectiveness of our methodology. Furthermore, the DF-EG represents the continuous-time version of the cost function proposed in [7], wherein the IG is replaced by its equivalent definition, Q, as described in [16]. The maximization of both IG and Q yields the same optimal solution, as the logarithmic operation does not alter the optimality of the problem. By maximizing J_1, the system visits unexplored state-input spaces, collecting training-set samples that maximize the information needed for good model learning.
The second metric is the weighted combination of J_1(q(t), u(t)) and the trace of the CG, i.e.,

J_2(q(t), u(t)) = \sigma_{EG}\, J_1(q(t), u(t)) + \sigma_{CG}\, \mathrm{tr}\left(G_c(-\infty, t_f)\right),

where σ_EG, σ_CG > 0 are used to assign different levels of importance to each component. The maximization of J_2 results in a trajectory that simultaneously leverages the sensor information to improve state estimation and explores the regions where the expected knowledge gain about the unknown model is maximized. However, the DF-EG does not consider the system state evolution along the trajectory. As a consequence, a better combination is the following:

J_3(q(t), u(t)) = \sigma_{EG}\, \mathrm{tr}\left(G_{exp}(-\infty, t_f)\right) + \sigma_{CG}\, \mathrm{tr}\left(G_c(-\infty, t_f)\right),

whose two terms quantify the sensory information and the evolution of the unknown-model uncertainty along the planned trajectory, taking into account the state evolution of the system encoded in the transition matrix.
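A minimal sketch of these weighted objectives follows; the Gramians and weights are toy assumptions, used only to show the combination and the additivity of the trace noted in Remark 2.

```python
# Sketch of the weighted perception-exploration objectives: each is a
# weighted sum of the trace of an exploration Gramian and the trace of the
# CG. All matrices and weights below are illustrative assumptions.
import numpy as np

def weighted_objective(G_expl, G_cg, sigma_eg=1.0, sigma_cg=1.0):
    """J2 uses the DF-EG as G_expl; J3 uses the full EG instead."""
    return sigma_eg * np.trace(G_expl) + sigma_cg * np.trace(G_cg)

G_cg = np.diag([2.0, 1.0])        # toy Constructibility Gramian
G_dfeg = np.diag([0.1, 0.4])      # toy DF-EG
J2 = weighted_objective(G_dfeg, G_cg)

# Trace additivity (Bellman property): tr(M + N) = tr(M) + tr(N), so the
# Gramian of a path contributes the sum of its subpath Gramians.
G_sub1, G_sub2 = np.diag([0.05, 0.1]), np.diag([0.05, 0.3])
```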
Remark 2: The trace operator (also known as the A-optimality criterion) measures both the average IG about the unknown model and the amount of measurement information. It also satisfies the Bellman principle of optimality, i.e., tr(M + N) = tr(M) + tr(N), implying that subpaths of an optimal path are optimal as well. Other optimality criteria that do not satisfy the Bellman principle could be used, e.g., D-optimality (matrix determinant) or E-optimality (matrix eigenvalue) [22].
Remark 3: It is crucial to find the correct trade-off between exploration and active sensing, by adjusting the weights in the proposed objective functions, when both precise estimation and thorough exploration are required. Indeed, the combination of active sensing and active exploration may exhibit contrasting behaviors. For instance, let us consider a vehicle equipped with a sensor providing intermittent measurements w.r.t. fixed markers. The active sensing control strategy steers the robot around the markers, maximizing the amount of sensory information. In contrast, active exploration guides the vehicle toward unexplored state-space regions to retrieve the missing dynamics; there, however, the availability of measurements is not guaranteed, possibly reducing (because of Q, which is related to the unknown dynamics) the amount of information and hence the quality of the estimation. As a consequence, an effective control strategy needs to balance the benefits of sensory-information maximization with the advantages of state-space exploration, improving both the system's state estimation and the reconstruction of the unknown dynamics.

B. Online Optimal Perception-Exploration Problem
Let us consider a generic observer built on the augmented system q_aug(t) = (q(t), \dot{q}(t), u(t)) for estimating \hat{q}_aug(t), needed to construct the training set. The goal is to develop an online exploration-perception control strategy (see Fig. 1) by solving, at each time t, the following

Problem 1 (Online Optimal Perception-Exploration Control): Given the prediction horizon consisting of L samples, find the control input u(t) ∈ S(T), where S(T) is the family of piecewise-constant functions with sampling period T, that maximizes J_l(\hat{q}_aug(t), u(t)), with l = 1, ..., 3, one of the cost functions introduced in Section III-A, integrated along the time interval [t_k, t_{k+L}] over the predicted trajectory of the nominal system \hat{q}_aug obtained by applying u(t) starting from the initial estimated state. Equation (13) is the nominal system dynamics used to predict the state evolution starting from the initial state (14), provided at runtime by the observer; (15) are the state constraints, (16) the control constraints, while (17) collects other possible constraints, such as the total energy consumption for the execution of the task [10] or a Lyapunov constraint to better ensure stability [23].
Problem 1 is solved using the CasADi tool [24] by rewriting it as a Nonlinear Programming (NLP) problem. The resulting optimal input u*_ref(t) is exploited to compute the current system state estimate, \hat{q}_aug(t). The state-estimate/input pair is then added to the GP training set. The GP is updated at intervals of T_update, with T_update > T. By choosing an update time distinct from the sampling time, the samples are spatially more separated, improving the GP's ability to capture the characteristics of the unknown model more effectively.
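The letter casts Problem 1 as a CasADi NLP [24]; as a self-contained stand-in, the sketch below maximizes a DF-EG-like running cost over piecewise-constant inputs by simple random shooting on a receding horizon. The unicycle model, the input bounds, and the distance-based variance proxy are all illustrative assumptions, not the authors' implementation.

```python
# Random-shooting stand-in for the receding-horizon exploration problem:
# sample input sequences, roll out the nominal dynamics, score each rollout
# with a toy exploration reward, and apply the first input of the best one.
import numpy as np

rng = np.random.default_rng(0)

def unicycle_step(q, u, T=0.1):
    px, py, th = q
    v, w = u
    return np.array([px + T * v * np.cos(th), py + T * v * np.sin(th), th + T * w])

def variance_proxy(q, visited, ell=1.0):
    """Toy stand-in for tr(Q(x)): large far from already-visited states."""
    d2 = min(np.sum((q - p) ** 2) for p in visited)
    return 1.0 - np.exp(-0.5 * d2 / ell**2)

def plan(q0, visited, L=10, n_rollouts=200):
    """Pick the first input of the best random input sequence (MPC style)."""
    best_J, best_u = -np.inf, None
    for _ in range(n_rollouts):
        U = rng.uniform([-3.0, -3.0], [3.0, 3.0], size=(L, 2))  # input bounds
        q, J = np.array(q0, dtype=float), 0.0
        for u in U:
            q = unicycle_step(q, u)
            J += variance_proxy(q, visited)   # DF-EG-like exploration reward
        if J > best_J:
            best_J, best_u = J, U[0]
    return best_u

u_star = plan(np.zeros(3), visited=[np.zeros(3)])
```

A gradient-based NLP solver, as used in the letter, would replace the random search; the receding-horizon structure is the same.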

IV. RESULTS
To validate our methodology, we conduct tests on simulated unicycle and quadrupedal robots. An Extended Kalman Filter is employed as observer, and training-set samples are collected online by optimizing each objective function detailed in Section III-A. In addition, we compare our method with a random approach, generated by applying random inputs. Each learned GP, yielding the estimate \hat{f}_u of the unknown dynamics f_u, is validated on a testing trajectory of 200 samples for both case studies. The testing trajectory consists of (q_test, \dot{q}_test), along with the corresponding control inputs u_test. For each sample of the testing trajectory, we compute the mismatch error

e = \dot{q}_{test} - \left(f_n(q_{test}, u_{test}) + \hat{f}_u(q_{test}, u_{test})\right), (18)

where the term in brackets is the reconstructed dynamics. To show the effectiveness of our approach, we compare the Root Mean Square (RMS) values of the mismatch errors. Moreover, we consider σ_EG = σ_CG = 1 for both case studies.
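The validation metric can be sketched directly from (18); the toy arrays below are our own assumptions, standing in for the test trajectory and the GP estimate.

```python
# Sketch of the validation metric: per-sample mismatch (18) between the
# test-trajectory state derivative and the reconstructed dynamics
# f_n + f_u_hat, summarized by its RMS over the trajectory.
import numpy as np

def rms_mismatch(qdot_test, f_n_vals, f_u_hat_vals):
    e = qdot_test - (f_n_vals + f_u_hat_vals)       # mismatch error (18)
    return np.sqrt(np.mean(np.sum(e**2, axis=1)))   # RMS over the samples

qdot   = np.array([[1.0, 0.0], [0.9, 0.1]])   # "measured" derivatives
fn     = np.array([[1.0, 0.0], [1.0, 0.0]])   # nominal dynamics along the path
fu_hat = np.array([[0.0, 0.0], [-0.1, 0.1]])  # GP estimate of the unknown part
rms = rms_mismatch(qdot, fn, fu_hat)          # near zero: perfect reconstruction
```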

A. Unicycle Vehicle
Let us consider a unicycle vehicle affected by an unknown external disturbance f_u(q, u), whose dynamics is described by (1). According to (7), the augmented unicycle vehicle has q_aug = [p_x, p_y, θ, \dot{p}_x, \dot{p}_y, ω, v]^⊤ as state and the input rates as u_aug. The vehicle starts from q_0 = 0_6 with zero estimation error and initial uncertainty P_0 = 0.4 I_{6×6}. The onboard sensors provide noisy distances w.r.t. four markers located at (0, −5) m, (0, 5) m, (10, 6) m, and (−10, −10) m. The measurement noise covariance matrix is R = 0.25 I_{4×4}. Moreover, −3 m/s ≤ v ≤ 3 m/s and −3 rad/s ≤ ω ≤ 3 rad/s, while T = 0.1 s and L = 10 samples. Finally, the unknown disturbance is defined as f_u(q, u) = [−0.3 sin θ, 0.3 cos θ, 0.3 cos θ sin θ]^⊤. The training set is built online by solving Problem 1 and updating the GP every T_update = 0.5 s, continuing until the training set reaches 200 samples. Each sample of the training set consists of the input x_j = [\hat{p}^j_x, \hat{p}^j_y, \hat{θ}^j, v^j, ω^j] and the output y_j = [\dot{\hat{p}}^j_x, \dot{\hat{p}}^j_y, \dot{\hat{θ}}^j] − f_n(\hat{q}_j, u_j). Once a testing trajectory (q_test, \dot{q}_test) and the corresponding input u_test have been chosen, we compute the mismatch error (18) along such a trajectory, with q_test = [p^test_x, p^test_y, θ^test]^⊤ and u_test = [v^test, ω^test]^⊤. For this case study, \dot{q}_test = f_n(q_test, u_test) + f_u(q_test, u_test), and hence (18) reduces to e = f_u(q_test, u_test) − \hat{f}_u(q_test, u_test). Finally, the RMS values of the mismatch errors are compared in Table I. Overall, all three performance indices improve the quality of the training set over the random approach. Comparing J_2 and J_3 to J_1 indicates that the active sensing part further enhances the quality of the training dataset; in general, the RMSE values for J_2 and J_3 are smaller than those for J_1. Moreover, J_3 achieves the best results in terms of mismatch errors. Finally, we evaluated the time needed for the GP update and for solving Problem 1, both w.r.t. the training-set size. The update time increases with the training-set dimension, confirming that the computational expense of updating the GP scales as O(nN²) [25], with n the output dimension and N the number of training-set samples. Furthermore, J_3 consistently exhibits longer times for finding the optimal solution of Problem 1, mainly due to code that is less optimized than that of the quadrupedal robot.
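This setup can be reproduced in miniature (our own sketch, with an assumed integration scheme and an arbitrary constant input) using the stated disturbance and marker geometry:

```python
# Toy reproduction of the unicycle case study: nominal dynamics plus the
# unknown disturbance f_u, and noisy range measurements to the four markers.
import numpy as np

MARKERS = np.array([[0.0, -5.0], [0.0, 5.0], [10.0, 6.0], [-10.0, -10.0]])
rng = np.random.default_rng(1)

def f_nominal(q, u):
    px, py, th = q
    v, w = u
    return np.array([v * np.cos(th), v * np.sin(th), w])

def f_unknown(q):
    th = q[2]
    return 0.3 * np.array([-np.sin(th), np.cos(th), np.cos(th) * np.sin(th)])

def measure(q):
    """Noisy marker distances; R = 0.25 I corresponds to a 0.5 m std."""
    d = np.linalg.norm(MARKERS - q[:2], axis=1)
    return d + rng.normal(0.0, 0.5, size=4)

q = np.zeros(3)
for _ in range(10):                       # forward Euler with T = 0.1 s
    q = q + 0.1 * (f_nominal(q, [1.0, 0.5]) + f_unknown(q))
z = measure(q)
```

Each training sample would then pair the estimated state and input with the discrepancy between the observed derivative and f_nominal, as in (6).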

B. Quadrupedal Robot
We now test the proposed metrics on a more complex dynamic system, a quadrupedal robot. According to [27], its approximated linear Single Rigid Body Dynamics (SRBD) in the world frame is

\ddot{p} = \frac{1}{m} \sum_{i=1}^{4} f_i + g, \qquad \ddot{\Theta} = I^{-1} \sum_{i=1}^{4} r_i \times f_i, (19)

where p ∈ R³ is the robot's position, m is the robot's mass, g ∈ R³ is the gravity vector, and I ∈ R^{3×3} is the robot's inertia tensor; Θ = [φ, θ, ψ]^⊤ is the robot's orientation, with φ, θ, and ψ the roll, pitch, and yaw angles, respectively. Notice that we assume that the roll and pitch angles do not vary significantly during the robot's motion. Additionally, r_i ∈ R³ is the vector connecting the center of mass (CoM) to the point where the force f_i ∈ R³ is applied, with i = 1, ..., 4. The system dynamics expressed by (19) is a simplified version of the more complex nonlinear one. Therefore, f_u(q(t), u(t)) (O(q²) is zero in this case) represents the leg dynamics, the robot-ground interaction, and the small-angle approximation on pitch and roll in the SRBD. According to (7), the new augmented state is q_aug = [p, Θ, \dot{p}, \dot{Θ}, \ddot{p}, \ddot{Θ}, f_1, ..., f_4] with input u_aug = \dot{f}_i. The robot is simulated in PyBullet, as shown in Fig. 2.
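A hedged sketch of such a single-rigid-body model follows: CoM translation driven by the sum of contact forces plus gravity, and rotation (under the small roll/pitch assumption) driven by the contact torques. The mass, inertia, and foot positions are illustrative assumptions, not the simulated robot's values.

```python
# Toy SRBD accelerations in the spirit of Eq. (19).
import numpy as np

m = 30.0                                   # assumed mass [kg]
I_b = np.diag([0.5, 1.0, 1.2])             # assumed inertia tensor [kg m^2]
g = np.array([0.0, 0.0, -9.81])

def srbd_accelerations(forces, feet_r):
    """forces, feet_r: (4, 3) contact forces and CoM-to-contact-point vectors."""
    p_ddot = forces.sum(axis=0) / m + g
    tau = np.cross(feet_r, forces).sum(axis=0)
    Theta_ddot = np.linalg.solve(I_b, tau)  # small-angle approximation
    return p_ddot, Theta_ddot

# Four symmetric legs each carrying a quarter of the weight: static balance.
fz = m * 9.81 / 4.0
forces = np.tile([0.0, 0.0, fz], (4, 1))
feet = np.array([[0.3, 0.2, -0.4], [0.3, -0.2, -0.4],
                 [-0.3, 0.2, -0.4], [-0.3, -0.2, -0.4]])
pdd, thdd = srbd_accelerations(forces, feet)
```

With symmetric support forces the net linear and angular accelerations vanish, which is the static-balance sanity check for this class of models.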
We consider q_0 = 0_18 with zero estimation error and initial uncertainty P_0 = 0.3² I_{18×18}. An onboard IMU provides the linear acceleration and the angular velocity of the robot CoM. Additionally, a laser scanner provides noisy distances of the floating-base CoM w.r.t. fixed landmarks. Moreover, Table I shows the comparison between the RMS values of the mismatch errors. The RMSE values for J_2 and J_3 are smaller than those for the random approach and J_1, confirming that the active sensing component contributes to enhancing the quality of the training dataset. The case study involving the quadrupedal robot also shows a rising trend in the GP update time as the size of the training set increases, following the O(nN²) behavior discussed for the unicycle vehicle. As stated in the previous section, the time required to find the optimal solution of Problem 1 should exhibit a growing trend with the size of the training set. This trend is evident for the unicycle vehicle but not for the quadrupedal robot, due to the more optimized code version used in the latter case (based on the open-source framework Horizon [26]).

V. CONCLUSION AND FUTURE WORKS
In this letter, we have proposed an online optimal perception-exploration control approach. The results have clearly shown the data-quality improvement obtained by including active sensing. We have also observed an increasing trend in the GP update times as a function of the training-set size for both case studies. Future work will aim at optimizing the GP model size by adding a sub-sampling level, e.g., based on Nyström-type methods [28] or a subset-of-data technique [25], to reduce the growth rate of the training-set size, which is of paramount importance for real-time experiments.

Fig. 1. Control system block diagram of the active perception-exploration coupling methodology proposed in this letter.

Fig. 2. Real system block in Fig. 1 for the quadrupedal robot. u*_ref are the robot CoM references; Horizon [26] converts them to the optimal input u* used to update the state estimate and hence the training set.

For this case study, R = 0.1² I_{9×9}, T = 0.04 s, and L = 20 samples. The training set is built with GP samples collected every T_update = 0.12 s, until the training set reaches 200 samples. Each sample of the training set consists of the input x_j = [\hat{p}_j, \hat{Θ}_j, \dot{\hat{p}}_j, \dot{\hat{Θ}}_j, f^j_i, r^j_i] and the output y_j = [\ddot{\hat{p}}_j, \ddot{\hat{Θ}}_j] − f_n(\hat{q}_j, u_j), with j = 1, ..., 200. The mismatch error (18) is computed on a testing trajectory with q_test = [\dot{p}_test, \dot{Θ}_test]^⊤, \dot{q}_test, and u_test = [f^test_i, r^test_i], i = 1, ..., 4, provided by the PyBullet simulator. Since we noticed small variations in the height, roll, and pitch angles during the motions, we excluded these variables from the discussion of the results.

TABLE I. RMS of the mismatch error of Random, J_1, J_2, and J_3 for the unicycle vehicle and quadrupedal robot case studies.