Tunable Impact and Vibration Absorbing Neck for Robust Visual-Inertial State Estimation for Dynamic Legged Robots

We propose a new neck design for legged robots to achieve robust visual-inertial state estimation in dynamic locomotion. While visual-inertial state estimation is widely used in robotics, it is disturbed by the impacts and vibration generated when legged robots move dynamically. Rubber dampers may be a solution, but even if the dampers are suitable for some gaits, they may deform excessively or resonate at certain frequencies during other gaits since they are not tunable. To address this problem, we develop a tunable neck system that absorbs the impacts and vibration during diverse gaits. This neck system consists of two components: 1) a suspension mechanism that compensates for the weight of the head equipped with a camera and an IMU (inertial measurement unit) and, acting as a fixed low-pass filter, absorbs the impacts and the high-frequency head motion including vibration; and 2) a dynamic vibration absorber (DVA) that can be reactively adjusted to diverse gait frequencies to alleviate excessive head movements. We present a dynamics analysis of the neck system and show how to adjust the target frequency of the system. Simulation and experimental validation are performed to verify the effect of the proposed neck design, demonstrating superior estimation performance and robustness across diverse gaits.


I. INTRODUCTION
LEGGED robots have been widely researched due to their capability to operate in irregular terrain. One of the central issues in legged robots is precise state estimation for their safe autonomy. In general, state estimation using cameras and IMUs (inertial measurement units) is widely exploited due to their low cost and complementary characteristics (e.g., indoor UAV [1], [2], hand motion tracking [3]). Likewise, for the state estimation of legged robots, several studies have developed methods using visual-inertial information with leg kinematic information [4], [5], [6], which circumvent the drift issue of methods using only proprioceptive sensors (i.e., IMU and encoder) [7], [8], [9]. These state estimation methods using visual-inertial measurements, however, are fragile during aggressive dynamic locomotion such as the pronking gait. Specifically, if the impact is large and/or generates high-frequency vibrations, the accelerometer signal is in general compromised due to saturation and the limited sampling rate. For drones, which are also subject to the vibrations caused by propellers, rubber dampers are typically used to absorb the vibration as a mechanical low-pass filter. However, in the case of legged robots, even if certain rubber dampers are appropriate for some gaits, they may resonate at the frequencies of other gaits since they are not tunable. In addition, while low-stiffness dampers are more advantageous in absorbing vibrations, some large motions would deform these dampers excessively. As another way to improve the robustness of state estimation, some legged robots adopt multiple cameras and/or a LiDAR (light detection and ranging) sensor (e.g., a camera with a Velodyne LiDAR sensor for ANYmal [10], five cameras for Spot [11]), which results in high hardware and system integration costs.
To reduce impacts or vibrations induced by aggressive motions of legged robots, some studies have focused on inserting compliant components in the legs (e.g., flexible feet [14], elastic actuators [15], [16]). However, these components are not tunable once installed, so the stiffness of the components may be effective in improving state estimation for some gaits but not for others. Another recent work [17] presents a shock propagation reduction method that optimizes the mass distribution in a robot's leg, which, however, cannot be applied to various gaits since it restricts the ankle angle on landing. On the other hand, an algorithmic method is proposed in [18], which enhances the state estimation during aggressive locomotion by performing multiple visual SLAM sessions that track feature points individually according to the phases of the robot's gait cycle. This, however, would be less accurate and robust than using the continuous image frames as enabled by our proposed neck mechanism, since only IMU measurements are used to constrain the separate visual SLAM sessions.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Fig. 1. A sprinting cheetah. Four snapshots show that the head moves stably even with the dynamic movement of the body. Image courtesy of National Geographic [12].
In this letter, we pay attention to the neck as a solution to the impact and vibration problem for the state estimation of legged robots. In fact, this approach is inspired by nature, where some animals decouple the movement of the head, which carries their cognitive sensors, from that of the body. For example, as shown in Fig. 1, a cheetah can keep its head stable when sprinting toward prey. In the same way, for the stable perception of legged robots, we consider it desirable to use a neck that can decouple the head motion from the body motion.
More specifically, as shown in Fig. 2, we first design a suspension mechanism that compensates for the weight of the head equipped with a camera and an IMU. With a spring and a damper, this suspension mechanism absorbs the impacts from the legs and acts as a low-pass filter on the head motion. Although this mechanism, like rubber dampers, can absorb the impact and vibration from the legs in some gaits, it may still resonate or deform excessively in other gaits. If the head with the sensors moves excessively, it may collide with the limit of its range of motion, disturbing the accelerometer signal of the IMU. Also, the camera may suffer from motion blur, rendering feature tracking more difficult or impossible. Therefore, we adopt a tunable dynamic vibration absorber (DVA) to stabilize the head motion; DVAs are widely utilized to mechanically stabilize the oscillations of structures such as skyscrapers [19] and rotating machinery [20]. To handle diverse locomotion, the DVA can adjust its notch frequency according to the gait frequency, thereby alleviating the excessive head motion. This DVA mechanism turns out to be necessary during some aggressive locomotion in which the head motion becomes excessive if only the suspension is used.
As an application of morphological computation [21], our proposed design replaces the complex computation of active control with passive components for shock and vibration absorption. In addition, these (analog) passive elements provide a much faster response than (digital) active control with a finite sampling rate. In the same context, in contrast to deep learning-based methods for image stabilization (e.g., DeepDeblur [22]), our approach obtains stable images mechanically, without their computational time and resources. Of course, our mechanism is much simpler than the neck of real animals with real-time variable stiffness. Even so, we believe this letter demonstrates the feasibility of a (simpler) DVA-based neck design for better visual-inertial state estimation of legged robots. The perception-stabilizing efficacy of our DVA-based neck design is verified with simulations and experiments using the MIT Mini Cheetah [13] with a prototype of our neck design shown in Fig. 2. To the best of our knowledge, this is the first result exploring the possibility of utilizing a neck mechanism to improve the perception stability and performance of dynamic legged robots.
The rest of the letter is organized as follows. In Section II, the neck design is proposed in detail. Section III and Section IV provide an analysis and the hardware implementation of our neck mechanism, respectively. The simulation and experimental results are presented in Section V and Section VI, followed by some concluding remarks and comments on future work in Section VII.

II. NECK MECHANISM DESIGN
As seen in Fig. 3, the proposed neck design is mainly composed of two parts: 1) the suspension mechanism that compensates for the weight of the head, maintains the head orientation to be the same as that of the body, and absorbs impacts and motions of high frequency including vibrations; and 2) the DVA which can be adjusted to the gait frequencies to suppress excessive motion and vibration of the head with the camera and IMU attached on it.

A. Suspension Mechanism Design
The main part of the suspension consists of a linkage mechanism connecting the body and head, which imitates the neck joint and skeleton of animals. We use a compression spring and a hydraulic damper to perform the roles of muscles in this mechanism. When absorbing impact and vibration, this linkage should operate in the vertical direction, which is the main direction of the impact and vibration, and not in any other direction. For this vertical movement of the head, we generalize the leg design of Ascento [23], which was developed to create a vertical trajectory of the foot passing through the robot's center of mass, so that the trajectory can be placed as far forward as desired and the view of the camera is not blocked, as seen in Fig. 4. This linkage mechanism cannot produce a perfectly vertical movement of the head because the bars of the linkage are connected via revolute joints. Thus, we optimize the design parameters of the linkage so that the head trajectory tracks the desired vertical trajectory as closely as possible.
For this, with the geometric design parameters $l_i$ and the joint angles $\psi_i$ in Fig. 4, the position of the head $(h_x, h_y)$ is represented by (1)-(2), and the constraint of the kinematic loop is expressed by (3). To express the desired vertical trajectory, we use the forward distance from the body to the head $x_d$ and the minimum/maximum heights of the head $y_{\min}$, $y_{\max}$. Their values are given in Table I, considering that the neck should cover large motions such as pronking, the head should not collide with other parts of the robot, and the camera's field of view should not be blocked.
With the head trajectory derived in (1)-(3), we formulate the design optimization (4)-(8) to minimize the deviation from the desired vertical motion of the head. Here, (4) is the objective, (5)-(7) constrain the head position to lie on the trajectory generated by $l_i$, and (8) ensures positive bar lengths and prevents the optimization from diverging to infinite lengths. Through the design optimization (4)-(8), the geometric parameters are determined as in Table II for the desired vertical trajectory given in Table I. With these design parameters, the horizontal deviation of the head is bounded within 1 mm when it moves 12 cm along the vertical direction, as shown in Fig. 4. Moreover, we supplement this impact-absorbing linkage with two connected parallel four-bar linkages; see Fig. 5. Due to these additional parallel linkages, the rotation of the neck joints does not change the direction of the head. In other words, the head and the body always have the same orientation. Here, note that (3) defines a constraint between $\psi_1$ and $\psi_2$. This implies that only one of $\psi_1$ and $\psi_2$ is independent. Hence, we need only one rotary encoder to calculate the relative position of the head from the body (i.e., (1)-(2)), and we already know that the head and the body have the same orientation. Using this kinematic information between the head and the body, the state estimated from the sensors on the head can be exploited for the control and planning of the robot.

B. Tunable Dynamic Vibration Absorber
The proposed DVA consists of an actuator, a linear stepping motor, a torsion spring, and a slider. A torsion spring connects the actuator to the slider, similar to series elastic actuators [24], as shown in Fig. 6. By changing the position of the linear stepping motor along the slider, the DVA can adjust the notch frequency of the system. The actuator then rotates the slider to keep it horizontal in the steady state. The analysis of this tunable DVA is presented in detail in Section III.

III. NECK DYNAMICS ANALYSIS
This section provides the dynamics model to verify that the proposed neck system acts as a mechanical notch filter. Furthermore, the method of matching the target frequency is also presented.
We can represent the whole neck system as the mass-spring-damper system shown in Fig. 7, viewing the height of the body $u$ as the input and the height of the head $y$ as the output of the system.
We define the generalized coordinates as $q = [y \;\; \theta]^T$, where $\theta$ denotes the rotation angle of the slider. The angle of the actuator is denoted by $\theta_m$ (see Fig. 6), and $r$ denotes the displacement of the linear stepping motor of the DVA. The masses of the head and of the linear stepping motor of the DVA are denoted by $m_1$ and $m_2$, each modeled as concentrated at its center of mass, and the other masses are neglected. The spring and damping coefficients of the shock absorber are denoted by $k_1$ and $b_1$, and the spring coefficient of the DVA by $k_2$. The relative position of the head from the body in the vertical direction can be expressed as
$$h(y, u) = y - u + h_0 \tag{9}$$
where $h_0$ is the value of $h(y, u)$ in the steady state ($y = u = 0$). With the assumption that the horizontal motion of the head is negligible, as shown in Fig. 4, the spring/damper length of the suspension $s$ can be represented as a function of $h$ by (10), where $s_0$ denotes the value of $s(h)$ in the steady state ($h = h_0$). The kinetic energy and potential energy of the system can be derived from the states and parameters described above, which lead to the dynamics equation (11) of the system through the Euler-Lagrange equation, where $\bar{s}$ is the free length of the spring and $\tau_y$ is the generalized force of the damper associated with the virtual displacement $\delta y$.
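As a hedged illustration of the modeling step above, the energies can be sketched as follows under the stated point-mass assumptions. The placement of $m_2$ at distance $r$ from the slider pivot on the head, the measurement of $\theta$ from the horizontal, the gravitational acceleration $g$, and the symbol $\bar{s}$ for the spring free length are assumptions of this sketch and are not taken from (11).

```latex
\begin{align*}
T &= \tfrac{1}{2}(m_1 + m_2)\dot{y}^2 + m_2 r \cos\theta\,\dot{y}\,\dot{\theta}
     + \tfrac{1}{2} m_2 r^2 \dot{\theta}^2, \\
V &= (m_1 + m_2)\, g\, y + m_2\, g\, r \sin\theta
     + \tfrac{1}{2} k_1 (s - \bar{s})^2 + \tfrac{1}{2} k_2 (\theta - \theta_m)^2, \\
\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q}
  &= \begin{bmatrix} \tau_y \\ 0 \end{bmatrix}, \qquad L = T - V .
\end{align*}
```

Under these assumptions, the $y$-row of the Euler-Lagrange equation has the inertial terms $(m_1 + m_2)\ddot{y} + m_2 r \cos\theta\,\ddot{\theta} - m_2 r \sin\theta\,\dot{\theta}^2$, which agrees with the leading terms of the approximated dynamics quoted later in this section.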
From the virtual work of the damper $\delta W$, we can find $\tau_y$ as in (12). On the other hand, using the vertical deviation of the head from the steady state ($\Delta h := h - h_0 = y - u$), (10) can be expressed as (13) with $\Delta s := s - s_0$. In the neighborhood of the steady state ($|\Delta s| \ll s_0$, $|\Delta h| \ll h_0$), we can approximate (13) by the linear relation between $\Delta s$ and $\Delta h$
$$\frac{\Delta s}{\Delta h} \approx \frac{h_0}{s_0} \cdot \frac{l_5}{l_5 + l_6} =: \alpha \tag{14}$$
Using this approximate relation, the first line of (11) can be approximated as (15), whose inertial terms are $(m_1 + m_2)\ddot{y} + m_2 r \cos\theta\,\ddot{\theta} - m_2 r \sin\theta\,\dot{\theta}^2$. With the steady-state condition at $\theta = 0$, $s = s_0$, we can derive the steady-state relations (16) and (17). From (17), we can find the motor angle $\theta_m$ that yields a steady state at $\theta = 0$. In addition to (16) and (17), by assuming a small rotation angle and angular velocity $(\theta, \dot{\theta})$, the dynamics equation (15) can be linearized as (18). From this linearized dynamics equation (18), we can derive the transfer function $H(s)$ of the system in (19). The transfer function $H(s)$ has a zero at $w = \sqrt{k_2 / (m_2 r^2)}$, which is the notch frequency. That is, for the target frequency of the body vertical motion $w_t$, we can set $r = \sqrt{k_2 / (m_2 w_t^2)}$ so as to absorb the oscillation of the head. Fig. 8 shows Bode plots for the stepping motor positions adjusted to the target frequencies.
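The tuning rule above can be illustrated with a short numerical sketch. The slider position follows the relation stated in the text, $r = \sqrt{k_2 / (m_2 w_t^2)}$. The frequency response, in contrast, uses a linear model assumed here for illustration only: $(m_1 + m_2)\ddot{y} + m_2 r \ddot{\theta} + b_{\mathrm{eff}}(\dot{y} - \dot{u}) + k_{\mathrm{eff}}(y - u) = 0$ and $m_2 r \ddot{y} + m_2 r^2 \ddot{\theta} + k_2 \theta = 0$, with $k_{\mathrm{eff}} = k_1 \alpha^2$ and $b_{\mathrm{eff}} = b_1 \alpha^2$. This model is not necessarily identical to (18)-(19), and all numerical parameter values are hypothetical.

```python
import math


def dva_position(k2, m2, f_target_hz):
    """Slider position r that places the notch at the target frequency."""
    w_t = 2.0 * math.pi * f_target_hz
    return math.sqrt(k2 / (m2 * w_t ** 2))


def notch_frequency_hz(k2, m2, r):
    """Notch frequency sqrt(k2 / (m2 r^2)), converted from rad/s to Hz."""
    return math.sqrt(k2 / (m2 * r ** 2)) / (2.0 * math.pi)


def head_gain(f_hz, m1, m2, k_eff, b_eff, k2, r):
    """|H(jw)| from body height u to head height y for the ASSUMED linear model."""
    s = 1j * 2.0 * math.pi * f_hz
    suspension = b_eff * s + k_eff            # spring-damper path from the body
    absorber = m2 * r ** 2 * s ** 2 + k2      # vanishes at the notch frequency
    num = suspension * absorber
    den = ((m1 + m2) * s ** 2 + suspension) * absorber - (m2 * r) ** 2 * s ** 4
    return abs(num / den)


# Hypothetical parameters (not the values of Table II).
M1, M2 = 0.5, 0.1          # head mass and DVA moving mass [kg]
K_EFF, B_EFF = 100.0, 2.0  # effective vertical stiffness [N/m], damping [N s/m]
K2 = 0.5                   # torsion spring coefficient [N m/rad]

# Vertical-motion frequencies named in the experiments: 3 Hz and 6 Hz.
for f_target in (3.0, 6.0):
    r = dva_position(K2, M2, f_target)
    gain = head_gain(f_target, M1, M2, K_EFF, B_EFF, K2, r)
    print(f"target {f_target} Hz: r = {r:.4f} m, |H| at target = {gain:.2e}")
```

In this sketch, a higher target frequency gives a shorter slider position, since $r$ is inversely proportional to $w_t$.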

IV. HARDWARE IMPLEMENTATION
For the suspension mechanism, we use a small shock absorber made for the same purpose in RC vehicles. Tightening its nut compresses the spring, which allows the steady-state height of the head to be adjusted. Since the suspension system makes the head follow the body movement like a second-order low-pass filter, we choose a spring coefficient that is as low as possible, subject to the condition that the head can still be lifted, in order to lower the cutoff frequency. For the actuator of the DVA, we use a Robotis Dynamixel MX-64, which can withstand the torques caused by the rotating part.
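The spring selection described above can be sketched as a simple trade-off: a softer spring lowers the natural frequency of the suspension, but it must still hold the head within the available travel. The sketch below ignores the preload adjustment by the nut, treats the suspension as a single mass on an effective vertical spring, and uses hypothetical candidate stiffnesses, head mass, and sag limit.

```python
import math


def natural_frequency_hz(k_eff, mass):
    """Undamped natural frequency of a mass on an effective vertical spring."""
    return math.sqrt(k_eff / mass) / (2.0 * math.pi)


def static_sag_m(k_eff, mass, g=9.81):
    """Static deflection under the weight of the head (preload ignored)."""
    return mass * g / k_eff


def softest_feasible_spring(candidates, mass, max_sag_m):
    """Lowest stiffness whose static sag stays within the allowed travel."""
    feasible = [k for k in candidates if static_sag_m(k, mass) <= max_sag_m]
    return min(feasible) if feasible else None


# Hypothetical values for illustration only.
HEAD_MASS = 0.6                            # [kg]
MAX_SAG = 0.06                             # [m]
CANDIDATES = [50.0, 100.0, 200.0, 400.0]   # effective stiffness [N/m]

k_sel = softest_feasible_spring(CANDIDATES, HEAD_MASS, MAX_SAG)
print(k_sel, natural_frequency_hz(k_sel, HEAD_MASS))
```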
The proposed neck platform should be as light as possible to minimize its effect on the control of the robot. This is crucial for the assumption that the legged robot is not affected by the neck dynamics. At the same time, the neck structure requires appropriate stiffness to resist the deformation caused by impact and movement. Therefore, the geometrically simple main parts consist of lightweight, high-stiffness carbon fiber tubes. In addition, 3D printing with ABS filament is primarily employed for the geometrically complex connection parts. Under these design considerations, we developed the prototype of the proposed neck design for the MIT Mini Cheetah [13], as seen in Fig. 2, which has an overall weight of 0.92 kg including the DVA (9% of the robot's weight).

V. SIMULATION
To verify our proposed method, we conduct a simulation showing the behavior of the head with and without the DVA adapted to the target frequency. By numerically integrating the original dynamics equation in MATLAB, we simulate the head motion for a given vertical movement of the body. In this simulation, we use the same parameters as in Table II and two sinusoidal input models similar to the data of real robot motion, given in Table III. Fig. 9 shows the simulated head motion in response to the body motion. For the trot input model, the head motion is smoother than the body motion in both cases. This is because the trot input has a high frequency and small amplitude, so the suspension mechanism can alleviate the periodic motion of the head, as can be predicted from the Bode plot in Fig. 8. On the other hand, in the case of the pronking gait, the head motion becomes unstable without the DVA, which might be caused by the resonance of the system or the nonlinear terms neglected when simplifying the dynamics equation. In contrast, we confirm that the DVA renders the head movement stable because it absorbs the oscillation even with large body motions of low frequency.
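A minimal time-domain sketch of this kind of comparison is given below. It does not reproduce the MATLAB simulation of the original nonlinear dynamics. Instead, it integrates an assumed linearized model, $(m_1 + m_2)\ddot{y} + m_2 r \ddot{\theta} + b_{\mathrm{eff}}(\dot{y} - \dot{u}) + k_{\mathrm{eff}}(y - u) = 0$ and $m_2 r \ddot{y} + m_2 r^2 \ddot{\theta} + k_2 \theta = 0$, with a fourth-order Runge-Kutta scheme. All parameter values and the 3 Hz sinusoidal input are hypothetical and are not those of Tables II and III. Being linear, the sketch cannot show the unstable head motion reported for the suspension-only case; it only shows how a tuned DVA attenuates the steady-state head oscillation.

```python
import math


def simulate_head(f_hz, amp, m1, m2, k_eff, b_eff, k2=None, r=None,
                  t_end=15.0, dt=1e-3):
    """RK4 integration of the ASSUMED linear model; returns (t, y) samples.

    If k2 and r are given, the DVA is free to rotate; otherwise it is
    locked and its mass simply rides on the head.
    """
    w = 2.0 * math.pi * f_hz
    use_dva = k2 is not None and r is not None

    def deriv(t, x):
        y, yd, th, thd = x
        u = amp * math.sin(w * t)              # body height
        ud = amp * w * math.cos(w * t)         # body vertical velocity
        f1 = -b_eff * (yd - ud) - k_eff * (y - u)
        if not use_dva:
            return (yd, f1 / (m1 + m2), 0.0, 0.0)
        f2 = -k2 * th
        # Closed-form solution of the 2x2 mass-matrix system for the accelerations.
        ydd = (f1 - f2 / r) / m1
        thdd = ((m1 + m2) * f2 - m2 * r * f1) / (m1 * m2 * r ** 2)
        return (yd, ydd, thd, thdd)

    def advance(x, k, h):
        return tuple(a + h * b for a, b in zip(x, k))

    x = (0.0, 0.0, 0.0, 0.0)
    samples = []
    for i in range(int(round(t_end / dt))):
        t = i * dt
        samples.append((t, x[0]))
        s1 = deriv(t, x)
        s2 = deriv(t + dt / 2, advance(x, s1, dt / 2))
        s3 = deriv(t + dt / 2, advance(x, s2, dt / 2))
        s4 = deriv(t + dt, advance(x, s3, dt))
        x = tuple(a + dt / 6 * (p + 2 * q + 2 * c + d)
                  for a, p, q, c, d in zip(x, s1, s2, s3, s4))
    return samples


def steady_peak(samples, window_s=2.0):
    """Largest |y| over the final window, after the transients have decayed."""
    t_last = samples[-1][0]
    return max(abs(y) for t, y in samples if t >= t_last - window_s)


# Hypothetical parameters and pronk-like input.
M1, M2, K_EFF, B_EFF, K2 = 0.5, 0.1, 100.0, 2.0, 0.5
F_IN, AMP = 3.0, 0.01
R = math.sqrt(K2 / (M2 * (2.0 * math.pi * F_IN) ** 2))  # tuned to the input

peak_locked = steady_peak(simulate_head(F_IN, AMP, M1, M2, K_EFF, B_EFF))
peak_dva = steady_peak(simulate_head(F_IN, AMP, M1, M2, K_EFF, B_EFF, K2, R))
print(peak_locked, peak_dva)
```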

VI. EXPERIMENTS
In this section, we present experimental results with the proposed prototype of the neck design. Each experiment is conducted with an MIT Mini Cheetah [13], which excels at dynamic locomotion. We manually control our robot based on [25], which effectively combines model predictive control (MPC) and whole-body impulse control (WBIC). This controller generates a swing foot trajectory between the footstep locations based on Bezier curves. We use two types of gait in the experiments, specified by two parameters described in [26]: the phase offset and the stance period for each foot.
• Trot: Diagonal leg pairs work in the same phase, and the two pairs are offset by half of the cycle.
• Pronk: All legs work in the same phase of the cycle.
We set the stance period of both gaits to half of the cycle. The frequency of these two gaits is set to 3 Hz, but the trotting gait has a vertical motion of 6 Hz because it pushes off the ground twice per cycle. All sensor data are captured using an MPU6050 IMU and an oCam-1MGN-U global shutter camera.
To compare the effect of the proposed neck mechanism on the state estimation of dynamic legged robots, we use three necks as seen in Fig. 10.
1) DVA: the prototype of the proposed neck design including the DVA, which is adjusted to the target frequency.
2) SUS: the prototype of the proposed neck design, but using only the suspension mechanism with the DVA locked.
3) FIX: the neck is replaced by a rigid fixture with a similar dimension.

A. Stabilization of Sensor Data
One of the goals of using the neck mechanism is to stabilize the periodic vertical movement of legged robots. To validate the effect of the suspension mechanism and the DVA on vertical movements, the OptiTrack MOCAP (motion capture) system measures the motions of the head and body at 100 Hz. Furthermore, the linear acceleration of the head is measured to see how much shock and vibration the neck absorbs, which is critical for robust visual-inertial state estimation. We also collect camera image data to see how much the excessive motion of the camera affects the performance of feature tracking, by comparing how long a thousand feature points of the first image are tracked. The results are shown in Fig. 11 (vertical movement and IMU signal), Fig. 12 (snapshots with DVA and SUS), and Fig. 13 (feature tracking performance with DVA, SUS, and FIX). From Fig. 11, we can see that the head motion is smoother than the body motion for the trotting gait with DVA and SUS. Also, for the trotting gait, the linear acceleration data show a smaller impact and less noise than with FIX, which is attributed to the suspension mechanism.
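The feature tracking metric above, namely how long the feature points of the first image remain tracked, can be sketched as a survival count. The sketch assumes that a frame-to-frame tracker (for example, a KLT-style tracker, not shown here) reports one success flag per feature per frame; this input format and the function names are hypothetical and do not describe the pipeline used in the experiments.

```python
def surviving_counts(status_per_frame):
    """Number of first-image features still tracked after each frame.

    status_per_frame[k][i] is True if feature i was successfully tracked
    into frame k + 1. A feature that is lost once is never counted again.
    """
    if not status_per_frame:
        return []
    alive = [True] * len(status_per_frame[0])
    counts = []
    for status in status_per_frame:
        alive = [a and bool(s) for a, s in zip(alive, status)]
        counts.append(sum(alive))
    return counts


def frames_until_below(counts, threshold):
    """Index of the first frame whose survivor count drops below threshold."""
    for k, c in enumerate(counts):
        if c < threshold:
            return k
    return len(counts)


# Toy example with four features over three frames.
flags = [
    [True, True, True, False],
    [True, False, True, True],
    [True, True, False, True],
]
print(surviving_counts(flags))
```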
However, similar to the simulation results in Section V, for the pronking gait with SUS, the head exhibits unstable motion, moving rapidly and colliding with its range limit, as seen in Fig. 12. This resonance-like behavior arises because SUS is not tunable, and it results in the very noisy acceleration in Fig. 11 and the poor feature tracking in Fig. 13, both of which are significantly detrimental to visual-inertial state estimation. On the other hand, for the case of FIX, the feature tracking in Fig. 13 is not perturbed much even during the pronking gait, yet the impact and vibration directly transmitted to the IMU generate the noisy acceleration signal in Fig. 11. In contrast, the head with DVA maintains relatively small vertical motion, less noise in the acceleration signal, and good feature tracking robustly across different gaits. A video of these experiments can be found at https://youtu.be/xRF3SLvohl0.

B. State Estimation Results
In Section VI-A, we verified that reliable sensor data can be measured with our proposed method. This section presents the state estimation results using these sensor data. We use VINS-Mono [27] for visual-inertial state estimation with a monocular camera, and we do not use loop closure so as to show clearly how much drift is reduced.
1) Indoor Experiment: To verify the performance of the state estimation, we use a MOCAP system as the ground truth and the trajectory evaluation toolbox provided in [28]. In this experiment, we use two necks (DVA and FIX) with two gaits (trot and pronk). We exclude SUS from this indoor experiment and from the outdoor experiment (Section VI-B2), since its motion is so violent that it is on the verge of breaking the neck mechanism (see Figs. 11 and 12). Fig. 14 shows the dramatically different estimated trajectories in the top view. In the case of FIX, the result for the trotting gait shows a fast drift, and the result for the pronking gait completely diverges, even though the visual perception pipeline appears adequate (e.g., good feature tracking in Fig. 13). This is due to the impact and vibration that act directly on the IMU and corrupt its acceleration signal. This signal is particularly crucial for monocular visual-inertial state estimation, where the IMU is the sole source of information for resolving the scale ambiguity of monocular vision. Recall that mitigating this impact and vibration across different gaits cannot be attained with the non-tunable SUS and necessitates the tunable DVA. In contrast to the result of FIX, the case of DVA, which provides stable sensor data, shows a slow drift. The root-mean-square error (RMSE) of the position along the trajectories in Fig. 14 is 4.05 cm for the trotting gait and 7.77 cm for the pronking gait. This clearly shows the efficacy of our proposed neck for dynamic legged locomotion, in particular with only a monocular camera and an IMU. A video of these experiments can be found at https://youtu.be/O0T58hQ5gJE.
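For reference, the position RMSE reported above can be computed as sketched below. The sketch assumes that the estimated and ground-truth trajectories are already time-synchronized and aligned in a common frame; the alignment performed by the toolbox in [28] is not reproduced here, and the function name and sample values are hypothetical.

```python
import math


def position_rmse(estimated, ground_truth):
    """Root-mean-square position error between two aligned trajectories.

    Both inputs are equal-length sequences of (x, y, z) positions in the
    same frame and units.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_sum = 0.0
    for p, q in zip(estimated, ground_truth):
        squared_sum += sum((a - b) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared_sum / len(estimated))


# Toy example in meters (not experimental data).
est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref = [(0.0, 0.0, 0.0), (1.0, 0.02, 0.0)]
print(position_rmse(est, ref))
```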
2) Outdoor Experiment: In addition to indoor experiments, we compare DVA and FIX in an outdoor setting to verify our proposed method along a longer trajectory. In these experiments, the performance is confirmed by measuring the offset when returning to the starting position after taking a turn around the perimeter with a trotting gait.
The estimated trajectories are shown in Fig. 15, overlaid on an aerial photograph to visually verify their accuracy. Similar to the result of the indoor experiments, the case of FIX shows a fast drift induced by the noisy IMU data. On the other hand, in the case of DVA, the final drift without loop closure is [−0.60, 0.05, −0.07] m along the x, y, and z axes, which corresponds to 1.50% of the total trajectory length. A video of these experiments can be found at https://youtu.be/Dn7FNoyO624.
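The relation between the final offset and the reported drift percentage can be checked with a short calculation. The sketch assumes that the percentage is the Euclidean norm of the final offset divided by the trajectory length; the trajectory length itself is not stated in the text, so the value computed below is only what this assumption implies.

```python
import math


def drift_norm(offset):
    """Euclidean norm of the final position offset."""
    return math.sqrt(sum(c * c for c in offset))


def drift_percent(offset, trajectory_length):
    """Final drift as a percentage of the traveled distance."""
    return 100.0 * drift_norm(offset) / trajectory_length


def implied_length(offset, percent):
    """Trajectory length implied by an offset and a drift percentage."""
    return drift_norm(offset) / (percent / 100.0)


FINAL_OFFSET = (-0.60, 0.05, -0.07)   # [m], from the outdoor experiment
REPORTED_PERCENT = 1.50               # [%], from the outdoor experiment

print(drift_norm(FINAL_OFFSET), implied_length(FINAL_OFFSET, REPORTED_PERCENT))
```

Under this assumption, the offset norm is about 0.61 m, which implies a trajectory of roughly 40 m.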

VII. CONCLUSION
In this letter, we propose a tunable neck design for the robust state estimation of dynamic legged robots. Adjusting the neck to obtain stable sensor data across diverse locomotion enables a robust visual-inertial state estimation system. The mechanical design, modeling, and analysis of the proposed system are provided, and its validity is verified through simulation and experiments.
This research has shown that mechanical neck design is indeed a promising solution to the problem of state estimation for dynamic legged robots. By providing more stable visual-inertial sensing, our approach would also reduce the drift caused by slippage in state estimation methods that use leg encoder information. We will further refine the neck design toward variable stiffness, lighter weight, and a smaller form factor. For example, with the help of the spring's fast response, we may also actively control the DVA to quickly stabilize the head even during transitions or non-periodic motions such as stepping down. We will also explore alternative ways to achieve variable stiffness of our neck instead of the current method of changing the inertial parameters of the dynamic equation.