Neuro-inspired continual anthropomorphic grasping

Summary
Humans can continuously learn to grasp various objects dexterously, an ability enabled in part by underlying neural mechanisms. Most current work on anthropomorphic robotic grasp learning lacks the capability of continual learning (CL): large datasets are used to train grasp models, and the trained models are difficult to improve incrementally. By incorporating several discovered neural mechanisms supporting CL, we propose a neuro-inspired continual anthropomorphic grasping (NICAG) approach. It consists of a CL framework for anthropomorphic grasping and a neuro-inspired CL algorithm. Compared with other methods, our NICAG approach achieves better CL capability, with lower loss and forgetting, and a higher grasping success rate, indicating that it is better at alleviating forgetting and preserving grasp knowledge. The proposed system offers an approach for endowing anthropomorphic robotic hands with the ability to learn to grasp objects continually and has great potential to make a profound impact on robots in households and factories.

Highlights
Enabling anthropomorphic hands to learn to grasp objects continually is critical
A neuro-inspired continual anthropomorphic grasping (NICAG) approach is proposed
NICAG performs better at alleviating forgetting and preserving grasp knowledge
NICAG has great potential to make a profound impact on robots in households and factories

INTRODUCTION
Humans are able to continuously learn to grasp various objects dexterously throughout their lifetime, under an endless variety of ever-changing scenarios. This ability stems from two aspects: one is the exquisite flexibility and precision of the human hand; the other is the continual learning (CL) capability of the human brain, through which the dexterous grasping skill is acquired during childhood and further refined throughout life.
Biologists have identified a number of underlying mechanisms that support CL. Typical biological mechanisms include the complementary learning system (CLS),1 episodic replay,2 and meta-plasticity.3 CLS theory1 holds that mammals possess two learning systems, the hippocampal system and the neocortical system: the first allows rapid learning of the specifics of individual experiences, which are in turn played back over time to the second for the gradual acquisition of structured knowledge. Replay is the reactivation of neuronal activity patterns, in which neural patterns that previously occurred during waking re-occur during later rest or sleep.2 Replay appears in the hippocampus and neocortical areas, is selective and partial, and benefits subsequent memory. Schapiro et al.4 suggest that human hippocampal replay during rest prioritizes weakly learned information. Metaplasticity is the ability of a synapse to be modified depending on its internal biochemical states,3 which in turn depend on the history of synaptic modifications and recent neural activity. An instantiation of metaplasticity is regularization or normalization, with which consolidated knowledge can be protected from forgetting through synapses with a cascade of states yielding different levels of plasticity.5 Notably, in biological networks, normalization and synaptic changes co-occur with replay.6
Anthropomorphic grasping is a critical skill for robotics because robots need to grasp an object in the majority of manipulation tasks.7,8 For a robot in an open and dynamic environment, it is necessary to learn new knowledge continually over time, as it is impossible to pre-program everything in advance. The capability to learn skills and knowledge over time without forgetting what was previously learned is referred to as CL.5 Endowing an anthropomorphic hand with the ability to learn grasping continually could have an enormous societal impact.
Examples include providing household assistance to disabled or elderly people and sorting and packaging varied goods in factories.
Existing learning-based anthropomorphic robotic grasping approaches use supervised learning or reinforcement learning paradigms and train the grasping policy with large amounts of annotated data. Grasp annotations of the training data are collected by humans,9 through simulation,10 or by physical robot tests.11 Given enough data, learning-based approaches have achieved astonishing grasp ability. Nevertheless, they rely on large pre-collected datasets, and the trained models are difficult to improve incrementally.
Here, we propose a neuro-inspired continual anthropomorphic grasping (NICAG) approach that integrates several discovered biological neural mechanisms supporting continual lifelong learning, i.e., CLS, episodic replay, and meta-plasticity. The proposed NICAG approach consists of a CL framework for anthropomorphic grasping and a neuro-inspired CL algorithm. The CL framework comprises three layers, i.e., a data layer, an algorithm layer, and an application layer, thus making CL of anthropomorphic grasping possible. The neuro-inspired CL algorithm prevents forgetting and preserves grasp knowledge by replaying weakly learned information and applying knowledge distillation to strongly learned information; consequently, anthropomorphic robotic hands can learn to grasp different objects continually and incrementally over a long sequential grasp stream. We validate the proposed approach through dataset experiments and simulated experiments. Compared with other methods, our NICAG approach not only achieves better CL capability, with lower average loss and forgetting, but also obtains a higher success rate (SR) for grasping, indicating that it is better at alleviating forgetting and preserving grasp knowledge. The proposed system offers an approach for endowing anthropomorphic robotic hands with the ability to learn to grasp different objects continually and incrementally over time and has great potential to make a profound impact on robots in households and factories. The contributions of this paper are as follows.
1. A NICAG approach that incorporates several discovered biological mechanisms of lifelong learning.
2. A CL framework of anthropomorphic grasping that includes a data layer, an algorithm layer, and an application layer, making CL of anthropomorphic grasping possible.
3. A neuro-inspired CL algorithm that prevents forgetting and preserves grasp knowledge by replaying weakly learned information and applying knowledge distillation to strongly learned information, so that anthropomorphic robotic hands can learn to grasp different objects continually and incrementally over a long sequential grasp stream.
4. A validation of the proposed NICAG approach through dataset experiments and simulated experiments, demonstrating that our method outperforms state-of-the-art CL methods: it not only achieves better CL capability, with lower average loss and forgetting, but also obtains a higher SR for grasping.

RESULTS
The three biological neural mechanisms supporting continual lifelong learning that we focus on, i.e., CLS, episodic replay, and meta-plasticity, have been well described in refs. 1, 2, 3, 4, and 6. We propose our NICAG approach based on these mechanisms; its design process is shown in Figure 1. In the following, we describe the CL framework of anthropomorphic grasping, the neuro-inspired CL algorithm, and the experimental results.

The continual learning framework of anthropomorphic grasping
To enable anthropomorphic robotic hands to learn to grasp objects continually over time, we design a neuro-inspired CL framework for anthropomorphic grasping (NICAG-framework), shown in Figure 2. The NICAG-framework has three layers: a data layer, an algorithm layer, and an application layer. The data layer is responsible for generating the stream of anthropomorphic grasping experiences. The algorithm layer trains the grasp model on information from the data layer. The application layer applies the trained grasp model to objects in the field; objects with bad grasps are sent back to the data layer for better learning.
Formally, in the NICAG-framework, a CL algorithm $A_{CL}$ is expected to update its internal state, e.g., its internal grasp model $M_g$ and a knowledge base represented as specific data structures, based on a non-stationary, sequentially accessible stream of anthropomorphic grasping experiences $E = (e_1, e_2, \ldots, e_i, \ldots, e_n)$. The objective of $A_{CL}$ is to improve its performance on a set of grasp metrics $(p_1, \ldots, p_m)$ evaluated on a test stream of experiences $(e^t_1, \ldots, e^t_n)$.
With respect to the stream of anthropomorphic grasping experiences $E = (e_1, e_2, \ldots, e_i, \ldots, e_n)$, the $i$-th experience consists of $e_i = \{\langle P_k, g_k \rangle\}_{k=1}^{n_i}$, where each pair constitutes a grasp example consisting of a point cloud $P_k$ of the observed object and a grasp $g_k$. A grasp is defined as $g = \{p, q\}$. The hand wrist pose $p$ is given in the special Euclidean group $SE(3)$, consisting of the translation $t = [t_x, t_y, t_z]$ and the orientation quaternion $[q_w, q_x, q_y, q_z]$. The hand joint configuration $q$ is determined by the actual degrees of freedom of the anthropomorphic robot hand. In this work we use the anthropomorphic robot hand DLR/HIT Hand II, for which $q \in \mathbb{R}^{20}$.
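As an illustration, the grasp representation above can be written down as a small data structure. This is our own sketch; the class and field names are not from the paper's code, and the point-cloud size of 1,024 points is an arbitrary example.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Grasp:
    translation: np.ndarray  # t = [t_x, t_y, t_z], shape (3,)
    quaternion: np.ndarray   # wrist orientation [q_w, q_x, q_y, q_z], shape (4,)
    joints: np.ndarray       # joint configuration q for DLR/HIT Hand II, shape (20,)


@dataclass
class GraspSample:
    point_cloud: np.ndarray  # observed object point cloud P_k, shape (N, 3)
    grasp: Grasp


# One experience e_i is a set of such samples for a single object.
sample = GraspSample(
    point_cloud=np.zeros((1024, 3)),
    grasp=Grasp(np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(20)),
)
```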

The neuro-inspired continual learning algorithm
We first describe our neuro-inspired CL algorithm $A_{CL}$ at a high level. $A_{CL}$ prevents forgetting and preserves grasp knowledge in both sample space and function space: replaying weakly learned information preserves grasp knowledge in sample space, while knowledge distillation on strongly learned information keeps knowledge in function space. A schematic view of the proposed algorithm is given in Figure 3. It consists of three major steps: memory retrieval based on the learnability criterion, model update by replaying weakly learned information and knowledge distillation on strongly learned information, and memory update with weakly learned sample selection and diversity-based sampling. In the following subsections, we first introduce the learnability criterion and then provide details of the three major steps.
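At that same high level, the three steps can be sketched as a loop over the experience stream; every function name here is a hypothetical placeholder for a step detailed in the following subsections.

```python
def train_on_stream(model, memory, stream, retrieve, update_model, update_memory):
    """Sketch of the neuro-inspired CL loop: for each experience e_i in the
    non-stationary stream, every mini-batch B_j triggers memory retrieval and
    a model update; the memory buffer is updated after the experience."""
    for experience in stream:          # stream = (e_1, ..., e_n)
        for batch in experience:       # mini-batches B_j of the experience
            replay = retrieve(memory)  # weakly + strongly learned samples
            update_model(model, batch, replay)
        update_memory(memory, model, experience)
    return model
```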

Learnability criterion
To indicate whether a training sample is strongly or weakly learned, we adopt the learnability criterion from Sun et al.,13 which measures how much the grasp model $M_g$ can explain the training sample $\langle P_+, g_+ \rangle$ once it has absorbed the information in the memory. Adapted from Sun et al.,13 we define the learnability in Equation 1, where $g'_+$ and $g_+$ represent two realizations of the same random variable describing a grasp.
Because the grasp model is a variational autoencoder (VAE)-based generative model, after the grasp model $M_g$ has visited a grasp experience, the learnability of a sample $\langle P_+, g_+ \rangle$ with respect to $M_g$ is calculated as a quantity related to its loss (Equation 2), where $L(\langle P_+, g_+ \rangle; M_g)$ is the loss of the sample after passing through the grasp model $M_g$. The information of the memory buffer $M$ in Equation 1 has been incorporated into the grasp model $M_g$ during training on the latest visited grasp experience.
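As a rough sketch, assume the learnability score is a monotonically decreasing function of the sample's loss, so that low-loss samples count as strongly learned and high-loss samples as weakly learned; the exact mapping and the threshold below are our assumptions, not the paper's.

```python
import math


def learnability(sample_loss: float) -> float:
    """Illustrative learnability score in (0, 1]: a sample the model explains
    well (low loss) scores near 1; a poorly explained sample scores near 0."""
    return math.exp(-sample_loss)


def is_weakly_learned(sample_loss: float, threshold: float = 0.5) -> bool:
    # Samples with learnability below the (hypothetical) threshold are
    # treated as weakly learned and prioritized for replay.
    return learnability(sample_loss) < threshold
```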

Memory retrieval based on learnability criterion
Based on the learnability score, the memory buffer $M$ holds two parts: weakly learned samples $S_{wl}$ and diverse strongly learned samples $S_{sl}$, each of size $|M|/2$, where $|M|$ is the size of the memory buffer. The memory buffer is updated once an experience has been learned; details are in Section "memory update with weakly learned sample selection and diversity-based sampling". For each incoming mini-batch $B_j$ drawn from the current grasp experience $e_i$, the memory retrieval step randomly selects $|B_j|/2$ weakly learned samples (denoted $B^{wl}_j$) and $|B_j|/2$ diverse strongly learned samples (denoted $B^{sl}_j$) from the memory buffer $M$, where $|B_j|$ is the batch size. To enhance the diversity of retrieved samples, 3D data augmentation is applied to $B^{wl}_j$ and $B^{sl}_j$. The augmentation operations include jitter, dropout, and rotation: jitter adds clipped Gaussian noise with zero mean and standard deviation $\sigma$ to the position of each point; dropout throws away points randomly with a maximum ratio $r_{max}$; and rotation randomly rotates the object $P_k$ and grasp $g_k$ along three axes. The retrieved and augmented samples are used for updating the grasp model $M_g$, as described in Section "model update by replay weakly learned information and knowledge distillation on strongly learned information".
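The three augmentation operations can be sketched as follows. The default parameters are illustrative, and the rotation here acts on a bare point cloud about a single axis; in the paper, random rotations about all three axes are applied jointly to the object $P_k$ and the grasp $g_k$.

```python
import numpy as np


def jitter(points, sigma=0.01, clip=0.05, rng=None):
    """Add clipped zero-mean Gaussian noise to each point position."""
    rng = rng or np.random.default_rng()
    noise = np.clip(sigma * rng.standard_normal(points.shape), -clip, clip)
    return points + noise


def dropout(points, r_max=0.2, rng=None):
    """Randomly throw away points with a ratio drawn from [0, r_max]."""
    rng = rng or np.random.default_rng()
    ratio = rng.uniform(0.0, r_max)
    return points[rng.random(len(points)) >= ratio]


def rotate_z(points, rng=None):
    """Rotate the point cloud by a random angle about the z axis."""
    rng = rng or np.random.default_rng()
    a = rng.uniform(0.0, 2.0 * np.pi)
    rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0],
                    [0.0,        0.0,       1.0]])
    return points @ rot.T
```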

Model update by replay weakly learned information and knowledge distillation on strongly learned information
To preserve grasp knowledge in sample space, we replay the retrieved weakly learned samples $B^{wl}_j$; the loss of weakly learned information replay is defined in Equation 3. To keep grasp knowledge in function space, we apply knowledge distillation on the strongly learned samples $B^{sl}_j$: it is expected that the current grasp model $M^i_g$ and the previous grasp model $M^{i-1}_g$ encode the latent code and generate the final grasp in the same way. We utilize a KL-divergence loss to enforce the latent code distributions of $M^i_g$ and $M^{i-1}_g$ to be close, and a reconstruction loss to encourage the outputs of $M^i_g$ and $M^{i-1}_g$ to be the same. The KL-divergence loss and the reconstruction loss are formulated in Equations 4 and 5, and the knowledge distillation loss, consisting of these two terms, is defined in Equation 6. In Equation 4, $E^i$ and $E^{i-1}$ are the encoders of the current grasp model $M^i_g$ and the previous grasp model $M^{i-1}_g$, respectively, and $KL(Q \| P)$ is the Kullback-Leibler (KL) divergence measuring how different the two distributions $Q$ and $P$ are.
The reconstruction loss is based on the reconstructed hand mesh and consists of two terms: hand mesh vertex displacement and joint angle error. In Equation 5, $M^i_g(B^{sl}_j)_V$ is the vertex set of the reconstructed hand mesh, and $M^i_g(B^{sl}_j)_q$ contains the joint angles of the generated grasps. $M^{i-1}_g$ is the previous version of the grasp model, whose parameters were last updated on experience $e_{i-1}$, and $M^i_g$ is the current version visiting the current experience $e_i$. $\lambda_{kl}$ in Equation 6 and $\lambda_v$, $\lambda_q$ in Equation 5 are constants balancing the losses.
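Under the usual diagonal-Gaussian latent assumption of a VAE, the distillation loss of Equations 4-6 can be sketched in NumPy as follows. The closed-form KL term and the two reconstruction terms follow the text above; the balancing weights are placeholders, and the paper's actual implementation may differ.

```python
import numpy as np


def kd_loss(mu_i, logvar_i, mu_prev, logvar_prev,
            verts_i, verts_prev, joints_i, joints_prev,
            lam_kl=1.0, lam_v=1.0, lam_q=1.0):
    """Sketch of the knowledge-distillation loss on strongly learned samples.
    KL term: closed-form KL divergence between the diagonal-Gaussian latent
    codes of the current and previous encoders. Reconstruction term: hand-mesh
    vertex displacement plus joint-angle error (MSE stands in for the paper's
    exact penalties). lam_* are placeholder balancing weights."""
    var_i, var_prev = np.exp(logvar_i), np.exp(logvar_prev)
    kl = 0.5 * np.sum(logvar_prev - logvar_i
                      + (var_i + (mu_i - mu_prev) ** 2) / var_prev - 1.0,
                      axis=-1).mean()
    rec = (lam_v * np.mean((verts_i - verts_prev) ** 2)
           + lam_q * np.mean((joints_i - joints_prev) ** 2))
    return lam_kl * kl + rec
```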
Combining the loss on the mini-batch $B_j$ drawn from the current grasp experience $e_i$ with the loss of weakly learned replay and the loss of knowledge distillation on strongly learned information, we perform the model update by optimizing the resulting total loss with respect to the parameters of the grasp model $M_g$.

Memory update with weakly learned sample selection and diversity-based sampling
Once an experience has been learned, the memory buffer $M$ is updated: weakly learned samples are selected according to the learnability criterion, and a diverse subset of strongly learned samples is retained through diversity-based sampling.

Evaluation metrics and implementation details
SR15 is commonly used in grasping tasks to measure the stability and quality of the generated grasps. For penetration depth and penetration volume, the implementation of Jiang et al.16 is used. When the hand collides with the target object, the penetration depth is computed as the maximum of the distances from the vertices of the hand mesh to the object surface.
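A minimal sketch of the memory update, assuming the weakly learned half is picked by highest loss and the strongly learned half by greedy farthest-point selection in some feature space; both selection rules are our assumptions, and loss_fn and feature_fn are hypothetical hooks.

```python
def update_memory(memory_size, candidates, loss_fn, feature_fn):
    """Illustrative memory update. Half the buffer (S_wl) keeps the most
    weakly learned candidates (highest loss); the other half (S_sl) keeps a
    diverse subset of the remaining, strongly learned ones, chosen greedily
    by farthest feature distance. The paper's concrete rules may differ."""
    ranked = sorted(candidates, key=loss_fn, reverse=True)
    half = memory_size // 2
    weakly, rest = ranked[:half], ranked[half:]
    diverse = []
    while rest and len(diverse) < half:
        if not diverse:
            diverse.append(rest.pop(0))
            continue
        # Pick the remaining sample farthest (in feature space) from the set.
        far = max(rest, key=lambda s: min(abs(feature_fn(s) - feature_fn(d))
                                          for d in diverse))
        rest.remove(far)
        diverse.append(far)
    return weakly, diverse
```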
We describe the implementation details of all compared CL algorithms here. All CL algorithms are implemented using Avalanche,17 an end-to-end CL library based on PyTorch. For IId-Offline, i.e., the variational grasp generator in DVGG10 (deep variational grasp generation), we use the implementation of Wei et al.10 The compared methods are trained for 150 epochs; the learning rate starts at 0.002 and is divided by 10 when the validation error plateaus. The batch size is 512. We train all models on an RTX 3090 GPU. Detailed hyperparameters are presented in Table 1.
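The schedule described above (start at 0.002, divide by 10 when the validation error plateaus) behaves like a plateau-based learning-rate scheduler; a minimal stand-alone sketch, with an illustrative patience value:

```python
class PlateauLR:
    """Minimal stand-in for the described schedule: start at lr=0.002 and
    multiply the learning rate by `factor` once the validation loss has
    failed to improve for more than `patience` consecutive steps."""

    def __init__(self, lr=0.002, factor=0.1, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.bad = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad = val_loss, 0
        else:
            self.bad += 1
            if self.bad > self.patience:
                self.lr *= self.factor
                self.bad = 0
        return self.lr
```

In PyTorch, the same behavior is typically obtained with torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1).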

Results on dataset
To evaluate the CL capability of the proposed neuro-inspired algorithm, we compare it with six other methods on the dataset. Details of the dataset are described in Section "dataset" of STAR Methods. The compared methods include four typical CL methods, namely elastic weight consolidation (EWC),18 synaptic intelligence (SI),19 experience replay (ER),20 and rainbow memory (ER-RM),21 and two baselines, Fine-tune and IId-Offline. Our proposed neuro-inspired CL algorithm includes four variants: NI-WL is the weakly learned replay; NI-WL-RD integrates weakly learned replay and randomly selective ER; NI-WL-RM integrates weakly learned replay and rainbow memory replay; and NI-WL-RM-KD additionally applies knowledge distillation on strongly learned information. The compared methods are described in Section "description of compared methods" of STAR Methods. We report and analyze the evolution of test loss and forgetting along with training, the mean average loss (mAL) and average forgetting, and the loss on the combined test set once the grasp model has been trained on all grasp experiences.
Evolution of test loss and forgetting along with training. Figures 4 and 5 show how the losses on the test sets seen so far evolve as more tasks, i.e., more objects, are seen: Figure 4 uses the test set without random rotation, while Figure 5 uses the version with random rotation. The evolution of test forgetting along with training is shown in Figures 6 (without random rotation) and 7 (with random rotation). The lower and smoother the loss, the better the corresponding CL method; the same holds for forgetting. From left to right in Figures 4, 5, 6, and 7, the memory size varies from 1K to 5K. As these figures show, the naive Fine-tune has high loss (and high forgetting) and oscillates up and down with large amplitude, indicating that catastrophic forgetting occurs.
EWC is even worse than Fine-tune under the 5K buffer size, owing to the saturation-prone property of regularization methods over a long stream. SI is better than Fine-tune but still shows high loss, high forgetting, and large oscillation. ER has high loss, high forgetting, and large oscillation when the buffer size is small, such as 1K; as the buffer size increases, ER gradually improves, reaching low loss, low forgetting, and small oscillation with a large buffer such as 5K. Thanks to the diversity of the replayed samples, ER-RM performs well under different buffer sizes. By contrast, the variants of our proposed method perform better, with lower loss, lower forgetting, and smaller oscillation. NI-WL-RM-KD achieves the best results, very close to IId-Offline even with only a 1K memory buffer. The tendencies in Figures 4, 5, 6, and 7 are similar with and without random rotation, indicating that the proposed method is robust to random rotation.
Mean average loss and average forgetting. The quantitative results of Figures 4, 5, 6, and 7 are summarized in Table 2, which reports the mAL and the average forgetting F.

Figure 6. The forgetting metric on the test set without random rotation, measured by the end of each task (object). (A) A 1K buffer size is used for replay-related methods. (B) A 5K buffer size is used for replay-related methods.

Figure 7. The forgetting metric on the test set with random rotation, measured by the end of each task (object). (A) A 1K buffer size is used for replay-related methods. (B) A 5K buffer size is used for replay-related methods.

Loss on the combined test set of the finally trained model. In Table 3, we provide the average loss of the finally trained models for all compared methods. The losses are calculated on the combined test set or combined training set of all experiences. There are four losses for each method: loss on the test set without rotation (Loss-test-w/o-rot), loss on the test set with rotation (Loss-test-w/-rot), loss on the training set without rotation (Loss-train-w/o-rot), and loss on the training set with rotation (Loss-train-w/-rot). As Table 3 demonstrates, the four losses of Fine-tune are quite high, all above 5.25. As the expected lower bound, the losses of IId-Offline are low, below 3.6. Consistent with the evolution of test loss during training in Figures 4 and 5, EWC has high losses of around 6.9, all larger than those of Fine-tune. (Table notes: bold underline, italic underline, and underline fonts highlight the first, second, and third place for the same buffer size, respectively. TeWoR is short for test set without rotation, TeWR for test set with rotation; "-" means not applicable.)
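For reference, average forgetting can be computed from a matrix of per-task test losses; the loss-based formulation below is the standard one and may differ in detail from the definition used in STAR Methods.

```python
def average_forgetting(loss_matrix):
    """loss_matrix[i][j] is the test loss on task j after training on task i
    (a T x T list of lists). For each earlier task j, forgetting is how far
    its final loss rose above the best (lowest) loss it ever achieved; the
    metric averages this over all tasks but the last."""
    T = len(loss_matrix)
    final = loss_matrix[T - 1]
    f = [final[j] - min(loss_matrix[i][j] for i in range(T - 1))
         for j in range(T - 1)]
    return sum(f) / len(f) if f else 0.0
```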

Results in simulation
To illustrate the effectiveness of the proposed approach in continually generating high-quality anthropomorphic grasps, we conduct simulated experiments in the physics-based simulator MuJoCo.22 58 objects from the YCB (Yale-CMU-Berkeley) dataset23 (seen) and 48 objects from EGAD!24 (unseen) are used. For each object, the complete 3D point cloud is taken as the input of the trained grasp model $M_g$, and $M_g$ randomly generates 20 grasps. In the simulator, we execute the generated grasp configurations for all objects and calculate the metrics, i.e., SR, penetration depth, and penetration volume between the hand mesh and the target object. The steps in the physical simulation process are described in Section "steps for the simulated experiments" of STAR Methods.
Tables 4 and 5 provide the compared results on grasping seen objects from YCB and unseen objects from EGAD! in simulation, respectively. As shown in Table 4, most top places with respect to SR on grasping objects from YCB are achieved by our proposed approach. At the same time, the variants of our proposed method, NI-WL, NI-WL-RD, NI-WL-RM, and NI-WL-RM-KD, consistently achieve lower penetration. Table 5 shows a similar tendency. Overall, the proposed approach outperforms the alternatives on grasping objects in simulation, with higher SR and lower penetration depth and volume. Moreover, NI-WL for different buffer sizes, as well as NI-WL-RD, NI-WL-RM, and NI-WL-RM-KD under 5K and 10K buffer sizes, outperform IId-Offline on the unseen EGAD! objects, perhaps owing to bias from the dominant objects in IId-Offline. Qualitative results in Figures 8 and 9 demonstrate that our proposed method generates diverse, reasonable grasps.

DISCUSSION
In this work, the problem of continual anthropomorphic grasping is considered. In particular, a NICAG approach is developed, which incorporates several discovered biological neural mechanisms supporting continual lifelong learning and consists of a CL framework of anthropomorphic grasping and a neuro-inspired CL algorithm. The experiments carried out on the dataset and in simulation provide encouraging results, showing that this approach not only achieves better CL capability, with lower average loss and forgetting, but also obtains a higher SR for grasping, with reference to several CL metrics and grasp quality metrics. The proposed system offers an approach for endowing anthropomorphic robotic hands with the ability to learn to grasp different objects continually and incrementally over time, and has great potential to make a profound impact on robots in households and factories.
Starting from this work, some future directions are worth considering. First, more task settings of continual anthropomorphic grasping could be addressed, for example, CL of anthropomorphic grasping for different purposes (e.g., tool use, handover, pick and place) or with different hands. Second, because reinforcement learning can be used to reduce repeated failures, integrating supervised learning and reinforcement learning into the continual anthropomorphic grasping framework is a promising direction. Third, developing composite continual anthropomorphic grasping systems that incorporate more biological mechanisms of lifelong learning and human grasping is of great significance; mechanisms to be incorporated include neuromodulation,25 context-dependent perception and gating,26 and cognition outside the brain.27 Finally, the development of realistic test environments that specifically address the CL capabilities of anthropomorphic grasping is another crucial factor for the advancement of continual grasping technology and needs to be explored further.

Limitations of the study
Our approach uses only simulated data to improve grasp performance, and it is validated on a dataset and in a simulator. The demonstration here can be a first step toward combining neuro-inspired CL and anthropomorphic grasping. In the future, we not only need to validate our approach on real robots but also need to find more sophisticated and effective methodologies that enable performance improvement using data collected from both real task executions and simulators.

STAR+METHODS
Detailed methods are provided in the online version of this paper.

Description of compared methods
Fine-tune incrementally fine-tunes the model without employing any continual learning strategy; it cannot overcome catastrophic forgetting and is considered the naive baseline. IId-Offline uses all the samples in the dataset in an offline manner to train the model and is regarded as the oracle baseline.

Dataset
To evaluate our proposed framework and methods for continual learning of anthropomorphic grasping, we construct a sequential anthropomorphic grasping dataset based on Wei et al.10 The anthropomorphic robotic hand used is the DLR/HIT Hand II. There are more than one million grasp samples on 300 objects. We first remove objects with few effective grasps, leaving 278 objects. We then build a continual grasp learning setting in which the data are modeled as an ordered sequence of 278 non-iid learning experiences; each learning experience is a set of grasp samples from an individual object, as shown in Figure 10. The complete 3D point cloud of each object is taken as the observation $P$. For each experience, we split the grasp samples into training, validation, and test sets at a ratio of 6:2:2.
Training set and validation set in the sequence are used to train the grasping models continually, while the test set is used to test trained models.
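The per-experience 6:2:2 split can be sketched as follows; the fixed seed is illustrative.

```python
import random


def split_experience(samples, seed=0):
    """Split one experience's grasp samples into train/val/test at 6:2:2."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```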

Steps for the simulated experiments
The simulated experiments are conducted in the physics-based simulator MuJoCo.22 There are four steps in the physical simulation process: 1) Fix the object stationary and initialize the robotic hand in a pregrasp state. 2) The hand approaches the object and executes the grasp with the generated grasp parameters, including the hand wrist pose and hand joint angles, until the simulator reaches a stable state. 3) Gravity is then enabled, and the fingers maintain the grasping force until the simulator reaches a stable state or the object falls from the hand. 4) By shaking the hand, unstable grasps are filtered out, and grasps that keep the object in hand are preserved as successful ones.
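The success check implied by these steps can be sketched with injected simulator hooks; the helper names are hypothetical, and the actual MuJoCo stepping, contact handling, and shaking motion are abstracted away.

```python
def grasp_trial(settle, enable_gravity, shake, object_in_hand):
    """Mirror of the simulation steps: settle into the grasp without gravity,
    enable gravity, then shake; the grasp succeeds only if the object stays
    in the hand throughout."""
    settle()              # steps 1-2: pregrasp, approach, close the fingers
    enable_gravity()      # step 3: gravity on, fingers keep the force
    if not object_in_hand():
        return False
    shake()               # step 4: filter unstable grasps by shaking
    return object_in_hand()


def success_rate(trial_results):
    """SR over the grasps generated for one object (e.g., 20 per object)."""
    return sum(trial_results) / len(trial_results)
```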