Neural Networks

Volume 121, January 2020, Pages 148-160

Tree-CNN: A hierarchical Deep Convolutional Neural Network for incremental learning

https://doi.org/10.1016/j.neunet.2019.09.010

Abstract

Over the past decade, Deep Convolutional Neural Networks (DCNNs) have shown remarkable performance in most computer vision tasks. These tasks traditionally use a fixed dataset, and the model, once trained, is deployed as is. Adding new information to such a model presents a challenge due to complex training issues, such as “catastrophic forgetting”, and sensitivity to hyper-parameter tuning. In the real world, however, data is constantly evolving, and our deep learning models are required to adapt to these changes. In this paper, we propose an adaptive hierarchical network structure composed of DCNNs that can grow and learn as new data becomes available. The network grows in a tree-like fashion to accommodate new classes of data, while preserving the ability to distinguish the previously trained classes. The network organizes the incrementally available data into feature-driven super-classes and improves upon existing hierarchical CNN models by adding the capability of self-growth. Compared against fine-tuning a deep network, the proposed hierarchical model achieves a significant reduction in training effort, while maintaining competitive accuracy on CIFAR-10 and CIFAR-100.

Introduction

In recent years, Deep Convolutional Neural Networks (DCNNs) have emerged as the leading architecture for large-scale image classification (Rawat & Wang, 2017). In 2012, AlexNet (Krizhevsky, Sutskever, & Hinton, 2012), an 8-layer DCNN, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and catapulted DCNNs into the spotlight. Since then, they have dominated ILSVRC and have performed extremely well on popular image datasets such as MNIST (LeCun et al., 1998, Wan et al., 2013), CIFAR-10/100 (Krizhevsky & Hinton, 2009), and ImageNet (Russakovsky et al., 2015).

Today, with increased access to large amounts of labeled data (e.g. ImageNet (Russakovsky et al., 2015) contains 1.2 million images across 1000 categories), supervised learning has become the leading paradigm in training DCNNs for image recognition. Traditionally, a DCNN is trained on a dataset containing a large number of labeled images. The network learns to extract relevant features and classify these images. The trained model is then used to classify unlabeled real-world images. In such training, all the data is presented to the network during the same training process. In the real world, however, we rarely have all the information at once; data is instead gathered incrementally over time. This creates the need for models that can learn new information as it becomes available. In this work, we address the challenge of learning from such incrementally available data in the domain of image recognition using deep networks.

A DCNN embeds feature extraction and classification in a single coherent architecture. Modifying one part of the parameter space immediately affects the model globally (Xiao, Zhang, Yang, Peng, & Zhang, 2014). Another problem with incrementally training a DCNN is “catastrophic forgetting” (Goodfellow, Mirza, Xiao, Courville, & Bengio, 2013): when a trained DCNN is retrained exclusively on new data, the features learned from the earlier data are destroyed. This mandates using the previous data when retraining on new data.
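The effect is easy to reproduce on a toy problem. The sketch below (written in PyTorch with synthetic 2-D data, not the image datasets or the authors' implementation used in this paper) trains a small classifier on two “old” classes and then retrains it exclusively on two “new” classes; accuracy on the old classes typically collapses.

```python
# Toy illustration of catastrophic forgetting (not the paper's setup):
# a small classifier is trained on "old" classes, then retrained only on
# "new" classes; accuracy on the old classes collapses.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_blobs(centers, n=200):
    """Synthetic 2-D Gaussian clusters, one cluster per class."""
    xs, ys = [], []
    for label, c in enumerate(centers):
        xs.append(torch.randn(n, 2) * 0.3 + torch.tensor(c, dtype=torch.float))
        ys.append(torch.full((n,), label, dtype=torch.long))
    return torch.cat(xs), torch.cat(ys)

def train(model, x, y, epochs=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

# Four classes in total; the classifier has output units for all of them.
old_x, old_y = make_blobs([(-2.0, -2.0), (-2.0, 2.0)])   # "old" classes 0 and 1
new_x, new_y = make_blobs([(2.0, -2.0), (2.0, 2.0)])     # "new" classes 2 and 3
new_y = new_y + 2

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 4))

train(model, old_x, old_y)
print("old-class accuracy after training on old data :", accuracy(model, old_x, old_y))

train(model, new_x, new_y)  # retrain exclusively on the new classes
print("old-class accuracy after retraining on new data:", accuracy(model, old_x, old_y))
```

Replaying old-class data alongside the new data avoids this collapse, but it is exactly what makes naive retraining increasingly expensive as more classes arrive.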

To avoid catastrophic forgetting, and to leverage the features learned in previous tasks, this work proposes a network made of CNNs that grows hierarchically as new classes are introduced. The network adds new classes as new leaves of the hierarchical structure, and the branching is based on the similarity of features between the new and old classes. The initial nodes of the Tree-CNN assign the input to coarse super-classes, and finer classification is performed as we approach the leaves of the network. Such a model allows the previously learned convolutional layers to be reused in the new, larger network.

The remainder of the paper is organized as follows. Related work on incremental learning in deep neural networks is discussed in Section 2. In Section 3 we present our proposed network architecture and incremental learning method. In Section 4, the two experiments using the CIFAR-10 and CIFAR-100 datasets are described. Section 5 follows with a detailed analysis of the network's performance and a comparison with transfer learning and fine-tuning. Finally, Section 6 discusses the merits and limitations of our network, our findings, and possible opportunities for future work.


Related work

The modern world of digitized data produces new information every second (John Walker, 2014), thus fueling the need for systems that can learn as new data arrives. Traditional deep neural networks are static in that respect, and several new approaches to incremental learning are currently being explored. “One-shot learning” (Fei-Fei, Fergus, & Perona, 2006) is a Bayesian transfer learning technique that uses very few training samples to learn new classes. Fast R-CNN (Girshick, 2015), a popular

Network architecture

Inspired by hierarchical classifiers, our proposed model, Tree-CNN, is composed of multiple nodes connected in a tree-like manner. Each node (except the leaf nodes) has a DCNN that is trained to classify the input to the node into one of its children. The root node is the highest node of the tree, where the first classification happens. The image is then passed on to its child node, as per the classification label. This node further classifies the image, until we reach a leaf node, the last step
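A minimal sketch of this structure and the top-down inference pass is given below. The class and attribute names are illustrative assumptions rather than the authors' implementation: each internal node is assumed to hold a DCNN whose i-th output corresponds to its i-th child, and leaf nodes carry the final class labels.

```python
# Illustrative sketch of the Tree-CNN data structure and top-down inference.
# Names and fields are assumptions for exposition, not the authors' code.
from dataclasses import dataclass, field
from typing import List, Optional

import torch
import torch.nn as nn

@dataclass
class TreeNode:
    """A node of the Tree-CNN.

    Internal nodes hold a DCNN whose i-th output corresponds to their i-th
    child; leaf nodes hold a final class label and no network.
    """
    cnn: Optional[nn.Module] = None                       # None for leaf nodes
    children: List["TreeNode"] = field(default_factory=list)
    label: Optional[int] = None                           # set only for leaf nodes

def classify(node: TreeNode, image: torch.Tensor) -> int:
    """Route a single image (shape [1, C, H, W]) from the root to a leaf."""
    while node.children:                                  # stop at a leaf node
        with torch.no_grad():
            scores = node.cnn(image)                      # one score per child
        node = node.children[scores.argmax(dim=1).item()]
    return node.label
```

In this sketch, an image is evaluated by one node-level DCNN per level along its root-to-leaf path, so retraining can be confined to the branches that are affected when new classes arrive.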

Adding multiple new classes (CIFAR-10)

We initialized a Tree-CNN that can classify six classes (Fig. 3a). It had a root node and two branch nodes. Sample images from the 4 new classes produced the softmax likelihood outputs at the root node shown in Fig. 4(a). Accordingly, the new classes were added to the two branch nodes, and the resulting Tree-CNN is shown in Fig. 3b. In Table 6, we report the test accuracy and the training effort for the 5 cases of fine-tuning network B against our Tree-CNN for CIFAR-10. We observe that retraining only the
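The placement step can be sketched roughly as follows, reusing the TreeNode structure from the earlier listing: the root node's softmax output is averaged over the sample images of a new class, and the class is attached under the branch that responds most strongly. The threshold-based fallback that grows a fresh branch is an illustrative assumption; the paper's growth rules are more detailed than this sketch.

```python
# Rough sketch of placing a new class via the root node's averaged softmax
# response. The threshold value and the fallback of growing a new branch are
# illustrative assumptions; the paper's growth rules are more detailed.
import torch
import torch.nn.functional as F

def place_new_class(root, new_class_images, new_label, threshold=0.5):
    """Decide where a new class goes, given a batch of its sample images."""
    with torch.no_grad():
        probs = F.softmax(root.cnn(new_class_images), dim=1)   # (N, num_children)
    avg = probs.mean(dim=0)                                    # mean likelihood per branch
    best = int(avg.argmax())
    if avg[best] >= threshold:
        # The new class resembles this branch's super-class: add it as a new
        # leaf under that child (the child's DCNN must then be retrained).
        root.children[best].children.append(TreeNode(label=new_label))
    else:
        # No existing branch responds strongly: grow a new branch that holds
        # only the new class (the root's DCNN must then be retrained).
        root.children.append(TreeNode(children=[TreeNode(label=new_label)]))
    return best, avg
```

For the CIFAR-10 experiment, this corresponds to distributing the four new classes between the two existing branch nodes of Fig. 3a according to the root responses in Fig. 4(a).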

Discussion

The motivation for this work stems from the idea that the subsequent addition of new image classes to a network should be easier than retraining the whole network on all the classes. We observed that each incremental learning stage required more effort than the previous one, because images belonging to the old classes needed to be shown to the CNNs. This is due to the inherent problem of “catastrophic forgetting” in deep neural networks. Our proposed method offers the best trade-off between accuracy

Acknowledgments

This work was supported in part by the Center for Brain Inspired Computing (C-BRIC), one of the six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA; the National Science Foundation; Intel Corporation; the DoD Vannevar Bush Fellowship; and the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF-16-3-0001.

References (32)

  • Aljundi, R., et al. (2016). Expert gate: Lifelong learning with a network of experts. CoRR.
  • Fei-Fei, L., et al. (2006). One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Girshick, R. Fast R-CNN.
  • Goodfellow, I. J., Mirza, M., Xiao, D., Courville, A., & Bengio, Y. (2013). An empirical investigation of catastrophic...
  • Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. In International...
  • He, K., et al. Deep residual learning for image recognition.
  • Hertel, L., et al. Deep convolutional neural networks as generic feature extractors.
  • Ioffe, S., et al. Batch normalization: Accelerating deep network training by reducing internal covariate shift.
  • John Walker, S. (2014). Big data: A revolution that will transform how we live, work, and think.
  • Kontschieder, P., et al. Deep neural decision forests.
  • Krizhevsky, A., et al. (2009). Learning multiple layers of features from tiny images. Technical report.
  • Krizhevsky, A., et al. Imagenet classification with deep convolutional neural networks.
  • LeCun, Y., et al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE.
  • Li, Z., et al. (2017). Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • MATLAB (2017). Version 9.2.0 (R2017a). Natick, Massachusetts: The MathWorks...
  • Panda, P., et al. (2017). FALCON: Feature driven selective classification for energy-efficient image recognition. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.