Neural Networks

Volume 121, January 2020, Pages 148-160

Tree-CNN: A hierarchical Deep Convolutional Neural Network for incremental learning

https://doi.org/10.1016/j.neunet.2019.09.010

Abstract

Over the past decade, Deep Convolutional Neural Networks (DCNNs) have shown remarkable performance in most computer vision tasks. These tasks traditionally use a fixed dataset, and the model, once trained, is deployed as is. Adding new information to such a model presents a challenge due to complex training issues, such as “catastrophic forgetting”, and sensitivity to hyper-parameter tuning. In the real world, however, data is constantly evolving, and our deep learning models are required to adapt to these changes. In this paper, we propose an adaptive hierarchical network structure composed of DCNNs that can grow and learn as new data becomes available. The network grows in a tree-like fashion to accommodate new classes of data, while preserving the ability to distinguish the previously trained classes. The network organizes the incrementally available data into feature-driven super-classes and improves upon existing hierarchical CNN models by adding the capability of self-growth. Compared against fine-tuning a deep network, the proposed hierarchical model achieves a significant reduction in training effort, while maintaining competitive accuracy on CIFAR-10 and CIFAR-100.

Introduction

In recent years, Deep Convolutional Neural Networks (DCNNs) have emerged as the leading architecture for large-scale image classification (Rawat & Wang, 2017). In 2012, AlexNet (Krizhevsky, Sutskever, & Hinton, 2012), an 8-layer DCNN, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and catapulted DCNNs into the spotlight. Since then, they have dominated ILSVRC and have performed extremely well on popular image datasets such as MNIST (LeCun et al., 1998, Wan et al., 2013), CIFAR-10/100 (Krizhevsky & Hinton, 2009), and ImageNet (Russakovsky et al., 2015).

Today, with increased access to large amounts of labeled data (e.g. ImageNet (Russakovsky et al., 2015) contains 1.2 million images across 1000 categories), supervised learning has become the leading paradigm in training DCNNs for image recognition. Traditionally, a DCNN is trained on a dataset containing a large number of labeled images. The network learns to extract relevant features and classify these images. The trained model is then used to classify unlabeled real-world images. In such training, all the data is presented to the network during the same training process. In the real world, however, we rarely have all the information at once; data is instead gathered incrementally over time. This creates the need for models that can learn new information as it becomes available. In this work, we address the challenge of learning from such incrementally available data in the domain of image recognition using deep networks.

A DCNN embeds feature extraction and classification in a single coherent architecture. Modifying one part of the parameter space immediately affects the model globally (Xiao, Zhang, Yang, Peng, & Zhang, 2014). Another problem with incrementally training a DCNN is “catastrophic forgetting” (Goodfellow, Mirza, Xiao, Courville, & Bengio, 2013): when a trained DCNN is retrained exclusively on new data, the features learned from the earlier data are destroyed. This mandates using the previous data when retraining on new data.
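The effect is easy to reproduce on a toy problem. The sketch below (written in PyTorch with synthetic 2-D data, not the image datasets or the authors' implementation used in this paper) trains a small classifier on two “old” classes and then retrains it exclusively on two “new” classes; accuracy on the old classes typically collapses.

```python
# Toy illustration of catastrophic forgetting (not the paper's setup):
# a small classifier is trained on "old" classes, then retrained only on
# "new" classes; accuracy on the old classes collapses.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_blobs(centers, n=200):
    """Synthetic 2-D Gaussian clusters, one cluster per class."""
    xs, ys = [], []
    for label, c in enumerate(centers):
        xs.append(torch.randn(n, 2) * 0.3 + torch.tensor(c, dtype=torch.float))
        ys.append(torch.full((n,), label, dtype=torch.long))
    return torch.cat(xs), torch.cat(ys)

def train(model, x, y, epochs=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

# Four classes in total; the classifier has output units for all of them.
old_x, old_y = make_blobs([(-2.0, -2.0), (-2.0, 2.0)])   # "old" classes 0 and 1
new_x, new_y = make_blobs([(2.0, -2.0), (2.0, 2.0)])     # "new" classes 2 and 3
new_y = new_y + 2

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 4))

train(model, old_x, old_y)
print("old-class accuracy after training on old data :", accuracy(model, old_x, old_y))

train(model, new_x, new_y)  # retrain exclusively on the new classes
print("old-class accuracy after retraining on new data:", accuracy(model, old_x, old_y))
```

Replaying old-class data alongside the new data avoids this collapse, but it is exactly what makes naive retraining increasingly expensive as more classes arrive.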

To avoid catastrophic forgetting, and to leverage the features learned in previous tasks, this work proposes a network made of CNNs that grows hierarchically as new classes are introduced. The network adds new classes as new leaves of the hierarchical structure, and the branching is based on the similarity of features between the new and old classes. The initial nodes of the Tree-CNN assign the input to coarse super-classes, and finer classification is performed as we approach the leaves of the network. Such a model allows the previously learned convolutional layers to be reused in the new, larger network.

The remainder of the paper is organized as follows. Related work on incremental learning in deep neural networks is discussed in Section 2. In Section 3 we present our proposed network architecture and incremental learning method. In Section 4, the two experiments using the CIFAR-10 and CIFAR-100 datasets are described. Section 5 follows with a detailed analysis of the network's performance and a comparison with transfer learning and fine-tuning. Finally, Section 6 discusses the merits and limitations of our network, our findings, and possible opportunities for future work.


Related work

The modern world of digitized data produces new information every second (John Walker, 2014), thus fueling the need for systems that can learn as new data arrives. Traditional deep neural networks are static in that respect, and several new approaches to incremental learning are currently being explored. “One-shot learning” (Fei-Fei, Fergus, & Perona, 2006) is a Bayesian transfer learning technique that uses very few training samples to learn new classes. Fast R-CNN (Girshick, 2015), a popular

Network architecture

Inspired by hierarchical classifiers, our proposed model, Tree-CNN, is composed of multiple nodes connected in a tree-like manner. Each node (except the leaf nodes) has a DCNN that is trained to classify the input to the node into one of its children. The root node is the highest node of the tree, where the first classification happens. The image is then passed on to its child node, as per the classification label. This node further classifies the image, until we reach a leaf node, the last step
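A minimal sketch of this structure and the top-down inference pass is given below. The class and attribute names are illustrative assumptions rather than the authors' implementation: each internal node is assumed to hold a DCNN whose i-th output corresponds to its i-th child, and leaf nodes carry the final class labels.

```python
# Illustrative sketch of the Tree-CNN data structure and top-down inference.
# Names and fields are assumptions for exposition, not the authors' code.
from dataclasses import dataclass, field
from typing import List, Optional

import torch
import torch.nn as nn

@dataclass
class TreeNode:
    """A node of the Tree-CNN.

    Internal nodes hold a DCNN whose i-th output corresponds to their i-th
    child; leaf nodes hold a final class label and no network.
    """
    cnn: Optional[nn.Module] = None                       # None for leaf nodes
    children: List["TreeNode"] = field(default_factory=list)
    label: Optional[int] = None                           # set only for leaf nodes

def classify(node: TreeNode, image: torch.Tensor) -> int:
    """Route a single image (shape [1, C, H, W]) from the root to a leaf."""
    while node.children:                                  # stop at a leaf node
        with torch.no_grad():
            scores = node.cnn(image)                      # one score per child
        node = node.children[scores.argmax(dim=1).item()]
    return node.label
```

In this sketch, an image is evaluated by one node-level DCNN per level along its root-to-leaf path, so retraining can be confined to the branches that are affected when new classes arrive.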

Adding multiple new classes (CIFAR-10)

We initialized a Tree-CNN that can classify six classes (Fig. 3a). It had a root node and two branch nodes. Sample images from the 4 new classes produced the softmax likelihood outputs at the root node shown in Fig. 4(a). Accordingly, the new classes were added to the two branch nodes, and the resulting Tree-CNN is shown in Fig. 3b. In Table 6, we report the test accuracy and the training effort for the 5 cases of fine-tuning network B against our Tree-CNN for CIFAR-10. We observe that retraining only the
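The placement step can be sketched roughly as follows, reusing the TreeNode structure from the earlier listing: the root node's softmax output is averaged over the sample images of a new class, and the class is attached under the branch that responds most strongly. The threshold-based fallback that grows a fresh branch is an illustrative assumption; the paper's growth rules are more detailed than this sketch.

```python
# Rough sketch of placing a new class via the root node's averaged softmax
# response. The threshold value and the fallback of growing a new branch are
# illustrative assumptions; the paper's growth rules are more detailed.
import torch
import torch.nn.functional as F

def place_new_class(root, new_class_images, new_label, threshold=0.5):
    """Decide where a new class goes, given a batch of its sample images."""
    with torch.no_grad():
        probs = F.softmax(root.cnn(new_class_images), dim=1)   # (N, num_children)
    avg = probs.mean(dim=0)                                    # mean likelihood per branch
    best = int(avg.argmax())
    if avg[best] >= threshold:
        # The new class resembles this branch's super-class: add it as a new
        # leaf under that child (the child's DCNN must then be retrained).
        root.children[best].children.append(TreeNode(label=new_label))
    else:
        # No existing branch responds strongly: grow a new branch that holds
        # only the new class (the root's DCNN must then be retrained).
        root.children.append(TreeNode(children=[TreeNode(label=new_label)]))
    return best, avg
```

For the CIFAR-10 experiment, this corresponds to distributing the four new classes between the two existing branch nodes of Fig. 3a according to the root responses in Fig. 4(a).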

Discussion

The motivation for this work stems from the idea that the subsequent addition of new image classes to a network should be easier than retraining the whole network on all the classes. We observed that each incremental learning stage required more effort than the previous one, because images belonging to the old classes needed to be shown to the CNNs. This is due to the inherent problem of “catastrophic forgetting” in deep neural networks. Our proposed method offers the best trade-off between accuracy

Acknowledgments

This work was supported in part by the Center for Brain Inspired Computing (C-BRIC), one of the six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA; the National Science Foundation; Intel Corporation; the DoD Vannevar Bush Fellowship; and the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF-16-3-0001.

References (32)

  • Aljundi, R., et al. (2016). Expert gate: Lifelong learning with a network of experts. CoRR.
  • Fei-Fei, L., et al. (2006). One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Girshick, R. Fast R-CNN.
  • Goodfellow, I. J., Mirza, M., Xiao, D., Courville, A., & Bengio, Y. (2013). An empirical investigation of catastrophic...
  • Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. In International...
  • He, K., et al. Deep residual learning for image recognition.
  • Hertel, L., et al. Deep convolutional neural networks as generic feature extractors.
  • Ioffe, S., et al. Batch normalization: Accelerating deep network training by reducing internal covariate shift.
  • John Walker, S. (2014). Big data: A revolution that will transform how we live, work, and think.
  • Kontschieder, P., et al. Deep neural decision forests.
  • Krizhevsky, A., et al. (2009). Learning multiple layers of features from tiny images. Technical report.
  • Krizhevsky, A., et al. Imagenet classification with deep convolutional neural networks.
  • LeCun, Y., et al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE.
  • Li, Z., et al. (2017). Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • MATLAB (2017). Version 9.2.0 (R2017a). Natick, Massachusetts: The MathWorks...
  • Panda, P., et al. (2017). FALCON: Feature driven selective classification for energy-efficient image recognition. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.