Abstract
Curriculum learning can improve neural network training by guiding the optimization to desirable optima. We propose a novel curriculum learning approach for image classification that adapts the loss function by changing the label representation.
The idea is to use a probability distribution over classes as the target label, where the class probabilities reflect the similarity to the true class. Gradually, this label representation is shifted towards the standard one-hot encoding. That is, at the beginning of training, minor mistakes are penalized less than large mistakes, resembling a teaching process in which broad concepts are explained first before subtle differences are taught.
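As a rough illustration, the sketch below (assuming PyTorch and a precomputed row-stochastic soft-label matrix) blends the similarity-based targets with one-hot labels; the linear interpolation weight `alpha` and the helper names are illustrative assumptions, not the paper's exact schedule.

```python
import torch
import torch.nn.functional as F

def annealed_targets(soft_label_matrix, labels, alpha):
    """Blend similarity-based soft labels with one-hot labels.

    soft_label_matrix: (C, C) row-stochastic matrix; row i is the target
                       distribution used for true class i early in training.
    labels:            (B,) integer ground-truth class indices.
    alpha:             curriculum progress in [0, 1]; 0 = pure soft labels,
                       1 = standard one-hot encoding.
    """
    soft = soft_label_matrix[labels]                                # (B, C)
    one_hot = F.one_hot(labels, num_classes=soft.shape[1]).float()  # (B, C)
    return (1.0 - alpha) * soft + alpha * one_hot

def curriculum_loss(logits, labels, soft_label_matrix, alpha):
    """Cross-entropy of the model's prediction against the annealed targets."""
    targets = annealed_targets(soft_label_matrix, labels, alpha)
    return -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```

Increasing `alpha` from 0 to 1 over the course of training recovers the standard cross-entropy loss with one-hot labels at the end.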
The class similarities can be based on prior knowledge. For the special case of labels that are natural-language words, we propose a generic way to compute the similarities automatically: the words are embedded into Euclidean space using a standard word embedding, and the probability of each class is then a function of the cosine similarity between the vector representations of that class and the true label.
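A minimal sketch of how such a soft-label matrix could be derived from class-name embeddings; the softmax mapping and the temperature `tau` are illustrative assumptions, since the abstract only states that the probability is a function of the cosine similarity.

```python
import torch
import torch.nn.functional as F

def soft_labels_from_embeddings(class_vectors, tau=1.0):
    """Build a (C, C) soft-label matrix from class-name word embeddings.

    class_vectors: (C, d) tensor with one pretrained word-embedding vector
                   per class name (e.g. from a word2vec model).
    tau:           softmax temperature controlling how peaked each row is.

    Row i is a probability distribution over the classes whose weights
    increase with the cosine similarity to class i.
    """
    v = F.normalize(class_vectors, dim=1)   # unit-length rows
    cosine = v @ v.t()                      # (C, C) cosine similarities
    return F.softmax(cosine / tau, dim=1)
```

The resulting matrix can then be plugged into the `curriculum_loss` sketch above.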
The proposed label-similarity curriculum learning (LCL) approach was empirically evaluated using several popular deep learning architectures for image classification on five datasets, including ImageNet, CIFAR100, and AWA2. In all scenarios, LCL improved classification accuracy on the test data compared to standard training. Code to reproduce the results is available at https://github.com/speedystream/LCL.
Notes
1. The code to reproduce our results is available in the supplementary material.
2. Horovod trains with large batches distributed over multiple GPU nodes; a small accuracy loss for the baseline method is expected and well established. For more details, see Table 1 and Table 2.c in [24].
3. We tried \(\epsilon \in \{0.8, 0.9, 0.91, \ldots, 0.98, 0.99, 0.992, \ldots, 0.998, 0.999\}\) for ResNet-50 and ResNet-101. The results showed that the search space for \(\epsilon\) can be less granular, and we limited it accordingly.
References
Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: International Conference on Machine Learning (ICML), pp. 41–48. ACM (2009)
Buciluǎ, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 535–541. ACM (2006)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255. IEEE (2009)
Florensa, C., Held, D., Wulfmeier, M., Zhang, M., Abbeel, P.: Reverse curriculum generation for reinforcement learning. In: Conference on Robot Learning (CoRL), pp. 482–495 (2017)
Frome, A., et al.: DeViSE: a deep visual-semantic embedding model. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 2121–2129 (2013)
Furlanello, T., Lipton, Z.C., Tschannen, M., Itti, L., Anandkumar, A.: Born again neural networks (2018). arXiv:1805.04770 [stat.ML]
García, S., Herrera, F.: An extension on statistical “comparisons of classifiers over multiple data sets” for all pairwise comparisons. J. Mach. Learn. Res. 9, 2677–2694 (2008)
Goyal, P., et al.: Accurate, large minibatch SGD: Training ImageNet in 1 hour (2017). arXiv:1706.02677v2 [cs.CV]
Graves, A., Bellemare, M.G., Menick, J., Munos, R., Kavukcuoglu, K.: Automated curriculum learning for neural networks. In: International Conference on Machine Learning (ICML), pp. 1311–1320. PMLR (2017)
Guo, S., et al.: CurriculumNet: weakly supervised learning from large-scale web images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 139–154. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_9
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. IEEE (2016)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. In: Deep Learning and Representation Learning Workshop: NIPS 2014 (2014)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141. IEEE (2018)
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708. IEEE (2017)
Jeh, G., Widom, J.: SimRank: a measure of structural-context similarity. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543 (2002)
Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)
Kumar, M.P., Packer, B., Koller, D.: Self-paced learning for latent variable models. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 1189–1197 (2010)
Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: International Conference on Machine Learning (ICML), pp. 1188–1196 (2014)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space (2013). arXiv:1301.3781v3 [cs.CL]
Müller, R., Kornblith, S., Hinton, G.: When does label smoothing help? (2019). arXiv:1906.02629 [cs.LG]
Orbes-Arteaga, M., et al.: Knowledge distillation for semi-supervised domain adaptation. In: Zhou, L., et al. (eds.) OR 2.0/MLCN -2019. LNCS, vol. 11796, pp. 68–76. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32695-1_8
Papernot, N., McDaniel, P., Wu, X., Jha, S., Swami, A.: Distillation as a defense to adversarial perturbations against deep neural networks. In: 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. IEEE (2016)
Paszke, A., et al.: Automatic differentiation in PyTorch. In: NeurIPS 2017 Workshop Autodiff (2017)
Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: FitNets: hints for thin deep nets (2014). arXiv:1412.6550 [cs.LG]
Sergeev, A., Del Balso, M.: Horovod: fast and easy distributed deep learning in TensorFlow (2018). arXiv:1802.05799v3 [cs.LG]
Silla, C.N., Freitas, A.A.: A survey of hierarchical classification across different application domains. Data Min. Knowl. Disc. 22(1), 31–72 (2011). https://doi.org/10.1007/s10618-010-0175-9
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Thirty-First AAAI Conference on Artificial Intelligence (AAAI), pp. 4278–4284 (2017)
Van Horn, G., et al.: Building a bird recognition app and large scale dataset with citizen scientists: the fine print in fine-grained dataset collection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset. Technical report. CNS-TR-2011-001, California Institute of Technology (2011)
Weinshall, D., Cohen, G., Amir, D.: Curriculum learning by transfer learning: theory and experiments with deep networks. In: International Conference on Machine Learning (ICML), pp. 5235–5243. PMLR (2018)
Wu, C., Tygert, M., LeCun, Y.: Hierarchical loss for classification (2017). arXiv:1709.01062v1 [cs.LG]
Xian, Y., Lampert, C.H., Schiele, B., Akata, Z.: Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans. Pattern Anal. Mach. Intell. 41, 2251–2265 (2018)
Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1492–1500. IEEE (2017)
Yim, J., Joo, D., Bae, J., Kim, J.: A gift from knowledge distillation: fast optimization, network minimization and transfer learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4133–4141 (2017)
Zagoruyko, S., Komodakis, N.: Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer (2016). arXiv:1612.03928 [cs.CV]
Zagoruyko, S., Komodakis, N.: Wide residual networks. In: British Machine Vision Conference (BMVC). BMVA Press (2016)
Zhang, Y., Xiang, T., Hospedales, T.M., Lu, H.: Deep mutual learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4320–4328 (2018)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Dogan, Ü., Deshmukh, A.A., Machura, M.B., Igel, C. (2020). Label-Similarity Curriculum Learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12374. Springer, Cham. https://doi.org/10.1007/978-3-030-58526-6_11
DOI: https://doi.org/10.1007/978-3-030-58526-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58525-9
Online ISBN: 978-3-030-58526-6