DOI: 10.1145/3394486.3403165

Retrospective Loss: Looking Back to Improve Training of Deep Neural Networks

Published: 20 August 2020

ABSTRACT

Deep neural networks (DNNs) are powerful learning machines that have enabled breakthroughs in several domains. In this work, we introduce a new retrospective loss to improve the training of deep neural network models by utilizing the prior experience available in past model states during training. Minimizing the retrospective loss, along with the task-specific loss, pushes the parameter state at the current training step towards the optimal parameter state while pulling it away from the parameter state at a previous training step. Although the idea is simple, we both analyze the method and conduct comprehensive experiments across domains - images, speech, text, and graphs - showing that the proposed loss improves performance across input domains, tasks, and architectures.
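The training recipe the abstract describes lends itself to a short sketch: a two-term objective whose first term pulls the current predictions toward the ground truth and whose second term pushes them away from the predictions of a frozen copy of the model from an earlier training step. The PyTorch code below is a minimal sketch of that idea only; the L1 distance, the (kappa + 1)/kappa weighting, and the warm-up and refresh schedule (retrospective_loss, past_model, warmup_steps, update_period) are illustrative assumptions, not the paper's exact formulation.

```python
import copy
import torch
import torch.nn.functional as F

def retrospective_loss(current_out, past_out, target, kappa=2.0):
    # Pull current predictions toward the target (a stand-in for the optimal state) ...
    pull = F.l1_loss(current_out, target)
    # ... while pushing them away from the predictions of a frozen past model state.
    push = F.l1_loss(current_out, past_out)
    return (kappa + 1.0) * pull - kappa * push

# Toy regression setup; the names and schedule below are illustrative assumptions.
model = torch.nn.Linear(16, 4)
past_model = copy.deepcopy(model)          # frozen copy of an earlier parameter state
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = [(torch.randn(8, 16), torch.randn(8, 4)) for _ in range(200)]
warmup_steps, update_period = 50, 20

for step, (x, y) in enumerate(loader):
    out = model(x)
    loss = F.l1_loss(out, y)               # task-specific loss
    if step >= warmup_steps:               # add the retrospective term after a warm-up
        with torch.no_grad():
            past_out = past_model(x)
        loss = loss + retrospective_loss(out, past_out, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (step + 1) % update_period == 0:    # periodically refresh the past state
        past_model.load_state_dict(model.state_dict())
```

Note that the past state must lag the current one: refreshing past_model at every step would make the push term vanish, so the sketch updates it only every update_period steps.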


Published in

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020, 3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486

Copyright © 2020 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


        Qualifiers

        • research-article

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions, 13%
