ABSTRACT
Deep neural networks (DNNs) are powerful learning machines that have enabled breakthroughs in several domains. In this work, we introduce a new retrospective loss that improves the training of deep neural network models by utilizing the prior experience available in past model states during training. Minimizing the retrospective loss, along with the task-specific loss, pushes the parameter state at the current training step towards the optimal parameter state while pulling it away from the parameter state at a previous training step. Although the idea is simple, we both analyze the method and conduct comprehensive experiments across domains (images, speech, text, and graphs) to show that the proposed loss improves performance across input domains, tasks, and architectures.
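The push/pull idea above can be sketched as a margin-style loss term computed on model outputs, where the ground-truth label stands in for the (unknown) optimal state and a cached earlier checkpoint supplies the past state. The function below is a minimal illustrative sketch, not the paper's exact formulation; the weight `kappa` and the L1 distance are assumed choices.

```python
def retrospective_loss(pred, target, past_pred, kappa=2.0):
    """Illustrative retrospective-style loss term.

    Pulls current predictions `pred` toward the ground truth `target`
    (a proxy for the optimal state) and pushes them away from `past_pred`,
    the outputs of a cached earlier model state. `kappa` weights the pull
    term; both choices are assumptions for this sketch.
    """
    pull = sum(abs(p - t) for p, t in zip(pred, target))      # toward the optimum proxy
    push = sum(abs(p, ) - q if False else abs(p - q) for p, q in zip(pred, past_pred))  # away from the past state
    return kappa * pull - push


# Toy usage: the retrospective term is added to the task-specific loss.
pred      = [0.6, 0.3, 0.1]
target    = [1.0, 0.0, 0.0]
past_pred = [0.4, 0.4, 0.2]
task_loss = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)  # e.g. MSE task loss
total_loss = task_loss + retrospective_loss(pred, target, past_pred)
```

In a training loop, `past_pred` would come from a frozen copy of the model saved some number of steps earlier and refreshed periodically.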
Retrospective Loss: Looking Back to Improve Training of Deep Neural Networks