Reweighted stochastic learning
Introduction
In many domains dealing with online and stochastic learning, the input instances are of very high dimension, yet within any particular instance only a few features are non-zero. Therefore stochastic and online approaches crafted with sparsity-inducing regularization are of particular interest to many machine learning researchers and practitioners. This paper investigates the interplay between Regularized Dual Averaging (RDA) approaches [1] (along with other techniques for solving linear SVMs in the context of stochastic learning [2]) and parsimony concepts arising from the application of sparsity-inducing norms, such as l0-type penalties.
One can see the increasing importance of correctly identified sparsity patterns and a proliferation of proximal and soft-thresholding subgradient-based methods [1], [3], [4]. The parsimony concept has made many important contributions to the machine learning field. One may allude to the interpretability of the obtained solution and simplified or easily extractable decision rules [5], [6], [7]. On the other hand, the informativeness of the obtained features might be useful for better generalization on unseen data [5]. Approaches based on l1-regularized loss minimization were studied in the context of stochastic and online learning by several research groups [1], [3], [8], [9], but we are not aware of any l0-norm-inducing methods applied in the context of Regularized Dual Averaging and stochastic optimization.
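As a brief illustration of the soft-thresholding operation underlying the proximal methods cited above (a generic sketch, not this paper's algorithm; the function and variable names are ours):

```python
import numpy as np

def soft_threshold(w, tau):
    # Proximal operator of tau * ||w||_1: every coordinate is shrunk
    # toward zero by tau, and coordinates with |w_i| <= tau become
    # exactly zero -- this is how l1 regularization induces sparsity.
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

w = np.array([0.9, -0.2, 0.05, -1.5])
w_sparse = soft_threshold(w, 0.3)  # the two small coordinates are zeroed out
```

Applied after each (sub)gradient step, this operator yields exact zeros rather than merely small values, which is what makes the recovered sparsity pattern meaningful.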
In this paper we provide a supplementary analysis and sufficient regret bounds for learning sparser linear Regularized Dual Averaging (RDA) [1] models from random observations. We extend and modify our previous research [10], [11] and present complementary proofs, under fewer assumptions, together with a discussion of the reported theoretical findings. We use sequences of (strongly) convex reweighted optimization objectives to accomplish this goal.
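The core idea behind such reweighted objectives can be sketched as follows (an iteratively reweighted l1 rule in the spirit of the schemes cited in Section 2; the exact per-round rule and constants used in this paper are defined in Section 3, so the smoothing term `eps` and the names here are illustrative):

```python
import numpy as np

def reweight(w, eps=1e-3):
    # Per-coordinate penalty weights derived from the current iterate:
    # coordinates with small magnitude receive a large weight and are
    # pushed to exactly zero on the next round, so the weighted l1 term
    # sum_i c_i * |w_i| mimics the l0 pseudonorm while each subproblem
    # remains convex.
    return 1.0 / (np.abs(w) + eps)

w = np.array([1.2, 0.01, -0.8])
c = reweight(w)  # the near-zero coordinate gets by far the largest weight
```

Each round thus solves a convex weighted-l1 problem, and the sequence of such problems approximates the non-convex l0-penalized objective.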
This paper is structured as follows. Section 2 describes previous work on l0-norm induced learning and some existing solutions to stochastic optimization with regularized loss. Section 3.1 presents a problem statement for the reweighted algorithms. Sections 3.2 and 3.5 introduce our reweighted l1-RDA and l2-RDA methods respectively, while Section 3.8 presents a completely novel approach based on a probabilistic reweighted Pegasos-like linear SVM solver. Sections 3.4 and 3.7 provide the theoretical background for our reweighted RDA approaches. Section 4 presents our numerical results and Section 5 concludes the paper.
Section snippets
Related work
Learning with pseudonorm regularization is an NP-hard problem [12] and can be approached via reweighting schemes [13], [14], [15], [16], which, however, lack a proper theoretical analysis of convergence in the online and stochastic learning settings. Some methods, like [17], consider an embedded approach where one has to solve a sequence of QP problems, which can be very expensive both computationally and memory-wise while still lacking proper convergence criteria.
In many existing iterative
Problem statement
In the stochastic Regularized Dual Averaging approach developed by Xiao [1] one approximates the loss function f(w) by using a finite set of independent observations {ξ_i}_{i=1}^T. Under this setting one minimizes the following optimization objective: min_w (1/T) Σ_{i=1}^T f(w, ξ_i) + Ψ(w), where Ψ(w) represents a regularization term. Every observation ξ_i is given as a pair of input–output variables ξ_i = (x_i, y_i). In the above setting one deals with a simple classification model and calculates the
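To make the setting concrete, the closed-form coordinate update of the simple l1-RDA method from [1] can be sketched as follows (assuming Ψ(w) = λ‖w‖₁ and a step-size parameter γ > 0; the variable names are ours, and the reweighted variants developed below modify this baseline step):

```python
import numpy as np

def l1_rda_step(g_bar, t, lam, gamma):
    # g_bar is the running average of subgradients after t rounds.
    # Coordinates whose averaged subgradient magnitude stays below the
    # truncation threshold lam are set exactly to zero; the remaining
    # ones are shrunk by lam and scaled by -sqrt(t)/gamma.
    shrunk = np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
    return -(np.sqrt(t) / gamma) * shrunk
```

Because the truncation acts on the *averaged* subgradient rather than on the iterate itself, l1-RDA tends to produce sparser solutions than comparable proximal subgradient schemes.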
Experimental setup
For all methods in our experiments with UCI datasets [24], hyperparameter tuning (e.g. estimating the ubiquitous λ hyperparameter or the tuples of hyperparameters employed in Algorithm 1, Algorithm 2) is performed with Coupled Simulated Annealing [27] initialized with 5 random sets of parameters. These random sets consist of tuples of hyperparameters linked to one particular setup of an algorithm. At every CSA iteration step we proceed with a 10-fold cross-validation. Within the cross-validation we are
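The scoring step inside this tuning procedure can be sketched as follows (a simplified stand-in: `train_and_eval` is a hypothetical callback standing for training one algorithm setup and returning its validation score; the actual candidate search is driven by Coupled Simulated Annealing [27], which is not shown here):

```python
import numpy as np

def cv_score(train_and_eval, X, y, params, n_folds=10, seed=0):
    # Average validation score of one hyperparameter tuple over a
    # 10-fold split, as used to rank candidate tuples during tuning.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for fold in folds:
        val = np.zeros(len(y), dtype=bool)
        val[fold] = True  # hold this fold out for validation
        scores.append(train_and_eval(X[~val], y[~val], X[val], y[val], params))
    return float(np.mean(scores))
```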
Conclusion
In this paper we studied reweighted stochastic learning in the context of dual averaging schemes and solvers for linear SVMs. We presented two different directions for applying reweighting at each round t. The first approach efficiently approximates the l0-type penalty using a reliable and proven dual averaging scheme [22]. We applied the reweighting procedure to different norms and elaborated two versions of the Regularized Dual Averaging method [1], namely Reweighted l1- and l
Acknowledgments
EU: The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC AdG A-DATADRIVE-B (290923). This paper reflects only the authors' views; the Union is not liable for any use that may be made of the contained information. Research Council KUL: GOA/10/09 MaNet, CoE PFV/10/002 (OPTEC), BIL12/11T; PhD/Postdoc grants Flemish Government: FWO: projects: G.0377.12 (Structured systems), G.088114N
Vilen Jumutc received his B.Sc. and M.Sc. degrees in Computer Science from the Riga Technical University in 2007 and 2009 respectively. He is currently a Ph.D. researcher in the Department of Electrical Engineering (ESAT) of the Katholieke Universiteit Leuven. His interests include large-scale stochastic and online learning problems, kernel methods, semi-supervised learning and convex optimization.
References (28)
- et al., Rule extraction from support vector machines: a review, Neurocomputing (2010)
- et al., The null space property for sparse recovery from multiple measurement vectors, Appl. Comput. Harmon. Anal. (2011)
- Dual averaging methods for regularized stochastic learning and online optimization, J. Mach. Learn. Res. (2010)
- S. Shalev-Shwartz, Y. Singer, N. Srebro, Pegasos: primal estimated sub-gradient solver for SVM, in: Proceedings of the...
- S. Shalev-Shwartz, A. Tewari, Stochastic methods for l1 regularized loss minimization, in: Proceedings of the 26th...
- et al., Efficient online and batch learning using forward backward splitting, J. Mach. Learn. Res. (2009)
- H. Núñez, C. Angulo, A. Català, Rule extraction from support vector machines, in: Proceedings of European Symposium on...
- C.J.C. Burges, Simplified support vector decision rules, in: L. Saitta (Ed.), Proceedings of the 13th International...
- X. Chen, Q. Lin, J. Peña, Optimal regularized dual averaging methods for stochastic optimization, in: P.L. Bartlett,...
- et al., Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res. (2011)
Cited by (1)
Mini-batch algorithms with Barzilai–Borwein update step
2018, Neurocomputing
Johan A.K. Suykens was born in Willebroek Belgium, May 18, 1966. He received the master degree in Electro-Mechanical Engineering and the Ph.D. degree in Applied Sciences from the Katholieke Universiteit Leuven, in 1989 and 1995 respectively. In 1996 he has been a Visiting Postdoctoral Researcher at the University of California, Berkeley. He has been a Postdoctoral Researcher with the Fund for Scientific Research FWO Flanders and is currently a Professor (Hoogleraar) with KU Leuven. He is Author of the books “Artificial Neural Networks for Modelling and Control of Non-linear Systems” (Kluwer Academic Publishers) and “Least Squares Support Vector Machines” (World Scientific), co-author of the book “Cellular Neural Networks, Multi-Scroll Chaos and Synchronization” (World Scientific) and Editor of the books “Nonlinear Modeling: Advanced Black-Box Techniques” (Kluwer Academic Publishers), “Advances in Learning Theory: Methods, Models and Applications” (IOS Press) and “Regularization, Optimization, Kernels, and Support Vector Machines” (Chapman & Hall/CRC). In 1998 he organized an International Workshop on Nonlinear Modelling with Time-Series Prediction Competition. He has served as an Associate Editor for the IEEE Transactions on Circuits and Systems (1997–1999 and 2004–2007) and for the IEEE Transactions on Neural Networks (1998–2009). He received an IEEE Signal Processing Society 1999 Best Paper Award and several Best Paper Awards at International Conferences. He is a recipient of the International Neural Networks Society INNS 2000 Young Investigator Award for significant contributions in the field of neural networks. 
He has served as a Director and Organizer of the NATO Advanced Study Institute on Learning Theory and Practice (Leuven 2002), as a Program Co-Chair for the International Joint Conference on Neural Networks 2004 and the International Symposium on Nonlinear Theory and its Applications 2005, as an Organizer of the International Symposium on Synchronization in Complex Networks 2007, a Co-Organizer of the NIPS 2010 Workshop on Tensors, Kernels and Machine Learning, and Chair of ROKS 2013. He has been awarded an ERC Advanced Grant 2011 and has been elevated IEEE Fellow 2015 for developing least squares support vector machines.