Abstract
We describe a novel framework for the design and analysis of online learning algorithms based on the notion of duality in constrained optimization. We cast a sub-family of universal online bounds as an optimization problem. Using the weak duality theorem, we reduce the process of online learning to the task of incrementally increasing the dual objective function. The amount by which the dual increases serves as a new and natural notion of progress for analyzing online learning algorithms. We are thus able to tie the primal objective value to the number of prediction mistakes through the increase in the dual.
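As an illustrative sketch of the dual-ascent view (not the paper's own code or notation), the classical Perceptron can be read as coordinate ascent on an SVM-style dual objective, D(α) = Σᵢ αᵢ − ½‖Σᵢ αᵢ yᵢ xᵢ‖². With examples normalized so that ‖xᵢ‖ ≤ 1, each prediction mistake raises this dual by at least ½, which is the kind of per-round dual progress the abstract refers to. The `dual_objective` form and the toy data below are assumptions chosen for illustration:

```python
import numpy as np

def dual_objective(alpha, X, y):
    """SVM-style dual: sum(alpha) - 0.5 * ||sum_i alpha_i y_i x_i||^2."""
    w = (alpha * y) @ X          # primal weight vector induced by the dual variables
    return alpha.sum() - 0.5 * (w @ w)

# Toy separable data; rows are normalized so that ||x_i|| <= 1.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

alpha = np.zeros(len(y))
duals = [dual_objective(alpha, X, y)]
for t, (x_t, y_t) in enumerate(zip(X, y)):
    w = (alpha * y) @ X          # current primal hypothesis
    if y_t * (w @ x_t) <= 0:     # prediction mistake on round t
        alpha[t] += 1.0          # Perceptron update = one dual coordinate step
        duals.append(dual_objective(alpha, X, y))
```

On this toy run the recorded dual values increase by exactly ½ per mistake; since weak duality caps D(α) by the optimal primal value, the number of mistakes is bounded by twice that primal optimum, mirroring the progress argument sketched in the abstract.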
Additional information
Editors: Hans Ulrich Simon, Gabor Lugosi, Avrim Blum.
A preliminary version of this paper appeared at the 19th Annual Conference on Learning Theory under the title “Online learning meets optimization in the dual”.
Cite this article
Shalev-Shwartz, S., Singer, Y. A primal-dual perspective of online learning algorithms. Mach Learn 69, 115–142 (2007). https://doi.org/10.1007/s10994-007-5014-x