Exponentiated gradient algorithms for log-linear structured prediction

ABSTRACT
Conditional log-linear models are a commonly used method for structured prediction, so efficient learning of their parameters is an important problem. This paper describes an exponentiated gradient (EG) algorithm for training such models. EG is applied to the convex dual of the maximum-likelihood objective, which yields both sequential and parallel update algorithms; in the sequential algorithm, parameters are updated in an online fashion. We provide a convergence proof for both algorithms. Our analysis also simplifies previous results on EG for max-margin models and leads to a tighter bound on convergence rates. Experiments on a large-scale parsing task show that the proposed algorithm converges much faster than conjugate-gradient and L-BFGS approaches, in terms of both the optimization objective and test error.
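To make the core idea concrete, below is a minimal sketch of a generic EG step of the kind the abstract describes: a multiplicative update on a block of dual variables followed by renormalization. It assumes the dual variables for one training example form a distribution over that example's candidate outputs and that the gradient of the dual objective with respect to that block is available; the function and variable names (`eg_update`, `alpha`, `grad`, `eta`) are illustrative, not the paper's notation, and this is not the paper's exact derivation.

```python
import numpy as np

def eg_update(alpha, grad, eta):
    """One generic EG step on a single example's dual variables.

    alpha : (k,) nonnegative array summing to 1, a distribution over
            the example's k candidate outputs (hypothetical setup)
    grad  : (k,) gradient of the dual objective w.r.t. alpha
    eta   : learning rate
    """
    # Multiplicative update alpha_y * exp(-eta * grad_y), done in the
    # log domain for numerical stability.
    log_alpha = np.log(alpha) - eta * grad
    log_alpha -= log_alpha.max()  # shift to avoid overflow in exp
    new_alpha = np.exp(log_alpha)
    # Renormalize so the updated block stays on the probability simplex.
    return new_alpha / new_alpha.sum()
```

Under this reading, the sequential (online) variant applies such a step to one example's block of dual variables at a time while holding the others fixed, whereas the parallel variant applies the update to all blocks simultaneously.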