DOI: 10.1145/1273496.1273535

Exponentiated gradient algorithms for log-linear structured prediction

Authors: Amir Globerson, Terry Koo, Xavier Carreras, Michael Collins

Published: 20 June 2007

ABSTRACT

Conditional log-linear models are a commonly used method for structured prediction. Efficient learning of parameters in these models is therefore an important problem. This paper describes an exponentiated gradient (EG) algorithm for training such models. EG is applied to the convex dual of the maximum likelihood objective; this results in both sequential and parallel update algorithms, where in the sequential algorithm parameters are updated in an online fashion. We provide a convergence proof for both algorithms. Our analysis also simplifies previous results on EG for max-margin models, and leads to a tighter bound on convergence rates. Experiments on a large-scale parsing task show that the proposed algorithm converges much faster than conjugate-gradient and L-BFGS approaches both in terms of optimization objective and test error.
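To make the dual EG update concrete, the following is a minimal NumPy sketch of the sequential variant, written for a flat multiclass log-linear model rather than the structured case (where the explicit sums over labels would be replaced by dynamic programming over parts). The function name eg_dual_train, the parameter names, and the particular form of the dual objective Q(u) = sum_{i,y} u_{i,y} log u_{i,y} + (lam/2)||w(u)||^2 with w(u) = (1/lam) sum_{i,y} u_{i,y} (phi(x_i, y_i) - phi(x_i, y)) are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def eg_dual_train(phi, gold, lam=1.0, eta=0.5, epochs=50, seed=0):
    """Sequential EG on the dual of an L2-regularized multiclass
    log-linear model (a hypothetical simplification of the structured case).

    phi  : array (n, K, d), feature vector phi(x_i, y) for each label y
    gold : array (n,), index of the correct label y_i for each example
    """
    n, K, d = phi.shape
    rng = np.random.default_rng(seed)
    u = np.full((n, K), 1.0 / K)                 # dual distributions, one per example
    # delta[i, y] = phi(x_i, y_i) - phi(x_i, y)
    delta = phi[np.arange(n), gold][:, None, :] - phi
    # primal weights recovered from the duals: w(u) = (1/lam) sum_{i,y} u_{i,y} delta_{i,y}
    w = np.einsum('ik,ikd->d', u, delta) / lam
    for _ in range(epochs):
        for i in rng.permutation(n):             # sequential (online) passes
            grad = 1.0 + np.log(u[i]) + delta[i] @ w   # dQ/du_{i,y}
            new_ui = u[i] * np.exp(-eta * grad)  # multiplicative EG step
            new_ui /= new_ui.sum()               # renormalize onto the simplex
            w += (new_ui - u[i]) @ delta[i] / lam  # incremental update of w(u)
            u[i] = new_ui
    return w, u
```

Because each step changes only one distribution u_i, w(u) can be maintained incrementally at O(Kd) cost per example, which is what gives the sequential algorithm its online flavor.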


Published in

ICML '07: Proceedings of the 24th International Conference on Machine Learning
June 2007, 1233 pages
ISBN: 9781595937933
DOI: 10.1145/1273496
Copyright © 2007 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Qualifiers

        • Article

        Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
