Abstract
We introduce a rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, and analyze the complexity of the training problem as a computational problem. The PA training problem is the problem of approximating an arbitrary, unknown source distribution by distributions generated by a PA. We investigate the following question about this important, well-studied problem: Does there exist an efficient training algorithm such that the trained PAs provably converge to a model close to an optimum one with high confidence, after only a feasibly small set of training data? We model this problem in the framework of computational learning theory and analyze the sample complexity as well as the computational complexity. We show that the number of examples required for training PAs is moderate—except for some log factors the number of examples is linear in the number of transition probabilities to be trained and a low-degree polynomial in the example length and parameters quantifying the accuracy and confidence. Computationally, however, training PAs is quite demanding: Fixed state size PAs are trainable in time polynomial in the accuracy and confidence parameters and example length, but not in the alphabet size unless RP = NP. The latter result is shown via a strong non-approximability result for the single string maximum likelihood model problem for 2-state PAs, which is of independent interest.
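The abstract's hardness result concerns the probability a PA assigns to a string. As a point of reference, the quantity in question can be computed with the standard forward recursion for HMM-style models; the following minimal sketch (all parameter values and names are illustrative, not taken from the paper) shows how a 2-state PA over a binary alphabet assigns a probability to a string:

```python
import numpy as np

def string_probability(pi, A, B, symbols):
    """Probability that a PA/HMM emits the given symbol sequence.

    pi: (n,) initial state distribution
    A:  (n, n) transition matrix, A[i, j] = P(next state j | state i)
    B:  (n, m) emission matrix,  B[i, s] = P(symbol s | state i)
    symbols: sequence of symbol indices in range(m)
    """
    alpha = pi * B[:, symbols[0]]       # forward variable at t = 0
    for s in symbols[1:]:
        alpha = (alpha @ A) * B[:, s]   # one step of the forward recursion
    return alpha.sum()                  # sum over final states

# A 2-state PA over a binary alphabet (parameters chosen arbitrarily)
pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])

p = string_probability(pi, A, B, [0, 1, 0])
```

Evaluating this quantity is easy; the paper's hardness result concerns the converse direction, namely finding model parameters that (approximately) maximize it for a given string.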
Abe, N., Warmuth, M.K. On the computational complexity of approximating distributions by probabilistic automata. Mach Learn 9, 205–260 (1992). https://doi.org/10.1007/BF00992677