Abstract
In the k-Restricted-Focus-of-Attention (k-RFA) model, only k of the n attributes of each example are revealed to the learner, although the set of visible attributes in each example is determined by the learner. While the k-RFA model is a natural extension of the PAC model, there are also significant differences. For example, it was previously known that learnability in this model is not characterized by the VC-dimension and that many PAC learning algorithms are not applicable in the k-RFA setting.
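To make the oracle interaction concrete, the following is a minimal sketch of a k-RFA example oracle (the function name `rfa_example` and the uniform instance distribution are illustrative assumptions, not part of the paper's definitions): the learner chooses which attributes to focus on, and the oracle reveals only those attributes together with the label, masking the rest.

```python
import random

def rfa_example(target, n, k, visible, rng=random):
    """Draw a random instance x in {0,1}^n, then reveal only the attributes
    in `visible` (a set of at most k indices) plus the label target(x).
    Hidden attributes are reported as None."""
    assert len(visible) <= k, "the learner may focus on at most k attributes"
    x = [rng.randint(0, 1) for _ in range(n)]
    label = target(x)
    masked = [x[i] if i in visible else None for i in range(n)]
    return masked, label

# The learner focuses on attributes {0, 2} of a 4-bit instance; attributes
# 1 and 3 stay hidden even though the label may depend on them.
masked, label = rfa_example(lambda x: x[0] ^ x[1], n=4, k=2, visible={0, 2})
```

Note that the label can depend on hidden attributes (here attribute 1), which is exactly why hypothesis evaluation in this model differs from PAC learning.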
In this paper we further explore the relationship between the PAC and k-RFA models, with several interesting results. First, we develop an information-theoretic characterization of k-RFA learnability upon which we build a general tool for proving hardness results. We then apply this and other new techniques for studying RFA learning to two particularly expressive function classes, k-decision-lists (k-DL) and k-TOP, the class of thresholds of parity functions in which each parity function takes at most k inputs. Among other results, we prove a hardness result for k-RFA learnability of k-DL, k ≤ n-2. In sharp contrast, an (n-1)-RFA algorithm for learning (n-1)-DL is presented. Similarly, we prove that 1-DL is learnable if and only if at least half of the inputs are visible in each instance. In addition, we show that there is a uniform-distribution k-RFA learning algorithm for the class of k-DL. For k-TOP we show weak learnability by a k-RFA algorithm (with efficient time and sample complexity for constant k) and strong uniform-distribution k-RFA learnability of k-TOP with efficient sample complexity for constant k. Finally, by combining some of our k-DL and k-TOP results, we show that, unlike the PAC model, weak learning does not imply strong learning in the k-RFA model.
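The two function classes studied here can be sketched directly from their definitions (the function names and data representations below are illustrative assumptions): a k-decision-list is an ordered sequence of rules, each testing a conjunction over at most k attributes, where the first satisfied rule determines the output; a k-TOP function takes a majority vote over parity functions, each reading at most k inputs.

```python
def eval_decision_list(dl, default, x):
    """Evaluate a decision list on x. `dl` is a sequence of (term, bit)
    pairs, where a term is a dict {attribute_index: required_value} over
    at most k attributes. The first satisfied term fires; if none does,
    the default bit is returned."""
    for term, bit in dl:
        if all(x[i] == v for i, v in term.items()):
            return bit
    return default

def eval_ktop(parities, x):
    """Evaluate a threshold of parities (k-TOP) on x. Each parity is a
    set of at most k input indices; the output is 1 iff a strict majority
    of the parities evaluate to 1."""
    votes = sum(1 if sum(x[i] for i in s) % 2 == 1 else -1 for s in parities)
    return 1 if votes > 0 else 0

# A 1-DL: "if x0 = 1 output 1, else if x2 = 0 output 0, else output 1".
dl = [({0: 1}, 1), ({2: 0}, 0)]

# A 1/2-TOP over x0, x1, and x0 XOR x1.
parities = [{0}, {1}, {0, 1}]
```

On the 1-DL above, `eval_decision_list(dl, 1, [1, 0, 0])` returns 1 via the first rule, while `[0, 0, 0]` falls through to the second rule and returns 0.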
Birkendorf, A., Dichterman, E., Jackson, J. et al. On Restricted-Focus-of-Attention Learnability of Boolean Functions. Machine Learning 30, 89–123 (1998). https://doi.org/10.1023/A:1007458528570