Years and Authors of Summarized Original Work
1987; Littlestone
Problem Definition
Given here is a basic formulation using the online mistake-bound model, which Littlestone [9] employed in his seminal work.
Fix a class \(C\) of Boolean functions over \(n\) variables. To start a learning scenario, a target function \(f^{*} \in C\) is chosen but not revealed to the learning algorithm. Learning then proceeds in a sequence of trials. At trial \(t\), an input \(\boldsymbol{x}_{t} \in \{0,1\}^{n}\) is first given to the learning algorithm. The learning algorithm then produces its prediction \(\hat{y}_{t}\), which is its guess as to the unknown value \(f^{*}(\boldsymbol{x}_{t})\). The correct value \(y_{t} = f^{*}(\boldsymbol{x}_{t})\) is then revealed to the learner. If \(y_{t} \neq \hat{y}_{t}\), the learning algorithm made a mistake. The learning algorithm learns \(C\) with mistake bound \(m\) if the number of mistakes never exceeds \(m\), no matter how many trials are made and how \(f^{*}\) and \(\boldsymbol{x}_{1}, \boldsymbol{x}_{2}, \ldots\) are chosen.
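To make the trial protocol concrete, here is a minimal Python sketch (not from the original entry) that runs the prediction/feedback loop against Littlestone's Winnow algorithm [9] for a monotone disjunction target. The index set RELEVANT, the random input sequence, and all parameter choices are illustrative assumptions; the mistake bound itself holds for any input sequence.

```python
import random

class Winnow:
    """Winnow (Littlestone [9]) for monotone k-literal disjunctions.

    Keeps one positive weight per attribute and predicts 1 iff the
    weighted count of active attributes reaches the threshold.
    Multiplicative updates give an O(k log n) mistake bound, independent
    of the number of irrelevant attributes beyond the log factor.
    """

    def __init__(self, n):
        self.theta = n            # standard threshold choice
        self.w = [1.0] * n        # one weight per Boolean attribute

    def predict(self, x):
        return int(sum(wi for wi, xi in zip(self.w, x) if xi) >= self.theta)

    def update(self, x, y, y_hat):
        if y == 1 and y_hat == 0:      # false negative: promote active weights
            self.w = [wi * 2 if xi else wi for wi, xi in zip(self.w, x)]
        elif y == 0 and y_hat == 1:    # false positive: demote active weights
            self.w = [wi / 2 if xi else wi for wi, xi in zip(self.w, x)]

# Trial protocol from the problem definition: the target f* is a
# disjunction over RELEVANT (a hypothetical index set, hidden from the
# learner); only the mistake feedback is observed.
n, RELEVANT = 100, {3, 17, 42}
f_star = lambda x: int(any(x[i] for i in RELEVANT))

learner, mistakes = Winnow(n), 0
rng = random.Random(0)
for t in range(1000):
    x = [rng.randint(0, 1) for _ in range(n)]   # an adversary may also choose these
    y_hat = learner.predict(x)                  # learner's guess at f*(x_t)
    y = f_star(x)                               # correct value revealed
    if y != y_hat:
        mistakes += 1
        learner.update(x, y, y_hat)             # Winnow updates only on mistakes
print("mistakes:", mistakes)                    # grows like k log n, not n
```

Note the attribute-efficient behavior: only the weights of attributes active in a mistaken trial change, so the total number of mistakes scales with the number of relevant variables \(k\) and only logarithmically with \(n\).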
Variable (or...
Recommended Reading
1. Auer P, Warmuth MK (1998) Tracking the best disjunction. Mach Learn 32(2):127–150
2. Blum A, Hellerstein L, Littlestone N (1995) Learning in the presence of finitely or infinitely many irrelevant attributes. J Comput Syst Sci 50(1):32–40
3. Bshouty N, Hellerstein L (1998) Attribute-efficient learning in query and mistake-bound models. J Comput Syst Sci 56(3):310–319
4. Dhagat A, Hellerstein L (1994) PAC learning with irrelevant attributes. In: Proceedings of the 35th Annual Symposium on Foundations of Computer Science, Santa Fe. IEEE Computer Society, Los Alamitos, pp 64–74
5. Gentile C, Warmuth MK (1999) Linear hinge loss and average margin. In: Kearns MJ, Solla SA, Cohn DA (eds) Advances in Neural Information Processing Systems, vol 11. MIT, Cambridge, pp 225–231
6. Khardon R, Roth D, Servedio RA (2005) Efficiency versus convergence of boolean kernels for on-line learning algorithms. J Artif Intell Res 24:341–356
7. Kivinen J, Warmuth MK (1997) Exponentiated gradient versus gradient descent for linear predictors. Inf Comput 132(1):1–64
8. Klivans AR, Servedio RA (2006) Toward attribute efficient learning of decision lists and parities. J Mach Learn Res 7:587–602
9. Littlestone N (1988) Learning quickly when irrelevant attributes abound: a new linear threshold algorithm. Mach Learn 2(4):285–318
10. Littlestone N, Warmuth MK (1994) The weighted majority algorithm. Inf Comput 108(2):212–261
11. Martin RK, Sethares WA, Williamson RC, Johnson CR Jr (2002) Exploiting sparsity in adaptive filters. IEEE Trans Signal Process 50(8):1883–1894
12. Mossel E, O'Donnell R, Servedio RA (2004) Learning functions of k relevant variables. J Comput Syst Sci 69(3):421–434
13. Ng AY (2004) Feature selection, L1 vs. L2 regularization, and rotational invariance. In: Greiner R, Schuurmans D (eds) Proceedings of the 21st International Conference on Machine Learning, Banff. The International Machine Learning Society, Princeton, pp 615–622
14. Vovk V (1990) Aggregating strategies. In: Fulk M, Case J (eds) Proceedings of the 3rd Annual Workshop on Computational Learning Theory, Rochester. Morgan Kaufmann, San Mateo, pp 371–383