Abstract
Machine learning has not yet succeeded in the design of robust learning algorithms that generalize well from very small datasets. In contrast, humans often generalize correctly from only a single training example, even if the number of potentially relevant features is large. To do so, they successfully exploit knowledge acquired in previous learning tasks to bias subsequent learning.
This paper investigates learning in a lifelong context. In contrast to most machine learning approaches, which aim at learning a single function in isolation, lifelong learning addresses situations where a learner faces a stream of learning tasks. Such scenarios provide the opportunity for synergetic effects that arise if knowledge is transferred across multiple learning tasks. To study the utility of transfer, several approaches to lifelong learning are proposed and evaluated in an object recognition domain. It is shown that all these algorithms consistently generalize more accurately from scarce training data than comparable “single-task” approaches.
Copyright information
© 1998 Springer Science+Business Media New York
About this chapter
Cite this chapter
Thrun, S. (1998). Lifelong Learning Algorithms. In: Thrun, S., Pratt, L. (eds) Learning to Learn. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5529-2_8
DOI: https://doi.org/10.1007/978-1-4615-5529-2_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7527-2
Online ISBN: 978-1-4615-5529-2
eBook Packages: Springer Book Archive