Abstract
Maze problems represent a simplified virtual model of the real environment and can be used to develop the core algorithms of many real-world applications related to navigation. Learning Classifier Systems (LCS) are the most widely used class of algorithms for reinforcement learning in mazes. However, the best achievements of LCSs in maze problems are still mostly limited to non-aliasing environments, while the complexity of LCSs seems to obstruct a proper analysis of the reasons for failure. Moreover, little is known about what makes a maze problem hard for a learning agent to solve. To overcome this restriction we try to improve our understanding of the nature and structure of maze environments. In this paper we describe a new LCS agent that has a simpler and more transparent performance mechanism. We use the structure of a predictive LCS model, strip out the evolutionary mechanism, simplify the reinforcement learning procedure and equip the agent with the ability of Associative Perception, adopted from psychology. We then assess the new LCS with Associative Perception on an extensive set of mazes and analyse the results to discover which features of the environments play the most significant role in the learning process. We identify a feature that makes learning in mazes particularly hard, aliasing clones, which arise when groups of aliasing cells occur in similar patterns in different parts of the maze. We discuss the impact of aliasing clones and other types of aliasing on learning algorithms.
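To make the notion of aliasing cells concrete, the following is a minimal sketch, not the authors' implementation. It assumes an agent that perceives only the eight cells surrounding its position, and it treats two open cells as aliasing when those perceptions are identical. The toy maze, the symbols (`#` wall, `.` open, `F` goal), the clockwise-from-north ordering, and the function names are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy maze, fully enclosed by walls:
# '#' is a wall, '.' is an open cell, 'F' is the goal.
MAZE = [
    "#######",
    "#....F#",
    "#######",
]

# Offsets of the eight neighbours, clockwise from north (assumed convention).
NEIGHBOURS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def perception(maze, row, col):
    """Return the eight cells surrounding (row, col) as a string."""
    return "".join(maze[row + dr][col + dc] for dr, dc in NEIGHBOURS)


def aliasing_groups(maze):
    """Group non-wall cells that share an identical perception string."""
    by_percept = defaultdict(list)
    for r, line in enumerate(maze):
        for c, ch in enumerate(line):
            if ch != "#":  # walls are never occupied, so they are skipped
                by_percept[perception(maze, r, c)].append((r, c))
    # Only groups with more than one cell are aliasing.
    return sorted(cells for cells in by_percept.values() if len(cells) > 1)


if __name__ == "__main__":
    print(aliasing_groups(MAZE))  # [[(1, 2), (1, 3)]]
```

In this corridor the two middle cells look the same to the agent although they lie at different distances from the goal, which is the kind of ambiguity the abstract refers to. The sketch relies on the maze having a wall border, so that every non-wall cell has eight in-range neighbours.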
Cite this article
Zatuchna, Z.V., Bagnall, A.J. A learning classifier system for mazes with aliasing clones. Nat Comput 8, 57–99 (2009). https://doi.org/10.1007/s11047-007-9055-7
DOI: https://doi.org/10.1007/s11047-007-9055-7