Abstract
With the advent of more powerful AI systems, theoretically well-founded and robust methods for the general evaluation of intelligence in (not only) artificial systems are growing in importance. The Algorithmic Intelligence Quotient Test (AIQ test) is an example of a reasonably well-founded yet practically feasible test of intelligence. Deep reinforcement learning offers a powerful framework that enables artificial agents to learn how to act in unknown environments of realistic complexity. Vanilla Policy Gradient (VPG) and Proximal Policy Optimisation (PPO) are two examples of model-free on-policy deep reinforcement learning agents. In this paper, a computational experiment with the AIQ test is conducted that evaluates VPG and PPO agents and compares them to classical off-policy Q-learning. An initial analysis of the results indicates that while the maximum AIQ achieved is comparable across the tested agents given sufficient training time, large differences emerge with short training times. Consistent with previous research, on-policy methods start from lower positions than off-policy methods, and PPO learns faster than VPG; both effects further depend on the steps-per-epoch parameter setting of the PPO and VPG agents. These findings indicate the utility of the AIQ test as an AI evaluation method.
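To make the difference between the two on-policy agents concrete, the following is a minimal NumPy sketch of the surrogate objectives that separate VPG from PPO, as formulated by Schulman et al. and implemented in Spinning Up. It is illustrative only: the function names and the toy batch are ours, and it is not the implementation evaluated in this paper.

    import numpy as np

    def vpg_loss(logp, adv):
        # Vanilla policy-gradient surrogate: maximise E[log pi(a|s) * A];
        # negated so that a minimiser performs gradient ascent on it.
        return -(logp * adv).mean()

    def ppo_clip_loss(logp, logp_old, adv, clip_ratio=0.2):
        # PPO clipped surrogate: E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)],
        # where r = pi_theta(a|s) / pi_theta_old(a|s) is the probability ratio.
        ratio = np.exp(logp - logp_old)
        clipped = np.clip(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * adv
        return -np.minimum(ratio * adv, clipped).mean()

    # Toy batch of three actions: log-probabilities under the current and
    # the old (data-collecting) policy, plus advantage estimates.
    logp = np.array([-0.9, -1.2, -0.4])
    logp_old = np.array([-1.0, -1.0, -0.5])
    adv = np.array([1.5, -0.7, 0.3])
    print(vpg_loss(logp, adv), ppo_clip_loss(logp, logp_old, adv))

Clipping the ratio keeps each update close to the policy that collected the data, which is what allows PPO to take larger, more stable learning steps than VPG; the clip_ratio of 0.2 mirrors the Spinning Up default.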
References
Achiam, J.: Proximal policy optimization. In: Spinning Up in Deep RL (2018). https://spinningup.openai.com/en/latest/algorithms/ppo.html
Achiam, J.: Spinning up in deep RL (2018). https://spinningup.openai.com/en/latest/
Achiam, J.: Vanilla policy gradient. In: Spinning Up in Deep RL (2018). https://spinningup.openai.com/en/latest/algorithms/vpg.html
Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)
Brockman, G., et al.: OpenAI Gym. Tech. Rep. arXiv:1606.01540 (2016)
Chaitin, G.J.: Algorithmic Information Theory. Cambridge Tracts in Theoretical Computer Science, vol. 1, 3rd edn. Cambridge University Press, Cambridge (1987)
Chaitin, G.J.: Information, Randomness and Incompleteness, 2nd edn. World Scientific, Singapore (1990)
Genesereth, M., Love, N., Pell, B.: General game playing: overview of the AAAI competition. AI Mag. 26(2), 62–72 (2005)
Genesereth, M., Thielscher, M.: General Game Playing. Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 24, 1st edn. Morgan & Claypool (2014)
Goertzel, B., Pennachin, C. (eds.): Artificial General Intelligence, Cognitive Technologies, vol. 8. Springer, Berlin (2007). https://doi.org/10.1007/978-3-540-68677-4
Hernández-Orallo, J.: The Measure of All Minds, 1st edn. Cambridge University Press, Cambridge (2017). https://doi.org/10.1017/9781316594179
Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: towards an anytime intelligence test. Artif. Intell. 174(18), 1508–1539 (2010). https://doi.org/10.1016/j.artint.2010.09.006
Hernández-Orallo, J.: Beyond the Turing test. J. Logic Lang. Inform. 9(4), 447–466 (2000). https://doi.org/10.1023/A:1008367325700
Hernández-Orallo, J., Loe, B.S., Cheke, L., Martínez-Plumed, F., Ó hÉigeartaigh, S.: General intelligence disentangled via a generality metric for natural and artificial intelligence. Sci. Rep. 11(1), 1–16 (2021). https://doi.org/10.1038/s41598-021-01997-7
Hibbard, B.: Bias and no free lunch in formal measures of intelligence. J. Artif. Gen. Intell. 1(1), 54–61 (2009). https://doi.org/10.2478/v10229-011-0004-6
Hutter, M., Legg, S.: Temporal difference updating without a learning rate. In: Platt, J.C., Koller, D., Singer, Y., Roweis, S.T. (eds.) Proceedings of the 21st Annual Conference on Advances in Neural Information Processing Systems, NIPS 2007, pp. 705–712. Curran Associates Inc, New York (2007)
Insa-Cabrera, J., Dowe, D.L., España-Cubillo, S., Hernández-Lloreda, M.V., Hernández-Orallo, J.: Comparing humans and AI agents. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS (LNAI), vol. 6830, pp. 122–132. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22887-2_13
Kolmogorov, A.N.: On tables of random numbers. Sankhyā: Indian J. Stat. Ser. A 25(4), 369–376 (1963). https://doi.org/10.1016/S0304-3975(98)00075-9
Legg, S., Hutter, M.: A collection of definitions of intelligence. In: Goertzel, B., Wang, P. (eds.) Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms, Frontiers in Artificial Intelligence and Applications, vol. 157, pp. 17–24. IOS Press, Amsterdam (2007)
Legg, S., Hutter, M.: Universal intelligence: a definition of machine intelligence. Mind. Mach. 17(4), 391–444 (2007). https://doi.org/10.1007/s11023-007-9079-x
Legg, S., Veness, J.: AIQ: Algorithmic intelligence quotient [source codes] (2011). https://github.com/mathemajician/AIQ
Legg, S., Veness, J.: An approximation of the universal intelligence measure. In: Dowe, D.L. (ed.) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. LNCS, vol. 7070, pp. 236–249. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-44958-1_18
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015). https://doi.org/10.1038/nature14236
Müller, U.: dev/lang/brainfuck-2.lha in Aminet (1993). http://aminet.net/package.php?package=dev/lang/brainfuck-2.lha
Saeed, W., Omlin, C.: Explainable AI (XAI): a systematic meta-survey of current challenges and future opportunities. Knowl.-Based Syst. 263, 110273 (2023). https://doi.org/10.1016/j.knosys.2023.110273
Schellaert, W., et al.: Your prompt is my command: on assessing the human-centred generality of multimodal models. J. Artif. Intell. Res. 77, 377–394 (2023)
Schrittwieser, J., et al.: Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020). https://doi.org/10.1038/s41586-020-03051-4
Schulman, J.: Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs. Ph.D. thesis, University of California, Berkeley (2016)
Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning, PMLR 37, pp. 1889–1897 (2015)
Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. In: Bengio, Y., LeCun, Y. (eds.) 4th International Conference on Learning Representations, ICLR 2016 (2016)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. Tech. Rep. arXiv:1707.06347, OpenAI (2017)
Skansi, S.: Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73004-2
Solomonoff, R.J.: A formal theory of inductive inference, parts 1 and 2. Inf. Control 7(1), 1–22 and 7(2), 224–254 (1964). https://doi.org/10.1016/S0019-9958(64)90131-7
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge (2018)
Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K. (eds.) NIPS 1999: Proceedings of the 12th International Conference on Neural Information Processing Systems, pp. 1057–1063. MIT Press, Cambridge (1999)
Vadinský, O.: AIQ: Algorithmic intelligence quotient [source codes] (2018). https://github.com/xvado00/AIQ/archive/v1.3.zip
Vadinský, O.: Towards general evaluation of intelligent systems: lessons learned from reproducing AIQ test results. J. Artif. Gen. Intell. 9(1), 1–54 (2018). https://doi.org/10.2478/jagi-2018-0001
Vadinský, O.: Towards general evaluation of intelligent systems: using semantic analysis to improve environments in the AIQ test. In: Iklé, M., Franz, A., Rzepka, R., Goertzel, B. (eds.) AGI 2018. LNCS (LNAI), vol. 10999, pp. 248–258. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-97676-1_24
Wang, P.: On defining artificial intelligence. J. Artif. Gen. Intell. 10(2), 1–37 (2019). https://doi.org/10.2478/jagi-2019-0002
Watkins, C.: Learning from Delayed Rewards. Ph.D. thesis, King's College, University of Cambridge, Cambridge (1989)
Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992). https://doi.org/10.1007/BF00992696
Acknowledgements
This work was funded by the Internal Grant Agency of Prague University of Economics and Business (F4/41/2023). Computational resources were kindly provided by the project “e-Infrastruktura CZ” (e-INFRA CZ LM2018140) supported by the Ministry of Education, Youth and Sports of the Czech Republic.
Appendix
Figure 3 shows a compact view of the results of all tested VPG and PPO configurations to facilitate visual comparison.
Full experiment settings, as well as results of the conducted analyses and experiments, are available from: https://github.com/xvado00/TEPOA/archive/refs/tags/XI-ML23.zip.
Full sources of the AIQ test (a Python 3 conversion of [36]), including the implementation of VPG and PPO agents, are available from: https://github.com/zemp02/AIQ/archive/refs/tags/v2.1.zip.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Vadinský, O., Zeman, P. (2024). Towards Evaluating Policy Optimisation Agents Using Algorithmic Intelligence Quotient Test. In: Nowaczyk, S., et al. (eds.) Artificial Intelligence. ECAI 2023 International Workshops. Communications in Computer and Information Science, vol. 1947. Springer, Cham. https://doi.org/10.1007/978-3-031-50396-2_25
DOI: https://doi.org/10.1007/978-3-031-50396-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50395-5
Online ISBN: 978-3-031-50396-2