Towards Evaluating Policy Optimisation Agents Using Algorithmic Intelligence Quotient Test

  • Conference paper

In: Artificial Intelligence. ECAI 2023 International Workshops (ECAI 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1947)

Abstract

With the advent of more powerful AI systems, theoretically well-founded and robust methods for the general evaluation of intelligence in (not only) artificial systems grow in importance. The Algorithmic Intelligence Quotient Test (AIQ test) is an example of a reasonably well-founded yet practically feasible test of intelligence. Deep reinforcement learning offers a powerful framework that enables artificial agents to learn how to act in unknown environments of realistic complexity. Vanilla Policy Gradient (VPG) and Proximal Policy Optimisation (PPO) are two examples of model-free on-policy deep reinforcement learning agents. In this paper, a computational experiment with the AIQ test is conducted that evaluates VPG and PPO agents and compares them to classical off-policy Q-learning. An initial analysis of the results indicates that while the maximum AIQ achieved is comparable for the tested agents given sufficient training time, large differences appear with short training times. In line with previous research, on-policy methods start from lower scores than off-policy methods, and PPO learns faster than VPG. This further depends on the steps-per-epoch parameter setting of the PPO and VPG agents. These findings indicate the utility of the AIQ test as an AI evaluation method.
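
To make the comparison concrete, the following sketch contrasts the policy-update objectives that distinguish the two on-policy agents. It follows the standard formulations of VPG and PPO [3, 1, 31, 41] rather than the exact agent code used in this paper; the tensor names and the 0.2 clip ratio are illustrative assumptions.

```python
import torch

def vpg_policy_loss(logp: torch.Tensor, adv: torch.Tensor) -> torch.Tensor:
    """Vanilla Policy Gradient: minimising this is gradient ascent on E[log pi(a|s) * A(s, a)]."""
    return -(logp * adv).mean()

def ppo_clip_policy_loss(logp: torch.Tensor, logp_old: torch.Tensor,
                         adv: torch.Tensor, clip_ratio: float = 0.2) -> torch.Tensor:
    """PPO clipped surrogate: bounds how far a single update can move the policy."""
    ratio = torch.exp(logp - logp_old)                       # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * adv
    return -torch.min(ratio * adv, clipped).mean()
```

In both cases the loss is computed over a batch of transitions collected on-policy, whose size is governed by the steps-per-epoch parameter mentioned above.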


References

  1. Achiam, J.: Proximal policy optimization. In: Spinning Up in Deep RL (2018). https://spinningup.openai.com/en/latest/algorithms/ppo.html

  2. Achiam, J.: Spinning up in deep RL (2018). https://spinningup.openai.com/en/latest/

  3. Achiam, J.: Vanilla policy gradient. In: Spinning Up in Deep RL (2018). https://spinningup.openai.com/en/latest/algorithms/vpg.html

  4. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)

  5. Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)

  6. Chaitin, G.J.: Algorithmic Information Theory. Cambridge Tracts in Theoretical Computer Science, vol. 1, 3rd edn. Cambridge University Press, Cambridge (1987)

  7. Chaitin, G.J.: Information, Randomness and Incompleteness, 2nd edn. World Scientific, Singapore (1990)

  8. Genesereth, M., Love, N., Pell, B.: General game playing: overview of the AAAI competition. AI Mag. 26(2), 62–72 (2005)

  9. Genesereth, M., Thielscher, M.: General Game Playing. Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 24, 1st edn. Morgan & Claypool Publishers (2014)

  10. Goertzel, B., Pennachin, C. (eds.): Artificial General Intelligence, Cognitive Technologies, vol. 8. Springer, Berlin (2007). https://doi.org/10.1007/978-3-540-68677-4

  11. Hernández-Orallo, J.: The Measure of All Minds, 1st edn. Cambridge University Press, Cambridge (2017). https://doi.org/10.1017/9781316594179

  12. Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: towards an anytime intelligence test. Artif. Intell. 174(18), 1508–1539 (2010). https://doi.org/10.1016/j.artint.2010.09.006

  13. Hernández-Orallo, J.: Beyond the Turing test. J. Logic Lang. Inform. 9(4), 447–466 (2000). https://doi.org/10.1023/A:1008367325700

  14. Hernández-Orallo, J., Loe, B.S., Cheke, L., Martínez-Plumed, F., Ó hÉigeartaigh, S.: General intelligence disentangled via a generality metric for natural and artificial intelligence. Sci. Rep. 11(1), 1–16 (2021). https://doi.org/10.1038/s41598-021-01997-7

  15. Hibbard, B.: Bias and no free lunch in formal measures of intelligence. J. Artif. Gen. Intell. 1(1), 54–61 (2009). https://doi.org/10.2478/v10229-011-0004-6

  16. Hutter, M., Legg, S.: Temporal difference updating without a learning rate. In: Platt, J.C., Koller, D., Singer, Y., Roweis, S.T. (eds.) Proceedings of the 21st Annual Conference on Advances in Neural Information Processing Systems, NIPS 2007, pp. 705–712. Curran Associates Inc, New York (2007)

  17. Insa-Cabrera, J., Dowe, D.L., España-Cubillo, S., Hernández-Lloreda, M.V., Hernández-Orallo, J.: Comparing humans and AI agents. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS (LNAI), vol. 6830, pp. 122–132. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22887-2_13

  18. Kolmogorov, A.N.: On tables of random numbers. Sankhyā: Indian J. Stat. Ser. A 25(4), 369–376 (1963). https://doi.org/10.1016/S0304-3975(98)00075-9

  19. Legg, S., Hutter, M.: A collection of definitions of intelligence. In: Goertzel, B., Wang, P. (eds.) Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms, Frontiers in Artificial Intelligence and Applications, vol. 157, pp. 17–24. IOS Press, Amsterdam (2007)

  20. Legg, S., Hutter, M.: Universal intelligence: a definition of machine intelligence. Mind. Mach. 17(4), 391–444 (2007). https://doi.org/10.1007/s11023-007-9079-x

  21. Legg, S., Veness, J.: AIQ: Algorithmic intelligence quotient [source codes] (2011). https://github.com/mathemajician/AIQ

  22. Legg, S., Veness, J.: An approximation of the universal intelligence measure. In: Dowe, D.L. (ed.) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. LNCS, vol. 7070, pp. 236–249. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-44958-1_18

  23. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015). https://doi.org/10.1038/nature14236

  24. Müller, U.: dev/lang/brainfuck-2.lha. Aminet (1993). http://aminet.net/package.php?package=dev/lang/brainfuck-2.lha

  25. Saeed, W., Omlin, C.: Explainable AI (XAI): a systematic meta-survey of current challenges and future opportunities. Knowl.-Based Syst. 263, 110273 (2023). https://doi.org/10.1016/j.knosys.2023.110273

  26. Schellaert, W., et al.: Your prompt is my command: on assessing the human-centred generality of multimodal models. J. Artif. Intell. Res. 77, 377–394 (2023)

  27. Schrittwieser, J., et al.: Mastering Atari, go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020). https://doi.org/10.1038/s41586-020-03051-4

  28. Schulman, J.: Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs. Ph.D. thesis, University of California, Berkeley (2016)

  29. Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning, PMLR 37, pp. 1889–1897 (2015)

  30. Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. In: Bengio, Y., LeCun, Y. (eds.) 4th International Conference on Learning Representations, ICLR 2016 (2016)

  31. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

  32. Skansi, S.: Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73004-2

  33. Solomonoff, R.J.: A formal theory of inductive inference, parts 1 and 2. Inf. Control 7(1), 1–22; 7(2), 224–254 (1964). https://doi.org/10.1016/S0019-9958(64)90131-7

  34. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge (2018)

  35. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K. (eds.) NIPS 1999: Proceedings of the 12th International Conference on Neural Information Processing Systems, pp. 1057–1063. MIT Press, Cambridge (1999)

  36. Vadinský, O.: AIQ: Algorithmic intelligence quotient [source codes] (2018). https://github.com/xvado00/AIQ/archive/v1.3.zip

  37. Vadinský, O.: Towards general evaluation of intelligent systems: lessons learned from reproducing AIQ test results. J. Artif. Gen. Intell. 9(1), 1–54 (2018). https://doi.org/10.2478/jagi-2018-0001

  38. Vadinský, O.: Towards general evaluation of intelligent systems: using semantic analysis to improve environments in the AIQ test. In: Iklé, M., Franz, A., Rzepka, R., Goertzel, B. (eds.) AGI 2018. LNCS (LNAI), vol. 10999, pp. 248–258. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-97676-1_24

  39. Wang, P.: On defining artificial intelligence. J. Artif. Gen. Intell. 10(2), 1–37 (2019). https://doi.org/10.2478/jagi-2019-0002

  40. Watkins, C.: Learning from Delayed Rewards. Ph.D. thesis, King's College, University of Cambridge, Cambridge (1989)

  41. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992). https://doi.org/10.1007/BF00992696


Acknowledgements

This work was funded by the Internal Grant Agency of Prague University of Economics and Business (F4/41/2023). Computational resources were kindly provided by the project “e-Infrastruktura CZ” (e-INFRA CZ LM2018140) supported by the Ministry of Education, Youth and Sports of the Czech Republic.

Author information

Corresponding author

Correspondence to Ondřej Vadinský.

Appendix

To facilitate visual comparison, Fig. 3 shows a compact view of the results of all tested VPG and PPO configurations.

Fig. 3. Achieved estimated AIQ scores of VPG and PPO as a function of episode length on BF 5 reference machine with patched SEP-ext and SDP.

Full experiment settings, as well as results of the conducted analyses and experiments, are available from: https://github.com/xvado00/TEPOA/archive/refs/tags/XI-ML23.zip.

Full sources of the AIQ test (a Python 3 conversion of [36]), including the implementation of VPG and PPO agents, are available from: https://github.com/zemp02/AIQ/archive/refs/tags/v2.1.zip.
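
For orientation, the sketch below illustrates the Monte Carlo structure of the AIQ estimate of [22]: the score is an average of per-episode rewards over many environments, each defined by a program sampled from a BF reference machine. The helper names (`agent_factory`, `sample_bf_environment`, `run_episode`) are placeholders and do not correspond to the API of the repositories linked above; the stratified sampling and reward normalisation details of the actual test are omitted.

```python
import random
import statistics

def estimate_aiq(agent_factory, sample_bf_environment, run_episode,
                 n_environments=1000, episode_length=1000, seed=0):
    """Monte Carlo estimate of AIQ: mean per-step reward over sampled BF environments."""
    rng = random.Random(seed)
    per_env_rewards = []
    for _ in range(n_environments):
        env = sample_bf_environment(rng)    # a random valid BF program acts as one environment
        agent = agent_factory()             # a fresh, untrained agent for every environment
        total_reward = run_episode(agent, env, episode_length)
        per_env_rewards.append(total_reward / episode_length)
    # Return the estimate and its spread; the actual test also reports a confidence interval.
    return statistics.mean(per_env_rewards), statistics.stdev(per_env_rewards)
```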

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vadinský, O., Zeman, P. (2024). Towards Evaluating Policy Optimisation Agents Using Algorithmic Intelligence Quotient Test. In: Nowaczyk, S., et al. Artificial Intelligence. ECAI 2023 International Workshops. ECAI 2023. Communications in Computer and Information Science, vol 1947. Springer, Cham. https://doi.org/10.1007/978-3-031-50396-2_25

  • DOI: https://doi.org/10.1007/978-3-031-50396-2_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50395-5

  • Online ISBN: 978-3-031-50396-2

  • eBook Packages: Computer Science (R0)
