Abstract
With the advent of more powerful AI systems, theoretically well-founded and robust methods for the general evaluation of intelligence in (not only) artificial systems are growing in importance. The Algorithmic Intelligence Quotient Test (AIQ test) is an example of a reasonably well-founded yet practically feasible test of intelligence. Deep reinforcement learning offers a powerful framework that enables artificial agents to learn how to act in unknown environments of realistic complexity. Vanilla Policy Gradient (VPG) and Proximal Policy Optimisation (PPO) are two examples of model-free on-policy deep reinforcement learning agents. In this paper, a computational experiment with the AIQ test is conducted that evaluates VPG and PPO agents and compares them to classical off-policy Q-learning. An initial analysis of the results indicates that while the maximum AIQ achieved is comparable across the tested agents given sufficient training time, large differences emerge with short training times. Consistent with previous research, on-policy methods start from lower positions than off-policy methods, and PPO learns faster than VPG; both effects further depend on the steps-per-epoch parameter setting of the PPO and VPG agents. These findings indicate the utility of the AIQ test as an AI evaluation method.
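To make the difference between the two on-policy agents concrete, the following is a minimal NumPy sketch of the surrogate objectives that separate VPG from PPO, as formulated by Schulman et al. and implemented in Spinning Up. It is illustrative only: the function names and the toy batch are ours, and it is not the implementation evaluated in this paper.

    import numpy as np

    def vpg_loss(logp, adv):
        # Vanilla policy-gradient surrogate: maximise E[log pi(a|s) * A];
        # negated so that a minimiser performs gradient ascent on it.
        return -(logp * adv).mean()

    def ppo_clip_loss(logp, logp_old, adv, clip_ratio=0.2):
        # PPO clipped surrogate: E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)],
        # where r = pi_theta(a|s) / pi_theta_old(a|s) is the probability ratio.
        ratio = np.exp(logp - logp_old)
        clipped = np.clip(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * adv
        return -np.minimum(ratio * adv, clipped).mean()

    # Toy batch of three actions: log-probabilities under the current and
    # the old (data-collecting) policy, plus advantage estimates.
    logp = np.array([-0.9, -1.2, -0.4])
    logp_old = np.array([-1.0, -1.0, -0.5])
    adv = np.array([1.5, -0.7, 0.3])
    print(vpg_loss(logp, adv), ppo_clip_loss(logp, logp_old, adv))

Clipping the ratio keeps each update close to the policy that collected the data, which is what allows PPO to take larger, more stable learning steps than VPG; the clip_ratio of 0.2 mirrors the Spinning Up default.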
References
Achiam, J.: Proximal policy optimization. In: Spinning Up in Deep RL (2018). https://spinningup.openai.com/en/latest/algorithms/ppo.html
Achiam, J.: Spinning up in deep RL (2018). https://spinningup.openai.com/en/latest/
Achiam, J.: Vanilla policy gradient. In: Spinning Up in Deep RL (2018). https://spinningup.openai.com/en/latest/algorithms/vpg.html
Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)
Brockman, G., et al.: OpenAI Gym. Tech. Rep. arXiv:1606.01540 (2016)
Chaitin, G.J.: Algorithmic Information Theory. Cambridge Tracts in Theoretical Computer Science, vol. 1, 3rd edn. Cambridge University Press, Cambridge (1987)
Chaitin, G.J.: Information, Randomness and Incompleteness, 2nd edn. World Scientific, Singapore (1990)
Genesereth, M., Love, N., Pell, B.: General game playing: overview of the AAAI competition. AI Mag. 26(2), 62–72 (2005)
Genesereth, M., Thielscher, M.: General Game Playing. Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 24, 1st edn. Morgan & Claypool (2014)
Goertzel, B., Pennachin, C. (eds.): Artificial General Intelligence, Cognitive Technologies, vol. 8. Springer, Berlin (2007). https://doi.org/10.1007/978-3-540-68677-4
Hernández-Orallo, J.: The Measure of All Minds, 1st edn. Cambridge University Press, Cambridge (2017). https://doi.org/10.1017/9781316594179
Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: towards an anytime intelligence test. Artif. Intell. 174(18), 1508–1539 (2010). https://doi.org/10.1016/j.artint.2010.09.006
Hernández-Orallo, J.: Beyond the Turing test. J. Logic Lang. Inform. 9(4), 447–466 (2000). https://doi.org/10.1023/A:1008367325700
Hernández-Orallo, J., Loe, B.S., Cheke, L., Martínez-Plumed, F., Ó hÉigeartaigh, S.: General intelligence disentangled via a generality metric for natural and artificial intelligence. Sci. Rep. 11(1), 1–16 (2021). https://doi.org/10.1038/s41598-021-01997-7
Hibbard, B.: Bias and no free lunch in formal measures of intelligence. J. Artif. Gen. Intell. 1(1), 54–61 (2009). https://doi.org/10.2478/v10229-011-0004-6
Hutter, M., Legg, S.: Temporal difference updating without a learning rate. In: Platt, J.C., Koller, D., Singer, Y., Roweis, S.T. (eds.) Proceedings of the 21st Annual Conference on Advances in Neural Information Processing Systems, NIPS 2007, pp. 705–712. Curran Associates Inc, New York (2007)
Insa-Cabrera, J., Dowe, D.L., España-Cubillo, S., Hernández-Lloreda, M.V., Hernández-Orallo, J.: Comparing humans and AI agents. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS (LNAI), vol. 6830, pp. 122–132. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22887-2_13
Kolmogorov, A.N.: On tables of random numbers. Sankhyā: Indian J. Stat. Ser. A 25(4), 369–376 (1963). https://doi.org/10.1016/S0304-3975(98)00075-9
Legg, S., Hutter, M.: A collection of definitions of intelligence. In: Goertzel, B., Wang, P. (eds.) Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms, Frontiers in Artificial Intelligence and Applications, vol. 157, pp. 17–24. IOS Press, Amsterdam (2007)
Legg, S., Hutter, M.: Universal intelligence: a definition of machine intelligence. Mind. Mach. 17(4), 391–444 (2007). https://doi.org/10.1007/s11023-007-9079-x
Legg, S., Veness, J.: AIQ: Algorithmic intelligence quotient [source codes] (2011). https://github.com/mathemajician/AIQ
Legg, S., Veness, J.: An approximation of the universal intelligence measure. In: Dowe, D.L. (ed.) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. LNCS, vol. 7070, pp. 236–249. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-44958-1_18
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015). https://doi.org/10.1038/nature14236
Müller, U.: dev/lang/brainfuck-2.lha in Aminet (1993). http://aminet.net/package.php?package=dev/lang/brainfuck-2.lha
Saeed, W., Omlin, C.: Explainable AI (XAI): a systematic meta-survey of current challenges and future opportunities. Knowl.-Based Syst. 263, 110273 (2023). https://doi.org/10.1016/j.knosys.2023.110273
Schellaert, W., et al.: Your prompt is my command: on assessing the human-centred generality of multimodal models. J. Artif. Intell. Res. 77, 377–394 (2023)
Schrittwieser, J., et al.: Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020). https://doi.org/10.1038/s41586-020-03051-4
Schulman, J.: Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs. Ph.D. thesis, University of California, Berkeley (2016)
Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning, PMLR 37, pp. 1889–1897 (2015)
Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. In: Bengio, Y., LeCun, Y. (eds.) 4th International Conference on Learning Representations, ICLR 2016 (2016)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. Tech. Rep. arXiv:1707.06347, OpenAI (2017)
Skansi, S.: Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73004-2
Solomonoff, R.J.: A formal theory of inductive inference, parts 1 and 2. Inf. Control 7(1), 1–22 and 7(2), 224–254 (1964). https://doi.org/10.1016/S0019-9958(64)90131-7
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge (2018)
Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K. (eds.) NIPS 1999: Proceedings of the 12th International Conference on Neural Information Processing Systems, pp. 1057–1063. MIT Press, Cambridge (1999)
Vadinský, O.: AIQ: Algorithmic intelligence quotient [source codes] (2018). https://github.com/xvado00/AIQ/archive/v1.3.zip
Vadinský, O.: Towards general evaluation of intelligent systems: lessons learned from reproducing AIQ test results. J. Artif. Gen. Intell. 9(1), 1–54 (2018). https://doi.org/10.2478/jagi-2018-0001
Vadinský, O.: Towards general evaluation of intelligent systems: using semantic analysis to improve environments in the AIQ test. In: Iklé, M., Franz, A., Rzepka, R., Goertzel, B. (eds.) AGI 2018. LNCS (LNAI), vol. 10999, pp. 248–258. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-97676-1_24
Wang, P.: On defining artificial intelligence. J. Artif. Gen. Intell. 10(2), 1–37 (2019). https://doi.org/10.2478/jagi-2019-0002
Watkins, C.: Learning from Delayed Rewards. Ph.D. thesis, King's College, University of Cambridge, Cambridge (1989)
Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992). https://doi.org/10.1007/BF00992696
Acknowledgements
This work was funded by the Internal Grant Agency of Prague University of Economics and Business (F4/41/2023). Computational resources were kindly provided by the project “e-Infrastruktura CZ” (e-INFRA CZ LM2018140) supported by the Ministry of Education, Youth and Sports of the Czech Republic.
Appendix
Figure 3 shows a compact view of the results of all tested VPG and PPO configurations to facilitate visual comparison.
Full experiment settings, as well as results of the conducted analyses and experiments, are available from: https://github.com/xvado00/TEPOA/archive/refs/tags/XI-ML23.zip.
Full sources of the AIQ test (a Python 3 conversion of [36]), including the implementation of VPG and PPO agents, are available from: https://github.com/zemp02/AIQ/archive/refs/tags/v2.1.zip.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Vadinský, O., Zeman, P. (2024). Towards Evaluating Policy Optimisation Agents Using Algorithmic Intelligence Quotient Test. In: Nowaczyk, S., et al. (eds.) Artificial Intelligence. ECAI 2023 International Workshops. Communications in Computer and Information Science, vol. 1947. Springer, Cham. https://doi.org/10.1007/978-3-031-50396-2_25
DOI: https://doi.org/10.1007/978-3-031-50396-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50395-5
Online ISBN: 978-3-031-50396-2