
ORLEP: an efficient offline reinforcement learning evaluation platform

Published in Multimedia Tools and Applications, topical collection 1230: Sentient Multimedia Systems and Visual Intelligence.

Abstract

Developing applications for offline reinforcement learning evaluation faces challenges such as integrating heterogeneous data and algorithms, providing a user-friendly interface, and managing resources flexibly. This paper designs and implements ORLEP, an efficient platform that provides high-level services for offline reinforcement learning evaluation. Besides integrating an underlying infrastructure with high concurrency and reliability, distributed deployment of core components, and incorporation of third-party libraries and benchmarks, ORLEP supplies high-level abstractions for (1) data management, (2) model training and evaluation, (3) result visualization, and (4) resource configuration and supervision. Moreover, this paper verifies the platform on specific cases, and the results demonstrate the performance and scalability of the proposed ORLEP.
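The abstract lists four high-level abstractions (data management, model training and evaluation, result visualization, and resource configuration and supervision) without giving their interfaces. The sketch below is a minimal, purely illustrative Python model of what such abstractions could look like; every class and method name here (DatasetRef, ResourceConfig, TrainingJob, EvaluationResult, OrlepLikePlatform, register_dataset, submit) is an assumption for illustration, not ORLEP's actual API, and the returned scores are placeholders.

```python
"""Hypothetical sketch of the four high-level abstractions named in the
abstract. All names and behaviors below are illustrative assumptions, not
ORLEP's actual API; a real deployment would dispatch jobs to distributed
workers rather than run them in process."""

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetRef:
    """(1) Data management: a registered offline dataset (e.g. a D4RL task)."""
    name: str
    source: str            # e.g. "d4rl" or "neorl"
    num_transitions: int


@dataclass
class ResourceConfig:
    """(4) Resource configuration: limits a training job may consume."""
    gpus: int = 1
    cpu_cores: int = 4
    memory_gb: int = 16


@dataclass
class TrainingJob:
    """(2) Model training and evaluation request."""
    algorithm: str          # e.g. "CQL" or "BCQ"
    dataset: DatasetRef
    resources: ResourceConfig
    hyperparams: Dict[str, float] = field(default_factory=dict)


@dataclass
class EvaluationResult:
    """(3) Result visualization input: per-epoch scores of the learned policy."""
    job: TrainingJob
    scores: List[float]


class OrlepLikePlatform:
    """Toy in-memory stand-in for the platform front end (illustrative only)."""

    def __init__(self) -> None:
        self._datasets: Dict[str, DatasetRef] = {}
        self._results: List[EvaluationResult] = []

    def register_dataset(self, ref: DatasetRef) -> DatasetRef:
        self._datasets[ref.name] = ref
        return ref

    def submit(self, job: TrainingJob) -> EvaluationResult:
        # A real platform would enqueue the job (e.g. via a message broker) and
        # evaluate the trained policy on workers; placeholder scores stand in here.
        result = EvaluationResult(job=job, scores=[0.0, 0.0, 0.0])
        self._results.append(result)
        return result


if __name__ == "__main__":
    platform = OrlepLikePlatform()
    data = platform.register_dataset(
        DatasetRef(name="halfcheetah-medium-v2", source="d4rl", num_transitions=1_000_000)
    )
    job = TrainingJob(algorithm="CQL", dataset=data,
                      resources=ResourceConfig(gpus=1), hyperparams={"alpha": 5.0})
    print(platform.submit(job).scores)
```

Keeping the request objects separate from the front end mirrors the decoupling the abstract describes between user-facing services and the distributed core components.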


Data Availability

The data are openly available in a public repository.


Acknowledgements

All authors contributed to the study conception and design. Material preparation, analysis, and writing of the original draft were performed by Chen Chen. Resources, supervision, and funding acquisition were handled by Mao Keming. Material preparation, software, and investigation were performed by Zhang Jinkai. Data collection and testing were performed by Li Yiyang. All authors commented on previous versions of the manuscript, and all authors read and approved the final manuscript.

Funding

This work was supported by the Natural Science Foundation of Liaoning Province, China (No. 2022-MS-112).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Chen Chen.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mao, K., Chen, C., Zhang, J. et al. ORLEP: an efficient offline reinforcement learning evaluation platform. Multimed Tools Appl 83, 37073–37087 (2024). https://doi.org/10.1007/s11042-023-16906-5
