
MOT: A Mixture of Actors Reinforcement Learning Method by Optimal Transport for Algorithmic Trading

  • Conference paper
  • In: Advances in Knowledge Discovery and Data Mining (PAKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14648)

Abstract

Algorithmic trading refers to executing buy and sell orders for specific assets based on automatically identified trading opportunities. Strategies based on reinforcement learning (RL) have demonstrated remarkable capabilities in addressing algorithmic trading problems. However, trading patterns differ across market conditions because the data distribution shifts, and ignoring these multiple patterns undermines the performance of RL. In this paper, we propose MOT, which designs multiple actors with disentangled representation learning to model the different patterns of the market. Furthermore, we incorporate the Optimal Transport (OT) algorithm to allocate samples to the appropriate actor by introducing a regularization loss term. Additionally, we propose a Pretrain Module that facilitates imitation learning by aligning the actors' outputs with an expert strategy, which better balances the exploration and exploitation of RL. Experimental results on real futures market data demonstrate that MOT exhibits excellent profit capability while balancing risk. Ablation studies validate the effectiveness of MOT's components.
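The full architecture is specified in the paper itself; purely as orientation on the allocation idea, the following is a minimal sketch of Sinkhorn-style entropic optimal transport (in the spirit of Cuturi's algorithm, reference 2) assigning a batch of sample embeddings to actor prototypes under balanced marginals. The prototype-based cost, all sizes, and all names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (NOT the authors' code): Sinkhorn-style entropic OT
# assigning a batch of sample embeddings to a set of actors. The
# prototype-based cost and all sizes below are illustrative assumptions.
import numpy as np

def sinkhorn(cost, r, c, eps=0.1, n_iters=200):
    """Entropy-regularized OT plan P with row marginals r and column
    marginals c, computed by alternating Sinkhorn scaling updates."""
    K = np.exp(-cost / eps)                  # Gibbs kernel
    u = np.ones_like(r)
    for _ in range(n_iters):
        v = c / (K.T @ u)                    # match column marginals
        u = r / (K @ v)                      # match row marginals
    return u[:, None] * K * v[None, :]       # P = diag(u) K diag(v)

rng = np.random.default_rng(0)
B, n_actors, d = 8, 3, 4                     # batch, actors, embedding dim
z = rng.normal(size=(B, d))                  # sample representations
proto = rng.normal(size=(n_actors, d))       # one prototype per actor

# Cost: squared distance from each sample to each actor prototype,
# normalized so the exp() kernel stays numerically stable.
cost = ((z[:, None, :] - proto[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()

# Balanced marginals: every sample is fully assigned, and the actors
# share the batch evenly, preventing one actor from absorbing all data.
P = sinkhorn(cost, np.full(B, 1.0 / B), np.full(n_actors, 1.0 / n_actors))

print((P * B).round(2))                      # soft assignment per sample
print("actor per sample:", P.argmax(axis=1))
```

In MOT, a regularization loss term derived from such a transport plan steers the routing of samples to actors; the exact loss is given in the paper.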


Notes

  1. Transaction costs are charged as a percentage of the contract value.

  2. Slippage refers to the difference between the expected execution price and the actual execution price.

  3. A well-known Chinese quantitative trading platform, https://www.ricequant.com/.

  4. We chose GRU as a baseline because we employ the GRU method in the Pretrain Module before imitation learning; the GRU results therefore reflect the performance of the Pretrain Module.

  5. We enhance PPO with the imitation learning described in the Methodology section (a sketch follows these notes).
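Footnotes 4 and 5 point at the Pretrain Module's role: the GRU baseline doubles as the pretrained backbone, and PPO is warmed up by imitation learning. As a hedged sketch of that general idea only (behavior-cloning a GRU actor toward expert actions before RL fine-tuning; the network sizes, action set, and dummy data are all assumptions, not the authors' setup):

```python
# Hedged sketch of the Pretrain Module idea: behavior-clone a GRU-based
# actor toward an expert strategy's actions before PPO fine-tuning.
# NOT the authors' code; sizes, action set, and data are made up.
import torch
import torch.nn as nn

class GRUActor(nn.Module):
    def __init__(self, n_features=16, hidden=32, n_actions=3):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)  # e.g. long / flat / short

    def forward(self, x):                  # x: (batch, time, features)
        h, _ = self.gru(x)
        return self.head(h[:, -1])         # logits from the last time step

actor = GRUActor()
opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

# Dummy market windows and expert actions standing in for real data.
x = torch.randn(64, 20, 16)
expert_a = torch.randint(0, 3, (64,))

for step in range(200):                    # supervised imitation phase
    loss = ce(actor(x), expert_a)          # align actor output with expert
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Starting PPO from such a pretrained actor anchors early exploration to expert-like behavior rather than random actions, which is the exploration-exploitation balance the abstract refers to.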

References

  1. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)

  2. Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: NIPS, vol. 26 (2013)

  3. Deng, Y., Bao, F., Kong, Y., Ren, Z., Dai, Q.: Deep direct reinforcement learning for financial signal representation and trading. IEEE TNNLS 28(3), 653–664 (2016)

  4. Fama, E.F., French, K.R.: Multifactor explanations of asset pricing anomalies. J. Financ. 51(1), 55–84 (1996)

  5. Fedus, W., Zoph, B., Shazeer, N.: Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. JMLR 23(1), 5232–5270 (2022)

  6. Gurrib, I., et al.: Performance of the average directional index as a market timing tool for the most actively traded USD based currency pairs. Banks Bank Syst. 13(3), 58–70 (2018)

  7. Hong, H., Stein, J.C.: A unified theory of underreaction, momentum trading, and overreaction in asset markets. J. Financ. 54(6), 2143–2184 (1999)

  8. Houlsby, N., et al.: Parameter-efficient transfer learning for NLP. In: ICML, pp. 2790–2799. PMLR (2019)

  9. Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144 (2016)

  10. Jegadeesh, N., Titman, S.: Returns to buying winners and selling losers: implications for stock market efficiency. J. Financ. 48(1), 65–91 (1993)

  11. Jegadeesh, N., Titman, S.: Cross-sectional and time-series determinants of momentum returns. Rev. Financ. Stud. 15(1), 143–157 (2002)

  12. Jeong, G., Kim, H.Y.: Improving financial trading decisions using deep Q-learning: predicting the number of shares, action strategies, and transfer learning. Expert Syst. Appl. 117, 125–138 (2019)

  13. Kim, H.J., Shin, K.S.: A hybrid approach based on neural networks and genetic algorithms for detecting temporal patterns in stock markets. Appl. Soft Comput. 7(2), 569–576 (2007)

  14. Li, Z., Tam, V.: A machine learning view on momentum and reversal trading. Algorithms 11(11), 170 (2018)

  15. Lin, H., Zhou, D., Liu, W., Bian, J.: Learning multiple stock trading patterns with temporal routing adaptor and optimal transport. In: 27th ACM SIGKDD, pp. 1017–1026 (2021)

  16. Liu, Y., Liu, Q., Zhao, H., Pan, Z., Liu, C.: Adaptive quantitative trading: an imitative deep reinforcement learning approach. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 2128–2135 (2020)

  17. Moody, J., Saffell, M.: Reinforcement learning for trading. In: NIPS, vol. 11 (1998)

  18. Moody, J., Wu, L.: Optimization of trading systems and portfolios. In: Proceedings of the IEEE/IAFE 1997 CIFEr, pp. 300–307. IEEE (1997)

  19. de Oliveira, R.A., Ramos, H.S., Dalip, D.H., Pereira, A.C.M.: A tabular SARSA-based stock market agent. In: Proceedings of the First ACM International Conference on AI in Finance, pp. 1–8 (2020)

  20. Poterba, J.M., Summers, L.H.: Mean reversion in stock prices: evidence and implications. J. Financ. Econ. 22(1), 27–59 (1988)

  21. Pricope, T.V.: Deep reinforcement learning in quantitative algorithmic trading: a review. arXiv preprint arXiv:2106.00123 (2021)

  22. Ritter, J.R.: Behavioral finance. Pac.-Basin Finance J. 11(4), 429–437 (2003)

  23. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

  24. Sharpe, W.F.: Mutual fund performance. J. Bus. 39(1), 119–138 (1966)

  25. Si, W., Li, J., Ding, P., Rao, R.: A multi-objective deep reinforcement learning approach for stock index future's intraday trading. In: 2017 10th ISCID, vol. 2, pp. 431–436. IEEE (2017)

  26. Tsang, W.W.H., Chong, T.T.L., et al.: Profitability of the on-balance volume indicator. Econ. Bull. 29(3), 2424–2431 (2009)

  27. Wilder, J.W.: New Concepts in Technical Trading Systems. Trend Research (1978)

  28. Xu, W., et al.: HIST: a graph-based framework for stock trend forecasting via mining concept-oriented shared information. arXiv preprint arXiv:2110.13716 (2021)

  29. Xu, W., Liu, W., Xu, C., Bian, J., Yin, J., Liu, T.Y.: REST: relational event-driven stock trend forecasting. In: Proceedings of the Web Conference 2021, pp. 1–10 (2021)

  30. Yuan, Y., Wen, W., Yang, J.: Using data augmentation based reinforcement learning for daily stock trading. Electronics 9(9), 1384 (2020)

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 72374201).

Author information

Corresponding author: Xi Cheng.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Cheng, X., Zhang, J., Zeng, Y., Xue, W. (2024). MOT: A Mixture of Actors Reinforcement Learning Method by Optimal Transport for Algorithmic Trading. In: Yang, D.N., Xie, X., Tseng, V.S., Pei, J., Huang, J.W., Lin, J.C.W. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol. 14648. Springer, Singapore. https://doi.org/10.1007/978-981-97-2238-9_3


  • DOI: https://doi.org/10.1007/978-981-97-2238-9_3


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2240-2

  • Online ISBN: 978-981-97-2238-9

  • eBook Packages: Computer Science, Computer Science (R0)
