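Journal of Nantong University (Natural Science Edition)

2023, Vol. 22, No. 2 (Serial No. 85): 43-49+74

An A3C Quantitative Trading Strategy Based on the Attention Mechanism

Fu Jiaxin (符甲鑫)¹, Liu Lei (刘磊)¹, Qian Cheng (钱成)²

1. College of Science, Hohai University; 2. School of Mathematics, Southeast University

Foundation: General Program of the National Natural Science Foundation of China (61773152)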

DOI: 10.12194/j.ntu.20221128006


Abstract:

Traditional trading strategies cannot effectively eliminate the long-term influence of market noise and nonlinearity. To address this problem, a quantitative trading strategy based on the attention mechanism is proposed: squeeze-and-excitation asynchronous advantage actor-critic (SE-A3C). Historical technical indicator factors serve as the environment state. A convolutional network and an attention module extract features from the data and determine the trading action, and asynchronous training lets multiple agents interact with the environment, which effectively improves the adaptability of the strategy. The strategy is applied to CSI 300 and SSE 50 stock index futures. In the testing phase, CSI 300 achieves a return of 12.23%, a win rate of 58.82%, and a maximum drawdown of 2.47%; SSE 50 achieves a return of 18.82%, a win rate of 57.56%, and a maximum drawdown of 1.05%.

Keywords: deep reinforcement learning; asynchronous advantage actor-critic (A3C); attention mechanism; quantitative trading
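The abstract names a squeeze-and-excitation attention module but gives no architectural detail, so the following is only a minimal sketch of the generic squeeze-and-excitation idea, not the authors' network. It assumes a feature map laid out as channels by time steps, a two-layer bottleneck with ReLU and sigmoid activations, and hand-picked illustrative weights; the function name `se_gate` and all numeric values are hypothetical.

```python
import math


def se_gate(features, w_reduce, w_expand):
    """Reweight the channels of a (channels x time) feature map, SE-style.

    features: list of channels, each a list of values over time.
    w_reduce: bottleneck weights, shape (channels // r) x channels.
    w_expand: expansion weights, shape channels x (channels // r).
    All shapes and weights here are illustrative assumptions.
    """
    # Squeeze: average each channel over time to get one descriptor per channel.
    squeezed = [sum(channel) / len(channel) for channel in features]
    # Excitation, step 1: linear bottleneck followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]
    # Excitation, step 2: linear expansion followed by a sigmoid,
    # giving one weight strictly between 0 and 1 per channel.
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value of a channel by that channel's weight.
    scaled = [[w * x for x in channel] for w, channel in zip(weights, features)]
    return scaled, weights


# Hypothetical input: 4 indicator channels observed over 3 time steps.
features = [
    [0.2, 0.4, 0.6],
    [1.0, -1.0, 0.5],
    [0.0, 0.3, -0.3],
    [2.0, 1.0, 0.0],
]
w_reduce = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
w_expand = [[1.0, -0.5], [0.3, 0.8], [-0.7, 0.2], [0.5, 0.5]]

scaled, weights = se_gate(features, w_reduce, w_expand)
print(weights)
```

In a trained model the two weight matrices would be learned together with the convolutional layers, so channels that carry more useful signal receive weights closer to 1 and noisier channels are damped.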


Basic information:

CLC number: TP18; F832.5

Citation:

Fu Jiaxin, Liu Lei, Qian Cheng. An A3C quantitative trading strategy based on the attention mechanism[J]. Journal of Nantong University (Natural Science Edition), 2023, 22(02): 43-49+74. DOI: 10.12194/j.ntu.20221128006.
