Overview | DOW Stable Baseline | Trading Experiments | Paper Trading | FAQ | Snapshot

InfoSoft Finance

Reinforcement learning (RL) is a machine learning technique that trains an algorithm through trial and error. The algorithm (agent) evaluates the current situation (state), takes an action, and receives feedback (reward) from the environment after each act. Positive feedback is a reward in the usual sense of the word; negative feedback is punishment for making a mistake.
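To make this loop concrete, here is a minimal sketch of the agent-environment interaction using the Gymnasium API. The CartPole environment and the random policy are illustrative stand-ins; a trading task would use a custom market environment and a learned policy.

```python
import gymnasium as gym

# Illustrative environment; a trading task would plug in a custom market env.
env = gym.make("CartPole-v1")

state, info = env.reset(seed=42)
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a learned policy pi(a|s)
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # positive feedback rewards, negative punishes
    done = terminated or truncated
env.close()
print(f"episode return: {total_reward}")
```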

Deep Reinforcement Learning for Automated Stock Trading

We introduce an ensemble strategy using deep reinforcement learning to maximize returns. This involves a learning agent employing three actor-critic algorithms: Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Deep Deterministic Policy Gradient (DDPG). This ensemble approach combines the strengths of each algorithm, adapting well to market changes. To manage memory usage when training with continuous action spaces, we use a load-on-demand data processing technique. We tested our strategy on 30 Dow Jones stocks and compared its performance to the Dow Jones Industrial Average and the traditional min-variance portfolio allocation. Our ensemble strategy outperformed individual algorithms and baselines in risk-adjusted returns, as measured by the Sharpe ratio.
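The ensemble picks, for each trading window, whichever of the three agents achieved the best validation Sharpe ratio. Below is a minimal sketch of that selection rule; the `pick_agent` helper, its arguments, and the window handling are assumptions for illustration, not the repository's actual code.

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns (risk-free rate ~ 0)."""
    returns = np.asarray(returns)
    return np.sqrt(periods_per_year) * returns.mean() / (returns.std() + 1e-9)

def pick_agent(agents, validation_returns):
    """Select the agent whose validation window had the highest Sharpe ratio.

    `validation_returns` maps agent name -> per-day returns over the window.
    """
    scores = {name: sharpe_ratio(r) for name, r in validation_returns.items()}
    best = max(scores, key=scores.get)
    return agents[best], scores

# Hypothetical usage with pre-trained PPO / A2C / DDPG agents:
# agent, scores = pick_agent({"ppo": ppo, "a2c": a2c, "ddpg": ddpg},
#                            {"ppo": r_ppo, "a2c": r_a2c, "ddpg": r_ddpg})
```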

Keywords: Deep Reinforcement Learning, Markov Decision Process, Automated Stock Trading, Ensemble Strategy, Actor-Critic Framework

Outline

Overview

Jojo has three layers: market environments, agents, and applications. For a trading task (at the top), an agent (in the middle) interacts with a market environment (at the bottom), making sequential decisions.

Reinforcement, supervised, and unsupervised learning

In supervised learning, an agent “knows” what task to perform and which set of actions is correct. Data scientists train the agent on historical data that includes target variables (the desired answers), also known as labeled data, so the agent receives direct feedback. As a result of training, the agent can forecast whether the target variables appear in new data. Supervised learning is used to solve classification and regression tasks.

Reinforcement learning doesn’t rely on labeled datasets: the agent isn’t told which actions to take or the optimal way to perform a task. Instead of labels attached to each decision, RL uses rewards and penalties to signal whether a taken action is good or bad, and the agent often gets that feedback only once it completes the task. Time-delayed feedback and the trial-and-error principle are what differentiate reinforcement learning from supervised learning.

Since one of the goals of RL is to find a sequence of actions that maximizes the cumulative reward, sequential decision making is another significant difference between these training styles: each decision the agent makes can affect its future actions.
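Formally, the standard objective is the expected discounted return; this is the textbook formulation rather than anything specific to this repository:

```latex
G_t \;=\; \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1
```

Here the discount factor γ weights near-term rewards more heavily than distant ones, which is what makes maximizing the return a sequential problem rather than a one-step one.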

Reinforcement learning vs unsupervised learning. In unsupervised learning, the algorithm analyzes unlabeled data to find hidden interconnections between data points and structures them by similarities or differences. RL instead aims to find the action policy that yields the largest long-term reward, which distinguishes it from unsupervised learning in its key goal.

Reinforcement and deep learning. Most reinforcement learning implementations employ deep learning models: deep neural networks serve as the core method for agent training. Deep learning is particularly well suited to recognizing complex patterns in images, sounds, and text. Additionally, neural networks let data scientists fit the whole process into a single model without breaking the agent’s architecture into multiple modules.

Multi-level deep Q-networks for Bitcoin trading strategies

We propose a multi-level deep Q-network (M-DQN) that leverages historical Bitcoin price data and Twitter sentiment analysis. In addition, an innovative preprocessing pipeline is introduced to extract valuable insights from the data, which are then input into the M-DQN model. In our experiments, this integration produced a noteworthy increase in investment value from the initial amount and a Sharpe ratio above 2.7, a measure of risk-adjusted return.

Reinforcement learning is applicable in numerous industries, including internet advertising and eCommerce, finance, robotics, and manufacturing. Let’s take a closer look at these use cases.

File Structure

The main folder finrl has three subfolders: applications, agents, and meta. We employ a train-test-trade pipeline with three files: train.py, test.py, and trade.py; a sketch of that pipeline follows the directory tree below.

FinRL
├── finrl (main folder)
│   ├── applications
│   │   ├── Stock_NeurIPS2018
│   │   ├── imitation_learning
│   │   ├── cryptocurrency_trading
│   │   ├── high_frequency_trading
│   │   ├── portfolio_allocation
│   │   └── stock_trading
│   ├── agents
│   │   ├── elegantrl
│   │   ├── rllib
│   │   └── stablebaseline3
│   ├── meta
│   │   ├── data_processors
│   │   ├── env_cryptocurrency_trading
│   │   ├── env_portfolio_allocation
│   │   ├── env_stock_trading
│   │   ├── preprocessor
│   │   ├── data_processor.py
│   │   ├── meta_config_tickers.py
│   │   └── meta_config.py
│   ├── config.py
│   ├── config_tickers.py
│   ├── main.py
│   ├── plot.py
│   ├── train.py
│   ├── test.py
│   └── trade.py
│
├── examples
├── unit_tests (unit tests to verify codes on env & data)
│   ├── environments
│   │   └── test_env_cashpenalty.py
│   └── downloaders
│       ├── test_yahoodownload.py
│       └── test_alpaca_downloader.py
├── setup.py
├── requirements.txt
└── OVERVIEW.md
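A minimal sketch of the train-test-trade pipeline is below. The function names mirror the three entry-point files, but the exact signatures, argument names, and tickers are assumptions for illustration, not the repository's actual API.

```python
# Hypothetical driver for the train-test-trade pipeline; the imports mirror
# the entry-point files, but the exact signatures are assumptions.
from finrl import train, test, trade

TICKERS = ["AAPL", "MSFT", "JPM"]  # illustrative ticker list

# 1. Train an agent on historical data.
train.train(ticker_list=TICKERS, start_date="2015-01-01", end_date="2020-12-31")

# 2. Backtest the trained agent on held-out data.
test.test(ticker_list=TICKERS, start_date="2021-01-01", end_date="2021-12-31")

# 3. Deploy to paper/live trading once backtests look reasonable.
trade.trade(ticker_list=TICKERS)
```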

Supported Data Sources

| Data Source | Type | Range and Frequency | Request Limits | Raw Data | Preprocessed Data |
|---|---|---|---|---|---|
| Akshare | CN Securities | 2015-now, 1 day | Account-specific | OHLCV | Prices & Indicators |
| Alpaca | US Stocks, ETFs | 2015-now, 1 min | Account-specific | OHLCV | Prices & Indicators |
| Baostock | CN Securities | 1990-12-19-now, 5 min | Account-specific | OHLCV | Prices & Indicators |
| Binance | Cryptocurrency | API-specific, 1 s, 1 min | API-specific | Tick-level daily aggregated trades, OHLCV | Prices & Indicators |
| CCXT | Cryptocurrency | API-specific, 1 min | API-specific | OHLCV | Prices & Indicators |
| EODhistoricaldata | US Securities | Frequency-specific, 1 min | API-specific | OHLCV | Prices & Indicators |
| IEXCloud | NMS US securities | 1970-now, 1 day | 100 per second per IP | OHLCV | Prices & Indicators |
| JoinQuant | CN Securities | 2005-now, 1 min | 3 requests each time | OHLCV | Prices & Indicators |
| QuantConnect | US Securities | 1998-now, 1 s | NA | OHLCV | Prices & Indicators |
| RiceQuant | CN Securities | 2005-now, 1 ms | Account-specific | OHLCV | Prices & Indicators |
| Sinopac | Taiwan securities | 2023-04-13~now, 1 min | Account-specific | OHLCV | Prices & Indicators |
| Tushare | CN Securities, A share | -now, 1 min | Account-specific | OHLCV | Prices & Indicators |
| WRDS | US Securities | 2003-now, 1 ms | 5 requests each time | Intraday Trades | Prices & Indicators |
| YahooFinance | US Securities | Frequency-specific, 1 min | 2,000/hour | OHLCV | Prices & Indicators |

OHLCV: open, high, low, and close prices, plus volume. adjusted_close: the adjusted close price.
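For example, the YahooFinance row above corresponds to bars that can be fetched with the community `yfinance` package. This is a generic illustration with an arbitrary symbol and date range, not this repository's data-processor code.

```python
import yfinance as yf

# Download daily OHLCV bars for one Dow ticker (illustrative symbol and dates).
df = yf.download("AAPL", start="2020-01-01", end="2021-01-01", interval="1d")

# Columns include Open, High, Low, Close, Adj Close, and Volume.
print(df.head())
```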

Technical indicators: 'macd', 'boll_ub', 'boll_lb', 'rsi_30', 'dx_30', 'close_30_sma', 'close_60_sma'. Users can also add new features; a sketch of computing these columns follows.
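These indicator names match the column convention of the `stockstats` package, so one way to compute them is shown below. This is an assumption about the preprocessing step, not a verbatim excerpt from the repository.

```python
import pandas as pd
from stockstats import StockDataFrame

INDICATORS = ["macd", "boll_ub", "boll_lb", "rsi_30", "dx_30",
              "close_30_sma", "close_60_sma"]

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Append the technical-indicator columns to an OHLCV DataFrame."""
    sdf = StockDataFrame.retype(df.copy())  # lower-cases columns: open, high, ...
    for name in INDICATORS:
        df[name] = sdf[name].values  # accessing a column triggers its computation
    return df
```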

Building upon the foundations of Q-learning, DQN is an extension that combines reinforcement learning with deep learning techniques. It uses a deep neural network as an approximator to estimate the action-value function Q(s, a). DQN addresses the main challenges of traditional Q-learning, such as learning stability. Moreover, by employing deep learning, DQN can handle high-dimensional state spaces, such as those encountered in image-based tasks or large-scale problems.
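A minimal Q-network and temporal-difference target in PyTorch, as a sketch of the idea rather than the M-DQN model itself; the layer sizes and helper names are arbitrary.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Approximates Q(s, a) for all discrete actions given a state vector."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def td_target(q_target: QNetwork, reward: torch.Tensor,
              next_state: torch.Tensor, done: torch.Tensor,
              gamma: float = 0.99) -> torch.Tensor:
    """TD target r + gamma * max_a' Q(s', a') for a batch of transitions."""
    with torch.no_grad():
        max_next_q = q_target(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * max_next_q
```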

News articles

News recommendation. Machine learning has made it possible for businesses to personalize customer interactions at scale by analyzing data on customer preferences, backgrounds, and online behavior patterns.

However, recommending content such as online news is still a complex task: news features are dynamic by nature and become irrelevant quickly, and user preferences in topics change as well. A Deep Reinforcement Learning Framework for News Recommendation discusses three main challenges of news recommendation methods. We used its Deep Q-Learning based recommendation framework, which considers the current reward and future reward simultaneously, in addition to using user return as feedback.

Installation

Status Update

Version History
  • 2022-06-25 0.3.5: Formal release of FinRL; neo_finrl is changed to FinRL-Meta, with related files in directory: meta.
  • 2021-08-25 0.3.1: PyTorch version with a three-layer architecture: apps (financial tasks), drl_agents (DRL algorithms), neo_finrl (gym envs).
  • 2020-12-14: Upgraded to PyTorch with stable-baselines3; removed TensorFlow 1.0 for now, with TensorFlow 2.0 support under development.
  • 2020-11-27 0.1: Beta version with TensorFlow 1.5.

Tutorials

Publications

| Title | Conference/Journal | Link | Citations | Year |
|---|---|---|---|---|
| Dynamic Datasets and Market Environments for Financial Reinforcement Learning | Machine Learning - Springer Nature | paper, code | 7 | 2024 |
| FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning | NeurIPS 2022 | paper, code | 37 | 2022 |
| FinRL: Deep reinforcement learning framework to automate trading in quantitative finance | ACM International Conference on AI in Finance (ICAIF) | paper | 49 | 2021 |
| FinRL: A deep reinforcement learning library for automated stock trading in quantitative finance | NeurIPS 2020 Deep RL Workshop | paper | 87 | 2020 |
| Deep reinforcement learning for automated stock trading: An ensemble strategy | ACM International Conference on AI in Finance (ICAIF) | paper, code | 154 | 2020 |
| Practical deep reinforcement learning approach for stock trading | NeurIPS 2018 Workshop on Challenges and Opportunities for AI in Financial Services | paper, code | 164 | 2018 |

News

Returns the latest news articles across stocks and crypto (10 by default). Example response for symbol BTCUSD (Bitcoin), start 2024-08-04, end 2024-11-04:

{ "news": [ { "author": "Bibhu Pattnaik", "content": "

Crypto analyst Kevin Svenson predicts that Bitcoin (CRYPTO: <a class="ticker" href="https://www.benzinga.com/quote/btc/usd">BTC) could see an upsurge of up to 86%, <a href="https://www.benzinga.com/markets/cryptocurrency/23/12/36274982/crypto-analyst-forecasts-monumental-bitcoin-rally-by-2026">potentially hitting the $100,000 mark.

\n\n\n\n

What Happened: Svenson said that Bitcoin has formed a bullish divergence pattern on the daily chart. This pattern emerges when the asset’s price is trading down or sideways, while an oscillator like the relative strength index (RSI) is in an uptrend, indicating increasing bullish momentum.

\n\n\n\n

In a video <a href="https://www.youtube.com/watch?v=dU_qfQRtxks">post, Svenson said, “We got actually a slightly higher low in the RSI – you could also call it flat support, horizontal support. And upon that flat support, we had lower lows in price. That is a bullish divergence.”

\n\n\n\n

He also observed a broadening pattern in the Bitcoin chart, characterized by lower highs and even lower lows, which could be interpreted as a bullish continuation pattern if the asset breaks its diagonal resistance.

\n\n\n\n

“And so what is happening on the Bitcoin chart? Well, what we see is a broadening pattern of sorts – lower highs but even lower lows," he added.

\n\n\n\n

Also Read: <a href="https://www.benzinga.com/markets/cryptocurrency/24/06/39241558/analyst-predicts-bitcoin-to-reach-groundbreaking-100-000-milestone">Analyst Predicts Bitcoin To Reach Groundbreaking $100,000 Milestone

\n\n\n\n
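One way to issue such a request is sketched below, assuming an Alpaca-style news endpoint; the URL, header names, and response shape are assumptions based on Alpaca's public data API, not something verified against this repository.

```python
import requests

# Assumed Alpaca-style news endpoint; URL and credentials are illustrative.
URL = "https://data.alpaca.markets/v1beta1/news"
HEADERS = {
    "APCA-API-KEY-ID": "your_key_id",
    "APCA-API-SECRET-KEY": "your_secret",
}
params = {"symbols": "BTCUSD", "start": "2024-08-04",
          "end": "2024-11-04", "limit": 10}

resp = requests.get(URL, headers=HEADERS, params=params, timeout=10)
resp.raise_for_status()
for article in resp.json().get("news", []):
    print(article.get("author"), "-", article.get("headline", "")[:80])
```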

Citing

DRL based trading agents: Risk driven learning for financial rules-based policy

A General Portfolio Optimization Environment

Join and Contribute

Welcome to the JojoFinance community!

Contributors

Thank you!

LICENSE

MIT License

Disclaimer: We share this code for academic purposes under the MIT License. Nothing herein is financial advice or a recommendation to trade real money. Please use common sense and always consult a professional before trading or investing.