Deep Q-Learning with Keras and Gym - balancing the CartPole game with Q-learning
https://quantdare.com/deep-reinforcement-trading/
Paper: Deep Q-Network (DQN) by DeepMind
Paper: Capturing Financial Markets to Apply Deep Reinforcement Learning
As illustrated in the figure below, the DQN model's inputs are:
- historical stock data
- historical market data
- investment status and reward

The model's outputs (action predictions) are:
- hold
- buy
- sell
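
A minimal sketch of what such a policy network could look like in Keras (the layer sizes, the flat `state_size` input, and the three-action output head are assumptions for illustration, not the exact architecture used in this repo):

```python
# Hypothetical DQN policy network: maps a window of recent market features
# to one Q-value per action (hold, buy, sell).
from keras.models import Sequential
from keras.layers import Dense

def build_model(state_size, action_size=3):
    model = Sequential()
    model.add(Dense(64, input_dim=state_size, activation="relu"))
    model.add(Dense(32, activation="relu"))
    model.add(Dense(action_size, activation="linear"))  # Q-value for each action
    model.compile(loss="mse", optimizer="adam")
    return model
```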
The market environment is essentially a time-series data-frame (RNNs work well with time-series data). There is a zero-market-impact hypothesis, which states that the agent's actions are never significant enough to affect the market environment.
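
For example, a daily CSV exported from Yahoo! Finance can be loaded straight into such a data-frame (the file name below is illustrative; the `Close` column matches Yahoo!'s export format):

```python
import pandas as pd

# Load daily data exported from Yahoo! Finance into files/input/.
# The agent only reads from this frame, so its actions never move the
# prices - the zero-market-impact assumption.
df = pd.read_csv("files/input/GSPC_2015.csv")  # illustrative file name
close_prices = df["Close"].values              # series the environment steps through
```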
The policy network outputs an action each day, the market returns the reward for that action (the profit), and all of this data (investment status, amount of money gained or lost) is sent back to the policy network for training.
This Q-learning implementation is applied to (short-term) stock trading. The model uses t-day windows of close prices as features to determine whether the best action to take at a given time is to buy, sell, or hold.
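
One common way to build that t-day window state is from the differences of consecutive close prices, squashed through a sigmoid (this exact featurization is an assumption; the repo may preprocess the prices differently):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def get_state(prices, t, window_size):
    # State = sigmoid of the last `window_size` day-over-day close-price changes.
    start = t - window_size
    if start >= 0:
        block = prices[start:t + 1]
    else:
        # Near the start of the series, pad by repeating the first price.
        block = np.concatenate(([prices[0]] * -start, prices[:t + 1]))
    diffs = np.diff(block)                # window_size price changes
    return sigmoid(diffs).reshape(1, -1)  # shape (1, window_size) for the network
```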
Reward shaping is a technique inspired by animal training where supplemental rewards are provided to make a problem easier to learn. There is usually an obvious natural reward for any problem. For games, this is usually a win or loss. For financial problems, the reward is usually profit. Reward shaping augments the natural reward signal by adding additional rewards for making progress toward a good solution.
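
Put together with the window state sketched above, one training episode could look roughly like the loop below. Here the shaped reward credits only profitable sells (the raw profit would be the natural reward), and `agent` stands in for whatever DQN agent class the repo defines - this is a sketch of the idea, not the repo's exact loop:

```python
# Hypothetical daily trading loop: act, book-keep positions, shape the reward,
# and store each transition for experience replay.
def run_episode(agent, prices, window_size):
    inventory, total_profit = [], 0.0
    for t in range(len(prices) - 1):
        state = get_state(prices, t, window_size)
        action = agent.act(state)                 # 0 = hold, 1 = buy, 2 = sell
        reward = 0.0
        if action == 1:                           # buy: remember the entry price
            inventory.append(prices[t])
        elif action == 2 and inventory:           # sell: realize the profit
            bought_price = inventory.pop(0)
            profit = prices[t] - bought_price     # natural reward (can be negative)
            reward = max(profit, 0.0)             # shaped reward: never punish
            total_profit += profit
        next_state = get_state(prices, t + 1, window_size)
        done = (t == len(prices) - 2)
        agent.remember(state, action, reward, next_state, done)
    return total_profit
```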
Learning is based on both immediate and long-term reward. To make the model perform well in the long term, we need to take into account not only the immediate rewards but also the future rewards we are going to get.
To do this, we use a 'discount rate' or 'gamma'. If gamma = 0, the agent only learns to consider current rewards; if gamma = 1, the agent strives for a long-term high reward. This way the agent learns to maximize the discounted future reward based on the given state.
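
Concretely, the discounted return is r_0 + gamma*r_1 + gamma^2*r_2 + ...; the toy example below shows how gamma = 0 ignores everything but the immediate reward while gamma close to 1 lets a future payoff dominate:

```python
def discounted_return(rewards, gamma):
    # R = r_0 + gamma*r_1 + gamma^2*r_2 + ...
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

rewards = [1.0, 0.0, 0.0, 10.0]          # a big payoff three steps ahead
print(discounted_return(rewards, 0.0))   # 1.0   -> only the immediate reward counts
print(discounted_return(rewards, 0.95))  # ~9.57 -> the future payoff dominates
```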
The model is not very good at making long-term decisions, but it is quite good at predicting peaks and troughs.
We use the TD (temporal difference) method: in plain English, the maximum future reward for this state and action (s, a) is the immediate reward r plus the maximum future reward for the next state.
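
In code, that TD target is typically computed inside the experience-replay step, roughly as below (the batch handling and exact Keras calls are a sketch, assuming a `model` built like the one above with one Q-value output per action):

```python
import random
import numpy as np

def replay(model, memory, batch_size=32, gamma=0.95):
    # Sample stored (state, action, reward, next_state, done) transitions and
    # fit the network towards the TD target: r + gamma * max_a' Q(s', a').
    batch = random.sample(memory, min(batch_size, len(memory)))
    for state, action, reward, next_state, done in batch:
        target = reward
        if not done:
            target = reward + gamma * np.amax(model.predict(next_state, verbose=0)[0])
        q_values = model.predict(state, verbose=0)
        q_values[0][action] = target            # only the taken action is updated
        model.fit(state, q_values, epochs=1, verbose=0)
```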
The model gets updated every few days.
Some examples of results on test sets:
- ^GSPC (S&P 500), 2015. Profit of $431.04.
- Alibaba Group Holding Ltd, 2015. Loss of $351.59.
- Apple, Inc, 2016. Profit of $162.73.
- Google, Inc, August 2017. Profit of $19.37.
- Install Python 3.7 - Anaconda, Python, IPython, and Jupyter notebooks
  - installing packages
  - conda environments
- Download data - training and test csv files from Yahoo! Finance
  - put them in files/input/
- Bring other features if you have them
- Train the model. For good results run:
  - with a minimum of 200 episodes
  - on all data (not just 2011)
  - with a GPU https://www.paperspace.com

  python rl_dqn.py

- See the 2 plots generated:
  - profits over time
  - trades over time
- Back-test the last model created in files/output/ on any stock:

  python backtest.py


