This document is the final report for the Machine Learning for Trading
course. I have implemented two manual strategies, a strategy learner based
on random trees, and one based on Q-learning.
# Experiment 1
I have implemented two manual strategies. The first strategy buys on a
bullish MACD cross while the MACD is below zero and sells on a bearish
MACD cross while the MACD is above one.
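The crossover logic can be sketched with pandas. The 12/26/9 spans below are the common MACD defaults and an assumption on my part, not necessarily the exact parameters used here:

```python
import pandas as pd

def macd_signals(price: pd.Series, fast: int = 12, slow: int = 26,
                 signal: int = 9):
    """Sketch of the MACD crossover rules described above.
    Returns boolean buy/sell series aligned with `price`."""
    macd = (price.ewm(span=fast, adjust=False).mean()
            - price.ewm(span=slow, adjust=False).mean())
    sig = macd.ewm(span=signal, adjust=False).mean()
    above = macd > sig
    crossed_up = above & ~above.shift(1, fill_value=False)    # bullish cross
    crossed_down = ~above & above.shift(1, fill_value=False)  # bearish cross
    buy = crossed_up & (macd < 0)    # buy: bullish cross below zero
    sell = crossed_down & (macd > 1) # sell: bearish cross above one
    return buy, sell
```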
The second strategy uses the MACD diff (the difference between the MACD and
the MACD signal line), RSI, and the price SMA with a period of eight. I have
plotted each indicator against the subsequent one-, three-, and five-day
returns to find reasonable thresholds for the strategy.
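A minimal way to pair each day with its forward returns for those scatter plots, assuming daily prices in a pandas Series (the column names below are made up):

```python
import pandas as pd

def forward_return(price: pd.Series, n: int) -> pd.Series:
    """n-day forward return for each date: price[t+n] / price[t] - 1.
    The last n entries are NaN because no future price exists."""
    return price.shift(-n) / price - 1

def returns_table(price: pd.Series) -> pd.DataFrame:
    """One column per horizon, ready to scatter against an indicator."""
    return pd.DataFrame({f"ret_{n}d": forward_return(price, n)
                         for n in (1, 3, 5)})
```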

Based on the scatter plots, I have created a list of buy and sell
signals. Each signal uses the current number of shares held and one of
the three indicators. The following figure shows the results of both
manual strategies compared to the benchmark. Both approaches do well in
the in-sample period but worse afterward, which I expected because I
cherry-picked the thresholds from the in-sample period's scatter plots.
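A hypothetical example of one such rule, combining the current holdings with a single indicator. The RSI thresholds of 30/70 and the 1000-share position size are illustrative assumptions, not the thresholds picked from the scatter plots:

```python
def order(holdings: int, rsi: float) -> int:
    """Return the number of shares to trade (positive = buy,
    negative = sell) given current holdings and one indicator."""
    if holdings <= 0 and rsi < 30:    # oversold -> move to +1000 shares
        return 1000 - holdings
    if holdings >= 0 and rsi > 70:    # overbought -> move to -1000 shares
        return -1000 - holdings
    return 0                          # otherwise hold
```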

Next, I have implemented a random-tree-based strategy learner. The
learner uses a leaf size of five and no bagging. A smaller leaf size
would overfit the in-sample data, but as the following chart shows,
five works well, and the RT learner also performs well on the
out-of-sample data.
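One common way to turn prices into training labels for a classification tree learner like this; the five-day horizon and 2% threshold below are assumptions for illustration, not the values used in the report:

```python
import numpy as np

def make_labels(prices: np.ndarray, horizon: int = 5,
                thresh: float = 0.02) -> np.ndarray:
    """Label each day by its future return: +1 (buy) if the
    `horizon`-day return exceeds `thresh`, -1 (sell) if it is
    below -`thresh`, 0 (hold) otherwise."""
    ret = prices[horizon:] / prices[:-horizon] - 1.0
    labels = np.zeros(len(ret), dtype=int)
    labels[ret > thresh] = 1
    labels[ret < -thresh] = -1
    return labels
```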

I have also implemented a strategy learner based on Q-learning. The
Q-learner uses fifteen training runs on the in-sample data. It mostly
does well on the out-of-sample data, but the RT-based strategy learner
appears to be better.
I am using a bin size of five for the three indicators mentioned above.
Combined with the three possible holdings, that results in 375
(3 × 5 × 5 × 5) states, with only about 500 in-sample data points. The
Q-learner is probably overfitting to the in-sample data. Indeed, with
bin sizes of four, the Q-learner performs better on the out-of-sample
data.
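The state count follows from three holding levels times five bins for each of the three indicators. A sketch of one possible index mapping, assuming holdings in {-1000, 0, 1000} and pre-binned indicator values:

```python
def state_index(holding: int, bins: tuple, n_bins: int = 5) -> int:
    """Map (holding, indicator bins) to one Q-table row.
    Three holding levels and three indicators with `n_bins` bins
    each give 3 * n_bins**3 = 375 states for n_bins = 5."""
    h = {-1000: 0, 0: 1, 1000: 2}[holding]
    s = h
    for b in bins:           # mixed-radix encoding, base n_bins
        s = s * n_bins + b
    return s
```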

# Experiment 2
Experiment 2 aims to show that the strategy learner trades differently
when there is a commission and the impact is not zero. The RT-based
trader does not consider the commission value, but the Q-learning-based
trader does.
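One way a Q-trader can "see" trading costs is through its reward signal. This is a sketch of that idea under assumed parameter names, not the report's exact code:

```python
def reward(prev_value: float, value: float, traded: bool,
           commission: float = 9.95, impact: float = 0.005,
           trade_value: float = 0.0) -> float:
    """Daily reward = change in portfolio value, minus commission
    and market impact on days when a trade was executed."""
    r = value - prev_value
    if traded:
        r -= commission + impact * trade_value
    return r
```

With costs baked into the reward, a higher commission makes marginal trades unprofitable, which is what pushes the learner toward fewer transactions.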
However, a commission smaller than $10 does not seem to affect the
number of trades significantly. Only when the commission is around
$50, or with a slippage of 1%, do we see considerably fewer
transactions.
| commission ($) | n_orders |
|----------------|----------|
| 9.95           | 79       |
| 20             | 83       |
| 50             | 63       |
| 100            | 37       |
# Closing Remarks
Machine Learning for Trading is a great course. It gives an excellent
introduction to finance, trading, and machine learning without getting lost in
technical or mathematical details. I have enjoyed building decision tree
learners and a Q-learner from first principles. At the same time, the course
teaches powerful libraries such as NumPy and Pandas along the way.