During my postgrad studies this semester I analysed a few machine learning algorithms in an attempt to predict photometric redshifts (see the video here: https://www.youtube.com/watch?v=BARCby6X0uk). I ended up focussing on Classification And Regression Trees (CART) and their ensembles. I started with standard regression trees, then moved on to random forests and boosting, and found that random forests combined with boosting gave the lowest RMS error.
I have begun to think about using the same approach to forecast stock prices. CART is a form of supervised learning that seeks a map $f: \mathbb{R}^n \rightarrow \mathbb{R}$ by using binary trees to partition the measurement space $\chi$, which consists of all measurement vectors $\vec{x}_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,n})$, so that every measurement vector falls into exactly one class $j$. For continuous output, a regression is performed within each class.
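To make the partitioning idea concrete, here is a minimal sketch of a CART-style regression tree. It is not the code I used; it assumes the textbook choices of picking each binary split to minimise the summed squared error of the two children, and of letting each leaf (class) predict the mean target of the training points that reach it. The names `build_tree`, `predict`, `max_depth` and `min_leaf` are illustrative only, and the random forest and boosting ensembles are not covered here.

```python
# Minimal CART-style regression tree (an illustrative sketch).
# Each split minimises the summed squared error (SSE) of the two children;
# each leaf predicts the mean target of the training points that reach it.


def sse(values):
    """Sum of squared deviations of `values` from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=3, min_leaf=2):
    """Return a leaf value, or a node (feature index, threshold, left, right)."""
    leaf = sum(y) / len(y)
    if depth == max_depth or len(y) < 2 * min_leaf:
        return leaf
    best = None  # (cost, feature index, threshold)
    for j in range(len(X[0])):
        order = sorted(range(len(y)), key=lambda i: X[i][j])
        # k is the number of points sent to the left child
        for k in range(min_leaf, len(order) - min_leaf + 1):
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # identical values cannot be separated by a threshold
            cost = sse([y[i] for i in order[:k]]) + sse([y[i] for i in order[k:]])
            if best is None or cost < best[0]:
                best = (cost, j, (lo + hi) / 2)
    if best is None or best[0] >= sse(y):
        return leaf  # no split reduces the error
    _, j, t = best
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return (
        j,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_leaf),
        build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_leaf),
    )


def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's mean."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


# Toy data: one feature with two obvious clusters
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
tree = build_tree(X, y, max_depth=1, min_leaf=1)
print(tree[:2])  # (0, 6.5)
```

Here each leaf fits a constant (the mean), which is the simplest "regression within each class"; fitting a small linear model per leaf is a possible alternative.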
After some research (and stumbling onto Axel Sunden's thesis) I started with a crude model: I defined the measurement vector $\vec{x} =$ (Opening Price, Closing Price, Today's Low, Volume, Fast Stochastic, Slow Stochastic) and set out to predict tomorrow's High Price. I already had the code in place, so it was just a matter of changing the input data and training the algorithm.
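The post does not spell out how the two stochastic features are computed, so the sketch below assumes common textbook definitions: fast %K over an `n`-day window (14 by default) and a "slow" line that is a `smooth`-day (3 by default) simple moving average of fast %K. It also shows how each day's measurement vector can be paired with the next day's high as the regression target. The function names are hypothetical.

```python
# Sketch: stochastic oscillator features and next-day-high targets.
# Assumed definitions (not taken from the post):
#   fast %K_t = 100 * (close_t - lowest low over n days) / (highest high - lowest low)
#   slow      = `smooth`-day simple moving average of fast %K


def sma(values, window):
    """Trailing simple moving average; None until `window` non-None values exist."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


def stochastics(high, low, close, n=14, smooth=3):
    """Return (fast %K, slow) lists aligned with the input days."""
    fast_k = []
    for t in range(len(close)):
        if t < n - 1:
            fast_k.append(None)  # not enough history yet
            continue
        hh = max(high[t - n + 1 : t + 1])
        ll = min(low[t - n + 1 : t + 1])
        rng = hh - ll
        # Flat window: nothing to normalise by, so use the midpoint (a convention)
        fast_k.append(100.0 * (close[t] - ll) / rng if rng > 0 else 50.0)
    return fast_k, sma(fast_k, smooth)


def make_dataset(open_, close, low, volume, fast, slow, high):
    """Pair day t's measurement vector with day t+1's high (the target)."""
    X, y = [], []
    for t in range(len(close) - 1):  # the last day has no 'tomorrow'
        row = [open_[t], close[t], low[t], volume[t], fast[t], slow[t]]
        if any(v is None for v in row):
            continue  # skip indicator warm-up days
        X.append(row)
        y.append(high[t + 1])
    return X, y
```

Dropping the warm-up days keeps every training row complete; the price of this is that the first `n + smooth - 2` days of history cannot be used.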
To get the daily historical data, I used Yahoo! Finance's Python API and began by training my algo on General Electric. I used 8977 days to train it and tested it on an independent sample of 4489 days. To quantify the output, I classified the predictions as follows:
1. 'long': tomorrow's high is forecast to be more than 3% above today's.
2. 'flat': tomorrow's high is forecast to be within [-3, 3]% of today's.
3. 'short': tomorrow's high is forecast to be more than 3% below today's.
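The 8977/4489 split is two thirds training and one third testing of 13,466 days, and the three classes follow directly from the 3% threshold. The sketch below assumes the test days come after the training days (a chronological split, which avoids training on the future); the post only says the test sample was independent. `chronological_split` and `classify` are hypothetical names.

```python
def chronological_split(X, y, train_frac=2 / 3):
    """Earliest `train_frac` of days for training, the rest for testing (assumed ordering)."""
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]


def classify(today_high, tomorrow_high, threshold=3.0):
    """Label the percentage move from today's high to tomorrow's (forecast or actual) high."""
    change = 100.0 * (tomorrow_high - today_high) / today_high
    if change > threshold:
        return "long"
    if change < -threshold:
        return "short"
    return "flat"  # within [-threshold, threshold] percent


days = list(range(13466))
X_tr, y_tr, X_te, y_te = chronological_split(days, days)
print(len(X_tr), len(X_te))  # 8977 4489
print(classify(100.0, 104.0), classify(100.0, 97.0), classify(100.0, 96.0))  # long flat short
```

The same `classify` is applied twice per test day, once to the forecast high and once to the actual high, so that the two labels can be compared.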
Comparing the predicted classes with the actual ones in the test sample, the algorithm classified 93% of the days correctly. More importantly, the algo never predicted a 'long' when the market actually made a 'short' move. There were, however, 2 days where the market actually made a 'long' move and the algo called a 'short'. The figure below shows the results for the last 100 days.
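The 93% figure and the two kinds of costly mistake can both be read off a confusion matrix of (actual, predicted) label pairs. The sketch below uses made-up toy labels, not the GE results. In these terms, "never predicted a 'long' when the actual move was 'short'" means the `("short", "long")` cell is 0, and the 2 opposite calls sit in the `("long", "short")` cell.

```python
from collections import Counter

LABELS = ("long", "flat", "short")


def confusion(actual, predicted):
    """Counts keyed by (actual, predicted) for every label pair, zeros included."""
    counts = Counter(zip(actual, predicted))
    return {(a, p): counts[(a, p)] for a in LABELS for p in LABELS}


def accuracy(actual, predicted):
    """Fraction of days whose predicted class matches the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Toy labels for illustration only (not the GE results)
actual = ["long", "flat", "short", "long", "flat"]
predicted = ["long", "flat", "short", "short", "flat"]
m = confusion(actual, predicted)
print(accuracy(actual, predicted))  # 0.8
print(m[("short", "long")], m[("long", "short")])  # 0 1
```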
While this was a very crude model, I believe it can be made more successful, perhaps by using weekly sampled data to get a more definitive long or short signal, and by adding more technical analysis indicators, which I will implement soon.