Monday, July 6, 2015

Machine Learning For Stock Price Forecasting

During my postgrad studies this semester I undertook an analysis of a few machine learning algorithms in an attempt to predict photometric redshifts (see the video here: https://www.youtube.com/watch?v=BARCby6X0uk ). I ended up focusing on Classification And Regression Trees (CART) as well as their ensembles. I started with standard regression trees, moved on to random forests and boosting, and found that random forests coupled with boosting produced the lowest RMS error. I have begun to think about using the same approach to forecast stock prices.

CART is a form of supervised learning that looks to find a map $$f: \mathbb{R}^n \rightarrow \mathbb{R}$$ by dividing the measurement space $\chi$, consisting of all the measurement vectors $\vec{x}_i = (x_{i, 1}, x_{i, 2}, ..., x_{i, n})$, using binary trees, such that every measurement vector is mapped to a class $j$. For continuous output, a regression is performed within each class.

After some research (and stumbling onto Axel Sunden's thesis) I made a crude start: I defined the measurement vector $\vec{x} =$ (Opening Price, Closing Price, Today's Low, Volume, Fast Stochastic, Slow Stochastic), looking to predict tomorrow's high price. I already had the code in place, so it was just a matter of changing the input data and training the algorithm. To get the daily historical data, I used Yahoo! Finance's Python API and began with General Electric to train my algo. I used 8977 days to train and tested on an independent sample of 4489 days. To quantify the output, I classified the predictions as follows (a small sketch of this thresholding appears after the list):
  • 1. 'long' - tomorrow's high is forecasted to be more than 3% higher than today's.
  • 2. 'flat' - tomorrow's high is forecasted to be within [-3, 3]% of today's.
  • 3. 'short' - tomorrow's high is forecasted to be more than 3% lower than today's.
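As a minimal sketch of this thresholding (the original pipeline is in Python using the Yahoo! Finance data; the C++ helper below and its names are only illustrative):

#include <string>

// Label tomorrow's forecasted high relative to today's high using the
// +/-3% bands above. Illustrative only; not the original Python code.
std::string classifyForecast(double todayHigh, double forecastHigh)
{
    double pctChange = 100.0 * (forecastHigh - todayHigh) / todayHigh;
    if (pctChange > 3.0)
        return "long";    // forecast more than 3% above today's high
    if (pctChange < -3.0)
        return "short";   // forecast more than 3% below today's high
    return "flat";        // forecast within the [-3, 3]% band
}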
Comparing the predicted with the actual sample results, the algorithm classified 93% of the days correctly. More importantly, the algo never predicted a 'long' when the actual sample reflected a 'short' move. There were, however, 2 predictions where the actual market reflected a 'long' and the algo called a 'short' move. The figure below shows results for the last 100 days.


While this was a very crude model, I believe it can be made more successful with improvements: perhaps using weekly sampled data to sharpen a definitive market long or short call, and adding more technical analysis indicators, which I will implement soon.

Thursday, July 2, 2015

Monte-Carlo Option Pricing Via Encapsulation

In the spirit of Object Oriented Programming, we revisit our previous post, using 'C++ Design Patterns and Derivative Pricing' by M. Joshi to extend vanilla option pricing from calls to puts as well. We wish to encapsulate the code using separate compilation and header files. This allows the code to be used in such a way that a programmer need not know what is going on behind the scenes of the class but is still able to use it.

We begin by defining a class PayOff in the header PayOff1.h. It holds an enumeration declaration for the option types, call or put, as well as the constructor for the PayOff, which takes in the strike of the option and the type of the option pay-off. Lastly, in the public section we have the main method of this class, 'double PayOff::operator()(double spot) const', which is given a value of spot and returns the value of the pay-off. The operator() has been overloaded so that the object can be used as a 'functor'. We store the variables 'strike' and 'TheOptionsType' in the private section.

We then define the implementation file 'PayOff1.cpp', which includes the above header file and defines the constructor as well as the method operator(). By separating the class declaration from its implementation we keep the private data and the implementation details hidden from users of the class. Simply put, 'PayOff::operator()(double spot) const' looks at the option type, call or put, and defines the pay-off accordingly. Since puts are bets on prices falling, a put only pays out when the strike price is greater than the spot price; otherwise the pay-off is 0.

We then define a header file called 'SimpleMC.h' storing the function declaration for our simulation calculator, whose implementation lives in 'SimpleMC.cpp'. This function is exactly the same as the 'call1' function from the previous post (http://youth-economics.blogspot.com/2015/06/monte-carlo-methods-for-valuing-call.html), except that the function declaration now takes a reference to a PayOff object, thePayOff, and, since we are now calculating both calls and puts, lines 22 and 23 of the previous calculator (the call pay-off calculation) are replaced with a call to thePayOff. The main programme is also the same as in the previous post, except when reading off the final price of the put or call: we construct a PayOff for a call and one for a put, and the functor operator() supplies the pay-off that is plugged into the Monte Carlo engine to calculate the call and put prices.

Calculating option prices with Spot = 80, Strike = 100, time = 1 year, volatility = 50% and a risk-free rate of 4%, using 100 000 paths gives Call: 10.3117 and Put: 26.6019, which are very close to the theoretical prices given by http://www.option-price.com/index.php
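For illustration, here is a sketch of what the header and implementation described above might look like. The member names roughly follow the post's description; the rest is my own reconstruction rather than Joshi's exact listing.

// PayOff1.h
#ifndef PAYOFF1_H
#define PAYOFF1_H

class PayOff
{
public:
    enum OptionType { call, put };                 // the two pay-off types supported

    PayOff(double Strike_, OptionType TheOptionsType_);
    double operator()(double Spot) const;          // the pay-off as a 'functor'

private:
    double Strike;
    OptionType TheOptionsType;
};

#endif

// PayOff1.cpp
#include "PayOff1.h"
#include <algorithm>

PayOff::PayOff(double Strike_, OptionType TheOptionsType_)
    : Strike(Strike_), TheOptionsType(TheOptionsType_)
{
}

double PayOff::operator()(double Spot) const
{
    switch (TheOptionsType)
    {
    case call:
        return std::max(Spot - Strike, 0.0);       // call pays when Spot > Strike
    case put:
        return std::max(Strike - Spot, 0.0);       // put pays when Strike > Spot
    default:
        return 0.0;                                // unknown type: no pay-off
    }
}

The Monte Carlo routine then simply calls thePayOff(thisSpot) on each simulated spot, so the same engine prices both calls and puts.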

Tuesday, June 23, 2015

Monte-Carlo Methods For Valuing A Call Option

In Paul Wilmott's FAQ in Quantitative Finance, he describes three main numerical methods for valuing option contracts: finite difference methods, numerical quadrature and Monte Carlo methods. In this post, we will explore how to value a simple call option with a C++ implementation. This follows the work described in 'C++ Design Patterns and Derivative Pricing' by M. Joshi.

We begin with a model of stock price evolution under a risk-neutral process with risk-free rate $r$: $$ dS_t = rS_t\,dt + \sigma S_t\, dW_t$$ Solving the stochastic differential equation using Ito's lemma gives $$ d\log S_t = \left(r-\frac{1}{2}\sigma^2 \right)dt + \sigma\, dW_t$$ Using the fact that the process has constant coefficients and that $W_T \sim \sqrt{T}\,N(0, 1)$, where $T$ is the time to expiry, we obtain $$ \log S_T = \log S_0 + \left(r-\frac{1}{2}\sigma^2 \right)T + \sigma \sqrt{T}N(0, 1)$$ so that $$S_T = S_0 e^{\left(r-\frac{1}{2}\sigma^2 \right)T + \sigma \sqrt{T}N(0, 1)}$$ Black–Scholes pricing theory then tells us that the price of a vanilla option with expiry $T$ and pay-off $f$ is the discounted expectation of the pay-off, so the price $P$ is given by $$P = e^{-rT}\,E\left[f\left(S_0 e^{\left(r-\frac{1}{2}\sigma^2 \right)T + \sigma \sqrt{T}N(0, 1)}\right)\right] $$ Thus all that is required is to generate normal $(0, 1)$ random numbers, plug them into the formula and average the discounted pay-offs.

There are a few differences from the code in Joshi's book. For example, instead of using header files to generate random numbers via Box-Muller or Gaussian summation, I have opted to use the <random> library and its normal distribution generator. I will be trying to work through the subsequent chapters of Joshi's book to generalise the code to puts and other option types while incorporating OOP practices.
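A minimal, self-contained sketch of the calculator along these lines, using the <random> library and std::normal_distribution as described; the function and variable names are my own and the parameters in main are only an example, not the post's original listing.

#include <algorithm>
#include <cmath>
#include <iostream>
#include <random>

// Monte Carlo price of a vanilla call under risk-neutral GBM:
// S_T = S_0 * exp((r - 0.5*sigma^2)*T + sigma*sqrt(T)*Z),  Z ~ N(0, 1).
double MonteCarloCall(double expiry, double strike, double spot,
                      double vol, double r, unsigned long numberOfPaths)
{
    std::mt19937_64 generator(42);                      // fixed seed for reproducibility
    std::normal_distribution<double> gaussian(0.0, 1.0);

    double rootVariance  = vol * std::sqrt(expiry);
    double itoCorrection = -0.5 * vol * vol * expiry;
    double movedSpot     = spot * std::exp(r * expiry + itoCorrection);

    double runningSum = 0.0;
    for (unsigned long i = 0; i < numberOfPaths; ++i)
    {
        double z        = gaussian(generator);
        double thisSpot = movedSpot * std::exp(rootVariance * z);
        runningSum     += std::max(thisSpot - strike, 0.0);   // call pay-off
    }

    double mean = runningSum / numberOfPaths;
    return std::exp(-r * expiry) * mean;                // discount back to today
}

int main()
{
    // Example inputs: spot 80, strike 100, 1 year, 50% vol, 4% rate, 100 000 paths.
    std::cout << MonteCarloCall(1.0, 100.0, 80.0, 0.5, 0.04, 100000) << "\n";
}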

Tuesday, May 7, 2013

Need to price an option? Easy. While the Financial Accounting Standards Board lists a few methods, the typical gold standard at the moment is the Black-Scholes formula, made famous by Fischer Black, Myron Scholes and Robert Merton; it earned a Nobel prize and has been deemed a financial weapon of mass destruction. It has gained extreme publicity and yet is relatively simple to compute. A call option is priced as:
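In standard notation, with spot $S_0$, strike $K$, risk-free rate $r$, volatility $\sigma$ and time to expiry $T$ (no dividends), the call price is $$C = S_0 N(d_1) - K e^{-rT} N(d_2), \qquad d_1 = \frac{\ln(S_0/K) + \left(r + \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}$$ where $N(\cdot)$ is the standard normal cumulative distribution function.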





What's all the fuss about? The inputs are easy to understand and it takes a minute to programme. Some software packages even have the function built in. In MATLAB, for example, using the Financial Toolbox:

Consider European stock options that expire in three months with a strike price of 95. Assume the underlying stock's spot price is currently 100 and pays no dividend, with volatility at 0.5 (50%) and the risk-free rate at 10%. Then

[Call, Put] = blsprice(100, 95, 0.1, 0.25, 0.5) 
returns call and put prices of 13.70 and 6.35, respectively.
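If you don't have the toolbox, the closed form is only a few lines in C++ as well; here is a minimal sketch (the function names are my own) that reproduces the numbers above:

#include <cmath>
#include <iostream>

// Standard normal cumulative distribution function via the complementary error function.
double normCdf(double x)
{
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Black-Scholes prices of a European call and put on a non-dividend-paying stock.
void blackScholes(double S, double K, double r, double T, double sigma,
                  double& call, double& put)
{
    double d1 = (std::log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * std::sqrt(T));
    double d2 = d1 - sigma * std::sqrt(T);
    call = S * normCdf(d1) - K * std::exp(-r * T) * normCdf(d2);
    put  = K * std::exp(-r * T) * normCdf(-d2) - S * normCdf(-d1);
}

int main()
{
    double call = 0.0, put = 0.0;
    blackScholes(100.0, 95.0, 0.1, 0.25, 0.5, call, put);       // same inputs as the MATLAB example
    std::cout << "Call: " << call << "  Put: " << put << "\n";  // roughly 13.70 and 6.35
}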
Options are here to stay. Trillions of dollars, pounds and euros worth are traded worldwide every day, and volumes are only increasing. Exciting times.
The Alpha Hunter

Sunday, January 27, 2013

Algorithms Everywhere!

From your everyday Google searches to high frequency trading, algorithms are everywhere. They play a crucial role in the functioning of our everyday lives, working behind the scenes to simplify, automate and enrich our experiences: the movies we watch, the books we read, the people we follow and the pages we like.

We are being controlled by 'bots' that attempt to think, act, behave... and trade like we do. One big difference: it's on an enormous scale and faster than we can ever imagine.
Trades are now being executed in under 10 microseconds; bear in mind it takes roughly 350 milliseconds to blink!

After reading "Dark Pools" it has come to my attention the future in quant finance is moving towards Artificial Intelligence or AI as its more commonly known, firms looking for the best programmers who have aptitude in exploiting speed of code, increasing speed of internet and understanding the wiring or 'plumbing' behind the connections to the exchanges. Just have a look at the jobs advertised at http://www.quantfinancejobs.com/ and you will understand exactly what I am talking about with openings requesting a PhD in a maths/applied maths/statistics/computer science/physics from a top school with a proficiency in programming in C++/C#/Sql/Java/R/Matlab ( I have a long way to go)!

With a toolbox consisting of neural networks, fuzzy logic, genetic algorithms, machine learning and expert systems, applied mathematicians are able to take advantage of algorithms designed to scalp, pair trade, arbitrage and mean revert, with machines thinking faster than you think you can think.


Algorithms are buying and selling millions of dollars worth of financial instruments in split seconds on the large indexes and in dark pools, taking advantage of the 'basic at-home trader' and 'investor', pushing prices up and front-running when these algos read a large 'whale order' coming in.
Algos called "Stealth" (developed by Deutsche Bank), "Iceberg", "Dagger", "Guerrilla", "Sniper", "BASOR" (developed by Quod Financial) and "Sniffer" are giant sharks just waiting to exploit their strategies, gobbling us and each other up in a feeding frenzy when prices aren't 'mathematically sound'.

While there is an argument that the electronic market is better than the old days of human traders, increasing liquidity and decreasing spreads, there are also moments of weakness when computers get stuck in an infinite loop, aka a selling frenzy (recall Knight Capital losing around $440 million in 2012).



While it looks like a daunting time for financial markets, and one cringes at the thought that markets are controlled by whoever has the best technology, there is still a big demand for the scientists who can use that technology to best effect. Knowledge of stochastic calculus and stochastic processes, PDEs, measure theory, probability theory and their programming applications remain the basic building blocks for quants, but the general trend is towards increasing specialisation in computer science and AI, using those basics along with speed to exploit price patterns.

The Alpha Hunter

Saturday, August 4, 2012

Correlation between Olympic Medals and GDP


I have been hit by the Olympic bug, stuck to the TV watching competitive events not normally televised. The Olympics seems to be a strange event where every 4 years we get behind our athletes and support them in events we rarely care about (how many times have you watched an IAAF meeting or a weightlifting competition outside of the Olympics?)

As we watch the USA and China in their familiar competitive environment, I looked at the medal tables over the previous 4 Olympics and began to wonder whether I was simply trying to read meaning into random data, or whether there is evidence linking the economic prowess of a nation to its ability to achieve Olympic success. Stumbling on some academic work, I thought I should share...

A June 2008 economic paper published by consultants PricewaterhouseCoopers found a strong historic link between money and medals: Countries with the bigger GDPs tend to be represented most often on Olympic podiums. It’s logical — nations with more resources can, if they choose, devote more money to investing in their athletes.
Similarly, higher GDP per capita may be associated with higher average nutrition and health levels, which could also boost performance in some sports.

Now for the stats...

In an article written by Xun Bian (2005) in The Park Place Economist, Volume XIII, the author attempted to quantify the relationship I mentioned above. The paper follows two studies on modelling national Olympic performance, using both a multiple linear regression model and the ever-popular Cobb-Douglas production function to estimate the influence of population size, economic resources, political and economic structure, and hosting advantage on nations' Olympic performance.

For the linear regression: $$M_t = C + \alpha_1 N_t + \alpha_2 (Y_t / N_t) + \alpha_3 P + \alpha_4 H_t + \varepsilon$$


$M_t$ denotes the medal count for a country at a particular Olympic Games, $N_t$ its population and $Y_t$ its GDP, so $Y_t/N_t$ is the country's per capita GDP in the Olympic year. $P$ and $H_t$ are dummy variables for political/economic structure and for hosting: $P$ takes the value 1 if the country has a socialist background (i.e. the country is or was a socialist country) and 0 otherwise; similarly, $H_t$ takes the value 1 if the country is hosting the Olympics in that year and 0 otherwise.


From the model we note that GDP per capita, population, socialism and hosting all have positive coefficients, indicating a positive correlation between these variables and medal counts. However, we must not read too much into this model: with a maximum adjusted R^2 of 50%, at least 50% of the variation in medal counts is left unexplained. Note, however, how statistically significant the variables are!

Let's take a look at the second model, the Cobb-Douglas production function. As we learn in multivariable calculus, this is a function of two variables, namely population ($N$) and economic resources ($Y$): $$M_t = A_t N_t^{\gamma} Y_t^{\theta}$$

Taking the natural log of both sides to make the equation linear in logs yields the following specification for Olympic medal counts: $$\ln M_t = \ln A_t + \gamma \ln N_t + \theta \ln Y_t + e$$


Since $A_t$ captures other aspects that influence a country's Olympic performance, we can replace $\ln A_t$ with the constant $C$, the communist dummy variable $P$, and the hosting dummy variable $H_t$. The actual equation the author estimates therefore takes the following form, in which $\alpha_1 = \gamma$ and $\alpha_2 = \theta$:

$$\ln M_t = C + \alpha_1 \ln N_t + \alpha_2 \ln Y_t + \alpha_3 P + \alpha_4 H_t + e$$


The results are as follows: 


Notice that we have not improved the adjusted R^2, and the coefficient on the $\ln N_t$ population term is now negative (though not statistically significant in this model), indicating a negative correlation between population size and Olympic performance.


The author notes, however, that the results are consistent with previous studies on national Olympic performance (Johnson and Ali, 2000; Bernard and Busse, 2000): the paper finds that socioeconomic variables, including population size, economic resources, hosting advantage and political structure, have a significant impact on a country's Olympic performance. In general, population size and economic resources are positively correlated with medal counts. The larger the population, the more likely a country is to do well at the Olympics; the richer a country is, the more Olympic medals it will likely win. Being a host nation and having a communist background both have a favourable influence on a country's Olympic performance.

The Young Economist

Thursday, July 5, 2012

Risk - A Risky Business!

Model risk, business risk, trading risk, operating risk. The world of finance has some obscure risks, many of which people are trying to quantify. Enter the magical mysteries of mathematics and statistics, along with the German genius Carl Friedrich Gauss, famous for the Gaussian (normal) probability distribution.

He noted that many natural observations, such as people's heights, follow a 'normal' distribution with a bell-curve shape.

As mentioned in my previous blog, key descriptions of distributions include the mean or average (the first moment) as well as the standard deviation (how much the actual observations deviate from the mean), which is the square root of the second central moment.

For a normal distribution, roughly 68% of all observations lie within 1 standard deviation of the mean, 95% within 2 standard deviations and 99.7% within 3 (this is the empirical rule; Tchebysheff's theorem gives weaker bounds that hold for any distribution).

Modern finance has become fixated on the standard deviation as it describes the volatility, or risk, of a share. A share that moves by 2% a day is twice as risky as a share with a daily standard deviation of 1%. The daily standard deviation is then scaled to an annualised volatility:
1% per day * sqrt(250 business days a year)
= 15.81% annual volatility.


Why sqrt(250)? (5 days a week * 52 weeks) - 10 public holidays = 250 trading days. The square root is a statistical trick, known as the 'root mean square rule', based on Geometric Brownian Motion describing the random pattern a share price is 'expected' to follow (qualitative investors, I'm sorry, but this is one of the assumptions).
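As a small sketch of the scaling (illustrative C++, not tied to any particular data feed):

#include <cmath>
#include <vector>

// Annualise the volatility of a series of daily returns (in %) using the
// sqrt(250) scaling described above.
double annualisedVolatility(const std::vector<double>& dailyReturnsPct)
{
    if (dailyReturnsPct.size() < 2)
        return 0.0;                                    // not enough data to estimate

    double mean = 0.0;
    for (double r : dailyReturnsPct) mean += r;
    mean /= dailyReturnsPct.size();

    double sumSq = 0.0;
    for (double r : dailyReturnsPct) sumSq += (r - mean) * (r - mean);
    double dailyStdDev = std::sqrt(sumSq / (dailyReturnsPct.size() - 1));  // sample std dev

    return dailyStdDev * std::sqrt(250.0);             // e.g. 1% per day -> ~15.81% per year
}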

The whole point of these calculations is to answer questions about a portfolio: "With 99% confidence, what is the maximum price change? What is the maximum I would lose?"


When all of the positions are evaluated (daily), the chairman (or whoever is in charge) receives a "4:15 Report", a daily summary of the risk, usually boiled down to a figure such as VAR, value at risk.


A VAR figure of 50 million at 99% over 10 days means there is a 99% probability (a 1% probability) that you won't (will) lose more than 50 million over a 10 day period.
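Under the normality assumption above, a back-of-the-envelope parametric VAR is just a rescaled standard deviation; a minimal sketch (the function name and example numbers are my own):

#include <cmath>

// Parametric (normal) VAR: portfolio value * z-score * daily volatility * sqrt(horizon).
// The z-score is roughly 2.326 for 99% confidence.
double parametricVar(double portfolioValue, double dailySigma,
                     double zScore, double horizonDays)
{
    return portfolioValue * zScore * dailySigma * std::sqrt(horizonDays);
}

// Example: a 1 billion portfolio with 0.6% daily volatility, 99% confidence, 10 days:
// parametricVar(1e9, 0.006, 2.326, 10.0) is roughly 44 million.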

Today VAR figures feature in banking regulations and 'offer precision in a world of chaos'.

VAR and other risk management tools have come under considerable fire for failing under abnormal market conditions. In the words of Satyajit Das: "what is often forgotten is that Gauss originally intended the normal distribution as a test of error, not accuracy".

The Young Economist