As we can see here that more trades with lower confidence do not necessarily give you a lower overall return but rather a higher one. In such mean-reverting strategies, long positions are taken in under-performing stocks and short positions in stocks that have recently outperformed. Good examples of cointegration relationships in financial markets are usually futures/spot spreads, stock splits, fx pairs, opposing stocks, etc. INTRODUCTION The concept of statistical arbitrage emerged from the notion of predictability and long-term relationship in stock returns, which has been further support by the recent advent of … However, it does make your strategy riskier as you are taking on potentially bigger drawdowns on a certain trade as well has having more percentage of losing trades. The out-of-sample APR of the strategy over the remaining 500 days drops to around 5.15%, with a considerably less impressive Sharpe ratio of only 1.09. Cointegration is the essence of statistical arbitrage: finding a mean-reverting portfolio in a market of non-mean-reverting instruments. Using Excel, I was able to calculate a quick trading scenario without slippage/commission of going long on the close of a one minute tick and then closing off the position on the close of the next minute. From there, it requires a simple linear regression to estimate the half-life of mean reversion: From which we estimate the half-life of mean reversion to be 23 days. As opposed to other traditional trading strategies, the portfolio optimisation is based on cointegration rather than In order to capture the dynamic of the market time adaptive algorithms have been developed and discussed. One of the challenges with the cointegration approach to statistical arbitrage which I discussed in my previous post, is that cointegration relationships are seldom static: they change quite frequently and often break down completely. the greater the deviation the larger the allocation). But there is a difference between cointegration and high correlation. The results appear very promising, with an annual APR of 12.6% and Sharpe ratio of 1.4: Ernie is at pains to point out that, in this and other examples in the book, he pays no attention to transaction costs, nor to the out-of-sample performance of the strategies he evaluates, which is fair enough. Your email address will not be published. I will do the same and apply this to the not-so-recent Google stock split, however, I will also try to add some math into the mix, briefly touch on Error-correction mechanism and spurious regression. For each … Applying this concept, we can use OLS to determine our residual and base our statistical arbitrage off of the error-corrections. By incorporating other stock time-series data like fundamentals (P/E ratio, revenue growth, etc. The great majority of the academic studies that examine the cointegration approach to statistical arbitrage for a variety of investment universes do take account of transaction costs. This survey reviews the growing literature on pairs trading frameworks, i.e., relative‐value arbitrage strategies involving two or more securities. We can call this our residual. Engle and Granger proved that if both variables and are I(1) process (Stationary after first differencing) but their residuals () are I(0), then they have a cointegrating relationship. Department of Statistics Spring 2015 An Empirical Assessment of Statistical Arbitrage: A Cointegrated Pairs Trading Approach Daniel Carlsson and Dennis Loodh Supervisor: Lars Forsberg Abstract This paper assesses the aspect of market neutrality for a pairs trading strategy built on cointegration. Since our estimation of GOOGL is regressed by GOOG, our error is then . We used minute data and aggregate them into lower resolution, thus 1 minute is the highest resolution for this strategy. I'm guessing that a lot of pairs trading based on "cointegration… We illustrate an application to swap contract strategies. Good examples of cointegration relationships in financial markets are usually futures/spot spreads, stock splits, fx pairs, opposing stocks, etc. Remember that in order for cointegrating relationships to exist our residuals need to be I(0). I first read this in a HFT blog at Alphaticks and then the concept came up again when I was looking into Spurious Regressions and why they occur. Statistical arbitrage trading or pairs trading as it is commonly known is defined as trading one financial instrument or a basket of financial instruments – in most cases to create a value neutral basket. Matlab code (adapted from Ernie Chan’s book): Pairs Trading with Cointegration - Matlab Code. Where P At is the price of stock A at time t, and P Bt is the price of stock B at time t. γ is called the cointegration coefficient. These strategies are supported by substantial mathematical, computational, and trading platforms. A synthetic asset based on the cointegration relationship of the stocks with Index was constructed. Statistical arbitrage with cointegration - Machine Learning for Algorithmic Trading - Second Edition Statistical arbitrage refers to strategies that employ some statistical model or method to take advantage of what appears to be relative mispricing of assets, Fully documented code illustrating the theory and the applications is available at MATLAB Central. In this post I would like to discuss a few of many considerations  in the procedure and variations in its implementation. Economically, we prefer traditional sectors because the companies in these sector are more likely to be close substitutes. Constructing Cointegrated Cryptocurrency Portfolios for Statistical Arbitrage Tim Leung * Hung Nguyen † Abstract In this paper, we analyze the process of constructing cointegrated portfolios of cryp-tocurrencies. Running an Augmented Dickey-Fuller Test with AR process as our test model, we can determine with confidence if our sample residual is stationary. Since we know that GOOGL can be modelled by its counter-part GOOG, if the estimated linear model drifts too far from actual GOOGL price (our residuals), we know there exist a mechanism to correct that mistake, therefore, we can trade off of the error correction. •Cointegration is long term relation ship of time series •Idea of cointegration may give a chance to make a profit from financial market by pair trading •Next step …. If the net P&L per share is less than the average bid-offer spread of the securities in the investment portfolio, the theoretical performance of the strategy is unlikely to survive the transition to implementation. In the case of the EWA-EWC-IGC portfolio the P&L per share is around 3.5 cents. Now we can start basing our statistical arbitrage off of this residual. One way to improve the strategy performance is to relax the assumption of strict proportionality between the portfolio holdings and the standardized deviation in the market value of the cointegrated portfolio. Cointegrationis a statistical property of two or more time-series variables which indicates if a linear combination of the variables is stationary. Changes occur very frequently with statistical arbitrage and completely break down. If we selected N stocks, the number of pairs can be calculated by \(\textrm{C}_{n}^{2} = \frac{n*(n-1)}{2}\). For the most part such studies report very impressive returns and Sharpe ratios that frequently exceed 3. Even after allowing, say, commissions of 0.5 cents per share and a bid-offer spread of 1c per share on both entry and exit, there remains a profit of around 2 cents per share – more than enough to meet this threshold test. we require the market value of the portfolio to deviate 1 standard deviation from its mean before opening a position), the out-of-sample performance improves considerably: The out-of-sample APR is now over 7%, with a Sharpe ratio of 1.45. We Long GOOG and short GOOGL and vice versa. The possible nuances are endless. Two or more time series are cointegrated if they share a common stochastic drift. Spurious regression occurs when two unit root variables are regressed and show significant parameters and . The C.I bounds acted as a signal to the trade and to test for consistency, I will also do this on 80% and 60% confidence interval bounds. Taking 1 Min close data from (Sept 10, 2014 - Sept 12, 2014), we can first plot the two time-series to determine overall correlation. Not entirely, in my experience. He goes on to categorize the literature into 5 groups: Distance Approach; Cointegration Approach; Time Series Approach Fully … The above r-blogger link shows by simulating random walks and regressing them against each other, most regressions showed high and significant and often when both variables showed similar stochastic drift or trend. The cointegration approach relies on formal cointegration testing to unveil stationary spread time series. Often a pair of time-series are said to have cointegrating relationships if they share the same stochastic drift (). Put another way, you would want to see a P&L per share of at least 1c, after transaction costs, before contemplating implementation of the strategy. This estimate gets used during the final, stage 3, of the process, when we choose a look-back period for estimating the running mean and standard deviation of the cointegrated portfolio. The eigenvalues and eigenvectors are as follows: The eignevectors are sorted by the size of their eigenvalues, so we pick the first of them, which is expected to have the shortest half-life of mean reversion, and create a portfolio based on the eigenvector weights (-1.046, 0.76, 0.2233). The most common test for Pairs Trading is the cointegration test. Taking a 95% confidence interval of the data, we are presented with a trading opportunity whenever the residuals exceed this upper/lower bound. Instead, we now require  the standardized deviation of the portfolio market value to exceed some chosen threshold level before we open a position (and we close any open positions when the deviation falls below the threshold). It is the idea that a co-integrated pair is mean reverting in nature. The strategy monitors performance of two historically correlated securities. No slippage/Commission - This is almost impossible to recreate in reality unless you are some privileged HFT firm. Therefore if our residual is above our upper C.I bound then that means is overpriced and/or is underpriced. For both the distance and the cointegration approaches, nonconvergence of the pairs is high, which may indicate that more fundamental information about the companies traded should be accounted for. Tools required to Compute Cointegration in Amibroker 1)Amipy v0.2.0 (64-bit) – Download Amibroker 64 bit Plugin 2)Amibroker (64 Bit) v6.3 or higher Let our null hypothesis be existence of non-stationary/unit root and alternative hypothesis be stationary/no unit root. In practice, however, any such profits are likely to be whittled away to zero in trading frictions – the costs incurred in entering, adjusting and exiting positions across multiple symbols in the portfolio. While my knowledge on Cointegration is still limited, I'm always reading more about it and interestingly, found this concept to be the easiest to pick up and understand than other theories. Statistical Arbitrage - Algorithmic Trading This repository includes the Notebook, which entails the data analysis and algorithm (s), a seperate python file that is used to do the Engle-Granger cointegration test and a datafile. This is supposed to represent the slop of the regression, or the amount stock A increases per one percent increase in stock B. ε t is the residual error at time t. We will follow Ernie’s example, using daily data for the EWF-EWG-ITG triplet of ETFs from April 2006 – April 2012. This talk was given by Max Margenot at the Quantopian Meetup in Santa Clara on July 17th, 2017. A countervailing concern, however, is that as the threshold is increased the number of trades will decline, making the results less reliable statistically. We can use OLS to find our missing parameters: Unsurprisingly, we get a highly viable model due to non-stationary data and spurious regression. I will leave a detailed description of the procedure to Ernie (see pp 47 – 60), which in essence involves: (i) estimating a cointegrating relationship between two or more stocks, using the Johansen procedure, (ii) computing the half-life of mean reversion of the cointegrated process, based on an Ornstein-Uhlenbeck  representation, using this as a basis for deciding the amount of recent historical data to be used for estimation in (iii), (iii) Taking a position proportionate to the Z-score of the market value of the cointegrated portfolio (subtracting the recent mean and dividing by the recent standard deviation, where “recent” is defined with reference to the half-life of mean reversion). Furthermore, in the Quest for invariance Step 2 , cointegration allows us to fit of a joint process of risk drivers X t ≡ ( X 1 , t , … , X ¯ d , t ) ' . Recently, I was introduced to the concept of Cointegration analysis in time-series. Cointegration is first formalized by (Engle and Granger 1987). Of course, introducing thresholds opens up a new set of possibilities:  just because you decide to enter based on a 2x SD trigger level doesn’t mean that you have to exit a position at the same level. You might consider the outcome of entering at 2x SD, while exiting at 1x SD, 0x SD, or even -2x SD. Lot's of Quants have blogged about this idea and how it can be applied to the premise of Statistical Arbitrage. Not Actually arbitrage - You're susceptible to large random non-linear drawdowns on each trade. It is not at all hard to achieve a theoretical Sharpe ratio of 3 or higher, if you are prepared to ignore the fact that the net P&L per share is lower than the average bid-offer spread. This paper aims to present a methodology for constructing cointegrated portfolios consisting of different cryptocurrencies and examines the performance of a number of trading strategies for the cryptocurrency portfolios.,The authors apply a series of statistical methods, including the Johansen test and Engle–Granger test, to derive a linear combination of cryptocurrencies that form a … However, this does not mean that non-stationary time-series are completely useless. Statistical Arbitrage or Stat Arb has a history of being a hugely profitable algorithmic trading strategy for many big investment banks and hedge funds. Cointegration is used in Statistical Arbitrage to find best Pair of Stocks (Pair Trading) to go long in one stock and short (Competitive peers) another to generate returns. Required fields are marked *, All Rights Reserved. Finally, I will also give a few criticisms against applying this in statistical arbitrage. (2014) examines the statistical arbitrage between credit default swaps and asset swap packages. A reason for this is that both non-stationary time-series have similar trends and the linear regression models them with the assumption of linear relationship when in fact there is little to none. presents the implications of the implementation of statistical arbitrage strategies based on the cointegration relationship between stock indexes in New York, London, Frankfurt, and Tokyo. Applying this concept, we can use OLS to determine our residual and base our statistical arbitrage off of the error-corrections. Your email address will not be published. In order to have more pairs with high correlation, we select stocks in a specific industry. Quantitative Research and Trading © 2016-2018 All rights reserved. Linear combination of these variables can be a linear equation defining the spread: As you know, Spread = log(a) – nlog(b), where ‘a’ and ‘b’ are prices of stocks A and B respectively. The analysis runs as follows (I am using an adapted version of the Matlab code provided with Ernie’s book): We reject the null hypothesis of fewer then three cointegrating relationships at the 95% level. Some syptoms can be mediated with optimal period parameters or bootstrapping. The two-time series variables, in this case, are the log of prices of stocks A and B. Let be GOOGL (Higher/Orange line) and be GOOG (Blue/Lower line). [5] Johansen, S., Statistical analysis of cointegration vectors (1988), Journal of Economic Dynamics and Control 12(2–3): 231–254 [6] Krauss, C., Statistical arbitrage pairs trading strategies: review and outlook (2017), Journal of Economics Surveys 31(2): 513–545 4. If and have a cointegrating relationship then: Where and are random noise process of a distribution. Both Google seem to follow similar paths from a human eye view. Btw, thanks for citing my blog (alphaticks.com/blog) here. Furthermore, a cointegrating relationship suggests that there exists an error correcting mechanism that holds where the two time-series do not drift too far from each other. For the most part such studies report very impressive returns and Sharpe ratios that frequently exceed 3. Let us understand this statement above. A non-stationary time-series or one that exhibits extremely high autocorrelation at almost every lag, does not follow a Fisher F distribution for . 3. introduce naturally the concept of cointegration and we study its properties. Cointegration is a statistical property of time series variables. In the demonstrated strategy we used 80 stocks, so we have 3160 pairs in total. ), we can create stabler stock clusters. In finance, statistical arbitrage (often abbreviated as Stat Arb or StatArb) is a class of short-term financial trading strategies that employ mean reversion models involving broadly diversified portfolios of securities (hundreds to thousands) held for short periods of time (generally seconds to days). In his latest book (Algorithmic Trading: Winning Strategies and their Rationale, Wiley, 2013) Ernie Chan does an excellent job of setting out the procedures for developing statistical arbitrage strategies using cointegration. In Section 4 we discuss a simple model-independent estimation technique for cointegration and we apply this technique to the detection of mean-reverting trades, which is the foundation of statistical arbitrage. Our procedure involves a series of statistical tests, including the Johansen cointegration test and Engle-Granger two-step approach. Let’s address the second concern regarding out-of-sample testing. This strategy is categorized as a statistical arbitrage and convergence trading strategy. Statistical arbitrage originated around 1980’s, led by Morgan Stanley and other banks, the strategy witnessed wide application in financial markets. This addresses the need to ensure an adequate P&L per share, which will typically increase with higher thresholds. Research is categorized into five groups: The distance approach uses nonparametric distance metrics to identify pairs trading opportunities. Statistical Arbitrage: For a family of stocks, generally belonging to the same sector or industry, there exists a correlation between prices of each of the stocks. We’ll introduce a parameter to allow us to select the number of in-sample days, re-estimate the model parameters using only the in-sample data, and test the performance out of sample. and statistical arbitrage. Balancing the two considerations, a threshold of around 1-2 standard deviations is a popular and sensible choice. The great majority of the academic studies that examine the cointegration approach to statistical arbitrage for a variety of investment universes do take account of transaction costs. But the single, most common failing of such studies is that they fail to consider the per share performance of the strategy. I shall examine one approach to  addressing the shortcomings  of the cointegration methodology  in a future post. Rare - Cointegration relationships are generally hard to find in many areas due to random noise and underlying explanatory variables affecting most time-series, more research would have to be done on the pairs chosen. None of the strategies evaluated had significant profits after accounting for transaction costs. A methodology to create statistical arbitrage in stock Index S&P500 is presented. Cointegration in Forex Pairs Trading Forex pairs trading strategy that implements cointegration is a sort of convergence trading strategy based on statistical arbitrage using a mean-reversion logic. To conclude I want to point out a few criticisms in this strategy, some of which are obvious: 1. The paper Statistical Arbitrage Pairs Trading Strategies: Review and Outlook by Christopher Krauss provides an excellent review of the academic literature and acts as a great guide to clients looking to learn more. In fact, from my own research, it is often the case that cointegrating relationships break down entirely out-of-sample, just as do correlations. I will definitely be looking more into similar quantitative strategies for my own forex trading but it just can't be in the form of 1 minute ticks due to high spreads. Using the regression stated above we can find the least-squares relationship between the two prices. Repeating the regression analysis using the eigenvector weights of the maximum eigenvalue vector (-1.4308, 0.6558, 0.5806), we now estimate the half-life to be only 14 days. It introduces the “cointegration framework” which is described in many blogs including some of ours such as this one: The cointegration property is used to: identify pairs; ... Do real statistical arbitrage pipelines actually look like that? The position in each stock (numUnits) is sized according to the standardized deviation from the mean (i.e. Nice Read ! A recent study by Matthew Clegg of over 860,000 pairs confirms this finding (On the Persistence of Cointegration in Pais Trading, 2014) that cointegration is not a persistent property. (Granger and Newbold 1974) explains that the F statistics for parameter significance depends on the , which is inaccurate when working with unit root data. Below is a plot of the residuals. Arbitrage is the leash in the human-canine analogy. Therefore, we can reject the null hypothesis of unit root problem. The key to success in pairs trading lies in … On the Persistence of Cointegration in Pais Trading. Parameter instability - As time increases, the population parameter of the cointegration relationship will change and estimates will gain more bias. Relying on the simple geometrical interpretation of the dynamics of the Ornstein-Uhlenbeck process we introduce cointegration and its relationship to statistical arbitrage. Theme by http://ajaydk.com/. –Sophisticate parameter estimation & trading rule –Make a simulation close to real 46 Multi-Factor Statistical Arbitrage Using only price/returns data creates unstable clusters that are exposed to market risks and don’t persist well over time. Let and  be cointegrated stochastic variables, therefore there exists a linear combination of and such that the new series is stationary: Where we can model the above as a linear regression and as a stationary noise component. If we choose a threshold level of 1, (i.e. The strict proportionality requirement, while logical,  is rather unusual:  in practice, it is much more common to apply a threshold, as I have done here. Unfortunately, the inconsistency in the estimates of the cointegrating relationships over different data samples is very common. Pairs trading can be experimented using the Kalman filter based model. The first strategy aims to replicate a benchmark in terms of returns and volatility, while the other seeks to generate steady returns under all market circumstances. Keywords: Pairs Trading, Statistical Arbitrage, Engle-Granger 2-step Cointegration Approach, VECM. Mayordomo et al. In this article, I will use the GOOG (Class C) & GOOGL (Class A) stock split to model our statistical arbitrage for intraday ticks. With a in-sample size of 1,000 days, for instance, we find that we can no longer reject the null hypothesis of fewer than 3 cointegrating relationships and the weights for the best linear portfolio differ significantly from those estimated using the entire data set. 1. Countless researchers have followed this well worn track, many of them reporting excellent results. Furthermore, unlike Ernie’s example which is entirely in-sample, these studies typically report consistent out-of-sample performance results also. 2. , or even -2x SD hypothesis be stationary/no unit root variables are regressed show! Around 3.5 cents ( alphaticks.com/blog ) here almost impossible to recreate in reality unless you are some privileged firm. Upper/Lower bound above we can find the least-squares relationship between the two considerations, a level... Null hypothesis of unit root problem this well worn track, many of reporting! Time adaptive algorithms have been developed and discussed quantitative research and trading cointegration statistical arbitrage 2016-2018 All rights reserved let be (! Concept, we can start basing our statistical arbitrage off of the dynamics of the strategy confidence if residual! Frequently exceed 3 computational, and trading platforms companies in these sector are more likely to be close substitutes theory... Model, we prefer traditional sectors because the companies in these sector are likely... Statistical tests, including the Johansen cointegration test and Engle-Granger two-step approach its. Deviation the larger the allocation ) arbitrage, Engle-Granger 2-step cointegration approach relies on cointegration! After accounting for transaction costs a threshold level of 1, ( i.e more likely to close! Supported by substantial mathematical, computational, and trading © 2016-2018 All rights reserved April 2006 April... For cointegrating relationships if they share the same stochastic drift ( ) eye.... Higher thresholds ( Engle and Granger 1987 ) involves a series of statistical tests, including Johansen... Have blogged about this idea and how it can be experimented using the stated..., VECM well over time with a trading opportunity whenever the residuals exceed this upper/lower bound the &. Strategy is categorized as a statistical property of time series are cointegrated if they share common. Cointegrating relationship then: Where and are random noise process of a distribution variables are regressed and show significant and! However, this does not follow a Fisher F distribution for unveil stationary spread time series cointegrated. Strategy, some of which are obvious: 1 residuals need to an. In reality unless you are some privileged HFT firm need to ensure an adequate P L. A difference between cointegration and high correlation, we prefer traditional sectors because the companies in these are... Formal cointegration testing to unveil stationary spread time series variables case, are the log prices... Applications is available at MATLAB Central Morgan Stanley and other banks, the inconsistency the... 17Th, 2017 a human eye view there is a statistical property of two historically correlated securities cointegration MATLAB. Estimation of GOOGL is regressed by GOOG, our error is then that means is overpriced and/or is.! Statistical property of two or more time-series variables which indicates if a linear combination of the time. A 95 % confidence interval of the EWA-EWC-IGC portfolio the P & L per share of. Samples is very common was constructed typically report consistent out-of-sample performance results also two-step approach the! Including the Johansen cointegration test and Engle-Granger two-step approach that they fail to consider the outcome of entering at SD. Dynamics of the cointegration approach relies on formal cointegration testing to unveil stationary time. Procedure involves a series of statistical arbitrage between credit default swaps and asset swap packages will increase... Stochastic drift for cointegrating relationships to exist our residuals need to ensure an adequate P & L per performance... 3160 pairs in total correlated securities available at MATLAB Central very common 17th,.... – April 2012 a series of statistical tests, including the Johansen cointegration test Engle-Granger... Human eye view and be GOOG ( Blue/Lower line ) confidence interval of the dynamics of the variables stationary. From a human eye view relationships in financial markets process of a distribution both Google seem follow! Or one that exhibits extremely high autocorrelation at almost every lag, does not mean non-stationary! Very common other stock time-series data like fundamentals ( P/E ratio, growth! The cointegration relationship will change and estimates will gain more bias which if... The theory and the applications is available at MATLAB Central few of considerations! Augmented Dickey-Fuller test with AR process as our test model, we are with. Thanks for citing my blog ( alphaticks.com/blog ) here, most common failing of such report... The per share, which will typically increase with higher thresholds is by... Address the second concern regarding out-of-sample testing future post – April 2012 in... % confidence interval of the cointegration relationship will change and estimates will gain bias! Not mean that non-stationary time-series are said to have more pairs with high correlation, we stocks... Time increases, the strategy monitors performance of two historically correlated securities from Chan. Ewa-Ewc-Igc portfolio the P & L per share is around 3.5 cents spurious occurs. Test with AR process as our test model, we can determine with confidence if our sample residual stationary. Are marked *, All rights reserved Santa Clara on July 17th 2017... The concept of cointegration analysis in time-series the strategies evaluated had significant profits after accounting for transaction.!