Pairs Trading Using HFT in OMX Baltic Market

Statistical arbitrage is a popular trading strategy in which profit arises from pricing inefficiencies between securities. The idea is simple: find two stocks that move together and take long/short positions when they diverge abnormally, expecting the prices to converge in the future. In previous research, the most popular statistical arbitrage strategies were tested using high frequency gas futures market data. The best performance was shown by the pairs trading strategy proposed by J. Caldeira and G. V. Moura. In this paper, the test database covers 14 OMX Baltic stocks over 6 months, from 2014-10-01 to 2015-03-31. There were no predefined pairs during the trading period, so a pair selection algorithm had to be incorporated to find the best pairs for each trading period. The contribution of this paper is to test the pairs trading strategy proposed by J. Caldeira and G. V. Moura on OMX Baltic stocks and to incorporate a trading pair selection algorithm.


Introduction
The global financial market is full of uncertainty; it is hard to predict how it will move, and therein lies the difficulty of trading in it. However, many investors make a profit by applying the available information to different strategies. A common method used by investors is technical analysis. Pairs trading is one such strategy; it detects arbitrage opportunities in the market.
The pairs trading strategy was originally developed in the 1980s by a group of computer scientists, physicists and mathematicians employed by Morgan Stanley & Co. This led to the development of an automated trading program that used advanced statistical modelling to exploit market uncertainties. Morgan Stanley achieved a profit of $50 million using the strategy until 1987; its efficiency then decreased as a result of the intensifying bid spread of the method, and Tartaglia's group was therefore dissolved by 1989 (Madhavaram, 2013).
Recent reports reveal the great importance of electronic market making: according to its filing with the Securities and Exchange Commission, Virtu Financial, Inc. had, by 2014, only one losing day out of 1238 trading days using electronic trading strategies (Cifu, 2014).

Vaitonis
High frequency trading (HFT) can be found in almost all stock, currency, futures and options markets (Burton, 2003; AFM, 2010). HFT has many advantages, for example, bringing liquidity to markets and making markets more efficient (Hagströmer and Norden, 2013). Few papers explain how trading strategies behave with high frequency data. The objective of this paper is to test high frequency trading, statistical arbitrage and a pair selection algorithm with 14 OMX Baltic stocks. The HFT experimental study in OMX Baltic markets was chosen for the following reasons:
When working with stocks from different markets, it is possible to find similar instruments that are highly correlated and follow each other, even if these stocks do not belong to the same market.
High frequency traders bring liquidity, which is important for markets, and markets even reward traders for it.
The use of millisecond data gives the trader a competitive edge. Trading with this kind of data means working with the most recent information, which is important for HFT pairs trading.
In exchanges with an electronic order handling structure, the biggest part of the trading volume is generated by HFT algorithms.
In this paper, a system related to market efficiency (price discovery) and statistical arbitrage was selected for deeper analysis. As already mentioned, the contribution of this paper lies in testing the proposed HFT pairs trading strategy and the pair selection algorithm with OMX Baltic stocks. The main results of this research are presented in the paper.

Statistical arbitrage and pairs trading
The beginning of statistical arbitrage coincides with the start of the first hedge funds, that is, in 1950, which ran statistical arbitrage strategies using mathematical models to find pricing inefficiencies, with long and short positions helping to reduce market risk (Ferguson and Laster, 2007). Pairs trading, like statistical arbitrage in general, is profitable when the long position earns more or loses less than the short position. Before the advances in computational science, arbitrageurs managing large positions had risk-free arbitrage opportunities based on actual pricing flaws or price change delays between different correlated markets (Driaunys et al., 2014). Today, technological advances go far beyond the ability to trade big positions and have parallel access to correlated markets. All markets have moved to an electronic environment, trades are made in nanoseconds, and more market participants have real-time access to market data feeds.
Pairs trading is one of the most common statistical arbitrage strategies and has been widely used by professional traders, institutional investors and hedge fund managers since the 1980s (Vidyamurthy, 2004; Dunis et al., 2010; Gatev et al., 2006; Hogan et al., 2004). Historically, pairs trading is a trading strategy that takes advantage of market inefficiencies based on a pair of stocks. The idea is to identify two stocks that move together and to take long and short positions simultaneously when they diverge abnormally (Miao, 2014; Elliot et al., 2005). The prices of the two stocks are expected to converge to a mean in the future (Caldeira and Moura, 2013; Perlin, 2009).
Furthermore, pairs trading is a market-neutral statistical arbitrage strategy based on the convergence of the prices of financial instruments. Stock pairs that show significant statistical correlation are selected, and by taking equivalent long and short positions one can create zero-investment portfolios; when a stock pair deviates abnormally for a short period, the strategy can earn an excess (abnormal) return (Gatev et al., 2006).

Statistical arbitrage in HFT
High frequency traders seek profit from short-term pricing inefficiencies. With plenty of market players, ever more sophisticated algorithms are built, which only reduces profitability; high frequency trading therefore depends on adopting effective trading strategies to execute trades successfully (AFM, 2010; Hagströmer and Norden, 2013).
HFT also refers to fully automated trading strategies in different securities such as equities, derivatives and currencies. These opportunities have a life span ranging from seconds down to nanoseconds. Capturing a tiny profit on each of a huge number of trades makes HFT an efficient way of generating substantial profit (Botos et al., 2014).
The main goal of any investor is to earn a profit on an investment without losing any of the initially invested capital. Earning profits has become very difficult due to the uncertainties and risks involved in the stock market. Trading strategies based on statistical arbitrage have become very useful for exploiting the market (Madhavaram, 2013).

The object of research
High frequency data was taken from the NASDAQ exchange to test the profitability of statistical arbitrage on Baltic stocks. The main objective of this research is to find out whether HFT can be applied to these stocks and this market. This was done by applying statistical arbitrage strategies to high frequency data and calculating their profitability and risk. The first strategy used was implemented by Caldeira and Moura (2013); the second is based on Herlemont (2006). Both strategies were modified to work with high frequency data. At the end of the research, the strategies were compared by the profit they generated. Before being selected, these two strategies were tested in the commodity futures market (Masteika and Vaitonis, 2015).
The pair selection algorithm, which Kun (2005), Perlin (2009), and Bernardi and Gnoatto (2010) tested with low frequency data, also needed to be tested here. The minimal squared distance method was used.

The basic algorithm of statistical arbitrage in HFT
Pairs trading is a trading strategy that takes advantage of market inefficiencies based on a pair of stocks.
The aim is to identify two stocks that move together and to take long and short positions simultaneously when they diverge abnormally. The prices of the two stocks are expected to converge to a mean in the future. Fig. 1 shows two correlated stocks, Coca-Cola (KO) and Pepsi (PEP), that illustrate the possibility of pairs trading.

Methodology for pairs trading strategy
The idea of the statistical arbitrage pairs trading strategy is quite simple. Following detailed research, six main steps that must be considered before starting this type of trading were identified. The proposed methodology for the pairs trading strategy is illustrated in Fig. 2. An algorithm is constructed by completing the following steps:
1. Identifying the window for training and data normalization;
2. Data normalization;
3. Correlated pairs selection;
4. Trading period selection;
5. Defining parameters for long/short positions;
6. Evaluation of the trading strategy.
The following sections discuss these steps in more detail.
As shown in Fig. 2, the first step is to identify a window for the training period. Once the high frequency data is ready to be incorporated into the strategy, the trader must decide on the length of the training period. The training period cannot be too long, because the algorithm will overtrain; however, it cannot be too short either, as the strategy will then fail to notice abnormal behaviour in stock pricing.
The second window is for data normalization and is also used as the trading period. Because this window serves both purposes, it is necessary to choose the right size. The size of this window depends on how sensitive the strategy should be: the shorter the period, the more sensitive the trading strategy. In our research, a 1 minute training period and a 1 minute window for data normalization and trading were selected.
Next, before testing the strategy, the data must be normalized. Normalization consists of two steps: 1. Time stamp normalization; 2. Stock price normalization. The first step is required because the 14 different stocks have different time stamps. For example, if one stock has a quote at 08:46:03.740 and another at 08:46:03.745, both time stamps must appear in each stock's series, and the missing entries are filled with the previous price. A time stamp gives the hours, minutes, seconds and milliseconds at which the stock price was recorded. Importantly, this normalization is needed only for the purpose of testing the strategy (Masteika et al., 2013).
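The time stamp normalization described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation; the function name `align_timestamps`, the input layout (a dictionary of sorted `(timestamp, price)` lists) and the use of `None` before a stock's first quote are assumptions made for illustration only.

```python
from bisect import bisect_right

def align_timestamps(series):
    """Put every stock on the union of all time stamps, forward-filling prices.

    series: dict mapping a stock name to a list of (timestamp, price) tuples,
            sorted by timestamp (hypothetical input layout).
    Returns (grid, aligned): grid is the sorted union of time stamps and
    aligned[name] holds one price per grid point. Grid points before a
    stock's first quote are None (an assumption; the text does not say).
    """
    grid = sorted({ts for quotes in series.values() for ts, _ in quotes})
    aligned = {}
    for name, quotes in series.items():
        times = [ts for ts, _ in quotes]
        prices = [price for _, price in quotes]
        row = []
        for ts in grid:
            k = bisect_right(times, ts)  # number of quotes at or before ts
            row.append(prices[k - 1] if k else None)
        aligned[name] = row
    return grid, aligned

# Fixed-width HH:MM:SS.mmm strings sort chronologically as plain strings.
quotes = {
    "A": [("08:46:03.740", 1.00), ("08:46:03.750", 1.02)],
    "B": [("08:46:03.745", 2.00)],
}
grid, aligned = align_timestamps(quotes)
```

In this toy input, stock A receives an entry at 08:46:03.745 carrying its previous price, which is the fill rule described in the text.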
In the second step, it is necessary to bring all prices to one scale; in this way it is possible to compare stocks and find a correlation more efficiently. High frequency data normalization takes place in the following way: for each stock price P(i,t) we calculate the empirical mean µ(i,t) and standard deviation σ(i,t), and then the following equation is applied:
$$p(i,t) = \frac{P(i,t) - \mu(i,t)}{\sigma(i,t)} \qquad (1)$$
The value p(i,t) is the normalized price of asset i at time t (Perlin, 2009; Driaunys et al., 2014; Masteika and Vaitonis, 2015). The remaining four steps of the trading strategy are explained in the next section.
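As a rough illustration of the price normalization step, the sketch below standardizes the prices of one stock within a single window by subtracting the window mean and dividing by the window standard deviation. The function name `normalize_window`, the choice of the population standard deviation, and the handling of a flat window (all zeros) are assumptions; the paper does not specify them.

```python
from statistics import mean, pstdev

def normalize_window(prices):
    """Return normalized prices (P - mean) / std for one stock in one window.

    Uses the population standard deviation (an assumption). A window in
    which the price never moves has zero deviation; it is mapped to zeros
    here to avoid dividing by zero (also an assumption).
    """
    mu = mean(prices)
    sigma = pstdev(prices)
    if sigma == 0:
        return [0.0] * len(prices)
    return [(p - mu) / sigma for p in prices]

normalized = normalize_window([1.0, 2.0, 3.0])
flat = normalize_window([5, 5, 5])
```

After this step, the normalized series has zero mean and unit standard deviation within the window, so stocks trading at very different price levels become directly comparable.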

Trading strategy
As already mentioned, the trading strategy consists of six main steps. The diagram below explains every step in more detail, filling in the necessary information for each step using the strategy proposed by Caldeira and Moura.
First, we define a window for trading and data normalization. For this experiment, a 1 minute window was selected. The next step is data normalization, which was explained in the previous section.
Following data normalization, it is necessary to find a correlated pair for each stock. Least squares is a procedure, requiring only some calculus and linear algebra, for determining the "best fit" to the data; in our case, it determines which stock is the best fit for another stock (Miller, 2006). This method requires few computing resources and the calculations are not difficult, so pair selection becomes fast, which is important for high frequency trading. Therefore, this method was selected for this research.

$$Q_{dist}(i,j) = (x(i) - x(j))^2, \qquad N = \min(Q_{dist}) \qquad (2)$$
As shown in formula (2), for the least squares method we calculate the squared difference between two stocks x(i) and x(j), where the difference is taken between the normalized stock prices and then squared. After the squared distances are calculated for all possible pairs, the pairs with the minimum distance are found, and only these are used for further trading. If stock A is paired with stock B and stock B is paired with stock A, only the first pair is used, to avoid multiple trading signals. As formula (2) shows, the minimum squared distance method searches for pairs only at a fixed time. However, the pairs are updated if their means change over the fixed time; otherwise, the pairs are kept the same. During the whole trading period, 2247 pairs were found, but if we take into account that the pair A and B is the same as B and A, a total of 1284 possible stock pairs were found. The number of pairs found is low because the Baltic market has low liquidity: in comparison with other markets, there is less movement of stocks. Furthermore, if the minimal difference between normalized prices was zero, the pair was not considered. A zero difference means that the stocks do not move and their prices stay the same, while the algorithm takes into account only price movements.
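The pair selection rule in formula (2) can be sketched as follows. This is an illustrative Python reading of the description, not the authors' code: for each stock it picks the partner with the smallest non-zero squared distance between normalized prices at one fixed time, and keeps a pair only once regardless of order. The function name `select_pairs` and the tie-breaking rule (first partner found) are assumptions.

```python
def select_pairs(x):
    """Select pairs by minimal squared distance of normalized prices.

    x: dict mapping a stock name to its normalized price at a fixed time.
    Returns a list of (stock, partner, distance) tuples. Zero distances are
    skipped and (A, B) / (B, A) are reported once, as described in the text.
    Ties are resolved in favour of the first partner found (an assumption).
    """
    names = list(x)
    seen = set()
    pairs = []
    for i in names:
        best, best_d = None, None
        for j in names:
            if j == i:
                continue
            d = (x[i] - x[j]) ** 2
            if d == 0:
                continue  # no relative movement: pair is not considered
            if best_d is None or d < best_d:
                best, best_d = j, d
        if best is None:
            continue
        key = frozenset((i, best))
        if key not in seen:  # keep only the first of (A, B) and (B, A)
            seen.add(key)
            pairs.append((i, best, best_d))
    return pairs

# Hypothetical normalized prices for five stocks at one time stamp.
snapshot = {"A": 0.10, "B": 0.15, "C": 0.90, "D": 0.80, "E": 0.80}
selected = select_pairs(snapshot)
pair_names = [(i, j) for i, j, _ in selected]
```

In this toy snapshot, D and E have identical normalized prices, so they are never paired with each other, and the symmetric duplicates (B, A) and (D, C) are dropped.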
When the pairs are found, trading signals are created. In Caldeira and Moura's strategy, the signals are found by calculating the difference between the normalized prices of the paired stocks:
$$\varepsilon_t = p(i,t) - p(j,t) \qquad (3)$$
where ε_t is the difference at time t between the normalized prices p(i,t) and p(j,t) of two stocks i and j that are identified as a pair. Then, the threshold z_t is found:
$$z_t = \frac{\varepsilon_t - \mu_\varepsilon}{\sigma_\varepsilon} \qquad (4)$$
where µ_ε is the mean and σ_ε is the standard deviation of the differences of the normalized stock prices for a given trading window. When z_t is found, buy/sell signals are created by measuring the threshold z_t against the standard deviation σ of the differences between the normalized prices of the paired stocks.
In Herlemont's strategy, all positions are closed when the difference between the prices of A_t and B_t is < µ_t or when the maximum period for keeping positions open is reached (Herlemont, 2006). Each position is kept open until the threshold is reached or the given holding time ends.
Undoubtedly, in Herlemont's strategy, we can find similarities with Caldeira and Moura's strategy because both calculate the mean and the standard deviation; however, the main signal creation methods are different.
During this experiment, the maximum time to keep positions open was 1 minute.
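A minimal Python sketch of a z-score based signal for one pair and one trading window is given below. It computes the difference of the normalized prices, standardizes the latest difference by the window mean and standard deviation, and compares it with an entry level. The entry level of 2.0, the function name `zscore_signal`, the signal labels and the use of the population standard deviation are all hypothetical choices for illustration; the thresholds actually used in the experiment are not stated in this section.

```python
from statistics import mean, pstdev

def zscore_signal(p_i, p_j, entry=2.0):
    """Return (z, signal) for the latest observation of a pair.

    p_i, p_j: normalized prices of the two paired stocks over one window.
    entry:    hypothetical entry level in standard deviations (assumption).
    """
    spread = [a - b for a, b in zip(p_i, p_j)]
    mu = mean(spread)
    sigma = pstdev(spread)
    if sigma == 0:
        return 0.0, "none"  # flat spread: nothing to trade
    z = (spread[-1] - mu) / sigma
    if z > entry:
        return z, "short_i_long_j"  # stock i is relatively expensive
    if z < -entry:
        return z, "long_i_short_j"  # stock i is relatively cheap
    return z, "none"

# Toy window: the spread is flat and then jumps at the last observation.
z_up, sig_up = zscore_signal([0.0] * 9 + [1.0], [0.0] * 10)
z_down, sig_down = zscore_signal([0.0] * 10, [0.0] * 9 + [1.0])
z_flat, sig_flat = zscore_signal([0.5] * 10, [0.5] * 10)
```

The direction of the positions follows the usual pairs trading convention of shorting the relatively expensive stock and buying the relatively cheap one. A closing rule and the 1 minute maximum holding time would have to be added on top of this sketch.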

Experimental setup
HFT statistical arbitrage calculations and pair selection algorithms were implemented in MATLAB. The data for our model was taken from the OMX Baltic market and consists of 14 stocks. The trading period covers 6 months, from 2014-10-01 to 2015-03-31. The data provided was time-stamped to the millisecond. After normalization, there were a total of 1358448 records and 97032 timestamps.
During pair selection, 52 unique pairs were found over the whole trading period. The table above shows all these pairs and the frequency with which each pair was found. According to the results, two pairs had the highest frequency: GRD1R with AMG1L was found 116 times and OEG1T with AMG1L 106 times. Traders may take these results into consideration when selecting correlated pairs.
The previous two experiments covered gas futures contracts. Although they covered only one month of trading, they resulted in 5004987 records (Masteika and Vaitonis, 2015). In comparison, this experiment with 14 NASDAQ Baltic stocks over 6 months had 97032 timestamps (1358448 records after normalization).
The following results were obtained: Herlemont's strategy generated 6434 trading signals with a profitability of 2.08%, and Caldeira and Moura's strategy generated 2572 trading signals, which resulted in a total profit of 5.05%. As the results show, the strategy of Caldeira and Moura is the more efficient, as it generated more profit with fewer trading signals. When a trading strategy creates fewer trading signals with more profitability, it indicates that fewer unnecessary signals that may result in a loss are created.
The experiment showed that an HFT strategy could be applied in the Baltic market. However, the comparison revealed low liquidity in this market, so it would be more advisable to use market-making strategies in the OMX Baltic market. Low liquidity resulted in a small number of trading signals: there was little movement of stocks, and sometimes it took hours until a trading signal was created.
Moreover, the costs and bonuses for liquidity provision must be considered before applying the strategy in real market conditions.The results would also differ because of the costs of HFT platform and subscriptions to HF data feeds.
The trading results obtained with the best-performing strategy, that of Caldeira and Moura, were compared with the OMXBBGI index. As the diagram below shows, the strategy was beating the index until 2015-01-19. If the trading strategy is to be used in a real market, a trader should revise the trading parameters (training period, trading period, etc.) as often as possible. Since the aim of this research was only to test the performance of the trading strategy, no parameters were revised; they were kept unchanged for the whole trading period. As a result, as the figure below shows, the strategy started losing its profitability and was overtaken by the index, although it remained profitable even then. Over a long period, trading strategies must be revised and adapted to market changes.
Fig. 4 represents only the test results in which transaction costs were set to 0. In some cases, markets charge low or even no transaction costs to high frequency traders because they bring liquidity. However, in our case, if transaction costs were taken into account, the results would be less profitable or even negative due to the number of trades.

Conclusions
The number of derivative financial instruments and the increased interconnectedness between markets have resulted in more opportunities to profit from pricing inefficiencies or price move delays between securities, in other words, more opportunities for strategies based on statistical arbitrage. These were the driving factors for testing a system based on statistical arbitrage in HFT as a leading environment of modern investment.
It is hypothesized that the use of high frequency data improves profitability.Before applying the given data, it is necessary to normalize the datasets between stocks.Normalizing data is required as you must compare different stock prices.The main disadvantage of normalization is the fact that it requires additional resources.
The Caldeira and Moura strategy showed the best result. During the trading period, the strategy created 2568 trading signals. This demonstrated that the suggested strategy is profitable, generating a total profit of 5.05% during the entire trading period.
The minimal squared distance method was used for pair selection. This algorithm found 1284 pairs during the entire trading period, 52 of which were unique. The result showed that this algorithm can be used for selecting pairs with high frequency data.
Before applying the strategy to real market conditions, it is necessary to test its performance with different parameters and find the ones that fit best, and to evaluate the trading infrastructure costs and bidirectional arbitrage. In conclusion, statistical arbitrage and high frequency trading give positive results and can be attractive for market infrastructure developers and market participants, especially for low latency traders.
Long/short positions are created accordingly (Caldeira and Moura, 2013).
D. Herlemont proposed the second strategy; it creates trading signals by calculating the mean µ_t and the standard deviation σ_t of the difference between the normalized prices of the paired stocks for a given trading window. Upon identifying both criteria, signals are created if the difference between the prices of stocks A and B is < 2σ_t:
If A_t > B_t, then open a short position with A_t and a long position with B_t;
If A_t < B_t, then open a long position with A_t and a short position with B_t.