Pitfalls in long memory research

Abstract
This paper offers a multifaceted perspective on the literature on long memory. Although research on long memory has played an instrumental role in elevating the level of scholarly discourse on market efficiency, the authors believe that the issue of the prevalence of long memory, or the lack thereof, remains unsettled. While long memory models should be in the econometrician's toolbox, their use should be governed by an initial exploratory analysis of the data being studied and the context of the research questions being addressed. Mere fixation on the presence or absence of long memory, without taking due cognisance of other confounding factors, would pave the way for confirmation bias. Consequently, this paper pinpoints the possible pitfalls and potential trade-offs in modeling long memory in asset prices. While not a comprehensive meta-analysis of the literature on long memory, this paper offers a selective bibliography of prior works on long memory that is geared to nudge researchers to exercise caution and judgement while exploring long memory in asset prices.


ABOUT THE AUTHORS
Kunal Saha is a PhD student at the IFMR GSB, University of Madras, Chennai.
Vinodh Madhavan currently serves as an Associate Professor at Amrut Mody School of Management, Ahmedabad University. He completed his DBA from Golden Gate University (GGU), San Francisco. Dr. Vinodh Madhavan has served as an Adjunct Faculty at GGU, as Malcolm S. M. Watts III Research Fellow at the Technical Securities Analysts Association of San Francisco, and as a tenure-track faculty at IIT Kharagpur, IIM Lucknow, IFMR and IMT Ghaziabad.
G. R. Chandrashekhar is a Professor at IFMR GSB, Krea University, Chennai. He completed his doctoral programme at IIM Lucknow. Following his doctoral studies, Prof. Chandrashekhar worked in industry for a decade and has been an academic for more than a decade at XLRI, IIM Indore, and IIM Ranchi.

PUBLIC INTEREST STATEMENT
The weak form of the Efficient Market Hypothesis (EMH) essentially states that future prices of financial assets cannot be predicted from historical prices. The presence of long memory in financial time series goes against weak-form EMH. Having said so, the prevalent literature on long memory is not without caveats, for long memory is potentially confounded by other statistical factors such as structural breaks and temporal and cross-sectional aggregation. Prior literature reviews on long memory were either anchored in the mathematical underpinnings of different measures of long memory or were subsumed within the broader contours of market efficiency. This paper breaks away from this trend and in turn (a) pinpoints the possible pitfalls and potential trade-offs in modeling long memory in asset prices, and (b) nudges researchers to exercise requisite caution before drawing definitive inferences on long memory.

Introduction
While studies grounded in long memory constitute a notable strand of literature disputing market efficiency, such studies are not devoid of caveats. Long memory gained traction in scholarly discourse owing to Mandelbrot's work on asset prices using rescaled range estimation techniques (Mandelbrot & Van Ness, 1968; Mandelbrot & Wallis, 1968). A lot of water has flowed under the bridge since then, and the literature on long memory has become multi-dimensional in nature. While the initial works re-examining market efficiency using methodologies theoretically grounded in long memory offered a much-needed contrast to a somewhat homogeneous literature on market efficiency, inferences from such studies on the prevalence of long memory, or the lack thereof, have not been unequivocal.
Although there are a few review papers that discuss long memory (Baillie, 1996; Guégan, 2005; Lim & Brooks, 2011), the literature seems wanting when it comes to bringing the various arguments for and against the observation of long memory together. Against this backdrop, the authors believe a snapshot of the prevailing literature on long memory in asset prices, without losing sight of the attendant contexts behind such studies, is the need of the hour. Such a snapshot would aid researchers in taking stock of the various facets of the discourse on long memory, in a manner that would nudge them to exercise requisite caution before drawing any definitive inference on long memory in asset prices in their future research endeavours.
While prior studies on long memory have significantly broadened the literature landscape on market efficiency, a definitive takeaway from such literature that is oblivious to other confounding factors which can manifest as long memory would be short-sighted and self-fulfilling. In short, this paper is as much an attempt to sensitize researchers to the pitfalls in research on long memory as it is to highlight the prominence of long memory in the context of revisiting market efficiency.

Definition
For a second-order stationary process $X_t$ with auto-covariance function $\gamma_X(k)$, $X_t$ has (a) short memory if $\sum_{k=-\infty}^{\infty} |\gamma_X(k)| < \infty$, and (b) long memory if this sum diverges. Long memory (or persistence) implies that a positive or negative movement is more likely to be followed by another move in the same direction. On the other hand, for an anti-persistent process, a positive movement is more likely to be followed by a move in the opposite direction. In other words, a persistent process is trending whereas an anti-persistent process is mean reverting. Beran, Feng, Ghosh, and Kulik (2016) provide a detailed review of the various definitions of long memory and the conditions under which they can be used interchangeably. The different measures of long memory are as follows.
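For ease of reference, the following equivalent asymptotic characterizations, which are standard results surveyed in Beran et al. (2016), connect the autocovariance, spectral and Hurst-exponent views of long memory for $0 < d < 0.5$; here $c_\gamma$ and $c_f$ denote positive constants, $f_X$ the spectral density, and $H$ the Hurst exponent discussed below.

```latex
% Equivalent asymptotic characterizations of long memory (0 < d < 1/2)
\begin{aligned}
\gamma_X(k) &\sim c_\gamma \, k^{2d-1}, && k \to \infty, \\
f_X(\lambda) &\sim c_f \, |\lambda|^{-2d}, && \lambda \to 0, \\
H &= d + \tfrac{1}{2}.
\end{aligned}
```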

Hurst exponent
The most popular measure of long memory is the "Hurst exponent" (denoted as $H$). This measure gained traction owing to Mandelbrot and Wallis's pioneering work on operational hydrology (Mandelbrot & Wallis, 1968). Several methodologies are used to calculate the Hurst exponent. The classical rescaled range (R/S) analysis proposed by Hurst (1951) and its subsequent variants, such as the modified R/S analysis and the rescaled variance (V/S) analysis, are the most prominent ones.
When $0.5 < H < 1$, the autocovariances are positive at all lags and the time series process is called persistent. When $0 < H < 0.5$, the autocovariances at all lags are negative and the time series process is called anti-persistent.
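As a concrete illustration, the following minimal Python sketch estimates $H$ via the classical R/S procedure; the window sizes are illustrative choices of ours and no bias correction is applied, so this is a didactic sketch rather than a substitute for the modified R/S or V/S statistics cited above.

```python
# Minimal sketch of the classical rescaled-range (R/S) estimate of the
# Hurst exponent H; illustrative window sizes, no small-sample correction.
import numpy as np

def rescaled_range(x):
    """R/S statistic for a single window of observations x."""
    y = x - x.mean()                 # demean the window
    z = np.cumsum(y)                 # cumulative deviate series
    r = z.max() - z.min()            # range of the cumulative deviations
    s = x.std(ddof=1)                # sample standard deviation
    return r / s

def hurst_rs(x, min_window=16):
    """Estimate H as the slope of log(mean R/S) against log window size."""
    n = len(x)
    sizes = np.unique(np.floor(np.logspace(np.log10(min_window),
                                           np.log10(n // 2), 10)).astype(int))
    log_rs, log_m = [], []
    for m in sizes:
        windows = [x[i:i + m] for i in range(0, n - m + 1, m)]   # non-overlapping
        rs_vals = [rescaled_range(w) for w in windows if w.std() > 0]
        if rs_vals:
            log_rs.append(np.log(np.mean(rs_vals)))
            log_m.append(np.log(m))
    slope, _ = np.polyfit(log_m, log_rs, 1)
    return slope

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    white_noise = rng.standard_normal(4096)
    # theoretical value is 0.5; classical R/S is known to be biased upward in finite samples
    print("H estimate for white noise:", round(hurst_rs(white_noise), 3))
```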

Fractional order of integration
Another popular approach to ascertain long memory, or the lack thereof, is to measure the fractional order of integration (denoted as $d$) of a time series. This paved the way for the ARFIMA-FIGARCH class of models, which were designed to explicitly model long memory in the first and second moments (Baillie, Bollerslev, & Mikkelsen, 1996; Granger & Joyeux, 1980; Hosking, 1981). Gneiting and Schlather (2004) describe the fractal dimension, $D$, of a surface as a roughness measure with $D \in [n, n+1)$ for a surface in $\mathbb{R}^n$, where higher values correspond to rougher surfaces. Technically, the fractal dimension ($D$) and the Hurst exponent ($H$) are independent of each other: the fractal dimension is a local property, while the Hurst exponent is a global property used to characterize long-memory dependence in a time series. For self-affine processes, local properties are reflected in global ones, which leads to the relationship $D + H = n + 1$ between $D$ and $H$ for a self-affine surface in $n$-dimensional space. For the graph of a time series ($n = 1$), this reduces to $D = 2 - H$.

Methodologies
Over the years, a number of methodologies have been proposed by researchers for measuring long memory. While in many cases long memory in the conditional mean and in the conditional variance are studied independently, unified approaches to studying long memory are also available (Teyssière, 1997).
Several popular heuristic methods to measure long memory in the first and second moments include the rescaled range (R/S) method, the rescaled variance (V/S) method and detrended fluctuation analysis (DFA). Prominent semi-parametric estimators of long memory include the log-periodogram (GPH) estimator (Geweke & Porter-Hudak, 1983) and the local Whittle estimator. These estimators were derived for linear time series models and should be used to measure long memory in the first moment. The GPH estimator has been found to be downward biased when used to measure long memory in volatility (Deo & Hurvich, 2001), whereas the Whittle estimator is found to be more robust for measuring long memory in volatility (Hurvich & Ray, 2003).
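The log-periodogram regression is simple enough to sketch directly. The following Python fragment is a minimal, illustrative implementation of the GPH estimator; the bandwidth rule $m = n^{0.5}$ is an assumption made here for illustration, not a recommendation drawn from the cited papers.

```python
# Minimal sketch of the Geweke & Porter-Hudak (1983) log-periodogram regression:
# regress log I(lambda_j) on log(4 sin^2(lambda_j / 2)); d-hat is minus the slope.
import numpy as np

def gph_estimate(x, bandwidth_power=0.5):
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(np.floor(n ** bandwidth_power))          # number of Fourier frequencies used
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / n    # lambda_j, j = 1..m
    dft = np.fft.fft(x - x.mean())
    periodogram = (np.abs(dft[1:m + 1]) ** 2) / (2.0 * np.pi * n)
    regressor = np.log(4.0 * np.sin(freqs / 2.0) ** 2)
    slope, _ = np.polyfit(regressor, np.log(periodogram), 1)
    return -slope                                     # estimated memory parameter d

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    print("d-hat for white noise (expected near 0):",
          round(gph_estimate(rng.standard_normal(4096)), 3))
```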
In terms of parametric modeling-based approaches, Mandelbrot and Van Ness (1968) introduced fractional Brownian motion. This generalisation of standard Brownian motion incorporates a self-similarity parameter $d \in (-0.5, 0.5)$ and provides the most basic framework for studying long memory. Another important approach to modeling long memory is the ARFIMA-FIGARCH class of models. While the ARFIMA model is used to model long memory in the first moment (the return series), FIGARCH is used for modeling long memory in volatility. The ARFIMA($p, d, q$) model, introduced by Granger (1980), Granger and Joyeux (1980) and Hosking (1981), is defined as $\phi(L)(1 - L)^d y_t = \theta(L)\epsilon_t$, where $\phi(L)$ and $\theta(L)$ are lag polynomials of orders $p$ and $q$ respectively, and $\epsilon_t$ is white noise. For the ARFIMA model, the fractional parameter $d$ lies between $-0.5$ and $0.5$. An ARFIMA process exhibits long memory when $0 < d < 0.5$, anti-persistence when $-0.5 < d < 0$, short memory when $d = 0$, and infinite memory (a random walk) when $d = 1$.
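To make the role of $d$ concrete, the following minimal Python sketch simulates an ARFIMA($0, d, 0$) series by truncated fractional integration of white noise; the recursion for the moving-average weights follows from the binomial expansion of $(1 - L)^{-d}$, and the sample size and $d$ value are illustrative.

```python
# Minimal sketch: simulate ARFIMA(0, d, 0) via y_t = (1 - L)^{-d} eps_t using the
# truncated MA(inf) weights psi_k = Gamma(k + d) / (Gamma(d) Gamma(k + 1)).
import numpy as np

def simulate_arfima_0d0(n, d, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(n)
    psi = np.empty(n)
    psi[0] = 1.0
    for k in range(1, n):                       # psi_k = psi_{k-1} * (k - 1 + d) / k
        psi[k] = psi[k - 1] * (k - 1.0 + d) / k
    # truncated MA(inf) representation: y_t = sum_{k <= t} psi_k * eps_{t-k}
    return np.convolve(eps, psi)[:n]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    y = simulate_arfima_0d0(8192, d=0.3, rng=rng)
    # for d = 0.3 the autocorrelation decays hyperbolically and should remain
    # positive even at lag 100, unlike a short-memory series
    acf_100 = np.corrcoef(y[:-100], y[100:])[0, 1]
    print("lag-100 autocorrelation:", round(acf_100, 3))
```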
The incorporation of long memory in GARCH models was introduced by Robinson (1991) and built upon by Baillie et al. (1996), Ding and Granger (1996) and others. Among these approaches, the more popular FIGARCH($p, d, q$) model, introduced by Baillie et al. (1996), is defined as $\phi(L)(1 - L)^d \epsilon_t^2 = \omega + [1 - \beta(L)]\nu_t$, where $\nu_t = \epsilon_t^2 - \sigma_t^2$. In the ARFIMA model, the long memory operator is applied to deviations from the unconditional mean ($\mu$) of $y_t$, whereas in the FIGARCH model it is applied to the squared errors. However, the FIGARCH model has its own nuances that need to be kept in mind during application. The memory parameter of FIGARCH is actually $-d$ and increases as $d \to 0$. This happens because the memory parameter acts on the squared errors in FIGARCH. Consequently, the Hyperbolic GARCH (HYGARCH) model was proposed by Davidson (2004).
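In applied work, such models are rarely coded from scratch. As a hedged illustration only, the sketch below assumes the open-source Python `arch` package, which, to the best of our knowledge, ships a FIGARCH volatility process; readers should verify the exact interface and parameter labels against the package documentation before relying on it.

```python
# Hedged sketch: fitting a FIGARCH conditional variance with the `arch` package.
# The data below are placeholder white noise standing in for daily % returns.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(3)
returns = 100 * rng.standard_normal(2000)        # placeholder for real return data

# constant mean with a FIGARCH(1, d, 1) conditional variance (assumed interface)
model = arch_model(returns, mean="Constant", vol="FIGARCH", p=1, q=1, dist="normal")
result = model.fit(disp="off")
print(result.summary())                          # the fractional parameter is reported as 'd'
```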
The following Table 2 offers a snapshot of the notable methodologies pertaining to long memory along with their original sources. These popular long memory methodologies possess theoretical antecedents in diverse areas.
The empirical evidence of long memory, however, varies across variables and markets. While asset returns have been shown to have weak to no evidence of long memory, especially in developed markets (Floros, Jaffry, & Valle Lima, 2007; Henry, 2002), asset volatility has been found to show strong evidence of long memory (Fleming & Kirby, 2011; Mighri & Mansouri, 2014). In contrast, recent studies with high-frequency data have shown the presence of anti-persistence in volatility (Gatheral, Jaisson, & Rosenbaum, 2018). The literature on trading volume seems to be consistently in support of long memory (Lillo & Farmer, 2004; Lux & Kaizoji, 2007). Studies on developing markets also show conflicting results. While prior studies showed a stronger presence of long memory in developing markets (Hull & McGroarty, 2014; McMillan & Thupayagale, 2009), recent studies show that some developing markets have become more efficient than a few developed ones since the 2008 financial crisis (Mensi, Tiwari, & Al-Yahyaee, 2019; Sensoy & Tabak, 2016).

Plausible causes of true long memory
While the evidence on long memory is based on statistical and heuristic tests, the discussion remains incomplete without pointing out plausible causes of true long memory.

News flow and its interpretation
Long Memory can be a manifestation of the interaction between many diverse information processes and hence is inherent to the returns process (Andersen & Bollerslev, 1997). This goes against the argument of structural breaks leading to the hyperbolic decay of autocorrelations.
The arrival of news is seen as a driver of markets. Lillo and Farmer (2004) explain that news can be classified as external or internal. External news refers to events outside the control of market participants (e.g., natural calamities); such events are known to follow power-law distributions. While internal news comes under the purview of market players, the ability to understand and act on it can be complicated by social dynamics such as herding behaviour. Moreover, the limited attention and comprehension ability of humans, coupled with their changing preferences between fundamental and technical analysis, can generate long memory in financial time series (Kirman & Teyssiere, 2002).
These aspects can be better understood within the framework of the Adaptive Market Hypothesis (AMH) (Lo, 2004, 2005). Human decisions are usually made under incomplete information and are delayed due to other factors. Such time lags in responding to news arrival can lead to autocorrelations in order flow.

Market microstructure issues and other factors
Various market microstructure-based factors can lead to long memory. For example, iceberg orders, wherein large orders are split into many smaller ones before being sent to the exchange, might lead to power-law autocorrelations in the order flow (Lillo, Mike, & Farmer, 2005). Similarly, simulation studies suggest that the observed long memory in order flow, volume and volatility can be attributed to the inherent imitative and adaptive behaviour of various market participants (LeBaron & Yamamoto, 2007). Other notable explanatory factors pertaining to long memory include bid-ask spreads, nonsynchronous trading (Campbell, Lo, & MacKinlay, 1997), the influence of institutional investors (Gabaix, Gopikrishnan, Plerou, & Stanley, 2006), the extent of market openness (Lim & Brooks, 2010) and the speed of price adjustment (Zheng, Liu, & Li, 2018). Lastly, long memory has also been attributed to the economic and institutional differences between emerging and developed markets (Liow, 2009).

Long memory: beware of false positives
Empirical research seems divided on the debate over differentiating true from spurious long memory in economic variables. While the prevalence of long memory runs contrary to market efficiency, this section discusses several known pitfalls that can cause false positives in long memory analysis. Potter (1979) argued that long memory may be an artifact of non-homogeneity in the data; he referred to several studies on precipitation and concluded that studies with homogeneous data did not support the presence of long memory. Related results appear elsewhere: for example, introducing a trend into a stationary time series can create apparent long memory (Bhattacharya, Gupta, & Waymire, 1983). Simulations based on incorporating breaks in the data generating process (DGP) provide evidence of spurious long memory (Diebold & Inoue, 2001). Various empirical studies using financial market data have also shown the confounding effect that structural breaks can have on long memory in returns (Granger & Hyung, 2004) and volatility (Liu, 2000).
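The break-induced mechanism is easy to reproduce. The following Python sketch, in the spirit of the simulations of Diebold and Inoue (2001) but with illustrative parameter values of our own choosing, contaminates a short-memory AR(1) series with rare random level shifts and compares sample autocorrelations at long lags.

```python
# Minimal sketch: a short-memory AR(1) plus occasional level shifts produces
# sample autocorrelations that decay far more slowly than the clean series,
# which a long memory estimator can misread as persistence.
import numpy as np

def ar1(n, phi, rng):
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def acf(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(4)
n = 10_000
base = ar1(n, phi=0.3, rng=rng)

# rare level shifts: with small probability the local mean jumps to a new value
shift_times = rng.random(n) < 0.001
levels = np.cumsum(np.where(shift_times, rng.standard_normal(n), 0.0))
contaminated = base + levels

for lag in (10, 100, 500):
    print(f"lag {lag:4d}: AR(1) acf = {acf(base, lag):6.3f},"
          f"  with breaks = {acf(contaminated, lag):6.3f}")
```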

Structural breaks
Distinguishing between long memory and structural breaks is, however, mathematically difficult. This is similar to the well-known confusion between unit roots and structural breaks. For true long memory processes, several tests for structural change may indicate a structural break where there is none (Kuan & Hsu, 1998). On the other hand, long memory estimators will be biased towards finding long memory in stationary processes with level shifts (Perron & Qu, 2010). The literature provides several avenues to check for the confounding effect of structural breaks on long memory, as well as models that incorporate both phenomena to measure their individual effects.
Model-specific studies are also available. For example, ARFIMA-based models that are robust to structural breaks have been proposed (Baillie & Morana, 2012; Shi & Ho, 2015). Similarly, attempts to capture and differentiate structural breaks from long memory in volatility have led to pertinent modifications of the FIGARCH (Baillie & Morana, 2009), Markov-switching GARCH (Charfeddine, 2014) and HAR-RV (Hwang & Shin, 2018) classes of models. Volatility-specific structural break tests such as the ICSS test (Inclan & Tiao, 1994) and its variants can be used alongside FIGARCH models to differentiate between structural breaks and long memory (Walther, Klein, Thu, & Piontek, 2017). Lastly, long memory estimators that are robust to structural breaks have been proposed by Hou and Perron (2014).
For a review of the literature on tests that help differentiate structural breaks from long memory, readers may refer to Sibbertsen (2004), Banerjee and Urga (2005) and Wenger, Leschinski, and Sibbertsen (2018a).

Temporal aggregation
Temporal aggregation refers to transforming a time series to a frequency lower than that of the original DGP. In some cases, data are also analyzed after aggregation over a longer duration to remove seasonal fluctuations; a typical example is industrial production, for which data are often available only quarterly. While this aids in smoothing the data points as well as filtering out high-frequency noise, it can also manifest as spurious long memory. For example, LeBaron (2001) showed that an aggregated series created by adding up just three short memory linear time series of different time scales can show spurious long memory.
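The following minimal Python sketch illustrates this point with arbitrarily chosen persistence parameters: the sum of three short-memory AR(1) components operating on very different time scales produces sample autocorrelations that die out far more slowly than any individual component's.

```python
# Minimal sketch in the spirit of LeBaron (2001): three short-memory AR(1)
# components with different time scales sum to a series whose autocorrelation
# decay looks hyperbolic over a wide range of lags. Parameters are illustrative.
import numpy as np

def ar1(n, phi, scale, rng):
    x = np.zeros(n)
    eps = scale * rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def acf(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(5)
n = 50_000
# three short-memory components with half-lives of roughly 1, 25 and 350 periods
y = ar1(n, 0.5, 1.0, rng) + ar1(n, 0.973, 0.5, rng) + ar1(n, 0.998, 0.2, rng)

for lag in (1, 10, 50, 250, 1000):
    print(f"lag {lag:5d}: acf = {acf(y, lag):.3f}")
```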
Having said so, evidence in support of true long memory notwithstanding temporal aggregation is also available. For instance, Andersen and Bollerslev (1997) showed that true volatility persistence can be attributed to the temporal aggregation of a heterogeneous inflow of news over time. Their work lends credence to long memory dependence being an inherent feature of the DGP and not a spurious manifestation of temporal aggregation (McMillan & Speight, 2008). A notable result in this school of thought was put forward by Souza (2008), who showed that for true long memory series, temporal aggregation does not change the estimated memory parameter.
This association between temporal aggregation and long memory has also formed the basis of a specific class of volatility models called Heterogeneous Autoregressive models (Corsi, 2009; Müller et al., 1997b). These models draw motivation from the Heterogeneous Market Hypothesis (Müller et al., 1993) and also the "Mixture of Distributions Hypothesis" of Andersen and Bollerslev (1997).

Cross-sectional aggregation
Just like temporal aggregation, cross-sectional aggregation can also lead to spurious observations of long memory in time series variables. A large number of AR(1) processes can be added to produce a time series that exhibits long memory under certain assumptions (Granger, 1980). Studies on inflation data have attributed the observed long memory to cross-sectional aggregation, since inflation is measured by aggregating various sectoral sub-indices that possess only short memory (Altissimo et al., 2009; Balcilar, 2004). On the other hand, prior works such as Kang, Cheong, and Yoon (2010) uncover evidence of long memory in the stock index as well as in the underlying constituent stocks.
Cross-sectional aggregation also requires the number of individual series ($N$) to be very large. Granger (1980) postulated this result for $N \to \infty$. However, this count varies across studies depending on the other assumptions used in the Monte Carlo simulations. While Zaffaroni (2004) used a dataset with $N > 1500$ to reproduce the theoretical results, Leccadito, Rachedi, and Urga (2015) simulated long memory with only $N = 500$ components. Another study, by Haldrup and Valdés (2017), showed that the required $N$ seems to depend on the extent of long memory in the individual series: if the individual series have strong long memory, an aggregate of just 250 such series can mimic that inherent long memory, whereas for individual series with low levels of persistence, even an aggregate of $N = 10{,}000$ such series did not display a similar level of long memory. In addition, it was shown that when such a composite series is fractionally differenced, the autocorrelation function of its residuals still exhibits hyperbolic decay. This inability of ARFIMA models to suitably capture the long memory of the true DGP caused by cross-sectional aggregation calls for better models.
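A minimal simulation in the spirit of Granger's (1980) aggregation argument is sketched below; the number of components, the sample size and the Beta distribution for the squared AR coefficients are illustrative choices of ours, not those of the studies cited above.

```python
# Minimal sketch of cross-sectional aggregation: averaging N independent AR(1)
# series whose squared coefficients are drawn from a Beta distribution with mass
# near one yields an aggregate whose sample autocorrelations decay very slowly,
# even though every component is short memory.
import numpy as np

def acf(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(6)
n, N = 10_000, 500
phis = np.sqrt(rng.beta(2.0, 1.5, size=N))      # AR coefficients concentrated near 1

components = np.zeros((N, n))
eps = rng.standard_normal((N, n))
for t in range(1, n):
    components[:, t] = phis * components[:, t - 1] + eps[:, t]

aggregate = components.mean(axis=0)

for lag in (1, 10, 100, 500):
    print(f"lag {lag:4d}: aggregate acf = {acf(aggregate, lag):.3f}")
```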
Long memory can also be observed when a number of linear and homogeneous subsystems with short memory are connected to form a network structure (Schennach, 2018). This provides another possible route to long memory without non-linearity, heterogeneity, unit roots or structural breaks. Economic examples that can be modeled using this approach include firms within an industry and supply-chain time series.

Biases in the estimation process and related issues
Differentiating true from spurious long memory calls for researchers to exercise judgement while choosing the estimation method. Not all estimators are equally suitable in all cases. Various studies have commented on the properties of the popular R/S statistic and its many variants in terms of size and power (Lo, 1991; Teverovsky, Taqqu, & Willinger, 1999). Similarly, notable prior works offer a critical review of the small-sample properties of other estimators, such as, but not limited to, the GPH (Agiakloglou, Newbold, & Wohar, 1993), local Whittle (Hurvich & Ray, 2003), and Higuchi, Peng and wavelet estimators (Rea, Oxley, Reale, & Brown, 2013).
Applying an AR-GARCH filter to returns can significantly reduce the spurious long memory effect, for such a filter would, to a large extent, obviate the confounding effect of short memory (Lo, 1991). Further, the use of low-frequency data can cause a downward bias in long memory estimates (Bollerslev & Wright, 2000; Souza & Smith, 2002). The choice of proxy used to measure volatility also has an effect on long memory estimates (Wright, 2002).
Various characteristics of the data also need to be reviewed before choosing methodologies. For instance, emerging markets, having higher levels of volatility, may be more suited to wavelet-based estimation (Ozun & Cifter, 2008). In general, local Whittle estimators are observed to be the most stable among long memory estimators (Hassler, 2011; Taqqu, Teverovsky, & Willinger, 1995). In addition, cyclical and seasonal patterns in the data can lead to the observation of long memory in the squared return series (Lobato, 1997). Many long memory tests assume unconditional homoscedasticity; tests that allow for heteroscedasticity (Harris & Kew, 2017) should be used for financial time series. Similarly, the absence of higher-order moments can also manifest as long memory (Lobato & Savin, 1998).
While actual datasets may not exhibit these exact pathologies, issues such as the presence of heavy tails are very much real. Consequently, it is advisable to verify a model's assumptions before employing it. If required, more robust models whose assumptions are closer to the empirical properties of the time series should be used.
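Given the repeated reference above to the relative stability of the local Whittle estimator, the following minimal Python sketch shows its concentrated objective together with a simple grid search over $d$; the bandwidth $m = n^{0.65}$ and the search grid are illustrative assumptions, not recommendations from the cited studies.

```python
# Minimal sketch of the local Whittle estimator of d: minimise the concentrated
# objective R(d) = log( mean_j( lambda_j^{2d} * I(lambda_j) ) ) - 2d * mean_j( log lambda_j )
# over a grid of d values in the stationary range.
import numpy as np

def local_whittle(x, bandwidth_power=0.65):
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(np.floor(n ** bandwidth_power))          # bandwidth: number of frequencies used
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / n
    dft = np.fft.fft(x - x.mean())
    periodogram = (np.abs(dft[1:m + 1]) ** 2) / (2.0 * np.pi * n)

    def objective(d):
        return (np.log(np.mean(freqs ** (2.0 * d) * periodogram))
                - 2.0 * d * np.mean(np.log(freqs)))

    grid = np.linspace(-0.49, 0.49, 197)             # coarse grid search over d
    return grid[np.argmin([objective(d) for d in grid])]

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    print("d-hat for white noise (expected near 0):",
          round(local_whittle(rng.standard_normal(4096)), 3))
```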
Since volatility estimation and forecasting are essential in the context of risk management of financial portfolios, Value-at-Risk (VaR) calculations also stand to benefit from long memory models (Batten, Kinateder, & Wagner, 2014;Meng & Taylor, 2018). Allowing for long memory in the cross-section of asset returns can help create specific trading strategies that can generate significant gains (Nguyen, Prokopczuk, & Sibbertsen, 2019).
Notwithstanding the above-stated developments on the modeling front, prior studies also nudge researchers to exercise caution while modeling long memory. Notable studies have shown that similar forecasting results can be approximately matched by standard ARIMA models of very high orders (Ray, 1993). In addition, the forecasting error arising from over-differencing is significantly smaller than that arising from under-differencing 1 (Smith & Yadav, 1994). There are further nuances to be considered here. If the AR/MA coefficients are negative (especially for low values of $d$), standard ARMA models will provide similar short-term predictability; ARFIMA models would be better only for time series with strong persistence ($d$ close to 0.5) or for longer-term prediction (Andersson, 2000; Man, 2003). Moreover, Granger and Hyung (2004) found that modeling a time series with only structural breaks, and separately with only long memory, can provide similar predictive performance, with the long memory model having a slight edge. Similar findings have been reported for many extensions of ARFIMA models. Hence, the model specification would depend on the researcher's choice between parsimonious fractional models and over-parametrized standard ARIMA models.
Another primary motive for employing long memory models is to study the impact of shocks to volatility on asset prices. If the impact of such shocks is short-lived and modest, it calls into question the significance of employing long memory models for examining the DGP, as illustrated by Christensen and Nielsen (2007).

Conclusion
If true long memory can be established, many modeling exercises should change. For instance, the CAPM can be modified to include fractional returns (Raei & Mohammadi, 2008) and a persistent error term (Amano, Kato, & Taniguchi, 2012). Similarly, modeling exercises involving endogenous variables should be geared towards adequately capturing long memory. This would call for refinement of popular multivariate frameworks such as Granger causality tests (Chen, 2006, 2008), VAR-MGARCH (Dark, 2018; Zhao, Liu, Duan, & Li, 2019) and cointegration methods (Granger, 1986; Johansen, 2008). In addition, implied volatility models based on option pricing would be incomplete without incorporating long memory (Cardinali, 2012).
While it cannot be denied that long memory models should be in the econometrician's toolbox, their use should be governed by an initial exploratory analysis of the data and the context of the research questions. The researcher should keep in mind that, to a man with a hammer, everything looks like a nail. Multiple confounding factors, such as, but not limited to, structural breaks and aggregation, can manifest as spurious long memory. This review hopes to nudge researchers to exercise judgement while choosing appropriate long memory models, for inferences derived from misspecified models can lead to misleading policy recommendations.

Note
1. Over-differencing refers to first differencing a time series whose DGP is closer to $I(d)$. Under-differencing refers to fractionally differencing a time series whose DGP is closer to a random walk ($I(1)$).