Maximum Entropy Bootstrap for Time Series: The meboot R Package

This introduction to the R package meboot is a (slightly) modified version of Vinod and López-de-Lacalle (2009), published in the Journal of Statistical Software. The maximum entropy bootstrap is an algorithm that creates an ensemble for time series inference. Stationarity is not required, and the ensemble satisfies the ergodic theorem and the central limit theorem. The meboot package implements this algorithm. This document introduces the procedure and illustrates its scope by means of several guided applications.


Introduction
This paper illustrates the use of the meboot package for R (R Development Core Team 2008). The package meboot implements the maximum entropy bootstrap algorithm for time series described in Vinod (2004, 2006). The package can be obtained from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=meboot. In the traditional theory, an ensemble Ω represents the population from which the observed time series is drawn. The maximum entropy (ME) bootstrap constructs a large number of replicates (J = 999, say) as elements of Ω for inference using a seven-step algorithm designed to satisfy the ergodic theorem (the grand mean of all ensembles is close to the sample mean). The algorithm's practical appeal is that it avoids structural change and unit root type testing, which involve complicated asymptotics, as well as shape-destroying transformations like detrending or differencing to achieve stationarity. The constructed ensemble elements retain the basic shape and the time dependence structure of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the original time series.
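Before turning to the details, the following minimal sketch shows how an ensemble is created with the package's main function and how the ergodic property can be checked informally. The series, the seed and the number of replicates are illustrative choices on our part, not values prescribed by the text.

R> library(meboot)
R> set.seed(135)                          # arbitrary seed for reproducibility
R> ens <- meboot(x = AirPassengers, reps = 999)$ensemble
R> dim(ens)                               # T rows, one column per replicate
R> c(mean(AirPassengers), mean(ens))      # grand mean of the ensemble is close to the sample mean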
This discussion collects relevant portions of Vinod (2004, 2006) as templates for users of the meboot package. Let us begin with some motivation. Wiener, Kolmogorov and Khintchine (WKK; Wiener 1930; Kolmogorov 1931; Khintchine 1934), among others, developed the stationary model in the 1930s, where the data x_t arise from the Ω mentioned above. Stationary time series are integrated of order zero, I(0). Many real world applications involve a mixture of I(0) and nonstationary I(d) series, where the order of integration d can be different for different series and even fractional, and where the stationarity assumptions are difficult to verify. The situation is much worse in the presence of regime-switching structural changes and other jump discontinuities occurring at arbitrary times. The WKK theory mostly needs zero memory I(0) white noise type processes, and some WKK results are true only for circular processes, implying that we can go back in history (e.g., undo the US Securities and Exchange Commission or the Federal Communications Commission, or return to horse-and-buggy, pre-9/11 days). Irreversibility is an important property of most economic time series, making the assumption of a zero memory I(0) process quite unrealistic. Indeed, social science systems are often dynamic, complex and adaptive, leading to irreversible, non-stationary and sometimes rather short time series. Hence, economists often need: (i) 'non-standard' Dickey-Fuller type sampling distributions for testing regression coefficients (with severe inference problems for panel data), and (ii) detrending and differencing to convert such series to stationarity. The motivation then is to achieve greater flexibility and realism by avoiding both (i) and (ii). Vinod (2004, 2006) offers a computer intensive construction of a plausible ensemble created from a density satisfying the maximum entropy principle. The ME bootstrap algorithm uses quantiles x_{j,t}, for j = 1, . . . , J (J = 999, say), of the ME density as members of Ω, obtained from the inverse of its 'empirical' cumulative distribution function (CDF). The algorithm guarantees the satisfaction of the ergodic theorem (the grand mean of all x_{j,t}, representing the ensemble average, equals the time average of x_t) and the central limit theorem.
Some authors try to bring realism by testing and allowing for finite 'structural changes', often with ad hoc tools. However, the notion of infinite memory of the random walk I(1) is unrealistic because the very definitions of economic series (e.g., quality and content of the gross domestic product, names of stocks in the Dow Jones average) change over finite (relatively short) time intervals. Changing definitions are generally not a problem in natural sciences. For example, the definition of water or the height of an ocean wave is unchanged over time.

Maximum entropy bootstrap
The bootstrap studies the relation between the sample and the (unknown) population by means of a comparable relation between the sample at hand and appropriately designed (observable) resamples. If the observed sample is independent and identically distributed (iid), x_1, . . . , x_T are iid random variables with a common density F. The joint density of the sample is then the T-fold product F^T. If θ̂_T estimates a parameter θ, the unknown sampling distribution of (θ̂_T − θ) is approximated by the conditional distribution of its bootstrap version (θ̂* − θ̂_T); see Lahiri (2003). This section describes the ME bootstrap algorithm and indicates how it extends the traditional iid bootstrap to nonstationary dependent data. The seven steps of the algorithm are as follows.

1. Sort the original data in increasing order to create the order statistics x_(t) and store the ordering index vector.
2. Compute intermediate points z_t = (x_(t) + x_(t+1))/2 for t = 1, . . . , T − 1 from the order statistics.
3. Compute the trimmed mean m_trm of the deviations x_t − x_{t−1} among all consecutive observations. Compute the lower limit for the left tail as z_0 = x_(1) − m_trm and the upper limit for the right tail as z_T = x_(T) + m_trm. These limits become the limiting intermediate points.
4. Compute the mean of the maximum entropy density within each interval such that the 'mean-preserving constraint' (designed to eventually satisfy the ergodic theorem) is satisfied. Interval means are denoted as m_t. The means for the first and the last interval have simpler formulas.
5. Generate random numbers from the [0, 1] uniform interval, compute sample quantiles of the ME density at those points and sort them.
6. Reorder the sorted sample quantiles by using the ordering index of step 1. This recovers the time dependence relationships of the originally observed data.
7. Repeat steps 2 to 6 several times (e.g., J = 999 times) to obtain the desired number of replicates.

A toy example
The procedure described above is illustrated with a small example. Let the sequence x_t = (4, 12, 36, 20, 8) be the series of data observed from period t = 1 to t = 5, as indicated in the first two columns of Table 1. We jointly sort these two columns on the second column and place the result in the next two columns (columns 3 and 4); the intermediate points z_t computed from the sorted values appear in column 5. The ME density is shown in Figure 1 along with the five (half-open) intervals. Note that these intervals join all intermediate points z_t (those in column 5 plus the two limiting ones) without gaps. The uniform densities are also designed to satisfy the 'mean-preserving constraint', by making sure that the interval means of the uniform density, m_t, satisfy the following relations:

m_1 = 0.75 x_(1) + 0.25 x_(2), for the lowest interval,
m_k = 0.25 x_(k−1) + 0.50 x_(k) + 0.25 x_(k+1), for the intermediate intervals k = 2, . . . , T − 1, and
m_T = 0.25 x_(T−1) + 0.75 x_(T), for the highest interval,

where x_(t) are the order statistics. The desired means using these formulas for the toy example are reported in column 6.
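For readers who wish to reproduce parts of Table 1, the short sketch below computes the order statistics, intermediate points and tail limits for the toy series. It uses absolute successive differences for the trimmed mean, as the meboot implementation does, and the 10% trimming proportion is the package default rather than a value stated in the text.

R> xt <- c(4, 12, 36, 20, 8)
R> sx <- sort(xt)                            # order statistics: 4 8 12 20 36
R> zt <- (sx[-length(sx)] + sx[-1]) / 2      # intermediate points (column 5)
R> mtrm <- mean(abs(diff(xt)), trim = 0.10)  # trimmed mean of successive changes
R> z0 <- sx[1] - mtrm                        # lower limit of the left tail
R> zT <- sx[length(sx)] + mtrm               # upper limit of the right tail
R> zt; c(z0, zT)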
Finally, random numbers from the [0, 1] uniform interval are independently drawn to compute quantiles of the ME density (see the left panel of Figure 2). The ME density quantiles obtained in this way form a monotonic series. The final replicate is obtained after recovering the original order, by sorting column 8 according to the index order given in column 3 (see the right panel of Figure 2).

Singh (1981) used Edgeworth expansions to confirm the superiority of the iid boot. He also proved that the iid boot fails for dependent data. See Davison and Hinkley (1997, Chapter 8) and Lahiri (2003) for more recent results. A modification of the iid boot for stationary m-dependent data, called the 'block bootstrap', is extensively discussed by Lahiri (2003). However, if the evolutionary data are non-stationary, one cannot always use 'differencing' operations to render them stationary. The ME bootstrap algorithm is more general, since it does not assume stationarity and does not need possibly 'questionable' differencing operations.

Contrast with traditional iid bootstrap
In addition to avoiding the stationarity assumption, Vinod (2004, 2006) mentions that it is desirable to avoid the following three properties of the traditional iid bootstrap.

• The traditional bootstrap sample obtained from shuffling with replacement repeats some x_t values while not using as many others, and it never admits nearby data values in a resample. We are considering applications where there is no reason to believe that values near the observed x_t are impossible. For example, let x_t = 49.2. Since 49.19 and 49.24 both round to 49.2, there is no justification for excluding all such values.

• The traditional bootstrap resamples must lie in the closed interval [min(x_t), max(x_t)]. Since the observed range is random, we cannot rule out somewhat smaller or larger x_t. Note that the third step of our algorithm implies a less restrictive and wider range [z_0, z_T].

• The traditional bootstrap resample shuffles x_t such that any dependence information in the time series sequence (x_1, . . . , x_t, x_{t+1}, . . . , x_T) is lost in the shuffle. If we try to restore the original order to the shuffled resample of the traditional bootstrap, we end up with essentially the original set x_t, except that some dropped x_t values are replaced by repeats of adjacent values. Hence, it is impossible to generate a large number J of sensibly distinct resamples with the traditional bootstrap shuffle without admitting nearby values.

Shape retention
The j-th ME boot resample {x_{j,t}} retains the shape, or local peaks and troughs, of the original time series x_t, by being 'strongly dependent' on it. We now imagine that the time series x_t represents a set (or bundle) of levels of 'utility' enjoyed by someone. Economic theorists do not like to make interpersonal comparisons of utility, since two persons can never really 'feel' exactly the same level of satisfaction. Yet economists must compare utilities to make policy recommendations by considering preference orderings based on 'ordinal utility theory', which says that utilities experienced by two individuals can be made comparable to each other, provided the two utility bundles satisfy a common partial ordering. Indeed, our ME boot resamples do satisfy a common partial ordering, since their ranks match perfectly.

Imagine that the original {x_t} represents the evolving time path of an individual's income, sensitive to initial resources at birth and intellectual endowments, with a corresponding path of utility (enjoyment) levels. Our ME boot algorithm creates reincarnations of these paths, ensuring that ordinal utilities are comparable across reincarnations, retaining just enough of the basic shape of x_t. See Henderson and Quandt (1980) for a discussion of multi-period consumption and ordinal utility.
Next we provide an example of how ME boot retains the shape as well as the periodicity of the original series by using the AirPassengers time series available in R.
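A minimal sketch of such a comparison is given below; the seed, the number of replicates and the plotting choices are ours rather than prescribed by the text.

R> library(meboot)
R> set.seed(345)
R> air.ens <- meboot(x = AirPassengers, reps = 10)$ensemble
R> matplot(air.ens, type = "l", col = "grey", lty = 1,
+          xlab = "Month index", ylab = "AirPassengers")
R> lines(as.numeric(AirPassengers), lwd = 2)   # original series overlaid on the replicates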

Consumption function
This example describes how to carry out inference through the ME boot ensemble in the following regression:

c_t = β_1 + β_2 c_{t−1} + β_3 y_{t−1} + ε_t,    (1)

for the null hypothesis β_3 = 0.
We use the annual data set employed in Murray (2006, pp. 799-801) to discuss a Keynesian consumption function on the basis of Friedman's permanent income hypothesis (PIH) and a simpler version of Robert Hall's model. The data are the logarithms of US consumption, c_t, and disposable income, y_t, over the period 1948-1998. The packages car (Fox 2002) and lmtest (Zeileis and Hothorn 2002) are useful for extracting information from linear regression models, and we use the interface in the package dynlm (Zeileis 2008) to estimate the model (1) with lagged regressors. The residuals are serially uncorrelated, since the p values of the generalized Durbin-Watson (DW) statistics up to order 4 are larger than the significance level 0.05. (A seed is needed for a reproducible computation of the DW p values, which are obtained by simulation.) The estimated coefficient of lagged income, β̂_3 = 0.027, with standard error se = 0.1439, is statistically insignificant: the 95% confidence interval (−0.263, 0.316) contains zero.
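The estimation code is not shown above; a minimal sketch consistent with the description might look as follows, where ct and yt are assumed to hold the log consumption and log income series as ts objects, and the seed value is an arbitrary choice.

R> library(dynlm)
R> library(car)
R> consu.model <- dynlm(ct ~ L(ct, 1) + L(yt, 1))   # ct and yt must be ts (or zoo) objects
R> summary(consu.model)                             # beta_3 is the coefficient on L(yt, 1)
R> set.seed(135)
R> durbinWatsonTest(consu.model, max.lag = 4)       # generalized DW tests up to order 4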
This result was initially interpreted as supporting Friedman's PIH. However, the large unit root literature argued that the sampling distribution of β̂_3 is nonstandard, and that traditional inference based on the Student's t or asymptotic normal distributions may lead to spurious results. Hence, these days, one uses unit root tests to decide whether differencing or detrending of c_t and y_t would make all variables in a regression integrated of the same order, say I(0). The critical values from a Dickey-Fuller type nonstandard density (originally obtained by simulation) replace the usual Student's t critical values. Our bootstrap also reveals any nonstandard features of the sampling distribution and confidence intervals specific to the problem at hand, avoiding the use of critical values altogether. Thus we can cover a wide variety of situations beyond the one simulated by Dickey and Fuller.
Instead of resampling the residuals, our ME bootstrap resamples the time series in the regression themselves, following the 'resampling cases' bootstrap method. Three advantages of this method noted by Davison and Hinkley (1997, Section 6.2.4) are: (a) it does not use any simulated errors based on the assumed reliability of a parametric model; (b) it does not need to assume that the conditional mean of the dependent variable given a realization of the regressors (E(y|X = x) in standard notation) is linear; and (c) it is robust against heteroscedastic errors. Now we briefly describe the 'resampling cases' method in the context of time series regressions, where the 'case' refers to time. From (1) it is intuitively clear that we should resample only the two 'original' time series c_t and y_t, and then lag them as needed, instead of blindly resampling all three variables (c_t, c_{t−1}, y_{t−1}) in the model. Our bootstrap inference will rely on a confidence interval for any function θ = f(β) of the coefficients β; for example, θ = β_3 for assessing the Friedman hypothesis based on (1).

R> theta <- function(y, x) {
+    reg <- dynlm(y ~ L(y, 1) + L(x, 1))
+    thet <- coef(reg)[3]
+    return(thet)
+  }
The above code represents our choice of simplicity over generality. It is intended to be used upon replacing y by c_t and x by y_t, so that it yields θ = β_3 in (1). For any other example it provides a template needing modifications. If a researcher wishes to analyze the scale elasticity of a Cobb-Douglas type production function (Vinod 2008, pp. 10-11), the regression becomes y = β_0 + β_1 x_1 + β_2 x_2 (all variables in logs). The parameter of interest, θ = β_1 + β_2, is then a sum of two slope coefficients. The modified theta for this example, denoted theta.cobbd, is given by the following code:

R> theta.cobbd <- function(y, x1, x2) {
+    reg <- lm(y ~ x1 + x2)
+    thet <- coef(reg)[2] + coef(reg)[3]
+    return(thet)
+  }
In general, a modification of theta can involve a nonlinear function of several coefficients; for example, if the parameter of interest is the long-run multiplier, it becomes a nonlinear function of the coefficients (Vinod 2008, Section 3.2). The main point is that our θ refers to only one parameter of interest. Any researcher interested in two or more parameters can readily repeat our procedure as often as needed.
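As an illustration of such a nonlinear case (the model and the function name below are our own, not taken from the text): in a dynamic regression y_t = β_1 + β_2 y_{t−1} + β_3 x_{t−1} + ε_t, the long-run multiplier of x on y is β_3/(1 − β_2), and a corresponding theta function could be sketched as follows, assuming dynlm is loaded as before.

R> theta.longrun <- function(y, x) {
+    reg <- dynlm(y ~ L(y, 1) + L(x, 1))
+    b <- coef(reg)
+    thet <- b[3] / (1 - b[2])   # long-run multiplier: beta_3 / (1 - beta_2)
+    return(thet)
+  }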
The following function, called bstar.consu, generates a large number of bootstrap estimates of a single parameter. The bstar in its name suggests that resamples of the third regression coefficient might be denoted {b*_3}. More importantly, it is a template that expects modifications. Its initial arguments refer to data on all 'original' time series (not counting leads and lags as separate series), using the notation y for the dependent variable and x for the regressor. It is flexible, allowing the user to choose the confidence level (default: 95%), the R function theta defining the parameter of interest (which must be predefined), the number of resamples bigJ (default: bigJ = 999), and the seed for the random number generator seed1 (default: seed1 = 135). Since the Cobb-Douglas model involves regressing y on x1 and x2, its function (called bstar.cobbd below) has an additional input x2. It calls the function meboot three times, for y, x1 and x2, and the input to the function theta.cobbd needs both xx1 and xx2 instead of simply xx in the code of bstar.consu below. We believe that it is easy to make such changes to our simple and intuitive bstar type functions. The template bstar.cobbd, for the two-regressor Cobb-Douglas case, explicitly shows how to extend the function bstar.consu to two or more regressors by using x2, x3, x4, . . . as needed.

R> bstar.consu <- function(y, x, theta = theta,
+    level = 0.95, bigJ = 999, seed1 = 135)
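The body of bstar.consu is not reproduced above. A minimal sketch of what such a function might look like, consistent with the description (the internal variable names, the loop and the ts() wrapping are our own assumptions, not the package authors' code), is:

R> bstar.consu <- function(y, x, theta = theta,
+    level = 0.95, bigJ = 999, seed1 = 135) {
+    # theta must be a predefined function of (y, x); pass it explicitly in the call
+    # (the 'level' argument is retained from the description; the interval itself is formed later)
+    set.seed(seed1)
+    semy <- meboot(x = y, reps = bigJ)$ensemble   # ME ensembles for the two
+    semx <- meboot(x = x, reps = bigJ)$ensemble   # 'original' series
+    bb <- rep(NA, bigJ)
+    for (j in 1:bigJ) {
+      yy <- ts(semy[, j], start = start(y), frequency = frequency(y))
+      xx <- ts(semx[, j], start = start(x), frequency = frequency(x))
+      bb[j] <- theta(yy, xx)   # parameter of interest for replicate j
+    }
+    return(bb)
+  }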
R> bstar.cobbd <- function(y, x1, x2, theta = theta.cobbd,
+    level = 0.95, bigJ = 999, seed1 = 135)

We now use the ME bootstrap ensemble to construct a confidence interval for β_3 in (1), without having to assume that the distribution is Student's t or Dickey-Fuller. That is, we use the output of the function bstar.consu to construct a confidence interval for β_3 to help decide whether β̂_3 is statistically significantly different from zero. Assuming the earlier code is in memory, let us begin by computing the simplest percentile interval, using the R function quantile with type = 8 (as recommended by Hyndman and Fan 1996; see also help("quantile")).
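A sketch of this percentile interval computation, assuming the bootstrap estimates are stored in a vector b3s and that ct and yt hold the consumption and income series as before:

R> b3s <- bstar.consu(ct, yt, theta = theta)         # 999 bootstrap estimates of beta_3
R> quantile(b3s, probs = c(0.025, 0.975), type = 8)  # simple 95% percentile interval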
The statistical theory behind these refinements is mentioned at the beginning of Section 2. In the present context, the bootstrap estimates of θ (see b3s above) are θ̂*_j, with j = 1, 2, . . . , J. If the standard error se of θ̂ is known, then the (θ̂*_j − θ̂)/se values provide a good approximation to the sampling distribution of (θ̂ − θ)/se, the Wald statistic. The code in boot.ci is designed to correct for bias and improve the asymptotic properties of bootstrap confidence intervals.
Finally, let us consider sophisticated confidence intervals based on highest density regions (HDR) of sampling distributions (Hyndman 1996). If f(θ) is the density and α is the type I error (= 0.05, say), then the 100(1 − α)% HDR is the subset of the sample space of the random variable defined as

HDR(f_α) = {θ : f(θ) ≥ f_α},    (2)

where f_α is the largest constant such that the probability statement Pr(θ ∈ HDR(f_α)) ≥ 1 − α holds true. Highest density means that every point inside the HDR has probability density at least as large as every point outside the HDR. When the sampling distribution is bimodal or multimodal, the HDR seems to be a reliable way of finding confidence regions. Hyndman (1996) discusses many advantages of HDR methods. We use the R package hdrcde (Hyndman 2008) to find HDR regions, with graphics, for a study of the sampling distribution of θ̂ under the null. It also reports the value of f_α appearing in equation (2). Note that even the 50% confidence region (HDR) starts at nearly zero, while the 95% region decidedly covers zero. However, the largest constants f_α are all positive. Our meboot results (including the HDR) support Friedman's PIH, since zero is inside all 95% confidence intervals for β_3.
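A minimal sketch of such an HDR computation with hdrcde, again assuming the bootstrap estimates are stored in b3s (the probability levels shown are illustrative):

R> library(hdrcde)
R> hdr.b3 <- hdr(b3s, prob = c(50, 95))   # HDR endpoints and the f_alpha values
R> hdr.b3
R> hdr.den(b3s, prob = c(50, 95))         # same regions, displayed on the density plot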

Assessment of the Fed effect on stock prices using panel data
This example shows how the ME bootstrap can be employed for panel data analysis. Our example is from Vinod (2002) where the effect of monetary policy (interest rates) on prices and their 'turning points' in the stock market is evaluated in greater detail.
The 'Fed effect' discussed in the financial press refers to a rally in the S&P 500 index of stock prices a few days before the Federal Reserve Bank (Fed) policy makers' meeting and a price decline after the meeting. This example focuses on a longer horizon than daily price fluctuations by using monthly data (May 1993 to November 1998, with T = 67) for stocks with ticker symbols ABT, AEG, ATI, ALD, ALL, AOL, and AXP, and regards this as a representative sample of the market containing N = 7 individual companies.
Note that when the Fed adjusts the Fed funds rate, it affects market expectations and, hence, the interest rate on 3-month Treasury bills (Tb3), the key short-run interest rate in the economy. Our simple model of monetary policy regresses the stock price (P) on the natural log of market capitalization (LMV, a control variable for the size of the firm) and Tb3. We write

P_{it} = β_1 + β_2 LMV_{it} + β_3 Tb3_t + ε_{it},    (3)

where the subscript it refers to the i-th individual (company) at time t and the ε_{it} are assumed to be iid. The Fed effect is present if the coefficient of the variable Tb3 in equation (3) is statistically significant.
We use the R package plm (Croissant and Millo 2008) for basic estimation of panel data models before any bootstrap. It expects the data in the form of a data.frame object. Accordingly, the package meboot provides the data for this example as a data.frame object called ullwan. Let us replace the third column, containing the 'market value of the firm', by its logarithm, denoted LMV, within the data frame object. Since the data setup is critical, it is perhaps useful to illustrate our (slightly revised) data frame ullwan by displaying the initial and ending observations. Note that the first column, entitled Subj, contains the identification numbers 1 to 7 for the 7 ticker symbols included in the data set, and that all time series for the first ticker symbol, ABT, are placed together at the beginning of the data set; these are viewed with the command head(ullwan). Trailing data for observation numbers 464 to 469, belonging to the last ticker symbol AXP, are viewed with the command tail(ullwan).
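A sketch of this data preparation step is shown below. The exact column names of ullwan are not reproduced in the text, so the column-number indexing and the renaming are assumptions to be checked against names(ullwan).

R> library(plm)
R> library(meboot)
R> data("ullwan", package = "meboot")
R> ullwan[, 3] <- log(ullwan[, 3])   # third column: market value, replaced by its log
R> names(ullwan)[3] <- "LMV"         # rename so that later formulas can refer to LMV
R> head(ullwan)                      # first rows: ticker ABT (Subj = 1)
R> tail(ullwan)                      # rows 464 to 469: ticker AXP (Subj = 7)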

Pooled effects
Pooled regression means combining the cross-sectional data and the time series data into one large set of T × N (= 67 × 7 = 469 here) observations. We note below that the Student's t value and the corresponding p value from the pooled model suggest a highly significant Tb3 regressor, implying that the Fed announcement does have a statistically significant effect on the level of stock prices. The multiple R^2 is 0.497, which becomes 0.495 when adjusted for degrees of freedom.

R> summary(lm(Price ~ LMV + Tb3, data = ullwan))
The t value for the coefficient of Tb3 from the pooled model suggests that the Fed does have a statistically significant effect on the level of stock prices in a pooled regression.
The default use of the function meboot is illustrated in the earlier subsection in bstar.consu and bstar.cobbd, where the argument x represents a single vector of numbers (usually a time series). In this subsection we require meboot to create J replicates over time, separately for each of the N individuals, to suit the panel data. Since the plm package expects panel data in the form of a data.frame object, our function meboot is designed to expect similarly organized data. For example, since the identifier for the individual (company ticker), called Subj, is located in the first column of the ullwan data frame, meboot expects colsubj = 1 as an argument. It is necessary to call the meboot function separately for each relevant column of the data frame, identified by its column number given as the argument coldata. For example, we set coldata = 3 for LMV, since the third column holds the data on LMV. The mere presence of non-null values for colsubj and coldata in the call to meboot triggers its panel data version.
We create J = 999 ensembles for the T = 67 time points of the N = 7 stocks, separately for each of the three variables in our regression (P, LMV and Tb3). Each call yields an ensemble of 1000 sets of 67 × 7 = 469 data points, upon including the original data as the first column and 999 additional columns of similarly evolving time series.
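The calls themselves are not reproduced above; a sketch of what they might look like is given below. The column number for LMV (coldata = 3) is stated in the text, whereas the object names, the seed, and the column numbers for Price and Tb3 are assumptions to be checked against names(ullwan).

R> set.seed(135)
R> Price.ens <- meboot(x = ullwan, reps = 999, colsubj = 1, coldata = 2)  # Price column assumed
R> LMV.ens   <- meboot(x = ullwan, reps = 999, colsubj = 1, coldata = 3)  # LMV is in column 3
R> Tb3.ens   <- meboot(x = ullwan, reps = 999, colsubj = 1, coldata = 4)  # Tb3 column assumed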

Random effects
The random effects model results are obtained by setting model = "random" in the arguments of the function plm.
R> gir <- plm(Price ~ LMV + Tb3, data = ullwan, model = "random")
R> coef(gir)

Concluding remarks

Economic time series are typically state-dependent, evolving and nonstationary data. Besides econometrics, at least some time series in biology, engineering and the social sciences are similarly state-dependent and subject to structural changes and discontinuities. All such series cannot be transformed into stationary series without impairing our understanding of the underlying relations among them. The meboot R package not only fills a gap in the bootstrap toolkit, but is particularly useful because it permits simpler model specifications (allowing a direct use of one or more such time series without first making them stationary) and convenient statistical inference (avoiding non-standard Dickey-Fuller type sampling distributions).