Autoregressive Modeling of Forest Dynamics

In this work, we employ autoregressive models developed in financial engineering for modeling of forest dynamics. Autoregressive models have some theoretical advantage over currently employed forest modeling approaches such as Markov chains and individual-based models, as autoregressive models are both analytically tractable and operate with continuous state space. We perform time series statistical analysis of forest biomass and basal area recorded in Quebec provincial forest inventories in 1970-2007. The geometric random walk model adequately describes the yearly average dynamics. For individual patches, we fit an AR(1) process capable to model negative feedback (mean-reversion). Overall, the best fit also turns out to be geometric random walk, however, the normality tests for residuals fail. In contrast, yearly means are adequately described by normal fluctuations, with annual growth, on average, 2.3%, but with standard deviation of order 40%. We use Bayesian analysis to account for uneven number of observations per year. This work demonstrates that autoregressive models represent a valuable tool for modeling of forest dynamics. In particular, they quantify stochastic effects of environmental disturbances and develop predictive empirical models on short and intermediate temporal scales.

1. Introduction 1.1. Background. Understanding the dynamics and self-organization of ecosystems is one of the most challenging problems in modern ecology [19]. Self-organization occurs simultaneously on several levels of hierarchical ecosystem organization and involves dynamic processes operating on different temporal and spatial scales [21]. Forest dynamics refers to temporal and spatial changes that occur simultaneously at different levels of ecosystem organization. Various modeling approaches employed to understand and predict these changes include a number of discrete and continuous stochastic and deterministic models, such as Markov chains, individual-based models, ordinary and partial differential equations [41]. A large number of forest models have been developed over the last decades [2,31,33,34,37,40]. Still, there are fundamental questions that existing quantitative approaches fail to fully address. One of the major challenges is the understanding of forest succession and biomass dynamics under the non-stationary disturbance regimes related to climatic factors and anthropogenic activities. An incomplete list of disturbances that substantially affect tree survival and lead to forest biomass decrease include wind, frost, hurricanes, harvesting and forest fires. Markov chains are able to capture effects of all these disturbances on forest biomass dynamics [28]. However, their application is based on the discretization of the state space [39]. Spatially-explicit individual-based models are able to simulate effects of disturbances in continuous time and state space [32,40,41]. However, these models are typically analytically intractable, i.e. the model predictions are produced by computer simulations only, and it limits our ability to understand what model prediction in general [26,40].
Another challenge to our understanding of forest dynamics is the multidimensional nature of this complex adaptive system [21,27]. Forest disturbances are traditionally associated with a loss of biomass. However, Markov chain models based only on biomass do not capture forest succession comprehensively [27,39]. This limitation motivates the need for alternative formulations that are able to consider several forest dimensions instead of only one. In our previous study, we combined multivariate statistical analysis with Markov chain approach to develop a multidimensional Markov Chain [27]. However, simultaneous discretization of several independent forest characteristics of a different nature substantially reduced our ability to understand the ecological meaning of model predictions, which was one of the major advantages of the Markov Chain approach [5,28,34,39].
1.2. Forests and stock markets. In this work, we employ time series (autoregressive models and random walk) to quantify disturbance regimes and to build a predictive stochastic model of multiple disturbance classes. This type of models can overcome both major shortcomings of previous models outlined above. Autoregressive models operate with continuous space, they are analytically tractable, and they can operate with multidimensional characteristics of complex adaptive systems. Similar approaches have been successfully applied, for example, in modeling stock market fluctuations. We develop a stochastic theory of forest dynamics using an analogy to stock market theory in financial mathematics. A stock market is another complex system with random fluctuations due to multiple difficult-to-predict factors. Each individual stock has fluctuations with heavy tails. But the total stock market, measured by an index (such as Standard & Poor 500) has long-run fluctuations (3-4 years) which follow Gaussian distribution. These fluctuations depend on various factors, such a price-earnings ratio (this measures whether the stock market is underpriced or overpriced; one can informally think of this as temperature of the stock market) and Treasury rates (long-term and short-term). These factors, in turn, can be modeled as various autoregressive models. Our main idea is to imagine that individual patches behave like stocks, and an average over a particular region is a stock market index.
1.3. Understanding and modeling of forest patch dynamics. In this work we propose to employ autoregressive models to understand and predict forest dynamics at the patch level. The patch-mosaic concept [46] was actively developed in the second half of the twentieth century after suggestion in [45] that forested ecological systems can be considered a collection of patches at different successional stages. Dynamic equilibrium arises at the level of the patch mosaic rather than at the level of individual patches. The classic patch-mosaic methodology assumes that patch dynamics can be represented by changes in macroscopic variables characterizing the state of the patch as a function of time [20]. Conservation law modeling in the case of continuous time and patch state results in the reaction-advection-diffusion model [20].
Patch dynamics concept can be applied for understanding and predicting of forest dynamics at different levels of forest organisation within the hierarchical patch dynamics paradigm [41,46]. At the level of individual trees patch dynamics concept is traditionally called the forest gap dynamics [30,36,37]. Individual-based forest models capture gap dynamics by simulating growth, competition and mortality of individual trees [3,8,32,33]. Individual-based models and analytically-tractable models approximating gap dynamics [18,40] provide scaling from individual-level dynamics to the next level of forest hierarchical organisation, the stand-level.
In the present work we apply autoregressive models to the stand-level forest patch dynamics. At this level of forest organisation we operate with forest patches (forest stands) consisting of a large number of individual trees [39]. Forest stand is affected not by individual-level tree dynamics, as well as by large-scale disturbances such as forest fires, drought and hurricanes [15,30], which affect many trees in the stand at the same time. The stand-level dynamics scales up to the next hierarchical level of particular forest type or regional patch mosaic (level 3 in the hierarchical patch mosaic Matreshka model [41]). The interplay of individual-level and stand-level changes and disturbances creates complex dynamical patterns at the stand-level. One particular source of complexity is related to a large number of intermediate level disturbances affecting only a fraction of trees in the patch [39]. As the consequence of this system complexity a classical linear patch dynamics model does not capture patch dynamics of the US and Canadian forests [23,28,39]. This classical patchdynamics model can be represented in continuous case by advection-reaction patch-dynamics conservation law model [20], and in discrete case by birth and death process that can be written as a Markov Chain [41], or as a simple forest fire model [43]. As the result, in order to capture patch dynamics of the US and Canadian forests, we need to consider more complicated models. In particular, if we discretize forest dynamics with respect to both time and state variable (biomass) we can achieve an adequate representation of forest patch dynamics within the framework of Markov chains [27,28,39]. Markov chains provide an analytically tractable representation of forest stand dynamics, while they have a discretization error that is challenging to quantify. This work introduces an autoregressive modeling approach in application to the forest patch dynamics in Quebec. Theoretically, our modeling approach will deliver stochastic and analytically tractable models operating with continuous state-space and -time, without discretization errors.
1.4. Our contributions. Here, we model dynamics of Quebec forests using a traditional AR(1) process borrowed from quantitative finance without modifications. We select the Quebec forest inventory for this proof of concept work as it is a long term dataset collected over more than 3 decades using the same field survey protocol [27]. We operate with the same biomass and basal area data derived from Quebec forest inventories in our previous publication on Markov chain modeling (data-mining protocol is available in [25,27]). For both individual patches and the Quebec region, we model logarithms of biomass or basal area as autoregressive process. The best fit, in a certain sense, turns out to be a random walk, with independent increments, which allows to quantify forest disturbance regime overall at the regional level. Regional averages are well-described by normal distribution, while individual patches are not: Fluctuations have heavy tails. This is similar to financial markets, with individual stocks having non-normal fluctuations, and stock indices (in 3-4 years or more) having normal fluctuations. To account for differing number of observations each year, we use Bayesian analysis for annual averages.

2.1.
Data mining of Quebec provincial forest inventories. We base our research on Quebec forest inventory data 1970-2007 www.mffp.gouv.qc.ca. Each permanent forest inventory plot has a circular form of 400 m 2 . The database consists of 32552 plot re-measurements at 11660 different locations. The Quebec forest inventory is designed to comprehensive describe patch mosaic of Quebec forests and plots cover the Quebec territory practically uniformly. The GIS-based map of forest inventory plots is published as Figure 1 in [28]. Forest inventory plots cover hardwood and mixed forests in the northern temperate zone (9621 and 7663 measurements, respectively) and continuous boreal forests in the boreal zone (11969 measurements). These forest patches (forest inventory plots) are remeasured every few years often with irregular time intervals between measurements. The inventory plots were affected by natural and anthropogenic disturbances including fire and harvesting. The statistical analysis of the measurement dynamics and re-measurement intervals are published in [39] (see Figure 2 in Appendix to [39]). In this inventory, each patch observation includes diameter of each tree, its species, soil moisture, and other characteristics. This is the raw data which is then converted to a more tractable data series. In particular, we are interested in biomass and basal area. Calculations of biomass and basal area were previously done according to [17]. The computations of biomass and basal area (as well as other characteristics, such as shade tolerance index, and biodiversity measured by Shannon entropy) is done in articles [23,25,27,39]. The biomass is this article refers to the plot biomass, which is the sum of biomasses of all trees computed using formulas from [17] (see section 3.1 in [27] for the details). The code used for this article is available on GitHub repository asarantsev/Quebec.

2.2.
Autoregressive model for individual forest patches. We propose a new method of modeling the biomass of an individual patch: autoregressive model, when each next year's logarithm of biomass y(t + 1) is a weighted sum of the previous year's logarithm of biomass y(t) and a random Gaussian noise. See the primer on autoregressive models in Appendix B. We measure biomass on a logarithmic scale since it is always positive. That is, where r, a are constants, and ε(t) are i.i.d. (independent identically distributed) N (0, σ 2 ) random variables. If 0 < a < 1, this sequence y(0), y(1), . . . exhibits mean reversion to its long-term average m = r/(1 − a). That is, if y(t) > m, then y(t + 1) − m is likely to be smaller than y(t) − m, and vice versa. Examples of earlier use of such models for forest modeling include [11,12,22]. They include also spatial models (incorporating distance between patches). We shall not attempt it here, instead treating every patch as effectively isolated. Building a spatial model for Quebec forest is left for future research.
Since data is collected on irregular time intervals, we apply (1) multiple times to itself to get the expression of y(t 1 ) from y(t 0 ) if t 0 and t 1 are consecutive years for which this patch was observed. Then we try various a and obtain for each a the maximum likelihood estimate via linear regression. We compare these likelihoods and find the best fit for a. It turns out to be a = 1. That is, this sequence actually does not exhibit any mean reversion, but behaves like a random walk, when each next increment is independent from the past: The biomass itself is a geometric random walk: a process whose logarithm is random walk.
However, the residuals for a = 1 do not pass the normality test. This model does not actually fit well, and we cannot find confidence intervals for a using standard statistical techniques. This is due to noise in measurements of individual patches. Later in the article, we find that the average biomass over all patches exhibits more regular behavior, with normal increments.
We perform two versions of this computation: for biomass and for basal area. For each version, we do it in two ways: (a) original logarithms of biomass/basal area; (b) with logarithm of mean biomass/basal area for this year subtracted. In both cases, the maximum likelihood estimate gives us random walk a = 1.
The biomass and the basal area are highly dependent, and one can plausibly use one of these metrics instead of the other.
We inherit these techniques from quantitative finance. In particular, the geometric random walk model is a classic model for the stock market movements, going back to classic research by [10]. Mean reversion is commonly observed in financial ratios such as earnings yield or dividend yield, [1,16]. See also [4,7,13] for this research and influence of financial ratios on stock market performance. However, these techniques are less known in mathematical biology.

Annual averages.
We are also interested in each year's average over all patches. We have only 38 observations for the mean value. Let c(t) be the logarithm of this mean. We find that the random walk adequately describes this: In particular, we find that the increments ε(t) indeed have normal distribution, and not heavy tails. However, we cannot simply take yearly averages for every year t, since they would have different precision. Reason: each year t has a different number of observations. To account for this, we use Bayesian statistics. We put a prior distribution on the values of µ and ρ 2 (which corresponds to the lack of any existing information), and then compute the posterior distribution from the likelihood. Bayesian techniques are increasingly used in ecology, [9] as well as in medical statistics [14], and quantitative finance [35].

Results and Discussion
We And we define c(t) = ln m(t). The main difficulty is that for almost all patches, a gap between consecutive observations is more than one year, and it differs from patch to patch. In particular, there are 3334 pairs of patch-year observations with the same patch and the gap 8 years, 1923 such pairs with the gap 9 years, but only 66 pairs with gap 22 years. More detailed data is in Appendix D.
where a is an AR(1) parameter, and ε p (t) are i.i.d. N (0, σ 2 ). As mentioned earlier, our main difficulty is that we do not have observations for all t and p, only for t ∈ T (p). Assume t, t + u are subsequent time points in T (p). Then We do this both for y p (t) (raw data),ỹ p (t) = y p (t) − c(t) (centered data), and centered data without t = 1982 and t = 2004. As discussed above, these years have only very few observations, and we do not have much confidence in these values. We could write the log-likelihood of (3) and apply maximum likelihood estimate using gradient descent. However, since we do not have many data points, we can use an equivalent method which is computationally inefficient but easy to implement: Fix an a and run regression with respect to r. Then choose an a such that the standard error of this regression is smallest. To properly apply this linear regression, divide (3) by a constant to make the standard error in error terms in (3) the same for all u: For every a ∈ (−2, 2) which is a multiple of 0.01, fit a simple linear regression (without intercept) to find r and the standard error σ. Appendix A explains why minimizing standard error given normalized residual variance is equivalent to maximizing likelihood. We do this analysis three times: for original values, centered values, and centered values with years 1982 and 2004 removed. We repeat this for biomass and basal area. For all six cases, the parameter a = 1 minimizes the standard error (thus maximizing the likelihood). Thus, the dynamics in (2) is given by y p (t) = r + y p (t − 1) + ε(t), which is a simple random walk with Gaussian increments.
Assuming that the model is, in fact, a random walk, let us make quantile-quantile (QQ) plots for residuals in (4) (we have simple linear regression without intercept):   Figure 2 (B). Then we repeat the computation above for centered values:ỹ p (t) = y p (t) − z(t) instead of y p (t). Similarly, the standard error is minimized for a = 1. For this value, r = 0.0569, σ = 0.224. The QQ plot of residuals is still not normal. For centered values with years 1982 and 2004 removed, the standard error once again is minimized for a = 1, with σ = 0.224 and r = 0.00570. The QQ plot of residuals is still not normal. We omit the standard error graph and the QQ plot for the last two cases: centered data with all years, and centered data without 1982 and 2004, since these plots are very similar to their counterparts for original (non-centered) data.
Modeling basal area data as in (2), and computing standard error of the regression (4), we again get a = 1, r = 0.0337, and σ = 0.207. Again, the best-fitting model among AR (1) according to the maximum likelihood estimation is the random walk. If we center the basal area data, and consider all years, then again, the standard error is minimized for a = 1, with r = 0.00340 and σ = 0.197. Centering the basal area data and removing years 1980 and 2004, we get: a = 1, r = 0.00341, σ = 0.197. In all these cases, similarly to the case of biomass, the QQ plots of residuals for basal area are not normal, with both tails fat. We do not provide pictures of QQ plots, since they are very similar to that for biomass. Correlation between biomass and basal area: Take all patches p and corresponding years t in T (p) Denote by y p (t) the logarithm of biomass, and by y p (t) the logarithm of basal area for patch p and year t. If T (p) = {t 0 , t 1 , t 2 , . . .} has more than one year, order them in increasing order: t 0 < t 1 < t 2 < . . .

Compute correlation coefficient between
It is equal to 0.983. With years 1982 and 2004 removed, this number does not change (in the first four decimal digits). Thus biomass and basal area for individual patches are very correlated. For practical purposes, this means we can use either measure as a size of patch.  3.2. Yearly averages, frequentist analysis. We model the logarithm of yearly average using random walk:    Figure 5 show that, indeed, the residuals are close to normal.
In Figure 3, we plot histograms of the biomass and basal area for an individual patch in 2007. In Figure 4, we simulate biomass and basal area as in (6) until 2019, starting from a patch randomly selected among observed patches in 2007. We superimpose this histogram upon the one for 2007.
Taking increments of logarithms of yearly means for biomass and basal area (37 data points), we get correlation 0.977. For years 1982 and 2004 removed, we get correlation 0.980. Previously, we got very high correlation between increments of logarithms for individual patches, thus we conclude that biomass and basal area are the same for practical purposes, as measures of size. Now we see that the same is true for yearly averages.  A primer on Bayesian statistics for normal distribution can be found in Appendix C. Let for each fixed year t the logarithms of observations are x 1 (t), . . . , x n (t). For these values we performed analysis above: The posterior mean µ(t) and the posterior variance v(t) were generated. The simulation of µ(t) and v(t) was performed N = 1000 times. In Figure 6, we have histograms of 1000 simulations for µ(t) and σ(t) for t = 0 (year 1970), for both biomass and basal area. Hence we obtain 38 sequences of N numbers: µ 1 (t), . . . , µ N (t), t = 1970, . . . , 2007. The average growth rate based on simulated results is The mean increments are: µ i (t) = µ i (t) − µ i (t − 1), t = 1, . . . , T, i = 1, . . . , N . Assuming these are N (g, σ 2 ) i.i.d. we estimate g and σ 2 asĝ in (8) andσ 2 : We compute the point estimates of g and σ 2 for biomass and basal area:  Similarly to the mean estimates, we view these as more scientifically sound that the ones from frequentist analysis from the previous subsection.

General discussion
4.1. Towards autoregressive theory of forest dynamics. We have applied AR(1) autoregressive model to patch/stand dynamics of Quebec forests. Overall, we followed major steps of application of autoregressive model in financial engineering considering stand biomass and basal area as variables similar to a stock market index. However, forest and stock market are different complex systems despite the observed similarities. We consider this work as the first step, and the further discussion is required: it is hard to expect that the forest dynamics modeling will simply mirror stock market theory. One interesting result reported in the Section 2.2 is that a = 1 and the residuals do not pass the normality test. In simple terms, it means that each forest patch is highly volatile, and its dynamics cannot be well described by the normal distribution. Still, among all AR(1), random walk has the maximum likelihood. This suggests for possible use random-walk models with tails heavier than normal. This dichotomy of average patch vs individual patch reminds us of a stock market dynamics, where each individual stock is highly volatile, with non-Gaussian fluctuations [6], but a stock market overall (measured by Standard & Poor 500 or another stock market index) in the long run follows geometric random walk with normal increments (if the time step is large enough, say 3-4 years or more).
From the perspective of random processes a = 1 means that we are dealing with the divergent AR(1) process, traditionally called the random walk, which does not have a stationary distribution. The AR(1) converges to its stationary distribution when −1 < a < 1. This means that biomass and basal area at every step will drift away like in both positive and negative directions. We can search for mechanistic meaning of this result in both fundamental patterns of forested ecosystem dynamics and in general modeling assumptions related to the autoregressive modeling approach and the patch dynamics modeling framework.
Patch dynamics modeling framework assumes that each patch, a forest stand in our case, has the same underlying transition probabilities related to internal changes such as tree growth as well as with respect to natural disturbances [39]. These are typical background assumptions of practically all forest models, however the validity of these assumptions should be evaluated. In particular, in our application we consider the combined data set consisting of hardwood and mixed temperate forest stands as well as continuous boreal forests. In addition, some of the stands even in the same forest types are located in different climatic conditions and might be affected by various disturbance regimes. The overall robustness of modeling predictions with respect to this type of data variability is typically known for traditional models. We have previously investigated effects of these assumptions in the Markov chain modeling framework [27,28,39]. Similarly, individual-based models are often applied for forest simulations in different regions with the same growth/mortality characteristics of particular tree species. Autoregressive modeling is a new approach in forest modeling, and it is not yet known how robust our predictions with respect to variability of forest patch mosaic. We can hypothesise that outcomes reported in the Section 2.2 (a = 0 and the normality test for residuals) might be related to the variability in forest inventories.
Another common underlying assumption in forest modeling is that the random process is time homogeneous or stationary [39]. It is often assumed that climatic and disturbance regime changes occur at the slower scale and can be ignore for the short-and intermediate-term modeling [27]. Time inhomogeneous Markov chains allow to relax this assumption and take into account both climate driven changes in tree growth and frequency of environmental disturbances [28]. Similarly to inhomogeneous Markov chain theory, autoregressive models can be generalized to include directional changes in disturbance dynamics. This leads us to autoregressive integrated moving average (ARIMA) models, which can capture non-stationary (time inhomogenous) dynamics. In particular, we previously detected some non-stationary effects related to the disturbance regimes recorded in the Quebec forest inventory, such as small decrease of the total disturbance rate in boreal and hardwood forests (see Table 1 in [28]). However, the relatively small number of the measurements in the data set did not allow us to report some definite trend. Nevertheless, we anticipate that the pameterization and validation of ARIMA models can be an important step towards time series theory of forest patch dynamics.
Time series analysis can also be applied to forest dynamics at different hierarchical levels. In a certain way autoregressive models are continuous time and state counterpart of discrete Markov chain models. Markov chains were applied at the individual tree and species levels [38,44], standlevel [42], and landscape-level [29]. The autoregressive and ARIMA models can be also employed to capture similar dynamics. In addition, time series models might be used in approximation of the forest gap dynamics and individual-based models.

Future research.
That the random walk with Gaussian increments performed better than any other AR(1) suggests that there is not much dependence of annual change of the biomass on the current biomass. However, testing this statement requires further research.
Time-series analysis is a novel modeling approach for forest modeling at the large scale. We wish to compare this approach with state-of-art mathematical models developed on other principles, in particular with Markov chains [23,25,27,39] and individual-based models [26,41], developed using the same forest inventory data sets. We also need to understand the merits of discrete vs continuous space. This is critical for practical applications of forest models for natural resource management, and risk assessment of forest vulnerability to changes in environmental disturbance regimes.
We can generalize time series approach to model forest dynamics and climate at the regional scale. Such generalization in financial engineering resulted in the development of vector autoregressive integrated moving average (VARIMA) models, which are particularly suited for multidimensional forest modeling under the dynamic disturbance regimes. This can result in an enhanced tolerance-disturbance model build upon our past analyses of tolerance patterns in North American forests, including observed shade tolerance patterns [23], associations between shade tolerance and soil moisture [25], and relative trade-offs between shade, drought, and water-logging tolerance at the continental scale [24].
We can extend our research to the USA Forest Inventories: Analysis of current disturbance regimes across US ecoregions, and how disturbance regimes are connected with other forest macroscopic characteristics. In particular, forest tolerance is a key ecological driver that controls the response of forested ecosystems to climate change-related disturbances such as droughtand fires [24]. We can link climatic models and forest tolerances [23][24][25]. Spatially-explicit information from the USDA Forest Service Forest Inventory and Analysis Program (FIA), Canadian Forest Inventory, and climate data sets and models (WorldClim and PRISM), can be integrated to quantify the effect climate variables in North America have on forest tolerances and disturbances via recently proposed methodology [24].

Conclusion
Individual patches have biomass and basal area whose dynamics (on logarithmic scale) are not well described by classic autoregressive model y(t + 1) = ay(t) + r + ε(t) with normal noise terms ε(t), because the tails of these noise terms turn out to be heavier than normal. However, among these AR(1) models, the best fit (according to the maximum likelihood) is random walk, with a = 1, and increments y(t + 1) − y(t) independent of the past y(s), s ≤ t. Thus one can try a heavy-tailed random walk, in which increments have tails heavier than Gaussian. This topic is left for future research.
In contrast, yearly means (on the log scale) are well described by random walk with normal increments. Bayesian analysis accounts for the fact that different years have different number of observations. We get growth rate (measured on the log scale) 2.3% per year, with standard deviation 46% for biomass, and 37% for basal area. Thus the forest grows on average in the long run, but from year to year there is a lot of volatility.
An important part of forest ecological modeling is to quantify disturbances from fires, droughts, etc. These events quickly destroy significant parts of the forest. An analogy for the stock market would be a crash, as in 2001 or 2008. Indeed, for the stock market, 1-year fluctuations are not adequately described by the normal distribution (only 3-4 years or more are). But for the forest as a whole (as opposed to particular patches), annual fluctuations are normal. We do not need to introduce separate distributions for modeling disturbances. Since we assume the prior is Jeffrey's non-informative and the model is Gaussian, we do not need to do any Monte Carlo or Metropolis-Hastings computations: There are explicit formulas for the posterior distribution of parameters.

Acknowledgments
We thank Dr. Adam Erickson for help with proofreading. Initial data-mining of Quebec forest inventories was done by Dr. Jean Lienard for the Markov chain modeling and published in [27]. We also acknowledge welcoming environment for applied quantitative research in our departments. A.S. is grateful to Dr. Anna Panorska and Dr. Tomasz Kozubowski for mentorship and support. This work was partially supported by a grant from the Simons Foundation (# 283770 to N.S.) 6. Appendix 6.1. Maximal likelihood and minimal standard error. Take a family of linear regressions depending on the parameter a, with d factors and the intercept: Solve for a by maximum likelihood estimation. The Gaussian density for N (0, σ 2 ) is given by The log likelihood of (9) is derived from the Gaussian density (10) and is given by But the standard error of the regression (9) is estimated as Thus we can express We used crucially here that the residuals in (9) are normalized so that they have the same variance σ 2 . To maximize from (11), we need to first minimize the standard error s 2 (a) by computing it for each a and then choosing an appropriate a; then to choose the σ which maximizes (11) for given s 2 (a), which turns out to be σ 2 = s 2 (a).
6.2. Background on autoregressive models and random walk. Consider a time series of random variables (x 0 , x 1 , x 2 , . . .): This is called an autoregressive process of order 1 or AR(1): Regression of this sequence onto itself with a one-step time lag. It has the following properties. For −1 < a < 1, this sequence converges weakly to the stationary distribution: N (m, ρ 2 ), with This means that for every interval [c, d], In addition, mean and variance of x n converge to that of the limiting distribution: E [x n ] → m, Var [x n ] → ρ 2 . Thus this time series exhibits mean reversion: If x n > m then x n+1 is more likely to decrease compared to x n than to increase. For a = 1, this is random walk: Increments x n+1 − x n are independent for different n. This sequence does not have a limit as n → ∞.
We set a non-informative prior on (m, v), which means we do not have any existing information about these parameters: π(m, v) ∝ v −1 . This is an infinite measure: Thus we cannot normalize it (divide it by a constant) to make it a probability measure. But we can still apply Bayesian statistics to this measure. We choose this form of measure because we can get explicit posterior. Bayesian inference works as follows. The likelihood, that is, density of x 1 , . . . , x n given m, v is L(x 1 , . . . , x n | m, v) = (x 1 | m, v) · . . . · (x n | m, v) p(m, v | x 1 , . . . , x n ) ∝ π(m, v) · L(x 1 , . . . , x n | m, v).
Unlike the prior, the posterior is a finite measure. We can normalize it by computing its integral and dividing it by this integral. After computation, we get: p(m, v | x 1 , . . . , x n ) = S n/2 n 1/2 Γ(n/2)(2π) 1 Recall that Gamma distribution with shape α and scale β has density and expectation: density f (z; α, β) = β α Γ(α) z α−1 e −βz , z > 0; Using (14), we can rewrite (13) as follows: That is, v −1 has marginal Gamma distribution with shape n/2 and expectation 1/S; and v has inverse Gamma distribution with shape n/2. The conditional distribution of m given v is normal. The unconditional (marginal) distribution of m is Student (t-distribution). A Student distribution has heavier tails than a normal distribution, which implies more uncertainty, resulting from our Bayesian estimation framework.