How predictable is technological progress?

Recently it has become clear that many technologies follow a generalized version of Moore's law, i.e. costs tend to drop exponentially, at different rates that depend on the technology. Here we formulate Moore's law as a correlated geometric random walk with drift, and apply it to historical data on 53 technologies. We derive a closed form expression approximating the distribution of forecast errors as a function of time. Based on hind-casting experiments we show that this works well, making it possible to collapse the forecast errors for many different technologies at different time horizons onto the same universal distribution. This is valuable because it allows us to make forecasts for any given technology with a clear understanding of the quality of the forecasts. As a practical demonstration we make distributional forecasts at different time horizons for solar photovoltaic modules, and show how our method can be used to estimate the probability that a given technology will outperform another technology at a given point in the future.


Introduction
Technological progress is widely acknowledged as the main driver of economic growth, and thus any method for improved technological forecasting is potentially very useful. Given that technological progress depends on innovation, which is generally thought of as something new and unanticipated, forecasting it might seem to be an oxymoron. In fact there are several postulated laws for technological improvement, such as Moore's law and Wright's law, that have been used to make predictions about technology cost and performance. But how well do these methods work?
Predictions are useful because they allow us to plan, but to form good plans it is necessary to know the probabilities of possible outcomes. Point forecasts are of limited value unless they are very accurate, and when uncertainties are large they can even be dangerous if they are taken too seriously. At the very least one needs error bars, or better yet, a distributional forecast, estimating the likelihood of different future outcomes. Although there are now a few papers testing technological forecasts[1], there is as yet no method that gives distributional forecasts. In this paper we remedy this situation by deriving the distribution of forecast errors for a simple forecasting method and testing our predictions on empirical data on technology costs.
To motivate the problem that we address, consider three technologies related to electricity generation: coal mining, nuclear power and photovoltaic modules. Figure 1 compares their long-term historical prices. Over the last 150 years the inflation-adjusted price of coal has fluctuated by a factor of three or so, but shows no long-term trend, and indeed from the historical time series one cannot reject the null hypothesis of a random walk with no drift[2] (McNerney et al. 2011). The first commercial nuclear power plant was opened in 1956. The cost of electricity generated by nuclear power is highly variable, but has generally increased by a factor of two or three during the period shown here. In contrast, solar photovoltaic modules, first put to practical use as a power supply for the Vanguard I satellite in 1958, dropped in price by a factor of about 2,330 between 1956 and 2013, and since 1980 have decreased in cost at an average rate of about 10% per year[3].
In giving this example we are not trying to make a head-to-head comparison of the full system costs for generating electricity. Rather, we are comparing three different technologies: coal mining, nuclear power and photovoltaic manufacture. Generating electricity with coal requires plant construction (whose historical cost has dropped considerably since the first plants came online at the beginning of the 20th century). Generating electricity via solar photovoltaics has balance of system costs that have not dropped as fast as those of modules in recent years. Our point here is that different technologies can decrease in cost at very different rates.
Predicting the rate of technological improvement is obviously very useful for planning and investment. But how consistent are such trends? In response to a forecast that the trends above will continue, a skeptic would rightfully respond, "How do we know that the historical trend will continue? Isn't it possible that things will reverse, and over the next 20 years coal will drop in price dramatically and solar will go back up?"
Figure 1: A comparison of long-term price trends for coal, nuclear power and solar photovoltaic modules. Prices for coal and nuclear power are levelized costs in the US in dollars per kilowatt hour (scale on the left) whereas solar modules are in dollars per watt-peak, i.e. the cost for the capacity to generate a watt of electricity in full sunlight (scale on the right). For coal we use units of the cost of the coal that would need to be burned in a modern US plant if it were necessary to buy the coal at its inflation-adjusted price at different points in the past. Nuclear prices are levelized electricity costs for US nuclear plants in the year in which they became operational (from Cooper (2009)). The alignment of the left and right vertical axes is purely suggestive; based on recent estimates of levelized costs, we took $0.177/kWh = $0.82/Wp in 2013 (2013$). The number $0.177/kWh is a global value produced as a projection for 2013 by the International Energy Agency (Table 4 in International Energy Agency (2014)). We note that it is compatible with estimated values (Table 1 in Baker et al. (2013), Fig. 4 in International Energy Agency (2014)). The red cross is the agreed price for the planned UK nuclear power plant at Hinkley Point, which is scheduled to come online in 2023 (£0.0925 ≈ $0.14). The dashed line corresponds to an earlier target of $0.5/kWh set by the DOE (the U.S. Department of Energy).
By studying the history of many technologies our paper provides a quantitative answer to this question. We put ourselves in the past, pretend we don't know the future, and use simple methods to forecast the costs of 53 different technologies. Actually going through the exercise of making out-of-sample forecasts, rather than simply performing in-sample tests and computing error bars, has the essential advantage that it allows us to say precisely how well forecasts would have performed. Out-of-sample testing is particularly important when models are mis-specified, which one expects for a complicated phenomenon such as technological improvement.
We show how one can combine the experience from all these different technologies to say, "Based on experience with many other technologies, the trend in photovoltaic modules is unlikely to reverse". Indeed we can assign a probability to different price levels at different points in the future, as is done later in Figure 10. Of course technological costs occasionally experience structural breaks where trends change; there are several clear examples in our historical data. The point is that, while such structural breaks happen, they are not so common as to override our ability to forecast. And of course, every technology has its own story, its own specific set of causes and effects that explain why costs went up or down in any given year. Nonetheless, as we demonstrate here, the long-term trends tend to be consistent, and can be captured via historical time series methods with no direct information about the underlying technology-specific stories.
In this paper we use a very simple approach to forecasting, which was originally motivated by Moore's law. As everyone knows, Intel's ex-CEO, Gordon Moore, famously predicted that the number of transistors on integrated circuits would double every two years, i.e. grow at an annual rate of about 40% (Moore 1965). Making transistors smaller also brings along a variety of other benefits, such as increased speed, decreased power consumption, and lower manufacturing costs per unit of computation, so it quickly became clear that Moore's law applies more broadly, with appropriate alterations in the exponent. For example, when combined with the predicted increase in transistor density, the scaling of transistor speed as a function of size gives a doubling of computational performance every 18 months.
Moore's law stimulated others to look at related data more carefully, and they discovered that exponential improvement is a reasonable approximation for other types of computer hardware as well, such as hard drives. Since the performance of hard drives depends on physical factors that are unrelated to transistor density this is an independent fact (though of course the fact that mass storage is essential for computation causes a tight coupling between the two technologies). Lienhard, Koh and Magee, and others[4] have examined data for other products and postulated that exponential improvement is a much more general phenomenon that applies to many different technologies. An important difference that in part explains why exponential scaling was discovered later for other technologies is that the rates of improvement vary widely and computer hardware is an outlier in terms of rate.
Although Moore's law is traditionally applied as a regression model, we reformulate it here as a geometric random walk with drift. This allows us to use standard results from the time series forecasting literature[5]. The technology time series in our sample are typically rather short, often only 15 or 20 points long, so to test hypotheses it is essential to pool the data. Because the geometric random walk is so simple it is possible to derive formulas for the errors in closed form. This makes it possible to estimate the errors as a function of both sample size and forecasting horizon, and to combine data from many different technologies into a single analysis. This allows us to get highly statistically significant results.
We find that even though the random walk with drift is a better model than the generally used linear regression on a time trend, it does not fully capture the temporal structure of technological progress. We present evidence suggesting that a key difference is that the data are positively autocorrelated. As an alternative we develop the hypothesis that technological progress follows a random walk with drift and autocorrelated noise, which we capture via an Integrated Moving Average process of order (1,1), hereafter referred to as an IMA(1,1) model. Under the assumption of sufficiently large autocorrelation this method produces a good fit to the empirically observed forecasting errors. We apply our method to forecast the likely distribution of the price of photovoltaic solar modules and show it can be used to estimate the probability that the price will undercut a competing technology at a given date in the future.
[4] Examples include Lienhard (2006), Koh & Magee (2006, 2008), Bailey et al. (2012), Benson & Magee (2014a,b), Nagy et al. (2013). Studies of improvement in computers over long spans of time indicate superexponential improvement (Nordhaus 2007, Nagy et al. 2011), suggesting that Moore's law may only be an approximation reasonably valid over spans of time of 50 years or less. See also e.g. Funk (2013) for an explanation of Moore's law based on geometric scaling, and Funk & Magee (2014) for empirical evidence regarding fast improvement prior to large production increase.
[5] Several methods have been defined to obtain prediction intervals, i.e. error bars for the forecasts (Chatfield 1993). The classical Box-Jenkins methodology for ARIMA processes uses a theoretical formula for the variance of the process, but does not account for uncertainty due to parameter estimates. Another approach is to use the empirical forecast errors to estimate the distribution of forecast errors. In this case, one can use either the in-sample errors (the residuals, as in e.g. Taylor & Bunn (1999)), or the out-of-sample forecast errors (Williams & Goodman 1971, Lee & Scholtes 2014). Several studies have found that using residuals leads to prediction intervals which are too tight (Makridakis & Winkler 1989).
We want to stress that we do not mean to claim that the generalizations of Moore's law explored here provide the best method for forecasting technological progress. There is a large literature on experience curves[6], studying the relationship between cost and cumulative production originally suggested by Wright (1936), and many authors have proposed alternatives and generalizations[7]. Nagy et al. (2013) tested these alternatives using a data set that is very close to ours and found that Moore's and Wright's laws were roughly tied for first place in terms of their forecasting performance. An important caveat is that Nagy et al.'s study was based on regressions, and as we argue here, time series methods are superior, both for forecasting and for statistical testing. It is our intuition that methods using auxiliary data such as production, patent activity, or R&D are likely to be superior[8].
The key assumption made here is that all technologies follow the same random process, even if the parameters of the random process are technology specific. This allows us to develop distributional forecasts in a highly parsimonious manner and efficiently test them out of sample. We develop our results for Moore's law because it is the simplest method to analyze, but this does not mean that we necessarily believe it is the best method of prediction. We restrict ourselves to forecasting unit cost in this paper, for the simple reason that we have data for it and it is comparable across different technologies. The work presented here provides a simple benchmark against which to compare forecasts of future technological performance based on other methods.
The general approach of basing technological forecasts on historical data, which we pursue here, stands in sharp contrast to the most widely used method, which is based on expert opinion. The use of expert opinions is clearly valuable, and we do not suggest that it should be supplanted, but it has several serious drawbacks. Expert opinions are subjective and can be biased for a variety of reasons, including common information, herding, or vested interest (Albright 2002, National Research Council 2009). Forecasts for the costs of nuclear power in the US, for example, were for several decades consistently low by roughly a factor of three (Cooper 2009). A second problem is that it is very hard to assess the accuracy of expert forecasts. In contrast the method we develop here is objective and the quality of the forecasts is known. Nonetheless we believe that both methods are valuable and that they should be used side-by-side.
The remainder of the paper develops as follows. In Section 2 we derive the error distribution for forecasts based on the geometric random walk as a function of time horizon and other variables and show how the data for different technologies and time horizons should be collapsed. We also introduce the Integrated Moving Average model and derive similar (approximate) formulas. In Section 3 we describe our data set and present an empirical relationship between the variance of the noise and the improvement rate for different technologies. In Section 4 we describe our method of testing the models against the data and present the results. We then apply our method to give a distributional forecast for solar module prices in Section 5 and show how this can be used to forecast the likelihood that one technology will overtake another. Finally we give some concluding remarks in Section 6. A variety of technical results are given in the appendices.

Geometric random walk
The generalized version of Moore's law we study here is a postulated relationship which in its deterministic form is
$$ p_t = p_0 e^{\mu t}, $$
where $p_t$ is either the unit cost or the unit price of a technology at time $t$; we will hereafter refer to it as the cost. $p_0$ is the initial cost and $\mu$ is the exponential rate of change. (If the technology is improving then $\mu < 0$.) In order to fit this to data one has to allow for the possibility of errors and make an assumption about their structure. Typically the literature has treated Moore's law using linear regression, minimizing least squares errors to fit a model of the form
$$ y_t = y_0 + \mu t + e_t, \tag{1} $$
where $y_t = \log(p_t)$. From the point of view of the regression, $y_0$ is the intercept, $\mu$ is the slope and $e_t$ is independent and identically distributed (IID) noise.
We instead cast Moore's law as a geometric random walk with drift by writing it in the form
$$ y_t = y_{t-1} + \mu + n_t, \tag{2} $$
where as before $\mu$ is the drift and $n_t$ is an IID noise process. Letting the noise go to zero recovers the deterministic version of Moore's law in either case. When the noise is nonzero, however, the models behave quite differently. For the regression model the shocks are purely transitory, i.e. they do not accumulate. In contrast, if $y_0$ is the log cost at time $t = 0$, Eq. 2 can be iterated and written in the form
$$ y_t = y_0 + \mu t + \sum_{i=1}^{t} n_i. \tag{3} $$
This is equivalent to Eq. 1 except for the last term. While in the regression model of Eq. 1 the value of $y_t$ depends only on the current noise and the slope $\mu$, in the random walk model (Eq. 2) it depends on the sum of previous shocks. Hence, shocks in the random walk model accumulate, and the forecasting errors grow with time horizon as one would expect, even if the parameters of the model are perfectly estimated.
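The contrast between transitory and accumulating shocks can be checked numerically. The following is a minimal sketch, not part of the original analysis: the drift of -0.1, the noise level of 0.1 and the function names are illustrative assumptions. It simulates both models and compares the spread of the final log cost across many paths.

```python
import random
import statistics


def simulate_trend(y0, mu, sigma, steps, rng):
    """Regression form (Eq. 1): deterministic trend plus transitory IID noise."""
    return [y0 + mu * t + rng.gauss(0.0, sigma) for t in range(1, steps + 1)]


def simulate_random_walk(y0, mu, sigma, steps, rng):
    """Random walk form (Eq. 2): every step adds the drift and a shock that persists."""
    path, y = [], y0
    for _ in range(steps):
        y += mu + rng.gauss(0.0, sigma)
        path.append(y)
    return path


def endpoint_spread(simulate, steps, runs=2000, seed=0):
    """Standard deviation of the final log cost across many simulated paths."""
    rng = random.Random(seed)
    # Hypothetical parameters: mu = -0.1, sigma = 0.1 (chosen only for illustration).
    finals = [simulate(0.0, -0.1, 0.1, steps, rng)[-1] for _ in range(runs)]
    return statistics.stdev(finals)


for steps in (1, 4, 16):
    print(steps,
          round(endpoint_spread(simulate_trend, steps), 3),
          round(endpoint_spread(simulate_random_walk, steps), 3))
```

Under these assumptions the spread of the trend model stays near the noise level at every horizon, while the spread of the random walk grows roughly like the square root of the number of steps.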
Another important difference is that because Eq. 2 is a time series model the residual associated with the most recently observed data point is by definition zero. For the regression model, in contrast, the most recent point may have a large error; this is problematic since as a result the error in the forecast for time horizon $\tau = 1$ can be large. (Indeed the model generally doesn't even agree with the data for $t = 0$, where $p_t$ is known.)
For time series models a key question is whether the process is nonstationary, i.e. whether the process has a unit root. Most of our time series are much too short for unit root tests to be effective (Blough 1992). Nonetheless, we find that our time series forecasts are consistent with the hypothesis of a unit root and that they perform better than several alternatives.

Prediction of forecast errors
We assume that all technologies follow the same random process except for technology-specific parameters. Rewriting Eq. 2 slightly, it becomes
$$ y_{it} = y_{i,t-1} + \mu_i + n_{it}, \tag{4} $$
where the index $i$ indicates technology $i$. For convenience we assume that the noise $n_{it}$ is IID normal, i.e. $n_{it} \sim N(0, K_i^2)$. This means that technology $i$ is characterized by the drift $\mu_i$ and the standard deviation of the noise $K_i$. We will typically not include the indices for the technology unless we want to emphasize the dependence on technology.
We now derive the expected error distribution for Eq. 2 as a function of the time horizon $\tau$. Eq. 2 implies that
$$ y_{t+\tau} = y_t + \mu\tau + \sum_{i=t+1}^{t+\tau} n_i. \tag{5} $$
The prediction $\tau$ steps ahead is
$$ \hat{y}_{t+\tau} = y_t + \hat{\mu}\tau, \tag{6} $$
where $\hat{\mu}$ is the estimated $\mu$. The forecast error is defined as
$$ \mathcal{E} = y_{t+\tau} - \hat{y}_{t+\tau}. \tag{7} $$
Putting Eqs. 5 and 6 into 7 gives
$$ \mathcal{E} = \tau(\mu - \hat{\mu}) + \sum_{i=t+1}^{t+\tau} n_i, \tag{8} $$
which separates the error into two parts. The first term is the error due to the fact that the mean is an estimated parameter, and the second term represents the error due to the fact that unpredictable random shocks accumulate (Sampson 1991). Assuming that the noise increments are IID normal, and that the estimation of the parameters is based on $m$ data points, in Appendix B.1 we derive the scaling of the errors with $m$, $\tau$ and $\hat{K}$, where $\hat{K}^2$ is the estimated variance based on a trailing sample of $m$ data points. To study how the errors grow as a function of $\tau$, because we want to aggregate forecast errors for technologies with different volatilities, we use the normalized mean squared forecast error $\Xi(\tau)$, which is
$$ \Xi(\tau) \equiv E\left[\left(\frac{\mathcal{E}}{\hat{K}}\right)^2\right] = \frac{m-1}{m-3}\left(\tau + \frac{\tau^2}{m}\right). \tag{9} $$
This formula makes intuitive sense. The term that grows proportional to $\tau$ is the diffusive term, i.e. the growth of errors due to the noise. This term is present even in the limit $m \to \infty$, where the estimation is perfect. The term that is proportional to $\tau^2/m$ is due to estimation error. The prefactor is due to the fact that we use the estimated variance, not the true one, implying that the distribution is not normal[9] but rather is Student t:
$$ \epsilon = \frac{1}{\sqrt{A}}\,\frac{\mathcal{E}}{\hat{K}} \sim t(m-1), \tag{10} $$
with
$$ A = \tau + \frac{\tau^2}{m}. \tag{11} $$
The linear term corresponds to diffusion due to noise and the quadratic term to the propagation of errors due to misestimation of $\mu$. Equation (9) is universal, in the sense that the right hand side depends neither on $\mu_i$ nor $K_i$, hence it does not depend on the technology. As a result, we can pool all the technologies to analyze the errors. Moreover, the right hand side of Eq. 10 does not even depend on $\tau$, so we can pool different forecast horizons together as well.
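The scaling of the normalized error can be checked by simulation. The sketch below is an illustration only: the function names, the drift of -0.1, the volatility of 0.2 and the window m = 10 are assumptions. The predicted value is the variance of a Student t variable with m - 1 degrees of freedom, (m - 1)/(m - 3), multiplied by the sum of the diffusive term and the estimation term, τ + τ²/m. A window of m = 10 is used rather than m = 5 so that the Monte Carlo average of the squared errors converges reasonably quickly.

```python
import random


def predicted_xi(tau, m):
    """Predicted mean squared normalized forecast error; requires m > 3."""
    return (m - 1) / (m - 3) * (tau + tau ** 2 / m)


def simulated_xi(tau, m, mu=-0.1, k=0.2, runs=40000, seed=1):
    """Monte Carlo estimate of E[(error / K_hat)^2] for a random walk with drift."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        # m observed first differences of the log cost (the estimation window)
        past = [mu + rng.gauss(0.0, k) for _ in range(m)]
        mu_hat = sum(past) / m
        k_hat_sq = sum((d - mu_hat) ** 2 for d in past) / (m - 1)
        # realized change over the next tau steps minus the forecast change
        future = sum(mu + rng.gauss(0.0, k) for _ in range(tau))
        error = future - mu_hat * tau
        total += error ** 2 / k_hat_sq
    return total / runs


print(predicted_xi(5, 10), simulated_xi(5, 10))
```

Neither the drift nor the volatility appears in `predicted_xi`, which is the sense in which the result is universal: changing `mu` or `k` in the simulation should leave the simulated value unchanged up to sampling noise.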

Integrated Moving Average model
Although the random walk model above does surprisingly well, and has the important advantage that all the formulas above are simple and intuitive, as we will see when we discuss our empirical results in the next section, there is good evidence that there are positive autocorrelations in the data. In order to incorporate this structure we extend the results above to an ARIMA(0,1,1) (autoregressive integrated moving average) model. The zero indicates that we do not use the autoregressive part, so we will abbreviate this as an IMA(1,1) model in what follows. The IMA(1,1) model is of the form
$$ y_t - y_{t-1} = \mu + v_t + \theta v_{t-1}, $$
with the noise $v_t \sim N(0, \sigma^2)$. When $\theta \neq 0$ the time series are autocorrelated; $\theta > 0$ implies positive autocorrelation.
We chose this model mainly because performing Box-Jenkins analysis on the series for individual technologies tends to suggest it more often than other ARIMA models (but less often than the pure random walk with drift[10]). Moreover, our data are often time-aggregated, that is, our yearly observations are averages of the observed costs over the year. It has been shown that if the true process is a random walk with drift then aggregation can lead to substantial autocorrelation (Working 1960). In any case, while every technology certainly follows an idiosyncratic pattern and may have a complex autocorrelation structure and specific measurement errors, using the IMA(1,1) as a universal model allows us to parsimoniously understand the empirical forecast errors and generate robust prediction intervals.
A key quantity for pooling the data is the variance of the first differences, which by analogy with the previous model we call $K^2$ for this model as well. It is easy to show that
$$ K^2 \equiv \mathrm{var}(y_t - y_{t-1}) = (1 + \theta^2)\sigma^2 $$
(Box & Jenkins 1970). If we make the forecasts as before (Eq. 6), the distribution of forecast errors is (Appendix B.2)
$$ \mathcal{E} \sim N(0, \sigma^2 A^*), $$
with
$$ A^* = -2\theta + \left(1 + \frac{2(m-1)\theta}{m} + \theta^2\right)\left(\tau + \frac{\tau^2}{m}\right). $$
Note that we recover Eq. 11 when $\theta = 0$. In the case where the variance is estimated, it is possible to derive an approximate formula for the growth and distribution of the forecast errors by assuming that $\hat{K}$ and $\mathcal{E}$ are independent. The expected mean squared normalized error is
$$ \Xi(\tau) \equiv E\left[\left(\frac{\mathcal{E}}{\hat{K}}\right)^2\right] = \frac{m-1}{m-3}\,\frac{A^*}{1+\theta^2}, $$
and the distribution of rescaled normalized forecast errors is
$$ \epsilon^* = \frac{1}{\sqrt{A^*/(1+\theta^2)}}\,\frac{\mathcal{E}}{\hat{K}} \sim t(m-1). $$
Because the assumption that the numerator and denominator are independent is true only in the limit $m \to \infty$, these formulas are approximations. We compare these to more exact results obtained through simulations in Appendix B.2; see in particular Figure 13. For $m > 30$ the approximation is excellent, but there are discrepancies for small values of $m$.
[10] Bear in mind that our individual time series are very short, which forces the selection of a very parsimonious model, and likely explains the choice of the random walk model when selection is based on individual series in isolation.
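The variance factor of the IMA(1,1) forecast error can be checked by simulation. The sketch below is a hedged illustration rather than the authors' code: it assumes the factor has the form A* = -2θ + (1 + 2(m-1)θ/m + θ²)(τ + τ²/m), which is what one obtains by expanding the variance of the forecast error of Eq. 6 under the IMA(1,1) model and which reduces to Eq. 11 at θ = 0. The function names and the parameter values are hypothetical. The drift is set to zero in the simulation because it cancels out of the forecast error.

```python
import random


def a_star(tau, m, theta):
    """Assumed variance factor of the forecast error, in units of sigma^2."""
    return -2 * theta + (1 + 2 * (m - 1) * theta / m + theta ** 2) * (tau + tau ** 2 / m)


def predicted_xi_ima(tau, m, theta):
    """Approximate mean squared normalized error with estimated variance (m > 3)."""
    return (m - 1) / (m - 3) * a_star(tau, m, theta) / (1 + theta ** 2)


def simulated_error_variance(tau, m, theta, sigma=1.0, runs=40000, seed=2):
    """Monte Carlo variance of the forecast error for zero-drift IMA(1,1) data."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        v = [rng.gauss(0.0, sigma) for _ in range(m + tau + 1)]
        # first differences d[j] = v[j+1] + theta * v[j]; the first m are observed
        d = [v[j + 1] + theta * v[j] for j in range(m + tau)]
        mu_hat = sum(d[:m]) / m
        error = sum(d[m:]) - mu_hat * tau
        total += error ** 2
    return total / runs


print(a_star(3, 5, 0.5), simulated_error_variance(3, 5, 0.5))
```

This check uses the true σ rather than an estimated variance, so it tests only the exact Gaussian result and says nothing about the quality of the Student t approximation at small m.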

Data collection
The bulk of our data on technology costs comes from the Santa Fe Institute's Performance Curve DataBase[11], which was originally developed by Bela Nagy and collaborators; we augment it with a few other datasets. These data were collected via literature search, with the principal criterion for selection being availability. Figure 2 plots the time series for each data set. The sharp cutoff for the chemical data, for example, reflects the fact that it comes from a book published by the Boston Consulting Group in 1972. Table 1 gives a summary of the properties of the data and more description of the sources can be found in Appendix A.
A ubiquitous problem in forecasting technological progress is finding invariant units. A favorable example is electricity. The cost of generating electricity can be measured in dollars per kWh, making it possible to sensibly compare competing technologies and measure their progress through time. Even in this favorable example, however, making electricity cleaner and safer has a cost, which has affected historical prices for technologies such as coal and nuclear power in recent years, and means that their costs are difficult to compare to clean and safe but intermittent sources of power such as solar energy. To take an unfavorable example, our dataset contains appliances such as television sets, which have dramatically increased in quality through time[12].
One should therefore regard our results here as a lower bound on what is possible, in the sense that performing the analysis with better data in which all technologies had invariant units would very likely improve the quality of the forecasts. We would love to be able to make appropriate normalizations, but the work involved is prohibitive; if we dropped all questionable examples we would end with little remaining data. Most of the data are costs, but in a few cases they are prices; again, this adds noise but if we were able to be consistent that should only improve our results. We have done various tests removing data and the basic results are not sensitive to what is included and what is omitted (see Fig. 15 in the appendix).
We have removed some technologies that are too similar to each other from the Performance Curve Database. For instance, when we have two datasets for the same technology, we keep only one of them. Our choice was based on data quality and length of the time series. This selection left us with 66 technologies, belonging to different sectors that we label as chemistry, genomics, energy, hardware, consumer durables and food.

Data selection and descriptive statistics
In this paper we are interested in technologies that are improving, so we restrict our analysis to those technologies whose rate of improvement is statistically significant based on the available sample. We used a simple one-sided t-test on the (log) series and removed all technologies for which the p-value indicates that we cannot reject the null hypothesis that $\mu_i = 0$ at a 10% significance level. Note that our results actually get better if we do not exclude the non-improving technologies; this is particularly true for the geometric random walk model. We have removed the non-improving technologies because including technologies that do not improve swamps our results for improving technologies and makes the situation unduly favorable for the random walk model.
Table 1 also shows the estimated drift rate $\tilde{\mu}_i$ and the estimated standard deviation $\tilde{K}_i$ based on the full sample for each technology $i$. (Throughout the paper we use a hat to denote estimates performed within an estimation window of size $m$ and a tilde to denote the estimates made using the full sample.) Histograms of $\tilde{\mu}_i$, $\tilde{K}_i$, sample size $T_i$ and $\tilde{\theta}$ are given in Figure 3.
Figure 3: Histograms of the estimated parameters (see Table 1). $\tilde{\mu}_i$ is the annual logarithmic rate of decrease in cost, $\tilde{K}_i$ is the standard deviation of the noise, $T_i$ is the number of available years of data and $\tilde{\theta}$ is the autocorrelation.
Figure 4: Because we use a hindcasting procedure the number of possible forecasts that can be made with a given data window $m$ decreases as the time horizon $\tau$ of the forecast increases. For $m = 5$ we show the number of possible forecasts as well as the number of technologies for which at least one forecast is possible at that time horizon. The horizontal line at $\tau = 20$ indicates our (somewhat arbitrary) choice of a maximum time horizon. There are a total of 8212 forecast errors and 6391 with $\tau \leq 20$.
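The selection rule for improving technologies can be sketched as follows. This is a minimal illustration under stated assumptions: the test is applied to the first differences of the log cost series, which is one natural reading of "a t-test on the (log) series" but is not confirmed by the text, and the Student t distribution function is computed by numerical integration so that only the standard library is needed. All function names and the example series are hypothetical.

```python
import math


def t_pdf(x, dof):
    """Density of the Student t distribution with dof degrees of freedom."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1 + x * x / dof) ** (-(dof + 1) / 2)


def t_cdf(x, dof, steps=2000):
    """Student t CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    half = s * h / 3
    return 0.5 + half if x > 0 else 0.5 - half


def drift_p_value(log_costs):
    """One-sided p-value for H0: mu = 0 against H1: mu < 0 (costs falling).

    Assumes the first differences are not all identical (nonzero variance).
    """
    diffs = [b - a for a, b in zip(log_costs, log_costs[1:])]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t_stat = mean / math.sqrt(var / n)
    return t_cdf(t_stat, n - 1)


# Hypothetical log cost series, for illustration only.
example = [0.0, -0.1, -0.25, -0.3, -0.45, -0.5]
print(drift_p_value(example) < 0.10)  # keep the technology if True
```

A technology would be retained when the p-value falls below 0.10 and dropped otherwise.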
Since our dataset includes time series of different lengths (see Table 1 and Figure 4), and because we are doing exhaustive hindcasting, as described in the next section[13], the number of possible forecasts is highest for $\tau = 1$ and decreases for longer horizons. Figure 4 shows the number of possible forecasts and the number of technologies as a function of the forecast horizon $\tau$. We somewhat arbitrarily impose an upper bound of $\tau_{\max} = 20$, but find this makes very little difference in the results (see Appendix C.3).
Table 1: Descriptive statistics and parameter estimates (using the full sample) for all available technologies, ordered by p-value of a one-sided t-test for $\tilde{\mu}$. The improvement of the last 13 technologies is not statistically significant and so they are dropped from further analysis.

Relation between drift and volatility
Figure 5 shows a scatter plot of the estimated standard deviation $\tilde{K}_i$ for technology $i$ vs. the estimated improvement rate $-\tilde{\mu}_i$. This shows very clearly that on average the uncertainty $\tilde{K}_i$ gets bigger as the improvement rate $-\tilde{\mu}_i$ increases. There is no reason that we are aware of to expect this a priori. One possible explanation is that for technological investment there is a trade-off between risk and returns. Another possibility is that faster improvement amplifies fluctuations. The log-log fit is $\tilde{K} = 0.51(-\tilde{\mu})^{0.72}$, with $R^2 = 0.73$ and p-value ≈ 0. Technologies with a faster rate of improvement also have higher uncertainty in their improvement.
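A log-log fit of this kind amounts to ordinary least squares on the logarithms of both variables. The sketch below is an illustration, not the authors' code: `fit_power_law` is a hypothetical helper, and the input values are synthetic points generated from the reported relation purely to show that the procedure recovers a known prefactor and exponent.

```python
import math


def fit_power_law(rates, volatilities):
    """OLS fit of log K = log c + b * log(rate); returns (c, b).

    rates are improvement rates (-mu) and must be positive.
    """
    xs = [math.log(r) for r in rates]
    ys = [math.log(k) for k in volatilities]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    c = math.exp(y_bar - b * x_bar)
    return c, b


# Synthetic, noise-free points lying exactly on K = 0.51 * rate**0.72.
rates = [0.02, 0.05, 0.1, 0.3]
volatilities = [0.51 * r ** 0.72 for r in rates]
print(fit_power_law(rates, volatilities))
```

With real data the points scatter around the fitted line, and the quality of the fit would be summarized by a statistic such as the reported R².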

Statistical validation procedure
We use hindcasting for statistical validation, i.e. for each technology we pretend to be at a given date in the past and make forecasts for dates in the future relative to the chosen date[14]. We have chosen this procedure for several reasons. First, it directly tests the predictive power of the model rather than its goodness of fit to the data, and so is resistant to overfitting. Second, it mimics the same procedure that one would follow in making real predictions, and third, it makes efficient use of the data available for testing.
We fit the model at each time step to the $m$ most recent changes in cost (i.e. the most recent $m + 1$ years of data). We use the same value of $m$ for all technologies and for all forecasts. Because most of the time series in our dataset are quite short, and because we are more concerned here with testing the procedure we have developed than with making optimal forecasts, unless otherwise noted we choose $m = 5$. This is admittedly very small, but it has the advantage that it allows us to make a large number of forecasts. We will return later to discuss the question of which value of $m$ makes the best forecasts.
We perform hindcasting exhaustively in the sense that we make as many forecasts as possible given the choice of $m$. For convenience assume that the cost data $y_t = \log p_t$ for technology $i$ exists in years $t = 1, 2, \ldots, T_i$. We then make forecasts for each feasible year and each feasible time horizon, i.e. we make forecasts $\hat{y}_{t_0+\tau}(t_0)$ rooted in years $t_0 = (m + 1, \ldots, T_i - 1)$ with forecast horizon $\tau = (1, \ldots, T_i - t_0)$.
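The set of feasible forecasts can be enumerated directly from these index ranges. The sketch below is illustrative; `hindcast_pairs` is a hypothetical helper that uses the same 1-based year convention as the text.

```python
def hindcast_pairs(series_length, m):
    """All (t0, tau) with t0 = m+1, ..., T-1 and tau = 1, ..., T-t0 (1-based years)."""
    return [(t0, tau)
            for t0 in range(m + 1, series_length)
            for tau in range(1, series_length - t0 + 1)]


pairs = hindcast_pairs(series_length=10, m=5)
# Number of forecasts available at each horizon
counts = {}
for _, tau in pairs:
    counts[tau] = counts.get(tau, 0) + 1
print(len(pairs), counts)
```

For a series of length T the total is (T - m - 1)(T - m)/2 forecasts, and the count at horizon τ is T - m - τ, which is why short horizons dominate the pooled sample.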
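In each year $t_0$ for which forecasts are made the drift $\hat{\mu}_{t_0}$ is estimated as the sample mean of the first differences,
$$ \hat{\mu}_{t_0} = \frac{1}{m}\sum_{j=t_0-m}^{t_0-1}\left(y_{j+1} - y_j\right) = \frac{y_{t_0} - y_{t_0-m}}{m}, $$
where the last equality follows from the fact that the sum is telescopic, and implies that only two points are needed to estimate the drift. The volatility is estimated using the unbiased estimator[15]
$$ \hat{K}^2_{t_0} = \frac{1}{m-1}\sum_{j=t_0-m}^{t_0-1}\left[\left(y_{j+1} - y_j\right) - \hat{\mu}_{t_0}\right]^2. $$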
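The two estimators can be written in a few lines. This is a minimal sketch: the function name and the example series are hypothetical, the list `y` is assumed to hold the log cost of year 1 at index 0, and the volatility uses the Bessel-corrected variance of the m first differences in the window.

```python
import math


def estimate_drift_and_volatility(y, t0, m):
    """Drift and volatility from the m first differences ending in year t0.

    y[0] holds the log cost of year 1; t0 is a 1-based year with t0 >= m + 1.
    """
    window = y[t0 - m - 1:t0]  # years t0-m, ..., t0: m + 1 points
    diffs = [b - a for a, b in zip(window, window[1:])]
    mu_hat = sum(diffs) / m
    k_hat = math.sqrt(sum((d - mu_hat) ** 2 for d in diffs) / (m - 1))
    return mu_hat, k_hat


# Hypothetical log cost series, for illustration only.
y = [0.0, -0.1, -0.25, -0.3, -0.45, -0.5]
mu_hat, k_hat = estimate_drift_and_volatility(y, t0=6, m=5)
print(mu_hat, (y[5] - y[0]) / 5, k_hat)
```

The first two printed values agree up to rounding, which illustrates the telescoping property: the drift estimate depends only on the first and last points of the window.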

Comparison of models to data
This procedure gives us a variable number of forecasts for each technology $i$ and time horizon $\tau$, rooted at all feasible times $t_0$. We record the forecasting errors $\mathcal{E}_{t_0,\tau} = y_{t_0+\tau} - \hat{y}_{t_0+\tau}(t_0)$ and the associated values of $\hat{K}_{t_0}$ for all $t_0$ and all $\tau$ where we can make forecasts. We then test the models by computing the mean squared normalized forecast error, $E\left[(\mathcal{E}_{t_0,\tau}/\hat{K}_{t_0})^2\right]$, averaging over all technologies and for all forecasts with $\tau \leq 20$, as shown by the black dots in Figure 6. The dashed line compares this to the predicted normalized error under the geometric random walk hypothesis.
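The pooling step can be sketched as a loop over technologies, forecast origins and horizons. The code below is an illustration under stated assumptions: each series is a list of yearly log costs with no gaps, the function and variable names are hypothetical, and the estimated volatility is assumed to be nonzero in every window (a window with identical first differences would cause a division by zero).

```python
import math
from collections import defaultdict


def normalized_errors(y, m):
    """Yield (tau, error / K_hat) for every feasible hindcast of one series."""
    T = len(y)
    for t0 in range(m + 1, T):  # 1-based forecast origin
        # first differences between consecutive years t0-m, ..., t0
        diffs = [y[j] - y[j - 1] for j in range(t0 - m, t0)]
        mu_hat = sum(diffs) / m
        k_hat = math.sqrt(sum((d - mu_hat) ** 2 for d in diffs) / (m - 1))
        for tau in range(1, T - t0 + 1):
            forecast = y[t0 - 1] + mu_hat * tau
            yield tau, (y[t0 + tau - 1] - forecast) / k_hat


def pooled_xi(series_by_technology, m, tau_max=20):
    """Mean squared normalized error at each horizon, pooled over technologies."""
    sums, counts = defaultdict(float), defaultdict(int)
    for y in series_by_technology.values():
        for tau, e in normalized_errors(y, m):
            if tau <= tau_max:
                sums[tau] += e * e
                counts[tau] += 1
    return {tau: sums[tau] / counts[tau] for tau in sorted(sums)}


# Hypothetical single-technology example, for illustration only.
data = {"example": [0.0, -0.1, -0.25, -0.3, -0.45, -0.5, -0.7]}
print(pooled_xi(data, m=5))
```

The resulting values at each horizon are the quantities that would be compared with the predicted normalized error.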
The random walk model does a good job of predicting the scaling of the forecast errors as a function of τ, but the predicted errors are too small by roughly a factor of two. The fact that it underestimates the errors is not surprising: the random walk model is an extreme assumption. The correct prediction of the scaling of the errors as a function of the forecast horizon is not an obvious result; alternative hypotheses, such as long-memory, can produce errors that grow in time faster than the random walk. Given that long-memory is a natural result of nonstationarity, which is commonly associated with technological change, our prior was that it was a highly plausible alternative 16.
To understand why the errors in the model are larger than those of the random walk we investigated two alternative hypotheses, heavy tails and autocorrelated innovations. We found little evidence for heavy tails (Appendix C.4) but much stronger evidence for positive autocorrelations. Testing for autocorrelation is difficult due to the fact that the series are short, and the sample autocorrelations of the individual samples are highly variable and in many cases not very statistically significant (see Figure 3). Nonetheless, on average the individual series have positive autocorrelations, suggesting that this is at least a reasonable starting point.
We thus selected the IMA(1,1) model discussed in Section 2.3 as an alternative to the simple random walk with drift. This model requires an additional parameter θ, which describes the strength of the autocorrelation. Because we are using a small number m of past data points to fit the model, a dynamic sample estimate $\hat{\theta}$ based on a moving average is not statistically stable.
To counter this problem we use a global value of θ, i.e. we use the same value for all technologies and all points in time. We use two different methods for doing this and compare them in what follows. The first method takes advantage of the fact that the magnitude of the forecast errors is an increasing function of θ (we assume θ > 0) and chooses $\theta^*_m$ to match the empirically observed forecast errors. The second method takes a weighted average $\theta^*_w$ calculated as follows: We exclude all technologies for which the estimate of θ reveals specification or estimation issues (θ ≈ 1 or θ ≈ −1). Then at each horizon we compute a weighted average, with the weights proportional to the number of forecasts made with that technology. Finally we average the first 20 estimates. These procedures involve some data snooping, i.e. they use knowledge about the future to set θ, but since this is only one parameter estimated from a sample of more than 6,000 forecasts the resulting estimate should be reasonably reliable. See the more detailed discussion in Appendix D.
Note that although we derived an asymptotic formula for the errors of the IMA(1,1) process, equation (15), with m = 5 there is too little data for this to be accurate. Instead we simulate the IMA(1,1) process to create an artificial data set that mimics our real data 17.
15 This is different from the maximum likelihood estimator, which does not make use of Bessel's correction (i.e. dividing by (m − 1) instead of m). Our choice is driven by the fact that in practice we use a very small m, making the bias of the maximum likelihood estimator rather large.
16 [...] where H is the Hurst exponent. In the absence of long-memory H = 1/2, but for long-memory 1/2 < H < 1. Long-memory can arise from many causes, including nonstationarity. It is easy to construct plausible processes with the µ parameter varying where the mean squared errors grow faster than $\tau^2$.
We then test to see whether we correctly predict the distribution of forecast errors. Figure 7 shows the distribution of rescaled forecast errors at several different values of τ using the IMA(1,1) rescaling factor $A^*$ (Eq. 14) and compares it to the predicted distribution. Figure 8 shows the distribution with all values of τ pooled together and compares the empirical results for the random walk with drift to the empirical results with the IMA(1,1). The empirical distribution is fairly close to the theoretical prediction, and as expected the IMA(1,1) is better than the random walk with drift.
17 More specifically, for each technology we generate $T_i$ pseudo cost data points using Eq. 12 with $\mu = \hat{\mu}_i$, $K = \hat{K}_i$ and $\theta = \theta^*_w = 0.25$. We then estimate the parameters just as we did for the real data and compute the mean squared normalized error in forecasting the artificial data as a function of τ. The curves corresponding to the IMA models in Figure 6 are the result of doing this 1000 times and taking the average. The error bars are also computed using the simulations: 95% of the simulated datasets (θ = 0.25) have a mean squared normalized forecast error within the grey area.
Given the complicated overlapping structure of the exhaustive hindcasting approach that we use here, the only feasible approach to statistically testing our results is simulation 18. To give a feeling for the expected size of the fluctuations, in Appendix E we generate an ensemble of results under the null hypothesis of the IMA(1,1) model, generating an artificial data set that mimics the real data, as described above. This gives a feeling for the expected fluctuations in Figure 8 and makes it clear that the random walk can confidently be rejected. The estimate based on the IMA(1,1) model with $\theta = \theta^*_w = 0.25$ is also rejected, with no more than 1% of the simulations generating deviations as large as the real data (this depends on the way in which deviations from the distribution are measured). Unsurprisingly the IMA(1,1) model with $\theta = \theta^*_m = 0.63$ is not rejected (see Appendix E for details).

Dependence on m
So far we have used a small value of m for testing purposes; this allows us to generate a large number of forecasts and thereby test to see whether we are correctly predicting the distributional forecasting accuracy of our method.
We now address the question of the optimal value of m. In a stationary world, in which the models are well-specified and there is a constant true value of µ, one should always use the largest possible value of m. In a non-stationary world, however, it can be advantageous to use a smaller value of m, or alternatively a weighted average that decays as it goes into the past. We experimented with increasing m as shown in Figure 9 and as described in more detail in Appendix C.1, and we find that the errors drop as m increases roughly as one would expect if the process were stationary. Note that to check forecast errors for high m we have used technologies for which at least m + 2 years were available.

Application to solar PV modules
In this section we provide a distributional forecast for the price of solar photovoltaic modules.
We then show how this can be used to make a comparison to a hypothetical competing technology in order to estimate the probability that one technology will be cheaper than another at a given time horizon, and finally, we use solar energy as an example to illustrate how extrapolation can be useful to forecast production as well as cost.
Figure 10 shows the predicted distribution of likely prices for solar photovoltaic modules for time horizons up to 2030. To make this forecast, we use all available years (m = 33), and the normal approximation Eq. 13 with $\theta = \theta^*_m$. The prediction says that it is likely that solar PV modules will continue to drop in cost at the roughly 10% rate that they have in the past. Nonetheless there is a small probability (about 5%) that the price in 2030 will be higher than it is now 19.
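The quoted probability can be checked with a short calculation. The sketch below is a hedged illustration: it uses the parameter values given in the caption of Figure 11 (drift −0.10, volatility 0.15, m = 33) with θ = 0.63, assumes that the last observation is in 2013 so that 2030 lies 17 years ahead, and assumes the form of the IMA(1,1) factor A* that follows from the Appendix B.2 derivation. Function names are hypothetical.

```python
import math

def a_star(tau, m, theta):
    """Assumed IMA(1,1) variance factor; equals tau + tau**2/m when theta = 0."""
    return -2 * theta + (1 + 2 * (m - 1) * theta / m + theta ** 2) * (tau + tau ** 2 / m)

def forecast_log_price(y_now, mu_hat, k_hat, tau, m, theta):
    """Mean and standard deviation of the log price tau years ahead (normal approximation)."""
    mean = y_now + mu_hat * tau
    # K_hat estimates the std of the differences, i.e. sigma * sqrt(1 + theta^2).
    sd = k_hat * math.sqrt(a_star(tau, m, theta) / (1 + theta ** 2))
    return mean, sd

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

y_now = 0.0  # log price today, normalized to zero
mean, sd = forecast_log_price(y_now, mu_hat=-0.10, k_hat=0.15, tau=17, m=33, theta=0.63)
p_higher = 1 - normal_cdf((y_now - mean) / sd)  # probability the 2030 price exceeds today's
```

Under these assumptions `p_higher` comes out at roughly 5%, in line with the figure quoted above. Intervals for other horizons follow from `mean` and `sd` in the usual way, for example mean ± 1.96 sd for a 95% band in log price.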

Estimating the probability that one technology will be cheaper than another
Consider comparing the price of photovoltaic modules $y_S$ with the price of an alternative technology $y_C$. We assume that both technologies follow an IMA(1,1) process with $\theta = \theta^*_m$, and as previously we make forecasts using the simple random walk with drift predictions. To keep things simple we assume that the estimated variance is the true variance, so that the forecast error is normally distributed (Eq. 13). Assume the estimated parameters for the two technologies are $\hat{\mu}_S$ and $\hat{K}_S$ for solar modules and $\hat{\mu}_C$ and $\hat{K}_C$ for the competing technology. We want to compute the probability that τ steps ahead $y_S < y_C$. Under these assumptions
$$y_S(t+\tau) \sim \mathcal{N}\left(y_S(t) + \hat{\mu}_S\tau,\ \hat{K}_S^2\,\frac{A^*(\tau)}{1+\theta_m^{*2}}\right),$$
where $A^*(\tau)$ is defined in Eq. 14 with $\theta_S = \theta^*_m$, and m = 33. Technology C is similarly defined, but for the sake of argument we assume that technology C has historically on average had a constant cost, i.e. $\hat{\mu}_C = 0$. We also assume that the estimation period is the same, and that $\theta_C = \theta_S = \theta^*_m$. The probability that $y_S < y_C$ is the probability that the random variable $Z = y_C - y_S$ is positive. Since $y_S$ and $y_C$ are normal, assuming they are independent their difference is normal,
$$Z \sim \mathcal{N}(\mu_Z, \sigma_Z^2),$$
where
$$\mu_Z = \left(y_C(t) - y_S(t)\right) + \tau\left(\hat{\mu}_C - \hat{\mu}_S\right), \qquad \sigma_Z^2 = \frac{A^*(\tau)}{1+\theta_m^{*2}}\left(\hat{K}_S^2 + \hat{K}_C^2\right).$$
The probability that $y_S < y_C$ is the integral for the positive part,
$$\Pr(y_S < y_C) = \int_0^\infty f_Z(z)\,dz,$$
which is expressed in terms of the error function as
$$\Pr(y_S < y_C) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{\mu_Z}{\sqrt{2}\,\sigma_Z}\right)\right]. \qquad (19)$$
In Figure 11 we plot this function using the parameters estimated for photovoltaics, assuming that the competing technology is three times cheaper at the starting date in 2013 and has a flat average rate of growth. Three different levels of the noise parameter $\hat{K}_C$ for the alternative technology are considered. Note that the increased uncertainty in the future evolution does not change the expected crossover point.
19 This forecast is consistent with the one made several years ago by Nagy et al. (2013) using data only until 2009.
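The crossover probability is straightforward to evaluate numerically. The sketch below assumes the normal approximation described in this section, with the difference of the two log prices having mean equal to the initial gap plus the accumulated drift difference and variance proportional to the sum of the squared volatilities. The form of A* is the one implied by the Appendix B.2 derivation, and the three noise levels for the competitor are hypothetical placeholders, since the text does not list the values used in Figure 11.

```python
import math

def a_star(tau, m, theta):
    """Assumed IMA(1,1) variance factor; equals tau + tau**2/m when theta = 0."""
    return -2 * theta + (1 + 2 * (m - 1) * theta / m + theta ** 2) * (tau + tau ** 2 / m)

def prob_solar_cheaper(tau, log_gap, mu_s, k_s, mu_c, k_c, m, theta):
    """Pr(y_S < y_C) tau steps ahead; log_gap = y_C(t) - y_S(t) today."""
    mu_z = log_gap + tau * (mu_c - mu_s)
    var_z = a_star(tau, m, theta) / (1 + theta ** 2) * (k_s ** 2 + k_c ** 2)
    return 0.5 * (1 + math.erf(mu_z / math.sqrt(2 * var_z)))

# Competitor starts three times cheaper, so y_C - y_S = -ln(3) today.
log_gap = -math.log(3)
params = dict(mu_s=-0.10, k_s=0.15, mu_c=0.0, m=33, theta=0.63)

# Hypothetical competitor noise levels (not the values used in Figure 11).
noise_levels = (0.0, 0.1, 0.2)
crossover = math.log(3) / 0.10  # horizon at which the expected log prices meet
at_crossover = [prob_solar_cheaper(crossover, log_gap, k_c=k, **params) for k in noise_levels]
```

Every entry of `at_crossover` equals one half up to rounding, whatever the competitor's noise level, because the mean of the difference vanishes there. This is the sense in which added uncertainty flattens the curve without moving the crossover point near τ ≈ 11.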
In the above discussion we have carefully avoided discussing a particular competing technology. A forecast for the full cost of solar PV electricity requires predicting the balance of system costs, for which we lack consistent historical data, and unlike module costs, the full cost depends on factors such as insolation, interest rates and local installation costs. As solar PV grows to be a significant portion of the energy supply the cost of storage will become very important. Nonetheless, it is useful to discuss it in relation to the two competitors mentioned in the introduction.

Discussion of PV relative to coal-fired electricity and nuclear power
An analysis of coal-fired electricity, breaking down costs into their components and examining each of the trends separately, has been made by McNerney et al. (2011). They show that while coal plant costs (which are currently roughly 40% of total cost) dropped historically, this trend reversed circa 1980. Even if the recent trend reverses and plant construction cost drops dramatically in the future, the cost of coal is likely to eventually dominate. As mentioned before, this is because the historical cost of coal is consistent with a random walk without drift, and currently fuel is about 40% of total costs. If coal remains constant in cost (except for random fluctuations up or down) then this places a hard bound on how much the total cost of coal-fired electricity can decrease. Since typical plants have efficiencies of the order of 1/3 there is not a lot of room for making the burning of coal more efficient: even a spectacular efficiency improvement to 2/3 of the theoretical limit is only an improvement of a factor of two, corresponding to an expected 7-8 years of progress for PV modules. Similar arguments apply to oil and natural gas 20.
Because historical nuclear power costs have tended to increase, not just in the US but worldwide, even a forecast that they will remain constant seems optimistic. Levelized costs for solar PV powerplants in 2013 were as low as 0.078-0.142 Euro/kWh (0.09-0.16$) in Germany (Kost et al. 2013) 21, and in 2014 solar PV reached a new record low with an accepted bid of $0.06/kWh for a plant in Dubai 22. When these are compared to the projected cost of $0.14/kWh in 2023 for the Hinkley Point nuclear reactor, it appears that the two technologies already have roughly equal costs, though of course a direct comparison is difficult due to factors such as intermittency, waste disposal, insurance costs, etc.
20 Though much has been made of the recent drop in the price of natural gas due to fracking, which has had a large effect, one should bear in mind that the drop is tiny in comparison to the factor of about 2,330 by which solar PV modules have dropped in price. The small change induced by fracking is only important because it is competing in a narrow price range with other fossil fuel technologies. In work with other collaborators we have examined not just oil, coal and gas, but more than a hundred minerals; all of them show remarkably flat historical prices, i.e. they all change by less than an order of magnitude over the course of a century.
21 Levelized costs decrease more slowly than module costs, but do decrease (Nemet 2006). For instance, installation costs per watt have fallen in Germany and are now about half what they are in the U.S. (Barbose et al. 2014).

Forecasting production
So far we have focused exclusively on forecasting costs, but in many cases it may be useful to also forecast production, or related factors such as cumulative capacity or consumption. As originally shown by Sahal (1979) the combination of exponentially decreasing costs and exponentially increasing production leads to Wright's law. Nagy et al. (2013) found that as a rough approximation most of the technologies in our [...]
Many analysts have expressed concerns about the time required to build the needed capacity for solar energy to play a role in reducing greenhouse gas emissions. The "hi-Ren" (high renewable) scenario of the International Energy Agency assumes that PV will generate 16% of total electricity 23 in 2050; this was recently increased from the previous estimate of only 11%.

As a point of comparison, what do past trends suggest?
Though estimates vary, over the last ten years cumulative installed capacity of PV has grown at an impressive rate. According to BP's Statistical Review of World Energy 2014, during the period from 1983-2013 solar energy as a whole grew at an annual rate of 42.5% and in 2014 represented about 0.22% of total primary energy consumption, as shown in Figure 12. By comparison total primary energy consumption grew at an annual rate of 2.6% over the period 1965-2013. Given that solar energy is an intermittent source, it is much easier for it to contribute when it supplies only a minority of energy: new supporting technologies will be required once it becomes a major player. If we somewhat arbitrarily pick 20% as a target, assuming both these trends continue unaltered, a simple calculation shows that this would be achieved in about 13.7 years 24. That is, under these assumptions in 2027 solar would represent 20% of the energy supply. Of course this is only an extrapolation, but it puts into perspective claims that solar energy cannot play an essential role in mitigating global warming, even on a relatively short timescale.
Of course the usual caveats apply, and the limitations of such forecasting are evident in the historical series of Figure 12. The increase of solar is far from smooth, wind has a rather dramatic break in its slope in roughly 1988, and a forecast for nuclear power made in 1980 based on production alone would have been far more optimistic than one today. It would be interesting to use a richer economic model to forecast cost and production simultaneously, but this is beyond the scope of this paper. The point here was simply to show that if growth trends continue as they have in the past significant contributions by solar are achievable.
23 Electricity consumption is currently around 40% of total primary energy but it is expected to grow significantly.
24 The expected time to meet this goal is the solution for t of $0.0022(1.425)^t = 0.2(1.026)^t$.
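The extrapolation to a 20% share reduces to solving one exponential equation, which the following sketch does in closed form. The function name is hypothetical; the inputs are the figures quoted in the text (a current share of 0.22%, growth of 42.5% per year for solar and 2.6% per year for total primary energy).

```python
import math

def years_to_reach_share(share_now, target_share, growth_source, growth_total):
    """Solve share_now * (1 + g_s)**t = target_share * (1 + g_tot)**t for t."""
    return math.log(target_share / share_now) / math.log((1 + growth_source) / (1 + growth_total))

t = years_to_reach_share(share_now=0.0022, target_share=0.20,
                         growth_source=0.425, growth_total=0.026)
print(round(t, 1))  # prints 13.7

# Sanity check: both sides of the original equation agree at the solution.
lhs = 0.0022 * 1.425 ** t
rhs = 0.20 * 1.026 ** t
```

Counting from 2013, a horizon of about 13.7 years lands in 2027, as stated above. The result is sensitive to both growth rates, so it is best read as an order-of-magnitude check rather than a forecast.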

Conclusion
Many technologies follow a similar pattern of progress but with very different rates. In this paper we have proposed a simple method to provide robust predictions for technological progress that are stated as distributions of outcomes rather than point forecasts. We assume that all technologies follow a similar process except for their rates of improvement (the drift term) and volatility. Under this assumption we can pool forecast errors of different technologies to obtain an empirical estimation of the distribution of forecast errors.
One of the essential points of this paper is that the use of many technologies allows us to make a better forecast for a given technology, such as solar PV modules. Although using many technologies does not affect our point forecast, it is the essential element that allows us to make a distributional forecast. The point is that by treating all technologies as essentially the same except for their parameters, and collapsing all the data onto a single distribution, we can pool data from many technologies to gain confidence in and calibrate our method for a given technology. It is of course a bold assumption to say that all technologies follow a random process with the same form, but the empirical results indicate that this hypothesis is useful for forecasting. Of course it is always possible that other methods (e.g. more detailed hypotheses) would reveal more profound differences in the random processes characterising the improvement of different technologies.
We do not want to suggest in this paper that we think that Moore's law provides an optimal forecasting method. Quite the contrary, we believe that by gathering more historical data, and by adding other auxiliary variables, such as production, R&D, patent activity, there should be considerable room for improving forecasting power. Nonetheless, our results provide a benchmark against which others can be measured. They provide a proof of principle that technologies can be successfully forecast, and that the errors in the forecasts can be predicted.
From a policy perspective we believe that these methods can be used to provide objective points of comparison to expert forecasts, which are often biased by vested interests and other factors. The fact that we can associate uncertainties with our predictions makes them far more useful than simple point forecasts. The example of solar PV modules illustrates that differences in the improvement rate of competing technologies can be dramatic, and that an underdog can begin far behind the pack and quickly emerge as a front-runner. Given the urgency of limiting greenhouse gas emissions, it is fortuitous that a green technology also happens to have such a rapid improvement rate, and is likely to surpass its competition within 10-20 years. In a context where limited resources for technology investment constrain policy makers to focus on a few technologies that have a real chance to eventually achieve and even exceed grid parity, the ability to have improved forecasts and know how accurate they are should prove particularly useful.
[...] data is from Wetterstrand (2015) (cost per human-size genome), and for each year we took the last available month (September for 2001-2002 and October afterwards) and corrected for inflation using the US GDP deflator.

B.1 Random walk with drift
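This section derives the distribution of forecast errors. Note that by definition $y_{t+1} - y_t = \Delta y \sim \mathcal{N}(\mu, K^2)$. To obtain $\hat{\mu}$ we assume m sequential independent observations of Δy, and compute the average. The sampling distribution of the mean of a normal variable is
$$\hat{\mu} \sim \mathcal{N}\left(\mu, K^2/m\right).$$
Moreover, writing $n_i \sim \mathcal{N}(0, K^2)$ for the noise increments, the τ future increments satisfy
$$\sum_{i=t+1}^{t+\tau} n_i \sim \mathcal{N}\left(0, \tau K^2\right).$$
As a result, using Eq. 8 we see that the distribution of forecast errors is Gaussian
$$\mathcal{E} = \tau\left(\mu - \hat{\mu}\right) + \sum_{i=t+1}^{t+\tau} n_i \sim \mathcal{N}\left(0, K^2 A\right), \qquad (22)$$
where $A = \tau + \tau^2/m$ (Eq. 11). Eq. 22 implies
$$\frac{1}{\sqrt{A}}\,\frac{\mathcal{E}}{K} \sim \mathcal{N}(0, 1). \qquad (23)$$
Eq. 23 leads to $E[\mathcal{E}^2] = K^2(\tau + \tau^2/m)$, which appears in more general form in Sampson (1991). However we also have to account for the fact that we have to estimate the variance. Since $\hat{K}^2$ is the sample variance of a normally distributed random variable, the following standard result holds
$$\frac{(m-1)\hat{K}^2}{K^2} \sim \chi^2(m-1). \qquad (24)$$
If $Z \sim \mathcal{N}(0, 1)$, $U \sim \chi^2(r)$, and Z and U are independent, then $Z/\sqrt{U/r} \sim t(r)$. Taking Z from Eq. 23, U from Eq. 24 and assuming independence, we find that the rescaled normalized forecast errors have a Student t distribution
$$\frac{1}{\sqrt{A}}\,\frac{\mathcal{E}}{\hat{K}} \sim t(m-1).$$
Note that the t distribution has mean 0 but variance df/(df − 2), where df is the number of degrees of freedom. Hence the expected squared normalized forecast error is
$$E\left[\left(\frac{1}{\sqrt{A}}\,\frac{\mathcal{E}}{\hat{K}}\right)^2\right] = 0 + \mathrm{Var}\left(\frac{1}{\sqrt{A}}\,\frac{\mathcal{E}}{\hat{K}}\right) = \frac{m-1}{m-3},$$
leading to Eq. 9 in the main text.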
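The expected squared normalized error implied by this derivation, (m − 1)/(m − 3) times τ + τ²/m, can be checked by simulation. The sketch below is an illustration under the stated model only (independent normal increments, drift and variance estimated from m past increments); the parameter values and function name are hypothetical, and a window of m = 10 is used so that the Monte Carlo average converges quickly.

```python
import random

def simulate_xi(tau, m, mu, k, n_trials, rng):
    """Monte Carlo estimate of E[(forecast error / K_hat)^2] for a random walk with drift."""
    total = 0.0
    for _ in range(n_trials):
        past = [rng.gauss(mu, k) for _ in range(m)]          # observed increments
        mu_hat = sum(past) / m
        k2_hat = sum((d - mu_hat) ** 2 for d in past) / (m - 1)
        future_change = sum(rng.gauss(mu, k) for _ in range(tau))
        error = future_change - tau * mu_hat                   # realized minus forecast
        total += error ** 2 / k2_hat
    return total / n_trials

rng = random.Random(0)
m, tau = 10, 3
simulated = simulate_xi(tau, m, mu=-0.1, k=0.15, n_trials=20000, rng=rng)
predicted = (m - 1) / (m - 3) * (tau + tau ** 2 / m)
```

The two numbers should agree to within a few percent. For very small windows such as m = 5 the squared normalized error has a heavy-tailed distribution, so many more trials are needed before the sample average settles.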

B.2 Integrated Moving Average
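Here we derive the distribution of forecast errors given that the true process is an IMA(1,1) with known θ, that µ and K are estimated assuming that the process is a random walk with drift, and that the forecasts are made as if the process was a random walk with drift. First note that, from Eq. 12,
$$y_{t+\tau} = y_t + \mu\tau + \sum_{i=t+1}^{t+\tau}\left[v_i + \theta v_{i-1}\right].$$
Using Eq. 6 to make the prediction implies that
$$\hat{y}_{t+\tau} = y_t + \hat{\mu}\tau, \qquad \hat{\mu} = \mu + \frac{1}{m}\sum_{i=t-m+1}^{t}\left[v_i + \theta v_{i-1}\right],$$
to obtain
$$\mathcal{E} = \sum_{i=t+1}^{t+\tau}\left[v_i + \theta v_{i-1}\right] - \frac{\tau}{m}\sum_{i=t-m+1}^{t}\left[v_i + \theta v_{i-1}\right].$$
Expanding the two sums, this can be rewritten
$$\mathcal{E} = -\frac{\tau\theta}{m}\,v_{t-m} - \frac{\tau(1+\theta)}{m}\sum_{i=t-m+1}^{t-1} v_i + \left(\theta - \frac{\tau}{m}\right)v_t + (1+\theta)\sum_{i=t+1}^{t+\tau-1} v_i + v_{t+\tau}.$$
Note that the term $v_t$ enters in the forecast error both because it has an effect on parameter estimation and because of its effect on future noise. Now that we have separated the terms we are left with a sum of independent normal random variables. Hence we can obtain $\mathcal{E} \sim \mathcal{N}(0, \sigma^2 A^*)$, where
$$A^* = \left(\frac{\tau\theta}{m}\right)^2 + (m-1)\left(\frac{\tau(1+\theta)}{m}\right)^2 + \left(\theta - \frac{\tau}{m}\right)^2 + (\tau-1)(1+\theta)^2 + 1$$
can be simplified as Eq. 14 in the main text.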
To obtain the results with estimated (instead of true) variance (Eq. 15 and 16), we follow the same procedure as in Appendix B.1, which assumes independence between the error and the estimated variance. Figure 13 shows that the result is not exact but works reasonably well if m > 15.
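The variance factor A* can be checked term by term. The sketch below assumes that Eq. 12 has the usual IMA(1,1) form, with increments equal to µ plus a shock plus θ times the previous shock, and that the forecast is the random walk with drift rule. Under those assumptions the forecast error is a weighted sum of independent shocks, with the weights listed in the comments, and the sum of squared weights should match the compact expression −2θ + (1 + 2(m − 1)θ/m + θ²)(τ + τ²/m). Both function names are hypothetical.

```python
def a_star_closed_form(tau, m, theta):
    """Compact form of the IMA(1,1) variance factor (assumed to match Eq. 14)."""
    return -2 * theta + (1 + 2 * (m - 1) * theta / m + theta ** 2) * (tau + tau ** 2 / m)

def a_star_from_weights(tau, m, theta):
    """Sum of squared weights on the independent shocks in the forecast error."""
    weights = [-tau * theta / m]                    # shock just before the window
    weights += [-tau * (1 + theta) / m] * (m - 1)   # shocks inside the estimation window
    weights += [theta - tau / m]                    # last observed shock: estimation and future effect
    weights += [1 + theta] * (tau - 1)              # future shocks that appear twice
    weights += [1.0]                                # final future shock
    return sum(w * w for w in weights)

# Compare the two expressions on a small grid of integer horizons and windows.
max_gap = max(
    abs(a_star_closed_form(tau, m, theta) - a_star_from_weights(tau, m, theta))
    for tau in (1, 2, 5, 20)
    for m in (5, 15, 33)
    for theta in (0.0, 0.25, 0.63)
)
```

`max_gap` is at the level of floating-point rounding, and setting θ = 0 recovers the random walk factor τ + τ²/m. Dividing A* by 1 + θ² converts from the shock variance to the variance of the increments, which is the quantity the volatility estimator targets.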

C Robustness checks
C.1 Size of the learning window
The results are robust to a change of the size of learning window m. It is not possible to go below m = 4, as when m = 3 the Student distribution has m − 1 = 2 degrees of freedom, hence an infinite variance. Note that to make forecasts using a large m, only the datasets which are long enough can be included. The results for a few values of m are shown in Figure 9. Figure 14 shows that the normalized mean square forecast error consistently decreases as the learning window increases.

C.2 Data selection
We have checked how the results change when about half of the technologies are randomly selected and removed from the dataset. The shape of the normalized mean square forecast error growth does not change and is shown in Figure 15. The procedure is based on 10,000 random trials selecting half the technologies.
In the main text we have shown the results for a forecast horizon up to $\tau_{max} = 20$. Moreover, we have used only the forecast errors up to $\tau_{max}$ to construct the empirical distribution of forecast errors in Figure 8 and to estimate θ in Appendix D. Figure 16 shows that if we use all the forecast errors up to the maximum with τ = 73 the results do not change significantly.

C.4 Heavy tail innovations
To check the effect of non-normal noise increments on Ξ(τ), we simulated random walks with drift with noise increments drawn from a Student distribution with 3 or 7 degrees of freedom. Figure 17 shows that fat tail noise increments do not change the long horizon errors very much. While the IMA(1,1) model produces a parallel shift of the errors at medium to long horizons, the Student noise increments generate larger errors mostly at short horizons.
We select θ in several ways. The first method is to compute a variety of weighted means for the $\hat{\theta}_i$ estimated on individual series. The main problem with this approach is that for some technology series the estimated θ was very close to 1 or -1, indicating mis-specification or estimation problems. After removing these 8 technologies the mean with equal weights for each technology is 0.27 with standard deviation 0.35. We can also compute the weighted mean at each forecast horizon, with the weights being equal to the share of each technology in the number of forecast errors available at a given forecast horizon. In this case, the weighted mean $\theta_w(\tau)$ will not necessarily be constant over time. Figure 18 (right) shows that $\theta_w(\tau)$ oscillates between 0.24 and 0.26. Taking the average over the first 20 periods, we have $\theta^*_w = \frac{1}{20}\sum_{\tau=1}^{20}\theta_w(\tau) = 0.25$, which we have used as our main estimate of θ in the previous section, in particular in Figures 6, 7 and 8. When doing this, we do not mean to imply that our formulas are valid for a system with heterogeneous $\theta_i$; we simply propose a best guess for a universal θ.
Figure 19: Using the IMA model to make better forecasts. The right panel uses θ = 0.25.
The second approach is to select θ in order to match the errors. As before we generate an artificial data set using the IMA(1,1) model. Larger values of θ imply that using the simple random walk model to make the forecasts will result in higher forecast errors. Denote by $\Xi(\tau)_{empi}$ the empirical mean squared normalized forecast error as depicted in Figure 6, and by $\Xi(\tau)_{sim,\theta}$ the expected mean squared normalized forecast error obtained by simulating an IMA(1,1) process 3,000 times with a particular global value of θ. We study the ratio of these two, averaged over all $1, \ldots, \tau_{max} = 20$ periods, i.e.
$$Z(\theta) = \frac{1}{20}\sum_{\tau=1}^{20}\frac{\Xi(\tau)_{empi}}{\Xi(\tau)_{sim,\theta}}.$$
We also tried to make forecasts using the IMA model to check that forecasts are improved: which value of θ allows the IMA model to produce better forecasts? We apply the IMA(1,1) model with different values of θ to make forecasts (with the usual estimate of the drift term $\hat{\mu}$) and study the normalized error as a function of θ. We record the mean squared normalized error and repeat this exercise for a range of values of θ. The results for horizons 1, 2, and 10 are reported in Figure 19 (left). This shows that the best value of θ depends on the time horizon τ. The curve shows the mean squared normalized forecast error at a given forecast horizon as a function of the value of θ assumed to make the forecasts. The vertical lines show the minima, at 0.26, 0.40, and 0.66. To make the curves fit on the plot, given that the mean squared normalized forecast error increases with τ, the values are normalized by the mean squared normalized forecast error for θ = 0, that is, obtained when assuming that the true process is a random walk with drift. We also see that as the forecast horizon increases the improvement from taking the autocorrelation into account decreases (Figure 19, right), as expected theoretically from an IMA process.

E Deviation of the empirical CDF from the theoretical Student t
In this section we check whether the deviations of the empirical forecast errors from the predicted theoretical distribution are due to sampling variability, that is, due to the fact that our datasets provide only a limited number of forecast errors.
We consider the results obtained when pooling the forecast errors at all horizons, up to the maximum horizon $\tau_{max} = 20$. We construct the empirical CDF of this distribution, split its support in 1000 equal intervals, and compare the empirical value in each interval k with the theoretical value, taken from the CDF of Student distribution with m − 1 degrees of freedom. The difference in each interval is denoted by $\Delta_k$. Three different measures of deviation are used: $\sum_k |\Delta_k|$, $\sum_k (\Delta_k)^2$, and $\max_k \Delta_k$. These measures are computed for the real empirical data and for 10,000 simulated similar datasets, using $\theta = \theta^*_w = 0.25$ and $\theta = \theta^*_m = 0.63$. The results are reported in Figure 20. It is seen that for θ = 0.25 the departure of the real data empirical CDF from the theoretical Student CDF tends to be larger than the departure expected for a similar dataset. Considering $\sum_k |\Delta_k|$, $\sum_k (\Delta_k)^2$, and $\max_k \Delta_k$, the share of random datasets with a larger deviation is 0.001, 0.002, 0.011. If we use a larger θ ($\theta^*_m = 0.63$), these numbers are 0.21, 0.16, 0.20, so if we take a large enough θ we cannot reject the model based on the expected departure of the forecast errors from the theoretical distribution.
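The three deviation measures can be computed as in the following sketch. It is an illustration only: the Student CDF is obtained by numerical integration so that nothing beyond the standard library is needed, the maximum is taken over absolute differences (an assumption about how the maximum deviation is defined), and the sample is synthetic rather than the pooled forecast errors. All function names are hypothetical.

```python
import bisect
import math
import random

def student_t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def student_t_cdf(x, df, n_steps=400):
    """CDF via Simpson's rule on [0, |x|] plus symmetry (n_steps must be even)."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / n_steps
    total = student_t_pdf(0.0, df) + student_t_pdf(a, df)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    half_mass = total * h / 3
    return 0.5 + half_mass if x > 0 else 0.5 - half_mass

def deviation_measures(sample, df, n_intervals=1000):
    """Sum of |delta|, sum of delta^2 and max |delta| between empirical and t CDFs."""
    data = sorted(sample)
    lo, hi = data[0], data[-1]
    width = (hi - lo) / n_intervals
    deltas = []
    for k in range(1, n_intervals + 1):
        x = lo + k * width  # right edge of interval k
        empirical = bisect.bisect_right(data, x) / len(data)
        deltas.append(empirical - student_t_cdf(x, df))
    abs_deltas = [abs(d) for d in deltas]
    return sum(abs_deltas), sum(d * d for d in deltas), max(abs_deltas)

# Synthetic sample drawn from t(m - 1) with m = 5 (stands in for rescaled errors).
rng = random.Random(1)
m = 5

def draw_t(df):
    z = rng.gauss(0, 1)
    u = sum(rng.gauss(0, 1) ** 2 for _ in range(df))
    return z / math.sqrt(u / df)

sample = [draw_t(m - 1) for _ in range(500)]
sum_abs, sum_sq, max_abs = deviation_measures(sample, df=m - 1)
```

Repeating the last three lines for many simulated data sets gives the sampling distribution of each statistic, against which the value computed on the real errors can be compared to obtain the shares reported above.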
[Figure legend: photovoltaic module prices; U.S. nuclear electricity prices; U.K. Hinkley Point price in 2023]

Figure 3: Histogram for the estimated parameters for each technology i based on the full sample (see also Table [...]).

Figure 4: Technologies, forecasts and time horizon. Because we use a hindcasting procedure the number of possible forecasts that can be made with a given data window m decreases as the time horizon τ of the forecast increases. For m = 5 we show the number of possible forecasts as well as the number of technologies for which at least one forecast is possible at that time horizon. The horizontal line at τ = 20 indicates our (somewhat arbitrary) choice of a maximum time horizon. There are a total of 8212 forecast errors and 6391 with τ ≤ 20.

Figure 5: Scatter plot of μ and K for technologies with a significant improvement rate. The linear fit (which is curved when represented in log scale) is K = 0.02 − 0.76μ, with $R^2$ = 0.87 and a p-value ≈ 0. The log-log fit is $K = 0.51(-\mu)^{0.72}$, with $R^2$ = 0.73 and p-value ≈ 0. Technologies with a faster rate of improvement also have higher uncertainty in their improvement.

Figure 6: Growth of the mean squared normalized forecast error Ξ(τ) for forecasts on real data, compared to predictions. The empirical value of the normalised error Ξ(τ) is shown by black dots. The dashed line is the prediction of the random walk with drift according to equation (9). The solid line is based on the expected errors for the IMA(1,1) model using a value of $\theta = \theta^*_m = 0.63$ that best fits the empirically observed errors. The dotted line is based on $\theta = \theta^*_w = 0.25$, which is a sample mean weighted by the τ-specific number of forecasts of each technology. The shaded area corresponds to the 95% error quantile using $\theta^*_w$. See Appendix D for details on the estimation of θ.

Figure 7: Cumulative distribution of empirical rescaled normalized forecast errors at different forecast horizons τ. The forecast errors for each technology i are collapsed using Eq. 16 with $\theta = \theta^*_w = 0.25$. This is done for each forecast horizon τ = 1, 2, . . ., 20 as indicated in the legend. The green thick curve is the theoretical prediction. The positive and negative errors are plotted separately. For the positive errors we compute the number of errors greater than a given value X and divide by the total number of errors to estimate the cumulative probability and plot in semi-log scale. For the negative errors we do the same except that we take the absolute value of the error and plot against −X.

Figure 8: Cumulative distribution of empirical rescaled normalized forecast errors with all τ pooled together for both the autocorrelation IMA(1,1) model and the random walk. The empirical distribution when the normalization is based on a random walk (Eq. 10) is denoted by a dashed line and when based on an IMA(1,1) model (Eq. 16) is denoted by a solid line. See the caption of Figure 7 for a description of how the cumulative distributions are computed and plotted.

Figure 9: Mean squared normalized forecast error Ξ as a function of the forecast horizon τ for different sizes of the learning window m.

Figure 11: Probability that photovoltaic manufacture becomes cheaper than a hypothetical competitor which starts three times cheaper and does not improve. The curves show Eq. 19 using μ = −0.10, K = 0.15, m = 33 for solar PV and three different values of the noise parameter $K_C$ for the alternative technology. The crossing point is at τ ≈ 11 (2024) in all three cases.

Figure 12: Global energy consumption due to each of the major sources from BP Statistical Review of World Energy (BP 2014). Under a projection for solar energy obtained by fitting to the historical data the target of 20% of global primary energy is achieved in 2027.

Figure 13: Error growth for large simulations of an IMA(1,1) process, to check formula 15. Simulations are done using 5000 time series of 100 periods, all with µ = 0.04, K = 0.05, θ = 0.6. The insets show the distribution of forecast errors, as in Figure 7, for m = 5, 40.

Figure 14: Empirical mean squared normalized forecast errors as a function of the size of learning window, for different forecast horizons. The dots are the empirical errors, and the plain lines are those expected if the true model was an IMA(1,1) with $\theta = \theta^*_w = 0.25$.

Figure 15: Robustness to dataset selection. Mean squared normalized forecast errors as a function of τ, when using only half of the technologies (26 out of 53), chosen at random. The 95% confidence intervals, shown as dashed lines, are for the mean squared normalized forecast errors when we randomly select 26 technologies.

Figure 17: Effect of fat tail innovations on error growth. The figure shows the growth of the mean squared normalized forecast errors for four models, showing that introducing fat tail innovations in a random walk with drift (RWD) mostly increases errors only at short horizons.

The results in Figure 18 (left) are based on 2000 artificial data sets for each value of θ. The value at which |Z − 1| is minimum is at $\theta^*_m = 0.63$.

Figure 20: Deviations from the Student distribution. The histograms show the sampling distribution of a given statistic, and the thick black line shows the empirical value on real data. The simulations use θ = 0.25 (3 upper panels) and θ = 0.63 (3 lower panels).

Table 1 reports the p-values for the one-sided t-test and the bottom of the table shows the technologies that are excluded as a result.