Inventory – forecasting: Mind the gap

© 2021 The Author(s). Published by Elsevier B.V.


Abstract
We are concerned with the interaction and integration between demand forecasting and inventory control, in the context of supply chain operations. The majority of the literature is fragmented. Forecasting research more often than not assumes forecasting to be an end in itself, disregarding any subsequent stages of computation that are needed to transform forecasts into replenishment decisions. Conversely, most contributions in inventory theory assume that demand (and its parameters) are known, in effect disregarding any preceding stages of computation. Explicit recognition of these shortcomings is an important step towards more realistic theoretical developments, but still not particularly helpful unless they are somehow addressed. Even then, forecasts often constitute exogenous variables that serially feed into a stock control model. Finally, there is a small but growing stream of research that is explicitly built around jointly tackling the inventory forecasting question.
We introduce a framework to define four levels of integration: from disregarding, to acknowledging, to partly addressing, to fully understanding the interactions. Focusing on the last two, we conduct a structured review of relevant (integrated) academic contributions in the area of forecasting and inventory control and argue for their classification with regard to integration. We show that the development from one level to another is in many cases chronological, but also associated with specific schools of thought. We also argue that although movement from one level to another adds realism, it also adds complexity in terms of actual implementations, and thus a trade-off exists. The article makes a contribution to an area that has always been fragmented despite the importance of bringing the forecasting and inventory communities together to solve problems of common interest. We close with an indicative agenda for further research and a call for more theoretical contributions, but also more work that would help to expand the empirical knowledge base in this area.

Motivation and background
Inventory control is concerned with supporting operational decisions on when and how much to replenish for each of multiple stock keeping units (SKUs), as well as the parts and materials used to make them. These inventories are in place to satisfy customer demand, at a required service level and/or within a budget. In situations where demand is dependent (such as for parts and components at levels higher than zero in the bill of materials), controlling the inventories boils down to a scheduling exercise through materials requirement planning (MRP) procedures. If demand is independent, but somehow known in advance, models that in their most basic version minimize the sum of (expected) ordering and inventory carrying costs (such as Harris's (1913) EOQ model or Wagner and Whitin's (1958) model), and that also take account of constraints that have to be satisfied in a given situation (e.g., a minimum service level), are used. However, customer demand is typically independent and unknown at the time stocking and production decisions need to be made, and therefore we need to forecast it.
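As a concrete illustration of the cost trade-off that EOQ-type models resolve, the following sketch computes Harris's square-root formula. The demand and cost figures are purely hypothetical, chosen only to show the calculation.

```python
from math import sqrt

def eoq(annual_demand: float, ordering_cost: float, holding_cost: float) -> float:
    """Classic Harris (1913) economic order quantity: the order size that
    minimises the sum of (expected) ordering and inventory carrying costs."""
    return sqrt(2 * annual_demand * ordering_cost / holding_cost)

# Illustrative (hypothetical) values: 1000 units/year demand,
# 50 per order placed, 2 per unit per year to hold.
q = eoq(1000, 50, 2)
print(round(q, 1))  # 223.6
```

Note that the formula takes demand as a known constant, which is precisely the "perfect forecast" assumption discussed in the Background section below.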
In this context, we refer to a forecast as the (best possible) genuine expectation of how much demand there is going to be for a particular SKU (often with sales as a proxy)¹. Most often this refers to point, mean demand forecasts, although forecasts of variance or higher moments, other quantiles or indeed the entire lead time demand distribution may be required. For the purposes of this article, we distinguish such forecasts of demand, to be used for SKU inventory management, from forecasts for other functions (e.g., marketing). We then use the term "inventory forecasting" to describe the intersection of these two areas, i.e. the integrated literature of forecasting and inventory control. These works manipulate characteristics of demand (or indeed of forecasts) in pursuit of inventory and ultimately supply chain efficiencies.

Background
The first articles on inventory control date back to the early 20th century. The still-relevant, and aptly named, article "How many parts to make at once" by Harris (1913) introduced the EOQ model. Basic EOQ formulations are built on the assumption that demand and its (true) parameters are known and constant (in a sense, a "perfect forecast" is available), with some exceptions (see Glock et al., 2014, and Andriolo et al., 2014, for reviews of the EOQ literature). This thinking, which completely bypasses any need to forecast, is reflected also in several seminal inventory textbooks, and is not constrained to EOQ formulations. For example, forecasting is absent in Hadley and Whitin (1963) and Arrow et al. (1958), but also in more recent textbooks (Zipkin, 2000; Muckstadt and Sapra, 2010). Silver et al. (1998, 2017) and Axsäter (2015) take a step forward in explicitly recognising the need to forecast and dedicate a chapter to it.
Similarly, classical forecasting textbooks are not contextualised, treating forecasting as an end in itself (e.g., Makridakis et al., 1998; Ord et al., 2017; Hyndman and Athanasopoulos, 2018), with forecasting for inventory control repeatedly reported as a neglected area (e.g., Fildes and Beard, 1992; Prak et al., 2017). That is to say, they do not take into account what the purpose of the forecasts is (i.e. the forecast utility, be it in budgeting, energy, scheduling, or, in our case, inventory control). Broadly speaking, in terms of forecasting performance, the forecasting literature so far has been mostly concerned with achieving gains against some point forecast error metrics - when, most often, forecasting the mean.
The implicit assumption here is that "achievable improvements in accuracy lead directly to worthwhile savings" (Fildes and Beard, 1992, p. 24). An alarming number of works have, however, challenged this otherwise intuitive and common-sense conjecture (e.g., Flores et al., 1993; Eaves and Kingsman, 2004; Syntetos et al., 2010; Tratar, 2010; Babai et al., 2019; Kourentzes et al., 2020). Forecasting accuracy metrics can take a variety of forms, but especially when constrained to assessing the accuracy of point forecasts they can fail to evaluate the forecasts' impact (utility) on inventory control. As Davydenko and Fildes (2013, p. 511) put it, "the key issue when evaluating a forecasting process is the improvements achieved in supply chain performance", viz., the implications of any attained accuracy.
Inventory control performance, on the other hand, has been mostly concerned with attaining inventory efficiencies, often reported through (the trade-off between) inventory-related costs and some service achievement. However, in the majority of cases, this is done while either disregarding forecasting or assuming some idealistic forecast is available (conforming to strict, often unrealistic specifications).
¹ While demand represents what a customer would want to buy, sales represent what the customers did buy (and therefore lead to censored demand information; Conrad, 1976). In a wholesaling, B2B (business to business) or online sales environment, the real demand can be traced and documented (even when not met). In retail situations, however, there is often no way to know what the real demand is, as companies tend to only document sales, which are then used as a proxy for demand, sometimes with some interpolation (see, e.g., Lau and Lau, 1996, and Tan and Karabati, 2004, for a review on the estimation of demand distribution based on censored sales data).
Historically, these two parts of the same "inventory forecasting" function have been treated as separate entities. Of course, this is by no means a critique of these works and authors. There is little doubt that they are important contributions to the state of knowledge at the time that they were written, and that the research they contain is important and relevant. Conceptually, these works provide the foundations for integration, defining the constituent blocks of inventory forecasting. Understanding the parts (forecasting and inventory control) is required before attempting to look at the whole (integrated inventory control and forecasting).
This isolationist approach is also reflective of the respective communities and conferences. The International Symposium on Inventories (ISIR) introduced an inventory forecasting stream only in 2008. Supply chain (and therein inventory) related streams were not popular in the International Symposium on Forecasting (ISF) until recently. Simple analysis of the International Journal of Production Economics (IJPE), a journal focused on production and operations management (and publishing research from the ISIR), and the International Journal of Forecasting (IJF) published on behalf of the International Institute of Forecasters (organisers of ISF) is telling (see Figure 1 ).

The need for integration
It has been shown that taking this isolationist approach is not always the best course in terms of performance. For example, and quite tellingly, the best demand forecasting method for minimising inventory costs is not necessarily always the one with the best forecast accuracy (Tratar, 2010; Kourentzes et al., 2020). In some instances, potentially costly undershoots, caused by problematic inventory control assumptions with regard to the distribution of the forecast errors, are recouped by positive bias in the forecasts (e.g., Babai et al., 2014). These are irrefutably valid observations; however, the inference that forecast bias may improve the system's performance is problematic, and indicative of frail assumptions (Taylor, 2007; Syntetos and Boylan, 2008).
Further, even when the forecast accuracy and inventory performance improvements are in the same direction, they may be of different magnitudes; Syntetos et al. (2010) found a 1% improvement in forecast accuracy to translate into a 10-15% reduction in inventory costs for comparable service levels, casting further doubt as to what extent accuracy measures may help explain inventory performance. The assumption that forecast accuracy gains will translate into inventory gains does not, then, appear to universally hold, and seems contingent on the validity of further assumptions in subsequent inventory control calculations.
Simulations, and in particular ones with empirical data, have been extremely helpful in revealing the shortcomings of commonplace inventory-related theoretical assumptions (e.g., Bretschneider, 1986; Eppen and Martin, 1988). This is also supported a fortiori by the fact that there is no consensus (yet) on ways to select a forecast, a persistent issue within the forecasting community (Gardner, 1985, 2006; De Gooijer and Hyndman, 2006; Kolassa, 2016).
What the above highlight is our limited understanding of the interrelations between forecasting and inventory control, in particular when operating far away from theoretical assumptions. Critically, it also highlights a frequent failure to accommodate the fact that demand is actually forecasted rather than known (Prak et al., 2017). Even when the theoretical assumptions might be reasonable, the frequency at which judgemental interventions occur in practice, either on forecasts (Trapero et al., 2011) or directly on inventory quantities of interest (e.g., re-order levels and order quantities; Syntetos et al., 2016), might cast doubt on their robustness². This is not a critique; it is rather a reminder of how open this area is to contributions. However, this is central to the argument for integration: there are intricate inventory forecasting problems that need to be approached as a system, taking into account that inventory decisions should be/are³ informed by forecasts. Control theory lends a useful structure for analysis (see Mason-Jones and Towill, 1998): In inventory management, we attempt to control demand uncertainty to efficiently meet customer demand. To do so, we introduce 'control mechanisms' in terms of forecasting procedures (feedforward control) and inventory policies (feedback control) (Towill, 1982). When the interactions of these 'control mechanisms' are not carefully explored, unwarranted 'control uncertainty' is often introduced (see, e.g., Goltsos et al., 2019a).

[Figure 1 caption: ... (31) articles contain the words 'inventor*' and 'forecast*', out of 1952 (1735) containing just the word 'inventor*' ('forecast*'), about 10% (2%). Notice the scale difference between the primary (left, area) and secondary (right, bar chart) axes in both figures. Source: Scopus, search in titles, abstracts, and keywords, up to 2020.]
One example is the bullwhip effect in a supply chain context, where (as we move further away from the customer in a supply chain) inventory oscillations become increasing multiples of end demand oscillations, inflating inventory costs (see, e.g., Li et al., 2014). Another example is the 'issue point bias' in an intermittent demand⁴ context, where an infrequent demand occurrence drops the inventory level just as it also inflates inappropriate forecasts of demand, triggering inflated orders that lead to overstocking (Croston, 1972).
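The bullwhip mechanism can be demonstrated with a minimal single-echelon simulation. The sketch below is illustrative only (an order-up-to policy driven by a moving-average forecast, with arbitrary parameter values), not a model from any of the cited works; it shows how the interaction of a forecasting procedure (feedforward) and a replenishment policy (feedback) amplifies order variability relative to demand variability.

```python
import random
import statistics

random.seed(42)

def simulate_orders(demand, lead_time=2, window=4):
    """Single-echelon order-up-to policy with a moving-average forecast.
    Each order covers current demand plus the change in the order-up-to
    target, so orders oscillate more than demand (the bullwhip effect)."""
    orders, history = [], list(demand[:window])
    prev_target = None
    for d in demand[window:]:
        forecast = sum(history[-window:]) / window   # feedforward control
        target = forecast * (lead_time + 1)          # order-up-to level
        order = d if prev_target is None else max(0.0, d + target - prev_target)
        orders.append(order)
        prev_target = target                         # feedback control
        history.append(d)
    return orders

# Stationary i.i.d. demand around 100 units (illustrative).
demand = [100 + random.gauss(0, 10) for _ in range(500)]
orders = simulate_orders(demand)
ratio = statistics.variance(orders) / statistics.variance(demand)
print(ratio > 1)  # True: order variance exceeds demand variance
```

Shorter forecast windows or longer lead times make the amplification worse, which is the textbook intuition behind the effect.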
We are of course not the first to point out the need to jointly approach the interrelated functions of forecasting and inventory control. For forecasting, integration means that, at a minimum, demand parameters used for inventory control need to be appropriately estimated and updated (Eppen and Martin, 1988), and that forecasting performance needs to be judged through inventory performance (Gardner, 1990). For inventory control, it is to use demand information to calculate (forecast) inventory quantities of interest, and to take a holistic view of inventory forecasting solutions for a joint end goal. Watson (1987, p. 82) observed that the use of an "elaborate reordering formula is rather pointless when demand-forecast fluctuations may be large", meaning that the elaboration should be expended on the demand forecast side. Gardner (1990, p. 498) noted that the forecasting aim should be "to improve customer service and reduce inventory investment", meaning that forecasts should be judged by bottom-line inventory performance.

² Robustness here and later refers to the 'sufficiently good' performance of the policy under varying demand and product characteristics (see, e.g., Arrow et al., 1958, and Bijvank et al., 2014, for the backorder and lost sales cases of an order-up-to policy, respectively).
³ The be/is differentiation is important. Demand needs to be forecasted, and yet when demand is forecasted, the implications of whatever forecasting method is employed still need to be carefully considered (in relation to the assumptions of the inventory model; see, e.g., Hsieh et al., 2020).
⁴ Infrequent positive demand interspersed with periods of zero demand. For the purposes of this work we also interchangeably discuss these demand patterns as 'slow'. A thorough overview of intermittent demand forecasting and inventory control can be found in Boylan and Syntetos (2021).
A keynote speech in the practitioner stream of the International Symposium on Forecasting (ISF) in 2016 ( Syntetos, 2016 ) was one vocal example of an increasing number of recent calls for more integrated approaches.

Summary and outline
This paper is in response to these calls for integration, an effort to understand and support integrated inventory forecasting. We attempt to consolidate relevant arguments, to serve as a single point of reference for researchers to address issues of mutual interest in the two communities. In order to do that we need to qualify the meaning of integration before we attempt to explore it. We explore what does and what does not constitute integration between the fields of forecasting and inventory control. At the same time, it is not our aim to criticise, nor do we imply any relationship between an article's integration 'level' and the quality of research within.
The following notation is used. The letters I and F denote the focus of the paper, being inventory control and/or forecasting, respectively. The numbers zero to three indicate the level of integration. For papers whose focus is on inventory control, integration relates to the extent to which the work considers the fact that demand needs to be/is forecasted. For papers whose focus is on forecasting, integration relates to the extent to which the work actually considers that the ultimate goal is to achieve some inventory-related bottom-line performance improvement (most commonly some service level/cost balance). The letters IF indicate integrated inventory forecasting literature. Figure 2 summarizes the framework:
• Level 0: Inventory application with no mention of forecasting (I0) or the inverse (F0)
• Level 1: Inventory application with discussion of forecasting (I1) or the inverse (F1)
• Level 2: (Serial) application of forecasting and inventory control (I2 or F2 according to focus)
• Level 3: Integrated development of inventory forecasting research (IF3)
We strive to adopt an objective position and avoid personal methodological preferences. We take a wide-lens snapshot of the literature in an attempt to qualify and quantify the degrees of integration between forecasting and inventory control. To do so, we perform a wide structured literature survey. To identify the keyword sets and establish our final sample, we construct and analyse a pre-sample of papers as well as contact various experts in the respective fields for suggestions.
There exists a very high degree of isolation between the two disciplines. When compared to the great body of research papers that constitute the forecasting and inventory control literatures, only a small fraction of papers are integrated (levels 2 and 3). This is illustrated in Section 2 using our keyword sets, but we note that this acts as a motivation rather than a hypothesis we set out to prove. We find that the first integrated approaches started appearing in the 1970s and have grown to about ten papers per year recently. We identify tracks of literature where integration is more prevalent, and report on various modelling decisions in this area.
The remainder of our paper is organised as follows: Section 2 describes the keyword selection process and reports on the survey protocol that was followed. This should facilitate the reproduction of our findings and constitute a starting point for further investigations in this area. Section 2 also details the integration classification framework and the sample's classification flowchart. The classification framework is applied to our sample, and summary results are presented in Section 3. Main areas of interest for integration identified in Section 3 are isolated and further explored and discussed in Section 4. Section 5 summarises and attempts to recast the argument for integration, discussing when it is warranted alongside promising pathways to pursue it.

Classification and review protocol
In this section we provide an overview of the classification process, the selection of our keyword sets and the compilation of our sample. It is not the aim of this paper to discuss the forecasting and inventory theory literature in its entirety. We are interested in the interaction and integration of forecasting and stock control. To this end, we have constructed our keyword sets to try to exclude inventory control articles that do not include forecasting elements, and vice versa. An intentional and direct outcome of this is that non-integrated articles are underrepresented in our sample. All searches were conducted in Scopus, due to the breadth of databases it has access to.

Integration framework and classification
We introduce a classification framework, categorising papers by their level of integration and focus. It consists of four integration levels, alongside a designation of the paper's focus (inventory or forecasting literature). The framework matured over a long period and has been presented and discussed numerous times at conferences and other events⁵. Feedback from the community and internal discussions led to numerous improvements and reclassifications. We are grateful to both the forecasting and inventory control communities for their valuable feedback that has led to this final version of our integration framework.
At one end (level 0), we find inventory applications disconnected from forecasting, or the inverse. Inventory research here adopts convenient demand assumptions (cancelling the need to forecast, e.g., Zipkin, 2000; Muckstadt and Sapra, 2010) and forecasting research is positioned with forecasting being an end in itself (e.g., Makridakis et al., 1998). At level 1 lies literature that recognises the existence of the other field but does not engage with it. For example, inventory research will note the need to forecast demand parameters (but does not forecast them), see, e.g., Waters (2008); forecasting research will touch on inventory implications but will not explore the forecasts' utility, see, e.g., Ord and Fildes (2012) and Ord et al. (2017). This body of literature departs from level 0 by recognising either the need to forecast demand parameters, or the fact that the forecasts are ultimately going to be used for inventory control.
Level 2 describes literature that takes the first steps towards integration. For forecasting, integration begins when forecasts are judged on bottom-line inventory considerations and metrics (i.e., considered a means to an end). For inventory control, it begins when one uses forecasts of demand (e.g., the type of distribution and its moments) as opposed to assuming demand is known. At level 2, we consider the 'serial' application of forecasting and inventory control (e.g., Fildes and Beard, 1992; Willemain et al., 1994; Eaves and Kingsman, 2004; Syntetos and Boylan, 2006). Regardless of where the particular focus lies, these works tend to estimate demand parameters and then employ inventory control policies, recognising the need for integration.
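A level-2 'serial' pipeline can be sketched as follows: a forecasting step produces demand estimates, which are then fed, as exogenous inputs, into a textbook order-up-to calculation. The policy form (normal demand, 95% cycle service level), the smoothing parameter, and the demand figures are illustrative assumptions, not drawn from any particular paper.

```python
import statistics
from math import sqrt

Z_95 = 1.645  # standard normal quantile for a 95% cycle service level

def ses(demand, alpha=0.2):
    """Simple exponential smoothing point forecast of mean demand."""
    level = demand[0]
    for d in demand[1:]:
        level = alpha * d + (1 - alpha) * level
    return level

def order_up_to(demand, lead_time=2, alpha=0.2):
    """Serial (level-2) logic: forecast first, then plug the estimates
    into the inventory policy as if they were the true parameters."""
    mean = ses(demand, alpha)
    sigma = statistics.stdev(demand)  # crude proxy for demand variability
    return lead_time * mean + Z_95 * sigma * sqrt(lead_time)

demand = [104, 97, 112, 95, 101, 99, 108, 93, 105, 100]  # illustrative history
S = order_up_to(demand)
print(round(S, 1))
```

Note the tell-tale level-2 feature: the two steps are computed independently, so the policy treats the forecast as error-free, which is exactly the assumption that level-3 work scrutinises.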
At level 3 we consider the integrated deliberation of inventory forecasting problems. Here, demand characteristics are forecasted and used, in parallel, to understand and/or inform the inventory forecasting model or our understanding of it. Authors grapple with underlying attributes affecting the system, such as correlation in demand (e.g., Lagodimos et al., 1995; Graves, 1999) or in forecasts (e.g., Johnston and Harrison, 1986; Prak et al., 2017), and/or evaluate their model based on an overarching metric of inventory performance. That is to say, any selection or decisions on both the forecasting method and the inventory policy are based on the total performance of the system rather than the performance of its constituents. (This does not detract from the importance of measuring forecast accuracy, as it will always be relevant for tracking and monitoring the performance of a system.) Beyond the integration levels (and to also help us decide on them), further information has been extracted from each paper: the forecasting methods and inventory policies employed, forecasting and inventory performance metrics, and methodological and contextual information. The process of individual paper evaluation is graphically illustrated in a flowchart (Figure 3, this being also the flowchart 'explosion' of the process "paper classification" in the review protocol depicted in Figure 5 of Section 2.2):

Keyword selection and search
Our keyword sets were informed in three ways: i) initial keyword selection and search, ii) (key)word analysis on the resulting paper (pre-)sample, and iii) expert consultation. Details of this process can be found in Appendix A. The final keywords used are shown in Figure 4.
This keyword string provides our initial sample of 880 papers, as returned by Scopus. Scopus employs automatic indexing and does not provide a way to ignore the computer-generated keywords it associates with each paper⁶. There is a risk associated with automatic indexing: the potential automatic misclassification of articles (also see Section 2.3). After manually excluding papers that ended up in the initial sample by merit of the automatic indexing alone, 502 papers remained, all of which were read and classified (or manually excluded; see Figure 5 for the review protocol).

A note on keyword string database searches
Before we move on to discuss the integration framework, some important notes are warranted with regard to keyword-based database searches. Considerable effort has been expended to construct a search string of keywords, to focus the sample on the matter investigated, while trying not to omit relevant parts of the literature. However, no string is perfect, nor does everything always work as intended with database searches. Both of these factors have consequences for the consistency of the returned sample.
Irrelevant (erroneous) article inclusions need to be kept to a minimum, to reduce the size of the sample and the manual intervention that will be expended to exclude them later. Relevant article exclusions (false exclusions) should be minimized as well, so as not to miss parts of the literature. These are competing goals when compiling the search string, an iterative process of trial and error: attempting to exclude all irrelevant articles will result in missing parts of the relevant literature, while attempting to not miss a single relevant paper will result in too large a returned sample, with many irrelevant articles. A delicate balance needs to be reached through compromise.
By merit of looking for papers that concurrently deal with forecasting and inventory control, works that discuss themselves in terms other than forecasting and inventory/stock control might be erroneous exclusions. This includes top-level, methodological (often older) approaches from fields such as statistics. We attempt to right this in Section 4, where we partly depart from the confines of the sample to also include other relevant papers that might have been missed. It is important to note here that we do not attempt to compile (or claim to present) an exhaustive list of all the integrated articles ever written. We attempt to capture an objective, reproducible snapshot of the literature (so the exercise could be repeated in the future to see whether things have changed), and to explore which streams have shown promising avenues toward integration.

[Fig. 4 caption: The search string. We review articles that have forecasting, inventory or stock control, and at least one from each of the context-specific keyword sets, in their titles, abstracts or author-selected keywords.]

Sample analysis
Following the protocol described in Section 2, we read every article and decided on whether it should be included, its integration level, and further information. Here we present the results of this process. We close the section with an attempt at a synthesis of the results, which we use as a springboard for further discussion in the subsequent sections.

Integration levels classification
Out of 270 articles reviewed, five articles were found to be level 0 and 53 articles were found to be level 1. Since we are interested in the integrated literature, these articles are removed from the final sample as wrongful inclusions. The subsequent analysis is constrained to the integrated literature (212 articles) of levels 2 (118 articles) and 3 (94 articles) (see Figure 6 ).

Integration over time
The first papers in this area started appearing at the beginning of the 1970s. In Figure 7, we plot all integrated articles against time, in two-year bins. We can see there has been a sustained increase in the total number of papers published annually since then. We note a step-change in the late 1980s and early 2000s. The area really started growing in the late 1980s but has plateaued at about 10 articles per year since its peak in 2010 (which was mainly driven by a special issue in the area - see the next section). One interpretation could be that since that special issue, some authors preferred to focus on one of the two sides.
In Figure 8, we overlay areas of the relative growth of levels 2 and 3 (left) and their percentage split (right) over the last 20 years (again in two-year bins). We can see that the "mixture" of papers is changing towards more integrated approaches over time; altogether, we see a slight relative increase in the number of level 3 papers. Within integration level 2, the focus of a paper lies either in forecasting (F2, 49 papers) or inventory control (I2, 69 papers). This slight prevalence hints that most of the integrated literature's focus lies in inventory.

Journal titles
Where is this research published? In Figure 9, we plot the level of integration against the journal of publication. Understandably, different journals publish different numbers of issues each year, and therefore some journals publish many more articles than others. This of course affects these results. For example, the International Journal of Production Economics (IJPE) published 230 articles per year, almost four times more than the International Journal of Forecasting (IJF) at 63 articles per year, but almost half that of the European Journal of Operational Research (EJOR) at 414 articles per year, on average⁷. Taking this into account, and with some exceptions, the integrated literature is quite spread out.
In Figure 10, we focus on the three most represented journals in our sample. In IJPE, we can see that after the first articles appeared in 1992-1995, there is an increase in the amount of published inventory forecasting literature - in particular, after 2008 (the introduction of the inventory forecasting stream in ISIR). A peak appears in 2010 with the publication of a special issue entitled "Supply Chain Forecasting Systems", which seems to have supported further growth since. In contrast, integrated research in the IJF seems constant over time, while EJOR exhibits a modest growth which is perhaps corrected downwards over the last 10 years.

Forecasting methods
In Figure 11, left, we can see the forecasting methods and approaches of the integration literature, across levels 2 and 3. Exponential smoothing is the most popular approach to forecasting in our sample, with simple moving averages, Croston-like methods (also exponential smoothing-based) and ARIMA following. The wide adoption of exponential smoothing and simple moving average methods (see Hyndman and Athanasopoulos, 2018) can be partially explained by the fact that they are often used as benchmarks (sometimes concurrently, e.g., Eaves and Kingsman, 2004). On the right, we see the relative frequency with which each method appears in level 2 and level 3 literature.
Entire distribution-based methods (e.g., Kolassa, 2016) seem to be well suited to integrated approaches. The same applies to direct quantile estimation. Taylor (2007) proposes an exponentially weighted quantile regression as a treatment for highly volatile and skewed daily time series. Amrani and Khmelnitsky (2017) estimate quantiles by attributing weights to samples based on their chronological order, for non-stationary demand patterns. Cao and Shen (2019) find improvements in quantile estimation, when compared to Holt-Winters' forecasts coupled with a normality assumption on residuals, in two seemingly well-behaved empirical time series.
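The appeal of direct quantile estimation can be seen in a simple stochastic-approximation tracker driven by the subgradient of the pinball (quantile) loss. This generic sketch is not the exponentially weighted quantile regression of Taylor (2007); the demand stream, target quantile, and step size are all illustrative assumptions.

```python
import random

random.seed(1)

def track_quantile(stream, tau=0.9, step=0.5):
    """Online estimate of the tau-quantile of a data stream: nudge the
    estimate up by step*tau when demand is at or above it, and down by
    step*(1 - tau) when below (pinball-loss subgradient step)."""
    q = float(stream[0])
    for d in stream[1:]:
        q += step * (tau - (1.0 if d < q else 0.0))
    return q

# Uniform(0, 100) demand: the true 90th percentile is 90.
demand = [random.uniform(0, 100) for _ in range(5000)]
q90 = track_quantile(demand, tau=0.9)
print(round(q90, 1))
```

No distributional assumption enters anywhere, which is precisely why such estimators pair naturally with service-level (quantile-based) inventory targets.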
When distributional assumptions are reasonable, Bayesian methods can offer a way to incorporate unknown demand parameters in an inventory decision model (Prak and Teunter, 2019), as new information becomes available. They can be perceived as the middle ground between assumed known (and unchanged) demand distributions on the one side and data-driven, distribution-free approaches on the other, producing full predictive distributions. Yelland (2009) uses Bayesian state-space formulations to forecast low-count (intermittent) demand. Wang and Mersereau (2017) formulate a Bayesian inventory problem after change-points in demand and provide heuristics to estimate its parameters.
A number of heuristics have been employed in this literature to tackle complex integrated issues when problems become analytically intractable. We opt to present them under forecasting, as they are used to forecast quantities such as reorder points/order-up-to levels (e.g., the power approximation, Naddor's heuristic, and the normal approximation in, e.g., Sani and Kingsman, 1997; Babai et al., 2010). They include evolutionary algorithms such as ant colony optimisation (e.g., Su and Wong, 2008), the hybrid artificial bee colony-chaos algorithm (e.g., Tang et al., 2020), and approximate Bayesian estimator smoothing heuristics (e.g., Karmarkar, 1994), among others.
Machine learning (ML) approaches have gained popularity in recent years (12 out of the 19 such publications in our sample are from the last five years), bypassing the need to specify the distribution of demand by integrating parameter estimation and inventory optimisation. One of the things ML allows is the incorporation of 'features', or covariate information about the products. Oroojlooyjadid et al. (2020) apply data-driven deep learning to a newsvendor formulation with multiple features. Another data-driven approach with covariate information (e.g., price, colour, etc.) is the 'residual tree method' (an extension of the scenario tree method), a combined forecasting and optimisation algorithm to choose order quantities. Cao and Shen (2019) employ a 'double parallel feedforward network-based quantile forecasting' neural network to directly estimate quantiles for a newsvendor formulation, also for new items.
The 'other' category includes forecasts based on diffusion models (e.g., Ho et al. , 2002 ), Markov chain formulations (e.g., Cervellera and Macciò, 2011 ), failure rate calculations (e.g., Ghodrati and Kumar, 2005 ), judgement (e.g., Syntetos et al. , 2009 ), and robust optimisation approaches (e.g., Kim and Chung, 2017 ), among others. Please note that these are not strictly forecasting methods, but rather describe procedures and formulations that are used either to produce forecasts or to estimate inventory parameters. Of these, robust optimisation seems to be very close to the subject of integration. 'Robust' refers to robustness against distributional assumptions, meaning that the methodology provides distribution-agnostic interval forecasts.

Forecasting performance metrics
When it comes to measuring the performance of the forecasts, we can see that (mostly mean) squared errors and (mostly mean) absolute errors are the two most popular error metrics (Figure 12). Both of these metrics have links with inventory control in terms of the calculation of safety stocks. It is noteworthy that the vast majority of metrics employed emphasise point forecasts of the mean demand, with squared errors as proxies for the demand variance (which would in most cases define safety stocks). While there are developments in point forecast accuracy estimators (e.g., Petropoulos and Kourentzes, 2015), there is a need to develop methods that can judge the accuracy of the entire lead time demand distribution, or at least percentiles of interest (Kolassa, 2016). Such methods include the (discrete or continuous) ranked probability score and probability integral transforms (Yelland, 2009; Kolassa, 2016). The category 'other' includes relative errors (e.g., Willemain et al., 1994), tracking signals (e.g., Tiacci and Saetta, 2009), and times best (e.g., Chatfield and Hayya, 2007), among other forecasting performance measures.

Figure 11. On the left, forecasting methods employed by the literature for integration levels 2 and 3: prevalence of exponential smoothing, moving average and Croston-like methods. On the right, (normalised, relative) preference of method across the integration levels: distribution-based methods (bootstrapping, prediction intervals), Bayesian and state-space formulations sit closer to level 3. Multiple entries allowed per paper.

Inventory policy
When it comes to inventory policies, there is a clear dominance of the simple yet robust periodic order-up-to (T, S) policy (113 articles), with the continuous re-order point, order quantity (r, Q) policy following (29 articles) (see Figure 13). The (T, S) policy in this literature refers to fixed review periods (T), and therefore T does not need to be optimised. When one is only seeking to optimise the order-up-to level S, the problem reduces to the solution of a simple newsvendor problem, which essentially provides a target service level to be achieved (Eppen and Schrage, 1981). The large number of papers working with the (T, S) policy is easily explained by the simplicity of the model, both to analyse and to simulate, which mitigates some of the inherent complexity of the integrated approaches. It is also interesting to note that when optimised in all variables, the (T, S) policy performs near optimally compared with all other periodic review policies (see Lagodimos et al., 2012, for an in-depth discussion, and de Kok, 2018, who recently reached the same conclusion with a different approach). On the other hand, the (r, Q) policy is considerably more involved (see Zheng, 1992, for its basic analysis under simple stochastic demand assumptions).
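To make the newsvendor connection concrete, the following is a minimal sketch (our illustration, not a formulation from any cited paper) of setting the order-up-to level S in a (T, S) policy, under the common textbook assumption that demand over the protection interval (review period plus lead time) is normal with independent per-period forecast errors:

```python
from statistics import NormalDist

def order_up_to_level(mean_demand, sd_error, review_period, lead_time, csl):
    """Order-up-to level S for a periodic (T, S) policy, assuming i.i.d.
    normal forecast errors so that demand over the protection interval
    (review period + lead time) is normal (a simplifying assumption)."""
    horizon = review_period + lead_time
    mu = mean_demand * horizon
    sigma = sd_error * horizon ** 0.5   # errors assumed independent across periods
    z = NormalDist().inv_cdf(csl)       # safety factor for the target cycle service level
    return mu + z * sigma               # S = expected demand + safety stock

# Example: 100 units/week, error SD of 20, weekly review, 2-week lead time, 95% CSL
S = order_up_to_level(100, 20, 1, 2, 0.95)
```

At a 50% cycle service level the safety stock vanishes and S equals the expected protection-interval demand, which is the sense in which the newsvendor quantile "essentially provides a target service level".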
The prevalence of periodic order-up-to policies in the integrated field may be partially attributed to the fact that a) in reality almost every inventory system is periodic, as forecasts are also almost always periodic in nature (be it in hours, days, weeks, months), b) the move from the continuous to the periodic domain is a move towards more realistic (integrated) demand representations (e.g., Lagodimos et al. , 2018 ), and c) the robustness of the order-up-to policy.
We note the contributions from the (almost by definition integrated) control theoretic approach, the Automatic Pipeline Inventory and Order Based Production Control System (APIOBPCS; John et al., 1994; see Lin et al., 2017, for a recent review of its applications). APIOBPCS is a feedback-based block diagram framework consisting of forecasting, work in progress, inventory, and production lead time policies (controllers). Appropriate parameters are selected among the policies with the "competing objectives of 1) rapid inventory recovery and 2) attenuation of the unknown demand fluctuation […] in an effort to understand supply chain dynamics more completely" (Lin et al., 2017, pp. 136-137). Clearly, this is in line with the goals of integrated inventory forecasting.

Inventory performance metrics
Most works in our sample (see Figure 14) measure inventory performance through a (minimised) cost function (including, in most cases, inventory holding and backorder costs) or the customer service level achieved. (Maximised) profit functions (e.g., Johnston et al., 2011) and fill rate service level representations (e.g., Heath and Jackson, 1994) are less popular alternatives. The cycle (or customer) service level (service level α) is very easy to find and compute, since it corresponds to a simple probability expression, while the fill rate corresponds to a more complicated form (see Silver et al., 2017, and Schneider, 1981, for an in-depth exposition of these service level measures; and Diks et al., 1996, in a multi-echelon context). Some articles avoid the cost/profit representation by directly measuring average inventory, backorder volumes, or stockouts in terms of units (e.g., Babai et al., 2014).
Various inventory-related variance metrics (e.g., the bullwhip effect, Dejonckheere et al., 2002, or net stock amplification, e.g., Jaipuria and Mahapatra, 2014), closely associated with, but not constrained to, system dynamics or control theoretic approaches (e.g., APIOBPCS), can be considered conducive to integration, in as much as they benchmark inventory performance against demand- or forecast-based characteristics.

Further information
There are 123 papers that include theory development of some kind, while 139 papers employ simulation (see Figure 15, left). Simulation has played an integral role in revealing inefficiencies of the traditional assumptions and approaches when applied in combination with real data (see Cattani et al., 2011), and in providing arguments for integration (e.g., Eppen and Martin, 1988). At level 3, the split between these two categories is almost equal, whereas at level 2 we find a higher proportion of papers employing simulation (58%). Out of the 139 papers that employ simulation, 90 do so using empirical data (e.g., Eaves and Kingsman, 2004, who use aircraft spare part monthly time series, or parameters taken from the real world), and 54 using theoretical data (hypothesised, e.g., drawn from a normal distribution, as in Zhao and Leung, 2002). In total, 80 papers employed empirical data and 100 papers employed theoretical data (Figure 15, middle).
There is another vector of integration that takes the perspective of the entire supply chain, trying to investigate or optimise key performance indicators across it (Figure 15, right, "multiple nodes"). 43 articles in our sample (65% of which at level 2) did so, concerning themselves with more than one member (or node) of a supply chain (see de Kok et al., 2018, for a recent review of the area). Intermittent demands refer (as mentioned) to patterns where periods of positive demand are scarce, interspersed among successive periods of no demand. Our sample contains 63 papers dealing with such "slow" moving items, almost evenly split between levels 2 and 3 (Figure 15, right, "Slow"). This high (proportionally to the sample size) number of papers highlights the role intermittent demand research has played in exercising and promoting inventory forecasting integration (see Boylan and Syntetos, 2021).
Finally, three articles considered closed loop supply chains (see Goltsos et al., 2019b, for a recent review of the area). While the sample has missed a few others (e.g., Toktay et al., 2000), closed loop inventory forecasting is an interesting niche open to contributions. It has been noted that while returns forecasting more often than not considers forecasting (returns) and inventory control jointly, as the focus lies on returns, literature in the area has almost invariably assumed known demand (Goltsos et al., 2019b).

Summary and synthesis
We find that the most common combination of forecasting procedures with inventory policies is that of exponential smoothing and moving average with order-up-to policies. The mean forecasts are then coupled with variance estimations and assumed distributions of errors to compute percentiles of interest. The simplicity of these constituent formulations facilitates the discussion of the inherently more complex integrated inventory forecasting question. Often, but perhaps not often enough, these combinations form the basis of rigorous benchmarking of more complex proposed approaches.
We note that simulation has played a very important role in the development of arguments in the integrated literature (as it has in supply chain management in general, see, e.g., Fagundes et al. , 2020 ). Firstly, it has exposed the mismatch of various theoretical assumptions with the reality faced by practitioners, especially when coupled with empirical data sets coming from industry. Secondly, when properly employed, simulation can show the practical impact of proposed methodologies when compared to simpler benchmarks.
A number of promising approaches and areas emerge from our sample: quantile estimation, robust optimisation, bootstrapping, Bayesian inventory control, data-driven and machine learning approaches, and, importantly, the performance measurement of quantile and density forecasts. We expand on these streams in the following section.

Discussion
Based on the review of our sample in Section 3, we select a few promising areas for integration and attempt a more detailed discussion. Some of the papers presented below are from the sample, but, in general, papers presented here are not constrained to it (especially when it comes to historic methodological developments that, for reasons discussed in Section 2, do not appear in our sample). Some come from forward and backward 'snowball' searches, complemented by independent mini reviews of the relevant streams.

Simple but fundamental interventions
Before getting into such streams, however, it is important to note that integration and improvements can also originate from simple (yet fundamental) interventions. The seminal intervention in forecasting for intermittent demand came from Croston's (1972) appreciation of the 'issue point bias', and his proposed solution to it: forecast (smooth) demand sizes and the length of inter-demand intervals independently (and update only after a demand occurrence). Syntetos and Boylan (2005) quantified and approximately corrected a positive bias in Croston's method. Further elaboration in the area, and on the problem of obsolescence in particular, has prompted further important interventions (e.g., updating the demand occurrence probability every period in Teunter et al., 2011; see also different approaches by Prestwich et al., 2014, and Babai et al., 2019).
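The mechanics described above can be sketched in a few lines. The following is a minimal illustration of Croston's decomposition with the Syntetos-Boylan approximation (SBA) bias correction; initialisation choices vary across the literature, and the simple first-occurrence initialisation here is our own assumption:

```python
def croston_sba(demand, alpha=0.1):
    """Croston's method with the Syntetos-Boylan approximation (SBA):
    smooth demand sizes and inter-demand intervals separately, updating only
    in periods with positive demand; SBA multiplies Croston's ratio by
    (1 - alpha/2) to approximately correct its positive bias."""
    size = interval = None
    periods_since = 1
    for d in demand:
        if d > 0:
            if size is None:                    # initialise on first demand
                size, interval = d, periods_since
            else:
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    if size is None:                            # no demand observed yet
        return 0.0
    return (1 - alpha / 2) * (size / interval)  # SBA bias correction

# Example: an intermittent series with three demand occurrences
f = croston_sba([0, 0, 3, 0, 0, 0, 5, 0, 2, 0], alpha=0.2)
```

Note that the forecast is a demand rate per period, not a prediction of when the next demand will occur.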
Another example of integrated thinking providing important interventions in the area comes from the realisation that demand parameters cannot simply be substituted by forecasted ones (as is common). Prak et al. (2017) show that for mean-stationary normally distributed demand, forecast errors are auto-correlated, and thus safety stocks need to be inflated to avoid understocking. Prak and Teunter (2019) provide a Bayesian framework to incorporate this parameter estimation uncertainty, one that can be applied to any inventory model, demand distribution and parameter estimator. Prak et al. (2021) provide a closed form solution to calculate compound Poisson demand parameters, robust to misspecification of the compounding distribution. They find improvements in finite-sample bias and achieved fill rates in a continuous review order-up-to system against the method-of-moments and maximum likelihood formulations suggested in the literature.
These select examples show there is certainly tremendous scope for uniting inventory and forecasting through a better conceptual understanding of their interaction, and the identification of relevant opportunities for improvement. While more recent data-driven and machine learning approaches, covered in the following subsections and elsewhere, increasingly find applications to inventory forecasting problems, we have not yet exhausted the simple interventions that offer direct solutions but also help speed up computations. A further important benefit brought by such (often closed form) solutions to fundamental issues of inventory forecasting is their ease of communication, owing to their transparency (white-box nature), which also shortens potential innovation-adoption gaps.

Quantile estimation
The general aim of integrated inventory forecasting is to derive optimal inventory parameters without resorting to dubious distributional assumptions (an idea tracing back to at least Iyer and Schrage, 1992, for the deterministic (s, S) system). A number of papers have bypassed problematic normality (or other distributional) assumptions by directly forecasting the quantile of interest. This stream of literature was inspired by Koenker and Bassett's (1978) work on (linear) quantile regression (extended to the autoregressive case by Koenker and Xiao, 2006). Trapero et al. (2019a) employ kernel density estimation (KDE; Silverman, 1986) and (generalised) autoregressive conditional heteroskedasticity (ARCH, Engle, 1982; GARCH, Bollerslev, 1986) to forecast quantiles for directly computing safety stock levels. They find improvements in terms of inventory performance via KDE in shorter lead times, when the normality assumption is most suspect, and via GARCH in longer lead times, where conditional heteroscedasticity becomes dominant. In subsequent work, they find further improvement when combining such quantile forecasts (Trapero et al., 2019b). Taylor (2007) and Cao and Shen (2019) find significant accuracy improvements in estimating quantiles of interest, when compared to more traditional approaches (a normality assumption centred on simple exponential smoothing and Holt-Winters point forecasts, respectively), which they trace back to the unsuitability of the traditional distributional assumptions.
Of course, another approach is to forecast the entire distribution and then to extract desired quantiles of interest ( Fildes et al. , 2019 ; see, e.g., Gneiting 2011b ; Kolassa, 2016 ;Sillanpää and Liesiö, 2018 ). This move from point to probabilistic forecasts is cutting across a number of scientific disciplines (see Gneiting and Katzfuss, 2014 , for a review and further argumentation).

Robust optimisation
Robust optimisation (RO) deals with uncertain variables by only looking at intervals, without a need for further distributional information (Wei et al., 2011). In this sense, it produces results regardless of the true underlying distribution that generates the data. RO was proposed by Soyster (1973), and it has been criticised, and for years dismissed by researchers, because its robustness originates from assuming worst-case scenarios for all parameters (see arguments in Ben-Tal and Nemirovski, 2000). To alleviate the overly conservative nature of the results, Mulvey et al. (1995) introduced scenario-based RO, while subsequent research applied RO to linear programming problems with uncertainty sets (Ben-Tal and Nemirovski, 1998, 1999, 2000).

This independence from demand distribution assumptions has made the RO approach conducive to inventory control applications under demand uncertainty. Bertsimas and Thiele (2006) apply RO to an (s, S) inventory setting with backorders, and find evidence that it outperforms dynamic programming formulations. See and Sim (2010) propose a RO approach to address a (T, S) setting facing ARIMA(0,1,1) demand, and find that their formulation performs 'reasonably well' compared to optimal policies, despite using significantly less information. Thorsen and Yao (2017) were the first to also consider uncertain lead times in this setting. Bertsimas et al. (2019) provide an adaptive RO framework to address dynamic problems, providing the ability to adapt as new information becomes available. The above works all find improvements over mis-specified optimal policies (cases where the real or realised distributions differ from the assumed or sampled ones).
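The RO idea can be illustrated with the simplest possible inventory example (our own toy construction, not a formulation from the works cited above): a newsvendor who knows only that demand lies in an interval [low, high] and minimises the worst-case cost. Because the newsvendor cost is convex piecewise-linear in demand, the worst case is attained at an interval endpoint, and the min-max order quantity has a closed form that balances the two endpoint costs:

```python
def robust_newsvendor_q(low, high, h, b):
    """Order quantity minimising the worst-case newsvendor cost when demand
    is only known to lie in [low, high]: balance the overage cost at the
    lowest demand against the underage cost at the highest demand.
    Closed form: q* = (h*low + b*high) / (h + b)."""
    return (h * low + b * high) / (h + b)

def worst_case_cost(q, low, high, h, b):
    """Worst-case newsvendor cost over demands in [low, high]; attained at
    an endpoint, since the cost is convex piecewise-linear in demand."""
    return max(h * max(q - low, 0) + b * max(low - q, 0),
               h * max(q - high, 0) + b * max(high - q, 0))

# Example: demand in [80, 120], holding (overage) cost 1, backorder cost 4
q = robust_newsvendor_q(80, 120, 1, 4)
```

No probability distribution appears anywhere, which is exactly the distribution-agnostic property the text describes; the price is conservatism, since the decision is driven entirely by the interval endpoints.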

Bootstrapping
Bootstrapping is a data-driven non-parametric method of constructing empirical distributions of demand, and it works by resampling from the historical demands (Efron, 1979). It is a generalisation of the jackknife method (Quenouille, 1949, 1956; see Miller, 1974, for a review). While we cover more data-driven methods in Subsection 4.6, we dedicate a subsection to bootstrapping for its importance in inventory forecasting. Applications of bootstrapping include Clements and Taylor (2001) for autoregressive models, Snyder et al. (2002) for exponential smoothing models, and Rubin (1981) for Bayesian models (sampling from the posterior distribution rather than the observed data). Bertsimas and Sturt (2020) consider deterministic algorithms to calculate exact bootstrap quantities for the sample mean and confidence intervals.
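The basic mechanism is easy to sketch. The following is a minimal illustration (our own, assuming i.i.d. per-period demand, which ignores the autocorrelation and small-sample caveats discussed below) of bootstrapping a lead-time demand quantile for use as, e.g., a reorder point:

```python
import random

def bootstrap_ltd_quantile(history, lead_time, csl, n_boot=10000, seed=42):
    """Empirical lead-time demand quantile via the basic i.i.d. bootstrap:
    resample per-period demands with replacement, sum them over the lead
    time, and read off the target quantile from the sorted sums."""
    rng = random.Random(seed)
    sums = sorted(sum(rng.choice(history) for _ in range(lead_time))
                  for _ in range(n_boot))
    return sums[min(int(csl * n_boot), n_boot - 1)]

# Example: reorder point at a 90% cycle service level, 2-period lead time
history = [4, 0, 7, 3, 0, 5, 2, 6, 0, 4]
r = bootstrap_ltd_quantile(history, lead_time=2, csl=0.90)
```

Note how the resulting quantile can only be composed of sums of values already observed, which is precisely the limitation that jittering-type extensions (discussed below) try to relax.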
The first application of bootstrapping to inventory control can be traced back to Bookbinder and Lordahl (1989), who used it to compute the reorder level in an (s, Q) system against a cycle service level criterion. Fricker and Goodhart (2000) apply bootstrapping to the calculation of re-order points for an (s, S) system against a fill rate and other service criteria. Bootstrapping has found relatively wide application in intermittent demand contexts, where the scarcity of positive demand occurrences makes parametric approaches harder to implement (e.g., Willemain et al., 2004; Viswanathan and Zhou, 2008; van Wingerden et al., 2014; Hasni et al., 2019b; see Hasni et al., 2019a, for a recent review of bootstrapping in intermittent demand contexts).
While bootstrapping offers a solution to poor distributional fits, it does not solve the issue of uncertainties arising from small samples, nor related overfitting issues. If only a few observations are available, then bootstrapping draws from only these observations, and does not anticipate possible future values that are different. Having said this, exceptions such as the jittering process of Willemain et al. (2004) allow for demand sizes not previously observed.

Bayesian analysis
Bayesian analysis provides a formal way to incorporate prior information into the demand forecasting process, at times even before data become available. The forecaster can use domain knowledge to select a prior distribution, which is updated through Bayes' theorem into the posterior distribution as more data become available. Applications of Bayesian theory to inventory control were pioneered by Dvoretzky et al. (1952), Scarf (1959), and Azoury (1985).
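A minimal sketch of the updating mechanism (our own illustration, not a model from the works cited): with Poisson per-period demand and a Gamma prior on the rate, the update is conjugate, so the posterior stays in the Gamma family and can be refreshed in closed form each period:

```python
def gamma_poisson_update(alpha, beta, demands):
    """Conjugate Bayesian update for a Poisson demand rate with a
    Gamma(alpha, beta) prior (shape/rate parameterisation): the posterior
    is Gamma(alpha + sum(demands), beta + number of periods observed)."""
    return alpha + sum(demands), beta + len(demands)

# Prior belief: mean demand rate of 2 per period (alpha=2, beta=1);
# then five periods of demand are observed
a, b = gamma_poisson_update(2, 1, [3, 1, 4, 0, 4])
posterior_mean = a / b   # shrinks the sample mean towards the prior mean
```

The posterior mean lies between the prior mean (2) and the sample mean (2.4), illustrating how prior knowledge is gradually dominated by data as observations accumulate.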
This approach can be seen as a sound way to utilise classic inventory theory in a nimbler manner, updating the parameters of the distribution as more data become available. The dependence on distributional assumptions also makes the Bayesian approach resilient to the overfitting issues generally found in bootstrapping and machine learning approaches. A problem with this approach is that realistic distributions with more than one parameter often lead to intractable solutions, and are often treated with unrealistic assumptions (e.g., a normal distribution with known variance, in Azoury and Miyaoka, 2009, and Chen, 2010). This assumption is problematic in inventory control, as the variance dictates the size of the safety stocks and is therefore very important (Prak and Teunter, 2019).
Bayesian theory is very useful in machine learning approaches, and widely used in a number of neural network (and other) applications. For example, Boutselis and McNaught (2019) employ a Bayesian network machine learning approach in a service logistics context. Prak and Teunter (2019) use Bayesian theory to discuss how demand uncertainty should be taken into account in inventory control (when inventory formulations use estimated, rather than known, parameters of demand). Toktay et al. (2000) and Clottey et al. (2012) use Bayesian updating to incorporate new information on the returns of used products in distributed lag model formulations, as products are returned. In an intermittent demand context, Babai et al. (2021a) propose a compound Poisson approach, while Ruiz et al. (2021) employ Bayesian degradation modelling, both for spare parts inventory management.

Data driven approaches and machine learning
Beutel and Minner (2012) take a data-driven approach that sets the inventory level as a dependent variable in a linear regression using various explanatory variables of demand (covariates or features). This approach has been adopted by a number of researchers who all report savings over classical inventory control methods (see, e.g., Huang and van Mieghem, 2014 ;Shi et al. , 2016 ;Huber et al. , 2019 ).
Van Steenbergen and Mes (2020) use machine learning to forecast the demand of new products (within 18 weeks of introduction) utilising product characteristics of old comparable products. They find improvements in terms of forecast accuracy and inventory costs against a benchmark consisting of an empirical distribution drawn straight from the test data, and another based on the total demand of the most similar product, multiplied by some fixed coefficient of variation. Huber et al. (2019) use machine learning setups based on linear regression, artificial neural networks (see Hornik, 1991) and gradient-boosted decision trees (see Friedman, 2001 ) to predict daily demand of a German bakery chain (newsvendor problem). They compare forecast accuracy and cost against a well selected number of benchmarks (including various exponential smoothing methods -ES) and find that the machine learning methods outperform in both accuracy and cost, but only when trained on the entire dataset (ES perform best when the methods are trained on one time series at a time). Ban and Rudin (2019) apply machine learning algorithms based on the empirical risk minimisation (ERM) principle and kernel optimisation to a single time series of emergency room nurse staffing levels (approximated as a newsvendor problem). They calculate means and 95% confidence intervals, and benchmark against a number of techniques and report improvements against the 'best practice' (naïve seasonal forecast: average demand per day of week). They demonstrate how to carry out a careful data-driven investigation, conclude that there is no single approach to solving the 'big data' newsvendor problem (newsvendor formulation which is solved with help of explanatory variables), and finally warn about the dangers of overfitting.
A main concern for these methods is that they are a 'black box', which makes it difficult to justify the resulting predictions. When improvements in forecasting accuracy can be convincingly showcased against robust, rigorous benchmarks, this is less of an issue (e.g., Huber et al., 2019). However, such rigorous testing is perhaps not as common as it should be, with machine learning methods often tested in violation of best practice in forecasting accuracy testing (see Tashman, 2000, for a review of established guidelines). Spiliotis et al. (2020) and Ma and Fildes (2020) offer examples of rigorous testing of new data-driven approaches and show promising results in terms of forecasting accuracy. It would be interesting to see how these improvements translate into inventory savings. Beyond the correct selection of methods to benchmark against, we also see applications on too few SKUs (e.g., Cao and Shen, 2019; Ban and Rudin, 2019). One reason could be the computational intensity of these methods, although technological advances are making this argument increasingly fragile. Babai et al. (2020) compare the neural network of Gutierrez (2008) and proposed iterations against bootstrapping, simple exponential smoothing and Croston variants, on a dataset of 5135 intermittent demand time series. They find that simple exponential smoothing outperforms all other methods both in terms of forecast accuracy and inventory performance.
It has also been noted that ML methods do not readily provide predictive densities. There have, however, been some adaptations to provide probabilistic forecasts. Wen et al. (2017) and Gasthaus et al. (2019) propose using monotonic regression splines (see Wegman and Wright, 1983) optimised by a neural network with a continuous ranked probability score (see Section 4.7) objective. Salinas et al. (2020) propose an autoregressive recurrent neural network model for producing probabilistic forecasts. Van Steenbergen and Mes (2020) combine k-means clustering, random forests (see Breiman, 2001) and quantile regression forests in a machine learning algorithm to compute quantiles and prediction intervals.

Forecast evaluation
The importance of forecast evaluation cannot be stressed enough, especially when used to select forecasts that then sequentially inform inventory decisions. Point forecasts are judged on a number of forecasting accuracy metrics (see Section 3.2.2), and the lack of consensus on what these metrics should be is well discussed in the literature (see, e.g., Makridakis et al., 2020). A complicating factor is that different point forecasts are optimal depending on the accuracy metric under consideration (Gneiting, 2011a; Kolassa, 2019). For example, absolute errors are consistent for median point forecasts, while squared errors are consistent for the mean (Gneiting and Katzfuss, 2014).
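This consistency property is easy to demonstrate numerically. The brute-force search below (our own toy illustration) finds the constant forecast minimising each loss over a deliberately skewed sample, recovering the mean for squared error and the median for absolute error:

```python
def best_constant_forecast(data, loss):
    """Brute-force the constant forecast (on a 0.01 grid) that minimises
    the total of a given per-observation loss over the sample."""
    candidates = [x / 100 for x in range(0, 2001)]   # grid over [0, 20]
    return min(candidates, key=lambda f: sum(loss(f, d) for d in data))

data = [1, 1, 2, 2, 3, 11]           # skewed sample: mean 3.33..., median 2
f_mse = best_constant_forecast(data, lambda f, d: (f - d) ** 2)   # -> near the mean
f_mae = best_constant_forecast(data, lambda f, d: abs(f - d))     # -> the median
```

The gap between the two optimal forecasts (about 3.33 versus 2) shows why the choice of accuracy metric is not innocuous when the forecast subsequently feeds an inventory calculation.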
Feeding forecasts into reasonable inventory models is a good way to gain extra confidence in the adequacy of the inventory forecasting assumptions (e.g., normality of residuals). In effect, what is tested then is the veracity of the distributional assumption on the errors and the accuracy of the quantile extrapolation (e.g., from the point forecast of mean demand to the quantile of interest, often prescribed by a cycle service level). Perhaps even better would be to optimise the forecasts directly on the end goal, as measured through the relevant inventory policy in question (Tratar, 2010; Kourentzes et al., 2020).
A more straightforward way to go about this would be to directly judge the forecasting accuracy of the percentile of interest (Gneiting, 2011b). For example, asymmetric piecewise linear loss functions (also known as the pinball, linlin, hinge, tick, or newsvendor loss) can be used to compare the relative accuracy of two quantile forecasting models (Koenker, 2005; Gneiting, 2011b). Very often, inventory costs are compared at a number of target cycle service levels, e.g., 90%, 95%, 99%. Such measures can be used to directly test losses at those quantiles of interest.
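The pinball loss itself is a one-line formula; the sketch below (our own illustration, with made-up numbers) compares two hypothetical 95th-percentile forecasters and shows the asymmetry at work:

```python
def pinball_loss(quantile_forecasts, actuals, tau):
    """Average asymmetric piecewise-linear (pinball) loss at quantile level
    tau: under-forecasts are penalised by tau, over-forecasts by (1 - tau),
    so the loss is minimised in expectation by the true tau-quantile."""
    total = 0.0
    for q, y in zip(quantile_forecasts, actuals):
        total += tau * (y - q) if y >= q else (1 - tau) * (q - y)
    return total / len(actuals)

# Compare two hypothetical 95th-percentile forecasters on the same actuals
actuals = [100, 120, 90, 110]
model_a = [130, 150, 120, 140]   # consistently 30 units above demand
model_b = [95, 115, 85, 105]     # consistently 5 units under demand
loss_a = pinball_loss(model_a, actuals, tau=0.95)
loss_b = pinball_loss(model_b, actuals, tau=0.95)
```

At tau = 0.95, model A's large overshoots are penalised far more lightly than model B's small undershoots (loss 1.5 versus 4.75), mirroring the inventory intuition that, at high target service levels, stocking out is costlier than carrying extra stock.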
A growing number of researchers argue that the focus should move from point forecasts (be it the mean or some quantile) to full predictive densities (e.g., Gneiting, 2011b; Snyder et al., 2012; Kolassa, 2016; Fildes et al., 2019; de Kok, 2019). Equally important, then, is to be able to judge the accuracy of these forecasted distributions. But what does accuracy mean in this probabilistic forecasting context? It has been defined as a combination of calibration (statistical consistency between distributional forecasts and observations) and sharpness (concentration of the predictive distributions). Recently, (proper) scoring rules have been introduced that can rank predictive distributions on both these traits (see Gneiting and Katzfuss, 2014).
We mention a few scoring rules here for the interested reader, without going into much detail. One such measure is the Continuous Ranked Probability Score (CRPS) (Brown, 1974; Matheson and Winkler, 1976), or its discrete equivalent (DRPS) (Epstein, 1969; Murphy, 1971; Snyder et al., 2012), with the intuitive definition of a pinball loss integrated across all quantile levels (Gasthaus et al., 2019). Another is the Brier or quadratic score (Brier, 1950). As Boylan and Syntetos (2006) point out, however, in most applications of inventory control we are more interested in particular parts of the predictive distribution (e.g., at 90%+ to correspond with the most common target cycle service levels). The scoring rules offer little guidance as to how competing forecasting methods might fare in those particular percentiles (e.g., the overall winning distribution might be overperforming in parts of the distribution we are not interested in and losing in the parts that we are).
The Probability Integral Transform for continuous density forecasts (PIT) (Rosenblatt, 1952) and the randomised PIT for discrete predictive densities (rPIT) (see, e.g., Kolassa, 2016, and references therein) are another standard way to evaluate distributions. By creating a histogram of PIT values and checking it for uniformity (through goodness-of-fit tests, see, e.g., Inglot and Ledwina, 2006), we can see in which percentiles the predictive distributions differ. In other words, uniform PITs indicate a well-calibrated forecast; however, they do not inform us of its sharpness. To overcome the deficiencies of proper scoring rules and PIT, Kolassa (2016) proposes that these measures be used in conjunction. Another option is to incorporate the quantile-weighted CRPS in alignment with the target cycle service levels of interest (Gneiting and Ranjan, 2011).
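The PIT computation itself is simple, which partly explains its appeal. The sketch below (our own toy example with uniform predictive distributions, purely for illustration) evaluates each forecast CDF at the realised value and bins the results:

```python
def pit_values(cdfs, actuals):
    """Probability integral transforms: evaluate each predictive CDF at the
    realised value; for well-calibrated forecasts these look Uniform(0, 1)."""
    return [cdf(y) for cdf, y in zip(cdfs, actuals)]

def pit_histogram(pits, bins=4):
    """Bin PIT values to inspect calibration: strong peaks or U-shapes
    signal under- or over-dispersed predictive distributions."""
    counts = [0] * bins
    for p in pits:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

# A well-calibrated toy case: every forecast is a Uniform(0, 10) CDF and the
# realised values happen to be evenly spread over the support
cdfs = [lambda y: min(max(y / 10, 0.0), 1.0)] * 8
actuals = [0.5, 1.5, 3.0, 4.0, 5.5, 6.5, 8.0, 9.0]
hist = pit_histogram(pit_values(cdfs, actuals))   # flat histogram
```

A flat histogram like this one indicates calibration only; as noted above, it says nothing about sharpness, which is why PIT is best paired with a proper scoring rule.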

Conclusion
We have reviewed the literature of integrated inventory control and forecasting. To do so, we consulted experts in both fields to inform our database keyword search as well as the integration framework. The framework defined four levels (0 to 3; see Appendix B). The first two levels describe non-integrated approaches and the last two, which are the focus of our review, integrated ones. We find that the integrated literature began to form a distinct stream in the early 1970s. This is logical, as integration cannot happen in a vacuum and must be preceded by an in-depth analysis of its constituents (integration levels 0 and 1).
Since then, the stream has grown, though it seems to have plateaued at about 10 articles per year over the last decade. This may be interpreted as a shift of authors' focus back to the individual streams, which underlines the importance of this work (as well as others' calls for integration). The analysis of the integrated (levels 2 and 3) sample revealed promising research streams, which we followed up with further exploration of the individual streams (including but not constrained to the papers found in our sample). It is worth noting the role of research on slow/intermittent demand, which has historically approached forecasting and inventory control jointly. The same is also true for research on circular (closed) loops and returns forecasting, which, while mostly employing integrated approaches, is still an area wide open to contributions.
Historically, forecasters were interested in the performance of (mostly point) forecasts (of mean demand), measured through accuracy metrics serving as proxies for forecasting utility. Inventory controllers, on the other hand, considered forecasts (when they considered them at all) as an exogenous variable beyond their control or interest: a readily available, to-specification input to the inventory control process. The underlying notion is that an expert forecaster would create a 'perfect' forecast (one that accurately describes the true demand distribution as needed), which would then be serially picked up by an expert stockist and transformed into the best possible inventory quantities of interest. Where these forecasts are good (in terms of both accuracy and conformity to inventory control assumptions), literature at levels 0 and 1 (e.g., EOQ formulations) provides optimal inventory decisions.
From the inventory perspective, which for the literature under consideration is in effect the end goal of inventory forecasting, three historical trends have emerged. First was the assumption that every observation is independent and identically distributed, drawn from a real underlying distribution. Using these observations, an estimation of the latter is made, and substituted in its place (and treated as if it was the real distribution). The realisation that demands are more often than not time-correlated led to an adaptation of this process, and to the emergence of a second trend.
Researchers would typically train point forecasts of the mean, assume some distribution of residuals (most often Gaussian with a mean of zero), calculate a variance metric (often the mean squared error), and follow tables or algorithms to reach the quantile prescribed by the target service level. We have seen that this approach is not ideal as a) we tend to judge forecast accuracy for the mean or for quantiles irrelevant to our end goals, and b) the very convenient normality (or other) distributional assumption on the errors is often violated (Koenker and Bassett, 1978).
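The recipe just described can be sketched in a few lines; this is a minimal illustration (the function name and default parameters are ours), assuming simple exponential smoothing for the mean, an exponentially smoothed squared error as the variance proxy, and a Normal quantile at the target cycle service level:

```python
import math
from statistics import NormalDist

def order_up_to_level(demand, alpha=0.2, csl=0.95):
    """Point forecast of the mean (SES), smoothed MSE as a variance proxy,
    then a Normal quantile at the target cycle service level (csl)."""
    level = demand[0]
    mse = 0.0
    for d in demand[1:]:
        err = d - level
        mse = alpha * err ** 2 + (1 - alpha) * mse   # exponentially smoothed MSE
        level += alpha * err                          # simple exponential smoothing
    z = NormalDist().inv_cdf(csl)                     # e.g. ~1.645 for a 95% CSL
    return level + z * math.sqrt(mse)
```

The two weaknesses noted above are visible here: accuracy is effectively judged at the mean, and the Normal assumption enters only in the very last step.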
The third stream then relaxes the assumption of normality, or indeed any distributional or parametric assumption, to move directly from the observations to the quantiles of interest. We outlined examples of this stream in Section 3 and further elaborated on them in Section 4. Bootstrapping, robust optimisation, and density forecasts or quantile estimation are examples of streams attempting just that. Heuristic and, increasingly, machine learning approaches are called upon to provide solutions to these complex problems. Data-driven solution approaches have taken advantage of recent developments in computing power and big data to help with solving them.
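As a concrete example of the non-parametric route, the sketch below (names and parameters ours, for illustration) bootstraps the lead time demand distribution directly from the demand history and reads off the empirical quantile at the target cycle service level, with no distributional assumption:

```python
import random

def bootstrap_lead_time_quantile(demand_history, lead_time,
                                 csl=0.95, n_boot=10000, seed=0):
    """Resample lead_time periods (with replacement) from the history,
    sum them into lead time demands, and take the empirical csl-quantile."""
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.choice(demand_history) for _ in range(lead_time))
        for _ in range(n_boot)
    )
    idx = min(int(csl * n_boot), n_boot - 1)
    return totals[idx]
```

Note that this simple version resamples i.i.d.; variants in the literature (e.g., Willemain-style bootstraps) add structure to respect autocorrelation or intermittence.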
A parallel argument that arises from these historic paradigm shifts, or perhaps can be used to explain them, is that of the speed of change. The assumption that demand characteristics remain stable for a long time was perhaps reasonable up to the end of the 20th century, but is maybe not so anymore (Bowersox, 2007). For one, product life cycles have since shortened, which, other than directly violating the above assumption, also leads to a great reduction in the length of available data (Basallo-Triana et al., 2017; Baardman et al., 2018; Van Steenbergen and Mes, 2020).
While bootstrapping and machine learning approaches do not rely on distributional assumptions, problems arise, especially when few demand observations are available. Overfitting and an inability to anticipate possible future values not encountered in the sample are the main problems. Bayesian approaches, on the other hand, do not face these problems; they do, however, depend on distributional assumptions.
It is worth noting that the vast majority of research is dedicated to linear supply chains: from resource extraction to serving customer demand. Increasingly however, governments, customers and companies are interested in how to retain resources in circulation (the circular economy) as much as possible before disposal ( Goltsos et al. , 2019a ). We note that demand inventory forecasting should be in this sense expanded to demand and returns inventory forecasting, an area which is open to contributions.

The role of integrated inventory forecasting
The need for integration has been to an extent substantiated by the combination of levels 0 and 1 research (in level 2 papers, often through exploration of the validity of relevant theoretical assumptions by simulation on real data). However, what this also implicitly highlights is that integration is not required when the distributional assumptions are robust. Integration should be treated as an injection of realism, to treat (or investigate potentially) violated assumptions, but not to complicate for the sake of complication. So, the literature of levels 0 and 1 provides a mosaic from which to draw solutions when the relevant assumptions (more or less) hold, but also foundations and inspiration to enable and educate the naturally more complex integrated approaches.
Literature at level 2 then seems to identify cases where sequential application of non-integrated approaches underperforms. Simulation has played a prominent role in this domain, in particular when combined with industry data, to expose dubious assumptions that lead to performance losses. Literature at level 3 delves deeper and explores the reasons behind the inaccuracies, employing integration as a remedy. Often, non-integrated literature provides bounds for its integrated counterparts. For example, the classical EOQ cost function provides upper cost bounds for the more realistic, optimal discrete EOQ, with Lagodimos et al. (2018, p. 119) advising "extreme caution when transferring results between the continuous and discrete-time frameworks", or, in other words, when in violation of the continuous-time assumption. The above discussion is graphically summarised in Figure 16.

The trade-off between complexity and efficiency gains
As the scope widens to accommodate this integration, the analytical and computational burdens increase. Demand assumptions are relaxed, forecasts need to be produced, and inventory policies need to accommodate the fact that demand is forecasted, increasing the complexity of mathematical or algorithmic models or simulations. For these reasons, machine learning techniques, heuristics and simulation are often relied upon to explore the intertwined inventory forecasting problems (and solutions).
We have shown many examples where integration has provided better results than non-integrated approaches (see Sections 3 and 4). However, there exists a trade-off between increasing complexity and the potential gains from pursuing it. An important aspect of complexity in research is how it affects adoption in practice. Niemi et al. (2009), investigating the innovation-adoption gap with regard to inventory management techniques, note that "despite all the theory available, the inventory management techniques in use in companies are often very elementary". The logical question that arises is: when is it worthwhile to address these extended complexities? It follows that integration is not to be pursued for the sake of integration; in other words, a case needs to be made in terms of efficiency gains over effort expended.
What is perhaps missing is an evaluation of this trade-off between the benefits of integration and the realism of the assumptions, as well as the severity and likelihood of finding oneself in violation. Quite a few papers have attempted to quantify this by benchmarking the inventory performance of integrated approaches against more traditional ones (e.g., Taylor, 2007, for the order-up-to level calculation). This should perhaps become the standard, in line with the widely adhered-to practice of benchmarking against (also) simple forecasting procedures (such as simple exponential smoothing) when comparing forecasting methods (e.g., Eaves and Kingsman, 2004). (Relatively) simple integration can reveal the scope for improvement (relative to non-integrated approaches) and point to areas where deeper exploration would be merited.
What is a simple way to integrate? Integration is a spectrum, and often the distinction between levels 2 and 3 is not clear. We can, however, attempt to define where it starts. On the lower end (of level 2), the minimum requirement would be a sequential application of forecasting and inventory control. Demand history is analysed, inventory quantities of interest are forecasted, and these forecasts inform an inventory policy. So, from a forecasting perspective, forecasts need to be subject to an analysis of their 'utility'. From an inventory control perspective, this means that modelling should not overly rely on convenient assumptions of known demand but expose itself to more realistic assumptions of demand and forecasts.

Forecasting for inventory control
What to forecast (and how to assess it) remains an open, actively researched question. An unbiased forecast of, e.g., the median (the point that minimises absolute errors) describes the level of inventory that would satisfy demand (from stock on hand) 50% of the time. The question is: why are we forecasting (and comparing performance against) the median (or mean), if target service levels are rarely 50% (i.e., the newsvendor setting where inventory holding and backorder costs are equal)? That is to say, point (mean/median) forecasts are not enough for inventory control purposes, however accurate they might be.
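The point is made concrete by the classic newsvendor critical fractile: the cost-optimal order quantity is the tau = b/(b + h) quantile of the demand distribution, which coincides with the median only when holding and backorder costs are equal. A minimal sketch, assuming Normal demand (the function name is ours):

```python
from statistics import NormalDist

def newsvendor_quantity(mu, sigma, holding_cost, backorder_cost):
    """Cost-optimal order quantity under Normal demand: the critical-fractile
    quantile tau = b / (b + h), equal to the median only when h == b."""
    tau = backorder_cost / (backorder_cost + holding_cost)
    return NormalDist(mu, sigma).inv_cdf(tau)
```

With equal costs the answer is the median (here, the mean); with backorders nine times as costly as holding, the target moves to the 90th percentile, well above any mean or median forecast.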
The utility of the forecasts is in effect a proxy evaluation of how good the inventory forecasting system under consideration is at predicting a quantile of interest (one that directly or indirectly corresponds to a service level; Gardner, 1990). At a minimum, there should also be a forecast for the variance of the forecast errors, perhaps through some mean squared error smoothing procedure (Brown, 1982; Bretschneider, 1986; see Babai et al., 2021b, for a comparison of approaches). But still, a (point) forecast of mean demand and variance also requires a distributional assumption to compute safety stocks (i.e., quantiles of interest), and at times relying on the first two moments alone is not enough (Lagodimos et al., 1995).

Fig. 16. Integration serves to identify and treat cases where the assumptions are dubious. Integration comes at a cost of complexity, but also with potential for improvements.
It seems that one of the most promising approaches, from an integrated inventory forecasting perspective at least, is to forecast the entire lead time demand distribution, bypassing the need to rely on at times dubious distributional assumptions (Taylor, 2007; Snyder et al., 2012; Barrow and Kourentzes, 2016; Kolassa, 2016; Amrani and Khmelnitsky, 2017), and then to extract the relevant quantiles, an area in need of further exploration. Along the same lines, point forecast accuracy research is progressing (e.g., Petropoulos and Kourentzes, 2015), but the need for accuracy measures for the entire distribution is pressing (Kolassa, 2016).
Bayesian methodologies offer a natural connection to classic inventory theory while using a nimbler approach that updates the parameters of the distribution as more data become available. They readily provide probability densities rather than point forecasts; however, the inventory forecaster still needs to assume a distribution (and the selection can complicate the parametrisation efforts). Robust optimisation and bootstrapping approaches are interesting ways to avoid problematic distributional assumptions, especially when avoiding assumptions of i.i.d. demand. Machine learning approaches show great promise with their ability to calculate inventory parameters of interest directly from the data, while incorporating covariate information (features) beyond the time series. More effort should be expended on dealing with issues of overfitting and on employing proper benchmarking.
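As an illustration of this updating mechanism, the sketch below shows the textbook conjugate Normal update of an unknown mean demand with known observation variance (a deliberately simple case; the names are ours). Each observation pulls the posterior mean towards the data and shrinks the posterior variance:

```python
def bayes_update_normal_mean(prior_mu, prior_var, obs, obs_var):
    """Conjugate Normal update of an unknown demand mean with known
    observation variance: posterior precision = sum of precisions,
    posterior mean = precision-weighted average of prior and observation."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mu = post_var * (prior_mu / prior_var + obs / obs_var)
    return post_mu, post_var
```

The resulting posterior predictive distribution can feed directly into a stock control model, which is the natural connection to classic inventory theory mentioned above; the price paid is the Normal assumption itself.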
As we move towards more integrated research on inventory forecasting, the good practice of forecast evaluation should be expanded to include rigorous benchmarking of the entire inventory operation. Such a benchmark should include a relevant inventory policy, with a carefully selected number of appropriate forecasting methods and distributional assumptions. Any parameter (including, e.g., weights for the exponential smoothing family) should be optimised on the bottom-line inventory performance metric (most often, some cost equation alongside a service level). Cross-validation and testing across a number of SKUs are well established aspects of forecasting benchmarking that should be retained here too.
Finally, and perhaps most importantly, this integrated thinking should instigate fundamental, simple solutions to the joint question of inventory forecasting. Beyond any future developments in the streams discussed above, there is certainly tremendous scope in uniting inventory and forecasting through a better conceptual understanding of their interaction. Such theoretical, often closed-form formulations provide solutions that can also assist heuristic, machine learning and other applications by providing solid foundations and freeing up processing power.

Appendix A. Search string
Our keyword sets were informed in three ways: i) initial keyword selection and search, ii) (key)word analysis on the resulting paper (pre-)sample, and iii) expert consultation. To initiate the survey of the literature, we created an initial keyword set that returned a pre-sample of the literature (1500 articles). We then went through all titles of the pre-sample papers, and when needed abstracts, excluding irrelevant ones, ending up with 60 articles. Forward and backward searches revealed a further 150 relevant articles, bringing the pre-sample to 210 articles. We performed a content analysis on this pre-sample and produced lists of forecasting- and inventory control-related keywords ranked by instances of appearance in titles, abstracts or keywords. At the same time, we contacted leading academics in the broader fields of forecasting and inventory control, asking them to produce five keywords for each group. The results of both exercises served to inform the final keyword selection.
The selected keywords were thematically split between forecasting and inventory control, and then further into "area-defining" and "context-specific" sets. We put broad keywords that tend to create many hits (of a broad scope) in the former, and the rest, which form an inclusive (albeit not exhaustive) attempt to capture different areas and niches of the relevant literature, in the latter. The 'context specific' keyword sets attempt to exclude irrelevant (to our paper) literature that may share some of the keywords (e.g., weather or energy forecasting, or financial/stock market research).
For a paper to be included in the sample, it had to contain at least one word from each group of keywords in its title, abstract or author-selected keywords (see Figure 4). We focused on academic papers and restricted ourselves to English manuscripts to ensure readability across the authors. Finally, papers from academic fields irrelevant to our search (e.g., medicine, meteorology, etc., as these were defined by Scopus) were excluded to help focus the sample.

Appendix B. Levels of integration
Any papers irrelevant to our focus that we find in our final sample are considered erroneous inclusions. In a sense, this also includes literature of the non-integrated levels 0 and 1 (that may end up in the final sample), as by merit of the search string we are looking for papers dealing with inventory and forecasting concurrently (and therefore literature of levels 0 and 1 should be largely excluded). While our focus is on the integrated levels, with examples mostly taken from our sample, levels 0 and 1 are included in the framework for completeness, with examples mostly taken from seminal books and reviews of the field. Any integrated literature (levels 2 and 3) that inadvertently ends up outside the sample is considered an erroneous exclusion (Fig. 17, Fig. 21).

B1. Integration level 0
This level consists of what may be termed the 'traditional' literature on forecasting and inventory control, respectively. It can be seen as a natural first step whereby the research relies on convenient assumptions to solve the simplest possible models (where the research question still exists).
Inventory level 0 models assume that all information on demand (its generating process, DGP) that can be obtained at the point of decision making is given, and they do not concern themselves with how that information is obtained (I0, Figure 18). There is no mention of any forecasting or estimation of parameters taking place, not even as a recognition that there is a need to do so. Demand is assumed to be known deterministically (e.g., a fixed demand rate per unit time, as assumed in the basic EOQ formulation in Harris, 1913) or stochastically (e.g., a normal distribution with known mean and variance, as in other formulations in Lagodimos, 1992). Examples of these works can be found in all standard books in operations and supply chain management (e.g., Hadley and Whitin, 1963; Zipkin, 2000; Muckstadt and Sapra, 2010).
Along the same lines, the traditional forecasting literature, where methods' development and/or testing solely emphasises forecast accuracy, has no mention of any subsequent use of these forecasts. That is to say, in F0 ( Figure 19 ) forecasting is treated as being an end in itself (e.g., Makridakis et al. , 1998 ), not concerning itself with the utility of the forecasts.
Please note that what we represent here as "forecasting accuracy", "service level" and "inventory cost" can take many forms. Forecast accuracy can mean a variety of things, such as point forecast error measurements, e.g., mean squared or absolute errors. Service level might refer to cycle service level or fill rate. Inventory (related) cost can refer to holding costs, ordering costs, backorder costs or lost sales costs, among others. In any case, what we intend to highlight here is that forecasting and/or inventory performance is somehow evaluated. We discuss in Section 3 examples of the measures we encountered in our sample for both forecasting and inventory control.

B2. Integration level 1
This level describes literature very similar to that included in level 0, with one important point of departure: the recognition that demand should actually be forecasted, or that the forecasts are to be used for the end goal of controlling inventories. While these are important qualifications to position the research in the broader context of inventory forecasting, the literature here has otherwise similar modelling decisions and assumptions as it does in level 0.
Research of Inventory level 1 (I1, Figure 20) mentions this need, but forecasting is not employed per se. It is discussed as a separate entity (see Silver et al., 1998, 2017) and with no integration of forecasts (and their errors) in the inventory policies (see Waters, 2008).
Similarly, research in F1 (Figure 23) does not consider forecasting utility metrics or bottom-line inventory implications (i.e., the inventory implications of forecast accuracy). That is to say, it does not pursue the investigation of how forecasts affect the ultimate goal of controlling inventories, not going much further than mentioning potential inventory implications (e.g., Hyndman and Athanasopoulos, 2018). This is also reflected in seminal reviews of time series forecasting, e.g., see Gardner (1985, 2006) (with a focus on exponential smoothing) and De Gooijer and Hyndman (2006).

B3. Integration level 2
At integration level 2, demand is assumed stochastic and unknown, and is forecasted. An inventory policy is turning the forecasts into inventory quantities of interest. The applications at this level are serial in nature, in the sense that forecasting methods and inventory policies are selected (and/or optimised) in isolation, and their interrelation is not closely examined (e.g., Willemain et al. , 1994 ;Eaves and Kingsman, 2004 ).
Level I2 (Figure 22) describes articles where a forecasting procedure is selected and applied as an input to an inventory control model, which is where the focus of the paper in question lies. Syntetos and Boylan (2006), for example, investigate the inventory performance (holding cost and fill rate, the percentage of demand served directly from stock on hand) by simulating a periodic (review period T) order-up-to (level S), (T, S) policy, with various forecasting methods as input. Another example is Watson (1987), who incorporated periodic exponential smoothing and moving average forecasts in an EOQ-type model to explore the effect of forecast errors on achieving service levels.
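A deliberately stripped-down version of such a simulation is sketched below (the simplifications are ours: review period T = 1, zero lead time, and unmet demand simply not served in that period). A stream of forecast-driven order-up-to levels is evaluated on fill rate and mean leftover stock:

```python
def simulate_order_up_to(demand, forecast_S):
    """Minimal periodic-review (T = 1) order-up-to simulation with zero lead
    time: each period stock is raised to S, then demand draws it down.
    Returns fill rate (fraction of demand met from stock) and mean leftover."""
    served = total = 0
    leftovers = []
    for d, S in zip(demand, forecast_S):
        on_hand = S                    # replenish up to the forecast-driven level
        served += min(d, on_hand)      # demand met from stock on hand
        total += d
        leftovers.append(max(on_hand - d, 0))
    fill_rate = served / total
    return fill_rate, sum(leftovers) / len(leftovers)
```

Swapping different forecasting methods into `forecast_S` while holding the policy fixed is exactly the kind of serial, level-2 comparison described above.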
Similarly, when the scope and focus of the paper is on forecasting, the paper is assigned level F2 (Figure 23). Some inventory policy is employed, and there is some evaluation of the forecast's 'utility' in terms of inventory implications (most often cost and service level). Gardner (1990) is a very good example, employing efficiency curves between different forecasting procedures to evaluate their inventory performance in a reorder point (r), order quantity (Q), (r, Q) inventory policy.

Fig. 20. Inventory level 1 - Same as I0 but with a discussion of a need to "somehow" forecast demand.
Fig. 21. Forecasting Level 1 - Same as F0 but with a discussion of inventory and potential implications.

B4. Integration level 3
Integration level 3 is reserved for papers that approach the problem of inventory forecasting jointly. That is to say, they take an integrated approach whereby some contribution is achieved by looking at the entire picture. In that sense, forecasting modelling decisions are influenced by the inventory modelling decisions (and vice versa), and most often a joint metric is pursued. A holistic understanding of the specific (and joint) nature of the inventory forecasting problem is both required and furthered. This then constitutes a development towards inventory forecasting as a joint entity. As such, the framework converges at level IF3, to describe integrated inventory forecasting research. Literature here goes beyond serial application and is more often than not concerned with the interrelations of forecasting and inventory control in specific demand contexts in violation of common (convenient) assumptions. At a minimum, modelling decisions relating to both forecasting and inventory control are judged on inventory performance metrics (e.g., optimising the simple exponential smoothing parameter alpha directly on inventory performance in Kourentzes et al., 2020).
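The idea of judging forecasting decisions on inventory metrics can be sketched as follows (a toy version of the general approach, with our own cost function and grid search, not any specific author's method): the smoothing parameter is selected not by minimising forecast error but by minimising a newsvendor-style inventory cost.

```python
def ses_forecasts(demand, alpha):
    """One-step-ahead simple exponential smoothing forecasts."""
    f = [demand[0]]
    for d in demand[:-1]:
        f.append(alpha * d + (1 - alpha) * f[-1])
    return f

def inventory_cost(demand, alpha, safety=5.0, h=1.0, b=9.0):
    """Per-period holding/backorder cost when ordering up to forecast + safety."""
    cost = 0.0
    for d, f in zip(demand, ses_forecasts(demand, alpha)):
        s = f + safety
        cost += h * max(s - d, 0) + b * max(d - s, 0)
    return cost

def best_alpha(demand, grid=None):
    """Select alpha on the bottom-line inventory metric, not forecast error."""
    grid = grid or [i / 20 for i in range(1, 20)]
    return min(grid, key=lambda a: inventory_cost(demand, a))
```

Because holding and backorder costs are asymmetric here, the alpha that minimises inventory cost can differ from the one that minimises, say, mean squared forecast error, which is precisely the level-3 point.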
The fact that demand parameters are forecasted has its own implications, and literature at IF3 ( Figure 24 ) attempts to investigate and address them. For example, Prak et al. (2017) observed that while demand may not be correlated, forecast errors are, leading to undershoots when neglected. They proposed appropriate safety stock adjustment mechanisms when the one-step-ahead forecast errors are informing the variance calculations. Hoberg et al. (2007) compared a linear and a proposed non-linear (integrated) inventory policy with simple exponential smoothing forecasts, against stationary and non-stationary demand, and found the integrated approach reduces order amplification. Dejonckheere et al. (2002) employed transfer functions to investigate the influence of forecast errors in the bullwhip effect (inventory variance over demand variance) in supply chains. The above are a collection of papers showing how the research interests in IF3 have grown to include the interrelations of forecasting and inventory control.
Forecasts are moving away from forecasting the mean to also forecasting the variance (Brown, 1962; Bretschneider, 1986; Snyder, 2004), and further still into forecasting the entire lead time distribution (Barrow and Kourentzes, 2016). In the same vein, effort is expended to propose forecasting accuracy metrics that better represent the end goal of inventory control, including ways to judge forecasts on their ability to predict percentiles of interest (Kolassa, 2016). While the performance of the system will ultimately be judged by appropriate inventory metrics, there is still merit in tracking forecasting performance, as it can help to identify issues in the forecasting process and to trace performance losses back to the demand generating process.
Finally, we note that the proposed integration levels attempt to classify a semi-abstract spectrum of integrated literature. As such, there is a certain degree of subjectivity in assigning the levels for each paper. Especially when it comes to books, certain chapters might be of different levels than others (e.g., Silver et al. , 1998 , and subsequent editions, could be seen as a mostly level F1 book with a chapter on forecasting being level F2).

B.5. Classification beyond integration
Beyond the assignment of an integration level to each paper (and as a stepping-stone to help us reach that decision), we have recorded a further number of variables. These are presented below.
• Bibliometric information: This information is directly extracted from Scopus. We are interested in the year of publication and journal of publication. This will help us get a feel for the area and its size over time.
• Methods and policies employed: We record forecasting methods used to forecast demand (e.g., the exponential smoothing family), as well as inventory policies used to satisfy demand (e.g., order-up-to). We also note when a particular model (e.g., state space), approach (e.g., aggregation) or solution methodology (e.g., heuristics or machine learning) is employed. If a paper employs two or more policies and/or methods (e.g., for comparison purposes), the record reflects all employed, as required for forecasting or for inventory control.
• Performance measurement: Here we capture information on the metrics used for the evaluation of either the forecasts or the inventory performance. These are forecasting accuracy metrics (e.g., mean squared error) and inventory performance metrics (e.g., bullwhip effect, cost, or average inventory), respectively. Again, multiple entries on both forecasting and inventory metrics are captured.
• Methodology-related information: We also record whether a paper pursued some theoretical development, making no judgement on veracity or significance. The introduction of new formulae, models, algorithms, or proofs of lemmas/theorems is judged an analytical development. Repetitions of methods proposed in other papers do not qualify, while adaptations do. We similarly note if simulation is taking place, on any kind of data and scenarios.
Simple numerical examples and simulations that occur elsewhere and are not reported in the paper (omitted) do not qualify. These two variables could simultaneously be true or false for a single paper.
• Extraneous information: Further to the above we also record if a paper is dealing with slow/intermittent demand, if it concerns multiple nodes in a supply chain, or closed loop supply chains. Finally, we make a note if data employed (e.g., demand/sales time series) are empirical (e.g., time series taken from industry) or theoretically generated (e.g., when demand is drawn from a normal distribution with e.g., mean µ = 200 and standard deviation σ = 20).