Using a Linear Regression Approach to Sequential Interindustry Model for Time-Lagged Economic Impact Analysis

The input-output (IO) model is a powerful economic tool with many extended applications. However, one widely criticized drawback is the lengthy time lag in its data preparation, which makes the conventional IO model unsuited to high-resolution time-series analysis. In this study, we present an algorithm that integrates linear regression techniques into a derivative of the IO method, the Sequential Interindustry Model (SIM), to overcome the statistical lags inherent in conventional IO studies. The regressed relationship can then be used to predict, in the short term, the accumulated chronological impacts induced by fluctuations in sectoral economic demands under disequilibrium conditions. A simulated calculation is presented as an illustration and verification of the new method. In the future, this application can be extended beyond economic studies to broader problems of system analysis.


Introduction
The input-output (IO) model is a powerful tool that was first proposed by Wassily Leontief in the 1930s (Leontief, 1936). It builds a bridge between the understanding of micro-level production and macro-level economic flows, thus providing policy makers with a tool to account for the operation of past economies and devise more targeted macro-economic strategies. The concept of the IO model is a consolidation of the classical economic conceptualisation of the tableau économique proposed by François Quesnay in 1758 to depict the relationship between the agricultural/mining sector and the artisanal/manufacturing sector in eighteenth-century France (Leontief, 1953, Coffman, 2021). Since World War II, the statistical bureaus of world governments have taken initiatives to construct national IO tables to facilitate policymaking. Recent efforts by international scholars include constructing multiregional IO tables and ancillary emission inventories (Stadler et al., 2018, Timmer et al., 2015, Lenzen et al., 2013, Mi et al., 2017). Many recent studies also extend economic IO analysis to environmental stress by accounting for embodied carbon emissions at the macro level (Dietzenbacher et al., 2020, Hertwich and Peters, 2009, Weber et al., 2008). Other scholars have suggested using input-output based hybrid life cycle assessment models in building and infrastructure projects, potentially employing Building Information Modelling (BIM) data (Sharrard et al., 2008, Coffman and Kelsey, 2019). In addition to the efforts to extend IO applications, some efforts have been devoted to the methodological improvement of IO modelling. To encompass micro-level perspectives, researchers naturally consider the time domain as an avenue for the methodological improvement of the IO model.
In fact, Leontief himself delved into the time domain of IO by introducing the dynamic IO model (Leontief, 1970, Leontief, 1953), attempting to explain the interaction between capital investment and production output on a chronological basis. Even though some researchers are still working towards the perfection and application of the dynamic IO model (Aulin-Ahmavaara, 2000, Rocco, 2019) to better factor in the impact of capital formation on productivity as a solution for economic prediction, such an approach has been criticized by other researchers for a number of reasons (Kurz and Salvadori, 2000). Besides the classic demand-driven IO model, attempts were also made to construct a supply-driven IO model in the 1980s (Oosterhaven, 1988), but few developments followed until its recent emphasis in disaster event analysis (Galbusera and Giannopoulos, 2018, Yagi et al., 2020), which reignited the discussion on supply-driven IO (Reyes and Mendoza, 2021). With a similar intention, a number of other analysis tools build on the idea of chronological impact in the IO model, such as dynamic inoperability (DIIM), supply bottlenecking (ARIO), and hybridization with the Computable General Equilibrium (CGE) model, to analyse the impact of disasters in the long term (Mendoza-Tinoco et al., 2017, Zeng et al., 2019, Zeng and Guan, 2020). Such developments normally include non-linear characteristics and assume that a final steady state will be reached given a long-term general equilibrium. Building on this idea, the most well-developed and widely applied superset of the IO model is the CGE model. As discussed by Koks et al. (2015), CGE models focus on the long-run future equilibrium state, in contrast to the nearer-future focus of IO models. However, the expensive human capital investment required for learning, building, and using CGE remains a significant entry barrier for many scholars.
The accuracy of the CGE model in economic predictions is also questioned by many (Polo and Viejo, 2015, Dixon and Rimmer, 2009, Zhou and Chen, 2020). A simpler chronological IO variant that focuses on nearer-future events would thus be a valuable complement to the existing toolbox of economic modelling. Hence, stepping outside the general equilibrium assumption, Romanoff and Levine (1977) propose the sequential interindustry model (SIM) as another strain of IO model innovation that brings time domain analysis into the IO model through a cumulative linear interaction. Similar to the dynamic material flow analysis (Müller, 2006), the SIM hybridises the time lag in demand propagation with the IO model by building a direct linear linkage between future and past economic activities. Some real-world SIM applications have also been made by later researchers. For instance, Okuyama et al. (2004) and Okuyama et al. (2000) use the SIM to assess the economic impact of natural disasters in Japan by using quarterly disaggregated hypothetical data. However, past attempts with the SIM proposed technical coefficient changes based only on hypothetical estimations, which greatly aggravates the uncertainties of modelling outcomes and limits further applications. This is largely because the data demands of the SIM are not easily met (Levine et al., 2007), thus hindering the SIM from fulfilling its deeper potential. It is also the reason why many IO studies merely mention and discuss the SIM in literature reviews, but resort to other IO variations as the tool for IO-related modelling involving chronological analysis (Barker and Santos, 2010, Malik et al., 2014, Mendoza-Tinoco et al., 2017, Yu et al., 2013, Avelino and Hewings, 2019, Avelino, 2017). In our opinion, the reasons that there is hardly any advancement in the SIM towards high time-resolution innovation are as follows.
1) Researchers are too convinced that the IO model is an economic model that obeys the principle of general equilibrium, so efforts are concentrated on introducing more economic concepts, such as inoperability (Yu et al., 2013, Barker and Santos, 2010), to build a "perfect economic model", while the fundamental physical interactions between economic sectors are overlooked. If we apply SIM in a short-run disequilibrium setting, under which macroeconomic structural changes are less influential, we can exploit SIM's potential in regression with big data.
2) The data requirement for high time-resolution analysis is costly. National IO tables are normally produced every 3-5 years. It takes great effort from IO scholars to increase the time resolution to a yearly basis (Avelino, 2017) to match the annually or quarterly updated economic indicators. Data unavailability has disincentivized model builders from working on a theoretical model with limited applications.
Thus, limited methodological advancement has been made based on the concept proposed by SIM. As Leontief pointed out in his later research, the IO model is not simply an economic method but can also be understood from the technical/engineering perspective (Leontief, 1991). In this research, we share the spirit of Coluzzi et al. (2011) in looking at the IO model from the perspective of data science. Through a new algorithm, we simplify the complicated interactions among economic sectors proposed by SIM into a solvable linear model. We then integrate the linear regression technique into the new algorithm, so that the best-fitted technical coefficients can be back-calculated from observations of sectoral demands and outputs across the discrete time intervals, instead of hypothetically and less accurately "dividing" production coefficients along the time domain as past SIM modellers did.
A simulation based on dummy input has been conducted in this research to show that the proposed algorithm can accurately calculate the unknown chronological production coefficients from demand and output observations alone. The algorithm thus offers potential applications both in data science and in interdisciplinary econometric research. It opens the "black box" to reveal relationships that cannot be told by conventional big data tools such as machine learning algorithms. Scholars could thus gain a new perspective from which to describe the structure of an economy in a chronological way. If the production coefficients of SIM are changed, we will be able to investigate the impact of an external shock, such as a natural disaster, on the economy in a time-resolved, dynamic way. On the other hand, if supplied with adequate proxy data, the proposed algorithm may substantially reduce the cost associated with IO table compilation, changing the way large-scale economic statistics are conducted.

Methodology and Data
Sequential Interindustry Model (SIM)
First, let us recall that the classic Leontief input-output model can be written as follows:
$$x = (I - A)^{-1} f \tag{1}$$
In equation (1), $x$ is the total output in vector form, $f$ is the final demand in vector form, $A$ is the production coefficient matrix, and $I$ is the identity matrix. Using the Taylor expansion, the equation above can be rewritten as follows:
$$x = (I + A + A^2 + A^3 + \cdots) f \tag{2}$$
The physical meaning of equation (2) is that the output induced by demand takes effect through layers of intermediate productions. Both equations (1) and (2) are input-output fundamentals that anyone in the IO field can easily recall. SIM then introduces the time domain by splitting $x$ and $f$ into finite discrete time intervals as time series, meaning that $x$ and $f$ can be rewritten as $x_{(t)}$ and $f_{(t)}$, respectively. Equations (3) and (4) illustrate the principle of this conversion.
$$f \rightarrow \{f_{(0)}, f_{(1)}, \ldots, f_{(n)}\} \tag{3}$$
$$x \rightarrow \{x_{(0)}, x_{(1)}, \ldots, x_{(n)}, \ldots, x_{(n+m)}\} \tag{4}$$
For each of the terms, the subscript $(t)$ denotes the specific demand and output at discrete time $t$. $n$ is the number of discrete time intervals investigated. $m$ is the number of propagation layers, which ideally approaches infinity (as demonstrated by the Taylor expansion in equation (2)). It means that for observations of $f_{(t)}$ and $x_{(t)}$, due to productional propagations, changes in output $x_{(t)}$ will theoretically extend into the infinitely distant future. Hence, considering the time lag feature of $f_{(t)}$ and $x_{(t)}$, equation (1) will then be changed to equation (5) below.
$$x_{(t)} = f_{(t)} + A x_{(t-1)} \tag{5}$$
In equation (5), a recursive algorithm is introduced to obtain the output $x_{(t)}$ at time $t$ based on two variables: the current demand $f_{(t)}$ at time $t$ and the total output $x_{(t-1)}$ at the previous discrete time $(t - 1)$. The rationale is that what has been produced "today" will signal intermediate production "tomorrow" and propagate into the more distant future recursively. For instance, if a hundred cars are manufactured and consumed "today", four hundred tyres are drawn from the inventory of the car factories. Receiving this market signal, tyre manufacturers will produce four hundred tyres "tomorrow" in response to the consumption signal sent out "today". Since SIM discusses the state of the economic structure over a very short period of time, we assume that the economy cannot respond by investing in capital equipment or by similar means to change its structure, but can only change its production level in the face of a changed market signal. Although the length of the time duration investigated is $n$, further discrete time steps extend into the future, because the outputs induced by demands before time $n$ continue to induce residual outputs afterwards. The following equations illustrate equation (5) in its expanded form to help readers understand the mathematics and the underlying idea.
$$x_{(0)} = f_{(0)}$$
$$x_{(1)} = f_{(1)} + A f_{(0)}$$
$$x_{(2)} = f_{(2)} + A f_{(1)} + A^2 f_{(0)}$$
$$\cdots$$
$$x_{(t)} = f_{(t)} + A f_{(t-1)} + A^2 f_{(t-2)} + \cdots + A^t f_{(0)}$$
Since the row sums of the production coefficients in $A$ are all smaller than 1, as input must not exceed output for any production, $A^m$ converges to zero as $m$ becomes sufficiently large. We can thus assume that terms with powers of $A$ higher than $m$ can be neglected. Bearing that concept in mind, we next introduce the time domain into the production coefficient $A$ in a similar manner to that of equations (3) and (4).
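The convergence argument above can be checked numerically. The following is a minimal sketch, assuming a hypothetical 2-sector coefficient matrix `A` and a one-off unit demand (neither taken from the paper): it iterates the recursion of equation (5), truncates after `m` propagation layers, and compares the accumulated output with the Leontief inverse of equation (1).

```python
import numpy as np

# Hypothetical 2-sector production coefficient matrix (illustrative only).
A = np.array([[0.2, 0.3],
              [0.1, 0.4]])
f0 = np.array([1.0, 0.0])  # one-off unit demand for the first sector at t = 0

m = 60                     # propagation layers kept before truncation
x_prev = np.zeros(2)       # no output exists before t = 0
cumulative = np.zeros(2)
for t in range(m + 1):
    f_t = f0 if t == 0 else np.zeros(2)  # demand occurs only at t = 0
    x_t = f_t + A @ x_prev               # recursion of equation (5)
    cumulative += x_t
    x_prev = x_t

# Total output from the classic Leontief model, equation (1).
leontief = np.linalg.solve(np.eye(2) - A, f0)

print("accumulated SIM output:", cumulative)
print("Leontief inverse output:", leontief)
print("residual output at t = m:", x_t)
```

Under these assumed values the output at the last step is vanishingly small, so truncating at a finite number of propagation layers loses very little of the total induced output.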
$A$ is thus converted into equation (6) as follows:
$$A \rightarrow \{A_{(1)}, A_{(2)}, A_{(3)}, \ldots\} \tag{6}$$
The reason for splitting $A$ is to reflect the time lag that occurs between different sectors due to technical constraints (Romanoff and Levine, 1986). Hence, the modified production coefficients reflect not only the magnitude of intermediate inputs, but also the timing with which those inputs are fulfilled. For instance, tyres in the car manufacturing industry may be made more quickly than cars' control chips, owing to the nature of the engineering process and the distances among different industry clusters. Such information cannot be captured by $A$ in traditional IO models, but it can be captured by its split version in SIM as described by equation (6). An earlier simple attempt by Okuyama (2004) splits $A$ into three stages. Equation (5) is thus further modified into equation (7), the general form of SIM, as follows:
$$x_{(t)} = f_{(t)} + A_{(1)} x_{(t-1)} + A_{(2)} x_{(t-2)} + A_{(3)} x_{(t-3)} + \cdots \tag{7}$$
In an early work of Romanoff and Levine (1981), equation (7) is defined as the responsive production SIM, where productions are responses to demand in the past. A further advancement in SIM modelling has been proposed to include anticipatory production for demand from the future. To avoid unnecessary complications, this research works only with the general responsive production form of SIM. For the purpose of illustration, the SIM process described in equation (7) is shown visually in the schematic diagram given in Figure 1 below. Each horizontal row shows how the output $x_{(t)}$ is composed at the respective time $t$. The first row shows the initial state, where $x_{(t)}$ and $A_{(t)}$ have not started interacting with each other. In the second row, at $t = 0$, $f_{(0)}$ is first multiplied by the identity matrix $I$ to produce $x_{(0)}$, as highlighted in red on the left side. This suggests that output will match the market demand for the first time interval.
In the next discrete time, $t = 1$, the output at the previous discrete time, $x_{(0)}$, is multiplied by $A_{(1)}$ to produce one element that is added to the product of $f_{(1)}$ and $I$, according to the aforementioned assumption that output will match the instant demand. The product of $x_{(0)}$ and $A_{(1)}$ indicates that the production coefficient of the second discrete time is applied to the output at the first discrete time. This simulates the time-lagged response of intermediate production to the output of the previous discrete time. The sum then produces $x_{(1)}$, which is used in the next step, $t = 2$. Theoretically, this procedure is repeated indefinitely. Readers may want to visualize the process as moving the series of $x_{(t)}$ and $A_{(t)}$ towards each other to perform a convolution-like operation, as demonstrated in Figure 1. For easier discussion, we constructed a SIM system with a dummy production coefficient matrix $A_{(t)}$ with two layers, as shown in Table 1 below.

Table 1. Simulated production coefficient matrix in time series for two sectors and two layers, where $A_{(1)}$ and $A_{(2)}$ show the interaction of the two sectors at $t = 1$ and $t = 2$, respectively.
Assuming that two unit demands occur, one for Sector A at $t = 0$ and one for Sector B at $t = 1$, denoted as vectors $f_{(0)} = [1 \; 0]$ and $f_{(1)} = [0 \; 1]$, the following outputs $x_{(t)}$ can then be obtained according to equation (7):
$$x_{(0)} = I f_{(0)}$$
$$x_{(1)} = A_{(1)} x_{(0)} + I f_{(1)}$$
$$x_{(2)} = A_{(1)} x_{(1)} + A_{(2)} x_{(0)}$$
$$\cdots$$
By organizing and plotting $x_{(t)}$ in Figure 2, it is easily observed that under SIM the output decays towards zero, validating the deductions about SIM made previously.
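The two-demand example can be reproduced with a short simulation. The sketch below assumes hypothetical values for the two coefficient layers `A1` and `A2`, since the entries of Table 1 are not reproduced here; `simulate_sim` is an illustrative helper name, not a function from the paper. It applies the responsive-production recursion of equation (7) and prints the decaying output series.

```python
import numpy as np

def simulate_sim(A_layers, demands, n_steps):
    """Responsive-production SIM: x(t) = f(t) + sum over k of A(k) @ x(t - k)."""
    n_sectors = A_layers[0].shape[0]
    outputs = []
    for t in range(n_steps):
        # Demand is zero once the supplied demand series is exhausted.
        x_t = demands[t].copy() if t < len(demands) else np.zeros(n_sectors)
        for k, A_k in enumerate(A_layers, start=1):
            if t - k >= 0:
                x_t = x_t + A_k @ outputs[t - k]
        outputs.append(x_t)
    return np.array(outputs)

# Hypothetical two-layer coefficients (assumed values, not those of Table 1).
A1 = np.array([[0.10, 0.20],
               [0.15, 0.05]])
A2 = np.array([[0.05, 0.10],
               [0.10, 0.05]])

# Unit demand for Sector A at t = 0 and for Sector B at t = 1.
demands = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

x = simulate_sim([A1, A2], demands, n_steps=30)
for t in (0, 1, 2, 5, 10, 29):
    print(t, x[t])
```

With these assumed coefficients the first three outputs follow the expansion above directly, and later outputs shrink towards zero as the induced production is propagated through ever smaller intermediate inputs.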

Model Innovation
The objective of our proposed algorithm is to obtain the production coefficient $A_{(t)}$ as a time series from ample observations of total output $x_{(t)}$ and final demand $f_{(t)}$. Although Levine et al. (2007) proposed the Z transformation to solve the convolution-like problem presented in SIM, there is no mathematical solution for the Z transformation of matrix variables into the frequency domain. Hence, no methodological advancement has been made in solving the SIM. Given the growth in computing capacity over the past decade, it is natural to solve the SIM directly in the time domain instead. In our proposed algorithm, we assume that productivity $A_{(t)}$ is unchanged, and that production activity $x_{(t)}$ is signalled only by consumption activities $f_{(t)}$ as a responsive demand.

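Under these assumptions, equation (7) can be expanded in terms of past demands as follows:
$$x_{(0)} = f_{(0)}$$
$$x_{(1)} = f_{(1)} + A_{(1)} f_{(0)}$$
$$x_{(2)} = f_{(2)} + A_{(1)} f_{(1)} + (A_{(1)}^2 + A_{(2)}) f_{(0)}$$
$$x_{(3)} = f_{(3)} + A_{(1)} f_{(2)} + (A_{(1)}^2 + A_{(2)}) f_{(1)} + (A_{(1)}^3 + A_{(1)} A_{(2)} + A_{(2)} A_{(1)} + A_{(3)}) f_{(0)}$$
$$\cdots$$
This can be rewritten as a linear system in matrix form, as shown in equation (8):
$$\begin{bmatrix} I & A_{(1)} & (A_{(1)}^2 + A_{(2)}) & (A_{(1)}^3 + A_{(1)} A_{(2)} + A_{(2)} A_{(1)} + A_{(3)}) & \cdots \end{bmatrix} \begin{bmatrix} f_{(t)} & f_{(t+1)} & f_{(t+2)} & \cdots \\ f_{(t-1)} & f_{(t)} & f_{(t+1)} & \cdots \\ f_{(t-2)} & f_{(t-1)} & f_{(t)} & \cdots \\ f_{(t-3)} & f_{(t-2)} & f_{(t-1)} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} = \begin{bmatrix} x_{(t)} & x_{(t+1)} & x_{(t+2)} & \cdots \end{bmatrix} \tag{8}$$
where $s$ is the number of sectors. If we take $(m + 1) \times s$ observations of $x$ to make the second matrix on the left a square matrix, then the first term in equation (8) is solvable by taking the inverse of the second term, as shown in equation (9).
$$\begin{bmatrix} I & A_{(1)} & (A_{(1)}^2 + A_{(2)}) & (A_{(1)}^3 + A_{(1)} A_{(2)} + A_{(2)} A_{(1)} + A_{(3)}) & \cdots \end{bmatrix} = \begin{bmatrix} x_{(t)} & x_{(t+1)} & x_{(t+2)} & \cdots \end{bmatrix} \begin{bmatrix} f_{(t)} & f_{(t+1)} & f_{(t+2)} & \cdots \\ f_{(t-1)} & f_{(t)} & f_{(t+1)} & \cdots \\ f_{(t-2)} & f_{(t-1)} & f_{(t)} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}^{-1} \tag{9}$$
By observing the term on the left-hand side of equation (9), we can see that the time subscripts and power degrees of $A_{(t)}$ in each of the elements all sum to the same number. For instance, in the element $(A_{(1)}^3 + A_{(1)} A_{(2)} + A_{(2)} A_{(1)} + A_{(3)})$, the time subscripts and power degrees of every term sum to 3. Physically, this can be explained by the different paths taken by production operations to reach the current layer; i.e., in $(A_{(1)}^3 + A_{(1)} A_{(2)} + A_{(2)} A_{(1)} + A_{(3)})$, the production sectors that are "closer" to each other and linked by $A_{(1)}$ will be induced three times, compared to only once by $A_{(3)}$. Enumerating the combinations that satisfy this requirement is known as a partition problem in number theory (strictly, an ordered partition, or composition, since $A_{(1)} A_{(2)}$ and $A_{(2)} A_{(1)}$ are counted separately), for which well-tested functions are available on many programming platforms. Denoting the $k$-th element after $I$ on the left-hand side of equation (9), as solved from the observations, by $B_{(k)}$, the production coefficients follow from equation set (10):
$$A_{(1)} = B_{(1)}$$
$$A_{(2)} = B_{(2)} - A_{(1)}^2$$
$$A_{(3)} = B_{(3)} - A_{(1)}^3 - A_{(1)} A_{(2)} - A_{(2)} A_{(1)} \tag{10}$$
$$\cdots$$
Hence, knowing $A_{(1)}$ in the beginning, we can first obtain $A_{(2)}$ according to the expression of $A_{(2)}$ shown in the second row of equation set (10). Having $A_{(1)}$ and $A_{(2)}$, it is then feasible to obtain $A_{(3)}$ according to the third row of equation set (10). Thus, any $A_{(t)}$ for $t > 1$ can be obtained recursively, step by step.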
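The partition step described above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: `compositions`, `impulse_block`, and `recover_layers` are hypothetical helper names, and the three coefficient layers are assumed values. The code enumerates the ordered partitions (compositions) of an integer, builds each aggregated coefficient block as the sum of the corresponding matrix products, and then recovers the individual layers recursively.

```python
import numpy as np

def compositions(n, max_part):
    """Yield ordered tuples of integers in 1..max_part that sum to n."""
    if n == 0:
        yield ()
        return
    for first in range(1, min(n, max_part) + 1):
        for rest in compositions(n - first, max_part):
            yield (first,) + rest

def impulse_block(k, A_layers):
    """Aggregated block k: sum of matrix products over all compositions of k."""
    size = A_layers[0].shape[0]
    total = np.zeros((size, size))
    for parts in compositions(k, len(A_layers)):
        term = np.eye(size)
        for p in parts:
            term = term @ A_layers[p - 1]
        total += term
    return total

def recover_layers(blocks, n_layers):
    """Recover A(k) from aggregated blocks: A(k) = B(k) - sum_{j<k} A(j) @ B(k-j)."""
    size = blocks[0].shape[0]
    B = [np.eye(size)] + list(blocks)  # B[0] is the identity block
    layers = []
    for k in range(1, n_layers + 1):
        A_k = B[k].copy()
        for j in range(1, k):
            A_k -= layers[j - 1] @ B[k - j]
        layers.append(A_k)
    return layers

# Hypothetical three-layer coefficients (assumed values for illustration).
A_layers = [np.array([[0.10, 0.20], [0.15, 0.05]]),
            np.array([[0.05, 0.10], [0.10, 0.05]]),
            np.array([[0.02, 0.03], [0.01, 0.04]])]

print(list(compositions(3, 3)))  # the four paths that sum to 3
blocks = [impulse_block(k, A_layers) for k in (1, 2, 3)]
recovered = recover_layers(blocks, n_layers=3)
```

The recursion in `recover_layers` is an equivalent compact form of subtracting every product of lower layers from the aggregated block, so it avoids enumerating compositions again when inverting the relationship.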
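First, let the three matrices in equation (8) be denoted as $B$, $F$, and $X$, in the order in which they appear, so that
$$B F = X \tag{11}$$
The left-hand side of equation (11) is thus the estimated output based on $A_{(t)}$ and $f_{(t)}$, while the right-hand side is the real output. In equation (11), all variables are non-square matrices, unlike what equation (9) requires, so that all available observations of $x_{(t)}$ and $f_{(t)}$ are included. The error in estimation is thus easily obtained as $X - BF$. Since all variables are matrices, the sum of squared errors is given by a simple matrix operation as:
$$S(B) = \operatorname{tr}\left[ (X - BF)(X - BF)^{T} \right] \tag{12}$$
where the superscript $T$ denotes the matrix transpose. To find the value of $B$ that minimizes $S(B)$, we first take the first derivative of $S(B)$ with respect to $B$, so that equation (12) becomes:
$$\frac{\partial S(B)}{\partial B} = -2 (X - BF) F^{T} \tag{13}$$
Setting equation (13) to zero gives:
$$B = X F^{T} (F F^{T})^{-1} = X F^{+} \tag{14}$$
In equation (14), $F^{+}$ denotes the Moore-Penrose pseudoinverse matrix of $F$, widely used for linear regression problems (MacAusland, 2014). The best-fitted $A_{(t)}$ can thus be easily obtained.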
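The pseudoinverse step can be illustrated on a small synthetic system. The sketch below is an assumption-laden example rather than the authors' code: the names `B_true`, `F`, and `X` and the matrix sizes are chosen for illustration, and the data are noise-free. It uses `numpy.linalg.pinv` to solve a non-square system of the form coefficient matrix times demand matrix equals output matrix, and cross-checks the result against `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 sectors, 6 stacked demand rows, 50 observations.
B_true = rng.uniform(0.0, 0.5, size=(2, 6))  # unknown coefficient blocks
F = rng.uniform(0.0, 1.0, size=(6, 50))      # stacked (lagged) demand matrix
X = B_true @ F                               # noise-free output observations

# Least-squares estimate via the Moore-Penrose pseudoinverse.
B_hat = X @ np.linalg.pinv(F)

# Equivalent formulation: solve F.T @ B.T = X.T in the least-squares sense.
B_lstsq = np.linalg.lstsq(F.T, X.T, rcond=None)[0].T

sse = np.sum((X - B_hat @ F) ** 2)           # sum of squared errors
print("max coefficient error:", np.abs(B_hat - B_true).max())
print("sum of squared errors:", sse)
```

With more observations than unknown rows and a full-row-rank demand matrix, both routes return the same unique minimizer; with noisy observations the estimate would instead be the best fit in the least-squares sense.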

Simulation Results
To verify the efficacy of the proposed algorithm, we simulated an $x_{(t)}$ series based on the SIM interaction of a randomly generated 2-sector $f_{(t)}$ series of 200 discrete time intervals and the 2-layer production system $A_{(t)}$ described in Table 1. We first arbitrarily set the number of propagation layers to 40, so that the influence of demand $f_{(0)}$ becomes negligible after output $x_{(42)}$, i.e. after 40 layers of propagation and 2 layers of production by $A_{(t)}$. Taking observations from $x_{(43)}$ onwards, 158 observations of $x_{(t)}$ are used to construct the output matrix $X$ in equation (11). The demand matrix $F$ and its pseudoinverse are also constructed using all 200 observations of $f_{(t)}$ according to equation (14). Thus, the preliminary form of $A_{(t)}$, shown as the coefficient matrix $B$ in equation (11), is easily obtained.
Performing the partition algorithm illustrated in equation (10) then gives the regressed production coefficient $A'_{(t)}$. Table 2 shows the errors between the $A'_{(t)}$ obtained and the actual $A_{(t)}$. Compared to $A_{(t)}$, the errors are small enough to be ignored, so $A'_{(t)}$ and $A_{(t)}$ are effectively identical, demonstrating the effectiveness of the algorithm.
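The verification exercise can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the two coefficient layers in `A_true` are hypothetical stand-ins for the Table 1 values, the demand series is drawn from a uniform distribution with a fixed seed, and the variable names are illustrative. It simulates 200 observations, stacks lagged demands for 40 propagation layers plus 2 production layers, regresses with the pseudoinverse, and recovers the layers recursively.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sectors, n_obs, m = 2, 200, 40

# Hypothetical two-layer coefficients standing in for Table 1.
A_true = [np.array([[0.10, 0.20], [0.15, 0.05]]),
          np.array([[0.05, 0.10], [0.10, 0.05]])]
K = len(A_true)

# 1. Simulate random demand and the SIM output; columns index time.
f = rng.uniform(0.0, 1.0, size=(n_sectors, n_obs))
x = np.zeros_like(f)
for t in range(n_obs):
    x[:, t] = f[:, t]
    for k in range(1, K + 1):
        if t - k >= 0:
            x[:, t] += A_true[k - 1] @ x[:, t - k]

# 2. Stack lagged demands: block row j holds f(t - j) for j = 0 .. m + K.
n_lags = m + K
F = np.vstack([f[:, n_lags - j: n_obs - j] for j in range(n_lags + 1)])
X = x[:, n_lags:]                      # the 158 usable output observations

# 3. Regress the aggregated coefficient blocks with the pseudoinverse.
B = X @ np.linalg.pinv(F)
blocks = [B[:, j * n_sectors:(j + 1) * n_sectors] for j in range(n_lags + 1)]

# 4. Recover each layer: A(k) = B(k) - sum_{j<k} A(j) @ B(k - j).
A_est = []
for k in range(1, K + 1):
    A_k = blocks[k].copy()
    for j in range(1, k):
        A_k -= A_est[j - 1] @ blocks[k - j]
    A_est.append(A_k)

max_err = max(np.abs(a - b).max() for a, b in zip(A_est, A_true))
print("F shape:", F.shape, "max abs error:", max_err)
```

In this noise-free setting the only source of error is the truncation of propagation terms beyond the chosen depth, which is why the recovered coefficients match the assumed ones very closely.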
To further test the capability of the algorithm for SIM, the mean squared errors are calculated under varied numbers of propagation layers. Figure 4 illustrates how the mean squared error changes with the number of propagation layers set. It is clearly seen that as the number of propagation layers increases, the mean squared error between $A'_{(t)}$ and $A_{(t)}$ decreases exponentially, suggesting that $A'_{(t)}$ approaches the true value of $A_{(t)}$ as the number of propagation layers increases. This makes sense, because under SIM even demand from the infinitely distant past still has an impact on present output, although that impact becomes minimal and negligible as the propagation extends over time. Increasing the number of propagation layers set in our algorithm factors in these increasingly small effects of demand $f_{(t)}$ on output $x_{(t)}$, thus producing a more accurate $A'_{(t)}$ as the solution to the system.
Another important observation is that the mean squared error increases drastically to $1 \times 10^{-2}$ once the number of propagation layers exceeds 64, which is large enough to invalidate the estimated $A'_{(t)}$. The reason is that an insufficient number of samples hinders the functioning of the linear regression algorithm. When the number of propagation layers is 64, the dimension of the demand matrix $F$ in equation (11) becomes 134-by-134, a square matrix just sufficient to solve for the coefficient matrix $B$. As the number of propagation layers increases beyond 64, the system described in equation (11) becomes an underdetermined system with no unique solution, and $A'_{(t)}$ thus deviates significantly from the true value $A_{(t)}$. If more observations are available, more propagation layers can be accommodated to further improve the accuracy of the proposed algorithm. In the simulation exercise, we work on a system with a known number of production layers in $A_{(t)}$. In real-world applications, the number of production layers is normally unknown. It is thus sensible to vary both the number of production layers and the number of propagation layers in real-world applications, and to take the best-fitted $A_{(t)}$ with the minimum mean squared error as the best approximation of the chronological production coefficients.
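The threshold of 64 propagation layers follows from simple counting, which the sketch below reproduces. The helper names `regression_shape` and `max_propagation_layers` are hypothetical, and the counting rule (one block of unknowns per lag plus the identity block, and one lost observation per lag) is an assumption inferred from the numbers reported above.

```python
def regression_shape(n_obs, n_sectors, n_production_layers, n_propagation_layers):
    """Return (rows, columns) of the stacked demand matrix in the regression."""
    n_lags = n_propagation_layers + n_production_layers
    rows = (n_lags + 1) * n_sectors  # one block per lag, plus the current period
    columns = n_obs - n_lags         # observations left after discarding warm-up
    return rows, columns

def max_propagation_layers(n_obs, n_sectors, n_production_layers):
    """Largest propagation depth for which rows do not exceed columns."""
    rows, cols = regression_shape(n_obs, n_sectors, n_production_layers, 0)
    if rows > cols:
        raise ValueError("too few observations even without propagation layers")
    m = 0
    while True:
        rows, cols = regression_shape(n_obs, n_sectors, n_production_layers, m + 1)
        if rows > cols:
            return m
        m += 1

print(regression_shape(200, 2, 2, 40))    # setting used in the simulation
print(regression_shape(200, 2, 2, 64))    # last depth that is still solvable
print(max_propagation_layers(200, 2, 2))
```

Under this counting rule, 200 observations of a 2-sector, 2-layer system give a square 134-by-134 matrix at 64 propagation layers, and any deeper setting leaves more unknowns than observations.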

Discussion & Conclusions
As a unique variation of the IO model, SIM offers considerable potential in economic system analysis. Unlike the dynamic IO model and its later developments, which keep general equilibrium as an underlying assumption, the SIM is by nature a disequilibrium model that focuses on nearer-future analysis. Unfortunately, disproportionately little methodological advancement of the SIM has taken place since it was first proposed (Romanoff and Levine, 1977). In this study, we propose an algorithm as a methodological advancement for the SIM. It complements SIM by providing a way to integrate it with econometric linear regression analysis. Investigation of the chronologically extended production coefficients can provide temporal information on the interlinkages among economic sectors. For instance, although conventional life cycle assessment methods may be able to establish the units of steel, and the associated time, needed to produce one unit of output in the manufacturing industry, the cost of such an assessment would be extremely high if conducted comprehensively at the macro level, not to mention the difficulty of constructing the sectoral interlinkages among all industrial sectors in a manner similar to IO tables. The complementary algorithm to SIM provides a cheap and efficient way to quickly draw a comprehensive picture of chronological interlinkages based on economic activities at high time resolution. Being more accurate than the hypothetically constructed coefficients of previous studies, the linearly regressed temporal production coefficients of SIM can in turn be used for short-term predictions of sectoral outputs given unscheduled events such as natural disasters. The proposed algorithm can also be regarded as a tool for handling big data and better utilizing them for the benefit of socio-economic symbiosis research.
As illustrated in the simulation, the time resolution of demand and output observations must be high enough for linear regression to be feasible. Daily or hourly demand and output observations should preferably be used in our algorithm. Although online transaction data across different sectors would be ideal, it is highly unlikely that such data are available at present due to technical constraints. As a sensible compromise, proxy data on economic activities with high chronological resolution, such as electricity consumption and satellite remote sensing, may be a solution to the stringent data quality requirement. Since there are limited modelling tools available for complex system symbiosis analysis, real-world application and validation are the desired next step for SIM and the proposed complementary algorithm. Beyond economic research, other complex networks that resemble IO systems and exhibit delay characteristics, such as the human body's metabolic system, can borrow the method presented in this study for their own applications. Unlike macro-econometric data, observation data on smaller systems are normally much easier to obtain. Analysis similar to that proposed here can be conducted to investigate how materials interact over time, offering insight into the systems investigated. However, the algorithm proposed in this study does not solve some fundamental limitations of SIM. Firstly, SIM does not address the "bottlenecking" issue of IO, i.e., when demand exceeds the current production capacity, industries will not be able to linearly scale up production in response. Secondly, SIM does not differentiate between changes in capital goods and consumer goods. Unlike other model frameworks that factor in the relationship between changes in capital goods and productivity, SIM is not able to model the impact of capital changes as an endogenized parameter.
Later research on SIM methodological improvement can build on this study to overcome these weaknesses of SIM by exploring its integration with other IO variants, such as dynamic IO analysis.

Data availability statement
Data are available upon request from the authors.

Declaration of interest statement
We wish to confirm that there are no known conflicts of interest associated with this publication and that there has been no significant financial support for this work that could have influenced its outcome.