Research on China insurance demand forecasting: Based on mixed frequency data model

In this paper, we introduce the Mixed Data Sampling (MIDAS) model to China’s insurance demand forecasting. We select the monthly indicators Consumer Confidence Index (CCI), China Economic Policy Uncertainty Index (EPU), and Consumer Price Index (CPI), together with the quarterly indicator Depth of Insurance (TID), to construct a MIDAS regression model, which is used to study the impact and forecasting effect of CCI, EPU, and CPI on China’s insurance demand. To ensure forecasting accuracy, we investigate the forecasting effects of MIDAS models with different weighting functions, forecasting windows, and combination forecasting methods, and use the selected optimal MIDAS models to forecast short-term insurance demand in China. The experimental results show that the MIDAS model has good forecasting performance, especially in short-term forecasting. Rolling-window and recursive-window prediction can improve prediction accuracy, and combination prediction makes the results more robust. Consumer confidence is the main factor influencing the demand for insurance during the COVID-19 period, and the demand for insurance is most sensitive to changes in consumer confidence. In the near term, China’s insurance demand is expected to return to the pre-COVID-19 level by 2023Q2, showing positive development. The findings of the study provide new ideas for China’s insurance policymaking.


Introduction
China's insurance industry has maintained rapid growth for many years, and the scale, structure, and quality of insurance have greatly improved. Compared with developed countries, however, China's insurance industry is still at a relatively low level of development (see Table 1). As can be seen from Table 1, although China's total premium income ranks second in the world, its insurance density and insurance depth are far lower than those of developed countries, and are even only half of the world average. The development potential of China's insurance market is therefore very large, and the demand for
vector autoregressive model (MF-VAR) [9]. Aruoba et al. propose a mixed-frequency dynamic factor model (MF-DFM) that combines state-space models [10]. Foroni et al. propose reverse MIDAS (RR-MIDAS) and reverse U-MIDAS (RU-MIDAS) [11]. However, the performance of the model's internal design and optimization has been neglected. We address this gap by considering the impact of different weight functions on the model and investigating the effects of three different prediction windows and different combination prediction forms on the prediction performance. Our findings lay a solid foundation for predicting future insurance demand. Thirdly, the optimal MIDAS model constructed in this paper is used for short-term forecasting and nowcasting of insurance demand in China. Previous studies mainly demonstrated the superiority and feasibility of different MIDAS models in the prediction field, without actually predicting the future.
The remainder of the paper is organized as follows. Section 2 reviews the literature. Section 3 describes the data and methodology. Section 4 presents the empirical results. Section 5 covers nowcasting and short-term forecasting. Section 6 concludes and discusses policy implications.

Insurance demand influencing factors
Existing studies on insurance demand are mostly based on household microdata, examining the impact of factors such as household life cycle, financial status, asset mix, and gender on insurance demand. For example, Lin and Grace point out that there is significant heterogeneity in the demand for commercial life insurance among households in different life cycles, with older households allocating a lower proportion of commercial life insurance compared to younger households [12]; Tian and Dong point out that financial status directly affects the consumption demand for insurance, and different asset portfolios can create heterogeneity in the demand for insurance [13]; Wu and Zheng find that married female household heads are more inclined to purchase commercial insurance [14].
The above studies are based on the micro perspective, and macroeconomic volatility risk, as a separate contextual risk, has been identified as an important factor influencing insurance purchase decisions [1]. The index of uncertainty in policies and economic conditions is an important indicator for measuring macroeconomic fluctuations in a country. The Economic Policy Uncertainty (EPU) index constructed by Baker et al. measures the degree of uncertainty in China's economic policies [2]. Liu et al. and Ju et al. find that insurance demand increases with the fluctuation of economic policy uncertainty [3,4]. At the same time, consumer expectations and actual disposable income levels also play an important role in affecting insurance premium expenditures [1,5]. Therefore, we intend to use the Consumer Confidence Index (CCI), EPU, and Consumer Price Index (CPI) to construct a MIDAS model to predict the demand for insurance in China. Among them, CCI, EPU, and CPI are monthly high-frequency indicators, and insurance demand is measured using the quarterly low-frequency indicator of insurance depth (TID, regional premium income/regional GDP).

MIDAS and forecasting
The prevailing prediction models are based on co-frequency data, such as the autoregressive (AR) model, the autoregressive moving average (ARMA) model, the generalized autoregressive conditional heteroscedasticity (GARCH) model, the back-propagation neural network model, and the grey model GM(1, N). Because co-frequency models require all series to be sampled at the same frequency, prediction is often challenged by data that are not updated synchronously, which makes it necessary to convert indicator data frequencies. However, this conversion process: 1) leads to an information loss of high-frequency data, reducing the accuracy and timeliness of prediction; 2) may distort the structure of the low-frequency data, resulting in distortion of the constructed high-frequency data.
The Mixed Data Sampling (MIDAS) regression model proposed by Ghysels et al. effectively addresses the limitations of traditional prediction models. It incorporates variables of different frequencies into a single model by assigning weight functions to high-frequency data, in order to analyze the impact of high-frequency data on low-frequency data [15]. This method fully exploits the useful information contained in high-frequency data and enables advanced forecasting. Subsequently, the form and estimation methods of the mixed-frequency data sampling models have been continuously refined. Ghysels [16][17][18][19]. In the meantime, the MIDAS model has been widely applied in practical research. For example, Marcellino and Schumacher use dynamic and static estimation windows and the Factor-Augmented Mixed Data Sampling (FA-MIDAS) model to forecast Germany's GDP [20]. Foroni et al. construct several MIDAS models and the U-MIDAS model using various weight functions to make real-time forecasts of US quarterly GDP [21]. Furthermore, the MIDAS model has also been used to predict inflation, energy consumption, price fluctuations, and others [22][23][24][25][26][27]. The existing literature demonstrates that the MIDAS model exhibits advantages over co-frequency forecasting models not only in terms of prediction accuracy, but also in its ability to achieve nowcasting and short-term forecasting, thus improving the timeliness of predictions [9,28].

Data processing
The interval for the monthly indicators CCI, EPU, and CPI is 2006M1-2022M3, and the interval for the quarterly indicator TID is 2006Q1-2022Q1. CCI and CPI are downloaded from the Wind database (www.wind.com.cn), and EPU is constructed using data from Baker et al. (2013), downloaded from www.policyuncertainty.com. To eliminate the seasonal periodicity of the time series data, we use the X-12-ARIMA method to seasonally adjust CCI, EPU, and CPI. Fig 1 shows the trend of TID with CCI, EPU, and CPI. Visually, both the explanatory and response variables exhibit a rising trend with oscillations, which suggests a certain correlation between TID and CCI, EPU, and CPI. Table 2 indicates that each explanatory variable is stationary and can be used directly in the MIDAS model. To further demonstrate the applicability of the selected explanatory variables, we conduct a mixed-frequency data impulse response experiment. Fig 2 shows that the changes in CCI, EPU, and CPI have a significant impact on TID in the short term (generally within the first 10 periods), but maintain a stable relationship in the long term.

Model setup.
The MIDAS model integrates the concept of distributed lag (DL) models and is consistent with the construction mechanism of bridging models. By applying a weighting function, high-frequency data can be incorporated into a low-frequency regression model, so that data of different frequencies coexist in a single model.
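The frequency-alignment step behind this idea can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `align_monthly_to_quarterly`, the toy data, and the convention that the horizon `h` is counted in months are assumptions made for the example, with three monthly observations per quarter.

```python
import numpy as np

def align_monthly_to_quarterly(x_monthly, n_lags, h=1, m=3):
    """Stack the n_lags most recent monthly values available for each quarter.

    For quarter t, the last month has index m*t + (m - 1). With horizon h
    (in months), the newest usable month is that index minus h. Quarters
    without enough monthly history are dropped, and the index of the first
    retained quarter is returned alongside the lag matrix.
    """
    x = np.asarray(x_monthly, dtype=float)
    n_quarters = len(x) // m
    rows, first_quarter = [], None
    for t in range(n_quarters):
        newest = m * t + (m - 1) - h       # most recent month usable at horizon h
        oldest = newest - (n_lags - 1)
        if oldest < 0:
            continue                       # not enough monthly history yet
        if first_quarter is None:
            first_quarter = t
        rows.append(x[oldest:newest + 1][::-1])   # newest value first
    return np.array(rows), first_quarter

# Toy data: 24 months labelled 0..23, i.e. 8 quarters (hypothetical values).
x = np.arange(24)
X, q0 = align_monthly_to_quarterly(x, n_lags=4, h=1)
print(X.shape, q0)
```

Each row of `X` is the block of monthly lags that a weight function would later collapse into a single quarterly regressor.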
We set the univariate $h$-step-ahead prediction as follows:

$$y_t = \beta_0 + \beta_1 B(L^{1/m};\theta)\, x^{(m)}_{t-h} + \varepsilon_t$$

where $y_t$ is the low-frequency explained variable in period $t$, $x^{(m)}_t$ is the high-frequency explanatory variable, $\beta_0$ and $\beta_1$ are regression coefficients, $B(L^{1/m};\theta) = \sum_{i=0}^{K} \omega(i;\theta) L^{i/m}$ is the weighted polynomial function with $B(1;\theta) = \sum_{i=0}^{K} \omega(i;\theta) = 1$, $L^{1/m}$ is the lag operator for high-frequency data with $L^{i/m} x^{(m)}_{t-h} = x^{(m)}_{t-h/m-i/m}$, $i = 0, 1, \cdots, K-1$, and $\varepsilon_t$ is a random disturbance term. $m$ denotes the frequency ratio between the high-frequency variable and the low-frequency variable; in this paper we set $m = 3$. $K$ is the lag order of the high-frequency data; we set the highest lag order $K-1$ to 30. $h$ is the prediction horizon: $h = 1$ represents one-step-ahead mixed-frequency sampling, that is, the high-frequency data of the first two months of the current quarter are used to predict the low-frequency data of the current quarter; when $h > 3$, out-of-sample prediction can be realized.
We construct the MIDAS model using the high-frequency explanatory variables CCI, EPU, and CPI and the explained variable TID, and the lag effect of TID is also considered. When no weight function is imposed, the model is the U-MIDAS model, in which $q_1$, $q_2$, $q_3$ are the lag orders of the high-frequency data and $\varepsilon_t$ is a random disturbance term.
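For reference, one common way to write the multivariate AR(1)-MIDAS regression and its unrestricted (U-MIDAS) counterpart in the notation of this section is sketched below. The coefficient names $\beta_0$, $\lambda$, $\beta_j$, $\beta_{j,i}$ and the per-variable parameter vectors $\theta_j$ are assumptions introduced here for illustration; the specification actually estimated may differ in detail.

```latex
% Multivariate AR(1)-MIDAS with weight functions (assumed coefficient names)
\mathrm{TID}_t = \beta_0 + \lambda\,\mathrm{TID}_{t-1}
  + \beta_1 B(L^{1/m};\theta_1)\,\mathrm{CCI}^{(m)}_{t-h}
  + \beta_2 B(L^{1/m};\theta_2)\,\mathrm{EPU}^{(m)}_{t-h}
  + \beta_3 B(L^{1/m};\theta_3)\,\mathrm{CPI}^{(m)}_{t-h}
  + \varepsilon_t

% U-MIDAS: no weight function, each high-frequency lag has its own coefficient
\mathrm{TID}_t = \beta_0 + \lambda\,\mathrm{TID}_{t-1}
  + \sum_{i=0}^{q_1} \beta_{1,i}\, L^{i/m}\,\mathrm{CCI}^{(m)}_{t-h}
  + \sum_{i=0}^{q_2} \beta_{2,i}\, L^{i/m}\,\mathrm{EPU}^{(m)}_{t-h}
  + \sum_{i=0}^{q_3} \beta_{3,i}\, L^{i/m}\,\mathrm{CPI}^{(m)}_{t-h}
  + \varepsilon_t
```

In the weighted form, each high-frequency variable contributes one slope plus a short parameter vector $\theta_j$, whereas the unrestricted form estimates $q_j + 1$ coefficients per variable.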

Weighting function.
The key to the MIDAS data structure is frequency alignment: the high-frequency observations $x^{(m)}_{t-h}$ that correspond to a low-frequency observation $y_t$ are treated as belonging to the same period. This is easy to achieve for a small frequency ratio but becomes cumbersome for a larger one. For example, if $y_t$ is monthly data and $x^{(m)}_{t-h}$ is daily data, then for each $y_t$ there will be at least 21 corresponding observations of $x^{(m)}_{t-h}$. In this case, it is necessary to introduce a polynomial weight function $\omega(k;\theta)$ to constrain the high-frequency variables, and the optimal effect of the MIDAS model can be achieved through the selection of the parameter vector $\theta$ and the lag order $K$. In this way, the information in the high-frequency data is retained, the number of parameters to be estimated is reduced, and the data-driven automatic screening of the appropriate maximum lag order $K$ avoids the problem of lag order selection that arises when no parameter constraints are imposed.
Ghysels et al. find that the Beta weight function, the Almon weight function, and the Exp Almon weight function are more frequently used and generally have better prediction results [18]. In this paper, we examine the predicted results of each of these weighting functions.
Beta weight function. The Beta polynomial weight function is derived from the probability density function of the Beta distribution family [29]. Different values of $\theta_1$, $\theta_2$ generate increasing, decreasing, and decreasing-then-increasing patterns of weights. Beta weighting functions are used more often in the prediction and analysis of financial market volatility [8].
Almon weight function and Exp Almon weight function. The Almon and Exp Almon weights are among the most commonly used polynomial functional forms. They allow the construction of a wide range of weight shapes with guaranteed positive weights, which gives the equations the desirable property of zero approximation error [17]. The two-parameter exponential Almon lag polynomial is often used in macroeconomic studies. The constraints $\theta_1 \le 300$, $\theta_2 < 0$ are generally imposed to obtain the weight forms required for macroeconomic analysis and forecasting [28].
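The three weighting schemes can be illustrated numerically with the textbook forms from the MIDAS literature. The sketch below assumes $\omega(k;\theta) = \exp(\theta_1 k + \theta_2 k^2) / \sum_k \exp(\theta_1 k + \theta_2 k^2)$ for Exp Almon, a normalized kernel $x^{\theta_1 - 1}(1 - x)^{\theta_2 - 1}$ on a grid in $(0, 1)$ for Beta (the Gamma-function constant cancels after normalization), and an unnormalized polynomial $\sum_p \theta_p k^p$ for Almon. The function names, grid, and parameter values are hypothetical, and the exact parameterization used in the paper may differ.

```python
import numpy as np

def exp_almon_weights(theta1, theta2, K):
    """Two-parameter exponential Almon weights for lags k = 1..K (sum to 1)."""
    k = np.arange(1, K + 1, dtype=float)
    z = theta1 * k + theta2 * k**2
    w = np.exp(z - z.max())          # shift by the max for numerical stability
    return w / w.sum()

def beta_weights(theta1, theta2, K, eps=1e-6):
    """Normalized Beta-kernel weights evaluated on an interior grid of (0, 1)."""
    x = np.linspace(eps, 1.0 - eps, K)
    w = x ** (theta1 - 1.0) * (1.0 - x) ** (theta2 - 1.0)
    return w / w.sum()

def almon_weights(theta, K):
    """Polynomial Almon lag sum_p theta[p] * k**p for k = 1..K (not normalized)."""
    k = np.arange(1, K + 1, dtype=float)
    # np.polyval expects the highest-degree coefficient first, hence the reversal.
    return np.polyval(np.asarray(theta, dtype=float)[::-1], k)

# Illustrative parameter values only.
w_exp = exp_almon_weights(0.1, -0.05, K=12)   # theta2 < 0 gives decaying weights
w_beta = beta_weights(1.0, 5.0, K=12)         # theta1 = 1, theta2 > 1 also decays
w_almon = almon_weights([1.0, 0.5], K=3)      # 1 + 0.5 * k
print(w_exp.round(3), w_beta.round(3), w_almon)
```

Given a matrix of aligned high-frequency lags with one column per lag, the weighted regressor is simply the matrix product of that matrix with one of these weight vectors.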

Combination forecasting and weighting criteria.
Combined forecasting generally improves the accuracy and reliability of forecasts relative to single-indicator forecasting, so we also investigate the performance of the univariate MIDAS models under different combination forms. The combination forecast, derived from multiple univariate predictors, is:

$$\hat{y}_{t+h|t} = \sum_{j=1}^{n} \omega_{j,t}\, \hat{y}_{j,t+h|t}$$

where $n$ is the total number of univariate predictions being combined, $\hat{y}_{j,t+h|t}$ denotes the $h$-step-ahead prediction of the $j$-th univariate model made in period $t$, and $\omega_{j,t}$ is the weight of that prediction in the combination. The usual methods for determining the weights are the EW (equal weights) method, the BIC weighting method, and the MSFE and DMSFE weighting methods. For the latter two, $T_0$ corresponds to the starting point of the in-sample prediction and $\delta$ is a discount factor: when $\delta = 1$, the weights are the MSFE weights, and when $\delta = 0.9$, they are the DMSFE weights.
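The weighting schemes can be sketched in a few lines. The forms below are the ones commonly used in the forecast-combination literature and are assumptions here rather than the paper's exact definitions: equal weights $1/n$; BIC weights proportional to $\exp(-\mathrm{BIC}_j)$; and MSFE/DMSFE weights proportional to the inverse of the discounted sum of squared past forecast errors. All function names and numbers are hypothetical.

```python
import numpy as np

def equal_weights(n):
    """EW: every model receives weight 1/n."""
    return np.full(n, 1.0 / n)

def bic_weights(bic):
    """Weights proportional to exp(-BIC_j); lower BIC gives higher weight."""
    bic = np.asarray(bic, dtype=float)
    w = np.exp(-(bic - bic.min()))     # shifting by the minimum avoids underflow
    return w / w.sum()

def dmsfe_weights(y_true, forecasts, delta=1.0):
    """Inverse discounted-MSFE weights.

    y_true:    shape (n_periods,), realised values since T0
    forecasts: shape (n_models, n_periods), past forecasts of each model
    delta:     discount factor (1.0 gives MSFE weights, 0.9 gives DMSFE weights)
    """
    y = np.asarray(y_true, dtype=float)
    f = np.asarray(forecasts, dtype=float)
    n_periods = y.shape[0]
    discount = delta ** np.arange(n_periods - 1, -1, -1)   # newest error weighs 1
    m = (discount * (f - y) ** 2).sum(axis=1)
    inv = 1.0 / m
    return inv / inv.sum()

def combine(next_forecasts, weights):
    """Weighted combination of the models' next-period forecasts."""
    return float(np.dot(weights, next_forecasts))

# Hypothetical numbers: two models, three past periods.
y_past = [1.0, 2.0, 3.0]
f_past = [[1.1, 2.1, 3.1],    # model 1 misses by 0.1 each period
          [1.5, 2.5, 3.5]]    # model 2 misses by 0.5 each period
w_ew = equal_weights(2)
w_bic = bic_weights([10.0, 12.0])
w_msfe = dmsfe_weights(y_past, f_past, delta=1.0)
w_dmsfe = dmsfe_weights(y_past, f_past, delta=0.9)
combined = combine([3.9, 4.1], w_msfe)
print(w_ew, w_bic, w_msfe, w_dmsfe, combined)
```

Because model 2's squared errors are 25 times larger in every period, the inverse-MSFE scheme gives model 1 a weight 25 times that of model 2, with or without discounting.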

Evaluation criteria.
The prediction accuracy is measured by the root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\hat{y}_t - y_t\right)^2}$$

where $T$ is the number of predicted periods, and $\hat{y}_t$ and $y_t$ are the predicted and actual values in period $t$. In general, the smaller the RMSE, the better the performance of the model.
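As a small worked example of this criterion, the sketch below computes the RMSE for two hypothetical forecasts; the function name `rmse` and the numbers are illustrative only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

# Errors of 0 and -2 give sqrt((0 + 4) / 2) = sqrt(2).
value = rmse([3.0, 4.0], [3.0, 2.0])
print(value)
```

Because the errors are squared before averaging, a single large miss raises the RMSE more than several small ones of the same total size.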

MIDAS model selection
The selection of the weighting function and its optimal lag order is crucial for the MIDAS model. We construct three MIDAS models using the weighting functions above, in addition to the U-MIDAS model, resulting in four models in total. The selection procedure is as follows. First, a mixed-frequency data regression model is constructed for the insurance demand variable TID and a single explanatory variable $x^{(m)}_{t-h}$ to obtain the optimal lag order of the model's weighting function for the high-frequency data, with the lag order for the high-frequency variables ranging from 1 to 30. The optimal model is determined based on the Akaike information criterion (AIC). Considering the possible autocorrelation of the explained variable TID, we include TID(-1) in the regression model. Second, because the COVID-19 event may affect the model fit, we exclude the COVID-19 period (from 2019Q3 to 2022Q1) and re-estimate the model. Finally, the COVID-19 period sample is used to test the forecasting effect of the selected optimal model. Fig 3 represents the optimal univariate MIDAS model based on the full sample. The optimal MIDAS models for the high-frequency explanatory variables CCI, EPU, and CPI are Exp Almon-AR(1)-MIDAS(3), U-AR(1)-MIDAS(7), and Exp Almon-AR(1)-MIDAS(24), respectively, which indicates that the optimal lag orders in the full-sample estimation are 3, 7, and 24 months for CCI, EPU, and CPI, respectively.
Fig 4 represents the fitting effect based on in-sample data. The optimal MIDAS models corresponding to the high-frequency explanatory variables CCI, EPU, and CPI in the in-sample estimation are Exp Almon-AR(1)-MIDAS(6), U-AR(1)-MIDAS(24), and Exp Almon-AR(1)-MIDAS(30), respectively, which indicates that the optimal lags of CCI, EPU, and CPI in the in-sample estimation are 6, 24, and 30 months, respectively. It is worth noting that the change in sample period does not change the type of optimal weight function, but the optimal lag orders are longer in the in-sample estimation (6, 24, and 30 months) than in the full-sample estimation (3, 7, and 24 months), suggesting that TID has become more sensitive to the effects of CCI, EPU, and CPI as a result of the COVID-19 crisis. In addition, these results suggest that the Exp Almon-MIDAS model and the U-MIDAS model outperform the other models.
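The lag-order search by AIC can be sketched for the unrestricted case, where each candidate model is an ordinary least squares regression. This is an illustration on synthetic data under stated assumptions, not the authors' procedure: the helper names, the Gaussian AIC formula $n \log(\mathrm{RSS}/n) + 2k$, and the data-generating process are all hypothetical, and models with Beta or Exp Almon weights would need nonlinear least squares instead.

```python
import numpy as np

def ols_aic(y, X):
    """Gaussian AIC (up to an additive constant) of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    return n * np.log(resid @ resid / n) + 2 * k

def select_lag_by_aic(y, y_lag, X_lags, max_lag):
    """Try K = 1..max_lag high-frequency lags and return the AIC-minimising K.

    X_lags[:, i] holds the i-th monthly lag aligned to each quarter, so every
    candidate model is fitted on the same sample and the AICs are comparable.
    """
    aic = {}
    for K in range(1, max_lag + 1):
        design = np.column_stack([np.ones_like(y), y_lag, X_lags[:, :K]])
        aic[K] = ols_aic(y, design)
    best = min(aic, key=aic.get)
    return best, aic

# Synthetic example: only the first three monthly lags matter.
rng = np.random.default_rng(0)
n = 60
X_lags = rng.normal(size=(n, 6))
y_lag = rng.normal(size=n)
y = 0.5 * y_lag + X_lags[:, :3] @ np.array([1.0, 0.6, 0.3]) + 0.1 * rng.normal(size=n)

best, aic = select_lag_by_aic(y, y_lag, X_lags, max_lag=6)
print(best)
```

The same loop extends to the search range of 1 to 30 lags described above, provided the aligned lag matrix has at least 30 columns.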

Benchmark model comparison
In this section, we divide the whole sample period into the sample estimation period (from 2006Q1 to 2019Q2) and the short-term forecasting period (from 2019Q3 to 2022Q1), and compare the forecasting performance of the selected optimal univariate and multivariate MIDAS models in the COVID-19 period with that of the traditional co-frequency AR and ARDL models. Table 3 presents the root mean square error values of the in-sample predictions of TID for each explanatory variable under the optimal weighting function and its optimal lag order. The results show that, except for the univariate EPU model, the MIDAS models outperform the co-frequency AR and ARDL models in terms of in-sample prediction accuracy. It is worth noting that the in-sample prediction accuracy is lower than the in-sample estimation accuracy due to the effect of COVID-19, especially for EPU, whose optimal lag order for in-sample prediction is shorter. This indicates that during COVID-19 the change of EPU has a relatively small effect on TID, and TID is less sensitive to changes in EPU. (As shown in Table 3, when the EPU indicator is excluded, the M(2)-AR(1)-MIDAS model performs better than the co-frequency AR and ARDL models.) On the contrary, the in-sample prediction accuracy of CCI is better than its in-sample estimation accuracy, indicating that CCI is the main factor influencing TID during the COVID-19 period.

Optimization of MIDAS estimation methods
For the same MIDAS model, different prediction windows can also affect the prediction performance. Based on the three optimal models selected above, we examine the in-sample prediction performance under three prediction windows: fixed, rolling, and recursive. Table 4 presents the RMSE values for the in-sample prediction of 1 to 9 periods ahead based on CCI, EPU, and CPI under different prediction windows. From Table 4, we can see that the overall prediction performance of the MIDAS model is good, with RMSE values kept below 1.6118 and a lowest value of 0.2226. Among the indicators, the prediction performance of CCI is the best, with the smallest and most stable RMSE values, and the prediction performance of EPU is the worst, with relatively high RMSE values. At the same time, the accuracy of the prediction gradually decreases as the prediction period h increases, which indicates that the MIDAS model is more suitable for short-term prediction.
Different prediction windows also have different effects on prediction performance. Generally speaking, the prediction accuracy of the rolling window and the recursive window is better than that of the fixed window, but there are differences across the single-variable predictions. For CCI, the rolling window performs better in the short term, the fixed window performs better in the medium term, and the recursive window performs better in the long term. For EPU, the fixed window performs better in the short term, and the recursive window performs better in the medium to long term. For CPI, the recursive window performs best.
Table 5 shows the RMSE values for the four combined forms of EW, BIC, MSFE, and DMSFE for periods 1 to 9. Based on the results presented in Table 5, the combined forecasting approach performs favorably, with RMSE values that are generally low and stay within a stable and reasonable range. Specifically, the BIC-weighted approach appears to yield the most accurate forecasts, and when applied in conjunction with a short-term rolling window, it exhibits superior performance compared to long-term recursive prediction. After validating the multiperiod in-sample predictions of insurance demand, the satisfactory predictive performance of the MIDAS model suggests that it can be employed to forecast future insurance demand. Adopting the MIDAS model would likely provide valuable insights for decision-makers in the insurance industry.
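The difference between the fixed, rolling, and recursive prediction windows compared in this section can be sketched with a deliberately simple stand-in model. The example below uses an AR(1) regression fitted by least squares on synthetic data; it is not the MIDAS estimation itself, and the function names, window length, and series are assumptions for illustration.

```python
import numpy as np

def fit_ar1(y):
    """OLS estimates (intercept, slope) of y[t] on y[t-1]."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return beta

def one_step_forecasts(y, n_train, scheme="fixed"):
    """One-step-ahead forecasts of y[n_train:], re-estimating per the scheme."""
    y = np.asarray(y, dtype=float)
    preds = []
    for t in range(n_train, len(y)):
        if scheme == "fixed":
            sample = y[:n_train]          # parameters estimated once, never updated
        elif scheme == "rolling":
            sample = y[t - n_train:t]     # constant-length window that slides forward
        elif scheme == "recursive":
            sample = y[:t]                # expanding window that keeps all history
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        a, b = fit_ar1(sample)
        preds.append(a + b * y[t - 1])
    return np.array(preds)

# Synthetic AR(1) series (hypothetical data).
rng = np.random.default_rng(1)
y = np.zeros(80)
for t in range(1, 80):
    y[t] = 0.7 * y[t - 1] + rng.normal()

n_train = 40
preds = {s: one_step_forecasts(y, n_train, s) for s in ("fixed", "rolling", "recursive")}
for s, p in preds.items():
    err = np.sqrt(np.mean((p - y[n_train:]) ** 2))
    print(s, round(float(err), 4))
```

All three schemes use identical data for the first forecast and diverge afterwards, which is why their relative accuracy can change with the horizon and with structural breaks in the sample.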

Nowcasting and forecasting
The preceding discussion has established the significant advantage of the MIDAS model in terms of forecast accuracy. In this section, we present evidence that the MIDAS model is capable of nowcasting and forecasting. Using estimates based on the full sample data, Table 6 presents nowcasts and forecasts of China's insurance demand for the period between 2022Q2 and 2023Q1. The main findings are as follows:
1. Overall, the model's performance weakens gradually as the horizon increases. This is because the amount of recent information available gradually decreases as the horizon increases, leading to a decrease in the accuracy of predictions. Nevertheless, the forecast values remain within a reasonable range.
2. There is significant volatility in the results of the EPU forecast, which can carry a risk of distortion. The TID values of the CPI and CCI forecasts remain relatively stable, and the combination and multivariate predictions outperform the univariate predictions.
3. The TID value has increased to some extent compared to the level of 3.87 in 2022Q1 and is showing a favorable trend, with the expectation of recovering to the pre-COVID-19 level in 2023Q2. This is mainly due to the normalization of COVID-19, the orderly implementation of various stabilization policies by the Chinese government, the gradual stabilization of social indicators, the continuous increase in people's confidence in the economy, and the continuous improvement in insurance demand.

Conclusion and policy implications
In this paper, we introduce the MIDAS model into the field of insurance demand forecasting. We select monthly high-frequency indicators, i.e., CCI, EPU, and CPI, and build a basic MIDAS model by choosing the optimal weighting function and its lag order. In addition, we investigate the impact of different forecast windows and combination forecast forms on the forecasting performance. To verify the superiority of the selected MIDAS model, we compare its forecasting performance with that of the co-frequency AR model and the ARDL model. Finally, the optimal model is used for nowcasting and short-term forecasting of future TID values in China.
The following main conclusions are drawn from this study. First, the MIDAS model exhibits higher prediction accuracy than the co-frequency AR and ARDL models. Specifically, rolling-window and recursive-window prediction improve the prediction performance, while combined prediction stabilizes the prediction results. In addition, the multivariate MIDAS model outperforms the univariate MIDAS model, and the forecasting performance of both models decreases with increasing forecasting horizon, suggesting that the MIDAS model is suitable for short-term forecasting. Second, CCI, EPU, and CPI have different explanatory power and predictive effects on TID. Among them, CCI has the best predictive effect and the strongest explanatory power for TID, especially when COVID-19-related uncertainty is included in the estimation interval: the optimal lag order of CCI becomes shorter and the accuracy of the in-sample prediction improves, suggesting that CCI is the main factor influencing the level of TID during the COVID-19 period and that TID is more sensitive to changes in CCI. Third, from the perspective of nowcasting and short-term forecasting, China's TID is expected to return to its pre-COVID-19 level in 2023Q1, suggesting that TID will show a positive trend in the near future.
In conclusion, the within-sample forecast results show that the mixed-frequency data model has a comparative advantage over the same-frequency data model in the within-sample prediction of China's insurance demand, and the estimation and prediction results of the model indicate that consumer confidence is an important factor affecting China's insurance demand in the COVID-19 period; the real-time forecast results suggest that China's insurance demand will maintain good growth in the post-epidemic era.
The above findings provide several policy insights. First, the Chinese government should refer to consumer confidence indicators in the process of formulating and implementing insurance policies; stabilizing consumer confidence is an important way to secure effective insurance demand. Second, policymakers should combine behavioral indicators with mixed-frequency models, which can provide a better grasp of insurance demand by using the effective information provided by mixed-frequency data. Finally, in the post-epidemic era, China's insurance market has better prospects, and policymakers should plan in advance to optimize insurance demand-side and supply-side reforms and safeguard the high-quality development of China's insurance industry.
This study also has some limitations. For example, this paper only judges whether there is a long-term relationship between CCI, EPU, CPI, and TID from the impulse response results; it does not explore whether there is bidirectional causality between the variables or whether there is a problem of covariance. In addition, this paper does not discuss the limitations of the MIDAS weighting functions and their impact on the prediction effect. These issues are not the focus of this paper and do not affect the validity of its conclusions.

Fig 2. Response of TID to CCI, EPU, and CPI. (a) Presents the impulse response of insurance demand TID to the macroeconomic variable CCI. (b) Presents the impulse response of insurance demand TID to the macroeconomic variable EPU. (c) Presents the impulse response of insurance demand TID to the macroeconomic variable CPI. (d) Presents the impulse response of insurance demand TID to the insurance demand TID. https://doi.org/10.1371/journal.pone.0305523.g002

Fig 5. Comparison of in-sample predictions. The figure presents the forecasts of the optimal co-frequency models and mixed-frequency models (univariate and multivariate) compared to the true values for the period 2019Q3-2022Q1. Following the legend from top to bottom, the series are the true values, the AR model forecasts, the ARDL model (CCI, EPU and CPI) forecasts, the Exp Almon-MIDAS (CCI) model forecasts, the U-MIDAS (EPU) model forecasts, the Exp Almon-MIDAS (CPI) model forecasts, the M-MIDAS (CCI, EPU and CPI) model forecasts, and the M-MIDAS (CCI and CPI) model forecasts. All forecasts are based on the optimal lag order and the optimal weight function. The graphs show that the MIDAS model predictions are generally better than the co-frequency model predictions. https://doi.org/10.1371/journal.pone.0305523.g005

Table 4. RMSE values of in-sample predictions under different prediction windows.
The black font indicates the optimal forecast level for the same period.https://doi.org/10.1371/journal.pone.0305523.t004