AREA SPECIFIC EFFECTS SELECTION OF SMALL AREA ESTIMATION FOR CONSTRUCTION OF REGIONAL CONSUMER PRICE INDICES IN INDONESIA

PUSPONEGORO, KURNIA, NOTODIPUTRO, SOLEH, ASTUTI

Small area estimation (SAE) techniques are now widely used to produce parameter estimates for small domains where sample sizes are too small to support direct estimation, such as regional consumer price indices (CPIs). Area-specific effects play an important role in attaining reliable parameter estimates in SAE, so their inclusion in or exclusion from the model should be examined carefully to obtain accurate predictions. In practice, it is common to analyze large-scale data in which the normality assumption of SAE models may be violated because the area-specific effects are sparse. In this research, we consider that the number of items in the regional baskets of goods and services varies across districts; hence, the calculation of regional CPIs requires different regional baskets of goods and services. To calculate such indices, a normL1 penalty for area-specific effects selection is proposed. The response variables are the CPIs of January 2018, and the auxiliary variables are infrastructure and resource indicators available in the 2018 village potential census (PODES 2018). Based on the empirical relative RMSE and the area-specific effect selection performance, the proposed model is suitable for predicting regional CPIs by group of expenditure in the sampled districts. Finally, based on the predicted regional CPIs of all districts/cities in Indonesia, the general CPI of Indonesia for January 2018 can be calculated.


INTRODUCTION
Consumer Price Indices (CPIs) are important indicators of monetary stability and inflation. In Indonesia, CPIs are officially calculated at the national level and for 82 sampled districts in urban areas. However, demand for regional CPIs (RCPIs) covering all districts is growing as the districts develop. RCPIs reflect differences in regional inflation rates and in the regional baskets of goods and services. Regional commodity baskets vary in the number and type of goods and services they contain. In Indonesia, the commodity baskets are grouped according to the 1999 Classification of Individual Consumption According to Purpose (COICOP). If the number of goods and services is treated as the sample size for RCPI prediction, the RCPIs may be poorly estimated, because some districts have relatively few or no goods and services compared with the estimates at the national level. One proposed solution to this problem is to use the Small Area Estimation (SAE) method.
Small area estimation is the estimation of population parameters for small areas or domains, where the sample size for a particular area or domain is too small to provide a valid estimate. The SAE area-level model for estimating CPIs has been demonstrated in various research papers, for example to set up a regional basket with higher accuracy in the United Kingdom [1], where LASSO regression was used for auxiliary variable selection. The SAE method has also been applied to estimate consumer expenditure for constructing the regional CPI [2], and to predict district CPIs in the West Java and Maluku Provinces, Indonesia [3].
Area-specific effects play an important role in attaining reliable parameter estimates in SAE. Their values are insignificant when the areas have similar and large sample sizes, so we need to select the actual small areas to obtain reliable estimates. The selection process sets the area-specific effect to zero for an area with a sample large enough to support direct estimation, whereas it preserves a nonzero value for a small area. This sparsity of the area-specific effects brings heavy tails and violates the normality assumption [4,5]. Thus, the existing SAE methods cannot handle the complexity of the area-specific effects. The small area estimation with automatic random effects selection (SARS) model is a linear mixed model that was developed to address the issue of selecting random effects, or area-specific effects [6].
The SARS model uses a hard-ridge penalty to select the area-specific effects with an iterative selection-estimation algorithm. One component of the hard-ridge penalty is the normL0, which is discrete and non-convex. SAE with the normL1 penalty was developed to shrink the parameter estimates and select the area-specific effects. That research focused on developing a small area estimation model that yields accurate predicted values of the response. In the simulations, the proposed model gave the smallest mean square error and performed well in shrinking the area-specific effects [7].
This study is conducted to investigate the parameter estimation accuracy of the SARS model and of SAE with the normL1 penalty for area-specific effects selection, to select the relevant area-specific effects, to estimate the regional CPIs by group of expenditure for the sampled districts in Indonesia, and to predict the regional CPIs by group of expenditure for the non-sampled districts in Indonesia. This paper is organized as follows: Section 2 discusses the Small Area Estimation with Automatic Random Effects Selection (SARS) model, Section 3 presents SAE with the normL1 penalty as the proposed method, Section 4 describes the data set and the preprocessing used to construct the regional CPIs in Indonesia, Section 5 presents the results, and Section 6 gives the conclusions and future work of this study.

MODEL
The small area estimation area-level model is known as the Fay-Herriot (FH) model. It is a special form of the linear mixed model, which can be written as follows:

$$\hat{\boldsymbol{\theta}} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u} + \mathbf{e}, \qquad (1)$$

where $\boldsymbol{\theta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$ is the $m \times 1$ vector of the parameters of inferential interest, for which the direct estimator $\hat{\boldsymbol{\theta}}$ is assumed to be available, and $\mathbf{e}$ is a vector of independent sampling errors with mean vector $\mathbf{0}$ and known diagonal variance matrix $\mathbf{D} = \mathrm{diag}(\sigma_{e_i}^2)$, with $\sigma_{e_i}^2$ representing the sampling variance of the direct estimator of the i-th area. Moreover, $\boldsymbol{\beta}$ is the $p \times 1$ vector of fixed-effects parameters of the auxiliary variables $\mathbf{X}$, and $\mathbf{u}$ is the $m \times 1$ vector of independent area-specific effects with zero mean and $m \times m$ covariance matrix $\boldsymbol{\Sigma}_u = \sigma_u^2 \mathbf{I}_m$, where $m$ is the number of small areas [8].
Based on the FH model, which can be presented as a linear model as in formula (1), let $S$ be the set of small areas. Then $u_i \sim (0, \sigma_u^2)$ for $i \in S$ and $u_i = 0$ for $i \notin S$; that is, we assume that the area-specific effects are sparse. An SAE method developed to address this issue is the SARS model. SARS is a small area model that selects the area-specific effects by employing the hard-ridge penalty as the penalty function, as stated in formula (2).
where $\lambda_0$ is the tuning parameter for the hard penalty, used to optimize the SARS prediction (3).
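The sparsity assumption on the area-specific effects can be made concrete with a small simulation. The sketch below is illustrative only: the function name `simulate_sparse_fh`, the fraction of true small areas, the normal distributions and the range of the sampling variances are all assumptions chosen for the example, not values from this study.

```python
import numpy as np

def simulate_sparse_fh(m=82, p=3, frac_small=0.1, sigma_u=2.0, seed=0):
    """Simulate theta_hat = X beta + u + e with sparse area-specific effects.

    All settings are hypothetical defaults for illustration.
    """
    rng = np.random.default_rng(seed)
    # Design matrix: intercept plus (p - 1) auxiliary variables.
    X = np.column_stack([np.ones(m), rng.normal(size=(m, p - 1))])
    beta = rng.normal(size=p)
    # Only the areas in S (the "true" small areas) get a nonzero effect.
    n_small = int(round(frac_small * m))
    small = rng.choice(m, size=n_small, replace=False)
    u = np.zeros(m)
    u[small] = rng.normal(scale=sigma_u, size=n_small)
    # Sampling variances are treated as known, one per area.
    sigma_e2 = rng.uniform(0.5, 1.5, size=m)
    e = rng.normal(scale=np.sqrt(sigma_e2))
    theta = X @ beta + u          # parameters of interest
    theta_hat = theta + e         # direct estimates
    return X, beta, u, sigma_e2, theta, theta_hat
```

With the defaults, about 10% of the 82 simulated areas carry a nonzero effect, which mimics the situation where most areas have enough data for direct estimation.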
The optimization problem in SARS is challenging because the hard-ridge penalty is non-convex and non-smooth. It is therefore necessary to develop penalized SAE methods that both select the area-specific effects and shrink their coefficients at the same time, for example by using the normL1 penalty.

SELECTION MODEL
The two components of the penalty employed in the SARS model do not interact with each other: the normL2 penalty contributes to shrinkage estimation, while the normL0 penalty is solely responsible for sparsity.
On the other hand, the normL1 penalty can play the role of the normL0 regularization and also shrink the parameter estimates to improve prediction performance. Based on the FH model in equation (1) and the objective function of the SARS model in equation (3), the objective function of the small area estimation model with area-specific effects selection under the normL1 penalty can be written as in formula (4), where $\lambda \geq 0$ is the tuning parameter for the normL1 penalty. Computationally, this is a convex optimization problem, so the area-specific effect estimates can be obtained with the coordinate gradient descent approach, which is known to be an efficient algorithm [9].
Under the assumption that $\mathbf{u}$ is sparse, and following the perspective of predictive learning [10] and the predictive information criterion [11], the SARS prediction information criterion (PIC) is used as the criterion for choosing the tuning parameter that is optimal for prediction. Since the error-model variances are unknown, the application uses the SARS-PIC that is free of the sampling error variance, which can be expressed as in formula (5), with its constants set on the basis of the simulation results and empirical experiments in [6]. The criterion involves the variance of the error model $\sigma^2$ (2,3) and the value of $\Delta(\cdot)$, which is defined in terms of the number of rows of area-specific effects whose coefficient values are not equal to zero; these rows indicate the true small areas, that is, the areas with insufficient samples.
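To illustrate how a normL1 penalty both selects and shrinks the area-specific effects, the sketch below minimizes one plausible, unweighted form of the objective, $\tfrac{1}{2}\lVert \hat{\boldsymbol{\theta}} - \mathbf{X}\boldsymbol{\beta} - \mathbf{u} \rVert_2^2 + \lambda \lVert \mathbf{u} \rVert_1$, by alternating between the two blocks of coordinates. This form is an assumption made for the example and may differ from formula (4), for instance in how the sampling variances enter; the names `soft_threshold` and `fit_sae_l1` are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of lam * |.|: shrink toward zero, exact zeros below lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def fit_sae_l1(X, y, lam, n_iter=500, tol=1e-8):
    """Block coordinate descent for 0.5*||y - X b - u||^2 + lam*||u||_1.

    Assumed, unweighted objective for illustration only.
    """
    m, p = X.shape
    beta = np.zeros(p)
    u = np.zeros(m)
    for _ in range(n_iter):
        # beta-step: least squares on the response with current effects removed.
        beta_new, *_ = np.linalg.lstsq(X, y - u, rcond=None)
        # u-step: closed-form soft-thresholding of the residuals.
        u_new = soft_threshold(y - X @ beta_new, lam)
        change = max(np.max(np.abs(beta_new - beta)), np.max(np.abs(u_new - u)))
        beta, u = beta_new, u_new
        if change < tol:
            break
    return beta, u
```

Areas whose residual stays below $\lambda$ in absolute value receive an effect of exactly zero, and the remaining effects are pulled toward zero by $\lambda$. The objective is jointly convex in $(\boldsymbol{\beta}, \mathbf{u})$, which is why simple alternating updates are adequate here.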

DATA SETS
In the SAE model with area-specific effects selection, we take into account the number of commodities in a regional basket of goods and services. This number represents the sample size of each area, and its distribution by group of expenditure and by city can be seen in Figure 1. Figure 1 shows the overall counts of goods and services in the sampled cities: the most commodity packages are in DKI Jakarta, and Transport, Communication and Financial Services is mostly the group with the maximum commodity count. It also identifies the city with the largest count of sampled commodities in each group. For example, Medan has the largest count of sampled commodities in Foodstuff, and Sorong has the largest count in Prepared Food, Beverages, Cigarettes and Tobacco. Moreover, Sibolga, Padang Sidempuan, Bandung, Pontianak and DKI Jakarta are the cities with the largest commodity packages in COICOP3 to COICOP7.

PREPROCESSING DATA
In Indonesia, the CPI is calculated from the results of processing consumer price data in each city.
Direct estimators are needed as inputs for the small area estimation of the CPI. In this study, the direct estimator is obtained from the collected consumer price data, which cover the goods and services whose quality/brands are generally consumed by the people in the respective city. The consumer price data are obtained from selected respondents/retailers. This means that the probability of sampling districts/cities and commodities of goods or services is not known, even though this probability is needed to determine the weights for estimating a statistic.
The RCPIs by group of expenditure are estimated with a weighted average formula that uses the weights from the 2012 CLS. The direct estimator obtained by this method is called the weighted proportional sampling (WPS) estimator in this study. The WPS estimator and its mean square error are expressed in terms of $w_{ijg}$ and $\hat{\theta}_{ijg}$, the weight and the direct estimate of the regional CPI of the j-th subgroup of expenditure in the i-th district and the g-th group of expenditure, and of $n_{ig}$, the number of expenditure subgroups in the i-th district and the g-th group of expenditure, with $j = 1,2,\ldots,n_{ig}$, $i = 1,2,\ldots,82$ and $g = 1,2,\ldots,7$. Based on Figure 2 and Table 1, it can be summarized that the WPS method gives precise predictions of the regional CPIs by group of expenditure. The WPS prediction values are therefore used as the direct estimates of the regional CPI by group of expenditure in this study.
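As a small worked illustration of a weighted average of subgroup indices, the sketch below aggregates hypothetical subgroup CPIs for one district and one group of expenditure. The function name `weighted_cpi`, the normalization by the sum of the weights and the numbers are assumptions made for the example; the exact WPS formula and its mean square error are not reproduced here.

```python
import numpy as np

def weighted_cpi(sub_cpi, weights):
    """Weighted average of subgroup CPIs for one district and one group.

    Illustrative only: assumes the weights are non-negative and are
    normalized by their sum.
    """
    sub_cpi = np.asarray(sub_cpi, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if sub_cpi.shape != weights.shape:
        raise ValueError("sub_cpi and weights must have the same shape")
    total = weights.sum()
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return float(np.sum(weights * sub_cpi) / total)

# Hypothetical subgroup indices and weights for a single district and group.
group_cpi = weighted_cpi([120.0, 130.0, 110.0], [0.5, 0.3, 0.2])
```

Here the three subgroup indices combine to 0.5·120 + 0.3·130 + 0.2·110 = 121.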

MAIN RESULTS
The first aim of this study is to investigate the prediction accuracy of SAE with the normL1 penalty and to select the relevant area-specific effects. SAE with the normL1 penalty for area-specific effects selection (SAEL1) is the proposed model, and we compare it with SARS and three EBLUPs for small areas. We consider the Fay-Herriot model in (1) with $\theta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i$; $i = 1, 2, \ldots, m$. Assuming $\sigma_u^2$ and $\boldsymbol{\beta}$ are known, the best predictor (BP) of $\theta_i$ is given by:

$$\tilde{\theta}_i^{BP} = \gamma_i \hat{\theta}_i + (1 - \gamma_i)\,\mathbf{x}_i^{\top}\boldsymbol{\beta},$$

where $\hat{\theta}_i$ is the direct estimator and $\gamma_i = \sigma_u^2 / (\sigma_u^2 + \sigma_{e_i}^2)$. In small area estimation, the sampling variances $\sigma_{e_i}^2$ are assumed to be known; however, both $\sigma_u^2$ and $\boldsymbol{\beta}$ are unknown in practice. Therefore, $\boldsymbol{\beta}$ is estimated by replacing $\sigma_u^2$ with its maximum likelihood estimator (MLE), and the empirical best linear unbiased predictor (EBLUP) of $\theta_i$ can be written as:

$$\hat{\theta}_i^{EBLUP} = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\,\mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}}, \qquad \hat{\gamma}_i = \hat{\sigma}_u^2 / (\hat{\sigma}_u^2 + \sigma_{e_i}^2),$$

where the variance of the area-specific effects $\sigma_u^2$ can be estimated with the maximum likelihood (ML) estimator, among other estimators.

The empirical relative RMSEs are presented in Table 2, from which we can assess the prediction accuracy of the proposed model and the previous SAE models. SAE with the normL1 penalty for area-specific effects selection, the proposed model, gives a smaller empirical relative RMSE than the other models. It also performs a selection of the area-specific effects, retaining only the relevant ones. In the application data, the rate of nonzero area-specific effects selected by the proposed method is about 10% for each group of expenditure RCPIs. Based on the empirical relative RMSE and the area-specific effect selection performance, we can conclude that SAE with the normL1 penalty is suitable for predicting RCPIs by group of expenditure in the sampled cities.
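The shrinkage form of the predictor under the Fay-Herriot model can be sketched numerically as follows. The function `fh_eblup` is a hypothetical helper written for this illustration: it takes an estimate of the variance of the area-specific effects as given (the ML estimation step is not shown) and uses the standard weighted least squares estimate of the fixed effects.

```python
import numpy as np

def fh_eblup(X, y, sigma_e2, sigma_u2):
    """Plug-in predictor gamma_i * y_i + (1 - gamma_i) * x_i' beta_hat.

    sigma_u2 is supplied by the caller, e.g. from ML estimation
    (not implemented in this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma_e2 = np.asarray(sigma_e2, dtype=float)
    v = sigma_u2 + sigma_e2                  # marginal variance of each y_i
    w = 1.0 / v
    # Weighted least squares estimate of the fixed effects.
    beta_hat = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    gamma = sigma_u2 / v                     # shrinkage factor per area
    theta = gamma * y + (1.0 - gamma) * (X @ beta_hat)
    return theta, beta_hat, gamma

# Tiny example: intercept-only model, equal sampling variances.
theta, beta_hat, gamma = fh_eblup(
    np.ones((3, 1)), [1.0, 2.0, 3.0], sigma_e2=np.ones(3), sigma_u2=1.0
)
```

In this toy case each shrinkage factor is 0.5, so every direct estimate is pulled halfway toward the fitted mean of 2. An area with a large sampling variance would be pulled further toward the regression prediction, and an area with a zero area-specific variance would coincide with it.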
This study also derives the total RCPI predictions for the sampled cities in Indonesia. Figure 3 displays the distribution of the direct estimates of the RCPIs and of the values predicted by the SARS method and by SAE with the normL1 penalty. The comparison in Figure 3 indicates that the SAE with normL1 model predicts the regional CPIs by group of expenditure and the total regional CPIs quite well, and this carries over to the prediction of the general CPI of Indonesia.

Figure 3. Regional CPIs Prediction of Direct Estimation, SARS and SAEL1 Method
For the last objective of the study, we predict the general CPI of Indonesia using the RCPIs of all districts/cities. To do so, we derive the regional CPIs by group of expenditure and the total regional CPIs of the non-sampled districts/cities in Indonesia. To predict the RCPIs by group of expenditure of the non-sampled districts/cities, we use the maximum value of the area-specific effects of each province in the prediction formula. Furthermore, we use the mean of the weights of the sampled districts/cities by group of expenditure to compute the total RCPI predictions of the non-sampled districts/cities. Figure 4 presents the total RCPI predictions for the districts/cities in Indonesia. Under the direct estimate and the WPS method, the total RCPI predictions of the non-sampled districts/cities are the mean of those values for each province. Thus, the predicted values of the non-sampled districts/cities within a province are all the same, and they are displayed as a straight line in the figure. On the other hand, the RCPI values predicted by SAE with the normL1 penalty vary. The results are categorized by island in Indonesia and displayed in Figure 4. Figure 4 shows price heterogeneity across provinces and, even more so, between regions in Indonesia. For Indonesia, the consumer price index measures the pure price change in a selected basket of goods and services (of constant quantity and quality) typically purchased by Indonesian households. Thus, the diversity suggests that some cities move aggregate prices more than others [13,14]. Figure 4 gives evidence that price stability occurs in the districts/cities of the middle region of Indonesia, while heterogeneity in the prices of the basket of goods and services occurs in the districts/cities of the frontier, outermost and least developed regions, often referred to as 3T (terdepan, terluar dan tertinggal) regions.
Finally, based on the predicted regional CPIs of all districts/cities in Indonesia, we calculate the general CPI of Indonesia for January 2018 as 131.4059, with a standard deviation of 23.4684.
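One way to read the prediction rule for non-sampled districts/cities is as a synthetic regression prediction plus the largest estimated area-specific effect among the sampled districts of the same province. The sketch below follows that reading; the additive form, the helper name `predict_non_sampled` and the fallback of zero for provinces without any sampled district are assumptions made for this illustration.

```python
import numpy as np

def predict_non_sampled(X_new, prov_new, beta_hat, u_hat, prov_sampled):
    """Predict x' beta_hat + (max estimated effect in the same province).

    Hypothetical reading of the rule described in the text.
    """
    X_new = np.asarray(X_new, dtype=float)
    u_hat = np.asarray(u_hat, dtype=float)
    prov_sampled = np.asarray(prov_sampled)
    # Largest estimated area-specific effect among sampled districts, by province.
    max_u = {
        k: float(u_hat[prov_sampled == k].max())
        for k in set(prov_sampled.tolist())
    }
    # Assumption: a province with no sampled district gets an offset of zero.
    offsets = np.array([max_u.get(k, 0.0) for k in prov_new])
    return X_new @ np.asarray(beta_hat, dtype=float) + offsets

# Hypothetical example with two provinces, "A" and "B".
pred = predict_non_sampled(
    X_new=[[1.0, 2.0], [1.0, 3.0]],
    prov_new=["A", "B"],
    beta_hat=[100.0, 1.0],
    u_hat=[0.0, 2.0, -1.0],
    prov_sampled=["A", "A", "B"],
)
```

Because the auxiliary variables differ between districts, predictions built this way vary within a province, unlike a provincial mean of direct estimates.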

CONCLUSION
This paper constructs the general CPI of Indonesia using the predicted RCPIs of all districts/cities. The predicted RCPIs are derived from SAE with the normL1 penalty, the proposed model. The proposed model gives good results in predicting the RCPIs by group of expenditure for both sampled and non-sampled districts/cities. It also performs well in shrinking and selecting the area-specific effects in the application data, since it yields a small rate of nonzero area-specific effects. Furthermore, using the maximum value of the area-specific effects of each province in the prediction formula performs well when compared with the official CPIs published by BPS-Statistics Indonesia. The prediction results display price heterogeneity across districts/cities. However, formal price tests are needed to investigate whether each province contains cities that contribute more to price changes, since better price control in these leader cities will allow faster convergence to price stability. SAE with the normL1 penalty borrows the strength of the auxiliary variables of the 82 sampled cities to calculate the total RCPI predictions of the non-sampled districts/cities. The heterogeneity of the predicted RCPIs might be caused by the selection of auxiliary variables in the model, especially in this "Big" data era. Database sizes and technology have developed rapidly, offering high-dimensional auxiliary variables for the SAE approach. Future work will address these issues.

CONFLICT OF INTERESTS
The author(s) declare that there is no conflict of interests.