A NOVEL APPROACH FOR HIGH-PERFORMANCE HEAT INDEX FORECASTING FOR THE HOTTEST REGION IN THAILAND

In Thailand, the hottest area is the northern inland plains of Northern Thailand. In summer, from mid-February to mid-May, the temperature may rise to 40 °C or above because of the change from the northeast monsoon to the southwest monsoon and the impact of relative humidity. Relative humidity is the major factor that makes people feel hotter than the actual temperature. If only the air temperature is considered, there is a risk of overheating and heat illness, especially heat stroke, which can be deadly. The heat index, calculated with Steadman's equation, gives the temperature as it is felt by the body and has been used as an effective warning measure. To prevent heat illness, predictive analytics such as time series forecasting should be applied. A regular series is constructed from consecutive daily heat index values, so seasonal and cyclical effects have to be analyzed simultaneously; this leads to a complicated time series model and may cause inaccurate forecasts. The proposed study restructures the data as series for a specific date and time over thirty years, i.e., 1 April to 30 April at 4.00 p.m., which reveals a distinct year-on-year increasing trend. Three trend-focused forecasting models are applied: two benchmark models, Holt's linear trend model and the time series regression model, are compared with the proposed model, called autocorrelation-based decomposition. The forecasting results for the heat index of Uttaradit and Chiang Mai provinces over the most recent thirty years show that the proposed approach is more accurate than the benchmarks. For Uttaradit, the MAPE of the proposed model is 26.8% to 36.9% lower than that of the benchmarks, and its RMSE is 48.9% to 55.9% lower. For Chiang Mai, the MAPE of the proposed model is 16.9% to 36.9% lower than that of the benchmarks, and its RMSE is 27.5% to 61.6% lower.

P. YODPIBUL, N. KONGCHOUY, T. PANITYAKUL
∗Corresponding author. E-mail address: nootchanath.k@psu.ac.th
Received April 23, 2021


INTRODUCTION
The hottest area in Thailand is the area away from the coast known as the northern inland plains, specifically Northern Thailand. In December and January, Northern Thailand has its lowest temperatures, especially at night, when the temperature can drop to 5 °C or below because of the high atmospheric pressure from China. In contrast, in the summer period from February to May, the temperature may rise to 40 °C or above because of the change from the northeast monsoon to the southwest monsoon. The major factor that gives this region the highest fluctuation in yearly temperatures is the impact of relative humidity. Figure 1 shows the relative humidity in Thailand for the last thirty years (1991-2020); the northern inland region has high variation compared with the other regions. The Southeastern and Southwestern regions have quite stable relative humidity, which leads to small changes in temperature. The Northern region has a wider spread of relative humidity than the others, which makes its summer temperature higher than that of the other regions, especially in Uttaradit province, which has the highest temperature in Thailand.

FIGURE 1. Variation of relative humidity in Thailand in 1991-2020

In addition, relative humidity is the major factor that makes people feel hotter than the actual temperature. As relative humidity increases, the rate of sweat evaporation decreases, so it feels warmer outside than it actually is. If only the air temperature is considered, there is a risk of overheating and heat illness, especially heat stroke, which can be deadly [20]. To prevent illness from hot weather, we need to know the apparent temperature, or real-feel temperature, instead of the air temperature. This type of temperature is called the heat index, i.e., the combination of air temperature and relative humidity, and it is used to measure the temperature as it is actually felt.
Hot weather leads to illnesses such as heat syncope, heat cramps, heat exhaustion and heat stroke [2], [3], [19], not only for tourists but also for workers, as mentioned in [4] and [18]. Furthermore, heat index information can be used as a scientific tool to support the construction of heat warning systems [7], [8], which in turn serve as a planning tool for heat in vulnerable areas, as mentioned in [4] and [5]. This study provides a high-performance heat index forecasting technique called autocorrelation-based decomposition (ACD) and compares it with Holt's linear trend model (HLT) [10], [11] and the classical decomposition model (CDM) [16], [17].

DATA AND METHOD
Uttaradit province (UT) and Chiang Mai province (CM) are in the northern inland region of Thailand. From 2001 to 2016, the maximum temperatures in Thailand rose from 38-41 °C to 42-44 °C, as mentioned in [4] and [7]. These provinces are suitable for a heat index study because Uttaradit has been the hottest province in Thailand in many recent years, and Chiang Mai is a famous destination for tourists from temperate-zone countries and has an international airport. Forecasting the heat index in both provinces is highly advantageous for preventing illness from hot weather. Since the heat index is a combination of air temperature and relative humidity, Steadman's equation [9], [22] describes its calculation as shown in (1):

$$HI = -42.38 + 2.049\,T + 10.14\,RH - 0.2248\,T\,RH - 6.838\times10^{-3}\,T^{2} - 5.482\times10^{-2}\,RH^{2} + 1.228\times10^{-3}\,T^{2}RH + 8.528\times10^{-4}\,T\,RH^{2} - 1.99\times10^{-6}\,T^{2}RH^{2} \tag{1}$$

where HI: heat index (°F); T: dry bulb temperature (°F); RH: relative humidity (%).
Because the weather and meteorological stations in Thailand provide only air temperature data and not the heat index itself, we applied Steadman's equation [9] to compute the heat index of Uttaradit and Chiang Mai provinces and used the resulting heat index data as the original series.
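As a rough illustration of the heat index computation in (1), the following sketch evaluates the regression in Python. The full-precision coefficients are the ones commonly published for this regression by the US National Weather Service; treating them as the unrounded form of (1) is an assumption, and the function names and the Celsius helpers are hypothetical conveniences, not part of the original study.

```python
def heat_index_f(temp_f: float, rh: float) -> float:
    """Heat index in °F from dry bulb temperature (°F) and relative humidity (%).

    Coefficients: commonly published full-precision form of the regression
    in (1) (assumed here to match the rounded values in the text).
    """
    t, r = temp_f, rh
    return (-42.379
            + 2.04901523 * t
            + 10.14333127 * r
            - 0.22475541 * t * r
            - 6.83783e-3 * t * t
            - 5.481717e-2 * r * r
            + 1.22874e-3 * t * t * r
            + 8.5282e-4 * t * r * r
            - 1.99e-6 * t * t * r * r)


def c_to_f(temp_c: float) -> float:
    """Convert °C to °F."""
    return temp_c * 9.0 / 5.0 + 32.0


def f_to_c(temp_f: float) -> float:
    """Convert °F to °C."""
    return (temp_f - 32.0) * 5.0 / 9.0


def heat_index_c(temp_c: float, rh: float) -> float:
    """Heat index in °C for a temperature in °C (hypothetical helper)."""
    return f_to_c(heat_index_f(c_to_f(temp_c), rh))


# Example call with illustrative inputs (not data from the study).
example_hi = heat_index_c(38.0, 45.0)
```

Station temperatures reported in °C need the conversion step, because (1) is defined for °F.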
The data are 30-year time series of the daily heat index between April 15th and May 15th, the hottest period in Thailand. This period is a reasonable choice because it has the highest temperatures in every region of Thailand, especially in the afternoon, say at 4.00 p.m. Another reason is to avoid the complicated time series models that are needed to predict consecutive data points within this period when they are joined year after year. For these reasons, we have 30 series of heat index to forecast, i.e., the series of heat index on April 15th at 4.00 p.m. from 1991 to 2020, the series of heat index on April 16th at 4.00 p.m. from 1991 to 2020, and so on, as shown in Table 1.

Uttaradit has lower variation than Chiang Mai, which may lead to considerably different final models for the two provinces.
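The series construction described above, one yearly series per calendar date at a fixed hour, can be sketched as follows. The input layout (pairs of timestamp and heat index), the function name and the toy values are all assumptions made for illustration; they are not the study's data or code.

```python
from collections import defaultdict
from datetime import datetime


def build_date_series(records, hour=16):
    """Group readings taken at `hour` into one yearly series per calendar date.

    records: iterable of (datetime, heat_index) pairs (hypothetical layout).
    Returns {(month, day): [heat index values ordered by year]}.
    """
    by_date = defaultdict(list)
    for stamp, heat_index in records:
        if stamp.hour == hour:  # keep only the 4.00 p.m. readings
            by_date[(stamp.month, stamp.day)].append((stamp.year, heat_index))
    # Sort each group by year and drop the year, leaving a plain series.
    return {key: [hi for _, hi in sorted(vals)] for key, vals in by_date.items()}


# Made-up readings: three years, two dates, plus one reading at another hour.
records = [
    (datetime(2019, 4, 15, 16), 41.0),
    (datetime(2018, 4, 15, 16), 40.2),
    (datetime(2020, 4, 15, 16), 42.1),
    (datetime(2018, 4, 16, 16), 39.8),
    (datetime(2019, 4, 16, 16), 40.5),
    (datetime(2020, 4, 16, 16), 41.3),
    (datetime(2020, 4, 16, 10), 35.0),  # dropped: not taken at 4.00 p.m.
]
series = build_date_series(records)
```

Each value of `series` is then a short annual series, so the within-year seasonal pattern no longer appears in it.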
The complicated time series models are avoided mainly because they have to handle seasonality, which is the major task in such models. With the time series structure described above, only the trend, cycle and irregular components remain in each series, so it is appropriate to apply time series analysis methods that focus on trend.
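We applied HLT and CDM as benchmark models. The HLT can be written as separate equations for the level, the trend and the forecast,

$$a_i = \vartheta z_i + (1-\vartheta)(a_{i-1} + m_{i-1}) \tag{2}$$

$$m_i = \zeta (a_i - a_{i-1}) + (1-\zeta)\, m_{i-1} \tag{3}$$

$$\hat{z}_{i+h} = a_i + h\, m_i \tag{4}$$

where $z_i$ is the observation at time $i$, $a_i$ stands for the level estimate at time $i$, $m_i$ is the trend estimate at time $i$, $h$ is the forecast horizon, $\vartheta$ is the smoothing parameter of the level and $\zeta$ is the smoothing parameter of the trend, with $0 \le \vartheta \le 1$ and $0 \le \zeta \le 1$.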
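A minimal sketch of Holt's linear trend recursion, in its standard textbook form, is given below. The argument names `level_w` and `trend_w` stand in for ϑ and ζ, and the initialisation (level set to the first value, trend set to the first difference) is an assumption, since the initial values are not stated in the text.

```python
def holt_linear(series, level_w, trend_w, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear trend method.

    level_w: smoothing parameter of the level (0 <= level_w <= 1).
    trend_w: smoothing parameter of the trend (0 <= trend_w <= 1).
    Initial level and trend are assumptions (first value, first difference).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for z in series[1:]:
        prev_level = level
        # Level update: blend the new observation with the previous projection.
        level = level_w * z + (1 - level_w) * (prev_level + trend)
        # Trend update: blend the latest level change with the previous trend.
        trend = trend_w * (level - prev_level) + (1 - trend_w) * trend
    # h-step-ahead forecasts extrapolate the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Illustrative series with made-up values.
toy_forecast = holt_linear([40.1, 40.6, 40.9, 41.5, 41.8], 0.5, 0.5, horizon=2)
```

In practice the two smoothing parameters would be chosen by minimising an in-sample error measure; that step is omitted here.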
We applied the additive classical decomposition method (CDM) as another benchmark model in order to separate the time series into linear trend and seasonal components, as well as error, and to provide forecasts. The additive model is applied for both CDM and ACD because the magnitude of the seasonal pattern does not change as the series goes up or down. Any time series has four components: the long-term tendency, called trend; the periodic fluctuation within a certain time period, called seasonality; the periodic fluctuation over a long time interval, called cycle; and the random noise or error, called irregular.
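To make the additive decomposition concrete, the sketch below forecasts a series with a moving-average trend, seasonal indices and a random walk with drift on the seasonally adjusted data, in the manner used for CDM in this section. Several details are assumptions made for illustration: the period `s` is taken to be odd so that the moving average is centred, the season of an observation is its index modulo `s`, the seasonal indices are not re-centred, and the drift is the average first difference of the adjusted series.

```python
def additive_decomposition_forecast(z, s=5, horizon=1):
    """Additive decomposition forecast with a random walk with drift.

    z: list of observations; s: seasonal period (assumed odd);
    horizon: number of steps to forecast.
    """
    n = len(z)
    if s % 2 == 0 or n < 2 * s:
        raise ValueError("this sketch needs an odd period and n >= 2 * s")
    half = s // 2

    # Step 1: trend estimate by an s-point centred moving average.
    trend = [None] * n  # undefined at both ends of the series
    for i in range(half, n - half):
        trend[i] = sum(z[i - half:i + half + 1]) / s

    # Steps 2-3: detrend, then average the detrended values of each season.
    buckets = [[] for _ in range(s)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % s].append(z[i] - trend[i])
    seasonal = [sum(b) / len(b) for b in buckets]

    # Step 4: seasonally adjusted data.
    w = [z[i] - seasonal[i % s] for i in range(n)]

    # Step 5: random walk with drift on the adjusted series.
    drift = (w[-1] - w[0]) / (n - 1)

    # Step 6: add the seasonal index of each forecast period back.
    return [w[-1] + h * drift + seasonal[(n - 1 + h) % s]
            for h in range(1, horizon + 1)]


# Illustrative series: linear trend plus a repeating five-period pattern.
pattern = [1, -2, 0, 3, -2]
toy = [2 * i + pattern[i % 5] for i in range(15)]
toy_forecast = additive_decomposition_forecast(toy, s=5, horizon=2)
```

Replacing the random walk with drift in step 5 by another predictor of the adjusted series changes only that one step; the rest of the pipeline stays the same.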
Both CDM and ACD start with trend estimation: the trend estimate $\hat{T}_i$ is calculated by an $s$-point moving average, where $s$ stands for the number of seasonal periods. (As mentioned above, the data structure of this study is designed to avoid seasonality; any seasonality in these data sets might be pseudo-seasonality explained by the average El Niño cycle in the eastern tropical Pacific, which is five years.) The second step is detrending, i.e., subtracting the trend estimates from the original series to obtain the detrended series $z_i - \hat{T}_i$. The third step is computing the seasonal component by averaging the detrended values of each season, which gives the seasonal indices $\hat{S}_i$. The fourth step is subtracting the seasonal indices from the original series, which gives the seasonally adjusted data $W_i$. The fifth step is predicting $W_i$ by a simple prediction method such as the random walk with drift (applied for CDM), which gives $\hat{W}_i$. The final step is adding $\hat{W}_i$ to $\hat{S}_i$ to obtain the prediction $\hat{z}_i$.

For ACD, the fifth step is changed by adopting the autoregressive moving average (ARMA) approach; we observed that ARMA(0,1), i.e., MA(1), is the best specification for $\hat{T}_i$. This is an ensemble probabilistic approach [21]. The ARMA($p$, $q$) model for a series $y_i$ in backshift form is shown in (5)-(7),

$$\phi(B)\, y_i = \theta(B)\, \varepsilon_i \tag{5}$$

$$\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^{p} \tag{6}$$

$$\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^{q} \tag{7}$$

where $\phi$ and $p$ represent the parameter and the order of the autoregressive process respectively, $\theta$ and $q$ represent the parameter and the order of the moving average process respectively, $\varepsilon_i$ is the white noise error, and $B$ is the backshift operator, $B y_i = y_{i-1}$.

This study proposes a new time series decomposition method called autocorrelation-based decomposition (ACD). Autocorrelation means the linear relationship between the original series and its own lags, as shown in (8), the sample autocorrelation of any time series,

$$r_k = \frac{\sum_{i=k+1}^{n} (z_i - \bar{z})(z_{i-k} - \bar{z})}{\sum_{i=1}^{n} (z_i - \bar{z})^{2}} \tag{8}$$

where $\bar{z}$ is the sample mean and $n$ is the length of the time series (in this study all series have length 30). If the magnitude of $r_k$ does not exceed the lower or upper bound of the correlogram, we can conclude that there is no linear relationship between the original series and its $k$-th lag. Our ascription is at the same time (e.g. Apr ...

The model is validated by the Ljung-Box test, which checks the residuals for remaining autocorrelation by applying the Ljung-Box Q statistic in (9), which is computed from the autocorrelation function of the residuals,

$$Q = n(n+2) \sum_{k=1}^{K} \frac{r_k^{2}}{n-k} \tag{9}$$

where $r_k^{2}$ is the squared autocorrelation of the residuals at the $k$-th lag and $K$ is the number of lags tested.
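The sample autocorrelation in (8) and the Ljung-Box Q statistic in (9) can be computed along the following lines. This is a sketch in the standard textbook form; the function names are hypothetical, and both the 1.96/√n correlogram bound and the 5% significance level behind the critical value are assumptions, since the text does not state them.

```python
import math


def sample_acf(x, k):
    """Sample autocorrelation of x at lag k (standard textbook form)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i - k] - mean) for i in range(k, n))
    return num / denom


def correlogram_bound(n):
    """Approximate 95% bound for a correlogram (assumed: 1.96 / sqrt(n))."""
    return 1.96 / math.sqrt(n)


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    total = sum(sample_acf(residuals, k) ** 2 / (n - k)
                for k in range(1, max_lag + 1))
    return n * (n + 2) * total


# Chi-square critical value with 1 degree of freedom at the 5% level (assumed level).
CHI2_CRIT_DF1_5PCT = 3.841

# Made-up residuals, only to show the calls.
toy_residuals = [0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.3, -0.1, -0.1]
q_stat = ljung_box_q(toy_residuals, max_lag=1)
passes = q_stat < CHI2_CRIT_DF1_5PCT
```

A Q statistic below the critical value gives no evidence of autocorrelation left in the residuals at the tested lags.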

RESULTS AND DISCUSSION
The performance of all models is shown in Figures 6-7. For Uttaradit, ACD performs clearly better than HLT and CDM in both MAPE and RMSE, while for Chiang Mai it does not differ from CDM in either MAPE or RMSE. On average, as shown in Table 4, ACD clearly outperforms HLT and CDM for Uttaradit, with a 27%-37% reduction in MAPE and a 49%-56% reduction in RMSE. For Chiang Mai, the reduction is 17%-37% in MAPE and 28%-62% in RMSE. Further visible evidence comes from the residual diagnostics shown in Figure 8. The models are validated by comparing the Ljung-Box Q statistic in (9) with the χ² critical value with 1 degree of freedom; no ACD model exceeds the critical value in either province, so we can conclude that ACD is suitable for forecasting all series in both areas.

The proposed approach for heat index forecasting, which includes data gathering, series construction and the modified decomposition method, delivers better forecasting results than the classical benchmark models, HLT and CDM, not only in terms of performance measures and model validity but also in future forecasting. However, further studies should investigate other approaches, for instance the frequency domain approach or machine learning approaches.
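The accuracy measures and the percentage reductions quoted in the results can be computed as in the sketch below, which uses the usual definitions of MAPE and RMSE. The function names and the toy numbers are assumptions for illustration; they are not the study's data.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))


def percent_reduction(benchmark_error, proposed_error):
    """Percentage by which the proposed model lowers the benchmark error."""
    return 100.0 * (benchmark_error - proposed_error) / benchmark_error


# Made-up values, only to show the calls.
actual = [100.0, 200.0]
benchmark_pred = [110.0, 180.0]
proposed_pred = [105.0, 190.0]
mape_gain = percent_reduction(mape(actual, benchmark_pred),
                              mape(actual, proposed_pred))
rmse_gain = percent_reduction(rmse(actual, benchmark_pred),
                              rmse(actual, proposed_pred))
```

A positive reduction means the proposed model has the lower error on that measure.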

CONFLICT OF INTERESTS
The author(s) declare that there is no conflict of interests.