Forecasting Cases of Dengue Hemorrhagic Fever Using the Backpropagation, Gaussians and Support-Vector Machine Methods

Dengue has been known in Indonesia since 1779. The Aedes mosquito has two types, Aedes aegypti and Aedes albopictus, and Aedes aegypti carries the dengue virus. Dengue fever cases in Bali Province tend to increase from year to year, especially as the rainy season approaches. Preventive action by the government is needed to limit the spread of the dengue virus and the resulting casualties. Data mining extracts knowledge from historical data by finding regular patterns and relationships in a data set. In this study, data mining is used to predict the number of dengue cases in Bali Province. The prediction uses several variables in the database to estimate future values that are not yet known, based on patterns in the data set. The forecast aims to help the government anticipate dengue fever cases in the coming period and prepare appropriate prevention efforts. Forecasting is carried out using three methods: backpropagation, gaussian, and support-vector machine. The data consist of 528 samples from 2008 to 2018. The results show that the backpropagation method predicts dengue fever cases best, with a MAPE error rate of 0.025, while the gaussian method has a MAPE error rate of 0.035 and the support-vector machine has a MAPE error rate of 0.060.


Introduction
Dengue virus (DENV) infection is a global health threat, affecting at least 3.6 billion people living in more than 125 countries in tropical and subtropical regions [1]. Dengue has been known in Indonesia since 1779. The disease is a precursor to dengue hemorrhagic fever (DHF), which struck Indonesia for the first time in Jakarta in 1968 [2]. The dengue virus is spread by the Aedes mosquito, which has two types, Aedes aegypti and Aedes albopictus. Aedes aegypti carries the dengue virus and lives indoors. Aedes albopictus also carries the dengue virus, but its concentration is small compared with Aedes aegypti, and it lives in the bush. It is not easy for ordinary people to recognize DHF early, because its typical symptoms, such as bleeding in the skin or other signs of bleeding, sometimes occur only at the end of the disease period [3].
Data mining attempts to extract knowledge by collecting and using historical data to find regular patterns and relationships in a data set [4]. Artificial neural networks (ANN) are powerful data-modeling tools capable of capturing and representing complex input-output relationships; they take a microscopic approach in which information is represented by neuronal excitation patterns [5]. As a part of artificial intelligence in computer science, ANNs are widely used to solve forecasting or prediction problems, especially those based on time-series data [6]. Rapid Miner is a data mining tool that provides data mining and machine learning procedures, including data preprocessing, visualization, modeling, and evaluation [7]. This research aims to determine which method best predicts dengue hemorrhagic fever cases, judged by measured accuracy, and to forecast dengue hemorrhagic fever cases in Bali Province for the next period.
In this study, data mining is used to predict the number of dengue cases in Bali Province. The prediction uses several variables in the database to estimate the unknown future values of those variables. Predictive values are estimated from patterns in the data set [8].
This research uses the backpropagation, gaussian, and support-vector machine methods. Backpropagation is a learning method for multi-layer network architectures in which learning is repeated to create a system that is resistant to damage and consistently works well; it is often used to solve complex problems [9,10]. Gaussian processes are among the most widely used stochastic processes for modeling dependent data observed over time; a gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution [11]. Support-vector machine is one of the most effective machine learning algorithms. From both a practical and a theoretical perspective, the support-vector machine problem is solved through its Lagrangian dual form using quadratic programming [12].
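To make the idea of repeated learning in a multi-layer network concrete, the following is a minimal sketch of backpropagation with one hidden layer, written in plain Python. It is not the Rapidminer model used in this study: the network size, learning rate, number of training cycles, function names, and the toy normalized data are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, n_hidden, seed=0):
    """Create small random weights for one hidden layer and one linear output."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_out": 0.0,
    }


def forward(net, x):
    """Return the hidden activations and the network output for input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(weights, x)) + b)
              for weights, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]
    return hidden, output


def mean_squared_error(net, inputs, targets):
    errors = [(forward(net, x)[1] - t) ** 2 for x, t in zip(inputs, targets)]
    return sum(errors) / len(errors)


def train(net, inputs, targets, learning_rate=0.1, training_cycles=500):
    """Update the weights in place, one example at a time."""
    for _ in range(training_cycles):
        for x, t in zip(inputs, targets):
            hidden, output = forward(net, x)
            error = output - t  # gradient of 0.5 * (output - t)**2 w.r.t. output
            for j, h in enumerate(hidden):
                # Propagate the output error back through hidden unit j
                # (computed before w_out[j] is changed).
                delta = error * net["w_out"][j] * h * (1.0 - h)
                net["w_out"][j] -= learning_rate * error * h
                net["b_hidden"][j] -= learning_rate * delta
                for i, xi in enumerate(x):
                    net["w_hidden"][j][i] -= learning_rate * delta * xi
            net["b_out"] -= learning_rate * error
    return net


# Hypothetical normalized windows: two past values predict the next one.
inputs = [[0.1, 0.2], [0.2, 0.3], [0.3, 0.4], [0.4, 0.5], [0.5, 0.6]]
targets = [0.3, 0.4, 0.5, 0.6, 0.7]

net = init_network(n_inputs=2, n_hidden=3)
error_before = mean_squared_error(net, inputs, targets)
train(net, inputs, targets)
error_after = mean_squared_error(net, inputs, targets)
print(f"MSE before: {error_before:.4f}, after: {error_after:.4f}")
```

The hidden-layer size, learning rate, and number of training cycles in this sketch play the same role as the parameters that are tuned for the backpropagation method in Rapidminer.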
Previous research using the backpropagation method includes the study "Implementation of the Backpropagation Algorithm to Predict Student Graduation." That study used 318 samples, 70% as training data and the remaining 30% as testing data [13]. Poonpong Suksawang studied forecasting with the gaussian method in a journal article entitled "Electricity Consumption Forecasting in Thailand using Hybrid Model SARIMA and Gaussian Process with Combine Kernel Function Technique." The data covered 11 years of electricity consumption, from 2005 to 2015 [14]. Other research, conducted at the PT. Java Bali Gresik Power Plant, used the support-vector machine method. The data were taken from seven months of water usage history, and the 49 test records were drawn at random from all the data [15].
The backpropagation, gaussian, and support-vector machine approaches, which rely on learning and training, can be applied to identify data patterns for a dengue hemorrhagic fever (DHF) prediction system using data from the Bali Provincial Health Office. The data mining tool used in this research is Rapidminer.

Research Method
Data mining, also called knowledge discovery in databases (KDD), is a process that uses statistical techniques, mathematics, artificial intelligence, and machine learning to extract and identify useful information and related knowledge from various large databases [4]. Figure 1 gives an overview of this research.

Normalization
Data normalization rescales the values of each attribute to a common range. The normalization process makes numerical calculations more precise and improves the accuracy of the forecast results. The equation used in min-max normalization is as follows [16].
$$v' = \frac{v - \mathrm{min}_A}{\mathrm{max}_A - \mathrm{min}_A}$$
where v' is the normalized value, v is the actual value, minA is the lowest value of the attribute, and maxA is the highest value of the attribute.
I Made Yudha Arya Dala, I Ketut Gede Darma Putra, Putu Wira Buana RESTI Journal (System Engineering and Information Technology) Vol.
Normalization is carried out on each attribute, namely the number of sufferers, the total number of deaths, the number of male deaths, and female deaths. Normalization results of data on the number of dengue fever cases can be seen in Table 2.
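As an illustration, min-max normalization of one attribute can be sketched in a few lines of Python, assuming the standard form v' = (v - minA) / (maxA - minA). The function name and the sample case counts below are hypothetical and are not taken from the study's dataset.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] using v' = (v - minA) / (maxA - minA)."""
    min_a, max_a = min(values), max(values)
    if max_a == min_a:
        # The formula divides by zero when every value is identical.
        raise ValueError("min-max normalization is undefined for a constant attribute")
    return [(v - min_a) / (max_a - min_a) for v in values]


# Hypothetical monthly case counts, used only to show the calculation.
cases = [120, 80, 200, 160]
normalized = min_max_normalize(cases)
print(normalized)
```

The lowest value maps to 0 and the highest to 1. The same function would be applied separately to each attribute, since each has its own minA and maxA.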

Training Data and Testing data
Dengue case data are divided into two datasets, training data and testing data. The 528 samples are divided at ratios of 50-50, 60-40, 70-30, and 80-20. Training data are existing data based on events that have already occurred, and testing data are used to measure the accuracy of the data mining model that has been created. The technique used is split validation.
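The split ratios can be illustrated with a short Python sketch. It assumes a linear split that keeps the samples in their original order, which is a common choice for time series; the text does not state which sampling type was used in Rapidminer, and the function name and placeholder data are hypothetical.

```python
def split_by_ratio(samples, train_ratio):
    """Split an ordered list into (training, testing) without shuffling."""
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be between 0 and 1")
    cut = round(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


# Placeholder indices standing in for the 528 samples.
samples = list(range(528))
for ratio in (0.5, 0.6, 0.7, 0.8):
    train_part, test_part = split_by_ratio(samples, ratio)
    label = f"{round(ratio * 100)}-{round((1 - ratio) * 100)}"
    print(f"{label}: {len(train_part)} training, {len(test_part)} testing")
```

Keeping the order means every training sample precedes every testing sample, so the model is evaluated on periods it has not seen.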

Forecasting Model
The forecasting models are created using Rapidminer 9.2. This study builds a forecasting model with three methods: backpropagation, gaussian, and support-vector machine. Figure 2 shows the forecasting model built in Rapidminer 9.2.
The Retrieve operator loads the training data and the test data. The Windowing operator prepares the time-series data for forecasting and is used to improve forecasting accuracy. The Set Role operator determines which attributes are processed by the method used. The model operator carries out the forecasting process using the backpropagation, gaussian, or support-vector machine method. The Apply Model operator takes the model learned from the training data and applies it to the testing data. The Performance operator displays the data mining results and the accuracy of the method used. The Write Excel operator exports the data mining results from Rapidminer to Excel.
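The windowing step can be pictured as sliding a fixed-size window over the series and using the value that follows each window as the label to predict. The sketch below shows that idea in plain Python; it is not the Rapidminer Windowing operator itself, and the function name, window size, and horizon are assumptions made for illustration.

```python
def make_windows(series, window_size, horizon=1):
    """Turn a series into (window, label) pairs for supervised learning.

    Each window holds `window_size` consecutive values; its label is the
    value `horizon` steps after the end of the window.
    """
    examples = []
    for start in range(len(series) - window_size - horizon + 1):
        window = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        examples.append((window, label))
    return examples


# Hypothetical normalized series, used only to show the transformation.
series = [0.1, 0.2, 0.3, 0.4, 0.5]
for window, label in make_windows(series, window_size=3):
    print(window, "->", label)
```

A learner such as backpropagation then receives each window as its input attributes and the label as its target.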

Forecasting Results
The forecasting results vary depending on the prediction model used.
The forecasts for the test data are visualized using a line graph. The prediction results cover the number of cases, male deaths, female deaths, and the total number of deaths.

Denormalization
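The forecasting results are then denormalized. The denormalization process returns the data values to their original scale, so that the forecast results are expressed in the same units as the initial data, using the following equation [17].
$$v = v'\,(\mathrm{max}_A - \mathrm{min}_A) + \mathrm{min}_A$$
where v' is the normalized forecast value, v is the denormalized value, and minA and maxA are the lowest and highest values of the original attribute.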
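A minimal Python sketch of denormalization follows, assuming it is the inverse of min-max normalization, v = v' (maxA - minA) + minA. The function name and the sample values are hypothetical.

```python
def min_max_denormalize(normalized, min_a, max_a):
    """Map normalized values back to the original scale of the attribute."""
    return [v * (max_a - min_a) + min_a for v in normalized]


# Hypothetical normalized forecasts and attribute range.
forecasts = [0.0, 0.5, 1.0]
print(min_max_denormalize(forecasts, min_a=80, max_a=200))
```

The minA and maxA passed here must be the same values that were used when the attribute was normalized, otherwise the forecasts are mapped to the wrong scale.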

Visualization
Data visualization makes it easier to understand the data used in forecasting. The visualizations in this research are graphs created in Microsoft Excel.

Measurement of MAD and MAPE
The two measurements used in this study to calculate the accuracy of the forecasting results from the backpropagation, gaussian, and support-vector machine methods are as follows.

Mean Absolute Percentage Error (MAPE)
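Mean Absolute Percentage Error is a measure of relative error. MAPE expresses the forecasting error as a percentage of the actual values over a specific period, which indicates whether the percentage error is too high or too low. The MAPE equation, in its standard form, is as follows [18].
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right| \times 100\%$$
where A_t is the actual value in period t, F_t is the forecast value in period t, and n is the number of periods.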
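The calculation can be sketched in Python as the mean of the absolute errors divided by the actual values. This sketch assumes the standard MAPE definition and returns a fraction rather than a percentage, on the assumption that error rates such as 0.024 in this paper are reported that way; the function name and sample values are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.10 means 10%)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    if any(a == 0 for a in actual):
        # The relative error is undefined when an actual value is zero.
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical actual and forecast values, used only to show the calculation.
actual = [100, 200]
forecast = [90, 220]
print(mape(actual, forecast))
```

Multiplying the result by 100 gives the error as a percentage. Because the error is relative to each actual value, attributes with very different magnitudes, such as case counts and death counts, can be compared on the same scale.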

Results and Discussion
A forecast model for the dengue fever case data of Bali Province was built in the Rapidminer 9.2 application. The backpropagation, gaussian, and support-vector machine forecasting methods are used to predict the attributes of dengue fever cases in Bali Province, namely the number of dengue cases, the number of women who died, the number of men who died, and the total number of deaths. Figure 3 shows the forecasting model using the backpropagation method as it appears in the modeling operator view. The hidden layers, training cycles, and learning rate of the backpropagation method are set in the Parameters view. Figure 4 shows the forecasting results in the Rapidminer application using the backpropagation method. The predicted values are still in normalized form; they are exported to Microsoft Excel and denormalized so that the data and the MAPE measurements are easier to interpret. The forecasting results of the gaussian method and of the support-vector machine method are obtained in Rapidminer in the same way, and their predicted values are likewise exported to Microsoft Excel and denormalized.
The forecasting results and the best architecture for each attribute in the backpropagation, gaussian, and support-vector machine methods are as follows. Table 3 lists the best architecture for each parameter of the backpropagation method and the forecasts for the next two years for each attribute. The predicted number of dengue fever cases decreases from 16415 in 2018 to 12915.118 and 13289.213 in the following two years, respectively. The predicted total number of deaths decreases from 37 in 2018 to 29.397 and 29.079. The predicted number of male deaths increases from 14 in 2018 to 18.539 and 18.535. The predicted number of female deaths decreases from 23 in 2018 to 14.750 and 12.
Figure 6 is a bar chart of the final forecasting results for the dengue fever data. Each value is the average of the results over all attributes for the given method. Backpropagation is the best method, with an error rate of 0.024, while the gaussian method has an error rate of 0.035 and the support-vector machine has an error rate of 0.041.

Conclusion
Based on the analysis carried out, the backpropagation method is better than the gaussian method and the support-vector machine at predicting cases of dengue fever in Bali Province, because its error rate is the smallest: 0.024, compared with 0.035 for the gaussian method and 0.041 for the support-vector machine. An advantage of the backpropagation method is that it has several units in one or more hidden layers. Future work on forecasting dengue fever cases could add further and more diverse forecasting methods in order to obtain better forecasting results.