Machine Learning Approach to Predict the Dengue Cases Based on Climate Factors

Dengue is a global health issue threatening public health, particularly in developing countries. Effective disease surveillance is critical to anticipate impending outbreaks and implement appropriate control responses. However, delays in dengue case reporting are frequent due to human resource shortfalls. Improved outbreak predictive capacity also requires additional input on vector presence and abundance, which is currently not captured in the surveillance platform. Thus, we developed a prototype AI application, “Dengue Forecasting,” that leverages machine learning methods in filing dengue case reports and incorporates dengue vector and climatic parameters. This application simplifies the recording of dengue cases, vector abundance (Angka Bebas Jentik/ABJ, the Larvae-Free Index), and selected climatic variables (sun exposure, temperature, humidity, wind speed, and precipitation) in Bandung City. The relevant data were extracted from Indonesia's Ministry of Health and the Meteorological, Climatological, and Geophysical Agency. The entire process, from model development to deployment, was conducted in the R programming language version 4.2.2 using packages (caret, shiny.io). The linear regression model showed the lowest prediction error (RMSE = 268.32 and MAE = 164.1) in predicting dengue cases and outbreaks, so we used it in the deployed application. “Dengue Forecasting” has the potential to assist policymakers at the district level, complementing Dengue EWARS, in anticipating and mitigating dengue outbreaks, especially in Bandung City.


INTRODUCTION
Dengue is a global health issue threatening public health, particularly in developing countries. The global burden of dengue fever has increased significantly over the past few decades. If untreated, dengue can trigger an outbreak and cause death. This places a significant burden on society, health systems, and the economy in most tropical countries. (1) According to the World Health Organization (WHO), reported dengue cases increased from 505,430 in 2000 to 5.2 million in 2019, and most cases are asymptomatic or mild and therefore go unreported. One modeling estimate suggested 390 million dengue virus infections yearly, of which 96 million manifest clinically. (2) The disease is endemic in more than 100 countries, including Indonesia. (3) By the end of 2022, Indonesia had reported 143,000 dengue cases, with the highest incidence in three provinces: West Java, East Java, and Central Java. More than 50% of the national cases were reported from these regions, and 1,236 deaths were reported in 2022. (4) Bandung City, the capital of West Java province, has been heavily affected by dengue. There were 3,743 cases in 2021, which increased to 5,205 cases in 2022. The dengue incidence rate (IR) also rose from 145 per 100,000 population in 2021 to 201 per 100,000 population in 2022. (5)
Dengue control is very complex and requires multisectoral collaboration and integrated approaches, including coordination and leadership, preparedness and response, case diagnostics, health care facilities, operational support and logistics, case surveillance and management, capacity building, risk communication, community engagement, and integrated vector management (IVM). (3,6) Dengue control programs have encountered many challenges. For example, environmental and vector control requires massive infrastructure investment and long-term behavioural approaches to improve sanitation. (6,7) Moreover, chemical methods have long been used to control Aedes larvae and mosquitoes, but they raise concerns including accessibility, environmental toxicity, and insecticide resistance. 4)(15)
Artificial Intelligence (AI), including Machine Learning (ML) and Deep Learning (DL) algorithms, has been introduced in many disciplines, including public health and healthcare settings. The use of AI in public health can help practitioners automate activities, which can reduce the workload of health workers, reduce human error, and improve productivity. (16) One of the advantages of AI is its ability to analyze large and complex datasets to create models and predictions quickly and efficiently. (17,18) As explained above, one of the significant issues in dengue control is the lack of surveillance and prediction systems, which leads to delayed responses, poor preparedness, and inappropriate regulation. AI can help fill this gap for better decision-making and disease control. This paper presents a prototype that uses AI for dengue forecasting in Bandung City to assist policymakers and public health practitioners in controlling and mitigating dengue outbreaks.

METHODS AND MATERIALS
This study employs machine learning techniques, including Linear Regression, Ridge Regression, Lasso Regression, Regression Tree, Random Forest, and Support Vector Regression, to determine the most accurate model for predicting dengue fever trends from 2018 to 2022. Lasso and Ridge regression are widely used methods in linear regression to tackle overfitting and enhance the model's generalization ability. They incorporate a regularization term into the conventional linear regression cost function. A regression tree is a kind of decision tree employed in machine learning to handle regression problems. It takes the form of a hierarchy in which every internal node makes a decision based on a feature, and each leaf node provides a prediction for the target variable. (18) In contrast, Random Forest is an ensemble learning technique that constructs numerous decision trees during training and, for regression tasks, outputs the average prediction of these individual trees. Another method is the Support Vector Machine (SVM), a supervised learning algorithm that, for classification, aims to find the hyperplane that best separates the data points into different classes while maximizing the margin; its regression form is Support Vector Regression (SVR). However, this algorithm does not provide variable importance. (18)
Figure 1.
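The regularization term that Ridge and Lasso regression add to the linear regression cost function, as described in the paragraph above, can be written in the standard textbook form below. The notation is an assumption for illustration and does not come from the study: $y_i$ is the observed number of dengue cases in period $i$, $x_{ij}$ is the value of predictor $j$ (for example a climate variable or ABJ), $\beta_j$ are the coefficients, and $\lambda \ge 0$ is the penalty strength chosen by tuning.

```latex
% Ridge regression: squared-error loss plus an L2 penalty on the coefficients
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2
  + \lambda \sum_{j=1}^{p} \beta_j^2

% Lasso regression: the same loss plus an L1 penalty on the coefficients
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

With $\lambda = 0$ both objectives reduce to ordinary linear regression. The L1 penalty can shrink some coefficients exactly to zero, whereas the L2 penalty only shrinks them toward zero.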

List of predictors and dependent variable
The data consist of climate data from the Meteorology, Climatology, and Geophysics Agency, and dengue cases and vector density (Angka Bebas Jentik/ABJ) from the Ministry of Health. The research design is illustrated in the diagram above (Figure 1). Model performance was evaluated using the Root Mean Squared Error (RMSE) on the training data. On the testing data, we assessed the RMSE, Mean Absolute Error (MAE), and R². The model with the lowest RMSE and MAE was selected as the best model. (19)
Formula 1: RMSE and MAE formula
The most effective model was determined based on these indicators. In this study, we used R version 4.3.0 with the caret package. (20)

RESULT
The ideal model would exhibit the closest pattern to the observed data. Among the models evaluated, Linear Regression, Ridge Regression, and SVR/SVM could detect the sharp incline in cases in 2021. Specifically, Linear Regression emerged as the most effective model for dengue prediction.
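The error metrics named in the Methods (RMSE, MAE, and R²) can be illustrated with a minimal Python sketch. This is not the study's code, which was written in R with caret. The function names and the toy numbers are hypothetical, and R² is computed here in its coefficient-of-determination form, whereas some tools report the squared correlation instead.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error: square root of the mean squared residual."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean Absolute Error: mean of the absolute residuals."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, NOT data from the study.
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print("RMSE:", rmse(observed, predicted))
print("MAE:", mae(observed, predicted))
print("R2:", r_squared(observed, predicted))
```

Because RMSE squares each residual before averaging, it penalizes large misses (such as an underpredicted outbreak peak) more heavily than MAE does, which is one reason to report both.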

DISCUSSION
Machine learning (ML) algorithms are gaining significant interest in the creation of predictive models to forecast and monitor dengue transmission rates. Numerous studies have demonstrated the effectiveness of machine learning algorithms in predicting the occurrence of cases and outbreaks. 3) This study utilized several machine learning algorithms: Linear Regression, Ridge Regression, Lasso Regression, Support Vector Machine (SVM), Random Forest, and Decision Tree. (26)(27) These methods were fitted with K-fold cross-validation (CV) using five folds, a popular choice in machine learning analysis. (27) The results showed that linear regression was the best-performing model among all models. This differs from previous studies applying machine learning to dengue cases, in which ensemble models performed best. (28,29) This might be due to the small sample size of this study (only 2018 to 2022) and to the prediction relying on a single model rather than an ensemble of models.
approximately several weeks by the Ministry of Health. This affects the precision of the time series data used for modelling. Machine learning could be a possible way to close those gaps.
Moreover, this study does not cover many other potential factors that might contribute to dengue cases in Bandung City. One possibility is population mobility. (43) Since Bandung City has become the fourth biggest city in Indonesia, urbanization is inevitable. 6) Follow-up research on other machine learning algorithms and hyperparameter tuning is needed to determine how the model can be tailored to the location where it operates.
In addition, more samples need to be collected from the district health office to verify the data before they are used to train the machine learning model. This may help avoid bias being incorporated into the fitted model.
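The five-fold cross-validation mentioned in this discussion can be sketched in Python as follows. This is an illustration only, not the study's R/caret code. The function name `k_fold_indices`, the seed, and the sample count of 60 are hypothetical. The sketch shuffles observations before splitting, and the source does not state whether the study did so; for time-ordered data, shuffling ignores temporal order.

```python
import random


def k_fold_indices(n_samples, k=5, seed=42):
    """Split indices 0..n_samples-1 into k folds.

    Returns a list of (train_indices, test_indices) pairs; each fold
    serves as the held-out test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: shuffled folds
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts

    splits = []
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        splits.append((train_idx, test_idx))
    return splits


# Hypothetical sample count, NOT taken from the study.
splits = k_fold_indices(60, k=5)
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Fold {fold_number}: {len(train_idx)} train, {len(test_idx)} test")
```

In each round a model is fitted on the training indices and scored on the held-out fold, and the scores are averaged over the five folds to compare candidate models or hyperparameter values.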

CONCLUSION
In conclusion, this study contributes to the growing body of research on dengue transmission prediction through the application of machine learning algorithms. The findings highlight the effectiveness of linear regression in forecasting dengue cases, underscoring its potential utility in public health surveillance efforts. Notably, climate factors, especially average wind speed, emerged as significant predictors, shedding light on the complex dynamics of dengue transmission. Overall, this study represents a significant step towards developing effective predictive models for dengue transmission, offering valuable insights that can inform evidence-based decision-making and ultimately contribute to the prevention and control of this vector-borne disease.

Figure 2. Training and Testing Data Partition

Figure 3. Exploratory Data Analysis of Variables

Figure 4. Comparison of Observations (turquoise) and Predictions. The ideal model would exhibit the closest pattern to the observed data. Among the models evaluated, Linear Regression, Ridge Regression, and SVR/SVM could detect the sharp incline in cases in 2021. Specifically, Linear Regression emerged as the most effective model for dengue prediction.

Table 1. Algorithm and Hyperparameter values

Table 2. Model Performances
The table shows that the Linear Regression model had the lowest Root Mean Square Error (RMSE) of 268.32 and the lowest Mean Absolute Error (MAE) of 164.1 among the models, and it was therefore identified as the model that performs most effectively in predicting dengue outbreaks. Furthermore, our investigation identifies ABJ and average wind speed as significant influencers on dengue occurrences.

Table 3. Variable of Importance of Models
*N/A = Not available
In the table, "Ovitrap" and "Average Wind Speed" emerged as the top two variables of importance across all models.