PROGNOSTIC RISK FACTORS INDUCING ACUTE HEPATITIS CONTAGION IN JAKARTA, INDONESIA: LINEAR PREDICTIVE MODEL APPLICATION

Hepatitis is a serious global health issue caused by a variety of infectious viruses and noninfectious agents that affect the liver. In Indonesia, hepatitis has been widely transmitted, including in DKI Jakarta, where the number of acute hepatitis cases is a major concern. In this study, we investigated the number of acute hepatitis cases in 44 sub-districts of DKI Jakarta, Indonesia. Using RStudio, we constructed 15 mathematical models to identify risk factors that induce hepatitis transmission. The models were then subjected to a global ANOVA test (F-test) to determine the factors that significantly impact the number of acute hepatitis patients. Our analysis found that several risk factors were strongly associated with hepatitis transmission in DKI Jakarta. One of the most significant factors was the number of infants who had been immunized with the hepatitis B vaccine (HBO). This finding suggests that increasing the number of infants who receive the vaccine could substantially reduce the number of acute hepatitis cases in the region. Other factors strongly associated with hepatitis transmission were the total population and the number of diabetics in each sub-district. To support our findings, we used the pairs function to produce scatter plots that visualize the relationship between the risk factors and the number of acute hepatitis patients. Our study provides important insights into the factors that contribute to hepatitis transmission in DKI Jakarta and highlights the need for effective prevention and control strategies to reduce the burden of this disease. Overall, our findings have important implications for public health policy and practice, not only in Indonesia but globally.


INTRODUCTION
Hepatitis is a disease that is widespread worldwide, including in Indonesia, and is recognized as a serious global health issue [1]. According to a WHO report, approximately 1.45 million individuals die each year after suffering from chronic hepatitis [2]. Viral hepatitis causes more deaths than other diseases such as malaria, tuberculosis, and HIV, and the number of cases has been rising since 1990 [3]. The hepatitis virus is classified into five types: A, B, C, D, and E [4]. After Myanmar, Indonesia has the second-highest endemic hepatitis B infection rate in Southeast Asia [5]. In addition, the hepatitis virus has a harmful impact on the morbidity and mortality of individuals during the acute phase [6]. Therefore, it is necessary to protect Indonesian people from acute hepatitis through concerted efforts from stakeholders, including the government and non-governmental organizations.
Hepatitis is an infection that specifically targets the liver. It is classified as chronic when the infection persists for more than six months, whereas acute hepatitis normally lasts less than six months [7]. People suffering from acute hepatitis not only experience symptoms ranging from mild to moderate, but may also endure chronic complications [8]. Yet most people with hepatitis are unaware of its symptoms, mistaking decreased appetite and exhaustion for normal signs of overwork [9], [10]. If left untreated, acute hepatitis can be fatal, since it can lead to reduced brain function, enlargement of the spleen, and cirrhosis, among other conditions [11]. People with suspected acute hepatitis are diagnosed at a medical center, where they undergo a sequence of examinations, including a blood test to assess liver function, imaging tests such as MRI and ultrasound, and biopsies for liver cancer screening [12].
The risk of transmission can be reduced by addressing the various risk factors associated with the disease [13]. Despite the availability of treatment, only two out of three individuals with acute hepatitis seek treatment in a timely manner [14]. The progression of hepatitis infection is unpredictable, with symptoms that often go unnoticed by some individuals until they experience them personally and reach the cirrhosis stage [15], [16]. The symptoms of hepatitis infection appear as several physical manifestations, including feeling unwell, frequent exhaustion, decreased appetite, high fever, limb swelling, constricted blood vessels, fluid accumulation in the abdomen, upper abdominal pain, and jaundice, or yellowing of the skin and eyes [17], [18].
The Ministry of Health announced 18 suspected cases of acute hepatitis of unknown cause in January 2022. These cases were reported in several provinces, including Jakarta, Bangka Belitung Islands, East Java, West Java, West Sumatra, East Kalimantan, and North Sumatra, with most cases occurring in Jakarta [19]. Preventing the transmission of hepatitis is crucial to improving public health. The purpose of this study is to develop a model to forecast the transmission of acute hepatitis and to identify the predictor variables that significantly influence the response variable. The model aims to predict the number of acute hepatitis cases in each district of Jakarta, with the goal of reducing the spread of the disease.
Additionally, this study aims to analyze the correlation between predictor variables. A model for predicting the number of acute hepatitis cases in each district of Jakarta was successfully constructed, and the analyses undertaken during this research proved useful for that purpose. The study also examines the trends in how predictor variables affect the final data.

MATERIALS AND METHOD
This study uses data on the number of acute hepatitis sufferers in the districts of Jakarta. The data consist of 44 rows and 9 columns. Each row corresponds to one of the sub-districts of DKI Jakarta studied, while the columns hold eight predictor variables and one response variable. The eight predictor variables are the number of babies who had been immunized with HBO (X1), the number of health workers (X2), the number of health facilities (X3), the number of residents (X4), the number of residents who had proper sanitation (X5), the number of drinking water facilities that met health standards (X6), the number of diabetics (X7), and the number of HIV sufferers (X8). The combination of predictor variables is analyzed with a multiple linear regression model in order to identify their influence on acute hepatitis transmission. The general equation of multiple linear regression is as follows.

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$$

The model assumptions include the following.
c. There is no multicollinearity among independent variables.
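As an illustration of how the coefficients of a multiple linear regression model can be estimated, the following minimal Python sketch solves the least-squares normal equations with the standard library only. It is not the authors' RStudio code; the function name `fit_ols` and the toy data are assumptions introduced for this example, and the data are generated from a known equation rather than taken from the study.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented p x (p + 1) matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical toy data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * a + 3 * b for a, b in rows]
coefs = fit_ols(rows, y)  # approximately [1.0, 2.0, 3.0]
```

The explicit collinearity check reflects the no-multicollinearity assumption: when predictors are exact linear combinations of each other, the normal equations have no unique solution.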
The F test is subsequently used to assess the suitability of the model by examining the joint influence of all the independent variables on the dependent variable, that is, whether the regression model we build is fit or unfit. The hypotheses verified by the F test are as follows.

$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_1:$ at least one $\beta_j \neq 0$, $j = 1, \dots, k$

Decision rules:
Reject $H_0$ if p-value $< \alpha = 0.05$; we declare that the model is fit, and the testing can continue.
Accept $H_0$ if p-value $> \alpha = 0.05$; we declare that the model is unfit.
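For reference, the global F statistic behind this decision rule can be written in terms of the coefficient of determination. The expression below is the standard textbook form rather than a formula quoted from the study; the symbols $n$ (number of observations), $k$ (number of predictors), and $R^2$ are notation assumed here for illustration.

```latex
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)},
\qquad F \sim F_{k,\; n-k-1} \ \text{under } H_0
```

The p-value compared with $\alpha$ is the upper-tail probability of the observed $F$ under this distribution.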
After the F test, the T test is applied to examine whether each parameter has a significant effect on the model used. The hypotheses tested by the T test are as follows.

$H_0: \beta_j = 0$
$H_1: \beta_j \neq 0$, for $j = 1, \dots, k$

Decision rules:
Reject $H_0$ if p-value $< \alpha = 0.05$; the independent variable has a significant effect on the dependent variable.
Accept $H_0$ if p-value $> \alpha = 0.05$; the independent variable has no significant effect on the dependent variable.
This study also considers the Variance Inflation Factor (VIF), which is useful for determining how much multicollinearity is present in a set of multiple regression variables.
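To make the VIF concrete, the sketch below computes it for the simplest case of two predictors, where the VIF of either predictor reduces to $1 / (1 - r^2)$ with $r$ the Pearson correlation between them. This is a Python illustration under stated assumptions, not the authors' R computation: the helper names and the sample values are hypothetical, and with more than two predictors $r^2$ would be replaced by the $R^2$ from regressing one predictor on all the others.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)


def vif_two_predictors(x_a, x_b):
    """VIF shared by two predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson(x_a, x_b)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictor values with correlation r = 0.8.
x_a = [1, 2, 3, 4, 5]
x_b = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x_a, x_b)  # approximately 2.78
```

A VIF of 1 means a predictor is uncorrelated with the other, and the value grows without bound as the correlation approaches 1 in absolute value.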

EXPLORATORY DATA ANALYSIS
Data pre-processing is carried out in order to retrieve the desired variables and to determine the interrelationships among variables. Data processing in this study uses the RStudio program.

Pre-processing
Data pre-processing involves four activities. The first activity is filling the blanks in the table with the average value, since empty entries could induce errors during data processing and calculation. This method was selected on the argument that filling empty values with the average score will not significantly affect the dataset. In the dataset, there are 2 blank entries in the X6 column, so they are filled with the average score of the X6 column, which is 38. This simplifies further data processing and minimizes calculation errors.

We subsequently inspect the correlation among the data in order to identify the correlation between the predictor variables and the target variable. After examining the correlation between the target variable and the predictor variables, and among the predictor variables themselves, we notice that the predictor variables (Xi) have no significant effect on the target variable (Y). In addition, this investigation finds a high correlation among variables X1, X2, and X3, and among variables X1, X4, X5, and X7, indicating a strong relationship among several predictor (independent) variables. This must be suspected as a case of multicollinearity in the dataset. Multicollinearity occurs when there is a strong relationship between two or more independent variables in a multiple regression model. It must be avoided since it increases the possibility of rounding errors in estimating $\beta$ and its standard errors. As a result, the regression output becomes confusing and tends to be wrong.
Further checking will be conducted in order to identify and remove one or more correlated independent variables.
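The mean-imputation step described at the start of this subsection can be sketched in a few lines of Python. The function name `impute_with_mean` and the sample column are hypothetical; the values are chosen only so that their observed mean is 38, matching the figure quoted for X6, and they are not the study's data.

```python
def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical column with two blanks; the observed values average to 38.
x6 = [40, None, 36, 38, None, 38]
filled = impute_with_mean(x6)
```

Mean imputation leaves the column mean unchanged but shrinks its variance slightly, which is one reason it is usually reserved for columns with only a few missing entries, as is the case here.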

Dataset Visualization and Variable Analysis in General
In the initial stage, we examine the numeric variables, since they are the components of the model formulated, both the response variable and the predictor variables. The next stage is to describe the data according to the table provided and to visualize each variable. For instance, X3 is reported with a negative degree of skewness, while the remaining variables have a positive degree of skewness; note that negative skewness corresponds to a left-skewed distribution (long left tail) and positive skewness to a right-skewed one. The visualization is conducted after removing some outliers from the data.
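The link between the sign of the skewness coefficient and the direction of the tail can be checked with a short Python sketch. It uses the moment coefficient of skewness, $g_1 = m_3 / m_2^{3/2}$, which is one common definition and is assumed here; the skewness function used by the authors in R may apply a different small-sample correction. The sample values are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical samples: one long right tail, and its mirror image.
right_tailed = [1, 1, 2, 2, 3, 10]
left_tailed = [-v for v in right_tailed]

g_right = skewness(right_tailed)  # positive: right-skewed
g_left = skewness(left_tailed)    # negative: left-skewed
```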

MODELLING AND MODEL SELECTION
After pre-processing the data and removing outliers in the previous section, the data are processed further in R. The process is described in the following explanation.

First Regression Model
The first regression model is the primary model for this dataset, as it serves as the basis for selecting predictor variables and identifying multicollinearity. However, the p-values of the predictor variables X1 to X8 in this main-effects model are larger than 0.05 and the R-squared value is small, indicating that the model cannot accurately explain the variation in the response variable Y. Figure 9 shows that the predictor variables (Xi) have no significant effect on the target variable (Y). Moreover, the output indicates a high correlation between X1, X3, X4, X5, and X7, which points to strong relationships among the predictor variables. Additionally, the VIF values for X1, X3, X4, X5, and X7 are close to 4, indicating multicollinearity. This issue is investigated further by forming hypotheses and conducting modelling experiments to determine the best variables and to remove one or more correlated independent variables in order to create the desired model. In conclusion, this model indicates the existence of significant multicollinearity among its variables.
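The models in this section are compared through R-squared and adjusted R-squared. As a reminder of how the adjustment works, the Python sketch below applies the standard formula $1 - (1 - R^2)(n - 1)/(n - k - 1)$. The function name and the input values are illustrative assumptions: $n = 44$ mirrors the number of rows in the dataset before outlier removal, while the $R^2$ values and predictor counts are hypothetical and not taken from the study's output.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: the same R-squared with 3 versus 4 predictors.
adj_three = adjusted_r2(0.5, 44, 3)  # approximately 0.4625
adj_four = adjusted_r2(0.5, 44, 4)   # smaller, because of the extra predictor
```

Because the adjustment penalizes each additional predictor, adjusted R-squared is the fairer quantity for comparing models that use different numbers of variables.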

Second Regression Model
The second regression model generates the output displayed in Figure 10, from which we can write the fitted model equation as follows.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 \ldots$$
We get some information from the output presented in Figure 10. First, the F test gives p-value $> \alpha$ (0.6331 > 0.05), so we deduce that the second model is unfit for predicting the number of acute hepatitis patients. Furthermore, the T test indicates that no factors significantly influence the model. Figure 10 also reports adjusted $R^2 = 0.1207$, which indicates that 12.07% of the sample variation in y can be explained by the model. The VIF values of the predictor variables are given in Table 3. We conclude that there is no significant multicollinearity among the variables in this model.

Third Regression Model
The third regression model produces the output presented in Figure 11, from which the fitted model equation can be constructed. The T test reveals that there are no factors that significantly impact the model. Figure 11 also reports adjusted $R^2 = 0.1103$, which indicates that 11.03% of the sample variation in y can be explained by the model. Furthermore, the VIF values among the predictor variables are given in Table 4. All evidence demonstrates that there is no significant multicollinearity among the variables in this model.

Fourth Regression Model
The fourth regression model generates the output displayed in Figure 12, from which we can write the fitted model equation as follows.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \ldots$$
We get some information from the output presented in Figure 12. First, the F test gives p-value $> \alpha$ (0.1071 > 0.05), so we conclude that the fourth model is unfit for predicting the number of acute hepatitis patients. Moreover, the T test reveals that there are no factors that significantly impact the model. Figure 12 also reports adjusted $R^2 = 0.09074$, which indicates that 9.074% of the sample variation in y can be explained by the model. The VIF values of the predictor variables were also obtained. In conclusion, this model indicates the absence of multicollinearity among its variables.

Fifth Regression Model
The fifth regression model produces the output presented in Figure 13, from which we can formulate the fitted model equation as follows.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \ldots$$
We obtain some information from the result presented in Figure 13. First, the F test gives p-value $> \alpha$ (0.1535 > 0.05), so we conclude that the fifth model is unfit for predicting the number of sufferers of acute hepatitis. Furthermore, the T test reveals that there are no factors that significantly influence the model. Figure 13 also reports adjusted $R^2 = 0.06896$, which indicates that 6.896% of the sample variation in y can be explained by the model. In addition, the VIF values are given in Table 6. We deduce that there is no significant multicollinearity among the variables in this model.

Sixth Regression Model
The sixth regression model generates the output provided in Figure 14, from which we can write the fitted model equation as follows.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \ldots$$
We have some information from the output presented in Figure 14. First, the F test gives p-value $> \alpha$ (0.06902 > 0.05), so we conclude that the sixth model is unfit for predicting the number of acute hepatitis patients. In addition, the T test reveals that there are no factors that significantly impact the model. Figure 14 also reports adjusted $R^2 = 0.1159$, which indicates that 11.59% of the sample variation in y can be explained by the model. The VIF values among the predictor variables are given in Table 7.

Seventh Regression Model
The seventh regression model produces the output provided in Figure 15, from which we can formulate the fitted model equation as follows.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 \ldots$$
We obtain some information from the output displayed in Figure 15. First, the F test gives p-value $> \alpha$ (0.1135 > 0.05), so we deduce that the seventh model is unfit for predicting the number of acute hepatitis patients. Furthermore, the T test reveals that there are no factors that significantly impact the model. Figure 15 also reports adjusted $R^2 = 0.0873$, which means that 8.73% of the sample variation in y can be explained by the model. In addition, the VIF values among the predictor variables are given in Table 8. We conclude that there is no significant multicollinearity among the variables in this model.

Eighth Regression Model
The eighth regression model produces the output presented in Figure 16, from which we can write the fitted model equation as follows.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \ldots$$
Notation:
$\hat{\beta}_1$ = coefficient of the predictor variable for the number of babies who had been immunized with HBO.
$\hat{\beta}_2$ = coefficient of the predictor variable for the number of residents.
$\hat{\beta}_3$ = coefficient of the predictor variable for the number of diabetics.
We get some information from the output presented in Figure 16. First, the F test gives p-value $< \alpha$ (0.002472 < 0.05), so we conclude that the eighth model is fit for predicting the number of sufferers of acute hepatitis and testing can continue. Furthermore, the T test reveals that there are factors that significantly impact the model. Figure 16 also reports adjusted $R^2 = 0.2512$, which indicates that 25.12% of the sample variation in y can be explained by the model. In conclusion, this model indicates the presence of significant multicollinearity among its variables.

Ninth Regression Model
The ninth regression model produces the output provided in Figure 17, from which we can formulate the fitted model equation as follows.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \ldots$$
Notation:
$\hat{\beta}_1$ = coefficient of the predictor variable for the number of babies who had been immunized with HBO.
$\hat{\beta}_2$ = coefficient of the predictor variable for the number of residents.
$\hat{\beta}_3$ = coefficient of the predictor variable for the number of diabetics.
We can interpret some information from the output given in Figure 17. First, the F test gives p-value $< \alpha$ ($4.169 \times 10^{-6}$ < 0.05), so we conclude that the ninth model is fit for predicting the number of acute hepatitis patients and testing can continue. Furthermore, the T test reveals that there are factors that significantly impact the model. Figure 17 also reports adjusted $R^2 = 0.4664$, which indicates that 46.64% of the sample variation in y can be explained by the model. All evidence indicates that there is significant multicollinearity among the variables in this model.

Tenth Regression Model
The tenth regression model provides the output displayed in Figure 18, from which we can compose the fitted model equation as follows.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \ldots$$
Notation:
$\hat{\beta}_1$ = coefficient of the predictor variable for the number of babies who had been immunized with HBO.
$\hat{\beta}_2$ = coefficient of the predictor variable for the number of residents.
$\hat{\beta}_3$ = coefficient of the predictor variable for the number of diabetics.
We can analyze information from the output provided in Figure 18. First, the F test gives p-value $< \alpha$ (0.0006104 < 0.05), so we conclude that the tenth model is fit for predicting the number of acute hepatitis patients and testing can continue. Furthermore, the T test reveals that there are factors that significantly impact the model. Figure 18 also reports adjusted $R^2 = 0.3054$, which means that 30.54% of the sample variation in y can be explained by the model. We conclude that there is significant multicollinearity among several variables in this model.

Eleventh Regression Model
The eleventh regression model generates the output presented in Figure 19, from which we can arrange the fitted model equation as follows.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \ldots$$
Notation:
$\hat{\beta}_1$ = coefficient of the predictor variable for the number of babies who had been immunized with HBO.
$\hat{\beta}_2$ = coefficient of the predictor variable for the number of residents.
$\hat{\beta}_3$ = coefficient of the predictor variable for the number of diabetics.
We get some information from the output presented in Figure 19. First, the F test gives p-value $< \alpha$ (0.0001576 < 0.05), so we conclude that the eleventh model is fit for predicting the number of sufferers of acute hepatitis and testing can continue. Furthermore, the T test reveals that there are factors that significantly impact the model. Figure 19 also reports adjusted $R^2 = 0.3537$, which indicates that 35.37% of the sample variation in y can be explained by the model. In conclusion, this model shows the presence of significant multicollinearity among its variables.

Twelfth Regression Model
The twelfth regression model produces the output displayed in Figure 20, from which we can write the fitted model equation as follows.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \ldots$$
We have some information from the output presented in Figure 20. First, the F test gives p-value $< \alpha$ ($4.659 \times 10^{-5}$ < 0.05), so we conclude that the twelfth model is fit for predicting the number of sufferers of acute hepatitis and testing can continue. Furthermore, the T test reveals that there are factors that significantly impact the model. Figure 20 also reports adjusted $R^2 = 0.3941$, which means that 39.41% of the sample variation in y can be explained by the model. We deduce that there is significant multicollinearity among several variables in this model.

Thirteenth Regression Model
The thirteenth regression model produces the output given in Figure 21, from which we can compose the fitted model equation as follows.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \ldots$$
Notation:
$\hat{\beta}_1$ = coefficient of the predictor variable for the number of babies who had been immunized with HBO.
$\hat{\beta}_2$ = coefficient of the predictor variable for the number of residents.
$\hat{\beta}_3$ = coefficient of the predictor variable for the number of diabetics.
We can analyze some information from the output given in Figure 21. First, the F test gives p-value $< \alpha$ ($1.679 \times 10^{-5}$ < 0.05), so we conclude that the thirteenth model is fit for predicting the number of sufferers of acute hepatitis and testing can continue. Furthermore, the T test indicates that there are factors that significantly impact the model. Figure 21 also reports adjusted $R^2 = 0.4259$, which means that 42.59% of the sample variation in y can be explained by the model. This model indicates the existence of significant multicollinearity among its variables.

Regression Model
After examining all the models, the 14th regression model is selected as the best model when compared to the other models. This is apparent from the following plot, in which the data lie close to the values estimated by model 14. The risk factors that significantly influence the number of acute hepatitis sufferers include the number of infants who had been immunized with HBO (X1), the total population (X4), and the number of diabetics (X7). The proposed model is then supported by the pairs function, which is used to observe various scatter plots.

FIGURE 1. Fill in blank data with an average score.

FIGURE 5. Variable type analysis and skewness illustration.

FIGURE 7. Visualization of the relationship among numeric variables and its heatmaps.

FIGURE 9. The first regression output.
FIGURE 10. The second regression output.
Notation:
$\hat{\beta}_1$ = coefficient of the predictor variable for the number of babies who had been immunized with HBO.
$\hat{\beta}_2$ = coefficient of the predictor variable for the number of residents.
$\hat{\beta}_3$ = coefficient of the predictor variable for the number of diabetics.

TABLE 1. The classification of VIF value.

TABLE 2. The classification of VIF values among predictor variables in the first regression.

TABLE 3. The classification of VIF values among predictor variables in the second regression.

TABLE 4. The classification of VIF values among predictor variables in the third regression.
We obtain information from the output provided in Figure 11. First, the F test gives p-value $> \alpha$ (0.05654 > 0.05), so we conclude that the third model is unfit for predicting the number of acute hepatitis sufferers. In addition, the T test reveals that there are no factors that impact the model.
Notation:
$\hat{\beta}_1$ = coefficient of the predictor variable for the number of health workers.
$\hat{\beta}_2$ = coefficient of the predictor variable for the number of health facilities.
$\hat{\beta}_3$ = coefficient of the predictor variable for the number of drinking water facilities that have met health standards.
$\hat{\beta}_4$ = coefficient of the predictor variable for the number of HIV sufferers.
FIGURE 12. The fourth regression outcome.

TABLE 6. The category of VIF values among predictor variables in the fifth regression model.
$\hat{\beta}_3$ = coefficient of the predictor variable for the number of diabetics.
$\hat{\beta}_4$ = coefficient of the predictor variable for the number of HIV sufferers.
FIGURE 14. The sixth regression result.

TABLE 7. The classification of VIF values among predictor variables in the sixth regression.
All evidence demonstrates that there is no significant multicollinearity among the variables in this model.

TABLE 8. The category of VIF values among predictor variables in the seventh regression.

TABLE 9. The classification of VIF values among predictor variables in the eighth regression.

TABLE 10. The category of VIF values among predictor variables in the ninth regression model.

TABLE 11. The classification of VIF values among predictor variables in the tenth regression.

TABLE 12. The classification of VIF values among predictor variables in the eleventh model.

TABLE 13. The classification of VIF values among predictor variables in the twelfth regression.

TABLE 16. The category of VIF values among predictor variables in the fifteenth regression.
We can analyze information from the output displayed in Figure 23. First, the F test gives p-value $< \alpha$ (0.01 < 0.05), so we conclude that the fifteenth model is fit for predicting the number of acute hepatitis patients. Furthermore, the T test indicates that there are factors that significantly impact the model. Figure 23 also reports adjusted $R^2 = 0.192$, which indicates that 19.2% of the sample variation in y can be explained by the model.