Analysis of Attack Detection on Log Access Servers Using Machine Learning Classification: Integrating Expert Labeling and Optimal Model Selection

Purpose: As the complexity and diversity of cyberattacks continue to grow, traditional security measures fall short in effectively countering these threats within web-based environments. Therefore, there is an urgent need to develop and implement innovative, advanced techniques tailored specifically to detect and address these evolving security risks within web applications. Methods: This research focuses on analyzing attack detection in log access servers using machine learning classification with two primary approaches: expert labeling integration and best model selection. Expert labeling determines whether log entries are safe or indicate an attack. Results: Validation in labeling was applied using different datasets to minimize errors and increase confidence in the resulting dataset. Experimental results show that the Decision Tree and Random Forest models have nearly identical accuracy rates, around 89.3%-89.4%, while the ANN model has an accuracy of 81%. Novelty: This study proposes a fusion of expert knowledge in labeling log entries with a rigorous process of selecting the best classification model. This integration has not been extensively explored in previous research, offering a novel approach to enhancing attack detection within web applications. The research contribution lies in the integration of expert security assessment and the selection of the best model for detecting attacks on server access logs, along with validating labels using various datasets from different log devices to enhance confidence in the analysis results.


INTRODUCTION
In an increasingly complex digital era, the security of web applications has become highly critical due to the evolving and diverse nature of cyberattacks. Server logs, as primary sources of application activity-related information, play a key role in the effort to detect and prevent attacks. However, challenges arise in distinguishing between normal and suspicious activities within typically large and heterogeneous server logs [1], [2], [3].
The success of attacks on web applications can result in significant negative impacts, including financial losses, exposure of sensitive data, and harm to an organization's reputation. Therefore, swift and accurate attack detection is crucial to safeguarding web applications and user data. Traditional attack detection methods often encounter limitations in recognizing newly emerging or more sophisticated attack patterns, necessitating a more adaptive and intelligent approach [4]. This research explores harnessing the potential of classification to enhance the accuracy of attack detection within server logs [5], [6], [7].
One of the key parameters related to attacks is the server access log, a record of all user accesses requesting services from the web server. Server logs often contain heterogeneous data comprising various types of activities and events, making log data management and analysis complex. Intelligent approaches are needed to discern between normal and suspicious activities [8], [9]. Thus, expert knowledge is required to determine whether a log entry is secure or indicates an attack [2], [7], [10], [11].
Emphasis on the significance of web application security in the complex digital era can be found in works such as [12]. These studies outline the impacts of cyberattacks on web applications, encompassing financial losses and reputational damage to organizations. Web application security addresses vulnerabilities at the web application level, underscoring the importance of information security due to increased information exchange via the web. It analyzes common web vulnerabilities and provides best practices to prevent security issues throughout the development lifecycle [13]. Characteristics of software developers, such as application security awareness and self-efficacy, influence their information security behavior. Enhancing developers' awareness and self-efficacy can improve their information security behavior [14]. Web applications are susceptible to cyber-based attacks, and various security techniques have been developed to protect them. Prevention involves tool utilization, implementing security standards, and regularly assessing risk factors [1], [15], [16].
The analysis of server logs plays a crucial role in attack detection as it helps identify potential attacks and malicious activities [17]. These logs contain information about each request made to the web server, including details like the uniform resource identifier (URI), status codes, and user behavior. Through log analysis, machine learning systems can identify patterns and characteristics related to attacks, such as rapid crawling, numerous error status codes, and unusual user behavior. Log-based intrusion detection systems detect attacks by comparing recorded information with known attack signatures or by identifying anomalies in user behavior. Log analysis can also be used for real-time detection of streaming attack behaviors, enabling timely attack discovery. Overall, server logs provide valuable data for detecting and preventing attacks on web applications [18].
Web attack detection faces several challenges in today's technological environment. Traditional security measures, such as firewalls and encryption, have limitations in fully safeguarding web-based systems [19]. Additionally, evolving threats and cybercriminal behavior present difficulties in adapting systems and networks to effectively detect attacks [20]. To overcome these challenges, a new generation of web application firewall (WAF) systems is being developed using machine learning and deep learning technology [21]. These techniques can autonomously learn without human intervention and handle multidimensional data more effectively [22]. Distributed training processes that maintain privacy have also been proposed to enhance accuracy while preserving local data and model parameters as secrets. However, the complexity of server log data poses a unique challenge, where automated approaches struggle to control log changes at every time interval, requiring specific understanding and knowledge to identify log developments [19]. The heterogeneity and complexity of server log data demand intelligent approaches to differentiate between normal and suspicious activities. In this context, this research explores the integration of expert labeling to enhance human interpretation.
Role of Expert Labeling: Implementing expert labeling in attack detection, as discussed in [11], serves as the foundation for involving security experts in labeling server log data. This is necessary to address the limitations of automated labeling. Expert labeling plays a crucial role in providing reliable and accurate information across various domains, and experts are considered trustworthy sources in complex systems such as the global food system [23]. Expert labels are often chosen as the most reliable source compared to other commonly used label types [24]. Their expertise helps improve labeling accuracy, even when dealing with mostly low-quality labels in crowdsourcing [25]. Collaborative labeling work with domain experts involves principled design, iterative design, and improvisational design, contributing to the quality of ground truth data [26]. In the context of machine learning, domain experts provide ground truth labels used to train models and enhance prediction accuracy [10]. Overall, expert labeling plays a crucial role in ensuring the reliability and quality of labeled data across various domains.
The development concept of machine learning classification models for attack detection within server logs has been explored in several papers. These models utilize deep learning techniques, such as DCGAN and ResNet-50, for feature extraction, and an AlexNet-based classifier optimized for network attack detection. The proposed approach achieved high accuracy rates, with an accuracy of 99.4% for the first public dataset and an accuracy of 99.33% for the second dataset [27]. Another approach utilized an Extra Tree classifier in combination with Decision Tree, XGBoost, and Random Forest algorithms for DDoS attack detection [3]. Additionally, XGBoost Classifier and Random Forest were employed in modified forms to enhance model accuracy [28], [29]. All these studies focus on computer network attack detection using network transmission logs. There are also several articles on web attack detection that apply machine learning. One example is by Eunaicy, who successfully developed an RNN with web logs (not server logs) and achieved an accuracy of 94% [22]. Furthermore, Alaoui used an HTTP Web Request dataset with the LSTM method, averaging 78% accuracy [30]. Riera also utilized the multi-label SR-BH 2020 dataset with the CatBoost algorithm, achieving 88% accuracy [31]. However, all these related studies used outdated open-source datasets, raising uncertainties about the validity of these datasets in today's web security landscape. A new approach is needed to contribute the latest datasets and validate the classification labeling using experts.
This study develops a machine learning classification model to enhance the accuracy of attack detection within server logs. The objective is to establish a more effective system for identifying new and advanced attack patterns. Integrating expert labeling into the attack detection process aims to provide a deeper human context and understanding of the recorded activities within server logs. The research contributes by creating the latest web server log dataset validated by experts, integrating expert security assessment, selecting the best model for detecting attacks on server access logs, and validating labels using various datasets from different log devices to enhance confidence in the analysis results.

METHODS
This section outlines the procedures used in the study to integrate expert security assessment and the best classification model for detecting attacks on server access logs. The flowchart of this process is depicted in Figure 1.
The research flowchart in Figure 1 encompasses several key steps. First, data collection involves gathering server access logs from various sources and different log devices. Subsequently, the dataset undergoes preprocessing using transformation and cleaning techniques to create a refined dataset. Expert labeling by security professionals assesses log entries and validates labels, determining whether they represent safe activities or potential attacks based on their knowledge of attack patterns. Following that, best-model selection involves processing multiple machine learning classification models using specific criteria, such as accuracy and performance, to choose the optimal model for analysis. The validation process includes validating the dataset logs from various log devices to confirm the accuracy of the expert labeling process. The analysis results are then utilized to identify attacks and potential attack patterns within the server access log data. Through the integration of expert labeling, best-model selection, and label validation, confidence in the analysis results is expected to increase, enabling more accurate decision-making in addressing cybersecurity threats. Figure 2 illustrates the data analysis process on a server log dataset comprising 27,729 rows and 6 attributes, entailing a series of crucial preprocessing steps. The initial step involves data transformation, where log sentences are split into several relevant attributes such as IP address, date, request, ID process, and from, to facilitate information representation. Subsequently, the data cleansing process removes empty values, handles duplicates, and normalizes data to ensure accuracy and consistency for further use. Next, security experts or rules, involving four experts, are assigned to label each data row as 'attack' or 'normal,' aiding in the classification process.
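The transformation and cleansing steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the regular expression assumes a common Apache/Nginx combined-log layout, and the sample lines are invented, since the paper does not publish its exact log format or column names.

```python
import re
import pandas as pd

# Pattern for one common access-log layout (an assumption; the paper's
# actual format and attribute names are not specified).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log_line(line):
    """Transformation step: split a raw log sentence into structured attributes."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

lines = [
    '10.0.0.1 - - [12/Mar/2023:10:01:22 +0000] "GET /index.php HTTP/1.1" 200 512',
    '10.0.0.2 - - [12/Mar/2023:10:01:25 +0000] "GET /../../etc/passwd HTTP/1.1" 404 128',
    'malformed entry',  # dropped by the parser
]

records = [r for r in (parse_log_line(l) for l in lines) if r]
df = pd.DataFrame(records)

# Cleansing step: remove empty values and duplicates, normalize text case.
df = df.dropna().drop_duplicates()
df["request"] = df["request"].str.lower()
print(df[["ip", "status", "request"]])
```

Expert labeling would then append an 'attack'/'normal' column to this frame, assigned by the security experts row by row.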

Figure 2. Research Analysis Framework
Once the dataset is prepared, the next step involves splitting it into two parts: 70% for model training and 30% for validation. In the modeling phase, three distinct methodologies are employed: Artificial Neural Network (ANN), Decision Tree for explicit rule representation, and Random Forest as a method that aggregates decision trees. Each model is trained with the training dataset and evaluated with the validation dataset to measure its performance and classification ability. Eventually, the results of testing these three models are compared using appropriate evaluation metrics to select the model best suited to the classification task between attacks and normal activities within the provided dataset.
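The 70/30 split and three-model comparison can be sketched with scikit-learn. This is an illustrative sketch under assumptions, not the paper's code: synthetic features stand in for the real log-derived attributes (weighted roughly 77% normal / 23% attack, matching the dataset balance reported later), the ANN is represented by `MLPClassifier`, and all hyperparameters are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the labeled server-log features (an assumption).
X, y = make_classification(n_samples=1000, n_features=6,
                           weights=[0.77, 0.23], random_state=42)

# 70% training / 30% validation, as described above.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

models = {
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Train each model and evaluate it on the held-out validation split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_val, model.predict(X_val))
    print(f"{name}: {scores[name]:.3f}")
```

In practice, recall, precision, and F-measure would be computed alongside accuracy (e.g. with `sklearn.metrics.classification_report`) before selecting the best model.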

Artificial Neural Network (ANN):
ANN consists of interconnected processing units known as neurons arranged in layers. Each neuron receives input, undergoes precise mathematical operations, and generates an output that then becomes input for subsequent neurons. This iterative process continues until the final outcome is achieved. ANN is capable of analyzing intricate patterns in data and identifying nonlinear correlations among pertinent attributes, facilitating the prediction of attack detection [5], [32], [33]. The general formula for a single neuron in an artificial neural network is:

Output = f( Σᵢ₌₁ⁿ Wᵢ Iᵢ + B )

where:
- Output is the output of the neuron.
- f is the activation function.
- Wᵢ is the weight assigned to input i.
- Iᵢ is the i-th input.
- B is the bias value.
- n is the number of inputs.
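The single-neuron formula above can be evaluated directly. This is a minimal sketch with arbitrary example weights, inputs, and bias, using a sigmoid activation as one common choice of f:

```python
import numpy as np

def neuron_output(inputs, weights, bias, activation):
    """Single-neuron computation: Output = f(sum(W_i * I_i) + B)."""
    return activation(np.dot(weights, inputs) + bias)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

I = np.array([1.0, 0.5, -1.0])   # example inputs (arbitrary)
W = np.array([0.4, -0.2, 0.1])   # example weights (arbitrary)
B = 0.05                          # example bias (arbitrary)

# Weighted sum: 0.4*1.0 - 0.2*0.5 + 0.1*(-1.0) + 0.05 = 0.25
out = neuron_output(I, W, B, sigmoid)
print(round(float(out), 4))  # → 0.5622
```

In a full network, this output would feed the neurons of the next layer, and the weights would be adjusted during training (e.g. by backpropagation).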

Random Forest (RF):
A Random Forest is an ensemble learning technique that operates by constructing a multitude of decision trees. Each tree within the RF is trained using a randomly selected data subset, and the ultimate decision is predominantly influenced by the decisions taken by these trees. This technique addresses the constraints encountered with a single Decision Tree, including issues such as overfitting and potential bias of the dataset. RF enhances predictive power by amalgamating insights from multiple decision trees, resulting in robust and precise predictions [34], [35], [36]. The RF algorithm begins with random sample selection, followed by decision-tree construction, and voting or averaging to generate a final prediction.
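The three RF steps named above (random sample selection, tree construction, voting) can be illustrated by hand on synthetic data. This is a didactic sketch, not the study's implementation: scikit-learn's `RandomForestClassifier` performs these steps internally, additionally sampling random feature subsets at each split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic binary-classification data standing in for labeled log features.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

trees = []
for _ in range(25):  # 25 trees, kept small for brevity
    # Step 1: bootstrap sample (random selection with replacement).
    idx = rng.integers(0, len(X), len(X))
    # Step 2: build a decision tree on that sample.
    t = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[idx], y[idx])
    trees.append(t)

# Step 3: majority vote across the trees' predictions.
votes = np.stack([t.predict(X) for t in trees])      # shape: (25, 400)
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("ensemble training accuracy:", (majority == y).mean())
```

Because each tree sees a different bootstrap sample, their individual errors tend to cancel in the vote, which is the mechanism behind RF's robustness to overfitting.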

Decision Tree (Tree):
A decision tree is a type of decision-making model with a tree-like structure. The goal is to iteratively partition the dataset according to established decision rules and minimize the impurities within each segment. The ultimate decision of the tree is made at the leaf nodes. Decision trees provide easily understandable decision rules and can discern significant features for prediction tasks [37], [38].
Gini impurity: Gini = 1 − Σₖ₌₁ᴷ pₖ², where, for K classes, p1, p2, ..., pK are the proportions of each class in that node. This is followed by best-feature selection, the splitting rule, tree formation, and prediction, so that the result is the class or value produced by the leaf node where the data end.
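The impurity formula above is straightforward to compute for a candidate node. A minimal sketch, using the two labels from this study ('attack' and 'normal') as example classes:

```python
def gini_impurity(labels):
    """Gini = 1 - sum(p_k^2) over the K class proportions in a node."""
    n = len(labels)
    proportions = [labels.count(c) / n for c in set(labels)]
    return 1.0 - sum(p * p for p in proportions)

# A pure node has impurity 0; a 50/50 binary node has the maximum, 0.5.
print(gini_impurity(["attack"] * 10))                  # → 0.0
print(gini_impurity(["attack"] * 5 + ["normal"] * 5))  # → 0.5
```

During tree construction, the split that most reduces the weighted Gini impurity of the resulting child nodes is chosen at each step.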

RESULTS AND DISCUSSIONS
The experimental results are discussed in this section. First, the percentage comparison of target attributes is shown in Figure 3.
Figure 3. Percentage of Target Attributes

The dataset was compiled with normal status data accounting for 77% of the dataset, compared to attack data, which comprises only 23%. For further clarification, refer to Figure 3. Based on the performance evaluation results of the three models tested in Table 1, it is observed that the ANN achieved an accuracy of 81%, with a recall of 60.3%, precision of 90%, and an F-measure of 61%. Meanwhile, the Decision Tree model exhibited higher performance with an accuracy of 89.3%. This model showed a recall of 80.9%, precision of 88.5%, and an F-measure of 83.8%. Furthermore, the RF model demonstrated results almost identical to those of the Decision Tree, with an accuracy of 89.4%, recall of 80.9%, precision of 88.6%, and an F-measure of 83.8%.
Overall, Decision Tree and Random Forest showed comparable performance, with nearly identical accuracy rates of approximately 89.3% to 89.4% and similar values for recall, precision, and F-measure, while the ANN model had an accuracy of 81% and lower values on the remaining metrics. For a clearer visual representation, please refer to Figure 4. Therefore, based on this evaluation, it can be suggested that utilizing the Decision Tree or Random Forest model might be more effective for the classification task between attack and normal activities in the given dataset. Although ANN holds potential for complex classifications, in this scenario, the decision tree-based models demonstrated superior and more stable performance in identifying attacks and normal activities.

Table 2. Comparison of the Proposed Model with Related Research

Research Model | Dataset | Labeling User | Accuracy
RNN [22] | Web logs dataset | Uncertified user | 94%
LSTM [30] | HTTP Web Request dataset (open source) | Unknown user | 78%
CatBoost [31] | Multi-label SR-BH 2020 dataset (open source) | Unknown user | 88%
Proposed model | Server logs dataset | Security experts | 89%

From Table 2, it can be observed that two previous studies had lower accuracy compared to the proposed model. However, one prior study by Eunaicy, utilizing RNN with web logs (not server logs), achieved higher accuracy. While the processed data may have had high density, there remains a significant question regarding the determination of the target attributes/labels of that dataset, as their credibility was not provided. Therefore, it can be concluded that the proposed model with the generated dataset is sufficient to contribute to further research.

CONCLUSION
Based on the findings of this research, it can be concluded that combining expert labeling with the selection of the optimal classification model significantly improves the accuracy of attack detection in web application server access logs. The expert labeling process, which involves security professionals in determining the status of log entries (safe or attack), yields insightful distinctions among subtle and evolving attack patterns. Incorporating expert knowledge into this process enhances the precision of attack identification. Additionally, tailoring the selection of a classification model to the specific characteristics of server log data contributes to improved attack detection accuracy. Through a comprehensive comparison of various classification models, this study successfully identified the most suitable model capable of handling the complexity of attack patterns within log data. The involvement of multiple security experts in labeling the server log dataset, coupled with the application of validation and consistency in labeling across different datasets, serves to reduce errors and bolster confidence in the final dataset outcomes. Consequently, the integrated approach of expert labeling and the thoughtful selection of a classification model represents a progressive step in addressing evolving cyber threats, strengthening attack detection capabilities in web applications, and enhancing the reliability of information security systems. As a suggestion for future work, further exploration into more heterogeneous datasets involving multiple servers could be undertaken to train more precise models and achieve higher accuracy results.


Figure 4. Visualization of Model Test Results

Table 1. Performance Results of the Proposed Model

Model | Accuracy | Recall | Precision | F-measure
ANN | 81% | 60.3% | 90% | 61%
Decision Tree | 89.3% | 80.9% | 88.5% | 83.8%
Random Forest | 89.4% | 80.9% | 88.6% | 83.8%