YOLOv8 Analysis for Vehicle Classification Under Various Image Conditions

Purpose: The purpose of this research is to detect vehicle types in various image conditions using YOLOv8n, YOLOv8s, and YOLOv8m with augmentation. Methods: This research applies the YOLOv8 method to the DAWN dataset. The method uses pre-trained Convolutional Neural Networks (CNN) to process the images and output the bounding boxes and classes of the detected objects. Additionally, data augmentation is applied to improve the model's ability to recognize vehicles from different directions and viewpoints. Result: The mAP values for the test results are as follows. Without data augmentation, YOLOv8n achieved approximately 58%, YOLOv8s around 68.5%, and YOLOv8m roughly 68.9%. After applying horizontal flip data augmentation, YOLOv8n reached about 60.9%, YOLOv8s about 62%, and YOLOv8m about 71.2%. Using horizontal flip data augmentation improves the performance of all three YOLOv8 models. The YOLOv8m model achieves the highest mAP value of 71.2%, indicating its high effectiveness in detecting objects after applying horizontal flip augmentation. Novelty: This research introduces novelty by employing the latest version of YOLO, YOLOv8, and comparing the performance of its YOLOv8n, YOLOv8s, and YOLOv8m variants. The use of data augmentation techniques, such as horizontal flip, to increase data variation is also novel in expanding the dataset and improving the model's ability to recognize objects.


INTRODUCTION
Object detection is a computer vision technology that recognizes specific objects in an image or video and outputs each object's location along with a label describing its type [1]. Traditional object detection methods such as HOG, Haar, and SIFT have been widely used. While these traditional methods are effective, they rely heavily on specific features and lack flexibility in handling variations in object appearance [2], [3]. In recent years, deep learning-based approaches, such as Convolutional Neural Networks (CNN), have shown superior performance in object detection.
One deep learning method for object detection is You Only Look Once (YOLO) [4], [5]. Over the years, YOLO has continued to develop from YOLOv2 to YOLOv7 [6], [7], and a newer, more accurate version, YOLOv8, is now available. YOLO is an object detection method with good performance. It uses a pre-trained Convolutional Neural Network (CNN) to process images in layers and output the bounding boxes and classes of detected objects. CNNs perform well at recognizing objects in an image: a CNN processes the input image and extracts increasingly complex features such as angles and textures [6], [5]. This technology is used in applications such as security surveillance, product quality analysis, and facial recognition. YOLO is also used for detecting vehicle types [8].
Vehicle type detection is essential in transportation and road safety. A sound transportation system plays a critical role in the urban traffic environment [6]. Detecting vehicle types such as cars, motorbikes, buses, and trucks from captured images involves fitting the target object across consecutive frames. This is challenging when objects appear under different conditions, such as variations in viewing angle, vehicle direction, lighting, and image detail, which can result in incorrect feature extraction. Additionally, high computational cost prolongs the training process and affects the performance and speed of object detection. To address these challenges, a method is required that detects objects accurately and quickly [9].
Many studies have used the You Only Look Once (YOLO) method in object detection. A previous study [10] used YOLOv5 and Flip-Mosaic to improve the network's ability to detect vehicle types. Its dataset covered various motorway scenarios, monitoring different parts of the road from different viewpoints, and was therefore highly relevant to vehicle detection in high-speed scenarios; the Flip-Mosaic method significantly improved the target recognition rate. Another study used the YOLOv5 architecture to classify hand symbols in real time, achieving 80% accuracy, 95% precision, 84% recall, and an 89% F1 score [11]. Further research developed a YOLOv4 approach for live object detection and tracking on CCTV recordings of vehicle activity on Medan city roads at the Kratau Bilal, Glugur, and Merdeka locations; the YOLOv4 method produced a detection accuracy of 87.98% in terms of mean average precision (mAP) [12].
This study uses the newer YOLOv8 to improve the detection accuracy of vehicle types such as cars, motorcycles, buses, and trucks, and evaluates its performance under different weather conditions. By testing the YOLOv8n, YOLOv8s, and YOLOv8m versions on the DAWN dataset with six object classes, this study highlights the performance differences between the models and the effect of data augmentation techniques such as horizontal flip. The horizontal flip technique mirrors each vehicle image horizontally to provide a variety of viewpoints and vehicle directions in the dataset, which helps the YOLO model learn to detect vehicles from different directions.

METHODS
Figure 1 illustrates the research flow using YOLOv8 with the DAWN dataset. YOLOv8 is one of the most well-known models in object detection and classification. The first step is to collect and prepare a dataset of vehicle images for model training. This dataset should be large and representative enough for the model to learn sufficient variation in the objects to be detected and classified. Once the image dataset is prepared, the next step is data pre-processing, which involves stages such as resizing and creating augmented data, for example with horizontal flips. In this research, Scenario 1 uses the original dataset and Scenario 2 adds data augmentation; both are run with several versions of YOLOv8, namely YOLOv8n, YOLOv8s, and YOLOv8m. The next stage is to train the YOLOv8 model for object detection on the processed image datasets. After training, the detected objects are classified. The detection and classification results of the YOLOv8 model are analyzed in terms of precision, recall, and mean Average Precision (mAP).

Dataset
The data for vehicle detection are taken from the DAWN dataset, which is open source and widely accessible. The DAWN dataset was developed in 2020 and contains a total of 1000 images collected from real traffic environments [13]. Its main objective is to assess the performance of vehicle detection and classification techniques. It includes a range of natural images depicting traffic scenes in various adverse weather conditions, categorized into four groups: fog, snow, rain, and sand. The dataset shows significant diversity in vehicle type, size, direction, pose, lighting conditions, position, and occlusion. In addition, it emphasizes traffic scenarios in challenging weather, such as heavy snowfall, hail, and rain, and in extreme weather, such as sand and dust storms [13]. Figure 2 shows the four weather categories of the DAWN dataset, which has six classes: person, bicycle, car, motorbike, bus, and truck. The weather category statistics of the DAWN dataset can be seen in Table 1.

Image Pre-Processing
Data Augmentation: Data augmentation is a strategy often used to increase dataset diversity and improve model performance [14]. In this study, data augmentation was applied to the dataset to compare the results on the original dataset with the results after adding data augmentation.
In Scenario 1, testing was performed on the original DAWN dataset without data augmentation. In Scenario 2, horizontal flip data augmentation was added. Horizontal flip creates additional viewpoints and directions in the dataset, helping the YOLOv8 model detect objects from different perspectives, and it is applied to the training images only. The number of images per scenario can be seen in Table 2. In Scenario 1, without augmentation, 70% of the total data (983 images) is used for training, which is equivalent to 688 images. Validation uses 20% (197 images) and testing uses 10% (98 images). In Scenario 2, the training portion remains 70% of the total data (688 images), but each original image is augmented by flipping, which doubles the number of training images to 1376. The number of images for validation and testing remains the same as in Scenario 1. Figures 3 and 4 show the images from Scenarios 1 and 2, respectively. Horizontal flip augmentation flips the input image about its vertical axis (from left to right) with a certain probability [15]. The formula for performing a horizontal flip is described in Equation (1) and Equation (2).
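The horizontal flip step can be sketched in a few lines of Python. Equations (1) and (2) are not reproduced in this text, so the sketch assumes the common formulation: each pixel row is reversed, and for a YOLO-format label with normalized coordinates only the box center changes, from x to 1 - x. The function names and the label layout `(class_id, x_center, y_center, width, height)` are illustrative assumptions, not taken from the paper.

```python
def hflip_image(image):
    """Mirror an image, given as a list of pixel rows, from left to right."""
    return [row[::-1] for row in image]


def hflip_yolo_box(box):
    """Mirror one YOLO-format label (assumed layout, normalized to [0, 1]).

    Only the horizontal center moves; y, width, and height are unchanged.
    """
    class_id, x_center, y_center, width, height = box
    return (class_id, 1.0 - x_center, y_center, width, height)


def augment_with_hflip(samples):
    """Return the original (image, boxes) samples plus one flipped copy of each."""
    flipped = [
        (hflip_image(image), [hflip_yolo_box(b) for b in boxes])
        for image, boxes in samples
    ]
    return samples + flipped
```

Applied to the 688 training images of Scenario 1, adding one flipped copy per image gives the 1376 training images of Scenario 2.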

YOLOv8
Object detection algorithms can be categorized into two primary types: classification-based, also known as two-stage detectors, and regression-based, referred to as one-stage detectors. Classification-based algorithms follow a two-stage process. In the first stage, they identify a set of regions deemed likely to contain objects. In the second stage, Convolutional Neural Networks (CNN) are applied to search for objects within the previously marked regions [16], [17]. In contrast, single-stage or regression-based algorithms such as You Only Look Once (YOLO) detect objects in a single pass. The YOLO architecture does not select specific regions of interest. Instead, it predicts classes and bounding boxes directly, which makes detection faster. As a result, YOLO has become well known for fast and accurate object detection [5], [18].

RESULTS AND DISCUSSIONS

Training Results
The DAWN dataset training process uses Google Colaboratory with a T4 GPU, 12.7 GB of system RAM, 15 GB of GPU RAM, and 26.8/166.8 GB of storage. In the training process, the 983 images from the DAWN dataset are divided into 70% for training and 30% for validation data. Training was performed using YOLOv8n (nano), YOLOv8s (small), and YOLOv8m (medium) for 100 epochs. This study uses the mean Average Precision (mAP) to represent the average precision across all classes. N denotes the total number of object classes, P denotes the precision value, and R denotes the recall value; mAP is calculated using Equation (3) [20].
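As a rough illustration of the metric, the sketch below computes a per-class Average Precision as the area under the precision-recall curve and then averages it over the N classes. Equation (3) is not reproduced in this text, so the all-point interpolation used here is an assumption (one common convention), and the function names are hypothetical; the evaluation tool used in the study may interpolate differently.

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation).

    `recalls` must be sorted in increasing order, with matching `precisions`.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Build the precision envelope: non-increasing as recall grows.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangle areas between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """mAP = (1 / N) * sum of AP_i over the N classes."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

For example, a class whose curve passes through (R, P) = (0.5, 1.0) and (1.0, 0.5) has an AP of 0.75 under this convention.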
Batches 0 and 1 each consist of data samples randomly selected from the DAWN dataset. The model processes the data in batches and calculates the gradient used to optimize the model parameters. This is done through a forward pass, in which the model predicts the output, and a backward pass, in which the gradient of the error is computed. In the parameter update stage, the model is then updated using the computed gradient. The main goal is to reduce the model error so that the model predicts the given data better. After completing batches 0 and 1, the model is evaluated on validation data to check how much it has improved during one training iteration. The evaluation results may include metrics such as mAP, the confusion matrix, or other appropriate metrics. Figure 5 shows the training results for batch 0 and Figure 6 those for batch 1.
Figure 7 illustrates the precision and recall curves of the models after training. For YOLOv8n, the overall mAP is 58.5%, with the car class having the highest mAP at 81.9% and the truck class the second highest at 59.2%. For YOLOv8s, the overall mAP across all classes is 67.8%, with the car class highest at 81.6%, followed by the motorcycle class at 73.5%. For YOLOv8m, the overall mAP is 68.5%, where the car and bicycle classes share the highest mAP of 83.4%, followed by the motorcycle class at 74%. mAP evaluates the model's overall performance, with a higher mAP usually indicating a better model for detecting objects. The results also identify the best-performing object class in each model. The car class has the highest mAP in all three models, indicating that the models detect cars better than other object classes.
The results can be described using four categories: True Positive (TP), where the model predicts a label that matches the ground truth; False Positive (FP), where the model predicts a label that does not correspond to the ground truth; True Negative (TN), where the model correctly does not predict a label; and False Negative (FN), where the model fails to predict a label for an object that is present in the ground truth. Equations (4)-(5) describe precision and recall [21], [22], while Equation (6) describes the F1 score used to evaluate the model [23], [24].
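The three metrics follow directly from the TP, FP, and FN counts. The sketch below uses the standard definitions, which is what Equations (4)-(6) are assumed to state; the function names and the guards against empty denominators are illustrative choices.

```python
def precision(tp, fp):
    """Fraction of predicted objects that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of ground-truth objects that are found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

With hypothetical counts of 80 true positives, 20 false positives, and 120 false negatives, precision is 0.8 and recall is 0.4, so F1 is about 0.53. True negatives do not enter any of the three formulas.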
S determines the number of grid cells into which the image is divided (S × S cells in the standard YOLO formulation). B represents the number of predicted bounding boxes in each grid cell, while c represents the predicted class for each grid cell. The symbol p_i(c) refers to the confidence probability of class c in cell i. The center coordinates of anchor box j in cell i are given by x_ij and y_ij, while the height and width of the box are given by h_ij and w_ij. The confidence value for the box is denoted C_ij. Two weights, λcoord and λnoobj, determine the relative significance of localization and recognition during training.
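The equation these symbols belong to is not reproduced in this text. The symbols match the sum-squared-error loss of the original YOLO formulation, so that loss is written out below as an assumption about what the passage refers to; it is not necessarily the loss that YOLOv8 optimizes. The indicator 1^{obj}_{ij} is 1 when box j in cell i is responsible for an object (1^{noobj}_{ij} is its complement), and a hat marks the value each quantity is compared against (ground truth versus prediction).

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}^{\text{obj}}_{ij}
      \left[ (x_{ij} - \hat{x}_{ij})^2 + (y_{ij} - \hat{y}_{ij})^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}^{\text{obj}}_{ij}
      \left[ \left(\sqrt{w_{ij}} - \sqrt{\hat{w}_{ij}}\right)^2
           + \left(\sqrt{h_{ij}} - \sqrt{\hat{h}_{ij}}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}^{\text{obj}}_{ij} \left( C_{ij} - \hat{C}_{ij} \right)^2
  + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}^{\text{noobj}}_{ij} \left( C_{ij} - \hat{C}_{ij} \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}^{\text{obj}}_{i}
      \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The first two terms penalize localization error and are scaled by λcoord, the third and fourth terms penalize confidence error (with λnoobj down-weighting cells that contain no object), and the last term penalizes classification error.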

Testing Result
Tables 5 and 6 present the evaluation results for the six classes on the original DAWN dataset and with horizontal flip data augmentation for the YOLOv8n, YOLOv8s, and YOLOv8m versions. Horizontal flip data augmentation generates a variety of viewpoints and vehicle orientations in the dataset by flipping the vehicle image horizontally, which helps the YOLOv8 model learn to detect vehicles from different viewpoints and directions. Table 6 shows the testing results on the DAWN dataset with horizontal flip data augmentation: YOLOv8n achieved an mAP of about 60.9%, YOLOv8s about 62%, and YOLOv8m the highest mAP of about 71.2%. These results show that YOLOv8m is the most effective model for object detection on the DAWN dataset, especially after horizontal flip augmentation. Such augmentation can improve the model's ability to identify objects in various situations and orientations, which makes it a good choice for more complex object detection tasks. These results thus provide insight into model performance and the effect of data augmentation for object detection on the DAWN dataset. Table 7 compares these results with previous studies. YOLOv8m outperforms previous studies in the mAP evaluation metric, achieving 71.2% on the DAWN dataset, which represents an improvement in vehicle classification under various image conditions. Farid et al. [26] proposed a YOLOv5 method that showed only 34.2% mAP. Another study [27] implemented Faster R-CNN and obtained an mAP of 36.8%. Compared with these methods, YOLOv8m performs much better than the previous studies using YOLOv5 and Faster R-CNN. The high mAP of 71.2% indicates that the model detects objects in images better than the models used in the previous studies.
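The effect of augmentation on the test set can be tabulated from the approximate figures quoted in the text. The sketch below only restates those quoted values (they are rounded, and the full per-class numbers are in Tables 5 and 6); the dictionary layout and function names are illustrative.

```python
# Test mAP in percent, as quoted in the text (approximate values).
TEST_MAP = {
    "YOLOv8n": {"original": 58.0, "hflip": 60.9},
    "YOLOv8s": {"original": 68.5, "hflip": 62.0},
    "YOLOv8m": {"original": 68.9, "hflip": 71.2},
}


def map_change(results):
    """Change in mAP (percentage points) from the original to the flipped scenario."""
    return {
        model: round(scores["hflip"] - scores["original"], 1)
        for model, scores in results.items()
    }


def best_model(results, scenario):
    """Name of the model with the highest mAP in the given scenario."""
    return max(results, key=lambda model: results[model][scenario])
```

With these quoted figures, YOLOv8m is the best model in both scenarios. The per-model differences are worth checking against Tables 5 and 6, since the quoted YOLOv8s value with augmentation is lower than its value without.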

CONCLUSION
This research provides a brief overview of the YOLOv8 method for detecting objects in various weather conditions by testing and analyzing the results obtained. The research was carried out using the YOLOv8n, YOLOv8s, and YOLOv8m versions on the DAWN dataset with six classes: person, bicycle, car, motorcycle, bus, and truck. The mAP values for the test results are as follows. Without data augmentation, YOLOv8n achieved approximately 58%, YOLOv8s around 68.5%, and YOLOv8m roughly 68.9%. After applying horizontal flip data augmentation, YOLOv8n reached about 60.9%, YOLOv8s about 62%, and YOLOv8m about 71.2%. The application of horizontal flip data augmentation improves the performance of the three YOLOv8 models. The YOLOv8m model achieves the highest mAP value of 71.2%, indicating that it is highly effective in detecting objects after adding horizontal flip augmentation.

Table 4 shows the results of the DAWN dataset training process with horizontal flip data augmentation. After training with horizontal flip data augmentation, YOLOv8n achieved an average mAP of approximately 61.1% and YOLOv8s an mAP of about 67.7%. The YOLOv8m model reached an mAP of about 71.4%, which indicates that YOLOv8m has an excellent ability to detect objects after the augmentation. Based on the training results, it can be concluded that the YOLOv8m model with horizontal flip data augmentation has the best values, with a precision of 81.2%, a recall of 63.5%, and an mAP of 71.4%. In addition to the mAP training results, Precision-Recall curves were obtained for each scenario and each version of YOLOv8. Figure 7 shows the Precision-Recall curves for the original DAWN dataset and Figure 8 those for horizontal flip data augmentation.
Figure 7. Precision and Recall on the original DAWN dataset: (a) YOLOv8n, (b) YOLOv8s, and (c) YOLOv8m

Figure 8. Precision and Recall of Horizontal Flip Data Augmentation
Figure 8 illustrates the Precision and Recall curves of the models trained with horizontal flip data augmentation. For YOLOv8n, the average mAP is 61.1%, with the car class having the highest mAP at 80.9% and the bicycle class the second highest at 77.2%. For YOLOv8s, the overall mAP across all classes is 67.7%, with the car class highest at 84.2%, followed by the bicycle class at 75%. For YOLOv8m, the overall mAP is 71.4%, where the bicycle class has the highest mAP at 87.7%, followed by the car class at 84%. These results show the performance of the object detection models with horizontal flip data augmentation. The average mAP and the highest per-class mAP indicate how well each model detects objects in the image, with higher values indicating better detection performance. The car and bicycle classes consistently have the highest mAP among the classes.

Figure 9.
Figure 10. Layers, Parameters, and GFLOPs
Figure 10 contains information on the complexity of the models, including layers, parameters, and GFLOPs. The YOLOv8n model has 168 layers, 3,006,818 parameters, and 8.1 GFLOPs. It is the lightest model and requires the fewest computational resources, which allows faster training but limits its ability to handle more complex and diverse data variations. The YOLOv8s model has the same 168 layers as YOLOv8n but 11,127,906 parameters and 28.4 GFLOPs, so it is heavier and may require more computing resources and a longer inference time than YOLOv8n. The YOLOv8m model, with 218 layers, 25,843,234 parameters, and 78.6 GFLOPs, is the largest of the three, reflecting its greater depth and higher computational requirements.
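To make the size differences concrete, the sketch below expresses each model's parameter count and GFLOPs as a multiple of YOLOv8n, using only the figures quoted above. The dictionary layout and the function name are illustrative.

```python
# Complexity figures as quoted in the text for Figure 10.
MODELS = {
    "YOLOv8n": {"layers": 168, "params": 3_006_818, "gflops": 8.1},
    "YOLOv8s": {"layers": 168, "params": 11_127_906, "gflops": 28.4},
    "YOLOv8m": {"layers": 218, "params": 25_843_234, "gflops": 78.6},
}


def relative_cost(models, baseline="YOLOv8n"):
    """Parameter and GFLOPs ratios of every model relative to the baseline."""
    base = models[baseline]
    return {
        name: {
            "params_x": spec["params"] / base["params"],
            "gflops_x": spec["gflops"] / base["gflops"],
        }
        for name, spec in models.items()
    }
```

By these figures, YOLOv8s has roughly 3.7 times the parameters and 3.5 times the GFLOPs of YOLOv8n, and YOLOv8m roughly 8.6 times the parameters and 9.7 times the GFLOPs.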

Table 1. DAWN dataset categories

Table 2. Number of images per scenario
YOLO continues to evolve, and YOLOv8, the latest version of YOLO, was released by the Ultralytics team in January 2023. YOLOv8 introduced five versions [19], namely YOLOv8n (nano), YOLOv8s (small), YOLOv8m (medium), YOLOv8l (large), and YOLOv8x (extra-large). These models support various computer vision tasks, including object detection, segmentation, pose estimation, tracking, and classification. YOLOv8 can be executed via the command line interface (CLI) or installed as a Python package via pip. In addition, YOLOv8 offers seamless integration options for labelling, training, and deployment. Notably, the developers have introduced several enhancements to improve model accuracy by strengthening data augmentation during training. One notable feature of YOLOv8 is its ability to incorporate additional augmented images online during training, contributing to its strong performance [19]. YOLOv8 is more efficient than previous versions because it uses a larger feature map.
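As a minimal sketch of the Python-package route, the code below prepares training arguments for the three model sizes and, inside a function that is only run on demand, trains and validates each one with the `ultralytics` package (installed with `pip install ultralytics`). Several things here are assumptions rather than the study's actual setup: the dataset file `dawn.yaml` is a hypothetical name, the `fliplr` argument is the library's online flip probability (a different mechanism from the offline doubling of the training set described in this paper), and reading `metrics.box.map50` assumes the reported mAP is measured at an IoU threshold of 0.5.

```python
MODEL_WEIGHTS = ["yolov8n.pt", "yolov8s.pt", "yolov8m.pt"]


def build_train_args(data_yaml, use_hflip, epochs=100):
    """Training arguments for one run; fliplr is the per-image flip probability."""
    return {
        "data": data_yaml,  # hypothetical dataset description file
        "epochs": epochs,   # the study reports 100 epochs
        "fliplr": 0.5 if use_hflip else 0.0,
    }


def train_all(data_yaml, use_hflip):
    """Train and validate each model size; requires the ultralytics package."""
    from ultralytics import YOLO  # imported lazily so the helpers above stay usable

    results = {}
    for weights in MODEL_WEIGHTS:
        model = YOLO(weights)  # loads pre-trained weights
        model.train(**build_train_args(data_yaml, use_hflip))
        metrics = model.val()  # evaluate on the validation split
        results[weights] = metrics.box.map50
    return results
```

A run for the augmented scenario would then be `train_all("dawn.yaml", use_hflip=True)`. Other default augmentations of the library remain active unless they are disabled explicitly, so this does not isolate the flip as cleanly as the offline procedure does.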

Table 3. DAWN training results on the original dataset

Table 3 shows the results of the mean Average Precision (mAP) metric, which measures the performance of the YOLOv8 object detector on the validation dataset. A high mAP value indicates better performance in detecting objects. YOLOv8n achieved an mAP of 58.5%, while YOLOv8s performed better, with an mAP of approximately 67.8%. The YOLOv8m model has an mAP of about 68.5%, which is better than YOLOv8n and close to YOLOv8s. This shows that YOLOv8m has an excellent ability to detect objects in the validation dataset. Based on the training results, it can be concluded that the YOLOv8m model has the best values, with a precision of 84.4%, a recall of 60.2%, and an mAP of 68.5%. Furthermore, the bicycle and car classes have the highest mAP value of 83.4%.

Table 4. Horizontal flip data augmentation training result

Table 5. DAWN testing results on the original dataset

Table 5 shows the mAP results on the original DAWN dataset. YOLOv8n has an average mAP of about 58%. YOLOv8s improves on YOLOv8n with an mAP of about 68.5%, showing that it detects objects on the DAWN dataset better. YOLOv8m achieved the highest mAP of about 68.9%, making it the best of the three models for detecting objects on the original DAWN dataset. YOLOv8s also performs better than YOLOv8n, although not as well as YOLOv8m.

Table 6. Horizontal flip data augmentation testing result
Table 7. Comparison of methods on the DAWN datasets