Convolutional Neural Network and LSTM for Seat Belt Detection in Vehicles using YOLO3

Electronic violation detection systems are being deployed in many countries using CCTV cameras installed at points along highways and toll roads. However, developing an image-based violation detection system with a high level of accuracy remains a challenge for researchers. The violations detected include failure to wear a seat belt and the use of a cell phone while driving; detection is made more difficult by the number of vehicles, vehicle speed, and lighting. This research developed a traffic violation detection system using YOLO3. YOLO is used as the basic CNN architecture, which is then combined with LSTM. The dataset was obtained from RoboFlow Universe and comprises 199 front-view car images, of which 82 images with seat belts and 78 images without seat belts are used for the training process. The CNN extracts features from the input images, while the LSTM performs the prediction. The performance of the CNN+LSTM algorithm is evaluated using accuracy in both the training process and the testing process. In the training process, the model is compared with several baseline detection models: CNN, VGG16, ResNet50, MobileNetV2, YOLO3, and YOLO3+LSTM. The results show that YOLO3+LSTM has the highest accuracy, at 89%. In the testing process, the CNN+LSTM model is compared with the baseline method, CNN. The results show that the CNN+LSTM model achieves a higher accuracy of 89%, whereas the baseline CNN model achieves 85%.


Introduction
Traffic violations have become a serious concern as traffic volume grows with the increasing number of cars and other vehicles [1], [2]. It is very important to control the increase in accidents that result from violations of traffic rules. Common traffic violations include running a red light, not wearing a helmet, not using a seat belt, and violating road markings. Manual ticketing is being phased out as part of an effort to increase police professionalism. One government effort that has recently been implemented is electronic ticketing, or ETLE (Electronic Traffic Law Enforcement) [3], [4]. ETLE is a new method of enforcing traffic discipline that police officers use to detect traffic violations. ETLE can be installed on police vehicles, which makes it more flexible. In addition, static ETLE is implemented using CCTV cameras placed at certain points on protocol roads, as shown in Figure 1. ETLE is an image-based violation detection system that is increasingly being deployed in Indonesia. However, such a system is challenging for researchers because the number of vehicles, vehicle speed, and lighting all hinder the detection process.
Image-based traffic violation detection systems draw on image processing, artificial intelligence, and deep learning to detect objects and their classes in images and videos [5]-[13]. With the increasing rate of traffic casualties, especially among car drivers, enforcing seat belt regulations has become very important for protecting driver safety. For this reason, this research focuses the development of a traffic violation detection system on detecting seat belt use by car drivers.
Several studies have addressed the development of traffic violation detection systems. Kashevnik [5] developed a system for detecting seat belts in vehicle cabins and monitoring driver behaviour [14] using the YOLO method. However, this detection system still needs to reduce its number of parameters and improve its performance.
Ravish [10] developed a system using the YOLO method to detect traffic violations such as running a red light, not wearing a helmet, and not using a seat belt. However, violations can only be detected during the day. Franklin [11] developed a system using the YOLO method to detect traffic violations related to vehicle speed, traffic signals, and number of vehicles. However, further development is still needed to reduce the computation time at high traffic volumes.
Chun [15] developed detection of seat belt use and of driver and passenger behaviour using the NADS-Net architecture, a CNN-based model, with cameras and infrared light in the vehicle cabin. However, this method is susceptible to data collection bias. Yang [16] developed a seat belt detector using a deep learning algorithm with the MobileNet V2 architecture, with attention to whether the belt is used correctly. Furthermore, Yi [17] detects the correct use of seat belts using the Part Affinity Field (PAF) algorithm, with cameras placed in the vehicle cabin, identifying whether the driver is wearing the seat belt correctly based on human joint points.
Several studies related to the development of a traffic violation detection system have been described. Table 1 summarizes the methods used in previous studies.
In this research, the detection model is limited to car drivers; it will later be extended to detect seat belt use by front-seat passengers. The YOLO model is used to detect seat belt use by drivers, and the classifier uses CNN+LSTM with hyperparameter tuning to obtain higher accuracy than other methods. The performance of the algorithm is evaluated using the RMSE value and compared with several baseline detection models: CNN, VGG16, ResNet50, MobileNetV2, YOLO3, and YOLO3+LSTM.

Research Methods
This research develops a system for detecting seat belt violations through static cameras using YOLO and the CNN+LSTM algorithm. As shown in Figure 2, image data is captured by CCTV cameras placed on the road, and the vehicle detection module takes pictures of the cars that pass. Next, the windshield detection module determines the position of the car's windshield. After that, the seat belt detection module determines the position of the driver and detects seat belt use. The CNN and LSTM algorithms are used to classify violations by detecting seat belt use. This research further develops the hybrid CNN and LSTM model with hyperparameter tuning to obtain high accuracy. The performance of the CNN algorithm is then evaluated using the RMSE value and compared with other algorithms.
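The module sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the names `Detection`, `crop`, and `detect_violations` are hypothetical, images are represented as plain nested lists, and the three detector callables stand in for trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels; max coordinates are exclusive.
Box = Tuple[int, int, int, int]
Image = Sequence[Sequence[int]]  # rows of pixel values; stand-in for a real image type


@dataclass
class Detection:
    vehicle_box: Box      # position of the car in the frame
    windshield_box: Box   # position of the windshield inside the car crop
    label: str            # "violation" or "no_violation"


def crop(image: Image, box: Box) -> List[List[int]]:
    """Return the sub-image inside the box."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def detect_violations(
    frame: Image,
    find_vehicles: Callable[[Image], List[Box]],
    find_windshield: Callable[[Image], Optional[Box]],
    wears_seat_belt: Callable[[Image], bool],
) -> List[Detection]:
    """Run the vehicle, windshield, and seat belt modules in order on one frame."""
    results = []
    for vehicle_box in find_vehicles(frame):
        car = crop(frame, vehicle_box)
        windshield_box = find_windshield(car)
        if windshield_box is None:
            continue  # no usable windshield view, so this car is skipped
        roi = crop(car, windshield_box)
        label = "no_violation" if wears_seat_belt(roi) else "violation"
        results.append(Detection(vehicle_box, windshield_box, label))
    return results


# Usage with stub modules in place of trained detectors (all values hypothetical).
frame = [[0] * 8 for _ in range(6)]
detections = detect_violations(
    frame,
    find_vehicles=lambda img: [(0, 0, 8, 6)],
    find_windshield=lambda car: (2, 1, 6, 3),
    wears_seat_belt=lambda roi: False,
)
print(detections[0].label)  # violation
```

Passing the modules as callables keeps the control flow independent of any particular detector, so a YOLO-based detector or a CNN+LSTM classifier could be substituted without changing the loop.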
The research begins by identifying needs, namely determining the components required to develop a prototype of an intelligent system for detecting seat belt violations while driving. The model is then developed, starting from the detection of the seat belt object in the RGB image.
Next, a classification model is developed to detect seat belt use. Detection begins with a pre-processing stage, namely noise removal and normalization of the raw data. Pre-processing is followed by the feature extraction stage, which yields the features used to detect seat belt use.
The next process is to develop a CNN method with modifications and hyperparameter adjustments for detecting seat belt use. The next stage is to assess existing conditions and then determine an appropriate model for data analysis. From this model, a prototype was developed in the laboratory based on the designed system. Efforts were also made to optimize the performance of the resulting system. This study uses the YOLO method to detect seat belt use in four-wheeled vehicles through static cameras. The YOLO method was developed by modifying the CNN algorithm in combination with LSTM and hyperparameter settings. Seat belt use is detected through an image processing pipeline with several stages, as shown in Figure 3.
Data in the form of images and videos is captured using RGB cameras installed statically at several points on roads with heavy traffic. Data was collected at several locations and under different conditions: morning, afternoon, cloudy, and rainy. The recorded data is then prepared for use as a dataset. The data covers different types of vehicles, namely large, medium, and small. The dataset is estimated to consist of 100 images for the initial research.
At the data pre-processing stage, the images are processed so that they are suitable for model training. The methods used in image pre-processing include: cropping, which crops the image so that only the front view of the driver is visible; resizing, which resizes the image to 128 x 128; and blurring, which smooths the image using the Gaussian method. Pre-processing can be performed using the preprocess_input() function from the Keras library.
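The three pre-processing steps listed above can be sketched without any external dependencies as follows. This is only an illustration under stated assumptions: the image is a grayscale nested list, resizing uses nearest-neighbour sampling, and the 3 x 3 kernel with sigma 1.0 is a hypothetical choice, since the kernel size and interpolation method are not specified here. In practice an image library would normally be used instead.

```python
import math
from typing import List

Gray = List[List[float]]  # grayscale image as rows of pixel intensities


def crop(image: Gray, top: int, left: int, height: int, width: int) -> Gray:
    """Keep only the rectangle that contains the driver's front view."""
    return [row[left:left + width] for row in image[top:top + height]]


def resize_nearest(image: Gray, out_h: int = 128, out_w: int = 128) -> Gray:
    """Resize with nearest-neighbour sampling (an assumed, simple choice)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def gaussian_kernel(size: int = 3, sigma: float = 1.0) -> List[List[float]]:
    """Build a size x size Gaussian kernel whose weights sum to 1."""
    half = size // 2
    raw = [
        [math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
         for dx in range(-half, half + 1)]
        for dy in range(-half, half + 1)
    ]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def gaussian_blur(image: Gray, size: int = 3, sigma: float = 1.0) -> Gray:
    """Smooth the image; border pixels are handled by clamping coordinates."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for ky in range(size):
                for kx in range(size):
                    yy = min(max(y + ky - half, 0), h - 1)
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += kernel[ky][kx] * image[yy][xx]
            row.append(acc)
        out.append(row)
    return out


def preprocess(image: Gray, top: int, left: int, height: int, width: int) -> Gray:
    """Crop, resize to 128 x 128, then blur, in the order listed in the text."""
    return gaussian_blur(resize_nearest(crop(image, top, left, height, width)))


# Usage on a synthetic 200 x 300 gradient image (crop coordinates are hypothetical).
raw = [[float((x + y) % 256) for x in range(300)] for y in range(200)]
result = preprocess(raw, top=20, left=50, height=160, width=200)
print(len(result), len(result[0]))  # 128 128
```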
A CNN is used as the classification model to determine seat belt violations. YOLO is used as the basic CNN architecture, which is then combined with LSTM. Figure 4 shows the YOLO architecture with the combination of the CNN and LSTM algorithms. In the initial implementation, the CNN parameter values are set as in Figure 5. Image processing and analysis are performed on a single image object. The analysis targets objects in the windshield area, viewed from various image angles. The image analysis model is a CNN model consisting of input layers, convolution layers, and LSTM layers, which are then connected to one fully connected layer. Feature extraction is performed on the image object in the windshield area of the car. In the training process, performance is compared with several baseline detection models: CNN, VGG16, ResNet50, MobileNetV2, YOLO3, and YOLO3+LSTM. In the testing process, the proposed method, CNN+LSTM, is compared with the baseline method, CNN.
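To make the layer ordering described above concrete (input, convolution, LSTM, fully connected), the following sketch traces tensor shapes through such a stack. All layer sizes are assumptions for illustration: the filter counts, the 3 x 3 convolutions with 2 x 2 pooling, the 64 LSTM units, and the choice to treat each row of the final feature map as one LSTM time step are hypothetical and are not taken from Figure 4 or Figure 5.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(
    input_hw: int = 128,                        # matches the 128 x 128 resize step
    channels: int = 3,                          # RGB input
    conv_filters: Tuple[int, ...] = (32, 64),   # assumed filter counts
    lstm_units: int = 64,                       # assumed LSTM width
    num_classes: int = 2,                       # seat belt / no seat belt
) -> List[Tuple[str, Tuple[int, ...]]]:
    """List (layer name, output shape) for a conv -> LSTM -> dense stack."""
    shapes: List[Tuple[str, Tuple[int, ...]]] = [
        ("input", (input_hw, input_hw, channels))
    ]
    size = input_hw
    for i, filters in enumerate(conv_filters, start=1):
        size = conv_out(size, kernel=3, padding=1)   # 3x3 convolution, size preserved
        shapes.append((f"conv{i}", (size, size, filters)))
        size = conv_out(size, kernel=2, stride=2)    # 2x2 pooling halves the size
        shapes.append((f"pool{i}", (size, size, filters)))
    # Treat each row of the final feature map as one time step for the LSTM.
    timesteps, features = size, size * conv_filters[-1]
    shapes.append(("reshape", (timesteps, features)))
    shapes.append(("lstm", (lstm_units,)))       # last hidden state only
    shapes.append(("dense", (num_classes,)))     # fully connected output layer
    return shapes


for name, shape in trace_shapes():
    print(f"{name:8s} {shape}")
```

The reshape step is where the convolutional features become a sequence; other ways of forming the sequence (for example, by columns or by image patches) would be equally consistent with the description.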

Results and Discussions
The dataset was obtained from RoboFlow Universe in the form of 199 front-view car images. The training set consists of 82 images with seat belts and 78 images without seat belts. The validation set consists of 40 images: 20 with seat belts and 20 without. The images were taken in bright lighting conditions, and the driver is clearly visible. Figures 6 and 7 show several images from the dataset. Seat belt detection uses the car's windshield as the object so that the driver's seat belt use can be analyzed.
The initial step in detection is to detect the presence of a vehicle and then take the Region of Interest (ROI) on the car's windshield. YOLO as a classification model has high detection speed and accuracy. The YOLO model used in this research performs well compared with other basic CNN models in seat belt detection. Table 2 shows a comparison of accuracy in the training process.
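Taking the windshield ROI from a detected vehicle amounts to simple box arithmetic, sketched below. The function names and the box convention (x_min, y_min, x_max, y_max, with the windshield box expressed relative to the vehicle crop) are assumptions made for this illustration; a detector may report boxes in a different format.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def clamp_box(box: Box, width: int, height: int) -> Box:
    """Clip a box to the image so that cropping never leaves the frame."""
    x0, y0, x1, y1 = box
    x0, x1 = max(0, min(x0, width)), max(0, min(x1, width))
    y0, y1 = max(0, min(y0, height)), max(0, min(y1, height))
    return (x0, y0, x1, y1)


def to_frame_coords(inner: Box, outer: Box) -> Box:
    """Translate a box found inside the vehicle crop back to frame coordinates."""
    ox, oy = outer[0], outer[1]
    return (inner[0] + ox, inner[1] + oy, inner[2] + ox, inner[3] + oy)


def windshield_roi(vehicle: Box, windshield_in_vehicle: Box,
                   frame_w: int, frame_h: int) -> Box:
    """Return the windshield ROI in frame coordinates, kept inside the vehicle box."""
    vehicle = clamp_box(vehicle, frame_w, frame_h)
    x0, y0, x1, y1 = to_frame_coords(windshield_in_vehicle, vehicle)
    return (max(x0, vehicle[0]), max(y0, vehicle[1]),
            min(x1, vehicle[2]), min(y1, vehicle[3]))


# Hypothetical boxes: the vehicle box extends past the 640 x 480 frame and is clipped.
roi = windshield_roi((100, 50, 700, 500), (150, 40, 600, 200), frame_w=640, frame_h=480)
print(roi)  # (250, 90, 640, 250)
```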
The test results in Figure 11 show that the YOLO and LSTM model reaches an accuracy of 89% in an experiment of 40 epochs and begins to converge at the 4th epoch. Meanwhile, the basic CNN model reaches an accuracy of 85%. This research has succeeded in testing the accuracy of the CNN and LSTM models for detecting seat belt use by car drivers. Relative to the state of the art, the results of this experiment have several advantages over previous research. Research [5], [14], [15], [17], [21] implemented a detection method using a camera in the car cabin, which is limited to detecting a single car object. Research [10], [16] applied static cameras and MobileNet v2 to detect several types of violations across many car objects, including seat belt use. However, such approaches require high computation when the volume of detected objects increases. The YOLO model combined with LSTM in this research therefore produces higher accuracy than the other models, according to the results shown in Table 2 and Figure 11.
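For reference, the two evaluation measures named in this paper, accuracy and RMSE, can be computed as in the sketch below. The labels and predictions are hypothetical and are not the experimental data; the function names are likewise illustrative.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of images whose predicted class matches the label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between labels and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical labels: 1 = seat belt worn, 0 = seat belt not worn.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 1, 0]
print(accuracy(labels, predictions))  # 0.75
print(rmse(labels, predictions))      # 0.5
```

When predictions are hard 0/1 class labels, each error contributes exactly 1 to the squared sum, so RMSE equals the square root of one minus the accuracy; the two measures differ in information only when RMSE is computed on predicted probabilities.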

Conclusions
Disciplined use of a seat belt while driving is very important for reducing the impact of an accident. With the increasing rate of traffic casualties, especially among car drivers, enforcing seat belt regulations has become very important for protecting driver safety. For this reason, this research focuses the development of a traffic violation detection system on detecting seat belt use among car drivers. This study uses the YOLO model to detect objects and a hybrid CNN and LSTM model for classification. The test results show that the YOLO and LSTM model reaches an accuracy of 89% in an experiment of 40 epochs and begins to converge at the 4th epoch. Meanwhile, the basic CNN model reaches an accuracy of 85%.

Figure 3. The process of detecting a violation of the use of seat belts

Figures 8 and 9 show the car and car windshield objects, which form the ROI area. The objects detected in this ROI are then classified as a violation if the driver is not using a seat belt and as a non-violation if the driver is using one. The CNN model is used to classify violations and non-violations. The CNN model was chosen because it has been shown to perform well in classifying image objects and because many references are available from image classification experiments. Furthermore, this study uses YOLO, which is a CNN-based method.

Figure 11. The test results of the CNN+LSTM model architecture

Table 1. Research related to traffic violation detection systems

Table 2. Comparison of accuracy in the training process of base models

Several parameters in the architecture have been determined, including the number of hidden layers, activation function, learning rate, epochs, batch size, optimizer, and loss function. The architectural model used in this research is shown in Figure 10.