Webcam-Based Bus Passenger Detection System Using Single Shot Detector Method

Buses are one of the most widely chosen means of transportation for the mobility of Indonesians, both as public transportation and for tours by travel groups. Because passengers get on and off at many tourist destinations, the assistant bus driver or group leader must work hard to ensure that the number of passengers boarding the bus matches the size of the group. Verifying the passenger count before departing for the next destination often takes a long time, and this conventional method delays the tour schedule. In this research, the author designs a webcam-based bus passenger face detection system using the Single Shot Detector (SSD) method that provides real-time information to bus drivers, assistant bus drivers, or group leaders. Development of the system reached 95% completion, and testing of bus passenger face detection under actual conditions produced an average accuracy of 77.5%.


INTRODUCTION
Traveling is a need for some Indonesians. A survey conducted by Kompas Research and Development shows that the majority of respondents have traveled. According to data from the Ministry of Tourism, the number of domestic tourists in 2014 reached 251 million, which means that the number of Indonesians traveling to tourist attractions is close to the population of Indonesia [1].
Tourist buses are a type of transportation often chosen by tourists, because their availability and scheduling are more flexible at tourist destinations. Seeing this demand, many private companies compete to provide comfort and safety, while keeping the level of comfort appropriate to the number of passengers [2].
Tourists, as consumers of tourist transportation, are often harmed by transportation providers through problems such as delayed tour departures. These delays occur because information on the number of passengers on the tour bus is not delivered quickly each time passengers get on and off, whether at rest stops or when visiting a tourist spot, since the assistant bus driver or group leader counts passengers manually. Passengers are inevitably left behind at times, so the tour bus must turn back to where they were left, which delays the bus schedule even further [2].
To overcome, and as a preventive measure against, errors caused by a lack of information on the number of passengers before departure, various Automatic Passenger Counting (APC) technologies have been developed, supported by advances in artificial intelligence and internet networks. These include infrared methods, pressure sensors, and computer vision systems. Infrared is the most widely used technology, but it is reported to have poor accuracy when many people pass through the sensor [3]. The number of passengers can also be calculated from seat status by placing a pressure sensor on each seat. This method does not guarantee accurate data either, because the system detects only weight (the object on a seat could be luggage rather than a person). It is also costly, because hardware must be installed in every seat [4].
Based on the problems described above, the author chose the SSD (Single Shot Detector) method. SSD is one of three popular approaches, alongside You Only Look Once (YOLO) and Region-proposal Convolutional Neural Network (RCNN). RCNN gives accurate results but is slow, while YOLO is fast but less accurate than RCNN. SSD is therefore the best choice here: it is more accurate than YOLO, though not as accurate as RCNN, and faster than RCNN, though not as fast as YOLO. SSD also has disadvantages, such as poor performance in detecting small objects [5]. Through this research, the system is expected to detect bus passengers by their faces with high accuracy and efficiency, so as to minimize the number of passengers left behind on tours.

METHODS
Figure 1 shows the block diagram of the system to be designed. The diagram can be described as follows:
1. Video data input is the initial input in this research. The video is obtained from a camera installed at the front of the bus, positioned so that all passengers can be seen clearly.
2. Image acquisition, performed with the camera hardware, is the process of obtaining images from the video source. This is the most crucial phase of the workflow, because faulty images render the entire process ineffective. Since a machine vision system interprets digital images of objects rather than the objects themselves, it is crucial to obtain images with appropriate contrast and clarity.
3. The images are converted to grayscale. Grayscale images have a pixel depth of 8 bits, or 256 levels of gray, with the lowest intensity value representing black and the highest intensity value representing white.
4. The images are then resized, that is, their resolution (horizontal and vertical dimensions) is reduced or increased as necessary. The system can then detect every face visible in the video frame. Once a face is found, a drawbox appears to indicate that the face has been detected.
5. Feature extraction converts raw data into numerical features that can be processed while preserving the information in the original data set. It produces better results than applying machine learning directly to the raw data.
6. Face alignment recognizes the geometric structure of people's faces in digital images. The shape of facial features, such as the nose and eyes, is determined automatically from the position and size of the face. The face alignment algorithm iteratively updates a deformable model that encodes prior knowledge about the shape or appearance of the face.
7. Face detection is a technique that locates a person's face in digital images or in frames captured by a camera. Its primary job is to find faces, not to recognize or analyze them; the technology can then be used to monitor and track people in real time. After this final phase, a complete face description can be produced, including the detection result as a bounding box, the facial landmarks, and the derived descriptors. The detection is displayed by drawing the bounding box on the canvas and returning the full face description, as is done in facial recognition.
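Steps 3 and 4 above can be sketched as follows. This is a minimal pure-Python illustration, not the author's implementation: it assumes frames are nested lists of (R, G, B) tuples with values 0 to 255, uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) for the grayscale conversion, and uses nearest-neighbour sampling for resizing. The function names are hypothetical.

```python
# Hypothetical sketch of preprocessing steps 3 (grayscale) and 4 (resize).
# Assumption: a frame is a list of rows, each row a list of (R, G, B) tuples (0-255).

def to_grayscale(frame):
    """Convert an RGB frame to 8-bit grayscale (0 = black, 255 = white)
    using the ITU-R BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]

def resize_nearest(image, new_w, new_h):
    """Resize a 2-D image by nearest-neighbour sampling: each output pixel
    copies the input pixel at the proportionally scaled position."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]

# Tiny 2x2 example frame: white, black, pure red, pure blue.
frame = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
gray = to_grayscale(frame)          # [[255, 0], [76, 29]]
small = resize_nearest(gray, 1, 1)  # downscale to one pixel: [[255]]
big = resize_nearest(gray, 4, 4)    # upscale: each pixel becomes a 2x2 block
```

In practice a library routine (for example OpenCV's `cv2.cvtColor` and `cv2.resize`) would do this work; the explicit loops only make the arithmetic of each step visible.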
The system then opens the display of the webcam integrated with the computer. Figure 2 shows the webcam display accessed through the web and integrated with the computer.

Figure 2 Webcam display

2.2 Image Acquisition
The first stage of this system is image acquisition, that is, obtaining images from sources such as datasets or frames generated by a webcam integrated with the computer. These images are later used for training and data validation with a single deep neural network technique. The placement of the detection system is illustrated in Figures 3 and 4.

Processing Single Shot Detector
Single Shot Detector (SSD) is a method for detecting certain objects in images or videos using a single deep neural network. It is one of the most popular object detection algorithms because it offers better accuracy and image processing speed than other methods such as Faster R-CNN and YOLOv1. Figure 5 shows the architecture of the Single Shot Detector (SSD). The SSD-based face recognition process follows a simple program flow: face recognition starts, the system stops when the detection condition is met, and otherwise the program resumes, repeating the initial process until the system detects a face. The author's face recognition program, which uses the Single Shot Detector algorithm, was developed with Visual Studio Code and XAMPP and runs online.
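Because SSD predicts many overlapping candidate boxes in a single forward pass, SSD-style detectors typically apply greedy non-maximum suppression (NMS) so that each face keeps only its highest-scoring box. The sketch below is a generic NMS in Python, not the author's code or the internals of any particular library; the (x1, y1, x2, y2) box format and the 0.45 IoU threshold are assumptions.

```python
# Generic greedy non-maximum suppression (NMS), as used after SSD-style detection.
# Assumptions: boxes are (x1, y1, x2, y2) pixel tuples; 0.45 is an illustrative threshold.

def iou(a, b):
    """Intersection over union (IoU) of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.45):
    """Return indices of the boxes kept, in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping candidates for one face, plus a second face.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the distant third box survives as a separate face.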

Face Alignment
Face alignment is the next phase. Here, a small but accurate 68-point facial landmark predictor from Face-api.js is used. The entire model weighs 200 kilobytes, small enough for web environments. The predicted 68 landmarks are then aligned with reference landmarks. An example of face alignment is shown in Figure 6.
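The alignment of predicted landmarks to reference landmarks can be illustrated with a least-squares similarity transform (rotation, uniform scale, and translation). The sketch below treats 2-D points as complex numbers, which gives a closed-form solution. It is an assumed, generic technique rather than the procedure inside Face-api.js, and the three hypothetical points stand in for the 68 landmarks.

```python
# Least-squares similarity alignment of predicted landmarks onto reference landmarks.
# Assumption: a generic technique for illustration, not Face-api.js internals.
# A 2-D point (x, y) is treated as the complex number x + iy, so a similarity
# transform is w = a*z + b (|a| = scale, arg(a) = rotation, b = translation).

def fit_similarity(src, dst):
    """Return complex (a, b) minimising sum |a*z + b - w|^2 over point pairs."""
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    zm = sum(z) / len(z)  # centroid of source points
    wm = sum(w) / len(w)  # centroid of target points
    zc = [p - zm for p in z]
    wc = [q - wm for q in w]
    # Closed-form least-squares solution on centred points.
    a = sum(p.conjugate() * q for p, q in zip(zc, wc)) / sum(abs(p) ** 2 for p in zc)
    b = wm - a * zm
    return a, b

def apply_similarity(a, b, points):
    """Map (x, y) points through w = a*z + b."""
    out = []
    for x, y in points:
        p = a * complex(x, y) + b
        out.append((p.real, p.imag))
    return out

predicted = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical detector output
reference = [(5.0, 3.0), (5.0, 5.0), (3.0, 3.0)]  # hypothetical reference points
a, b = fit_similarity(predicted, reference)
aligned = apply_similarity(a, b, predicted)
```

Here the fit recovers a = 2j (a scale of 2 with a 90° rotation) and b = 5 + 3j (the translation), so the aligned points coincide with the reference points.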
The face is then positioned by the system for facial recognition. Architectures such as the Single Shot Detector (SSD) are used to calculate facial descriptors for facial recognition. A face descriptor is a vector of 128 values that represents a face in a 128-dimensional vector space. The Euclidean distance between vectors measures how similar they are: the smaller the distance, the more similar the faces. The Euclidean metric is illustrated in Figure 7. Face detection is an essential first step in identifying human faces; it divides the image into two parts, face regions and non-face regions. Processing the whole image in this way is computationally expensive, so the calculation area must be limited to reduce computation in the running program. In addition, face detection localizes the face region flexibly, so that it can examine every input within the frame and distinguish faces from non-faces [4] [15]. In this work, the Single Shot Detector (SSD) technique is used for face detection. The flowchart is shown in Figure 8.
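To make the distance comparison concrete, the sketch below computes the Euclidean distance between two 128-value descriptors and treats them as the same face when the distance falls below a threshold. The 0.6 threshold and the function names are assumptions for illustration (0.6 is a value commonly used with Face-api.js descriptors), and the descriptors are synthetic, not real face data.

```python
import math

# Euclidean distance between 128-dimensional face descriptors.
# Assumptions: descriptors are plain lists of floats; 0.6 is an illustrative threshold.

def euclidean_distance(d1, d2):
    """Straight-line distance between two descriptors of equal length."""
    if len(d1) != len(d2):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))

def is_same_face(d1, d2, threshold=0.6):
    """Smaller distance means more similar; below the threshold counts as a match."""
    return euclidean_distance(d1, d2) < threshold

# Synthetic descriptors that differ only in their first two components.
u = [0.0] * 128
v = [0.0] * 128
v[0], v[1] = 0.3, 0.4
print(euclidean_distance(u, v))  # approximately 0.5
print(is_same_face(u, v))        # True
```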

Figure 8 Flowchart of Face Detection and Tracking
After the face detection and tracking procedure is completed, faces in the bus are detected as shown in Figure 9. Feature extraction and tracking is the process of monitoring distinctive features, such as points and contours, in a facial image. The extracted features include the nose, lips, eyes, eyebrows, and other facial features. Face detection accuracy is also high when the computation on the facial contour yields a good value. Face contours are identified by scanning the first frame of the video, where the first pixel identified as skin color is taken as the first contour point (on either the left or right side of the head).
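The contour-seeding step can be sketched as a row-by-row scan for the first skin-colored pixel. The paper does not state its skin-color criterion, so the explicit RGB rule below (a widely cited rule often attributed to Kovač, Peer and Solina) is an assumption, as are the frame format and function names.

```python
# Hypothetical sketch: find the first skin-coloured pixel in a frame as the
# first contour point. The RGB skin rule is an assumed, commonly cited heuristic.

def is_skin(r, g, b):
    """Explicit RGB skin rule for daylight conditions (assumption)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def first_contour_point(frame):
    """Scan rows top to bottom, pixels left to right; return (x, y) of the
    first skin pixel, or None if the frame contains no skin pixel."""
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            if is_skin(r, g, b):
                return (x, y)
    return None

bg = (30, 30, 30)       # dark background pixel
skin = (200, 140, 110)  # skin-like pixel
frame = [[bg, bg, bg],
         [bg, bg, skin],
         [skin, skin, skin]]
print(first_contour_point(frame))  # (2, 1)
```

A fixed RGB rule is sensitive to lighting, which is one reason the morning, afternoon, and night tests reported for the system matter.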

2.6 Drawbox Counter
At this stage, the program draws a drawbox around each bus passenger's face. All drawboxes that appear on the system are then totaled. An example of a drawbox on the display is shown in Figure 11.
Figure 11 Drawbox

The number of drawboxes increases as more faces appear on the system display. The total number of drawboxes is shown on the system display, together with the accuracy of the passenger face detection system. Based on the tests in Tables 1 and 2, the system can detect faces under varying brightness conditions in the morning, afternoon, and night, and can distinguish between faces detected intact and faces that are partially covered.
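The drawbox counter can be sketched as counting detections above a confidence threshold and comparing the total with the actual number of passengers. The detection format (a dict with a `score` key), the 0.5 threshold, and the function names are hypothetical, and the accuracy definition (detected faces divided by actual passengers) is an assumption about how the reported percentages are computed.

```python
# Hypothetical drawbox counter and per-frame detection accuracy.
# Assumptions: each detection is a dict with a "score" key; 0.5 is an illustrative threshold.

def count_drawboxes(detections, min_score=0.5):
    """Count the detections confident enough to be drawn as a box."""
    return sum(1 for d in detections if d["score"] >= min_score)

def detection_accuracy(detected, actual):
    """Detected faces as a percentage of passengers actually present."""
    if actual <= 0:
        raise ValueError("actual passenger count must be positive")
    return detected / actual * 100

detections = [{"score": 0.92}, {"score": 0.81}, {"score": 0.35}]
boxes_drawn = count_drawboxes(detections)       # 2 (the 0.35 detection is dropped)
print(detection_accuracy(boxes_drawn, 4))       # 50.0
```

This ratio does not penalize false positives: a box drawn on a non-face would inflate the count, so false detections would need to be tracked separately.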

Testing face detection based on seat distance
At this stage, the author tested face detection using face-like objects to find out whether the system can distinguish between objects that resemble human faces and actual human faces.
Based on Table 3 and Figure 14, the system does not detect face-like objects and detects only human faces, so such objects do not affect the performance of the system. Figure 15 and Table 4 show the detection of bus passengers by their faces, using a camera facing the passengers; five samples yielded an average accuracy of 77.5%. This test involved 16 passengers, counted from the camera's actual point of view. Errors still occur when some passenger faces are not detected completely, because passengers on the bus often move and/or do not face the camera. This can be minimized by asking all passengers to look at the camera for the final count before the assistant bus driver or group leader confirms departure. In addition, placing the camera at a viewing angle that covers all passenger faces will provide better data.
This research shows that the system can detect faces with an accuracy of 77.5% based on the test data. Face detection uses extracted features such as the eyebrows, eyes, mouth, and nose. The bus passenger detection system can be applied in practice. For further development, it could provide information on the number of passengers over several sessions of bus operation.
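Assuming that accuracy in each sample is the number of detected faces d_i divided by the number of passengers actually present P, and that the reported figure is the mean over N samples, the 77.5% result can be related back to face counts:

```latex
\text{Acc}_i = \frac{d_i}{P}\times 100\%, \qquad
\overline{\text{Acc}} = \frac{1}{N}\sum_{i=1}^{N}\text{Acc}_i
\;\Longrightarrow\;
\sum_{i=1}^{N} d_i = \frac{\overline{\text{Acc}}}{100\%}\,N P
= 0.775 \times 5 \times 16 = 62
```

Under that assumption, 62 of the 80 face instances across the five samples were detected, or about 12.4 of the 16 faces per sample.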

Figure 1 Block Diagram of System

Figure 3 Bus Plan

Figure 6 Face alignment example

Figure 10 shows the result of feature extraction on the face. Identifying human facial expressions in real time requires two factors to be considered: accuracy and efficiency. The necessary processes are face detection and tracking, feature extraction and tracking, feature reduction, and finally separation.

Figure 9 Face Detection on the Bus

Figure 12 (a) in the morning at a distance of 30 cm, (b) in daylight with a distance of 30 cm, and (c) at night with a distance of 30 cm

Testing face detection under actual conditions
At this stage, the author tested face detection under real conditions, namely on a bus with many passengers.