Multithreading Application for Counting Vehicle by Using Background Subtraction Method

Image and video processing has become important part in intelligent transportation system (ITS) application, especially for collecting road traffic data. Pictures that already collected by a charged coupled device (CCD) camera usually being processed by several image processing algorithms and the application’s code will be executed in a large number of iteration because many algorithms are getting involved in processing the frame which captured by the camera. Typical application will process the first frame until finish and then continue to the next frame, so the application must wait until the first frame being processed. If the algorithms that executed quite complex and have a significant running time there will be a dropped frame and the time difference between data acquisition and real time video is divided by large margin. We proposed an implementation of multithreading to boost the application performance so the data can be acquire in real time and every new frame could be processed in short time. The application performance before and after using a multithreading is known by comparing the data acquisition time that stored in the database. The application effectiveness could define by running a multiple video streaming in same resolution. Keywords—background subtraction, image, intelligent transport system, multithreaded, parallel processing  ISSN (print): 1978-1520, ISSN (online): 2460-7258 IJCCS Vol. 14, No. 3, July 2020 : 309 – 318 310


INTRODUCTION
In recent years, image processing has been used for collecting the road traffic data by using camera as a sensors. As proposed by Anandhanarayanan and Govindaraj (2013) Image processing encompasses processes whose inputs and outputs are images and extract that images attributes [1]. Based on the definition, the image data that stored on the storage will be available to be processed as an input by some image processing algorithms. The real time processing is needed here in order to maintain the algorithm's speed in processing every frame. The multiple frames which came from different resources couldn"t be processed at the same time without fast time execution. We prefer a multi-threading approach to increase an algorithm speed so it could provide the data extraction in a real time.
Shanty and Anthony (2009) stated that multi-threaded process has multiple points of concurrent execution within the process [2]. The situation here where the single thread waiting for a resources pictures / frames made we need run another thread to process another frame from different resources. We use concurrency to run multiple threads in a single process. Figure 1. Proposed by Alda and Silvana (2013) show two threads within a process communicating each other through shared memory [3]: Figure 1 Communication between a pair of threads in a single process The thread in Figure 1 shared the same address space. Thread 1 capture frames and thread 2 consumes the frames for image processing. So multiple frames grabbed and processed in different threads by using a circular buffer The producer consumer problem which proposed by Dijkstra (1972) has purpose to make save way of passing tasks from producer to consumer threads. One or more producers are generating and placing data in a buffer, then one or more consumers are taking items out of the buffer [4]. In this work we only use 1 producer and 1 consumer thread, and try to utilize the idle time while waiting for the next frame to do the processing. If the capture thread grab the frame at 20fps (50ms) and the time that we need to capture the frame is only 2ms, in this case we have 48ms idle time. We could use this idle time to do some processing and it must be shorter than 48ms. If the processing is too long and exceed the 48ms, the frame/image will put as queue in the buffer. If buffer already full then we need to drop/discard some picture in the buffer.  Because we use multiple cameras and each camera have own capture and processing thread, so we will try to make the same buffer can be accessed by two or more threads at the same time on different parts. It means the different processing threads from each cameras could access the buffer at different parts (this model is depicted in Figure 2). The producer consumer model presented above makes it possible to write highly concurrent multithreaded applications and it could exceed the mutex-based program if we use a multiprocessor machine, since more than one threads could access the different parts of buffer at the same time.
Outcome that expected by using multithread applications is to get the time precision data from extracting the real time video. The idle time that available between each frame capturing is being used to speed up the image processing algorithm, so the data could derived from the image in no matter of time. With this outcome we could process the traffic data in real time with no delay and give precision output in collecting the road traffic data (no information loss that caused by dropped frame).

METHODS
In this section will be explained about the research design that used to measure the effectiveness of multithreading implementation in speed up the image processing process. We tried to compare between the method that didn"t use the multithreading process and the another one that implementing the multithreading. In Figure 3 we could see that if we didn"t use a multithreading, the frame from each devices will be processed by separate algorithms. Which means that the frame from the "device 1" will have it"s own capture and preprocessing unit, as well as "device 2". So the idle time couldn"t be used to process another frame. Before the first frame has been processed by the algorithms, another frame must wait for 48ms (times required until the first frame finished being processed). Figure 3 above presented the research step and design of this paper:

Input
The data inputs are comes from every camera devices that used to capture an image. In this case the video was captured to be analyzed by image processing.

2 Process
Video that has been captured will undergo a processing stage. In this stage performed the video processing using a background subtraction algorithm [5]. After the vehicle could be detect as an object, we use a two line counting method to capture the time when the vehicle passing the counting line. All image processing step shown by Figure. 4.

Figure 4 Diagram Block for Object Detection
For background subtraction, we used the Pixel Based Adaptive Segmenter (PBAS) [6] which is base on ViBe algorithm [7] and Sigma-Delta algorithm [8]. We used PBAS because this algorithm has a good background modeling. PBAS use a non-parametric background modeling, so the background is modeled by a history of recently observed pixel values [9]. On the other hand sigma delta (-) was lightweight algorithm and has a good processing speed, but we don"t use it alone here because we want to see the effect of the multithreading in speedup the video analysis process. (using - alone is too lightweight) For masking after the video recordings are processed using a background reduction algorithm, the current image only consists of binary images, each pixel can only have 2 added values. Foreground is usually generated by pixels. However, in order for the foreground to be more easily distinguished from the background, we need to refine this foreground and reduce existing noise by using the masking method or also called filtering. To determine the binary foreground mask, [10] suggests the use of a low-pass filter (LPF) by: The advantage of the Eq. above is the noise reduction in the foreground mask. where T is the threshold. However, Eq. (1) results in a foreground mask with real values [0, 1]. In the lower right image can be seen by masking some of the noise that appears in the left image becomes no longer visible [11]. For tracking a blob (blob tracking), we used a tracking algorithm based on "Appearance Models for Occlusion Handling" [12] because this algorithm has a good performance in complex environment. In the last step to count volume and vehicle velocity, we proposed a twoline counting method. Until the third step we use existing algorithm but for last step (vehicle count) we tried to implement the method that we proposed.
The first parameters can be obtained by activated the GetTickCountFunction (the function to count the running time) between the begin and end process in application. For the second parameter could be gain by using a two counting line method. Here is the brief explanation about two counting line method that adapted from Pratama et al. (2016).
The vehicle in the video that already detected as an object will have a centroid that represent the vehicle itself. This centroid located in the middle of rectangular mark that surrounded the vehicle. To obtain the time that needed by centroid to travel from the first counting line (counting line 1) to an additional counting line (counting line 2) we use a GetTickCountFunction in every line. We could see in Figure 5 line 2 serves as "counting line 2" and line 1 as "counting line 1 [13]". The pseudocode to count a time that needed by the vehicle that move in x coordinate to reach line 2 from line 1 ( Figure 5) could be seen below. If we want automatically detect whether the vehicle move in x or y coordinate, we need to added the new "conditional if "into the pseudocode line. In pseudocode above when the state of centroid were change from one state to another state then the GetTickCount function will capture the current time at instant. When the state change from A  B the t 0 time will be captured and this also applies for t 1 time that will be taken when the state change from B  C. After we get t we could use it as second parameter by averaging all t that stored in the system database. For the first parameter we could use a system date and time to capture the time when the first and the last object being detected in the video, after that we can know the running time to process the 10 minutes video.

3 Output
The output of the application is the difference value between the application that didn"t use a multithreading and those who used multithreading. We include the performance comparison testing between the application that use multithreading with different number videos running in background (1 until 5 video) and the application without multithreading feature. We will see the number of queue in the buffer and how long the application could running until it drop some frame if the buffer already full. Table 1 is the data output that will be stored in the database and used to analyzed the application performance. "System time" will become the first parameter and the "time to travel" become a second parameter to be analyzed. The number of centroid that passing counting line 1 and 2 could be seen in column line 1 and 2 respectively. The number of vehicle which represented by centroid in line 1 is higher than line 2, this happened because there were some centroids that hasn"t been passed yet the counting line 2. System time and time to travel will be updated when there was a centroid that passing the counting line 2. This applied because for the "system time" will be ended when the last centroid passed the counting line 2 also for "time to travel" is completed when the centroid passed the counting line 2.

RESULTS AND DISCUSSION
In this experiment we use the hardware with current specification: Processor: Intel® Core ™ i5 CPU @ 2.5GHz, Memory(RAM): 4 GB 1600MHz, Harddisk: 500 Gb and the video that used for the test case have a 140x180 resolution ( Figure 6). We use a low resolution video here to speed up the processing progress. Since the hardware that we used has a limited resources. Table 2 below is the comparison of first parameter (time that needed to process whole video) that already made between the application that use a multithreading (varies between 1 until 5 videos) and those which doesn"t:  10.32 w/multithread (4 videos) 10.54 w/multithread (5 videos) 12.37 We could see that running time of the application with multithreading is outrun the application without multithreading, except if the multithreading application open and processing 5 videos simultaneously. In table 3, the second parameters that shown an average time that needed by the vehicle to travel between the line 1 and line 2, it seems that the multithreading application still outrun the application that didn"t implemented the multithreads. Even though 5 videos play simultaneously, the multithreading application still have advantage in terms of performance.

Figure 6 Test Case Video
To see the frame queue behavior in this application, we set buffer size into 999 frames, so it can hold 999 frames in the queue. The greater number of videos that play simultaneously in the application, the faster the frame queues will grow and the buffer being filled. We will see the frame queue growth rate in the buffer if we run more than 1 video simultaneously. Based on observation the maximum frame that piles up in the buffer when running 1 video is up to 16 frames. This happen when there were many objects that must be detected in the video. After that, the frame will reduce again to zero because the processing rate could overcome the capture rate. Below (Figure 7) is the image of the application and its buffer when running.

Figure 7 Test Case Video
If we run 2 simultaneously video, the percentage of buffer that being occupied by the frames is around 22%. After some periods, it will reduce again and fluctuated between 0 until 22 percent but never overcome the 22%. But for the other case, when we run 3 or more videos simultaneously, the buffer will be full for a certain amount time and never been reduce again. This happened because the capture rate exceeded the processing rate. The frame rate of growth is faster here because it being supplied by 3 video resources. The relationship between time and the growth of frames can depict on Figure 8. The polynomial functions (Figure 8) for every parallel running videos are shown an increasing frame growth in the queue during the application running. The greater number of videos that running parallel, the more steep the gradient function. We get the polynomial function for the application that running 3 videos simultaneously as below: y = -3x 3 + 27x 2 -44x + 19.2 (2) and for 4 videos that running simultaneously, we get the different function: y = -2.1667x 3 + 15.429x 2 -0.4048x -14.4 (3) If we see both functions, we could conclude that the more running videos that run simultaneously, the more steep the gradient function would be.   We also can see from the chart that the greater number of videos that run simultaneously the more faster the rate of grow frames in the buffer. We have been seen that the polynomial functions that obtained from the regression process show us the same thing.
If we see the number of object that being detected during some period of time, we can conclude that the number of object that could be process by multithreading application is larger than the non multithreading application. It show on the Figure 9 that in shorter time the more number of object could be processed by multithreading application. This method could be used to build a vehicle counting application for the real time condition. Because if we don"t use the multithreading there will be delay in processing frame that will lead to application crash due to buffer overflow. The delay is about 3.14 minutes if the application only running 1 videos with 10 minutes duration. If we know the relation between time and the rate of growth frame in the buffer we could avoid such condition that leading to non-optimal performance. From the experiment that already done it can conclude that the multithreading could speed up the image processing in the application and improve the application running time. To prevent the buffer overflow if we run plenty videos, we could use a drop buffer scheme if the buffer already full.