Advertisement billboard detection and geotagging system with inductive transfer learning in deep convolutional neural network

In this paper, we propose an approach to detect and geotag advertisement billboards in real-time conditions. Our approach uses AlexNet's Deep Convolutional Neural Network (DCNN) as a pre-trained neural network with 1000 categories for image classification. To improve the performance of the pre-trained neural network, we retrain the network with additional advertisement billboard images using an inductive transfer learning approach. Then, we fine-tune the output layer into advertisement billboard related categories. Furthermore, the detected advertisement billboard images are geotagged by inserting Exif metadata into the image file. Experimental results show that the approach achieves 92.7% training accuracy for advertisement billboard detection and 71.86% overall testing accuracy.


Introduction
Advertisement billboards are an effective commercial medium for advertising information about products or services. Currently, advertisement billboard data management is done using a conventional approach, e.g. by capturing the advertisement billboard with a camera and writing the location details in notes. This approach is very labor-intensive and time-consuming. Hence, a faster approach is needed to extract the information in an advertisement billboard and the geographic location of the object in order to improve data acquisition.
The advertisement billboard detection problem has been approached with many different methods in recent years, mainly for detecting advertisement billboards in sport TV. Medioni et al. [1] use an interest point operator, a color-based point filter, a point matcher, precise lock-in using Sum of Squared Differences (SSD), and a predictor using Measure of Belief to detect and substitute billboards in broadcast video. The result shows good performance in detecting advertisement billboards. Cai et al. [2] detect advertisement billboard images in sport TV using Fast Hough Transform for line detection and histogram-based analysis for segmentation. The result shows that the approach achieves 90% accuracy in detecting advertisement billboards. Another approach by Aldershoff & Gevers [3] uses histogram back-projection to detect advertisement billboards in soccer broadcast video. The result shows good performance in detecting advertisement billboards. Ichimura [4] detects advertisement billboards in motor sports video using a Hessian-Laplace detector and a Gradient Location-Orientation Histogram (GLOH) descriptor. In addition, a RAndom SAmple Consensus (RANSAC) algorithm based on homography is used to recognize multiple advertisement billboards. Watve & Sural [5] detect advertisement billboards in soccer video using hue slicing and Hough Transform. The result shows that the approach achieves 90% accuracy in detecting advertisement billboards. Orginc [6] uses homography estimation with Direct Linear Transformation (DLT), RAndom SAmple Consensus (RANSAC), and Maximally Stable Extremal Region (MSER) to detect advertisement billboards in broadcast video. The result shows that the approach achieves high accuracy. Ordelman [7] detects advertisement billboards using template matching with Fast Fourier Transform, color matching with Euclidean distance calculation in normalized HSL (NHSL) color space, and neighbor voting. The result shows 36% accuracy in detecting advertisement billboards.
ISSN: 1693-6930, TELKOMNIKA Vol. 17, No. 5, October 2019: 2659-2666
Occlusion and environment complexity have been the main difficulties in detecting advertisement billboards in previous works. Edge and color detection techniques can be used for advertisement billboard detection only when the environment is not complex, i.e. the background is plain and the object is not occluded. Furthermore, the previous works depend on edges to detect the shape of the advertisement billboard. When one of the edges is occluded, the detection fails. In addition, the color detection approach works only if the advertisement billboard has a specific color that contrasts with the background. To overcome this problem, a supervised machine learning approach is needed to detect advertisement billboards in more complex environments, e.g. roads with many obstacles such as trees, vehicles, cables, and poles.
Recently, the field of machine learning has made tremendous progress in classifying complex objects. Researchers have found that using a pre-trained neural network model with large learning capacity, e.g. QuocNet, AlexNet, and GoogleNet, can improve image classification accuracy. In addition, the Deep Convolutional Neural Network (DCNN) has been proven to solve image classification on hard visual recognition tasks [8].
For advertisement billboard acquisition, geotagging plays an important role in obtaining the geographical location of the detected advertisement billboard. Several approaches for geotagging objects using smartphones have been proposed. Macias et al. [9] add geotag information to video using angular values from smartphone GPS, 3G networks, and WiFi. The result shows that the geotagging achieves high accuracy and consumes little bandwidth. Sahu and Chakraborty [10] add geotag information by inserting Exchangeable image file (Exif) metadata into the image. The result shows that the geotagging achieves up to 30-meter accuracy on the maps. In addition, to overcome the geotagging problem, Rahmat et al. [11] improve the geotagging accuracy by using perspective projection. Overall, the geotagging process depends on the network connection.
We have reviewed several related works that use a Deep Convolutional Neural Network (DCNN) for large-scale image classification. Lin & Chen [12] detect pedestrians using GoogleNet Two Parallel DCNN. The results show 19.57% regression in pedestrian detection. Zhang et al. [13] classify the makes and models of car images using a pre-trained DCNN with transfer learning. The results show 79% accuracy in classifying cars. Yanai & Kawano [14] use a DCNN for food image classification. The results show 78.77% accuracy for Top-1 prediction using the UEC-FOOD100 testing dataset and 67.57% accuracy for Top-1 prediction using the UEC-FOOD256 testing dataset. Yan et al. [15] classify objects into 1000 categories from the ILSVRC 2012 dataset using Hierarchical DCNN (HD-CNN). The results show a 36.66% error rate for Top-1 prediction and 15.80% for Top-5 prediction. Pasquale et al. [16] identify 50 objects from the iCubWorld dataset using a DCNN. The results show 86% for identifying 50 objects. Jung et al. [17] recognize traffic signs using the LeNet-5 CNN. The results show that the method can classify 16 street signs. Related DCNN classifiers are also reported by Ouyang et al. [18], Li et al. [19], and Martinson & Yalla [20].
In this paper, we propose a method to detect advertisement billboards by using a Deep Convolutional Neural Network (DCNN). First, a pre-trained neural network architecture from AlexNet is used for image classification. To improve the advertisement billboard detection performance, we retrain the network using a transfer learning approach. Other information, such as billboard categories and names, is also propagated to the system. In addition, for the geotagging process, Exif metadata insertion is used to store the geographical location of the detected advertisement billboard.

Methodology
In this research, the method for detecting and geotagging billboard images using a smartphone is divided into several phases. The first phase is the image pre-training process, which is combined with the input images and a retraining process using the transfer learning method. Once this process has finished and the system holds the trained DCNN, it can be used to identify a new image taken by an Android device in real-time mode. The fine-tuning process then groups related classes, such as scoreboard, monitor, television, cinema, and web site, into one class named billboard. The next phase is feature extraction, which detects the billboard image frame by frame in the DCNN. It is followed by automatic storage of images that reach a minimum accuracy into the database, and then by the geotagging process, which embeds the location metadata in the Exif data of the image file. In the post-processing phase, we add further information about the billboard image and carry out a validation process. All these processes are shown in Figure 1.

DCNN Training with ILSVRC 2012
First, we use Inception-v3 as a pre-trained model for image classification, trained on 1.2 million images in 1000 different classes from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012. This model uses AlexNet's DCNN architecture with 60 million parameters and 650,000 neurons, and consists of five convolutional layers, three pooling layers, and three fully-connected layers. The network contains eight layers with weights; the first five are convolutional and the remaining three are fully-connected. The output of the last fully-connected layer is fed to a 1000-way softmax, which produces a distribution over the 1000 class labels and from which the five top predictions of the object are reported. The network's input is 150,528-dimensional, and the number of neurons in the network's remaining layers is given by 253,440-186,624-64,896-64,896-43,264-4096-4096-1000.
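As an illustration of how a softmax output can be turned into five top predictions, the following is a minimal sketch. It uses a toy set of six classes instead of 1000, and the labels and raw scores are made up for the example; it is not the implementation used in this work.

```python
import math

def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    peak = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical labels and scores, for illustration only.
labels = ["scoreboard", "monitor", "television", "cinema", "web site", "street sign"]
logits = [4.0, 1.0, 0.5, 2.0, 0.0, -1.0]

probs = softmax(logits)
top5 = top_k(probs, labels, k=5)
for label, p in top5:
    print(f"{label}: {p:.3f}")
```

The probabilities sum to one, so each top-five entry can be read as the share of confidence the network assigns to that class.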
Table 1 shows the DCNN process in every layer. The first convolutional layer filters the 224x224x3 input image with 96 kernels of size 11x11x3 with a stride of 4 pixels. The output of the first convolutional layer is used as the input to the second convolutional layer, which has 256 kernels of size 5x5x48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3x3x256 connected to the outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3x3x192, and the fifth convolutional layer has 256 kernels of size 3x3x192. The fully-connected layers have 4096 neurons each.
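The kernel sizes above can be checked against the figure of roughly 60 million parameters with a short calculation. The sketch below makes two assumptions that are not stated in this section: the first fully-connected layer receives a 6x6x256 input (as in the original AlexNet design), and the fifth convolutional layer has 256 kernels, which matches the 43,264 = 13x13x256 neuron count listed for that layer.

```python
def conv_params(kernels, height, width, depth):
    """Weights plus one bias per kernel for a convolutional layer."""
    return kernels * (height * width * depth) + kernels

def fc_params(inputs, outputs):
    """Weights plus one bias per output for a fully-connected layer."""
    return inputs * outputs + outputs

layers = {
    "conv1": conv_params(96, 11, 11, 3),
    "conv2": conv_params(256, 5, 5, 48),
    "conv3": conv_params(384, 3, 3, 256),
    "conv4": conv_params(384, 3, 3, 192),
    "conv5": conv_params(256, 3, 3, 192),   # assumption: 256 kernels
    "fc6": fc_params(6 * 6 * 256, 4096),    # assumption: 6x6x256 input
    "fc7": fc_params(4096, 4096),
    "fc8": fc_params(4096, 1000),
}

total = sum(layers.values())
input_size = 224 * 224 * 3  # dimensionality of the input image

for name, count in layers.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")
```

Under these assumptions the total is about 61 million, and most of the parameters sit in the fully-connected layers rather than in the convolutional ones.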

DCNN with Transfer Learning
To improve the DCNN features, we retrain the network by adding more advertisement billboard images to the pre-trained model using an inductive transfer learning approach. Transfer learning modifies the values learned from a previously trained dataset using additional data propagated to the DCNN. The first process inputs 300 additional billboard images taken randomly from Google Images and 84 billboard images taken with an Android smartphone in real-time mode into the Inception-v3 model and runs the DCNN retraining process. The next process is the bottleneck process, in which all the images used for training are analyzed and their bottleneck values are calculated. The bottleneck values contain a meaningful and compact summary of the images. By using transfer learning, we can train on the dataset faster than training from scratch. After the bottleneck process finishes, we run 8000 training steps in the DCNN, which yields the five highest predicted objects together with their accuracy.
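One plausible reason the bottleneck process speeds up retraining is that the bottleneck values of an image do not change while only the final layer is trained, so they can be computed once and reused in every training step. The sketch below illustrates this idea under that assumption; `fake_extract` is a hypothetical stand-in for the frozen pre-trained layers, and the file names are invented.

```python
def cached_bottleneck(image_id, extract, cache):
    """Return the bottleneck vector for an image, computing it only once."""
    if image_id not in cache:
        cache[image_id] = extract(image_id)
    return cache[image_id]

# Hypothetical stand-in for the frozen pre-trained layers; it only counts
# how often it is called and returns a dummy two-value "summary".
calls = {"count": 0}

def fake_extract(image_id):
    calls["count"] += 1
    return [float(len(image_id)), 1.0]

cache = {}
images = ["billboard_001.jpg", "billboard_002.jpg", "billboard_003.jpg"]

# Two passes over the same images, as would happen across training steps.
for _ in range(2):
    for image_id in images:
        cached_bottleneck(image_id, fake_extract, cache)

print(calls["count"])  # the extractor ran once per distinct image
```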

Fine-tuning
We fine-tune the DCNN with 1000 categories using the transfer learning approach into advertisement billboard related categories, such as scoreboard, screen, monitor, television, and cinema. Figure 2 shows an example of the fine-tuning process with the top five object predictions. The highest advertisement billboard related prediction is selected as the final output. In this example, two of the predicted categories are related to advertisement billboards: scoreboard with 72.6% accuracy and cinema with 0.548% accuracy. Since scoreboard has the higher accuracy, its value is chosen as the final accuracy for the detected object.
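The selection rule described above can be sketched as follows. The scoreboard and cinema scores come from the example in the text; the three unrelated labels and their scores are invented to complete a top-five list, and the function name is hypothetical.

```python
# Categories treated as advertisement billboard related, as listed in the text.
BILLBOARD_RELATED = {"scoreboard", "screen", "monitor", "television", "cinema"}

def best_billboard_prediction(top_predictions):
    """Pick the highest-scoring billboard-related entry, or None if there is none."""
    related = [(label, score) for label, score in top_predictions
               if label in BILLBOARD_RELATED]
    if not related:
        return None
    return max(related, key=lambda pair: pair[1])

# Top-five list: scoreboard and cinema follow the example in the text,
# the other three entries are made up for illustration.
top_five = [
    ("scoreboard", 0.726),
    ("street sign", 0.110),
    ("traffic light", 0.045),
    ("pole", 0.012),
    ("cinema", 0.00548),
]

print(best_billboard_prediction(top_five))
```

Returning `None` when no related category appears lets the caller skip frames that contain no billboard candidate.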

Real-time Advertisement Billboard Detection
Advertisement billboard detection is conducted using an Android smartphone in real-time road conditions, processing the images frame by frame. Each detected billboard image is automatically geotagged and saved into the database.
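A frame-by-frame loop of this kind can be sketched as below. This is only an outline under stated assumptions: `classify` and `get_location` are hypothetical callables standing in for the trained DCNN and the device location service, the database is a plain list, and the threshold of 0.5 is an arbitrary illustrative value rather than the minimum accuracy used in this work.

```python
def detect_and_store(frames, classify, get_location, database, min_score=0.5):
    """Store every frame classified as a billboard with a score of at least min_score."""
    for frame in frames:
        label, score = classify(frame)
        if label == "billboard" and score >= min_score:
            database.append({
                "frame": frame,
                "score": score,
                "location": get_location(),  # geotag taken at detection time
            })
    return database

# Hypothetical stand-ins so that the sketch runs on its own.
fake_scores = {
    "frame_1": ("billboard", 0.87),
    "frame_2": ("tree", 0.91),
    "frame_3": ("billboard", 0.28),
}

def fake_classify(frame):
    return fake_scores[frame]

def fake_location():
    return (3.5, 98.25)  # made-up latitude and longitude

saved = detect_and_store(["frame_1", "frame_2", "frame_3"],
                         fake_classify, fake_location, database=[])
print(saved)
```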

Geotagging
We use the Android smartphone GPS and 4G networks to obtain geotag information. Then, Exchangeable Image File (Exif) metadata is used to store the geographical location, e.g. the latitude and longitude values of the detected advertisement billboard. Furthermore, the address of the advertisement billboard image can be extracted using reverse geocoding. Reverse geocoding represents a location in more readable forms, e.g. street name, place name, county, state, country, and postal code.
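Exif stores each GPS coordinate as three rational numbers (degrees, minutes, seconds) together with a reference letter (N or S for latitude, E or W for longitude), so decimal values from the GPS have to be converted before insertion. The following is a minimal sketch of that conversion; the function name and the sample coordinates are illustrative, and writing the values into an image file would additionally need an Exif library, which is not shown here.

```python
def to_exif_gps(value, is_latitude):
    """Convert decimal degrees to (ref, ((deg, 1), (min, 1), (sec * 100, 100)))."""
    if is_latitude:
        ref = "N" if value >= 0 else "S"
    else:
        ref = "E" if value >= 0 else "W"
    # Work in integer hundredths of an arc second to avoid rounding drift.
    hundredths = round(abs(value) * 3600 * 100)
    degrees, rest = divmod(hundredths, 3600 * 100)
    minutes, sec_hundredths = divmod(rest, 60 * 100)
    return ref, ((degrees, 1), (minutes, 1), (sec_hundredths, 100))

# Made-up coordinates, for illustration only.
lat_ref, lat_dms = to_exif_gps(3.5, is_latitude=True)
lon_ref, lon_dms = to_exif_gps(-98.25, is_latitude=False)
print(lat_ref, lat_dms)
print(lon_ref, lon_dms)
```

Each pair is a numerator and denominator, so `(0, 100)` means zero seconds expressed with a resolution of one hundredth of a second.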

Post-processing
After saving the image into the database, we add further information to the advertisement billboard data, such as the advertisement billboard name and category. Advertisement billboards are divided into several categories, e.g. food and beverage, telecommunication, bank, insurance, transportation, real estate, education, cigarette, event, campaign, promotion, home products, electronic products, and media. These categories are useful for commercial purposes. In addition, for advertisement billboard verification, we need to ensure that the collected advertisement billboard has not expired. Hence, a routine update is necessary to check whether the advertisement billboard is still active or has expired.

Training Results
During the training process, each step chooses ten images at random from the training set, finds their bottleneck values, and feeds them into the final layer to get predictions. Those predictions are then compared against the actual labels to update the final layer's weights through the back-propagation process.
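The training step described above can be illustrated with a small softmax layer trained on toy bottleneck vectors. This is a sketch under simplifying assumptions: the bottleneck vectors have two dimensions and are made up, there are only two classes, and the learning rate and step count are arbitrary. Only the batch size of ten comes from the text.

```python
import math
import random

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, biases, x):
    """Class probabilities of the final layer for one bottleneck vector."""
    return softmax([sum(w * v for w, v in zip(row, x)) + b
                    for row, b in zip(weights, biases)])

def train_step(weights, biases, bottlenecks, labels, rng, batch_size=10, lr=0.5):
    """One step: sample a random batch, compare with labels, update the final layer."""
    batch = rng.sample(range(len(bottlenecks)), batch_size)
    for c in range(len(biases)):
        grad_b = 0.0
        grad_w = [0.0] * len(weights[c])
        for i in batch:
            probs = predict(weights, biases, bottlenecks[i])
            # Gradient of the cross-entropy loss with respect to the logit of class c.
            err = probs[c] - (1.0 if labels[i] == c else 0.0)
            grad_b += err
            for j, v in enumerate(bottlenecks[i]):
                grad_w[j] += err * v
        # Store the update and apply it after all classes are processed.
        pending.append((c, grad_w, grad_b))
    while pending:
        c, grad_w, grad_b = pending.pop()
        biases[c] -= lr * grad_b / batch_size
        for j in range(len(grad_w)):
            weights[c][j] -= lr * grad_w[j] / batch_size

pending = []

# Toy, made-up bottleneck vectors: class 0 = "billboard", class 1 = "other".
bottlenecks = [[1.0, 0.0]] * 10 + [[0.0, 1.0]] * 10
labels = [0] * 10 + [1] * 10

weights = [[0.0, 0.0], [0.0, 0.0]]
biases = [0.0, 0.0]
rng = random.Random(0)
for _ in range(200):
    train_step(weights, biases, bottlenecks, labels, rng)

correct = sum(1 for x, y in zip(bottlenecks, labels)
              if max(range(2), key=lambda c: predict(weights, biases, x)[c]) == y)
accuracy = correct / len(labels)
print(accuracy)
```

Because only the final layer's weights and biases change, the bottleneck vectors themselves stay fixed throughout training.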
The training is conducted in two experiments, one with 4000 training steps and one with 8000 training steps. Table 2 shows the parameters used for training with 4000 and 8000 training steps. The results of the training process show that the experiment with 4000 training steps results in 89.3% accuracy, whereas the experiment with 8000 training steps results in 92.7% accuracy. This shows that more training steps result in higher accuracy. Since its training accuracy is higher, we use the model with 8000 training steps for the detection process. Table 3 shows the advertisement billboard training results with 8000 training steps.

Testing Result
The detection process is conducted during the day and in real-time conditions. Table 4 shows that the testing results achieve 72.0% accuracy. We evaluate several problems affecting the accuracy of the DCNN in detecting advertisement billboards. First, given the testing results in Table 3, we conclude that brightness affects the detection accuracy. Other factors also affect the detection process, namely occlusions, e.g. noise, electrical cables, trees, poles, street signs, and other objects that cover the advertisement billboard area. In addition, shooting distance affects the advertisement billboard detection accuracy.
The nearer the shooting distance to the object, the higher the accuracy achieved. Figure 3 shows two advertisement billboard images with two different shooting distances, in which the far shooting distance results in 28.1% accuracy and the near shooting distance results in 87.3%. This shows that the accuracy is higher when the object is nearer. Shooting angle also affects detection accuracy. Figure 4 shows two advertisement billboard images with two different shooting angles, in which shooting from the front angle achieves 86.7% accuracy and shooting from the right angle achieves 66.1%. This shows that the accuracy is higher when the object is detected from the front.
In the future, several phases will be added to the system. We will train on the advertisement billboard images with a deeper DCNN architecture, e.g. GoogleNet's DCNN with 22 hidden layers [21], or with a better machine learning approach, such as the Deep Residual Network (DRN) [22]. In addition, we will add more advertisement billboard images taken at night to improve detection at night. Furthermore, we will add optical character recognition (OCR) to extract the text in advertisement billboards automatically [23]. Moreover, we will use an estimation algorithm to measure the distance and size of the advertisement billboard to improve the accuracy of the geotagging process [24,25].
Advertisement billboard detection and geotagging system... (Romi Fadillah Rahmat)

Conclusion
In this paper, we present a Deep Convolutional Neural Network (DCNN) to detect advertisement billboards. The DCNN training results achieve 92.7% accuracy. In addition, the DCNN real-time testing results achieve 71.86% accuracy during the day and 16.98% at night. This shows that the detection performs better during the daytime. For the geotagging process, most of the saved images contain their geotagged location. However, the images are taken from a car and therefore differ by 1-10 metres from the exact location. This means that the real accuracy is degraded by actual conditions in the field. The optimal range of the system achieves high accuracy under 4G networks.
Ouyang et al. [18] recognize common objects using Deformable Deep Convolutional Neural Networks (DeepID-Net). The classification results in 50.3% accuracy. Li et al. [19] use GoogleNet Inception with 22 convolutional layers for classifying 1000 objects. The classification results show 89.45% for Top-1 prediction. Martinson & Yalla [20] use Krizhevsky's DCNN with 5 convolutional layers, 3 pooling layers, and 3 fully-connected layers for classifying 1000 objects. The results show 90.1% under a structured light sensor in an open lab space, 86.7% under a structured light sensor in a home environment, 80.4% under a stereo camera in an office environment, and 73.9% under a time-of-flight camera in a home environment.

Figure 1. Advertisement billboard detection and geotagging system architecture

Figure 3. Two advertisement billboard images with different shooting distances
Figure 4. Two advertisement billboard images with different shooting angles

Table 2. Parameters Used for Training Dataset

Table 3. Advertisement Billboard Training Results with 8000 Training Steps Summary

Table 4. Advertisement Billboard Testing Results During Day