Optimizing Internet of Things-Based Intelligent Transportation System’s Information Acquisition Using Deep Learning

This work first discusses the Intelligent Transportation System (ITS)-oriented dynamic and static Information Acquisition Models (IAMs) and explains the information collection mechanism of the Internet of Things (IoT)-based ITS. The goal is to improve travel conditions and contribute to a better urban environment. To this end, the Faster Region-based Convolutional Neural Network (Faster R-CNN) is introduced to extract the features of the IoT-based ITS’s electronic data. The Faster R-CNN shows excellent recall and accuracy in extracting features from the ITS electronic data sets. Specifically, its average recall and accuracy reach 83.89% and 86.79%, respectively, and its accuracy is 6.20% higher than that of the R-CNN method. Thus, the Faster R-CNN algorithm offers more robust and reliable performance for collecting and analyzing ITS data. Overall, this work examines ITS-oriented electronic information collection and automatic detection against the technological background of applying Computer Vision, Deep Learning, and IoT in urban traffic management. In particular, it explains the IoT-based ITS’s electronic information collection mechanism under Deep Learning (Faster R-CNN). The findings offer a theoretical foundation for implementing Deep Learning technologies in collecting ITS-oriented big data and in smart city construction.


I. INTRODUCTION
With urban expansion, vehicle ownership has surged, increasing traffic congestion and accidents. As a result, sustainable urban development, especially smart city construction, faces severe challenges [1], [2]. Tackling traffic problems has therefore become a top concern for society. To this end, the Intelligent Transportation System (ITS) offers a promising remedy [3]. Among the difficulties involved, acquiring big traffic data is the most challenging.
The associate editor coordinating the review of this manuscript and approving it for publication was Jjun Cheng.
Indeed, complex urban conditions and overcrowding have left conventional traffic control almost paralyzed [4]. Thanks to modern technologies, such as the Internet+, Big Data Analytics (BDA), cloud computing, and Artificial Intelligence (AI), ITS-assisted urban traffic management is now possible [5]. So far, efficient traffic data collection, processing, and data-based prediction are the most prevalent topics in ITS-related literature [6]. For example, Han studied an on-road vehicle-oriented Information Acquisition Model (IAM) and feature recognition framework based on Deep Learning. The classification accuracy of the fine-tuned multitask GoogLeNet was 99.5%, and the positioning accuracy was much higher than in similar research works [7]. Kundu et al. developed a new Deep Learning model for botnet detection and classification and applied it to detect traffic records. The developed model was superior to the existing Machine Learning model in that it clearly explained its decision-making [8]. Wang et al. suggested a Flow2graph method to predict Industrial IoT-oriented network traffic conditions. An isomorphic graph network was introduced to predict future traffic conditions, and the advantages of the proposed model were verified through experiments [9]. Fang et al. proposed a heterogeneous Autonomous Underwater Vehicle (AUV)-based auxiliary IAM. The new system maximized the energy efficiency of Internet of Underwater Things (IoUT) nodes in AUV trajectory calculation, resource allocation, and Age of Information (AoI) scenarios. The simulation results verified the effectiveness and superiority of the proposed strategy [10]. To enhance video recognition accuracy, Peng et al. used Deep Learning in trajectory detection to aggregate the attributes of adjacent frames into the current frame space [11]. Then, Feng et al. used two multi-layer Convolutional Neural Network (CNN) models based on the Region Proposal Network (RPN) structure. This structure combined time and context information by adding background suppression and multi-scale training to the target detection framework [12]. Fang et al. designed an Active Queue Management (AQM) strategy for IoUT nodes, in which a node transmitted the latest data packets to the destination and adjusted its sleep schedule to match the network requirements. The results verified that AQM-based IoUT nodes featured lower Peak Age of Information (PAoI) and energy costs than those with non-AQM strategies [13]. Wei et al. established an Unmanned Aerial/Surface/Underwater Vehicle (UAV-USV-UUV) network and designed an energy-oriented target search model based on the network. Their research solved the target search problem using the Deep Q-Network (DQN) algorithm.
The simulation results showed that the proposed scheme was suitable for underwater target information search, with an extremely high success rate [14]. The above literature shows that scholars at home and abroad have made great progress in the theory and application of static and dynamic IAMs. Nevertheless, ITS-related static and dynamic information acquisition and monitoring still suffer from low accuracy and slow computing speed. Specifically, vehicle recognition is not robust enough in dynamic monitoring environments, the training data of vehicle recognition models are unevenly distributed, and objects with small feature sizes are hard to find and identify in photos. These scientific problems urgently need to be solved.
To improve urban travel conditions and promote smart city construction, this work makes the following contributions. First, it discusses the IoT-based ITS-oriented dynamic and static IAMs and describes the information collection mechanisms of the IoT-based ITS. Next, it introduces the Faster Region-based Convolutional Neural Network (Faster R-CNN) to discover and identify intelligent traffic information, and it analyzes the results in terms of accuracy and recall. The results can help urban managers understand traffic information accurately and have practical significance for smart city construction and urban traffic management.

A. ITS-ORIENTED IAM
ITS is a large-scale complex system integrating modern technologies. Each subsystem might apply techniques such as communication, information processing, big data, intelligent sensing, and IoT technologies [15].  framework by carefully using the computing, communication, and storage resources of ground stations, AUVs, and IoUT equipment. The simulation results showed the superiority of the proposed scheme [16]. Technological approaches applied in ITS are closely related. They are integrated as an ensemble system featuring systematicness, advancement, and comprehensiveness. Table 1 illustrates the difference between dynamic and static IAMs.
The sensor deployment and specific applications of ITS control stations have been investigated [17]. According to that investigation, coil, magnetic, microwave, infrared, video surveillance, acoustic, vehicle, and video image processors account for 37.75%, 0.11%, 7.13%, 11.35%, 17.03%, 8.17%, 3.92%, and 14.55%, respectively. Microwave radar can also gather electronic data. Fig. 1 explains the microwave radar-based electronic data collection: the radar beam is sent to a vehicle passing through the coverage area and is reflected back to the radar antenna and the receiver. The reflected signal is used to monitor and calculate traffic data, such as road flow, driving speed, and vehicle length. On this basis, this section uses the Kalman filter to estimate the collected ITS electronic information, combining the advantages of dynamic and static traffic IAMs [18], [19]. Eqs. (1) and (2) show the prediction process of the Kalman filter.
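The prediction equations are not reproduced in this text. As a reference point only, the textbook Kalman prediction step, written with the symbols defined in this section, is usually stated as below. The error covariance P is an assumed symbol that the surrounding text does not name, and the form may differ in detail from the original Eqs. (1) and (2).

```latex
\begin{align*}
\theta'_k &= A\,\theta_{k-1} + B\,u_{k-1} \\
P'_k &= A\,P_{k-1}\,A^{\mathsf{T}} + Q
\end{align*}
```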
Here, θ′_k is the predicted value, and A represents the state transition matrix. θ_{k−1} is the actual state vector at time k − 1. B and u_{k−1} represent the parameters of the system model, and Q represents the covariance matrix [20]. The Kalman filter is updated by Eqs. (3)-(6). In Eqs. (3)-(6), C is the observation matrix, and s_k represents the noise vector of the state transition process. z_k represents the observation vector at time k. R represents the covariance matrix, ⟨θ_k⟩ represents the estimated value, K′_k represents the Kalman gain, and I represents the observation error [21].
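To make the predict/update cycle concrete, the following is a minimal NumPy sketch of a textbook Kalman filter, not the authors' implementation. The error covariance `P`, the constant-velocity vehicle model, and all numeric values (time step, noise levels, true speed) are hypothetical and chosen only for illustration.

```python
import numpy as np

def kalman_predict(theta_prev, P_prev, A, B, u_prev, Q):
    """Prediction step: propagate the state estimate and its covariance."""
    theta_pred = A @ theta_prev + B @ u_prev
    P_pred = A @ P_prev @ A.T + Q
    return theta_pred, P_pred

def kalman_update(theta_pred, P_pred, z, C, R):
    """Update step: correct the prediction with the observation z."""
    S = C @ P_pred @ C.T + R                 # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
    theta_est = theta_pred + K @ (z - C @ theta_pred)
    P_est = (np.eye(P_pred.shape[0]) - K @ C) @ P_pred
    return theta_est, P_est, K

# Hypothetical example: estimate position and speed of one vehicle
# from noisy position readings (all values are assumptions).
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])        # constant-velocity transition
B = np.zeros((2, 1))                         # no control input in this sketch
u = np.zeros(1)
C = np.array([[1.0, 0.0]])                   # only position is observed
Q = 0.01 * np.eye(2)                         # process noise covariance
R = np.array([[4.0]])                        # observation noise covariance

theta = np.array([0.0, 0.0])                 # initial state guess
P = 100.0 * np.eye(2)                        # large initial uncertainty
rng = np.random.default_rng(0)
true_speed = 15.0
for k in range(1, 51):
    z = np.array([true_speed * k * dt + rng.normal(0.0, 2.0)])
    theta_pred, P_pred = kalman_predict(theta, P, A, B, u, Q)
    theta, P, K = kalman_update(theta_pred, P_pred, z, C, R)
```

After the loop, `theta[1]` holds the filtered speed estimate and `P` the remaining uncertainty; in this sketch the filter only ever sees position readings, so the speed is inferred from the model.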

B. IOT-BASED INTELLIGENT TRANSPORTATION MONITORING
IoT can connect any item to the Internet, by wire or wirelessly, following specific protocols. Apart from sensing devices, IoT involves Radio Frequency Identification (RFID) technology, sensors, the Global Positioning System (GPS), and laser scanner techniques. These IoT-native items can then communicate and exchange updates to realize intelligent tracking, identification, and monitoring [22]. A typical IoT architecture is shown in Fig. 2.
As shown in Fig. 2, the perception layer supports the overall IoT design. In other words, IoT data are collected and acquired via the perception layer. At the center of an IoT architecture is the network layer, which frequently transmits vast amounts of data. The network layer's responsibilities are storage, retrieval, security, and privacy protection. In turn, the application (management) layer is in charge of efficiently integrating and exploiting the data gathered by the perception layer using big data and cloud computing technologies. The application layer is also closer to the user end and provides industry-specific applications [23]. Fig. 3 shows the implementation of the full connection of the perception layer.
In  sensitive and comprehensive sensing ability to realize low power consumption, miniaturization, and low cost [24]. Fig. 4 shows the implementation of the network layer supporting the business platform.
As per Fig. 4, the network layer connects the perception and application layers. It comprises access networks in various wireless and optical fiber forms, a large-bandwidth core network unified on the Internet Protocol (IP), and unified management, deployment, and operation support for business platforms. Today, 5G (5th Generation Mobile Networks) + AI, intelligent cameras, and other intelligent acquisition devices can offer high transmission rates, high bandwidth, high reliability, high-definition pictures, and low delay. These technologies in the network layer help the traffic command center make accurate, effective, and fast safety prevention decisions while alleviating traffic problems [25]. Better still, BDA can process complex traffic videos and images and feed them to the ITS. For example, by analyzing big video data of people and vehicles, the ITS can calculate the changes in vehicle space state, traffic density, and road conditions and predict possible traffic congestion. Based on this, the ITS can help relevant departments formulate traffic management strategies, such as adjusting road signals, in a timely manner. Furthermore, the ITS can utilize the IoT's tracking and positioning technologies to locate accidents, support evacuation, and give early warning of crowded areas.

C. INTELLIGENT TRAFFIC INFORMATION DETECTION BASED ON DEEP LEARNING
Faster R-CNN, a Deep Learning-based object detection model, introduces the "anchor" and uses a CNN to generate candidate regions. Here, the CNN is mainly used for object detection. Earlier architectures used traditional visual algorithms, such as selective search, to generate target candidate frames, whereas the CNN was only used for feature extraction or final classification and regression [26]. Inspired by SPP Net (Spatial Pyramid Pooling Networks), Fast R-CNN inputs the whole image, rather than every candidate area, into the CNN. Features are extracted to obtain a feature map, and RoI Pooling then maps candidate areas of different sizes to a uniform size [27]. In addition, it uses Softmax instead of Support Vector Machines (SVM) for classification tasks. Finally, the connection layer, classification, and regression tasks share network weights [28]. Fig. 5 shows the structure of Faster R-CNN.
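As an illustration of the anchor idea, the sketch below tiles a fixed set of reference boxes over every cell of a feature map. The stride, scales, and aspect ratios are assumptions taken from commonly used Faster R-CNN defaults, not settings reported in this work, and `generate_anchors` is a hypothetical helper name.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) boxes centred on every feature-map cell."""
    base = []
    for scale in scales:
        for ratio in ratios:
            # ratio is height / width; the anchor area stays scale ** 2
            w = scale / np.sqrt(ratio)
            h = scale * np.sqrt(ratio)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                              # (A, 4) boxes around the origin

    # Centre of each feature-map cell in input-image coordinates
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)                       # each of shape (feat_h, feat_w)
    centres = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)

    # Broadcast: every cell centre is shifted by every base anchor
    return (centres + base).reshape(-1, 4)             # (feat_h * feat_w * A, 4)

# Hypothetical 2 x 3 feature map: 6 cells with 9 anchors each
anchors = generate_anchors(feat_h=2, feat_w=3)
```

The RPN then scores each of these boxes as target or background and regresses an offset for it, so no external proposal algorithm is needed.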
Four key components of Faster R-CNN can be identified in Fig. 5. First, as a CNN-based target detection method, Faster R-CNN employs a set of basic conv+relu+pooling layers to extract the image feature map, which is shared by the subsequent RPN layer and the fully connected layers. Second, the RPN generates region proposals. It uses Softmax to determine whether anchors are positive or negative and then applies bounding box regression to adjust the anchors and obtain accurate proposals. Third, the RoI pooling layer gathers the input feature map and the proposals, extracts the proposal feature maps after integrating this information, and passes them to the following fully connected layer to determine the target category. Fourth, the proposal feature maps are used to determine the category of each proposal, and bounding box regression is applied again to obtain the final exact position of the detection box. In the last step, the Softmax activation function determines the probability of each category of electronic information, and the location of the candidate traffic electronic information is regressed to obtain the final location information and its confidence level. Following regional pooling, Eq. (7) gives the relation between the level of each feature pyramid and the region size.
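Eq. (7) is not reproduced in this text. The symbol definitions that follow match the level-assignment rule commonly used with feature pyramids, so a plausible form is sketched here under that assumption; the exact expression in the original may differ.

```latex
k = \left\lfloor k_0 + \log_2\!\left(\frac{\sqrt{w\,h}}{\mathit{size}}\right) \right\rfloor
```

Under this reading, k is the pyramid level assigned to a pooled region of dimensions w and h, so larger regions are mapped to coarser levels.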
In Eq. (7), k_0 represents the number of achievements in a pyramid, w represents the length of the pool area, h represents the width of the pool area, and size represents the size of the input model [29]. The training loss of the RPN is given by Eq. (8). In Eq. (8), p_i represents the probability, predicted by the network, that the i-th anchor is a target, and p*_i represents the corresponding actual value. t_i represents the parameterized coordinates that predict the offset between the area and the anchor target, and t*_i represents the corresponding real value. N_cls represents the mini-batch size, and N_reg represents the number of anchor positions, that is, the feature map size [30]. Eqs. (9) and (10) calculate Faster R-CNN's sampling operation during training. Here, R(x) represents interpolation, and the regional coordinates change to (x_1, y_1). n represents the number of rows, and m represents the number of columns in the divided region. (x_0, y_0) represents the original regional coordinates, and w_0 and h_0 represent the width and height of the region, respectively. Eqs. (11)-(14) show the specific calculation.
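Since Eq. (8) is not reproduced in this text, the sketch below follows the RPN loss as it is commonly formulated for Faster R-CNN: a log loss over anchor labels normalized by N_cls, plus a smooth-L1 box regression loss that counts only positive anchors and is normalized by N_reg. The balancing weight `lam` and all example values are assumptions for illustration.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 (Huber-like) penalty applied element-wise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=10.0):
    """Classification loss over all anchors plus regression loss over positives only."""
    eps = 1e-12                                    # guards against log(0)
    l_cls = -(p_star * np.log(p + eps) + (1 - p_star) * np.log(1 - p + eps))
    l_reg = smooth_l1(t - t_star).sum(axis=1)      # one value per anchor
    return l_cls.sum() / n_cls + lam * (p_star * l_reg).sum() / n_reg

# Hypothetical mini-batch with one positive and one negative anchor
p = np.array([0.9, 0.2])                           # predicted target probabilities
p_star = np.array([1.0, 0.0])                      # ground-truth labels
t = np.array([[0.1, 0.0, 0.0, 0.0],                # predicted box offsets
              [5.0, 5.0, 5.0, 5.0]])
t_star = np.zeros((2, 4))                          # ground-truth box offsets
loss = rpn_loss(p, p_star, t, t_star, n_cls=2, n_reg=2)
```

Multiplying the regression term by `p_star` means the large offsets of the negative anchor in this example do not contribute to the loss.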
Faster R-CNN optimizes R-CNN by using the RPN to generate candidate areas and discarding the selective search algorithm. R-CNN was the first method to apply a CNN to target detection. It uses a selective search algorithm to obtain target candidate regions. The candidate regions are then scaled to the same size and input into the CNN to extract features, which are classified by an SVM. Finally, the classification results are regressed. The whole training process is tedious: it requires fine-tuning the CNN, training the SVM, and fitting the bounding box regression separately, so it cannot be trained end-to-end [31]. Table 2 shows the training and testing methods of Fast R-CNN.

D. DATA AND SYSTEM CONSTRUCTION ENVIRONMENT SETTINGS
The experimental environment is configured with an Intel® Core™ i5-7200U CPU @ 2.50 GHz, 8.00 GB of Random Access Memory (RAM), and an NVIDIA GeForce 94MX Graphics Processing Unit (GPU). The software environment is Windows 10 64-bit. The Mirror Traffic electronic information data set is selected as experimental data. Mirror Traffic uses image recognition and tracking technology to identify and track traffic participants in real road images from real road traffic data in China. It filters the extracted tracks, finally obtaining track data of various vehicles and pedestrians. The data set covers various road types (ramps, straights, bends, intersections, etc.) and various traffic flow states (small, medium, congested, etc.) and includes various vehicle types and pedestrians. Parameters of the Faster R-CNN network are set as follows: Conv layer: kernel_size = 3, padding = 1, stride = 1. Pooling layer: kernel_size = 2, padding = 0, stride = 2.
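The effect of these layer settings on feature-map size can be checked with the standard output-size formula for convolution and pooling. The sketch below is illustrative only; the input side length of 224 is a hypothetical value, and `output_size` is a helper name introduced here.

```python
def output_size(size, kernel_size, padding, stride):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

conv = dict(kernel_size=3, padding=1, stride=1)    # Conv layer settings from the text
pool = dict(kernel_size=2, padding=0, stride=2)    # Pooling layer settings from the text

side = 224                                         # hypothetical input side length
after_conv = output_size(side, **conv)             # conv keeps the size unchanged
after_pool = output_size(after_conv, **pool)       # pooling halves the size
```

With these settings, each Conv layer preserves the spatial resolution and each pooling layer halves it, so the feature-map size depends only on how many pooling layers are stacked.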

III. RESULTS AND DISCUSSION
The comparison between Faster R-CNN and R-CNN is analyzed in detail below. First, Fig. 6a shows the amount of information collected. The information collection capacity of Faster R-CNN is significantly better than that of R-CNN in different lanes. In fact, Faster R-CNN collects more information than needed. This may be due to interference from unconsidered objects, such as motorcycles near the edge of the road, which the Faster R-CNN algorithm counts as vehicles. Further, Fig. 6b shows the detection accuracy. The detection accuracy of Faster R-CNN is at least 6.20% higher than that of the R-CNN algorithm. This may be due to the threshold filter set in the Faster R-CNN algorithm, which significantly improves the vehicle detection accuracy. Therefore, Faster R-CNN further improves the detection accuracy and offers more robust performance in intelligent transportation electronic information acquisition.

B. EFFECTIVENESS ANALYSIS OF FASTER R-CNN IN INTELLIGENT TRANSPORTATION INFORMATION DATA SET
Next, the Faster R-CNN algorithm is verified on an intelligent transportation electronic information data set. Fig. 7 shows the test results. According to Fig. 7, as the number of samples increases, the recall and accuracy fluctuate less and tend to stabilize. The average recall and accuracy of Faster R-CNN on the intelligent transportation information data set are 85.10% and 86.79%, respectively. By contrast, when the sample volume in the traffic electronic information data set is small, the fluctuation is large. This may be because the complexity of the collected traffic electronic data feature maps varies greatly. Training Faster R-CNN then produces many candidate regions, which reduces the test accuracy and causes the test results to deviate.
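The text does not state how recall and accuracy were computed. One common convention for detectors, sketched below, matches each detection to at most one ground-truth box by Intersection over Union (IoU) and reports precision and recall. The IoU threshold of 0.5, the reading of "accuracy" as precision, and the example boxes are all assumptions.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, iou_thr=0.5):
    """Greedy one-to-one matching of detections to ground-truth boxes."""
    matched = set()
    tp = 0
    for det in detections:
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue                       # each ground-truth box is used once
            v = iou(det, gt)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(detections) - tp                  # detections with no matching object
    fn = len(ground_truth) - tp                # objects that were missed
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall

# Hypothetical frame: three vehicles, four detections (two correct, two spurious)
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
det_boxes = [(1, 1, 10, 10), (20, 20, 31, 31), (100, 100, 110, 110), (0, 0, 2, 2)]
precision, recall = precision_recall(det_boxes, gt_boxes)
```

In this convention, spurious detections such as objects wrongly counted as vehicles lower precision, while missed vehicles lower recall.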

C. DISCUSSION
The electronic information data set Mirror Traffic is chosen as the experimental data, and the IoT is employed to create the information network connectivity of the ITS. Traffic congestion can be predicted using changes in vehicle space status, traffic density, and road conditions. These changes can also be utilized to predict and adjust intelligent transportation road signals in real time [32]. The acquisition, positioning, and monitoring of ITS-oriented information (such as on-road people and vehicles) can be realized through IoT data fusion and feature and association extraction. A comparative analysis is conducted between the Faster R-CNN and the R-CNN in the context of Deep Learning [33], [34]. Then, the Faster R-CNN's performance on the IoT-based ITS is assessed in terms of detection accuracy and recall. The findings demonstrate that the Faster R-CNN outperforms R-CNN on different intelligent transportation data sets. Specifically, the Faster R-CNN algorithm has an average recall and accuracy of 85.10% and 86.79%, respectively. The detection accuracy of Faster R-CNN is significantly higher than that of R-CNN.

IV. CONCLUSION
Based on Computer Vision, Deep Learning, and IoT technologies, this work designed an ITS-oriented IAM based on Faster R-CNN. The proposed model detected and recognized intelligent transportation information accurately and efficiently. Experiments were then designed to evaluate its detection accuracy and recall. The average recall and accuracy of Faster R-CNN were 83.89% and 86.79%, respectively, and its detection accuracy was 6.20% higher than that of the R-CNN algorithm. These findings can provide a theoretical basis for applying Deep Learning models to the ITS's electronic information acquisition. Some research limitations remain to be explored. First, the electronic traffic data are not explicitly categorized due to the intricacy of the ITS objects. Second, the influence of light, including light intensity, on electronic information collection is not taken into account, although light is the most crucial factor in electronic data collection. In further studies, the ITS objects will be identified accurately, and the designed Deep Learning model will be optimized to better serve traffic information acquisition.