Vehicle Text Data Compression and Transmission Method Based on Maximum Entropy Neural Network and Optimized Huffman Encoding Algorithms



Introduction
With the growing use of the mobile Internet in transportation, mobile Internet operating modes and the increasing number of operating vehicles generate a large amount of data in actual operation. Although hardware has made great progress, mobile Internet application solutions still require high data rates. Although communication costs have been greatly reduced, meeting the actual operating requirements of the application units while also saving wireless data transmission costs remains a key problem for the industry. Transmitting text information after compression is key to solving this problem. Moreover, this approach is acceptable because it can meet the data transmission delay and data quality indicators set by user units in a complex urban communication environment. Many scholars have carried out effective practical research in this area. To improve the quality and efficiency of wireless data transmission, researchers have proposed a variety of text data compression and transmission algorithms. For example, Meng [1], Shi [2], Sharma [3], and Hashemian [4] proposed Huffman coding for the wireless transmission of GPS text data to solve the problem of text data compression and transmission. Barr [5], Wang [6], Jou [7], and Hu [8] proposed an optimized LZW compression algorithm: by establishing a fast lookup dictionary, the data compression time is greatly reduced and the transmission efficiency is improved. In recent years, wireless transmission compression using wavelet compression [9] has become a research hot spot. In traffic informatization research, Zhang [10] compressed traffic emergency data based on GML data and realized the deployment architecture of an urban traffic emergency system; Li [11] projected the original high-dimensional data onto a low-dimensional space through a random matrix satisfying a restricted isometry condition and achieved efficient and fast data compression. After transmission, the data is decompressed at the traffic information processing end by a convex optimization algorithm. Based on the characteristics of traffic flow data, Zhao [12] adopted principal component analysis and independent component analysis to study and compare data compression. To a certain extent, these previous results provide a technical and methodological reference for text data transmission from a vehicle terminal. However, because urban mobile communication base stations are unevenly distributed, the communication environment and the data transmission capacity differ considerably across the city, and some areas are even communication blind spots. The methods above are rarely used in vehicle terminal data acquisition and transmission and rarely take into account the different communication environments in a city when performing real-time data transmission. In urban areas where base station coverage is uneven and the communication environment varies, how to optimize the compression and transmission of text and other information is a major problem for practical applications. In addition, transmitting large amounts of text data consumes more energy, and low-power transmission [12] is important for prolonging the working time of various devices. Therefore, this paper proposes a transmission compression method based on a maximum entropy neural network and an optimized Huffman encoding algorithm in order to simplify the algorithm, shorten computation time, and minimize the overhead of the acquisition terminal and the background server. Optimizing the data transmission method used after vehicle information collection is of great significance for improving the transmission efficiency of vehicle text information, improving the interpretability and integrity of text information, realizing vehicle monitoring, and grasping real-time traffic conditions.

Vehicle Information Data Compression Method
According to the application requirements of the mobile Internet, the vehicle information collected by the operators includes the location, speed, height, license plate, vehicle state, driving conditions, driver, passenger positions, and other pieces of text information. Data of different formats, sizes, and types are exchanged and then uploaded through the wireless communication module of the vehicle terminal.
In this paper, a compression/decompression mode is used that can ensure the integrity of data transmission, reduce data traffic, and reduce communication costs. However, the disadvantage of compressing on the terminal is that each set of data needs to be compressed, which delays transmission. The vehicle terminal therefore needs a high-performance hardware configuration so that it can compress and process large amounts of data while data collection continues, with the compression and transmission task and the collection task not interfering with each other.

Data Compression Algorithm Based on a Maximum Entropy Neural Network. Owing to the limitations of mobile terminal hardware, little data compression has been carried out on vehicle terminals in past practical applications. Although the hardware technology level and performance of vehicle terminals are constantly improving, compressing and transmitting data at the same time as acquiring it inevitably incurs a certain amount of terminal overhead. For this reason, this paper follows these principles in selecting and using a text data compression algorithm: under limited bandwidth resources, the algorithm must achieve a better compression and transmission effect than traditional algorithms; it cannot be too complex, so that compression and transmission efficiency is not affected; and it cannot occupy too much memory, since doing so would affect data acquisition, other memory-intensive applications, the general transfer protocol, and so on. Based on the actual application requirements, this paper compresses the data items separately, packs them in a unified manner, and then sends them.
Neural network technology has made great progress recently and has achieved very good results in computer vision, speech recognition, and machine translation. At the same time, the popularity of mobile computing platforms means that many mobile applications also hope to obtain this ability. However, the challenge is that deep learning neural networks are generally large and thus difficult to integrate into mobile applications (because such applications need to be downloaded to mobile devices and frequently updated). Vehicle terminal hardware is relatively limited, and if cloud-based solutions are used for specific applications and industries, network delay and privacy become problems. The solution is to significantly reduce the size of the deep learning model. A general neural network compression procedure consists of three steps: pruning the connections that are not important, quantizing the weights, and applying Huffman encoding.
Maximum Entropy Neural Network. As a data compression method, artificial neural networks have become an ideal choice for general lossless compression [13,14].
Algorithms for data processing have also been studied [15,16]. A distinctive feature of the neural network data compression method is that it obtains a higher compression ratio and decompression speed. Its weakness is that the network needs a certain amount of training time and requires two scans of the data, which makes real-time data compression difficult.
To obtain the real probability distribution and consistent prediction results, we need to extract knowledge from the sample data and use it to establish a statistical model that is consistent with the real distribution. Among the models that satisfy the known constraints, we then choose the one with maximum entropy, which has a clear advantage [17,18]. The maximum entropy neural network is described as follows [17,18].
Assume $w_i = \lg \alpha_i$ and $f_i(x, y) = x_i$. Because $\sum_{y} P(y \mid x) = 1$, the formula can be rewritten in log-linear form, in which the conditional probability depends on the weighted sum $\sum_i w_i x_i$. An iterative algorithm is used to adjust $w_i$ until $P(y \mid x)$ is consistent with the known probability; this results in a probabilistic model that satisfies the conditional constraints and has maximum entropy under those constraints. The model works in a similar way to a neural network. In fact, a neural network model can be used to solve the problem.
A two-layer neural network is used to predict the character probability distribution based on context. Each possible context is represented by a single input neuron $x_i$, each possible output character is represented by an output neuron $y_j$, and each pair of input and output neurons is connected by a weight $w_{i,j}$.
In prediction, to estimate the probability of the next character given that the context $h$ has already been entered, all input neurons corresponding to that context are set to 1 and all other inputs are set to 0; therefore, the output can be represented as $y_j = g\left(\sum_i w_{i,j} x_i\right)$, where $g(x) = 1/(1 + e^{-x})$.
Then, $y_j$ represents the probability that the next character is $j$, and its form is consistent with the result obtained from the maximum entropy principle. The weights $w_{i,j}$ are adjusted adaptively so that the output satisfies all the constraints: starting from their initial values, they are modified after each prediction according to the actual input character in order to reduce the error. The update formula is $w_{i,j} = w_{i,j} + V E_j$, where $E_j$ is the error function, that is, the difference between the true probability of the next character and its predicted value $y_j$, and $V$ is the learning rate. Therefore, by combining this error-driven weight adjustment with the maximum entropy principle, a probabilistic model satisfying the requirements can be obtained.
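The prediction and weight update described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ContextPredictor`, the choice of the last one and last two characters as active contexts, the zero initial weights, and the learning rate value are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def g(x: float) -> float:
    """Logistic function g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


class ContextPredictor:
    """Two-layer network: one input neuron per context, one output per character."""

    def __init__(self, alphabet: str, learning_rate: float = 0.5):
        self.alphabet = alphabet
        self.v = learning_rate          # learning rate V (value is an assumption)
        self.w = defaultdict(float)     # w[(context, char)], assumed to start at 0

    @staticmethod
    def active_inputs(history: str) -> list:
        # Assumption: the active contexts are the last 1 and last 2 characters.
        return [history[-n:] for n in (1, 2) if len(history) >= n]

    def predict(self, history: str) -> dict:
        contexts = self.active_inputs(history)
        # Active inputs are 1 and all others 0, so only active weights are summed.
        return {c: g(sum(self.w[(ctx, c)] for ctx in contexts))
                for c in self.alphabet}

    def update(self, history: str, actual: str) -> None:
        y = self.predict(history)
        for c in self.alphabet:
            # Error E_j: true value (1 for the actual character) minus prediction.
            error = (1.0 if c == actual else 0.0) - y[c]
            for ctx in self.active_inputs(history):
                self.w[(ctx, c)] += self.v * error


def train(model: ContextPredictor, text: str) -> None:
    """Update the model once per character, using the preceding text as context."""
    for i in range(1, len(text)):
        model.update(text[:i], text[i])


model = ContextPredictor(alphabet="ab")
train(model, "ab" * 50)
probs = model.predict("a")   # after "a", the character "b" should be favoured
```

Each output in this sketch is an independent logistic unit, so the predicted values are not normalized to sum to 1; a coder driven by them would have to normalize the values or predict bit by bit.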
Optimized Huffman Encoding Method. To transmit text information such as location and GPS information, this paper adopts Huffman encoding [10,11] to effectively compress the GPS data and text data to be transmitted. Considering the working environment of various operating vehicles, the compression algorithm for vehicle operating status text data is selected according to the following principles: little computation, fast compression, a simple algorithm, and easy implementation. At the same time, under limited bandwidth resources, a better compression effect must be achieved, and the algorithm cannot be too complex, so that compression and transmission efficiency is not affected and the requirements of the hardware, the components, and the communication environment are met. Text data (including GPS/Beidou positioning information) are mainly collected by the on-board equipment of operating vehicles. Huffman coding [12] is used to losslessly compress the GPS/Beidou position data to be transmitted and the other textual operating status data of the vehicles.
The Huffman encoding principle is explained in [19,20]. The characters of the text are represented by a set $C = \{c_1, c_2, c_3, \ldots, c_n\}$, where $c_i$ stands for the different text characters. Suppose the frequency of the character $c_i$ is $f_i$ and its coding length is $l_i$. To make the total length of the encoded text file the shortest, we need to determine the encoding that minimizes $\sum_{i=1}^{n} f_i l_i$. Huffman encoding is based on the Huffman tree structure, and the Huffman tree is constructed as follows [3,4,21].
(1) Treat each character $c_i$, with weight $f_i$, as a binary tree consisting of a single root node, and let $F$ be the set of these $n$ trees.
(2) Select the two trees in $F$ whose root nodes have the minimum weights as the left and right subtrees, and construct a new binary tree. The weight of the root node of the new binary tree is set to the sum of the weights of the root nodes of its left and right subtrees.
(3) Remove the two selected trees from $F$, and add the new binary tree to $F$.
(4) Repeat steps (2) and (3) until $F$ contains only one tree; this final tree is the Huffman tree.
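The tree construction above can be sketched in Python as follows. This is an illustrative sketch of standard Huffman construction using a priority queue, not the paper's optimized variant; the function name `build_huffman_codes` and the sample coordinate string are assumptions.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(text: str) -> dict:
    """Return a prefix code {character: bit string} from character frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    tie = count()  # tie-breaker so the heap never compares tree nodes directly
    # Initial forest: one single-node tree per character, weighted by frequency.
    # A node is either a character (leaf) or a (left, right) tuple (internal).
    forest = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(forest)
    if len(forest) == 1:
        return {forest[0][2]: "0"}
    # Repeatedly merge the two minimum-weight trees until one tree remains.
    while len(forest) > 1:
        f1, _, left = heapq.heappop(forest)
        f2, _, right = heapq.heappop(forest)
        heapq.heappush(forest, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        # Left edges contribute "0", right edges contribute "1".
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(forest[0][2], "")
    return codes


sample = "113.2541,23.1312,113.2542,23.1313"   # hypothetical position text
codes = build_huffman_codes(sample)
total_bits = sum(len(codes[ch]) for ch in sample)
```

For position text like the sample, which uses only digits and two separators, the encoded length `total_bits` is well below the 8 bits per character of the plain text.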
Text data such as location information usually contain many duplicate characters. The duplicate characters in location information and other vehicle text data can be regarded as redundant information to be removed. On this basis, a Huffman compression encoding table is used to compress the processed data quickly. The compressed data is then stored in a data storage buffer for postprocessing. The Huffman compression table is pregenerated by the background server from the number of occurrences of each character in the text data and is prestored in the Flash memory of the vehicle terminal.
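Encoding with a prestored table can be sketched as below. The table `CODE_TABLE` is a hypothetical, hand-written prefix code used only for illustration; in the scheme described above it would be generated by the background server and stored on the terminal. Bit strings are used instead of packed bytes to keep the sketch short.

```python
# Hypothetical pregenerated prefix code (shorter codes for more frequent characters).
CODE_TABLE = {
    "1": "00",
    "2": "01",
    "3": "100",
    ".": "101",
    ",": "110",
    "4": "1110",
    "5": "1111",
}
DECODE_TABLE = {bits: ch for ch, bits in CODE_TABLE.items()}


def encode(text: str) -> str:
    """Encode text as a bit string using the prestored table."""
    return "".join(CODE_TABLE[ch] for ch in text)


def decode(bits: str) -> str:
    """Decode a bit string; the prefix property makes each match unambiguous."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE_TABLE:
            out.append(DECODE_TABLE[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(out)


record = "113.25,23.13"        # hypothetical position record
bits = encode(record)
restored = decode(bits)
```

Because the table is fixed in advance, the terminal needs only one pass over the data, at the cost of a table that may not match the statistics of every record.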
The Huffman encoding method constructs the code entirely according to the probability with which the characters appear. Huffman coding does not consider error protection. The algorithm needs to calculate the probability of occurrence of the source symbols in order to obtain their probability distribution. It is generally believed that the algorithm is complex in coding and decoding, which is not conducive to hardware implementation [17,18]. Huffman encoding and arithmetic coding are typical probabilistic models. Many scholars have put forward dictionary models to address this problem; a typical example is the LZW algorithm [4,15]. The LZW algorithm has high computational efficiency, which is reflected in the speed of compression and decompression, and it only needs to scan the text data once. For an input stream with a high repetition rate of source characters, the compression rate of this algorithm is relatively high. However, the algorithm's adaptability is poor, and for some files with low complexity it usually needs to be combined with other algorithms to achieve the desired compression goals. The LZW algorithm is therefore not suitable for vehicle status text data.
The Huffman encoding method has some limitations: the input symbol stream must be scanned twice, and the Huffman tree must be stored or transmitted together with the encoding results. Nevertheless, this method has a high compression rate, is simple and practical, and gives the text data a unique correspondence between encoding and decoding. Therefore, the Huffman encoding method is very suitable for vehicle information data with higher identification requirements. In order to solve the practical problems of traditional Huffman coding, such as its large buffer and high complexity, this paper improves the Huffman tree structure by using a maximum entropy neural network [21,22]. The main steps are as follows:
(1) Initialize the established binary Huffman tree, arbitrarily select a root node, and set the weight of the root node to 0.
(2) For a new character that has not yet been encoded, generate two new nodes: one joins the weight of the parent node, and the other, which has a weight of 0, is defined as the new node with weight 0.
(3) For a character that has already been encoded, find the location of the character by searching the tree, and compare it with the nodes that have the same coding weight.
(5) Quantize the neural network, and strengthen and adjust the weight of each node.
(6) Following the principle that nodes with larger weights receive larger node numbers and correspondingly shorter codes, exchange the nodes that do not conform to the principle.
(7) Repeat steps (2) to (6) until all the characters are encoded.
The improved Huffman coding method sets the weight of its root node to 0, which reduces the number of times that the symbol stream needs to be scanned to one. After the weights of the neural network are optimized and adjusted, only one character with a probability of 1 is scanned in the calculation. Only node numbers are exchanged within the binary tree, which reduces both cache occupancy and the complexity of the algorithm. The method's main shortcomings are the lack of error protection in the code and its relatively high data requirements.

Experiment and Results
According to the requirement for real-time transmission of text data in actual operation, this paper uses a 3G/4G wireless network as the data transmission channel to complete a data transmission test. In the 3G/4G wireless network, a carrier frequency is used for data communication and can only be used by a user alone. In the vehicle text data acquisition environment, owing to the influence of the communication environment and the number of users, the actual wireless data transmission rate often differs greatly from the theoretical rate. In addition, the success of the transmission priority strategy for text data transmission is further verified by a terminal-to-terminal test in the real environment.
Data Transmission Test. An urban-rural fringe area is selected as the location for the data transmission test. This communication environment complies with the requirements of the test environment. We conduct two tests: a network delay test and a TCP transmission rate test. The test environment uses a Unicom 3G/4G wireless network, and data is uploaded to the server through the vehicle terminal. In 3G/4G networks, the delay is usually measured by round trip time. Low latency is extremely important in practical applications. A lower latency for data transmission improves the uplink capacity and data throughput and increases the coverage of high bit rate transmission. Typically, data transfer tests use the PING method, which obtains the response time of the connected server by sending a data packet and measuring the time until the response packet is received. Each PING request packet is marked with a sequence number when it is sent, and the response message carries the corresponding sequence number; by observing the PING response packets, the link can be checked for packet loss, packet duplication, and out-of-order delivery, and the data transmission delay can be estimated.
We ran several upload tests in the village environment (beginning at 10 a.m.). The tests sent ICMP packets, 240 in total. The 3G/4G wireless bandwidth rate is 120-720 kbps (the urban 3G/4G wireless bandwidth rate is about 960-2400 kbps). The TCP transmission rate is similar to the 3G/4G bandwidth rate (Table 1).
In the data upload test, the maximum delay of the uploaded data packets is 15205.6 ms, and the average delay is 3638.6 ms. The data upload basically meets the requirements of real-time data upload. When the network bandwidth is 120-720 kbps, the average packet loss rate of the 3G/4G network is 4.334%. In addition, the data upload test results indicate that the communication environment in the village area is relatively poor.

Data Compression and Decompression Test. Based on the established vehicle terminal, the 3G/4G network, and the background server system, the collected on-board information is compressed, transmitted, and tested. We compare the results of compressed and uncompressed data transmission. The main evaluation parameters of the data compression effect are compression rate and transmission time. Compression rate is one of the most important indicators of the quality of a data compression algorithm; it is an intuitive measure of the degree of data compression. Running time indicates the complexity of the algorithm. The combination of these two indicators allows us to further improve communication efficiency. The transmission error rate is the ratio of the parsed data (including decompressed data) to the length of the data before compression when the server receives the text data; the transmission loss rate is the ratio of the number of data packets that the server does not receive to the number of data packets sent by the vehicle terminal. Table 2 compares the test results for four transmission modes: original text data, Huffman encoding, LZW, and the maximum entropy neural network combined with Huffman coding. In accordance with the requirements of data acquisition and transmission, we use data acquisition cycles of 15 s, 10 s, and 5 s and an acquisition time of 300 minutes; the operating status data of a customized public transit vehicle serves as the test data source, and the results for the original data and for the data transmitted with each algorithm are compared.
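The evaluation quantities can be computed as in the following Python sketch. The paper does not state an explicit formula for compression rate, so the definition used here (compressed size divided by original size) is an assumption, as are the function names and the sample sizes; the loss rate follows the definition given above, and the 120 kbps figure is the lower bound of the measured bandwidth.

```python
def compression_rate(original_size: int, compressed_size: int) -> float:
    """Compressed size as a fraction of the original size (assumed definition)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return compressed_size / original_size


def packet_loss_rate(packets_sent: int, packets_received: int) -> float:
    """Packets not received by the server divided by packets sent."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    return (packets_sent - packets_received) / packets_sent


def transmission_time_s(size_bytes: int, bandwidth_kbps: float) -> float:
    """Ideal transmission time, ignoring protocol overhead and retransmission."""
    return size_bytes * 8 / (bandwidth_kbps * 1000)


# Hypothetical sizes and packet counts, for illustration only.
original_size, compressed_size = 4000, 1000
rate = compression_rate(original_size, compressed_size)
loss = packet_loss_rate(packets_sent=200, packets_received=195)
t_raw = transmission_time_s(original_size, bandwidth_kbps=120)
t_compressed = transmission_time_s(compressed_size, bandwidth_kbps=120)
```

Under this idealized model, the transmission time scales linearly with the compressed size, which is why compression rate and running time together determine the achievable communication efficiency.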
We obtain the best test results when the maximum entropy neural network is combined with Huffman encoding and the data acquisition cycle is 15 s. The experimental results further verify that a transmission mode with less data and a shorter transmission time achieves a lower transmission error rate and packet loss rate. The maximum entropy neural network combined with Huffman encoding requires less computation and has higher compression efficiency. It is suitable for data transmission in urban villages or suburban areas. The LZW algorithm has better compression efficiency on data with global or local correlation, but the correlation between the fields of vehicle operating state data is relatively low, which limits its application. The compression efficiency of the LZW algorithm is worse than that of the maximum entropy neural network combined with the Huffman encoding algorithm. To improve the real-time application efficiency of vehicle status data, it is necessary to avoid transmitting all state data in real time, which would occupy a large amount of memory and data transmission channels. The test of data compression and transmission with screening and identification is shown in Table 3.
The experimental results show that, when the original data is not compressed, the transmission error rate and the packet loss rate are low, and their impact on actual operation is within the acceptable range. Among Huffman encoding, LZW, and the maximum entropy neural network combined with Huffman encoding, the last achieves a relatively high compression rate, a relatively low transmission error rate and packet loss rate, and better overall results than the other methods.
When the data are screened before compression and transmission, the amount of data that needs to be sent is greatly reduced, which ensures timely data transmission for real-time applications. Moreover, the data transmission error rate and the packet loss rate decrease. In addition, the data acquisition cycle shows that the smaller the text data sampling cycle, the closer the adjacent data and the higher the compression rate obtained by the various algorithms, especially when the data is highly correlated. By exploiting this correlation, the system's communication efficiency can be greatly improved and communication costs can be greatly reduced.
To further verify the efficiency of the text data compression algorithm, this paper compares the restored (decompressed) text data with the original text data. The data is compressed with the maximum entropy neural network combined with Huffman encoding. The original data file and the compressed data file are sent to the server through the 3G/4G wireless network. After transmission, the data files received by the server are read, decompressed, and dequantized. It should be noted that the speed test data used is the driving data of a passenger bus running continuously for four days. The passenger bus data is acquired with an average period of 10 s, and a total of 12361 speed records were obtained during the period (the theoretical number is 12600; for various reasons, 239 samples were erroneous, lost, or not collected during data collection). The speed curve before data compression is shown in Figure 1, and the speed curve after data decompression is shown in Figure 2. The abscissa in Figures 1 and 2 shows the acquisition order of the speed data, and the ordinate indicates the velocity values. Figure 3 presents the error values before and after data compression; the abscissa represents the data acquisition sequence, and the ordinate indicates the difference between the decompressed velocity value and the precompression velocity value.
We can see from Figures 1-3 that, after compression, transmission, parsing, decompression, and storage, the data is basically consistent with the data before compression and transmission. The coding, compression, and other data processing techniques meet the requirements of practical applications. To describe accurately the difference between the text data before and after compression and transmission, this paper uses the mean square deviation. The mean square deviation before and after data compression and transmission is only 0.2314, which is in a reasonable range. The main reasons for the deviation are that data is lost and that data cannot be parsed during data transmission.
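The deviation measure can be computed as in the sketch below. The paper does not give its formula, so this sketch assumes that the mean square deviation is the mean of the squared differences between corresponding samples; the function name and the short speed series are hypothetical and are not the experimental data.

```python
def mean_square_deviation(before: list, after: list) -> float:
    """Mean of squared differences between paired samples (assumed definition)."""
    if len(before) != len(after) or not before:
        raise ValueError("series must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(after, before)) / len(before)


# Hypothetical speed samples before compression and after decompression.
speed_before = [30.0, 32.5, 35.0, 33.0]
speed_after = [30.0, 32.0, 35.0, 34.0]
msd = mean_square_deviation(speed_before, speed_after)
```

If the intended measure is instead the root of this quantity, the square root of the returned value would be reported.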
Of the 12600 speed records sent by the vehicle terminal, 12361 can be accurately parsed. The proportion of accurately transmitted and parsed data is 98.10%, which basically meets the actual application requirements. The remaining 239 records were erroneous, lost, or not collected: 59 records were erroneous, accounting for 0.47% of the total; 134 records were lost (packet loss), accounting for 1.06% of the total; and 46 records were not collected for various reasons, accounting for 0.37% of the total. The experimental results show that the algorithm has high compression efficiency and transmission integrity and can meet practical application requirements.
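The reported percentages can be checked directly from the record counts given above. The short Python sketch below only reproduces that arithmetic; the variable names are illustrative.

```python
sent = 12600                 # theoretical number of speed records
parsed = 12361               # records accurately transmitted and parsed
erroneous, lost, not_collected = 59, 134, 46

missing = erroneous + lost + not_collected


def percent(n: int) -> float:
    """Share of the theoretical total, in percent, rounded to two decimals."""
    return round(100 * n / sent, 2)


shares = {
    "parsed": percent(parsed),
    "erroneous": percent(erroneous),
    "lost": percent(lost),
    "not_collected": percent(not_collected),
}
```

The three failure categories sum to the 239 missing records, and the rounded shares agree with the 98.10%, 0.47%, 1.06%, and 0.37% stated in the text.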

Conclusions
For the collected types of text information (including position information), the maximum entropy probability neural network prediction model combined with optimized Huffman encoding realizes data transmission and, finally, the receiving, decompression, parsing, and storage of vehicle information. The test results show that the algorithm is efficient. The proportion of accurately transmitted and parsed data is 98.10%, which basically meets the requirements of practical application. At the same time, the algorithm plays an important role in improving the efficiency of wireless text data transmission and in reducing the cost and time of data transmission, and it is of practical significance for base station deployment, the choice and use of communication modes, and so on. The data compression method described in this paper is reliable in transmission and results in very little distortion. It can provide reliable data transmission in urban environments with different base station coverage. Furthermore, it can improve the transmission efficiency of text information from vehicles and ensure the integrity of this text information. Moreover, it is of great significance for realizing energy saving and emission reduction in the development of the transportation industry.

Figure 3: Curve of velocity error values.

Table 1: Data upload test results.

Table 2: Data compression test results.

Table 3: Screening data compression test results.