Newspaper article‑based agent control in smart city simulations


smart control signals in real time instead of controlling the systems directly can significantly reduce these costs.
Concurrently, it is difficult to evaluate the performance of the models and technologies designed for smart cities [6]. For example, traffic control is one of the important topics covered in smart city research. Evaluating the assorted models and technologies is difficult when different cities implement different intelligent traffic controls [7]. Thus, simulation environments are constructed to validate the models and technologies [8]. In simulation environments, human behaviors and movements need to be considered. For example, very high costs will be incurred in a testing system that is utilized for detecting and tracking pedestrian movements. The simulation techniques aid the design of such a system while drastically reducing the costs; however, there is a disadvantage in that control signals must be collected from various agents to build a simulation environment. Although the simulation environment reduces the cost of collecting data in the experiment, in order to establish a simulation environment, some basic data is needed to formulate rules for the agent.
This paper proposes a deep generative model that automatically generates agent control signals from sentences extracted from news articles. An agent control signal includes the time, action, and place for a simulated agent in a smart city. Generative adversarial network (GAN) models have achieved great success in computer vision and natural language processing, and GANs have been utilized for text generation [9]. Because recurrent neural networks are more suitable than other deep neural networks for sequential text data [10,11], the GAN here is implemented with long short-term memory (LSTM) networks, a type of recurrent neural network, to generate the agent control signals from text.
The proposed method consists of a newspaper article preprocessing phase, an LSTM-based GAN (LSTM-GAN) training phase, and an agent control signal generation phase. In the newspaper article preprocessing phase, the movement related words of an agent are extracted from the text dataset of selected newspaper articles. Specifically, movement related words are utilized to train a Word2Vec model, which generates the vector representations that encode patterns in the context of the extracted movement related words [12,13]. In the LSTM-GAN training phase, a new LSTM-GAN architecture is designed, which stacks Word2Vec, GAN, and LSTM together for better generation of sentences from movement related words. In the agent control signal generation phase, the agent control signals are generated by the trained LSTM-GAN and the agent control signal maker. The LSTM-GAN consists of a deep LSTM-based generator network and a deep LSTM-based discriminator network. In particular, the discriminator of the LSTM-GAN utilizes the Bidirectional LSTM (Bi-LSTM) neural network to process the control signals in both directions. The contributions of this paper are as follows:
• Agent control signal generation based on natural language. The movement related words, such as action (i.e., verb), place, and time expressions, are extracted from the newspaper articles in the Corpus of Contemporary American English (COCA) by utilizing part-of-speech tagging in natural language processing [14].
• Expandability of the Word2Vec model. Word2Vec is an embedding model used to generate word vectors that map each word to a continuous vector and represent the relationship between words. The Word2Vec model is trained to capture the syntactic structures of words and transform words into continuous vectors that are taken as input together with noise vectors to train the discriminator network in the LSTM-GAN. By using Word2Vec for preprocessing to generate agent control signals in the smart city simulation, we demonstrate the extensibility of utilizing Word2Vec in smart city simulations.
• Control signal generation based on the LSTM-GAN. A large-scale text dataset with 100 test sentences is built to ensure that the proposed framework can generate agent control signals for smart city simulations based on sentence generation.
The remainder of this paper is organized as follows: "Related work" section reviews the literature on the models pertaining to the present research; "Agent control signal generation from articles" section introduces the proposed method; "Experiment" section demonstrates the results of the experiments; "Discussion" section discusses the results of the experiments; and finally, "Conclusion" section concludes the paper.

Smart city simulation
Several previous studies on smart cities have conducted interdisciplinary research on the development of virtual cities with transportation, energy [15,16], etc. The implementation of smart cities still faces some challenges, but an increasing number of research projects on implementing smart cities have been conducted. Meanwhile, a photorealistic three-dimensional city simulation method has recently been proposed for training autonomous vehicles in a smart city simulation. Deep learning techniques such as convolutional neural networks (CNNs) have also been widely applied to simulate the real-world behaviors of various agents and traffic systems [17].
In future smart cities, new information and communication technologies will manage urban resources better. The smart grid infrastructure is evolving into a complex system that can monitor and control the generation and consumption of energy in the power grid to improve energy efficiency. In [18], location extraction, location recognition, and location prediction are implemented on a position-based services (PBS) intermediate server by utilizing machine learning techniques; the user's position is obtained in real time from an electronic device in the city, and the movement paths are predicted. Dynamic simulations of smart cities to test new resource optimization methods are introduced in [19], where a simulator based on software agents is built to create the dynamic behaviors of smart cities. It also simulates discrete heterogeneous devices that generate and consume energy. Thus, smart cities utilize multiple technologies to improve the implementation of transportation, energy, and traffic system services, which leads to higher levels of comfort for their citizens. In simulating smart city services, a technology that has significant potential is big data analysis. The most important part of a simulation system for training autonomous agents is generating various real-world situations based on data acquired from the real world; however, there are still many challenges in collecting data on the events that occur in the real world. Therefore, we propose a method for generating text data in this study, which is utilized to simulate the real behaviors of people.

Word embedding method
Natural language processing commonly includes two steps: data preprocessing and feature extraction. The data preprocessing comprises tokenization, cleaning, and lemmatizing phases [20]. The tokenization phase breaks sentences into words. The cleaning phase removes stop words (function words that carry little content) to reduce the occupied database space and processing time. The lemmatizing phase groups the different inflected forms of a word together so that they are analyzed as a single item. In other words, it collapses the variants of a word into one representative form.
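The three preprocessing phases described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in this study: the stop-word list and the lemma table are small hypothetical stand-ins for the resources that a library such as NLTK or spaCy would provide.

```python
import re

# Hypothetical stop-word list and lemma table, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were",
              "to", "of", "in", "at", "and"}
LEMMAS = {"went": "go", "going": "go", "goes": "go",
          "studies": "study", "studied": "study"}


def tokenize(sentence):
    """Tokenization: break a sentence into lowercase word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def clean(tokens):
    """Cleaning: drop stop words."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Lemmatizing: map inflected forms to one representative form."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(sentence):
    """Run the three phases in order."""
    return lemmatize(clean(tokenize(sentence)))


print(preprocess("She went to the library and studied in the morning."))
```

The inflected forms "went" and "studied" are reduced to "go" and "study", so that later stages treat them as single items.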
A natural language corpus consists of a large number of words and therefore requires significant computing resources to process. One-hot encoding is an encoding method that represents sentences or words as vectors of ones and zeros [21]; however, it leads to the "curse of dimensionality" by creating a new dimension for each word. Term Frequency-Inverse Document Frequency (TF-IDF) is another encoding method that represents how important a specific word is in a given document [22]. TF measures the number of times that a given word appears in a document relative to the total number of words in that document, whereas IDF measures how rare the word is across all the documents in the corpus. However, neither one-hot encoding nor TF-IDF captures the semantics of the words. Word embedding representation, in contrast, is a language modeling technique utilized to map words to vectors of real numbers and learn distributed representations of words. It represents words in dense vector spaces with several dimensions. The important aspect of word embedding is that words occurring in similar contexts tend to be closer to each other in the vector space. The Word2Vec model provides two architectures for generating word embedding vectors: Continuous Bag of Words (CBOW) and skip-gram. The CBOW model predicts the current word given the surrounding contextual words within a specific window, whereas the skip-gram model predicts the surrounding contextual words within a specific window given the current word [23]. In this study, the skip-gram architecture of the Word2Vec model is employed to learn continuous embedding vectors for the words extracted from sentences in the dataset. That is, the Word2Vec model embeds the control signals extracted from newspaper articles.
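The difference between the two Word2Vec architectures is easiest to see in the training pairs that each one derives from a sequence. The sketch below only builds these pairs; it does not train an embedding, and the function names and the window size are illustrative assumptions rather than settings from this study.

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: (current word, one surrounding word) pairs."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=1):
    """CBOW: (surrounding words, current word) pairs."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        pairs.append((context, center))
    return pairs


# A control signal written as a time, action, place sequence.
signal = ["21", "work", "university"]
print(skipgram_pairs(signal))
print(cbow_pairs(signal))
```

With skip-gram, each word of the control signal is used to predict its neighbors, so the time, action, and place words that co-occur in signals end up close to each other in the embedding space.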

Recurrent neural network (RNN)
The most popular approach to sentence generation is utilizing the RNN model. Currently, the most successful RNN model is the LSTM, where the gates in each neuron help the model predict the next word in a sentence based on the surrounding contextual words [24]. Many more models based on the LSTM network have been recently proposed, e.g., bidirectional LSTM [25]. Various studies have employed the LSTM for sentence generation, either directly [26] or as an embedded model [27]. In this paper, the LSTM is applied as a word predictor based on the input which is a sequence of words.

Generative adversarial network (GAN)
The GAN, proposed by Goodfellow et al. in 2014, has been applied to various areas such as computer vision and natural language processing. It was proposed as a way of efficiently training deep generative neural networks. The fully connected GAN consists of two fully connected neural networks, a generator and a discriminator, where the generator attempts to produce realistic samples that fool the discriminator, while the discriminator attempts to distinguish real samples from generated ones [28]. Laplacian Pyramid GAN (LAPGAN) is a generative adversarial model utilizing the CNN within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion [29]. Similarly, the conditional GAN is a generative adversarial model that adds a conditioning input to the original GAN, such as the class of an image, the attributes of an object, or an embedded text description of the image [30]. There are some obstacles in applying GANs to NLP. One such obstacle is that the space of words is discrete and therefore not differentiable. To this end, the sequence GAN (seqGAN) executes text generation by utilizing the softmax function over continuous values for word selections [27]. Inspired by seqGAN, the continuous RNN GAN (C-RNN-GAN) was proposed by Mogren; it works on continuous sequential data without this obstacle and is configured based on the Bi-LSTM to generate music melodies [31]. In addition, an enhanced GAN model that adds one more CNN discriminator to the C-RNN-GAN to generate music melodies [32] has been proposed. In this study, a GAN model composed of an LSTM-based generator and a Bi-LSTM-based discriminator is trained on the sequential information of text. The next section details how the proposed method automatically generates the agent control signal via the GAN model.

Agent control signal generation from articles
The proposed LSTM-GAN model to generate agent control signals from newspaper articles is depicted in Fig. 1. It consists of three major phases: newspaper article preprocessing, LSTM-GAN training, and agent control signal generation.
First, in the newspaper article preprocessing phase, the article extractor extracts newspaper articles from the database of the text corpus. The preprocessor, together with the additional place and action lists, transforms the extracted newspaper articles into the control signal embedding vectors via the pre-trained Word2Vec model. Second, in the LSTM-GAN training phase, the generator trainer utilizes the noise vectors as inputs to generate the fake control signals. The noise vector is sampled from a random distribution and is taken as input by the generator network. To map the noise vector to the new control signal vector, the generator network needs to ensure that the dimensions of the noise vector are the same as the dimensions of the embedded control signal. The discriminator is trained to distinguish the generated control signals from the embedded control signals. Finally, in the agent control signal generation phase, a sequence of the generated control signals concatenated with noise vectors is taken as input by the generator executor to generate control signals, which are converted into agent control signals via the agent control signal generator.

Newspaper article preprocessing phase
The control signals are defined as a sequence of words denoting time, action, and place. The agent control signals are a set of control signals arranged according to the times that they are associated with. In Fig. 2, the database comprises a large number of newspaper articles, with the index number marked at the beginning of each article. The paragraphs in the newspaper articles are divided by the symbol "<p>". The article extractor first extracts the newspaper articles by recognizing the article index number from the database. The word extractor then extracts movement related words of time, action (i.e., verb), and place expressions from the sentences in a paragraph, which are split by the symbol "<eos>". The movement related words are extracted from a single sentence to define the agent's action at a given time and place by utilizing Part-Of-Speech (POS) tagging and named-entity recognition (NER) in NLP. The movement related words extracted from sentences are filtered based on a predefined action (verb) list and place list. Meanwhile, there are two ways of telling the time. The 12-h clock runs from 1 a.m. to 12 o'clock noon and then from 1 p.m. to 12 o'clock midnight. The 24-h clock uses the numbers 00:00 to 23:59 (midnight is 00:00). The time normalizer normalizes the filtered time words to the 24-h clock and outputs control signals. For example, to avoid any confusion when an extracted word refers to '(in) the morning', the time word is normalized to the 24-h clock. The generated control signals are transformed into continuous control signal embedding vectors via the Word2Vec model.

LSTM-GAN training phase
Figure 3 shows the training process of the LSTM-GAN network for generating new control signals based on embedded control signals from the newspaper articles. The LSTM-GAN network consists of a generator and a discriminator that are trained simultaneously with adversarial objectives.
The generator has two LSTM layers, and the discriminator utilizes the Bi-LSTM model. The noise vectors, which are sampled from a Gaussian distribution, are fed into the generator as the input, and the generator generates control signals from them. The discriminator takes the generated control signals and the embedded control signal vectors as inputs and distinguishes the control signals generated by the generator from the real ones in an adversarial manner. Equations (1) and (2) are employed as the loss functions, which are implemented jointly to train the discriminator and the generator. Equation (1) represents the loss function of the discriminator, and Eq. (2) represents the loss function of the generator, where x and z represent the real control signal embedding vector and the Gaussian noise vector, respectively; G(z) denotes the fake control signal generated by the generator based on the noise vector; and D(x) and D(G(z)) refer to the discriminator outputs for the real control signal and the fake control signal, respectively.

Agent control signal generation phase
Figure 4 shows the process of generating agent control signals. Using noise vectors, the trained generator outputs various control signals. The agent control signal maker then generates agent control signals based on the sorted control signals. The time feature for a particular control signal is defined as the start time and end time for that particular control signal. The probability calculator takes the start time together with the generated control signals as input and calculates the probability of each action occurring at the corresponding time. To randomly select a control signal based on the probability, the probability and the control signals are fed into the control signal selector.
The time checker takes the end time and the selected control signal as inputs and returns the control signal as long as its time does not exceed the end time. Finally, the sorted control signals are generated. In the next section, a series of experiments is presented to evaluate the performance of the proposed system.
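For the LSTM-GAN training phase described above, the loss functions referred to as Eqs. (1) and (2) can be written in the following form. This is a reconstruction that assumes the standard adversarial objective of [28], uses only the symbols x, z, G(z), D(x), and D(G(z)) defined in the text, and omits any averaging over a mini-batch; it is not a verbatim copy of the original equations.

```latex
\begin{align}
L_D &= -\bigl[\log D(x) + \log\bigl(1 - D(G(z))\bigr)\bigr] \tag{1}\\
L_G &= \log\bigl(1 - D(G(z))\bigr) \tag{2}
\end{align}
```

Under this form, the discriminator lowers its loss by pushing D(x) toward 1 and D(G(z)) toward 0, whereas the generator lowers its loss by pushing D(G(z)) toward 1.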

Experiment
In this section, the experimental setup and results are introduced to demonstrate the feasibility of the proposed LSTM-GAN model described in "Agent control signal generation from articles" section.

Data set and experimental environment
The COCA [12] is the largest American English corpus covering a variety of genres; it contains more than 500 million words. As shown in Table 1, experimental data were extracted from newspaper articles dated from 1990 to 2012 in this study. Table 2 presents some of the hyperparameters that were applied during the implementation of the LSTM-GAN model. Input refers to the dimension of the input vector; Output refers to the dimension of the output vector; Layer number is the number of network layers; Layer size is the dimension of the network layer; Learning rate is a tuning parameter in the optimization algorithm that determines the step size at each iteration while approaching the minimum of a loss function; Sequence length denotes the length of the control signal; Batch size denotes the number of training words fed into the input layer in each iteration; and Epoch denotes the number of times that training is performed.
The experimental environment comprised Windows 10, an Intel i5-6400 CPU, an NVIDIA GeForce GTX 1050 GPU with 2 GB of memory, and 8 GB of DDR4 RAM. The proposed LSTM-GAN network was implemented in Python using TensorFlow, the Natural Language Toolkit (NLTK), and the spaCy library.

Table 1 Articles of news (excerpt)
##3000001 <p> He is trying to make the best of it, but the days have seemed to pass like months for Lou Piniella … …. This is a stable situation, and if I do a job I 'll be around for a long time. "
##3000002 <p> Another North American indoor track and field Grand Prix season closes tonight with the USA … … Diane Dixon will be attempting to win her ninth title in the women 's 400
##3000003 <p> A century ago, this northern corner of the Florida Everglades was snake-infested swampland, … …Otherwise, the bulldozers @ @ @ @ @ @ @ @ @ @ the earth from which it rose
…… ……
##4115379 <p> As Spain keeps the world economy on edge while it scrambles to recapitalize its once-mighty … … We deserve what we are getting. "<p>

Table 3 Processing of sentences
Paragraph: <p> The one-mile turf course is an almost fluorescent green, lush from sunshine and a season of rest. The surrounding mile-and-an-eighth dirt track has long held its own magic, the kind that horses and horsemen love. Even when it is very quick, it is still forgiving. Nowhere else can a horse be galloping so close to you in the morning without your even hearing it. <p>
Sentence 1: The one-mile turf course is an almost fluorescent green, lush from sunshine and a season of rest <eos>
Sentence 2: The surrounding mile-and-an-eighth dirt track has long held its own magic, the kind that horses and horsemen love <eos>

Table 3 shows that the sentences in one of the paragraphs in the newspaper articles were split and indexed by the sent_tokenize function in the NLTK library. The words contained in time, action, and place expressions were extracted from each sentence using the spaCy library.
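The sentence splitting step can be sketched as follows. To keep the example self-contained, a naive regular expression stands in for the NLTK sentence tokenizer, so its behavior on abbreviations and quotations will differ from the tool used in the experiment; the function name split_paragraph is hypothetical.

```python
import re


def split_paragraph(paragraph):
    """Split a '<p>'-delimited paragraph into indexed sentences ending in '<eos>'."""
    text = paragraph.replace("<p>", " ").strip()
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [s.strip() for s in parts if s.strip()]
    return [(i, s.rstrip(".!?") + " <eos>")
            for i, s in enumerate(sentences, start=1)]


paragraph = ("<p> Even when it is very quick, it is still forgiving. "
             "Nowhere else can a horse be galloping so close to you in the "
             "morning without your even hearing it. <p>")
for index, sentence in split_paragraph(paragraph):
    print(index, sentence)
```

Each returned sentence carries its index and the "<eos>" marker, matching the layout of Table 3.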

Results of newspaper article preprocessing
The dataset used in this study did not have a sufficient number of training words denoting time, actions, and places. In general, once time and place expressions appeared in a sentence of a paragraph, the same expressions tended to be omitted from the subsequent sentence(s). Thus, when time and place expressions were extracted, the time and place related words in the preceding sentence(s) were carried over to the subsequent sentence(s). Recall that the control signal included three elements: time, action, and place expressions. As shown in Table 3, to ensure the applicability of the action and place related words in the simulation experiment, a predefined list of place expressions and high frequency action words was used.
The sentences without such expressions were eliminated. Furthermore, unclear time expressions in the extracted sentences were normalized. For example, because a time expression such as the word "morning" denotes a range of time, it was set to the discrete values from 8:00 to 11:00, and each discrete time was assigned the same action and place. Through the newspaper article preprocessing method elucidated above, the control signals were processed into the training data, and the predefined lists of high frequency action and place expressions shown in Table 4 were utilized. Each list contained 82 words. Only the words that appeared in a predefined list were considered as part of the training data.
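The carry-over, filtering, and time normalization rules described above can be sketched as follows. Only the mapping of "morning" to 8:00 through 11:00 is taken from the text, read here as one control signal per hour; the other time ranges and the short action and place lists are hypothetical placeholders for the 82-word lists of Table 4.

```python
# Only the "morning" range comes from the text; the rest is hypothetical.
TIME_RANGES = {
    "morning": range(8, 12),     # 8:00 to 11:00
    "afternoon": range(12, 18),  # assumed
    "evening": range(18, 22),    # assumed
}
ACTIONS = {"work", "eat", "study"}               # stand-in for the action list
PLACES = {"university", "office", "restaurant"}  # stand-in for the place list


def carry_over(records):
    """Fill a missing time or place (None) from the preceding sentence."""
    filled, last_time, last_place = [], None, None
    for time_word, action, place in records:
        last_time = time_word or last_time
        last_place = place or last_place
        filled.append((last_time, action, last_place))
    return filled


def to_control_signals(time_word, action, place):
    """Filter by the predefined lists and expand a time word into hours."""
    if (time_word not in TIME_RANGES or action not in ACTIONS
            or place not in PLACES):
        return []
    return [(hour, action, place) for hour in TIME_RANGES[time_word]]


records = carry_over([("morning", "work", "university"), (None, "eat", None)])
signals = [s for record in records for s in to_control_signals(*record)]
print(signals)
```

A sentence whose action or place is not in the predefined lists yields no control signal, which corresponds to eliminating the sentence.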
The total number of final control signals extracted from the newspaper articles was 325,038. Table 5 presents some samples of the preprocessed control signals. Here, "control signals" refers to complete signals that include triplets comprising time, action, and location expressions. Animation is the simulation of movement by successively displaying a series of images. In the smart city simulation, animations render the behaviors performed by agents, for example, the movement of cars that gives rise to traffic congestion. The control signals served as agent control signals when they were matched with animations in the simulation environments.
The 325,038 preprocessed control signals were input into the Word2Vec model to transform them into word vectors. Thus, 24 h, 82 actions, and 82 locations were embedded, as shown in Table 6.

Training results of LSTM-GAN network
Figure 5a shows the accuracy of the discriminator during the training process. The blue line in the figure represents the Real accuracy, which indicates the accuracy of the discriminator in determining real samples as being real. The orange line denotes the Fake accuracy, which represents the accuracy of the discriminator in determining the generated fake samples as being fake. The gray line denotes the Accuracy of the discriminator calculated using both the Real accuracy and the Fake accuracy; it represents the accuracy of the discriminator for all the samples [33]. The Real accuracy rate was 50% at Epoch 1, and it converged to 90% after Epoch 6. The Fake accuracy rate was 50% at Epoch 1, and it converged to 11% after Epoch 6. The Accuracy fluctuated substantially at Epoch 5 and converged to 50% after Epoch 6. At the beginning of the training, because the discriminator had not been trained, the accuracy rates for both real and fake samples were 50%. The increase of the Real accuracy indicates that the discriminator gradually improved its performance through training, and the decrease of the Fake accuracy indicates that the fake samples generated by the generator became similar to the real samples owing to training. The Accuracy converged to 50% for all the samples, which means that the discriminator and generator were well trained [34]. Figure 5b shows the training loss of the generator and the discriminator. When the loss of the generator and that of the discriminator converged, the loss value of the discriminator was lower than that of the generator. This was because the discriminator could give feedback using the actual data, while the generator learned only through the feedback produced by the discriminator.
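The three accuracy curves can be related to one another with a short sketch. It assumes that the discriminator outputs a score in [0, 1] and that a score of at least 0.5 is read as "real"; the threshold, the function name, and the example scores are illustrative assumptions, not values from the experiment.

```python
def discriminator_accuracy(real_scores, fake_scores, threshold=0.5):
    """Return (Real accuracy, Fake accuracy, overall Accuracy)."""
    # A real sample is classified correctly when its score is >= threshold.
    real_correct = sum(score >= threshold for score in real_scores)
    # A fake sample is classified correctly when its score is < threshold.
    fake_correct = sum(score < threshold for score in fake_scores)
    total = len(real_scores) + len(fake_scores)
    return (real_correct / len(real_scores),
            fake_correct / len(fake_scores),
            (real_correct + fake_correct) / total)


# Illustrative scores: most samples, real or fake, are judged to be real.
real = [0.9] * 9 + [0.1]
fake = [0.9] * 9 + [0.1]
print(discriminator_accuracy(real, fake))
```

When the discriminator judges most samples to be real regardless of their origin, the Real accuracy is high, the Fake accuracy is low, and with equally many real and fake samples the overall Accuracy is close to 50%, which is the pattern reported after Epoch 6.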
Table 7 shows the control signals generated by the trained LSTM-GAN. Each generated control signal comprises time, action, and place expressions. The probabilities of time-dependent actions were computed as shown in Table 8 to generate the agent control signals. The probabilities of time-dependent actions were determined based on the frequency of each action in the control signals generated from the trained LSTM-GAN. However, in this study, to generate more diverse agent control signals, the action with the highest probability at a given time was not always selected; instead, actions were selected at random according to their probabilities. Thus, the actions with low probabilities were occasionally selected.
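The frequency-based probabilities and the random selection can be sketched as follows. The function names and the example signals are hypothetical, and weighting the random choice by the computed probability is one reading of the selection rule described above.

```python
import random
from collections import Counter


def action_probabilities(signals, hour):
    """Relative frequency of each (action, place) pair at a given hour."""
    counts = Counter((action, place)
                     for time, action, place in signals if time == hour)
    total = sum(counts.values())
    return {pair: count / total for pair, count in counts.items()}


def select_signal(signals, hour, rng=random):
    """Pick one control signal for the hour, weighted by its probability."""
    probabilities = action_probabilities(signals, hour)
    if not probabilities:
        return None  # nothing was generated for this hour
    pairs = list(probabilities)
    weights = [probabilities[pair] for pair in pairs]
    action, place = rng.choices(pairs, weights=weights, k=1)[0]
    return (hour, action, place)


# Hypothetical generated control signals for 9:00.
generated = [(9, "work", "office")] * 3 + [(9, "eat", "restaurant")]
print(action_probabilities(generated, 9))
print(select_signal(generated, 9, random.Random(0)))
```

Because the choice is weighted rather than greedy, the less frequent pair is still returned in roughly one of four draws for this example.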

Results of agent control signal generation
In the agent control signal generation phase, agent control signals were generated from 21:00 to 20:00 on the next day, as shown in Table 9. The start time was defined as 21:00, and control signals comprising time, action, and place expressions were generated every hour until the end time of 20:00 the following day. Thus, starting from the start time, agent control signals covering 24 h were generated for one day. The results obtained in the experiments are discussed and analyzed in the next section.
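The hourly schedule that starts at 21:00 and wraps around midnight can be sketched as follows. The select argument is a hypothetical callback that returns a control signal for a given hour (or None), standing in for the control signal selector; the lambda in the usage lines is a placeholder, not generated data.

```python
def hours_from(start_hour, count=24):
    """Hours of one day on the 24-h clock, wrapping around midnight."""
    return [(start_hour + offset) % 24 for offset in range(count)]


def build_schedule(start_hour, select):
    """Collect one control signal per hour, skipping hours without a signal."""
    schedule = []
    for hour in hours_from(start_hour):
        signal = select(hour)
        if signal is not None:
            schedule.append(signal)
    return schedule


hours = hours_from(21)
print(hours[0], hours[-1], len(hours))

# Placeholder selector that assigns the same action and place to every hour.
schedule = build_schedule(21, lambda hour: (hour, "work", "university"))
print(schedule[:4])
```

Starting at 21 and taking 24 steps modulo 24 ends at 20, which matches the range from 21:00 to 20:00 on the next day.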

Discussion
In this section, we enumerate some advantages of combining the LSTM and GAN models to generate control signals automatically. Concurrently, the performance of the LSTM-GAN is discussed by comparing the text-based GAN-generated control signals with the ground truth control signals available in datasets.
GANs have achieved substantial success in computer vision with regard to generating hyper-realistic images. Building on this success, image GANs have recently been extended to tasks such as data augmentation. In this paper, to generate text-based control signals for the simulation, we adopted an LSTM-based generator as the generator network in a GAN; however, GANs pose the following problem when applied to text generation. When an LSTM-based generator is employed as the generator network in a GAN, the latent noise vector is the input hidden state of the LSTM, and the output of the generator is the output sentence yielded by the LSTM. In this paper, instead of training the LSTM to minimize cross-entropy loss with respect to target one-hot vectors, we trained it to increase the probability of the discriminator network classifying the control signals as "real." When decoding with an LSTM, the next word is chosen at every time step by picking the word with the maximum probability from the output of a softmax function. This "picking" operation is non-differentiable. This is crucial because, to train the generator to minimize the term 1-D(G(z)) in the loss function, we need to feed the output of the generator into the discriminator and backpropagate the corresponding loss of the discriminator. For these gradients to reach the generator, they have to pass through the non-differentiable "picking" operation at the output of the generator. This is problematic, as backpropagation relies on the differentiability of all the layers in the network. In contrast, backpropagation is perfectly feasible when the generated data is continuous, such as image data. Various methods have recently been proposed to circumvent this problem. seqGANs are text generation GANs that employ reinforcement learning; however, reinforcement learning-based methods can yield poor sentence quality owing to high-variance gradient estimates.
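The effect of the "picking" operation on gradients can be illustrated numerically. The sketch below uses finite differences on a toy three-word vocabulary, with hypothetical logits, to compare the discrete pick with the continuous softmax output; it is an illustration of the argument, not part of the proposed model.

```python
import math


def softmax(logits):
    """Continuous output: a probability for every word."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick(logits):
    """Discrete output: index of the word with the maximum probability."""
    probabilities = softmax(logits)
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def finite_difference(f, logits, index, eps=1e-4):
    """Approximate the derivative of f with respect to logits[index]."""
    bumped = list(logits)
    bumped[index] += eps
    return (f(bumped) - f(logits)) / eps


logits = [2.0, 1.0, 0.5]  # hypothetical scores over a three-word vocabulary
grad_pick = finite_difference(lambda l: float(pick(l)), logits, 1)
grad_soft = finite_difference(lambda l: softmax(l)[1], logits, 1)
print(grad_pick, grad_soft)
```

A small change in a logit leaves the picked index unchanged, so no learning signal flows back through the pick, whereas the continuous softmax output responds to the same change.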
RelGAN stands for relational generative adversarial networks for text generation [35]; it uses Gumbel-softmax as a continuous approximation of the softmax function and is designed to effectively model long-term dependencies in the text. AGNL stands for adversarial generation of natural language [36], which is based on the continuous output of the generator. Like AGNL, and in contrast to the reinforcement learning-based methods, the LSTM-GAN model proposed in this study eliminated discrete spaces altogether by employing the continuous output of the generator. Specifically, it was designed to generate text-based control signals; this model integrates word embedding, an LSTM language model, and a GAN without adopting reinforcement learning, as shown in Fig. 1. Recall that in the first phase, namely, the newspaper article preprocessing phase, newspaper articles were extracted from the database of the text corpus and then transformed into continuous control signal embedding vectors via the pre-trained Word2Vec model. Thereafter, in the second phase, that is, the LSTM-GAN network training phase, the noise vectors were used as inputs to generate fake control signals. The discriminator of the proposed LSTM-GAN was then trained to distinguish the generated control signals from the embedded control signals.
Furthermore, text-based agent control signals need to be applied to the subsequent simulation experiment for them to be matched with the animations in the simulation experiment. To solve this problem, the agent control signals were derived from sentences extracted from actual newspaper articles that reflect actual people performing specific actions at specific places. That is, the control signals generated by the LSTM-GAN have a format that comprises time, action, and location expressions in a sequential order. They have a well-arranged structure when compared with the control signals extracted from the newspaper articles. For example, "21, work, university" represents working at the university at nine in the evening. Therefore, the agent control signals generated based on the proposed method were utilized in the simulation experiment, which increased the training efficiency substantially.