Synthesizing Patterns for Videos Based on the Adversarial ConvNet

In this paper, we propose applying an adversarial ConvNet model, trained with a back-propagation algorithm and a weighting method, to synthesize dynamic patterns. A multi-layer spatiotemporal filter is designed in the adversarial ConvNet to capture the spatiotemporal information that may be lost in the convolution phase. The model defines probability distributions, and the pattern-design problem is to obtain the best synthesis by training these probability models and maximizing the likelihood objective function. In the training phase, the parameters of the multi-layer filters are updated by matching the synthesized videos generated by the adversarially trained ConvNets against the observed training videos. A maximum likelihood learning algorithm is used to synthesize realistic natural image patterns. The effectiveness of the method is evaluated on the public DynTex dataset and compared with other techniques through experimental results.


Introduction
Synthesizing patterns for videos is an important part of social media analysis and processing: given some sample videos as input, the task is to synthesize new videos that differ from the given samples. How to reduce the synthesis error more effectively has been an interesting problem in synthesizing and analyzing such dynamic patterns.
Based on ConvNets [1][2][3][4], various successful dynamic texture modeling and synthesis methods have been proposed in the graphics and vision communities in recent years, although research on dynamic pattern synthesis based on ConvNets remains rare. For instance, patterns for videos have been synthesized through the cooperative training of a spatiotemporal descriptor network and a spatiotemporal generator network [5][6]. A generative adversarial video network with an adversarial convolution structure separates the foreground from the background [7]. Ref. [8] used a spatiotemporal discriminative ConvNet for analyzing video data. A nonlinear dynamic texture modeling method based on Bayesian inference is proposed in Ref. [9]; the model automatically constructs a nonlinear kernel function suitable for dynamic modeling, so it can fit various types of dynamics [10]. A vector linear autoregressive model with Gaussian innovations based on singular value decomposition is proposed in Ref. [11]. Ref. [12] extended the spatially generative ConvNet to capture spatiotemporal patterns at different filter scales by adding the time dimension.
Our work is inspired by the generative ConvNet model proposed in Ref. [13], which is a very effective discriminative or predictive learning machine with broad application prospects. The generative ConvNet is a random-field or energy-based model that takes the form of an exponential tilting of a reference distribution; Gaussian white noise is the most commonly used reference [14,15]. The main contribution of this paper is a pattern synthesis technique using a generative ConvNet, aiming to provide a model for synthesizing patterns for videos. The network is an adversarial ConvNet composed of multi-layer filters that capture spatiotemporal parameters at different scales, and video sequences are synthesized from the learned model. The difference between the synthesized video sequences and the initial training videos is then used to update the filter parameters of the model. A maximum likelihood learning algorithm is used to synthesize realistic natural video modes.
Our model is an energy-based model defined on the video space, and we expect the multi-layer network to capture complex spatiotemporal patterns more flexibly. Compared with recursive temporal models, our model does not need a start frame and captures temporal patterns more easily. Our method has clear advantages for synthesizing dynamic texture patterns. First, the model and its unsupervised learning remain effective when the training sample is small. Second, the learning process uses the maximum likelihood learning algorithm of the existing network to capture spatiotemporal patterns through multi-layer spatiotemporal filters at different scales. Finally, the experimental results show that the trained adversarial ConvNet can synthesize realistic patterns for videos.
The rest of the paper is organized as follows. In the next section we review the principles of the framework model and give a training algorithm for obtaining the adversarial ConvNet weight parameters. The third section presents the experiments and evaluates the effectiveness of the method against other methods, and the fourth section concludes the paper.

The Adversarial ConvNet
Let $I(x, t)$ denote a video sequence, where $x = (x_1, x_2)$ indexes the position of pixels, and $t$ indexes the frames in the video sequence.

The adversarial ConvNet filters can capture spatiotemporal patterns at multiple scales; they are composed of multiple layers of filtering, with the recursive formula

$$[F_k^{(l)} * I](x, t) = h\Big(\sum_{i=1}^{N_{l-1}} \sum_{(y, s) \in \mathcal{S}_l} w_{i, y, s}^{(l, k)}\, [F_i^{(l-1)} * I](x + y, t + s) + b_{l, k}\Big),$$

where $h(\cdot)$ is the nonlinear activation function, $w$ and $b$ are the filter weights and biases, and the sums range over the $N_{l-1}$ filters of layer $l-1$ and the spatiotemporal support $\mathcal{S}_l$ of the layer-$l$ filters.

The adversarial learning ConvNet structure defines a probability distribution over video sequences, which is a Markov random field model, expressed by the following formula:

$$p(I; \theta) = \frac{1}{Z(\theta)} \exp\big[f(I; \theta)\big]\, q(I),$$

where $Z(\theta)$ is the normalization constant and $q(I)$ is the Gaussian white noise model. $f(I; \theta)$ is the scoring function, which is

$$f(I; \theta) = \sum_{k=1}^{N_L} \sum_{x} \sum_{t} [F_k^{(L)} * I](x, t),$$

so the filter response is aggregated over all filters, locations, and times at level $L$. This pooling over space and time can identify non-Gaussian spatiotemporal features or patterns.
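As an illustration, the scoring function above can be sketched in plain Python/NumPy. This is a minimal, unoptimized sketch under assumed conventions (a ReLU activation for $h$, "valid" convolutions, one bias per filter); the names `conv3d_valid` and `score` are hypothetical and not taken from the paper's code.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv3d_valid(volume, kernel):
    """Naive 'valid' 3-D correlation of a (T, H, W) volume with a
    (kt, kh, kw) kernel; returns a (T-kt+1, H-kh+1, W-kw+1) map."""
    T, H, W = volume.shape
    kt, kh, kw = kernel.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[t, y, x] = np.sum(volume[t:t+kt, y:y+kh, x:x+kw] * kernel)
    return out

def score(video, layers):
    """f(I; w): push the video through the multi-layer spatiotemporal
    filters recursively, then sum the top-layer responses over all
    filters, positions, and frames."""
    maps = [video]                              # layer 0: the raw video
    for kernels, biases in layers:              # one (kernels, biases) pair per layer
        new_maps = []
        for k, kernel in enumerate(kernels):    # kernel shape: (n_in, kt, kh, kw)
            resp = sum(conv3d_valid(maps[i], kernel[i])
                       for i in range(len(maps)))
            new_maps.append(relu(resp + biases[k]))
        maps = new_maps
    return float(sum(m.sum() for m in maps))    # aggregate over k, x, t
```

For example, a single layer with one all-ones 2×2×2 filter applied to a 4×4×4 video of ones gives 3·3·3 positions with response 8 each, so the score is 216.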

Learning by Maximum Likelihood
A stochastic gradient algorithm is used to train the model on the training video sequences. First, video sequences are synthesized from the learned model; then the difference between the synthesized and the training video sequences is used to update the filter parameters.
The log-likelihood $L(\theta)$ over the training video sequences $\{I_m,\, m = 1, \dots, M\}$ is defined by the adversarial ConvNet, which captures spatiotemporal patterns of different scales:

$$L(\theta) = \frac{1}{M} \sum_{m=1}^{M} \log p(I_m; \theta).$$

The ConvNet parameters $\theta$ can be trained by maximum likelihood. The gradient of the log-likelihood is

$$\frac{\partial L(\theta)}{\partial \theta} = \frac{1}{M} \sum_{m=1}^{M} \frac{\partial}{\partial \theta} f(I_m; \theta) - \mathrm{E}_{\theta}\!\left[\frac{\partial}{\partial \theta} f(I; \theta)\right].$$

The expectation is analytically intractable, so it must be approximated. The adversarial ConvNet can be sampled through Langevin dynamics, whose iterative step is as follows:

$$I_{\tau+1} = I_{\tau} - \frac{\delta^2}{2}\left[\frac{I_{\tau}}{\sigma^2} - \frac{\partial}{\partial I} f(I_{\tau}; \theta)\right] + \delta Z_{\tau},$$

where $\tau$ indexes the Langevin steps, $\delta$ is the step size, and $Z_{\tau} \sim \mathrm{N}(0, \mathrm{Id})$ is Gaussian noise. The expectation is then approximated by averaging over the synthesized examples.
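To make the learning loop concrete, here is a toy NumPy sketch of Langevin sampling combined with the maximum-likelihood parameter update. To keep the gradients trivial, it substitutes a linear scoring function $f(I; w) = \langle w, I \rangle$ for the ConvNet (so $\partial f/\partial I = w$ and $\partial f/\partial w = I$); the function names and hyperparameter values are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def langevin_step(I, w, sigma=1.0, delta=0.1):
    """One Langevin update: I <- I - (delta^2/2)(I/sigma^2 - df/dI) + delta*Z.
    With the toy linear score f(I; w) = <w, I>, the gradient df/dI is just w."""
    noise = rng.standard_normal(I.shape)
    return I - (delta**2 / 2.0) * (I / sigma**2 - w) + delta * noise

def learn(observed, w, n_iters=200, n_langevin=30, gamma=0.05):
    """Stochastic maximum-likelihood loop: synthesize from the current
    model by Langevin dynamics, then move the parameters along the
    difference between observed and synthesized statistics
    (for the toy score, dw f(I; w) = I)."""
    for _ in range(n_iters):
        I = rng.standard_normal(observed.shape[1:])   # restart from noise
        for _ in range(n_langevin):
            I = langevin_step(I, w)
        w = w + gamma * (observed.mean(axis=0) - I)   # gradient ascent step
    return w
```

With a short Langevin run the sampled expectation is only a rough estimate, so the fitted $w$ is noisy; in the full model both gradient terms are computed by back-propagation through the ConvNet rather than analytically.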

Experiments
In our experiments, the MATLAB code of MatConvNet [16] and MexConv3D [17] is used. We run on an Intel(R) Core(TM) i7-8850H CPU @ 2.60 GHz with 32 GB RAM. For comparison, we use the public DynTex dataset and run 800 iterations to train the model. The ConvNet parameters are set up as in [7].

Experiment 1: generating texture/object patterns. Experiments show that the method can learn and generate realistic natural image patterns. Figure 1 displays the training images, and Figure 2 shows the synthesized sequences corresponding to the training images. The results show that our method can learn from images effectively and can handle all kinds of textures, from random to structured.

Experiment 2: generating dynamic textures. Figure 3 shows the results. For each category, six frames of the training sequence are shown in the first row, and the corresponding frames generated by our technique are shown in the second and third rows. Figure 4 shows the results of the comparison. For each category, the first row is the initial sequence, and the other two rows are the sequences obtained by our algorithm and by Ref. [10], respectively.
It can be seen from the experiments that our model can adapt its structure and parameters so that the synthesized sequences reproduce the statistical characteristics captured by the model.

Conclusions
This paper studied the application of the adversarial ConvNet model to dynamic pattern synthesis. We define probability distributions in the adversarial ConvNet to capture spatiotemporal information and obtain optimal results by maximizing the target likelihood function. Experiments showed that this method can generate patterns with different styles and colors effectively.