Deep Convolutional Nets Learning Classification for Artistic Style Transfer

Style transfer takes two inputs: a style reference image, which contains the patterns of a famous painting, and a content input image that needs to be restyled. The two are blended together to produce a new image in which the input image is preserved in content but appears "painted" in the manner of the style image.


Introduction
A decade ago, when machine learning was an emerging application of artificial intelligence offering the ability to automate learning from prior experience without being explicitly programmed, its one assumed limitation was that a good computer program could never replace a human in creativity [1]. As exploration in the field grew, subfields such as deep learning emerged, which brought attention to replicating human creativity and the human process of recognizing objects or people [2]. One such problem that distinguished humans from machines was art: generating art follows no rules that could be used to replicate a person's imagination. This paper renders an input image in the style of well-known artworks [3]. This is conceptually close to texture transfer applied to style transfer. Today, a few existing systems replicate the art of famous painters; one of these methods is style transfer using a neural network [4].
A neural network can be defined as a circuit of neurons which simulates the behavior of a human brain. Mathematically, a neural network is a chain of functions which maps inputs to their respective outputs based on weights, which define the amount of influence one function has on another (where each function represents a neuron) [5]. A neural network comprises a large number of such functions organized into layers. One layer receives the input, and each subsequent layer gathers the output of the previous layers. The final layer generates the output of the system and is also known as the visible layer. As the layers other than the input and output layers are not directly accessible, they are known as hidden layers [6]. The transformation of style from one image to another can be framed as the texture transfer problem: the aim is to take an element from an input image while preserving the semantic content of the output image [7]. Several texture transfer methods have used texture synthesis functions in different ways to conserve the targeted output image. Features of the output image, such as intensity, are computed for effective texture transfer, and frequency-related texture data is combined with edge orientation data to preserve the target image. Style transfer is used to extract the content from the image before applying texture transfer. Hence, a standard procedure is to analyze image-independent representations in order to observe the semantic content, and style-related structures have been demonstrated to control a subset of natural image characterization [8].
Computer vision is used to extract the semantic content from images. Labelled information for tasks such as object recognition is used to extract high-quality image elements via general feature extraction from datasets and other visual elements [9]. The texture transformation algorithm comprises a texture formation technique with feature representation based on convolutional neural networks (CNNs). The style transfer technique minimizes and solves the resulting optimization problem in a neural network.
New images are constructed by applying feature extraction to the dataset images. A texture synthesis approach is used to enrich the deep image representation [10]. The main contributions of our proposed work are as follows:
(i) A deep learning approach is applied to real-time images after performing object detection.
(ii) The proposed methodology executes style transfer efficiently, where the captured depth image comes from the dataset.
(iii) Validation is performed using the proposed training methodology, which produces style transfer with reduced complexity.
(iv) Visual object recognition is applied to the feature space with the use of max-pooling layers.

Related Work
Style images and the recombination of images are separated by creating high-quality artistic images using neural representations from the neural algorithm; this uses an artificial intelligence technique [11] which presents a way to understand artistic images algorithmically. Adversarial networks are created by generating art-based learning styles that deviate from style standards; the system generates imagination-based art using art distributions and focuses on realistic texture rather than pixel accuracy [12]. The essential factors in neural style transfer provide control over position and color data across spatial scales, which improves results by permitting fine-grained control over style and also helps reduce common failure cases such as imaginary artifacts.
A fast generator combined with a feedforward network for initial style transfer has been proposed to obtain a styled output image, with previously unseen elements produced from the style image given as input throughout the inference period. The generator performs encoding and decoding by transferring deep features. Spatial components are used to measure the level of style that has been incorporated. The perceptual loss for every element is reduced to identify the properties of image styles from artistic images. Multidomain images are created using a mask element to implement stabilization and avoid collapse in real-time scenarios.
Selectively stylized images essentially increase the effectiveness of optimization-based style transfer [13]. Neural techniques combined with patch-based synthesis approaches have been implemented to attain high-resolution stylization. The neural methods are trained to increase the stylization level to a global standard and to predict the output for the relevant patch-based synthesis at different levels. The original artistic media is used to improve the fidelity of the stylized image, and the generated image is of high quality without increasing the pixel size. The visual quality is checked against related style transfer methods, and the response is better under feature extraction metrics [14]. Deep learning-based feature extraction has been implemented using methods in which the identification of high-level image features separates the style from the image content. The image style adaptation methodology reproduces the style characteristics of several paintings and pretrained styles, learning from other images as well. The incorporation of artificial intelligence concepts into art strengthens this improved method [15]. Neural learning techniques have been shown to be efficient for style transfer. A new image is synthesized to keep the high-level content of the input image together with the low-level features of the style image. Moreover, convolutional methods are used to implement feature extraction for style images, with semantic features identified from the images. Natural style elements are incorporated to lift the low-level features into high-level features using a machine learning approach [16]. Each painting may confine the integral enhancement for natural stylization. The style transfer technique is combined with direction-based style transfer for improved texture extraction, and an innovative direction-aware field loss is added to the synthesized images.
The loss function identifies whether any loss occurs at the time of style transfer, and the incorporated method is constructed to reduce that loss without any specific methodology. A simple interaction technique is developed to manage the generated fields and the direction of the texture in the generated images. The texture enhancement method is used to apply style transfer to the synthesized images [17]. Photographic style transfer shows improved output for spatially based semantic segmentation of the input. The segmentation is performed within the region of the input image, and the style reference images differ in spatial content. A spatial transformation technique is applied before the style transfer technique.
Cross-correlation-based feature maps are applied to compute an affine transformation of the generated image, implementing active style transfer for the semantically aligned regions. A pretrained CNN model is used to identify the reference images when shadows are removed from the produced images [18]. A semantics-based style transfer technique has been identified for solving semantics-based issues. Semantic matching is implemented to increase style transfer quality. Every image is segregated into various regions with semantic values and improved painting [19]. The divided regions are further arranged according to their semantic elucidation. The source region is trained using the learned elements to produce a stylized output. Semantic matches are identified within the regions, and guaranteed semantic matching is ascertained between the source and the target outputs. Semantic gaps are identified when dividing the regions based on semantic values, for real paintings as well as for photographs.
A domain adaptation method is developed to decrease the semantic gap in regional segregation [20]. Synthetic images are widely used in real-time applications, with learning and training models developed to minimize the cost of resources and human effort. Several differences in real-time characteristics have been identified between original images and synthetic images [21]. Style transfer is one solution to minimize the gap between these kinds of images. Indoor synthetic images are converted into improved images by minimizing lighting influences; hence, the content data of the real image is combined with the style information. The convergence speed is adapted to identify real images in complex situations [22]. Visual artworks have been applied in a photo-based style transfer method that can produce realistic images. The outcome of this technique helps ordinary users extend style transfer to images affected by various real-time issues. An autonomous sky segmentation approach is used to separate the input image from the sky background. The background colors are characterized by the sky segmentation. A natural color transformation technique is adopted, together with the sky background and a correction method, to guarantee quality output [23]. The style transfer approach is also used in the field of medical imaging. CNNs and combined deep learning approaches are implemented in computer vision tasks, but a CNN model needs a huge amount of data, which is very difficult to obtain in medical image processing. For this problem, image generation is used to augment computer vision [24]. A new engine is developed to exploit the network to confine the synthetic information. The style transfer method is used to enhance visual realism in a public dataset combined with semantic features using a CNN methodology [25].
Optical images are applied with deep learning to underwater image data to reduce the bottleneck, and the detection of objects in real-time sonar images is used for network training [26]. Controllable 3D artistic face modeling [27] enables a face geometry space and a face texture space based on a 3D face dataset; the experiment is carried out in real time without GPU acceleration to achieve different cartoon characters. A deconstructed integer function [28] is applied to obtain attributes such as biomorphism, beauty, and symmetry, and random geometric graphs enable creative artistic composition. An algorithm is introduced with a new outline image [29] extracted from the content image. Variation regularization is applied to reduce noise and smooth the boundary region, and an outline loss function is applied to the outline image; the results show better-designed clothing shapes that are preserved well. Virtual space technology [30] enables an immersive experience that analyzes the multisensory and multitechnical spatial art style transformation form, and the results show a better experience of art style transfer. A neural style transfer has been constructed with several semantic representations and a dual semantic loss, maintained with particular values for the stylized outputs of each technique over the computed content images [31]. A color cast is established according to the illumination modification and the temperature for increasing productivity. A color calibration technique is used to transfer the exact color with semantic representation, and a global attention functionality removes the color cast from the input image for style transfer [32].
A neural style transfer technique has been implemented to convert a portrait image into a specific realistic image with some style, using a pixel motion parameter with color displacement from specific frames in the semantic representation [33]. Optimization problems can also be solved through metaheuristic techniques; the torus walk bat algorithm and the modified bat algorithm are used to enhance local search capacity ahead of the standard process [34, 35].

Proposed Method
The proposed approach applies semantically meaningful style at a global level without user interaction; low-level elements are removed, and color-related information is recovered to keep fidelity without affecting the originality of the content.
The proposed system is constructed to diminish computational complexity while producing full-resolution images. Exploiting the previously used style transfer technique enhances professionalism. Image analogies are minimized by using the neural style transfer technique. VGG16 is used as an efficiently pretrained convolutional neural network. Dissimilar colors are mixed to produce a single region and are used to perform style transfer in semantically dissimilar regions. The max-pooling operation enhances the gradient flow through the images. An important element is implemented for art generation, which relates to the creativity of art in real-time scenarios.
VGG16, a pretrained neural network, is used to obtain content and style representations from its layers. Figure 1 represents the block diagram for art generation using the neural algorithm with VGG16, which has 16 weight layers (13 convolutional and 3 fully connected) and 5 pooling layers. When the input image and style image are passed through the network, the output image is initialized to the content image. The style representation is extracted at the initial layers of the network, as these extract pixel-level features; deeper layers extract the content of the image. Once the style and content representations are extracted, the output is generated by reducing the losses between them. The generated output image combines the content representation of the input image with the style representation of the style image. Convolutional neural networks are widely used in image processing. They are made up of layers in which every neuron receives an input, computes a dot product, and optionally applies a nonlinearity. Unlike an ordinary neural network, the neurons in the layers of a CNN are arranged in 3 dimensions: height, breadth, and depth, where depth refers to the activation volume, not the depth of the network.
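The iterative generation described above can be illustrated with a deliberately simplified numerical sketch. This is not the paper's exact procedure: real style transfer computes losses on VGG16 feature maps, whereas here both losses are plain squared distances on small arrays so that the gradient is available in closed form. The weights alpha and beta are assumed values for illustration.

```python
import numpy as np

# Toy sketch: the output is initialized to the "content" array and updated by
# gradient descent so that a weighted sum of a content loss and a style loss
# decreases, mirroring the loss-reduction step described in the text.
rng = np.random.default_rng(0)
content = rng.random((8, 8))   # stand-in for content-layer activations
style = rng.random((8, 8))     # stand-in for style-layer statistics
alpha, beta = 1.0, 0.5         # content and style weights (assumed values)

def total_loss(x):
    return alpha * np.sum((x - content) ** 2) + beta * np.sum((x - style) ** 2)

x = content.copy()             # initialize the output to the content image
lr = 0.1
losses = [total_loss(x)]
for _ in range(100):
    # Closed-form gradient of the quadratic loss above.
    grad = 2 * alpha * (x - content) + 2 * beta * (x - style)
    x -= lr * grad
    losses.append(total_loss(x))
```

For this quadratic loss the iterates converge to the weighted average (alpha*content + beta*style)/(alpha + beta), which makes the role of the two weights explicit: increasing beta pulls the result toward the style statistics.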
A CNN consists of layers of small computational units which process visual data hierarchically in a feedforward approach. It is mainly built of convolutional layers; these layers are collections of image filters that each extract a particular feature from the input, where the feature map is the targeted output. When CNNs are trained to perform image processing, they construct a representation of the image that carries the object data, with the processing of the input image transformed into pixel-based representations for producing quality images. Higher layers capture the high-level content and objects rather than the exact pixel values of the original image; the lower layers simply reconstruct the pixel values of the input image.
The convolution layer is the central building block of a CNN; it mathematically combines two sets of data. It is applied to the input data with a convolution filter to create a feature map. A feature space is framed to capture texture data, and it is used to obtain the style of the image. The feature space is built from the filter responses in every layer and contains the correlations between the different filter representations. By incorporating the feature correlations of several layers, a multiscale representation of the input image captures its texture components. The operation is computed by sliding the filter across the input image. At each location, an elementwise matrix multiplication is performed, and the summed result forms one entry of the feature map. The general scheme of the convolution operation performed by a standard filter is shown in Figure 2. Figure 3 demonstrates generating a feature map from a convolutional input and a filter; sliding the filter across the whole input produces the convolution result as a feature map. In practice, convolution operations are computed in 3 dimensions, with the image represented as a 3-dimensional matrix of depth, height, and width. A convolution filter has a particular width and height, such as 5 × 5 or 3 × 3, and it always covers the full depth of the input. The feature maps from different kinds of filters are stacked to produce the final output of the convolution layer. The convolution filter must fit within the input; if equal dimensions must be maintained, padding is used to enclose the input with zeros.
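The sliding-filter operation described above can be sketched in a few lines. This is a minimal single-channel illustration (stride 1, no padding), not the paper's implementation: at each location the filter and the underlying input patch are multiplied elementwise and summed, producing one entry of the feature map.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (stride 1, no padding): slide the kernel over the
    image and record the elementwise product-sum at each location."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_kernel = np.array([[1.0, -1.0],
                        [1.0, -1.0]])      # simple vertical-difference filter
feature_map = conv2d(image, edge_kernel)   # 4x4 input, 2x2 filter -> 3x3 map
```

Because the example image increases by 1 along each row, this difference filter responds with the same constant everywhere, which makes it easy to check the arithmetic by hand.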
In Figure 4, there is an input image with 32 × 32 × 3 dimensions and a filter with 5 × 5 × 3 dimensions. The depth of the filter matches the depth of the input; the value for both is 3. Whenever the filter is located at a specific position, it covers a volume of the input and produces one output value using the convolution operation. If 10 dissimilar filters are used, 10 different feature maps are computed; stacking them together along the depth dimension yields a convolution layer output of size 32 × 32 × 10. The convolution operation for every filter is computed independently, and the results are disjoint. Nonlinearity is achieved by passing the sum of the weighted inputs through an activation function; the convolution output is passed through the nonsaturating activation function called ReLU. The final feature maps of the network are those obtained after the ReLU operation. The matrix multiplication and summation are performed in 3 dimensions as shown in Figure 5, but each result is a scalar, giving a feature map of size 32 × 32 × 10. Multiple filters are used while keeping the computational cost down; only a specific number of filters is used at a particular time for learning. The filters are mapped onto the input image and learn every part of it, with a small filter sliding from one end to the other to produce the output image. Each convolution produces an individual value regardless of the style of the filter, and the convolutions are joined to form the output across the total number of filters.
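The volume convolution from Figure 4 can be sketched directly. This is an illustrative sketch, not the paper's code: each 5 × 5 × 3 filter spans the full input depth, so one filter position yields a single scalar, and stacking the maps of 10 filters gives an output volume of depth 10. "Same" padding of 2 is assumed here to keep the 32 × 32 spatial size, matching the 32 × 32 × 10 output stated in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((32, 32, 3))          # 32x32 RGB input volume
filters = rng.random((10, 5, 5, 3))  # ten 5x5 filters, each spanning depth 3

pad = 2                              # (filter_size - 1) // 2 for "same" output
xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
out = np.zeros((32, 32, 10))
for k in range(10):                  # one feature map per filter
    for i in range(32):
        for j in range(32):
            patch = xp[i:i + 5, j:j + 5, :]            # 5x5x3 input volume
            out[i, j, k] = np.sum(patch * filters[k])  # 3D dot product -> scalar
relu_out = np.maximum(out, 0)        # ReLU nonlinearity after the convolution
```

The loop form is deliberately explicit to show where the scalar per position comes from; a real framework fuses these loops into one optimized operation.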
The stride specifies how far the convolution filter moves at every step. By default the stride value is 1, as demonstrated in Figure 6, and a stride of 1 is the recommended starting point. Padding is frequently used in convolution layers but not in pooling layers. Larger strides reduce the overlap between output fields and generate a much smaller output feature map. From Figure 7, it can be observed that the feature map is smaller than the input when the stride is increased to 2. After the convolution and pooling layers, a few fully connected layers are included to conclude the CNN architecture. The output of the convolution and pooling layers is 3-dimensional, but a fully connected layer expects a 1-dimensional vector. Hence, the final output of the pooling layer is flattened to a vector, which becomes the input to the fully connected layer; flattening rearranges the 3-dimensional volume of numbers into a 1-dimensional vector. Figure 8 represents padding in grey; the input can be surrounded with zeros. Padding is the concept of adding pixels around the image, where the padding value processed by the kernel is 0. The testing process uses the same dimensions, and the ratio of a given length to the highest length will affect accuracy. When the dimensionality of the feature map should equal that of the input, padding is normally used to hold the feature map size processed at every layer of the convolutional neural network.
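The stride, padding, and flattening relationships above follow a standard output-size formula, which the text uses implicitly but does not state: out = (W − F + 2P) / S + 1, where W is the input width, F the filter size, P the padding, and S the stride. A short sketch (with assumed example values) makes the effect of each parameter concrete.

```python
def output_size(w, f, p=0, s=1):
    """Spatial size of a convolution/pooling output along one dimension:
    (input - filter + 2*padding) // stride + 1."""
    return (w - f + 2 * p) // s + 1

# Stride 1 with padding 2 preserves a 32-wide input under a 5-wide filter,
# as in the 32x32x10 example above:
same = output_size(32, 5, p=2, s=1)      # stays 32
# Stride 2 shrinks the feature map, as observed in Figure 7:
strided = output_size(32, 5, p=2, s=2)   # roughly halves the width
# Flattening a hypothetical 4x4x10 pooling output for a fully connected layer:
flat = 4 * 4 * 10                        # length of the 1-dimensional vector
```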
Pooling layers downsample every feature map independently, reducing the height and width while keeping the depth intact. Pooling is performed after the convolution operation to diminish the dimensionality, which reduces the number of parameters and thereby shortens training time and limits overfitting. A pooling layer slides a window across its input and takes a single value from each window, where the window size and stride are specified just as for a convolution. The most common type of pooling is max-pooling, which takes the maximum value in the window, as shown in Figure 9.
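The max-pooling step in Figure 9 can be sketched as follows. This is an illustrative single-channel sketch with assumed example values: a 2 × 2 window with stride 2 keeps only the largest value per window, halving height and width while leaving depth unchanged.

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Slide a size x size window with the given stride and keep the maximum
    value from each window."""
    h = (fmap.shape[0] - size) // stride + 1
    w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = fmap[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fmap = np.array([[1., 3., 2., 1.],
                 [4., 6., 5., 0.],
                 [7., 2., 9., 8.],
                 [1., 0., 3., 4.]])
pooled = max_pool(fmap)  # 4x4 input -> 2x2 output
```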
From Figure 10, the window and stride configuration reduces the pooled feature map to a minimal size. The VGG network was trained to perform object recognition with a feature space constructed from convolutional and max-pooling layers. The network is positioned to improve the scaling and the weighting of the activated filters within the images. With this arrangement of the VGG network for producing better output, the activation functions are generated to yield more efficient feature maps. The model contains fully connected layers and performs the pooling operation to increase efficiency.
The CNN learns styles through its parameters; training minimizes the restructuring loss Loss_R for the targeted output image (O) produced from an input image (I) (equation (1)). The perceptual loss Loss_P within the style branch combines the component losses (equation (2)), where St_i is the styled image, Loss_st is the restructuring loss for style, Loss_N is the restructuring loss for the normal, and Loss_in is the restructuring loss for intensity. The restructuring losses for style (Loss_st) and for intensity (Loss_in) are computed using equations (3) and (4), respectively. The complexity is analyzed in the network, which uses nonlinear values to fix the layer positions. The input image a→ is processed at every layer of the CNN through the filter responses to the image. A layer is mapped to its individual filters, producing a feature map for each filter matrix and identifying the position of the particular layer. The style representation is obtained from the input image by constructing a feature space that captures the texture data. The feature space is built from the filter responses in the layers; it contains the correlations between the different filter responses, taken over the spatial extent of the feature maps. These correlations form the generalized (Gram) matrix Ge^l_ij, computed from the feature map Fe^l_ik as Ge^l_ij = Σ_k Fe^l_ik Fe^l_jk.
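The generalized matrix of filter correlations can be sketched directly from its definition. This is an illustrative sketch with assumed shapes, not the paper's code: the layer's feature maps are flattened into rows Fe (one row per filter), so entry (i, j) of the Gram matrix is the correlation Ge_ij = Σ_k Fe_ik · Fe_jk; taking the squared difference between the Gram matrices of two images is one common (assumed) form of the style restructuring loss.

```python
import numpy as np

def gram_matrix(features):
    """features: (num_filters, spatial_size) matrix of flattened feature maps.
    Returns the (num_filters, num_filters) matrix of filter correlations,
    G[i, j] = sum_k features[i, k] * features[j, k]."""
    return features @ features.T

rng = np.random.default_rng(2)
fe = rng.random((4, 9))          # 4 filters, 3x3 feature maps flattened to 9
ge = gram_matrix(fe)             # 4x4 correlation matrix for this layer

# Assumed form of a style loss: squared difference between the Gram matrix of
# the generated image and that of the style target at the same layer.
target = gram_matrix(rng.random((4, 9)))
loss_st = np.sum((ge - target) ** 2)
```

Because the Gram matrix discards spatial positions and keeps only which filters co-activate, it captures texture statistics rather than layout, which is exactly why it serves as the style representation.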

Results and Discussion
The experiments were carried out using MATLAB, and the WikiArt dataset was used for the performance evaluation. Style classification is performed using style ambiguity. The proposed methodology is compared with the related methods Deeplab [1], CAN [12], and SegEM [23], and the performance metrics used for these experiments are processing time, restructuring loss, and accuracy. Figure 11 shows the accuracy (%) of successfully performing style transfer; the proposed methodology improves accuracy compared to the related methods. Figure 12 shows the runtime measured for style transfer, and the results illustrate that the proposed methodology has a reduced runtime compared with the other methods. The restructuring loss is one of the primary parameters for increasing the functionality of the proposed methodology. Figure 13 illustrates that the proposed methodology has a reduced restructuring loss percentage compared with the related methodologies. The curve plots the performance over time as produced by the machine learning approach.
Our proposed method produces significantly better results than the neural network-based baseline technique. Large-scale artefacts are generated using the proposed approach. The produced result is of high resolution and looks the same as the input image after successfully implementing the style transfer. Eliminating the style ambiguity loss improves the accuracy of the style transfer. Our proposed system has been shown to produce new artefacts without affecting the resolution of the images.

Conclusion
The images generated through style transfer depend on the number of iterations. Images generated with few iterations show more style features, while images generated with many iterations show more content features. This traces how humans produce and recognize artistic imagery, giving an algorithmic understanding of the process. In conclusion, a pretrained neural network can be used not only for image recognition but also for painting. The feature space represents the higher levels of the input image through the VGG16 neural network for replicating the task, while the input image undergoes style transfer to capture the depth image. The validation process is used for producing the style transfer with a reduced amount of complexity, and max-pooling layers are incorporated into the feature space for visual object recognition.

Data Availability
Data will be made available upon request.

Disclosure
This research does not involve any human or animal participation.

Conflicts of Interest
The authors declare that they do not have any conflicts of interest.

Authors' Contributions
R. Dinesh Kumar contributed to writing original draft, writing review and editing, conceptualization, data curation, and validation. E. Golden Julie contributed to conceptualization, formal analysis, and supervision. Y. Harold Robinson contributed to conceptualization, formal analysis, writing original draft, writing review and editing, and supervision. S. Vimal contributed to conceptualization, writing original draft, formal analysis, and supervision. Gaurav Dhiman contributed to conceptualization, writing original draft, formal analysis, and supervision. Murugesh Veerasamy contributed to conceptualization, formal analysis, writing original draft, writing review and editing, and supervision. All authors have checked and agreed to the submission.