ANALYSIS OF SENTIMENTAL IMAGES USING DEEP LEARNING APPROACH

Deep learning, also known as a universal learning approach, is a kind of machine learning used to carry out classification tasks directly from media such as images, text or sound. This paper presents a performance analysis of three pre-trained deep learning networks for the classification of images for sentiment analysis. The pre-defined convolutional neural networks (CNNs) used are AlexNet, ResNet50 and VGG16, each run with different numbers of epochs. These networks are pre-trained on a Twitter dataset. We focus on the structure of emotions inferred by our model, compare it with what has been proposed in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images, both photographs and memes, on social networks. The network architectures are analyzed using several measures, including accuracy, precision, recall and F1-score. In the experiment, AlexNet gives better results in terms of precision than the other two networks.


INTRODUCTION
Online social networks have become an indispensable part of our everyday life. Users share a large amount of textual and visual content to convey their emotions and sentiments.
This content reflects the emotions and behaviors of billions of individuals throughout the world.
Social networks offer various kinds of services for their users to communicate and exchange information. Users use these services to share events in their lives, to express opinions on various issues, and to show care and support towards friends and society. Analyzing this user-generated content can help understand and predict user behavior. Knowledge gained from such systems can benefit several applications, such as predictive modeling, product and service recommender systems, and online marketing.
Researchers have studied this trend, and many studies have analyzed sentiment and mined opinions from the textual content of social networks.
Recently, visual content has become more popular than textual content among the users of social networks such as Facebook, Instagram, SnapChat, Flickr and Twitter.
Statuses or posts with visual content often contain a short textual description or no text at all. Thus, the visual features express a large part of a person's emotion or opinion in this kind of content. In addition, images can overcome language barriers and are easier to understand. Fig. 1 shows some image tweets collected from Twitter in which various kinds of emotions are expressed [13]. While there is a large amount of work on analyzing the sentiment of textual content, research on visual sentiment analysis is still at an early stage, because analyzing sentiment from images is challenging for several reasons.
While object recognition is generally well defined, image sentiment analysis is more abstract in nature. Visual sentiment analysis involves the ability to recognize objects, scenes, actions and their emotional context. Producing hand-crafted features from images for predicting sentiment requires a lot of human effort and time. On the other hand, supervised algorithms need a huge volume of labeled training data, which is hard to collect for images of various domains. As a result, the emotional aspects of images are fairly neglected compared with other computer vision tasks such as object recognition, detection and tracking.
5476 G. HEREN CHELLAM, V. ROSELINE

Fig. 1 Sample Tweet images from Twitter
Deep learning is sometimes called universal learning because it can be applied to almost any application domain. Deep learning does not require features to be designed in advance. Features that are optimal for the task at hand are learned automatically. As a result, robustness to the natural variations in the data is also learned automatically. The same deep learning approach can be used in different applications or with different data types. This approach is often called transfer learning.
Moreover, this approach is helpful when the problem does not have sufficient available data. The deep learning approach is also highly scalable. There is a major initiative at Lawrence Livermore National Laboratory (LLNL) to develop frameworks for networks of this kind.

RELATED WORKS
Sentiment analysis on text is a well-developed research area in both computer science and psychology, and it has been used to answer psychological questions. However, researchers have cautioned that sentiment analysis focuses on the positive or negative sentiment expressed by a piece of text, rather than on the underlying emotional state of the person who wrote it [2], and thus is not necessarily a reliable measure of latent emotion. In our earlier contribution to text mining for sentiment analysis, PS-POS was used for text extraction and a Bi-LSTM deep learning model was used for sentiment classification, achieving an accuracy of 93.05% [1].
Recently, there has been increased interest from various research communities in understanding the emotional response of the viewer during interaction with social media. A psychological study on the effect of colors on emotions, based on the Pleasure, Arousal and Dominance model, shows that brighter colors are more pleasant, less arousing, and induce less dominance than darker colors [3]. In [4], researchers used factor analysis to investigate how eleven emotion scales are related to three color emotion factors (i.e., color activity, color weight and color heat) of single colors, which shows that there is stability in the way people perceive colors. Researchers have also done a great deal of work to tackle this problem computationally. In [5], Support Vector Machines were used to estimate the local image statistics. Sartori et al. [6] proposed using both visual and text information in a joint learning model for emotion recognition in abstract paintings. K. He et al. [2] used a multi-task learning approach for painting style analysis. These models are all traditional statistical models and do not apply deep neural networks. Subsequently, higher-level visual semantics such as image aesthetic analysis [7] and visual sentiment analysis [8] have become increasingly tractable. You et al. [9] used a CNN to learn features that are useful for visual analysis.

METHODOLOGY
The schematic outline of the methodology is shown in Fig. 2. First, all the images are preprocessed. Next, feature extraction and classification are performed using pre-trained CNN architectures, namely AlexNet, VGG16 and ResNet50.
The performance analysis of all networks is detailed in the later sections.

Preprocessing
The purpose of preprocessing is to enhance the image to the required level. Data augmentation is the process by which the total number of images can be increased manifold. It is basically

CNN Architecture
The CNN, one of the deep learning architectures, belongs to the class of feed-forward neural networks [10]. The two significant steps involved are learning features and classifying data, performed by the input layer, the hidden layers and the output layer. The images with pre-defined size are given to the input layer. The output size $O$ of a convolution layer is given by Eqn 1,
$$O = \frac{W - K + 2P}{S} + 1 \qquad (1)$$
where $W$ denotes the input height/length, $K$ is the filter size, $P$ is the amount of zero padding and $S$ is the stride. As the number of convolution layers increases, more complex features can be learned. Batch normalization, applied after the convolution layer, serves as regularization. The activation function used is ReLU (rectified linear unit). Eqn 2 gives the relation for ReLU.
$$f(x) = \max(0, x) \qquad (2)$$
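The two relations described above can be checked with a short calculation. The sketch below assumes the standard output-size relation (W - K + 2P)/S + 1 and the usual definition of ReLU as max(0, x); the function names and the numeric values are illustrative assumptions, not settings reported in this paper.

```python
def conv_output_size(w: int, k: int, p: int, s: int) -> int:
    """Spatial output size of a convolution layer: (W - K + 2P) / S + 1.

    w: input height/length, k: filter size, p: zero padding, s: stride.
    """
    if s <= 0:
        raise ValueError("stride must be positive")
    # Floor division covers the case where the stride does not divide
    # the padded input evenly.
    return (w - k + 2 * p) // s + 1


def relu(x: float) -> float:
    """Rectified linear unit: keeps positive values, maps the rest to 0."""
    return max(0.0, x)


if __name__ == "__main__":
    # Hypothetical example values (assumed, not taken from the paper):
    # a 227-pixel input, an 11x11 filter, no padding, stride 4.
    print(conv_output_size(227, 11, 0, 4))
    print([relu(v) for v in (-2.0, 0.0, 3.5)])
```

With a 3x3 filter, padding 1 and stride 1 the same function returns the input size unchanged, which is why that configuration is commonly used when the feature map size should be preserved.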
The max pooling operation performed here is for downsampling, which helps in reducing the size of the feature map. The output layer is where the classification is performed; it consists of a fully connected layer, a softmax layer and a classification layer. The input to the fully connected layer is given by the relation
$$y = Wx + b$$
Here the weight matrix $W$ is multiplied with the input obtained from the hidden layer and is added to a bias $b$, where $y$ is the output class obtained and $x$ is the feature vector. The softmax layer converts the raw values of the output classes into normalized scores. The result of prediction is a probability value of class occurrence. Finally, the classification layer provides the class label according to the probability. The following section explains the three pre-trained CNN architectures [15].
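The output stage described above (fully connected layer, softmax, class label) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the weight matrix is multiplied with the feature vector and a bias is added, and the softmax is the usual exponential normalization. The weights, the feature vector and the three label names are hypothetical and do not come from the paper.

```python
import math


def fully_connected(weights, x, bias):
    """Compute y = W x + b, with W given as a list of rows."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weights, bias)
    ]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(probs, labels):
    """Return the label whose probability is highest."""
    return labels[probs.index(max(probs))]


if __name__ == "__main__":
    # Hypothetical toy values, chosen only to make the arithmetic visible.
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    x = [1.0, 2.0]
    b = [0.0, 0.0, 0.5]
    labels = ["negative", "neutral", "positive"]  # assumed class names

    y = fully_connected(W, x, b)
    p = softmax(y)
    print(y, p, classify(p, labels))
```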

i) ResNet50
The basic block diagram of the ResNet50 architecture is depicted in Fig. 3. ResNet50 is a traditional feed-forward network with residual connections. The output of a residual layer can be defined based on the output of the $(l-1)$th layer, which comes from the previous layer and is denoted $x_{l-1}$. $F(x_{l-1})$ is the output after performing various operations on $x_{l-1}$ (e.g. convolution with filters of various sizes, or Batch Normalization (BN) followed by an activation function such as ReLU). The final output of the residual unit is $x_l$, which can be defined with the following equation:
$$x_l = F(x_{l-1}) + x_{l-1}$$
The residual network consists of several basic residual blocks. However, the operations in the residual block can be varied depending on the particular architecture of the residual network [9].
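The residual unit described above adds the input of the block back onto the output of its internal operations, that is, output = F(x_prev) + x_prev. The sketch below shows only that skip connection on a plain list of numbers. The function `toy_f` is a hypothetical stand-in for the convolution, batch normalization and ReLU operations inside a real ResNet50 block and is not the paper's implementation.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def toy_f(x):
    """Hypothetical stand-in for F: scale by 0.5, then apply ReLU."""
    return relu([0.5 * v for v in x])


def residual_unit(x_prev, f):
    """Return F(x_prev) + x_prev, the output of one residual unit."""
    fx = f(x_prev)
    if len(fx) != len(x_prev):
        # The identity shortcut needs matching shapes.
        raise ValueError("F must preserve the input size")
    return [a + b for a, b in zip(fx, x_prev)]


if __name__ == "__main__":
    print(residual_unit([1.0, -2.0, 3.0], toy_f))
```

Because the input is carried forward unchanged, a unit whose F outputs zeros simply passes its input through, which is the property that makes very deep stacks of such units trainable.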

iii) AlexNet
The precise structure of AlexNet is shown in Fig. 5. Second, local response normalization (LRN) can be applied across the channels or feature maps (a neighborhood along the third dimension, but at a single pixel or location) [13].
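To make the across-channel form of LRN concrete, the sketch below normalizes the channel activations at one pixel location by the squared activations of neighboring channels. The formula and the default constants (n = 5, k = 2, alpha = 1e-4, beta = 0.75) follow the commonly cited formulation from the original AlexNet work and are an assumption here, since this paper does not state them.

```python
def lrn_across_channels(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Local response normalization across channels at a single location.

    a: list of activations, one per channel, at the same pixel.
    n: number of neighboring channels in the normalization window.
    k, alpha, beta: normalization constants (assumed defaults).
    """
    num_channels = len(a)
    half = n // 2
    out = []
    for i, a_i in enumerate(a):
        # Clip the window to the valid channel range.
        lo = max(0, i - half)
        hi = min(num_channels - 1, i + half)
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a_i / (k + alpha * sq_sum) ** beta)
    return out


if __name__ == "__main__":
    # Hypothetical activations for eight channels at one pixel.
    print(lrn_across_channels([0.0, 1.0, 5.0, 20.0, 3.0, 0.5, 0.0, 2.0]))
```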

Dataset
In this paper, we examine the performance of ResNet50, AlexNet and VGG16 using a Twitter dataset of 8288 images taken from different tweets. The evaluation is run for different numbers of epochs to identify the best approach among the three architectures. We used Keras with TensorFlow as the backend.

Metrics
The performance of all networks is compared using the metrics mentioned below, which are determined from the confusion matrix. The performance metrics used here are accuracy, precision, recall and F1-score [12].

... and F1-score compared to the other architectures. Even in accuracy it is almost equal to VGG16. On comparison, the chart given in Fig. 6 shows that AlexNet is the best architecture of the three. Table I shows the comparison of the best-performing architecture with recent similar methodologies.
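As a rough illustration of how the four metrics named in this section follow from a confusion matrix, the sketch below uses the standard definitions for a two-class problem. The counts in the example are hypothetical and are not results from this paper; the function name is likewise an assumption.

```python
def metrics_from_confusion(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1-score from binary counts.

    tp/fp: true/false positives, fn/tn: false/true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to demonstrate the calculation.
    print(metrics_from_confusion(tp=40, fp=10, fn=20, tn=30))
```

For a problem with more than two sentiment classes, the same quantities would be computed per class from the rows and columns of the confusion matrix and then averaged.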

CONCLUSION
Our goal was to explore a core area of psychology, the study of emotion, using a large and novel social media dataset. This work concentrated on comparing the performance of AlexNet, ResNet50 and VGG16 on the task of analyzing the sentiment in Twitter images using different performance metrics. The comparison of these architectures highlights their benefits: they do not need any tedious pre-processing, and they offer faster and more efficient training. The goal was to find the most suitable model. The AlexNet model with all three layers gave the best overall results, which is highly statistically significant and demonstrates the effectiveness of analyzing images with the combination of CNN and fine-tuning. In the future, we will evaluate our model on other image and text datasets that have been developed for psychological studies and investigate whether human judges are more or less accurate than our model. Finally, we will explore other psychological aspects of the structure of emotion, for example daily and day-of-week trends in emotion.

CONFLICT OF INTERESTS
The author(s) declare that there is no conflict of interests.