TGSL-Dependent Feature Selection for Boosting the Visual Sentiment Classification

The automatic recognition of emotions in still images is inherently more challenging than other visual recognition tasks, such as scene recognition, object classification and semantic image classification, as it involves a higher level of abstraction from the human cognition perspective. Symmetry can be found in many objects in nature and can be exploited for many purposes, such as object detection and recognition. Furthermore, rotating and flipping the image, operations grounded in symmetry, are employed when training the classifier to obtain the most accurate classification. Hence, there is a need to handle large intra-class variance, scalability and subjectivity effectively during recognition, and the task is inherently ambiguous, as an image can evoke multiple emotions. To address these issues, many existing works focus on improving the image representations, motivated by the observation that both global distributions and local image regions carry strong sentiment cues. In this research, three different pre-trained architectures are implemented, and their binary sentiment classification performance is examined on five widely used affective datasets. Moreover, the features from the pre-trained models are selected optimally using the proposed Teaching Gaining Sharing Learning (TGSL) algorithm, which is the major contribution of this research. Extensive experimental results on the five datasets demonstrate that the proposed visual sentiment analysis based on the TGSL algorithm with data augmentation achieves improved performance compared to all other conventional techniques. The proposed framework relies on pre-trained models and never utilizes any hand-crafted features, boosting the mean accuracy, sensitivity, and specificity to 99.11%, 99.31%, and 99.22%, respectively, on the Abstract dataset.


Introduction
Nowadays, affective computing draws significant attention from researchers, as it renders great service to research communities ranging from computer vision to psychology. Advanced computer vision and machine learning techniques automatically classify objects and scenes in images and videos but often lack an affective interpretation of such content. Automatic symmetry detection of objects is employed for emotion recognition [1], providing accurate and robust performance for the recognition system. Furthermore, symmetry is widely used for real-world emotion recognition combined with convolutional neural networks [2]. Initially, affective computing research was directed towards detecting and recognizing emotions through facial expression and gesture recognition. Sentiment analysis belongs to the active field of the affective computing research domain [3]. The objective of automatic visual sentiment analysis is to recognize the sentiments that are evoked by images. Sentiment and emotion are closely related entities, but they do not express the same thing. Sentiment refers to an opinion or mental attitude produced by the reflection of a negative or positive feeling. On the other hand, emotion is usually defined as a complicated, multidimensional, integrated feeling state that reflects humans' psychological and emotional changes.
Online visual content in social networks and websites plays a prominent role in users' day-to-day life. More significantly, users keep a record of their daily activities through different types of media. Moreover, due to the widespread popularity of social networks, internet users intend to express their views through various media types, such as videos and images. For instance, one can reveal a travel experience or any view on everyday events through these visual media. Hence, visual sentiment analysis comes into existence, enabling the automatic analysis of the sentiments in an image posted via social networks and other online media. Images and videos display strong sentiments, which strengthen the opinion portrayed in the content. Therefore, visual sentiment analysis benefits society in many ways, such as targeted marketing, opinion mining [4], biomedical studies (stress/pain monitoring), autism-related assistive technology, e-business, image captioning [5], image retrieval [3] and so on. In computer vision, studies insist on the modeling of generic visual concepts. In contrast, the study of the adjectives concerning visual sentiments remains an open question and has seemed impossible due to the huge affective gap existing between low- and high-level features [6]. Visual sentiment analysis is still at the development stage, and it remains a tedious analysis compared with text-based sentiment analysis [7].
In visual sentiment analysis, the semantics are usually hidden in the images, in contrast with text semantics, and this constitutes high-level visual semantics. However, it is worth noting that there is no high-level visual semantic dictionary similar to WordNet, a semantic dictionary for text analysis. Therefore, there is a need for high-level visual semantic ontology features instead of low-level visual features for acquiring the visual sentiments from images. In the present scenario, the prediction of visual sentiments is made based on features acquired from the entire image, inspired by psychology theory and the principles of art. Moreover, it is worth noting that the emotions evoked by an image arise both from its global appearance and from its local regions. Figure 1 shows example images for the positive and negative sentiments.
Symmetry 2021, 13, x FOR PEER REVIEW
Recently, with the tremendous growth of high-performance computing (HPC) resources, various deep approaches have been developed to analyze sentiment [8][9][10][11][12][13]. The superiority of machine-learning-based deep features over handcrafted features, such as color, texture, content and composition [6], has been observed in visual sentiment prediction. In comparison with conventional recognition tasks, visual sentiment analysis is highly challenging due to a higher level of subjectivity in the human recognition process. It is also quite challenging because images with different appearances can evoke the same emotions, and images with similar appearances can evoke different emotions.
Thus, interpreting visual information at the affective level is uncertain, and it is also hard to predict emotions directly from visual information.
In this research, visual sentiment analysis is the focus. The feature selection is carried out using the proposed TGSL algorithm, which inherits the Instructor characteristics and the mutual gaining characteristics of the Trainee to explore the highly relevant features among those extracted using the pre-trained models. Generally, visual sentiment analysis comprises two major stages: the first is feature extraction, and the second is sentiment classification. In this research, feature selection is placed as an intermediate step that boosts classification performance. The contributions of the research are listed in the following statements.

• To analyze the effectiveness of three dynamic frameworks, namely ResNet50 [14], VGG [15] and AlexNet [16]. In addition, the ImageNet [17] dataset is utilized for training the above models for the classification of visual sentiments.
• Three data augmentation techniques, namely scaling, rotation and intensity variation, are explored and analyzed for data augmentation in visual sentiment classification.
• A novel feature selection algorithm, TGSL, is proposed, developed by integrating the Instructor and Trainee characteristic features from the teaching-learning-based algorithm [18] and the gaining-sharing-based algorithm [10]. It is used to selectively reduce the huge feature vector produced by the pre-trained CNN models, which helps improve the classification of sentiments from images.
Following this introduction section, the rest of the paper is organized as Section 2 provides the literature review. Section 3 elaborates the proposed method of visual sentiment prediction. Experimental results and discussions are analyzed in Section 4. Finally, Section 5 concludes the work with future research scope.

Related Works
The progress and impressive performance of CNNs are drawing much research attention in computer vision. CNNs have recently been employed to understand the sentiment in visual content and have attracted significant attention. There are two main ways to handle visual sentiment prediction: dimensional models [19] and categorical models [20]. Categorical approaches are easier for a human to understand, and hence we aim to make categorical sentiment predictions. Categorical models are associated with one or more labels. In contrast, in dimensional models, the affective states are represented by the 3-D valence-arousal-control emotion space [21], the 2-D valence-arousal space [19,20] or the activity-weight-heat space [22]. However, the emotions evoked by an image can also be described using emotional adjectives. Here, adjectives are a common way to express human emotional feelings and impressions, or the quality of any facts or events.
Visual sentiment analysis can also be viewed as a high-level visual content classification problem. Several researchers have employed traditional generic image processing features: color [23], texture [24,25] and shape [26]. Yanulevskaya et al. [27] employed Wiccest and Gabor features, global/local RGB color histograms, color moments and color correlograms. Furthermore, the SIFT-based bag of features [28] and Gist features [28,29] were used, and the model was developed with machine learning to achieve sentiment categorization. Machajdik et al. [6] extracted eight different pixel-level hand-crafted features based on art and psychology theory to represent the affective content of images. Inspired by art theories, Wang et al. [30] extracted aesthetic features, such as color patterns, shapes and compositions, that are more suitable for affective prediction. Sentiment analyses were also carried out using mid-level features [29], which were suggested to bridge the affective gap between low-level features and high-level sentiments. These features were trained using an SVM classifier to find 102 mid-level attributes of an image to predict its sentiment. Zhao et al. [31] designed a multi-task hypergraph learning framework to predict emotion, which considers different factors that influence emotion, such as social context, temporal evolution and location influence. In recent years, the increase in the computational ability of GPUs and the great success of various CNN models, such as AlexNet, VGGNet and GoogLeNet, in many computer vision tasks have inspired attempts to develop deep architectures for the task of visual sentiment analysis. Ashima Yadav and Dinesh Kumar Vishwakarma [32] utilized a residual attention-based deep learning network (RA-DLA) to evaluate the emotion recognition process in still images. Powerful emotions are conveyed by the local regions of a specific image.
Hence, the residual attention strategy is applied to the local, emotion-rich region of the image. High detection accuracy is the prime advantage of the RA-DLA system. However, the RA-DLA system is not adaptable to multi-modal data for recognizing multi-modal emotions. Papiya Das et al. [33] utilized an SVM classifier to recognize emotions in still images; the SVM classifier in this method replaces the SoftMax layer of the DNN.
The hashtags or descriptive textual methods are utilized in the system to map the local image. The SVM classifier is not suitable for non-linear issues and fails to handle high-dimensional datasets. Jie Chen et al. [34] presented an active learning approach to attain effective recognition of emotion in images, in which a deep CNN is incorporated with active learning for the sentiment analysis process. The active learning methods utilize few standard samples to train the system, and the active learning system needs to be integrated with an uncertainty sampling method to obtain better detection accuracy in the complex sentiment analysis process. Xiong et al. [35] modeled a region-based convolutional neural network (R-CNNGSR) using sparse group regularization for image sentiment classification. In addition, some methods incorporate the model weights learned from a large-scale general dataset and further fine-tune the CNNs for visual sentiment prediction. Shaojing et al. [36] designed a DNN-based model to learn how image sentiment is related to human attention using the EMOd dataset. Kaikai Song et al. [37] presented a novel Sentiment Network with visual Attention (SentiNet-A), a multilayer saliency detection neural network framework, to estimate the attention distribution in the task of visual sentiment analysis. These methodologies have attained adequate performance on visual sentiment analysis. However, the complexity of visual content extraction hinders the performance of visual sentiment analysis, which still lags behind textual sentiment analysis.

From the above review, the following challenges are identified:
• The prime challenge encountered in the active learning framework lies in analyzing sentiments in complex environments. Hence, uncertainty sampling techniques need to be employed in the active learning technique to analyze the sentiments in a complex environment.
• The RA-DLA architecture faces a hectic challenge in integrating multi-modal data, such as speech, and analyzing the multi-modal data for sentiment classification.
• One of the prime challenges experienced in visual analysis using the SVM classifier is to restrain the issues related to multi-modal sentiment analysis. Therefore, a deep learning model is developed to restrain the issues related to multi-modal analysis [7].

Proposed Framework for Sentiment Analysis
The prime intention of visual sentiment analysis is to analyze the expressions of a person concerning the visualized image, which invokes emotions such as disgust, excitement and fear. The diagrammatic representation of the proposed framework is shown in Figure 2. The entire process of visual sentiment analysis is implemented in a four-step process: data augmentation, feature extraction, feature selection and image classification. First, the clipped images are generated in the data augmentation module using processes such as intensity variation, rotation and scaling of the input images, and are subjected to feature extraction using the pre-trained models VGG16, AlexNet and ResNet50. The features extracted with the pre-trained models are then subjected to feature selection.
The input visual image is first fed to the convolutional layers, in which filters are applied to the input image. The filters utilized in the convolutional layers summarize the existence of the recognized features and thus generate the feature maps. The first convolutional layer extracts low-level features, such as edges, lines and corners, and the higher layers extract higher-level features. The feature maps obtained from the convolutional layers are then subjected to the pooling layer, which sub-samples the feature maps by summarizing features in patches of the feature map. The output from the pooling layers is fed to the fully connected layers, in which weights are applied to each input to classify the image. The final features are extracted from the last (eighth) fully connected layer and considered the features for visual sentiment classification.

Data Augmentation
The effect of image augmentation has been studied in deep neural networks [38]. Deep learning models require many training samples to build a robust model, and it is hard to collect training samples with a large variety. Therefore, deep learning models developed for applications with small datasets carry the risk of overfitting. Data augmentation is a strategy that facilitates the addition of replicas of the original samples using techniques such as cropping, flipping and rotating, thus increasing the variation of the training samples. Applying image augmentation to the dataset helps develop a better-generalized model for both large and small datasets. Furthermore, the model performance can be improved when the pre-trained models are fine-tuned and trained on the augmented data. The commonly used data augmentation techniques are scaling, rotation and intensity variation; in this research, scaling, intensity variation and rotation are applied to analyze visual sentiment. After preparing the augmented database, pre-processing is employed: the individual images are resized to a suitable size and converted to RGB images for feature extraction.
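The three augmentation operations described above can be sketched with plain array manipulation. The sketch below is illustrative only (the paper does not specify the rotation angles, scale factors or intensity gains used); `rotate90`, `scale_nearest` and `vary_intensity` are hypothetical helper names, and the nearest-neighbour scaling and the 1.2 intensity gain are assumptions:

```python
import numpy as np

def rotate90(img, k=1):
    # Rotate the image by k * 90 degrees (a symmetry-preserving rotation).
    return np.rot90(img, k, axes=(0, 1))

def scale_nearest(img, factor):
    # Nearest-neighbour rescaling of the spatial dimensions.
    h, w = img.shape[:2]
    rows = (np.arange(int(h * factor)) / factor).astype(int)
    cols = (np.arange(int(w * factor)) / factor).astype(int)
    return img[rows][:, cols]

def vary_intensity(img, gain):
    # Multiply pixel intensities by `gain`, clipping to the valid [0, 255] range.
    return np.clip(img.astype(float) * gain, 0, 255).astype(np.uint8)

def augment(img):
    # Produce replicas of one image via rotation, scaling and intensity variation.
    return [rotate90(img, 1), scale_nearest(img, 0.5), vary_intensity(img, 1.2)]
```

For a 224 × 224 × 3 input, `augment` yields a rotated 224 × 224 × 3 replica, a down-scaled 112 × 112 × 3 replica and a brightened 224 × 224 × 3 replica, all of which would then be resized and fed to the pre-trained models.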

Feature Extraction through Pre-Trained Models
Feature extraction is one of the significant tasks to be implemented in visual sentiment analysis to reduce redundant data. The most relevant information is extracted from the massive datasets through the feature extraction process, which restrains the system's computational complexity. In this research, for effective feature extraction, features are extracted from the images through standard pre-trained models, namely ResNet50, AlexNet and the Visual Geometry Group (VGG) network. A brief description of each model is given below.

AlexNet
The AlexNet framework follows the working of a deep neural network, comprising eight layers: five convolutional layers and three fully connected layers. AlexNet excels over traditional CNNs, as its capability of extracting features is higher than that of other deep networks. Moreover, it is interesting to note that the features at different layers carry different levels of abstraction. Therefore, the features are extracted at the top-level layer, the output layer, where the dimension is [1 × 1000].

ResNet50
ResNet is the widely utilized training model among deep networks, having acquired first position in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and Common Objects in Context (COCO) competitions. ResNet is considered one of the innovative solutions to the vanishing gradient issue that results in performance saturation. ResNet stacks several residual units, and many different networks with deep layers have been developed: 18, 34, 50, 101, 152 and 1202. A deeper CNN stacked with more layers has the vanishing gradient problem, and this architecture overcomes it with the residual learning module. A single residual block of ResNet50 is demonstrated in Figure 4, and the overall architecture of ResNet50 is illustrated in Figure 5. ResNet-50 uses the bottleneck architecture; it has six units, a convolutional layer followed by four stacked residual blocks of 3 × 3 convolutional layers and a fully connected layer, which allows the network to be trained quickly. Finally, the features of the final fully connected layer are gathered to establish the feature vector, whose dimension is [1 × 1000] for an input image.
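The residual learning idea that makes such deep stacks trainable can be sketched in a few lines: each block learns a residual mapping F(x) and adds the identity shortcut, y = F(x) + x, so the block reduces to the identity when F is near zero. In the toy sketch below, a scalar multiplication stands in for the real block's stacked convolutions (an assumption for illustration, not the actual ResNet50 layers):

```python
import numpy as np

def residual_block(x, weight):
    # Toy residual unit: y = F(x) + x, where F is a stand-in for the
    # stacked conv -> batch-norm -> ReLU layers of a real bottleneck block.
    f_x = weight * x
    return f_x + x  # identity shortcut: gradients always flow through "+ x"

x = np.ones(4)
# With weight = 0, F(x) = 0 and the block is exactly the identity mapping,
# which is why adding more such blocks does not degrade the signal.
y = residual_block(x, weight=0.0)
```

The shortcut is what lets 50-, 101- and 152-layer variants train without the gradient vanishing through the depth.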


VGG
VGG is one of the advanced convolutional neural network frameworks, which focuses on enhancing the precision of the network. The VGG16 architecture comprises 13 convolutional layers and three fully connected layers. The input image is fixed at [224 × 224] and fed to the stack of convolutional layers, and small filters of size [3 × 3] are utilized in the VGG16 architecture as an alternative to large filters. The final fully connected layer features are derived as the feature vector, whose dimension is [1 × 1000], and the very last layer is the SoftMax layer. The architecture of VGG16 is depicted in Figure 6.
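The preference for stacked small filters can be made concrete with a quick calculation: two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field as a single 5 × 5 filter, but with fewer weights. The channel count C = 64 below is an illustrative assumption, not a figure from the paper:

```python
def receptive_field(kernel_sizes):
    # Effective receptive field of stacked convolutions (stride 1, no dilation):
    # each k x k layer grows the field by (k - 1).
    r = 1
    for k in kernel_sizes:
        r += k - 1
    return r

def conv_params(kernel_sizes, channels):
    # Weight count for a stack of k x k conv layers with `channels` in and out
    # (biases ignored for simplicity).
    return sum(k * k * channels * channels for k in kernel_sizes)

C = 64
# Two stacked 3x3 layers and one 5x5 layer both see a 5x5 neighbourhood,
# but the stacked version needs 73,728 weights instead of 102,400.
print(receptive_field([3, 3]), receptive_field([5]))
print(conv_params([3, 3], C), conv_params([5], C))
```

The stacked version also interposes an extra non-linearity between the two layers, which is the second half of VGG's argument for small filters.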

Proposed TGSL Algorithm for Feature Selection
Feature selection plays a significant role in classification problems, as it tackles the dimensionality reduction issue, thereby boosting the classification performance. The algorithm called TGSL optimization is proposed in this research to retain the highly significant features for classification. The major need for feature selection is to reduce the features, which minimizes the computational complexity associated with classification. In the proposed TGSL algorithm, the characteristic features of the instructor, the teaching ability and the upgrading tendency of the Trainee [10,18] are combined, such that the proposed algorithm holds a higher tendency to render a globally optimal solution.
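A common way to encode a candidate solution in population-based feature selection is a mask over the [1 × 1000] pre-trained feature vector; the text does not spell out TGSL's exact encoding, so the thresholded real-valued vector below is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_trainee(n_features=1000):
    # One trainee = one candidate solution: a real vector in [0, 1),
    # one entry per pre-trained CNN feature.
    return rng.random(n_features)

def select_features(feature_matrix, trainee, threshold=0.5):
    # Keep only the feature columns whose mask entry exceeds the threshold.
    mask = trainee > threshold
    return feature_matrix[:, mask]

features = rng.random((8, 1000))  # 8 images x 1000 pre-trained features
trainee = random_trainee()
subset = select_features(features, trainee)
# `subset` is fed to the classifier, and the resulting accuracy becomes
# the fitness of this trainee.
```

Under this encoding, the instructor and trainee phases simply move these real-valued vectors around, and the thresholding turns each position into a concrete feature subset to evaluate.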




Motivation
The TGSL algorithm is inspired by the learning and sharing characteristics of human beings. A trainee achieving better outcomes in terms of grades is considered the best observer and is upgraded to the instructor level. The upgraded Trainee then shares the knowledge gained during the learning phase with the other trainees. Thus, the proposed algorithm comprises both an instructor phase and a trainee phase. In the trainee phase, the Trainee obtains deep knowledge from the instructor and makes an effort to enhance his/her grades. Once the Trainee is the best observer, the Trainee is upgraded to the instructor level, and knowledge sharing occurs between the instructor and the other trainees. In the proposed approach, two ways are adopted by the Trainee to enhance knowledge: gaining knowledge from the instructor, and enhancing knowledge through mutual interaction with the other trainees. The interaction may be in the form of formal communication, group discussion or presentation. The stepwise procedure of the proposed TGSL algorithm is demonstrated in the following steps.
Step 1: Initialization of the Optimization Parameters: Initializing the parameters is the first step of the proposed TGSL algorithm. Let the population size be S, the number of generations be N_G and the design variables be Q_v, within the limits l_u and l_w. The optimization problem is characterized as min Z_0(P), where Z_0(P) is the objective function and P represents the vector of design variables, such that l_u ≤ P ≤ l_w.
Step 2: Population Initialization: Generating the random population is the next step after the initialization of parameters. The generated population is based on the population size and the design variables; in the proposed TGSL, the population size represents the number of Trainees under the instructor.
Step 3: Fitness Evaluation: The solution with the maximal fitness value is considered the fittest. In this phase, the instructor's solution is decided, for which the Trainee gains knowledge to become promoted as the instructor. The fitness is based on accuracy and is formulated as

Acc = (Tru_p + Tru_N) / (Pos_r + Neg_r),

where Acc represents the accuracy, Tru_p refers to the true positive count, Tru_N refers to the true negative count, Neg_r refers to the real negative cases and Pos_r refers to the real positive cases.
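Under that definition, the fitness is simply classification accuracy; a minimal sketch, with variable names mirroring the symbols above (1 for positive sentiment, 0 for negative):

```python
def fitness(y_true, y_pred):
    # Acc = (Tru_p + Tru_N) / (Pos_r + Neg_r): correct predictions over all cases.
    tru_p = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tru_n = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos_r = sum(1 for t in y_true if t == 1)
    neg_r = sum(1 for t in y_true if t == 0)
    return (tru_p + tru_n) / (pos_r + neg_r)

# 3 of 4 labels predicted correctly -> fitness 0.75.
print(fitness([1, 1, 0, 0], [1, 0, 0, 0]))
```

In the selection loop, `y_pred` would come from the classifier trained on the feature subset encoded by each trainee.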
Step 4: Instructor Phase: The main intention of the instructor is to enhance the mean score of the trainees. The best instructor always pays significant attention to the trainees and makes an effort to shift their grades. Hence, the mean of the trainees for each subject, μ_Q, is estimated in this phase. The best solution acts as the instructor at this level and is expressed as

P_I = P | Z_0(P) = min,    (4)

i.e., the instructor is the solution with the minimal objective value. As mentioned above, the instructor makes an effort to shift the mean of the trainees towards the instructor, yielding the new mean μ_Q^(t+1). The variation between the new and previous means is estimated in this phase to determine the Trainee's progress, and is expressed as

∂_Q = V (μ_Q^(t+1) − I μ_Q),

where ∂_Q represents the variation of the mean, V is a random variable and I is the instructing factor. The obtained difference is added to the existing solution to obtain the updated value:

P_Q^(t+1) = P_Q^t + ∂_Q.

If the fitness of the newly generated solution is better, the updated solution P_Q^(t+1) is stored as the best solution of the current instance; if P_Q^(t+1) holds lower fitness than the best solution of the previous iteration, the previous best solution is maintained and P_Q^(t+1) is discarded.
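The instructor phase over a real-valued population can be sketched compactly; the uniform random V and the integer instructing factor follow the classical teaching-learning-based scheme that TGSL builds on, and the exact constants here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def instructor_phase(population, fitness_values):
    # population: (S, Q_v) array of trainee solutions; higher fitness is better.
    mean = population.mean(axis=0)                      # mu_Q per design variable
    instructor = population[np.argmax(fitness_values)]  # best solution P_I
    V = rng.random(population.shape[1])                 # random variable V in [0, 1)
    I = rng.integers(1, 3)                              # instructing factor, 1 or 2
    delta = V * (instructor - I * mean)                 # instructor drives the mean shift
    return population + delta                           # candidates P^(t+1) = P^t + delta

pop = rng.random((5, 10))  # 5 trainees, 10 design variables
fit = rng.random(5)
new_pop = instructor_phase(pop, fit)
# Each candidate is then kept only if its fitness improves (the greedy
# acceptance rule described in the text).
```

The greedy acceptance step is what guarantees the population's best fitness never decreases across generations.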
Step 5: Trainee Phase: A trainee can increase his or her knowledge through two different strategies. The first is learning from the instructor; the second is knowledge gained through interaction among the trainees. Consider two trainees $T_i$ and $T_j$. The update equation is $P_Q^{t+1} = P_Q^{t} + V\,(T_i - T_j)$ if $Z(T_i) > Z(T_j)$, and $P_Q^{t+1} = P_Q^{t} + V\,(T_j - T_i)$ otherwise, where $Z(T_i)$ and $Z(T_j)$ denote the fitness measures of the trainees. In the mutual interaction, the senior trainees provide more effective solutions, as they have gained more knowledge than the junior trainees. Hence, the solutions shared at the senior level are utilized in the TGSL algorithm, and the gained function value is updated as $P_Q^{t+1} = P_Q^{t} + G_d\,V\,(T_j - T_i)$, where $G_d$ represents the knowledge factor. This equation highlights the mutual sharing of features: through the mutual interaction, trainees gain knowledge sooner and can become the instructor, and the tendency to avoid premature convergence to a local optimum is boosted. The integration of the mutual interaction with the trainee phase is discussed below.
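The trainee-phase interaction can be sketched similarly (again an illustrative NumPy sketch; the random partner choice and the handling of the knowledge factor $G_d$ are assumptions, not the paper's exact scheme):

```python
import numpy as np

def trainee_phase(pop, fitness, knowledge_factor=1.0, seed=0):
    """Pairwise interaction: each trainee moves towards a fitter partner and away
    from a less fit one, scaled by the knowledge (gaining-sharing) factor G_d."""
    rng = np.random.default_rng(seed)
    new_pop = pop.copy()
    for i in range(len(pop)):
        j = rng.choice([k for k in range(len(pop)) if k != i])  # random partner
        r = rng.random(pop.shape[1])
        if fitness[i] > fitness[j]:      # trainee i is fitter, moves away from j
            step = r * (pop[i] - pop[j])
        else:                             # partner is fitter, learn from it
            step = r * (pop[j] - pop[i])
        new_pop[i] = pop[i] + knowledge_factor * step
    return new_pop
```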
Case 1: $Z(T_i) < Z(T_j)$. In this phase, the fitness values of the trainees are compared, and the trainee with the better fitness becomes the instructor of the iteration, i.e., the best solution.
Hence, the overall solution update for the proposed TGSL follows the case analysis above. Step 6: Termination: The steps are repeated until the maximal number of iterations is reached, revealing the best solution, i.e., the most relevant features for classification.
The proposed TGSL method is summarized in Algorithm 1.

Algorithm 1: TGSL feature selection
1. Initialize the parameters and the population of trainees
2. Evaluate the fitness of the solutions
3. Instructor phase: estimate the mean $\mu_Q$ for each design variable
4. Update the mean with the shifted mean $\mu_Q^{t+1}$
5. Estimate the difference in the means $\partial_Q$
6. Find the updated solution $P_Q^{t+1}$
7. Trainee phase: update the solutions $P_Q^{t+1}$ through mutual interaction
8. Repeat steps 2-7 until the maximal iteration is reached

The solution dimensions obtained by the comparative methods AlexNet, ResNet50, VGG16, VGG with SA feature selection, VGG with GOA feature selection, ResNet with TLBO feature selection and the proposed TGSL algorithm are 1000, 1000, 1000, 500, 500, 500 and 1000, respectively. The maximum number of iterations taken by the comparative methods to attain the solution dimension is 50.
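Putting the phases together, a minimal TGSL-style loop might look like the following (an illustrative sketch, assuming a caller-supplied `fitness_fn`; the simplified trainee interaction and greedy keep-if-better replacement are assumptions, not the authors' MATLAB implementation):

```python
import numpy as np

def tgsl_select(fitness_fn, n_features=1000, pop_size=20, max_iter=50, seed=0):
    """TGSL-style loop: instructor phase, trainee interaction, greedy selection,
    repeated for max_iter iterations; returns the best weight vector found."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_features))
    fit = np.array([fitness_fn(p) for p in pop])
    for _ in range(max_iter):
        best = pop[np.argmax(fit)]
        mean = pop.mean(axis=0)
        I = rng.integers(1, 3)                                     # instructing factor
        cand = pop + rng.random(pop.shape) * (best - I * mean)     # instructor phase
        idx = rng.permutation(pop_size)
        cand += rng.random(pop.shape) * (pop[idx] - pop)           # trainee interaction
        cand_fit = np.array([fitness_fn(p) for p in cand])
        improved = cand_fit > fit                                  # keep only improvements
        pop[improved], fit[improved] = cand[improved], cand_fit[improved]
    return pop[np.argmax(fit)]
```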

Visual Sentiment Classification Using the Conventional SVM Classifier
The selected features from the proposed optimization are fed to the sentiment classification model, for which the SVM classifier [33] is utilized in this research. The SVM classifier is well suited to unstructured and semi-structured datasets and can solve complex problems through kernel functions. Furthermore, it reduces overfitting and is useful for high-dimensional data classification. The SVM is a supervised learning algorithm used to solve regression and classification problems. It is built from a set of labelled data, which is then used to classify unseen data. The SVM classifier estimates the closeness of the data: kernel functions map the features into a feature space in which similar data are assigned to the same class.
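A minimal sketch of the classification stage, using scikit-learn's `SVC` with an RBF kernel in place of the paper's MATLAB SVM (the toy Gaussian clusters below stand in for the selected deep features and are purely illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for the selected deep features: two separable clusters
# labelled positive (1) and negative (0) sentiment.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, scale=0.3, size=(40, 10))
X_neg = rng.normal(loc=-2.0, scale=0.3, size=(40, 10))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 40 + [0] * 40)

clf = SVC(kernel="rbf", gamma="scale")  # kernel maps features into a feature space
clf.fit(X, y)
```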

Experimental Results and Discussion
The results and discussion of the proposed TGSL method are elucidated in this section. Furthermore, a performance evaluation and a comparative analysis are carried out to demonstrate the superiority of the proposed TGSL method.

Datasets Description
This section describes the datasets considered for the experimental setup, on which the performance of the proposed work is evaluated against state-of-the-art approaches. The visual sentiment analysis system is implemented in MATLAB over the benchmark datasets Abstract [7], ArtPhoto [7], IAPSa [39], MART [40] and EmotionROI [41].
Abstract: includes 279 peer-rated abstract paintings consisting of color and texture without contextual content.
ArtPhoto: contains 806 artistic photographs by professional artists, collected by Machajdik and Hanbury from a photo-sharing site.
IAPSa: a subset of the IAPS dataset comprising 395 images chosen from the standard emotion-evoking image set, categorized into eight emotions.
MART: contains 500 abstract paintings by professional artists, obtained from the Museum of Modern and Contemporary Art of Trento and Rovereto (MART).
EmotionROI (EmoROI): has a total of 1980 images collected from Flickr in six discrete emotion categories (anger, disgust, fear, joy, sadness and surprise), with 330 image samples per category, along with valence-arousal scores. The underlying Emotion6 dataset provides an emotion distribution for each image, which gives a realistic appearance. Based on the valence score, anger, disgust, fear and sadness are considered negative emotions; the remaining emotions are treated as positive. Note that images with neutral sentiment are not included in the experiment.
All these datasets are highly imbalanced; hence, classifying images into positive and negative sentiment categories is challenging. The IAPSa, Abstract and ArtPhoto datasets have eight emotion classes: amusement, awe, contentment, excitement, anger, disgust, fear and sadness. The first four emotions are categorized as positive and the remaining four as negative. The original images in each dataset have different resolutions; the images were resized to fit the standard input sizes of the AlexNet, VGG and ResNet50 models.

Experimental Settings and Simulation Results
This section enumerates the simulation results of the proposed visual sentiment analysis. For effective sentiment analysis, images from the five different datasets that induce positive and negative sentiments are utilized in this research. Due to the inadequate number of training images per class, the proposed model uses data augmentation techniques to expand the training datasets. To validate the effectiveness of the data augmentation, the experiments were carried out in two approaches. The first approach uses the original datasets to train the CNN; the second first applies data augmentation to the original datasets and then uses the augmented datasets to train the same CNN. The two approaches are termed without augmentation and with augmentation, respectively. Applying image augmentation helps develop a better-generalized model for both large and small datasets. To further investigate the data augmentation approach, classifiers are built using 80% of the augmented images for training and the remaining 20% for testing. Figures 7 and 8 illustrate the data augmentation process for the positive and negative sentiments, respectively. In this research, the data augmentation operations scaling, rotation and intensity variation are applied to mitigate overfitting. Figure 7b depicts the intensity variation of the input image, in which the intensity of the input image is varied. Figure 7c demonstrates the rotation of the input image, in which the source image is rotated in the clockwise or anti-clockwise direction to change its position in the frame. Finally, Figure 7d demonstrates the scaling of the input image, in which the image is resized to the desired size.
For the effective analysis of the negative images, the augmentation operations intensity variation, rotation and scaling are applied to the input samples, which boosts classification performance.
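The three augmentation operations can be sketched with NumPy alone (an illustrative approximation, not the paper's pipeline: rotation is restricted to 90-degree multiples and scaling to a nearest-neighbour 2x upscale):

```python
import numpy as np

def augment(img, seed=0):
    """Three simple augmentations used to expand the training set:
    intensity variation, rotation, and scaling (nearest-neighbour resize)."""
    rng = np.random.default_rng(seed)
    intensity = np.clip(img * rng.uniform(0.7, 1.3), 0, 255)  # intensity variation
    rotated = np.rot90(img, k=rng.integers(1, 4))             # 90/180/270 deg rotation
    h, w = img.shape[:2]
    rows = np.arange(2 * h) * h // (2 * h)                    # 2x nearest-neighbour
    cols = np.arange(2 * w) * w // (2 * w)                    # upscale indices
    scaled = img[rows][:, cols]
    return intensity, rotated, scaled
```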

Performance Metrics
The performance metrics accuracy, sensitivity and specificity are evaluated for the visual sentiment analysis method. A brief description of each metric is given below.
Specificity: the ratio of true negatives to the number of real negative cases in the data, $Spe = Tru_N / Neg_r$, where $Spe$ denotes the specificity, $Tru_N$ the true negative value and $Neg_r$ the real negative cases.
Sensitivity: the ratio of true positives to the number of real positive cases, $Sen = Tru_p / Pos_r$, where $Sen$ denotes the sensitivity, $Tru_p$ the true positive value and $Pos_r$ the real positive cases.
Accuracy: the degree of closeness of the obtained quantity to the real quantity, as expressed in Equation (2).
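The three metrics can be computed directly from the confusion counts; a small sketch (the function name is illustrative):

```python
def sentiment_metrics(tru_p, tru_n, pos_r, neg_r):
    """Accuracy, sensitivity and specificity from the confusion counts."""
    return {
        "accuracy":    (tru_p + tru_n) / (pos_r + neg_r),
        "sensitivity": tru_p / pos_r,   # Sen = TP / real positives
        "specificity": tru_n / neg_r,   # Spe = TN / real negatives
    }
```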

Concurrent Methods
In order to show the effectiveness of feature fusion and the feasibility of the proposed method, two experiments were conducted on the five datasets, before and after concatenating the features extracted from the three pre-trained CNN models. Even though a CNN is a good feature extractor, there is still a need to explore how to find the relevant features within the multi-CNN feature dimensions. Therefore, the TGSL method was proposed to optimally discard the features that do not contribute much to classification. Finally, the proposed method is compared to three other methods for optimized feature selection: SA, GOA and TLBO.

Performance Evaluation
The performance evaluation of the proposed hybrid features + TGSL technique is elucidated in this section. Three metrics, namely accuracy, sensitivity and specificity, were utilized to determine the proposed technique's effectiveness. The performance evaluation is carried out in two circumstances, with and without augmentation, for the three different CNN models considered in this work. For any model to be effective, more training images are desirable; therefore, data augmentation is applied to expand the dataset by adding virtual images derived from each original, which upgrades the proposed model's performance. Table 1 shows the performance evaluation of the hybrid features + TGSL method in terms of training percentage and the 5-fold value for the with- and without-augmentation approaches. For 80% training, the maximum accuracies achieved by the proposed hybrid features + TGSL method with augmentation on the Abstract, ArtPhoto, IAPSa, MART and EmotionROI datasets are 99.11%, 85.5%, 97.25%, 87.75% and 80.75%, respectively, against 99.49%, 84.50%, 98.25%, 90.15% and 79.35%, respectively, without augmentation. The sensitivity values obtained by the proposed method with augmentation on the same datasets are 99.31%, 86.5%, 100%, 87.5% and 79.50%, respectively, which is better than without data augmentation. In terms of specificity, the maximum value obtained by the proposed method with augmentation is 99.22% on the Abstract dataset, whereas the maximum specificity without augmentation is 69.57%. Hence, from the table, the hybrid features + TGSL method with augmentation performs better than without augmentation in the training-percentage case.
From Table 1, it can be perceived that, for 5-fold cross-validation, the proposed hybrid features + TGSL method with augmentation achieved maximum accuracies of 99.49%, 84.50%, 98.25%, 90.15% and 79.35% on the datasets mentioned above, outperforming the without-augmentation approach. Moreover, it can be observed from these results that the hybrid features lead to a richer representation of images than features from an individual CNN, and that data augmentation effectively improves the accuracy of sentiment classification.

Comparative Analysis
As mentioned above, a comparative analysis is performed in this research to demonstrate the effectiveness of the proposed TGSL feature selection method. The comparative analysis is carried out in two different scenarios, illustrated in the following sub-sections, with the results summarized in Table 2. From the comparative analysis based on training percentage and k-fold, with and without data augmentation, the proposed method outperformed all the other existing methods on all the datasets. The receiver operating characteristic (ROC) curve is useful for validating the performance of a binary image classification task, obtained by computing the TPR and FPR at different thresholds. Figure 9 presents the comparative ROC analysis of the proposed TGSL technique against the other algorithms on the five datasets. It can be discerned from Figure 9a that, on the Abstract dataset, TPR values of 0, 0.9319, 0.9931, 1 and 1 are obtained by the proposed TGSL feature selection method at FPR values of 0, 0, 0.0042, 0.0261 and 1, respectively. At a low FPR, the proposed TGSL method achieves a TPR of 0.9931, while comparative methods such as AlexNet and VGG16 reach at most 0.975.
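The TPR/FPR pairs behind such a ROC curve can be computed by thresholding classifier scores; a small sketch (function and variable names are illustrative):

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """(FPR, TPR) pairs for a binary classifier at a set of score thresholds."""
    pts = []
    for t in thresholds:
        pred = scores >= t                                       # positive if score >= t
        tpr = np.sum(pred & (labels == 1)) / np.sum(labels == 1)  # true positive rate
        fpr = np.sum(pred & (labels == 0)) / np.sum(labels == 0)  # false positive rate
        pts.append((fpr, tpr))
    return pts
```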

Computational Complexity
The computational complexity of the proposed and existing methods is depicted in Table 4 below. The proposed method has a low computational cost of 150 s compared with the existing techniques.

Conclusions
A visual sentiment analysis method based on the TGSL feature selection algorithm is proposed in this research for effective sentiment classification. Three different pre-trained CNN architectural models are implemented on five widely-used affective datasets, IAPSa, Abstract, ArtPhoto, MART and EmoROI, to validate the performance of the proposed method. To obtain enough training images, three different data augmentation techniques were applied. Proper feature selection and classification are important in visual sentiment classification; this is accomplished by the proposed TGSL algorithm, which selects the most relevant features from the input data. Extensive experimental results on the five datasets demonstrate that the proposed visual sentiment analysis based on the TGSL algorithm with data augmentation has improved performance compared to all other conventional techniques. Furthermore, the proposed framework is effective and general because it does not require any hand-crafted feature extraction, and it achieves maximum accuracy values of 99.49%, 85.5%, 98.25%, 90.15% and 80.75% for the five datasets, respectively. In the future, the robustness of the model could be improved by fusing the features with different weights and devising an effective supervised classifier for visual sentiment classification.