A Hybrid Deep Learning Technique for Personality Trait Classification From Text

Recently, cognitive-based sentiment analysis, with an emphasis on the automatic detection of user behaviour such as personality traits from online social media text, has gained a lot of attention. However, most existing works are based on conventional techniques, which are not sufficient to obtain promising results. In this research work, we propose a hybrid deep learning-based model, namely a Convolutional Neural Network concatenated with Long Short-Term Memory, and show its effectiveness for the four personality dimensions, comprising eight trait poles (Introversion-Extroversion, Intuition-Sensing, Thinking-Feeling, Judging-Perceiving). We conducted experimental evaluations on a benchmark dataset to accomplish the personality trait classification task. The evaluations show better results, demonstrating that the proposed model can classify users' personality traits more effectively than the state-of-the-art techniques. Finally, we evaluate the effectiveness of our approach through statistical analysis. With the knowledge obtained from this research, organizations can make their decisions regarding the recruitment of personnel in an efficient way. Moreover, they can use the information obtained from this research as best practice for the selection, management, and optimization of their policies, services, and products.


I. INTRODUCTION
Cognitive Science is a multidisciplinary area of research that addresses different cognitive processes and mental states, including learning, thinking, perception, remembering, and emotions. Among these types of cognition, personality plays a pivotal role in identifying the social behaviour of humans. Computer-based personality detection and classification have long remained an active area of research. Personality detection can be performed using multiple media, such as text, images, video, and audio [1]. Cognitive-based sentiment classification from social media text is an area that poses several challenges for researchers in cognitive computation. In this area, a lot of work has been done and much more can be investigated. Textual cognitive-based sentiment analysis (SA) is not merely a theoretical area; rather, it has several applied fields, such as health [2], [3], education [4], finance [5], and others [6]. Being a merger of cognitive science and human neurology, it can address the gap between the abstractions of cognitive science and the more emerging area of personality detection from a person's textual feedback expressed on social media [7]. Social media platforms, such as Twitter, Facebook, and Instagram, have experienced an unexpected worldwide spread in recent years. For example, by the 3rd quarter of 2019, Twitter had over 330 million active users per month [8]. Progress in natural language processing and text analytics gives researchers an opportunity to use big-data sources for extracting and analysing the textual personality traits expressed by users on social media, as long as the data scientists working on social media content are able to address the challenging issues specific to such content.

A. RESEARCH STUDY MOTIVATION
Several studies have been conducted by computational linguistics experts to detect and classify personality traits at several levels, such as words, phrases, sentences, and reviews [9], [10]. However, personality detection and classification remains a challenging area in cognitive computation. The prior work [9] on user personality detection used a single Convolutional Neural Network (CNN) deep learning model, which only extracts the local features of sentences without retaining the previous context information. Therefore, further work is required to address this issue for efficient detection and classification of a user's personality traits.
We propose a hybrid deep learning technique for user personality detection. The main emphasis is to efficiently classify the user's personality traits by adding a Long Short-Term Memory (LSTM) layer for preserving contextual information in an efficient manner. The proposed system is inspired by the prior work [9] on user personality detection and classification. Previous studies have used a single deep learning model for personality trait classification, whereas we introduce a hybrid behaviour detection and classification approach using an LSTM layer. The proposed technique aims to classify the user's personality traits efficiently with a fusion of deep learning models.

B. PROBLEM BACKGROUND
In the last few years, cognitive-based SA applications have become popular among online communities for learning about the opinions and personality traits of individuals pertaining to different issues, policies, and other topics. However, due to the diverse nature of social media content, it is tedious to analyse such text using existing techniques to detect personality traits. Therefore, the extraction and analysis of social media content through the automatic classification of personality traits has become essential. A lot of work has been carried out in the fields of text-based SA [11]-[13], lexicon construction [14], [15], cognition [16], aspect-based SA [17], and visual SA [18]. However, more work needs to be done in the context of cognitive-based social media analysis, with an emphasis on extracting and classifying personality traits from social media content. The aforementioned issues often result in incorrect cognitive-based sentiment classification of social media content. Therefore, it is necessary to develop a method to classify personality traits in social media through the automatic classification of such content.

C. RESEARCH PROBLEM
Existing studies on personality trait classification are based on models with a limited number of personality clues. Furthermore, such techniques need an advanced feature set for efficient classification of users' behaviour on social media content. The problem of personality trait classification tackled in this research work is formulated as follows. Consider a set of n labelled text reviews P = {p1, p2, p3, . . . , pn}. Each review pi (i = 1, . . . , n) carries a label for each of the four personality dimensions: I(Introversion)-E(Extroversion), N(Intuition)-S(Sensing), T(Thinking)-F(Feeling), and J(Judging)-P(Perception). We consider personality trait detection as a classification problem in which the focus is on classifying input text along these four dimensions. The research problem is to build a reliable and powerful personality trait detection model that can detect and classify a text along the I-E, N-S, T-F, and J-P dimensions. For instance, an input text from the dataset is: ''I'm finding the lack of me in these posts very alarming.'' Our aim is to identify the correct personality traits I-E, N-S, T-F, and J-P of the user from the given text review. In this research work, our objective is to build a robust hybrid deep learning-based personality detection model that can efficiently classify the given text reviews into the desired personality class. The proposed system can also assist organizations in detecting and analyzing the attitudes of their customers about a service, policy, or product, which can support better decision making. The rest of the paper is structured as follows.
(i) Section 2 reviews the related work on personality trait classification by discussing existing techniques and systems. (ii) Section 3 introduces the proposed methodology for personality trait classification. (iii) The experimental setup is presented in Section 4, and the final section concludes the proposed system with its limitations and possible future directions.

II. REVIEW OF LITERATURE
This section reports a detailed review of selected literature pertaining to personality trait classification. In the literature, numerous methods exploiting different approaches have been suggested to address the personality prediction problem. The studies relevant to machine learning approaches to personality recognition are reviewed as follows:

A. SUPERVISED MACHINE LEARNING FOR PERSONALITY RECOGNITION
The supervised machine learning approach, also called the corpus-based approach (CBA), is one of the most widely used approaches among researchers to explore written and spoken text. It has several advantages, for example, exploring word usage, frequency, collocation, and concordance [19]. However, the supervised or corpus-based approach requires an annotated corpus for classifier training and testing [20], which is one of the major drawbacks of such techniques. Ref. [19] addressed the problem of database-independent emotion recognition by applying a neuro-fuzzy-based inference system. The proposed system can effectively learn and generalize the training samples. The experimental results show the performance improvement of the proposed system with respect to SVM.

B. UNSUPERVISED MACHINE LEARNING FOR PERSONALITY RECOGNITION
This approach is exploited when obtaining credibly annotated data is complicated. The classification of input text is performed based on keyword lists for the individual classes. The unsupervised technique is easy to use when investigating domain-dependent data [21]. The following sections present an overview of the selected studies related to machine learning approaches. Ref. [22] proposed a personality recognition system using an unsupervised approach and considered the five personality traits (Big Five) to extract and classify users' personality traits from different social media sites, such as FriendFeed. They exploited different linguistic features related to personality and built a personality model. The system computes personality scores for a given text and obtains satisfactory results. However, their system did not address features related to inter-user interaction. Ref. [23] proposed a Twitter Personality based Influential Community Extraction (T-PICE) system that generates the most influential communities in a Twitter network graph. They identified users' personality traits by extending existing approaches, aggregating data that reveals more aspects of user behaviour based on machine learning techniques. For this purpose, an existing modularity-based community detection algorithm is used, including a pre-processing step that removes graph edges based on users' personalities. However, scalability problems need to be addressed when considering a large graph.

C. SEMI-SUPERVISED AND HYBRID APPROACHES FOR PERSONALITY RECOGNITION
In semi-supervised and hybrid techniques, features from supervised approaches and lexicon-based approaches with labeled data records are combined. The related literature on hybrid techniques is as follows: Ref. [24] developed a personality comparison system between humans and chimpanzees using static images, with an emphasis on neutral expressions. The results show that humans can perceive characteristics more accurately than chimpanzees. Further experiments are required to investigate more features between humans and chimpanzees.

D. DEEP LEARNING APPROACHES FOR PERSONALITY RECOGNITION
Deep learning comprises a group of algorithms that mimic the functionality and structure of the human brain. In simple terms, it contains a set of neurons to receive input and a set of neurons to transmit output signals. Deep learning-based models can assist with different tasks, such as speech recognition, computer vision, natural language processing, and handwriting generation [25]. Ref. [26] proposed a model for predicting the human Big Five personality traits that needs 8 times less data. A GloVe embedding layer is used to extract word representations underlying user tweets. The model's training and testing are performed using the provided Twitter data. Testing is then performed on three fusions: (i) LIWC along with GP, (ii) 3-Gram along with GP, and (iii) GloVe along with RR. The proposed model outperformed the state-of-the-art work with a mean correlation of 0.33 across the Big Five traits. This work exploited only English Twitter content, which needs to be extended to further languages. Furthermore, the proposed model's efficiency could be estimated on an extended number of tweets. A personality identification approach applied to text data by exploiting a deep neural network was proposed by [10]. That work used a hierarchical scheme called AttRCNN that is capable of retaining semantic features at a deeper level. The results reveal that the proposed features perform better than the compared features.

E. RESEARCH GAP IDENTIFIED FROM THE EXISTING STUDIES
The aforementioned studies on predicting personality traits have applied different ML, DL, and other techniques. However, the ML approaches have used a classical bag-of-words approach as the feature representation technique. This limitation has successfully been overcome by the state-of-the-art word embeddings used in DL models. However, the issue of context information handling, empowered by rich feature extraction using deep neural networks, still needs to be addressed using hybrid DL models supported by feature selection for a more efficient classification of personality traits.

III. OVERVIEW OF THE PROPOSED METHODOLOGY

A. MOTIVATION
Different deep learning models have been applied for personality classification, including CNN, RNN, GRU, and LSTM. Individually, these deep learning models are incapable of capturing semantic information well. However, applying a fusion of deep learning models, such as CNN+LSTM, allows taking advantage of the two models, namely CNN and LSTM, to capture context information in a better way. Furthermore, the exploitation of the LSTM model facilitates understanding the context effectively by saving information in one direction. The current work is aimed at performing the task of personality trait classification by extracting the user's personality traits ''I-E,'' ''N-S,'' ''T-F,'' and ''J-P'' from textual data. For example, the input text ''I am finding the lack of me in these posts very alarming'' is tagged with one of the desired personality traits. For this purpose, we propose a deep neural network model, namely a Convolutional Neural Network combined with Long Short-Term Memory (CNN+LSTM). An overview of the proposed technique is presented as follows. The proposed method for personality trait classification from social media text comprises different modules: (i) acquiring data; (ii) pre-processing of data; and (iii) implementing a deep neural network (see Fig. 1). The acquisition of data is the initial step, after which certain pre-processing steps are applied to the social media textual reviews. The pre-processed reviews are input into the third module, where the deep neural network transforms the reviews into a machine-readable form, i.e., real-valued vectors [27]. The numeric representation of words, also called word embedding, is then made input to the hidden layer. The models used in the hidden layers are CNN and LSTM.
The CNN model acts as a feature extractor that extracts the salient features from the input data [28], and the LSTM model learns long-term information [29] to efficiently classify the user reviews. Lastly, input review text is classified into different personality classes, namely ''I-E,'' ''N-S,'' ''T-F,'' and ''J-P.'' A detailed description of the three modules is presented in the subsequent sections.

B. ACQUIRING DATA
We used the MBTI dataset [30], which consists of 8675 rows, for conducting experiments on personality trait classification. Each review in the data is tagged with a unique class with respect to the four sets of personality trait classes: ''I-E,'' ''N-S,'' ''T-F,'' and ''J-P'' (see Table 1). In this work, the experiments are implemented using the Python programming language in the Anaconda Jupyter notebook [31]. The original dataset is partitioned into three sets: a training set, a validation set, and a test set (see Fig. 2).
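As an illustration of this step, loading the labelled reviews and deriving the four binary trait labels from the 4-letter MBTI type code could be sketched with the standard library alone. The column names ''type'' and ''posts'' and the two sample rows below are assumptions for illustration, not the actual dataset contents:

```python
import csv
import io

# Tiny stand-in for the MBTI CSV: the real file has 8675 rows, each with a
# 4-letter type code and the user's concatenated posts.
SAMPLE_CSV = """type,posts
INFJ,I'm finding the lack of me in these posts very alarming.
ENTP,Another illustrative post for the sketch.
"""

def load_reviews(csv_text):
    """Parse labelled reviews and split the MBTI code into the four
    binary trait labels I-E, N-S, T-F, and J-P."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        mbti = record["type"]
        rows.append({
            "text": record["posts"],
            "I-E": mbti[0],   # I or E
            "N-S": mbti[1],   # N or S
            "T-F": mbti[2],   # T or F
            "J-P": mbti[3],   # J or P
        })
    return rows

reviews = load_reviews(SAMPLE_CSV)
```

In this reading, each of the four dichotomies becomes one binary classification target, which matches the four binary classifiers evaluated later.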

1) SET OF TRAIN DATA
The training set is exploited for model training by providing the inputs and the associated outcomes [12]. The proposed model is trained using 90% of the data, a proportion that can change across experiments. To fit the model, the training data is used for learning [15]. A training set example is given in Table 2. The training set is saved in a CSV file.

2) SET OF VALIDATION DATA
The model usually shows high accuracy in the training phase, but during the testing phase, the model's efficacy can decline. Hence, to detect model performance errors with respect to under-fitting and overfitting, a validation set is needed [12]. We used a 10% validation set. Keras supports two approaches to parameter tuning for verifying that the model functions at its best [32], namely (i) manual data validation and (ii) automatic data validation. In the present work, we implemented manual data validation.

3) SET OF TEST DATA
The test set is exploited to estimate the model's efficacy on new/unseen examples. It is applied after the model is fully trained on the training and validation sets, and it performs the model's final evaluation [12]. We used 10% of the data for testing, which is independent of the training set. The test set is used once, after the model is completely trained, to evaluate the model. A test set example is provided in Table 3. Fig. 2 depicts that the dataset is split into 90:10 proportions using the train-test split method of scikit-learn, in which a further 10% serves as the validation set. We used the validation set to tune the hyperparameters and to configure and evaluate the model. Table 4 provides details of the dataset used in this study. After the acquisition of data, it is transferred to the pre-processing module.
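A minimal stdlib sketch of such a split follows (the paper uses scikit-learn's train_test_split; carving the validation set out of the training portion, and the seed value, are our assumptions for illustration):

```python
import random

def split_90_10(examples, seed=7):
    """Shuffle and split into 90% train and 10% test; a further 10% of
    the remaining training portion is held out as the validation set."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, len(shuffled) // 10)
    test = shuffled[:n_test]
    rest = shuffled[n_test:]
    n_val = max(1, len(rest) // 10)
    # train, validation, test
    return rest[n_val:], rest[:n_val], test

train, val, test = split_90_10(list(range(100)))
```

With 100 examples this yields 81 training, 9 validation, and 10 test examples; fixing the seed keeps the partition reproducible across experiments.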

C. PRE-PROCESSING OF DATA
The second module of the proposed method applies some basic pre-processing steps to the acquired dataset, described as follows: (i) Applying Lower Casing: the entire set of reviews is converted to lower case using Python code, which helps to remove duplicated words within the data file. (ii) Eliminating Stop Words: the data file contains different stop words, such as 'the,' 'is,' and 'and,' which are eliminated at this stage [33]. (iii) Tokenization: during tokenization, a review is divided into a sequence of words, called tokens, using the Keras tokenizer. The tokenizer enables shifting the tokens into integer values [34].
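The three pre-processing steps can be sketched with the standard library alone (the paper uses the Keras tokenizer; the stop-word list below is an illustrative subset, and the token-to-integer mapping mimics Keras' 1-based word index):

```python
import re

STOP_WORDS = {"the", "is", "and", "of", "in", "a"}  # illustrative subset

def preprocess(review):
    """Lower-case, tokenize, and drop stop words from a review."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_index(token_lists):
    """Map each vocabulary word to a positive integer (Keras-style)."""
    index = {}
    for tokens in token_lists:
        for t in tokens:
            index.setdefault(t, len(index) + 1)
    return index

tokens = preprocess("I am finding the lack of me in these posts very alarming.")
index = build_index([tokens])
sequence = [index[t] for t in tokens]
```

The resulting integer sequence is what the embedding layer of the next module consumes.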

D. IMPLEMENTING DEEP NEURAL NETWORK
In the final step, a deep neural network for personality trait classification is implemented on the input text. The CNN+LSTM model consists of distinct layers, namely: (i) Input Layer, (ii) Hidden Layer, and (iii) Output Layer, described as follows:

1) INPUT LAYER
The input layer of the deep neural network contains the input data [35]. We used the Keras embedding layer [36] to encode words into real-valued vectors, a numerical representation of the words that captures semantic information [37].

2) HIDDEN LAYER
The hidden layer is comprised of different layers of CNN and LSTM. The layers/components of the CNN model are described as under:

a: CONVOLUTIONAL LAYER

It is a basic layer and acts as a feature extractor because it extracts features from the input data. The feature extraction is performed using a linear operation, namely convolution, in which a filter is moved along the input data to obtain the feature map. After convolution, the feature map is passed through a nonlinear activation function, 'ReLU,' to remove the negative values [33].

b: POOLING LAYER
In this layer, a downsampling operation, namely max-pooling, is applied to the input received from the prior layer, with the aim of reducing the volume of the feature map after convolution [38]. To learn long-term information, we introduced the LSTM layer, described as follows: the LSTM layer takes input from the CNN model and retains both the current and prior information. It provides the facility of memorizing long-term memory by keeping the information for an extensive time period without information decay [39]. Finally, the LSTM layer outcome is inserted into the output layer.

3) OUTPUT LAYER
After feature extraction through the convolutional layer, downsampling through the pooling layer, and learning long-term information through the LSTM layer, the classification of the learned features is performed at the output layer [37]. In this layer, the ''softmax'' function is applied, which assists in classifying the given input text ''I am finding the lack of me in these posts very alarming'' into the four personality trait classes, namely ''I-E,'' ''N-S,'' ''T-F,'' and ''J-P.'' The softmax function generates probabilities for the target labels (classes), and the label (class) with the maximum probability is assigned to the input review text.

E. METHOD ARCHITECTURE FOR PERSONALITY TRAIT CLASSIFICATION
The proposed method uses four main modules, namely (i) Representation of Words using the Embedding Layer, (ii) Feature Extraction using the CNN Model, (iii) Learning Long-Term Information using the LSTM Model, and (iv) Classification using the Softmax Layer. The first module obtains the numeric representation of words, which is then made input to the second module to perform feature extraction. After that, the LSTM model takes the output from the previous module to preserve the long-term information, and finally, the last module applies the classification (see Fig. 3). A detailed description of each module is presented as follows:
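Assuming TensorFlow/Keras is available, the four-module pipeline could be sketched as the following architecture. vocab_size, embed_dim, max_len, the number of filters, and the dropout rate are illustrative assumptions; units = 120 follows the setting later reported for the I-E classifier, and the reported 3 × 3 kernel is read here as a length-3 kernel of a 1-D convolution over the token sequence:

```python
# Architecture sketch only: embedding -> Conv1D -> max pooling ->
# dropout -> LSTM -> softmax, mirroring the four modules of Fig. 3.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Embedding, Conv1D, MaxPooling1D,
                                     Dropout, LSTM, Dense)

vocab_size, embed_dim, max_len = 20000, 100, 500  # illustrative

model = Sequential([
    Embedding(vocab_size, embed_dim, input_length=max_len),  # module (i)
    Conv1D(filters=64, kernel_size=3, activation="relu"),    # module (ii)
    MaxPooling1D(pool_size=2),
    Dropout(0.2),
    LSTM(120),                                               # module (iii)
    Dense(2, activation="softmax"),  # module (iv): one binary trait
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

One such binary model would be trained per dichotomy (I-E, N-S, T-F, J-P).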

1) REPRESENTATION OF WORDS USING EMBEDDING LAYER
In the first module, the words are transformed into numerical form, i.e., real-valued vectors [40]. In this layer, the individual review ''I am finding the lack of me in these posts very alarming'' is treated as a word sequence, while each individual word is expressed as a fixed-size vector [41]. The values of the vectors are randomly initialized, i.e., a fine-tuned word embedding scheme is used in the embedding layer [40]. Suppose a word w j is randomly initialized with a v-dimensional vector. If the maximum number of words in a review is x, then a review is defined as:

I = w 1 ⊗ w 2 ⊗ . . . ⊗ w x (1)

In Eq. (1), the concatenation operator ⊗ generates a single review matrix. The review matrix I ∈ R^(x × v) is created for each review I, while each individual row j holds the word embedding w j of the jth word in the review [41]. The word vectors, also named word embeddings, are passed to the next module.

Mitigating the Overfitting Issue Using the Dropout Rate: When a neural network is trained too much on a limited training set, it causes an overfitting issue, which negatively affects the network performance on the test set. In order to mitigate this issue, we apply the dropout layer ''rate'' parameter. The task of the dropout layer is to deactivate a fraction of the neurons (units) by assigning them a zero value. The dropout rate falls in the range [0, 1] [41].
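A stdlib sketch of the randomly initialized review matrix of Eq. (1) and of the dropout mask follows (the dimensionality v = 4, the value range, and the seed are illustrative assumptions; in the real model these vectors are fine-tuned during training):

```python
import random

def review_matrix(sequence, v=4, seed=0):
    """Stack one randomly initialised v-dimensional vector per token,
    giving the x-by-v review matrix I of Eq. (1).  Repeated tokens
    share the same embedding vector."""
    rng = random.Random(seed)
    table = {}
    rows = []
    for token_id in sequence:
        if token_id not in table:
            table[token_id] = [rng.uniform(-0.5, 0.5) for _ in range(v)]
        rows.append(table[token_id])
    return rows

def dropout(matrix, rate, seed=0):
    """Zero out roughly a fraction `rate` of the units, as the dropout
    layer's ''rate'' parameter does at training time."""
    rng = random.Random(seed)
    return [[0.0 if rng.random() < rate else val for val in row]
            for row in matrix]

I = review_matrix([1, 2, 3, 2])   # a 4-token review, token 2 repeated
I_dropped = dropout(I, rate=0.5)
```

Note that row 2 and row 4 of I are identical, since they embed the same token.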

2) FEATURE EXTRACTION USING CNN MODEL
In this module, we exploit the CNN model, which consists of three submodules: (i) Filtering using a convolutional layer, (ii) Adding a Bias and Presenting Nonlinearity using an activation function, and (iii) Downsampling using a pooling layer. A detailed illustration of the three submodules of the CNN model is shown in Fig. 3. Each of the submodules is described as follows:

a: FILTERING USING CONVOLUTIONAL LAYER
This submodule aims at extracting local (best) features from the review matrix [28]. Mathematically, the feature map generated during the filtering process is defined as:

F[p, q] = a(b + (I ∗ K)[p, q]) (2)

In Eq. (2), 'a' is an activation function, 'b' is a bias term, and 'F[p, q]' represents the feature map elements produced by the filtering operation. During this operation, an overlay, also named the kernel, is applied to the input review text. The operation is completed in different steps: (i) the overlay is set at the top-left region of the input review matrix; (ii) the multiplication operation is performed between the overlay and the underlay, that is, the portion of the input review matrix enclosed by the overlay; (iii) all the elements obtained after the multiplication are summed, generating a single element of the feature map; (iv) the overlay then shifts to the right, and steps (i)-(iii) are performed again, giving another element of the feature map. In this way, the first row is completed; then we shift downwards and replicate the same process, continuing all the way to the last overlay position in the bottom-right region of the matrix. Furthermore, the filtering process is defined mathematically as:

(I ∗ K)[p, q] = Σ m Σ n i[p + m, q + n] • k[m, n] (3)

In Eq. (3), opening up F[p, q], 'i' is an individual element of the input review matrix, whereas 'k' is an individual element of the kernel matrix K ∈ R^(m × n). Finally, ' • ' represents the multiplication operator, while 'Σ' denotes the summation operator.
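A minimal sketch of the filtering operation of Eqs. (2) and (3), with the bias addition and ReLU of the next submodule folded in (the input matrix and kernel values below are illustrative, not learned parameters):

```python
def convolve(I, K, b=0.0):
    """Slide kernel K over matrix I as in Eq. (3), add the bias b and
    apply ReLU as in Eq. (2), returning the feature map."""
    m, n = len(K), len(K[0])
    rows, cols = len(I) - m + 1, len(I[0]) - n + 1
    F = []
    for p in range(rows):
        row = []
        for q in range(cols):
            s = sum(I[p + u][q + w] * K[u][w]
                    for u in range(m) for w in range(n))
            row.append(max(0.0, s + b))  # ReLU removes negative values
        F.append(row)
    return F

I = [[1, 0, 2],
     [0, 1, 0],
     [2, 0, 1]]
K = [[1, 0],
     [0, 1]]
feature_map = convolve(I, K)   # [[2.0, 0.0], [0.0, 2.0]]
```

Each output element sums the element-wise products of the kernel with the 2 × 2 region of I it currently covers, exactly the overlay-underlay procedure described above.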

b: ADDING A BIAS AND PRESENTING NONLINEARITY USING ACTIVATION FUNCTION
Firstly, a bias term 'b' is added to the feature map elements, and then an activation function is applied. The main objective of the activation function is to introduce a nonlinear decision boundary [43]. Here, the activation function ''ReLU (Rectified Linear Unit)'' is exploited, which is an element-wise operation, numerically defined as ReLU(z) = max(0, z). The formula states that the output will be 0 if the input is negative; otherwise, the output is z. Finally, after introducing the bias term and activation function in Eq. (2), we obtain L[p, q], the elements of the feature map, where L ∈ R^(c × d).

c: DOWNSAMPLING USING POOLING LAYER
The last submodule is the pooling layer, used for downsampling the output taken from the prior submodule. In this layer, only the most salient features (the maximum values in the feature map) are selected by discarding the low-activation features from the feature map. Mathematically, it is written as W = max(L), which states that when the max-pooling operation is applied to the feature map L, the feature map is cut into windows of a given size and the maximum value is selected from each window. The final feature vector is then passed on to the next module.
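The max-pooling operation W = max(L) can be sketched as follows (the window size of 2 and the feature-map values are illustrative):

```python
def max_pool(L, size=2):
    """Downsample feature map L by taking the maximum of each
    non-overlapping size-by-size window."""
    pooled = []
    for p in range(0, len(L) - size + 1, size):
        row = []
        for q in range(0, len(L[0]) - size + 1, size):
            row.append(max(L[p + u][q + w]
                           for u in range(size) for w in range(size)))
        pooled.append(row)
    return pooled

L = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 2, 2],
     [3, 0, 1, 4]]
W = max_pool(L)   # [[4, 5], [3, 4]]
```

Each 2 × 2 window contributes only its maximum value, so the 4 × 4 map shrinks to 2 × 2 while keeping the strongest activations.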

3) LEARNING LONG TERM INFORMATION USING LSTM MODEL
The third module applies the LSTM model to the input data taken from the CNN model. The main task of the LSTM is to operate on the current data together with the prior information. In other words, it memorizes and passes on the prior information in a chain-like neural network structure. The long-term storage of information without decay is due to the existence of long-term memory, also known as cell states, within the LSTM units. The LSTM structure is comprised of three gates, which control the flow of information between the neural network and the memory cells. Furthermore, the memory cell has a connection with itself. If the first forget gate, also known as the 'remember vector,' contains the value 1 and the self-connection also carries the weight value 1, then the information is remembered by the memory cell; if the forget gate contains the value 0, the information is forgotten. The equation for the forget gate is given as:

remember t = sigmoid(weight r · [wgm t−1 , x t ] + bias r ) (4)

If the second input gate, also known as the 'save vector,' contains the value 1, then it allows the network to write values into the memory cells. The equation for the input gate is given as:

save t = sigmoid(weight s · [wgm t−1 , x t ] + bias s ) (5)

If the third output gate, also known as the 'focus vector,' contains the value 1, then it allows the network to read values from the memory cell. The equation for the output gate is given as:

focus t = sigmoid(weight f · [wgm t−1 , x t ] + bias f ) (6)

In Eqs. (4), (5), and (6), remember t , save t , and focus t represent the forget, input, and output gates; weight r , weight s , and weight f represent the corresponding weight matrices; x t depicts the current input; wgm t−1 shows the previous hidden state, also named the 'working memory'; and bias r , bias s , and bias f represent the bias values for the forget, input, and output gates. Lastly, sigmoid is an activation function.
After that, a new candidate value is computed for the cell state (long-term memory) [44]. The candidate value is obtained using the following equation [45]:

Candidate Value

lgtm̃ t = tanh(weight l · [wgm t−1 , x t ] + bias l ) (7)

In Eq. (7), lgtm̃ t represents the candidate value, and tanh denotes the hyperbolic tangent activation function.

Each LSTM unit generates two outputs: the new/updated cell state and the new hidden state, given as follows:

New Cell State

lgtm t = remember t * lgtm t−1 + save t * lgtm̃ t (8)

New Hidden State

wgm t = focus t * tanh(lgtm t ) (9)

In Eq. (8), lgtm t represents the new/updated cell state and lgtm t−1 the previous cell state, while in Eq. (9), wgm t represents the new hidden state.
The existence of the three gates secures the LSTM units from the problem of vanishing and exploding gradients, and the LSTM processes the data sequentially. Hence, to effectively perform personality trait classification, we applied the LSTM model.
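A single LSTM step following Eqs. (4)-(9) can be sketched over scalar inputs; the weights and biases below are illustrative assumptions, not trained parameters:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, wgm_prev, lgtm_prev, p):
    """One LSTM step over scalar inputs; p holds the scalar weights and
    biases for the forget (r), input (s), output (f), and candidate (l)
    computations of Eqs. (4)-(9)."""
    z = [wgm_prev, x_t]                     # [previous hidden, current input]
    dot = lambda w: w[0] * z[0] + w[1] * z[1]
    remember_t = sigmoid(dot(p["weight_r"]) + p["bias_r"])   # Eq. (4)
    save_t = sigmoid(dot(p["weight_s"]) + p["bias_s"])       # Eq. (5)
    focus_t = sigmoid(dot(p["weight_f"]) + p["bias_f"])      # Eq. (6)
    candidate = math.tanh(dot(p["weight_l"]) + p["bias_l"])  # Eq. (7)
    lgtm_t = remember_t * lgtm_prev + save_t * candidate     # Eq. (8)
    wgm_t = focus_t * math.tanh(lgtm_t)                      # Eq. (9)
    return wgm_t, lgtm_t

params = {k: [0.5, 0.5] for k in ("weight_r", "weight_s", "weight_f", "weight_l")}
params.update(bias_r=0.0, bias_s=0.0, bias_f=0.0, bias_l=0.0)
wgm_t, lgtm_t = lstm_step(x_t=1.0, wgm_prev=0.0, lgtm_prev=0.0, p=params)
```

Chaining this step over the sequence, each cell feeds its wgm t and lgtm t into the next, which is how the past context survives to the end of the sentence.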

4) FEATURE CLASSIFICATION USING SOFTMAX LAYER
This layer's objective is to classify the input features (final depiction) it receives from the prior module [46]. Initially, we calculate the net input using Eq. (10):

A = f · x + b (10)

where x represents the input vector, f represents the weight vector, and b denotes the bias term. We selected the softmax activation function for the feature classification. The goal of the softmax function is to assign decimal probabilities to the individual classes such that these probabilities sum to 1. Each class output is normalized between 0 and 1 by the softmax function using Eq. (11) [47], and the target class is the one with the highest probability [48]. Mathematically, the softmax activation function is defined as:

softmax(A j ) = e^(A j ) / Σ k e^(A k ) (11)

Applying an Example Case: We take an example of a user review and pass it through each step of the proposed CNN+LSTM model to classify it into the different personality traits: I-E, N-S, T-F, and J-P.

7) CNN MODEL
The output of the dropout layer is received by the CNN model and moved through the different layers of the CNN model to extract local (low-level) features from the input text. The working of the CNN model is given in the following way:

a: LEVEL-1 APPLYING FILTERING

During the filtering process, the filter matrix (overlay) is applied to the review text. The steps included in the filtering process are described in Section 3.4.2. The filtering process for the example sentence is illustrated as follows:

8) LSTM MODEL
To solve the issue of long-term dependencies associated with earlier models such as RNNs, the LSTM is used. It adds gates and a cell state (long-term memory) to the RNN. This helps preserve past context information, which makes it feasible to exploit the context from the beginning of the sentence. Therefore, we selected the LSTM model for learning sequence (time-series) data [49]. The LSTM model receives input from the previous layer of the CNN model. The calculation over the input text involves four major items, namely the forget gate (remember t ), the input gate (save t ), the output gate (focus t ), and the new cell state (lgtm t ). Output of the 1st LSTM Cell: The LSTM model is comprised of different cells; each cell takes the current input (x t ) and the prior hidden state (wgm t−1 ), performs the calculations of Eqs. (4)-(9), and produces the hidden state of the 1st cell (see Fig. 4). Output of the Last LSTM Cell: The last LSTM cell likewise takes the current input (x t ) and the prior hidden state (wgm t−1 ), performs the calculations of Eqs. (4)-(9), and produces the hidden state of the last cell (see Fig. 5). The hidden state (wgm t ) of the last LSTM cell is taken as the final depiction of the input text, which is then forwarded to the Softmax (output) layer.

9) SOFTMAX LAYER
The softmax layer acts as the last layer of the proposed model, exploiting the softmax activation function to classify the feature vector (final depiction) into the desired personality classes I-E, N-S, T-F, and J-P. The computation using Eq. 10 is depicted as follows.
After obtaining the values of A_1 and A_2, we apply the softmax activation function using Eq. 11. The calculation shows that the highest probability is achieved by the I (Introversion) personality trait. So, the given input text ''I am finding the lack of me in these posts very alarming'' is classified as the Introversion personality trait among all the other classes.
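The softmax step above can be sketched as follows; the two scores A1 and A2 below are hypothetical stand-ins for the actual activations produced by Eq. 10 on this example.

```python
import math

def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1 (Eq. 11)."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

a1, a2 = 2.0, 0.5                 # hypothetical scores for I vs. E
probs = softmax([a1, a2])
label = "I" if probs[0] > probs[1] else "E"
print(probs, label)               # the higher-probability class wins
```

Whichever class receives the higher probability is emitted as the predicted trait.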
Algorithm 1 shows the steps of the CNN+LSTM model pseudo code for personality trait classification.

Algorithm 1 Pseudocode of Proposed Personality Classification Model
Step I: Input the dataset as a CSV file.
Step II: Split into train (R_train, NR_train) and test (R_test, NR_test) sets using scikit-learn.
Step III: Break tweets into word tokens using the Keras tokenizer.
Step IV: Build the vocabulary to map an integer to each word.
Step V: Transform each tweet into a sequence of integers.
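The preprocessing steps of Algorithm 1 can be sketched in pure Python as follows. The mini-corpus and the 2/1 train-test split are purely illustrative; the paper uses the Keras Tokenizer and scikit-learn's train_test_split, which this mirrors without the library dependencies.

```python
# Step I: input dataset (here a hypothetical in-memory mini-corpus).
tweets = ["i am finding the lack of me in these posts very alarming",
          "love meeting new people at parties",
          "quiet evenings with a book suit me best"]
labels = ["I", "E", "I"]

# Step II: split into train and test (illustrative 2/1 holdout).
train_x, test_x = tweets[:2], tweets[2:]

# Step III: break tweets into word tokens.
train_tokens = [t.split() for t in train_x]

# Step IV: build the vocabulary, mapping each word to an integer
# (0 is reserved for padding / out-of-vocabulary words).
vocab = {}
for tokens in train_tokens:
    for word in tokens:
        vocab.setdefault(word, len(vocab) + 1)

# Step V: transform each tweet into a sequence of integers.
def to_sequence(text, vocab):
    return [vocab.get(w, 0) for w in text.split()]

train_seqs = [to_sequence(t, vocab) for t in train_x]
test_seqs = [to_sequence(t, vocab) for t in test_x]
print(train_seqs[0])
```

These integer sequences are then padded and fed to the embedding layer of the CNN+LSTM model.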

IV. RESULTS AND DISCUSSION
This section presents the answers to the posed research questions as follows:

A. ANSWER TO RQ1
To answer RQ1 (How can the Deep Learning technique, namely CNN+LSTM, be applied to classify personality traits from the input text?), different CNN+LSTM models with varying parameters were implemented for the classification of input text over the distinct personality traits I-E, N-S, T-F, and J-P. To build the various CNN+LSTM models, we used an individual layer with a variant of parameters; consequently, the efficiency of the classifier is enhanced by fine-tuning the parameters of CNN+LSTM. We applied different values of the ''units'' parameter of the LSTM layer, varied from 110 to 200, with batch size = 32 and convolutional kernel size = 3 × 3. Several additional parameter values, such as pool_size, number of filters, epochs, kernel_size, and padding, are listed in Table 5. Table 7 illustrates the outcomes of different evaluation metrics, namely f1-score, recall, precision, and accuracy, of the proposed CNN+LSTM model for the personality traits I-E, N-S, T (Thinking)-F (Feeling), and J (Judging)-P (Perception). We conducted different experiments to obtain the optimal parameter values yielding the best performance of the CNN+LSTM model. In the case of the I-E binary classifier, CNN+LSTM2 outperformed the other models in terms of f1-score (88%), recall (88%), precision (88%), and accuracy (88%) with parameter settings of batch size = 32, lstm units = 120, and convolutional kernel size = 3 × 3. For the N-S binary classifier, CNN+LSTM4 outperformed the other models in terms of f1-score (91%), recall (91%), precision (91%), and accuracy (91%) with batch size = 32, lstm units = 140, and convolutional kernel size = 3 × 3. In the case of the T-F binary classifier, CNN+LSTM3 outperformed the other models in terms of f1-score (85%), recall (85%), precision (85%), and accuracy (85%) with batch size = 32, lstm units = 130, and convolutional kernel size = 3 × 3.
Finally, for the J (Judging)-P (Perception) binary classifier, CNN+LSTM3 outperformed the other models in terms of f1-score (80%), recall (80%), precision (80%), and accuracy (80%) with parameter settings of batch size = 32, lstm units = 130, and convolutional kernel size = 3 × 3.
The performance of CNN+LSTM in terms of accuracy is presented in Fig. 6. The horizontal axis of the graph shows the accuracy measure, and the vertical axis shows the 10 CNN+LSTM models. Table 8 presents the loss, accuracy, and training time for the classes I-E, N-S, T-F, and J-P. For the I-E binary classifier, increasing the lstm units generally reduces the CNN+LSTM model's accuracy. The model's accuracy peaks at 120 lstm units, which is considered the optimal value for this parameter, and the same accuracy is reached again at 170 lstm units. Additionally, the loss score grows as the accuracy declines. For the N (Intuition)-S (Sensing) binary classifier, increasing the lstm units likewise reduces accuracy. The accuracy peaks at 140 lstm units, which is considered the optimal value, and the same accuracy recurs at 190 lstm units; again, the loss score rises as the accuracy declines. For the T-F binary classifier, increasing the lstm units also reduces accuracy. The accuracy peaks at 130 lstm units, the optimal value, and the same accuracy recurs at 120, 150, and 170 lstm units; the loss score rises as the accuracy declines.
Finally, for the J-P binary classifier, increasing the lstm units reduces the CNN+LSTM model's accuracy. The accuracy peaks at 130 lstm units, which is considered the optimal value for this parameter, and the same accuracy recurs at 160, 170, and 180 lstm units. Also, a rise in the loss score occurred with the decline in accuracy.

a: EXPERIMENT#1-I-E
We conducted experiment #1 to evaluate the proposed CNN+LSTM model's efficiency with respect to the I (Introversion)-E (Extroversion) personality traits, and the results are reported in Table 9. It is evident that the CNN+LSTM model attained better performance for the ''I (Introversion)'' personality trait in terms of recall (0.93), precision (0.92), and f1-score (0.92), with a total accuracy of 88%.

b: EXPERIMENT#2-N-S
We conducted experiment #2 to evaluate the efficiency of the proposed CNN+LSTM model with respect to the N (Intuition)-S (Sensing) personality traits. The results presented in Table 10 show that the CNN+LSTM model attained better performance for the ''N'' personality trait with recall (0.95), precision (0.94), and f1-score (0.95). The total accuracy is 91%.

c: EXPERIMENT#3-T-F
We conducted experiment #3 to evaluate the efficiency of the proposed model (CNN+LSTM) for T-F personality traits.
The results reported in Table 11 show that the proposed CNN+LSTM model attained better performance for the ''F (Feeling)'' personality trait in terms of recall (0.87), precision (0.87), and f1-score (0.87). The total accuracy is 85%.

d: EXPERIMENT#4-J-P
We conducted experiment #4 to evaluate the efficiency of the proposed CNN+LSTM model with respect to the J (Judging)-P (Perception) personality traits. The results given in Table 12 show that the CNN+LSTM model attained better performance for the ''P'' personality trait with recall (0.92) and f1-score (0.85), and an overall accuracy of 80%. Furthermore, the model attained the maximum precision (0.82) for the ''J (Judging)'' personality trait.

B. ANSWER TO RQ2
To answer RQ2 (What is the efficiency of the proposed technique w.r.t. other machine learning and deep learning techniques?), experimentation was performed to evaluate the efficiency of the proposed CNN+LSTM model, which exploits word embedding, against machine learning classifiers that utilize classical feature representation schemes, i.e., Bag-of-Words (Countvectorizer and TF-IDF). We implemented different ML and DL techniques for the personality traits, and their results are presented in Table 13 and Table 14.

1) COMPARISON OF PROPOSED CNN+LSTM WITH MACHINE LEARNING TECHNIQUES EXPLOITING CLASSICAL FEATURES
To compare the proposed CNN+LSTM method with classical machine learning methods, we performed the experiments reported in the rest of this section. Machine Learning Techniques Exploiting Classical Features: Different classical feature representation schemes, namely Countvectorizer and TF-IDF, are evaluated. The Countvectorizer scheme inspects the presence of words within a vocabulary by exploiting the bag-of-words technique, whereas TF-IDF is a numeric estimator that scores words by computing their frequency within a document relative to a group of documents. Experiments are conducted using various ML algorithms, such as SVM, KNN, LR, RF, DT, and XGBoost, that exploit classical features such as Countvectorizer and TF-IDF. From the experimental comparison, it is found that among the machine learning classifiers, XGBoost attained a maximum accuracy of 83.18%, whereas KNN attained a minimum accuracy of 78.6% for the I-E personality trait. In the case of the N-S class, XGBoost attained the highest accuracy (88.71%), and the minimum accuracy (81.8%) was attained by DT. For class T-F, SVM attained a maximum accuracy of 81.19%, and a minimum accuracy of 56.8% was attained by KNN. Finally, for class J-P, LR attained a maximum accuracy of 74.42%, and a minimum accuracy of 56.91% was attained by RF.
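The two classical feature schemes compared above can be sketched as follows on a hypothetical two-document corpus. The actual experiments used scikit-learn's CountVectorizer and TfidfVectorizer; this pure-Python version only illustrates the arithmetic (the IDF smoothing variant shown is one common choice, not necessarily the one used in the paper).

```python
import math

docs = ["the party was great great fun", "the quiet library was calm"]
vocab = sorted({w for d in docs for w in d.split()})

def count_vector(doc):
    """Countvectorizer-style feature vector: raw term counts."""
    words = doc.split()
    return [words.count(term) for term in vocab]

def tfidf_vector(doc):
    """TF-IDF feature vector: term frequency weighted by rarity."""
    words = doc.split()
    n_docs = len(docs)
    vec = []
    for term in vocab:
        tf = words.count(term) / len(words)
        df = sum(term in d.split() for d in docs)   # document frequency
        idf = math.log(n_docs / df) + 1.0           # smoothed IDF
        vec.append(tf * idf)
    return vec

counts = [count_vector(d) for d in docs]
weights = [tfidf_vector(d) for d in docs]
print(counts[0])
```

Note how the shared words ''the'' and ''was'' get a low IDF, while document-specific words like ''great'' are up-weighted.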
Proposed CNN+LSTM Model: We conducted experiments for personality trait classification by applying the proposed model (CNN+LSTM) that uses word embedding as a feature representation scheme. It is observed that the proposed CNN+LSTM model yields improved performance in relation to the compared machine learning classifiers, such as SVM, KNN, LR, RF, DT, and XGBoost with the classical Bag of words representation approach (Table 13), and a variant of deep neural network techniques like Individual LSTM, Individual CNN, Individual BILSTM, Individual RNN (see Table 14).
The results presented in Table 13 show that the proposed model outperformed the machine learning methods due to the following limitations of ML classifiers: (i) In the bag-of-words approach, an individual word is represented as a one-hot vector, which is sparse and therefore incapable of capturing different linguistic word aspects; the length of the one-hot vector equals the size of the vocabulary, and it contains all zero values except a single dimension set to 1. For instance, in a bag-of-words approach, the word ''Sunday'' in sparse depiction is given as [0, 0, 0, 0, 0, 1] [27]. (ii) The BOW technique does not consider the word context within the given text, i.e., it treats each word as an independent unit [51]. Furthermore, the CNN+LSTM performed efficiently with respect to the compared machine learning models due to the following benefits of word embedding over the traditional bag-of-words approach: (i) In word embedding, the model learns to represent an individual word as a real-valued (continuous) vector through training over each sentence; therefore, word embedding encodes semantic information more efficiently than the classical bag-of-words feature set [51]. (ii) Classification of a word that is unavailable in the training set becomes achievable through similar words, since the real-valued feature depiction maps words with similar meanings close to one another [51]. (iii) Semantic sparsity occurring within short text is managed by the dense feature representation scheme, i.e., word embedding [52]. Word embedding-based techniques perform effectively over BOW-based methods because word embedding efficiently encodes information related to rare words in the training dataset.
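The contrast between the sparse one-hot depiction and a dense embedding can be sketched as follows. The 3-dimensional embedding values are made up purely for illustration (a trained embedding would learn such geometry from data):

```python
vocab = ["monday", "tuesday", "sunday", "apple", "run", "blue"]

def one_hot(word):
    """Sparse BOW-style depiction: all zeros except one dimension."""
    return [1 if w == word else 0 for w in vocab]

# Hypothetical learned embeddings: weekday words end up near each other.
embedding = {"sunday": [0.9, 0.1, 0.0],
             "monday": [0.8, 0.2, 0.1],
             "apple":  [0.0, 0.1, 0.9]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: sum(a * a for a in x) ** 0.5
    return dot / (norm(u) * norm(v))

sim_weekdays = cosine(embedding["sunday"], embedding["monday"])
sim_unrelated = cosine(embedding["sunday"], embedding["apple"])
print(one_hot("sunday"))        # sparse: [0, 0, 1, 0, 0, 0]
print(sim_weekdays)             # high similarity: related words
print(sim_unrelated)            # low similarity: unrelated words
```

Every pair of distinct one-hot vectors is equally dissimilar, whereas the dense vectors place ''sunday'' near ''monday'' and far from ''apple'', which is exactly what enables generalization to unseen words.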
Furthermore, the CNN+LSTM neural network method performs better than the compared machine learning models because it performs hierarchical feature abstraction of the user-created content layer by layer [53]. The convolutional layer learns the best (local) features [54], the pooling layer retains the salient local features [55], and the LSTM layer captures the global features [56]. Machine learning, on the other hand, relies on a less efficient, manual feature engineering process [53]. The self-learning ability of deep learning models enables them to achieve significant accuracy and fast processing speed compared with machine learning classifiers exploiting classical features [57]. In summary, the CNN+LSTM model performs better for personality prediction than the machine learning algorithms because it exploits the word embedding feature representation approach, executes automated feature learning, and works at a deep level of layers.

2) COMPARISON OF PROPOSED CNN+LSTM WITH VARIANTS OF DEEP NEURAL NETWORK METHODS
To compare the proposed CNN+LSTM model with variants of deep neural network methods, we performed different experiments and evaluated the results as follows:

a: VARIANTS OF DEEP LEARNING MODELS
We conducted experiments on different deep neural network models, specifically Individual LSTM, Individual CNN, Individual BILSTM, and Individual RNN, each of which exploits an advanced feature representation scheme, namely word embedding [58].
The experimental results (Table 14) show that, among the other deep learning models, CNN attained the maximum accuracy: 81% for I-E, 88% for N-S, 84% for T-F, and 75% for J-P. This is because, during feature extraction, CNN captures local associations between neighbouring words. RNN and BILSTM attained the minimum accuracy of 78% for I-E. In the case of N-S, a minimum accuracy of 87% was achieved by three models, namely LSTM, BILSTM, and RNN, because the LSTM and BILSTM models are not good at capturing salient features [59], [60]. For T-F, RNN performed poorly with an accuracy of 54%, and finally, for J-P, the lowest performance of 61% was attained by RNN. The reason for the low performance of RNN is its small memory storage capacity, i.e., short-term memory [61].
In the following sections, we present a comparison of the proposed model (CNN+LSTM) with different deep neural network models, namely Individual LSTM, Individual CNN, and Individual BILSTM.

b: COMPARING PROPOSED CNN+LSTM WITH INDIVIDUAL CNN
In this experiment, we evaluated the performance of the proposed model (CNN+LSTM) with respect to the individual CNN for personality trait classification. The results presented in Table 14 show that CNN gives a poorer performance, with precision (81%), recall (81%), f1-score (81%), and accuracy (81%) for class I-E. In the case of N-S, the CNN model attains precision (84%), recall (84%), f1-score (84%), and accuracy (84%). For class T-F, the model's precision = 84%, recall = 84%, f1-score = 84%, and accuracy = 84%. Finally, in the case of J-P, CNN gives accuracy = 75%, precision = 75%, recall = 75%, and f1-score = 75%. CNN is capable of extracting important features from the input text, but it lacks the ability to hold long-term dependencies [39].

e: COMPARING PROPOSED CNN+LSTM WITH INDIVIDUAL RNN
In this experiment, we evaluated the performance of the individual RNN against the proposed CNN+LSTM. The results presented in Table 14 show that the individual RNN exhibits the worst efficiency relative to our proposed CNN+LSTM model, with recall = 78%, precision = 78%, accuracy = 78%, and f1-score = 78% for class I-E. In the case of N-S, RNN attained precision = 87%, recall = 87%, accuracy = 87%, and f1-score = 87%. For class T (Thinking)-F (Feeling), the model's accuracy = 54%, precision = 54%, recall = 54%, and f1-score = 54%. Lastly, in the case of J-P, the RNN provides accuracy = 61%, precision = 61%, f1-score = 61%, and recall = 61%. The RNN is efficient at passing learned information to the next layer, but it suffers from the vanishing gradient problem as the gap between time steps increases [39].
The abovementioned results indicate the superiority of our proposed CNN+LSTM method when compared with the different DL models, namely individual LSTM, individual CNN, individual RNN, and individual BILSTM. The proposed CNN+LSTM model attained better performance, i.e., 88% for I-E, 91% for N-S, 85% for T-F, and 80% for J-P, with respect to the compared DL models. This is because the proposed method combines the benefits of the two deep learning models, CNN and LSTM: CNN efficiently extracts significant information from the input text, and LSTM is efficient at holding information over a long time period without degeneration [39]. Hence, for personality traits, our proposed approach shows improved performance with respect to the other variants of deep learning models, namely the individual LSTM, individual CNN, individual BILSTM, and individual RNN models.

C. ANSWER TO RQ3
To address the research question: ''How to estimate the efficiency of the proposed technique regarding personality trait classification w.r.t baseline studies?,'' the experiment was conducted for performance comparison of the proposed model with respect to the baseline study conducted by [9]. In Table 15, experimental results are presented.
In this experiment, we compared the performance of our model with the baseline models.
Majumder et al. [9] used the Convolutional Neural Network (CNN) method to detect personality traits. The experimental results recorded in Table 15 indicate that the proposed approach performed better than the baseline study, with an accuracy of 88% for I (Introversion)-E (Extroversion), 91% for N (Intuition)-S (Sensing), 85% for T (Thinking)-F (Feeling), and 80% for J (Judging)-P (Perception).
Khan et al. [65] applied the XGBoost classifier to classify personality traits using the MBTI dataset. They achieved maximum accuracy (83%). However, the use of conventional feature sets associated with ML classifiers can further be improved by applying deep learning techniques with automated feature engineering.
Proposed (Our) Model's Performance: The justification for our model's performance enhancement is that we exploited an LSTM model, which is efficient at retaining information over long spans of word sequences; information maintained over a long period is very beneficial for text classification/prediction tasks. In contrast, an individual CNN layer is not capable of retaining the sequential information underlying text, which is needed for text classification problems. Moreover, for a CNN to produce effective results, a large dataset is required.

D. SIGNIFICANCE TEST
In this section, experimentation was conducted to examine whether the improvement of the proposed CNN+LSTM model exploiting advanced features (word embedding) over the classical machine learning model using traditional features (BOW features) is statistically significant and did not happen by chance.
A random extraction of 868 input reviews from the corpus was performed, where each individual input review was classified by CNN+LSTM vs. XGBoost for the I (Introversion)-E (Extroversion) class. In the experimental setting, the null and alternative hypotheses were formulated.
McNemar's test statistic with continuity correction is computed as χ² = (|n − m| − 1)² / (n + m) (Eq. 12), where χ² denotes the chi-squared statistic, the variables n and m indicate the discordant counts, and the test has 1 degree of freedom.

1) DISCUSSION
In the first experiment, we examined and reported (see Table 13) the performance of the XGBoost classifier using the classical feature scheme for I(Introversion)-E(Extroversion) personality class. The XGBoost classifier yielded low performance in terms of each evaluation metric like F1-score, recall, accuracy, and precision. The experimental results revealed the poor performance of the XGBoost classifier exploiting the classical feature set (BOW features) for personality trait detection from the textual content. In the next experiment, the CNN+LSTM model with word embedding features produced considerably improved results (Table 14). The classifier performed a prediction of personality traits underlying text data with an accuracy of 88%.
The significance test was conducted to validate that there exists a significant difference between CNN+LSTM (DL) exploiting word embedding-based features and XGBoost (ML) exploiting traditional features. The number of input texts on which the two models disagreed was 122 (models with distinct feature types exhibit distinct misclassification behaviour), as shown in Table 13. McNemar's test was applied to compute the p-value, giving a chi-squared value of 1.4 with 1 degree of freedom and a two-tailed p-value of 0.239. Finally, the null hypothesis was rejected due to the low p-value (0.239 < 0.5), and the alternative hypothesis was accepted: the proposed CNN+LSTM method, exploiting word embedding, is statistically significantly better than the XGBoost model exploiting traditional BOW features.
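The McNemar computation above can be reproduced in a few lines. The discordant counts n = 68 and m = 54 below are hypothetical values chosen only to be consistent with the reported totals (122 disagreements, chi-squared ≈ 1.4, two-tailed p ≈ 0.239); the paper does not report the individual counts.

```python
import math

def mcnemar(n, m):
    """McNemar's test with continuity correction (Eq. 12).

    n, m: counts of cases where exactly one of the two classifiers
    was correct (the discordant pairs).
    """
    chi2 = (abs(n - m) - 1) ** 2 / (n + m)
    # Survival function of chi-squared with 1 dof: p = erfc(sqrt(x/2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

chi2, p = mcnemar(68, 54)   # 68 + 54 = 122 disagreements (illustrative)
print(round(chi2, 1), round(p, 3))
```

In practice, statsmodels' `mcnemar` function on the 2×2 contingency table gives the same statistic and p-value.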
The statistical analysis mentioned above revealed that the features based upon word embedding significantly enhanced the effectiveness of the proposed CNN+LSTM model for personality trait classification (see Table 16). We also conducted experiments to inspect whether our proposed CNN+LSTM model, exploiting word embedding features, is statistically significantly better than the best-performing classical classifiers for the other personality traits: XGBoost for N (Intuition)-S (Sensing), LR for J-P, and SVM for T-F. The results show that, for J-P, T-F, and N-S as well, the advanced word embedding features significantly enhanced the effectiveness of the proposed CNN+LSTM model on the personality trait classification problem.

V. CONCLUSION
In this work, we investigated the task of personality trait classification from textual content. To accomplish the research task, we proposed applying a deep learning model, namely CNN+LSTM. The proposed study includes the following modules: (i) acquiring data, (ii) pre-processing of data, and (iii) implementing the deep neural network. The proposed CNN+LSTM model for personality trait classification is a merger of CNN and LSTM that assists in classifying the input text into the different personality traits I-E, N-S, T-F, and J-P. The main emphasis of the CNN model is to extract and retain the local features using convolutional and max-pooling layers; CNN acts as a robust tool for choosing the best features, which enhances prediction accuracy. The LSTM model preserves the prior context information, which helps to exploit significant context information from the start of a sentence; its benefit is that it captures sequential information through the examination of prior data. After the final representation of an input sentence is obtained, it is classified among the different personality traits. Experiments with different machine learning and deep learning models were also conducted, and their results were recorded on the personality trait dataset. The results show that the proposed CNN+LSTM model for personality trait classification produced improved accuracy, precision, recall, and f1-score (88% for I-E, 91% for N-S, 85% for T-F, and 80% for J-P). The information obtained from this research can serve as best practice for the selection, management, and optimization of organizational policies, services, and products.

A. LIMITATIONS
The possible limitations of the proposed work are as follows: (i) In this study, we performed personality trait classification on a limited set of personality traits pertaining only to textual content in the English language. (ii) The work is limited to random word embedding, without the exploitation of pre-trained word representation models such as Glove, Fasttext, and word2vec. (iii) An attention mechanism, which assists in extracting relatively significant features, was not introduced with CNN+LSTM. (iv) Other combinations of deep learning models, such as CNN+Bi-LSTM, CNN+GRU, Bi-LSTM+CNN, and CNN+RNN, were not applied for personality trait classification. (v) The current research focuses only on the MBTI dataset for personality trait classification. (vi) A limited number of machine learning classifiers was used for experimentation, which needs to be further extended with other machine learning classifiers.

B. FUTURE DIRECTIONS
The possible future directions of the work are as follows: (i) The work can be extended by conducting experiments in other languages on an extended set of personality traits using non-textual content such as images and videos, (ii) In future work, we can exploit other pre-trained word representation schemes like Glove, word2vec, and Fasttext regarding word embedding layer, (iii) Introducing an attention mechanism regarding personality classification may enhance the system's performance, (iv) We will explore different combinations of deep neural networks like CNN+Bi-LSTM, CNN+GRU, Bi-LSTM+CNN, and CNN+RNN for personality trait classification, and CNN+Bi-LSTM+CNN for facial recognition. (v) In addition to the MBTI dataset, other different datasets regarding personality trait classification tasks can be exploited.
It is planned to collect additional data and apply the proposed methods to larger corpora in order to test the effectiveness of the proposed approaches on massive data records. (vi) Combinations of other deep neural network frameworks may better handle the personality trait classification task; thus, in the future, we will apply other neural networks. (vii) In the future, we will focus on applying ensemble methods to enhance system performance. (viii) We will work on including further base models and on searching for other parameters that may assist in enhancing the overall accuracy of the proposed work.
MUHAMMAD USAMA ASGHAR received the B.S. degree (Hons.) (four-year) in computer science from ICIT, GU, Dera Ismail Khan, Pakistan, where he is currently pursuing the M.S. degree (two-year) in computer science research. His research interests include text mining and social media analytics with emphasis on psychological aspects of online communities. He is a freelance software developer in data science. His recent publications are as follows: 1) An efficient supervised machine learning technique for forecasting stock market trends, URL: https://link.springer.com/chapter/10.1007/978-3-030-75123-4_7; 2) Automatic detection of citrus fruit and leaves diseases using deep neural network model, URL: https://ieeexplore.ieee.org/abstract/document/9481921/authors#authors; and 3) Fake review classification using supervised machine learning, URL: https://link.springer.com/chapter/10.1007/978-3-030-68799-1_19.
MUHAMMAD ZUBAIR ASGHAR is an HEC-approved supervisor recognized by the Higher Education Commission (HEC), Pakistan. His Ph.D. research addresses recent issues in opinion mining and sentiment analysis, computational linguistics, and natural language processing. His publications include more than 50 articles in journals of international repute (JCR and ISI indexed), and he has more than 20 years of university teaching and laboratory experience in social computing, text mining, computational linguistics, opinion mining, and sentiment analysis. Furthermore, he has acted as the Special Session Chair (Social Computing) of the BESC 2018 International Conference (Taiwan) and as a lead guest editor of special issues. He is currently acting as a reviewer and academic editor for different top-tier journals, such as IEEE ACCESS and PLOS ONE.
AURANGZEB KHAN is currently working as an Associate Professor with the Department of Computer Science, University of Science & Technology Bannu. His research interests include big data, data science, machine learning, and web mining. His current projects include: 1) intelligent judicial decision support system based on case and reported judgments; 2) RIFT: Rule induction framework for twitter sentiment analysis; and 3) sentiment analysis-based intelligent system for e-governance.
AMIR H. MOSAVI has highly contributed to the research on climate change risk reduction by coining the term ''predictive-decision model.'' His commercial data-driven computation platform provides the state-of-the-art consultations to policy makers, worldwide, for making informed decisions. Through the accurate anticipation of the consequences of the potential decisions a ''predictive-decision model'' suggests wiser decisions in an automated manner. Having the advanced tools for processing the potential decisions in the climate change realm is, in fact, vital as often the environmentally-friendly choices contradict with the economic and political interests of the nations. In this context, his idea is to support organizations and governments through introducing novel business models, which are sustainable and can maintain the profitability. His novel concept has been particularly found to be suitable to address the sustainability issues of the UNESCO's Biosphere reserve.