Emotion AWARE: an artificial intelligence framework for adaptable, robust, explainable, and multi-granular emotion analysis

Emotions are fundamental to human behaviour. How we feel, individually and collectively, determines how humanity evolves and advances into our shared future. The rapid digitalisation of our personal, social and professional lives means we are frequently using digital media to express, understand and respond to emotions. Although recent developments in Artificial Intelligence (AI) are able to analyse sentiment and detect emotions, they are not effective at comprehending the complexity and ambiguity of digital emotion expressions in knowledge-focused activities of customers, people, and organizations. In this paper, we address this challenge by proposing a novel AI framework for the adaptable, robust, and explainable detection of multi-granular assembles of emotions. This framework consolidates lexicon generation and finetuned Large Language Model (LLM) approaches to formulate multi-granular assembles of two, eight and fourteen emotions. The framework is robust to ambiguous emotion expressions that are implied in conversation, adaptable to domain-specific emotion semantics, and the assembles are explainable using constituent terms and intensity. We conducted nine empirical studies using datasets representing diverse human emotion behaviours. The results of these studies comprehensively demonstrate and evaluate the core capabilities of the framework, which consistently outperforms state-of-the-art approaches in adaptable, robust, and explainable multi-granular emotion detection.


Introduction
The rapid digitalisation of society has empowered knowledge-focussed human activities and communication to transpire on hyper-connected, digital platforms. This spectrum of intrapersonal, interpersonal, and group activities has led to the generation and management of high volumes of big social data that represent patterns of behaviour of individuals and organizations, and how they leverage insights drawn from that information for further engagement and collaborative activities [1]. Expressions of emotion are encapsulated in these digital platforms, which is highly useful for accurately modelling human behaviour [2]. The persistence of this textual digital record enables the use of computational approaches to process, analyse and synthesise emotion expressions. Computational approaches for emotion detection have been classified using several schemes in existing literature. Acheampong et al. [3] proposed three categories: rule-based, machine learning and hybrid methods. Alswaidan et al. [4] proposed a scheme of five categories: keyword-based, rule-based, classical learning, deep learning and hybrid. In reviewing these schemes, we have summarised them into three technical categories: (1) heuristics (which includes keywords, rule-based, probabilistic and statistical), (2) Artificial Intelligence (AI) (consisting of classical learning, machine reasoning and deep learning) and (3) hybrids of the two. Despite the maturity of this topic in terms of classification schemes and the prevalence of many approaches across these three classes, the complexity and ambiguity of emotion expressions on digital platforms have not been fully addressed. We substantiated this challenge of complexity and ambiguity in terms of four capabilities: (1) output (granularity of emotion detection output), (2) domain specificity, (3) adaptability, and (4) explainability.
We conducted a systematic literature review of the state-of-the-art of recent emotion analysis and detection research published in the last five years, from 2018 to 2022. The Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) flow diagram for this review is reported in Supplementary Fig. 1 (Filename: emotionaware supp Fig. 1.docx). The review produced 83 articles that aligned with the selection criteria, which we then evaluated in terms of the four capabilities noted above. Supplementary Table 1 (Filename: EmotionAwareSuppTable1.xlsx) presents the results of this evaluation.
Based on the findings of the literature review and the subsequent evaluation against capabilities, we propose a novel framework for Emotion Assembles With Adaptability Robustness and Explainability (AWARE). This Emotion AWARE framework intervolves heuristics and AI techniques with lexicon generation and finetuned Large Language Models (LLM) into a hetero-hierarchical structure that receives text containing emotion expressions as input and produces as output an assemble of emotions with corresponding intensity values. Emotion assembles can be created at three levels of granularity: two, eight and fourteen. The framework is adaptable as the hetero-hierarchical structure can be revised and reintroduced to reflect a domain or topic of interest. The framework is robust in its ability to detect implied emotion expressions through the context of surrounding terms as well as scale the intensity values based on negations, intensifiers, and inhibitors. The framework is explainable in its identification of terms and phrases for each emotion expression, leading up to a collection of terms that can be used to profile and compare multiple assembles.
In comparison to related work on emotion detection, the Emotion AWARE framework is novel in its construction of emotion assembles with intensity values, and the explainability, adaptability and robustness of these emotion assembles. In terms of approach, AWARE leverages prior knowledge of lexicons and learned knowledge of the finetuned language models, in contrast to the singular approaches adopted in related work, and it is the only approach evaluated on eight datasets (across studies). In terms of output, it produces multi-granular emotion assembles of 2, 8, and 14 emotions with intensity scores, in contrast to the class-based output produced by other methods. In terms of valence and arousal, the proposed framework detects valence across a broad spectrum of 14 emotion categories, and each category is assigned a score from 0 to 1. This scoring reflects arousal levels and is determined while taking modifiers and negations into consideration. All related methods in recent literature are limited to a specific domain or general application, whereas AWARE is intrinsically generic but can be adapted to a domain of interest. This feature is demonstrated in the experimental results (Studies 5 and 6). Explainability, adaptability and modifier resolution are similarly more advanced than those reported in existing literature, mainly due to the effectiveness of the hybrid approach of prior knowledge from lexicons and learned knowledge from finetuned language models.

Literature review
As noted above, we conducted a systematic literature review of the state-of-the-art research on emotion analysis published in the last 5 years, from 2018 to 2022. The PRISMA flow diagram and the evaluation of the selected work against the four capabilities are reported in Supplementary Fig. 1 (Filename: EmotionAwareSuppFig1.docx) and Supplementary Table 1 (Filename: EmotionAwareSuppTable1.xlsx), respectively. Here, we delineate key findings in terms of the three categories: heuristics, AI and hybrids.
Heuristic approaches include keyword recognition, rule-based logical/grammatical affinities, statistical and probabilistic methods. These methods are grounded in emotional lexicons, corpora and dictionaries that represent prior knowledge of how emotion is expressed in that domain or discipline. The emotion lexicon is typically a list of synonyms and related words used for each emotion category, where each word may also be assigned a fixed intensity value. Besides a list, the lexicon can also be organised hierarchically in a tree structure or interlinked as a graph or map structure. Emotion lexicons reported in the literature include Plutchik's emotional terms [5], WordNet-Affect [6], EmoSenticNet [7], DepecheMood [8], and the SentiWordNet dictionaries [9]. Keyword recognition methods [10] rely on locating keywords representing emotions in a given text and assigning an emotion label based on these keyword counts and other statistics. These methods can be used for explicit emotion detection. For example, "their arrival made me happy" explicitly expresses the emotion happiness/joy with the keyword "happy". But often emotions are not explicitly mentioned and can be negated or modified to give different or opposing interpretations than a keyword search method would suggest. In such cases more advanced heuristics are required. Rule-based approaches incorporate text processing methods such as tokenization, part-of-speech tagging, and dependency parsing along with corpora and lexicons to find the most effective rule sets for emotion detection [11,12]. Several other approaches use lexical affinity with the support of lexicons to capture contextual and semantic relatedness to generate probabilistic values for each emotion category [13]. Furthermore, some approaches utilize dimensionality reduction and categorical feature extraction methods such as Latent Semantic Analysis (LSA) [14] and Probabilistic LSA [15] for improved emotion detection [16]. The use of lexicons enables domain adaptation in emotion detection as lexicons can be easily extended or altered to suit the target domain. Furthermore, these methods can be extended for emotion intensity calculation, negation and modifier detection as they can locate the keywords and evaluate the corresponding neighbourhood. However, a major drawback of all heuristic methods is that emotion expressions that are not specified in the lexicon and those that are implied or ambiguous are not detected. For these reasons, methods that are purely based on lexicons do not reach the benchmark performance of AI-based methods [3].
AI-based methods can be subdivided into two groups: conventional supervised learning methods situated in annotated datasets and contemporary transfer learning methods that leverage pre-trained contextual language models. The conventional methods require large, labelled datasets where each sentence, paragraph or segment in the corpus is pre-assigned an emotion category (or label), typically by a human expert. This annotated dataset is used to train a multiclass classification model using supervised learning algorithms. Emotion classification and intensity calculation using XGBoost [17,18], Support Vector Machines (SVM) [19,20], Naïve Bayes (NB) [21,22], k-Nearest Neighbor (kNN) [22] and Decision Trees [23,24] are some prominent techniques reported in related literature. More recently, deep learning algorithms such as Long Short Term Memory (LSTM) networks [25,26], Gated Recurrent Units (GRU) [27,28] and Deep Neural Networks (DNN) [29,30] have also been used in the same supervised learning context but with increased performance. Collectively, supervised learning methods have reported accuracies in the range of 65-80% on benchmark datasets [3]. However, supervised learning methods are impeded by two major limitations: the scarcity of large, domain-independent labelled datasets and the challenge of ambiguous and implicit emotion expressions. More recent AI methods address these limitations by leveraging the semantic context of emotion expressions embedded in pre-trained language models. Unlike supervised methods, these methods can be fine-tuned with smaller labelled datasets using transfer learning. Emotion extraction using variations of BERT [31][32][33], GPT [34,35] and XLNet [36,37] are such methods that leverage the contextual knowledge embedded in language models. These approaches report state-of-the-art accuracies for emotion detection from benchmark datasets in the range of 75-99% [38]. However, this strength is also a weakness due to the limited generalisability across new, unforeseen emotion expressions, as well as intensifiers, inhibitors, and negations of emotion expressions, lack of explainability and constrained domain adaptation. Collectively, these limitations question the practical value of the high accuracies reported in empirical evaluation [39].
Several hybrid methods have also been proposed in the recent literature, combining heuristics with AI methods to improve accuracy and refine the emotion categories. Tzacheva et al. (2019) [20] proposed lexicon-based emotion annotation to train SVM classifiers for emotion extraction in tweets. Wu and Chuang [40] utilized a rule-based approach to extract semantics related to emotions and combined it with a lexicon ontology to extract emotions. Salim et al. [41] presented a self-supervised hybrid methodology for sentiment classification from unlabelled data that combines a machine learning classifier with a lexicon-based strategy. Li et al. [42] proposed a hybrid emotion detection system combining hand-crafted rules and a lexicon with a machine learning based classifier to extract emotional levels in online blogs.
Collectively across all three categories, the practical value of these methods in the management of information and extraction of patterns of behaviour of individuals and organizations is vast. Large scale analyses of social media during elections [43,44], patient-centred care for chronic illnesses such as Alzheimer's disease, cancer, and diabetes [45][46][47], real-time depression detection on social networks [48,49], and expressions of emotion and sentiment during the COVID-19 global pandemic [50][51][52] highlight the practical value in social and individual settings. In organisational settings, financial sentiment analysis [53], understanding consumer satisfaction [54], the role of social media in stock price movements [55], and the influence of review credibility and review usefulness [56] are pivotal studies that signify the continuing and incremental value of emotion analysis in digitalised content for all stakeholders.
In concluding the literature review, we elaborate on the four capabilities and their potency in addressing the challenges of the complexity and ambiguity of digital emotion expressions in knowledge-focused activities. The first capability is the output of the emotion detection approach. In most cases, this is limited to an emotion label without an intensity score for that emotion. This emotion label is also limited to a single granularity which cannot be further analysed in terms of its constituents. Most approaches assign a single emotion per atomic unit of text (sentence, paragraph or document), and overlook the presence of multiple emotions. The second capability, domain specificity, relates to the generalisability of the approach across diverse domains. Most approaches are highly specific to the syntax or semantics of a given domain, such as emotions in short text like tweets [57,58], emotions in poetry [59,60], emotions in code-switched text [61,62], and consumer reviews [63,64]. These are developed using supervised learning and then evaluated using labelled custom datasets, which further limits generalisability and application in diverse domains. Despite the custom datasets, some methods can be adapted (or retrained) for a new application, which is the third capability of adaptability. In recent work that is based on language models and annotated datasets, this capability is limited due to the large number of parameters and the opacity of transformer-based learning. They cannot be adapted without a significant volume of work on configuration and finetuning, which is equivalent to developing an entirely new approach. The fourth capability is explainability of the detected emotion, which is becoming more important given our increasing dependence on AI and automation. Explainability has been overlooked in most approaches, mainly due to design limitations that have focused on producing emotion labels of singular granularity. We do not consider accuracy as a core capability as it can be configured (or tweaked) in the design phase as an offset between the availability of annotated datasets for supervised learning and the need for generalisability across multiple domains. A high-quality human-annotated dataset can be leveraged by a supervised learning approach to produce highly accurate emotion classifications. In summary, the granularity of emotion detection output, domain specificity, adaptability and explainability are the formative capabilities of the proposed method for addressing the complexity and ambiguity of emotion expressions.

Methods
As illustrated in Fig. 1, the Emotion AWARE framework consists of three modules: Module 1-Emotion Language Model Finetuning, Module 2-Emotion Lexicon Generation and Module 3-AWARE Core. The components depicted in grey are external sources feeding into the Emotion AWARE framework, where the general instances we have used in this study can be replaced with specialised instances depending on the domain of application (this is demonstrated in Studies 5 and 6 for the financial and technology sectors).
Module 1 begins with a state-of-the-art language model, such as BERT [65], which has been effectively applied on diverse NLP tasks such as Reading Comprehension [66,67] and Natural Language Inference [68,69]. State-of-the-art language models are pretrained on large volumes of unlabelled data to generate deep contextualised word representations by considering syntax and semantics [70]. In application, these pre-trained models are finetuned using labelled datasets through transfer learning techniques. For this framework, we selected the DistilBERT [71] base-case model with the Huggingface [72] PyTorch implementation for the finetuning. As the finetuning dataset we selected the Emotion dataset [73] due to its substantial size, granularity of emotions, and widespread acceptance in the research community. It contains 20,000 tweets labelled with six emotions: joy (33.5%), sadness (29.2%), anger (13.5%), fear (12.1%), love (8.2%), and surprise (3.6%). For the finetuning, we combined the train and validation sets, randomized them and selected a subset of 5653 samples, with 1000 samples per emotion except surprise, for which only 653 samples were available. The finetuning settings were a default token length of 128, enabled by both padding and truncation, and a batch size of 64 with 8 epochs. At a learning rate of 0.00002 and weight decay of 0.01, the finetuning completed with an F1 score of 0.9394 for the test segment of the dataset. The finetuned language model is utilised by Module 2 for the expansion of a curated list of emotion seed words and in Module 3 for emotion embedding space generation. As noted earlier, DistilBERT can be replaced with any other language model that is closely aligned with the domain of interest.
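The subset selection and finetuning settings described for Module 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `balanced_subset`, the dictionary `FINETUNE_CONFIG` and its key names are hypothetical, and the class sizes of the synthetic pool are invented for the example (only the 653 surprise samples and the 1000-per-class cap come from the text).

```python
import random
from collections import defaultdict

# Hyperparameter values are taken from the text; the key names are hypothetical.
FINETUNE_CONFIG = {
    "max_length": 128,       # token length, with padding and truncation
    "batch_size": 64,
    "epochs": 8,
    "learning_rate": 2e-5,   # 0.00002
    "weight_decay": 0.01,
}

def balanced_subset(samples, per_class=1000, seed=0):
    """Shuffle (text, label) pairs and keep at most `per_class` per label."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    kept, counts = [], defaultdict(int)
    for text, label in shuffled:
        if counts[label] < per_class:
            kept.append((text, label))
            counts[label] += 1
    return kept, dict(counts)

# Synthetic stand-in for the combined train + validation split.
# Class sizes are illustrative, except that surprise has only 653 samples.
class_sizes = {"joy": 6000, "sadness": 5200, "anger": 2400,
               "fear": 2200, "love": 1500, "surprise": 653}
pool = [(f"{label}-{i}", label)
        for label, n in class_sizes.items() for i in range(n)]

subset, counts = balanced_subset(pool)
print(len(subset), counts)
```

Capping each class at 1000 samples keeps the finetuning set close to balanced, with surprise as the only under-represented class.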
Module 2 initiates with an emotion seed word list constructed and curated using a combination of automated and manual methods. In developing our emotion lexicon, we referenced Plutchik's model [74], which identifies eight primary emotion classes, each further divided into three subcategories, resulting in a comprehensive 24-class system. Initially, seed keywords for each of these classes were manually curated from an online thesaurus [75]. However, we encountered a scarcity of unique terms for certain emotions, which necessitated the merging of closely related categories: joy and ecstasy, amazement and surprise, disgust and loathing, interest and vigilance, anger (rage, anger, annoyance), and fear (terror, fear, apprehension). As a result, we consolidated the model into 14 broader emotion classes, each supported by 15-20 thesaurus-derived terms.
While manually curating seed terms yielded high-quality initial seeds, the number of words was insufficient for comprehensive lexicon construction. Therefore, we utilized the vocabulary of the finetuned DistilBERT model itself: we extracted embeddings for each of our seed words and compared them with the raw embeddings of the model's vocabulary terms to find contextually and emotionally similar words. However, due to the ambiguity of individual term embeddings, the relevance of these expanded terms was not highly consistent. To address this, we first clustered the seed words into four subgroups (with k = 4 set via the Elbow method [76]) using the constrained k-means algorithm [77] and then used the average embedding of each subgroup for the expansion. This process extended each subgroup with the 25 most similar terms from the model's vocabulary, aiming for a total of 100 terms for each of the 14 emotion classes. Subsequent refinement involved removing duplicates and terms conflicting with Plutchik's polar opposites to improve the coherence of the lexicon.
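The subgroup-based expansion step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the seed words have already been clustered into subgroups, uses cosine similarity as the similarity measure (an assumption, since the text does not name one), and works on hand-made two-dimensional toy vectors rather than DistilBERT embeddings. The names `cosine`, `centroid` and `expand_subgroup` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def expand_subgroup(seed_words, embeddings, top_k=25):
    """Return the top_k vocabulary terms closest to the subgroup's mean embedding."""
    centre = centroid([embeddings[w] for w in seed_words])
    candidates = [w for w in embeddings if w not in seed_words]
    candidates.sort(key=lambda w: cosine(embeddings[w], centre), reverse=True)
    return candidates[:top_k]

# Toy vocabulary with invented 2-D embeddings (illustrative only).
toy_vocab = {
    "happy": [0.90, 0.10], "glad": [0.80, 0.20],
    "cheerful": [0.85, 0.15], "delighted": [0.95, 0.05],
    "gloomy": [0.10, 0.90], "table": [0.50, 0.50],
}
expanded = expand_subgroup(["happy", "glad"], toy_vocab, top_k=2)
print(expanded)
```

Comparing candidates against the subgroup mean rather than against each seed word individually is what dampens the ambiguity of single-term embeddings noted in the text.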
The resulting vocabulary for each emotion class contained between 80 and 100 terms. To standardize the lexicon, we pruned it by considering the centrality of term embeddings, where we compared each term's embedding to the average category embedding and retained the 80 most pertinent terms per class. The final emotion lexicon comprised 1120 terms across the 14 classes. Table 1 depicts the alignment of the 2, 8 and 14 emotion classification schemes. The version with 8 emotion classes contained 80 words per class with a total of 640 terms. The version with two classes contained 480 words per category with a total of 960 terms. Module 2 also contains externally sourced lexicons for modifiers (inhibitors and intensifiers) and negations, which were based on the valence detection work described in VADER [78]. VADER employs an advanced process that integrates human annotations, heuristic rules, and statistical modelling to determine the valence and polarity of the modifiers. Module 2 provides these two lexicons and the expanded emotion terms as output to Module 3.
Module 3 receives the expanded emotion terms and their corresponding embeddings to generate an emotion embedding space. If the lexicon is constructed from scratch, embedding extraction is skipped, as the words are already tagged with embeddings during the expansion. For external lexicons, each word is passed through the embedding extractor and tagged with the corresponding embedding. The high-dimensional vectors of this emotion embedding space can be visualised using the t-SNE algorithm on a 2-D grid, as shown in Fig. 2. Each point in Fig. 2 corresponds to an emotion term in the 14-emotion categorization, with clear separation between green (positive emotions) and red (negative emotions). Next, the sample input text or an entire text corpus is received by Module 3. This input is pushed through the embedding generator and then projected onto the emotion embedding space. The n nearest neighbour extraction process identifies the closest emotion terms based on this projection. This process is depicted in Module 3, where the nearest neighbours are green dots and all other emotion embeddings are blue. Based on these nearest neighbours, the Intensity Quantification component calculates the intensities of each of the relevant emotion classes. Here each neighbour receives a score based on its proximity to the sample input. The terms are sorted and ranked based on similarity, then grouped by emotion category, and the summed scores for each category are normalised to create the emotion assemble of two, eight and fourteen emotions per input text. See Eq. 1.

Equation 1-Calculating Emotional Intensity

$$\theta_e = \frac{\sum_{x \in A} S_x}{\sum_{x=1}^{n} S_x} \qquad (1)$$

where,
$\theta_e$: intensity of emotion $e$.
$n$: number of nearest neighbours.
$A$: subset of nearest neighbours with emotion $e$.
$S_x$: distance score of the neighbour $x$.

The next phase in Module 3 is the Explainability component. Explainability in AI aims to understand and interpret the output produced by the model. In the context of Emotion AWARE, this is achieved by identifying and extracting the words that have contributed significantly towards forming the emotion profile. Here, term embeddings extracted from the vector representation of the input text are compared with the mean embedding of the entire text. These terms are ranked based on similarity and the top N terms are recorded for explainability and also sent across to the intensity rectification component.

Fig. 2 The emotion embedding space generated by module 3 of the emotion AWARE framework
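The normalisation described for Eq. 1 (summing neighbour scores per emotion category and dividing by the total) can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the function name `emotion_assemble` is hypothetical, the neighbour scores are invented, and each score is assumed to be a proximity value where larger means closer to the input.

```python
from collections import defaultdict

def emotion_assemble(neighbours):
    """Sum neighbour scores per emotion and normalise them to sum to 1.

    `neighbours` is a list of (term, emotion, score) tuples, where score is
    assumed to grow with proximity to the input text.
    """
    totals = defaultdict(float)
    for _term, emotion, score in neighbours:
        totals[emotion] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {emotion: s / grand_total for emotion, s in totals.items()}

# Invented neighbours for an emotionally mixed sentence (illustrative only).
neighbours = [
    ("great", "joy", 0.9), ("wonderful", "joy", 0.6),
    ("awful", "disgust", 0.9), ("horrid", "disgust", 0.6),
]
assemble = emotion_assemble(neighbours)
print(assemble)
```

Because the output is normalised, an input with evenly matched positive and negative neighbours yields near-equal intensities for the polar emotions rather than a single forced label.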
The intensity rectification component consists of two resolution processors, one for modifiers (intensifiers and inhibitors) and one for negations. The terms adjacent to the top N terms are passed through the corresponding lexicons to check for negating, intensifying or inhibiting terms. Modifier resolution is completed prior to negation resolution in order to detect intensified or inhibited negations. For detected intensifiers and inhibitors, the score of the top emotion in the profile is revised depending on the intensity of the modifier. The emotion profile is then normalized so that the increment or decrement of the top emotion affects the other emotions in the profile. In the case of negations, the emotion categories are revised based on Plutchik's polar opposites. See Eqs. 2 and 3.
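The two rectification steps can be sketched as follows. The exact forms of Eqs. 2 and 3 are not reproduced here, so this sketch rests on explicit assumptions: the top emotion's score is scaled by (1 + m), where m is a signed modifier intensity, the profile is then renormalised to sum to 1, and a negation swaps the top emotion's score with that of its polar opposite. The function names are hypothetical; the opposite pairs are those of Plutchik's eight primary emotions.

```python
# Polar opposites among Plutchik's eight primary emotions.
OPPOSITES = {
    "joy": "sadness", "sadness": "joy",
    "trust": "disgust", "disgust": "trust",
    "fear": "anger", "anger": "fear",
    "surprise": "anticipation", "anticipation": "surprise",
}

def normalise(profile):
    """Rescale scores so they sum to 1 (unchanged if all scores are zero)."""
    total = sum(profile.values())
    return {e: v / total for e, v in profile.items()} if total else dict(profile)

def apply_modifier(profile, modifier_intensity):
    """Assumed form: scale the top emotion by (1 + m), then renormalise.

    m > 0 models an intensifier, m < 0 an inhibitor.
    """
    top = max(profile, key=profile.get)
    revised = dict(profile)
    revised[top] = max(0.0, revised[top] * (1.0 + modifier_intensity))
    return normalise(revised)

def apply_negation(profile):
    """Assumed form: swap the top emotion's score with its polar opposite."""
    top = max(profile, key=profile.get)
    opposite = OPPOSITES[top]
    revised = dict(profile)
    revised[top], revised[opposite] = revised.get(opposite, 0.0), revised[top]
    return revised

profile = {"joy": 0.5, "sadness": 0.1, "trust": 0.4}
intensified = apply_modifier(profile, 0.3)
inhibited = apply_modifier(profile, -0.3)
negated = apply_negation(profile)
print(intensified, inhibited, negated)
```

Applying `apply_modifier` before `apply_negation` mirrors the order stated in the text, so that an intensified or inhibited negation is resolved on the already-adjusted profile.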

Algorithm 1 EDGstar_Pathfingding
Algorithm 1 further describes the explainability component and intensity rectification. This algorithm takes the nearest neighbours list and the current emotion profile as inputs and generates as output a rectified emotion profile with emotion keywords for explainability.
Figure 3 illustrates an instance of how AWARE constructs an emotion assemble for a given input text. Each row of Fig. 3 depicts the input text and the relevant components of the output. The neighbourhood size is 50 and the input text is "The movie had a great start, but the ending was awful". Given the emotional ambiguity of this input, the 'Emotion Assemble' presents similar intensity scores for the polar emotions 'disgust' and 'joy'. This is also visible in the neighbour count vector. The explainable emotion terms are 'awful' and 'great', which provide a rationale for the polarity of the emotion assemble.

Results
We designed nine studies that demonstrate the capabilities of the framework for the elicitation of multi-granular, adaptable, robust, and explainable emotion assembles (Table 2). Each study is composed of a set of experiments where the datasets are drawn from a state-of-the-art collection that represents realistic conversations and content on digital media (Table 3). The results generated from this combination of nine studies across eight datasets confirm and validate the effectiveness of the proposed framework in the detection and analysis of emotions expressed in digital media. The same configurations were used for all experiments, including the finetuned language model, modifier and negation lexicons, and scoring and explainability modules. Emotion lexicons/embedding spaces were based on the corresponding 2, 8 and 14 classes.

Study 1: Elicitation of two-emotion assembles (positive and negative) using ISEAR and twitter sentiment datasets
This study demonstrates the generation of two-emotion assembles of positive and negative emotions, the accuracy of which is then validated against existing methods for the same binary classification. We used two datasets, Twitter Sentiment and ISEAR; in the latter we aggregated sadness, anger, fear and disgust as negative and joy as positive. The two-emotion assembles were evaluated against three other methods reported in the literature: (1) linear keyword matching using Plutchik's emotion terms list [], (2) stemmed keyword matching [10] with negation, inhibitor and intensifier detection components and (3) SentiWordNet 3.0 [87]. The evaluation was conducted across four metrics: accuracy, precision, recall and F1-score.

("Affective Text"), ISEAR and fairy tales
As noted previously, the proposed framework is capable of detecting all emotions in Plutchik's wheel of emotions [88]. However, only a handful of related works have proposed techniques to detect all eight emotions. Therefore, we split the eight emotions into two subsets (common and rare) in order to ensure that Emotion AWARE can be evaluated against state-of-the-art approaches in extant literature. Study 2 evaluates the common subset (anger, fear, sadness, joy), while Study 3 evaluates the rare subset (disgust, surprise, anticipation, and trust). In Study 2, we compared AWARE with rule-based, hybrid and machine learning techniques. The rule-based techniques include emotional linear keyword matching, stemmed keyword matching, and the more advanced rule-based methods that consider contextuality and affinity: CLSA, CPLSA and DIM. Here, CLSA and CPLSA are categorical classifications based on LSA and PLSA. Additionally, we compared with context-based emotion vector construction methods [89], namely context-based Wiki, context-based Guten, and context-based W-G. For machine learning methods, we finetuned the DistilBERT [71] model on the Emotion [90] dataset. Collectively, Study 2 compares Emotion AWARE with ten similar techniques proposed in recent literature, using the SemEval 2007, ISEAR and Fairy Tales datasets. For this, we incorporated the experiments included in previous work [16,89]. As presented in Table 5, AWARE outperforms all methods for most combinations of dataset and emotion.

Study 5: Emotion AWARE adapted for the finance sector using the PhraseBank dataset
Domain adaptability is a core capability of Emotion AWARE. In Studies 5 and 6, we demonstrate this capability for the financial and technology sectors. For the financial sector, we used the PhraseBank dataset, which contains financial statements classified for positive and negative emotions. Emotion AWARE was adapted to this domain by simply expanding the vocabulary with 20 words each for the positive and negative classes using the L&M financial emotion lexicon [91]. Following the domain adaptation, two-emotion assembles were generated and compared with the stemmed keyword matching technique, DistilBERT finetuned with the Emotion dataset, and SentiWordNet. Emotion AWARE is used with both the default vocabulary and the vocabulary extended using L&M. Table 8 summarizes the results; notably, AWARE surpasses all methods across all metrics.

Study 6: Emotion AWARE adapted for the technology sector using the Senti4SD dataset
Study 6 is the domain adaptation for the technology sector, where we used the Senti4SD dataset, which contains conversations from the Stack Overflow community classified by emotion. Similar to Study 5, we evaluated the proposed approach with the default vocabulary as well as the extended vocabulary, along with stemmed keyword matching, SentiWordNet, and finetuned DistilBERT. Here both positive and negative classes were extended with 20 words extracted using Emotion AWARE running on the training set. As shown in Table 9, Emotion AWARE outperforms all other methods in this adaptability task.

Study 7: Robustness of Emotion AWARE across intensifiers and inhibitors
Intensifiers and inhibitors are used subjectively in emotion expressions, so an emotion detection method must be robust to them, particularly in digitalised emotion expressions where physical cues are unavailable. To demonstrate this robustness property of Emotion AWARE, we created a new dataset because state-of-the-art datasets used in related work are limited in their inclusion of varying intensifiers and inhibitors. To construct this manually curated dataset, we selected a random subset of 80 sentences from the Fairy Tales dataset and introduced intensifiers and inhibitors to each sentence to generate an additional 160 sentences.

Table 10 demonstrates the evaluation of a single sentence using known intensifiers and inhibitors and their corresponding impact on the emotion score and emotion category. Here the valence and intensity of modifiers are derived from the prior work on VADER [78]. For an incrementing or decrementing modifier, the score of the current top emotion is increased or decreased by a factor of the corresponding modifier intensity, as given in Eq. 2. The emotion profile is then normalized according to Eq. 3. For this experiment we used a sample sentence from the SemEval-2018 dataset. As depicted in Table 10, the base sentence "work was good for the first half" is classified as joy_ecstasy with an intensity score of 0.339 and admire with a score of 0.229. In the subsequent rows, we added intensifiers and inhibitors with varied valence that modify the emotion expressed in the sentence. Moving down the rows of Table 10, the intensity score of the top emotion of the base sentence (joy_ecstasy) decreases. This illustrates that AWARE has correctly identified all modifiers and attributed emotion labels and varied intensity scores accordingly.

The manually curated dataset was used to evaluate Emotion AWARE, SentiWordNet, and stemmed keyword matching. Even though these approaches construct multi-facet emotion profiles, for this experiment we considered only the most significant emotion, as it is the one most affected by such modifications. For instance, if the most significant emotion in the original sentence is joy with a score of x, the score of joy is expected to be > x in the intensified sentence and < x in the inhibited sentence (Table 11). Thus, we tracked the score of the original sentence's most significant emotion in the inhibited and intensified cases to determine whether each approach has correctly identified the modifiers. As the dataset consisted of 80 sentences, we calculated the mean of the most significant emotion score as the evaluation metric. DistilBERT (Emotion) is not included here as it provides only labels (Table 12). As seen in the mean emotion scores, the score for Emotion AWARE increases in the intensified case (0.346) and decreases in the inhibited case (0.161). This shows that AWARE has correctly modified the emotions relative to the corresponding original sentences. Stemmed keyword matching has incorporated the modifiers to some extent, but it is bottlenecked by its limited capture of modifiers. With SentiWordNet, none of the modifiers were detected; it reduced the scores even for intensified sentences.
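The modifier step described above (Eq. 2 followed by Eq. 3) can be illustrated with a short sketch. The exact forms of the two equations are not reproduced in this text, so the code below assumes a multiplicative update, θ* = θ(1 + b·a), and a sum-to-one normalization; the function name `apply_modifier`, the example profile and the valence value are hypothetical and are used only for illustration.

```python
from typing import Dict


def apply_modifier(profile: Dict[str, float], b: int, a: float) -> Dict[str, float]:
    """Rescale the top emotion for one modifier, then renormalize the profile.

    b is the modifier polarity (+1 intensifier, -1 inhibitor) and a is the
    modifier valence, following the variable names used for Eq. 2 and Eq. 3.
    """
    if b not in (1, -1):
        raise ValueError("b must be +1 (intensifier) or -1 (inhibitor)")
    if not profile:
        raise ValueError("profile must contain at least one emotion")

    top = max(profile, key=profile.get)
    updated = dict(profile)
    # ASSUMED form of Eq. 2: theta* = theta * (1 + b * a), floored at zero.
    updated[top] = max(0.0, profile[top] * (1.0 + b * a))

    # ASSUMED form of Eq. 3: divide every intensity by the profile total.
    total = sum(updated.values())
    if total == 0.0:
        return updated
    return {emotion: score / total for emotion, score in updated.items()}


# Hypothetical profile and valence, chosen only for illustration.
base = {"joy": 0.5, "trust": 0.3, "sadness": 0.2}
intensified = apply_modifier(base, b=+1, a=0.3)
inhibited = apply_modifier(base, b=-1, a=0.3)

# Direction check used in Study 7: intensified > original > inhibited.
print(intensified["joy"] > base["joy"] > inhibited["joy"])
```

Under these assumptions, the top emotion moves in the expected direction while the remaining emotions are rescaled so that the profile still sums to one, which mirrors the direction check applied to the most significant emotion in this study.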

Study 8: Robustness of emotion AWARE in negation detection
Similar to Study 7, we randomly selected 80 sentences from the Fairy Tales dataset and manually negated them to create a new dataset of 80 negated sentences. Here we used negation terms such as 'no', 'not' and 'never' to reverse the emotions. We used this dataset to compare the robustness of Emotion AWARE with that of stemmed keyword matching, SentiWordNet and DistilBERT finetuned on the Emotion dataset. Table 13 presents mean F1 scores of emotion detection for original and negated sentences in this dataset. It is interesting to note that although SentiWordNet and DistilBERT show performance comparable to AWARE for the original sentences, they perform poorly for the negated sentences, unlike Emotion AWARE, which achieves an F1 score of 0.841. We hypothesize that this behaviour results from these models' tendency to prioritize emotion-specific terms while disregarding negating words within the sentences. The datasets used in Studies 7 and 8 will be made publicly available as a secondary outcome of this work. The combined dataset consists of 320 sentences (80 each of original, negated, intensified, and inhibited) and is well suited to modifier evaluation.

Study 9 evaluates explainability of the emotion assembles generated by the Emotion AWARE framework, using both intensity scores and terms that contribute to the detection of an emotion. Figure 4 illustrates this capability for a sample sentence randomly selected from the Fairy Tales dataset, "How fortunate I am; it makes me so happy, it is such a pleasant thing to know that something can be made of me". The framework generates intensity peaks for the terms "fortunate", "happy" and "pleasant", which distinguish the contributing terms and their significance in the emotion assemble. These intensities are based on the w_dist scores, as explained in Algorithm 1.
Table 14 presents a further demonstration of explainability with emotion keyword extraction. Here the positive and negative samples are randomly selected from the Fairy Tales dataset. We combined some samples to create a mixed sample. The colour scheme depicts emotion significance, where shades of green are for positive emotions and shades of red are for negative emotions. The intensity scores are depicted on the right side of the image, which further improves the explainability of the emotion assemble.
Table 15 summarizes the emotion keyword results for the entire Fairy Tales dataset. For each sample in the dataset, the top emotion and the top keyword are extracted. The table lists each emotion category (fear, anger, joy, surprise and sadness) along with the 10 most frequent keywords per category. These keywords reflect the corresponding emotions, which further supports the validity of AWARE.
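The aggregation behind Table 15 can be sketched as a simple frequency count. This is a minimal sketch, assuming that each sample has already been reduced to a (top emotion, top keyword) pair; the function name `top_keywords_per_emotion` and the example pairs are hypothetical and do not come from the framework's implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_keywords_per_emotion(
    samples: Iterable[Tuple[str, str]], n: int = 10
) -> Dict[str, List[Tuple[str, int]]]:
    """Count top keywords per top emotion and keep the n most frequent."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for emotion, keyword in samples:
        # Lower-case keywords so that "Happy" and "happy" are counted together.
        counts[emotion][keyword.lower()] += 1
    return {emotion: counter.most_common(n) for emotion, counter in counts.items()}


# Hypothetical (top emotion, top keyword) pairs, one per sample.
pairs = [
    ("joy", "happy"),
    ("joy", "Happy"),
    ("joy", "pleasant"),
    ("fear", "frightened"),
]
summary = top_keywords_per_emotion(pairs, n=10)
print(summary["joy"])
```

Keeping the counts per emotion category makes it straightforward to inspect whether the most frequent keywords are plausible for that category, which is the qualitative check the table supports.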

Discussion
The study of emotion has a vibrant history, beginning with the evolutionary context where Charles Darwin [92] posited that emotions are an expressive behaviour that has evolved to increase our chances of survival, right up to Barrett's [93] constructivist view where an emotion is constructed by cognitively classifying an affect based on past knowledge of that emotion. A multitude of studies have been conducted on the types of emotions, using methods such as philosophical postulations, factor analytic studies, similarity scaling studies, child development studies, cross-cultural studies and facial expression studies. Based on studies of facial expression, Ekman [94,95] proposed six basic emotions: anger, disgust, fear, happiness, sadness and surprise. This was followed by Plutchik's [74] eight primary emotions interlinked by polarity: joy and sadness, trust and disgust, surprise and anticipation, anger and fear. Plutchik also proposed the wheel of emotions, a three-dimensional circumplex that illustrates degrees of similarity/polarity between emotions [74]. The wheel is split into eight sectors for the eight primary emotions; layers within each sector signify varying intensities (for instance with joy, intense joy being ecstasy and less intense joy being serenity), and gaps between sectors represent the mix of two primary emotions.

The more recent digitalisation of emotion expressions has led to new challenges in complexity and ambiguity due to the absence of physical cues and observer inference. Emotion AWARE addresses this complexity and ambiguity of emotion detection through its four capabilities of multi-granular emotion assembles, adaptability, robustness and explainability. Unlike related work in emotion detection, the proposed framework generates emotion assembles based on both the prior knowledge of heuristics and the learned knowledge of the finetuned language models. Drawing upon the literature review, we conducted a capability comparison of Emotion AWARE against the most effective and relevant studies, as tabulated in Table 16.

Following this capability comparison, we developed empirical evidence through the experimental evaluation of Emotion AWARE across nine studies that are based on state-of-the-art datasets containing diverse human emotion expressions. Studies 1-4 evaluate the detection of a spectrum of emotion assembles, starting with binary (or sentiment), the four common emotions from Plutchik's wheel of emotion (anger, fear, sadness, joy), the four rare emotions (disgust, surprise, trust, anticipation), and the increasing granularity of emotions from 2, 4 to 14 categories. In Study 2, Emotion AWARE outperforms a finetuned DistilBERT, highlighting the importance of prior knowledge contained in lexicons.

Adaptability of the framework is demonstrated in Studies 5 and 6, where AWARE was adapted for the finance and technology domains. In Study 5, AWARE demonstrates a 6% improvement in F1-score with an extended vocabulary compared to finetuned DistilBERT. Most related work in recent literature forgoes domain adaptability, where the challenges include frequency and scarcity as well as changing emotion polarity across domains. For example, "unpredictable" is frequently used as a positive emotion expression in film reviews (e.g., "The plot of this movie is fun and unpredictable"), whereas it is a negative expression in financial markets or human resource management (e.g., "the impact on share market indices is unpredictable" or "the employee response to governance is unpredictable") [96]. Language model-based approaches have limited adaptability across domains due to the scale of training data required for finetuning, while lexicon-based approaches require large hand-crafted, domain-specific lexicons [97]. Emotion AWARE is able to overcome both limitations by leveraging a short list of domain-specific terms together with embeddings, which introduce context through meaning and emotion instead of exact matching.

Robustness of the framework is demonstrated in Studies 7 and 8, where implied emotions and the presence of intensifiers, inhibitors and negations are detected and assigned intensity values relative to other emotions expressed in the same text. In Study 8, which demonstrates robustness of negation detection, DistilBERT and SentiWordNet perform poorly in comparison to Emotion AWARE due to their exclusive focus on learned knowledge of emotion expressions. For instance, DistilBERT can accurately identify the emotions of the sentences "I am truly glad to hear it!" (joy) and "I am truly sad to hear it!" (sad) but incorrectly detects the emotion as joy in the negated version "I am truly not glad to hear it!". This highlights the significance of incorporating a heuristic approach to manage negations in Emotion AWARE, enhancing the accuracy of emotion detection. Finally, Study 9 demonstrates the explainability capability, where contributing terms and corresponding intensity scores of emotion assembles effectively unpack and rationalise the detected emotions.
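The role of a negation heuristic in the "not glad" example can be illustrated with a small sketch. The internal negation handling of Emotion AWARE is not specified in this text, so the code below assumes one simple rule: if a negation term appears within a few tokens before an emotion keyword, the detected emotion is flipped to its polar opposite among Plutchik's pairs. The toy lexicon, the window size and the function name `detect_emotion` are hypothetical, and only the three negation terms named in Study 8 are used.

```python
import re
from typing import Optional

# Negation terms named in Study 8.
NEGATIONS = {"no", "not", "never"}

# Plutchik's polar pairs, as listed in the Discussion.
OPPOSITE = {
    "joy": "sadness", "sadness": "joy",
    "trust": "disgust", "disgust": "trust",
    "surprise": "anticipation", "anticipation": "surprise",
    "anger": "fear", "fear": "anger",
}

# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {"glad": "joy", "happy": "joy", "sad": "sadness"}


def detect_emotion(sentence: str, window: int = 3) -> Optional[str]:
    """Return the emotion of the first lexicon keyword, flipped if negated."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for i, token in enumerate(tokens):
        emotion = TOY_LEXICON.get(token)
        if emotion is None:
            continue
        # ASSUMED rule: a negation within `window` tokens before the keyword
        # reverses the emotion to its polar opposite.
        preceding = tokens[max(0, i - window):i]
        if any(t in NEGATIONS for t in preceding):
            return OPPOSITE[emotion]
        return emotion
    return None


for text in (
    "I am truly glad to hear it!",
    "I am truly sad to hear it!",
    "I am truly not glad to hear it!",
):
    print(text, "->", detect_emotion(text))
```

A rule of this kind is deliberately crude, but it shows why a model that attends mainly to the emotion-bearing term ("glad") can miss the reversal that an explicit negation check captures.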
The practical implications of this framework are broad. The robust, domain-adaptable and explainable detection of emotion expressions has wide application value as we increasingly express emotions using digital media. For instance, in a long-term healthcare setting with multiple stakeholders (such as cancer care involving a clinician, patient, and social worker), this framework can be adapted to suit the vocabulary of each stakeholder, and the generated emotion assembles can be explained using the constituent terms. This yields further capabilities of converging or diverging the emotion profiles of all stakeholders for decision value and consensus building among human behaviours in such complex settings.

Conclusion
The exponential transition of knowledge-focussed human activities and communication into digital spaces and physical hybrids has necessitated the manifestation, communication and persistence of our expressions of emotions on digital media. The proposed Emotion AWARE framework enables the objective and unambiguous detection of such emotions, with adaptability, robustness and explainability, for the subsequent generation and management of information that represents patterns of behaviour of individuals and organizations. The results from nine experimental studies confirm its practical value and contribution towards the comprehension of such expressions and behaviour of individuals and organizations. As future work, we intend to address the limitations of Emotion AWARE in complex settings where emotion is implied using either highly technical, jargonistic or informal emoji-based expressions, and figurative expressions of emotion such as metaphors and similes. We will also work on the integration of detected emotions along with other dimensions and modalities of information into the decision-making activities of individuals and organizations.

Fig. 1 The modular composition of the emotion AWARE framework

Equation 2: Rectifying emotional intensity
Equation 3: Normalizing emotional intensity
Here the variables are as follows: $\theta^{*}_{e_k}$ is the updated intensity of the top keyword's emotion, $\theta_{e_k}$ is the current intensity of the top keyword's emotion, $b$ is the modifier polarity (intensifier (+1) or inhibitor (−1)), $a$ is the modifier valence, $\theta_{e}^{\text{normalized}}$ is the normalized intensity of emotion $e$, $\theta_{e}$ is the intensity of emotion $e$, and $E$ is the set of all intensities in the emotion profile. Both the modifier and negation lexicons, as well as the polarities and valences, are based on the prior work on VADER [78].

Fig. 3 An emotion assemble generated by the emotion AWARE framework for mixed polarity sample text

Table 1
Alignment of the 2, 8 and 14 emotion classification schemes

Table 4 presents the results, where Emotion AWARE surpasses all three methods.

Table 2
Nine studies evaluating and demonstrating capabilities of the proposed Emotion AWARE framework
Dataset: manually curated dataset based on Fairy Tales [81]
Study 8. Objective: robustness of Emotion AWARE in negation detection. Dataset: manually curated dataset based on Fairy Tales
Explainability
Study 9. Objective: explaining emotion assembles using constituent intensity scores and terms of emotional significance. Dataset: demonstrated on Fairy Tales dataset

Study 3: Elicitation of four emotion assembles (disgust, surprise, trust, anticipation) using GoEmotions and SemEval-2018
For the rare emotions of disgust, trust, anticipation, and surprise, we used the GoEmotions and SemEval-2018 datasets and compared Emotion AWARE with stemmed keyword matching and a DistilBERT model finetuned on the Emotion dataset. Table 6 presents the results, where AWARE outperforms all other methods across the four emotions.

Study 4: Elicitation of 2, 8 and 14 emotion assembles in increasing granularity
This study demonstrates Emotion AWARE's ability to generate emotion assembles at diverse levels of granularity. Table 7 presents these granular emotion assembles for the same text. Only the emotions with non-zero scores are shown in this table. For instance, row 2 depicts a positive score in the two-emotion assemble, anticipation and trust as the detected emotions in the eight-emotion assemble, and, in the 14-emotion assemble, trust further split into trust, acceptance, and admiration alongside the corresponding intensity scores.

Table 3
Description of datasets used in the experiments, with percentage distribution of each emotion

Table 4
Comparison of results with 95% CI for two-emotion assembles using ISEAR and Twitter

Table 5
Comparison of F1 score with 95% CI for four emotion assembles (anger, fear, sadness, joy)

Table 7
Demonstrating the Elicitation of 2, 8 and 14 emotion assembles in increasing granularity

Table 8
Comparison of results with 95% CI adapted for the finance sector using the PhraseBank dataset

Table 9
Comparison of results with 95% CI when adapted for the technology sector using the Senti4SD dataset

Table 10
Demonstrating the variation of emotion intensity score based on intensifiers and inhibitors

Table 11
Demonstrating robustness across intensifiers and inhibitors for the emotion of a sentence

Table 12
Performance of inhibitor and intensifier detection

Table 13
Results for robustness of emotion AWARE in negation detection

Table 14
Contributing terms and corresponding intensity scores for emotion explainability
Most gracious father, I will show her to you in the form of a beautiful flower, " and he thrust his hand into his pocket and brought forth the pink, and placed it on the royal table, and it was so beautiful that the king had never seen one to equal it
You are so beautiful, I like you very much. 'Tweet, tweet, " sang the bird, as he flew out into the green woods, and Tiny felt very sad.
The little prince was at first quite frightened at the bird. It was like a giant, compared to such a delicate little creature as himself. But when he saw Tiny, he was delighted, and thought her the prettiest little maiden he had ever seen

Table 15
10 most frequent keywords per emotion category in fairy tales dataset

Table 16
Comparison of Emotion AWARE with related work in emotion detection