Semantic Orientation of Crosslingual Sentiments: Employment of Lexicon and Dictionaries

Sentiment Analysis (SA) is a modern discipline at the crossroads of data mining and natural language processing. It is concerned with the computational treatment of public moods shared as text on social networking websites. Social media users express their feelings in conversations through cross-lingual terms, intensifiers, enhancers, reducers, symbols, and NetLingo. However, generic SA research lacks comprehensive coverage of such abstruseness. In particular, existing systems handle the semantic orientation of Crosslingual code switching, capitalization and accentuation of opinionative text poorly, owing to the lack of annotated corpora, computational resources and linguistic processing, and to inefficient machine translation. This study proposes a Heuristic Framework for Crosslingual Sentiment Analysis (HF-CSA) that takes NetLingo, code switching, opinion intensifiers, enhancers and reducers into consideration in order to cope with these intrinsic linguistic peculiarities. The performance of the proposed HF-CSA is examined on a Twitter dataset (ClIfT), and the robustness of the system is assessed on SemEval-2020 Task 9. The results show that HF-CSA outperformed the existing systems and reached 71.6% and 76.18% average accuracy on the ClIfT and SemEval-2020 datasets respectively.


I. INTRODUCTION
The exponential growth of web-enabled technologies and mobile devices is continuously changing the general trends of online communication and collaboration. Mobile devices and their associated technologies have introduced multifaceted, useful and attractive real-world applications for sharing information, facts and sentiments. Consequently, these devices have become an integral part of life for users belonging to all segments of society. Online publishers from diverse demographic areas are shifting towards Web 2.0 for communication and collaboration. Microblogging websites provide one of the most enabling platforms for online users and organizations to generate, update, share and publish sentiments, ideas, suggestions and expressions. The statements and views posted by microblogging users about goods, services and other entities carry vital importance for data scientists and business organizations.
The associate editor coordinating the review of this manuscript and approving it for publication was Wai-Keung Fung.
Market analysts, business owners and multinational organizations invest significant resources to learn the feedback of consumers regarding their products and services.
In the past, business trends and product acceptance rates were evaluated through traditional methods such as surveys. However, the fast and open-for-all social media channels have led to novel scientific techniques of sentiment analysis and user profiling. The statements, suggestions, speculations, ideas, moods etc. published online are termed opinions or sentiments. The techniques of Natural Language Processing and Text Mining are commonly used for Sentiment Analysis (SA). The purpose of this promising field of Data Science is the identification and orientation of users' opinions towards relevant domains [1]. SA techniques offer convenience to both product users and business organizations in restructuring their organizational strategies and policies. Business analysts employ opinion mining systems to generate pulse-reports in order to know ''what the general public thinks'' about the products or services they offer. Big data analytics, opinion mining and sentiment analysis techniques are expedient in discovering trends and patterns that otherwise remain hidden in the ever-growing online data about diverse domains such as social issues, politics, markets, consumer confidence prediction, information retrieval, basket analysis, products and services. These methods provide vibrant insights to market analysts for making fast and intelligent decisions. At the same time, the contemporary trends in online public interaction have brought novel and unique challenges to sentiment analysis researchers. In particular, the use of informal and cross-lingual opinionative content lowers the effectiveness and efficiency of SA systems in discovering and classifying public opinions [1], [2].
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The general public may use demographically constrained terms, NetLingo, tags, emoji, punctuation and linguistically borrowed tokens during their 'natural conversation'. Code switching and multilingual lexica make the task of sentiment analysis even more challenging. Several research studies have been conducted and a number of techniques employed for the classification of public opinions shared online in textual form. These techniques can be grouped into three main sentiment analysis paradigms: supervised sentiment classification, unsupervised classification and lexicon-based methods. The supervised and unsupervised techniques are useful for datasets containing pre-annotated, monolingual and standard texts. However, these methods do not yield satisfactory results for cross-lingual and raw (non-standard) public opinions [2]. It is also observed that the usefulness of generic algorithms is limited when they are applied to natural languages that suffer from a scarcity of computational resources. Sentiment analysis of cross-lingual text has been an active research area and many researchers have contributed to it, but a few language pairs have not received considerable attention due to the scarcity of resources, the structure of the target language and lingual peculiarities. Table 1 presents the significant gaps identified in existing research on cross-lingual sentiment analysis of text. Keeping in view the existing research and lingual peculiarities, the following two major gaps are addressed to make the system more significant and novel.
• Consideration of Informal and Code mixed opinions in cross lingual text.
• Contextualized transformation of Urdu opinions along with enhancer and reducer.
This study proposes a lexicon-based solution for cross-lingual and informal opinion-bearing text. The novelty of the proposed framework lies in the development of an Urdu-English bi-lingual annotated SA lexicon and a classifier for cross-lingual sentiment analysis. We verify the effectiveness of the proposed framework on a Twitter dataset of informal, Crosslingual, code-switched and Anglicized opinion-bearing text, and assess the robustness of the system by comparing it with existing state-of-the-art results for SemEval-2020 Task 9 [3], [4].
This article addresses the following research objectives in order to cope with the above-mentioned gaps:
• Creation of an Urdu-English bi-lingual annotated SA lexicon used in the transformation of Crosslingual sentiment terms.
• Provision of a heuristic framework which can identify, extract and computationally annotate informal as well as cross-lingual opinion bearing tokens along with enhancer, reducer and context shifters.
• Formation of an improved mechanism for text normalization and classification of Urdu-English cross-lingual tokens.
The optimized text normalization and the consideration of Urdu, Roman Urdu and English tokens along with opinion shifters enhance the coverage of sentiment orientation. The experimental evaluation presented in Section IV shows that consideration of these key objectives played a significant role in the semantic orientation of Crosslingual sentiments.
The article is structured as follows. Section II reviews the literature on Crosslingual SA, Section III presents the proposed materials and methods, Section IV demonstrates the experimental outcomes and Section V concludes the work.

II. LITERATURE REVIEW
Sentiment Analysis is the identification and recognition of the feelings and expressions of users, publicly shared in textual form. Nasukawa and Yi [5] first coined the term 'Sentiment Analysis (SA)' in 2003 in their research study on detecting favorability via public sentiments. A second variation uses fastText pre-trained Urdu embeddings, which are available online [6], to obtain word vectors. It provides word vectors of 300 dimensions trained using the CBOW model. They used rule-based NLP techniques and ML algorithms on opinionative data for the analysis and review of organizations and their services. Similarly, the term ''Opinion Mining (OM)'' first appeared in 2003 for opinion orientation and semantic analysis of product reviews [7]. The semantic orientation of user-generated content is of great importance not only to observers and analysts of numerous organizations; socio-monitoring can also help in reshaping the decision-making process [8], [9], [10]. Natural Language Engineering has proved to be the key source in the process of sentiment analysis and text normalization for formal as well as informal opinion-bearing text [11], [12]. Besides formal and standard opinionative content, contemporary user collaboration contains colloquial as well as non-standard multilingual content, which poses a number of challenges in mining feelings and moods [13].
The extraction and summarization of these cross-lingual opinions is a complicated task. As mentioned above, the three focal paradigms for the digital orientation of users' sentiments into generic classes (positive, negative and neutral) are based on the supervised, semi-supervised and unsupervised approaches. Kiritchenko et al. [9] tested a supervised ML system on informal SMS messages and tweets in which tweet-based opinionative lexica are generated through emoticons and tags. Amiri and Chua [18] proposed an optimization algorithm for SA of urban and slang terms. Sarker and Gonzalez [24] handled non-standard subjective tweets using a Support Vector Machine (SVM) based classifier.
Lately, multi-lingual and cross-lingual SA and negation handling have gained noticeable attention due to the popularity of microblogging sites among users from diverse areas [25]. Machine Translation (MT) systems, bilingual vector space embeddings and multilingual lexica have been tested for multilingual SA. The Single Linear Transformation (SLT) technique is employed to identify cross-lingual sentiments in English, Spanish and Chinese [16]. The extracted text is first translated into English and then SA is carried out. Lucas Brönnimann [15] performed SA on multi-lingual tweets of Swiss politicians using dictionaries and universally comprehensible emoticons that do not depend upon any specific natural language. Dashtipour et al. [14] tested 11 methods on two corpora and related their low precision to the lack of information provided in the existing studies and datasets. Mozetič et al. [26] developed classifiers using manually annotated tweets in 13 languages and tested them with 6 different classification algorithms. Their study concluded that i) there is no explicit difference in the performance of the classifiers and ii) the classification results are proportional to the refinement and volume of the training data. In stark contrast to the manual annotation of tweets, Adel et al. [27] proposed a model to handle the cross-lingual SA of Arabic text without tagging. They employed a feature reduction mechanism to find the desired solutions. SentiUnit is a baseline approach used in the sentiment analysis of product and movie reviews, in which the linguistic structure, grammar, morphology and technical aspects of the Urdu language are highlighted [28]. Bilal et al. [19] used Naïve Bayes (NB), Decision Tree (DT) and KNN for the semantic orientation of Roman Urdu opinions and concluded that NB produced better results than DT and KNN, but they ignored other formal and informal English opinionative content. Ruifeng et al. [17] used Multi-Kernel SVMs in cross-lingual SA of English and Chinese. They reported that opinion holder extraction is a prominent indicator in cross-lingual SA.
Similarly, translation of one human language into another via computer-mediated devices requires proper computational knowledge as well as linguistic resources. Parallel corpora are one such resource for many applications of natural language processing and play a prodigious role in machine translation. Although their availability is scarce due to limits on size and on the quality of vocabulary coverage, they play a vital role in MT systems. Initially, text for statistical machine translation (SMT) tasks was collected opportunistically from web resources [29]. Now, it is indispensable to utilize parallel text corpora with a few semantic rules in order to improve SMT systems. Such corpora are the prerequisites of many research activities like machine translation, multilingual analysis, and the creation of multilingual high-range lexicons [30].
Although a parallel corpus plays a significant role in the sentiment orientation of the public's attitudes, it is difficult to create such a corpus for a scarce-resource language [31]. A few languages are considered rich due to the availability of parallel corpora along with other lingual resources, whereas languages lacking these lingual materials are termed resource-poor languages [32]. Gale and Church [33] compiled the first ever parallel corpus, for the English-French language pair, in 1991. Similarly, a few high-quality parallel corpora are publicly available for Hungarian-English, Dutch-English, Bulgarian-English, Czech-English, Greek-English, Spanish-English, Italian-English, Portuguese-English, Romanian-English, German-English and Swedish-English [31], [34], [35], [36]. Although there is a massive amount of parallel corpora for resource-rich languages, they are lacking for under-resourced languages such as Lao, Vietnamese and Urdu, and machine translation for such language pairs suffers from the unavailability of these useful resources [37]. Machine translation depends not only on the quantity of parallel corpora; quality is also a big factor in translating one language into another. In fact, an average machine translation system needs up to 100K sentences and a parallel corpus of 50M-1000M words [38]. Machine translation can be performed using rule-based, example-based, statistical and hybrid techniques [22]. The rule-based technique uses syntactic and semantic rules together with lexicons, dictionaries and corpora, the statistical and example-based techniques use parallel corpora, and the hybrid method combines the best features of both the statistical and rule-based techniques. Table 1 summarizes the comprehensive literature review conducted to unfold the gaps of Sentiment Analysis in the Crosslingual setting in order to make things more rational and transparent. It provides the list of gaps, methodologies and languages covered so far in cross-lingual sentiment analysis (CLSA). Existing work shows that experiments on CLSA are conducted only for a few language pairs, mainly taking English as the source and Chinese, Spanish, German, Japanese and French as the target language, which ultimately limits the orientation and promotion of CLSA for scarce-resource languages such as Urdu, Hindi and Punjabi. Further, it is observed that the accentuation of opinions, enhancers, reducers and context shifters has also been ignored in sentiment orientation in the Crosslingual setting. Keeping in view the identified gaps, a heuristic framework is proposed that considers informal, accentuated and code-mixed opinions in the Crosslingual setting, and it proves to be a significant solution for the identification and orientation of cross-lingual as well as informal content of the English-Urdu language pair.

III. MATERIALS AND METHODS
A framework is proposed to find the public sentiments of cross-lingual text by employing lexicon- and dictionary-based methods. Figure 1 illustrates the flow of the proposed heuristic framework for Crosslingual sentiment analysis. HF-CSA comprises the following essential steps:

A. DATASET PREPARATION AND LEXICON COMPILATION
The input text for classification is extracted from social media sites, but the extracted text is not always in a form suitable for classification, as a number of tags, symbols and undesired data are associated with it. Text must therefore be preprocessed before the classification process. The efficiency of sentiment classification depends on the quality of the text, so text preprocessing is an essential phase of every sentiment classification process: public moods cannot be mined properly if the source text is not clean. Dataset preparation is therefore performed in the following manner:

1) DATASET PREPARATION
As mentioned earlier, the major objective of this study is to explore the cross-lingual, informal and code-mixed content of the English-Urdu language pair. Accordingly, tweets and sentences carrying English, Urdu and Roman Urdu opinion-bearing sense, along with opinion enhancers and reducers, are marked in the inclusion criteria. Search queries for these inclusion criteria were applied, which led us to the SemEval-2020 (Hinglish) and ClIfT datasets. The datasets used in the assessment of HF-CSA are described as follows:
1) ClIfT_Dataset (Crosslingual and Informal Text)
Extracted contents are then passed to the next phase for linguistic preprocessing.
2) SemEval-2020 Task 9 Dataset
Code mixing and crosslingualism are common in regions having multilingual or at least bilingual speakers. According to a survey there are 630 million speakers of Hindi and Urdu, whereas India and Pakistan have 30 different languages with more than 1 million speakers [36]. In SemEval-2020 Task 9, code-mixed tweets of the Hinglish (Hindi-English) and Spanglish (Spanish-English) language pairs are presented [2]. The organizers released two public corpora for the research community. The data is extracted from social networking channels, and 20k and 19k tweets are collected and annotated for the Hinglish and Spanglish datasets respectively. The task, named Sentimix, aims to predict the sentiment of a given code-mixed tweet. A word- and phrase-level orientation of positive and negative tweets of both language pairs has been performed and the text is annotated with lingual as well as polarity labels. Table 2 presents the statistics of the SemEval-2020 Task 9 datasets. In this research, Hinglish (Hindi-English) is considered for the assessment of HF-CSA, as the proposed study aims to cope with the Roman Urdu-English language pair and the romanized versions of Hindi and Urdu have similar transliteration.

2) LEXICON COMPILATION
The proposed lexicon is a novel contribution of this study, as a sophisticated lexical resource for the combination of informal text and the Urdu-English crosslingual pair is lacking. This section discusses the lexicon adaptation. The contextualized informal and Crosslingual lexicon is compiled in two core steps.
1) Identification and collection of Crosslingual tokens for target language pair. 2) Assignment of contextualized definition polarity labels to each opinionative token via adaptation of existing resources for more relevant scoring of sentiments.
1) Identification and collection of Crosslingual tokens for target language pair. The major consideration in the identification and collection of Crosslingual opinionative tokens is the utilization of frequently used Urdu-English opinionative terms from social media sites. In addition to these social networking channels, a huge collection of NetLingo, Crosslingual and informal opinion-bearing terms is fetched from existing resources. Table 3 presents the links of these resources.
In Eqs. 1, 2 and 3, LCT = List of Crosslingual Terms, LST = List of Slang Terms and LIT = List of Informal Terms. Similarly, CLT = Tweet having Crosslingual terms, ST = Tweet having slang terms and IT = Tweet having informal terms. A unified list is created from Eqs. 1, 2 and 3 by merging the extracted Crosslingual, slang and informal words in order to assign appropriate definitions and annotations.
2) Assignment of improved definition and polarity labels to each opinionative token via adaptation of existing resources for more relevant scoring of sentiments. The assignment of improved definitions is ensured via lingual preprocessing and part-of-speech tagging, whereas scores and polarity labels are assigned through the adaptation of the publicly available sentiment lexicon SentiWordNet (SWN) [39]. The entries of the proposed lexicon are mapped to SWN in the following manner.
$E_i = \langle T, Trans, swn.id, PoS, Scr, Pol \rangle$
where T, Trans, swn.id and PoS represent the term, its transliteration, its SentiWordNet ID and its part of speech respectively. Similarly, Scr and Pol represent the score and the polarity respectively. The objective (a.k.a. non-opinionated) tokens are excluded in subjectivity classification. A word or phrase is marked as subjective if it conveys some positive or negative sentiment; otherwise it is treated as objective. Part-of-speech tags and subjective information are employed to filter and classify the opinionative content. Table 4 presents a partial list of the lexical entries of the proposed Crosslingual lexicon. In order to associate a semantic orientation with each opinionative category we adopted SentiWordNet, due to its high volume of opinion categories and its frequent updates of words along with their senses. SWN associates three polarity labels, Positive, Negative and Neutral, with each opinion category based on the orientation and semantic value of the opinion word. The semantic orientation ranges between 0.0 and 1.0, and for a term having multiple senses the average semantic orientation is computed over its senses (Eq. 4), where PosScore and NegScore represent the positive and negative scores of the word and ns denotes the number of senses appearing in SWN for each target (searched) term. Similarly, the dominant semantic orientation in SWN is accessed to reach the relevant polarity of a term, which is then labeled as positive, negative or neutral based on its semantic score, as shown in Eqs. 5, 6, 7 and 8. In addition, the proposed lexicon is refined by excluding the neutral (non-opinionative) tokens on the basis of a dominant objective value.
In these equations, SemOswn represents the semantic orientation of each word accessed from SWN using Eqs. 5, 6 and 7. Similarly, SemO+, SemO− and SemOn represent the semantic orientation of positive, negative and neutral words.
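The averaging over senses and the labeling step described above can be illustrated with a short sketch. The exact forms of Eqs. 4 to 8 are not reproduced in this text, so the rules below are assumptions: the positive and negative scores are averaged separately over the ns senses, the larger average decides the label, and a term counts as objective when its objective share, taken as 1 − (PosScore + NegScore) as in SentiWordNet, dominates. The function names and the sample sense scores are hypothetical.

```python
from typing import List, Tuple


def average_orientation(senses: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average (PosScore, NegScore) pairs of one term over its ns senses."""
    ns = len(senses)
    if ns == 0:
        return 0.0, 0.0
    pos = sum(p for p, _ in senses) / ns
    neg = sum(n for _, n in senses) / ns
    return pos, neg


def polarity_label(pos: float, neg: float) -> str:
    """Assumed rule: the dominant averaged score decides the label."""
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def is_objective(pos: float, neg: float) -> bool:
    """Assumed exclusion rule: the objective share dominates both polar scores."""
    obj = 1.0 - (pos + neg)
    return obj > max(pos, neg)


# Hypothetical sense scores for one term with two senses.
senses = [(0.75, 0.0), (0.5, 0.125)]
pos, neg = average_orientation(senses)
label = polarity_label(pos, neg)
```

Under these assumptions the sample term averages to (0.625, 0.0625) and is kept in the lexicon as a positive entry.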

B. LINGUISTIC PREPROCESSING
Linguistic preprocessing and text normalization are performed on the extracted content in order to produce quality input. Formal and informal opinion indicators such as verbs, adverbs, adjectives, NetLingo, anglicized terms, enhancers, reducers, cross-lingual Urdu terms and emoji are considered in the scope of the desired input. Stop words and other lexemes are discarded as noise. The normalization is carried out in the following manner:
Tokenization of Extracted Text: The first step is to tokenize the extracted content into sentences and subsequently into tokens. A Python toolkit is used for tokenization of the data.
Stop Words Removal (SWR): Frequently used terms having no significance in sentiment orientation are removed for the sake of fast processing. In addition to the commonly used stop words, each domain has its own list of stop words. Table 5 presents an example of the initial preprocessing steps. The Python Natural Language Toolkit, which contains corpora in multiple languages, is used for the desired stop word removal.
Lemmatization: The tagged sentences/tokens are passed to the lemmatization phase to remove the ambiguities of inflected forms. Lemmatization is the process of converting inflected forms into their root or base forms. In this phase, all inflected tokens are converted into their base form using the WordNet Lemmatizer.
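The three normalization steps can be sketched as a small pipeline. This is a dependency-free illustration, not the study's implementation: the paper uses NLTK and the WordNet Lemmatizer, which are replaced here by a regular-expression tokenizer, a tiny hand-written stop word set and a toy lemma table. All names, the stop word list and the lemma table are assumptions made for the example.

```python
import re
from typing import List

# Toy stand-ins (assumptions) for the NLTK stop word corpus and the
# WordNet Lemmatizer used in the paper.
STOP_WORDS = {"the", "of", "this", "are", "is", "a", "and", "it"}
LEMMA_TABLE = {"cameras": "camera", "phones": "phone", "loved": "love"}


def tokenize(text: str) -> List[str]:
    """Split text into sentences, then into lower-cased alphabetic tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tokens: List[str] = []
    for sentence in sentences:
        # Lower-casing discards capitalization, so emphasis such as "LOVE"
        # would have to be recorded before this step in a full system.
        tokens.extend(re.findall(r"[A-Za-z]+", sentence.lower()))
    return tokens


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Drop frequent terms that carry no sentiment orientation."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def lemmatize(tokens: List[str]) -> List[str]:
    """Map inflected forms to a base form; unknown tokens pass through."""
    return [LEMMA_TABLE.get(tok, tok) for tok in tokens]


def preprocess(text: str) -> List[str]:
    return lemmatize(remove_stop_words(tokenize(text)))


cleaned = preprocess("The cameras of this phone are zabardast. Loved it!")
```

Note that the Roman Urdu token "zabardast" passes through unchanged; it is left for the cross-lingual identification phase.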

C. LINGUISTIC PROCESSING AND SEMANTIC ORIENTATION OF CROSSLINGUAL SENTIMENTS
Linguistic processing involves the annotation and classification of formal, informal, NetLingo and Crosslingual opinion-bearing terms. It involves the following essential steps:
Cross-lingual, implicit and Informal Text Identification (CLTI): In this phase, besides formal English opinion words, cross-lingual words and sentences are identified and categorized for appropriate classification. A cross-lingual (English and Urdu) lexicon is employed to identify tokens which remained unidentified in the previous phases of normalization. The tokens extracted from cross-lingual corpora are included in the preprocessed dataset for subsequent experimentation. Similarly, slang, NetLingo and anglicized terms are defined using manually compiled informal resources. The Python Natural Language Toolkit (NLTK) is used to import cross-lingual corpora and lexicons in order to identify the Urdu language text. Crosslingual term identification for Urdu and informal text is one of the core contributions of the proposed HF-CSA.
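A minimal sketch of the identification step is given below, assuming the cross-lingual lexicon is held as a dictionary keyed by the Roman Urdu term. The entry layout loosely follows the lexicon fields described in Section III-A (term, English equivalent, part of speech, score, polarity); the two sample entries, their scores and all function names are hypothetical and are not taken from the study's lexicon.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LexiconEntry:
    term: str       # Roman Urdu token as it appears in the tweet
    english: str    # English term used to replace the token
    pos_tag: str    # part of speech
    score: float    # sentiment magnitude in [0, 1] (illustrative value)
    polarity: str   # "positive" or "negative"


# Hypothetical sample entries; a real lexicon would be loaded from a file.
CROSSLINGUAL_LEXICON: Dict[str, LexiconEntry] = {
    "zabardast": LexiconEntry("zabardast", "excellent", "adj", 0.75, "positive"),
    "bakwas": LexiconEntry("bakwas", "nonsense", "noun", 0.5, "negative"),
}


def identify_crosslingual(tokens: List[str]) -> List[LexiconEntry]:
    """Return the lexicon entries for tokens recognized as cross-lingual."""
    return [CROSSLINGUAL_LEXICON[t.lower()] for t in tokens
            if t.lower() in CROSSLINGUAL_LEXICON]


def replace_crosslingual(tokens: List[str]) -> List[str]:
    """Replace each recognized cross-lingual token with its English term."""
    replaced = []
    for tok in tokens:
        entry = CROSSLINGUAL_LEXICON.get(tok.lower())
        replaced.append(entry.english if entry else tok)
    return replaced


tokens = ["camera", "zabardast", "battery", "bakwas"]
found = identify_crosslingual(tokens)
english_tokens = replace_crosslingual(tokens)
```

Replacing the token with an English term lets the later scoring phase reuse the same SentiWordNet lookup for formal and cross-lingual words.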

D. IDENTIFICATION AND LABELLING OF POLARITY ENHANCERS AND REDUCERS
Twitter data comprises opinion enhancers as well as reducers. Frequently used emoticons with positive expressions and character emphasis on target opinion terms are treated as enhancers, whereas negators and context shifters act as polarity reducers. This study follows explicit lists of frequently used enhancers and reducers, and an emphasized score is used for the target accentuated term.
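The handling of accentuated terms and negators can be sketched as follows. The source does not state how the emphasized score is derived, so the rules here are assumptions: a term counts as emphasized when it is fully capitalized or contains a character repeated three or more times, emphasis multiplies the score by a hypothetical factor of 1.5, a preceding negator flips the sign of the next opinion term, and the result is clamped to [−1, +1]. The negator list and all names are illustrative.

```python
import re
from typing import Dict, List

ENHANCER_FACTOR = 1.5                       # assumed emphasis weight
NEGATORS = {"not", "never", "no", "nahi"}   # assumed short reducer list
_REPEAT = re.compile(r"(.)\1{2,}")          # a character repeated 3+ times


def is_emphasized(token: str) -> bool:
    """Treat ALL-CAPS or elongated tokens ("goooood") as accentuated."""
    return (len(token) > 1 and token.isupper()) or bool(_REPEAT.search(token))


def normalize_emphasis(token: str) -> str:
    """Collapse elongated runs to two characters and lower-case the token."""
    return _REPEAT.sub(r"\1\1", token).lower()


def adjust_score(score: float, emphasized: bool, negated: bool) -> float:
    """Apply the assumed enhancer and reducer rules, clamped to [-1, 1]."""
    if emphasized:
        score *= ENHANCER_FACTOR
    if negated:
        score = -score
    return max(-1.0, min(1.0, score))


def score_tokens(tokens: List[str], lexicon: Dict[str, float]) -> List[float]:
    """Score opinion terms; a negator applies to the next opinion term."""
    scores: List[float] = []
    negate_next = False
    for tok in tokens:
        if tok.lower() in NEGATORS:
            negate_next = True
            continue
        base = lexicon.get(normalize_emphasis(tok))
        if base is not None:
            scores.append(adjust_score(base, is_emphasized(tok), negate_next))
            negate_next = False
    return scores


toy_lexicon = {"good": 0.5, "love": 0.5}
plain = score_tokens(["good"], toy_lexicon)
negated = score_tokens(["not", "good"], toy_lexicon)
shouted = score_tokens(["LOVE", "it"], toy_lexicon)
```

Collapsing elongated runs to two characters is a simplification: it recovers "good" from "goooood" but would leave "sooo" as "soo", so a production system would need a dictionary check at this point.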

E. SYNTACTICAL TAGGING (PoS TAGGING)
The role of proper annotation and syntactical PoS tagging is vital in Word Sense Disambiguation (WSD). This step is significant in sentiment classification, as it helps to decide whether a lexeme or piece of text is opinionative or not. The Python NLTK has its own PoS tagger, but we utilized the senses of SWN to reach a more relevant orientation of the target word.

F. SUBJECTIVITY CLASSIFICATION
Subjectivity classification is performed in order to separate opinionative content from non-opinionative content. The target sentence is scanned for subjectivity: a sentence is marked as subjective if it contains one or more opinionative tokens, either formal or informal and including Urdu language terms; otherwise the sentence is marked as objective. Table 6 presents the description of the key opinionative features. A subjectivity lexicon is used for the identification and classification of subjective and objective content.
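The subjectivity rule stated above (a sentence is subjective if it contains at least one opinionative token) can be written directly. The sketch assumes sentences are already tokenized and that the subjectivity lexicon is a plain set of lower-cased terms; the sample terms and function names are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical subjectivity lexicon mixing formal, informal and Urdu terms.
OPINION_TERMS: Set[str] = {"good", "bad", "zabardast", "bakwas", "lol"}


def is_subjective(tokens: List[str], opinion_terms: Set[str]) -> bool:
    """A sentence is subjective if it has one or more opinionative tokens."""
    return any(tok.lower() in opinion_terms for tok in tokens)


def split_by_subjectivity(
    sentences: List[List[str]], opinion_terms: Set[str]
) -> Tuple[List[List[str]], List[List[str]]]:
    """Separate tokenized sentences into (subjective, objective) lists."""
    subjective, objective = [], []
    for tokens in sentences:
        (subjective if is_subjective(tokens, opinion_terms) else objective).append(tokens)
    return subjective, objective


sentences = [
    ["camera", "zabardast"],          # contains an Urdu opinion term
    ["phone", "released", "today"],   # no opinion term
]
subj, obj = split_by_subjectivity(sentences, OPINION_TERMS)
```

Only the sentences in `subj` would be passed on to the scoring phase.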

G. SENTIMENT SCORING OF FORMAL, INFORMAL AND CROSS-LINGUAL TEXT
The semantic orientation of sentiments is attained via a lexicon-based strategy. A lexicon-based algorithm carries out classification on the basis of sentiment lexica, in which the sentiment scores decide the positivity or negativity of a sentiment. In this method of classification, each sentiment word is assigned a positive or negative semantic orientation (SO) value based on the target lexicon's score, and a statistical rule-based strategy is adopted that assigns a score to each opinion indicator in order to calculate the sentence- or document-level polarity. Previously, Taboada et al. [40] built a lexicon-based algorithm named ''SO-Calculator'' in which lexicons are used for the semantic orientation of reviews on the basis of opinion indicators. Similarly, Turney [6] utilized a lexicon-based semantic orientation algorithm for rating positive and negative reviews. This algorithm is composed of three steps: (i) extraction of phrases having adjectives and adverbs, (ii) estimation of semantic orientation (SO) using PMI-IR (Pointwise Mutual Information and Information Retrieval), and (iii) classification on the basis of the average SO. The preprocessed subjective text is the input of the cross-lingual identification phase for the semantic orientation of formal, informal and Crosslingual sentiments. The sentiment resources and Crosslingual lexica are then used to assign a score to individual tokens by looking up their sentiment scores in SentiWordNet (SWN). As mentioned earlier, SWN is a publicly available lexical resource used in the scoring of sentiments and opinions [39]. It is an extension of WordNet and assigns three labels, Positive, Negative and Neutral, to each synset of WordNet. Our experimental setup revealed that the lexicon-based algorithm has higher technical viability and coverage than supervised learning in handling the abstruseness of text, such as enhancers, reducers and cross-lingual sentiment.

H. CROSS-LINGUAL BASED OPINION SUMMARY
In the last phase of sentiment analysis, the aggregated polarity is computed for a complete sentence or Tweet.

$$Tweet_{Score} = \sum_{i} SO(StdOps_i) + \sum_{j} SO(NstdNetlingos_j) + \sum_{k} SO(Crosslings_k) \tag{9}$$
where StdOps, NstdNetlingos and Crosslings represent Standard, Non-Standard (informal, NetLingo) and Crosslingual tokens respectively. Eq. 9 shows that the tweet sentiment score is the sum of the scores of all standard, non-standard, NetLingo and cross-lingual opinionative indicators appearing in each tweet.
$$\text{Crosslingual Opinion Summary} = \frac{Tweet_{Score}}{SC} \tag{10}$$
where SC denotes the Sentiment Counter, i.e. the count of sentiments in each target tweet. It counts, in parallel with scoring, the number of sentiment indicators appearing in each tweet in order to keep the score within the given range, as the score for each tweet ranges between −1 and +1. The Crosslingual and NetLingo score is calculated using SentiWordNet: the term is first searched in SWN and the relevant score is then calculated from its senses, where n denotes the number of senses appearing in SWN for each target (searched) term.
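Eqs. 9 and 10 amount to summing the semantic orientation scores of all opinion indicators in a tweet and dividing by their count. A minimal sketch is shown below, assuming each indicator has already been scored in the range [−1, +1]; the function names, the zero-indicator convention and the sample scores are illustrative assumptions.

```python
from typing import List


def tweet_score(std_ops: List[float],
                nstd_netlingos: List[float],
                crosslings: List[float]) -> float:
    """Eq. 9: sum of SO scores over standard, non-standard/NetLingo
    and cross-lingual opinion indicators of one tweet."""
    return sum(std_ops) + sum(nstd_netlingos) + sum(crosslings)


def crosslingual_opinion_summary(std_ops: List[float],
                                 nstd_netlingos: List[float],
                                 crosslings: List[float]) -> float:
    """Eq. 10: tweet score divided by the Sentiment Counter (SC)."""
    sc = len(std_ops) + len(nstd_netlingos) + len(crosslings)
    if sc == 0:
        return 0.0  # assumed convention for tweets with no indicators
    return tweet_score(std_ops, nstd_netlingos, crosslings) / sc


# Hypothetical indicator scores for one tweet.
summary = crosslingual_opinion_summary(
    std_ops=[0.5, 0.25],       # standard English opinion words
    nstd_netlingos=[-0.25],    # slang / NetLingo term
    crosslings=[0.75],         # Roman Urdu term
)
```

Because every indicator score lies in [−1, +1], their mean does too, which is how dividing by SC keeps the tweet-level score in the stated range.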

IV. EXPERIMENTAL EVALUATION AND RESULTS
This section presents the outcomes and comparative analysis of HF-CSA (Heuristic Framework for Crosslingual Sentiment Analysis). The experimental performance is evaluated on the ClIfT_Dataset (Crosslingual and Informal Text). A total of 15486 tweets covering January 2019 to July 2019 are extracted for product reviews including mobile phones (Samsung, Vivo, Oppo and iPhone), and the robustness of the proposed system is assessed on SemEval-2020 Task 9. The detailed description of the datasets is given in Section III-A, and the statistics of the ClIfT_Dataset and the SemEval-2020 Task 9 Hinglish dataset are shown in Table 7 and Table 8 respectively. As mentioned earlier, the performance of HF-CSA is assessed on the ClIfT and SemEval-2020 datasets using the standard evaluation parameters considered by state-of-the-art studies: Precision, Recall, F-Measure and Accuracy. The rationale behind adopting these metrics is to make our comparison with existing studies more adequate and transparent.
A. PRECISION
$$PPV = \frac{TP}{TP + FP}$$
and
$$PNV = \frac{TN}{TN + FN}$$
where PPV and PNV denote Precision for Positive and Precision for Negative respectively. A lower precision indicates that a high number of negative tweets are labeled as positive, while a higher precision means that fewer negative tweets are incorrectly labeled as positive.

B. RECALL
In sentiment classification, recall signifies the probability that relevant tweets are retrieved.

$$RPV = \frac{TP}{TP + FN}$$
and
$$RNV = \frac{TN}{TN + FP} \tag{14}$$
where RPV and RNV denote Recall for Positive and Recall for Negative respectively. Recall measures the share of tweets belonging to a class that are correctly classified. A higher recall indicates that fewer positive tweets are incorrectly labeled as negative.

C. F1-MEASURE
F1-Measure signifies the harmonic mean of Precision and Recall, mathematically denoted as below:
$$F1 = \frac{2 \cdot P \cdot R}{P + R}$$
where P and R denote Precision and Recall respectively.

D. ACCURACY
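Accuracy in SA signifies the state of being correct in terms of performance, i.e. the share of all tweets that are classified correctly.
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
Algorithm (closing steps of the cross-lingual scoring function):
    search crosslingual_lexicon
    get definition of word
    # replace the cross_lingual token with relevant English term
    process and assign score
    return score
End Function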

E. SENSITIVITY AND SPECIFICITY
In sentiment classification, sensitivity signifies the true positive rate whereas specificity signifies the true negative rate.

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \quad \text{and} \quad \mathrm{Specificity} = \frac{TN}{TN + FP} \tag{17}$$

The comparative outcomes of HF-CSA are shown below in terms of Precision, Recall, F-Measure, Sensitivity, Specificity and Accuracy. Figure 2 depicts that, on the ClIfT dataset, HF-CSA attained a precision of 73.91% for positive instances and 69.71% for negative instances, a recall of 72.55% for positive and 71.16% for negative instances, and an F1-Measure of 73.22% for positive and 70.43% for negative instances. Figure 3 depicts that, on SemEval-2020 Task-9, HF-CSA achieved a precision of 85.15% for positive instances and 77.18% for negative instances, a recall of 71.22% for positive and 76.51% for negative instances, and an F1-Measure of 76.13% for positive and 76.84% for negative instances.
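
As a quick consistency check, the F1-Measure values reported for the ClIfT dataset can be recomputed from the reported precision and recall using the harmonic mean. The sketch below is illustrative only: the helper name f1_from_percent is an assumption, and the inputs are the ClIfT figures quoted in the paragraph above.

```python
def f1_from_percent(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for HF-CSA on the ClIfT dataset (percent).
clift_positive = f1_from_percent(73.91, 72.55)
clift_negative = f1_from_percent(69.71, 71.16)

print(f"{clift_positive:.2f}")  # 73.22
print(f"{clift_negative:.2f}")  # 70.43
```

Both values agree with the F1-Measure reported for positive and negative instances on the ClIfT dataset.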
The system performance is also validated through a qualitative comparison with machine learning and deep learning based methods on the SemEval-2020 dataset. A bidirectional LSTM and XLM-RoBERTa were utilized for assessing the Hindi and Romanized text of the SemEval-2020 dataset [3]. Similarly, Wu et al. [4] employed a fine-tuned BERT (Bidirectional Encoder Representations from Transformers) with multitask learning over the SemEval-2020 dataset, using three different embeddings: word position, word encoding and sentence-level encoding. Figure 4 depicts the comparative performance of HF-CSA and the ML and deep learning based systems on the positive instances of SemEval-2020 Task-9. The system of aditymalte [3] attained 80.30%, 69.0% and 74.20% of precision, recall and F-measure, and that of Meister_Morxrc [4] attained 79.9%, 76.51% and 76.84%, whereas HF-CSA reached 81.15%, 71.22% and 76.13% of precision, recall and F-measure respectively. Figure 5 depicts the comparative performance of HF-CSA and state-of-the-art systems on the negative instances of SemEval-2020 Task-9. The system of aditymalte [3] attained 77.30%, 62.20% and 69.0% of precision, recall and F-measure, and that of Meister_Morxrc [4] attained 70.2%, 71.90% and 70.0%, whereas HF-CSA reached 77.18%, 76.51% and 76.84% of precision, recall and F-measure respectively.
Similarly, Figure 6 presents the confusion matrix of HF-CSA on the ClIfT dataset. HF-CSA attained 71.60% and 76.18% accuracy on the ClIfT and SemEval-2020 datasets respectively. Table 9 presents the sensitivity and specificity of HF-CSA on the ClIfT dataset, and Table 10 presents a fine-grained view of the opinion-wise accuracies. The incorporation of informal and Crosslingual features clearly improved the accuracy, which rose from 54.92% to 71.60%.
As mentioned earlier, Crosslingual sentiment analysis is a learning paradigm that transfers the information of one human language into another in order to help a resource poor language address the scarcity of data. In this study, Roman Urdu and informal NetLingua terms are mapped and transformed into a resource rich form.
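
The mapping step can be illustrated with a minimal sketch. It assumes that the Crosslingual lexicon is a plain dictionary from Roman Urdu or NetLingua tokens to English terms and that polarity scores come from a small English lexicon. Every entry, score and function name below is hypothetical and is not taken from the HF-CSA resources; intensifiers, reducers and other features handled by the framework are left out.

```python
# Hypothetical crosslingual lexicon: Roman Urdu / NetLingua token -> English term.
CROSSLINGUAL_LEXICON = {
    "acha": "good",
    "bura": "bad",
    "gr8": "great",
}

# Hypothetical English polarity lexicon: term -> score in [-1, 1].
POLARITY_LEXICON = {
    "good": 0.5,
    "great": 0.8,
    "bad": -0.5,
}


def normalize_token(token: str) -> str:
    """Replace a crosslingual or informal token with its English term, if known."""
    key = token.lower()
    return CROSSLINGUAL_LEXICON.get(key, key)


def score_text(text: str) -> float:
    """Sum the polarity scores of the normalized tokens; unknown tokens score 0."""
    return sum(POLARITY_LEXICON.get(normalize_token(t), 0.0) for t in text.split())


def classify(text: str) -> str:
    """Map the aggregate score to a sentiment label."""
    score = score_text(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("camera bohat acha hai"))  # positive
print(classify("phone bura hai"))         # negative
```

Tokens that appear in neither lexicon contribute nothing to the score, so a tweet made up only of unknown tokens is labeled neutral in this sketch.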
Although a preprocessing model exists for each BERT encoder, it does not handle the informality and abstruseness of text, and the Romanization of Urdu text is also lacking. The proposed system is capable of handling the informality and abstruseness of text. Figures 3, 4 and 5 show that the outcomes of HF-CSA are better than those of the baseline systems over informal, anglicized and Crosslingual contents. The existing literature on deep learning methods and Urdu transliteration systems [41], [42], [43], [44], [45], [46], [47], [48], together with the experimental setup, revealed that unsupervised lexicon based systems generate satisfactory outcomes for standard, formal, informal as well as multilingual text of resource poor languages.
Another solution to multilingualism is facial recognition. It has been reported that facial recognition based sentiment analysis has gained noticeable attention owing to advanced technology in commercial and industrial applications such as smart Master card for online transactions, health related devices, character recognition, IoT, the post-pandemic world, pain detection, criminal identification and security surveillance [49], [50], [51], [52], [53], [54], [55]. HF-CSA makes it possible to sort and utilize the unstructured text of resource poor languages in order to assess customer support issues and to support consumer satisfaction, reputation management, brand monitoring, decision support systems and market analysis.

V. CONCLUSION
Sentiment Analysis of users' opinions has become a de facto skill set for many organizations and companies. Besides the challenges of handling formal and resource rich languages, it is also noticed that social media publishers are adopting informal, Crosslingual, code-switched, anglicized and emphasized opinion bearing terms. The semantic orientation of Crosslingual text is a rising research topic in sentiment analysis, but for resource poor languages it is less researched due to resource scarcity and the aforementioned linguistic peculiarities.
This study proposes a Heuristic Framework for Crosslingual Sentiment Analysis (HF-CSA) in order to improve the efficacy of sentiment classification for resource poor languages. A Crosslingual and informal dataset is compiled for English, Urdu and anglicized opinionative text to assess the performance of the proposed HF-CSA. The contribution of this study is the employment of linguistic processing along with a lexicon based method for a resource poor language; an additional contribution is the compilation of an Urdu-English bilingual annotated SA lexical resource for anglicized, informal and Crosslingual contents. The experiments showed that the incorporation of informal text normalization and the linguistic processing of Crosslingual contents is the backbone of HF-CSA.
The robustness of HF-CSA is assessed on SemEval-2020 Task 9, and the results indicate that deep learning based methods are still an inferior solution on informal data of resource poor languages due to inefficient linguistic processing and the limit of a small input window.
The results show that HF-CSA achieved better outcomes in comparison with existing systems on resource poor, informal and Crosslingual text, but a few systematic deficiencies and limitations remain:
• Scarcity of large scale coded corpora for informality of Urdu contents.
• Imbalanced morphological complexities of source and target language.
• HF-CSA lacks the handling of ironic, sarcastic and aspect based Orientation.
• Urdu parts of speech tagging and dependency parsing have not been fully explored.
• Sophisticated lexical annotation and proficient word sense disambiguation of Urdu is still lacking.
Keeping in view the above mentioned limitations and systematic deficiencies, the following key points can be treated as future directions.
• The large scale sentiment lexicon and parallel corpora of Urdu need to be extended for proficient coverage of target language text.
• The best range of source language needs to be explored to minimize the imbalance gap between source and target language.
• Provision of ironic, lexicographical and morphological information can improve the efficacy of Crosslingual Sentiment Analysis and also helps in the orientation of Emotional Intelligence.